Back in 2015 – following an executive bloodbath and shortly before it would be deemed the world’s most dangerous bank by the International Monetary Fund (IMF) – a small team of engineers in Deutsche Bank’s London office was tasked by new management with transforming the bank into one that operated “everything-as-a-service.”
Now, three years on, those engineers have built Fabric, an internal platform-as-a-service (PaaS) that thousands of Deutsche Bank employees already use to run thousands of applications. The goal is to have 80 per cent of workloads running on Fabric by 2022.
Built on top of Red Hat’s OpenShift PaaS, Fabric incorporates a slew of features specific to the highly regulated banking industry to accelerate application development and deployment.
It’s a rapid success story for a highly leveraged and highly regulated international bank, one that is in the midst of a turnaround effort and that registered a loss of €5.7 billion ($7.4 billion) last year. Fabric has even prompted management to consider whether it is good enough to sell to rival banks, eventually turning the bank’s technology investments into a revenue stream.
A key problem Fabric helped solve confronted the bank’s new leadership as soon as it arrived in 2015: a sizeable virtual machine (VM) estate with a utilisation rate of only around eight per cent.
“The CIOs got together and realised they had a problem to fix because this is just money that’s bleeding out to the organisation,” Emma Williamson, platform-as-a-service product owner at Deutsche Bank, said during a recent Red Hat OpenShift Commons event in London.
So the bank set out to drastically modernise its application estate around cloud native technologies like containers and Kubernetes, aiming to cut the waste tied to its legacy platforms and to help drive a broader shift towards the cloud.
Here’s how they went about it.
From RFP to PoC to MVP
The bank started by issuing a request for proposal (RFP) for a container platform to underpin its new PaaS, kickstarting a broader shift towards running the bank on more flexible and scalable public cloud infrastructure.
The bank also wanted a new dev, test, and deployment environment that wouldn’t demand the heavy lifting of its old homegrown stack of Java-based apps running on the likes of WebLogic and JBoss.
It quickly settled on Red Hat’s OpenShift PaaS (Red Hat was acquired by IBM last year in a $34 billion deal) over the likes of Salesforce’s Force.com, the now-defunct IBM Bluemix, and VMware’s Cloud Foundry, and started spinning up a proof of concept (PoC), followed swiftly by a minimum viable product (MVP), or in this case a minimum viable platform.
In the process, Deutsche Bank engineers wove in banking-specific functionality and added elements from Avi Networks’ application delivery platform and Ansible configuration management.
“To think that from the middle of 2015, to the beginning of 2016, we’ve gone from RFP, to PoC to MVP, that was ridiculous. Nothing at Deutsche Bank was ever done that fast and I don’t think anything has been done that fast since then,” Dipesh Patel, a senior engineer on the PaaS professional services team at Deutsche Bank, said on stage at the same Red Hat event.
Deutsche Bank took a fairly unusual, decentralised approach to picking which workloads and applications to port to Fabric first, leaving it to the engineers themselves.
“We needed that buy-in at the beginning from the CIOs and developers to want to onboard, because we can’t mandate that. All we can do is build it and hope that they come,” Williamson said.
That doesn’t mean her team doesn’t make recommendations. They would prefer applications be broken down into microservices where possible, but some teams have opted to lift and shift monolithic workloads.
To cut waste, engineers are capped at 16GB of memory (though not on CPU cores) and must justify any additional memory capacity beyond that.
Previously the bank was procuring “the highest spec box, with the most CPU and memory that the application budget would cover. Most often these high spec machines would run a single workload only, grossly underutilising the machine resources,” Williamson said.
Now, “we expect them to go through a little bit of a trial to up that memory, because we don’t want just raw waste on our clusters, especially if people are having requests in place,” Williamson said. “What I don’t want to do is keep adding to the clusters because I’m just going to end up in the space where we were back in 2015, where we had eight or nine per cent utilisation, which is just ridiculous.”
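To make the policy concrete, here is a minimal, hypothetical sketch of how a memory-request check along these lines could work. It is not Deutsche Bank’s tooling: the `ResourceRequest` type, the `justification` field, and the status strings are illustrative assumptions. The only details taken from the article are the 16GB memory ceiling, the absence of a core limit, and the requirement to justify anything above the cap.

```python
from dataclasses import dataclass
from typing import Optional

# Default memory ceiling cited in the article; everything else here is hypothetical.
MEMORY_CAP_GB = 16


@dataclass
class ResourceRequest:
    app_name: str
    cpu_cores: int
    memory_gb: float
    justification: Optional[str] = None  # hypothetical: free-text reason for extra memory


def review_request(req: ResourceRequest) -> str:
    """Classify a request as 'approved', 'pending-review' or 'needs-justification'."""
    # CPU is deliberately not checked: engineers aren't limited on cores.
    if req.memory_gb <= MEMORY_CAP_GB:
        return "approved"
    # Above the cap, a non-empty justification sends the request to a human reviewer.
    if req.justification and req.justification.strip():
        return "pending-review"
    # No justification means the request goes back to the team.
    return "needs-justification"


if __name__ == "__main__":
    print(review_request(ResourceRequest("risk-engine", cpu_cores=32, memory_gb=12)))
    print(review_request(ResourceRequest("pricing-cache", cpu_cores=4, memory_gb=48)))
```

Running it prints `approved` for the first request and `needs-justification` for the second. In a real Kubernetes or OpenShift cluster, a ceiling like this would more likely be enforced declaratively through namespace quotas than through application code. The sketch only shows the decision logic the quote describes.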
The results of this shift have been promising so far, with 49 per cent of the bank’s applications now hosted on just 10 per cent of the bank’s infrastructure, leading to a 60 per cent utilisation rate and “significant” cost savings.
“Everything that we do as a bank is about fewer resources, lower cost. We had to have that money saved, we had to have the VM exits, get rid of waste estate. Which we’re still journeying through,” Williamson admits.
Of course, being a big bank, compliance and governance also had to be built into the platform.
Jeremy Crawford, head of Fabric engineering and operations, talked about the importance of having Red Hat’s enterprise-hardened version of the OpenShift Container Platform to help put management’s minds at ease.
“That’s part of the value add that an enterprise platform gives you over and above having someone just go off and think they can do this on a [managed Kubernetes service] somewhere outside of any governance, because they’re not going to get particularly far,” Crawford said.
This is also why the bank integrated Avi Networks software – which was acquired by VMware in July last year – into the Fabric platform. Avi Networks provides automated services like IP whitelisting and network security rules to help ensure services are only being accessed by authorised personnel.
“Without Avi I don’t think the platform would have launched or been as successful, quickly, as it has been because it’s basically fast tracking that ability to get into production,” Crawford said.
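IP whitelisting boils down to a default-deny rule: a request reaches a service only if its source address falls inside one of that service’s permitted networks. The sketch below shows that logic using Python’s standard `ipaddress` module. It does not use or represent Avi Networks’ configuration or API. The service name and network ranges are hypothetical placeholders.

```python
import ipaddress

# Hypothetical allowlist: service name -> networks permitted to reach it.
ALLOWLIST = {
    "payments-api": [
        ipaddress.ip_network("10.20.0.0/16"),
        ipaddress.ip_network("192.168.5.0/24"),
    ],
}


def is_allowed(service: str, source_ip: str) -> bool:
    """Default-deny check: True only if source_ip is in one of the service's networks."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        # Malformed addresses are rejected rather than raising.
        return False
    # Unknown services have no entries, so every request to them is denied.
    return any(addr in net for net in ALLOWLIST.get(service, []))


if __name__ == "__main__":
    print(is_allowed("payments-api", "10.20.3.4"))   # inside 10.20.0.0/16
    print(is_allowed("payments-api", "10.21.0.1"))   # outside both ranges
```

The default-deny choice matters in a regulated setting. A service that nobody has explicitly opened up stays unreachable, which is the safer failure mode.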
Better utilisation, too much speed
Deutsche did go through some growing pains with the new platform. The bank went big on training and community building early, bringing in Red Hat to help with training and quickly standing up community forums and Symphony chat channels for internal troubleshooting.
“We opened it up as a Symphony chat channel, which was supposed to be community help and users helping users like Stack Overflow, Slack, whatever,” Patel explained. “It turned into: ‘Hi Fabric team, I have this incident number, can you solve it for us?’ So we are trying to retrospectively go back and fix all of the habits that have appeared.”
Williamson and her team certainly struggled with the sheer level of adoption early on, as developers proved to be a demanding bunch.
“It feels like we weren’t able to give them enough, quickly enough,” she admits. “We’re a tiny team compared to the rest of the bank. I would have slowed down. I wouldn’t have done everything as fast, if I had my time again.”
“We launched this massive image of a Ferrari, but we had a little Ford Fiesta going: ‘Oh, my God, I don’t know if I can cope’,” she added.
Removing the training wheels
That being said, now that the training wheels are off for the developers, they are taking full advantage.
“The speed that they’re able to do things is vastly improved,” Williamson said. One team, for example, was able to get a new application into production in just three weeks, whereas Williamson said she has been in situations “where I’ve had to wait 14 months for a VM to come in.”
In terms of results, the bank is now running 49 per cent of applications on the PaaS, according to Williamson. Could it ever get to 100 per cent?
“No, we will never get to 100 per cent,” she said. “But we don’t want to get to 100 per cent. There’s always going to be applications that can’t go on Fabric. Applications that can’t go to cloud.
“Those platinum mainframe applications, rightfully so, because of the capabilities that they need. I think our stretch target is probably around 80 per cent by the end of 2022 and I think we can do it.”
Still, the learning curve is steep when it comes to cloud native technology like containers, with teams “that haven’t been Kubernetes or OpenShift users and in a lot of cases are not even out and out developers but support staff tasked to deploy applications onto a developer-centric platform. That’s our clientele, right? We have to work with that,” Patel said.
Documentation is a good start, but of course “as engineers none of us read documentation as it is, even if it’s step by step, they don’t follow it,” he admits.
Next stop: the cloud
Fabric currently operates across four regions (the US, the UK, APAC, and Germany) on infrastructure from three unnamed on-premises providers and Microsoft’s Azure public cloud, albeit over a private network. The aim is to move as much as possible to the cloud in the future.
The issue with the old-school, on-premises providers hinges on paperwork. “The contracts are a problem. We’ve got reduced SLAs; they certainly can’t compete with [those of] the cloud providers. [Then there’s] the inconsistent configuration, because if you don’t have control over the base image like you do in the cloud, you don’t own the full stack and have problems,” Crawford explained.
No matter what infrastructure Fabric resides on, Crawford and his team have essentially become an internal PaaS provider, with all of the pressure that brings.
“This is why people love the platform – we take away all the pain, and we have to deal with all the subtleties and the differences between the different providers and then you just consume the platform,” Crawford said. “Unfortunately, that does make it a little bit difficult to operate in certain cases.”
The transition to service provider has brought major changes to the platform team, which has added new engineering, operations, product management, professional services, program management, and site reliability expertise.
“We had three guys when we started this, so we built a whole brand new team, whole new way of working, new people, new thoughts,” Patel said.
Finally, Deutsche appears to be serious about the possibility of commercialising Fabric so that other financial services companies could procure it as a service.
This follows a broader trend in the industry of banks looking to monetise their investment in cloud native technology, essentially becoming software vendors themselves.
This can be seen at Goldman Sachs, which is exploring the possibility of spinning out a banking-as-a-service offering to sell to other financial services firms, and at the UK fintech OakNorth, which white-labels its own analytical intelligence platform so that other banks can offer credit decisioning services.
Patel talked about how Deutsche Bank has thought about Fabric as a standalone product and brand from the start, so that “we could potentially, one day, maybe, if everything aligned properly, sell to other [financial services companies]. We could probably package this up and sell it as we’ve got it now, today,” he said.
“You have to aim high and that’s the idea,” Crawford added. “We want to manage this as a product and it needs to be, as a consequence, mature in terms of documentation and defining those boundaries. How you consume it, making it easy to consume, is a big, big push as well.”