CloudOps: A framework for optimising cloud operations
- 25 May, 2022 05:45
Anyone who’s involved in the creation of software products should be familiar with DevOps, a set of practices that combines software development and IT operations, with the goal of shortening the development lifecycle and providing continuous delivery and high-quality products.
A related concept, CloudOps, for “cloud operations,” has emerged as enterprises increasingly move application development and workloads to the cloud, and those cloud outlays become more complex.
Here we examine what CloudOps is, how it can benefit an organisation, and the key issues customers should keep in mind when implementing CloudOps in the enterprise.
What is CloudOps?
CloudOps is an operations practice for managing the delivery, optimisation, and performance of IT services and workloads running in a cloud environment.
Whether an enterprise is operating with a multi-cloud, hybrid cloud, or private cloud strategy, CloudOps is intended to establish procedures and best practices for cloud-based processes, in much the same way DevOps does for application development and delivery.
CloudOps: A multi-layered framework for cloud operations
“Holistic CloudOps is a framework with several layers that help enterprises manage all aspects of their cloud ecosystem,” said Jason Hatch, vice president and cloud centre of excellence lead at consulting firm Capgemini.
One is a governance layer that includes activities such as financial operations — also known as FinOps — to control costs and manage budgeting for the cloud. “The governance layer should also contain the architecture standards on how and what is able to be deployed in a cloud, and have a way to programmatically enforce those standards,” Hatch added.
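As a rough illustration of what "programmatically enforce those standards" could look like, the sketch below checks resource descriptions against a small rule set. The rules (approved regions, required tags), the `Resource` type, and the function names are all hypothetical; the article does not describe Capgemini's actual tooling, and a real deployment would more likely rely on the cloud provider's own policy service.

```python
from dataclasses import dataclass

# Hypothetical standards, chosen only for illustration.
ALLOWED_REGIONS = {"eu-west-1", "eu-central-1"}
REQUIRED_TAGS = {"owner", "cost-centre"}


@dataclass
class Resource:
    name: str
    region: str
    tags: dict


def violations(resource):
    """Return a list of readable standard violations for one resource."""
    problems = []
    if resource.region not in ALLOWED_REGIONS:
        problems.append(f"{resource.name}: region {resource.region} is not approved")
    # Tags demanded by the standard but absent from the resource.
    missing = REQUIRED_TAGS - set(resource.tags)
    for tag in sorted(missing):
        problems.append(f"{resource.name}: missing required tag '{tag}'")
    return problems


def enforce(resources):
    """Collect violations across all resources; an empty list means compliant."""
    return [problem for r in resources for problem in violations(r)]
```

A check of this kind could run in a deployment pipeline and block any release for which `enforce` returns a non-empty list, which is one way to tie the governance layer to the FinOps goal of tagging every resource with a cost owner.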
Other framework layers include the cloud application layer, which covers how an organisation deploys and manages/monitors applications and application-specific services in the cloud; the cloud operations layer, for the deployment, management, monitoring, and operations of cloud services; and the cloud foundations layer, which includes core services such as identity, network management, logging, central backup management, infrastructure as code, and central monitoring functions.
“Spanning all these layers is the ‘security layer,’ which includes vulnerability and threat management, workload protection, and the integration into a company’s larger cybersecurity management function,” Hatch said.
Where CloudOps fits in the enterprise
The CloudOps model has particular relevance for application delivery, something many organisations are focusing on with digital initiatives aimed at increasing sales and enhancing customer experience.
“CloudOps bring together the overarching five responsibilities of building, deploying, operating, monitoring, and managing the functions of [web] application delivery in the cloud,” noted Suresh Kuppahally, executive vice president for engineering and operations at Replicon, a provider of cloud-based services.
Networking, computing, security, and storage are four key components that must be kept in mind during the initial build and design stage, Kuppahally said. “From there, companies deploy their application either automatically or through continuous integration and continuous delivery,” he said.
An organisation’s CloudOps team should also operate with a clear separation of duties and independence from engineering or product teams, Kuppahally said, adding that doing so enables CloudOps to bring out “the transparency and quality of service [QoS] accountability within an organisation.”
Benefits of CloudOps
The business benefits of CloudOps can be considerable, starting with an organisation’s overall disposition toward cloud services, explained Capgemini’s Hatch.
CloudOps “helps drive further adoption and usage of cloud within enterprises. If companies can effectively deploy, manage, and secure their cloud environments, it should increase their usage of cloud and provide the ability to experiment and innovate with new services and technology,” he said. “This, in turn, can make them more agile, provide faster time to market, and help drive innovation.”
Businesses that leverage CloudOps can also achieve better management and financial control over the growing number of cloud services they use, Hatch added.
“We continue to hear from customers that they are exceeding their cloud budgets and they either don’t know why or are unable to implement the controls to manage that,” Hatch said.
“Effective CloudOps [helps] to mitigate this. At a governance layer, we can implement better budgeting and financial tracking and optimisation. This is also facilitated at the operations level, with better automation in deployment and management.”
Another top benefit cited by clients of consulting firm Protiviti is the ability to automatically release authorised resources in the cloud, said Will Thomas, managing director at the firm, which helps organisations manage the growing complexities of the cloud.
Enhanced security is another key benefit of CloudOps, Thomas said, as the model “ensures alignment to security controls, standards, and/or frameworks with establishing policies that can restrict noncompliant actions while reporting on health and activities within the cloud.”
Thomas also believes that companies that practice CloudOps are better positioned to optimise their cloud environments because “a CloudOps engineer is going to focus on leveraging authorised resources within the cloud to modernise applications with the latest and greatest services,” he said.
Moreover, organisations deploying CloudOps can establish schedules for proper resource allocation based on performance and cost considerations; continually report and review metrics on cloud health; and support the proactive configuration of resources while maintaining regulatory compliance within the cloud, he added.
Replicon’s Kuppahally points to CloudOps’ ability to scale cloud services cost effectively without impacting QoS. “Aligning QoS goals and CloudOps investment is very strategic,” he said, as “a dedicated CloudOps team can be incentivised to manage operating costs, and hence will have a vested interest in reducing the operational costs.”
CloudOps in practice
Stretto is one company benefitting from its adoption of CloudOps. The bankruptcy services and technology firm, which serves the corporate and consumer bankruptcy sectors, identified a need for CloudOps practices early on and incorporated key principles into its applications and systems running in the cloud, said Stretto CTO George Tsounis.
“For example, we set hard, fast rules that we would only use infrastructure as code [IaC] practices for any deployment,” Tsounis said. “We achieved redundancy by deciding that all our applications/systems would always be run across two availability zones, so we leverage the cloud provider’s built-in high-availability capabilities.”
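A rule such as "always run across two availability zones" lends itself to an automated check. The following is a minimal sketch, assuming a deployment is described as a list of instances that each carry a `zone` field; that data shape and the function names are assumptions made for illustration, not Stretto's implementation.

```python
from collections import Counter


def zone_spread(instances):
    """Count how many instances run in each availability zone."""
    return Counter(instance["zone"] for instance in instances)


def meets_zone_rule(instances, minimum_zones=2):
    """True when the deployment spans at least `minimum_zones` distinct zones."""
    return len(zone_spread(instances)) >= minimum_zones
```

Run against the planned infrastructure before it is applied, a check like this would reject a deployment in which every instance lands in the same zone.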
The key part of Stretto’s strategy is leveraging CloudOps practices to ensure a more proactive approach to its technology operations, Tsounis said.
“We prefer to empower our architects and engineers to create high-performing, self-healing, and resilient cloud-native solutions for our internal and external clients, rather than continuing to operate with a reactive approach,” he said.
The introduction of cloud services, and even the process of transitioning to serverless capabilities, comes with unique challenges, Tsounis said. “CloudOps is the strategy that helps us handle those challenges,” he added.
Among the benefits CloudOps has ultimately delivered for Stretto are cost reduction, scalability, automation, simplified disaster recovery, and seamless integration as the infrastructure becomes part of the application.
“Our teams have benefited with application improvements across the board where these CloudOps ideals have been adopted,” Tsounis said.
“CloudOps practices improve quality as well. This is made possible by leveraging IaC approaches to make the deployment and configuration of cloud infrastructure repeatable. We reduced configuration errors and now have consistent infrastructure configurations utilising IaC as we roll out applications through our various environments.”
Stretto has seen an approximately 20 per cent reduction in quality issues by removing manual configuration of its cloud infrastructure, Tsounis said.
“Leveraging CloudOps practices provides engineers with the confidence they need to know that application/system behaviour in pre-production environments will be the same when released to the production environment,” he said.
“In addition, we’ve seen overall IT operational improvement due to less service-desk and internal tickets, resulting from the quality improvements in our applications.”
Keeping up with an evolving methodology
Nothing stands still when it comes to cloud services and how they are used, so organisations employing CloudOps need to tweak their approaches on a regular basis to keep up with changes.
For many enterprises, this is still new territory with a learning curve they need to overcome.
“As more enterprises adopt true multi-cloud deployments, their CloudOps implementations also need to mature and scale,” Capgemini’s Hatch said. “Many customers manage their cloud landscapes in silos, with different tools and processes managing each cloud landscape, and minimal ability to view their entire cloud landscape holistically.”
To be more efficient and effective, “companies need to develop their CloudOps frameworks to be able to easily plug in new cloud providers and services while still providing the right levels of management, monitoring, and operational rigour,” Hatch said.
The way companies handle incident management in the cloud can also use improvement, Kuppahally said.
“This is an area where most of the CloudOps teams struggle,” he said. “They are flooded with both internal and external incidents, and lose track of effectively managing them. Having a dedicated program management [process] to streamline incident management triaging and prioritisation is one of the ways to mitigate risks.”
At the same time, organisations need to reduce the rate of false positive alarms for incidents.
“CloudOps teams drown when they can’t keep up with the high false alarms rates,” Kuppahally said. “Having an effective strategy and plan to reduce or eliminate false alarms is a very critical success factor.”
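One simple starting point for such a plan is to measure which alert rules generate the most noise. The sketch below assumes each past alert has been recorded as a pair of rule name and whether it turned out to be a real incident; that record format, the 50 per cent threshold, and the function names are hypothetical. The "rate" here is the share of a rule's alerts that were false alarms.

```python
from collections import defaultdict


def false_alarm_rates(alerts):
    """Map each alert rule to the fraction of its alerts that were false alarms.

    `alerts` is an iterable of (rule_name, was_real_incident) pairs.
    """
    totals = defaultdict(int)
    false_counts = defaultdict(int)
    for rule, was_real in alerts:
        totals[rule] += 1
        if not was_real:
            false_counts[rule] += 1
    return {rule: false_counts[rule] / totals[rule] for rule in totals}


def noisy_rules(alerts, threshold=0.5):
    """Return, in alphabetical order, the rules whose false-alarm share exceeds the threshold."""
    rates = false_alarm_rates(alerts)
    return sorted(rule for rule, rate in rates.items() if rate > threshold)
```

Rules that show up in `noisy_rules` are candidates for retuning or removal, which addresses the flood of alerts at its source rather than at the triage stage.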
CloudOps could benefit from technologies such as artificial intelligence (AI) and machine learning, said Aref Matin, executive vice president and CTO at Wiley, a provider of research and education services.
“Through machine learning, CloudOps tools can help define enterprise-wide policies, detect and report anomalies, and take corrective actions in an automated fashion, to maintain cloud best practice policies,” Matin said.
Much like DevOps, CloudOps success rests heavily on developing a culture geared toward making the most of the framework and tools. And as more organisations move more work and processes to the cloud, they need to focus on building CloudOps expertise.
“Most clients live in a state of reaction when dealing with the cloud and cannot respond to events, changes, or requests for new services,” Protiviti’s Thomas said. “CloudOps establishes the structure for deployments enabled via automation, allows for monitoring, reviewing, and optimising existing resources, and examines corporate policies for alignment to the cloud.”
Stretto’s Tsounis agrees that organisations need “a broader understanding of the appropriate alignment of proper organisational structure, expertise, and collaboration [for] CloudOps to really work.”
“CloudOps isn’t a single team or department. The IT, security, architecture, and application teams need to collaborate and be aligned on common CloudOps practices,” the CTO said. “CloudOps doesn’t work well if these teams are working in silos.”
And based on his experience putting CloudOps into practice, Tsounis believes organisations also need to have a better definition of the foundational skills required for CloudOps in order to be successful, and to avoid reinventing the wheel.
“The technology teams need to understand cloud-based architecture, networking, security, and automation,” he said. “Without foundational skills, the teams could risk implementing solutions where cloud services already exist.”