When evaluating the design of your backup systems or developing the design of a new backup and recovery system, there are arguably only two metrics that matter: how fast you can recover, and how much data you will lose when you recover. If you build your design around the agreed-upon numbers for these two metrics, and then repeatedly test that you are able to meet those metrics in a recovery, you’ll be in good shape.
The problem is that few people know what these metrics are for their organisation. This isn’t a matter of ignorance, though. They don’t know the numbers because no one ever defined them in the first place. And if you don’t have agreed-upon metrics (also known as service levels), every recovery will be a failure because it will be judged against the unrealistic metrics in everyone’s heads. With the exception of those who are intimately familiar with the backup and disaster recovery system, most people have no idea how long recoveries actually take.
First, let’s understand what these two metrics are, and then let’s discuss how to get others to agree to them.
The two metrics are recovery time objective (RTO) and recovery point objective (RPO). RTO is how quickly you need operations back up and running after an outage, while RPO is how much data, measured in time, you can afford to lose when things go south. For example, you might say that you need to be able to restore your environment to fully operational status within four hours of any kind of outage, and that you can lose no more than one hour’s worth of data. That is a four-hour RTO and a one-hour RPO.
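To make the two numbers concrete, here is a minimal Python sketch that checks a single recovery against the four-hour RTO and one-hour RPO from the example above. The names `ServiceLevel` and `evaluate_recovery` and all of the timestamps are hypothetical, made up purely for illustration. It also assumes you can pin down when the outage began, when service was restored, and the most recent point in time you were able to recover to.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ServiceLevel:
    """Agreed-upon objectives (hypothetical structure for illustration)."""
    rto: timedelta  # maximum acceptable time from outage to restored service
    rpo: timedelta  # maximum acceptable window of lost data


def evaluate_recovery(level, outage_start, service_restored, last_recoverable_point):
    """Compare one recovery against its agreed service level."""
    # Downtime is what gets judged against the RTO.
    downtime = service_restored - outage_start
    # Anything written after the last recoverable point is lost;
    # that window is what gets judged against the RPO.
    data_loss = outage_start - last_recoverable_point
    return {
        "downtime": downtime,
        "data_loss": data_loss,
        "rto_met": downtime <= level.rto,
        "rpo_met": data_loss <= level.rpo,
    }


# The example from the text: a four-hour RTO and a one-hour RPO.
level = ServiceLevel(rto=timedelta(hours=4), rpo=timedelta(hours=1))

# Made-up timestamps: outage at 09:00, last good backup at 08:15,
# service restored at 12:30.
result = evaluate_recovery(
    level,
    outage_start=datetime(2024, 1, 10, 9, 0),
    service_restored=datetime(2024, 1, 10, 12, 30),
    last_recoverable_point=datetime(2024, 1, 10, 8, 15),
)
print(result["rto_met"], result["rpo_met"])  # True True
```

In this made-up case the recovery took three and a half hours and lost 45 minutes of data, so both objectives were met. Had the last good backup been from 07:00, the same recovery would have missed the RPO even though it met the RTO.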
The most important thing for you to know about RTO and RPO is that it is not your job to set these values. RTO and RPO come from the top and must be determined by a combination of stakeholder needs and the financial cost of meeting those needs. Setting them is a business decision, not a technical one.
To get everyone in your organisation to agree to metrics, you first must find all those who would have an opinion on such things. These are your stakeholders and subject-matter experts. These folks hold the keys to understanding what each department within your organisation really needs to have safeguarded and how crucial it is to keep things up and running. These people are essentially your customers, and you need to find out what they want for RTO and RPO.
We've got the data creators, for example. These are the people responsible for generating data, and they come from various corners of your organisation. From production and operations to product management, business intelligence, and data services, these are the folks who can give you the lowdown on how data flows and its rate of change.
Next are the executives and the decision-makers. They're the ones who can tell you just how fast your organisation's wheels are turning. They will also tell you they want everything protected all the time until you show them the price tag for a fully redundant, always-active system. That's where the real discussion about costs comes into play, and that's how you will eventually determine your RTO and RPO. This is why executives must be part of the process.
Compliance and governance are also crucial. You don't want to run afoul of the law, especially with the likes of GDPR and CCPA in the mix. So, find yourself an expert from the legal or governance teams to make sure you're playing by the rules. They'll make sure you can access data in backups and archives without any legal hiccups.
Once you've got your squad of subject-matter experts from across the organisation, it's time to sit down with them and start picking their brains. Be respectful of their time and keep things concise. It’s important for stakeholders to understand the critical concepts without getting lost in the technical jargon.
Use a story to explain things and throw in some real-world examples. Ask them how they'd cope if a piece of their data suddenly vanished or was taken away by a ransomware attack.
When you've got all their input, it's time to lay it all out on the table. Create a presentation that outlines the problem you're solving and the requirements for each department. It would also be helpful if you could present ballpark numbers of what it would cost to meet each of the RTOs and RPOs proposed by different parts of the organisation.
The goal is to get everyone to agree to an RTO and RPO and an associated budget range. If you can meet an RTO and RPO of zero, and your executives are willing to pay the kind of money it requires, then that will be your RTO and RPO. What is more likely is that they will settle for a more relaxed RTO and RPO because they cannot justify the cost of meeting tighter ones. Another possibility is that they will set different RTOs and RPOs for different parts of the organisation, because an outage in some parts of the business costs the organisation more than in others. Payment processing might get a tighter RTO and RPO than, say, human resources.
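If you do end up with different targets for different parts of the organisation, it can help to write them down in one place that both people and scripts can read. The Python sketch below is one hypothetical way to do that. The business unit names echo the example above, but every number is invented for illustration, and `service_level_for` and `backup_interval_satisfies_rpo` are made-up helpers. The second helper rests on a simplifying assumption: with periodic backups, the worst-case data loss is roughly the time between backups, ignoring how long each backup takes to complete.

```python
from datetime import timedelta

# Hypothetical targets; the real values come out of the stakeholder
# discussion and budget agreement, not from the backup team.
SERVICE_LEVELS = {
    "payment_processing": {"rto": timedelta(minutes=15), "rpo": timedelta(minutes=5)},
    "human_resources": {"rto": timedelta(hours=24), "rpo": timedelta(hours=12)},
}

# Fallback for any business unit without its own agreed targets.
DEFAULT_LEVEL = {"rto": timedelta(hours=4), "rpo": timedelta(hours=1)}


def service_level_for(business_unit):
    """Return the agreed targets for a business unit, or the default."""
    return SERVICE_LEVELS.get(business_unit, DEFAULT_LEVEL)


def backup_interval_satisfies_rpo(business_unit, backup_interval):
    """Check whether a periodic backup schedule can meet the unit's RPO.

    Simplification: worst-case data loss is treated as equal to the
    interval between backups.
    """
    return backup_interval <= service_level_for(business_unit)["rpo"]


# An hourly backup is fine for HR but far too loose for payment processing.
print(backup_interval_satisfies_rpo("human_resources", timedelta(hours=1)))     # True
print(backup_interval_satisfies_rpo("payment_processing", timedelta(hours=1)))  # False
```

A check like this only tells you whether a design could meet the RPO on paper. Whether it actually does is something only a tested recovery can show.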
Document these metrics in a service level agreement, and then get signoff from the stakeholders, acknowledging that these are the metrics by which you will be judged. Then you can begin the process of designing or redesigning your backup system to meet those metrics. Once you’ve done that, you will need to regularly test your recoveries to demonstrate compliance with the agreed-upon metrics.
It might take months or even years, but it's important to get to a point where your backup and DR systems are able to meet RTOs and RPOs that are agreed upon by the organisation.