Data replication has stood the test of time, providing organisations with a reliable means of safeguarding critical information for decades. Replication creates redundant copies of vital data, ensuring its availability and resiliency in case of disasters or system failures. In this article, I will explore the intricacies of data replication, examining its fundamental components, types, and potential limitations.
Data replication starts with the selection of a source volume or filesystem that needs protection. This source volume might be a virtual disk, often referred to as a LUN (logical unit number), sourced from a storage array or volume manager. It may also take the form of a filesystem. Replication can occur either at the block level, a common practice due to its efficiency, or at the filesystem level, although the latter tends to be less favored for its relatively inferior performance.
Once the source is selected, you must choose another volume or filesystem from a distinct host to serve as the target for replication. Ideally, the target will be positioned in a geographically separate location, a critical aspect of ensuring data redundancy and geographic diversity.
Organisations employ diverse replication systems offered by storage array vendors, volume manager vendors, filesystem vendors, and third-party providers. These solutions form the heart of the replication process, facilitating the synchronisation of data between the source and target volumes.
The initial synchronisation stage sets the foundation for replication by ensuring that the target volume mirrors the source volume's content. Once this milestone is achieved, the replication system diligently tracks and propagates every change that occurs on the source volume to the target volume. This continuous synchronisation, typically executed at the block level, ensures that data consistency is maintained across both volumes. How closely the target volume mirrors the source volume will be based on whether you use synchronous or asynchronous replication.
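The change-tracking step described above can be sketched in a few lines of Python. This is an illustrative model only (the class and method names are hypothetical, not from any real replication product): after initial synchronisation, the system need only remember which blocks changed and ship those.

```python
# Illustrative sketch of block-level change tracking (hypothetical names).
# After the initial sync, only blocks marked dirty need to be replicated.

class BlockTracker:
    """Tracks which source-volume blocks changed since the last sync."""

    def __init__(self, num_blocks):
        self.dirty = [False] * num_blocks

    def record_write(self, block_index):
        # Called on every write to the source volume.
        self.dirty[block_index] = True

    def blocks_to_replicate(self):
        """Return indices of changed blocks and reset the bitmap."""
        changed = [i for i, d in enumerate(self.dirty) if d]
        self.dirty = [False] * len(self.dirty)
        return changed

tracker = BlockTracker(num_blocks=8)
tracker.record_write(2)
tracker.record_write(5)
print(tracker.blocks_to_replicate())  # [2, 5]
```

Real systems track dirty blocks in compact bitmaps at the driver or array level; the principle, though, is the same: replicate deltas, not the whole volume.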
A synchronous replication system replicates any changes before acknowledging those changes to the application that made them. (When an application writes a block of data to a volume, it awaits an acknowledgment, or ACK, confirming the successful write to the source volume before proceeding to write the subsequent block.) This process resembles a two-phase commit in the database world, where both the write to the source volume and the copy of that write to the target volume are perceived as one atomic event.
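The synchronous write path can be modeled in a short sketch (in-memory dictionaries stand in for the source and target volumes; all names are hypothetical). The key point is that the ACK is withheld until both copies of the block exist.

```python
# Minimal sketch of a synchronous write path. The source and target
# "volumes" here are plain dicts; in reality the replicate step crosses
# a network link, and its latency directly delays the ACK.

source = {}
target = {}

def replicate_to_target(block_index, data):
    # Stands in for shipping the block across the replication link.
    target[block_index] = data

def synchronous_write(block_index, data):
    source[block_index] = data              # write to the source volume
    replicate_to_target(block_index, data)  # copy must land before we ACK
    return "ACK"                            # only now may the app proceed

synchronous_write(0, b"payload")
assert source[0] == target[0]  # both volumes hold the block before the ACK
```

Because `replicate_to_target` sits inside the write path, any slowness on the link or the target is felt directly by the application, which is exactly the trade-off discussed below.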
The prime advantage of synchronous replication lies in its ability to ensure a high level of data protection. With the source and target volumes kept in constant synchronisation, the risk of data loss due to a disaster or failure is greatly diminished. However, there is a trade-off: The performance of the target system and the data replication path can introduce significant delays in ACK delivery, affecting application response times.
A poignant example of the performance implications of synchronous replication arose after the tragic events of 9/11. In response to the vulnerabilities exposed during the attacks, US regulators tried to mandate that financial organisations implement synchronous replication over distances exceeding 300 miles, aiming to enhance data protection and disaster recovery capabilities. However, the latency between the sites was deemed prohibitively high, ultimately leading to the abandonment of these plans.
In contrast, asynchronous replication takes a more pragmatic approach, deferring the immediate replication of changes in favor of queuing them for later transmission. Each write is split: one copy completes against the source volume, while the other is sent to the replication system, which adds it to the end of a queue. Depending on bandwidth and latency, the target volume may be anywhere from a few seconds to hours behind the source volume.
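The queuing behavior can be sketched as follows (again a hypothetical, in-memory model, not a real product's API): the write is acknowledged as soon as the source has it, and a background task drains the queue to the target, which therefore lags behind.

```python
# Sketch of asynchronous replication: the application gets its ACK
# immediately, while a queued copy travels to the target later.

from collections import deque

source = {}
target = {}
queue = deque()

def async_write(block_index, data):
    source[block_index] = data         # source write completes immediately
    queue.append((block_index, data))  # replica copy joins the back of the queue
    return "ACK"                       # application is not held up by the link

def drain_queue():
    """Background task: ship queued writes to the target in order."""
    while queue:
        block_index, data = queue.popleft()
        target[block_index] = data

async_write(0, b"v1")
async_write(0, b"v2")
# Until the queue drains, the target is behind the source.
drain_queue()
assert source == target
```

The gap between `async_write` returning and `drain_queue` running is precisely the seconds-to-hours lag described above.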
While asynchronous replication offers a favorable balance between performance and data protection, it does raise potential concerns. One key consideration is the risk of the replication process falling too far behind, leading to challenges in catching up with the ever-increasing backlog of changes. In some circumstances, applications may support write coalescing, a process where older writes are dropped to enable the replication system to catch up. Nonetheless, such practices must be approached with caution, as they can impact data consistency and recovery options.
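Write coalescing can be illustrated with a small sketch (a simplified assumption-laden model, not any vendor's algorithm): when several queued writes target the same block, only the newest version of that block needs to reach the target, so the older ones can be dropped.

```python
# Sketch of write coalescing: collapse the backlog so each block appears
# only once, keeping the most recent data for that block. This shrinks the
# queue at the cost of losing the intermediate write history.

def coalesce(backlog):
    """Keep only the latest queued write per block."""
    latest = {}
    for block_index, data in backlog:
        latest[block_index] = data  # later writes overwrite earlier ones
    # dicts preserve first-insertion order, so relative block order survives
    return list(latest.items())

backlog = [(7, b"v1"), (3, b"a"), (7, b"v2"), (7, b"v3")]
print(coalesce(backlog))  # [(7, b'v3'), (3, b'a')]
```

This is also why the caution above matters: once intermediate writes are discarded, you can no longer recover the target to any point in time between them.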
While the preceding sections primarily focused on block-level replication, a similar concept applies to database replication. Here, the emphasis shifts from replicating blocks to replicating individual transactions between databases. As with other forms of replication, database replication is typically performed asynchronously, underlining its utility in safeguarding vital database records.
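Transaction-level replication can be sketched the same way (a hypothetical toy model; real databases ship a write-ahead or transaction log): each committed change is appended to a log, and the log is replayed, asynchronously, against the replica.

```python
# Toy sketch of transaction-level database replication: committed
# transactions are logged on the primary and later replayed on the replica.

primary = {"balance": 100}
replica = {"balance": 100}
txn_log = []

def commit(key, delta):
    """Apply a transaction on the primary and record it for shipping."""
    primary[key] = primary.get(key, 0) + delta
    txn_log.append((key, delta))

def apply_log():
    """Replay shipped transactions on the replica, in commit order."""
    while txn_log:
        key, delta = txn_log.pop(0)
        replica[key] = replica.get(key, 0) + delta

commit("balance", -25)
commit("balance", 40)
apply_log()
assert primary == replica
```

Note that replaying transactions rather than blocks gives the replica a logically consistent database at each replayed commit, which is why this approach is favored for safeguarding database records.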
Replication has long been the method of choice for organisations seeking to protect mission-critical applications, driven by its ability to provide swift and efficient data recovery. Indeed, its real-time data synchronisation capabilities make it an indispensable tool in ensuring data availability during crises. However, it is essential to recognise that replication, when employed in isolation, comes with inherent limitations.
Perhaps the most glaring limitation lies in the absence of a "back button" in traditional replication systems. In the event of human errors, such as accidental data deletions or corruptions (or ransomware attacks), replication will faithfully propagate these actions to the target volume, leading to irretrievable data loss.
Consequently, relying solely on replication for data protection fails to adhere to the tried-and-true 3-2-1 rule: three copies of data across two different media, with one copy located offsite. It may look compliant at first glance; however, because a single action can destroy every copy, it fails the “2” requirement that copies reside on media with different risk profiles.
Another consideration pertains to the potential performance overhead introduced by replication. When coupling regular backups with data replication, data is effectively copied twice, resulting in a performance impact that may be negligible on its own but can compound when other workloads and factors come into play.
Data replication stands tall as a venerable data protection mechanism, empowering organisations with the ability to create real-time copies of vital data, bolstering data resiliency and continuity. Nevertheless, as we delve into its intricacies, we uncover its limitations and the essential role it plays within the broader tapestry of comprehensive data protection.
While replication excels at providing immediate data availability, it cannot be solely relied upon to safeguard against human errors, data corruption, cyberattacks, or the loss of multiple copies. Hence, it is crucial to complement replication with comprehensive data backup strategies, adhering to the 3-2-1 rule and incorporating versioning capabilities. (Snapshots are a great example of a companion tool that can make replication more valuable.)
By embracing a holistic approach to data protection, combining the power of data replication with robust backup practices, organisations can confidently navigate the digital landscape, safeguarding their most valuable asset: data. Emphasising the synergy between real-time replication and well-structured backups, businesses can confidently address the ever-evolving challenges of data protection and ensure the resilience of their operations.