Data loss prevention (DLP) is a set of practices (and products) that ensure that an organisation's sensitive or critical data remains available to authorised users and isn't shared with or available to unauthorised users.
The term has been around for some time (in fact, CSO columnist Jon Oltsik, an analyst at Enterprise Strategy Group, derided it as outdated a decade ago), but it has stuck. And with many companies building their entire business model around the collection and analysis of data, organisations need to implement a rigorous defence of that data to match its increasing value.
What is the purpose of DLP?
DLP vendor Digital Guardian outlines the three main use cases for DLP in a blog post:
1 - Protecting personally identifying information, and ensuring legal compliance:
Many organisations have massive databases full of potentially sensitive information about their customers and business contacts, ranging from email addresses to medical and financial records, that could cause real harm if they fell into the wrong hands. You need to ensure that data stays safe, not just because it's the right thing to do, but because a host of laws, from HIPAA to GDPR to CCPA, require that you do, and mandate some of the ways in which you need to do it.
2 - Protecting intellectual property:
Your organisation almost certainly has intellectual property and trade secrets that you want to keep out of the hands of competitors. DLP aims to prevent that data from being pilfered via corporate espionage or inadvertently exposed online.
3 - Getting visibility into your data:
Part of the process of locking your data down involves figuring out where your data lives in your infrastructure and how it moves around. In the age of public and hybrid clouds, this can be a complex task, and DLP tools have the added benefit of giving you a big-picture look into your own data infrastructure.
Why is DLP important?
DLP's importance is borne out by the alarming results of data not being adequately protected. The year 2019 was deemed the "worst year on record for breaches", with the number of exposed records running into the billions. IBM pegged the average cost of a data breach at $3.92 million. In addition to the increased frequency and cost of data breaches, Digital Guardian outlines a number of reasons why DLP services are being more frequently adopted by organisations.
The need for regulatory compliance plays a big part, as does the increasing power and responsibility of CISOs, who are in frequent contact with CEOs and other leadership and bring visibility to security issues like data protection. In addition, many DLP offerings are hosted services, which makes them appealing to companies that don't have the in-house staff to create and impose their own DLP policies.
How data loss prevention works
As Geekflare succinctly puts it, DLP can be boiled down to a simple pair of directives: identifying sensitive data that needs to be protected, and then preventing its loss. Obviously, the devil is in the details. The task of identifying sensitive data can be tricky, as data can exist in several different states in your infrastructure:
- Data in use: Active data in RAM, cache memory, or CPU registers
- Data in motion: Data being transmitted via a network, either one that's internal and secure or across the public internet
- Data at rest: Data stored in a database, on a filesystem, or in some sort of backup storage infrastructure
Enterprise DLP solutions are all-encompassing tools that aim to protect data in all of these states, whereas integrated DLP solutions might focus on one state, or could be integrated into a separate single-purpose tool. For instance, Microsoft's Exchange Server has DLP capabilities integrated into it specifically to prevent data loss via email.
At any rate, DLP solutions deploy agent programs to search through data under their purview. These programs use a variety of DLP techniques to sniff out data that's sensitive or worthy of protection.
Sometimes this involves looking for copies of documents or data you've supplied, and other times it involves combing through the haystack of your data looking for needles of sensitive information. McAfee's cloud security blog lays out some of these techniques, which include:
- Rule-based matching or regular expressions: Agents use known patterns to find data that matches specific rules — 16-digit numbers are usually credit card numbers, for instance, and 9-digit numbers are usually social security numbers. This is often a first pass to mark documents for later analysis.
- Database fingerprinting or exact data matching: Agents look for exact matches to pre-supplied structured data.
- Exact file matching: Agents search for documents based on their hashes, rather than their contents.
- Partial document matching: Agents look for files that partially match pre-supplied patterns. For instance, different versions of a form filled out by different users will have the same skeleton, which can be used to fingerprint the file.
- Statistical analysis: Some DLP solutions use machine learning or Bayesian analysis to try to identify sensitive data. You'll need a large volume of data to train the system, which might still be prone to false positives and negatives.
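As a rough illustration of the rule-based first pass described in the list above, here is a minimal Python sketch. The patterns, the function names (`luhn_valid`, `scan_text`) and the labels are assumptions made for this example, not any vendor's actual rule set. The Luhn checksum is an extra step added here to cut down on false positives for card numbers; the techniques above don't require it.

```python
import re

# Illustrative patterns only (assumptions, not a vendor rule set):
# 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){15}\d\b")
# 9 digits, optionally written as 3-2-4 with hyphens.
SSN_RE = re.compile(r"\b\d{3}-?\d{2}-?\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (label, matched_text) pairs worth flagging for later analysis."""
    findings = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            findings.append(("credit_card", m.group()))
    for m in SSN_RE.finditer(text):
        findings.append(("ssn", m.group()))
    return findings
```

A match from a scan like this only marks a document as a candidate. A 9-digit number can just as easily be an order ID, which is why rule-based hits are usually followed by further analysis.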
Most DLP solutions will also let you build your own custom combinations of rules to seek out data specific to your enterprise.
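Two of the other techniques listed above, exact file matching and partial document matching, can be sketched in a similarly hedged way. The example below assumes SHA-256 as the hash and uses overlapping word sequences ("shingles") as a stand-in for a document skeleton. Real products may fingerprint files quite differently, and the function names are hypothetical.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Hash of a file's bytes, used for exact file matching."""
    return hashlib.sha256(data).hexdigest()


def is_known_file(data: bytes, known_digests: set[str]) -> bool:
    """True if the file is byte-for-byte identical to a protected one."""
    return file_digest(data) in known_digests


def shingles(text: str, k: int = 2) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def template_overlap(template: str, candidate: str, k: int = 2) -> float:
    """Fraction of the template's shingles that also occur in the candidate.

    A filled-in form keeps much of the blank form's skeleton, so it scores
    well above an unrelated document (assumed heuristic, for illustration).
    """
    wanted = shingles(template, k)
    if not wanted:
        return 0.0
    return len(wanted & shingles(candidate, k)) / len(wanted)
```

Exact hashing fails as soon as a single byte changes, which is the gap partial matching is meant to fill. The threshold at which an overlap score counts as a match is a tuning decision that this sketch leaves open.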
Once your DLP solution has identified sensitive data, it needs to know how to handle that data. But that's more than a technical problem. Your organisation needs to set a DLP strategy to determine how different kinds of data should be treated and what the responsibilities of internal and external users are around that data.
You'll want to be particularly careful to strike a balance between protecting your data and keeping your employees' jobs from becoming overly cumbersome. Digital Guardian has a great guide for developing an organisational DLP policy.
Your strategy will then inform the DLP policies and DLP procedures you'll implement with your DLP solution. You can think of those policies and procedures as the technical expression of the strategy that your organisation develops. This process obviously varies from product to product; Microsoft's Exchange documentation outlines how you would do it for that platform and is illustrative of how the process works.
Finally, when your solution identifies an action that violates one of the policies you've laid down, it will implement DLP security controls with an aim towards preventing data loss.
For instance, if your DLP solution detects a sensitive file attached to an email, it may pop up a warning to the sender or prevent the email from being sent altogether. If sensitive data is being exfiltrated over the network, the DLP solution could send an alert to an administrator or just cut off network access.
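The following Python sketch shows one way such controls could be expressed as a decision function. The channel names, sensitivity labels and the specific mapping from event to action are all assumptions for illustration. In practice they would come from your organisation's DLP strategy and from whatever policy language your product provides.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"                # e.g. pop up a warning to the sender
    BLOCK = "block"              # e.g. stop the email or cut network access
    ALERT_ADMIN = "alert_admin"  # e.g. notify an administrator


@dataclass(frozen=True)
class Event:
    channel: str      # "email" or "network" (hypothetical labels)
    sensitivity: str  # "public", "internal" or "restricted" (hypothetical)
    external: bool    # True if the data is leaving the organisation


def decide(event: Event) -> Action:
    """Map a detected event to a control, under the assumed policy above."""
    if event.sensitivity == "public" or not event.external:
        return Action.ALLOW
    if event.channel == "email":
        return Action.BLOCK if event.sensitivity == "restricted" else Action.WARN
    if event.channel == "network":
        return Action.BLOCK if event.sensitivity == "restricted" else Action.ALERT_ADMIN
    # Unknown channel: let a human look at it rather than guess.
    return Action.ALERT_ADMIN
```

Keeping the decision in a single pure function like this makes the policy easy to review and test separately from the detection logic.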
As we noted above, part of the reason for increased corporate interest in DLP is the rising power of CISOs, and if there's one thing a CISO likes, it's hard numbers that demonstrate how a new security initiative is performing. Security is notoriously hard to quantify — how do you count the dogs that don't bark? — but CISO Platform offers some potential metrics you can use to assess the success of your DLP rollout:
- Number of policy exceptions granted: Too many may indicate that you've set a policy too strict for your employees to do their jobs properly — or that employees are working around your DLP policies in an unsafe manner.
- Number of false positives generated: Ideally, this number should be zero, though in practice that's difficult to achieve. But this number is a good indicator of how well constructed your policies and procedures are, and how good a job your solution is doing analysing your data.
- Mean time to respond to alerts: This is a good indication of how well integrated your DLP system is with your overall security posture, and whether your security team takes DLP alerts seriously.
- Number of unmanaged devices on the network, number of databases not yet fingerprinted, and number of databases and data repositories not yet classified: If any of these numbers is higher than zero, your rollout is not yet complete. If some of these uncatalogued systems have been added to your network after you rolled out your DLP solution, that's a sign that your procedures for building onto your infrastructure don't include integration with your DLP policies.
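Two of the metrics above, the number of false positives and the mean time to respond, are easy to compute once alerts are exported in some structured form. The sketch below assumes a hypothetical `Alert` record with a raised time, an optional response time and a false-positive flag. The field names are invented for this example and won't match any particular product's export format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    raised_at: datetime
    responded_at: Optional[datetime]  # None if nobody has responded yet
    false_positive: bool


def false_positive_count(alerts: list[Alert]) -> int:
    """Number of alerts that were later judged to be false positives."""
    return sum(1 for a in alerts if a.false_positive)


def mean_time_to_respond(alerts: list[Alert]) -> Optional[timedelta]:
    """Average delay between an alert and its response.

    Alerts without a response are skipped. Returns None if no alert
    has been responded to yet.
    """
    delays = [a.responded_at - a.raised_at
              for a in alerts if a.responded_at is not None]
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)
```

Unanswered alerts are excluded from the mean here, so it is worth tracking their count separately. Otherwise a backlog of ignored alerts would make the response time look better than it is.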