
Rethinking Data Loss Protection

Armorblox is a sponsor of TechSpective

There are many facets to cybersecurity and a wide array of tools and solutions that organizations employ to filter, detect, and respond to potentially malicious threats. Ultimately, though, the primary objective of most attacks is data, and the most important thing for an organization to protect is data. Whether it’s corporate financial data, research and development trade secrets, customer information, bank or credit card data, personal healthcare information, or any other sensitive data, the goal of cybersecurity is to protect data and prevent unauthorized access or theft.

Data loss protection solutions (sometimes referred to as data loss prevention, or DLP) are designed to keep data safe from unauthorized access and ensure it is not exfiltrated. Whether we’re talking about someone inadvertently emailing a spreadsheet attachment with customer credit card information, an insider like Edward Snowden downloading terabytes of sensitive information, or an external attacker stealing confidential company financial data, DLP should ostensibly detect and prevent that activity. Unfortunately, legacy DLP solutions often fall short of that promise.

Challenges of Legacy DLP

The early attempts at DLP are typically either cumbersome, ineffective, or both. Most rely on some combination of standard access controls and manual classification of files, and/or pattern matching to identify things like credit card or Social Security numbers. File classification is tedious and easy to mess up or intentionally circumvent, and pattern matching is a clunky way to detect sensitive data.


“The traditional approach to DLP is like trying to do surgery with a sledgehammer,” agreed Arjun Sambamoorthy, co-founder of Armorblox. “For example, one company wanted to detect the term ‘MS’—an abbreviation for multiple sclerosis—to prevent exposure of healthcare related information for HIPAA compliance. They quickly discovered, however, that MS also happens to be the common abbreviation for Microsoft and occurs relatively frequently in files related to Microsoft Office.”

The scenario described by Sambamoorthy is not uncommon and creates an unnecessary amount of noise. Rather than helping to detect and prevent data loss, this rudimentary pattern matching creates additional work for the IT team to sift through all of the alerts to determine if there are any potential data loss issues.
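
To make the noise problem concrete, here is a minimal Python sketch of the kind of keyword rule described above. The rule, the function name, and the sample documents are all made up for illustration; they are not taken from any real DLP product.

```python
import re

# Hypothetical rule: flag any standalone "MS" token as possible health data.
MS_RULE = re.compile(r"\bMS\b")

def keyword_alerts(documents):
    """Return the names of documents the naive keyword rule would flag."""
    return [name for name, text in documents.items() if MS_RULE.search(text)]

# Made-up sample documents, for illustration only.
documents = {
    "clinic_note.txt": "Patient was diagnosed with MS in 2015.",
    "it_ticket.txt": "Please renew the MS Office license for finance.",
    "lunch_menu.txt": "Soup of the day is tomato.",
}

alerts = keyword_alerts(documents)
print(alerts)  # ['clinic_note.txt', 'it_ticket.txt']
```

The rule flags the IT ticket just as readily as the clinic note, because it only sees the characters and not what they mean. Someone on the IT team still has to open both files to tell them apart.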

The challenge with file classification is twofold. First, it is a daunting task to do an initial review and assign classification levels to existing data when implementing a DLP solution. Second, the burden is then placed on employees to correctly classify new data as files are created each day. Employees may assign the wrong classification by accident, or they may intentionally misclassify data specifically to circumvent the DLP filters. Either way, depending on data classification as a means of DLP is ineffective.

Using Natural Language Understanding for Better DLP

There is a better way to do data loss protection, though. If a data security expert were to personally review each file to understand the data it contains and the context in which it is used, decisions could be made on a file-by-file basis as data is generated, accessed, or transmitted.

Of course, it is impractical for an organization of any size to employ somebody—or even many people—for the purpose of intimately inspecting each file. However, thanks to artificial intelligence and machine learning, the same result can be achieved using Natural Language Understanding, or NLU.

NLU solves the problem of classification because it can analyze and determine the sensitivity of files without tagging or labeling them. NLU also solves the issue of rudimentary pattern matching because it uses algorithms to determine the contents and context of files. For example, when trying to protect healthcare data for HIPAA purposes, NLU can tell the difference between a file that references “cold” as in comfort or the temperature outside versus “cold” referencing the common illness.
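
The “cold” example can be illustrated with a toy Python sketch. This is not how a real NLU system works: a production system would rely on a trained language model, while the sketch below assumes two small hand-picked word lists (both hypothetical) and simply counts which kind of context surrounds the word. It is only meant to show that the surrounding words, not the keyword itself, decide the meaning.

```python
import re

# Hand-picked context words (an assumption for this sketch);
# a real NLU model would learn such associations from data.
ILLNESS_CONTEXT = {"caught", "symptoms", "cough", "fever", "sick", "medication"}
WEATHER_CONTEXT = {"weather", "outside", "temperature", "winter", "jacket", "thermostat"}

def classify_cold(sentence):
    """Label a sentence mentioning 'cold' as 'health', 'not_health', or 'unknown'."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if "cold" not in words:
        return "unknown"
    illness_score = len(words & ILLNESS_CONTEXT)
    weather_score = len(words & WEATHER_CONTEXT)
    if illness_score > weather_score:
        return "health"
    if weather_score > illness_score:
        return "not_health"
    return "unknown"  # no usable context either way

print(classify_cold("She caught a cold and has a fever."))   # health
print(classify_cold("It is cold outside, bring a jacket."))  # not_health
```

Both sentences contain the same keyword, yet only the first would be treated as healthcare related. A simple pattern match on “cold” would flag both.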

NLU learns and improves over time as well. Starting from default parameters for what the organization considers sensitive, NLU monitors the unique attributes and patterns of communication to learn the baseline for normal sharing of sensitive data, so it can detect future anomalies and violations more efficiently.
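
The idea of learning a baseline and then spotting deviations from it can be sketched in a few lines of Python. The article does not say how any particular product builds its baseline, so the approach here (a simple standard-deviation test on daily counts), the function name, the threshold, and the numbers are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def is_anomalous(history, today, threshold=3.0):
    """Flag today's count if it sits more than `threshold` standard
    deviations above the historical mean (needs at least two data points)."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: anything above the baseline is unusual.
        return today > baseline
    return (today - baseline) / spread > threshold

# Made-up daily counts of sensitive files one user shared externally.
history = [2, 3, 1, 2, 4, 3, 2, 3, 2, 3]

print(is_anomalous(history, 3))   # False
print(is_anomalous(history, 40))  # True
```

A user who normally shares two or three sensitive files a day does not trigger an alert for sharing three, but a sudden jump to 40 stands out against that user's own history.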

Protecting data is crucial, and it is arguably the most important role of cybersecurity—but there are a number of issues with legacy DLP solutions. Machine learning and NLU provide organizations with a more intelligent approach to DLP to secure and protect data more effectively.
