Today’s threat landscape is massive and expanding at an alarming rate, especially given the explosion of mobile devices, hybrid cloud environments, DevOps, and containers. Organizations of all sizes, across all industries, struggle to keep up with the volume of genuine exploits and threats, and they have neither the time nor the staff to manually sift through false positives on top of that. Machine learning may be the only realistic way to reduce how often benign or innocuous activity is flagged as suspicious or malicious.
The Threat Landscape
Businesses today face a threat landscape that produces an average of 450,000 new potential threats every day, or roughly 164 million per year. It is virtually impossible for any organization to handle the combined volume of legitimate threats and false positives using manual processes or traditional tools. Defenses need to adaptively identify and respond to threats in real time as they stream in, and adding more people or legacy technology cannot address this challenge.
Machine learning (ML) and artificial intelligence (AI) are crucial to automating threat detection and blocking. Together, they can identify and investigate alerts quickly and cost-effectively. A recurrent neural network (RNN), a type of deep learning model, can dramatically increase the accuracy of threat detection, reducing or eliminating false positives. Unlike many other technology solutions, a machine learning system built on the RNN can be tuned and trained to get better over time.
Bye-Bye False Positives
Research is showing the promise of machine learning and AI for handling a continuously changing and growing threat environment. In early March, Wallarm founder and CEO Ivan Novikov presented “Bye-Bye False Positives” at BSides San Francisco. The research-based presentation described how a neural network could be trained to learn from false positive detections and continuously tune the system to prevent future occurrences.
The goal, according to Novikov, is to train a neural network to improve existing algorithms. Traditional false positive response processes, like CAPTCHA or email alerts to IT support teams, could be replaced with automatic rule tuning by the machine learning network. The trick is to make the machine learning system indifferent to the detection logic that made the initial decision. Novikov stressed that the goal is not to create yet another form of detection logic, but to learn, adapt, and respond better with each new threat or false positive.
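The feedback loop described above can be illustrated with a minimal sketch. This is not Novikov’s implementation and not WallNet code: it swaps the neural network for a small online logistic regression over character trigrams, and the class name `FeedbackFilter`, the sample alerts, and the learning rate are all assumptions made for illustration. The model sees only the flagged input and a reviewer’s verdict, never which upstream rule raised the alert, which is one simple way to stay indifferent to the original detection logic.

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count character trigrams; a crude stand-in for real request features."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


class FeedbackFilter:
    """Hypothetical second-stage filter: online logistic regression over alerts."""

    def __init__(self, learning_rate: float = 0.5):
        self.weights: dict[str, float] = {}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, text: str) -> float:
        """Estimated probability that a flagged input is a real attack."""
        z = self.bias + sum(
            self.weights.get(gram, 0.0) * count
            for gram, count in trigrams(text).items()
        )
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, text: str, is_attack: bool) -> None:
        """Take one gradient step from a single reviewed alert."""
        error = (1.0 if is_attack else 0.0) - self.score(text)
        self.bias += self.learning_rate * error
        for gram, count in trigrams(text).items():
            updated = self.weights.get(gram, 0.0) + self.learning_rate * error * count
            self.weights[gram] = updated


# Made-up alerts: (flagged input, verdict after review). Not real traffic.
reviewed_alerts = [
    ("1' OR '1'='1", True),                # SQL injection attempt
    ("<script>alert(1)</script>", True),   # cross-site scripting attempt
    ("O'Brien", False),                    # a surname with an apostrophe
    ("select a plan or upgrade", False),   # ordinary text with SQL-like words
]

model = FeedbackFilter()
for _ in range(50):  # replay the reviewed alerts a few times
    for text, is_attack in reviewed_alerts:
        model.learn(text, is_attack)

for text, _ in reviewed_alerts:
    verdict = "block" if model.score(text) > 0.5 else "allow"
    print(f"{verdict}: {text}")
```

In this toy setup, each reviewed false positive nudges the weights so that similar benign inputs score lower the next time, without anyone editing a detection rule by hand. A production system would need far richer features and much more data than four examples.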
Novikov explained that the development of a machine learning neural network that can reduce false positives starts with three basic questions. First, what is the attack payload? Next, how can we detect that payload with 100 percent accuracy? Finally, is it possible to implement that detection through machine learning? If not, why?
For the purposes of the research, Novikov used a Turing machine example. A Turing machine, a simple mathematical model that can simulate the logic of any algorithm, is vulnerable if it can potentially interpret input data as a set of instructions rather than simply as data. An attack payload, in that context, is any input data that contains instructions or commands.
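A toy example may make that definition concrete. The sketch below is an illustration only, not the model from the presentation: the three-opcode instruction set and the function names `run_vulnerable` and `is_attack_payload` are hypothetical. It shows a tiny counter machine that mistakenly executes any opcode it finds in its input, along with a check that labels an input as a payload when it contains at least one opcode.

```python
# Hypothetical toy instruction set, invented for this example.
OPCODES = {"INC", "DEC", "HALT"}


def run_vulnerable(user_input: str) -> int:
    """Counter machine that wrongly executes opcodes found in its input."""
    counter = 0
    for token in user_input.split():
        if token == "INC":
            counter += 1
        elif token == "DEC":
            counter -= 1
        elif token == "HALT":
            break
        # Any other token is inert data and leaves the counter unchanged.
    return counter


def is_attack_payload(user_input: str) -> bool:
    """In this toy model, a payload is any input containing an opcode."""
    return any(token in OPCODES for token in user_input.split())


benign = "hello world"
payload = "hello INC INC world"

print(is_attack_payload(benign), run_vulnerable(benign))
print(is_attack_payload(payload), run_vulnerable(payload))
```

The benign input leaves the machine’s state untouched, while the second input changes the counter because part of it was interpreted as instructions. Real parsers and interpreters are far more complex, which is where exact detection becomes hard.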
Novikov explained the neural network logic in detail, walking through some of the challenges of dealing with various parsers and the tradeoffs between memory resources and accuracy that are necessary to streamline performance. The full presentation offers a more technical explanation.
Novikov’s presentation provided a glimpse into how data scientists are innovating around the key questions and challenges of cybersecurity. He also invited the audience to test this new approach for themselves. Wallarm has created and open-sourced an implementation of this alternative approach to attack detection, which applies machine learning to develop a neural network predictive model. The project, called WallNet, is available on GitHub. Check it out and help apply machine learning to create a safer data landscape.