Unsupervised attack pattern detection in cyber-security using Bayesian topic modelling
Lightning Talk
topic modelling; anomaly detection; unsupervised learning
Cyber-systems are constantly under threat of intrusion attempts. Attacks are usually carried out with a specific underlying intent, or by groups of actors with similar objectives. Discovering such patterns is therefore extremely valuable to threat experts. From a statistical point of view, this objective translates into a clustering task. This talk explores Bayesian topic models for clustering session data collected on honeypots, which are hosts designed to entice malicious intruders. The commands issued in these sessions provide a rare insight into the operational modes of cyber attackers, such as their automated or interactive nature, their individual scripting styles and their overall objectives. The practical value of clustering the sessions is twofold: finding groups of similar sessions and identifying outliers. An array of Bayesian models is considered, suitably adapted to the challenges encountered with computer network data. In particular, the concepts of primary topics, session-level topics and command-level topics are introduced, along with a secondary topic for common, high-frequency commands. Furthermore, the proposed method is extended to allow for an unbounded vocabulary size and an unbounded number of latent intents. The methodologies are used to discover an unusual MIRAI variant which attempts to take over existing coin miner infrastructure.
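To make the clustering task concrete, the following is a minimal sketch of the simplest setting mentioned above, a single session-level topic per session, fitted by collapsed Gibbs sampling on a Dirichlet-multinomial mixture. It is not the models presented in the talk: the number of topics is fixed, there are no command-level or secondary topics, and the vocabulary is bounded. The function name gibbs_session_topics, the hyperparameters alpha and eta, and the toy sessions are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def gibbs_session_topics(sessions, n_topics, n_iter=200, alpha=1.0, eta=0.1, seed=0):
    """Assign one latent topic (intent) to each session.

    sessions: list of sessions, each a list of command tokens.
    alpha:    assumed Dirichlet weight on each topic's share of sessions.
    eta:      assumed Dirichlet weight on each topic's command distribution.
    Returns the topic labels from the final Gibbs sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for s in sessions for w in s})
    counts = [Counter(s) for s in sessions]

    # Random initial assignment and the sufficient statistics it implies.
    z = [rng.randrange(n_topics) for _ in sessions]
    topic_sessions = [0] * n_topics
    topic_words = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, k in enumerate(z):
        topic_sessions[k] += 1
        topic_words[k].update(counts[d])
        topic_total[k] += len(sessions[d])

    for _ in range(n_iter):
        for d, s in enumerate(sessions):
            # Remove session d from its current topic.
            k_old = z[d]
            topic_sessions[k_old] -= 1
            topic_words[k_old].subtract(counts[d])
            topic_total[k_old] -= len(s)

            # Log posterior weight of each topic for session d, with the
            # topic proportions and command distributions integrated out.
            log_p = []
            for k in range(n_topics):
                lp = math.log(topic_sessions[k] + alpha)
                for w, c in counts[d].items():
                    base = topic_words[k][w] + eta
                    lp += math.lgamma(base + c) - math.lgamma(base)
                norm = topic_total[k] + vocab_size * eta
                lp += math.lgamma(norm) - math.lgamma(norm + len(s))
                log_p.append(lp)

            # Sample a new topic and add the session back.
            m = max(log_p)
            weights = [math.exp(lp - m) for lp in log_p]
            k_new = rng.choices(range(n_topics), weights=weights)[0]
            z[d] = k_new
            topic_sessions[k_new] += 1
            topic_words[k_new].update(counts[d])
            topic_total[k_new] += len(s)

    return z


# Hypothetical toy sessions: each is the sequence of command names typed.
toy_sessions = [
    ["wget", "chmod", "sh", "rm"],
    ["wget", "chmod", "sh"],
    ["curl", "chmod", "sh", "rm"],
    ["uname", "cat", "free", "nproc"],
    ["uname", "cat", "nproc"],
    ["cat", "free", "uname", "lscpu"],
]
labels = gibbs_session_topics(toy_sessions, n_topics=2, n_iter=100, seed=1)
print(labels)
```

Sessions sharing a label form a candidate group of similar intent. One possible way to look for outliers in this simplified setting is to flag sessions whose weight is low under every topic, although the talk's own outlier criterion may differ.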