Advanced data storage and processing technologies make it possible to accumulate logs, network flows, and system events from various sources into terabytes of heterogeneous data. This paper presents the state of the art in data pre-processing, feature selection, and the application of a variety of machine learning methods for intrusion detection. It outlines the main challenges in big data analytics and the opportunities provided by combining the outputs of several methods to increase detection accuracy and decrease the number of false alarms. The authors propose an architecture for an intrusion detection system that combines offline machine learning with dynamic processing of data streams.
This paper is included in the program of the Second Scientific Conference "Digital Transformation, Cyber Security and Resilience" (DIGILIENCE 2020) and will be published in the post-conference volume.