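Putting Network Security Situational Awareness into Practice with Read-Time Modeling and Other Data Analysis Capabilities

Abstract: This paper proposes a network security situational awareness system built on the Honghu Data Platform. The system draws on the platform's efficient and flexible capabilities for storing, analyzing, and processing very large data sets, such as read-time modeling, time-series processing, and data search, to meet the requirements of network security situational awareness, from the storage, classification, and statistics of massive data through to data analysis, correlation, prediction, and judgment. Building on security big data, it improves the ability to discover and identify, understand and analyze, and respond to and handle security threats from a global perspective, and ultimately puts network security situational awareness into practice.

Keywords: network security situational awareness; Honghu Data Platform; read-time modeling; correlation analysis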

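1. Introduction

In a paper published in 1980, Anderson first proposed the idea of security auditing based on logs. Since then, through continuous development, a relatively complete body of theory has formed, and major security vendors have developed usable security systems.

However, with the rapid spread of high-speed networks and the widespread application of big data technology, network monitoring and protection devices such as traffic monitoring systems, IDSs, firewalls, and endpoint monitoring systems generate large amounts of useful data during operation, including packet data, session data, logs, and alerts. The volume of application platform logs is also growing explosively. To some extent, these data reflect the security state of the network.

Yet because different systems and devices do not cooperate, and the data they produce differ in format and level of detail, the data cannot be effectively fused and analyzed. It is therefore difficult to identify and analyze an intruder's attack behavior from an overall, global perspective, and difficult to present the overall network security situation comprehensively, accurately, and at fine granularity. Network security situational awareness technology emerged in response and has become a focus of next-generation security technology.

Network security situational awareness is a means of quantitatively analyzing network security and a fine-grained measure of it. It can present the overall security state of the current network, predict how that state will develop, and support an effective response, which makes it the basis and prerequisite for active defense. A network security situational awareness system relies on network security infrastructure such as firewalls, intrusion detection systems, antivirus systems, log file systems, and malware detection programs. It collects situation data, fuses the data with data processing models to form security feature information, and performs correlation analysis on that feature information.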

An analysis of the state of research at home and abroad shows that current network security situational awareness technology based on network traffic, key cloud platform facilities, and application system logs still has many problems. First, data sources and processing approaches are narrow. Existing network security threat prevention and control products each target a single data source, focusing on detection data from network and application entry points. They lack holistic analysis of multi-source data, so it remains difficult to handle and correlate security incidents comprehensively.

Second, timeliness is poor. The content and structure reported by different traffic collection devices vary, as do the structure and granularity of the logs submitted by different monitoring systems and applications, so data models must be defined in advance and the data cleaned before storage and analysis. If a field from the original data is later needed to assist analysis, the data model has to be adjusted and the original data stored again, which wastes storage on redundant copies and reduces the timeliness of the system. In addition, when data are found to be problematic, the original records cannot be retrieved quickly, so locating and resolving problems takes a long time.

Third, construction cost is high. A complete network security situational awareness system usually requires security devices to be deployed at every traffic and business entry point, so the overall construction cost is high.

To address these problems, this paper studies network security situational awareness technology on the basis of the Honghu Data Platform, a new-generation real-time analysis platform for heterogeneous big data. It builds a network security situational awareness system oriented to network information systems that achieves efficient correlation analysis of heterogeneous security data, real-time identification and location of security threats, and audit and analysis of abnormal behavior.

2. Network Security Situational Awareness System Framework

2.1 Network Security Situational Awareness System

Network security situational awareness essentially means acquiring and understanding large amounts of network security data, judging the current overall security state, and predicting short-term trends. It can be divided into three stages: situation extraction, situation understanding, and situation prediction. As the conceptual diagram in Figure 1 shows, it is an iterative, cyclic process.

Based on the security elements and characteristics of a large-scale network environment, methods such as data analysis, data mining, and intelligent deduction are used to accurately understand and quantify the current security situation in cyberspace, effectively detect the various attacks taking place, predict how the security situation will develop, and trace the security elements that caused the situation to change.

Figure 1 Conceptual schematic diagram of network security situational awareness

2.1.1 Situation Extraction

The situation extraction stage mainly collects and fuses network security data. The specific process and method are as follows:

Define security elements and security features: network security data are extracted in a targeted way along three dimensions: assets, vulnerabilities, and threats.

Data collection: different collection methods are used for different dimensions. Asset data can be collected through WMI, SNMP, a central manager, port scanning, and similar means. Vulnerability data, that is, vulnerabilities that have already been discovered, are obtained from open source vulnerability databases. Threat data comprise terminal data and traffic data: terminal data are collected with tools such as Flume and syslog, and traffic data are captured as packets with tools such as Wireshark, a sniffer, or the libpcap library.

Data preprocessing and fusion: data from multiple information sources are standardized and then associated, combined, and fused to provide decision-making information for situation assessment. Data preprocessing includes data cleaning, data integration, data reduction, and data transformation.

Data cleaning: corrects data errors. Massive irregular data, such as noisy, inconsistent, and missing data, are handled through distributed processing, impurity filtering, and cleansing. Noisy data can be processed by mean substitution, regression substitution, clustering, and so on; inconsistent data are handled through data integration; missing data can be filled in manually or from similar samples.

Data integration: resolves data redundancy by integrating at three levels: entities, data formats, and the data values themselves. Common methods for entities include synonym dictionaries and knowledge-graph-based entity alignment; data formats are merged according to unified attributes; data values are reconciled by averaging, voting, or weighting.

Data reduction: streamlines the data through sample reduction, feature reduction, and dimensionality reduction. Sample reduction methods come from statistics and should preserve the characteristics of the original data set as far as possible. Feature reduction finds the minimal feature set. Dimensionality reduction reduces the number of random variables or attributes to be analyzed, using methods such as the wavelet transform and principal component analysis.

Data transformation: converts data into a representation that is easier to analyze, for example by clustering data into categories to provide higher-level attributes. Common methods include binning, histogram analysis, clustering, decision trees, and correlation analysis.

Data fusion: integrates multi-source data effectively and exploits their redundancy and complementarity to generate network situation information. Methods fall into classical and modern categories. Classical methods are based on models and probabilities and include the weighted average method, Bayesian inference, and Dempster-Shafer (D-S) evidence theory. Modern methods mainly comprise logical reasoning and machine-learning-based artificial intelligence methods, such as cluster analysis, rough sets, artificial neural networks, and evolutionary algorithms.
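To make one of the classical fusion methods concrete, the following minimal Python sketch applies Dempster's rule of combination from D-S evidence theory to two sources. The source names, the two-hypothesis frame (attack, normal), and the mass values are assumptions invented for this illustration; they are not data or code from the system described in this paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence accumulates on the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Normalize by 1 - K so the fused masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

ATTACK = frozenset({"attack"})
NORMAL = frozenset({"normal"})
EITHER = ATTACK | NORMAL  # mass assigned to "cannot tell"

# Hypothetical beliefs reported by an IDS and a firewall for the same event.
ids_evidence = {ATTACK: 0.6, NORMAL: 0.1, EITHER: 0.3}
fw_evidence = {ATTACK: 0.5, NORMAL: 0.2, EITHER: 0.3}

fused = dempster_combine(ids_evidence, fw_evidence)
for hypothesis, mass in fused.items():
    print(sorted(hypothesis), round(mass, 3))
```

In this example the fused belief in "attack" is higher than either source reported alone, which is the complementarity effect the paragraph above refers to.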

2.1.2 Situation Understanding

Situation understanding evaluates the network security situation by constructing network security situation indicators on the basis of network security detection and analysis, thereby obtaining a macroscopic view of network security. The specific process and method are as follows:

Network Security Detection and Analysis

Establish a cognitive model for network security situational awareness and use it to inspect network events in depth and to discover and evaluate network attacks comprehensively, accurately, and in real time. The MDATA model (Multidimensional Data Association and Threat Analysis Model) is an effective cognitive model that addresses two problems: data are widely distributed, and network security knowledge is difficult to express because of its spatio-temporal characteristics. It consists of three parts: association representation, association construction, and association calculation. The knowledge bases generated with the MDATA model are very large, and a fog-cloud computing architecture can be used to manage the cognitive model and carry out collaborative computing.

Constructing Network Security Situation Indicators

Establish a network security situational awareness indicator system and define an ontology model for network security situational awareness. Through an explicit, formalized, machine-readable semantic model, multi-source heterogeneous security data can be computed and understood efficiently, and known network security events can be analyzed and correlated effectively to deduce new attack events.

Network Security Situation Assessment

Data fusion is the basis of network security situational awareness and the core of network security situation assessment. Once the various security data have been fused, mathematical models and formal reasoning are used to compute an evaluation of the current network security situation. Evaluation may be qualitative or quantitative. Quantitative evaluation methods are based on mathematical models, knowledge reasoning, or machine learning. Methods based on mathematical models comprehensively consider the factors that cause the network situation to change and construct an evaluation function that maps situational elements to a quantitative network security value; the most commonly used are weight analysis and set pair analysis. Methods based on knowledge reasoning build a database and a probabilistic evaluation model from expert knowledge, describe and process the uncertainty of security attributes using probability theory and fuzzy theory, and analyze the network security situation through reasoning control strategies. Methods based on machine learning build network security situation templates through pattern recognition, correlation analysis, deep learning, and similar techniques, and then classify and grade the nature and severity of the situation through template matching and mapping.
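As a small illustration of an evaluation function based on weight analysis, the Python sketch below maps per-dimension indicator values to a single score and then to a grade. The three dimension names follow the asset, vulnerability, and threat dimensions used earlier in this section; the weights, indicator values, and grade thresholds are assumptions chosen only for the example.

```python
def situation_score(indicators, weights):
    """Weighted average of indicator values, each expected in the range 0..1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(indicators[name] * w for name, w in weights.items()) / total_weight

def grade(score):
    """Map a score to a qualitative level (thresholds are illustrative)."""
    if score < 0.3:
        return "low"
    if score < 0.6:
        return "medium"
    return "high"

# Hypothetical weights and normalized indicator values.
weights = {"asset": 0.2, "vulnerability": 0.3, "threat": 0.5}
indicators = {"asset": 0.4, "vulnerability": 0.5, "threat": 0.8}

score = situation_score(indicators, weights)
print(round(score, 2), grade(score))
```

In practice the weights would come from expert judgment or a method such as the analytic hierarchy process rather than being fixed by hand.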

Network Security Situation Visualization

Network security situation visualization covers the visualization of network security data flows, of network security situation assessment, and of network attack behavior analysis. The network security situation assessment indicators can be displayed on an electronic map. Current visualization tools still struggle with real-time display, cannot adapt to the many complex scenarios that sophisticated attacks produce, and cannot analyze complex data associations.

2.1.3 Situation Prediction

Situation prediction acquires, transforms, and processes historical and current situation data, builds a mathematical model to explore the patterns of development and change in the data, and reasons about future trends. Traditional techniques for forecasting network security events include grey theory forecasting, time series forecasting, regression analysis forecasting, and forecasting based on wavelet decomposition.
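The Python sketch below shows simple exponential smoothing, one of the most basic forms of time series forecasting, applied to a daily alert count. It is meant only to illustrate the idea of predicting the next value from history; the alert counts and the smoothing factor are made-up values, and the paper does not state which forecasting model the system uses.

```python
def exponential_smoothing_forecast(series, alpha):
    """Return a one-step-ahead forecast using simple exponential smoothing."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        # New level = weighted mix of the latest observation and the old level.
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical number of security alerts on four consecutive days.
daily_alerts = [100, 120, 110, 130]
forecast = exponential_smoothing_forecast(daily_alerts, alpha=0.5)
print(forecast)
```

A larger alpha makes the forecast react faster to recent changes, while a smaller alpha smooths out short-lived spikes.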

Network security event prediction technology based on knowledge reasoning includes prediction based on attack graph, prediction based on attacker's ability and intention, prediction based on attack behavior and pattern learning. Due to the randomness and uncertainty of network attacks, many scholars are currently researching artificial intelligence-based situation prediction methods, using algorithms such as neural networks and deep learning to dynamically learn and create attack strategies and behavior models to achieve accurate predictions of network security events.

2.1.4 Network Attack Source Tracing

Network attack source tracing reconstructs the attack path, determines the attacker's identity, and finds the cause of the attack. Traditional source tracing technologies include tracing based on log storage and query, tracing based on router debugging, and tracing based on modifying network transmission data. The data sources for the trace, location, and policy dimensions are scattered, and most of the data are semi-structured or even unstructured. It is therefore particularly important to study and optimize the network security knowledge base, to store unstructured and semi-structured raw data, and to locate the original data quickly on demand.

2.2 Network Security Situational Awareness System Framework

Based on network traffic, the big data infrastructure platform, and application system logs, this paper builds a network security situational awareness system using security risk identification and perception, retrospective analysis of security events, and monitoring and early warning of key threats. The system follows a common architecture for situational awareness systems. As shown in Figure 2, it is divided into a data access processing layer, a data analysis layer, and a situation awareness application layer. The data fed into the system mainly comprise traffic probe data and logs submitted by platforms and applications.

Figure 2 Network Security Situational Awareness System Architecture

The data access processing layer defines the data standard system, which mainly covers: the structure, logic rules, and content compliance requirements for each type of data submitted by each platform; the interaction method and structure of the log submission interface; and the semantic rules and structure used by applications to report operating conditions in their operation logs. Collected data are parsed, cleaned, classified, compared, and labeled in a standardized way and stored by category. Threat data are entered into the threat intelligence database, and the standardized logs submitted by applications are parsed and written to the log database in real time to support data search, analysis, and mining services. The layer uses a distributed real-time data processing framework to support the processing of massive data.

The data analysis layer uses probe data to analyze attack sources, attack targets, and attack facilities, evaluate the risk to attacked facilities, analyze and compile statistics on attack characteristics, and detect key attack behaviors. Based on application system logs, it audits and analyzes the behavior of operating users and provides monitoring and early warning for abnormal users and abnormal behavior.

The situation awareness application layer comprehensively analyzes current attack sources, attack methods, and attacked facilities based on network security data. It presents the overall security profile of the platform through situational awareness, supports expert analysis of specific security events through threat analysis and malicious event backtracking, and carries out security monitoring and early warning for specific attack sources, attack methods, and attacked facilities. The services it provides include comprehensive situation analysis and awareness, threat analysis, security monitoring, source tracing, log classification and statistics, log audit analysis, and exception monitoring.

3. Network Security Situational Awareness System Based on Honghu Data Platform

For data processing, existing network security situational awareness systems generally use the Flume + Kafka + Spark Streaming stream processing framework to handle traffic data in real time. However, there are many business systems, and the management of system platforms at different levels is fragmented. Problems are mostly investigated through single-point troubleshooting, which makes it difficult to discover problems and analyze root causes from a global perspective.

Logs are scattered across system devices and the data are siloed, so the overall state cannot be managed in a unified way. After a fault occurs, extraction fields have to be redefined before the original log data can be analyzed, which takes a great deal of time. The operating status and service capability of the system are not monitored, and there is no good means of predicting or warning of system anomalies. In addition, a network security situational awareness system needs statistical reports, but scattered data cannot be managed centrally or yield insight, and user operations cannot be tracked and recorded, so audit requirements are not yet met.

Current network security situational awareness systems also suffer from data collection overload. If all network data are collected in order to analyze the network security situation comprehensively, analysis efficiency will be low, and analysts cannot review all of the data to find possible attacks in cyberspace.

To mitigate data collection overload, rules and features are usually designed for different types of threat behavior, and data are collected selectively for known threat behaviors. Unknown attacks can then only be reconstructed by tracing abnormal data back to their source. Because the abnormal data have already been preprocessed, they carry little of the original information, whereas for source tracing, the more detailed the data records, the more attack information can be mined. To solve these problems, this paper builds a network security situational awareness system on the Honghu Data Platform, a real-time big data analysis and processing platform.

3.1 Honghu Data Platform

Honghu Data Platform is a real-time big data analysis and processing platform. It adopts a distributed storage and computing architecture, collects an enterprise's internal machine data and operational data, and uses technologies such as correlation analysis, behavior recognition, data modeling, and machine learning to manage and control the data centrally. It provides fast retrieval over the full data set and real-time big data analysis, with functions including centralized data storage, real-time query, correlation analysis, security alerting, and visual display. The platform can be applied to security analysis, compliance audit, intelligent operation and maintenance, business analysis, the Internet of Things, and other areas, and it has powerful data visualization capabilities. The platform architecture is shown in Figure 3.

Figure 3 Panoramic view of the Honghu™ Data Platform system architecture

Honghu Data Platform supports time-series and text data in structured, semi-structured, and mixed forms. It stores unstructured and semi-structured raw data efficiently and uses columnar storage to achieve a high compression ratio, which saves storage cost. The raw data can be queried and analyzed directly, making it quick and easy to discover the value of the data.

During data collection, the platform ingests heterogeneous data from different data sources. The data index module automatically identifies and parses the timestamp of each record, slices the data by timestamp, segments the original data into terms, and builds an inverted index. The hottest data are held temporarily in flash memory; once certain conditions are met, the index and original data are compressed and written sequentially to disk. The platform supports high-speed data ingestion, and a single node can reach a write speed of 20 MB/s.
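The indexing steps described above (parse the timestamp, slice by time, segment the text, build an inverted index) can be sketched in a few lines of Python. This is a toy in-memory illustration under assumed conventions: the hourly slice size, the log line layout, and the function names are all hypothetical and do not represent the platform's actual index format or API.

```python
import re
from collections import defaultdict
from datetime import datetime

def build_index(lines):
    """Return {(hour_slice, term): [line numbers]} for timestamp-prefixed log lines."""
    index = defaultdict(list)
    for lineno, line in enumerate(lines):
        stamp, _, message = line.partition(" ")
        # Slice by hour: truncate the parsed timestamp to the start of its hour.
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        # Segment the message into lowercase terms (dots are kept so IPs stay whole).
        for term in set(re.findall(r"[\w.]+", message.lower())):
            index[(hour, term)].append(lineno)
    return index

def search(index, hour, term):
    """Look up the lines within one hour slice that contain the given term."""
    return index.get((hour, term.lower()), [])

# Hypothetical firewall log lines.
SAMPLE = [
    "2023-07-26T10:05:00 deny tcp 203.0.113.7 port 22",
    "2023-07-26T10:40:12 allow tcp 198.51.100.4 port 443",
    "2023-07-26T11:02:30 deny udp 203.0.113.7 port 53",
]

index = build_index(SAMPLE)
ten_oclock = datetime(2023, 7, 26, 10)
print(search(index, ten_oclock, "tcp"))
print(search(index, ten_oclock, "deny"))
```

Because each posting list is keyed by time slice, a query restricted to a time range only needs to touch the slices inside that range.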

For data analysis, the platform uses a SQL parsing and query engine built from scratch. When a SQL query reaches the platform, the engine locks the query's data range and loads the data into memory, builds a data model using the read-time modeling rules referenced in the query, and then analyzes the data with techniques such as aggregation and relational analysis, just-in-time compilation, and vectorized computation. A single node can process 1 million records per second. The platform supports ad hoc queries, instant queries, interactive queries, correlation analysis, and self-service analysis, and provides powerful data analysis capabilities.

Honghu Data Platform adopts a hybrid modeling approach that combines the efficiency of write-time modeling with the flexibility of read-time modeling. Write-time modeling is the traditional ETL approach, in which the data model must be defined in advance. Read-time modeling takes an ELT approach and extracts useful fields while the data are being searched, which is more flexible and agile and reduces the cost of data import. The platform's data analysis flow is shown in Figure 4.
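The difference between the two modeling styles is easiest to see in code. The Python sketch below keeps raw events untouched and applies field extraction rules only when a query runs, so a new field can be added later without re-importing anything. The log lines, field names, and the `query` helper are hypothetical and serve only to illustrate the read-time idea; they are not the Honghu query language or API.

```python
import re

# Raw events are stored exactly as received; no schema is imposed on import.
RAW_EVENTS = [
    "2023-07-26T10:05:00 waf action=block src=203.0.113.7 type=sqli uri=/login",
    "2023-07-26T10:06:10 waf action=block src=198.51.100.4 type=xss uri=/search",
    "2023-07-26T10:07:45 fw action=allow src=203.0.113.7 dport=443",
]

def query(raw_events, field_patterns):
    """Apply extraction rules at read time and yield one dict per raw event."""
    compiled = {name: re.compile(p) for name, p in field_patterns.items()}
    for line in raw_events:
        row = {}
        for name, pattern in compiled.items():
            match = pattern.search(line)
            # A field that does not occur in an event is reported as None.
            row[name] = match.group(1) if match else None
        yield row

# First analysis: only the source address is needed.
first = list(query(RAW_EVENTS, {"src": r"src=(\S+)"}))

# Later analysis: a new field (uri) is added just by changing the query rules.
later = list(query(RAW_EVENTS, {"src": r"src=(\S+)", "uri": r"uri=(\S+)"}))

print(first[0])
print(later[0])
```

With write-time modeling, adding `uri` would have required changing the stored schema and reloading the original data, which is the redundancy and delay described in the introduction.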

Figure 4 Data analysis flow chart of Honghu Data Platform

Because it queries and analyzes raw data directly, Honghu Data Platform serves as a real-time analysis platform for heterogeneous, multi-source big data. The platform adopts a cloud-native, microservice architecture and is highly extensible for applications. With storage separated from compute, each can be scaled independently, and this flexible architecture allows the platform to be used widely in security analysis, compliance audit, intelligent operation and maintenance, business analysis, the Internet of Things, and other areas.

3.2 Network Security Situational Awareness System Based on Honghu Data Platform

This section designs a network security situational awareness system on the Honghu Data Platform that integrates security data collection, processing, and analysis with security risk discovery, monitoring, alerting, and prediction.

The system brings together sensing data sources in the security domain, such as user terminals, network links, application systems, and data traffic. Analytical algorithms such as security rule models and attack reasoning models convert seemingly unrelated and disordered security logs and alert data into intuitive, visual security event information, and threat intelligence is mined from the massive data to support risk discovery, security early warning, and situational awareness. This improves the ability of security monitoring to discover attacks and perceive the security situation. The system architecture, shown in Figure 5, covers the aggregation and storage of multi-source security data, big data analysis for threat intelligence, and situational awareness applications.

Figure 5 Network Security Situational Awareness System Based on Honghu Data Platform

Because Honghu Data Platform handles multi-source heterogeneous data well, the system supports many data formats, which lets network security situational awareness draw on more types of data. The platform's massive storage and fast processing capabilities provide technical support for in-depth security analysis of high-speed network traffic and computing resources for sophisticated model algorithms. During anomaly identification, a finer matching granularity and a longer matching time window can be used to analyze the outlier degree of unknown behavior.
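The paper does not say how the outlier degree is computed. One common and simple choice is a z-score: how many standard deviations an observation lies from the historical mean. The Python sketch below uses that measure as an assumption; the login counts are invented sample values.

```python
from statistics import mean, pstdev

def outlier_degree(history, observed):
    """Return the absolute z-score of an observation against historical values."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # With no historical variation, any different value is maximally unusual.
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma

# Hypothetical hourly login counts for one account over a long window.
history = [10, 12, 8, 10]
print(round(outlier_degree(history, 17), 2))
print(outlier_degree(history, 10))
```

A longer history window, as the paragraph above suggests, gives a more stable estimate of the mean and deviation and therefore fewer false alarms from short-term fluctuations.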

Working from massive security information, the system focuses on the comprehensive use of security data through centralized analysis and processing. It assesses the security situation through data fusion methods such as sorting and classification, filtering, comparative statistics, key identification, trend induction, correlation analysis, and predictive mining, so that threats and risks can be perceived, and it visualizes situational awareness according to the user's business characteristics and security requirements.

Relying on the Honghu big data platform architecture, the system applies big data processing and analysis technologies throughout the pipeline, from data reception, parsing, and storage to analysis and display, and can cope with high-speed processing of massive security data in different user environments.

The situational awareness system based on the Honghu Data Platform can analyze original log data directly, efficiently, and flexibly, which speeds up fault location and reduces the impact of faults. Once the link topology has been clarified, node indicators can be fixed in place to enable real-time monitoring and early warning.

The system provides one-stop data analysis: it can derive the system's operating status from logs and produce daily operation and maintenance reports, and it retains audit logs and analyzes and classifies user behavior so that security-related behavior in the system can be traced and reviewed after the fact.

3.3 Demonstration Example

The situational awareness system based on the Honghu Data Platform mainly includes functional modules for situational awareness, security monitoring, threat intelligence, source tracing, log overview, application platform logs, and abnormal statistical analysis. The system uses the Honghu Data Platform for data access, processing, and storage, and its data processing capacity can be scaled horizontally. Taking a small amount of sample data as an example, log data from sources already connected to the existing situational awareness system, such as WAF, anti-DDoS, firewall, and bastion host devices, are quickly imported into the Honghu platform by uploading files through the web page, as shown in Figure 6.

Figure 6 Data import

Honghu Data Platform provides built-in processing for a variety of data formats that works out of the box, and it can preview how the imported data will be processed under the selected format, as shown in Figure 7.

Figure 7 Raw data

After the data are imported, the read-time modeling function of the Honghu Data Platform allows operations such as normalization, enrichment, filtering, and desensitization to be applied at query time according to the analysis requirements of situational awareness, so that data modeling and real-time analysis can be completed quickly. For example, quick statistics can be computed on attacker IP and attack type; the data analysis interface is shown in Figure 8. Finally, the sample data produced by read-time modeling and analysis are output through the API to the situational awareness system for display.
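The kind of query described above, filtering, desensitizing, and then counting by attacker IP and attack type, might look like the following Python sketch. The key=value log layout, the last-octet masking rule, and the function names are assumptions made for the example; the actual analysis in the demonstration is performed in the platform's own query interface.

```python
import re
from collections import Counter

# Hypothetical WAF log lines in key=value form.
LOGS = [
    "src=203.0.113.7 type=sqli action=block",
    "src=203.0.113.7 type=sqli action=block",
    "src=198.51.100.4 type=xss action=block",
    "src=192.0.2.9 type=sqli action=allow",
]

FIELD = re.compile(r"(\w+)=(\S+)")

def mask_ip(ip):
    """Desensitize an IPv4 address by hiding its last octet."""
    return ip.rsplit(".", 1)[0] + ".*"

def attack_stats(lines):
    """Count blocked events per (masked attacker IP, attack type)."""
    counts = Counter()
    for line in lines:
        fields = dict(FIELD.findall(line))  # fields extracted at query time
        # Filtering: keep only blocked events that carry both fields of interest.
        if fields.get("action") != "block" or "src" not in fields or "type" not in fields:
            continue
        counts[(mask_ip(fields["src"]), fields["type"])] += 1
    return counts

stats = attack_stats(LOGS)
for (ip, attack_type), count in stats.most_common():
    print(ip, attack_type, count)
```

The resulting counts are the sort of aggregated result that could then be passed through an API to a dashboard for display.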

Figure 8 Data Analysis

Among the modules, the situational awareness module presents the overall network security situation, including network threat statistics, attack target statistics, attack sources, and security trends, as shown in Figure 9.

Figure 9 Overall status of network security situational awareness

The risk monitoring module presents the security monitoring status of the network information system in a visual way, mainly including overview, event statistics, vulnerability monitoring, threat monitoring, threat event analysis, trend analysis, etc., as shown in Figure 10.

Figure 10 Network security monitoring

The log overview module performs statistical analysis on the logs of each application platform, including log volume, abnormal conditions, and audit conditions of each platform, as shown in Figure 11.

Figure 11 Log overview

Built on the extensive application log reporting interfaces of Honghu Data Platform, the application platform logs module supports querying and retrieving logs from the various application platforms, analyzing log details, and auditing log-related behavior, as shown in Figure 12.

Figure 12 Application platform logs

4. Conclusions and Future Research Directions

The network security situational awareness system based on the Honghu Data Platform draws on key technologies such as massive data storage, separation of storage and compute, read-time modeling, data cleaning, data analysis and mining, data visualization, and artificial intelligence to form a safe and reliable system, and it establishes comprehensive, layered security monitoring and awareness capabilities for the big data center.

Building on the multi-source heterogeneous integration capabilities of Honghu Data Platform, future work can proceed in three directions. First, a data blueprint can be built that uses entities and relationships to outline in-depth analysis of data value, supporting model-driven, standardized, and unified data processing and data governance. Second, for data collection, a secure big data collection and sensing system can be built that features all-round acquisition, network-wide aggregation, and full-dimensional integration. Third, for data fusion, a data resource fusion system with intelligent processing, fine-grained governance, and classified organization can be constructed, providing general association, general indexing, and general navigation across the aggregated data.


Origin blog.csdn.net/Yhpdata888/article/details/131945177