Application and problem analysis of artificial intelligence in the field of network attack and defense

The Internet now permeates daily life, and network security problems are growing correspondingly serious: hacker attacks, data leaks, and similar incidents occur frequently, exposing companies and individuals to substantial threats. Driven by rapid advances in computer network technology and the continuous progress of society, artificial intelligence has spread and developed quickly, in turn directly or indirectly advancing other fields. With network security increasingly in the public eye, the question arises: how can artificial intelligence technology be used effectively to protect network data?

Algorithms, data, and computing power are the three core elements of artificial intelligence development. In recent years, driven by stronger algorithms, exploding data volumes, and improved computing power, artificial intelligence has developed rapidly and been applied across industries, and cyberspace security is no exception. Network attack-defense confrontation continues to evolve and escalate. Artificial intelligence, with its self-learning and adaptive capabilities, can support automated network attack and defense, and has become one of the field's core technologies.

1. Application of artificial intelligence in the field of network attack and defense

To better understand the application of artificial intelligence in network attack and defense, we examine four aspects along two dimensions: the offensive versus the defensive perspective, and whether artificial intelligence serves as the means of attack/defense or as its target (as shown in the figure below).


Figure Application of artificial intelligence in the field of network attack and defense

(1) Artificial intelligence assists cyber attacks

Artificial intelligence makes network attacks more powerful. On the one hand, it can automate and scale the tasks involved in an attack, yielding high returns at low cost; on the other, it can automatically analyze a target's security defenses and tailor the attack to weak links, bypassing security mechanisms and raising the success rate. Recent research on AI-enabled network attacks includes, but is not limited to: crafting malicious code or communication traffic that evades anti-virus software; intelligent password guessing; breaking CAPTCHAs to gain unauthorized access; spear phishing; precise targeting of victims; and automated penetration testing.

  1. Anti-virus evasion for malicious code. Using deep reinforcement learning, a black-box attack method was proposed against static PE (Portable Executable) anti-virus engines. It was the first work able to generate adversarial PE malware, reaching a 90% success rate in simulated real-world attacks.
  2. Adversarial traffic generation with IDSGAN. IDSGAN, a framework built on generative adversarial networks, uses a generator to convert original malicious traffic into adversarial malicious traffic that deceives and evades intrusion detection systems. Experiments show that most of the adversarial traffic bypasses existing intrusion detection systems, with an evasion rate above 99.0%.
  3. Intelligent password guessing. GENPass, a password generation model trained across multiple datasets, borrows ideas from PCFG (Probabilistic Context-Free Grammar) and GAN (Generative Adversarial Network), and uses long short-term memory (LSTM) training to improve both the hit rate on a single dataset and generalizability across datasets.
  4. A new text CAPTCHA solver. A general and effective GAN-based text CAPTCHA solver was proposed: by parameterizing the characters, rotation angles, and other properties used in a CAPTCHA scheme, training data is generated automatically, and transfer learning then tunes the model, improving its generalization and recognition accuracy. The method broke all text CAPTCHAs used by the world's top 50 websites (as of April 2018), including Google, eBay, Microsoft, Wikipedia, Taobao, Baidu, Tencent, Sohu, and JD.com.
  5. Automated advanced spear phishing. An end-to-end spear-phishing method on Twitter uses Markov models and recurrent neural networks (LSTM) to construct tweets that read close to human-written content. Testing found a success rate of 30% to 60%, at times exceeding the success rate of manual spear phishing (45%).
  6. Phishing email generation. RNN (Recurrent Neural Network)-based natural language generation (NLG) automatically produces fake, malicious emails aimed at a target, trained on the target's real email data and on phishing-email data. Experiments show that RNN-generated emails have better coherence and fewer grammatical errors, making them stronger candidates for phishing attacks.
  7. DeepLocker, a new class of malware. DeepLocker is highly targeted and evasive, concealing its malicious intent until it reaches a specific victim. Once its embedded artificial intelligence model (a deep neural network, DNN) identifies the target through face recognition, geolocation, voice recognition, or similar signals, the malicious payload is released. The use of artificial intelligence makes it nearly impossible to reverse engineer the trigger conditions that unlock the attack.
  8. DeepExploit, a fully automated penetration testing tool. It applies an advanced reinforcement learning algorithm with A3C distributed training to automate penetration testing end to end: intelligence collection, threat modeling, vulnerability analysis, exploitation, post-exploitation, and report generation.
  9. DeepDGA, a deep-learning-based domain generation algorithm. Well-known domain names from the Alexa rankings serve as training data, and the model is built with LSTM and GAN techniques. The generated domain names closely resemble normal website domains and are difficult to detect.
  10. AI-based vulnerability scanning tools. Starting in August 2019, Instagram users found their account information changed by hackers and could not log in; in November 2019, a bug in Instagram's code caused a data leak in which a user's password could appear in the browser's address bar. In both attacks, the attackers presumably used AI-based tools to scan for server vulnerabilities.
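The deep-learning DGA in item 9 is best understood against the classical, deterministic domain generation algorithms it improves on. A minimal sketch of such a hash-chained DGA (the seed and letter mapping here are illustrative, not from any real malware family):

```python
import hashlib
from datetime import date

def generate_domains(seed: str, day: date, count: int = 5) -> list[str]:
    """Minimal hash-chained DGA: each domain is derived from the previous
    digest, so malware and its C2 operator can independently compute the
    same rendezvous list for any given day."""
    domains = []
    state = f"{seed}-{day.isoformat()}".encode()
    for _ in range(count):
        digest = hashlib.md5(state).hexdigest()
        # Map each of the first 12 hex characters to a lowercase letter.
        name = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:12])
        domains.append(name + ".com")
        state = digest.encode()
    return domains

print(generate_domains("example-seed", date(2020, 1, 1)))
```

Domains produced this way look random and are comparatively easy to flag statistically, which is exactly the weakness DeepDGA targets by generating names that mimic the character distribution of benign domains.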

Taking as reference the Cyber Kill Chain model proposed by Lockheed Martin in 2011 (which divides an attack into seven stages: reconnaissance, weaponization, delivery, exploitation, installation, command and control, and actions on objectives), the application research above can be mapped onto the attack process (as shown in the table below). Hackers attempt to apply artificial intelligence at each stage of the kill chain in order to maximize their gains.
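The seven stages, paired with example techniques from the list above, can be sketched as a simple mapping. The stage assignments here are one illustrative reading, not the contents of the original table:

```python
# Cyber Kill Chain stages mapped to example AI-assisted attack techniques.
KILL_CHAIN = [
    ("reconnaissance", "AI-based vulnerability scanning and target profiling"),
    ("weaponization", "adversarial PE malware generation, DeepDGA"),
    ("delivery", "automated spear phishing, RNN-generated phishing emails"),
    ("exploitation", "DeepExploit-style automated exploitation"),
    ("installation", "DeepLocker target-conditioned payload release"),
    ("command and control", "IDSGAN adversarial traffic, DGA-generated domains"),
    ("actions on objectives", "intelligent password guessing, CAPTCHA breaking"),
]

for stage, technique in KILL_CHAIN:
    print(f"{stage}: {technique}")
```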

Table Research on the application of artificial intelligence in cyber attacks


(2) Artificial intelligence assists network defense

Network security threats emerge in an endless stream and are increasingly intelligent, covert, and large-scale, posing great challenges to defenders. Artificial-intelligence-driven network defense, with its strong autonomous learning and data analysis capabilities, greatly shortens the gap between threat discovery and response, enabling automatic, rapid identification, detection, and handling of security threats. It is particularly advantageous for discovering unknown threats and advanced threats such as APTs.

Artificial intelligence continues to offer new approaches to increasingly complex network security problems. It has already been applied to malware/traffic detection, malicious domain/URL detection, phishing email detection, network attack detection, software vulnerability discovery, threat intelligence collection, and more. Specific application research includes:

  1. Malware detection. Malware samples are converted into 2-D images and fed into a trained deep neural network (DNN), which classifies each image as "clean" or "infected". The method achieved 99.07% accuracy with a false positive rate of 2.58%.
  2. Detection of unknown encrypted malicious traffic. After two months of training, an LSTM-based detection model could identify unknown encrypted malicious traffic from many different malware families, even though no features can be extracted from the encrypted payload.
  3. Malicious (botnet) traffic detection. BoTShark, a deep-learning traffic detector independent of the underlying botnet architecture, uses two detection models, a stacked autoencoder and a convolutional neural network (CNN), to remove the detection system's dependence on hand-selected traffic features. The detector achieved a classification accuracy of 91% and a recall of 13%.
  4. AI-based malicious domain detection. Because threat intelligence suffers from many uncontrolled false positives and misses, threat intelligence is used as the training set and a support vector machine (SVM) learns the data characteristics behind it. The model's generalization ability reduces false positives and makes the security system more controllable.
  5. Malicious URL detection with machine learning. Clustering algorithms combined with domain generation algorithm (DGA) detection achieve a high malicious URL detection rate, catching not only known malicious URLs but also previously unseen variants.
  6. Phishing email detection. A deep neural network (DNN) is used to detect phishing emails; experiments show the DNN reaches 94.27% detection performance, further demonstrating the feasibility of deep learning for automated phishing identification.
  7. AI2, an AI-based cybersecurity platform. The platform combines unsupervised and supervised machine learning: it first scans log files autonomously with unsupervised learning, analysts confirm the scan results, and the confirmed results are fed back into AI2 to analyze new logs. The platform detects approximately 85% of cyberattacks.
  8. A general machine-learning-based vulnerability detection method. This is the first general detection method based on vulnerability inconsistency. Unlike existing approaches, it uses two-step clustering to find code fragments that are functionally similar but inconsistent, avoiding the heavy cost of sample collection, cleaning, and labeling; manual analysis of the clustering results then locates real vulnerabilities faster. The method discovered 22 unknown vulnerabilities in open-source software.
  9. Threat-intelligence knowledge graph construction based on deep learning. A model trained with a deep belief network (DBN) automatically extracts threat-intelligence entities and their relationships. It recognizes entities more accurately than shallow neural networks and far faster than manual extraction, providing strong support for the automated construction of threat-intelligence knowledge graphs.
  10. DGA domain detection based on a hybrid word-vector deep learning model. For the first time, character-level and bigram word vectors of DGA domain names are combined to make fuller use of the information in domain strings, and a deep learning model composed of a CNN and an LSTM is built on the hybrid vectors. Experiments show better feature extraction and classification, and the approach alleviates the negative impact of data imbalance to some extent.
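The byte-to-image conversion behind item 1 can be sketched in a few lines; the 64-pixel row width and zero padding here are illustrative choices rather than details from the cited work:

```python
import numpy as np

def bytes_to_image(payload: bytes, width: int = 64) -> np.ndarray:
    """Reinterpret a raw binary as a 2-D grayscale image: each byte becomes
    one pixel (0-255), rows are `width` bytes wide, and the final row is
    zero-padded. The array can then be fed to an ordinary image CNN/DNN."""
    buf = np.frombuffer(payload, dtype=np.uint8)
    rows = int(np.ceil(len(buf) / width))
    img = np.zeros(rows * width, dtype=np.uint8)
    img[: len(buf)] = buf
    return img.reshape(rows, width)

sample = bytes(range(256)) * 16   # stand-in for a 4 KiB malware sample
img = bytes_to_image(sample, width=64)
print(img.shape)  # (64, 64)
```

Because the conversion preserves byte order, different sections of a binary (headers, code, packed data) tend to appear as visually distinct textures, which is what the image classifier learns to separate.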

The applications above show that current artificial intelligence research in defense focuses mainly on malicious-behavior detection, with response and handling, active defense, and threat prediction capabilities continuously improved on top of the detection results.

(3) Attacks exploiting artificial intelligence's own security issues

With the widespread application of artificial intelligence, security risks from immature technology and malicious use are gradually being exposed, including software vulnerabilities in deep learning frameworks, adversarial example generation, training data poisoning, and heavy dependence on data. By finding weaknesses in an AI system, hackers can bypass its defenses and attack it, throwing the AI-driven system into disorder, causing it to miss or misjudge threats, or even crashing or hijacking it. The security issues of artificial intelligence itself mainly concern training data, development frameworks, algorithms, models, and the software and hardware that host AI systems, as detailed below.

  1. Data security. Dataset quality (size, balance, accuracy, etc.) is crucial to artificial intelligence algorithms and shapes their results; a bad dataset can render a model ineffective or make it produce unsafe outputs. The most common problem is the data poisoning attack, in which polluted training data leads the AI to wrong decisions. In the earliest such research, spammers carried out simple "evasion attacks" by inserting benign-looking "good words" into spam to bypass the classifier inside a spam filter, letting malicious mail evade detection.
  2. Framework security. Deep learning frameworks and the third-party libraries they depend on carry many security risks that can cause runtime errors in algorithms built on them. Researchers from 360 Security Lab and other organizations studied the implementations of three mainstream frameworks, Caffe, TensorFlow, and Torch, and found numerous vulnerabilities such as heap overflows and integer overflows, 15 of which received CVE numbers.
  3. Algorithm security. Although deep neural networks perform well in many fields, why they work, and what their hidden layers and neuron parameters mean, remains unclear. This lack of interpretability invites algorithm errors and enables attacks such as adversarial examples and algorithmic backdoor implantation. Researchers demonstrated an evasion attack against Gmail's PDF filtering, using genetic programming to randomly modify malware until it escaped a machine-learning malware classifier based on PDF structural features. The method not only defeated two highly accurate malicious-PDF classifiers but also attacked the classifier embedded in Gmail: modifying known malicious PDF samples with only 4 lines of code achieved a nearly 50% escape rate, affecting 1 billion Gmail users.
  4. Model security. As the core of an artificial intelligence application, the model has become a key target. An attacker can send large numbers of prediction queries and use the outputs to steal privacy-sensitive assets such as the model's structure, parameters, and training and test data, then train a model equal or similar to the target; traditional techniques such as reverse engineering can recover model files directly; attackers can also take open-source models, inject malicious behavior into them, and republish them. In 2017, Papernot et al. proposed a black-box model-stealing attack that collects the target classifier's inputs and outputs to build a synthetic dataset, which is used to train a local substitute for the target model and thereby attack it. Beyond state-of-the-art deep neural networks, the method also applies to other types of machine-learning classifiers.
  5. Software and hardware security. The software and hardware that host artificial intelligence applications (data collection and storage, application execution, etc.) also face traditional security risks, and existing vulnerabilities are easily exploited. At the Black Hat 2018 conference, Tencent Keen Security Lab presented attack tests against Tesla's Autopilot driver-assistance system in a remote scenario requiring no direct physical contact: the attack chain began by exploiting WebKit browser vulnerabilities to execute arbitrary code in the browser and ultimately gained control of Autopilot.

Attackers can exploit any of the above weaknesses of artificial intelligence itself. One of the most common attacks is the adversarial example attack: by adding small, carefully constructed perturbations, imperceptible to humans, to the input data, the attacker interferes with the model's inference so that it outputs wrong predictions, thereby evading detection. Adversarial examples also transfer well: examples crafted against one specific model are often effective against other, different models.
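The perturbation idea can be made concrete with the Fast Gradient Sign Method (FGSM) against a toy logistic-regression classifier; the weights, input, and epsilon below are illustrative, and real attacks follow the same gradient direction against deep networks:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Fast Gradient Sign Method: move every input feature by eps in the
    direction that increases the cross-entropy loss for the true label y."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # model's predicted P(y=1 | x)
    grad_x = (p - y) * w                    # analytic gradient of the loss w.r.t. x
    return x + eps * np.sign(grad_x)

w = np.array([1.5, -2.0, 0.5, 1.0, -1.0])  # toy "trained" weights (assumed)
b = 0.0
x = 0.1 * w                                # an input the model labels as class 1
x_adv = fgsm_perturb(x, w, b, y=1, eps=0.2)

# Each feature moved by only 0.2, yet the predicted class flips.
print((w @ x + b) > 0, (w @ x_adv + b) > 0)  # True False
```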

(4) Protection against artificial intelligence’s own security issues

As data volumes and computing power continue to grow, artificial intelligence application scenarios will keep multiplying, and the security of AI itself has become a bottleneck for its development; its importance is self-evident. For the security issues of training data, development frameworks, algorithms, models, and software and hardware equipment, the more commonly used protections at present are:

  1. Data security. Analyze the differences between abnormal and normal data and filter out the abnormal; detect outliers in the training set with statistical methods; and use ensembles of independent models, each trained on a different dataset, to reduce the impact of data poisoning attacks.
  2. Framework security. Use code auditing, fuzz testing, and similar techniques to discover and fix vulnerabilities in development frameworks, and draw on community forces such as white hats and security research teams to find issues and reduce the platform's risk.
  3. Algorithm security. At data collection time, preprocess inputs to remove the perturbations carried by adversarial examples. During training, train the network adversarially on a mix of adversarial and benign samples to defend against adversarial attacks, and improve the algorithm's interpretability by clarifying its decision logic, internal working mechanism, and the process and basis of each decision. At inference time, detect adversarial examples through differences in feature layers or in model predictions, and reconstruct inputs (deformation, transformation, etc.) to destroy the attacker's perturbation while preserving semantics.
  4. Model security. At data collection time, collect at a finer granularity to increase the diversity of environmental factors in the training data and the model's adaptability to changing environments. During training: have the model learn features that are hard to perturb, or reduce its dependence on fragile features, to improve robustness; split the training data into multiple sets, train independent models on them, and combine the models by voting to prevent leakage of any one training set; add noise to the data or training steps, or deliberately adjust the model structure, to reduce the sensitivity of outputs to the training data or model and so protect data privacy; embed watermarks in model files to deter model theft; and prune neurons irrelevant to normal classification to reduce the chance that backdoor neurons fire, or fine-tune the model on clean data to remove backdoors. At inference time: preprocess inputs to reduce the chance of backdoor triggers; introduce randomness (in inputs, parameters, or outputs) so attackers cannot obtain accurate model information; obfuscate the useful information in model outputs, parameter updates, and other interactive data to reduce its readability; apply access control (identity verification, query limits, etc.) to prevent model information leakage; and verify or validate model files to discover security issues.
  5. Software and hardware security. Encrypt model-related data in transit and at rest so sensitive data is not leaked; inspect software and hardware for malicious behavior in a timely manner; and log input/output data and core operations at run time to support system decisions and retrospective verification when problems arise.
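As a minimal illustration of the statistical outlier screening in item 1, a per-feature z-score filter over the training set might look like this (the threshold and synthetic data are illustrative):

```python
import numpy as np

def filter_outliers(X: np.ndarray, z_max: float = 3.0) -> np.ndarray:
    """Drop training points whose z-score exceeds z_max in any feature:
    a simple statistical pre-filter for suspected poisoned samples."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12          # guard against zero variance
    z = np.abs((X - mu) / sigma)
    keep = (z < z_max).all(axis=1)
    return X[keep]

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(500, 3))     # benign training data
poison = np.full((5, 3), 25.0)                  # crude poisoned points
X = np.vstack([clean, poison])
filtered = filter_outliers(X)
print(len(X), len(filtered))  # 505 500
```

This only catches crude poisoning; carefully crafted poisoned points that sit inside the data distribution are what motivate the ensemble-of-models defense mentioned alongside it.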

In recent years, tools and products for evaluating algorithm models have also appeared. In 2020, RealAI and Alibaba released platforms for testing the security of algorithm models themselves, which both assess a model's security and offer suggestions for hardening its defenses; in May 2021, Microsoft open-sourced Counterfit, the AI security risk assessment tool it had used internally, which supports red-team exercises, penetration testing, and vulnerability scanning, and records attack events when the model is attacked.

When artificial intelligence is applied in business, security mechanisms must also be developed for each specific application scenario to keep the business application secure.

2. Artificial Intelligence Application Situation and Problem Analysis

In summary, artificial intelligence has seen many research applications in network attack and defense, and its potential is huge; researchers at home and abroad are actively exploring the possibility of automated attack and defense. However, the particular nature of network attack-defense and the characteristics of AI technology also limit its application in this field.

(1) Cyber attacks

There have been many attempts to apply artificial intelligence to network attacks, with good results, but its role is still limited. In vulnerability discovery, current challenges and competitions mainly test vulnerability mining in binary programs; although automated tools have shown strong discovery and exploitation capabilities, vulnerabilities that require deep logical analysis still cannot be fully found by automated tools.

In addition, because artificial intelligence models demand costly computing power and manpower, real-world cyberattacks using AI methods remain relatively rare; to date there are no confirmed cases of large-scale attacks carried out with AI.

(2) Network defense

The application of artificial intelligence has greatly raised the level of network security defense, but problems remain. Although AI technology represented by deep learning can extract features automatically, it suffers from data hunger and poor interpretability. The more data, the more accurate the model; yet in malicious code detection and software vulnerability discovery, good datasets are still scarce, so AI-based methods show low detection and accuracy rates. And although algorithms such as deep learning can identify unknown threats well, they often know "what" without knowing "why": the model lacks interpretability and cannot determine a threat's source.

In addition, because network security is a special application domain in which false alarms are costly, artificial intelligence in network defense is mostly deployed in combination with human analysts. A survey of 102 cybersecurity professionals at the 2020 RSA Conference found that nearly 60% of respondents considered manually verified threats more convincing than those handled automatically by AI.

3. Summary and outlook

Artificial intelligence has unique value and advantages. Attackers weaponize it so that malicious behavior can learn on its own, adapt to differences in the target's defense system, and reach its goal by probing for potential vulnerabilities. At the same time, AI techniques can improve the state of network security, identifying known and unknown threats faster and responding in time to complex attacks. Research institutions and industry now agree that integrating artificial intelligence will become the new normal of network attack and defense. Yet its application in this field is still at an early stage: AI remains an auxiliary means, and truly automated attack and defense are still a long way off.


Origin blog.csdn.net/HUANGXIN9898/article/details/132824104