A Quick Look at the NSL-KDD Dataset

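The NSL-KDD dataset is a revised version of the well-known KDD'99 dataset. It consists of four sub-datasets: KDDTest+, KDDTest-21, KDDTrain+, and KDDTrain+_20Percent. KDDTest-21 is a subset of KDDTest+, and KDDTrain+_20Percent is a subset of KDDTrain+. Each record contains 43 features: 41 describe the traffic input itself, and the last two are the label (normal or attack) and a score (the severity of the traffic input itself).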
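
As a rough illustration of the record layout described above, the sketch below parses NSL-KDD-style rows with Python's standard `csv` module and splits each one into its 41 features, its label, and its score. It assumes the files are comma-separated with no header row. The two sample rows are made up for illustration, and `parse_record` is a hypothetical helper, not part of any official tooling.

```python
import csv
import io

# Two made-up rows in the assumed layout: 41 features, then label, then score.
SAMPLE = (
    "0,tcp,http,SF,215,45076," + ",".join(["0"] * 35) + ",normal,21\n"
    "0,tcp,private,S0,0,0," + ",".join(["0"] * 35) + ",neptune,19\n"
)


def parse_record(row):
    """Split one 43-field row into (features, label, score)."""
    if len(row) != 43:
        raise ValueError(f"expected 43 fields, got {len(row)}")
    features = row[:41]   # traffic features 1-41, still as strings
    label = row[41]       # "normal" or the name of an attack
    score = int(row[42])  # last column
    return features, label, score


records = [parse_record(row) for row in csv.reader(io.StringIO(SAMPLE))]
for features, label, score in records:
    print(len(features), label, score)
```

To read a real file, the same `csv.reader` call can be pointed at an opened file object instead of the in-memory sample.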

The dataset contains four types of attack: Denial of Service (DoS), Probe, User to Root (U2R), and Remote to Local (R2L). Each is briefly described below:

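  • DoS is an attack that tries to shut down traffic flow to and from the target system. The IDS is flooded with an abnormal amount of traffic that the system cannot handle, and it shuts down to protect itself, which prevents normal traffic from reaching the network. An example would be an online retailer flooded with orders on a big sale day: because the network cannot handle all the requests, it shuts down, and paying customers cannot buy anything. This is the most common attack in the dataset.
  • Probe, or surveillance, is an attack that tries to obtain information from a network. The goal is to act like a thief and steal important information, whether personal information about clients or banking information.
  • U2R is an attack that starts from a normal user account and tries to gain access to the system or network as a superuser (root). The attacker tries to exploit vulnerabilities in the system to gain root privileges.
  • R2L is an attack that tries to gain local access to a remote machine. The attacker has no local access to the system or network and tries to "hack" their way in.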

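The subclasses of each attack are broken down in the following table:
[Image: table of the subclasses of each attack]
The distribution of records across the attack types is as follows:
[Image: distribution of records per attack type]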
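
A minimal sketch of how record labels could be grouped into the four attack types and counted. The mapping below is an assumption: it lists attack names commonly associated with each type in the training data and is not copied from the table referenced above, so it may be incomplete. `attack_class` and the sample `labels` list are hypothetical.

```python
from collections import Counter

# Assumed, possibly incomplete mapping from attack label to attack type.
ATTACK_CLASS = {
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probe
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # User to Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
    # Remote to Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
}


def attack_class(label):
    """Return the attack type for a label, or 'Unknown' if it is not mapped."""
    if label == "normal":
        return "Normal"
    return ATTACK_CLASS.get(label, "Unknown")


# Made-up labels standing in for the label column of a real file.
labels = ["normal", "neptune", "neptune", "satan", "guess_passwd", "rootkit", "normal"]
distribution = Counter(attack_class(label) for label in labels)
print(distribution)
```

Returning "Unknown" instead of raising keeps the count usable when a file contains attack names that the mapping does not cover.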
The features provided in the dataset fall into four categories: intrinsic, content, host-based, and time-based. Each category is described below:

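  • Intrinsic features can be derived from the packet header without looking into the payload itself, and hold the basic information about the packet. This category covers features 1-9.
  • Content features hold information about the original packets, since they are sent in multiple pieces rather than one. With this information, the system can access the payload. This category covers features 10-22.
  • Time-based features analyze the traffic input over a two-second window and contain information such as how many connections it attempted to make to the same host. These features are mostly counts and rates rather than information about the content of the traffic input. This category covers features 23-31.
  • Host-based features are similar to time-based features, except that instead of analyzing over a two-second window, they analyze over a series of connections (how many requests were made to the same host over x connections). These features are designed to assess attacks that span longer than a two-second window. This category covers features 32-41.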
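
The feature ranges listed above can be written down directly as a lookup. The sketch below is only an illustration of that numbering; `FEATURE_GROUPS`, `feature_group`, and `split_features` are hypothetical names.

```python
# 1-based feature numbers per category, as listed above.
FEATURE_GROUPS = {
    "intrinsic": range(1, 10),    # features 1-9
    "content": range(10, 23),     # features 10-22
    "time-based": range(23, 32),  # features 23-31
    "host-based": range(32, 42),  # features 32-41
}


def feature_group(number):
    """Return the category name for a 1-based feature number."""
    for name, numbers in FEATURE_GROUPS.items():
        if number in numbers:
            return name
    raise ValueError(f"feature number out of range: {number}")


def split_features(features):
    """Slice a 41-item feature list into the four categories."""
    if len(features) != 41:
        raise ValueError(f"expected 41 features, got {len(features)}")
    return {
        name: [features[n - 1] for n in numbers]
        for name, numbers in FEATURE_GROUPS.items()
    }


groups = split_features(list(range(1, 42)))
print({name: len(values) for name, values in groups.items()})
```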

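The table below breaks down the possible values of the categorical features. There are 3 possible protocol type values, 60 possible service values, and 11 possible flag values.
[Image: table of possible values of the categorical features]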
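
Categorical features such as these usually have to be turned into numbers before a model can use them. One common option, shown here only as a sketch and not as the method of any particular paper, is one-hot encoding. The three protocol names are an assumption about what the data contains; `build_vocabulary` and `one_hot` are hypothetical helpers.

```python
def build_vocabulary(values):
    """Collect the distinct values of a categorical column in a stable order."""
    return sorted(set(values))


def one_hot(value, vocabulary):
    """Encode one categorical value as a 0/1 list aligned with the vocabulary."""
    if value not in vocabulary:
        raise ValueError(f"unseen categorical value: {value!r}")
    return [1 if value == known else 0 for known in vocabulary]


# Assumed protocol type values; a real vocabulary should be built from the data.
PROTOCOL_TYPES = ["tcp", "udp", "icmp"]
print(one_hot("udp", PROTOCOL_TYPES))  # [0, 1, 0]

# Building the vocabulary from a made-up column instead of hard-coding it.
column = ["tcp", "udp", "tcp", "icmp"]
vocabulary = build_vocabulary(column)
encoded = [one_hot(value, vocabulary) for value in column]
```

Building the vocabulary from the training set and reusing it for the test set keeps the encoded columns aligned. A value that appears only in the test set would then raise an error rather than being silently dropped.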
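Each Flag value represents the status of a connection. The values are explained below:
[Image: table of flag value explanations]
The following table describes each feature and breaks down the dataset: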

| # | Feature Name | Description | Type | Value Type | Ranges (across both train and test) |
|---|---|---|---|---|---|
| 1 | Duration | Length of time duration of the connection | Continuous | Integers | 0 - 54451 |
| 2 | Protocol Type | Protocol used in the connection | Categorical | Strings | |
| 3 | Service | Destination network service used | Categorical | Strings | |
| 4 | Flag | Status of the connection: normal or error | Categorical | Strings | |
| 5 | Src Bytes | Number of data bytes transferred from source to destination in a single connection | Continuous | Integers | 0 - 1379963888 |
| 6 | Dst Bytes | Number of data bytes transferred from destination to source in a single connection | Continuous | Integers | 0 - 309937401 |
| 7 | Land | 1 if source and destination IP addresses and port numbers are equal; 0 otherwise | Binary | Integers | {0, 1} |
| 8 | Wrong Fragment | Total number of wrong fragments in this connection | Discrete | Integers | {0, 1, 3} |
| 9 | Urgent | Number of urgent packets in this connection. Urgent packets are packets with the urgent bit activated | Discrete | Integers | 0 - 3 |
| 10 | Hot | Number of "hot" indicators in the content, such as entering a system directory, creating programs, and executing programs | Continuous | Integers | 0 - 101 |
| 11 | Num Failed Logins | Count of failed login attempts | Continuous | Integers | 0 - 4 |
| 12 | Logged In | Login status: 1 if successfully logged in; 0 otherwise | Binary | Integers | {0, 1} |
| 13 | Num Compromised | Number of "compromised" conditions | Continuous | Integers | 0 - 7479 |
| 14 | Root Shell | 1 if root shell is obtained; 0 otherwise | Binary | Integers | {0, 1} |
| 15 | Su Attempted | 1 if "su root" command attempted or used; 0 otherwise | Discrete (dataset contains the value 2) | Integers | 0 - 2 |
| 16 | Num Root | Number of "root" accesses or number of operations performed as root in the connection | Continuous | Integers | 0 - 7468 |
| 17 | Num File Creations | Number of file creation operations in the connection | Continuous | Integers | 0 - 100 |
| 18 | Num Shells | Number of shell prompts | Continuous | Integers | 0 - 2 |
| 19 | Num Access Files | Number of operations on access control files | Continuous | Integers | 0 - 9 |
| 20 | Num Outbound Cmds | Number of outbound commands in an ftp session | Continuous | Integers | {0} |
| 21 | Is Hot Logins | 1 if the login belongs to the "hot" list, i.e., root or admin; 0 otherwise | Binary | Integers | {0, 1} |
| 22 | Is Guest Login | 1 if the login is a "guest" login; 0 otherwise | Binary | Integers | {0, 1} |
| 23 | Count | Number of connections to the same destination host as the current connection in the past two seconds | Discrete | Integers | 0 - 511 |
| 24 | Srv Count | Number of connections to the same service (port number) as the current connection in the past two seconds | Discrete | Integers | 0 - 511 |
| 25 | Serror Rate | The percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in count (23) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 26 | Srv Serror Rate | The percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in srv_count (24) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 27 | Rerror Rate | The percentage of connections that have activated the flag (4) REJ, among the connections aggregated in count (23) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 28 | Srv Rerror Rate | The percentage of connections that have activated the flag (4) REJ, among the connections aggregated in srv_count (24) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 29 | Same Srv Rate | The percentage of connections that were to the same service, among the connections aggregated in count (23) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 30 | Diff Srv Rate | The percentage of connections that were to different services, among the connections aggregated in count (23) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 31 | Srv Diff Host Rate | The percentage of connections that were to different destination machines, among the connections aggregated in srv_count (24) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 32 | Dst Host Count | Number of connections having the same destination host IP address | Discrete | Integers | 0 - 255 |
| 33 | Dst Host Srv Count | Number of connections having the same port number | Discrete | Integers | 0 - 255 |
| 34 | Dst Host Same Srv Rate | The percentage of connections that were to the same service, among the connections aggregated in dst_host_count (32) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 35 | Dst Host Diff Srv Rate | The percentage of connections that were to different services, among the connections aggregated in dst_host_count (32) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 36 | Dst Host Same Src Port Rate | The percentage of connections that were to the same source port, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 37 | Dst Host Srv Diff Host Rate | The percentage of connections that were to different destination machines, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 38 | Dst Host Serror Rate | The percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in dst_host_count (32) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 39 | Dst Host Srv Serror Rate | The percentage of connections that have activated the flag (4) S0, S1, S2 or S3, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 40 | Dst Host Rerror Rate | The percentage of connections that have activated the flag (4) REJ, among the connections aggregated in dst_host_count (32) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 41 | Dst Host Srv Rerror Rate | The percentage of connections that have activated the flag (4) REJ, among the connections aggregated in dst_host_srv_count (33) | Discrete | Floats (hundredths of a decimal) | 0 - 1 |
| 42 | Class | Classification of the traffic input | Categorical | Strings | |
| 43 | Difficulty Level | Difficulty level | Discrete | Integers | 0 - 21 |
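
To make the definitions of Count (23) and Serror Rate (25) more concrete, the sketch below recomputes both for one connection from a short made-up connection history. It illustrates the descriptions in the table and is not the dataset's original extraction code. Whether the current connection itself is included in the count is an assumption (it is excluded here), and `Connection` and `count_and_serror_rate` are hypothetical names.

```python
from dataclasses import dataclass

SYN_ERROR_FLAGS = {"S0", "S1", "S2", "S3"}


@dataclass
class Connection:
    time: float    # seconds since some reference point
    dst_host: str  # destination host identifier
    flag: str      # connection status, e.g. "SF" or "S0"


def count_and_serror_rate(current, history, window=2.0):
    """Return (count, serror_rate) for `current` over the past `window` seconds."""
    recent = [
        c for c in history
        if c.dst_host == current.dst_host and 0 <= current.time - c.time <= window
    ]
    count = len(recent)
    if count == 0:
        return 0, 0.0
    errors = sum(c.flag in SYN_ERROR_FLAGS for c in recent)
    return count, round(errors / count, 2)  # rates are stored in hundredths


history = [
    Connection(0.0, "A", "S0"),   # same host, but older than two seconds
    Connection(9.2, "A", "S0"),
    Connection(9.5, "A", "S0"),
    Connection(9.8, "B", "S0"),   # different host, ignored
    Connection(10.0, "A", "SF"),
    Connection(10.5, "A", "S0"),
]
current = Connection(11.0, "A", "S0")
print(count_and_serror_rate(current, history))  # (4, 0.75)
```

Four earlier connections to host A fall inside the window and three of them carry an S0 flag, which gives a count of 4 and a rate of 0.75.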

Dataset download link: https://www.unb.ca/cic/datasets/nsl.html
For a detailed introduction to the dataset, see: https://towardsdatascience.com/a-deeper-dive-into-the-nsl-kdd-data-set-15c753364657

Reposted from blog.csdn.net/airenKKK/article/details/124619217