Parsing pcap files with the Python dpkt module

This article explains how to use the Python dpkt module to parse pcap files and decode packets.

Wireshark is commonly used for manual analysis of saved pcap files. When I need flexible, programmatic analysis of packet information, I usually use one of the following three methods:

1. Use Wireshark's Lua API and write a Lua plugin for the analysis. One of Wireshark's advantages is the number of protocols it supports (2000+), and its decoding of message content is very detailed. Because the underlying implementation is written in C, decoding is also very efficient. For how to use plugins, see here .

2. Call the methods written in Lua from a Python script. I explained the specific approach in an article on my CSDN blog, see here .

3. Decode packets directly with the Python dpkt module. Compared with Wireshark, the first limitation is protocol support. The list of dpkt modules shows that not many protocols can be decoded, although common ones such as IP, TCP, UDP, SSL, and HTTP are supported. Google's QUIC, for example, is already supported by Wireshark but not yet by dpkt. The decoded information dpkt provides is also less rich than Wireshark's.

However, dpkt still has its uses in many scenarios. Methods 1 and 2 call the tshark program, explicitly or implicitly, so Wireshark must be installed in advance. Wireshark is also released under the GPL, so some commercial projects may not be able to use it because of the GPL's requirements. It is therefore worth studying dpkt, which is sufficient for decoding common protocols. dpkt is described in considerable detail in its official documentation, see here .

1. Installation: run pip install dpkt; to use it, simply import dpkt.

2. The following example decodes HTTP response packets:

import sys

import dpkt

def checkIfHTTPRes(data):
    # An HTTP response begins with a status line such as b'HTTP/1.1 200 OK'.
    return data.startswith(b'HTTP')

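def httpPacketParser(http):
    # Only the first TCP segment of a response begins with the status line.
    if checkIfHTTPRes(http):
        try:
            response = dpkt.http.Response(http)
            print(response.status)
        except Exception:
            # Parsing fails when this single segment does not hold a
            # complete response, e.g. the body continues in later packets.
            pass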

def tcpPacketParser(tcp):

    stream = tcp.data
    if len(stream):
        httpPacketParser(stream)
 
def ipPacketParser(ip):
    if isinstance(ip.data, dpkt.tcp.TCP):
        tcpPacketParser(ip.data)
        

def decodePacket(packet):
    # Assumes the capture's link layer is Ethernet; only IPv4 is handled.
    eth = dpkt.ethernet.Ethernet(packet)
    if isinstance(eth.data, dpkt.ip.IP):
        ipPacketParser(eth.data)

def pcapReader(filename):
    try:
        with open(filename, 'rb') as f:
            capture = dpkt.pcap.Reader(f)
            # Each item is (timestamp, raw packet bytes).
            for timestamp, packet in capture:
                decodePacket(packet)

    except Exception as e:
        print('parse {}, error:{}'.format(filename, e))

 
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print('HELP: python {} <PCAP_PATH>'.format(sys.argv[0]))
        sys.exit(0)

    filename = sys.argv[1]

    if filename:
        pcapReader(filename)
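
For reference, the classic pcap file layout that dpkt.pcap.Reader handles is simple enough to sketch with the standard library alone. This is only an illustration of the file format (a 24-byte global header followed by 16-byte record headers), not dpkt's implementation; iter_pcap and build_sample_pcap are hypothetical names, and the sketch assumes microsecond-resolution pcap files, not pcapng.

```python
import io
import struct


def iter_pcap(fileobj):
    """Yield (timestamp, packet_bytes) from a classic pcap file object."""
    header = fileobj.read(24)  # global header
    if len(header) < 24:
        raise ValueError('truncated pcap global header')
    magic = header[:4]
    if magic == b'\xd4\xc3\xb2\xa1':    # 0xa1b2c3d4 stored little-endian
        endian = '<'
    elif magic == b'\xa1\xb2\xc3\xd4':  # 0xa1b2c3d4 stored big-endian
        endian = '>'
    else:
        raise ValueError('not a classic microsecond pcap file')
    while True:
        record = fileobj.read(16)  # per-packet record header
        if len(record) < 16:
            break
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + 'IIII', record)
        data = fileobj.read(incl_len)
        if len(data) < incl_len:
            break  # truncated capture
        yield ts_sec + ts_usec / 1e6, data


def build_sample_pcap():
    """Build a tiny in-memory pcap with two dummy packets (illustration only)."""
    blob = struct.pack('<IHHiIII', 0xa1b2c3d4, 2, 4, 0, 0, 65535, 1)
    for ts_sec, ts_usec, payload in [(1, 500000, b'abc'), (2, 0, b'de')]:
        blob += struct.pack('<IIII', ts_sec, ts_usec, len(payload), len(payload))
        blob += payload
    return blob


if __name__ == '__main__':
    for timestamp, packet in iter_pcap(io.BytesIO(build_sample_pcap())):
        print(timestamp, packet)
```

In practice dpkt.pcap.Reader does this work for you; the sketch only shows which header information is stripped before each packet reaches decodePacket.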

The principle behind dpkt is very simple: each protocol has its own class that is responsible for parsing that protocol layer. The input is the layer's binary data, and the output is that binary stream parsed into separate, meaningful fields. In the example above, dpkt.pcap.Reader first reads the pcap file and strips the header information around each packet. The data is then passed up layer by layer, and each layer is parsed by the class for the specific protocol it carries. Finally, the application-layer data is parsed by the HTTP class dpkt.http.Response. Its input is the HTTP data handed up from the lower layer as a binary stream, i.e. the TCP payload (without SSL). Its output includes the HTTP header information and the HTTP body. If the body is compressed, dpkt also provides a class for gzip decompression.
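
To make the layering concrete, here is a minimal standard-library sketch of what each per-protocol parser conceptually does: take the bytes of one layer, read its header fields, and hand the payload to the next layer. It is not dpkt's code. All function names are hypothetical, the frame is hand-built for the demonstration, and the HTTP part assumes a complete, gzip-encoded response in a single segment.

```python
import gzip
import struct


def parse_ethernet(frame):
    """Return (ethertype, payload) from a raw Ethernet II frame."""
    ethertype, = struct.unpack('!H', frame[12:14])
    return ethertype, frame[14:]


def parse_ipv4(packet):
    """Return (protocol, payload) from a raw IPv4 packet."""
    header_len = (packet[0] & 0x0F) * 4  # IHL field, in bytes
    total_len, = struct.unpack('!H', packet[2:4])
    return packet[9], packet[header_len:total_len]


def parse_tcp(segment):
    """Return (src_port, dst_port, payload) from a raw TCP segment."""
    sport, dport = struct.unpack('!HH', segment[:4])
    header_len = (segment[12] >> 4) * 4  # data offset field, in bytes
    return sport, dport, segment[header_len:]


def parse_http_response(stream):
    """Split a complete HTTP response into (status, headers, body)."""
    head, _, body = stream.partition(b'\r\n\r\n')
    lines = head.split(b'\r\n')
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b':')
        headers[name.strip().lower().decode()] = value.strip().decode()
    if headers.get('content-encoding') == 'gzip':
        body = gzip.decompress(body)
    return status, headers, body


def build_sample_frame():
    """Hand-build Ethernet/IPv4/TCP/HTTP bytes (checksums left as zero)."""
    body = gzip.compress(b'hello')
    http = (b'HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n'
            b'Content-Length: ' + str(len(body)).encode() + b'\r\n\r\n' + body)
    tcp = struct.pack('!HHIIBBHHH', 80, 12345, 0, 0, 5 << 4, 0x18, 8192, 0, 0) + http
    ip = struct.pack('!BBHHHBBH4s4s', 0x45, 0, 20 + len(tcp), 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])) + tcp
    return bytes(6) + bytes(6) + struct.pack('!H', 0x0800) + ip


if __name__ == '__main__':
    ethertype, ip_packet = parse_ethernet(build_sample_frame())
    protocol, tcp_segment = parse_ipv4(ip_packet)
    sport, dport, stream = parse_tcp(tcp_segment)
    status, headers, body = parse_http_response(stream)
    print(hex(ethertype), protocol, sport, dport, status, body)
```

With dpkt the same chain is dpkt.ethernet.Ethernet, then eth.data, then ip.data, then dpkt.http.Response, as in the example earlier in this article.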

While decoding, I suggest reading the dpkt source. The main code lives in the dpkt folder, and the files are organized by protocol: one protocol usually corresponds to one .py file, so the source is very clear to read. While using dpkt I found that the documentation is not particularly detailed, but the code is clearly organized, and the source also shows which decoded fields each class outputs.

The dpkt example above is still very simple: it decodes packets one at a time. A stream, however, contains multiple packets, and the packets of an HTTP response must be combined before they express complete information such as an HTML page. A capture also usually contains multiple streams whose packets are interleaved. With dpkt you therefore first need to build a flow table that organizes the packets of each stream and splices them together. If you use Wireshark, the flow table has already been built for you and you can use the stream index directly, but in dpkt you have to start from scratch. I will cover building the flow table in a follow-up article.
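
As a rough illustration of the flow-table idea (a sketch under simplifying assumptions, not the construction from the follow-up article), segments can be grouped by a direction-independent key and each direction spliced together in sequence-number order. The function names are hypothetical. With dpkt, the tuple fields would come from ip.src, ip.dst, tcp.sport, tcp.dport, tcp.seq and tcp.data. The sketch ignores sequence-number wrap-around, gaps, and overlapping segments.

```python
from collections import defaultdict


def flow_key(src_ip, src_port, dst_ip, dst_port):
    """Return a key that is the same for both directions of a connection."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (a, b) if a <= b else (b, a)


def build_flow_table(segments):
    """Group TCP payloads by flow and by sending endpoint.

    segments: iterable of (src_ip, src_port, dst_ip, dst_port, seq, payload)
    returns:  {flow_key: {(src_ip, src_port): {seq: payload}}}
    """
    table = defaultdict(lambda: defaultdict(dict))
    for src_ip, src_port, dst_ip, dst_port, seq, payload in segments:
        if not payload:
            continue  # pure ACK/SYN/FIN segments carry no application data
        key = flow_key(src_ip, src_port, dst_ip, dst_port)
        # setdefault keeps the first copy, so retransmissions are ignored
        table[key][(src_ip, src_port)].setdefault(seq, payload)
    return table


def reassemble(direction):
    """Concatenate one direction's payloads in sequence-number order."""
    return b''.join(direction[seq] for seq in sorted(direction))


def demo_segments():
    """Two interleaved flows, with one out-of-order and one repeated segment."""
    return [
        ('10.0.0.1', 5000, '10.0.0.2', 80, 1, b'GET /a HTTP/1.1\r\n\r\n'),
        ('10.0.0.3', 6000, '10.0.0.2', 80, 500, b'GET /b HTTP/1.1\r\n\r\n'),
        ('10.0.0.2', 80, '10.0.0.1', 5000, 1019, b'hello'),
        ('10.0.0.2', 80, '10.0.0.1', 5000, 1000, b'HTTP/1.1 200 OK\r\n\r\n'),
        ('10.0.0.2', 80, '10.0.0.1', 5000, 1019, b'hello'),
    ]


if __name__ == '__main__':
    table = build_flow_table(demo_segments())
    for key, directions in table.items():
        for sender, direction in directions.items():
            print(key, sender, reassemble(direction))
```

Once a direction has been spliced together like this, the resulting bytes can be passed to dpkt.http.Response as a whole, which avoids the incomplete-response failures of the per-packet example.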

This is an original article by the CSDN blogger "village teenager". If you reprint it, please keep this note; the blogger's link is here .


Origin blog.csdn.net/javajiawei/article/details/100513267