National standard GB28181 protocol client development (4) Real-time video data transmission

This article is the fourth in the series "National Standard GB28181 Protocol Device Side Development" and walks through real-time video data transmission on the device side: interpreting the SDP carried in the INVITE message, reading and parsing a video or picture file, encoding the data, encapsulating the H.264 stream into PS format, and finally sending it over RTP. Each module's implementation steps and key technical points are introduced in turn to help readers understand and apply the GB28181 protocol for real-time video transmission.

1. Interpreting the SDP information in the INVITE message

In the GB28181 protocol, during real-time audio and video transmission, INVITE messages are used to carry SDP (Session Description Protocol) information. SDP information describes the attributes and parameters of the session, including media type, transport protocol, codec, network address, etc. The following is the SDP content of an example INVITE message, with a detailed explanation of each item:

v=0
o=34020000002000000001 0 0 IN IP4 192.168.1.10
s=Play
c=IN IP4 192.168.1.10
t=0 0
m=video 40052 RTP/AVP 96
a=recvonly
a=rtpmap:96 PS/90000
y=0358902090
f=
  1. v=0

    Indicates the SDP protocol version number, here it is 0.

  2. o=34020000002000000001 0 0 IN IP4 192.168.1.10

    The o field identifies the originator of the session and uniquely identifies the session.
    "34020000002000000001" is the SIP ID of the session originator.
    The two "0" values are the session ID and the session version.
    IN IP4 192.168.1.10 gives the network type, address type, and IPv4 address of the originator.

  3. s=Play

    The s field is the name or description of the session. Here it is "Play", which means real-time audio and video.

  4. c=IN IP4 192.168.1.10

    The c field specifies the connection information of the session.
    IN indicates that the network type is Internet.
    IP4 192.168.1.10 represents the IPv4 address of the session.

  5. t=0 0

    The t field specifies the time information of the session.
    0 0 means that the start and end time of the session are both 0, that is, the duration is undefined.

  6. m=video 40052 RTP/AVP 96

    The m field defines the media type and related parameters in the session.
    video indicates that the media type is video.
    40052 represents the transmission port number of the media stream.
    RTP/AVP means that the transmission protocol is RTP and configured using AVP (Audio-Visual Profile).
    96 is the dynamic RTP payload type number assigned to this media stream.

  7. a=rtpmap:96 PS/90000

    The a field contains the properties of the media stream.
    rtpmap:96 maps payload type number 96 to the format described next.
    PS means using MPEG-PS format for data encapsulation.
    90000 represents the clock rate, which is the number of clock ticks per second.

  8. y=0358902090

    The y field is a decimal integer string representing the SSRC value; in GB28181 the first digit is 0 for a live stream and 1 for playback.

  9. f=

    The format of the f field is: f= v/encoding format/resolution/frame rate/bit rate type/bit rate size a/encoding format/bit rate size/sampling rate.
    The f field is not set here; it is filled in by the data sender.
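Before the media session starts, the device side needs to pull a few of these fields out of the INVITE body. The following is a minimal sketch, not a full SDP parser; the struct and function names (`InviteMediaInfo`, `ParseInviteSdp`) are illustrative and not part of any GB28181 library.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Fields the device side needs from the INVITE SDP:
// the media port and payload type from the m= line,
// and the SSRC from the GB28181-specific y= line.
struct InviteMediaInfo {
    int mediaPort = 0;    // from the m= line
    int payloadType = 0;  // from the m= line (96 for PS)
    uint32_t ssrc = 0;    // from the y= line (decimal string)
};

inline InviteMediaInfo ParseInviteSdp(const std::string& sdp) {
    InviteMediaInfo info;
    std::istringstream in(sdp);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("m=video", 0) == 0) {
            // m=video <port> RTP/AVP <payload type>
            std::istringstream m(line.substr(2));
            std::string media, proto;
            m >> media >> info.mediaPort >> proto >> info.payloadType;
        } else if (line.rfind("y=", 0) == 0) {
            info.ssrc = static_cast<uint32_t>(std::stoul(line.substr(2)));
        }
    }
    return info;
}
```

With the example SDP above, this yields media port 40052, payload type 96, and SSRC 358902090.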

2. Reading, parsing and encoding video or picture files

To transmit video data, we first need to read and parse the video or picture file. Using an appropriate library or tool, we read the raw data from the file and parse it to obtain the key video frames or image data, ready for subsequent encoding and packaging.
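For an H.264 file, "parsing" typically means splitting the Annex-B byte stream into NALUs at the 00 00 01 / 00 00 00 01 start codes. Below is a minimal sketch assuming the whole stream is already in memory; the `NaluRef` and `SplitAnnexB` names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A reference to one NALU inside an Annex-B buffer.
struct NaluRef {
    size_t offset;  // offset of the start code in the buffer
    size_t length;  // length including the start code
};

// Scan the buffer for 3- or 4-byte start codes and split it into NALUs.
inline std::vector<NaluRef> SplitAnnexB(const uint8_t* data, size_t size) {
    std::vector<NaluRef> nalus;
    size_t i = 0;
    size_t start = SIZE_MAX;  // offset of the NALU currently being scanned
    while (i + 3 <= size) {
        bool sc3 = data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1;
        bool sc4 = i + 4 <= size && data[i] == 0 && data[i + 1] == 0 &&
                   data[i + 2] == 0 && data[i + 3] == 1;
        if (sc3 || sc4) {
            if (start != SIZE_MAX)
                nalus.push_back({start, i - start});  // close previous NALU
            start = i;
            i += sc4 ? 4 : 3;
        } else {
            ++i;
        }
    }
    if (start != SIZE_MAX)
        nalus.push_back({start, size - start});  // last NALU runs to the end
    return nalus;
}
```

In a real device the buffer would come from a file read or a live encoder, and each returned NALU would be handed to the PS encapsulation step described next.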

3. Encapsulating H.264 into PS

In the GB28181 protocol, video data is usually encapsulated in MPEG-PS (MPEG Program Stream) format. The encoded video data needs to be wrapped in PS format, including adding the pack header and start codes, and then further encapsulated into RTP packets.

The following is the main process of using C++ to encapsulate H.264 NALU into MPEG-PS format (only part of the code is shown):

// Encapsulate an H.264 NALU buffer into MPEG-PS format
void MakeMPEGPS(unsigned char* h264Data, int h264Length,
    unsigned char* psData)
{
    int totalPES = (h264Length + MAX_PES_LENGTH - 1) / MAX_PES_LENGTH; // total number of PES packets
    int remainingBytes = h264Length; // bytes left to process

    // MPEG-PS pack start code
    unsigned char mpegPSHeader[] = {0x00, 0x00, 0x01, 0xBA};

    // Split the H.264 data and encapsulate each segment
    for (int i = 0; i < totalPES; i++)
    {
        unsigned char* pbuf = psData;

        int pesLength = (remainingBytes > MAX_PES_LENGTH) ? MAX_PES_LENGTH : remainingBytes; // length of the current PES packet
        remainingBytes -= pesLength; // update the remaining byte count

        // PES packet header (video stream id 0xE0)
        unsigned char pesHeader[] = {0x00, 0x00, 0x01, 0xE0, 0x00, 0x00, 0x80, 0x00};

        // Fill in the PES packet length field
        pesHeader[4] = (pesLength + 8) >> 8;   // high 8 bits
        pesHeader[5] = (pesLength + 8) & 0xFF; // low 8 bits

        // Write the MPEG-PS pack header and the current PES header
        memcpy(pbuf, mpegPSHeader, sizeof(mpegPSHeader));
        pbuf += sizeof(mpegPSHeader);

        memcpy(pbuf, pesHeader, sizeof(pesHeader));
        pbuf += sizeof(pesHeader);

        // Write the H.264 payload of the current PES packet
        memcpy(pbuf, h264Data + (i * MAX_PES_LENGTH), pesLength);
        pbuf += pesLength;

        int payload_len = (pbuf - psData);

        // Build the RTP packet(s) and send
        MakeAndSendRTP(psData, payload_len);
    }
}

Note that when an H.264 frame is large, it exceeds what the 16-bit PES_packet_length field can express. In that case the frame must be segmented, encapsulated into multiple PES packets, and then combined into one PS pack.
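The PES length limit can be made concrete with a small sketch. `MAX_PES_LENGTH` here is an assumed constant chosen with headroom below the 16-bit ceiling of 65535, and `CountPesPackets` mirrors the `totalPES` calculation in the code above.

```cpp
// One PES packet can carry at most 65535 bytes after the
// PES_packet_length field (it is a 16-bit value). We pick a
// segment size a little below that to leave room for the
// optional PES header bytes that also count toward the length.
const int MAX_PES_LENGTH = 65400;  // assumed constant, below 65535

// Number of PES packets needed for one H.264 frame (ceiling division).
inline int CountPesPackets(int frameLength) {
    return (frameLength + MAX_PES_LENGTH - 1) / MAX_PES_LENGTH;
}
```

A typical I-frame of a few hundred kilobytes therefore spans several PES packets, while most P-frames fit in one.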

4. RTP data sending

The logic of RTP data sending is relatively simple: the PS data is split into chunks no larger than the maximum RTP payload size, and each chunk is prefixed with an RTP header and sent.

The following is the demo code for RTP encapsulation (only part of the code is shown):

struct RTPHeader
{
    // Note: bit-field packing is compiler- and endianness-dependent;
    // this ordering assumes the common big-endian bit layout.
    uint8_t version: 2;      // RTP version, always 2
    uint8_t padding: 1;      // padding flag
    uint8_t extension: 1;    // extension flag
    uint8_t csrcCount: 4;    // number of CSRC identifiers
    uint8_t marker: 1;       // marker bit
    uint8_t payloadType: 7;  // payload type
    uint16_t sequenceNumber; // sequence number
    uint32_t timestamp;      // timestamp (90 kHz clock)
    uint32_t ssrc;           // synchronization source identifier
};

void MakeRTPHeader(struct RTPHeader* header, uint16_t sequenceNumber, uint32_t timestamp, uint32_t ssrc, bool isMark)
{
    // RTP version is always 2
    header->version = 2;
    // Padding, extension, and CSRC count are not used here
    header->padding = 0;
    header->extension = 0;
    header->csrcCount = 0;
    // Marker bit: set to 1 on the last packet of a frame
    header->marker = isMark ? 1 : 0;
    // Payload type 96, matching the rtpmap line in the SDP
    header->payloadType = 96;
    // Sequence number and timestamp in network byte order
    header->sequenceNumber = htons(sequenceNumber);
    header->timestamp = htonl(timestamp);

    // Synchronization source identifier in network byte order
    header->ssrc = htonl(ssrc);
}

void sendRTPPacket(const uint8_t* mpegPSData, int mpegPSLength, uint16_t sequenceNumber, uint32_t timestamp, uint32_t ssrc)
{
    int offset = 0; // offset while walking through the MPEG-PS data
    int remainingLength = mpegPSLength; // bytes left, used to decide whether to split
    uint8_t rtpbuf[RTP_HEADER_LEN + RTP_PAYLOAD_MAX_SIZE]; // buffer for header + payload
    struct RTPHeader rtpHeader; // RTP packet header

    while (remainingLength > 0)
    {
        // Current payload length (at most RTP_PAYLOAD_MAX_SIZE);
        // set the marker bit on the last packet of the frame
        bool is_mark = false;
        int data_len = RTP_PAYLOAD_MAX_SIZE;
        if (remainingLength <= RTP_PAYLOAD_MAX_SIZE)
        {
            data_len = remainingLength;
            is_mark = true;
        }

        // Fill in the RTP header
        MakeRTPHeader(&rtpHeader, sequenceNumber, timestamp, ssrc, is_mark);

        // Copy the RTP header into the send buffer
        memcpy(rtpbuf, &rtpHeader, sizeof(RTPHeader));

        // Copy the MPEG-PS payload after the header
        memcpy(rtpbuf + RTP_HEADER_LEN, mpegPSData + offset, data_len);

        // Send the complete RTP packet
        if (udp_channel_)
        {
            udp_channel_->PostSendBuf(rtpbuf, RTP_HEADER_LEN + data_len);
        }

        // Advance the offset, remaining length, and sequence number
        offset += data_len;
        remainingLength -= data_len;
        sequenceNumber++;
    }
}
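One detail worth noting when driving sendRTPPacket frame by frame: the a=rtpmap:96 PS/90000 line means the RTP timestamp runs on a 90 kHz clock, so the timestamp should advance by 90000 / fps per frame while the sequence number advances per packet. A small helper sketch; the frame rates used are only illustrative.

```cpp
#include <cstdint>

// Ticks to add to the RTP timestamp per frame, given the RTP clock
// rate from the SDP (90000 for PS) and the video frame rate.
inline uint32_t TimestampStep(uint32_t clockRate, uint32_t fps) {
    return clockRate / fps;  // e.g. 90000 / 25 = 3600 ticks per frame
}
```

So for a 25 fps stream every frame's packets share one timestamp, and the next frame's timestamp is 3600 ticks later.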


For cooperation, please contact the author hbstream ( http://haibindev.cnblogs.com ). When reprinting, please credit the author and source.

Reprinted from: blog.csdn.net/haibindev/article/details/131570615