Dialogue with Audio/Video Niu Ge: How to Design a Fully Functional Cross-Platform Low-Latency RTMP Player

Development Background

In 2015, while working on a mobile individual emergency command project, we adopted RTMP on the push (publish) side, which was relatively early for RTMP adoption. After finishing the RTMP push module, we tested the end-to-end latency with the players then available, VLC and Vitamio, and the results were genuinely unsatisfactory. As you might expect, an emergency command system demands not only stability but also low latency: a delay of more than 3-5 seconds was unacceptable to us. Players such as VLC, while feature-rich and adequate for most on-demand scenarios, simply did not meet the needs of live broadcast.

For this reason, we decided to develop an RTMP player tailored to low-latency scenarios, starting with the Windows platform. Since the existing open-source players were designed to be large and all-encompassing, and therefore ill-suited to live broadcast, and since we had ample time, we built the player on a framework of our own. When the first version shipped, latency was already at the millisecond level, which was genuinely gratifying at the time, and still is.

Overall Architecture

The goal of an RTMP live player is clear: pull stream data from an RTMP server (self-built or CDN), then parse it, decode it, synchronize audio and video, and render.

Specifically, this corresponds to the "receiving end" part of the figure below:

First Version Design Goals

  • Self-developed framework, easy to extend;
  • Handling of abnormal network conditions, such as disconnection and reconnection;
  • Event/status callbacks, so developers can track the overall state of the player;
  • Multi-instance playback;
  • Video: H.264; audio: AAC/PCMA/PCMU;
  • Configurable buffer time;
  • Audio/video synchronization;
  • Real-time mute.

Features After Iteration

  • [Playback protocol] RTMP with millisecond-level latency (200-400 ms in low-latency mode);
  • [Multi-instance playback] Multiple concurrent instances (with low CPU usage);
  • [Event callbacks] Network status, buffer status, and other callbacks;
  • [Video formats] H.264 and H.265 (RTMP extension);
  • [Audio formats] AAC/PCMA/PCMU/Speex;
  • [H.264/H.265 software decoding] Software decoding of H.264/H.265;
  • [H.264 hardware decoding] H.264 hardware decoding on Windows/Android/iOS;
  • [H.265 hardware decoding] H.265 hardware decoding on Windows/Android/iOS;
  • [H.264/H.265 hardware decoding] On Android, both Surface-mode and normal-mode hardware decoding;
  • [Buffer time setting] Configurable buffer time;
  • [Instant first frame] Fast first-frame display (when the RTMP server caches a GOP);
  • [Low-latency mode] Ultra-low-latency mode (200-400 ms over the public Internet);
  • [Complex network handling] Automatic adaptation to disconnection, reconnection, and other network conditions;
  • [Fast URL switching] Switch to another URL during playback with fast content switchover;
  • [Multiple rendering backends] On Android, video: SurfaceView/OpenGL ES; audio: AudioTrack/OpenSL ES;
  • [Real-time mute] Mute/unmute during playback;
  • [Real-time volume adjustment] Adjust playback volume during playback, within the range [0, 100];
  • [Real-time snapshot] Capture the current video frame during playback;
  • [Key-frame-only playback] On Windows, toggle key-frame-only playback in real time;
  • [Rendering angle] Render the video at 0°, 90°, 180°, or 270°;
  • [Rendering mirror] Horizontal and vertical flip modes;
  • [Real-time download speed] Real-time callback of the current download speed (with configurable callback interval);
  • [ARGB overlay] On Windows, overlay ARGB images on the video (see the C++ demo);
  • [Pre-decode video data callback] H.264/H.265 data callback;
  • [Post-decode video data callback] YUV/RGB data callback after decoding;
  • [Scaled post-decode video callback] On Windows, the callback image size can be specified (the original frame is scaled before being passed to the upper layer);
  • [Pre-decode audio data callback] AAC/PCMA/PCMU/Speex data callback;
  • [Audio/video adaptation] Self-adaptation when audio/video stream parameters change during playback;
  • [Extended recording] Record RTMP H.264 and extended H.265 streams; transcode PCMA/PCMU/Speex to AAC before recording; record audio only or video only; etc.

Interface Design

On the Windows platform, we expose a C interface and provide C++ and C# sample callers. This article uses the C++ demo to briefly walk through the commonly used interfaces.

1. Init/UnInit() interface

The Init and UnInit interfaces need to be called only once each, even when multiple playback instances are started: Init performs the basic initialization, and UnInit the final de-initialization.

/*
flag: pass 0 for now (reserved for future extension); pReserve: pass NULL (reserved)
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *Init)(NT_UINT32 flag, NT_PVOID pReserve);

/*
This is the last interface to be called
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *UnInit)();
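
A minimal call sequence, sketched against the demo's player_api_ function table (the control flow here is a hypothetical example, not lifted from the demo):

// Initialize the SDK once at startup; flag is 0 and pReserve is NULL (both reserved).
if (NT_ERC_OK != player_api_.Init(0, NULL))
{
    return; // initialization failed; no other SDK call is valid
}

// ... Open()/Close() any number of player instances here ...

// De-initialize once at shutdown; this must be the last SDK call.
player_api_.UnInit();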

2. Open/Close() interface

The Open interface creates an instance and, on success, returns a player instance handle. If you need to play multiple streams, simply create multiple instances.

The Close interface, the counterpart of Open(), releases the resources of the corresponding instance. After calling Close(), remember to set the instance handle to 0.

Note: a single instance can do more than play; it can also record and pull (forward) the stream at the same time. In that case, make sure recording and stream pulling have been stopped normally before calling the Close() interface.

/*
flag: pass 0 for now (reserved); pReserve: pass NULL (reserved)
NT_HWND hwnd: the window used for rendering; may be set to NULL
Obtains the instance handle
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *Open)(NT_PHANDLE pHandle, NT_HWND hwnd, NT_UINT32 flag, NT_PVOID pReserve);

/*
The handle becomes invalid after this call
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *Close)(NT_HANDLE handle);
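
A minimal open/close sketch, again assuming the demo's player_api_ table and a render window handle hwnd (the StartXXX/StopXXX calls that would sit between Open and Close are omitted):

NT_HANDLE player_handle_ = NULL;

// Create one player instance; hwnd is the render window and may be NULL.
if (NT_ERC_OK != player_api_.Open(&player_handle_, hwnd, 0, NULL))
{
    return; // failed to create the instance
}

// ... configure and start playback/recording/pulling, then stop them ...

// Stop recording/pulling first, then release the instance and clear the handle.
player_api_.Close(player_handle_);
player_handle_ = 0;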

3. Network status callback

For a good player, solid status callbacks are essential: real-time feedback on network connection status, snapshots, video status, current download speed, and so on lets upper-layer developers track the player's state and give users a better playback experience.

/*
Sets the event callback; if you want to listen for events, it is recommended
to call this interface right after Open() succeeds
*/
NT_UINT32(NT_API *SetEventCallBack)(NT_HANDLE handle,
    NT_PVOID call_back_data, NT_SP_SDKEventCallBack call_back);

Demo implementation example:

LRESULT CSmartPlayerDlg::OnSDKEvent(WPARAM wParam, LPARAM lParam)
{
    if (!is_playing_ && !is_recording_)
    {
        return S_OK;
    }

    NT_UINT32 event_id = (NT_UINT32)(wParam);

    if ( NT_SP_E_EVENT_ID_PLAYBACK_REACH_EOS == event_id )
    {
        StopPlayback();
        return S_OK;
    }
    else if ( NT_SP_E_EVENT_ID_RECORDER_REACH_EOS == event_id )
    {
        StopRecorder();
        return S_OK;
    }
    else if ( NT_SP_E_EVENT_ID_RTSP_STATUS_CODE == event_id )
    {
        int status_code = (int)lParam;
        if ( 401 == status_code )
        {
            HandleVerification();
        }

        return S_OK;
    }
    else if (NT_SP_E_EVENT_ID_NEED_KEY == event_id)
    {
        HandleKeyEvent(false);

        return S_OK;
    }
    else if (NT_SP_E_EVENT_ID_KEY_ERROR == event_id)
    {
        HandleKeyEvent(true);

        return S_OK;
    }
    else if ( NT_SP_E_EVENT_ID_PULLSTREAM_REACH_EOS == event_id )
    {
        if (player_handle_ != NULL)
        {
            player_api_.StopPullStream(player_handle_);
        }

        return S_OK;
    }
    else if ( NT_SP_E_EVENT_ID_DURATION == event_id )
    {
        NT_INT64 duration = (NT_INT64)(lParam);

        edit_duration_.SetWindowTextW(GetHMSMsFormatStr(duration, false, false).c_str());

        return S_OK;
    }

    if ( NT_SP_E_EVENT_ID_CONNECTING == event_id
        || NT_SP_E_EVENT_ID_CONNECTION_FAILED == event_id
        || NT_SP_E_EVENT_ID_CONNECTED == event_id
        || NT_SP_E_EVENT_ID_DISCONNECTED == event_id
        || NT_SP_E_EVENT_ID_NO_MEDIADATA_RECEIVED == event_id)
    {
        if ( NT_SP_E_EVENT_ID_CONNECTING == event_id )
        {
            OutputDebugStringA("connection status: connecting\r\n");
        }
        else if ( NT_SP_E_EVENT_ID_CONNECTION_FAILED == event_id )
        {
            OutputDebugStringA("connection status: connection failed\r\n");
        }
        else if ( NT_SP_E_EVENT_ID_CONNECTED == event_id )
        {
            OutputDebugStringA("connection status: connected\r\n");
        }
        else if (NT_SP_E_EVENT_ID_DISCONNECTED == event_id)
        {
            OutputDebugStringA("connection status: disconnected\r\n");
        }
        else if (NT_SP_E_EVENT_ID_NO_MEDIADATA_RECEIVED == event_id)
        {
            OutputDebugStringA("connection status: no mediadata received\r\n");
        }

        connection_status_ = event_id;
    }

    if ( NT_SP_E_EVENT_ID_START_BUFFERING == event_id
        || NT_SP_E_EVENT_ID_BUFFERING == event_id
        || NT_SP_E_EVENT_ID_STOP_BUFFERING == event_id )
    {
        buffer_status_ = event_id;

        if ( NT_SP_E_EVENT_ID_BUFFERING == event_id )
        {
            buffer_percent_ = (NT_INT32)lParam;

            std::wostringstream ss;
            ss << L"buffering:" << buffer_percent_ << "%";
            OutputDebugStringW(ss.str().c_str());
            OutputDebugStringW(L"\r\n");
        }
    }

    if ( NT_SP_E_EVENT_ID_DOWNLOAD_SPEED == event_id )
    {
        download_speed_ = (NT_INT32)lParam;

        /*std::wostringstream ss;
        ss << L"downloadspeed:" << download_speed_ << L"\r\n";

        OutputDebugStringW(ss.str().c_str());*/
    }

    CString show_str = base_title_;

    if ( connection_status_ != 0 )
    {
        show_str += _T("--connection status: ");

        if ( NT_SP_E_EVENT_ID_CONNECTING == connection_status_ )
        {
            show_str += _T("connecting");
        }
        else if ( NT_SP_E_EVENT_ID_CONNECTION_FAILED == connection_status_ )
        {
            show_str += _T("connection failed");
        }
        else if ( NT_SP_E_EVENT_ID_CONNECTED == connection_status_ )
        {
            show_str += _T("connected");
        }
        else if ( NT_SP_E_EVENT_ID_DISCONNECTED == connection_status_ )
        {
            show_str += _T("disconnected");
        }
        else if (NT_SP_E_EVENT_ID_NO_MEDIADATA_RECEIVED == connection_status_)
        {
            show_str += _T("no data received");
        }
    }

    if (download_speed_ != -1)
    {
        std::wostringstream ss;
        ss << L"--download speed:" << (download_speed_ * 8 / 1000) << "kbps"
          << L"(" << (download_speed_ / 1024) << "KB/s)";

        show_str += ss.str().c_str();
    }

    if ( buffer_status_ != 0 )
    {
        show_str += _T("--buffer status: ");

        if ( NT_SP_E_EVENT_ID_START_BUFFERING == buffer_status_ )
        {
            show_str += _T("start buffering");
        }
        else if (NT_SP_E_EVENT_ID_BUFFERING == buffer_status_)
        {
            std::wostringstream ss;
            ss << L"buffering " << buffer_percent_ << "%";
            show_str += ss.str().c_str();
        }
        else if (NT_SP_E_EVENT_ID_STOP_BUFFERING == buffer_status_)
        {
            show_str += _T("stop buffering");
        }
    }

    SetWindowText(show_str);

    return S_OK;
}

4. Soft decoding or hard decoding?

Generally speaking, on the Windows platform, if you are not playing many instances at once and the resolution is not too high, we recommend preferring software decoding for the playback experience. If a specific device needs many playback channels, hardware decoding is also worth considering. Note that before enabling hardware decoding, you need to check whether the device supports it; the interfaces are as follows:

/*
Checks whether H.264 hardware decoding is supported
Returns NT_ERC_OK if supported
*/
NT_UINT32(NT_API *IsSupportH264HardwareDecoder)();


/*
Checks whether H.265 hardware decoding is supported
Returns NT_ERC_OK if supported
*/
NT_UINT32(NT_API *IsSupportH265HardwareDecoder)();


/*
*Enables/disables H.264 hardware decoding
*is_hardware_decoder: 1: hardware decoding, 0: no hardware decoding
*reserve: reserved parameter, pass 0 for now
*Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *SetH264HardwareDecoder)(NT_HANDLE handle, NT_INT32 is_hardware_decoder, NT_INT32 reserve);


/*
*Enables/disables H.265 hardware decoding
*is_hardware_decoder: 1: hardware decoding, 0: no hardware decoding
*reserve: reserved parameter, pass 0 for now
*Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *SetH265HardwareDecoder)(NT_HANDLE handle, NT_INT32 is_hardware_decoder, NT_INT32 reserve);
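
A check-then-enable sketch based on the interfaces above (assuming the demo's player_api_ and player_handle_), falling back to software decoding when hardware support is missing:

// Enable H.264/H.265 hardware decoding only if the device supports it.
NT_INT32 h264_hw = (NT_ERC_OK == player_api_.IsSupportH264HardwareDecoder()) ? 1 : 0;
player_api_.SetH264HardwareDecoder(player_handle_, h264_hw, 0);

NT_INT32 h265_hw = (NT_ERC_OK == player_api_.IsSupportH265HardwareDecoder()) ? 1 : 0;
player_api_.SetH265HardwareDecoder(player_handle_, h265_hw, 0);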

5. Decode only key frames

On mobile, there is usually little real-world demand for playing only key frames. On Windows, however, many scenarios need to play a large number of channels while keeping system resource usage low: with full-frame playback across too many channels, decoding and drawing them all drives up resource usage. If handled flexibly, you can play only key frames on most channels and switch any channel back to full-frame playback at any time, which greatly lowers the system's performance requirements.

/*
*Sets whether to decode only video key frames
*is_only_dec_key_frame: 1: decode only key frames, 0: decode all frames (default 0)
*Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *SetOnlyDecodeVideoKeyFrame)(NT_HANDLE handle, NT_INT32 is_only_dec_key_frame);
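
In a multi-channel wall, for example, background channels can run key frames only while the focused channel decodes every frame (a sketch; background_handle and focused_handle are hypothetical instance handles):

// Background channels: decode key frames only, to keep CPU usage down.
player_api_.SetOnlyDecodeVideoKeyFrame(background_handle, 1);

// Focused channel: switch back to full-frame decoding at any time.
player_api_.SetOnlyDecodeVideoKeyFrame(focused_handle, 0);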

6. Buffer time setting

Buffer time, as the name suggests, is how much data is buffered before playback starts. For example, with a buffer time of 2000 ms, in live mode playback starts only after 2 seconds of data have been received.

Increasing the buffer time increases playback latency; the benefit is better smoothness when the network jitters.

/*
Sets the buffer time; the minimum is 0 ms
*/
NT_UINT32(NT_API *SetBuffer)(NT_HANDLE handle, NT_INT32 buffer);
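
The trade-off in code, using the demo's player_api_ (a sketch):

// Jittery network: accept ~2 s of extra latency for smoother playback.
player_api_.SetBuffer(player_handle_, 2000);

// Latency-critical scenario: no buffering at all (see the low-latency mode below).
player_api_.SetBuffer(player_handle_, 0);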

7. Real-time mute, real-time volume adjustment

Real-time mute and volume adjustment are just what they sound like: the player can adjust the playback volume in real time, or mute it outright. This is especially necessary in multi-channel playback scenarios.

/*
Mute interface: 1 mutes, 0 unmutes
*/
NT_UINT32(NT_API *SetMute)(NT_HANDLE handle, NT_INT32 is_mute);

/*
Sets the playback volume; the range is [0, 100], where 0 is mute and 100 is the maximum volume (default 100)
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *SetAudioVolume)(NT_HANDLE handle, NT_INT32 volume);
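
In a multi-channel scenario, a common pattern is to mute every channel except the selected one (a sketch; handles and selected_handle are hypothetical):

// Mute all channels first.
for (auto h : handles)
{
    player_api_.SetMute(h, 1);
}

// Unmute the selected channel and set a moderate volume (range [0, 100]).
player_api_.SetMute(selected_handle, 0);
player_api_.SetAudioVolume(selected_handle, 80);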

8. Set the video screen fill mode

Sets how the video fills the view: stretch to fill the entire view, or scale proportionally within it. If not set, the default is to fill the entire view.

The relevant interface design is as follows:

player_api_.SetRenderScaleMode(player_handle_, btn_check_render_scale_mode_.GetCheck() == BST_CHECKED ? 1 : 0);

9. Quick start

Fast startup mainly applies when the server caches a GOP: the player flushes through the cached data to the latest frames quickly while keeping the picture continuous.

/*
Sets fast startup: 1 enables it, 0 disables it
*/
NT_UINT32(NT_API* SetFastStartup)(NT_HANDLE handle, NT_INT32 isFastStartup);
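
For example, enabling it before playback starts is enough when the server caches a GOP (a sketch with the demo's player_api_):

// Show the first frame as soon as the server-cached GOP arrives.
player_api_.SetFastStartup(player_handle_, 1);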

10. Low latency mode

In low-latency mode, set the buffer time to 0 for the lowest latency; this suits ultra-low-latency scenarios such as remote manipulation and control.

/*
Sets low-latency playback mode; the default is normal playback mode
mode: 1 for low-latency mode, 0 for normal mode; other values are invalid
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API* SetLowLatencyMode)(NT_HANDLE handle, NT_INT32 mode);
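
Combining the two knobs for an ultra-low-latency setup (a sketch with the demo's player_api_):

// Ultra-low-latency configuration: low-latency mode plus a 0 ms buffer.
player_api_.SetLowLatencyMode(player_handle_, 1);
player_api_.SetBuffer(player_handle_, 0);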

11. Video rotation, horizontal/vertical flip

These interfaces are mainly for scenarios where, for example, the source video is upside down and the device cannot be adjusted; the player can then render the image at the correct angle.

/*
*Vertical flip (upside down)
*is_flip: 1: flip, 0: do not flip
*/
NT_UINT32(NT_API *SetFlipVertical)(NT_HANDLE handle, NT_INT32 is_flip);


/*
*Horizontal flip
*is_flip: 1: flip, 0: do not flip
*/
NT_UINT32(NT_API *SetFlipHorizontal)(NT_HANDLE handle, NT_INT32 is_flip);


/*
Sets clockwise rotation
degress: only 0, 90, 180, and 270 degrees are valid; other values are invalid
Note: any angle other than 0 degrees consumes more CPU
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API* SetRotation)(NT_HANDLE handle, NT_INT32 degress);
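
For the upside-down camera case mentioned above, a single 180° rotation restores the image (a sketch; remember that non-zero angles cost extra CPU):

// Correct an upside-down source.
player_api_.SetRotation(player_handle_, 180);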

12. Real-time download speed reporting

These interfaces report the current download speed in real time; by toggling reporting and setting the reporting interval, the app layer and the underlying SDK interact in a friendlier way.

/*
Sets download speed reporting; by default the speed is not reported
is_report: reporting switch, 1: report, 0: do not report; other values are invalid
report_interval: reporting interval (frequency) in seconds, minimum once per second;
if it is less than 1 while reporting is enabled, the call fails
Note: if reporting is enabled, call SetEventCallBack and handle the event in the callback.
The reported event is NT_SP_E_EVENT_ID_DOWNLOAD_SPEED
This interface must be called before StartXXX
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *SetReportDownloadSpeed)(NT_HANDLE handle,
NT_INT32 is_report, NT_INT32 report_interval);


/*
Actively queries the download speed
speed: returns the download speed in Byte/s
(Note: this interface must be called after StartXXX, otherwise it fails)
Returns NT_ERC_OK on success
*/
NT_UINT32(NT_API *GetDownloadSpeed)(NT_HANDLE handle, NT_INT32* speed);
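
A sketch of both styles: passive reporting through the event callback of section 3, and active polling:

// Before StartXXX: report the speed every 2 seconds via the
// NT_SP_E_EVENT_ID_DOWNLOAD_SPEED event handled in the callback above.
player_api_.SetReportDownloadSpeed(player_handle_, 1, 2);

// After StartXXX: poll the current speed on demand.
NT_INT32 speed = 0;
if (NT_ERC_OK == player_api_.GetDownloadSpeed(player_handle_, &speed))
{
    // speed is in Byte/s; e.g. kbps = speed * 8 / 1000
}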

13. Live snapshot

Simply put: do you need to grab the current playback frame while playing?

/*
Captures an image
file_name_utf8: file name, UTF-8 encoded
call_back_data: user-defined data passed back in the callback
call_back: callback used to notify the user that the snapshot finished or failed
Returns NT_ERC_OK on success
This can only succeed while playing; in any other state it returns an error.
Because generating a PNG file is relatively slow (usually a few hundred milliseconds), the SDK
limits the number of pending snapshot requests to keep CPU usage down; once the limit is exceeded,
this interface returns NT_ERC_SP_TOO_MANY_CAPTURE_IMAGE_REQUESTS. In that case, wait a while
until the SDK has processed some requests, then retry.
*/
NT_UINT32(NT_API* CaptureImage)(NT_HANDLE handle, NT_PCSTR file_name_utf8,
NT_PVOID call_back_data, SP_SDKCaptureImageCallBack call_back);

The calling example is as follows:

void CSmartPlayerDlg::OnBnClickedButtonCaptureImage()
{
  if ( capture_image_path_.empty() )
  {
    AfxMessageBox(_T("Please set the directory for snapshot files first! Click the button to the left of the snapshot button to set it!"));
    return;
  }

  if ( player_handle_ == NULL )
  {
    return;
  }

  if ( !is_playing_ )
  {
    return;
  }

  std::wostringstream ss;
  ss << capture_image_path_;

  if ( capture_image_path_.back() != L'\\' )
  {
    ss << L"\\";
  }

  SYSTEMTIME sysTime;
  ::GetLocalTime(&sysTime);

  ss << L"SmartPlayer-"
    << std::setfill(L'0') << std::setw(4) << sysTime.wYear
    << std::setfill(L'0') << std::setw(2) << sysTime.wMonth
    << std::setfill(L'0') << std::setw(2) << sysTime.wDay
    << L"-"
    << std::setfill(L'0') << std::setw(2) << sysTime.wHour
    << std::setfill(L'0') << std::setw(2) << sysTime.wMinute
    << std::setfill(L'0') << std::setw(2) << sysTime.wSecond;

  ss << L"-" << std::setfill(L'0') << std::setw(3) << sysTime.wMilliseconds
    << L".png";

  std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;

  auto val_str = conv.to_bytes(ss.str());

  auto ret = player_api_.CaptureImage(player_handle_, val_str.c_str(), NULL, &SM_SDKCaptureImageHandle);
  if (NT_ERC_OK == ret)
  {
    // snapshot request submitted successfully
  }
  else if (NT_ERC_SP_TOO_MANY_CAPTURE_IMAGE_REQUESTS == ret)
  {
    // notify the user to retry after a delay
    OutputDebugStringA("Too many capture image requests!!!\r\n");
  }
  else
  {
    // other failure
  }
}

14. Extended recording

We have made playback-side recording quite detailed: for example, you can record only audio or only video, set the recording storage path, and cap the size of a single file. Audio that is not AAC can be transcoded to AAC before recording.

/*
* Sets whether video is recorded. By default, video is recorded if the source has it
* (and cannot be recorded if it doesn't), but in some scenarios you may want audio only,
* hence this switch
* is_record_video: 1 record video, 0 do not record video (default 1)
*/
NT_UINT32(NT_API *SetRecorderVideo)(NT_HANDLE handle, NT_INT32 is_record_video);


/*
* Sets whether audio is recorded. By default, audio is recorded if the source has it,
* but in some scenarios you may want video only, hence this switch
* is_record_audio: 1 record audio, 0 do not record audio (default 1)
*/
NT_UINT32(NT_API *SetRecorderAudio)(NT_HANDLE handle, NT_INT32 is_record_audio);


/*
Sets the local recording directory; it must be an ASCII (English) path, otherwise the call fails
*/
NT_UINT32(NT_API *SetRecorderDirectory)(NT_HANDLE handle, NT_PCSTR dir);

/*
Sets the maximum size of a single recording file; when it is exceeded, recording rolls over into a new file
size: in KB (1024 bytes); the current range is [5MB, 800MB], values outside it are clamped
*/
NT_UINT32(NT_API *SetRecorderFileMaxSize)(NT_HANDLE handle, NT_UINT32 size);

/*
Sets the recording file-name generation rule
*/
NT_UINT32(NT_API *SetRecorderFileNameRuler)(NT_HANDLE handle, NT_SP_RecorderFileNameRuler* ruler);


/*
Sets the recording callback interface
*/
NT_UINT32(NT_API *SetRecorderCallBack)(NT_HANDLE handle,
NT_PVOID call_back_data, SP_SDKRecorderCallBack call_back);


/*
Sets the switch for transcoding audio to AAC while recording. AAC is more universal, so the SDK
added transcoding of other audio codecs (e.g. Speex, PCMU, PCMA) to AAC.
is_transcode: if 1, audio that is not AAC is transcoded to AAC (AAC passes through unchanged);
if 0, no conversion is done. The default is 0.
Note: transcoding increases performance overhead
*/
NT_UINT32(NT_API *SetRecorderAudioTranscodeAAC)(NT_HANDLE handle, NT_INT32 is_transcode);


/*
Starts recording
*/
NT_UINT32(NT_API *StartRecorder)(NT_HANDLE handle);

/*
Stops recording
*/
NT_UINT32(NT_API *StopRecorder)(NT_HANDLE handle);
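
A typical audio-plus-video recording setup, sketched with the demo's player_api_ (the directory is a hypothetical example; the file-name ruler and recorder callback are omitted):

// Record both tracks and transcode non-AAC audio to AAC.
player_api_.SetRecorderVideo(player_handle_, 1);
player_api_.SetRecorderAudio(player_handle_, 1);
player_api_.SetRecorderAudioTranscodeAAC(player_handle_, 1);
player_api_.SetRecorderDirectory(player_handle_, "D:\\smartplayer_rec"); // ASCII path only
player_api_.SetRecorderFileMaxSize(player_handle_, 200 * 1024);          // in KB, i.e. ~200 MB per file

if (NT_ERC_OK == player_api_.StartRecorder(player_handle_))
{
    // ... recording; stop it before calling Close() ...
    player_api_.StopRecorder(player_handle_);
}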

15. Pull-stream encoded data callbacks (used with the forwarding module)

The pull-stream callbacks deliver encoded data, mainly for use with the forwarding module: for example, pull RTSP or RTMP stream data and repackage it as RTMP to push to an RTMP service.

/*
* Sets the callback that emits video data while pulling the stream
*/
NT_UINT32(NT_API *SetPullStreamVideoDataCallBack)(NT_HANDLE handle,
NT_PVOID call_back_data, SP_SDKPullStreamVideoDataCallBack call_back);

/*
* Sets the callback that emits audio data while pulling the stream
*/
NT_UINT32(NT_API *SetPullStreamAudioDataCallBack)(NT_HANDLE handle,
NT_PVOID call_back_data, SP_SDKPullStreamAudioDataCallBack call_back);


/*
Sets the switch for transcoding audio to AAC while pulling the stream. AAC is more universal,
so the SDK added transcoding of other audio codecs (e.g. Speex, PCMU, PCMA) to AAC.
is_transcode: if 1, audio that is not AAC is transcoded to AAC (AAC passes through unchanged);
if 0, no conversion is done. The default is 0.
Note: transcoding increases performance overhead
*/
NT_UINT32(NT_API *SetPullStreamAudioTranscodeAAC)(NT_HANDLE handle, NT_INT32 is_transcode);


/*
Starts pulling the stream
*/
NT_UINT32(NT_API *StartPullStream)(NT_HANDLE handle);

/*
Stops pulling the stream
*/
NT_UINT32(NT_API *StopPullStream)(NT_HANDLE handle);
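
A forwarding-oriented sketch: register both data callbacks, normalize audio to AAC, then start pulling. OnPullVideoData/OnPullAudioData and forward_ctx are hypothetical names; their bodies would hand the encoded frames to the forwarding module:

// Emit encoded frames to the forwarding module.
player_api_.SetPullStreamVideoDataCallBack(player_handle_, forward_ctx, &OnPullVideoData);
player_api_.SetPullStreamAudioDataCallBack(player_handle_, forward_ctx, &OnPullAudioData);
player_api_.SetPullStreamAudioTranscodeAAC(player_handle_, 1); // normalize audio to AAC

player_api_.StartPullStream(player_handle_);
// ... repackage and push to the RTMP service ...
player_api_.StopPullStream(player_handle_);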

16. H.264 user data callback and SEI data callback

If the sender inserts custom user data during H.264 encoding, the following interface delivers that data in a callback. If you need the raw SEI data instead, simply use the corresponding SEI callback interface.

/*
Sets the user data callback
*/
NT_UINT32(NT_API *SetUserDataCallBack)(NT_HANDLE handle,
NT_PVOID call_back_data, NT_SP_SDKUserDataCallBack call_back);

The calling example is as follows:

extern"C"NT_VOID NT_CALLBACK NT_SP_SDKUserDataHandle(NT_HANDLE handle, NT_PVOID user_data,
  NT_INT32  data_type,
  NT_PVOID  data,
  NT_UINT32 size,
  NT_UINT64 timestamp,
  NT_UINT64 reserve1,
  NT_INT64  reserve2,
  NT_PVOID  reserve3){
  if ( 1 == data_type )
  {
    std::wostringstream oss;
    oss << L"userdata ";

    const NT_BYTE* byte_data = reinterpret_cast<const NT_BYTE*>(data);
    if ( byte_data != nullptr && size > 0 )
    {
      oss << L" byte data size=" << size;
    }

    std::wstring_convert<std::codecvt_utf8<wchar_t> > conv;

    oss << L" t:" << timestamp << L"\r\n";

    OutputDebugStringW(oss.str().c_str());
  }
  elseif ( 2 == data_type )
  {
    const NT_CHAR* str_data = reinterpret_cast<const NT_CHAR*>(data);
    if (str_data != nullptr && size > 0)
    {
      std::unique_ptr<std::string> s(new std::string(str_data, str_data + size));

      // oss << L" utf8 string:" << conv.from_bytes(*s);// oss << L" size=" << size;if ( !s->empty() )
      {
        HWND hwnd = reinterpret_cast<HWND>(user_data);
        if ( hwnd != nullptr && ::IsWindow(hwnd) )
        {
          ::PostMessage(hwnd, WM_USER_SDK_SP_RECV_USER_DATA, (WPARAM)s.release(), (LPARAM)timestamp);
        }
      }
    }
  }

}

17. Post-decode YUV/RGB data callback

If the decoded YUV or RGB data needs secondary processing, such as face recognition, the YUV/RGB callback interfaces deliver the data for that processing. On Windows, if the device does not support D3D, the data can also be called back up for GDI-mode drawing:

player_api_.SetVideoFrameCallBack(player_handle_, NT_SP_E_VIDEO_FRAME_FORMAT_RGB32,
GetSafeHwnd(), SM_SDKVideoFrameHandle);

extern"C"NT_VOID NT_CALLBACK SM_SDKVideoFrameHandle(NT_HANDLE handle, NT_PVOID userData, NT_UINT32 status,
  const NT_SP_VideoFrame* frame){
  /*if (frame != NULL)
  {
  std::ostringstream ss;
  ss << "Receive frame time_stamp:" << frame->timestamp_ << "ms" << "\r\n";
  OutputDebugStringA(ss.str().c_str());
  }*/if ( frame != NULL )
  {
    if ( NT_SP_E_VIDEO_FRAME_FORMAT_RGB32 == frame->format_
      && frame->plane0_ != NULL
      && frame->stride0_ > 0
      && frame->height_ > 0 )
    {
      std::unique_ptr<nt_rgb32_image > pImage(new nt_rgb32_image());

      pImage->size_ = frame->stride0_* frame->height_;
      pImage->data_ = new NT_BYTE[pImage->size_];

      memcpy(pImage->data_, frame->plane0_, pImage->size_);

      pImage->width_  = frame->width_;
      pImage->height_ = frame->height_;
      pImage->stride_ = frame->stride0_;

      HWND hwnd = (HWND)userData;
      if ( hwnd != NULL && ::IsWindow(hwnd) )
      {
        ::PostMessage(hwnd, WM_USER_SDK_RGB32_IMAGE, (WPARAM)handle, (LPARAM)pImage.release());
      }
    }
  }
}

Summary

The above covers some of our experience developing RTMP players. Beyond the basic design described here there is more, such as drawing in GDI mode when the system does not support D3D, overlaying real-time text on the playback view, and full-screen display, which we won't detail further.

Besides Windows, we have also developed RTMP players for the Linux, Android, and iOS platforms in parallel. The common interfaces are largely unified across the four platforms, and latency is likewise at the millisecond level. Most developers do not need to implement everything above: implementing 30-40% of it, as product requirements dictate, is enough for a specific scenario.

A good player, especially one that delivers low-latency, stable playback (millisecond-level delay), demands attention to far more than this. After all the accumulation, you climb to the top of the mountain not to enjoy the scenery, but to find a higher mountain!
