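1. Introduction

When mediasoup is used for audio/video calls, the client can show which user is currently speaking. The speaker's client attaches the audio level of each audio segment to the RTP packets it sends. mediasoup reads this value from the RTP packets, averages it over a period of time, and reports the result to mediasoup-demo. If the level is above the configured threshold, mediasoup-demo sends an activeSpeaker notification to the receiving clients, which then display the current speaker.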

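2. Audio Level

RTP allows header extensions. The SDP attribute a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level indicates that the audio header extension defined in RFC 6464 is in use; the client uses it to tell the mixer the audio level. The extension can be carried in either the one-byte or the two-byte extension header format. Its data consists of the following two fields: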

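Field	Meaning
V	Voice activity flag set by the encoder: V=1 means the packet contains voice activity, V=0 means it does not
level	Audio level in the range [0, 127], representing 0 to -127 dBov; 0 is the loudest level and 127 (-127 dBov) is the quietest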
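The two fields fit in a single data byte: the most significant bit is V and the remaining seven bits are level. The following is a minimal sketch of decoding that byte; the class name AudioLevelExtension and its methods are hypothetical names chosen for illustration and are not part of mediasoup.

```java
/** Decoded data byte of the ssrc-audio-level header extension (illustrative only). */
final class AudioLevelExtension {
    /** V bit: true when the encoder reports voice activity. */
    final boolean voice;
    /** Level field in [0, 127]; 0 is loudest, 127 is quietest. */
    final int level;

    private AudioLevelExtension(boolean voice, int level) {
        this.voice = voice;
        this.level = level;
    }

    /** Splits the single data byte into the V bit and the 7-bit level. */
    static AudioLevelExtension parse(byte data) {
        int b = data & 0xFF;              // treat the byte as unsigned
        boolean voice = (b & 0x80) != 0;  // most significant bit
        int level = b & 0x7F;             // lower seven bits
        return new AudioLevelExtension(voice, level);
    }

    /** The level field stores -dBov, so negate it to get the value in dBov. */
    int dBov() {
        return -level;
    }
}
```

For example, the byte 0x9E decodes to V=1 and level=30, that is, -30 dBov.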

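Below is the audio level calculation given in Appendix A of RFC 6464. Reading the code shows that the audio level is the root mean square (RMS) of a block of normalized signal samples, expressed on a logarithmic (decibel) scale.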

db = 20 * Math.log10(Math.sqrt(∑(sample / overload)² / n_samples))
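As a quick sanity check of the calculation, consider a hypothetical signal whose samples all sit at half of the overload value. The numbers below are an illustration only, not measured data.

```latex
\mathrm{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\mathrm{sample}_i}{\mathrm{overload}}\right)^{2}}
             = \sqrt{\frac{1}{n}\cdot n \cdot 0.5^{2}} = 0.5,
\qquad
\mathrm{db} = 20\log_{10}(0.5) \approx -6.02
```

Rounded to the nearest integer this gives -6 dBov, which would be carried in the header extension as level = 6.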

public class AudioLevelCalculator
   {

       /**
        * Calculates the audio level of a signal with specific
        * <tt>samples</tt>.
        *
        * @param samples  the samples whose audio level we need to
        * calculate.  The samples are specified as an <tt>int</tt>
        * array starting at <tt>offset</tt>, extending <tt>length</tt>
        * number of elements, and each <tt>int</tt> element in the
        * specified range representing a sample whose audio level we
        * need to calculate.  Though a sample is provided in the
        * form of an <tt>int</tt> value, the sample size in bits
        * is determined by the caller via <tt>overload</tt>.
        *
        * @param offset  the offset in <tt>samples</tt> at which the
        * samples start.
        *
        * @param length  the length of the signal specified in
        * <tt>samples</tt>, starting at <tt>offset</tt>.
        *
        * @param overload  the overload (point) of <tt>signal</tt>.
        * For example, <tt>overload</tt> can be {@link Byte#MAX_VALUE}
        * for 8-bit signed samples or {@link Short#MAX_VALUE} for
        * 16-bit signed samples.
        *
        * @return  the audio level of the specified signal.
        */
       public static int calculateAudioLevel(
           int[] samples, int offset, int length,
           int overload)
       {
           /*
            * Calculate the root mean square (RMS) of the signal.
            */
           double rms = 0;

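           // Iterate over exactly `length` samples starting at `offset`
           // (comparing offset against length alone skips samples when offset > 0).
           for (int end = offset + length; offset < end; offset++)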
           {
               double sample = samples[offset];

               sample /= overload;
               rms += sample * sample;
           }
           rms = (length == 0) ? 0 : Math.sqrt(rms / length);

           /*
            * The audio level is a logarithmic measure of the
            * rms level of an audio sample relative to a reference
            * value and is measured in decibels.
            */
           double db;

           /*
            * The minimum audio level permitted.
            */
           final double MIN_AUDIO_LEVEL = -127;

           /*
            * The maximum audio level permitted.
            */
           final double MAX_AUDIO_LEVEL = 0;

           if (rms > 0)
           {
               /*
                * The "zero" reference level is the overload level,
                * which corresponds to 1.0 in this calculation, because
                * the samples are normalized in calculating the RMS.
                */
               db = 20 * Math.log10(rms);

               /*
                * Ensure that the calculated level is within the minimum
                * and maximum range permitted.
                */
               if (db < MIN_AUDIO_LEVEL)
                   db = MIN_AUDIO_LEVEL;
               else if (db > MAX_AUDIO_LEVEL)
                   db = MAX_AUDIO_LEVEL;
           }
           else
           {
               db = MIN_AUDIO_LEVEL;
           }

           return (int)Math.round(db);
       }
   }

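3. Source Code Walkthrough

When mediasoup-demo creates a room, it also creates an AudioLevelObserver that monitors whether the average audio level of the given audio Producers over an interval exceeds the configured threshold. Each Room has one AudioLevelObserver.

When mediasoup-demo receives a produce request for audio, it registers the producerId through the AudioLevelObserver's addProducer method.

Each time the mediasoup server receives an RTP packet, it calls AudioLevelObserver::ReceiveRtpPacket, which reads the audio level extension value from the packet and stores the resulting volume in the dBovs entry for that Producer.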

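Once created, the AudioLevelObserver runs a timer that repeatedly calls AudioLevelObserver::Update. If at least 10 audio level values were received for a Producer within one timer interval and their average volume is greater than threshold, the Producer is stored in mapDBovsProducer, and a volumes message is then sent to mediasoup-demo over the pipe.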
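The accumulate-then-evaluate logic described above can be sketched roughly as follows. This is a simplified Java illustration, not mediasoup's actual C++ implementation: the class AudioLevelObserverSketch, its methods, and the choice to report only the single loudest Producer are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of the per-interval audio level evaluation. */
final class AudioLevelObserverSketch {
    /** Running sum and count of level values for one producer. */
    private static final class Stats {
        long totalSum;
        int count;
    }

    private final int threshold; // in dBov, for example -80
    private final Map<String, Stats> statsByProducer = new HashMap<>();

    AudioLevelObserverSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Called once per received RTP packet; level is the 0..127 extension value (-dBov). */
    void onAudioLevel(String producerId, int level) {
        Stats s = statsByProducer.computeIfAbsent(producerId, id -> new Stats());
        s.totalSum += level;
        s.count++;
    }

    /**
     * Called on every timer tick. Returns the id of the loudest producer whose
     * average is above the threshold, or null if no producer qualifies.
     */
    String update() {
        String loudest = null;
        int loudestDbov = Integer.MIN_VALUE;
        for (Map.Entry<String, Stats> e : statsByProducer.entrySet()) {
            Stats s = e.getValue();
            if (s.count >= 10) { // ignore producers with too few values
                int avgDbov = -(int) Math.round((double) s.totalSum / s.count);
                if (avgDbov > threshold && avgDbov > loudestDbov) {
                    loudest = e.getKey();
                    loudestDbov = avgDbov;
                }
            }
            // Start a fresh measurement window for the next interval.
            s.totalSum = 0;
            s.count = 0;
        }
        return loudest;
    }
}
```

With a threshold of -80, a producer averaging level 30 (-30 dBov) over ten packets would be selected over one averaging level 50 (-50 dBov), while a producer with only five values in the interval would be skipped.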

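On the mediasoup-demo side, the AudioLevelObserver listens for notifications from the worker. When a volumes notification arrives, it gets all peers currently in the room and sends each of them an activeSpeaker message carrying the speaking peerId and its volume.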
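The fan-out step can be pictured with the short sketch below. mediasoup-demo itself is a Node.js application, so this Java version only models the idea; the Peer interface and the ActiveSpeakerNotifier class are hypothetical stand-ins for the demo's signaling layer.

```java
import java.util.List;

/** Hypothetical model of broadcasting the active speaker to every peer in a room. */
final class ActiveSpeakerNotifier {
    /** Stand-in for a signaling connection to one client (assumed for this sketch). */
    interface Peer {
        void notify(String method, String peerId, int volume);
    }

    /** Sends an activeSpeaker notification with the speaker's peerId and volume to all peers. */
    static void onVolumes(List<Peer> peers, String speakerPeerId, int volume) {
        for (Peer peer : peers) {
            peer.notify("activeSpeaker", speakerPeerId, volume);
        }
    }
}
```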

When the client receives the activeSpeaker notification, it calls stateActions.setRoomActiveSpeaker(peerId) to set the current speaker.

4. References

RFC 6464 - A Real-time Transport Protocol (RTP) Header Extension for Client-to-Mixer Audio Level Indication
