1. Introduction
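During an audio/video call built on mediasoup, the client can show which user is currently speaking. This works because the speaker's endpoint attaches the volume of the audio to each audio RTP packet it sends. mediasoup reads that value from the RTP packets and reports it back to mediasoup-demo (the reported value is normally an average over a period of time). If the volume exceeds the configured threshold, an activeSpeaker notification tells the audio receivers that this user is speaking, and each receiving client updates its display when the signal arrives.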
2. Audio Level
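RTP allows header extensions. The SDP line a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level declares the audio header extension defined in RFC 6464, which a client uses to tell the mixer the audio level of each packet. The extension content can be carried in either a one-byte or a two-byte extension header.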
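Field | Meaning |
V | Voice activity flag set by the encoder: V=1 means the packet is believed to contain speech, V=0 means it is not (silence) |
level | Audio level in the range [0, 127], representing 0 to -127 dBov; 0 is the loudest level and 127 (-127 dBov) is the quietest |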
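The two fields in the table share a single byte in the one-byte form: V occupies the top bit and level the low 7 bits. The helper below is a minimal sketch of packing and unpacking that byte. The class and method names are hypothetical; they are not part of mediasoup or of any RTP library.

```java
/** Hypothetical helper for the 1-byte ssrc-audio-level payload (V flag + 7-bit level). */
final class AudioLevelExtension {
    private AudioLevelExtension() {}

    /** Packs the voice flag into bit 7 and the level (0..127, in -dBov) into bits 0..6. */
    static byte encode(boolean voice, int level) {
        if (level < 0 || level > 127) {
            throw new IllegalArgumentException("level must be in [0, 127]");
        }
        return (byte) ((voice ? 0x80 : 0x00) | level);
    }

    /** Returns true when the V bit is set. */
    static boolean voice(byte payload) {
        return (payload & 0x80) != 0;
    }

    /** Returns the raw 7-bit level: 0 is loudest, 127 is quietest. */
    static int level(byte payload) {
        return payload & 0x7F;
    }

    /** Returns the level as a dBov value in [-127, 0]. */
    static int dBov(byte payload) {
        return -level(payload);
    }
}
```

For example, AudioLevelExtension.encode(true, 30) produces a byte whose level decodes to 30, which is -30 dBov.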
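The code below follows the reference implementation for calculating the audio level given in Appendix A of RFC 6465. Reading it shows that the audio level is the root mean square (RMS) of a window of normalized samples, expressed in decibels.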
db = 20 * Math.log10(Math.sqrt(∑(sample / overload)² / n_samples))
public class AudioLevelCalculator
{
    /**
     * Calculates the audio level of a signal with specific
     * <tt>samples</tt>.
     *
     * @param samples the samples whose audio level we need to
     * calculate. The samples are specified as an <tt>int</tt>
     * array starting at <tt>offset</tt>, extending <tt>length</tt>
     * number of elements, and each <tt>int</tt> element in the
     * specified range representing a sample whose audio level we
     * need to calculate. Though a sample is provided in the
     * form of an <tt>int</tt> value, the sample size in bits
     * is determined by the caller via <tt>overload</tt>.
     *
     * @param offset the offset in <tt>samples</tt> at which the
     * samples start.
     *
     * @param length the length of the signal specified in
     * <tt>samples</tt>, starting at <tt>offset</tt>.
     *
     * @param overload the overload (point) of <tt>signal</tt>.
     * For example, <tt>overload</tt> can be {@link Byte#MAX_VALUE}
     * for 8-bit signed samples or {@link Short#MAX_VALUE} for
     * 16-bit signed samples.
     *
     * @return the audio level of the specified signal, in dBov,
     * within [-127, 0].
     */
    public static int calculateAudioLevel(
            int[] samples, int offset, int length,
            int overload)
    {
        /*
         * Calculate the root mean square (RMS) of the signal.
         * The loop covers [offset, offset + length), so it also
         * works when offset is not 0.
         */
        double rms = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++)
        {
            double sample = samples[i];
            sample /= overload;
            rms += sample * sample;
        }
        rms = (length == 0) ? 0 : Math.sqrt(rms / length);

        /*
         * The audio level is a logarithmic measure of the
         * rms level of an audio sample relative to a reference
         * value and is measured in decibels.
         */
        double db;
        /*
         * The minimum audio level permitted.
         */
        final double MIN_AUDIO_LEVEL = -127;
        /*
         * The maximum audio level permitted.
         */
        final double MAX_AUDIO_LEVEL = 0;

        if (rms > 0)
        {
            /*
             * The "zero" reference level is the overload level,
             * which corresponds to 1.0 in this calculation, because
             * the samples are normalized in calculating the RMS.
             */
            db = 20 * Math.log10(rms);
            /*
             * Ensure that the calculated level is within the minimum
             * and maximum range permitted.
             */
            if (db < MIN_AUDIO_LEVEL)
                db = MIN_AUDIO_LEVEL;
            else if (db > MAX_AUDIO_LEVEL)
                db = MAX_AUDIO_LEVEL;
        }
        else
        {
            /* Digital silence maps to the minimum level. */
            db = MIN_AUDIO_LEVEL;
        }
        return (int)Math.round(db);
    }
}
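A short worked example may help when checking the calculation by hand. The input signals here are illustrative choices, not taken from the RFC; x_i denotes the i-th sample and n the number of samples.

```latex
\mathrm{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i}{\mathrm{overload}}\right)^{2}},
\qquad
\mathrm{dB} = 20\log_{10}(\mathrm{rms})

% Constant signal at half of the overload point:
x_i = \tfrac{1}{2}\,\mathrm{overload}
\;\Rightarrow\; \mathrm{rms} = 0.5,
\quad 20\log_{10}(0.5) \approx -6.02 \;\rightarrow\; -6\ \mathrm{dBov}

% Full-scale sine wave:
\mathrm{rms} = \tfrac{1}{\sqrt{2}},
\quad 20\log_{10}\!\left(\tfrac{1}{\sqrt{2}}\right) \approx -3.01 \;\rightarrow\; -3\ \mathrm{dBov}
```

In the header extension these results are carried as level = 6 and level = 3 respectively, since the level field stores the magnitude of the dBov value. An all-zero buffer has rms = 0 and is mapped directly to -127 dBov.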
3. Source Code Walkthrough
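When mediasoup-demo creates a room, it also creates an AudioLevelObserver, which monitors whether the average volume of each registered audio Producer over an interval exceeds the configured threshold. Every Room has one AudioLevelObserver.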
When mediasoup-demo receives a produce request for audio, it registers the new producerId with the AudioLevelObserver through addProducer.
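Each time the mediasoup server receives an RTP packet it calls AudioLevelObserver::ReceiveRtpPacket. This function reads the audio level extension from the packet and stores the resulting volume in the dBovs entry that belongs to the Producer.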
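Once created, the AudioLevelObserver runs a timer that repeatedly executes AudioLevelObserver::Update. For each Producer, if at least 10 volume values were received during the timer interval and their average volume exceeds threshold, the Producer is stored in mapDBovsProducer. The result is then sent through the pipe to mediasoup-demo as a volumes message.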
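The accumulate-then-average logic described above can be sketched as follows. This is a simplified model under stated assumptions, not mediasoup's actual C++ code: the class name, the Volume type, resetting the counters after every interval, and returning the entries loudest first capped at a hypothetical maxEntries are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the observer's per-interval averaging. */
final class AudioLevelObserverSketch {
    /** One reported entry: a producer and its average level in dBov. */
    static final class Volume {
        final String producerId;
        final int dBov;

        Volume(String producerId, int dBov) {
            this.producerId = producerId;
            this.dBov = dBov;
        }
    }

    private static final int MIN_SAMPLES = 10; // minimum values per interval

    private final int threshold;  // in dBov, within [-127, 0]
    private final int maxEntries; // assumed cap on reported entries
    // producerId -> { sum of levels, number of levels } for the current interval
    private final Map<String, long[]> stats = new HashMap<>();

    AudioLevelObserverSketch(int threshold, int maxEntries) {
        this.threshold = threshold;
        this.maxEntries = maxEntries;
    }

    void addProducer(String producerId) {
        stats.put(producerId, new long[2]);
    }

    /** Called per audio RTP packet; level is the 7-bit value (0..127, in -dBov). */
    void receiveRtpPacket(String producerId, int level) {
        long[] s = stats.get(producerId);
        if (s == null) {
            return; // this producer is not being observed
        }
        s[0] += level;
        s[1]++;
    }

    /** Called by the interval timer; returns producers whose average is above the threshold. */
    List<Volume> update() {
        List<Volume> result = new ArrayList<>();
        for (Map.Entry<String, long[]> e : stats.entrySet()) {
            long[] s = e.getValue();
            if (s[1] >= MIN_SAMPLES) {
                double meanLevel = (double) s[0] / s[1];
                int avgDBov = (int) -Math.round(meanLevel);
                if (avgDBov > threshold) {
                    result.add(new Volume(e.getKey(), avgDBov));
                }
            }
            // Assumption: counters start from zero again for the next interval.
            s[0] = 0;
            s[1] = 0;
        }
        // Assumption: loudest first, limited to maxEntries.
        result.sort(Comparator.comparingInt((Volume v) -> v.dBov).reversed());
        return result.size() > maxEntries
                ? new ArrayList<>(result.subList(0, maxEntries))
                : result;
    }
}
```

With a threshold of -80, a producer that delivers ten packets at level 30 averages -30 dBov and is reported, while one that averages -100 dBov or delivers only five packets in the interval is not.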
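On the mediasoup-demo side, the AudioLevelObserver listens for notification messages from the worker. When a volumes notification arrives, it fetches all users currently in the room and sends each of them an activeSpeaker signal carrying the speaker's peerId and the corresponding volume.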
When a client receives the activeSpeaker notification, it calls stateActions.setRoomActiveSpeaker(peerId) to set the current speaker.