A Study of the WebRTC Histogram Algorithm
Overview: Histogram is used by DelayManager in NetEq to estimate network delay.
Key data members:
 private:
  std::vector<int> buckets_;
  int forget_factor_;  // Q15
  const int base_forget_factor_;
  int add_count_;
  const absl::optional<double> start_forget_weight_;
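buckets_: the buckets. The bucket at index i holds the fraction of packets whose delay is i units. Each fraction is stored in Q30, a fixed-point format in which a real value x is stored as the integer x × 2^30, so 1.0 is 1 << 30. All buckets sum to 100%. For example:
buckets_[1] = 10%: packets delayed by one unit make up 10% of the total
buckets_[2] = 20%: packets delayed by two units make up 20% of the total
buckets_[3] = 30%: packets delayed by three units make up 30% of the total
…
forget_factor_: the forgetting factor, in Q15. On every update each bucket is multiplied by it, so it is the fraction of the old histogram that is kept; the new sample receives the remaining weight, 1 - forget_factor_.
base_forget_factor_: the steady-state value that forget_factor_ converges to.
add_count_: the number of samples added since the last reset.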
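Since the Q15 and Q30 formats appear throughout the code, a small conversion sketch may help. The helpers `ToQ` and `FromQ` below are hypothetical names introduced for illustration only; they are not part of WebRTC, and they assume the value lies in [0, 1].

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers (not WebRTC code): convert between a real value in
// [0, 1] and its Qn fixed-point representation, i.e. round(value * 2^n).
int32_t ToQ(double value, int fractional_bits) {
  assert(value >= 0.0 && value <= 1.0);
  assert(fractional_bits >= 0 && fractional_bits <= 30);
  return static_cast<int32_t>(std::lround(value * (1 << fractional_bits)));
}

// Inverse direction: divide the stored integer by 2^n.
double FromQ(int32_t fixed, int fractional_bits) {
  assert(fractional_bits >= 0 && fractional_bits <= 30);
  return static_cast<double>(fixed) / (1 << fractional_bits);
}
```

For instance, `ToQ(0.5, 15)` gives 16384, and `ToQ(1.0, 30)` gives `1 << 30`, the value that all buckets are expected to sum to.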
Updating the delay statistics:
void Histogram::Add(int value) {
  RTC_DCHECK(value >= 0);
  RTC_DCHECK(value < static_cast<int>(buckets_.size()));
  int vector_sum = 0;  // Sum up the vector elements as they are processed.
  // Multiply each element in |buckets_| with |forget_factor_|.
  // Decay every bucket by the forget factor and accumulate the sum.
  for (int& bucket : buckets_) {
    bucket = (static_cast<int64_t>(bucket) * forget_factor_) >> 15;
    vector_sum += bucket;
  }
  // Increase the probability for the currently observed inter-arrival time
  // by 1 - |forget_factor_|. The factor is in Q15, |buckets_| in Q30.
  // Thus, left-shift 15 steps to obtain result in Q30.
  // Add (1 - forget_factor_) to the bucket of the new sample. Note that
  // buckets_ is in Q30 while forget_factor_ is in Q15.
  buckets_[value] += (32768 - forget_factor_) << 15;
  vector_sum += (32768 - forget_factor_) << 15;  // Add to vector sum.
  // |buckets_| should sum up to 1 (in Q30), but it may not due to
  // fixed-point rounding errors.
  // Keep the sum of the buckets equal to 1.
  vector_sum -= 1 << 30;  // Should be zero. Compensate if not.
  if (vector_sum != 0) {
    // Modify a few values early in |buckets_|.
    int flip_sign = vector_sum > 0 ? -1 : 1;
    for (int& bucket : buckets_) {
      // Add/subtract 1/16 of the element, but not more than |vector_sum|.
      int correction = flip_sign * std::min(std::abs(vector_sum), bucket >> 4);
      bucket += correction;
      vector_sum += correction;
      if (std::abs(vector_sum) == 0) {
        break;
      }
    }
  }
  RTC_DCHECK(vector_sum == 0);  // Verify that the above is correct.
  ++add_count_;
  // Update |forget_factor_| (changes only during the first seconds after a
  // reset). The factor converges to |base_forget_factor_|.
  // Update using the custom start weight.
  if (start_forget_weight_) {
    if (forget_factor_ != base_forget_factor_) {
      int old_forget_factor = forget_factor_;
      int forget_factor =
          (1 << 15) * (1 - start_forget_weight_.value() / (add_count_ + 1));
      forget_factor_ =
          std::max(0, std::min(base_forget_factor_, forget_factor));
      // The histogram is updated recursively by forgetting the old histogram
      // with |forget_factor_| and adding a new sample multiplied by |1 -
      // forget_factor_|. We need to make sure that the effective weight on the
      // new sample is no smaller than those on the old samples, i.e., to
      // satisfy the following DCHECK.
      RTC_DCHECK_GE((1 << 15) - forget_factor_,
                    ((1 << 15) - old_forget_factor) * forget_factor_ >> 15);
    }
  } else {  // Use the default update.
    forget_factor_ += (base_forget_factor_ - forget_factor_ + 3) >> 2;
  }
}
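The bucket update in this function (decay, add the new sample, renormalize) can be easier to follow without the Q15/Q30 shifts. The following is a minimal floating-point sketch of that part only. `AddSampleFloat` is a hypothetical helper, not WebRTC code, and it takes the forgetting factor as a real number in [0, 1] rather than as a Q15 integer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Floating-point illustration of the bucket update (hypothetical helper).
// `forget_factor` is a real number in [0, 1] instead of a Q15 integer.
void AddSampleFloat(std::vector<double>& buckets, std::size_t value,
                    double forget_factor) {
  assert(value < buckets.size());
  assert(forget_factor >= 0.0 && forget_factor <= 1.0);

  // Decay all buckets and accumulate their sum.
  double sum = 0.0;
  for (double& bucket : buckets) {
    bucket *= forget_factor;
    sum += bucket;
  }

  // Give the observed bucket the weight that was just forgotten.
  buckets[value] += 1.0 - forget_factor;
  sum += 1.0 - forget_factor;

  // Push the sum back to 1, taking at most 1/16 of each bucket.
  double error = sum - 1.0;
  for (double& bucket : buckets) {
    if (error == 0.0) break;
    const double step = std::min(std::fabs(error), bucket / 16.0);
    const double correction = error > 0.0 ? -step : step;
    bucket += correction;
    error += correction;
  }
}
```

Starting from `{0.5, 0.25, 0.25}` and adding a sample in bucket 2 with a factor of 0.9 gives approximately `{0.45, 0.225, 0.325}`. With a factor of 0, which is what `forget_factor_` is right after `Reset`, the first sample replaces the whole histogram.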
1. Scale every bucket by forget_factor_ (each bucket is overwritten with its scaled value) and accumulate the sum, where N = buckets_.size():
$$vector\_sum = \sum_{n=0}^{N-1} buckets\_[n] \times forget\_factor\_$$
2. Add the weight of the new sample to its bucket:
$$buckets\_[value] = buckets\_[value] + (1 - forget\_factor\_)$$
$$vector\_sum = vector\_sum + (1 - forget\_factor\_)$$
3. Keep the sum of the buckets equal to 1; it can drift because of fixed-point rounding errors. vector_sum is first turned into the deviation from 1, then the buckets are corrected one by one, starting from index 0. The same correction is applied to vector_sum, and the loop stops as soon as vector_sum reaches 0:
$$vector\_sum = vector\_sum - 1$$
$$buckets\_[n]=\begin{cases} buckets\_[n] - \min(|vector\_sum|,\ buckets\_[n]/16) & \text{if } vector\_sum>0 \\ buckets\_[n] + \min(|vector\_sum|,\ buckets\_[n]/16) & \text{if } vector\_sum<0 \end{cases}$$
4. Update forget_factor_ so that it approaches base_forget_factor_. (DelayManager uses the start_forget_weight_ variant, with start_forget_weight_ = 2 and base_forget_factor_ = 0.9993.)
Update with a custom start_forget_weight_:
$$add\_count\_ = add\_count\_ + 1$$
$$forget\_factor = 1 - \frac{start\_forget\_weight\_}{add\_count\_ + 1}$$
$$forget\_factor\_ = \max(0,\ \min(base\_forget\_factor\_,\ forget\_factor))$$
Default update. The +3 is easy to misread: forget_factor_ is in Q15, so the 3 stands for 3/32768 ≈ 0.0000916, which is tiny. In the integer code it makes the right shift by 2 round up instead of down, so forget_factor_ can reach base_forget_factor_ exactly instead of stalling a few steps short:
$$forget\_factor\_ = forget\_factor\_ + \frac{base\_forget\_factor\_ - forget\_factor\_ + 3 \cdot 2^{-15}}{4}$$
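To see how the two update rules in step 4 behave, they can be pulled out into free functions operating on Q15 integers. `NextForgetFactorWeighted` and `NextForgetFactorDefault` are hypothetical names; the expressions inside mirror the ones in `Histogram::Add`, but this is only a sketch for experimenting with the schedule.

```cpp
#include <algorithm>
#include <cassert>

// Rule used when start_forget_weight is set (hypothetical helper).
// `add_count` is the number of samples added so far, including the current
// one. The double result is truncated to int, as in the original expression.
int NextForgetFactorWeighted(int base_forget_factor,
                             double start_forget_weight, int add_count) {
  assert(add_count >= 1);
  const int candidate = static_cast<int>(
      (1 << 15) * (1 - start_forget_weight / (add_count + 1)));
  return std::max(0, std::min(base_forget_factor, candidate));
}

// Default rule (hypothetical helper): move a quarter of the remaining
// distance towards the base factor, rounding up thanks to the +3.
int NextForgetFactorDefault(int forget_factor, int base_forget_factor) {
  assert(forget_factor <= base_forget_factor);
  return forget_factor + ((base_forget_factor - forget_factor + 3) >> 2);
}
```

With a start weight of 2 and a base factor of 32745, the weighted rule yields 0 after the first sample, 10922 after the second and 16384 after the third, and it reaches the base factor after roughly 2,850 samples. The default rule goes from 0 to 8187 in one step, and from 32744 to exactly 32745, which is the case the +3 exists for.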
Getting the current delay:
int Histogram::Quantile(int probability) {
  // Find the bucket for which the probability of observing an
  // inter-arrival time larger than or equal to |index| is larger than or
  // equal to |probability|. The sought probability is estimated using
  // the histogram as the reverse cumulant PDF, i.e., the sum of elements from
  // the end up until |index|. Now, since the sum of all elements is 1
  // (in Q30) by definition, and since the solution is often a low value for
  // |iat_index|, it is more efficient to start with |sum| = 1 and subtract
  // elements from the start of the histogram.
  int inverse_probability = (1 << 30) - probability;
  size_t index = 0;   // Start from the beginning of |buckets_|.
  int sum = 1 << 30;  // Assign to 1 in Q30.
  sum -= buckets_[index];
  while ((sum > inverse_probability) && (index < buckets_.size() - 1)) {
    // Subtract the probabilities one by one until the sum is no longer greater
    // than |inverse_probability|.
    ++index;
    sum -= buckets_[index];
  }
  return static_cast<int>(index);
}
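The delay is looked up from the requested probability. The function finds the smallest index B at which the cumulative probability reaches it:
$$\sum_{n=0}^{B} buckets\_[n] \ge probability$$
B is returned as the delay. If the condition is never met, the search stops at the last bucket.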
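The same lookup can be written as a forward scan over the cumulative sum. `QuantileFloat` below is a hypothetical floating-point counterpart, not the WebRTC function; it assumes the buckets hold real-valued probabilities that sum to about 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical floating-point counterpart of Histogram::Quantile: returns
// the smallest index whose cumulative probability reaches `probability`,
// or the last index if the target is never reached.
std::size_t QuantileFloat(const std::vector<double>& buckets,
                          double probability) {
  assert(!buckets.empty());
  double cumulative = 0.0;
  for (std::size_t i = 0; i < buckets.size(); ++i) {
    cumulative += buckets[i];
    if (cumulative >= probability) return i;
  }
  return buckets.size() - 1;
}
```

For the histogram `{0.5, 0.25, 0.125, 0.125}`, a probability of 0.75 maps to index 1, and 0.97 maps to index 3. The original code reaches the same index by starting at 1 and subtracting, which avoids a separate accumulator for the complement.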
The Reset operation
// Set the histogram vector to an exponentially decaying distribution
// buckets_[i] = 0.5^(i+1), i = 0, 1, 2, ...
// buckets_ is in Q30.
void Histogram::Reset() {
  // Set temp_prob to (slightly more than) 1 in Q14. This ensures that the sum
  // of buckets_ is 1.
  uint16_t temp_prob = 0x4002;  // 16384 + 2 = 100000000000010 binary.
  for (int& bucket : buckets_) {
    temp_prob >>= 1;
    bucket = temp_prob << 16;
  }
  forget_factor_ = 0;  // Adapt the histogram faster for the first few packets.
  add_count_ = 0;
}
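Reset does not set every bucket to zero.
Instead, the buckets are initialized to approximately 0.5^(i+1). 0x4002 is slightly more than 1 in Q14 (equivalently, slightly more than 1/2 in Q15), and it is halved before each bucket is assigned. The extra 2 compensates for the truncated tail of the geometric series: with at least 14 buckets the sum is exactly 1 in Q30, and with fewer buckets it is slightly below 1.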
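The initial distribution can be checked by rebuilding it outside the class. `InitialBuckets` and `SumQ30` are hypothetical free functions that repeat the loop from `Reset`; they are only a sketch for inspecting the values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rebuilds the initial distribution with the same loop as Histogram::Reset
// (hypothetical free function, for illustration only).
std::vector<int> InitialBuckets(std::size_t num_buckets) {
  assert(num_buckets > 0);  // A histogram needs at least one bucket.
  std::vector<int> buckets(num_buckets);
  uint16_t temp_prob = 0x4002;  // Slightly more than 1 in Q14.
  for (int& bucket : buckets) {
    temp_prob >>= 1;           // Halve before every assignment.
    bucket = temp_prob << 16;  // Q14 value shifted into Q30.
  }
  return buckets;
}

// Sums the buckets in 64 bits so the total cannot overflow.
int64_t SumQ30(const std::vector<int>& buckets) {
  int64_t sum = 0;
  for (int bucket : buckets) sum += bucket;
  return sum;
}
```

The first bucket comes out as `8193 << 16` (just over 0.5) and the second as `1 << 28` (0.25). `SumQ30(InitialBuckets(14))` equals `1 << 30`, while a shorter histogram such as `InitialBuckets(4)` sums to less than that.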
Histogram parameters used by DelayManager
struct DelayHistogramConfig {
  int quantile = 1041529569;  // 0.97 in Q30.
  int forget_factor = 32745;  // 0.9993 in Q15.
  absl::optional<double> start_forget_weight = 2;
};