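Managing connections is the key to establishing a P2P link in WebRTC. A few questions about WebRTC connections need to be cleared up, and I record them here to deepen my understanding. The rough connection management flow in WebRTC is shown in the figure below:
[Figure: overview of the WebRTC connection management flow]
Connection management starts with creating connections, continues with updating the connection set, and ends with selecting the best connection to carry data. While data is flowing, the connection set must still be pinged according to certain rules, so that the best connection can be re-selected dynamically as network conditions change.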

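1. Connection concepts

1.1 What a connection is and its properties

A Connection represents the communication link established between a port on the local client and a port on the remote client. The source code describes it as follows: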

// Represents a communication link between a port on the local client and a port on the remote client.

Part of the class declaration, with some of its methods, is shown below:

class Connection : public CandidatePairInterface,
                   public rtc::MessageHandler,
                   public sigslot::has_slots<> {
 public:
  // Implementation of virtual methods in CandidatePairInterface.
  // Returns the description of the local port
  const Candidate& local_candidate() const override;
  // Returns the description of the remote port to which we communicate.
  const Candidate& remote_candidate() const override;
  
  // The connection can send and receive packets asynchronously.  This matches
  // the interface of AsyncPacketSocket, which may use UDP or TCP under the
  // covers.
  virtual int Send(const void* data, size_t size,
                   const rtc::PacketOptions& options) = 0;
  // Called when a packet is received on this connection.
  void OnReadPacket(const char* data, size_t size,
                    const rtc::PacketTime& packet_time);
  ...
};

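The prototypes above capture the two ends of a connection and its ability to send and receive data; see the source code for more detail. A connection also carries many other attributes that describe, for example, whether the link can be pinged, its timeout values, its average round-trip time, and its packet loss rate. There are too many to list here.

1.2 Connection states

1.2.1 pingable

Before a connection can be selected to carry data, it goes through many checks and probes. A connection supports the following check states: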

enum class IceCandidatePairState {
  WAITING = 0,  // Check has not been performed, Waiting pair on CL.
  IN_PROGRESS,  // Check has been sent, transaction is in progress.
  SUCCEEDED,    // Check already done, produced a successful result.
  FAILED,       // Check for this connection failed.
  // According to spec there should also be a frozen state, but nothing is ever
  // frozen because we have not implemented ICE freezing logic.
};

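(1) When a connection is created, its check state is set to IceCandidatePairState::WAITING, meaning it is waiting to be checked;
(2) When Ping() is executed on a connection, its check state is set to IceCandidatePairState::IN_PROGRESS, meaning a check is in progress;
(3) When a response to the ping is received, the check state is set to IceCandidatePairState::SUCCEEDED; if no response arrives before the timeout, it is set to IceCandidatePairState::FAILED.
Ping() is executed on a connection only if IsPingable() returns true. IsPingable() tests many conditions; one of them is that a connection whose check state is IceCandidatePairState::FAILED is not pingable, so IsPingable() returns false for it.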
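
The transitions described above can be sketched as a tiny state model. This is only an illustration: CheckStateModel and its method names are hypothetical stand-ins, not the real Connection class, and IsPingable() here models just the FAILED condition out of the many that the real function tests.

```cpp
#include <cassert>

// Same values as the enum quoted above.
enum class IceCandidatePairState { WAITING = 0, IN_PROGRESS, SUCCEEDED, FAILED };

// Hypothetical stand-in for a connection; only the check state is modeled.
struct CheckStateModel {
  // (1) A new connection waits to be checked.
  IceCandidatePairState state = IceCandidatePairState::WAITING;

  // Models just the FAILED condition; the real IsPingable() tests more.
  bool IsPingable() const { return state != IceCandidatePairState::FAILED; }

  // (2) Sending a ping puts the check in progress.
  void Ping() {
    assert(IsPingable());  // callers are expected to test IsPingable() first
    state = IceCandidatePairState::IN_PROGRESS;
  }

  // (3) A response marks success, a timeout marks failure.
  void OnPingResponse() { state = IceCandidatePairState::SUCCEEDED; }
  void OnPingTimeout() { state = IceCandidatePairState::FAILED; }
};
```

Once OnPingTimeout() has run, IsPingable() returns false, so in this model a failed connection is never pinged again.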

1.2.2 writable

Whether a connection can be pinged also depends on its current write state. A connection supports the following write states:

  enum WriteState {
    STATE_WRITABLE          = 0,  // we have received ping responses recently
    STATE_WRITE_UNRELIABLE  = 1,  // we have had a few ping failures
    STATE_WRITE_INIT        = 2,  // we have yet to receive a ping response
    STATE_WRITE_TIMEOUT     = 3,  // we have had a large number of ping failures
  };

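(1) When a connection is created, its write state is initialized to STATE_WRITE_INIT;
(2) If a connection is pruned for some reason, its state is set to STATE_WRITE_TIMEOUT;
(3) If a connection was in STATE_WRITABLE and then too many pings fail and no response has been received for a long time, its state is set to STATE_WRITE_UNRELIABLE;
(4) If a connection receives a response to a ping, its state is set to STATE_WRITABLE.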
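
The four rules can be put together in a small model. This is a sketch under stated assumptions: WriteStateModel, UpdateState() and the two threshold constants are hypothetical names chosen for illustration, and the threshold values are placeholders rather than the values WebRTC uses.

```cpp
#include <cassert>

// Same values as the enum quoted above.
enum WriteState {
  STATE_WRITABLE = 0,
  STATE_WRITE_UNRELIABLE = 1,
  STATE_WRITE_INIT = 2,
  STATE_WRITE_TIMEOUT = 3,
};

// Illustrative thresholds only; the values used by WebRTC may differ.
const int kMaxFailedPings = 5;
const int kMaxSilenceMs = 5000;

// Hypothetical stand-in for a connection; only the write state is modeled.
struct WriteStateModel {
  WriteState write_state = STATE_WRITE_INIT;  // (1) initial state

  void Prune() { write_state = STATE_WRITE_TIMEOUT; }      // (2)
  void OnPingResponse() { write_state = STATE_WRITABLE; }  // (4)

  // (3) A writable connection degrades only when both conditions hold:
  // many failed pings and a long time without any response.
  void UpdateState(int failed_pings, int ms_since_last_response) {
    assert(failed_pings >= 0 && ms_since_last_response >= 0);
    if (write_state == STATE_WRITABLE && failed_pings >= kMaxFailedPings &&
        ms_since_last_response >= kMaxSilenceMs) {
      write_state = STATE_WRITE_UNRELIABLE;
    }
  }
};
```

Note that the enum values are ordered so that a smaller value means a more writable connection, which is what the comparison logic in section 3.2.1 relies on.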

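1.2.3 prune

Each endpoint of a WebRTC session may have several network interfaces, and each endpoint also has both TCP and UDP ports, so a fairly large number of connections get created between the two sides. Their quality varies: some run over stable, low-latency networks, while others cannot get through at all. For this reason, whenever a new connection is added, a pruning pass marks the poorer connections as pruned. For the exact pruning criteria, see P2PTransportChannel::PruneConnections().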

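1.2.4 ICE_ROLE

When reading the WebRTC code you will frequently run into ice_role_. It is an enum-typed variable that decides which of the two endpoints takes the initiative. It is defined as follows: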

// Whether our side of the call is driving the negotiation, or the other side.
enum IceRole {
  ICEROLE_CONTROLLING = 0,
  ICEROLE_CONTROLLED,
  ICEROLE_UNKNOWN
};

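If an endpoint's ice_role_ is ICEROLE_CONTROLLING, that endpoint drives the negotiation; otherwise the endpoint is the controlled one.
Is the offer/answer split tied to the controlling/controlled split? Judging from the comments in the code, the role is not fixed by who sends the offer: the offer side takes the controlling role by default, but ICE Lite can change that. The following comment comes from src\third_party\webrtc\pc\transportcontroller.cc: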

// The initial offer side may use ICE Lite, in which case, per RFC5245 Section 5.1.1, the answer side should take the controlling role if it is in the full ICE mode.
//
// When both sides use ICE Lite, the initial offer side must take the controlling role, and this is the default logic implemented in SetLocalDescription in PeerConnection.

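According to the comment, if the offer side uses ICE Lite, the answer side takes the controlling role, provided it runs full ICE. If both sides use ICE Lite, the default logic in PeerConnection makes the offer side the controlling one.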
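
The rule in the comment can be written as a small decision function. This is a minimal sketch, not the code in transportcontroller.cc: Mode and DecideLocalRole() are hypothetical names, and the case where both sides run full ICE is not covered by the quoted comment; the sketch assumes the usual ICE default that the initial offerer controls.

```cpp
#include <cassert>

// Same values as the enum quoted above.
enum IceRole { ICEROLE_CONTROLLING = 0, ICEROLE_CONTROLLED, ICEROLE_UNKNOWN };

// Hypothetical type for this sketch: full ICE or ICE Lite.
enum class Mode { kFull, kLite };

// Hypothetical helper: the role the local side would take.
IceRole DecideLocalRole(bool local_is_offerer, Mode local, Mode remote) {
  // Exactly one side runs ICE Lite: the full-ICE side controls.
  if (local == Mode::kFull && remote == Mode::kLite) return ICEROLE_CONTROLLING;
  if (local == Mode::kLite && remote == Mode::kFull) return ICEROLE_CONTROLLED;
  // Same mode on both sides: the offerer controls. For two full-ICE
  // agents this is an assumption taken from the ICE default.
  return local_is_offerer ? ICEROLE_CONTROLLING : ICEROLE_CONTROLLED;
}
```

For example, a full-ICE answerer talking to an ICE Lite offerer ends up controlling, even though it did not send the offer.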

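2. Creating a connection

(1) CreateConnection
Once we have the remote candidates and the local candidate, how are the two tied together?
This is where SignalPortReady comes in. The signal is emitted once a local candidate is ready. P2PTransportChannel handles it in P2PTransportChannel::OnPortReady(); part of the code is shown below: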

// A new port is available, attempt to make connections for it
void P2PTransportChannel::OnPortReady(PortAllocatorSession *session,
                                      PortInterface* port) {
 ...
  // Attempt to create a connection from this new port to all of the remote
  // candidates that we were given so far.

  std::vector<RemoteCandidate>::iterator iter;
  for (iter = remote_candidates_.begin(); iter != remote_candidates_.end();
       ++iter) {
    CreateConnection(port, *iter, iter->origin_port());
  }

  SortConnectionsAndUpdateState();
}

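The remote_candidates_ member holds all remote candidates; its entries are saved when AddIceCandidate is called. For a newly created local Port (a Candidate is created from a Port, so the two correspond), the code loops over remote_candidates_ and creates one Connection between the Port and each remote candidate.

(2) AddConnection
After a connection is created, it is stored in two members of P2PTransportChannel, connections_ and unpinged_connections_, and a number of signal handlers are attached to it.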

void P2PTransportChannel::AddConnection(Connection* connection) {
  connections_.push_back(connection);
  unpinged_connections_.insert(connection);
  connection->set_remote_ice_mode(remote_ice_mode_);
  connection->set_receiving_timeout(config_.receiving_timeout);
  connection->SignalReadPacket.connect(
      this, &P2PTransportChannel::OnReadPacket);
...
  had_connection_ = true;
}

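3. Selecting a suitable connection

3.1 The relationship between P2PTransportChannel and connection

A P2PTransportChannel manages many connections. When transmitting data, it picks the best one based on the state of each connection. The relationship between P2PTransportChannel and its connections is shown in the figure below:
[Figure: relationship between P2PTransportChannel and its connections]
Image source: the libjingle translation series, "Important Concepts: Transports, Channels, and Connections"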

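3.2 How a suitable connection is selected

As described in the section on creating connections, once a local port is ready, P2PTransportChannel::OnPortReady() is called to create connections for that port. After the connections are created, SortConnectionsAndUpdateState() is called to sort all connections and update their state, so that the most suitable connection for carrying audio and video data can be found.

3.2.1 Comparing two connections

When several connections exist, how is one connection (call it a) judged against another (call it b)? The comparison logic in P2PTransportChannel::CompareConnections() is as follows.
(1) Compare the states of the two connections: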

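If one is writable and the other is not, the writable one wins;
If both are writable or both are not, compare their exact write states (there are several of them). A smaller write-state value means a higher degree of writability, so the one with the smaller value wins;
If the two steps above still do not decide it, compare the receiving state: if one has been receiving data and the other has not, the receiving one wins;
For TCP connections, compare whether each is in the connected() state; a connected one beats a disconnected one. For UDP connections, connected() always returns true;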

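(2) If the comparison in (1) still cannot tell which connection is better, and this endpoint is the controlled one (see ICE_ROLE above):

The connection to which the controlling side assigned the higher nomination value (remote_nomination) is better;
If that is still a tie, compare when each connection last received data; the later one (closer to the present) is better;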

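(3) If (1) and (2) still cannot separate them, continue with the following:

The connection with the lower network cost wins. The cost comparison is simple: wired and loopback networks have the lowest cost and cellular networks the highest;
If that is still a tie, compare the sum of the generation values of the two ends of each connection. A larger sum means a newer connection, and newer is better;
If that is still a tie, check whether the ports at either end of each connection have been pruned. A connection with a pruned port at either end is considered worse, so an unpruned connection beats a pruned one;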

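(4) If all the comparisons above still leave a tie, compare the RTT (average round-trip time) of the two connections; the smaller RTT wins.

These steps make up the whole process of comparing connections.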
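
The chain of tie-breakers is easier to follow as code. The sketch below follows only the steps listed in this section and is not the body of P2PTransportChannel::CompareConnections(); the ConnInfo struct, its field names and CompareSketch() are hypothetical, and the real function may test additional criteria.

```cpp
#include <cassert>
#include <cstdint>

enum IceRole { ICEROLE_CONTROLLING = 0, ICEROLE_CONTROLLED, ICEROLE_UNKNOWN };

// Hypothetical snapshot of the attributes used by the comparison.
struct ConnInfo {
  bool writable;
  int write_state;             // smaller value = more writable
  bool receiving;
  bool connected;              // always true for UDP
  uint32_t remote_nomination;  // assigned by the controlling side
  int64_t last_data_received;  // timestamp, larger = more recent
  int network_cost;
  int generation_sum;          // local + remote generation
  bool pruned;                 // either end has been pruned
  int rtt;
};

// Returns >0 if a is better, <0 if b is better, 0 if they tie.
int CompareSketch(const ConnInfo& a, const ConnInfo& b, IceRole role) {
  // (1) connection states
  if (a.writable != b.writable) return a.writable ? 1 : -1;
  if (a.write_state != b.write_state) return a.write_state < b.write_state ? 1 : -1;
  if (a.receiving != b.receiving) return a.receiving ? 1 : -1;
  if (a.connected != b.connected) return a.connected ? 1 : -1;
  // (2) only on the controlled side: nomination, then recency of data
  if (role == ICEROLE_CONTROLLED) {
    if (a.remote_nomination != b.remote_nomination)
      return a.remote_nomination > b.remote_nomination ? 1 : -1;
    if (a.last_data_received != b.last_data_received)
      return a.last_data_received > b.last_data_received ? 1 : -1;
  }
  // (3) network cost, generation, pruning
  if (a.network_cost != b.network_cost) return a.network_cost < b.network_cost ? 1 : -1;
  if (a.generation_sum != b.generation_sum)
    return a.generation_sum > b.generation_sum ? 1 : -1;
  if (a.pruned != b.pruned) return a.pruned ? -1 : 1;
  // (4) round-trip time
  if (a.rtt != b.rtt) return a.rtt < b.rtt ? 1 : -1;
  return 0;
}
```

The order matters: a criterion is consulted only when every criterion before it ties, so a low RTT never outweighs, say, a worse write state.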

3.2.2 Whether to switch the selected connection

After SortConnectionsAndUpdateState(), MaybeSwitchSelectedConnection() runs and switches selected_connection_ when a switch is allowed. The decision goes as follows:
(1) If ReadyToSend() returns false for new_connection, or new_connection is the same as selected_connection_, do not switch;
(2) If selected_connection_ is null, switch;
(3) If new_connection has a higher network cost than selected_connection_ and has not received a response yet, do not switch;
(4) If CompareConnections() finds new_connection better than selected_connection_, switch;
(5) If the RTT of new_connection is lower than that of selected_connection_ by at least 10 ms, switch.

If the switch is blocked because the time since the last response was received exceeds a threshold, a delayed task is posted that runs SortConnectionsAndUpdateState() again after a while to re-evaluate whether a switch is needed.
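
The five rules can be sketched as a single predicate. This is an illustration under assumptions, not the real MaybeSwitchSelectedConnection(): SwitchInfo, its fields and ShouldSwitchSketch() are hypothetical, the result of the connection comparison is passed in as a plain integer, and the branch where the comparison favors the currently selected connection (keep it) is an assumption, since the list above only states the case where the new connection wins.

```cpp
#include <cassert>

const int kMinRttImprovementMs = 10;  // the 10 ms margin from step (5)

// Hypothetical snapshot of the attributes used by the decision.
struct SwitchInfo {
  bool ready_to_send;
  bool received_response;
  int network_cost;
  int rtt_ms;
};

// cmp_new_vs_selected: >0 if the new connection compares as better,
// <0 if the selected one does, 0 if they tie.
bool ShouldSwitchSketch(const SwitchInfo* selected, const SwitchInfo* new_conn,
                        int cmp_new_vs_selected) {
  assert(new_conn != nullptr);
  // (1) not ready to send, or already selected
  if (!new_conn->ready_to_send || new_conn == selected) return false;
  // (2) nothing selected yet
  if (selected == nullptr) return true;
  // (3) costlier network and no response received so far
  if (new_conn->network_cost > selected->network_cost &&
      !new_conn->received_response) {
    return false;
  }
  // (4) a decisive comparison settles it (keeping the selected
  // connection when it wins is an assumption of this sketch)
  if (cmp_new_vs_selected != 0) return cmp_new_vs_selected > 0;
  // (5) otherwise require a clear RTT improvement
  return new_conn->rtt_ms <= selected->rtt_ms - kMinRttImprovementMs;
}
```

The 10 ms margin in step (5) acts as hysteresis: two connections with nearly equal RTT do not keep displacing each other.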

3.2.3 How the next connection to ping is chosen

P2PTransportChannel::MaybeStartPinging() posts a MSG_CHECK_AND_PING message. Its handler, P2PTransportChannel::OnCheckAndPing(), picks a connection for the next ping once enough time has passed since the previous ping. The selection works as follows:
(1) If selected_connection_ is not null, its connected() returns true, it is writable, and the time since the last ping on it exceeds a threshold, then selected_connection_ is chosen for the next ping;
(2) If the P2PTransportChannel is weak (selected_connection_ is null or is itself weak), the best writable connection is picked for each network (there may be several interfaces and networks), forming set A. From A, the connections whose last ping is older than a threshold form set B. Finally, the connection in B whose last ping lies furthest in the past is chosen for the next ping;
(3) Step (2) may fail to find a connection to ping. In that case, among all connections that have received a ping but have not sent one since (last_ping_received > last_ping_sent), the one with the smallest last_ping_received is chosen, that is, the one whose received ping has been waiting the longest;
(4) Pingable connections that have never been pinged take priority over those that have already been pinged. If none of the never-pinged connections is pingable, all pinged connections are moved into the unpinged set and filtered together;
(5) From the unpinged set obtained in step (4), the pingable connections are selected and sorted with MorePingable(), and the most pingable one is chosen;
(6) If none of the steps above finds a suitable connection, nullptr is returned.
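
Step (3) is the easiest one to isolate, so here is a sketch of just that step. PingTimes and FindOldestNeedingPing() are hypothetical names for illustration; the other steps depend on channel state that is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical snapshot of the two timestamps used by step (3).
struct PingTimes {
  int id;
  int64_t last_ping_received;
  int64_t last_ping_sent;
};

// Among connections that received a ping after the last ping they sent,
// returns the one whose received ping is oldest, or nullptr if none.
const PingTimes* FindOldestNeedingPing(const std::vector<PingTimes>& conns) {
  const PingTimes* oldest = nullptr;
  for (const PingTimes& c : conns) {
    if (c.last_ping_received <= c.last_ping_sent) continue;  // nothing pending
    if (oldest == nullptr || c.last_ping_received < oldest->last_ping_received) {
      oldest = &c;
    }
  }
  return oldest;
}
```

Returning nullptr here corresponds to falling through to step (4) in the list above.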

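Which of two connections is more pingable is decided by the following comparison:
(1) If config_.prioritize_most_likely_candidate_pairs is true (the default is false), the port types at the two ends of each connection are compared first. If both ends of one connection are of type cricket::RELAY_PORT_TYPE and the other connection's are not, the former is more pingable. If both connections are relay-to-relay, the UDP connection is more pingable;
(2) If step (1) does not decide it, compare last_ping_sent, the timestamp of the last ping sent on each connection; the smaller timestamp is more pingable;
(3) In the initial state, no connection has been pinged yet and the comparisons above are meaningless. In that case the connection that comes earlier in the sorted connections_ (see 3.2.1 Comparing two connections) is more pingable.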
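
The three rules can be sketched as follows. PingInfo, its fields and MorePingableSketch() are hypothetical names; in particular, sorted_index stands in for a connection's position in the sorted connections_ list.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the attributes used by the comparison.
struct PingInfo {
  bool relay_to_relay;     // both ends are relay ports
  bool udp;                // UDP rather than TCP
  int64_t last_ping_sent;  // 0 if never pinged
  int sorted_index;        // position in the sorted connection list
};

// Returns whichever of a and b is more pingable.
const PingInfo& MorePingableSketch(const PingInfo& a, const PingInfo& b,
                                   bool prioritize_most_likely) {
  // (1) optional: relay-to-relay first, then UDP among relay-to-relay
  if (prioritize_most_likely) {
    if (a.relay_to_relay != b.relay_to_relay) return a.relay_to_relay ? a : b;
    if (a.relay_to_relay && a.udp != b.udp) return a.udp ? a : b;
  }
  // (2) the connection pinged least recently
  if (a.last_ping_sent != b.last_ping_sent) {
    return a.last_ping_sent < b.last_ping_sent ? a : b;
  }
  // (3) tie (for example, neither pinged yet): earlier in the sorted list
  return a.sorted_index <= b.sorted_index ? a : b;
}
```

With the flag left at its default of false, the choice reduces to least recently pinged, with the sort order as the tie-breaker.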

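4. Summary

Connection management is one of the more important modules in WebRTC. Working through it gives a good picture of how WebRTC selects the best connection and how it switches connections dynamically in response to network changes.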
