WebRTC TURN protocol source code analysis

WebRTC is an open-source media framework released by Google in 2011 that enables real-time audio and video communication in the browser. It is designed for P2P communication, although developers can also run their own servers as one end of a session. In the following scenarios with restrictive network conditions, a direct connection cannot be established, and traffic must be relayed through a TURN (Traversal Using Relays around NAT) server.

- One end is behind a symmetric NAT and the other is behind a port-restricted cone NAT or also a symmetric NAT, so a P2P connection cannot be established. This tool can detect the NAT type of your own network: https://github.com/aarant/pynat
- In environments with strict restrictions on outbound traffic, such as banks and government agencies, every external IP address that needs to be reached must be added to the gateway whitelist. A TURN server can be deployed separately, and only its public IP address needs to be whitelisted.
- A firewall with extremely strict security requirements blocks UDP entirely, or even allows only TLS traffic on port 443.

TURN process analysis

The protocol is divided into three parts:

1. Creating transmission resources on the TURN server, called an allocation
2. Transmitting data in indication mode
3. Transmitting data in channel mode

Note that the two data-transmission modes are parallel alternatives. The sequence diagram for the three parts is available at https://justme0.com/assets/pic/turn/seq.svg; the client side follows the WebRTC code and the server side follows the pion/turn code.

1. Create allocation resources

An allocation is the resource the TURN server assigns to a client. Its data structure is as follows, with only the main fields listed (see https://github.com/pion/turn/blob/master/internal/allocation/allocation.go#L23 for details). An allocation is identified by the five-tuple fiveTuple <clientIP, clientPort, svrIP, svrPort, protocol>. The latest RFC allows TCP, UDP, TLS, and DTLS as the client-server transport; pion does not support DTLS yet. The server listening port svrPort defaults to 3478 for TCP/UDP and 5349 for TLS/DTLS.

// FiveTuple is the combination (client IP address and port, server IP
// address and port, and transport protocol (currently one of UDP,
// TCP, or TLS)) used to communicate between the client and the
// server.  The 5-tuple uniquely identifies this communication
// stream.  The 5-tuple also uniquely identifies the Allocation on
// the server.
type FiveTuple struct {
	Protocol
	SrcAddr, DstAddr net.Addr
}

type Allocation struct {
	RelayAddr           net.Addr
	Protocol            Protocol
	TurnSocket          net.PacketConn
	RelaySocket         net.PacketConn
	fiveTuple           *FiveTuple
	permissionsLock     sync.RWMutex
	permissions         map[string]*Permission
	channelBindingsLock sync.RWMutex
	channelBindings     []*ChannelBind
	lifetimeTimer       *time.Timer
}

allocation structure
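To make the five-tuple identification concrete, here is a minimal Go sketch, not pion's code: `fiveTupleKey` and `allocationTable` are hypothetical names, and the allocation is reduced to its relay address. Because a Go struct of strings is comparable, it can serve directly as a map key, so two clients that differ only in source port get separate allocations.

```go
package main

import "fmt"

// fiveTupleKey is a hypothetical comparable key modelled on pion's FiveTuple:
// the client-server transport protocol plus both transport addresses.
type fiveTupleKey struct {
	Protocol string // transport between client and server, e.g. "udp"
	SrcAddr  string // client IP:port as seen by the server
	DstAddr  string // server listening IP:port, e.g. port 3478
}

// allocationTable maps each five-tuple to a (simplified) allocation,
// represented here only by the relay address handed out to the client.
type allocationTable map[fiveTupleKey]string

func main() {
	table := allocationTable{}
	a := fiveTupleKey{"udp", "198.51.100.7:40000", "203.0.113.1:3478"}
	b := fiveTupleKey{"udp", "198.51.100.7:40001", "203.0.113.1:3478"} // same client IP, new port
	table[a] = "203.0.113.1:49152"
	table[b] = "203.0.113.1:49153"

	// An equal five-tuple finds the same allocation.
	_, found := table[fiveTupleKey{"udp", "198.51.100.7:40000", "203.0.113.1:3478"}]
	fmt.Println(len(table), found) // 2 true
}
```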

Pay special attention to the permissions and channelBindings fields. permissions is a map keyed by peer address, and channelBindings is an array whose elements are also identified by peer address. The flow is described below with reference to the sequence diagram.

1.1 STUN bind request

This works exactly as in plain STUN: the server returns the client's IP address and port as it sees them, which tells the client its own external (public) address so it can gather candidates. The server keeps no state for this request.

1.2 allocation request

The client requests resource allocation. An attribute in the request specifies whether UDP or TCP is used between the TURN server and the peer; the WebRTC client hardcodes UDP. Keep in mind that the RFC is a specification and WebRTC is one implementation, which does not cover every feature in the standard. As the source code below shows, if the request does not contain the message integrity (MAC) attribute AttrMessageIntegrity, the server replies with a 401 error (CodeUnauthorized) carrying a realm and a nonce. This error response is part of the normal flow: it is how the server hands those parameters to the client.

func authenticateRequest(r Request, m *stun.Message, callingMethod stun.Method) (stun.MessageIntegrity, bool, error) {
	respondWithNonce := func(responseCode stun.ErrorCode) (stun.MessageIntegrity, bool, error) {
		nonce, err := buildNonce()
		if err != nil {
			return nil, false, err
		}

		// Nonce has already been taken
		if _, keyCollision := r.Nonces.LoadOrStore(nonce, time.Now()); keyCollision {
			return nil, false, errDuplicatedNonce
		}

		return nil, false, buildAndSend(r.Conn, r.SrcAddr, buildMsg(m.TransactionID,
			stun.NewType(callingMethod, stun.ClassErrorResponse),
			&stun.ErrorCodeAttribute{Code: responseCode},
			stun.NewNonce(nonce),
			stun.NewRealm(r.Realm),
		)...)
	}

	if !m.Contains(stun.AttrMessageIntegrity) {
		return respondWithNonce(stun.CodeUnauthorized)
	}
	...
}
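The attribute that selects the transport between the TURN server and the peer is REQUESTED-TRANSPORT (RFC 5766). It carries an IANA protocol number: 17 for UDP, which is what the WebRTC client always sends, or 6 for TCP (TCP relaying is specified in RFC 6062). A byte-layout sketch in Go, with illustrative names and without the surrounding STUN message:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const attrRequestedTransport uint16 = 0x0019

// IANA protocol numbers carried in REQUESTED-TRANSPORT.
const (
	protoTCP byte = 6
	protoUDP byte = 17
)

// requestedTransportAttr encodes the attribute: 2-byte type, 2-byte length (4),
// the protocol number, then 3 reserved (RFFU) zero bytes.
func requestedTransportAttr(proto byte) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint16(b[0:2], attrRequestedTransport)
	binary.BigEndian.PutUint16(b[2:4], 4)
	b[4] = proto
	return b
}

func main() {
	fmt.Printf("% x\n", requestedTransportAttr(protoUDP)) // 00 19 00 04 11 00 00 00
}
```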

After receiving the response, the client stores the realm and nonce and derives the key MD5(username ":" realm ":" SASLprep(password)); this key is used to compute the MESSAGE-INTEGRITY value (an HMAC-SHA1 over the message) carried in the next request. See TurnAllocateRequest::OnAuthChallenge https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/p2p/base/turn_port.cc;l=1439
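The key derivation above can be sketched with Go's standard library. Two assumptions are labelled in the code: SASLprep is skipped (the password is assumed to be plain ASCII), and the bytes passed to `messageIntegrity` are assumed to already be the exact range the RFC covers. The credentials in `main` are made up.

```go
package main

import (
	"crypto/hmac"
	"crypto/md5"
	"crypto/sha1"
	"fmt"
)

// longTermKey derives the long-term credential key MD5(username ":" realm ":" password).
// Assumption: the password is plain ASCII, so SASLprep is treated as a no-op here.
func longTermKey(username, realm, password string) []byte {
	sum := md5.Sum([]byte(username + ":" + realm + ":" + password))
	return sum[:]
}

// messageIntegrity returns the 20-byte HMAC-SHA1 carried in MESSAGE-INTEGRITY.
// Assumption: msg is already the STUN message up to MESSAGE-INTEGRITY with the
// header length adjusted as the RFC requires; preparing it is out of scope.
func messageIntegrity(key, msg []byte) []byte {
	mac := hmac.New(sha1.New, key)
	mac.Write(msg)
	return mac.Sum(nil)
}

// verify mirrors the server side: recompute the HMAC and compare in constant time.
func verify(key, msg, received []byte) bool {
	return hmac.Equal(messageIntegrity(key, msg), received)
}

func main() {
	key := longTermKey("alice", "example.org", "secret") // hypothetical credentials
	tag := messageIntegrity(key, []byte("stun-message-bytes"))
	fmt.Println(len(key), len(tag), verify(key, []byte("stun-message-bytes"), tag)) // 16 20 true
}
```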

1.3 The second allocation request

The client sends the allocation request again, this time carrying the MESSAGE-INTEGRITY, the realm, and the nonce returned in the previous step. The server verifies the MESSAGE-INTEGRITY; if it is valid, the server creates the Allocation data structure and assigns a timer to the lifetimeTimer field. The allocation lifetime defaults to 10 minutes, and the WebRTC client refreshes it one minute early, i.e., it sends a keep-alive every 9 minutes. To release the resource, the client sets the lifetime to 0 in a Refresh request, as discussed in the next section. Once the allocation exists, the server binds a UDP port for communicating with peers and runs a for loop that waits for packets from them.

func (m *Manager) CreateAllocation(fiveTuple *FiveTuple, turnSocket net.PacketConn, requestedPort int, lifetime time.Duration) (*Allocation, error) {
	...
	go a.packetHandler(m)
	...
}

// Receive and handle packets arriving on the relay port (UDP)
func (a *Allocation) packetHandler(m *Manager) {
	buffer := make([]byte, rtpMTU)

	for {
		n, srcAddr, err := a.RelaySocket.ReadFrom(buffer)
		if err != nil {
			m.DeleteAllocation(a.fiveTuple)
			return
		}

		a.log.Debugf("relay socket %s received %d bytes from %s",
			a.RelaySocket.LocalAddr().String(),
			n,
			srcAddr.String())
		...
	}
}

1.4 allocation refresh request

The Refresh request carries the lifetime (expire) after which the server releases the allocation. A value of 0 releases it immediately; the client refreshes this value periodically to keep the allocation alive.

2. Transmitting data in indication mode

Overview of indication methods from RFC

2.1 create permission

Before transferring data in indication mode, the client must first create a permission. When the client adds a remote candidate, it creates a TurnEntry object whose constructor sends a CreatePermission request carrying the peer address.

TurnEntry::TurnEntry(TurnPort* port, Connection* conn, int channel_id)
    : port_(port),
      channel_id_(channel_id),
      ext_addr_(conn->remote_candidate().address()),
      state_(STATE_UNBOUND),
      connections_({conn}) {
  // Creating permission for `ext_addr_`.
  SendCreatePermissionRequest(0);
}

from https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/p2p/base/turn_port.cc;l=1764

CreatePermission packet capture

When the server receives the request, it adds an entry to the permissions map of the Allocation, keyed by the peer address, and starts a timer; when the timer fires, the permission is deleted. The permission timeout is 5 minutes. Finally, the server sends a response back to the client.

After receiving the response, the client schedules the next refresh (keep-alive):

void TurnEntry::OnCreatePermissionSuccess() {
  RTC_LOG(LS_INFO) << port_->ToString() << ": Create permission for "
                   << ext_addr_.ToSensitiveString() << " succeeded";
  if (port_->callbacks_for_test_) {
    port_->callbacks_for_test_->OnTurnCreatePermissionResult(
        TURN_SUCCESS_RESULT_CODE);
  }

  // If `state_` is STATE_BOUND, the permission will be refreshed
  // by ChannelBindRequest.
  if (state_ != STATE_BOUND) {
    // Refresh the permission request about 1 minute before the permission
    // times out.
    TimeDelta delay = kTurnPermissionTimeout - TimeDelta::Minutes(1);
    SendCreatePermissionRequest(delay.ms());
    RTC_LOG(LS_INFO) << port_->ToString()
                     << ": Scheduled create-permission-request in "
                     << delay.ms() << "ms.";
  }
}

from https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/p2p/base/turn_port.cc;l=1846

2.2 indication

Data sent from the client to the server is carried in a Send indication, and data sent from the server to the client is carried in a Data indication. (I feel these names do not correspond very well.)

A Send indication has more overhead than channel mode, 36 bytes, because it carries the peer address, i.e., the destination. The server checks whether a permission exists for that peer address; if it does, the server extracts the carried data and sends it to the peer.

Send indication packet capture

When a packet arrives from a peer, the server checks whether a channel has been established. If so, channel mode is preferred; otherwise, the packet is sent back to the client in indication mode, provided a permission exists. A Data indication has the same structure as a Send indication.
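The server-side choice for a packet arriving from a peer can be sketched as follows. The `relayState` type and its string-keyed maps are hypothetical simplifications of the Allocation structure; the logic follows the order described above: a bound channel wins, otherwise a permission allows a Data indication, otherwise the packet is dropped.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// relayState is a hypothetical slice of one allocation's state: permissions are
// keyed by peer IP, channel bindings by peer "IP:port".
type relayState struct {
	permittedIPs map[string]bool
	channels     map[string]uint16
}

// frameForClient decides how a packet from peer (ip, port) is relayed to the client.
func (s relayState) frameForClient(ip string, port int) string {
	if ch, ok := s.channels[net.JoinHostPort(ip, strconv.Itoa(port))]; ok {
		return fmt.Sprintf("ChannelData 0x%04X", ch) // channel mode is preferred
	}
	if s.permittedIPs[ip] {
		return "Data indication" // fall back to indication mode
	}
	return "drop" // neither a channel nor a permission exists
}

func main() {
	s := relayState{
		permittedIPs: map[string]bool{"198.51.100.20": true},
		channels:     map[string]uint16{"198.51.100.20:5000": 0x4000},
	}
	fmt.Println(s.frameForClient("198.51.100.20", 5000)) // ChannelData 0x4000
	fmt.Println(s.frameForClient("198.51.100.20", 6000)) // Data indication
	fmt.Println(s.frameForClient("203.0.113.9", 5000))   // drop
}
```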

Having seen this transmission method, do you notice any problem?

Note that the client can specify any peer address when creating a permission. If the service is deployed on an internal network, a user may maliciously scan internal servers, similar to an SSRF (server-side request forgery) vulnerability; see this report https://hackerone.com/reports/333419?from_wecom=1

The attacker may proxy TCP connections to the internal network by setting the `XOR-PEER-ADDRESS` of the TURN connect message (method `0x000A`, https://tools.ietf.org/html/rfc6062#section-4.3) to a private IPv4 address.

UDP packets may be proxied by setting the `XOR-PEER-ADDRESS` to a private IP in the TURN send message indication (method `0x0006`, https://tools.ietf.org/html/rfc5766#section-10).

For example, suppose a server on the internal network provides an HTTP service for internal use. When creating the allocation, the attacker requests TCP between the TURN server and the peer (pion/turn does not support this, coturn does), then specifies an internal address as the peer when creating the permission and sending data (found by brute-force enumeration), and wraps HTTP requests in the TURN protocol, thereby obtaining data from the internal server.

The channel mode introduced in the next section has the same problem. As a mitigation, if TURN is used only to relay WebRTC traffic, which uses only UDP, you can disable TCP relay allocation and block the UDP ports of commonly used protocols. For how to verify peer addresses, see the author's detailed discussion in the original report.

3. Transmitting data in channel mode

Overview of channel methods from RFC

3.1 channel bind request

As with permissions, the client requests a channel before sending data over it. TurnEntry requests the channel the first time it sends real data; see the code below, keyword SendChannelBindRequest.

int TurnEntry::Send(const void* data,
                    size_t size,
                    bool payload,
                    const rtc::PacketOptions& options) {
  rtc::ByteBufferWriter buf;
  if (state_ != STATE_BOUND ||
      !port_->TurnCustomizerAllowChannelData(data, size, payload)) {
    // If we haven't bound the channel yet, we have to use a Send Indication.
    // The turn_customizer_ can also make us use Send Indication.
    TurnMessage msg(TURN_SEND_INDICATION);
    msg.AddAttribute(std::make_unique<StunXorAddressAttribute>(
        STUN_ATTR_XOR_PEER_ADDRESS, ext_addr_));
    msg.AddAttribute(
        std::make_unique<StunByteStringAttribute>(STUN_ATTR_DATA, data, size));

    port_->TurnCustomizerMaybeModifyOutgoingStunMessage(&msg);

    const bool success = msg.Write(&buf);
    RTC_DCHECK(success);

    // If we're sending real data, request a channel bind that we can use later.
    if (state_ == STATE_UNBOUND && payload) {
      SendChannelBindRequest(0);
      state_ = STATE_BINDING;
    }
  } else {
    // If the channel is bound, we can send the data as a Channel Message.
    buf.WriteUInt16(channel_id_);
    buf.WriteUInt16(static_cast<uint16_t>(size));
    buf.WriteBytes(reinterpret_cast<const char*>(data), size);
  }
  rtc::PacketOptions modified_options(options);
  modified_options.info_signaled_after_sent.turn_overhead_bytes =
      buf.Length() - size;
  return port_->Send(buf.Data(), buf.Length(), modified_options);
}

from https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/p2p/base/turn_port.cc;l=1846

The ChannelBind request carries an important field, the channel number, which the client generates to identify the channel. Channel numbers are restricted as follows (RFC 5766):

- 0x0000-0x3FFF: cannot be used as channel numbers
- 0x4000-0x7FFF: valid channel numbers (16384 values)
- 0x8000-0xFFFF: reserved for future use
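A small Go sketch of the range rule. The `channelAllocator` type is hypothetical: it hands out numbers sequentially from 0x4000, which is only one way a client might generate them; the point is that the valid range holds exactly 16384 numbers.

```go
package main

import (
	"errors"
	"fmt"
)

// Valid channel number range from RFC 5766.
const (
	minChannel uint16 = 0x4000
	maxChannel uint16 = 0x7FFF
)

func validChannel(n uint16) bool { return n >= minChannel && n <= maxChannel }

// channelAllocator hands out channel numbers sequentially (hypothetical policy).
type channelAllocator struct{ next uint16 }

func (a *channelAllocator) allocate() (uint16, error) {
	if a.next == 0 {
		a.next = minChannel // zero value: start at the bottom of the range
	}
	if a.next > maxChannel {
		return 0, errors.New("channel numbers exhausted")
	}
	n := a.next
	a.next++
	return n, nil
}

func main() {
	fmt.Println(int(maxChannel-minChannel) + 1)                                   // 16384
	fmt.Println(validChannel(0x3FFF), validChannel(0x4000), validChannel(0x8000)) // false true false
	var a channelAllocator
	first, _ := a.allocate()
	fmt.Printf("0x%04X\n", first) // 0x4000
}
```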

ChannelBind request packet capture

After receiving the ChannelBind request, the server adds an entry to the channelBindings array of the Allocation; the channel is identified by either its channel number or its peer address. The server also starts a timer with a 10-minute timeout; if the channel already exists, the request just refreshes its expiry. In addition, a ChannelBind request refreshes the expiry of the corresponding permission. Finally, the server sends a response to the client.

After receiving the response, the client likewise schedules the next keep-alive. It refreshes the binding one minute before the permission timeout, i.e., every 4 minutes, which keeps both the channel and the permission alive:

void TurnChannelBindRequest::OnResponse(StunMessage* response) {
  RTC_LOG(LS_INFO) << port_->ToString()
                   << ": TURN channel bind requested successfully, id="
                   << rtc::hex_encode(id())
                   << ", code=0"  // Makes logging easier to parse.
                      ", rtt="
                   << Elapsed();

  if (entry_) {
    entry_->OnChannelBindSuccess();
    // Refresh the channel binding just under the permission timeout
    // threshold. The channel binding has a longer lifetime, but
    // this is the easiest way to keep both the channel and the
    // permission from expiring.
    TimeDelta delay = kTurnPermissionTimeout - TimeDelta::Minutes(1);
    entry_->SendChannelBindRequest(delay.ms());
    RTC_LOG(LS_INFO) << port_->ToString() << ": Scheduled channel bind in "
                     << delay.ms() << "ms.";
  }
}

from https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/p2p/base/turn_port.cc;l=1732

3.2 channel data

As the TurnEntry::Send() code in the previous section shows, once a channel has been established, data is sent in channel mode (channel mode takes priority). Its overhead is only 4 bytes, much less than indication mode:
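The 4-byte ChannelData header can be sketched in Go as follows (illustrative function names; UDP framing only, since over TCP the RFC additionally requires padding each message to a multiple of 4 bytes). Valid channel numbers start with the bits 01, while STUN messages start with 00, which is how a receiver can tell the two apart on the same socket.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// encodeChannelData prefixes data with the 4-byte ChannelData header:
// 16-bit channel number, then 16-bit data length.
func encodeChannelData(channel uint16, data []byte) []byte {
	b := make([]byte, 4, 4+len(data))
	binary.BigEndian.PutUint16(b[0:2], channel)
	binary.BigEndian.PutUint16(b[2:4], uint16(len(data)))
	return append(b, data...)
}

// decodeChannelData parses a ChannelData message received over UDP.
func decodeChannelData(b []byte) (uint16, []byte, error) {
	if len(b) < 4 {
		return 0, nil, errors.New("too short")
	}
	ch := binary.BigEndian.Uint16(b[0:2])
	n := int(binary.BigEndian.Uint16(b[2:4]))
	if ch < 0x4000 || ch > 0x7FFF || len(b)-4 < n {
		return 0, nil, errors.New("not a valid ChannelData message")
	}
	return ch, b[4 : 4+n], nil
}

func main() {
	msg := encodeChannelData(0x4000, make([]byte, 100))
	fmt.Println(len(msg) - 100) // 4 bytes of overhead
	ch, payload, err := decodeChannelData(msg)
	fmt.Println(ch == 0x4000, len(payload), err) // true 100 <nil>
}
```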

ChannelData packet capture

Having seen both transmission methods, I had a question. Channel mode has less overhead than permission mode, and its keep-alive also refreshes the permission's expiry; in short, it is strictly more capable. Could the client use channel mode only, sending a ChannelBind instead of a CreatePermission from the start (in the TurnEntry constructor)?

This question was asked on Stack Overflow https://stackoverflow.com/questions/75611078/why-not-use-only-channel-data-in-webrtc-turn-client . A WebRTC expert pointed to the ICE document RFC 5245, which recommends creating a channel after ICE has completed, that is, after a candidate pair has been selected. In fact, TURN does not care what data it carries at all; the concept of a candidate belongs to WebRTC's ICE, and the recommendation comes from the ICE document. I think the native implementation could be optimized to use channel mode only.

To summarize: request-type messages must be authenticated with MESSAGE-INTEGRITY, including Allocate and Refresh requests, CreatePermission requests, and ChannelBind requests. TURN involves three kinds of timers, corresponding to the three parts of this article; the client must send periodic "heartbeats" to keep them alive, and a permission can also be kept alive by ChannelBind.
