How to Create a Video Chat on Android: A Beginner's Guide to WebRTC

Introduction to WebRTC

WebRTC is a technology for building video chat and conferencing applications. It allows you to create a peer-to-peer connection between a mobile device and a browser to stream media. You can find more details about how it works and its general principles in the article about WebRTC.

Two ways to achieve video communication with WebRTC on Android

  • The easiest and fastest option is to use one of the many commercial products, such as Twilio or LiveSwitch. They provide their own
    SDKs for various platforms and implement features out of the box, but they also have disadvantages: they are paid, and their functionality is limited to what the vendor offers rather than anything you can think of.
  • Another option is to use one of the existing libraries. This approach requires more code, but saves money and gives you more flexibility in implementing functionality. In this article, we take the second option and use Google's native WebRTC library for Android (https://webrtc.github.io/webrtc-org/native-code/android/).

Creating a connection

Creating a WebRTC connection consists of two steps:

  1. Establishing a logical connection: the devices must agree on the data format, codecs, and so on.
  2. Establishing a physical connection: the devices must learn each other's network addresses.

Note that at the beginning of the connection, a signaling mechanism is used to exchange data between the devices. The signaling mechanism can be any channel capable of transferring data, such as sockets.
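The WebRTC library does not provide a signaling mechanism, so you implement it yourself on top of whatever channel you choose. As a rough sketch, it might look like the following interface; note that the `Signaling` interface and its method names are our own illustration, not part of the WebRTC library.

```kotlin
import org.webrtc.IceCandidate
import org.webrtc.SessionDescription

// Hypothetical signaling interface: any reliable channel (WebSockets,
// plain sockets, a push service) can implement it. The names below are
// illustrative only.
interface Signaling {
    // Outgoing messages to the other peer
    fun sendSdpOffer(sdpOffer: SessionDescription)
    fun sendSdpAnswer(sdpAnswer: SessionDescription)
    fun sendIceCandidate(iceCandidate: IceCandidate)

    // Callbacks invoked when the other peer's messages arrive
    var onSdpOfferReceive: (SessionDescription) -> Unit
    var onSdpAnswerReceive: (SessionDescription) -> Unit
    var onIceCandidateReceive: (IceCandidate) -> Unit
}
```

The `signaling` object used in the snippets below is assumed to be an implementation of such an interface.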

Suppose we want to establish a video connection between two devices. To do this, we need to establish a logical connection between them.

Logical connection

The logical connection is established using the Session Description Protocol (SDP). The first peer must:

  1. Create a PeerConnection object.
  2. Form an SDP offer containing data about the upcoming session and send it to the interlocutor via the signaling mechanism.

val peerConnectionFactory: PeerConnectionFactory
lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
    val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
    peerConnection = peerConnectionFactory.createPeerConnection(
        rtcConfig,
        object : PeerConnection.Observer {
            ...
        }
    )!!
}

fun sendSdpOffer() {
    peerConnection.createOffer(
        object : SdpObserver {
            override fun onCreateSuccess(sdpOffer: SessionDescription) {
                // Save the offer as the local description and send it
                // to the interlocutor via the signaling mechanism
                peerConnection.setLocalDescription(sdpObserver, sdpOffer)
                signaling.sendSdpOffer(sdpOffer)
            }

            ...

        }, MediaConstraints()
    )
}

The second peer, in turn, must:

  1. Also create a PeerConnection object.
  2. Receive the SDP offer from the first peer via the signaling mechanism and save it.
  3. Form an SDP answer and send it back, also via the signaling mechanism.

fun onSdpOfferReceive(sdpOffer: SessionDescription) {
    // Save the received SDP offer as the remote description
    peerConnection.setRemoteDescription(sdpObserver, sdpOffer)
    sendSdpAnswer()
}

// Forming and sending the SDP answer
fun sendSdpAnswer() {
    peerConnection.createAnswer(
        object : SdpObserver {
            override fun onCreateSuccess(sdpAnswer: SessionDescription) {
                // Save the answer as the local description and send it
                // to the interlocutor via the signaling mechanism
                peerConnection.setLocalDescription(sdpObserver, sdpAnswer)
                signaling.sendSdpAnswer(sdpAnswer)
            }

            ...

        }, MediaConstraints()
    )
}

After the first peer receives the SDP answer, it saves it as the remote description.

fun onSdpAnswerReceive(sdpAnswer: SessionDescription) {
    peerConnection.setRemoteDescription(sdpObserver, sdpAnswer)
}

A logical connection is considered established after a successful exchange of SessionDescription objects.

Physical connection

We now need to establish a physical connection between the devices, which is often a non-trivial task. Typically, devices on the Internet do not have public addresses because they sit behind routers and firewalls. To solve this problem, WebRTC uses ICE (Interactive Connectivity Establishment).
STUN and TURN servers are an important part of ICE. They serve one purpose: to establish a connection between devices that do not have public addresses.

STUN server

A device sends a request to a STUN server and receives its public address in response. It then sends this address to the interlocutor via the signaling mechanism. After the interlocutor does the same, the devices know each other's network locations and are ready to transfer data to each other.

TURN server

In some cases a router may impose a "symmetric NAT" restriction, which makes a direct connection between the devices impossible. In that case, a TURN server is used. It acts as a relay through which all data passes. You can read more in Mozilla's WebRTC documentation.

As we have seen, STUN and TURN servers play an important role in establishing the physical connection between devices. It is for this purpose that we pass a list of available ICE servers when creating the PeerConnection object.
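As a sketch, the ICE server list might be built as follows. The `buildIceServers` function name is our own; `stun.l.google.com:19302` is Google's public STUN server, often used for testing, while the TURN URL and credentials are placeholders for your own server.

```kotlin
import org.webrtc.PeerConnection

// Build the ICE server list that is passed to createPeerConnection().
// The TURN URL, username, and password below are placeholders.
fun buildIceServers(): List<PeerConnection.IceServer> = listOf(
    PeerConnection.IceServer.builder("stun:stun.l.google.com:19302")
        .createIceServer(),
    PeerConnection.IceServer.builder("turn:turn.example.com:3478")
        .setUsername("user")
        .setPassword("password")
        .createIceServer()
)
```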

To establish a physical connection, a peer generates ICE candidates: objects containing information about how the device can be found on the network. It then sends them to the other peer via the signaling mechanism.

lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
    val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
    peerConnection = peerConnectionFactory.createPeerConnection(
        rtcConfig,
        object : PeerConnection.Observer {
            // Called by the library when a new ICE candidate is generated
            override fun onIceCandidate(iceCandidate: IceCandidate) {
                signaling.sendIceCandidate(iceCandidate)
            }

            ...
        }
    )!!
}

The second peer then receives the first peer's ICE candidates via the signaling mechanism and adds them to its own PeerConnection. It also generates its own ICE candidates and sends them back.

fun onIceCandidateReceive(iceCandidate: IceCandidate) {
    peerConnection.addIceCandidate(iceCandidate)
}

Now that the peers have exchanged their addresses, you can start sending and receiving data.

Receiving data

After the library has established a logical and physical connection with the interlocutor, it calls the onAddTrack callback and passes it a MediaStream object containing the interlocutor's VideoTrack and AudioTrack.

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
    val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
    peerConnection = peerConnectionFactory.createPeerConnection(
        rtcConfig,
        object : PeerConnection.Observer {
            override fun onIceCandidate(iceCandidate: IceCandidate) {
                ...
            }

            // Called by the library when the interlocutor's media arrives
            override fun onAddTrack(
                rtpReceiver: RtpReceiver?,
                mediaStreams: Array<out MediaStream>
            ) {
                onTrackAdded(mediaStreams)
            }
        }
    )!!
}

Next, we must retrieve the VideoTrack from the MediaStream and display it on the screen.

private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
    val videoTrack: VideoTrack? = mediaStreams.mapNotNull {
        it.videoTracks.firstOrNull()
    }.firstOrNull()

    displayVideoTrack(videoTrack)
}

To display a VideoTrack, you need to pass it an object that implements the VideoSink interface. For this purpose, the library provides the SurfaceViewRenderer class.

fun displayVideoTrack(videoTrack: VideoTrack?) {
    videoTrack?.addSink(binding.surfaceViewRenderer)
}

To get the interlocutor's voice, we don't need to do anything extra - the library does everything for us. However, if we want to fine-tune the sound, we can get an AudioTrack object and use it to change the audio settings.

var audioTrack: AudioTrack? = null

private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
    audioTrack = mediaStreams.mapNotNull {
        it.audioTracks.firstOrNull()
    }.firstOrNull()
}

For example, we can mute the interlocutor as follows:

fun muteAudioTrack() {
    // audioTrack is nullable, so use a safe call
    audioTrack?.setEnabled(false)
}

Sending data

Sending video and audio from your device also starts with creating a PeerConnection object and exchanging SDP packets and ICE candidates via the signaling mechanism. But unlike the receiving case, where the library hands us the interlocutor's media stream, here we must first obtain a media stream from our own device, wrap it in a MediaStream object containing an AudioTrack and a VideoTrack, and pass it to the library so it can be delivered to the interlocutor.

fun createLocalConnection() {
    localPeerConnection = peerConnectionFactory.createPeerConnection(
        rtcConfig,
        object : PeerConnection.Observer {
            ...
        }
    )!!

    val localMediaStream = getLocalMediaStream()
    localPeerConnection.addStream(localMediaStream)

    localPeerConnection.createOffer(
        object : SdpObserver {
            ...
        }, MediaConstraints()
    )
}

Now we need to create a MediaStream object and add the AudioTrack and VideoTrack objects to it.

val context: Context

private fun getLocalMediaStream(): MediaStream? {
    val stream = peerConnectionFactory.createLocalMediaStream("user")

    val audioTrack = getLocalAudioTrack()
    stream.addTrack(audioTrack)

    val videoTrack = getLocalVideoTrack(context)
    stream.addTrack(videoTrack)

    return stream
}

Getting the AudioTrack:

private fun getLocalAudioTrack(): AudioTrack {
    val audioConstraints = MediaConstraints()
    val audioSource = peerConnectionFactory.createAudioSource(audioConstraints)
    return peerConnectionFactory.createAudioTrack("user_audio", audioSource)
}

Getting a VideoTrack is a little more difficult. First, get a list of all the device's cameras and pick one (here, the front-facing camera if available).

lateinit var capturer: CameraVideoCapturer

private fun getLocalVideoTrack(context: Context): VideoTrack {
    val cameraEnumerator = Camera2Enumerator(context)
    val camera = cameraEnumerator.deviceNames.firstOrNull {
        cameraEnumerator.isFrontFacing(it)
    } ?: cameraEnumerator.deviceNames.first()

    ...

}

Next, create a CameraVideoCapturer object that will capture the image.

private fun getLocalVideoTrack(context: Context): VideoTrack {

    ...

    capturer = cameraEnumerator.createCapturer(camera, null)
    val surfaceTextureHelper = SurfaceTextureHelper.create(
        "CaptureThread",
        EglBase.create().eglBaseContext
    )
    val videoSource =
        peerConnectionFactory.createVideoSource(capturer.isScreencast)
    capturer.initialize(surfaceTextureHelper, context, videoSource.capturerObserver)

    ...

}

Now that we have a CameraVideoCapturer, start the capture and add the resulting VideoTrack to the MediaStream.

private fun getLocalMediaStream(): MediaStream? {

    ...

    val videoTrack = getLocalVideoTrack(context)
    stream.addTrack(videoTrack)

    return stream
}

private fun getLocalVideoTrack(context: Context): VideoTrack {

    ...

    // Capture at 1024x720, 30 frames per second
    capturer.startCapture(1024, 720, 30)

    return peerConnectionFactory.createVideoTrack("user0_video", videoSource)
}

After the MediaStream is created and added to the PeerConnection, the library forms an SDP offer, and the SDP packet exchange described above takes place through the signaling mechanism. When this process is complete, the interlocutor starts receiving our video stream. Congratulations, the connection is established.

Many-to-many connections

We have considered one-to-one connections. WebRTC also allows you to create many-to-many connections. In its simplest form, this works exactly like a one-to-one connection, except that the PeerConnection object, as well as the exchange of SDP packets and ICE candidates, is created not once but once per participant. This approach has disadvantages:

  • The device is heavily loaded because it needs to send the same stream of data to every interlocutor
  • It is difficult or even impossible to implement additional functions such as video recording and transcoding

In this case, WebRTC can be used in combination with a media server that takes on these tasks. For the client, the process is exactly the same as for a direct connection to the interlocutor's device, except that the media stream is sent not to all participants but only to the media server, which retransmits it to the other participants.

Conclusion

We have considered the easiest way to create a WebRTC connection on Android. If something is still unclear after reading, go through all the steps again and try to implement them yourself; once you grasp the key points, using this technology in practice should not be a problem. If you want to learn more about this technology, check out our WebRTC security guide.


Originally published at blog.csdn.net/CJohn1994/article/details/127017891