Best Practice | Use WebSocket to build a real-time face verification and comparison service

To improve user experience, speed up verification, and strengthen security during the verification process, face identity verification products introduce real-time communication technologies to prompt users to adjust their posture, guide them through liveness actions, and perform liveness detection, all in real time. The face identity verification product uses two real-time communication technologies: WebSocket and WebRTC.

This article mainly introduces WebSocket, which is used in the floating-layer liveness detection mode of face identity verification.

The core technology behind floating-layer liveness detection: WebSocket

In floating-layer liveness detection, the key feature is "real-time": face distance, face occlusion, and similar conditions are detected in real time. Before WebSocket, the browser had to request data from the server through HTTP requests. Later HTTP versions and resourceful developers produced various "quasi-real-time" solutions, such as polling, long polling, and long connections, but all of them remain bound to the request/response pair: the browser must initiate a request before the server can send a response.

Polling and long polling

The earliest form of "real-time" was not truly real-time: the client asks the server at regular intervals whether there is new data, and the polling interval determines how fresh the data is.

The polling process is as follows:

  1. The client initiates a request.
  2. The server responds immediately, regardless of whether there is new data.
  3. After waiting n seconds (one polling interval), the client initiates a request again.
  4. The server again responds immediately.
  5. And so on.

If a data update occurs between two polls (the polling interval is typically measured in seconds, so updates almost always land between two polls), the latest data is delivered only at the next poll, after a delay of up to one polling interval.

To reduce this delay, developers came up with long polling.
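The size of this delay can be estimated with a little arithmetic. The following is a minimal sketch, assuming that polls fire at exact multiples of the interval and that network latency is ignored; the function name pollingDelay is illustrative and not part of any API.

```javascript
// Hypothetical helper: how long an update waits before the next poll picks it up.
// Assumes polls fire at t = 0, interval, 2 * interval, ... and ignores network latency.
function pollingDelay(updateTimeMs, intervalMs) {
    const sinceLastPoll = updateTimeMs % intervalMs;
    // An update that lands exactly on a poll is picked up with no wait.
    return sinceLastPoll === 0 ? 0 : intervalMs - sinceLastPoll;
}

// With a 3-second interval, an update at t = 3.2 s waits for the poll at t = 6 s.
console.log(pollingDelay(3200, 3000)); // 2800
```

On average an update waits about half a polling interval, so shortening the interval reduces the delay only at the cost of more requests.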

The process of long polling is as follows:

  1. The client initiates a request.
  2. The server does not respond immediately, but waits until a data update arrives before responding to the client. (If there is still no update after a certain waiting time, it responds anyway.)
  3. After the client processes the response, it immediately initiates the next long polling request.
  4. And so on.

Compared with polling, the advantage of long polling is that data updates reach the client with almost no delay. It also reduces the number of requests, and therefore connections, between the client and the server, which lowers the cost of connection establishment.
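The server-side behavior of long polling, holding a request open until data arrives or a timeout expires, can be modeled in a few lines. This is only an in-memory sketch with no real HTTP involved; the class name LongPollChannel and its methods are hypothetical.

```javascript
// Minimal in-memory model of a long-polling endpoint (illustrative, no real HTTP).
class LongPollChannel {
    constructor(timeoutMs = 30000) {
        this.timeoutMs = timeoutMs;
        this.waiting = []; // requests currently being held open
    }

    // Called when a client "request" arrives: hold it until data or timeout.
    poll(respond) {
        const entry = { respond, timer: null };
        entry.timer = setTimeout(() => {
            this.waiting = this.waiting.filter((e) => e !== entry);
            respond({ timeout: true, data: null });
        }, this.timeoutMs);
        this.waiting.push(entry);
    }

    // Called when new data is available: answer every held request at once.
    publish(data) {
        const held = this.waiting;
        this.waiting = [];
        for (const entry of held) {
            clearTimeout(entry.timer);
            entry.respond({ timeout: false, data });
        }
    }
}

const channel = new LongPollChannel(1000);
const received = [];
channel.poll((res) => received.push(res)); // request is held, nothing delivered yet
channel.publish('face too far');           // update arrives, held request is answered
console.log(received); // [ { timeout: false, data: 'face too far' } ]
```

After receiving a response, a real client would immediately call poll again, which is step 3 of the process above.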

Short and long connections

Polling and long polling are often compared with short connections and long connections. In general, a short connection means that a new TCP connection is established for each request, while a long connection means that the same TCP connection is reused for multiple requests.

However, whether the connection is short or long, and whether the client uses polling or long polling, every data update still requires the client to initiate a request before the server can send anything. In floating-layer liveness detection, many data updates arrive in batches and must reach the client promptly, so a more real-time communication method is needed.

WebSocket

WebSocket provides two-way communication between the browser and the server. Like a raw socket, it lets either side send data at any time, and it is an application-layer protocol built on a TCP connection. The connection is established through the HTTP protocol; once it is established, either end can actively send messages to the other.

How does WebSocket establish a connection?

In short, WebSocket borrows HTTP to get started: it takes over the TCP connection that the HTTP request was using and then speaks its own protocol over it.

To remain compatible with HTTP servers, WebSocket uses the HTTP protocol to establish connections. First, the client sends an HTTP Upgrade request asking to switch protocols:

GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

This is a standard HTTP GET request. One header is key:

  1. Upgrade: a header field defined in HTTP/1.1 for switching protocols. It asks the server, if it supports the target protocol, to switch the application-layer protocol while keeping the underlying TCP connection unchanged. Possible targets include WebSocket and HTTP/2.

After the server receives the protocol switching request, it checks whether it supports the requested protocol. If it does, it replies with a successful upgrade response, 101 Switching Protocols:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat
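The Sec-WebSocket-Accept value in the response is derived from the Sec-WebSocket-Key in the request: RFC 6455 specifies appending a fixed GUID to the key, hashing the result with SHA-1, and Base64-encoding the digest. A minimal sketch using Node.js's built-in crypto module follows; the function name computeAcceptKey is illustrative.

```javascript
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey) {
    return crypto
        .createHash('sha1')
        .update(secWebSocketKey + WS_GUID)
        .digest('base64');
}

// The key from the request above yields the accept value in the response above.
console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ=='));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client performs the same computation and rejects the connection if the values differ, which confirms that the server actually understood the WebSocket handshake.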

At this point, the TCP connection originally used for HTTP has been taken over as a WebSocket connection. Below is a Node.js demo of a WebSocket server.

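const http = require('http');
const { WebSocketServer } = require('ws');

// Plain HTTP server; the WebSocket server is attached to it in noServer mode.
const httpserver = http.createServer();
const wsserver = new WebSocketServer({ noServer: true });

// The 'upgrade' event fires when a client sends an HTTP Upgrade request.
httpserver.on('upgrade', function upgrade(request, socket, head) {
    // Complete the handshake and take over the underlying TCP socket.
    wsserver.handleUpgrade(request, socket, head, function done(ws) {
        // Echo every received message back to the client.
        ws.on('message', (data, isBinary) => {
            ws.send('message: ' + data + ' received!');
        });
    });
});

// Port 8080 is an arbitrary choice for this demo.
httpserver.listen(8080);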

Because WebSocket relies on the HTTP protocol to establish its connection, many people mistakenly believe that WebSocket is built on top of HTTP. In fact, once the connection is established, WebSocket has nothing more to do with HTTP. Like HTTP, it is an application-layer protocol that runs on top of TCP.

WebSocket frame format

WebSocket uses a custom binary framing format to divide each application message into one or more frames. The peer waits until the complete message is received before assembling and processing it.

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+

Two key fields are worth introducing:

  • FIN. Occupies 1 bit. Indicates whether this is the final frame of a message. A message may be split into multiple frames; when the receiver sees a frame with FIN set to 1, it concatenates it with the preceding frames to form the complete message. Note that TCP itself is a byte stream with no message boundaries, so it has no "sticky packets"; the sticky packet problem comes from application-layer protocol design that lacks proper framing.
  • opcode. Occupies 4 bits.
    • 8 means a close frame. This control frame should be sent when actively closing the connection; otherwise the peer reports close code 1006. The close code can therefore be used to tell a normal close from an abnormal one.
    • 9 represents a ping frame, and 10 represents a pong frame. The ping/pong mechanism detects whether the connection has dropped when no messages have been exchanged for a long time. Browsers currently expose no interface for sending control frames, so in practice only the server can send a ping, and the browser replies with a pong.
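To make the frame layout concrete, the sketch below decodes a single complete frame from a Node.js Buffer, reading FIN, opcode, the MASK bit, and the payload length exactly as laid out in the diagram above. The function name parseFrame is illustrative, and the sketch assumes the buffer holds one whole frame; a real implementation must also handle partial reads and fragmented messages.

```javascript
// Decode a single, complete WebSocket frame held in a Buffer.
function parseFrame(buf) {
    const fin = (buf[0] & 0x80) !== 0;    // FIN: 1 bit
    const opcode = buf[0] & 0x0f;         // opcode: 4 bits
    const masked = (buf[1] & 0x80) !== 0; // MASK: 1 bit
    let length = buf[1] & 0x7f;           // Payload len: 7 bits
    let offset = 2;

    if (length === 126) {
        // The next 2 bytes carry the real length.
        length = buf.readUInt16BE(offset);
        offset += 2;
    } else if (length === 127) {
        // The next 8 bytes carry the real length.
        length = Number(buf.readBigUInt64BE(offset));
        offset += 8;
    }

    let payload;
    if (masked) {
        const key = buf.subarray(offset, offset + 4);
        offset += 4;
        // Copy the payload, then XOR each byte with the 4-byte masking key.
        payload = Buffer.from(buf.subarray(offset, offset + length));
        for (let i = 0; i < payload.length; i++) payload[i] ^= key[i % 4];
    } else {
        payload = buf.subarray(offset, offset + length);
    }
    return { fin, opcode, masked, payload };
}

// Masked text frame carrying "Hello" (example bytes from RFC 6455).
const frame = Buffer.from([0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]);
const parsed = parseFrame(frame);
console.log(parsed.fin, parsed.opcode, parsed.payload.toString()); // true 1 Hello
```

Opcode 1 here denotes a text frame; a close, ping, or pong frame would show up as opcode 8, 9, or 10 respectively.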

Implement a simple real-time comparison service using WebSocket

We can simply use the face detection and analysis interface and the face comparison interface to create a real-time face detection and comparison service.

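The interaction among the front end, the server, and Tencent Cloud is as follows:

  1. The front end establishes a WebSocket connection with the server.
  2. The front end starts sending captured frames to the server at regular intervals.
  3. The server calls the Tencent Cloud face detection and analysis interface.
  4. Tencent Cloud returns the face position information.
  5. The server passes the reference image and the front-end frame to the Tencent Cloud face comparison interface.
  6. Tencent Cloud returns the comparison result to the server.
  7. The server returns the comparison result to the front end.
  8. The connection is closed.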

For the AI capabilities, we use two interfaces provided by Tencent Cloud: the face detection and analysis interface and the face comparison interface:

  • The face detection and analysis interface detects face position and face occlusion; its result is used to prompt the user to adjust their posture.
  • The face comparison interface compares the frames sent from the front end with the reference image stored on the server and returns a similarity score, which is used to determine whether they show the same person.

On the front end, we use the getUserMedia API to open the camera and obtain the video stream, and the WebSocket API to establish a WebSocket connection with the server. Once the connection is established, frames can be captured from the video stream and sent to the server for detection.
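The front-end steps might look like the sketch below. It is browser-only code and makes several assumptions that are not from the original: frames are drawn to a canvas and sent as Base64 JPEG strings, one frame is sent every 500 ms, and the names startCapture, dataUrlToBase64, wsUrl, and intervalMs are all illustrative.

```javascript
// Hypothetical helper: strip the "data:image/jpeg;base64," prefix from a data URL.
function dataUrlToBase64(dataUrl) {
    return dataUrl.slice(dataUrl.indexOf(',') + 1);
}

// Browser-only: open the camera, connect, and send one JPEG frame per interval.
// wsUrl and intervalMs are assumptions; pick values that suit your server.
async function startCapture(video, wsUrl, intervalMs = 500) {
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    video.srcObject = stream;
    await video.play();

    const canvas = document.createElement('canvas');
    const ws = new WebSocket(wsUrl);
    let timer = null;

    ws.onopen = () => {
        timer = setInterval(() => {
            // Draw the current video frame onto the canvas and send it.
            canvas.width = video.videoWidth;
            canvas.height = video.videoHeight;
            canvas.getContext('2d').drawImage(video, 0, 0);
            ws.send(dataUrlToBase64(canvas.toDataURL('image/jpeg', 0.8)));
        }, intervalMs);
    };
    // Show whatever the server sends back (prompts or the final result).
    ws.onmessage = (event) => console.log('server says:', event.data);
    ws.onclose = () => {
        // Stop sending frames and release the camera.
        clearInterval(timer);
        stream.getTracks().forEach((track) => track.stop());
    };
    return ws;
}
```

A page would call startCapture with a video element and the server's WebSocket address. Sending binary blobs instead of Base64 strings is an equally valid choice and saves bandwidth.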

On the server side, we can use Node.js with the ws npm package to build a simple WebSocket server. After receiving a captured frame, the server calls the interfaces provided by Tencent Cloud for detection and verification.
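The server-side flow could be wired up roughly as follows. This is a sketch under stated assumptions, not the actual implementation: detectFace and compareFace are hypothetical wrappers that you would implement around the Tencent Cloud interfaces, the result fields faceFound, occluded, and score are invented for illustration, and the threshold of 80 is an arbitrary placeholder. The wsserver argument is expected to be a WebSocketServer instance from the ws package.

```javascript
// Placeholder threshold; the right value depends on the comparison interface used.
const SIMILARITY_THRESHOLD = 80;

// Map a detection result to a prompt for the user, or null if the frame is usable.
// The fields faceFound and occluded are assumptions made for this sketch.
function adjustmentTip(detection) {
    if (!detection.faceFound) return 'No face detected, please face the camera';
    if (detection.occluded) return 'Please keep your face uncovered';
    return null;
}

// Attach the detect-then-compare flow to a ws WebSocketServer instance.
// detectFace and compareFace are hypothetical async wrappers around the cloud interfaces.
function attachComparison(wsserver, { detectFace, compareFace, referenceImage }) {
    wsserver.on('connection', (ws) => {
        ws.on('message', async (data) => {
            const frame = data.toString(); // Base64 frame sent by the front end
            try {
                const detection = await detectFace(frame);
                const tip = adjustmentTip(detection);
                if (tip) {
                    // Frame not usable yet: tell the user how to adjust.
                    ws.send(JSON.stringify({ done: false, tip }));
                    return;
                }
                const { score } = await compareFace(referenceImage, frame);
                ws.send(JSON.stringify({ done: true, samePerson: score >= SIMILARITY_THRESHOLD }));
                ws.close(); // sends a close frame, so the client sees a normal close
            } catch (err) {
                ws.send(JSON.stringify({ done: false, tip: 'Detection failed, please try again' }));
            }
        });
    });
}
```

Passing the two cloud calls in as functions keeps the WebSocket wiring independent of any particular SDK and makes the flow easy to exercise with stubs.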

Try floating-layer liveness detection

The floating-layer liveness detection of face identity verification is a real-time liveness detection solution built on the approach above, with more details handled to make the experience smoother. You can follow the steps below to apply for and try the floating-layer liveness detection service.

1. Activate the face identity verification service

Learn about the Tencent Cloud AI face identity verification product on the Tencent Cloud official website, and click to apply for a free trial.

2. Create a business process and obtain a RuleId

After the face identity verification service is activated, go to the console to create a business process: https://console.cloud.tencent.com/faceid

Select "WeChat H5 (floating layer/normal mode)", enter the name of the official account for testing, and click Next.

Then fill in the relevant information according to the prompts on the console.

After the application is completed, check your RuleId on the console.

3. Call the DetectAuth interface to obtain the experience link

We can use API Explorer to call the DetectAuth (real-name verification authentication) interface and obtain the experience link. Fill in the RuleId input parameter with the RuleId obtained in the previous step, then click to initiate the call.

4. Use WeChat to open the experience link

After the DetectAuth interface is called successfully, the response contains a URL. Open it in WeChat to try the service.

{
    "Response": {
        "BizToken": "CE661F1A-0F1E-45BD-BE13-34C05CEA7681",
        "Url": "https://xxxxxxxxxxxxx",
        "RequestId": "f904f4cf-75db-4f8f-a5ec-dc4f942c7f7a"
    }
}

Origin blog.csdn.net/tencentAI/article/details/129404008