"Anti-replay" strategy for data security (anti-crawlers)

In Security in the Big Front-End Era, I discussed how the web front end and native clients can implement anti-crawling strategies at the data security level. This article builds on that background and presents a technical solution for data security at the API interface level.

1. API interface request security issues

API interfaces have many common security problems. Typical ones are:

  1. Even when HTTPS is used, packet capture tools such as Charles can act as a man-in-the-middle proxy that issues and verifies its own certificates, so the data can still be read (Wireshark can also decrypt the traffic if it is given the session keys)
  2. After capturing a request, an attacker can resend it unchanged and produce dirty data on the server (the logic behind the interface inserts or deletes rows in the DB, for example)

Each of these problems has a corresponding solution:

  1. Two-way (mutual) authentication of HTTPS certificates solves the packet capture tool problem
  2. If data protected by HTTPS with certificate authentication is still intercepted by a network expert, the request parameters need to be signed
  3. An "anti-replay strategy" solves the problem of the same request being sent multiple times
  4. Request parameters and response content are additionally encrypted with RSA, so even if they are intercepted the plaintext cannot be read

Two-way authentication of HTTPS certificates and the anti-crawler techniques on the web side are explained in the article Security in the Big Front-End Era. The rest of this article focuses on its main topic: anti-replay.

2. Tamper-proofing the request parameters

As mentioned in the previous article, HTTPS traffic can still be captured, which causes security problems. Under a packet capture tool the data is still exposed in plaintext. The article Charles from Getting Started to Mastering shows how HTTPS data can be obtained.

If data protected by HTTPS with certificate authentication is still intercepted by a network expert, the request parameters need to be signed. Proceed as follows:

  • The client signs the request parameters with the agreed key (for example, with a keyed hash) to obtain signature, adds signature to the request parameters, and sends the request to the server
  • The server receives the client request and uses the agreed key to re-sign the request parameters (excluding signature), obtaining a value called autograph
  • The server compares signature and autograph. If they are equal, the request is considered legitimate. Otherwise the parameters are considered to have been tampered with and the request is judged illegal.

Because the middleman does not know the signing key, they cannot produce a correct signature even if they intercept the request and modify a parameter. A request constructed this way is judged illegal by the server.
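The signing steps above can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: it assumes HMAC-SHA256 as the signing function, parameters sorted by name as the canonical form, and a hypothetical shared key SECRET_KEY. The parameter names user_id and amount are made up for the example.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET_KEY = b"demo-shared-secret"  # hypothetical agreed key, for illustration only


def sign_params(params, key=SECRET_KEY):
    """Return a hex HMAC-SHA256 over all parameters except 'signature'."""
    # Sorting by name gives client and server the same canonical string.
    pairs = sorted((k, str(v)) for k, v in params.items() if k != "signature")
    canonical = urlencode(pairs)
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_request(params, key=SECRET_KEY):
    """Server side: recompute the signature (autograph) and compare."""
    signature = str(params.get("signature", ""))
    autograph = sign_params(params, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(signature.encode("utf-8"), autograph.encode("utf-8"))


# Client side: sign the parameters, then attach the signature.
request = {"user_id": "42", "amount": "100"}
request["signature"] = sign_params(request)
ok = verify_request(request)

# A middleman who changes a parameter cannot produce a matching signature.
tampered = dict(request, amount="999")
tampered_ok = verify_request(tampered)
```

Here ok is True and tampered_ok is False, because the tampered request still carries the signature computed over the original amount.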

3. Anti-replay strategy

In engineering, before doing something we should first define it, so that we know what to do and how to do it.

In theory, the service verifies every API request it receives. But when a legitimate request is intercepted by a middleman, the middleman can resend it one or more times without changing it. This repeated use of a legitimate request is an attack called a replay.

Replays can cause problems on the server, so we need an anti-replay mechanism. In essence, the question is how to tell a normal, legitimate request from a replayed one.

3.1 Scheme based on timestamp

In practice, the time between the client sending a request and the server receiving it is commonly assumed to be no more than 60 seconds. Using this property, the client adds timestamp1 to each request, signs timestamp1 together with the other request parameters to obtain signature, and then sends the request to the server.

  • The server gets the current timestamp timestamp2. If timestamp2 - timestamp1 > 60s, the request is considered illegal
  • The server uses the agreed key to re-sign the request parameters (excluding signature, but including timestamp1) to obtain the value autograph, and compares signature with autograph. If they are not equal, the request is considered illegal

If the middleman intercepts the request and modifies the timestamp or any other parameter without knowing the key, the server still judges it as an illegal request. Capturing the packets, tampering with the parameters, and resending the request generally takes the middleman longer than 60 seconds, so an unmodified replay is usually rejected as well.

The design flaw of the timestamp scheme is also obvious: a request replayed within 60 seconds slips through the rule, and the server judges it as legitimate.
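A rough sketch of the timestamp check, including the loophole just described. It assumes HMAC-SHA256 for signing, a hypothetical shared key, and a parameter named timestamp that carries timestamp1. The now argument exists only so the example can use fixed times instead of the real clock.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"demo-shared-secret"  # hypothetical agreed key
MAX_AGE_SECONDS = 60


def sign(params, key=SECRET_KEY):
    """Sign every parameter except 'signature'; the timestamp is included."""
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params) if k != "signature")
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def check_request(params, now=None):
    """Return 'ok', 'expired', 'bad timestamp', or 'bad signature'."""
    timestamp2 = time.time() if now is None else now
    supplied = str(params.get("signature", ""))
    autograph = sign(params)
    if not hmac.compare_digest(supplied.encode("utf-8"), autograph.encode("utf-8")):
        return "bad signature"
    try:
        timestamp1 = float(params["timestamp"])
    except (KeyError, ValueError):
        return "bad timestamp"
    if timestamp2 - timestamp1 > MAX_AGE_SECONDS:
        return "expired"
    return "ok"


request = {"item": "book", "timestamp": "1000"}
request["signature"] = sign(request)

fresh = check_request(request, now=1030)     # first delivery, 30 s old
stale = check_request(request, now=1061)     # replayed after 61 s
forged = check_request(dict(request, timestamp="1061"), now=1061)  # timestamp altered
replayed = check_request(request, now=1045)  # replayed inside the window
```

The stale and forged requests are rejected, but replayed comes back as "ok": nothing in this scheme distinguishes it from the first delivery.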

3.2 Nonce-based scheme

Since timestamps have a loophole, the next scheme is based on a nonce, a random string that is used only once. A random string is added to each request and signed with the key together with the other parameters to obtain signature. After the server receives the request:

  • It first checks whether the nonce already exists in a collection. If it does, the request is considered illegal; if it does not, the nonce is added to the collection
  • The server signs the client request parameters (excluding signature, but including the nonce) with the key to obtain autograph, and compares signature with autograph. If they are not equal, the request is considered illegal.

However, this scheme also has shortcomings. Every request has to be looked up in the collection, so the collection cannot grow too large, otherwise the lookup becomes time-consuming and interface performance drops. Some nonce values therefore have to be deleted periodically. But once a nonce has been deleted, a replay that reuses it is judged by the server to be a legitimate request.

Even if the server only stores the nonces of requests from the last 24 hours, the storage overhead is still not small.
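The nonce check and its weakness can be sketched as below. The assumptions are the same as before (HMAC-SHA256, a hypothetical shared key), and seen_nonces is an in-process set standing in for the collection. Unlike the order listed above, this sketch verifies the signature before storing the nonce, so that unauthenticated requests cannot fill the collection; that ordering is a choice made for the example.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"demo-shared-secret"  # hypothetical agreed key
seen_nonces = set()                 # stands in for the server-side collection


def sign(params, key=SECRET_KEY):
    """Sign every parameter except 'signature'; the nonce is included."""
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params) if k != "signature")
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def check_request(params):
    """Return 'ok', 'replayed', 'missing nonce', or 'bad signature'."""
    supplied = str(params.get("signature", ""))
    autograph = sign(params)
    if not hmac.compare_digest(supplied.encode("utf-8"), autograph.encode("utf-8")):
        return "bad signature"
    nonce = params.get("nonce")
    if not nonce:
        return "missing nonce"
    if nonce in seen_nonces:
        return "replayed"
    seen_nonces.add(nonce)
    return "ok"


request = {"item": "book", "nonce": secrets.token_hex(16)}
request["signature"] = sign(request)

first = check_request(request)    # accepted, nonce is remembered
second = check_request(request)   # same nonce again, rejected

# Periodic cleanup forgets old nonces, and with them the protection.
seen_nonces.clear()
after_cleanup = check_request(request)
```

After the cleanup the same request is accepted again, which is exactly the gap that the combined scheme in the next section closes.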

3.3 Scheme based on timestamp + nonce

Each scheme has its own weakness: the timestamp cannot stop a request replayed within 60 seconds, and the nonce is expensive to store and look up. Combining the two gives the "timestamp + nonce anti-replay scheme".

  • Use the timestamp to reject illegal requests older than 60 seconds
  • Use the nonce to catch the replays that slip through the timestamp check within 60 seconds

Steps:

  1. The client signs the current timestamp1, a random string (nonce), and the other request parameters with the key to generate signature
  2. The server receives the request and signs the request parameters (excluding signature, but including timestamp1 and the random string) with the server-side key to generate autograph
  3. The server compares signature and autograph. If they are not equal, the request is considered illegal
  4. The server gets its own timestamp timestamp2. If timestamp2 - timestamp1 < 60 and the nonce is not already in the collection, the request is judged legitimate and the nonce is saved; if the nonce is already in the collection, the request is a replay
  5. The server only keeps nonces from the last 60 seconds and periodically deletes expired nonces from the collection

The nonce collection should not be read from or written to files or a database directly, otherwise the server does too much I/O and this becomes a performance bottleneck. Keep it in memory, using mmap or another memory-to-file read/write mechanism if needed. Depending on the scenario, choose optimistic or pessimistic locking for concurrent access.
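The combined scheme might look roughly like the sketch below. ReplayGuard is a hypothetical class written for this illustration, assuming HMAC-SHA256, a shared key, and parameters named timestamp and nonce. The collection is an in-memory dict protected by a threading.Lock (a pessimistic lock), and expired nonces are dropped on each check rather than by a separate periodic job.

```python
import hashlib
import hmac
import threading
import time

SECRET_KEY = b"demo-shared-secret"  # hypothetical agreed key
WINDOW_SECONDS = 60


class ReplayGuard:
    def __init__(self, key=SECRET_KEY, window=WINDOW_SECONDS):
        self._key = key
        self._window = window
        self._nonces = {}              # nonce -> timestamp1 of the request
        self._lock = threading.Lock()  # pessimistic lock around the collection

    def sign(self, params):
        """Sign every parameter except 'signature' (timestamp and nonce included)."""
        canonical = "&".join(f"{k}={params[k]}" for k in sorted(params) if k != "signature")
        return hmac.new(self._key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

    def check(self, params, now=None):
        """Return 'ok', 'replayed', 'expired', 'malformed', or 'bad signature'."""
        timestamp2 = time.time() if now is None else now
        supplied = str(params.get("signature", ""))
        if not hmac.compare_digest(supplied.encode("utf-8"), self.sign(params).encode("utf-8")):
            return "bad signature"
        try:
            timestamp1 = float(params["timestamp"])
            nonce = str(params["nonce"])
        except (KeyError, ValueError):
            return "malformed"
        if timestamp2 - timestamp1 > self._window:
            return "expired"
        with self._lock:
            # A nonce older than the window is no longer needed: a replay
            # of that request would already fail the timestamp check.
            expired = [n for n, t in self._nonces.items() if timestamp2 - t > self._window]
            for n in expired:
                del self._nonces[n]
            if nonce in self._nonces:
                return "replayed"
            self._nonces[nonce] = timestamp1
        return "ok"


guard = ReplayGuard()
request = {"item": "book", "timestamp": "1000", "nonce": "a1b2c3"}
request["signature"] = guard.sign(request)

results = [
    guard.check(request, now=1010),  # first delivery
    guard.check(request, now=1020),  # replay inside the window
    guard.check(request, now=1100),  # replay after the window
]
```

The three checks return "ok", "replayed", and "expired" in that order. The nonce only has to be remembered for as long as the timestamp check would still accept the request, which is what keeps the collection small.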

The timestamp check has a problem of its own. The server compares its own time with the timestamp in the request parameters, and a fatal drawback is that the server clock and the client clock may differ. This can be solved by calibrating the client timestamp against the server time; see the following section on time synchronization.

4. The principle of computer network time synchronization technology

Time synchronization between client and server matters in many scenarios. Here are a few common examples.

  • A flash-sale (seckill) system. Users open the page and browse products in various categories. The right side of the product list and the details page both show a flash-sale countdown. If that countdown follows a client clock that disagrees with the server, a user who adds the product to the cart, places the order, and pays from the details page may only get the pop-up "Insufficient product inventory, please buy similar products from other brands"
  • A question bank system, where the questions are the company's core competitive asset. Careful programmers therefore designed an "anti-replay" function for the interface. However, the front-end developer was careless: the timestamp sent with the request was not in the same time zone as the server and was off by several seconds. Crawler engineers at a competing company discovered the vulnerability and crawled the question data.

This problem is very common in computing, and there are established solutions.

  1. If the accuracy requirement is not high: first request the server time ServerTime and record it together with the current local time LocalTime1. Whenever the current time is needed, take the latest local time LocalTime2 and compute LocalTime2 - LocalTime1 + ServerTime

    Take the iOS side as an example:

    • After the app starts, it obtains the server time ServerTime through an interface and saves it locally, recording the current local time LocalTime1 at the same moment
    • When the server time is needed, it gets the current local time LocalTime2 and computes LocalTime2 - LocalTime1 + ServerTime
    • If the request to the server time interface fails, the previously synchronized result is read from the cache (an initial time is built in when the App is packaged)
    • Use NSSystemClockDidChangeNotification to monitor changes to the system time; if it changes, call the interface again and re-synchronize the time
  2. If you need higher precision, such as 100 nanoseconds, you need to use NTP (Network Time Protocol) or PTP (Precision Time Protocol).
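The low-precision approach in item 1 comes down to one subtraction and one addition. A minimal sketch, shown in Python rather than iOS code: ServerClock is a hypothetical helper, and the local clock is passed in as a function so the example can use fixed values. Network latency is ignored, and a change to the system clock would require calling sync again, as described above.

```python
import time


class ServerClock:
    """Estimate the server time from a single synchronization point."""

    def __init__(self, local_clock=time.time):
        self._local_clock = local_clock
        self._server_time = None
        self._local_time1 = None

    def sync(self, server_time):
        """Record ServerTime together with the local time LocalTime1."""
        self._server_time = server_time
        self._local_time1 = self._local_clock()

    def now(self):
        """Return LocalTime2 - LocalTime1 + ServerTime."""
        if self._server_time is None:
            raise RuntimeError("sync() has not been called yet")
        local_time2 = self._local_clock()
        return local_time2 - self._local_time1 + self._server_time


# A fake local clock keeps the example deterministic.
fake_local = [500.0]
clock = ServerClock(local_clock=lambda: fake_local[0])

clock.sync(server_time=1_000_000.0)  # value returned by the server time interface
fake_local[0] = 512.5                # 12.5 seconds pass on the device
estimate = clock.now()               # 1000012.5
```

The estimate depends only on how much local time has elapsed since the sync, not on whether the device clock itself is set correctly.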

NTP and PTP are not in the scope of this article. If you are interested, you can check this article

