Analysis and solution of nginx 400 responses caused by non-standard HTTP requests

background

While analyzing data recently, I accidentally discovered a group of users in the nginx log whose HTTP POST log-reporting requests all return 400, with no 200 success record at all. Since they account for less than 0.5% of overall requests, no monitoring alarm had ever been triggered, and the situation had existed for a long time. The strange part is that only the log-reporting POST interface shows this all-400 behavior for specific users; other interfaces, whether POST or GET, have no such problem.

Further analysis of the log revealed that for user requests from some regions this ratio even exceeds 10%, so I took the time to follow up. It turned out to be caused by the non-standard format of HTTP requests issued by certain client models. This post records the analysis process, the cause, and the final solution.

problem analysis

Common nginx 400 reasons

Searching online turns up several common reasons why nginx responds with 400:

  1. request_uri is too long and exceeds the size configured in nginx
  2. The cookie or headers are too large and exceed the size configured in nginx
  3. Empty Host header
  4. content_length is inconsistent with the actual body length

These errors all occur at the nginx layer: nginx itself decides that the client request is malformed, returns 400 directly, and never forwards the request to the upstream server, so the upstream server is completely unaware of these bad requests.

This time, however, the nginx log shows that nginx did forward the request to the upstream server (upstream_addr is already a valid upstream address), so the 400 was actually returned by the upstream server rather than by nginx itself. In other words, at least the nginx layer considered the request format acceptable.

Actual nginx 400 log analysis

Here is an excerpt of the error logs of some online users; the general format is as follows:

127.0.0.1:63646	-	24/Apr/2022:00:50:07 +0900	127.0.0.1:1080	0.000	0.000	POST /log/report?appd=abc.demo.android&appname=abcdemo&v=1.0&langes=zh-CN&phonetype=android&device_type=android&osn=Android OS 10 / API-29 (QKQ1.190825.002/V12.0.6.0.QFKCNXM)&channel=Google Play&build=Android OS 10 / API-29 (QKQ1.190825.002/V12.0.6.0.QFKCNXM)&resolution=1080x2340&ts=1650636192534 HTTP/1.1	400	50	-	curl/7.52.1	-	0.000	0.000	127.0.0.1	1563	2021

From the log we can see that most 400 requests share one problem: their query parameters have not been urlencoded. For example, the space in the parameter channel=Google Play has clearly not been encoded as %20. Intuitively, this should be directly related to the 400.

verification

To verify whether the un-encoded query parameters are the direct cause of the 400, I constructed a few test HTTP requests with curl:

# no space (urlencoded)
curl -v 'http://127.0.0.1/log/report?appd=abc.demo.android&appname=abcdemo&v=1.0&langes=zh-CN&phonetype=android&channel=Google%20Play' -d @test.json
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> POST /log/report?appd=abc.demo.android&appname=abcdemo&v=1.0&langes=zh-CN&phonetype=android&channel=Google%20Play HTTP/1.1
> Host: 127.0.0.1
> User-Agent: curl/7.52.1
> Accept: */*
> Content-Length: 1563
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< Server: nginx/1.16.1
< Date: Sat, 23 Apr 2022 15:54:53 GMT
< Content-Type: application/json
< Content-Length: 22
< Connection: keep-alive
<
* Curl_http_done: called premature == 0
* Connection #0 to host 127.0.0.1 left intact
# with an un-encoded space
curl -v 'http://127.0.0.1/log/report?appd=abc.demo.android&appname=abcdemo&v=1.0&langes=zh-CN&phonetype=android&channel=Google Play' -d @test.json
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> POST /log/report?appd=abc.demo.android&appname=abcdemo&v=1.0&langes=zh-CN&phonetype=android&channel=Google Play HTTP/1.1
> Host: 127.0.0.1
> User-Agent: curl/7.52.1
> Accept: */*
> Content-Length: 1563
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 400 Bad Request
< Server: nginx/1.16.1
< Date: Sat, 23 Apr 2022 15:55:14 GMT
< Content-Type: text/plain; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
<
* Curl_http_done: called premature == 0
* Connection #0 to host 127.0.0.1 left intact

All requests containing an un-encoded space are rejected by the upstream server with 400, so we can conclude that the missing urlencoding of the query parameters is the direct cause of the problem. But why does skipping the encoding lead to a 400? How is this phenomenon explained from the perspective of the HTTP protocol? To find the answer, we need to review the HTTP specification.

HTTP request specification format

The HTTP request message format is, in outline: a request line consisting of the method, the request target (URL), and the protocol version, separated by single spaces and terminated by \r\n; then the header lines, a blank line, and finally the message body.

As a text protocol, HTTP distinguishes and splits the different parts of the request message purely by character markers: spaces, carriage return \r, and line feed \n. In the first line in particular, the request method, URL, and protocol version are separated from one another by single spaces.

Looking again at the failing requests: because the query parameters are not urlencoded, spaces appear in the request target. Strictly speaking, such a request no longer conforms to the HTTP specification, since splitting the first line on spaces now yields more than three parts, which cannot be mapped one-to-one onto method, URL, and version. From a semantic point of view, returning 400 directly is perfectly reasonable handling.

In practice, some components tolerate this situation: they take the first and the last parts of the split as the method and the version respectively, and treat everything in between as the URL. nginx, for example, accepts this non-standard format. Many components do not, since it violates the HTTP specification: Charles reports an error when capturing such a request, and both golang's net/http library and Django's http module respond with 400.
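
As an illustration of that tolerant behaviour (this is a sketch, not nginx's actual parsing code): take the first token as the method, the last token as the version, and everything in between as the URL:

package main

import (
    "fmt"
    "strings"
)

// lenientParseRequestLine splits on the first and the last space, so a
// request target containing spaces still ends up in one piece.
func lenientParseRequestLine(line string) (method, target, proto string, ok bool) {
    first := strings.Index(line, " ")
    last := strings.LastIndex(line, " ")
    if first < 0 || last <= first {
        return
    }
    return line[:first], line[first+1 : last], line[last+1:], true
}

func main() {
    m, t, p, ok := lenientParseRequestLine("POST /log/report?channel=Google Play HTTP/1.1")
    fmt.Println(m, "|", t, "|", p, "|", ok)
    // POST | /log/report?channel=Google Play | HTTP/1.1 | true
}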

golang net/http HTTP parsing source code analysis

The upstream server responsible for log reporting is logsvc, implemented in golang; it uses the standard library net/http to handle HTTP requests. Let's look further into how the standard library parses HTTP requests to confirm the cause of the error.

Reading the golang source, the call path for HTTP request parsing is http.ListenAndServe => http.Serve => serve => readRequest...; the logic that parses the request line and headers lives in the readRequest function.
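
For context, a minimal net/http server of the kind logsvc presumably runs could look like the sketch below (the handler body, path, and port are assumptions based on the log sample); every request such a server accepts goes through Serve and then readRequest before any handler code runs.

package main

import (
    "io"
    "log"
    "net/http"
)

func main() {
    // Hypothetical stand-in for logsvc: drain the reported log body and ack.
    http.HandleFunc("/log/report", func(w http.ResponseWriter, r *http.Request) {
        io.Copy(io.Discard, r.Body)
        w.WriteHeader(http.StatusOK)
    })
    // ListenAndServe => Serve => conn.serve => readRequest for every connection.
    log.Fatal(http.ListenAndServe("127.0.0.1:1080", nil))
}
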
The readRequest part of the code is as follows:

// file: net/http/request.go
...
func readRequest(b *bufio.Reader, deleteHostHeader bool) (req *Request, err error) {
    tp := newTextprotoReader(b)
    req = new(Request)

    // First line: GET /index.html HTTP/1.0
    var s string
    if s, err = tp.ReadLine(); err != nil {
        return nil, err
    }
    defer func() {
        putTextprotoReader(tp)
        if err == io.EOF {
            err = io.ErrUnexpectedEOF
        }
    }()

    var ok bool
    req.Method, req.RequestURI, req.Proto, ok = parseRequestLine(s)
    if !ok {
        return nil, &badStringError{"malformed HTTP request", s}
    }
    if !validMethod(req.Method) {
        return nil, &badStringError{"invalid method", req.Method}
    }
    rawurl := req.RequestURI
    if req.ProtoMajor, req.ProtoMinor, ok = ParseHTTPVersion(req.Proto); !ok {
        return nil, &badStringError{"malformed HTTP version", req.Proto}
    }
...

As we can see, readRequest first splits the request line into the method, URL, and Proto fields via parseRequestLine, and then checks the version with ParseHTTPVersion. If the check fails, it returns a "malformed HTTP version" error, which ultimately results in a 400 response.

The parseRequestLine code is as follows:

...
// parseRequestLine parses "GET /foo HTTP/1.1" into its three parts.
func parseRequestLine(line string) (method, requestURI, proto string, ok bool) {
    s1 := strings.Index(line, " ")
    s2 := strings.Index(line[s1+1:], " ")
    if s1 < 0 || s2 < 0 {
        return
    }
    s2 += s1 + 1
    return line[:s1], line[s1+1 : s2], line[s2+1:], true
}

As the code shows, parseRequestLine simply finds the first and the second space and slices the line into method, requestURI, and proto. If the requestURI contains an extra space, proto ends up being everything after that extra space. For example, "POST /abc?x=o d HTTP/1.1" is parsed as method="POST", requestURI="/abc?x=o", proto="d HTTP/1.1", which then fails the ParseHTTPVersion check in the next step.
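
Because parseRequestLine is unexported, the sketch below copies its splitting logic verbatim so the behaviour can be checked against the problematic request line from the log:

package main

import (
    "fmt"
    "strings"
)

// Same splitting logic as net/http's unexported parseRequestLine.
func parseRequestLine(line string) (method, requestURI, proto string, ok bool) {
    s1 := strings.Index(line, " ")
    s2 := strings.Index(line[s1+1:], " ")
    if s1 < 0 || s2 < 0 {
        return
    }
    s2 += s1 + 1
    return line[:s1], line[s1+1 : s2], line[s2+1:], true
}

func main() {
    m, u, p, ok := parseRequestLine("POST /log/report?channel=Google Play HTTP/1.1")
    fmt.Printf("method=%q uri=%q proto=%q ok=%v\n", m, u, p, ok)
    // method="POST" uri="/log/report?channel=Google" proto="Play HTTP/1.1" ok=true
}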

The ParseHTTPVersion code is as follows; if the version field produced by parseRequestLine is not a valid HTTP version string, it returns ok=false:

...
// ParseHTTPVersion parses an HTTP version string.
// "HTTP/1.0" returns (1, 0, true).
func ParseHTTPVersion(vers string) (major, minor int, ok bool) {
    const Big = 1000000 // arbitrary upper bound
    switch vers {
    case "HTTP/1.1":
        return 1, 1, true
    case "HTTP/1.0":
        return 1, 0, true
    }
    if !strings.HasPrefix(vers, "HTTP/") {
        return 0, 0, false
    }
    dot := strings.Index(vers, ".")
    if dot < 0 {
        return 0, 0, false
    }
    major, err := strconv.Atoi(vers[5:dot])
    if err != nil || major < 0 || major > Big {
        return 0, 0, false
    }
    minor, err = strconv.Atoi(vers[dot+1:])
    if err != nil || minor < 0 || minor > Big {
        return 0, 0, false
    }
    return major, minor, true
}
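
Since ParseHTTPVersion is exported by net/http, the failure on the garbage version string (the tail produced by the split above) can be reproduced directly:

package main

import (
    "fmt"
    "net/http"
)

func main() {
    // "Play HTTP/1.1" has no "HTTP/" prefix, so parsing fails,
    // and readRequest turns this into a "malformed HTTP version" error.
    major, minor, ok := http.ParseHTTPVersion("Play HTTP/1.1")
    fmt.Println(major, minor, ok) // 0 0 false
}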

solution

The first step is to align with the client team. They confirmed that on some device models the Unity networking call fails to urlencode query parameters properly; the new client version adds extra code on top of the Unity network layer to ensure every parameter is urlencoded, so that the request conforms to the HTTP specification.
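
For illustration only (the actual fix lives in the client's Unity network layer, not in Go), building a query string through Go's net/url shows the kind of encoding the server expects; note that both + and %20 are valid encodings of a space inside a query string:

package main

import (
    "fmt"
    "net/url"
)

func main() {
    // url.Values.Encode() escapes every value, so spaces can never
    // leak into the request line.
    q := url.Values{}
    q.Set("channel", "Google Play")
    q.Set("langes", "zh-CN")
    fmt.Println("/log/report?" + q.Encode())
    // /log/report?channel=Google+Play&langes=zh-CN
}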

Next, we considered whether the existing abnormal requests could be handled online as a stopgap, so that log-reporting data from these users is not lost while waiting for the new client version to roll out. The following options were considered for compatibility.

Trying third-party golang HTTP libraries: gin and echo

Since log reporting is handled by an independent golang server whose logic is very simple (it just decompresses and parses the body of the log POST request and writes it to kafka), there is no other logic and the cost of modification is low, so the first idea was to replace net/http with a third-party library and see whether that solves the problem.

I tried the popular gin and echo libraries and found that both also returned 400. Digging into their source code shows that both ultimately call net/http's ListenAndServe and Serve methods, so the request-line parsing analyzed above is still done by net/http, and they naturally report 400 as well.
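
To confirm this end to end, here is a small sketch (the path, headers, and expected output are illustrative assumptions): it sends the raw request line containing a space over TCP to a plain net/http test server. The handler is never invoked and the server answers 400 before any application code runs, which is exactly what logsvc, gin, or echo would do:

package main

import (
    "bufio"
    "fmt"
    "net"
    "net/http"
    "net/http/httptest"
)

func main() {
    // Throwaway net/http test server; the handler never runs for a
    // malformed request line, so its body does not matter.
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    }))
    defer srv.Close()

    conn, err := net.Dial("tcp", srv.Listener.Addr().String())
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // Raw request line with an un-encoded space in the query string.
    fmt.Fprint(conn, "POST /log/report?channel=Google Play HTTP/1.1\r\nHost: test\r\nContent-Length: 0\r\n\r\n")

    status, _ := bufio.NewReader(conn).ReadString('\n')
    fmt.Print(status) // HTTP/1.1 400 Bad Request
}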

Rewriting the query parameters with an nginx lua/perl script

Another possible approach is to use a lua or perl script at the nginx layer to urlencode the un-encoded request_uri before forwarding the request upstream. However, the online nginx was compiled without the lua and perl modules, so using this approach would require one of the following:

  1. Recompile the entire nginx and replace the running binary, or
  2. Compile the perl and lua modules separately and load them into nginx as dynamic modules.

Considering that I am an RD rather than a professional nginx OP, and considering the risk to the live service, I did not attempt this lightly.

nginx routes /log/report requests containing spaces to a server that can handle them

As mentioned at the beginning, for these abnormal requests containing spaces, only the log-reporting POST interface returns 400 while other interfaces respond normally. That is because normal business traffic and log traffic are split at the nginx layer: the /log/report interface is forwarded to the independent golang logsvc service, while normal business requests are forwarded to the Python main api service.

Looking back, the reason a separate golang server handles parsing app log reports and writing them to kafka, instead of the main api service that handles the other interfaces, is mainly twofold:

  1. The API main service written in Python is relatively inefficient; frequent, large-scale log reporting could consume too many resources and respond slowly.
  2. It keeps log-reporting requests from affecting the response time of other normal business requests and decouples business logic from log reporting.

logsvc currently cannot handle these requests, but the api main service, which talks to nginx over the uwsgi protocol, parses them without problems, so the following temporary configuration was added to nginx:

    location /log/report {
        include proxy_params;
        if ( $args !~ "^(.*) (.*)$" ) {
            proxy_pass http://test_log_stream;
            break;
        }
        include uwsgi_params;
        uwsgi_pass test_api_stream;
    }

In other words, the regex checks whether the query string ($args) contains a space: if it does not, the request is handled by logsvc as before; if it does, it is handled by the api main service over the uwsgi protocol. Since these abnormal requests account for less than 0.5% of the total, the original split architecture still stands; only a small share of abnormal requests is routed through the api main service for compatibility.

Please indicate the source when reprinting, original address:  https://www.cnblogs.com/AcAc-t/p/nginx_400_problem_for_not_encode_http_request.html

 
