Python http.server URL Redirect

Source Code Analysis

As is well known, Python can start a web server with a single command:

python3 -m http.server port

Running this command in any directory starts a web file server rooted there. It relies on the http.server module, which contains several important classes:

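HTTPServer: inherits from socketserver.TCPServer, which shows that an HTTP server is essentially a TCP server.
BaseHTTPRequestHandler: a handler for the data arriving over the TCP connection. It parses the bytes read from the TCP stream according to the HTTP protocol and sends responses back in HTTP format. It takes no action of its own after parsing a request, so it cannot be used directly; to write your own web application, subclass it and implement the do_XXX methods.
SimpleHTTPRequestHandler: inherits from BaseHTTPRequestHandler, takes the parsed request from its parent class, and returns the file at the requested path to the user, which amounts to a static file server.
CGIHTTPRequestHandler: inherits from SimpleHTTPRequestHandler and adds the ability to execute CGI scripts on top of the static file server.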
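To make the do_XXX convention concrete, here is a minimal sketch of a BaseHTTPRequestHandler subclass. The names HelloHandler and make_server are hypothetical and exist only for this illustration; the response body is likewise an arbitrary choice.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a plain-text greeting."""

    def do_GET(self):
        # self.path holds the raw request target parsed by the base class.
        body = f"hello from {self.path}\n".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=0):
    """Bind to localhost; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)
```

Calling make_server(8000).serve_forever() would serve the greeting on port 8000 until interrupted. A request method without a matching do_XXX method gets an error response from the base class.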

In short, the relationships look like this:

+-----------+           +------------------------+
| TCPServer |           | BaseHTTPRequestHandler |
+-----------+           +------------------------+
      ^                             |
      |                             v
      |                +--------------------------+
      +----------------| SimpleHTTPRequestHandler |
      |                +--------------------------+
      |                             |
      |                             v
      |                 +-----------------------+
      +-----------------| CGIHTTPRequestHandler |
                        +-----------------------+

Now let's look at the source code of SimpleHTTPRequestHandler:

class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):

    """Simple HTTP request handler with GET and HEAD commands.

    This serves files from the current directory and any of its
    subdirectories.  The MIME type for files is determined by
    calling the .guess_type() method.

    The GET and HEAD requests are identical except that the HEAD
    request omits the actual contents of the file.

    """

    server_version = "SimpleHTTP/" + __version__

    def __init__(self, *args, directory=None, **kwargs):
        if directory is None:
            directory = os.getcwd()
        self.directory = directory
        super().__init__(*args, **kwargs)

    def do_GET(self):
        """Serve a GET request."""
        f = self.send_head()
        if f:
            try:
                self.copyfile(f, self.wfile)
            finally:
                f.close()

    def do_HEAD(self):
        """Serve a HEAD request."""
        f = self.send_head()
        if f:
            f.close()

    def send_head(self):
        """Common code for GET and HEAD commands.

        This sends the response code and MIME headers.

        Return value is either a file object (which has to be copied
        to the outputfile by the caller unless the command was HEAD,
        and must be closed by the caller under all circumstances), or
        None, in which case the caller has nothing further to do.

        """
        path = self.translate_path(self.path)
        f = None
        if os.path.isdir(path):
            parts = urllib.parse.urlsplit(self.path)
            if not parts.path.endswith('/'):
                # redirect browser - doing basically what apache does
                self.send_response(HTTPStatus.MOVED_PERMANENTLY)
                new_parts = (parts[0], parts[1], parts[2] + '/',
                             parts[3], parts[4])
                new_url = urllib.parse.urlunsplit(new_parts)
                self.send_header("Location", new_url)
                self.end_headers()
                return None
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
        ctype = self.guess_type(path)
        try:
            f = open(path, 'rb')
        except OSError:
            self.send_error(HTTPStatus.NOT_FOUND, "File not found")
            return None

        try:
            fs = os.fstat(f.fileno())
            # Use browser cache if possible
            if ("If-Modified-Since" in self.headers
                    and "If-None-Match" not in self.headers):
                # compare If-Modified-Since and time of last file modification
                try:
                    ims = email.utils.parsedate_to_datetime(
                        self.headers["If-Modified-Since"])
                except (TypeError, IndexError, OverflowError, ValueError):
                    # ignore ill-formed values
                    pass
                else:
                    if ims.tzinfo is None:
                        # obsolete format with no timezone, cf.
                        # https://tools.ietf.org/html/rfc7231#section-7.1.1.1
                        ims = ims.replace(tzinfo=datetime.timezone.utc)
                    if ims.tzinfo is datetime.timezone.utc:
                        # compare to UTC datetime of last modification
                        last_modif = datetime.datetime.fromtimestamp(
                            fs.st_mtime, datetime.timezone.utc)
                        # remove microseconds, like in If-Modified-Since
                        last_modif = last_modif.replace(microsecond=0)

                        if last_modif <= ims:
                            self.send_response(HTTPStatus.NOT_MODIFIED)
                            self.end_headers()
                            f.close()
                            return None

            self.send_response(HTTPStatus.OK)
            self.send_header("Content-type", ctype)
            self.send_header("Content-Length", str(fs[6]))
            self.send_header("Last-Modified",
                self.date_time_string(fs.st_mtime))
            self.end_headers()
            return f
        except:
            f.close()
            raise
...

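We skip the HTTP parsing that happens earlier. A GET request is dispatched to do_GET(), which calls send_head().

send_head() calls self.translate_path(self.path) to normalize the request path and obtain the file the user is actually requesting. If that path is an existing directory, execution enters the first if statement; if the requested path does not end with /, it enters the second if statement, which issues an HTTP redirect. This redirect is the key to the vulnerability.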
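The redirect branch can be isolated into a few lines. The sketch below mirrors the urlsplit/urlunsplit calls from send_head() shown above; redirect_location is a hypothetical helper name, and it assumes the directory check has already passed.

```python
import urllib.parse


def redirect_location(request_path):
    """Mirror how send_head() builds the Location value for a directory."""
    parts = urllib.parse.urlsplit(request_path)
    if parts.path.endswith("/"):
        return None  # no redirect: the path already ends with a slash
    # Same tuple as in send_head(): only the path component gains a "/".
    new_parts = (parts[0], parts[1], parts[2] + "/", parts[3], parts[4])
    return urllib.parse.urlunsplit(new_parts)


print(redirect_location("/docs"))
# urlsplit() reads the text after a leading "//" as a host (netloc),
# so the host survives into the rebuilt URL.
print(redirect_location("//baidu.com/%2f.."))
```

The point to notice is that the request target is parsed as a URL rather than as a plain path, so a leading // moves attacker-chosen text into the host position of the Location header.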

Reproducing the Vulnerability

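In mainstream browsers such as Chrome and Firefox, a URL that starts with //domain is treated as protocol-relative: it keeps the scheme of the current request and points at a different host. For example, when a response from http://example.com redirects to //baidu.com/, the browser goes to http://baidu.com/ rather than to a //baidu.com/ directory on example.com. So if we send the request GET //baidu.com HTTP/1.0\r\n\r\n, we are redirected to //baidu.com/, which yields an open redirect (arbitrary URL redirection) vulnerability.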
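The same resolution rule can be observed with urllib.parse.urljoin from the standard library, which resolves a relative reference against a base URL. This is only an approximation of what a browser does with a Location header, not a browser test.

```python
from urllib.parse import urljoin

# A reference starting with "//" replaces the host and keeps the scheme.
print(urljoin("http://example.com/", "//baidu.com/"))
print(urljoin("https://example.com/", "//baidu.com/"))

# A single leading slash stays on the original host.
print(urljoin("http://example.com/", "/baidu.com/"))
```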


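Here, because no directory named baidu.com exists, the request would not pass the if os.path.isdir(path) check, so we still need to get past that statement. Doing so is simple: since baidu.com does not exist, we step back up to the parent directory: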

GET //baidu.com/%2f.. HTTP/1.0\r\n\r\n
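To see why the %2f.. suffix helps, the sketch below imitates the main steps of translate_path() in simplified form: percent-decode, normalize, then join the remaining segments under the served directory. The resolve function and the /srv/test base directory are assumptions made for this illustration, and the real method handles more edge cases.

```python
import posixpath
import urllib.parse


def resolve(request_path, base="/srv/test"):
    """Simplified stand-in for translate_path(): map a request path under base."""
    path = request_path.split("?", 1)[0].split("#", 1)[0]
    path = urllib.parse.unquote(path)   # "%2f" becomes "/"
    path = posixpath.normpath(path)     # ".." cancels the preceding segment
    result = base
    for word in filter(None, path.split("/")):
        if word in (".", ".."):
            continue                    # never climb above the base directory
        result = posixpath.join(result, word)
    return result


print(resolve("//baidu.com"))        # a directory that does not exist
print(resolve("//baidu.com/%2f.."))  # collapses back to the served directory
```

With the suffix, the filesystem path resolves to the served directory itself, which exists, so os.path.isdir(path) is true while the raw request target still begins with //baidu.com.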

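Now a simple test: start an http.server instance in a local test directory, listening on port 1234 in this example.

Then visit http://127.0.0.1:1234//baidu.com%2f.. in a browser, and you will be redirected to http://www.baidu.com/search/error.html.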
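The check can also be scripted instead of using a browser. The following is a rough sketch: probe and probe_local are hypothetical helper names, and the server is a throwaway instance on an OS-assigned port rather than the one started above. Recent Python releases may collapse a leading // before send_head() runs, in which case the returned Location starts with a single slash and the redirect stays on the same host.

```python
import http.client
import http.server
import threading


def probe(host, port, raw_path):
    """Send GET raw_path exactly as given; return (status, Location header)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", raw_path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def probe_local(raw_path):
    """Start a throwaway server for the current directory and probe it once."""
    server = http.server.HTTPServer(
        ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return probe("127.0.0.1", server.server_address[1], raw_path)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, location = probe_local("//baidu.com/%2f..")
    # A Location beginning with "//" indicates the open redirect is present.
    print(status, location)
```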

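Impact

Although this vulnerability lives in the Python standard library, people rarely run python -m http.server directly in production. Still, when auditing similar code, it is worth paying attention to request handling: check whether any class that implements do_GET or do_POST inherits from and uses SimpleHTTPRequestHandler. If so, analyze it further to see whether the redirect can be exploited.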
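As a starting point for that kind of audit, a small static scan can list classes that subclass the handler. This is a minimal sketch using the standard ast module; find_handler_subclasses is a hypothetical name, and the scan only matches direct base-class names, so aliases and indirect inheritance would be missed.

```python
import ast

# Both classes inherit the redirect logic in send_head().
TARGETS = {"SimpleHTTPRequestHandler", "CGIHTTPRequestHandler"}


def find_handler_subclasses(source):
    """Return (class_name, line_number) for classes whose bases name a target."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for base in node.bases:
            # Handles `SimpleHTTPRequestHandler` and `http.server.SimpleHTTPRequestHandler`.
            name = base.id if isinstance(base, ast.Name) else getattr(base, "attr", None)
            if name in TARGETS:
                hits.append((node.name, node.lineno))
    return hits
```

Feeding it the text of each .py file in a project gives a short list of handlers worth reading by hand.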