Table of Contents:
I. Basic concepts
II. Fetching a page
I. Basic concepts
1. This post relies mainly on urllib, that is, URL (web address) + lib (library). For more details, see the Python documentation (in IDLE, open Help - Python Docs and search there).
2. The general format of a URL (note: the parts in [] may be omitted):
protocol://domain[:port]/path/
The terms are explained as follows:
Protocol: for example http, https, ftp, file, and so on.
Domain: the domain name or IP address of the server that stores the resource, plus a port number where one is required (for example 8080). Examples: www.baidu.com (a domain name), localhost (a host name that refers to the local machine).
Path: the specific location of the resource on the server, given as a directory or file name, for example index.html.
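To make the URL format above concrete, here is a minimal sketch that splits a URL into the parts just described, using urlsplit from the standard urllib.parse module. The helper name describe_url and the sample URL http://localhost:8080/index.html are assumptions put together from the examples in the text; they are not from the original post.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into protocol, domain, port and path (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. http, https, ftp, file
        "domain": parts.hostname,   # domain name or IP address, without the port
        "port": parts.port,         # None when the optional [:port] part is omitted
        "path": parts.path,         # directory or file name on the server
    }


# Sample URL assembled from the examples in the text (an assumption).
print(describe_url("http://localhost:8080/index.html"))
```

When the port is left out, as in https://www.baidu.com/, the "port" entry is None, which matches the note that the bracketed part may be omitted.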
II. Fetching a page
# Import the dependency
import urllib.request
# Open the cnblogs (Blog Garden) sign-in address, i.e. fetch the page; the returned object is stored in response
response = urllib.request.urlopen("https://account.cnblogs.com/signin")
# Read the returned object; the content is stored in html_d as bytes
html_d = response.read()
# Decode the bytes into a string as UTF-8 (this depends on how the page is encoded, but it is usually UTF-8)
html = html_d.decode("utf-8")
# Print the result
print(html)
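The code above assumes the page is encoded as UTF-8. As a hedged variation, the sketch below reads the charset from the response's Content-Type header and falls back to UTF-8 only when none is declared. The helper name fetch_text, the default_charset and timeout parameters, and the choice of errors="replace" are assumptions for illustration, not part of the original post.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8", timeout=10):
    """Fetch url and return its body as a str (hypothetical helper)."""
    # The with-block closes the connection once the body has been read.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, as in the original example
        # Charset declared in the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset() or default_charset
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw.decode(charset, errors="replace")
```

Calling fetch_text("https://account.cnblogs.com/signin") would then return the page as a string. Network failures surface as urllib.error.URLError, so a real script would probably wrap the call in try/except.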
Reference for this post:
Zero-Basics Introduction to Learning Python (video course): https://www.bilibili.com/video/av4050443?p=54