Python crawler 1: a simple web crawler

This series records my own process of learning to write web crawlers.

Crawl a web page using the urllib package:

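import urllib.request  # urllib.request: open and read URLs

def main():
    url = "http://www.douban.com/"
    # Some sites reject urllib's default User-Agent ("Python-urllib/3.x"),
    # so identify as a browser instead
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as response:  # send the request
        html = response.read()  # read the response body (bytes)
    html = html.decode("utf-8")  # decode the bytes into a str
    print(html)  # print the page source

if __name__ == "__main__":
    main()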

The urllib.request module is used to open and read URLs.
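The script above hard-codes UTF-8 when decoding and stops with an exception if the request fails. Below is a slightly more defensive sketch. The helper name `fetch_html`, the `Mozilla/5.0` User-Agent string and the 10-second timeout are my own illustrative choices, not something the site requires. It reads the charset from the response's `Content-Type` header when the server sends one and falls back to UTF-8 otherwise.

```python
import urllib.request
from typing import Optional


def fetch_html(url: str, default_encoding: str = "utf-8",
               timeout: float = 10.0) -> Optional[str]:
    """Download `url` and return it as text, or None if the request fails.

    Hypothetical helper: the User-Agent string and timeout are illustrative.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            raw = response.read()  # response body as bytes
            # "text/html; charset=gbk" -> "gbk"; None when no charset is given
            charset = response.headers.get_content_charset()
    except OSError as exc:
        # URLError (and its subclass HTTPError) and socket timeouts
        # are all OSError subclasses
        print(f"request failed: {exc}")
        return None
    try:
        # errors="replace" substitutes U+FFFD for bytes that don't decode
        return raw.decode(charset or default_encoding, errors="replace")
    except LookupError:
        # the server named a charset Python doesn't recognise
        return raw.decode(default_encoding, errors="replace")
```

Calling `fetch_html("http://www.douban.com/")` should return the page source as a string, or print the error and return `None`. The `errors="replace"` choice trades exactness for robustness: undecodable bytes become U+FFFD instead of raising, which is convenient when skimming pages but can hide a wrong charset guess.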

Several commonly used character encodings:

ASCII encoding: used for English text. Each character occupies 1 byte; the highest bit is fixed at 0 and the remaining 7 bits store data, so 128 characters can be represented in total.
Extended ASCII encoding: used for additional European characters. All 8 bits store data, so 256 characters can be represented in total.
GBK/GB2312/GB18030: encodings for Chinese characters. GB2312 covers Simplified Chinese; GBK extends GB2312 and also includes Traditional characters; GB18030 extends GBK further and can encode every Unicode character.
Unicode: a character set that aims to include the characters of all the world's writing systems, assigning each one a number (a code point). It does not by itself define how those numbers are stored as bytes.
UTF-8: one of the encodings that implement Unicode. It uses 1 to 4 bytes per character, and the byte length varies from character to character.
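To see these differences concretely, the sketch below encodes a few sample characters with Python's built-in codecs and counts the bytes each one needs. Latin-1 stands in for "extended ASCII" here; it is only one of several 8-bit extensions, so that choice is an assumption of the example. The helper name `byte_length` is hypothetical.

```python
from typing import Optional

# Sample characters: an English letter, an accented European letter,
# a Chinese character and an emoji (outside the Basic Multilingual Plane).
samples = ["A", "é", "豆", "😀"]


def byte_length(text: str, encoding: str) -> Optional[int]:
    """Return how many bytes `text` needs in `encoding`, or None if it can't be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in samples:
    sizes = {enc: byte_length(ch, enc)
             for enc in ("ascii", "latin-1", "gbk", "gb18030", "utf-8")}
    # code point in hex, then the byte count per encoding (None = not encodable)
    print(f"{ch!r} U+{ord(ch):04X} {sizes}")
```

The UTF-8 column grows from 1 to 4 bytes across the samples, GBK uses 2 bytes for the Chinese character but cannot encode the emoji at all, and GB18030 handles the emoji with a 4-byte sequence.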

 


Origin http://43.154.161.224:23101/article/api/json?id=324691502&siteId=291194637