My First Python Experience: a Web Crawler

The principle of a crawler: a crawler is simply an automated program that fetches page data for us. So how exactly do we get that data? A phone call is a good analogy for network communication: when we want to visit a website, its URL is like a phone number, and a client such as a computer or smartphone is like the phone. The client's browser sends a request to the site, just as dialing a number places a call. The party that receives the request is the web server; if the server is running properly and accepts our request, it sends a response back to the client, with the content of the response carried in an HTML file. The browser then parses that HTML file and renders it into the web page we normally see.

urllib is a Python module. We load it with import, and its urllib.request submodule helps us send a request to a URL and receive the response.

The response from the URL is like a letter we have not opened yet; the next step is to call read() on the object urllib returns, which reads the actual content of the letter.

from urllib.request import urlopen
page = "https://assets.baydn.com/baydn/public/codetime/1/shanbay_news.html"
# Fetch the page and store the response object in shanbay_news
shanbay_news = urlopen(page)
news_data = shanbay_news.read()
print(news_data)
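A note on the output: read() returns raw bytes, not a string, which is why the print above shows a b'...' prefix. To turn the bytes into readable text, decode them with the page's character encoding (UTF-8 for most modern pages). Here is a minimal sketch using a small hard-coded byte string in place of a live response; the same decode() call works on news_data above:

```python
# read() on a response object returns bytes; decode() turns them into str.
raw_bytes = b"<p>\xe6\x89\x87\xe8\xb4\x9d news</p>"  # a UTF-8 encoded HTML fragment

text = raw_bytes.decode("utf-8")
print(text)  # -> <p>扇贝 news</p>
```

If the page uses another encoding (e.g. GBK, common on older Chinese sites), pass that name to decode() instead.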

Origin www.cnblogs.com/free-1124/p/11360080.html