Golang Web Crawlers

A web crawler (also known as a web spider or web robot, and in the FOAF community more often called a web chaser) is a program or script that automatically fetches information from the World Wide Web according to certain rules.

In plain terms, a crawler is a program that automatically fetches the data you want from web pages.

The basic workflow of a crawler

Initiate a request
Send a Request to the target site through an HTTP library. The request may carry additional header information; then wait for the server to respond.

Get the response content
If the server responds normally, you get a Response. Its body is the page content to be fetched, which may be HTML, a JSON string, binary data (such as images or videos), or another type.

Parse the content
If the content is HTML, it can be parsed with regular expressions or an HTML parsing library; if it is JSON, it can be converted directly into a JSON object; if it is binary data, it can be stored for further processing.

Save the data
The data can be stored in various forms: saved as plain text, written to a database, or stored as files in a specific format.




Origin: www.cnblogs.com/embedded-linux/p/12549053.html