Simple Python crawler tutorial: batch crawling pictures

Python can be considered a relatively new programming language, and it is also one of the fastest-growing languages of the past two years. Both children and adults can learn it. Today the Nanjing small yard king Python training institution shares a Python crawler tutorial.

 

  A web crawler, also known as a web spider or web robot, is usually classified into four types: general-purpose web crawlers, focused web crawlers, incremental web crawlers, and Deep Web crawlers. Despite the name, the crawler we are talking about is not an actual bug but a computer program or script that automatically crawls information on the WWW and can process that information according to given requirements.

 

  C/C++, Java, Python, PHP, and other languages can all be used to write crawlers, but generally speaking most developers choose Python, PHP, or a similar language. So today we will teach you to build a very simple and practical crawler in Python.

 

 

  Crawler principle

 

  When we look for information on the network, we usually do so through a browser. The browser sends our request to the server that stores the information; when the server receives the request, it returns the requested data to the browser.

 

[Image: 1.jpeg]

  Figuratively speaking, the browser is like a translator for us humans: it translates our human language for the server. Once the server understands, it carries out our command and tells the result to the browser, which then translates the result back for us. A crawler therefore works by repeatedly imitating the requests a browser sends, so that the server keeps executing the corresponding commands. The server does not know whether a command was sent by a human or by a crawler, because it only understands the browser's "language".
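To make "imitating the browser" concrete, here is a minimal sketch using only the standard library. It assumes that one common way to look like a browser is to send a browser-style User-Agent header; the header value and the example.com address are placeholders for illustration, not taken from this tutorial. The request is only built, not sent.

```python
import urllib.request

# Hypothetical browser-like User-Agent string, used only for illustration.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_browser_like_request(url):
    """Build (but do not send) a GET request carrying a browser-style header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


req = build_browser_like_request("https://example.com/")
print(req.get_method())              # GET
# urllib stores header names with only the first letter capitalized.
print(req.get_header("User-agent"))
```

Passing such a request object to urllib.request.urlopen() would actually send it; from the server's side it is just another request written in the browser's "language".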

 

  Why do it this way? Can't we humans issue the commands ourselves? Why let a machine do it? For example, suppose you are working on image recognition in artificial intelligence, which requires a lot of training data, namely pictures. You can't download the images manually one by one, right? This is where the crawler comes in!

 

[Image: 2.jpeg]

  Python crawler principle

 

  For example, suppose we want to grab cat and dog pictures in batches to train a cat-vs-dog classification model. Using Python to replace the tedious manual process of downloading cat and dog pictures is a smart choice.

 

  In Python, we can use requests.get("https://www.jkys120.com/") to send a request to the target address, after which the server returns some data. The addresses where the cat and dog pictures are stored are inside this data, so we need to separate the picture addresses from the HTML tags and other useless information. This is done with regular expressions; here you can use the regular-expression functions of the built-in re library.
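The separation step can be sketched as follows. The HTML below is a made-up sample standing in for the text a server might return, and the pattern and function name are illustrative assumptions, not the original author's code.

```python
import re

# Made-up sample of what a server might return; real pages are far larger.
SAMPLE_HTML = """
<html><body>
  <img src="https://example.com/pics/cat1.jpg" alt="cat">
  <p>Some unrelated text</p>
  <img src="https://example.com/pics/dog1.png" alt="dog">
</body></html>
"""

# Capture the value of src="..." inside each <img> tag.
IMG_SRC_PATTERN = re.compile(r'<img[^>]+src="([^"]+)"')


def extract_image_urls(html):
    """Return every image address found in the given HTML text."""
    return IMG_SRC_PATTERN.findall(html)


print(extract_image_urls(SAMPLE_HTML))
```

With a live page, the html argument would be the text attribute of the response returned by requests.get(). A regular expression is enough for simple pages like this, although it can miss tags written in other styles (for example, single-quoted attributes).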

 

  Finally, we need to download the file at a given URL to the computer, which uses the request.urlretrieve() method of the urllib library.
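A minimal sketch of that download step is shown here. The function name, folder name, and example address are hypothetical; only urllib.request.urlretrieve() comes from the text above.

```python
import os
import urllib.request


def download_file(url, save_dir, filename):
    """Download url into save_dir/filename and return the local path."""
    os.makedirs(save_dir, exist_ok=True)
    local_path = os.path.join(save_dir, filename)
    urllib.request.urlretrieve(url, local_path)
    return local_path


# Example call (commented out because it needs network access and a real image URL):
# download_file("https://example.com/pics/cat1.jpg", "images", "cat1.jpg")
```

Creating the folder first avoids an error when the target directory does not exist yet.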

 

  Code Tutorial

 

  First, following convention, we import the libraries we will use at the top of the .py file.

 

  import requests
  import json
  import urllib.request
  import re

 

  Then we start writing the crawler. Here we use Baidu Images as the example (the request address is the one shown in the browser's address bar; requests differ only in the word keyword). The program is explained in detail in its comments.

 

[Image: 3.jpeg]
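As a rough sketch of the idea that only the word keyword changes between requests, the snippet below builds a search address per keyword. The endpoint is a hypothetical placeholder, not Baidu's real address, which should be copied from the browser's address bar.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration; the real address is whatever the
# browser's address bar shows for an image search.
SEARCH_ENDPOINT = "https://example.com/search"


def build_search_url(keyword):
    """Return the search URL for one keyword; only the 'word' value changes."""
    return SEARCH_ENDPOINT + "?" + urlencode({"word": keyword})


for kw in ["cat", "dog"]:
    print(build_search_url(kw))
```

urlencode() also percent-encodes non-ASCII keywords, so Chinese search terms can be passed in directly.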

  The text property of requests_content is the text data returned by the server; it contains many HTML tags and a lot of JavaScript code.

 

[Image: 4.jpeg]

  This is where we use regular expressions to extract the picture addresses.

 

[Image: 5.jpeg]
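When the picture addresses sit inside script-style text rather than in img tags, the extraction might look like the sketch below. The key name "thumbURL" and the sample text are assumptions made for illustration; check the real response text to find which key actually holds the addresses.

```python
import re

# Made-up fragment of script-style text; "thumbURL" is a hypothetical key name.
SAMPLE_TEXT = (
    '{"data":[{"thumbURL":"https://example.com/t/1.jpg","width":640},'
    '{"thumbURL":"https://example.com/t/2.jpg","width":800},'
    '{"thumbURL":"https://example.com/t/1.jpg","width":640}]}'
)


def extract_urls_by_key(text, key):
    """Return the unique string values stored under key, in first-seen order."""
    pattern = re.compile('"' + re.escape(key) + r'":"(.*?)"')
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(pattern.findall(text)))


print(extract_urls_by_key(SAMPLE_TEXT, "thumbURL"))
```

Removing duplicates here avoids downloading the same picture twice in the next step.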

  Finally, the picture file at each URL is downloaded to your computer.

 

[Image: 6.jpeg]
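A batch version of the download step could be sketched like this. The function name, the file-naming scheme, and the choice to skip failed downloads are assumptions for illustration rather than the original program.

```python
import os
import urllib.request


def download_all(urls, save_dir, prefix="img"):
    """Download every URL into save_dir as prefix_0, prefix_1, ...; skip failures."""
    os.makedirs(save_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # Reuse the extension from the URL when there is one, else fall back to .jpg.
        ext = os.path.splitext(url.split("?")[0])[1] or ".jpg"
        local_path = os.path.join(save_dir, f"{prefix}_{index}{ext}")
        try:
            urllib.request.urlretrieve(url, local_path)
        except (OSError, ValueError) as err:
            # URLError is a subclass of OSError; ValueError covers malformed URLs.
            print(f"skipped {url}: {err}")
            continue
        saved.append(local_path)
    return saved
```

Skipping a failed address instead of stopping lets one broken link pass without aborting the whole batch.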

  Result

 

  Here a batch of computer wallpapers has been downloaded; take a look at how it turned out!


Origin www.cnblogs.com/zqw111/p/11347347.html