Python Crawler (1) (suitable for beginners)

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. Please include the original source link and this statement when reposting.
This link: https://blog.csdn.net/weixin_43701019/article/details/98876292

- Personal Notes

Series:
Python Crawler (2)
Python Crawler (3)
Python Crawler (4)
Python Crawler (5)
Python Crawler (6)
Python Crawler (7)
Python Crawler (8)
Python Crawler (9)
Python Crawler (10)
Python Crawler (11)


Crawler concept


  • A crawler works in four steps:
    Step 0: Get the data. The crawler sends a request to the server based on a URL, and the server returns data.
    Step 1: Parse the data. The crawler parses the data returned by the server into a format we can read.
    Step 2: Extract the data. The crawler then extracts the specific data we need.
    Step 3: Store the data. The crawler saves the useful data so it can be used and analyzed later.
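Putting the four steps together, here is a minimal end-to-end sketch. The URL is a placeholder, and the "extraction" step just grabs the page title with simple string searching, which keeps the example readable but is far cruder than a real parsing library:

    import requests

    # Step 0: get the data -- send a request to the (placeholder) URL
    res = requests.get('https://xxxx.com/xxx.html')

    # Step 1: parse the data -- res.text gives the response body as a string
    html = res.text

    # Step 2: extract the data -- as a toy example, pull out the <title> tag
    start = html.find('<title>') + len('<title>')
    end = html.find('</title>')
    title = html[start:end]

    # Step 3: store the data -- save the extracted piece to a local file
    with open('title.txt', 'w', encoding='utf-8') as f:
        f.write(title)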

First, in VS Code (you can download it yourself from the official website), select a Python interpreter and install the requests library:

  • Press Shift + Ctrl + P and, as shown in the screenshot below, select the Python interpreter
    [Screenshot: selecting the Python interpreter]
  • The screenshot below shows the steps to install the requests package
    [Screenshot: installing the requests package]
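  • If you prefer the command line to the editor UI, the requests package can also be installed with pip (a standard approach, not shown in the original screenshots):

      pip install requests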

  • Retrieve data

    • requests.get('URL') fetches the data at the given URL (address) and returns a Response object. (A slightly fuller sketch of working with the Response object follows after the download examples below.)

    • Download a picture

      import requests
      res = requests.get('https://xxxx.com/xxx.png')
      # Send the request and store the returned result in the variable res
      pic = res.content
      # res.content returns the content of the Response object as binary data
      photo = open('ppt.jpg','wb')
      # Create a new file ppt.jpg; no path is given, so it is saved in the current working directory.
      # Image content must be written in binary mode ('wb').
      photo.write(pic)
      # Write the binary content of pic into the file
      photo.close()
      
    • Download text

      import requests
      res = requests.get('https://xxxx.com/xxx')
      # Send the request and store the returned result in the variable res
      novel = res.text
      # res.text returns the content of the Response object as a string
      # Here we assume we are downloading a novel
      novelfile = open('novel.text','w')
      # Create a new file novel.text; no path is given, so it is saved in the current working directory.
      novelfile.write(novel)
      # Write the string content of novel into the file
      novelfile.close()
      
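The two snippets above keep error handling to a minimum. As a slightly more defensive sketch (the URL and filenames here are placeholders, not from the original article), you can check the status code on the Response object before saving anything, let a with block close the files automatically, and set the encoding explicitly before reading res.text:

    import requests

    res = requests.get('https://xxxx.com/xxx')   # placeholder URL
    print(res.status_code)                       # 200 means the request succeeded

    if res.status_code == 200:
        # Binary data (e.g. an image) comes from res.content
        with open('download.png', 'wb') as f:    # the with block closes the file for us
            f.write(res.content)

        # Text comes from res.text; requests guesses the encoding from the headers,
        # so set it explicitly if you know the page really is UTF-8
        res.encoding = 'utf-8'
        with open('download.txt', 'w', encoding='utf-8') as f:
            f.write(res.text)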

Do not crawl sites indiscriminately: the Robots protocol (robots.txt) specifies what a site allows you to crawl and what it does not.
Always check a site's Robots protocol before crawling its URLs.
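A site's Robots protocol normally lives in a robots.txt file at the site root, so you can read it with the same requests.get call used above (the URL is again a placeholder):

    import requests

    # robots.txt sits at the root of the site; xxxx.com is a placeholder
    res = requests.get('https://xxxx.com/robots.txt')
    print(res.text)   # lists which paths crawlers may and may not visit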
