- By default, Scrapy sends GET requests. This time we will send a POST request instead.
We again use the Youdao Dictionary online translation site as the example: http://fanyi.youdao.com/?keyfrom=fanyi.logo
The real URL that handles the translation is: http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule
When actually using it, remove the _o from the path.
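The URL adjustment described above can be sketched with the standard library. The helper name `strip_o_suffix` is hypothetical and only illustrates dropping the trailing `_o` from the path while leaving the query string untouched:

```python
from urllib.parse import urlsplit, urlunsplit


def strip_o_suffix(url: str) -> str:
    """Return url with a trailing '_o' removed from its path (hypothetical helper)."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith('_o'):
        path = path[:-2]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# URL as observed on the site
captured = 'http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
# URL to use in the spider
usable = strip_o_suffix(captured)
print(usable)
```

In practice the shortened URL can simply be typed by hand; the helper only makes the change explicit.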
- First, create the project: make a new folder, hold Shift and right-click inside it, choose to open a command window there, and run scrapy startproject youdaosipder.
- Once the project is created, run scrapy genspider ydspider youdao.com. If the spider file does not appear, open a command window inside the project folder that was just created and run the command again.
ydspider.py
# -*- coding: utf-8 -*-
import scrapy

# http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule


class YdspiderSpider(scrapy.Spider):
    name = 'ydspider'
    allowed_domains = ['fanyi.youdao.com']
    # start_urls = ['http://youdao.com/']

    def start_requests(self):
        # Same URL as above, with the _o removed
        url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
        # Add a POST request to the request queue
        yield scrapy.FormRequest(
            url=url,
            # Form fields copied from the request the site itself sends
            formdata={
                'i': '男人',
                'from': 'AUTO',
                'to': 'AUTO',
                'smartresult': 'dict',
                'client': 'fanyideskweb',
                'salt': '15589655028559',
                'sign': '6781389ab298673f7036bce9cd99815b',
                'ts': '1558965502855',
                'bv': 'ab57a166e6a56368c9f95952de6192b5',
                'doctype': 'json',
                'version': '2.1',
                'keyfrom': 'fanyi.web',
                'action': 'FY_BY_REALTlME'
            },
            callback=self.parse
        )

    def parse(self, response):
        print('-----------------------------------------------------------')
        # Raw response body (bytes)
        print(response.body)
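The spider above only prints the raw bytes. Since the form asks for 'doctype': 'json', the body could be decoded as JSON instead. The following is a minimal sketch, not the author's method: the response shape (a `translateResult` list of lists holding `src`/`tgt` pairs), the helper name `extract_translations`, and the sample body are all assumptions and were not captured from the real service.

```python
import json


def extract_translations(body: bytes) -> list:
    """Collect translated strings from a JSON response body.

    Assumes the (unconfirmed) shape
    {"translateResult": [[{"src": ..., "tgt": ...}]]}.
    """
    data = json.loads(body.decode('utf-8'))
    results = []
    for paragraph in data.get('translateResult', []):
        for segment in paragraph:
            results.append(segment.get('tgt', ''))
    return results


# Hypothetical sample body, written by hand for illustration only.
sample = '{"translateResult": [[{"src": "男人", "tgt": "man"}]], "errorCode": 0}'.encode('utf-8')
print(extract_translations(sample))
```

Inside the spider, parse could call extract_translations(response.body) in place of the second print, provided the real response matches the assumed shape.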
In settings.py, turn off the robots protocol (comment out the ROBOTSTXT_OBEY line or set it to False). The robots protocol is a site's statement of which crawler behavior it allows and which it does not; it is worth reading up on.
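As a sketch of that change, the relevant line in the generated settings.py might look like the fragment below. It assumes the project was created with scrapy startproject, whose template enables the setting:

```python
# settings.py (fragment)
# The template generated by `scrapy startproject` sets this to True.
# Setting it to False (or commenting the line out) stops Scrapy from
# fetching and honoring the site's robots.txt.
ROBOTSTXT_OBEY = False
```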
Run scrapy crawl ydspider in the command window.
The output shows the translation result for 男人 ("man").
Reproduced from: https://www.jianshu.com/p/e96e33060ebe