This article describes the pitfalls of sending POST requests with Scrapy. It is shared here as a reference.
Sending a POST request with requests
First, look at how easy it is to send a POST request with requests.
The simple API of requests means that every type of HTTP request is obvious. For example, you can send an HTTP POST request like this:
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})
The data parameter accepts a dictionary, and it can also accept a tuple of (key, value) pairs, which is useful when a form has several values for the same key:
>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key1": [
      "value1",
      "value2"
    ]
  },
  ...
}
Passing JSON
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))
Version 2.4.2 added the json parameter, so the dictionary can be passed directly and is encoded automatically:
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)
In other words, you do not need to change the parameters themselves. You only need to choose between data= (form-encoded body) and json= (JSON body), and requests takes care of the rest.
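As a rough illustration of that choice, the sketch below uses only the standard library to show the two body encodings side by side. The helper names `encode_form` and `encode_json` are hypothetical and are not part of requests; they only approximate what `data=` and `json=` put in the request body.

```python
import json
from urllib.parse import urlencode


def encode_form(payload):
    """Approximate data=: URL-encode a dict or a sequence of (key, value) pairs."""
    body = urlencode(payload, doseq=True)
    return "application/x-www-form-urlencoded", body


def encode_json(payload):
    """Approximate json=: serialize the payload as a JSON document."""
    return "application/json", json.dumps(payload)


if __name__ == "__main__":
    print(encode_form({"key": "value"}))
    print(encode_form((("key1", "value1"), ("key1", "value2"))))
    print(encode_json({"some": "data"}))
```

The same payload therefore reaches the server in two quite different shapes, which matters when the server accepts only one of them.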
Sending a POST request with Scrapy
Scrapy sends GET requests by default. When a request has to carry parameters or perform a login, POST is needed. As the source code shows, this can be done as in the following example:
from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector
import scrapy
import json


class LaGou(CrawlSpider):
    name = 'myspider'

    def start_requests(self):
        yield scrapy.FormRequest(
            url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
            formdata={
                'first': 'true',  # a bool True is not accepted here, although the requests module accepts it
                'pn': '1',  # an int 1 is not accepted here, although the requests module accepts it
                'kd': 'python'
            },  # formdata here corresponds to data in the requests module; it must be given as key-value pairs
            callback=self.parse
        )

    def parse(self, response):
        # The response body is JSON; the job entries sit under content -> positionResult -> result.
        datas = json.loads(response.body.decode())['content']['positionResult']['result']
        for data in datas:
            print(data['companyFullName'] + str(data['positionId']))
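Because the formdata values have to be strings, a small conversion step before building the request can help. The helper below is a hypothetical sketch (`stringify_formdata` is not part of Scrapy), and mapping booleans to lowercase 'true'/'false' is an assumption about what the target server expects.

```python
def stringify_formdata(params):
    """Return a copy of params whose values are all strings."""
    result = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # Assumption: the server expects lowercase 'true'/'false'.
            result[key] = "true" if value else "false"
        else:
            result[key] = str(value)
    return result


if __name__ == "__main__":
    print(stringify_formdata({"first": True, "pn": 1, "kd": "python"}))
```

The result could then be passed as formdata=stringify_formdata(...) when constructing the scrapy.FormRequest.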
The official documentation recommends using FormRequest to send data via HTTP POST:
return [FormRequest(url="http://www.example.com/post/action",
                    formdata={'name': 'John Doe', 'age': '27'},
                    callback=self.after_post)]
FormRequest is used here, and the parameters are passed through formdata, which in this case is a dictionary.
However, there is a nasty pitfall here. I spent a whole afternoon on it today: when the request was sent this way, the returned data was never what I wanted, no matter what I tried.
return scrapy.FormRequest(url, formdata=(payload))
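After searching online for a long time, I finally found a solution: send the request with scrapy.Request instead, and the data comes back as expected. The likely cause is that the endpoint expected a JSON body, whereas FormRequest sends the payload URL-encoded.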
return scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'},)
Reference: Send Post Request in Scrapy
my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request(url, method='POST',
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})
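The same pattern can be wrapped in a small helper so that every JSON POST is built consistently. This is only a sketch: `json_post_kwargs` is a hypothetical name, and it just assembles the keyword arguments shown above without importing Scrapy.

```python
import json


def json_post_kwargs(url, payload):
    """Build keyword arguments for a JSON POST, mirroring the snippet above."""
    return {
        "url": url,
        "method": "POST",
        "body": json.dumps(payload),
        "headers": {"Content-Type": "application/json"},
    }


if __name__ == "__main__":
    kwargs = json_post_kwargs("http://www.example.com/post/action", {"field1": "value1"})
    print(kwargs["body"])
```

Inside a spider, the result would be used as scrapy.Request(callback=self.parse, **json_post_kwargs(url, my_data)). Recent Scrapy versions also provide a JsonRequest class that performs this serialization itself.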
The difference between FormRequest and Request
The documentation says very little about the difference:
The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.
In other words, FormRequest adds a new formdata parameter, which accepts a dictionary or an iterable of (key, value) tuples containing the form data and converts it into the request body. FormRequest itself inherits from Request:
class FormRequest(Request):

    def __init__(self, *args, **kwargs):
        formdata = kwargs.pop('formdata', None)
        if formdata and kwargs.get('method') is None:
            kwargs['method'] = 'POST'

        super(FormRequest, self).__init__(*args, **kwargs)

        if formdata:
            items = formdata.items() if isinstance(formdata, dict) else formdata
            querystr = _urlencode(items, self.encoding)
            if self.method == 'POST':
                self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
                self._set_body(querystr)
            else:
                self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)

###

def _urlencode(seq, enc):
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)
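To make the two branches concrete, the sketch below approximates this logic with the standard library only. `apply_formdata` is a hypothetical function, not Scrapy's API; it returns the URL and body that would result for a given method (as a str here, whereas Scrapy stores bytes).

```python
from urllib.parse import urlencode


def apply_formdata(url, formdata, method="POST", encoding="utf-8"):
    """Return (url, body) after encoding formdata roughly as the code above does."""
    items = formdata.items() if isinstance(formdata, dict) else formdata
    # Expand list-like values into repeated keys; non-string values fail here.
    values = [
        (key.encode(encoding), value.encode(encoding))
        for key, vs in items
        for value in (vs if isinstance(vs, (list, tuple)) else [vs])
    ]
    query = urlencode(values, doseq=True)
    if method.upper() == "POST":
        # POST: the encoded pairs become the request body.
        return url, query
    # Other methods: the encoded pairs are appended to the query string.
    return url + ("&" if "?" in url else "?") + query, ""


if __name__ == "__main__":
    print(apply_formdata("http://www.example.com/post/action", {"key": "value", "k": "v"}))
    print(apply_formdata("http://www.example.com/search", {"q": ["a", "b"]}, method="GET"))
```

The `.encode()` call is also where an int or bool value would fail, which is consistent with the requirement that formdata values be strings.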
In the end, the {'key': 'value', 'k': 'v'} we pass is converted to 'key=value&k=v', and the method defaults to POST. Now look at Request:
class Request(object_ref):

    def __init__(self, url, callback=None, method='GET', headers=None, body=None,
                 cookies=None, meta=None, encoding='utf-8', priority=0,
                 dont_filter=False, errback=None, flags=None):

        self._encoding = encoding  # this one has to be set first
        self.method = str(method).upper()
The default method is GET, but in practice this does not matter: you can still send a POST request by passing method='POST'. This reminds me of request in the requests library, the function on which the other methods (get, post, and so on) are built:
def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
    :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
    :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
    :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
        ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
        or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
        defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
        to add for the file.
    :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
    :param timeout: (optional) How many seconds to wait for the server to send data
        before giving up, as a float, or a :ref:`(connect timeout, read
        timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
    :type allow_redirects: bool
    :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
    :param verify: (optional) Either a boolean, in which case it controls whether we verify
        the server's TLS certificate, or a string, in which case it must be a path
        to a CA bundle to use. Defaults to ``True``.
    :param stream: (optional) if ``False``, the response content will be immediately downloaded.
    :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response

    Usage::

      >>> import requests
      >>> req = requests.request('GET', 'http://httpbin.org/get')
      <Response [200]>
    """

    # By using the 'with' statement we are sure the session is closed, thus we
    # avoid leaving sockets open which can trigger a ResourceWarning in some
    # cases, and look like a memory leak in others.
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
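The convenience functions are thin wrappers over this one. The sketch below shows the pattern with a stand-in `request` that merely records its arguments instead of opening a session, so the stand-in and its return value are assumptions made for illustration; only the shape of `get` and `post` follows the requests source.

```python
def request(method, url, **kwargs):
    """Stand-in for requests.request: return what would be sent."""
    return {"method": method.upper(), "url": url, **kwargs}


def get(url, params=None, **kwargs):
    """GET is request() with the method fixed to 'get'."""
    return request("get", url, params=params, **kwargs)


def post(url, data=None, json=None, **kwargs):
    """POST is request() with the method fixed to 'post'."""
    return request("post", url, data=data, json=json, **kwargs)


if __name__ == "__main__":
    print(post("http://httpbin.org/post", data={"key": "value"}))
```

Seen this way, the method is just another argument, which is why scrapy.Request with method='POST' works as well as FormRequest does.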