Python basics: pitfalls when sending POST requests with Scrapy

This article describes the pitfalls of sending POST requests with Scrapy. I hope it serves as a useful reference.
Sending a POST request with requests
Let's first see how easy it is to send a POST request with requests.

The simple API of Requests means that every type of HTTP request is equally obvious. For example, this is how you send an HTTP POST request:

>>> import requests
>>> r = requests.post('http://httpbin.org/post', data={'key': 'value'})

You can pass a dictionary as the data argument, and you can also pass a sequence of tuples, which allows the same key to appear more than once:

>>> payload = (('key1', 'value1'), ('key1', 'value2'))
>>> r = requests.post('http://httpbin.org/post', data=payload)
>>> print(r.text)
{
 ...
 "form": {
  "key1": [
   "value1",
   "value2"
  ]
 },
 ...
}

Passing JSON by encoding it yourself:

>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, data=json.dumps(payload))

Version 2.4.2 added a new feature, the json parameter, which encodes the payload for you:

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> r = requests.post(url, json=payload)

In other words, you do not need to convert the parameters yourself. You only need to choose between data= and json=, and requests takes care of the rest.
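
To make the difference between data= and json= concrete, here is a minimal sketch that uses only the standard library to approximate the two request bodies. It does not call requests itself; urlencode and json.dumps stand in for the encoding that requests performs, and the variable names are chosen just for this illustration.

```python
import json
from urllib.parse import urlencode

payload = {'some': 'data'}

# Roughly what data=payload sends: form-encoded key=value pairs.
form_body = urlencode(payload)

# Roughly what json=payload sends: a JSON document.
json_body = json.dumps(payload)

# A sequence of tuples may repeat a key, as in the key1 example above.
repeated = urlencode([('key1', 'value1'), ('key1', 'value2')])

print(form_body)  # some=data
print(json_body)  # {"some": "data"}
print(repeated)   # key1=value1&key1=value2
```

The two bodies are not interchangeable: a server that parses JSON will not understand the form-encoded string, and the reverse also holds.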

Sending a POST request with Scrapy
As the source code shows, Scrapy sends GET requests by default. When a request has to carry parameters or log in, we need POST, as in the following example:

import json

import scrapy


class LaGou(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        yield scrapy.FormRequest(
            url='https://www.******.com/jobs/positionAjax.json?city=%E5%B9%BF%E5%B7%9E&needAddtionalResult=false',
            # formdata plays the role of data in the requests module;
            # it only accepts key/value pairs.
            formdata={
                'first': 'true',  # a bool True is not accepted here, although requests accepts it
                'pn': '1',  # an int 1 is not accepted here, although requests accepts it
                'kd': 'python',
            },
            callback=self.parse,
        )

    def parse(self, response):
        # The endpoint returns JSON; walk down to the list of job postings.
        datas = json.loads(response.text)['content']['positionResult']['result']
        for data in datas:
            print(data['companyFullName'] + str(data['positionId']))

The official documentation recommends using FormRequest to send data via HTTP POST:

return [FormRequest(url="http://www.example.com/post/action",
          formdata={'name': 'John Doe', 'age': '27'},
          callback=self.after_post)]

This uses FormRequest and passes the parameters through formdata, which here is a dictionary.
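
Because formdata does not accept bool or int values, as the comments in the spider above note, it can help to convert a parameter dictionary to strings before handing it to FormRequest. The sketch below is one way to do that. The helper name to_formdata is hypothetical and not part of Scrapy, and rendering booleans as lowercase 'true'/'false' is an assumption that matches the spider above; other servers may expect a different spelling.

```python
def to_formdata(params):
    """Return a copy of params with every key and value converted to str.

    Hypothetical helper, not part of Scrapy. Booleans become 'true' or
    'false' (an assumption about what the server expects).
    """
    formdata = {}
    for key, value in params.items():
        if isinstance(value, bool):
            value = 'true' if value else 'false'
        formdata[str(key)] = str(value)
    return formdata


print(to_formdata({'first': True, 'pn': 1, 'kd': 'python'}))
# {'first': 'true', 'pn': '1', 'kd': 'python'}
```

The result can then be passed as formdata=to_formdata(params).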

However, there is one huge pitfall. I spent a whole afternoon on it today: when I sent the request this way, the returned data was never what I wanted, whatever I tried.

return scrapy.FormRequest(url, formdata=(payload))

After searching online for a long time, I finally found a fix: send the request with scrapy.Request instead, and the data comes back correctly.

return scrapy.Request(url, body=json.dumps(payload), method='POST', headers={'Content-Type': 'application/json'})

Reference: Send Post Request in Scrapy

my_data = {'field1': 'value1', 'field2': 'value2'}
request = scrapy.Request(url, method='POST',
                         body=json.dumps(my_data),
                         headers={'Content-Type': 'application/json'})
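
If several requests need a JSON body, the three arguments that make the call above work can be collected in one place. The following is only a sketch: json_post_kwargs is a hypothetical helper name, and it merely builds the keyword arguments, so it runs without Scrapy installed.

```python
import json


def json_post_kwargs(payload):
    """Build keyword arguments for a JSON POST (hypothetical helper)."""
    return {
        'method': 'POST',
        'body': json.dumps(payload),
        'headers': {'Content-Type': 'application/json'},
    }


kwargs = json_post_kwargs({'field1': 'value1', 'field2': 'value2'})
print(kwargs['body'])  # {"field1": "value1", "field2": "value2"}

# In a spider, assuming url and self.parse exist:
#     yield scrapy.Request(url, callback=self.parse, **kwargs)
```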

The difference between FormRequest and Request
The documentation gives a hint of the difference:

The FormRequest class adds a new argument to the constructor. The remaining arguments are the same as for the Request class and are not documented here.
Parameters: formdata (dict or iterable of tuples) – is a dictionary (or iterable of (key, value) tuples) containing HTML Form data which will be url-encoded and assigned to the body of the request.

It says that FormRequest adds a new formdata parameter, which accepts a dictionary or an iterable of (key, value) tuples containing form data and converts it into the request body. FormRequest also inherits from Request:

class FormRequest(Request):
 
  def __init__(self, *args, **kwargs):
    formdata = kwargs.pop('formdata', None)
    if formdata and kwargs.get('method') is None:
      kwargs['method'] = 'POST'
 
    super(FormRequest, self).__init__(*args, **kwargs)
 
    if formdata:
      items = formdata.items() if isinstance(formdata, dict) else formdata
      querystr = _urlencode(items, self.encoding)
      if self.method == 'POST':
        self.headers.setdefault(b'Content-Type', b'application/x-www-form-urlencoded')
        self._set_body(querystr)
      else:
        self._set_url(self.url + ('&' if '?' in self.url else '?') + querystr)
      ###
 
 
def _urlencode(seq, enc):
  values = [(to_bytes(k, enc), to_bytes(v, enc))
       for k, vs in seq
       for v in (vs if is_listlike(vs) else [vs])]
  return urlencode(values, doseq=1)
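
To see what this encoding step produces, the sketch below re-creates _urlencode using only the standard library. The to_bytes and is_listlike functions here are simplified stand-ins written for this example, assumed to behave like Scrapy's own utilities for str and bytes input, and form_urlencode is a hypothetical name.

```python
from urllib.parse import urlencode


def to_bytes(text, encoding='utf-8'):
    # Simplified stand-in: accept only str or bytes, reject everything else.
    if isinstance(text, bytes):
        return text
    if not isinstance(text, str):
        raise TypeError(f'expected str or bytes, got {type(text).__name__}')
    return text.encode(encoding)


def is_listlike(x):
    # Simplified stand-in: any iterable that is not str or bytes.
    return hasattr(x, '__iter__') and not isinstance(x, (str, bytes))


def form_urlencode(seq, enc='utf-8'):
    # Same shape as the _urlencode shown above: list values expand into
    # repeated keys, then everything is URL-encoded.
    values = [(to_bytes(k, enc), to_bytes(v, enc))
              for k, vs in seq
              for v in (vs if is_listlike(vs) else [vs])]
    return urlencode(values, doseq=1)


print(form_urlencode({'key': 'value', 'k': 'v'}.items()))  # key=value&k=v
print(form_urlencode([('key1', ['value1', 'value2'])]))    # key1=value1&key1=value2

try:
    form_urlencode({'pn': 1}.items())
except TypeError as exc:
    print('rejected:', exc)  # rejected: expected str or bytes, got int
```

The last call illustrates why the spider earlier passes 'pn': '1' as a string: under these assumptions, a non-string value fails at the byte-conversion step.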

In the end, the {'key': 'value', 'k': 'v'} we pass is converted to 'key=value&k=v', and the method defaults to POST. Now look at Request:

class Request(object_ref):
 
  def __init__(self, url, callback=None, method='GET', headers=None, body=None,
         cookies=None, meta=None, encoding='utf-8', priority=0,
         dont_filter=False, errback=None, flags=None):
 
    self._encoding = encoding # this one has to be set first
    self.method = str(method).upper()

The default method is GET, but that does not matter: you can still send a POST request by passing method='POST'. This reminds me of request() in the requests library, which is the underlying function that the other request methods are built on:

def request(method, url, **kwargs):
  """Constructs and sends a :class:`Request <Request>`.
 
  :param method: method for the new :class:`Request` object.
  :param url: URL for the new :class:`Request` object.
  :param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
  :param data: (optional) Dictionary or list of tuples ``[(key, value)]`` (will be form-encoded), bytes, or file-like object to send in the body of the :class:`Request`.
  :param json: (optional) json data to send in the body of the :class:`Request`.
  :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
  :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
  :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
    ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
    or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
    defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
    to add for the file.
  :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
  :param timeout: (optional) How many seconds to wait for the server to send data
    before giving up, as a float, or a :ref:`(connect timeout, read
    timeout) <timeouts>` tuple.
  :type timeout: float or tuple
  :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
  :type allow_redirects: bool
  :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
  :param verify: (optional) Either a boolean, in which case it controls whether we verify
      the server's TLS certificate, or a string, in which case it must be a path
      to a CA bundle to use. Defaults to ``True``.
  :param stream: (optional) if ``False``, the response content will be immediately downloaded.
  :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
  :return: :class:`Response <Response>` object
  :rtype: requests.Response
 
  Usage::
 
   >>> import requests
   >>> req = requests.request('GET', 'http://httpbin.org/get')
   <Response [200]>
  """
 
  # By using the 'with' statement we are sure the session is closed, thus we
  # avoid leaving sockets open which can trigger a ResourceWarning in some
  # cases, and look like a memory leak in others.
  with sessions.Session() as session:
    return session.request(method=method, url=url, **kwargs)


Origin blog.csdn.net/chengxun02/article/details/104976025