Scrapy and the Twisted module

The Scrapy framework downloads content internally with Twisted, an asynchronous, non-blocking module.

  1. Dependence on Twisted
    Internally, the crawler's concurrency is implemented with an event-loop mechanism.
    Non-blocking: do not wait. When opening a connection or sending a request, do not wait for it to finish; move on to the next one immediately after sending.
    Asynchronous: realized through callbacks. As soon as a result comes back, the callback is notified automatically.
    Event loop: keep looping over the socket tasks, checking whether each socket has connected and whether a result has come back.
    In plain terms: a single thread can issue HTTP requests to multiple targets at once.
    Officially: an event-loop-based, asynchronous, non-blocking module.
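The three terms above can be sketched with nothing but the standard library. This is a minimal illustration, not how Twisted itself is written: the function name run_event_loop and the use of socket.socketpair() in place of real HTTP connections are assumptions made so the example runs without network access.

```python
import selectors
import socket


def run_event_loop(messages):
    """Send each message over its own local socket pair; collect replies via callbacks."""
    selector = selectors.DefaultSelector()
    results = []
    pending = 0

    def on_readable(sock):
        # Asynchronous: the loop calls this back once data has arrived.
        results.append(sock.recv(1024))
        selector.unregister(sock)
        sock.close()

    for message in messages:
        sender, receiver = socket.socketpair()
        receiver.setblocking(False)  # non-blocking: reads never wait
        selector.register(receiver, selectors.EVENT_READ, on_readable)
        sender.sendall(message)      # send, then move straight on to the next one
        sender.close()
        pending += 1

    # Event loop: one thread keeps checking which sockets are ready.
    while pending:
        for key, _events in selector.select():
            key.data(key.fileobj)    # key.data holds the registered callback
            pending -= 1
    selector.close()
    return results
```

Calling run_event_loop([b"a", b"b", b"c"]) returns the three payloads, in whatever order the sockets became readable.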

from twisted.internet import defer, reactor
# getPage is deprecated in newer Twisted releases; this assumes a version that still provides it
from twisted.web.client import getPage


# Part 1: the agent starts receiving tasks
def callback(contents):
    print(contents)


deferred_list = []  # task list
url_list = ['http://www.bing.com', 'https://segmentfault.com/', 'https://stackoverflow.com/']
for url in url_list:
    deferred = getPage(bytes(url, encoding='utf8'))  # create the request
    deferred.addCallback(callback)  # run the callback as soon as the response arrives
    deferred_list.append(deferred)

# Part 2: once the agent has finished every task, stop
dlist = defer.DeferredList(deferred_list)


def all_done(arg):
    reactor.stop()


dlist.addBoth(all_done)  # fires after all three tasks finish, whether or not they succeeded
# Start processing the tasks
reactor.run()
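The "collect every task, then stop regardless of success" pattern can also be tried without Twisted installed. The sketch below swaps in the standard-library asyncio module, so it is an analogy rather than Twisted code: asyncio.gather with return_exceptions=True plays roughly the role of DeferredList plus addBoth. The names fake_fetch, fetch_all and run, and the example URLs, are hypothetical, and no real download takes place.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for a download; it only sleeps briefly.
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"content of {url}"


async def fetch_all(urls):
    tasks = [fake_fetch(url) for url in urls]
    # Wait for every task; failures are returned as exception objects
    # instead of aborting the whole batch.
    return await asyncio.gather(*tasks, return_exceptions=True)


def run(urls):
    # asyncio.run starts the event loop and stops it once fetch_all is done.
    return asyncio.run(fetch_all(urls))
```

Results come back in the same order as the input URLs, with an exception object in the slot of each task that failed.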

  2. Writing parse

from scrapy import Request


def parse(self, response):
    # 1. The response object
    response.text
    response.encoding
    response.body
    response.request  # the request that produced this response: it wraps the URL to visit and the function to run once the download completes
    # 2. Parsing
    response.xpath('//div[@href="X1"]/a').extract_first()  # first match
    response.xpath('//div[@href="X1"]/a').extract()  # all matches
    response.xpath('//div[@href="X1"]/a/text()').extract()
    tag_list = response.xpath('//div[@href="X1"]/a')  # keep selectors (no extract()) so they can be queried again
    for tag in tag_list:
        tag.xpath('.//*')  # a leading ".//" searches among the descendants of the current tag
    # 3. Issue further requests (yield only wraps the request; nothing is sent yet)
    yield Request(url=page, callback=self.parse)  # page: the next URL to crawl; this only packages the request, the download starts later
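Step 3 is easy to misread: yielding a request does not download anything, it only hands a description of the work to whoever is driving the generator. The toy below illustrates that idea in plain Python. It is not Scrapy's engine; ToyRequest, PAGES and crawl are hypothetical names, and a dictionary lookup stands in for the actual download.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

# Hypothetical link graph standing in for real pages.
PAGES = {"page1": ["page2", "page3"], "page2": ["page1"], "page3": []}


@dataclass
class ToyRequest:
    url: str
    callback: Callable


def parse(links):
    for link in links:
        # Only describes the follow-up work; nothing is fetched here.
        yield ToyRequest(url=link, callback=parse)


def crawl(start_url):
    queue = deque([ToyRequest(start_url, parse)])
    seen = {start_url}
    order = []
    while queue:
        request = queue.popleft()
        order.append(request.url)
        links = PAGES[request.url]  # the "download" happens here, in the driver
        for new_request in request.callback(links):
            if new_request.url not in seen:  # skip URLs already scheduled
                seen.add(new_request.url)
                queue.append(new_request)
    return order
```

Here crawl("page1") visits page1, then page2 and page3; the link from page2 back to page1 is dropped because that URL was already scheduled.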

Differences:

What is the difference between requests and twisted?
    1. requests is a Python module that sends HTTP requests and can imitate a browser
        - wraps a socket to send requests

    2. twisted is an event-loop-based, asynchronous, non-blocking framework
        - wraps a socket to send requests
        - handles concurrency in a single thread: it does not wait, it just sends each request and moves on, whether or not the previous one has succeeded
        - non-blocking: does not wait
        - asynchronous: callbacks
        - event loop: keeps looping to check the status
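The "does not wait" point can be seen directly at the socket level. The sketch below assumes a local socket pair instead of a real HTTP connection, and the function name demo is hypothetical. A blocking recv would simply sit there until data arrives; a non-blocking one returns control immediately by raising BlockingIOError.

```python
import select
import socket


def demo():
    left, right = socket.socketpair()
    right.setblocking(False)
    try:
        right.recv(1024)          # nothing has been sent yet
        first = "got data"
    except BlockingIOError:
        first = "would block"     # non-blocking: the call returns at once
    left.sendall(b"hello")
    # Ask the OS to report when the socket is readable (wait at most 1 second).
    select.select([right], [], [], 1.0)
    second = right.recv(1024)
    left.close()
    right.close()
    return first, second
```

The first read reports that it would block, and the second read, made after the socket became readable, returns the payload.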

 

  


Origin www.cnblogs.com/Alexephor/p/11436694.html