3-04. Simple crawler

A simple crawler

  • Purpose: use a back-end language to crawl a website's data, clean it down to a specific block of data, and finally output that data to the front end.

  • Anti-crawling

  • Countermeasure (on the site's side): put the content of the tag in an image

  • Steps

    1. Import the http module

    const http = require( 'http' )    

    2. Copy the http.get() example from the official Node.js documentation.

    3. Copy the const options example from the official Node.js documentation and place it above http.get().

    4. Change the first argument of http.get() to options. It is a variable, not a string, so it takes no quotation marks.

    5. Find a site to crawl. In the browser's developer tools, open Network, select Doc, and refresh the page; the entry that appears is the document that holds the data. Click Headers on the left, scroll down to General, find Request URL, and copy the URL.

    6. Modify the options object: hostname is the domain (the bare domain, without http:// and without a trailing slash); path is the part of the URL after the domain (after .com/), left empty if there is none; method is GET, to stay consistent with http.get(); for headers, copy the Request Headers from the developer tools, starting from the Accept line, and finally change the Content-Length value to 0.

    Note: in headers, each value goes in quotation marks and is followed by a comma. A key that contains a hyphen (-) also goes in quotation marks.

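    7. Check the response's data type in the developer tools. If it is text, remove the error check from the copied program (the part from let error to return), since the example from the documentation expects JSON.

    8. Remove the JSON handling inside try and replace it with console.log(rawData).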

    9. If the site uses the https protocol, replace http with https throughout the code, change the port number to 443, and update the headers in options accordingly.

    10. Enter ls to list the files in the directory, then run the file with node, a space, and the file name. You get a chunk of data back.

    11. Next, clean the data you obtained with the third-party module cheerio.

    12. First run npm init -y. A package.json file appears, which records the project's dependencies.

    13. Install cheerio: cnpm i cheerio -S

    14. Import the module

const cheerio = require( 'cheerio' )

    15. Add the following code inside try and remove console.log(rawData).

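const $ = cheerio.load( rawData )
// 'tag.className a' is a placeholder: use the tag and class that wrap the links you want
$( 'tag.className a' ).each( function ( index, element ) {
  // inside the callback, `this` is the current element
  console.log( $( this ).text() )
})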
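
Before the cheerio step, the request part (steps 1 to 10) can be pulled together into one small function. The sketch below is an illustration under assumptions, not the exact file from these notes: fetchPage is a hypothetical helper name, and the target is a local stand-in server on 127.0.0.1 so that the example runs offline. For a real https site the same shape should apply with https in place of http and port 443.

```javascript
const http = require('http');

// Hypothetical helper: requests a page with http.get() and resolves
// with the raw response body as a string (the rawData of step 8).
function fetchPage(options) {
  return new Promise((resolve, reject) => {
    // http.get() sets the method to GET and calls req.end() automatically.
    const req = http.get(options, (res) => {
      res.setEncoding('utf8');
      let rawData = '';
      res.on('data', (chunk) => { rawData += chunk; });
      res.on('end', () => resolve(rawData));
    });
    req.on('error', reject);
  });
}

// Stand-in for the site being crawled (an assumption, so that the
// example needs no network access).
const demoSite = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/html', Connection: 'close' });
  res.end('<div class="list"><a>first title</a><a>second title</a></div>');
});

demoSite.listen(0, '127.0.0.1', () => {
  const options = {
    hostname: '127.0.0.1',          // bare domain: no http://, no trailing slash
    port: demoSite.address().port,  // 80 for http, 443 for https on a real site
    path: '/',                      // the part of the URL after the domain
    method: 'GET',
    headers: {
      Accept: 'text/html',          // copied from Request Headers on a real site
      'Content-Length': '0',        // keys containing "-" must be quoted
    },
  };

  fetchPage(options)
    .then((rawData) => console.log(rawData))
    .catch((e) => console.error(`Got error: ${e.message}`))
    .finally(() => demoSite.close());
});
```

Wrapping the request in a Promise is a design choice of this sketch; the callback style copied from the Node.js documentation works the same way.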
  1. Store the return value of http.get() in a constant: const req = http.get(...)

  2. Then add req.end() at the very end.

  3. Create a server with native code; see "04. Creating a server with native code".
  4. Copy the earlier code, from the top of the file up to http.get, and paste it at the same level as the native server function; delete the duplicated variable.
  5. Copy the rest, the http.get part, and paste it in place of the native server's response.write() and response.end() statements.
  6. Replace console.log( $( this ).text() ) with response.write( `<h3> ${ $( this ).text() } </h3>` )
  7. Add response.end() inside the function, after the loop.
  8. Run the file; the crawled data is output to the page.
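
The eight steps above can be sketched as a single file. This is an illustration under assumptions, not the code from these notes: createCrawlerServer and extractTitles are hypothetical names, and extractTitles uses a naive regular expression only so that the sketch runs without installing anything. In the setup described here, the cheerio each loop takes its place.

```javascript
const http = require('http');

// Stand-in for the cheerio step (assumption): collects the text of every
// <a> element with a naive regular expression. Real pages need a parser.
function extractTitles(rawData) {
  const titles = [];
  const pattern = /<a\b[^>]*>([^<]*)<\/a>/g;
  let match;
  while ((match = pattern.exec(rawData)) !== null) {
    titles.push(match[1].trim());
  }
  return titles;
}

// Hypothetical helper: builds a server that crawls `options` on every
// request and writes each extracted title to the page as an <h3>.
function createCrawlerServer(options) {
  return http.createServer((request, response) => {
    response.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });

    const req = http.get(options, (res) => {
      res.setEncoding('utf8');
      let rawData = '';
      res.on('data', (chunk) => { rawData += chunk; });
      res.on('end', () => {
        for (const title of extractTitles(rawData)) {
          response.write(`<h3>${title}</h3>`);
        }
        response.end(); // finish the page only after all data is written
      });
    });

    // If the crawl itself fails, report the error on the page.
    req.on('error', (e) => {
      response.end(`<p>Got error: ${e.message}</p>`);
    });
  });
}
```

Calling createCrawlerServer(options).listen(8000) with the options object from the earlier steps, then opening http://localhost:8000 in a browser, should show the crawled titles; port 8000 is an arbitrary choice for this sketch.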


Origin www.cnblogs.com/douyacai7822/p/11353407.html