Python --- web crawler learning summary

This exercise uses Python scripts to scrape information from a web page, save it to a database, and finally export it as an Excel spreadsheet.

Install the third-party modules and download MongoDB

First, some preparation:
install the third-party modules (requests, pymongo, and bs4)


https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-rhel70-3.2.5.tgz


The approach is as follows:

1. Visit the website to get the HTML page

Get the request headers:
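
A minimal sketch of this step, assuming the requests library and a browser-like User-Agent header. The header value, the timeout, and the fetch_html helper are illustrative assumptions, not the original script:

```python
import requests

# Hypothetical headers: copy the real values from your own browser's request.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
}


def fetch_html(url, timeout=10):
    """Return the HTML text of url, raising on HTTP error status codes."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling fetch_html with the address of the page to scrape returns its HTML as a string, ready for the extraction step.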

Script 1:

Start mongod before running the script:

             ./mongod &  


2. Extract the content we want from the HTML

Script 2:

The script uses the Long Jump and View Graph labels to navigate to the information we want.

This script does not need to be run on its own; the third script imports the URLs from it.
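
Locating links by their label text can be sketched with bs4 as follows. The sample HTML, the example.com base URL, and the find_links_by_label helper are all hypothetical; the real page structure will differ:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up markup standing in for the real page.
SAMPLE_HTML = """
<table>
  <tr><td><a href="/records/long-jump">Long Jump</a></td></tr>
  <tr><td><a href="/records/high-jump">High Jump</a></td></tr>
  <tr><td><a href="/athletes/1/graph">View Graph</a></td></tr>
</table>
"""


def find_links_by_label(html, label, base_url=""):
    """Return absolute URLs of every <a> whose visible text contains label."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        if label in anchor.get_text(strip=True):
            # Resolve relative hrefs against the page address.
            links.append(urljoin(base_url, anchor["href"]))
    return links


long_jump_urls = find_links_by_label(SAMPLE_HTML, "Long Jump", "https://example.com")
graph_urls = find_links_by_label(SAMPLE_HTML, "View Graph", "https://example.com")
```

A later script can then import these lists instead of repeating the lookup.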

3. Store the scraped content in the database

Script 3:


Check that mongod is running before you run the script. After it finishes, go to the bin directory under MongoDB
and open the database to view the stored information:

./mongo

use iaaf

db.athletes.find()
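
The storage step might look like the sketch below, assuming pymongo and a mongod listening on the default localhost:27017. The iaaf database and athletes collection match the shell commands above; the save_athletes helper and the sample record are hypothetical:

```python
def save_athletes(collection, athletes):
    """Insert scraped records into a collection and return how many were stored."""
    if not athletes:
        # insert_many rejects an empty list, so skip the call entirely.
        return 0
    result = collection.insert_many(athletes)
    return len(result.inserted_ids)


def main():
    # Imported here so save_athletes can be tried without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    collection = client["iaaf"]["athletes"]
    # Hypothetical record; real field names depend on what the crawler extracts.
    records = [{"name": "Example Athlete", "event": "Long Jump"}]
    stored = save_athletes(collection, records)
    print(f"stored {stored} record(s)")
    client.close()
```

Call main() once mongod is up; db.athletes.find() in the mongo shell should then list the inserted documents.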

4. Convert to an Excel spreadsheet

Script 4:
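
One way to do the conversion, sketched under the assumption that the openpyxl package is installed (the original script may well have used a different library). The column names, sheet title, and helper names are hypothetical:

```python
def documents_to_rows(documents, columns):
    """Turn MongoDB-style dicts into a header row plus one row per document."""
    rows = [list(columns)]
    for doc in documents:
        # Missing fields become empty cells; unlisted fields such as _id are skipped.
        rows.append([doc.get(column, "") for column in columns])
    return rows


def write_xlsx(rows, path):
    """Write rows to an .xlsx workbook at path."""
    # Imported here so documents_to_rows works without openpyxl installed.
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    sheet.title = "athletes"
    for row in rows:
        sheet.append(row)
    workbook.save(path)
```

For example, write_xlsx(documents_to_rows(collection.find(), ["name", "event"]), "athletes.xlsx"), where collection is the pymongo collection holding the scraped records. Listing the columns explicitly also keeps MongoDB's _id field, which is not a plain cell value, out of the sheet.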


5. Usage summary of requests, pymongo, and bs4

requests is a very useful Python HTTP client library, often used when writing crawlers and testing server responses. It is fair to say that requests covers the needs of most HTTP work today.

1. Role: send requests and get responses

Why requests?
1) requests is built on top of urllib3
2) requests is used in exactly the same way in Python 2 and Python 3 (current releases support Python 3 only)
3) requests is easy to use, in keeping with Python's style
4) requests helps us handle the response: it decompresses the body automatically, fills in the request headers, and picks up cookies automatically

2. Send a simple GET request and get the response: response = requests.get(url)
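
To see what requests fills in on our behalf, a request can be prepared without being sent. This is a sketch only; the URL, the event parameter, and the User-Agent value are made up for illustration:

```python
import requests

session = requests.Session()
# Hypothetical User-Agent; it overrides the library default.
session.headers.update({"User-Agent": "crawler-demo/0.1"})

# Build the request without sending it, so nothing touches the network.
request = requests.Request(
    "GET", "https://example.com/results", params={"event": "long-jump"}
)
prepared = session.prepare_request(request)

# The final URL includes the encoded query string.
print(prepared.url)
# The headers include our User-Agent plus defaults added by requests.
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Sending it is then session.send(prepared); cookies set by the server are kept on the session and reused in later requests.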

pymongo is the Python toolkit for working with MongoDB.

bs4 concept:

The bs4 library parses, traverses, and maintains the "tag tree".
Put simply, bs4 restructures the HTML source
so that we can easily operate on its nodes, tags, attributes, and so on.
The four bs4 objects:
① Tag object: an HTML tag. BeautifulSoup can pull out a specific tag with
the form 'soup.name', where name is the name of a tag in the HTML.
② BeautifulSoup object: represents the entire HTML document and can be used like a Tag object.
③ NavigableString object: the text inside a tag.
④ Comment object: a special NavigableString. If a tag contains an HTML comment, this object holds the comment text with the comment markers stripped.
The most commonly used are the BeautifulSoup and Tag objects.
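
The four object types can be seen side by side in a small sketch. The HTML snippet is made up for illustration, and the built-in html.parser backend is assumed:

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

html = "<p class='note'>Long Jump<!-- world record --></p>"
soup = BeautifulSoup(html, "html.parser")   # BeautifulSoup: the whole document

tag = soup.p                 # Tag: reached with the soup.name form
text = tag.contents[0]       # NavigableString: the text inside the tag
comment = tag.contents[1]    # Comment: the comment text without <!-- and -->

print(type(soup).__name__, type(tag).__name__)
print(type(text).__name__, type(comment).__name__)
print(tag.name, tag["class"], repr(str(text)), repr(str(comment)))
```

Because Comment is a subclass of NavigableString, isinstance(node, Comment) is the usual way to tell comments apart from ordinary text when walking a tag's contents.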

Origin blog.51cto.com/14375779/2409327