To extract information from a web page, we first need to understand how that information is marked up. Web pages today are overwhelmingly written in HTML (HyperText Markup Language), a markup language that can describe hypertext: text, sound, images, video, and links.
Basic HTML format
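An HTML document, its tag tree, and the BeautifulSoup object built from it can be treated as equivalent views of the same thing.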
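To make the tag-tree idea concrete, here is a minimal sketch that prints the nesting of a small page. The page content and the TreePrinter class are made up for illustration, and the sketch uses only Python's standard html.parser module rather than BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical minimal page, used only for illustration.
HTML_DOC = """<html>
<head><title>Demo page</title></head>
<body>
<p class="intro">Hello, <a href="http://example.com/">a link</a>.</p>
</body>
</html>"""


class TreePrinter(HTMLParser):
    """Records one indented line per opening tag to expose the nesting."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Indent by the current depth, then descend one level.
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # A closing tag moves back up one level.
        self.depth -= 1


printer = TreePrinter()
printer.feed(HTML_DOC)
printer.close()
print("\n".join(printer.lines))
```

The indentation in the output mirrors the tree: html contains head and body, head contains title, and body contains p, which in turn contains a.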
Traversing the content of an HTML document (its tag tree) requires the BeautifulSoup library (installed as beautifulsoup4), which provides functions for parsing, traversing, and maintaining the tag tree.
BeautifulSoup supports four parsers. For ordinary HTML we will use the first one.
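As a rough sketch of what creating a parsed document looks like, the snippet below builds a BeautifulSoup object from a made-up HTML string. It assumes that "the first" parser means Python's built-in "html.parser", which needs no extra install; the other parser names listed in the comment require additional packages.

```python
from bs4 import BeautifulSoup  # installed with: pip install beautifulsoup4

# Hypothetical page content, used only for illustration.
html_doc = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# The second argument selects the parser. "html.parser" ships with Python;
# "lxml", "xml" (both need the lxml package) and "html5lib" are the alternatives.
soup = BeautifulSoup(html_doc, "html.parser")

page_title = soup.title.string
print(page_title)
```

Once the soup object exists, tags can be reached as attributes (soup.title, soup.p), which is what the rest of this article builds on.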
Basic elements
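The original listing for this part is not available, so the following is only a sketch, assuming the basic elements meant here are the ones bs4 exposes on a parsed document: the tag itself, its name, its attributes, its text (a NavigableString), and comments. The HTML fragment is invented for the example.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical fragment, used only for illustration.
html_doc = '<p class="title" id="first"><b>Bold text</b><!-- a comment --></p>'
soup = BeautifulSoup(html_doc, "html.parser")

tag = soup.p                # Tag: the first <p> element in the document
print(isinstance(tag, Tag))
print(tag.name)             # Name: the tag's name as a string
print(tag.attrs)            # Attributes: a dict of the tag's attributes
print(tag["id"])            # a single attribute, read like a dict key

text = soup.b.string        # NavigableString: the text inside <b>...</b>
print(isinstance(text, NavigableString), text)

comment = tag.contents[1]   # Comment: a special kind of NavigableString
print(isinstance(comment, Comment), comment)
```

Note that class can hold several values, so bs4 returns it as a list, while id comes back as a plain string.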
The HTML content we retrieve is often messy, and its structure is hard to make out.
For this we can use the prettify() method from the bs4 library.
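A small sketch of prettify() on a made-up one-line document follows. The exact whitespace of the output can differ between bs4 versions, so no expected output is shown.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line document, used only for illustration.
html_doc = "<html><body><p>Hello <b>world</b></p></body></html>"
soup = BeautifulSoup(html_doc, "html.parser")

# prettify() returns a string with the markup spread over several
# indented lines, which makes the nesting easier to read.
pretty = soup.prettify()
print(pretty)
```

prettify() can also be called on a single tag, for example soup.p.prettify(), to format just that part of the tree.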
Retrieving specific content
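The original examples for this part are missing, so what follows is a sketch of one common approach, using bs4's find_all() and find() methods. The page content, ids, and URLs are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, used only for illustration.
html_doc = """
<html><body>
<p class="intro">First paragraph</p>
<p>Second paragraph</p>
<a href="http://example.com/one" id="link1">One</a>
<a href="http://example.com/two" id="link2">Two</a>
</body></html>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# find_all() returns every matching tag; here we collect each link's href.
links = [a.get("href") for a in soup.find_all("a")]
print(links)

# find() returns only the first match. class_ has a trailing underscore
# because "class" is a reserved word in Python.
intro = soup.find("p", class_="intro")
print(intro.get_text())

# Tags can also be looked up by attribute value alone.
second_link = soup.find(id="link2")
print(second_link.string)
```

find_all() always returns a list-like result, even when nothing matches, whereas find() returns None in that case, so it is worth checking the result before using it.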
There is a lot of material here, much of it small details, so take the time to digest it. The next article will work through practical examples.