# Python Web Crawlers (Part 2)
## Taiyuan University of Technology Robotics Team: 20-Day Study Check-in, Day 13

To extract information, we first need to understand how that information is marked up. Most web pages today are written in HTML (HyperText Markup Language), a markup language that describes text, sound, images, video, and links in hypertext form.
The basic format of HTML:
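To see what a "tag tree" means before bringing in any third-party library, here is a minimal sketch using only Python's standard-library `html.parser.HTMLParser`. The sample page in `DOC` and the class `TagTreePrinter` are hypothetical, made up for illustration. The printer indents each start tag by its nesting depth, so the printed outline is the tag tree of the document.

```python
from html.parser import HTMLParser

# A minimal HTML document (hypothetical example content)
DOC = """<html>
<head><title>Demo page</title></head>
<body>
<p class="title"><b>Demo</b></p>
<p class="course">Links:
<a href="http://example.com/a" id="link1">A</a>
<a href="http://example.com/b" id="link2">B</a></p>
</body>
</html>"""


class TagTreePrinter(HTMLParser):
    """Records each start tag, indented two spaces per nesting level."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        attr_text = "".join(f' {k}="{v}"' for k, v in attrs)
        self.lines.append("  " * self.depth + f"<{tag}{attr_text}>")
        self.depth += 1

    def handle_endtag(self, tag):
        # Closing a tag moves back up one level in the tree
        self.depth -= 1


printer = TagTreePrinter()
printer.feed(DOC)
printer.close()
print("\n".join(printer.lines))
```

The sketch assumes every tag is closed; void elements written as `<br>` or `<img>` (no closing slash) have no end tag and would throw the depth count off, which is one reason a dedicated parsing library is worth using.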
These three can be treated as equivalent: the HTML document, its tag tree, and the BeautifulSoup object.
Traversing the content of an HTML document (its tag tree) requires the BeautifulSoup library (install the `beautifulsoup4` package), a library for parsing, traversing, and maintaining tag trees.
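A minimal sketch of the install, parse, and traverse workflow, assuming `beautifulsoup4` is installed (`pip install beautifulsoup4`; the import name is `bs4`). The sample HTML is hypothetical. `.children`, `.parent`, and `.next_sibling` are the library's attributes for moving down, up, and sideways in the tag tree.

```python
# Install first (shell): pip install beautifulsoup4
from bs4 import BeautifulSoup

# Hypothetical sample page used for illustration
html = (
    "<html><head><title>Demo page</title></head>"
    '<body><p class="title"><b>Demo</b></p>'
    '<p class="course">Links: <a href="http://example.com/a" id="link1">A</a> '
    '<a href="http://example.com/b" id="link2">B</a></p></body></html>'
)

# Parse the HTML text into a BeautifulSoup object (the tag tree)
soup = BeautifulSoup(html, "html.parser")

# Downward: the direct children of <body>
body_children = [child.name for child in soup.body.children]
print(body_children)               # ['p', 'p']

# Upward: the tag that contains <title>
print(soup.title.parent.name)      # head

# Sideways: the next sibling of the first <p>
first_p = soup.p
print(first_p.next_sibling["class"])  # ['course']
```

Note that `class` comes back as a list because HTML allows several classes on one tag.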

The BeautifulSoup library supports four parsers; for ordinary HTML we will use the first one, Python's built-in `html.parser`.
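The parser is chosen by the second argument to `BeautifulSoup(...)`. The comments below list the four parser names given in the BeautifulSoup documentation; only `html.parser` works without an extra install. The helper `make_soup` is a hypothetical convenience, not part of bs4: it prefers `lxml` when it is installed and falls back to `html.parser` by catching `bs4.FeatureNotFound`.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Parser names accepted by BeautifulSoup(markup, <parser>):
#   "html.parser"      - Python's built-in HTML parser, no extra install
#   "lxml"             - fast HTML parser, needs: pip install lxml
#   "xml" / "lxml-xml" - XML parser, also needs lxml
#   "html5lib"         - browser-like parsing, needs: pip install html5lib


def make_soup(markup):
    """Hypothetical helper: use lxml if available, else html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised when the requested parser is not installed
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>Hello <b>world</b></p>")
print(soup.b.string)  # world
```

Different parsers can build different trees from malformed HTML, so when results must be reproducible across machines it is simpler to name one parser explicitly, as this series does with `html.parser`.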
Basic elements of BeautifulSoup:
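A short sketch of the basic elements on a hypothetical snippet: a `Tag`, its `.name` and `.attrs`, the text inside a tag (returned by `.string` as a `NavigableString`), and an HTML comment, which comes back as a `Comment` (a subclass of `NavigableString`).

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical snippet: a <b> with text, a <p> holding only a comment, and a link
html = (
    '<p class="title"><b>Demo</b></p>'
    "<p><!--a comment--></p>"
    '<a href="http://example.com/a" id="link1">A</a>'
)
soup = BeautifulSoup(html, "html.parser")

# Tag: soup.<name> returns the FIRST tag with that name
tag = soup.a
print(isinstance(tag, Tag))     # True

# Name: the tag's name as a string
print(tag.name)                 # a

# Attributes: a dict of the tag's attributes
print(tag.attrs)                # {'href': 'http://example.com/a', 'id': 'link1'}
print(tag["id"])                # link1

# NavigableString: the text inside a tag
text = soup.b.string
print(isinstance(text, NavigableString), text)   # True Demo

# Comment: a NavigableString subclass for <!-- ... --> content
note = soup.find_all("p")[1].string
print(isinstance(note, Comment), note)           # True a comment
```

Because a `Comment` prints just like ordinary text, checking its type is how you tell the two apart.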
The HTML content we get this way is messy and its structure is hard to read.
To make it readable, we can use the `prettify()` method from the bs4 library.
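A minimal sketch of `prettify()` on a hypothetical one-line page: it returns the markup as a string with each tag and text node on its own line, indented by depth. It can be called on the whole soup or on any single tag.

```python
from bs4 import BeautifulSoup

# Hypothetical one-line page, hard to read as-is
html = '<html><head><title>Demo page</title></head><body><p class="title"><b>Demo</b></p></body></html>'
soup = BeautifulSoup(html, "html.parser")

# prettify() returns a str: one tag or text node per line, indented by depth
pretty = soup.prettify()
print(pretty)

# It also works on a single tag
print(soup.p.prettify())
```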
Searching for specific content:
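A sketch of the main search method, `find_all(name, attrs, recursive, string, **kwargs)`, on a hypothetical snippet. It returns a list of matches; `find()` takes the same arguments and returns only the first match (or `None`). The filters shown are by tag name, by CSS class (`class_`, since `class` is a Python keyword), by attribute value with a regular expression, and by text.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical snippet used for illustration
html = """<p class="title"><b>Demo</b></p>
<p class="course">Links:
<a href="http://example.com/a" id="link1">A</a>
<a href="http://example.com/b" id="link2">B</a></p>"""
soup = BeautifulSoup(html, "html.parser")

# By tag name: every <a> in the document
links = soup.find_all("a")
hrefs = [a.get("href") for a in links]
print(hrefs)  # ['http://example.com/a', 'http://example.com/b']

# By tag name plus CSS class
course = soup.find_all("p", class_="course")

# By attribute, matching the id with a regular expression
link2 = soup.find_all(id=re.compile("2$"))

# By text: text nodes equal to "Demo"
demo_text = soup.find_all(string="Demo")

# find() returns only the first match
first_link = soup.find("a")
print(len(course), [t["id"] for t in link2], first_link["id"])
```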
There is a lot of material here, and much of it is fiddly, so it takes time to digest. The next article will put it into practice with a worked example.



Source: blog.csdn.net/weixin_46424753/article/details/104908270