Event address: CSDN 21-day learning challenge
**Study diary**
1. Knowledge points studied
Installation and setup
2. Problems encountered while learning
Had not used the API before
3. What I learned
How to use the beautifulsoup4 API
4. Hands-on practice
Installation:
1. Command line (cmd): pip install beautifulsoup4
2. Import the package: from bs4 import BeautifulSoup
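A quick way to confirm that the install and import worked is to parse a tiny snippet. This is only a minimal sketch: the HTML string and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# A made-up HTML snippet, used only to confirm that the import works
html = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>"

# Parse with Python's built-in parser, so nothing else needs to be installed
soup = BeautifulSoup(html, "html.parser")

# .title finds the first <title> tag; .string returns its text
title_text = soup.title.string
print(title_text)
```

If this runs without an ImportError, beautifulsoup4 is installed correctly.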
Parsers:
1. Python standard library: BeautifulSoup(html, 'html.parser'). Built into Python, with moderate speed and good fault tolerance. Python versions before 2.7.3 (Python 2) or before 3.2.2 (Python 3) have poor fault tolerance.
2. lxml HTML parser: BeautifulSoup(html, 'lxml'). Fast, with good fault tolerance. Requires the lxml package, which depends on C libraries.
3. lxml XML parser: BeautifulSoup(html, 'xml'). Fast, and the only parser that supports XML. Also requires the lxml package.
4. html5lib parser: BeautifulSoup(html, 'html5lib'). The best fault tolerance: it parses documents the way a browser does and produces valid HTML5. Slow. It is pure Python with no C extensions, but it must be installed separately (pip install html5lib).
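The differences between the HTML parsers are easiest to see on broken markup. The sketch below feeds the same made-up fragment to each one and skips any parser that is not installed. The fragment and the `results` dictionary are assumptions for illustration, not part of the original notes.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately broken markup (made up): the <p> tag is never closed
broken_html = "<p>unclosed paragraph"

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        soup = BeautifulSoup(broken_html, parser)
        results[parser] = str(soup)
    except FeatureNotFound:
        # bs4 raises this when the requested parser is not installed
        results[parser] = None

for parser, output in results.items():
    print(parser, "->", output if output is not None else "not installed")
```

With html.parser the fragment is simply closed. lxml and html5lib, when available, typically wrap it in the surrounding html and body tags as well, so the same input can yield different trees depending on the parser.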
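Object types:
1. Tag: an HTML or XML tag (element), with a name and attributes.
2. NavigableString: the text inside a tag.
3. BeautifulSoup: the parsed document as a whole, including its content, type, name, and attributes.
4. Comment: a special NavigableString that holds the text of a comment, without the comment delimiters (<!-- and -->).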
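The four object types can be seen side by side in one small document. This is a minimal sketch using a made-up snippet; the variable names are chosen only for illustration.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Made-up markup containing a tag, some text, and a comment
html = '<p class="intro">Hello<!-- a note --></p>'
soup = BeautifulSoup(html, "html.parser")

p = soup.p                  # Tag
text, comment = p.contents  # NavigableString and Comment, in document order

print(type(soup).__name__, soup.name)                # the whole document
print(isinstance(p, Tag), p.name, p["class"])        # tag name and attributes
print(isinstance(text, NavigableString), str(text))  # text inside the tag
print(isinstance(comment, Comment), repr(str(comment)))  # comment text without <!-- -->
```

Because Comment is a subclass of NavigableString, checking `isinstance(node, Comment)` is a common way to tell comments apart from ordinary text.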
…