Getting started with Python crawlers 7: parsing HTML messages to obtain basic information about web pages

☞ Go to LaoYuanPython blog https://blog.csdn.net/LaoYuanPython

I. Introduction

BeautifulSoup is an HTML parsing class provided by the third-party module bs4. It can be thought of as an HTML parsing toolbox, and it recognizes tags in HTML messages with good fault tolerance, even when the markup is malformed. This section assumes basic knowledge of HTML. If you are not familiar with it, please refer to the introduction in the previous chapter.

II. Installing BeautifulSoup, importing it, and creating objects

2.1 Installing BeautifulSoup and lxml

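BeautifulSoup is a class in the bs4 module, and lxml is an HTML text parser. To install the bs4 module and the lxml parser module, run the following commands on the operating system command line (the -i option tells pip to download from the Tsinghua University PyPI mirror):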
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple bs4
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple lxml
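After running the two commands, it can be useful to confirm from Python that both modules are visible to the interpreter. The following is only a minimal sketch using the standard library; the helper name is_installed is hypothetical and not part of bs4 or lxml.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be found for import."""
    # find_spec returns None when no such top-level module exists.
    return importlib.util.find_spec(module_name) is not None


# Check the two modules installed by the pip commands above.
status = {name: is_installed(name) for name in ("bs4", "lxml")}
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If either module is reported as missing, the usual cause is that pip installed it into a different Python environment from the one running the script.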

2.2 Importing the module that contains BeautifulSoup

Because BeautifulSoup is a class provided by the bs4 module, it is usually imported as follows:
from bs4 import BeautifulSoup
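Once imported, a BeautifulSoup object is typically created by passing the HTML text and a parser name to the class. The sketch below is an illustration under stated assumptions, not necessarily the author's own example: the HTML string is made-up sample data with deliberately unclosed tags, and the fallback to Python's built-in "html.parser" is added here only so the code still runs if lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample page; the <p> and <b> tags are deliberately left unclosed.
html = "<html><head><title>Demo page</title></head><body><p>Hello, <b>crawler</body></html>"

try:
    # Prefer the lxml parser installed in section 2.1.
    soup = BeautifulSoup(html, "lxml")
    parser_used = "lxml"
except FeatureNotFound:
    # Assumption: fall back to the standard-library parser if lxml is unavailable.
    soup = BeautifulSoup(html, "html.parser")
    parser_used = "html.parser"

title_text = soup.title.string   # text inside the <title> tag
bold_text = soup.b.get_text()    # text inside the unclosed <b> tag

print("parser:", parser_used)
print("title:", title_text)
print("bold:", bold_text)
```

Even though the sample markup never closes its p and b tags, the parser still builds a usable tree, which is the fault tolerance mentioned in the introduction.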


Origin blog.csdn.net/LaoYuanPython/article/details/113091721