Installation of Beautiful Soup
1. Installation of Beautiful Soup
Beautiful Soup is a Python library for parsing HTML and XML, which makes it easy to extract data from web pages. It supports several parsers, including Python's built-in html.parser; this article uses the lxml parser, so make sure the lxml library is installed first.
The environment used in this article is 64-bit Windows 10 with Python 3.11, so the steps below use Windows as the example.
1.1 Install the lxml library
To install the lxml library, first try to install it using pip:
pip install lxml
If the pip installation reports an error, for example that the libxml2 library is missing, you can install lxml from a wheel file instead.
To install from a wheel file, first install the wheel package:
pip install wheel
Then go to https://pypi.org/project/lxml/ and click Download files to download the lxml wheel for your system. At the time of writing, the latest version is lxml 4.9.1.
From the listed files, select the one that matches your Python version and platform. For example, if your Python version is 3.10 and your machine runs 64-bit Windows, choose lxml-4.9.1-cp310-cp310-win_amd64.whl
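As a small aid for picking the right file, the following sketch prints the tag and bitness of the interpreter you are running, which you can compare against the wheel filename. The function name expected_wheel_tags is made up for this illustration, and the "cp" prefix assumes the standard CPython interpreter.

```python
import struct
import sys


def expected_wheel_tags():
    """Return (python_tag, bitness) for the running interpreter."""
    # For example "cp310" for CPython 3.10; the "cp" prefix assumes CPython.
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # Pointer size in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
    bitness = struct.calcsize("P") * 8
    return python_tag, bitness


python_tag, bitness = expected_wheel_tags()
print(f"Python tag: {python_tag}, interpreter is {bitness}-bit")
```

On 64-bit Windows, a 64-bit interpreter corresponds to the win_amd64 suffix in the filename, and the Python tag should appear twice, as in cp310-cp310.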
The tricky part is that, at the time of writing, the latest Python version is 3.11, but lxml has no official Windows wheel for Python 3.11 (cp311), only Linux ones. One option is to downgrade Python, for example to Python 3.10.
Alternatively, https://www.lfd.uci.edu/~gohlke/pythonlibs/ provides an unofficial Windows wheel for Python 3.11 that you can try.
To install the wheel file, change to the directory that contains it and run the pip command below, or pass the full path to the file:
pip install lxml-4.9.0-cp311-cp311-win_amd64.whl
1.2 Install beautifulsoup4
Installing with pip is recommended. Run the following command:
pip install beautifulsoup4
1.3 Verify whether beautifulsoup4 can run
Run the following code. If it prints hello, beautifulsoup4 is working and can parse HTML with lxml.
If beautifulsoup4 is installed but lxml is not installed correctly, the code fails with a bs4.FeatureNotFound error.
from bs4 import BeautifulSoup as bs
soup = bs('<p>hello</p>', 'lxml')
print(soup.p.string)
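If the check above fails, it can help to see which of the two packages is actually missing. The sketch below uses only the standard library to look for the bs4 and lxml modules without importing them. The helper names is_installed and pick_parser are made up for this illustration, and falling back to html.parser is one possible choice, not something this article's setup requires.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_parser():
    """Prefer lxml; otherwise fall back to Python's built-in html.parser."""
    return "lxml" if is_installed("lxml") else "html.parser"


for name in ("bs4", "lxml"):
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
print("parser name to pass to BeautifulSoup:", pick_parser())
```

The string returned by pick_parser can be passed as the second argument to BeautifulSoup in place of the hard-coded 'lxml'.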