Installing the Python parsing library Beautiful Soup

1. Installation of Beautiful Soup

Beautiful Soup is a Python library for parsing HTML and XML, which makes it easy to extract data from web pages. It can work with several parsers, including Python's built-in html.parser. This article uses the lxml parser, so make sure the lxml library is installed first.

The environment for this article is 64-bit Windows 10 with Python 3.11, so the steps below use Windows as the example.

1.1 Install the lxml library

To install the lxml library, first try pip:

pip install lxml

If pip reports an error, for example that the libxml2 library is missing, you can install lxml from a wheel file instead.

To install from a wheel file, first install the wheel package:

pip install wheel


Then go to the lxml page on PyPI, https://pypi.org/project/lxml/, to download the matching wheel. At the time of writing, the latest version is lxml 4.9.1. Click Download files.

From the listed files, select the one that matches your Python version and platform. For example, for Python 3.10 on 64-bit Windows, choose lxml-4.9.1-cp310-cp310-win_amd64.whl
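If you are not sure which file matches your setup, the short sketch below prints the interpreter tag and platform of the Python you are running. The helper name wheel_tag_hint is made up for this illustration, and the result is only a hint for reading wheel file names (on Linux, for instance, real wheel names use manylinux platform tags instead).

```python
import struct
import sys
import sysconfig


def wheel_tag_hint():
    """Return (python_tag, platform_tag, bits) for the running interpreter.

    Hypothetical helper for illustration; pip does its own, more complete
    tag matching when it installs a wheel.
    """
    # CPython 3.10 -> "cp310", CPython 3.11 -> "cp311", and so on.
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # sysconfig reports e.g. "win-amd64"; wheel names use underscores.
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    # Pointer size tells us whether the interpreter is 32-bit or 64-bit.
    bits = struct.calcsize("P") * 8
    return python_tag, platform_tag, bits


python_tag, platform_tag, bits = wheel_tag_hint()
print(f"Python tag: {python_tag}")
print(f"Platform:   {platform_tag} ({bits}-bit)")
```

On a 64-bit Windows machine, the two printed values should correspond to the cpXY and win_amd64 parts of the wheel file name.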


The catch is that the latest Python version is 3.11, but at the time of writing lxml has no official cp311 wheel for Windows, only for Linux. One option is to downgrade Python, for example to Python 3.10.

Alternatively, https://www.lfd.uci.edu/~gohlke/pythonlibs/ provides an unofficial cp311 wheel for Windows, which you can try.

To install the wheel, go to the directory containing the downloaded file and run pip, or pass the full path to the file instead:

pip install lxml-4.9.0-cp311-cp311-win_amd64.whl


1.2 Install beautifulsoup4

Installing with pip is recommended. Run the following command:

pip install beautifulsoup4
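After both installs, a quick way to confirm what actually ended up in the current environment is to ask Python for the installed versions. This is a minimal sketch using the standard library's importlib.metadata. The helper name installed_version is made up for this illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used with pip, not the import names.
for name in ("lxml", "beautifulsoup4"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If either line says "not installed", pip probably installed the package into a different Python environment than the one running this script.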


1.3 Verify that beautifulsoup4 works

Run the following code. If it prints hello, beautifulsoup4 is working with the lxml parser.

If beautifulsoup4 is installed but lxml is not, this code fails, because the 'lxml' parser it requests is not available.

from bs4 import BeautifulSoup as bs

soup = bs('<p>hello</p>', 'lxml')
print(soup.p.string)
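If lxml cannot be installed on your machine, one possible workaround is to fall back to Python's built-in html.parser, which Beautiful Soup also accepts as a parser name. The sketch below picks a parser based on what is importable. The helper name pick_parser is made up for this illustration, and the parsing step is skipped when beautifulsoup4 itself is missing.

```python
from importlib.util import find_spec


def pick_parser():
    """Prefer lxml when it is importable, otherwise use the built-in parser."""
    return "lxml" if find_spec("lxml") is not None else "html.parser"


parser = pick_parser()
print(f"Using parser: {parser}")

# Only parse when beautifulsoup4 (import name: bs4) is available.
if find_spec("bs4") is not None:
    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<p>hello</p>", parser)
    print(soup.p.string)
```

The two parsers can treat malformed HTML differently, so results on real pages may not be identical.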



Origin blog.csdn.net/hubing_hust/article/details/128278550