Writing crawler code in Python: which libraries are useful besides requests and beautifulsoup4? [Learning record]

When writing crawler code in Python, several other libraries are useful in addition to requests and beautifulsoup4. Some commonly used ones are:

  1. Scrapy: Scrapy is a Python crawling framework. It provides powerful crawling tools (request scheduling, link following) and convenient data-processing features (item pipelines), so efficient crawlers can be written quickly.

  2. Selenium: Selenium is a browser-automation tool, widely used for automated testing, that drives a real browser. For websites that require interaction such as logging in or clicking, or that render their content with JavaScript, Selenium is very useful.

  3. PyQuery: PyQuery is a jQuery-like library that lets you query and manipulate HTML documents with CSS selectors, which is very convenient.

  4. lxml: lxml is a fast Python library for processing XML. It can parse both XML and HTML documents and supports XPath queries.

  5. requests-html: requests-html is built on requests, lxml and PyQuery. It makes HTML parsing easy and supports CSS selectors as well as JavaScript rendering through a headless Chromium.

  6. pandas: pandas is a Python data-analysis library that makes it easy to clean, organize and analyze data, which is very useful for processing the data a crawler collects.
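A minimal lxml sketch using XPath on a made-up HTML snippet and a made-up XML snippet. `etree.HTML` is lenient with imperfect markup, while `etree.fromstring` expects well-formed XML.

```python
from lxml import etree

# Made-up HTML snippet standing in for a downloaded page
html = """
<html><body>
  <div class="product"><h2>Keyboard</h2><span class="price">199</span></div>
  <div class="product"><h2>Mouse</h2><span class="price">89</span></div>
</body></html>
"""

tree = etree.HTML(html)  # lenient HTML parser; returns the root <html> element
# XPath: the <h2> text of every product block, and every price
names = [str(t) for t in tree.xpath('//div[@class="product"]/h2/text()')]
prices = [int(t) for t in tree.xpath('//span[@class="price"]/text()')]
print(list(zip(names, prices)))  # [('Keyboard', 199), ('Mouse', 89)]

# The same XPath API works on well-formed XML parsed with etree.fromstring
xml = b"<catalog><book id='1'><title>Learning Python</title></book></catalog>"
root = etree.fromstring(xml)
titles = [str(t) for t in root.xpath("//book/title/text()")]
print(titles)  # ['Learning Python']
```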
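A sketch of cleaning crawler output with pandas. The rows are invented sample data standing in for what a crawler might collect; the cleaning steps (trim text, remove duplicates, drop missing prices, convert text to numbers) are common examples, not a fixed recipe.

```python
import pandas as pd

# Invented sample rows standing in for crawler output:
# stray whitespace, a duplicate row, a missing price, prices stored as text
rows = [
    {"title": " Keyboard ", "price": "199"},
    {"title": "Mouse", "price": "89"},
    {"title": "Mouse", "price": "89"},
    {"title": "Monitor", "price": None},
]

df = (
    pd.DataFrame(rows)
    .assign(title=lambda d: d["title"].str.strip())    # trim surrounding whitespace
    .drop_duplicates()                                  # remove repeated rows
    .dropna(subset=["price"])                           # drop rows without a price
    .assign(price=lambda d: pd.to_numeric(d["price"]))  # "199" -> 199
    .sort_values("price", ascending=False)              # most expensive first
    .reset_index(drop=True)
)
print(df)
```

The cleaned table can then be saved with `df.to_csv("products.csv", index=False)` (the filename is only an example) for later analysis.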

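Installation method:

Run pip install followed by the package name in a terminal, for example:

pip install scrapy

To install all of the libraries above in one step (note that the pip package is named requests-html, while the module is imported as requests_html):

pip install scrapy selenium pyquery lxml requests-html pandas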

Here is a code example that imports the libraries above:

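import scrapy                          # crawling framework
from selenium import webdriver         # browser automation
from pyquery import PyQuery as pq      # jQuery-style CSS selectors
from lxml import etree                 # fast XML/HTML parsing with XPath
from requests_html import HTMLSession  # requests + HTML parsing + JavaScript rendering
import pandas as pd                    # data cleaning and analysis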
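If one of the imports above fails with `ModuleNotFoundError`, a small standard-library check can list which libraries still need installing. The mapping from pip package names to import names is written out by hand here (an assumption you may need to extend); it matters mainly for requests-html, which is imported as `requests_html`.

```python
import importlib.util

# pip package name -> name used in the import statement
LIBRARIES = {
    "scrapy": "scrapy",
    "selenium": "selenium",
    "pyquery": "pyquery",
    "lxml": "lxml",
    "requests-html": "requests_html",
    "pandas": "pandas",
}


def missing_libraries(libraries=LIBRARIES):
    """Return the pip names of libraries whose module cannot be found (nothing is imported)."""
    return [
        pip_name
        for pip_name, module in libraries.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All libraries are installed.")
```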

Source: blog.csdn.net/whoas123/article/details/130022860