1.3 Web scraping

Li Mu

Bilibili: https://space.bilibili.com/1567748478/channel/collectiondetail?sid=28144
Course homepage: https://c.d2l.ai/stanford-cs329p/

1. Web scraping

  • Web scraping: extracting data from specific websites;

    • Characteristics: noisy data, spam, and large data volume

    • Common applications: price comparison and price tracking sites

  • What is the difference between crawling and scraping?

    • Crawling: collects entire web pages
    • Scraping: a data scientist targets specific web pages and extracts only the data of interest
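
To make the distinction concrete, here is a minimal sketch of scraping: given one specific page, keep only the data of interest and ignore the rest. The HTML snippet, the class names `name` and `price`, and the helper `scrape_prices` are all hypothetical, chosen to resemble a price comparison page; real sites use different markup.

```python
from html.parser import HTMLParser

# Hypothetical product-listing markup; real pages will differ.
SAMPLE_HTML = """
<html><body>
  <nav>Home | Deals | Login</nav>
  <div class="item"><span class="name">Kettle</span><span class="price">$24.99</span></div>
  <div class="item"><span class="name">Toaster</span><span class="price">$39.50</span></div>
  <footer>Sponsored links ...</footer>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from <span class="name"> / <span class="price">."""

    def __init__(self):
        super().__init__()
        self._field = None  # which field the current <span> holds, if any
        self._name = None   # most recent product name, waiting for its price
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class")
            if cls in ("name", "price"):
                self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self._name = data.strip()
        elif self._field == "price":
            price = float(data.strip().lstrip("$"))
            self.items.append((self._name, price))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_prices(html):
    """Return only the data of interest; navigation and footer text are ignored."""
    parser = PriceParser()
    parser.feed(html)
    return parser.items


print(scrape_prices(SAMPLE_HTML))
```

A crawler, by contrast, would store the whole page (navigation, footer, and all) and follow its links to other pages.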

2. Crawler

Problem: command-line tools such as curl on Linux can fetch pages, but websites generally use various measures to block crawlers.

Solution: use a headless browser, that is, a browser without a GUI. (A large number of requests to the same website from the same IP in a short period may get that IP banned; cloud servers can provide additional IPs.)

  • Sample code
from selenium import webdriver
chrome_op
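
A rough sketch of the headless-browser approach follows. It assumes Selenium 4 with Chrome and a matching driver available on the machine; the helper names `fetch_rendered_html` and `fetch_many`, the window size, and the delay values are illustrative choices, not taken from the course. The pause between requests is one simple way to lower the chance of an IP ban, and it does not replace a site's terms of use or robots.txt.

```python
import time

# Flags passed to Chrome; "--headless" runs it without a GUI.
HEADLESS_ARGS = ["--headless", "--window-size=1280,800"]


def fetch_rendered_html(url, wait_seconds=2.0):
    """Load `url` in headless Chrome and return the rendered HTML."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        time.sleep(wait_seconds)  # crude wait for page scripts to run
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process


def fetch_many(urls, delay_seconds=5.0, fetch=fetch_rendered_html):
    """Fetch pages one at a time, pausing between requests to the site."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

Calling `fetch_many(["https://example.com/"])` would return a dictionary mapping each URL to its rendered HTML. The `fetch` parameter exists so the pacing logic can be exercised with a stand-in function when no browser is available.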

Origin blog.csdn.net/ch_ccc/article/details/129876865