Python simple crawler, crawl the entire page - Code World


Enterprise 2020-10-28 10:24:44

A simple crawler that downloads an entire web page. To crawl a different website, change the url in the code.

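import urllib.request  # standard-library HTTP client


def getHtml(url):  # fetch the raw HTML of a page
    # urlopen() returns an http.client.HTTPResponse; read() turns it into bytes.
    # Without read(), html would be the HTTPResponse object itself.
    with urllib.request.urlopen(url) as response:  # closes the connection when done
        html = response.read()
    return html


def saveHtml(fileName, fileContent):
    with open(fileName, "wb") as f:  # "wb": write in binary mode, since html is bytes
        f.write(fileContent)  # write the page to disk unchanged


def main():
    url = "https://www.zhihuishu.com/"  # change this to crawl a different site
    html = getHtml(url)  # fetch the page as bytes
    saveHtml("theHtml.html", html)  # save it to theHtml.html
    print("Page saved")  # completion message


if __name__ == "__main__":  # run main() only when executed as a script
    main()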
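The comment in getHtml() notes that urlopen() returns an http.client.HTTPResponse and that read() turns it into bytes. If you need the page as text instead of raw bytes, the following is a minimal sketch. It assumes the server declares its charset in the Content-Type header, and it falls back to UTF-8 when no charset is declared, which is itself an assumption. The helper names fetch_text and decode_body are hypothetical and do not come from the original post.

```python
import urllib.request
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str]) -> str:
    # Assumption: pages without a declared charset are UTF-8; bad bytes become U+FFFD
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_text(url: str, timeout: float = 10.0) -> str:
    # urlopen() returns an http.client.HTTPResponse; the with-block closes it
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes, exactly what getHtml() returns
        # Charset from the Content-Type header, or None if the server omits it
        charset = response.headers.get_content_charset()
    return decode_body(raw, charset)
```

The original post saves the raw bytes directly, which avoids guessing the encoding at all. You only need to decode when you want to search or parse the HTML as a string, for example with `fetch_text(url)[:200]`.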

Remaining problem: many websites have anti-crawler mechanisms, so the saved file may not be the page you actually wanted.
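One common, partial workaround is to send a browser-like User-Agent header, because some sites reject urllib's default `Python-urllib/3.x` agent. Below is a minimal sketch. It assumes the target site only checks the User-Agent; many sites also rely on cookies, JavaScript checks, or rate limits, and this code does not handle any of those. The header string and the names BROWSER_UA, build_request, and get_html_with_headers are illustrative assumptions.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like User-Agent; any realistic browser string could be used
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str) -> urllib.request.Request:
    # A Request object carries headers; urlopen() accepts it in place of a URL string
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})


def get_html_with_headers(url: str, timeout: float = 10.0) -> bytes:
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read()  # bytes, same as getHtml()
    except urllib.error.HTTPError as err:
        # urlopen() raises HTTPError for 4xx/5xx; a 403 here often means the request was blocked
        raise RuntimeError(f"{url} returned HTTP {err.code}") from err
```

It can replace getHtml() in main(), for example `saveHtml("theHtml.html", get_html_with_headers(url))`.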

Origin blog.csdn.net/weixin_51343683/article/details/109295084