Improving the Baidu Search Crawler

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/zhoulinshijie/article/details/88667496
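import urllib.parse
import urllib.request

# Ask for the search term and the number of result pages to save.
keyword = input("Enter the search keyword: ")
num = int(input("Enter the number of pages to save: "))
# Percent-encode the keyword so it is safe to place in the query string.
keyword = urllib.parse.quote(keyword)

# Send a browser-like User-Agent with every request.
headers = {
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0"
}

# Iterate over page indexes 0 .. num-1 (the input is a count, not a sequence).
for i in range(num):
    # pn is treated as a result offset, assuming 10 results per page.
    url = "https://www.baidu.com/s?ie=utf-8&wd=" + keyword + "&pn=" + str(i * 10)
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        content = response.read()
    html = content.decode("utf-8")

    # Save each page as 0.html, 1.html, ...
    name = str(i) + ".html"
    with open(name, "w", encoding="utf-8") as f:
        f.write(html)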
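
The loop above builds each URL by string concatenation. As a rough sketch, the same URL could instead be assembled with `urllib.parse.urlencode`, which handles the percent-encoding of the keyword in one step. The helper name `build_search_url` and the `page_size=10` default are assumptions made for illustration, not part of the original script, and treating `pn` as a result offset of 10 per page is likewise an assumption about how Baidu paginates.

```python
from urllib.parse import urlencode


def build_search_url(keyword, page, page_size=10):
    """Return a Baidu search URL for a zero-based page index.

    Hypothetical helper: page_size=10 assumes ten results per page,
    so pn is computed as page * page_size.
    """
    params = {"ie": "utf-8", "wd": keyword, "pn": page * page_size}
    # urlencode percent-encodes values and joins them with "&".
    return "https://www.baidu.com/s?" + urlencode(params)


if __name__ == "__main__":
    # Print the URLs for the first three result pages.
    for page in range(3):
        print(build_search_url("python crawler", page))
```

Keeping URL construction in a small function like this makes it possible to check the generated URLs without sending any requests.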
