Crawling page content with a Python crawler

Example site: cnblogs (博客园, "Blog Park"). Tip: Ctrl + Alt + L reformats the code (the default Reformat Code shortcut in PyCharm).

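# coding: utf-8
import requests
from lxml import etree


def gettree(url):
    # Download the page and parse it into an lxml element tree.
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors such as 404
    return etree.HTML(response.text)


def gettitle(url):
    # The post title is the text of the <a id="cb_post_title_url"> link.
    title = gettree(url).xpath('//a[@id="cb_post_title_url"]/text()')
    # xpath() returns a list; avoid an IndexError when nothing matches.
    return title[0] if title else ''


def getcontent(url):
    # Collect the text of every <p> inside the post body, one per line.
    contentlist = gettree(url).xpath('//div[@class="postBody"]/div/p/text()')
    return "\n".join(contentlist)


if __name__ == "__main__":
    url = input("Enter the URL of a cnblogs post: ")
    print(gettitle(url))
    print(getcontent(url))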
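
To see what the two XPath expressions actually select without hitting the network, here is a minimal sketch that runs them against an inline HTML string. The sample markup, and the helper names `parse_title`, `parse_content` and `parse_content_full`, are made up for illustration and only mimic the structure the expressions expect; the real cnblogs page layout may differ.

```python
from lxml import etree

# Hypothetical sample markup (an assumption, not copied from cnblogs).
SAMPLE_HTML = """
<html><body>
<a id="cb_post_title_url" href="#">Hello crawler</a>
<div class="postBody"><div>
<p>First paragraph.</p>
<p>Second <b>bold</b> paragraph.</p>
</div></div>
</body></html>
"""


def parse_title(tree):
    # xpath() returns a list; fall back to "" when nothing matches.
    matches = tree.xpath('//a[@id="cb_post_title_url"]/text()')
    return matches[0] if matches else ""


def parse_content(tree):
    # /text() selects only text nodes that are direct children of <p>,
    # so text inside nested tags such as <b> is skipped.
    return "\n".join(tree.xpath('//div[@class="postBody"]/div/p/text()'))


def parse_content_full(tree):
    # string(.) concatenates all descendant text of each <p>.
    paragraphs = tree.xpath('//div[@class="postBody"]/div/p')
    return "\n".join(p.xpath("string(.)") for p in paragraphs)


tree = etree.HTML(SAMPLE_HTML)
print(parse_title(tree))
print(parse_content(tree))
print(parse_content_full(tree))
```

On this sample, `parse_content` drops the word inside `<b>`, while `parse_content_full` keeps it. If posts contain links or bold text inside paragraphs, the `string(.)` variant is probably the safer choice.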

 


Source: blog.csdn.net/weixin_41896770/article/details/100099428