Using the BeautifulSoup Library in a Python Crawler

The following example scrapes the list of Project 985 universities from daxue.eol.cn:

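import requests
from bs4 import BeautifulSoup, Tag  # import the BeautifulSoup class (and Tag) from the bs4 package

url = "http://daxue.eol.cn/985.shtml"

r = requests.get(url)
r.encoding = r.apparent_encoding  # use the encoding detected from the response body

soup = BeautifulSoup(r.text, "html.parser")  # parse the fetched source with html.parser

for tr in soup.tbody:  # iterating over a parent tag yields its direct children
    if not isinstance(tr, Tag):
        continue  # skip the whitespace text nodes between rows
    if len(tr) == 9:  # len(tag) counts the tag's children, text nodes included
        cells = tr.contents  # .contents returns the tag's children as a list
        print(cells[1].text)
        print(cells[3].string)  # .string, .text and get_text() can all extract a tag's text
    if len(tr) == 7:
        cells = tr.contents
        print(cells[1].string)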
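
The same traversal can be tried offline. The sketch below is only an illustration: the HTML string, the university names, and the helper extract_rows are made up for this example and do not reflect the real page. It shows why the row lengths above are odd numbers (the newlines between tags count as children) and how .string differs from .text when a cell holds more than one child.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample markup, NOT the real page. The newlines between tags
# are parsed as text nodes, so they are counted by len(tr) and .contents.
HTML = """
<table><tbody>
<tr>
<td>1</td>
<td>Example University A</td>
<td><b>Beijing</b> (north)</td>
</tr>
<tr>
<td>2</td>
<td>Example University B</td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")


def extract_rows(tbody):
    """Return the text of every <td> in every <tr>, skipping text nodes."""
    rows = []
    for tr in tbody.children:
        if not isinstance(tr, Tag):
            continue  # whitespace between rows is a text node, not a row
        cells = tr.find_all("td", recursive=False)  # direct <td> children only
        rows.append([td.get_text(" ", strip=True) for td in cells])
    return rows


rows = extract_rows(soup.tbody)
first_row = soup.tbody.tr

print(len(first_row))                # 3 cells + 4 newline text nodes
print(first_row.contents[3].string)  # cell with a single text child
print(first_row.contents[5].string)  # None: this cell has two children
print(first_row.contents[5].text)    # .text joins all nested text
print(rows)
```

Selecting cells with find_all("td") instead of fixed indexes into .contents is a design choice made for this sketch: it keeps working if the whitespace in the page changes, whereas checks like len(tr)==9 depend on the exact layout of the source.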





Origin blog.csdn.net/xinzhilinger/article/details/102727870