Beginner's Crawler Notes 9 --- Example: Best Chinese Universities Ranking

Category: Other | 2018-08-28 00:10:14

Example: Chinese University Ranking

url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2016.html'

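Functional description

Input: the URL of the ranking page
Output: the ranking information printed to the screen (rank, university name, total score)
Technical approach: requests-bs4
Directed crawler: crawls only the given URL and does not follow links to other pages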

Program structure design

Fetch the page content
Extract the information into a suitable data structure
Display the results

getHTMLText()
fillUnivList()
printUnivList()

Main program

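import requests
import bs4
from bs4 import BeautifulSoup

def getHTMLText(url):
    """Fetch the page and return its text, or "" if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise HTTPError for 4xx/5xx responses
        r.encoding = r.apparent_encoding  # guess the encoding from the content
        return r.text
    except requests.RequestException:
        return ""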

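def fillUnivList(ulist, html):
    """Append [rank, name, score] for each table row in html to ulist."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:  # empty or unexpected page: nothing to extract
        return
    for tr in tbody.children:
        # .children also yields text nodes; keep only real <tr> tags
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')  # shorthand for tr.find_all('td')
            ulist.append([tds[0].string, tds[1].string, tds[2].string])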

def printUnivList(ulist, num):
    # num: number of universities to print; columns are laid out with str.format
    print("{:^10}\t{:^6}\t{:^10}".format("排名","学校名称","总分"))
    for i in range(min(num, len(ulist))):  # do not run past the end of the list
        u = ulist[i]
        print("{:^10}\t{:^6}\t{:^10}".format(u[0],u[1],u[2]))
    print("Suc" + str(num))

def main():
    uinfo = []
    url = 'http://www.zuihaodaxue.cn/zuihaodaxuepaiming2016.html'
    html = getHTMLText(url)
    fillUnivList(uinfo,html)
    printUnivList(uinfo,20) # 20 univs
if __name__ == "__main__":
    main()
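
The isinstance check in fillUnivList matters because iterating over the .children of a tag yields the whitespace text nodes between the rows as well as the <tr> tags themselves, and those text nodes cannot be called like a tag. The sketch below runs the same extraction on a small hand-written table so that it works offline. The sample markup and its values are made up for illustration and are only assumed to resemble the structure of the real page; fill_univ_list is a renamed copy of the function above.

```python
import bs4
from bs4 import BeautifulSoup

# Hypothetical sample markup: one <tr> per university inside a <tbody>.
SAMPLE_HTML = """
<table><tbody>
<tr><td>1</td><td>University A</td><td>95.9</td></tr>
<tr><td>2</td><td>University B</td><td>82.6</td></tr>
</tbody></table>
"""

def fill_univ_list(ulist, html):
    soup = BeautifulSoup(html, "html.parser")
    for tr in soup.find("tbody").children:
        # Skip the newline text nodes that sit between the <tr> tags.
        if isinstance(tr, bs4.element.Tag):
            tds = tr("td")  # shorthand for tr.find_all("td")
            ulist.append([tds[0].string, tds[1].string, tds[2].string])

rows = []
fill_univ_list(rows, SAMPLE_HTML)

# Count every child of <tbody>, text nodes included, for comparison.
children = list(BeautifulSoup(SAMPLE_HTML, "html.parser").find("tbody").children)
print("children:", len(children), "rows kept:", len(rows))
print(rows)
```

The number of children is larger than the number of rows kept, which is exactly the difference the type check filters out.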

Optimization

Problem:
* Chinese and English text do not line up, because the characters have different display widths
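
The mismatch can be made visible with the standard library alone. The padding added by format() counts characters, but most terminals draw a Chinese character about two columns wide while the default padding space is one column wide. In this sketch, display_width is a hypothetical helper name introduced for illustration; it only approximates terminal width using unicodedata.east_asian_width.

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns: wide/full-width characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

short_name = "{:^10}".format("清华大学")          # 4 Chinese characters
long_name = "{:^10}".format("中国科学技术大学")    # 8 Chinese characters

# format() pads both strings to the same number of characters...
print(len(short_name), len(long_name))
# ...but they take up different numbers of columns on screen.
print(display_width(short_name), display_width(long_name))
```

Because the two fields have equal length but unequal on-screen width, whatever follows them in the row starts at a different column.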

Solving the Chinese alignment problem

Pad the Chinese column with the Chinese full-width space, chr(12288), instead of the default ASCII space

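def printUnivList(ulist, num):
    # {3} is the fill character for the name column: chr(12288), the full-width space
    tplt = "{0:^10}\t{1:{3}^10}\t{2:^10}"
    print(tplt.format("排名","学校","分数",chr(12288)))
    for i in range(min(num, len(ulist))):  # do not run past the end of the list
        u = ulist[i]
        print(tplt.format(u[0],u[1],u[2],chr(12288)))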
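
In the template, the inner {3} in {1:{3}^10} is filled in first from the fourth argument of format(), and the result is used as the fill character of field 1. With chr(12288) (U+3000, the ideographic space) as the fill, each padding character is as wide as the Chinese characters next to it, so names of different lengths should occupy the same width on screen. A minimal check of that claim, where display_width is a hypothetical helper that approximates terminal width and is not part of the original notes:

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns: wide/full-width characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)

FULLWIDTH_SPACE = chr(12288)  # U+3000 IDEOGRAPHIC SPACE

# {1} supplies the fill character for field 0, centred in 10 characters.
tplt = "{0:{1}^10}"
a = tplt.format("清华大学", FULLWIDTH_SPACE)
b = tplt.format("中国科学技术大学", FULLWIDTH_SPACE)

print(repr(a))
print(display_width(a), display_width(b))
```

This only helps for columns whose content is Chinese; the rank and score columns hold ASCII digits, so the template leaves them padded with ordinary spaces.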

Reposted from blog.csdn.net/paleyellow/article/details/81301316
