Web Crawler: Scraping the Chinese University Rankings and Storing Them in a Database

Category: Other · Posted: 2019-09-15 14:34:31

#CrawUnivRanjingA.py
import requests
from bs4 import BeautifulSoup
import bs4
import pymysql

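# Connect to the local MySQL server; utf8mb4 keeps the Chinese text intact.
db = pymysql.connect(host="localhost", user="root", password="admin",
                     database="test", port=3306, charset="utf8mb4")
print('Database connection established')
cursor = db.cursor()
# One-time setup: uncomment and run once to create the target table.
# Column names are Chinese: 排名 = rank, 学校名称 = university name,
# 总分 = total score, 省市 = province/city.
# cursor.execute("""CREATE TABLE Daxue (
#   排名 int(3) NOT NULL,
#   学校名称 CHAR(10),
#   总分 float (2),
#   省市 varchar(10))""")

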
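def getHTMLtEXT(url):
    """Return the page text, or an empty string if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        # Use the encoding detected from the body so Chinese text decodes correctly.
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""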

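def fillUnivList(ulist, html):
    """Parse the ranking table into ulist and store the first 20 rows in MySQL."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:
        # Empty download or changed page layout: nothing to parse.
        return
    insert_into = "INSERT INTO Daxue(排名,学校名称,总分,省市) VALUES(%s,%s,%s,%s)"
    inserted = 0
    for tr in tbody.children:
        # Skip the whitespace text nodes between <tr> tags.
        if not isinstance(tr, bs4.element.Tag):
            continue
        tds = tr('td')
        if len(tds) < 4:
            continue
        # Cells as read here: rank, name, province/city, total score.
        paiming = tds[0].text.strip()
        xuexiaomingcheng = tds[1].text.strip()
        shengshi = tds[2].text.strip()
        zongfeng = tds[3].text.strip()
        ulist.append([paiming, xuexiaomingcheng, zongfeng, shengshi])
        if inserted < 20:
            cursor.execute(insert_into, (paiming, xuexiaomingcheng, zongfeng, shengshi))
            inserted += 1
    # Commit once after all inserts instead of once per row.
    db.commit()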

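def PrintUnivlist(ulist, NUM):
    # chr(12288) is the full-width space; padding the name column with it
    # keeps Chinese text aligned. Headers: rank, name, total score, province/city.
    tplt = "{0:<10}\t{1:{4}<10}\t{2:<10}\t{3:<10}"
    print(tplt.format("排名", "学校名称", "总分", "省市", chr(12288)))
    # Slice instead of indexing so a short list does not raise IndexError.
    for u in ulist[:NUM]:
        print(tplt.format(u[0], u[1], u[2], u[3], chr(12288)))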

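def main():
    uinfo = []
    url = "http://www.zuihaodaxue.com/zuihaodaxuepaiming2019.html"
    html = getHTMLtEXT(url)
    fillUnivList(uinfo, html)
    PrintUnivlist(uinfo, 20)
    # Release the database resources opened at the top of the script.
    cursor.close()
    db.close()

if __name__ == "__main__":
    main()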
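
The print template in the script pads the name column with chr(12288), the full-width space, so that Chinese names line up. Below is a minimal sketch of that idea on its own. The helper name format_row and the sample rows are hypothetical placeholders, not part of the original script and not scraped data.

```python
# chr(12288) is U+3000 IDEOGRAPHIC SPACE, which is as wide as a CJK character.
FULL_WIDTH_SPACE = chr(12288)

# Hypothetical sample rows (rank, name, score, province); not real ranking data.
SAMPLE_ROWS = [
    ("1", "示例大学", "90.0", "北京"),
    ("2", "另一所示例大学", "85.5", "上海"),
]

def format_row(rank, name, score, province, width=10):
    """Left-align four columns; only the name column is padded with full-width spaces."""
    template = "{0:<{w}}\t{1:{fill}<{w}}\t{2:<{w}}\t{3:<{w}}"
    return template.format(rank, name, score, province,
                           fill=FULL_WIDTH_SPACE, w=width)

for row in SAMPLE_ROWS:
    print(format_row(*row))
```

With an ASCII space as the fill character, the padding is narrower than the Chinese characters it stands in for, so the later columns tend to drift out of alignment. That is why the fill character is passed into the format spec.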


After the script runs, the Daxue table in the test database holds the first 20 rows.
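
To check what was stored, the rows can be read back with a SELECT. The sketch below uses Python's built-in sqlite3 with an in-memory database as a stand-in for the MySQL server, so it runs without any setup. This is an assumption made for illustration and not the original setup. The sample rows and the helper store_and_read_back are hypothetical. With pymysql the placeholders would be %s rather than ?.

```python
import sqlite3

# Hypothetical sample rows (rank, name, score, province); not scraped data.
SAMPLE_ROWS = [
    (1, "示例大学A", 94.6, "北京"),
    (2, "示例大学B", 76.5, "上海"),
    (3, "示例大学C", 72.0, "浙江"),
]

def store_and_read_back(rows):
    """Create the table, insert all rows in one batch, and return them ordered by rank."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL `test` database
    try:
        conn.execute(
            "CREATE TABLE Daxue (排名 INTEGER NOT NULL, 学校名称 TEXT, 总分 REAL, 省市 TEXT)"
        )
        # Parameterized batch insert; sqlite3 uses ? where pymysql uses %s.
        conn.executemany(
            "INSERT INTO Daxue(排名,学校名称,总分,省市) VALUES (?,?,?,?)", rows
        )
        conn.commit()
        query = "SELECT 排名,学校名称,总分,省市 FROM Daxue ORDER BY 排名"
        return conn.execute(query).fetchall()
    finally:
        conn.close()

stored = store_and_read_back(SAMPLE_ROWS)
for row in stored:
    print(row)
```

Inserting with executemany and committing once keeps the batch in a single transaction, which is usually cheaper than committing after every row.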

Reposted from www.cnblogs.com/doudouhaha521/p/11522076.html
