Scraping Data into MySQL with Python 3 - 代码天地


Category: Other, posted 2018-08-14 08:18:32

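Here is the code, posted directly. It follows random article links on English Wikipedia and stores each page's title and first paragraph in a MySQL table named pages.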

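#!/usr/local/bin/python3.5
# -*- coding:UTF-8 -*-
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import random
import pymysql

# Connection settings from the original post; adjust the host, socket path,
# and credentials for your own environment.
connect = pymysql.connect(host='192.168.10.142', unix_socket='/tmp/mysql.sock',
                          user='root', password='1234', database='scraping',
                          charset='utf8')
cursor = connect.cursor()
cursor.execute('USE scraping')

# Seed from the OS entropy source or the current time. Passing a datetime
# object to random.seed() raises TypeError on Python 3.11 and later.
random.seed()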


def store(title, content):
    # Insert the page only if no row with the same title exists yet.
    # cursor.execute() returns the number of rows the SELECT matched.
    matched = cursor.execute("select * from pages WHERE `title` = %s", (title,))
    if matched <= 0:
        cursor.execute("insert into pages(`title`, `content`) VALUES(%s, %s)", (title, content))
        cursor.connection.commit()
    else:
        print('This content already exists.')


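def get_links(article_url):
    # Fetch the page, store its title and first paragraph, and return its /wiki/ links.
    html = urlopen('https://en.wikipedia.org' + article_url)
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.h1.get_text()
    first_paragraph = soup.find('div', {'id': 'mw-content-text'}).find('p')
    # Some pages have no <p> inside the content div; skip storing those.
    if first_paragraph is not None:
        store(title, first_paragraph.get_text())
    return soup.find('div', {'id': 'bodyContent'}).find_all('a', href=re.compile("^(/wiki/)(.)*$"))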

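try:
    # Start from the Wikipedia front page, then keep following a random link.
    links = get_links('')
    while len(links) > 0:
        new_article = random.choice(links).attrs['href']
        links = get_links(new_article)
        print(links)
finally:
    # Always release the database resources, even if a request fails.
    cursor.close()
    connect.close()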
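
The regular expression in get_links keeps every href that starts with /wiki/, so the random walk can also land on namespace pages such as File: or Special: links. The following is only a sketch of one possible refinement, assuming you want ordinary article pages and that excluding any href containing a colon is acceptable. The names ARTICLE_LINK, is_article_link, and filter_article_links and the sample hrefs are hypothetical and do not appear in the original post.

```python
import re

# Assumption: article hrefs start with /wiki/ and contain no colon, while
# namespace pages such as /wiki/File:... or /wiki/Special:... do contain one.
ARTICLE_LINK = re.compile(r"^(/wiki/)((?!:).)*$")


def is_article_link(href):
    """Return True if href looks like a link to an ordinary article page."""
    return ARTICLE_LINK.match(href) is not None


def filter_article_links(hrefs):
    """Keep only the hrefs that pass is_article_link, preserving order."""
    return [href for href in hrefs if is_article_link(href)]


if __name__ == "__main__":
    # Hypothetical sample hrefs, for illustration only.
    sample = [
        "/wiki/Python_(programming_language)",
        "/wiki/File:Example.jpg",
        "/wiki/Special:Random",
        "#cite_note-1",
    ]
    print(filter_article_links(sample))
```

The same pattern could be passed to re.compile inside get_links in place of the original one. A small number of genuine article titles contain a colon and would be skipped by this filter, so treat it as a trade-off rather than a strict improvement.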

Reposted from blog.csdn.net/ASAS1314/article/details/52594232
