Storing Scraped Data with Python's PyMySQL

Python connects to and operates on a MySQL database mainly through the PyMySQL module. This section explains how to store scraped data in a MySQL database.
Tip: Before working through this section, you should already be familiar with basic SQL syntax.

Creating the storage table
First, make sure that MySQL is installed on your computer, then run the following:

# 1. Connect to the MySQL server
mysql -h127.0.0.1 -uroot -p123456
# 2. Create the database
create database maoyandb charset utf8;
# 3. Switch to it
use maoyandb;
# 4. Create the data table
create table filmtab(
name varchar(100),
star varchar(400),
time varchar(30)
);
Basic usage of PyMySQL
1) Connect to the database
db = pymysql.connect(host='localhost', user='root', password='123456', database='maoyandb')
Parameter description:
host: address of the local MySQL server; it can also be the IP address of a remote database.
user: the username used for the connection.
password: the password used for the connection; here the local MySQL server's password is "123456".
database: the name of the database to connect to.
Note: PyMySQL 1.0 and later only accepts these as keyword arguments, so the older positional style pymysql.connect('localhost','root','123456','maoyandb') no longer works.
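Because the arguments must be keywords, it can help to collect them in a dict and unpack it when connecting. A minimal sketch (the concrete host, user, password, and database values below are examples and must match your own MySQL server):

```python
# Connection settings for pymysql.connect(); the concrete values are
# examples and must be adjusted to your own MySQL server.
conn_params = {
    'host': 'localhost',      # MySQL server address (or a remote IP)
    'user': 'root',           # username for the connection
    'password': '123456',     # password for that user
    'database': 'maoyandb',   # database to use
    'charset': 'utf8',        # connection character set
}

# Unpack the dict as keyword arguments (requires a running server):
# db = pymysql.connect(**conn_params)
```

This also makes it easy to keep the settings in one place if several scripts share the same database.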
2) Create a cursor object
cursor = db.cursor()
3) Execute SQL statements
The execute() method is used to run an SQL statement, as shown below:
# Method 1: build the full SQL string with Python string formatting
sql = "insert into filmtab values('%s','%s','%s')" % ('刺杀,小说家','雷佳音','2021')
cursor.execute(sql)
# Method 2: use placeholders in the SQL and pass the values as a list
sql = 'insert into filmtab values(%s,%s,%s)'
cursor.execute(sql,['刺杀,小说家','雷佳音','2021'])
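The second method is the safer one: the driver escapes each bound value, so quotes or other special characters in the data cannot break the SQL statement. PyMySQL follows the Python DB-API, so the same idea can be demonstrated with the standard-library sqlite3 module (which uses ? instead of %s as its placeholder) without needing a running MySQL server:

```python
import sqlite3

# An in-memory SQLite database stands in for MySQL here.
db = sqlite3.connect(':memory:')
cursor = db.cursor()
cursor.execute('create table filmtab(name text, star text, time text)')

# The title contains a single quote; naive string formatting would
# produce broken (or injectable) SQL, but placeholder binding
# escapes the value correctly.
row = ("Ocean's Eleven", 'George Clooney', '2001')
cursor.execute('insert into filmtab values(?,?,?)', row)
db.commit()

cursor.execute('select name from filmtab')
print(cursor.fetchone()[0])  # Ocean's Eleven
```

With PyMySQL the code is identical except for the connect() call and the %s placeholder style.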
4) Commit the data
db.commit()
5) Close the database connection
cursor.close()
db.close()

The complete code looks like this:

# -*- coding: utf-8 -*-
import pymysql

# Create the connection object
db = pymysql.connect(host='localhost', user='root', password='123456', database='maoyandb')
cursor = db.cursor()
# Execute the SQL statement: single-row insert
info_list = ['刺杀,小说家','雷佳音,杨幂','2021-2-12']
sql = 'insert into filmtab values(%s,%s,%s)'
# Pass the values as a list
cursor.execute(sql,info_list)
db.commit()
# Close the cursor and the connection
cursor.close()
db.close()
Querying the data gives the following result:
mysql> select * from filmtab;
+-------------+-------------------+-----------+
| name        | star              | time      |
+-------------+-------------------+-----------+
| 刺杀,小说家   | 雷佳音,杨幂         | 2021-2-12 |
+-------------+-------------------+-----------+
1 row in set (0.01 sec)
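Reading data back from Python works through the same cursor object: execute a SELECT, then fetch rows with fetchone(), fetchmany(), or fetchall(). Because PyMySQL implements the standard Python DB-API, the pattern below is the same one you would use with a PyMySQL cursor; it is shown with the standard-library sqlite3 module so it runs without a MySQL server:

```python
import sqlite3

# In-memory stand-in for the maoyandb database.
db = sqlite3.connect(':memory:')
cursor = db.cursor()
cursor.execute('create table filmtab(name text, star text, time text)')
cursor.execute('insert into filmtab values(?,?,?)',
               ('刺杀,小说家', '雷佳音,杨幂', '2021-2-12'))
db.commit()

# fetchall() returns a list of tuples, one tuple per row.
cursor.execute('select * from filmtab')
rows = cursor.fetchall()
for name, star, time in rows:
    print(name, star, time)

cursor.close()
db.close()
```

fetchone() returns a single row (or None when the result set is exhausted), which is handy when you expect at most one match.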
A more efficient method is executemany(), which inserts multiple rows in a single call. For example:
db = pymysql.connect(host='localhost', user='root', password='123456', database='maoyandb', charset='utf8')
cursor = db.cursor()
# Execute the SQL statement with a list of tuples
info_list = [('我不是药神','徐峥','2018-07-05'),('你好,李焕英','贾玲','2021-02-12')]
sql = 'insert into filmtab values(%s,%s,%s)'
cursor.executemany(sql,info_list)
db.commit()
# Close the cursor and the connection
cursor.close()
db.close()

Query the insert result as follows:

mysql> select * from filmtab;
+--------------+---------+------------+
| name         | star    | time       |
+--------------+---------+------------+
| 我不是药神     | 徐峥    | 2018-07-05 |
| 你好,李焕英    | 贾玲    | 2021-02-12 |
+--------------+---------+------------+
2 rows in set (0.01 sec)

Modify the crawler program
Let's modify the crawler program from the previous section, "Python Crawler Crawls Cat's Eye Movie Rankings", so that it stores the scraped data in the MySQL database. The code is as follows:

# -*- coding: utf-8 -*-
from urllib import request
import re
import time
import random
from ua_info import ua_list
import pymysql

class MaoyanSpider(object):
    def __init__(self):
        # Initialize attributes
        self.url = 'https://maoyan.com/board/4?offset={}'
        # Database connection object
        self.db = pymysql.connect(
            host='localhost', user='root', password='123456',
            database='maoyandb', charset='utf8')
        # Create the cursor object
        self.cursor = self.db.cursor()

    def get_html(self, url):
        headers = {'User-Agent': random.choice(ua_list)}
        req = request.Request(url=url, headers=headers)
        res = request.urlopen(req)
        html = res.read().decode()
        # Parse the page directly
        self.parse_html(html)

    def parse_html(self, html):
        re_bds = ('<div class="movie-item-info">.*?title="(.*?)".*?<p class="star">'
                  '(.*?)</p>.*?class="releasetime">(.*?)</p>')
        pattern = re.compile(re_bds, re.S)
        r_list = pattern.findall(html)
        self.save_html(r_list)

    def save_html(self, r_list):
        L = []
        sql = 'insert into filmtab values(%s,%s,%s)'
        # Clean up the data
        for r in r_list:
            t = (
                r[0].strip(),
                r[1].strip()[3:],      # drop the leading "主演:" label
                r[2].strip()[5:15]     # keep only the yyyy-mm-dd part
            )
            L.append(t)
        print(L)
        # Insert all rows at once; L looks like [(),(),()]
        try:
            self.cursor.executemany(sql, L)
            # Commit the data to the database
            self.db.commit()
        except Exception:
            # Roll back on error
            self.db.rollback()

    def run(self):
        for offset in range(0, 11, 10):
            url = self.url.format(offset)
            self.get_html(url)
            time.sleep(random.uniform(1, 3))
        # Close the cursor and the database connection
        self.cursor.close()
        self.db.close()

if __name__ == '__main__':
    start = time.time()
    spider = MaoyanSpider()
    spider.run()
    end = time.time()
    print('execution time: %.2f' % (end - start))
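The program imports ua_list from a helper module named ua_info, which is not shown in this section. A minimal sketch of that module, assuming it simply collects User-Agent strings for the crawler to pick from (the specific strings below are examples), could look like this:

```python
# ua_info.py -- a pool of User-Agent strings for the crawler.
# These example strings are illustrative; any valid User-Agent works.
ua_list = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/14.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0',
]
```

random.choice(ua_list) then picks a different header per request, which makes the crawler's traffic look less uniform.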

The database query shows the stored results:

mysql> select * from filmtab;
+------------------------------+-------------------------------------------------------+------------+
| name                         | star                                                  | time       |
+------------------------------+-------------------------------------------------------+------------+
| I Am Not the God of Medicine | Xu Zheng, Zhou Yiwei, Wang Chuanjun                   | 2018-07-05 |
| The Shawshank Redemption     | Tim Robbins, Morgan Freeman, Bob Gunton               | 1994-09-10 |
| Green Book                   | Viggo Mortensen, Mahershala Ali, Linda Cardellini     | 2019-03-01 |
| Sea Pianist                  | Tim Roth, Bill Nunn, Clarence Williams III            | 2019-11-15 |
| The Thief Family             | Nakagawa Masaya, Ando Sakura, Matsuoka Moyu           | 2018-08-03 |
| Farewell My Concubine        | Leslie Cheung, Zhang Fengyi, Gong Li                  | 1993-07-26 |
| Into the World               | Lv Yanting, Jon Senseff, Han Mo                       | 2019-07-26 |
| A Beautiful Life             | Roberto Benigni, Giustino Durano, Sergio Bini Bustric | 2020-01-03 |
| This Killer Is Not Too Cold  | Jean Reno, Gary Oldman, Natalie Portman               | 1994-09-14 |
| Inception                    | Leonardo DiCaprio, Ken Watanabe, Joseph Gordon-Levitt | 2010-09-01 |
+------------------------------+-------------------------------------------------------+------------+
10 rows in set (0.01 sec)
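The regular expression in parse_html and the slicing in save_html can be checked offline against a small HTML fragment shaped like Maoyan's list markup (the fragment below is a hand-made sample, not real site output):

```python
import re

# Hand-made sample imitating one entry of the Maoyan board page.
html = '''
<div class="movie-item-info">
<p class="name"><a href="/films/1200486" title="我不是药神">我不是药神</a></p>
<p class="star">
主演:徐峥,周一围,王传君
</p>
<p class="releasetime">上映时间:2018-07-05</p>
</div>
'''

re_bds = ('<div class="movie-item-info">.*?title="(.*?)".*?<p class="star">'
          '(.*?)</p>.*?class="releasetime">(.*?)</p>')
pattern = re.compile(re_bds, re.S)  # re.S lets .*? cross newlines
r = pattern.findall(html)[0]

# The same cleanup save_html applies:
t = (r[0].strip(),        # movie name
     r[1].strip()[3:],    # drop the 3-character "主演:" label
     r[2].strip()[5:15])  # drop "上映时间:" and keep yyyy-mm-dd
print(t)  # ('我不是药神', '徐峥,周一围,王传君', '2018-07-05')
```

Testing the pattern this way, before pointing the crawler at the live site, makes it much easier to spot an off-by-one in the slices or a regex that silently matches nothing.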
