Crawler Learning (06): Data Storage with MySQL

1. Introduction to mysql

A database is software that stores data and provides various operations for querying and modifying it.

I won't go into MySQL installation in detail here; this article mainly covers how to operate MySQL from a Python program.
For the installation steps, see the following article:
Link: mysql installation and configuration

Most of the operations here are done visually in Navicat.

1. Install the Python module for connecting to MySQL: pymysql

pip install pymysql

2. Import the pymysql package:

import pymysql

2. MySQL basic operations

1. Create table

Create a table with an SQL statement:

create table student(
    -- field = column = attribute
    sno int(10) primary key auto_increment,
    sname varchar(50) not null,
    sbirthday date not null,
    saddress varchar(255),
    sphone varchar(12),
    sage int,  -- age; used by the queries below
    class_name varchar(50)
);
Data type       Description
double          floating-point (decimal) number
varchar         variable-length string
date            date (year, month, day)
datetime        date and time (year, month, day, hour, minute, second)
text            long text

Constraint      Description
primary key     Primary key: a value that is unique across the whole table, like a student number or an ID card number; it uniquely identifies one row
auto_increment  The primary key increases automatically; the column must be an integer type
not null        The value cannot be empty (NULL)
null            The value can be empty
default         Sets a default value
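To see these constraints in action without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. This is an assumption: SQLite's types and constraints are close to MySQL's but not identical (for example, auto_increment is written AUTOINCREMENT). The sample rows and the 'unassigned' default are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: no server needed).
# SQLite writes MySQL's auto_increment as AUTOINCREMENT on an INTEGER PRIMARY KEY.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table student(
        sno integer primary key autoincrement,
        sname varchar(50) not null,
        sbirthday date not null,
        saddress varchar(255),
        class_name varchar(50) default 'unassigned'
    )
""")


def try_insert(sql, params):
    """Run one INSERT; return True if it succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# sno is omitted, so it is generated automatically
ok_row = try_insert("insert into student(sname, sbirthday) values (?, ?)", ("周杰伦", "2020-01-02"))
# not null: sname cannot be NULL
null_name = try_insert("insert into student(sname, sbirthday) values (?, ?)", (None, "2020-01-02"))
# primary key: sno = 1 already exists
dup_key = try_insert("insert into student(sno, sname, sbirthday) values (?, ?, ?)", (1, "王力宏", "2020-01-03"))

# default fills class_name; saddress allows NULL (returned as None)
row = conn.execute("select sno, class_name, saddress from student").fetchone()
print(ok_row, null_name, dup_key)  # True False False
print(row)  # (1, 'unassigned', None)
```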

2. Modify the table

-- Add a column
ALTER TABLE table_name
ADD COLUMN column_name datatype;

-- Example
ALTER TABLE student
ADD COLUMN f_name VARCHAR(20) NOT NULL
AFTER sno;  -- AFTER inserts the new column right after the given column


-- Drop a column
ALTER TABLE table_name
DROP COLUMN column_name;

-- Change a column's data type
ALTER TABLE table_name
MODIFY COLUMN column_name datatype;

-- Rename a table
ALTER TABLE table_name RENAME TO new_name;
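A rough sketch of adding a column and renaming a table, with sqlite3 standing in for MySQL so it runs without a server. SQLite only covers part of MySQL's ALTER TABLE: it supports ADD COLUMN and RENAME TO, but has no MODIFY COLUMN and no AFTER clause, so only those two operations appear here. The `column_names` helper is hypothetical.

```python
import sqlite3

# SQLite stand-in for MySQL (assumption): ADD COLUMN and RENAME TO exist,
# MODIFY COLUMN and AFTER do not, so new columns are always appended at the end.
conn = sqlite3.connect(":memory:")
conn.execute("create table student(sno integer primary key, sname varchar(50) not null)")


def column_names(table):
    """Return a table's column names in order (hypothetical helper using PRAGMA table_info)."""
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk)
    return [info[1] for info in conn.execute(f"pragma table_info({table})")]


# Add a column
conn.execute("alter table student add column f_name varchar(20)")
# Rename the table
conn.execute("alter table student rename to stu_info")

print(column_names("stu_info"))  # ['sno', 'sname', 'f_name']
```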

3. Create and modify tables in navicat

3.1 navicat and mysql connection

After creating the connection, right-click it (or double-click it) to open it.

3.2 navicat create database

Like the connection, the new database also has to be opened before you can use it.

At this point, Navicat can manipulate your database.

3.3 navicat create table

This is a simple point-and-click operation.

3.4 navicat design table

Select the created table, right-click it, and choose Design Table; from there you can modify the table.

4. Data operations: adding, deleting, modifying, and querying data

4.1 Add data

INSERT INTO table_name(col1, col2, col3...) values (val1, val2, val3...);
-- Add a student record
INSERT INTO student(sname, sbirthday, saddress, sage, class_name) values ('周杰伦', '2020-01-02', '北京市昌平区', 18, '二班');

Note: if the primary key is set to auto-increment, you don't need to supply it. MySQL fills it in automatically, incrementing it one by one in natural order.
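A small sketch of auto-increment at work, using sqlite3 as a server-free stand-in for MySQL (an assumption; SQLite spells the keyword AUTOINCREMENT and uses ? where pymysql uses %s). The rows are hypothetical. `cursor.lastrowid` returns the generated key; pymysql cursors also provide a `lastrowid` attribute.

```python
import sqlite3

# SQLite stand-in for MySQL (assumption); AUTOINCREMENT plays the role of auto_increment
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table student(sno integer primary key autoincrement, "
    "sname varchar(50) not null, sage int, class_name varchar(50))"
)

new_ids = []
for name, age in [("周杰伦", 18), ("王力宏", 19), ("张三", 20)]:  # hypothetical rows
    # sno is not listed, so the database assigns the next number itself
    cur = conn.execute(
        "insert into student(sname, sage, class_name) values (?, ?, ?)",
        (name, age, "二班"),
    )
    new_ids.append(cur.lastrowid)  # primary key of the row just inserted
conn.commit()

print(new_ids)  # [1, 2, 3]
```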

4.2 Delete data

DELETE FROM table_name where_clause
-- Delete a student record (careful: without a where clause, every row is deleted)
DELETE FROM student where sno = 1;

4.3 Modify data

UPDATE table_name SET col1 = val1, col2 = val2... where_clause
-- Update a student record
UPDATE student SET sname = '王力宏' where sno = 1;
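The sketch below shows UPDATE and DELETE with a where clause and uses `cursor.rowcount` to check how many rows each statement touched. It runs on sqlite3 as a stand-in for MySQL (assumption) with hypothetical data; pymysql cursors expose `rowcount` as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("create table student(sno integer primary key, sname varchar(50))")
conn.executemany("insert into student(sno, sname) values (?, ?)",
                 [(1, "周杰伦"), (2, "张三")])  # hypothetical rows

# UPDATE: change the name of student 1
cur = conn.execute("update student set sname = ? where sno = ?", ("王力宏", 1))
updated = cur.rowcount  # number of rows changed

# DELETE: remove student 2
cur = conn.execute("delete from student where sno = ?", (2,))
deleted = cur.rowcount  # number of rows removed
conn.commit()

remaining = conn.execute("select sno, sname from student").fetchall()
print(updated, deleted, remaining)  # 1 1 [(1, '王力宏')]
```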

4.4 Query data

4.4.1 Basic query
SELECT *|col1, col2, col3
FROM table_name
where_clause
-- Query the whole table
SELECT * FROM student;

-- Query students' names and ages
SELECT sname, sage FROM student;

-- Query the student whose student number is 1
select * from student where sno = 1;

-- Query students older than 20
select * from student where sage > 20;

-- Query students aged from 20 to 40 (inclusive)
select * from student where sage >= 20 and sage <= 40;
select * from student where sage between 20 and 40;

-- Query students whose surname is 张 (Zhang)
--     _ matches exactly one character
--     % matches any number of characters (including none)
select * from student where sname like '张%';

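To check the between and like rules quickly, here is a sketch with sqlite3 as a stand-in for MySQL (assumption) and hypothetical names. It shows that between includes both ends, and that _ matches exactly one character while % matches any number of them. The `names` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("create table student(sname varchar(50), sage int)")
conn.executemany("insert into student values (?, ?)",
                 [("张三", 20), ("张伟明", 40), ("李四", 41), ("王五", 19)])  # hypothetical rows


def names(sql):
    """Run a query and return its first column as a sorted list (hypothetical helper)."""
    return sorted(row[0] for row in conn.execute(sql))


between = names("select sname from student where sage between 20 and 40")  # 20 and 40 both included
one_char = names("select sname from student where sname like '张_'")        # 张 + exactly 1 character
any_chars = names("select sname from student where sname like '张%'")       # 张 + 0 or more characters
print(between, one_char, any_chars)
```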
4.4.2 Grouping queries and aggregate functions

How to query the average age of students in each class?

First, let's add some more rows to the table.


To get the average age of each class, we first have to split the students up by class and then compute the average within each class. That is a grouping operation, which is done with the group by statement:

select * from table_name group by col_name

Note that the SQL above cannot be used as-is. After grouping, SQL has to be told exactly what to compute for each group; otherwise MySQL typically reports an error (since MySQL 5.7.5, the ONLY_FULL_GROUP_BY SQL mode is enabled by default).

Here we want the average age of each class after grouping, and computing an average requires an aggregate function. The five most commonly used aggregate functions in SQL are avg(), sum(), min(), max(), and count().

-- Average age of each class
select avg(sage), class_name from student group by class_name;

-- Minimum age in each class
select min(sage), class_name from student group by class_name;

-- Maximum age in each class
select max(sage), class_name from student group by class_name;

-- Number of students in each class
select count(*), class_name from student group by class_name;

-- Sum of ages in each class
select sum(sage), class_name from student group by class_name;
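A minimal sketch of the five aggregate functions over a grouped table, with sqlite3 standing in for MySQL (assumption) and hypothetical sample rows. Each output row describes one whole class, which is why only class_name and aggregates appear in the select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("create table student(sname varchar(50), sage int, class_name varchar(50))")
conn.executemany("insert into student values (?, ?, ?)", [
    ("A", 18, "class1"), ("B", 20, "class1"),
    ("C", 15, "class2"), ("D", 17, "class2"), ("E", 19, "class2"),
])  # hypothetical sample data

# One result row per class: the grouping column plus the five aggregates
rows = conn.execute("""
    select class_name, avg(sage), min(sage), max(sage), count(*), sum(sage)
    from student
    group by class_name
    order by class_name
""").fetchall()
for row in rows:
    print(row)
# ('class1', 19.0, 18, 20, 2, 38)
# ('class2', 17.0, 15, 19, 3, 51)
```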

Be careful not to put columns that are neither in the group by clause nor inside an aggregate function into the select list. Think about it: when you query the average age per class, each result row stands for a whole class, so putting one particular student's details into that row makes no sense.

4.4.3 The having clause

If we need to further filter the results computed by aggregate functions, we can use the having clause.

-- Classes whose average age is above 15
select avg(sage), class_name from student group by class_name having avg(sage) > 15;

Difference between having and where:

  1. where filters the original rows, before grouping.

  2. having filters the results after the aggregate functions have been computed.

4.4.4 Sorting

In SQL, use the order by statement to sort query results: asc sorts in ascending order (the default) and desc in descending order.

-- Query students by age, youngest first
select * from student order by sage asc;

-- Query students by age, oldest first
select * from student order by sage desc;
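A sketch of having and order by on hypothetical data, using sqlite3 in place of MySQL so it runs without a server (assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("create table student(sname varchar(50), sage int, class_name varchar(50))")
conn.executemany("insert into student values (?, ?, ?)", [
    ("A", 14, "class1"), ("B", 15, "class1"),
    ("C", 16, "class2"), ("D", 18, "class2"),
])  # hypothetical sample data

# having filters the groups after avg() is computed: class1 averages 14.5, class2 averages 17.0
having = conn.execute("""
    select class_name, avg(sage) from student
    group by class_name
    having avg(sage) > 15
""").fetchall()

# order by ... desc sorts from oldest to youngest
by_age_desc = [r[0] for r in conn.execute("select sname from student order by sage desc")]
print(having, by_age_desc)  # [('class2', 17.0)] ['D', 'C', 'B', 'A']
```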

4.5 Multi-table joint query

In practice, a single table is rarely enough to store our data. Take a student course-selection system as an example; we can design the following table structure:

  1. Student table: student number, name, gender, address, etc.
  2. Course table: course number, course name, teacher, etc.
  3. Student course-score table: score record number, student number, course number, score

In a table structure like this:

Advantages: each table has a clear structure with no ambiguity, and the data is kept intact without redundancy.
Disadvantages: it is not intuitive for beginners, who may not see why the tables are designed this way. The reasoning comes from the normal forms of database design, in this case the third normal form (for now it's enough to have heard of it).

In this design, the score table is the key piece: it states explicitly which student got what score in which course. It links two otherwise unrelated tables (students and courses) by establishing primary key/foreign key relationships.

What is a primary key/foreign key relationship?

The primary key of table A is placed in another table as an ordinary column, with the requirement that every value in that column must come from A. This is easy to understand: for example, every student number in the score table must come from the student table; otherwise the record is meaningless.

Note that the above structure is just to explain the multi-table relationship. It is not a complete table structure of the student course selection system.

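Create table statements:

-- Create the student, course, and score tables
-- 1. Student table: student number, name, gender, address, etc.
-- 2. Course table: course number, course name, teacher, etc.
-- 3. Student course-score table: score record number, student number, course number, score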
create table stu(
	sid int primary key auto_increment,
	sname varchar(50) not null, 
	gender int(1),
	address varchar(255)
);

create table course(
	cid int primary key auto_increment,
	cname varchar(50) not null, 
	teacher varchar(50)
);

create table sc(
	sc_id int primary key auto_increment,
	s_id int, 
	c_id int,
	score int,
	CONSTRAINT FK_SC_STU_S_ID FOREIGN key(s_id) REFERENCES stu(sid),
	CONSTRAINT FK_SC_COURSE_C_ID FOREIGN key(c_id) REFERENCES course(cid)
);
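To watch a foreign key reject bad data, here is a sketch that rebuilds a trimmed-down version of these tables in sqlite3, standing in for MySQL (assumption). Unlike MySQL's InnoDB engine, SQLite only enforces foreign keys after `pragma foreign_keys = on`. The `add_score` helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("pragma foreign_keys = on")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    create table stu(sid integer primary key autoincrement, sname varchar(50) not null);
    create table course(cid integer primary key autoincrement, cname varchar(50) not null);
    create table sc(
        sc_id integer primary key autoincrement,
        s_id int, c_id int, score int,
        constraint fk_sc_stu_s_id foreign key(s_id) references stu(sid),
        constraint fk_sc_course_c_id foreign key(c_id) references course(cid)
    );
    insert into stu(sname) values ('张三');
    insert into course(cname) values ('编程');
""")


def add_score(s_id, c_id, score):
    """Insert a score row; return False if a foreign key rejects it (hypothetical helper)."""
    try:
        conn.execute("insert into sc(s_id, c_id, score) values (?, ?, ?)", (s_id, c_id, score))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_score(1, 1, 90))   # True: student 1 and course 1 exist
print(add_score(99, 1, 80))  # False: there is no student 99
```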

4.5.1 Subqueries

A query can also be nested inside the where clause of another query.

For example, to query the students who have chosen the course "编程" (Programming):

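-- Query the students who chose the course '编程' (Programming)
-- Step 1: look up the course number of 编程
select cid from course where cname = '编程';
-- Step 2: use that cid to find the student IDs in the sc table (assuming step 1 returned 2)
select s_id from sc where c_id = 2;
-- Step 3: look up the students by those IDs (example IDs from step 2)
select * from stu where sid in (1,2,3,4,5,6);

-- Chain the statements above together
select * from stu where sid in (
    select s_id from sc where c_id in (
        select cid from course where cname = '编程'
    )
);

-- Names and scores of the students who scored below 60 in 编程
select stu.sname, sc.score from stu, sc where stu.sid = sc.s_id and sc.score < 60 and sc.c_id in (
    select cid from course where cname = '编程'
);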
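The nested queries above can be checked against a few hypothetical rows; this sketch uses sqlite3 as a stand-in for MySQL (assumption).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    create table stu(sid integer primary key, sname varchar(50));
    create table course(cid integer primary key, cname varchar(50));
    create table sc(sc_id integer primary key, s_id int, c_id int, score int);
    insert into stu values (1, '张三'), (2, '李四'), (3, '王五');
    insert into course values (1, '数学'), (2, '编程');
    insert into sc(s_id, c_id, score) values (1, 2, 55), (2, 2, 88), (3, 1, 40);
""")  # hypothetical sample data

# Innermost query finds the course id, the middle one the student ids, the outer one the students
chose_programming = conn.execute("""
    select sname from stu where sid in (
        select s_id from sc where c_id in (
            select cid from course where cname = '编程'
        )
    ) order by sid
""").fetchall()

# Students below 60 in 编程 (王五's 40 is in 数学, so it is excluded)
failed = conn.execute("""
    select stu.sname, sc.score from stu, sc
    where stu.sid = sc.s_id and sc.score < 60 and sc.c_id in (
        select cid from course where cname = '编程'
    )
""").fetchall()
print(chose_programming, failed)  # [('张三',), ('李四',)] [('张三', 55)]
```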

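4.5.2 Join queries

A join query combines multiple tables with join and then filters the combined rows by conditions.

Syntax:

select ... from A xxx join B on A.field1 = B.field2

Meaning: table A is joined with table B by matching field1 of A against field2 of B (xxx is the join type, such as inner or left). The condition after on is usually a primary key/foreign key relationship.
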
4.5.2.1 inner join
-- Number of students who chose each course
-- count(*)
-- group by cid
-- Courses nobody chose are missing, because inner join keeps only matching rows

select c.cid, c.cname, count(*) from sc inner join course c on sc.c_id = c.cid group by c.cid, c.cname;

4.5.2.2 left join
-- Course selections of all students; a student with no selections still appears, with cname NULL
select s.sname, c.cname from stu s left join sc on s.sid = sc.s_id left join course c on sc.c_id = c.cid;

-- Name, course name, and score for every score above 70, in any course
-- score > 70  -> sc
-- sname       -> stu
-- cname       -> course
select s.sname, c.cname, sc.score from stu s inner join sc on s.sid = sc.s_id inner join course c on sc.c_id = c.cid
where sc.score > 70;
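A sketch contrasting inner join and left join on hypothetical rows, with sqlite3 standing in for MySQL (assumption): the inner join drops the course nobody chose, while the left join keeps the student who chose nothing and fills the missing course with NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    create table stu(sid integer primary key, sname varchar(50));
    create table course(cid integer primary key, cname varchar(50));
    create table sc(sc_id integer primary key, s_id int, c_id int, score int);
    insert into stu values (1, '张三'), (2, '李四'), (3, '王五');
    insert into course values (1, '数学'), (2, '编程'), (3, '英语');
    insert into sc(s_id, c_id, score) values (1, 1, 75), (1, 2, 65), (2, 2, 90);
""")  # hypothetical data: 王五 chose nothing, nobody chose 英语

# inner join keeps only matching rows, so 英语 does not appear
per_course = conn.execute("""
    select c.cid, c.cname, count(*) from sc inner join course c on sc.c_id = c.cid
    group by c.cid, c.cname order by c.cid
""").fetchall()

# left join keeps every student; 王五 appears with cname = None
selections = conn.execute("""
    select s.sname, c.cname from stu s
    left join sc on s.sid = sc.s_id
    left join course c on sc.c_id = c.cid
    order by s.sid, c.cid
""").fetchall()

# Scores above 70 in any course
above70 = conn.execute("""
    select s.sname, c.cname, sc.score from stu s
    inner join sc on s.sid = sc.s_id
    inner join course c on sc.c_id = c.cid
    where sc.score > 70 order by sc.score
""").fetchall()
print(per_course)   # [(1, '数学', 1), (2, '编程', 2)]
print(selections)   # [('张三', '数学'), ('张三', '编程'), ('李四', '编程'), ('王五', None)]
print(above70)      # [('张三', '数学', 75), ('李四', '编程', 90)]
```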

3. Connecting to MySQL from Python

3.1 Query data

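import pymysql  # import the module
from pymysql.cursors import DictCursor  # cursor class that returns rows as dicts

# 1. Create the connection
conn = pymysql.connect(
    # If you forget the parameter names, Ctrl/Command-click connect() in your IDE to view them
    user='root',       # user name
    password='x',      # password
    host='127.0.0.1',  # host address
    port=3306,         # port (3306 is MySQL's default)
    database='test',   # database name
)

# 2. Create a cursor: it executes SQL statements and fetches their results
cursor = conn.cursor()
# 2.1 Execute an SQL statement
cursor.execute('select * from student')
r = cursor.fetchall()  # fetch the results
print(r)  # the result is a tuple of tuples: ((...), (...))
# What we usually prefer is a list of dicts: [{'sno': 1, 'sname': ...}, {...}, {...}]
# so we pass DictCursor when creating the cursor

cursor1 = conn.cursor(DictCursor)
# 2.1 Execute an SQL statement
cursor1.execute('select * from student')
r = cursor1.fetchall()  # fetch the results
print(r)  # each row is now a dict keyed by column name

# 3. Release resources when done
cursor.close()
cursor1.close()
conn.close()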
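What DictCursor does can be approximated by hand: read the column names from `cursor.description` and zip them with each row. The helper `fetch_dicts` below is a hypothetical sketch that only relies on the DB-API 2.0 interface (PEP 249), so it should also work with a plain pymysql cursor; the demo runs on sqlite3 so it needs no server. Note that sqlite3 uses ? placeholders where pymysql uses %s.

```python
import sqlite3


def fetch_dicts(cursor, sql, params=None):
    """Execute sql and return rows as a list of dicts keyed by column name (hypothetical helper)."""
    if params is None:
        cursor.execute(sql)
    else:
        cursor.execute(sql, params)
    # The first item of each description entry is the column name
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Demo with an in-memory SQLite database standing in for MySQL (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("create table student(sno integer primary key, sname varchar(50))")
conn.execute("insert into student values (1, 'wby')")
print(fetch_dicts(conn.cursor(), "select * from student"))  # [{'sno': 1, 'sname': 'wby'}]
```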


3.2 Insert data

import pymysql  # import the module

# 1. Create the connection
conn = pymysql.connect(
    # If you forget the parameter names, Ctrl/Command-click connect() in your IDE to view them
    user='root',       # user name
    password='x',      # password
    host='127.0.0.1',  # host address
    port=3306,         # port (3306 is MySQL's default)
    database='test',   # database name
)

# 2. Insert data
cursor = conn.cursor()
sname = 'wby'
sbirthday = '2010-08-10'
saddress = '浙江宁波'
class_name = '少年团'
# Prepare the SQL statement
# Caution: building SQL this way is 1. messy and 2. open to SQL injection; prefer the approach below
sql = f'insert into student(sname, sbirthday, saddress, class_name) values ("{sname}", "{sbirthday}", "{saddress}", "{class_name}")'
cursor.execute(sql)
# Changes must be committed after inserting data
conn.commit()

# %s is a placeholder for parameter substitution: write one %s per value  ->  recommended
sql = 'insert into student(sname, sbirthday, saddress, class_name) values (%s, %s, %s, %s)'
# Pass the values to execute() as a tuple; pymysql escapes and quotes each one
cursor.execute(sql, (sname, sbirthday, saddress, class_name))
conn.commit()

cursor.close()
conn.close()
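To make the injection risk concrete, here is a sketch with sqlite3 standing in for MySQL (assumption; it uses ? where pymysql uses %s). The input string is a hypothetical attack: pasted into the SQL text, it turns the where clause into a condition that is always true, while passed as a parameter it is compared as a plain value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("create table student(sname varchar(50), saddress varchar(255))")
conn.executemany("insert into student values (?, ?)",
                 [("wby", "浙江宁波"), ("zjl", "北京市昌平区")])  # hypothetical rows

user_input = "' or '1'='1"  # hypothetical malicious input

# Unsafe: the text becomes  where sname = '' or '1'='1'  which matches every row
unsafe_sql = f"select sname from student where sname = '{user_input}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is sent separately and treated purely as a value
safe_rows = conn.execute("select sname from student where sname = ?", (user_input,)).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```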

4. MySQL summary

The SQL operations crawlers use most often:

  1. Insert data
insert into table(field1, field2, field3...) values (value1, value2, value3...)
  2. Update data
update table set field = value, field = value where condition
  3. Delete data
delete from table where condition
  4. Query data
select * from table where condition
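Crawlers usually insert many scraped records at once. Below is a hedged sketch of batching them with `executemany`, using sqlite3 as a stand-in for MySQL (with pymysql you would call `cursor.executemany()` and write %s instead of ?). The `movie` table, its fields, and the `save_items` helper are hypothetical.

```python
import sqlite3


def save_items(conn, items):
    """Store scraped items (a list of dicts) in one batch and commit (hypothetical helper)."""
    sql = "insert into movie(title, score) values (?, ?)"  # pymysql would use %s instead of ?
    conn.executemany(sql, [(item["title"], item["score"]) for item in items])
    conn.commit()


conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("create table movie(id integer primary key autoincrement, title varchar(100), score double)")

scraped = [{"title": "A", "score": 9.1}, {"title": "B", "score": 8.5}]  # pretend crawler output
save_items(conn, scraped)
print(conn.execute("select title, score from movie order by id").fetchall())  # [('A', 9.1), ('B', 8.5)]
```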


Origin blog.csdn.net/m0_48936146/article/details/127473183