Data Analyst: SQL Basics

SQL language

 Whether you are a junior or an intermediate data analyst, SQL is a must-have skill at work, and you should be able to write SQL by hand. To avoid forgetting what I learn, I am using SQL as the first topic in a series of summaries of relevant knowledge.

Note: square brackets [] mark optional parts.

1. Data Definition Language (DDL)

 DDL covers creating, modifying, and deleting database objects (tables, views, indexes, etc.). The common commands are create, alter, and drop:

1.1 create

Function: used to create a database/table

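Syntax:

create database [if not exists] database_name;

create table [if not exists] table_name (
	`column_name` column_type [attributes] [index] [comment],
	...
	`column_name` column_type [attributes] [index] [comment]
	)[table type][character set][comment];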

Example:

-- Create the database school
create database if not exists school;
-- Create the table student
-- (the default values '' and 0 are placeholders)
create table if not exists student(
	id int(4) not null auto_increment comment 'student ID',
	name varchar(30) not null default '' comment 'student name',
	gender varchar(2) not null default '' comment 'gender',
	class int(4) not null default 0 comment 'class',
	primary key(id)
)engine=innodb default charset=utf8;

Data types

Data type: Description
integer(size), int(size), smallint(size), tinyint(size): Hold whole numbers only. In MySQL the number in parentheses is a display width; it does not limit the range of values that can be stored.
decimal(size,d), numeric(size,d): Hold numbers with a fractional part. "size" is the maximum total number of digits; "d" is the maximum number of digits to the right of the decimal point.
char(size): Holds a fixed-length character string (letters, numbers, and special characters). The length is given in parentheses.
varchar(size): Holds a variable-length character string (letters, numbers, and special characters). The maximum length is given in parentheses.
date: Holds a date (year, month, day), written as 'YYYY-MM-DD' in MySQL.
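
To make the data types above concrete, here is a minimal sketch, assuming MySQL. The table type_demo and its columns are hypothetical names chosen for illustration and do not appear in the original article.

```sql
-- Hypothetical demo table; all names are illustrative only.
drop table if exists type_demo;
create table type_demo (
	qty    int,            -- whole numbers
	price  decimal(6,2),   -- at most 6 digits in total, 2 of them after the decimal point
	code   char(4),        -- fixed length: shorter values are padded to 4 characters when stored
	title  varchar(30),    -- variable length: up to 30 characters
	born   date            -- calendar date, written as 'YYYY-MM-DD'
);

insert into type_demo (qty, price, code, title, born)
values (3, 1234.56, 'AB', 'notebook', '2021-02-21');

select qty, price, code, title, born from type_demo;
```

decimal suits exact values such as money, and varchar is the usual choice for text whose length varies from row to row.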

Constraints

Constraint: Description
not null: The column cannot store NULL values.
unique: Every row must have a distinct value in the column.
primary key: Unique identifier; a combination of NOT NULL and UNIQUE. It ensures that a column (or a combination of two or more columns) uniquely identifies each row, which makes it easier and faster to find a specific record in the table.
foreign key: Ensures referential integrity: values in this table must match existing values in another table. Example: foreign key (stu_id) references student (id)
check: Ensures that the values in the column meet the specified condition. Example: check(age>0)
default: Specifies the value used when no value is assigned to the column.
auto_increment: Generates an increasing value automatically. By default it starts at 1 and increases by 1; the starting value can also be set.
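
The constraints above can be combined in one table definition. The following is a sketch, assuming MySQL 8.0.16 or later (earlier versions parse check but do not enforce it) and default auto_increment settings. The tables demo_student and demo_score are hypothetical.

```sql
-- Hypothetical tables; all names are illustrative only.
drop table if exists demo_score;
drop table if exists demo_student;

create table demo_student (
	id   int not null auto_increment,
	name varchar(30) not null,
	primary key (id)
) engine=innodb;

create table demo_score (
	id      int not null auto_increment,
	stu_id  int not null,
	course  varchar(30) not null default 'math',
	score   int not null,
	primary key (id),
	unique (stu_id, course),                            -- one score per student per course
	foreign key (stu_id) references demo_student (id),  -- stu_id must exist in demo_student
	check (score >= 0)                                  -- enforced from MySQL 8.0.16 on
) engine=innodb;

insert into demo_student (name) values ('Zhang San');    -- gets id 1 with default settings
insert into demo_score (stu_id, score) values (1, 90);   -- course falls back to 'math'

-- Each of the following statements is expected to be rejected:
-- insert into demo_score (stu_id, score) values (1, 80);                  -- duplicate (1, 'math')
-- insert into demo_score (stu_id, course, score) values (99, 'art', 80);  -- no student with id 99
-- insert into demo_score (stu_id, course, score) values (1, 'art', -5);   -- violates the check
```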

1.2 alter

Function: Add, modify, or delete columns in an existing table, or rename the table
Syntax:

alter table table_name add/modify/change/rename/drop name;

-- rename: change the table name
alter table table_name rename [to] new_table_name;
alter table student rename to student2021;

-- add: add one or more columns
alter table table_name add column_name datatype;
alter table student add(
	age int(2) not null default 0 comment 'age',
	address varchar(30) not null default '' comment 'address'
);

-- modify: change a column's type and constraints
alter table table_name modify column_name datatype...;
alter table student modify name varchar(10) default 'unknown';

-- change: rename a column
alter table table_name change old_name new_name datatype...; -- the new column needs a complete definition
alter table student change name stu_name char(4);

-- drop: remove a column. (drop table removes the whole table structure together with its constraints and indexes.) Cannot be rolled back.
alter table table_name drop column column_name;
alter table student drop column age;

1.3 truncate

Function: Remove all rows from a table while keeping the table structure, its constraints, and its indexes. Like drop, it cannot be rolled back once executed.
Syntax:

truncate [table] table_name;
truncate student;

2. Data Manipulation Language (DML)

Used to manipulate records. The common commands insert, update, and delete add, modify, and delete records respectively.

2.1 insert

Function: Add records to the table
Syntax:

insert into table_name values (value1, value2, ...);
insert into table_name (column1, column2, ...) values (value1, value2, ...);
insert into student (id, name) values (10, 'Zhang San');
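
The two insert forms can be tried side by side. This is a minimal sketch, assuming MySQL; insert_demo is a hypothetical table, and the multi-row form in the last insert is an addition that the original text does not cover.

```sql
-- Hypothetical demo table; all names are illustrative only.
drop table if exists insert_demo;
create table insert_demo (
	id    int not null auto_increment,
	name  varchar(30) not null,
	class int not null default 1,
	primary key (id)
);

-- Form 1: a value for every column, in table order.
insert into insert_demo values (10, 'Zhang San', 2);

-- Form 2: named columns only; id is generated and class uses its default.
insert into insert_demo (name) values ('Li Si');

-- Several rows in one statement.
insert into insert_demo (name, class) values ('Wang Wu', 3), ('Zhao Liu', 3);

select id, name, class from insert_demo order by id;
```

Naming the columns is generally the safer habit, because the statement keeps working if columns are later added to the table.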

2.2 update

Function: modify the data in the table
Syntax:

update table_name set column_name = new_value where column_name2 = some_value;
update student set name = 'Xiao Li', age = 12 where id = 10;

2.3 delete

Function: delete rows from the table
Syntax:

delete from table_name where column_name = some_value;
delete from student where id = 10;

Note:
 delete removes the rows that match the where condition (or every row if where is omitted) and keeps the table structure. truncate always removes all rows and keeps the table structure; it is faster than delete, but cannot be rolled back.
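
The difference between delete and truncate can be checked with a transaction. The following is a sketch, assuming MySQL with the InnoDB engine; dml_demo is a hypothetical table.

```sql
-- Hypothetical demo table; all names are illustrative only.
drop table if exists dml_demo;
create table dml_demo (
	id   int not null auto_increment,
	name varchar(30) not null,
	primary key (id)
) engine=innodb;

insert into dml_demo (name) values ('a'), ('b'), ('c');

-- delete is DML: inside a transaction it can be undone.
start transaction;
delete from dml_demo where id = 1;
rollback;
select count(*) as cnt_after_rollback from dml_demo;   -- 3: the deleted row is back

-- truncate is DDL in MySQL: it commits implicitly and cannot be rolled back.
truncate table dml_demo;
select count(*) as cnt_after_truncate from dml_demo;   -- 0

-- truncate also resets the auto_increment counter.
insert into dml_demo (name) values ('d');
select id, name from dml_demo;
```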


3. Data Query Language (DQL)

Used to query the records and basic structure of database tables.

3.1 Syntax

select * from table_name where condition;
select id, name from student where id = 10;

3.2 Description of common clauses and functions

Clause or function: Description
from: Selects the table to read from.
where: Filters for records that meet the given conditions.
group by: Groups the result set by one or more columns; generally combined with aggregate functions.
having: Filters the groups after grouping; generally combined with aggregate functions, because where cannot contain aggregate functions.
order by: Sorts the result set by the given columns; ascending (asc) is the default, and desc sorts in descending order.
limit n[,m]: limit n returns the first n records; limit n, m skips the first n records and returns the next m.
sum(), avg(), count(), min(), max(): Common aggregate functions.
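
The clauses above usually appear together in a single query. Here is a minimal sketch, assuming MySQL; the table score_demo and its rows are made up for illustration.

```sql
-- Hypothetical demo table and data; all names are illustrative only.
drop table if exists score_demo;
create table score_demo (
	name   varchar(30) not null,
	course varchar(30) not null,
	score  int not null
);

insert into score_demo (name, course, score) values
	('Zhang San', 'math', 90), ('Zhang San', 'english', 70),
	('Li Si',     'math', 60), ('Li Si',     'english', 50),
	('Wang Wu',   'math', 85), ('Wang Wu',   'english', 95);

-- Average score per student, keeping students who average 60 or more,
-- best average first, at most 2 rows.
select name, count(*) as courses, avg(score) as avg_score
from score_demo
where score > 0              -- row filter, applied before grouping
group by name
having avg(score) >= 60      -- group filter, applied after grouping
order by avg_score desc
limit 2;
```

With this data the query returns Wang Wu (average 90) followed by Zhang San (average 80). Li Si (average 55) is removed by having, which is a filter that where could not express because it depends on an aggregate.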

3.3 Description of advanced functions

Function: Description
like: Searches for a specified pattern in a column; used in the where clause.
in: Allows multiple values to be specified in the where clause.
_ or % (wildcards): _ matches exactly one character; % matches any sequence of zero or more characters.
between A and B: Selects values in the range between the two values, which can be numbers, text, or dates. In MySQL both A and B are included.
inner join ... on: The intersection of the two tables: only the rows that satisfy the condition in both.
left join ... on: All rows of the left table; columns from the right table are null where there is no match.
right join ... on: All rows of the right table; columns from the left table are null where there is no match.
full join ... on: All rows of both tables, matched where possible and null elsewhere. MySQL has no full join; it can be emulated with a union of a left join and a right join.
union: Stacks the results of two or more select statements vertically and removes duplicate rows (union all keeps them); also used for column-to-row conversion (a difficult exam topic).
upper(): Converts the value of the field to uppercase.
lower(): Converts the value of the field to lowercase.
mid(): Extracts a substring. mid(column_name,start[,length]); positions count from 1 and the character at the start position is included.
len(): Returns the length of the field's value. len() is the SQL Server name; MySQL uses length() (bytes) or char_length() (characters).
round(): Rounds a numeric field to the specified number of decimal places. round(column_name,decimals)
now(): Returns the current system date and time.
format(): Formats how a field is displayed. format(column_name,format). The form format(Now(),'YYYY-MM-DD') does not format dates in MySQL, where format() applies to numbers; use date_format(now(),'%Y-%m-%d') instead.
case when then end: Conditional expression; also used for row-to-column conversion (a difficult exam topic). case column_name when value then result [else default_result] end
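
Several of the operators above are easier to follow on a small data set. The following is a sketch, assuming MySQL; join_student and join_score are hypothetical tables, and the score row with stu_id 4 deliberately has no matching student. Because MySQL has no full join, the example emulates one with union.

```sql
-- Hypothetical demo tables and data; all names are illustrative only.
drop table if exists join_score;
drop table if exists join_student;
create table join_student (id int primary key, name varchar(30) not null);
create table join_score (stu_id int not null, course varchar(30) not null, score int not null);

insert into join_student values (1, 'Zhang San'), (2, 'Li Si'), (3, 'Wang Wu');
insert into join_score values (1, 'math', 90), (1, 'english', 70), (2, 'math', 60), (4, 'math', 88);

-- like / in / between in the where clause
select name from join_student where name like 'Z%';       -- names starting with Z
select * from join_score where course in ('math', 'art');
select * from join_score where score between 60 and 90;   -- includes both 60 and 90

-- inner join: only students that have at least one score row
select s.name, c.course, c.score
from join_student s
inner join join_score c on c.stu_id = s.id;

-- left join: every student; score columns are null where nothing matches
select s.name, c.course, c.score
from join_student s
left join join_score c on c.stu_id = s.id;

-- full join emulated in MySQL: left join union right join
select s.id, s.name, c.stu_id, c.course, c.score
from join_student s left join join_score c on c.stu_id = s.id
union
select s.id, s.name, c.stu_id, c.course, c.score
from join_student s right join join_score c on c.stu_id = s.id;

-- case when for row-to-column conversion: one row per student, one column per course
select stu_id,
       max(case course when 'math'    then score end) as math,
       max(case course when 'english' then score end) as english
from join_score
group by stu_id;
```

Note that union removes duplicate rows, so this emulation would also collapse genuinely identical rows in the joined result.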

3.4 Window functions

Every function below is followed by an over clause of the same form:

function_name(...) over (
[ partition by grouping_column ]
order by sort_column desc/asc)

Category, function: Description
Numbering, row_number(): Numbers the rows in sorted order.
Numbering, rank(): Ranks the rows in sorted order; rows with the same score share the same rank.
Numbering, dense_rank(): Always returns consecutive rank values.
Offset, lag(column_name [, offset, default_value]): Returns the value of column_name from the row offset rows above the current row.
Offset, lead(column_name [, offset, default_value]): Returns the value of column_name from the row offset rows below the current row.
First and last, first_value(column_name): Returns the value of column_name from the first row of the ordered row set.
First and last, last_value(column_name): Returns the value of column_name from the last row of the ordered row set.
Distribution, percent_rank(): Returns the percentile rank of each row for a column or combination of columns. It starts at 0, and tied rows get the same value.
Distribution, cume_dist(): Computes the cumulative distribution value; tied rows get the same value. It is the number of rows whose value is less than or equal to the current row's value, divided by the total number of rows.
Other, nth_value(column_name, n): Returns the value from the n-th row of the result set.
Other, ntile(n): Splits the ordered rows into n groups.
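
The numbering, offset, and first/last functions can be compared in one query. This is a sketch, assuming MySQL 8.0 or later (earlier versions have no window functions); win_demo and its rows are made up for illustration. The explicit frame on last_value is an addition: with the default frame, last_value only looks as far as the current row and its ties.

```sql
-- Hypothetical demo table and data; all names are illustrative only.
drop table if exists win_demo;
create table win_demo (
	name  varchar(30) not null,
	class int not null,
	score int not null
);

insert into win_demo (name, class, score) values
	('A', 1, 90), ('B', 1, 90), ('C', 1, 80),
	('D', 2, 70), ('E', 2, 60);

select name, class, score,
       row_number() over w as rn,     -- class 1: 1, 2, 3 (tied rows still get different numbers)
       rank()       over w as rk,     -- class 1: 1, 1, 3 (ties share a rank, then a gap)
       dense_rank() over w as drk,    -- class 1: 1, 1, 2 (ties share a rank, no gap)
       lag(score, 1, 0)  over w as prev_score,   -- score one row above, 0 if there is none
       lead(score, 1, 0) over w as next_score,   -- score one row below, 0 if there is none
       first_value(score) over w as top_score,
       last_value(score) over (
           partition by class order by score desc
           rows between unbounded preceding and unbounded following
       ) as bottom_score
from win_demo
window w as (partition by class order by score desc)
order by class, score desc, name;
```

The named window w avoids repeating the same partition by and order by for every function.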

3.5 Execution order of a select statement:

from -> where -> group by -> having -> select -> order by -> limit
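
One practical consequence of this order is where a column alias can be used. The following is a minimal sketch, assuming MySQL; order_demo is a hypothetical table.

```sql
-- Hypothetical demo table and data; all names are illustrative only.
drop table if exists order_demo;
create table order_demo (name varchar(30) not null, score int not null);
insert into order_demo values ('A', 90), ('B', 55), ('C', 70);

-- Fails: where runs before select, so the alias does not exist yet.
-- select name, score + 5 as adjusted from order_demo where adjusted >= 60;

-- Works: repeat the expression in where; order by runs after select and can use the alias.
select name, score + 5 as adjusted
from order_demo
where score + 5 >= 60
order by adjusted desc
limit 2;
```

All three rows pass the where filter (95, 60, and 75), and after sorting and limit the query returns A (95) and C (75).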

4. Closing remarks

 This article summarizes the basics; the next one covers hands-on practice.

Origin blog.csdn.net/Keeomg/article/details/113922039