Database (1) Basic knowledge

Overview

**A database is a repository that organizes, stores, and manages data according to a data structure.**

Data model

The data model is the core and foundation of a database system. It is a strictly defined set of concepts and generally consists of three parts: data structure, data operations, and integrity constraints. The three main types of data model are the hierarchical model, the network model, and the relational model, of which the relational model is currently the most widely used. Relational databases such as Oracle and MySQL organize their data according to the relational model.

Relational model features
- The logical structure of data in the relational model is a two-dimensional table; in other words, the data structure of a relation is a table.
- The data operations of the relational model are mainly querying, inserting, deleting, and updating data.
- The integrity constraints of the relational model fall into three categories: entity integrity, referential integrity, and user-defined integrity.
- Entity integrity rule: If attribute A (a single attribute or a group of attributes) is a primary attribute of base relation R, then A cannot be null. It follows directly that the primary key cannot be null.
- Referential integrity rule: If an attribute (or attribute group) F is a foreign key of base relation R and corresponds to the primary key of base relation R1, then the value of F in every tuple of R is either null or equal to the primary key value of some tuple in R1.
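
The two rules above can be tried out with a small script. This is only a sketch using Python's built-in sqlite3 module; the Student and SC tables and their sample rows are assumptions made for illustration, and other database systems differ in detail (for example, SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables: Student.Sno is the primary key, SC.Sno is a foreign key to it.
conn.execute("CREATE TABLE Student (Sno TEXT PRIMARY KEY NOT NULL, Sname TEXT)")
conn.execute(
    "CREATE TABLE SC ("
    " Sno TEXT REFERENCES Student(Sno),"
    " Cno TEXT NOT NULL,"
    " Grade INTEGER)"
)
conn.execute("INSERT INTO Student VALUES ('S1', 'Alice')")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Entity integrity: a null primary key is rejected.
null_key_ok = try_insert("INSERT INTO Student VALUES (?, ?)", (None, "Bob"))
# Referential integrity: the foreign key must match an existing Student.Sno or be null.
valid_fk_ok = try_insert("INSERT INTO SC VALUES (?, ?, ?)", ("S1", "C1", 90))
null_fk_ok = try_insert("INSERT INTO SC VALUES (?, ?, ?)", (None, "C2", 80))
dangling_fk_ok = try_insert("INSERT INTO SC VALUES (?, ?, ?)", ("S9", "C1", 70))
```

Only the second and third inserts are accepted: one references an existing student and the other leaves the foreign key null.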

Database classification

**The difference between relational and non-relational databases**

Note:

OLTP (On-Line Transaction Processing) refers to online transaction processing
OLAP (On-Line Analytical Processing) refers to online analytical processing

Databases are mainly divided into relational databases and non-relational databases. Their respective characteristics are as follows:

Relational database (RDBMS)
- A database that organizes data using the relational model, that is, as two-dimensional tables. A relational database consists of two-dimensional tables and the relationships between them. In a relational database, relations are called tables, tuples are called rows, and attributes are called columns. The integrity rules of the relational model are constraints on relations, divided into entity integrity, referential integrity, and user-defined integrity.
- Common relational databases include Oracle, MySQL, and SQL Server.
Non-relational database:
- Today this mostly refers to NoSQL databases, such as key-value stores (Redis), document stores (MongoDB), column-family stores (HBase), and graph databases (Neo4j).

Transactions

A transaction is a user-defined sequence of database operations that are either all performed or not performed at all; it is an indivisible unit of work. Transactions have four properties: atomicity, consistency, isolation, and durability, known as the ACID properties for short.

Atomicity: A transaction is an indivisible whole. The operations within a transaction either all succeed or all fail.
Consistency: A transaction moves the database from one consistent state to another. For example, when A transfers money to B, a state in which A's balance has been debited but B has not been credited is inconsistent and must never be left behind.
Isolation: Concurrent transactions are isolated from each other and cannot interfere with one another.
Durability: Once a transaction is committed, its changes to the database are saved permanently and can no longer be rolled back.
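
The transfer example can be sketched with Python's built-in sqlite3 module to show atomicity in action. The account table, the balances, and the transfer helper are assumptions made for illustration. The helper deliberately credits B before debiting A, so that a failure in the second step has something to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money in one transaction; roll back both updates if either fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer("A", "B", 30)   # A can afford this, so both updates are committed
ok_large = transfer("A", "B", 500)  # the debit violates the CHECK, so the credit is undone too
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer the balances are the same as before it started, which is the "all or nothing" behaviour described above.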

Index

An index is a structure that sorts the values of one or more columns in a database. Indexes can be used to quickly access data in a database table.

Index advantages
- By creating a unique index, you can ensure the uniqueness of each row of data in the database table
- Can greatly speed up data retrieval (the main reason to create an index)
- When group by and order by clauses are used, the time spent on grouping and sorting in queries can also be reduced significantly.
- It can speed up joins between tables, which is especially useful when enforcing referential integrity.
Index disadvantages
- Indexes take up additional storage space
- Inserting, deleting, and updating data takes more time (because the index must be maintained as well)
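
A rough way to see both effects, faster lookup and enforced uniqueness, is to ask the database for its query plan before and after creating an index. The sketch below uses Python's built-in sqlite3 module; the stus table, the email column, and the index names are assumptions made for illustration, and the plan text is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stus (stu_id INTEGER PRIMARY KEY, stu_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO stus (stu_name, email) VALUES (?, ?)",
    [(f"name{i}", f"user{i}@example.com") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT * FROM stus WHERE stu_name = 'name500'"
plan_before = plan(query)  # without an index the whole table is scanned
conn.execute("CREATE INDEX idx_stus_name ON stus (stu_name)")
plan_after = plan(query)   # with the index the plan mentions idx_stus_name

# A unique index also rejects duplicate values in the indexed column.
conn.execute("CREATE UNIQUE INDEX idx_stus_email ON stus (email)")
try:
    conn.execute("INSERT INTO stus (stu_name, email) VALUES ('dup', 'user1@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```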

View

A view is a table derived from one or more base tables (or other views). Unlike a base table, it is a virtual table.
The database stores only the definition of the view, not the data it exposes. That data remains in the underlying base tables.
Therefore, when the data in a base table changes, the data returned by the view changes too.
A view is like a window through which you can see the data you are interested in, and the changes to it, in the database.
Once a view is defined, it can be queried and dropped just like a base table.
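
The behaviour described above can be sketched with Python's built-in sqlite3 module. The Student table, its rows, and the AdultStudent view are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Sno TEXT PRIMARY KEY, Sname TEXT, Sage INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("S1", "Alice", 19), ("S2", "Bob", 22)],
)

# Only the SELECT definition is stored; no rows are copied into the view.
conn.execute(
    "CREATE VIEW AdultStudent AS SELECT Sno, Sname FROM Student WHERE Sage >= 20"
)
before = conn.execute("SELECT Sname FROM AdultStudent ORDER BY Sno").fetchall()

# Changing the base table changes what the view returns.
conn.execute("UPDATE Student SET Sage = 20 WHERE Sno = 'S1'")
after = conn.execute("SELECT Sname FROM AdultStudent ORDER BY Sno").fetchall()

# Dropping the view removes its definition, not the data in the base table.
conn.execute("DROP VIEW AdultStudent")
remaining = conn.execute("SELECT count(*) FROM Student").fetchone()[0]
```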

Primary keys and foreign keys

Candidate key: an attribute or attribute group that uniquely identifies a tuple in a relation (two-dimensional table)
Primary key: If a table has multiple candidate keys, one of them is selected as the primary key.
Foreign key: If an attribute set in relational schema R is not the primary key of R but is the primary key of another relation R1, then that attribute set is a foreign key of R.
Primary attributes and non-primary attributes: Attributes that belong to any candidate key are called primary attributes. Attributes that are not included in any candidate key are called non-primary attributes.

SQL classification

SQL stands for Structured Query Language. SQL statements fall into four main categories:

Data definition (DDL): create, drop, alter
Data query (DQL): select
Data manipulation (DML): insert, update, delete
Data control (DCL): grant, revoke

DDL data definition language

create database <dbName>; ## Create database
show databases; ## Display the current database list
alter database <dbName> character set utf8; ## Modify the character set of the database
drop database <dbName>; ## Delete database

DQL data query language

DQL is used to extract records that meet specific conditions from the data table

Basic syntax
- select columnName1 [, columnName2, ...] from <tableName> [where conditions]; # After the select keyword, list the columns to display for the queried records.
Where clause: A where clause (condition) can be added to delete, update, and select statements to restrict the operation to the rows that satisfy the condition.
- delete from tableName where conditions;
- update tableName set ... where conditions;
- select ... from tableName where conditions;
LIKE clause: In the condition of a where clause, the like keyword performs fuzzy (pattern) matching.
- select * from tableName where columnName like 'pattern';
  - In the pattern:
  - % matches any number of characters, including none [%o% matches values that contain the letter o]
  - _ matches exactly one character [_o% matches values whose second letter is o]
- # Query the information of students whose names contain the letter o
- select * from stus where stu_name like '%o%';
- # Query the information of students whose names start with '张' (Zhang)
- select * from stus where stu_name like '张%';
- # Query the information of students whose names have o as the second letter
- select * from stus where stu_name like '_o%';
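
The three patterns above can be checked with Python's built-in sqlite3 module. This is a minimal sketch; the sample names and the names_like helper are assumptions made for illustration. Note that SQLite's like ignores case for ASCII letters by default, which other databases may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stus (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")
conn.executemany(
    "INSERT INTO stus (stu_name) VALUES (?)",
    [("Tom",), ("Alonso",), ("Alice",), ("张三",)],
)


def names_like(pattern):
    """Return the names matching a LIKE pattern, in insertion order."""
    rows = conn.execute(
        "SELECT stu_name FROM stus WHERE stu_name LIKE ? ORDER BY stu_id", (pattern,)
    )
    return [name for (name,) in rows]


contains_o = names_like("%o%")        # o anywhere in the name
second_is_o = names_like("_o%")       # exactly one character, then o
starts_with_zhang = names_like("张%")  # names beginning with 张
```
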
Order by: Sorts the records that meet the conditions in ascending or descending order of the values in the specified columns.
- order by column, column1, column2 asc|desc: Sort by column first; rows with equal values are then sorted by column1, and so on. Each column can be given its own asc or desc; ascending is the default.
- select * from tableName where conditions order by columnName asc|desc;
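
As a small sketch of multi-column sorting, the following uses Python's built-in sqlite3 module. The stus table, its stu_age and score columns, and the rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stus (stu_name TEXT, stu_age INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO stus VALUES (?, ?, ?)",
    [("Ann", 20, 85), ("Bob", 19, 90), ("Cid", 20, 95), ("Dee", 19, 70)],
)

# Youngest first; within the same age, highest score first.
ordered = [
    name
    for (name,) in conn.execute(
        "SELECT stu_name FROM stus ORDER BY stu_age ASC, score DESC"
    )
]
```

The two 19-year-olds come first (Bob before Dee because of the higher score), followed by the two 20-year-olds (Cid before Ann).
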
Aggregate functions: SQL provides functions that compute a single value from a column of the queried records. These are called aggregate functions.
- count() counts the records that meet the conditions; count(columnName) counts only the records whose value in that column is not null
- max() returns the maximum value of the specified column among the records that meet the conditions
- min() returns the minimum value of the specified column among the records that meet the conditions
- sum() returns the sum of the values of the specified column among the records that meet the conditions
- avg() returns the average value of the specified column among the records that meet the conditions
  - Total count: select count(*) as totalcount from table1;
  - Sum: select sum(field1) as sumvalue from table1;
  - Average: select avg(field1) as avgvalue from table1;
  - Maximum: select max(field1) as maxvalue from table1;
  - Minimum: select min(field1) as minvalue from table1;
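
The aggregate functions can be tried together in one query. The sketch below uses Python's built-in sqlite3 module; the contents of table1 are assumptions made for illustration, and one value is left null to show that count(field1), sum, and avg skip nulls while count(*) counts every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 INTEGER)")
conn.executemany(
    "INSERT INTO table1 (field1) VALUES (?)", [(10,), (20,), (None,), (30,)]
)

row = conn.execute(
    "SELECT count(*), count(field1), sum(field1), avg(field1), max(field1), min(field1) "
    "FROM table1"
).fetchone()
total_rows, non_null, sum_value, avg_value, max_value, min_value = row
```
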
Group by: Groups tuples by the value of one or more attributes; tuples with the same values form one group. After grouping, an aggregate function is usually applied to each group, so each group yields one function value. To filter the groups themselves and output only those that meet a given condition, specify the condition in a HAVING clause.
- #Group by age, count the number of people of each age, and output (age, number of people of this age)
- select Sage, count(*) from Student group by Sage;
- #Group by age, count the number of people of each age, select the group with a number greater than 1, and output (age, number of people of this age)
- select Sage, count(*) from Student group by Sage having count(*) > 1;
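
The two queries above can be run against a small sample to see the difference HAVING makes. This is a sketch using Python's built-in sqlite3 module; the Student rows are assumptions made for illustration, and an ORDER BY is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Sno TEXT PRIMARY KEY, Sage INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [("S1", 19), ("S2", 19), ("S3", 20), ("S4", 21), ("S5", 21), ("S6", 21)],
)

# One output row per age, with the number of students of that age.
per_age = conn.execute(
    "SELECT Sage, count(*) FROM Student GROUP BY Sage ORDER BY Sage"
).fetchall()

# HAVING filters whole groups after aggregation (WHERE filters rows before it).
shared_ages = conn.execute(
    "SELECT Sage, count(*) FROM Student GROUP BY Sage HAVING count(*) > 1 ORDER BY Sage"
).fetchall()
```
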
Join query: One query involves multiple tables
- Suppose there are 2 tables - Student table and SC table (course selection table):
  - Inner join (equi-join): With an inner join, students in Student who have not chosen any course have no matching tuples in SC, so the query result discards those students.
    - Query each student together with their chosen courses (students who have not chosen courses are not listed)
    - SELECT Student.*, SC.* FROM Student , SC WHERE Student.Sno=SC.Sno;
  - Outer join: To list every student in the Student table together with their course selections, including students who have not chosen any course (with the SC attributes filled with null values), use an outer join.
    - Query each student together with their chosen courses (students who have not chosen courses are also listed)
    - SELECT Student.*, SC.* FROM Student LEFT JOIN SC ON (Student.Sno=SC.Sno);
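
The difference between the two joins shows up as soon as one student has no rows in SC. The following sketch uses Python's built-in sqlite3 module; the sample students and course selections are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Sno TEXT PRIMARY KEY, Sname TEXT)")
conn.execute("CREATE TABLE SC (Sno TEXT, Cno TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?)",
    [("S1", "Alice"), ("S2", "Bob"), ("S3", "Carol")],  # S3 has chosen no course
)
conn.executemany(
    "INSERT INTO SC VALUES (?, ?)", [("S1", "C1"), ("S1", "C2"), ("S2", "C1")]
)

# Inner join: only students with at least one matching SC row appear.
inner = conn.execute(
    "SELECT Student.Sno, SC.Cno FROM Student INNER JOIN SC ON Student.Sno = SC.Sno "
    "ORDER BY Student.Sno, SC.Cno"
).fetchall()

# Left outer join: every student appears; missing SC values come back as NULL (None).
outer = conn.execute(
    "SELECT Student.Sno, SC.Cno FROM Student LEFT JOIN SC ON Student.Sno = SC.Sno "
    "ORDER BY Student.Sno, SC.Cno"
).fetchall()
```
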
Paging query: Retrieving all records at once can consume a lot of system resources, so paging statements are often used to fetch only as many records as are needed.
- The paging syntax differs between databases: MySQL uses limit, SQL Server uses top, and Oracle uses rownum.
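
As an illustration of limit-style paging, the sketch below uses Python's built-in sqlite3 module, whose LIMIT ... OFFSET clause is similar to MySQL's limit. The stus table and the fetch_page helper are assumptions made for illustration. An ORDER BY is included because without a fixed order the contents of a page are not guaranteed to be stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stus (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")
conn.executemany(
    "INSERT INTO stus (stu_id, stu_name) VALUES (?, ?)",
    [(i, f"student{i}") for i in range(1, 11)],
)


def fetch_page(page_number, page_size):
    """Return the ids on a 1-based page, ordered so that pages are stable."""
    offset = (page_number - 1) * page_size
    rows = conn.execute(
        "SELECT stu_id FROM stus ORDER BY stu_id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return [stu_id for (stu_id,) in rows]


first_page = fetch_page(1, 3)    # first three of the ten rows
last_page = fetch_page(4, 3)     # only one row is left for the fourth page
past_the_end = fetch_page(5, 3)  # no rows beyond the data
```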

DML data manipulation language

DML is used to complete the insertion, deletion and modification operations of data in the data table

insert into <tableName> (columnName1, columnName2, ...) values (value1, value2, ...); # Insert data
delete from <tableName> [where conditions]; # Delete data
update <tableName> set columnName=value [where conditions]; # Modify data
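
The three statements can be exercised end to end with Python's built-in sqlite3 module. This is a minimal sketch; the stus table, the stu_age column, and the sample rows are assumptions made for illustration. Values are passed as parameters (the ? placeholders) rather than pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stus (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER)")

# insert: add two rows
conn.execute("INSERT INTO stus (stu_name, stu_age) VALUES (?, ?)", ("Tom", 18))
conn.execute("INSERT INTO stus (stu_name, stu_age) VALUES (?, ?)", ("Ann", 35))

# update: the where clause limits the change to matching rows
updated = conn.execute(
    "UPDATE stus SET stu_age = stu_age + 1 WHERE stu_name = ?", ("Tom",)
).rowcount

# delete: without a where clause every row in the table would be removed
deleted = conn.execute("DELETE FROM stus WHERE stu_age > ?", (30,)).rowcount

conn.commit()
remaining = conn.execute("SELECT stu_name, stu_age FROM stus").fetchall()
```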

DCL data control language

DCL is used to create users and manage user permissions

create user 'user_name'@'host_name' [IDENTIFIED BY [PASSWORD] 'password']; # Create a user
select * from mysql.user; # View users (in MySQL, accounts are stored in the mysql.user table)
drop user 'user_name'@'host_name'; # Delete a user
grant <privilege1>, <privilege2> on <dbName>.<tableName> to 'user_name'@'host_name'; # Grant privileges
revoke <privilege1>, <privilege2> on <dbName>.<tableName> from 'user_name'@'host_name'; # Revoke privileges