Database Principles

What is a database?

A database is a collection of data stored in an organized manner. In other words, it is a container (a file or group of files) that holds organized data.

Why do we need a database?

There is no doubt that databases are used to store data. Most of us are familiar with Excel, which is also used to store data. So if software as easy to use as Excel already exists, why do we need a database?

  • The amount of data Excel can store is too small. As networked applications have grown, the volume of data we need to keep is far beyond what a spreadsheet can hold.
  • Excel data cannot be shared by multiple people. An Excel workbook is a single file that, in practice, only the current user can open and modify.
  • Data security. Excel data can be modified arbitrarily.

A database solves these problems. It manages its data files through a dedicated mechanism, which typically makes reading and writing data much faster than working with ordinary files through the operating system.

The composition of the database system

The database system consists of three parts:

  • Database (DB)
    • A warehouse for storing data, which is kept in a certain format (an organized way)
  • Database Management System (DBMS)
    • System software for creating, managing, and maintaining databases
  • Database Application System (DBAS)
    • Application software built on database technology

 


Data Description and Data Model

Understand data description

Describing a thing in real life is simple. When we see "a tree", we say "a tree".

But how do we describe "a tree" in a computer? The computer only recognizes 0 and 1, so "a tree" cannot be stored on it directly.

Therefore, we abstract "a tree" into a conceptual model of the information world. The conceptual model is then formalized into a data model supported by the DBMS and stored in the computer.

To put it simply: data description means abstracting physical objects in the real world into a conceptual model, converting that conceptual model into a form supported by the DBMS, and then storing it in the computer.

Understand the data model

The data model is mainly used to describe data. As mentioned above, to store data about real things on a computer, we first abstract them into a conceptual model and then convert the conceptual model into a data model supported by the DBMS. Only then can the data be stored on the computer.

The data model generally consists of three parts:

  • Data structure (the objects and the relationships between them)
  • Data manipulation (insert, delete, update, query)
  • Integrity constraints (rules that restrict the data, for example: age cannot be negative)
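The three parts can be seen together in a small sketch. It uses Python's built-in sqlite3 module rather than any particular production DBMS, and the `person` table and its columns are made up for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data structure: a table whose columns have declared types.
conn.execute(
    """
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER CHECK (age >= 0)  -- integrity constraint
    )
    """
)

# Data manipulation: insert, update, query.
conn.execute("INSERT INTO person (id, name, age) VALUES (1, 'Alice', 30)")
conn.execute("UPDATE person SET age = 31 WHERE id = 1")
rows = conn.execute("SELECT name, age FROM person").fetchall()

# The integrity constraint makes the DBMS reject a negative age.
try:
    conn.execute("INSERT INTO person (id, name, age) VALUES (2, 'Bob', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Data manipulation: delete.
conn.execute("DELETE FROM person WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

The point of the sketch is that the rule "age cannot be negative" lives in the schema, so every program that writes to the table is held to it.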

Data models have also gone through several stages of development:

①: The hierarchical model is a data model that organizes data in a tree (hierarchical) structure.

 


Advantages:

  • Clear structure that is easy to understand
  • Connections between nodes can be implemented with pointers, so queries are efficient

Disadvantages:

  • Non-hierarchical data is cumbersome and unintuitive to represent

②: The network model is a data model that organizes data in a directed graph structure.

 


Advantages:

  • Very flexible, and can describe real-world things more directly

Disadvantages:

  • Complex structure that is very difficult to maintain

③: The relational model is a data model that uses two-dimensional tables to represent data and the relationships between data.

 

 

The relational model is the data model used most widely today.

Advantages:

  • The data structure is simple and clear. Entities and the relationships between entities are both represented by two-dimensional tables.
  • It has a rigorous mathematical foundation: the relational operations (discussed later).

Disadvantages:

  • Query efficiency is often lower than with the non-relational models, especially when a query spans multiple tables.

Terminology (Basic Concepts)

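Let's explain the terms using the course relation table:

course number   course name           hours
C401001         Data Structures       70
C401002         Operating Systems     80
C402001         Computer Principles   60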

Entity

Things that exist objectively and can be distinguished from each other are called entities. An entity can be loosely compared to a Java class.

Example: a course, as recorded in the (course relation table), is an entity.

Attribute

A property that an entity has is called an attribute. It can be compared to a member variable of a Java class. In a database, attributes are also called fields (or columns).

Example: (course name), (course number), (hours) are attribute names.

Tuple

Every row other than the row of attribute names is called a tuple.

Each of the following rows of data is a tuple: (C401001, Data Structures, 70), (C401002, Operating Systems, 80), (C402001, Computer Principles, 60)

Key (Code)

A code is also called a key. It uniquely identifies an entity.

Candidate key and primary key:

  • Candidate key: if a set of attributes uniquely identifies the tuples in a relation and contains no redundant attributes, that attribute set is called a candidate key of the relation. (There may be more than one candidate key.)
  • Primary key: the candidate key chosen by the user is called the primary key

Example: Mailing address (city name, street name, zip code, unit name, recipient)

It has two candidate keys: {city name, street name} and {street name, zip code}

If I choose {city name, street name} to uniquely identify the entity, then {city name, street name} is the primary key
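The two conditions in the definition (uniqueness, and no redundant attribute) can be checked mechanically. The helper names `is_superkey` and `is_candidate_key` below are hypothetical, and the check only tells us what holds for the sample rows; a real key is a rule about every row the relation may ever contain.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """A candidate key is unique and has no redundant attribute."""
    if not is_superkey(rows, attrs):
        return False
    # If any smaller non-empty subset is already unique, attrs is redundant.
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


# Sample rows taken from the course relation used in this article.
courses = [
    {"course_no": "C401001", "name": "Data Structures", "hours": 70},
    {"course_no": "C401002", "name": "Operating Systems", "hours": 80},
    {"course_no": "C402001", "name": "Computer Principles", "hours": 60},
]

key_alone = is_candidate_key(courses, ("course_no",))
key_with_extra = is_candidate_key(courses, ("course_no", "name"))
```

Here `("course_no", "name")` still identifies every row, but it is not a candidate key because `name` is redundant once `course_no` is present.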

Relation schema

The combination of a relation name and its set of attributes is called a relation schema.

Example of a relation schema: course relation table (course number, course name, hours)

Tip: A relational model is a collection of relation schemas

Domain

The relational model requires that each component of a tuple be atomic, that is, it must belong to an elementary type such as Integer or String, and cannot be a list, a set, a record, or an array.

The domain is the type of each component in the tuple. For the course relation table, the domains are: course number: string, course name: string, hours: int

Database Architecture Internal Structure

The internal structure of the database system can be divided into three levels:

  • External schema
  • Logical schema
  • Internal schema

[Figure: location of the three-level schema]

[Figure: the role of the three-level schema]

Logical schema

A logical schema is a description of the overall logical structure of all data in a database.

Example: suppose I have a database that manages the relationships among permissions, roles, and users.

So we have the following relation schemas:

  • Permission relation (permission number, permission name, permission description)
  • Role relation (role number, role name, role description)
  • User relation (user ID, user name, user password)

The collection of all relation schemas in the database constitutes the logical schema!

External schema

An external schema is a description of the local logical structure of the data that database users can see and use. It is the logical representation of the data related to one application.

There can be multiple external schemas. The external schema is the interface between the user and the DBAS, and it describes only a local part of the logical structure.

For example, when the user application only needs the user name and password:

  • User relation (user name, user password)

A description of such a local logical structure is an external schema.
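In SQL, an external schema is usually realized as a view. The following is a minimal sketch using Python's built-in sqlite3 module; the table name `users`, the view name `user_login`, and the column names are assumptions chosen to mirror the user relation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical schema: the full user relation.
conn.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY, user_name TEXT, user_password TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'alice', 'secret')")

# External schema: only the part this application is allowed to see.
conn.execute(
    "CREATE VIEW user_login AS SELECT user_name, user_password FROM users"
)

cursor = conn.execute("SELECT * FROM user_login")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

The application queries `user_login` and never learns that `user_id` exists.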

Internal schema

An internal schema is a description of the physical storage structure of the database. It defines the internal record types, record addressing techniques, indexes and file organization, and data control.

 


Two-level mapping of the DB internal architecture

The two levels of mapping are:

  • The mapping between the external schema and the logical schema
  • The mapping between the logical schema and the internal schema

 


Why introduce the concept of two-level mapping? Why do we need two levels of mapping?

  • When the logical schema of the database is modified for some reason, the external schema can stay unchanged as long as the attributes used in the external schema definition, and the relation schemas they belong to, do not change. The application program therefore does not need to be modified.
  • When the internal schema of the database needs to be modified for some reason, the logical schema can be kept unchanged as far as possible by modifying the mapping between the logical schema and the internal schema, so the internal schema can change with little or no modification to the application program.

That is to say: when the internal structure changes, what the outside sees does not need to change as long as it is not touched. The two-level mapping is, in effect, a solution to the problem of coupling in the program.
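Both kinds of independence can be tried out on a small scale. This sketch again uses Python's built-in sqlite3 module; the names `users`, `user_login`, `idx_users_name`, and `application_query` are hypothetical, and adding a column or an index merely stands in for "a change to the logical schema" and "a change to the internal schema".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY, user_name TEXT, user_password TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'alice', 'secret')")
conn.execute(
    "CREATE VIEW user_login AS SELECT user_name, user_password FROM users"
)


def application_query(connection):
    """The application only ever talks to the external schema (the view)."""
    return connection.execute(
        "SELECT user_name, user_password FROM user_login"
    ).fetchall()


before = application_query(conn)

# Logical schema change: a new attribute the view does not use.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
after_logical_change = application_query(conn)

# Internal schema change: a new index alters storage, not the logical view.
conn.execute("CREATE INDEX idx_users_name ON users (user_name)")
after_internal_change = application_query(conn)
```

The function `application_query` is never edited, yet it keeps returning the same rows after each change.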

Why should we learn database relational operations?

Learning and understanding the mechanism of relational operations is of great significance for understanding the data query mechanism in relational databases.

We may know that redundant data must be eliminated when querying multiple tables, but where does that redundant data come from? How does the WHERE clause filter the data? Relational operations answer these questions.

Learning the relational operations of the database shows us how an SQL statement is executed and by what means we obtain the desired result.

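Study outline

  • Cartesian product
  • Relational operations based on traditional set theory: union, intersection, difference
  • Relational operations specific to relational algebra: projection, selection, division, join, natural join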

Cartesian Product

What is a Cartesian product?

A Cartesian product is simply the result of multiplying two sets: every element of the first set is paired with every element of the second.

Why does a Cartesian product appear when querying a database?

As mentioned in the previous blog post, a relational model is a collection of relational schemas.

Two tables in the database are equivalent to two sets. When a SELECT statement queries both tables, the DBMS conceptually obtains the result by multiplying the sets.

How the Cartesian product is produced

We can see that the cardinality of the Cartesian product is the product of the numbers of tuples in each set.
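The cardinality rule is easy to confirm with Python's standard `itertools.product`. The two small sets here are invented purely for illustration.

```python
from itertools import product

set_a = ["a1", "a2", "a3"]
set_b = ["b1", "b2"]

# Pair every element of set_a with every element of set_b.
pairs = list(product(set_a, set_b))

cardinality = len(pairs)
expected = len(set_a) * len(set_b)
```

With 3 and 2 elements we get 6 pairs; by the same rule, a table of 14 rows multiplied by a table of 4 rows gives 56 rows.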


The resulting data rarely matches the real-world situation.

To make the effect easier to see, I will run actual SQL statements and then explain the problem.

The emp table has 14 records, and the dept table has 4 records.

Take SMITH as an example: in the emp table, he belongs only to department 20.

But after the two tables are queried together, he appears in departments 10, 20, 30, and 40! Looking through all 56 rows, we find that every employee appears with all 4 departments. Such data is unreasonable.

Going back to the original intention, what is the purpose of querying the two tables? While querying employee information, we also want to know the name of the employee's department. Therefore, the query should not return as many as 56 records. It should return as many records as the employee table has, which is only 14.

Let's analyze further: the emp table has a deptno field, and so does the dept table. Moreover, the values that deptno can take in the emp table are determined by the deptno field of the dept table.

So we can use an equijoin (emp.deptno = dept.deptno) to eliminate the Cartesian product, and we're done!
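The effect can be reproduced on a cut-down version of the two tables. This sketch uses Python's built-in sqlite3 module instead of Oracle, keeps only three employees and two columns per table, and is meant as an illustration rather than a copy of the full emp and dept data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER)")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("SMITH", 20), ("ALLEN", 30), ("CLARK", 10)],
)
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")],
)

# No join condition: every employee is paired with every department.
cross = conn.execute("SELECT ename, dname FROM emp, dept").fetchall()

# Equijoin: keep only the pairs whose department numbers match.
joined = conn.execute(
    "SELECT ename, dname FROM emp, dept "
    "WHERE emp.deptno = dept.deptno ORDER BY ename"
).fetchall()
```

The unconditioned query returns 3 × 4 = 12 rows, while the equijoin returns one row per employee.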

 


Relational Operations Based on Traditional Set Theory

Oracle provides 4 keywords for set operations:

  • UNION (union; duplicate tuples are removed)
  • UNION ALL (union; duplicate tuples are kept)
  • MINUS (difference)
  • INTERSECT (intersection)

Union

Returns all tuples from both query results, eliminating duplicates

Query the information of all clerks and managers


	SELECT *
	FROM emp
	WHERE job = 'MANAGER'
	
	UNION

	SELECT *
	FROM emp
	WHERE job = 'CLERK';

 


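Note: UNION can sometimes perform better than the keyword OR, for example when each branch can use a different index, but this is not guaranteed. UNION also has to remove duplicates, so it is worth checking the execution plan.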
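To see the difference between UNION and UNION ALL, the two branches need to overlap. The sketch below uses Python's built-in sqlite3 module and three sample employees; the second branch (salary above 2000) is an invented condition chosen so that both managers are returned twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("SMITH", "CLERK", 800), ("JONES", "MANAGER", 2975), ("BLAKE", "MANAGER", 2850)],
)

branches = (
    "SELECT ename FROM emp WHERE job = 'MANAGER' "
    "{operator} "
    "SELECT ename FROM emp WHERE sal > 2000"
)

# UNION removes the duplicates produced by the overlapping branches.
union_rows = conn.execute(branches.format(operator="UNION")).fetchall()

# UNION ALL simply concatenates the two results.
union_all_rows = conn.execute(branches.format(operator="UNION ALL")).fetchall()
```

JONES and BLAKE satisfy both conditions, so UNION ALL lists each of them twice and UNION lists each once.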

Intersection

Returns the tuples that the two query results have in common

Query the information of department 10


SELECT *
FROM dept

INTERSECT 
SELECT *
FROM dept
WHERE deptno = 10;


(The only rows that "all departments" and "department 10" have in common are those of department 10, so department 10 is what is returned in the end.)

Difference

Returns the tuples that appear in the first query result but not in the second

Query the information of all departments except department 10


SELECT *
FROM dept

MINUS
SELECT *
FROM dept
WHERE deptno = 10;
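The intersection and difference queries can be tried without an Oracle installation. This sketch uses Python's built-in sqlite3 module; note that SQLite spells the difference operator EXCEPT, whereas Oracle uses MINUS. Only the deptno column is kept to make the results easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")],
)

# Intersection: rows present in both query results.
intersection = conn.execute(
    "SELECT deptno FROM dept "
    "INTERSECT "
    "SELECT deptno FROM dept WHERE deptno = 10"
).fetchall()

# Difference: rows of the first result that are absent from the second.
# SQLite writes EXCEPT where Oracle writes MINUS.
difference = conn.execute(
    "SELECT deptno FROM dept "
    "EXCEPT "
    "SELECT deptno FROM dept WHERE deptno = 10"
).fetchall()
```

The intersection contains only department 10, and the difference contains every department except 10.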


 


Relational operations specific to relational algebra

Projection

The operation process of projection:

First, take the k columns numbered j1, j2, ..., jk (or named Aj1, Aj2, ..., Ajk) from the relation R, in that order. Then remove duplicate tuples from the result, forming a k-ary relation whose attributes are ordered Aj1, Aj2, ..., Ajk.

Simply put: take certain columns out of a query result and eliminate duplicate data. That is projection.

  • Projection operates from the perspective of columns
  • The subscript of the projection can be a column number or a column attribute name

Query the department numbers of all departments




SELECT deptno
FROM dept;

Query process: first obtain all rows of the dept table, then use the projection operation to extract only the "deptno" column. If SELECT is followed by "*", all columns are projected.
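Projection as defined in relational algebra can be sketched in a few lines of plain Python. The function name `project` and the sample rows are illustrative. One difference worth knowing: relational-algebra projection always removes duplicates, while an SQL SELECT keeps them unless DISTINCT is written.

```python
def project(relation, attrs):
    """Keep only attrs from each tuple; using a set removes duplicates."""
    return {tuple(row[a] for a in attrs) for row in relation}


emp = [
    {"ename": "SMITH", "deptno": 20},
    {"ename": "JONES", "deptno": 20},
    {"ename": "ALLEN", "deptno": 30},
]

deptnos = project(emp, ["deptno"])
names_and_depts = project(emp, ["ename", "deptno"])
```

Projecting onto `deptno` leaves two tuples, because SMITH and JONES share department 20.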

 


Selection

Use comparison operators and logical operators to pick out the tuples that satisfy a condition; those tuples form the result.

Query the names of employees whose salary is greater than 2000



SELECT ename
FROM emp
WHERE sal > 2000;

Process: first obtain all rows of the emp table, use the selection operation to keep the rows with a salary greater than 2000, and finally use the projection operation to get the names of those employees.
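The selection-then-projection process described above can be sketched in plain Python. The function names `select` and `project` and the three sample employees are illustrative choices, not part of any library.

```python
def select(relation, predicate):
    """Selection: keep the tuples for which predicate is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Projection: keep only attrs, removing duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in relation}


emp = [
    {"ename": "SMITH", "sal": 800},
    {"ename": "JONES", "sal": 2975},
    {"ename": "BLAKE", "sal": 2850},
]

# Step 1: selection works on rows (WHERE sal > 2000).
well_paid = select(emp, lambda row: row["sal"] > 2000)

# Step 2: projection works on columns (SELECT ename).
names = sorted(project(well_paid, ["ename"]))
```

Selection narrows the rows first, and projection then narrows the columns.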

 


Division

I haven't figured out a practical application of the division operation. If anyone knows where division can be used in a database, please let me know.

Let's still walk through the process: relation R has attributes ABCD and relation S has attributes CD. First project R onto AB (because S has CD), then take the Cartesian product of the projected AB with S. An AB value belongs to the result only if every tuple it produces in that Cartesian product also appears in R.
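The process can be written out as a short Python function. The name `divide` and the sample relations are made up for illustration. Division is the operation usually associated with "for all" questions, such as finding the AB values that are paired with every CD value in S.

```python
def divide(r, s, r_attrs, s_attrs):
    """Return R divided by S, over the attributes of R that are not in S."""
    position = {attr: i for i, attr in enumerate(r_attrs)}
    quotient_attrs = [a for a in r_attrs if a not in s_attrs]

    def pick(row, attrs):
        return tuple(row[position[a]] for a in attrs)

    # Every (AB value, CD value) combination that actually occurs in R.
    present = {(pick(row, quotient_attrs), pick(row, s_attrs)) for row in r}
    candidates = {ab for ab, _ in present}
    # Keep an AB value only if it is paired with every tuple of S.
    return {ab for ab in candidates if all((ab, tuple(cd)) in present for cd in s)}


R = {
    ("a", "b", "c", "d"),
    ("a", "b", "e", "f"),
    ("b", "c", "e", "f"),
    ("e", "d", "c", "d"),
    ("e", "d", "e", "f"),
    ("a", "b", "d", "e"),
}
S = {("c", "d"), ("e", "f")}

quotient = divide(R, S, ["A", "B", "C", "D"], ["C", "D"])
```

In this sample, (a, b) and (e, d) each appear with both (c, d) and (e, f), so they form the quotient; (b, c) appears only with (e, f) and is dropped.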

 


Join

The join operation builds on the Cartesian product by adding a condition (a column is greater than, less than, or equal to another column) and keeping only the combinations that satisfy it.

Natural join

Natural join is a special join operation whose condition is that the columns with the same name in both tables are equal; strictly speaking, it also keeps only one copy of those columns. We use natural joins a lot. Eliminating the Cartesian product is really a natural join.


SELECT *
FROM emp, dept
WHERE dept.deptno = emp.deptno;

Requiring the deptno column of the dept table to equal the deptno column of emp is the natural-join condition.
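The one visible difference between the equijoin written with WHERE and a natural join is the number of columns. The sketch below uses Python's built-in sqlite3 module, which supports the NATURAL JOIN keyword, with a cut-down emp and dept; it is an illustration on sample data, not the full tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER)")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("SMITH", 20), ("ALLEN", 30)])
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")],
)

# Equijoin: deptno appears twice, once from each table.
equi = conn.execute(
    "SELECT * FROM emp, dept WHERE dept.deptno = emp.deptno"
)
equi_width = len(equi.description)
equi_rows = equi.fetchall()

# Natural join: same rows, but the shared column appears only once.
natural = conn.execute("SELECT * FROM emp NATURAL JOIN dept")
natural_width = len(natural.description)
natural_rows = natural.fetchall()
```

Both queries match the same employees to the same departments; only the duplicated deptno column differs.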

If there are any mistakes in the article, please correct me, and we can communicate with each other. Students who are accustomed to reading technical articles on WeChat and want to get more Java resources can follow WeChat public account: Java3y

 

 
