Database Kernel Explained (1): Database System Overview

Database system overview (kernel)

This chapter is divided into the following parts: basic concepts of database systems, implementation problems, design problems, and access problems.

Some basic concepts of database systems

The following questions introduce the definitions of some basic database concepts:

  1. What is data?
    According to Baidu Encyclopedia's definition:
    Data are numerical values obtained through observation, experiment, or calculation. There are many kinds of data; the simplest are numbers, but data can also be text, images, sound, and so on. Data can be used in scientific research, design, verification, mathematics, and so on.
    In research, however, data is defined more broadly: data are the identifiable symbols that people record to describe the objective world. These symbols include text and numbers as well as multimedia data.

  2. What are the characteristics of data?
    Data has semantics. For example, the value 90 means different things in different contexts: a test score of 90 points, a car speed of 90 km/h, and so on.

  3. What is a database?
    A database (Database), usually abbreviated DB by programmers, has the following characteristics:
    1. Data is stored in the computer long-term and organized according to a certain data model, which, in everyday terms, is a table structure.
    2. It can be shared by users, that is, people can perform CRUD operations: Create, Retrieve, Update, and Delete.
    3. It provides services to applications; a single database may serve more than one system.
    4. The data items are closely related to one another.
    Of course, the operating system's file system can also store data; why we use a database system instead of a file system will not be elaborated here.

  4. What is a database management system?
    A database management system (DBMS) is the software we use to create and maintain databases, such as Oracle 11g and MySQL. Some people confuse the two: isn't that just the database? In fact, a database is a warehouse that stores data, while a DBMS is the software through which programmers conveniently manipulate that data.
    The DBMS does not access the stored database directly: the operating system sits between the database system and the database files.

  5. What is a database schema?
    In university database courses, teachers explain that a database has three levels of schema: the internal schema, the external schema, and the schema (logical schema). Students who are new to database concepts often find these hard to understand, but since this series explains the kernel, it will not elaborate on how databases are used or designed.
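The CRUD operations listed under question 3 can be tried out in a few lines. The sketch below is only an illustration: it uses Python's built-in sqlite3 module with an in-memory database as a convenient stand-in DBMS (SQLite is not part of the original article), and the user table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database: a convenient stand-in DBMS, nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, height INTEGER)")

# Create: insert two tuples.
conn.execute("INSERT INTO user (name, height) VALUES (?, ?)", ("alice", 165))
conn.execute("INSERT INTO user (name, height) VALUES (?, ?)", ("bob", 180))

# Retrieve: read them back.
rows = conn.execute("SELECT name, height FROM user ORDER BY id").fetchall()
print(rows)  # [('alice', 165), ('bob', 180)]

# Update: change one tuple.
conn.execute("UPDATE user SET height = ? WHERE name = ?", (170, "alice"))

# Delete: remove another tuple.
conn.execute("DELETE FROM user WHERE name = ?", ("bob",))

final_rows = conn.execute("SELECT name, height FROM user ORDER BY id").fetchall()
print(final_rows)  # [('alice', 170)]

conn.commit()
conn.close()
```

The `?` placeholders let the DBMS bind values itself instead of splicing them into the SQL string.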

Some implementation problems of database systems

Next, let's pretend that we have implemented a simple database ourselves! Let's call it Treeses DBMS for now!
 1. The first problem our DBMS has to solve: how should the database files be stored?
 First, we have to decide which encoding our database will use for storage: binary? ASCII?

 Everything has to be stored in files, and databases are no exception. Files are ultimately stored as binary on the storage device. To make the files easy for us to read, we will store our database as ASCII text here!
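To see what the ASCII-versus-binary choice means in bytes, the sketch below encodes the same integer both ways using only Python's standard library. The value and the 32-bit little-endian width are arbitrary choices made for illustration.

```python
import struct

value = 1234567

# ASCII: one byte per decimal digit; readable in any text editor.
ascii_bytes = str(value).encode("ascii")

# Binary: a fixed-width 4-byte little-endian signed integer.
binary_bytes = struct.pack("<i", value)

print(len(ascii_bytes), len(binary_bytes))  # 7 4
print(ascii_bytes.decode("ascii"))          # 1234567
print(struct.unpack("<i", binary_bytes)[0]) # 1234567
```

ASCII is easier to inspect by eye, while the binary form is smaller here and always the same width, which makes it easy to compute where a field starts inside a file.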

 So, as an implementation detail, each relation is stored in an ASCII file; the user table, for example, is stored at /user/db/data/table/user.db.
[Figure: file table structure]
 Above is how our database tables are stored. But since this is a relational database, the schemas (table structures) must also be stored; otherwise, how would we know what a table means? So we store the structure of each table in the file /user/db/model/table_structure.db.

[Figure: contents of table_structure.db]
 With that, we have solved the problem of storing table structures.
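The layout described above (a table_structure.db schema file plus one ASCII data file per table) can be sketched as follows. The original post showed its file contents only as images, so the line formats here are assumptions: a schema line `table:col1,col2,...` and one comma-separated tuple per line. A temporary directory stands in for /user/db, and the column names are hypothetical.

```python
import os
import tempfile

# Assumed layout, modeled on the article's paths:
#   <root>/model/table_structure.db : one line per table, "table:col1,col2,..."
#   <root>/data/table/<table>.db     : one tuple per line, fields separated by ","

def write_schema(root, table, columns):
    path = os.path.join(root, "model", "table_structure.db")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="ascii") as f:
        f.write(f"{table}:{','.join(columns)}\n")

def read_schema(root, table):
    path = os.path.join(root, "model", "table_structure.db")
    with open(path, encoding="ascii") as f:
        for line in f:
            name, cols = line.rstrip("\n").split(":", 1)
            if name == table:
                return cols.split(",")
    raise KeyError(f"no such table: {table}")

def append_tuple(root, table, values):
    path = os.path.join(root, "data", "table", f"{table}.db")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a", encoding="ascii") as f:
        f.write(",".join(str(v) for v in values) + "\n")

with tempfile.TemporaryDirectory() as root:
    write_schema(root, "user", ["name", "sex", "height"])  # hypothetical columns
    append_tuple(root, "user", ["alice", "F", 165])
    append_tuple(root, "user", ["bob", "M", 180])
    columns = read_schema(root, "user")
    with open(os.path.join(root, "data", "table", "user.db"), encoding="ascii") as f:
        raw_lines = f.read().splitlines()

print(columns)    # ['name', 'sex', 'height']
print(raw_lines)  # ['alice,F,165', 'bob,M,180']
```

Note that the data file alone is just text; only the schema file tells us that the third field is a height.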

 Now that we have table structures, let's ignore the intermediate cache, memory management, indexes, SQL parsing, and so on, and run a query directly!
  Suppose we run the following statement:

[Figure: SQL query on the user table]
  How do we process this statement?
    First, we read the table_structure.db file to get the structure of the user table;
    then we read the user table's file, user.db, and check each line: if the line meets the condition, we output it; otherwise, we skip to the next line.
   We continue until the end of the table.
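The two steps above amount to a full table scan. Here is a minimal sketch in Python; `io.StringIO` objects stand in for the two files, and both the file formats and the query condition are hypothetical.

```python
import io

# Stand-ins for table_structure.db and user.db (formats are assumptions).
TABLE_STRUCTURE_DB = io.StringIO("user:name,sex,height\n")
USER_DB = io.StringIO("alice,F,165\nbob,M,180\ncarol,F,172\n")

def load_columns(schema_file, table):
    """Step 1: look up the table's column list in table_structure.db."""
    for line in schema_file:
        name, cols = line.rstrip("\n").split(":", 1)
        if name == table:
            return cols.split(",")
    raise KeyError(table)

def scan_select(data_file, columns, predicate):
    """Step 2: read the table file line by line and yield rows meeting the condition."""
    for line in data_file:  # until the end of the table
        row = dict(zip(columns, line.rstrip("\n").split(",")))
        if predicate(row):
            yield row  # condition met: output the row
        # otherwise: skip to the next line

columns = load_columns(TABLE_STRUCTURE_DB, "user")
# Roughly what a hypothetical "SELECT * FROM user WHERE height > 170" would do.
result = list(scan_select(USER_DB, columns, lambda r: int(r["height"]) > 170))
print([r["name"] for r in result])  # ['bob', 'carol']
```

Every query, however selective, reads every line of the file; that cost is what the problems listed later in this article are about.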

That was simple. Next:
 suppose we also have a standard stature table (standard_stature.db), in which each height corresponds to a standard weight.
 We run the following statement:

[Figure: SQL join query on the user and standard_stature tables]

  How should this statement be handled?

  The first step is the same: read the table_structure file to obtain the structures of the user table and the standard_stature table;
  then, skipping the cost analysis of the join for now, we follow the straightforward rule:
  read the user file; for each of its rows, read the standard_stature file; for each row of that file, generate the joined tuple and check the conditions; if the conditions are met, output it.
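The rule just described is a tuple-at-a-time nested-loop join. The sketch below assumes both tables are already parsed into rows; their contents, the `weight` and `std_weight` columns, and the query condition are all hypothetical. It also counts how many inner tuples are read, which is the quantity the cost discussion below relies on.

```python
# Hypothetical contents of user.db and standard_stature.db, already parsed.
users = [
    {"name": "alice", "height": 165, "weight": 55},
    {"name": "bob", "height": 180, "weight": 85},
]
standard_stature = [
    {"height": 165, "std_weight": 56},
    {"height": 180, "std_weight": 70},
]

def nested_loop_join(outer, inner, condition, outer_name, inner_name):
    """For every outer row, scan the whole inner table and test each joined tuple."""
    results = []
    inner_reads = 0  # total number of inner tuples read
    for r in outer:
        for s in inner:  # the inner table is re-scanned for each outer row
            inner_reads += 1
            # Build the joined tuple, prefixing each column with its table name.
            joined = {f"{outer_name}.{k}": v for k, v in r.items()}
            joined.update({f"{inner_name}.{k}": v for k, v in s.items()})
            if condition(joined):
                results.append(joined)
    return results, inner_reads

# Hypothetical query: users heavier than the standard weight for their height.
rows, reads = nested_loop_join(
    users,
    standard_stature,
    lambda t: t["user.height"] == t["standard_stature.height"]
    and t["user.weight"] > t["standard_stature.std_weight"],
    "user",
    "standard_stature",
)
print([t["user.name"] for t in rows])  # ['bob']
print(reads)  # 4 (2 user rows x 2 standard_stature rows)
```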

  At this point, Treeses DBMS can define tables and perform some data operations. But what problems does Treeses DBMS have?

  • Is our query efficiency really high?
      What if we perform the selection first? Would that reduce the join work? Let's do a simple calculation!
      Assume the user table has 1000 rows and the standard_stature table has 100 rows, and count one I/O per tuple read (ignoring memory). With the original plan, the join costs
          1000 × 100 = 100,000 I/Os.
      Now suppose we filter each table first, and 500 rows of the user table and 50 rows of the standard_stature table satisfy the conditions. Filtering the two tables costs 1000 + 100 = 1,100 I/Os, and joining the remaining 500 and 50 rows costs 500 × 50 = 25,000 I/Os, for a total of 25,000 + 1,100 = 26,100 I/Os. That is nearly 4 times fewer. But does filtering first really reduce the number of I/Os in every case? This is left for the reader to think about.

  • Our tuples are simply laid out flat on disk. Is ASCII storage too costly? Is modifying a tuple of a table too cumbersome? Is deleting tuples expensive?

  • All of our data is read directly from disk. Since disk I/O is very expensive, should we add a buffer to speed up data access?

  • If many users access the database at the same time, how can we ensure the data stays correct? For example, what should happen if you want to read data while I am modifying it?

  • The database has no indexes, so isn't query efficiency rather low? Do we have to read the entire relation every time we query? Why indexes improve query efficiency will be explained in detail in later articles.

  • How do we ensure the reliability of the database, for example after a sudden power failure? Our database has no recovery mechanism, so a system failure can easily leave the data inconsistent.

  • How can the database be provided to applications? There is no API.

  • Isn't the data dictionary organized rather poorly?
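The I/O estimate in the first question above can be written out as two small cost functions. This is only the article's simplified model (one I/O per tuple read, memory ignored), not how a real query optimizer estimates cost; the function names are hypothetical.

```python
def join_then_filter_cost(n_user, n_std):
    """Original plan: read every pair of tuples, one I/O per pair."""
    return n_user * n_std

def filter_then_join_cost(n_user, n_std, kept_user, kept_std):
    """Selection first: one pass over each table, then join only the surviving rows."""
    filter_cost = n_user + n_std
    join_cost = kept_user * kept_std
    return filter_cost + join_cost

naive = join_then_filter_cost(1000, 100)
pushed = filter_then_join_cost(1000, 100, 500, 50)
print(naive, pushed, round(naive / pushed, 1))  # 100000 26100 3.8

# If the filters remove nothing, the extra filtering pass only adds cost in this model.
print(filter_then_join_cost(1000, 100, 1000, 100))  # 101100
```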

Overall, this database system is not fit for commercial use. (Alas, it's embarrassing to evaluate a database system I built years ago. You can find it at https://github.com/nainaiguang/Treeses.git)

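In fact, how a database represents its data is a very complicated problem that will be covered gradually in later articles; the approach above is only a simple illustration.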

Some design problems of database systems

The design problems of a database system are the problems we run into when using a database system. This article only offers some ideas; readers should study the details on their own, since they are not the focus of this series, hahaha.

An irregular (non-normalized) database schema design can cause problems such as data redundancy, update anomalies, insertion anomalies, and deletion anomalies.
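A tiny illustration of these anomalies, using plain Python lists and dicts as stand-in tables. The employee/department table is entirely hypothetical and chosen only because the redundancy is easy to see: each employee row repeats its department's phone number.

```python
# Hypothetical non-normalized table: each row repeats the department's phone number.
employees = [
    {"emp": "alice", "dept": "sales", "dept_phone": "1001"},
    {"emp": "bob", "dept": "sales", "dept_phone": "1001"},
    {"emp": "carol", "dept": "dev", "dept_phone": "2001"},
]

# Redundancy + update anomaly: updating only one row leaves two
# conflicting phone numbers for the same department.
employees[0]["dept_phone"] = "1002"
sales_phones = sorted({r["dept_phone"] for r in employees if r["dept"] == "sales"})
print(sales_phones)  # ['1001', '1002']

# Deletion anomaly: deleting carol, the only dev employee, also loses dev's phone.
remaining = [r for r in employees if r["emp"] != "carol"]
print(any(r["dept"] == "dev" for r in remaining))  # False

# A decomposed design stores each department's phone exactly once.
departments = {"sales": "1001", "dev": "2001"}
staff = [("alice", "sales"), ("bob", "sales"), ("carol", "dev")]
departments["sales"] = "1002"  # a single update, so no inconsistency is possible
print({departments[d] for _, d in staff if d == "sales"})  # {'1002'}
```

The insertion anomaly is the mirror image: in the first table, a new department with no employees yet has no row in which to record its phone number.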

Some access problems of database systems

This concerns how the database language is defined. Database languages are divided as follows:

  1. Data Definition Language (DDL), which accesses the database schema,
    including: CREATE TABLE, ALTER TABLE, DROP TABLE
  2. Data Manipulation Language (DML), which accesses the data,
    including: INSERT, DELETE, SELECT, UPDATE
  3. Data Control Language (DCL), which accesses control information,
    including: GRANT, REVOKE
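The DDL and DML statements above can be tried with Python's built-in sqlite3 module; SQLite is used here only as a convenient, dependency-free stand-in, and the table contents are hypothetical. SQLite has no user accounts, so it does not support the DCL statements GRANT and REVOKE; in a server DBMS such as MySQL one would write something like `GRANT SELECT ON db.user TO 'someone'@'localhost'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the schema.
conn.execute("CREATE TABLE standard_stature (height INTEGER PRIMARY KEY, std_weight INTEGER)")
conn.execute("ALTER TABLE standard_stature ADD COLUMN note TEXT")

# DML: operate on the data.
conn.execute("INSERT INTO standard_stature (height, std_weight) VALUES (170, 65)")
conn.execute("UPDATE standard_stature SET note = 'demo' WHERE height = 170")
rows = conn.execute("SELECT height, std_weight, note FROM standard_stature").fetchall()
print(rows)  # [(170, 65, 'demo')]

# DDL again: drop the table, then check the catalog (SQLite's data dictionary).
conn.execute("DROP TABLE standard_stature")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []

conn.close()
```

Note that the schema change made by ALTER TABLE is visible to the later UPDATE and SELECT, and that the catalog table `sqlite_master` is itself queried with ordinary DML.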

Typing all this is tiring. Thank you for your support! In the next article, I will draw a complete database architecture for you, and future articles will focus on the core of this architecture. Thank you!


Origin blog.csdn.net/qq_34364255/article/details/108609756