Database Design Fundamentals (2)

Three-Level Locking Protocols

An X lock is an exclusive lock. If transaction T places an X lock on data object A, only T may read and modify A, and no other transaction may place any type of lock on A until T releases its lock on A.

An S lock is a shared lock. If transaction T places an S lock on data object A, T may read A but not modify it, and other transactions may only place S locks on A (that is, they may read but not modify A) until T releases its S lock on A.

First-level locking protocol: before a transaction modifies a data item R, it must first place an X lock on R, and the lock is not released until the end of the transaction. This solves the lost-update problem.

Second-level locking protocol: in addition to the first-level protocol, transaction T must place an S lock on a data item R before reading it, and the S lock may be released as soon as the read is finished. This solves the lost-update and dirty-read problems.

Third-level locking protocol: in addition to the first-level protocol, transaction T must place an S lock on a data item R before reading it, and the lock is not released until the end of the transaction. This solves the lost-update, dirty-read, and non-repeatable-read problems.
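As a rough illustration of the S/X compatibility rule these protocols rely on, here is a minimal lock-table sketch in Python. It is not from the original post, and the class and method names are invented.

```python
# Minimal sketch of S/X lock compatibility (illustrative only; names are invented).
class LockTable:
    def __init__(self):
        # object -> list of (transaction_id, mode) pairs currently holding locks
        self.locks = {}

    def can_grant(self, obj, mode):
        holders = self.locks.get(obj, [])
        if not holders:
            return True
        # S is compatible only with other S locks; X is compatible with nothing.
        return mode == "S" and all(m == "S" for _, m in holders)

    def acquire(self, txn, obj, mode):
        if self.can_grant(obj, mode):
            self.locks.setdefault(obj, []).append((txn, mode))
            return True
        return False  # a real DBMS would make the transaction wait or abort

    def release_all(self, txn):
        # Under the first- and third-level protocols, locks are held to end of transaction.
        for obj in list(self.locks):
            self.locks[obj] = [(t, m) for t, m in self.locks[obj] if t != txn]


table = LockTable()
assert table.acquire("T1", "A", "S")       # T1 reads A
assert table.acquire("T2", "A", "S")       # another reader is also allowed
assert not table.acquire("T3", "A", "X")   # a writer is blocked while S locks are held
table.release_all("T1"); table.release_all("T2")
assert table.acquire("T3", "A", "X")       # now the X lock can be granted
```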

Database Failures

The failures that can occur in a database system are as follows:

Internal transaction failure: logical errors (expected) or errors such as arithmetic overflow during execution (unexpected).

System failure: the system stops running because of an event such as an operating system crash or a power outage.

Media failure: physical damage to the storage media; it has a low probability but is the most destructive.

Human failure: damage caused by computer viruses or malicious code inserted into programs.

Data Backup

Static dump: also called cold backup; no access or modification operations on the database are allowed during the dump.

Advantages: the backup is very fast and easy to archive (it is a direct physical copy).

Disadvantages: it can only restore the database to a single point in time, the database cannot do any other work during the dump, and recovery cannot be done per table or per user.

Dynamic dump: also called hot backup; access and modification operations on the database are allowed during the dump, so the dump and user transactions can execute concurrently.

Advantages: backups can be made at the tablespace or database-file level, the database remains usable during the backup, and recovery can be achieved in seconds.

Disadvantages: there is little margin for error, and the consequences of mistakes are serious; if a hot backup fails, the result is almost worthless.

Full Backup: Back up all data.

Differential backup: Only back up data that has changed since the last full backup.

Incremental backup: Back up data that has changed since the last backup.
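To make the difference between differential and incremental backups concrete, here is a minimal sketch, assuming each file carries a last-modified timestamp; the function names, file names, and timestamps are made up for illustration.

```python
# Illustrative sketch: which files a differential vs. incremental backup would copy.

def differential_backup(files, last_full_time):
    """Copy everything changed since the last FULL backup."""
    return {name for name, mtime in files.items() if mtime > last_full_time}

def incremental_backup(files, last_backup_time):
    """Copy everything changed since the last backup of ANY kind."""
    return {name for name, mtime in files.items() if mtime > last_backup_time}

# files maps file name -> last-modified timestamp (invented values)
files = {"orders.db": 100, "users.db": 250, "logs.db": 400}

# Suppose the last full backup was at t=200 and an incremental backup ran at t=300.
print(differential_backup(files, last_full_time=200))    # {'users.db', 'logs.db'}
print(incremental_backup(files, last_backup_time=300))   # {'logs.db'}
```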

Log file: during transaction processing, the DBMS writes the start of each transaction, its end, and every insert, delete, and update operation on the database into the log file. In the event of a failure, the DBMS's recovery subsystem uses the log file to undo the changes an unfinished transaction made to the database and roll it back to the state at the start of the transaction.
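The following is a rough sketch of such an undo pass, assuming a hypothetical log-record layout with before-images; it is not the format of any particular DBMS.

```python
# Hypothetical undo pass: scan the log backwards and restore before-images
# for every transaction that never logged a COMMIT.

log = [
    {"txn": "T1", "op": "BEGIN"},
    {"txn": "T1", "op": "UPDATE", "item": "A", "before": 10, "after": 50},
    {"txn": "T2", "op": "BEGIN"},
    {"txn": "T2", "op": "UPDATE", "item": "B", "before": 5, "after": 7},
    {"txn": "T2", "op": "COMMIT"},
    # crash happens here: T1 never committed
]

database = {"A": 50, "B": 7}  # state on disk after the crash

committed = {rec["txn"] for rec in log if rec["op"] == "COMMIT"}
for rec in reversed(log):
    if rec["op"] == "UPDATE" and rec["txn"] not in committed:
        database[rec["item"]] = rec["before"]  # undo: write back the before-image

print(database)  # {'A': 10, 'B': 7}: T1 is rolled back, T2's committed change is kept
```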

Data Warehouse

A data warehouse is a special kind of database: it also stores data in database form, but for a different purpose. After a long period of operation, more and more data accumulates in an operational database, which degrades the system's performance, and much of the older data is no longer needed for day-to-day processing. It could simply be deleted to reduce the data volume and improve efficiency, but since it would be a pity to throw this data away, it is usually extracted from the operational database and stored in a separate store, called a data warehouse.

From this it can be seen that a data warehouse is not built for operational use but for data analysis: it is subject-oriented, integrated (combining data from different tables and sources), relatively stable (data is mostly loaded and queried rather than updated), and it reflects historical change over time.

Analytical Methods for Data Mining

Association analysis: mainly used to discover associations between different events, that is, cases where one event occurring means another event also frequently occurs (a minimal sketch follows this list).

Sequence analysis: mainly used to discover events that follow one another within a certain time interval; these events form a sequence, and a discovered sequence should be generally applicable rather than a one-off.

Classification analysis: derives rules or methods for deciding which category a sample belongs to by analyzing the characteristics of samples whose categories are already known. In classification analysis, each record is first assigned a label (one of a set of categories with distinct characteristics), the records are grouped by their labels, and the labeled records are then examined to describe the characteristics of each group.

Cluster analysis: groups unlabeled samples into different clusters according to the principle that like gathers with like, and then describes each resulting group.
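As a tiny illustration of association analysis, the sketch below counts how often pairs of items occur together in a set of transactions; the data and support threshold are made-up values, not from the original post.

```python
# Minimal sketch of association analysis as pairwise co-occurrence counting.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

pair_counts = Counter()
for items in transactions:
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# Report item pairs that co-occur in at least half of the transactions (support >= 0.5).
min_support = 0.5
for pair, count in pair_counts.items():
    support = count / len(transactions)
    if support >= min_support:
        print(pair, support)
```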

Business Intelligence

A BI system includes four main stages: data preprocessing, data warehouse construction, data analysis, and data presentation.

Data preprocessing is the first step, in which the enterprise's raw data is integrated; it consists of the three processes of data extraction (Extraction), transformation (Transformation), and loading (Load), known together as the ETL process.
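A minimal ETL sketch is shown below; the source rows, field names, and the list standing in for the warehouse are all invented for illustration.

```python
# Minimal ETL sketch: extract raw rows, clean them, and load them into a "warehouse".

def extract():
    # In practice this would read from operational systems, files, or APIs.
    return [{"id": 1, "amount": "12.5", "region": " North "},
            {"id": 2, "amount": "7.0",  "region": "SOUTH"}]

def transform(rows):
    # Clean and standardize the raw records before loading.
    return [{"id": r["id"],
             "amount": float(r["amount"]),
             "region": r["region"].strip().lower()} for r in rows]

def load(rows, warehouse):
    # Here the "warehouse" is just a list; a real system would write to warehouse tables.
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # cleaned rows with numeric amounts and normalized region names
```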

Building the data warehouse is the foundation for processing massive amounts of data.

Data analysis is the key to the system's intelligence; two technologies are generally used: online analytical processing (OLAP) and data mining. OLAP not only aggregates and summarizes data but also provides analysis operations such as slicing, dicing, drilling down, rolling up, and pivoting, allowing users to easily perform multidimensional analysis on massive data. The goal of data mining is to uncover the knowledge hidden behind the data, build analysis models through methods such as association analysis, clustering, and classification, and predict the enterprise's future trends and the problems it will face.
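To make two of the OLAP operations concrete, here is a small sketch of a slice and a roll-up over a toy fact table; the dimensions and figures are invented.

```python
# Illustrative OLAP-style operations on a tiny fact table:
# a "slice" fixes one dimension, a "roll-up" aggregates a dimension away.
from collections import defaultdict

facts = [
    {"year": 2022, "region": "north", "product": "A", "sales": 100},
    {"year": 2022, "region": "south", "product": "A", "sales": 80},
    {"year": 2023, "region": "north", "product": "B", "sales": 120},
    {"year": 2023, "region": "south", "product": "A", "sales": 90},
]

# Slice: keep only the cells where year == 2023.
slice_2023 = [f for f in facts if f["year"] == 2023]

# Roll-up: aggregate sales from (year, region, product) up to region alone.
sales_by_region = defaultdict(int)
for f in facts:
    sales_by_region[f["region"]] += f["sales"]

print(slice_2023)
print(dict(sales_by_region))  # {'north': 220, 'south': 170}
```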

As the volume of data and the number of analysis methods grow, data presentation mainly ensures that the system's analysis results can be visualized.

Denormalization techniques

As can be seen from the earlier discussion, normalization prevents insertion, update, and deletion anomalies as well as data redundancy. This is generally achieved by decomposing the schema and splitting tables.

However, although splitting tables solves these anomalies, it is bad for queries: a single query may have to join many tables, which can seriously reduce query efficiency. Therefore, denormalization techniques are sometimes needed to improve query performance.

Typical techniques include adding derived redundant columns, adding redundant columns, regrouping tables, and splitting tables.

The main purpose is to add redundancy in order to improve query efficiency; it is the inverse of normalization, as sketched below.
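Here is a minimal sketch of the first technique, a derived redundant column: the order total is stored on the order row so that a common read does not have to join and sum the line items. The table layout and names are invented for illustration.

```python
# Sketch of a derived redundant column: store each order's total redundantly
# instead of recomputing it from the line items on every query.

order_items = [
    {"order_id": 1, "price": 10.0, "qty": 2},
    {"order_id": 1, "price": 5.0,  "qty": 1},
    {"order_id": 2, "price": 3.0,  "qty": 4},
]

# Normalized: the total is recomputed (a join plus aggregation) on every read.
def order_total(order_id):
    return sum(i["price"] * i["qty"] for i in order_items if i["order_id"] == order_id)

# Denormalized: the derived column "total" is stored on the order itself, so reads
# are cheap, at the cost of keeping it consistent whenever the line items change.
orders = [
    {"order_id": 1, "total": order_total(1)},
    {"order_id": 2, "total": order_total(2)},
]
print(orders)  # [{'order_id': 1, 'total': 25.0}, {'order_id': 2, 'total': 12.0}]
```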

Big Data

Characteristics: massive volume, variety, low value density, and high velocity.

The comparison between big data and traditional data is as follows:

(Figure: comparison of big data and traditional data.)

Big data is generally processed on an integrated platform called a big data processing system, which is characterized by:

High scalability, high performance, high fault tolerance, support for heterogeneous environments, short analysis latency, easy-to-use and open interfaces, low cost, and backward compatibility.
