Computer Rank Examination--Level 3 Database Technology

I took this exam some time ago (P.S.: the rank exam doesn't seem to be much help for students already majoring in this subject)

I want to share my written notes. I compiled them bit by bit from the question bank, so they are fairly fragmented, but if you can memorize them all, your accuracy on the objective questions should be above 80-90%.

Without further ado, on to the notes!


  • Project planning includes:

  • Determine the goals and scope of the project: according to the work content planned and defined for the system, specify the project's final product and the expected time, cost, and quality goals

  • According to the DBAS software development model, decompose and define the work activities and tasks included in the entire project

  • Estimate the scale and resources required to complete the project

  • Formulate a reasonable DBAS project plan, including forecast and control plans for schedule, cost, quality, etc.

  • Four basic elements of the DFD method (structured analysis): data flows (arrows), processes (rectangular boxes), data stores (rounded rectangular boxes), external entities (rounded boxes or parallelogram boxes)

  • Requirements analysis process : identifying problems, establishing requirements models, describing requirements, and confirming requirements

  • DFD and IDEF0 (system simulation, building a dynamic model) can be used to build the requirements model

  • In a data flow diagram, a process has at least one input stream and one output stream

  • Data requirements analysis: from the perspective of data composition and storage design, identify the data items and data structures managed in the application domain; together with the results of data processing requirements analysis, these form the data dictionary (the "data specification")

  • Functional requirements analysis (data processing requirements analysis and business rule requirements analysis): analyzes the functions the DBAS should have; this is the core step of DBAS requirements analysis. Data processing requirements analysis clarifies, from the perspective of data access and processing, the data access operations required for each data item. During the system planning and analysis stage, DBAS developers have already defined the various user views, so data processing requirements analysis can start from those views: analyze the data processing requirements of each user view, then consolidate the per-view results into a complete analysis for the whole system.

  • Performance requirements analysis: describes the performance level the system must reach and analyzes the performance indicators the DBAS should have

  • Other requirements analysis: storage requirements, security requirements

  • Database application system requirements modeling : DFD, IDEF0 (arrows, activities), UML

  • DBAS performance indicators: data operation response time, system throughput, maximum number of concurrent users, cost per TPS

  • There are 6 stages of database system design: requirements analysis, conceptual structure design, logical structure design, physical structure design, database implementation, and database operation and maintenance

  • IDEF1X modeling method: entity sets (independent entity set: square-cornered rectangular box; dependent entity set: rounded rectangular box), relationships (identifying relationship, non-identifying relationship, categorization relationship, non-specific relationship)

  • Non-identifying relationship: a "definite relationship" in which every instance of the child entity set can be uniquely identified without knowing the associated instance of the parent entity set (one-to-many; the child is not a dependent entity set)

  • Principles of indexing :

  • Columns that are often used as conditions in queries

  • Frequent grouping (group by) or sorting (order by)

  • Columns with a large range of values (many distinct values)

  • There are multiple columns to be sorted, and composite indexes should be built on these columns

  • System tools can be used to check the integrity of the index and repair it if necessary

  • Tasks of data distribution design: physical distribution of different types of data, division and distribution of application data, distribution of derived-attribute data, denormalization of the relational schema

  • Database physical design stage: database logical schema adjustment, selecting or configuring the file organization of base relational tables, designing data access methods or access paths for base relational tables, data distribution design, security schema design, determining the system configuration, physical schema evaluation

  • The main activities in the physical design stage include: determining the storage structure, selecting and adjusting the access path, determining the data storage location and determining the storage allocation

  • Presentation layer detailed design : Prototype iteration method: preliminary design, user interface detailed design, prototype design and improvement (detailed design)

  • Data security design : security protection, integrity protection, concurrency control, data backup and recovery, data encrypted transmission

  • Database backup and recovery strategy : dual-machine hot backup, data dump, data encryption storage

  • Data encryption transmission : digital security certificate, symmetric key encryption, digital signature, digital envelope

  • Environmental security design : regularly find vulnerabilities and update patches; anti-virus software, real-time monitoring; firewall, intrusion detection system, network isolation (logic isolation and physical isolation); physical environment security (anti-theft facilities, UPS, temperature and humidity alarms)

  • DBAS implementation : creating database; data loading; writing and debugging application program; trial operation of database system

  • Data constraint types : primary key constraints, foreign key constraints, unique constraints, default constraints, check constraints

  • Transaction outline design : transaction name, relational table and relational attributes accessed by the transaction, transaction processing logic, transaction user (referring to the software module or system that uses, starts, and invokes the transaction)

  • UML composition: meta-metamodel, meta-model, model layer, user model

  • UML's 5 views (DBAS macro design): structure, behavior, implementation, environment, use case

  • Structure diagrams

  • Class diagram: A static view showing a set of classes, interfaces, collaborations, and relationships among them. It is mainly used to express the conceptual model of the problem domain. In addition to expressing the name of the abstract concept, it also needs to express the attributes and methods of the abstract concept

  • Object diagram

  • Composite Structure Diagram

  • Package Diagram: A class diagram that represents packages and the relationships between packages

  • Component Diagram: Shows the dependencies of the software on other software components in the system . It can be displayed at a very high level so that only coarse-grained components are displayed, or it can be displayed at the component package level to model the source code, the release of executable programs, etc.

  • Deployment Diagram (Configuration Diagram): Describes the physical configuration of the hardware and software in the system and the system architecture

  • Behavior diagrams

  • Use case diagram: used for functional modeling; composed of use cases (functional modules, drawn as ellipses; relationships include extend, use, and combination), actors, and the system boundary. It forms the use case view, expresses the system's functional requirements, and does not show the internal structure of the DBAS

  • Interaction diagrams

  • Sequence diagram: describes the order in which messages are sent and received between objects in the system; the vertical axis represents the passage of time and the horizontal axis represents the objects. It depicts the objects themselves and the information passed between them

  • Communication diagram (collaboration diagram): describes how objects interact in space and directly shows how objects are linked together. There is no time axis; instead, messages are numbered in sequence. It shows the links between objects, the messages sent and received between them, and the relationships among objects in the system; the order of interaction is not emphasized

  • Interaction Overview Diagram

  • State diagram: describes how an entity's state changes when certain events occur; there is one start state, and there can be multiple end states

  • Activity diagram: used to describe the execution sequence of logical processes in the system, use cases, and program modules, and to describe the transfer of process control between activities and activities, with one starting point and multiple ending points

  • DBAS micro design

  • State machine diagram:

  • Object diagram:

  • Timing diagram: used when state transitions are closely related to time; it emphasizes the role of the time factor in the process of state transition

  • In the state machine diagram of UML, the transition between states is driven by events

  • The use case model expresses the functional requirements of the system by describing the system participants and their important behaviors

  • Aggregation is a special form of association that represents a whole-part relationship between classes

  • From the perspective of function, the database application system is divided into 4 levels:

  • The presentation layer is responsible for all functions involving interaction with the user; the user's most direct impression of the database application system is formed in this layer

  • The business logic layer is responsible for handling the business logic. It organizes the data received from the presentation layer and passes it to the data access layer, or processes the data obtained from the data access layer and passes it to the presentation layer for display

  • The data access layer is responsible for interacting with the DBMS system, extracting or storing the data required by the application system

  • The data persistence layer is responsible for saving and managing the application system's data. When the application system's data changes, adjust the organizational structure of the data files according to the transaction-base table cross-reference matrix, and establish appropriate indexes

  • The object of the database integrity constraint : column, tuple, relation, or table

  • Outline design principles of the business logic layer:

  • Each component should consist of highly cohesive code, with one component or module responsible for only one task

  • Each component of the business logic layer should have an independent function and minimize functional overlap with other components

  • The interface between components should be as simple and clear as possible

  • If the relationship between two components is complicated, further module division should be considered

  • If the component is too complex, it can be subdivided

  • Improve transaction throughput :

  • Access resources in the same order

  • Avoid user interaction in transactions

  • Use the small transaction mode to minimize the length of the transaction and reduce the time of occupying the lock

  • Use record-level locks as much as possible, and use less table-level locks

  • Use bound connections to enable two or more connections opened by the same application to cooperate with each other

  • From the perspective of data storage security :

  • security protection

  • user authentication

  • access control

  • view control

  • integrity protection

  • concurrency control

  • Database backup and recovery

  • Data encrypted transmission

  • Environmental security design : vulnerabilities and patches, computer virus protection, network environment security, physical environment security

  • Reduce the transaction isolation level : improve transaction throughput, increase the possibility of livelock, reduce the possibility of deadlock and blocking

  • Physical Design:

  • Database physical structure design

  • Database logic schema adjustment

  • File organization and access design

  • Data distribution design

  • Security schema design

  • Determine system configuration

  • Physical schema evaluation

  • Detailed Design of Database Transactions

  • Application physical structure design

  • The larger the blocking granularity and the smaller the concurrency, the smaller the system overhead; the smaller the blocking granularity, the greater the concurrency, and the greater the system overhead

  • The object of the integrity constraint:

  • Column: its value type, range, precision, sorting, etc.

  • Tuple: the connection between the various attributes in the record , etc.

  • Relation: constraints on the connections among several records, on the record set, and between relations

  • Two-phase locking protocol (guarantees serializability of transaction schedules):

  • Before reading or writing any data item, a transaction must first apply for and obtain a lock on it

  • After releasing any lock, the transaction may not apply for or acquire any other lock

  • Database integrity:

  • entity integrity

  • primary key in create table

  • referential integrity

  • foreign key in create table

  • User Defined Integrity

  • NOT NULL, UNIQUE, CHECK

  • Overall system design: determine the DBAS architecture, software and hardware selection and configuration design, overall design of the application software, preliminary design of business rules

  • SQL server system database:

  • master: records all system-level information of the SQL Server instance

  • msdb: stores the job information of the SQL Server instance. The job is a collection of a series of operations defined in SQL Server that are automatically executed. The execution of the job does not require any manual intervention

  • tempdb: tempdb is recreated every time SQL Server is started

  • model: Modifications to the model database will be applied to all user databases created in the future

  • resource: a read-only database that contains all the system objects included with SQL Server

  • Database three-level mode: internal mode (storage mode or physical mode, the bottom layer, unique), mode (logical mode), external mode (sub-mode or user mode, multiple)

  • One data page is 8 KB. If one row occupies x KB and there are n rows in total: rows per page = ⌊8/x⌋; storage space = 8 KB × ⌈n / ⌊8/x⌋⌉; space utilization = (⌊8/x⌋ × x) / 8
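For example, if each row is 3 KB, a page holds ⌊8/3⌋ = 2 rows; storing n = 10 rows takes ⌈10/2⌉ = 5 pages = 40 KB, and space utilization is (2 × 3)/8 = 75%.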

  • DBAS system implementation and deployment are expressed with:

  • System Implementation and Component Diagram

  • System Implementation and Deployment Diagram

  • Implementation and deployment of database application system includes

  • Create database structure

  • data loading

  • Coding and testing of transactions and applications

  • System integration, testing and commissioning

  • system deployment

  • RecordSet

  • AddNew, create a new record

  • Cancel, cancel an execution

  • Close, close a RecordSet

  • Delete, delete a record or a group of records

  • MoveNext, move the record pointer to the next record

  • UML micro design: object diagram, state machine diagram, timing diagram

  • class diagram representation

  • Hollow triangle solid line: class, inheritance relationship

  • Hollow triangle dotted line: interface, implementation relationship

  • Hollow diamond, solid line: aggregation, weak ownership; object A contains B, but B is not an inseparable part of A (B can exist independently)

  • Solid diamond, solid line: composition, strong ownership, a whole-part relationship

SELECT [DISTINCT] [TOP n] select_list
[INTO new_table]
[FROM table_source]
[WHERE search_condition]
[GROUP BY group_by_expression]
[HAVING search_condition]
[ORDER BY order_expression [ASC|DESC]]
[COMPUTE expression]

INTO: save the query result into a new table
FROM: the table(s) to query
WHERE: row filter condition
GROUP BY: how to group the query results
HAVING: filter condition applied to the groups (aggregates)
ORDER BY: sort the query results; ASC = ascending, DESC = descending
COMPUTE: generate summary data rows at the end of the result set
TOP n [PERCENT] [WITH TIES]: return the first n rows (or the first n% with PERCENT); WITH TIES also includes rows tied with the last row's value
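A minimal worked example of these clauses, using a hypothetical Score(student_id, course_id, grade) table:

SELECT TOP 10 student_id, AVG(grade) AS avg_grade
INTO HighScorers                -- save the result into a new table
FROM Score
WHERE grade IS NOT NULL         -- filter rows before grouping
GROUP BY student_id             -- one output row per student
HAVING AVG(grade) >= 90         -- keep only groups meeting the condition
ORDER BY avg_grade DESC;        -- sort descending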






  • Distribution transparency (levels from highest to lowest: fragmentation, location, local data model):

  • Fragmentation transparency: the highest level, which means that users or applications only operate on global relations without considering relational fragmentation

  • Location Transparency: The next level, meaning that the user or application only needs to know about the data fragments, not where the fragments are stored

  • Local data model transparency: the user or application need not know which data model the local site uses, but must know the fragmentation of the global data, the replication of each fragment, and the allocation of fragments and their replicas to sites

  • Support:

s(A⇒B) = (number of transactions containing both A and B) / (total number of transactions)

  • Confidence:

c(A⇒B) = (number of transactions containing both A and B) / (number of transactions containing A) = support(A⇒B) / support(A)
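A worked example: out of 1000 transactions, if 100 contain both A and B, then support(A⇒B) = 100/1000 = 10%; if 200 transactions contain A, then confidence(A⇒B) = 100/200 = 50%.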

  • Metadata is data about data, or data that describes data. Metadata describes the structure, content, links, and indexes of data

  • Views in relational databases provide logical data independence

  • The global variable used to judge the status of cursor data fetching is @@FETCH_STATUS

  • Set operations include: UNION, INTERSECT (intersection), EXCEPT (difference)

  • The function to calculate the difference between two dates is DATEDIFF()

  • The system role that only has permission to modify the data of all user tables in the database is db_datawriter

  • A dump that copies only the data that has changed since the most recent full database dump is called a differential dump

  • In a distributed database, the use of semi-join operations can reduce the amount of data transmission between sites

  • When performing multidimensional analysis, projecting annual sales onto individual months for observation is an analysis action called drilling (drill-down)

  • Drill-down: switching from a high-granularity (coarse) data view to a low-granularity (fine) data view in multidimensional data analysis

  • In a data warehouse, metadata is mainly divided into two categories : technical metadata and business metadata.

  • The schema/internal-schema mapping guarantees physical independence between the data in the database and the applications

  • The external-schema/schema mapping guarantees logical independence between the data and the applications

  • Database application system design:

  • Conceptual design: adopt top-down ER design

  • Logical Design: Design Views and Integrity Constraints for Relational Schemas

  • Physical design: convert relational schema into relational tables supported by specific DBMS platforms

  • The content of the database application system logic design work is divided into three parts:

  • Database logical structure design

  • Database transaction outline design

  • Application outline design

  • Requirements for database application systems:

  • Data requirements analysis: starting from the user view, analyze and identify various data items and data structures managed in the application field, and form the main content of the data dictionary

  • Data Processing Requirements Analysis

  • Business rule requirements analysis

  • Other requirements analysis: performance, storage, security, backup and recovery, etc.

  • On the premise of preserving database consistency, splitting frequently executed, divisible processing logic across multiple stored procedures can greatly improve the system's response speed

  • Using cursors consumes more system resources; especially under heavy concurrency, it can easily exhaust resources and crash the system

  • Using temporary tables can speed up queries, correlated subqueries cannot

  • create index

  • nonclustered: non-clustered index (default)

  • clustered: clustered index

  • unique: unique index

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX <index_name>
ON <table_name> (<column_name> [<order>], <column_name> [<order>], ...)
[INCLUDE (<column_name>, ...)] [WHERE <filter_condition>]
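A sketch of this syntax on a hypothetical Employees table, combining a composite nonclustered index with an included column and a filter (INCLUDE and filtered WHERE are SQL Server 2008 features):

CREATE NONCLUSTERED INDEX idx_emp_dept_hire
ON Employees (dept_id ASC, hire_date DESC)
INCLUDE (salary)              -- non-key column carried at the leaf level
WHERE dept_id IS NOT NULL;    -- filtered index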

  • Steps to create an indexed view:

  1. Create the view with the SCHEMABINDING clause. The view must meet a number of requirements: it may reference only base tables in the same database, not other standard views; all referenced functions must be deterministic; and rowset functions, derived tables, and subqueries cannot be used in an indexed view

  2. Create a unique clustered index on the view. The leaf level of this index consists of the view's full result set

  3. Create nonclustered indexes based on the clustered index as required. Nonclustered indexes can be created in the usual way

  4. Create and use the indexed view
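A minimal sketch of steps 1 and 2, assuming a hypothetical dbo.Sales(product_id, amount) table with a non-nullable amount column:

CREATE VIEW dbo.v_SalesTotals
WITH SCHEMABINDING                    -- step 1: schema-bound view over base tables only
AS
SELECT product_id,
       SUM(amount)  AS total_amount,
       COUNT_BIG(*) AS row_cnt        -- COUNT_BIG(*) is required when the view uses GROUP BY
FROM dbo.Sales
GROUP BY product_id;
GO

CREATE UNIQUE CLUSTERED INDEX idx_v_SalesTotals   -- step 2: this index materializes the view
ON dbo.v_SalesTotals (product_id);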

  • Add a new data file to the database

ALTER DATABASE <database_name>
ADD FILE (
    NAME = <logical_file_name>,
    FILENAME = '<full file path, e.g. D:\DB1\filex.ndf>',
    [FILEGROWTH = <XX%>])
  • Modify data file size

ALTER DATABASE <database_name>
MODIFY FILE (NAME = <data_file_name>, SIZE = <new size, must be larger than the current size>)
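A concrete sketch, assuming a database named Students with a logical data file named Students_data:

ALTER DATABASE Students
MODIFY FILE (NAME = Students_data, SIZE = 100MB);   -- must be larger than the current size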

  • Indexed views can improve:

  • Handle joins and aggregations of large numbers of rows

  • Many queries frequently perform joins and aggregations

  • Decision Support Workload

  • Indexed views generally do not improve:

  • OLTP systems with heavy write operations

  • database with lots of updates

  • Queries that don't involve aggregations or joins

  • GROUP BY keys with high cardinality: high cardinality means the column contains many distinct values. When the GROUP BY column contains so many distinct values that the view has nearly the same number of rows as the base table, an indexed view on that column cannot improve query efficiency

  • A database contains exactly one primary data file (recommended extension .mdf), possibly multiple secondary data files (recommended extension .ndf), and at least one log file. Log files are not included in any filegroup. A file cannot be a member of more than one filegroup

  • Detaching a database

  • When detaching a database, not only the data files but also the log files are detached

  • The SQL Server service cannot be stopped during the operation of detaching the database

  • All users must be disconnected from the database before detaching it

  • Files can be stored in a different location when you attach a database than they are when you detach a database

  • The attached database name can be different from the detached database name

  • The essence of the partition table is to store data subsets that meet different standards in one or more file groups of a database, and express the logical address of the data storage through metadata

  • file group

  • A database can contain multiple filegroups

  • A filegroup can contain multiple data files

  • A datafile cannot be a member of more than one filegroup

  • The primary filegroup is a system-defined filegroup that contains the main data files and any other data files that are not explicitly assigned to other filegroups. If the secondary data files are not allocated to other filegroups, they can also be stored in the primary filegroup

  • Partition Table

  • The partition table mechanism divides the data of a table into multiple data subsets according to certain conditions

  • Reasonable use of partition table technology can improve the overall performance of the database

  • The partition table mechanism is to physically divide a table into several partitions

  • Whether to create a partition table mainly depends on the current data volume and future data volume of the table, and also depends on how to operate the data in the table

  • If the amount of data in the table is huge and the data is segmented, this table is more suitable for partitioning

  • The number of file groups specified when creating a partition scheme must not be less than the number of partitions generated by the partition function, otherwise an error message will be returned

  • Users do not need to consider which table partition is being operated when using a partitioned table, and the partition is transparent to the user

  • The purpose of creating a partition function is to tell the database management system how to partition the table

  • view

  • The view corresponds to the external schema of the database, so it can provide a certain degree of logical independence

  • Views are virtual tables whose data is not actually stored in the database

  • When querying data through a view, it will eventually be converted into a query on the basic table

  • Views can be defined on top of other views (since the result set returned by a view has the same format as a base table)

  • Query Processor and Storage Manager

  • The DML compiler of the query processor optimizes the DML statement submitted by the user and converts it into executable underlying database operation instructions

  • The main modules in the query processor are query compiler and query executor , which are responsible for parsing and executing DML statements

  • The buffer manager in the storage manager is responsible for putting the data blocks read from the disk into the memory buffer , and is also responsible for maintaining the data blocks in the buffer

  • The DDL compiler in the query processor compiles or interprets the DDL statement submitted by the user and stores the produced metadata in the data dictionary of the database

  • multi-table join

  • LEFT JOIN (left outer join): returns all records from the left table, together with the records in the right table whose join-field values match

  • RIGHT JOIN (right outer join): returns all records from the right table, together with the records in the left table whose join-field values match

  • INNER JOIN (equijoin): returns only the rows where the join fields of the two tables are equal

  • cursor

  • Each cursor has a current row pointer, when the cursor is opened, the current row pointer automatically points to the first row of data in the result set

  • If the INSENSITIVE option is not specified when declaring the cursor, committed updates to the base table will be reflected in subsequent fetch operations

  • After the cursor is closed, the cursor can be opened again by OPEN

  • The cursor consists of two parts: the cursor result set and the current row pointer of the cursor

  • NEXT is the only fetch option supported if SCROLL is not specified when declaring the cursor

  • After a FETCH operation on the cursor, use the @@FETCH_STATUS variable to check the fetch status: 0 means the FETCH succeeded, -1 means it failed or the row is outside the result set, -2 means the fetched row does not exist

  • After using the CLOSE statement to close the cursor, you need to use the DEALLOCATE command to release the resources allocated by the system for the cursor

  • NEXT: returns the result row immediately following the current row, and the current row advances to the returned row

  • PRIOR: returns the result row immediately preceding the current row, and the current row moves back to the returned row

  • FIRST: returns the first row in the cursor and makes it the current row

  • LAST: returns the last row in the cursor and makes it the current row

  • ABSOLUTE n|@nvar: if n is positive, returns the nth row from the front of the cursor and makes the returned row the new current row. If n is negative, returns the nth row counting back from the end of the cursor and makes the returned row the new current row. If n is 0, nothing is returned. n must be an integer constant, and @nvar must be of type smallint, tinyint, or int

  • RELATIVE n|@nvar: if n is positive, returns the nth row after the current row and makes the returned row the new current row. If n is negative, returns the nth row before the current row and makes the returned row the new current row. If n is 0, returns the current row. On the first fetch from a cursor, if FETCH RELATIVE is specified with n or @nvar set to a negative number or 0, no rows are returned. n must be an integer constant, and @nvar must be of type smallint, tinyint, or int

  • declare cursor

DECLARE <cursor_name> CURSOR FOR <SELECT statement>
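A sketch of the full cursor life cycle, assuming a hypothetical Students(sname) table:

DECLARE @name varchar(30);
DECLARE cur_stu CURSOR FOR SELECT sname FROM Students;
OPEN cur_stu;
FETCH NEXT FROM cur_stu INTO @name;
WHILE @@FETCH_STATUS = 0            -- 0 = the last FETCH succeeded
BEGIN
    PRINT @name;
    FETCH NEXT FROM cur_stu INTO @name;
END;
CLOSE cur_stu;                      -- could be reopened with OPEN
DEALLOCATE cur_stu;                 -- release the resources allocated for the cursor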

  • inline table-valued function

  • In an inline table-valued function, there is no associated return variable

  • An inline table-valued function populates the table value returned by the function via a SELECT statement

  • Inline table-valued functions act like views with parameters

  • When calling an inline table-valued function, only place the inline table-valued function in the FROM clause
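A minimal sketch, assuming a hypothetical Students(sno, sname, dept) table:

CREATE FUNCTION dbo.fn_StudentsInDept (@dept varchar(20))
RETURNS TABLE                       -- inline: no return variable, no BEGIN...END body
AS
RETURN (SELECT sno, sname FROM dbo.Students WHERE dept = @dept);
GO
-- called like a parameterized view, placed in the FROM clause:
SELECT * FROM dbo.fn_StudentsInDept('CS');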

  • trigger

  • Complex integrity constraints can be implemented

  • For DML type triggers, the actions that trigger triggers can only be INSERT, DELETE, UPDATE (or three types)

  • Using triggers to achieve data integrity is generally less efficient than CHECK constraints

  • A table can have multiple post-triggers (AFTER), but only one pre-trigger (INSTEAD OF) per action

  • DELETED tables are used to store copies of rows affected by DELETE and UPDATE statements; INSERTED tables are used to store copies of rows affected by INSERT and UPDATE statements

  • DELETED tables and INSERTED tables are not generated at the same time when the trigger is executed

  • INSTEAD OF: the pre-trigger; it fires in place of the triggering statement

  • Syntax

CREATE TRIGGER trigger_name
ON { table | view }
{ FOR | AFTER | INSTEAD OF } { [INSERT] [,] [UPDATE] [,] [DELETE] }
AS
[ { IF UPDATE(column) [ { AND | OR } UPDATE(column) ] [...n]
  | IF (COLUMNS_UPDATED() { bitwise_operator } updated_bitmask)
      { comparison_operator } column_bitmask [...n]
} ]
sql_statement [...n]
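A minimal sketch of this syntax, assuming hypothetical Orders and OrderLog tables: an AFTER trigger that audits inserted rows through the INSERTED pseudo-table:

CREATE TRIGGER trg_order_insert
ON Orders
AFTER INSERT
AS
    INSERT INTO OrderLog (order_id, logged_at)
    SELECT order_id, GETDATE() FROM INSERTED;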
  • authorization statement

GRANT { ALL [PRIVILEGES] }
      | permission [(column [,...n])] [,...n]
[ON [class::] securable] TO principal [,...n]
[WITH GRANT OPTION] [AS principal]
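For example, to grant user1 SELECT on table T and allow user1 to grant that permission onward (names hypothetical):

GRANT SELECT ON T TO user1 WITH GRANT OPTION;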
  • Database mandatory access control method rules:

  • A subject can read the corresponding object only if the subject's permission level is greater than or equal to the object's confidentiality level

  • A subject can write the corresponding object only if the subject's permission level is less than or equal to the object's confidentiality level

  • Security management

  • SQL Server 2008 supports two authentication modes: Windows authentication mode and mixed authentication mode

  • In "Mixed Authentication Mode", allow Windows users and non-Windows users to log in to SQL Server

  • For Windows users, only members of the system administrators group have permission to log in to SQL Server

  • Only in "mixed authentication mode", sa can log in to SQL Server

  • sa is the default system administrator of SQL Server, not a Windows user

SQL Server fixed database roles and their permissions:

  • db_owner: has permission to perform all operations in the database, including configuring, maintaining, and deleting the database

  • db_accessadmin: can add or remove database users

  • db_securityadmin: manages database roles, role membership, and statement and object permissions in the database

  • db_ddladmin: can execute data definition language (DDL) statements

  • db_backupoperator: can back up the database and back up logs

  • db_datareader: can query all user data in the database

  • db_datawriter: can insert, delete, and update all user data in the database

  • db_denydatareader: denied permission to query all user data in the database; equivalent to granting DENY SELECT on all views and tables

  • db_denydatawriter: denied permission to INSERT, DELETE, and UPDATE all user data in the database

  • Four security classes

  • Class A provides verified protection

  • Class B provides mandatory protection

  • Class C provides discretionary protection

  • Class D provides minimal protection

  • Database users can be divided into: system administrators, object owners, ordinary users

  • index

  • In a multi-attribute (composite) index, the index attributes are ordered by their degree of discrimination (selectivity)

  • The hash index constructs the index according to the HASH algorithm, and the index retrieval speed is fast, but it cannot be used for range query

  • Sparse index: the index file contains only some of the search-key values in the data file

  • The daily management work of the database administrator:

  • System Monitoring and Analysis

  • System performance optimization and adjustment

  • System Upgrade

  • concurrency control

  • storage space management

  • security maintenance

  • integrity maintenance

  • backup and restore

  • Data dump: static, dynamic, full, differential, incremental

  • During the static dump process, the database cannot run other transactions, and no modification activities are allowed

  • Only using a full dump will generate a large amount of data transmission, which takes up a lot of time and space, and may even affect the normal operation of the business system

  • Incremental dump can only be used in conjunction with full dump for database recovery, and the data recovery time of incremental dump is longer than that of full dump only

  • A differential dump is a dump of the data changes that have occurred since the last full database dump. Differential dumps are faster and take up less space than full dumps. Incremental dumps only copy files or data blocks that have changed since the last dump

  • When formulating a backup strategy, in addition to considering the amount of data lost when using backup recovery, you also need to consider the time required for database backup.

  • Checkpointing greatly reduces the portion of the log that must be performed to fully restore the database

  • Although the static dump ensures the validity of the data, it is at the cost of reducing the availability of the database; although the dynamic dump improves the availability of the database, the validity of the data may not be guaranteed.

  • Compared with incremental dump, differential dump is slower and takes up more space, but the recovery speed is faster than incremental dump

  • Differential dump takes longer to restore than full dump

  • Checkpoints

  • The recorded content includes: a list of all transactions executing at the moment the checkpoint is established, and the addresses of the most recent log records of these transactions

  • The recovery subsystem can take occasional or periodic checkpoints to save the database state

  • When the system is recovering, if the transaction was not completed when the failure occurred, it should UNDO

  • If the transactions are committed after the checkpoint, the modifications they made to the database may still be in the buffer when the failure occurs, and have not been written to the database, so REDO

  • If the transaction is committed before the checkpoint, it is not necessary to perform the REDO operation

  • The checkpoint verifies the validity of the log, and the log can be repaired to a certain extent when the log is damaged

  • The checkpoint ensures that the REDO and UNDO operations can be executed concurrently when the database is restored

  • The database administrator should set up checkpoints regularly to ensure that the database system can recover quickly when it fails

  • The contents of the checkpoint record include the list of transactions being executed when the checkpoint is established and the address of the last log record of these transactions

  • While establishing a checkpoint, the database management system will record all the data in the current data buffer into the database

  • Checkpoints should be automatically established by the database recovery subsystem on a regular or irregular basis, and should not be manually established by the database administrator

  • When recovering using checkpoints, you need to find the address of the last checkpoint recorded in the log file from the "restart file".

  • The active-standby mode (Active-Standby mode) means that one server is in the active state (Active state) of a certain service, and the other server is in the standby state (Standby state) of the service. The feature of this method is that when the Active state server fails, the Standby machine will be activated through software diagnosis to ensure that the system can be restored to use in the shortest possible time.

  • Database recovery sequence

  1. Restore the most recent full database backup without recovering the database (WITH NORECOVERY)

  2. If differential backups exist, restore the most recent differential backup without recovering the database (WITH NORECOVERY)

  3. Restore the logs in sequence using the NORECOVERY option, starting with the first transaction log backup created after the last backup that was restored

  4. Recover the database; this step can also be combined with restoring the last log backup
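A sketch of this sequence for a hypothetical database DB1 (backup file paths are assumptions):

RESTORE DATABASE DB1 FROM DISK = 'D:\bk\DB1_full.bak' WITH NORECOVERY;  -- 1. full backup
RESTORE DATABASE DB1 FROM DISK = 'D:\bk\DB1_diff.bak' WITH NORECOVERY;  -- 2. latest differential
RESTORE LOG DB1 FROM DISK = 'D:\bk\DB1_log1.trn' WITH NORECOVERY;       -- 3. logs in sequence
RESTORE LOG DB1 FROM DISK = 'D:\bk\DB1_log2.trn' WITH RECOVERY;         -- 4. recover the database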

  • backup statement

BACKUP DATABASE <database_name> TO MyBK_1 WITH DIFFERENTIAL, NOINIT

  • DIFFERENTIAL: Indicates differential database backup

  • NOINIT: Indicates that the backup content will be appended to the specified media set to retain the original backup set

  • SQL Server 2008 supports three recovery models

  • Simple recovery mode: only used for test and development databases, or for databases that mainly contain read-only data (such as data warehouses), this mode is not suitable for production systems, no log backup, automatic recovery of log space to reduce space requirements, in fact no need to manage transaction log space. Changes made after the latest backup are not protected. In the event of a disaster, these changes must be redone. Can only restore to the end of the backup

  • Full recovery mode: The system data volume is large, but the data changes are small. Log backups are required. Loss or corruption of data will not result in lost work. Can be restored to any point in time (e.g. before application or user error)

  • Large-capacity log recovery mode: Generally, it is only used as an additional mode of the complete recovery mode, and this mode does not support point-in-time recovery. Log backups are required. Perform high-performance bulk copy operations. Reduce log space usage by logging most bulk operations in a minimal manner

  • database recovery

  • During a full database restore and recovery, the database is offline

  • Before restoring the database, if the database log is not damaged, a tail log backup can be performed to reduce data loss

  • During a database restore, the database can be moved to another location

  • SQL Server supports restoring a single data file of a database. During the restore, the database is automatically taken offline and the other files cannot be read or written, so there is some impact on availability

  • backup

  • Full backup: The first database backup requires a full backup, which is to back up all the contents of the database

  • Differential backup: backs up the modified part of the database after the most recent full backup of the backup database

  • Log backup: backs up the log records written since the previous backup; it cannot repair physical damage to the database. A transaction log backup sequence must be started after performing a full backup or a differential backup

  • Backup device can be disk, tape

  • The backup device can be a local device or a remote network device

  • Two backup methods

  • Create a backup device, and then back up the database to the backup device (permanent backup device)

  • Back up the database directly to a physical file (temporary backup device)

  • The T-SQL stored procedure for creating a backup device is sp_addumpdevice (see the sketch after this list)

  • Transaction log backup is only used for complete recovery and bulk log recovery. It does not back up the database itself, but only log records, and only backs up log records that have changed since the last backup to the current backup time. However, point-in-time recovery of large-capacity operation log backups is not allowed.

  • Tail log backup is performed when a failure occurs, used to prevent data loss, and can contain pure log records or large-capacity operation log records
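A sketch of the permanent-backup-device approach mentioned above (device name and path are assumptions):

EXEC sp_addumpdevice 'disk', 'MyBK_1', 'D:\backup\MyBK_1.bak';   -- create the backup device
BACKUP DATABASE DB1 TO MyBK_1 WITH DIFFERENTIAL, NOINIT;         -- differential backup, appended to the media set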

  • Distributed databases use data fragmentation to manage data:

  • Completeness principle: every data item of the global relation must be included in some fragment; otherwise the database is incomplete and some fragment data is lost

  • Reconstructability principle: it must be possible to reconstruct the global relation from all of its fragments

  • Disjointness principle: a data item of a global relation must not belong to more than one fragment; the exception is the key attributes in vertical fragmentation, which appear in every vertical fragment

  • In a distributed database, an allocation schema is used to describe the mapping of fragments to physical storage locations

  • Allocation mode:

  • Centralized: all data fragments are placed at a single site

  • Partitioned: there is exactly one copy of all global data, divided into several fragments, with each fragment assigned to a specific site

  • Full replication: There are multiple copies of global data, one full copy for each site

  • Hybrid: the global data is divided into several subsets, each placed at one or more different sites, and no site necessarily holds all the data

  • Distributed database fragmentation types:

  • Hybrid fragmentation: a mixture of the other three types; applying them in different orders gives different results

  • Horizontal fragmentation: divides all tuples of the global relation into several disjoint subsets according to certain conditions; each subset is a fragment of the relation

  • Vertical fragmentation: divides the attribute set of the global relation into several subsets and takes projections over these subsets

  • Derived fragmentation: also called derived horizontal fragmentation; the fragmentation condition comes not from the relation's own attributes but from the attributes of another relation

  • The goals of distributed databases: local autonomy, non-centralized management, high availability, location independence, data fragmentation independence, data replication independence, distributed query processing, distributed transaction management, hardware independence, operating system independence, network independence, database management system independence

  • The goals of parallel databases are high performance and high availability: by having multiple processing nodes execute database tasks in parallel, the performance and availability of the entire database system are improved

  • SQL Server2008 permission level:

  • GRANT: Allows a database user or role to perform authorized operations

  • DENY: Deny a specific privilege to a database user or role, and prevent them from inheriting this privilege from other roles

  • REVOKE: revoke permissions that have been granted

  • In designing and building a data warehouse, the designer investigates the users' decision-making or data processing requirements, groups requirements that are functionally similar and need related data support into different requirement sets, finds the data sets in the enterprise data model that can satisfy each requirement set, and then designs a data warehouse model for each data set. This design method is called the "subject-oriented" design method

  • OLAP implementation technologies are mainly divided into three categories

  • Based on relational database (ROLAP)

  • Based on multidimensional database (MOLAP)

  • Hybrid (HOLAP)

  • ODS is an optional part of the data warehouse architecture. ODS has some characteristics of the data warehouse and some characteristics of the OLTP system. It is "subject-oriented, integrated, current or near current, and constantly changing" data.

  • The first type of ODS, the update frequency is second level

  • The second type of ODS, the update frequency is hourly

  • The third type of ODS, the update frequency is a day level

  • ODSIV, the fourth type of ODS is divided according to the direction and type of data source

  • In parallel databases, the most suitable data division method for full table scan operations is the round robin method

  • Google's cloud database is a distributed structured data storage system called Bigtable

  • Knowledge discovery is mainly composed of three steps, data preparation, data mining, interpretation and evaluation of results

  • In a distributed database, if the user does not need to know the distribution of data fragments in each site when writing a program, the distributed database system is said to have location transparency

  • Set Operators in SQL

  • IN: determines whether a given value matches a value in the subquery or list, allowing selection of rows that match any of the values in the list

  • EXCEPT refers to data that exists in the first collection but does not exist in the second collection

  • INTERSECT, which refers to data that exists in both collections

  • UNION: combines the result sets of two or more SELECT statements. The SELECT statements must have the same number of columns, and corresponding columns must have compatible data types. UNION merges the result sets and automatically removes duplicate records from the merged result; it is used with horizontal splitting

  • Delete user-defined functions using the DROP FUNCTION statement

DROP FUNCTION { [schema_name.] function_name } [,...n]

  • CREATE FUNCTION: define a new function

  • ALTER FUNCTION: Modify the definition of a function

  • Data warehouse data maintenance strategy:

  • Snapshot : This method takes a "photograph" of the current data table, records the "photo" of the current data table information, and then compares the current "photo" with the previous data table "photo". If there is any inconsistency, it will be passed to the data warehouse in a certain way to achieve data consistency. This method is suitable for data tables with low update frequency. The trigger condition is time

  • Real-time maintenance: when the data source changes, the data warehouse is updated immediately, and the trigger condition is the update operation of the data

  • Delayed maintenance: the update is not completed within the data-source update transaction; it is completed when the data warehouse view is next queried. The trigger condition is the first query against the data warehouse after the data source has been updated

  • There is a student table (student number, name, department) and a course selection table (student number, course number, grade). Use a window function to query each student's name, department, and number of selected courses (excluding students who have not selected any course)

SELECT DISTINCT name, department,
       COUNT(*) OVER (PARTITION BY T1.student_number) AS courses_selected
FROM student_table T1 JOIN course_selection_table T2
     ON T1.student_number = T2.student_number

(the blank being tested is: PARTITION BY T1.student_number)

  • The SQL statement that defines a unique nonclustered index named idx1 on column c1 of table T

CREATE UNIQUE NONCLUSTERED INDEX idx1 ON T(c1)

(the blank being tested is: UNIQUE NONCLUSTERED)

  • The range division method divides the data file into n parts according to the value range of a certain attribute in the relationship, and puts them on the disk respectively. This method is suitable for range query and point query

  • Operation Management and Maintenance

  • Daily maintenance, the responsibilities of the database administrator (the main content of database maintenance):

  • Database dump and restore

  • As a database administrator, you should formulate a reasonable dump plan for the various kinds of data and regularly back up the database and log files, to ensure that once the database fails it can be restored to a normal state

  • Database security, integrity control

  • Create a new database user

  • The integrity constraints of the database will change and need to be continuously revised by the database administrator to meet the needs of users

  • Database performance testing and improvement

  • The database administrator should always check the operation of the database system and observe the dynamic changes of the database so that it can recover in time when the database fails or take other effective measures to protect the database

  • Database Reorganization and Refactoring

  • Restructuring: modifying the database schema (in contrast to reorganization, which adjusts storage, e.g., disk partitioning)

  • Reorganization

  • Database administrators should regularly reorganize the database, that is, make overall adjustments to the database storage space according to system design requirements, such as adjusting disk partition methods and storage space, rearranging data storage

  • Monitoring and analysis: administrators use tools to monitor the running status of DBMS

  • Benchmark program evaluation (assessment of the overall operating status of the DBMS)

  • Understand the system's current or historical load, configuration, application, and other information

  • Analyze performance parameters and environmental information from monitoring data

  • Performance tuning

  • System Upgrade

  • Transaction internal failures are divided into expected and unexpected; an arithmetic overflow failure is an unexpected transaction internal failure

  • In parallel databases, the shared-nothing structure is considered to be the best parallel structure to support parallel database systems , suitable for applications such as bank tellers

  • WITH GRANT OPTION: used when you want a user to be able to grant his permissions on to other users

  • Besides deleting or creating indexes, and converting between nonclustered and clustered indexes, system performance can also be improved by rebuilding indexes

  • There are two sources of login accounts:

  • SQL Server itself is responsible for the authenticated login user

  • A Windows network user who logs in to SQL Server, which can be a group account or a user account

  • Create table SQL template

CREATE TABLE <table_name>
(<column_name> <data_type> PRIMARY KEY,
 <column_name> <data_type>,
 ...
 FOREIGN KEY (<fk_column_name>) REFERENCES <target_table>(<target_column>)
);
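A concrete instance of the template, using hypothetical Student and SC tables:

CREATE TABLE Student
(sno    char(8)     PRIMARY KEY,
 sname  varchar(20) NOT NULL);

CREATE TABLE SC
(sno    char(8),
 cno    char(6),
 grade  int CHECK (grade BETWEEN 0 AND 100),   -- CHECK constraint
 PRIMARY KEY (sno, cno),
 FOREIGN KEY (sno) REFERENCES Student(sno));   -- referential integrity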
  • Transaction specification: transaction name, transaction description, data items accessed by the transaction, transaction user

  • exists: When there is data that meets the condition in the subquery, exists returns a true value, otherwise returns a false value

  • Derived redundant columns: columns added to a table whose values are computed from other data items in the table; they reduce join operations at query time and avoid the use of aggregate functions

  • After the database is created, you can manually expand the space of data files and log files

  • Count the number of distinct values in a column: COUNT(DISTINCT C1)

  • To exclude XXX's data records from the query result: NOT EXISTS

  • Database mirroring is divided into: high availability operation mode, high protection operation mode, high performance operation mode

  • There are many architectures for parallel databases

  • Shared memory structure: All processors share a common main memory structure through the network

  • shared disk structure

  • Shared-nothing structure: a single large high-performance computer is replaced by multiple smaller systems; a global data directory is maintained across the sites, and each site has independent memory and disks serving that site's server

  • Hierarchical structure: divided into two layers, the top layer is a shared nothing structure, and the bottom layer is a shared memory or shared disk structure

  • In the data warehouse, the method of maintaining the data based on the original data of the maintenance object according to the change of the data source is called the incremental maintenance method.

  • Cloud computing concentrates all computing resources and uses hardware virtualization technology to provide users with powerful computing, storage, and bandwidth resources

  • One-dimensional data partition

  • The result of hash division is suitable for point query and sequential scan

  • Compared with the round-robin method, both range partitioning and hash partitioning are more suitable for point queries

  • Although range division may cause uneven data distribution, it is very beneficial to range query and point query

  • In order to carry out effective database file organization and access path design, it is necessary to analyze and understand the data access characteristics of database transactions. Transaction analysis can be done according to:

  • Use the transaction-base table cross-reference matrix

  • Estimate the execution frequency of each transaction (the number of transaction executions per unit time)

  • For each basic table, summarize the operation frequency information of all transactions acting on the table

  • Define stored procedures

CREATE PROC <procedure_name>
    @a int,
    @<parameter_name> <data_type> [OUTPUT]    -- add OUTPUT if this is an output parameter; omit it otherwise
  • stored procedure

  • Default parameter values are allowed

  • Multiple input parameters are allowed

  • Can return multiple values to the caller through output parameters

  • Returns a status to the caller indicating whether execution succeeded
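A sketch showing an input and an OUTPUT parameter, assuming a hypothetical SC(sno, grade) table:

CREATE PROC dbo.p_AvgGrade
    @sno       char(8),                 -- input parameter
    @avg_grade numeric(5,2) OUTPUT      -- output parameter returned to the caller
AS
    SELECT @avg_grade = AVG(grade) FROM SC WHERE sno = @sno;
GO
DECLARE @g numeric(5,2);
EXEC dbo.p_AvgGrade '20230001', @g OUTPUT;  -- the caller must also write OUTPUT
PRINT @g;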

  • The log file is not included in the file group, and the log space and data space are managed separately

  • partition function

CREATE PARTITION FUNCTION <partition_function_name> (input_parameter_type)
AS RANGE [LEFT | RIGHT]    -- LEFT: each boundary value belongs to the partition on its left; RIGHT: to the partition on its right
FOR VALUES (boundary_value [,...n])
[;]

Here, n specifies the number of boundary values supplied in boundary_value; n can be at most 999. Number of partitions created = n + 1.
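For example, a sketch with two boundary values (n = 2) that creates n + 1 = 3 partitions:

CREATE PARTITION FUNCTION pf_demo (int)
AS RANGE RIGHT                 -- each boundary value belongs to the partition on its right
FOR VALUES (100, 200);         -- partitions: x < 100, 100 <= x < 200, x >= 200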

  • split table

  • Horizontal splitting: splits a table according to the usage characteristics of its data rows. All resulting tables have the same structure but store different data. It increases application complexity; in particular, querying all the data requires an added UNION operation

  • Vertical splitting: splits a table according to the characteristics of its columns. All resulting tables contain the primary key column, while the remaining columns differ. Queries need fewer I/Os, but the drawback is that querying all the data requires a JOIN operation

  • A transaction failure means the transaction ended without committing or rolling back, possibly leaving the database in an incorrect state. The recovery program must therefore force a rollback: without affecting other transactions, it uses the log file to undo the transaction's modifications to the database, restoring the database to the state before the transaction ran

  • Formulation of backup strategy

  • Define the type and frequency of backups

  • The characteristics and speed of the hardware required for backup

  • backup test method

  • Where and how to store backup media

  • The query cost of a distributed database is measured by I/O cost, CPU cost, and communication cost, with communication cost as the primary objective

  • The query cost of the centralized database is measured by I/O cost and CPU cost

  • One-dimensional data partitioning divides the entire relationship according to the value of a certain attribute of the relationship, which is called the partition attribute. One-dimensional data division includes round-robin, range division, and hash division .

  • Multidimensional data partitioning solves the problems of one-dimensional data partitioning. Multidimensional data partitioning divides the attributes of relation R into main partitioning attributes and auxiliary partitioning attributes .

  • Bigtable data model

  • Not only can the number of rows be increased or decreased at will, but also the number of columns can be expanded under certain constraints

  • Each cell carries a time tag, so it can store data for multiple different time versions

  • Each data row in the table has one and only one primary key and any number of columns

  • The columns of each row of data in the table may not be the same

  • Each cell is uniquely identified by the combination of row key, column key, and timestamp

  • In SELECT, the clauses for joins (a sketch follows this list):

  • Left outer join: LEFT OUTER JOIN

  • Right outer join: RIGHT OUTER JOIN

  • Full outer join: FULL OUTER JOIN
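
For example, a left outer join over assumed Students/Scores tables keeps every student, filling unmatched rows with NULLs:

SELECT s.sno, s.sname, sc.grade
FROM Students s
LEFT OUTER JOIN Scores sc ON s.sno = sc.sno;  -- students with no score row still appear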

  • In distributed database queries, the main sources of large data-transmission volumes are join operations and union operations

  • Supervised learning includes: support vector machine, naive Bayesian, k-nearest neighbor, etc.

  • ETL tools (Extract, Transform, Load) are the main technology for data integration: extraction, transformation, and loading

  • The life cycle of database application system: project planning, requirements analysis, system design, implementation and deployment, operation and maintenance

  • monitoring mechanism

  • Monitoring of the database architecture system: monitoring of basic space information, space usage, and remaining space

  • The main content of database performance monitoring includes: data buffer hit rate, library buffer, user locks, locks and waits, rollback segments, temporary segment usage, index usage, wait events, and the shared pool

  • Database recovery involves how to establish redundant data and how to use it. Techniques for establishing database redundancy include: data backup, logging to log files, database replication, and database mirroring

  • Automatic database recovery using database log files

  • Cloud Computing and Cloud Database:

  • One goal of cloud computing platforms is to provide application systems with almost unlimited computing resources

  • Cloud computing has the flexibility to provide users with short-term use of resources

  • MapReduce is an application-layer mechanism for writing large-scale distributed applications on top of large clusters

  • The cloud database architecture reduces the amount of communication between nodes in a distributed system through computation migration rather than data migration

  • Backup types include: database backup, file backup, transaction log backup
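
Sketches of the three backup types in T-SQL; the database, file, and path names are assumptions:

BACKUP DATABASE MyDB TO DISK = 'D:\backup\MyDB_full.bak';                    -- database backup
BACKUP DATABASE MyDB FILE = 'MyDB_data1' TO DISK = 'D:\backup\MyDB_f1.bak';  -- file backup
BACKUP LOG MyDB TO DISK = 'D:\backup\MyDB_log.trn';                          -- transaction log backup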

  • Hybrid OLAP (HOLAP) combines OLAP based on multidimensional databases (MOLAP) with OLAP based on relational databases (ROLAP)

  • Database application system life cycle, database application system design includes:

  • Conceptual design

  • Sorting out and designing business ER diagrams

  • Analyze and organize data dictionary and data flow diagram

  • Clarify modeling goals

  • define entity set

  • Define the relationship between entity sets

  • Logical design

  • Starting from the conceptual model of the database, design the logical structure of the database expressed as a logical schema

  • Decompose schemas vertically or horizontally

  • Convert ER diagram to relational schema

  • Normalize the relational schema

  • Design relational table structure

  • Physical design

  • Determine where the data is stored

  • ER Diagram to Relational Schema Conversion

  • Entity Conversion

When an ER diagram is converted into a relational schema, each entity becomes a relational schema: the entity's attributes become the attributes of the relational schema, and the entity's key becomes the key of the relation

  • Conversion of relationships between entities

There are three kinds of relationships between entities:

  • 1:1 (one to one)

  • 1:n (one-to-many), non-standard relationship

  • n:m (many-to-many), non-deterministic relationship

  • The recommended extension for transaction log files is .ldf; they store the log information used to recover the database. Each database must have at least one log file and may have more. There is no restriction on where log files are stored, a log file can be set to grow automatically, and there is no size limit on log files
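
A sketch showing a log file defined with automatic growth; all names, paths, and sizes are hypothetical:

CREATE DATABASE SalesDB
ON PRIMARY (NAME = SalesDB_data, FILENAME = 'D:\data\SalesDB.mdf')
LOG ON     (NAME = SalesDB_log,  FILENAME = 'D:\data\SalesDB.ldf',
            SIZE = 5MB, FILEGROWTH = 10%);  -- the .ldf log file grows automatically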

  • A login account is the user account for the database server. If no valid login name is specified, a user cannot connect to the SQL Server database server, and every newly created database user must correspond to a login name. A database user corresponds to one login name and cannot be added to a fixed server role. A database user obtains permissions only after being authorized (see the sketch below)
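
A minimal sketch of the login-to-user chain; the names and the granted table are assumptions:

CREATE LOGIN AppLogin WITH PASSWORD = 'P@ssw0rd!';  -- server-level login
USE MyDB;
CREATE USER AppUser FOR LOGIN AppLogin;   -- database user mapped to the login
GRANT SELECT ON dbo.Students TO AppUser;  -- permissions arrive only through authorization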

  • In the DBAS life cycle, the main tasks of planning and analysis include

  • System Planning and Definition

  • mission statement

  • Determine mission goals

  • Determine system scope and boundaries

  • Determine user view

  • project planning

  • Feasibility Analysis

  • Among the application types of enterprise information systems, OLAP applications refer to online analytical processing applications

  • Clustering divides a group of data objects into several groups by some method, making objects within a group as similar as possible and objects in different groups as different as possible (e.g., the K-means algorithm)

  • Classifier

  • statistical methods

  • Bayesian method

  • nonparametric method

  • machine learning method

  • decision tree method

  • rule induction

  • neural network approach

  • BP algorithm

  • Database management systems generally implement deadlock detection by periodically checking the transaction wait-for graph

  • When the user's permissions conflict with the permissions of the role, the permissions of the role shall prevail

  • The locking protocol in a database management system specifies when a transaction acquires a lock, how long it holds it, and when it releases it; among these protocols, the three-level locking protocol fully guarantees the consistency of concurrent transaction data

  • Data Division and Parallel Algorithm in Parallel Database

  • The computation of the aggregate functions SUM, MIN, and MAX can generally be parallelized by "partitioning first, then merging" (see the sketch after this list)

  • If the relation is range-partitioned and the sort attribute happens to be the partitioning attribute, the sorted results of the individual partitions can simply be concatenated to obtain a fully sorted relation

  • Round-robin partitioning is most suitable for applications that need to scan the entire relation; for such applications it achieves better load balancing and makes full use of parallelism

  • In data partitioning and parallel algorithms in parallel databases, range partitioning can cause uneven data distribution, resulting in reduced parallel processing capability
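
A sketch of "partition first, then merge" for SUM, assuming the relation is split into two partition tables Orders_p1 and Orders_p2 (assumed names):

SELECT SUM(partial_sum) AS total_amount  -- merge step: combine the partial results
FROM (
    SELECT SUM(amount) AS partial_sum FROM Orders_p1  -- partition step, runnable in parallel
    UNION ALL
    SELECT SUM(amount) AS partial_sum FROM Orders_p2
) AS partials;

For MIN or MAX the same shape applies, with MIN or MAX used in both steps.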

  • Oracle's security control mechanism can be divided into:

  • Database-level security control: ensured through user identity authentication and by granting users the corresponding system privileges

  • Table-level security control: ensured by granting or revoking object privileges (see the sketch after this list)

  • Row-level security control: ensured by granting or revoking object privileges

  • Column-level security control: ensured by granting or revoking object privileges
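
A sketch of table-level control through object privileges; Students and user1 are assumed names:

GRANT SELECT, UPDATE ON Students TO user1;  -- grant object privileges on a table
REVOKE UPDATE ON Students FROM user1;       -- revoke one of them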

  • Users in the Oracle database can be divided into DBA users and ordinary users according to their operation authority

  • SQL Server 2008 supports data recovery at two levels of database and data file
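
Sketches of the two levels, using assumed names and paths:

RESTORE DATABASE MyDB FROM DISK = 'D:\backup\MyDB_full.bak';                    -- database-level recovery
RESTORE DATABASE MyDB FILE = 'MyDB_data1' FROM DISK = 'D:\backup\MyDB_f1.bak';  -- data-file-level recovery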

  • Four ranking functions (see the sketch after this list)

  • DENSE_RANK: returns the rank of each row within the partition of the result set, with no gaps in the ranking

  • RANK: returns the rank of each row within the partition of the result set; ties leave gaps in the ranking

  • ROW_NUMBER: returns the sequential number of each row within the partition of the result set

  • NTILE: distributes the rows of an ordered partition into a specified number of groups
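
A sketch over an assumed Students(sname, score) table showing how the four functions differ on tied scores:

SELECT sname, score,
       RANK()       OVER (ORDER BY score DESC) AS rnk,       -- ties share a rank; the next rank is skipped
       DENSE_RANK() OVER (ORDER BY score DESC) AS drnk,      -- ties share a rank; no ranks are skipped
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_no,    -- unique sequential number within the partition
       NTILE(4)     OVER (ORDER BY score DESC) AS quartile   -- rows distributed into 4 groups
FROM Students;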

