Java Interview (3): Databases

Table of Contents

Normal forms (best remembered through a few examples)

Indexes (what an index is, advantages and disadvantages)

Index data structures (the underlying implementation)

Why indexes use a B+ tree rather than a B-tree

Index types

Clustered and non-clustered indexes

Considerations when creating an index

Notes on using indexes

Transactions (ACID)

Problems caused by concurrent transactions (dirty reads, non-repeatable reads, phantom reads)

Transaction isolation levels

Read locks and write locks

Handling high database concurrency

MySQL analysis tools

MyBatis cache

MyBatis second-level cache

The difference between #{} and ${} in MyBatis

Database design for high concurrency (at the million-row scale)

Database optimization

Master-slave replication

Choosing a MySQL storage engine

Database connection pools


Normal forms (best remembered through a few examples)

First normal form (1NF): every field of the table holds a single, indivisible (atomic) value. Any table in a relational database satisfies 1NF.
Second normal form (2NF): every non-key field must depend on the whole primary key, not on just part of a composite key. For example, in an order-detail table keyed by (order_id, product_id), a product_name column would depend only on product_id and would violate 2NF.
Third normal form (3NF): no non-key field may transitively depend on a candidate key through another non-key field; if one does, split it out into a separate table.

Indexes (what an index is, advantages and disadvantages)

An index is a sorted structure built over the values of one or more columns of a table; with an index, the database can access specific information in the table quickly.

Advantages:

  • Creating a unique index guarantees the uniqueness of each row of data in the table.
  • An index greatly speeds up data retrieval, which is the main reason to create one.
  • It speeds up joins between tables, which is especially valuable for enforcing referential integrity.
  • When a query groups or sorts data (GROUP BY / ORDER BY), an index can significantly reduce the time spent grouping and sorting.
  • Indexes allow the query optimizer to choose better execution plans, improving system performance.

Disadvantages:

  • Creating and maintaining indexes takes time, and that time grows with the amount of data.
  • Indexes occupy physical space beyond the space taken by the table's data, and a clustered index needs even more.
  • Whenever data in the table is inserted, deleted, or modified, the indexes must be maintained as well, which slows down data maintenance.

Index data structures (the underlying implementation)

B-tree, B+ tree, and B+ tree with sequential access pointers

https://www.cnblogs.com/songwenjie/p/9414960.html

Why indexes use a B+ tree rather than a B-tree

The choice between a B-tree and a B+ tree comes down to I/O. Every node of a B-tree stores data, while in a B+ tree only the leaf nodes store data, so each internal node can hold more keys; for the same amount of data, the B-tree is taller and a lookup requires more I/O operations. A database index is stored on disk, and when the data volume is large the whole index cannot be loaded into memory, so it is loaded one disk page at a time (each page corresponding to a node of the index tree). MySQL further optimizes the B+ tree at the storage layer: the leaf nodes are joined into a doubly linked list, and the head and tail of the list point to each other, closing the cycle.

Index types

Primary key index: built on the attribute (or group of attributes) that uniquely identifies a record; a table can have only one primary key index.

Unique index: prevents duplicate values in a column; a table can have multiple unique indexes.

Ordinary index: used to locate specific data quickly; add it to the fields that appear in query conditions. Do not add too many ordinary indexes, since they slow down insert, delete, and update operations.

Composite index: an index created over multiple fields. The index can be used only when the query includes its leading field(s), so place the most frequently used field first in the composite index; that makes the system most likely to use the index and lets it do its job.

Clustered and non-clustered indexes

Clustered index

In a clustered index the physical order of the records matches the order of the index, so queries are fast: once the first record matching the index value is found, the remaining matching records are physically contiguous on disk. A table's rows are stored in sorted order only when the table has a clustered index. The corresponding drawback is that modification is slow: to keep the physical order of the records consistent with the index, inserting a record may force the data pages to be reorganized.

Non-clustered index

In a non-clustered index the logical order of the index is not the same as the physical order of the rows on disk; both kinds of index use a B+ tree structure. A non-clustered index goes through an extra level of indirection and does not cause the data to be rearranged.

An example comparing the two

A clustered index is like the pinyin index of the Xinhua Dictionary: the entries follow the index order, so having found the entry for "love" (ài) you can read straight on in place to reach "cancer" (ái). A non-clustered index is like the stroke-count index: the order of the index entries does not match the physical order in which the entries are stored.

The fundamental difference

The fundamental difference between a clustered and a non-clustered index is whether the order of the table's records is the same as the order of the index.

Considerations when creating an index

Columns that should be indexed

  •  Columns that are searched frequently
  •  The primary key column
  •  Columns frequently used in joins, mainly foreign keys; an index speeds up the join
  •  Columns frequently searched by range
  •  Columns that frequently need to be sorted
  •  Columns that frequently appear in WHERE clauses

Columns that should not be indexed

  •  Columns rarely used in queries
  •  Columns with few distinct values, such as a gender column in a personnel table, or bit-typed columns
  •  Columns defined as text or image, because the amount of data in those columns is considerable
  •  Columns where modification performance matters far more than search performance, because adding an index improves searches but degrades modifications

Notes on using indexes

1. Indexes and NULL values

        The usual advice is that a column value of NULL is not included in the index, and that as soon as any column of a composite index entry contains NULL, the entry is useless to that composite index. So when designing the database, do not let fields default to NULL; use 0, a special value, or an empty string instead of NULL.

2. Composite indexes

        For example, consider the statement: select * from users where area = 'beijing' and age = 22; If we create separate single-column indexes on area and on age, then, because a MySQL query can use only one index, efficiency still improves considerably over a full table scan, but creating a composite index on (area, age) is more efficient. If we create a composite index on (area, age, salary), it is effectively equivalent to creating three indexes: (area), (area, age), and (area, age, salary). This is known as the best left-prefix property. Therefore, when creating a composite index, put the column used most often as a restriction leftmost, in decreasing order of use.
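As a runnable illustration of the left-prefix rule, the sketch below uses SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL; the table and index names are invented for the example, but the planner behavior it demonstrates is the same idea: a query that constrains the leading columns of a composite index can use it, while a query that skips the leftmost column cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "name" is deliberately left out of the index so the index is not covering.
conn.execute("CREATE TABLE users (name TEXT, area TEXT, age INTEGER, salary INTEGER)")
conn.execute("CREATE INDEX idx_area_age_salary ON users (area, age, salary)")

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether a query searches an index or scans the table
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Constrains the leading columns (area, age): the composite index applies.
p1 = plan("SELECT * FROM users WHERE area = 'beijing' AND age = 22")
# Skips the leftmost column (area): the composite index cannot be used.
p2 = plan("SELECT * FROM users WHERE salary = 5000")
print(p1)  # expected to mention idx_area_age_salary
print(p2)  # expected to be a full table scan
```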

3. Use short indexes

        When indexing a string column, specify a prefix length if possible. For example, for a CHAR(255) column whose first 10 or 20 characters are usually enough to distinguish values, index only that prefix (e.g. INDEX (col_name(10))) rather than the whole column. A short index not only speeds up queries but also saves disk space and I/O operations.

4. Indexes and sorting

        A MySQL query uses only one index, so if an index is already used in the WHERE clause, the columns in ORDER BY will not use one. Therefore do not sort when the database's default order already meets the requirement; try not to sort on multiple columns; and if sorting is unavoidable, preferably create a composite index over the sorted columns.

5. LIKE statements

        Using LIKE is generally discouraged; if it cannot be avoided, how you use it matters: LIKE '%aaa%' cannot use an index, whereas LIKE 'aaa%' can.
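A small runnable check of the two LIKE forms, again with SQLite standing in for MySQL. The index name is invented for the example, and case_sensitive_like is a SQLite-specific pragma its LIKE prefix optimization requires; the general point (a leading wildcard forces a full scan) carries over to MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA case_sensitive_like = ON")  # SQLite precondition for the LIKE prefix optimization
conn.execute("CREATE TABLE users (name TEXT, note TEXT)")
conn.execute("CREATE INDEX idx_name ON users (name)")

def plan(sql):
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

prefix   = plan("SELECT * FROM users WHERE name LIKE 'aaa%'")   # prefix match: becomes a range search
contains = plan("SELECT * FROM users WHERE name LIKE '%aaa%'")  # leading wildcard: full table scan
print(prefix)
print(contains)
```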

6. Do not perform operations on indexed columns

        A statement such as select * from users where YEAR(adddate) < 2007; applies a function to the column and defeats the index; rewrite it as adddate < '2007-01-01'.

7. Avoid NOT IN and <> operations

        NOT IN and <> do not use the index and lead to a full table scan. NOT IN can be replaced by NOT EXISTS, and id <> 3 can be rewritten as id > 3 OR id < 3.

Transactions (ACID)

In MySQL, transactions are mainly used for operations that involve large amounts of data and high complexity. For example, in a personnel management system, deleting a person means deleting not only the person's basic record but also the related data, such as mailboxes and articles; together, these database statements form a transaction!

  • In MySQL, only databases or tables that use the InnoDB engine support transactions.
  • Transactions maintain the integrity of the database by ensuring that a batch of SQL statements either all execute or none of them do.
  • Transaction management applies to insert, update, and delete statements.

Generally speaking, a transaction must satisfy four conditions (ACID): Atomicity (also called indivisibility), Consistency, Isolation (also called independence), and Durability.

  • Atomicity: all operations in a transaction either complete entirely or not at all; a transaction cannot stop at an intermediate step. If an error occurs during execution, the transaction is rolled back to the state before it began, as if it had never run.

  • Consistency: the integrity of the database is intact both before the transaction begins and after it ends. This means the data written must fully comply with all defined rules, including the exactness of the data, and the database can spontaneously complete its subsequent scheduled work.

  • Isolation: the database allows multiple concurrent transactions to read and modify its data at the same time; isolation prevents the inconsistent data that would result from the interleaved execution of concurrent transactions. Isolation is divided into levels: read uncommitted, read committed, repeatable read, and serializable.

  • Durability: once a transaction completes, its changes to the data are permanent and are not lost even if the system fails.

In the MySQL command line, transactions are committed automatically by default; that is, a COMMIT is performed immediately after each SQL statement executes. To use a transaction explicitly, open it with BEGIN or START TRANSACTION, or execute SET AUTOCOMMIT = 0 to disable auto-commit for the current session.

There are two main ways to control MySQL transactions:

1. With BEGIN, ROLLBACK, and COMMIT:

  • BEGIN  starts a transaction
  • ROLLBACK  rolls the transaction back
  • COMMIT  commits the transaction

2. By using SET directly to change MySQL's auto-commit mode:

  • SET AUTOCOMMIT = 0  disables auto-commit
  • SET AUTOCOMMIT = 1  enables auto-commit
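The BEGIN/ROLLBACK/COMMIT flow can be sketched with Python's built-in sqlite3 module, which follows the same model (the module issues the BEGIN implicitly before the first write statement). The table and amounts are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100)")
conn.commit()

def alice_balance():
    return conn.execute("SELECT balance FROM account WHERE name = 'alice'").fetchone()[0]

# 1. A failing unit of work: ROLLBACK restores the pre-transaction state (atomicity).
try:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'alice'")
    raise RuntimeError("simulated failure before the matching credit")
except RuntimeError:
    conn.rollback()
after_rollback = alice_balance()   # still 100: the debit never happened

# 2. A successful unit of work: COMMIT makes the change permanent (durability).
conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'alice'")
conn.commit()
after_commit = alice_balance()     # now 70
print(after_rollback, after_commit)
```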

 

Problems caused by concurrent transactions (dirty reads, non-repeatable reads, phantom reads)

  • Dirty read: a transaction reads data that another, uncommitted transaction has updated.
  • Non-repeatable read: within the same transaction, reading the same data repeatedly returns different results; in other words, later reads can see updates committed by another transaction in the meantime. By contrast, "repeatable read" guarantees that when a transaction reads the same data multiple times, the later reads do not see data that another transaction has updated and committed.
  • Phantom read: a transaction reads rows that another transaction has inserted and committed.

Transaction isolation levels

Level 1: Read Uncommitted

       At this isolation level, a transaction can see the results of all other transactions that have not yet committed. This level is rarely used in practice, because its performance is not much better than that of the other levels, and reading uncommitted data is also known as a dirty read.

Level 2: Read Committed

       This is the default isolation level of most database systems (but not MySQL's default). It satisfies the simple definition of isolation: a transaction's changes are invisible to other transactions until it commits. This level still allows so-called non-repeatable reads: other transactions may commit new changes between two of a transaction's reads, so the same SELECT may return different results.

Level 3: Repeatable Read

       This is MySQL's default isolation level. It guarantees that multiple concurrent reads of the same rows within one transaction see the same data. In theory, however, it leads to another thorny problem: the phantom read. Simply put, a phantom read occurs when a transaction reads a range of rows, another transaction inserts a new row into that range, and re-reading the range reveals the new "phantom" row. The InnoDB and Falcon storage engines solve this problem with a multi-version concurrency control (MVCC) mechanism.

Level 4: Serializable

       This is the highest isolation level. It forces transactions into a serial order so they cannot conflict with one another, which solves the phantom read problem. In short, it places a shared lock on every row that is read. At this level, large numbers of timeouts and heavy lock contention can occur.

Isolation level     Dirty read   Non-repeatable read   Phantom read
Read uncommitted    yes          yes                   yes
Read committed      no           yes                   yes
Repeatable read     no           no                    yes
Serializable        no           no                    no

Handling high database concurrency

Common solutions for high database concurrency:

  1. Add caching to the web application architecture: insert a cache layer between the web tier and the DB (database) tier. Main purposes: reduce the read load on the database and speed up data reads. Since the cache medium is memory, consider a distributed cache layer, which escapes the capacity limit of a single machine's memory and adds flexibility.
  2. Add a Redis cache database.
  3. Add database indexes, while avoiding two pitfalls: too many indexes can actually slow queries down, and every write to the table forces the affected indexes to be re-sorted, which also slows things down (building indexes on frequently modified tables is not recommended).
  4. Serve static pages: reduce the read traffic that users send to the database server.
  5. MySQL read/write splitting: as write pressure on the database grows, a cache layer (e.g. Memcached) only relieves read pressure, and concentrating reads and writes in one database leaves it overwhelmed. With read/write splitting, writes go only to the master server and reads go only to the read-only replica servers.
  6. Splitting tables and databases: split tables horizontally, for example by business type. Shortening a table improves database query speed (query time grows with the length of the data), which indirectly improves throughput at concurrency peaks.
  7. Load-balanced cluster: the ultimate solution. The most fundamental and effective way to improve performance is to improve the hardware configuration.

MySQL analysis tools

https://www.cnblogs.com/amiezhang/p/10217133.html

MyBatis cache

https://www.cnblogs.com/happyflyingpig/p/7739749.html

MyBatis second-level cache

The difference between #{} and ${} in MyBatis

In a Mapper.xml statement, MyBatis can pass a parameterType value into the SQL statement in two ways: #{} and ${}.

#{} is the one we use most often, generally because it prevents SQL injection. Simply put, with #{} the SQL statement is precompiled, and the #{} parameter is treated as a string placeholder. For example:

select * from student where student_name = #{name} 

After precompilation, it is parsed into a statement with the dynamic parameter marker ?:

select * from student where student_name = ?

With ${}, the parameter string is spliced directly into the SQL during dynamic resolution:

select * from student where student_name = 'lyrics'

To sum up:

With #{}, the SQL statement is compiled first and the value is bound afterwards.
With ${}, the value is substituted first and the SQL statement is compiled afterwards.
#{} prevents SQL injection to a large extent.
${} cannot prevent SQL injection.
${} is generally used to pass in database objects, such as a table name.
Wherever #{} can be used, do not use ${}.
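The same contrast can be demonstrated outside MyBatis with any precompiled-statement API. The Python sqlite3 sketch below splices a raw string into the SQL the way ${} does, and binds a ? placeholder the way #{} compiles to; the table and input values are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_name TEXT)")
conn.executemany("INSERT INTO student VALUES (?)", [("alice",), ("bob",)])

evil = "nobody' OR '1'='1"

# ${}-style: splice the raw string into the SQL text (vulnerable to injection)
spliced = "SELECT * FROM student WHERE student_name = '%s'" % evil
leaked = conn.execute(spliced).fetchall()   # the injected OR clause matches every row

# #{}-style: precompiled statement with a ? placeholder (safe)
safe = conn.execute(
    "SELECT * FROM student WHERE student_name = ?", (evil,)
).fetchall()                                # no student has that literal name: empty result

print(len(leaked), len(safe))  # 2 0
```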
 

Database design for high concurrency (at the million-row scale)

http://blog.itpub.net/26736162/viewspace-2651606/

Database optimization

  1. Data partitioning

    For queries over vast amounts of data, an important optimization is to effectively reduce the size of the data that must be stored and processed at once, so huge data sets can be partitioned; for example, data can be partitioned by year. Different databases partition in different ways, but the mechanisms are much the same. SQL Server, for instance, stores partitioned data in different filegroups placed on different disk partitions, spreading the data across files and reducing disk I/O and system load.

  2. Indexes

    An index generally speeds up data retrieval and joins between tables, for example a clustered index built on a table's primary key, or an aggregate index built on a date column. Indexes bring many advantages, but indexing also has to fit the actual situation: do not index every column. If the table is large, weigh the cost of creating and maintaining the index: the index itself occupies physical space and must be dynamically maintained as the table is modified, and if those costs exceed the speedup the index brings, it is not worth it.

  3. Caching

    Tools that process large volumes of data generally have to consider caching, and cache size settings also bear on processing performance. For example, when running an aggregation over 200 million rows of data, setting the cache to 100,000 rows per buffer is reasonable.

  4. Increasing virtual memory

    When system resources are limited and the volume of data to process is very large, insufficient memory can be addressed by increasing virtual memory.

  5. Batch processing

    Because the volume of data is huge, it can be processed in batches (similar to MapReduce in the cloud) and the results then merged: divide and conquer, since small data sets are easy to process well.

  6. Using temporary and intermediate tables

    As data volume grows, processing should consider summarizing in advance. The aim is to break the work up: turn a large table into small tables, process block by block according to some rule, and merge the results afterwards. Using temporary tables and saving intermediate results is very important here. If the data is truly massive and a large table cannot be processed at all, it can only be split into several small tables. If the processing requires a multi-step summary, summarize step by step.

  7. Query optimization

    The query statement has a large effect on query performance; narrow the result set as early as possible.

  8. Using views

    A view is a logical representation of a table and occupies no physical storage. For massive data distributed across base tables according to some rule, queries can be processed through views.

  9. Using stored procedures

    In stored procedures, try to use SQL's built-in return parameters rather than custom ones; reduce unnecessary parameters and avoid data redundancy.

  10. Replacing non-sequential access with sorted storage

    Non-sequential disk access forces the disk arm to move back and forth, making it the slowest disk operation, but a SQL statement hides this phenomenon, so a query can trigger many non-sequential page reads and slow down. Where possible, sort the data so that access becomes sequential.

  11. Using sampled data for data mining

    Data mining over massive data is on the rise. Faced with ultra-massive data, general data mining algorithms often process a sample; the error this introduces is small, while efficiency and the success rate of processing improve greatly. When sampling, pay attention to data integrity and guard against excessive bias.

Master-slave replication

MySQL master-slave replication copies data from a master node to one or more slave (replica) nodes. MySQL replicates asynchronously by default, so a replica does not need a permanent connection to the master to receive updates; updates can be applied over remote connections, and a replica can replicate all of the master's databases, specific databases, or specific tables.
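As an illustration of the minimal settings involved, a typical my.cnf fragment for one master and one replica might look like the following; the server IDs, file names, and the read_only choice are example values, not required ones:

```ini
# master (my.cnf)
[mysqld]
server-id = 1          # unique ID within the replication topology
log_bin   = mysql-bin  # enable the binary log that replicas read

# replica (my.cnf)
[mysqld]
server-id = 2
relay_log = relay-bin
read_only = 1          # optional: serve reads only, per read/write splitting
```

The replica is then pointed at the master with CHANGE MASTER TO ... followed by START SLAVE (CHANGE REPLICATION SOURCE TO / START REPLICA in MySQL 8.0.22 and later).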

Choosing a MySQL storage engine

Database connection pools

The basic idea of a database connection pool is to maintain a "buffer pool" of database connections. A certain number of connections are placed in the pool in advance; when a database connection is needed, one is simply taken from the pool and put back after use. By setting a maximum pool size, we prevent the system from endlessly creating connections to the database. More importantly, by monitoring the number and usage of the pool's connections, we get a basis for system development, testing, and performance tuning.

The core idea of a connection pool is connection reuse: by establishing the pool together with allocation and management policies, connections can be reused efficiently and safely, avoiding the overhead of frequently opening and closing database connections.
  Connection pooling works in three parts: establishing the pool, managing its connections, and closing the pool.
  First, establishing the pool. Usually during system initialization, the pool is created according to the system configuration and several connection objects are created in it, ready to be taken from the pool and used. Java provides many containers that make it easy to build a pool, such as Vector (thread-safe) and LinkedList.
  Second, managing the pool. The management policy is the core mechanism of connection pooling; how connections are allocated and released has a great impact on system performance. The policy is as follows:
  When a client requests a database connection, first check whether the pool has an idle connection. If it does, allocate that connection to the client and do the corresponding bookkeeping (mark the connection as in use and increment its reference count). If there is no free connection, check whether the number of currently open connections has reached the maximum; if not, create a new connection for the requesting client; if it has, wait up to the configured maximum waiting time, and if that time is exceeded, throw an exception to the client.
  When a client releases a database connection, first check whether the connection's reference count exceeds the prescribed value. If it does, remove the connection from the pool, and then check whether the pool's total number of connections has fallen below the minimum; if so, refill the pool. If the count is not exceeded, mark the connection as open and available for reuse.
  Third, closing the pool. When the application exits, close all connections in the pool and release the pool's resources; this process is simply the reverse of creation.
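The three-part lifecycle described above can be sketched as a toy pool in Python. The class and method names are invented for this example, the "connections" are plain objects created by any factory callable, and it omits the reference counting and minimum-size refill of a production pool:

```python
import queue

class Pool:
    """Toy connection pool: establish, manage (acquire/release), close."""

    def __init__(self, factory, size=3, timeout=1.0):
        self._timeout = timeout
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):                 # establish: pre-create the connections
            self._idle.put(factory())

    def acquire(self):
        try:
            # manage: hand out an idle connection, waiting up to the maximum time
            return self._idle.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("no free connection within the maximum wait time")

    def release(self, conn):
        self._idle.put(conn)                  # manage: mark reusable again

    def close(self):                          # close: drain and discard everything
        while not self._idle.empty():
            self._idle.get_nowait()

pool = Pool(factory=object, size=2, timeout=0.1)
a = pool.acquire()
b = pool.acquire()
try:
    pool.acquire()                            # pool exhausted: times out
except TimeoutError as e:
    print("exhausted:", e)
pool.release(a)
c = pool.acquire()                            # reuse works after a release
print(c is a)
pool.close()
```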
 

The main advantages of connection pooling

  1) Reduced connection-creation time. Pooled connections are ready-made and reusable; a request obtains one and accesses the database directly, reducing both the number of connections created and the time spent creating them.
  2) Faster system response. During initialization, the connection pool usually creates a number of database connections and keeps them on standby, so connection initialization is already complete. A service request can use a currently available connection directly, avoiding the time cost of connection setup and release and lowering overall system response time.
  3) Unified connection management. Without a pool, every database access creates a new connection, and such erratic connection demand strongly affects system stability, easily causing wasted resources and failures under high load. A pool controls the number of connections and keeps resource usage below a set level, maximizing performance and noticeably improving system stability when many users are applying load.



 

 


Origin blog.csdn.net/ziyou434/article/details/105079851