What is the difference between database sharding, table sharding, and partitioning?

1. Divide and combine

I have said this many times: don't fixate on one particular technology, because the underlying ideas are shared. What matters most is the way of thinking behind the programming.

When the amount of data is large, you need the idea of dividing, to refine the granularity. When the data is too fragmented, you need the idea of combining, to coarsen the granularity.

1.1 Divide

Many technologies apply the idea of dividing. Here are a few examples:

  • The evolution from centralized services to distributed services

  • From Collections.synchronizedMap(x) to the JDK 1.7 ConcurrentHashMap to the JDK 1.8 ConcurrentHashMap: the lock granularity is refined while thread safety is still guaranteed

  • From AtomicInteger to LongAdder, and ConcurrentHashMap's size() method: spreading the updates out reduces the number of CAS retries and speeds up multi-threaded accumulation of a single number

  • The JVM's G1 GC algorithm divides the heap into many regions for memory management

  • In HBase's RegionServer, the data is divided into multiple regions for management

  • In everyday development, thread pools are usually isolated per resource, aren't they?
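The LongAdder point above can be illustrated with a minimal sketch. The class name StripedCounterDemo and the thread and iteration counts are made up for illustration; only LongAdder and the executor classes come from the JDK.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class: several threads accumulate into one LongAdder.
final class StripedCounterDemo {

    // Increments a shared LongAdder from `threads` workers and returns the final sum.
    static long countWithLongAdder(int threads, int incrementsPerThread) {
        LongAdder counter = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // Under contention, updates are spread over internal cells
                    // instead of all threads retrying CAS on one variable.
                    counter.increment();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        // sum() adds the base value and every cell; it is exact here because
        // all workers have already finished.
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWithLongAdder(4, 100_000));
    }
}
```

The same loop with an AtomicInteger would also be correct, but every thread would compete on a single memory location. LongAdder trades a slightly more expensive read (sum) for cheaper writes.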


1.2 Combine

Many technologies also apply the idea of combining. Here are a few examples:

  • TLAB (Thread Local Allocation Buffer): each thread allocates from its own buffer, which avoids multi-thread conflicts and improves object allocation efficiency

  • Escape analysis: the memory for an object that does not escape is allocated directly on the stack without entering the heap, and is reclaimed together with the stack space. This reduces the number of temporary objects allocated in the heap

  • Under the CMS GC algorithm, although mark-sweep is used, there are options that support memory defragmentation, such as -XX:+UseCMSCompactAtFullCollection (whether to compact after a Full GC; the Stop The World pause becomes longer) and -XX:CMSFullGCsBeforeCompaction (compact after this many Full GCs)

  • Lock coarsening: when the JIT finds that a series of consecutive operations repeatedly lock and unlock the same object, it widens the scope of the synchronized region

  • Kafka's network transmission has settings that reduce network overhead by sending records in batches, such as batch.size and linger.ms

  • In everyday development, you usually call batch-fetch interfaces, don't you?
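The batching idea behind the last two bullets can be sketched as follows. This is only an illustration: BatchingSketch and toBatches are hypothetical names, and the actual remote call made once per batch is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: groups many small requests into a few larger ones.
final class BatchingSketch {

    // Splits `items` into consecutive batches of at most `batchSize` elements.
    static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Five ids become three calls instead of five.
        System.out.println(toBatches(Arrays.asList(1, 2, 3, 4, 5), 2));
    }
}
```

A caller would then invoke its batch interface once per inner list rather than once per id.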

2. Partition

This article is based on MySQL InnoDB.

With that background out of the way, let's get to the main topic, starting with partitioning. I have already written a blog post about MySQL partitioning, so I won't spend much time on it here.

For details, see: An article that takes you through partitioning in MySQL!

2.1 Implementation

How to set it up is covered in the post linked above. Here, just remember one rule: if the table has a primary key or a unique index, the partition column must be part of every such key.

Partitioning is done at the database level. It is transparent to the application, and no code needs to change.

2.2 Internal files

Go to the data directory first. If you don't know where it is, you can execute:

Next, look at the internal files:

We can see from the picture above that there are 2 types of files, .frm files and .ibd files:

  • .frm file: the table structure file

  • .ibd file: in InnoDB, the index and the data live in the same file. Because the Order table is divided into 5 partitions, there are 5 such files

  • .par file: you may or may not see a .par file. Note: starting with MySQL 5.7.6, the .par partition definition file is no longer created. Partition definitions are stored in the internal data dictionary.


2.3 Data processing

Partitioning a table can improve MySQL performance. An unpartitioned table has a single .ibd file and one large B+ tree. Once the table is partitioned, rows are placed into partitions according to the partitioning rule, so the one large B+ tree is split into several smaller ones.

Reads become more efficient when the query uses the partition key: MySQL first goes to the secondary-index B+ tree of the matching partition, then to the clustered-index B+ tree of that same partition.

If the query does not use the partition key, it is executed once in every partition, which causes many more logical I/Os!
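The effect can be modelled in a few lines. This is a toy model, not how MySQL implements pruning: the partition names follow the p_2018-style naming used later in this article, while the yearly bounds and the class PartitionPruningSketch are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of RANGE partitions keyed by year (assumed layout).
final class PartitionPruningSketch {

    // key = exclusive upper bound ("VALUES LESS THAN"), value = partition name
    static final TreeMap<Integer, String> PARTITIONS = new TreeMap<>();
    static {
        PARTITIONS.put(2019, "p_2018");
        PARTITIONS.put(2020, "p_2019");
        PARTITIONS.put(2021, "p_2020");
        PARTITIONS.put(Integer.MAX_VALUE, "p_max");
    }

    // Partitions to read when the WHERE clause limits the partition key
    // to the inclusive range [fromYear, toYear].
    static List<String> partitionsFor(int fromYear, int toYear) {
        List<String> touched = new ArrayList<>();
        // higherKey(x) is the first bound greater than x, i.e. the partition holding x.
        Integer bound = PARTITIONS.higherKey(fromYear);
        while (bound != null) {
            touched.add(PARTITIONS.get(bound));
            if (bound > toYear) {
                break; // this partition already covers toYear
            }
            bound = PARTITIONS.higherKey(bound);
        }
        return touched;
    }

    // Without a condition on the partition key, every partition is read.
    static List<String> allPartitions() {
        return new ArrayList<>(PARTITIONS.values());
    }

    public static void main(String[] args) {
        System.out.println(partitionsFor(2019, 2019)); // [p_2019]
        System.out.println(allPartitions());           // [p_2018, p_2019, p_2020, p_max]
    }
}
```

A query restricted to one year touches one small tree, while a query with no partition key condition has to visit all four.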

To see which partitions a SQL statement touches, use explain partitions select xxxxx (from MySQL 5.7 on, a plain explain already includes the partitions column). The output shows which partitions the select statement went through.

mysql> explain partitions select * from TxnList where startTime>'2016-08-25 00:00:00' and startTime<'2016-08-25 23:59:00';  
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+  
| id | select_type | table             | partitions | type | possible_keys | key  | key_len | ref  | rows  | Extra       |  
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+  
|  1 | SIMPLE      | ClientActionTrack | p20160825  | ALL  | NULL          | NULL | NULL    | NULL | 33868 | Using where |  
+----+-------------+-------------------+------------+------+---------------+------+---------+------+-------+-------------+  
row in set (0.00 sec)


3. Database and table sharding

As time passes and the business grows, the amount of data in the tables of a database keeps increasing, and operations on that data become more and more expensive.

The resources of one physical machine are limited, so the amount of data it can hold and its processing capacity eventually hit a ceiling. At that point, database and table sharding is used to carry super-large tables, the kind that cannot fit on a single machine.

The difference from partitioning is that a partitioned table generally stays on a single machine, and range partitioning by time is the most common choice because it makes archiving easy. Sharding has to be implemented in code (or middleware), whereas partitioning is implemented inside MySQL. Sharding and partitioning do not conflict and can be combined.

3.1 Implementation

3.1.1 When to shard

  • Storage above 100 GB

  • Daily data growth above 2 million rows

  • More than 100 million rows in a single table


3.1.2 Sharding key

The choice of sharding key is very important:

  1. In most scenarios this field is the one queries filter on

  2. It is numeric

userId is commonly used, since it meets both conditions.
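Routing by a numeric userId can be sketched like this. The shard counts, the db_NN naming, and the class ShardRouter are assumptions for illustration. Real middleware such as Sharding-JDBC lets you configure the rule instead of hand-coding it.

```java
// Hypothetical router: maps a userId to a database and a table by modulo.
final class ShardRouter {

    static final int DB_COUNT = 2;      // assumed number of databases
    static final int TABLES_PER_DB = 2; // assumed number of tables in each database

    // Returns "db_XX.base_info_YY" for the given userId.
    static String route(long userId) {
        // floorMod keeps the slot non-negative even for a negative id.
        int slot = (int) Math.floorMod(userId, (long) DB_COUNT * TABLES_PER_DB);
        int db = slot / TABLES_PER_DB;    // which database
        int table = slot % TABLES_PER_DB; // which table inside that database
        return String.format("db_%02d.base_info_%02d", db, table);
    }

    public static void main(String[] args) {
        System.out.println(route(5)); // db_00.base_info_01
        System.out.println(route(6)); // db_01.base_info_00
    }
}
```

Because most queries carry userId, they can be answered from exactly one table. A query without it would have to fan out to every shard.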

3.2 Distributed database middleware

Distributed database middleware comes in two architectures: proxy and client-side. Proxy-mode middleware includes MyCat and DBProxy; client-side middleware includes TDDL and Sharding-JDBC.

So what is the difference between the proxy and client-side architectures, and what are the advantages and disadvantages of each? A picture makes it clear.

In proxy mode, our select and update statements are sent to the proxy, and the proxy operates on the underlying databases. The proxy itself must therefore be highly available; otherwise, if the proxy goes down while the databases are still up, the service is gone all the same.

Client mode usually wraps the connection pool, connects to the different databases internally, and hands the SQL to a smart client for processing. It usually supports only one language; to use other languages, a client has to be developed for each of them.

The respective advantages and disadvantages are as follows: 

3.3 Internal files

I found an example that combines sharding with partitioning. It looks basically the same as a partitioned table, except that more tables mean more .ibd files. The file types are explained above:

[miaojiaxing@Grim testmydata]# ls | grep 'base_info'
base_info_00.frm
base_info_00#P#p_2018.ibd
base_info_00#P#p_2019.ibd
base_info_00#P#p_2020.ibd
base_info_00#P#p_2021.ibd
base_info_00#P#p_init.ibd
base_info_00#P#p_max.ibd
base_info_01.frm
base_info_01#P#p_2018.ibd
base_info_01#P#p_2019.ibd
base_info_01#P#p_2020.ibd
base_info_01#P#p_2021.ibd
base_info_01#P#p_init.ibd
base_info_01#P#p_max.ibd
base_info.frm
base_info.ibd


3.4 Problems

3.4.1 Transaction issues

Once the data is split across databases and tables, distributed transactions are unavoidable: how do you make sure that several records inserted into different databases either all succeed or all fail?

Some readers may think of XA, but XA performs poorly and requires MySQL 5.7. Flexible transactions are the current mainstream solution, and the TCC model is one kind of flexible transaction.

Each company has its own implementation for distributed transactions. Huawei uses Saga, Alibaba uses TXC, and Ant Financial uses DTX, which supports the FMT and TCC modes.


3.4.2 join problem

TDDL, MyCat, and others support cross-shard joins. Still, try to avoid cross-database joins, for example by storing redundant fields.

If a cross-shard join is unavoidable and the middleware supports it, you can use it directly. If it is not supported, query each side separately and combine the results in application code.
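Combining the results by hand can be sketched as below. The two maps stand in for the result sets of two separate queries against different shards; ManualJoinSketch and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical application-side join of two separately fetched result sets.
final class ManualJoinSketch {

    // Joins orders with user names by userId and returns "orderId:userName" rows.
    static List<String> joinOrdersWithUsers(Map<Long, String> userNamesById,
                                            Map<Long, Long> orderIdToUserId) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, Long> order : orderIdToUserId.entrySet()) {
            String name = userNamesById.get(order.getValue());
            if (name != null) { // inner-join semantics: skip orders without a matching user
                rows.add(order.getKey() + ":" + name);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Long, String> users = new TreeMap<>();   // result of the query on the user shard
        users.put(1L, "alice");
        users.put(2L, "bob");
        Map<Long, Long> orders = new TreeMap<>();    // result of the query on the order shard
        orders.put(100L, 1L);
        orders.put(101L, 2L);
        orders.put(102L, 9L);                        // no such user, dropped by the join
        System.out.println(joinOrdersWithUsers(users, orders)); // [100:alice, 101:bob]
    }
}
```

In practice the user lookup would itself be a batch query for the userIds that appear in the orders, so that only two round trips are needed.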

4. Summary

Sharding and partitioning serve different purposes. Sharding is for super-large tables that cannot fit on a single machine. A partitioned table usually stays on a single machine, and range partitioning by time is the most common choice because it makes archiving easy.

In terms of performance the two are similar, since both split one table into several. The difference is that partitioning is implemented inside MySQL, so it involves less data exchange than a sharding scheme.

Author: GrimMjx

www.cnblogs.com/GrimMjx/p/11772033.html


Origin blog.csdn.net/youanyyou/article/details/105525169