Source code:
https://github.com/brucexx/heisenberg
Its advantages:
- Sharding is decoupled from the application: using sharded databases and tables feels the same as using a single database and table.
- The pressure on the number of DB connections is reduced.
- Configuration can be hot-reloaded without restarting.
- It can be scaled horizontally.
- It follows the native MySQL protocol, so there is no language restriction: mysql client, C, Java, and so on can all talk to a heisenberg server.
- Through management commands you can inspect things such as connection counts, thread pools, and nodes, and adjust them.
- Sharding rules are customized with velocity scripts, which is quite flexible.

I did a brief share on this in the group before; this one goes a bit deeper. I am sharing it first to see whether anyone has better ideas for improvement in this area.
Let's start with an introduction to heisenberg
1. Heisenberg overall architecture
First, the overall structure:

The application acts as a MySQL client of the heisenberg cluster. Because heisenberg implements the native MySQL protocol, from the application's point of view it is equivalent to a plain single-database, single-table data source. Whether the caller is the mysql command-line client, a C client, a JDBC driver, or anything else, it can connect to the heisenberg server, and the server does the sharding work.

Access to the heisenberg cluster itself can be load-balanced with software or devices such as LVS or F5.

In fact the performance of a single heisenberg instance is quite good: in my stress test it sustained 2320 TPS with a load average of only about 0.1-0.3 (8-core CPU, 16 GB RAM). Since I could not get a physical MySQL machine, that had to do.
Server internal structure:

FrontConnectionFactory manages connections from applications, while ManagerConnectionFactory manages the internal administrative connections of the heisenberg server, used for functions such as hot-reloading the configuration after a change or closing a particular connection.

The MySQL protocol runs end to end between the application and the MySQL servers, and is ultimately parsed into the relevant MySQL packets: data packets, authentication packets, registration packets, and so on.

When the heisenberg server receives a SQL statement, it parses it via an AST into its type (DML, DCL, DDL) and extracts the values of the relevant columns. The statement then passes through the ServerRouter layer, where the sharding rules split it, and the resulting rewritten statements are dispatched to the corresponding data nodes for execution.

The sharding rules support two scripting syntaxes, velocity and groovy, to cover a range of needs flexibly. Groovy initializes the tables, databases, and their mapping relationships, and runs only once at load time; velocity is used to render the concrete sharding rules.
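As a rough mental model of this two-phase design, here is an illustrative Python sketch (not heisenberg's actual groovy/velocity code; all names are made up): an init step that runs once at load time plays the role of the groovy script, and a per-statement step plays the role of the velocity rendering.

```python
# Illustrative two-phase rule model (hypothetical names, not heisenberg code).

# Phase 1 -- runs once at load time (the role groovy plays):
# build the mapping from database index to data node.
def init_mapping(db_count=10):
    return {i: "transDN%d" % i for i in range(db_count)}

# Phase 2 -- runs for every SQL statement (the role velocity plays):
# render the rule against the column value extracted from the statement.
def render_rule(mapping, shard_value):
    db_index = int(str(shard_value)[-1])  # e.g. a "last digit" rule
    return mapping[db_index]

nodes = init_mapping()                 # once
print(render_rule(nodes, 1287824017))  # per statement -> transDN7
```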
OK, now that the principle is clear, let's look at how to actually use the sharding.
2. Heisenberg deployment
Maven and a JDK are required.

Download https://github.com/brucexx/heisenberg to your local machine, then run mvn package. A heisenberg-server-1.0.0.zip file will be generated under the local target directory. Unzip it with unzip heisenberg-server-1.0.0.zip and enter the conf directory, which contains the following files:
conf
---log4j.xml
---rule.xml
---schema.xml
---server.xml
log4j.xml needs no introduction; just note two of the logs it configures: sql_route.log records the time spent routing (splitting) each statement across databases and tables, and sql_execute.log records the total execution time of each SQL statement.
server.xml
"serverPort">8166 "managerPort">8266 "initExecutor">16 "timerExecutor">4 "managerExecutor">4 "processors">4 "processorHandler">8 "processorExecutor">8 "clusterHeartbeatUser">_HEARTBEAT_USER_ "clusterHeartbeatPass">_HEARTBEAT_PASS_
|
- serverPort: the service port, i.e. the port the upper-layer application connects to
- managerPort: the management port, i.e. the listening port used to operate on the server's configuration and so on
- initExecutor: number of initialization threads
- timerExecutor: number of heartbeat threads
- managerExecutor: number of management threads
- processors: number of processors that receive application traffic
- processorHandler: number of handler threads for requests received from applications
- processorExecutor: number of executor threads for requests received from applications
- clusterHeartbeatUser and clusterHeartbeatPass: used for cluster authentication; they do not need to be changed
"brucexx"> "password">st0078 "schemas">trans_shard
|
Brucexx is the user name of the custom application, and st0078 is the password of the custom application
Schemas are custom schemas, see schema.xml for details.
The schemas here can be multiple, separated by commas
Whitelist restrictions:
test
schema.xml configuration
MySQL data source:

<dataSource name="transDS" type="mysql">
  <property name="location">
    <location>10.58.49.14:8701/db$0-9</location>
  </property>
  <property name="user">root</property>
  <property name="password">st0078</property>
  <property name="sqlMode">STRICT_TRANS_TABLES</property>
</dataSource>
The MySQL data source is specified here; the trailing $0-9 is a shorthand notation. You can also define multiple locations inside the property, for example:

<property name="location">
  <location>10.58.49.14:8701/db0</location>
  <location>10.58.49.14:8701/db1</location>
  <location>10.58.49.14:8701/db2</location>
</property>

The effect is the same.
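To make the shorthand concrete, here is a small illustrative Python sketch of how a trailing $0-9 range could be expanded into explicit locations (heisenberg does the real expansion internally when parsing the config; this function is hypothetical):

```python
import re

def expand_location(location):
    """Expand a trailing $a-b shorthand, e.g. '10.58.49.14:8701/db$0-9'
    -> ['10.58.49.14:8701/db0', ..., '10.58.49.14:8701/db9']."""
    m = re.match(r"^(.*)\$(\d+)-(\d+)$", location)
    if not m:
        return [location]  # no shorthand: a single explicit location
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return ["%s%d" % (prefix, i) for i in range(lo, hi + 1)]

print(expand_location("10.58.49.14:8701/db$0-9"))
```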
Shard node configuration

A shard node is a logical node exposed to the outside through the schema; against the data sources it maps to primary / standby / disaster-recovery:

<dataNode name="transDN">
  <property name="dataSource">
    <dataSourceRef>transDS$0-9</dataSourceRef>
    <dataSourceRef>transSlaveDS$0-9</dataSourceRef>
    <dataSourceRef>transSlaveDS$0-9</dataSourceRef>
  </property>
  <property name="rwRule">m:0,s:1</property>
  <property name="poolSize">256</property>
  <property name="heartbeatSQL">select user()</property>
</dataNode>

In the dataSource property, the first entry is the primary, the second the standby, and the third the disaster-recovery source; configure as many as you need.

rwRule is the read/write-splitting rule: m and s give the read weights. Here the primary's read weight is 0 and the slave's is 1, i.e. full read/write splitting; with 1:1 the reads would be spread evenly between the two.

poolSize is the size of the connection pool to each MySQL DB, and heartbeatSQL is the heartbeat statement; unless you have special requirements, leave both unchanged.
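The m:s read weights can be pictured with the following illustrative Python sketch (the actual parsing and balancing inside heisenberg may differ):

```python
import random

def parse_rw_rule(rule):
    """Parse a rule string like 'm:0,s:1' into {'m': 0, 's': 1}."""
    return {k: int(v) for k, v in (pair.split(":") for pair in rule.split(","))}

def pick_read_source(rule):
    """Pick 'm' (master) or 's' (slave) for a read, weighted by the rule."""
    weights = parse_rw_rule(rule)
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names])[0]

# With m:0,s:1 the master weight is 0, so every read goes to the slave;
# with m:1,s:1 reads would be spread evenly between the two.
print(pick_read_source("m:0,s:1"))  # always 's'
```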
Schema configuration

<schema name="trans_shard">
  <table name="trans_online, trans_content, trans_tb" dataNode="transDN$0-9" rule="rule1"/>
</schema>

trans_shard is the schema exposed to applications, matching the name in server.xml. Under it there can be multiple tables that need sharding, e.g.:

<table name="trans_online" dataNode="transDN$0-9" rule="rule1"/>

Every table that needs sharding must be listed here. Unsharded tables are also allowed, for example:

<table name="tbxxx" dataNode="transDN0" ruleRequired="false"/>
rule.xml holds the sharding rule configuration; note that the column names in columns, dbRuleList, and tbRuleList must be written in upper case.

First, the overall configuration. dbRuleList contains the database-sharding rules.

dbRuleList may contain multiple dbRule entries: when the first does not match, the second is tried, and so on. This is inefficient, though, so if the rules can be distinguished up front, it is better to write a separate rule. The final result of a dbRule is the database-name suffix. For example, with databases named db0-db9, when the dbRule is rendered it takes the value of TRANS_ID and the script computes the second-to-last digit as the database suffix.

The table-sharding rules (tbRuleList) are configured the same way as the database rules above, here using the last two digits as the table suffix.

One unwritten rule: table names must be globally unique. For example, if db0 contains a table trans_tb00, then db1 must not also contain a table named trans_tb00.
Table initialization

A table map needs to be initialized, where the key is the database's index (for example, db0 has index 0) and the list holds the table-name suffixes inside each database. The purpose of this section is to define these databases and tables at initialization time.
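As an illustration of the mapping's shape (hypothetical Python; the real initialization is the groovy script, and the suffix scheme here just mirrors the trans_tb00-style names mentioned above):

```python
# Hypothetical sketch of the db-index -> table-suffix map that the
# initialization builds once at load time. With 10 databases and 10
# tables per database, giving each table the suffix "<db><table>"
# keeps table names globally unique (db0 holds trans_tb00..trans_tb09,
# db1 holds trans_tb10..trans_tb19, and so on).
def build_table_map(db_count=10, tables_per_db=10):
    table_map = {}
    for db_index in range(db_count):
        table_map[db_index] = ["%d%d" % (db_index, t) for t in range(tables_per_db)]
    return table_map

print(build_table_map()[0])  # suffixes of the tables living in db0
```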
How do you use it? Simply from the command line with an ordinary mysql client pointed at the serverPort, so there is nothing special to explain. wms_shard below is the logical sharding schema configured in server.xml (the demo that follows uses a wms_shard schema with wmsDN nodes rather than the trans_shard example above); the application only needs to access this schema.

show tables; also lists the tables you configured.

OK, some examples:
mysql> select * from t_user_id_map;
+-----------+---------------------------+-----------+------------+---------------------+---------------------+
| F_uid     | F_uname                   | F_enabled | F_user_id  | F_create_time       | F_modify_time       |
+-----------+---------------------------+-----------+------------+---------------------+---------------------+
| 105001050 | @8230762802717b6a723fe9cd | 1         | 1287824017 | 2014-03-10 15:38:44 | 2014-03-10 15:38:44 |
| 62000     |                           | 1         | 533885000  | 2014-03-26 23:02:31 | 2014-03-26 23:02:31 |
| 86000     |                           | 1         | 237406000  | 2014-03-27 01:04:23 | 2014-03-27 01:04:23 |
| 96000     |                           | 1         | 767684000  | 2014-03-27 00:30:32 | 2014-03-27 00:30:32 |
| 130000    |                           | 1         | 506552000  | 2014-03-27 15:57:31 | 2014-03-27 15:57:31 |
| 149000    |                           | 1         | 868483000  | 2014-03-27 15:50:09 | 2014-03-27 15:50:09 |
| 179000    |                           | 1         | 245626000  | 2014-03-26 21:33:46 | 2014-03-26 21:33:46 |

When the query does not include the sharding column, a full scan over every table is performed. We can observe this with explain:

mysql> explain select * from t_user_id_map;
+-----------+-----------------------------------
| DATA_NODE | SQL
+-----------+-----------------------------------
| wmsDN[0]  | select * from t_user_id_map_00_0
| wmsDN[0]  | select * from t_user_id_map_00_1
| wmsDN[0]  | select * from t_user_id_map_00_2
| wmsDN[0]  | select * from t_user_id_map_00_3
| wmsDN[0]  | select * from t_user_id_map_00_4
| wmsDN[0]  | select * from t_user_id_map_00_5
| wmsDN[0]  | select * from t_user_id_map_00_6
| wmsDN[0]  | select * from t_user_id_map_00_7
| wmsDN[0]  | select * from t_user_id_map_00_8
| wmsDN[0]  | select * from t_user_id_map_00_9
| wmsDN[1]  | select * from t_user_id_map_01_0
| wmsDN[1]  | select * from t_user_id_map_01_1
| wmsDN[1]  | select * from t_user_id_map_01_2
| wmsDN[1]  | select * from t_user_id_map_01_3
| wmsDN[1]  | select * from t_user_id_map_01_4
| wmsDN[1]  | select * from t_user_id_map_01_5
| wmsDN[1]  | select * from t_user_id_map_01_6
| wmsDN[1]  | select * from t_user_id_map_01_7
| wmsDN[1]  | select * from t_user_id_map_01_8
| wmsDN[1]  | select * from t_user_id_map_01_9
| wmsDN[2]  | select * from t_user_id_map_02_0
....

There are many tables here; the DATA_NODE column is the data node each statement is routed to.
mysql> select * from t_user_id_map where f_uid=196606999;
+-----------+---------+-----------+-----------+---------------------+---------------------+
| F_uid     | F_uname | F_enabled | F_user_id | F_create_time       | F_modify_time       |
+-----------+---------+-----------+-----------+---------------------+---------------------+
| 196606999 |         | 1         | 749331999 | 2014-04-04 14:46:58 | 2014-04-04 14:46:58 |
+-----------+---------+-----------+-----------+---------------------+---------------------+
1 row in set (0.04 sec)

The configuration here shards by the last three digits of F_uid: dbRuleList uses the 2nd and 3rd digits from the end, and tbRuleList uses the last digit.
Let's see how it is routed
mysql> explain select * from t_user_id_map where f_uid=196606999;
+-----------+--------------------------------------------------------+
| DATA_NODE | SQL                                                    |
+-----------+--------------------------------------------------------+
| wmsDN[99] | select * from t_user_id_map_99_9 where f_uid=196606999 |
+-----------+--------------------------------------------------------+
1 row in set (0.03 sec)
You can see the DATA_NODE is wmsDN[99], and the statement is routed to the corresponding table t_user_id_map_99_9.
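The routing above can be reproduced with a small illustrative Python sketch (the real rules are velocity scripts; this function is hypothetical): take the 3rd- and 2nd-to-last digits of f_uid as the database suffix and the last digit as the table suffix.

```python
def route(f_uid, base="t_user_id_map"):
    """Mimic the rule described above: db suffix = 3rd- and 2nd-to-last
    digits of f_uid, table suffix = last digit."""
    s = str(f_uid)
    db_suffix = s[-3:-1]            # '99' for 196606999
    tb_suffix = s[-1]               # '9'
    node = "wmsDN[%d]" % int(db_suffix)
    table = "%s_%s_%s" % (base, db_suffix, tb_suffix)
    return node, table

print(route(196606999))  # ('wmsDN[99]', 't_user_id_map_99_9')
```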
http://blog.sina.com.cn/s/blog_56d988430102vdfo.html