Background
In traditional banking IT infrastructure, online transaction systems and statistical analysis systems typically run on different technologies and physical devices, with ETL jobs periodically migrating online transaction data into the analysis system. As a shared data service resource pool, the same data may also be accessed by many different kinds of microservices. When online transactional workloads and audit-type analytical workloads run against the same data at the same time, the platform must guarantee complete physical isolation between the two kinds of requests, so that analytical work does not interfere with transactions.
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system that is well suited to big-data scenarios. Its main characteristics are:
massive table scale: billions of rows and millions of columns
column-oriented storage, with each column retrievable independently
However, HBase also has the following limitations in practice:
HBase supports only simple key-value queries and cannot execute complex statistical SQL
HBase does not support secondary (multiple) indexes
Operation and maintenance are complex: as part of a full Hadoop big-data stack, HBase is complicated to operate and problems are hard to diagnose
To solve these problems on a typical data set and service business platform, users need an elastically scalable distributed relational database that meets the following requirements:
standard SQL support
high concurrency
multi-index support
easy operation and maintenance
SequoiaDB uses a storage-compute separated architecture. On one hand, this provides unlimited horizontal scale-out for data tables; on the other, the compute layer can expose database instances of different types that are 100% compatible with the MySQL, PostgreSQL, and SparkSQL protocols and syntax. Beyond structured data, SequoiaDB can also host unstructured data in the same cluster, including JSON, S3 object storage, and POSIX file systems, so the database provides a complete data service resource pool for upper-layer microservice applications.
SequoiaDB supports complex SQL queries, multiple indexes, and high concurrency, and as a distributed open-source database it is simple to operate and maintain. To date, many users have migrated from HBase to SequoiaDB, and this article walks through the hands-on steps of migrating data from HBase to SequoiaDB.
1 Export HBase data
First, use Hive to export the HBase data to a CSV file. The HBase data structure is as follows:
Figure 1: HBase data structure
To integrate HBase with Hive, create a Hive external table mapped to the HBase table. A reference statement for creating the external table:
CREATE EXTERNAL TABLE hbase_user(
  id string, name string, phone string, birthday string,
  id_number string, gender string, email string, address string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- specify the column mapping; :key maps to the HBase RowKey
  "hbase.columns.mapping" = ":key,info:name,info:phone,info:birthday,info:id_number,info:gender,info:email,info:address"
)
-- specify the name of the HBase table to map
TBLPROPERTIES ("hbase.table.name" = "User");
The integration result is shown below:
hive (default)> SELECT * FROM hbase_user LIMIT 1;
OK
hbase_user.id  hbase_user.name  hbase_user.phone  hbase_user.birthday  hbase_user.id_number  hbase_user.gender  hbase_user.email  hbase_user.address
0004928c7287408085403c6ec4cd3c12  Liu Jing  15073203583  2013-07-15  211324199408301340  M  maojuan@yahoo.com  Block J Liupanshui Street Xuhui Heilongjiang Province 222331
Then export the Hive table data to CSV:
-- reference statement for exporting the Hive table to CSV
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/data/hbase_hive_export_user'
ROW FORMAT DELIMITED           -- specify the row format
FIELDS TERMINATED BY ','       -- use a comma as the column delimiter
SELECT * FROM hbase_user;
The exported data is in CSV format, as shown below:
[hadoop@node hbase_hive_export_user]$ tail -2 000000_0
ffdca61d22b74462aefdcb948d819542,Fang Zhiqiang,18598897076,1958-08-25,52062819960928857X,M,ming52@gmail.com,Block P Xunyang Street Taiyuan County Inner Mongolia 547199
ffdf82a4e2f84c3a9c99e726153d9496,Fu Yuhua,14509458979,1977-08-13,451022198005119836,M,Yanhe@hotmail.com,Block H Shanting Road Shenzhen County Liaoning Province 706208
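Before importing, it is worth merging the Hive part files and checking that every exported row has the expected 8 comma-separated fields, since a stray delimiter inside a field would shift columns on import. A minimal sketch, using inline sample rows instead of the real Hive export directory (the path and data here are illustrative only):

```shell
#!/bin/sh
# Stand-in for the Hive export directory; in a real run this would be the
# export directory (e.g. /tmp/data/hbase_hive_export_user) containing
# part files named 000000_0, 000001_0, ...
dir=$(mktemp -d)
cat > "$dir/000000_0" <<'EOF'
ffdca61d22b74462aefdcb948d819542,Fang Zhiqiang,18598897076,1958-08-25,52062819960928857X,M,ming52@gmail.com,addr1
ffdf82a4e2f84c3a9c99e726153d9496,Fu Yuhua,14509458979,1977-08-13,451022198005119836,M,Yanhe@hotmail.com,addr2
EOF

# Merge all part files into the single CSV that sdbimprt will read.
cat "$dir"/0*_0 > "$dir/user.csv"

# Count rows whose field count differs from the 8 columns mapped in Hive.
bad=$(awk -F',' 'NF != 8 { n++ } END { print n + 0 }' "$dir/user.csv")
total=$(awk 'END { print NR }' "$dir/user.csv")
echo "total=$total bad=$bad"
```

If bad is non-zero, inspect those rows before importing; sdbimprt would otherwise likely count them as parse failures or load shifted values.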
2 Import the CSV file into SequoiaDB
sdbimprt is the SequoiaDB data import tool; it can import data in CSV or JSON format into a SequoiaDB database.
Import the CSV file exported from HBase into SequoiaDB with the following command:
sdbimprt --hosts=localhost:11810 --type=csv --file=user.csv -c users -l employee --fields='id string,name string,phone string,birthday string,id_number string,gender string,email string,address string'
Here the collection space is named users and the collection is named employee. The execution result is as follows:
$ sdbimprt --hosts=localhost:11810 --type=csv --file=user.csv -c users -l employee --fields='id string,name string,phone string,birthday string,id_number string,gender string,email string,address string'
parsed records: 24282
parse failure: 0
sharding records: 0
sharding failure: 0
imported records: 24282
import failure: 0
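In a scripted migration it helps to parse the sdbimprt summary and fail fast if any records were rejected. A minimal sketch; the summary text is hard-coded here from the run above, whereas in practice it would be captured from the real command, e.g. out=$(sdbimprt ... 2>&1):

```shell
#!/bin/sh
# Sample sdbimprt summary output (normally captured from the real command).
out='parsed records: 24282
parse failure: 0
sharding records: 0
sharding failure: 0
imported records: 24282
import failure: 0'

# Pull the imported and failed counts out of the summary lines.
imported=$(printf '%s\n' "$out" | awk -F': ' '/^imported records/ { print $2 }')
failed=$(printf '%s\n' "$out" | awk -F': ' '/^import failure/ { print $2 }')

if [ "$failed" != "0" ]; then
  echo "import had $failed failed records" >&2
  exit 1
fi
echo "imported $imported records with no failures"
```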
3 Create the corresponding mapping table in the SequoiaDB MySQL instance layer
In the MySQL instance layer, create a table that maps to the imported SequoiaDB collection:
CREATE TABLE `employee` (
  `id` varchar(50) COLLATE utf8mb4_bin DEFAULT NULL,
  `name` varchar(50) COLLATE utf8mb4_bin DEFAULT NULL,
  `phone` varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,
  `birthday` datetime DEFAULT NULL,
  `id_number` varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,
  `gender` varchar(11) COLLATE utf8mb4_bin DEFAULT NULL,
  `email` varchar(50) COLLATE utf8mb4_bin DEFAULT NULL,
  `address` varchar(100) COLLATE utf8mb4_bin DEFAULT NULL
) ENGINE=SEQUOIADB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
4 View the data import status in the SAC graphical interface
SAC is a graphical management tool developed by SequoiaDB. It provides functions such as automated cluster deployment, host and cluster configuration, host and cluster status monitoring, and cluster data browsing, which greatly improve database administration efficiency.
Figure 2: SAC graphical interface feature overview
Log in to the SequoiaDB storage engine layer and check the record count: it is 24,282, so the data import completed successfully.
# sdb
> db = new Sdb("localhost", 11810)
> db.users.employee.count()
24282
Takes 0.002236s.
Figure 3: Viewing data in the SequoiaDB storage engine layer through the SAC graphical interface
Meanwhile, in the corresponding MySQL instance layer of SequoiaDB, the same count of 24,282 records can be seen:
$ /opt/sequoiasql/mysql/bin/mysql -h 127.0.0.1 -P 3306 -u root
...
mysql> use users;
Database changed
mysql> show tables;
+-----------------+
| Tables_in_users |
+-----------------+
| employee        |
+-----------------+
1 row in set (0.00 sec)

mysql> select count(*) from employee;
+----------+
| count(*) |
+----------+
|    24282 |
+----------+
1 row in set (0.00 sec)
In the SAC graphical interface, looking at the MySQL instance layer, the record count is likewise 24,282.
Figure 4: Querying data in the MySQL instance layer through the SAC interface
5 Validate multi-index queries in SequoiaDB
Create multiple indexes on the table in the SequoiaDB MySQL instance layer:
mysql> ALTER TABLE `employee` ADD INDEX index_id (`id`);
mysql> ALTER TABLE `employee` ADD INDEX index_name (`name`);
Then execute an indexed query in the MySQL instance layer:
mysql> select count(*) from employee where name='xiuyingxia';
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
With the data migrated from HBase to SequoiaDB, the imported records can be viewed in the SAC interface both from the underlying storage engine layer and from the MySQL instance layer, and multi-index queries are supported.
Summary
SequoiaDB supports complex SQL queries, multiple indexes, and high concurrency, and as a distributed open-source database it is simple to operate and maintain. To migrate from HBase to SequoiaDB, first export the data to a CSV file through Hive, then import it into SequoiaDB with the sdbimprt import tool.
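The two steps above can be sketched as a single script. This is a dry run that only assembles and prints the commands (hive and sdbimprt are assumed to be on PATH in a real migration; the paths, host, and collection names are the ones used in the walkthrough above):

```shell
#!/bin/sh
# Path used by the Hive export step in this article.
export_dir=/tmp/data/hbase_hive_export_user

# Step 1: export the Hive-mapped HBase table to CSV.
hive_cmd="hive -e \"INSERT OVERWRITE LOCAL DIRECTORY '$export_dir' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' SELECT * FROM hbase_user;\""

# Step 2: import the merged CSV into the users.employee collection in SequoiaDB.
import_cmd="sdbimprt --hosts=localhost:11810 --type=csv --file=$export_dir/user.csv -c users -l employee --fields='id string,name string,phone string,birthday string,id_number string,gender string,email string,address string'"

# Dry run: print the commands instead of executing them.
echo "$hive_cmd"
echo "$import_cmd"
```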