Sequoia Tech | Migrating from HBase to SequoiaDB in Practice

Background

In traditional banking IT architectures, online transaction systems and statistical analysis systems often use different technologies and physical hardware, and ETL jobs periodically move online transaction data into the analysis system. In a data service resource pool, however, the same data may be shared by different kinds of microservices. When online transaction workloads and audit-type workloads run against the same data at the same time, their requests must execute in fully isolated physical environments so that transactional and analytical workloads do not interfere with each other.

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system that is well suited to big data scenarios. It has the following characteristics:

    very large tables, with billions of rows and millions of columns
    column-oriented storage, with columns retrieved independently

However, HBase also has the following problems in practice:

    HBase supports only simple key-value queries and cannot run complex statistical SQL
    HBase does not support multiple indexes
    operation and maintenance are complex: as part of the full Hadoop big data analysis stack, it is complicated to operate and problems are hard to locate

To solve these problems, a typical data service platform needs an elastically scalable distributed relational database that meets the following requirements:

    standard SQL support
    high concurrency
    multiple indexes
    easy operation and maintenance

SequoiaDB uses an architecture that separates compute from storage. On the one hand, this architecture gives data tables unlimited horizontal scalability. On the other hand, the compute layer provides different types of database instances that are 100% compatible with the protocols and syntax of MySQL, PostgreSQL, and SparkSQL. In addition to structured data, SequoiaDB can support unstructured data in the same cluster, including JSON, S3 object storage, and POSIX file systems, so that the database as a whole provides a complete data service resource pool for upper-layer microservice applications.
SequoiaDB supports complex SQL queries, multiple indexes, and high concurrency, and as an open source distributed database it is simple to operate and maintain. Many users have already migrated from HBase to SequoiaDB. This article walks through a hands-on migration of data from HBase to SequoiaDB.

1 Export HBase data
First, use Hive to export the HBase data to a CSV file. The HBase data structure is as follows:

 

 

 Figure 1 HBase data structure

To integrate Hive with HBase, create a Hive external table associated with the HBase table. A reference statement:

    -- hbase.columns.mapping specifies the column mapping; :key maps to the HBase row key
    -- hbase.table.name specifies the name of the HBase table to map
    CREATE EXTERNAL TABLE hbase_user(
        id string,
        name string,
        phone string,
        birthday string,
        id_number string,
        gender string,
        email string,
        address string
    ) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:phone,info:birthday,info:id_number,info:gender,info:email,info:address")
    TBLPROPERTIES ("hbase.table.name" = "user");
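
The mapping string pairs each Hive column, in order, with either the row key (`:key`) or a `family:qualifier` name. As a minimal sketch of that rule, the following Python helper builds the string from a column list. The helper name `hbase_columns_mapping` is hypothetical (it is not part of Hive or HBase), and it assumes that every non-key column lives in the `info` column family, as in the statement above.

```python
def hbase_columns_mapping(columns, family="info", key_column="id"):
    """Build an hbase.columns.mapping value for the given Hive columns.

    Assumption: key_column maps to the HBase row key and all other
    columns are qualifiers of a single column family.
    """
    parts = []
    for column in columns:
        if column == key_column:
            parts.append(":key")
        else:
            parts.append(f"{family}:{column}")
    return ",".join(parts)


# Column order must match the Hive table definition.
columns = ["id", "name", "phone", "birthday",
           "id_number", "gender", "email", "address"]
mapping = hbase_columns_mapping(columns)
print(mapping)
```

Generating the string this way avoids hand-typed slips such as a stray space inside a qualifier name, which would silently map a column to the wrong HBase cell.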

   
The result of the integration is shown below:

    hive (default)> select * from hbase_user limit 1;
    OK
    hbase_user.id hbase_user.name hbase_user.phone hbase_user.birthday hbase_user.id_number hbase_user.gender hbase_user.email hbase_user.address
    0004928c7287408085403c6ec4cd3c12 Liu 15073203583 2013-07-15 211324199408301340 M maojuan@yahoo.com crystal City, Heilongjiang Province Xuhui Liupanshui Street Block J 222331


Export the Hive table data to CSV, using the following reference statement:

    -- ROW FORMAT DELIMITED selects delimited rows; FIELDS TERMINATED BY sets the column delimiter
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/data/hbase_hive_export_user'
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    SELECT * FROM hbase_user;


The exported data is in CSV format, as shown below:

    [hadoop@node hbase_hive_export_user.csv]$ tail -n 2 000000_0
    ffdca61d22b74462aefdcb948d819542,Bian Zhiqiang,18598897076,1958-08-25,52062819960928857X,M,ming52@gmail.com,Inner Mongolia Taiyuan County Xunyang History Street Block P 547199
    ffdf82a4e2f84c3a9c99e726153d9496,Fu Yuhua,14509458979,1977-08-13,451022198005119836,M,yanhe@hotmail.com,Liaoning Province Shenzhen Road shanting soldiers county seat h 706208
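
The import step reads a single file named user.csv, while Hive writes one or more part files such as 000000_0 into the export directory. The following is a minimal sketch of bridging the two, assuming part files whose names start with `0` and eight comma-separated fields per row. The helper name `merge_hive_export` and the demo rows are made up for illustration. Because Hive's delimited text output does not quote fields, a value that itself contains a comma would shift the columns, so rows with an unexpected field count are reported instead of being written.

```python
import glob
import os
import tempfile


def merge_hive_export(export_dir, out_path, expected_fields=8, delimiter=","):
    """Merge Hive part files into one CSV; return (rows_written, rejected_rows)."""
    written = 0
    rejected = []  # (part file name, line number) of rows with a bad field count
    # Assumption: part files are named like 000000_0, 000001_0, ...
    parts = sorted(glob.glob(os.path.join(export_dir, "0*")))
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        for part in parts:
            with open(part, encoding="utf-8") as src:
                for line_no, line in enumerate(src, start=1):
                    row = line.rstrip("\n")
                    if not row:
                        continue  # skip empty lines
                    if len(row.split(delimiter)) == expected_fields:
                        out.write(row + "\n")
                        written += 1
                    else:
                        rejected.append((os.path.basename(part), line_no))
    return written, rejected


# Demo on made-up rows: the second address contains an unquoted comma.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "000000_0"), "w", encoding="utf-8") as f:
        f.write("k1,A,1,1990-01-01,X1,M,a@example.com,Street 1\n")
        f.write("k2,B,2,1991-02-02,X2,F,b@example.com,City, Street 2\n")
    written, rejected = merge_hive_export(tmp, os.path.join(tmp, "user.csv"))

print(written, rejected)
```

Rejected rows can then be inspected by hand, or the export can be repeated with a delimiter that does not occur in the data.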


2 Import the CSV file into SequoiaDB

sdbimprt is SequoiaDB's data import tool. It can import data in JSON or CSV format into a SequoiaDB database.

Import the CSV file exported from HBase into SequoiaDB with the following command:

sdbimprt --hosts=localhost:11810 --type=csv --file=user.csv -c users -l employee --fields='id string,name string,phone string,birthday string,id_number string,gender string,email string,address string' 


Here the collection space is named users and the collection is named employee. The execution result is as follows:

 

    $ sdbimprt --hosts=localhost:11810 --type=csv --file=user.csv -c users -l employee --fields='id string,name string,phone string,birthday string,id_number string,gender string,email string,address string'
    parsed records: 24282
    parse failure: 0
    sharding records: 0
    sharding failure: 0
    imported records: 24282
    import failure: 0
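
The summary printed by sdbimprt can also be checked automatically. The sketch below is a hypothetical Python helper, not part of SequoiaDB. It assumes the summary keeps the `label: number` layout shown above, and it treats the import as clean when every failure counter is zero and the parsed and imported counts match.

```python
import re

SUMMARY_LINE = re.compile(r"^\s*([a-z ]+?)\s*:\s*(\d+)\s*$")


def parse_import_summary(text):
    """Collect 'label: number' lines from the import output into a dict."""
    counts = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def import_is_clean(counts):
    """True if no failures were reported and parsed == imported."""
    failures = [v for k, v in counts.items() if k.endswith("failure")]
    return (
        bool(failures)
        and all(v == 0 for v in failures)
        and "imported records" in counts
        and counts.get("parsed records") == counts["imported records"]
    )


# Summary text copied from the run shown above.
output = """
parsed records: 24282
parse failure: 0
sharding records: 0
sharding failure: 0
imported records: 24282
import failure: 0
"""
counts = parse_import_summary(output)
print(counts["imported records"], import_is_clean(counts))
```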


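3 Create the corresponding mapping table in the SequoiaDB MySQL instance layer

In the MySQL instance, create a table named employee in the users database so that it maps to the collection imported above. The id, phone, id_number, and address columns are declared as character types because the imported values are strings (for example, id is a 32-character hexadecimal key and id_number can end in X).

    CREATE TABLE `employee` (
      `id` varchar(32) COLLATE utf8mb4_bin DEFAULT NULL,
      `name` varchar(50) COLLATE utf8mb4_bin DEFAULT NULL,
      `phone` varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,
      `birthday` datetime DEFAULT NULL,
      `id_number` varchar(20) COLLATE utf8mb4_bin DEFAULT NULL,
      `gender` varchar(11) COLLATE utf8mb4_bin DEFAULT NULL,
      `email` varchar(50) COLLATE utf8mb4_bin DEFAULT NULL,
      `address` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL
    ) ENGINE=SEQUOIADB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;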
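
When choosing column types for the mapping table, it can help to test sample values from the export against MySQL's signed INT range (-2147483648 to 2147483647). The following is a minimal sketch using values taken from the exported rows shown earlier. The helper name `fits_mysql_int` is hypothetical.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a signed MySQL INT


def fits_mysql_int(value):
    """Return True if value parses as an integer within the INT range."""
    try:
        number = int(value)
    except ValueError:
        return False  # not a decimal integer at all
    return INT_MIN <= number <= INT_MAX


# Sample values from one exported row.
samples = {
    "id": "ffdca61d22b74462aefdcb948d819542",  # hexadecimal row key
    "phone": "18598897076",                    # 11 digits, exceeds INT_MAX
    "id_number": "52062819960928857X",         # ends with a letter
}
results = {column: fits_mysql_int(value) for column, value in samples.items()}
print(results)
```

None of the three sample values fits an INT column, which suggests storing them as character data.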


4 View the imported data in SAC, SequoiaDB's graphical interface

SAC is a graphical tool developed in-house for SequoiaDB. It provides automated cluster deployment, host and cluster configuration, host and cluster status monitoring, cluster data browsing, and other functions, and can greatly improve the efficiency of database management.

 

 Figure 2 Overview of the SAC graphical interface features
 

Log in to the SequoiaDB storage engine layer and check the record count. It is 24,282, so the data import completed successfully.

    # sdb
    > db = new Sdb("localhost", 11810)
    > db.users.employee.count()
    24282
    Takes 0.002236s.

 

 Figure 3 Viewing storage engine layer data in the SAC graphical interface

Meanwhile, the corresponding SequoiaDB MySQL instance also shows 24,282 records:

    $ /opt/sequoiasql/mysql/bin/mysql -h 127.0.0.1 -P 3306 -u root
    ...
    mysql> use users;
    Database changed
    mysql> show tables;
    +-----------------+
    | Tables_in_users |
    +-----------------+
    | employee        |
    +-----------------+
    1 row in set (0.00 sec)
    mysql> select count(*) from employee;
    +----------+
    | count(*) |
    +----------+
    |    24282 |
    +----------+
    1 row in set (0.00 sec)

In the SAC graphical interface, the MySQL instance layer also shows 24,282 records.

 

 Figure 4 Querying MySQL instance layer data in the SAC interface

5 Verify multi-index queries in SequoiaDB

Create multiple indexes in the SequoiaDB MySQL instance:

    mysql> ALTER TABLE `employee` ADD INDEX index_id (`id`);
    mysql> ALTER TABLE `employee` ADD INDEX index_name (`name`);


Run a query on an indexed column in the SequoiaDB MySQL instance:

    mysql> select count(*) from employee where name="xiuyingxia";
    +----------+
    | count(*) |
    +----------+
    |        2 |
    +----------+
    1 row in set (0.00 sec)
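
The same pattern can be tried without a SequoiaDB cluster. The following sketch uses Python's built-in sqlite3 module purely as a stand-in for the MySQL instance. It is not SequoiaDB, its query planner differs, and the three rows are made up. It creates two secondary indexes on one table and asks the engine which index serves the lookup by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id TEXT, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("k1", "xiuyingxia", "a@example.com"),
        ("k2", "xiuyingxia", "b@example.com"),
        ("k3", "someone", "c@example.com"),
    ],
)

# Two secondary indexes on the same table.
conn.execute("CREATE INDEX index_id ON employee (id)")
conn.execute("CREATE INDEX index_name ON employee (name)")

query = "SELECT count(*) FROM employee WHERE name = 'xiuyingxia'"
count = conn.execute(query).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row describes the access path.
plan = " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(count)
print(plan)
conn.close()
```

On a real deployment, the equivalent check would be MySQL's `EXPLAIN` statement run against the employee table.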


After the data is migrated from HBase to SequoiaDB, the imported data can be viewed in the SAC interface at both the underlying storage engine layer and the MySQL instance layer, and multiple indexes are supported.


Summary

SequoiaDB supports complex SQL queries, multiple indexes, and high concurrency, and as an open source distributed database it is simple to operate and maintain. To migrate from HBase to SequoiaDB, first export the data to a CSV file through Hive, then import it into SequoiaDB with the sdbimprt import tool.

Origin www.cnblogs.com/sequoiadbsql/p/11543096.html