Lu Hao: Analysis of Cloud Database HBase Product Architecture Scenarios

In the HBase special session of the 2018 Database Live Lecture Hall Summit, Alibaba Cloud technical expert Lu Hao presented a scenario analysis of the ApsaraDB for HBase product architecture. This article first covers the cloud HBase product architecture, then focuses on application scenarios and typical customer cases, then introduces cloud HBase kernel optimizations and features, and finally briefly covers platform operation and maintenance and stability assurance.
Live video: https://yq.aliyun.com/video/play/1333
PDF download: https://yq.aliyun.com/download/2458
Here are the highlights of the video content:

Cloud HBase Product Architecture

Relational databases mainly address small and medium-scale storage needs. As the data volume grows, sharding across databases and tables extends capacity to a degree, but it is complex to implement and the business has to be aware of it. When the data reaches massive scale, distributed mass storage is needed, and the database sacrifices some consistency guarantees to reach tens of millions of concurrent requests and QPS.
The problems encountered by traditional relational databases mainly include four aspects:

  1. Cost: High-end storage is generally required, which is expensive.
  2. Capacity: It cannot meet TB- to PB-level storage needs.
  3. QPS: It cannot meet the ultra-high concurrency requirements, and the performance cannot be scaled horizontally.
  4. Analysis: Lack of framework and support for analysis.

HBase, by contrast, runs on ordinary disks. Its distributed storage easily scales from GB to PB and can automatically scale out to serve up to 50 million QPS. Spark on HBase natively supports analysis, and analysis can be accelerated by analyzing HFiles directly.
HBase supports real-time updates, incremental imports, multi-dimensional deletes, random queries, and range queries. It is an online distributed NoSQL database with high scalability, high availability, high reliability, high performance, and high adaptability.

HBase also solves problems that relational databases cannot, supporting multiple versions, dynamic columns, heterogeneous storage, and more.
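
As a rough illustration of what multi-version and dynamic columns mean, the sketch below models a table as row key, then column, then timestamped versions. It is a toy in-memory structure, not the HBase client API; the class name `ToyTable`, its methods, and the `max_versions` parameter are hypothetical.

```python
class ToyTable:
    """Toy in-memory model of a multi-version table (hypothetical, not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> column name -> list of (timestamp, value), newest first
        self.rows = {}

    def put(self, row, column, value, ts):
        cells = self.rows.setdefault(row, {}).setdefault(column, [])
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)  # newest first
        del cells[self.max_versions:]  # drop versions beyond the limit

    def get(self, row, column, versions=1):
        """Return up to `versions` newest (timestamp, value) pairs."""
        return self.rows.get(row, {}).get(column, [])[:versions]

    def columns(self, row):
        """Columns are per-row: each row may carry a different set."""
        return sorted(self.rows.get(row, {}))


table = ToyTable(max_versions=2)
table.put("user1", "info:name", "Ann", ts=1)
table.put("user1", "info:name", "Anna", ts=2)
table.put("user1", "info:name", "Anne", ts=3)  # the oldest version (ts=1) is dropped
table.put("user2", "info:email", "bob@example.com", ts=1)  # a column user1 never had
```

Here `user1` and `user2` carry different columns without any schema change, and `table.get("user1", "info:name", versions=2)` returns the two newest versions of the same cell.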

ApsaraDB HBase

 

ApsaraDB HBase provides operation and maintenance systems covering security, multi-active deployment, stability, and synchronization. The bottom layer is built on shared storage to separate compute from storage. The HBase kernel we use is Alibaba's internal HBase version, which has many performance improvements over the open source version. HBase natively supports KV access, and integrating other components on top of HBase provides richer forms of access. We integrate well with other Alibaba products and can support stream processing, batch processing, and machine learning workloads.
The main features of ApsaraDB HBase include large capacity (200 GB to 10 PB), dynamic capacity expansion, high concurrency and high throughput (10,000 to 50 million QPS), and a powerful and rich ecosystem.

ApsaraDB HBase supports rich interfaces, such as KV, SQL, table storage, document types, etc.

ApsaraDB HBase comes in a cluster version and a single-node version. The single-node version mainly serves testing and development, at extremely low cost. The cluster version is further divided into cloud disk and local disk. Cloud disk separates storage from compute, so it can be expanded easily. Local disk is equivalent to building HBase on physical machines: storage and compute are not separated, but storage is cheap and latency is low.
ApsaraDB HBase integrates well with many products on the cloud, including support for:

  • EMR Spark: EMR includes Spark components that can access HBase and analyze its data; Spark Streaming can write data to HBase in real time;
  • ODPS SQL: HBase data can be synchronized to ODPS in real time, and ODPS can run offline computation to meet offline data warehouse needs;
  • Elasticsearch: Fields in HBase can be indexed to meet real-time retrieval requirements;
  • Blink: Stream computing writes to HBase.

[Figure: comparison of ApsaraDB HBase with open source HBase (EMR HBase or self-built)]

The difference between ApsaraDB HBase and open source HBase (EMR HBase or self-built) is shown in the figure. Cloud HBase is fully managed: Alibaba Cloud handles all operation and maintenance work, active-active deployment is supported, and the kernel has been optimized for performance and for primary/standby operation.
Compared with competing products, our product is more mature, with 2 to 3 times higher kernel performance, lower latency, and higher stability.

Cloud HBase Application Scenario Analysis and Typical Customer Cases

HBase has a wide range of application scenarios. By data type, HBase stores report, time series, log, message, recommendation, risk control, and trajectory data; by industry, it is used in e-commerce, the Internet of Things, chat software, finance, advertising, news, telecommunications, and more. Alibaba runs hundreds of clusters for hundreds of businesses, with 10,000+ nodes in total, PB+ of data, and 100 million+ TPS, mainly supporting logging, chat, monitoring, orders, IoT, risk control, and search services. Alibaba, JD.com, Xiaomi, Tencent, NetEase, 360, Zhihu, China Life, Telecom, and others all use HBase.

A car networking company

 

[Figure: HBase architecture at a car networking company]

A car networking company uses HBase in the architecture shown in the figure. Data collected through the Alibaba IoT suite is cleaned by stream computing and written to HBase, and the stored vehicle trajectory and sensor data are then analyzed and computed.
The rowkey is designed as Sub(Hash(Vehicle ID), 5) + Vehicle ID + Time. Each vehicle uploads about 1 KB every 10 seconds. GeoHash is used to store trajectory information. One million vehicles generate about 3 PB of data per year, and read and write requests exceed 1 million.
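
The rowkey formula and the storage figure above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the article does not name the hash function or encoding, so MD5, a hex prefix, fixed-length vehicle IDs, and a zero-padded epoch-seconds timestamp are all assumptions, and the function names are hypothetical. A hash prefix of this kind is commonly used to spread sequential IDs across regions while keeping each vehicle's records contiguous and ordered by time.

```python
import hashlib


def make_rowkey(vehicle_id: str, epoch_seconds: int) -> bytes:
    """Sub(Hash(Vehicle ID), 5) + Vehicle ID + Time, under the assumptions above."""
    # First 5 hex characters of the hash (MD5 is an assumption, not from the article).
    prefix = hashlib.md5(vehicle_id.encode("utf-8")).hexdigest()[:5]
    # Zero-padded timestamp so that lexicographic order matches time order.
    return f"{prefix}{vehicle_id}{epoch_seconds:010d}".encode("utf-8")


def yearly_raw_bytes(vehicles=1_000_000, bytes_per_upload=1_000, interval_s=10):
    """Raw bytes written per year, before compression or replication."""
    uploads_per_vehicle = 365 * 24 * 3600 // interval_s  # 3,153,600 uploads per year
    return vehicles * bytes_per_upload * uploads_per_vehicle


key_early = make_rowkey("V0001", 1_500_000_000)
key_late = make_rowkey("V0001", 1_500_000_010)
petabytes = yearly_raw_bytes() / 1e15  # about 3.15, in line with the "3 PB" figure
```

Because the prefix depends only on the vehicle ID, `key_early` and `key_late` share the same first 10 bytes and sort in time order, so one vehicle's trajectory for a time window can be read with a single range scan.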

White Knight (big data risk control company)

 

User behavior data is highly unstructured: it comes from different sources, and each source has a different structure. HBase handles storage of such varied structures well. Spark uses the raw data collected by crawlers and apps for algorithm training, the results are written back to HBase, and Spark SQL is used to generate reports.

Soul (social app)

 

Social messaging pushes messages in feed stream mode. The feed has to be queried from the database by dimensions such as time and interest, which demands very high system availability. We provide a dual-cluster guarantee: the SLA requirement is 99.99%, the peak read and write QPS of a single cluster exceeds 10 million, and the data volume reaches 30 TB.

A financial company (real-time query of historical data)

 

Financial companies need to retain historical data for a long time and query it in real time, a scenario where HBase has great advantages. Data is batch-loaded from ODPS into HBase, and Phoenix on HBase provides real-time SQL queries. A single table holds 1,000 billion records and has many secondary indexes over multiple index fields, with a data volume of up to 100 TB.
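
One way to picture a secondary index on a sorted key-value store is as a second sorted table whose key is the indexed value followed by the primary rowkey. The sketch below is a toy in-memory model under that assumption, not Phoenix code; the class `ToyIndexedTable`, its methods, and the sample field names are hypothetical.

```python
import bisect


class ToyIndexedTable:
    """Toy table with one secondary index (hypothetical, not the Phoenix API)."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.data = {}   # primary rowkey -> record (dict)
        self.index = []  # sorted list of (indexed value, primary rowkey)

    def put(self, rowkey, record):
        old = self.data.get(rowkey)
        if old is not None:
            # Remove the stale index entry before writing the new one.
            stale = (old[self.indexed_field], rowkey)
            del self.index[bisect.bisect_left(self.index, stale)]
        self.data[rowkey] = record
        bisect.insort(self.index, (record[self.indexed_field], rowkey))

    def find(self, value):
        """Range scan over the index entries whose first component equals value."""
        results = []
        i = bisect.bisect_left(self.index, (value,))
        while i < len(self.index) and self.index[i][0] == value:
            results.append(self.data[self.index[i][1]])
            i += 1
        return results


txns = ToyIndexedTable("account")
txns.put("txn1", {"account": "A", "amount": 10})
txns.put("txn2", {"account": "B", "amount": 20})
txns.put("txn3", {"account": "A", "amount": 30})
amounts_for_a = [r["amount"] for r in txns.find("A")]
```

The lookup by `account` touches only the matching index entries instead of scanning every row, at the cost of an extra index write on each `put`.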

Data flow

 

[Figure: overview of HBase data flow]

The figure shows the overall picture of HBase data flows that Alibaba Cloud has accumulated over the years. Data sources can be ECS services, sensors, and so on. Data can be written to HBase from a message queue via stream computing, written directly from ECS, or written directly from the message queue. Data from other sources can also be written in batches through data synchronization. On the export side, data can be read and analyzed in real time from ECS, or indexed in real time and synchronized to ES, among other options.
Many customers trust ApsaraDB HBase, including Dasouche, Qianxun Location, Rainbow Fund, Ant Financial, Yifangyun, Nanhua Futures, White Knight, etc.

Cloud HBase Kernel Optimization and Features

Alibaba has made hundreds of optimizations and functional improvements to the cloud HBase kernel, which has been proven by Tmall Double Eleven while serving the Alibaba Group: hundreds of clusters, 10,000+ machines, 1 billion QPS, and a largest cluster of 2,000 machines. It is widely applied, and the team includes 2 HBase PMC members, 3 committers, and dozens of kernel contributors who have contributed 200+ patches.
HBase performance optimization includes higher QPS, up to 200% higher random read, 50% higher random write, higher compression ratio, and smoother read and write latency.
Cloud HBase also has the following features:

  • Cloud HBase provides the incremental export function, which writes incremental data to the message middleware in real time, and then synchronizes the data to ODPS for offline analysis, or to ES for full-text indexing. The original data is stored in HBase, and the search fields are stored in ES.
  • Cloud HBase also supports enterprise-grade security: users log in to HBase with a username and password, a security whitelist can be configured, and data is encrypted.
  • Cloud HBase supports public network access, so it can be reached from your own development machine. This makes it convenient to set up development and test environments off the cloud and to migrate off-cloud HBase clusters to the cloud.

Cloud HBase Platform Operation and Maintenance and Stability Guarantee

Our data reliability reaches nine nines (99.9999999%), so data is almost never lost, and our service availability is 99.9% for a single cluster and 99.99% for dual clusters.
ApsaraDB HBase provides many guarantees, including operation and maintenance automation, an automatic guardian service, online expansion of nodes and disks, online kernel upgrades, availability detection and capacity alarms, 15-minute fast delivery, metric visualization, and 24-hour online expert service.
For stable operation and maintenance, we provide hot spot detection and automatic migration, staged major compaction, read-write separation, alarms on large scans, scheduled automatic HDFS balancing, and more parameters that take effect online. ApsaraDB HBase active-active deployment guarantees availability, with a switchover time within 20 seconds.
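
To put the availability figures above in concrete terms, the sketch below converts an availability level into expected downtime per year and computes a theoretical bound for two clusters. The function names are hypothetical, and the two-cluster formula assumes independent failures and instantaneous failover, which is an idealization rather than a statement about how the service is measured.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def yearly_downtime_minutes(availability: float) -> float:
    """Expected downtime per year for a given availability (0.999 means 99.9%)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(single: float, clusters: int = 2) -> float:
    """Idealized availability when any one of `clusters` independent clusters suffices."""
    return 1.0 - (1.0 - single) ** clusters


single_cluster = yearly_downtime_minutes(0.999)   # about 525.6 minutes (8.76 hours)
dual_cluster = yearly_downtime_minutes(0.9999)    # about 52.6 minutes
ideal_dual = parallel_availability(0.999)         # about 0.999999 under the assumptions
```

The idealized bound for two 99.9% clusters is higher than the stated 99.99%; a commitment below the bound is plausible because switchover takes time (within 20 seconds, as noted above) and real failures are not fully independent.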

This article was compiled by Mao He of the Yunqi Volunteer Group and edited by Baijian.

Read the original text http://click.aliyun.com/m/41278/
