HBase (1): Basic Introduction

1. Introduction to NoSQL

1.1 What is NoSQL

NoSQL stands for "Not Only SQL" and refers to non-relational databases.

NoSQL is a general term that:

  • Refers to databases that do not follow the traditional RDBMS model
  • Covers data that is non-relational and does not use SQL as the main query language
  • Aims to solve database scalability and availability problems
  • Generally does not aim to provide the atomicity and consistency guarantees of an RDBMS

1.2 Why use NoSQL

As the Internet has grown, traditional relational databases have hit bottlenecks in meeting these requirements:

  • Highly concurrent reads and writes
  • Large storage capacity
  • High availability
  • High scalability
  • Low cost

Comparison of NoSQL and relational databases

The main differences are as follows:

| Aspect | NoSQL | Relational database |
| --- | --- | --- |
| Common databases | HBase, MongoDB, Redis | Oracle, DB2, MySQL |
| Storage format | Documents, key-value pairs, graph structures | Tables with rows and columns |
| Storage conventions | Redundancy is encouraged | Normalized, duplication is avoided |
| Scaling | Horizontal (scale-out), distributed | Vertical (scale-up), with limited horizontal scaling |
| Query model | Unstructured queries, no standard query language | Structured Query Language (SQL) |
| Transactions | Transactional consistency is generally not supported | Transactions are supported |
| Performance | High read and write performance | Comparatively lower read and write performance |
| Cost | Simple to deploy, open source, low cost | High cost |

1.3 Features of NoSQL

  • Eventual consistency

  • The application takes on more responsibility for maintaining consistency and handling transactions

  • Redundant data storage

  • NoSQL != big data

    • NoSQL products help solve the storage problems of big data
    • Big data involves more than data storage; it also covers technologies such as
      • Hadoop
      • Kafka
      • Spark, etc.

1.4 Basic Concepts of NoSQL

  • Three cornerstones
    • CAP, BASE, eventual consistency
  • Indexing and querying
  • MapReduce
  • Sharding
  1. CAP theorem
  • A distributed database can guarantee at most two of the following three properties
    • Consistency
    • Availability
    • Partition tolerance
  • NoSQL databases do not guarantee ACID
  • Instead, they provide eventual consistency
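
As a rough illustration of the CAP trade-off, the sketch below models a single value replicated on two nodes and a network partition between them. The class `TwoReplicaStore` and the mode labels "CP" and "AP" are hypothetical names chosen for this example; they do not correspond to any real database API.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a request during a partition."""


class TwoReplicaStore:
    """Toy model: one value replicated on two nodes, 'a' and 'b'."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.values = {"a": None, "b": None}
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: update both replicas together.
            self.values["a"] = self.values["b"] = value
        elif self.mode == "CP":
            # Keep consistency by giving up availability.
            raise Unavailable("cannot reach the other replica")
        else:
            # Keep availability; the replicas may now disagree.
            self.values[node] = value

    def read(self, node):
        return self.values[node]


cp = TwoReplicaStore("CP")
ap = TwoReplicaStore("AP")
for store in (cp, ap):
    store.write("a", 1)
    store.partitioned = True   # simulate a network partition

ap.write("a", 2)                     # accepted, but only on replica a
print(ap.read("a"), ap.read("b"))    # 2 1

try:
    cp.write("a", 2)
except Unavailable as exc:
    print("CP store rejected the write:", exc)
```

In the AP case the two replicas diverge until the partition heals, which is the situation that eventual consistency is meant to resolve.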

  2. BASE
  • Basically Available
    • The core functionality stays available
  • Soft state
    • State may be out of sync for a while
  • Eventual consistency
    • After some period of time, the data reaches a consistent state
  • The core idea: even when strong consistency cannot be achieved, the application can choose a suitable way to reach eventual consistency
  3. Eventual consistency
  • The data becomes consistent in the end, but is not consistent at every moment
  • Data such as account balances and inventory must be strongly consistent
  • Information such as a catalog does not require strong consistency
  • Common variants of eventual consistency
    • Causal consistency
    • Read-your-writes consistency
    • Session consistency
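
To make the difference between a plain eventually consistent read and a read-your-writes read concrete, here is a minimal sketch. It assumes a single primary copy and one replica that is updated lazily; `LaggingReplicaStore` and its methods are hypothetical names invented for this illustration.

```python
from collections import deque


class LaggingReplicaStore:
    """Toy primary/replica pair where replication is applied lazily."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()   # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply all queued writes to the replica, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

    def read_replica(self, key):
        """Eventually consistent read: may return stale data."""
        return self.replica.get(key)

    def read_your_writes(self, key, session_writes):
        """Serve keys this session wrote from the primary, others from the replica."""
        if key in session_writes:
            return self.primary.get(key)
        return self.replica.get(key)


store = LaggingReplicaStore()
store.write("catalog:42", "blue mug")

stale = store.read_replica("catalog:42")                     # not replicated yet
own = store.read_your_writes("catalog:42", {"catalog:42"})   # sees its own write
store.replicate()
fresh = store.read_replica("catalog:42")                     # replica has caught up

print(stale, own, fresh)   # None blue mug blue mug
```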

Indexing and querying

  • Indexing
    • Most NoSQL databases index data by key
    • Some NoSQL databases allow secondary indexes
    • HBase uses HDFS, which is append-only
      • Writes are logged and written in batches
      • Files are re-created and sorted
  • Querying
    • There is usually no dedicated query language; queries are typically written in a scripting language
    • Some databases have started to support SQL queries
    • Some can be queried with MapReduce code
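
The difference between a key index and a secondary index can be sketched in a few lines. This is an in-memory toy, not how any particular NoSQL product implements indexing; `IndexedStore` and `find_by_field` are hypothetical names.

```python
from collections import defaultdict


class IndexedStore:
    """Toy key-value store with one optional secondary index on a field."""

    def __init__(self, indexed_field):
        self.rows = {}                        # primary index: key -> record
        self.indexed_field = indexed_field
        self.secondary = defaultdict(set)     # field value -> set of keys

    def put(self, key, record):
        old = self.rows.get(key)
        if old is not None and self.indexed_field in old:
            # Remove the stale secondary entry before overwriting.
            self.secondary[old[self.indexed_field]].discard(key)
        self.rows[key] = record
        if self.indexed_field in record:
            self.secondary[record[self.indexed_field]].add(key)

    def get(self, key):
        """Lookup by key: the access path every store supports."""
        return self.rows.get(key)

    def find_by_field(self, value):
        """Lookup by indexed field value, without scanning every record."""
        return sorted(self.secondary.get(value, ()))


users = IndexedStore(indexed_field="city")
users.put("u1", {"name": "Ann", "city": "Oslo"})
users.put("u2", {"name": "Bo", "city": "Rome"})
users.put("u3", {"name": "Cy", "city": "Oslo"})
users.put("u2", {"name": "Bo", "city": "Oslo"})   # u2 moves to Oslo

print(users.find_by_field("Oslo"))   # ['u1', 'u2', 'u3']
print(users.find_by_field("Rome"))   # []
```

The extra bookkeeping in `put` hints at why secondary indexes are optional in many stores: every write has to keep two structures in sync.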

MapReduce and sharding

  • MapReduce
    • Not Hadoop's MapReduce specifically, although the concept is related
    • Used for data processing and querying
  • Sharding
    • A partitioning scheme
    • Shards can be replicated
    • Replication helps with disaster recovery
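
The following sketch combines both ideas under simple assumptions: keys are assigned to shards by a stable hash, every shard keeps two in-memory replicas, and a word count is computed with a map step per shard followed by a reduce step. The shard count, replica count, and all function names are illustrative choices, not taken from a specific system.

```python
import hashlib
from collections import Counter
from functools import reduce

NUM_SHARDS = 4   # assumed shard count
REPLICAS = 2     # assumed number of copies per shard


def shard_for(key):
    """Pick a shard from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


# shards[i][r] is replica r of shard i
shards = [[{} for _ in range(REPLICAS)] for _ in range(NUM_SHARDS)]


def put(key, value):
    """Write the value to every replica of the key's shard."""
    for replica in shards[shard_for(key)]:
        replica[key] = value


def get(key, failed_replica=None):
    """Read from the first replica that is not marked as failed."""
    for r, replica in enumerate(shards[shard_for(key)]):
        if r != failed_replica:
            return replica.get(key)
    return None


def map_shard(shard):
    """Map step: count words in the documents held by one shard."""
    counts = Counter()
    for text in shard.values():
        counts.update(text.split())
    return counts


docs = {
    "d1": "hbase stores data on hdfs",
    "d2": "hbase is a nosql store",
    "d3": "hdfs is append only",
}
for doc_id, text in docs.items():
    put(doc_id, text)

partials = [map_shard(replicas[0]) for replicas in shards]
totals = reduce(lambda a, b: a + b, partials, Counter())   # reduce step

print(get("d1", failed_replica=0))      # still readable from the second replica
print(totals["hbase"], totals["hdfs"])  # 2 2
```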

1.5 NoSQL classification

NoSQL databases fall mainly into the following four categories:

| Category | Examples | Typical application scenarios |
| --- | --- | --- |
| Key-value store | Redis, MemcacheDB, Voldemort | Content caching, etc. |
| Wide-column store | Cassandra, HBase | Distributed storage of massive data |
| Document store | CouchDB, MongoDB | Web applications (can be seen as an upgraded key-value store) |
| Graph database | Neo4J, InfoGrid, Infinite Graph | Social networks, recommendation systems, and other workloads centered on a relationship graph |

Key-value store

[Figure: key-value store]

Wide-column store

[Figure: wide-column store]

Document store

[Figure: document store]

Graph database

[Figure: graph database]
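
The four data models can be approximated with plain Python structures. The sample keys, field names, and the `followed_by` helper below are made up for illustration; real products add persistence, distribution, and query features on top of these shapes.

```python
# Key-value store: an opaque value looked up by key.
key_value = {"session:9f2c": b"serialized-session-bytes"}

# Wide-column store: row key -> column family -> column qualifier -> value.
wide_column = {
    "user#1001": {
        "info": {"name": "Ann", "city": "Oslo"},
        "stats": {"logins": 17},
    },
    "user#1002": {
        "info": {"name": "Bo"},   # rows need not share the same columns
    },
}

# Document store: self-describing nested documents, addressed by id.
documents = {
    "order-1": {"customer": "Ann", "items": [{"sku": "mug", "qty": 2}]},
}

# Graph database: nodes plus labeled edges between them.
nodes = {"ann", "bo", "cy"}
edges = [("ann", "FOLLOWS", "bo"), ("bo", "FOLLOWS", "cy")]


def followed_by(person):
    """Return everyone that `person` follows directly."""
    return sorted(dst for src, label, dst in edges
                  if src == person and label == "FOLLOWS")


print(wide_column["user#1001"]["info"]["city"])   # Oslo
print(documents["order-1"]["items"][0]["qty"])    # 2
print(followed_by("ann"))                         # ['bo']
```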

1.6 The relationship between NoSQL, BI and big data

  • BI (Business Intelligence)
    • A complete set of solutions
    • BI applications are built on data models and depend on them
    • BI mainly supports standard SQL, so its support for NoSQL is weaker than for relational databases
  • NoSQL is closely tied to big data
    • Column-oriented databases are generally used in big data scenarios, for example HBase on Hadoop

2. Introduction to HBase

2.1 HBase overview

  • HBase is a leading NoSQL database
    • A column-oriented storage database
    • A distributed hash map
    • Based on Google's Bigtable paper
    • Uses HDFS as its storage layer and relies on its reliability
  • HBase features
    • Fast data access, with response times of about 2-20 milliseconds
    • Random reads and writes, at 20k to 100k+ ops/s per node
    • Scalability, up to 20,000+ nodes

2.2 HBase development history

| Year | Event |
| --- | --- |
| 2006 | Google publishes the Bigtable paper |
| 2007 | The first version of HBase is released together with Hadoop 0.15.0 |
| 2008 | HBase becomes a Hadoop sub-project |
| 2010 | HBase becomes an Apache top-level project |
| 2011 | Cloudera launches CDH3, based on HBase 0.90.1 |
| 2012 | HBase 0.94 is released |
| 2013-2014 | HBase 0.96 and 0.98 are released |
| 2015-2016 | HBase 1.0, 1.1, and 1.2.4 are released |
| 2017 | HBase 1.3 is released |
| 2018 | HBase 1.4 and 2.0 are released |

2.3 HBase user groups

[Figure: HBase user groups]

2.4 HBase application scenarios

  • Application scenario 1: incremental data (time series data)
    • High capacity, high-speed writes

  • Application scenario 2: information exchange (messaging)
    • High capacity, high-speed reads and writes

  • Application scenario 3: content serving (web backend applications)
    • High capacity, high-speed reads and writes

2.5 Apache HBase Ecosystem

HBase ecosystem technologies

Lily – CRM based on HBase
OpenTSDB – time series data management on HBase
Kylin – OLAP on HBase
Phoenix – SQL layer for operating on HBase
Splice Machine – OLTP based on HBase
Apache Tephra – transaction support for HBase
TiDB – distributed SQL database
Apache Omid – optimized transaction management
YARN Application Timeline Server v2 has migrated to HBase
Hive metadata storage can be migrated to HBase
Ambari Metrics Server uses HBase for data storage

2.6 HBase architecture

1. Physical architecture

HBase uses a Master/Slave architecture.

  • HMaster
    • The master node of the HBase cluster; multiple masters can be configured for HA
    • Manages and assigns Regions
    • Handles load balancing across RegionServers
    • Detects failed RegionServers and reassigns their Regions

  • RegionServer
    • A RegionServer manages and maintains Regions
    • One RegionServer contains one WAL, one BlockCache (read cache), and multiple Regions
    • One Region contains multiple Stores, each corresponding to one column family
    • One Store consists of one MemStore and multiple StoreFiles
    • One StoreFile corresponds to one HFile and belongs to one column family
    • HFiles and the WAL are persisted as files on HDFS
    • Clients interact with RegionServers directly

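How the WAL, MemStore, and StoreFiles cooperate on a write and a read can be sketched as follows. This is a simplified single-Store model under stated assumptions: the flush threshold counts rows rather than bytes, the BlockCache is left out, and recovery replays the whole log instead of only unflushed edits. `RegionServerSketch` and `Store` are hypothetical classes, not HBase code.

```python
class Store:
    """Toy model of one column family's Store inside a Region."""

    def __init__(self, flush_threshold=3):
        self.memstore = {}        # in-memory write buffer
        self.store_files = []     # immutable sorted files, oldest first
        self.flush_threshold = flush_threshold

    def flush(self):
        # Persist the MemStore as a new sorted, immutable StoreFile.
        self.store_files.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}


class RegionServerSketch:
    def __init__(self):
        self.wal = []             # append-only log shared by the server
        self.store = Store()

    def put(self, row, value):
        self.wal.append((row, value))        # 1. log first, for recovery
        self.store.memstore[row] = value     # 2. then buffer in memory
        if len(self.store.memstore) >= self.store.flush_threshold:
            self.store.flush()               # 3. flush when the buffer is full

    def get(self, row):
        # Newest data wins: MemStore first, then StoreFiles newest to oldest.
        if row in self.store.memstore:
            return self.store.memstore[row]
        for store_file in reversed(self.store.store_files):
            for key, value in store_file:
                if key == row:
                    return value
        return None

    def crash_and_recover(self):
        """Lose the in-memory MemStore, then rebuild it by replaying the WAL."""
        self.store.memstore = {}
        for row, value in self.wal:
            self.store.memstore[row] = value


rs = RegionServerSketch()
for i in range(4):
    rs.put(f"row{i}", f"v{i}")      # the third put triggers a flush
rs.put("row1", "v1-new")            # newer version lives in the MemStore

print(rs.get("row1"))               # v1-new
print(rs.get("row0"))               # v0
rs.crash_and_recover()
print(rs.get("row3"))               # v3
```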

  • Region and Table

[Figure: Region and Table]

2. Logical architecture: Row

  • The row key (rowkey) is unique, and rows are sorted by it
  • Columns can be defined at the time a record is inserted
  • Each row can define its own columns, even if other rows do not use them
    • Related columns are grouped into column families
  • Multiple versions of a row's values are kept, each with a unique timestamp
    • The value type can differ between versions
  • All HBase data is stored as bytes

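The logical model described above can be pictured as a nested, sorted map: row key, then column family, then column qualifier, then timestamp, ending in a byte value. The sketch below is only an in-memory illustration of that shape; `TableSketch` is a hypothetical class, and it assumes that column families are fixed when the table is created while qualifiers are free-form.

```python
from collections import defaultdict


class TableSketch:
    """Toy HBase-like table: row -> family -> qualifier -> {timestamp: bytes}."""

    def __init__(self, families):
        self.families = set(families)   # column families are fixed up front
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if not isinstance(value, bytes):
            raise TypeError("values are stored as bytes")
        self.rows[row][family][qualifier][ts] = value

    def get(self, row, family, qualifier, ts=None):
        """Return the newest version, or the newest at or before `ts`."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None

    def scan(self):
        """Return row keys in sorted (byte-wise) order."""
        return sorted(self.rows)


t = TableSketch(families=["info"])
t.put(b"row2", "info", "name", b"Bo", ts=1)
t.put(b"row1", "info", "name", b"Ann", ts=1)
t.put(b"row1", "info", "name", b"Anna", ts=2)               # a second version
t.put(b"row1", "info", "age", (30).to_bytes(4, "big"), ts=2)  # column only row1 has

print(t.get(b"row1", "info", "name"))         # b'Anna'
print(t.get(b"row1", "info", "name", ts=1))   # b'Ann'
print(t.scan())                               # [b'row1', b'row2']
```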

2.7 HBase data management

  • Where data is managed
    • The system catalog table hbase:meta
      • Stores metadata, etc.
    • Files in the HDFS directory
    • Region instances on the RegionServers
  • HBase data on HDFS
    • Can be repaired from the files on HDFS
    • Repair path
      • RegionServer -> Table -> Region -> RowKey -> column family

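One way to picture what the catalog metadata is used for: since row keys are sorted and each Region covers a contiguous key range, finding the Region for a row key is a search over Region start keys. The sketch below assumes a made-up catalog with three Regions; the server names and the `locate` function are hypothetical and do not reflect the actual layout of hbase:meta.

```python
import bisect

# Hypothetical catalog contents for one table: (region start key, server).
# Regions cover contiguous, sorted key ranges; b"" means "from the beginning".
META = [
    (b"", "regionserver-1"),
    (b"g", "regionserver-2"),
    (b"p", "regionserver-3"),
]
START_KEYS = [start for start, _ in META]


def locate(row_key):
    """Return (start_key, server) of the region whose range contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    return META[index]


print(locate(b"apple"))   # (b'', 'regionserver-1')
print(locate(b"grape"))   # (b'g', 'regionserver-2')
print(locate(b"zebra"))   # (b'p', 'regionserver-3')
```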

2.8 HBase architecture features

  • Strong consistency
  • Automatic scaling
    • A Region is split automatically when it grows large
    • HDFS is used to scale out data and manage space
  • Write recovery
    • Uses the WAL (Write-Ahead Log)
  • Integration with Hadoop

Origin blog.csdn.net/zmzdmx/article/details/108778691