Storage structure

1. Storage concept features
2. Logical structure
3. Physical structure
4. Storage architecture

1. Storage concept features

Features

优先读写内存
- Write: write directly to memory
- Read: read the memory first, if not, then read HDFS
All data in the entire Hbase are arranged in the order of Rowkey
When querying data, you canrowkey前缀匹配查询
Rowkey是整个Hbase的唯一索引
Data type: The underlying storage types of Hbase are all bytes
data structure:KV结构，每一列在底层存储都是一个KV
- Key：Rowkey+列族名称+列的名称+时间戳
  - 20200101_001 column=basic:age, timestamp=1593247238403
- Value：值
  - value=18

concept

`RowKey：类似于主键的概念`

Uniquely mark a line
基于rowkey排序
唯一索引

`RegionServer：HBASE架构中的从节点`

All Hbase data is written to the RegionServer, and data is also read from the RegionServer
- Write: written into the memory of Regionserver
- Read: read the memory of RegionServer first
There will be multiple slave nodes
- 每一台RegionServer上都可以有很多个Region
- Similar to DataNode
- RegionServer是将数据存储在内存中
- RegionServer will call HDFS client to write data to HDFS
- DataNode是将数据存储在磁盘上

`Region：RegionServer中的存储单元`

Just数据表的分区
一个Region必然会被某一个RegionServer所管理
一个RegionServer可以管理多个Region

Thinking: I have a table, I have a lot of data, and I want to write to this table, how can I make this table distributed?

Only with this table 分布式, can I read and write data in parallel in a distributed manner?
Let a table have multiple partitions, and different partitions are stored on different machines
- region：分区
- regionserver：region所在的机器
Write data to the table, according to certain rules, write to a certain partition
- Suppose there is a table with three partitions
  - region0：node-01
  - region1：node-02
  - region2：node-03
  - The first data: write to the first partition region0
  - The second piece of data: write to the second partition region1
  - The third piece of data: write to the third partition region2
  - ……
  - Implement distributed storage

In Hbase:

Each table has only one partition at the beginning by default
每个Region都有一个范围
If a table has only one region: -oo ~ +oo
If a table has two regions
- region0 ： -oo ~ 1000
- region1：1000 ~ +oo
- ……
If a table has five partitions
- region0 ： -oo ~ 10
- region1：10 ~ 20
- region2：20 ~ 30
- region3：30 ~ 40
- region4：40 ~ +oo
分区规则：根据rowkey所属的范围
- analogy

concept	HDFS	Hbase
data	file	table
Split	Block: Block	`分区：Region`
rule	128M a Block	`范围`
machine	DataNode	RegionoServer
	Block has a copy	`分区没有副本`

Demo: a table has multiple partitions

create 'itcast01:t1', 'info', SPLITS => ['10', '20', '30', '40']

Insert picture description here

Partition rules: partition according to rowkey
- 1111：region1
  - 10 < 1111 < 20
- 5：region4
  - 40 < 5 < +oo
The relationship between Region and table
- Region就是表的分区, A table has only one partition by default, and there can be multiple partitions
- When writing data to the table, it will be written to different partitions of this table according to the partitioning rules
The relationship between Region and RegionServer
- Region is stored in RegionServer and managed by RegionServer
- Region是RegionServer负载均衡的最小单元
  - regionserver1：10个region
  - regionserver2: 5 regions
  - regionserver3: 3 regions
  - |
  - |Load Balancing: Balanced Distribution of Regions
  - |
  - regionserver1：6个region
  - regionserver2: 6 regions
  - regionserver3: 6 regions

Store: Smaller storage unit in Region

region是表在rowkey的方向上进行了划分多个分区
- Rowkey ： -oo ~ + oo
Store是表中列的划分，store就是列族
The data corresponding to each Rowkey in this table has a column family, and there is a Store in the corresponding region
The data of different column families are stored in different Stores, and the data of the same column families are stored together
- Why design the concept of column family?
  - Just为了将经常一起读写的数据存在一起（相似io属性）
  - 加快读写的效率

memstore: The memory area where data is stored in the Store

每个store中都有一个memstore，用于存储写入的数据
不同列族就是store的数据写入不同的内存
The memory of the regionserver where this region is used is used

StoreFile: The file refreshed from the memory in each Store is storefile

一个Store中逻辑上可能会有多个storefile文件
Storefile is stored on HDFS, encapsulation of HFILE

2. Logical structure

Put statement

put 'ns:tbname','rowkey',cf:col,value

RegionServer: slave node, used to store all data in Hbase
- Table: All data exists in the form of tables, and reads and writes are specified tables
  - Region: to realize the association between the table and the RegionServer, the partition of the table, used to rowkeyturn the table into a distributed
    - Store: Each partition 列族is divided into stores according to internal partitions, smaller partitions
      - MemStore: The memory area of Regionserver, each Store has 1 MemStore
      - StoreFIle: multiple, flashed data in the memory
        
        Logically belongs to this store
        
        Really stored on HDFS
        
        分区没有副本，如果分区所在的宕机了怎么办？
        
        Hbase will restore this region on other regionservers
        
        How can the data be guaranteed not to be lost?
        
        StoreFile可以通过HDFS副本进行恢复
        
        How to recover the data in memStore?
Monitoring logical structure
table
Partition: Region

3. Physical structure

The machine where the regionServer is located
- 内存
- can not see
The machine where the DataNode is located
- Disk
Structure of storage on HDFS
- NameSpace: corresponds to a directory in HDFS
Table: Corresponding to a directory on HDFS
Region: corresponds to a directory on HDFS
Store: corresponds to a directory on HDFS, that is, column family
StoreFile: File on HDFS

4. Storage architecture

Insert picture description here

Step 1: The client finds the corresponding region based on the table name and rowkey
- How does it know that this table has some regions?
- How does it know the extent of each region?
- How does it know which regionserver this region is on?
The second step:客户端会请求这个region所在的Regionserver，提交写请求
third step:RegionServer根据我们的请求来写入Region，根据列族来判断写入哪个Store
the fourth step:将这个数据读写Store中的MemStore
- If it is read:先读memstore ，如果没有读StoreFIle
- If writing: write directly to MemStore
  - When the data in MemStore reaches certain conditions, Flush will be triggered automatically
    - Flash the data in memStore into Storefile file and write it to HDFS

Hbase--storage structure

Storage structure

1. Storage concept features

Features

concept

`RowKey：类似于主键的概念`

`RegionServer：HBASE架构中的从节点`

`Region：RegionServer中的存储单元`

Thinking: I have a table, I have a lot of data, and I want to write to this table, how can I make this table distributed?

In Hbase:

Store: Smaller storage unit in Region

memstore: The memory area where data is stored in the Store

StoreFile: The file refreshed from the memory in each Store is storefile

2. Logical structure

3. Physical structure

4. Storage architecture

Guess you like