HBase: features and basic concepts

1. Features

Big tables

  • Very large storage capacity
  • Rows on the order of hundreds of millions, columns on the order of millions

Distributed

  • Distributed storage
  • Based on distributed memory: reads and writes go to distributed memory first
    • Provides fast reads and writes
  • Based on distributed disks: HBase relies on HDFS underneath
    • Provides data persistence

Multiple versions: VERSIONS

  • In general, a row and a column in a table together determine a single value
  • MySQL
    • Xiao Ming is 18 years old this year
      • insert into tbname values('Xiao Ming', 18);
    • Next year he will be 19
      • update tbname set age = age + 1;
    • Query Xiao Ming
      • Xiao Ming 19
    • Does MySQL still store how old Xiao Ming was last year?
      • No
      • The age column holds only one value, and the new value overwrites the old one
  • HBase can store multiple versions
    • Suppose Xiao Ming's age is configured to keep multiple versions: 3 versions
    • Xiao Ming is 18 years old this year
      • Xiao Ming 18 2020
    • Next year Xiao Ming will be 19 years old
      • Xiao Ming 18 2020
      • 19 2021
    • Xiao Ming is 20 years old
      • Xiao Ming 18 2020
      • 19 2021
      • 20 2022
    • Xiao Ming is 21 years old (the oldest version, 18 from 2020, is dropped because only 3 versions are kept)
      • Xiao Ming 19 2021
      • 20 2022
      • 21 2023
    • By default only the latest version is shown
      • You can query any stored version as needed
      • Versions are distinguished by timestamp (by default the system uses the time the data was inserted)
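
The Xiao Ming example above can be sketched in a few lines of Python. This is only a toy in-memory model, not the HBase API: the class `VersionedTable`, its `max_versions` parameter, and the use of the year as the timestamp are all assumptions made for illustration.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model of multi-version cells; not the HBase API."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (rowkey, column) -> list of (timestamp, value), newest first
        self.cells = defaultdict(list)

    def put(self, rowkey, column, value, timestamp):
        versions = self.cells[(rowkey, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        # Drop the oldest versions beyond the configured limit.
        del versions[self.max_versions:]

    def get(self, rowkey, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        return self.cells[(rowkey, column)][:versions]


table = VersionedTable(max_versions=3)
for year, age in [(2020, 18), (2021, 19), (2022, 20), (2023, 21)]:
    table.put("xiaoming", "age", age, timestamp=year)

latest = table.get("xiaoming", "age")               # default: newest only
history = table.get("xiaoming", "age", versions=3)  # all retained versions
```

After the fourth write, `history` no longer contains the 2020 value, which matches the last step of the example: with a limit of 3, the oldest version is discarded.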

NoSQL: non-relational database

  • RDBMS: relational databases
    • MySQL, Oracle, PostgreSQL
  • NoSQL: non-relational databases
    • HBase, Redis, MongoDB
  • What they have in common
    • Both are databases used to store data
    • Both have the concepts of database, table, row, and column, with slight differences
  • Differences
    • RDBMS
      • Generally used to store structured data; rows and columns are fixed
      • All support SQL
      • Middle of the road: stores a moderate amount of data with moderate performance
    • NoSQL
      • Stores structured or semi-structured data; rows and columns can be dynamic
      • Most do not support SQL; each NoSQL database has its own commands
      • Each NoSQL database has its own characteristics, for example
        • Large storage capacity, with slower reads and writes than an RDBMS
        • Small storage capacity, with much faster reads and writes than an RDBMS
        • Large storage capacity, memory-based storage, with faster reads and writes than an RDBMS
        • The stored data formats all differ
      • HBase: large capacity, fast reads and writes, persistent storage
      • Redis: small capacity, fast reads and writes, persistent storage

2. Concepts

Basic storage

  • HBase uses distribution to logically combine the memory of multiple machines
  • When we write data to HBase, the data is written to the memory of one of the machines
    • When the data in memory meets a certain condition, it is written to HDFS and saved as a file
    • Memory then holds the new incoming data
  • When we read data from HBase, memory is searched first
    • If the data is still in memory, it is returned directly
    • If the data is not in memory, it must be in a file on HDFS
      • First read: the data is located in the HDFS file and returned
        • If caching is enabled for the data, it is put into the cache after the first read
      • Second read: the data is read directly from the cache
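
The write and read path described above can be sketched as a small single-process Python model. It is a simplification, not how HBase is implemented: `ToyStore`, the `flush_threshold` condition, and the plain dicts standing in for HDFS files and the read cache are hypothetical names chosen for this sketch.

```python
class ToyStore:
    """Toy model of the write/read path described above; not the HBase API."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical flush condition
        self.memory = {}   # recent writes held in memory
        self.files = []    # stands in for files persisted on HDFS
        self.cache = {}    # read cache, filled after a file read

    def put(self, key, value):
        self.memory[key] = value
        self.cache.pop(key, None)  # avoid serving a stale cached value
        if len(self.memory) >= self.flush_threshold:
            self.files.append(dict(self.memory))  # "flush" to a new file
            self.memory.clear()                   # memory now takes new data

    def get(self, key):
        """Return (value, where it was found)."""
        if key in self.memory:
            return self.memory[key], "memory"
        if key in self.cache:
            return self.cache[key], "cache"
        for f in reversed(self.files):  # newest file first
            if key in f:
                self.cache[key] = f[key]
                return f[key], "file"
        return None, "miss"


store = ToyStore(flush_threshold=2)
store.put("r1", "a")
store.put("r2", "b")   # reaches the threshold, so both rows are flushed
store.put("r3", "c")   # stays in memory

first = store.get("r1")    # found in a flushed file, then cached
second = store.get("r1")   # served from the cache
recent = store.get("r3")   # still in memory
```

The second element of each result shows which layer answered the read, so the memory, file, and cache steps from the list can be observed directly.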

HBase concepts and logical structure

Concept          MySQL          HBase
Database         database       namespace
Table            table          table
Row              primary key    rowkey (row key)
Column family    -              column family
Column           column         column
Multi-version    -              VERSIONS
Timestamp        timestamp      timestamp

namespace

  • Similar to the concept of a database: it separates the storage of tables, and every table must belong to a namespace
  • It is the "database" of HBase
  • Note
    • You cannot switch between namespaces in HBase (there is no equivalent of MySQL's use statement)

table

  • When accessing a table in HBase, the namespace name must be included
  • namespace:tbname
    • Special case: the namespace can be omitted for tables in the default namespace
    • Any table accessed without a namespace is treated as a table in the default namespace
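
As a small illustration of the namespace:tbname rule, the helper below resolves a table name to a (namespace, table) pair. The function `qualify` and the namespace name `school` are hypothetical; the sketch assumes the default namespace is literally named `default`, as it is in HBase.

```python
def qualify(table_name, default_namespace="default"):
    """Split 'namespace:tbname'; fall back to the default namespace."""
    namespace, sep, name = table_name.partition(":")
    if not sep:
        # No namespace given: the table lives in the default namespace.
        return default_namespace, table_name
    return namespace, name


with_namespace = qualify("school:student")
without_namespace = qualify("student")
```

Here `without_namespace` resolves to the default namespace, which is the special case described in the list.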

rowkey: the row key, which plays the role of the primary key in MySQL

  • Similar to the concept of a primary key: it uniquely identifies a row
  • Differences
    • In MySQL, there are generally two ways to design a primary key
      • Method 1: pick a column from the data as the primary key to uniquely identify a row
      • stuId name age sex
      • stuId: the primary key
      • Method 2: use an auto-increment column as the primary key
      • id name age sex
      • id: an auto-increment column used as the primary key
      • A table may also have no primary key
    • The row key in HBase is a standalone column, and every HBase table must have a rowkey
      • The column name is fixed: rowkey
      • You decide what the values of this column are
      • Values must be unique
      • HBase table
      • rowkey name age sex
  • The role of the rowkey
    • Uniquely identifies a row
    • Ordering: HBase keeps the rows of a table sorted by rowkey
    • Also serves as the only index in HBase
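column family: also called a column cluster

  • The underlying design idea is grouping
  • Columns with similar I/O properties are put in the same group
    • Similar I/O properties: columns that are read together and written together
  • Every column in HBase must belong to a column family, except the rowkey
  • Every HBase table must have at least one column family
  • How you group the columns is up to you
  • student table
    • basic: column family 1
      • name
      • age
    • other: column family 2
      • phone
      • address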

column: a column in an HBase table; every column must belong to a column family

  • To access a column, you must include the name of its column family
    • cf:colname
  • Each row in an HBase table can have different columns
id		name			age
001		zhangsan		18

id		name			age		sex
002		lisi			20		male

id		name			age		phone
003		wangwu			20		110
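
The three sample rows above can be modeled as nested dictionaries to show both the cf:colname addressing and the fact that rows need not share the same columns. This is a sketch only: `split_column` and `put` are hypothetical helpers, and placing name, age, and sex in a `basic` family and phone in an `other` family is an assumption borrowed from the student table example.

```python
def split_column(qualified):
    """Split 'cf:colname' into (column family, column name)."""
    family, sep, qualifier = qualified.partition(":")
    if not sep or not family or not qualifier:
        raise ValueError(f"expected 'cf:colname', got {qualified!r}")
    return family, qualifier


table = {}  # rowkey -> {column family -> {column -> value}}


def put(rowkey, qualified, value):
    family, qualifier = split_column(qualified)
    table.setdefault(rowkey, {}).setdefault(family, {})[qualifier] = value


put("001", "basic:name", "zhangsan")
put("001", "basic:age", "18")
put("002", "basic:name", "lisi")
put("002", "basic:age", "20")
put("002", "basic:sex", "male")
put("003", "basic:name", "wangwu")
put("003", "basic:age", "20")
put("003", "other:phone", "110")

# Which fully qualified columns does each row actually have?
columns_per_row = {
    rowkey: sorted(f"{cf}:{col}" for cf, cols in families.items() for col in cols)
    for rowkey, families in table.items()
}
```

Each row stores only the columns that were written for it, so no row carries empty placeholders for columns it does not use.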

VERSIONS: an attribute of the column family that defines how many versions of a value each column in the family can store; the default is 1

  • By default each column keeps only 1 version
  • By default a query shows only the latest version of each column

Timestamp: marks the time when the data was written

  • Also used to distinguish between versions
  • The default value is the time the data was written
  • Each column has its own timestamp

Logical structure

[Figure: logical structure of an HBase table]

3. Column storage

  • Traditional databases such as MySQL store data by row
    • Columns are defined when the table is created, and every row has these columns
    • Every insert adds one row
      • insert into tbname values('001', 'Xiao Ming', 18, 'male', '110', null);
    • Every update, query, and delete operates on rows
      • update tbname set key = value where ...;
        • Updates by row
      • select * from tbname where ...;
        • Queries by row
        • select id, name from tbname where ...;
        • All matching rows are first read from the file, and then the columns of each row are filtered
      • delete from tbname where ...;
        • Deletes by row
  • In HBase, every piece of data is operated on by column
    • Every operation (insert, update, delete) is an operation on a column
    • Insert: insert a column for a rowkey
      • Columns are inserted dynamically
      • Columns do not need to be defined when the table is created
      • Each row in an HBase table can have different columns
    • Delete: delete a column of a rowkey
  • Why operate by column?
    • The underlying storage of HBase is organized by column
    • The data of one column of a table is stored together
    • Big data workloads typically process data by column
      • select id, name from tbname where ...;
      • Option one: fetch all rows, then filter these two columns out of each row
        • Store by row
      • Option two: fetch these two columns directly from the table's data
        • Store by column
  • Can HBase implement a structure similar to MySQL?
    • Yes: give every row of the HBase table the same columns
  • It is a finer-grained way of storing data
Origin blog.csdn.net/qq_46893497/article/details/114182540