Based on distributed memory: reads and writes go to distributed memory first
This provides fast reads and writes
Based on distributed disks: HBase is built on top of HDFS
This provides data persistence
Multiple versions: VERSIONS
General case: in a table, a row and a column together determine exactly one value
MySQL
Xiao Ming is 18 years old this year
insert into tbname values('Xiao Ming', 18);
Next year, Xiao Ming turns 19
update tbname set age = age + 1 where name = 'Xiao Ming';
Query Xiao Ming
Xiao Ming 19
Does MySQL still store how old Xiao Ming was last year?
No
The age column holds only one value, and the new value overwrites the old one
HBase can store multiple versions
Suppose Xiao Ming's age is configured to keep 3 versions
Xiao Ming is 18 years old this year
  Xiao Ming  18  2020
Next year Xiao Ming turns 19
  Xiao Ming  18  2020
             19  2021
Xiao Ming turns 20
  Xiao Ming  18  2020
             19  2021
             20  2022
Xiao Ming turns 21 (only 3 versions are kept, so the oldest one, 18 from 2020, is dropped)
  Xiao Ming  19  2021
             20  2022
             21  2023
By default, only the latest version is shown
You can query any stored version as needed
Versions are distinguished by timestamp (by default, the system uses the time the data was inserted)
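To make the version behaviour concrete, here is a minimal Python sketch of a single cell that keeps at most 3 versions, each tagged with a timestamp. VersionedCell and its put/get methods are hypothetical illustrations, not the HBase client API, and plain years stand in for the millisecond timestamps HBase actually records.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedCell:
    """Hypothetical toy model of one HBase cell keeping up to max_versions values."""
    max_versions: int = 3
    versions: list = field(default_factory=list)  # (timestamp, value), newest first

    def put(self, value, timestamp):
        self.versions.append((timestamp, value))
        # keep newest first, then drop everything beyond max_versions
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]

    def get(self, timestamp=None):
        """Return the latest value, or the value written at an exact timestamp."""
        if timestamp is None:
            return self.versions[0][1] if self.versions else None
        for ts, value in self.versions:
            if ts == timestamp:
                return value
        return None


age = VersionedCell(max_versions=3)
for year, value in [(2020, 18), (2021, 19), (2022, 20), (2023, 21)]:
    age.put(value, timestamp=year)

print(age.get())      # 21
print(age.get(2021))  # 19
print(age.get(2020))  # None: the oldest version was dropped
```

Reading without a timestamp returns the newest value, matching the default query behaviour above; passing a timestamp selects one specific stored version.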
NoSQL: non-relational database
RDBMS: relational databases
MySQL, Oracle, PostgreSQL
NoSQL: non-relational databases
HBase, Redis, MongoDB
Similarities
Both are databases used to store data
Both have the concepts of database, table, row and column, though with some differences
Differences
RDBMS
Generally used to store structured data; rows and columns are fixed
All support SQL
Performance is middling: suited to moderate data volumes, with moderate read/write speed
NoSQL
Stores structured or semi-structured data; rows and columns can be dynamic
Generally does not support SQL; each NoSQL database has its own commands
Each NoSQL database has its own characteristics, for example:
Large storage capacity, but slower reads and writes than an RDBMS
Small storage capacity, but much faster reads and writes than an RDBMS
Large storage capacity and memory-based storage, with faster reads and writes than an RDBMS
Each stores data in a different format
HBase: large capacity, fast reads and writes, persistent storage
Redis: small capacity, fast reads and writes, persistent storage
2. Concepts
Basic storage
HBase uses distribution to logically combine the memory of multiple machines
When data is written to HBase, it goes into the memory of one of the machines
When the data in memory meets certain conditions, it is written to HDFS and saved as a file
Memory then takes new data
When data is read from HBase, memory is checked first
If the data is still in memory, it is returned directly
If the data is not in memory, it must be in an HDFS file
First read: the data is found in the HDFS file and returned
If caching is enabled, the data is put into the cache after the first read
Second read: the data is served directly from the cache
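The write and read path above can be sketched as a toy Python class. In real HBase the in-memory buffer is called the MemStore and the read cache the BlockCache, and flushes are triggered by size in bytes; here ToyStore, flush_size (counted in entries) and the dict-based "files" are hypothetical simplifications.

```python
class ToyStore:
    """Hypothetical toy model of the HBase write/read path, not the real implementation.

    memstore:   in-memory buffer that receives every write
    files:      immutable "HDFS files" produced by flushes, oldest first
    cache:      read cache filled after the first read from a file
    flush_size: hypothetical flush threshold, counted in entries
    """

    def __init__(self, flush_size=2):
        self.flush_size = flush_size
        self.memstore = {}
        self.files = []
        self.cache = {}
        self.log = []  # where each successful read was served from

    def put(self, key, value):
        self.memstore[key] = value
        self.cache.pop(key, None)  # drop a cached copy that is now stale
        if len(self.memstore) >= self.flush_size:
            self.files.append(dict(self.memstore))  # flush memory to a "file"
            self.memstore.clear()                   # memory is free for new data

    def get(self, key):
        if key in self.memstore:          # 1. memory first
            self.log.append("memory")
            return self.memstore[key]
        if key in self.cache:             # 2. then the read cache
            self.log.append("cache")
            return self.cache[key]
        for f in reversed(self.files):    # 3. finally the files, newest first
            if key in f:
                self.log.append("file")
                self.cache[key] = f[key]  # cache it for the next read
                return f[key]
        return None


store = ToyStore(flush_size=2)
store.put("r1", "a")
store.put("r2", "b")  # memory is full: r1 and r2 are flushed to a file
store.put("r3", "c")  # stays in memory
store.get("r3")
store.get("r1")
store.get("r1")
print(store.log)  # ['memory', 'file', 'cache']
```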
HBase concepts and logical structure
Concept          MySQL          HBase
Database         database       namespace
Table            table          table
Row              primary key    rowkey (row key)
Column family    -              column family
Column           column         column
Multi-version    -              VERSIONS
Timestamp        timestamp      timestamp
namespace
A concept similar to a database, used to separate how tables are stored; every table must belong to a namespace
It plays the role of a database in HBase
Note
In HBase you cannot switch the current namespace (there is no equivalent of MySQL's use command)
table
When accessing a table in HBase, the table name must be prefixed with its namespace
namespace:tbname
Special case: for tables in the default namespace, the prefix can be omitted
Any table accessed without a namespace is looked up in the default namespace
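The naming rule above can be expressed as a tiny Python function. resolve_table is a hypothetical helper that only illustrates the rule; the HBase shell and client do this resolution themselves.

```python
def resolve_table(name, default_namespace="default"):
    """Hypothetical helper: split 'namespace:tbname'; a bare name goes to 'default'."""
    namespace, sep, table = name.partition(":")
    if not sep:
        return default_namespace, name
    return namespace, table


print(resolve_table("school:student"))  # ('school', 'student')
print(resolve_table("student"))         # ('default', 'student')
```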
rowkey: the row key, which plays the role of MySQL's primary key
Like a primary key, it uniquely identifies a row
The differences
In MySQL, there are generally two ways to design a primary key
Method 1: pick a column from the data itself as the primary key to uniquely identify a row
stuId name age sex
stuId: the primary key
Method 2: use an auto-increment column as the primary key
id name age sex
id: an auto-increment column used as the primary key
A MySQL table can also have no primary key at all
In HBase the row key is a separate, independent column, and every HBase table must have a rowkey
The column name is fixed: it is always called rowkey
What value it holds is up to you
It must be unique
HBase table
rowkey name age sex
The role of the rowkey
Uniquely identifies a row
Ordering: HBase keeps all rows sorted by rowkey (byte-by-byte lexicographic order)
It is also the only index in HBase
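Python compares bytes objects lexicographically, byte by byte, so sorted() can mimic how rows are ordered by rowkey. The sample keys and the prefix_scan helper are hypothetical; the point is that a sorted key space turns a prefix lookup into one contiguous range, which is why the rowkey works as an index.

```python
from bisect import bisect_left

# Hypothetical rowkeys, sorted the way HBase orders rows: byte-wise lexicographic.
rowkeys = sorted([b"user10", b"user2", b"user1", b"order7"])
print(rowkeys)  # [b'order7', b'user1', b'user10', b'user2']


def prefix_scan(sorted_keys, prefix):
    """Hypothetical helper: return the contiguous run of keys that start with prefix."""
    start = bisect_left(sorted_keys, prefix)  # jump to the first candidate
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order guarantees no more matches
        result.append(key)
    return result


print(prefix_scan(rowkeys, b"user"))  # [b'user1', b'user10', b'user2']
```

Note that b"user10" sorts before b"user2": the order is by bytes, not by numeric value.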
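column family: column family (also called a column cluster)
Core design idea: grouping
Columns with similar I/O properties are grouped together
Similar I/O properties: columns that are read together and written together
In HBase every column, except the rowkey, must belong to a column family
Every HBase table must have at least one column family
How columns are grouped is up to you
Example: student table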
basic: column family 1
name
age
other: column family 2
phone
address
column: a column in an HBase table; every column must belong to a column family
To access a column, prefix it with its column family name
cf:colname
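Using the student table above, this hypothetical Python helper shows how a "cf:colname" reference splits into a family and a column, and why the family must already exist. FAMILIES and parse_column are illustrative assumptions, not part of any HBase API; this sketch also requires a non-empty column name.

```python
# Hypothetical: column families of the student table above, fixed at table creation.
FAMILIES = {"basic", "other"}


def parse_column(qualified):
    """Split 'cf:colname' into (family, column) and check that the family exists."""
    family, sep, column = qualified.partition(":")
    if not sep or not column:
        # this sketch requires both parts to be present
        raise ValueError(f"expected 'cf:colname', got {qualified!r}")
    if family not in FAMILIES:
        raise ValueError(f"unknown column family {family!r}")
    return family, column


print(parse_column("basic:name"))     # ('basic', 'name')
print(parse_column("other:address"))  # ('other', 'address')
```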
Each row in an HBase table can have a different set of columns
id name age
001 zhangsan 18
id name age sex
002 lisi 20 male
id name age phone
003 wangwu 20 110
VERSIONS: an attribute of the column family that sets how many versions of a value every column in that family can store; the default is 1
So by default each column keeps only 1 version
By default, queries show only the latest version of each column
Timestamp: marks the time the data was written
It is also used to distinguish different versions
By default it is the time the data was written
Every value stored in a column carries its own timestamp
Logical structure
3. Column storage
Traditional databases such as MySQL store data by row
Columns are defined when the table is created, and every row has these columns
Each insert adds one row
insert into tbname values('001', 'Xiao Ming', 18, 'male', '110', null);
Every update, query and delete operates on rows
update tbname set col = value where ...;
Updates by row
select * from tbname where ...;
Queries by row
select id, name from tbname where ...;
First all matching rows are read from the file, then the requested columns are picked out of each row
delete from tbname where ...;
Deletes by row
In HBase, every piece of data is operated on by column
Every operation (insert, delete, update) acts on columns
Insert: insert a column for a rowkey
Columns are inserted dynamically
Columns do not need to be defined when the table is created (only column families do)
Each row in an HBase table can have a different set of columns
Delete: delete one column of a rowkey
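Column-level operations can be sketched with a dict of dicts: each put or delete touches a single "cf:col" entry of one rowkey, so rows end up with different columns and no schema change is needed. The table, put and delete names below are hypothetical; they echo the HBase shell's put and delete commands but are not the client API.

```python
from collections import defaultdict

# Hypothetical in-memory table: rowkey -> {"cf:col": value}
table = defaultdict(dict)


def put(rowkey, column, value):
    """Insert or update one column of one row; no schema change needed."""
    table[rowkey][column] = value


def delete(rowkey, column):
    """Delete one column of one row; the rest of the row is untouched."""
    table[rowkey].pop(column, None)
    if not table[rowkey]:
        del table[rowkey]  # a row with no columns left no longer exists


put("001", "basic:name", "zhangsan")
put("001", "basic:age", 18)
put("002", "basic:name", "lisi")
put("002", "other:phone", "110")
delete("001", "basic:age")

print(dict(table))
# {'001': {'basic:name': 'zhangsan'}, '002': {'basic:name': 'lisi', 'other:phone': '110'}}
```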
Why operate on columns?
HBase's underlying storage is column-oriented (more precisely, data is stored separately per column family)
The data of one column of a table is stored together
Big data processing is typically done column by column
select id, name from tbname where ...;
Option 1: read every row, then pick these two columns out of each row
Row storage
Option 2: read these two columns directly from the table's data
Column storage
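The two options can be contrasted with a small Python sketch over the sample rows used earlier. The row_store and column_store layouts are hypothetical illustrations: both produce the same answer to select id, name, but the column layout only touches the two lists it needs.

```python
rows = [
    {"id": "001", "name": "zhangsan", "age": 18},
    {"id": "002", "name": "lisi", "age": 20},
    {"id": "003", "name": "wangwu", "age": 20},
]

# Option 1, row storage (hypothetical layout): each record is stored contiguously.
row_store = [(r["id"], r["name"], r["age"]) for r in rows]

# Option 2, column storage (hypothetical layout): each column's values are contiguous.
column_store = {col: [r[col] for r in rows] for col in ("id", "name", "age")}

# select id, name: row storage visits every whole row and projects two fields...
via_rows = [(rid, name) for rid, name, _age in row_store]
# ...while column storage reads only the two columns it needs.
via_columns = list(zip(column_store["id"], column_store["name"]))

print(via_rows == via_columns)  # True
```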
Can HBase implement a structure similar to MySQL?
Yes: give every row of the HBase table the same columns