What does the data model of HBase look like?

HBase's data model is column-family-oriented. HBase itself is an open-source implementation of the design described in Google's Bigtable paper. Data is organized into tables made of rows and columns. Each row has a unique row key that identifies it, and rows are stored sorted by row key. A column is identified by a column family plus a column qualifier, written as family:qualifier (for example cf:col1).
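The logical model can be pictured as a sorted map from row key to columns, where each column name combines a family and a qualifier. The following is a minimal in-memory sketch of that picture, not HBase code. The class `LogicalModelSketch` and its methods are hypothetical. It uses Strings instead of HBase's byte arrays and leaves out the timestamp dimension. For ASCII keys, String ordering matches HBase's byte-wise ordering, which is why "row10" sorts before "row2".

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory sketch of HBase's logical model, not HBase code:
// row key -> column ("family:qualifier") -> value. Real HBase uses byte[] keys
// and a timestamp dimension; Strings are used here for readability.
public class LogicalModelSketch {

    // Rows are kept sorted by row key, like HBase's lexicographic (byte-order) sorting.
    private final SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Writes one cell; the column name joins family and qualifier with ':'.
    public void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    // Returns the cell value, or null when the row or column is absent (empty cells are not stored).
    public String get(String rowKey, String family, String qualifier) {
        SortedMap<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(family + ":" + qualifier);
    }

    // Row keys in storage (sorted) order.
    public Iterable<String> rowKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        LogicalModelSketch table = new LogicalModelSketch();
        table.put("row2", "cf", "col1", "b");
        table.put("row10", "cf", "col1", "a");
        table.put("row1", "cf", "col1", "value1");
        table.put("row1", "cf", "col2", "value2");

        System.out.println(table.rowKeys());                  // [row1, row10, row2]
        System.out.println(table.get("row1", "cf", "col2"));  // value2
        System.out.println(table.get("row2", "cf", "col2"));  // null
    }
}
```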

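A column family is a group of related columns whose data is physically stored together. The columns in a family share storage settings such as the number of versions to keep, compression, and time-to-live. Column families are part of the table schema and are normally defined when the table is created. They can still be added or modified later through an administrative schema change (for example the shell's alter command or the Admin API). The set of families is expected to stay small and stable. HBase scales horizontally by splitting a table into regions along row-key ranges, not by adding column families. In fact, the HBase documentation recommends keeping the number of families per table small.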
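To illustrate "physically stored together", here is a simplified sketch in which every declared family gets its own independent sorted store. This loosely mirrors how HBase keeps a separate store per family inside each region. The class `FamilyStoreSketch`, its methods, and the "rowKey/qualifier" key format are all hypothetical. For simplicity the sketch fixes the families at construction time. Real HBase also allows families to be added later via a schema change.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch (not HBase code): one independent sorted store per declared
// column family, loosely mirroring HBase's per-family store inside a region.
public class FamilyStoreSketch {

    // family -> ("rowKey/qualifier" -> value)
    private final Map<String, SortedMap<String, String>> stores = new HashMap<>();

    public FamilyStoreSketch(String... declaredFamilies) {
        for (String family : declaredFamilies) {
            stores.put(family, new TreeMap<>());
        }
    }

    public void put(String rowKey, String family, String qualifier, String value) {
        storeOf(family).put(rowKey + "/" + qualifier, value);
    }

    // Reads one family's cells; the other families' stores are never touched.
    public SortedMap<String, String> readFamily(String family) {
        return Collections.unmodifiableSortedMap(storeOf(family));
    }

    private SortedMap<String, String> storeOf(String family) {
        SortedMap<String, String> store = stores.get(family);
        if (store == null) {
            // HBase likewise rejects writes to a family that is missing from the schema.
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        return store;
    }

    public static void main(String[] args) {
        FamilyStoreSketch table = new FamilyStoreSketch("info", "stats");
        table.put("row1", "info", "name", "alice");
        table.put("row1", "stats", "visits", "3");
        table.put("row2", "info", "name", "bob");

        System.out.println(table.readFamily("info"));   // {row1/name=alice, row2/name=bob}
        System.out.println(table.readFamily("stats"));  // {row1/visits=3}
    }
}
```

Reading the "stats" family never looks at the "info" store. This is the reason columns that are usually read together belong in the same family.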

A column qualifier identifies a column within a column family. Qualifiers only need to be unique within their own family, so the same qualifier can appear under different families. For example, info:name and meta:name are two distinct columns. Qualifiers do not have to be declared in advance: a new one comes into existence simply by writing a value to it.
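The qualifier rules can be sketched with a single row modeled as family → qualifier → value. This is a hypothetical illustration, not HBase code; `RowSketch` and its methods are invented names. In real HBase, writing the same family:qualifier again creates a new version of that cell. This sketch keeps only the newest value, which is what a default read returns.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a single HBase row (not HBase code): family -> qualifier -> value.
// Families are fixed up front; qualifiers appear the first time a value is written to them.
public class RowSketch {

    private final Map<String, Map<String, String>> families = new TreeMap<>();

    public RowSketch(String... declaredFamilies) {
        for (String family : declaredFamilies) {
            families.put(family, new TreeMap<>());
        }
    }

    public void put(String family, String qualifier, String value) {
        Map<String, String> columns = families.get(family);
        if (columns == null) {
            throw new IllegalArgumentException("Unknown column family: " + family);
        }
        // Same family + qualifier means the same column: the newer value becomes the visible one.
        columns.put(qualifier, value);
    }

    public String get(String family, String qualifier) {
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    // Total number of distinct family:qualifier columns in this row.
    public int columnCount() {
        int count = 0;
        for (Map<String, String> columns : families.values()) {
            count += columns.size();
        }
        return count;
    }

    public static void main(String[] args) {
        RowSketch row = new RowSketch("info", "meta");
        row.put("info", "name", "alice");
        row.put("meta", "name", "user-profile");   // same qualifier, different family: a separate column
        row.put("info", "name", "alice2");         // same column as before: value superseded
        row.put("info", "email", "a@example.com"); // new qualifier, no schema change needed

        System.out.println(row.get("info", "name"));  // alice2
        System.out.println(row.columnCount());        // 3
    }
}
```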

The data model of HBase also has the following characteristics:

  1. Flexible, sparse columns: An HBase table can have a very large number of columns, and new columns (qualifiers) can be added at write time without any schema change. Empty cells take no storage, so rows in the same table can have completely different sets of columns. This makes HBase well suited to semi-structured data and to datasets whose shape varies from row to row.

  2. Storage by column family: HBase stores data on disk by column family rather than by row. Within each region, every column family has its own store files, and cells inside them are kept sorted by row key, column, and timestamp. When a query needs only one column family, HBase reads only that family's files instead of whole rows. This reduces I/O for wide tables.

  3. Version control: HBase can store multiple versions of each cell, each identified by a timestamp (by default the server's write time in milliseconds). The number of versions to keep is configured per column family. This lets HBase retain a short history of a value and answer queries restricted to a time range. For concurrency control, HBase separately provides atomic check-and-mutate operations on a single row, which can be used to implement optimistic concurrency control and avoid conflicting updates.
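Cell versioning, as described in the list above, can be sketched as a timestamp-keyed map iterated newest first, with the oldest entries dropped beyond a per-family limit. This is a hypothetical illustration, not HBase code. `VersionedCellSketch`, `latest`, and `inTimeRange` are invented names, and `maxVersions` stands in for the family's VERSIONS setting. The time range is half-open (minStamp inclusive, maxStamp exclusive), like HBase's TimeRange. One simplification: HBase removes surplus versions lazily during compactions and filters them out of reads until then, whereas this sketch drops them on write.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of one multi-version HBase cell (not HBase code).
// Versions are keyed by timestamp and iterated newest first, as HBase returns them.
public class VersionedCellSketch {

    private final int maxVersions;  // plays the role of the column family's VERSIONS setting
    private final NavigableMap<Long, String> versions =
            new TreeMap<Long, String>(Comparator.<Long>reverseOrder());

    public VersionedCellSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
        // Keep only the newest maxVersions entries; in newest-first order the last entry is the oldest.
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Newest value, or null if the cell has never been written.
    public String latest() {
        Map.Entry<Long, String> newest = versions.firstEntry();
        return newest == null ? null : newest.getValue();
    }

    // Values whose timestamp t satisfies minStamp <= t < maxStamp, newest first.
    public List<String> inTimeRange(long minStamp, long maxStamp) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> entry : versions.entrySet()) {
            long t = entry.getKey();
            if (t >= minStamp && t < maxStamp) {
                result.add(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch(3);
        cell.put(100L, "v1");
        cell.put(200L, "v2");
        cell.put(300L, "v3");
        cell.put(400L, "v4");  // evicts v1, the oldest of four versions

        System.out.println(cell.latest());                 // v4
        System.out.println(cell.inTimeRange(200L, 400L));  // [v3, v2]
    }
}
```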

The following sample code shows how to use HBase's Java API to create a table, insert data, and query data:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// Uses the HBase 2.x client API (TableDescriptorBuilder replaces the deprecated HTableDescriptor).
public class HBaseExample {

    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath (ZooKeeper quorum, etc.)
        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf("mytable");
        byte[] cf = Bytes.toBytes("cf");

        // Connection and Admin are Closeable; try-with-resources releases them even on errors
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // Create "mytable" with a single column family "cf", unless it already exists
            if (!admin.tableExists(tableName)) {
                TableDescriptor tableDescriptor = TableDescriptorBuilder.newBuilder(tableName)
                        .setColumnFamily(ColumnFamilyDescriptorBuilder.of(cf))
                        .build();
                admin.createTable(tableDescriptor);
            }

            try (Table table = connection.getTable(tableName)) {
                // Write two cells into row "row1": cf:col1 and cf:col2 (qualifiers need no declaration)
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(cf, Bytes.toBytes("col1"), Bytes.toBytes("value1"));
                put.addColumn(cf, Bytes.toBytes("col2"), Bytes.toBytes("value2"));
                table.put(put);

                // Read row "row1" back by its row key and extract both cells
                Get get = new Get(Bytes.toBytes("row1"));
                Result result = table.get(get);
                byte[] value1 = result.getValue(cf, Bytes.toBytes("col1"));
                byte[] value2 = result.getValue(cf, Bytes.toBytes("col2"));

                System.out.println("Col1: " + Bytes.toString(value1));
                System.out.println("Col2: " + Bytes.toString(value2));
            }
        }
    }
}

The code above creates a table with one column family, cf, writes two cells to row row1 (cf:col1 and cf:col2), and reads them back by row key. Every operation addresses data by row key, column family, and column qualifier, which maps directly onto the data model described above.

To sum up, HBase's data model is column-family-oriented: data is organized and stored through tables, rows identified by row keys, column families, and column qualifiers. Its main characteristics are a flexible, sparse set of columns, storage by column family, and multi-version cells. This makes HBase suitable for storing and processing massive datasets while still supporting low-latency random reads and writes by row key.

Origin blog.csdn.net/qq_51447496/article/details/132725801