HBase FAQ

1. HBase hotspotting (data skew): read and write requests concentrate on a single RegionServer

Causes of hotspots:

1. HBase stores rows sorted lexicographically by rowkey. When a large number of consecutive rowkeys are written, they land in the same few regions, so data is distributed unevenly across regions.

2. The table was created without pre-splitting. A new table has only one region by default, so all writes go to that single region.

3. The table was pre-split, but the rowkey design follows no rule that matches the split points, so writes still pile up in a few regions.

Solution: pre-split the table and follow the three rowkey design principles.
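The source does not spell out the design principles, so the following is only a sketch of two commonly used techniques for spreading sequential rowkeys: a hash-derived salt prefix and key reversal. The class name RowKeyDesign, the two-digit "NN_" prefix format, and the bucket count are assumptions made for illustration. The splitKeys helper produces split points that line up with the salt buckets, so each bucket would fall into its own pre-split region.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Toy helpers for spreading sequential rowkeys across pre-split regions. */
final class RowKeyDesign {
    private RowKeyDesign() {}

    /** Prefixes the key with a two-digit bucket derived from its hash (assumes buckets <= 100). */
    static String salted(String rowKey, int buckets) {
        // floorMod keeps the bucket non-negative even when hashCode() is negative.
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return String.format("%02d_%s", bucket, rowKey);
    }

    /** Reverses the key so that a monotonically increasing tail becomes a varied prefix. */
    static String reversed(String rowKey) {
        return new StringBuilder(rowKey).reverse().toString();
    }

    /** Split points "01_", "02_", ... giving one region per salt bucket. */
    static List<byte[]> splitKeys(int buckets) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = 1; i < buckets; i++) {
            keys.add(String.format("%02d_", i).getBytes(StandardCharsets.UTF_8));
        }
        return keys;
    }
}
```

Because the salt is computed from the key itself, a reader can recompute the prefix for a point lookup. The trade-off is that a range scan over the original key order has to be issued once per bucket.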

2. What do flush, compact, and split do in HBase?

Flush (threshold: 128 MB)

        When a MemStore reaches the threshold, its data is flushed to disk as a new StoreFile.

Compact (triggered when a store accumulates 3 StoreFiles)

        Compaction merges the small files produced by flushes into larger StoreFiles. The RegionServer performs the merge, not the HMaster.

        It combines files and removes deleted, expired, and surplus versions of data, which mainly improves read efficiency.

Split

        When a Region reaches the threshold (256 MB, the default of early HBase releases; later releases default to 10 GB), the oversized Region is split in two at its MiddleKey, so the two halves are not necessarily equal in size.
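The interplay of flush, compact, and split can be sketched with a toy in-memory model. This is not HBase code: the class ToyStore is hypothetical, thresholds count entries rather than bytes, and every flush that reaches the file-count threshold merges all files at once, which is a simplification of real compaction policies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one store: a sorted in-memory buffer plus immutable sorted files. */
final class ToyStore {
    private final int flushThreshold;    // stand-in for the MemStore flush size
    private final int compactThreshold;  // stand-in for the StoreFile count that triggers compaction
    private TreeMap<String, String> memStore = new TreeMap<>();
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    ToyStore(int flushThreshold, int compactThreshold) {
        this.flushThreshold = flushThreshold;
        this.compactThreshold = compactThreshold;
    }

    void put(String rowKey, String value) {
        memStore.put(rowKey, value);
        if (memStore.size() >= flushThreshold) flush();
    }

    /** Flush: the full MemStore becomes a new StoreFile and a fresh MemStore takes over. */
    private void flush() {
        storeFiles.add(memStore);
        memStore = new TreeMap<>();
        if (storeFiles.size() >= compactThreshold) compact();
    }

    /** Compact: merge files oldest-first so that newer values overwrite older ones. */
    private void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) merged.putAll(file);
        storeFiles.clear();
        storeFiles.add(merged);
    }

    /** Read path: check the MemStore first, then files from newest to oldest. */
    String get(String rowKey) {
        if (memStore.containsKey(rowKey)) return memStore.get(rowKey);
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String value = storeFiles.get(i).get(rowKey);
            if (value != null) return value;
        }
        return null;
    }

    int storeFileCount() { return storeFiles.size(); }

    /** Split point: the middle key of the largest file, or null if nothing was flushed yet. */
    String middleKey() {
        TreeMap<String, String> largest = null;
        for (TreeMap<String, String> file : storeFiles) {
            if (largest == null || file.size() > largest.size()) largest = file;
        }
        if (largest == null) return null;
        return new ArrayList<>(largest.keySet()).get(largest.size() / 2);
    }
}
```

The model also shows why compaction helps reads: before the merge, a lookup may have to probe every file, while afterwards there is only one.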

3. How is data recovered when an HBase RegionServer goes down?

4. HBase optimization

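Memory optimization: tune garbage collection (CMS, or G1, which manages the heap in regions) and set the JVM heap explicitly with -Xms and -Xmx (the JVM defaults are 1/64 and 1/4 of physical memory, respectively).

Region optimization: pre-split tables; disable automatic major compaction and run it manually.

Client optimization: batch requests instead of sending them one at a time.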
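Client-side batching can be sketched without the HBase client on the classpath. The class BatchBuffer below is hypothetical; its sink parameter stands in for whatever call actually sends the batch, for example passing a list of Put objects to Table.put in a real client. The point is only that many small writes are grouped into a few larger requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects items and hands them to a sink in fixed-size batches. */
final class BatchBuffer<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink; // hypothetical stand-in for the call that sends one batch
    private List<T> pending = new ArrayList<>();

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and sends a batch as soon as the buffer is full. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) flush();
    }

    /** Sends whatever is buffered; does nothing when the buffer is empty. */
    void flush() {
        if (pending.isEmpty()) return;
        List<T> batch = pending;
        pending = new ArrayList<>();
        sink.accept(batch);
    }

    /** Sends the final partial batch so that no buffered item is lost. */
    @Override
    public void close() { flush(); }
}
```

Used in a try-with-resources block, the buffer sends the last partial batch automatically when the block exits.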

5. Differences between HBase, Hive, and Redis

Hive is built on MapReduce: it translates HQL into MR jobs. It is slow and not suitable for real-time data access.

HBase is a data store built on Hadoop. It stores massive amounts of data and has its own query operations.

Redis is a distributed, in-memory cache. It emphasizes caching, supports data persistence, and supports transactions.

Both Redis and HBase are key-value stores.

Application scenarios:

Hive suits offline data analysis and cleaning, where high latency is acceptable.

HBase has low latency and suits online business workloads that need fast data access.

6. Describe HBase's scan and get: what they do, and how their implementations are similar and different

Scan: reads a range of rows

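import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

/**
     * Query data with a Scan and row filters.
     * connHolder is defined elsewhere in the enclosing class (presumably a ThreadLocal<Connection>).
     */
    public static void getDataByScanFilter(String tableName) throws IOException {
        Connection conn = connHolder.get();

        // Scan object used to walk the table's regions
        Scan scan = new Scan();

        // Filters are evaluated for every row, so they are relatively slow
        // Byte-array comparator
        BinaryComparator bc = new BinaryComparator(Bytes.toBytes("1001"));
        // Regular-expression comparator: rowkeys made of exactly three digits
        RegexStringComparator rc = new RegexStringComparator("^\\d{3}$");

        Filter f1 = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL, bc);
        // Regex comparators are only meaningful with EQUAL or NOT_EQUAL
        Filter f2 = new RowFilter(CompareFilter.CompareOp.EQUAL, rc);
        // To apply a single filter instead:
        // scan.setFilter(f1);

        // Combine several filters:
        //   MUST_PASS_ALL = logical AND, MUST_PASS_ONE = logical OR
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
        filterList.addFilter(f1);
        filterList.addFilter(f2);
        // Attach the filter list; without this call the scan runs unfiltered
        scan.setFilter(filterList);

        // try-with-resources closes the scanner and the table when done
        try (Table table = conn.getTable(TableName.valueOf(tableName));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                for (Cell cell : result.rawCells()) {
                    // Row key
                    System.out.println("Row key: " + Bytes.toString(CellUtil.cloneRow(cell)));
                    // Column family
                    System.out.println("Column family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
                    System.out.println("Qualifier: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
                    System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
                }
            }
        }
    }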

Get: reads a single row by rowkey

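import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

/**
     * Fetch one row of data.
     * connHolder is defined elsewhere in the enclosing class (presumably a ThreadLocal<Connection>).
     */
    public static void getRow(String tableName, String rowKey) throws IOException {
        Connection conn = connHolder.get();
        // try-with-resources closes the table when done
        try (Table table = conn.getTable(TableName.valueOf(tableName))) {
            Get get = new Get(Bytes.toBytes(rowKey));
            // get.setMaxVersions();          return all versions
            // get.setTimeStamp(timestamp);   return only the version with the given timestamp
            Result result = table.get(get);
            for (Cell cell : result.rawCells()) {
                System.out.println("Row key: " + Bytes.toString(result.getRow()));
                System.out.println("Column family: " + Bytes.toString(CellUtil.cloneFamily(cell)));
                System.out.println("Qualifier: " + Bytes.toString(CellUtil.cloneQualifier(cell)));
                System.out.println("Value: " + Bytes.toString(CellUtil.cloneValue(cell)));
                System.out.println("Timestamp: " + cell.getTimestamp());
            }
        }
    }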
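The two snippets show the similarity and the difference in practice: both go through the same Table interface and return Result objects, but a Get addresses exactly one rowkey while a Scan walks a sorted range of rowkeys. On the server side a Get is, to my understanding, executed as a scan restricted to a single row. The contrast can be sketched with a sorted map standing in for a table; SortedRows is a hypothetical class and not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy stand-in for a table whose rows are kept sorted by rowkey. */
final class SortedRows {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) { rows.put(rowKey, value); }

    /** Get: exact lookup of one rowkey; returns null when the row does not exist. */
    String get(String rowKey) { return rows.get(rowKey); }

    /** Scan: all values with startRow <= rowkey < stopRow, in rowkey order. */
    List<String> scan(String startRow, String stopRow) {
        // subMap throws IllegalArgumentException if startRow sorts after stopRow.
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).values());
    }
}
```

The inclusive start and exclusive stop mirror the default bounds of an HBase Scan.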

7. Can multiple HMasters be started in an HBase cluster? Can these HMasters run in parallel?

Yes, multiple HMasters can be started at the same time (for example, two for high availability), but only one of them is active at any moment; the others remain on standby.
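The active/standby arrangement can be illustrated with a toy model. HBase coordinates its masters through ZooKeeper; here a single synchronized field stands in for that coordination, and the class MasterElection and its method names are hypothetical.

```java
/** Toy model: many masters may start, but only one can hold the active slot. */
final class MasterElection {
    private String active; // null means no master is currently active

    /** Returns true if this master became active; false means it stays on standby. */
    synchronized boolean tryBecomeActive(String masterName) {
        if (active != null) return false;
        active = masterName;
        return true;
    }

    /** Models the active master going away; a call from a standby master has no effect. */
    synchronized void release(String masterName) {
        if (masterName.equals(active)) active = null;
    }

    synchronized String activeMaster() { return active; }
}
```

Once the active master releases the slot, any standby master that retries can take over, which is the behaviour a high-availability setup relies on.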

Origin blog.csdn.net/QJQJLOVE/article/details/107280729