1. HBase region splitting
- Automatic splitting (the default)
  - In version 2.0, a region is split the first time its data reaches 256M, and after that each time a region reaches 10G. Once a split completes, the new regions are load-balanced to other regionservers
- Pre-partitioning + custom rowkey
  - Can be understood as splitting the table in advance
  - For example, pre-partition the table into 10 regions spread across the regionservers; each region has its own startrow and endrow
  - In production, always use pre-partitioning + a custom rowkey
  - Once pre-partitioning completes, the 10 regions exist as empty files even before any data is written
  - Data written later is spread evenly across the regions, provided the rowkey is designed to match the pre-partitioned ranges
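
The two ideas above can be sketched in plain Python, without the HBase client. This is a minimal illustration, not HBase's implementation: the 10-region layout with split keys "1" to "9", the one-digit MD5 salt, the `|` separator, and the function names are all assumptions made for the example.

```python
import bisect
import hashlib

MB = 1024 * 1024
GB = 1024 * MB

def split_threshold(regions_of_table_on_server, flush_size=128 * MB, max_file_size=10 * GB):
    """Region size that triggers an automatic split, as described in the notes."""
    # First split: only one region of the table exists, so split early at 2 x 128M = 256M.
    if regions_of_table_on_server == 1:
        return 2 * flush_size
    # Afterwards: split only when a region reaches 10G.
    return max_file_size

# Assumed pre-partition layout: 10 regions need 9 split keys, "1" .. "9".
# Region 0 covers ["", "1"), region 1 covers ["1", "2"), ..., region 9 covers ["9", "").
NUM_REGIONS = 10
SPLIT_KEYS = [str(i) for i in range(1, NUM_REGIONS)]

def salted_rowkey(business_key):
    """Custom rowkey: a stable one-digit salt prefix spreads keys over all regions."""
    digest = hashlib.md5(business_key.encode("utf-8")).hexdigest()
    salt = int(digest, 16) % NUM_REGIONS
    return f"{salt}|{business_key}"

def region_for(rowkey):
    """Index of the region whose [startrow, endrow) range contains the rowkey."""
    return bisect.bisect_right(SPLIT_KEYS, rowkey)
```

Because the salt is derived from the key itself, the same business key always maps to the same region, so a point read can recompute the rowkey. The trade-off is that range scans over the original key order now have to visit every region.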
2. HBase major compaction and minor compaction
Major compaction: merges the files and physically deletes expired data; in enterprise deployments it runs once every 7 days
Minor compaction: only merges adjacent files; expired data is marked but not deleted
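
The difference between the two can be sketched with a toy model. This is a simplified illustration under assumptions, not HBase's compaction code: a file is modeled as a list of `(row, timestamp, kind, value)` cells, `kind` is `"put"` or `"delete"`, and the function names and the `ttl` parameter are hypothetical.

```python
def minor_compact(store_files, start, count):
    """Merge `count` adjacent files starting at `start`; every cell is kept,
    including delete markers and the data they mask."""
    merged = sorted(cell for f in store_files[start:start + count] for cell in f)
    return store_files[:start] + [merged] + store_files[start + count:]

def major_compact(store_files, now, ttl):
    """Merge all files into one and physically drop expired cells,
    delete markers, and the cells those markers mask."""
    cells = [cell for f in store_files for cell in f]
    # Latest delete-marker timestamp per row.
    deleted_at = {}
    for row, ts, kind, _ in cells:
        if kind == "delete":
            deleted_at[row] = max(deleted_at.get(row, ts), ts)
    kept = [
        (row, ts, kind, value)
        for row, ts, kind, value in cells
        if kind == "put"                   # markers themselves are removed
        and now - ts <= ttl                # expired cells are removed
        and ts > deleted_at.get(row, -1)   # cells masked by a marker are removed
    ]
    return [sorted(kept)]
```

A minor compaction is cheap because it only rewrites a few files, while a major compaction rewrites everything, which is why it is scheduled rarely.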
3. HBase MemStore flush
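- Manual flush
  - Run the flush command (for example, flush 'table_name' in the HBase shell)
- Scheduled flush
  - Controlled by configuration parameters
- MemStore reaches 128M
  - If a region has many MemStores and none of them reaches 128M, a region-level size limit of 512M can be set so that the region is still flushed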
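
The automatic triggers above can be written as one decision function. This is a sketch of the rules as stated in these notes, not the regionserver's actual logic; the function name, the return labels, and the one-hour `interval` are assumptions for illustration.

```python
MB = 1024 * 1024

def flush_reason(memstore_sizes, seconds_since_flush,
                 memstore_limit=128 * MB, region_limit=512 * MB, interval=3600):
    """Return which rule triggers a flush, or None if no rule applies.

    memstore_sizes: size in bytes of each MemStore in one region.
    """
    # Rule 1: a single MemStore reaches 128M.
    if any(size >= memstore_limit for size in memstore_sizes):
        return "memstore_size"
    # Rule 2: no single MemStore is full, but the region total reaches 512M.
    if sum(memstore_sizes) >= region_limit:
        return "region_size"
    # Rule 3: scheduled flush after the configured interval (assumed: 1 hour).
    if memstore_sizes and seconds_since_flush >= interval:
        return "scheduled"
    return None
```

For example, six MemStores of 100M each never hit the per-MemStore rule, but their 600M total exceeds the region limit, so the second rule fires.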
4. HBase secondary index
4.1. Problem
If an HBase query filters on anything other than the rowkey, it has to scan the whole table.
Example:
With id as the rowkey, filtering by name scans every row of the table below.
id   name   age
1    ikun   19
4.2. Solution
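Adding a secondary index simply means creating a second table that uses name as its rowkey and stores the id. A query by name reads the index table first, then fetches the full row from the original table by id.
name   id
ikun   1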
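
The two lookup paths can be compared with a small in-memory model. This is a sketch only: both tables are plain dictionaries keyed by rowkey, the function names are hypothetical, and names are assumed to be unique (with duplicates, the index rowkey would need to include the id as well).

```python
data_table = {}   # rowkey (id)   -> {"name": ..., "age": ...}
index_table = {}  # rowkey (name) -> id

def put_user(user_id, name, age):
    """Write the row and keep the index table in sync with it."""
    data_table[user_id] = {"name": name, "age": age}
    index_table[name] = user_id

def find_by_name_full_scan(name):
    """Without an index: every row of the data table is inspected."""
    return [uid for uid, row in data_table.items() if row["name"] == name]

def find_by_name_indexed(name):
    """With the index: two rowkey lookups, no traversal."""
    user_id = index_table.get(name)
    if user_id is None:
        return None
    return user_id, data_table[user_id]

put_user("1", "ikun", 19)
```

The cost moves from read time to write time: every write to the data table now also needs a write to the index table, and the two have to be kept consistent.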