HBase: how many regions fit a table, when pre-splitting or while running?

How many regions count as too few, or too many? For a given table (regardless of HBase's configuration), some conditions are worth considering:

- First, keep the cluster more 'static'

 When you run a cluster with heavy writes (updates), each RegionServer holds many large, short-lived objects in its memstores and flushes a memstore whenever it exceeds the threshold. So if you see too few regions (for one table) on a RegionServer, each region receives a larger share of the writes, its memstore fills up faster, and flushes become frequent and consume a lot of bandwidth.
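To make the flush argument concrete, here is a rough sketch. It assumes the table's writes on one RegionServer are spread evenly over its regions and that a region flushes when its memstore reaches the flush size (`hbase.hregion.memstore.flush.size`). The function name and the load figures are hypothetical, and other flush triggers (global memstore pressure, periodic flushes) are ignored.

```python
def seconds_between_flushes(write_bytes_per_sec: float,
                            num_regions: int,
                            flush_size_bytes: int) -> float:
    """Approximate time for one region's memstore to reach the flush size.

    Simplifying assumption: the RegionServer's write load for this table is
    split evenly across num_regions regions.
    """
    if num_regions <= 0:
        raise ValueError("num_regions must be positive")
    per_region_rate = write_bytes_per_sec / num_regions
    return flush_size_bytes / per_region_rate


MB = 1024 * 1024

# Hypothetical load: 20 MB/s of writes for one table on one RegionServer,
# with a 128 MB flush size.
for regions in (2, 8, 32):
    interval = seconds_between_flushes(20 * MB, regions, 128 * MB)
    print(f"{regions:>2} regions: each region flushes about every {interval:.1f}s")
```

With only 2 regions each memstore fills in seconds. Spreading the same load over more regions stretches the interval between flushes of any single region.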

 Conversely, if the memstore sizes of a RegionServer's regions stay low throughout a period (e.g. one day), that shows you have many more regions than you need.
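The opposite check can be sketched the same way. This assumes you have sampled each region's peak memstore size over the observation window from your own monitoring. The function name, region names, sample values and the 25% threshold are hypothetical choices, not HBase defaults.

```python
from typing import Dict


def idle_region_ratio(peak_memstore_bytes: Dict[str, int],
                      flush_size_bytes: int,
                      threshold: float = 0.25) -> float:
    """Fraction of regions whose peak memstore size over the window stayed
    below threshold * flush_size (a hypothetical heuristic)."""
    if not peak_memstore_bytes:
        return 0.0
    limit = threshold * flush_size_bytes
    idle = sum(1 for size in peak_memstore_bytes.values() if size < limit)
    return idle / len(peak_memstore_bytes)


MB = 1024 * 1024

# Hypothetical one-day peaks for four regions of one table on one RegionServer.
peaks = {"region-a": 5 * MB, "region-b": 8 * MB,
         "region-c": 90 * MB, "region-d": 3 * MB}
ratio = idle_region_ratio(peaks, flush_size_bytes=128 * MB)
print(f"{ratio:.0%} of regions never got near the flush size")
```

If most regions stay far below the flush size day after day, the table probably has more regions than its write load needs.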

- Second, for load balance

 Once the requirements above are met, you SHOULD further consider whether your cluster is load balanced. For example, if the table has fewer regions in total than there are RegionServers in the cluster, some RegionServers will host none of its regions and will be lightly loaded.
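A minimal sketch of this check follows. It assumes you already have a mapping from RegionServer name to the number of this table's regions it hosts, collected however you like. The server names and counts are made up.

```python
from typing import Dict, List


def fewer_regions_than_servers(regions_per_server: Dict[str, int]) -> bool:
    """True when the table has fewer regions than there are RegionServers."""
    return sum(regions_per_server.values()) < len(regions_per_server)


def idle_servers(regions_per_server: Dict[str, int]) -> List[str]:
    """RegionServers that host none of this table's regions."""
    return sorted(s for s, n in regions_per_server.items() if n == 0)


# Hypothetical 4-node cluster with a 2-region table.
layout = {"rs1": 1, "rs2": 1, "rs3": 0, "rs4": 0}
print(fewer_regions_than_servers(layout), idle_servers(layout))
```

With 2 regions on 4 servers, at least two servers necessarily do no work for this table.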

 More regions increase parallelism, because the rows clients retrieve are sharded across regions. BUT the drawback is that more regions need more open files and more DataNode connections; see [1].

 That said, there is a simple rule of thumb:

 num = max(data-size / cluster-num / flush-size,cluster-num)
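The rule of thumb can be written as a small function. The source does not define the terms, so this sketch makes several assumptions:

- data-size is the table's expected total size.
- cluster-num is the number of RegionServers.
- flush-size is the memstore flush size, in the same unit as data-size.
- Rounding up to a whole number of regions is also an assumption.

For pre-splitting, the count is also turned into evenly spaced split keys. That step assumes row keys begin with a uniformly distributed 8-digit hex prefix, which is a hypothetical key design.

```python
import math
from typing import List


def suggested_region_count(data_size: float, cluster_num: int,
                           flush_size: float) -> int:
    """num = max(data-size / cluster-num / flush-size, cluster-num), rounded up.

    data_size and flush_size must use the same unit (e.g. bytes).
    """
    if cluster_num <= 0 or flush_size <= 0:
        raise ValueError("cluster_num and flush_size must be positive")
    by_size = data_size / cluster_num / flush_size
    return max(math.ceil(by_size), cluster_num)


def hex_split_keys(num_regions: int) -> List[str]:
    """num_regions - 1 evenly spaced split points over an 8-hex-digit keyspace.

    Assumes (hypothetically) that row keys start with a uniform 8-digit hex prefix.
    """
    space = 16 ** 8
    return [format(i * space // num_regions, "08x") for i in range(1, num_regions)]


GB = 1024 ** 3
MB = 1024 ** 2

# Hypothetical: 100 GB table, 10 RegionServers, 128 MB flush size.
n = suggested_region_count(100 * GB, 10, 128 * MB)
print(n, hex_split_keys(n)[:3])
```

The second term of `max` acts as the load-balance floor: even a small table gets at least one region per RegionServer.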

 Anyhow, finding an appropriate number of regions is not tricky, but it takes some time.

[1] some important optimized advices for hbase-0.94.x 


Reposted from leibnitz.iteye.com/blog/2085995