hbase中二级索引的实现--ihbase

一般来说,对数据库建立索引,往往需要单独的数据结构来存储索引的数据.在为hbase建立索引时,可以另外建立一张索引表,查询时先查询索引表,然后用查询结果查询数据表.

这个图左边表示索引表,右边是数据表.

但是对于hbase这种分布式的数据库来说,最大的问题是解决索引表和数据表的本地性问题,hbase很容易就因为负载均衡,表split等原因把索引表和数据表的数据分布到不同的region server上,比如下图中,数据表和索引表就出现在了不同的region server上

所以为了解决这个问题, ihbase项目应运而生,它的主要思想是在region级别建立索引而不是在表级别.

它的解决方案是用IdxRegion代替了常规的region实现,在flush的时候为region建立索引

@Override
  protected void internalPreFlashcacheCommit() throws IOException {
    rebuildIndexes();
    super.internalPreFlashcacheCommit();
  }

在scan的时候,提供特殊的scanner

 @Override
  protected InternalScanner instantiateInternalScanner(Scan scan,
    List<KeyValueScanner> additionalScanners) throws IOException {
    Expression expression = IdxScan.getExpression(scan);
    if (scan == null || expression == null) {
      totalNonIndexedScans.incrementAndGet();
      return super.instantiateInternalScanner(scan, additionalScanners);
    } else {
      totalIndexedScans.incrementAndGet();
      // Grab a new search context
      IdxSearchContext searchContext = indexManager.newSearchContext();
      // use the expression evaluator to determine the final set of ints
      IntSet matchedExpression = expressionEvaluator.evaluate(searchContext,
        expression);
      if (LOG.isDebugEnabled()) {
        LOG.debug(String.format("%s rows matched the index expression",
          matchedExpression.size()));
      }
      return new IdxRegionScanner(scan, searchContext, matchedExpression);
    }
  }

ihbase在内存中为region维护了一份索引,在scan的时候首先在索引中查找数据,按顺序提供rowkey,而在常规的scan时,能利用上一步的rowkey来move forward,有目的的进行seek.

IdxRegionScanner在进行scan的时候,用索引来构造keyProvider,然后执行next方法时,用keyProvider提供的rowkey进行定位

    @Override
    public boolean next(List<KeyValue> outResults) throws IOException {
      // Seek to the next key value
      seekNext();
      boolean result = super.next(outResults);

     //省略部分代码

      return result;
    }

seekNext方法就是从keyProvider取得下一个rowkey,然后跳到该rowkey

    protected void seekNext() throws IOException {
      KeyValue keyValue;
      do {
        keyValue = keyProvider.next();

        if (keyValue == null) {
          // out of results keys, nothing more to process
          super.getStoreHeap().close();
          return;
        } else if (lastKeyValue == null) {
          // first key returned from the key provider
          break;
        } else {
          // it's possible that the super nextInternal method progressed past the
          // ketProvider's next key.  We need to keep calling next on the keyProvider
          // until the key returned is after the last key returned from the
          // next(List<KeyValue>) method.

          // determine which of the two keys is less than the other
          // when the keyValue is greater than the lastKeyValue then we're good
          int comparisonResult = comparator.compareRows(keyValue, lastKeyValue);
          if (comparisonResult > 0) {
            break;
          }
        }
      } while (true);

      // seek the store heap to the next key
      // (this is what makes the scanner faster)
      getStoreHeap().seek(keyValue);
    }

我感觉这种实现问题在于内存占用很高,而且不知道如果region如果load balance到其他region server上,还能不能保持索引和数据的一致性

hbase中二级索引的实现--ihbase

猜你喜欢