【Introduction to Apache Accumulo】

Apache Accumulo is a reliable, scalable, high-performance, sorted and distributed key-value store that offers cell-level access control and customizable server-side processing.



 

LevelDB is a very efficient key-value database developed by Google. It supports data sets with billions of entries and keeps very high performance at that scale, mainly thanks to its design, in particular its use of a log-structured merge (LSM) tree. Riak and Kyoto Tycoon both support LevelDB as a storage engine. Tair, the open-source key-value store from Taobao in China, also uses LevelDB as its persistent storage engine and has deployed it in production.

 

Apache Accumulo is based on the design of Google's BigTable and is powered by Apache Hadoop, Apache ZooKeeper, and Apache Thrift.

 

 

Accumulo has several novel features such as cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.
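To make cell-based access control more concrete, here is a minimal sketch in Python of how a visibility label on a cell could be checked against the authorizations a reader presents. This is not Accumulo's implementation: the function names `is_visible` and `scan`, the sample cells, and the rule that `&` binds tighter than `|` are all assumptions made for illustration (Accumulo itself, as far as I know, expects parentheses when the two operators are mixed). Malformed expressions are not handled.

```python
import re


def is_visible(expression, authorizations):
    """Return True if the reader's authorizations satisfy the expression.

    Supports labels, '&', '|' and parentheses. Hypothetical helper,
    not part of any Accumulo API.
    """
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    # An empty label means the cell is visible to every reader.
    return True if not tokens else parse_or()


def scan(cells, authorizations):
    """Return only the (row, column, value) triples the reader may see."""
    return [
        (row, column, value)
        for row, column, visibility, value in cells
        if is_visible(visibility, authorizations)
    ]


# Made-up sample data: (row, column, visibility label, value)
CELLS = [
    ("row1", "name", "", "Alice"),
    ("row1", "salary", "hr&manager", "90000"),
    ("row1", "phone", "hr|support", "555-0100"),
]
```

With this sketch, `scan(CELLS, {"support"})` returns the name and phone cells but hides the salary cell, because the filtering happens per cell rather than per row or per table.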

 

 

Accumulo is a distributed data storage and retrieval system and as such consists of several architectural components, some of which run on many individual servers. Much of the work Accumulo does involves maintaining certain properties of the data, such as organization, availability, and integrity, across many commodity-class machines.

 



 

2. Accumulo Components

An instance of Accumulo includes many TabletServers, one Garbage Collector process, one Master server and many Clients.

2.3.1 Tablet Server

The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a write-ahead log, sorting new key-value pairs in memory, periodically flushing sorted key-value pairs to new files in HDFS, and responding to reads from clients, forming a merge-sorted view of all keys and values from all the files it has created and the sorted in-memory store.

TabletServers also perform recovery of a tablet that was previously on a server that failed, reapplying any writes found in the write-ahead log to the tablet.
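The write path, read path and recovery described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Accumulo code: the class `TabletSketch`, the function `recover` and the `flush_threshold` parameter are hypothetical, plain Python lists stand in for the write-ahead log and for files in HDFS, and keys are simple strings without the timestamps and versioning that real Accumulo keys carry.

```python
import heapq


class TabletSketch:
    """Toy model of one tablet: write-ahead log, in-memory store, flushed files."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # writes that have not yet been flushed to a file
        self.memtable = {}   # in-memory store for recent writes
        self.files = []      # each file is a sorted list of (key, value); newest last
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.wal.append((key, value))   # log first, so the write survives a crash
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the in-memory pairs, sorted by key, to a new immutable file."""
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.wal = []   # flushed writes no longer need to be replayed

    def scan(self):
        """Merge-sorted view over the in-memory store and all files."""
        # Order sources newest first; the index doubles as a tie-breaker so
        # that the newest value for a key is seen first in the merge.
        sources = [sorted(self.memtable.items())] + self.files[::-1]
        tagged = [[(k, age, v) for k, v in src] for age, src in enumerate(sources)]
        result, last_key = [], object()
        for key, _age, value in heapq.merge(*tagged):
            if key != last_key:          # skip older values of the same key
                result.append((key, value))
                last_key = key
        return result


def recover(wal, files, flush_threshold=3):
    """Rebuild a tablet on another server from surviving files plus the log."""
    tablet = TabletSketch(flush_threshold)
    tablet.files = [list(f) for f in files]
    for key, value in list(wal):
        tablet.write(key, value)         # reapply writes that were never flushed
    return tablet
```

For example, after writing `b`, `a` and then a new value for `a` with `flush_threshold=2`, the first two writes sit in a file and the third only in memory and in the log; calling `recover` with that log and file list yields a tablet whose `scan()` returns the same pairs as the original.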

 

2.3.2 Garbage Collector

Accumulo processes will share files stored in HDFS. Periodically, the Garbage Collector will identify files that are no longer needed by any process, and delete them. Multiple garbage collectors can be run to provide hot-standby support. They will perform leader election among themselves to choose a single active instance.
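The idea of "files no longer needed by any process" comes down to a set difference between what exists in storage and what is still referenced. The sketch below shows only that idea; the function `find_garbage`, the tablet names and the file names are invented for illustration, and the real Garbage Collector works from Accumulo's own metadata rather than from an in-memory dictionary.

```python
def find_garbage(files_in_storage, references_by_tablet):
    """Return the files that no tablet references any more, in sorted order."""
    in_use = set()
    for files in references_by_tablet.values():
        in_use.update(files)
    return sorted(set(files_in_storage) - in_use)


# Hypothetical state after two small files were compacted into one larger file:
# both tablets share C0003.rf, and nothing references F0001.rf or F0002.rf.
FILES_IN_STORAGE = ["F0001.rf", "F0002.rf", "C0003.rf", "F0004.rf"]
REFERENCES = {
    "tablet-a": ["C0003.rf"],
    "tablet-b": ["C0003.rf", "F0004.rf"],
}
```

Here `find_garbage(FILES_IN_STORAGE, REFERENCES)` returns `["F0001.rf", "F0002.rf"]`; a file shared by several tablets stays safe as long as at least one reference remains.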

 

2.3.3 Master

The Accumulo Master is responsible for detecting and responding to TabletServer failure. It tries to balance the load across TabletServers by assigning tablets carefully and instructing TabletServers to unload tablets when necessary. The Master ensures that every tablet is assigned to exactly one TabletServer, and handles table creation, alteration, and deletion requests from clients. The Master also coordinates startup, graceful shutdown and recovery of changes in write-ahead logs when TabletServers fail.

Multiple masters may be run. The masters elect a single active master among themselves, and the others act as backups in case the active master fails.
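As a rough illustration of the Master's assignment duty, the following sketch reassigns the tablets of a failed TabletServer to the least-loaded surviving servers, so that every tablet ends up on exactly one server. The function `reassign_after_failure`, the server names and the "fewest tablets wins" policy are assumptions for this example; Accumulo's actual balancer is pluggable and considers more than a simple tablet count. The sketch also assumes at least one live server.

```python
def reassign_after_failure(assignment, live_servers):
    """Return a new tablet -> server mapping that uses only live servers.

    Tablets on live servers stay where they are; tablets from failed
    servers go, one at a time, to the server with the fewest tablets.
    """
    load = {server: 0 for server in live_servers}
    new_assignment, orphaned = {}, []
    for tablet, server in sorted(assignment.items()):
        if server in load:
            new_assignment[tablet] = server
            load[server] += 1
        else:
            orphaned.append(tablet)      # its server is no longer alive
    for tablet in orphaned:
        # Break ties by server name so the result is deterministic.
        target = min(live_servers, key=lambda s: (load[s], s))
        new_assignment[tablet] = target
        load[target] += 1
    return new_assignment


# Hypothetical cluster state before server "ts1" fails.
BEFORE = {"t1": "ts1", "t2": "ts1", "t3": "ts2", "t4": "ts3"}
```

Calling `reassign_after_failure(BEFORE, ["ts2", "ts3"])` moves `t1` to `ts2` and `t2` to `ts3`, leaving two tablets on each survivor.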

 

2.3.4 Tracer

The Accumulo Tracer process supports the distributed timing API provided by Accumulo. One or more of these processes can be run on a cluster, and they write the timing information to a given Accumulo table for future reference. See the section on Tracing for more information on this support.

 

2.3.5 Monitor

The Accumulo Monitor is a web application that provides a wealth of information about the state of an instance. The Monitor shows graphs and tables which contain information about read/write rates, cache hit/miss rates, and Accumulo table information such as scan rate and active/queued compactions. Additionally, the Monitor should always be the first point of entry when attempting to debug an Accumulo problem as it will show high-level problems in addition to aggregated errors from all nodes in the cluster. See the section on Monitoring for more information.

Multiple Monitors can be run to provide hot-standby support in the face of failure. Due to the forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active at one time. Leader election will be performed internally to choose the active Monitor.

 

2.3.6 Client

Accumulo includes a client library that is linked to every application. The client library contains logic for finding servers managing a particular tablet, and communicating with TabletServers to write and retrieve key-value pairs.
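Because a table is sorted by row and cut into tablets at split points, finding the server for a row is essentially a binary search. The sketch below illustrates that lookup only; the class `TabletLocatorSketch`, the split points and the server addresses are hypothetical, and the real client library resolves locations through Accumulo's own metadata and caches the results. The sketch assumes each tablet covers rows up to and including its split point, with the last tablet open-ended.

```python
import bisect


class TabletLocatorSketch:
    """Map a row to the server hosting the tablet that contains it."""

    def __init__(self, split_points, servers):
        # N split points cut the table into N + 1 tablets.
        if len(servers) != len(split_points) + 1:
            raise ValueError("need exactly one server entry per tablet")
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = list(split_points)
        self.servers = list(servers)

    def locate(self, row):
        # bisect_left returns the index of the first split point >= row,
        # which is the tablet whose range ends at or after this row.
        index = bisect.bisect_left(self.split_points, row)
        return self.servers[index]


# Hypothetical layout: rows up to "g", rows after "g" up to "p", and the rest.
LOCATOR = TabletLocatorSketch(
    split_points=["g", "p"],
    servers=["ts1:9997", "ts2:9997", "ts3:9997"],
)
```

With this layout, `LOCATOR.locate("apple")` and `LOCATOR.locate("g")` both resolve to `ts1:9997`, `LOCATOR.locate("grape")` to `ts2:9997`, and `LOCATOR.locate("zebra")` to `ts3:9997`.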

 
