YunTable development diary (3) - BigTable data model and Call Interface (reprint)

 

Source Address: http://peopleyun.com/?p=665

 

This article analyzes BigTable's data model in depth and describes how it is called.

Data Model

As mentioned earlier, and as its name suggests, BigTable is a very large table: one capable of storing billions of rows (Row) and several thousand columns (Column). How large can such a table get? Two simple examples: a table holding every site and its content on the Internet, or a table holding the personal information of all Chinese citizens. The overall size of such tables can exceed the PB level, and they keep growing over time, so we clearly need a distributed approach rather than a single machine to carry a huge and growing table. First, this article introduces the basic unit of the BigTable data model, which is the table.

Table

Figure 1. Table

This is the Table. Although the screenshot above shows only three Rows and five Columns, a table that stores the personal information of all Chinese citizens would have more than 1.3 billion Rows and hundreds of Columns. Next come two features that improve access efficiency and scalability: the Column Family (column group) and the Tablet (sheet).

Column Family

Figure 2. Column Family

Each table can have hundreds of Columns, yet most queries need only a few of them, so fetching every Column on each query would be wasteful. Google's BigTable design therefore introduced the Column Family, which combines multiple Columns into a group. For example, "home address" and "work address" in the figure both belong to the "address" Column Family. The biggest benefit is that these Columns can be stored together, which improves access efficiency and avoids reading unneeded Columns: a read can select just one Column Family.
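To make the grouping concrete, here is a minimal in-memory sketch of reading a single Column Family. It is not how BigTable stores data; the names `Table`, `Set` and `ReadFamily`, and the `family:qualifier` column naming, are assumptions made for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory layout (illustration only, not BigTable's storage):
// row name -> column family -> qualifier -> value.
using Columns = std::map<std::string, std::string>;  // qualifier -> value
using Families = std::map<std::string, Columns>;     // family -> columns
using Table = std::map<std::string, Families>;       // row name -> families

// Splits "family:qualifier" and stores the value under its family.
void Set(Table& table, const std::string& row,
         const std::string& column, const std::string& value) {
  const auto colon = column.find(':');
  assert(colon != std::string::npos && "column must be family:qualifier");
  table[row][column.substr(0, colon)][column.substr(colon + 1)] = value;
}

// Returns only the columns of one family; other families are not read.
Columns ReadFamily(const Table& table, const std::string& row,
                   const std::string& family) {
  const auto row_it = table.find(row);
  if (row_it == table.end()) return {};
  const auto fam_it = row_it->second.find(family);
  if (fam_it == row_it->second.end()) return {};
  return fam_it->second;
}
```

Because the columns of a family sit together under one key, a query for "address" never has to walk the other families of the row.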

Tablet

Figure 3. Tablet

This one is easy to understand: the BigTable system automatically splits the table by Row Name range and distributes the resulting pieces (Tablets) to different servers.
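The idea of locating a row by its Row Name range can be sketched as follows. This is only a rough illustration: the class `TabletDirectory`, its methods, and the server names are hypothetical, and the sketch assumes each Tablet covers the rows from its start Row Name up to the start of the next Tablet.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical tablet directory (illustration only): maps the first row name
// of each tablet to the server holding it. A tablet covers the range
// [its start row, the next tablet's start row).
class TabletDirectory {
 public:
  // Registers a tablet that begins at `start_row` and lives on `server`.
  void AddTablet(const std::string& start_row, const std::string& server) {
    assert(!server.empty());
    tablets_[start_row] = server;
  }

  // Returns the server responsible for `row`, or "" if no tablet covers it.
  std::string Locate(const std::string& row) const {
    auto it = tablets_.upper_bound(row);  // first tablet starting after row
    if (it == tablets_.begin()) return "";
    --it;  // last tablet starting at or before row
    return it->second;
  }

 private:
  std::map<std::string, std::string> tablets_;  // start row -> server
};
```

Since the directory is sorted by Row Name, a lookup is one ordered-map search, and rows with neighboring names land on the same server.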

Timestamp

To help with synchronizing and backing up data, each Cell can be given its own Timestamp, and the system can perform GC (Garbage Collection) based on the Timestamp.
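A small sketch of a Cell that keeps several timestamped versions may help here. The class `VersionedCell` and the "keep only the newest N versions" policy are assumptions chosen for illustration; they stand in for whatever GC rule the real system applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical cell (illustration only): versions keyed by timestamp.
class VersionedCell {
 public:
  // Stores `value` as the version written at `timestamp_us`.
  void Put(std::int64_t timestamp_us, const std::string& value) {
    versions_[timestamp_us] = value;
  }

  // Returns the most recent value; the cell must not be empty.
  const std::string& Latest() const {
    assert(!versions_.empty());
    return versions_.begin()->second;
  }

  // Garbage-collects everything except the newest `keep` versions.
  void KeepNewest(std::size_t keep) {
    auto it = versions_.begin();
    for (std::size_t i = 0; i < keep && it != versions_.end(); ++i) ++it;
    versions_.erase(it, versions_.end());
  }

  std::size_t VersionCount() const { return versions_.size(); }

 private:
  // std::greater sorts descending, so begin() is the newest version.
  std::map<std::int64_t, std::string, std::greater<std::int64_t>> versions_;
};
```

Sorting versions newest-first makes both operations cheap: the latest value is the first entry, and GC simply cuts off the tail.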

 

Call Interface

Google's BigTable is called mainly through an API. Below is some sample code, taken mainly from the BigTable paper.

// Open the table
Table *T = OpenOrDie("/peopletable");

// Find the appropriate Row and make the appropriate update
RowMutation r1(T, "310101");
r1.Set("address:home", "SH88");

// Perform the update
Operation op;
Apply(&op, &r1);

// Create a Scanner for queries
Scanner scanner(T);
ScanStream *stream;

// 1. Restrict the scan to the "address" Column Family;
// 2. return all versions;
// 3. look up the Row whose Row Name is "310101".
stream = scanner.FetchColumnFamily("address");
stream->SetReturnAllVersions();
scanner.Lookup("310101");

// Print row name, column name, timestamp and value of every returned cell
for (; !stream->Done(); stream->Next()) {
    printf("%s %s %lld %s\n",
           scanner.RowName(),
           stream->ColumnName(),
           stream->MicroTimestamp(),
           stream->Value());
}
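The sample above depends on Google's internal client library, so it cannot be compiled as is. To follow the same write-then-scan flow locally, here is a toy in-memory stand-in. Everything in it (`ToyTable`, `Entry`, `LookupFamily`, and passing the timestamp explicitly to `Set`) is hypothetical and only mimics the behavior the sample describes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// One result entry, mirroring the fields the print loop reads.
struct Entry {
  std::string column;
  std::int64_t timestamp;
  std::string value;
};

// Hypothetical in-memory stand-in; not the real BigTable client library.
class ToyTable {
 public:
  // Writes one version of `column` ("family:qualifier") in `row`.
  void Set(const std::string& row, const std::string& column,
           std::int64_t timestamp, const std::string& value) {
    assert(column.find(':') != std::string::npos);
    rows_[row][column][timestamp] = value;
  }

  // Returns every version of every column in `family` for `row`,
  // ordered by column name and then newest timestamp first.
  std::vector<Entry> LookupFamily(const std::string& row,
                                  const std::string& family) const {
    std::vector<Entry> out;
    const auto row_it = rows_.find(row);
    if (row_it == rows_.end()) return out;
    const std::string prefix = family + ":";
    // Columns are sorted, so the columns of one family are contiguous.
    for (auto col = row_it->second.lower_bound(prefix);
         col != row_it->second.end() &&
         col->first.compare(0, prefix.size(), prefix) == 0;
         ++col) {
      for (const auto& version : col->second) {
        out.push_back({col->first, version.first, version.second});
      }
    }
    return out;
  }

 private:
  using Versions =
      std::map<std::int64_t, std::string, std::greater<std::int64_t>>;
  std::map<std::string, std::map<std::string, Versions>> rows_;
};
```

Calling `LookupFamily("310101", "address")` after a few `Set` calls plays the role of `FetchColumnFamily`, `SetReturnAllVersions` and `Lookup` combined, and iterating over the returned vector corresponds to the print loop.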

 

The next installment of this development diary will focus on BigTable's storage model.


Reproduced from: https://www.cnblogs.com/licheng/archive/2010/09/09/1821903.html


Origin blog.csdn.net/weixin_34384681/article/details/92626791