YCSB Introduction
YCSB (Yahoo! Cloud Serving Benchmark) is a general-purpose, open-source performance testing tool from Yahoo.
It can be used to run performance tests against various kinds of NoSQL products.
Usage instructions are available in the YCSB project documentation.
Compared with HBase's built-in performance testing tool (PerformanceEvaluation), YCSB has the following advantages:
- Extensible: it is not tied to a single product; the HBase client under test can even be a different HBase version.
- Flexible: the test mode can be chosen (read + write, read + scan, and so on), as can the proportion of each operation and the key distribution.
- Monitoring: while a test runs, progress is displayed in real time:
1340 sec: 751515 operations; 537.74 current ops/sec; [INSERT AverageLatency(ms)=1.77]
1350 sec: 755945 operations; 442.82 current ops/sec; [INSERT AverageLatency(ms)=2.18]
1360 sec: 761545 operations; 559.72 current ops/sec; [INSERT AverageLatency(ms)=1.71]
1370 sec: 767616 operations; 606.92 current ops/sec; [INSERT AverageLatency(ms)=1.58]
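Each status line has a fixed shape, so the output is easy to post-process, for example to plot throughput over time. A minimal sketch (a hypothetical helper for log analysis, not part of YCSB itself):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses YCSB status lines such as:
//   "1340 sec: 751515 operations; 537.74 current ops/sec; [INSERT AverageLatency(ms)=1.77]"
// into {elapsedSec, totalOps, currentOpsPerSec}. Hypothetical helper, not YCSB API.
public class StatusLineParser {
    private static final Pattern STATUS = Pattern.compile(
            "(\\d+) sec: (\\d+) operations; ([\\d.]+) current ops/sec.*");

    // Returns the three numeric fields, or null if the line is not a status line.
    public static double[] parse(String line) {
        Matcher m = STATUS.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2)),
                Double.parseDouble(m.group(3))
        };
    }

    public static void main(String[] args) {
        double[] v = parse("1340 sec: 751515 operations; 537.74 current ops/sec; [INSERT AverageLatency(ms)=1.77]");
        System.out.println(v[0] + " sec, " + v[1] + " ops, " + v[2] + " ops/sec");
    }
}
```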
- After the test completes, an overall summary is reported:
[OVERALL], RunTime(ms), 1762019.0
[OVERALL], Throughput(ops/sec), 567.5307700995279
[INSERT], Operations, 1000000
[INSERT], AverageLatency(ms), 1.698302
[INSERT], MinLatency(ms), 0
[INSERT], MaxLatency(ms), 14048
[INSERT], 95thPercentileLatency(ms), 2
[INSERT], 99thPercentileLatency(ms), 3
[INSERT], Return=0, 1000000
[INSERT], 0, 29
[INSERT], 1, 433925
[INSERT], 2, 549176
[INSERT], 3, 10324
[INSERT], 4, 3629
[INSERT], 5, 1303
[INSERT], 6, 454
[INSERT], 7, 140
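The trailing "[INSERT], N, count" lines are YCSB's latency histogram. Assuming bucket N counts the operations that completed in N milliseconds, a rough mean can be recomputed from the buckets; the buckets above 7 ms are omitted from the excerpt, which is why this estimate comes out below the reported 1.698 ms average:

```java
// Recomputes an approximate mean latency from YCSB's per-millisecond histogram
// lines ("[INSERT], N, count"), assuming bucket N counts operations that took
// N milliseconds. Higher buckets are missing from the excerpt above, so the
// result underestimates the reported average of 1.698 ms.
public class HistogramMean {
    public static double approximateMean(int[] buckets, long[] counts) {
        double weighted = 0;
        long total = 0;
        for (int i = 0; i < buckets.length; i++) {
            weighted += (double) buckets[i] * counts[i];
            total += counts[i];
        }
        return weighted / total;
    }

    public static void main(String[] args) {
        int[] buckets = {0, 1, 2, 3, 4, 5, 6, 7};
        long[] counts = {29, 433925, 549176, 10324, 3629, 1303, 454, 140};
        System.out.printf("approx mean = %.3f ms%n", approximateMean(buckets, counts));
    }
}
```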
YCSB's shortcomings:
The built-in workload model is rather simple, and no MapReduce-based test mode is provided, so running a multi-process test takes extra manual work.
For example, to parallelize the data load you have to start multiple client processes yourself, each with a different start key in its startup parameters; likewise, during the transaction phase you can only scale out by running multiple threads across multiple machines.
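The manual parallelization described above amounts to splitting the record range into contiguous slices, one per loader process. A sketch of the slicing arithmetic (how each slice is then passed to YCSB, e.g. via start-key or insert-offset startup properties, depends on the workload and is not shown here):

```java
// Splits a total record count into contiguous slices, one per loader process,
// for a manual parallel load. Each process would be launched with its own
// start offset and record count; the slicing itself is plain arithmetic.
public class LoadPartitioner {
    // Returns {startIndex, count} for process `rank` out of `processes`.
    public static long[] slice(long totalRecords, int processes, int rank) {
        long base = totalRecords / processes;
        long remainder = totalRecords % processes;
        // The first `remainder` processes each take one extra record.
        long start = rank * base + Math.min(rank, remainder);
        long count = base + (rank < remainder ? 1 : 0);
        return new long[] {start, count};
    }

    public static void main(String[] args) {
        for (int rank = 0; rank < 4; rank++) {
            long[] s = slice(1000000, 4, rank);
            System.out.println("process " + rank + ": start=" + s[0] + " count=" + s[1]);
        }
    }
}
```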
Using YCSB to test HBase 0.90.4
Download the YCSB-0.1.3 source code from the official site:
http://github.com/brianfrankcooper/YCSB/tarball/0.1.3
Compile YCSB
hdfs@hd0004-sw1 guopeng$ cd YCSB-0.1.3/
hdfs@hd0004-sw1 YCSB-0.1.3$ pwd
/home/hdfs/guopeng/YCSB-0.1.3
hdfs@hd0004-sw1 YCSB-0.1.3$ ant
Buildfile: /home/hdfs/guopeng/YCSB-0.1.3/build.xml
compile:
[javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:50: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
makejar:
BUILD SUCCESSFUL
Total time: 0 seconds
hdfs@hd0004-sw1 YCSB-0.1.3$
Because the HBase client code bundled with YCSB has some compatibility problems, we replace the bundled file (db/hbase/src/com/yahoo/ycsb/db/HBaseClient.java) with the following code:
package com.yahoo.ycsb.db;
import java.io.IOException;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.Vector;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import com.yahoo.ycsb.DBException;
/**
* HBase client for YCSB framework
* @see http://blog.data-works.org
* @see http://gpcuster.cnblogs.com/
*/
public class HBaseClient extends com.yahoo.ycsb.DB {
private static final Configuration config = new Configuration();
static {
config.addResource("hbase-default.xml");
config.addResource("hbase-site.xml");
}
public boolean _debug = false;
public String _table = "";
public HTable _hTable = null;
public String _columnFamily = "";
public byte _columnFamilyBytes[];
public static final int Ok = 0;
public static final int ServerError = -1;
public static final int HttpError = -2;
public static final int NoMatchingRecord = -3;
public static final Object tableLock = new Object();
/**
* Initialize any state for this DB. Called once per DB instance; there is
* one DB instance per client thread.
*/
public void init() throws DBException {
if ((getProperties().getProperty("debug") != null)
&& (getProperties().getProperty("debug").compareTo("true") == 0)) {
_debug = true;
}
_columnFamily = getProperties().getProperty("columnfamily");
if (_columnFamily == null) {
System.err
.println("Error, must specify a columnfamily for HBase table");
throw new DBException("No columnfamily specified");
}
_columnFamilyBytes = Bytes.toBytes(_columnFamily);
// read hbase client settings.
for (Object key : getProperties().keySet()) {
String pKey = key.toString();
if (pKey.startsWith("hbase.")) {
String pValue = getProperties().getProperty(pKey);
if (pValue != null) {
config.set(pKey, pValue);
}
}
}
}
/**
* Cleanup any state for this DB. Called once per DB instance; there is one
* DB instance per client thread.
*/
public void cleanup() throws DBException {
try {
if (_hTable != null) {
_hTable.flushCommits();
}
} catch (IOException e) {
throw new DBException(e);
}
}
public void getHTable(String table) throws IOException {
synchronized (tableLock) {
_hTable = new HTable(config, table);
}
}
/**
* Read a record from the database. Each field/value pair from the result
* will be stored in a HashMap.
*
* @param table
* The name of the table
* @param key
* The record key of the record to read.
* @param fields
* The list of fields to read, or null for all of them
* @param result
* A HashMap of field/value pairs for the result
* @return Zero on success, a non-zero error code on error
*/
public int read(String table, String key, Set<String> fields,
HashMap<String, String> result) {
// if this is a "new" table, init HTable object. Else, use existing one
if (!_table.equals(table)) {
_hTable = null;
try {
getHTable(table);
_table = table;
} catch (IOException e) {
System.err.println("Error accessing HBase table: " + e);
return ServerError;
}
}
Result r = null;
try {
if (_debug) {
System.out.println("Doing read from HBase columnfamily "
+ _columnFamily);
System.out.println("Doing read for key: " + key);
}
Get g = new Get(Bytes.toBytes(key));
if (fields == null) {
g.addFamily(_columnFamilyBytes);
} else {
for (String field : fields) {
g.addColumn(_columnFamilyBytes, Bytes.toBytes(field));
}
}
r = _hTable.get(g);
} catch (IOException e) {
System.err.println("Error doing get: " + e);
return ServerError;
} catch (ConcurrentModificationException e) {
// do nothing for now...need to understand HBase concurrency model
// better
return ServerError;
}
for (KeyValue kv : r.raw()) {
result.put(Bytes.toString(kv.getQualifier()),
Bytes.toString(kv.getValue()));
if (_debug) {
System.out.println("Result for field: "
+ Bytes.toString(kv.getQualifier()) + " is: "
+ Bytes.toString(kv.getValue()));
}
}
return Ok;
}
/**
* Perform a range scan for a set of records in the database. Each
* field/value pair from the result will be stored in a HashMap.
*
* @param table
* The name of the table
* @param startkey
* The record key of the first record to read.
* @param recordcount
* The number of records to read
* @param fields
* The list of fields to read, or null for all of them
* @param result
* A Vector of HashMaps, where each HashMap is a set field/value
* pairs for one record
* @return Zero on success, a non-zero error code on error
*/
public int scan(String table, String startkey, int recordcount,
Set<String> fields, Vector<HashMap<String, String>> result) {
// if this is a "new" table, init HTable object. Else, use existing one
if (!_table.equals(table)) {
_hTable = null;
try {
getHTable(table);
_table = table;
} catch (IOException e) {
System.err.println("Error accessing HBase table: " + e);
return ServerError;
}
}
Scan s = new Scan(Bytes.toBytes(startkey));
// HBase has no record limit. Here, assume recordcount is small enough
// to bring back in one call.
// We get back recordcount records
s.setCaching(recordcount);
// add specified fields or else all fields
if (fields == null) {
s.addFamily(_columnFamilyBytes);
} else {
for (String field : fields) {
s.addColumn(_columnFamilyBytes, Bytes.toBytes(field));
}
}
// get results
ResultScanner scanner = null;
try {
scanner = _hTable.getScanner(s);
int numResults = 0;
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
// get row key
String key = Bytes.toString(rr.getRow());
if (_debug) {
System.out.println("Got scan result for key: " + key);
}
HashMap<String, String> rowResult = new HashMap<String, String>();
for (KeyValue kv : rr.raw()) {
rowResult.put(Bytes.toString(kv.getQualifier()),
Bytes.toString(kv.getValue()));
}
// add rowResult to result vector
result.add(rowResult);
numResults++;
if (numResults >= recordcount) // if hit recordcount, bail out
{
break;
}
} // done with row
}
catch (IOException e) {
if (_debug) {
System.out
.println("Error in getting/parsing scan result: " + e);
}
return ServerError;
}
finally {
// getScanner() may have thrown before scanner was assigned, so guard against null.
if (scanner != null) {
scanner.close();
}
}
return Ok;
}
/**
* Update a record in the database. Any field/value pairs in the specified
* values HashMap will be written into the record with the specified record
* key, overwriting any existing values with the same field name.
*
* @param table
* The name of the table
* @param key
* The record key of the record to write
* @param values
* A HashMap of field/value pairs to update in the record
* @return Zero on success, a non-zero error code on error
*/
public int update(String table, String key, HashMap<String, String> values) {
// if this is a "new" table, init HTable object. Else, use existing one
if (!_table.equals(table)) {
_hTable = null;
try {
getHTable(table);
_table = table;
} catch (IOException e) {
System.err.println("Error accessing HBase table: " + e);
return ServerError;
}
}
if (_debug) {
System.out.println("Setting up put for key: " + key);
}
Put p = new Put(Bytes.toBytes(key));
for (Map.Entry<String, String> entry : values.entrySet()) {
if (_debug) {
System.out.println("Adding field/value " + entry.getKey() + "/"
+ entry.getValue() + " to put request");
}
p.add(_columnFamilyBytes, Bytes.toBytes(entry.getKey()),
Bytes.toBytes(entry.getValue()));
}
try {
_hTable.put(p);
} catch (IOException e) {
if (_debug) {
System.err.println("Error doing put: " + e);
}
return ServerError;
} catch (ConcurrentModificationException e) {
// do nothing for now...hope this is rare
return ServerError;
}
return Ok;
}
/**
* Insert a record in the database. Any field/value pairs in the specified
* values HashMap will be written into the record with the specified record
* key.
*
* @param table
* The name of the table
* @param key
* The record key of the record to insert.
* @param values
* A HashMap of field/value pairs to insert in the record
* @return Zero on success, a non-zero error code on error
*/
public int insert(String table, String key, HashMap<String, String> values) {
return update(table, key, values);
}
/**
* Delete a record from the database.
*
* @param table
* The name of the table
* @param key
* The record key of the record to delete.
* @return Zero on success, a non-zero error code on error
*/
public int delete(String table, String key) {
// if this is a "new" table, init HTable object. Else, use existing one
if (!_table.equals(table)) {
_hTable = null;
try {
getHTable(table);
_table = table;
} catch (IOException e) {
System.err.println("Error accessing HBase table: " + e);
return ServerError;
}
}
if (_debug) {
System.out.println("Doing delete for key: " + key);
}
Delete d = new Delete(Bytes.toBytes(key));
try {
_hTable.delete(d);
} catch (IOException e) {
if (_debug) {
System.err.println("Error doing delete: " + e);
}
return ServerError;
}
return Ok;
}
}
With the modified HBase client, client settings can be specified directly as command-line parameters, for example the ZooKeeper connection information: -p hbase.zookeeper.quorum=hd0004-sw1.dc.sh-wgq.sdo.com,hd0001-sw1.dc.sh-wgq.sdo.com, or the client's local write buffer size: -p hbase.client.write.buffer=100, and so on.
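The pass-through that makes this work is the loop in init() that copies every property whose name starts with "hbase." into the client configuration. The same logic in isolation (with a plain Map standing in for the HBase Configuration, so the sketch runs without HBase on the classpath):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Mirrors the pass-through in HBaseClient.init(): every property whose name
// starts with "hbase." is copied into the client configuration (modeled here
// as a plain Map so the example has no HBase dependency).
public class PropertyPassThrough {
    public static Map<String, String> extractHBaseSettings(Properties props) {
        Map<String, String> config = new HashMap<String, String>();
        for (Object key : props.keySet()) {
            String name = key.toString();
            if (name.startsWith("hbase.")) {
                config.put(name, props.getProperty(name));
            }
        }
        return config;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hbase.client.write.buffer", "100");
        props.setProperty("columnfamily", "f1"); // not an hbase.* setting; ignored
        System.out.println(extractHBaseSettings(props));
    }
}
```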
Next, copy the compiled dependency jars and the configuration files into place:
[hdfs@hd0004-sw1 YCSB-0.1.3]$ cp ~/hbase-current/*.jar ~/hbase-current/lib/*.jar ~/hbase-current/conf/hbase-*.xml db/hbase/lib/
[hdfs@hd0004-sw1 YCSB-0.1.3]$
Now compile the HBase client:
[hdfs@hd0004-sw1 YCSB-0.1.3]$ ant dbcompile-hbase
Buildfile: /home/hdfs/guopeng/YCSB-0.1.3/build.xml
compile:
[javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:50: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
makejar:
dbcompile-hbase:
dbcompile:
[javac] /home/hdfs/guopeng/YCSB-0.1.3/build.xml:63: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
makejar:
BUILD SUCCESSFUL
Total time: 0 seconds
[hdfs@hd0004-sw1 YCSB-0.1.3]$
Finally, create the HBase table used for the test (usertable):
hbase(main):004:0> create 'usertable', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
0 row(s) in 1.2940 seconds
The environment is now ready.
The following command starts loading the data to be tested:
java -cp build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=f1 -p recordcount=1000000 -p hbase.zookeeper.quorum=hd0004-sw1.dc.sh-wgq.sdo.com,hd0001-sw1.dc.sh-wgq.sdo.com,hd0003-sw1.dc.sh-wgq.sdo.com,hd0149-sw18.dc.sh-wgq.sdo.com,hd0165-sw13.dc.sh-wgq.sdo.com -s