YCSB Workload Parameter Settings


YCSB is a workload benchmarking tool, so its parameter settings matter a great deal: the measured results differ depending on the read, update, and insert proportions you configure.

The following command loads the workloada workload file and is the starting point for benchmarking a specific database:

bin/ycsb load DBname -s -P workloads/workloada
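
Here DBname stands for the database binding under test. For a quick sanity check without a real database, the built-in basic binding simply echoes each operation, so a typical load/run pair might look like the following (a minimal sketch; substitute your own binding and any connection properties it needs):

# Load phase: insert recordcount records into the target store
bin/ycsb load basic -s -P workloads/workloada
# Transaction phase: execute operationcount operations against the loaded data
bin/ycsb run basic -s -P workloads/workloada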

Below are the concrete parameter settings and explanations for the YCSB workloads, starting with the YCSB/workloads/workloada file:

# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000 (the number of records inserted during the load phase, i.e. the size of the data set the run phase operates on)
operationcount=1000 (the number of operations executed during the run phase)
workload=com.yahoo.ycsb.workloads.CoreWorkload (the workload class to use)
readproportion=0.5 (default 0.95; the proportion of operations that are reads)
updateproportion=0.5 (default 0.05; the proportion of operations that are updates)
insertproportion=0 (default 0; the proportion of operations that are inserts)
scanproportion=0 (default 0; the proportion of operations that are scans)
requestdistribution=zipfian (default uniform; the distribution used to choose which records to operate on: uniform, zipfian, hotspot, sequential, exponential, or latest)
threadcount=2 (default 1; the number of YCSB client threads)
readallfields=true (default true; whether a read fetches all fields (true) or only one (false))
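
Any of these properties can also be overridden on the command line with -p instead of editing the workload file; the -p values are merged with (and take precedence over) those in the file passed via -P. For example (an illustrative sketch, values chosen arbitrarily):

bin/ycsb run basic -s -P workloads/workloada -p recordcount=100000 -p operationcount=100000 -p threadcount=8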

Below are the parameter settings in the YCSB/workloads/workloadb file, a read-mostly workload (the main difference from workload A is the readproportion/updateproportion split).

# Yahoo! Cloud System Benchmark
# Workload B: Read mostly workload
#   Application example: photo tagging; add a tag is an update, but most operations are to read tags
#                        
#   Read/update ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
requestdistribution=zipfian
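
The same file format can be used to define your own mix. For example, a hypothetical 80/20 read/update workload could be saved to a new file (say workloads/myworkload, an assumed name) and passed to -P just like the bundled ones:

# Hypothetical custom workload: 80% reads, 20% updates
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.8
updateproportion=0.2
scanproportion=0
insertproportion=0
requestdistribution=zipfian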

Below are the parameter settings in the YCSB/workloads/workloadc file, a read-only workload (the main difference is that readproportion is 1).

# Yahoo! Cloud System Benchmark
# Workload C: Read only
#   Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
#                        
#   Read/update ratio: 100/0
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian
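
Since -s sends the periodic status lines to stderr while the measurement summary goes to stdout, the results of a run like this can simply be redirected to a file for later comparison (an illustrative command; the file name is arbitrary):

bin/ycsb run basic -s -P workloads/workloadc > outputc.txt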

Below are the parameter settings in the YCSB/workloads/workloadd file, a read-latest workload that mixes reads with inserts of new records (the main differences are the readproportion/insertproportion split and the latest request distribution).

# Yahoo! Cloud System Benchmark
# Workload D: Read latest workload
#   Application example: user status updates; people want to read the latest
#                        
#   Read/update/insert ratio: 95/0/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: latest

# The insert order for this is hashed, not ordered. The "latest" items may be 
# scattered around the keyspace if they are keyed by userid.timestamp. A workload
# which orders items purely by time, and demands the latest, is very different than 
# workload here (which we believe is more typical of how people build systems.)
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0
scanproportion=0
insertproportion=0.05
requestdistribution=latest
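
Because workload D keeps inserting new records and the latest distribution skews reads toward them, shifting the balance is just a matter of overriding the proportions; for example, a hypothetical heavier-insert variant (values chosen only for illustration):

bin/ycsb run basic -s -P workloads/workloadd -p readproportion=0.8 -p insertproportion=0.2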

Below are the parameter settings in the YCSB/workloads/workloade file, a workload built around short range scans (the key settings are scanproportion and insertproportion, plus the scan-length parameters at the end):

# Yahoo! Cloud System Benchmark
# Workload E: Short ranges
#   Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id)                     
#   Scan/insert ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
# The insert order is hashed, not ordered. Although the scans are ordered, it does not necessarily
# follow that the data is inserted in order. For example, posts for thread 342 may not be inserted contiguously, but
# instead interspersed with posts from lots of other threads. The way the YCSB client works is that it will pick a start
# key, and then request a number of records; this works fine even for hashed insertion.
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100 (default 1000; the maximum number of records to scan)
scanlengthdistribution=uniform (default uniform; the distribution used to choose, for each scan, how many records to scan, between 1 and maxscanlength)
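
The scan behaviour itself is controlled by those last two parameters; for instance, shorter and more skewed scans could be requested with an override like this (illustrative values; zipfian is the skewed scan-length option in CoreWorkload):

bin/ycsb run basic -s -P workloads/workloade -p maxscanlength=20 -p scanlengthdistribution=zipfian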

Below are the parameter settings in the YCSB/workloads/workloadf file, a read-modify-write workload (the key settings are readproportion and readmodifywriteproportion):

# Yahoo! Cloud System Benchmark
# Workload F: Read-modify-write workload
#   Application example: user database, where user records are read and modified by the user or to record user activity.
#                        
#   Read/read-modify-write ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0
scanproportion=0
insertproportion=0
readmodifywriteproportion=0.5 (default 0; the proportion of operations that read a record, modify it, and write it back)
requestdistribution=zipfian
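
Because workload F only reads and rewrites existing keys, it can be run against a data set that was loaded with workload A; a typical sequence might look like this (a sketch using the basic binding):

bin/ycsb load basic -s -P workloads/workloada
bin/ycsb run basic -s -P workloads/workloadf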

Below are the parameter settings in the YCSB/workloads/tsworkloada file, a time-series workload driven by com.yahoo.ycsb.workloads.TimeSeriesWorkload (the key settings are the readproportion/insertproportion split and the time-series-specific parameters such as fieldcount, tagcount, tagcardinality, and timestampinterval):

# Yahoo! Cloud System Benchmark
# Workload A: Small cardinality consistent data for 2 days
#   Application example: Typical monitoring of a single compute or small 
#   sensor station where 90% of the load is write and only 10% is read 
#   (it's usually much less). All writes are inserts. No sparsity so 
#   every series will have a value at every timestamp.
#
#   Read/insert ratio: 10/90
#   Cardinality: 16 per key (field), 64 fields for a total of 1,024 
#                time series.
workload=com.yahoo.ycsb.workloads.TimeSeriesWorkload
recordcount=1474560
operationcount=2949120
fieldlength=8 (default 100; the size of each field)
fieldcount=64 (default 10; the number of fields per record)
tagcount=4 (the number of unique tag combinations per time series; if this value is 4, each record will contain a key and 4 tag combinations, e.g. A=A, B=A, C=A, D=A)
tagcardinality=1,2,4,2 (a comma-separated list giving, for each "metric" or field, the cardinality (number of unique values) of each tag value; each value must be a number from 1 to Java's Integer.MAX_VALUE, and there must be 'tagcount' values: extra values are ignored and missing values are replaced with 1)
# A value from 0 to 0.999999 representing how sparse each time series
# should be. The higher this value, the greater the time interval between
# values in a single series. For example, if sparsity is 0 and there are
# 10 time series with a 'timestampinterval' of 60 seconds with a total
# time range of 10 intervals, you would see 100 values written, one per
# timestamp interval per time series. If the sparsity is 0.50 then there
# would be only about 50 values written so some time series would have
# missing values at each interval.
sparsity=0.0
# The percentage of time series that are "lagging" behind the current
# timestamp of the writer. This is used to mimic a common behavior where
# most sources (agents, sensors, etc) are writing data in sync (same timestamp)
# but a subset are running behind due to buffering, latency issues, etc.
delayedSeries=0.0
# The maximum amount of delay for delayed series in interval counts. The 
# actual delay is chosen based on a modulo of the series index.
delayedIntervals=0
timestampunits=SECONDS (the time units used for the workload's timestamps and intervals)
# The amount of time between each value in every time series in
# the units of 'timestampunits'.
timestampinterval=60
# The fixed or maximum amount of time added to the start time of a 
# read or scan operation to generate a query over a range of time 
# instead of a single timestamp. Units are shared with 'timestampunits'.
# For example if the value is set to 3600 seconds (1 hour) then 
# each read would pick a random start timestamp based on the 
#'insertstart' value and number of intervals, then add 3600 seconds
# to create the end time of the query. If this value is 0 then reads
# will only provide a single timestamp. 
querytimespan=3600
readproportion=0.10
updateproportion=0.00
insertproportion=0.90
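
Loading and running this file works the same way as for the core workloads, keeping in mind that the target binding must be able to handle the key/tag layout that TimeSeriesWorkload generates. With the record and operation counts above a full run takes a while, so for a first smoke test the counts can be scaled down on the command line (illustrative values only):

bin/ycsb load basic -s -P workloads/tsworkloada -p recordcount=10000
bin/ycsb run basic -s -P workloads/tsworkloada -p recordcount=10000 -p operationcount=20000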

There are also some important parameters that are not set in these workload files, for example in the workload template:

insertstart=0 (the offset of the first insert key)
writeallfields=false (whether an update writes all fields (true) or only one (false))
fieldlengthdistribution=constant (the distribution of field lengths: constant, zipfian, or uniform)
insertorder=hashed (whether records are inserted in key order ("ordered") or pseudo-randomly ("hashed"))
hotspotdatafraction=0.2 (the fraction of data items that make up the hot set)
hotspotopnfraction=0.8 (the fraction of operations that access the hot set)
table=usertable (the name of the database table to run queries against)
# When measurementtype is set to raw, measurements are output as raw data points in the CSV format "operation, timestamp of the measurement, latency in us". Raw data points are collected in memory while the test runs, and each data point consumes about 50 bytes (including Java object overhead). For a typical 1 to 10 million operations this usually fits in memory; if you plan to do many millions of operations per run, consider a machine with more RAM when using the raw measurement type, or split the work into multiple runs.
# Optionally, an output file can be specified to save the raw data points; otherwise they are written to stdout. If the output file already exists it is appended to, otherwise a new file is created.
measurement.raw.output_file=/tmp/your_output_file_for_this_run
measurementtype=histogram (how to present latency measurements: timeseries, histogram, or raw)
measurement.histogram.verbose=false (whether to emit individual histogram buckets when using histogram measurements)
histogram.buckets=1000 (the range of latencies, in milliseconds, to track in the histogram)
timeseries.granularity=1000 (the granularity of the time series, in milliseconds)
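
These measurement options can also be combined with any workload file at run time; for example, a hypothetical run that switches latency reporting to a finer-grained time series and forces hashed insert order might look like this (values are illustrative only):

bin/ycsb run basic -s -P workloads/workloada -p measurementtype=timeseries -p timeseries.granularity=100 -p insertorder=hashed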
