YCSB Workload Parameter Settings


YCSB is a workload benchmarking tool, so its parameter settings matter a great deal: the measured results differ depending on the read, update, and insert proportions you configure.

The following command loads the workloada workload file and is the starting point for benchmarking a specific database:

bin/ycsb load DBname -s -P workloads/workloada
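
Here DBname stands for the database binding under test. For a quick sanity check without a real database, the built-in basic binding simply echoes each operation, so a typical load/run pair might look like the following (a minimal sketch; substitute your own binding and any connection properties it needs):

# Load phase: insert recordcount records into the target store
bin/ycsb load basic -s -P workloads/workloada
# Transaction phase: execute operationcount operations against the loaded data
bin/ycsb run basic -s -P workloads/workloada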

Below are the concrete parameter settings and explanations for the YCSB workloads, starting with the YCSB/workloads/workloada file:

# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000 (the number of records inserted during the load phase, i.e. the size of the data set the run phase operates on)
operationcount=1000 (the number of operations executed during the run phase)
workload=com.yahoo.ycsb.workloads.CoreWorkload (the workload class to use)
readproportion=0.5 (default 0.95; the proportion of operations that are reads)
updateproportion=0.5 (default 0.05; the proportion of operations that are updates)
insertproportion=0 (default 0; the proportion of operations that are inserts)
scanproportion=0 (default 0; the proportion of operations that are scans)
requestdistribution=zipfian (default uniform; the distribution used to choose which records to operate on: uniform, zipfian, hotspot, sequential, exponential, or latest)
threadcount=2 (default 1; the number of YCSB client threads)
readallfields=true (default true; whether a read fetches all fields (true) or only one (false))
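
Any of these properties can also be overridden on the command line with -p instead of editing the workload file; the -p values are merged with (and take precedence over) those in the file passed via -P. For example (an illustrative sketch, values chosen arbitrarily):

bin/ycsb run basic -s -P workloads/workloada -p recordcount=100000 -p operationcount=100000 -p threadcount=8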

Below are the parameter settings in the YCSB/workloads/workloadb file, a read-mostly workload (the main difference from workload A is the readproportion/updateproportion split).

# Yahoo! Cloud System Benchmark
# Workload B: Read mostly workload
#   Application example: photo tagging; add a tag is an update, but most operations are to read tags
#                        
#   Read/update ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0.05
scanproportion=0
insertproportion=0
requestdistribution=zipfian
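
The same file format can be used to define your own mix. For example, a hypothetical 80/20 read/update workload could be saved to a new file (say workloads/myworkload, an assumed name) and passed to -P just like the bundled ones:

# Hypothetical custom workload: 80% reads, 20% updates
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.8
updateproportion=0.2
scanproportion=0
insertproportion=0
requestdistribution=zipfian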

Below are the parameter settings in the YCSB/workloads/workloadc file, a read-only workload (the main difference is that readproportion is 1).

# Yahoo! Cloud System Benchmark
# Workload C: Read only
#   Application example: user profile cache, where profiles are constructed elsewhere (e.g., Hadoop)
#                        
#   Read/update ratio: 100/0
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=1
updateproportion=0
scanproportion=0
insertproportion=0
requestdistribution=zipfian
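
Since -s sends the periodic status lines to stderr while the measurement summary goes to stdout, the results of a run like this can simply be redirected to a file for later comparison (an illustrative command; the file name is arbitrary):

bin/ycsb run basic -s -P workloads/workloadc > outputc.txt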

Below are the parameter settings in the YCSB/workloads/workloadd file, a read-latest workload that mixes reads with inserts of new records (the main differences are the readproportion/insertproportion split and the latest request distribution).

# Yahoo! Cloud System Benchmark
# Workload D: Read latest workload
#   Application example: user status updates; people want to read the latest
#                        
#   Read/update/insert ratio: 95/0/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: latest

# The insert order for this is hashed, not ordered. The "latest" items may be 
# scattered around the keyspace if they are keyed by userid.timestamp. A workload
# which orders items purely by time, and demands the latest, is very different than 
# workload here (which we believe is more typical of how people build systems.)
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.95
updateproportion=0
scanproportion=0
insertproportion=0.05
requestdistribution=latest
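
Because workload D keeps inserting new records and the latest distribution skews reads toward them, shifting the balance is just a matter of overriding the proportions; for example, a hypothetical heavier-insert variant (values chosen only for illustration):

bin/ycsb run basic -s -P workloads/workloadd -p readproportion=0.8 -p insertproportion=0.2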

Below are the parameter settings in the YCSB/workloads/workloade file, a workload built around short range scans (the key settings are scanproportion and insertproportion, plus the scan-length parameters at the end):

# Yahoo! Cloud System Benchmark
# Workload E: Short ranges
#   Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id)                     
#   Scan/insert ratio: 95/5
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
# The insert order is hashed, not ordered. Although the scans are ordered, it does not necessarily
# follow that the data is inserted in order. For example, posts for thread 342 may not be inserted contiguously, but
# instead interspersed with posts from lots of other threads. The way the YCSB client works is that it will pick a start
# key, and then request a number of records; this works fine even for hashed insertion.
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0
updateproportion=0
scanproportion=0.95
insertproportion=0.05
requestdistribution=zipfian
maxscanlength=100 (default 1000; the maximum number of records to scan)
scanlengthdistribution=uniform (default uniform; the distribution used to choose, for each scan, how many records to scan, between 1 and maxscanlength)
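
The scan behaviour itself is controlled by those last two parameters; for instance, shorter and more skewed scans could be requested with an override like this (illustrative values; zipfian is the skewed scan-length option in CoreWorkload):

bin/ycsb run basic -s -P workloads/workloade -p maxscanlength=20 -p scanlengthdistribution=zipfian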

Below are the parameter settings in the YCSB/workloads/workloadf file, a read-modify-write workload (the key settings are readproportion and readmodifywriteproportion):

# Yahoo! Cloud System Benchmark
# Workload F: Read-modify-write workload
#   Application example: user database, where user records are read and modified by the user or to record user activity.
#                        
#   Read/read-modify-write ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0
scanproportion=0
insertproportion=0
readmodifywriteproportion=0.5 (default 0; the proportion of operations that read a record, modify it, and write it back)
requestdistribution=zipfian
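
Because workload F only reads and rewrites existing keys, it can be run against a data set that was loaded with workload A; a typical sequence might look like this (a sketch using the basic binding):

bin/ycsb load basic -s -P workloads/workloada
bin/ycsb run basic -s -P workloads/workloadf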

Below are the parameter settings in the YCSB/workloads/tsworkloada file, a time-series workload driven by com.yahoo.ycsb.workloads.TimeSeriesWorkload (the key settings are the readproportion/insertproportion split and the time-series-specific parameters such as fieldcount, tagcount, tagcardinality, and timestampinterval):

# Yahoo! Cloud System Benchmark
# Workload A: Small cardinality consistent data for 2 days
#   Application example: Typical monitoring of a single compute or small 
#   sensor station where 90% of the load is write and only 10% is read 
#   (it's usually much less). All writes are inserts. No sparsity so 
#   every series will have a value at every timestamp.
#
#   Read/insert ratio: 10/90
#   Cardinality: 16 per key (field), 64 fields for a total of 1,024 
#                time series.
workload=com.yahoo.ycsb.workloads.TimeSeriesWorkload
recordcount=1474560
operationcount=2949120
fieldlength=8 (default 100; the size of each field)
fieldcount=64 (default 10; the number of fields per record)
tagcount=4 (the number of unique tag combinations per time series; if this value is 4, each record will contain a key and 4 tag combinations, e.g. A=A, B=A, C=A, D=A)
tagcardinality=1,2,4,2 (a comma-separated list giving, for each "metric" or field, the cardinality (number of unique values) of each tag value; each value must be a number from 1 to Java's Integer.MAX_VALUE, and there must be 'tagcount' values: extra values are ignored and missing values are replaced with 1)
# A value from 0 to 0.999999 representing how sparse each time series
# should be. The higher this value, the greater the time interval between
# values in a single series. For example, if sparsity is 0 and there are
# 10 time series with a 'timestampinterval' of 60 seconds with a total
# time range of 10 intervals, you would see 100 values written, one per
# timestamp interval per time series. If the sparsity is 0.50 then there
# would be only about 50 values written so some time series would have
# missing values at each interval.
sparsity=0.0
# The percentage of time series that are "lagging" behind the current
# timestamp of the writer. This is used to mimic a common behavior where
# most sources (agents, sensors, etc) are writing data in sync (same timestamp)
# but a subset are running behind due to buffering, latency issues, etc.
delayedSeries=0.0
# The maximum amount of delay for delayed series in interval counts. The 
# actual delay is chosen based on a modulo of the series index.
delayedIntervals=0
timestampunits=SECONDS (the time units used for the workload's timestamps and intervals)
# The amount of time between each value in every time series in
# the units of 'timestampunits'.
timestampinterval=60
# The fixed or maximum amount of time added to the start time of a 
# read or scan operation to generate a query over a range of time 
# instead of a single timestamp. Units are shared with 'timestampunits'.
# For example if the value is set to 3600 seconds (1 hour) then 
# each read would pick a random start timestamp based on the 
#'insertstart' value and number of intervals, then add 3600 seconds
# to create the end time of the query. If this value is 0 then reads
# will only provide a single timestamp. 
querytimespan=3600
readproportion=0.10
updateproportion=0.00
insertproportion=0.90
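
Loading and running this file works the same way as for the core workloads, keeping in mind that the target binding must be able to handle the key/tag layout that TimeSeriesWorkload generates. With the record and operation counts above a full run takes a while, so for a first smoke test the counts can be scaled down on the command line (illustrative values only):

bin/ycsb load basic -s -P workloads/tsworkloada -p recordcount=10000
bin/ycsb run basic -s -P workloads/tsworkloada -p recordcount=10000 -p operationcount=20000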

There are also some important parameters that are not set in these workload files, for example in the workload template:

insertstart=0 (the offset of the first insert key)
writeallfields=false (whether an update writes all fields (true) or only one (false))
fieldlengthdistribution=constant (the distribution of field lengths: constant, zipfian, or uniform)
insertorder=hashed (whether records are inserted in key order ("ordered") or pseudo-randomly ("hashed"))
hotspotdatafraction=0.2 (the fraction of data items that make up the hot set)
hotspotopnfraction=0.8 (the fraction of operations that access the hot set)
table=usertable (the name of the database table to run queries against)
# When measurementtype is set to raw, measurements are output as raw data points in the CSV format "operation, timestamp of the measurement, latency in us". Raw data points are collected in memory while the test runs, and each data point consumes about 50 bytes (including Java object overhead). For a typical 1 to 10 million operations this usually fits in memory; if you plan to do many millions of operations per run, consider a machine with more RAM when using the raw measurement type, or split the work into multiple runs.
# Optionally, an output file can be specified to save the raw data points; otherwise they are written to stdout. If the output file already exists it is appended to, otherwise a new file is created.
measurement.raw.output_file=/tmp/your_output_file_for_this_run
measurementtype=histogram (how to present latency measurements: timeseries, histogram, or raw)
measurement.histogram.verbose=false (whether to emit individual histogram buckets when using histogram measurements)
histogram.buckets=1000 (the range of latencies, in milliseconds, to track in the histogram)
timeseries.granularity=1000 (the granularity of the time series, in milliseconds)
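
These measurement options can also be combined with any workload file at run time; for example, a hypothetical run that switches latency reporting to a finer-grained time series and forces hashed insert order might look like this (values are illustrative only):

bin/ycsb run basic -s -P workloads/workloada -p measurementtype=timeseries -p timeseries.granularity=100 -p insertorder=hashed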
