data generator

data-generator is an open source data generator implemented in Java.

If you work on big data BI and want to compare the performance of different implementations such as MySQL, GreenPlum, Elasticsearch, Hive, Presto, Impala, Drill, HAWQ, Druid, Pinot, Kylin, and ClickHouse, you need a standard data set. This open source project is designed to generate such standard data.

Data model: src/main/resources/data model.png

src/main/resources/datamodel.png

1. Compile:

 mvn assembly:assembly

2. Create a database in MySQL, then execute src/main/resources/model_ddl.sql to create the corresponding tables.

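3. Declare the geo_location (latitude and longitude) field as geo_point in Elasticsearch (ES). For each index, first index a placeholder document so that the index exists, then set the mapping: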

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/contract/contract/_bulk' -d '
{ "index":{ "_id": 1} }
{"id":1}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/contract/_mapping/contract' -d '
{
  "properties": {
    "geo_location": {
      "type": "geo_point"
    }
  }
}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/detail/detail/_bulk' -d '
{ "index":{ "_id": 1} }
{"id":1}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/detail/_mapping/detail' -d '
{
  "properties": {
    "geo_location": {
      "type": "geo_point"
    }
  }
}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/area/area/_bulk' -d '
{ "index":{ "_id": 1} }
{"id":1}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/area/_mapping/area' -d '
{
  "properties": {
    "geo_location": {
      "type": "geo_point"
    }
  }
}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/customer/customer/_bulk' -d '
{ "index":{ "_id": 1} }
{"id":1}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/customer/_mapping/customer' -d '
{
  "properties": {
    "geo_location": {
      "type": "geo_point"
    }
  }
}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/sales_staff/sales_staff/_bulk' -d '
{ "index":{ "_id": 1} }
{"id":1}
'

curl -H "Content-Type: application/json" -XPUT 'http://192.168.252.193:9200/sales_staff/_mapping/sales_staff' -d '
{
  "properties": {
    "geo_location": {
      "type": "geo_point"
    }
  }
}
'
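The five mapping requests above differ only in the index name, so they can also be built programmatically. The following is a minimal sketch, not part of the project: the class GeoPointMapping and its method names are hypothetical, it uses the JDK 11+ java.net.http client, and it assumes each index already exists (for example after the placeholder _bulk requests above).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

/** Hypothetical helper: builds the geo_point mapping requests shown in step 3. */
class GeoPointMapping {
    // Index names copied from the curl commands; each index uses a type of the same name.
    static final List<String> INDICES =
        List.of("contract", "detail", "area", "customer", "sales_staff");

    // Same mapping body as in the curl commands, written on one line.
    static final String MAPPING_BODY =
        "{\"properties\":{\"geo_location\":{\"type\":\"geo_point\"}}}";

    /** Returns the mapping endpoint, e.g. http://host:9200/contract/_mapping/contract. */
    static String mappingUrl(String host, int port, String index) {
        return "http://" + host + ":" + port + "/" + index + "/_mapping/" + index;
    }

    /** Builds the PUT request without sending it. */
    static HttpRequest mappingRequest(String host, int port, String index) {
        return HttpRequest.newBuilder(URI.create(mappingUrl(host, port, index)))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(MAPPING_BODY))
            .build();
    }

    /** Sends the request and returns the HTTP status code reported by ES. */
    static int send(HttpClient client, String host, int port, String index)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(
            mappingRequest(host, port, index), HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Only prints the target URLs; call send(...) to actually apply the mappings.
        for (String index : INDICES) {
            System.out.println(mappingUrl("192.168.252.193", 9200, index));
        }
    }
}
```

To apply the mappings, call send(HttpClient.newHttpClient(), host, port, index) for each entry of INDICES and check that the returned status code is 200.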

4. Specify the configuration in the config.txt file in the current directory:

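# Number of records per MySQL batch commit when inserting data
batchSize=1000
# Start year of the order time range
startYear=2000
# Start month of the order time range
startMonth=1
# Start day of the order time range
startDay=1
# Number of customers
customerCount=5000
# Number of sales staff
salesStaffCount=2000
# Number of contracts
contractCount=20000
# Number of items
itemCount=10000
# Upper limit of the item price
priceLimit=1000
# Maximum number of detail lines per contract
contractDetailLimit=100
# Maximum item quantity per contract detail line
itemQuantityLimit=100
# MySQL instance in which the generated data is saved
mysql.url=jdbc:mysql://192.168.252.193:3306/demo?useUnicode=true&characterEncoding=utf8
mysql.user=root
mysql.password=root
mysql.pageSize=10000
# ES instance into which the data read from MySQL is indexed after being assembled into JSON documents
es.host=192.168.252.193
es.port=9200
# Number of documents per ES bulk commit
es.batchSize=1000
# Valid values: file or es
# If file, script files are generated in the current directory; run them after the program finishes to index the data into ES
# If es, the program submits the data to ES for indexing directly once the data has been generated
es.mode=es
# Whether to index into ES asynchronously with multiple threads
output.async=true
# Number of threads to use when indexing asynchronously
output.async.thread.count=10
# Page from which to resume if ES indexing was interrupted; 0 means the first page
output.start.page=0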
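The file above uses plain key=value lines with # comments, which is the format java.util.Properties reads. The following is a minimal sketch of reading such settings with a fallback value. It does not show how the project itself parses config.txt; the class ConfigSketch and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: reads config.txt-style settings with java.util.Properties. */
class ConfigSketch {
    /** Loads key=value pairs; lines starting with '#' are skipped as comments. */
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the integer value of a key, or the fallback when the key is absent. */
    static int getInt(Properties props, String key, int fallback) {
        String value = props.getProperty(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        // A few lines in the same format as config.txt, inlined to keep the demo self-contained.
        String sample = "# batch size\nbatchSize=1000\nes.mode=es\noutput.async=true\n";
        Properties props = load(new StringReader(sample));
        System.out.println(getInt(props, "batchSize", 500));
        System.out.println(props.getProperty("es.mode"));
        System.out.println(Boolean.parseBoolean(props.getProperty("output.async")));
    }
}
```

For the real file, a reader such as Files.newBufferedReader(Path.of("config.txt"), StandardCharsets.UTF_8) can be passed to load. Only the first = on a line separates key and value, so the mysql.url value keeps its query string intact.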

5. Run the program:

all in one:

nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.Start &

or

step by step:

1. Generate simulated data and save it to MySQL:

    nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.generator.Generator &

2. Generate contract documents from the data in MySQL and submit them to ES:

    nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.mysql2es.Contract &
    # If es.mode=es, skip the next two commands; they are needed only when es.mode=file
    chmod +x contract.sh
    nohup ./contract.sh &

3. Generate contract detail documents from the data in MySQL and submit them to ES:

    nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.mysql2es.ContractDetail &
    # If es.mode=es, skip the next two commands; they are needed only when es.mode=file
    chmod +x detail.sh
    nohup ./detail.sh &

4. Generate area documents from the data in MySQL and submit them to ES:

    nohup java -Xmx1g -Xms1g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.mysql2es.Area &
    # If es.mode=es, skip the next two commands; they are needed only when es.mode=file
    chmod +x area.sh
    nohup ./area.sh &

5. Generate item documents from the data in MySQL and submit them to ES:

    nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.mysql2es.Item &
    # If es.mode=es, skip the next two commands; they are needed only when es.mode=file
    chmod +x item.sh
    nohup ./item.sh &

6. Generate customer documents from the data in MySQL and submit them to ES:

    nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.mysql2es.Customer &
    # If es.mode=es, skip the next two commands; they are needed only when es.mode=file
    chmod +x customer.sh
    nohup ./customer.sh &

7. Generate sales staff documents from the data in MySQL and submit them to ES:

    nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.mysql2es.SalesStaff &
    # If es.mode=es, skip the next two commands; they are needed only when es.mode=file
    chmod +x sales_staff.sh
    nohup ./sales_staff.sh &

8. Generate brand documents from the data in MySQL and submit them to ES:

    nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.mysql2es.Brand &
    # If es.mode=es, skip the next two commands; they are needed only when es.mode=file
    chmod +x brand.sh
    nohup ./brand.sh &

9. Generate category documents from the data in MySQL and submit them to ES:

    nohup java -Xmx2g -Xms2g -cp data-generator-1.0-jar-with-dependencies.jar org.apdplat.data.generator.mysql2es.Category &
    # If es.mode=es, skip the next two commands; they are needed only when es.mode=file
    chmod +x category.sh
    nohup ./category.sh &
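Each mysql2es step submits documents to ES in batches whose size is set by es.batchSize. The following sketch shows one way a list of JSON documents could be split into batches and turned into bulk request bodies in the same two-line format as the _bulk calls in step 3. It is an illustration under assumptions, not the project's implementation: the class BulkBodySketch, the Doc type, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: batches documents and renders ES bulk request bodies. */
class BulkBodySketch {
    /** A document id together with its already serialized JSON source. */
    static final class Doc {
        final long id;
        final String json;

        Doc(long id, String json) {
            this.id = id;
            this.json = json;
        }
    }

    /** Splits a list into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            result.add(items.subList(from, Math.min(from + batchSize, items.size())));
        }
        return result;
    }

    /** Renders one action line and one source line per document, each ending in a newline. */
    static String bulkBody(List<Doc> docs) {
        StringBuilder body = new StringBuilder();
        for (Doc doc : docs) {
            body.append("{\"index\":{\"_id\":").append(doc.id).append("}}\n");
            body.append(doc.json).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc(1, "{\"id\":1}"), new Doc(2, "{\"id\":2}"), new Doc(3, "{\"id\":3}"));
        // With a batch size of 2, the three documents end up in two bulk bodies.
        for (List<Doc> batch : batches(docs, 2)) {
            System.out.print(bulkBody(batch));
            System.out.println("--");
        }
    }
}
```

The bulk API requires the body to end with a newline, which is why every line, including the last, is terminated explicitly.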

6. Execute src/main/resources/hive_ddl.sql to create the tables in Hive.

7. Execute the command in src/main/resources/sqoop.txt to import the data in MySQL into Hive.

8. Import the Hive tables into Kylin, create Models and Cubes, and build the Cubes.

9. Create an index pattern in Kibana and create a chart.

10. Compare MySQL, Kibana+ES, and Kylin using the following aggregate query:

SELECT
    item.name,
    sum(contract_detail.price) AS total_price,
    sum(contract_detail.item_quantity) AS total_quantity
FROM
    contract_detail
LEFT JOIN item ON contract_detail.item_id = item.id
GROUP BY
    item.name
ORDER BY
    total_quantity DESC
    
Kylin took 0.5 seconds, MySQL 59 seconds, and ES 5 seconds.
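To make explicit what the comparison query computes, the following sketch performs the same grouping in memory: it sums price and quantity per item name and sorts by total quantity in descending order. It is for illustration only. The class ItemTotalsSketch and its types are hypothetical, the join with item is assumed to have been done already (each detail row carries its item name), and double is an assumed type for the price.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory equivalent of the GROUP BY / ORDER BY query. */
class ItemTotalsSketch {
    /** One contract_detail row after the join with item. */
    static final class Detail {
        final String itemName;
        final double price;
        final int quantity;

        Detail(String itemName, double price, int quantity) {
            this.itemName = itemName;
            this.price = price;
            this.quantity = quantity;
        }
    }

    /** One result row: item name, sum of prices, sum of quantities. */
    static final class Total {
        final String name;
        double totalPrice;
        long totalQuantity;

        Total(String name) {
            this.name = name;
        }
    }

    static List<Total> totals(List<Detail> details) {
        // GROUP BY item.name: one accumulator per distinct name.
        Map<String, Total> byName = new LinkedHashMap<>();
        for (Detail d : details) {
            Total t = byName.computeIfAbsent(d.itemName, Total::new);
            t.totalPrice += d.price;
            t.totalQuantity += d.quantity;
        }
        // ORDER BY total_quantity DESC.
        List<Total> result = new ArrayList<>(byName.values());
        result.sort(Comparator.comparingLong((Total t) -> t.totalQuantity).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Detail> details = List.of(
            new Detail("pen", 2.0, 3), new Detail("book", 10.0, 1), new Detail("pen", 4.0, 5));
        for (Total t : totals(details)) {
            System.out.println(t.name + " " + t.totalPrice + " " + t.totalQuantity);
        }
    }
}
```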
