Druid Usage

一、Installation

  1. Druid installation

    • Use druid-0.10.1 as bundled with HDP

    • Enable SQL support:

      Add the following under Custom druid-broker:
      druid.sql.enable=true

    • Components and ports:

    Broker 8082
    Coordinator 8081
    Overlord 8090
    Router 8888
    Historical 8083
    MiddleManager 8091
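
    • To verify the processes are up after installation, each Druid process exposes a /status endpoint; a minimal check (the host p5.ambari is an assumption, taken from the task-submission command later in these notes):

      curl http://p5.ambari:8082/status   # Broker
      curl http://p5.ambari:8081/status   # Coordinator
      curl http://p5.ambari:8090/status   # Overlord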
  2. Imply installation

    • The matching Imply version is imply-2.3.9; only its UI is used

    • Start imply-ui:

      • bin/run-imply-ui-quickstart conf-quickstart

二、Importing offline (batch) data

  1. Prepare the file wikipedia_data.csv

    • Upload this file to hdfs://ns/user/druid/quickstart (see the upload sketch after the sample data below)

    • 2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancisco,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancisc,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancis,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFranci,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFranc,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFran,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFra,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFr,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanF,57,200,-143
      2013-08-31T01:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,Sa,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancisco,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancisc,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFrancis,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFranci,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFranc,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFran,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFra,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanFr,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,SanF,57,200,-143
      2013-08-31T02:02:33Z,GypsyDanger,en,nuclear,true,true,false,false,article,NorthAmerica,UnitedStates,BayArea,Sa,57,200,-143
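
    • A minimal upload sketch, assuming the HDFS nameservice ns and the target directory from above (paths and permissions may differ per cluster):

      hdfs dfs -mkdir -p hdfs://ns/user/druid/quickstart
      hdfs dfs -put wikipedia_data.csv hdfs://ns/user/druid/quickstart/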
  2. Define the data format description file for the datasource: wikipedia_index_hadoop_csv_task.json

    • {
      "type": "index_hadoop",
      "spec": {
          "dataSchema": {
              "dataSource": "wikipedia2",
              "parser": {
                  "type": "hadoopyString",
                  "parseSpec": {
                      "format": "csv",
                      "timestampSpec": { "column": "timestamp" },
                      "columns": ["timestamp", "page", "language", "user", "unpatrolled", "newPage", "robot", "anonymous", "namespace", "continent", "country", "region", "city", "added", "deleted", "delta"],
                      "dimensionsSpec": { "dimensions": ["page", "language", "user", "unpatrolled", "newPage", "robot", "anonymous", "namespace", "continent", "country", "region", "city"] } }
              },
              "metricsSpec": [{
                  "type": "count",
                  "name": "count"
              },
              {
                  "type": "doubleSum",
                  "name": "added",
                  "fieldName": "added"
              },
              {
                  "type": "doubleSum",
                  "name": "deleted",
                  "fieldName": "deleted"
              },
              {
                  "type": "doubleSum",
                  "name": "delta",
                  "fieldName": "delta"
              }],
              "granularitySpec": {
                  "type": "uniform",
                  "segmentGranularity": "DAY",
                  "queryGranularity": "DAY",
                  "intervals": ["2013-08-31/2013-09-01"]
              }
          },
          "ioConfig": {
              "type": "hadoop",
              "inputSpec": {
                  "type": "static",
                  "paths": "quickstart/wikipedia_data.csv"
              }
          },
          "tuningConfig": {
              "type": "hadoop",
              "partitionsSpec": {
                  "type": "hashed",
                  "targetPartitionSize": 5000000
              },
              "jobProperties": {
      
              }
          }
      }
      }

      paths: refers to /user/druid/quickstart/wikipedia_data.csv on HDFS

    • The aggregation (rollup) granularity is controlled by queryGranularity
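
    • jobProperties is left empty in the spec above. If the Hadoop indexing job fails on HDP with classloader/dependency conflicts, a commonly documented property to try (an assumption for this environment, not verified in the original notes) is:

      "jobProperties": {
          "mapreduce.job.classloader": "true"
      }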

  3. Submit the task

    • curl -X 'POST' -H 'Content-Type: application/json' -d @wikipedia_index_hadoop_csv_task.json http://p5.ambari:8090/druid/indexer/v1/task
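
    • The submission returns a task ID, whose progress can then be polled on the Overlord; a minimal sketch (<taskId> is a placeholder for the returned ID):

      curl http://p5.ambari:8090/druid/indexer/v1/task/<taskId>/status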
  4. SQL query
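
    • With druid.sql.enable=true set on the Broker, SQL can be posted to the Broker's /druid/v2/sql/ endpoint; a minimal sketch against the wikipedia2 datasource (the Broker host p5.ambari:8082 is assumed from the component list above):

      curl -X POST -H 'Content-Type: application/json' \
        -d '{"query": "SELECT page, SUM(added) AS added FROM wikipedia2 GROUP BY page"}' \
        http://p5.ambari:8082/druid/v2/sql/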

  5. UI visualization

    • Imply
    • Superset
      • Bundled with HDP
      • Does not support SQL queries from the UI; offers a rich set of visualizations, but can only query data from within one year of the current time

三、Ingesting real-time data from Kafka

  1. Create the Kafka topic

    • kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --replication-factor 1 --topic metrics
  2. Define the data format description file for the datasource: metrics-kafka.json


    • {
      "type": "kafka",
      "dataSchema": {
          "dataSource": "metrics-kafka",
          "parser": {
              "type": "string",
              "parseSpec": {
                  "timestampSpec": {
                      "column": "time",
                      "format": "auto"
                  },
                  "dimensionsSpec": {
                      "dimensions": ["url", "user"]
                  },
                  "format": "json"
              }
          },
          "granularitySpec": {
              "type": "uniform",
              "segmentGranularity": "hour",
              "queryGranularity": "second"
          },
          "metricsSpec": [{
              "type": "count",
              "name": "views"
          },
          {
              "name": "latencyMs",
              "type": "doubleSum",
              "fieldName": "latencyMs"
          }]
      },
      "ioConfig": {
          "topic": "metrics",
          "consumerProperties": {
              "bootstrap.servers": "p6.ambari:6667",
              "group.id": "kafka-indexing-service"
          },
          "taskCount": 1,
          "replicas": 1,
          "taskDuration": "PT1H"
      },
      "tuningConfig": {
          "type": "kafka",
          "maxRowsInMemory": "100000"
      }
      }
  3. Submit the task
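
    • A minimal sketch for submitting the supervisor spec to the Overlord's Kafka indexing service endpoint (the Overlord host p5.ambari:8090 is assumed from the batch section above):

      curl -X POST -H 'Content-Type: application/json' -d @metrics-kafka.json http://p5.ambari:8090/druid/indexer/v1/supervisor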

  4. Write data to the Kafka topic

    • kafka-console-producer.sh --broker-list p6.ambari:6667 --topic metrics

    • {"time": "2018-03-06T09:58:09.111Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
      {"time": "2018-03-06T09:58:09.222Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
      {"time": "2018-03-06T09:58:09.333Z", "url": "/foo/bar", "user": "bob", "latencyMs": 45}
  5. Stop the task
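
    • A minimal sketch for shutting down the ingestion; by default the supervisor ID matches the dataSource name (metrics-kafka), and the ID is also returned when the spec is submitted:

      curl -X POST http://p5.ambari:8090/druid/indexer/v1/supervisor/metrics-kafka/shutdown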

Reposted from blog.csdn.net/ukakasu/article/details/81386318