Some big data testing tools

I. Data processing architecture

    As shown, there are two data pipelines: the real-time computation flow and the offline computation flow.

  • Real-time computation: event (hive table) --(events sent via dw-event-to-collector.sh)--> collector (event collection service) --> flume distribution --> kafka cache --> flink computation --> hbase --> elasticsearch (a hypothetical invocation is sketched after this list)
  • Offline computation: event HDFS files (hive tables) --(hive tables read actively)--> flink computation --> hbase --> elasticsearch
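
    As a rough sketch of exercising the real-time path end to end, the steps below replay an exported event file through the collector. The file argument to dw-event-to-collector.sh is an assumption for illustration; its actual flags are not documented here.

    # hypothetical invocation: the file argument is an assumption
    sh dw-event-to-collector.sh /home/hadoop/shopping.txt
    # events then flow: collector -> flume -> kafka -> flink -> hbase -> elasticsearch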

II. Real-time computation flow tools

  1.hive

  • Enter the hive data warehouse: hive
  • View current databases: show databases;
  • Switch to the cdp database: use cdp;
  • Create a table (for events configured in the SMH front end, the statement is generated automatically):
    CREATE TABLE IF NOT EXISTS tablename (
      uid String,
      event_time bigint,
      touch_point_id String
    )
    PARTITIONED BY (process_date String)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;
  • View the statement used to create a table: show create table c8_shopping;
  • View current tables: show tables;
  • View a table's columns: desc tablename;
  • Load data into the corresponding hive event table: load data local inpath "/home/hadoop/shopping.txt" into table tablename partition (process_date = "2019-07-22");
  • Query data in a table (a combined session example follows this list): select * from tablename where process_date = '2019-04-26' limit 10;
  • Run before a query to print column names along with the data: set hive.cli.print.header = true;
  • Delete the data in a table: truncate table tablename;
  • Drop a table: drop table tablename;
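
    Putting these commands together, a typical verification session looks like the sketch below; the table name c8_shopping is reused from the example above and should be replaced with your own event table.

    hive
    use cdp;
    set hive.cli.print.header = true;
    load data local inpath "/home/hadoop/shopping.txt" into table c8_shopping partition (process_date = "2019-07-22");
    select * from c8_shopping where process_date = '2019-07-22' limit 10;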

  2.kafka

   Query kafka consumption; path: /home/hadoop/kafka_2.11-0.10.2.0/bin

   Command: sh kafka-console-consumer.sh --topic event_c8 --from-beginning --bootstrap-server 172.00.0.000:9092 > event_c8
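
   To check that events are actually reaching the topic, the standard console producer in the same bin directory can inject a test message; the message body below is a placeholder, and the broker address is the same as above.

   # send one placeholder message into the topic
   echo '{"uid":"test","event_time":0}' | sh kafka-console-producer.sh --broker-list 172.00.0.000:9092 --topic event_c8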

  3.flink

  • Restart flink task, path: /home/hadoop/cdp-etl-jobs/bin/job/realtime (the full restart sequence is sketched after this list)
  • Close flink task: yarn application -kill <task id>
  • Start flink task: sh indexing-trait.sh and sh calculate-trait.sh
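
   A full restart chains the two steps above: find and kill the running yarn application, then relaunch the two scripts. The grep pattern below is an assumption about how the jobs are named; substitute the real application id from the list output.

   cd /home/hadoop/cdp-etl-jobs/bin/job/realtime
   # find the running application id (the name filter is an assumption)
   yarn application -list | grep -i trait
   yarn application -kill application_1563000000000_0001   # placeholder id
   sh indexing-trait.sh
   sh calculate-trait.sh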

  4.hbase

  • Enter hbase: hbase shell
  • View existing tables: list
  • Query a trait value: scan 'trait_c8', {COLUMNS => ['d:t1425', 'd:uid']}
  • Query uids whose delete status is set: scan 'trait_c8', {COLUMNS => 'd:delete_status', FILTER => "ValueFilter(=, 'substring:true')"}
  • Look up a single uid: get 'trait_c8', 'fff144eb653e7348f051307cde7db169'
  • Delete table data: truncate 'tablename'; then flush 'tablename'
  • Drop a table: disable 'tablename', then drop 'tablename'
  • Full sync from hbase to es: cdp/cdp-etl-jobs/bin/job/batch/trait-crowd-calc.sh -calcType sync (incremental: incr); a combined shell session is sketched after this list
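
   Combining the commands above, a quick check of one user and one trait column in the hbase shell might look like this; the row key and columns are taken from the examples above, and LIMIT just keeps the output small.

   hbase shell
   list
   get 'trait_c8', 'fff144eb653e7348f051307cde7db169'
   scan 'trait_c8', {COLUMNS => ['d:t1425', 'd:uid'], LIMIT => 10}
   exit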

  5.elasticsearch

   Queries can be run with kibana or the elasticsearch-head plugin; commonly used commands:

  • Query a trait:
    GET /trait_c39/trait_c39/_search?size=1000
    {
      "query": {
        "match_all": {}
      },
      "_source": ["t596"]
    }
  • Query a crowd:
    GET /trait_c39/trait_c39/_search?size=1000
    {
      "query": {
        "match_all": {}
      },
      "post_filter": {
        "term": { "crowds_code": "cr197" }
      }
    }

  • Look up a single uid (curl equivalents are sketched after this list):
    GET /trait_c33/trait_c33/uid-1
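
   When kibana is unavailable, the same queries can be issued with curl; the host and port below are assumptions for the es node.

   # host and port are assumptions; adjust to your es node
   curl -XGET 'http://172.23.x.xxx:9200/trait_c39/trait_c39/_search?size=1000' -H 'Content-Type: application/json' -d '
   {
     "query": { "match_all": {} },
     "_source": ["t596"]
   }'
   # look up a single document by id
   curl -XGET 'http://172.23.x.xxx:9200/trait_c33/trait_c33/uid-1'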

III. Offline computation flow tools

  1.hdfs

Front-end query address: http://172.23.x.xxx:50070/explorer.html#/cdp/warehouse

View a directory: hadoop fs -ls /cdp/warehouse/c8/offline/

View a file: hadoop fs -cat /cdp/warehouse/c8/offline/shopping.txt

Download data: hadoop fs -get /cdp/warehouse/c8/offline/

Delete a file: hadoop fs -rm -r /cdp/warehouse/c8/offline/shopping.txt
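
As a sketch, a round trip for test data uses -put (the counterpart of -get above) to upload a file and -cat to verify it:

  # upload a local test file to the offline directory, then verify
  hadoop fs -put /home/hadoop/shopping.txt /cdp/warehouse/c8/offline/
  hadoop fs -ls /cdp/warehouse/c8/offline/
  hadoop fs -cat /cdp/warehouse/c8/offline/shopping.txt | head -n 5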

  2.azkaban

    • cdp-batch-process: offline batch data processing
      dw-etl-process: start the data warehouse etl
      dw-event-to-hdfs: actively read events into hdfs
      user-delete: delete users
      event-ub-to-hbase: send events to hbase, used for displaying user profile data
      common-jobs-config: generate job configuration information (see the inspection sketch after this list), path: /home/hadoop/cdp-etl-jobs/jobs-tmp/codes/
          ALL_EVENT_TRAIT: list of all traits triggered by event arrival
          ALL_ACC_TRAIT: list of all event-accumulation-triggered traits, excluding timeline
          ALL_REF_TRAIT: list of all traits triggered by trait changes
          ALL_CROWD: full list of crowds within the channel
          CALC_EVENT_TRAIT: list of traits that are triggered on event arrival and need recalculation
          CALC_TRAIT: list of traits that are triggered by trait changes and need recalculation
          CALC_CROWD: list of crowds to compute for the day, including crowds to recalculate and crowds whose cycle is due
          CLEAN_CROWD: list of crowds to be deleted
          CLEAN_TRAIT: list of traits to be deleted
          EXPORT_TRAIT: list of traits to export during idmapping
          CANCELED_TRAIT: list of traits affected by authorization recall
      event-trait-calc-full: full rerun of the data; traitupdate inspects history and assigns the latest data to the trait
      event-trait-calc-incr: compute the data warehouse's daily incremental data; traitupdate sends only the day's data
      event-trait-calc-init: recalculate traits triggered on event arrival; traitupdate sends only the day's data
      trait-crowd-calc: compute crowds, traits that need recalculation when triggered by trait changes, timeline-type traits, and data updates for site administrators / operations specialists
      id-mapping-clean: delete obsolete mapping relationships
      id-mapping-init: initialize idMapping and establish mapping relationships
      id-mapping-copy: copy idMapping traits
      report-crowd-count: update crowd sizes to mysql (the crowd_scale column of the cdp_crowd table)
      report-metric: scheduled computation of long-term crowd-tracking dashboard metrics and full-channel metrics
    • cdp-clean-jobs: clean up temporary files and expired crowd export files
    • crowd-export: export crowds
    • init-channels: initialize channels
    • trait-import: import traits
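
    Since common-jobs-config writes these lists under the codes/ directory, they can be inspected directly; the assumption that each list is a plain file named after the list above is mine.

    # list the generated configuration files (names assumed to match the lists above)
    ls /home/hadoop/cdp-etl-jobs/jobs-tmp/codes/
    cat /home/hadoop/cdp-etl-jobs/jobs-tmp/codes/ALL_EVENT_TRAIT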
