Ad-hoc query service based on the Spark engine

Project address: IQL. If you find it useful, please star it.

IQL

Drawing on Ximalaya's use of Spark SQL, the load, select, and save syntax of xql implements a Spark-based ad-hoc query service.

  • Elegant interaction, supporting multiple data sources and sinks
  • Resident Spark service with automatic engine discovery via ZooKeeper
  • Load balancing: queries are dispatched randomly across multiple engines
  • Parallel queries in multi-session mode
  • Spark dynamic resource allocation: executors are released when no tasks are running

Hive

  • Load data
select * from hive_table
  • Save data
save tb1 as hive.table
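Putting the two together, a minimal round trip might look as follows. This is a sketch: the table name `db_result_table` is hypothetical, and it assumes that `select ... as alias;` registers the result as a temporary table, as in StreamingPro's DSL.

```sql
select * from hive_table as tb1;
save tb1 as hive.db_result_table;
```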

Hbase

Load data
  • hbase.zookeeper.quorum: ZooKeeper address
  • spark.table.schema: the schema of the Spark temporary table, e.g. "ID:String,appname:String,age:Int"
  • hbase.table.schema: the schema of the corresponding HBase table, e.g. ":rowkey,info:appname,info:age"
  • hbase.table.name: HBase table name
  • spark.rowkey.view.name: name of the temp view created from the rowkey DataFrame (when set, only the rows matching those rowkeys are fetched)
load hbase.t_mbl_user_version_info 
where `spark.table.schema`="userid:String,osversion:String,toolversion:String"
       and `hbase.table.schema`=":rowkey,info:osversion,info:toolversion" 
       and `hbase.zookeeper.quorum`="localhost:2181"
as tb;
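The two schema strings are positional: the i-th column of `spark.table.schema` maps to the i-th entry of `hbase.table.schema` (with `:rowkey` marking the row key). A small illustrative parser, not IQL's actual code, that pairs them:

```python
def pair_schemas(spark_schema: str, hbase_schema: str):
    """Pair each Spark column (name:Type) with its HBase counterpart
    (family:qualifier, or :rowkey for the row key), by position."""
    spark_cols = [c.split(":")[0] for c in spark_schema.split(",")]
    hbase_cols = hbase_schema.split(",")
    if len(spark_cols) != len(hbase_cols):
        raise ValueError("schemas must have the same number of columns")
    return list(zip(spark_cols, hbase_cols))

pairs = pair_schemas(
    "userid:String,osversion:String,toolversion:String",
    ":rowkey,info:osversion,info:toolversion",
)
# pairs[0] -> ('userid', ':rowkey')
```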
Save data
  • hbase.zookeeper.quorum: ZooKeeper address
  • hbase.table.rowkey.field: which field of the Spark temporary table is used as the HBase rowkey; defaults to the first field
  • bulkload.enable: whether to use bulkload; off by default. Bulkload currently has a bug (a sorting problem); it must be enabled when the target HBase table consists only of the rowkey.
  • hbase.table.name: HBase table name
  • hbase.table.family: column family name; defaults to info
  • hbase.table.startKey: pre-split start key. If the HBase table does not exist it is created automatically; without the three pre-split parameters (startKey, endKey, numReg) the table has only one region.
  • hbase.table.endKey: pre-split end key
  • hbase.table.numReg: number of regions
  • hbase.table.rowkey.prefix: when the rowkey is numeric, pre-splitting requires a format prefix, e.g. 00
  • hbase.check_table: whether to check that the table exists when writing to HBase; defaults to false
save tb1 as hbase.tableName 
where `hbase.zookeeper.quorum`="localhost:2181"
      and `hbase.table.rowkey.field`="name"
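To see how startKey, endKey, numReg, and the numeric rowkey prefix interact, here is an illustrative calculation (a hypothetical helper, not part of IQL) of the region split points such a pre-split would produce:

```python
def split_keys(start_key: int, end_key: int, num_regions: int, prefix: str = "00"):
    """Compute evenly spaced split points between start_key and end_key,
    zero-padded to the width implied by the prefix format (e.g. '00' -> width 2)."""
    width = len(prefix)
    step = (end_key - start_key) / num_regions
    return [str(int(start_key + step * i)).zfill(width) for i in range(1, num_regions)]

# 10 regions over keys 0..100, prefix format "00":
print(split_keys(0, 100, 10))
# -> ['10', '20', '30', '40', '50', '60', '70', '80', '90']
```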

MySQL

  • Load data
load jdbc.ai_log_count 
where driver="com.mysql.jdbc.Driver" // default
      and url="jdbc:mysql://localhost/db?characterEncoding=utf8" 
      and user="root" // default
      and password="***" // default
as tb; 
  • Save data
save append tb as jdbc.aatest_delete;

File operations (where format can be: json, orc, csv, parquet, text)

  • Load data
load format.`path` as tb;
  • Save data
save tb as format.`path` partitionBy uid coalesce 2;
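For example, with `csv` and `parquet` substituted for the format placeholder (the paths are hypothetical):

```sql
load csv.`/data/input/events.csv` as tb;
save tb as parquet.`/data/output/events` partitionBy uid coalesce 2;
```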

Kafka

  • Load data
load kafka.`topicName`
where maxRatePerPartition="200"
    and `group.id`="consumerGroupId"
as tb;
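Assuming `maxRatePerPartition` is forwarded to Spark Streaming's `spark.streaming.kafka.maxRatePerPartition` (records per second per partition), the per-batch record cap is rate × number of partitions × batch interval. A quick sanity check:

```python
def max_records_per_batch(rate_per_partition: int, num_partitions: int,
                          batch_interval_s: float) -> int:
    """Upper bound on records consumed in one micro-batch."""
    return int(rate_per_partition * num_partitions * batch_interval_s)

# 200 records/s/partition, 8 partitions, 5-second batches:
print(max_records_per_batch(200, 8, 5))  # -> 8000
```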

Reference

StreamingPro, which supports a similar SQL-like DSL

