ClickHouse, a real-time big data analytics engine: SQL syntax in detail

 

ClickHouse advantages

  • Parallel processing of a single query (using multiple cores)
  • Distributed processing across multiple servers
  • Very fast scans, fast enough to be used for real-time queries
  • Column storage is well suited to "wide" / "denormalized" tables (many columns)
  • Good compression
  • SQL support (with limitations)
  • Good set of functions, including support for approximated calculations
  • Different storage engines (disk storage formats)
  • Well suited to structured log / event data and time-series data (the MergeTree engine requires a date field)
  • Index support (primary key only, and not in all storage engines)
  • Nice command-line interface with user-friendly formatting and a progress bar

For the complete list of ClickHouse features, see the official documentation.

ClickHouse shortcomings (at the time of writing)

  • No real delete / update support, and no transactions (the same as Spark and most big data systems)
  • No secondary keys (the same as Spark and most big data systems)
  • Own protocol (no MySQL protocol support)
  • Limited SQL support, and joins are implemented differently. If you are migrating from MySQL or Spark, you will probably have to rewrite every query that uses joins.
  • No window functions

The ClickHouse SQL syntax is described in detail below:

ClickHouse SQL is broadly the same as MySQL or Presto SQL, with a few differences. This article only records the ClickHouse queries I write day to day, covering computation functions, aggregate functions, how to write joins, and some points that need attention.

ClickHouse official website: https://clickhouse.yandex/

One: Simple queries

A simple query is written the same way as in MySQL and similar databases.

select * from database_name.table_name where condition

ex: select sno, sname, sage from student where sno = 1

Note: ClickHouse is a real-time analytics engine, so its tables usually hold a large amount of data spread over many partitions. When a query would read too much data, narrow the range with WHERE conditions or cap the result with LIMIT. Otherwise the query fails with an error saying that the data read exceeds XX GB.
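The note above can be sketched as a query that restricts the date range and caps the number of rows. The table and column names are borrowed from the practical example at the end of this article; that the_day is the column the table is partitioned by is an assumption.

```sql
-- Restrict the scan to a short date range and cap the result size.
-- Assumption: the_day is the date column used for partitioning.
SELECT
    account_id,
    the_day,
    click
FROM marketing.sem_keyword_report
WHERE the_day >= '2018-12-12'
  AND the_day <= '2018-12-15'
ORDER BY the_day
LIMIT 100
```

The WHERE range is what keeps the amount of data read small; LIMIT only bounds the number of rows returned.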

Two: Join queries

First: in the version used for this article, ClickHouse does not support joining more than two tables directly in one query, so the conventional MySQL-style multi-table join shown further down does not work.

Second: the join condition is written with USING instead of ON. The USING column must have the same name in both tables; if it does not, give it an alias with AS in the SELECT so that the names match.

Third: several commonly used join keywords

1. ALL LEFT JOIN

2. ANY LEFT JOIN

3. ALL FULL JOIN
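To make the difference between the ALL and ANY keywords concrete, here is a minimal sketch built on inline data, so it needs no existing tables. The column names id and tag are made up for illustration.

```sql
-- ALL keeps every matching row from the right side:
-- the single left row matches two right rows, so two rows come back.
SELECT id, tag
FROM
    (SELECT 1 AS id)
ALL LEFT JOIN
    (SELECT 1 AS id, 'a' AS tag UNION ALL SELECT 1 AS id, 'b' AS tag)
USING id;

-- ANY keeps at most one matching right row per left row,
-- so one row comes back (which of the two matches is kept is not guaranteed).
SELECT id, tag
FROM
    (SELECT 1 AS id)
ANY LEFT JOIN
    (SELECT 1 AS id, 'a' AS tag UNION ALL SELECT 1 AS id, 'b' AS tag)
USING id;
```

ANY is useful when the right side acts as a lookup table and duplicate matches should not multiply the left rows.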

 

A conventional SQL join query looks like this:

ex: select * from table as a
    left join table as b on a.id = b.aid
    left join table as c on a.id = c.aid

This form raises an error in ClickHouse. To join more than two tables, use subqueries.

ex: first left join TABLEA with TABLEB, then left join that result with TABLEC. The join condition is written with USING instead of ON, and the column name must be the same in both tables; if it is not, give it an alias with AS in the SELECT so that the names match.

SELECT
    *
FROM
(
    SELECT
        *
    FROM
        (SELECT * FROM TABLEA)
    ALL LEFT JOIN
        (SELECT * FROM TABLEB)
    USING aid
)
ALL LEFT JOIN
    (SELECT * FROM TABLEC)
USING aid
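When the key column is named differently in the two tables, it can be aliased inside the subquery so that USING sees one name. The sketch below assumes, as in the conventional join above (a.id = b.aid), that TABLEA calls the key id while TABLEB and TABLEC call it aid; the columns name, amount and status are hypothetical.

```sql
SELECT
    aid,
    name,
    amount,
    status
FROM
(
    SELECT
        aid,
        name,
        amount
    FROM
        -- Alias id to aid so both sides expose the same column name.
        (SELECT id AS aid, name FROM TABLEA)
    ALL LEFT JOIN
        (SELECT aid, amount FROM TABLEB)
    USING aid
)
ALL LEFT JOIN
    (SELECT aid, status FROM TABLEC)
USING aid
```

Listing the needed columns explicitly in each subquery, instead of select *, also reduces the amount of data read from a column store.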

 

Three: Commonly used functions and expressions

1. sum(field): sum

2. avg(field): average

3. round(x, 2): round to two decimal places, where x can be a field, sum(field), or an expression such as a / b

4. case when fieldB = 0 then null (or 0) else round(fieldA / fieldB, 4) end: a conditional expression that returns null or 0 when the divisor is 0

5. toString(): convert to a string

6. concat(field value, 'content to append, such as %')

Example: concat(toString(round(round(a/b,4) * 100 ,2)),'%')
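The expressions above can be combined into one query. This is a minimal sketch on inline data, so it runs without any existing table; the column names a, b and ratio_pct are illustrative only.

```sql
SELECT
    a,
    b,
    CASE
        -- Guard against a zero divisor before computing the ratio.
        WHEN b = 0 THEN NULL
        -- Ratio rounded to 4 decimals, scaled to a percentage, rendered as text.
        ELSE concat(toString(round(round(a / b, 4) * 100, 2)), '%')
    END AS ratio_pct
FROM
(
    SELECT 1 AS a, 4 AS b
    UNION ALL
    SELECT 3 AS a, 0 AS b
)
```

The first row yields a percentage string and the second row yields NULL, because its divisor is 0.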

 

Four: What stays the same

WHERE conditions, GROUP BY, ORDER BY and LIMIT are used the same way as in MySQL.
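As a small illustration of these shared clauses, the sketch below aggregates clicks per account. The table and columns are borrowed from the practical example in the next section; the aliases ad_click and avg_cost are chosen here for illustration.

```sql
SELECT
    account_id,
    sum(click) AS ad_click,
    round(avg(cost), 2) AS avg_cost
FROM marketing.sem_keyword_report
WHERE the_day >= '2018-12-12'
  AND the_day <= '2018-12-15'
GROUP BY account_id
ORDER BY ad_click DESC
-- LIMIT offset, count: skip 0 rows and return at most 50, as in MySQL.
LIMIT 0, 50
```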

 

Five: Practical example

SELECT
	account_id,
	'2018-12-12~2018-12-15' AS date,
	account,
	ad_click
FROM
	(
		SELECT
			account_id,
			fr,
			fr_name,
			account,
			account_balance,
			account_budget,
			account_exclude_ip,
			account_budget_offline_time,
			account_status
		FROM
			marketing.sem_account_type
		WHERE
			1 = 1
		AND lower(fr) IN ('bd_sem')
		AND account_id IN (
			'18091503',
			'18091505',
			'18091501'
		)
	) ALL
LEFT JOIN (
	SELECT
		account_id,
		fr,
		round(sum(ad_cost) / 3, 2) AS ad_cost,
		round(sum(ad_cost_real) / 3, 2) AS ad_cost_real,
		round(sum(ad_impression) / 3, 2) AS ad_impression,
		round(sum(ad_click) / 3, 2) AS ad_click,
		round(sum(clue_all) / 3, 2) AS clue_all,
		round(sum(clue_all_new) / 3, 2) AS clue_all_new,
		round(
			sum(
				c1_kpi_daily_new_customer_amount
			) / 3,
			2
		) AS c1_kpi_daily_new_customer_amount,
		round(
			sum(c1_kpi_new_customer_amount) / 3,
			2
		) AS c1_kpi_new_customer_amount,
		round(
			sum(
				c2_kpi_daily_new_customer_amount
			) / 3,
			2
		) AS c2_kpi_daily_new_customer_amount,
		round(
			sum(c2_kpi_new_customer_amount) / 3,
			2
		) AS c2_kpi_new_customer_amount,
		round(sum(c2c_c1_create) / 3, 2) AS c2c_c1_create,
		round(sum(c2b_c1_create) / 3, 2) AS c2b_c1_create,
		round(sum(c2c_c1_onsite) / 3, 2) AS c2c_c1_onsite,
		round(sum(c2b_c1_onsite) / 3, 2) AS c2b_c1_onsite,
		round(sum(c2c_c1_onsale) / 3, 2) AS c2c_c1_onsale,
		round(sum(c2b_c1_onsale) / 3, 2) AS c2b_c1_onsale,
		round(sum(c2c_c2_appoint) / 3, 2) AS c2c_c2_appoint,
		round(sum(b2c_c2_appoint) / 3, 2) AS b2c_c2_appoint,
		round(sum(ssss_c2_appoint) / 3, 2) AS ssss_c2_appoint,
		round(
			sum(c2c_c2_finish_appoint) / 3,
			2
		) AS c2c_c2_finish_appoint,
		round(
			sum(b2c_c2_finish_appoint) / 3,
			2
		) AS b2c_c2_finish_appoint,
		round(
			sum(ssss_c2_finish_appoint) / 3,
			2
		) AS ssss_c2_finish_appoint,
		round(sum(c2c_c2_order) / 3, 2) AS c2c_c2_order,
		round(sum(b2c_c2_order) / 3, 2) AS b2c_c2_order,
		round(sum(weighting_number) / 3, 2) AS weighting_number,
		round(sum(ssss_c2_order) / 3, 2) AS ssss_c2_order
	FROM
		(
			SELECT
				fr,
				keyword_id,
				account_id,
				cost AS ad_cost,
				cost_real AS ad_cost_real,
				impression AS ad_impression,
				click AS ad_click
			FROM
				marketing.sem_keyword_report
			WHERE
				1 = 1
			AND the_day >= '2018-12-12'
			AND the_day <= '2018-12-15'
			AND lower(fr) IN ('bd_sem')
			AND account_id IN (
				'18091503',
				'18091505',
				'18091501'
			)
			AND (
				campaign_city IN (
					'上海',
					'东莞',
					'中山',
					'临沂',
					'乌鲁木齐',
					'伊犁',
					'佛山',
					'保定',
					'全国',
					'兰州',
					'包头',
					'北京',
					'南京',
					'南宁',
					'南昌',
					'南通',
					'南阳',
					'厦门',
					'合肥',
					'呼和浩特',
					'咸阳',
					'哈尔滨',
					'唐山',
					'嘉兴',
					'大同',
					'大连',
					'天津',
					'太原',
					'宁波',
					'宜昌',
					'宿迁',
					'常州',
					'广州',
					'廊坊',
					'徐州',
					'惠州',
					'成都',
					'扬州',
					'新乡',
					'无锡',
					'昆明',
					'杭州',
					'武汉',
					'沈阳',
					'泉州',
					'泰州',
					'泸州',
					'洛阳',
					'济南',
					'济宁',
					'淮安',
					'深圳',
					'温州',
					'澳门',
					'烟台',
					'珠海',
					'盐城',
					'石家庄',
					'福州',
					'绵阳',
					'芜湖',
					'苏州',
					'襄阳',
					'西安',
					'许昌',
					'贵阳',
					'达州',
					'郑州',
					'重庆',
					'金华',
					'银川',
					'镇江',
					'长春',
					'长沙',
					'青岛'
				)
			)
		) ALL
	FULL JOIN (
		SELECT
			fr,
			keyword_id,
			account_id,
			clue_all,
			clue_all_new,
			c1_kpi_daily_new_customer_amount,
			c1_kpi_new_customer_amount,
			c2_kpi_daily_new_customer_amount,
			c2_kpi_new_customer_amount,
			c2c_c1_create,
			c2b_c1_create,
			c2c_c1_onsite,
			c2b_c1_onsite,
			c2c_c1_onsale,
			c2b_c1_onsale,
			c2c_c2_appoint,
			b2c_c2_appoint,
			ssss_c2_appoint,
			c2c_c2_finish_appoint,
			b2c_c2_finish_appoint,
			ssss_c2_finish_appoint,
			c2c_c2_order,
			b2c_c2_order,
			weighting_number,
			ssss_c2_order
		FROM
			(
				SELECT
					fr,
					kid AS keyword_id,
					sum(clue_all) AS clue_all,
					sum(clue_all_new) AS clue_all_new,
					sum(
						c1_kpi_daily_new_customer_amount
					) AS c1_kpi_daily_new_customer_amount,
					sum(c1_kpi_new_customer_amount) AS c1_kpi_new_customer_amount,
					sum(
						c2_kpi_daily_new_customer_amount
					) AS c2_kpi_daily_new_customer_amount,
					sum(c2_kpi_new_customer_amount) AS c2_kpi_new_customer_amount,
					sum(c2c_c1_create) AS c2c_c1_create,
					sum(c2b_c1_create) AS c2b_c1_create,
					sum(c2c_c1_onsite) AS c2c_c1_onsite,
					sum(c2b_c1_onsite) AS c2b_c1_onsite,
					sum(c2c_c1_onsale) AS c2c_c1_onsale,
					sum(c2b_c1_onsale) AS c2b_c1_onsale,
					sum(c2c_c2_appoint) AS c2c_c2_appoint,
					sum(b2c_c2_appoint) AS b2c_c2_appoint,
					sum(ssss_c2_appoint) AS ssss_c2_appoint,
					sum(c2c_c2_finish_appoint) AS c2c_c2_finish_appoint,
					sum(b2c_c2_finish_appoint) AS b2c_c2_finish_appoint,
					sum(ssss_c2_finish_appoint) AS ssss_c2_finish_appoint,
					sum(c2c_c2_order) AS c2c_c2_order,
					sum(b2c_c2_order) AS b2c_c2_order,
					sum(weighting_number) AS weighting_number,
					sum(ssss_c2_order) AS ssss_c2_order
				FROM
					marketing.market_kid_stat_new_v5
				WHERE
					1 = 1
				AND dts >= '2018-12-12'
				AND dts <= '2018-12-15'
				AND lower(fr) IN ('bd_sem')
				AND (
					city IN (
						'上海',
						'东莞',
						'中山',
						'临沂',
						'乌鲁木齐',
						'伊犁',
						'佛山',
						'保定',
						'全国',
						'兰州',
						'包头',
						'北京',
						'南京',
						'南宁',
						'南昌',
						'南通',
						'南阳',
						'厦门',
						'合肥',
						'呼和浩特',
						'咸阳',
						'哈尔滨',
						'唐山',
						'嘉兴',
						'大同',
						'大连',
						'天津',
						'太原',
						'宁波',
						'宜昌',
						'宿迁',
						'常州',
						'广州',
						'廊坊',
						'徐州',
						'惠州',
						'成都',
						'扬州',
						'新乡',
						'无锡',
						'昆明',
						'杭州',
						'武汉',
						'沈阳',
						'泉州',
						'泰州',
						'泸州',
						'洛阳',
						'济南',
						'济宁',
						'淮安',
						'深圳',
						'温州',
						'澳门',
						'烟台',
						'珠海',
						'盐城',
						'石家庄',
						'福州',
						'绵阳',
						'芜湖',
						'苏州',
						'襄阳',
						'西安',
						'许昌',
						'贵阳',
						'达州',
						'郑州',
						'重庆',
						'金华',
						'银川',
						'镇江',
						'长春',
						'长沙',
						'青岛'
					)
				)
				GROUP BY
					keyword_id,
					fr
			) ANY
		LEFT JOIN (
			SELECT
				fr,
				keyword_id,
				account_id
			FROM
				marketing.sem_keyword_type
		) USING keyword_id,
		fr
	WHERE
		1 = 1
	AND lower(fr) IN ('bd_sem')
	AND account_id IN (
		'18091503',
		'18091505',
		'18091501'
	)
	) USING keyword_id,
	fr
GROUP BY
	account_id,
	fr
) USING account_id,
 fr
ORDER BY
	account_id
LIMIT 0,
 50

 


Origin blog.csdn.net/Alice_qixin/article/details/86494438