ClickHouse advantages
- Parallel processing of a single query (using multiple cores)
- Distributed processing across multiple servers
- Very fast scans (see the benchmarks below), fast enough for real-time queries
- Column storage is well suited to "wide" / "denormalized" tables (many columns)
- Good compression
- SQL support (limited)
- Good set of functions, including support for approximate calculations
- Different storage engines (on-disk storage formats)
- Well suited to structured log / event data and time-series data (the MergeTree engine requires a date field)
- Index support (primary key only, and not in all storage engines)
- Nice command-line interface with user-friendly formatting and a progress bar
For a complete list of ClickHouse features, see the official documentation.
ClickHouse shortcomings
- No real delete / update support and no transactions (the same as Spark and most big data systems)
- No secondary keys (the same as Spark and most big data systems)
- Own protocol (no MySQL protocol support)
- Limited SQL support, and the join implementation is different. If you are migrating from MySQL or Spark, you will probably have to rewrite every query that uses joins.
- No window functions
The rest of this article covers how to write SQL for ClickHouse:
ClickHouse SQL is roughly the same as MySQL and Presto SQL, with a few differences. This article only records the ClickHouse queries I write day to day: computing functions, aggregate functions, how to write joins, and a few points that need attention.
ClickHouse official website: https://clickhouse.yandex/
One: simple queries
Simple queries are written the same way as in MySQL and other databases.
select * from database_name.table_name where conditions
ex: select sno, sname, sage from student where sno = 1
Caveat: ClickHouse is an engine built for real-time data analysis, so the tables stored in it are usually large and have many partitions. When a query would read too much data and the WHERE conditions do not narrow it down enough, use LIMIT to restrict the query. Otherwise the query fails with an error saying that the data read exceeds XX GB.
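As a minimal sketch of that caveat, the query below filters on a date column and caps the result with LIMIT. The table events and its columns user_id, event_type, and event_date are hypothetical names used only for illustration; they are not from the original queries.

```sql
-- Hypothetical table and columns, for illustration only.
-- Narrow the scan with a WHERE condition on the date column,
-- then cap the number of returned rows with LIMIT.
SELECT
    user_id,
    event_type,
    event_date
FROM events
WHERE event_date = '2018-12-12'
LIMIT 100
```

The more selective the WHERE condition is on the column the table is partitioned or sorted by, the less data the server has to read.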
Two: join queries
First: ClickHouse does not support joining more than two tables directly in one query, the way conventional multi-table joins are written in MySQL.
Second: the join condition is written with USING instead of ON. The field named in USING must have the same name in both tables; if it does not, alias it with AS in the select so that the names match.
Third: the most commonly used join keywords are:
1. ALL LEFT JOIN
2. ANY LEFT JOIN
3. ALL FULL JOIN
A conventional SQL join query looks like this:
ex: select * from table as a left join table as b on a.id = b.aid left join table as c on a.id = c.aid
ClickHouse rejects this form. To join more than two tables, use subqueries.
ex: first left join TABLEA with TABLEB, then left join that result with TABLEC. The join condition changes from ON to USING, and the field named in USING must have the same name on both sides; if it does not, alias it with AS in the select so that the names match.
SELECT *
FROM
(
    -- Inner level: join TABLEA with TABLEB
    SELECT *
    FROM
    (
        SELECT * FROM TABLEA
    )
    ALL LEFT JOIN
    (
        SELECT * FROM TABLEB
    ) USING aid
)
-- Outer level: join the result with TABLEC
ALL LEFT JOIN
(
    SELECT * FROM TABLEC
) USING aid
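When the key has a different name in one of the tables, as in the conventional ON example earlier where TABLEA uses id and the other tables use aid, the alias can be applied inside the subquery. The following is a sketch only: the columns name, score, and city are hypothetical and stand in for whatever fields the real tables contain.

```sql
-- Assumption: TABLEA stores the key as id, TABLEB and TABLEC store it as aid.
-- Columns name, score, and city are hypothetical, for illustration only.
SELECT *
FROM
(
    SELECT *
    FROM
    (
        -- Alias id to aid so both sides of USING share the same field name
        SELECT id AS aid, name FROM TABLEA
    )
    ALL LEFT JOIN
    (
        SELECT aid, score FROM TABLEB
    ) USING aid
)
ALL LEFT JOIN
(
    SELECT aid, city FROM TABLEC
) USING aid
```

Selecting only the needed columns in each subquery, rather than *, also keeps the amount of data carried through the join small.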
Three: commonly used functions and expressions
1. sum(field): sum
2. avg(field): average
3. round(x, 2): rounds x to two decimal places, where x can be a field, sum(field), or an expression such as a / b
4. case when field_b = 0 then null else round(field_a / field_b, 4) end: checks whether the divisor is 0 and, if so, returns null (or 0) instead of dividing
5. toString(field): converts a value to a string
6. concat(field, 'text to append, such as %'): concatenates strings
Example: concat(toString(round(round(a / b, 4) * 100, 2)), '%')
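The functions above can be combined into a single query. The sketch below computes a click rate per account and formats it as a percentage string, guarding against a zero divisor. The table ad_stats and its columns account_id, click, and impression are hypothetical names chosen for illustration.

```sql
-- Hypothetical table ad_stats with numeric columns click and impression.
SELECT
    account_id,
    sum(click) AS clicks,
    sum(impression) AS impressions,
    -- Guard against a zero divisor, then format the ratio as a percentage string
    CASE
        WHEN sum(impression) = 0 THEN '0%'
        ELSE concat(toString(round(round(sum(click) / sum(impression), 4) * 100, 2)), '%')
    END AS click_rate
FROM ad_stats
GROUP BY account_id
```

Both branches of the CASE return a string here, so the result column has one consistent type.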
Four: what stays the same
WHERE conditions, GROUP BY, ORDER BY, and LIMIT are written the same way as in MySQL.
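For instance, a query of this shape reads the same in both systems. It reuses the student table from the simple query example; the age threshold of 18 is an arbitrary value picked for illustration.

```sql
-- Count students per age, largest groups first.
-- The threshold 18 is arbitrary, for illustration only.
SELECT
    sage,
    count(*) AS students
FROM student
WHERE sage >= 18
GROUP BY sage
ORDER BY students DESC
LIMIT 10
```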
Five: a practical example
SELECT
account_id,
'2018-12-12~2018-12-15' AS date,
account,
ad_click
FROM
(
SELECT
account_id,
fr,
fr_name,
account,
account_balance,
account_budget,
account_exclude_ip,
account_budget_offline_time,
account_status
FROM
marketing.sem_account_type
WHERE
1 = 1
AND lower(fr) IN ('bd_sem')
AND account_id IN (
'18091503',
'18091505',
'18091501'
)
) ALL LEFT JOIN (
SELECT
account_id,
fr,
round(sum(ad_cost) / 3, 2) AS ad_cost,
round(sum(ad_cost_real) / 3, 2) AS ad_cost_real,
round(sum(ad_impression) / 3, 2) AS ad_impression,
round(sum(ad_click) / 3, 2) AS ad_click,
round(sum(clue_all) / 3, 2) AS clue_all,
round(sum(clue_all_new) / 3, 2) AS clue_all_new,
round(
sum(
c1_kpi_daily_new_customer_amount
) / 3,
2
) AS c1_kpi_daily_new_customer_amount,
round(
sum(c1_kpi_new_customer_amount) / 3,
2
) AS c1_kpi_new_customer_amount,
round(
sum(
c2_kpi_daily_new_customer_amount
) / 3,
2
) AS c2_kpi_daily_new_customer_amount,
round(
sum(c2_kpi_new_customer_amount) / 3,
2
) AS c2_kpi_new_customer_amount,
round(sum(c2c_c1_create) / 3, 2) AS c2c_c1_create,
round(sum(c2b_c1_create) / 3, 2) AS c2b_c1_create,
round(sum(c2c_c1_onsite) / 3, 2) AS c2c_c1_onsite,
round(sum(c2b_c1_onsite) / 3, 2) AS c2b_c1_onsite,
round(sum(c2c_c1_onsale) / 3, 2) AS c2c_c1_onsale,
round(sum(c2b_c1_onsale) / 3, 2) AS c2b_c1_onsale,
round(sum(c2c_c2_appoint) / 3, 2) AS c2c_c2_appoint,
round(sum(b2c_c2_appoint) / 3, 2) AS b2c_c2_appoint,
round(sum(ssss_c2_appoint) / 3, 2) AS ssss_c2_appoint,
round(
sum(c2c_c2_finish_appoint) / 3,
2
) AS c2c_c2_finish_appoint,
round(
sum(b2c_c2_finish_appoint) / 3,
2
) AS b2c_c2_finish_appoint,
round(
sum(ssss_c2_finish_appoint) / 3,
2
) AS ssss_c2_finish_appoint,
round(sum(c2c_c2_order) / 3, 2) AS c2c_c2_order,
round(sum(b2c_c2_order) / 3, 2) AS b2c_c2_order,
round(sum(weighting_number) / 3, 2) AS weighting_number,
round(sum(ssss_c2_order) / 3, 2) AS ssss_c2_order
FROM
(
SELECT
fr,
keyword_id,
account_id,
cost AS ad_cost,
cost_real AS ad_cost_real,
impression AS ad_impression,
click AS ad_click
FROM
marketing.sem_keyword_report
WHERE
1 = 1
AND the_day >= '2018-12-12'
AND the_day <= '2018-12-15'
AND lower(fr) IN ('bd_sem')
AND account_id IN (
'18091503',
'18091505',
'18091501'
)
AND (
campaign_city IN (
'上海',
'东莞',
'中山',
'临沂',
'乌鲁木齐',
'伊犁',
'佛山',
'保定',
'全国',
'兰州',
'包头',
'北京',
'南京',
'南宁',
'南昌',
'南通',
'南阳',
'厦门',
'合肥',
'呼和浩特',
'咸阳',
'哈尔滨',
'唐山',
'嘉兴',
'大同',
'大连',
'天津',
'太原',
'宁波',
'宜昌',
'宿迁',
'常州',
'广州',
'廊坊',
'徐州',
'惠州',
'成都',
'扬州',
'新乡',
'无锡',
'昆明',
'杭州',
'武汉',
'沈阳',
'泉州',
'泰州',
'泸州',
'洛阳',
'济南',
'济宁',
'淮安',
'深圳',
'温州',
'澳门',
'烟台',
'珠海',
'盐城',
'石家庄',
'福州',
'绵阳',
'芜湖',
'苏州',
'襄阳',
'西安',
'许昌',
'贵阳',
'达州',
'郑州',
'重庆',
'金华',
'银川',
'镇江',
'长春',
'长沙',
'青岛'
)
)
) ALL FULL JOIN (
SELECT
fr,
keyword_id,
account_id,
clue_all,
clue_all_new,
c1_kpi_daily_new_customer_amount,
c1_kpi_new_customer_amount,
c2_kpi_daily_new_customer_amount,
c2_kpi_new_customer_amount,
c2c_c1_create,
c2b_c1_create,
c2c_c1_onsite,
c2b_c1_onsite,
c2c_c1_onsale,
c2b_c1_onsale,
c2c_c2_appoint,
b2c_c2_appoint,
ssss_c2_appoint,
c2c_c2_finish_appoint,
b2c_c2_finish_appoint,
ssss_c2_finish_appoint,
c2c_c2_order,
b2c_c2_order,
weighting_number,
ssss_c2_order
FROM
(
SELECT
fr,
kid AS keyword_id,
sum(clue_all) AS clue_all,
sum(clue_all_new) AS clue_all_new,
sum(
c1_kpi_daily_new_customer_amount
) AS c1_kpi_daily_new_customer_amount,
sum(c1_kpi_new_customer_amount) AS c1_kpi_new_customer_amount,
sum(
c2_kpi_daily_new_customer_amount
) AS c2_kpi_daily_new_customer_amount,
sum(c2_kpi_new_customer_amount) AS c2_kpi_new_customer_amount,
sum(c2c_c1_create) AS c2c_c1_create,
sum(c2b_c1_create) AS c2b_c1_create,
sum(c2c_c1_onsite) AS c2c_c1_onsite,
sum(c2b_c1_onsite) AS c2b_c1_onsite,
sum(c2c_c1_onsale) AS c2c_c1_onsale,
sum(c2b_c1_onsale) AS c2b_c1_onsale,
sum(c2c_c2_appoint) AS c2c_c2_appoint,
sum(b2c_c2_appoint) AS b2c_c2_appoint,
sum(ssss_c2_appoint) AS ssss_c2_appoint,
sum(c2c_c2_finish_appoint) AS c2c_c2_finish_appoint,
sum(b2c_c2_finish_appoint) AS b2c_c2_finish_appoint,
sum(ssss_c2_finish_appoint) AS ssss_c2_finish_appoint,
sum(c2c_c2_order) AS c2c_c2_order,
sum(b2c_c2_order) AS b2c_c2_order,
sum(weighting_number) AS weighting_number,
sum(ssss_c2_order) AS ssss_c2_order
FROM
marketing.market_kid_stat_new_v5
WHERE
1 = 1
AND dts >= '2018-12-12'
AND dts <= '2018-12-15'
AND lower(fr) IN ('bd_sem')
AND (
city IN (
'上海',
'东莞',
'中山',
'临沂',
'乌鲁木齐',
'伊犁',
'佛山',
'保定',
'全国',
'兰州',
'包头',
'北京',
'南京',
'南宁',
'南昌',
'南通',
'南阳',
'厦门',
'合肥',
'呼和浩特',
'咸阳',
'哈尔滨',
'唐山',
'嘉兴',
'大同',
'大连',
'天津',
'太原',
'宁波',
'宜昌',
'宿迁',
'常州',
'广州',
'廊坊',
'徐州',
'惠州',
'成都',
'扬州',
'新乡',
'无锡',
'昆明',
'杭州',
'武汉',
'沈阳',
'泉州',
'泰州',
'泸州',
'洛阳',
'济南',
'济宁',
'淮安',
'深圳',
'温州',
'澳门',
'烟台',
'珠海',
'盐城',
'石家庄',
'福州',
'绵阳',
'芜湖',
'苏州',
'襄阳',
'西安',
'许昌',
'贵阳',
'达州',
'郑州',
'重庆',
'金华',
'银川',
'镇江',
'长春',
'长沙',
'青岛'
)
)
GROUP BY
keyword_id,
fr
) ANY LEFT JOIN (
SELECT
fr,
keyword_id,
account_id
FROM
marketing.sem_keyword_type
) USING keyword_id, fr
WHERE
1 = 1
AND lower(fr) IN ('bd_sem')
AND account_id IN (
'18091503',
'18091505',
'18091501'
)
) USING keyword_id, fr
GROUP BY
account_id,
fr
) USING account_id, fr
ORDER BY
account_id
LIMIT 0, 50