The X-Pack bundled with ES 7.x includes Elasticsearch SQL, so we can run SQL queries directly through the SQL REST API, the SQL CLI, and other interfaces.
SQL REST API
Enter the following in the Kibana Console:
POST /_sql?format=txt
{
"query": "SELECT * FROM library ORDER BY page_count DESC LIMIT 5"
}
Replace the query with your own SQL statement. The response looks like this:
     author      |        name        |  page_count   |      release_date
-----------------+--------------------+---------------+------------------------
Peter F. Hamilton|Pandora's Star      |768            |2004-03-02T00:00:00.000Z
Vernor Vinge     |A Fire Upon the Deep|613            |1992-06-01T00:00:00.000Z
Frank Herbert    |Dune                |604            |1965-06-01T00:00:00.000Z
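The same request can also be sent from application code. The Python sketch below only builds and sends the POST /_sql?format=txt request shown above. It assumes an unsecured cluster at http://localhost:9200; the ES_URL constant and the helper names sql_request and run_sql are illustrative, not part of any client library. Real deployments usually need TLS and credentials, or an official client.

```python
import json
import urllib.request

# Assumption: a local cluster with security disabled.
ES_URL = "http://localhost:9200"


def sql_request(query: str, fmt: str = "txt", base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) POST /_sql?format=<fmt> with the SQL in a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_sql?format={fmt}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(query: str, fmt: str = "txt") -> str:
    """Send the query and return the raw response body (needs a running cluster)."""
    with urllib.request.urlopen(sql_request(query, fmt)) as resp:
        return resp.read().decode("utf-8")
```

Against a live cluster, run_sql("SELECT * FROM library ORDER BY page_count DESC LIMIT 5") should print the same text table as the console; fmt="json" asks for JSON instead, which is easier to process in code.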
SQL CLI
elasticsearch-sql-cli is a script in the bin directory of an ES installation; it can also be downloaded separately. From the ES home directory, run:
./bin/elasticsearch-sql-cli https://some.server:9200
Then enter SQL statements at the prompt:
sql> SELECT * FROM library WHERE page_count > 500 ORDER BY page_count DESC;
     author      |        name        |  page_count   | release_date
-----------------+--------------------+---------------+---------------
Peter F. Hamilton|Pandora's Star      |768            |1078185600000
Vernor Vinge     |A Fire Upon the Deep|613            |707356800000
Frank Herbert    |Dune                |604            |-144720000000
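Note that the CLI prints release_date as epoch milliseconds, while the REST call with format=txt printed ISO-8601 strings. A small Python helper (epoch_millis_to_iso, a hypothetical name) converts between the two, assuming the values are UTC milliseconds as shown above:

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch in UTC; adding a timedelta also works for negative (pre-1970) values,
# unlike datetime.fromtimestamp on some platforms.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_iso(ms: int) -> str:
    """Convert epoch milliseconds, as printed by the CLI, to an ISO-8601 UTC string."""
    dt = EPOCH + timedelta(milliseconds=ms)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{dt.microsecond // 1000:03d}Z"
```

For example, 1078185600000 maps back to 2004-03-02T00:00:00.000Z, the value the REST API returned for Pandora's Star.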
SQL To DSL
Enter the following in Kibana:
POST /_sql/translate
{
"query": "SELECT * FROM library ORDER BY page_count DESC",
"fetch_size": 10
}
It returns the translated DSL query:
{
"size": 10,
"docvalue_fields": [
{
"field": "release_date",
"format": "epoch_millis"
}
],
"_source": {
"includes": [
"author",
"name",
"page_count"
],
"excludes": []
},
"sort": [
{
"page_count": {
"order": "desc",
"missing": "_first",
"unmapped_type": "short"
}
}
]
}
Since the query has already been generated, we can use this DSL as-is or adjust it as needed.
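A sketch of that translate-then-edit workflow in Python, assuming an unsecured cluster at http://localhost:9200: translate_request builds the POST /_sql/translate call, with_size tweaks the returned DSL (here only the page size), and search_request sends the edited DSL to the regular _search endpoint of the library index. All helper names are hypothetical, and nothing is sent until a request is passed to urllib.request.urlopen.

```python
import copy
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without security


def _post_json(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request against the cluster."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_request(query: str, fetch_size: int = 10) -> urllib.request.Request:
    """POST /_sql/translate: asks ES SQL for the DSL it would run."""
    return _post_json("/_sql/translate", {"query": query, "fetch_size": fetch_size})


def with_size(dsl: dict, size: int) -> dict:
    """Return a copy of the translated DSL with a different page size."""
    edited = copy.deepcopy(dsl)
    edited["size"] = size
    return edited


def search_request(index: str, dsl: dict) -> urllib.request.Request:
    """POST /<index>/_search with the (possibly edited) DSL as the body."""
    return _post_json(f"/{index}/_search", dsl)


# Typical flow against a live cluster:
#   with urllib.request.urlopen(translate_request("SELECT * FROM library ORDER BY page_count DESC")) as r:
#       dsl = json.load(r)
#   urllib.request.urlopen(search_request("library", with_size(dsl, 3)))
```

Copying before editing keeps the translated DSL intact, so several variants can be derived from one translation.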
Below we go through the SQL statements that ES SQL supports and how to avoid misusing them.
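First, it helps to know how SQL terms map to ES terms in ES SQL:
column -> field
row -> document
table -> index
schema -> implicit (no direct ES equivalent)
catalog / database -> cluster instance
cluster -> cluster (federated)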
ES SQL syntax largely follows the ANSI SQL standard; the supported statements include DML queries and some DDL queries.
DDL queries such as DESCRIBE table and SHOW COLUMNS IN table are of limited use, so we mainly look at DML support: SELECT and functions.
SELECT
The syntax is as follows:
SELECT [TOP [ count ] ] select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]
SELECT retrieves rows from zero or more tables. The logical execution order is:
1. Resolve the FROM clause to determine the table name.
2. If there is a WHERE condition, filter out all rows that do not satisfy it.
3. If there is a GROUP BY condition, group the rows and aggregate each group; if there is a HAVING condition, filter the aggregated results.
4. Evaluate select_expr on the result of the previous step to determine the data actually returned.
5. If there is an ORDER BY condition, sort the returned data.
6. If there is a LIMIT or TOP condition, return only a subset of the result of the previous step.
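To make that order concrete, here is a toy in-memory evaluator in Python that applies the clauses in exactly the sequence listed above. It illustrates the logical order only, not how ES SQL executes queries internally (ES translates them into DSL queries and aggregations); the function name run_select and its callback parameters are hypothetical.

```python
from itertools import groupby
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def run_select(
    rows: Iterable[Row],
    where: Optional[Callable[[Row], bool]] = None,
    group_key: Optional[Callable[[Row], Any]] = None,
    aggregate: Optional[Callable[[Any, List[Row]], Row]] = None,
    having: Optional[Callable[[Row], bool]] = None,
    select: Optional[Callable[[Row], Any]] = None,
    order_key: Optional[Callable[[Any], Any]] = None,
    descending: bool = False,
    limit: Optional[int] = None,
) -> List[Any]:
    """Toy SELECT evaluator that applies the clauses in SQL's logical order."""
    # 1. FROM: the input rows stand in for the table.
    result: List[Any] = list(rows)
    # 2. WHERE: drop rows that fail the condition.
    if where is not None:
        result = [r for r in result if where(r)]
    # 3. GROUP BY + aggregation, then HAVING on the aggregated rows.
    if group_key is not None and aggregate is not None:
        result.sort(key=group_key)  # groupby needs equal keys to be adjacent
        result = [aggregate(key, list(grp)) for key, grp in groupby(result, key=group_key)]
        if having is not None:
            result = [r for r in result if having(r)]
    # 4. select_expr: compute the columns that are returned.
    if select is not None:
        result = [select(r) for r in result]
    # 5. ORDER BY
    if order_key is not None:
        result.sort(key=order_key, reverse=descending)
    # 6. LIMIT / TOP: keep only the first rows.
    if limit is not None:
        result = result[:limit]
    return result
```

For example, SELECT name, page_count FROM library WHERE page_count > 500 ORDER BY page_count DESC LIMIT 2 corresponds to run_select(books, where=..., select=..., order_key=..., descending=True, limit=2).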
ES SQL differs from common SQL in that it also supports the TOP [ count ] and PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) clauses.
TOP [ count ]: for example, SELECT TOP 2 first_name FROM emp returns at most two rows. TOP cannot be combined with a LIMIT clause.
PIVOT: rotates the results of its aggregation from rows into columns for further processing. I haven't used it, so I won't cover it here.
FUNCTION
With the SELECT syntax above we can already filter, aggregate, sort, and paginate. To write richer SQL with full-text search, aggregation, and grouping, we need to look at the functions ES SQL supports.
Use SHOW FUNCTIONS to list the supported function names and their types:
SHOW FUNCTIONS;
      name       |     type
-----------------+---------------
AVG              |AGGREGATE
COUNT            |AGGREGATE
FIRST            |AGGREGATE
FIRST_VALUE      |AGGREGATE
LAST             |AGGREGATE
LAST_VALUE       |AGGREGATE
MAX              |AGGREGATE
MIN              |AGGREGATE
SUM              |AGGREGATE
........
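When the REST API is called with format=txt, the response is a text table like the one above. For quick scripts it can be parsed back into rows. The hypothetical parse_txt_table below assumes the first line is the header, the second is the ----+---- separator, and no cell contains a | character; for anything more robust, requesting format=json is the better choice.

```python
from typing import Dict, List


def parse_txt_table(text: str) -> List[Dict[str, str]]:
    """Parse ES SQL format=txt output into a list of {column: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        return []
    header = [name.strip() for name in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the ----+---- separator
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows
```

All values come back as strings; numeric or date columns need converting afterwards.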
Below we focus on the common functions for full-text search, aggregation, and grouping.
Full-text search functions
MATCH: equivalent to the match and multi_match queries in DSL.
MATCH(
    field_exp,     -- field name(s)
    constant_exp,  -- value to match against the field
    [, options])   -- optional parameters
Examples of use:
SELECT author, name FROM library WHERE MATCH(author, 'frank');
    author     |       name
---------------+-------------------
Frank Herbert  |Dune
Frank Herbert  |Dune Messiah
SELECT author, name, SCORE() FROM library WHERE MATCH('author^2,name^5', 'frank dune');
    author     |       name        |    SCORE()
---------------+-------------------+---------------
Frank Herbert  |Dune               |11.443176
Frank Herbert  |Dune Messiah       |9.446629
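The correspondence can be sketched by hand. The hypothetical Python helper below returns a simplified DSL clause: a match query for a single field, and a multi_match query for a comma-separated, optionally ^-boosted field list. The query ES SQL actually generates carries extra default parameters; POST /_sql/translate shows the exact form.

```python
from typing import Any, Dict


def match_to_dsl(fields: str, text: str) -> Dict[str, Any]:
    """Simplified DSL clause for MATCH(fields, text); not ES SQL's exact output."""
    field_list = [f.strip() for f in fields.split(",") if f.strip()]
    if len(field_list) == 1 and "^" not in field_list[0]:
        # MATCH(author, 'frank') -> match query on one field
        return {"match": {field_list[0]: {"query": text}}}
    # MATCH('author^2,name^5', 'frank dune') -> multi_match with boosted fields
    return {"multi_match": {"query": text, "fields": field_list}}
```

The boosts (^2, ^5) are why Dune scores higher than Dune Messiah in the second example: a hit on name counts more than a hit on author.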
QUERY: equivalent to the query_string query in DSL.
QUERY(
    constant_exp  -- query expression to match
    [, options])  -- optional parameters
Examples of use:
SELECT author, name, page_count, SCORE() FROM library WHERE QUERY('_exists_:"author" AND page_count:>200 AND (name:/star.*/ OR name:duna~)');
      author      |       name        |  page_count   |    SCORE()
------------------+-------------------+---------------+---------------
Frank Herbert     |Dune               |604            |3.7164764
Frank Herbert     |Dune Messiah       |331            |3.4169943
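Because the QUERY expression is itself a string literal inside SQL, any single quote in it must be escaped. The Python sketch below assumes ES SQL's string literals follow the standard SQL convention of doubling an embedded single quote, and also shows, in simplified form, the query_string clause the expression corresponds to. The helper names are hypothetical.

```python
from typing import Any, Dict


def sql_quote(value: str) -> str:
    """Render `value` as an SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def query_where(expression: str) -> str:
    """WHERE clause that hands a query_string expression to QUERY()."""
    return "WHERE QUERY(" + sql_quote(expression) + ")"


def query_to_dsl(expression: str) -> Dict[str, Any]:
    """Simplified DSL counterpart of QUERY(expression)."""
    return {"query_string": {"query": expression}}
```

For example, "SELECT author, name FROM library " + query_where("name:dune~") produces a complete statement, and the same expression string can be reused unchanged in a DSL query_string query.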
SCORE(): returns the relevance score of each returned row with respect to the executed query (the _score in DSL).
Examples of use:
SELECT SCORE(), * FROM library WHERE MATCH(name, 'dune') ORDER BY SCORE() DESC;
    SCORE()    |    author     |       name        |  page_count   |    release_date
---------------+---------------+-------------------+---------------+--------------------
2.2886353      |Frank Herbert  |Dune               |604            |1965-06-01T00:00:00Z
1.8893257      |Frank Herbert  |Dune Messiah       |331            |1969-10-15T00:00:00Z
Aggregate functions
AVG(numeric_field): computes the average of a numeric field.
SELECT AVG(salary) AS avg FROM emp;
COUNT(expression): returns the total number of input values. For COUNT(*) or COUNT(<literal>), all rows are counted, including those whose values are null or missing; for COUNT(field_name), null values are not counted.
COUNT(ALL field_name): returns the number of all non-null values of field_name; equivalent to COUNT(field_name).
COUNT(DISTINCT field_name): returns the number of distinct non-null values of field_name.
SUM(field_name): returns the sum of the values of the numeric field field_name.
MIN(field_name): returns the minimum value of the numeric field field_name.
MAX(field_name): returns the maximum value of the numeric field field_name.
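The null-handling rules above are easy to mix up. The toy Python functions below mimic them on in-memory rows (a list of dicts, where None or a missing key stands for null). They illustrate the semantics only and are not how ES computes aggregations; treating SUM, MIN, MAX, and AVG as skipping nulls, and returning None when no values remain, is an assumption based on conventional SQL behaviour.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def _non_null(rows: List[Row], field: str) -> List[Any]:
    """Values of `field`, skipping rows where it is null (None) or missing."""
    return [row[field] for row in rows if row.get(field) is not None]


def count_star(rows: List[Row]) -> int:
    """COUNT(*): every row counts, including rows with null or missing fields."""
    return len(rows)


def count_all(rows: List[Row], field: str) -> int:
    """COUNT(field), same as COUNT(ALL field): only non-null values count."""
    return len(_non_null(rows, field))


def count_distinct(rows: List[Row], field: str) -> int:
    """COUNT(DISTINCT field): number of distinct non-null values."""
    return len(set(_non_null(rows, field)))


def sum_of(rows: List[Row], field: str) -> Optional[float]:
    """SUM(field); None (SQL null) when there is nothing to add up."""
    values = _non_null(rows, field)
    return sum(values) if values else None


def min_of(rows: List[Row], field: str) -> Optional[Any]:
    """MIN(field) over non-null values."""
    values = _non_null(rows, field)
    return min(values) if values else None


def max_of(rows: List[Row], field: str) -> Optional[Any]:
    """MAX(field) over non-null values."""
    values = _non_null(rows, field)
    return max(values) if values else None


def avg_of(rows: List[Row], field: str) -> Optional[float]:
    """AVG(field): mean of the non-null values."""
    values = _non_null(rows, field)
    return sum(values) / len(values) if values else None
```

With four employees where one salary is null, COUNT(*) is 4 but COUNT(salary) is 3, and AVG divides by 3, not 4.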
Grouping functions
Grouping functions correspond to bucket aggregations in DSL.
HISTOGRAM: the syntax is as follows:
HISTOGRAM(
    numeric_exp,        -- numeric expression, usually a field_name
    numeric_interval    -- numeric interval
)
HISTOGRAM(
    date_exp,           -- date/time expression, usually a field_name
    date_time_interval  -- date/time interval
)
The following query counts employees per year of birth_date. Each bucket is keyed by January 1st (00:00) of its year, and rows with a null birth_date are grouped under null:
SELECT HISTOGRAM(birth_date, INTERVAL 1 YEAR) AS h, COUNT(*) AS c FROM emp GROUP BY h;
           h            |       c
------------------------+---------------
null                    |10
1952-01-01T00:00:00.000Z|8
1953-01-01T00:00:00.000Z|11
1954-01-01T00:00:00.000Z|8
1955-01-01T00:00:00.000Z|4
1956-01-01T00:00:00.000Z|5
1957-01-01T00:00:00.000Z|4
1958-01-01T00:00:00.000Z|7
1959-01-01T00:00:00.000Z|9
1960-01-01T00:00:00.000Z|8
1961-01-01T00:00:00.000Z|8
1962-01-01T00:00:00.000Z|6
1963-01-01T00:00:00.000Z|7
1964-01-01T00:00:00.000Z|4
1965-01-01T00:00:00.000Z|1
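The bucket keys follow a simple rule, sketched below in Python under the assumption of a zero offset: a numeric value falls into the bucket floor(value / interval) * interval, and a yearly date histogram keys each date by January 1st of its year, with nulls collected in their own bucket as in the output above. The helper names are hypothetical.

```python
import math
from collections import Counter
from datetime import date
from typing import Any, Callable, Dict, Iterable, Optional


def numeric_bucket(value: Optional[float], interval: float) -> Optional[float]:
    """Key for HISTOGRAM(numeric_exp, interval), assuming no offset."""
    if value is None:
        return None  # nulls get their own bucket
    return math.floor(value / interval) * interval


def year_bucket(day: Optional[date]) -> Optional[date]:
    """Key for HISTOGRAM(date_exp, INTERVAL 1 YEAR): January 1st of that year."""
    if day is None:
        return None
    return date(day.year, 1, 1)


def histogram(values: Iterable[Any], key: Callable[[Any], Any]) -> Dict[Any, int]:
    """Count how many values fall into each bucket (like GROUP BY h with COUNT(*))."""
    return dict(Counter(key(v) for v in values))
```

Flooring (rather than truncating) matters for negative values: -5 with an interval of 10 lands in the -10 bucket, not 0.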
ES SQL limitations
Because ES SQL does not fully match the functionality of ES DSL, the official documentation lists the following limitations:
Large queries may throw ParsingException
In the parsing phase, extremely large queries will take up too much memory. In this case, the Elasticsearch SQL engine will abort the parsing and throw an error.
Representation of nested type fields
SQL does not support nested objects as such; inner subfields can only be referenced in the form [nested_field_name].[sub_field_name].
Examples of use:
SELECT dep.dep_name.keyword FROM test_emp GROUP BY languages;
Nested fields cannot be used in scalar functions in WHERE and ORDER BY clauses.
For example, the following queries fail:
SELECT * FROM test_emp WHERE LENGTH(dep.dep_name.keyword) > 5;
SELECT * FROM test_emp ORDER BY YEAR(dep.start_date);
Querying multiple nested fields at the same time is not supported
For example, the nested fields nested_A and nested_B cannot be used at the same time.
Paging limitations with nested inner fields
When a paginated query involves nested fields, the paging results may be incorrect, because ES paginates on the root nested document rather than on its inner fields.
Keyword fields with a normalizer are not supported
Array fields are not supported
This is because a field in SQL holds exactly one value. For such fields, use the SQL To DSL translate API described above to generate the DSL, then query with DSL instead.
Limitations of aggregate sorting
The sort key must be the key used for the aggregation buckets. ES SQL works around this by sorting on the client side, but as a safeguard it allows at most 512 rows; beyond that, an exception is thrown during the sorting stage. It is therefore recommended to add a LIMIT clause, for example:
SELECT * FROM test GROUP BY age ORDER BY COUNT(*) LIMIT 100;
Ordering by an aggregate also does not support scalar functions or simple arithmetic operators: a complex expression computed after aggregation (for example, one that contains aggregate functions) cannot be used as a sort condition.
The following queries fail:
SELECT age, ROUND(AVG(salary)) AS avg FROM test GROUP BY age ORDER BY avg;
SELECT age, MAX(salary) - MIN(salary) AS diff FROM test GROUP BY age ORDER BY diff;
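One possible workaround, assuming the grouped result is small enough to post-process, is to return the raw aggregates and do the arithmetic and ordering in the application. For example, instead of ORDER BY diff, run SELECT age, MAX(salary) AS max_salary, MIN(salary) AS min_salary FROM test GROUP BY age and sort the rows afterwards. The Python helper order_by_spread and the column aliases are hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def order_by_spread(rows: List[Row], limit: Optional[int] = None) -> List[Row]:
    """Client-side stand-in for ORDER BY MAX(salary) - MIN(salary)."""
    # Compute the expression ES SQL refuses to sort on, without mutating the input rows.
    enriched = [dict(row, diff=row["max_salary"] - row["min_salary"]) for row in rows]
    enriched.sort(key=lambda row: row["diff"])
    return enriched if limit is None else enriched[:limit]
```

The trade-off is that every group must be fetched before sorting, so this only suits result sets that comfortably fit in memory.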
Limitations of subqueries
Subqueries that contain GROUP BY or HAVING, or that are more complex than SELECT X FROM (SELECT ...) WHERE [simple_condition], may fail.
TIME fields do not support GROUP BY or the HISTOGRAM function
For example, the following queries fail:
SELECT count(*) FROM test GROUP BY CAST(date_created AS TIME);
SELECT HISTOGRAM(CAST(birth_date AS TIME), INTERVAL '10' MINUTES) as h, COUNT(*) FROM t GROUP BY h;
However, wrapping the TIME field in a scalar function that returns a different data type is supported in GROUP BY, for example:
SELECT count(*) FROM test GROUP BY MINUTE(CAST(date_created AS TIME));
Restrictions on returned fields
If a field's value is not stored in _source, it cannot be queried. Fields of type keyword, date, scaled_float, geo_point and geo_shape are exempt from this restriction, because their values are returned from docvalue_fields rather than from _source.
You are welcome to follow the Java Way public account.