Query ElasticSearch: Use SQL instead of DSL

X-Pack in ES 7.x ships with Elasticsearch SQL, so we can run SQL queries directly through the SQL REST API, the SQL CLI, and other clients.

SQL REST API

Enter in Kibana Console:

POST /_sql?format=txt
{
  "query": "SELECT * FROM library ORDER BY page_count DESC LIMIT 5"
}

Replace the above SQL with your own SQL statement. The return format is as follows:

    author      |        name        |  page_count   | release_date
-----------------+--------------------+---------------+------------------------
Peter F. Hamilton|Pandora's Star      |768            |2004-03-02T00:00:00.000Z
Vernor Vinge     |A Fire Upon the Deep|613            |1992-06-01T00:00:00.000Z
Frank Herbert    |Dune                |604            |1965-06-01T00:00:00.000Z
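
Outside Kibana, the same call is an ordinary HTTP POST. The following is a minimal Python sketch that only builds the request; the address http://localhost:9200 and the helper name build_sql_request are assumptions made for illustration, and authentication is left out.

```python
import json
import urllib.request


def build_sql_request(query, base_url="http://localhost:9200", fmt="txt"):
    """Build (but do not send) a POST request for the SQL REST API.

    base_url is an assumed local cluster address; change it for your setup.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_sql?format={fmt}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(
    "SELECT * FROM library ORDER BY page_count DESC LIMIT 5"
)
print(request.get_method(), request.full_url)

# To run it against a live cluster:
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

Keeping the request construction separate from sending it makes the body easy to inspect before it reaches the cluster.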

SQL CLI

elasticsearch-sql-cli is a script in the bin directory of an ES installation; it can also be downloaded separately. From the ES installation directory, run:

./bin/elasticsearch-sql-cli https://some.server:9200

Then enter SQL at the prompt:

sql> SELECT * FROM library WHERE page_count > 500 ORDER BY page_count DESC;
     author      |        name        |  page_count   | release_date
-----------------+--------------------+---------------+---------------
Peter F. Hamilton|Pandora's Star      |768            |1078185600000
Vernor Vinge     |A Fire Upon the Deep|613            |707356800000
Frank Herbert    |Dune                |604            |-144720000000
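
In this CLI output release_date is shown as epoch milliseconds rather than as an ISO timestamp. A small Python sketch for converting such values is shown here; the helper name epoch_millis_to_iso is made up for this example. It adds a timedelta to the epoch so that negative values (dates before 1970) also work.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_iso(millis):
    """Convert epoch milliseconds (possibly negative) to an ISO 8601 UTC string."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


# The three release_date values from the CLI output above.
for raw in (1078185600000, 707356800000, -144720000000):
    print(raw, "->", epoch_millis_to_iso(raw))
```

The results match the release_date values returned by the REST API example (2004-03-02, 1992-06-01 and 1965-06-01).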

SQL To DSL

Type in Kibana:

POST /_sql/translate
{
  "query": "SELECT * FROM library ORDER BY page_count DESC",
  "fetch_size": 10
}

You can get the converted DSL query:

{
  "size": 10,
  "docvalue_fields": [
    {
      "field": "release_date",
      "format": "epoch_millis"
    }
  ],
  "_source": {
    "includes": [
      "author",
      "name",
      "page_count"
    ],
    "excludes": []
  },
  "sort": [
    {
      "page_count": {
        "order": "desc",
        "missing": "_first",
        "unmapped_type": "short"
      }
    }
  ]
}

Because the query body has already been generated, we can use it as is or adjust it as needed, and then run it as a normal DSL query.
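
As an illustration of adjusting the generated body, the Python sketch below copies the translated DSL and adds from/size paging before it would be sent to the search endpoint of the library index. The helper name second_page and the page size are assumptions for this example, not part of ES SQL.

```python
import copy
import json

# The DSL returned by POST /_sql/translate in the example above.
translated = {
    "size": 10,
    "docvalue_fields": [{"field": "release_date", "format": "epoch_millis"}],
    "_source": {"includes": ["author", "name", "page_count"], "excludes": []},
    "sort": [
        {"page_count": {"order": "desc", "missing": "_first", "unmapped_type": "short"}}
    ],
}


def second_page(dsl, page_size):
    """Return a copy of the DSL that fetches the second page of results."""
    adjusted = copy.deepcopy(dsl)  # leave the generated query untouched
    adjusted["size"] = page_size
    adjusted["from"] = page_size   # skip the first page
    return adjusted


page_two = second_page(translated, 5)
print(json.dumps(page_two, indent=2))  # body for POST /library/_search
```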

Next, we look in detail at the SQL statements that ES SQL supports and how to avoid misusing them.

First, understand how SQL terms map to ES terms: a table corresponds to an index, a row to a document, and a column to a field.

ES SQL syntax mostly follows the ANSI SQL standard. The supported statements include DML queries and some DDL queries.
DDL-style statements such as DESCRIBE table and SHOW COLUMNS IN table are of limited use, so we focus on the DML side: SELECT and functions.

SELECT

The grammatical structure is as follows:

SELECT [TOP [ count ] ] select_expr [, ...]
[ FROM table_name ]
[ WHERE condition ]
[ GROUP BY grouping_element [, ...] ]
[ HAVING condition]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count ] ]
[ PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) ]

SELECT retrieves rows from zero or more tables. The logical execution order is:

  1. Evaluate the FROM clause to determine the table.

  2. If there is a WHERE clause, discard every row that does not satisfy the condition.

  3. If there is a GROUP BY clause, group the rows and compute the aggregates; if there is a HAVING clause, filter the groups by its condition.

  4. Evaluate select_expr against the rows or groups from the previous step to produce the output columns.

  5. If there is an ORDER BY clause, sort the output rows.

  6. If there is a LIMIT or TOP clause, return only a subset of the sorted rows.
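
The six steps can be traced by hand on a tiny in-memory table. The Python sketch below is only an illustration of the logical order, not of how Elasticsearch executes the query. It uses the four library rows that appear in this article and a query chosen for the example: SELECT author, MAX(page_count) AS max_pages FROM library WHERE page_count > 400 GROUP BY author HAVING MAX(page_count) > 610 ORDER BY max_pages ASC LIMIT 1.

```python
from collections import defaultdict

library = [
    {"author": "Peter F. Hamilton", "name": "Pandora's Star", "page_count": 768},
    {"author": "Vernor Vinge", "name": "A Fire Upon the Deep", "page_count": 613},
    {"author": "Frank Herbert", "name": "Dune", "page_count": 604},
    {"author": "Frank Herbert", "name": "Dune Messiah", "page_count": 331},
]

# 1. FROM library
rows = list(library)

# 2. WHERE page_count > 400
rows = [r for r in rows if r["page_count"] > 400]

# 3. GROUP BY author ... HAVING MAX(page_count) > 610
groups = defaultdict(list)
for r in rows:
    groups[r["author"]].append(r)
groups = {
    author: members
    for author, members in groups.items()
    if max(m["page_count"] for m in members) > 610
}

# 4. SELECT author, MAX(page_count) AS max_pages
result = [
    {"author": author, "max_pages": max(m["page_count"] for m in members)}
    for author, members in groups.items()
]

# 5. ORDER BY max_pages ASC
result.sort(key=lambda r: r["max_pages"])

# 6. LIMIT 1
result = result[:1]

print(result)  # [{'author': 'Vernor Vinge', 'max_pages': 613}]
```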

ES SQL differs from commonly used SQL in two ways: it supports the TOP [ count ] and PIVOT ( aggregation_expr FOR column IN ( value [ [ AS ] alias ] [, ...] ) ) clauses.
TOP [ count ]: For example, SELECT TOP 2 first_name FROM emp returns at most two rows. TOP cannot be combined with LIMIT in the same query.
PIVOT: This clause turns the rows produced by its aggregation into columns for further processing. I have not used it myself, so it is not covered here.

FUNCTION

With the syntax above we can already write SQL that filters, aggregates, sorts, and pages. To write richer SQL with full-text search, aggregation, and grouping, we also need to know which functions ES SQL supports.
Use SHOW FUNCTIONS to list the supported function names and their types.

SHOW FUNCTIONS;

      name       |     type
-----------------+---------------
AVG              |AGGREGATE
COUNT            |AGGREGATE
FIRST            |AGGREGATE
FIRST_VALUE      |AGGREGATE
LAST             |AGGREGATE
LAST_VALUE       |AGGREGATE
MAX              |AGGREGATE
MIN              |AGGREGATE
SUM              |AGGREGATE
........

We mainly look at the common functions related to aggregation, grouping, and full-text search.

Full text matching function

MATCH: Equivalent to the match and multi_match queries in DSL.

MATCH(
    field_exp,          -- field name(s) to match against
    constant_exp        -- text to match
    [, options])        -- optional settings

Examples of use:

SELECT author, name FROM library WHERE MATCH(author, 'frank');

    author     |       name
---------------+-------------------
Frank Herbert  |Dune
Frank Herbert  |Dune Messiah

SELECT author, name, SCORE() FROM library WHERE MATCH('author^2,name^5', 'frank dune');

    author     |       name        |    SCORE()
---------------+-------------------+---------------
Frank Herbert  |Dune               |11.443176
Frank Herbert  |Dune Messiah       |9.446629
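
To make the correspondence with DSL concrete, the Python sketch below hand-writes the query clause that the two MATCH calls above roughly stand for: a single field becomes a match query, and a comma-separated field list with boosts becomes a multi_match query. The helper match_to_dsl is hypothetical, and the DSL that ES SQL really generates may carry extra default options; the SQL To DSL API shown earlier is the authoritative source.

```python
import json


def match_to_dsl(field_exp, constant_exp):
    """Hand-written approximation of the DSL clause behind MATCH(field_exp, constant_exp)."""
    fields = [f.strip() for f in field_exp.split(",")]
    if len(fields) == 1:
        # Single field: a plain match query.
        return {"match": {fields[0]: {"query": constant_exp}}}
    # Several fields (optionally boosted with ^n): a multi_match query.
    return {"multi_match": {"query": constant_exp, "fields": fields}}


single = match_to_dsl("author", "frank")
multi = match_to_dsl("author^2,name^5", "frank dune")
print(json.dumps({"query": single}))
print(json.dumps({"query": multi}))
```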

QUERY: Equivalent to the query_string query in DSL.

QUERY(
    constant_exp        -- query string expression
    [, options])        -- optional settings

Examples of use:

SELECT author, name, page_count, SCORE() FROM library WHERE QUERY('_exists_:"author" AND page_count:>200 AND (name:/star.*/ OR name:duna~)');

      author      |       name        |  page_count   |    SCORE()
------------------+-------------------+---------------+---------------
Frank Herbert     |Dune               |604            |3.7164764
Frank Herbert     |Dune Messiah       |331            |3.4169943

SCORE(): Returns the relevance score of each returned row with respect to the search condition.
Examples of use:

SELECT SCORE(), * FROM library WHERE MATCH(name, 'dune') ORDER BY SCORE() DESC;

    SCORE()    |    author     |       name        |  page_count   |    release_date
---------------+---------------+-------------------+---------------+--------------------
2.2886353      |Frank Herbert  |Dune               |604            |1965-06-01T00:00:00Z
1.8893257      |Frank Herbert  |Dune Messiah       |331            |1969-10-15T00:00:00Z

Aggregate function

AVG(numeric_field): Returns the average of the values of a numeric field.

SELECT AVG(salary) AS avg FROM emp;

COUNT(expression): Returns the number of input rows. COUNT(*) and COUNT(<literal>) count every row, including rows whose value is null.
COUNT(ALL field_name): Returns the number of input rows whose field_name value is not null.
COUNT(DISTINCT field_name): Returns the number of distinct non-null values of field_name.
SUM(field_name): Returns the sum of the values of the numeric field field_name.
MIN(field_name): Returns the minimum value of the numeric field field_name.
MAX(field_name): Returns the maximum value of the numeric field field_name.
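
The three COUNT variants differ only in how they treat null and duplicate values. The Python sketch below mimics them on a made-up salary column (the values are invented for illustration) in which None stands for null.

```python
# Hypothetical salary column; None plays the role of SQL null.
salaries = [100, None, 200, 100, None]

non_null = [s for s in salaries if s is not None]

count_star = len(salaries)           # COUNT(*): every row, nulls included
count_all = len(non_null)            # COUNT(ALL salary): non-null values only
count_distinct = len(set(non_null))  # COUNT(DISTINCT salary): distinct non-null values

total = sum(non_null)    # SUM(salary)
lowest = min(non_null)   # MIN(salary)
highest = max(non_null)  # MAX(salary)

print(count_star, count_all, count_distinct)  # 5 3 2
print(total, lowest, highest)                 # 400 100 200
```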

Grouping function

The grouping functions here correspond to bucket aggregations in DSL.

HISTOGRAM: The syntax is as follows:

HISTOGRAM(
           numeric_exp,        -- numeric expression, usually a field_name
           numeric_interval    -- numeric interval (bucket width)
)

HISTOGRAM(
           date_exp,              -- date/time expression, usually a field_name
           date_time_interval     -- date/time interval (bucket width)
)

The following query counts births per year; each bucket is keyed by midnight on January 1 of that year:

SELECT HISTOGRAM(birth_date, INTERVAL 1 YEAR) AS h, COUNT(*) AS c FROM emp GROUP BY h;

           h            |       c
------------------------+---------------
null                    |10
1952-01-01T00:00:00.000Z|8
1953-01-01T00:00:00.000Z|11
1954-01-01T00:00:00.000Z|8
1955-01-01T00:00:00.000Z|4
1956-01-01T00:00:00.000Z|5
1957-01-01T00:00:00.000Z|4
1958-01-01T00:00:00.000Z|7
1959-01-01T00:00:00.000Z|9
1960-01-01T00:00:00.000Z|8
1961-01-01T00:00:00.000Z|8
1962-01-01T00:00:00.000Z|6
1963-01-01T00:00:00.000Z|7
1964-01-01T00:00:00.000Z|4
1965-01-01T00:00:00.000Z|1
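
The bucketing behind this result can be imitated in a few lines of Python. The sketch below is an in-memory illustration only: the birth dates are invented, and year_bucket is a hypothetical helper that maps each date to January 1 of its year, with None (null) kept as its own bucket as in the output above.

```python
from collections import Counter
from datetime import date


def year_bucket(birth_date):
    """Map a date to the start of its year; keep None (null) as its own bucket."""
    return None if birth_date is None else date(birth_date.year, 1, 1)


# Made-up sample data for illustration.
birth_dates = [date(1953, 9, 2), date(1953, 4, 20), date(1952, 12, 24), None]

counts = Counter(year_bucket(d) for d in birth_dates)

# Print the null bucket first, then the years in ascending order.
for bucket in sorted(counts, key=lambda b: (b is not None, b or date.min)):
    print(bucket, "|", counts[bucket])
```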

ES SQL limitations

ES SQL does not cover everything that ES DSL can do. The official documentation lists the following SQL limitations:

Large queries may throw ParsingException

In the parsing phase, extremely large queries will take up too much memory. In this case, the Elasticsearch SQL engine will abort the parsing and throw an error.

Referencing nested fields

A nested field cannot be referenced as a whole in SQL. Only its sub-fields can be referenced, in the form

[nested_field_name].[sub_field_name]

Examples of use:

SELECT dep.dep_name.keyword FROM test_emp GROUP BY languages;

Nested fields cannot be used inside scalar functions in WHERE and ORDER BY

For example, the following statements are rejected:

SELECT * FROM test_emp WHERE LENGTH(dep.dep_name.keyword) > 5;

SELECT * FROM test_emp ORDER BY YEAR(dep.start_date);

Multiple nested fields cannot be queried at the same time

For example, the nested fields nested_A and nested_B cannot be used in the same query.

Paging limitation for nested inner fields

When a paged query selects nested fields, the paging results may be incorrect. This is because pagination in ES is applied to the root nested document, not to its inner fields.

keyword fields with a normalizer are not supported

Array fields are not supported

This is because a field in SQL holds exactly one value. In this case, we can use the SQL To DSL API described above to convert the statement into DSL and then query with DSL directly.

Limitations of aggregate sorting

  • The sort field must be one of the aggregation bucket keys. ES SQL works around this limitation with client-side sorting, but only for up to 512 rows; beyond that, an exception is thrown during the sorting stage. It is therefore recommended to add a LIMIT clause, such as:

SELECT * FROM test GROUP BY age ORDER BY COUNT(*) LIMIT 100;

  • Ordering by an aggregate works for plain aggregate functions only, not for scalar functions or operators applied on top of them. Complex columns that contain aggregate functions therefore cannot be used in ORDER BY.

The following queries fail for this reason:

SELECT age, ROUND(AVG(salary)) AS avg FROM test GROUP BY age ORDER BY avg;

SELECT age, MAX(salary) - MIN(salary) AS diff FROM test GROUP BY age ORDER BY diff;

Limitations of subqueries

If a subquery contains GROUP BY or HAVING, or is more complex than the structure SELECT X FROM (SELECT ...) WHERE [simple_condition], it may not be supported.

Fields of the TIME data type do not support GROUP BY or the HISTOGRAM function

For example, the following queries are invalid:

SELECT count(*) FROM test GROUP BY CAST(date_created AS TIME);

SELECT HISTOGRAM(CAST(birth_date AS TIME), INTERVAL '10' MINUTES) as h, COUNT(*) FROM t GROUP BY h

However, GROUP BY does work when the TIME value is wrapped in a scalar function, for example:

SELECT count(*) FROM test GROUP BY MINUTE(CAST(date_created AS TIME));

Restrictions on returned fields

A field that is not stored in _source cannot be returned by a query. Fields of type keyword, date, scaled_float, geo_point, and geo_shape are exempt from this restriction because they are returned from docvalue_fields rather than from _source.


Origin blog.csdn.net/hollis_chuang/article/details/108675333