How to Query Elasticsearch with SQL

Foreword

I originally meant to publish this post as the fifth or sixth in the series, since querying only makes sense after index creation, document indexing, and a few basic concepts have been introduced. Explaining queries in full once that groundwork was in place would have been best, but the company's projects have been busy lately and I have often worked overtime, which would have pushed this post past the end of the year. I was afraid that if I did not write it now it would keep slipping and eventually be dropped. So, with snow falling in Shanghai this weekend and the cold keeping me indoors, I sat down at the computer and typed out this post. Querying is the part I am responsible for at work, so I have studied it more and find it easier to write about. On to the main text.

Why Use SQL Queries

The previous article introduced Query DSL, the official query language of Elasticsearch. Because it is the official language, it is the most powerful option and the one that matches what ES supports. So why would we still use SQL queries? Isn't that superfluous?

There is a reason for it. SQL is a database query language with simple syntax that is easy to write, and most server-side programmers know it well. A newcomer to ES, even a veteran programmer, has to learn Query DSL before he can use the ES service the company has set up. That learning cost slows development and can hurt stability. If ES supports SQL queries, then even a colleague who started working in 2012 and does not understand the more complex ES concepts can use ES well and take part in the team's development. After all, who can't write SQL?

Elasticsearch-SQL

Now to formally introduce our protagonist: Elasticsearch-SQL. It is not an official Elasticsearch component; it is an open-source ES plugin from NLPChina (a Chinese natural language processing open-source organization). Its main function is querying ES with SQL. Under the hood it parses the SQL, converts it to DSL syntax, and then runs the query through the DSL.

Elasticsearch-SQL supports nearly all ES versions, including the recent 6.5.x, so the plugin is evidently still maintained quite actively.

Installing the Plugin

Because ES 2.x and 5.x differ (for details, see the version selection article), installing the plugin is slightly different on each.

Before 5.0, install with plugin install:

./bin/plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/2.4.6.0/elasticsearch-sql-2.4.6.0.zip

From 5.0 onward (including 6.x), install with elasticsearch-plugin install:

./bin/elasticsearch-plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/5.0.1/elasticsearch-sql-5.0.1.0.zip

If that installation fails, download the Elasticsearch-SQL plugin archive directly, extract it, rename the extracted folder to sql, and put it in the plugins directory under the ES installation path, for example: ..\elasticsearch-6.4.0\plugins\sql.

After this, restart the Elasticsearch server; otherwise requests to _sql fail with an error such as: Invalid index name [_sql], must not start with '_']; ","status":400}.

Visual Front-End Interface

The Elasticsearch-SQL plugin provides a visual interface that lets you execute SQL queries.

In Elasticsearch 1.x/2.x, you can open it directly at the following address:

http://localhost:9200/_plugin/sql/

In Elasticsearch 5.x/6.x, you need to install Node.js, download and unzip the site package, and then start the web front end like this:

cd site-server
npm install express --save
node node-server.js 

Query Syntax

Once the steps above have succeeded, you can query ES with SQL. Some of the syntax is standard SQL and some goes beyond it, acting as an enhancement to SQL. The ES query format is:

http://localhost:9200/_sql?sql=select * from indexName limit 10
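
Because the statement travels in the query string, its spaces and special characters should be percent-encoded when the request is built in code. The following is a minimal Java sketch, assuming Java 10 or later (for the URLEncoder overload that takes a Charset). The class name SqlUrlBuilder and its build method are hypothetical helpers for illustration, not part of the plugin.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds request URLs for the plugin's _sql endpoint (hypothetical helper). */
final class SqlUrlBuilder {

    private SqlUrlBuilder() {
    }

    /**
     * Returns baseUrl + "/_sql?sql=" + the percent-encoded statement.
     * URLEncoder produces form encoding, which writes a space as '+';
     * a literal '+' in the statement is already encoded as %2B, so every
     * remaining '+' is a space and can safely be rewritten as %20.
     */
    static String build(String baseUrl, String sql) {
        String encoded = URLEncoder.encode(sql, StandardCharsets.UTF_8).replace("+", "%20");
        return baseUrl + "/_sql?sql=" + encoded;
    }
}
```

For example, SqlUrlBuilder.build("http://localhost:9200", "select * from indexName limit 10") returns http://localhost:9200/_sql?sql=select%20*%20from%20indexName%20limit%2010.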

Simple Query

First, the simple query syntax:

SELECT fields from indexName WHERE conditions

As you can see, the place where the table name (tableName) used to go now holds the index name (indexName). If the index has a type, you can also write:

SELECT fields from indexName/type WHERE conditions

You can also query several types of an index at once, with the following syntax:

SELECT fields from indexName/type1,indexName/type2 WHERE conditions

To see how a SQL statement is translated into Elasticsearch Query DSL, use the explain endpoint:

http://localhost:9200/_sql/_explain?sql=select * from indexName limit 10

Aggregate Function Queries

select COUNT(*),SUM(age),MIN(age) as m, MAX(age),AVG(age)
  FROM bank GROUP BY gender ORDER BY SUM(age), m DESC
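
To make clear what the statement computes, here is a rough plain-Java equivalent that groups in memory instead of in ES: one statistics object per gender, holding the count, sum, minimum, maximum, and average of age. The Account class and any sample data are made up for illustration; only the field names gender and age come from the query, and the ORDER BY clause is not reproduced.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** In-memory illustration of GROUP BY gender with COUNT, SUM, MIN, MAX, and AVG over age. */
final class GenderAgeStats {

    /** Hypothetical stand-in for one document of the bank index. */
    static final class Account {
        final String gender;
        final int age;

        Account(String gender, int age) {
            this.gender = gender;
            this.age = age;
        }
    }

    private GenderAgeStats() {
    }

    /** Groups accounts by gender and summarizes age within each group. */
    static Map<String, IntSummaryStatistics> byGender(List<Account> accounts) {
        return accounts.stream().collect(Collectors.groupingBy(
                account -> account.gender,
                TreeMap::new,
                Collectors.summarizingInt(account -> account.age)));
    }
}
```

Each IntSummaryStatistics value exposes getCount(), getSum(), getMin(), getMax(), and getAverage(), which line up with the five aggregate functions in the SELECT list.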

Additional Query Enhancements

Search

 SELECT address FROM bank WHERE address = matchQuery('880 Holmes Lane') ORDER BY _score DESC LIMIT 3

Aggregations

  • Group ages into the ranges 20-25, 25-30, 30-35, and 35-40:
SELECT COUNT(age) FROM bank GROUP BY range(age, 20,25,30,35,40)
  • Group dates by day:
SELECT online FROM online GROUP BY date_histogram(field='insert_time','interval'='1d')
  • Group dates by custom ranges:
SELECT online FROM online GROUP BY date_range(field='insert_time','format'='yyyy-MM-dd' ,'2014-08-18','2014-08-17','now-8d','now-7d','now-6d','now')

Geo Queries

Elasticsearch can combine geolocation, full-text search, structured search, and analytics. Elasticsearch-SQL supports essentially all location-related queries; the corresponding chapter in the Elasticsearch documentation is Geolocation.

1. Geo bounding box filter

The geo bounding box filter (Geo Bounding Box Filter) takes the top, bottom, left, and right boundaries of a rectangle. The filter then simply checks whether a point's longitude lies between the left and right boundaries and whether its latitude lies between the top and bottom boundaries.

Syntax:

GEO_BOUNDING_BOX(fieldName,topLeftLongitude,topLeftLatitude,bottomRightLongitude,bottomRightLatitude)

Example:

SELECT * FROM location WHERE GEO_BOUNDING_BOX(center,100.0,1.0,101,0.0)
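
The rectangle test described above fits in a few lines of Java. This is only an illustration of the check, not the plugin's code. GeoBoundingBox is a hypothetical helper whose first four parameters follow the order of GEO_BOUNDING_BOX, and it assumes the box does not cross the 180° meridian.

```java
/** Rectangle containment test in the spirit of GEO_BOUNDING_BOX (hypothetical helper). */
final class GeoBoundingBox {

    private GeoBoundingBox() {
    }

    /**
     * Returns true when (lon, lat) lies inside the box spanned by its top-left
     * and bottom-right corners. Points on the boundary count as inside here.
     */
    static boolean contains(double topLeftLon, double topLeftLat,
                            double bottomRightLon, double bottomRightLat,
                            double lon, double lat) {
        boolean betweenLeftAndRight = lon >= topLeftLon && lon <= bottomRightLon;
        boolean betweenTopAndBottom = lat <= topLeftLat && lat >= bottomRightLat;
        return betweenLeftAndRight && betweenTopAndBottom;
    }
}
```

With the box from the example, GeoBoundingBox.contains(100.0, 1.0, 101, 0.0, 100.5, 0.5) is true, while a point at longitude 102 is rejected.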

2. Geo distance filter

The geo distance filter (geo_distance) draws a circle centered on a given position and finds the documents whose geographic coordinates fall within the specified distance.

Syntax:

GEO_DISTANCE(fieldName,distance,fromLongitude,fromLatitude)

Example:

SELECT * FROM location WHERE GEO_DISTANCE(center,'1km',100.5,0.5)

3. Geo distance range filter

The distance range filter (Range Distance filter) draws two circles of given radii around a given position and finds the points whose distance from that position lies between the minimum and the maximum. The only difference from the geo_distance filter is that the Range Distance filter is a ring: it excludes the documents that fall inside the inner circle.

Syntax:

GEO_DISTANCE_RANGE(fieldName,distanceFrom,distanceTo,fromLongitude,fromLatitude)

Example:

SELECT * FROM location WHERE GEO_DISTANCE_RANGE(center,'1m','1km',100.5,0.50001)
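
Both distance filters come down to computing the distance between two coordinates and comparing it with one or two limits. The sketch below uses the haversine formula on a spherical Earth with a mean radius of 6,371 km, so its results may differ slightly from what ES computes. GeoDistance is a hypothetical class, and treating both ring limits as inclusive is an assumption.

```java
/** Great-circle distance checks (hypothetical helper; not the plugin's code). */
final class GeoDistance {

    // Mean Earth radius in meters; an approximation for a spherical Earth.
    private static final double EARTH_RADIUS_METERS = 6_371_000.0;

    private GeoDistance() {
    }

    /** Haversine distance in meters between two (longitude, latitude) points given in degrees. */
    static double meters(double lon1, double lat1, double lon2, double lat2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    /** GEO_DISTANCE-style test: the point lies within maxMeters of the center. */
    static boolean within(double maxMeters, double centerLon, double centerLat,
                          double lon, double lat) {
        return meters(centerLon, centerLat, lon, lat) <= maxMeters;
    }

    /** GEO_DISTANCE_RANGE-style test: the point lies in the ring between the two radii. */
    static boolean inRing(double minMeters, double maxMeters, double centerLon, double centerLat,
                          double lon, double lat) {
        double distance = meters(centerLon, centerLat, lon, lat);
        return distance >= minMeters && distance <= maxMeters;
    }
}
```

The ring test rejects the center point itself because its distance of 0 is below the inner radius, which is exactly the part that the plain distance filter would still accept.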

4. Polygon filter (works on points)

Finds the points that fall inside a polygon. This filter is expensive; if you think you need it, it is worth looking at geo-shapes first.

Syntax:

GEO_POLYGON(fieldName,lon1,lat1,lon2,lat2,lon3,lat3,...)

Example:

SELECT * FROM location WHERE GEO_POLYGON(center,100,0,100.5,2,101.0,0)
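
One common way to decide whether a point falls inside a polygon is ray casting: count how many polygon edges a horizontal ray from the point crosses, and treat an odd count as inside. The sketch below uses that technique on planar coordinates, which is an approximation for small areas and not necessarily how ES implements the filter. GeoPolygon is a hypothetical helper that takes the vertices in the same flat lon, lat order as GEO_POLYGON.

```java
/** Point-in-polygon test by ray casting (hypothetical helper; planar approximation). */
final class GeoPolygon {

    private GeoPolygon() {
    }

    /**
     * coords holds the vertices as lon1, lat1, lon2, lat2, ... and the last
     * vertex is implicitly connected back to the first.
     */
    static boolean contains(double lon, double lat, double... coords) {
        int vertexCount = coords.length / 2;
        boolean inside = false;
        // Walk every edge (j -> i) and toggle on each crossing of the ray that
        // starts at the point and runs toward increasing longitude.
        for (int i = 0, j = vertexCount - 1; i < vertexCount; j = i++) {
            double xi = coords[2 * i];
            double yi = coords[2 * i + 1];
            double xj = coords[2 * j];
            double yj = coords[2 * j + 1];
            boolean straddlesLatitude = (yi > lat) != (yj > lat);
            if (straddlesLatitude && lon < (xj - xi) * (lat - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }
}
```

For the triangle in the example, GeoPolygon.contains(100.5, 1.0, 100.0, 0.0, 100.5, 2.0, 101.0, 0.0) is true, while the point (100.1, 1.0) lies to the left of the slanted edge and is rejected.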

5. GeoShape Intersects filter (works on geoshapes)

Describe the query shape in WKT (well-known text).

Syntax:

GEO_INTERSECTS(fieldName,'WKT')

Example:

SELECT * FROM location WHERE GEO_INTERSECTS(place,'POLYGON ((102 2, 103 2, 103 3, 102 3, 102 2))')

For more geo queries, refer to the Elasticsearch-SQL project documentation.

Practical Usage

As an example, we use the nba index created in the first tutorial of this series:

1. Query all team information in nba

http://localhost:9200/_sql?sql=select * from nba limit 10

2. Query the team whose top star is James

http://localhost:9200/_sql?sql=select * from nba where topStar = "勒布朗·詹姆斯"

3. Sort by founding date in descending order

http://localhost:9200/_sql?sql=select * from nba order by date desc

4. Query teams with at least five championships

http://localhost:9200/_sql?sql=select * from nba where championship >= 5

5. Count the teams whose number of championships falls in the ranges 1-5, 5-10, 10-15, and 15-20

http://localhost:9200/_sql?sql=SELECT COUNT(championship) FROM nba GROUP BY range(championship, 1,5,10,15,20)

There are of course many more ways to write queries than I can cover here; interested readers can set up a project of their own and try them out. More SQL features and syntax are described in the Elasticsearch-SQL project documentation.

Java Implementation

We have covered installing and using Elasticsearch-SQL, so how do we use it in a project? Elasticsearch-SQL itself is written in Java: it parses the SQL, converts it to DSL, runs the query, and returns the results parsed into a fixed key-value format.

Adding the Dependency

Before using it, we need to add the Maven dependency:

<dependency>
    <groupId>org.nlpcn</groupId>
    <artifactId>elasticsearch-sql</artifactId>
    <version>x.x.x.0</version>
</dependency>

The version number (x.x.x) must match the Elasticsearch version in use; the correspondence is roughly as illustrated in Figure 17:

Not every version is available from the Maven repository, however. Only a handful of versions can be pulled from it directly, and many are missing:

How do we solve the dependency jar problem if we are using one of the other ES versions? Remember the sql folder we extracted after downloading the plugin earlier? For example, the extracted folder of the 6.5.0 plugin contains the following:

It contains the jar we need. With the jar in hand, things are easy: we can add it to the project directly, although the best approach is to upload it to the company's private repository and then depend on it through the pom file.

Building the Project

With the jar problem solved, we can start development. Create a new Spring Boot project and add the dependencies. With everything ready, how do we connect to ES?

There are two ways to do it. One is JDBC, which connects to ES the same way we would connect to a database. The other is the transport client.

The JDBC Way

Sample code:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

import com.alibaba.druid.pool.DruidDataSource;
import com.alibaba.druid.pool.ElasticSearchDruidDataSourceFactory;

public void testJDBC() throws Exception {
    Properties properties = new Properties();
    // Transport addresses of the ES nodes (port 9300), followed by the index name
    properties.put("url", "jdbc:elasticsearch://192.168.3.31:9300,192.168.3.32:9300/" + TestsConstants.TEST_INDEX);
    DruidDataSource dds = (DruidDataSource) ElasticSearchDruidDataSourceFactory.createDataSource(properties);
    Connection connection = dds.getConnection();
    PreparedStatement ps = connection.prepareStatement("SELECT gender,lastname,age from " + TestsConstants.TEST_INDEX + " where lastname='Heath'");
    ResultSet resultSet = ps.executeQuery();
    // Print one line per matching document
    while (resultSet.next()) {
        System.out.println(resultSet.getString("lastname") + "," + resultSet.getInt("age") + "," + resultSet.getString("gender"));
    }
    // Release resources in reverse order of acquisition
    ps.close();
    connection.close();
    dds.close();
}

This approach is the most intuitive. It uses the Druid connection pool, so we also need to add the druid dependency to the project, and we need to pay attention to the dependency version, otherwise errors will occur.

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>druid</artifactId>
    <version>1.0.15</version>
</dependency>

This approach is easy to understand and easy to develop with, but when I applied it in the project I found it had many shortcomings, so in the end I read the source code myself and wrapped its API calls instead.

The API Way

Elasticsearch-SQL provides no developer documentation and does not describe how to call it through a Java API. We have to read the elasticsearch-sql source code to discover its services and then wrap them into what we need. Reading the source, we find the following important service class:

public class SearchDao {

	private static final Set<String> END_TABLE_MAP = new HashSet<>();

	static {
		END_TABLE_MAP.add("limit");
		END_TABLE_MAP.add("order");
		END_TABLE_MAP.add("where");
		END_TABLE_MAP.add("group");

	}

	private Client client = null;


	public SearchDao(Client client) {
		this.client = client;
	}

    public Client getClient() {
        return client;
    }

    /**
	 * Prepare action And transform sql
	 * into ES ActionRequest
	 * @param sql SQL query to execute.
	 * @return ES request
	 * @throws SqlParseException
	 */
	public QueryAction explain(String sql) throws SqlParseException, SQLFeatureNotSupportedException {
		return ESActionFactory.create(client, sql);
	}
}

The SearchDao class has an explain method. It takes a SQL string as its parameter and returns a QueryAction. QueryAction is an abstract class with the following subclasses:

As can be seen, each subclass corresponds to one kind of query: aggregation query, default query, delete, hash join query, join query, nested query, and so on.

The QueryAction we obtain can be passed to the executeAnyAction method of the QueryActionElasticExecutor class, which handles it internally and returns the corresponding execution result.

 public static Object executeAnyAction(Client client , QueryAction queryAction) throws SqlParseException, IOException {
        if(queryAction instanceof DefaultQueryAction)
            return executeSearchAction((DefaultQueryAction) queryAction);
        if(queryAction instanceof AggregationQueryAction)
            return executeAggregationAction((AggregationQueryAction) queryAction);
        if(queryAction instanceof ESJoinQueryAction)
            return executeJoinSearchAction(client, (ESJoinQueryAction) queryAction);
        if(queryAction instanceof MultiQueryAction)
            return executeMultiQueryAction(client, (MultiQueryAction) queryAction);
        if(queryAction instanceof DeleteQueryAction )
            return executeDeleteAction((DeleteQueryAction) queryAction);
        return null;
    }

We now have the result, but it is of type Object, so we still need to convert it into something usable. Notice the class ObjectResultsExtractor. Its constructor, shown below, takes three boolean parameters that control whether the result set includes the score, the type, and the ID. In the call further down we include the score and leave out the type and the ID.

public ObjectResultsExtractor(boolean includeScore, boolean includeType, boolean includeId) {
    this.includeScore = includeScore;
    this.includeType = includeType;
    this.includeId = includeId;
    this.currentLineIndex = 0;
}

ObjectResultsExtractor exposes only one public method, extractResults:

public ObjectResult extractResults(Object queryResult, boolean flat) throws ObjectResultsExtractException {
    if (queryResult instanceof SearchHits) {
        SearchHit[] hits = ((SearchHits) queryResult).getHits();
        List<Map<String, Object>> docsAsMap = new ArrayList<>();
        List<String> headers = createHeadersAndFillDocsMap(flat, hits, docsAsMap);
        List<List<Object>> lines = createLinesFromDocs(flat, docsAsMap, headers);
        return new ObjectResult(headers, lines);
    }
    if (queryResult instanceof Aggregations) {
        List<String> headers = new ArrayList<>();
        List<List<Object>> lines = new ArrayList<>();
        lines.add(new ArrayList<Object>());
        handleAggregations((Aggregations) queryResult, headers, lines);
        
        // remove empty line.
        if(lines.get(0).size() == 0) {
            lines.remove(0);
        }
        //todo: need to handle more options for aggregations:
        //Aggregations that inherit from base
        //ScriptedMetric

        return new ObjectResult(headers, lines);

    }
    return null;
}

We now have a general understanding of its query API. The following code is all we need in our project to complete the search function; the ObjectResult we end up with is the final query result set.

// 1. Parse the SQL into a QueryAction
SearchDao searchDao = new SearchDao(transportClient);
QueryAction queryAction = searchDao.explain(sql);
// 2. Execute the query
Object execution = QueryActionElasticExecutor.executeAnyAction(searchDao.getClient(), queryAction);
// 3. Format the query result
ObjectResult result = (new ObjectResultsExtractor(true, false, false)).extractResults(execution, true);
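
As the extractResults source above shows, an ObjectResult is built from a list of headers and a list of lines. To return rows as key-value pairs, the two can be zipped together. The sketch below works on plain lists so that it stays self-contained. ResultRows is a hypothetical helper, and how the headers and lines are read out of ObjectResult (for example through getters) is an assumption to check against the plugin version in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Turns a header list plus value lines into one map per row (hypothetical helper). */
final class ResultRows {

    private ResultRows() {
    }

    /**
     * Pairs the i-th header with the i-th value of every line.
     * LinkedHashMap keeps the columns in header order; values beyond the
     * number of headers are ignored.
     */
    static List<Map<String, Object>> toMaps(List<String> headers, List<List<Object>> lines) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (List<Object> line : lines) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 0; i < headers.size() && i < line.size(); i++) {
                row.put(headers.get(i), line.get(i));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

A list of maps like this serializes naturally to a JSON array of objects, which is convenient when the result is returned from a REST endpoint.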

With that, development is complete, so let's test it. We expose three endpoints: one queries through the API, one queries through JDBC, and one explains the SQL.

@RestController
@RequestMapping("/es/data")
public class ElasticSearchController {

    @Autowired
    private ElasticSearchSqlService elasticSearchSqlService;

    @PostMapping(value = "/search")
    public CommonResult search(@RequestBody QueryDto queryDto) {
        SearchResultDTO resultDTO = elasticSearchSqlService.search(queryDto.getSql());
        return CommonResult.success(resultDTO.getResult());
    }

    @PostMapping(value = "/query")
    public CommonResult query(@RequestBody QueryDto queryDto) {
        SearchResultDTO resultDTO = elasticSearchSqlService.query(queryDto.getSql(), queryDto.getIndex());
        return CommonResult.success(resultDTO.getResult());
    }

    @PostMapping(value = "/explain")
    public CommonResult explain(@RequestBody QueryDto queryDto) {
        return CommonResult.success(elasticSearchSqlService.explain(queryDto.getSql()));
    }

}

Summary

SQL is not the query language that ES officially recommends, but the ES team has also begun to recognize its convenience. ES versions from 6.3.0 onward support SQL as well, though through X-Pack. It can be used through REST, but bringing it into our own development is still somewhat problematic and requires a Platinum subscription. I do not know whether that restriction will be lifted in the future.

In addition, although SQL is more convenient to use, the plugin is not official, so it inevitably has functional gaps: it is not as powerful as DSL and has more pitfalls, though basic queries are supported. So unless you have no other choice, I still recommend DSL, with SQL as an aid for simple operations. The source code for this article has been uploaded to my GitHub; interested readers can follow me there.


WeChat official account: JaJian

Reproduced from: https://juejin.im/post/5cf229d86fb9a07ee27afd65

Origin blog.csdn.net/weixin_33674437/article/details/91428952