Python Elasticsearch DSL: query, filter, and aggregation examples

 

github.com/yongxinz/te…

Elasticsearch basic concepts

Index: The logical area where Elasticsearch stores data, similar to a database in a relational database system. An index can be spread over one or more shards, and each shard can have multiple replicas.

Document: The entity data stored in Elasticsearch, similar to a row of a table in a relational database.

A document consists of multiple fields. Fields with the same name in different documents must have the same type. A field may also be repeated within a document, that is, hold several values (multivalued).

Document type: An index can hold several kinds of documents, distinguished by their document type, which is similar to a table in a relational database. Note, however, that fields with the same name in different document types must have the same type. (Mapping types were deprecated in Elasticsearch 6.x and removed in later versions.)

Mapping: Similar to a schema definition in a relational database. The mapping stores information about the fields, and each document type has its own mapping.

Below is a comparison of some Elasticsearch and relational database terms:

Relational database      Elasticsearch
Database                 Index
Table                    Type
Row                      Document
Column                   Field
Schema                   Mapping
Index                    Everything is indexed
SQL                      Query DSL
SELECT * FROM table…     GET http://…
UPDATE table SET         PUT http://…
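
As a rough illustration of the last rows of the comparison, the sketch below pairs a SQL statement with a Query DSL body of similar intent. The `lastname` field and the `match_body` helper are assumptions made up for this example (only the `bank` index appears in this article), and a `match` query does full-text matching rather than strict SQL equality.

```python
import json


def match_body(field, value, size=10):
    """Build a Query DSL body for a simple match query (hypothetical helper)."""
    return {"query": {"match": {field: value}}, "size": size}


# SQL (for comparison): SELECT * FROM bank WHERE lastname = 'Holmes' LIMIT 10
# Query DSL: this body would be sent to the _search endpoint of the bank index.
body = match_body("lastname", "Holmes")
print(json.dumps(body, indent=2))
```

The body is plain JSON, which is why the Python client can accept it as an ordinary dict.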

Python Elasticsearch DSL Introduction

Connecting to Elasticsearch:

import elasticsearch

es = elasticsearch.Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

First, a look at search. q is the content to search for, and spaces in q do not affect the results. size specifies the number of results, from_ specifies the starting position, and filter_path restricts which data is returned; in this example the final result shows only _id and _type.

res_3 = es.search(index="bank", q="Holmes", size=1, from_=1)
res_4 = es.search(index="bank", q=" 39225    5686 ", size=1000, filter_path=['hits.hits._id', 'hits.hits._type'])
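
To show what a response trimmed by filter_path looks like on the client side, here is a small sketch that works on a hand-written sample. The sample values and the `extract_ids` helper are made up for illustration; the sketch also assumes that a response with no hits may come back without the `hits` key at all once it is filtered.

```python
# Hand-written sample of a response trimmed with
# filter_path=['hits.hits._id', 'hits.hits._type'] (values are made up).
sample = {
    "hits": {
        "hits": [
            {"_id": "1", "_type": "account"},
            {"_id": "6", "_type": "account"},
        ]
    }
}


def extract_ids(response):
    """Collect the _id of every hit, tolerating a response with no hits key."""
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


ids = extract_ids(sample)
empty = extract_ids({})
print(ids, empty)
```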

Querying all data in a specified index:

Here, index specifies the index. A string denotes a single index; a list denotes several indices, such as index=["bank", "banner", "country"]; a wildcard pattern denotes every matching index, for example index=["apple*"] denotes all indices whose names start with apple.
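
The wildcard form can be approximated locally with the standard library. This is only a sketch of the matching idea using `fnmatchcase` as a stand-in, not how Elasticsearch resolves index names internally; the index names and the `matching_indices` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_indices(patterns, existing):
    """Return the existing index names that match at least one pattern."""
    return [name for name in existing
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


existing = ["apple-2019", "apple-2020", "bank", "banner"]  # made-up names
selected = matching_indices(["apple*"], existing)
explicit = matching_indices(["bank", "banner"], existing)
print(selected, explicit)
```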

search can also be given a specific document type (doc_type).

from elasticsearch_dsl import Search

s = Search(using=es, index="index-test")
response = s.execute()
print(response.to_dict())

Querying by a field; several queries can be chained:

s = Search(using=es, index="index-test").query("match", sip="192.168.1.1")
s = s.query("match", dip="192.168.1.2")
response = s.execute()

Multi-field queries:

from elasticsearch_dsl.query import MultiMatch, Match

multi_match = MultiMatch(query='hello', fields=['title', 'content'])
s = Search(using=es, index="index-test").query(multi_match)
response = s.execute()

print(response.to_dict())

You can also use a Q() object for a multi-field query: fields is the list of fields, and query is the value to search for.

from elasticsearch_dsl import Q

q = Q("multi_match", query="hello", fields=['title', 'content'])
response = s.query(q).execute()

print(response.to_dict())

The first parameter of Q() is the query type, which can also be bool.

q = Q('bool', must=[Q('match', title='hello'), Q('match', content='world')])
response = s.query(q).execute()

print(response.to_dict())
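
The way the first argument of Q() names the query type can be pictured with plain dicts. The `q` function below is a hypothetical stand-in written for this sketch, not the library's Q; it only shows the JSON shape that such queries correspond to.

```python
import json


def q(name, **params):
    """Hypothetical stand-in for Q(): the first argument becomes the query type."""
    return {name: params}


multi = q("multi_match", query="hello", fields=["title", "content"])
both = q("bool", must=[q("match", title="hello"), q("match", content="world")])

print(json.dumps(multi))
print(json.dumps(both))
```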

Queries can also be combined with operators on Q() objects, which is another way to write the bool queries above.

q = Q("match", title='python') | Q("match", title='django')
print(q.to_dict())
# {"bool": {"should": [...]}}
response = s.query(q).execute()

q = Q("match", title='python') & Q("match", title='django')
print(q.to_dict())
# {"bool": {"must": [...]}}
response = s.query(q).execute()

q = ~Q("match", title="python")
print(q.to_dict())
# {"bool": {"must_not": [...]}}
response = s.query(q).execute()
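
The mapping from the three operators to bool clauses can be spelled out with plain dicts. The helper names below (`match`, `any_of`, `all_of`, `none_of`) are made up for this sketch and are not part of elasticsearch_dsl.

```python
def match(field, text):
    """A match query as a plain dict."""
    return {"match": {field: text}}


def any_of(*queries):
    """Mirrors a | b: at least one clause should match."""
    return {"bool": {"should": list(queries)}}


def all_of(*queries):
    """Mirrors a & b: every clause must match."""
    return {"bool": {"must": list(queries)}}


def none_of(*queries):
    """Mirrors ~a: the clause must not match."""
    return {"bool": {"must_not": list(queries)}}


either = any_of(match("title", "python"), match("title", "django"))
both = all_of(match("title", "python"), match("title", "django"))
negated = none_of(match("title", "python"))
print(either, both, negated, sep="\n")
```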

Filtering. The example here is a range filter: range is the filter method, timestamp is the name of the field to query, gte means greater than or equal, and lt means less than; set them as needed.

On the difference between term and match: term is an exact match, while match is fuzzy: it analyzes (tokenizes) the text and returns a relevance score. term does not analyze the query string, so on an analyzed text field, whose tokens are stored in lowercase, a term query containing uppercase letters returns nothing, that is, no hit. match analyzes the query string first, so it is effectively case-insensitive and returns the same result either way.
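
The case-sensitivity difference can be reproduced with a toy model. The `analyze` function below is a crude stand-in for an analyzer (lowercase, then split on non-alphanumeric characters), and the indexed text is made up; it illustrates the idea only and is not how Elasticsearch is implemented.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric tokens."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


# Tokens stored for an analyzed text field holding "Sherlock Holmes".
indexed_tokens = set(analyze("Sherlock Holmes"))


def term_hit(value):
    """term: the query value is compared with the stored tokens as-is."""
    return value in indexed_tokens


def match_hit(text):
    """match: the query text is analyzed first, then any token may hit."""
    return any(tok in indexed_tokens for tok in analyze(text))


print(term_hit("Holmes"), term_hit("holmes"))    # uppercase misses, lowercase hits
print(match_hit("Holmes"), match_hit("HOLMES"))  # both hit after analysis
```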

import time

# Range query
s = s.filter("range", timestamp={"gte": 0, "lt": time.time()}).query("match", country="in")
# Plain filter
res_3 = s.filter("terms", balance_num=["39225", "5686"]).execute()
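
For reference, the range filter combined with the match query corresponds roughly to the request body sketched below. The `range_filter` and `in_range` helpers are hypothetical; `in_range` only restates the bounds locally (gte inclusive, lt exclusive).

```python
import time


def range_filter(field, gte, lt):
    """Build a range clause as a plain dict."""
    return {"range": {field: {"gte": gte, "lt": lt}}}


def in_range(value, gte, lt):
    """Local check with the same bounds: gte is inclusive, lt is exclusive."""
    return gte <= value < lt


now = time.time()
body = {
    "query": {
        "bool": {
            "must": [{"match": {"country": "in"}}],          # scored query part
            "filter": [range_filter("timestamp", 0, now)],   # unscored filter part
        }
    }
}
print(body)
print(in_range(0, 0, now), in_range(now, 0, now))
```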

Other ways to write filters:

s = Search()
s = s.filter('terms', tags=['search', 'python'])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

# Equivalent, written as an explicit bool query
s = Search()
s = s.query('bool', filter=[Q('terms', tags=['search', 'python'])])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}

# Exclude the matching documents instead
s = Search()
s = s.exclude('terms', tags=['search', 'python'])
# or
s = Search()
s = s.query('bool', filter=[~Q('terms', tags=['search', 'python'])])
print(s.to_dict())
# {'query': {'bool': {'filter': [{'bool': {'must_not': [{'terms': {'tags': ['search', 'python']}}]}}]}}}

Aggregations can be chained after query, filter, and similar operations; they are added through aggs.

bucket is the grouping: the first parameter is the name of the group, which you choose yourself, the second is the aggregation method, and the third specifies the field.

metric works the same way. The metric methods include sum, avg, max, min, and so on. Note that two methods return all of these values at once: stats and extended_stats; the latter also returns the variance and similar values.
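
To make the difference between the two concrete, the sketch below computes the same kind of values locally for a small list. The helper functions and the sample numbers are made up, and the sketch assumes the variance is the population variance.

```python
from statistics import pvariance


def stats(values):
    """Values of the kind a stats metric reports: count, min, max, avg, sum."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def extended_stats(values):
    """stats plus sum of squares, variance and standard deviation."""
    out = stats(values)
    out["sum_of_squares"] = sum(v * v for v in values)
    out["variance"] = pvariance(values)  # population variance (assumption)
    out["std_deviation"] = out["variance"] ** 0.5
    return out


clicks = [2, 4, 4, 6]  # made-up values
basic = stats(clicks)
extended = extended_stats(clicks)
print(basic)
print(extended)
```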

from elasticsearch_dsl import A

# Example 1
s.aggs.bucket("per_country", "terms", field="timestamp").metric("sum_click", "stats", field="click").metric("sum_request", "stats", field="request")

# Example 2
s.aggs.bucket("per_age", "terms", field="click.keyword").metric("sum_click", "stats", field="click")

# Example 3
s.aggs.metric("sum_age", "extended_stats", field="impression")

# Example 4
s.aggs.bucket("per_age", "terms", field="country.keyword")

# Example 5: this aggregation groups by ranges
a = A("range", field="account_number", ranges=[{"to": 10}, {"from": 11, "to": 21}])

res = s.execute()
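
How the ranges in Example 5 split the values can be checked with a small local sketch. The `bucket_counts` helper and the account numbers are made up; the sketch treats from as inclusive and to as exclusive, which is how the range aggregation defines its buckets.

```python
def bucket_counts(values, ranges):
    """Count the values in each range; 'from' is inclusive, 'to' is exclusive."""
    counts = []
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        counts.append(sum(1 for v in values if low <= v < high))
    return counts


ranges = [{"to": 10}, {"from": 11, "to": 21}]
account_numbers = [1, 9, 10, 11, 20, 21]  # made-up values
counts = bucket_counts(account_numbers, ranges)
print(counts)
```

With these bounds, the value 10 falls into neither bucket, because the first range stops before 10 and the second starts at 11.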

Finally, execute() still has to be called. Note that the s.aggs operations modify the search in place, so their result must not be assigned to a variable (for example, res = s.aggs is wrong). After execute(), the aggregation results are available in res (under res.aggregations).

Sorting

s = Search().sort(
    'category',
    '-title',
    {"lines" : {"order" : "asc", "mode" : "avg"}}
)
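
The string shorthand used in sort can be expanded by hand. The `normalize_sort` helper below is a hypothetical sketch of that convention (a leading minus sign means descending order), not the library's own code.

```python
def normalize_sort(*keys):
    """Expand '-field' into an explicit descending sort clause."""
    clauses = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            clauses.append({key[1:]: {"order": "desc"}})
        else:
            clauses.append(key)  # plain field names and dicts pass through
    return clauses


sort_spec = normalize_sort(
    "category",
    "-title",
    {"lines": {"order": "asc", "mode": "avg"}},
)
print(sort_spec)
```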

Paging

s = s[10:20]
# {"from": 10, "size": 10}
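
The relationship between a slice and the from/size pair is simple arithmetic. Both helpers below are hypothetical and assume a slice with an explicit stop and no step.

```python
def slice_to_paging(sl):
    """Translate a slice such as [10:20] into from/size values."""
    start = sl.start or 0
    return {"from": start, "size": sl.stop - start}


def page_slice(page, per_page):
    """Slice for a 1-based page number, e.g. page 2 of 10 items is [10:20]."""
    start = (page - 1) * per_page
    return slice(start, start + per_page)


paging = slice_to_paging(slice(10, 20))
second_page = page_slice(2, 10)
print(paging, second_page)
```

A slice built this way can then be applied as s[second_page].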

Some additional methods, for interested readers:

s = Search()

# Set extra attributes with the `.extra()` method
s = s.extra(explain=True)

# Set query parameters with `.params()`
# (search_type="count" only applies to old Elasticsearch versions; it was removed in 5.0)
s = s.params(search_type="count")

# To limit the returned fields, use the `source()` method
# Only return the selected fields
s = s.source(['title', 'body'])
# Do not return any fields, just the metadata
s = s.source(False)
# Explicitly include/exclude fields
s = s.source(includes=["title"], excludes=["user.*"])
# Reset the field selection
s = s.source(None)

# Build a Search from a dict
s = Search.from_dict({"query": {"match": {"title": "python"}}})

# Modify an existing query
s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})

Origin www.cnblogs.com/lianhaifeng/p/11875835.html