Elasticsearch basic concepts
Index: the logical area in which Elasticsearch stores data, similar to a database in a relational database system. An index can be spread across one or more shards, and each shard can have multiple replicas.
Document: the unit of data stored in Elasticsearch, similar to a row in a relational table.
A document is made up of multiple fields. Fields with the same name in different documents must have the same type. A field may be repeated within a document, i.e. it may hold several values (multivalued).
Document type: to serve different query needs, an index can hold several kinds of documents, i.e. document types, which are similar to tables in a relational database. Note, however, that fields with the same name in different document types must still have the same type.
Mapping: similar to a schema definition in a relational database. A mapping stores information about the fields, and different document types have different mappings.
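To make the concepts above more concrete, here is a minimal sketch in plain Python. The field names, the sample document and the helper check_document are all made up for illustration. The mapping body assumes the typeless layout used by recent Elasticsearch versions, and the type check is a toy, not how Elasticsearch itself validates documents.

```python
# Hypothetical mapping for an index: each field name is bound to one type.
mapping = {
    "mappings": {
        "properties": {
            "account_number": {"type": "integer"},
            "owner": {"type": "text"},
            "tags": {"type": "keyword"},
        }
    }
}

# A document is a set of fields; "tags" holds several values (multivalued).
document = {"account_number": 42, "owner": "Holmes", "tags": ["vip", "london"]}

# Toy mapping from Elasticsearch type names to Python types (illustration only).
PY_TYPES = {"integer": int, "text": str, "keyword": str}


def check_document(doc, mapping):
    """Return the names of fields whose values do not fit the mapped type."""
    props = mapping["mappings"]["properties"]
    bad = []
    for name, value in doc.items():
        expected = PY_TYPES[props[name]["type"]]
        # A multivalued field is a list; every element must have the same type.
        values = value if isinstance(value, list) else [value]
        if not all(isinstance(v, expected) for v in values):
            bad.append(name)
    return bad


print(check_document(document, mapping))
```

The point of the sketch is only that a field name is tied to a single type, while one field may still carry several values of that type.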
Below is a comparison of some Elasticsearch terms with their relational database counterparts:
Relational database | Elasticsearch |
---|---|
Database | Index |
Table | Type |
Row | Document |
Column | Field |
Schema | Mapping |
Index | Everything is indexed |
SQL | Query DSL |
SELECT * FROM table… | GET http://… |
UPDATE table SET | PUT http://… |
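As a rough illustration of the last rows of the table, the sketch below places one SQL statement next to a roughly comparable request body in Query DSL. The index name bank matches the examples later in this post, but the field lastname is an assumption chosen for the example. The body is only built and serialized here, not sent to a server.

```python
import json

# SQL form of the request (kept as a string, for comparison only).
sql = "SELECT * FROM bank WHERE lastname = 'Holmes' LIMIT 10"

# A roughly comparable Query DSL body, which would be sent to /bank/_search.
# Note: a match query analyzes its input, so it is not strict SQL equality.
query_body = {
    "query": {"match": {"lastname": "Holmes"}},
    "size": 10,
}

print(json.dumps(query_body, indent=2))
```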
Python Elasticsearch DSL Introduction
Connecting to Elasticsearch:
    import elasticsearch
    es = elasticsearch.Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
First, a look at search. q is the content to search for; surrounding spaces do not affect the results. size sets the number of results to return, from_ sets the starting offset, and filter_path selects which parts of the response are shown. In the second call below, the final result contains only _id and _type.
    res_3 = es.search(index="bank", q="Holmes", size=1, from_=1)
    res_4 = es.search(index="bank", q=" 39225 5686 ", size=1000, filter_path=['hits.hits._id', 'hits.hits._type'])
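To show what a filter_path such as hits.hits._id does to a response, here is a toy re-implementation in plain Python. The response dict is made up and only imitates the general shape of a search result, and the helpers filter_by_path and merge are hypothetical names. The real filtering happens on the server and supports more syntax (wildcards, exclusions) than this sketch.

```python
def filter_by_path(data, path):
    """Keep only the parts of a nested dict/list reachable through a dotted path."""
    head, _, rest = path.partition(".")
    if isinstance(data, list):
        # Apply the same path to every element of a list (e.g. hits.hits).
        return [filter_by_path(item, path) for item in data]
    if head not in data:
        return {}
    if not rest:
        return {head: data[head]}
    return {head: filter_by_path(data[head], rest)}


def merge(a, b):
    """Merge two filtered results that came from the same source structure."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = merge(a[key], value) if key in a else value
        return out
    if isinstance(a, list) and isinstance(b, list):
        return [merge(x, y) for x, y in zip(a, b)]
    return b


# Made-up response in the general shape of a search result.
response = {
    "took": 3,
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "1", "_type": "account", "_source": {"lastname": "Holmes"}},
            {"_id": "2", "_type": "account", "_source": {"lastname": "Watson"}},
        ],
    },
}

filtered = {}
for p in ["hits.hits._id", "hits.hits._type"]:
    filtered = merge(filtered, filter_by_path(response, p))
print(filtered)
```

Everything outside the listed paths (took, total, _source) is dropped, which is why the filtered result is much smaller than the full response.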
Querying all data in a specified index:
Here, index specifies the index to search. A string denotes a single index; a list denotes several indices, such as index=["bank", "banner", "country"]; a wildcard pattern denotes every index that matches it, e.g. index=["apple*"] stands for all indices whose names begin with apple. A specific doc-type can also be given in search.
    from elasticsearch_dsl import Search
    s = Search(using=es, index="index-test")
    response = s.execute()
    print(response.to_dict())
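The three forms of the index argument described above can be mimicked with the standard library. In this sketch the list of existing index names and the helper resolve are hypothetical; in practice the expansion of patterns is done by Elasticsearch itself, not by the client.

```python
from fnmatch import fnmatch

# Hypothetical index names present in a cluster.
existing = ["bank", "banner", "country", "apple-2023", "apple-2024"]


def resolve(index, existing):
    """Expand a string or a list of names/wildcard patterns to matching indices."""
    patterns = [index] if isinstance(index, str) else list(index)
    return [name for name in existing if any(fnmatch(name, p) for p in patterns)]


print(resolve("bank", existing))                          # single index
print(resolve(["bank", "banner", "country"], existing))   # list of indices
print(resolve(["apple*"], existing))                      # wildcard pattern
```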
Querying by a field; multiple queries can be chained:
    s = Search(using=es, index="index-test").query("match", sip="192.168.1.1")
    s = s.query("match", dip="192.168.1.2")
    response = s.execute()
Multi-field queries:
    from elasticsearch_dsl.query import MultiMatch
    multi_match = MultiMatch(query='hello', fields=['title', 'content'])
    s = Search(using=es, index="index-test").query(multi_match)
    response = s.execute()
    print(response.to_dict())
You can also use a Q() object for a multi-field query. fields is the list of fields to search, and query is the value to search for.
    from elasticsearch_dsl import Q
    q = Q("multi_match", query="hello", fields=['title', 'content'])
    # s is a Search object, e.g. Search(using=es, index="index-test")
    response = s.query(q).execute()
    print(response.to_dict())
The first parameter of Q() is the query type, which can also be bool.
    q = Q('bool', must=[Q('match', title='hello'), Q('match', content='world')])
    response = s.query(q).execute()
    print(response.to_dict())
Q() objects can also be combined with operators, which is another way of writing the kind of bool query shown above.
    # OR: either clause may match
    q = Q("match", title='python') | Q("match", title='django')
    print(q.to_dict())  # {"bool": {"should": [...]}}
    # AND: both clauses must match
    q = Q("match", title='python') & Q("match", title='django')
    print(q.to_dict())  # {"bool": {"must": [...]}}
    # NOT: the clause must not match
    q = ~Q("match", title="python")
    print(q.to_dict())  # {"bool": {"must_not": [...]}}
    # Any of the combined queries can then be run on a Search object
    response = s.query(q).execute()
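The mapping from operators to bool clauses can be sketched with ordinary operator overloading. The Query class and the match helper below are toy stand-ins written for this post, not the elasticsearch_dsl implementation (which, for instance, also merges nested bool queries). They only illustrate which operator ends up in which clause.

```python
class Query:
    """Toy stand-in for a query object; not the elasticsearch_dsl implementation."""

    def __init__(self, body):
        self.body = body

    def __or__(self, other):
        # a | b  ->  at least one of the clauses should match
        return Query({"bool": {"should": [self.body, other.body]}})

    def __and__(self, other):
        # a & b  ->  both clauses must match
        return Query({"bool": {"must": [self.body, other.body]}})

    def __invert__(self):
        # ~a  ->  the clause must not match
        return Query({"bool": {"must_not": [self.body]}})

    def to_dict(self):
        return self.body


def match(**field):
    """Build a toy match query from a single field=value keyword argument."""
    return Query({"match": field})


python = match(title="python")
django = match(title="django")

either = python | django
both = python & django
negated = ~python

print(either.to_dict())
print(both.to_dict())
print(negated.to_dict())
```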
Filtering. The example here is a range filter: range is the method, timestamp is the name of the field to query, gte means greater than or equal to, and lt means less than; set them as needed.
On the difference between term and match: term is an exact match, whereas match is looser: it analyzes (tokenizes) the query text and returns a relevance score. Because a term query is not analyzed, against an analyzed text field it effectively accepts only lowercase strings; a query string containing uppercase letters returns nothing, i.e. no hit. match is not case sensitive, and the result is the same either way.
    import time
    # Range query
    s = s.filter("range", timestamp={"gte": 0, "lt": time.time()}).query("match", country="in")
    # Ordinary filter
    res_3 = s.filter("terms", balance_num=["39225", "5686"]).execute()
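The case-sensitivity difference between term and match follows from analysis, which a toy inverted index can illustrate. Everything in this sketch is an assumption made for the example: the two documents, the simplified analyzer (split on non-alphanumerics, then lowercase) and the function names. Real analyzers and scoring are considerably richer.

```python
import re


def analyze(text):
    """Toy analyzer: split on non-alphanumerics and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


# Made-up documents; the inverted index maps each analyzed token to doc ids.
docs = {1: "Sherlock Holmes", 2: "John Watson"}
inverted = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        inverted.setdefault(token, set()).add(doc_id)


def term_query(value):
    """Look the value up exactly as given, without analysis."""
    return inverted.get(value, set())


def match_query(text):
    """Analyze the query text first, then look up each resulting token."""
    hits = set()
    for token in analyze(text):
        hits |= inverted.get(token, set())
    return hits


print(term_query("Holmes"))   # uppercase H is not in the index
print(term_query("holmes"))   # the lowercased token is
print(match_query("Holmes"))  # analyzed first, so case does not matter
```

Since the index only ever contains lowercased tokens, the unanalyzed term lookup misses "Holmes" while the analyzed match lookup finds it.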
Other ways of writing filters:
    from elasticsearch_dsl import Search, Q
    # filter() shortcut
    s = Search().filter('terms', tags=['search', 'python'])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}
    # The same filter written as an explicit bool query
    s = Search().query('bool', filter=[Q('terms', tags=['search', 'python'])])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'terms': {'tags': ['search', 'python']}}]}}}
    # exclude() negates the filter
    s = Search().exclude('terms', tags=['search', 'python'])
    # or
    s = Search().query('bool', filter=[~Q('terms', tags=['search', 'python'])])
    print(s.to_dict())
    # {'query': {'bool': {'filter': [{'bool': {'must_not': [{'terms': {'tags': ['search', 'python']}}]}}]}}}
Aggregations can be chained after a query or a filter, and are defined through aggs.
bucket performs the grouping: the first parameter is the name of the group, which you choose yourself, the second is the aggregation method, and the third specifies the field.
metric works the same way. The metric methods include sum, avg, max, min, and so on. Note that two methods return all of these values in one go: stats and extended_stats, and the latter additionally returns values such as the variance.
    from elasticsearch_dsl import A
    # Example 1
    s.aggs.bucket("per_country", "terms", field="timestamp").metric("sum_click", "stats", field="click").metric("sum_request", "stats", field="request")
    # Example 2
    s.aggs.bucket("per_age", "terms", field="click.keyword").metric("sum_click", "stats", field="click")
    # Example 3
    s.aggs.metric("sum_age", "extended_stats", field="impression")
    # Example 4
    s.aggs.bucket("per_age", "terms", field="country.keyword")
    # Example 5: this aggregation groups by range
    # (an A object is attached with s.aggs.bucket(<name>, a) before executing)
    a = A("range", field="account_number", ranges=[{"to": 10}, {"from": 11, "to": 21}])
    res = s.execute()
Finally, execute() still has to be called. Note that the s.aggs call must not be captured in a variable in the hope of getting results (for example, res = s.aggs... is wrong): aggregations are added to s in place, and the aggregation results are stored in res, the value returned by execute(), under res.aggregations.
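What a terms bucket combined with a stats metric computes can be reproduced on a small list in plain Python. The three documents and the helper names below are made up for illustration, and the sketch ignores everything a real aggregation does beyond grouping and summarizing (shard merging, bucket ordering, size limits).

```python
from collections import defaultdict

# Made-up documents for illustration.
docs = [
    {"country": "in", "click": 3},
    {"country": "in", "click": 5},
    {"country": "us", "click": 4},
]


def stats(values):
    """Return the summary values that a stats-style metric reports."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


def terms_bucket(docs, field):
    """Group documents by the value of one field, like a terms bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return buckets


# Bucket by country, then compute a stats metric on click inside each bucket.
per_country = {
    key: stats([d["click"] for d in group])
    for key, group in terms_bucket(docs, "country").items()
}
print(per_country)
```

This is also why stats is convenient: one metric yields count, min, max, avg and sum together instead of requiring five separate metrics.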
Sorting
    s = Search().sort(
        'category',
        '-title',
        {"lines": {"order": "asc", "mode": "avg"}}
    )
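The three kinds of sort key used above can be read as follows: a plain field name, a field name with a leading minus sign for descending order, and a full dict passed through unchanged. The sketch below expresses that reading in plain Python. normalize_sort is a hypothetical helper written for this post, assuming that a leading - means descending order.

```python
def normalize_sort(*keys):
    """Turn sort keys into Query DSL sort entries ('-field' means descending)."""
    result = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # Strip the minus sign and spell out the descending order.
            result.append({key[1:]: {"order": "desc"}})
        else:
            # Plain field names and explicit dicts are kept as they are.
            result.append(key)
    return result


sort_spec = normalize_sort(
    "category", "-title", {"lines": {"order": "asc", "mode": "avg"}}
)
print(sort_spec)
```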
Paging
s = s[10:20] # {"from": 10, "size": 10}
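The slice in the line above corresponds to a from/size pair, and the arithmetic is easy to get wrong by one page. Here is a small sketch of the conversion in plain Python; slice_to_paging and page are hypothetical helpers, not part of any library.

```python
def slice_to_paging(s):
    """Convert a Python slice into the from/size pair used for paging."""
    start = s.start or 0
    if s.stop is None:
        raise ValueError("an open-ended slice has no fixed page size")
    return {"from": start, "size": max(0, s.stop - start)}


def page(number, per_page):
    """Return the slice covering a 1-based page number."""
    start = (number - 1) * per_page
    return slice(start, start + per_page)


print(slice_to_paging(slice(10, 20)))
print(slice_to_paging(page(3, 25)))
```

With these helpers, the third page of 25 results per page starts at offset 50, so it would be requested as s[50:75].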
Some additional methods, for interested readers:
    s = Search()
    # Set extra attributes with the `.extra()` method
    s = s.extra(explain=True)
    # Set request parameters with `.params()`
    s = s.params(search_type="count")
    # To limit the returned fields, use the `source()` method
    # only return the selected fields
    s = s.source(['title', 'body'])
    # don't return any fields, just the metadata
    s = s.source(False)
    # explicitly include/exclude fields (older versions spell these include/exclude)
    s = s.source(includes=["title"], excludes=["user.*"])
    # reset the field selection
    s = s.source(None)
    # Build a query from a dict
    s = Search.from_dict({"query": {"match": {"title": "python"}}})
    # Modify an existing query
    s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})