Python Elasticsearch 8.2.0

One, Elasticsearch

1. Concepts and terminology

A very powerful search engine that makes data easy to store and retrieve: it can quickly store, search and analyze massive amounts of data. Wikipedia, Stack Overflow and GitHub have all used it for search
A distributed, real-time document store in which every field can be indexed and searched
A distributed, real-time analytics and search engine
It can scale out to hundreds of server nodes and supports petabyte-scale structured or unstructured data
It works as a distributed database: multiple servers cooperate, and each server can run multiple Elasticsearch instances
Node: a single Elasticsearch instance
Cluster: a set of nodes forms a cluster
Index: Elasticsearch indexes every field and, after processing, writes the result into an inverted index. An index is roughly equivalent to a database in MongoDB/MySQL. Every index (database) name must be lowercase
Document: a single record in an index is called a document, and many documents make up an index. Documents in the same index may have different structures, but this is not recommended
Type: a virtual, logical grouping of documents used to filter them, similar to a collection in MongoDB or a table in MySQL
Field: each document resembles a JSON structure containing many fields; each field has a value, and multiple fields make up a document
Summary: Elasticsearch: index > type > document > field
ES 8.x has removed types completely! ES is document-oriented, and everything in ES is JSON. ELK is the acronym formed from the initials of three open-source frameworks: Elasticsearch, Logstash and Kibana
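
To make the inverted-index idea above more concrete, here is a toy sketch in plain Python. It only illustrates the concept (token -> ids of the documents that contain it); it is not how Elasticsearch or Lucene implement it, and the function name and the whitespace tokenization are assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted ids of the documents containing it.

    `docs` is a dict of {doc_id: text}. Tokenization is a naive whitespace
    split, chosen only to keep the illustration short.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents
docs = {1: "hello Saturday", 2: "hello Sunday"}
inverted = build_inverted_index(docs)
print(inverted["hello"])   # ids of every document containing "hello"
print(inverted["sunday"])  # ids of every document containing "sunday"
```

A search for a word then becomes a dictionary lookup instead of a scan over every document, which is the reason the structure is used for full-text search.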

2. Elasticsearch installation

Elasticsearch is built on Lucene and needs a Java JDK to run, so install a Java environment first (see item three of the document directory, JDK 1.8 installation). Then run java -version in a cmd window; if it prints the version information, the Java environment is installed correctly. Note that recent Elasticsearch releases, including 8.x, ship with a bundled JDK, so this step may not be strictly required
Download Elasticsearch from the download address, then unzip it
Then go into the config directory and edit two files: elasticsearch.yml (change part of the configuration) and jvm.options (change the ES memory size by adding two lines like -Xms1g)

Changes to elasticsearch.yml:

# Change the following in elasticsearch.yml
cluster.name: mysy-es
network.host: localhost
http.port: 9200
# Whether security (SSL) is enabled; unless this is changed to false, the port cannot be reached over plain HTTP
xpack.security.enabled: false

In jvm.options, set the ES memory size by adding two lines like -Xms1g (the two lines are normally the initial and maximum heap size, -Xms1g and -Xmx1g)
Then go into the bin directory and double-click elasticsearch.bat, and wait a while for it to load. Do not close the cmd window, otherwise ES can no longer be reached, unless you have configured the service to start automatically
Then open http://localhost:9200/; if the node information page appears, the installation succeeded
Set the ES_HOME environment variable
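
The browser check against http://localhost:9200/ can also be done from Python. The following is a minimal sketch using only the standard library; the function name node_summary and the opener parameter are assumptions introduced for this example (the parameter exists so the function can be exercised without a running node).

```python
import json
from urllib.request import urlopen


def node_summary(url="http://localhost:9200/", opener=urlopen):
    """Fetch the root endpoint and return (cluster_name, version number).

    Assumes security is disabled as in the elasticsearch.yml above, so the
    endpoint answers over plain HTTP without credentials.
    """
    with opener(url, timeout=5) as response:
        info = json.loads(response.read().decode("utf-8"))
    return info.get("cluster_name"), info.get("version", {}).get("number")


# Usage, with Elasticsearch running locally:
#     try:
#         print(node_summary())
#     except OSError as exc:   # URLError is a subclass of OSError
#         print("Elasticsearch is not reachable:", exc)
```

With the configuration shown earlier, the returned cluster name should be the cluster.name value from elasticsearch.yml.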

3. Installing the IK word segmentation plugin

For Chinese text, Elasticsearch's search needs a word segmentation plugin, elasticsearch-analysis-ik. The plugin version must match the Elasticsearch version
Download the matching package from https://github.com/medcl/elasticsearch-analysis-ik/releases
Unzip the downloaded archive into the plugins directory of Elasticsearch and rename the folder to ik (renaming appears to be optional)
Then restart elasticsearch.bat; the startup log shows that the plugin has been loaded
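
Once the plugin is loaded, one way to confirm that it works is to run a sentence through the _analyze API from Python. This is a hedged sketch: the helper name ik_tokens is an assumption for this example, client is expected to be an elasticsearch.Elasticsearch instance, and ik_max_word is one of the two analyzers the IK plugin registers (the other is ik_smart).

```python
def ik_tokens(client, text, analyzer="ik_max_word"):
    """Analyze `text` with the given analyzer and return the token strings.

    `client` is any object exposing `indices.analyze(analyzer=..., text=...)`,
    which the official Python client does.
    """
    response = client.indices.analyze(analyzer=analyzer, text=text)
    return [item["token"] for item in response["tokens"]]


# Usage, with Elasticsearch running and the IK plugin installed:
#     from elasticsearch import Elasticsearch
#     es = Elasticsearch(hosts=['http://localhost:9200/'])
#     print(ik_tokens(es, "你好周六"))
```

If the plugin is missing, the call fails because the analyzer name is unknown, which makes this a quick installation check.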

4. Installing Kibana for visualization

Download Kibana from https://www.elastic.co/cn/downloads/kibana
After unzipping it, double-click kibana.bat in the bin directory to start it, then open http://localhost:5601 to see the Kibana interface

5. Configuring Elasticsearch as a Windows service

Open a cmd window in the bin directory and run elasticsearch-service.bat install, then elasticsearch-service.bat start to start the service. I did not try this myself; I simply double-clicked elasticsearch.bat in the bin directory. See the referenced article
elasticsearch-service.bat install: Install Elasticsearch service
elasticsearch-service.bat remove: Remove the installed Elasticsearch service (stop the service if started)
elasticsearch-service.bat start: Start the Elasticsearch service (if installed)
elasticsearch-service.bat stop: Stop the service (if started)
elasticsearch-service.bat manager: Launch GUI to manage installed services

Two, operating Elasticsearch from Python

pip install elasticsearch

Create an ES client instance; see the documentation for more parameters

from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=['http://localhost:9200/']).options(
    request_timeout=20,
    retry_on_timeout=True,
    ignore_status=[400, 404]
)

Full code. In ignore_status=[400, 404], 400 means that creating an index that already exists returns a 400 response instead of raising an error and stopping the code that follows; in other words, it ignores the error from creating the same index twice. 404 ignores the error raised when the index does not exist, so a failed delete does not interrupt the program
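
Because ignore_status suppresses the exception, a failed call returns the error body instead of raising, so the result has to be inspected to tell success from failure. A small sketch of such a check follows; the helper name error_type is an assumption for this example, and the sample dictionaries only imitate the shape of the responses.

```python
def error_type(result):
    """Return the Elasticsearch error type found in a response body, or None.

    Works on plain dicts and on the client's response objects, since both
    support `in` and item access.
    """
    if "error" not in result:
        return None
    error = result["error"]
    # Most endpoints return a dict with a "type" key; some return a plain string.
    return error.get("type") if isinstance(error, dict) else error


# Illustrative response shapes (not captured from a live cluster)
created = {"acknowledged": True, "shards_acknowledged": True, "index": "news"}
duplicate = {
    "error": {"type": "resource_already_exists_exception", "reason": "..."},
    "status": 400,
}
print(error_type(created))
print(error_type(duplicate))
```

Checking the error type this way keeps the convenience of ignore_status while still making it possible to log or react to a create or delete that did not go through.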

from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=['http://localhost:9200/']).options(
    request_timeout=20,
    retry_on_timeout=True,
    ignore_status=[400, 404]
)
# Delete the index
result = es.indices.delete(index="news")
print(result)  # {'acknowledged': True}
# Create the index
result = es.indices.create(index="news")
print(result)  # {'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}
# Insert data
result = es.create(index='news', id='1', document={"title": "你好周六"})  # id must be specified
print(result)  # {'_index': 'news', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}
result = es.index(index='news', document={"title": "你好周日"})  # id is generated automatically
print(result)  # {'_index': 'news', '_id': 'zT_HwIABRhdG867DnYdw', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}
# Update data
result = es.index(index='news', id='1', document={"title": "你好周日", "en": "hello zhou liu"})
print(result)  # {'_index': 'news', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}
# Delete data
result = es.delete(index='news', id='1')
print(result)  # {'_index': 'news', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}

1. Create an index

For example, create an index named news with es.indices.create(index="news"), then open http://127.0.0.1:9200/news to see the index's settings and mappings

from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=['http://localhost:9200/']).options(
    request_timeout=20,
    retry_on_timeout=True,
    ignore_status=[400, 404]
)
result = es.indices.create(index="news")
print(result) # {'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}
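
An alternative to ignoring the 400 response is to check whether the index exists before creating it. The sketch below assumes a client exposing indices.exists and indices.create, as the official Python client does; the helper name ensure_index is made up for this example.

```python
def ensure_index(client, name):
    """Create index `name` only if it is missing.

    Returns True when the index was created and False when it already existed.
    """
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name)
    return True


# Usage, with Elasticsearch running:
#     from elasticsearch import Elasticsearch
#     es = Elasticsearch(hosts=['http://localhost:9200/'])
#     print(ensure_index(es, "news"))
```

The check and the create are two separate requests, so two programs running at the same time could still both try to create the index; for a single script this is usually not a concern.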

2. Delete the index

Delete the index news with es.indices.delete(index="news")

# Delete the index
result = es.indices.delete(index="news")
print(result)  # {'acknowledged': True}

3. Insert data

There are two ways to insert data, es.create and es.index. es.create requires an id, while es.index generates one automatically when none is given. Part of the returned result: '_version': 1, 'result': 'created'

result = es.create(index='news', id='1', document={"title": "你好周六"})  # id must be specified
print(result)  # {'_index': 'news', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}
result = es.index(index='news', document={"title": "你好周日"})  # id is generated automatically
print(result)  # {'_index': 'news', '_id': 'zT_HwIABRhdG867DnYdw', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}
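
The choice between the two calls can be wrapped in one small helper. This is only a sketch: add_document is a hypothetical name, and client is assumed to be an elasticsearch.Elasticsearch instance (any object with matching create and index methods works).

```python
def add_document(client, index, document, doc_id=None):
    """Insert `document` into `index`.

    With `doc_id`, use create(), which refuses to overwrite an existing
    document (the server answers 409 if the id is already taken).
    Without it, use index(), which lets Elasticsearch generate the id.
    """
    if doc_id is None:
        return client.index(index=index, document=document)
    return client.create(index=index, id=doc_id, document=document)


# Usage, with Elasticsearch running:
#     add_document(es, "news", {"title": "你好周日"})
#     add_document(es, "news", {"title": "你好周六"}, doc_id="1")
```

Note that 409 is not in the ignore_status list used in this article, so calling create() twice with the same id raises an exception rather than returning quietly.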

4. Update data

Specify the id of the document to update. es.index can both insert and update: called with an existing id, it replaces the stored document. The returned result contains a _version field, which is incremented by 1 on every update. Part of the returned result: '_version': 2, 'result': 'updated'

result = es.index(index='news', id='1', document={"title": "你好周日", "en": "hello zhou liu"})
print(result)  # {'_index': 'news', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}
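
Since es.index replaces the whole document, fields left out of the new body are lost. When only some fields should change, the client's update method with a doc argument merges the given fields into the stored document instead. The following sketch is an addition to the article's approach, not part of it; the helper name partial_update is an assumption.

```python
def partial_update(client, index, doc_id, fields):
    """Merge `fields` into the stored document instead of replacing it.

    `client` is expected to expose update(index=..., id=..., doc=...),
    as the official Python client does.
    """
    return client.update(index=index, id=doc_id, doc=fields)


# Usage, with Elasticsearch running and document '1' present:
#     result = partial_update(es, "news", "1", {"en": "hello zhou ri"})
#     print(result["result"], result["_version"])
```

Unlike es.index, this call requires the document to exist already; with 404 in ignore_status, a missing document comes back as an error body rather than an exception.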

5. Delete data

Specify the id of the document to delete and call es.delete. Part of the returned result: '_version': 3, 'result': 'deleted'

result = es.delete(index='news', id='1')
print(result) # {'_index': 'news', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}

6. Query data

Query data with es.search; see the reference for more query usage

from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=['http://localhost:9200/']).options(
    request_timeout=20,
    retry_on_timeout=True,
    ignore_status=[400, 404]
)
properties = {
    "title": {'type': 'text'}
}
# es.indices.delete(index="news")
result = es.indices.put_mapping(index='news', properties=properties)
print(result)
# Insert data
datas = [
    {'title': '美国留给伊拉克的是个烂摊子吗', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm', 'date': '2011-12-16'},
    {'title': '公安部：各地校车将享最高路权', 'url': 'http://www.chinanews.com/gn/2011/12-16/3536077.shtml', 'date': '2011-12-16'},
    {'title': '中韩渔警冲突调查：韩警平均每天扣1艘中国渔船', 'url': 'https://news.qq.com/a/20111216/001044.htm', 'date': '2011-12-17'},
    {'title': '中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首', 'url': 'http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml', 'date': '2011-12-18'}
]
for data in datas:
    es.index(index='news', document=data)
# Query 1: no query body, matches all documents
result = es.search(index='news')
print(result)
# Query 2: full-text match on the title field
query = {
    'match': {
        'title': '平均'
    }
}
result = es.search(index='news', query=query)
print(result)
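
The printed search result is a nested structure; the matching documents sit under hits.hits, each with its _score and its original body in _source. A small sketch for pulling out the useful parts follows. The helper name hit_summaries is an assumption, and the sample response only imitates the shape of a real one.

```python
def hit_summaries(result):
    """Return (total, [(score, title), ...]) from a search response.

    Assumes a successful search; with ignore_status set, a failed search
    returns an error body without a "hits" key.
    """
    hits = result["hits"]
    rows = [(hit["_score"], hit["_source"].get("title")) for hit in hits["hits"]]
    return hits["total"]["value"], rows


# Illustrative response shape (scores are made up, not from a live cluster)
sample = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_index": "news", "_id": "a1", "_score": 1.5,
             "_source": {"title": "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}},
        ],
    }
}
total, rows = hit_summaries(sample)
print(total, rows)
```

If a search run right after inserting returns fewer documents than expected, the likely cause is that newly indexed documents only become searchable after the next index refresh; calling es.indices.refresh(index='news') before searching forces one.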