Python Elasticsearch 8.2.0

One, Elasticsearch

1. Concepts and terminology
  • A powerful search engine that can store, search, and analyze massive amounts of data quickly; Wikipedia, Stack Overflow, and GitHub have all used it for search
  • A distributed real-time document store in which every field can be indexed and searched
  • A distributed real-time analytical search engine
  • It can scale out to hundreds of server nodes and supports PB-level structured or unstructured data
  • A distributed database: multiple servers work together, and each server can run multiple Elasticsearch instances
  • Node: a single Elasticsearch instance
  • Cluster: a set of nodes forms a cluster
  • Index: Elasticsearch indexes all fields and, after processing, writes them to an inverted index. An index is roughly equivalent to a database in MongoDB/MySQL. Every index name must be lowercase
  • Document: a single record in an index; many documents make up an index. Documents in the same index may have different structures, but this is not recommended
  • Type: a virtual logical grouping of documents inside an index, used to filter documents; similar to a collection in MongoDB or a table in MySQL
  • Field: each document is a JSON-like structure containing many fields; each field has a value, and multiple fields form a document
  • Summary (for versions that have types): index > type > document > field
  • Elasticsearch 8.x removed types completely, so the hierarchy there is index > document > field. Elasticsearch is document-oriented, and everything in it is JSON. ELK is the acronym for the three open-source projects Elasticsearch, Logstash, and Kibana
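
The inverted index mentioned in the list above can be illustrated with a toy sketch in plain Python. This is not how Elasticsearch or Lucene implement it; the function names and sample documents are made up for illustration, and the tokenization (lowercasing and splitting on whitespace) is a deliberate simplification.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase, whitespace-separated term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, term):
    """Return the ids of the documents containing the term (empty list if none)."""
    return index.get(term.lower(), [])


# Hypothetical sample documents, keyed by document id
docs = {
    1: "Elasticsearch is a search engine",
    2: "Kibana visualizes Elasticsearch data",
    3: "Logstash ships data",
}
inverted = build_inverted_index(docs)
print(search(inverted, "elasticsearch"))  # [1, 2]
print(search(inverted, "data"))           # [2, 3]
```

A lookup goes from a term straight to the documents that contain it, which is why searching stays fast even when there are many documents.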
2. Elasticsearch installation

Recommended reference installation article

  • Elasticsearch is built on Lucene and runs on the JVM, so a Java environment is required. Install Java first (the original post refers to a JDK 1.8 installation guide), then type java -version in a cmd window; if version information is printed, the Java environment is installed successfully. Note that the Elasticsearch 8.x download also ships with its own bundled JDK

  • Download Elasticsearch from the official download page, then unzip it

  • Then enter the config directory and modify two files: elasticsearch.yml (part of the configuration) and jvm.options (the Elasticsearch memory size, by adding two heap-size lines such as -Xms1g)

  • In elasticsearch.yml, modify part of the configuration

    # Change the following in elasticsearch.yml
    cluster.name: mysy-es
    network.host: localhost
    http.port: 9200
    # Whether to enable security (SSL); unless this is set to false,
    # the port cannot be reached over plain HTTP
    xpack.security.enabled: false
    
  • In jvm.options, set the Elasticsearch memory size by adding two heap-size lines: -Xms1g (initial heap) and, presumably, the matching -Xmx1g (maximum heap)

  • Then enter the bin directory, double-click elasticsearch.bat, and wait a while for it to load. Do not close the cmd window, otherwise Elasticsearch stops and connections fail, unless you have configured it as a service that starts automatically

  • Then open http://localhost:9200/ ; if a JSON page with the node and version information appears, the installation is successful

  • Set the ES_HOME environment variable
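
The browser check of http://localhost:9200/ described above can also be scripted. The following is a minimal sketch using only the standard library, assuming security is disabled as in the configuration above so that plain HTTP works. The function name node_summary and the injectable opener parameter are illustrative choices, not part of any Elasticsearch API; cluster_name and version.number are fields of the JSON that the root endpoint returns.

```python
import json
from urllib.request import urlopen


def node_summary(url="http://localhost:9200/", opener=urlopen):
    """Fetch the root endpoint and return (cluster_name, version_number).

    `opener` is injectable so the function can be exercised without a running node.
    """
    with opener(url) as response:
        info = json.loads(response.read().decode("utf-8"))
    return info["cluster_name"], info["version"]["number"]
```

With the node running, node_summary() returns the cluster name and version that the node reports; if the node is not up, urlopen raises an error instead.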

3. IK word segmentation plug-in installation
  • To search Chinese text with Elasticsearch, install the word segmentation plug-in elasticsearch-analysis-ik. The plug-in version must match the Elasticsearch version

  • Download the matching package from https://github.com/medcl/elasticsearch-analysis-ik/releases

  • Unzip the downloaded package into the plugins directory of Elasticsearch and rename the folder to ik (renaming does not seem to be necessary)

  • Then restart elasticsearch.bat; the startup log shows that the plug-in has been loaded
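
Once the plug-in is loaded, one way to confirm that it works is to run a sentence through one of its analyzers with the analyze API. The sketch below assumes an elasticsearch-py 8.x client named es, created as in the Python part of this article; analyze_tokens is a hypothetical helper name, and ik_max_word and ik_smart are the analyzer names documented by the IK plug-in.

```python
def analyze_tokens(es, text, analyzer="ik_max_word"):
    """Run `text` through an analyzer and return the token strings.

    `es` is an elasticsearch-py 8.x client; the analyzer must be available on the node.
    """
    response = es.indices.analyze(analyzer=analyzer, text=text)
    return [item["token"] for item in response["tokens"]]
```

For example, analyze_tokens(es, "你好周六", analyzer="ik_smart") returns the words the plug-in splits the text into. If the plug-in is not installed, the request fails because the analyzer is unknown.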

4. Kibana visualization installation
5. Windows configuration ElasticSearch service
  • Open a cmd window in the bin directory and run elasticsearch-service.bat install, then elasticsearch-service.bat start to start the service. I did not try this myself; I double-click elasticsearch.bat in the bin directory manually instead. Refer to the article
  • elasticsearch-service.bat install: Install Elasticsearch service
  • elasticsearch-service.bat remove: Remove the installed Elasticsearch service (stop the service if started)
  • elasticsearch-service.bat start: Start the Elasticsearch service (if installed)
  • elasticsearch-service.bat stop: stop the service (if started)
  • elasticsearch-service.bat manager: Launch GUI to manage installed services

Two, Operating Elasticsearch from Python

  • Install the client: pip install elasticsearch
  • Create an es instance; refer to the documentation for more parameters
    from elasticsearch import Elasticsearch
    es = Elasticsearch(hosts=['http://localhost:9200/']).options(
        request_timeout=20,
        retry_on_timeout=True,
        ignore_status=[400, 404]
    )
    
  • The complete code follows. ignore_status=[400, 404] tells the client not to raise an error for these status codes. 400 is returned when creating an index that already exists, so ignoring it lets the following code keep running; 404 is returned when deleting an index that does not exist, so ignoring it keeps a failed deletion from interrupting the program
    from elasticsearch import Elasticsearch
    es = Elasticsearch(hosts=['http://localhost:9200/']).options(
        request_timeout=20,
        retry_on_timeout=True,
        ignore_status=[400, 404]
    )
    # Delete the index
    result = es.indices.delete(index="news")
    print(result)  # {'acknowledged': True}
    # Create the index
    result = es.indices.create(index="news")
    print(result)  # {'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}
    # Insert data
    result = es.create(index='news', id='1', document={"title": "你好周六"})  # id is required
    print(result)  # {'_index': 'news', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}
    result = es.index(index='news', document={"title": "你好周日"})  # id is generated automatically
    print(result)  # {'_index': 'news', '_id': 'zT_HwIABRhdG867DnYdw', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}
    # Update data
    result = es.index(index='news', id='1', document={"title": "你好周日", "en": "hello zhou liu"})
    print(result)  # {'_index': 'news', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}
    # Delete data
    result = es.delete(index='news', id='1')
    print(result)  # {'_index': 'news', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}
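
Because ignore_status suppresses the exception, a 400 or 404 comes back as an ordinary return value, and the code has to inspect it to know whether the call really succeeded. The following is a minimal sketch, assuming the response objects of elasticsearch-py 8.x, which expose the HTTP status as response.meta.status and the decoded JSON as response.body; the helper name describe is made up for illustration.

```python
def describe(response):
    """Return 'ok' for a 2xx response, otherwise a short description of the ignored error."""
    status = response.meta.status
    if 200 <= status < 300:
        return "ok"
    body = response.body
    error = body.get("error")
    if isinstance(error, dict):
        # Index-level errors carry a human-readable reason
        detail = error.get("reason")
    else:
        # Document-level 404s carry e.g. 'result': 'not_found' instead of an error object
        detail = error or body.get("result")
    return f"ignored HTTP {status}: {detail}"
```

Calling print(describe(result)) after each operation in the script above makes silently ignored failures visible.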
    
    
1. Create an index
  • For example, create an index named news with es.indices.create(index="news"), then open http://127.0.0.1:9200/news to see the index information
    from elasticsearch import Elasticsearch
    es = Elasticsearch(hosts=['http://localhost:9200/']).options(
        request_timeout=20,
        retry_on_timeout=True,
        ignore_status=[400, 404]
    )
    result = es.indices.create(index="news")
    print(result) # {'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}
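
An alternative to ignoring the 400 returned for an existing index is to check for the index first. This is a small sketch, assuming an elasticsearch-py 8.x client in which es.indices.exists returns a value that is truthy when the index exists; ensure_index is a hypothetical helper name.

```python
def ensure_index(es, name):
    """Create the index only if it does not exist yet; return True if it was created."""
    if es.indices.exists(index=name):
        return False
    es.indices.create(index=name)
    return True
```

With this approach the client can be created without ignore_status=[400], so that other kinds of bad requests still raise an error.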
    
2. Delete the index
  • Delete the index news with es.indices.delete(index="news")
    # Delete the index
    result = es.indices.delete(index="news")
    print(result)  # {'acknowledged': True}
    
3. Insert data
  • There are two ways to insert data, es.create and es.index. es.create requires an id, while es.index generates one automatically when none is given. Relevant part of the returned result: '_version': 1, 'result': 'created'
    result = es.create(index='news', id='1', document={"title": "你好周六"})  # id is required
    print(result)  # {'_index': 'news', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}
    result = es.index(index='news', document={"title": "你好周日"})  # id is generated automatically
    print(result)  # {'_index': 'news', '_id': 'zT_HwIABRhdG867DnYdw', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}
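
When many documents have to be inserted, one request per document becomes slow. The client library ships a bulk helper, elasticsearch.helpers.bulk, that accepts an iterable of action dictionaries. The sketch below only builds those actions, so it runs without a node; to_bulk_actions and its id_field parameter are hypothetical names introduced for illustration.

```python
def to_bulk_actions(index, docs, id_field=None):
    """Yield one action dict per document in the format accepted by elasticsearch.helpers.bulk."""
    for doc in docs:
        action = {"_index": index, "_source": doc}
        # Optionally reuse one of the document's own fields as its id
        if id_field is not None and id_field in doc:
            action["_id"] = doc[id_field]
        yield action


# Usage with a running node (not executed here):
#   from elasticsearch import helpers
#   helpers.bulk(es, to_bulk_actions("news", docs))
```

Actions without an _id get an automatically generated id, the same as with es.index.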
    
4. Update data
  • es.index can either insert or update data; to update, specify the id of the document to overwrite. The returned result contains a _version field that is incremented by 1 on every update. Relevant part of the returned result: '_version': 2, 'result': 'updated'
    result = es.index(index='news', id='1', document={"title": "你好周日", "en": "hello zhou liu"})
    print(result)  # {'_index': 'news', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}
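
Note that es.index with an existing id replaces the whole stored document, so any field left out of the new document is lost. To change only some fields, the client also offers es.update with a partial document. A minimal sketch, assuming an elasticsearch-py 8.x client; update_fields is a hypothetical wrapper name.

```python
def update_fields(es, index, doc_id, **fields):
    """Merge `fields` into an existing document instead of replacing it."""
    return es.update(index=index, id=doc_id, doc=fields)
```

For example, update_fields(es, "news", "1", en="hello") adds or overwrites the en field and leaves title untouched.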
    
5. Delete data
  • Specify the id of the document to delete and call es.delete. Relevant part of the returned result: '_version': 3, 'result': 'deleted'
    result = es.delete(index='news', id='1')
    print(result) # {'_index': 'news', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}
    
6. Query data
  • es.search queries data; see the documentation for more query types
    from elasticsearch import Elasticsearch
    es = Elasticsearch(hosts=['http://localhost:9200/']).options(
        request_timeout=20,
        retry_on_timeout=True,
        ignore_status=[400, 404]
    )
    properties = {
        "title": {'type': 'text'}
    }
    # es.indices.delete(index="news")
    result = es.indices.put_mapping(index='news', properties=properties)
    print(result)
    # Insert data
    datas = [
        {'title': '美国留给伊拉克的是个烂摊子吗', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm', 'date': '2011-12-16'},
        {'title': '公安部:各地校车将享最高路权', 'url': 'http://www.chinanews.com/gn/2011/12-16/3536077.shtml', 'date': '2011-12-16'},
        {'title': '中韩渔警冲突调查:韩警平均每天扣1艘中国渔船', 'url': 'https://news.qq.com/a/20111216/001044.htm', 'date': '2011-12-17'},
        {'title': '中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首', 'url': 'http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml', 'date': '2011-12-18'}
    ]
    for data in datas:
        es.index(index='news', document=data)
    # Refresh so the new documents are searchable right away
    # (otherwise they only appear after the next periodic refresh)
    es.indices.refresh(index='news')
    # Query 1: no query given, so all documents match
    result = es.search(index='news')
    print(result)
    # Query 2: full-text match on the title field
    query = {
        'match': {
            'title': '平均'
        }
    }
    result = es.search(index='news', query=query)
    print(result)
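
Printing the whole search response is noisy. The matching documents sit under hits.hits, each with a _score and the original document in _source, and the match count is under hits.total.value. The following sketch pulls out just those parts; summarize_hits is a hypothetical helper name, and it assumes the documents have a title field as in the data above.

```python
def summarize_hits(response):
    """Return (total_matches, [(score, title), ...]) from a search response."""
    hits = response["hits"]
    rows = [(hit["_score"], hit["_source"].get("title")) for hit in hits["hits"]]
    return hits["total"]["value"], rows
```

For example, total, rows = summarize_hits(result) followed by a loop over rows prints one line per matching title together with its relevance score.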
    
    


Origin blog.csdn.net/weixin_43411585/article/details/124763838