I. Elasticsearch
1. Concepts and terminology
- A powerful search engine that makes data easy to store and retrieve; it can quickly store, search, and analyze massive amounts of data. Wikipedia, Stack Overflow, and GitHub all use it for search
- A distributed, real-time document store in which every field can be indexed and searched
- A distributed, real-time analytics and search engine
- It can scale out to hundreds of server nodes and handle petabytes of structured or unstructured data
- A distributed database that lets multiple servers work together; each server can run multiple Elasticsearch instances
- Node: a single Elasticsearch instance.
- Cluster: a set of nodes forms a cluster.
- Index: Elasticsearch indexes every field and, after processing, writes an inverted index. An index is roughly equivalent to a database in MongoDB/MySQL. Every index (database) name must be lowercase.
- Document: a single record in an index is called a document, and many documents make up an index. Documents in the same index may have different structures, but this is not recommended.
- Types: a virtual, logical grouping of documents used to filter them, similar to collections in MongoDB or tables in MySQL.
- Fields: each document is a JSON-like structure containing many fields, each with a value; multiple fields make up a document.
- Summary: in Elasticsearch the hierarchy is index > types > document > fields. Elasticsearch 8.x removed types completely! Elasticsearch is document-oriented and everything in it is JSON. ELK is the acronym formed from the initials of three open-source frameworks: Elasticsearch, Logstash, and Kibana.
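To make the index > document > fields hierarchy concrete, here is a minimal sketch in plain Python; it needs no Elasticsearch client. The sample field values and the helper is_lowercase_index_name are illustrative assumptions, and the helper checks only the lowercase rule mentioned above, not every naming rule Elasticsearch enforces.

```python
import json

# A document is a JSON object; each key is a field with a value.
# The values here are made up for illustration.
document = {
    "title": "hello saturday",
    "date": "2011-12-16",
}

# A stored document is addressed by index name + document id.
# "_index", "_id" and "_source" are the keys Elasticsearch uses in responses.
stored = {"_index": "news", "_id": "1", "_source": document}


def is_lowercase_index_name(name):
    """Hypothetical helper: checks only the 'must be lowercase' rule."""
    return bool(name) and name == name.lower()


print(json.dumps(stored, indent=2))
print(is_lowercase_index_name("news"))
print(is_lowercase_index_name("News"))
```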
2. Elasticsearch installation
Recommended installation article for reference
-
Elasticsearch is built on Lucene and therefore needs a Java JDK to run, so install the Java environment first (see the JDK 1.8 installation article, item three in the document directory). Then run the following in a cmd window:
java -version
If the version information is printed, the Java environment is installed correctly.
-
Download Elasticsearch from the download address, then unzip it.
-
Then go to the config directory and modify two files: elasticsearch.yml (part of the configuration) and jvm.options (the ES memory size).
-
In elasticsearch.yml, change part of the configuration:
# Change the following in elasticsearch.yml
cluster.name: mysy-es
network.host: localhost
http.port: 9200
# Whether to enable SSL; unless this is set to false, the port cannot be reached over plain HTTP
xpack.security.enabled: false
-
In jvm.options, set the ES memory size by adding two lines of this form:
-Xms1g
-
Then go to the bin directory, double-click elasticsearch.bat, and wait a while for it to load. Do not close the cmd window, otherwise ES cannot be reached, unless you have configured it as a service that starts automatically.
-
Then open http://localhost:9200/ ; if a JSON page with the node information appears, the installation succeeded.
-
Set the ES_HOME environment variable.
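The browser check of http://localhost:9200/ can also be scripted. The following is a minimal sketch using only the standard library; the function names fetch_node_info and describe are hypothetical, and the opener parameter exists only so the function can be exercised without a running node.

```python
import json
import urllib.error
import urllib.request


def fetch_node_info(url="http://localhost:9200/", timeout=5,
                    opener=urllib.request.urlopen):
    """Return the node's JSON banner as a dict, or None if it is unreachable.

    By default this performs a real HTTP GET against `url`.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not JSON.
        return None


def describe(info):
    """Summarise the banner using its 'cluster_name' and 'version.number' fields."""
    if info is None:
        return "Elasticsearch is not reachable"
    return "cluster={} version={}".format(
        info.get("cluster_name"), info.get("version", {}).get("number")
    )
```

Calling print(describe(fetch_node_info())) while elasticsearch.bat is running should report the cluster name set in elasticsearch.yml; this assumes security is disabled as configured above, since otherwise the plain HTTP request is rejected.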
3. Installing the IK word segmentation plugin
-
For Chinese text, Elasticsearch search needs a word segmentation (analysis) plugin:
elasticsearch-analysis-ik
-
Download the matching package from https://github.com/medcl/elasticsearch-analysis-ik/releases ; the plugin version must be the same as the Elasticsearch version.
-
In the plugins directory of Elasticsearch, unzip the package you just downloaded into a folder and rename that folder to ik (renaming appears to be optional).
-
Then restart elasticsearch.bat; the startup log should show that the plugin has been loaded.
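Because a version mismatch is the usual reason the plugin fails to load, the check can be done before unzipping. This is a small sketch, not part of the plugin or of Elasticsearch; the helper names are hypothetical, and it assumes release archives are named like elasticsearch-analysis-ik-8.1.0.zip.

```python
import re


def plugin_version_from_filename(filename):
    """Extract 'X.Y.Z' from a name like 'elasticsearch-analysis-ik-8.1.0.zip'.

    The file-name pattern is an assumption about how release archives are named.
    """
    match = re.search(r"elasticsearch-analysis-ik-(\d+\.\d+\.\d+)\.zip$", filename)
    return match.group(1) if match else None


def versions_match(plugin_filename, es_version):
    """True only when the archive targets exactly this Elasticsearch version."""
    plugin_version = plugin_version_from_filename(plugin_filename)
    return plugin_version is not None and plugin_version == es_version


print(versions_match("elasticsearch-analysis-ik-8.1.0.zip", "8.1.0"))
```

The Elasticsearch version to compare against is the version.number value shown at http://localhost:9200/ .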
4. Installing Kibana for visualization
- Download Kibana from https://www.elastic.co/cn/downloads/kibana
- After unzipping, double-click kibana.bat in the bin directory to start it, then open http://localhost:5601 to see the Kibana interface
5. Running Elasticsearch as a Windows service
- Open a cmd window in the bin directory and run
elasticsearch-service.bat install
and then
elasticsearch-service.bat start
to start the service. I did not try this here; I started ES manually by double-clicking elasticsearch.bat in the bin directory. See the reference article for details. The available commands are:
elasticsearch-service.bat install
: install Elasticsearch as a service
elasticsearch-service.bat remove
: remove the installed Elasticsearch service (stopping it first if it is running)
elasticsearch-service.bat start
: start the Elasticsearch service (if installed)
elasticsearch-service.bat stop
: stop the service (if started)
elasticsearch-service.bat manager
: launch a GUI to manage the installed service
II. Operating Elasticsearch from Python
pip install elasticsearch
- Create an es instance; refer to the documentation for more parameters
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['http://localhost:9200/']).options(
    request_timeout=20,
    retry_on_timeout=True,
    ignore_status=[400, 404]
)
- The complete code is below. ignore_status=[400, 404] tells the client not to raise an error for these status codes. 400 is returned when you create an index that already exists, so ignoring it lets the code that follows keep running. 404 is returned when the index does not exist, so ignoring it keeps a failed delete from interrupting the program.
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['http://localhost:9200/']).options(
    request_timeout=20,
    retry_on_timeout=True,
    ignore_status=[400, 404]
)

# Delete the index
result = es.indices.delete(index="news")
print(result)  # {'acknowledged': True}

# Create the index
result = es.indices.create(index="news")
print(result)  # {'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}

# Insert data
result = es.create(index='news', id='1', document={"title": "你好周六"})  # id must be specified
print(result)  # {'_index': 'news', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}
result = es.index(index='news', document={"title": "你好周日"})  # id is generated automatically
print(result)  # {'_index': 'news', '_id': 'zT_HwIABRhdG867DnYdw', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}

# Update data
result = es.index(index='news', id='1', document={"title": "你好周日", "en": "hello zhou liu"})
print(result)  # {'_index': 'news', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}

# Delete data
result = es.delete(index='news', id='1')
print(result)  # {'_index': 'news', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}
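One consequence of ignore_status is that a failed index-level call no longer raises, so the caller has to inspect the returned body. The sketch below shows one way to do that; describe_result is a hypothetical helper, and it assumes an ignored 400/404 comes back as a body containing "error" and "status" keys. It works on plain dicts, so it can be tried without a running node.

```python
def describe_result(result):
    """Classify a response returned by a client configured with ignore_status.

    Assumption: an ignored error is returned as a body with 'error' and
    'status' keys instead of raising an exception.
    """
    if "error" in result:
        error = result["error"]
        # The error is usually a dict with a 'type' key; fall back to str otherwise.
        reason = error.get("type") if isinstance(error, dict) else str(error)
        return "ignored error {}: {}".format(result.get("status"), reason)
    return "ok"


# Sample bodies written by hand to resemble index-level responses.
print(describe_result({"acknowledged": True}))
print(describe_result({"error": {"type": "index_not_found_exception"}, "status": 404}))
```

With the client above, the same helper could be applied to the value returned by es.indices.delete or es.indices.create.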
1. Create an index
- For example, to create an index named news, call es.indices.create(index="news"), then open http://127.0.0.1:9200/news to see the index
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['http://localhost:9200/']).options(
    request_timeout=20,
    retry_on_timeout=True,
    ignore_status=[400, 404]
)

result = es.indices.create(index="news")
print(result)  # {'acknowledged': True, 'shards_acknowledged': True, 'index': 'news'}
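An index can also be created with explicit field mappings, which is where the IK plugin from section 3 would come in. The following is a sketch under assumptions: build_news_mappings is a hypothetical helper, the url and date fields are illustrative, and the ik_max_word and ik_smart analyzers are available only when the IK plugin is installed.

```python
def build_news_mappings(use_ik=True):
    """Return a mappings dict for a 'news' index with a text 'title' field.

    'ik_max_word' and 'ik_smart' are analyzers provided by the IK plugin;
    set use_ik=False to fall back to the default analyzer.
    """
    title = {"type": "text"}
    if use_ik:
        title["analyzer"] = "ik_max_word"      # fine-grained segmentation at index time
        title["search_analyzer"] = "ik_smart"  # coarser segmentation at query time
    return {
        "properties": {
            "title": title,
            "url": {"type": "keyword"},  # exact-match field, not analyzed
            "date": {"type": "date"},
        }
    }


print(build_news_mappings())
```

The result could then be passed as es.indices.create(index="news", mappings=build_news_mappings()). Analyzer settings of an existing field cannot be changed afterwards, so this choice is best made when the index is first created.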
2. Delete the index
- To delete the index news, call es.indices.delete(index="news")
# Delete the index
result = es.indices.delete(index="news")
print(result)  # {'acknowledged': True}
3. Insert data
- There are two methods for inserting data, es.create and es.index. create requires an id, while index generates one automatically when no id is given. Part of the returned result: '_version': 1, 'result': 'created'
result = es.create(index='news', id='1', document={"title": "你好周六"})  # id must be specified
print(result)  # {'_index': 'news', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}
result = es.index(index='news', document={"title": "你好周日"})  # id is generated automatically
print(result)  # {'_index': 'news', '_id': 'zT_HwIABRhdG867DnYdw', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}
4. Update data
- You need to specify the id of the document to update. es.index can both insert and update data; when the id already exists, it replaces the whole document. The returned result contains a _version field, which is incremented by 1 on every update. Part of the returned result: '_version': 2, 'result': 'updated'
result = es.index(index='news', id='1', document={"title": "你好周日", "en": "hello zhou liu"})
print(result)  # {'_index': 'news', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}
5. Delete data
- You need to specify the id of the document to delete and call es.delete. Part of the returned result: '_version': 3, 'result': 'deleted'
result = es.delete(index='news', id='1')
print(result)  # {'_index': 'news', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}
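The way _version and result change across create, index, and delete can be imitated with a toy in-memory model. This is only an illustration of the behaviour seen in the outputs above, not how Elasticsearch is implemented; the class ToyIndex and its KeyError on a duplicate id are assumptions made for the sketch (the real client reports a duplicate es.create as a version conflict instead).

```python
class ToyIndex:
    """In-memory imitation of per-document versioning; NOT Elasticsearch itself."""

    def __init__(self):
        self._docs = {}      # id -> current document (absent once deleted)
        self._versions = {}  # id -> last version number handed out by this toy

    def _bump(self, doc_id):
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        return self._versions[doc_id]

    def create(self, doc_id, document):
        """Like es.create: refuses to overwrite an existing id."""
        if doc_id in self._docs:
            raise KeyError("document {!r} already exists".format(doc_id))
        self._docs[doc_id] = document
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": "created"}

    def index(self, doc_id, document):
        """Like es.index with an id: inserts, or replaces the whole document."""
        result = "updated" if doc_id in self._docs else "created"
        self._docs[doc_id] = document
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": result}

    def delete(self, doc_id):
        """Like es.delete: removing a document also produces a new version."""
        del self._docs[doc_id]
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": "deleted"}


toy = ToyIndex()
print(toy.create("1", {"title": "a"}))
print(toy.index("1", {"title": "b", "en": "c"}))
print(toy.delete("1"))
```

The three printed results carry versions 1, 2, and 3 with results created, updated, and deleted, mirroring the sequence in sections 3 to 5.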
6. Query data
- Query data with es.search; more query types are available beyond the two shown here
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=['http://localhost:9200/']).options(
    request_timeout=20,
    retry_on_timeout=True,
    ignore_status=[400, 404]
)

properties = {
    "title": {'type': 'text'}
}
# es.indices.delete(index="news")
result = es.indices.put_mapping(index='news', properties=properties)
print(result)

# Insert data
datas = [
    {'title': '美国留给伊拉克的是个烂摊子吗', 'url': 'http://view.news.qq.com/zt2011/usa_iraq/index.htm', 'date': '2011-12-16'},
    {'title': '公安部:各地校车将享最高路权', 'url': 'http://www.chinanews.com/gn/2011/12-16/3536077.shtml', 'date': '2011-12-16'},
    {'title': '中韩渔警冲突调查:韩警平均每天扣1艘中国渔船', 'url': 'https://news.qq.com/a/20111216/001044.htm', 'date': '2011-12-17'},
    {'title': '中国驻洛杉矶领事馆遭亚裔男子枪击 嫌犯已自首', 'url': 'http://news.ifeng.com/world/detail_2011_12/16/11372558_0.shtml', 'date': '2011-12-18'}
]
for data in datas:
    es.index(index='news', document=data)

# Refresh so the newly indexed documents are visible to the searches below
es.indices.refresh(index='news')

# Query 1: return documents from the index without any filter
result = es.search(index='news')
print(result)

# Query 2: full-text match on the title field
query = {
    'match': {
        'title': '平均'
    }
}
result = es.search(index='news', query=query)
print(result)
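The search response is a nested structure, and the matching documents sit under hits.hits with each document's fields in _source. Below is a minimal sketch for pulling them out; extract_hits and total_hits are hypothetical helpers, and the sample body is hand-written to follow the response layout rather than captured from a real run.

```python
def extract_hits(response):
    """Return (score, document) pairs from a search response body."""
    hits = response.get("hits", {}).get("hits", [])
    return [(hit.get("_score"), hit.get("_source", {})) for hit in hits]


def total_hits(response):
    """Return the total hit count, reported as {'value': n, 'relation': ...}."""
    total = response.get("hits", {}).get("total", 0)
    return total["value"] if isinstance(total, dict) else total


# Hand-written sample shaped like a search response.
sample = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_index": "news", "_id": "abc", "_score": 0.5,
             "_source": {"title": "example title", "date": "2011-12-17"}},
        ],
    }
}

for score, doc in extract_hits(sample):
    print(score, doc["title"])
print(total_hits(sample))
```

With the client from this section, the helpers could be called as extract_hits(result.body), where result is the value returned by es.search.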