Elasticsearch basic concepts and visual interface installation

1. Introduction

Elasticsearch is built on top of the open source library Lucene. Lucene cannot be used out of the box: you must write your own code to call its interfaces. Elasticsearch wraps Lucene and provides a REST API that works out of the box. It is a search engine that relies on techniques such as word segmentation to support many search scenarios that a database cannot handle.

REST API: Naturally cross-platform.

Official documentation: ES official documentation

Official Chinese: Official Chinese version of the document

Community Chinese:

https://es.xiaoleilu.com/index.html

http://doc.codingdict.com/elasticsearch/0/

2. Basic concepts

  • Index: equivalent to a database in a relational database. It is a collection of data of one kind, and it is a logical concept.
  • Type: equivalent to a table in a database. Before version 6.0, an index could contain multiple types. From 6.0 on, an index can hold only one type, and since 7.0 types are deprecated: each index has a single type, namely "_doc". You don't need to pay much attention to this concept.
  • Document: the main entity stored in ES, which can be understood as one row of a table in a relational database. Each document consists of multiple fields. Unlike a relational database, ES does not require a fixed structure, so each document can have different fields. Every document has a unique identifier and is stored in JSON format.
  • Field: a key-value pair inside a document that holds data. It can be understood as one column of a row of MySQL data.
  • Mapping: defines the fields of an index and their data types, similar to the table structure in a relational database. By default, ES creates indexes and their field mappings dynamically.
| Elasticsearch | MySQL |
| --- | --- |
| Index | Database |
| Type | Table |
| Document | Row |
| Field | Column |
| Mappings | Table structure (schema) |
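
To make these concepts concrete, here is a minimal Python sketch. The index name `users`, the field names, and the sample values are all invented for illustration. The mapping body follows the `mappings`/`properties` layout that Elasticsearch 7.x accepts when an index is created, but nothing here is sent to a server.

```python
import json

# Two documents in the same (hypothetical) index "users".
# Each document is a JSON object; the fields may differ between documents.
doc_1 = {"name": "Alice", "age": 30}
doc_2 = {"name": "Bob", "city": "Beijing"}

# An explicit mapping for the same hypothetical index. It plays the role of
# a table schema: it declares each field and its data type.
users_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "age": {"type": "integer"},
            "city": {"type": "keyword"},
        }
    }
}

# Serialize to JSON, the format in which ES stores and exchanges documents.
doc_1_json = json.dumps(doc_1)
doc_2_json = json.dumps(doc_2)
print(doc_1_json)
print(doc_2_json)
print(json.dumps(users_mapping, indent=2))
```

If no mapping is defined up front, ES infers one from the first documents it receives, which is the dynamic mapping behaviour mentioned above.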

3. Inverted index mechanism

When it comes to search, people immediately think of entering keywords on Baidu or Google to find relevant content. But search is not limited to web search engines: the in-site search built into most apps is even more common.

A database is a powerful tool for storing and querying data, so is it suitable for search? The answer is no. The first reason is that when a database stores a large amount of data, query efficiency drops sharply.

The second reason is that some search scenarios are not supported by a database at all. For example, if we search the table below with the keyword "Chinese Football", the database returns nothing, because no row contains that exact phrase.

| id | name |
| --- | --- |
| 1 | Chinese men's football team |
| 2 | Chinese men's track and field team |
| 3 | Chinese women's volleyball team |
| 4 | Chinese women's diving team |
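
The limitation can be reproduced with a small Python sketch that uses the standard-library `sqlite3` module as a stand-in for MySQL. The table name `team` and the helper `matches_all_terms` are made up for this example. The second half only imitates what a search engine does (split the query into terms and match each term separately); it is not how Elasticsearch is implemented.

```python
import sqlite3

# In-memory database standing in for the table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO team VALUES (?, ?)",
    [
        (1, "Chinese men's football team"),
        (2, "Chinese men's track and field team"),
        (3, "Chinese women's volleyball team"),
        (4, "Chinese women's diving team"),
    ],
)

query = "Chinese Football"

# A substring query needs the whole phrase to appear contiguously,
# so it finds nothing.
like_rows = conn.execute(
    "SELECT id FROM team WHERE name LIKE ?", (f"%{query}%",)
).fetchall()


def matches_all_terms(name, query):
    """Return True if every query term appears as a word in name."""
    words = set(name.lower().split())
    return all(term in words for term in query.lower().split())


# Term-based matching: split the query into terms and match each one.
rows = conn.execute("SELECT id, name FROM team ORDER BY id").fetchall()
term_rows = [row_id for row_id, name in rows if matches_all_terms(name, query)]

print(like_rows)  # no rows match the phrase
print(term_rows)  # row 1 matches both terms
```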

3.1. Inverted index

What is an inverted index? It is also called a reverse index. An ordinary index finds a value through its key. An inverted index does the opposite and finds the keys through a value, which is why it is called a reverse index. A simple example shows how it works. Suppose there are three documents with the following contents:

Doc 1: Java is the best programming language

Doc 2: PHP is the best programming language

Doc 3: Javascript is the best programming language

To create the index, the ES engine uses a tokenizer to split the content of each document into separate words (called terms), builds a sorted list of these terms without duplicates, and then records which documents each term appears in. The result is as follows:

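| term | Doc 1 | Doc 2 | Doc 3 |
| --- | --- | --- | --- |
| Java | ✓ | | |
| is | ✓ | ✓ | ✓ |
| the | ✓ | ✓ | ✓ |
| best | ✓ | ✓ | ✓ |
| programming | ✓ | ✓ | ✓ |
| language | ✓ | ✓ | ✓ |
| PHP | | ✓ | |
| Javascript | | | ✓ |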

This structure consists of a list of all unique words that appear in the documents, each of which is associated with at least one document. A structure like this, in which records are located by attribute value, is an inverted index, and a file that holds an inverted index is called an inverted file.
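
The construction described above can be sketched in a few lines of Python. This is a toy version under simplifying assumptions: the `tokenize` helper just lowercases and splits on whitespace, whereas a real analyzer does more, and the index lives in a plain dictionary rather than in Lucene's on-disk structures.

```python
from collections import defaultdict

# The three example documents, keyed by document id.
docs = {
    1: "Java is the best programming language",
    2: "PHP is the best programming language",
    3: "Javascript is the best programming language",
}


def tokenize(text):
    """Very simplified tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sort the terms so the dictionary has no duplicates and a stable order.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(term, doc_ids)
```

Looking up a term is now a single dictionary access, for example `index["php"]`, instead of a scan over every document.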

Convert the above table into a more intuitive picture to show the inverted index:

Several core terms need to be understood:

  1. Term: the smallest unit of storage and querying in the index. For English it is a word. For Chinese it generally means a word produced by word segmentation.
  2. Term dictionary: the collection of all terms. The usual indexing unit of a search engine is a word, and the term dictionary is the set of strings made up of all words that appear in the document collection. Each entry in the dictionary records some information about the word itself and a pointer to its posting list.
  3. Posting list (also called an inverted list): a document usually consists of multiple words. The posting list records which documents a given word appears in and where it appears. Each record is called a posting. A posting holds not only the document number but also information such as term frequency.
  4. Inverted file: the posting lists of all words are usually stored sequentially in a file on disk, called the inverted file. It is the physical file that stores the inverted index.

The term dictionary and the posting lists are two very important data structures in Lucene and are the cornerstone of fast retrieval. They are stored separately: the dictionary is kept in memory and the inverted files are stored on disk.

Relevance score: take the example above. A search for "Java is best" is split into three terms. Document one hits all three terms, while documents two and three hit only two ("is" and "best"), so document one has the highest relevance score. Matching documents are returned sorted from high to low by relevance score.
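
The ranking in this example can be imitated with a deliberately naive Python sketch that scores each document by the number of query terms it contains. This is only an illustration of the idea: Elasticsearch's real scoring (BM25 by default) also weighs how rare each term is and how long the document is. The function names `score` and `search` are made up for this example.

```python
# The three example documents, keyed by document id.
docs = {
    1: "Java is the best programming language",
    2: "PHP is the best programming language",
    3: "Javascript is the best programming language",
}


def score(query, text):
    """Naive relevance score: how many query terms occur in the text."""
    words = set(text.lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def search(query, docs):
    """Return (doc_id, score) pairs with score > 0, highest score first."""
    scored = [(doc_id, score(query, text)) for doc_id, text in docs.items()]
    hits = [(doc_id, s) for doc_id, s in scored if s > 0]
    # Sort by descending score, breaking ties by document id.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


results = search("Java is best", docs)
print(results)  # document 1 comes first with three matching terms
```

Note that "Javascript" does not count as a hit for "Java", because terms are compared as whole words.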

4. Use Docker to install Elasticsearch

To install Docker, you can follow my other article from top to bottom:

Introduction and installation of docker

4.1. Download the image file

First, download these two images:

docker pull elasticsearch:7.4.2   # engine for storing and retrieving data

docker pull kibana:7.4.2          # tool for visualizing and searching the data

4.2. Create an instance and start es

mkdir -p /mydata/elasticsearch/config

mkdir -p /mydata/elasticsearch/data

mkdir -p /mydata/elasticsearch/plugins

# Allow access from any remote host
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml

# Make sure the container can read and write the mounted directories
chmod -R 777 /mydata/elasticsearch/

# 9200 serves HTTP (REST) requests; 9300 is used for communication between ES cluster nodes.
# discovery.type=single-node runs a single node; ES_JAVA_OPTS sets the memory ES uses at startup.
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx128m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:7.4.2

If docker ps shows that ES did not start successfully, run chmod -R 777 /mydata/elasticsearch/ again to make sure the files and subdirectories have read, write, and execute (rwx) permissions.

Then restart Elasticsearch. If an error is reported, check the container's startup log. Errors are usually caused by stray spaces or line breaks in the run command above.

docker logs <first three characters of the container id>

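Delete the container:

docker rm <container id>

docker ps -a    # view the ES container id

Once the file permissions are correct, start ES using its container id (mine, for example, is 13e30b6e7c1a). If an error is reported, check the logs and run the container again.

docker start 13e30b6e7c1a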

Use Postman to test Elasticsearch by accessing port 9200 of the virtual machine.
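
The same check can be scripted. The sketch below uses only the Python standard library; the helper names `es_root_url` and `fetch_es_info` are made up for this example, and the IP address in the commented usage is the sample virtual machine address used later in this article, so replace it with your own. It assumes the node answers a GET request on its root path with a JSON body, as Elasticsearch 7.x does.

```python
import json
import urllib.request


def es_root_url(host, port=9200):
    """Build the URL of the Elasticsearch root endpoint."""
    return f"http://{host}:{port}/"


def fetch_es_info(host, port=9200, timeout=5):
    """GET the root endpoint and return the decoded JSON body as a dict."""
    with urllib.request.urlopen(es_root_url(host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs a running node, so it is left commented out here):
# info = fetch_es_info("172.20.10.11")
# print(info["version"]["number"])
```

If the call raises a connection error, the container is probably not running or port 9200 is not reachable from your machine.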

5. Install Kibana

docker run --name kibana -e ELASTICSEARCH_HOSTS=http://172.20.10.11:9200 -p 5601:5601 \
-d kibana:7.4.2

172.20.10.11 is the IP address of your own virtual machine; use the ifconfig command to find it.

Access Kibana on port 5601 of the virtual machine

If the Kibana page loads, the installation was successful.

Origin blog.csdn.net/qq_45925197/article/details/132119185