ElasticSearch is one process node, or a group of process nodes on an independent network, and can be understood as an independently deployable application, i.e. a piece of middleware.
It provides search as an external service (over HTTP or the transport protocol).
Internally, it is a search database.
Terminology
Index = database
Type = table (the type concept is being phased out and is deprecated in ES 7)
Document = row
Comparison of relational database and ES terms:
Relational database      ElasticSearch
Database                 Index
Table                    Type
Row                      Document
Column                   Field
Schema                   Mapping
Index                    Everything is indexed
SQL                      Query DSL
SELECT * FROM Table …    GET http://…
UPDATE Table Set         PUT http://…
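To make the SQL vs Query DSL row concrete, here is a minimal Python sketch that builds, but does not send, the search request that roughly corresponds to a SQL query filtering on one column. The node address http://localhost:9200, the index name products and the field title are hypothetical placeholders. Note that a match query is full-text matching on analyzed words, not exact SQL equality.

```python
import json
import urllib.request

# Hypothetical node address; adjust for a real cluster.
ES_URL = "http://localhost:9200"

def build_search_request(index: str, field: str, value: str) -> urllib.request.Request:
    """Build (without sending) a Query DSL search, roughly
    SELECT * FROM <index> WHERE <field> matches <value>."""
    body = {"query": {"match": {field: value}}}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )

# "products" and "title" are hypothetical names for illustration.
req = build_search_request("products", "title", "phone")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it against a running node would be urllib.request.urlopen(req); the sketch stops at building the request so it runs without a cluster.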
Index
The database (or table) definition in search.
The index is built up as documents are written into it.
Tokenization (word segmentation)
The word is the most basic unit of search.
A tokenizer splits text into words.
Those words are used to build an inverted index.
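A minimal Python sketch of these two steps, tokenizing text and then building an inverted index from the tokens. The tokenizer here (lowercase, split on non-word characters) is a crude stand-in for an ES analyzer, not its real behavior, and the three documents are made up.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens.
    A crude stand-in for an ES analyzer."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map every token to the sorted IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}

# Hypothetical sample documents.
docs = {
    1: "Elasticsearch is a search engine",
    2: "A search engine builds an inverted index",
    3: "MySQL is a relational database",
}
index = build_inverted_index(docs)
print(tokenize(docs[1]))  # ['elasticsearch', 'is', 'a', 'search', 'engine']
print(index["search"])    # [1, 2]
```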
[Figure: search engine processing diagram]
Inverted index
Forward index: to answer a query, traverse every document and every field in each document to check whether it is a matching record.
Inverted index: keyed by word. For a given word, every document containing it can be found directly, so there is no need to scan all documents; only the words need to be searched.
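The difference between the two approaches can be sketched in Python on a hypothetical toy corpus. The forward search scans every field of every document at query time; the inverted approach does that work once, at indexing time, and a query becomes a single dictionary lookup. Whitespace splitting stands in for a real tokenizer.

```python
from collections import defaultdict

# Hypothetical toy corpus: each document has a couple of text fields.
docs = {
    1: {"title": "ES basics", "body": "inverted index and shards"},
    2: {"title": "MySQL basics", "body": "b-tree index and tables"},
    3: {"title": "Search tuning", "body": "scoring with tf idf"},
}

def forward_search(docs, word):
    """Forward approach: scan every document and every field in it."""
    hits = []
    for doc_id, fields in docs.items():
        if any(word in value.lower().split() for value in fields.values()):
            hits.append(doc_id)
    return hits

def build_inverted(docs):
    """Inverted approach: build word -> document IDs once, when indexing."""
    inverted = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields.values():
            for word in value.lower().split():
                inverted[word].add(doc_id)
    return inverted

def inverted_search(inverted, word):
    """Query time: one lookup, no scan over documents."""
    return sorted(inverted.get(word, set()))

inverted = build_inverted(docs)
print(forward_search(docs, "index"))       # [1, 2]
print(inverted_search(inverted, "index"))  # [1, 2]
```

Both return the same hits; the difference is that the forward search's cost grows with the number of documents on every query.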
TF-IDF scoring
Suppose a search for a word returns many documents. Which one is the better match? Answering that requires a scoring function.
TF (term frequency): how many times the word occurs in the document; the more often it occurs, the more relevant the document.
DF (document frequency): the number of documents that contain the word.
IDF (inverse document frequency): essentially the reciprocal of DF, commonly computed as log(N / DF), where N is the total number of documents; the rarer the word, the higher its weight.
A commonly used scoring formula: score = TF * IDF
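A worked Python sketch of TF * IDF on a hypothetical three-document corpus. It uses raw term counts for TF and log(N / DF) for IDF, one common textbook variant; ES itself scores with BM25 by default since version 5.0, a refinement of this idea, so the numbers here are illustrative only.

```python
import math
from collections import Counter

# Hypothetical pre-tokenized documents.
docs = {
    1: "elasticsearch search engine search",
    2: "search database",
    3: "relational database",
}

def tf(term: str, text: str) -> int:
    """Raw term frequency: occurrences of the term in this document."""
    return Counter(text.split())[term]

def idf(term: str, docs: dict[int, str]) -> float:
    """log(N / DF); 0.0 if no document contains the term."""
    df = sum(1 for text in docs.values() if term in text.split())
    return math.log(len(docs) / df) if df else 0.0

def score(term: str, docs: dict[int, str]) -> dict[int, float]:
    """TF * IDF score of every document for a single-term query."""
    weight = idf(term, docs)
    return {doc_id: tf(term, text) * weight for doc_id, text in docs.items()}

ranked = sorted(score("search", docs).items(), key=lambda kv: kv[1], reverse=True)
# Doc 1 (TF = 2) ranks above doc 2 (TF = 1); doc 3 scores 0.
print(ranked)
```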
Two, ElasticSearch installation
ElasticSearch
Introduction: ElasticSearch is a distributed, JSON-based search and analytics engine.
Sharding is done per index. Think of an index as an inverted index plus the stored documents. When the index and its documents exceed the disk capacity of a single machine, the index has to be split into shards. By default, creating an index allocates a single primary shard (the default since ES 7.0; earlier versions used 5), and all documents are indexed into that shard.
Primary and replica (master-slave)
Each primary (master) shard corresponds to a replica (slave) shard; by default there is one replica per primary.
Routing
Requests need routing information to locate the primary or replica shard that holds a given document.
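A minimal Python sketch of the routing rule, in the simplified form ES documents: shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. ES uses a Murmur3 hash; zlib.crc32 is only a stand-in here, and the shard count and IDs are hypothetical.

```python
import zlib

def route(routing_value: str, number_of_shards: int) -> int:
    """Pick the primary shard for a document.
    crc32 stands in for the Murmur3 hash ES actually uses."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards

NUMBER_OF_SHARDS = 3  # hypothetical primary shard count
for doc_id in ["doc-1", "doc-2", "doc-3", "doc-4"]:
    print(doc_id, "-> shard", route(doc_id, NUMBER_OF_SHARDS))
```

The same ID always maps to the same shard, which is one reason the primary shard count cannot simply be changed after an index is created: a different modulus would send existing documents to different shards.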
number_of_shards: the number of primary shards; they handle write operations (and also serve reads).
number_of_replicas: the number of replica (backup) copies of each primary shard; replicas serve read operations.
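A sketch of setting both values when creating an index, built in Python but not sent. The request shape (PUT /<index> with a settings object) follows the ES create-index API; the node address and the index name my_index are hypothetical.

```python
import json
import urllib.request

# Hypothetical node address.
ES_URL = "http://localhost:9200"

def create_index_request(name: str, shards: int, replicas: int) -> urllib.request.Request:
    """Build (without sending) a create-index request: PUT /<name> with settings."""
    body = {
        "settings": {
            "number_of_shards": shards,      # primary shards, fixed at creation
            "number_of_replicas": replicas,  # replica copies of each primary
        }
    }
    return urllib.request.Request(
        f"{ES_URL}/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

req = create_index_request("my_index", shards=3, replicas=1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With these settings the index holds 3 primaries plus 3 replicas, 6 shard copies in total. number_of_replicas can be changed on a live index; number_of_shards cannot be changed in place after creation.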
A read request can be served directly by a replica shard without going through the primary. If the node that receives the request holds no copy of the relevant shard, it forwards the request to a node that does.
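A toy Python model of the read path just described, on a hypothetical three-node layout: serve locally if the receiving node holds any copy (primary or replica) of the shard, otherwise forward to a node that does. Real ES chooses among shard copies with its own selection logic (adaptive replica selection in recent versions), so this only illustrates the rule as stated here.

```python
# Hypothetical cluster layout: node -> {shard number: "primary" or "replica"}.
CLUSTER = {
    "node-a": {0: "primary", 1: "replica"},
    "node-b": {1: "primary", 2: "replica"},
    "node-c": {2: "primary", 0: "replica"},
}

def nodes_holding(shard: int) -> list[str]:
    """All nodes holding a copy (primary or replica) of the shard."""
    return [node for node, shards in CLUSTER.items() if shard in shards]

def handle_read(receiving_node: str, shard: int) -> str:
    """Return the node that serves the read under the toy rule above."""
    if shard in CLUSTER[receiving_node]:
        return receiving_node  # a local replica is fine; no trip to the primary
    holders = nodes_holding(shard)
    if not holders:
        raise LookupError(f"no copy of shard {shard} in the cluster")
    return holders[0]  # forward to a node that has a copy

print(handle_read("node-c", 0))  # node-c serves shard 0 from its replica
print(handle_read("node-b", 0))  # node-b has no copy, so it forwards to node-a
```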