Elasticsearch basics

1. About Elasticsearch

What is Elasticsearch?

  • A single node process, or a group of node processes on an independent network. It can be thought of as an independently deployable application, i.e. a piece of middleware
  • Provides search as a service to external clients (over HTTP or the transport protocol)
  • Internally, it is a search database

Terminology

  • Index = database
  • Type = table; mapping types are deprecated in ES 7 and removed in ES 8
  • Document = row
  • Comparison of relational database and ES terms:
Relational database     Elasticsearch
-------------------     -------------
Database                Index
Table                   Type
Row                     Document
Column                  Field
Schema                  Mapping
Index                   Everything is indexed
SQL                     Query DSL
SELECT * FROM Table …   GET http://…
UPDATE Table SET        PUT http://…

Index

  • The search-side equivalent of a database (or table) definition
  • The index is built as documents are written into it

Tokenization (word segmentation)

  • Search treats the word (term) as its most basic unit
  • A tokenizer splits text into words (terms)
  • The resulting terms are used to build the inverted index
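To make the idea concrete, here is a minimal Python sketch of a tokenizer. It is a deliberate simplification, not Elasticsearch's analyzer: it only lowercases the text and keeps runs of word characters, whereas the real standard analyzer follows Unicode text segmentation rules. The function name tokenize is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical, simplified tokenizer (not the ES standard analyzer):
    # lowercase the text, then keep each run of word characters
    # (letters, digits, underscore) as one term.
    return re.findall(r"\w+", text.lower())


print(tokenize("Elasticsearch builds an Inverted-Index!"))
# ['elasticsearch', 'builds', 'an', 'inverted', 'index']
```

Punctuation and hyphens disappear and case is normalized, so "Inverted-Index" and "inverted index" produce the same terms and therefore hit the same inverted index entries.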

Search engine processing diagram

Inverted index

  • Forward index: maps each document to the terms it contains. A search has to traverse every document, and every field of each document, to decide whether it is a matching record
  • Inverted index: organized by term. Each term maps to all the documents that contain it, so a search looks the term up instead of traversing all the documents; only the term dictionary has to be searched
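The difference can be sketched in a few lines of Python. The three sample documents and the helper names are hypothetical, and the inverted index here is just a dict from term to a set of document ids, a toy version of the posting lists a real engine keeps on disk.

```python
import re
from collections import defaultdict

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Elasticsearch is a search engine",
    2: "An inverted index maps terms to documents",
    3: "A forward index maps documents to terms",
}


def tokenize(text):
    # Simplified tokenizer: lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


# Forward index: document id -> list of terms in that document.
forward_index = {doc_id: tokenize(text) for doc_id, text in docs.items()}


def search_forward(term):
    # Has to scan the term list of every single document.
    return sorted(doc_id for doc_id, terms in forward_index.items() if term in terms)


# Inverted index: term -> set of ids of the documents containing it.
inverted_index = defaultdict(set)
for doc_id, terms in forward_index.items():
    for term in terms:
        inverted_index[term].add(doc_id)


def search_inverted(term):
    # A single dictionary lookup; no scan over the documents.
    return sorted(inverted_index.get(term, ()))


print(search_inverted("index"))  # [2, 3]
```

Both functions return the same ids, but the cost of search_forward grows with the number of documents, while search_inverted only depends on looking up one term.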

TF-IDF scoring

  • Suppose a search for a word matches a whole set of documents. Which one is the better match? Answering that requires scoring logic.
  • TF (term frequency): how many times the term appears in the document; the more often it appears, the more relevant the document is
  • DF (document frequency): the total number of documents that contain the term
  • IDF (inverse document frequency): based on the reciprocal of DF, usually computed as log(N / DF), where N is the total number of documents, so rarer terms weigh more
  • A commonly used scoring formula: TF * IDF (since Elasticsearch 5.0 the default similarity is BM25, a refinement of TF-IDF)
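A minimal Python sketch of TF * IDF for a single-term query, assuming raw counts for TF and the common IDF = log(N / DF) variant. It is the classic textbook formula rather than Elasticsearch's BM25 scoring, and the sample documents are made up.

```python
import math
import re


def tokenize(text):
    # Simplified tokenizer: lowercase, keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def tf_idf_scores(term, docs):
    """Score each document for a single-term query with classic TF * IDF.

    TF  = number of times the term occurs in the document
    IDF = log(N / DF), N = total documents, DF = documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    df = sum(1 for terms in tokenized if term in terms)
    if df == 0:
        # The term occurs nowhere, so no document gets any credit.
        return [0.0] * len(docs)
    idf = math.log(len(docs) / df)
    return [terms.count(term) * idf for terms in tokenized]


docs = [
    "search engine search index",
    "search engine",
    "relational database",
]
print(tf_idf_scores("search", docs))
```

The first document ranks highest because it contains "search" twice. A term that appears in every document gets IDF = log(1) = 0, so very common words contribute nothing to the ranking.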

2. Elasticsearch installation

Elasticsearch

Kibana

3. Distributed principles

Sharding

  • Sharding is done per index. Think of an index as an inverted index plus the document data. When the index and its documents grow beyond the disk capacity of a single machine, the index has to be split into shards. In ES 7 and later, a new index gets one primary shard by default, so all of its documents are indexed into that single shard

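Primary and replica shards (master-slave)

  • Each primary shard can have zero or more replica shards (set by number_of_replicas); a replica is never allocated on the same node as its primary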

Routing

  • Requests need routing information to find the primary or replica shard that holds the data
  • number_of_shards: the number of primary shards, which handle write operations (and also serve reads)
  • number_of_replicas: the number of replica shards per primary shard, used to serve read operations
  • A read request can be served directly by a replica without going through the primary. If the node that receives the request holds no copy of the relevant shard, it forwards the request to a node that does
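How a document lands on a particular primary shard can be sketched with the simplified routing formula shard_num = hash(_routing) % num_primary_shards, where _routing defaults to the document's _id. In the Python sketch below, zlib.crc32 stands in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster; route_to_shard is a hypothetical helper.

```python
import zlib


def route_to_shard(routing: str, num_primary_shards: int) -> int:
    # Simplified routing: shard_num = hash(_routing) % num_primary_shards.
    # ASSUMPTION: zlib.crc32 is only a stand-in for Elasticsearch's Murmur3
    # hash, so the numbers differ from what a real cluster would compute.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# _routing defaults to the document _id.
for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "->", route_to_shard(doc_id, 3))
```

Because the result depends on the number of primary shards, that number is fixed when the index is created: changing it would send existing documents to different shards, so resizing requires reindexing or the split/shrink APIs.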

For example

PUT /test
{
  "settings": {
    "number_of_shards": 1, // create 1 primary shard
    "number_of_replicas": 0 // create 0 replica shards
  }
}
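The same request can be built from Python with only the standard library. This sketch assumes a node at http://127.0.0.1:9200; the helper names are hypothetical. The request is only constructed here, and passing it to urllib.request.urlopen would actually send it. total_shard_copies shows the shard arithmetic: every primary plus number_of_replicas copies of it.

```python
import json
import urllib.request


def build_create_index_request(host, index, shards, replicas):
    # Hypothetical helper: builds (but does not send) a PUT /<index> request.
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{host}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def total_shard_copies(shards, replicas):
    # Each primary shard plus its replicas.
    return shards * (1 + replicas)


req = build_create_index_request("http://127.0.0.1:9200", "test", 1, 0)
print(req.get_method(), req.full_url)  # PUT http://127.0.0.1:9200/test
# urllib.request.urlopen(req) would send it to a running node.
```

With the settings above the index has exactly one shard copy; with 3 primaries and 1 replica it would have 6, which needs at least two nodes because a replica cannot share a node with its primary.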

Build a cluster

  • Start three nodes (here all on one machine, so they are told apart by port)
  • Modify each node's configuration file: config/elasticsearch.yml
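cluster.name: dianping-app # identical on every node of the same cluster
node.name: node-1 # must be unique for each node in the cluster
network.host: 127.0.0.1 # IP the node binds to; when several nodes share one machine, tell them apart by port
http.port: 9200 # HTTP port for client requests
transport.tcp.port: 9300 # node-to-node communication: the three nodes use each other's transport port for cluster coordination and command exchange (ES 7+ prefers the name transport.port)
http.cors.enabled: true # allow cross-origin requests from front-end tools
http.cors.allow-origin: "*"
discovery.seed_hosts: ["127.0.0.1:9300", "127.0.0.1:9301", "127.0.0.1:9302"] # addresses used to discover the other cluster nodes
cluster.initial_master_nodes: ["127.0.0.1:9300", "127.0.0.1:9301", "127.0.0.1:9302"] # master-eligible nodes for the first master election when the cluster is bootstrapped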
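Since the three files differ only in node.name and the ports, they can be generated. A Python sketch, assuming node-2 and node-3 use HTTP ports 9201 and 9202 (hypothetical; only node-1's file is shown above) and transport ports 9301 and 9302 to match discovery.seed_hosts. node_config is a hypothetical helper.

```python
SEED_HOSTS = ["127.0.0.1:9300", "127.0.0.1:9301", "127.0.0.1:9302"]


def node_config(i: int) -> str:
    """Render elasticsearch.yml for node i (0, 1, 2) of a one-machine, three-node cluster."""
    seeds = ", ".join(f'"{h}"' for h in SEED_HOSTS)
    lines = [
        "cluster.name: dianping-app",           # same on every node
        f"node.name: node-{i + 1}",             # unique per node
        "network.host: 127.0.0.1",
        f"http.port: {9200 + i}",               # ASSUMPTION: 9200, 9201, 9202
        f"transport.tcp.port: {9300 + i}",      # matches SEED_HOSTS
        "http.cors.enabled: true",
        'http.cors.allow-origin: "*"',
        f"discovery.seed_hosts: [{seeds}]",
        f"cluster.initial_master_nodes: [{seeds}]",
    ]
    return "\n".join(lines) + "\n"


for i in range(3):
    print(f"# config/elasticsearch.yml for node-{i + 1}")
    print(node_config(i))
```

Keeping cluster.name and the seed lists in one place avoids the most common mistake when copying the file by hand: nodes that silently form separate clusters because one setting differs.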

Origin blog.csdn.net/qq_36221788/article/details/109704278