Elasticsearch data organization

Data organization is described here from the following two aspects:

  Logical design: document, type, index

  Physical design: nodes and shards

       Inverted index

Foreword

Logical design: to keep the comparison objective, we can set Elasticsearch side by side with a relational database.

Relational DB    Elasticsearch
Databases        Indices
Tables           Types
Rows             Documents
Columns          Fields

An Elasticsearch cluster may contain multiple indices (databases), each index may contain multiple types (tables), each type may contain multiple documents (rows), and each document may contain multiple fields (columns).

Note: In earlier versions, an index could hold multiple mapping types, and each document was assigned to one of them. The mapping type indicated the kind of document or entity being indexed. This also caused problems, so from version 6.0.0 an index can contain only one mapping type. Mapping types were deprecated in 7.0.0 and removed completely in 8.0.0.

Logical Design: document, type, index

Document Properties

  Elasticsearch is document-oriented: the document is the smallest unit it operates on.

  Self-contained: a document holds both its fields and their values, in key: value form.

  Hierarchical: a document can contain other documents as field values.

  Flexible structure: in a relational database a table's columns are designed in advance, whereas in Elasticsearch a document may omit a field or add a new one dynamically. (This can lead to dirty data, so it is better to agree on the fields up front and not change them afterwards.)

  Schemaless: fields do not have to be declared with a type in advance, and a field's value may be of any type.
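
The properties above can be sketched with plain Python dictionaries standing in for JSON documents. The field names (name, age, address, email) and values are invented for illustration and do not come from any real index.

```python
import json

# Self-contained: the document carries both field names and values.
doc = {
    "name": "Alex",
    "age": 18,
    # Hierarchical: a field's value can itself be an object.
    "address": {"city": "Beijing", "street": "Main St"},
}

# Flexible structure: another document may omit fields or add a new one.
other = {"name": "Bob", "email": "bob@example.com"}

# Documents are exchanged with Elasticsearch as JSON.
payload = json.dumps(doc, sort_keys=True)
print(payload)

# Fields that appear in only one of the two documents.
print(sorted(set(doc) ^ set(other)))
```

The second print lists the fields on which the two documents disagree, which is exactly the kind of drift that can turn into dirty data if fields are not agreed on in advance.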

Type

  A type is a logical container for documents, just as a table is a container for rows in a relational database.

  The definition of the fields in a type is called the mapping; for example, a name field is mapped to a string type.
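
As a hedged illustration, a mapping might look like the dictionary below. The field names (name, age) are hypothetical. "text" and "integer" are real Elasticsearch field types ("text" is the full-text string type in 5.0 and later). The conforms helper is a toy check written for this example, not something Elasticsearch provides.

```python
# Hypothetical mapping: the field names are made up for this example.
mapping = {
    "properties": {
        "name": {"type": "text"},
        "age": {"type": "integer"},
    }
}

# Toy lookup only: Elasticsearch itself does coercion and much more.
PYTHON_TYPES = {"text": str, "integer": int}

def conforms(doc, mapping):
    """Return True if every mapped field present in doc has the expected Python type."""
    for field, spec in mapping["properties"].items():
        expected = PYTHON_TYPES[spec["type"]]
        if field in doc and not isinstance(doc[field], expected):
            return False
    return True

print(conforms({"name": "Alex", "age": 18}, mapping))
print(conforms({"name": "Alex", "age": "eighteen"}, mapping))
```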

Index

  An index is a container for mapping types. It is a very large collection of documents, which are stored across its shards.

Physical Design: nodes and shards

Node

  A cluster contains at least one node, and a node is an Elasticsearch process. A node can hold multiple indices.

  By default (in versions before 7.0.0), a newly created index consists of five shards (primary shards), and each primary shard has one copy (a replica shard), so there are 10 shards in total. From 7.0.0 onward the default is one primary shard with one replica.
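
The shard count is simple arithmetic: every primary shard plus its replicas. A minimal sketch, in which the function name total_shards is made up for this example (the two inputs correspond to the index settings number_of_shards and number_of_replicas):

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shard copies in an index: each primary plus its replicas."""
    return primaries * (1 + replicas_per_primary)

# Five primaries with one replica each, as in the text.
print(total_shards(5, 1))
```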

The figure shows a cluster of three nodes. A primary shard and its replica are never placed on the same node, so if one node goes down, no data is lost.
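
To see why this placement protects the data, here is a toy simulation. The round-robin placement in allocate is an assumption made for this sketch and is not Elasticsearch's actual allocation algorithm; the function names are hypothetical.

```python
def allocate(num_primaries, num_nodes):
    """Toy placement: primary i goes on node i % n, its replica on the next node."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes to separate a primary from its replica")
    layout = {node: [] for node in range(num_nodes)}
    for shard in range(num_primaries):
        primary_node = shard % num_nodes
        replica_node = (primary_node + 1) % num_nodes
        layout[primary_node].append(f"P{shard}")
        layout[replica_node].append(f"R{shard}")
    return layout

def survives_loss(layout, lost_node, num_primaries):
    """True if every shard still has a copy (primary or replica) on a remaining node."""
    remaining = {
        copy[1:]
        for node, copies in layout.items()
        if node != lost_node
        for copy in copies
    }
    return remaining == {str(s) for s in range(num_primaries)}

layout = allocate(5, 3)
print(layout)
# Losing any single node still leaves one copy of every shard.
print(all(survives_loss(layout, node, 5) for node in layout))
```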

A shard is a Lucene index: a directory of files that contains an inverted index. The inverted index structure lets Elasticsearch tell you which documents contain a specific keyword without scanning all the documents.

Inverted index

Elasticsearch uses a structure called an inverted index, with Lucene's inverted index as the underlying implementation. This structure is well suited to fast full-text search. An inverted index consists of a list of all the unique terms that appear in any document and, for each term, a list of the documents that contain it.

Study every day, good good up to forever         # content of document 1
To forever, study every day, good good up        # content of document 2

term      doc_1   doc_2
Study     ✓       ×
To        ×       ✓
every     ✓       ✓
forever   ✓       ✓
day       ✓       ✓
study     ×       ✓
good      ✓       ✓
to        ✓       ×
up        ✓       ✓
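
A minimal sketch of how such a table could be built from the two example documents. It keeps terms case-sensitive so that Study and study stay separate, as in the table; the function name build_inverted_index is made up for this example.

```python
import re
from collections import defaultdict

docs = {
    1: "Study every day, good good up to forever",
    2: "To forever, study every day, good good up",
}

def build_inverted_index(docs):
    """Map each unique term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text):  # split on non-word characters
            index[term].add(doc_id)
    return index

index = build_inverted_index(docs)
for term in sorted(index):
    print(term, sorted(index[term]))
```

Elasticsearch's standard analyzer lowercases terms, so in practice Study and study would normally collapse into a single entry.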


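If we search for "to forever", we only need to look at the documents that contain each term:

term      doc_1   doc_2
to        ✓       ×
forever   ✓       ✓
total     2       1

Both documents match, but doc_1 matches more of the query terms, so it is the better match.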
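
The lookup can be sketched as follows. The postings are written out by hand for the two example documents, and the ranking is a plain count of matching terms; Elasticsearch's real relevance scoring is more involved. The function name search is hypothetical.

```python
from collections import Counter

# Case-sensitive postings for the two example documents (term -> document ids).
index = {
    "Study": {1}, "study": {2}, "To": {2}, "to": {1},
    "every": {1, 2}, "day": {1, 2}, "good": {1, 2},
    "up": {1, 2}, "forever": {1, 2},
}

def search(index, query):
    """Count, per document, how many query terms it contains; best match first."""
    hits = Counter()
    for term in query.split():
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return hits.most_common()

# Each pair is (document id, number of query terms matched).
print(search(index, "to forever"))
```

Only the postings for the query terms are touched; no document text is scanned.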

Elasticsearch index versus Lucene index

  An Elasticsearch index is divided into shards, and each shard is a Lucene index. An Elasticsearch index is therefore made up of multiple Lucene indexes.
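
A toy model of this relationship, in which each Shard object stands in for one Lucene index. The class names and the modulo routing by integer id are simplifications assumed for this sketch; real Elasticsearch routes documents by hashing a routing value.

```python
from collections import defaultdict

class Shard:
    """Stands in for one Lucene index: it owns a private inverted index."""
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return self.postings.get(term.lower(), set())

class ShardedIndex:
    """Stands in for an Elasticsearch index: a group of shards."""
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # Simplified routing (assumption): pick a shard by integer id.
        self.shards[doc_id % len(self.shards)].add(doc_id, text)

    def search(self, term):
        # Ask every shard, then merge the partial results.
        result = set()
        for shard in self.shards:
            result |= shard.search(term)
        return result

idx = ShardedIndex(num_shards=2)
idx.add(1, "study every day")
idx.add(2, "good good up")
idx.add(3, "up to forever")
print(sorted(idx.search("up")))
```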


Origin www.cnblogs.com/Alexephor/p/11387172.html