Data organization is covered from the following two aspects:
Logical design: document, type, index
Physical design: nodes and shards
Inverted index
Foreword
Logical design: an objective comparison of Elasticsearch with a relational database
Relational DB | Elasticsearch |
Databases | Indices |
Tables | Types |
Rows | Documents |
Columns | Fields |
An Elasticsearch cluster may contain multiple indices (databases); each index may contain multiple types (tables); each type may contain multiple documents (rows); and each document may contain multiple fields (columns).
Note: In earlier versions, an index could hold multiple mapping types, and every stored document was assigned one of them. The mapping type indicated the kind of document or entity being indexed. This also caused problems, so from version 6.0.0 an index can contain only one mapping type. Mapping types were deprecated in 7.0.0 and completely removed in 8.0.0.
Logical design: document, type, index
Document Properties
Elasticsearch is document-oriented: the smallest unit it indexes and searches is the document.
Self-contained: a document holds both its fields and their values, in key: value form.
Hierarchical: a document can contain other documents nested inside it.
Flexible structure: in a relational database, a table's columns must be designed in advance before the table can be used. In Elasticsearch it is sometimes possible to omit a field or to add one dynamically (this may lead to dirty data, so it is safer to define the fields up front and not change them afterwards).
Schemaless: the value of a field may be of any type.
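The properties above can be illustrated with a plain Python dictionary standing in for a document. This is only a sketch: the field names and values (`name`, `address`, `email`, and so on) are hypothetical, and JSON is used because Elasticsearch exchanges documents as JSON.

```python
import json

# A hypothetical user document: every field carries its own key and value.
doc = {
    "name": "Alice",
    "age": 30,
    "address": {  # hierarchical: an object nested inside the document
        "city": "Beijing",
        "street": "1 Example Road",
    },
    "tags": ["search", "notes"],
}

# Flexible structure: a field is added without changing any schema first.
doc["email"] = "alice@example.com"

# Round-trip through JSON, the format in which documents are sent and stored.
serialized = json.dumps(doc, sort_keys=True)
restored = json.loads(serialized)
print(restored["address"]["city"])
```

The dictionary needs no table definition to exist, which is exactly why unplanned fields can creep in and produce dirty data.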
Type
A type is a logical container for documents, just as a table is a container for rows in a relational database.
The definition of the fields in a type is called a mapping; for example, name is mapped to the string type.
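As a rough sketch of what a mapping looks like, the dictionary below follows the `mappings` / `properties` shape used by typeless indices in Elasticsearch 7.x and later, with `text` standing in for the string type. The `age` field, the `PYTHON_TYPES` table, and the `conforms` helper are assumptions added for illustration; they are not part of Elasticsearch.

```python
# Mapping body: each field name is mapped to a field type.
mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "age": {"type": "integer"},
        }
    }
}

# Toy lookup from mapping type names to Python types (illustrative only).
PYTHON_TYPES = {"text": str, "keyword": str, "integer": int}


def conforms(document, mapping):
    """Return True if every mapped field present in the document has the expected Python type."""
    properties = mapping["mappings"]["properties"]
    for field, spec in properties.items():
        if field in document and not isinstance(document[field], PYTHON_TYPES[spec["type"]]):
            return False
    return True


print(conforms({"name": "Alice", "age": 30}, mapping))
print(conforms({"name": 42}, mapping))
```

The check only mimics the idea that a mapping fixes the type of each field; the real server performs its own validation and type coercion.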
Index
An index is a container for mapping types. It is a very large collection of documents, and those documents are stored across the index's shards.
Physical design: nodes and shards
Node
A cluster contains at least one node, and a node is an Elasticsearch process. A node can hold multiple indices.
By default, a newly created index consists of five shards (primary shards), and each primary shard has one copy (a replica shard), so there are 10 shards in total. This default applies to versions before 7.0; from 7.0 onward a new index gets one primary shard by default.
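The shard count is simple arithmetic: every primary shard exists once, plus once more for each replica. The sketch below works it out from a settings dictionary that uses the `number_of_shards` and `number_of_replicas` setting names; the `total_shards` helper is a hypothetical name introduced here.

```python
def total_shards(primary_shards: int, replicas_per_primary: int) -> int:
    """Each primary shard exists once, plus once more per replica."""
    return primary_shards * (1 + replicas_per_primary)


# Index settings body with the values described in the text.
index_settings = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
    }
}

s = index_settings["settings"]
print(total_shards(s["number_of_shards"], s["number_of_replicas"]))  # prints 10
```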
The figure shows a cluster of three nodes. A primary shard and its corresponding replica are never placed on the same node, so if one node goes down, no data is lost.
A shard is a Lucene index: a directory of files that contains an inverted index. The inverted index structure lets Elasticsearch tell you which documents contain a specific keyword without scanning all the documents.
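To see why keeping a primary and its replica on different nodes protects the data, here is a toy placement model. It is not Elasticsearch's actual allocator: the round-robin rule and the `allocate` and `survives_loss` helpers are assumptions made purely for illustration.

```python
def allocate(num_primaries: int, num_nodes: int) -> dict:
    """Place primary i on node i % n and its replica on the next node."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes to separate primary and replica")
    layout = {node: [] for node in range(num_nodes)}
    for shard in range(num_primaries):
        layout[shard % num_nodes].append(f"P{shard}")
        layout[(shard + 1) % num_nodes].append(f"R{shard}")
    return layout


def survives_loss(layout: dict, lost_node: int) -> bool:
    """True if every shard number still has a copy on some remaining node."""
    all_shards = {s[1:] for shards in layout.values() for s in shards}
    remaining = {
        s[1:]
        for node, shards in layout.items()
        if node != lost_node
        for s in shards
    }
    return remaining == all_shards


layout = allocate(num_primaries=5, num_nodes=3)
for node, shards in layout.items():
    print(node, shards)
print(all(survives_loss(layout, node) for node in layout))
```

With five primaries and three nodes, losing any single node still leaves one copy of every shard on the other two.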
Inverted index
Elasticsearch uses a structure called an inverted index, with Lucene's inverted index as the underlying layer. This structure is well suited to fast full-text search. An inverted index consists of a list of all the unique terms that appear in any document and, for each term, a list of the documents that contain it.
Study every day, good good up to forever # content of document 1
To forever, study every day, good good up # content of document 2
term | doc_1 | doc_2 |
Study | √ | × |
To | × | √ |
every | √ | √ |
forever | √ | √ |
day | √ | √ |
study | × | √ |
good | √ | √ |
to | √ | × |
up | √ | √ |
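The table above can be reproduced with a few lines of Python. This is a minimal sketch, not how Lucene builds its index: the `tokenize` and `build_inverted_index` helpers are hypothetical, and tokens are kept case-sensitive so that `Study` and `study` remain separate terms as in the table (real analyzers usually lowercase terms).

```python
import re
from collections import defaultdict

docs = {
    1: "Study every day, good good up to forever",
    2: "To forever, study every day, good good up",
}


def tokenize(text: str) -> list:
    """Split on anything that is not a letter, keeping the original case."""
    return re.findall(r"[A-Za-z]+", text)


def build_inverted_index(docs: dict) -> dict:
    """Map each unique term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


index = build_inverted_index(docs)
for term in sorted(index):
    print(term, sorted(index[term]))
```

Because each term maps to a set, a word that occurs twice in one document (such as `good`) still yields a single entry.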
Now suppose we search for "to forever". We only need to look up the documents that contain each term:
term | doc_1 | doc_2 |
to | √ | × |
forever | √ | √ |
total | 2 | 1 |
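The lookup can be sketched as a match count per document, which is one simple way to read the total row. The `search` helper and the hard-coded `index` (copied from the term table) are assumptions for illustration; Elasticsearch's real relevance scoring is more involved than counting matched terms.

```python
# Inverted index copied from the term table (case-sensitive terms).
index = {
    "Study": {1}, "To": {2}, "every": {1, 2}, "forever": {1, 2},
    "day": {1, 2}, "study": {2}, "good": {1, 2}, "to": {1}, "up": {1, 2},
}


def search(index: dict, query: str) -> dict:
    """Count, per document, how many query terms it contains."""
    scores = {}
    for term in query.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return scores


scores = search(index, "to forever")
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

Only the postings for the two query terms are touched; none of the documents is scanned.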
Elasticsearch index vs. Lucene index
In Elasticsearch, an index is divided into multiple shards, and each shard is a Lucene index. An Elasticsearch index is therefore made up of multiple Lucene indexes.