Elasticsearch Series: A Comprehensive Understanding of Documents

Overview

This part covers document metadata and explains the basic syntax for working with documents.

Core document metadata

The earlier hands-on examples introduced document data; this time we take a closer look at its core metadata. A query response (for example, to GET /music/children/1) looks like this:

{
  "_index": "music",
  "_type": "children",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "name": "gymbo",
    "content": "I hava a friend who loves smile, gymbo is his name",
    "language": "english",
    "length": "75"
  }
}

_index metadata
_index indicates which index a document is stored in. By convention within a project, documents with a similar structure go into the same index and different kinds of data go into different indexes. Documents in the same index therefore share essentially the same structure, with individual documents having at most a field more or less, which lets Elasticsearch make the best use of disk storage.
Each index has its own independent shard storage files, and indexes are independent of one another.

Naming convention: index names must be lowercase and cannot start with '_', '-', or '+'.
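These naming rules can be checked on the client before sending a create-index request. The sketch below is a minimal Python version; is_valid_index_name is a hypothetical helper, not part of any Elasticsearch client, and the checks beyond the ones listed above (forbidden characters, "." and "..", the 255-byte limit) come from the official index-naming rules, so verify them against your ES version.

```python
# Sketch of the index-naming rules. The extra checks (forbidden characters,
# "." and "..", 255-byte limit) follow the official index-naming rules;
# verify them against the docs for your ES version.
FORBIDDEN_PREFIXES = ("_", "-", "+")
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` passes the index-naming checks."""
    if not name or name in (".", ".."):
        return False
    if name != name.lower():                  # must be all lowercase
        return False
    if name.startswith(FORBIDDEN_PREFIXES):   # cannot start with _, - or +
        return False
    if any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    return len(name.encode("utf-8")) <= 255   # the limit is in bytes, not characters


for candidate in ["music", "Music", "_music", "-music", "+music", "my music"]:
    print(candidate, is_valid_index_name(candidate))
```

Only "music" passes; the others fail on case, a forbidden leading character, or the space.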

_type metadata
Starting with ES 6.0.0, an index can contain only one type; whichever type is specified first becomes the index's type.

Naming convention: type names cannot start with '_'. Since an index now holds only one type, you can simply use the officially recommended '_doc'.
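To make the one-type-per-index rule concrete, here is a small in-memory model. SingleTypeIndex is a hypothetical class for illustration only; it assumes the 6.x behaviour described above, where the first type written fixes the index's type, and that only '_doc' may start with an underscore.

```python
class SingleTypeIndex:
    """Hypothetical in-memory model of an ES 6.x index that allows one type."""

    def __init__(self, name: str):
        self.name = name
        self.type_name = None   # fixed by the first write
        self.docs = {}          # doc_id -> source

    def put(self, type_name: str, doc_id: str, source: dict) -> None:
        # Assumed naming rule: no leading '_' except the recommended '_doc'.
        if type_name.startswith("_") and type_name != "_doc":
            raise ValueError(f"type name [{type_name}] cannot start with '_'")
        if self.type_name is None:
            self.type_name = type_name          # the first type specified wins
        elif type_name != self.type_name:
            raise ValueError(
                f"index [{self.name}] already has type [{self.type_name}]"
            )
        self.docs[doc_id] = dict(source)


music = SingleTypeIndex("music")
music.put("children", "1", {"name": "gymbo"})
print(music.type_name)  # children
```

A later put("adult", ...) on the same index raises ValueError, because "children" was specified first.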

_id metadata
_id uniquely identifies a document; together with _index it identifies and locates exactly one document. It can be specified manually or generated automatically by ES.

_version metadata
ES uses _version for optimistic locking when writing documents. The version number starts at 1 and is automatically incremented by 1 after each successful update.
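The optimistic-locking idea behind _version can be sketched with a tiny in-memory store. VersionedStore and VersionConflict are hypothetical names; the store is keyed by the (_index, _id) pair to mirror how the two together locate a document. In ES 6.x a client passes the expected version with the version request parameter; later releases moved to if_seq_no and if_primary_term.

```python
class VersionConflict(Exception):
    pass


class VersionedStore:
    """Hypothetical in-memory model of _version-based optimistic locking."""

    def __init__(self):
        # Keyed by (_index, _id): together they locate exactly one document.
        self._docs = {}

    def get(self, index: str, doc_id: str) -> dict:
        version, source = self._docs[(index, doc_id)]
        return {"_index": index, "_id": doc_id, "_version": version,
                "_source": dict(source)}

    def write(self, index, doc_id, source, expected_version=None) -> int:
        key = (index, doc_id)
        current_version = self._docs[key][0] if key in self._docs else 0
        # Optimistic lock: reject the write if someone changed the doc meanwhile.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}")
        new_version = current_version + 1   # first write -> 1, then +1 per write
        self._docs[key] = (new_version, dict(source))
        return new_version


store = VersionedStore()
store.write("music", "1", {"name": "gymbo"})                     # version 1
seen = store.get("music", "1")["_version"]                     # two clients both read 1
store.write("music", "1", {"name": "a"}, expected_version=seen)  # succeeds -> 2
try:
    store.write("music", "1", {"name": "b"}, expected_version=seen)
except VersionConflict as err:
    print("rejected:", err)
```

No lock is held between the read and the write; the second client simply finds that the version moved on and must re-read and retry.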

found metadata

Indicates whether the document was found: true if it was, false if not.

_source metadata
_source holds the JSON we sent in the HTTP request body when creating the document, that is, the business data. By default, a GET request returns it to the client in full and unchanged.

When retrieving documents, you can customize the returned result by listing the fields you want in _source. Example command:

GET /music/children/_search
{
  "query": {
    "match_all" : {}
  },
  "_source": ["name","content"]
}
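The effect of the _source field list can be mimicked client-side to see what comes back. filter_source is a hypothetical helper that handles only top-level field names (with * wildcards via fnmatch); Elasticsearch's own filtering also supports nested paths such as obj.*.

```python
from fnmatch import fnmatchcase


def filter_source(source: dict, includes: list) -> dict:
    """Keep only the top-level fields matching one of the `includes` patterns."""
    return {
        field: value
        for field, value in source.items()
        if any(fnmatchcase(field, pattern) for pattern in includes)
    }


doc = {
    "name": "gymbo",
    "content": "I have a friend who loves smile, gymbo is his name",
    "language": "english",
    "length": "75",
}
print(filter_source(doc, ["name", "content"]))
```

Only name and content survive, which matches what the _search request above returns inside each hit's _source.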

Document ID

A document ID can be assigned in one of two ways: manually or automatically.

  1. Manually specified
    Specify the ID in the path of a PUT request:

PUT /music/children/id

  2. Automatically generated
    If no ID is given, ES generates one automatically. The generated ID is 20 characters long, URL-safe, Base64-encoded, and GUID-based, so it will not repeat. Note that this form requires POST rather than PUT:

POST /music/children
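The shape of an auto-generated ID can be reproduced in a few lines, assuming the format described above: 15 bytes encode to exactly 20 URL-safe Base64 characters with no padding. This is only an approximation; Elasticsearch's own generator is time-based rather than purely random, and generate_id is a hypothetical name.

```python
import base64
import os


def generate_id() -> str:
    """Return a 20-character, URL-safe, Base64-encoded random ID.

    Assumption: this only mimics the shape of ES auto IDs; ES itself
    builds time-based IDs instead of using purely random bytes.
    """
    raw = os.urandom(15)                                   # 15 bytes = 120 bits
    return base64.urlsafe_b64encode(raw).decode("ascii")   # exactly 20 chars, no '='


print(generate_id())
```

Because 15 is a multiple of 3, Base64 needs no '=' padding, and the URL-safe alphabet uses '-' and '_' instead of '+' and '/', so the ID can go straight into a request path.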

How should a project choose its ID generation method?
In general, it depends on the role Elasticsearch plays in the system. If it is a business system whose data already lives in a relational database, Elasticsearch's value is to make up for the relational database's weak full-text search, and the data arriving in Elasticsearch already carries its own ID. In this case you should specify IDs manually and reuse the database IDs directly; this also makes it easy to map search results back to database records later. For order data, for example, use the order ID directly as the document ID. How the order ID is generated, whether as an auto-increment ID or a Snowflake ID, does not matter to Elasticsearch as long as it is unique.

Automatic ID generation suits other scenarios, for example systems without a relational database, where Elasticsearch is probably the only place the data is stored. The ID carries little meaning in such data, so you can simply let Elasticsearch generate one when the data is saved.
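For the manual-ID case, the database key is simply carried over as _id when the data is indexed. The sketch below builds a _bulk request body (newline-delimited JSON) from rows fetched from a database; the orders index, the order_id column, and the to_bulk_body helper are hypothetical, and actually sending the body to Elasticsearch is left out. On 6.x typed indexes the type must also be supplied, which the optional type_name argument covers.

```python
import json


def to_bulk_body(index, rows, id_column="order_id", type_name=None):
    """Build an NDJSON _bulk body that reuses each row's database key as _id."""
    lines = []
    for row in rows:
        action = {"_index": index, "_id": str(row[id_column])}
        if type_name is not None:                # needed on 6.x typed indexes
            action["_type"] = type_name
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(dict(row)))      # the row itself becomes _source
    return "\n".join(lines) + "\n"               # bulk bodies must end with a newline


orders = [{"order_id": 1001, "amount": 59.9}, {"order_id": 1002, "amount": 12.5}]
print(to_bulk_body("orders", orders, type_name="_doc"), end="")
```

Re-running the same export is safe: each row always maps to the same _id, so a second run overwrites documents instead of creating duplicates.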

Tips: the concepts of GUID, UUID, and COMB

  • UUID: a Universally Unique Identifier, a 128-bit (16-byte) integer. It is a standard established by the Open Software Foundation (OSF).
  • GUID: Microsoft's implementation of the UUID standard. Besides GUID, there are various other implementations of UUID.
  • COMB (combine): a design specific to databases that can be understood as an improved GUID. It combines a GUID with the system time so that it performs better during indexing and retrieval.

GUIDs and UUIDs differ mainly in the format in which they are generated.
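The UUID and COMB ideas can be compared with Python's standard uuid module. uuid.uuid4() produces a standard random UUID; comb_id is a hypothetical COMB-style variant that puts a 48-bit millisecond timestamp in front of 80 random bits, so later IDs sort after earlier ones. It does not set the RFC 4122 version bits, and real COMB implementations differ in where they place the time bytes, depending on how the target database sorts GUIDs.

```python
import os
import time
import uuid


def comb_id(now_ms=None):
    """Hypothetical COMB-style ID: 48-bit millisecond timestamp + 80 random bits."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Time bytes first, so IDs sort by creation time.
    time_part = (now_ms & 0xFFFF_FFFF_FFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=time_part + os.urandom(10))   # 6 + 10 = 16 bytes


random_id = uuid.uuid4()            # standard random (version 4) UUID
first, second = comb_id(1_000), comb_id(2_000)
print(random_id)
print(first < second)               # True: the timestamp prefix decides the order
```

Random UUIDs land at arbitrary positions in a sorted index, while COMB values arrive in roughly increasing order, which is the indexing benefit described above.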

Document write operations

  1. Forced creation
    Forced creation adds _create to the request path, or the op_type=create parameter, for example:

PUT /music/children/id/_create
or
PUT /music/children/id?op_type=create

How forced creation compares with full replacement:

  • When the ID does not exist, both behave the same way.
  • When the ID already exists, full replacement performs an update, while forced creation fails with a "version conflict, document already exists" error.
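The difference between forced creation and full replacement comes down to one check on whether the ID already exists. DocStore below is a hypothetical in-memory model of that logic, not Elasticsearch code; its error message reuses the wording quoted above.

```python
class DocumentExists(Exception):
    pass


class DocStore:
    """Hypothetical model of PUT /index/type/id with and without op_type=create."""

    def __init__(self):
        self.docs = {}   # doc_id -> source

    def put(self, doc_id, source, op_type="index"):
        exists = doc_id in self.docs
        if exists and op_type == "create":
            # Forced creation refuses to touch an existing document.
            raise DocumentExists(f"[{doc_id}]: version conflict, document already exists")
        self.docs[doc_id] = dict(source)   # full replacement overwrites everything
        return "updated" if exists else "created"


store = DocStore()
print(store.put("1", {"name": "gymbo"}, op_type="create"))  # created
print(store.put("1", {"name": "gymbo v2"}))                 # updated
try:
    store.put("1", {"name": "gymbo v3"}, op_type="create")
except DocumentExists as err:
    print(err)
```

Forced creation is therefore a safe choice when a write must never overwrite existing data.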

  2. Delete a document

When you delete a document, ES does not remove it physically right away. It first marks it as deleted, and only when the data has grown enough to meet certain conditions does ES perform the physical deletion, much like garbage collection in the JVM.
This process is called soft delete, also known as lazy delete.
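Lazy deletion can be modelled as a mark plus a later purge. In SoftDeleteModel (a hypothetical class), merge_threshold is an arbitrary stand-in for the real conditions under which ES merges segments and drops deleted documents.

```python
class SoftDeleteModel:
    """Hypothetical model of lazy deletion: delete only marks, a later merge purges."""

    def __init__(self, docs, merge_threshold=2):
        self.docs = dict(docs)        # doc_id -> source, physically stored
        self.deleted = set()          # ids marked as deleted but not yet purged
        self.merge_threshold = merge_threshold   # stand-in for the real merge policy

    def delete(self, doc_id):
        if doc_id in self.docs:
            self.deleted.add(doc_id)              # logical delete only
        if len(self.deleted) >= self.merge_threshold:
            self.merge()

    def get(self, doc_id):
        if doc_id in self.deleted:
            return None                           # reads skip marked documents
        return self.docs.get(doc_id)

    def merge(self):
        for doc_id in self.deleted:
            del self.docs[doc_id]                 # physical removal happens here
        self.deleted.clear()


shard = SoftDeleteModel({"1": {"name": "a"}, "2": {"name": "b"}, "3": {"name": "c"}})
shard.delete("1")
print(shard.get("1"), "1" in shard.docs)      # None True
shard.delete("2")                             # threshold reached, so merge
print("1" in shard.docs, "2" in shard.docs)   # False False
```

Between the delete and the merge, the document still takes up space but is already invisible to reads.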

  3. Full replacement & incremental update
Full replacement

A full replacement command can be executed multiple times: if the ID does not exist, it creates the document; if the ID exists, it performs an update. Syntax example:

PUT /music/children/id

How full replacement works: when the request reaches ES, the existing document with that ID is soft-deleted and a new document is created from the request body. Subsequent GET queries only see documents that are not marked as deleted, which completes the full replacement.
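The replace-by-soft-delete flow can be sketched by keeping every physical copy of a document and flagging the old ones. ShardModel and StoredDoc are hypothetical names; note how fields left out of the new request body disappear, which is what makes it a full replacement.

```python
from dataclasses import dataclass


@dataclass
class StoredDoc:
    doc_id: str
    version: int
    source: dict
    deleted: bool = False


class ShardModel:
    """Hypothetical model: full replacement = soft-delete old copy + add new copy."""

    def __init__(self):
        self.stored = []   # every physical copy, including soft-deleted ones

    def _live(self, doc_id):
        for doc in self.stored:
            if doc.doc_id == doc_id and not doc.deleted:
                return doc
        return None

    def replace(self, doc_id, source):
        old = self._live(doc_id)
        version = 1
        if old is not None:
            old.deleted = True              # soft-delete the previous copy
            version = old.version + 1
        self.stored.append(StoredDoc(doc_id, version, dict(source)))
        return version

    def get(self, doc_id):
        doc = self._live(doc_id)            # GET only sees non-deleted copies
        return None if doc is None else doc.source


shard = ShardModel()
shard.replace("1", {"name": "gymbo", "length": "75"})
shard.replace("1", {"name": "gymbo"})       # full replacement: "length" is gone
print(shard.get("1"))                       # {'name': 'gymbo'}
print([d.deleted for d in shard.stored])    # [True, False]
```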

An incremental update requires the ID to exist already. If it does, the update proceeds; if not, ES throws a "document_missing_exception" error.

Incremental update

An incremental update works like a full replacement and also goes through a soft delete. The difference is that when creating the new document, ES first copies the original document's data and then overwrites it with the incremental content to produce the new document.
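In ES 6.x an incremental update request looks like POST /music/children/1/_update with a body such as {"doc": {"length": "80"}}. The copy-then-overlay step described above can be sketched in Python; partial_update and DocumentMissing are hypothetical names, and unlike Elasticsearch, which merges nested objects recursively, this sketch merges top-level fields only.

```python
class DocumentMissing(Exception):
    pass


def partial_update(docs: dict, doc_id: str, changes: dict) -> dict:
    """Model of an incremental update: copy the old source, overlay changed fields."""
    if doc_id not in docs:
        raise DocumentMissing(f"[{doc_id}]: document missing")
    merged = dict(docs[doc_id])   # copy the original document data
    merged.update(changes)        # overlay only the fields sent in the request
    docs[doc_id] = merged         # in ES the old copy would be soft-deleted here
    return merged


docs = {"1": {"name": "gymbo", "length": "75"}}
print(partial_update(docs, "1", {"length": "80"}))   # {'name': 'gymbo', 'length': '80'}
```

Compared with the full replacement sketch, the untouched name field survives because the merge starts from the stored copy rather than from the request body alone.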

Advantages of incremental update over full replacement:

  • The query, modification, and write-back all happen inside the shard, saving two network transfers and greatly improving performance.
  • The time between query and modification is shorter (updates take milliseconds), which effectively reduces concurrency conflicts.
  • It spares the application the work of assembling the full document (when the JSON has many fields, building a complete document is tedious).

Summary

This chapter briefly explained document metadata, hopefully deepening your understanding of documents.

Origin www.cnblogs.com/huangying2124/p/11955515.html