Big data day 15 - Elasticsearch

Lucene summary

Lucene is a full-text search framework (a library), not an application.

Lucene is built on the inverted index, whose purpose is to let a search quickly locate the content that matches it.

Inverted index
An inverted index finds records by the value of an attribute. Each entry in the index table holds one attribute value together with the addresses of all records that have that value. It is called "inverted" because the record no longer determines its attribute values; instead, the attribute value determines the locations of the records. In short: given a keyword, look up the documents in which it appears, then sort them in descending order of their tf-idf weight.
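A minimal Python sketch of the idea, assuming a naive lowercase word tokenizer and the textbook weight tf × log(N / df). This is an illustration only: Lucene's analyzers are far richer, and its default scoring has been BM25 rather than plain tf-idf since Lucene 6. The function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Naive tokenizer: lowercase word characters only.
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs):
    """Map each term to {doc_id: term_frequency} (its postings list)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index

def search(index, num_docs, query):
    """Score documents containing any query term by summed tf-idf."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(num_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest weight first, as described above.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    1: "Lucene is a search library",
    2: "Elasticsearch is built on Lucene",
    3: "Redis is a key value store",
}
index = build_inverted_index(docs)
print(index["lucene"])                          # {1: 1, 2: 1}
print(search(index, len(docs), "search library"))
```

The lookup never scans the documents themselves: the query terms lead straight to their postings lists, which is what makes the search fast.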

To make this fast lookup possible, Lucene does the following:

It wraps the data source into documents (Document), tokenizes (word-breaks) the fields as needed, and builds the inverted index. While building the index, it selectively stores fields of the original doc (the source path must always be stored). The result is a new doc saved on disk with a new document ID, whose content comes from the original doc; the original doc itself is discarded. Each field of the original doc has three attributes: whether to store it, whether to index it, and whether to tokenize it.
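The three per-field attributes can be sketched as a toy index writer in Python. `FieldSpec` and `ToyIndexWriter` are hypothetical names, not Lucene classes (Lucene expresses the same choices through its field types and `Field.Store` options); the sketch only shows how store/index/tokenize decide what ends up in the new doc and in the inverted index.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count

@dataclass
class FieldSpec:
    # Hypothetical per-field options mirroring the three attributes above.
    store: bool      # keep the original value in the new on-disk doc?
    index: bool      # make the field searchable through the inverted index?
    tokenize: bool   # split the value into words before indexing?

class ToyIndexWriter:
    def __init__(self, schema):
        self.schema = schema
        self.postings = defaultdict(set)   # (field, term) -> {doc_id}
        self.stored = {}                   # doc_id -> stored fields only
        self._ids = count()                # new internal document ids

    def add_document(self, source):
        doc_id = next(self._ids)           # the new doc gets a new id
        kept = {}
        for name, value in source.items():
            spec = self.schema[name]
            if spec.index:
                terms = value.lower().split() if spec.tokenize else [value]
                for term in terms:
                    self.postings[(name, term)].add(doc_id)
            if spec.store:
                kept[name] = value
        self.stored[doc_id] = kept         # the original `source` is not kept
        return doc_id

schema = {
    "path": FieldSpec(store=True, index=True, tokenize=False),     # source path: always stored
    "content": FieldSpec(store=False, index=True, tokenize=True),  # searchable, not stored
}
writer = ToyIndexWriter(schema)
doc_id = writer.add_document({"path": "/data/a.txt", "content": "Hello Lucene world"})
print(writer.stored[doc_id])                   # {'path': '/data/a.txt'}
print(writer.postings[("content", "lucene")])  # {0}
```

Keeping only the path as a stored field is the usual trade-off: the content is searchable through the index, and the full text can be reloaded from the source when needed.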

Lucene can hold data in two ways: as one full index, or split into static shards (sharding).

Elasticsearch

Horizontal and vertical scaling principles

With horizontal scaling, incoming data is distributed across the shards with hash % n, so each shard holds its own documents and its own index over them. When data is queried, one shard acts as a temporary coordinator (proxy): it gathers the results from the other shards, merges them, and returns the output. This increases capacity and concurrency, but it does not solve the availability problem.
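A small Python sketch of hash % n routing plus scatter-gather querying. Elasticsearch routes documents with a formula of the form hash(_routing) % number_of_primary_shards using murmur3; here `zlib.crc32` is only a stand-in stable hash, and the shard count and helper names are hypothetical.

```python
import zlib

NUM_SHARDS = 3  # hypothetical shard count, fixed when the index is created

def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash % n routing; crc32 stands in for Elasticsearch's murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# Each shard holds only its own documents.
shards = [dict() for _ in range(NUM_SHARDS)]

def index_doc(doc_id, text):
    shards[shard_for(doc_id)][doc_id] = text

def search_all(word):
    """Scatter the query to every shard, then gather and merge the hits,
    as the coordinating shard/node does."""
    hits = []
    for shard in shards:                                   # scatter
        hits.extend(d for d, t in shard.items() if word in t.split())
    return sorted(hits)                                    # gather + merge

for i in range(10):
    index_doc(f"doc-{i}", "even" if i % 2 == 0 else "odd")
print(search_all("even"))
```

Every query touches all shards, which is why adding shards raises throughput but a lost shard still means lost data: nothing here is replicated.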

Vertical scaling uses a primary-replica model: each primary shard has replica shards placed on different nodes from the primary, and the replicas take over part of the read/query load from the primary.
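A toy Python model of one primary shard with its replicas, showing how reads can be spread across copies. Round-robin read selection is an assumption made for illustration (newer Elasticsearch versions choose replicas adaptively), and `ShardGroup` is a hypothetical name, not Elasticsearch code.

```python
from itertools import cycle

class ShardGroup:
    """Toy primary/replica group for one shard."""

    def __init__(self, num_replicas=1):
        # copies[0] is the primary; the others are replicas that would live
        # on different nodes from the primary.
        self.copies = [{} for _ in range(1 + num_replicas)]
        self.read_counts = [0] * len(self.copies)
        self._next_reader = cycle(range(len(self.copies)))

    def write(self, key, value):
        # Writes land on the primary first, then are copied to each replica.
        for copy in self.copies:
            copy[key] = value

    def read(self, key):
        # Reads rotate over primary and replicas (round-robin assumption),
        # so the replicas take part of the read load off the primary.
        i = next(self._next_reader)
        self.read_counts[i] += 1
        return self.copies[i].get(key)

group = ShardGroup(num_replicas=2)
group.write("last_name", "tang")
results = [group.read("last_name") for _ in range(6)]
print(results)            # ['tang', 'tang', 'tang', 'tang', 'tang', 'tang']
print(group.read_counts)  # [2, 2, 2]
```

Because every copy holds the same data, a replica can also be promoted if the primary's node fails, which is the availability that horizontal sharding alone does not give.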

As data grows and new nodes are needed, the data cannot simply be re-slotted with % the way Redis does it, because Redis holds static <k, v> data, whereas Lucene is a real-time computing framework whose data lives in built indexes. You therefore need to estimate the data volume before building the index and choose a larger n (more Lucene shards) up front. Because Lucene shards cannot be re-split later, the only option afterwards is to add servers.
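A quick Python check of why the shard count n must be fixed in advance with hash % n routing: changing n sends most documents to a different shard, which would mean re-indexing them. `zlib.crc32` is again only a stand-in hash, and the key names are hypothetical.

```python
import zlib

def shard_for(key: str, num_shards: int) -> int:
    # hash % n routing; crc32 is only a stand-in for the real hash function.
    return zlib.crc32(key.encode("utf-8")) % num_shards

keys = [f"doc-{i}" for i in range(10_000)]
moved = sum(shard_for(k, 5) != shard_for(k, 6) for k in keys)
print(f"{moved / len(keys):.0%} of documents would change shard going from 5 to 6 shards")
```

For a uniformly distributed hash, a key keeps its shard only when h mod 5 equals h mod 6, which holds for 5 of every 30 residues, so roughly five out of six documents move.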

Elasticsearch is a Lucene-based distributed search and real-time analytics engine. It is designed for the cloud and aims for real-time search that is stable, reliable, fast, and easy to install. It exposes a RESTful interface.

Elasticsearch vs. Solr: advantages and disadvantages (Elasticsearch for networked, distributed data processing; Solr for single-node data processing)
Solr uses ZooKeeper for distributed management, while Elasticsearch has its own built-in distributed coordination;
Solr supports more data formats, while Elasticsearch only supports JSON;
Solr officially offers more features, whereas Elasticsearch focuses on core functionality and leaves many advanced features to third-party plugins;
Solr outperforms Elasticsearch in traditional search applications, but it is noticeably less efficient than Elasticsearch for real-time search applications.
In short, Solr is a powerful solution for traditional search applications, while Elasticsearch is better suited to emerging real-time search.

Resource | GET | PUT | POST | DELETE
--- | --- | --- | --- | ---
Collection URI, e.g. http://example.com/resources/ | List the URIs of the members of the collection, optionally with details of each member | Replace the entire collection with another collection | Create/append a new member in the collection; the operation usually returns the URL of the new resource | Delete the entire collection
Element URI, e.g. http://example.com/resources/142 | Retrieve the specified member, in a suitable media type (e.g. XML or JSON) | Replace the named member, or create it and add it to the collection | Treat the member as a collection in its own right and create/append a new element inside it | Delete the specified member
Operation | Get the current state of an object | Change the state of an object | Create an object | Delete an object
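The table can be made concrete with a toy in-memory resource in Python. The class, its method names, and the `/resources` URIs are hypothetical; the sketch covers the collection and element rows but omits "POST on an element", which is rarely used.

```python
from itertools import count

class ResourceCollection:
    """Toy in-memory REST resource following the table above."""

    def __init__(self, base="/resources"):
        self.base = base
        self.items = {}           # id -> representation
        self._ids = count(1)

    # Collection URI, e.g. /resources
    def get_collection(self):
        return [f"{self.base}/{i}" for i in self.items]   # list member URIs

    def put_collection(self, items):
        self.items = dict(items)                           # replace the whole set

    def post_collection(self, body):
        new_id = next(self._ids)
        while new_id in self.items:                        # skip ids taken by PUT
            new_id = next(self._ids)
        self.items[new_id] = body
        return f"{self.base}/{new_id}"                     # URL of the new member

    def delete_collection(self):
        self.items.clear()

    # Element URI, e.g. /resources/142
    def get(self, item_id):
        return self.items[item_id]

    def put(self, item_id, body):
        self.items[item_id] = body                         # replace or create

    def delete(self, item_id):
        del self.items[item_id]

resources = ResourceCollection()
url = resources.post_collection({"name": "first"})   # '/resources/1'
resources.put(142, {"name": "named"})
print(resources.get_collection())                      # ['/resources/1', '/resources/142']
resources.delete(1)
print(resources.get_collection())                      # ['/resources/142']
```

Note the asymmetry the table describes: POST on the collection picks the id and returns the new URL, while PUT on an element uses the id the caller chose.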

Both PUT (curl -XPUT) and POST (curl -XPOST) can create and modify documents: if no document with the given ID exists, the request creates one; if it exists, the doc's content is replaced. The difference is that with POST you may omit the ID and Elasticsearch generates one internally, whereas with PUT you must supply the ID yourself.
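A toy Python model of the create-or-replace rules just described. It is not the Elasticsearch client or API; `ToyIndex` is a hypothetical name, and the response dictionary only borrows the `_id`, `_version` and `result` field names that Elasticsearch responses use. Real auto-generated IDs are 20-character strings; `uuid4().hex` is just a stand-in.

```python
import uuid

class ToyIndex:
    """Mimics PUT vs POST create-or-replace semantics on one index."""

    def __init__(self):
        self.docs = {}      # _id -> (_version, _source)

    def _upsert(self, doc_id, source):
        version, _ = self.docs.get(doc_id, (0, None))
        self.docs[doc_id] = (version + 1, source)   # whole doc is replaced
        result = "created" if version == 0 else "updated"
        return {"_id": doc_id, "_version": version + 1, "result": result}

    def put(self, doc_id, source):
        # PUT /index/type/<id>: the caller must supply the id.
        if not doc_id:
            raise ValueError("PUT requires a document id")
        return self._upsert(doc_id, source)

    def post(self, source, doc_id=None):
        # POST /index/type[/<id>]: the id is optional and generated if omitted.
        return self._upsert(doc_id or uuid.uuid4().hex, source)

idx = ToyIndex()
print(idx.put("1", {"first_name": "bin"})["result"])   # created
print(idx.put("1", {"first_name": "bin", "age": 33}))  # {'_id': '1', '_version': 2, 'result': 'updated'}
auto = idx.post({"first_name": "auto"})
print(auto["result"], len(auto["_id"]))                # created 32
```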

Relational database | Elasticsearch
--- | ---
Database | Index
Table | Type
Row | Document
Column | Field

(Mapping types were phased out starting with Elasticsearch 6.x and removed in 8.0, so in current versions an index plays the role of a table.)

Indexing a document (operating Elasticsearch with curl)

curl -XPOST 'http://sxt003:9200/bjsxt/employee' -H 'Content-Type: application/json' -d '
{
"first_name": "bin", "last_name": "tang",
"age": 33,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}'
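The same request built with Python's standard library, as a sketch. The host `sxt003`, index `bjsxt` and type `employee` are taken from the curl example and must match your own cluster. The explicit `Content-Type: application/json` header matters because Elasticsearch 6.0 and later reject request bodies without it. Since no ID is given, the POST asks Elasticsearch to generate one.

```python
import json
import urllib.request

# Host, index and type come from the curl example above; adjust to your cluster.
ES_URL = "http://sxt003:9200/bjsxt/employee"

employee = {
    "first_name": "bin",
    "last_name": "tang",
    "age": 33,
    "about": "I love to go rock climbing",
    "interests": ["sports", "music"],
}

def build_index_request(url, doc):
    """Build a POST request that indexes `doc` with an auto-generated id."""
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_index_request(ES_URL, employee)
# Sending it requires a reachable cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["_id"])
```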

Origin blog.csdn.net/qq_40929921/article/details/93300116