Getting started with Elasticsearch

Original text: https://blog.csdn.net/laoyang360/article/details/52244917

How did ES come about?

(1) Thinking: How to retrieve large-scale data

When a system holds 1 billion or 10 billion records, the architecture is usually considered from the following perspectives:

  • What database to use?
  • How to eliminate single points of failure (LVS, F5, A10, ZooKeeper, MQ)
  • How to ensure data safety (hot backup, cold backup, multi-site active-active)
  • How to solve retrieval problems (database proxy middleware: mysql-proxy, Cobar, MaxScale, etc.)
  • How to solve statistical analysis problems (offline, near real-time)

(2) How traditional databases cope

For relational databases, an architecture like the following is typically used to relieve query and write bottlenecks:

  • Master-slave backup protects the data
  • Heartbeat monitoring by the database proxy middleware removes the single point of failure
  • The proxy middleware distributes query statements to the slave nodes and aggregates the results

(3) Solutions for non-relational databases

For NoSQL databases, take MongoDB as an example (others work on similar principles):

  • Replica backups protect the data
  • A node election mechanism removes the single point of failure
  • Shard information is first retrieved from the configuration database, the request is then distributed to each node, and the routing node finally merges the results


To solve the retrieval problem at its source, solutions are usually sought along the following lines:

(1) Store data in order when storing data

(2) Separate data and indexes

(3) Compressed data

This leads to Elasticsearch.
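The three ideas listed above (ordered storage, an index kept apart from the data, and compression) can be sketched in a few lines. This is only a toy illustration, not how Elasticsearch or Lucene is implemented; the sample records, the `level` field, and the function names are all made up for the example.

```python
from bisect import bisect_left

# The data: full records, stored once and keyed by document ID (hypothetical sample).
documents = {
    3: {"level": "error", "msg": "payment failed"},
    17: {"level": "info", "msg": "payment accepted"},
    42: {"level": "error", "msg": "disk full on node 2"},
    108: {"level": "warn", "msg": "retrying payment"},
}

def delta_encode(sorted_ids):
    """Store each ID as the gap from the previous one; gaps are small numbers."""
    gaps, previous = [], 0
    for doc_id in sorted_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps

def delta_decode(gaps):
    """Rebuild the original sorted IDs by summing the gaps."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids

# The index: a separate, smaller structure. Visiting IDs in ascending order
# keeps every list of matching IDs sorted, which is what makes gaps work.
postings = {}
for doc_id in sorted(documents):
    postings.setdefault(documents[doc_id]["level"], []).append(doc_id)

terms = sorted(postings)                                # ordered keys
compressed = [delta_encode(postings[t]) for t in terms]  # compressed ID lists

def lookup(term):
    """Binary-search the sorted keys, then decode the matching ID list."""
    i = bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return delta_decode(compressed[i])
    return []
```

Here `lookup("error")` finds the key by binary search and decodes the gaps `[3, 39]` back into the IDs `[3, 42]`; the full records are only fetched from `documents` afterwards.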

ES

Elasticsearch is an open source, highly scalable, distributed full-text search engine that can store and retrieve data in near real time. It uses Lucene as its core, but its purpose is to hide Lucene's complexity behind a RESTful API, making full-text search easy.
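Because everything goes through the RESTful API, a search is just an HTTP request with a JSON body. The sketch below builds such a request with Python's standard library without sending it. The address `http://localhost:9200`, the index name `articles`, and the field `title` are assumptions for illustration only.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local single-node cluster

def build_search_request(index, field, text):
    """Build (but do not send) a full-text search request for one field."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical index and field names.
request = build_search_request("articles", "title", "distributed search")

# To run it against a live cluster, something like:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

The caller never touches Lucene directly; it only needs to know the URL of the index and the shape of the JSON query.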

1. Main problems it solves

(1) Retrieve relevant data

(2) Return statistical results

(3) Do both quickly

2. How it works

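In early ES versions, a node that starts up uses multicast (or unicast, if the user has changed the configuration) to find the other nodes in the cluster and establish connections with them. Later versions dropped multicast discovery and rely on a configured list of hosts instead.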


3. Concepts

(1) Cluster

(2) Node

(3) Shard

When there are a large number of documents, a single node may not be enough because of memory limits, insufficient disk capacity, or an inability to respond to client requests quickly enough. In that case, the data can be divided into smaller shards, each placed on a different server. When the index you query is spread across multiple shards, ES sends the query to each relevant shard and combines the results.
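The shard behaviour described above can be sketched as follows. This is a simplified model under stated assumptions: shards are plain dictionaries in one process, the shard is chosen by hashing the document ID modulo the shard count (ES follows the same idea with its own hash function; `zlib.crc32` is only a stand-in), and the "score" is a bare term count. All names are hypothetical.

```python
import zlib
from heapq import nlargest

NUM_SHARDS = 3  # hypothetical number of primary shards

# Each dict stands in for a shard living on a different server.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(doc_id):
    """Pick a shard from a stable hash of the document ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS

def index_document(doc_id, text):
    """Store the document on exactly one shard."""
    shards[shard_for(doc_id)][doc_id] = text

def search_shard(shard, term, size):
    """Local search on one shard: its best `size` hits as (score, doc_id)."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard.items()]
    return nlargest(size, [hit for hit in hits if hit[0] > 0])

def search(term, size=10):
    """Scatter the query to every shard, then gather and merge the hits."""
    partial = [hit for shard in shards for hit in search_shard(shard, term, size)]
    return [doc_id for _, doc_id in nlargest(size, partial)]

index_document("a", "disk error then another error")
index_document("b", "error on login")
index_document("c", "all good")
```

With these three documents, `search("error")` returns `["a", "b"]` regardless of which shards the documents landed on, because the merge step re-ranks the per-shard results.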

(4) Replica

A replica is an exact copy of a shard, and each shard can have zero or more replicas.

(5) Full-text search

You can search by keyword, somewhat like LIKE in SQL.
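The comparison with LIKE only goes so far: a full-text engine matches whole tokens produced by an analyzer rather than arbitrary substrings. A minimal sketch of the idea, assuming a toy analyzer that lowercases and splits on non-alphanumeric characters (real analyzers do much more), with made-up sample documents:

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every token of the query."""
    token_sets = [index.get(token, set()) for token in analyze(query)]
    return sorted(set.intersection(*token_sets)) if token_sets else []

docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "MySQL supports LIKE queries",
    3: "Distributed systems need a search layer",
}
index = build_inverted_index(docs)
```

Here `search(index, "Search distributed")` returns `[1, 3]` whatever the word order or case, while `search(index, "sear")` returns `[]`, whereas `LIKE '%sear%'` would have matched both rows.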

4. The main concepts of ES data architecture (compared with MySQL)


In a relational database, the schema defines the tables, the fields of each table, and the relationships between tables and fields. Correspondingly, in ES, the Mapping defines how the fields of a Type under an index are processed: how the index is built, the index type, whether to store the original JSON document, whether to compress it, whether word segmentation is needed, how to perform it, and so on.
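To make this concrete, a mapping might look roughly like the following. It is a sketch under assumptions: the index name `articles` and all field names are invented, and it is written in the typeless form used by recent Elasticsearch versions, whereas the text above describes older versions in which a mapping belonged to a Type.

```python
import json

# Hypothetical mapping for an "articles" index; field names are illustrative.
article_mapping = {
    "mappings": {
        # Keep the original JSON document alongside the index.
        "_source": {"enabled": True},
        "properties": {
            # Segmented into words by the standard analyzer.
            "title": {"type": "text", "analyzer": "standard"},
            # Indexed as a single exact value, no word segmentation.
            "author": {"type": "keyword"},
            "views": {"type": "integer"},
            # Kept in the stored document but not indexed for search.
            "raw_html": {"type": "text", "index": False},
        },
    }
}

def create_index_body(mapping):
    """Serialize the mapping as the JSON body of a create-index request."""
    return json.dumps(mapping, indent=2)

body = create_index_body(article_mapping)
# This body would be sent with: PUT http://localhost:9200/articles
```

Each entry under `properties` plays the role of a column definition in a MySQL table, with the extra analysis settings that a search engine needs.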
