Inquiry | The boundary between Elasticsearch and traditional databases

Reprint (original link): https://blog.csdn.net/laoyang360/article/details/103379651

0. Preface

Almost all the information online says to store data in a traditional database and then synchronize it to ES for retrieval, but it rarely describes in detail how to do this. And since ES itself can store data, isn't storing the data twice unnecessary? It also causes other problems.

Although it is a paid feature and the supported syntax is incomplete, ES now supports SQL, so I am increasingly confused about the boundary between ES and databases.

ES does not support transactions, but it can guarantee the write of a single document, so transactions can be implemented in application code. Join queries are difficult, but as with other NoSQL stores, the same result can be achieved with a wide table. Real-time behavior can be tuned through configuration, and ES is certainly better at scalability and complex statistics.

Given the above, what is the difference between ES and a database at this stage, or where does the boundary lie?

https://elasticsearch.cn/question/8885

- a question from the community

In fact, comparing a traditional relational database directly with Elasticsearch is somewhat far-fetched; after all, one is a database and the other is a search engine.

If we insist on a comparison, let us peel back the layers and explore, bit by bit, how Elasticsearch differs from traditional databases.

1. Different missions

Oracle's definition of a relational database:

A relational database is a database that organizes data using the relational model, storing data in rows and columns so that it is easy for users to understand. A series of rows (records that contain a unique key) and columns (which store attributes) is called a table, and a set of tables makes up a database.

The official Elasticsearch definition:

Elasticsearch is a distributed, open source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the central component of the Elastic Stack, a set of open source tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection of lightweight shipping agents known as Beats for sending data to Elasticsearch.

A relational database can store data and also index it.

A search engine can index data but also store it.

2. Different application scenarios

Relational databases are better suited to OLTP business scenarios (OLTP is a human-computer interactive application system that uses the transaction as its unit of data processing; its biggest advantage is that it can process incoming data immediately and respond promptly), whereas Elasticsearch cannot be used as a pure database.

Reason 1: It does not support transactions.

Reason 2: It is near real time rather than real time. Visibility is controlled by refresh_interval; with the default of 1s, newly written data becomes searchable about one second after it is written.
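
The near-real-time behavior in Reason 2 can be illustrated with a toy model. This is only a sketch under stated assumptions, not Elasticsearch code: the class ToyIndex, its methods, and the simulated clock are hypothetical, and only the name refresh_interval comes from the text above.

```python
class ToyIndex:
    """Toy model of near-real-time search: writes land in a buffer and
    become searchable only after a refresh. Not Elasticsearch code."""

    def __init__(self, refresh_interval=1.0):
        self.refresh_interval = refresh_interval  # seconds between refreshes
        self._buffer = []        # written but not yet searchable
        self._searchable = []    # visible to search
        self._last_refresh = 0.0

    def index(self, doc):
        # A write only reaches the buffer; search cannot see it yet.
        self._buffer.append(doc)

    def tick(self, now):
        """Advance the simulated clock; refresh if the interval has elapsed."""
        if now - self._last_refresh >= self.refresh_interval:
            self._searchable.extend(self._buffer)
            self._buffer.clear()
            self._last_refresh = now

    def search(self, word):
        return [d for d in self._searchable if word in d.split()]


idx = ToyIndex(refresh_interval=1.0)
idx.index("error in payment service")
idx.tick(now=0.5)
before = idx.search("error")   # no refresh yet, so nothing is found
idx.tick(now=1.0)
after = idx.search("error")    # the refresh made the document visible
```

The point of the sketch is the gap between a successful write and the moment the document can be found, which is why a read-your-own-write workflow typical of OLTP is a poor fit.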

Elasticsearch suits OLAP scenarios (OLAP lets analysts observe information quickly, consistently, and interactively from many angles in order to gain a deep understanding of the data; the focus is on analysis).

For example:

Analysis and retrieval of massive volumes of logs,

Full-text retrieval over massive volumes of large text.

3. Different storage types

Relational databases generally support storing structured data (PostgreSQL also supports JSON).

Structured data features:

Data is expressed logically through two-dimensional table structures.

Data strictly follows format and length specifications.

Examples: bank transaction data, personal information, and so on.

Elasticsearch supports both relational and unstructured data, for example JSON stored as object, nested, or parent-child join types.

Unstructured data features:

The data structure is irregular or incomplete;

There is no predefined data model, and the data is inconvenient to represent with two-dimensional logical database tables.

Examples: office documents in all formats, text, images, XML, HTML, various reports, and audio/video information.

Think about it: in practice, have you run into business scenarios where the data structure is not fixed, the number of fields varies, field types change, or fields need to be added dynamically?
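
To make the "variable fields" scenario concrete, here is a minimal sketch. The sample documents and the helper infer_mapping are hypothetical, made up for illustration; the helper only imitates the idea of discovering fields as documents arrive and is not how Elasticsearch implements its mappings.

```python
# Three documents with different sets of fields; a fixed two-dimensional
# table would need a column for every field that might ever appear.
docs = [
    {"id": 1, "title": "Quarterly report", "pages": 12},
    {"id": 2, "title": "Team photo", "width": 1920, "height": 1080},
    {"id": 3, "title": "Meeting audio", "duration_s": 1830, "tags": ["weekly"]},
]


def infer_mapping(documents):
    """Collect field name -> set of Python type names seen across documents."""
    mapping = {}
    for doc in documents:
        for field, value in doc.items():
            mapping.setdefault(field, set()).add(type(value).__name__)
    return mapping


mapping = infer_mapping(docs)
for field in sorted(mapping):
    print(field, sorted(mapping[field]))
```

Each new document can add fields without a schema migration, which is the flexibility the question above is pointing at.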

4. Different scalability

Relational databases share a common problem. For example, a single MySQL table can hold only a limited amount of data; once the data volume grows, you have to split databases and tables (sharding) and only then consider going distributed. Distribution of this kind has major bottlenecks:

Splitting databases and tables is very cumbersome,

It is highly coupled to the business,

Complex queries are error-prone,

More importantly, distributed transactions cannot be handled effectively.

This gave rise to many third-party NewSQL products, such as TiDB (open source plus paid solutions).

Elasticsearch supports horizontal scaling: it natively supports multi-node cluster deployment, has strong expansion capability, even supports cross-cluster search, and supports PB+ data volumes.

In China: DiDi, Ctrip, SF Express, Toutiao, BAT (Baidu, Alibaba, Tencent), and many others have built core data services on Elasticsearch.

5. Different problems solved

Relational databases target core CRUD business scenarios and are painfully slow at full-text search (many customers migrate to Elasticsearch for this reason; early on they used Solr and Lucene, but found Elasticsearch easier to use). Elasticsearch's inverted index mechanism is much better suited to full-text search.
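
A rough illustration of why an inverted index suits full-text search: the sketch below maps each term to the set of documents that contain it, so a query only touches the entries for its own terms instead of scanning every row. The function names and sample documents are hypothetical, and whitespace splitting stands in for a real analyzer.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Elasticsearch is a search engine",
    2: "MySQL is a relational database",
    3: "A search engine builds an inverted index",
}
index = build_inverted_index(docs)
hits = search_all(index, "search engine")
```

A relational database answering the same question with a LIKE '%...%' predicate generally has to examine every row, which is where the slowness described above comes from.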

In real business:

If the data is simple, use a relational database; simple SQL queries will solve the problem.

If you do not have performance problems, keep the architecture simple and use a single database for storage, adding a cache (such as Redis) when necessary.

If you run into performance problems with search, you can combine a relational database with Elasticsearch.

6. Different data models

Relational database designs usually involve multiple tables, with different tables for different business models; complex business is handled by joining multiple tables or through views.

Elasticsearch supports complex business data, but multi-table association is generally not recommended; in fact, Elasticsearch's inverted index mechanism means it is inherently unsuited to multi-table association. Common solutions for complex business data are:

1. Wide table (trading space for time);

2. Nested;

3. Parent-child join (for scenarios with frequent updates).
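
The first option, the wide table, can be sketched as follows. The customers and orders data and the helper to_wide_docs are hypothetical; the idea is simply to copy the joined fields into every document at write time so that no join is needed at query time.

```python
# Normalized, relational-style data: orders reference customers by id.
customers = {101: {"name": "Alice", "city": "Hangzhou"}}
orders = [
    {"order_id": 1, "customer_id": 101, "amount": 30.0},
    {"order_id": 2, "customer_id": 101, "amount": 12.5},
]


def to_wide_docs(orders, customers):
    """Denormalize: embed the customer fields into each order document."""
    wide = []
    for order in orders:
        customer = customers[order["customer_id"]]
        doc = dict(order)  # copy so the source rows stay untouched
        doc["customer_name"] = customer["name"]
        doc["customer_city"] = customer["city"]
        wide.append(doc)
    return wide


wide_docs = to_wide_docs(orders, customers)
```

The customer fields are now stored once per order, which is the "space for time" trade: more storage and more work when a customer changes, in exchange for join-free queries.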

For aggregation scenarios, multi-level nested aggregations over a large total data volume (tens of millions of documents or more) will indeed be very slow in ES; other auxiliary solutions can be considered during technology selection.

7. Different underlying logic

Traditional database storage engines are based on B+ trees, whereas many NoSQL databases, including ES, use LSM-tree-style structures, which support write operations more efficiently.

Why can Elasticsearch/Lucene search faster than MySQL?

MySQL has only the term dictionary layer, stored on disk as a sorted B-tree. Retrieving a term requires several random disk accesses.

Lucene adds a term index on top of the term dictionary to speed up retrieval. The term index is cached in memory as a tree (stored as an FST, a finite state transducer, which is very memory-efficient). After using the term index to locate the corresponding block of the term dictionary, Lucene goes to disk to find the term, which greatly reduces the number of random disk accesses.
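
The two-level lookup described above can be sketched roughly as follows. This is a simplification under stated assumptions: the class ToyTermDictionary is hypothetical, a sorted in-memory list with binary search stands in for the FST, and a counter stands in for disk reads.

```python
import bisect


class ToyTermDictionary:
    """Sorted terms split into blocks, plus a small in-memory index that
    holds the first term of each block. Not Lucene's actual format."""

    def __init__(self, terms, block_size=4):
        sorted_terms = sorted(set(terms))
        # "On-disk" term dictionary: fixed-size blocks of sorted terms.
        self.blocks = [
            sorted_terms[i:i + block_size]
            for i in range(0, len(sorted_terms), block_size)
        ]
        # "In-memory" term index: the first term of every block.
        self.term_index = [block[0] for block in self.blocks]
        self.block_reads = 0  # counts simulated disk accesses

    def contains(self, term):
        # Step 1: find the candidate block using only the in-memory index.
        pos = bisect.bisect_right(self.term_index, term) - 1
        if pos < 0:
            return False  # smaller than every term; no block is read
        # Step 2: read exactly one block and look for the term in it.
        self.block_reads += 1
        return term in self.blocks[pos]


terms = ["apple", "banana", "cherry", "date", "elder",
         "fig", "grape", "kiwi", "lemon"]
dictionary = ToyTermDictionary(terms, block_size=4)
found = dictionary.contains("grape")
reads_after_one_lookup = dictionary.block_reads
```

In this toy model each lookup touches at most one block, whereas walking a tree that lives entirely on disk would touch one node per level.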

8. Summary

So there is no one-size-fits-all solution, only the most suitable one.

Whatever fits your business scenario is the best!

Additions are welcome.

References:

[1] https://stackoverflow.com/questions/51639166/elasticsearch-vs-relational-database 

[2] https://zhuanlan.zhihu.com/p/73585202 

[3] https://www.elastic.co/cn/what-is/elasticsearch 

[4] https://www.oracle.com/database/what-is-a-relational-database/ 

[5] https://zhuanlan.zhihu.com/p/33671444

END

Origin blog.csdn.net/dev666/article/details/104406459