ElasticSearch Study Notes - Chapter 1: Overview

1. Overview of ElasticSearch

1.1 Introduction

Elasticsearch, referred to as ES, is an open-source, highly scalable, distributed full-text search engine with a RESTful API, and it is the core of the entire Elastic Stack. It can store and retrieve data in near real time, and it scales out to hundreds of servers to process petabytes of data.

The Elastic Stack includes Elasticsearch, Kibana, Beats, and Logstash.

Elasticsearch is used to search stored data.

Kibana is used to display (visualize) data.

1.2 Full-text search engine

Web search engines such as Google and Baidu build indexes from the keywords in web pages. When we search for a keyword, they return every web page that the index matches for that keyword. Everyday projects have similar needs, such as searching application logs. Relational databases do not support searching this kind of unstructured text well.

In general, full-text search in a traditional database performs poorly, partly because databases are rarely used to store large text fields. A full-text query has to scan the entire table, so when the amount of data is large, even optimizing the SQL makes little difference. An index can be built, but it is troublesome to maintain, because inserts and updates cause the index to be rebuilt.

Therefore, conventional database search methods are very costly in terms of performance in the following scenarios:

  • The data being searched is a large amount of unstructured text.
  • The number of records reaches hundreds of thousands, millions, or more.
  • A large number of interactive, text-based queries must be supported.
  • Very flexible full-text queries are required.
  • Highly relevant search results are specifically required, and no available relational database can provide them.
  • There are relatively few requirements for handling different record types, manipulating non-text data, or secure transaction processing.

The full-text search engine mentioned here refers to the kind of mainstream search engine in wide use today. It works as follows: an indexing program scans every word in each document, builds an index entry for each word, and records how many times and where the word occurs in the document. When a user runs a query, the retrieval program searches this pre-built index and returns the results to the user. The process is similar to looking up a word in a dictionary through its index table.
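
To make the idea concrete, here is a minimal Python sketch of an inverted index that, like the description above, records how many times and where each word appears in each document. It is a simplified illustration, not how Lucene actually stores its index: the tokenizer only lowercases and splits on non-word characters, ranking is plain term frequency, and the names `build_index` and `search` and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Simplistic tokenizer (an assumption for this sketch): lowercase and
    # split on non-word characters. Real analyzers do much more.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Inverted index: word -> {doc_id: [positions of the word in that doc]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search(index, query):
    """Return ids of docs containing every query word, most occurrences first."""
    words = tokenize(query)
    if not words:
        return []
    # Look up each word's postings directly; no document is scanned.
    postings = [index.get(w, {}) for w in words]
    hits = set(postings[0])
    for p in postings[1:]:
        hits &= set(p)
    # Rank by total occurrence count (term frequency), ties broken by doc id.
    return sorted(hits, key=lambda d: (-sum(len(p[d]) for p in postings), d))


docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene is a Java search library; Lucene indexes text",
    3: "Kibana is used to display data",
}
index = build_index(docs)
print(index["lucene"])          # {1: [4], 2: [0, 6]}
print(search(index, "lucene"))  # [2, 1]
```

The length of each position list gives the occurrence count, and the positions give the locations. Looking up `index["lucene"]` jumps straight to that word's entry, much like a dictionary's index table, whereas a `LIKE '%lucene%'` query in a relational database has to examine every row.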

1.3 Lucene

Lucene was originally a sub-project of the Apache Software Foundation's Jakarta project. It provides a simple but powerful API for full-text indexing and search. Lucene is a mature, free, open-source tool in the Java ecosystem and has for years been the most popular free Java information retrieval library. However, Lucene is only a core toolkit that provides a full-text search library; to use it in practice, a complete service has to be built around it.

Elasticsearch is search engine server software built on Lucene that can be deployed and launched independently.
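
Unlike Lucene, which is called as a Java library, Elasticsearch runs as a service and is driven over HTTP with JSON. The Python sketch below only builds two such requests (indexing a document, and running a full-text `match` query) without sending them. It assumes a node listening on `http://localhost:9200` (the default HTTP port) and a hypothetical index named `articles`; the helper names are illustrative, and the authentication and TLS that 8.x enables by default are ignored.

```python
import json
import urllib.request

# Assumptions: a local node on the default port and no security settings.
ES_URL = "http://localhost:9200"


def index_request(index, doc_id, doc):
    """Build a request that stores `doc` under `doc_id`: PUT /<index>/_doc/<id>."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(index, field, text):
    """Build a full-text `match` query on one field: POST /<index>/_search."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = search_request("articles", "title", "full-text search")
print(req.get_method(), req.full_url)  # POST http://localhost:9200/articles/_search
# Actually sending it requires a running node: urllib.request.urlopen(req)
```

Because the interface is plain HTTP and JSON, any language or tool that can send HTTP requests (including `curl` or Kibana's Dev Tools console) can talk to Elasticsearch without embedding Lucene.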

1.4 Application cases

  • GitHub: In early 2013, GitHub abandoned Solr and adopted Elasticsearch for petabyte-scale search. "GitHub uses Elasticsearch to search 20TB of data, including 1.3 billion files and 130 billion lines of code."
  • Wikipedia: Launched a core search architecture based on Elasticsearch.
  • Baidu: Elasticsearch is widely used for text data analysis. It collects various metrics and user-defined data from all Baidu servers, and multi-dimensional analysis and visualization of this data help locate and analyze instance-level or business-level anomalies. It currently covers more than 20 business lines within Baidu (including cloud analytics, the Baidu Union ad network, forecasting, Baidu Wenku, Zhidahao, Baidu Wallet, risk control, etc.), with up to 100 machines and 200 ES nodes in a single cluster, and more than 30TB of data imported every day.
  • Sina: Uses Elasticsearch to analyze and process 3.2 billion real-time logs.
  • Alibaba: Uses Elasticsearch to build a log collection and analysis system.
  • Stack Overflow: An English-language Q&A site where programmers communicate and solve programming problems.

Reference

[Atguigu] ElasticSearch tutorial: from getting started to mastery (based on the new features of the ELK stack, Elasticsearch 7.x + 8.x)

Original post: blog.csdn.net/weixin_42584100/article/details/129653418