Elasticsearch series --- getting to know Elasticsearch

What is Elasticsearch?

Elasticsearch, often abbreviated ES, is an open-source, distributed full-text search engine built on Lucene and exposed through a RESTful interface; it can also be seen as a distributed document database. It is distributed by design, highly available and scalable, and can store, search and analyze large amounts of data in a very short time.

What is full-text search?

Full-text search means scanning every word in a document and building an index entry for each word that records how many times and where the word appears. When a user submits a query with keywords, the search engine looks the keywords up in this prebuilt index and returns the matching results to the user.
Two terms matter here: tokenization and indexing. Elasticsearch does both internally: it splits the text into terms according to tokenization rules, then indexes those terms so that users can query them.
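As a rough illustration of those two steps, here is a minimal Python sketch that tokenizes one piece of text and records where each term occurs. The regex tokenizer is only a stand-in for a real analyzer, the function names are made up for this example, and the sample sentence is invented; Elasticsearch's actual analysis chain is far more configurable.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens (a simplified stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_document(text):
    """Map each term to the list of positions where it occurs in the text."""
    positions = defaultdict(list)
    for position, term in enumerate(tokenize(text)):
        positions[term].append(position)
    return dict(positions)


entry = index_document("I have a dream today, a big dream")
print(entry["dream"])    # positions of "dream": [3, 7]
print(len(entry["a"]))   # number of occurrences of "a": 2
```

The length of each position list gives the occurrence count, so one structure covers both the "how many times" and the "where" mentioned above.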

What is an inverted index?

The index built for full-text search is called an inverted index. A mapping from text to its keywords is called a forward index (covered later in this series); an inverted index turns that relationship around and maps each keyword to the texts that contain it, which is exactly what search needs.
For example:

  • Text 1: I have a friend who loves smile
  • Text 2: I have a dream today

First tokenize the English text, then build the inverted index. The resulting simple keyword-to-text mapping is as follows:

Keyword   Text numbers
I         1, 2
have      1, 2
a         1, 2
friend    1
who       1
loves     1
smile     1
dream     2
today     2

With this mapping, a search for the keyword "have" immediately returns the records with ids 1 and 2, and a search for "today" returns the record with id 2, so lookups are very fast. The inverted index that Elasticsearch maintains holds much more information than this; the example only illustrates the principle.
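The same mapping can be sketched in a few lines of Python. This is only an illustration of the principle using the two example texts; the names build_inverted_index and search are made up here, and the whitespace split with lowercasing is an assumed, simplified tokenizer.

```python
from collections import defaultdict

# The two example texts, keyed by their text number.
docs = {
    1: "I have a friend who loves smile",
    2: "I have a dream today",
}


def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, keyword):
    """Return the sorted ids of documents containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))


index = build_inverted_index(docs)
print(search(index, "have"))   # [1, 2]
print(search(index, "Today"))  # [2]
```

A lookup is a single dictionary access no matter how many texts are indexed, which is why searching through the inverted index is fast.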

What scenarios is Elasticsearch suited to?

Common scenarios
  1. Search
    Common search scenarios such as e-commerce sites, job sites, news sites, and in-app search of all kinds.

  2. Log analysis
    The classic ELK stack (Elasticsearch / Logstash / Kibana) covers log collection, log storage, and a query interface for log analysis. It is a very popular solution, and many companies use it for system log analysis.

  3. Data alerting platforms and data analysis
    For example, e-commerce price alerts: the platform lets a user set a price alert, and when the price drops below a given value it sends a notification prompting the user to buy.
    Data analysis examples include the top 10 best-selling brands on an e-commerce platform, or for a blog or news site the top 10 items by follows, comments, and views.

  4. Business intelligence (BI) systems
    For example, a large retail supermarket needs to analyze last quarter's customer spending, age distribution, daily store visits by time of day, and similar data, output the corresponding reports, predict next quarter's best sellers, and recommend suitable products by age group. Elasticsearch performs the data analysis and mining, and Kibana handles the visualization.
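To make the "top 10 brands by sales" analysis from scenario 3 more concrete, here is a hedged Python sketch that builds a search request body using a terms aggregation ordered by a sum sub-aggregation. The field names brand and amount and the aggregation names are hypothetical, and the sketch assumes brand is mapped as a keyword field; the body would be sent to the _search endpoint of whichever index holds the order data.

```python
import json


def top_brands_body(size=10):
    """Build a search body that ranks brands by total sales amount.

    "brand" and "amount" are hypothetical field names used for illustration.
    """
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "top_brands": {
                "terms": {
                    "field": "brand",
                    "size": size,
                    "order": {"total_sales": "desc"},
                },
                "aggs": {
                    "total_sales": {"sum": {"field": "amount"}},
                },
            }
        },
    }


body = top_brands_body()
print(json.dumps(body, indent=2))
```

Building the body as a plain dictionary keeps the example independent of any client library; the same JSON can be pasted into Kibana or sent with curl.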
Common cases
  • Wikipedia, Baidu Baike: full-text search, highlighting, search suggestions.
  • Stack Overflow: full-text search that finds solutions based on the key terms of an error message.
  • GitHub: searching for the code you want among hundreds of billions of lines of code.
  • Log analysis systems: ELK platforms built in-house by companies.
  • And many more

Elasticsearch architecture diagram

[Figure: Elasticsearch functional framework]

A brief explanation of each component in the architecture:

  • gateway: the underlying storage system; it generally supports several kinds of file systems.
  • distributed lucene directory: a distributed framework built on Lucene that encapsulates inverted index creation, data storage, the translog, segments, and so on.
  • ES modules: the main module layer of ES, including the index module, the search module, and the mapping module.
  • discovery: the cluster node discovery module, used for communication between cluster nodes and for master election; it supports several discovery mechanisms, such as zen and ec2.
  • script: the script parsing module, which supports scripts written into queries, in languages such as painless, groovy, and python.
  • plugins: third-party plug-ins that provide a variety of advanced features and support customization.
  • transport / jmx: the communication module for data transmission; it uses the netty framework underneath.
  • restful / node: the interfaces through which external clients access the Elasticsearch cluster.
  • x-pack: an Elasticsearch extension pack that integrates security, alerting, monitoring, reporting, and graph features in a pluggable design that plugs in seamlessly.

Elasticsearch installation

Official website

https://www.elastic.co/cn/
The site has download links for every version, the official documentation, and usage examples. Download the installation package that matches your system.

Source code

https://github.com/elastic/elasticsearch
The repository contains the source for every version, and you can switch to a specific version to study it. This series uses version 6.3.1.

Installation steps
  1. Environment requirements
    JDK 1.8 or later
  2. Download the installation package from the official website and extract it into a directory of your choice
  3. Run bin/elasticsearch (Linux; do not run it as the root account)
    or bin\elasticsearch.bat (Windows)
  4. Run curl http://localhost:9200/ or open http://localhost:9200/ in a browser. The following response indicates a successful startup:
{
  "name" : "node-1",
  "cluster_name" : "hy-application",
  "cluster_uuid" : "lJ4DRWOvQauAy-VEYiZc2g",
  "version" : {
    "number" : "6.3.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "eb782d0",
    "build_date" : "2018-06-29T21:59:26.107521Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
  5. Download, extract, and start Kibana
    Run bin/kibana (Linux; do not run it as the root account) or bin\kibana.bat (Windows). If Kibana and Elasticsearch are deployed on the same machine, the default configuration file works as-is.
  6. Verify Kibana: open http://192.168.17.137:5601/ in a browser (replace the address with your own Kibana host). If the Kibana page loads, startup succeeded.
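The Elasticsearch check in step 4 can also be scripted. The sketch below uses only the Python standard library; fetch_root and summarize are names made up for this example, and the URL assumes a node listening on localhost:9200 as in the steps above. The demo at the bottom parses a shortened copy of the response shown earlier, so it runs without a live node.

```python
import json
import urllib.request


def fetch_root(url="http://localhost:9200/", timeout=5):
    """GET the Elasticsearch root endpoint and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def summarize(body):
    """Pull the node name, cluster name, and version number out of the root response."""
    info = json.loads(body)
    return info["name"], info["cluster_name"], info["version"]["number"]


# Shortened copy of the startup response, used so the demo needs no network.
sample = (
    '{"name": "node-1", "cluster_name": "hy-application", '
    '"version": {"number": "6.3.1"}, "tagline": "You Know, for Search"}'
)
print(summarize(sample))  # ('node-1', 'hy-application', '6.3.1')
```

Against a running node, summarize(fetch_root()) performs the same check on the live response; a connection error from fetch_root means the node is not reachable at that address.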


Summary

This article briefly introduced the basic concepts of Elasticsearch, the scenarios it suits, and the main functional framework, along with the simplest installation and startup procedure for verifying a learning setup, as an opening to learning Elasticsearch systematically. Elasticsearch works out of the box: for learning, or for small and medium applications with modest data volumes and fairly simple operations, you can start it and use it directly. Unless stated otherwise, the rest of this series uses version 6.3.1.


Origin blog.51cto.com/2123175/2480897