Learning ElasticSearch (1): An Introduction to ElasticSearch

  One, What is ElasticSearch?

  ElasticSearch is a powerful open source search and analytics engine built on Lucene. It helps you quickly find relevant information in massive amounts of data.

  For example, when you search on GitHub, ElasticSearch not only helps you find relevant repositories but also provides code-level search with highlighting; when you shop online, ElasticSearch can help recommend related products; when you hail a ride, ElasticSearch can locate nearby drivers and passengers, helping the platform optimize dispatching.

  Beyond search, combined with the open source products Kibana, Logstash, and Beats, the Elastic Stack (commonly known as ELK) is also widely used for near-real-time analysis of big data, including log analysis, metrics monitoring, and information security. It can help you explore massive amounts of structured and unstructured data, create the visual reports you need, set alert thresholds on monitored data, and use machine learning to detect anomalies automatically.

  ElasticSearch is a search engine developed in Java that exposes a RESTful web API. It was originally released as open source under the Apache license and is currently a popular enterprise search engine. Clients are available for Java, C#, PHP, Python, and many other languages. They can be found at: https://www.elastic.co/guide/en/elasticsearch/client/index.html

  ElasticSearch therefore has two advantages:

  1) it is natively distributed and horizontally scalable;

  2) it provides a RESTful interface, which lowers the learning curve for full-text search and means it can be called from any programming language;
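
  To illustrate the second point, here is a minimal sketch of what calling the RESTful interface looks like using only the Python standard library. The node address (http://localhost:9200), the index name "products", and the field name "title" are assumptions made up for this example; the request is only built, not sent.

```python
import json
from urllib.request import Request

# Assumed values for illustration only: a local node on the default port,
# a hypothetical index called "products" and a hypothetical field "title".
BASE_URL = "http://localhost:9200"
INDEX = "products"


def build_search_request(text: str) -> Request:
    """Build (but do not send) a full-text search request."""
    body = {"query": {"match": {"title": text}}}
    return Request(
        url=f"{BASE_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("wireless mouse")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

  Passing the request to urllib.request.urlopen would send it to a running node. Because the interface is just HTTP plus JSON, the same call can be made from any language that has an HTTP client.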

  The advantages and disadvantages of Lucene:

  Advantages: 1) high performance; 2) easy to extend;

  Disadvantages:

  1) it can only be used for development in Java;

  2) its class library has a steep learning curve;

  3) it does not natively support horizontal scaling;

  Two, Components of the Elastic Stack

  1, ElasticSearch: data search, analysis, and storage. It is a JSON-based distributed search and analytics engine designed for horizontal scalability, reliability, and ease of management.

Its working principle can be divided into the following steps:

    1) First, the user submits data to ElasticSearch;

    2) ElasticSearch then uses an analyzer to split the corresponding text into terms;

    3) The resulting terms are stored together with their weights. When a user searches, the results are scored and ranked by weight and then returned to the user;
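
    The three steps above can be sketched with a toy inverted index. This is not how ElasticSearch is implemented: the whitespace tokenizer and the raw term-frequency score are simplifying assumptions for illustration, whereas ElasticSearch relies on Lucene's analyzers and, by default, BM25 scoring.

```python
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Very naive analyzer: lowercase and split on whitespace."""
    return text.lower().split()


class ToyIndex:
    def __init__(self) -> None:
        # term -> {doc_id: how often the term occurs in that document}
        self.postings: dict[str, dict[int, int]] = defaultdict(dict)

    def add(self, doc_id: int, text: str) -> None:
        """Steps 1 and 2: accept a document and split it into terms."""
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq  # stored weight

    def search(self, query: str) -> list[tuple[int, int]]:
        """Step 3: score matching documents and return them ranked."""
        scores: Counter = Counter()
        for term in tokenize(query):
            for doc_id, freq in self.postings.get(term, {}).items():
                scores[doc_id] += freq
        # Highest score first; ties are broken by document id.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = ToyIndex()
index.add(1, "quick brown fox")
index.add(2, "quick quick search engine")
index.add(3, "slow brown bear")
print(index.search("quick brown"))  # [(1, 2), (2, 2), (3, 1)]
```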

  2, Kibana: data visualization. Its role is to help you navigate the data in ElasticSearch. Kibana presents data in graphical form and has an extensible user interface for configuring and managing all aspects of ElasticSearch.

    Kibana was originally created as a tool built on Logstash and was acquired by Elastic in 2013.

    1) Kibana provides many kinds of visualization charts;

    2) it can use machine learning to detect anomalies and flag suspicious problems early;

  3, Beats is a platform of lightweight data collectors. These collectors send data from edge machines to Logstash and ElasticSearch. Beats is written in Go and runs efficiently. As can be seen from the figure, there are different Beats packages for different data sources.

  

  4, Logstash: a dynamic data collection pipeline with an extensible plugin ecosystem. It collects data from a variety of sources, transforms it, and sends it to different data stores. It works closely with ElasticSearch and was acquired by Elastic in 2013.

    It has the following characteristics:

    1) it parses and transforms data in real time;

    2) it is extensible, with more than 200 plugins;

    3) reliability and security. Logstash uses persistent queues to guarantee at-least-once delivery of in-flight events, and it encrypts data in transit;

    4) monitoring;

    Logging solutions generally cover: log search, formatted analysis, full-text search, and risk alerting;

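  Three, Why learn ElasticSearch?

  According to the DB-Engines ranking, ElasticSearch is the most popular enterprise search engine. The ranking at https://db-engines.com/en/ranking shows three big-data search engines near the top: besides ElasticSearch, there are Splunk and Solr. Solr is also based on Lucene.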

  

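  1, In today's software industry, search is a basic feature of any software system or platform. Learning ElasticSearch lets you build a good search experience for your software.

  2, ElasticSearch also has very strong big-data analysis capabilities. Hadoop can do big-data analysis too, but ElasticSearch offers near-real-time analysis that Hadoop lacks. With Hadoop, for example, you may sometimes wait a long time for a result.

  3, ElasticSearch is easy to use. You can install it on a personal laptop, and you can scale it horizontally in a production environment.

  4, Large Chinese internet companies use it, for example Xiaomi, Didi, and Ctrip. Tencent Cloud and Alibaba Cloud also offer ElasticSearch cloud products.

  5, In the era of big data, mastering near-real-time search and analysis is the way to stay competitive and anticipate what comes next. (And to get a raise.)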

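  Four, Angles from which to learn ElasticSearch

  1, Development

    1) understand the basic features of ElasticSearch; 2) understand the underlying distributed working principles; 3) model your data;

  2, Operations

    1) plan cluster capacity; 2) perform rolling upgrades of the cluster; 3) tune performance; 4) diagnose and resolve problems when they occur;

  3, Solutions

    1) after learning ElasticSearch, you can solve search-related problems for your actual situation; 2) you can apply ELK to big-data analysis scenarios;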

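  Five, Main features and use cases of ElasticSearch

  1, Main features:

    1) distributed storage of massive data and cluster management, providing high availability and horizontal scaling for both services and data;

    2) near-real-time search with excellent performance, handling structured, full-text, geolocation, and other types of data;

    3) near-real-time analysis of massive data (aggregations)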
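
    As a rough illustration of the aggregation feature, the sketch below builds the JSON body of a terms aggregation, which counts documents per distinct value of a field. The field name "level.keyword" and the aggregation name "by_value" are hypothetical; the body would be sent to the _search endpoint of an index.

```python
import json


def build_terms_aggregation(field: str, top_n: int = 5) -> dict:
    """Build a search body that returns only per-value document counts."""
    return {
        "size": 0,  # return no individual hits, only the aggregation result
        "aggs": {
            # "by_value" is an arbitrary name chosen for this example
            "by_value": {"terms": {"field": field, "size": top_n}},
        },
    }


body = build_terms_aggregation("level.keyword", top_n=3)
print(json.dumps(body, indent=2))
```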

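  2, Use cases:

    1) site search, vertical search, code search;

    2) log management and analysis, security metrics monitoring, application performance monitoring, web crawling and public opinion analysis;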

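  Six, The Elastic Stack ecosystem

  

  As the figure above shows, ElasticSearch is the core of ELK and is responsible for data storage. Kibana sits on top and provides users with a visual interface. Logstash and Beats capture and collect all kinds of data.

The X-Pack section on the right covers several paid services offered by Elastic. Elastic also offers cloud solutions.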

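  Seven, Integrating ElasticSearch with a database

  

  The figure above covers two cases:

  1, Use ElasticSearch itself as the database that stores the data. The benefit is a relatively simple architecture;

  2, If the data is updated frequently and transactional guarantees matter, store the data in a database first, then set up a suitable synchronization mechanism to sync it to ElasticSearch;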
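
  One possible shape for the synchronization step in the second case is to turn changed database rows into a body for the ElasticSearch bulk API, which expects newline-delimited JSON: an action line followed by the document itself. The sketch below only builds that body. The "users" index, the row layout, and the use of the "id" column as the document id are assumptions for illustration, and how changed rows are detected is left out.

```python
import json


def rows_to_bulk_body(index: str, rows: list[dict]) -> str:
    """Convert database rows into a newline-delimited bulk request body."""
    lines = []
    for row in rows:
        doc = dict(row)          # copy so the caller's row is not modified
        doc_id = doc.pop("id")   # assumed primary key column
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


sample_rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
bulk_body = rows_to_bulk_body("users", sample_rows)
print(bulk_body, end="")
```

  Reusing the primary key as the document id means that syncing the same row again overwrites the existing document instead of creating a duplicate.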

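  Eight, ELK architecture for metrics collection and log analysis

  

  As the figure above shows, data is collected by Beats or by applications. When the volume of collected data is large, a buffering layer (Redis, Kafka, or RabbitMQ) needs to be added. The data is then sent to Logstash for aggregation and processing, and finally ElasticSearch tokenizes it, builds the index, and stores it. Graphical tools such as Kibana or Grafana are used for data visualization and analysis.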
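
  The flow of data through these layers can be mimicked in a few lines of Python. This is purely a toy model: an in-memory queue stands in for the buffering layer, a list stands in for ElasticSearch, and the "LEVEL message" log format is invented for the example.

```python
import queue
from collections import Counter


def collect(lines: list[str], buffer: queue.Queue) -> None:
    """Collector role (Beats or an application): push raw lines to the buffer."""
    for line in lines:
        buffer.put(line)


def process(buffer: queue.Queue) -> list[dict]:
    """Processing role (Logstash): drain the buffer and parse each line."""
    events = []
    while not buffer.empty():
        raw = buffer.get()
        level, _, message = raw.partition(" ")
        events.append({"level": level, "message": message})
    return events


buffer: queue.Queue = queue.Queue()   # stands in for Redis, Kafka, or RabbitMQ
store: list[dict] = []                # stands in for ElasticSearch

collect(["INFO service started", "ERROR disk full", "INFO request served"], buffer)
store.extend(process(buffer))

# Visualization role (Kibana or Grafana): summarize the stored events.
counts = Counter(event["level"] for event in store)
print(dict(counts))
```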

 

  Source of this material: 《Elasticsearch核心技术与实战》 (Elasticsearch Core Technology and Practice)

Origin www.cnblogs.com/supersnowyao/p/11110703.html