Lucene

1. The first step in keyword retrieval is to segment the entire document into words (tokens)

English: split on spaces

Chinese: use a Chinese dictionary together with a Chinese word segmenter

Chinese word segmenters include IK and Paoding; both can be used directly once configured.
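
The two segmentation styles above can be sketched roughly as follows. This is only an illustration: the Chinese part uses forward maximum matching against a tiny made-up dictionary (`DICTIONARY` and `max_word_len` are assumptions for the example), and it is not how IK or Paoding are actually implemented.

```python
def tokenize_english(text):
    """Split English text on whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


def tokenize_chinese(text, dictionary, max_word_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary, for illustration only.
DICTIONARY = {"全文", "检索", "全文检索", "中间件"}

print(tokenize_english("Full text Search"))
print(tokenize_chinese("全文检索中间件", DICTIONARY))
```

A character that matches no dictionary entry falls through as a single-character token, which is why the quality of the dictionary matters so much for Chinese text.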

 

2. Lucene's open source projects

--Lucene Core: the core class library, written in Java, which provides the low-level API and SDK for full-text search

--Solr: a high-performance search service built on Lucene Core, which wraps it in a higher-level REST API

 

3. Five basic classes

--Document: any document to be searched

--Field: a property of a Document

--IndexWriter: builds the index and persists it to the Directory

--Analyzer: extracts keywords by segmenting the document content into words

--Directory: represents the storage location of the Lucene index (memory or file system)
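
To show how these five roles fit together, here is a toy model in Python. It is not the Lucene API: the class names only mirror the roles listed above, and the `search` helper and the sample documents are invented for the example.

```python
from collections import defaultdict


class Document:
    """A searchable record made of named fields (field name -> text)."""
    def __init__(self, **fields):
        self.fields = dict(fields)


class Analyzer:
    """Turns field text into index terms; here just lowercase whitespace splitting."""
    def analyze(self, text):
        return text.lower().split()


class Directory:
    """Stands in for the index storage location; this one lives in memory."""
    def __init__(self):
        self.postings = defaultdict(set)   # (field, term) -> set of document ids
        self.documents = []                # position in the list is the document id


class IndexWriter:
    """Analyzes documents and writes their terms into the Directory."""
    def __init__(self, directory, analyzer):
        self.directory = directory
        self.analyzer = analyzer

    def add_document(self, doc):
        doc_id = len(self.directory.documents)
        self.directory.documents.append(doc)
        for name, text in doc.fields.items():
            for term in self.analyzer.analyze(text):
                self.directory.postings[(name, term)].add(doc_id)
        return doc_id


def search(directory, field, term):
    """Return the sorted ids of documents whose field contains the term."""
    return sorted(directory.postings.get((field, term.lower()), set()))


directory = Directory()
writer = IndexWriter(directory, Analyzer())
writer.add_document(Document(title="Lucene in Action", body="full text search library"))
writer.add_document(Document(title="Solr Guide", body="search service built on Lucene"))
print(search(directory, "body", "search"))
```

The key structure is the inverted index in `postings`: a lookup goes from a term to the documents containing it, rather than scanning every document for the term.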

 

4. Disadvantage of Lucene: it is a single-node library and cannot be distributed on its own

Solr and Elasticsearch are distributed full-text retrieval middleware built on Lucene

 

5. Lucene is often combined with web crawlers: the crawlers collect product information from various e-commerce platforms and write it into the Lucene index

 

6. SolrCloud: Solr's distributed cluster mode

Decentralized: nodes are coordinated through ZooKeeper

Sharding algorithm: documents are routed to shards by a hash of the document ID

Leader node: elected through ZooKeeper

If each shard keeps 3 copies of its data, that means one leader and two replicas

Index data is replicated synchronously: a write request is forwarded to the leader node of the shard, and the leader then synchronizes it to each replica node.
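
The write path described above (hash the document ID, pick a shard, write to its leader, copy to the replicas) can be sketched as a small toy model. Everything here is an assumption made for illustration: `zlib.crc32` stands in for the real hash function, each shard simply owns an equal slice of the hash space, and the `Shard` and `Collection` classes are invented names, not Solr classes.

```python
import zlib


class Shard:
    """One shard: a leader copy plus replica copies of the same data."""
    def __init__(self, name, copies=3):
        self.name = name
        self.leader = {}
        self.replicas = [{} for _ in range(copies - 1)]

    def write(self, doc_id, doc):
        # The leader applies the write first, then pushes it to every replica.
        self.leader[doc_id] = doc
        for replica in self.replicas:
            replica[doc_id] = doc


class Collection:
    """Routes each document to the shard that owns its hash range."""
    HASH_SPACE = 2 ** 32

    def __init__(self, num_shards):
        self.shards = [Shard(f"shard{i + 1}") for i in range(num_shards)]
        self.range_size = self.HASH_SPACE // num_shards

    def shard_for(self, doc_id):
        # crc32 is a stand-in hash (assumption), chosen because it is in the stdlib.
        h = zlib.crc32(doc_id.encode("utf-8"))
        index = min(h // self.range_size, len(self.shards) - 1)
        return self.shards[index]

    def add(self, doc_id, doc):
        shard = self.shard_for(doc_id)
        shard.write(doc_id, doc)
        return shard.name


collection = Collection(num_shards=2)
for doc_id in ["sku-1", "sku-2", "sku-3"]:
    print(doc_id, "->", collection.add(doc_id, {"id": doc_id}))
```

Because the shard is derived from the document ID alone, any node can work out where a document belongs without asking a central coordinator.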

 

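7. Near-real-time search relies on soft commits

--soft commit: makes newly indexed documents searchable from memory, without flushing them to disk

--hard commit: flushes the index data to disk so that it is durable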
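
The difference between the two commit types can be illustrated with a deliberately simplified toy model. `ToyIndex` and all of its methods are made up for this sketch, and it ignores the additional recovery mechanisms a real search server has; it only separates "visible to searches" from "safe on disk".

```python
class ToyIndex:
    """Toy model that tracks visibility and durability separately."""
    def __init__(self):
        self.docs = []      # every document added so far, in order
        self.visible = 0    # how many of them searches can see
        self.durable = 0    # how many of them would survive a crash

    def add(self, doc):
        self.docs.append(doc)

    def soft_commit(self):
        # Cheap: expose everything added so far to searches, nothing hits disk.
        self.visible = len(self.docs)

    def hard_commit(self):
        # Expensive: mark everything added so far as persisted.
        self.durable = len(self.docs)

    def search(self, word):
        return [doc for doc in self.docs[:self.visible] if word in doc]

    def crash_and_recover(self):
        # Only what was hard-committed comes back after a restart.
        self.docs = self.docs[:self.durable]
        self.visible = self.durable


index = ToyIndex()
index.add("red shoes")
print(index.search("red"))   # nothing visible before any commit
index.soft_commit()
print(index.search("red"))   # visible after the soft commit
```

Frequent soft commits keep search results fresh at low cost, while less frequent hard commits bound how much data could be lost.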

 

8. ELK

Elasticsearch (ES): the distributed search engine that stores and indexes the data

Logstash: A data collection engine with real-time pipeline capability, used to collect log data and write it into the ES cluster as index data

Kibana: provides a web interface for data analysis and data visualization for ES

 

9. ES is also a distributed system, but instead of using ZooKeeper it ships its own module, Zen Discovery, which is responsible for automatic discovery of the nodes in the cluster and for electing the master node.

discovery.zen.minimum_master_nodes: sets the minimum number of master-eligible nodes that must be able to communicate with each other before a master can be elected
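
A commonly recommended value for this setting is a quorum of the master-eligible nodes, that is, more than half of them, so that two halves of a partitioned cluster cannot both elect a master. The helper names below are hypothetical and only illustrate the arithmetic.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: strictly more than half of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """A group of nodes may elect a master only if it reaches the quorum."""
    return visible_nodes >= minimum_master_nodes(master_eligible_nodes)


for n in (1, 3, 5):
    print(n, "master-eligible nodes -> minimum_master_nodes =", minimum_master_nodes(n))
```

For example, if a 3-node cluster splits into a group of 2 and a group of 1, only the group of 2 reaches the quorum of 2, so only one master can exist.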

ES has a Tribe node: it can connect to multiple clusters and run searches and other operations across all of them

 
