Spring Boot integrates Elasticsearch and Logstash for MySQL data synchronization and full-text search

Elasticsearch

Let's first get to know Elasticsearch.

Official site: https://www.elastic.co/

Elasticsearch is an open source, distributed search and analytics engine designed to store and process large-scale data in real time. Its main strengths are fast, powerful search and flexible data analysis. Here are some of its key features and uses:

  • Distributed architecture: Elasticsearch is designed as a distributed system that runs across multiple servers as a cluster. Data is automatically sharded and replicated across the cluster, providing high availability and scalability.

  • Real-time search and analysis: Elasticsearch offers very fast, near-real-time search, performing full-text queries over large data sets and supporting many query types such as fuzzy search, exact match, and range queries.

  • Support for multiple data types: beyond text, Elasticsearch can index and query numbers, dates, geographic locations, and other data types.

  • Full-text search engine: Elasticsearch provides powerful full-text search, including tokenization (word segmentation), semantic analysis, spelling correction, and more, and can quickly find relevant results in large volumes of text.

  • Rich queries and filters: a rich query syntax and filtering facilities let users retrieve data precisely.

  • Document-oriented storage: Elasticsearch is document-oriented; data is stored as JSON documents, and each document has a unique identifier called the document ID.

Elasticsearch is widely used across many fields, including enterprise search, log and event analytics, e-commerce site search, content management systems, and business-metrics monitoring and visualization. Its search and analysis capabilities make it an important tool for working with large-scale data.
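To make "document-oriented" and "full-text search" concrete, here is a minimal, self-contained example (the index name docs and the field title are invented for illustration): first index a JSON document, then run a match query against it.

POST /docs/_doc/1
{
    "title": "Spring Boot integrates Elasticsearch",
    "tags": ["spring", "elasticsearch"]
}

GET /docs/_search
{
    "query": {
        "match": { "title": "elasticsearch" }
    }
}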

Spring Boot backend configuration

maven configuration

Search for the dependency on mvnrepository.com and add it to pom.xml:

<!-- https://mvnrepository.com/artifact/org.springframework.boot/spring-boot-starter-data-elasticsearch -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

application.properties configuration

#---------------------------------------
# Elasticsearch configuration
#---------------------------------------
# connection URI
spring.elasticsearch.rest.uris=127.0.0.1:9200
# username
spring.elasticsearch.rest.username=elasticsearch
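A caveat: the spring.elasticsearch.rest.* keys above come from older Spring Boot 2.x releases and have since been deprecated. On Spring Boot 2.7+ or 3.x, the equivalent keys drop the rest segment — a sketch, assuming a local cluster (the password value is a placeholder):

# Spring Boot 2.7+ / 3.x equivalents
spring.elasticsearch.uris=http://127.0.0.1:9200
spring.elasticsearch.username=elasticsearch
spring.elasticsearch.password=<your-password>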

model layer

Define the entity class FileSearchEntity and place it in the model layer:

package com.ikkkp.example.model.po.es;

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// Maps to the "file_search" index; the analyzer_* text fields use the IK Chinese analyzers.
@Data
@Document(indexName = "file_search")
public class FileSearchEntity {

    @Id // maps to the Elasticsearch _id
    private Integer fileID;

    @Field(name = "analyzer_title", type = FieldType.Text, searchAnalyzer = "ik_max_word", analyzer = "ik_smart")
    private String title;

    @Field(name = "analyzer_abstract_content", type = FieldType.Text, searchAnalyzer = "ik_max_word", analyzer = "ik_smart")
    private String abstractContent;

    @Field(type = FieldType.Integer)
    private Integer size;

    @Field(name = "file_type", type = FieldType.Text)
    private String fileType;

    @Field(name = "upload_username", type = FieldType.Text)
    private String uploadUsername;

    @Field(name = "preview_picture_object_name", type = FieldType.Text)
    private String previewPictureObjectName;

    @Field(name = "payment_method", type = FieldType.Integer)
    private Integer paymentMethod;

    @Field(name = "payment_amount", type = FieldType.Integer)
    private Integer paymentAmount;

    @Field(name = "is_approved", type = FieldType.Boolean)
    private Boolean isApproved; // Boolean, to match the FieldType.Boolean mapping

    @Field(name = "hide_score", type = FieldType.Double)
    private Double hideScore;

    @Field(name = "analyzer_content", type = FieldType.Text, searchAnalyzer = "ik_max_word", analyzer = "ik_smart")
    private String content;

    @Field(name = "analyzer_keyword", type = FieldType.Keyword)
    private String keyword;

    @Field(name = "is_vip_income", type = FieldType.Text)
    private String isVipIncome;

    @Field(name = "score", type = FieldType.Text)
    private String score;

    @Field(name = "raters_num", type = FieldType.Text)
    private String ratersNum;

    @Field(name = "read_num", type = FieldType.Text)
    private String readNum;
}
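For orientation, here is a rough sketch of what one document's _source will look like in the file_search index once the Logstash pipeline described later has run (all values are invented; the field names come from the @Field(name = ...) annotations above and the SQL aliases below):

{
    "file_id": 1,
    "analyzer_title": "Spring Boot integrates Elasticsearch",
    "suggest_title": "Spring Boot integrates Elasticsearch",
    "analyzer_abstract_content": "A short abstract of the file ...",
    "size": 2048,
    "file_type": "pdf",
    "upload_username": "alice",
    "is_approved": true,
    "hide_score": 0.5,
    "analyzer_content": "The full text extracted from the file ...",
    "suggest_keyword": "elasticsearch",
    "analyzer_keyword": "elasticsearch",
    "is_vip_income": "false",
    "score": "4.5",
    "raters_num": "12",
    "read_num": "100"
}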

service layer

Define the class ESearchService and place it in the service layer.

searchFile: builds a compound query with highlighting options; the returned hits carry highlight fragments that the front end can use to highlight the matching keywords when displaying results. We use this method to search the full text of a file.

suggestTitle: powers real-time auto-completion in the search box, quickly returning likely completions for the prefix the user has typed.

package com.ikkkp.example.service.esImpl;

import com.ikkkp.example.model.po.es.FileSearchEntity;
import org.elasticsearch.common.unit.Fuzziness;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.suggest.Suggest;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import org.elasticsearch.search.suggest.completion.CompletionSuggestionBuilder;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.stereotype.Service;

import javax.annotation.Resource;
import java.util.*;

@Service
public class ESearchService {

    @Resource
    private ElasticsearchRestTemplate elasticsearchRestTemplate;

    public SearchHits<FileSearchEntity> searchFile(String keywords, Integer page, Integer rows) {
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
                .should(QueryBuilders.fuzzyQuery("analyzer_title", keywords).fuzziness(Fuzziness.AUTO))
                .should(QueryBuilders.fuzzyQuery("analyzer_content", keywords).fuzziness(Fuzziness.AUTO))
                .should(QueryBuilders.fuzzyQuery("analyzer_abstract_content", keywords).fuzziness(Fuzziness.AUTO))
                .must(QueryBuilders.multiMatchQuery(keywords, "analyzer_title", "analyzer_content", "analyzer_abstract_content"))
                .must(QueryBuilders.matchQuery("is_approved", "true")); // only approved files may be retrieved

        // build the query with highlighting
        NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
                .withQuery(boolQueryBuilder)
                .withHighlightFields(
                        new HighlightBuilder.Field("analyzer_title"),
                        new HighlightBuilder.Field("analyzer_abstract_content"),
                        new HighlightBuilder.Field("analyzer_content"))
                .withHighlightBuilder(new HighlightBuilder().preTags("<span class='highlight'>").postTags("</span>"))
                .withPageable(PageRequest.of(page - 1, rows))
                .build();

        return elasticsearchRestTemplate.search(searchQuery, FileSearchEntity.class);
    }

    public ArrayList<String> suggestTitle(String keyword, Integer rows) {
        return suggest("suggest_title", keyword, rows);
    }

    public ArrayList<String> suggest(String fieldName, String keyword, Integer rows) {
        Set<String> returnSet = new LinkedHashSet<>(); // insertion-ordered, deduplicated result set
        // build the completion suggestion for the given field
        CompletionSuggestionBuilder textBuilder = SuggestBuilders.completionSuggestion(fieldName)
                .size(rows)            // number of suggestions to return
                .skipDuplicates(true); // drop duplicate suggestions

        // wrap it in a SuggestBuilder and set the text to complete
        SuggestBuilder suggestBuilder = new SuggestBuilder();
        suggestBuilder.addSuggestion("suggest_text", textBuilder)
                .setGlobalText(keyword);
        // execute the request
        Suggest suggest = elasticsearchRestTemplate.suggest(suggestBuilder,
                elasticsearchRestTemplate.getIndexCoordinatesFor(FileSearchEntity.class)).getSuggest();
        // collect the options from the response
        Suggest.Suggestion<Suggest.Suggestion.Entry<CompletionSuggestion.Entry.Option>> textSuggestion = suggest.getSuggestion("suggest_text");
        for (Suggest.Suggestion.Entry<CompletionSuggestion.Entry.Option> entry : textSuggestion.getEntries()) {
            List<CompletionSuggestion.Entry.Option> options = entry.getOptions();
            for (Suggest.Suggestion.Entry.Option option : options) {
                returnSet.add(option.getText().toString());
            }
        }
        return new ArrayList<>(returnSet);
    }

}
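Under the hood, the suggest(...) call above is equivalent to an Elasticsearch completion-suggest request roughly like the following (shown for keyword = "spring" and rows = 5). Note that it targets the suggest_title field, which the index template later in this article maps as a completion type:

POST /file_search/_search
{
    "suggest": {
        "text": "spring",
        "suggest_text": {
            "completion": {
                "field": "suggest_title",
                "size": 5,
                "skip_duplicates": true
            }
        }
    }
}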

controller layer

Define the class DocSearchController and place it in the controller layer:

package com.ikkkp.example.controller;

import com.ikkkp.example.model.vo.MsgEntity;
import com.ikkkp.example.model.po.es.FileSearchEntity;
import com.ikkkp.example.service.esImpl.ESearchService;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import java.util.ArrayList;


@RestController
@Slf4j
@RequestMapping("/docSearchService")
public class DocSearchController {

    @Autowired
    ESearchService eSearchService;

    @RequestMapping(value = "/search",method = RequestMethod.GET)
    public MsgEntity<SearchHits<FileSearchEntity>> searchDoc(@RequestParam String keywords, @RequestParam Integer page, @RequestParam Integer rows) {
        return new MsgEntity<>("SUCCESS", "1", eSearchService.searchFile(keywords, page, rows));
    }

    @RequestMapping(value = "/suggest",method = RequestMethod.GET)
    public MsgEntity<ArrayList<String>> suggestTitle(@RequestParam String keyword, @RequestParam Integer rows) {
        ArrayList<String> suggests = eSearchService.suggestTitle(keyword, rows);
        return new MsgEntity<>("SUCCESS", "1", suggests);
    }

}
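With the application running (port 8080 here is an assumption), the two endpoints can be exercised like this; MsgEntity simply wraps the payload returned by the service:

GET http://localhost:8080/docSearchService/search?keywords=spring&page=1&rows=10
GET http://localhost:8080/docSearchService/suggest?keyword=spr&rows=5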

Now the Spring Boot side is basically configured, but a few points need to be made clear:

So far we read data directly from the ES index. In practice the data first has to get into ES somehow (and it can come from many sources). Taking MySQL as the example, below we pull MySQL data into ES on a schedule through Logstash.

Let's first understand Logstash:

Logstash


Logstash is an open source data-collection engine with real-time pipelining capabilities.

It ingests data from multiple sources, processes it, and then ships the transformed data to a "stash", that is, a data store.

Logstash lets us import data in any format into any data store, not just Elasticsearch.

It can also load data in parallel into other stores such as MongoDB or Hadoop, or even into AWS.

The data can come from files, streams, and other sources.

Logstash parses, transforms, and filters data. It can also derive structure from unstructured data, anonymize personal data, enable geolocation queries, and more.

A Logstash pipeline has two required elements, input and output, and an optional element, filter.

Input components consume data from sources, filter components transform data, and output components write data to one or more destinations.

Download it from the official website: https://www.elastic.co/downloads/logstash

It works right out of the box!

main directory structure

Create a mysqletc folder under the Logstash home directory.
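Judging from the paths referenced in the configuration below, the layout ends up roughly like this (the connector JAR version is whatever you downloaded):

elasticsearch-logstash/
├── bin/
└── mysqletc/
    ├── filesearch.sql
    ├── logstash-ik.json
    ├── mysql.conf
    └── mysql-connector-java-8.0.28.jar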

filesearch.sql is the SQL that Logstash runs to collect the data from MySQL:

SELECT
    file.file_id,

    file.title AS analyzer_title,
    file.title AS suggest_title,
    file.abstract_content AS analyzer_abstract_content,

    file.size,
    file.file_type,
    file.upload_username,
    file.preview_picture_object_name,
    file.payment_amount,
    file.payment_method,
    file.is_approved,
    file.hide_score,

    file_search.content AS analyzer_content,

    file_search.keyword AS suggest_keyword,
    file_search.keyword AS analyzer_keyword,

    file_extra.is_vip_income,
    file_extra.score,
    file_extra.raters_num,
    file_extra.read_num
FROM
    file
    INNER JOIN file_search ON file.file_id = file_search.file_id
    INNER JOIN file_extra ON file.file_id = file_extra.file_id

mysql.conf is the pipeline configuration file that Logstash executes:

input {
    stdin {
    }
    jdbc {
      # MySQL connection string; replace "yourdatabase" with your database name
      jdbc_connection_string => "jdbc:mysql://localhost:3306/yourdatabase"
      # username and password
      jdbc_user => "root"
      jdbc_password => "password"
      # path to the JDBC driver JAR
      jdbc_driver_library => "../mysqletc/mysql-connector-java-8.0.28.jar"
      # driver class name (com.mysql.cj.jdbc.Driver for Connector/J 8.x)
      jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
      jdbc_paging_enabled => "true"
      jdbc_page_size => "500"
      # path to the SQL file to execute
      statement_filepath => "../mysqletc/filesearch.sql"
      # cron-style schedule; fields (left to right): minute, hour, day of month, month, day of week.
      # All asterisks means the query runs every minute.
      schedule => "* * * * *"
      # document type
      type => "_doc"
    }
}
 
filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
    date {
        match => ["timestamp", "dd/MM/yyyy:HH:mm:ss Z"]
    }
}
 
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "file_search"
        document_type => "_doc"
        document_id => "%{file_id}"
        template_overwrite => true
        template => "../mysqletc/logstash-ik.json"
    }
    stdout {
        codec => json_lines
    }
}
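As configured, the pipeline re-imports the full result set every minute, which is fine for small tables but wasteful as data grows. The jdbc input also supports incremental sync via a tracking column; a sketch, assuming the file table had an update_time column (hypothetical here) and the SQL's WHERE clause were extended with AND update_time > :sql_last_value:

jdbc {
    # ... connection, driver and statement settings as above ...
    # remember the largest value of the tracking column between runs
    use_column_value => true
    tracking_column => "update_time"
    tracking_column_type => "timestamp"
    # file where the last seen value is persisted
    last_run_metadata_path => "../mysqletc/.logstash_jdbc_last_run"
    schedule => "* * * * *"
}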

Finally, logstash-ik.json. This JSON is the Elasticsearch index template referenced above; in this example it defines the mappings and settings applied to the index. Very important!

{
    "template": "*",
    "version": 50001,
    "settings": {
        "index.refresh_interval": "5s"
    },
    "mappings": {
        "dynamic_templates": [
            {
                "suggest_fields": {
                    "match": "suggest_*",
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "completion",
                        "norms": false,
                        "analyzer": "ik_max_word"
                    }
                }
            },
            {
                "analyzer_fields": {
                    "match": "analyzer_*",
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text",
                        "norms": false,
                        "analyzer": "ik_max_word",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            },
            {
                "string_fields": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {
                        "type": "text",
                        "norms": false
                    }
                }
            }
        ],
        "properties": {
            "@timestamp": {
                "type": "date"
            },
            "@version": {
                "type": "keyword"
            }
        }
    }
}
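After Logstash has run at least once, you can check that the template took effect and that documents arrived, for example:

curl "http://localhost:9200/file_search/_mapping?pretty"
curl "http://localhost:9200/file_search/_count?pretty"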

Run the pipeline from the bin directory

Finally, execute Logstash with the pipeline configuration:

K:\Data\elasticsearch-logstash\bin>logstash -f ../mysqletc/mysql.conf

Because of the schedule set above (all five cron fields are asterisks, i.e. run every minute), we can watch the command line import the MySQL data into ES once a minute.

Note!!! The Logstash installation path must not contain Chinese characters.

Original post: blog.csdn.net/ikkkp/article/details/132238717