Building a file search engine with ElasticSearch, FSCrawler and SearchUI

1. Requirements analysis

  • The company has a large number of equipment-maintenance office documents. To look up a specific piece of maintenance knowledge, equipment staff have to use the directory's index files to find a list of potentially relevant files on the server and then open them one by one. This is inefficient and makes for a poor experience.
  • Users want to keep the existing document-control system unchanged (drafting, release, revision and other controlled-document processes remain the responsibility of dedicated staff) and add a keyword-based document search engine on top of it to improve efficiency and the user experience.
  • This article builds a file search engine from ElasticSearch (the open-source search engine), FSCrawler (a file-system crawler that "uploads" documents into Elasticsearch) and SearchUI (a front-end page that queries Elasticsearch's search API).

2. ElasticSearch
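
  • Assuming Elasticsearch is installed and running as a standard local single node (listening on the default port 9200, which the FSCrawler configuration below expects), a quick check that it is reachable:

curl http://127.0.0.1:9200
# A JSON banner with the cluster name and version confirms the node is up.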

3. FSCrawler

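  • A minimal sketch of the first run (assuming the FSCrawler 2.x distribution is unpacked locally and run from its root directory): starting the crawler with a job name that does not exist yet makes it ask whether to create the job.

bin\fscrawler test
# FSCrawler warns that job [test] does not exist and asks whether to create it (y/N)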

  • Select y, and the program creates the following configuration file in the user directory (by default ~/.fscrawler/test/_settings.yaml); we then edit it to configure the task:
---
name: "test"
fs:
  url: "d:\\test" # 监控windows下的D盘test目录
  update_rate: "15m" # 间隔15分进行扫描
  excludes:
  - "*/~*"  #排除以~开头的文件
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  ssl_verification: true
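
  • Note that with ocr.enabled: true and pdf_strategy: "ocr_and_text", FSCrawler extracts embedded text and additionally runs OCR on images and scanned PDFs, so image-only maintenance documents become searchable as well; this assumes Tesseract (with the eng language pack configured above) is installed on the machine.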


  • After saving the configuration, we can start the FSCrawler crawler (the same bin\fscrawler test command as before):


  • After the startup succeeds, an additional status file (_status.json) appears in the job's configuration directory; it records each scheduled run of the file crawler:
{
  "name" : "test",
  "lastrun" : "2021-11-27T09:00:16.2043064",
  "indexed" : 0,
  "deleted" : 0
}
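
  • To confirm that documents are actually landing in Elasticsearch, you can query the index directly; by default FSCrawler indexes into an index named after the job (test here, assuming the defaults above were kept):

curl "http://127.0.0.1:9200/test/_count"
# Returns {"count": <number of indexed documents>, ...}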

4. SearchUI

  • First, download the front-end code from https://github.com/elastic/search-ui :
  • After unzipping the archive, open the examples\elasticsearch directory in VS Code:
  • Then modify the search.js, buildRequest.js and buildState.js files in turn (a combined sketch of all three follows this list):
  1. Modify search.js: set the job (index) path
  2. Modify buildRequest.js
  3. Modify buildState.js
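  • A rough sketch of those three files, assuming the defaults used above: the FSCrawler index is named test, Elasticsearch answers at http://127.0.0.1:9200, results use FSCrawler's standard fields (content, file.filename, path.virtual), and the download base URL points at the IIS site described below:

// search.js (sketch): POST the query built by buildRequest to the local index.
export default async function runRequest(body) {
  const response = await fetch("http://127.0.0.1:9200/test/_search", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  });
  return response.json();
}

// buildRequest.js (sketch): translate Search UI state into an Elasticsearch query.
export default function buildRequest(state) {
  const { searchTerm, resultsPerPage = 10, current = 1 } = state;
  return {
    from: (current - 1) * resultsPerPage,
    size: resultsPerPage,
    // "content" is the full-text field FSCrawler extracts from each document.
    query: searchTerm ? { match: { content: searchTerm } } : { match_all: {} },
    highlight: { fields: { content: {} } }
  };
}

// buildState.js (sketch): map Elasticsearch hits back into Search UI results.
const DOWNLOAD_BASE = "http://127.0.0.1:8080"; // assumed address of the IIS site below

export default function buildState(response, resultsPerPage) {
  const total =
    typeof response.hits.total === "object"
      ? response.hits.total.value // Elasticsearch 7+
      : response.hits.total;
  return {
    results: response.hits.hits.map(hit => ({
      id: { raw: hit._id },
      title: { raw: hit._source.file && hit._source.file.filename },
      // path.virtual is the file path relative to the crawled root directory;
      // prefixing the IIS site URL turns it into a direct download link.
      url: { raw: DOWNLOAD_BASE + ((hit._source.path && hit._source.path.virtual) || "") },
      content: { raw: hit._source.content }
    })),
    totalResults: total,
    totalPages: Math.ceil(total / resultsPerPage)
  };
}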
  • Note: so that users can download files directly from the links on the search results page, we publish the document directory as a file download service through IIS:
  • This download address is the base URL used in buildState.js.
  4. Finally, we modify app.js so that the fields rendered on the results page match the indexed field names (sketched below):
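  • A sketch of that wiring, assuming the three modules above and the standard @elastic/react-search-ui components; the titleField and urlField props on Results must match the keys produced in buildState.js:

import React from "react";
import { SearchProvider, SearchBox, Results } from "@elastic/react-search-ui";

import runRequest from "./search";
import buildRequest from "./buildRequest";
import buildState from "./buildState";

// Wire the custom Elasticsearch connector into Search UI.
const config = {
  onSearch: async state => {
    const { resultsPerPage } = state;
    const requestBody = buildRequest(state);
    const responseJson = await runRequest(requestBody);
    return buildState(responseJson, resultsPerPage);
  }
};

export default function App() {
  return (
    <SearchProvider config={config}>
      <div className="App">
        <SearchBox />
        {/* These field names must match the keys built in buildState.js */}
        <Results titleField="title" urlField="url" />
      </div>
    </SearchProvider>
  );
}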

5. Run the test

  • Install dependencies and run the program
# Install
npm install
# Run
npm start
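
  • npm start serves the example locally (with the stock create-react-app setup this is http://localhost:3000, an assumption based on the example's defaults); open it in a browser to reach the search page.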


  • Place files in the directory monitored by FSCrawler

  • Test the search results:


Source: blog.csdn.net/jpgzhu/article/details/121515258