Implementing full-text search for a search engine (based on Lucene)

When I was working on Go to Turntable, I published the code for its non-full-text search; readers who need it can find it on my blog. This article covers how to do full-text search. I spent a long time designing a new project, Opinions, and Opinions places high demands on full-text search, so I put a lot of time into studying the topic. You can try the search on the site first. Without further ado, here is the code:

public Map<String,Object>  articleSearchAlgorithms(SearchCondition condition,IndexSearcher searcher) throws ParseException, IOException{
         
            Map<String,Object> map =new HashMap<String,Object>();
             String[] filedsList=condition.getFiledsList();
             String keyWord=condition.getKeyWord();
             int currentPage=condition.getCurrentPage();
             int pageSize=condition.getPageSize();
             String sortField=condition.getSortField();
             boolean isASC=condition.isDESC(); // passed to SortField below as its "reverse" flag, so true means descending despite the name
             String sDate=condition.getsDate();
             String eDate=condition.geteDate();
             String classify=condition.getClassify();
             
            
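            // escape characters that have special meaning in Lucene query syntax
            keyWord=escapeExprSpecialWord(keyWord);
            
            BooleanQuery q1 = new BooleanQuery(); // filters: type and date range
            BooleanQuery q2 = new BooleanQuery(); // keyword queries
            BooleanQuery booleanQuery = new BooleanQuery(); // top-level query combining q1 and q2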
             
             if(classify!=null&&(classify.equals("guanzhi")||classify.equals("opinion")||classify.equals("write"))){
                 String typeId="1";// default type, used when classify is "write"
                 if(classify.equals("guanzhi")){
                     typeId="2";
                 }
                 if(classify.equals("opinion")){
                     typeId="3";
                 }
                 Query termQuery = new TermQuery(new Term("typeId",typeId));
                 q1.add(termQuery,BooleanClause.Occur.MUST);
             }

             if(sDate!=null&&eDate!=null){// apply the range filter only when both dates are given
                Query rangeQuery = new TermRangeQuery("writingTime", new BytesRef(sDate), new BytesRef(eDate),true, true);
                q1.add(rangeQuery,BooleanClause.Occur.MUST);
             }

            Sort sort = new Sort(); // sort by relevance score unless a sort field is given
            sort.setSort(SortField.FIELD_SCORE);
            if(sortField!=null){
                sort.setSort(new SortField(sortField, SortField.Type.STRING, isASC));
            }
            
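            int start = (currentPage - 1) * pageSize; // offset of the first hit on the requested page (pages are 1-based)
            int hm = start + pageSize; // how many hits to collect so that the requested page is covered
            
            // arguments after hm: fillFields, trackDocScores, trackMaxScore, docsScoredInOrder (Lucene 4.x signature)
            TopFieldCollector res = TopFieldCollector.create(sort,hm,false, false, false, false);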

            // exact match on the second configured field; exact and prefix are the two high-precision queries
            Term t0=new Term(filedsList[1],keyWord);
            TermQuery termQuery = new TermQuery(t0);
            q2.add(termQuery,BooleanClause.Occur.SHOULD);
            
            // prefix match on the same field
            Term t1=new Term(filedsList[1],keyWord);
            PrefixQuery prefixQuery=new PrefixQuery(t1);
            q2.add(prefixQuery,BooleanClause.Occur.SHOULD);
            
            // phrase and fuzzy matching, suited to analyzed (tokenized) content
            for(int i=0;i<filedsList.length;i++){ // run both queries against every field except index 1
                if(i!=1){
                    PhraseQuery phraseQuery=new PhraseQuery();
                    Term ts0=new Term(filedsList[i],keyWord);
                    phraseQuery.add(ts0);
                    
                    FuzzyQuery fQuery=new FuzzyQuery(new Term(filedsList[i],keyWord),2); // fuzzy match allowing up to 2 edits
                    
                    q2.add(phraseQuery,BooleanClause.Occur.SHOULD);
                    q2.add(fQuery,BooleanClause.Occur.SHOULD); // also pull in near matches
                }
            }

            MultiFieldQueryParser  queryParser = new MultiFieldQueryParser(Version.LUCENE_47,filedsList,analyzer);
            queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
            Query query = queryParser.parse(keyWord);

            q2.add(query,BooleanClause.Occur.SHOULD);
            
            // only add non-empty sub-queries; adding an empty BooleanQuery as MUST changes the results
            if(q1!=null && q1.toString().length()>0){
                booleanQuery.add(q1,BooleanClause.Occur.MUST);
            }
            if(q2!=null && q2.toString().length()>0){
                 booleanQuery.add(q2,BooleanClause.Occur.MUST);
            }
            
            searcher.search(booleanQuery, res);
            long amount = res.getTotalHits();
            TopDocs tds = res.topDocs(start, pageSize);
            map.put("amount",amount);
            map.put("tds",tds);
            map.put("query",booleanQuery);
            return map;
    }
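
The method above calls escapeExprSpecialWord, which the article does not show. Here is a minimal sketch of what such a helper might look like, assuming it simply backslash-escapes the characters that Lucene's classic query syntax treats as operators. The class name QueryEscaper and the method body are assumptions, not the author's code.

```java
// Hypothetical stand-in for the escapeExprSpecialWord helper, which the article does not show.
final class QueryEscaper {
    // Characters that Lucene's classic query syntax treats as operators.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escapeExprSpecialWord(String keyWord) {
        if (keyWord == null || keyWord.isEmpty()) {
            return keyWord; // nothing to escape
        }
        StringBuilder sb = new StringBuilder(keyWord.length() + 8);
        for (int i = 0; i < keyWord.length(); i++) {
            char c = keyWord.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Lucene also ships a static QueryParser.escape(String) that does the same job. Escaping only matters for the string handed to the query parser: TermQuery, PrefixQuery, and FuzzyQuery take the term text literally, so an added backslash becomes part of the term there.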

Note: SearchCondition in the code above reflects the specific requirements of Opinions. Adapt it to your own search conditions; it would be hard to write one version that suits every reader.
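
Since SearchCondition itself is not shown, here is a rough sketch of what it might contain, inferred only from the getters the search method calls. The fields, constructor, setters, and the two paging helpers are assumptions. The reader and URL accessors used by searchArticle are left out because they depend on Lucene types.

```java
// Hypothetical sketch of SearchCondition. The accessor names mirror the calls in
// the code above; the fields, constructor, setters, and helpers are assumptions.
final class SearchCondition {
    private final String[] filedsList; // fields to search; index 1 gets the exact and prefix queries
    private final String keyWord;
    private final int currentPage;     // 1-based page number
    private final int pageSize;
    private String sortField;          // null keeps the default relevance ordering
    private boolean desc;              // handed to SortField as its "reverse" flag
    private String sDate;              // inclusive lower bound for writingTime
    private String eDate;              // inclusive upper bound for writingTime
    private String classify;           // "guanzhi", "opinion", "write", or null for every type
    private boolean highlight;

    SearchCondition(String[] filedsList, String keyWord, int currentPage, int pageSize) {
        if (currentPage < 1 || pageSize < 1) {
            throw new IllegalArgumentException("currentPage and pageSize must be at least 1");
        }
        this.filedsList = filedsList.clone();
        this.keyWord = keyWord;
        this.currentPage = currentPage;
        this.pageSize = pageSize;
    }

    String[] getFiledsList() { return filedsList.clone(); }
    String getKeyWord() { return keyWord; }
    int getCurrentPage() { return currentPage; }
    int getPageSize() { return pageSize; }
    String getSortField() { return sortField; }
    boolean isDESC() { return desc; }
    String getsDate() { return sDate; }
    String geteDate() { return eDate; }
    String getClassify() { return classify; }
    boolean isHighlight() { return highlight; }

    void setSort(String sortField, boolean desc) { this.sortField = sortField; this.desc = desc; }
    void setDateRange(String sDate, String eDate) { this.sDate = sDate; this.eDate = eDate; }
    void setClassify(String classify) { this.classify = classify; }
    void setHighlight(boolean highlight) { this.highlight = highlight; }

    // Assumed helper: offset of the first hit on the requested page.
    int getStart() { return (currentPage - 1) * pageSize; }

    // Assumed helper: number of hits the collector must gather to cover that page.
    int getHitsNeeded() { return getStart() + pageSize; }
}
```

The two helpers repeat the paging arithmetic from the search method: page 3 with 10 results per page starts at offset 20, so the collector has to gather 30 hits.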

public Map<String, Object> searchArticle(SearchCondition condition) throws Exception{
            
        Map<String,Object> map =new HashMap<String,Object>();
        List<Write> list=new ArrayList<Write>();
        
         DirectoryReader reader=condition.getReader();
         String URL=condition.getURL();
         boolean isHighligth=condition.isHighlight();
         String keyWord=condition.getKeyWord();
         IndexSearcher searcher=getSearcher(reader,URL);
        
        try{
            Map<String,Object> output=articleSearchAlgorithms(condition,searcher);
            if(output==null){
                map.put("amount",0L);
                map.put("source",null);
                return map;
            }
            
            map.put("amount", output.get("amount"));
            TopDocs tds = (TopDocs) output.get("tds");
            ScoreDoc[] sd = tds.scoreDocs;
            Query query =(Query) output.get("query");
            
            for (int i = 0; i < sd.length; i++) {
                
                Document doc = searcher.doc(sd[i].doc);

                String id = doc.get("id");
                /* ---------- start: title and content highlighting; keep this block together ---------- */
                String temp=doc.get("title");
                String title =temp; //Not highlighted by default
                if(isHighligth){
                    // Highlight article title
                    Highlighter highlighterTitle = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                    highlighterTitle.setTextFragmenter(new SimpleFragmenter(40)); // fragment size in characters
                    TokenStream ts = analyzer.tokenStream("title", new StringReader(temp));
                    title= highlighterTitle.getBestFragment(ts,temp);
                    if(title==null){
                        title=temp.replace(keyWord,"<span style='color:red'>"+keyWord+"</span>"); // the highlighter returned null; fall back to plain replacement
                    }
                }
                
                String temp1=HtmlEnDecode.htmlEncode(doc.get("content")); // HTML-escape with our own helper method
                String content=temp1; // not highlighted by default
                
                if(isHighligth){
                    // highlight the content
                    Highlighter highlighterContent = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                    highlighterContent.setTextFragmenter(new SimpleFragmenter(Constant.HIGHLIGHT_CONTENT_LENGTH)); // fragment size in characters
                    //temp1=StringEscapeUtils.escapeHtml(temp1); // disabled: it escapes Chinese characters, which makes highlighting fail
                    TokenStream ts1 = analyzer.tokenStream("content", new StringReader(temp1));
                    content = highlighterContent.getBestFragment(ts1,temp1);
                    
                    if(content==null){
                        content=temp1.replace(keyWord,"<span style='color:red'>"+keyWord+"</span>"); // the highlighter returned null; fall back to plain replacement
                        
                        // only this fallback needs manual truncation; otherwise the highlighter cuts the fragment itself
                        content=subContent(content); // truncate
                        content=HtmlEnDecode.htmldecode(content); // HTML-decode
                        content=SubStringHTML.sub(content,Constant.HIGHLIGHT_CONTENT_LENGTH);
                    }
                    }
                }
                /* ---------- end: title and content highlighting ---------- */
                
                Write write=writeDao.getArticle(Long.parseLong(id));
                if(write!=null){
                    write.setTitle(title);
                    write.setContent(content);
                    
                    Date writingTime=write.getWritingTime();
                    String timeGap = DateUtil.dateGap(writingTime); // time gap
                    write.setTimeGap(timeGap);
                    
                    list.add(write);
                }
            }
            
        }catch(Exception e){
            e.printStackTrace();
        }
        map.put("source",list);
        return map;
    }

Note: the code above is written for one specific search scenario. Different applications have different requirements, so wrap the result objects and query the database according to your own needs. The code is given in full, with nothing held back, and it is usable as is.
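
One detail worth isolating is the fallback used when getBestFragment returns null: the code falls back to String.replace on the raw keyword. Here is a minimal sketch of that fallback as a standalone helper, assuming you want to guard against a null or empty keyword. The class name FallbackHighlighter and the guard are assumptions, not part of the original code.

```java
// Hypothetical helper that isolates the String.replace fallback used above.
final class FallbackHighlighter {
    static String highlight(String text, String keyWord) {
        if (text == null || keyWord == null || keyWord.isEmpty()) {
            // Replacing an empty string would insert the tag between every character.
            return text;
        }
        // Same markup as the original fallback; the match is case-sensitive.
        return text.replace(keyWord, "<span style='color:red'>" + keyWord + "</span>");
    }
}
```

Like the original line, this matches the keyword literally and case-sensitively, so it is only a rough substitute for the analyzer-aware highlighter.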

If you have any questions, you can join QQ group 284205104. If the group is full, please look on Go to Turntable for the latest group to join. Thank you for reading.
