Sorting is an essential feature for full-text search

Sorting is an essential function for full-text retrieval, and in practical applications it is often a great convenience. For example, on e-commerce websites such as Taobao and Jingdong, we may sort to quickly find the cheapest products, the products with the most comments, or the best-selling products. Likewise, the blog column on Iteye displays the latest published blog posts in descending order of date every day. With sorting, we can quickly and easily get useful information, so the sorting function is everywhere ^_^.


So, in this article, let's take a look at how to use the rich sorting functions in Lucene.

Before that, let's get familiar with the basics of sorting in Lucene. By default, Lucene sorts results in descending order of relevance, which usually makes the search results optimal: it tries to put the results most relevant to the search in the first few positions, so we do not have to page through results to find the content we want most. Compared with databases, this is a huge advantage of full-text search. Of course, in actual development we also need to provide customers with a variety of sorting methods according to the needs of the business. Let's first look at the two special basic sorting methods in Lucene:

Attribute in Sort    Attribute in SortField    Meaning
Sort.INDEXORDER      SortField.FIELD_DOC       Sort by index order
Sort.RELEVANCE       SortField.FIELD_SCORE     Sort by relevance score


Next, let's look at a few methods that are needed for retrieval.



// ========= SortField class =========
// field is the name of the field to sort on; type is the sort type
public SortField(String field, Type type);
// reverse selects the order: true is descending, false is ascending
public SortField(String field, Type type, boolean reverse);

// ========= Sort class =========
public Sort();                     // default constructor: sorts by document score
public Sort(SortField field);      // sorts by a single SortField
public Sort(SortField... fields);  // sorts by several SortFields; an array can also be passed

// ========= IndexSearcher class =========
// query is the Query to run; filter is the Filter to apply;
// n is the number of hits to return; sort is the sort order
search(Query query, Filter filter, int n, Sort sort)
// when doDocScores is true, a score is computed for every hit
// when doMaxScore is true, the maximum score over all hits is computed
search(Query query, Filter filter, int n, Sort sort, boolean doDocScores, boolean doMaxScore)


1. Let's take a look at the contents of the index before doing any sorting. The core code is as follows:



TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10000);



2. Sort using the default relevance score. The core code and result are as follows:



Sort sort = new Sort(); // the relevance score is used by default
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10000, sort);





The garbled characters in the score column of the result above appear because, when sorting, Lucene by default does not compute scores for the search results, since scoring reduces performance. The score column therefore holds NaN. For formatting purposes, Sanxian used the DecimalFormat class to keep two decimal places of each score, and because NaN is a special value rather than an ordinary number, it is rendered as a garbled character.
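
The NaN behaviour described above can be handled with the JDK alone. The following is a minimal sketch, not the author's original formatting code: the class `ScoreFormatter`, its `format` method, and the "n/a" placeholder are hypothetical names chosen for this example. It checks for NaN before handing the score to `DecimalFormat`, because `DecimalFormat` renders NaN with a locale-dependent symbol instead of digits.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper (not part of Lucene): formats a hit's score to two
// decimal places and falls back to a placeholder when no score was computed.
class ScoreFormatter {

    static String format(float score) {
        if (Float.isNaN(score)) {
            // The score was never computed, so there is nothing numeric to show.
            return "n/a";
        }
        // Locale.ROOT keeps the decimal separator fixed as '.'.
        // A new instance per call avoids sharing the non-thread-safe DecimalFormat.
        DecimalFormat twoPlaces =
                new DecimalFormat("0.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return twoPlaces.format(score);
    }

    public static void main(String[] args) {
        System.out.println(format(Float.NaN)); // placeholder instead of a garbled symbol
        System.out.println(format(1.5f));      // two decimal places
    }
}
```

If real scores are needed in the output, the other option is to ask the searcher to compute them, at the performance cost the article mentions.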

3. Sort by date in descending order. The core code and result are as follows:



Sort sort = new Sort(new SortField("date", Type.INT, true)); // true means descending order
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10000, sort);




3. Sort by price in ascending order. The core code and result are as follows:



Sort sort = new Sort(new SortField("price", Type.DOUBLE, false)); // false means ascending order
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10000, sort);




4. Multi-field sorting. We sort by date in descending order; because the documents with id 7 and 8 have the same date, we add a second sort field that sorts by ename in ascending order. The core code and result are as follows:



// Sort sort = new Sort(new SortField("date", Type.INT, true), new SortField("ename", Type.STRING, false));
// The commented-out line above and the line below have the same effect
Sort sort = new Sort(new SortField[] {new SortField("date", Type.INT, true), new SortField("ename", Type.STRING, false)});
TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 10000, sort);
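
To make the tie-breaking rule concrete, here is a plain-Java model of the same ordering built with `Comparator` instead of Lucene. It is only a sketch under stated assumptions: `BookDoc`, `MultiFieldSortDemo`, and the sample dates and names are hypothetical, and the code does not reflect how Lucene implements sorting internally. It only shows that the second key is consulted when the first key ties.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an indexed document; not a Lucene class.
class BookDoc {
    final int id;
    final int date;     // stored as an int, like the "date" field in the article
    final String ename;

    BookDoc(int id, int date, String ename) {
        this.id = id;
        this.date = date;
        this.ename = ename;
    }
}

class MultiFieldSortDemo {
    // Primary key: date descending. Tie-breaker: ename ascending.
    static final Comparator<BookDoc> DATE_DESC_THEN_ENAME_ASC =
            Comparator.comparingInt((BookDoc d) -> d.date).reversed()
                      .thenComparing((BookDoc d) -> d.ename);

    // Returns the ids of the documents in sorted order without modifying the input.
    static List<Integer> sortedIds(List<BookDoc> docs) {
        List<BookDoc> copy = new ArrayList<>(docs);
        copy.sort(DATE_DESC_THEN_ENAME_ASC);
        List<Integer> ids = new ArrayList<>();
        for (BookDoc d : copy) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<BookDoc> docs = new ArrayList<>();
        docs.add(new BookDoc(6, 20140508, "lucene")); // made-up sample data
        docs.add(new BookDoc(7, 20140510, "solr"));
        docs.add(new BookDoc(8, 20140510, "hadoop"));
        // Ids 7 and 8 share a date, so ename decides their relative order.
        System.out.println(sortedIds(docs));
    }
}
```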




5. Sorting with scores. Note that the last two boolean arguments control whether scores are computed. When scores are not required, it is recommended not to enable them, because scoring has a large impact on performance when the number of documents is large. The results retrieved for "programming" are sorted by score in descending order by default. The core code and result are as follows:



Sort sort = Sort.RELEVANCE;
// the last two arguments are doDocScores and doMaxScore
TopDocs topDocs = searcher.search(new TermQuery(new Term("bookname", "programming")), null, 100, sort, true, true);




In the result above, the entry "programming, programming" ranks first: after tokenization the term "programming" occurs twice in it (tf = 2), which gives it a higher score for the query.
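
The term frequency mentioned above can be illustrated with a few lines of plain Java. This is a rough sketch only: `TermFrequencyDemo` and its naive tokenization (splitting on non-letter characters) are assumptions made for this example, and the second sample title is made up. A real Lucene score also depends on other factors, so tf alone does not determine the ranking.

```java
import java.util.Locale;

// Hypothetical illustration of term frequency; not Lucene's scoring code.
class TermFrequencyDemo {

    // Counts how often `term` occurs among the tokens of `text`.
    // Tokenization here is a naive split on non-letter characters;
    // a real analyzer is more involved.
    static int termFrequency(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The first title contains the term twice, the second only once.
        System.out.println(termFrequency("programming, programming", "programming"));
        System.out.println(termFrequency("Java programming", "programming"));
    }
}
```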

6. Some points to note
(1) When sorting by a string field, documents that have no value for the sort field rank first.
(2) When sorting by a numeric field, documents that have no value for the sort field are assigned a default value of 0 for sorting.
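(3) For numeric types, we can control in code how documents with a missing value are handled. For example, the missing value can be set to the maximum, so that those documents rank last in an ascending sort. The code is as follows:

SortField sortField = new SortField("value", SortField.Type.INT);
sortField.setMissingValue(Integer.MAX_VALUE); // documents without a value sort as Integer.MAX_VALUE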
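
The effect of the missing value on ordering can be modelled in plain Java. The sketch below is an assumption-laden illustration, not Lucene code: `MissingValueSortDemo` and `sortAscending` are hypothetical names, and `null` stands in for a document that has no value for the sort field. It shows why a substitute of 0 puts such documents near the front of an ascending sort, while `Integer.MAX_VALUE` puts them at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a "missing value" substitution; not a Lucene class.
class MissingValueSortDemo {

    // Sorts ascending, treating null (no value for the field) as `missingValue`.
    static List<Integer> sortAscending(List<Integer> values, int missingValue) {
        List<Integer> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt((Integer v) -> v == null ? missingValue : v));
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(5, null, 3);
        // Missing treated as 0: the null entry sorts before 3 and 5.
        System.out.println(sortAscending(values, 0));
        // Missing treated as the largest int: the null entry sorts last.
        System.out.println(sortAscending(values, Integer.MAX_VALUE));
    }
}
```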

