Lucene: Efficient Search with Keyword Highlighting

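  Environment: JDK 8 or later

  References: 1. how2j-lucene

        2. Importing txt data into MySQL

  Result: searches run efficiently (compared with an ordinary database search) and the matched keywords are marked in red; the output can be pasted into an HTML page to see the effect.

  Performance comparison: 1. Lucene returns results at every level of relevance, each with a score, which a LIKE fuzzy query cannot do.

       2. With a large dataset, such as the 140,000 records I used for the comparison below, the difference in query time is substantial.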
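
To make the first point concrete, here is a minimal plain-Java sketch (no Lucene involved) that contrasts a substring test, which is roughly what LIKE '%keyword%' does, with a toy term-overlap score. The class name `LikeVersusRanking`, the whitespace tokenizer, the scoring rule, and the sample names are all assumptions made for illustration; Lucene's analyzers and scoring are far more sophisticated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class LikeVersusRanking {

    /** Mimics SQL LIKE '%keyword%': the whole keyword must appear as one substring. */
    static List<String> likeSearch(List<String> names, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.toLowerCase(Locale.ROOT).contains(needle)) {
                result.add(name);
            }
        }
        return result;
    }

    /** Toy relevance: the number of distinct query terms that occur in the name. */
    static int score(String name, String query) {
        Set<String> nameTerms = new HashSet<>(Arrays.asList(name.toLowerCase(Locale.ROOT).split("\\s+")));
        Set<String> queryTerms = new HashSet<>(Arrays.asList(query.toLowerCase(Locale.ROOT).split("\\s+")));
        queryTerms.retainAll(nameTerms);
        return queryTerms.size();
    }

    /** Returns every name sharing at least one term with the query, best match first. */
    static List<String> rankedSearch(List<String> names, String query) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (score(name, query) > 0) {
                result.add(name);
            }
        }
        // Stable sort: names with equal scores keep their original order
        result.sort((a, b) -> Integer.compare(score(b, query), score(a, query)));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<String> names = Arrays.asList(
                "red cotton shirt", "cotton shirt red stripe", "blue shirt", "leather belt");
        System.out.println(likeSearch(names, "red shirt"));
        System.out.println(rankedSearch(names, "red shirt"));
    }
}
```

With this sample data the substring search finds nothing for "red shirt", because no name contains that exact phrase, while the ranked search returns the two names containing both words first and "blue shirt" after them.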

 

Java test files:

 1. MySQL data: download link (contains the txt data and the MySQL table-creation source)

 2. jar download: https://files.cnblogs.com/files/meditation5201314/lucene-lib.rar

 3. Java files:

package com.empirefree.lucene;
/**
* @author Empirefree 胡宇乔:
* @version Created: 2020-03-31 17:48:13
*/
public class Product {
    int id;
    String name;
    String category;
    float price;
    String place;

    String code;
    public int getId() {
        return id;
    }
    public void setId(int id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getCategory() {
        return category;
    }
    public void setCategory(String category) {
        this.category = category;
    }
    public float getPrice() {
        return price;
    }
    public void setPrice(float price) {
        this.price = price;
    }
    public String getPlace() {
        return place;
    }
    public void setPlace(String place) {
        this.place = place;
    }

    public String getCode() {
        return code;
    }
    public void setCode(String code) {
        this.code = code;
    }
    @Override
    public String toString() {
        return "Product [id=" + id + ", name=" + name + ", category=" + category + ", price=" + price + ", place="
                + place + ", code=" + code + "]";
    }
}
Product.java
package com.empirefree.lucene;

import java.io.File;
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.io.FileUtils;

/**
* @author Empirefree 胡宇乔:
* @version Created: 2020-03-31 17:49:56
*/
public class ProductUtil {
    private static final String URL="jdbc:mysql://127.0.0.1:3306/campus_system?useUnicode=true&characterEncoding=utf-8";
    private static final String USER="root";
    private static final String PASSWORD="root";

    // Shared connection: opened once, reused by every query, closed by deleteconnection()
    private static Connection connection=null;

    static {
        try {
            // 1. Load the driver (Connector/J 5.x class name; Connector/J 8.x uses com.mysql.cj.jdbc.Driver)
            Class.forName("com.mysql.jdbc.Driver");
            // 2. Open the database connection
            connection=DriverManager.getConnection(URL, USER, PASSWORD);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }

    // Parses one comma-separated line: id,name,category,price,place,code
    public static Product lineproduct(String line) {
        Product p = new Product();
        String[] fields = line.split(",");
        p.setId(Integer.parseInt(fields[0]));
        p.setName(fields[1]);
        p.setCategory(fields[2]);
        p.setPrice(Float.parseFloat(fields[3]));
        p.setPlace(fields[4]);
        p.setCode(fields[5]);

        return p;
    }

    // Reads every line of the txt file into a Product
    public static List<Product> filelist(String filename) throws IOException {
        File file = new File(filename);
        List<String> lines = FileUtils.readLines(file, "UTF-8");
        List<Product> products = new ArrayList<>();
        for(String line : lines){
            Product p = lineproduct(line);
            products.add(p);
        }
        return products;
    }

    // Maps the current row of the result set to a Product
    private static Product rowproduct(ResultSet resultSet) throws SQLException {
        Product product = new Product();
        product.setId(resultSet.getInt("id"));
        product.setName(resultSet.getString("name"));
        product.setCategory(resultSet.getString("category"));
        product.setPrice(resultSet.getFloat("price"));
        product.setPlace(resultSet.getString("place"));
        product.setCode(resultSet.getString("code"));
        return product;
    }

    // Loads every row of the product table
    public static List<Product> mysqllist(){
        List<Product> products = new ArrayList<>();
        // Note: Statement and ResultSet come from java.sql, not from the com.mysql package.
        // try-with-resources closes both; the shared connection stays open for later queries.
        try (Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("select * from product")) {
            while (resultSet.next()) {
                products.add(rowproduct(resultSet));
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return products;
    }

    // LIKE fuzzy query on name; the keyword is bound as a parameter to avoid SQL injection
    public static List<Product> mysqllist2(String searchname){
        List<Product> products = new ArrayList<>();
        String sql = "select * from product where name like ?";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setString(1, "%" + searchname + "%");
            try (ResultSet resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    products.add(rowproduct(resultSet));
                }
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
        return products;
    }

    // Closes the shared connection; call this once all queries are done
    public static void deleteconnection() throws SQLException {
        if (connection != null) {
            connection.close();
        }
    }

    public static void main(String[] args) throws IOException, SQLException {
        String filename = "140k_products.txt";
//        List<Product> products = filelist(filename);
        List<Product> products = mysqllist();
        for(Product name : products){
            System.out.println(name);
        }
//        System.out.println(products.size());
        deleteconnection();
    }
}
ProductUtil.java (the MySQL connection code, kept in its own file so it can be reused later)
package com.empirefree.lucene;
/**
* @author Empirefree 胡宇乔:
* @version Created: 2020-03-31 17:45:39
*/

import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.Scanner;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer;


public class TestLucene2 {

    // Builds the in-memory index from the rows of the product table
    private static Directory createIndex(IKAnalyzer analyzer) throws IOException {
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(index, config);
        String fileName = "140k_products.txt";

//        List<Product> products = ProductUtil.filelist(fileName);
        List<Product> products = ProductUtil.mysqllist();
        int total = products.size();
        int count = 0;
        int per = 0;
        int oldPer = 0;
        for (Product p : products) {
            addDoc(writer, p);
            count++;
            per = count*100/total;
            // Print the progress only when the percentage changes
            if(per!=oldPer){
                oldPer = per;
                System.out.printf("Indexing: %d records in total, current progress: %d%% %n",total,per);
            }
        }
        writer.close();
        return index;
    }

    // Adds one Product to the index; uncomment the other fields to make them searchable too
    private static void addDoc(IndexWriter w, Product p) throws IOException {
        Document doc = new Document();
//        doc.add(new TextField("id", String.valueOf(p.getId()), Field.Store.YES));
        doc.add(new TextField("name", p.getName(), Field.Store.YES));
//        doc.add(new TextField("category", p.getCategory(), Field.Store.YES));
//        doc.add(new TextField("price", String.valueOf(p.getPrice()), Field.Store.YES));
//        doc.add(new TextField("place", p.getPlace(), Field.Store.YES));
//        doc.add(new TextField("code", p.getCode(), Field.Store.YES));
        w.addDocument(doc);
    }

    private static void showSearchResults(IndexSearcher searcher, ScoreDoc[] hits, Query query, IKAnalyzer analyzer) throws Exception {
        // Matched terms are wrapped in a red <span>
        SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
        Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));

        System.out.println("Found " + hits.length + " hits.");
        System.out.println("No.\tScore\tResult");
        for (int i = 0; i < hits.length; ++i) {
            ScoreDoc scoreDoc= hits[i];
            int docId = scoreDoc.doc;
            Document d = searcher.doc(docId);
            List<IndexableField> fields= d.getFields();
            System.out.print((i + 1) );
            System.out.print("\t" + scoreDoc.score);
            for (IndexableField f : fields) {

                if("name".equals(f.name())){
                    // Re-tokenize the stored name so the highlighter can locate the matched terms
                    TokenStream tokenStream = analyzer.tokenStream(f.name(), new StringReader(d.get(f.name())));
                    String fieldContent = highlighter.getBestFragment(tokenStream, d.get(f.name()));
                    System.out.print("\t"+fieldContent);
                }
                else{
                    System.out.print("\t"+d.get(f.name()));
                }
            }
            System.out.println("<br>");
        }
    }



    public static void main(String[] args) throws Exception {
        Scanner s = new Scanner(System.in);
        System.out.print("Enter a search keyword: ");
        String keyword = s.nextLine();
        System.out.println("Current keyword: "+keyword);
        long startTime = System.currentTimeMillis();
        List<Product> products = ProductUtil.mysqllist2(keyword);
        long endTime = System.currentTimeMillis();
        // currentTimeMillis() measures milliseconds
        System.out.println("LIKE query time: " + (endTime - startTime) + "ms");

        for(Product name : products){
            System.out.println(name.getName());
        }

        /******************************************************************************/
        // 1. Prepare the Chinese analyzer
        IKAnalyzer analyzer = new IKAnalyzer();
        // 2. Build the index
        Directory index = createIndex(analyzer);

        // 3. Build the query (the Scanner on System.in is reused)
        System.out.print("Enter a search keyword: ");
        keyword = s.nextLine();
        System.out.println("Current keyword: "+keyword);
        Query query = new QueryParser("name", analyzer).parse(keyword);

        startTime = System.currentTimeMillis();
        // 4. Search
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher=new IndexSearcher(reader);
        int numberPerPage = 10;
        ScoreDoc[] hits = searcher.search(query, numberPerPage).scoreDocs;
        endTime = System.currentTimeMillis();
        System.out.println("Lucene query time: " + (endTime - startTime) + "ms");

        // 5. Show the results
        showSearchResults(searcher, hits,query,analyzer);
        // 6. Close the reader
        reader.close();

        ProductUtil.deleteconnection();
    }
}
TestLucene2.java (database version)

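Notes on TestLucene2.java:

1. The code lists every Product field. If you only need to search name (or another field such as username, which only requires changing the field name), comment out the other doc.add calls.

2. In doc.add, p.getId() is an int, so it has to be converted to a String.

3. The final results can be saved in a List and then displayed on the front end with EL expressions (you can also limit how many titles are shown).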

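  How the Lucene code works:

    1. addDoc(): copies a Product's fields into a Document so they can be searched later.

    2. createIndex(): builds the index. It calls mysqllist() to load the data from the database and addDoc() to store each record in the index.

    3. showSearchResults(): takes the hits returned by searching the index and marks the matched keywords in red.

    Detailed process: first an in-memory index is created by createIndex(). A plain LIKE query goes to the database every time, whereas Lucene loads the data into memory once and then queries it as often as needed. While the in-memory Directory is being built, the properties of each Product are loaded into a Document, so any property of Product can be searched later simply by changing name to another field. In the listing above only name is enabled; uncomment the other doc.add lines in addDoc() to index the remaining properties.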
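
The flow described above (index once in memory, then query repeatedly and wrap the matches in red) can be sketched without Lucene. The class below is a toy inverted index. `TinyIndex`, its whitespace tokenizer, and the sample names are hypothetical stand-ins for illustration only, and do not reflect how Lucene or IKAnalyzer work internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

final class TinyIndex {
    private final List<String> docs = new ArrayList<>();
    // term -> ids of the documents that contain it (the "inverted" part)
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Counterpart of addDoc(): stores the text and registers each of its terms. */
    void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            postings.computeIfAbsent(term, k -> new TreeSet<Integer>()).add(id);
        }
    }

    /** Looks the term up in the map instead of scanning every document. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        TreeSet<Integer> ids = postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<Integer>());
        for (int id : ids) {
            hits.add(highlight(docs.get(id), term));
        }
        return hits;
    }

    /** Wraps each word equal to the term (ignoring case) in a red span. */
    static String highlight(String text, String term) {
        StringBuilder sb = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (word.equalsIgnoreCase(term)) {
                sb.append("<span style='color:red'>").append(word).append("</span>");
            } else {
                sb.append(word);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();          // built once
        index.add("red cotton shirt");              // made-up sample data
        index.add("blue shirt");
        index.add("leather belt");
        System.out.println(index.search("shirt"));  // queried as often as needed
        System.out.println(index.search("belt"));
    }
}
```

The cost of tokenizing every record is paid once in add(); each later search is a map lookup plus highlighting of the hits, which is the "load once, query many times" idea in miniature.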

----------------------------------------------------Additional notes--------------------------------------------------------

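1. MySQL connection: the usual pattern is to open a connection, run a query, and close it. During development, however, many queries are needed, so the connection is kept in a static field and closed by calling deleteconnection() once everything is done

(see ProductUtil.java for the details)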
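
As a rough illustration of that "open once, reuse, close at the end" idea, here is a small generic holder in plain Java. `SharedResource` is a hypothetical helper, not part of ProductUtil or of JDBC; it is written against AutoCloseable so it can be tried without a database.

```java
import java.util.concurrent.Callable;

/** Opens a resource on first use, hands out the same instance afterwards, closes it on demand. */
final class SharedResource<T extends AutoCloseable> {
    private final Callable<T> opener;
    private T instance;

    SharedResource(Callable<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening it only if none is currently open. */
    synchronized T get() {
        if (instance == null) {
            try {
                instance = opener.call();
            } catch (Exception e) {
                throw new IllegalStateException("could not open resource", e);
            }
        }
        return instance;
    }

    /** Plays the role of deleteconnection(): closes the instance; a later get() reopens it. */
    synchronized void close() {
        if (instance != null) {
            try {
                instance.close();
            } catch (Exception e) {
                throw new IllegalStateException("could not close resource", e);
            } finally {
                instance = null;
            }
        }
    }

    public static void main(String[] args) {
        // A dummy resource stands in for a database connection
        SharedResource<AutoCloseable> shared = new SharedResource<AutoCloseable>(() -> {
            System.out.println("opened");
            return () -> System.out.println("closed");
        });
        shared.get();   // opens the resource
        shared.get();   // reuses the same instance
        shared.close(); // closes it
    }
}
```

With JDBC, the opener would presumably be something like `() -> DriverManager.getConnection(URL, USER, PASSWORD)`, with T being java.sql.Connection. Unlike a static initializer, this variant opens the connection lazily and can reopen it after close().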

2. Importing txt data into a MySQL table:

LOAD DATA INFILE 'E:/xxx.txt' 
REPLACE INTO TABLE test FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'

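The txt data should be formatted to match the statement above: one record per line, with the fields separated by commas and each line ending in \r\n.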
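
As a small sketch of that layout, the helper below writes and checks one such line in plain Java. The field order (id, name, category, price, place, code) is taken from lineproduct() in ProductUtil.java; the class name `ProductLine` and the sample values are made up for illustration, and it assumes the table columns follow the same order.

```java
final class ProductLine {

    /** Joins the six fields with commas, in the order lineproduct() reads them. */
    static String format(int id, String name, String category, float price, String place, String code) {
        return String.join(",", String.valueOf(id), name, category,
                String.valueOf(price), place, code);
    }

    /** Splits one line and checks that it has exactly six comma-separated fields. */
    static String[] parse(String line) {
        // limit -1 keeps trailing empty fields, so a missing value is still counted
        String[] fields = line.split(",", -1);
        if (fields.length != 6) {
            throw new IllegalArgumentException(
                    "expected 6 fields but found " + fields.length + ": " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample record
        String line = format(1, "cotton shirt", "clothing", 59.5f, "Hangzhou", "A0001");
        System.out.println(line);
        System.out.println(parse(line).length);
    }
}
```

A value that itself contains a comma would break this simple format, so such data would need a different separator or quoting.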


Reposted from www.cnblogs.com/meditation5201314/p/12612057.html