A Lucene Getting-Started Example

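Lucene is an open-source full-text search engine toolkit. Before learning Lucene, it helps to have some understanding of how full-text search works; this article is very helpful for understanding the principles: http://ye-liang.iteye.com/admin/blogs/new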

Here I only include an example I wrote while teaching myself.

Lucene download page: http://lucene.apache.org/

Lucene's core functionality falls into two parts: creating an index and searching it.
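To make the two phases concrete, here is a toy inverted index in plain Java with no Lucene dependency. It is only a sketch of the general idea (a map from each term to the documents that contain it); the class name `ToyInvertedIndex` and its methods are made up for illustration and are not how Lucene stores or searches its index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lower-cased term to the IDs of the documents containing it.
// Illustrative only; this is not Lucene's implementation.
class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> paths = new ArrayList<>();

    // Indexing phase: split the text into terms and record which document each term occurs in.
    int addDocument(String path, String content) {
        int docId = paths.size();
        paths.add(path);
        for (String term : content.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue; // split can yield an empty first element
            }
            Set<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeSet<>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
        return docId;
    }

    // Search phase: look the term up and return the paths of the matching documents.
    List<String> search(String term) {
        List<String> result = new ArrayList<>();
        Set<Integer> docs = postings.get(term.toLowerCase());
        if (docs != null) {
            for (int docId : docs) {
                result.add(paths.get(docId));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("1.txt", "aa bb cc");
        index.addDocument("2.txt", "bb dd");
        System.out.println(index.search("aa")); // prints [1.txt]
        System.out.println(index.search("bb")); // prints [1.txt, 2.txt]
    }
}
```

The point of the structure is that a search becomes a map lookup rather than a scan over every file, which is why the index is built once up front.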

Code for creating the index

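		import java.io.File;
		import java.io.FileReader;
		import java.io.Reader;
		import org.apache.lucene.analysis.Analyzer;
		import org.apache.lucene.analysis.standard.StandardAnalyzer;
		import org.apache.lucene.document.Document;
		import org.apache.lucene.document.Field.Store;
		import org.apache.lucene.document.TextField;
		import org.apache.lucene.index.IndexWriter;
		import org.apache.lucene.index.IndexWriterConfig;
		import org.apache.lucene.store.Directory;
		import org.apache.lucene.store.FSDirectory;
		import org.apache.lucene.util.Version;

		// Create the Analyzer that tokenizes text for the index
		Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
		// Open the Directory where the index files are stored
		Directory dir = FSDirectory.open(new File("c://indexFile"));
		// IndexWriterConfig carries the Lucene version and the analyzer
		IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_47,
				analyzer);
		// Create the IndexWriter that writes the index files
		IndexWriter indexWriter = new IndexWriter(dir, iwc);

		for (File f : new File("c://dataFile").listFiles()) {
			Reader txtReader = new FileReader(f);
			try {
				// Each Document represents one file to be indexed
				Document doc = new Document();
				// Add Fields to the Document; different kinds of data use different Field types
				doc.add(new TextField("path", f.getCanonicalPath(), Store.YES));
				doc.add(new TextField("content", txtReader));
				// Add the Document to the IndexWriter; the reader is consumed here
				indexWriter.addDocument(doc);
			} finally {
				txtReader.close();
			}
		}
		// Commit the changes and close the writer
		indexWriter.commit();
		indexWriter.close();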

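The IndexWriter object builds an index for all files under c://dataFile; the index files are stored in c://indexFile. A Document object corresponds to one file to be searched, which can be a text file or a web page. You can give a Document fields; here, for example, we define two fields for each text file: path and content. The path field is stored (Store.YES), so it can be read back from search results, while the content field, built from a Reader, is indexed but not stored.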

Create a few txt files under c://dataFile and type some arbitrary content into them; these are the files to be searched.
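If you would rather generate the sample files than create them by hand, something like the following sketch works. The class name `SampleDataFiles`, the file names, and the file contents are all made up for illustration; by default it writes to a temporary directory instead of c://dataFile, so pass the directory you actually want as the first argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that writes a few small text files to index.
class SampleDataFiles {
    // Writes 1.txt, 2.txt and 3.txt into dir (created if missing) and returns how many were written.
    static int create(Path dir) throws IOException {
        Files.createDirectories(dir);
        String[] contents = { "aa bb cc", "bb dd", "aa ee" };
        for (int i = 0; i < contents.length; i++) {
            Path file = dir.resolve((i + 1) + ".txt");
            Files.write(file, contents[i].getBytes(StandardCharsets.UTF_8));
        }
        return contents.length;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory stands in for c://dataFile unless a path is given.
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("dataFile");
        System.out.println("Wrote " + create(dir) + " files to " + dir);
    }
}
```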

After running the code above, the index files are generated under c://indexFile.

Code for searching

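		import java.io.File;
		import org.apache.lucene.analysis.Analyzer;
		import org.apache.lucene.analysis.standard.StandardAnalyzer;
		import org.apache.lucene.document.Document;
		import org.apache.lucene.index.DirectoryReader;
		import org.apache.lucene.index.IndexReader;
		import org.apache.lucene.queryparser.classic.QueryParser;
		import org.apache.lucene.search.IndexSearcher;
		import org.apache.lucene.search.Query;
		import org.apache.lucene.search.ScoreDoc;
		import org.apache.lucene.search.TopDocs;
		import org.apache.lucene.store.Directory;
		import org.apache.lucene.store.FSDirectory;
		import org.apache.lucene.util.Version;

		// Use the same Analyzer as at indexing time so query text is tokenized the same way
		Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
		// Open the Directory where the index files are stored
		Directory dir = FSDirectory.open(new File("c://indexFile"));
		// Open an IndexReader on the index directory
		IndexReader indexReader = DirectoryReader.open(dir);
		// Create the IndexSearcher used to run queries
		IndexSearcher indexSearcher = new IndexSearcher(indexReader);
		// Create a QueryParser that parses query strings against the "content" field
		QueryParser parser = new QueryParser(Version.LUCENE_47, "content", analyzer);
		// Parse the search keyword into a Query (the query syntax tree)
		Query query = parser.parse("aa");
		// Fetch up to 1000 of the best-matching results
		TopDocs td = indexSearcher.search(query, 1000);
		ScoreDoc[] sds = td.scoreDocs;
		for (ScoreDoc sd : sds) {
			Document d = indexSearcher.doc(sd.doc);
			System.out.println(d.get("path"));
		}
		// Release the index files when done
		indexReader.close();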

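The IndexReader object reads the index files. The QueryParser object specifies the analyzer and which field of the document to query. The Query object represents the search keyword. The search method of IndexSearcher returns the result set as a TopDocs.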
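As a rough picture of what "the top n results" means, the following plain-Java sketch scores each document by how often the term occurs and returns at most n hits, best first. This is only an analogy for TopDocs and ScoreDoc: the names `ToyTopDocs` and `Hit` are hypothetical, and Lucene's real scoring is considerably more elaborate than a raw term count.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy "top n" search: score = number of times the term occurs in the document.
class ToyTopDocs {
    // A (document, score) pair, loosely analogous to one entry of TopDocs.scoreDocs.
    static final class Hit {
        final String path;
        final int score;

        Hit(String path, int score) {
            this.path = path;
            this.score = score;
        }
    }

    // Counts case-insensitive occurrences of term among the tokens of text.
    static int termFrequency(String text, String term) {
        int count = 0;
        String wanted = term.toLowerCase();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.equals(wanted)) {
                count++;
            }
        }
        return count;
    }

    // Returns at most n matching documents (path -> content map), highest score first.
    static List<Hit> search(Map<String, String> docs, String term, int n) {
        List<Hit> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            int tf = termFrequency(e.getValue(), term);
            if (tf > 0) {
                hits.add(new Hit(e.getKey(), tf));
            }
        }
        // Stable sort: documents with equal scores keep their original order.
        hits.sort((a, b) -> Integer.compare(b.score, a.score));
        return new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1.txt", "aa bb");
        docs.put("2.txt", "aa aa cc");
        docs.put("3.txt", "dd");
        for (Hit hit : search(docs, "aa", 1000)) {
            System.out.println(hit.path + " score=" + hit.score);
        }
    }
}
```

Documents that do not contain the term never appear in the result, and the limit n only caps how many hits are returned, which mirrors the role of the 1000 passed to search in the Lucene code.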

After running the code above, the paths of the txt files containing the keyword aa are printed, for example C:\dataFile\1.txt
