Getting word segmentation (analysis) results in Solr

When using Solr, the Analysis page of the admin UI shows how a piece of text is segmented into words. SolrJ also provides a way to get the same result programmatically, so I am writing it down here for later use.

To get word segmentation results with SolrJ, use the FieldAnalysisRequest class.

 

The SolrJ interface follows the same logic as the Analysis page of the admin UI. First, it distinguishes between index-time and query-time analysis (for example, with the IK analyzer, smart segmentation is not used at index time but is enabled at query time). Second, it needs to know which field (fieldName) or field type (fieldType) to analyze with.

The first point maps to the fieldValue and query properties of FieldAnalysisRequest: text set with fieldValue is segmented with the index-time analyzer, and text set with query is segmented with the query-time analyzer (I have tested this). Awkwardly, SolrJ does not support setting only the query: if fieldValue is not set, a NullPointerException is thrown. My guess is that this API does not fully mirror the admin page. The response can also report matches, that is, whether the tokens produced for query match those produced for fieldValue, which may be why fieldValue is not allowed to be null; I can only treat this as a limitation of the API.

The second point maps to the fieldNames and fieldTypes properties, which name the fields or field types whose analyzers should be used. Several names and types can be set in one request; when reading the result, you then specify which field name or field type you want.
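The two points above can be sketched as request builders. This is a minimal sketch assuming SolrJ 4.x; the class AnalysisRequests, its methods, the field names title/content and the field type text_ik are all hypothetical. queryOnly works around the null fieldValue problem by reusing the query text as fieldValue and ignoring the index-side result; that workaround is an assumption made here, not a documented SolrJ feature.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.request.FieldAnalysisRequest;

// Hypothetical helper class; assumes SolrJ 4.x on the classpath.
final class AnalysisRequests {

    private AnalysisRequests() {
    }

    // Query-time analysis only. fieldValue must not be left null, so the
    // query text is reused as fieldValue (assumed workaround); callers should
    // read only getQueryPhases() from the result.
    public static FieldAnalysisRequest queryOnly(List<String> fieldNames, String queryText) {
        FieldAnalysisRequest request = new FieldAnalysisRequest();
        request.setFieldNames(fieldNames);
        request.setFieldValue(queryText);
        request.setQuery(queryText);
        return request;
    }

    // One request covering several fields and a field type; the response is
    // then read per entry with getFieldNameAnalysis / getFieldTypeAnalysis.
    public static FieldAnalysisRequest namesAndTypes(String value, String query) {
        FieldAnalysisRequest request = new FieldAnalysisRequest();
        request.setFieldNames(Arrays.asList("title", "content")); // hypothetical field names
        request.setFieldTypes(Arrays.asList("text_ik"));          // hypothetical field type
        request.setFieldValue(value); // segmented by each index-time analyzer
        request.setQuery(query);      // segmented by each query-time analyzer
        return request;
    }
}
```

With queryOnly, only the query phases of the result are meaningful; the index phases just run the same text through the index-time analyzer.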

 

To get the segmentation result, pass a SolrServer to FieldAnalysisRequest.process(solrServer); it returns a FieldAnalysisResponse, which is the segmentation result. The response holds the results in two maps, one keyed by field name and one keyed by field type, so you get the analyzer output for a particular field with getFieldNameAnalysis(String name) or getFieldTypeAnalysis(String type). Each result is wrapped in an Analysis object, which in turn holds the index-time and query-time results, corresponding to the fieldValue and query described above. The tokens themselves live in AnalysisPhase objects: the index-time or query-time result is a sequence of AnalysisPhases, and each AnalysisPhase contains a list of TokenInfos. I first wondered why this is not simply a List<Token>, and why there are two layers when each AnalysisPhase only adds a class name. The reason is that each AnalysisPhase holds the output of one stage of the analysis chain: the tokenizer first, then each token filter in turn, identified by its class name. The tokens of the last phase are therefore the final result, the same stage-by-stage view the admin Analysis page shows.
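To make the two layers concrete, here is a small sketch, assuming SolrJ 4.x, with a hypothetical class AnalysisPhases. printStages prints every stage with its class name, and finalTokens returns only the token texts of the last stage, which is the final segmentation. Both tolerate null, in case no query was set and getQueryPhases() has nothing to return.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;

// Hypothetical helper class; assumes SolrJ 4.x on the classpath.
final class AnalysisPhases {

    private AnalysisPhases() {
    }

    // Token texts produced by the LAST stage of the chain, i.e. the final
    // segmentation. Returns an empty list for null or empty input.
    public static List<String> finalTokens(Iterable<AnalysisPhase> phases) {
        List<String> texts = new ArrayList<>();
        if (phases == null) {
            return texts;
        }
        AnalysisPhase last = null;
        for (AnalysisPhase phase : phases) {
            last = phase; // stages arrive in chain order: tokenizer, then filters
        }
        if (last != null) {
            for (TokenInfo info : last.getTokens()) {
                texts.add(info.getText());
            }
        }
        return texts;
    }

    // One line per stage: the stage's class name, then its tokens with positions.
    public static void printStages(Iterable<AnalysisPhase> phases) {
        if (phases == null) {
            return;
        }
        for (AnalysisPhase phase : phases) {
            StringBuilder line = new StringBuilder(phase.getClassName()).append(':');
            for (TokenInfo info : phase.getTokens()) {
                line.append(' ').append(info.getText()).append('@').append(info.getPosition());
            }
            System.out.println(line);
        }
    }
}
```

For example, finalTokens(analysis.getIndexPhases()) gives the words that would actually be indexed, while printStages shows how each filter changed them.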

Here is the code:

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse.Analysis;

public class AnalysisDemo { // class name is arbitrary

	public static void main(String[] args) throws SolrServerException, IOException {
		// Connect to SolrCloud through ZooKeeper
		CloudSolrServer server = new CloudSolrServer("10.6.8.96:2181/shard_test");
		server.setZkClientTimeout(1000 * 60);
		server.setDefaultCollection("article");
		try {
			FieldAnalysisRequest request = new FieldAnalysisRequest();
			// Several fieldNames or fieldTypes can be set here; one is enough for this example
			request.setFieldNames(Collections.singletonList("title"));
			// Text segmented with the index-time analyzer
			request.setFieldValue("I'm from Shandong, China and we have a lot of delicious food there");
			// Text segmented with the query-time analyzer
			request.setQuery("I'm from Shandong, China, we have a lot of delicious food there");

			FieldAnalysisResponse response = request.process(server);
			// setFieldNames was used above, so the result is read with getFieldNameAnalysis;
			// with setFieldTypes, call getFieldTypeAnalysis instead
			Analysis analysis = response.getFieldNameAnalysis("title");

			// Index-time segmentation of fieldValue: one AnalysisPhase per stage of the chain
			for (AnalysisPhase phase : analysis.getIndexPhases()) {
				List<TokenInfo> tokens = phase.getTokens();
				for (TokenInfo info : tokens) {
					// TokenInfo has more properties (position, offsets, type) that are not printed here
					System.out.println(info.getText());
				}
			}

			// Query-time segmentation of query
			for (AnalysisPhase phase : analysis.getQueryPhases()) {
				for (TokenInfo info : phase.getTokens()) {
					System.out.println(info.getText());
				}
			}
		} finally {
			server.shutdown(); // release the ZooKeeper connection and HTTP client
		}
	}
}
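The example above uses setFieldNames. A sketch of the fieldTypes variant follows; it also turns on setShowMatch, so that index-time tokens which also occur in the query analysis are flagged through TokenInfo.isMatch(). It assumes SolrJ 4.x (SolrServer rather than the later SolrClient); the field type text_ik and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.Collections;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.request.FieldAnalysisRequest;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.AnalysisPhase;
import org.apache.solr.client.solrj.response.AnalysisResponseBase.TokenInfo;
import org.apache.solr.client.solrj.response.FieldAnalysisResponse;

// Hypothetical example class; assumes SolrJ 4.x and a field type "text_ik" in the schema.
final class FieldTypeAnalysis {

    static final String FIELD_TYPE = "text_ik"; // hypothetical field type name

    private FieldTypeAnalysis() {
    }

    // Builds a request keyed by field type instead of field name.
    public static FieldAnalysisRequest buildRequest(String value, String query) {
        FieldAnalysisRequest request = new FieldAnalysisRequest();
        request.setFieldTypes(Collections.singletonList(FIELD_TYPE));
        request.setFieldValue(value);
        request.setQuery(query);
        request.setShowMatch(true); // flag index tokens that also occur in the query analysis
        return request;
    }

    // Sends the request and prints the final index-time tokens, marking matches.
    public static void printMatches(SolrServer server, String value, String query)
            throws SolrServerException, IOException {
        FieldAnalysisResponse response = buildRequest(value, query).process(server);
        // setFieldTypes was used, so the result is read with getFieldTypeAnalysis
        FieldAnalysisResponse.Analysis analysis = response.getFieldTypeAnalysis(FIELD_TYPE);
        AnalysisPhase last = null;
        for (AnalysisPhase phase : analysis.getIndexPhases()) {
            last = phase; // keep only the output of the last stage of the chain
        }
        if (last == null) {
            return;
        }
        for (TokenInfo info : last.getTokens()) {
            System.out.println(info.getText() + (info.isMatch() ? "  <- also in query" : ""));
        }
    }
}
```

It can be called with the CloudSolrServer from the main example, e.g. FieldTypeAnalysis.printMatches(server, fieldValueText, queryText).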

 

