Solr 6.2: from environment deployment to MySQL integration, a Chinese tokenizer, and SolrJ

I won't introduce Solr at length: it is an open-source search platform, and a very powerful one.

 

1. Solr environment deployment

Solr ships with Jetty and can be started with it directly, but I find it more convenient to run it in Tomcat. I have been learning Solr for two weeks. As of now (2016.11.27) the latest version of Solr is 6.3.0, while this article uses 6.2.1; that doesn't matter, since the basics are the same.

1. Download Solr from the official website: http://lucene.apache.org/solr/

After downloading and unzipping, the directory structure is like this:


2. Solr can be deployed straight into Tomcat, as many other tutorials on the Internet do, but I prefer to build a new project, which is more flexible. So we create a new Java web project named tomcat-solr in Eclipse. Note that the access path (context root) of this project must be changed to solr, not tomcat-solr, because the Solr developers hard-coded this path in the admin pages. The admin page is then reachable at http://localhost:8080/solr/index.html. To change it:

In Eclipse:



 

Next we do the following:

1) Copy the contents of solr-6.2.1\server\solr-webapp\webapp into the webapp folder of our web project.

2) Copy the jar packages under solr-6.2.1\server\lib\ext into the lib folder of our tomcat-solr project.

3) Copy the solr-6.2.1\server\resources\log4j.properties file into the src folder of the tomcat-solr project.

4) In the webapp folder of the tomcat-solr project, create a new folder named solrhome, and copy the folders and files under solr-6.2.1\server\solr into it.

5) Modify the web.xml file, setting env-entry-value to the absolute path of solrhome, for example:



6) Start Tomcat and visit http://localhost:8080/solr/index.html


At this point, the Solr deployment is complete.
Our tomcat-solr project should look like this:


 

By the way, Solr can use ZooKeeper to build a cluster (SolrCloud). We don't need that here, so we comment it out for now in the solr.xml file under the solrhome folder:

 

<solr>
<!-- ZooKeeper cluster (SolrCloud) configuration commented out; use the master-slave form instead
  <solrcloud>

    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <str name="hostContext">${hostContext:solr}</str>

    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>

    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
    <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>

  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
  </shardHandlerFactory>
-->
</solr>

 

 

 

2. Core configuration

Cores are a central concept in Solr. There is more to them than I cover here; since I have only just started learning, I haven't figured everything out yet, so we use the simplest setup. Create a new folder named my_solr under the solrhome folder, then copy the files and folders under solr-6.2.1\example\example-DIH\solr\solr into my_solr.
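After restarting Tomcat, one way to check that the new core was picked up is Solr's CoreAdmin STATUS request. The sketch below is only an illustration: the class CoreStatusCheck and its methods are hypothetical names, and it assumes Solr answers at http://localhost:8080/solr as set up above. It uses only the JDK.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for this article; not part of Solr or SolrJ.
final class CoreStatusCheck {

    // Builds the CoreAdmin STATUS URL for a single core.
    static String statusUrl(String solrBaseUrl, String coreName) {
        return solrBaseUrl + "/admin/cores?action=STATUS&core=" + coreName + "&wt=json";
    }

    // Sends a GET request and returns the HTTP status code (200 means Solr answered).
    static int httpStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed address and core name, taken from the setup in this article.
        String url = statusUrl("http://localhost:8080/solr", "my_solr");
        System.out.println(url + " -> HTTP " + httpStatus(url));
    }
}
```

The response body of the same URL, opened in a browser, lists the details of my_solr when the core is loaded.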


 

Visit the same address as before:
 

3. Integrate MySQL

1. First, we create a table in MySQL to practice with:

 

CREATE TABLE `solrTest` (
  `solrID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'ID',
  `context` varchar(255) DEFAULT NULL COMMENT 'context',
  `updateTime` datetime DEFAULT NULL COMMENT 'updateTime',
  `sort` int(11) DEFAULT '1' COMMENT 'sort',
  PRIMARY KEY (`solrID`)
) DEFAULT CHARSET=utf8;

 

2. Copy the MySQL driver jar package into the lib folder of our tomcat-solr project.

3. Modify the solr-data-config.xml file in the conf folder under my_solr:

 

<dataConfig>

  <dataSource type="JdbcDataSource"
  	driver="com.mysql.jdbc.Driver"
  	url="jdbc:mysql://localhost:3306/test"
  	user="root"
  	password="123456"/>
  	
  <document name="solr_mysql_test">
    <entity name="solrTest"
    	pk="solrID"
    	query="select * from solrTest"
    	deltaImportQuery="select * from solrTest where solrID = '${dih.delta.solrID}'"
    	deltaQuery="select solrID from solrTest where updateTime > '${dataimporter.last_index_time}'">

      <field column="solrID" name="solrID"/>
      <field column="context" name="context"/>
      <field column="updateTime" name="updateTime"/>
      <field column="sort" name="sort"/>

    </entity>
  </document>
</dataConfig>

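I won't explain every attribute here; there are many articles and blogs on the Internet that do. In short, query drives a full import, deltaQuery finds the IDs of the rows changed since the last import, and deltaImportQuery loads each of those rows.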
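To make the roles of deltaQuery and deltaImportQuery more concrete, here is a toy, in-memory model of a delta import. It does not talk to MySQL or Solr; the class DeltaImportSketch, its Row type, and the sample rows are all made up for illustration, and the two methods only mimic what the two SQL statements in the configuration select.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Toy model of a delta import; all names here are hypothetical.
final class DeltaImportSketch {

    // One row of the solrTest table (only the columns the delta logic needs).
    static final class Row {
        final int solrID;
        final String context;
        final LocalDateTime updateTime;

        Row(int solrID, String context, LocalDateTime updateTime) {
            this.solrID = solrID;
            this.context = context;
            this.updateTime = updateTime;
        }
    }

    // Mimics deltaQuery: the IDs of rows whose updateTime is after the last import.
    static List<Integer> deltaQuery(List<Row> table, LocalDateTime lastIndexTime) {
        List<Integer> ids = new ArrayList<>();
        for (Row row : table) {
            if (row.updateTime.isAfter(lastIndexTime)) {
                ids.add(row.solrID);
            }
        }
        return ids;
    }

    // Mimics deltaImportQuery: load the full row for one changed ID (null if absent).
    static Row deltaImportQuery(List<Row> table, int solrID) {
        for (Row row : table) {
            if (row.solrID == solrID) {
                return row;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row(1, "old row", LocalDateTime.of(2016, 11, 1, 0, 0)));
        table.add(new Row(2, "new row", LocalDateTime.of(2016, 11, 27, 12, 0)));
        LocalDateTime lastIndexTime = LocalDateTime.of(2016, 11, 20, 0, 0);

        // Only row 2 changed after the last import, so only it is re-indexed.
        for (int id : deltaQuery(table, lastIndexTime)) {
            Row row = deltaImportQuery(table, id);
            System.out.println("re-index " + row.solrID + ": " + row.context);
        }
    }
}
```

This is also why the table has an updateTime column: without it there would be nothing for deltaQuery to compare against the time of the last import.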

 

4. Modify the managed-schema file in the conf folder, adding the following field elements under the schema tag:

 

<field name="solrID" type="string" required="true" indexed="true" stored="true" multiValued="false"/>
<field name="context" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="updateTime" type="date" indexed="true" stored="true" multiValued="false"/>
<field name="sort" type="int" indexed="true" stored="true" multiValued="false"/>

Each name here corresponds to a name in solr-data-config.xml. They must be identical, otherwise Solr will not be able to find the field.

type is the field type, indexed says whether the field can be searched, stored says whether its value is stored, and multiValued says whether it can hold multiple values. There are many other attributes, which readers can explore on their own.

(Incidentally, I am actually using schema.xml here rather than the managed-schema file, but they appear to work the same way.)

 

Next, let's look at the result (don't forget to insert some data into MySQL first, otherwise there will be nothing to see):

First, import the data into Solr:
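Besides the admin page, the import can also be triggered with a plain HTTP request to the DataImportHandler. The following is a minimal sketch using only the JDK. It assumes the handler is registered at /dataimport, as in the example-DIH configuration we copied, and that Solr runs at the address used in this article; the class DataImportTrigger and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for this article; not part of Solr or SolrJ.
final class DataImportTrigger {

    // Builds the DataImportHandler URL for a full or a delta import.
    static String importUrl(String solrBaseUrl, String coreName, boolean full) {
        String command = full ? "full-import" : "delta-import";
        // clean=true empties the index first, so it is only set for a full import.
        return solrBaseUrl + "/" + coreName + "/dataimport?command=" + command
                + "&clean=" + full + "&commit=true";
    }

    // Sends the request and returns the HTTP status code.
    static int trigger(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(30000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed address and core name, taken from the setup in this article.
        String url = importUrl("http://localhost:8080/solr", "my_solr", true);
        System.out.println(url + " -> HTTP " + trigger(url));
    }
}
```

Passing false instead of true requests a delta import, which uses the deltaQuery and deltaImportQuery defined in solr-data-config.xml.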



 

Then check the data:



 

4. Chinese tokenizer

Solr itself provides many tokenizers, but they do not work well for Chinese word segmentation, which is annoying. So what is word segmentation? Take the term "data structure": without segmentation, Solr searches by the whole term "data structure" (a bit like MySQL's =); with Chinese word segmentation, it searches by the two words "data" and "structure" (a bit like MySQL's LIKE), which is more in line with our search habits.

There are many jar packages that add Chinese word segmentation to Solr, such as IKAnalyzer, which we use here.

We first import the IKAnalyzer jar package. Note that the IKAnalyzer jar must match the Solr version, otherwise an error will be reported.

Then add the following at the end of the managed-schema file:
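The difference can be illustrated with a tiny toy index. The sketch below is not how Solr or IKAnalyzer is implemented: the class SegmentationSketch is a hypothetical name, the split into words is hard-coded, and it assumes the "data structure" example corresponds to the Chinese term 数据结构 (数据 = data, 结构 = structure).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy illustration of word segmentation; all names here are hypothetical.
final class SegmentationSketch {

    // Without segmentation, the whole field value is indexed as a single term.
    static Set<String> indexWhole(String text) {
        return new HashSet<>(Arrays.asList(text));
    }

    // With segmentation, the field value is indexed as several terms.
    // The split is supplied by hand here; a real analyzer works it out itself.
    static Set<String> indexSegmented(String... words) {
        return new HashSet<>(Arrays.asList(words));
    }

    // A term query matches only if exactly that term is in the index.
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> whole = indexWhole("数据结构");
        Set<String> segmented = indexSegmented("数据", "结构");

        System.out.println(matches(whole, "数据"));      // false: only the full term was indexed
        System.out.println(matches(whole, "数据结构"));  // true
        System.out.println(matches(segmented, "数据"));  // true: the word is a term of its own
    }
}
```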

 

    <!-- IKAnalyzer Chinese tokenizer. If the version does not correspond to solr, an abstract method error will be reported -->
	<fieldType name="text_ik" class="solr.TextField">
	    <!--Tokenizer when indexing -->
		<analyzer type="index" isMaxWordLength="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
		<!--The tokenizer when querying -->
		<analyzer type="query" isMaxWordLength="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
	</fieldType>

Then, if a field such as context in our database table needs to be segmented, modify its field definition:

 

 

<field name="context" type="text_ik" indexed="true" stored="true" multiValued="false"/>

That is, change type to the name of the tokenizer's field type.

 

 

Now let's query in Solr. First, fill in some data:



Check the results:



5. SolrJ

SolrJ is a client jar package that connects our Java project to Solr, and it is very practical.

Simple usage example:

 

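import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrQueryTest {

	// Solr server address
	public static final String solrServerUrl = "http://localhost:8080/solr";

	// Name of the core under solrhome
	public static final String solrCoreHome = "my_solr";

	public static void main(String[] args) {
		// Alternative: put the core name in the URL and call client.query(new SolrQuery("*:*"))
		// SolrClient client = new HttpSolrClient.Builder(solrServerUrl + "/" + solrCoreHome).build();

		// try-with-resources closes the client once the query is done
		try (SolrClient client = new HttpSolrClient.Builder(solrServerUrl).build()) {
			// Query every document in the my_solr core and map the results to beans
			QueryResponse resp = client.query(solrCoreHome, new SolrQuery("*:*"));
			List<TestBean> lists = resp.getBeans(TestBean.class);
			System.out.println(lists);
		} catch (SolrServerException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}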

The bean (note the SolrJ @Field annotations):

 

 

import java.util.Date;

import org.apache.solr.client.solrj.beans.Field;

public class TestBean {

	@Field("solrID")
	private String solrID;
	
	@Field("context")
	private String context;
	
	@Field("updateTime")
	private Date updateTime;

	public String getSolrID() {
		return solrID;
	}

	public void setSolrID(String solrID) {
		this.solrID = solrID;
	}

	public String getContext() {
		return context;
	}

	public void setContext(String context) {
		this.context = context;
	}

	public Date getUpdateTime() {
		return updateTime;
	}

	public void setUpdateTime(Date updateTime) {
		this.updateTime = updateTime;
	}

}

SolrJ also offers many methods for setting query conditions, such as paging, faceted retrieval, result filters, and sorting. The following is just an example, a code snippet from one of my projects:

 

 

		/**
		 * Main query
		 */
		solrQuery.setQuery(this.getQueryFields(keyWord));

		/**
		 * Filter queries (fq): restrict the result set
		 */
		solrQuery.setFilterQueries(this.getFielder(condition));

		/**
		 * Pagination: offset of the first result and number of rows per page
		 */
		solrQuery.setStart(pInteger);
		solrQuery.setRows(systemConfigureUtil.getSolrRow());

		/**
		 * Faceted retrieval (grouping results by category)
		 */
		solrQuery.setFacet(systemConfigureUtil.isFacet());
		solrQuery.add("facet.field", systemConfigureUtil.getFacetStr()); // field to facet on; for several fields, add one facet.field value per field

		/**
		 * Field list (fl): which fields are returned in the results
		 */
		solrQuery.add("fl", systemConfigureUtil.getFilterFields());

		/**
		 * Sorting: the order of the sort clauses matters
		 */
		solrQuery.setSort(this.getSort(0)); // setSort() replaces any sort clauses set earlier
		solrQuery.addSort("sort", SolrQuery.ORDER.desc); // addSort() appends to the existing sort clauses
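Each of these setters ends up as a request parameter on Solr's select handler. To show the mapping, here is a JDK-only sketch that builds the equivalent URL by hand. The class SelectUrlSketch, its methods, and the sample parameter values are made up for illustration; it assumes a one-based page number for the paging calculation, which the snippet above does not specify.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for this article; not part of Solr or SolrJ.
final class SelectUrlSketch {

    // Zero-based offset of the first row on a one-based page (the value passed to setStart).
    static int startOffset(int page, int rows) {
        return (page - 1) * rows;
    }

    // URL-encodes one parameter name or value as UTF-8.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds the select URL from the parameters, keeping their insertion order.
    static String selectUrl(String solrBaseUrl, String coreName, Map<String, String> params) {
        StringBuilder url = new StringBuilder(solrBaseUrl)
                .append('/').append(coreName).append("/select");
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(encode(param.getKey()))
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "context:solr");                          // setQuery
        params.put("fq", "sort:[1 TO *]");                        // setFilterQueries
        params.put("start", String.valueOf(startOffset(3, 10)));  // setStart: page 3, 10 rows per page
        params.put("rows", "10");                                 // setRows
        params.put("facet", "true");                              // setFacet
        params.put("facet.field", "sort");                        // add("facet.field", ...)
        params.put("fl", "solrID,context");                       // add("fl", ...)
        params.put("sort", "sort desc");                          // setSort / addSort
        System.out.println(selectUrl("http://localhost:8080/solr", "my_solr", params));
    }
}
```

Pasting the printed URL into a browser is a quick way to compare what SolrJ sends with what the admin query page produces.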

 

 

So far, that's all I've learned about Solr, and I'll keep learning.

 

The tomcat-solr project I built is on my GitHub: https://github.com/hejiawang/tomcat-solr

The SolrJ practice project: https://github.com/hejiawang/search-web

 
