This article will not explain Solr's internals one by one; it focuses on the practical points to watch when using Solr.
1. Solr build service
The first step is to install Solr; the installation itself is omitted here (don't call me lazy — installation guides are everywhere).
Once that succeeds, you need to create a core in Solr for your business. Suppose we want one called discuz:
./bin/solr create -c discuz
Then, under solr-5.5.3/server/solr/, you will see the discuz directory you just created; the entire search configuration for this business lives in that directory.
Discuz directory resolution:
1.conf : configuration files (index definitions, tokenizer settings, stopword dictionaries, etc.)
2.core.properties : properties of this core
3.data : the index data directory
2. Create an index
Building an index establishes the relationship between the database and Solr: database data is imported into Solr on a schedule, and Solr serves the searches. So how do we build the index?
Edit solr-5.5.3/server/solr/discuz/conf/data-config.xml:
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://11.11.11.11:800/database"
              user="root"
              password="123456"/>
  <document>
    <entity name="database"
            pk="tid"
            query="SELECT * FROM table"
            deltaQuery="SELECT tid FROM table WHERE `create` > unix_timestamp('${dataimporter.last_index_time}')"
            deltaImportQuery="SELECT * FROM table WHERE id = ${dataimporter.delta.tid}">
      <field column="id" name="id" />
      <field column="subject" name="subject" />
      <field column="views" name="views" />
      <field column="vip" name="vip" />
      <field column="message" name="message" />
      <field column="create" name="create" />
    </entity>
  </document>
</dataConfig>
dataConfig holds the database-import configuration:
dataSource: database connection settings
document: the index definition for this database
entity: name = a name for this entity (here it happens to be "database")
query: the SQL for a full import
deltaQuery: selects the primary keys of rows changed since the last import
deltaImportQuery: the SQL that fetches the full rows for those keys during an incremental import
A word on full vs. incremental imports: the first import pushes the whole database into Solr, but as time passes the database and Solr drift apart and a re-import is needed. On a schedule (every half hour, or whatever interval suits you) an incremental import is enough; a full import can be re-run at a quieter time.
field: these elements define the fields that are imported and returned
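Imports are triggered through the DataImportHandler's HTTP endpoint. A minimal sketch of building those URLs (the host, port, and scheduling comment are assumptions; adjust to your deployment):

```python
# Sketch: triggering DataImportHandler imports over HTTP.
# Assumes Solr listens at http://localhost:8983 and the core is named "discuz".
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr/discuz"  # hypothetical host/port

def dataimport_url(command, clean=False):
    """Build a DataImportHandler URL; command is 'full-import' or 'delta-import'."""
    params = {"command": command, "clean": str(clean).lower(), "commit": "true"}
    return SOLR_BASE + "/dataimport?" + urlencode(params)

# A cron job could GET these URLs: delta every half hour, full at a quiet time.
print(dataimport_url("delta-import"))
print(dataimport_url("full-import", clean=True))
```

In practice a scheduler simply issues an HTTP GET to these URLs; `clean=true` on the full import wipes the index before re-importing.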
== That covers wiring the database to Solr; what follows is Solr's own handling of these fields
3. Solr's configuration of display fields and search
In solr-5.5.3/server/solr/discuz/conf/solrconfig.xml:
1. Add some code
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
Configure the index in the managed-schema file under conf
1. Configure Chinese word segmentation mmseg4j
<!-- mmseg4j -->
<fieldType name="text_mmseg4j_complex" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
<fieldType name="text_mmseg4j_maxword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
  </analyzer>
</fieldType>
<fieldType name="text_mmseg4j_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
  </analyzer>
</fieldType>
<!-- mmseg4j -->
solr-5.5.3/server/solr/discuz/conf/dic: the dictionary directory for Chinese word segmentation; put the word.dic dictionary file in this directory
stopwords.txt: stopwords file
synonyms.txt: synonyms file
analyzer type="index": the analysis chain applied at index time (tokenization, stopwords, synonyms)
analyzer type="query": the analysis chain applied at query time (tokenization, stopwords, synonyms)
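For reference, both files are plain text. A minimal hedged example of their format (the entries themselves are made up):

```
# stopwords.txt — one stopword per line
的
了
the

# synonyms.txt — comma-separated groups are mutually equivalent;
# "=>" maps the left side to the right side at analysis time
笔记本, 笔记本电脑, notebook
TV => television
```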
2. Configure the fields: which fields are indexed, their types, and the Chinese word-segmentation type of the query field
<field name="submes" type="text_mmseg4j_complex" indexed="true" stored="true" required="true" multiValued="true" />
<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="vip" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="views" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="create" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="subject" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="message" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
<copyField source="subject" dest="submes" />
<copyField source="vip" dest="submes" />
<copyField source="views" dest="submes" />
<copyField source="create" dest="submes" />
The field elements define what Solr indexes and returns:
name: field name
type: field type (a field to be searched uses a Chinese word-segmentation type such as text_mmseg4j_complex)
indexed: true — the field is searchable
stored: true — the original value can be returned in results
required: true — the field must be present in every document
multiValued: set to true on the combined query field (it receives values from several copyField sources), false on the others
copyField: copies each field that should be searchable into a combined field; here submes is that combined field, and Solr queries run against it.
With this in place, scoring and sorting can weight several underlying fields at once.
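Since submes is the field queries run against, a select request can be built like this (a hedged sketch; the host, port, and keyword are assumptions):

```python
# Sketch: building a select query against the combined submes field.
# Assumes Solr at http://localhost:8983 and the discuz core from above.
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr/discuz"

def select_url(keyword, rows=10):
    """Search the submes copyField target and request a JSON response."""
    params = {"q": "submes:" + keyword, "wt": "json", "rows": rows}
    return SOLR_BASE + "/select?" + urlencode(params)

print(select_url("notebook"))
```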
4. Restart solr
The discuz configuration has changed, so a restart is needed for it to take effect:
./bin/solr restart
For large websites, restarting the whole Solr instance leaves other services unable to search during the restart, so you can use ZooKeeper (SolrCloud) for distributed management.
5. Use
1. Incremental import: in the Solr admin UI, pick discuz in the drop-down box on the left.
The following tabs appear below it:
Dataimport: runs incremental and full imports
Query: runs searches
Analysis: tests word segmentation
The screenshot won't load and the UI is hard to describe in words, but it will be obvious once you open the interface.
2. Search scoring: sometimes, besides sorting by relevance, we want to sort by a combination of factors such as page views, creation time, and VIP status.
When sending the request, add the following parameters:
$params['defType'] = 'edismax';
$params['bf'] = "sum(linear(vip,1000,0),linear(sqrt(log(linear(views,1,2))),100,0),sqrt(log(create)))";
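To see what this bf expression actually computes, here is a hedged Python sketch of Solr's function-query semantics for it (in Solr function queries, log is base 10 and linear(x,m,c) = m*x + c; the sample values below are made up):

```python
import math

def linear(x, m, c):
    # Solr's linear(x,m,c) = m*x + c
    return m * x + c

def solr_log(x):
    # Solr's log() function query is base 10
    return math.log10(x)

def boost(vip, views, create):
    """Mirror of bf = sum(linear(vip,1000,0),
                          linear(sqrt(log(linear(views,1,2))),100,0),
                          sqrt(log(create)))"""
    return (linear(vip, 1000, 0)
            + linear(math.sqrt(solr_log(linear(views, 1, 2))), 100, 0)
            + math.sqrt(solr_log(create)))

# VIP dominates with a flat +1000; views contribute on a heavily damped
# log scale (x100); the creation timestamp adds only a small term.
print(boost(vip=1, views=100, create=1500000000))
```

This makes the weighting visible: flipping vip from 0 to 1 adds exactly 1000 to the score, while a tenfold jump in views moves it far less.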
The functions in bf are scoring functions supported by Solr; see http://mxsfengg.iteye.com/blog/352191 for a detailed explanation. This write-up is still rough and will be improved over time; if you have any questions, feel free to ask.