Using Solr

This post does not walk through how Solr works internally; it mainly covers things to watch out for when using it.

1. Setting up a Solr service

 First install Solr. The installation steps are omitted here (I'm not being lazy; installation guides are easy to find online).

Once Solr is installed, create a service (a core, in Solr terms) for your business. Here I create one called discuz:

./bin/solr create -c discuz

 

A discuz directory now appears under solr-5.5.3/server/solr/. It was created by the command above, and all the search configuration for this business lives in it.

Layout of the discuz directory:

    1. conf: configuration files (schema, word segmentation, stop word list, etc.)

    2. core.properties: the core's name and basic properties

    3. data: the index data

2. Create an index

   Creating an index here means linking the database to Solr: database rows are imported into Solr on a regular schedule, and Solr serves the searches. So how is the index set up?

   In solr-5.5.3/server/solr/discuz/conf/data-config.xml:

 

<dataConfig>
    <dataSource type="JdbcDataSource"
        driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://11.11.11.11:800/database"
        user="root" password="123456"/>
    <document>
        <entity name="database" pk="id"
                query="SELECT * FROM table"
                deltaQuery="SELECT id FROM table WHERE `create` &gt; unix_timestamp('${dataimporter.last_index_time}')"
                deltaImportQuery="SELECT * FROM table WHERE id = ${dataimporter.delta.id}">
            <field column="id" name="id" />
            <field column="subject" name="subject" />
            <field column="views" name="views" />
           <field column="vip" name="vip" />
            <field column="message" name="message" />
            <field column="create" name="create" />
        </entity>
    </document>
</dataConfig>

 

 

  dataConfig holds the configuration for importing from the database:

   dataSource: database connection settings

   document: what gets indexed from the database

       entity: one table or query to import

                   name: a name for the entity (here, the database name)

                   query: SQL for the full import

                   deltaQuery: SQL returning the ids of the rows changed since the last import

                   deltaImportQuery: SQL that loads each changed row by id during an incremental import

                   pk: primary key column, used to match the rows returned by deltaQuery

       field: maps a database column to a Solr field

 Full versus incremental import: the first import loads the whole database into Solr. Over time the database changes and the two drift apart, so the data has to be imported again. An incremental import on a timer (every half hour, or whatever interval suits you) is enough for that, and a full import can be rerun at a convenient time.
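The timed import described above can be driven over HTTP, for example from a cron job. Below is a minimal Python sketch. It assumes the DataImportHandler is registered at /dataimport in solrconfig.xml and points at data-config.xml (a step this post does not show), and that Solr listens on localhost:8983; the helper names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed values: adjust host, port and core name to your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "discuz"


def dataimport_url(command, clean=False):
    """Build the DataImportHandler URL for a full-import, delta-import or status call."""
    if command not in ("full-import", "delta-import", "status"):
        raise ValueError("unsupported command: " + command)
    params = {"command": command, "wt": "json"}
    if command != "status":
        # clean=true wipes the index before importing; leave it off for deltas.
        params["clean"] = "true" if clean else "false"
        params["commit"] = "true"
    return SOLR_BASE + "/" + CORE + "/dataimport?" + urlencode(params)


def run_import(command, clean=False):
    """Send the request to Solr and return the raw JSON response text."""
    with urlopen(dataimport_url(command, clean), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Print the URL a half-hourly cron job would call; run_import() performs the call.
    print(dataimport_url("delta-import"))
```

Calling run_import("delta-import") every half hour and run_import("full-import", clean=True) during a quiet period would match the schedule described above.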

 

== That covers the link between the database and Solr. What follows is how Solr itself handles these fields.

3. Configuring display fields and search in Solr

   In solr-5.5.3/server/solr/discuz/conf/solrconfig.xml

   1. Add the following

<schemaFactory class="ManagedIndexSchemaFactory">
    <bool name="mutable">true</bool>
    <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>

 

 

   Configure the index in the managed-schema file under conf

 1. Configure the Chinese word segmenter mmseg4j

 

<!-- mmseg4j-->
    <fieldType name="text_mmseg4j_complex" class="solr.TextField" positionIncrementGap="100" >  
        <analyzer type="index">  
           <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>  
           <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
           <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        </analyzer>
        <analyzer type="query">
           <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
           <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
           <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        </analyzer>
    </fieldType>  
    <fieldType name="text_mmseg4j_maxword" class="solr.TextField" positionIncrementGap="100" >  
        <analyzer>  
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
        </analyzer>  
    </fieldType>
    <fieldType name="text_mmseg4j_simple" class="solr.TextField" positionIncrementGap="100" >  
        <analyzer>  
            <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>     
        </analyzer>  
    </fieldType>
    <!-- mmseg4j-->

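   solr-5.5.3/server/solr/discuz/conf/dic: the dictionary directory for Chinese word segmentation; put the word.dic dictionary file in this directory (mmseg4j loads custom dictionary files whose names start with "words" and end with ".dic", so check the file name if your words are not picked up)

 stopwords.txt: stop word file

 synonyms.txt: synonym file

    analyzer type="index": applied when documents are indexed (word segmentation, stop words, synonyms)

   analyzer type="query": applied to the search terms at query time (word segmentation, stop words, synonyms)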
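To see what the index and query analyzers actually do with a piece of text, Solr's field analysis handler can be called directly. The sketch below only builds the request URL. It assumes Solr runs on localhost:8983 and that the /analysis/field handler is available for the core (recent Solr versions register it implicitly, but verify this on your install); the function name is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed values: adjust host, port and core name to your installation.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "discuz"


def analysis_url(field_type, text):
    """Build a field-analysis URL that runs text through both analyzers of a field type."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,  # processed by the analyzer with type="index"
        "analysis.query": text,       # processed by the analyzer with type="query"
        "wt": "json",
    }
    return f"{SOLR_BASE}/{CORE}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    print(analysis_url("text_mmseg4j_complex", "solr search"))
```

Opening the printed URL in a browser shows the tokens after each tokenizer and filter stage, which is the same information the Analysis page of the admin interface displays.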

2. Configure the fields: their types, and the Chinese word segmentation type for the field that is searched

 

<field name="submes" type="text_mmseg4j_complex" indexed="true" stored="true" required="true" multiValued="true" />
<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="vip" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="views"    type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="create" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="subject" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="message" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
<copyField source="subject" dest="submes" />
<copyField source="vip" dest="submes" />
<copyField source="views" dest="submes" />
<copyField source="create" dest="submes" />

 Each field element defines a field that Solr stores and returns

   name: field name

   type: field type (a field that will be searched should use a Chinese word segmentation type such as text_mmseg4j_complex)

   indexed: true if the field can be searched or sorted on

    stored: true if the original value is returned in results

   required: true if every document must have a value

        multiValued: true for the search field, because several sources are copied into it; false for the others

copyField: copies the fields that need to be searchable into a combined field. submes is the combined field in this example, and Solr queries are run against it.

  Results from this field can be weighted by several other fields, then scored and sorted
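Because the schemaFactory above is declared mutable, fields and copyFields can also be added through Solr's Schema API instead of editing managed-schema by hand. This is an alternative to the approach in this post, not what the post itself does. The sketch below assumes Solr on localhost:8983 and a core where submes does not exist yet (adding a field that already exists returns an error); the function names are made up for illustration.

```python
import json
from urllib.request import Request, urlopen

# Assumed host and port; the core name comes from this post.
SCHEMA_URL = "http://localhost:8983/solr/discuz/schema"


def schema_commands():
    """Schema API commands equivalent to the submes field and one copyField rule."""
    return {
        "add-field": {
            "name": "submes",
            "type": "text_mmseg4j_complex",
            "indexed": True,
            "stored": True,
            "multiValued": True,
        },
        "add-copy-field": {"source": "subject", "dest": ["submes"]},
    }


def post_schema(commands):
    """POST the commands as JSON to the Schema API and return the response text."""
    body = json.dumps(commands).encode("utf-8")
    req = Request(SCHEMA_URL, data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Show the payload; post_schema(schema_commands()) would send it.
    print(json.dumps(schema_commands(), indent=2))
```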

 

 

4. Restart Solr

  The discuz configuration has changed, so Solr must be restarted for it to take effect:

  ./bin/solr restart

  On a large site, restarting the whole Solr instance interrupts search for every other service running on it, so consider using ZooKeeper for distributed management.
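On a standalone Solr (no ZooKeeper), a lighter option than a full restart is to reload only the core whose configuration changed, using the Core Admin API's RELOAD action. The sketch below assumes Solr on localhost:8983; the function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed host and port: adjust to your installation.
ADMIN_CORES = "http://localhost:8983/solr/admin/cores"


def reload_url(core):
    """Build the Core Admin URL that reloads a single core."""
    return ADMIN_CORES + "?" + urlencode(
        {"action": "RELOAD", "core": core, "wt": "json"})


def reload_core(core):
    """Ask Solr to reload the core so it rereads the files in its conf directory."""
    with urlopen(reload_url(core), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(reload_url("discuz"))
```

Other cores on the same instance keep serving searches while discuz reloads. Changes that affect how text is analyzed still require the data to be imported again before they show up in results.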

 

 

 5. Use

1. Importing: select discuz in the drop-down box on the left of the Solr admin interface

  The following entries appear below it:

  Dataimport: run a full or incremental import

  Query: search

  Analysis: test word segmentation

 

 

The screenshot could not be uploaded and the page is hard to describe in words, but it will be clear once you open the interface.

 

2. Search scoring: besides sorting by relevance, we sometimes want the ranking to also reflect page views, creation time, VIP status and similar factors.

Add the following parameters to the curl request:

       $params['defType'] = 'edismax' ;

       $params['bf'] = "sum(linear(vip,1000,0),linear(sqrt(log(linear(views,1,2))),100,0),sqrt(log(create)))";

    

  The functions used in bf are function queries supported by Solr; see http://mxsfengg.iteye.com/blog/352191 for a detailed explanation. This write-up is far from complete and will be improved over time. If you have any questions, feel free to ask.
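To get a feel for what the bf expression contributes, it can be evaluated by hand. The sketch below reimplements it in Python using the documented meaning of Solr's functions (linear(x,m,c) is m*x + c, and log is base 10) and builds the matching query string. The sample documents, the qf parameter and the function names are assumptions added for illustration; they are not from the original request code.

```python
import math
from urllib.parse import urlencode

# The boost function from this post, unchanged.
BF = ("sum(linear(vip,1000,0),"
      "linear(sqrt(log(linear(views,1,2))),100,0),"
      "sqrt(log(create)))")


def linear(x, m, c):
    """Solr's linear(x,m,c) function: m*x + c."""
    return m * x + c


def bf_boost(vip, views, create):
    """Evaluate the bf expression for one document (Solr's log() is base 10)."""
    vip_part = linear(vip, 1000, 0)
    views_part = linear(math.sqrt(math.log10(linear(views, 1, 2))), 100, 0)
    create_part = math.sqrt(math.log10(create))
    return vip_part + views_part + create_part


def search_params(keyword):
    """Query string for an edismax search on submes with the boost applied."""
    return urlencode({
        "q": keyword,
        "qf": "submes",        # assumption: search the combined field
        "defType": "edismax",
        "bf": BF,
        "wt": "json",
    })


if __name__ == "__main__":
    # Hypothetical documents: (vip flag, page views, creation timestamp).
    for doc in [(0, 98, 1500000000), (1, 98, 1500000000), (0, 9998, 1531536000)]:
        print(doc, round(bf_boost(*doc), 2))
```

Running it shows how the three terms compare: the VIP flag adds 1000, the views term grows slowly (about 141 at 98 views and 200 at 9998 views), and the create term stays close to 3 for any recent Unix timestamp, so creation time has almost no effect on the order with this formula.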

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 
