This article will not explain Solr's internals one by one; it focuses on the practical points to watch when using Solr.
1. Solr build service
The first step is to install Solr; the installation itself is omitted here (don't call me lazy — installation guides are everywhere).
Once that succeeds, you need to create a core in Solr for your business. Suppose we want one called discuz:
./bin/solr create -c discuz
Then, under solr-5.5.3/server/solr/, you will see the discuz directory you just created; the entire search configuration for this business lives in that directory.
Discuz directory resolution:
1.conf : configuration files (index definitions, tokenizer settings, stopword dictionaries, etc.)
2.core.properties : properties of this core
3.data : the index data directory
2. Create an index
Building an index establishes the relationship between the database and Solr: database data is imported into Solr on a schedule, and Solr serves the searches. So how do we build the index?
Edit solr-5.5.3/server/solr/discuz/conf/data-config.xml:
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://11.11.11.11:800/database"
              user="root"
              password="123456"/>
  <document>
    <entity name="database"
            pk="tid"
            query="SELECT * FROM table"
            deltaQuery="SELECT tid FROM table WHERE `create` > unix_timestamp('${dataimporter.last_index_time}')"
            deltaImportQuery="SELECT * FROM table WHERE id = ${dataimporter.delta.tid}">
      <field column="id" name="id" />
      <field column="subject" name="subject" />
      <field column="views" name="views" />
      <field column="vip" name="vip" />
      <field column="message" name="message" />
      <field column="create" name="create" />
    </entity>
  </document>
</dataConfig>
dataConfig holds the database-import configuration:
dataSource: database connection settings
document: the index definition for this database
entity: name = a name for this entity (here it happens to be "database")
query: the SQL for a full import
deltaQuery: selects the primary keys of rows changed since the last import
deltaImportQuery: the SQL that fetches the full rows for those keys during an incremental import
A word on full vs. incremental imports: the first import pushes the whole database into Solr, but as time passes the database and Solr drift apart and a re-import is needed. On a schedule (every half hour, or whatever interval suits you) an incremental import is enough; a full import can be re-run at a quieter time.
field: these elements define the fields that are imported and returned
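Imports are triggered through the DataImportHandler's HTTP endpoint. A minimal sketch of building those URLs (the host, port, and scheduling comment are assumptions; adjust to your deployment):

```python
# Sketch: triggering DataImportHandler imports over HTTP.
# Assumes Solr listens at http://localhost:8983 and the core is named "discuz".
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr/discuz"  # hypothetical host/port

def dataimport_url(command, clean=False):
    """Build a DataImportHandler URL; command is 'full-import' or 'delta-import'."""
    params = {"command": command, "clean": str(clean).lower(), "commit": "true"}
    return SOLR_BASE + "/dataimport?" + urlencode(params)

# A cron job could GET these URLs: delta every half hour, full at a quiet time.
print(dataimport_url("delta-import"))
print(dataimport_url("full-import", clean=True))
```

In practice a scheduler simply issues an HTTP GET to these URLs; `clean=true` on the full import wipes the index before re-importing.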
== That covers wiring the database to Solr; what follows is Solr's own handling of these fields
3. Solr's configuration of display fields and search
In solr-5.5.3/server/solr/discuz/conf/solrconfig.xml:
1. Add some code
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
Configure the index in the managed-schema file under conf
1. Configure Chinese word segmentation mmseg4j
<!-- mmseg4j -->
<fieldType name="text_mmseg4j_complex" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
<fieldType name="text_mmseg4j_maxword" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="max-word" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
  </analyzer>
</fieldType>
<fieldType name="text_mmseg4j_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="simple" dicPath="solr-5.5.3/server/solr/discuz/conf/dic"/>
  </analyzer>
</fieldType>
<!-- mmseg4j -->
solr-5.5.3/server/solr/discuz/conf/dic: the dictionary directory for Chinese word segmentation; put the word.dic dictionary file in this directory
stopwords.txt: stopwords file
synonyms.txt: synonyms file
analyzer type="index": the analysis chain applied at index time (tokenization, stopwords, synonyms)
analyzer type="query": the analysis chain applied at query time (tokenization, stopwords, synonyms)
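For reference, both files are plain text. A minimal hedged example of their format (the entries themselves are made up):

```
# stopwords.txt — one stopword per line
的
了
the

# synonyms.txt — comma-separated groups are mutually equivalent;
# "=>" maps the left side to the right side at analysis time
笔记本, 笔记本电脑, notebook
TV => television
```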
2. Configure the fields: which fields are indexed, their types, and the Chinese word-segmentation type of the query field
<field name="submes" type="text_mmseg4j_complex" indexed="true" stored="true" required="true" multiValued="true" />
<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="vip" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="views" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="create" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="subject" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="message" type="text_general" indexed="true" stored="true" required="true" multiValued="false" />
<copyField source="subject" dest="submes" />
<copyField source="vip" dest="submes" />
<copyField source="views" dest="submes" />
<copyField source="create" dest="submes" />
The field elements define what Solr indexes and returns:
name: field name
type: field type (a field to be searched uses a Chinese word-segmentation type such as text_mmseg4j_complex)
indexed: true — the field is searchable
stored: true — the original value can be returned in results
required: true — the field must be present in every document
multiValued: set to true on the combined query field (it receives values from several copyField sources), false on the others
copyField: copies each field that should be searchable into a combined field; here submes is that combined field, and Solr queries run against it.
With this in place, scoring and sorting can weight several underlying fields at once.
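Since submes is the field queries run against, a select request can be built like this (a hedged sketch; the host, port, and keyword are assumptions):

```python
# Sketch: building a select query against the combined submes field.
# Assumes Solr at http://localhost:8983 and the discuz core from above.
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr/discuz"

def select_url(keyword, rows=10):
    """Search the submes copyField target and request a JSON response."""
    params = {"q": "submes:" + keyword, "wt": "json", "rows": rows}
    return SOLR_BASE + "/select?" + urlencode(params)

print(select_url("notebook"))
```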
4. Restart solr
The discuz configuration has changed, so a restart is needed for it to take effect:
./bin/solr restart
For large websites, restarting the whole Solr instance leaves other services unable to search during the restart, so you can use ZooKeeper (SolrCloud) for distributed management.
5. Use
1. Incremental import: in the Solr admin UI, pick discuz in the drop-down box on the left.
The following tabs appear below it:
Dataimport: runs incremental and full imports
Query: runs searches
Analysis: tests word segmentation
The screenshot won't load and the UI is hard to describe in words, but it will be obvious once you open the interface.
2. Search scoring: sometimes, besides sorting by relevance, we want to sort by a combination of factors such as page views, creation time, and VIP status.
When sending the request, add the following parameters:
$params['defType'] = 'edismax';
$params['bf'] = "sum(linear(vip,1000,0),linear(sqrt(log(linear(views,1,2))),100,0),sqrt(log(create)))";
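To see what this bf expression actually computes, here is a hedged Python sketch of Solr's function-query semantics for it (in Solr function queries, log is base 10 and linear(x,m,c) = m*x + c; the sample values below are made up):

```python
import math

def linear(x, m, c):
    # Solr's linear(x,m,c) = m*x + c
    return m * x + c

def solr_log(x):
    # Solr's log() function query is base 10
    return math.log10(x)

def boost(vip, views, create):
    """Mirror of bf = sum(linear(vip,1000,0),
                          linear(sqrt(log(linear(views,1,2))),100,0),
                          sqrt(log(create)))"""
    return (linear(vip, 1000, 0)
            + linear(math.sqrt(solr_log(linear(views, 1, 2))), 100, 0)
            + math.sqrt(solr_log(create)))

# VIP dominates with a flat +1000; views contribute on a heavily damped
# log scale (x100); the creation timestamp adds only a small term.
print(boost(vip=1, views=100, create=1500000000))
```

This makes the weighting visible: flipping vip from 0 to 1 adds exactly 1000 to the score, while a tenfold jump in views moves it far less.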
The functions in bf are scoring functions supported by Solr; see http://mxsfengg.iteye.com/blog/352191 for a detailed explanation. This write-up is still rough and will be improved over time; if you have any questions, feel free to ask.