Solr is an open-source search engine written in Java on top of Lucene. It offers strong performance, flexible configuration, rich APIs, and a visual admin panel, and it is easy to debug. The installation process, however, is involved, and I ran into many pitfalls along the way. Below is a summary of the process.
Installation environment: CentOS 7
1. Download the solr installation package
wget http://mirrors.shuosc.org/apache/lucene/solr/7.2.1/solr-7.2.1.zip
2. Unzip the archive
unzip solr-7.2.1.zip
3. Install the java environment
3.1 Check whether Java is already installed
java -version
If an old version is already installed, remove it first
yum remove java-x.x.x-openjdk
3.2 Installation
View installable list
yum -y list java*
Select a version and install all of its packages
yum install java-1.8.0-openjdk*
3.3 Verify the installation with java -version
4. Start solr
mv solr-7.2.1 solr7.2.1
cd solr7.2.1
bin/solr start -force
5. Open 8983 firewall port
firewall-cmd --zone=public --add-port=8983/tcp --permanent
systemctl restart firewalld
6. Access port 8983
If there is no response, first check whether the port is actually open. On Alibaba Cloud and other cloud servers you also need to add port 8983 to the security group's inbound rules.
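To check reachability from the command line rather than a browser, a minimal sketch using only Python's standard library follows; the 127.0.0.1 host is a placeholder assumption, so replace it with your server's public address:

```python
import urllib.request

# Assumed host/port -- replace 127.0.0.1 with your server's public address.
SOLR_HOST = "127.0.0.1"
SOLR_PORT = 8983

def admin_url(host: str = SOLR_HOST, port: int = SOLR_PORT) -> str:
    """URL of the Solr admin UI; open it in a browser or request it directly."""
    return f"http://{host}:{port}/solr/"

def check_solr(url: str = admin_url()) -> int:
    """Return the HTTP status; raises URLError if the port is unreachable."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status

print(admin_url())
```

If `check_solr()` raises an error, work through the firewall and security-group checks above before suspecting Solr itself.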
At this point, Solr is installed successfully.
7. Install IK word segmentation
Solr does not ship with Chinese word segmentation, so the IK analyzer package is needed to handle query-time segmentation, index-time segmentation, and related problems.
7.1 Download IK package
Link: https://pan.baidu.com/s/1smrpBOx
Password: irdx
7.2 Unzip the package
unzip ikanalyzer-solr6.5.zip
7.3 Copy to the specified directory
7.3.1 Copy the two jar packages into the lib directory of the Solr installation
cp *.jar /usr/local/solr/solr7.2.1/server/solr-webapp/webapp/WEB-INF/lib/
7.3.2 Create a new classes directory
mkdir /usr/local/solr/solr7.2.1/server/solr-webapp/webapp/WEB-INF/classes
7.3.3 Copy the xml file to classes
cp IKAnalyzer.cfg.xml /usr/local/solr/solr7.2.1/server/solr-webapp/webapp/WEB-INF/classes/
7.3.4 Open IKAnalyzer.cfg.xml and check its configuration; place the ext.dic (extension dictionary) and stopword.dic (stopword dictionary) files it references into the classes directory as well
7.4 Configure IK word segmentation
vim /usr/local/solr/solr7.2.1/server/solr/seo/conf/managed-schema
(Note: the seo core referenced in this path is created in step 8.1 below; create it first if the path does not exist yet.)
A brief explanation: a field's name attribute is the index field name, and type is its field type, which we set here to the text_ik type just installed. indexed controls whether the field is indexed, and stored controls whether the original value is stored and returned with search results (the id field already exists in the file, around line 113, with both set to true). Suppose your database has data split into two fields, title and content; then you can configure them as below, followed by the IK tokenizer definition, which you can copy directly.
<!-- my fields -->
<field name="title" type="text_ik" indexed="true" stored="false"/>
<field name="content" type="text_ik" indexed="true" stored="false"/>
<!-- IK tokenizer -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.ik.IKTokenizerFactory" useSmart="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.ik.IKTokenizerFactory" useSmart="true"/>
  </analyzer>
</fieldType>
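A malformed edit to managed-schema will prevent the core from loading, so it can save a restart round-trip to sanity-check the snippet before pasting it. A small sketch using only the Python standard library (the snippet is wrapped in a dummy root element because it contains several sibling elements):

```python
import xml.etree.ElementTree as ET

snippet = """
<field name="title" type="text_ik" indexed="true" stored="false"/>
<field name="content" type="text_ik" indexed="true" stored="false"/>
<fieldType name="text_ik" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="org.apache.lucene.analysis.ik.IKTokenizerFactory" useSmart="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.apache.lucene.analysis.ik.IKTokenizerFactory" useSmart="true"/>
  </analyzer>
</fieldType>
"""

# Wrap in a dummy root so the sibling elements parse as one XML document;
# ET.fromstring raises ParseError if any tag is malformed.
root = ET.fromstring(f"<root>{snippet}</root>")
fields = [f.get("name") for f in root.findall("field")]
print(fields)  # ['title', 'content']
```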
7.5 Restart solr (whether you add jar packages or modify configuration, don't forget to restart)
/usr/local/solr/solr7.2.1/bin/solr restart -force
8. Get started with solr
8.1 New core
A core can be understood as a single project (or index)
bin/solr create -c seo -force
8.2 Go to the front end to check whether the IK word segmentation takes effect
8.2.1 Select core
8.2.2 Click Analysis and select text_ik
If text_ik appears in the list, the IK analyzer is installed successfully; otherwise the installation failed.
8.2.3 Check word segmentation
8.2.4 Checking dictionary loading
"Samsung" is a brand name that has been cut out separately, and "star" is not cut out, and "?" has also been removed
8.3 Add index
8.3.1 There are three ways to add data to solr
1. Data import through the Dataimport of the panel
2. Manually add through Documents
3. Add via API
I won't go into the first method; personally I find its configuration troublesome and its use inflexible.
For the third method, I used the Python API directly. For details, see: http://blog.csdn.net/sinat_33455447/article/details/56848791
Be careful not to submit documents one at a time. The right approach is to add data in batches, for example in groups of 10,000, which is far more efficient
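The batching advice above can be sketched as follows, using only the standard library rather than a Solr client package. The core name (seo) and title/content fields come from the earlier steps; the host is an assumption, and the actual POST is kept in a separate function since it needs a running Solr:

```python
import json
import urllib.request
from typing import Iterator

def chunked(docs: list, size: int = 10000) -> Iterator[list]:
    """Yield docs in groups of `size` (groups of ~10,000 work well)."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def post_batch(batch: list, core: str = "seo",
               host: str = "127.0.0.1", port: int = 8983) -> None:
    """POST one batch of documents to the core's JSON update handler and commit."""
    url = f"http://{host}:{port}/solr/{core}/update?commit=true"
    req = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=30)

# 25,000 synthetic documents split into 3 batches.
docs = [{"id": str(i), "title": f"t{i}", "content": f"c{i}"} for i in range(25000)]
batch_sizes = [len(b) for b in chunked(docs)]
print(batch_sizes)  # [10000, 10000, 5000]
```

In a real run you would loop `for batch in chunked(docs): post_batch(batch)`; committing once per batch rather than once per document is where the efficiency gain comes from.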
8.3.2 Manual addition via Documents
Generally used for testing. Click Documents and select the format you want to submit in; here JSON is used as an example
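A sample payload for the Documents tab, matching the title/content fields defined earlier (the field values are made up for illustration); generating it in Python makes it easy to confirm the JSON is valid before pasting:

```python
import json

# Hypothetical sample document matching the title/content fields above.
doc = {
    "id": "1",
    "title": "Solr installation notes",
    "content": "Install the IK tokenizer for Chinese word segmentation.",
}

# The Documents tab accepts a single JSON object or an array of objects.
payload = json.dumps([doc], indent=2)
print(payload)
```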
8.4. Search Test
Click Query; enter your query in q, and use df to specify the default search field
You can fetch the search results directly by requesting the URL shown (in the red box in the admin UI), and the wt parameter specifies the format of the returned data
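The query URL the admin UI builds can also be constructed programmatically. A sketch of a /select URL with the q, df, and wt parameters; the seo core and title field come from earlier steps, and the host is the same placeholder assumption as before:

```python
from urllib.parse import urlencode

def select_url(q: str, df: str = "title", wt: str = "json",
               core: str = "seo", host: str = "127.0.0.1",
               port: int = 8983) -> str:
    """Build a Solr /select URL: q is the query, df the default
    search field, wt the response format (json, xml, ...)."""
    params = urlencode({"q": q, "df": df, "wt": wt})
    return f"http://{host}:{port}/solr/{core}/select?{params}"

print(select_url("samsung"))
# http://127.0.0.1:8983/solr/seo/select?q=samsung&df=title&wt=json
```

`urlencode` also percent-encodes Chinese query terms correctly, so the same helper works for IK-segmented fields.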
There are many more details and key parameters that I don't have the energy to cover here. Feel free to share this post, or add my WeChat account below to ask me questions. Your support is what keeps me sharing :)