Features of Sphinx
- Fast index creation: An index of nearly 1 million records can be built in about 3 minutes, and incremental indexing allows the index to be rebuilt very quickly.
- Lightning-fast retrieval: Even with 10 million records, queries return on the order of milliseconds, and the average query time over 2-4 GB of text is less than 0.1 seconds.
- The retrieval API is available for many scripting languages, such as PHP, Python, Perl, Ruby, etc., so you can easily call the Sphinx interfaces from most applications.
- There is a storage engine plugin designed for MySQL, so using Sphinx together with MySQL is straightforward.
- Distributed search is supported, so system performance can be scaled horizontally.
PHP+MySQL+Sphinx search engine architecture diagram
Install Sphinx in MySQL
There are two ways to install Sphinx on MySQL:
- The first method is to use API calls. We can query through the API functions of programming languages such as PHP, Python, Perl, Ruby, etc. This method does not require recompiling MySQL, involves few changes between modules, and is relatively flexible.
- The second method is to recompile MySQL with Sphinx built in as a plug-in. This method requires fewer changes to the program, since only the SQL statements need to change, but it requires MySQL 5.1 or later.
The following is the first installation method:
#Download the latest stable version
wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
tar xzvf sphinx-0.9.9.tar.gz
cd sphinx-0.9.9
./configure --prefix=/usr/local/sphinx/ --with-mysql --enable-id64
make
make install
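Once the build is installed, Sphinx still needs a configuration file before it can index anything. The following is a minimal sketch only, not the author's configuration: the database credentials, the `documents` table, the index name `test1`, and the output file name are all hypothetical placeholders. The paths assume the `--prefix=/usr/local/sphinx/` used above.

```shell
# Hypothetical output location; override with SPHINX_CONF if desired.
conf_file="${SPHINX_CONF:-./sphinx.conf.example}"

# Write a minimal configuration: one MySQL source, one index, one searchd section.
cat > "$conf_file" <<'EOF'
source src1
{
    type      = mysql
    sql_host  = localhost
    sql_user  = test
    sql_pass  =
    sql_db    = test
    # Assumed table layout: the first column must be a unique integer document id
    sql_query = SELECT id, title, content FROM documents
}

index test1
{
    source       = src1
    path         = /usr/local/sphinx/var/data/test1
    charset_type = utf-8
}

searchd
{
    listen   = 9312
    log      = /usr/local/sphinx/var/log/searchd.log
    pid_file = /usr/local/sphinx/var/log/searchd.pid
}
EOF

echo "wrote $conf_file"
```

With a real database behind it, the index could then be built with `/usr/local/sphinx/bin/indexer -c "$conf_file" --all` and served with `/usr/local/sphinx/bin/searchd -c "$conf_file"`.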
Sphinx Chinese word segmentation plugin Coreseek installation
Note: This coreseek installation tutorial comes from here; the detailed process follows:
Install and upgrade autoconf
Because coreseek requires autoconf 2.64 or later, autoconf needs to be upgraded first, otherwise the build will report an error. Download autoconf-2.64.tar.bz2 from http://download.chinaunix.net/download.php?id=29328&ResourceID=648. The installation method is as follows:
tar -jxvf autoconf-2.64.tar.bz2
cd autoconf-2.64
./configure
make
make install
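To confirm that the autoconf on the PATH meets the 2.64 requirement, a small check like the one below can help. This is a sketch under assumptions: the helper name `version_ge` is made up for illustration, and it relies on `sort -V` from GNU coreutils.

```shell
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="2.64"
if command -v autoconf >/dev/null 2>&1; then
    # The first line of the output looks like "autoconf (GNU Autoconf) 2.69"
    installed="$(autoconf --version | head -n 1 | awk '{print $NF}')"
    if version_ge "$installed" "$required"; then
        echo "autoconf $installed is new enough"
    else
        echo "autoconf $installed is older than $required, upgrade needed"
    fi
else
    echo "autoconf not found"
fi
```

If an older version is still reported after `make install`, the new binary may be installed under a different prefix than the one that comes first on the PATH.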
Download coreseek
The new version of coreseek puts the dictionary and sphinx source in one package, so just download the coreseek package.
wget http://www.wapm.cn/uploads/csft/3.2/coreseek-3.2.14.tar.gz
Install mmseg (dictionary used by coreseek)
tar xzvf coreseek-3.2.14.tar.gz
cd coreseek-3.2.14
cd mmseg-3.2.14
./bootstrap #Warnings in the output can be ignored; errors must be resolved
./configure --prefix=/usr/local/mmseg3
make && make install
cd ..
Install coreseek (sphinx)
cd csft-3.2.14
sh buildconf.sh #Warnings in the output can be ignored; errors must be resolved
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
make && make install
cd ..
Test mmseg word segmentation and coreseek search
Note: The character set should be set to zh_CN.UTF-8 beforehand so that Chinese is displayed correctly. On my system, en_US.UTF-8 also works.
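One common way to set the character set for the current shell session is through the locale environment variables. This is a sketch, assuming the zh_CN.UTF-8 locale has already been generated on the machine; if it has not, the shell may print a warning and fall back to the default locale.

```shell
# Assumption: the zh_CN.UTF-8 locale is installed on this system.
export LANG=zh_CN.UTF-8
export LC_ALL=zh_CN.UTF-8

# Print the effective settings for this shell session
echo "LANG=$LANG LC_ALL=$LC_ALL"
```

Running `locale -a` lists the locales that are actually installed, which is a quick way to check before exporting.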
cd testpack
cat var/test/test.xml #This should display Chinese correctly
/usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml
/usr/local/coreseek/bin/indexer -c etc/csft.conf --all
/usr/local/coreseek/bin/search -c etc/csft.conf web search
If everything is set up correctly, the search should return:
words:
1. 'Network' : 1 documents, 1 hits
2. 'Search' : 2 documents, 5 hits
Generate mmseg thesaurus and configuration files
In the new version, these are generated automatically.
Summary
As a high-performance SQL full-text search engine, Sphinx deserves continued attention from developers, especially for its multi-language API support, which makes it easy to adopt.