Sphinx: High-performance SQL full-text search engine

Features of Sphinx

  • Fast index creation: an index over nearly 1 million records can be built in about 3 minutes, and with the incremental indexing method, rebuilds are very fast.
  • Lightning-fast retrieval: even with 10 million records, queries are answered on the order of milliseconds, and the average query time over 2-4 GB of text is under 0.1 seconds.
  • Search APIs are available for many scripting languages, such as PHP, Python, Perl, and Ruby, so you can call Sphinx from most applications.
  • A storage engine plugin is available for MySQL, so using Sphinx together with MySQL is very convenient.
  • Distributed search is supported, so system performance can be scaled horizontally.
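The incremental indexing method mentioned above is commonly set up as a large "main" index plus a small "delta" index that holds only recent rows. The following is a minimal sketch of the rebuild cycle, assuming two indexes named `main` and `delta` are already defined in a config file at `/usr/local/sphinx/etc/sphinx.conf`. The index names, the paths, and the `DRY_RUN` switch are assumptions for illustration; `--rotate` and `--merge` are standard `indexer` options.

```shell
# Sketch of a main+delta rebuild cycle. Paths and index names are assumptions.
INDEXER="${INDEXER:-/usr/local/sphinx/bin/indexer}"
CONF="${CONF:-/usr/local/sphinx/etc/sphinx.conf}"

# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild only the small delta index; --rotate swaps it in while searchd keeps running.
rebuild_delta() {
    run "$INDEXER" --config "$CONF" --rotate delta
}

# Periodically fold the delta index into the main index.
merge_delta() {
    run "$INDEXER" --config "$CONF" --merge main delta --rotate
}
```

Setting `DRY_RUN=1` before calling `rebuild_delta` or `merge_delta` prints the command that would be run, which is a safe way to check the paths before scheduling the functions from cron.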

PHP+MySQL+Sphinx search engine architecture diagram

Install Sphinx in MySQL

There are two ways to use Sphinx with MySQL:

  • The first method is to query through the API. You call the client API from a language such as PHP, Python, Perl, or Ruby. This method does not require recompiling MySQL, needs few changes across modules, and is relatively flexible.
  • The second method is to recompile MySQL with Sphinx built in as a plugin. This method requires fewer changes to the application, since only the SQL statements need to change, but it requires MySQL 5.1 or later.
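To give a sense of what "only the SQL statements need to change" means for the second method, here is a sketch that prints the kind of SQL used with the SphinxSE storage engine. The table name `t1`, the index name `test`, and the host and port defaults are placeholders, and the column layout follows the pattern shown in the SphinxSE documentation; treat the details as assumptions to check against your Sphinx version.

```shell
# Emit example SphinxSE statements; pipe the output into the mysql client to run them.
# SPHINX_HOST, SPHINX_PORT and SPHINX_INDEX are illustrative placeholders.
sphinxse_sql() {
    host="${SPHINX_HOST:-localhost}"
    port="${SPHINX_PORT:-9312}"
    index="${SPHINX_INDEX:-test}"
    cat <<EOF
CREATE TABLE t1 (
    id     INTEGER UNSIGNED NOT NULL,
    weight INTEGER NOT NULL,
    query  VARCHAR(3072) NOT NULL,
    INDEX(query)
) ENGINE=SPHINX CONNECTION="sphinx://${host}:${port}/${index}";

SELECT * FROM t1 WHERE query='test it;mode=any';
EOF
}

sphinxse_sql
```

With this approach the application keeps talking to MySQL, and the full-text query travels inside the `query` column of an ordinary `SELECT`.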

The following is the first installation method:

#Download the latest stable version 
wget http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz
tar xzvf sphinx-0.9.9.tar.gz
cd sphinx-0.9.9
./configure --prefix=/usr/local/sphinx/   --with-mysql  --enable-id64
make
make install

Installing Coreseek, the Chinese word segmentation plugin for Sphinx

Note: The coreseek installation steps below come from an external tutorial; the detailed process is as follows:

Install and upgrade autoconf

Because coreseek requires autoconf 2.64 or later, autoconf must be upgraded first; otherwise the build reports an error. Download autoconf-2.64.tar.bz2 from http://download.chinaunix.net/download.php?id=29328&ResourceID=648 and install it as follows:

tar -jxvf autoconf-2.64.tar.bz2
cd autoconf-2.64
./configure
make
make install
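Before upgrading, it can help to check whether the installed autoconf already meets the 2.64 requirement. The sketch below assumes GNU `sort` with the `-V` (version sort) option; the function names `version_ge` and `autoconf_version` are made up for this example.

```shell
# Return success (0) if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the version from the first line of `autoconf --version`,
# e.g. "autoconf (GNU Autoconf) 2.69" -> "2.69". Prints nothing if autoconf is missing.
autoconf_version() {
    command -v autoconf >/dev/null 2>&1 || return 0
    autoconf --version 2>/dev/null | head -n 1 | awk '{print $NF}'
}

current="$(autoconf_version)"
if [ -n "$current" ] && version_ge "$current" 2.64; then
    echo "autoconf $current is new enough"
else
    echo "autoconf is missing or older than 2.64; upgrade it first"
fi
```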

Download coreseek

Newer versions of coreseek bundle the dictionary and the Sphinx source in one package, so you only need to download the coreseek package.

wget http://www.wapm.cn/uploads/csft/3.2/coreseek-3.2.14.tar.gz

Install mmseg (dictionary used by coreseek)

tar xzvf coreseek-3.2.14.tar.gz
cd coreseek-3.2.14     # The archive unpacks into this directory
cd mmseg-3.2.14
./bootstrap     # Warnings in the output can be ignored; errors must be resolved before continuing
./configure --prefix=/usr/local/mmseg3
make && make install
cd ..

Install coreseek (sphinx)

cd csft-3.2.14
sh buildconf.sh     # Warnings in the output can be ignored; errors must be resolved before continuing
./configure --prefix=/usr/local/coreseek --without-unixodbc --with-mmseg --with-mmseg-includes=/usr/local/mmseg3/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg3/lib/ --with-mysql
make && make install
cd ..

Test mmseg word segmentation and coreseek search

Note: Set the character set to zh_CN.UTF-8 beforehand so that Chinese is displayed correctly. My system character set is en_US.UTF-8, which also works.
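A quick way to confirm the character set before running the tests is to check that the effective locale uses UTF-8. This is a minimal sketch; the function name `is_utf8_locale` is made up for this example, and the suggested `export` only works if the zh_CN.UTF-8 locale is installed on the system.

```shell
# Return success if the given locale name ends in a UTF-8 codeset
# (accepts both "UTF-8" and "utf8" spellings, case-insensitively).
is_utf8_locale() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.utf-8|*.utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Effective locale for character handling: LC_ALL overrides LC_CTYPE, which overrides LANG.
effective="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
if is_utf8_locale "$effective"; then
    echo "locale '$effective' is UTF-8"
else
    echo "locale '$effective' is not UTF-8; try: export LANG=zh_CN.UTF-8"
fi
```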

cd testpack
cat var/test/test.xml     # This should display Chinese correctly
/usr/local/mmseg3/bin/mmseg -d /usr/local/mmseg3/etc var/test/test.xml
/usr/local/coreseek/bin/indexer -c etc/csft.conf --all
/usr/local/coreseek/bin/search -c etc/csft.conf 网络搜索
A correct run should return the following, where the query 网络搜索 ("web search") is segmented into 网络 ("network") and 搜索 ("search"):
words:
1. '网络': 1 documents, 1 hits
2. '搜索': 2 documents, 5 hits

Generate mmseg thesaurus and configuration files

In the new version these files are generated automatically, so no manual step is needed.

Summary

As a high-performance SQL full-text search engine, Sphinx deserves continued attention from developers, especially for its multi-language API support, which makes it easy to integrate into applications.
