MySQL full-text indexing and LIKE

Concept

Value comparison and range filtering cover the vast majority of the queries we need. But if you want to filter by matching keywords, the query has to be based on similarity rather than on exact value comparison. Full-text indexes are designed for exactly this scenario.

You might say that LIKE with % already gives fuzzy matching, so why bother with a full-text index? LIKE with % is fine when the text is small, but for retrieval over large amounts of text it is unacceptably slow. On large data sets a full-text index can be N times faster than LIKE with %; the two are not in the same order of magnitude. The trade-off is that a full-text index may have accuracy problems.

You may never have paid attention to full-text indexes, but you are certainly familiar with at least one full-text indexing technology: search engines. The data a search engine indexes is enormous, and there is usually no relational database behind it, but the basic principle of its full-text index is the same.

Version Support

Before starting, a few words on which versions, storage engines, and data types support full-text indexing:

  • Before MySQL 5.6, only the MyISAM storage engine supports full-text indexing;
  • In MySQL 5.6 and later, both the MyISAM and InnoDB storage engines support full-text indexing;
  • A full-text index can only be built on columns whose data type is char, varchar, text, or another member of the text family.

When testing or using a full-text index, first check whether your MySQL version, storage engine, and column data types support it.
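The checks above can be run directly against the server. This is a minimal sketch; the schema name `mydb` and the table name `your_table` are placeholders, not names from this article.

```sql
-- Server version: InnoDB full-text indexes need MySQL 5.6 or later.
select version();

-- Storage engine of the table ('mydb' and 'your_table' are placeholders).
select table_name, engine
from information_schema.tables
where table_schema = 'mydb' and table_name = 'your_table';

-- Column data types: only char, varchar and text-family columns qualify.
select column_name, data_type
from information_schema.columns
where table_schema = 'mydb' and table_name = 'your_table';
```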

Use full-text indexing

Unlike the usual fuzzy matching with LIKE and %, full-text indexing has its own syntax, built on the match and against keywords, for example:

select * from fulltext_test 
    where match(content,tag) against('xxx xxx');

Note: the columns named in match() must be exactly the same set of columns that the full-text index was created on. Otherwise the query raises an error and cannot use the full-text index, because the index does not record which column a keyword came from. If you want to run a full-text search on a single column, create a separate full-text index for that column.
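As a sketch of the note above, assume fulltext_test already has a combined full-text index on (content, tag). Searching content alone then needs its own index. The index name content_only_index is made up for illustration.

```sql
-- Assumption: fulltext_test already has a full-text index on (content, tag).
-- Add a second full-text index that covers only the content column.
alter table fulltext_test add fulltext index content_only_index (content);

-- Uses the single-column index: the column list matches it exactly.
select * from fulltext_test
    where match(content) against('xxx');

-- Uses the combined index: the column list matches (content, tag).
select * from fulltext_test
    where match(content, tag) against('xxx xxx');
```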

Test full-text index

With this background, we can test the full-text index.

First, create a test table and insert some test data:

create table test (
    id int(11) unsigned not null auto_increment,
    content text not null,
    primary key(id),
    fulltext key content_index(content)
) engine=MyISAM default charset=utf8;

insert into test (content) values ('a'),('b'),('c');
insert into test (content) values ('aa'),('bb'),('cc');
insert into test (content) values ('aaa'),('bbb'),('ccc');
insert into test (content) values ('aaaa'),('bbbb'),('cccc');

Then run the following queries, written in full-text syntax:

select * from test where match(content) against('a');
select * from test where match(content) against('aa');
select * from test where match(content) against('aaa');

Thinking in LIKE terms, we would expect to see four records, but the result is not a single record. Only when we run the following query

select * from test where match(content) against('aaaa');

is the single record aaaa found.

Why? There can be several causes, and the most common one is the minimum search length. As an aside, when testing a full-text index the table should contain at least four records, otherwise unexpected results may occur. (With MyISAM in the default natural language mode, a word that appears in half or more of the rows is treated as too common and is ignored.)

MySQL full-text indexing has two variables, the minimum search length and the maximum search length. A word shorter than the minimum or longer than the maximum is not indexed. Put simply, to find a word through the full-text index, the word's length must fall within the range defined by these two variables.

The defaults can be viewed with the following command:

show variables like '%ft%';

The output shows the names and default values of these two variables for the MyISAM and InnoDB storage engines:

// MyISAM
ft_min_word_len = 4;
ft_max_word_len = 84;

// InnoDB
innodb_ft_min_token_size = 3;
innodb_ft_max_token_size = 84;

The default minimum search length is 4 for MyISAM and 3 for InnoDB. In other words, MySQL's full-text index only indexes words of length 4 or more (MyISAM) or 3 or more (InnoDB). Our test table uses MyISAM, and of the terms we searched for, only aaaa has a length of at least 4.
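If shorter words need to be searchable, the minimum length can be lowered. The following is a sketch rather than a recipe: both variables can only be set at server startup, the value 2 is an arbitrary choice, and test_innodb is a hypothetical InnoDB copy of the test table. Existing full-text indexes must be rebuilt before the new setting takes effect.

```sql
-- Step 1 (outside SQL): in the server option file, under [mysqld], set
--   ft_min_word_len = 2
--   innodb_ft_min_token_size = 2
-- and restart the server. Neither variable can be changed at runtime.

-- Step 2: confirm the new values.
show variables like 'ft_min_word_len';
show variables like 'innodb_ft_min_token_size';

-- Step 3 (MyISAM): rebuild the full-text index of the test table.
repair table test quick;

-- Step 3 (InnoDB): drop and re-create the index.
-- test_innodb is a hypothetical InnoDB table with the same columns.
alter table test_innodb drop index content_index;
alter table test_innodb add fulltext index content_index (content);
```

Note that very short words may still be skipped for other reasons, for example because they are on the stopword list.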

Summary

MySQL's full-text index originally worked only for languages like English, because English words are separated by spaces and using spaces as word delimiters is very convenient. Asian languages such as Chinese, Japanese, and Korean do not put spaces between words, which imposed certain restrictions. Starting with MySQL 5.7.6, the built-in ngram full-text parser solves this problem, and it works with both the MyISAM and InnoDB engines.

In practice, MyISAM's full-text support comes with many limitations, such as the performance impact of table-level locking and the risk of corrupted data files and lengthy recovery after a crash. These make MyISAM full-text indexes unsuitable for many application scenarios. In most cases it is therefore better to use another solution, such as a third-party tool like Sphinx or Lucene, or a full-text index on the InnoDB storage engine.
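A minimal sketch of the ngram parser, assuming MySQL 5.7.6 or later and the default ngram_token_size of 2. The table name articles_cjk, the index name content_ngram, and the sample rows are invented for illustration.

```sql
-- Hypothetical table; "with parser ngram" selects the built-in ngram parser.
create table articles_cjk (
    id int unsigned not null auto_increment,
    content text not null,
    primary key (id),
    fulltext key content_ngram (content) with parser ngram
) engine=InnoDB default charset=utf8mb4;

insert into articles_cjk (content) values ('全文索引'), ('数据库索引');

-- The text is split into overlapping two-character tokens, so a
-- two-character term can be matched even without spaces in the data.
select * from articles_cjk
    where match(content) against('索引' in boolean mode);
```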

Points to note

  1. Before using full-text indexing, check what your version supports;
  2. Full-text indexing is N times faster than LIKE with %, but may have accuracy problems;
  3. If a large amount of data needs a full-text index, it is recommended to load the data first and create the index afterwards;
  4. For Chinese, use MySQL 5.7.6 or later, or a third-party tool;
  5. A full-text index matches whole words, not fragments inside a word. If you search for "hi" in "abcd, efg, hijklmn", full-text search will not help; if you search for efg, it will;
  6. A covering index combined with a primary-key lookup can be used to work around the performance problem of LIKE fuzzy matching with % on both sides.
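Points 3 and 6 can be sketched as follows. The table article_demo, its columns, and the index names are hypothetical, and whether the optimizer actually picks the plan described in the comments depends on the MySQL version and the data, so it is worth confirming with explain.

```sql
-- Hypothetical table used for both points.
create table article_demo (
    id int unsigned not null auto_increment,
    title varchar(100) not null,
    body text not null,
    primary key (id),
    key idx_title (title)
) engine=InnoDB default charset=utf8mb4;

-- Point 3: bulk-load the rows first (inserts omitted here), then build
-- the full-text index once instead of maintaining it on every insert.
alter table article_demo add fulltext index ft_body (body);

-- Point 6: the inner query needs only title and id, both of which are
-- stored in idx_title (an InnoDB secondary index also holds the primary
-- key), so it can scan the index instead of the full rows. The outer
-- join then fetches the complete rows by primary key.
select a.*
from article_demo a
join (
    select id from article_demo where title like '%mysql%'
) m on m.id = a.id;
```

The like with % on both sides still cannot seek within the index, so this only reduces the amount of data scanned; it does not turn the search into an indexed lookup.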

Reference article:
https://blog.csdn.net/mrzhouxiaofei/article/details/79940958


Origin blog.csdn.net/belongtocode/article/details/102990743