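MySQL Full-Text Index Features

Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.csdn.net/qq_18377515/article/details/82712882

Official documentation: https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html

Overview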

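  1. Introduction
    1). A full-text index in MySQL is an index of type FULLTEXT.
    2). Full-text indexes can be used only with InnoDB or MyISAM tables, and can be created only for CHAR, VARCHAR, or TEXT columns.
    3). As of MySQL 5.7.6, MySQL provides a built-in full-text ngram parser that supports Chinese, Japanese, and Korean (CJK), as well as an installable MeCab full-text parser plugin for Japanese.
    4). A FULLTEXT index definition can be given in the CREATE TABLE statement when the table is created, or added later with ALTER TABLE or CREATE INDEX.
    5). For large data sets, loading the data into a table that has no FULLTEXT index and then creating the index is much faster than loading the data into a table that already has a FULLTEXT index.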
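
The points above can be sketched as follows. This is only an illustration: the table `notes` and the index names `ft_notes_title_body` and `ft_notes_title_ngram` are hypothetical and do not come from the original article.

```sql
-- Hypothetical table, created without any FULLTEXT index so that
-- a large bulk load is not slowed down by index maintenance.
CREATE TABLE notes (
  id    INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body  TEXT
) ENGINE=InnoDB;

-- ... bulk-load the data here ...

-- Add a FULLTEXT index afterwards with ALTER TABLE.
ALTER TABLE notes ADD FULLTEXT INDEX ft_notes_title_body (title, body);

-- CREATE INDEX works as well. This one uses the built-in ngram parser
-- (MySQL 5.7.6 and later), intended for CJK text.
CREATE FULLTEXT INDEX ft_notes_title_ngram ON notes (title) WITH PARSER ngram;
```

The two indexes are created in separate statements on purpose, which keeps each step easy to time and to roll back.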

  2. Query syntax

MATCH (col1,col2,...) AGAINST (expr [search_modifier])
search_modifier:
  {
       IN NATURAL LANGUAGE MODE
     | IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION
     | IN BOOLEAN MODE
     | WITH QUERY EXPANSION
  }

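  3. Three types of full-text search
    1). Natural language search: interprets the search string as a phrase in natural human language. This is the default when no modifier is given.
    2). Boolean search: interprets the search string using the rules of a special query language, with operators such as + and -.
    3). Query expansion search: a modification of natural language search in which the search is run twice, the second time with the search string extended by words from the most relevant rows of the first run.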

Natural Language Full-Text Search

Example 1: basic usage

CREATE SCHEMA `fulltextsearches` DEFAULT CHARACTER SET utf8 ;
mysql> CREATE TABLE articles (
          id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
          title VARCHAR(200),
          body TEXT,
          FULLTEXT (title,body)
        ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.08 sec)

mysql> INSERT INTO articles (title,body) VALUES
        ('MySQL Tutorial','DBMS stands for DataBase ...'),
        ('How To Use MySQL Well','After you went through a ...'),
        ('Optimizing MySQL','In this tutorial we will show ...'),
        ('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
        ('MySQL vs. YourSQL','In the following database comparison ...'),
        ('MySQL Security','When configured properly, MySQL ...');
Query OK, 6 rows affected (0.01 sec)
Records: 6  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM articles
        WHERE MATCH (title,body)
        AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

SELECT COUNT(*) FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)

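Notes:
For natural language full-text searches, the columns named in MATCH() must be the same columns included in some FULLTEXT index on the table. In the preceding query, the columns named in MATCH() (title and body) are the same as those named in the definition of the articles table's FULLTEXT index. To search title or body separately, create a separate FULLTEXT index for each column.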
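
A minimal sketch of that last point, assuming the articles table from Example 1. The index names `ft_title` and `ft_body` are hypothetical.

```sql
-- One FULLTEXT index per column, added in separate statements.
ALTER TABLE articles ADD FULLTEXT INDEX ft_title (title);
ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);

-- Search the title only; MATCH(title) lines up with ft_title.
SELECT id, title FROM articles
    WHERE MATCH (title)
    AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);

-- Search the body only; MATCH(body) lines up with ft_body.
SELECT id, title FROM articles
    WHERE MATCH (body)
    AGAINST ('tutorial' IN NATURAL LANGUAGE MODE);
```

The original (title,body) index stays in place, so queries that use MATCH (title,body) keep working.
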
Example 2: retrieving relevance values explicitly

SELECT id, MATCH (title,body)
    AGAINST ('Tutorial' IN NATURAL LANGUAGE MODE) AS score
    FROM articles;
+----+---------------------+
| id | score               |
+----+---------------------+
|  1 | 0.22764469683170319 |
|  2 |                   0 |
|  3 | 0.22764469683170319 |
|  4 |                   0 |
|  5 |                   0 |
|  6 |                   0 |
+----+---------------------+
6 rows in set (0.00 sec)

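Example 3:
The query returns the relevance values and also returns the rows in order of decreasing relevance. To achieve this, specify MATCH() twice: once in the SELECT list and once in the WHERE clause. This causes no additional overhead, because the MySQL optimizer notices that the two MATCH() calls are identical and invokes the full-text search code only once.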

SELECT id, body, MATCH (title,body) AGAINST
    ('Security implications of running MySQL as root'
    IN NATURAL LANGUAGE MODE) AS score
    FROM articles WHERE MATCH (title,body) AGAINST
    ('Security implications of running MySQL as root'
    IN NATURAL LANGUAGE MODE);
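
The ordering in the query above is implicit. According to the MySQL documentation, rows are sorted by relevance automatically when MATCH() is used in the WHERE clause, provided there is no explicit ORDER BY and the search uses a full-text index scan. If the ordering should not depend on those conditions, one option (a sketch, not part of the original example) is to sort on the score alias explicitly:

```sql
SELECT id, body, MATCH (title,body) AGAINST
    ('Security implications of running MySQL as root'
    IN NATURAL LANGUAGE MODE) AS score
    FROM articles WHERE MATCH (title,body) AGAINST
    ('Security implications of running MySQL as root'
    IN NATURAL LANGUAGE MODE)
    ORDER BY score DESC;  -- explicit sort, highest relevance first
```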


Boolean Full-Text Search

Example 1: basic usage

SELECT * FROM articles WHERE MATCH (title,body)
    AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);
+----+-----------------------+-------------------------------------+
| id | title                 | body                                |
+----+-----------------------+-------------------------------------+
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...        |
|  2 | How To Use MySQL Well | After you went through a ...        |
|  3 | Optimizing MySQL      | In this tutorial we will show ...   |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ... |
|  6 | MySQL Security        | When configured properly, MySQL ... |
+----+-----------------------+-------------------------------------+

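Supported operators

  1. +: the word must be present in every row returned. (MyISAM accepts it as a prefix or a suffix; InnoDB supports it only as a prefix.)
  2. -: the word must not be present in any row returned. (MyISAM accepts it as a prefix or a suffix; InnoDB supports it only as a prefix.)
  3. no operator: the word is optional, but rows that contain it are rated higher.
  4. @distance: InnoDB only. Tests whether two or more words all start within the given distance of each other, measured in words. The words go in a double-quoted string immediately before the operator, for example: MATCH(col1) AGAINST('"word1 word2 word3" @8' IN BOOLEAN MODE)
  5. > <: change a word's contribution to the relevance value assigned to a row. The > operator increases the contribution and the < operator decreases it.
  6. ( ): parentheses group words into subexpressions. Parenthesized groups can be nested.
  7. ~: the word's contribution to the row's relevance is negative. The row is rated lower but, unlike with -, it is not excluded.
  8. *: the wildcard (truncation) operator. Unlike the other operators it is appended to the word, and it matches words that begin with that prefix. A word given with * is not stripped from a boolean query even if it is a stopword or too short, where "too short" is determined by innodb_ft_min_token_size for InnoDB tables or ft_min_word_len for MyISAM tables.
  9. ": a phrase enclosed in double quotes matches only rows that contain the phrase literally, as typed.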
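
A few of these operators, sketched against the articles table from the earlier examples. The search strings are illustrative choices for that sample data, and the output is omitted because it depends on the rows loaded.

```sql
-- "MySQL" is required; "YourSQL" only lowers the score instead of excluding the row.
SELECT id, title FROM articles WHERE MATCH (title,body)
    AGAINST ('+MySQL ~YourSQL' IN BOOLEAN MODE);

-- "MySQL" is required, plus either word in the group;
-- "tutorial" is weighted up, "security" is weighted down.
SELECT id, title FROM articles WHERE MATCH (title,body)
    AGAINST ('+MySQL +(>tutorial <security)' IN BOOLEAN MODE);

-- Prefix match: any word that begins with "optim".
SELECT id, title FROM articles WHERE MATCH (title,body)
    AGAINST ('optim*' IN BOOLEAN MODE);

-- Literal phrase match.
SELECT id, title FROM articles WHERE MATCH (title,body)
    AGAINST ('"database comparison"' IN BOOLEAN MODE);

-- InnoDB only: both words must start within 4 words of each other.
SELECT id, title FROM articles WHERE MATCH (title,body)
    AGAINST ('"database comparison" @4' IN BOOLEAN MODE);
```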

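Relevance calculation

TF-IDF formula
For InnoDB tables, relevance is computed with a variation of TF-IDF, where TF is the number of times the search term appears in a record:

${IDF} = log10( ${total_records} / ${matching_records} )

${rank} = ${TF} * ${IDF} * ${IDF}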

mysql> CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
) ENGINE=InnoDB;
Query OK, 0 rows affected (1.04 sec)

mysql> INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','This database tutorial ...'),
("How To Use MySQL",'After you went through a ...'),
('Optimizing Your Database','In this database tutorial ...'),
('MySQL vs. YourSQL','When comparing databases ...'),
('MySQL Security','When configured properly, MySQL ...'),
('Database, Database, Database','database database database'),
('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
('MySQL Full-Text Indexes', 'MySQL fulltext indexes use a ..');                  
Query OK, 8 rows affected (0.06 sec)
Records: 8  Duplicates: 0  Warnings: 0

mysql> SELECT id, title, body, MATCH (title,body)  AGAINST ('database' IN BOOLEAN MODE)
AS score FROM articles ORDER BY score DESC;
+----+------------------------------+-------------------------------------+---------------------+
| id | title                        | body                                | score               |
+----+------------------------------+-------------------------------------+---------------------+
|  6 | Database, Database, Database | database database database          |  1.0886961221694946 |
|  3 | Optimizing Your Database     | In this database tutorial ...       | 0.36289870738983154 |
|  1 | MySQL Tutorial               | This database tutorial ...          | 0.18144935369491577 |
|  2 | How To Use MySQL             | After you went through a ...        |                   0 |
|  4 | MySQL vs. YourSQL            | When comparing databases ...        |                   0 |
|  5 | MySQL Security               | When configured properly, MySQL ... |                   0 |
|  7 | 1001 MySQL Tricks            | 1. Never run mysqld as root. 2. ... |                   0 |
|  8 | MySQL Full-Text Indexes      | MySQL fulltext indexes use a ..     |                   0 |
+----+------------------------------+-------------------------------------+---------------------+
8 rows in set (0.00 sec)

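Result:
There are 8 records in total, and 3 of them match the "database" search term. The first record (id 6) contains the search term 6 times and has a relevance ranking of 1.0886961221694946. This ranking is calculated from a TF value of 6 (the "database" search term appears 6 times in record id 6) and an IDF value of 0.42596873216370745, where 8 is the total number of records and 3 is the number of records that contain the search term: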

${IDF} = log10( 8 / 3 ) = 0.42596873216370745

${rank} = ${TF} * ${IDF} * ${IDF}

mysql> SELECT 6*log10(8/3)*log10(8/3);
+-------------------------+
| 6*log10(8/3)*log10(8/3) |
+-------------------------+
|       1.088696164686938 |
+-------------------------+
1 row in set (0.00 sec)
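
The same arithmetic can be applied to the other two matching rows, as a quick check. This assumes the TF values read off the sample data: "database" appears twice in record id 3 (once in the title, once in the body) and once in record id 1. The column aliases are only labels for this sketch.

```sql
-- rank = TF * IDF * IDF, with IDF = log10(8/3) as above
SELECT 2*log10(8/3)*log10(8/3) AS rank_id_3,
       1*log10(8/3)*log10(8/3) AS rank_id_1;
```

The results should come out at roughly 0.3629 and 0.1814, in line with the scores listed for ids 3 and 1 in the table above. Small differences in the last digits are expected, as with the slight mismatch already visible for id 6 between the reported score and the manual calculation.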


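Query Expansion Search

Query expansion is generally useful when the search phrase is too short, which often means that the user is relying on implied knowledge that the full-text search engine lacks. For example, a user searching for "database" may really mean that "MySQL", "Oracle", "DB2", and "RDBMS" are all phrases that should match "database" and should be returned too. The search is performed twice: the search phrase for the second pass is the original phrase concatenated with the most relevant documents found in the first pass. Because this tends to return more non-relevant rows as well, it is best reserved for short search phrases.

Example: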

mysql> SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' IN NATURAL LANGUAGE MODE);
+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  1 | MySQL Tutorial    | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' WITH QUERY EXPANSION);
+----+-----------------------+------------------------------------------+
| id | title                 | body                                     |
+----+-----------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL     | In the following database comparison ... |
|  1 | MySQL Tutorial        | DBMS stands for DataBase ...             |
|  3 | Optimizing MySQL      | In this tutorial we will show ...        |
|  6 | MySQL Security        | When configured properly, MySQL ...      |
|  2 | How To Use MySQL Well | After you went through a ...             |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ...      |
+----+-----------------------+------------------------------------------+
6 rows in set (0.00 sec)
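
Per the syntax listed earlier, the modifier can also be written in its longer form, IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION. A sketch against the same six-row articles table, which also selects the score so that the rows pulled in by the second pass can be told apart from the direct matches:

```sql
SELECT id, title, MATCH (title,body)
    AGAINST ('database' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION) AS score
    FROM articles
    WHERE MATCH (title,body)
    AGAINST ('database' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION);
```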
