Oracle fuzzy query optimization

Fuzzy queries are used frequently in database work. The common forms are as follows:

(1) field like '%keyword%': matches records whose field contains "keyword". Even if the target field is indexed, the index is not used, so this is the slowest form.

(2) field like 'keyword%': matches records whose field begins with "keyword". An ascending index created on the target field can be used.

(3) field like '%keyword': matches records whose field ends with "keyword". According to the author, a descending index created on the target field can be used; a function-based index on the reversed string is the more commonly cited technique for this case.

 

Is there any way to optimize the '%keyword%' pattern, which cannot use an index? The answer is yes.

Oracle provides the instr(strSource, strTarget) function, which is much more efficient than the '%keyword%' pattern.

instr function description:

INSTR

  (source string, target string, start position, occurrence number)

  In Oracle/PLSQL, the instr function returns the position of the target string within the source string. The search is performed only once, that is, from the start position through to the end of the string.

  The syntax is as follows:

  instr( string1, string2 [, start_position [, nth_appearance ] ] )

  Parameters:

  string1

  The source string, in which the search is performed.

  string2

  The string to search for in string1.

  start_position

  The position in string1 at which the search starts. This parameter is optional; if omitted, it defaults to 1. String positions start at 1. If this parameter is positive, the search runs from left to right; if it is negative, the search starts that many characters from the end and runs from right to left. In both cases the return value is the position, counted from the left, at which the matched string begins in the source string.

  nth_appearance

  Which occurrence of string2 to find. This parameter is optional; if omitted, it defaults to 1. If it is negative, Oracle raises an error.

  Note:

  If string2 is not found in string1, the instr function returns 0.

  Example:

  SELECT instr('syranmo','s') FROM dual; -- returns 1

  SELECT instr('syranmo','ra') FROM dual;  -- returns 3

  SELECT instr('syran mo','a',1,2) FROM dual;  -- returns 0
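
The parameter rules above can also be traced outside the database. The following is a minimal Python sketch of the described behavior, not Oracle's implementation. The name instr_model is hypothetical, and its handling of a zero start position, a zero occurrence number and empty strings are assumptions made for illustration.

```python
def instr_model(string1, string2, start_position=1, nth_appearance=1):
    """Model of instr(string1, string2, start_position, nth_appearance).

    Positions are 1-based; 0 means "not found".
    """
    if nth_appearance < 1:
        # The text says a negative value is an error; rejecting 0 too is an assumption.
        raise ValueError("nth_appearance must be a positive integer")
    if string1 == "" or string2 == "":
        # Assumption: empty strings are treated as NULL, modeled here as None.
        return None
    if start_position == 0:
        # Assumption: a start position of 0 never matches.
        return 0

    n, m = len(string1), len(string2)
    if start_position > 0:
        # Search left to right from the given 1-based position.
        candidates = range(start_position - 1, n - m + 1)
    else:
        # Count back from the end, then search right to left.
        first = min(n + start_position, n - m)
        candidates = range(first, -1, -1)

    found = 0
    for i in candidates:
        if string1[i:i + m] == string2:
            found += 1
            if found == nth_appearance:
                return i + 1  # convert the 0-based index to a 1-based position
    return 0


print(instr_model('syranmo', 's'))         # 1
print(instr_model('syranmo', 'ra'))        # 3
print(instr_model('syran mo', 'a', 1, 2))  # 0
print(instr_model('syranmo', 'a', -1))     # 4
```

The last call shows that a negative start position changes only the search direction: the result is still counted from the left of the source string.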

  Comparison:

   instr(title, 'manual') > 0 is equivalent to title like '%manual%'

   instr(title, 'manual') = 1 is equivalent to title like 'manual%'

   instr(title, 'manual') = 0 is equivalent to title not like '%manual%'
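
These three equivalences can be checked mechanically. The sketch below is a small Python model, not Oracle itself: like() is a hypothetical helper that translates a LIKE pattern into a regular expression, instr() models only the two-argument form, and the sample titles are made up. NULL values and keywords that themselves contain % or _ are not modeled.

```python
import re


def like(value, pattern):
    """Tiny model of SQL LIKE: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == '%':
            parts.append('.*')
        elif ch == '_':
            parts.append('.')
        else:
            parts.append(re.escape(ch))
    return re.fullmatch(''.join(parts), value, flags=re.DOTALL) is not None


def instr(source, target):
    """Two-argument instr: 1-based position of target in source, 0 if absent."""
    return source.find(target) + 1


# Hypothetical sample data.
titles = ['manual', 'manual for users', 'user manual', 'a manual page', 'handbook']

for title in titles:
    assert (instr(title, 'manual') > 0) == like(title, '%manual%')
    assert (instr(title, 'manual') == 1) == like(title, 'manual%')
    assert (instr(title, 'manual') == 0) == (not like(title, '%manual%'))

print([t for t in titles if instr(t, 'manual') == 1])
# ['manual', 'manual for users']
```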

Fuzzy query optimization:

  Once you know how instr works, the optimization is straightforward: for example, field like '%keyword%' can be rewritten as instr(field, 'keyword') > 0.

 

Practical application:

Table t holds about 11 million rows. When we want to match a string in a SQL statement, we usually reach for like. In actual tests, however, the difference in efficiency between the instr function and like turned out to be considerable. Here are some results:

SQL> set timing on
SQL> select count(*) from t where instr(title,'手册')>0;

  COUNT(*)
----------
     65881

Elapsed: 00:00:11.04
SQL> select count(*) from t where title like '%手册%';

  COUNT(*)
----------
     65881

Elapsed: 00:00:31.47
SQL> select count(*) from t where instr(title,'手册')=0;

  COUNT(*)
----------
  11554580

Elapsed: 00:00:11.31
SQL> select count(*) from t where title not like '%手册%';

  COUNT(*)
----------
  11554580

In addition, I tested another table with 200 million rows, using a parallel degree of 8. With like, the query ran for a long time without returning a result, whereas with instr it finished in 4 minutes, which is quite good performance. Making good use of tips like these can improve working efficiency a lot. The tests above suggest that Oracle's built-in functions are optimized to a considerable degree.

 

instr(title, 'aaa') > 0 is equivalent to title like '%aaa%'

instr(title, 'aaa') = 0 is equivalent to title not like '%aaa%'

 

Special usage:

 

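select id, name from users where instr('101914, 104703', id) > 0;
  which is roughly equivalent to
select id, name from users where id = 101914 or id = 104703;
  Note that the instr form is a substring test, so it also matches any id whose digits appear inside the list string, such as 1019 or 4703.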
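
Because instr on a list string is a substring test, it is worth checking which ids it actually accepts. The Python sketch below models instr(...) > 0 as a plain substring check; the helper name and the sample ids are hypothetical, and the delimiter-wrapping variant is a common workaround rather than something taken from the original query.

```python
def instr_positive(source, target):
    """Models instr(source, target) > 0 as a plain substring test."""
    return str(target) in source


id_list = '101914, 104703'
sample_ids = [101914, 104703, 1019, 4703, 999]  # hypothetical id values

# Direct substring test, as in the query above.
loose = [i for i in sample_ids if instr_positive(id_list, i)]

# Wrap both the list and each id in delimiters so that only whole ids match.
wrapped_list = ',' + id_list.replace(' ', '') + ','
strict = [i for i in sample_ids if instr_positive(wrapped_list, ',' + str(i) + ',')]

print(loose)   # [101914, 104703, 1019, 4703]
print(strict)  # [101914, 104703]
```

Only the delimiter-wrapped form gives the same result as the explicit id = ... or id = ... query for these sample values.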

 

 

 

Using Oracle's instr function together with an index to improve fuzzy query efficiency

In general, there are two ways to run a fuzzy query on the name field of table tb in an Oracle database:
1. select * from tb where name like '%XX%';
2. select * from tb where instr(name, 'XX') > 0;

If the name field is not indexed, the two are about equally efficient, with essentially no difference.

To improve efficiency, we can add a non-unique index on the name field:
create index idx_tb_name on tb(name);

Then run the following again:

select * from tb where instr(name,'XX')>0;

With the index in place, this query becomes much more efficient, and the larger the table, the greater the difference. However, also take into account the cost that the index on name adds to DML statements, which have to keep the index data ordered.

 

 

Another approach I have not tried:

Some people recommend a full-text index. I looked into it and the steps are quite troublesome, but it is a good method to keep in reserve:

http://sandish.itpub.net/post/4899/464369

Full-text search on the address field of the cmng_custominfo table:
1. In Oracle 9.2.0.1 (oracle9201), first create a lexer for word segmentation:

BEGIN
ctx_ddl.create_preference ('SMS_ADDRESS_LEXER', 'CHINESE_LEXER');
--ctx_ddl.create_preference ('my_lexer', 'chinese_vgram_lexer'); -- not used
end;

2. Create the full-text index:

CREATE INDEX INX_CUSTOMINFO_ADDR_DOCS ON cmng_custominfo(address) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('LEXER SMS_ADDRESS_LEXER');

3. When querying, use:

select * from cmng_custominfo where contains (address, 'Metro Gold')> 0;

4. The index needs to be synchronized and optimized regularly:
Sync: updates the full-text index according to newly added or changed text.

begin
ctx_ddl.sync_index('INX_CUSTOMINFO_ADDR_DOCS');
end;

Optimize: clears the garbage left in the full-text index by deleted records.

begin
ctx_ddl.optimize_index('INX_CUSTOMINFO_ADDR_DOCS', 'FAST');
end;

5. Use jobs to carry out step 4:

1) This relies on Oracle's JOB feature.
Because the JOB feature is not enabled by default in Oracle 9i, first add the following job parameter to the Oracle database instance configuration:
job_queue_processes = 5
Then restart the Oracle database service and the listener service.

2) Synchronization and optimization jobs
-- Synchronization (sync):
variable jobno number;
begin
 DBMS_JOB.SUBMIT(:jobno, 'ctx_ddl.sync_index(''INX_CUSTOMINFO_ADDR_DOCS'');', SYSDATE, 'SYSDATE + (1/24/4)');
 commit;
end;

-- Optimization:
variable jobno number;
begin
 DBMS_JOB.SUBMIT(:jobno,'ctx_ddl.optimize_index(''INX_CUSTOMINFO_ADDR_DOCS'',''FULL'');', SYSDATE, 'SYSDATE + 1');
 commit;
END;

In the first job, SYSDATE + (1/24/4) means the index is synchronized every 15 minutes; in the second job, SYSDATE + 1 means a full optimization runs once a day. The specific intervals can be chosen according to the needs of the application.
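
The interval expressions are fractions of a day, since adding a number to an Oracle date adds that many days. As a quick check of the arithmetic, here is a small Python sketch; the helper name interval_in_minutes is hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

MINUTES_PER_DAY = 24 * 60


def interval_in_minutes(days):
    """Convert an interval expressed in days to minutes."""
    return days * MINUTES_PER_DAY


sync_interval = interval_in_minutes(Fraction(1, 24) / 4)  # SYSDATE + (1/24/4)
optimize_interval = interval_in_minutes(Fraction(1))      # SYSDATE + 1

print(sync_interval)      # 15
print(optimize_interval)  # 1440
```

By the same reasoning, an hourly schedule would be written as SYSDATE + 1/24.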

6. Rebuilding the index
Rebuilding drops the original index and builds it again, which takes a long time.
The syntax for rebuilding the index is as follows:
ALTER INDEX INX_CUSTOMINFO_ADDR_DOCS REBUILD;

According to the experience of some users online, Oracle rebuilds the index relatively quickly. One user describes it as follows:

Oracle builds and maintains full-text indexes much faster than MS SQL Server. For a table of 650,000 records, the author built the index in only 20 minutes, and synchronization took only 1 minute.
Therefore, rebuilding the index periodically with a job is also worth considering.



Origin www.cnblogs.com/itzhoucong/p/11647496.html