PostgreSQL DBA(42) - locale

When a PostgreSQL database cluster is initialized with initdb, a locale can be specified. The parameter is empty by default, in which case initdb uses the locale settings of the operating system.
Locale settings affect the following SQL features:
1. Sorting and comparison: the sort order of queries that use ORDER BY or the standard comparison operators on text data
2. Built-in functions: the upper, lower, and initcap functions
3. Pattern matching: the pattern matching operators (LIKE, SIMILAR TO, and POSIX-style regular expressions); locales affect both case-insensitive matching and the classification of characters by character-class regular expressions
4. to_char functions: the to_char family of functions
5. Index use with LIKE: the ability to use indexes with LIKE clauses

Ordering
The same data sorted with different LC_COLLATE settings produces different output:

postgres=# SELECT name FROM unnest(ARRAY['MYNAME', ' my_name', 'my-image.jpg', 'my-third-image.jpg']) name ORDER BY name collate "C";
        name
--------------------
  my_name
 MYNAME
 my-image.jpg
 my-third-image.jpg
(4 rows)

postgres=# SELECT name FROM unnest(ARRAY['MYNAME', ' my_name', 'my-image.jpg', 'my-third-image.jpg']) name ORDER BY name collate "zh_CN";
        name
--------------------
 my-image.jpg
  my_name
 MYNAME
 my-third-image.jpg
(4 rows)

When the collation is "C", strings are compared by their binary values (the ASCII codes, for this data); when it is zh_CN, they are not.
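To make the "C" ordering concrete, here is a minimal Python sketch that sorts the same four strings by their raw UTF-8 bytes. Python is used only as a neutral illustration; this is not how PostgreSQL implements collations, and the helper name c_collation_sort is made up for this example.

```python
def c_collation_sort(values):
    """Sort strings by their UTF-8 byte values (hypothetical helper)."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


names = ["MYNAME", " my_name", "my-image.jpg", "my-third-image.jpg"]
result = c_collation_sort(names)

# Show each value with its quotes so the leading space stays visible.
for name in result:
    print(repr(name))
```

The space (0x20) sorts before uppercase "M" (0x4D), which sorts before lowercase "m" (0x6D), so the result has the same order as the "C" query above.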

With zh_CN, the comparison behaves as if case were not significant:

postgres=# SELECT name FROM unnest(ARRAY['MYNAME1', ' my_name2', 'my-image.jpg', 'my-third-image.jpg']) name ORDER BY name collate "zh_CN";
        name
--------------------
 my-image.jpg
 MYNAME1
  my_name2
 my-third-image.jpg
(4 rows)

postgres=# SELECT name FROM unnest(ARRAY['myname1', ' myname2', 'myimage.jpg', 'mythirdimage.jpg']) name ORDER BY name collate "zh_CN";
       name
------------------
 myimage.jpg
 myname1
  myname2
 mythirdimage.jpg
(4 rows)

The explanation given on the mailing list is as follows:

The behavior of each collation comes from the operating system’s own 
libc, except for the C collation, which is based on the ordering 
implied by strcmp() comparisons. Generally, most implementations have 
the behavior you describe, in that they assign least weight of all to 
caseness and whitespace, and somewhat more weight to punctuation. I 
don’t think that there is much that can be done about it in practice, 
though in principal there could be a collation that has all the 
properties you want.
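The idea of assigning "least weight" to case and whitespace can be approximated with a layered sort key. The following Python sketch is a deliberately simplified model, not the actual libc algorithm: it ignores punctuation and whitespace entirely (real collations give punctuation a small weight), and it assumes lowercase sorts before uppercase when everything else is equal. The name layered_key is hypothetical, and the tie-break rule was chosen only because it reproduces the zh_CN outputs shown earlier.

```python
def layered_key(text):
    """Hypothetical two-level key: letters and digits first, case second."""
    significant = [ch for ch in text if ch.isalnum()]
    # Level 1: letters and digits only, compared without regard to case.
    primary = "".join(significant).casefold()
    # Level 2: used only for ties; False (lowercase) sorts before True.
    secondary = [ch.isupper() for ch in significant]
    return (primary, secondary)


first = sorted(
    ["MYNAME", " my_name", "my-image.jpg", "my-third-image.jpg"],
    key=layered_key,
)
second = sorted(
    ["MYNAME1", " my_name2", "my-image.jpg", "my-third-image.jpg"],
    key=layered_key,
)

for row in first + second:
    print(repr(row))
```

In the second list the digits decide the order at the first level ("myname1" before "myname2"), so the difference in case never comes into play.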

Built-in functions
For example, initcap gives different results under French and under C:

postgres=#  select initcap('élysée' collate "C");
 initcap
---------
 éLyséE
(1 row)

postgres=#  select initcap('élysée' collate "fr_FR");
 initcap
---------
 Élysée
(1 row)

Under a Chinese locale, full-width lowercase letters are converted to full-width uppercase letters:

postgres=# select initcap('ａ' collate "zh_CN");
 initcap
---------
 Ａ
(1 row)

postgres=# select initcap('ａ' collate "C");
 initcap
---------
 ａ
(1 row)

With the "C" collation, the conversion only takes effect for ASCII characters (up to 0x7F); other characters are left unchanged.
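The difference between the two initcap results can be modelled in a few lines of Python. This is a sketch under stated assumptions, not PostgreSQL's code: the ASCII-only variant assumes that only ASCII letters and digits count as word characters and that only those are case-converted, while the Unicode-aware variant relies on Python's own character classification. All function names here are made up for the example.

```python
def initcap(text, is_word_char, to_upper, to_lower):
    """Uppercase the first character of each word and lowercase the rest."""
    out = []
    prev_in_word = False
    for ch in text:
        in_word = is_word_char(ch)
        if not in_word:
            out.append(ch)            # separators are copied unchanged
        elif prev_in_word:
            out.append(to_lower(ch))  # inside a word
        else:
            out.append(to_upper(ch))  # first character of a word
        prev_in_word = in_word
    return "".join(out)


def initcap_ascii(text):
    # Assumption: anything outside ASCII is treated as a separator.
    return initcap(text, lambda c: c.isascii() and c.isalnum(),
                   str.upper, str.lower)


def initcap_unicode(text):
    # Assumption: Python's Unicode tables stand in for a locale-aware ctype.
    return initcap(text, str.isalnum, str.upper, str.lower)


word = "\u00e9lys\u00e9e"  # the string 'élysée'
print(initcap_ascii(word))
print(initcap_unicode(word))
```

In the ASCII-only model the "é" acts as a word boundary, which is why the letter after it is capitalized, and a full-width "ａ" (U+FF41) is left alone because it is outside the ASCII range.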

Pattern matching

postgres=#  select 'élysée' ~ '^\w+$' collate "fr_FR";
 ?column?
----------
 t
(1 row)

postgres=#  select 'élysée' COLLATE "C" ~ '^\w+$';
 ?column?
----------
 f
(1 row)
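Python's regular expression module has a comparable switch, which makes for a quick analogy. In the sketch below, the re.ASCII flag restricts \w to ASCII letters, digits, and underscore, roughly like the "C" case above, while the default mode classifies characters using Unicode. This only illustrates the character-class idea; it does not reproduce PostgreSQL's regex engine.

```python
import re

word = "\u00e9lys\u00e9e"  # the string 'élysée'

# ASCII-only classification: the accented letters are not word characters.
ascii_match = re.fullmatch(r"\w+", word, flags=re.ASCII) is not None

# Unicode-aware classification: the accented letters count as word characters.
unicode_match = re.fullmatch(r"\w+", word) is not None

print(ascii_match, unicode_match)
```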

Whether LIKE can use an index

postgres=# CREATE TABLE t_sort (
postgres(#     a text COLLATE "zh_CN",
postgres(#     b text COLLATE "C");
CREATE TABLE
postgres=# INSERT INTO t_sort SELECT md5(n::text), md5(n::text)
postgres-#     FROM generate_series(1, 1000000) n;
INSERT 0 1000000
postgres=# CREATE INDEX ON t_sort USING btree (a);
CREATE INDEX
postgres=# CREATE INDEX ON t_sort USING btree (b);
CREATE INDEX
postgres=# ANALYZE t_sort;
ANALYZE
postgres=# SELECT * FROM t_sort LIMIT 2;
                a                 |                b
----------------------------------+----------------------------------
 c4ca4238a0b923820dcc509a6f75849b | c4ca4238a0b923820dcc509a6f75849b
 c81e728d9d4c2f636f067f89cc14862c | c81e728d9d4c2f636f067f89cc14862c
(2 rows)

postgres=# explain SELECT * FROM t_sort WHERE a LIKE 'c4ca4238a0%';
                                QUERY PLAN
---------------------------------------------------------------------------
 Gather  (cost=1000.00..18564.33 rows=100 width=66)
   Workers Planned: 2
   ->  Parallel Seq Scan on t_sort  (cost=0.00..17554.33 rows=42 width=66)
         Filter: (a ~~ 'c4ca4238a0%'::text)
(4 rows)

postgres=# explain SELECT * FROM t_sort WHERE b LIKE 'c4ca4238a0%';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using t_sort_b_idx on t_sort  (cost=0.42..8.45 rows=100 width=66)
   Index Cond: ((b >= 'c4ca4238a0'::text) AND (b < 'c4ca4238a1'::text))
   Filter: (b ~~ 'c4ca4238a0%'::text)
(3 rows)

With zh_CN the index cannot be used, but with C it can.
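The Index Cond in the second plan shows the trick involved: the prefix pattern is turned into a range whose upper bound is the prefix with its last character incremented. The Python sketch below imitates that rewrite under byte-wise ordering, with a sorted list standing in for the btree index. The names prefix_range, index, and matches are hypothetical, and the code ignores the edge case where the last character cannot be incremented.

```python
import bisect


def prefix_range(prefix):
    """Return (lower, upper) so that, in byte-wise order, every string
    that starts with prefix satisfies lower <= s < upper."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    # Simplification: assumes the last character is not the maximum code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


lower, upper = prefix_range("c4ca4238a0")

# A sorted list plays the role of an index kept in byte ("C") order.
index = sorted([
    "c4ca4238a0b9", "c4ca4238a1ff", "c4ca4238a0",
    "c81e728d9d4c", "c4ca42389fff",
])

# Two binary searches locate the contiguous slice of matching entries.
start = bisect.bisect_left(index, lower)
stop = bisect.bisect_left(index, upper)
matches = index[start:stop]
print(lower, upper, matches)
```

The rewrite is only safe when the index order agrees with plain character-by-character comparison, so that all strings sharing a prefix sit next to each other. A locale-aware collation such as zh_CN weighs characters differently, which is presumably why the planner falls back to a sequential scan on column a.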

Origin blog.51cto.com/14337216/2413844