Viewing and Modifying Oracle Character Sets

1. What is the Oracle character set?

       An Oracle character set is a set of symbols used to interpret byte data. Oracle's National Language Support (NLS) architecture lets you store, process, and retrieve data in localized languages. It automatically adapts database utilities, error messages, sort orders, dates, times, currencies, numbers, and calendars to the local language and platform.



The most important parameter affecting how Oracle handles character sets on the client side is NLS_LANG.

It has the following format: NLS_LANG = language_territory.charset

It has three components (language, territory, and charset), each of which controls the characteristics of a subset of NLS.

where:

Language: specifies the language of server messages, for example whether prompts appear in Chinese or English.

Territory: specifies the server's default date and number formats.

Charset: specifies the character set.

For example: AMERICAN_AMERICA.ZHS16GBK

From the composition of NLS_LANG, we can see that only the third part, charset, affects character set handling.

Therefore, as long as the character sets of the two databases are the same and match this third part, data can be exported from one and imported into the other; the first two parts only determine things such as whether messages appear in Chinese or English.
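To make the three components concrete, here is a minimal Python sketch (not an Oracle tool) that splits an NLS_LANG value and compares only the charset part, which is the part that matters for exp/imp compatibility as described above. `parse_nls_lang` and `same_charset` are hypothetical helper names; the parser assumes the language_territory.charset form and does not validate names against Oracle's lists.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split an NLS_LANG value of the form language_territory.charset."""
    # A value copied from a shell line such as "simplified chinese"_china.zhs16gbk
    # may still contain the quotes, so drop them first.
    cleaned = value.replace('"', "").strip()
    # The charset follows the last '.', the territory follows the first '_'.
    left, dot, charset = cleaned.rpartition(".")
    if not dot:  # no '.' means no charset component was given
        left, charset = cleaned, ""
    language, _, territory = left.partition("_")
    return NlsLang(language.strip().upper(), territory.strip().upper(), charset.strip().upper())


def same_charset(a: str, b: str) -> bool:
    """True when two NLS_LANG values name the same character set (case-insensitive)."""
    return parse_nls_lang(a).charset == parse_nls_lang(b).charset


if __name__ == "__main__":
    print(parse_nls_lang("AMERICAN_AMERICA.ZHS16GBK"))
    print(same_charset("AMERICAN_AMERICA.ZHS16GBK", "SIMPLIFIED CHINESE_CHINA.ZHS16GBK"))  # True
```

The language and territory are deliberately ignored by `same_charset`, mirroring the point that only the charset component decides whether conversion happens.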



2. Character set background

2.1 Character sets
    A character set essentially assigns a distinct numeric code to each symbol in a specific set according to a character encoding scheme. The earliest encoding scheme supported by Oracle Database is US7ASCII.
    Oracle character set names follow this rule:
    <language><bit size><encoding>
    For example, ZHS16GBK is the Simplified Chinese (ZHS) character set that uses GBK encoding and up to 16 bits (two bytes) per character.
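The naming rule can be checked mechanically. The sketch below is an illustration only: the regular expression is an assumption that captures the common <language><bit size><encoding> pattern, and names that break the convention (such as UTF8) simply return None. `split_charset_name` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# <language><bit size><encoding>, e.g. ZHS + 16 + GBK
_NAME_RULE = re.compile(r"^([A-Z]+)(\d+)([A-Z0-9]+)$")


def split_charset_name(name: str) -> Optional[Tuple[str, int, str]]:
    """Split an Oracle character set name according to the naming convention.

    Returns None for names that do not follow the convention, for example UTF8.
    """
    match = _NAME_RULE.match(name.upper())
    if match is None:
        return None
    language, bits, encoding = match.groups()
    return language, int(bits), encoding


for n in ("ZHS16GBK", "US7ASCII", "WE8ISO8859P1", "AL32UTF8", "UTF8"):
    print(n, split_charset_name(n))
```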
 
2.2 Character encoding scheme


2.2.1 Single-byte encoding
    (1) A single-byte 7-bit character set can define 128 characters; the most commonly used one is US7ASCII.
    (2) A single-byte 8-bit character set can define 256 characters and suits most European languages, for example:
             WE8ISO8859P1 (Western European, 8-bit, ISO 8859-1 encoding)
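The difference between the 7-bit and 8-bit ranges is easy to see with Python's built-in codecs. This sketch assumes Python's `ascii` and `latin-1` codecs behave like US7ASCII and WE8ISO8859P1 for these characters; inside the database, Oracle's own conversion tables are what actually apply.

```python
def encoded_size(text: str, codec: str):
    """Return the number of bytes text needs in codec, or None if it cannot be encoded."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None


# 7 bits give 2**7 = 128 code values; 8 bits give 2**8 = 256
print(2 ** 7, 2 ** 8)                    # 128 256
print(encoded_size("cafe", "ascii"))     # 4: plain ASCII fits in 7 bits
print(encoded_size("café", "ascii"))     # None: 'é' is outside the 7-bit range
print(encoded_size("café", "latin-1"))   # 4: 'é' is the single byte 0xE9 in ISO 8859-1
```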



2.2.2 Multi-byte encoding
    (1) Variable-length multi-byte encoding
    Some characters are represented by one byte and others by two or more bytes. Variable-length multi-byte encodings are often used for Asian languages such as Japanese, Chinese, and Hindi.
    For example: AL32UTF8 (AL stands for ALL, meaning it applies to all languages) and ZHS16CGB231280.
    (2) Fixed-length multi-byte encoding
    Every character is encoded with the same number of bytes. Currently the only fixed-length multi-byte encoding Oracle supports is AL16UTF16, and it can only be used as the national character set.

2.2.3 Unicode encoding
    Unicode is a single encoding scheme that covers all known characters in use around the world, giving every character a unique code. UTF-16 is the 16-bit Unicode encoding form; Oracle treats it as a fixed-length multi-byte encoding that uses 2 bytes per character (characters outside the Basic Multilingual Plane take a 4-byte surrogate pair). AL16UTF16 is a UTF-16 character set.
    UTF-8 is the 8-bit Unicode encoding form and is a variable-length multi-byte encoding: a character takes 1, 2, or 3 bytes, and AL32UTF8 also uses 4 bytes for supplementary characters outside the Basic Multilingual Plane. AL32UTF8, UTF8, and UTFE are UTF-8 character sets.
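The variable-length versus fixed-length behaviour can be shown by counting bytes per character. A minimal sketch using Python's `utf-8` and `utf-16-be` codecs (big-endian without a byte order mark, an assumption chosen so the count is not inflated by a BOM); all three sample characters are in the Basic Multilingual Plane.

```python
def byte_lengths(ch: str) -> tuple:
    """Bytes needed for one character in UTF-8 and in UTF-16 (big-endian, no BOM)."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


print(byte_lengths("A"))   # (1, 2)
print(byte_lengths("é"))   # (2, 2)
print(byte_lengths("中"))  # (3, 2)
```

UTF-8 grows from 1 to 3 bytes as the code point increases, while UTF-16 stays at 2 bytes for all of them.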
 
2.3 Character set supersets and subsets
    If the code values of one character set (A) include all the code values of another character set (B), and each shared code value represents the same character in both, then A is a superset of B, and B is a subset of A.
    The Oracle8i and Oracle9i documentation lists subset/superset pairs; for example, WE8ISO8859P1 is a subset of WE8MSWIN1252. Because US7ASCII is Oracle's earliest encoding, many character sets are supersets of it, for example WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK.
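The superset definition above can be tested directly for single-byte code values. This is a partial sketch: it only examines byte values 0 to 255, so for multi-byte character sets it says nothing about longer codes. Python codec names stand in for Oracle character sets (`ascii` for US7ASCII, `latin-1` for WE8ISO8859P1, `gbk` for ZHS16GBK, `utf-8` for AL32UTF8), which is an approximation; Oracle's own subset/superset lists remain authoritative.

```python
def is_single_byte_superset(superset: str, subset: str) -> bool:
    """Check the superset definition for every single-byte code value.

    superset contains subset when every code value defined in subset is also
    defined in superset and decodes to the same character.
    """
    for value in range(256):
        code = bytes([value])
        try:
            char_in_subset = code.decode(subset)
        except UnicodeDecodeError:
            continue  # not a code value of the subset, so it does not matter
        try:
            if code.decode(superset) != char_in_subset:
                return False  # same code value, different character
        except UnicodeDecodeError:
            return False  # code value missing from the superset
    return True


print(is_single_byte_superset("latin-1", "ascii"))  # True
print(is_single_byte_superset("ascii", "latin-1"))  # False
```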
 
2.4 Database character set (Oracle server-side character set)
    The database character set is specified when the database is created and usually cannot be changed afterwards. When creating a database, you can specify both the character set (CHARACTER SET) and the national character set (NATIONAL CHARACTER SET).



2.4.1 Character set
    (1) Used to store CHAR, VARCHAR2, CLOB, LONG, and similar data types
    (2) Used for identifiers such as table names, column names, and PL/SQL variables
    (3) Used to store SQL and PL/SQL program units



2.4.2 National character set
    (1) Used to store NCHAR, NVARCHAR2, NCLOB, and similar data types
    (2) The national character set is essentially an additional, alternative character set for Oracle. Its main purpose is to extend Oracle's character handling: the NCHAR data types can use a fixed-length multi-byte encoding for Asian languages, which the database character set cannot. The national character set was redefined in Oracle9i and can only be one of the Unicode character sets AL16UTF16 or UTF8; the default is AL16UTF16.



2.4.3 Querying the character set parameters
    You can check the character set settings by querying the following data dictionary views or tables:
    nls_database_parameters, props$, v$nls_parameters
    In the query result, NLS_CHARACTERSET is the database character set and NLS_NCHAR_CHARACTERSET is the national character set.
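When the query is run from a script, only two of the returned parameters matter for character sets. The sketch below filters (parameter, value) rows such as those returned by `SELECT parameter, value FROM nls_database_parameters`; `charset_settings` is a hypothetical helper, and the sample rows are illustrative values, not output captured from a real database.

```python
from typing import Dict, Iterable, Tuple

# Parameter names as they appear in NLS_DATABASE_PARAMETERS.
CHARSET_PARAMETERS = {
    "NLS_CHARACTERSET": "database character set",
    "NLS_NCHAR_CHARACTERSET": "national character set",
}


def charset_settings(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Keep only the character set entries from (parameter, value) rows.

    The rows are assumed to come from:
        SELECT parameter, value FROM nls_database_parameters
    """
    return {CHARSET_PARAMETERS[p]: v for p, v in rows if p in CHARSET_PARAMETERS}


# Sample rows for illustration only.
sample_rows = [
    ("NLS_LANGUAGE", "AMERICAN"),
    ("NLS_CHARACTERSET", "ZHS16GBK"),
    ("NLS_NCHAR_CHARACTERSET", "AL16UTF16"),
]
print(charset_settings(sample_rows))
```

With the python-oracledb driver, the rows could come from `cursor.execute(...)` followed by `cursor.fetchall()`; connection details are left out here.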



2.4.4 Modifying the database character set
    As mentioned above, in principle the database character set cannot be changed after the database is created. There are, however, two ways to do it.

1. The usual approach is to export the data, re-create the database with the new character set, and then import the data, which converts it.

2. Use the ALTER DATABASE CHARACTER SET statement. This is restricted: the database character set can only be changed this way when the new character set is a superset of the current one. For example, UTF8 is a superset of US7ASCII, so ALTER DATABASE CHARACTER SET UTF8 can be used to change a US7ASCII database.
 
2.5 Client character set (NLS_LANG parameter)


2.5.1 What the client character set means
    The client character set defines how the client encodes character data: any character data sent from or to the client is encoded in the character set defined for that client. A client is any application that connects directly to the database, such as SQL*Plus or exp/imp. The client character set is set through the NLS_LANG parameter.



2.5.2 NLS_LANG parameter format
    NLS_LANG=<language>_<territory>.<client character set>
    Language: the language of Oracle messages, sorting, and day and month names
    Territory: the default date, number, currency, and similar formats
    Client character set: the character set used by the client
    For example: NLS_LANG=AMERICAN_AMERICA.US7ASCII
    AMERICAN is the language, AMERICA is the territory, and US7ASCII is the client character set



2.5.3 Client character set setting method
     1) UNIX environment
         $ NLS_LANG="simplified chinese"_china.zhs16gbk
         $ export NLS_LANG
         or put these lines in the oracle user's profile file
    2) Windows environment
         Edit the registry:
         Regedit.exe ---> HKEY_LOCAL_MACHINE ---> SOFTWARE ---> ORACLE ---> <key of your Oracle home>, then set the NLS_LANG value there



2.5.4 Querying NLS parameters
    Oracle provides a number of NLS parameters, such as NLS_LANGUAGE, NLS_DATE_FORMAT, and NLS_CALENDAR, that let the database and client machines adapt to local formats. They can be viewed by querying the following data dictionary and v$ views:
    NLS_DATABASE_PARAMETERS: displays the current NLS parameter values of the database, including the database character set
    NLS_INSTANCE_PARAMETERS: displays the NLS parameters defined in the parameter file init<SID>.ora
    V$NLS_PARAMETERS: displays the NLS parameter values currently in effect

2.5.5 Modifying NLS parameters
    NLS parameters can be modified in the following ways:
    (1) Modify the initialization parameter file used when the instance starts
    (2) Modify the environment variable NLS_LANG
    (3) Use the ALTER SESSION statement within an Oracle session
    (4) Use certain SQL functions
    NLS precedence: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default
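The precedence order amounts to "the highest-priority source that sets the parameter wins". A minimal sketch of that rule; the source labels and the sample NLS_DATE_FORMAT values are hypothetical names chosen for illustration, not Oracle identifiers.

```python
from typing import Mapping, Optional

# Sources ordered from highest to lowest priority, following the precedence list above.
PRECEDENCE = (
    "sql_function",
    "alter_session",
    "environment_or_registry",
    "parameter_file",
    "database_default",
)


def effective_value(settings: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the value from the highest-priority source that sets the parameter."""
    for source in PRECEDENCE:
        value = settings.get(source)
        if value is not None:
            return value
    return None


nls_date_format = {
    "database_default": "DD-MON-RR",
    "environment_or_registry": "YYYY-MM-DD",
    "alter_session": "DD/MM/YYYY",
}
print(effective_value(nls_date_format))  # DD/MM/YYYY: ALTER SESSION beats the environment
```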



3. EXP/IMP and character sets

3.1 EXP/IMP
    Export and Import are a pair of tools for reading and writing Oracle data: Export writes data from an Oracle database to operating system files, and Import reads the data in those files back into an Oracle database. When exp/imp is used to migrate data from a source database to a target database, four points along the way involve character sets. If the character sets at these points are inconsistent, character set conversion occurs.
EXP
     [export file] <- [environment variable NLS_LANG] <- [source database character set]

IMP
     [import file] -> [environment variable NLS_LANG] -> [target database character set]

The four character sets are:
   (1) The source database character set
   (2) User session character set in the Export process (set by NLS_LANG)
   (3) User session character set in the Import process (set by NLS_LANG)
   (4) Target database character set
 
3.2 Conversion during export
    During export, if the source database character set differs from the character set of the Export user session, character set conversion occurs, and the ID of the Export session character set is stored in the first few bytes of the export file. Data may be lost during this conversion.

Example: the source database uses ZHS16GBK and the Export session uses US7ASCII. ZHS16GBK is a 16-bit character set and US7ASCII is a 7-bit one, so Chinese characters have no equivalents in US7ASCII. All of them are lost and become "??", and the resulting dmp file has already lost data.
Therefore, to export the source data correctly, the Export session character set should be the same as the source database character set or a superset of it.
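The loss in this example can be reproduced outside Oracle. This sketch uses Python's `gbk` and `ascii` codecs as stand-ins for ZHS16GBK and US7ASCII and replaces unmappable characters with '?'; Oracle's conversion tables and replacement character may differ, so it only models the idea. `convert` is a hypothetical helper.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from one character set to another.

    Characters with no equivalent in the target are replaced by '?'.
    """
    return data.decode(source).encode(target, errors="replace")


db_bytes = "中文".encode("gbk")                      # text as stored in a GBK database
print(convert(db_bytes, "gbk", "ascii"))             # b'??': the Chinese text is gone
print(convert(db_bytes, "gbk", "gbk") == db_bytes)   # True: same character set, nothing lost
```

Once the bytes have become '?', no later step can recover the original characters, which is why the dmp file itself is already damaged.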
 
3.3 Conversion during import
    (1) Determine the character set environment of the export
             Read the export file header to get the character set of the export file.
    (2) Determine the character set of the import session, that is, the NLS_LANG environment variable used by the import session.
    (3) IMP reads the export file
             It reads the export file's character set ID and compares it with the NLS_LANG of the import process.
    (4) If the export file's character set is the same as the import session's, no conversion is needed in this step. If they differ, the data must be converted to the character set used by the import session. So importing data into the database can involve two character set conversions:

    The first is between the export file character set and the import session character set. If this conversion cannot be completed correctly, the import into the target database cannot be completed.
    The second is between the import session character set and the target database character set.
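The two conversions can be modelled as a small pipeline. In this sketch Python codecs stand in for Oracle character sets; step 1 raises an error when the file data cannot be represented in the session character set (standing in for "the import cannot be completed"), and step 2 replacing unmappable characters with '?' is a simplifying assumption. `simulate_import` is a hypothetical name.

```python
def simulate_import(dump: bytes, file_cs: str, session_cs: str, db_cs: str) -> str:
    """Model the two conversions IMP performs; return the text stored in the target database."""
    # Conversion 1: export-file character set -> import session character set (NLS_LANG).
    # Raises UnicodeEncodeError if the session character set cannot hold the data.
    session_bytes = dump.decode(file_cs).encode(session_cs)
    # Conversion 2: import session character set -> target database character set.
    db_bytes = session_bytes.decode(session_cs).encode(db_cs, errors="replace")
    return db_bytes.decode(db_cs)


dump = "中文".encode("gbk")  # dmp written by a GBK export session
print(simulate_import(dump, "gbk", "gbk", "utf-8"))  # both conversions succeed
print(simulate_import(dump, "gbk", "gbk", "ascii"))  # lost in the second conversion
```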



4. Checking the character sets

Three character sets are involved:

1. The character set of the Oracle server;

2. The character set of the Oracle client;

3. The character set of the dmp file.

When importing data, these three character sets need to be consistent for the import to work correctly.



4.1 Querying the character set of the Oracle server

There are many ways to find the Oracle server character set. A fairly intuitive one is:

SQL> select userenv('language') from dual;

USERENV('LANGUAGE')
----------------------------------------------------
SIMPLIFIED CHINESE_CHINA.ZHS16GBK

SQL> select userenv('language') from dual;

AMERICAN_AMERICA.ZHS16GBK

The two results differ only in language and territory, which come from each session; the charset part is the database character set.



4.2 Querying the character set of a dmp file

A dmp file exported with Oracle's exp tool also contains character set information: the 2nd and 3rd bytes of the file record its character set. If the dmp file is small (a few MB or a few tens of MB), you can open it in hexadecimal mode with UltraEdit, read the 2nd and 3rd bytes (for example 0354), and then use the following SQL to find the corresponding character set:

SQL> select nls_charset_name(to_number('0354','xxxx')) from dual;

ZHS16GBK



If the dmp file is large, for example more than 2 GB (the most common situation), opening it in a text editor is very slow or impossible. In that case, use the following command on a Unix host:

cat exp.dmp |od -x|head -1|awk '{print $2 $3}'|cut -c 3-6

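Then use the SQL above to look up the corresponding character set. Note that od -x prints 16-bit words in the host's byte order, so this pipeline returns bytes 2 and 3 only on big-endian hosts (for example AIX or Solaris on SPARC); on little-endian hosts such as Linux on x86, dump single bytes instead, for example with od -An -tx1 -N3 exp.dmp.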
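A short program that reads only the first three bytes avoids both the slow editor and any dependence on how a dump tool groups bytes. A minimal Python sketch, assuming (as this article states) that bytes 2 and 3 of an exp dump hold the character set ID; `dmp_charset_hex` is a hypothetical name. The decimal value is what `nls_charset_name` expects, for example 0x0354 = 852 for ZHS16GBK.

```python
import sys


def dmp_charset_hex(path: str) -> str:
    """Return the 2nd and 3rd bytes of a dmp file as four hex digits, e.g. '0354'."""
    with open(path, "rb") as f:
        header = f.read(3)  # only three bytes are read, so a multi-GB file is fine
    if len(header) < 3:
        raise ValueError(f"{path} is too short to be an exp dump file")
    return header[1:3].hex()


if __name__ == "__main__" and len(sys.argv) > 1:
    code = dmp_charset_hex(sys.argv[1])
    # Decimal ID for: select nls_charset_name(<id>) from dual;
    print(code, int(code, 16))
```

Usage: `python dmp_charset.py exp.dmp` (the script name is arbitrary).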



4.3 Querying the character set of the Oracle client

On Windows, it is the NLS_LANG value of the corresponding Oracle home in the registry. You can also set it yourself in a command prompt window, for example:

set nls_lang=AMERICAN_AMERICA.ZHS16GBK

This only affects the environment variables in this window.



Under the unix platform, it is the environment variable NLS_LANG.

$ echo $NLS_LANG

AMERICAN_AMERICA.ZHS16GBK
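When exp, imp, or SQL*Plus is launched from a script, the same per-window scoping can be reproduced by giving only the child process a modified environment. A minimal Python sketch; `env_with_nls_lang` is a hypothetical helper, not part of any Oracle tool.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_nls_lang(nls_lang: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an environment with NLS_LANG set.

    Like `set nls_lang=...` in one command window, the change applies only to
    programs started with the returned environment, not to this process.
    """
    env = dict(os.environ if base is None else base)
    env["NLS_LANG"] = nls_lang
    return env


child_env = env_with_nls_lang("AMERICAN_AMERICA.ZHS16GBK")
print(child_env["NLS_LANG"])  # AMERICAN_AMERICA.ZHS16GBK
```

A child process could then be started with `subprocess.run([...], env=child_env)`; the exact exp or sqlplus command line is left out here.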



If the check shows that the server and client character sets differ, change the client character set to match the server.



Supplement:

(1) Database server character set

select * from nls_database_parameters

This view is based on props$ and shows the database character set.



(2) Instance character set environment

select * from nls_instance_parameters

This view is derived from v$parameter and shows the NLS settings of the instance, which come from the initialization parameter file.



(3) Session character set environment

select * from nls_session_parameters

This view is based on v$nls_parameters and shows the session's own settings, which may come from the session's environment variables or from ALTER SESSION. If the session has no special settings, it matches nls_instance_parameters.



(4) The client character set must match the server character set for non-ASCII characters in the database to display correctly.

If several settings exist, NLS precedence is: SQL function > ALTER SESSION > environment variable or registry > parameter file > database default.

The character sets must be the same, but the language settings can differ; English is recommended for the language. If the character set is ZHS16GBK, NLS_LANG can be AMERICAN_AMERICA.ZHS16GBK.





5. Modifying the Oracle character set

As mentioned above, in principle the database character set cannot be changed after the database is created, so it is important to decide which character set to use at design and installation time. On the database server, changing the character set incorrectly can have many unpredictable consequences and may seriously affect normal operation. Before changing it, be sure to confirm that the two character sets have a subset/superset relationship. Generally, unless absolutely necessary, we do not recommend changing the character set on the Oracle server side. In particular, the two most commonly used character sets, ZHS16GBK and ZHS16CGB231280, have no subset/superset relationship, so in theory conversion between them is not supported.



However, there are two possible ways to change the character set.

1. The usual approach is to export the data, re-create the database, and then import the data, which converts it.

2. Use the ALTER DATABASE CHARACTER SET statement. This is restricted: the database character set can only be changed this way when the new character set is a superset of the current one. For example, UTF8 is a superset of US7ASCII, so ALTER DATABASE CHARACTER SET UTF8 can be used to change a US7ASCII database.




5.1 Modify the server side character set (not recommended)



1. Shut down the database

SQL>SHUTDOWN IMMEDIATE



2. Start up in MOUNT mode

SQL>STARTUP MOUNT;

SQL>ALTER SYSTEM ENABLE RESTRICTED SESSION;

SQL>ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;

SQL>ALTER SYSTEM SET AQ_TM_PROCESSES=0;

SQL>ALTER DATABASE OPEN;

-- Run only one of the following two statements.
-- Without INTERNAL_USE, the change succeeds only if the new character set is a strict superset of the current one:
SQL>ALTER DATABASE CHARACTER SET ZHS16GBK;

-- Otherwise, the INTERNAL_USE keyword skips the subset/superset check:
SQL>ALTER DATABASE CHARACTER SET INTERNAL_USE AL32UTF8;

-- Note: ALTER DATABASE NATIONAL CHARACTER SET accepts only AL16UTF16 or UTF8,
-- so the national character set cannot be set to ZHS16GBK or AL32UTF8.





SQL>SHUTDOWN IMMEDIATE;



SQL>STARTUP

Note: if there are no large objects (LOBs), changing the character set this way has little effect on normal use. (The character set you specify must be one Oracle supports; otherwise the database cannot start.) Proceed as above.



If the error 'ORA-12717: Cannot ALTER DATABASE NATIONAL CHARACTER SET when NCLOB data exists' appears,

there are two ways to solve it:

1. Use the INTERNAL_USE keyword to change the national character set;

2. Re-create the database. Re-creating is rather complicated, so use INTERNAL_USE:



SQL>SHUTDOWN IMMEDIATE;

SQL>STARTUP MOUNT EXCLUSIVE;

SQL>ALTER SYSTEM ENABLE RESTRICTED SESSION;

SQL>ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;

SQL>ALTER SYSTEM SET AQ_TM_PROCESSES=0;

SQL>ALTER DATABASE OPEN;

SQL>ALTER DATABASE NATIONAL CHARACTER SET INTERNAL_USE UTF8;

SQL> SHUTDOWN immediate;

SQL>startup;

After these steps, the national character set is changed without problems.



5.2 Modifying the character set of a dmp file

As mentioned above, the 2nd and 3rd bytes of a dmp file record its character set, so modifying those two bytes directly can 'deceive' Oracle's check. In theory this should only be done from a subset to a superset, but in many cases it also works between character sets with no subset/superset relationship. Commonly used character sets such as US7ASCII, WE8ISO8859P1, ZHS16CGB231280, and ZHS16GBK can basically all be changed this way. Because only the dmp file is changed, the impact is small.



There are many ways to make the change; the simplest is to edit the 2nd and 3rd bytes of the dmp file directly in UltraEdit.

For example, to change the character set of a dmp file to ZHS16GBK, first find the hexadecimal code of that character set with the following SQL:

SQL> select to_char(nls_charset_id('ZHS16GBK'), 'xxxx') from dual;

0354

Then change the 2nd and 3rd bytes of the dmp file to 03 54.

If the dmp file is too large to open in UltraEdit, you need to modify it with a program.
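A program only has to overwrite two bytes in place, so file size does not matter. A minimal Python sketch, assuming the 2nd and 3rd bytes hold the character set ID in big-endian order (0x0354 written as the bytes 03 54, matching the example above); `set_dmp_charset` is a hypothetical name. Work on a copy of the dump.

```python
import struct


def set_dmp_charset(path: str, charset_id: int) -> bytes:
    """Overwrite the 2nd and 3rd bytes of a dmp file with charset_id.

    Returns the two bytes that were replaced. Only two bytes are written in
    place, so even a multi-gigabyte file is never loaded into memory.
    """
    new_bytes = struct.pack(">H", charset_id)  # big-endian: 0x0354 -> b'\x03\x54'
    with open(path, "r+b") as f:
        f.seek(1)                  # the 2nd byte is at offset 1
        old_bytes = f.read(2)
        if len(old_bytes) != 2:
            raise ValueError(f"{path} is too short to be an exp dump file")
        f.seek(1)                  # seek again before switching from reading to writing
        f.write(new_bytes)
    return old_bytes


# Example: set_dmp_charset("exp.dmp", int("0354", 16))  # 0x0354 == 852
```

Returning the old bytes makes it easy to record the original value and restore it if the import still fails.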



5.3 Client character set setting method
     1) UNIX environment
         $ NLS_LANG="simplified chinese"_china.zhs16gbk
         $ export NLS_LANG
         or put these lines in the oracle user's profile
    2) Windows environment
         Edit the registry:
         Regedit.exe ---> HKEY_LOCAL_MACHINE ---> SOFTWARE ---> ORACLE ---> <key of your Oracle home>, then set the NLS_LANG value there

or set it in a command prompt window:

        set nls_lang=AMERICAN_AMERICA.ZHS16GBK
