Is it normal for a MySQL create table statement to include redundant collation declarations for every char, varchar, and text column?

Leo Galleguillos :

When running SHOW CREATE TABLE `my_table`;, I notice that COLLATE utf8mb4_unicode_ci is shown for every char, varchar, and text column in the table. This seems a bit redundant since the collation is already declared in the table_option portion of the create statement.

mysql> SHOW CREATE TABLE `my_table`;
| Table    | Create Table
| my_table | CREATE TABLE `my_table` (
...
  `char_col_1` char(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `varchar_col_1` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `varchar_col_2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `varchar_col_3` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `text_col_1` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
...
) ENGINE=InnoDB AUTO_INCREMENT=1816178 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

This behavior appears in both MySQL 5.7 and MySQL 8.0, and therefore most likely in other versions as well.

Is this behavior normal and acceptable, or is it a symptom of something that is misconfigured either with the table, database, or MySQL instance?

On the other hand, since collation can be individually set for any specific column, perhaps it is better to explicitly display the collation for every column to avoid any ambiguity or assumptions, even in cases where the collation of the column matches the collation of the table?

Rick James :

You have touched only the tip of the iceberg.

  • I think the settings on the table are just defaults for columns that are defined without charset or collate.
  • Ditto for ALTER TABLE ADD COLUMN -- will inherit from the table defaults.
  • I think that the column settings are put into the information_schema.COLUMNS table and that they won't change without an ALTER TABLE .. MODIFY COLUMN .. (changing only the table defaults leaves existing columns as they are).

Similarly, the table charset and collation are inherited from the database definition and are frozen when the table is created.
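
The inheritance chain above can be checked on a scratch server with something like the following. This is only a sketch: the names demo_db and t are made up for illustration, and the collations are just examples.

```sql
-- Hypothetical scratch database; the charset and collation set here are only defaults for new tables.
CREATE DATABASE demo_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE demo_db;

-- No table options: the table picks up the database defaults.
-- Column `a` has no charset/collation, so it picks up the table defaults.
-- Column `b` declares its own, so it ignores them.
CREATE TABLE t (
  a VARCHAR(20),
  b VARCHAR(20) CHARACTER SET latin1 COLLATE latin1_general_ci
);

-- Change only the table default. Existing columns are not converted.
ALTER TABLE t DEFAULT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

-- A column added afterwards inherits the new table default.
ALTER TABLE t ADD COLUMN c VARCHAR(20);

-- Inspect what each column actually ended up with.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'demo_db' AND TABLE_NAME = 't'
ORDER BY ORDINAL_POSITION;

-- To change an existing column, name it explicitly ...
ALTER TABLE t MODIFY COLUMN b VARCHAR(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
-- ... or convert every string column (and the table default) in one go.
ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

At the point of the SELECT, `a` should still be utf8mb4 / utf8mb4_unicode_ci, `b` latin1 / latin1_general_ci, and `c` latin1 / latin1_swedish_ci. Running SHOW CREATE TABLE t at that point is also a handy way to see when the per-column clauses are needed to describe the table unambiguously.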

About defaults:

  • The old default charset (through MySQL 5.7) was latin1.
  • The current default (since MySQL 8.0) is utf8mb4; this is unlikely to change.
  • Every collation applies to exactly one charset, and the charset name is the beginning of the collation name.
  • Each charset has exactly one "default" collation: latin1_swedish_ci for latin1, utf8_general_ci for utf8, utf8mb4_0900_ai_ci for utf8mb4 (in 8.0), etc.
  • That default collation (for a given charset) has rarely, if ever, changed. The one change I know of is for utf8mb4: utf8mb4_general_ci in 5.7, utf8mb4_0900_ai_ci in 8.0.
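
Rather than trusting anyone's memory of the defaults, you can ask the server you are actually running. A minimal sketch; the three charsets in the IN list are just the ones discussed here.

```sql
-- Default collation for each charset, as this server version defines it.
SELECT CHARACTER_SET_NAME, DEFAULT_COLLATE_NAME
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('latin1', 'utf8mb4', 'ascii');

-- All collations for one charset; IS_DEFAULT marks the default one.
SELECT COLLATION_NAME, CHARACTER_SET_NAME, IS_DEFAULT
FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4'
ORDER BY COLLATION_NAME;

-- Server-wide defaults that a new database inherits when none are given.
SELECT @@character_set_server, @@collation_server;
```

Running the same queries on 5.7 and on 8.0 is the quickest way to see which defaults differ between versions.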

(The more I experiment, the less certain I am about all this.)

Best practice: Always explicitly set CHARSET and COLLATE for each string column.

Secondary considerations:

  • Use utf8mb4, if available, for most string columns (VARCHAR / TEXT).
  • Use the latest available collation (Unicode keeps improving it); currently utf8mb4_0900_ai_ci in MySQL 8.0.
  • Use ascii for things that are clearly only ascii -- country-code, postal_code, hex, etc. Mostly these can use CHAR(..)
  • Use ascii_general_ci or ascii_bin, depending on whether you need case folding.
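
Put together, a table following these suggestions might look like the sketch below. The table and column names are invented for illustration, and utf8mb4_0900_ai_ci assumes MySQL 8.0; on 5.7 you would have to substitute a collation that exists there, such as utf8mb4_unicode_520_ci.

```sql
-- Hypothetical table: every string column states its charset and collation explicitly.
CREATE TABLE customer_demo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  -- Free-form human text: utf8mb4 with the newest collation available.
  name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  notes TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci,
  -- Clearly ascii-only values: case-insensitive where 'us' should match 'US' ...
  country_code CHAR(2) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL,
  -- ... and binary where an exact byte-for-byte match is wanted.
  md5_hex CHAR(32) CHARACTER SET ascii COLLATE ascii_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
```

With the columns spelled out like this, a later change to the table or database defaults cannot silently affect how new columns compare with the old ones.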

