MySQL Optimization-Best Practices-Field Types

I. Introduction

MySQL supports many data types, but most developers have only a hazy understanding of them, because many types are mutually compatible and the data appears to be stored just fine. So, without a second thought, they use varchar for every string and int for every integer. Some developers even go all-in on string types for every column. Don't laugh; I have done such silly things myself.

As the first part of this best-practices series, we naturally start with field types. Choosing the right type at the right time goes a long way toward keeping later maintenance costs down.

Here are a few questions to think about first; I will answer them at the end of the article.

  • To store a time, how do you choose among timestamp, datetime, and int?
  • How should you choose a primary key?
  • How should you store enumerated values used in the business?
  • How should you store an IP address?

II. What field types does MySQL support?

Before that, we need to know what types MySQL offers. For now, let's divide them into three categories: numeric types, date and time types, and string types.

1. Numeric types
Field Type | Byte size | Range (signed) | Range (unsigned)
TINYINT | 1 byte | (-128, 127) | (0, 255)
SMALLINT | 2 bytes | (-32 768, 32 767) | (0, 65 535)
MEDIUMINT | 3 bytes | (-8 388 608, 8 388 607) | (0, 16 777 215)
INT | 4 bytes | (-2^31, 2^31 - 1) | (0, 2^32 - 1)
BIGINT | 8 bytes | (-2^63, 2^63 - 1) | (0, 2^64 - 1)
FLOAT | 4 bytes | (-3.402823466E+38, -1.175494351E-38), 0, (1.175494351E-38, 3.402823466E+38) | 0, (1.175494351E-38, 3.402823466E+38)
DOUBLE | 8 bytes | (-1.7976931348623157E+308, -2.2250738585072014E-308), 0, (2.2250738585072014E-308, 1.7976931348623157E+308) | 0, (2.2250738585072014E-308, 1.7976931348623157E+308)
DECIMAL | For DECIMAL(M,D): roughly 4 bytes per 9 digits since MySQL 5.0.3 (in older versions, M+2 bytes if M>D, otherwise D+2) | Depends on M and D | Depends on M and D

First, let's talk about integers.
Most developers use integers a great deal. The family consists of TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT. They differ only in the range of values they can store, as the table above shows.

A byte is eight bits, and each bit is either 0 or 1, so the value range follows directly from the number of bytes:

  • Signed integers (the first bit is the sign bit, distinguishing positive from negative) range from -2^(number of bytes x 8 - 1) to 2^(number of bytes x 8 - 1) - 1
  • Unsigned integers (non-negative only) range from 0 to 2^(number of bytes x 8) - 1
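
To make the arithmetic concrete, here is a small sketch that works out the ranges for one and two bytes (the signed limits use the exponent 8n - 1, because one bit is the sign) and then tries the boundary values. The table name `demo_int_range` is made up for illustration, and the rejected insert assumes strict SQL mode, which is the default since MySQL 5.7.

```sql
-- Ranges computed for n bytes (n = 1 for TINYINT, n = 2 for SMALLINT).
SELECT -POW(2, 8 * 1 - 1)     AS tinyint_min,            -- -128
        POW(2, 8 * 1 - 1) - 1 AS tinyint_max,            -- 127
        POW(2, 8 * 1) - 1     AS tinyint_unsigned_max,   -- 255
        POW(2, 8 * 2) - 1     AS smallint_unsigned_max;  -- 65535

-- Hypothetical table used only to try the boundaries.
CREATE TABLE demo_int_range (
  t_signed   TINYINT,
  t_unsigned TINYINT UNSIGNED
);

-- Both rows sit exactly on the limits and are accepted.
INSERT INTO demo_int_range VALUES (-128, 0), (127, 255);

-- One past the limits: in strict SQL mode this statement fails with an
-- out-of-range error, so it is left commented out here.
-- INSERT INTO demo_int_range VALUES (128, 256);
```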

Choosing an integer type is actually easy: estimate the range your values need and pick a type that covers it. In general we pick the most economical one; enumerated values, for example, usually use TINYINT. If your laptop is only 13 inches, there is no need to buy a 16-inch bag.

In practice, though, most people only ever use TINYINT and INT, because these two cover the basic needs; perhaps it is also just habit. I have talked with many developers, and few seem to want to fuss over the range of a field: when INT is not enough, they reach for BIGINT. Note that if a field can only hold non-negative numbers, you should declare it UNSIGNED. It documents the business rule for other developers, and it makes better use of the storage: in the same space, an unsigned integer has roughly twice the maximum value of a signed one.
Also remember that the 10 in int(10) is only the display width (how many characters a client may pad the value to); it has no effect on storage or range.
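
A small sketch of both points, using a made-up table `demo_width`: the display width does not limit what an INT column stores, and UNSIGNED raises the upper limit. On MySQL 8.0.17 and later the INT(3) syntax is deprecated and produces a warning, but the statements should still run.

```sql
-- Hypothetical table for illustration.
CREATE TABLE demo_width (
  a INT(3),        -- display width 3, still a full 4-byte INT
  b INT UNSIGNED   -- same 4 bytes, non-negative values only
);

-- 2147483647 is the signed INT maximum, 4294967295 the unsigned one.
INSERT INTO demo_width VALUES (2147483647, 4294967295);

-- Column a comes back with all ten digits despite the width of 3.
SELECT a, b FROM demo_width;
```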

Besides integers, the numeric types include decimals, which we typically need for amounts and rates. FLOAT and DOUBLE support approximate calculation using standard floating-point arithmetic, while DECIMAL stores exact decimal values. For the same range of values, FLOAT and DOUBLE generally take less space than DECIMAL, and DOUBLE has higher precision and a larger range than FLOAT. These are storage types only; MySQL performs its internal floating-point calculations in DOUBLE. The companies I have worked with rarely use FLOAT or DOUBLE and usually store decimals as DECIMAL: it takes more storage but is exact. If an inserted value has more decimal places than the column defines, MySQL rounds it to the defined scale. For precise financial calculations, we still recommend storing integers (the amount multiplied by an agreed factor, for example cents instead of whole currency units).
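
The difference between exact and approximate types can be sketched as follows. The table `demo_price` and its columns are invented for this example; the literal `0.1` is an exact DECIMAL value in MySQL, while `0.1E0` is an approximate DOUBLE value.

```sql
-- Exact vs. approximate arithmetic on literals.
SELECT 0.1 + 0.2 = 0.3       AS decimal_equal,  -- exact arithmetic: 1
       0.1E0 + 0.2E0 = 0.3E0 AS double_equal;   -- binary floating point: usually 0

-- Hypothetical table storing the same amount in two ways.
CREATE TABLE demo_price (
  amount       DECIMAL(10, 2) NOT NULL,  -- exact, two decimal places
  amount_cents BIGINT NOT NULL           -- integer storage: amount x 100
);

-- 12.345 has three decimals; the DECIMAL(10,2) column rounds it to 12.35
-- and MySQL reports a truncation note.
INSERT INTO demo_price (amount, amount_cents) VALUES (12.345, 1235);

-- Convert the integer back to a display value only when reading.
SELECT amount, amount_cents / 100 AS amount_from_cents FROM demo_price;
```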

2. Date and time types
Field Type | Byte size | Range | Format
DATE | 3 bytes | 1000-01-01 / 9999-12-31 | YYYY-MM-DD
TIME | 3 bytes | '-838:59:59' / '838:59:59' | HH:MM:SS
YEAR | 1 byte | 1901 / 2155 | YYYY
DATETIME | 8 bytes (5 bytes plus fractional-seconds storage since MySQL 5.6.4) | 1000-01-01 00:00:00 / 9999-12-31 23:59:59 | YYYY-MM-DD HH:MM:SS
TIMESTAMP | 4 bytes | 1970-01-01 00:00:01 UTC / 2038-01-19 03:14:07 UTC | YYYY-MM-DD HH:MM:SS

Next, date types. Most records need to store a time. DATE, TIME, and YEAR are for special cases; most business scenarios store values in the YYYY-MM-DD HH:MM:SS format, which can be seen as the combined form of the first three types.
So how do we choose between datetime and timestamp?
First, consider storage space. A timestamp needs only 4 bytes, fewer than a datetime, so when data volume matters, timestamp is the more space-efficient choice.
Second, consider the range. A timestamp can only hold times from 1970 to 2038; for anything before 1970 you must use datetime. I once joked with former colleagues that the table we had built would survive a little over ten years at most (we chose timestamp because it held log data, tens of millions of rows, so storage space came first). Who knows whether the company would last that long; the table would surely have been restructured by then anyway.
Timestamp also used to have one more advantage over datetime: from version 5.5 through 5.6.4, only TIMESTAMP columns supported the DEFAULT CURRENT_TIMESTAMP clause. Starting with 5.6.5 (and in 5.7), that advantage is gone: DEFAULT CURRENT_TIMESTAMP can be specified for both TIMESTAMP and DATETIME columns.
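
As a rough illustration, assuming MySQL 5.6.5 or later, both column types can take the automatic default. The table `demo_audit` and its columns are made up, and the ON UPDATE clause is an extra shown here for completeness rather than something discussed above.

```sql
-- Hypothetical table: one TIMESTAMP and one DATETIME, both auto-initialized.
CREATE TABLE demo_audit (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  note       VARCHAR(100) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated_at DATETIME  NOT NULL DEFAULT CURRENT_TIMESTAMP
                       ON UPDATE CURRENT_TIMESTAMP
);

-- Neither time column is listed; MySQL fills both in.
INSERT INTO demo_audit (note) VALUES ('first');

-- Changing the row refreshes updated_at but leaves created_at alone.
UPDATE demo_audit SET note = 'changed' WHERE id = 1;

SELECT created_at, updated_at FROM demo_audit;
```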

In summary, if the time range allows, prefer timestamp, because it is more space-efficient than datetime. Some people store Unix timestamps as integers, but this is not recommended in practice and brings little benefit. The traditional exception was sub-second granularity: before MySQL 5.6.4 these time types only resolved to the second, so you could store microsecond-level timestamps in a bigint, store the fractional part of the second in a double, or switch to MariaDB (a fork of MySQL). Since 5.6.4, DATETIME, TIMESTAMP, and TIME accept a fractional-seconds precision of up to 6 digits.
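
For the bigint approach, a minimal sketch might look like this. The table `demo_event` and the column `occurred_us` are hypothetical, and the sample value corresponds to 2021-03-01 00:00:00 UTC plus 123456 microseconds.

```sql
-- Hypothetical table storing microseconds since the Unix epoch.
CREATE TABLE demo_event (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  occurred_us BIGINT UNSIGNED NOT NULL
);

INSERT INTO demo_event (occurred_us) VALUES (1614556800123456);

-- Split the value back into whole seconds and the microsecond remainder.
-- FROM_UNIXTIME() displays the result in the session time zone.
SELECT FROM_UNIXTIME(occurred_us DIV 1000000) AS occurred_at,
       occurred_us MOD 1000000                AS micros  -- 123456
FROM demo_event;
```

The conversion on every read is exactly the kind of extra complexity that makes integer timestamps unattractive for ordinary business tables.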

3. String types
Field Type | Size | Use
CHAR | 0-255 characters | Fixed-length string
VARCHAR | 0-65535 bytes | Variable-length string
TINYBLOB | 0-255 bytes | Binary data of no more than 255 bytes
TINYTEXT | 0-255 bytes | Short text string
BLOB | 0-65535 bytes | Binary data
TEXT | 0-65535 bytes | Long text data
MEDIUMBLOB | 0-16 777 215 bytes | Medium-length binary data
MEDIUMTEXT | 0-16 777 215 bytes | Medium-length text data
LONGBLOB | 0-4 294 967 295 bytes | Very large binary data
LONGTEXT | 0-4 294 967 295 bytes | Very large text data

The string types are CHAR, VARCHAR, BINARY, VARBINARY, BLOB, TEXT, ENUM, and SET. The most commonly used are char and varchar.

  • char and varchar types

The varchar type stores variable-length strings. Its advantage over a fixed-length type is that it uses only the space it needs, a bit like the difference between an array and a slice. In general, if the length of a string is not known in advance, prefer varchar, which uses space more efficiently. The char type is usually used for strings of a fixed, known length (such as a UUID or the MD5 hash of a password); in that case char is also less prone to fragmentation.

Varchar needs 1 or 2 extra bytes to record the length of the string: 1 byte when the column's maximum length is at most 255 bytes, otherwise 2 bytes (because the largest value eight bits can hold is 255). Assuming the latin1 character set, varchar(10) needs up to 11 bytes of storage and varchar(1000) up to 1002 bytes. Varchar looks very attractive, but being variable-length it costs extra on updates. When an update makes the string longer, for example, different engines behave differently: MyISAM may split the row into fragments, and InnoDB may need to split the page so the row fits. And because it needs the extra length bytes, varchar(1) takes more space than char(1).

One more thing to note: although varchar(10) and varchar(100) use the same space to store the string "mclink", MySQL usually allocates fixed-size memory blocks to hold values internally, so the longer column consumes more memory. This is especially bad when sorting or other operations use in-memory or on-disk temporary tables.
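
Put together, a table that follows these rules might look like the sketch below. The table `demo_account` and its columns are invented for illustration, and latin1 is used only so that one character equals one byte.

```sql
-- Hypothetical table mixing fixed-length and variable-length strings.
CREATE TABLE demo_account (
  username   VARCHAR(64)   NOT NULL,             -- max <= 255 bytes: 1 length byte
  bio        VARCHAR(1000) NOT NULL DEFAULT '',  -- max > 255 bytes: 2 length bytes
  avatar_md5 CHAR(32)      NOT NULL              -- an MD5 hex digest is always 32 chars
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO demo_account (username, avatar_md5)
VALUES ('mclink', MD5('avatar.png'));

-- LENGTH() returns the byte length of the stored string.
SELECT LENGTH(username)   AS name_bytes,  -- 6
       LENGTH(avatar_md5) AS md5_bytes    -- 32
FROM demo_account;
```

Keeping the declared varchar length close to what the data really needs also limits the memory cost described above.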

  • text and blob

Text and blob exist to store larger data. Each is a family of four types: the blob family stores binary data and the text family stores character data. Unlike other types, MySQL treats each blob and text value as an independent object, and storage engines usually handle them specially. When a blob or text value is too large, InnoDB stores it in a separate "external" storage area: each value then needs 1 to 4 bytes in the row for a pointer, and the actual value is kept in the external area.

The only difference between blob and text is that blob stores binary data and has no character set or collation, while text has both. (What else can binary have besides 0 and 1?) When sorting these two types, MySQL compares only the first max_sort_length bytes rather than the entire string, and it cannot index their full length. We should therefore avoid these two types unless they are really needed.
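
The indexing restriction is easy to see in a sketch. The table `demo_article` is hypothetical; the prefix length of 50 is an arbitrary choice, and the system variable that caps sorting is named `max_sort_length`.

```sql
-- Hypothetical table: a TEXT column can only be indexed by prefix.
CREATE TABLE demo_article (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body TEXT NOT NULL,
  KEY idx_body_prefix (body(50))  -- index on the first 50 characters only
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Without a prefix length MySQL rejects the index (error 1170:
-- BLOB/TEXT column used in key specification without a key length),
-- so this statement is left commented out.
-- ALTER TABLE demo_article ADD KEY idx_body_full (body);

-- Number of leading bytes MySQL considers when sorting such values.
SHOW VARIABLES LIKE 'max_sort_length';
```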

4. Other types

  • Enum type

Sometimes an enum column can replace an ordinary string type; it stores a predefined set of distinct strings. MySQL stores enums very compactly, packing them into one or two bytes depending on the number of values in the list. Internally, MySQL saves each value as an integer giving its position in the list, and keeps the number-to-string mapping in the table's .frm file (table definitions moved into the data dictionary in MySQL 8.0).
Here is an example:
CREATE TABLE `test_enum` (
  `e` enum('woman','man') NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

insert into test_enum(e) values('woman'),('man');

mysql> select e + 0 from test_enum;
+-------+
| e + 0 |
+-------+
|     1 |
|     2 |
+-------+
2 rows in set (0.00 sec)

mysql> select e from test_enum;
+-------+
| e     |
+-------+
| woman |
| man   |
+-------+
2 rows in set (0.00 sec)

Selecting in a numeric context reveals this dual nature. For that reason, avoid using numbers as enum values, such as enum('1','2'), which easily leads to confusion; I have seen colleagues try this at work, and it is not advisable. Another point worth mentioning: an enum column sorts by its internal integer, not by the defined string. So either define the values in the order you want them sorted, or use FIELD() in the query to specify the order explicitly.

The worst part of an enum is that its list of strings is fixed: changing it requires ALTER TABLE. So if the values may change in the future, an enum is a poor fit, unless you only ever append elements to the end of the list, which MySQL can do without rebuilding the whole table. In most cases the list is small, so the cost of the lookup that converts between integer and string is low.
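
The sorting behavior and the append-only change can be sketched like this. The table `demo_enum` is made up for the example, and the rows are deliberately inserted in the opposite order of the enum list.

```sql
-- Hypothetical table with the same enum list as above.
CREATE TABLE demo_enum (
  e ENUM('woman', 'man') NOT NULL
);
INSERT INTO demo_enum (e) VALUES ('man'), ('woman');

-- Sorted by internal position (woman = 1, man = 2): woman, man
SELECT e FROM demo_enum ORDER BY e;

-- Explicit order with FIELD(): man, woman
SELECT e FROM demo_enum ORDER BY FIELD(e, 'man', 'woman');

-- Alphabetical order by casting to a string: man, woman
SELECT e FROM demo_enum ORDER BY CAST(e AS CHAR);

-- Appending a new value at the end of the list.
ALTER TABLE demo_enum MODIFY e ENUM('woman', 'man', 'unknown') NOT NULL;
```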

  • Bit type

Before MySQL 5.0, bit was a synonym for tinyint. But in MySQL 5.0 and later, this is a data type with completely different characteristics.

A bit column can be at most 64 bits long. Its behavior depends on the storage engine. MyISAM packs all bit columns together, so 17 separate bit(1) columns need only 17 bits, which MyISAM rounds up to 3 bytes (24 bits). Other engines, such as InnoDB and Memory, store each bit column in the smallest integer type large enough to hold it, so they save no space: the same 17 columns need at least 17 bytes.

MySQL treats bit as a string type, not a numeric type. When you retrieve a BIT(1) value, the result is a string containing the binary value 0 or 1, not the ASCII character "0" or "1". In a numeric context (including arithmetic), however, the bit string is converted to the corresponding number. For example, the value b'00111001' is 57 in decimal. In a numeric context it reads as 57, but in a string context it is shown as the character whose ASCII code is 57, which is "9".

mysql> CREATE TABLE `test_bit` (
  `b` bit(8) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

mysql> insert into test_bit(b) values (b'00111001');

mysql> select b, b+0 from test_bit;
+---+-----+
| b | b+0 |
+---+-----+
| 9 |  57 |
+---+-----+
1 row in set (0.00 sec)

In practice we should use this type with caution. If you want to store a boolean in a single bit of storage, another approach is a nullable char(0) column, which can hold either NULL or the empty string (length 0), standing for false and true respectively. It is a clever trick, but not easy for others to understand.
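
For the curious, the char(0) trick can be sketched as below; the table `demo_flag` and the column `is_active` are hypothetical names.

```sql
-- Hypothetical table: a nullable CHAR(0) can only hold NULL or ''.
CREATE TABLE demo_flag (
  id        INT UNSIGNED NOT NULL PRIMARY KEY,
  is_active CHAR(0) DEFAULT NULL
);

-- '' stands for true, NULL for false.
INSERT INTO demo_flag (id, is_active) VALUES (1, ''), (2, NULL);

-- Turn the flag back into 1/0 when reading: row 1 gives 1, row 2 gives 0.
SELECT id, is_active IS NOT NULL AS active FROM demo_flag;
```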

  • set type

If you need to store many boolean values, consider merging those columns into one set column, which MySQL represents internally as a packed series of bits, so it uses space efficiently. MySQL also provides functions such as FIND_IN_SET() and FIELD() to work with it. Its drawback is the same as enum's: changing the column definition requires ALTER TABLE, which is expensive on large tables. Here is a brief look at how it is used.

mysql> show create table test_set\G
*************************** 1. row ***************************
       Table: test_set
Create Table: CREATE TABLE `test_set` (
  `s` set('can_edit','can_del','can_read') NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)

mysql> insert into test_set(s) values('can_edit,can_read');
Query OK, 1 row affected (0.01 sec)


mysql> select s from test_set where find_in_set('can_read',s);
+-------------------+
| s                 |
+-------------------+
| can_edit,can_read |
+-------------------+
1 row in set (0.00 sec)

You can achieve the same thing by packing a series of bits into an integer. For example, use a tinyint (8 bits), let each bit represent one boolean, document the meaning of each bit in the code, and read the values with bitwise operations. To be honest, this approach is rarely used in business code because it clearly adds complexity; it is more common in lower-level development where storage performance matters. It is a sound way to optimize, since at the lowest level all stored data is binary anyway.
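
Here is a minimal sketch of the integer approach, reusing the three permissions from the set example. The table `demo_perm` and the bit assignments (1 = can_edit, 2 = can_del, 4 = can_read) are assumptions chosen for this illustration.

```sql
-- Hypothetical table: one TINYINT UNSIGNED holds up to eight flags.
-- Assumed bit meanings: 1 = can_edit, 2 = can_del, 4 = can_read.
CREATE TABLE demo_perm (
  user_id INT UNSIGNED NOT NULL PRIMARY KEY,
  perms   TINYINT UNSIGNED NOT NULL DEFAULT 0
);

-- User 1 can edit and read (1 | 4 = 5); user 2 can only read (4).
INSERT INTO demo_perm (user_id, perms) VALUES (1, 1 | 4), (2, 4);

-- Everyone who can read: both users match.
SELECT user_id FROM demo_perm WHERE (perms & 4) <> 0;

-- Grant can_del to user 2, then revoke it again.
UPDATE demo_perm SET perms = perms | 2  WHERE user_id = 2;
UPDATE demo_perm SET perms = perms & ~2 WHERE user_id = 2;
```

Unlike a set column, adding a new flag here needs no ALTER TABLE, but the meaning of each bit lives only in the code and its documentation.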

III. Questions and answers

With the knowledge above, I believe you now have a reasonable understanding of these field types, or at least will no longer go all-in on a single type. Let's use it to answer the questions from the introduction.

  • To store a time, how do you choose among timestamp, datetime, and int?
    Prefer timestamp, which takes less space, and fall back to datetime. Don't use int to hold a timestamp: it offers little storage advantage. Even granting that integers have some edge in sorting and space usage, I think the complexity they add outweighs it.

  • How should you choose a primary key?
    In InnoDB, the leaf nodes of every secondary index store the primary key value, so the primary key should be small. An auto-increment integer primary key is a good choice: integers are naturally compact, and sequential inserts are unlikely to cause page splits. Some systems use a uuid as the primary key, which I personally find very undesirable (my current company uses uuid for some tables). A string primary key is costly for the table: it enlarges every index tree, long-term inserts and deletes leave a lot of fragmentation, and data operations are slower than with an auto-increment key. For distributed systems you can assign snowflake IDs instead, and a uuid primary key can be replaced by an auto-increment primary key plus a business code. Please stop using UUID primary keys.

  • How should you store enumerated values used in the business?
    The enumerated values in our business are usually a limited set of states. Enum does not perform as well as tinyint and carries the overhead of ALTER TABLE, so we normally prefer tinyint for state values and map the numbers to constants in code. To keep the table readable, record the meaning of each value in the column comment. If you insist on enum, remember not to use numeric strings as enum values, to avoid unnecessary trouble.

  • How should you store an IP address?
    An IPv4 address is really a 32-bit unsigned integer; dotted-decimal notation exists only for human readers. Some people like to store IP addresses in varchar(15), which is undesirable. We should use a 4-byte unsigned int instead, and MySQL provides the inet_aton() and inet_ntoa() functions for fast conversion.
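
The four answers can be combined in one sketch. The table `demo_login_log`, its columns, and the status codes are all invented for illustration. Note that the column must be INT UNSIGNED: the sample address converts to 3232235786, which is larger than the signed INT maximum of 2147483647.

```sql
-- Hypothetical table applying the answers above.
CREATE TABLE demo_login_log (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status     TINYINT UNSIGNED NOT NULL DEFAULT 0
             COMMENT '0 = failed, 1 = success',
  ip         INT UNSIGNED NOT NULL
             COMMENT 'IPv4 address stored via INET_ATON()',
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- Convert the dotted-decimal address to an integer on the way in.
INSERT INTO demo_login_log (status, ip)
VALUES (1, INET_ATON('192.168.1.10'));

SELECT INET_ATON('192.168.1.10');  -- 3232235786

-- Convert back to dotted-decimal notation on the way out.
SELECT id, status, INET_NTOA(ip) AS ip_text, created_at
FROM demo_login_log;
```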

MySQL optimization covers many things. Choosing field types is the foundation, followed by index design, storage engine selection, SQL statement optimization, configuration parameters, and so on. I will cover these one by one in later articles, and I hope they are helpful to everyone.

Origin blog.csdn.net/qq_38378384/article/details/114218224