High Performance MySQL (study notes): Schema and data type optimization 1

Schema and data type optimization

Choose an optimized data type

Choosing the right data type is critical to achieving high performance, and the following principles can help you make a better choice.

Smaller is usually better

In general try to use the smallest data type that can correctly store the data. Smaller data types are usually faster because they take up less disk, memory, and CPU cache, and require fewer CPU cycles to process.

Simple is good

Operations on simple data types generally require fewer CPU cycles. Integer operations are less expensive than string operations because character sets and collation make character comparisons more complicated than integer comparisons.

Try to avoid NULL

Nullable columns use more storage space and require special handling inside MySQL. When a nullable column is indexed, each index entry needs an extra byte.

When choosing a data type for a column, the first step is to determine the general class of type: numeric, string, temporal, and so on. The next step is to choose the specific type. For example, TIMESTAMP and DATETIME can both store the same kind of data, but TIMESTAMP uses only half the storage space of DATETIME, is time-zone aware, and has special auto-update capabilities.

Integer type

There are two kinds of numbers: integers and real numbers. To store integers, use one of the integer types: TINYINT, SMALLINT, MEDIUMINT, INT, or BIGINT, which use 8, 16, 24, 32, and 64 bits of storage respectively.

An integer type can take the optional UNSIGNED attribute, which disallows negative values and roughly doubles the upper limit of positive values. For example, TINYINT UNSIGNED can store 0 to 255, while TINYINT stores -128 to 127. MySQL also lets you specify a width for integer types, such as INT(11). For most applications this is meaningless: it does not restrict the legal range of values, it only specifies the number of characters that MySQL's interactive tools reserve for display. For storage and computation, INT(1) and INT(20) are identical.
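The ranges above follow directly from the bit widths. As a rough sketch (the helper names `integer_range` and `smallest_type_for` are made up for illustration and are not part of MySQL), the arithmetic can be written out like this:

```python
# Storage width in bits for each MySQL integer type, as listed above.
INTEGER_BITS = {
    "TINYINT": 8,
    "SMALLINT": 16,
    "MEDIUMINT": 24,
    "INT": 32,
    "BIGINT": 64,
}


def integer_range(type_name, unsigned=False):
    """Return (min, max) for an integer type of the given bit width."""
    bits = INTEGER_BITS[type_name]
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type_for(low, high):
    """Hypothetical helper: pick the narrowest type that can hold low..high."""
    for name in INTEGER_BITS:  # dicts keep insertion order: narrowest first
        for unsigned in (False, True):
            type_min, type_max = integer_range(name, unsigned)
            if type_min <= low and high <= type_max:
                return name + (" UNSIGNED" if unsigned else "")
    raise ValueError("range does not fit in BIGINT")


print(integer_range("TINYINT"))                 # (-128, 127)
print(integer_range("TINYINT", unsigned=True))  # (0, 255)
print(smallest_type_for(0, 200))                # TINYINT UNSIGNED
```

The display width never enters the calculation, which is why INT(1) and INT(20) have the same range.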

Real number types

Real numbers are numbers with a fractional part. The FLOAT and DOUBLE types support approximate calculations using standard floating-point arithmetic. The DECIMAL type stores exact decimal values. According to the book, DECIMAL(18,9) stores 9 digits on each side of the decimal point and uses 9 bytes in total: 4 bytes for the digits before the decimal point, 4 bytes for the digits after it, and 1 byte for the decimal point itself. There are several ways to specify the desired precision of a floating-point column, which can cause MySQL to choose a different data type or to round values when storing them.
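The difference between approximate and exact arithmetic is easy to see outside the database. This is only an analogy, assuming Python's `float` behaves like DOUBLE (both are binary floating point) and `decimal.Decimal` plays the role of DECIMAL:

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly,
# so sums of decimal fractions pick up small errors.
float_total = 0.1 + 0.2
print(float_total == 0.3)               # False

# Exact decimal arithmetic keeps every digit.
decimal_total = Decimal("0.1") + Decimal("0.2")
print(decimal_total == Decimal("0.3"))  # True
```

This is the usual reason to prefer DECIMAL for values such as money, where exact fractional results matter.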

String type

MySQL lets each string column have its own character set and collation (the sorting rules for that character set).

VARCHAR and CHAR types

The VARCHAR type stores variable-length strings and is the most common string type. It can be more space-efficient than fixed-length types because it uses only as much space as it needs (shorter strings use less space). VARCHAR needs 1 or 2 extra bytes to record the length of the string: 1 byte if the column's maximum length is 255 bytes or less, and 2 bytes otherwise. For example, with a single-byte character set, a VARCHAR(10) column uses up to 11 bytes of storage. Because rows are variable-length, however, an UPDATE that makes a row longer causes extra work. VARCHAR is appropriate when the maximum length of the column is much larger than the average length; when the column is updated infrequently, so fragmentation is not a problem; or when you use a complex character set such as UTF-8, where each character takes a variable number of bytes.
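The length-prefix rule can be sketched as a small calculation. The function below is a hypothetical estimate that follows only the rule stated above (data bytes plus a 1- or 2-byte prefix); it ignores any engine-specific row overhead, and `max_bytes_per_char` is an assumed parameter describing the character set:

```python
def varchar_storage_bytes(value, max_chars, encoding="latin-1", max_bytes_per_char=1):
    """Estimate the bytes used by one VARCHAR value: data plus length prefix."""
    if len(value) > max_chars:
        raise ValueError("value is longer than the column allows")
    # 1 prefix byte if the column can never exceed 255 bytes, otherwise 2.
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return len(value.encode(encoding)) + prefix


print(varchar_storage_bytes("0123456789", 10))  # 11: a full VARCHAR(10)
print(varchar_storage_bytes("hello", 10))       # 6: shorter strings use less space
print(varchar_storage_bytes("hello", 1000))     # 7: a 2-byte prefix is needed
```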

The CHAR type is fixed-length: MySQL always allocates enough space for the defined number of characters. When storing a CHAR value, MySQL pads it with spaces as needed so that values can be compared. CHAR is suitable for very short strings, or when all values are nearly the same length. For data that changes frequently, CHAR is also better than VARCHAR, because fixed-length rows are less prone to fragmentation.

Generosity is unwise

Storing 'hello' in a VARCHAR(5) column and in a VARCHAR(200) column takes the same amount of storage space, but the overall cost is not the same. The longer column can consume much more memory, because MySQL usually allocates fixed-size blocks of memory to hold values internally. This is especially bad for sorting and for operations that use in-memory temporary tables.

BLOB and TEXT types

BLOB and TEXT are both string data types designed for storing very large data, which are stored in binary and character ways, respectively.

They belong to two different families of data types:

Character types: TINYTEXT, TEXT, MEDIUMTEXT, LONGTEXT

Binary types: TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB

MySQL handles each BLOB and TEXT value as an object with its own identity. When a BLOB or TEXT value is too large, InnoDB stores it in a separate external storage area. In that case each value needs 1 to 4 bytes in the row for a pointer, and the external storage area holds the actual value.

BLOB stores binary data with no character set or collation, whereas TEXT has a character set and collation. Try to avoid TEXT and BLOB where possible.

Use enums (ENUM) instead of string types

Sometimes you can use an ENUM column instead of a conventional string type. An ENUM column stores a predefined set of distinct string values. MySQL stores them very compactly, packed into one or two bytes depending on the number of values in the list. Internally, each value is stored as an integer representing its position in the list, not as a string.
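A minimal sketch of that mapping, assuming the usual MySQL behavior that positions start at 1 and that up to 255 list values fit in one byte (the function names and the sample list are invented for illustration):

```python
def enum_storage_bytes(values):
    """1 byte for up to 255 list values, otherwise 2 (at most 65,535 values)."""
    if len(set(values)) != len(values):
        raise ValueError("ENUM values must be distinct")
    if len(values) > 65535:
        raise ValueError("too many ENUM values")
    return 1 if len(values) <= 255 else 2


def enum_encode(values, value):
    """Return the 1-based list position that is stored instead of the string."""
    return values.index(value) + 1


ENUM_VALUES = ["fish", "apple", "dog"]  # illustrative definition order

print(enum_storage_bytes(ENUM_VALUES))    # 1
print(enum_encode(ENUM_VALUES, "apple"))  # 2

# Sorting by the stored integer follows definition order, not alphabetical order.
rows = ["dog", "apple", "fish"]
print(sorted(rows, key=lambda v: enum_encode(ENUM_VALUES, v)))  # ['fish', 'apple', 'dog']
```

The last line shows a side effect worth remembering: because the stored value is the position, ordering an ENUM column follows the order in which the values were defined.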

Date and time types

DATETIME: This type can hold a large range of values, from the year 1001 to the year 9999, with a precision of one second. It packs the date and time into an integer in YYYYMMDDHHMMSS format, independent of time zone, and uses 8 bytes of storage.

TIMESTAMP: The TIMESTAMP type stores the number of seconds elapsed since midnight, January 1, 1970 (GMT), the same as a UNIX timestamp. It uses only 4 bytes of storage, so it can represent only the years 1970 to 2038. Use TIMESTAMP where possible, because it is more space-efficient.
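The 2038 limit follows from the 4-byte storage. A quick check, assuming the value is a signed 32-bit count of seconds since the UNIX epoch:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_SIGNED_32 = 2**31 - 1  # largest value a signed 32-bit integer can hold

# The last moment a 32-bit seconds counter can represent.
last_moment = EPOCH + timedelta(seconds=MAX_SIGNED_32)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00
```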

Bit data type

A BIT column stores one or more true/false values in a single column. BIT(1) defines a field that holds a single bit, BIT(2) holds 2 bits, and so on; the maximum length of a BIT column is 64 bits. MySQL treats BIT as a string type, not a numeric type. When you retrieve a BIT(1) value, the result is a string whose content is the binary value 0 or 1, not the ASCII character "0" or "1". Use the BIT type with caution!

Choosing identifiers

Choosing the right data type for identifier columns is very important, because they are often used as foreign keys to join to other tables. Integers are usually the best choice for identifiers because they are fast and work with AUTO_INCREMENT.

ENUM and SET types: try to avoid them for identifiers.

String types: avoid them for identifiers where possible. They take up more space and are usually slower than numeric types, which has a large impact on performance.

Be careful when using random strings, such as strings generated by MD5()/SHA1()/UUID().

Random values are written to arbitrary positions in the index, which slows down INSERT statements and causes page splits and random disk access. SELECT statements also become slower, because logically adjacent rows end up widely scattered on disk and in memory.

Random values also make caches less effective for all types of queries.
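The effect on the index can be imitated with a sorted list. This is a toy model, not how InnoDB is implemented: it only counts how many inserts land somewhere other than the end of the sorted order, which is the situation that can lead to page splits. The function name `count_mid_index_inserts` is invented for this sketch.

```python
import bisect
import hashlib


def count_mid_index_inserts(keys):
    """Insert keys into a sorted list and count inserts that are not appends.

    An insert at the end models appending to the last index page. Any other
    position models a write into the middle of the index.
    """
    index = []
    mid_inserts = 0
    for key in keys:
        pos = bisect.bisect_left(index, key)
        if pos != len(index):
            mid_inserts += 1
        index.insert(pos, key)
    return mid_inserts


n = 1000
sequential_keys = list(range(1, n + 1))  # behaves like AUTO_INCREMENT
hashed_keys = [hashlib.md5(str(i).encode()).hexdigest() for i in sequential_keys]

print(count_mid_index_inserts(sequential_keys))  # 0: every insert is an append
print(count_mid_index_inserts(hashed_keys))      # close to n: almost none are appends
```

Sequential keys always extend the end of the index, while hashed keys scatter across it.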

A special kind of data: IPv4 addresses. People often store IP addresses in VARCHAR(15) columns, but an IPv4 address is really a 32-bit unsigned integer, not a string. The dotted notation just splits the address into four segments to make it easier for humans to read. Store IP addresses as unsigned integers; MySQL provides the INET_ATON() and INET_NTOA() functions to convert between the two representations.
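The conversion itself is simple arithmetic on the four segments. The sketch below mirrors what INET_ATON() and INET_NTOA() do, using Python's standard `ipaddress` module; the two function names are borrowed from MySQL purely for readability and are defined here, not imported:

```python
import ipaddress


def inet_aton(dotted):
    """Dotted IPv4 string -> 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(number):
    """32-bit unsigned integer -> dotted IPv4 string."""
    return str(ipaddress.IPv4Address(number))


print(inet_aton("192.168.0.1"))  # 3232235521
print(inet_ntoa(3232235521))     # 192.168.0.1

# The same value written out by hand: each segment is one byte.
print(192 * 256**3 + 168 * 256**2 + 0 * 256 + 1)  # 3232235521
```

The integer fits in an INT UNSIGNED column (4 bytes), compared with up to 15 characters plus a length byte for VARCHAR(15).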

Pitfalls in MySQL Schema Design

Too many columns: MySQL's storage engine API works by copying rows between the server layer and the storage engine layer in a row buffer format; the server layer then decodes the buffer into individual columns. This decoding can be expensive, and the cost depends on the number of columns.

Too many joins: The so-called entity-attribute-value (EAV) design pattern is a common example of bad design. MySQL can join at most 61 tables in a single query, and parsing and optimizing a query with many joins is expensive. As a rule of thumb, keep each query to 12 or fewer tables.

Overusing ENUM: instead of an ENUM column, the design should often use an integer foreign key that references a dictionary (lookup) table holding the actual values.

NULL not invented here: when you need to record "no value" in a table, you may be able to use 0, a special value, or an empty string instead. Avoid NULL where you can.

Normalization and denormalization

In a normalized database, each fact is represented once and only once. In a denormalized database, information is duplicated and may be stored in multiple places.

Advantages and disadvantages of normalization

Advantages:

1. Updates to a normalized schema are usually faster than updates to a denormalized one.

2. When data is well normalized, there is little or no duplicated data, so less data needs to be changed.

3. Normalized tables are usually smaller, so they fit better in memory and operations run faster.

4. Less redundant data means fewer DISTINCT or GROUP BY queries are needed when retrieving lists of values. In a denormalized schema where departments are not stored in their own table, for example, you need DISTINCT or GROUP BY to get a unique list of departments.

Disadvantages:

A normalized schema usually requires joins. Any moderately complex query on a normalized schema is likely to need at least one join, and joins can also make some indexing strategies impossible.

Advantages and disadvantages of denormalization:

A denormalized schema avoids joins because all the data is in one table. Without joins, the worst case for most queries is a full table scan, which can be much faster than a join when the data does not fit in memory, because it avoids random I/O.

Individual tables can also use more efficient indexing strategies.

Mixing normalization and denormalization

In practice the two are often mixed. The most common way to denormalize data is to duplicate or cache selected columns from one table in another, and triggers can be used to keep the cached values up to date. The cost is that an update now has to change the column in two places, so you need to weigh how often updates happen and how long they take against how often the SELECT queries run.

Another reason to copy data from a parent table to a child table is sorting. For example, in a normalized schema it is expensive to sort messages by the author's name, but if you cache author_name in the message table and index it well, the sort can be done very efficiently.
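The cached-column idea can be tried out in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so the trigger syntax is SQLite's and differs slightly from MySQL's. The table names, column names, and sample rows are all invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        body TEXT NOT NULL,
        author_name TEXT NOT NULL  -- cached copy of author.name
    );
    CREATE INDEX idx_message_author_name ON message (author_name);

    -- Keep the cached copy in sync when an author is renamed.
    CREATE TRIGGER author_rename AFTER UPDATE OF name ON author
    BEGIN
        UPDATE message SET author_name = NEW.name WHERE author_id = NEW.id;
    END;
""")
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, "zoe"), (2, "adam")])
conn.executemany(
    "INSERT INTO message VALUES (?, ?, ?, ?)",
    [(1, 1, "first", "zoe"), (2, 2, "second", "adam")],
)

# Normalized: sorting by author name needs a join.
normalized = conn.execute(
    "SELECT m.body FROM message m JOIN author a ON a.id = m.author_id ORDER BY a.name"
).fetchall()

# Denormalized: the cached, indexed column can be sorted without a join.
denormalized = conn.execute(
    "SELECT body FROM message ORDER BY author_name"
).fetchall()
print(normalized == denormalized)  # True

# The price: renaming an author now updates two tables.
conn.execute("UPDATE author SET name = 'bob' WHERE id = 1")
renamed = conn.execute("SELECT author_name FROM message WHERE id = 1").fetchone()[0]
print(renamed)  # bob
```

Both queries return the same order, but the second reads only the message table. The rename at the end shows the extra write that the trigger performs on every update.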

Summary

MySQL favors simplicity, and people who work with the database will do well to follow the same principle:

1. Try to avoid over-design

2. Use small, simple, appropriate data types, and avoid NULL unless it is really necessary.

3. Try to use the same data type to store similar or related values, especially columns that are used in join conditions.

4. Watch out for variable-length strings, which can cause pessimistic, full-length memory allocation in temporary tables and sorting.

5. Try to use integers to define identity columns

6. Avoid deprecated MySQL features, such as specifying precision for floating-point numbers or display widths for integers.

7. Be careful with ENUM and SET, preferably avoid SET.

Normalization is good, but denormalization is sometimes necessary.

