A Database Pitfall Guide Even a Monkey Could Follow. You Say You Can't?

Foreword

Over the past few years I have noticed something odd. Whether they are veterans with more than ten years of experience or junior programmers who have just joined the industry, people talk about technology and trends all the time: artificial intelligence, big data, blockchain, all sorts of frameworks, languages, and algorithms, AI, BI, CI, DI, and so on. Yet very few of them pay attention to the database. Perhaps it feels too low-end, or simply too low-key, so it rarely comes up.

Technology works like this: what you do not care about, you do not watch, and the more a thing is neglected, the more likely you are to fall into a pit there. So here is a brief guide to avoiding pitfalls when working with a database, which may also serve as a reminder for newcomers to the field. Great oaks grow from little acorns: lay the foundation first, then think about what to build on top of it, and do not lose sight of the forest for the trees.

This article is divided into the following four sections (about a 5-minute read):

  1. Why the database matters
  2. Database tips
  3. Which database pitfalls are easy to fall into?
  4. Recommendations for further study

Why the database matters

Many developers pay little attention to the database, and table structures are mostly designed with a "good enough to work" attitude. But in nearly a decade of development experience, I have found that if you work in any Web-related field you cannot avoid dealing with a database. Most of what a Web application does comes down to database operations. Whether you use Python, Java, Ruby, or another language, you are in effect doing database-oriented programming. To spare programmers from learning about databases, many Web frameworks wrap the database in an ORM (Object-Relational Mapping) layer, treat it as a black box, and let you operate on it by manipulating objects.

Although an ORM simplifies development to some extent, I have reservations about it, because a programmer needs to understand how the database executes the SQL they write. You need to use EXPLAIN to view the execution plan and judge whether your SQL is efficient (rows scanned, index hits, lookups back to the table, sorting, and so on) and to compare different ways of writing the same query. You also need to know how to use SHOW INDEX to check whether an index is effective (judged by the Cardinality the database estimates). These techniques depend heavily on your understanding of SQL.

SQL is the language of database operations and a very important skill for programmers. As far as I know, most companies test SQL knowledge during interviews. A solid SQL foundation not only lets you write high-performance queries; it is also a great help for data analysis, reporting, and statistics.

For most commercial companies, the core asset is the data inside the database. If a process or system goes down and is unavailable for a while, a restart restores it in most cases. But if the database is deleted by mistake, a small or medium-sized company with weak operations capabilities could face collapse. From a business point of view, the database is the core of most software companies.

Many programmers grow from rookie to expert by moving from the "XX management system" projects of their school days, to a company's internal systems, and then to large-scale distributed systems. In large systems, the first problem most programmers run into is usually not a shortage of threads, a CPU under too much load, or slow memory. It is usually that the database cannot carry the pressure. Why?

The database itself sits on a disk-based file system, and reads that miss the in-memory cache go through disk I/O. Anyone who has studied computer architecture knows that in a von Neumann machine, disk I/O is the slowest kind of I/O (on the order of milliseconds). While your system holds only a few thousand rows, a full table scan causes no noticeable delay. Once you reach millions or billions of rows, an ordinary query can bring down your database server. Anyone who has built applications knows that once the database goes down, distributed architecture, microservices, and fast hardware are all basically useless.

Having said all that, I trust the importance of the database is clear. Next, let us look at the issue from the point of view of database design.

How database design affects the system

Here is a simple comparison. What can good database design bring you?

  1. Less data redundancy and no data maintenance anomalies
  2. Less storage space used and faster access

And bad design?

  1. Heavy data redundancy and insert, update, and delete anomalies
  2. Wasted storage space and slow access

Bad design (Figure)

Take a simple age field. It should be a tinyint (1 byte) or smallint (2 bytes), but you chose int (4 bytes), which is a poor choice of field type. Many beginners will object at this point: isn't worrying about so little space overkill? Storage is cheap, the functionality ends up the same, and nobody else can see the difference.

I think this view is debatable, and it is typical beginner thinking. You only see the space saved on a single field and do not consider that the data keeps growing. The worse the design, the higher the cost of later growth. (This resembles the classic Java interview question about ArrayList versus LinkedList: with a small amount of data you cannot see the time difference, but as the data grows the gap keeps widening.) By the time you reach tens of millions of rows, your table and someone else's may hold the same content, yet yours takes up far more storage for no reason, potentially hundreds of gigabytes when many fields are badly chosen. If your application runs in multiple data centers, that waste is copied to each of them, and as long as the application stays online the cost keeps rising.

That is only the wasted space. The analysis of the table storage structure below also covers how much bad design hurts performance. In business terms this is incremental marginal cost; in technical and architectural terms it leaves your system unable to scale.
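The cost of an oversized field is easy to estimate. The sketch below is a back-of-the-envelope calculation, not a measurement: the function name, the row count, and the number of copies are illustrative assumptions, and it counts only raw column bytes (no index, row header, or page overhead).

```python
# Storage size in bytes of MySQL integer types.
INT_TYPE_BYTES = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def wasted_bytes(chosen: str, sufficient: str, rows: int, copies: int = 1) -> int:
    """Bytes wasted by storing one column as `chosen` instead of `sufficient`.

    `copies` is the number of places the table is stored (replicas, data centers).
    Only raw column bytes are counted; index and page overhead are ignored.
    """
    per_row = INT_TYPE_BYTES[chosen] - INT_TYPE_BYTES[sufficient]
    return per_row * rows * copies


# Hypothetical workload: an age column stored as int instead of tinyint,
# 10 million rows, kept in 3 data centers.
waste = wasted_bytes("int", "tinyint", rows=10_000_000, copies=3)
print(f"{waste / 1_000_000:.0f} MB wasted on a single column")
```

One column on its own wastes little. The total scales with the number of rows, the number of copies, and the number of columns chosen this way, which is why the habit matters more than any single field.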

Database tips

Storage engine notes

MySQL's open architecture is compatible with many types of storage engine (if you are capable enough, you can even write your own). Storage engines are designed for different kinds of data storage. At work I have seen people make every table InnoDB regardless of what it holds (InnoDB has been MySQL's default storage engine since 5.5; it is a good choice in most scenarios, but it does not suit every kind of table), and I have met people who do not know what a storage engine is. If they design your database, your system will easily step into pits and hit many unexpected problems. Choose the storage engine to fit the actual business scenario.

In mainstream MySQL use, the most common storage engines are MyISAM and InnoDB. There are many others, such as NDB (clustered storage engine), MEMORY (RAM-based storage engine), and ARCHIVE (archive storage engine), but they are not mainstream and are rarely used at work, so I will not go into them. Here is a simple comparison of MyISAM and InnoDB, whose main characteristics are:

MyISAM

  • No transactions, table-level locking, built-in row count (COUNT on the whole table responds in milliseconds)
  • Mainly for OLAP applications; suited to logs, reports, and similar data

InnoDB

  • Row-level locking, high concurrency, transactions with all four isolation levels (the default is REPEATABLE READ)
  • Mainly for OLTP applications; suited to small amounts of transactional data

Field Type Notes

Because they do not understand the basic principles of the database, many junior programmers are confused when choosing field types, mainly for lack of clear guidelines. At work I have seen a long (8-byte) id used as the primary key of a basic-information table holding only a few dozen rows. I have seen a status field whose only values are 0 and 1 stored as an int (4 bytes). I have seen every character field declared as varchar(255) and every numeric field as int. Choosing field types at random like this, with no basis in how the database works, only appears in small projects or toys on your localhost; such tables basically never become big tables.

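As far as I know, most mainstream databases offer developers a very rich set of field types. Experienced developers choose field types based on the kind of business data involved, and what they end up with is a high-performance database whose costs in both time (performance) and space (storage) are very low. I will not list the field types each database offers; there is plenty of material on this, and anyone interested can search for it, for example MySQL Data Types. So how should a beginner choose field types?

The basic principles are simple (the reasons are explained later):

  1. Prefer numeric fields (for example, use int rather than varchar as the type of the primary key id where possible)
  2. Make the field type as small as the requirements allow (for example, an age field should be tinyint rather than int or long)
  3. For time fields, consider timestamp (4 bytes, stored in UTC, but limited to dates up to 2038) rather than datetime (8 bytes before MySQL 5.6.4 and 5 bytes since, no time zone conversion)

What do you gain by following these basic rules?

  1. Lower storage costs and no wasted space (if one row wastes n bytes, N rows waste n × N bytes, so the waste grows with the data)
  2. The best performance (a better user experience, and from another angle a saving of resources: computing power)

Why make "choose the smallest field possible" a basic principle? Let us first look at InnoDB's logical storage structure.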

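InnoDB logical storage structure (figure)

InnoDB's storage structure is as follows:

  • Tablespace
  • Segment: a tablespace is made up of multiple segments
  • Extent: a single extent is made up of 64 contiguous pages
  • Page: the smallest unit of disk management, 16 KB by default
  • Row: each record, also called row data; rows are stored in pages

As the figure shows, the smallest unit read is the page, and matching data is always fetched out of pages. By this simple logic, the more rows a page holds, the better the database performs. How do the numbers work out? Counting a row at the smallest type size of 2 B, a default 16 KB page can hold 7992 row records. Conversely, if your rows are too large, say 32 KB each, then in this simplified picture the database needs 2 contiguous pages to hold a single row, and you can imagine how poor performance will be: the gap between the two cases is about 16,000 times. I will not go deeper here; anyone interested should read the classic books, of which this is only the tip of the iceberg.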
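The relationship between row width and pages read can be sketched with simple arithmetic. This is a simplified model under stated assumptions: it ignores page headers, record headers, and fill factor, and the function names, row widths, and table size are hypothetical.

```python
PAGE_BYTES = 16 * 1024  # InnoDB default page size


def rows_per_page(row_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Upper bound on rows per page, ignoring page and record headers."""
    if not 0 < row_bytes <= page_bytes:
        raise ValueError("this simple model only covers rows that fit in one page")
    return page_bytes // row_bytes


def pages_for_scan(total_rows: int, row_bytes: int) -> int:
    """Pages a full table scan must read under the same simplification."""
    per_page = rows_per_page(row_bytes)
    return -(-total_rows // per_page)  # ceiling division


# Hypothetical table of 1 million rows with two candidate row widths.
for width in (100, 400):
    print(width, rows_per_page(width), pages_for_scan(1_000_000, width))
```

Making each row four times wider roughly quadruples the pages a scan has to read, which is the reason behind the "smallest field possible" principle.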

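Notes on choosing indexes

An index is an optimization that trades space for time. It is the most important optimization a database has, and also the last resort. Whether an index is efficient depends on whether the database is well designed and the field types are sensibly chosen. An index is a double-edged sword: while it speeds up retrieval, it slows down inserts and updates (the cost of maintaining the index tree). Over the years I have interviewed several hundred people, and very few candidates could explain clearly how database indexes work.

In most cases, "index" means a BTREE index by default. BTREE is a data structure designed specifically for slow storage reads such as disk I/O. It is a multiway tree: in contrast to a binary tree, each level holds a great many elements, but the tree is very short (usually no more than three levels), so a matching element can usually be located in at most three disk I/Os. That makes BTREE very well suited to disks, and it is why BTREE is MySQL's default index type. If you master this topic thoroughly, it will be a big plus in interviews. The figure below gives a simple picture of an index in the database: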

BTREE index structure (figure)
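Why a multiway tree stays so short can be checked with a few lines of arithmetic. The sketch below is only an estimate: the fanout of 1,000 is an assumed round number (a 16 KB page holding entries of roughly 14 bytes), every level is treated as having the same fanout, and real values depend on key size and how full the pages are.

```python
def btree_levels(total_keys: int, fanout: int) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers total_keys."""
    levels, capacity = 1, fanout
    while capacity < total_keys:
        capacity *= fanout
        levels += 1
    return levels


# Assumed fanout: about 1,000 entries per 16 KB page (hypothetical round number).
FANOUT = 1000

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, btree_levels(n, FANOUT))

# A balanced binary tree (fanout 2) needs far more levels for the same data.
print(btree_levels(1_000_000, 2))
```

Under these assumptions three levels already cover a billion keys, while a binary tree needs twenty levels for a million, and each extra level is potentially one more disk read.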

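That was a brief look at index structure. When using database indexes, novice programmers can follow these principles:

  • More indexes are not always better; too many indexes reduce read/write efficiency
  • Columns with little data or low selectivity do not need an index (just as a book of only a few pages does not need a table of contents)
  • Maintain indexes regularly (remove unnecessary indexes, and keep the leftmost-prefix matching rule in mind)
  • Use full-text and hash indexes with caution, and use FORCE INDEX with caution (forcing an index interferes with the optimizer's choice of index)

There is much more to explore with indexes. For example, SHOW INDEX shows the rating the database gives an index (through the Cardinality statistic), and EXPLAIN shows whether a SQL statement hits an index. The rows column shows how many rows the statement is estimated to scan, and the Extra column shows how the index was used. Using index means the query is answered entirely from the index (no need to go back to the primary key index for the row data, known as a table lookup, and the index order may even be used directly so no sort is needed), which usually indicates good performance. Using temporary means the query uses a temporary table, which typically happens with sorting and multi-table joins; such queries are inefficient and worth optimizing.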
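The habit of checking the plan before and after adding an index can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so the output format is SQLite's EXPLAIN QUERY PLAN, not MySQL's EXPLAIN with its rows and Extra columns. The table, column, and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (email, age) VALUES (?, ?)",
    [(f"user{i}@example.com", 20 + i % 50) for i in range(1000)],
)


def plan(sql: str) -> str:
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description


query = "SELECT email FROM users WHERE email = 'user42@example.com'"

before = plan(query)  # no index on email yet, so the whole table is scanned
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)   # the lookup is now answered through the index

print("before:", before)
print("after: ", after)
```

The first plan reports a scan of the table and the second a search using idx_users_email. In MySQL the same comparison is made by running EXPLAIN on the statement and reading the key, rows, and Extra columns.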

What other pitfalls should you avoid?

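Life is full of pits. Rather than stepping into them yourself, learn from the ones others have stepped into; fewer detours may mean faster success. This is the last section and I do not want to drag the article out, so I will state the conclusions directly without explaining the reasons. If databases interest you, see the books I recommend at the end.

Avoid triggers and stored procedures

  • Logic written in stored procedures makes the code very complex, hard to understand, and hard to troubleshoot
  • They reduce database performance (the database should not execute logic other than SQL)

Avoid reserved fields

  • The field type cannot be predicted accurately
  • They increase later maintenance costs

Denormalized design

  • You need not follow the rigid three normal forms completely; break them where it helps, trading space for time
  • Planned data redundancy reduces joins and improves performance and efficiency

Avoid NULL fields where possible

  • NULL values make indexes, index statistics, and aggregate functions more complicated, and NULL also takes extra space (the database needs an extra marker)
  • The database usually performs extra logic for NULL values, which reduces database performance
  • NULL values fetched from the database easily cause program errors and add a lot of repetitive if != null boilerplate code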
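How NULL complicates aggregate functions and comparisons is easy to demonstrate. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (the NULL rules shown are standard SQL and apply to both); the orders table and coupon column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, coupon TEXT)")
conn.executemany(
    "INSERT INTO orders (coupon) VALUES (?)",
    [("SPRING",), (None,), ("SUMMER",)],  # the second order has no coupon (NULL)
)


def scalar(sql: str):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


total = scalar("SELECT COUNT(*) FROM orders")             # counts every row
with_coupon = scalar("SELECT COUNT(coupon) FROM orders")  # skips NULL values
# NULL <> 'SPRING' evaluates to NULL, not true, so the NULL row is filtered out.
not_spring = scalar("SELECT COUNT(*) FROM orders WHERE coupon <> 'SPRING'")

print(total, with_coupon, not_spring)
```

Three rows give three different answers depending on how the question is phrased. Declaring the column NOT NULL with a default such as an empty string removes that ambiguity, at the cost of deciding what the default means.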

Closing

I wrote this article over three days of spare time. It covers a lot of ground, but every topic is treated at a kindergarten, entry level, mainly to give novice programmers a simple reference. An article like this can only spark interest, as an appetizer. To build your own body of knowledge and learn the whole structure of the subject, I recommend reading the classic books; that is the right way to learn. I have not read many database books, but I can recommend two that I have read and found very good. This article draws heavily on both, and I recommend them highly:

  • "MySQL Technical Insider: InnoDB Storage Engine": This book focuses on analyzing the storage engine. It compares the performance, storage structures, and application scenarios of different storage engines side by side, and closes with the author's own views on table partitioning, indexing, constraints, and related techniques. Reading it, I came to admire the author's depth of understanding of storage engines.
  • "High Performance MySQL": This is effectively the MySQL encyclopedia. Its coverage is very comprehensive, and it is widely regarded as the bible of the MySQL field. The only drawback is its thickness: the third edition runs to almost 800 pages.

Origin blog.51cto.com/14230003/2475539