Database-Related Optimization Solutions

1 Background overview

In most development projects, and especially integration projects, some of the work involves data analysis. Most of that analysis takes the form of displaying and interacting with charts (so-called data visualization), and its speed directly affects the user experience. Because most management systems (MES, PDM/PLM, ERP, SCM, OA, HR, and so on) store their data in a database, database performance tuning is one of the most direct ways to improve the overall performance of a system, improve the user experience, and help the project pass acceptance smoothly. Optimizing the overall performance of an application requires a global view that covers hardware selection, software architecture, deployment architecture, program development, and more. This article focuses on database-level optimization methods within program development, in the hope that they will be useful to readers.

2 Intended audience

  1. Internal staff of Shutong Changlian
  2. IT practitioners in general

3 Optimization approaches

Whether it is a development project or an integration project, the ultimate goal is project acceptance, which triggers payment and keeps the company's cash flow healthy. If the program's performance is not up to standard, responses are slow and the customer experience suffers, which directly jeopardizes acceptance and in turn hinders the company's normal operation. Optimization typically proceeds along three lines: hardware selection, system software, and application programs.

3.1 Hardware Selection

Choosing a server always raises the same question: what hardware configuration should it have? In day-to-day project work, servers are divided into application servers, database servers, file servers, and other servers.

  • Application server: Used to deploy business system functions and application systems. (Recommended configuration: CPU 3.0 GHz or faster, 4 or more cores, 32 GB memory, 500 GB hard disk (RAID 10).)
  • Database server: A database server places high demands on CPU, memory, and disk. In practice, whichever of these is the weakest link becomes the performance bottleneck. (Recommended configuration: CPU 3.0 GHz or faster, 4 or more cores, 16 GB memory or more, 1 TB SSD (RAID 10).)
  • File server: A file server mainly needs high I/O throughput and large disk capacity, and comparatively little memory. (Recommended configuration: CPU 3.0 GHz or faster, 4 GB memory or more, 2 TB hard disk (RAID 5).)
  • Other servers: Configure these according to your specific needs.

Generally speaking, the higher the hardware configuration, the better the performance, but cost has to be weighed as well. As a rule of thumb, the hardware should be able to meet performance requirements for the next 3 to 5 years. Note: cloud servers are now also an option worth considering.

3.2 System software

The most common operating system choices are Linux and Windows. For server performance and security, we usually choose Linux. Server editions of an operating system are relatively stable out of the box, but the operating system configuration can still be tuned to match the performance requirements of a given project, and the Linux family offers more tuning strategies and more room for tuning. More importantly, operation and maintenance, especially remote operation and maintenance, is very convenient.

3.3 Applications

A program is measured first by its security and then by its performance, that is, its response speed. Strict confidentiality requirements apply only in some industries, whereas performance matters in every industry: better performance brings a better experience.

The ultimate technique in application optimization is usually for the program itself to support horizontal scaling. Many books cover this, and searching Baidu for "large-scale system architecture" turns up plenty of related material. Horizontal scaling is a separate topic that touches on many areas, so this article does not go into detail.

Tuning the program's base environment also has a noticeable effect on the application: for example, the JVM settings of a Java program, the number of child processes of a PHP program, or the authentication mechanism and runtime library settings of a .NET program. Base environment tuning is not the focus of this article. The following sections describe database-related optimization in detail.

4 Optimization plan

Although NoSQL has become popular, in most scenarios it only supplements the relational database. Relational databases have been the dominant back-end store for management software since they first appeared, and that has not changed. Database-level performance optimization is a conventional tuning method: a quick, short-term adjustment can pay off, and a little attention during development can improve performance considerably.

4.1 Overall strategy

We usually start from the overall strategy and optimize the database in three ways: summary queries, views, and data caching.

4.1.1 Summary query

In daily work, when the queries involved are complex, or when a third-party database has to be accessed, performance often suffers. In the third-party case this is because tables in the different databases are read at different frequencies. In this situation we usually summarize the data to be queried into an intermediate table and then query that intermediate table directly.

4.1.2 View Mode

Under normal circumstances, creating a view does not directly improve performance. However, when a query joins several tables, the join relationships are relatively complex, and the result set is accessed frequently, the SQL has to be written again every time the result set is needed unless a view exists. If a shared view is created, and its SQL is tuned before the view is created, everyone can call the same tuned query, which helps database performance.

4.1.3 Cache Mode

When a query result set only supplies data for display, rather than for interactive data operations, and does not change often, we can put the result set in a cache and read it from there. This reduces the number of database accesses and further improves the program's response time. Common caching approaches in applications are as follows:

Static cache

A static cache is usually a static HashMap variable. When data is requested, the program checks whether the Map contains it. If so, the value is returned from the Map; if not, it is queried from the database and then put into the Map.
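The lookup described above can be sketched in Java as follows. This is a minimal illustration rather than a production cache: the class name `StaticCache` and the `loader` parameter (a stand-in for the real database query) are assumptions, and it uses `ConcurrentHashMap` instead of a plain `HashMap` so that concurrent requests cannot corrupt the map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Minimal static cache: look in the map first, fall back to a loader on a miss. */
final class StaticCache {
    // Shared by the whole application; ConcurrentHashMap keeps it thread-safe.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    private StaticCache() {}

    /**
     * Returns the cached value for key. The loader (hypothetical stand-in for
     * the database query) is called only when the key is not cached yet.
     */
    static String get(String key, Function<String, String> loader) {
        return CACHE.computeIfAbsent(key, loader);
    }

    /** Drops an entry so that the next read goes back to the database. */
    static void invalidate(String key) {
        CACHE.remove(key);
    }
}
```

A cache like this never expires entries on its own, so data that does change needs an explicit `invalidate` call after each update.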

Distributed cache

A distributed cache is usually used in clustered deployments, where the application runs on several business servers and the cache is managed through Redis or Memcached.

4.2 General optimization

In database optimization, the most common and most critical part is SQL optimization. This article describes common SQL optimizations under four headings: query optimization, update optimization, transaction handling, and other notes.

4.2.1 Query optimization

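Avoid returning large data volumes to the client

Avoid returning large volumes of data to the client where possible. If the volume is too large, first consider whether the requirement itself is reasonable. If large volumes really must be returned, consider using database paging.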
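A minimal Java sketch of database paging, assuming MySQL or PostgreSQL style `LIMIT ... OFFSET` syntax (Oracle and SQL Server page differently) and an `emp` table with `id`, `name`, and `code` columns. The class and method names are illustrative only.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class Paging {
    // Assumed MySQL/PostgreSQL syntax; ORDER BY keeps the pages stable.
    static final String PAGE_SQL =
            "SELECT id, name, code FROM emp ORDER BY id LIMIT ? OFFSET ?";

    private Paging() {}

    /** Zero-based row offset for a 1-based page number. */
    static long offset(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (long) (page - 1) * pageSize;
    }

    /** Number of pages needed to show totalRows rows. */
    static long pageCount(long totalRows, int pageSize) {
        if (totalRows < 0 || pageSize < 1) {
            throw new IllegalArgumentException("invalid totalRows or pageSize");
        }
        return (totalRows + pageSize - 1) / pageSize;
    }

    /** Fetches one page of employee names instead of the whole table. */
    static List<String> fetchNames(Connection conn, int page, int pageSize)
            throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(PAGE_SQL)) {
            ps.setInt(1, pageSize);
            ps.setLong(2, offset(page, pageSize));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }
}
```

With a page size of 20, page 3 starts at offset 40, so the client only ever receives 20 rows per request.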

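Avoid * in queries

Avoid * in the SELECT clause. During parsing, the database expands * into all the column names by querying the data dictionary, which takes additional time. For example:

Select * from emp

should be: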

Select id,name,code from emp

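Use DISTINCT with care

Replace DISTINCT with EXISTS: when a query involves one-to-many tables (such as a department table and an employee table), avoid DISTINCT in the SELECT clause. EXISTS can usually be used instead and tends to be faster, because the RDBMS can return as soon as the subquery condition is satisfied. Example:

(Inefficient):

SELECT DISTINCT DEPT_NO,DEPT_NAME FROM DEPT D , EMP E WHERE D.DEPT_NO = E.DEPT_NO

(Efficient):

SELECT DEPT_NO,DEPT_NAME FROM DEPT D WHERE EXISTS ( SELECT 'X' FROM EMP E WHERE E.DEPT_NO = D.DEPT_NO);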

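UNION and UNION ALL

Replace UNION with UNION ALL where possible: when a SQL statement UNIONs two result sets, they are first merged as with UNION ALL and then deduplicated, typically by sorting, before the final result is output. With UNION ALL this step is not needed, so efficiency improves. Note that UNION ALL outputs records that appear in both result sets more than once, so whether UNION ALL is acceptable must be decided from the business requirements.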

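Notes on condition clauses

Create indexes

Creating indexes on the columns used in WHERE conditions can speed up queries. Add indexes as appropriate to primary keys, foreign keys, and columns that identify an object or entity.

Avoid NULL checks

Avoid testing columns for NULL in the WHERE clause where possible; otherwise the engine may abandon the index and perform a full table scan. For example: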

select name from system_users where id is null

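It is best not to leave NULLs in the database; declare columns NOT NULL wherever possible. Remarks, descriptions, comments, and similar columns may be nullable; for other columns, avoid NULL. Do not assume that NULL takes no space: a char(100) column has its space fixed when the column is created and occupies 100 characters whether or not a value is inserted (NULL included), whereas in a variable-length column such as varchar, NULL takes no space.

You can set a default value of 0 on id to ensure that the id column contains no NULLs, and then query like this: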

select name from system_users where id = 0

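Avoid not-equal operators

Avoid the != and <> operators in the WHERE clause where possible; otherwise the engine may abandon the index and perform a full table scan.

Avoid IN and NOT IN

Use IN and NOT IN with care as well, since they can lead to a full table scan. For example: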

select id from t where num in(1,2,3)

For consecutive values, use BETWEEN rather than IN:

select id from t where num between 1 and 3

In many cases, replacing IN with EXISTS is a good choice:

select num from a where num in(select num from b)

Replace it with the following statement:

select num from a where exists(select 1 from b where b.num = a.num)

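Avoid applying functions to columns

Avoid applying functions to columns in the WHERE clause where possible; this can cause the engine to abandon the index and perform a full table scan. For example, to find the ids of all names that start with abc:

select id from t where substring(name,1,3) = 'abc'

should be changed to:

select id from t where name like 'abc%'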

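4.2.2 Update optimization

Use batch processing for bulk updates

Avoid issuing large numbers of individual insert or delete statements at once in a program. When this situation arises, use batch processing to handle them together.

Avoid very large inserts and deletes

Both operations can lock the table, and once the table is locked other operations are blocked. A large operation should therefore be split up, using a LIMIT condition (rownum in Oracle, TOP in SQL Server).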
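The two pieces of advice above can be combined: send statements to the database in JDBC batches, but split a very large job into chunks so that each transaction stays short. The following is a minimal sketch under stated assumptions: the `emp` table, the class name `ChunkedDelete`, and the chunk size are all illustrative, and the appropriate chunk size depends on the database and workload.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class ChunkedDelete {
    private ChunkedDelete() {}

    /** Splits items into consecutive slices of at most size elements. */
    static <T> List<List<T>> chunks(List<T> items, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            out.add(new ArrayList<>(items.subList(from, to)));
        }
        return out;
    }

    /** Deletes the given ids with one JDBC batch and one commit per chunk. */
    static void deleteByIds(Connection conn, List<Long> ids, int chunkSize)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps =
                     conn.prepareStatement("DELETE FROM emp WHERE id = ?")) {
            for (List<Long> chunk : chunks(ids, chunkSize)) {
                for (Long id : chunk) {
                    ps.setLong(1, id);
                    ps.addBatch();      // queue the statement locally
                }
                ps.executeBatch();      // send the whole chunk to the database
                conn.commit();          // release locks before the next chunk
            }
        } catch (SQLException e) {
            conn.rollback();            // undo only the chunk that failed
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Committing per chunk means a failure part-way through leaves the earlier chunks applied, so this pattern suits operations that are safe to retry.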

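Notes on UPDATE

If only one or two columns change, do not update all columns; on frequent calls this causes noticeable performance overhead and generates a large volume of logs.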
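One way to follow this advice in application code is to compare the old and new values and build the UPDATE from the changed columns only. The sketch below is illustrative: the class name `PartialUpdate`, the `emp` table, and the `id` key column are assumptions, and table and column names are expected to come from code, never from user input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class PartialUpdate {
    private PartialUpdate() {}

    /** Returns only the columns whose value differs between before and after. */
    static Map<String, Object> changedColumns(Map<String, Object> before,
                                              Map<String, Object> after) {
        Map<String, Object> changed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : after.entrySet()) {
            if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    /** Builds a parameterized UPDATE that touches only the changed columns. */
    static String buildSql(String table, Map<String, Object> changed) {
        if (changed.isEmpty()) {
            throw new IllegalArgumentException("nothing to update");
        }
        List<String> assignments = new ArrayList<>();
        for (String column : changed.keySet()) {
            assignments.add(column + " = ?");
        }
        return "UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE id = ?";
    }
}
```

The values in the returned map are then bound to the `?` placeholders in iteration order, followed by the row id.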

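Avoid unconditional count(*)

select count(*) from table;

A count without any condition like this typically causes a full table scan and rarely has any business meaning, so it should be avoided.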

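4.2.3 Transaction handling

Minimize long transactions in the database

When a master table, a detail table, and a subsidiary detail table are involved, and operations on all three must succeed or fail together, a large data volume makes for a long transaction. To reduce the load on the database, we can process the three tables separately in batches.

Reduce the use of distributed transactions

Most databases support distributed transactions, which can be used for operations on tables in different databases. To reduce the performance cost, however, keep such strong-consistency requirements to a minimum and in most cases satisfy the business need with eventual consistency instead. Introducing message middleware is the conventional solution in this scenario.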

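4.2.4 Other notes

Prefer varchar and nvarchar

Use varchar/nvarchar instead of char/nchar wherever possible. Variable-length columns take less storage, and for queries, searching within a relatively small column is clearly more efficient.

Reduce the use of large columns

When defining column types, avoid large-object types such as BLOB, TEXT, LONG, and Object where possible.

Do not store files in the database

In program design and database storage, do not store image files, log files, or other files in the database. Store the file on a file server and keep its URL in the database as a reference.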

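4.3 Configuration optimization

For database connections, JDBC configuration can be further optimized by choosing a suitable driver, releasing connection-pool resources, choosing the interface that fits the scenario, and constructing read-only result sets. The following sections cover three areas in detail: connection handling, matching interfaces, and returned results.

4.3.1 Connection handling

For Java programs, Connection objects are usually managed with a database connection pool (dbcp, proxool, c3p0), which keeps the program flexible and easy to port. Note, however, that the pool does not reclaim objects for you and has a capacity limit, so idle objects in the pool should release their resources as early as possible.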
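In application code, the simplest way to release a pooled connection early is to borrow it inside a try-with-resources block, so that `close()` hands it back to the pool as soon as the work is done. The sketch below assumes the pool is exposed as a standard `javax.sql.DataSource`, as pools such as DBCP and C3P0 do; the `PooledWork` class and `SqlWork` interface are hypothetical helpers, not part of any pool's API.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class PooledWork {
    private PooledWork() {}

    /** A unit of work that needs a borrowed connection. */
    interface SqlWork<T> {
        T run(Connection conn) throws SQLException;
    }

    /**
     * Borrows a connection, runs the work, and always gives the connection
     * back, even when the work throws.
     */
    static <T> T withConnection(DataSource pool, SqlWork<T> work)
            throws SQLException {
        try (Connection conn = pool.getConnection()) {
            return work.run(conn);
        } // close() on a pooled connection returns it to the pool
    }
}
```

Keeping the borrowed connection inside one method like this makes it hard to forget the release, which matters because the pool has a fixed capacity.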

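The following briefly compares the different connection pools:

DBCP (Database Connection Pool): a Java connection pool project from Apache.

Advantages: easy to configure; supports maximum and minimum connection counts, connection wait time, and so on; stable in continuous operation; fast.

Disadvantages: no automatic reclamation of idle connections; not very stable under heavy concurrency; no connection pool monitoring.

Proxool: a Java database connection pool, an open-source project hosted on SourceForge.

Advantages: supports maximum and minimum connection counts; has monitoring.

Disadvantages: noticeable performance problems; not very stable in continuous operation.

C3P0: supported out of the box by Hibernate and Spring; implements data source and JNDI binding, and supports the JDBC3 specification and the JDBC2 standard extension.

Advantages: supports high concurrency and asynchronous operation; automatically reclaims idle connections.

Disadvantages: not as fast as DBCP.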

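4.3.2 Matching interfaces

To optimize Statement objects, choose the Statement interface that suits the scenario. For example:

Statement: takes no parameters, for example a query that needs no parameters.

PreparedStatement: supports parameterized queries, generally performs better than Statement, and prevents common SQL injection attacks, which improves security.

CallableStatement: dedicated to stored procedures. It brings all the advantages of stored procedures but also their disadvantages, such as poorer portability of the Java program and dependence on the database.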
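The difference between concatenating SQL and using a PreparedStatement can be sketched as follows. The `emp` table and the `EmpQueries` class are assumptions made for illustration; `concatenated` is included only to show what the unsafe form produces and should not be used in real code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class EmpQueries {
    // The SQL text is fixed; only the bound value changes between calls.
    static final String FIND_BY_NAME = "SELECT id FROM emp WHERE name = ?";

    private EmpQueries() {}

    /** What NOT to do: the input becomes part of the SQL text itself. */
    static String concatenated(String name) {
        return "SELECT id FROM emp WHERE name = '" + name + "'";
    }

    /** Parameterized lookup: the name is sent as data, never parsed as SQL. */
    static List<Long> findIdsByName(Connection conn, String name)
            throws SQLException {
        List<Long> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_NAME)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getLong("id"));
                }
            }
        }
        return ids;
    }
}
```

With the input `x' OR '1'='1`, the concatenated form turns into a query whose condition is always true, while the parameterized form simply searches for a name equal to that literal string.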

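4.3.3 Returned results

When optimizing result set (ResultSet) queries, note that result sets come in different types. There are two kinds: read-only and updatable. Result sets are read-only by default. In Oracle, a manual locking statement (SELECT XXX FOR UPDATE) can also be used.

When the primary key is specified explicitly and the row exists, that row is locked; if the row does not exist, nothing is locked:

SELECT * FROM products WHERE id='3' FOR UPDATE;

With no primary key, or without an explicit one in the condition, the lock may extend to the whole table:

SELECT * FROM products WHERE name='Mouse' FOR UPDATE;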

 

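5 Personal summary

Application optimization is a systems engineering effort that requires comprehensive consideration. More often than not it has to be considered in advance, so that the system architecture itself leaves room for further optimization. In system operations there is a pessimistic saying: if the system works, do not change it lightly. For system development, however, every code refactoring is an opportunity to tune the system and to strengthen its capacity for tuning. DevOps is gradually becoming popular, and development and operations are working ever more closely together, sometimes as one team with two roles. Which would you choose? Personally, I lean toward proactive tuning and embracing change, even if it brings some risk.

Whether developing the company's products or working on a project, take a global view, set standards, and carry them through into every piece of work, so that the capacity for performance tuning is guaranteed at the process level. As a technical employee of Shutong Changlian, I have summarized here the common database optimization techniques I have learned and used, to share with everyone. If you have questions about anything in this document, you are welcome to join the official Shutong Changlian technical group (299719834) to discuss.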
