Refactoring, this "little" thing | Dewu Technology

This article draws on the business-code refactoring of a web project to discuss the development problems encountered during refactoring and the points that deserve attention along the way, in the hope of providing some general references and ideas for project development and for service development and maintenance.

The refactoring of large-scale projects is not discussed here. After all, large-scale refactoring focuses more on improving and updating the architecture and on splitting the business domain, and the architecture, human resources, and cross-department coordination involved pose much bigger challenges. Besides, most developers are responsible for only one of the resulting services or modules, so the content discussed here may be a more useful reference for refactoring an individual service after such a split.

1. Background of project code refactoring

1.1 Background issues

At the beginning of 2022, our team took over a project that had been under development for five or six months and was in a period of rapid iteration. We assumed that, since the project was still young, we would get up to speed quickly and cut into the business with ease. But, as so often happens after a "but", reality diverged from the ideal. As everyone knows, projects that iterate quickly to chase deadlines accumulate problems, and the project we took over fell squarely into that category. For our first iteration after the handover we had anticipated our unfamiliarity with the codebase and prepared accordingly, yet many unexpected situations still arose. It was time to assess, once again and in advance, the problems that future iterations might run into!

1.2 The need to change the status quo

Unexpected problems in the very first iteration: that should not happen! After fixing the issues found in that version, we spent some time combing through the original code and got a general picture of how many of the interfaces were implemented. The code itself was only part of the story: the logic of many interfaces had real problems (database queries called inside loops, multiple layers of ineffective caches, cache-eviction mechanisms with poor complexity, and so on). These issues touched performance, stability, and business correctness; left unresolved, they would not only hurt the user experience but also seriously interfere with normal project iteration and maintenance! The project was written in Python and was originally built as an internal, B-side tool. As its business scenarios expanded, it had to take on more and more C-side functions, and its usage inside the Dewu App kept growing, putting extra pressure on performance and stability. We urgently needed to remove these unstable factors so that we could cut into the business and iterate normally as requirements changed.

1.3 Initial ideas for refactoring

Problems that could be fixed quickly, we fixed, verified, and released, with visible results. But many systemic problems remained in the code, keeping us from working in it comfortably, and they also needed solving urgently. At the time, the company was unifying the technology stacks across departments (business projects were being converted to Java or Go), so we decided to gradually improve performance, reduce defects, plug into the company's technical infrastructure, and lower the cost of maintaining the code. Having a preliminary refactoring idea, however, is far from having an easy task in hand: a lot of preparatory work was needed before the refactoring could begin.

2. Refactoring pre-work

2.1 Get familiar with the business process; analyze problems and pain points

Before the refactoring proper, we went through the project handover documents and the historical product documents until we knew the overall product flow well. Based on that flow, we sketched the main architecture and general technical solutions the project would use if it were built from scratch today, reviewed and refined them, and then used them as the reference target for the refactoring, effectively a target model for the overall optimization of the project. Measuring the interfaces along the closed loop of the product flow against that reference quickly surfaced a series of problems. Most of them, of course, were superficial and macroscopic; many details still had to be dug out of the project itself.

2.2 Assess the cost of refactoring and how to push it forward

By the time we decided to refactor, we had already done a basic analysis of the existing project. The next step was to compare the current implementation against the target plan and determine how to handle each part according to the difficulty and urgency of refactoring it.

Difficulty is assessed from the gap between current and target design, the difficulty of modification, stability requirements, and the scope of impact. Urgency is assessed from how often the code is reused in iterations, interface performance, exception frequency, and how many users benefit.

The handling strategy is summarized in the following table:

| Difficulty \ Urgency | Urgent | Medium | Not urgent |
| --- | --- | --- | --- |
| Simple | Refactor now | Schedule, then refactor | Reschedule for later |
| Medium | Refactor now | Schedule, then refactor | Reschedule for later |
| Hard | Refactor (or migrate and observe) | Refactor (or migrate and observe) | To be determined |

Since the project was iterating rapidly and our limited manpower had to keep up with version requirements, the time available for refactoring and migration was extremely tight. We had fixed the refactoring order of the modules, but how should we allocate people to follow through? Requirement iterations themselves touch a large number of interfaces: once a new project repository exists, any interface touched by an iteration can be migrated and refactored into the new codebase, advancing the refactoring as part of normal iteration. Beyond that, scattered spare time can push the refactoring forward bit by bit, though in that case we had to accept that a gradual refactoring would span a very long time.

2.3 Improve and determine the traffic migration plan

Most web projects share roughly the same general architecture, as shown in the following figure:

[Figure 1: general web project architecture]

Within such an architecture, and since we were switching the underlying language anyway, we could adopt the corresponding stack supported by the company's infrastructure.

The specific traffic migration plan we finally chose is as follows:

  • Initialize the new project repository and set up the release and deployment pipeline
  • Bridge user authentication between the old and new projects and make it compatible in both directions (if a gateway handles authentication uniformly, this step can be skipped)
  • Configure forwarding rules at the gateway layer, importing interface traffic into the new project with specific or general rules (without a gateway, plain Nginx forwarding works too)
  • After each batch of interfaces is refactored and migrated on the server side, switch their traffic to the new project in the test environment, test the full flow, and release once it passes
  • If an interface has to change during refactoring, push the front end to switch its calls to the new interface
  • Track traffic and the behavior of the new interfaces after going live, and roll back immediately if anything goes wrong
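As a rough sketch of the gateway-forwarding step, a minimal routing layer can be built with Go's standard `net/http/httputil`. The service addresses and the migrated prefixes below are made-up placeholders, not the project's actual configuration:

```go
package main

import (
	"fmt"
	"net/http/httputil"
	"net/url"
	"strings"
)

// Hypothetical path prefixes already migrated to the new project.
var migratedPrefixes = []string{"/api/v2/", "/api/user/profile"}

// isMigrated decides whether a request path should go to the new service.
func isMigrated(path string) bool {
	for _, p := range migratedPrefixes {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	return false
}

func main() {
	oldURL, _ := url.Parse("http://old-service:8000") // placeholder address
	newURL, _ := url.Parse("http://new-service:8080") // placeholder address
	oldProxy := httputil.NewSingleHostReverseProxy(oldURL)
	newProxy := httputil.NewSingleHostReverseProxy(newURL)
	_, _ = oldProxy, newProxy // a real gateway would call ServeHTTP on one of these per request

	// Dry run: show where sample paths would be routed.
	for _, path := range []string{"/api/v2/items", "/api/v1/items"} {
		if isMigrated(path) {
			fmt.Println(path, "-> new-service")
		} else {
			fmt.Println(path, "-> old-service")
		}
	}
}
```

The per-prefix rule list makes "migrate one batch of interfaces, then switch their traffic" a one-line config change, and rollback is just removing the prefix again.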

With all the preparation done, we were ready to go!

3. Typical problems and optimization solutions found in refactoring

3.1 Completing the basic O&M monitoring system

To improve the operations system, we first unified and integrated tracing, logging, and monitoring/alerting.

| Optimization | Problem | Benefit | Key idea |
| --- | --- | --- | --- |
| Full trace coverage | No link tracing at all | Full-link monitoring | Link tracing |
| Proper log levels | Log classification was incomplete | Leveled logs make troubleshooting easier | Logging system |
| Inject trace info into logs | Logs carried no trace and were hard to correlate | Logs can be followed along a call link | Log-trace correlation |
| Metrics | No monitoring information | Real-time visibility into the running program | Metrics |

Integrating logs, traces, and metrics is particularly valuable today: tools such as OpenTelemetry provide a complete set of specifications that developers can integrate quickly. If your infrastructure supports it, adopt all three together. If not, we recommend improving them in this order: logging, then tracing, then injecting trace information into logs, and finally metrics.

Take a look at the example below:

Trace information helps us follow each call link:

[Figure 2: call-link trace view]

Injecting trace information into logs lets us quickly filter and analyze the logs of an individual call link along the time dimension, and follow all the data of one link by its traceId:

{
    "level":"error",
    "ts":"2022-07-22T21:26:00.073+0800",
    "caller":"api/foo_bar.go:38",
    "msg":"[foo]bar",
    "error":"Post "https://xxx.com/abc/xxx": context deadline exceeded",
    "traceId":"0aee15dc63f617e751d17060xcf74b9c",
    "content":{
        "body":"json str",
        "resp":""
    }
}

The monitoring system tracks the runtime status of the business services:

[Figure 3: service monitoring dashboard]

3.2 Reorganize business logic

The business-logic design had some problems that seriously hindered development iteration, summarized below:

| Optimization | Problem | Benefit | Key idea |
| --- | --- | --- | --- |
| Move write logic out of display interfaces into write interfaces; go through a cache where unavoidable | Some display interfaces contained write logic, causing unbounded data growth and poor display performance | Less invalid data written; better query performance | Most systems read far more than they write; read interfaces should not write |
| Make counting logic independent; persist it asynchronously via a cache | Counting relied on scanning detail records, with poor performance | Relieves statistical pressure on the database; faster data display | Precomputation instead of real-time statistics; cache instead of DB queries |
| Move count fields out of the base table into their own table, cache-first | Count fields were updated in a base table whose high TPS made schema changes risky | Lower database TPS pressure; hot and cold data isolated | Hot/cold data isolation |
| Store bidirectional associations in a dedicated table instead of a plain JSON string | The storage format could not serve queries in both directions, and writing updates was painful | Fewer cross-cutting updates; better query performance | Design tables properly (third normal form); simplify bidirectional models; manage associations independently |
| Overhaul the unreasonable cache system, gradually streamlining cache structures and data | The cache design was entangled with many long processes and could not be migrated in one shot | Invalid caches removed; eviction simplified; better cache utilization | Choose cache structures sensibly; use proper eviction strategies |

Across the business flow, from create/read/update/delete in the admin backend to display and data collection on the client side, a single problem can easily affect the whole chain, and during the transition the problems above kept impacting the overall business process, which is why they deserve special attention and are listed separately! These points were also the hardest to transform, so we refactored them gradually according to their urgency and the overall progress of the sorting work.

3.3 Detail-level problems solved during code refactoring

While refactoring individual interfaces and task scripts, we also collected some typical problems from the earlier development, organized as follows:

| Optimization | Problem | Benefit | Key idea |
| --- | --- | --- | --- |
| Optimize database-query network I/O | List interfaces issued database network I/O in a loop | Lower RT across list interfaces | Minimize network I/O |
| Trim interface return fields | Interfaces returned many unused fields | Less noise, less traffic | Reduce network bandwidth usage |
| Unify return-value data types | Type misuse typical of a weakly typed language | Consistent types that are easy to manage | Strong typing |
| Optimize recursive category-tree generation; single query, refactored for reuse | Looped multi-level database queries; generation algorithm complexity too high | Unified tree generation with integrated filtering rules | O(n) generation; guard recursion so dirty data cannot cause infinite loops |
| Split non-strongly-related logic | Logically separate business was coupled into one interface | Different modules stripped into separate interfaces | Separation of concerns |
| Fix language-level problems | e.g. mutating a SET while iterating it; data-format mismatches (Python) | A stabler main flow; the async answer-submission process no longer gets interrupted | Data and syntax compatibility |
| Extract shared logic into reusable methods | The same logic was scattered, inconsistent, partly missing, and hard to maintain | Unified, reusable logic; lower maintenance cost | Reuse |
| Optimize offline data synchronization | Full sync was unbatched and could not resume from a breakpoint | Capped batch sizes; sync resumes after interruption | Batching, upper limits, breakpoint compensation |
| Optimize batch-import processing | Import logic compared associations via repeated queries instead of a map | Lower time complexity; faster processing | O(1) map lookups for batch comparison |
| Redis slow-query optimization; clean up or split large keys | The wrong structure or operation was chosen, e.g. reading a whole set with smembers to check one element instead of using sismember | After switching to sismember, RT>50ms queries dropped by about 5/s, RT>100ms requests all but disappeared, and throughput clearly improved | Data-structure complexity; Redis's single-threaded model blocks |
| Optimize cache eviction | Some caches had no eviction; SCAN-based cleanup in code was hard to manage and slow | Redis's automatic expiry used by design | Use proper eviction mechanisms |
| SQL index tuning | Inefficient indexes; cross-index interference | Better query performance | Index tuning; covering indexes |
| Optimize async consumer scripts | Idempotency, transactionality, and compensation of MQ consumers were unverified | No message loss, no duplicates, compensation in place | Idempotency, transactions |
| Migrate configuration to the config center | Some configuration was hardcoded | Unified configuration; no sensitive information in code | Per-environment isolation; hide secrets |
| Remove producer-only queues | Queues with producers but no consumers grew without bound | Production and consumption connected; useless queues removed; space reclaimed | A queue that is produced to must be consumed |

The typical problems above have been distilled and summarized, so the list looks finite; in the codebase, however, each problem could appear several or even dozens of times, so the code had to be refactored across the board.

Here are some practical examples:

  1. Optimizing the looped network I/O of database queries in list interfaces
// The code itself is omitted; a brief explanation instead:

// The ORM was used carelessly: an extra GET method queried the associated
// table independently, and the list interface called it inside a loop,
// turning one logical query into looped database network I/O.

// With ORM preloading, the queries are collapsed and the data assembled in
// a second pass. Alternatively, query the list yourself, traverse it once to
// extract the foreign keys, batch-load the associated rows into a map, and
// traverse the list again to assemble the result.
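A minimal Go sketch of the map-assembly alternative described above; the `Article`/`Author` shapes and the batch loader are hypothetical stand-ins for the real ORM queries:

```go
package main

import "fmt"

// Hypothetical rows standing in for the two tables involved.
type Article struct {
	ID       int
	AuthorID int
}
type Author struct {
	ID   int
	Name string
}

// authorNames resolves the author name for each article with ONE batch
// query instead of one query per article (the N+1 pattern): collect the
// foreign keys, fetch the related rows in a single batch, index them in a
// map, then assemble the result in a second pass over the list.
func authorNames(articles []Article, fetchAuthors func(ids []int) []Author) map[int]string {
	idSet := map[int]struct{}{}
	for _, a := range articles {
		idSet[a.AuthorID] = struct{}{}
	}
	ids := make([]int, 0, len(idSet))
	for id := range idSet {
		ids = append(ids, id)
	}
	byID := map[int]string{}
	for _, au := range fetchAuthors(ids) { // one batch query, e.g. WHERE id IN (...)
		byID[au.ID] = au.Name
	}
	out := map[int]string{}
	for _, a := range articles {
		out[a.ID] = byID[a.AuthorID]
	}
	return out
}

func main() {
	articles := []Article{{1, 10}, {2, 11}, {3, 10}}
	fetch := func(ids []int) []Author { // stands in for a single batched DB query
		return []Author{{10, "alice"}, {11, "bob"}}
	}
	fmt.Println(authorNames(articles, fetch)) // map[1:alice 2:bob 3:alice]
}
```

The same shape works whether the batch load is an ORM preload or a hand-written `IN` query; the point is that network round trips stay constant as the list grows.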

  2. Batching when processing large amounts of data
// Process large data sets in batches, for four reasons: memory usage,
// I/O traffic, bounded per-chunk processing time, and interruption recovery.
// Example: fetch data from the database in batches, then upload each batch somewhere.
// Note: paged batching must keep each page's data stable while it runs,
// otherwise records will be missed or duplicated.
page := 1
pageSize := 10000
for {
    // Check for an existing checkpoint and restore the query conditions from it.

    // Stand-in for the queried result list.
    dataList := make([]itemStruct, 0, pageSize)

    // Handle the business logic for this batch here.
    // ...

    // After finishing the batch, record a checkpoint.
    // Checkpoints can be stored in the database, Redis, or another medium.

    // If the page is empty or short of pageSize, the data is exhausted.
    if len(dataList) < pageSize {
        break
    }
    // Move the query condition on to the next page.
    page++
}

  3. Before-and-after comparison of a slow query caused by a large key on a master-replica Redis

[Figure 4: slow-query metrics before optimization]

[Figure 5: slow-query metrics after optimization]
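The smembers-to-sismember change behind this comparison is an instance of a general rule: check membership against an index instead of shipping the whole collection and scanning it client-side. A plain-Go analogy (not Redis client code):

```go
package main

import "fmt"

// containsScan is what SMEMBERS + a client-side compare amounts to:
// the whole set crosses the network and each check costs O(n).
func containsScan(members []string, target string) bool {
	for _, m := range members {
		if m == target {
			return true
		}
	}
	return false
}

// containsSet is what SISMEMBER gives you: an O(1) membership check done
// against the index, with no need to transfer the whole set.
func containsSet(index map[string]struct{}, target string) bool {
	_, ok := index[target]
	return ok
}

func main() {
	members := []string{"a", "b", "c"}
	index := map[string]struct{}{}
	for _, m := range members {
		index[m] = struct{}{}
	}
	fmt.Println(containsScan(members, "b"), containsSet(index, "b")) // true true
}
```

With a large key, the O(n) variant also blocks Redis's single thread and floods the network, which is exactly the slow-query pattern shown in the figures.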

  4. Making proper use of Redis's expiry mechanism (replacing SCAN-based key elimination)

[Figure 6: SCAN-based key elimination replaced by automatic expiry]

  5. Moving sensitive data to the configuration center (for example, this hard-coded configuration needed to be migrated)

[Figure 7: hard-coded sensitive configuration]
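One more sketch, for the category-tree row in section 3.3: a flat parent-id list can be assembled into a tree in O(n) without recursive database queries. The `Category` shape here is illustrative, not the project's actual model:

```go
package main

import "fmt"

type Category struct {
	ID       int
	ParentID int // 0 marks a root node
	Name     string
	Children []*Category
}

// buildTree indexes all nodes by ID in one pass, then attaches each node to
// its parent in a second pass: O(n) overall, with a single upfront query for
// the flat list. A node whose parent is missing or self-referential is simply
// skipped, so dirty data cannot send the build into an infinite loop the way
// naive per-level recursion can.
func buildTree(flat []Category) []*Category {
	byID := make(map[int]*Category, len(flat))
	for i := range flat {
		n := flat[i] // copy, so each node gets its own address
		n.Children = nil
		byID[n.ID] = &n
	}
	var roots []*Category
	for _, src := range flat {
		n := byID[src.ID]
		if n.ParentID == 0 {
			roots = append(roots, n)
		} else if p, ok := byID[n.ParentID]; ok && p != n {
			p.Children = append(p.Children, n)
		}
	}
	return roots
}

func main() {
	flat := []Category{
		{ID: 1, ParentID: 0, Name: "root"},
		{ID: 2, ParentID: 1, Name: "a"},
		{ID: 3, ParentID: 1, Name: "b"},
		{ID: 4, ParentID: 3, Name: "c"},
	}
	roots := buildTree(flat)
	fmt.Println(len(roots), roots[0].Name, len(roots[0].Children)) // 1 root 2
}
```

Filtering rules (hidden categories, permission checks, and so on) can be applied during either pass without changing the O(n) bound.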

4. Summary of Refactoring Experience

4.1 Outcome of Refactoring

After nearly a year of gradual refactoring, we have completed more than 80% of the functional modules, resolved the vast majority of the problems listed above, and achieved very good improvements, mainly in the following respects:

  • Trace, log, metric, and alerting are fully in place across the project and now guide maintenance and optimization
  • Business stability has improved markedly, from constant error-fixing right after the handover to almost no issues now
  • No more looped database query I/O; slow queries are essentially eliminated except in report-style interfaces
  • The performance of key interfaces improved noticeably: some backend interfaces went from extreme RTs above 10s to under 1s, and the p99 RT of the main C-side interfaces is within 150ms (this is not a high-concurrency design scenario, to be fair)
  • Reusable logic has been consolidated as much as possible, previously inconsistent business logic unified, and maintenance difficulty clearly reduced
  • The long-chain data-submission flow no longer loses data
  • Satisfaction among business users has improved

Due to resource constraints, we have not yet finished all of the refactoring; if resources allow, we will continue to push the remaining work forward.

4.2 Skill requirements for developers

We also analyzed what this refactoring demanded of the developers involved. When everyone on the team is reasonably experienced, the problems listed above can be discovered and fixed through self-motivation and experience alone, without extra training, written standards, or correction. Less experienced developers need appropriate summaries and standards to guide their work. Distilled from the problems above, the required skills come down to roughly these:

  • Fluency in the project's language, to avoid basic language-level mistakes
  • A solid understanding of databases, enough to design reasonable data structures for the situation at hand and optimize them appropriately
  • Familiarity with the common middleware and some of its internals, to avoid pitfalls that come from knowing only half the story and are then hard to debug
  • A solid computer-science foundation: operating systems, algorithms, and data structures, enough to write efficient code
  • An understanding of high-concurrency, high-availability architecture, so its basic ideas can guide development and optimization

4.3 Standards, enforcement, and review

Beyond the skills above, defining and enforcing standards also matters a great deal. Unlike skills, though, this is a top-down process that needs a degree of management support. Standards define the boundaries of behavior; once sound and complete standards exist, simply following them prevents the vast majority of problems. A review system then helps the standards land in practice, avoiding the situation where standards exist on paper but practice drifts away from them. And a design that is good today may, after enough time, business change, and user growth, need a new round of optimization or refactoring; that too is normal.

5. Making refactoring a "small" thing

5.1 Break the work into stages to make the current task "small"

Taken as one tangled whole, a full refactoring process can be daunting. But by analyzing it concretely and decomposing it into stages, the work becomes a series of "small" tasks that can be solved step by step in a manageable way. Decomposition is not mindless splitting, though: an overall architectural design is still needed, or the refactoring may miss its goal, taking longer while achieving less. Quality during refactoring can be guaranteed through standards, mandatory lint checks, reviews, unit tests, and performance tests, and these must be enforced at every stage.

5.2 A few guiding ideas for development

There are of course many problems our project never ran into, which we cannot enumerate here. But by keeping a few guiding ideas in mind during development, we can design and build complete, high-performance systems. For example:

  • Minimal maintenance burden: a complete system design with concise, efficient logic and algorithms
  • Minimal RT per interface: interfaces should perform well, with RT as low as possible
  • Minimal data exchange: keep request parameters and return values lean to save network bandwidth
  • Asynchrony, peak-shaving, rate limiting: strip auxiliary logic into asynchronous paths to improve interface performance and user experience
  • Minimal access frequency: know the performance tiers of the hardware and of network I/O, cut long-latency I/O, and move work into the program where possible
  • Minimal data writes: trim writes to reduce traffic and the creation of invalid data
  • Caching and cache consistency: multi-level caches, cache consistency, and eviction strategies
  • Divide and isolate: this applies to resources (offloading images and files to object storage and CDN) and equally to business modules (define business boundaries and split modules and flows sensibly)
  • High availability: focus on stability; the overall flow should run stably and without error, with degradation and disaster recovery considered in advance

5.3 Upgrading skills and accumulating experience

Mastering the ideas above does not mean you can rest easy, of course. For example, the project we refactored had already used asynchronous processing before, but only as a last resort: the interfaces were so poorly designed that asynchrony was the only way to hide their latency, and the long asynchronous chains caused data inconsistencies of their own. After refactoring the interfaces, we began removing those asynchronous tasks and turning the flows into transactional ones, which works clearly better than before. So we still need to deepen our grasp of computer fundamentals, broaden and improve our development knowledge, systematically master the design of high-availability and high-concurrency systems, and keep accumulating and summarizing development experience, so that we can program and develop with confidence.

 


This article is an original work of Dewu Technology. Source: Dewu Technology official website.

Dewu Technology articles may be shared and forwarded freely, but please be sure to credit the copyright and source: Dewu Technology official website.


Origin my.oschina.net/u/5783135/blog/8676985