Amid the epidemic, the struggle continues

Foreword

Some friends asked why I haven't posted recently. I really have been busy lately, though it's not that I had no time; I just didn't feel like updating (a good excuse). In April I transferred internally within the company and took charge of two new teams, so things have been fairly busy. In this article, let's talk about some recent takeaways.

New starting point, new goal

To be honest, at first I wanted to refactor, but ideals are rosy and reality is harsh, and several major problems stood in the way.

  1. The development process is not standardized, and the existing monitoring, alerting, and logging are incomplete.

  2. There is a serious shortage of manpower: several business projects run in parallel, and the existing headcount is not enough to support them. However, some things will never change without action.

What preparations I made

  1. Goal planning: spend two months resolving system stability issues.


  2. Stability assurance tasks, broken down by priority:

| Level | Task | Description | Status |
| --- | --- | --- | --- |
| 01 | Development and release process standards | Prevent human error from causing release problems: code standards, CR standards (CR before testing), release process standards, and items to confirm during the grayscale period after release. | Completed |
| 02 | Existing Grafana monitoring improvements | Improve the monitoring metrics; some current metrics are not user-friendly. | Completed |
| 03 | Error and timeout log handling | With many error logs and frequent alarms, alarms lose their meaning and real ERRORs get buried (the goal is zero ERROR). | Completed |
| 04 | Project GC tuning | Switch the garbage collector to G1, tune its parameters, and support dynamic adjustment by operations. | Completed |
| 05 | Mapping the core order-dispatching flow | Draw a flowchart of the core business process and identify possible performance bottlenecks in preparation for later optimization. | Completed |
| 06 | Inventory of basic data sources | For example MySQL, Redis, MongoDB, and MQ data sources and their usage scenarios; ongoing work. | In progress |
| 07 | Grafana monitoring and alarm environment isolation | Separate the offline environments (development, testing, pre-production) from the online environment; both have monitoring and DingTalk alarms. | Completed |
| 08 | Dispatching project interface optimization | Slow interfaces, slow SQL, and partial process rework. | In progress |
| 09 | Dynamic thread pool | Support dynamic configuration and thread pool monitoring, with metrics such as core thread count, maximum thread count, active thread count, queue backlog, and task processing time. | In progress |
| 10 | Sorting out the core dispatch configuration | The current configuration is fairly chaotic and needs to be written up in documentation; ongoing work. | In progress |
| 11 | Additional monitoring metrics | Redis and MySQL connection pool monitoring, MQ production and consumption monitoring, and business metric monitoring. | Not scheduled |
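
To make item 09 a bit more concrete, here is a minimal sketch of what a dynamic thread pool could look like. This is not our project code: the class name `DynamicPoolSketch` and its methods are hypothetical, and the trigger for a resize (for example a config-center listener) is assumed rather than shown. It relies only on the standard `java.util.concurrent.ThreadPoolExecutor` API, and it leaves out the task processing time metric to stay short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a pool that can be resized at runtime and sampled
// for a monitoring dashboard. Names are illustrative only.
class DynamicPoolSketch {

    private final ThreadPoolExecutor executor;

    DynamicPoolSketch(int coreSize, int maxSize, int queueCapacity) {
        // Bounded queue so a backlog shows up as a metric instead of growing forever.
        this.executor = new ThreadPoolExecutor(
                coreSize, maxSize, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(queueCapacity));
    }

    void submit(Runnable task) {
        executor.execute(task);
    }

    // Apply new sizes at runtime (assumed to be called when configuration changes).
    synchronized void resize(int newCore, int newMax) {
        if (newCore < 0 || newMax <= 0 || newCore > newMax) {
            throw new IllegalArgumentException("invalid pool sizes");
        }
        // Order matters: the executor rejects any state where core > max.
        if (newMax >= executor.getCorePoolSize()) {
            executor.setMaximumPoolSize(newMax); // grow the ceiling first
            executor.setCorePoolSize(newCore);
        } else {
            executor.setCorePoolSize(newCore);   // shrink the floor first
            executor.setMaximumPoolSize(newMax);
        }
    }

    // One sample of the metrics listed in the table, ready to export.
    Map<String, Number> snapshot() {
        Map<String, Number> metrics = new LinkedHashMap<>();
        metrics.put("corePoolSize", executor.getCorePoolSize());
        metrics.put("maximumPoolSize", executor.getMaximumPoolSize());
        metrics.put("activeCount", executor.getActiveCount());
        metrics.put("queueSize", executor.getQueue().size());
        metrics.put("completedTaskCount", executor.getCompletedTaskCount());
        return metrics;
    }

    void shutdown() {
        executor.shutdown();
    }
}
```

In use, a scheduled job would call `snapshot()` periodically and push the values to the monitoring system, while `resize(8, 16)` would be called whenever the configured sizes change. Processing time could be added by overriding `beforeExecute` and `afterExecute` on the executor.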

Much of this was done in spare time. After a little over a month, the changes are visible to the naked eye. The road is long and there is a lot to do, but I hope things improve a little every day. Some things will never change if you don't take action.

Look back in another month, and it will be a different picture.

What we are doing now

At present, part of the infrastructure has been improved, and refactoring is on our agenda.

In follow-up posts, I will share the bits and pieces of our refactoring.

Currently planned:

  1. CMS system refactoring

  2. Passenger queuing system refactoring

  3. Internal communication rework (switching internal calls to RPC)

  4. Fence system refactoring

Stay tuned!

One more thing: refactoring is not the same as rewriting, and refactoring must proceed in step with the business.

Summary

As a technical person, you still need some technical ambition in order to go higher and further.

Origin blog.csdn.net/weixin_38130500/article/details/124791655