Monitoring and decision making

1. What do we need to monitor in the production environment?

  1. Resource monitoring: monitoring the health of the system infrastructure, including network and server nodes. It covers network connectivity and congestion, CPU load, and the usage of memory and external storage space.

  2. Application monitoring: monitoring the runtime health of the application. For example: whether the application process exists, whether it can serve external requests normally, whether it has functional defects, whether it can connect to the database, whether requests time out, whether any service throws exceptions or raises alarms, and whether it can scale out in time to cope with a sudden surge of requests.

  3. Business monitoring: monitoring the health of business indicators. For an e-commerce website, for example, these include but are not limited to real-time user visits, views of specific pages, conversion rate, order volume, and transaction volume.
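The three layers above can be sketched as a single list of checks, each tagged with its layer. This is only an illustrative sketch: the `Check` class, the metric names, and all threshold values are assumptions made up for the example, not part of any real monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One monitored value; all names and thresholds here are hypothetical."""
    layer: str                    # "resource", "application", or "business"
    name: str
    value: float
    threshold: float
    higher_is_worse: bool = True  # False for metrics that should stay high

    def healthy(self) -> bool:
        if self.higher_is_worse:
            return self.value <= self.threshold
        return self.value >= self.threshold


def unhealthy_by_layer(checks):
    """Group the names of failing checks by monitoring layer."""
    result = {}
    for check in checks:
        if not check.healthy():
            result.setdefault(check.layer, []).append(check.name)
    return result


checks = [
    Check("resource", "cpu_load_percent", 92.0, 85.0),
    Check("resource", "disk_used_percent", 40.0, 90.0),
    Check("application", "request_timeout_rate", 0.002, 0.01),
    Check("business", "orders_per_minute", 3.0, 10.0, higher_is_worse=False),
]
report = unhealthy_by_layer(checks)
print(report)
```

In this sample, the CPU load exceeds its limit and the order volume falls below its floor, so the report flags one resource check and one business check while the application layer stays clean.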

2. What does the data monitoring process look like?


  1. Collection and reporting: collect predefined event data locally and report it.

  2. Data collation: gather, clean, and organize the data reported by each data source.

  3. Real-time analysis: analyze and process real-time data.

  4. Offline analysis: extract models or rules from large volumes of data.

  5. Result output: display the results of real-time and offline analysis for decision-making reference.

  6. Problem decision-making: based on the output of the previous step, decide on the next action, either manually or automatically, and save a record of the decision as a basis for later decisions.

  7. Data storage: offline storage of raw data, analysis results, and processing records.

  8. Automatic repair: the interface to the operations execution system. It sends repair instructions to the execution system, which distributes them to the corresponding nodes and performs the corresponding operations.
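The flow of these steps can be sketched in a few small functions. This is a minimal sketch under stated assumptions: the record fields (`service`, `status`), the error-rate rule, the 0.5 threshold, and the `restart` command are all hypothetical, and a plain Python list stands in for the operations execution system.

```python
from collections import Counter


def collate(raw_events):
    """Data collation: drop malformed records and normalize field values."""
    cleaned = []
    for event in raw_events:
        if not isinstance(event, dict) or "service" not in event or "status" not in event:
            continue
        cleaned.append({
            "service": event["service"].strip().lower(),
            "status": int(event["status"]),
        })
    return cleaned


def analyze(events):
    """Real-time analysis: error rate per service over the current batch."""
    total, errors = Counter(), Counter()
    for event in events:
        total[event["service"]] += 1
        if event["status"] >= 500:
            errors[event["service"]] += 1
    return {service: errors[service] / total[service] for service in total}


def decide(error_rates, threshold=0.5):
    """Problem decision-making: pick an action per service and keep a record."""
    return [
        {"service": s, "error_rate": r, "action": "restart" if r >= threshold else "none"}
        for s, r in sorted(error_rates.items())
    ]


def dispatch(decisions, send):
    """Automatic repair: forward repair instructions to the execution system."""
    for decision in decisions:
        if decision["action"] != "none":
            send({"target": decision["service"], "command": decision["action"]})


# Collection and reporting: events as reported by hypothetical data sources.
raw = [
    {"service": "Cart ", "status": 500},
    {"service": "cart", "status": 503},
    {"service": "search", "status": 200},
    {"bad": "record"},
    {"service": "search", "status": 200},
]

sent = []                      # stand-in for the operations execution system
store = {"raw": raw}           # data storage: raw data, analysis, records
store["analysis"] = analyze(collate(raw))
store["decisions"] = decide(store["analysis"])
dispatch(store["decisions"], sent.append)
print(sent)
```

Keeping the decision records in `store` alongside the raw data mirrors the point made in step 6: past decisions become input for later ones.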

3. What information does the data format contain?

It usually contains two types of information: basic information and extended information.

Basic information describes the most basic application context and covers the 4 Ws:

  1. Who (which user or service)

  2. When (what time)

  3. Where (at what location)

  4. What (what happened)

Extended information makes the data more extensible so that it can meet the monitoring and statistical needs of different businesses. It is usually defined, parsed, and used by each business team.
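One possible shape for such a record is sketched below. The class name `MonitoringEvent`, the field names, and the sample values are assumptions chosen for illustration; the source only specifies that basic information carries the 4 Ws and that extended information is left to each business team.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class MonitoringEvent:
    """Hypothetical event record: four basic fields plus free-form extras."""
    who: str      # which user or service
    when: str     # timestamp, here an ISO 8601 string
    where: str    # node, page, or other location
    what: str     # the event that happened
    extra: dict = field(default_factory=dict)  # owned by each business team

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


event = MonitoringEvent(
    who="order-service",
    when=datetime(2022, 12, 1, 8, 30, tzinfo=timezone.utc).isoformat(),
    where="node-17",
    what="order_created",
    extra={"order_amount": 59.9, "currency": "CNY"},
)
decoded = json.loads(event.to_json())
print(decoded)
```

Keeping the extended fields in a separate `extra` mapping lets the shared pipeline parse the basic fields without knowing anything about a particular team's schema.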

4. How do we measure the capability of a data monitoring system?

It can be measured along 3 dimensions:

  1. Correctness: how consistent the collected data is with the facts.

  2. Comprehensiveness: whether the collected data is sufficient to support the team's decisions.

  3. Timeliness: whether the time from when data is produced to when it can support a decision is short enough.
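As a rough illustration, each dimension can be turned into a number. The formulas below are one possible reading, not definitions from the source: correctness as the share of known facts that the collected data matches, comprehensiveness as the share of required fields that are present, and timeliness as the worst delay between occurrence and availability. All function names and sample values are hypothetical.

```python
def correctness(collected, actual):
    """Share of known facts whose collected value matches the actual value."""
    if not actual:
        return 1.0
    matches = sum(1 for key, value in actual.items() if collected.get(key) == value)
    return matches / len(actual)


def comprehensiveness(collected_fields, required_fields):
    """Share of the fields needed for decisions that are actually collected."""
    required = set(required_fields)
    if not required:
        return 1.0
    return len(required & set(collected_fields)) / len(required)


def timeliness(occurred_at, available_at):
    """Worst delay, in seconds, between an event and its availability."""
    return max(done - start for start, done in zip(occurred_at, available_at))


actual = {"orders": 120, "visits": 4000, "payments": 95, "refunds": 2}
collected = {"orders": 120, "visits": 3990, "payments": 95, "refunds": 2}

correct = correctness(collected, actual)
complete = comprehensiveness(
    {"orders", "visits"},
    {"orders", "visits", "conversion_rate", "payments"},
)
delay = timeliness([0.0, 10.0, 20.0], [2.0, 15.0, 21.0])
print(correct, complete, delay)
```

In practice the actual values needed for the correctness check would come from an independent source of truth, such as the order database, which is itself an assumption of this sketch.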

Learn more: https://t.zsxq.com/08AGFfCK3


Origin blog.csdn.net/XinLiangTalk/article/details/128295690