A summary of SLA, SLO, SLI and related concepts in SRE

  • SLA = Service Level Agreement = Service Quality/Level Agreement

  • SLO = Service Level Objective = Service Quality/Level Objective

  • SLI = Service Level Indicator = Service Quality/Level Indicator

The explanation below is organized around people, what they do, and the objects involved.

People and activities

The roles below are described in top-to-bottom, left-to-right order.

Customers - Every customer uses the product service under a certain SLA, whether explicit or implicit. The SLA and the customer have a 1-to-1 document relationship, and this agreement document exists in the system either explicitly or implicitly. Customers use 1 to n connection methods to access 1 to n application systems of the product service.

Sales - The SLA is itself part of the product being sold, and it specifies the functionality and quality promised to the customer. Based on the SLA, customers can choose to use the product on a paid or free basis. The sale of a single SLA can be handled by 1 to n salespeople. Both sales and customers hope for a near-perfect SLA, so that sales (representing the company's interests) and the product's customers reach a win-win outcome.

Product - Through indirect interaction via sales, or through direct customer research, product managers determine the functions the application system should have and the direction of its development.

SRE - SRE and product jointly define the SLO of each application system covered by an SLA. The SLO quantitatively defines the service quality that each application system should deliver, and it is recorded in the SLO document of the product service. In this document, each SLO is mapped to 1 to n SLIs. Each SLI requires monitoring tools to collect data continuously, and the SLIs usually have different units. All SLOs are expressed as percentages, e.g. a 99.99% success rate, or 90% of requests with latency < 400 ms. SRE and product managers/experts should also pay attention to the infrastructure layer that runs the application system, to ensure that its availability and capacity are sufficient for the target number of user visits. They should also consider and design for complex scenarios such as disaster recovery and cross-region active-active deployment of the underlying resources.
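
To make the SLO-to-SLI mapping concrete, here is a minimal Python sketch. The Request type, the function names, and the sample data are all hypothetical and only illustrate the idea; real SLI data would come from a monitoring tool. It computes two SLIs (success rate and the share of requests under 400 ms) as percentages and compares each with its SLO target.

```python
from dataclasses import dataclass


@dataclass
class Request:
    ok: bool            # True if the response was successful
    latency_ms: float   # observed latency in milliseconds


def success_rate(requests):
    """SLI: percentage of requests that succeeded."""
    return 100.0 * sum(r.ok for r in requests) / len(requests)


def fast_rate(requests, threshold_ms=400.0):
    """SLI: percentage of requests with latency below threshold_ms."""
    return 100.0 * sum(r.latency_ms < threshold_ms for r in requests) / len(requests)


def slo_met(sli_percent, target_percent):
    """An SLO is met when the measured SLI reaches the target percentage."""
    return sli_percent >= target_percent


# Hypothetical sample data, for illustration only.
sample = [
    Request(True, 120.0),
    Request(True, 380.0),
    Request(True, 450.0),
    Request(False, 90.0),
]

print("success rate SLO met:", slo_met(success_rate(sample), 99.99))
print("latency SLO met:", slo_met(fast_rate(sample), 90.0))
```

Both SLIs end up as percentages even though the raw measurements (a boolean and a duration) have different units, which is what lets them be compared with SLO targets in the same way.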

Dev/Ops - Important, but not discussed here.

Objects

The items below are described from bottom to top.

IaaS cloud service - This can also be any other type of environment in which application systems can run. It consists of 1 to n sub-services, and usually has an n-to-n relationship with the n application systems in the layer above.

Application system - 1 to n application systems make up a product service (including its SLA) and deliver the business value of that product service through interaction with customers.

Documentation - Describes, as web pages or on paper, the service content and quality that an application service provides to users. Providing this documentation to users is not mandatory, and it does not have to be explicit.

SLI

Service Level Indicator, abbreviated SLI: a measured indicator of the level of service. It is the indicator that matters most to the business. For example, a common SLI for websites is the percentage of requests that get a good response.

SLO

Service Level Objective, abbreviated SLO: the target built around an SLI. It is usually a percentage tied to a time frame, such as a month, a quarter, or a year, and it is usually expressed as a number of nines. Without a time frame, an SLO has little meaning.

90% (one nine of uptime): 10% downtime, or 3 days in the last 30 days.
99% (two nines of uptime): 1% downtime, or 7.2 hours in the last 30 days.
99.9% (three nines of uptime): 0.1% downtime, or 43.2 minutes in the last 30 days.
99.95% (three and a half nines of uptime): 0.05% downtime, or 21.6 minutes in the last 30 days.
99.99% (four nines of uptime): 0.01% downtime, or 4.32 minutes in the last 30 days.
99.999% (five nines of uptime): 0.001% downtime, or about 26 seconds in the last 30 days.
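
The figures above follow from simple arithmetic: allowed downtime = window length × (100% - SLO). A minimal Python sketch, assuming a 30-day window (the function name is only illustrative):

```python
def allowed_downtime_minutes(slo_percent, window_days=30):
    """Downtime budget in minutes for a given SLO over a time window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_percent) / 100.0


for target in (90.0, 99.0, 99.9, 99.95, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% -> {minutes:.2f} minutes of downtime per 30 days")
```

Changing window_days shows how the same percentage yields a different budget for a quarter or a year, which is why an SLO needs a time frame.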

SLA

Service Level Agreement, abbreviated SLA: an agreement that a company issues around its SLOs. It specifies how customers are compensated when an SLO is not met.

Example

Suppose I have a website, http://eample.com , and the indicator I monitor for it is the number of requests that receive a normal response. From its launch on January 1, 2021 until today, March 18, 2021, the request data is as follows:

In January, there were 500 requests in total, 20 of which returned errors.

In February, there were 600 requests in total, 10 of which returned errors, and the site was down for 10 minutes due to a failure.

From March 1 to March 18, there were 400 requests in total, 15 of which returned errors.

So what are the SLI, SLO, and SLA in this case?

SLI: 1 - (20 + 10 + 15) / (500 + 600 + 400) = 97%
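
The same SLI arithmetic as a short Python sketch, using the request counts from the example (the variable names are only illustrative):

```python
# (total requests, error responses) per period, taken from the example above
periods = {
    "January": (500, 20),
    "February": (600, 10),
    "March 1-18": (400, 15),
}

total_requests = sum(total for total, _ in periods.values())
error_responses = sum(errors for _, errors in periods.values())

# SLI: fraction of requests that received a normal response
sli = 1 - error_responses / total_requests
print(f"SLI = {sli:.2%}")
```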

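SLO: the measured availability over the period is 1 - 10 minutes / (77 days * 24 * 60 minutes) = 99.991%, where 77 days is the span from January 1 to March 18, 2021 (31 + 28 + 18). This is the value to compare against the SLO target.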
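
As a cross-check, the availability can be computed from the dates themselves rather than a hand-counted number of days. This is a minimal sketch; the function name is hypothetical, and counting both the launch day and today as full days is an assumption.

```python
from datetime import date


def availability(start, end, downtime_minutes):
    """Fraction of time the service was up between start and end (inclusive)."""
    days = (end - start).days + 1      # assumption: count both end days in full
    total_minutes = days * 24 * 60
    return 1 - downtime_minutes / total_minutes


measured = availability(date(2021, 1, 1), date(2021, 3, 18), downtime_minutes=10)
print(f"availability = {measured:.3%}")
print("meets 99.99% target:", measured >= 0.9999)
print("meets 99.999% target:", measured >= 0.99999)
```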

SLA: If we build the website for a third party and sign an agreement that specifies how much compensation is due when the SLO does not reach 99.999%, then the amount of compensation is calculated from the measured value above and the terms of the signed SLA.
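
The compensation step can be sketched in Python as follows. The credit tiers, the monthly fee, and the function name are entirely hypothetical; a real SLA defines its own thresholds and amounts.

```python
# Hypothetical credit schedule: (minimum availability, credit as a fraction of the fee).
# These numbers are made up for illustration and are not from any real agreement.
CREDIT_TIERS = [
    (0.99999, 0.00),   # SLO met: no compensation
    (0.9999, 0.10),
    (0.999, 0.25),
]
DEFAULT_CREDIT = 0.50  # below every tier


def service_credit(measured_availability, monthly_fee):
    """Return the compensation owed for the measured availability."""
    for floor, credit in CREDIT_TIERS:
        if measured_availability >= floor:
            return monthly_fee * credit
    return monthly_fee * DEFAULT_CREDIT


# 99.991% measured against a 99.999% target, with a hypothetical fee of 1000
print(service_credit(0.99991, monthly_fee=1000.0))
```

Listing the tiers from the strictest threshold down means the first match is always the best tier the measured value qualifies for.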

Origin blog.csdn.net/jackyrongvip/article/details/129223992