It is not enough for operations to focus on business continuity; two other aspects are also important

Original source: Jiawei Blue Whale official account

This article was contributed by a member of Jiawei's internal staff.

After studying for and passing the Lean (Lean IT Leadership), Agile (Scrum Master), and DevOps (DevOps Master) certifications, I came away with new ideas about O&M management and O&M work itself. This article discusses three stages: ensuring business continuity, agile delivery of business value, and improving employee satisfaction. It also shares implementation ideas for each stage for your reference.

1. The ultimate goal of operation and maintenance management - BVSSH

One of the most important responsibilities of operation and maintenance management is ensuring business continuity: keeping the systems it operates, and therefore the business, running safely and stably. In the VUCA digital era, however, focusing only on business continuity is no longer enough. The new era brings a new requirement, BVSSH (Better Value Sooner Safer Happier): delivering higher-quality (Better) value (Value) to customers faster (Sooner) and more safely (Safer), while keeping both customers and employees satisfied (Happier).

▲ Figure from the book "Sooner Safer Happier: Antipatterns and Patterns for Business Agility"

  • Better: quality, for example fewer production incidents, shorter disaster recovery times, fewer product defects, and less rework;
  • Value: business value, for example higher revenue, higher profit, and more customers;
  • Safer: continuous compliance with GRC, that is, governance, risk, and compliance;
  • Sooner: end-to-end delivery efficiency, which is also at the core of lean and agile. Common metrics include lead time, flow efficiency, and throughput;
  • Happier: employee and customer satisfaction, achieved through a more humane and engaging way of working.

In this article, we divide the development of operation and maintenance management into 3 stages:

The first stage: ensure business continuity (Safer), keeping the business running safely and stably;

The second stage: deliver business value with agility (Sooner), responding quickly to market changes;

The third stage: improve employee satisfaction (Happier), raising both employee and customer satisfaction.

 

2. Phase 1: Guaranteeing Business Continuity (Safer)

This stage means ensuring the safe and stable operation of the business through end-to-end management: prevention before an incident, control during it, and follow-up afterwards.

1. How to measure business continuity?

Commonly used metrics for business continuity include MTTR/MTBF, SLA/OLA, and RTO/RPO, which are described as follows:

MTTR/MTBF:

  • Mean Time To Repair (MTTR): the average time from the moment the system fails to the moment recovery is complete;
  • Mean Time Between Failures (MTBF): the average time between two consecutive failures of the system.
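As a rough illustration of the two definitions above, the following sketch computes MTTR and MTBF from a list of incident records. The incident data is invented for the example, and the choice to measure the gap between failures from one failure's start to the next failure's start is an assumption; some teams measure uptime between recovery and the next failure instead.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (failure time, recovery time). Not real data.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 11, 10, 0), datetime(2024, 1, 11, 11, 0)),
    (datetime(2024, 1, 31, 10, 0), datetime(2024, 1, 31, 11, 30)),
]


def mttr(records):
    """Average time from failure to completed recovery."""
    total = sum((end - start for start, end in records), timedelta())
    return total / len(records)


def mtbf(records):
    """Average gap between consecutive failures (start to start, by assumption)."""
    starts = sorted(start for start, _ in records)
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return sum(gaps, timedelta()) / len(gaps)


print(mttr(incidents))  # 1:00:00
print(mtbf(incidents))  # 15 days, 0:00:00
```

With only a handful of incidents the averages are noisy, so in practice these figures are usually tracked over a longer window.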

SLA / OLA :

  • Service Level Agreement (SLA): an agreement or contract between a service provider and its customers on delivering services that meet customer expectations. An SLA addresses business-level requirements and manages business expectations, for example how long a service interruption the business should expect in the event of an outage;
  • Operational Level Agreement (OLA): a commitment or agreement that a service provider establishes for its internal customers in order to meet an SLA. OLAs track internal service commitments such as incident response times, issues assigned to IT groups, and the availability of servers that support multiple applications.

RTO/RPO:

  • Recovery Time Objective (RTO): the target maximum time allowed between the moment a service is interrupted and the moment it is restored to normal;
  • Recovery Point Objective (RPO): the maximum amount of data loss that can be tolerated. RPO is expressed as a length of time, measured back from the loss event to the most recent backup.
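To make the difference between the two objectives concrete, here is a minimal sketch that checks a single outage against an RTO and an RPO. The target values, timestamps, and function names are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical targets, not taken from any standard.
RTO = timedelta(hours=1)      # service must be restored within 1 hour
RPO = timedelta(minutes=15)   # at most 15 minutes of data may be lost


def meets_rto(outage_start, service_restored, rto=RTO):
    """True if the actual recovery time stayed within the RTO."""
    return service_restored - outage_start <= rto


def data_loss_window(last_backup, outage_start):
    """Time span of data written after the last backup and lost in the outage."""
    return outage_start - last_backup


def meets_rpo(last_backup, outage_start, rpo=RPO):
    """True if the data loss window stayed within the RPO."""
    return data_loss_window(last_backup, outage_start) <= rpo


# Example outage (invented): backup at 01:30, failure at 02:00, restored at 02:40.
last_backup = datetime(2024, 5, 1, 1, 30)
outage_start = datetime(2024, 5, 1, 2, 0)
service_restored = datetime(2024, 5, 1, 2, 40)

print(meets_rto(outage_start, service_restored))  # True: 40 minutes <= 1 hour
print(meets_rpo(last_backup, outage_start))       # False: 30 minutes > 15 minutes
```

In this example the service came back in time, but the backup interval was too long for the stated RPO, which is why the two objectives are tracked separately.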

2. How to ensure business continuity?

There are mature standard systems and practices in the industry for business continuity management and assurance. I will not elaborate here, but only list the international standard ISO 22301 business continuity management system and the Google SRE service reliability hierarchy model.

The ISO 22301 business continuity management system helps an enterprise establish an integrated set of management processes. It enables the enterprise to identify and analyze potential disasters, assess the threat that possible disruptions pose to its operations, and put in place effective management mechanisms to prevent or neutralize those threats and reduce the losses that disaster events cause.

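The book "SRE: Google Operation and Maintenance Decryption" (the Chinese edition of "Site Reliability Engineering: How Google Runs Production Systems") proposes a seven-layer service reliability hierarchy. From the base upward, the layers are: monitoring, incident response, postmortem and root cause analysis, testing and release procedures, capacity planning, development, and product.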

 

3. Phase 2: Agile Delivery of Business Value (Sooner)

This stage means optimizing for a fast, efficient flow of value: responding quickly to market changes, iterating rapidly, gathering feedback and learning, and delivering value to customers.

1. How to measure business delivery efficiency?

To measure the end-to-end delivery efficiency (Sooner) of the business, this article uses three key lean metrics:

① Lead Time: the end-to-end time from the user's request to the final delivery of value to the customer. A shorter lead time promotes rapid feedback and learning;

② Flow Efficiency: working time (such as software development, testing, and deployment) divided by lead time, expressed as a percentage. The opposite of working time is waiting time (such as waiting for process approval). Note that flow efficiency focuses on the "work", while resource utilization focuses on the "people". Improving flow efficiency means identifying and removing obstacles in the process and limiting work in progress, rather than making people work more;

▲ Figure from the book "Sooner Safer Happier: Antipatterns and Patterns for Business Agility"

③ Throughput : Throughput is the count of valuable items delivered to customers within a given time.
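The three metrics can be sketched in a few lines of code. The `WorkItem` structure and the sample data below are hypothetical; the sketch also assumes that working time is recorded per item in days, which many tracking tools do not capture directly.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WorkItem:
    requested: date       # when the user raised the request
    delivered: date       # when value reached the customer
    working_days: float   # days of active work (the rest is waiting)


def lead_time_days(item):
    """End-to-end time from request to delivery, in days."""
    return (item.delivered - item.requested).days


def flow_efficiency(item):
    """Share of the lead time that was spent actively working."""
    return item.working_days / lead_time_days(item)


def throughput(items, start, end):
    """Number of items delivered within the period [start, end]."""
    return sum(1 for item in items if start <= item.delivered <= end)


# Invented sample data.
items = [
    WorkItem(date(2024, 3, 1), date(2024, 3, 21), working_days=4),
    WorkItem(date(2024, 3, 5), date(2024, 3, 15), working_days=5),
    WorkItem(date(2024, 3, 10), date(2024, 4, 9), working_days=3),
]

for item in items:
    print(lead_time_days(item), f"{flow_efficiency(item):.0%}")
print(throughput(items, date(2024, 3, 1), date(2024, 3, 31)))
```

In this sample the third item has the longest lead time and the lowest flow efficiency, which suggests that it spent most of its time waiting rather than being worked on.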

2. How to improve business delivery efficiency?

Lean value stream mapping helps a company see the entire delivery process from the customer's perspective. It also encourages holistic thinking, avoids local optimization, and improves end-to-end delivery efficiency. It can be carried out in the following three steps:

1) Use value stream mapping to determine which type of value stream to optimize

  • Main value stream: the whole process from the user's request to delivery to the customer, for example the full "requirements-design-development-test-deploy-operation" flow;
  • Value stream segment: a part of the main value stream, for example software design, coding, functional testing, or production deployment;
  • Support value stream: provides support for value delivery. Typical examples are recruitment, employee training, and budget processing.

2) Identify and eliminate unnecessary non-value-added activities

  • Value-added activities: activities that directly create value for customers, such as feature development;
  • Necessary non-value-added activities: activities that do not directly create customer value but are still required, such as change planning, process approvals, and staff training;
  • Unnecessary non-value-added activities: waste, which should be eliminated first. The most obvious waste is waiting. In a hospital, for example, patients queue to register, queue to see a doctor, queue to pay, and queue to collect medicine. In IT, waiting for process approvals and for resource procurement is common.

3) Identify and remove bottlenecks

Value stream thinking is customer-centric, and identifying bottlenecks requires holistic thinking. Optimizing a stage or step is only valuable if it is the bottleneck of the entire value stream; otherwise it is merely a local optimization. Here we take "application release" as an example:

[Figure: three scopes of application release automation: release execution (green), release process (blue), and the whole process (gray)]

As shown in the figure above, the application release automation projects we see in enterprises can be roughly divided into three categories:

① Release execution automation (green part in the figure above): manual release steps are handed over to tools, automating the release operation. This undoubtedly makes the release process more standardized and consistent. But consider: can automating release execution significantly improve end-to-end delivery efficiency? Not necessarily. If release planning, scheduling, and approval take 2 months, cutting "release execution" from 1 hour of manual work to 5 minutes of automated execution adds little value. It is like the hospital example: shortening each consultation from 3 minutes to 2 minutes saves 1 minute, which is a drop in the bucket next to the half day the patient spends waiting.

② Release process automation (blue part in the figure above): automating the process from the initial release request to the closure of the release plan can significantly improve the efficiency of the application release process. At the very least, the operation and maintenance team will clearly notice the improvement. But consider: will automating the release process greatly improve the end-to-end delivery efficiency of business value? Still not necessarily, unless the release process is the bottleneck of the whole flow.

③ Whole-process automation (gray part in the figure above): automating the entire flow from requirement to customer can significantly improve delivery efficiency, shorten time to launch, and enable rapid feedback and iteration. Optimizing the whole process requires a shift from departmental, siloed thinking to holistic, global thinking.
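The arithmetic behind the release execution example can be checked with a short sketch. It assumes that "2 months" means 60 calendar days and reduces the release to just two stages; both are simplifications made for illustration, and the stage names are invented.

```python
def end_to_end_hours(stage_hours):
    """Total elapsed time across all stages, in hours."""
    return sum(stage_hours.values())


def bottleneck(stage_hours):
    """Name of the stage that consumes the most elapsed time."""
    return max(stage_hours, key=stage_hours.get)


# Assumption: 2 months of planning, scheduling, and approval = 60 days.
before = {
    "plan, schedule and approve": 60 * 24,
    "release execution": 1.0,          # 1 hour of manual work
}
# Only the execution step is automated: 1 hour becomes 5 minutes.
after = {**before, "release execution": 5 / 60}

saved = 1 - end_to_end_hours(after) / end_to_end_hours(before)

print(bottleneck(before))
print(f"End-to-end time saved: {saved:.3%}")
```

Under these assumptions the end-to-end time shrinks by well under 0.1 percent, because the stage that was automated is not the bottleneck.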

4. Phase 3: Improving Employee Satisfaction (Happier)

More and more enterprises are paying attention to customer success. Customer success comes from customer satisfaction, and customer satisfaction in turn depends on the satisfaction of internal employees.

1. How to measure employee satisfaction?

Net Promoter Score (NPS), sometimes called a word-of-mouth index, measures how likely a customer is to recommend a company or service to others. NPS can be used to measure loyalty to a product or service as well as employee loyalty. Tracking it closely helps a business spot problems and improve.

Net Promoter Score is also relatively simple to use. Ask employees a question that they answer with a score from 0 to 10, for example "How likely are you to recommend this company to friends and colleagues?" Respondents are then divided into 3 categories according to their score:

  • Promoters (score 9-10): strongly loyal people who keep recommending the company or product to others;
  • Passives (score 7-8): generally satisfied but not enthusiastic, and unlikely to recommend the company or product to others;
  • Detractors (score 0-6): dissatisfied or not loyal to the company, will not recommend the company or product to others, and may even speak badly of it.

Net Promoter Score (NPS) = (Number of promoters / Total number of respondents) × 100% - (Number of detractors / Total number of respondents) × 100%
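The formula above translates directly into code. The survey scores below are invented for illustration, and the function name is just a label chosen for this sketch.

```python
def net_promoter_score(scores):
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not scores:
        raise ValueError("at least one survey response is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers from ten employees.
scores = [10, 9, 9, 8, 7, 7, 6, 5, 3, 10]
print(net_promoter_score(scores))  # 10.0 (4 promoters, 3 detractors, 10 responses)
```

Because passives are included in the denominator, a team with many lukewarm responses can have an NPS near zero even with few detractors.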

2. How to improve employee satisfaction?

Of course, there are many ways to improve employee satisfaction. This article lists three ideas: enterprise service management (ESM), empowering operation and maintenance personnel, and applying lean continuous delivery practices.

Idea 1: Enterprise Service Management (ESM)

Enterprise Service Management is the practice of applying IT service management to other areas of a business or organization, including but not limited to HR, finance, legal, administrative, marketing, procurement, and security teams, with the aim of improving efficiency, service delivery, and user experience. In short, it applies what works well in IT Service Management (ITSM) to the entire enterprise.

▲ Figure from the BMC official website

According to BMC's official website, one of the six benefits of applying ESM is higher user satisfaction. Because processes help define roles and responsibilities, internal users have clearer expectations about their requests. Happy internal users in turn influence external customers, who will notice the improvement too.

An Enterprise Service Management (ESM) platform has at least 4 core capabilities:

  • Self-service portal: a user self-service portal that works across devices (mobile phones, tablets, and PCs), so that users can request the services they need on demand. The user experience design of the portal is therefore very important;
  • Knowledge base: users can search for and solve common problems by themselves. This places new demands on capturing knowledge and standardizing the knowledge base;
  • Automatic delivery: after the user submits a service request, back-end systems are scheduled automatically to deliver the service quickly;
  • Service orchestration: service processes for different scenarios can be assembled and orchestrated quickly through a drag-and-drop, no-code or low-code process orchestration engine.

Idea 2: Empower operation and maintenance personnel to reduce their sense of crisis

In the 2022 "Friends of Time" New Year's Eve speech, Luo Pang quoted a line from the "Global Human Capital Trend Report": "Enterprises should be responsible for the viability of employees". In the cloud-native era, infrastructure has moved to the cloud, resource delivery has been automated, and O&M operations are increasingly handled by tools. So what does the future hold for O&M personnel? Operation and maintenance teams and IT departments need to give employees a platform for career development and training that empowers them. As an operation and maintenance person myself, I see at least 3 directions to take:

  • Operation and maintenance development: build unified, standardized, and automated operation and maintenance tools, capture repetitive manual operations and experience in the tool platform, and improve operation and maintenance efficiency;
  • Operation and maintenance manager: coordinate and lead the operation and maintenance team to shift left, join project teams early, and discuss non-functional requirements and operability with them;
  • Operation and maintenance expert: the "Dinghaishenzhen" (the stabilizing pillar) of the operation and maintenance technology stack, for example a DBA.

Besides meeting the needs of operation and maintenance itself, the tools that the team develops can also indirectly empower testers (for example, provisioning test environment resources) and developers (for example, log queries), and help put operation and maintenance standards into practice.

Idea 3: Apply Lean Management and Continuous Delivery Practices

The book "Accelerate", which draws on years of research, found that lean management practices, software development practices, continuous delivery, and cultural change all affect employee satisfaction. For details, read the original book.

▲ Figure from the book "Accelerate: Building and Scaling High Performance Technology Organizations"

This article summarizes three stages of operation and maintenance work, based on many years of personal work experience and my own reflection. The first and second stages focus more on the "work" itself, while the third stage needs to return to being "people-oriented", relying on Lean, Agile, and DevOps thinking to keep employees happy and customers successful.

Origin my.oschina.net/u/4026796/blog/5527539