Learning from Disaster

Brother Yun (Veteran's Notes) 20200202

Some people ask: why couldn't we learn the lessons of SARS in 2003? Why didn't people wear goggles? Why didn't we know that the virus stays active in stool? Didn't we already know all this seventeen years ago?

 

1

Not every industry or organization manages to draw the corresponding lessons, even though we all normally learn from the successes and mistakes of others.

A recent example: the no-threshold coupon incident at Jingdong on January 8, 2020 was exactly the same as the no-threshold coupon incident at Pinduoduo on January 20, 2019.

In the former, Jingdong's self-operated small appliance category was included in the applicable scope of a 200-yuan no-threshold coupon for as long as fifty minutes. In the latter, an expired operations campaign was mistakenly brought back online in the early hours, and the "wool party" (bargain hunters who exploit such promotions) feasted all night.

Unfortunately, this kind of tragedy may well recur once a year.

 

2

I once wrote:

Neatly stored in the trunk, they still remind us, who are so good at forgetting, of human folly.

- Zheng Yun, "Those Mistakes We Made Together Over the Years"

The aviation industry and the medical industry have very different attitudes toward error. The aviation industry is more willing to face its errors. Pilots are generally open and candid about their own mistakes, partly because those mistakes can cost them their own lives. The industry has a strong, independent body responsible for investigating crashes. A failure is not treated as grounds for blaming a particular pilot, but as a valuable opportunity for all pilots, airlines, and managers to learn and improve.

In the medical profession, relatively speaking, it is mostly the patients who die, and there is also KPI pressure to publish papers, so the industry is conservative. But SARS and the current pneumonia outbreak both threaten the safety of front-line health care workers, so they may give the profession's associations some push.

 

3

Among the management methods of the Toyota Production System there is one called personnel autonomy:

Personnel autonomy is the organic coordination of people and machinery. When a problem of quality, quantity, or variety arises on the production line, the equipment stops automatically and displays an indication, and anyone who discovers a fault has the right to stop the line immediately, take the initiative to troubleshoot, and resolve the problem. At the same time, quality management is built into the production process and becomes the autonomous behavior of every employee, turning all work into effective labor.

It emphasizes bottom-up management: the rules for executing and monitoring a process are worked out by the front-line workers through discussion and mutual learning, rather than handed down by a leader as a job specification. To some extent this is the same as in aviation: if errors are treated with an open attitude, the whole system can learn and improve.

 

4

People who often read my articles know that in our daily work, when handling incidents, we always follow the aerospace industry's twenty-character formula: accurate localization, clear mechanism, reproducible problem, effective measures, and drawing inferences to similar cases.

We insist that every error be investigated and every error be corrected. Every mistake is written up and personally told to every new employee. We face errors head on and share the technical details openly with everyone. In the long run, every incident becomes our wealth, the team's inheritance and family asset.

Standard format of an RCA report:

Background

Symptoms

(Optional) Scope of impact

Cause of the problem

Analysis process

(Optional) Solution

Follow-up steps: for example how to repair dirty data in production, how to compensate affected users, etc.

(Optional) RCA type and lessons learned: for example code issues, implementation issues, configuration issues, design issues, testing issues
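
One way to keep reports consistent is to encode the format above as a small data structure, so that a missing required section is caught by a program rather than by a reviewer. This is only a minimal sketch, not the team's actual tooling: the class name `RcaReport`, its field names, and the `missing_sections` helper are hypothetical, and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Sections that are not marked "(Optional)" in the report format.
REQUIRED_SECTIONS = (
    "background",
    "symptoms",
    "cause",
    "analysis_process",
    "follow_up_steps",
)


@dataclass
class RcaReport:
    """Hypothetical container mirroring the RCA report sections."""
    background: str
    symptoms: str
    cause: str
    analysis_process: str
    follow_up_steps: str
    # Sections marked "(Optional)" default to None.
    impact_scope: Optional[str] = None
    solution: Optional[str] = None
    rca_type_and_lessons: Optional[str] = None

    def missing_sections(self) -> List[str]:
        """Return the names of required sections left blank."""
        return [
            name for name in REQUIRED_SECTIONS
            if not getattr(self, name).strip()
        ]


# Invented sample report with the cause left blank.
report = RcaReport(
    background="Coupon campaign changed during a release",
    symptoms="Orders placed with a coupon outside its intended scope",
    cause="",
    analysis_process="Compared campaign settings before and after the release",
    follow_up_steps="Repair dirty order data; compensate affected users",
)
print(report.missing_sections())  # ['cause']
```

A check like this could run before a report is accepted into the case base, so incomplete reports are sent back automatically.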

 

But even with an RCA system and a case base, our daily work will still have oversights; after all, people are not machines. So what should we do?

 

5

Do it like this:

First, in a professional field, do not let laymen lead!

I have found that many managers do not understand this. They always feel that management can control everything in the world. Nonsense!

Second, invest more people and resources in the system's internal strength. This helps a project's long-term inheritance: tooling, generalization, standardization, automation, digitization. People in medicine, IT, and other industries should read two books: "The Checklist Manifesto" and "Black Box Thinking".

When facing security, audit, quality control, and the like, prefer to solve the problem with "machines" (note: used here in a general sense) rather than by adding process steps and intermediate nodes.

Chen Hao of CoolShell said that technical debt must not be owed and has to be repaid ruthlessly. With many things, once there is a first time, there is no end to it. Once something has rotted, whoever comes later can only let it rot further, and the more rotten it gets, the less anyone dares to repay the debt.

So any process that is repeated again and again must be turned into a tool, with the process automated inside it, to reduce the unnecessary mental burden on front-line employees.
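
As a sketch of letting a "machine" do the audit, the check below refuses to publish a coupon campaign that has already expired or whose no-threshold face value exceeds a limit, the two kinds of slip described in section 1. It is an illustration only: `CouponCampaign`, `validate_campaign`, the 50-yuan limit, and the sample campaign are all assumptions made for the example, not how Jingdong or Pinduoduo actually work.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Assumed policy for this example: a no-threshold coupon above this
# face value must not go live without a separate sign-off.
MAX_NO_THRESHOLD_FACE_VALUE = 50  # yuan, hypothetical limit


@dataclass
class CouponCampaign:
    name: str
    face_value: int    # yuan taken off the order
    min_spend: int     # 0 means "no threshold"
    ends_at: datetime


def validate_campaign(campaign: CouponCampaign, now: datetime) -> List[str]:
    """Return the reasons the campaign must not be published (empty if none)."""
    problems = []
    if campaign.ends_at <= now:
        problems.append("campaign has already expired")
    if (campaign.min_spend == 0
            and campaign.face_value > MAX_NO_THRESHOLD_FACE_VALUE):
        problems.append("no-threshold face value exceeds the limit")
    return problems


# Made-up campaign that trips both checks.
now = datetime(2020, 1, 8, 9, 0)
risky = CouponCampaign("small appliances", face_value=200, min_spend=0,
                       ends_at=datetime(2019, 12, 31))
for problem in validate_campaign(risky, now):
    print(problem)
```

The point of the design is that the check runs on every publish, so it does not depend on someone remembering last year's incident.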

Third, hold regular disaster drills.

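In recent years the industry has taken up the idea of Chaos Engineering, the discipline of experimenting on a distributed system in order to build the capability, and the confidence, to withstand turbulent conditions in production. It was first proposed by Netflix and related teams. Its core idea is that the best way to reduce failures is to make failures happen often. By repeating the failure process again and again, the system's fault tolerance and resilience keep improving. Alibaba's corresponding open-source chaos engineering tool is called ChaosBlade, and it is built specifically for fault injection.

For example, Alibaba frequently runs network-outage and power-outage drills and surprise exercises in production.

If you have never practiced switching data centers in a multi-site active-active setup, you cannot expect everyone to move in step and switch traffic and data centers in an orderly way when disaster strikes.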
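
A failover drill of this kind can be illustrated without any real infrastructure. The sketch below is a toy model, not ChaosBlade or any Netflix tool: `DataCenter`, `Router`, and `run_failover_drill` are hypothetical names. The drill deliberately takes one site down (the injected fault), checks that requests are still served by another site, and then restores the site.

```python
from typing import List


class DataCenter:
    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Router:
    """Sends each request to the first data center that answers."""

    def __init__(self, sites: List[DataCenter]) -> None:
        self.sites = sites

    def route(self, request: str) -> str:
        for site in self.sites:
            try:
                return site.handle(request)
            except ConnectionError:
                continue  # fail over to the next site
        raise RuntimeError("no healthy data center left")


def run_failover_drill(router: Router, victim: DataCenter) -> bool:
    """Inject an outage into `victim` and report whether traffic survived."""
    victim.healthy = False  # the injected fault
    try:
        router.route("drill-request")
        return True
    except RuntimeError:
        return False
    finally:
        victim.healthy = True  # always restore the site after the drill


dc_a, dc_b = DataCenter("dc-a"), DataCenter("dc-b")
router = Router([dc_a, dc_b])
print(run_failover_drill(router, dc_a))  # True
print(router.route("order-1"))           # dc-a served order-1
```

Running such a drill on a schedule, rather than once, is what turns the failover path into something that people and systems have actually practiced.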

 

-EOF-

Origin www.cnblogs.com/zhengyun_ustc/p/12286049.html