[Notes] Two root cause analysis methods: 5WHY and 10WHY

What is root cause analysis?

Root cause analysis: investigate where a problem occurred and why, then identify the measures needed to prevent the error or accident from recurring, thereby improving service safety and quality.

Root cause analysis goals

  • Problem (what happened)
  • Reason (why it happened)
  • Action (what can stop the problem from happening again)

WHY-WHY analysis (5WHY, 5W)

5W analysis is a diagnostic technique for identifying and laying out the chain of cause-and-effect relationships that leads from a problem to its root cause. Keep asking why the previous event happened, and do not stop until the answer is "no good reason" or a new failure mode is discovered. Stating the root cause explicitly is what makes it possible to prevent the problem from recurring. The specific steps are as follows:

1. Grasp the status quo

Step 1: Identify the problem

  • What do I know? (big, vague or complex issue -> detailed facts)

Step 2: Clarify the problem

  • What is actually happening?
  • What should be happening?

Step 3: Break down the problem

  • What else do you know?
  • Are there other sub-problems?

Step 4: Locate the point of cause (PoC)

  • Where do I need to go?
  • What do I need to see?
  • Who might have information about the problem?

Step 5: Grasp the tendencies of the problem

  • Who?
  • Which?
  • When?
  • How often?
  • How much?
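
The tendency questions above largely come down to tallying incident records along a few dimensions. A minimal sketch, assuming a hypothetical list of incident records; the field names `who`, `component` and `hour` and all values are illustrative assumptions, not part of the original notes:

```python
from collections import Counter

# Hypothetical incident records; field names and values are assumptions.
incidents = [
    {"who": "team-a", "component": "checkout", "hour": 2},
    {"who": "team-a", "component": "checkout", "hour": 3},
    {"who": "team-b", "component": "search", "hour": 14},
    {"who": "team-a", "component": "checkout", "hour": 2},
]


def tendencies(records, field):
    """Count how often each value of `field` appears, most frequent first."""
    return Counter(record[field] for record in records).most_common()


for field in ("who", "component", "hour"):
    print(field, tendencies(incidents, field))
```

A value that dominates one of these tallies is a hint about where to look in Step 6, not a cause in itself.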

2. Cause investigation

Step 6: Identify and confirm the immediate cause of the anomaly

  • Why did the problem occur?
  • Can you see the direct cause of the problem?
  • If not, what is the suspected cause?
  • How can I verify the most likely underlying cause?
  • How can I confirm the immediate cause?

Step 7: Use the 5WHY investigation method to build a chain of cause/effect relationships leading to the root cause

  • Can addressing the immediate cause prevent recurrence?
  • If not, can I see the next-level cause?
  • If not, what do I suspect the next-level cause to be?
  • How can I verify and confirm the next-level cause?
  • Will addressing this level of cause prevent recurrence?

If not, keep asking "why" until the root cause is found.
Stop at the cause that must be addressed to prevent recurrence, and ask:

  • Have I found the root cause of the problem?
  • Can I prevent it from happening again by dealing with this cause?
  • Can the cause be linked to the problem through a chain of fact-based cause/effect relationships?
  • Does the chain pass the "therefore" test?
  • If I ask "why" again, do I end up in a different problem?

Also use 5WHY to answer these questions:

  • Why do we have this problem?
  • Why did the problem reach the customer/user? (Why was it not detected?)
  • Why does our system allow problems to occur?
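
The cause/effect chain from Step 7 and its "therefore" test can be sketched in a few lines. This is only an illustration under stated assumptions: the `Why` class, both function names and the example incident are hypothetical and do not come from the original notes.

```python
from dataclasses import dataclass


@dataclass
class Why:
    """One link in the chain: the answer to a single "why?"."""
    cause: str
    verified: bool  # True only if confirmed by facts, not merely suspected


def therefore_test(problem, chain):
    """Read the chain backwards, from root cause to problem, joined by "therefore"."""
    steps = [why.cause for why in reversed(chain)] + [problem]
    return ", therefore ".join(steps)


def is_fact_based(chain):
    """The chain only counts if every link has been verified."""
    return all(why.verified for why in chain)


# Illustrative example, not taken from the notes.
problem = "checkout requests failed"
chain = [
    Why("the service ran out of database connections", True),
    Why("connections were not released on error paths", True),
    Why("the review checklist has no item for resource cleanup", False),
]

print(therefore_test(problem, chain))
print("fact-based:", is_fact_based(chain))
```

If the sentence printed by `therefore_test` does not read as sound reasoning, a link is missing or wrong. An unverified link, like the last one here, is still a suspicion and needs confirming before it is treated as the root cause.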

3. Problem correction

Step 8: Take clear steps to deal with the problem

  • Use temporary measures to deal with anomalies until the root cause can be addressed.
  • Implement corrective actions to address the root cause to prevent recurrence.
  • Track and verify results: Does the solution work? How can that be confirmed?
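
One simple way to track results is to check whether the same problem shows up again after the corrective action was deployed. A minimal sketch, assuming incident dates are available; the function name `recurrences` and the dates are hypothetical:

```python
from datetime import date


def recurrences(fix_date, incident_dates):
    """Return the incidents dated on or after the fix, oldest first."""
    return sorted(d for d in incident_dates if d >= fix_date)


# Hypothetical data for illustration only.
fix_deployed = date(2024, 3, 1)
incident_dates = [date(2024, 2, 10), date(2024, 2, 24), date(2024, 3, 5)]

after_fix = recurrences(fix_deployed, incident_dates)
if after_fix:
    print("problem recurred on:", [d.isoformat() for d in after_fix])
else:
    print("no recurrence observed so far")
```

An empty result over a short window is weak evidence. The observation period should be long enough to cover how often the problem used to occur.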

4. Prevention through the "error prevention" process

  • Take explicit steps to ensure that the problem does not recur, typically through an "error prevention" process.
  • Remember the lessons learned.

10WHY problem analysis

  • 1w: What is the problem? What is its impact?
  • 2w: Why does this problem occur? In what scenarios does it occur?
  • 3w: At what stage was the problem discovered? Could it have been discovered earlier?
  • 4w: At what stage was the defect introduced?
  • 5w: Why was the problem introduced at this stage?
  • 6w: (how) How can introducing this problem be avoided?
  • 7w: At what stage should the problem have been discovered?
  • 8w: Why was the problem not found at that stage?
  • 9w: (how) How can the problem be found at that stage?
  • 10w: (how) How can such product risks be predicted in advance through the risk testing process?
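
The ten questions work well as a checklist in an incident review. A minimal sketch of such a template, assuming answers are collected as plain strings keyed by the 1w..10w labels; the question wording is paraphrased from the list above, and `TEN_WHY`, `unanswered` and the sample answers are hypothetical:

```python
# Questions paraphrased from the list above; the keys follow its 1w..10w labels.
TEN_WHY = {
    "1w": "What is the problem and what is its impact?",
    "2w": "Why does it occur, and in what scenarios?",
    "3w": "At what stage was it discovered? Could it have been earlier?",
    "4w": "At what stage was the defect introduced?",
    "5w": "Why was it introduced at this stage?",
    "6w": "How can introducing it be avoided?",
    "7w": "At what stage should it have been discovered?",
    "8w": "Why was it not found at that stage?",
    "9w": "How can it be found at that stage?",
    "10w": "How can such risks be predicted in advance?",
}


def unanswered(answers):
    """Return the labels of questions that still lack a non-empty answer."""
    return [label for label in TEN_WHY if not answers.get(label, "").strip()]


# Hypothetical, partially filled review.
answers = {"1w": "Orders were charged twice", "4w": "Development", "5w": "  "}
for label in unanswered(answers):
    print(label, TEN_WHY[label])
```

Whitespace-only answers count as unanswered, so a review is complete only when `unanswered` returns an empty list.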

Ideas for improvement/optimization measures

  1. How to avoid the problem?
    1. What can be optimized in the process?
    2. Can human-caused errors be avoided?
  2. If it cannot be avoided, at which stage is it easiest to detect? How can timely detection be ensured at that stage?
    1. Before release - are there tools/mechanisms in place to detect issues in time (e.g. code scanning)? Are there release test cases?
    2. Gray release - monitoring, feedback mechanism?
    3. Online - monitoring, data, feedback mechanism, feedback channel?
  3. Can the problem handling process be optimized?
    1. Effectiveness - were the decisions made during resolution correct? What was the decision-making process like?
    2. Efficiency - how fast was the handling? Where can efficiency be improved?
  4. Empathy: what would I do if it were me?
  5. Accountability for results: how are results/outputs guaranteed?
  6. Lessons learned: capture and document the experience gained

Reference: http://wiki.mbalib.com/wiki/WHY-WHY%E5%9B%BE

Origin http://43.154.161.224:23101/article/api/json?id=324610715&siteId=291194637