Introduction to Root Cause Analysis (RCA)
What is Root Cause Analysis (RCA)?
Root cause analysis (RCA) is the search for the underlying cause of a problem. Every problem has a cause, and although the term "root cause" implies a single cause, multiple causes often interact to trigger the problem. There is also no single RCA method for all situations; however, an RCA should use empirical methods and tools selected to suit the problem under investigation. An RCA is performed by a root cause investigator, who could be a quality engineer, a quality manager, or even a well-trained production operator. Root cause analysis is an important tool for helping ensure the quality of academic payloads, including those flown on small satellites (SmallSats), CubeSats, and high-altitude balloons (HABs).
What are the aims of RCA?
- Identify potential causes.
- Determine which cause or causes are root causes.
- Address those root causes to ensure the effect (the problem) does not recur.
Why is RCA so important?
If a problem has occurred once, it will most likely occur again unless something is done to prevent its recurrence. However, if the root cause is found and addressed, future occurrences of the same problem can be prevented. Root cause analysis is therefore the key to preventing future problems, and it allows us to learn from past problems, failures, and accidents.
In other words, RCA is a method that helps determine what happened, how it happened, and why it happened.
What can make RCA challenging?
- The problem is poorly defined.
- A systematic approach is not used.
- Investigations are stopped prematurely.
- Decisions are based on guesses, hunches, or assumptions.
- An inadequate level of detail is used to try to identify the root cause.
- Interim containment fixes are sometimes allowed to become permanent.
- The skills, knowledge, and experience needed to uncover the root cause are not available.
Four keys to successful RCA:
- Use a step-wise approach:
  - Standardize the approach throughout the organization.
- Adopt fact-based decision making:
  - Don't accept opinions, guesses, or hunches.
- Test to confirm:
  - If the root cause has been "found," test to confirm you have indeed identified the root cause or causes.
- Implement permanent corrective solutions:
  - Does the solution answer the "root cause question"? (The root cause question: Does this cause explain all that we know about what the problem is, as well as all we know about what the problem is not?)
  - Is the solution practical, feasible, and cost-effective?
  - Is the solution robust and sustainable?
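The root cause question above can be illustrated with a minimal Python sketch. It assumes observations are recorded as simple labels for where the problem is and is not seen, and that each candidate cause predicts which items it would affect. The function `explains_all` and the payload board data are hypothetical, chosen only for illustration; a real investigation would confirm each candidate by testing rather than by set comparison alone.

```python
def explains_all(predicted_affected, is_observed, is_not_observed):
    """Return True if a candidate cause is consistent with every observation.

    predicted_affected: set of items the candidate cause would affect.
    is_observed:        set of items where the problem IS seen.
    is_not_observed:    set of items where the problem is NOT seen.
    """
    # The cause must account for every place the problem appears ...
    explains_is = is_observed <= predicted_affected
    # ... and must not predict the problem where it was not seen.
    explains_is_not = predicted_affected.isdisjoint(is_not_observed)
    return explains_is and explains_is_not


# Hypothetical data: resets seen on two payload boards but not on a third.
is_observed = {"board_A", "board_B"}
is_not_observed = {"board_C"}

# Hypothetical candidate causes and the boards each would affect.
candidates = {
    "shared battery pack undervoltage": {"board_A", "board_B", "board_C"},
    "bad regulator lot used on A and B": {"board_A", "board_B"},
}

results = {
    name: explains_all(affected, is_observed, is_not_observed)
    for name, affected in candidates.items()
}
for name, consistent in results.items():
    status = "consistent" if consistent else "does not explain is / is not"
    print(f"{name}: {status}")
```

In this sketch, the shared battery candidate is rejected because it would also affect the board where no problem was seen, while the regulator candidate remains consistent with the facts and is worth testing to confirm.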
Steps for Root Cause Analysis
According to NASA, the following steps must be followed to find and address root causes:
- Identify and clearly define the undesired outcome.
- Gather data.
- Create a timeline.
- Place events and conditions on an event and causal factor tree.
- Use a fault tree or other method / tool to identify all potential causes.
- Decompose system failures down to basic events or conditions (that is, further describe what happened).
- Identify specific failure modes (i.e., Immediate Causes).
- Continue asking “WHY” to identify root causes.
- Check your logic and your facts. Eliminate items that are not causes or contributing factors.
- Generate solutions that address both proximate causes and root causes.
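The tree-building, decomposition, and repeated "why" steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not NASA's tooling: the `Node` class, the functions `basic_events` and `why_chains`, and the balloon payload scenario are all hypothetical. Each node's `causes` list holds the answers to "why did this happen?", and `ruled_out` marks items eliminated when checking logic and facts.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One event or condition in a simple cause tree."""
    description: str
    causes: list = field(default_factory=list)  # child Nodes: answers to "why?"
    ruled_out: bool = False  # True when the facts show this is not a cause


def why_chains(node, path=()):
    """Return every chain from the undesired outcome down to a basic event.

    A basic event here is a node with no remaining (not ruled out) causes.
    """
    if node.ruled_out:
        return []
    path = path + (node.description,)
    live = [c for c in node.causes if not c.ruled_out]
    if not live:
        return [path]
    chains = []
    for cause in live:
        chains.extend(why_chains(cause, path))
    return chains


def basic_events(node):
    """Return the last element of each chain: the candidate root causes."""
    return [chain[-1] for chain in why_chains(node)]


# Hypothetical scenario for a high-altitude balloon payload.
undesired = Node("Payload stopped logging data at altitude", [
    Node("SD card write failed", [
        Node("Card connector lost contact", [
            Node("No vibration test in the test plan"),
        ]),
        Node("Card was full", ruled_out=True),  # eliminated by the facts
    ]),
    Node("Battery voltage dropped below logger minimum", [
        Node("Cells not rated for low temperature", [
            Node("No thermal requirement in the design review checklist"),
        ]),
    ]),
])

for chain in why_chains(undesired):
    print(" <- ".join(chain))
```

Each printed chain reads from the undesired outcome, through an immediate cause, down to a candidate root cause. The ruled-out branch is skipped, which mirrors the step of eliminating items that are not causes or contributing factors.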
Can all problems be prevented?
Probably not, but most recurring problems can be prevented if the root cause is found and addressed.
Definitions and Related Terms
Some important and recurring terms that you should remember as you work through this course are:
Cause (Causal Factor): An event or condition that results in an effect. Anything that shapes or influences the outcome.
Proximate Cause(s): The event(s) that occurred, including any condition(s) that existed, immediately before the undesired outcome, that directly resulted in its occurrence and that, if eliminated or modified, would have prevented the undesired outcome. Also known as the direct cause(s).
Root Cause(s): One of multiple factors (events, conditions, or organizational factors) that contributed to or created the proximate cause and subsequent undesired outcome and, if eliminated or modified, would have prevented the undesired outcome. Typically multiple root causes contribute to an undesired outcome.
Root Cause Analysis (RCA): A structured evaluation method that identifies the root causes for an undesired outcome and the actions adequate to prevent recurrence. Root cause analysis should continue until organizational factors have been identified, or until all data is exhausted.
Event: A real-time occurrence describing one discrete action, typically an error, failure, or malfunction. Examples: pipe broke, power lost, lightning struck, person opened valve, etc.
Condition: Any as-found state, whether or not resulting from an event, that may have safety, health, quality, security, operational, or environmental implications.
Organizational Factors: Any operational or management structural entity that exerts control over the system at any stage in its life cycle, including but not limited to the system’s concept development, design, fabrication, test, maintenance, operation, and disposal. Examples: resource management (budget, staff, training); policy (content, implementation, verification); and management decisions.
Contributing Factor: An event or condition that may have contributed to the occurrence of an undesired outcome but, if eliminated or modified, would not by itself have prevented the occurrence.
Barrier: A physical device or an administrative control used to reduce risk of the undesired outcome to an acceptable level. Barriers can provide physical intervention (e.g., a guardrail) or procedural separation in time and space (e.g., lock-out-tag-out procedure).
Mishap: An unplanned event that results in at least one of the following: (1) injury to personnel, (2) damage to public or private property (including foreign property), (3) mission failure.
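The distinction between a proximate cause, a root cause, and a contributing factor in the definitions above turns on two questions: would eliminating the factor by itself have prevented the undesired outcome, and did the factor directly result in the outcome? The Python sketch below encodes only that decision logic. The `CausalFactor` class, the `classify` function, and the example factors are hypothetical, and in practice both answers come from evidence gathered by the investigator rather than from a flag.

```python
from dataclasses import dataclass


@dataclass
class CausalFactor:
    description: str
    # Eliminating or modifying this factor alone would have prevented the outcome.
    would_have_prevented: bool
    # The factor occurred or existed immediately before the outcome
    # and directly resulted in it.
    directly_resulted: bool


def classify(factor):
    """Label a causal factor using the definitions in this section."""
    if not factor.would_have_prevented:
        return "contributing factor"
    if factor.directly_resulted:
        return "proximate cause"
    return "root cause"


# Hypothetical factors from a payload power failure investigation.
factors = [
    CausalFactor("Battery connector separated in flight", True, True),
    CausalFactor("No connector retention step in assembly procedure", True, False),
    CausalFactor("Integration schedule was compressed", False, False),
]

labels = {f.description: classify(f) for f in factors}
for description, label in labels.items():
    print(f"{label}: {description}")
```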