Failure Mode and Effects Analysis (FMEA)
Spotting Problems Before a Solution Is Implemented
When things go badly wrong, it's easy to say with hindsight, "We should have known that would happen." And, with a little foresight, perhaps the problems could have been avoided if only someone had asked, "What could go wrong?"
By looking at all the things that could possibly go wrong at the design stage, you can cheaply solve problems that would otherwise take vast effort and expense to correct if left until the solution has been deployed in the field. Failure Mode and Effects Analysis (FMEA) helps you do this.
More than this, FMEA provides a useful approach for reviewing existing processes or systems, so that problems with these can be identified and eliminated.
FMEA was originally known as Failure Mode, Effects, and Criticality Analysis (FMECA), and was first published in 1949 by the U.S. Department of Defense. FMEA grew out of systems engineering, and is a widely used tool for quality control. It builds on tools like Risk Analysis and Cause and Effect Analysis to try to predict failures before they happen. Originally used in product development, it is also effective in improving the design of business processes and systems.
When using FMEA, you start by looking in detail at the proposed solution (the mapping tools described below can help with this), and then you systematically identify all of the points where it could fail. Once these potential failures have been identified, you rate the potential consequences of each according to:
- Severity – how critical is the failure?
- Occurrence – how likely is the failure to happen?
- Detection – how easy will it be to detect the failure?
Using these rankings, you identify the most serious threats, and then alter the design to eliminate those failures or to make them less likely.
Once you've redesigned your solution, it's worth repeating the FMEA to ensure that new potential points of failure have not been introduced into the design.
When using FMEA, it is often best to draw expert team members from a wide variety of functions, so that you can look at the proposed solution from different angles. The purpose of FMEA is to uncover and assess potential failures, so the more thorough the investigation, the more useful the analysis.
There is a range of tools that you can use to map out the solution you want to examine, and the best one will depend on the type of solution you're looking at. Among the tools you may want to consider are Flow Charts, Swim Lane Diagrams, Systems Diagrams, and Value Chain Analysis.
How to Use the Tool
The best way of understanding FMEA is to use an example. Let's use it to look at a proposal for a simple payroll process.
Identify the solution, system or process you're looking at and, if appropriate, the main issue you want to investigate. List the critical elements, in a logical (for example, chronological) order.
Proposed Payroll Process – Key System Elements:
- Hourly time sheet tally
- Vacation pay calculation
- Overtime calculation
Develop a flow chart to map the solution or process, and the interactions between its various parts.
Use an FMEA Matrix to work through each element in this process in turn.
For each element in the process, use brainstorming or carry out a risk analysis to identify the potential failures that may occur. Enter the ways that the solution or process can fail in the Failure Mode column of the FMEA Matrix.
- Submit time sheet – employees fail to submit time sheets.
- Submit time sheet – employees enter poor quality data on time sheets.
- Submit time sheet – employees use incorrect analysis codes.
- Enter hours – human error in data entry.
- Vacation pay calculated – human error in setting up formulae.
- Vacation pay calculated – look-up tables not maintained.
- And so on...
For each potential failure, identify the consequences of the failure.
- Submit time sheet – employees fail to submit time sheets – under-billing of clients, non-payment of wages to employees
- Submit time sheet – employees enter poor quality data on time sheets – clients incorrectly billed, unreliable management information
- Submit time sheet – employees use incorrect analysis codes – unreliable management information
- Enter hours – human error in data entry – underpayment or overpayment of wages
- Vacation pay calculated – human error in setting up formulae – significant and systematic overpayment or underpayment of wages
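The element, failure mode and consequence lists above can be kept together as one record per row of the matrix. The following Python sketch is one possible way to do that, not a prescribed format: the class name `FmeaRow`, its field names and the helper `failure_modes_for` are assumptions made for illustration, while the row contents are copied from the payroll example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FmeaRow:
    """One row of an FMEA Matrix (hypothetical layout, not an official template)."""
    element: str              # process step being analyzed
    failure_mode: str         # how that step can fail
    effects: Tuple[str, ...]  # consequences if the failure occurs


# Rows copied from the payroll example in the text.
ROWS = [
    FmeaRow("Submit time sheet", "Employees fail to submit time sheets",
            ("Under-billing of clients", "Non-payment of wages to employees")),
    FmeaRow("Submit time sheet", "Employees enter poor quality data on time sheets",
            ("Clients incorrectly billed", "Unreliable management information")),
    FmeaRow("Submit time sheet", "Employees use incorrect analysis codes",
            ("Unreliable management information",)),
    FmeaRow("Enter hours", "Human error in data entry",
            ("Underpayment or overpayment of wages",)),
    FmeaRow("Vacation pay calculated", "Human error in setting up formulae",
            ("Significant and systematic overpayment or underpayment of wages",)),
]


def failure_modes_for(rows: List[FmeaRow], element: str) -> List[str]:
    """Return the failure modes recorded against one process element."""
    return [row.failure_mode for row in rows if row.element == element]


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row.element}: {row.failure_mode} -> {'; '.join(row.effects)}")
```

Keeping each failure mode on its own row, even when several share a process element, means each one can later be rated separately.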
For each potential failure in the system, rank Severity, Occurrence and Detection using the following scales:
Severity – how critical is the failure?
5 – Very High (huge losses that threaten company viability)
4 – High (large losses, company is still operable)
3 – Moderate (losses exist, can be remedied)
2 – Minor (loss is minimal, quite insignificant)
1 – Low (no effect)
Occurrence – how likely is the failure to happen?
5 – Very High (must be addressed immediately, will happen very often)
4 – High (will cause frequent issues, will happen often)
3 – Moderate (will cause sporadic issues, will happen occasionally)
2 – Minor (issues will be few and far between, will happen quite infrequently)
1 – Low (issues unlikely, not likely to ever happen)
Detection – how easy will it be to detect the failure?
5 – Very Difficult
4 – Difficult
3 – Somewhat Easy
2 – Easy
1 – Very Easy
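If the ratings are recorded in a spreadsheet or script, it can help to reject any value that falls outside the 1 to 5 scales before it is used. The sketch below is a minimal illustration under that assumption: the function name `validate_rating` and the constant names are hypothetical, and only the Detection labels are taken from the scale above.

```python
# Detection labels copied from the scale above; a higher number means harder to detect.
DETECTION_LABELS = {
    5: "Very Difficult",
    4: "Difficult",
    3: "Somewhat Easy",
    2: "Easy",
    1: "Very Easy",
}

SCALE_MIN, SCALE_MAX = 1, 5  # all three scales in the text run from 1 to 5


def validate_rating(name: str, value: int) -> int:
    """Return value unchanged if it is a whole number on the 1-5 scale, else raise."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} rating must be an integer, got {value!r}")
    if not SCALE_MIN <= value <= SCALE_MAX:
        raise ValueError(
            f"{name} rating must be between {SCALE_MIN} and {SCALE_MAX}, got {value}"
        )
    return value


if __name__ == "__main__":
    detection = validate_rating("Detection", 3)
    print(detection, DETECTION_LABELS[detection])  # 3 Somewhat Easy
```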
In our example, the potential failure of the vacation pay calculation may be ranked as:
- Severity 4 – If undetected, overpayment of wages could lead to significant financial loss
- Occurrence 5 – If it happens, it may happen very often
- Detection 3 – Executives are likely to spot significant overpayment of wages!
Calculate the Risk Priority Number (RPN) for each of the modes and effects by multiplying the three ratings (Severity × Occurrence × Detection).
In the example above, the RPN is 4 × 5 × 3 = 60. This is likely to be one of the most significant risk points in this process, and therefore needs to be managed.
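The calculation is simple enough to do by hand, but a short function makes it repeatable across many rows. This is a minimal sketch: the function name `risk_priority_number` is an assumption, and the ratings are the ones given for the vacation pay example.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply the three ratings to get the Risk Priority Number (RPN)."""
    return severity * occurrence * detection


# Ratings from the vacation pay example: Severity 4, Occurrence 5, Detection 3.
vacation_pay_rpn = risk_priority_number(severity=4, occurrence=5, detection=3)

if __name__ == "__main__":
    print(vacation_pay_rpn)  # 60
    print(risk_priority_number(5, 5, 5))  # 125, the highest possible score on 1-5 scales
```

Because each rating runs from 1 to 5, the RPN can range from 1 to 125, which gives a sense of where a score of 60 sits.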
Now you are ready to brainstorm action plans and make recommendations to counter the potential threats you uncovered. This step is best completed in phases starting with the modes and effects that have the highest RPN – in other words, those that represent the greatest threat.
This is where the cross-functional team comes in very useful again. By putting together the best team of people, you can reassure yourself that the recommended action plan is well rounded, practical, and a relatively easy sell to the people who will have to make it happen.
In our example, thorough testing of the formula may be mandated, and the formula may be locked so that it can't accidentally be changed.
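Working through the failure modes in phases, highest RPN first, amounts to sorting the matrix by RPN. The sketch below illustrates this with a hypothetical `RatedFailure` record and `rank_by_rpn` helper. Only the vacation pay ratings come from the example above. The ratings for the other two rows are invented placeholders to show the ordering, not assessments from the original analysis.

```python
from typing import List, NamedTuple


class RatedFailure(NamedTuple):
    """A failure mode with its three ratings (hypothetical record layout)."""
    element: str
    failure_mode: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Severity x Occurrence x Detection, as described in the text.
        return self.severity * self.occurrence * self.detection


FAILURES = [
    # Placeholder ratings, invented for illustration only.
    RatedFailure("Submit time sheet", "Employees fail to submit time sheets", 3, 3, 1),
    # Ratings from the vacation pay example in the text.
    RatedFailure("Vacation pay calculated", "Human error in setting up formulae", 4, 5, 3),
    # Placeholder ratings, invented for illustration only.
    RatedFailure("Enter hours", "Human error in data entry", 3, 4, 4),
]


def rank_by_rpn(failures: List[RatedFailure]) -> List[RatedFailure]:
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(failures, key=lambda f: f.rpn, reverse=True)


if __name__ == "__main__":
    for failure in rank_by_rpn(FAILURES):
        print(f"{failure.rpn:>3}  {failure.element}: {failure.failure_mode}")
```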
Once you've modified the design for the proposed solution, repeat the Failure Mode and Effects Analysis process to review the design, and make sure that no additional potential failure points can be identified.
The objective here is to develop a solution that has a low overall RPN. Where the RPN is still high, go back and revamp your plan, as appropriate, to address the issues that still pose a high failure potential.
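A repeat analysis can be compared against the first pass by recalculating the RPN with the revised ratings. The following sketch makes several assumptions that are not in the original text: the cut-off `RPN_THRESHOLD` of 30 is an arbitrary illustrative value (the text does not define what counts as "high"), the "after" ratings are hypothetical figures for a tested and locked formula, and the function names are invented for the example.

```python
from typing import Tuple

Ratings = Tuple[int, int, int]  # (severity, occurrence, detection)

RPN_THRESHOLD = 30  # hypothetical cut-off; choose one that suits your own process


def rpn(ratings: Ratings) -> int:
    """Severity x Occurrence x Detection."""
    severity, occurrence, detection = ratings
    return severity * occurrence * detection


def needs_rework(ratings: Ratings, threshold: int = RPN_THRESHOLD) -> bool:
    """True if the RPN is still above the chosen threshold."""
    return rpn(ratings) > threshold


before: Ratings = (4, 5, 3)  # vacation pay ratings from the example in the text
after: Ratings = (4, 2, 2)   # hypothetical ratings once the formula is tested and locked

if __name__ == "__main__":
    print(rpn(before), needs_rework(before))  # 60 True
    print(rpn(after), needs_rework(after))    # 16 False
```

Severity is left unchanged in the "after" ratings on the assumption that testing and locking the formula makes the failure less likely and easier to catch, without making its consequences any milder.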
Failure Mode and Effects Analysis is a useful tool for uncovering possible points of failure that may be lurking within business processes and solutions, whether these are already in place within your company or are proposed for the future.
This technique is as applicable to business solutions and processes as it is to its original application, product design. Ultimately, proposals that have been scrutinized using FMEA are more likely to be successful. If these are projects that you're responsible for, then your projects' success is your own success.