Root Cause Analysis (RCA)
Purpose
The purpose of an RCA is to analyze an unexpected failure in service to better understand the root cause and to create mitigation steps. A robust RCA process helps technical teams continually improve their services.
When to do an RCA
A significant outage in services caused by a technical error - <10 minutes
A breakdown in services that negatively impacts a stakeholder
A significantly delayed project
Anatomy of an RCA
Field | Description |
---|---|
Date | When was the RCA prepared. |
Event Description | What happened to what. Example: SHAREOK Outage. |
Prepared by | Who prepared the RCA. Multiple names acceptable. |
Reviewed By | Who will review and sign off on accuracy. Usually someone accountable, such as a supervisor. |
Event Duration | When did it start and end. |
Services Affected | Did it impact a platform, a service, or a project. |
Event Summary & Timeline | A detailed summary of what happened. Include timeline of events. |
Root Cause | What caused the outage or interruption. |
Event Solution | How did you fix it. |
Mitigation Strategies | What measures can you put in place to make sure it doesn’t happen again. |
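
For teams that want to keep RCAs in a structured form, the fields in the table above could be captured roughly as follows. This is only a sketch: the class name `RCARecord`, the attribute names, the two helper methods, and the sample names and dates are all hypothetical and not part of any existing Digidev tooling. Only the event description "SHAREOK Outage" is taken from the example in the table.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List


@dataclass
class RCARecord:
    """Hypothetical container with one attribute per row of the RCA table."""
    date_prepared: date
    event_description: str
    prepared_by: List[str]          # multiple names are acceptable
    reviewed_by: str
    event_start: datetime           # Event Duration: start
    event_end: datetime             # Event Duration: end
    services_affected: List[str]
    summary_and_timeline: str
    root_cause: str
    event_solution: str
    mitigation_strategies: List[str] = field(default_factory=list)

    def duration_minutes(self) -> float:
        """Length of the event in minutes, derived from start and end."""
        return (self.event_end - self.event_start).total_seconds() / 60

    def missing_fields(self) -> List[str]:
        """Names of fields still empty, so gaps are visible before sign-off."""
        return [name for name, value in vars(self).items()
                if value in ("", [], None)]


# Sample values below are made up for illustration only.
rca = RCARecord(
    date_prepared=date(2024, 1, 15),
    event_description="SHAREOK Outage",
    prepared_by=["A. Developer"],
    reviewed_by="B. Supervisor",
    event_start=datetime(2024, 1, 14, 9, 0),
    event_end=datetime(2024, 1, 14, 9, 45),
    services_affected=["SHAREOK"],
    summary_and_timeline="",
    root_cause="",
    event_solution="",
)

print(rca.duration_minutes())  # 45.0
print(rca.missing_fields())
```

A helper like `missing_fields` is one way for a reviewer to confirm that every row of the template has been filled in before signing off; a written document or form serves the same purpose.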
What to do after an RCA is complete
Make sure all responsible parties have a copy
Place a copy in the Digidev RCA folder
Make sure the division head knows a new RCA has been completed
Make sure all mitigation strategies have been implemented