Root Cause Analysis (RCA)

Purpose

The purpose of an RCA is to analyze an unexpected failure in service to better understand the root cause and to create mitigation steps. A robust RCA process helps technical teams continually improve their services.

When to do an RCA

  • A significant outage in services caused by a technical error - <10 minutes

  • A breakdown in services that negatively impacts a stakeholder

  • A significantly delayed project

Anatomy of an RCA

Date

When the RCA was prepared.

Event Description

What happened, and to what. Example: SHAREOK Outage.

Prepared by

Who prepared the RCA. Multiple names are acceptable.

Reviewed by

Who will review the RCA and sign off on its accuracy. Usually someone accountable, such as a supervisor.

Event Duration

When the event started and when it ended.

Services Affected

What was impacted: a platform, a service, or a project.

Event Summary & Timeline

A detailed summary of what happened. Include a timeline of events.

Root Cause

What caused the outage or interruption.

Event Solution

How the problem was fixed.

Mitigation Strategies

What measures can be put in place to make sure it doesn’t happen again.
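
For teams that prefer to track RCAs in a structured form, the fields above could be captured roughly as follows. This is only a sketch, not an established Digidev tool: the class name `RcaRecord`, its attribute names, the `missing_fields` helper, and every value in the example (dates, times, names, cause) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta


@dataclass
class RcaRecord:
    """Hypothetical record with one attribute per RCA field listed above."""
    date_prepared: date
    event_description: str
    prepared_by: list[str]          # multiple names are acceptable
    reviewed_by: str
    event_start: datetime           # Event Duration: start
    event_end: datetime             # Event Duration: end
    services_affected: list[str]
    summary_and_timeline: str
    root_cause: str
    event_solution: str
    mitigation_strategies: list[str] = field(default_factory=list)

    def duration(self) -> timedelta:
        """Length of the event, derived from its start and end times."""
        return self.event_end - self.event_start

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty and need to be filled in."""
        return [name for name, value in vars(self).items()
                if value in ("", [], None)]


# Illustrative values only; none of these describe a real incident.
rca = RcaRecord(
    date_prepared=date(2024, 1, 16),
    event_description="SHAREOK Outage",
    prepared_by=["A. Author", "B. Author"],
    reviewed_by="C. Supervisor",
    event_start=datetime(2024, 1, 15, 9, 0),
    event_end=datetime(2024, 1, 15, 9, 45),
    services_affected=["SHAREOK"],
    summary_and_timeline="09:00 service unavailable; 09:45 service restored.",
    root_cause="Example cause",
    event_solution="Example fix",
)

print(rca.duration())        # 0:45:00
print(rca.missing_fields())  # ['mitigation_strategies']
```

A check like `missing_fields` is one way to confirm that an RCA is complete before it goes to the reviewer; here it flags that no mitigation strategies have been recorded yet.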


What to do after an RCA is complete

  • Make sure all responsible parties have a copy

  • Place a copy in the Digidev RCA folder

  • Make sure the division head knows a new RCA has been completed

  • Make sure all mitigation strategies have been implemented