

Objectives of this Document

  • Describe the procedures for outages & planned maintenance

  • Identify the proper way to post an alert message for outages

  • Identify the three stages for scheduling planned maintenance

  • Post a schedule of maintenance windows

  • Identify when a Root Cause Analysis (RCA) is needed


Table of Contents


  • Outages

  • Planned Maintenance

    • Scheduling Maintenance

    • Maintenance Windows

  • Root Cause Analysis (RCA)

Outages

Procedures for handling reports of UNPLANNED interruptions of service and for keeping users updated. Depending on the impact of the outage, ALL of these communication methods may be needed.


Outages Slack Channel

  • Employees should report platform and network outages to the Outages channel. Individual problems with personal equipment or accounts should instead be directed to the appropriate ticketing system.

  • All Digidev units should have a process for monitoring this channel for outage reports.

  • When a report is made, the team responsible for the platform or service should:
    • acknowledge the message
    • provide updates on the status of the issue.

  • Unit managers should post any unplanned outages to this channel as soon as possible and provide periodic updates to users.



Intranet Alert

  • Significant outages should also be posted on the Intranet as an alert.

  • Employees will receive an email when the alert is created.

  • Be sure to resolve the alert once the outage itself has been resolved.


*** It is up to the unit manager to determine what counts as a significant outage, but please err on the side of caution, especially at times such as finals week.


Main Website Alert

  • For significant outages that impact user services, an alert needs to be put on the libraries' main website.

  • To add an alert, contact Web Services with the message.

  • It is important to let Web Services know when to remove the message.

*** It is up to the unit manager to determine what counts as a significant outage, but please err on the side of caution, especially at times such as finals week.


Planned Maintenance

We have identified several maintenance windows throughout the year when planned maintenance, upgrades, and code pushes can be scheduled to minimize the impact on services. Each maintenance window runs from 6:00 AM to 12:00 PM on the scheduled day. Please note the process below for scheduling an update and communicating it to the public.

Scheduling Maintenance


Notifying Stakeholders

  • Make sure all stakeholders know about any changes to the system or potential downtime. The lead time for notification depends on how major the change is.

  • For major interruptions, a WEEKLY UPDATE must be submitted before the outage, with the appropriate lead time.

  • Web Services requires a longer lead time so that LibGuides, tutorials, and instruction materials can be updated. SHAREOK Partners must be notified in advance as well.





Notifying Infrastructure Working Group (IWG)

  • On the Wednesday before a maintenance window, unit managers should post information about the planned maintenance in the IWG Slack channel.

  • IWG committee members will review the change and determine if there are any conflicts.

  • If more than one change is scheduled for the same day, teams should create a schedule so that multiple services are not interrupted at the same time.
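
As a rough illustration of the "Wednesday before a maintenance window" rule, the sketch below computes that date for a given window. The function name `iwg_notice_date` is hypothetical, and the handling of a window that itself falls on a Wednesday (using the previous week) is an assumption, since this page does not cover that case.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday() counts Monday as 0, so Wednesday is 2


def iwg_notice_date(window: date) -> date:
    """Return the most recent Wednesday strictly before the maintenance window."""
    days_back = (window.weekday() - WEDNESDAY) % 7
    if days_back == 0:
        # Assumption: if the window is itself a Wednesday, use the prior week.
        days_back = 7
    return window - timedelta(days=days_back)


# The June 17th, 2019 window is a Monday, so the notice is due June 12th.
print(iwg_notice_date(date(2019, 6, 17)))
```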


Posting an Alert

  • On the day of the maintenance, teams need to post in the Outages Slack channel when work begins.
    Example: "Maintenance on X system has started."

  • Teams should post messages about any unplanned delays.
    Example: "Maintenance on X system is taking longer than expected."

  • Teams should post a message when work is completed. 
    Example: "Maintenance on X system has been completed."
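
Teams that want consistent wording could generate the three status messages from a small helper. The sketch below only builds the message text, using the example wording above; the function name `maintenance_message` and the stage labels are hypothetical, and actually posting to Slack is left out.

```python
# Message wording taken from the examples on this page; stage keys are illustrative.
TEMPLATES = {
    "started": "Maintenance on {system} has started.",
    "delayed": "Maintenance on {system} is taking longer than expected.",
    "completed": "Maintenance on {system} has been completed.",
}


def maintenance_message(system: str, stage: str) -> str:
    """Build the status text for one stage of planned maintenance."""
    if stage not in TEMPLATES:
        raise ValueError(f"unknown stage: {stage!r}")
    return TEMPLATES[stage].format(system=system)


print(maintenance_message("X system", "started"))
```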

Maintenance Windows

2019

  • June 17th
  • July 15th
  • August 12th
  • August 26th
  • September 9th
  • September 23rd
  • October 7th
  • October 21st
  • November 4th
  • November 18th
  • December 16th

2020

  • January 13th
  • January 27th
  • February 10th
  • February 24th
  • March 9th
  • March 23rd
  • April 6th
  • April 20th
  • May 18th
  • June 1st
  • June 15th
  • June 29th
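
To check whether a given moment falls inside a maintenance window, the dates above can be combined with the 6:00 AM to 12:00 PM hours described under Planned Maintenance. This is a minimal sketch: the name `in_maintenance_window` is hypothetical, only a few of the listed dates are included for brevity, and treating 12:00 PM as the exclusive end of the window is an assumption.

```python
from datetime import date, datetime, time
from typing import Collection

WINDOW_START = time(6, 0)   # 6:00 AM
WINDOW_END = time(12, 0)    # 12:00 PM (assumed exclusive)

# A subset of the windows listed above, for illustration only.
SAMPLE_WINDOWS = {
    date(2019, 6, 17),
    date(2019, 7, 15),
    date(2020, 1, 13),
    date(2020, 6, 29),
}


def in_maintenance_window(moment: datetime, windows: Collection[date]) -> bool:
    """Return True if `moment` is on a window date between 6:00 AM and 12:00 PM."""
    on_window_day = moment.date() in windows
    in_window_hours = WINDOW_START <= moment.time() < WINDOW_END
    return on_window_day and in_window_hours


print(in_maintenance_window(datetime(2019, 6, 17, 9, 30), SAMPLE_WINDOWS))
```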

Root Cause Analysis (RCA)

Unplanned outages require a Root Cause Analysis to be completed by the team in charge. See the RCA documentation for more information.

