
Public information about outages can be found on the Maintenance Windows page of the main Libraries website.

Outages

Procedures for handling reports of, and updating users about, UNPLANNED interruptions of service. Depending on the impact of an outage, ALL of these communication methods may be needed.


Outages Slack Channel

  • Employees should report platform and network outages in the Outages channel. Problems with individual equipment or accounts should instead be reported through the appropriate ticketing system.

  • All Digidev units should have a process for monitoring this channel for outage reports.

  • When a report is made, the team responsible for the platform or service should:
    • acknowledge the message
    • provide updates on the status of the issue.

  • Unit managers should post any unplanned outages to this channel as soon as possible and provide periodic updates to users.



Intranet Alert

  • Significant outages should also be posted on the Intranet as an alert.

  • Employees will receive an email when an alert is created.

  • Be sure to resolve the alert once the outage has been resolved.


*** It is up to the unit manager to determine what counts as a significant outage, but please err on the side of caution, especially during busy periods such as finals week.


Main Website Alert

  • For significant outages that impact user services, an alert must also be posted on the main Libraries website.

  • To add an alert, send the message to Web Services.

  • Be sure to let Web Services know when the alert can be removed.

*** It is up to the unit manager to determine what counts as a significant outage, but please err on the side of caution, especially during busy periods such as finals week.


Planned Maintenance

We have identified several maintenance windows throughout the year during which planned maintenance, upgrades, and code pushes can be scheduled to minimize the impact on services. Each maintenance window runs from 6:00 AM to 12:00 PM on the scheduled day. Please note the process below for scheduling an update and communicating it to the public.

Scheduling Maintenance


Notifying Stakeholders

  • Make sure all stakeholders know about any changes to the system or potential downtime. How much lead time is needed depends on how major the change is.

  • For major interruptions, a WEEKLY UPDATE must be submitted before the outage, with appropriate lead time.

  • Web Services requires longer lead times so that LibGuides, tutorials, and instruction materials can be updated. SHAREOK partners must also be notified in advance.


*** It is up to the unit manager to determine who the internal stakeholders are. Changes to any public service must also be announced to the general public.





Notifying Infrastructure Working Group (IWG)

  • On the Wednesday before a maintenance window, unit managers should post information about the planned maintenance in the IWG Slack channel.

  • IWG committee members will review the change and determine whether there are any conflicts.

  • If more than one change is scheduled for the same day, teams should coordinate a schedule so that multiple services are not interrupted at the same time.

  • All significant changes, maintenance, and upgrades need IWG approval, but it is up to the unit manager to determine which changes are considered significant.


Posting an Alert

  • A few days before the maintenance, an alert should be posted on the Libraries website to notify the public of any potential outages.

  • On the day of the maintenance, teams need to post in the Slack Outages channel when work has begun.
    Example: "Maintenance on X system has started."

  • Teams should post messages about any unplanned delays.
    Example: "Maintenance on X system is taking longer than expected."

  • Teams should post a message when work is completed. 
    Example: "Maintenance on X system has been completed."

Maintenance Windows

2020

  • June 1st
  • June 15th
  • June 29th
  • July 13th
  • August 10th
  • September 21st
  • October 5th
  • October 19th
  • November 2nd
  • November 16th
  • December 7th*

*SHAREOK can be down for no more than 15 minutes.

2021

  • February 1st
  • February 15th
  • March 1st
  • March 15th
  • March 29th
  • April 12th
  • April 26th
  • May 24th
  • June 7th
  • June 21st

Root Cause Analysis (RCA)

Unplanned outages require a Root Cause Analysis to be completed by the team in charge. See the RCA documentation for more information.

