Outages
Procedures for handling reports and updating users about UNPLANNED interruptions of service. Depending on the impact of the outages, ALL of these communication methods may be needed.
Outages Slack Channel
- Employees should report outages for platforms and network to the outages channel. However, individual reports for personal equipment or accounts should be directed to the appropriate ticketing system.
- All Digidev units should have a process for monitoring this channel for outage reports.
- When a report is made, the team responsible for the platform or service should:
- acknowledge the message
- provide updates on the status of the issue.
- Unit managers should post any unplanned outages to this channel as soon as possible and provide periodic updates to users.
Intranet Alert
- Significant outages should also be posted on the Intranet as an alert.
- Employees will receive an email when an alert is created.
- It is important to resolve the alert once the event has been resolved.
*** It is up to the unit manager to determine what a significant outage is but please err on the side of caution, especially during finals week, etc.
Main Website Alert
- For significant outages that impact user services, an alert needs to be put on the libraries' main website.
- To add an alert, contact Web Services with the message.
- It is important to let Web Services know when to remove the message.
*** It is up to the unit manager to determine what a significant outage is but please err on the side of caution, especially during finals week, etc.
Planned Maintenance
We have identified several Monday maintenance windows throughout the year where planned maintenance, upgrades, and code pushes can be scheduled to minimize impact on services. The maintenance window runs from 6:00 AM to 12:00 PM every Monday (except on blackout dates). Please note the process for scheduling an update and communicating updates to the public.
Scheduling Maintenance
Notifying Stakeholders
- Make sure all stakeholders know about any changes to the system or potential downtime. Lead time on notification will depend on how major the change is.
- For major interruptions, a WEEKLY UPDATE must be submitted prior to the outage with the appropriate lead time.
- Web Services requires longer lead times so that LibGuides, tutorials, and instruction materials can be updated. SHAREOK Partners must be notified in advance as well.
*** It is up to the unit manager to determine who the internal stakeholders are. Any public service must include notifications to the general public.
Notifying Infrastructure Working Group (IWG)
- The Wednesday before a maintenance window, unit managers should post information about the planned maintenance in the IWG Slack channel.
- IWG committee members will review the change and determine if there are any conflicts.
- If more than one change is scheduled for the same day, teams should create a schedule so that multiple services are not interrupted at the same time.
- All significant changes, maintenance, and upgrades need IWG approval, but it is up to the unit manager to determine what changes are considered significant.
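As a rough illustration of the Wednesday lead time described above, the notice date can be derived from the maintenance date. This is only a sketch: the function name `iwg_notice_date` is hypothetical and not part of any existing tooling, and it assumes "the Wednesday before" means the most recent Wednesday strictly before the maintenance day.

```python
from datetime import date, timedelta

WEDNESDAY = 2  # date.weekday(): Monday is 0, Wednesday is 2


def iwg_notice_date(window_day: date) -> date:
    """Return the last Wednesday strictly before the maintenance day (hypothetical helper)."""
    days_back = (window_day.weekday() - WEDNESDAY) % 7
    if days_back == 0:
        days_back = 7  # a Wednesday maintenance day points to the previous Wednesday
    return window_day - timedelta(days=days_back)


# Example: a maintenance window on Monday, July 12, 2021
print(iwg_notice_date(date(2021, 7, 12)))  # 2021-07-07
```

For a Monday window this always resolves to five days earlier, which leaves IWG members the rest of the week to flag conflicts.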
Posting an Alert
- A few days before, an alert should be posted on the libraries' website to notify the public of any potential outages.
- The day of the maintenance, teams need to post in the Slack Outages channel when work has begun.
Example: "Maintenance on X system has started."
- Teams should post messages about any unplanned delays.
Example: "Maintenance on X system is taking longer than expected."
- Teams should post a message when work is completed.
Example: "Maintenance on X system has been completed."
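Teams that want consistent wording for these status posts could generate them from a small template. The sketch below only builds the message text from the three examples above; the names `Phase` and `status_message` are hypothetical, and actually posting to Slack is left out.

```python
from enum import Enum


class Phase(Enum):
    """Maintenance phases, with wording taken from the example messages above."""
    STARTED = "has started"
    DELAYED = "is taking longer than expected"
    COMPLETED = "has been completed"


def status_message(system: str, phase: Phase) -> str:
    """Build the status line to post in the Outages channel (hypothetical helper)."""
    return f"Maintenance on {system} {phase.value}."


print(status_message("X system", Phase.STARTED))
# Maintenance on X system has started.
```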
Maintenance Windows
2020
- June 1st
- June 15th
- June 29th
- July 13th
- August 10th
- September 21st
- October 5th
- October 19th
- November 2nd
- November 16th
- December 7th*
*SHAREOK can only be down for 15 mins or less
2021
- February 1st
- February 15th
- March 1st
- March 15th
- March 29th
- April 12th
- April 26th
- May 24th
- June 7th
- June 21st

- We will have a weekly maintenance window every Monday
- Since ILLiad weekend usage prevents downtime on Mondays, we will schedule ad hoc windows for changes that require outages
- The maintenance window will continue to occur between 6:00 AM and 12:00 PM
- Major maintenance work will be scheduled with impacts and peak dates in mind
- We will continue to follow the normal process for notifying stakeholders and customers of planned work and/or outages
List of Mondays that will not be included in the weekly maintenance schedule based on the 2021/2022 Academic Calendar:
Holidays
- July 5
- September 6
- January 17
- May 30
SHAREOK Blackouts (final week to submit work)
- August 2
- December 13
- May 9
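The weekly window and the excluded Mondays listed above can be expressed as a simple check, for example to guard a deployment script. This is a minimal sketch under stated assumptions: the years attached to each date are inferred from the 2021/2022 academic calendar, the 12:00 PM end is treated as exclusive, times are taken as local time, and the name `is_maintenance_window` is hypothetical.

```python
from datetime import date, datetime, time

# Years are an assumption inferred from the 2021/2022 academic calendar.
HOLIDAYS = {date(2021, 7, 5), date(2021, 9, 6), date(2022, 1, 17), date(2022, 5, 30)}
SHAREOK_BLACKOUTS = {date(2021, 8, 2), date(2021, 12, 13), date(2022, 5, 9)}
EXCLUDED_MONDAYS = HOLIDAYS | SHAREOK_BLACKOUTS

WINDOW_START = time(6, 0)   # 6:00 AM
WINDOW_END = time(12, 0)    # 12:00 PM, treated as exclusive (assumption)


def is_maintenance_window(moment: datetime) -> bool:
    """Return True if `moment` falls inside a weekly Monday maintenance window."""
    if moment.weekday() != 0:  # 0 is Monday
        return False
    if moment.date() in EXCLUDED_MONDAYS:
        return False
    return WINDOW_START <= moment.time() < WINDOW_END


print(is_maintenance_window(datetime(2021, 7, 12, 9, 0)))  # True: regular Monday morning
print(is_maintenance_window(datetime(2021, 7, 5, 9, 0)))   # False: holiday Monday
```

Keeping the holiday and SHAREOK blackout sets separate makes it easy to update one list when the next academic calendar is published.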
Unscheduled Deployments
Most changes should be announced in advance and applied during our prearranged maintenance windows. This policy allows for low-risk and emergency changes outside of this proposed schedule.
Root Cause Analysis (RCA)
Unplanned outages require a Root Cause Analysis to be completed by the team in charge. See the RCA documentation for more information.