Standard and Emergency Changes

This policy provides criteria for determining whether configuration and code changes can be applied to production systems outside scheduled maintenance windows.

Purpose of the Policy

Most normal changes should be announced in advance and applied during our prearranged maintenance windows. This policy allows low-risk and emergency changes to be applied outside this schedule.

Policy

Standard Changes

If all of the following criteria apply, unscheduled deployment of low-risk changes may be permitted:

The change is expected to have a low perceived impact on service. This should typically be understood as 3 minutes of downtime or less, but this standard varies with the targeted service. Stricter standards should be applied to systems with high levels of visibility and use (e.g. libraries.ou.edu or SHAREOK).
The change set doesn’t implement major user experience changes without updating the relevant documentation and training materials.
The change set has been peer reviewed.
The change set has a quickly testable outcome.
The change set is made using a tested, documented procedure with a predictable run time, preferably through automation.
The change set is quickly revertible, preferably through automation.
The system being modified has automated health checks to help monitor for unexpected side effects.
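
The criteria above can be read as a checklist that must pass in full. The following is a minimal sketch of that reading in Python, not part of the policy itself. The ChangeSet class, its field names, and the unmet_criteria and is_standard_change functions are all hypothetical names chosen for illustration. The 3-minute default comes from the first criterion and would need to be lowered for high-visibility systems.

```python
from dataclasses import dataclass


@dataclass
class ChangeSet:
    """Hypothetical record of a proposed change; field names are illustrative."""
    expected_downtime_minutes: float
    docs_updated_for_ux_changes: bool  # True if no major UX change, or docs/training updated
    peer_reviewed: bool
    quickly_testable: bool
    documented_procedure: bool
    quickly_revertible: bool
    has_health_checks: bool


def unmet_criteria(change: ChangeSet, max_downtime_minutes: float = 3.0) -> list[str]:
    """Return a description of each Standard Change criterion the change fails."""
    unmet = []
    if change.expected_downtime_minutes > max_downtime_minutes:
        unmet.append("expected downtime exceeds the limit for this service")
    if not change.docs_updated_for_ux_changes:
        unmet.append("major UX change without updated documentation or training")
    if not change.peer_reviewed:
        unmet.append("not peer reviewed")
    if not change.quickly_testable:
        unmet.append("outcome is not quickly testable")
    if not change.documented_procedure:
        unmet.append("no tested, documented procedure")
    if not change.quickly_revertible:
        unmet.append("not quickly revertible")
    if not change.has_health_checks:
        unmet.append("target system lacks automated health checks")
    return unmet


def is_standard_change(change: ChangeSet, max_downtime_minutes: float = 3.0) -> bool:
    """A change qualifies only if every criterion is met."""
    return not unmet_criteria(change, max_downtime_minutes)
```

Returning the list of unmet criteria, rather than only a yes/no answer, makes it easier for a reviewer to see why a change must wait for a maintenance window.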

Examples of Standard Changes

Updates to add a new host name or redirect to the reverse proxy.
Updates to add additional storage or swap.
Updates to scheduled jobs (e.g. cron).

Emergency Changes

This policy does not apply to change sets applied to fix an ongoing outage or to prevent an imminent one.

Assets & Configuration

The following assets are relevant to this policy:

Ansible roles and playbooks are recommended for automation. See our prod_infrastructure repository.
Our standard monitoring tooling (Zabbix, etc.) supports automated health checks and should be updated as needed.
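
To illustrate how automation, health checks, and quick reversion fit together, here is a minimal Python sketch of an apply, verify, and revert loop. It is an assumption about how such a wrapper might look, not a description of the prod_infrastructure repository or of Zabbix. The function apply_with_rollback and its three callable parameters are hypothetical. In practice the callables might run an Ansible playbook or query the monitoring system.

```python
from typing import Callable


def apply_with_rollback(
    apply_change: Callable[[], None],
    health_check: Callable[[], bool],
    revert_change: Callable[[], None],
    checks: int = 3,
) -> bool:
    """Apply a change, poll the health check, and revert on the first failure.

    Returns True if the change stayed in place, False if it was reverted.
    All three callables are placeholders supplied by the caller.
    """
    apply_change()
    for _ in range(checks):
        if not health_check():
            # A failed check means an unexpected side effect: undo the change.
            revert_change()
            return False
    return True
```

A real wrapper would likely pause between checks and log each result. Those details are omitted here to keep the sketch short.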