Standard and Emergency Changes
This policy provides criteria for determining whether configuration and code changes can be applied to production systems outside scheduled maintenance windows.
Purpose of the Policy
Most normal changes should be announced in advance and applied during our prearranged maintenance windows. This policy allows low-risk and emergency changes to be applied outside this schedule.
Policy
Standard Changes
If all of the following criteria apply, unscheduled deployment of low-risk changes may be permitted:
The change is expected to have a low perceived impact on service. This should typically be understood as 3 minutes of downtime or less, but the standard varies with the targeted service. Stricter standards should be applied to systems with high levels of visibility and use (e.g. libraries.ou.edu or SHAREOK).
The change set doesn’t implement major user experience changes without updating the relevant documentation and training materials.
The change set has been peer reviewed.
The change set has a quickly testable outcome.
The change set is made using a tested, documented procedure with a predictable run time, preferably through automation.
The change set is quickly revertible, preferably through automation.
The system being modified has automated health checks to help monitor for unexpected side effects.
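The criteria above can be read as a checklist that must pass in full. The following is a minimal sketch of that reading, not a tool we maintain: the ChangeSet record, its field names, and the unmet_criteria function are hypothetical, and the 3-minute default is only the typical threshold named above, which should be lowered for high-visibility systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSet:
    """Hypothetical record of a reviewer's answers for one change set."""
    expected_downtime_minutes: float
    docs_and_training_current: bool  # True if no major UX change, or materials were updated
    peer_reviewed: bool
    quickly_testable: bool
    documented_procedure: bool
    quickly_revertible: bool
    automated_health_checks: bool


def unmet_criteria(change: ChangeSet, max_downtime_minutes: float = 3.0) -> list[str]:
    """Return the criteria the change set fails; an empty list means it qualifies."""
    checks = [
        ("low perceived impact", change.expected_downtime_minutes <= max_downtime_minutes),
        ("documentation and training current", change.docs_and_training_current),
        ("peer reviewed", change.peer_reviewed),
        ("quickly testable", change.quickly_testable),
        ("tested, documented procedure", change.documented_procedure),
        ("quickly revertible", change.quickly_revertible),
        ("automated health checks", change.automated_health_checks),
    ]
    # All criteria must hold, so report every one that does not.
    return [name for name, passed in checks if not passed]
```

A change set qualifies for unscheduled deployment only when the returned list is empty. Returning the failed criteria, rather than a bare yes or no, gives the requester something concrete to address before asking again.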
Examples of Standard Changes
Updates to add a new host name or redirect to the reverse proxy.
Updates to add additional storage or swap.
Updates to scheduled jobs (e.g. cron).
Emergency Changes
This policy does not apply to change sets applied to fix an ongoing outage or to prevent an imminent outage.
Assets & Configuration
The following assets are relevant to this policy:
Ansible roles and playbooks are recommended for automation. See our prod_infrastructure repository.
Our standard monitoring tooling (Zabbix, etc.) supports automated health checks, and should be updated as needed.
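One way the automation and health checks named here might fit together is an apply, verify, revert loop. The sketch below is illustrative only: apply_with_rollback and its parameters are hypothetical, and the three callables stand in for whatever actually runs the playbook, queries monitoring, and rolls back. It calls no Ansible or Zabbix API.

```python
import time
from typing import Callable


def apply_with_rollback(
    apply: Callable[[], None],
    is_healthy: Callable[[], bool],
    revert: Callable[[], None],
    checks: int = 3,
    interval_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Apply a change, poll a health check, and revert on the first failure.

    Returns True if every health check passed, False if the change was reverted.
    """
    apply()
    for attempt in range(checks):
        if attempt > 0:
            # Wait between polls so slow-developing side effects can surface.
            sleep(interval_seconds)
        if not is_healthy():
            revert()
            return False
    return True
```

Passing the steps in as callables keeps the sketch independent of any particular tool, and lets the procedure be rehearsed against stand-ins before it touches production.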