What is an SLA? – SQL Governor

In SQL Governor, an SLA (service-level agreement) is a way to measure your environment's capacity over a longer time interval.

Like warnings and alerts, SLAs are based on performance counters. The difference is that an alert is raised whenever one or more consecutive counter values exceed the threshold, whereas an SLA can allow several alerts before it is considered to have failed. For example, an SLA might say something like "my server's CPU usage should be under 75% for 99.9% of the time".

How is an SLA calculated?

To calculate the SLA's error percentage, the number of bad values (hours where the average counter value exceeds the alert threshold) is divided by the number of all values and multiplied by 100. Because SLAs work at an hourly level, the number of all values equals the number of hours in the time period you've selected for the SLA. The success percentage of the SLA is 100 minus the error percentage.
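As a rough illustration, this calculation can be sketched in Python. The helper name `sla_percentages` and its input (a plain list of hourly average counter values) are hypothetical and not part of SQL Governor; the sketch assumes that an hour counts as bad when its average is strictly greater than the alert threshold.

```python
from typing import Sequence


def sla_percentages(hourly_averages: Sequence[float], threshold: float) -> tuple[float, float]:
    """Return (error_percentage, success_percentage) for one SLA window.

    Hypothetical helper: assumes one average counter value per hour and
    treats an hour as "bad" when its average exceeds the threshold.
    """
    if not hourly_averages:
        raise ValueError("the SLA window must contain at least one hourly value")
    # Count the bad hours, then divide by all hours and scale to a percentage.
    bad = sum(1 for value in hourly_averages if value > threshold)
    error_pct = bad / len(hourly_averages) * 100
    # Success percentage is 100 minus the error percentage.
    return error_pct, 100 - error_pct


# Four hours of CPU usage %, one of which exceeds the 75% threshold.
error_pct, success_pct = sla_percentages([40.0, 80.0, 60.0, 70.0], threshold=75)
print(error_pct, success_pct)  # 25.0 75.0
```

A real SLA window would contain many more hourly values (720 for 30 days), but the arithmetic is the same.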

For example, assume that our SLA has the following properties:

Level: Server
Performance counter: CPU usage %
Period: Month
Is sliding: Yes
Alert threshold: 75
Percentage: 99

Since our SLA has a sliding window of 30 days, the number of all possible values equals the number of hours in 30 days, i.e. 30 * 24 = 720. Since the percentage defined for the SLA is 99, the allowed error is 100 - 99 = 1%, so the number of bad values allowed is 0.01 * 720 = 7.2, which rounds down to 7. In other words, the SLA we've defined allows 7 hours per month where the average CPU usage percent exceeds 75 before it fails. Once there are 8 or more such hours, the SLA has failed.
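The same arithmetic can be sketched in Python. The helpers `allowed_bad_hours` and `sla_failed` are hypothetical illustrations, not SQL Governor functions; the sketch assumes a month is counted as 30 days and that the fractional allowance is rounded down, as in the example. The allowed error fraction is (100 - target) / 100, so a 99% target gives 0.01 and a 99.9% target gives 0.001.

```python
import math
from fractions import Fraction


def allowed_bad_hours(window_days: int, target_percentage: float) -> int:
    """Bad hours a sliding SLA window tolerates before it fails (hypothetical helper)."""
    total_hours = window_days * 24
    # Convert via str() so that e.g. 99.9 becomes exactly 999/10 and
    # floating-point noise cannot shift the result of the floor below.
    target = Fraction(str(target_percentage))
    error_fraction = (100 - target) / 100
    # The fractional allowance is rounded down, e.g. 7.2 -> 7.
    return math.floor(error_fraction * total_hours)


def sla_failed(bad_hours: int, window_days: int, target_percentage: float) -> bool:
    """True once the number of bad hours exceeds the allowance (hypothetical helper)."""
    return bad_hours > allowed_bad_hours(window_days, target_percentage)


# The worked example: 30-day sliding window, 99% target.
print(allowed_bad_hours(30, 99))  # 7
print(sla_failed(7, 30, 99))      # False
print(sla_failed(8, 30, 99))      # True
# Under the same rounding rule, a 99.9% target tolerates no bad hours (0.72 -> 0).
print(allowed_bad_hours(30, 99.9))  # 0
```

Exact fractions are used here only to keep the rounding step predictable; with plain floats, an allowance that should be a whole number could occasionally land just below it and be rounded down one hour too far.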