What Are SLIs, SLOs, SLAs, and Error Budgets in Site Reliability Engineering?

  • Site Reliability Engineering (SRE) is a discipline that automates IT operations tasks, such as production system management, change management, incident response, and even emergency response, that systems administrators (sysadmins) would otherwise perform manually.
  • It can help organizations to improve the reliability of their software systems. 
  • SRE uses SLIs, SLOs, error budgets, and SLAs to measure reliability, set goals for it, and manage it, so that software systems meet user expectations while leaving some room for flexibility.
  • In this blog, we will understand these terms through the example of a person who has set up a weekly walking goal of 66,000-70,000 steps.
  • SLI (Service Level Indicator) is a measure of the quality of a service. In our example, the SLI tells you how well you’re doing against your walking goal: it is the number of steps you actually walk each day. For example, if you walk 8,000 steps today, today’s SLI is 8,000 steps.
  • SLO (Service Level Objective) is a target value for an SLI. To achieve our weekly goal of 66,000-70,000 steps, our SLO will be around 9,430-10,000 steps per day (66,000 ÷ 7 ≈ 9,430 and 70,000 ÷ 7 = 10,000).
  • Error Budget is the amount of downtime or the number of errors that a service is allowed to have before it violates its SLO. In our example, suppose you set an error budget of 500 steps. This means you’re okay with not hitting exactly 10,000 steps every day; you have some room for variation.
  • SLA (Service Level Agreement) is an agreement about the level of service to be delivered; for a real service, it is made between the provider and its customers. In our example, it is the promise you make about your walking performance: “I promise to walk about 10,000 steps every day (SLO), and I’m okay with landing anywhere between 9,430 and 10,500 steps (error budget).” So, your SLA sets the expectations for how well you aim to meet your walking goal while allowing for some leeway.
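
The arithmetic behind the daily SLO can be sketched in a few lines of Python. This is only an illustration of the walking example: the function name daily_slo_range and the choice to round to the nearest 10 steps are assumptions made here, not part of any SRE tool.

```python
import math


def daily_slo_range(weekly_min, weekly_max, days=7, round_to=10):
    """Turn a weekly step goal into a (low, high) daily target range.

    The low end is rounded up and the high end rounded down (to the
    nearest `round_to` steps) so that hitting the daily range every day
    always keeps the weekly total inside the weekly goal.
    """
    low = math.ceil(weekly_min / (days * round_to)) * round_to
    high = math.floor(weekly_max / (days * round_to)) * round_to
    return low, high


# Weekly goal from the example: 66,000-70,000 steps.
low, high = daily_slo_range(66_000, 70_000)
print(f"Daily SLO: {low:,} to {high:,} steps")  # Daily SLO: 9,430 to 10,000 steps
```

Rounding the lower bound up rather than down is a deliberate choice: seven days at 9,430 steps gives 66,010 steps, which still clears the 66,000-step weekly minimum.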

As you work toward your walking goal, you monitor your performance (SLI) by checking the number of steps you walk each day. Your target (SLO) is to walk 9,430-10,000 steps daily. However, you allow yourself some room for variation (error budget) because it’s normal to have days when you walk a little more or a little less than 10,000 steps.

By tracking your SLI and staying within your error budget, you remain on course to meet your overall weekly goal (SLO).
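
To make the weekly check concrete, here is a minimal Python sketch. It rests on a few assumptions: the 500-step error budget is treated as the allowed shortfall below the 10,000-step daily target (one possible reading of the example), the function name review_week is made up for this post, and the step counts in sample_week are invented purely for illustration.

```python
DAILY_SLO = 10_000         # daily target from the example
DAILY_ERROR_BUDGET = 500   # assumed: allowed shortfall below the target per day
WEEKLY_GOAL_MIN = 66_000   # lower end of the weekly goal


def review_week(daily_steps):
    """Compare each day's SLI with the SLO and summarise the week."""
    days = []
    for day, steps in enumerate(daily_steps, start=1):
        # Walking more than the target never counts against the budget.
        shortfall = max(0, DAILY_SLO - steps)
        days.append({
            "day": day,
            "sli": steps,
            "shortfall": shortfall,
            "within_budget": shortfall <= DAILY_ERROR_BUDGET,
        })
    total = sum(daily_steps)
    return {
        "days": days,
        "total_steps": total,
        "weekly_goal_met": total >= WEEKLY_GOAL_MIN,
    }


# Illustrative step counts only, not real data.
sample_week = [10_200, 9_600, 9_500, 10_000, 9_800, 10_500, 9_900]
summary = review_week(sample_week)

for entry in summary["days"]:
    status = "ok" if entry["within_budget"] else "over budget"
    print(f"Day {entry['day']}: {entry['sli']:,} steps ({status})")
print(f"Weekly total: {summary['total_steps']:,} steps, "
      f"goal met: {summary['weekly_goal_met']}")
```

In this sample week, several days fall short of 10,000 steps, but every shortfall stays within the 500-step budget and the weekly total still lands inside the 66,000-70,000 goal.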
