Site Reliability Engineering (SRE)
1. Introduction to Site Reliability Engineering
2. Core Principles of SRE
3. Service Level Management
4. Observability and Monitoring
5. Incident Management and On-Call
6. Toil Management and Automation
7. Change and Release Management
8. System Design for Reliability
9. SRE Organization and Culture
10. Advanced SRE Practices
Service Level Management
Service Level Indicators
Defining User Happiness
Mapping SLIs to User Experience
Choosing Appropriate SLIs
Common SLI Types
Availability
Latency
Error Rate
Throughput
Durability
Custom SLIs for Specific Services
SLI Implementation Patterns
SLI Data Collection Methods
Service Level Objectives
Setting Realistic Reliability Targets
SLO Definition Process
SLOs for Different Stakeholders
Documenting and Communicating SLOs
Reviewing and Revising SLOs
SLO Compliance Measurement
Multi-Window SLOs
Error Budgets
Error Budget Concept
Calculating Error Budgets
Error Budget in Decision Making
Error Budget Policies
Balancing Reliability with Feature Velocity
Error Budget Burn Rate
Error Budget Alerting
Service Level Agreements
Distinguishing SLAs from SLOs
Legal and Contractual Aspects
Business and Legal Implications
Managing SLA Breaches
SLA Negotiation Strategies