Site Reliability Engineering (SRE)
1. Introduction to Site Reliability Engineering
2. Core Principles of SRE
3. Service Level Management
4. Observability and Monitoring
5. Incident Management and On-Call
6. Toil Management and Automation
7. Change and Release Management
8. System Design for Reliability
9. SRE Organization and Culture
10. Advanced SRE Practices
Toil Management and Automation
Understanding Toil
    Characteristics of Toil
        Manual Work
        Repetitive Tasks
        Automatable Activities
        Tactical Work
        Lacking Enduring Value
    Differentiating Toil from Engineering Work
    Hidden Toil Identification
Measuring and Tracking Toil
    Toil Tracking Methods
    Quantifying Toil Impact
    Analyzing Toil Trends
    Toil Reporting and Metrics
Toil Reduction Strategies
    50% Project Work Goal
    Prioritizing Toil Reduction
    Automation Development
    Process Improvement
    Delegating or Eliminating Toil
Automation Frameworks
    Infrastructure as Code
        Declarative vs Imperative Approaches
        Configuration Management
        Infrastructure Provisioning
        Version Control for Infrastructure
    Runbook Automation
    Self-Healing Systems
    Automated Remediation