Site Reliability Engineering (SRE)
System Design for Reliability
Designing for Failure
Failure Mode Analysis
Single Points of Failure
Redundancy and Replication
Graceful Degradation
Fault Isolation
Circuit Breaker Patterns
Scalability and Performance
Load Balancing Strategies
Global Server Load Balancing
Regional and Local Load Balancing
Traffic Shaping and Throttling
Caching Strategies
Capacity Planning
Demand Forecasting
Resource Utilization Analysis
Provisioning for Growth
Performance and Load Testing
Scaling Strategies
Horizontal Scaling
Vertical Scaling
Auto-scaling
Disaster Recovery
Disaster Recovery Planning
Data Backup Strategies
Restoration Procedures
Recovery Point Objective
Recovery Time Objective
Disaster Recovery Testing
Tabletop Exercises
Partial Failover Tests
Full-Scale Drills
Business Continuity Planning