97 things every SRE should know
SRE is not a job, it is a hat.
- Measure with a goal in mind. 
- Analyze what you are measuring. 
SLO (service level objective): a reliability target that you set for your service.
SLI (service level indicator): a measurement of the system from the user's point of view.
Error budget: the amount of unreliability the SLO allows; how much of it is left shows how you performed against your SLOs.
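
To make the relationship between the three terms concrete, here is a minimal sketch of the arithmetic for a request-based availability SLO. The function name `error_budget`, its parameters, and the returned keys are hypothetical names chosen for illustration; the notes do not prescribe any particular formula, and time-based SLOs would need a different calculation.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize an error budget for a request-based availability SLO.

    slo_target is a fraction such as 0.999 (assumed: "good requests / all requests").
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: what users actually experienced in the window.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: how many failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests

    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_remaining": allowed_failures - failed_requests,
        "budget_consumed_fraction": failed_requests / allowed_failures,
        "slo_met": sli >= slo_target,
    }


if __name__ == "__main__":
    report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    for key, value in report.items():
        print(f"{key}: {value}")
```

With a 99.9% target over 1,000,000 requests, about 1,000 failed requests are allowed, so 400 failures consume roughly 40% of the budget.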
It is not your application, it is OUR application.
Good enough is good enough. Aiming for perfection is expensive in both human and financial resources.
Improve resilience:
- Load reduction
  - Throttling
  - Load shedding
  - Prioritization
  - Queuing
  - Load balancing
- Latency reduction
  - Caching
  - Regional replication
- Load adaptation
  - Autoscaling
  - Over-provisioning
- Resilience
  - Timeouts
  - Circuit breakers
  - Bulkheads?
  - Retries
  - Failovers
  - Failbacks
- Meta techniques
  - Improving tooling
  - Scaling faster (depending on the situation)
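
As an illustration of the load-reduction items above (throttling and load shedding), here is a minimal token-bucket sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for this example, not something the notes specify; a production limiter would also need thread safety and per-client buckets.

```python
import time


class TokenBucket:
    """Admit requests at a sustained `rate` per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock            # injectable so the bucket can be tested without sleeping
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, False if it should be shed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10)
    accepted = sum(1 for _ in range(25) if bucket.allow())
    print(f"accepted {accepted} of 25 back-to-back requests")
```

A caller that receives `False` can reject the request immediately (load shedding) or place it in a queue, which connects this sketch to the queuing and prioritization items in the same group.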
 
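
The circuit breaker item in the resilience group can be sketched in the same spirit. This is a simplified version under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are hypothetical names, every exception counts as a failure, and a single trial call is let through once the timeout has elapsed.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast instead of piling load onto a struggling dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In practice a breaker like this is usually combined with per-call timeouts and bounded retries, so that a slow dependency is counted as a failure rather than holding callers indefinitely.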