97 things every SRE should know

SRE is not a job, it is a hat.

  1. Measure with a goal in mind.

  2. Analyze what you are measuring.

SLO -> Service level objectives | set a “reliability” target for your service

SLI -> Service level indicators | measure the system from the user's point of view

Error budgets | the unreliability your SLOs allow; what is left of the budget shows how you performed against your SLOs
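To make the relationship between the three terms concrete, here is a minimal sketch in Python. It assumes an availability-style SLI (the fraction of successful requests), which is only one possible choice; the function name `error_budget_report` and the sample numbers are hypothetical and do not come from the book.

```python
def error_budget_report(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compare a measured SLI against an SLO and report the error budget.

    Assumption: the SLI is the fraction of successful requests in the window.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("request counts are inconsistent")

    # SLI: what the users actually experienced.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests

    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_failures": allowed_failures,
        # Negative means the budget is overspent.
        "budget_remaining": allowed_failures - failed_requests,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 failures, 99.9% SLO.
    print(error_budget_report(0.999, 1_000_000, 400))
```

With a 99.9% target, roughly one request in a thousand may fail before the budget is spent, so a window with fewer failures than that leaves budget for riskier changes.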

It is not your application, it is OUR application.

Good enough is good enough; aiming for perfection is expensive in both human and financial resources.

Improve resilience:

  1. Load reduction

    1. Throttling

    2. Load shedding

    3. Prioritization

    4. Queuing

    5. Load balancing

  2. Latency reduction

    1. Caching

    2. Regional replication

  3. Load adaptation

    1. Autoscaling

    2. Overprovisioning

  4. Resilience

    1. Timeouts

    2. Circuit breakers

    3. Bulkheads?

    4. Retries

    5. Failovers

    6. Failbacks

  5. Meta techniques

    1. Improving tooling

    2. Scale faster (depending on the situation)
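Two of the resilience techniques listed above, retries and circuit breakers, are often combined. The following is a minimal sketch of how that might look in Python, not an implementation from the book: the names `CircuitBreaker`, `CircuitOpenError` and `retry`, and all default thresholds and delays, are hypothetical choices made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """Stops calling a failing dependency for a cool-down period."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, call rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.1,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry with exponential backoff; never retries against an open circuit."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # retrying would only add load to a dependency known to be down
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call site would wrap the dependency as `retry(lambda: breaker.call(fetch))`, where `fetch` stands for any zero-argument function that may raise. The clock and sleep functions are injectable so the behaviour can be tested without waiting; production code would usually also add jitter to the backoff and a timeout around `func`.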

Last updated 2 years ago