Kubernetes - Geiser's Cloud

Paging into the Night—Assess Before You Fix: Five Years of On-Call Lessons

29 May 2025 7 min read SRE

Five years on-call taught me the pager’s first demand is context, not heroics. From half-built clusters to rogue upgrades, this post shares war stories, triage tactics, and manager tips for keeping incidents—and engineers—under control.

Kubernetes Cost-Cutting That Actually Works: Rightsizing at Scale

26 May 2025 7 min read Kubernetes

Cluster idling at 8% while the bill soars? Learn how Prometheus data, KRR, a 50-line Python wrapper, Grafana, and ArgoCD reclaimed 500 vCPU and 200 GiB across dozens of Kubernetes clusters: no magic, no incidents, just rightsizing done right.

When 'df' lies, 'du' swears it’s innocent and Loki eats your disk: a forensic walk-through

22 May 2025 4 min read Kubernetes

When df lies and du swears, look for Loki’s orphaned WAL segments. Our prod cluster filled up every week until we purged legacy boltdb-shipper data from S3. This post covers the postmortem, the fix steps, and preventive checks.

Kubernetes Operators 101: A Hands-On Tale of Go, CRDs, and Redis

20 May 2025 5 min read Kubernetes

After seven years running clusters I finally built my first Kubernetes Operator—a Redis PoC in Go with Operator-SDK. This post demystifies CRDs, reconcile loops, defaults, secrets, and status conditions, sharing hard-won lessons and next steps for anyone Operator-curious.

"DDoSing" EKS & GKE to Study DR for My MSc Dissertation

14 May 2025 6 min read Disaster Recovery

Deploying on Kubernetes doesn't equal disaster recovery. My MSc research made this clear: comparing recovery scenarios on AWS EKS and GKE with Velero backups yielded practical lessons.