youtube image
From YouTube: 101 Ways to Crash Your Cluster [I] - Marius Grigoriu & Emmanuel Gomez, Nordstrom

Description

101 Ways to Crash Your Cluster [I] - Marius Grigoriu & Emmanuel Gomez, Nordstrom

Running a kubernetes cluster requires operating many components. One must be good at running and scaling etcd, multiple control plane components, a monitoring system, a logging pipeline, Docker, rkt, and Linux itself. And this list isn't even close to being complete. With such a long list of technologies comes the potential to make a mistake that brings the whole cluster down. Come hear war stories from the Nordstrom's Kubernetes cluster admins. Each is a true story of how the cluster melted down, how they recovered, and what they did to prevent it from happening again. Don't let any of these happen to you...

About Emmanuel Gomez
Emmanuel initiated and served as tech lead on the Kubernetes platform efforts at Nordstrom for the last three years. He was working with and advocating for containers before the Kubernetes 1.0 release and has continuously (and tirelessly) developed, operated, educated, and led containerization efforts there.

This work has forced him to grapple with many of the challenges that come along with the opportunities of containers and container scheduling. Challenges both technical (ex: complex distributed systems, microservices observability), and organizational (ex: inertia, fragmentation, training). Despite these experiences, he wouldn't trade the new problems back for the old.

About Marius Grigoriu
Marius Grigoriu leads the teams responsible for all of the major tools along the software delivery pipeline: issue tracking, version control, continuous integration and deployment, and production through the use of Kubernetes. His focus is to help teams ship high quality systems on time, on budget, and with a smile.

Off the job, Marius can still be found at the keyboard, whether writing Golang or playing classical piano.
Join us for KubeCon + CloudNativeCon in Barcelona May 20 - 23, Shanghai June 24 - 26, and San Diego November 18 - 21! Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy and all of the other CNCF-hosted projects.