From YouTube: What Happens When Something Goes Wrong? On Kubernetes Reliability - Marek Grabowski & Tina Zhang

Description

What Happens When Something Goes Wrong? On Kubernetes Reliability [I] - Marek Grabowski & Tina Zhang, Google

One of the best features of Kubernetes is that it can automatically recover from various failures and keep your application working despite unfavorable circumstances. There are moments when this works like magic and operators won't even notice that anything happened. Sadly, sometimes automation fails.

In this talk we're going to describe the various policies and mechanisms implemented in the system to keep user applications, and the cluster in general, running. We'll talk both about things that happen automatically and about those that users need to configure.

About Marek Grabowski
Marek is a Software Engineer who became a Site Reliability Engineer in late 2017. He currently focuses on the reliability of Kubernetes clusters. He has been working on Google’s Technical Infrastructure since 2013 and joined the Kubernetes engineering team in early 2015. In Kubernetes his main focus was scalability and machine management. Before Kubernetes he worked on Google's internal orchestrator in the Omega project.

About Tina Zhang
Tina joined Google as a Site Reliability Engineer for GKE in March 2017 and has primarily been working on delivering High Availability Masters in GKE, bringing GKE to more cloud regions, and improving monitoring and alerting for the system. In a previous life, she was an investment banker at J.P. Morgan.