Kubernetes / Reliability & Testing Resources


These are all the meetings we have in "Reliability & Testing" (part of the organization "Kubernetes"). Click into an individual meeting page to watch the recording and search or read the transcript.

15 Apr 2022

Context:
Discussion on KEP for improving reliability: https://github.com/kubernetes/enhancements/pull/3139#issuecomment-1095771101
Mar. 17th community meeting:
Notes: https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY/edit#bookmark=id.45wmiyb70mnb
Recording: https://www.youtube.com/watch?v=m1nNW7gnbU0&t=26m55s


Health indicators we already have (and how to improve them)
kind/regression bugs (https://github.com/kubernetes/kubernetes/issues?q=label%3Akind%2Fregression)
AI: label issues/PRs related to regressions in your area
These represent issues about things that used to work and then stopped working. We are starting to look at PRs against release branches to see whether they are fixing regressions or long-standing bugs. It doesn't matter how awesome new features are if regressions in the release keep users from upgrading.


long-standing + priority/important-* bugs (~trailing indicator)
https://github.com/kubernetes/kubernetes/issues?q=is:open+label%3Akind%2Fbug+label%3Apriority%2Fimportant-soon%2Cpriority%2Fimportant-longterm%2Cpriority%2Fcritical-urgent

AI: regularly check for these issues in your component/area
Bugs indicate health issues. If new features touch areas with known bugs, we should ask whether we should accept those features at all, and be careful about accepting changes in fragile areas. We have a duty to our users.
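The two label searches linked above can also be generated programmatically, e.g. for a periodic SIG health check. A minimal sketch (string construction only, no network calls; it assumes GitHub's issue-search syntax, where comma-separated values in a single `label:` qualifier mean logical OR, as in the URL above):

```python
# Sketch: build the GitHub issue-search queries referenced above, so a SIG
# can re-run them on a schedule. Pure string construction; no API calls.

def regression_query(repo: str = "kubernetes/kubernetes") -> str:
    """Open issues/PRs labeled kind/regression (things that stopped working)."""
    return f"repo:{repo} is:open label:kind/regression"

def long_standing_bug_query(repo: str = "kubernetes/kubernetes") -> str:
    """Open bugs carrying any of the priority/important-* or critical labels.
    Commas inside one label: qualifier are treated as OR by GitHub search."""
    priorities = ",".join([
        "priority/important-soon",
        "priority/important-longterm",
        "priority/critical-urgent",
    ])
    return f"repo:{repo} is:open label:kind/bug label:{priorities}"

print(regression_query())
print(long_standing_bug_query())
```

Either string can be pasted into the GitHub issue search box, or passed as the `q` parameter to the search API.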

test flakes (~leading indicator)
AI: capture these in kind/flake bugs with details
Hopefully we can make use of the SIG-focused triage board, which lets you filter for a specific SIG. We rely heavily on tests; if the tests are not giving a great signal, then we don't have a reliable floor to know whether new work is destabilizing an area.
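The "leading indicator" idea can be illustrated with a toy sketch (the data shape and test names here are invented for illustration, not from any Kubernetes tooling): a test that both passes and fails across recent runs of the same code is a flake candidate, whereas a consistently failing test is more likely a real bug.

```python
# Hypothetical sketch: flag tests with mixed pass/fail results across recent
# CI runs as flake candidates -- the "leading indicator" discussed above.
from collections import defaultdict

def flake_candidates(runs):
    """runs: iterable of (test_name, passed: bool) from recent CI runs.
    Returns names that have at least one pass AND at least one failure."""
    outcomes = defaultdict(set)
    for name, passed in runs:
        outcomes[name].add(passed)
    return sorted(name for name, seen in outcomes.items() if seen == {True, False})

runs = [
    ("TestVolumeAttach", True),
    ("TestVolumeAttach", False),  # mixed results -> flake candidate
    ("TestPodStartup", True),
    ("TestPodStartup", True),     # consistently passing -> healthy
    ("TestNodeDrain", False),
    ("TestNodeDrain", False),     # consistently failing -> likely a real bug
]
print(flake_candidates(runs))  # -> ['TestVolumeAttach']
```

Capturing each candidate in a kind/flake issue with the failing runs attached gives reviewers the details the AI above asks for.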


"known fragile" areas missing test coverage
AI: capture these in priority/important-* bugs with details
When you fix a regression, insist on a test that checks for that specific regression. If we want our areas to remain healthy, we should also do a mini "post-mortem" on each regression and find out how we can prevent it. Multiple regressions in the same area are a loud signal that the area is fragile, and might mean we're missing a category/class of testing. How do we ensure an area has a good foundation so we can accept new features in that area? After a regression, we should file a long-term issue to identify what the gap was.
  • 1 participant
  • 18 minutes

12 Mar 2021

Dan Mangum and Rob Kielty are back for the second episode of Flake Finder Fridays. In this episode they walk through how to run Kubernetes e2e tests locally, as well as how the tests are packaged and run in CI environments.
  • 2 participants
  • 52 minutes

5 Feb 2021

Rob Kielty and Dan Mangum kick off Flake Finder Fridays, a new Kubernetes community livestream where we explore building, testing, CI and all other aspects of delivering Kubernetes artifacts to end users in a consistent and reliable manner. In this first episode, Rob and Dan are going to look at recent failures in a Kubernetes build job and chat a little bit about why it was failing, what tooling is used to build Kubernetes, and the infrastructure underlying all Kubernetes CI jobs.
  • 2 participants
  • 44 minutes

27 Nov 2019

This will be a live API review, going through a real PR and showing how it's done. It will cover API norms, less-well-known conventions, rationales, validation, defaulting, and other important API concepts.

This is an opportunity to learn how to make your API review PRs go through faster and easier, with fewer revisions. It's also a great way to see how to do API reviews, in order to start down the path of becoming an API reviewer yourself. Every SIG needs to have active API reviewers to make development smoother and faster, so why not you?

HOW TO PREPARE BEFORE THE WORKSHOP

In order to make the best use of our limited time, please prepare ahead of time.

Reading:

Being familiar with the API conventions and API changes documents would help you get the most out of this workshop.

Laptop and build environment:

A working kubernetes build/test environment is only required if you want to try out API tests and code generation on your own during the workshop. In that case, you should have a laptop capable of building and running basic Kubernetes binaries, with the following software installed:

Go 1.13 (This is a change from the originally advertised Go 1.12)
Docker
git
make
kubernetes/kubernetes GitHub repo
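A quick way to sanity-check the Go prerequisite before the workshop is to parse the output of `go version`. This is a small hedged helper (the sample line below is illustrative; run `go version` yourself to get the real one):

```python
# Sketch: check that an installed Go toolchain meets the workshop's
# Go 1.13 prerequisite by parsing a `go version` output line.
import re

def parse_go_version(output: str):
    """Extract (major, minor) from a `go version` line,
    e.g. 'go version go1.13.4 linux/amd64' -> (1, 13)."""
    m = re.search(r"go(\d+)\.(\d+)", output)
    if not m:
        raise ValueError(f"unrecognized go version output: {output!r}")
    return int(m.group(1)), int(m.group(2))

def meets_requirement(output: str, required=(1, 13)) -> bool:
    """Tuple comparison handles major/minor ordering correctly."""
    return parse_go_version(output) >= required

# Example against a sample line:
sample = "go version go1.13.4 linux/amd64"
print(parse_go_version(sample))       # -> (1, 13)
print(meets_requirement(sample))      # -> True
```

Checking this ahead of time avoids discovering mid-workshop that code generation fails on an older toolchain.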

Event link: https://events19.linuxfoundation.org/events/kubernetes-contributor-summit-north-america-2019/
Session link: https://sched.co/Vv6Y
  • 3 participants
  • 1:16 hours

26 Nov 2019

  • 2 participants
  • 17 minutes

19 Dec 2018

This was an unconference session and therefore has no proper description.
  • 6 participants
  • 48 minutes

17 Dec 2018

Daniel Smith projects his screen and reviews API-changing PRs while giving live commentary!

Presenter: Daniel Smith, Google
  • 7 participants
  • 51 minutes