15 Apr 2022
Context:
Discussion on KEP for improving reliability: https://github.com/kubernetes/enhancements/pull/3139#issuecomment-1095771101
Mar. 17th community meeting:
Notes: https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY/edit#bookmark=id.45wmiyb70mnb
Recording: https://www.youtube.com/watch?v=m1nNW7gnbU0&t=26m55s
Health indicators we already have (and how to improve them)
kind/regression bugs (https://github.com/kubernetes/kubernetes/issues?q=label%3Akind%2Fregression)
AI: label issues/PRs related to regressions in your area
These represent issues where something that used to work has stopped working. We are starting to look at PRs against release branches to see whether they fix regressions or long-standing bugs. It doesn’t matter how awesome new features are if regressions in the release keep users from upgrading.
long-standing + priority/important-* bugs (~trailing indicator)
https://github.com/kubernetes/kubernetes/issues?q=is:open+label%3Akind%2Fbug+label%3Apriority%2Fimportant-soon%2Cpriority%2Fimportant-longterm%2Cpriority%2Fcritical-urgent
AI: regularly check for these issues in your component/area
Bugs indicate health issues: when new features touch areas with known bugs, we should ask whether to accept those features at all. Be careful about accepting changes in fragile areas; we have a duty to our users.
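The saved search above can also be rebuilt programmatically, e.g. for a periodic triage script. A minimal Python sketch (the helper name and its defaults are illustrative assumptions, not part of the notes):

```python
from urllib.parse import quote

# Illustrative helper, not an official tool: builds a GitHub issue-search
# URL equivalent to the saved query linked above.
def k8s_issue_search_url(labels_any, labels_all=("kind/bug",)):
    # In GitHub's search syntax, separate label: qualifiers are ANDed,
    # while commas inside a single label: qualifier are ORed.
    parts = ["is:open"]
    parts += [f"label:{label}" for label in labels_all]
    parts.append("label:" + ",".join(labels_any))
    return ("https://github.com/kubernetes/kubernetes/issues?q="
            + quote(" ".join(parts), safe=""))

url = k8s_issue_search_url([
    "priority/important-soon",
    "priority/important-longterm",
    "priority/critical-urgent",
])
```

Opening the resulting URL shows the same filtered issue list as the saved query, so the check can be bookmarked or scripted rather than rebuilt by hand.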
test flakes (~leading indicator)
AI: capture these in kind/flake bugs with details
Hopefully making use of the SIG-focused triage board, which lets you filter for a specific SIG. We rely heavily on tests; if the tests are not giving a good signal, we don't have a reliable floor for knowing whether new work is destabilizing an area.
"known fragile" areas missing test coverage
AI: capture these in priority/important-* bugs with details
When you fix a regression, insist on a test that checks for that specific regression. If we want our areas to stay healthy, we should also do a mini "post-mortem" on each regression and find out how it could have been prevented. Multiple regressions in the same area are a loud signal that the area is fragile, and might mean we’re missing a category/class of testing. The question is how to ensure an area has a good enough foundation that we can accept new features in it. After a regression, we should open a long-term issue to identify what the gap was.
- 1 participant
- 18 minutes
12 Mar 2021
Dan Mangum and Rob Kielty are back for the second episode of Flake Finder Fridays. In this episode they walk through how to run Kubernetes e2e tests locally, as well as how those tests are packaged and run in CI environments.
- 2 participants
- 52 minutes
5 Feb 2021
Rob Kielty and Dan Mangum kick off Flake Finder Fridays, a new Kubernetes community livestream where we explore building, testing, CI, and all other aspects of delivering Kubernetes artifacts to end users in a consistent and reliable manner. In this first episode, Rob and Dan look at recent failures in a Kubernetes build job and chat a little bit about why it was failing, what tooling is used to build Kubernetes, and the infrastructure underlying all Kubernetes CI jobs.
- 2 participants
- 44 minutes
27 Nov 2019
This will be a live API review, going through a real PR and showing how it's done. It will cover API norms, less-well-known conventions, rationales, validation, defaulting, and other important API concepts.
This is an opportunity to learn how to make your API review PRs go through faster and easier, with fewer revisions. It's also a great way to see how to do API reviews, in order to start down the path of becoming an API reviewer yourself. Every SIG needs to have active API reviewers to make development smoother and faster, so why not you?
HOW TO PREPARE BEFORE THE WORKSHOP
In order to make the best use of our limited time, please prepare ahead of time.
Reading:
Being familiar with the API conventions and API changes documents would help you get the most out of this workshop.
Laptop and build environment:
A working Kubernetes build/test environment is only required if you want to try out API tests and code generation on your own during the workshop. In that case, you should have a laptop capable of building and running basic Kubernetes binaries, with the following software installed:
Go 1.13 (This is a change from the originally advertised Go 1.12)
Docker
git
make
kubernetes/kubernetes GitHub repo
Event link: https://events19.linuxfoundation.org/events/kubernetes-contributor-summit-north-america-2019/
Session link: https://sched.co/Vv6Y
- 3 participants
- 1:16 hours
26 Nov 2019
Event link: https://events19.linuxfoundation.org/events/kubernetes-contributor-summit-north-america-2019/
Session link: https://sched.co/VvNY
- 2 participants
- 17 minutes
19 Dec 2018
This was an unconference session and therefore has no proper description.
- 6 participants
- 48 minutes
17 Dec 2018
Daniel Smith projects his screen and reviews API-changing PRs while giving live commentary!
Presenter: Daniel Smith, Google
- 7 participants
- 51 minutes