From YouTube: Kubernetes SIG Testing 2017-08-01
Description
Meeting notes: https://docs.google.com/document/d/1z8MQpr_jTwhmjLMUaqQyBk1EYG_Y_3D4y4YdMJ7V1Kk
C
I will go ahead and hand off to you. So, there was a comment on SIG Cluster Lifecycle. We were discussing this last week, not this week, regarding the fact that the SIG Testing infrastructure builds from the build directory and doesn't actually run the release bits. The comment we had made is that it doesn't get tested until the end of the release, and then only by a person.
B
Are you sure about that?
C
It is hard. It's not really just the piece you're concerned about; I think the entire set of build products that we produce is not regularly tested. kubeadm is a good example: as I was pointing out, it requires the debs and RPMs to be built in order for you to validate that everything runs properly, so you'd want to have those built as a regular build-chain artifact, and right now they're not; they're on their own island.
E
One thing that I had noticed, and it's actually kind of similar, is that if we started using some of the Jenkins pipeline stuff, you end up in this weird sort of fight over who gets to control what the step definitions are. And Travis only has resolution down to, you know, each of the steps of your build, and often it doesn't make as much sense to have a whole bunch of fine-grained steps in there; rather, it makes things harder or something.
F
Is anybody aware of other formats, aside from JUnit, for representing a workflow and the steps within it? Because, yeah, I think you're right: there are a lot of times where, you know, we have a JUnit result for something like kube-up, which technically isn't a test, but it's a useful thing to highlight that that phase failed, and so right now we're encoding that into a JUnit entry.
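As a sketch of that pattern (illustrative only, not the actual test-infra code; the function name and output path are made up), a non-test phase such as kube-up can be wrapped in a single JUnit test case so that JUnit-aware dashboards surface the phase failure:

    # Illustrative only: record a non-test phase (e.g. kube-up) as a JUnit
    # test case so dashboards that understand JUnit can show the phase result.
    import xml.etree.ElementTree as ET

    def write_phase_junit(phase_name, duration_sec, error_text, out_path):
        suite = ET.Element("testsuite", tests="1",
                           failures="1" if error_text else "0")
        case = ET.SubElement(suite, "testcase", name=phase_name,
                             time=str(duration_sec))
        if error_text:
            ET.SubElement(case, "failure").text = error_text
        ET.ElementTree(suite).write(out_path, encoding="utf-8",
                                    xml_declaration=True)

    # Hypothetical usage:
    # write_phase_junit("kube-up", 312.4, "cluster never became healthy",
    #                   "artifacts/junit_kube-up.xml")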
C
GitLab has their own YAML that you can specify. I'm not intimately familiar with it, I just know of it. And just a plus one to the readability of Travis: it's very easy to find out where exactly you failed and to dive directly into the part you care about, versus having to sift.
F
So is the essential idea that we run our jobs, our jobs produce the build log text and maybe also some JUnit XML artifacts, as well as some other potential artifacts, and then something notices that that build was uploaded and scans it and analyzes the logs? And maybe we say that we grep for these matching lines in these types of files. And what does it do with that? Does it spit out some other thing, which Gubernator then displays, as opposed to the raw build log? Yeah.
E
I think the general idea that I had, and this is kind of half-baked still right now, is that we could define "this is a failure cause", described by some regex or some block that we look for, and here are the types of log files that we're going to look for the failure inside of. Then, once we find it, I was hoping that we would be able to tag it, maybe with some opaque tags for searchability, and then also that whoever is configuring the flake or failure detection would be able to provide priority or importance values, because I think one of the things that's always really hard is ranking the relative importance of failures. But yeah, if we could take it and then describe it: here's the snippet of the log file, here are the links where you can find it online, here's the relative importance that we gave it, and here are some opaque tags so that we know, for instance, that this is an infrastructure failure. And then tools like Gubernator could use that, in addition to JUnit.
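To make that idea concrete, a failure-cause definition might look roughly like the sketch below; every field name, glob, tag, and the matching helper are hypothetical, purely to illustrate the shape of the configuration being described:

    # Hypothetical failure-cause config: a regex to look for, which log files
    # to search, opaque tags for searchability, and a priority for ranking.
    import glob
    import re

    FAILURE_CAUSES = [
        {
            "name": "node-not-ready",
            "regex": re.compile(r"node .* is not ready", re.IGNORECASE),
            "log_globs": ["artifacts/*/kubelet.log", "build-log.txt"],
            "tags": ["infrastructure-failure"],
            "priority": 10,
        },
    ]

    def find_failure_causes(run_dir):
        hits = []
        for cause in FAILURE_CAUSES:
            for pattern in cause["log_globs"]:
                for path in glob.glob(run_dir + "/" + pattern):
                    with open(path, errors="replace") as f:
                        for lineno, line in enumerate(f, 1):
                            if cause["regex"].search(line):
                                hits.append({
                                    "cause": cause["name"],
                                    "file": path,
                                    "line": lineno,
                                    "snippet": line.strip(),
                                    "tags": cause["tags"],
                                    "priority": cause["priority"],
                                })
        # Highest priority first, so a report could rank the likely causes.
        return sorted(hits, key=lambda h: -h["priority"])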
F
What are people's thoughts on how Gubernator does its output right now?
A
It sort of seems to highlight any time an error pops up in the log. I don't know; I personally end up spending a lot of cycles going to a Gubernator page, seeing the error text, and then going through the log file to find the exact part of the test where that error happened so I can troubleshoot. That's probably the round-trip time: I almost always have to go to the build log.
A
I sort of second Eric's comment: I think what's helpful for us is the ability to call out the phases of different steps, and then the output that corresponds to those steps is what we want to highlight. JUnit just seems to be a machine-readable format that a lot of common tools out there use today. Other consumers of the data in GCS, to me, would be Kettle, which I think actually consumes the GCS stuff and puts it into a format that Gubernator... I'm not sure, and then I also think there's Testgrid.
A
Another one would be, right, what we're doing with golint, for example: somebody put together a plugin that runs golint on demand against pull requests and then just comments on the actual pull request lines where that stuff is happening, so that if somebody addresses it and pushes a new commit, the comments just go away. That seems like really useful context for that class of error. But I agree that it would be nice if there were a better format than JUnit for this sort of "these steps happened, and this is the output".
E
Yeah, one of the things that I think we noticed downstream on OpenShift: there's a set of tools that was written that allows us to generate JUnit XML for specific lines of Bash, just wrapping it around them, and it becomes difficult. I think for some of these steps it'll be difficult to integrate with a tool like Testgrid, because there's not necessarily this temporal linking of these steps between every job.
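The "wrap a line of Bash and get JUnit out" idea might look something like the sketch below; the helper name, arguments, and script path are invented for illustration and are not the actual OpenShift tooling:

    # Hypothetical wrapper: run one shell command as a "step" and record it
    # as a JUnit test case, so the step shows up in JUnit-aware dashboards.
    import subprocess
    import time
    import xml.etree.ElementTree as ET

    def run_step_as_junit(name, command, out_path):
        start = time.time()
        proc = subprocess.run(command, shell=True,
                              capture_output=True, text=True)
        suite = ET.Element("testsuite", tests="1",
                           failures=str(int(proc.returncode != 0)))
        case = ET.SubElement(suite, "testcase", name=name,
                             time=str(time.time() - start))
        if proc.returncode != 0:
            failure = ET.SubElement(case, "failure",
                                    message="exit code %d" % proc.returncode)
            failure.text = proc.stdout + proc.stderr
        ET.ElementTree(suite).write(out_path, encoding="utf-8",
                                    xml_declaration=True)
        return proc.returncode

    # Hypothetical usage:
    # run_step_as_junit("cluster-up", "./hack/cluster-up.sh",
    #                   "artifacts/junit_cluster-up.xml")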
F
I would agree with Aaron and, you know, Tim, in that I pretty much find myself always clicking the expand, because inevitably the error that I'm actually after, or the actual cause, doesn't seem to be highlighted, since it doesn't have the word "fail" or "error" in it. And so I'm a little wary, and I'm also a bit averse to going down the path of enumerating every error that we want.
F
Gubernator actually processes that right now, which probably is not the end result that we want over time, but Gubernator actually reads the files directly and processes them. Kettle does the same thing and then uploads it into BigQuery, and then Testgrid has a job which is doing the same thing, where it reads all the build logs and the JUnit XML and converts them into a proto format that Testgrid understands. So, you know, potentially, yeah; I mean, it doesn't necessarily need to be temporally based. For example, when we run go test, those will be run...
F
You know, we have a bunch of tests that run in parallel a lot of the time, and those will all still get different rows in Testgrid. What I would really like to see is a format that allows us to represent "here's what happened during this run" in a way that's convenient for tools to output and convenient for humans to consume.
F
I mean, so what we do now: it's sort of agnostic to bootstrap; bootstrap actually doesn't really care. What bootstrap does is check out your repository, start some process that represents your job, log all of the output, and then upload it. It will also upload an artifacts directory: anything you put into the artifacts directory it will then upload to GCS. And so, you know, perhaps... well, I feel like, yeah.
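For orientation, that bootstrap behavior (check out the repo, run the job, capture the log, upload the log and anything in artifacts/ to GCS) can be sketched roughly as follows; this is a simplification, not the real bootstrap.py, and the bucket path in the usage comment is a placeholder:

    # Simplified sketch of the bootstrap flow described above; not the real
    # kubernetes/test-infra bootstrap.py.
    import subprocess

    def run_job(repo_url, job_cmd, gcs_prefix):
        subprocess.run(["git", "clone", repo_url, "repo"], check=True)
        # Start the process that represents the job and capture all output.
        with open("build-log.txt", "w") as log:
            result = subprocess.run(job_cmd, cwd="repo",
                                    stdout=log, stderr=subprocess.STDOUT)
        # Upload the log plus anything the job dropped into artifacts/.
        subprocess.run(["gsutil", "cp", "build-log.txt", gcs_prefix],
                       check=True)
        subprocess.run(["gsutil", "cp", "-r", "repo/artifacts",
                        gcs_prefix + "/artifacts"], check=False)
        return result.returncode

    # Hypothetical usage:
    # run_job("https://github.com/kubernetes/kubernetes",
    #         ["make", "test"], "gs://example-bucket/logs/my-job/42")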
F
Maybe, you know, defining either a proto or an XML format or a JSON format or a YAML format that represents what happened, and at the end we can update Testgrid and Gubernator to read all of that and display it, and then make it easier for jobs to produce it. That would be something that sounds nice to me.
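One entirely hypothetical shape for such a "what happened during this run" summary, written out as JSON just to illustrate the idea (none of these field names come from Testgrid or Gubernator):

    # Entirely hypothetical run-summary artifact; field names are invented.
    import json
    import os

    run_summary = {
        "job": "ci-kubernetes-e2e-gce",
        "result": "FAILURE",
        "phases": [
            {"name": "build", "result": "PASSED", "duration_sec": 620},
            {"name": "kube-up", "result": "PASSED", "duration_sec": 310},
            {"name": "e2e-test", "result": "FAILED", "duration_sec": 1450,
             "failure_snippet": "Expected error not to have occurred"},
            {"name": "kube-down", "result": "PASSED", "duration_sec": 95},
        ],
    }

    os.makedirs("artifacts", exist_ok=True)
    with open("artifacts/run-summary.json", "w") as f:
        json.dump(run_summary, f, indent=2)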
F
Yeah, I mean, I think it does require some discipline in how we actually write our tooling, and I'm not sure how palatable that is for people, but I feel like there's a general sense that the shell script usage is a little unwieldy. Like, I know Zach is working on converting the GKE kube-up into more of a, you know, essentially getting rid of all the cluster/gke scripts and replacing them with raw gcloud commands inside of kubetest.
E
For a job to generate the structured output, would you expect that to be something that is inherent to the job, with commands placed inside the job to demarcate "okay, we're moving on to this next step"? Or would you expect that to happen after the fact, for every job that's run, picking up on output markers or something? I don't know whether we're going to be forcing people into some sort of job definition schema for them to benefit from the tool.
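Both options can be sketched in a few lines; the marker text below is invented purely for illustration and is not something the current jobs emit:

    # Option 1: the job demarcates steps itself by printing markers as it
    # runs. Option 2: a post-processing pass scans the uploaded build log
    # for those same (hypothetical) markers after the fact.
    import re
    import time

    def begin_step(name):
        print("=== STEP %s started at %d ===" % (name, time.time()),
              flush=True)

    STEP_MARKER = re.compile(r"^=== STEP (?P<name>.+) started at (?P<ts>\d+) ===$")

    def parse_steps(build_log_path):
        steps = []
        with open(build_log_path, errors="replace") as f:
            for lineno, line in enumerate(f, 1):
                m = STEP_MARKER.match(line.rstrip("\n"))
                if m:
                    steps.append((m.group("name"), lineno))
        return steps  # [(step name, line where it began), ...]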
E
It seems difficult to get a lot of buy-in for it. I think the approach I was looking at definitely has downsides, but it was mostly aimed at being an incremental improvement over the way Gubernator approaches it right now, where, you know, the kubelet log is chosen as one of five very specific things.
A
Could you populate that new format with more useful information than what we're getting out of Gubernator today? If it turns out that that adds good incremental value and we have buy-in on the consumption of the new format, we could maybe find a way to shortcut it and have tests write to it directly instead of going through the existing chain. But I'm definitely in favor of an incremental approach that is well scoped; you don't want to tackle too large of a problem.
F
How about, you know, if we could partition the build log into different phases? I think that might be an incremental thing, and it's sort of what you were talking about; it seemed like part of your format was saying "I'm going to search for these things, and then I'm going to say that in this build log, from this position to this other text position is the error." I think I might also be interested in knowing exactly when the kube-up is running versus the Ginkgo tests versus, you know, the build.
F
Where is that in the build log text? And if there were some way of representing that, which we could highlight in Gubernator, we might be able to do some sort of tree-type structure, because maybe I'm pretty confident that the build worked fine, so I just want to eliminate all of those logs, or maybe I want to expand that part because that's what I'm interested in, and I don't care about, you know, what happened on cluster teardown, so hide all of that from me.
A
Another bit of low-hanging fruit that would maybe help with the Gubernator report today, not the build log but the little snippets: I know we've hit it a couple of times with the "waiting for condition" ones, where you have to find out what the condition was that it was waiting for. The "expected error not to happen" ones are another great example. Those are really difficult to search for in an easy way and require a lot more parsing of long, dense logs to figure out what the actual failure was and the actual line. Right, I think.
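That kind of snippet improvement could be prototyped with a couple of extra patterns that pull the useful detail out of those otherwise generic messages; the regexes below are guesses at the log shapes, not patterns taken from Gubernator:

    # Hypothetical patterns for extracting the interesting detail from two
    # common but vague failure snippets.
    import re

    PATTERNS = [
        # e.g. "timed out waiting for the condition: <detail>"
        re.compile(r"waiting for (?:the )?condition[:,]?\s*(?P<detail>.+)"),
        # e.g. Ginkgo's "Expected error: <detail> not to have occurred"
        re.compile(r"Expected error:?\s*(?P<detail>.*?)\s*not to have occurred",
                   re.DOTALL),
    ]

    def extract_detail(snippet):
        for pattern in PATTERNS:
            m = pattern.search(snippet)
            if m and m.group("detail").strip():
                return m.group("detail").strip()
        return None  # fall back to showing the raw snippet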
C
Would it be possible for us to maybe rally around what the ideal case is, in the absence of how we get there, what it is that we would want to see, and then maybe from there start to at least rally around the issue so that it can get traction over time? You know, in an ideal world where computers didn't matter and they did everything we wanted them to do.
E
If I'm understanding you correctly, Tim and Eric, it is equally important to highlight snippets when we're confident that they exist, but also to do that inside of something that gives us immediate context and lets you move away from that snippet without having to look at the whole build log at once. Is that enough context? Oh yeah.
F
I don't know; I don't have strong opinions. I mostly just want the ability to, yeah, you know, show more. I mean, like on GitHub: if I'm reviewing a particular area of code and I want to see further up for context, I can click the, you know, "show the previous lines" above that, which isn't super great.
F
You know, if there's a little more structure, maybe I want to see the whole function or something, as opposed to just two lines of code around that. So yeah, I think it's kind of tricky. Okay.
A
Yeah, what has been really useful to me for some of the proposals that come out of people like me are documents that include screenshots of the intended changes to a user-facing program, just to show what the ideal vision looks like from the consumer's perspective, and then, next to that, the actual machine-readable thing that's driving it, so you can sort of wind back from there. So just mock screenshots or something might help us figure out where we want to head.
A
Okay, that makes sense, and I think we're basically at time, so: public service announcements. It seems like the flake rate is kind of increasing, or rather, Eric did a good job of defining the difference between consistency and flakiness, right? Consistency is a given commit passing and failing the same way across sessions, and our consistency isn't looking too good. I was planning on, I don't know, we can raise an issue, and it's honestly something to bring up at the community meeting.
A
So that's basically the way that announcement is going to go out to the community: if people are wondering where their alpha is, it's because we're like two weeks behind schedule on an alpha, and we can't cut one because the tests don't pass. And then the last thing, on features; this isn't SIG Testing specific, it's the Kubernetes-facing features: there was originally supposed to be a meeting this morning to talk about what features are going to go in for 1.8, and it'll be happening right before the community meeting on Thursday.