Description
Discussion on finding a way to enforce more reliable testing practices
A
A few others wanted to sit in on the topic of red master. I mean, I've been talking with a lot of companies here at KubeCon just to see how they're handling their pipelines, and it's not completely shocking to find out that everyone has the same problem. Obviously, right, everyone has a large test suite and very flaky tests. One discussion I had with one of the companies was very interesting, and I thought it made a lot of sense.
A
What do they do with tests that are flaky? They confirm that they're flaky, and once they are, they move them out of the general test suite and into a separate bucket labeled "flaky tests". They don't even run those flaky tests. It is then up to the team to either create a new test to replace the flaky one, or to write integration tests specifically for the thing the flaky test was covering. And they're saying that everything else they've tried, right, like disciplining the developers, automatically reverting, only deploying green pipelines, all of that at some point just collapsed under pressure. The only thing that is working so far for them is taking that immediate action of moving the test into the flaky bucket and then requiring a change. Basically, I thought that was kind of interesting, and I think in theory it would resolve quite a lot of our problems as well, if we do it that way. Yeah.
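A minimal sketch of the quarantine flow described above, assuming a simple test-selection step in CI; the names (`QUARANTINE`, `select_tests`) are invented for illustration and are not the company's actual setup:

```python
# Hypothetical "flaky bucket": once a test is confirmed flaky, it is
# moved out of the main suite into a quarantine list and excluded
# from the normal CI run entirely.
QUARANTINE = {"test_login_retry", "test_async_upload"}  # confirmed-flaky tests

def select_tests(all_tests):
    """Split the suite into the runnable set and the quarantined set."""
    runnable = [t for t in all_tests if t not in QUARANTINE]
    flaky = [t for t in all_tests if t in QUARANTINE]
    return runnable, flaky

runnable, flaky = select_tests(
    ["test_login_retry", "test_checkout", "test_async_upload", "test_search"]
)
print(runnable)  # only these run in the pipeline
print(flaky)     # these wait for a replacement test from the owning team
```

The important part of the scheme is the second list: it is visible, assigned back to the team, and nothing in it counts as coverage until a replacement lands.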
B
The problem is, I think, that once these tests are removed or marked as pending or whatever, there's no real incentive for people to fix them, because in their mind it's like: oh, you know, my code is in master, my feature is done, quality or delivery will fix it. Right, like, quality is supposed to do the tests and blah blah blah. And of course people say no, it's not supposed to be that way, etc., but...
A
Right, but you know, I think there is an incentive, because we do have error budgets. So if something does break, and you have an issue assigned to your team to handle a spec that was removed because of its flakiness, and you have a failure, that automatically takes it out of your error budget, right. So...
B
My concern is basically: I don't want to end up where, say, six months from now, we have this pile of like 500 flaky tests, and nobody has any clue what they do or why they randomly fail. Eventually some specific group of people has to solve that problem, and it's probably not the people who wrote the tests.
B
People have the same problem with GitHub pull requests, and so I was reminded of what people do with Rust, for example, and some other projects: they have a bot where they assign a merge request to that bot, and the bot will then keep rebasing until there are no conflicts and the pipeline is green, and then it will merge it. The underlying idea is that if you don't use merge commits, you can have more conflicts, so you just let a bot do all the rebasing instead of doing it yourself.
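A rough sketch of that merge-bot loop (similar in spirit to bots like bors); the `MergeRequest` class and the `rebase` callback are made up here, and a real bot would shell out to git and wait on CI instead:

```python
from dataclasses import dataclass

@dataclass
class MergeRequest:
    has_conflicts: bool
    pipeline_green: bool
    merged: bool = False

def bot_merge(mr, rebase, max_attempts=5):
    """Try to land `mr`; `rebase` re-runs rebase + CI (callback for illustration)."""
    for _ in range(max_attempts):
        if not mr.has_conflicts and mr.pipeline_green:
            mr.merged = True  # no conflicts and pipeline is green: merge
            return True
        rebase(mr)  # rebase onto latest master and re-run the pipeline
    return False  # give up after a few attempts and hand back to humans

def fake_rebase(mr):
    # Stand-in for "git rebase + wait for CI"; here we just pretend
    # the rebase resolved the conflict and the pipeline passed.
    mr.has_conflicts = False
    mr.pipeline_green = True

mr = MergeRequest(has_conflicts=True, pipeline_green=False)
print(bot_merge(mr, fake_rebase))  # merged once rebased and green
```

The error-budget check mentioned below would slot in as one more condition before the merge step.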
B
You could hook that into this system, where the bot will say: I'm not going to merge this, because you consumed your error budget. Because I think if you leave that up to people, they're not going to check the documents. It's true, I don't even know where we have the current number for the current error budget recorded. And so you could do that. The thing is, that's what I tried to do with Danger, because, like, oh yeah, okay, we can do that: the idea I had was to just run a periodic pipeline.
B
I'm sure there are like five issues about that, actually, dating back two years, where it seems they created issues that sort of stack on each other. I think one issue is sort of a generic lint API, where you can just have, like, a message with an indicator like red or green, or something like that, kind of similar to the security reports and the JUnit output we have. And then there was a second one where I think people wanted, like, generic build output.
B
A user has to update the existing comments. So I like this idea of something similar to the security report or whatever: it's just a thing you click, and it lists all the offenses, and those might be Markdown or whatever. Yeah, and then maintaining it on the instance side is super easy: just look at the existing list and echo it.
B
Or something else, I don't know, but it's important that they don't talk to the API, because then it won't work for forks. And so then the CI, which watches the build, once it's done, just checks: does it have a particular artifact file, for example? Then we ingest that and turn it into output. So that could be something like: oh, a build just writes, like, lint.json in the root.
A
Right, but we also have to think about, obviously, our values, like how we actually do things. So what I'm proposing here is: maybe we want to schedule this item for ourselves and work on this, and get to a situation where we can use it. I'm telling you right now, just last night I had a discussion with one of the customers where I explained what we are doing with the commit messages with...
A
There is interest out there, even for the very basic thing. So let's talk about it. I mean, I know that you're going on vacation next week, but after you're back, maybe we can talk about scoping this a bit and finding out what we can do to satisfy the minimum of our requirements to handle master, and then we can do an implementation, right, you and Robert? I don't think that should be difficult, you know.
B
And so I think, at the core, ignoring all the details and the bikeshedding and whatever, it's just: you write lint.json, whatever. And I think the format I had in mind was just very straightforward: it's like an array of objects. Each one is just a type, which is, I guess, warning, error, or info, something like that, and a message. That message can be Markdown, though we'd probably have to sanitize the crap out of it, so you don't get people doing funny XSS injections there. And then, every time a build completes and that artifact is present, we ingest it. I think you can implement that basic idea in a couple of days. Yep, probably most of the time will be spent fighting the frontend code, trying to get it to show properly, I mean.
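A minimal sketch of what ingesting that lint.json artifact could look like; the field names (`type`, `message`) follow the format floated above, which is only a proposal, not a settled schema, and the HTML-escaping stands in for whatever real sanitization the renderer would need:

```python
import json
import html

ALLOWED_TYPES = {"error", "warning", "info"}

def ingest_lint_report(raw):
    """Parse a lint.json artifact: an array of {type, message} objects."""
    entries = json.loads(raw)
    report = []
    for entry in entries:
        if entry.get("type") not in ALLOWED_TYPES:
            continue  # drop anything that doesn't match the schema
        report.append({
            "type": entry["type"],
            # crude sanitization: escape HTML so "funny XSS injections"
            # can't make it into the rendered output
            "message": html.escape(entry["message"]),
        })
    return report

raw = '[{"type": "warning", "message": "unused var <b>x</b>"}]'
print(ingest_lint_report(raw))
```

Keeping the contract this small (write a file, CI ingests it) is what makes it work for forks: the job never has to authenticate against the API.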
B
Because then we could get rid of Danger. My biggest problem with Danger is not even just the fork issue; it's the fact that the code is completely untested. It's all just some random script, and it evaluates in a weird context, which means, say, if you require it, everything's global and it starts running it. It's awful, yeah. And the second thing is: we had Philippa, who, for some Vue.js project in GitLab, implemented...
A
Let's talk about that once you're back, and I think that would allow us to think about the first iteration of some rules, to say, like: okay, you have a flaky bucket right now. If we add this rule to the repository, your team is no longer going to be able to merge without an additional approval, and that additional approver needs to sign off on your merge into master, because your bucket is too full, or you broke master too many times, or something like that.
B
The underlying idea is that if you have, let's say, 50% code coverage and there's a bunch of flaky tests, and we remove those, your code coverage goes down. Effectively, nothing at that point can be merged unless something actually increases that code coverage again, ideally by fixing the flaky tests. Which I think is probably easier to implement, in the sense that we already have the code coverage data there.
B
So it's a matter of: oh, if we had a threshold per project, and if coverage is lower than that, no merge. People are probably not going to like it, but that way you sort of kill two birds... oh no, wait, I'm not supposed to say that, it's not politically correct. Somewhat off topic: somewhere I read that people were trying to replace sayings like "killing two birds with one stone" with politically correct versions. Anyway, you get two things there, like, in one.
B
People are going to argue that the minimum should be a hundred percent, but we cannot achieve that, so we set it to, like, 40 percent, and we will always have more than that, so it basically becomes useless. Yep. Whereas if you say: oh, code coverage cannot... yeah, basically, code coverage cannot decrease. That's it. Simple.
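The "cannot decrease" rule is small enough to state as code; this is a sketch under the assumption that the target branch's coverage number is already available (as discussed above), and `coverage_gate` is an invented name, not an existing API:

```python
def coverage_gate(master_coverage, mr_coverage, tolerance=0.0):
    """Allow the merge only if coverage did not decrease.

    Unlike a fixed threshold (e.g. "at least 40%"), which everyone
    clears trivially, this compares against the target branch, so
    removing flaky tests blocks merges until the coverage is restored.
    """
    return mr_coverage >= master_coverage - tolerance

print(coverage_gate(50.0, 50.2))  # coverage went up: merge allowed
print(coverage_gate(50.0, 48.7))  # quarantining tests dropped it: blocked
```

A small `tolerance` is the one knob worth having, so that rounding noise in the coverage tool doesn't block unrelated merges.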
B
So we have two things: the code coverage rule and the whole "no merging into master when it's red". I know you could do it with a bot, but I think GitLab should be able to do that; it's something so simple to implement in GitLab itself that it's kind of stupid to require people to write a bot to do it.
B
Looking, for example, at bugs per team, because there's always been sort of this meme of "look how frontend breaks everything", with people pointing at each other, basically. And it doesn't help that when I start digging through commits and merge requests, the pattern is sort of confirmed: like, oh, they add, like, five hundred lines of JavaScript, or Ruby for that matter, you know, with, like, two tests. I think we need to start slowly looking at: okay, what can we do about sort of the human side of things?
B
Right. Then I was looking at, like: oh, this is another button, but with a dashed outline instead of a fixed line. And I saw some people reporting issues, and so people see that and say GitLab breaks the frontend all the time, blah blah blah. But I think the core problem there is that we don't really have tools for testing visual changes. Okay, you can write a test...
B
No, exactly. Because you can test that an element has a certain class or ID, but as far as I know, you can't test that certain CSS rules apply; our tools just don't support that. And what I'm reminded of is the Dolphin emulator, the emulator for the Wii U: to test graphical changes, they have a continuous integration setup where they basically take screenshots and compare those with some algorithm to see how different they are.
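The comparison step can be sketched very simply; real setups diff rendered PNGs, but here frames are plain grayscale pixel lists so the idea stays self-contained, and the 1% threshold is an arbitrary example, not anyone's actual setting:

```python
def pixel_diff_ratio(before, after):
    """Fraction of pixels that changed between two equally sized frames."""
    assert len(before) == len(after), "frames must have the same size"
    changed = sum(1 for a, b in zip(before, after) if a != b)
    return changed / len(before)

baseline = [0, 0, 255, 255, 128, 128]   # screenshot from master
candidate = [0, 0, 255, 0, 128, 0]      # screenshot from the branch

ratio = pixel_diff_ratio(baseline, candidate)
print(ratio)  # 2 of 6 pixels changed
# fail the visual test if more than, say, 1% of pixels changed
print(ratio <= 0.01)
```

The hard part in practice is not the diff but keeping the baselines fresh: any intentional visual change has to re-record its screenshot, or the suite itself becomes flaky.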
B
And that's what you could do here. You'd get, like, a ton of screenshots, depending on how fine-grained you make it. But it's either that, or we have to somehow figure out how, with the browser API, you can retrieve the CSS rules applied to an element, because then, for example, for these buttons, you could say: oh, all these buttons in this area need a fixed line around them instead of dashes.
B
For the lint stuff, there are issues; there's, like, all of them, actually, because a lot of people want it. So I think what I will do, probably tomorrow, is just dig through some numbers and see how bad things really are. And then I have Monday until Thursday; I think I can actually, probably on Monday, just implement this, like, without the frontend hooks. It's not that difficult.
A
Then raise an issue in the framework issue tracker, and put all your data there, collect all your data there, and please put all the related issues there, so we can estimate what kind of things we can actually cover and not cover. So yeah, cool, awesome. Thank you very much for your time.