Description
Today Kyle joined us to catch us up on what Engineering Productivity has been up to, how they might use some existing test features, and what the new repeated failed test counter does.
Testing group playlist: https://www.youtube.com/playlist?list=PL05JrBw4t0Kq53VUOvTk3VdXN79PA0SXT
Last ThinkBig discussion: https://www.youtube.com/watch?v=mV9jCk5Znhw&list=PL05JrBw4t0Kq53VUOvTk3VdXN79PA0SXT&index=3
A
This is the Verify Testing monthly internal customer call for January 27th, 2021; it's our first one of 2021. I'm gonna just vocalize real quick a couple of the roadmap deck changes of note. Code Testing and Coverage moved up from minimal to viable maturity in January. Super excited about that, and we are in active development on that Code Quality epic that we've been in development on for a while, to resolve those open dogfooding issues and also move its maturity up to viable. We've been getting great feedback from customers, from Twitter, from the forums, from people who want those same kinds of features. They solve problems for them in the code quality space, so really excited about that.

The other thing I wanted to call out was ThinkBig discussions. We do post the recordings to the team playlist; I'll put a link to that in the description of this video. Our latest discussion was with Package about getting some team-specific data out of the monorepo. In this case they were interested in understanding: what's the test coverage for the area that we cover, or that we're responsible for? We do have a follow-up issue on deck for that in 13.10. The team will be producing a coverage report for our stuff out of the monorepo, and then we'll be circulating that with some engineering managers: imagine this was your data, how would you feel about it? So, figure out what problems that solves, or if there are still problems out there we need to address with that type of capability, and after that manual effort we can figure out how to build that in. That's everything for me from a high-level view. Ricky, it looks like you're the first one to vocalize.
C
Yeah, the succinct summary is we're kind of at a pause on it. The recall rate was really low and we were facing a situation where we'd have to kind of endlessly adjust and change. I think it was around 85 percent, so that would mean, like, what we saw was: in 15 percent of pipelines we had the minimal jobs pass and the full jobs, whatever the full suite was that ran for that MR, fail. We were hoping for somewhere north of 95.
C
That 15 percent difference wasn't something we could move forward on. What we're looking to do with the information is start to focus on mean time to failure and see how we can use that data to just accelerate failure overall, kind of like what we did with the FOSS impact and other things there. I'll say that, going forward, mean time to failure is going to become a much bigger Engineering Productivity metric that we focus on, over duration and the other ones that we traditionally tracked, duration and cost in particular.
C
As far as how you can help: like I said, we're really on pause, so it's hard to say. We kind of set it aside and we're revisiting after we look at priorities for the next quarter and how to fit in the work to accelerate failure. So I don't have anything great for you all on this, unfortunately.
B
Is there anything that you can think of, from your efforts and the work you've done on this, that might be worth commoditizing and incorporating into the product in some way? I know you talked about how we experimented and it didn't really work out, but now we're talking about mean time to failure and how we can accelerate that, and you got some data from what we were doing before. So is this something that we can build into the app somehow?
B
Can we introduce a template or an image or something that we can, you know, sell people?
C
I'm sure there's some value to customers that can be gotten from this; it's just nothing is coming immediately to mind. Let me prompt Albert and see what comes to mind for him. He usually has a lot better insight on that than me, yeah. So I'm going to ask him in an issue and CC you all on that right now.
B
One thing that comes to my mind is the work that you had done to cancel the pipeline in flight in order to make the fail-fast work. I feel like incorporating that into the product in some way would add a lot of value. Talking to my friends and ex-co-workers who are in the industry, a lot of them are looking into that type of thing.
C
Yeah, so I can at least point you to that. It was all done through the API, so just API calls. Let me take the action to provide you with that, because you're right, I think that was done as a part of the FOSS impact work, so that's one of the reasons it didn't come to mind.
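For reference, a minimal sketch of that kind of API-driven cancellation, not the actual internal implementation: a quick job that, if it fails, calls the pipelines API to cancel everything else still running. The job name, smoke-test command, and CANCEL_TOKEN variable are assumptions for illustration.

```yaml
# Illustrative .gitlab-ci.yml sketch only; assumes a token with API access in CANCEL_TOKEN.
fail-fast-check:
  stage: test
  script:
    - bundle exec rspec spec/smoke        # hypothetical quick subset
  after_script:
    # If this job failed, cancel the rest of the running pipeline via the API.
    - |
      if [ "$CI_JOB_STATUS" = "failed" ]; then
        curl --request POST \
          --header "PRIVATE-TOKEN: $CANCEL_TOKEN" \
          "$CI_API_V4_URL/projects/$CI_PROJECT_ID/pipelines/$CI_PIPELINE_ID/cancel"
      fi
```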
C
Ideally, well, my thinking, I should say, is: it would be great if there was almost like a short-circuit setting, like allow_failure, so that if this job fails, just stop everything else. Or, like, cancel... not cancel, because then the status recording gets a little weird, but that's how we implemented it. Ideally, just halt all other jobs in progress, because we want to know about this failure right now and everything else beyond this point doesn't matter. Maybe that's a little excessive, but let me take...
B
For what it's worth, that does seem like a feature that's general enough to package as a product. You know, it's very unconcerned with what the job is; it just says, you know, if something fails, stop the presses. That might be amenable to sort of repackaging in a general way. I think that makes sense, thinking about it from, like, a CI template perspective: we can produce something that I think will be easy for people to pick up and get started with, at least.
C
So this, what we're talking about here, like short-circuiting the pipeline, already existed, so I wouldn't say it's different than the pipeline today. It can be something that's harvested, though, and essentially the pattern just gets reused in the template, is how I see it. Is that what you were... I guess, Ricky, maybe I should...
A
So my question is: if I have a job that fails today, it kills the pipeline. Where is the savings? Like, I'm not understanding how it saves you runner minutes if the pipeline is going...
C
...to halt. So with DAG, yeah: with the needs implementation that we have in our monorepo pipeline, we have jobs running in lots of different stages simultaneously. So, and my understanding might be wrong here, if you're not using DAG and you have a job fail, everything stops at that stage in the pipeline. But with needs, you can have things running way far out, and you have to just wait.
C
So you have a job fail 10 minutes into the pipeline, but because of everything else that's running, you actually get that feedback about 30 minutes later. Gotcha, okay. And when I say feedback, I mean, like, the email; the status on the MR would be failed, you'd have a job status with the circle with the X in it, but everything else would look like it was still going.
A
That makes a lot of sense, because if you have a test job that's just, like, a smoke test and that fails, you want to kill those long-running tests if you're running everything in parallel. You're like, whoa, don't run that hour's worth of tests, stop right now. Hopefully someone would say that runs a stage earlier, but...
C
Yeah, and so you could do exactly what you described, and we could actually configure needs to set up the dependencies like that. Our problem is there's a needs limit, I think it's like 50 per job, and since we use parallel on all of our test jobs, the monorepo is very limited in how we can implement needs. When we start talking about, like, our spec jobs or anything that would depend on them, we go over the limit, gosh.
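A rough sketch of the needs-based wiring being discussed, where a cheap smoke job gates the expensive suites so an early failure never starts them; job names are hypothetical, and the comment about the limit reflects the roughly 50-needs-per-job cap mentioned above.

```yaml
# Illustrative sketch only; job names are made up.
stages: [test, report]

smoke:
  stage: test
  script: bundle exec rspec spec/smoke

# Starts only after the smoke job passes, so an early smoke failure
# never burns an hour of runner time on the full suite.
rspec-full:
  stage: test
  parallel: 10
  needs: ["smoke"]
  script: bundle exec rspec

# Anything that in turn needs the parallel job effectively needs all 10 copies,
# which is how a heavily parallelized monorepo pipeline runs into the needs limit.
summarize:
  stage: report
  needs: ["rspec-full"]
  script: ./scripts/summarize-results.sh   # hypothetical
```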
B
And this is definitely something that will be increasingly useful as you have more parallelization. Like, in a fairly standard job-after-job CI file configuration you're not going to get anything out of this if all your jobs run sequentially, and if all your jobs run in parallel you'll see the most possible benefit. So where you fall in that spectrum will decide how useful this feature would be, yeah.
C
Yeah, so, to your point, I think maybe it's not something that is important to a large... I don't know our customers very well, but we may not have a lot of customers that have high parallelization like we do, where they'd actually get value out of this.
B
I think this goes back to conversations that James and I have had several times where, yes, the majority of our customers, stats-wise, may not see the benefit, but the ones who do are gonna be the bigger customers. They're gonna be the people that are using the product to its fullest and are probably having lots of engineers participating and are very concerned with the speed of their pipelines.
B
Yeah, yeah, I'm kind of interested now in working that into the needs or the rules syntax somehow, where, like, oh, if this job fails, then you should, you know, pull the plug on the whole pipeline, kind of thing. Having that as part of the GitLab CI YAML configuration is kind of compelling, yeah.
C
And, yeah, maybe if we can get the needs limit raised. I'm not sure; I just think that's something that hasn't been re-looked at. The problem, I'll say, is kind of that the magnitude of people who would probably want this would continue to decrease.
A
So I was curious, Kyle: we talk a lot about testing and unit testing and trying to get that feedback loop faster. What about other areas of testing, like accessibility or the browser performance?
C
Well, I can take that back to the other QEMs and see, because I think that's just a blind spot that we have right now, yeah. Let's see what feedback they have. I think we're always open to that, but we want to be very careful with what we add and the feedback it provides to developers.
C
You know, we get feedback in both directions on that. We're like, hey, this is great that we're doing this now, but for every positive piece of feedback we tend to see a lot more confusion, or negative feedback telling us to communicate better. But one more thing on that: we do have the ability to run jobs at a lesser frequency.
C
So there are certain things that we delegate to run, like, every night. DAST, for example: we run a specific pipeline that runs all of the DAST scans and reports to the security team for analysis and audits. We can always start with something like that; I'm just not sure what we'd do from an action perspective with the results, yeah.
A
It might be interesting to do that for accessibility and just drop the results into the accessibility Slack channel. You know, even if it's only occasionally, as a baseline, yeah, and like: here's four pages, go scan those four, even if one of them is just a perpetually open MR, and see how things are behaving, yeah.
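A rough sketch of that lesser-frequency pattern: a job that only runs on scheduled (for example nightly) pipelines and drops a summary into a Slack channel via an incoming webhook. The scan script, page list, and SLACK_WEBHOOK_URL variable are assumptions for illustration.

```yaml
# Illustrative sketch; runs only when the pipeline was started by a schedule.
a11y-nightly:
  stage: test
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    # Hypothetical scan of a handful of representative pages.
    - ./scripts/run-a11y-scan.sh https://example.com/ https://example.com/some-mr > a11y.json
    # Post a short summary to Slack; the webhook URL lives in a CI variable.
    - >
      curl -X POST -H "Content-Type: application/json"
      --data '{"text": "Nightly accessibility scan finished, results attached to the pipeline"}'
      "$SLACK_WEBHOOK_URL"
  artifacts:
    paths: [a11y.json]
```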
A
Okay, that's interesting. The other thing that I'm hoping to get some insight into, because we continue to see an increase in the number of jobs that are generating data for the metrics report, the custom metrics: trying to get a better understanding of how we're using that at GitLab and where the holes are in that, because I think we've had one issue in the last 18 months that relates to that feature.
C
Yeah, I'm not as familiar with those. You say custom metrics, right? Yeah.
C
Yeah, yeah, this is really embarrassing, yeah. I don't really know; let's see if I can find any more on it.
B
That's... no, that's part of the problem, yeah. But the thing about metrics reports is, when I learned what it did, I was like, wow, this is super cool, you could literally do anything with this. So all you're doing is outputting a text file in your job, with the OpenMetrics format, like the Prometheus metrics format, in it, and then it compares that with the base pipeline's version of that report. So you could put literally any metric you want in there, manually or automatically or whatever, and then it'll compare it against the base pipeline in the merge request widget.
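A minimal sketch of what that looks like in a job; the metric names are invented, and the only real pieces are the OpenMetrics-formatted text file and the metrics report artifact declaration that drives the MR widget comparison.

```yaml
# Illustrative sketch; metric names are made up.
bundle-metrics:
  stage: test
  script:
    # Write any numbers you care about in OpenMetrics / Prometheus text format.
    - echo "webpack_bundle_size_bytes 1234567" > metrics.txt
    - echo "rspec_suite_duration_seconds 845" >> metrics.txt
  artifacts:
    reports:
      # Compared against the base pipeline's version in the merge request widget.
      metrics: metrics.txt
```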
C
Where can I learn more about this? It sounds like there's definitely some value, and, again, I'm sure it's in the deck, I'm sure I've seen this like six times, and I'm just, again, very, very embarrassed here.
A
I don't think I've ever actually expanded it on an MR, so I'm not sure what's in here or what is useful, yeah.
B
So this is just... this looks like it'll be some measurements from Prometheus; it looks like it was generated around megabyte usage and stuff like that when it was building the bundle. So that's one example of something you can use it for, but, like I said, it's just OpenMetrics format, anything. So you could, I don't know, you could put in, like, how much electricity the runner used when it ran, or, like, literally anything.
C
Yeah, let me bring this up with the team and see; maybe this is a place where we can surface some of this. So, can you fail something on the custom metrics report, or do we have to, like, read it? Like, it's essentially just information presented to everyone, yeah?
A
It just dumps into a text file, I think, and then does the comparison that way, so you could fail on it, but it would require some custom scripting to do it.
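A sketch of what that custom scripting could look like, as one possible approach: the same job that writes the metrics file also enforces a hard limit and exits non-zero when it is exceeded. The bundle path, metric name, and threshold are all made up.

```yaml
# Illustrative sketch of failing a job on a custom metric; values are hypothetical.
bundle-size-gate:
  stage: test
  script:
    - size=$(du -b public/bundle.js | cut -f1)   # hypothetical bundle path
    - echo "webpack_bundle_size_bytes $size" > metrics.txt
    # The metrics report itself only informs; failing the job is up to the script.
    - test "$size" -le 2000000 || { echo "bundle too large, $size bytes"; exit 1; }
  artifacts:
    when: always   # keep the report even when the gate fails
    reports:
      metrics: metrics.txt
```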
C
Yeah, okay. Like, one of the things that comes to mind is there's a lot of Danger logic for, like, the front end, like webpack size, so being able to compare that to a baseline and then putting that information in... I mean, it would just be shifting from Danger into something that's standard in the product and therefore maybe, like, harvestable and usable. But, yeah, let me see what could be done here, cool. I too have never expanded that, though, for the record.
B
And when we fix the metrics widget it'll be even more important. There you go. But I think it's still doing the wrong comparison, like code quality was doing in the past. Drew, is that right? That is right, as far as I know. I recently worked on an issue that I found out was mostly fixed, like, a week ago, so I'm currently hesitating to speak to the state of any problem. But yes, as far as I know.
C
Yeah, I was gonna say, I think the action there is just for me to, like, ask the team and see. I'll read up on the feature and then ask the team about different uses that we could have for this. Much appreciated.
The
things
to
consider
is,
if
you're
looking
to
use
those
features
as
a
part
of
like
moving
them
to
viable.
There's
that
option
I
talked
about
where
we
can
run
it
on
a
different
frequency
and
do
something
with
the
with
the
data
and
then
yeah.
The
other
action
was
getting
with
albert
on
what
could
be
harvested
from
the
dynamics.
C
I feel like I never have anything fun for you that I could bring; it's always like... yeah. If there's something more I can do to help prepare, or bring something to the meeting, let me know. Okay.
C
Yeah, yeah, that's definitely true, because I kind of help, like, manage the team and also try to manage the backlog for Engineering Productivity; it's kind of very, very hard. But one of the things I was going to ask about: code quality. So there's a team member who's looking to become a maintainer on the team, and I'm hesitant to just, like, spell it all out because it's recorded, but part of the feedback that they got from the maintainership program was maybe doing some development on features.
C
Since our team doesn't work as much on features... they've done development for you in the past on things, if that's a clue as to who this might be, but with the code quality work, it was something that came to mind as he and I were talking about work streams that we're looking to dogfood and that need some feature development. If there's work that's holding you back, back-end work on that, or work that's, like, not prioritized but needed...
B
Sorry, I was just taking some notes there, yeah. We can absolutely do that as we go; nothing springs to mind at the moment. We are having some technical issues with the limitations of our shared runner Docker configuration, particularly the Docker-in-Docker configuration, where we're at the point now where we're not sure if we can speed up the code quality job at all, because of the way the caching layers work with the shared runners. It pretty much needs to pull all the Docker images from scratch every time the code quality job runs on the shared runners, instead of being able to cache them. So that's kind of... Drew, forgive me for summarizing your work, does that sound about right? Sounds exactly right. And I think we talked about this a little bit for our internal project: being unable to cache images is the price we're paying for operating in a strictly disposable environment, because caching is not disposing of things. So as long as the shared runners prioritize that disposability, we're not going to get caching, and so it's a use-case-specific trade-off between those things, as far as we can tell right now, and we don't have a good middle ground for that.
C
So if, like, we don't use the shared runners, is there the potential that customers, or us... like, we have the issue to look at changing the configuration, essentially to overcome this problem for our private runners, right? Yes. Okay, that makes sense; that helps with, like, kind of the background on that.
C
If we can prove that out, then we can have, like, a quantifiable benefit for our own use case, to say: here's the improvement that we saw based on the scale that we run it at, and kind of reaffirm that. The other thing that I guess comes to mind that's kind of related: in Q1 we're working with infrastructure to try out some new machine specs for our private runners.
C
I think it's more just a general fleet for our private runners at the moment, but we are going to test different specifications to see the performance on a job-by-job basis. So we'll have the data to know: oh, our spec jobs are maybe CPU-bound, so these ones perform better; other jobs are maybe memory-bound, so, yeah, these types of machine specs will be better.
B
Just for... sorry, just for clarity's sake: we're talking about different levels of VM, because our providers offer different speeds of CPU with different RAM allocations or SSDs or something, right? Yeah. Sorry, that's... well, that's... I guess, yes, that's what I meant: this size VM for these runners for these jobs, and this different VM for other runners for other jobs.
C
Yeah, and that's not to say that that's not the decision we'll make going forward, but I think it starts with gathering the data, and then we can assess the cost and the value we get out of the complexity of managing different runners for different jobs.
C
Is this separate from moving our runners, like our hosted runners, to Kubernetes? Yeah, I think this is separate from that.
C
The issue might be related to upgrading some of the components that have already migrated to Kubernetes to different machine specs. There's a lot rolled into this, and one of them is looking at our own private runners. So if you read through the description on the issue, there's a lot there, but the Engineering Productivity part is just the runners.
C
I am curious, actually; a question comes to mind. So the team, sorry, our Engineering Productivity team, is really passionate about trying to do something better than we are with, like, the specs inside of the monorepo.
C
What are some things that we can leverage inside the testing functionality, that maybe you don't see us leveraging, that could help reduce, or at least measure, the frequency of flaky specs better on a test-by-test basis?
B
We have, but in the wrong context; you've looked at it from, like: well, we have all this stuff in the database, can we use it? And not at what we're using it for yet. So my question is... I'm going to flip it and answer your question with a question. We're starting to put in the results of tests that failed, specifically, so we're starting to log, like: okay, this is a test case, and when it fails...
B
...we log it in the database: when it failed, what pipeline it belonged to when it failed, what time it was, and blah blah blah. So we're starting to aggregate that data in the database and have it long term. What can we do with that that will help you most? Because we're kind of just like: well, we're gonna make a little notification badge in the MR that says this has failed before in the default pipeline, and that's all we're doing with it right now.
C
Yeah, so I would answer that as: the question that always bothers me is, how frequently does a specific test fail, on a pipeline basis? Ideally we'd be able to look at both master and MR pipelines, but it's: how often is it failing, and then, almost, how often over time, so that we can see, is this related to just, like, a broken master, where there's a spike on one day and then it just drops off, or is it, like, consistent?
B
Right, but it's for, like, one test. So if one test fails 10 times in one day, that's the spike; but if one test fails once every week, then that's the more spread-out thing. But the thing that the feature does do right now is: if the whole pipeline just blows up and something went horribly wrong, we don't bother logging that every test failed, because it's probably unrelated to the tests; it's probably a misconfiguration. So we don't log that.
C
Maybe that data is already available with API calls. Or I should ask: is that sort of data... so it sounds like we're tracking it. I've looked at test failures from, like, an end-to-end test perspective, and I know we capture the information on the GitLab pipeline. Can we get that sort of failure-rate information from what's extractable, like what's available in the API?
B
So right now the problem is manifold, because we were very concerned about the scalability of storing this in the database.
B
With that, we were kind of of split minds on this. I was like: how useful is it if we just tell people that, oh, this test failed 20 times on any pipeline that ever ran in the last 14 days? Is that more or less useful than: this test failed 20 times on the default branch in the last 14 days?
B
So that's what we're looking at right now, over a 14-day period; we're not currently aggregating it. And then, further, because we're concerned about the scalability of the feature, we're not actually storing the whole test name in the database, we're just storing a hash of the test name, so it's a fixed length, because we were worried about people having really, really large test names in their files and having that in the database and causing issues.
B
Yeah, so with what we're storing right now, it's actually not too bad, with all the caveats that I've just explained: only logging test failures on the default branch; not logging any failures if there are more than 200 in the whole pipeline, so it'll just ignore it if there's more than 200; and, further, we're also looking to probably purge that on a 14-day period currently; and also we're not storing the full name.
B
So, given all of that, we're actually not concerned with the expansion that we're seeing. We're seeing an amount of increase that I know the exact number of, but this is a recording, and it's not concerning to us. But who's to say what that rate would be if we included every pipeline and not just the default branch pipeline, and is it worth it to investigate that avenue, from your perspective?
C
I don't think so, right now. We are still, like, in exploratory mode of what data we want to capture; we'd want to capture what's available in the product, and really what we're looking to do with it is make more informed decisions about what to focus on from a flaky-spec perspective, what to automatically quarantine, what kind of signal boost to give to EMs and say, hey...
C
...this spec fails in two percent of all pipelines. Because we're seeing a larger impact to developer productivity than we anticipated based on legacy failures; like, our merge request success rate at the pipeline level is 65 percent on average, which was surprisingly low to me, and if we can cut flaky specs out of that, we can raise it up even higher, so that feedback is more actionable.
B
So what kinds of things are you doing, or have you been looking at, for pulling that more to the left? Right, like, it's one thing for that number to be high, for it to be failing in the app, right? But what have we done already to try to make it easier for engineers to run the tests locally, so that it's failing there instead of failing on machines that cost money?
C
Yeah, that's what we're looking to harvest from the dynamic spec analysis piece, where we'd essentially use that mapping, tie it into Lefthook, which is kind of our default pre-hook tooling right now, and run the tests that are most applicable to the files that you've changed, to be able to empower people to do that very easily locally. That's our goal on shortening the feedback loop. I think we're looking to just take more automated actions based on the frequency of flaky failures, with the data we were really talking about.
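A rough illustration of that Lefthook wiring: a pre-push hook that runs only the specs most related to the files being pushed. The helper script and its mapping are assumptions standing in for whatever the dynamic spec analysis would produce.

```yaml
# lefthook.yml -- illustrative sketch only.
pre-push:
  commands:
    related-specs:
      # Hypothetical helper that maps changed files to their most relevant specs.
      run: bundle exec rspec $(./scripts/related_specs.rb {push_files})
```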
C
Okay, let me just refine this; like, I really just brought this up off the cuff, so, like, that sounds great. The team is really passionate about this, we have a lot of priorities, and we don't have a good plan on what we want to do. What's the smallest thing we can do to add value, from a "reduce the frequency of flaky specs interrupting the developer experience" perspective?
B
Yeah, so we're hoping that that widget provides some value in that context as well, because now, in the test failure widget, if a test fails in an MR pipeline and it's also failing in master, in the default branch, then you should get that notification, like: hey, this has failed 10 times in master in the last 14 days. So that should be at least an indication, where engineers are looking at it, that might tell them that it's not their fault that the pipeline failed.
C
Yeah, yeah, let me... yeah, I'll definitely make sure to kind of signal boost that to the team and say, here's a way for us to get some information based on... what was that, again?
B
I was talking to Eric about this a little bit: because the expansion of those tables is so much below where we were worried about, we can absolutely, probably, add the full name of the test in there, so you could query it a little bit more effectively and build some data from that as well, yeah. And then I think we're also talking about, like, rolling it up. James, we had an issue, or a conversation, around, instead of just purging the data after 14 days, rolling it up into a summary table and then storing that for a longer period of time.
C
So now I'm curious: do you have feedback from customers that are at a similar scale, or, like, a similar monorepo strategy as we are, on where to take this test failures feature? That maybe... yeah, okay.
A
Okay, so this feature just rolled out with 13.8, so we know that a lot of those monorepo folks are self-hosted, not on GitLab.com, so it's going to take a while for them to actually get this in their upgrade cycle.
C
Yeah, I will look and see what could be helpful for us with the feature, after really refining our need and what we're trying to do, and seeing how the functionality aligns with that, and get...
B
The issue? Don't worry about that; I was gonna make it if you hadn't made it already. So thank you for making the issue.