From YouTube: Ops Cross-Stage ThinkBIG for October 2020
Description
A conversation for the strategy team members in the Ops section, devoted to looking at how the number of tests executed could be influenced by data from monitoring a system in production, and how much monitoring is necessary based on test data.
A
This is the Ops cross-stage ThinkBIG, and the topics that we're talking about today are near and dear to all of our hearts. It's about how we run fewer tests by making use of monitoring data, and the flip side of that: how we potentially monitor less by making use of test data.
A
So I'm going to go ahead and share my screen, hopefully not too giant, and everyone should be looking at the Mural board that we're using now for the think big, think small. We'll take about, let's say, 20-25 minutes to think big about this set of problems. I'm happy to flip back and forth; I think there's a lot of overlap in both, so I don't think we have to go just down one track and then down the other. Then we'll flip over to the think small and think about:
A
Potentially, what could we actually pick up in, say, the 13.6 or 13.7 milestone that may actually result in an issue that we pick up and run with, for any of our teams, or even another team that isn't represented here today? It may not, we'll see, but that's kind of the intent and how we've run with this format, at least in my group: we always end up with an issue that we pick up in the next milestone to move us forward towards that larger think big.
A
So I'm going to kick this off a little bit. The problems that I put into the doc that we would want to solve are things we've been thinking about in testing, and that I've been thinking about for a long time, especially as someone who has really adhered to test-driven development, has that test pyramid, and has thousands and thousands, or even millions, of unit tests or long-running integration tests.
A
So how do we leverage other data to selectively run some of those tests? That's a problem that we have been thinking about. And then the flip of that: if we could use monitoring data there, can you do the opposite and start to identify areas of our application and our system where we want more granular monitoring?
A
If we're thinking about tracing and sampling, we want a higher percentage of those traces sampled, because the code is flakier, the tests are flakier, there are connectivity problems, or there are downstream things that can really impact this area of our system. Is there potential there to have smarter monitoring? So those are the two problem spaces that we want to talk about today.
B
Can I ask a clarifying question, do you mind? Absolutely. Okay, so what we're drilling into here, it sounds like, is reducing the amount of time developers spend triaging and testing problems that come out of tests. If we look at what an organization would get from this feature set, it's that we're making developers faster. Is that what you're thinking?
A
Faster in that their feedback loop is faster. So, for example, the pipeline for GitLab takes an hour to get through all of the tests. Can we make that hour 30 minutes? Can we make it even 10 minutes, because instead of running 170,000 unit tests across 12 jobs, you've run 17,000? Can we reduce that number of tests?
A
Okay, does that help clarify your question, or does that answer your question? Awesome.
A
Any other problems here, or can we refine this problem statement some, to make it more big... more big. My words are the best words this morning.
C
So, like, synthetic monitoring: let's say you do load testing, so you record some sort of a click-through script that does a login and does something else.
C
So usually you are doing it for load testing, but then what emerged from that is that operators realized those scripts are very useful for monitoring in production. So they took those scripts from the tests and said: okay, instead of running, like, 1,000 or 10,000 at full load, let's take one and run it every five minutes, or every minute, or every 15 minutes, and this way I'll be able to easily put some monitoring in place and make sure that at least things are not breaking in production.
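A minimal sketch of that pattern: a recorded test flow reused as a periodic production health check. The URL, credentials, and interval here are hypothetical placeholders, not anything discussed on the call.

```python
import time
import requests

BASE_URL = "https://app.example.com"  # hypothetical system under watch
CHECK_INTERVAL_SECONDS = 300          # every five minutes

def login_check() -> bool:
    """Replay the recorded login flow once and report success."""
    session = requests.Session()
    resp = session.post(
        f"{BASE_URL}/login",
        data={"username": "synthetic-monitor", "password": "example"},
        timeout=10,
    )
    return resp.ok

while True:
    # A real setup would push this result to a monitoring system
    # as a metric instead of printing it.
    print(f"login_check ok={login_check()}")
    time.sleep(CHECK_INTERVAL_SECONDS)
```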
C
Problem
yeah,
I
mean
obviously
the
the
second
problem
in
the
static
monitoring
world
is
that
the
maintainability
is
is,
is
practically
impossible.
Yeah
yeah,
so
I
guess
yeah
that
that
was
like.
That
was
one
of
the
problems
there
so
like
maybe
using
tests
like
somehow
can
you
can
ease
this
pain
on
like
maintaining
scripts.
A
Yeah, maintaining the scripts is always problematic. I wonder, is there any benefit, though, to an org if they're running fewer of those tests? Or, instead of running every test once an hour: let's say your login just underwent a lot of change, and so we want to run the login script every five minutes for a while to make sure that that area is more rock-solid in production, or we want to monitor those systems more often because the tests are historically flaky.
A
I can say that I felt this pain in previous roles, working with tech ops teams. Development teams want all the monitoring, and that's just a huge amount of data: if you're using a SaaS provider, it's very expensive; if you're hosting it yourself, it's very expensive, and very expensive to maintain. And then you run into performance problems: if you're trying to search across, you know, terabytes of logs to find information, it's really hard and really slow.
A
So
is
there
a
better
way
to
I've,
been
thinking
about
this
for
a
long
time
of?
Is
there
a
better
way
to
get
just
the
right
data
there?
That
is
going
to
be
actionable?
It
tells
the
team
what
they
need
to
know,
but
it's
timely
as
well,
and
I
know,
there's
there's
a
balance
to
be
struck
with.
Well,
we
don't
know
where
there's
problems
where
problems
are
gonna
lie
and
if
we
don't
have
monitoring
on
everything,
how
are
we
ever
gonna
find
it
so
like.
C
There is this concept, again thinking big here: let's try to flip the problem a bit. Okay. First of all, I think that monitoring production is super important, and it's a job by itself, obviously. And I find it hard to believe that there would be operators who say: yes, I'm fine with less monitoring. Of course, they would like to reduce monitoring, because it's an overhead, but it's, you know, super critical.
C
Then I can take this data and take it back to, maybe, my testing, and say: hey, we know you have a bunch of scripts that you run in tests, but the most important one, 90, 99 percent do this, so this is the most critical thing for you to test. The others are, of course, important, but if you need to prioritize, let's prioritize based on the user actions, and not prioritize all the tests equally. Yeah, yeah.
A
Yeah, I've been thinking about that when it comes to APIs. If you're looking at it as: 90% of the calls in our system go into this single API, and here are the top three things that call it, then let's always run those three integration tests as part of every single pipeline, and then we'll randomize the other integration points that we have. Is that a good way to have confidence while reducing the number of tests that are run?
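As a sketch of what that always-run-plus-random selection could look like; the test names, traffic-derived ranking, and counts below are invented for illustration, not an existing GitLab capability:

```python
import random

# Hypothetical integration tests, ranked by production call volume
# of the endpoints they exercise.
tests_by_traffic_rank = [
    "test_orders_api",    # top callers of the busiest endpoint:
    "test_checkout_api",  # always run these
    "test_login_api",
    "test_reports_api",   # lower-traffic integration points:
    "test_admin_api",     # randomize among these
    "test_export_api",
]

ALWAYS_RUN = 3       # top-N tests run in every pipeline
RANDOM_SAMPLE = 2    # how many of the rest to pick per pipeline

def select_tests() -> list[str]:
    always = tests_by_traffic_rank[:ALWAYS_RUN]
    rest = tests_by_traffic_rank[ALWAYS_RUN:]
    return always + random.sample(rest, k=min(RANDOM_SAMPLE, len(rest)))

print(select_tests())
```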
C
I think this is... first of all, yes, okay. I think it will help spark some sort of conversation between, you know, ops and dev; I keep talking about DevOps and OpsDev. So this kind of conversation, and this kind of alignment between what the operator is running in production versus what is running in the tests, is super important. So having this ability to, you know, bridge this gap, because I know we're talking about DevOps, but in reality, most of the enterprise customers...
C
You
know
they
have
their
own
monitoring
tools
in
place
and
it's
very
hard
to
bridge
this
gap,
so
bringing
some
production
data
insights
into
the
development
cycle
and
the
test
can
focus
the
test
effort
for
the
developers.
So
I
think
it's
it
can
be
like
super
critical.
You
need
to
think
about.
How
do
we
do
that?
But
this
is
like
this
is
like
the
next
step.
Yeah.
D
Hey again, this is Dan. I think Jackie asked this clarifying question, but I'm just going to ask it again and make sure I'm on the same page. So "tests" in this case: are we talking about unit tests?
A
It could be any test. I think mostly about unit tests, because that's the broad part of the pyramid, and what can really take a long time to run as you get into hundreds of thousands of tests. But I think it's equally important when you talk about long-running integration or end-to-end tests; those can have an equal penalty when it comes to wall-clock time, when you're thinking about how long it takes for a pipeline to finish running. And, thinking about the pipeline running, it doesn't matter if it's green or red.
A
If I get a result and it's feedback, great, that's what I want. Obviously I want more of my pipelines to be green, but I just want my feedback faster. So I've kind of focused on unit tests, but I think this applies across the board.
D
Okay, and one last quick question then, really fast (and sorry, I have to hop off at 10 as well for another call): when we say "way too long," generally, are we thinking, hey, the tests should be less than an hour? Less than two hours? Do we have a ballpark range in terms of the timing?
A
So what we've been thinking about in testing, like our super long-term vision, is that you can go from opening an MR to running that code in production in an hour. That is our really-think-big, big audacious goal. And that MR is not going to be, like: hey, I just spun up a brand-new feature with front end, back end, database...
A
You
know
it's
scales,
all
of
that,
but
you
should
be
able
to
get
an
mr
opened
and
even
if
you
think
about
it,
as
like
a
text
change
and
you're
just
running
linters
on
it
hey,
you
should
be
able
to
get
it
out
into
prod
within
an
hour.
It's
probably
more
likely.
It's
like
hey
here's,
a
front-end,
javascript
change,
so
I've
changed
some
functionality.
I've
changed
some
code,
but
I
have
confidence
that
this
changes
in
a
breaking
change
and
there's
a
lot
more
than
just
testing.
That
goes
on
with
that.
D
Yeah, for me, I think by introducing JavaScript you kind of landed on the next place my mind went. This is such a big problem space that you almost have to say: oh, let's maybe focus on front-end applications, and then maybe only, you know, JavaScript apps, or maybe it's just microservices for Java, because it's so different depending on what kind of app you're talking about; it will kind of change. Yeah, interesting.
A
Okay, that would be a wild goal for, let's say, a mainframe. That may not be the best use case for it, but...
A
Going down that path a little bit with the TFF test file finder, we have a template that you can apply (it's very language-specific to Ruby today, but we're working on expanding it) where it runs, as a new stage, just the tests for the files that were changed. So it just pulls those tests out and runs them first.
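The file-to-test mapping behind that kind of tool might look something like this sketch; the path conventions here are assumed for illustration and aren't necessarily what the actual tool implements:

```python
from pathlib import Path

def tests_for_changed_files(changed_files: list[str]) -> list[str]:
    """Map changed source files to their spec files by naming convention."""
    selected = []
    for path in changed_files:
        p = Path(path)
        if p.parts and p.parts[0] == "lib":
            # lib/foo/bar.rb -> spec/foo/bar_spec.rb (assumed convention)
            spec = Path("spec", *p.parts[1:]).with_name(p.stem + "_spec.rb")
            selected.append(str(spec))
        elif p.parts and p.parts[0] == "spec":
            # A changed spec file selects itself.
            selected.append(path)
    return selected

print(tests_for_changed_files(["lib/auth/login.rb", "spec/orders_spec.rb"]))
```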
B
I think what I'm interested in is, like: we have a production monitoring metric that suggests this has been stable for the last 90 days, and your tests are running really, really long; yeah, you should evaluate whether those tests are necessary or not. Or even the inverse: you're having a lot of instability in your production instances, and we're seeing that your tests are passing. Are your tests working? Yep.
A
So I'm going to only run 10% of those in every test run, and it's just going to be randomized during every pipeline. And then you can, you know, start to play with that, with gauges potentially, so that you could get confidence, as a combination of a dev team and an operator team if you're split that way, or just as a development team if you're doing full-on DevOps, of: how much do I really need to test?
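One hedged way to picture that gauge is a per-test run probability derived from stability data; every name and number below is made up for illustration:

```python
import random

# Hypothetical stability scores (0 = unstable, 1 = stable for 90 days),
# e.g. derived from production error rates for the code each test covers.
stability = {
    "test_login": 0.99,
    "test_checkout": 0.60,
    "test_export": 0.95,
}

FLOOR = 0.10  # even the most stable area keeps a 10% chance per pipeline

def should_run(test_name: str) -> bool:
    # Less stable code -> higher probability the test runs this pipeline.
    run_probability = max(FLOOR, 1.0 - stability[test_name])
    return random.random() < run_probability

selected = [t for t in stability if should_run(t)]
print(selected)
```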
A
These are giant on my screen and I can't even read them. Are there other personas we should include here, who have this...?
D
Yeah, I definitely agree with Sasha. I know you asked a question about whether there are other personas; I don't have an answer to that. But, from when you were talking to all of us at the beginning of the call, I'm still a little bit curious about the DevOps persona, or the...
D
That
was
an
engineer
trying
to
use
monitoring
wronging,
and
I
know
that
that's
part
of
the
goals
that
you
have
in
the
the
thing
big
as
a
second
green
anime,
but
I
guess
I
I'm
just
kind
of
struggling
with
with
that
whole
idea
of.
A
For me, this is the flip-flop of that, where we say: hey, we have test history, and we see that a combination of the unit tests within this module and the integration tests of this module have all passed 100% for the last 90 days. So if we're monitoring that area of our production system, instead of saying, hey, I want every single log and I want every single trace...
A
I
can
say
I
really
only
need
10
of
the
logs
in
10
of
the
traces
and
so
the
amount
of
storage
and
the
amount
of
data
going
into
sacrifana
shrinks.
But
you
still
get
signal
but
you're
doing
that,
because
you
have
confidence
that
it's
a
well-tested,
solid
piece
of
code
that
you
have
like
if
you
have
fuzz
testing,
maybe
even
that
goes
up
some
more
because
you
have
extra
confidence
around
the
bare
or
the
the
edges
and
the
funkiness.
A
...that could happen with weird inputs. So, a combination of things that could happen in tests that result in still having data, but less of it, and that results in lower storage costs for you, potentially, when it comes to monitoring, because it just gets crazy expensive to start storing terabytes and terabytes of data. I worked with a monitoring and logging team, and that was always the battle of: listen...
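As a toy illustration of that trade-off, assuming a per-module 90-day test pass rate is available; the thresholds and rates are invented:

```python
def log_sampling_rate(pass_rate_90d: float) -> float:
    """Pick what fraction of logs/traces to keep for a module,
    based on how its tests have behaved over the last 90 days."""
    if pass_rate_90d >= 0.999:   # well-tested, rock-solid code
        return 0.10              # keep 10% of logs and traces
    if pass_rate_90d >= 0.95:
        return 0.50
    return 1.00                  # flaky area: keep everything

print(log_sampling_rate(1.0))    # -> 0.1
print(log_sampling_rate(0.90))   # -> 1.0
```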
D
What he just said, to me, is a really nice articulation of the interesting problem, the pain point: that kind of, hey, I'm a dev, everything went well when I, you know, created this code, I ran my unit tests, and a few months later something broke in production, and I'm trying to figure out what broke, right? And so now I'm reliant on these logs, and then the DevOps engineer is like: dude, we don't want to...
A
I'm just speaking more from personal experience. I've had this pain of: even if you have all of that data, how could you possibly find problems in it? Because there's just so much of it; trends are buried, because you have to know what the problem looks like in order to slice and dice to find it. Anyway...
B
I wonder how a team that is using test-driven development would respond to testing less as a result of production performance, because those teams are expecting that their behaviors are driven by having lots of tests and constant evaluation of production states.
A
I mean, I think, as you're introducing new functionality, you want to test that and make sure it's solid. But going back to that: we wrote this code a year ago, we haven't touched it since then, tests haven't failed since then, and in production it looks great. So do I really need to run those 80 tests in every single pipeline?
D
What would the solution look like? I don't know if I have an ideal solution, because I'm struggling: if I were just thinking about an MVC-type product to solve this, I'm struggling in terms of thinking about what that is. So what I'm going to lean on, really quickly, is an example, and I think I talked to you about this a bit before, James, in a one-on-one: an example of what we did for Nike Inc.'s cloud acceleration team, which involved Nike's, you know, plethora of applications, right?
D
The
approach
we
took
for
testing
was
a
for
the
1600
development
teams
at
nike.
We
said
you
tell
us
what
tests
you
think
are
important
to
run
for
your
application,
that's
kind
of
like
step
one,
and
so
that
meant
that
the
team
had
the
ability
to
to
narrow
or
widen
the
aperture
in
terms
of
how
broad
or
how
narrow
the
test
should
be,
and
then
they
said:
okay,
like
in
diamond's
case
I'll,
need
to
run
three
tests
to
in
order
for
this
timer
to
go
to
production.
D
So if these three tests pass, the code goes forward; James, this app might need 10 tests because it's more complex, whatever. So that's the way we started that first, and so the development team defined the quality gate for the tests, right? We put it in the system, and the next thing we did was: okay, now that you've defined the quality gate, right, or the parameters for that particular type of application or change, or what have you, now, on the monitoring side...
D
...you can start seeing, okay, over time, are changes related to that area of the code base causing problems in terms of availability, or bugs being raised, or trouble tickets, incident tickets, or what have you? And so we tried to allow them to correlate the two. So that's the path we had gone down. I don't have a solid answer to your question, but I was just kind of imagining how we had done something related in the past and how it might fit into this.
B
As an MVC, I would like to see a place where all my tests are, like, in aggregate, in relationship to their production stability: being able to quickly see how long my jobs are running, specific just to my tests, and how performant they are in production. And I could see a second iteration being, like: hey, your production has been stable for 90 days...
B
You
might
want
to
reconsider
the
test
you're
running
as
like
a
nudge
or
a
trigger,
and
then
allowing
them
to
select
inside
of
git
lab
opportunities
to
remove
tests
from
pipelines
and
then
a
further
iteration
can
be
even
proactive
like
on
the
mr
widget,
hey,
you
can
skip
this
testing
job,
because
this
is
making
a
change
to
a
file.
That's
been
stable
or
whatever
you
whatever.
You
know,
language
makes
sense,
but
I
could
see
this
interaction
of
it
being
here's
push
data
and
then
intervening
your
changes
in
the
pipeline.
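A small sketch of what that nudge rule could look like as logic; the inputs and thresholds are invented, and nothing like this exists in GitLab today:

```python
from datetime import timedelta

def test_nudges(jobs: list[dict]) -> list[str]:
    """Suggest tests to reconsider: long-running jobs whose covered
    code has been stable in production for 90+ days."""
    nudges = []
    for job in jobs:
        if (job["prod_stable_days"] >= 90
                and job["duration"] > timedelta(minutes=15)):
            nudges.append(
                f"{job['name']}: production stable for "
                f"{job['prod_stable_days']} days; reconsider this job."
            )
    return nudges

print(test_nudges([
    {"name": "integration-suite", "prod_stable_days": 120,
     "duration": timedelta(minutes=40)},
]))
```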
C
My thought was very similar to what Jackie mentioned: first of all, to see some sort of a comparison, because we have an assumption now. The assumption is: if your environment is stable, then you need fewer tests, or vice versa. But first of all, in order to prove this assumption, we need to have some sort of a comparison.
C
Let's first see what we have, and if we see that there is a correlation between, like, the number of tests and the stability, then we can start playing with that. But at least let's make sure we are not assuming something that does not hold. So this would be, like, my very first thing that I would build, but very similar to what Jackie mentioned.
A
If there'd be a way, with existing data in GitLab, if someone is using the GitLab monitoring and we have some sort of tracking of their production system, and we can see error rates, or traces, or, I don't know, maybe even, like, 500 errors on APIs: we can map that back to an API test and say, here's the test history of that test for the last n runs (that's capability that we'll have in, like, one milestone), and say, this API, this is what it looks like in production.
A
Here's
what
the
test
for
that
api
looks
like
and
just
bubble
it
up
to
say:
here's
your
top
10
flaky
tests
and
how
the
api
behaves
and
here's
your
top
10
error
rates
of
apis
and
here's
how
the
tests
behave
the
history
of
the
tests
and
just
build
a
page
that
shows
that
data
and
only
track.
If
people
go
to
it
and
see
hey,
do
people
think
that
this
is
valuable
or
not
that
all
we
want
to
know
is
if
people
are
looking
at
the
data.
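A back-of-the-napkin sketch of the join behind such a page, with fabricated sample data standing in for real monitoring and pipeline records:

```python
# Hypothetical inputs: API error rates from monitoring, and pass/fail
# history per API test from recent pipelines.
error_rates = {"/api/orders": 0.07, "/api/login": 0.001}
test_history = {
    "/api/orders": [True, False, True, True, False],
    "/api/login": [True, True, True, True, True],
}

def report() -> list[tuple[str, float, float]]:
    rows = []
    for endpoint, rate in error_rates.items():
        runs = test_history.get(endpoint, [])
        pass_rate = sum(runs) / len(runs) if runs else float("nan")
        rows.append((endpoint, rate, pass_rate))
    # Most error-prone endpoints first, next to their test pass rate.
    return sorted(rows, key=lambda r: r[1], reverse=True)

for endpoint, err, passed in report():
    print(f"{endpoint}: prod error rate {err:.1%}, test pass rate {passed:.0%}")
```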
A
That's a very subtle way to say you don't have any monitoring for this.
A
So you can see... yeah, we could help drive that category a little bit. Cool. So what about from the monitoring side? Is there something that we could leverage?
A
Because
it's
because
I
don't
know
the
capabilities
of
the
category
very
well,
so
there's
something
that
we
could
take,
as
is
today,
because
I
know
that
we
don't
have
investment
there
on
the
development
side
yeah
to
utilize,
for
either
of
these
problem
spaces
like
is
there
an
mvc
approach
that
we
could
take
from
the
testing
side
or
contribute
code
back
into
monitor
that
would
help
solve
either
of
these.
C
...problems? The only way I can think about it is... I don't think it's related to the problem description that you have, but let's just brainstorm a bit. I mean, the advantage that we have in GitLab is that we have access to the people that actually write the code and write the tests.
C
They
can
also
add
the
monitoring,
because,
like
basically
monitoring
now,
is
a
piece
of
code
that
you
add,
or
some
sort
of
piece
of
configuration
that
you
that
you
add,
ideally
as
quickly
as
like
as
as
quickly
as
possible,
you
don't
wait
for
things
to
deploy
in
production
and
only
afterwards
you
deploy
monitoring.
You
want
to
have
this
monitoring
somehow
embedded
like
within
your
similar
to
how
you
do
like
unit
tests
in
on
your
test.
C
You
want
to
have
monitoring,
enabled
and
maybe
in
the
ci
when
you
create
your
pipeline
or
even
before,
like
for
like
when
you
instrument
your
code,
you
do
it
you
do
it
when
you
develop
your
code
when
you
like.
After
so,
how
can
we
like
take
something
from
test
and
and
use
it
probably
somewhere
in
the
instrumentation
part,
but
I'm
not
sure
like
not
sure
how.
A
When you're setting up monitoring, do we have, like, thresholds for how much we're going to sample? Like the sampling rates, I guess, or the tracking rates?
C
So
so
there
is
like
some
sort
of
probably
like
an
out-of-the-box
setting
that
we
said
by
parameters
which,
obviously
you
can
change
so
yeah.
Maybe
that's
a
good
idea
like
changing
the
sample
rate
based
of
like
how
flaky
the
test
is,
because,
obviously,
if
the
sample
rate
is
high,
then
you
know
you
test
more,
you,
you
add
more
data,
and
you
know,
as
you
mentioned
it's
it's
too
big
to
handle
yeah.
So
this
is.
This
is
something
that
this
is
something
that
can
easily
change
like.
There's
the
sampling
date.
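A hedged sketch of that knob, assuming a per-endpoint flakiness score could be computed from test history; the names and numbers are invented:

```python
# Hypothetical flakiness scores per endpoint (0 = never flaky, 1 = always).
flakiness = {"/api/login": 0.02, "/api/export": 0.35}

BASE_RATE = 0.05   # assumed default out-of-the-box trace sampling rate
MAX_RATE = 1.0

def trace_sample_rate(endpoint: str) -> float:
    """Flakier areas get a higher share of their traces sampled."""
    score = flakiness.get(endpoint, 0.0)
    return min(MAX_RATE, BASE_RATE + score)

print(trace_sample_rate("/api/login"))   # -> 0.07
print(trace_sample_rate("/api/export"))  # -> 0.40
```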
A
Just
keep
going
back
to
there's
got
to
be
a
way
to
map,
then
the
test
back
to
what
it
monitors
and
vice
versa.
If
we,
I
think,
that's
the
hardest
part
of
this
problem
to
solve,
is
how
do
you
get
from
production
monitoring
back
to
a
test
or
a
set
of
tests,
or
at
least
that's
the
problem?
I
don't
have
an
answer
to
even
so
far.
C
One
thing
that
we
can
do
is
just
like,
like
if
you
instrument
your
code
and
you
identify
a
problem
on
your
monitor
side,
then
you
can
go
back
to
the
to
where
you
ins,
where
you
instrument
the
code,
because
eventually
you
know
you're,
detecting
that
something
is
broken
and
that's
great.
A
So
like,
if
you're
implementing
tracing
you
have
to
instrument
that
within
the
code,
so
as
you're
doing
that,
I'm
just
trying
to
where
was
I
going.
My
train
of
thought
got
derailed,
be
have
a
view,
then
of
like
it's.
For
me,
it's
almost
like
a
coverage
report
of
I
can
see
within
the
code
like
where
there's
test
coverage,
and
I
can
see
where
there's
it's
instrumented
for
tracing.
A
I
can
see
in
addition
to
my
tests
like
if
you're
looking
at
the
the
web
ide
or
like
the
raw
file
view,
you
can
say
hey.
This
is
covered
by
a
test.
Here's
this
test,
history,
hey
this,
is
covered
by
tracing
here's,
the
error
rate
in
production
for
the
last
90
days
like
you
would
have
the
same
thing
on
the
same
line
or
have
those
two
data
points
on
that
line
potentially.
A
And
then,
if
you
extract
that
data
out
into
a
view,
you
can
say:
hey
here's
error
rates
from
your
tracing
data
that
you've
instrumented
your
code.
Here's
test
rates
for
the
same
kind
of
code
where
we
see
that
mapping
within
the
file
and
here's
where
potentially
you
can
make
a
change
to
decrease
your
your
test
runs.
A
Yeah, I think that's really... I think that's a thread that we can pull on with the dev teams, or with... yeah, with the Dev team, and say: hey, how can we get from tracing data back to instrumentation, back to a test? How do we connect those three dots and then pull the data into a view? I would think that would be kind of the minimum viable here: if I want to run fewer tests, the first thing I have to know is which tests are important, which I have to run.
A
Yeah
yeah
yeah,
I
think,
there's
a
it's
an
interesting
conversation
to
have
about.
Should
we
remove
these
tests
because
the
code
is
solid
or
should
we
just
run
them
less
often?
No.
I
don't
know
which
one
is
right,
I'm
I
I
don't
think,
there's
anyone
who
would
say
we
should
get
rid
of
tests
for
code
that
we're
still
shipping.
A
Cool
anything
else
you
want
to
wrap
up
on
on
the
think
small
dove.
A
I
don't
know
the
incident
management
workflow
too
well,
so
I
don't
know
that
I
could
speak
to
that
or
I
wouldn't
feel
confident,
because
all
of
my
assumptions
would-
or
all
of
my
thoughts
would
be
assumptions
on
the
topic.
C
Yeah
I
mean
maybe
we
can
just
like
I
mean
if,
if
we
have
this
capability,
the
way
to
surface
this
capability
will
be
to
an
incident.
When
you
have
this
incident,
you
can
maybe
attach
some
sort
of
a
like
a
report,
or
maybe
some
sort
of
a
a
blob
that
says.
Okay,
you
have
tracing,
you
have
testing
click
here
and
it
takes
you
to
the
code.
A
...workflow. But wait, I guess, then: if you extend that to thinking about, hey, we're going to attach, like, this data blob of the monitoring that was happening during that time, then I think that workflow... if now you can track back to the code, and you can see if there are tests and how those tests behaved on the last MR, that would be an interesting way to tie that together, if you're able to quickly identify that when you create your incident.
C
Yeah, and even if we think beyond our own monitoring capability, which we know we don't have a lot of investment in, maybe we can attach some sort of an external monitoring solution to that incident. This way you'll be able to, you know, work with other vendors. It's actually a good problem validation to conduct for incident management, yeah.
A
Yeah
cool
well,
I
think
I
can
take
a
couple
of
these
anyway
and
at
least
brainstorm
them
with
my
team
on
the
testing
side.
I
think
there's
some
interesting
things
here
to
try
to
poke
at
some
of
these
nbc's
might
even
be
a
think
big
for
us
as
we
try
to
explore
the
monitor
space
a
little
bit,
but
I
guess
definitely
I
mean
it's
really
interesting
for
us
to
think
about
how
do
we
run
fewer?
How
do
we
run
fewer
tests
with
the
same
amount
or
even
higher
confidence
in
the
code?
A
Yeah, I mean, I think, as we think about this, this will dip into CI, or Pipeline Authoring, though, as we start to, like... can you build a GitLab CI YAML and, as part of your test config, set it up to be dynamic? Like: insert a lottery factor, and tests selectively run within your test job based on the monitoring data that you have, but it keeps itself updated without having to rewrite the test scripts or rewrite the CI, the .gitlab-ci.yml, or whatever that might be.
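A rough sketch of how such a lottery factor could be wired up by generating a child pipeline; the suite names, probabilities, and file layout are all invented for illustration:

```python
import random

# Hypothetical per-suite run probabilities, e.g. refreshed nightly
# from monitoring data rather than hand-edited in .gitlab-ci.yml.
lottery = {"unit-auth": 1.0, "unit-reports": 0.25, "e2e-checkout": 0.5}

def generate_child_pipeline() -> str:
    """Emit child-pipeline YAML containing only the suites that
    won this pipeline's lottery."""
    jobs = []
    for suite, probability in lottery.items():
        if random.random() < probability:
            jobs.append(
                f"{suite}:\n  script:\n    - bundle exec rspec spec/{suite}"
            )
    return "\n\n".join(jobs)

# Write the generated YAML; a trigger job could then run it as a
# child pipeline.
with open("generated-tests.yml", "w") as f:
    f.write(generate_child_pipeline())
```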
A
Cool. Well, hey, thanks, Doug. I appreciate it; as always, great to talk to you. Appreciate your thoughts on this topic. Thank you. Cheers, bye.