From YouTube: Keptn Community & Developer Meeting - Feb 10, 2022
Description
Meeting notes: https://docs.google.com/document/d/1y7a6uaN8fwFJ7IRnvtxSfgz-OGFq6u7bKN6F7NDxKPg/edit
Learn more: https://keptn.sh
Get started with tutorials: https://tutorials.keptn.sh
Join us in Slack: https://slack.keptn.sh
Star us on GitHub: https://github.com/keptn/keptn
Follow us on Twitter: https://twitter.com/keptnProject
Sign up to our newsletter: https://bit.ly/KeptnNews
A
Hello, welcome to the Keptn developer meeting. Today is February 10th, and we have most people on the call already.
A
Yeah, if you want, you can just unmute yourself later. So let's take a look at our agenda.
A
So what do we have on the list? First, a reminder that effective next meeting we are changing the meeting time. It was actually supposed to happen this time, but I didn't update the calendar, so from the next meeting on we will be meeting at 9am UTC for developer meetings; for community meetings, the schedule will remain the same.
A
So next week we have a community meeting on Wednesday at 9:00 UTC, where Brett McCoy will give a presentation about progressive delivery with Keptn, and on Thursday we will have a Keptn developer meeting in the current slot; going forward, the developer meetings will happen at 9am UTC.
B
Okay, so what happened in my sprint? First of all, one of the more important things: we have resource usage stats for our Kubernetes resources again. They had basically gone missing in the meantime, and they're back now with the new pipeline setup.
B
So, basically, before, we had an actual cluster that we fetched the resource limits and requests from; now I changed that, and we just fetch them from the Helm chart.
B
So here, basically, I do a helm template with the Keptn installer as a dry run, just to get the whole chart into one file, and then I have some fancy yq and jq to pull out and reshuffle the information into a proper format. In the end we have, as you can see here, resources for CPU and memory, and all of that ends up in a small markdown table which will be attached to all new releases going forward.
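For reference, a minimal sketch of that extraction step, written here in Go rather than yq/jq (the chart path, the exec approach and the table columns are assumptions; the real pipeline shells out to helm, yq and jq):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"

	"gopkg.in/yaml.v3"
)

// manifest models only the fields we need from each rendered document.
type manifest struct {
	Kind     string `yaml:"kind"`
	Metadata struct {
		Name string `yaml:"name"`
	} `yaml:"metadata"`
	Spec struct {
		Template struct {
			Spec struct {
				Containers []struct {
					Name      string `yaml:"name"`
					Resources struct {
						Requests map[string]string `yaml:"requests"`
						Limits   map[string]string `yaml:"limits"`
					} `yaml:"resources"`
				} `yaml:"containers"`
			} `yaml:"spec"`
		} `yaml:"template"`
	} `yaml:"spec"`
}

func main() {
	// Render the whole chart to stdout without touching a cluster.
	out, err := exec.Command("helm", "template", "keptn", "./installer/manifests/keptn").Output()
	if err != nil {
		panic(err)
	}
	fmt.Println("| Deployment | Container | CPU requests | CPU limits | Memory requests | Memory limits |")
	fmt.Println("|---|---|---|---|---|---|")
	for _, doc := range strings.Split(string(out), "\n---\n") {
		var m manifest
		if yaml.Unmarshal([]byte(doc), &m) != nil || m.Kind != "Deployment" {
			continue
		}
		for _, c := range m.Spec.Template.Spec.Containers {
			r := c.Resources
			fmt.Printf("| %s | %s | %s | %s | %s | %s |\n",
				m.Metadata.Name, c.Name,
				r.Requests["cpu"], r.Limits["cpu"],
				r.Requests["memory"], r.Limits["memory"])
		}
	}
}
```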
B
Then what I also did is change our integration test clusters on GKE from being dynamic to being static. So they are running all the time now and are not created from scratch every time, which on the one hand saves us a little bit of time and also lots of headaches, and it gives us more flexibility for debugging: you can just go to the cluster, since it's still there after the integration test run, and look at the logs.
B
You can look at the namespaces, everything that happened. So going forward we're going to use those two clusters here, one on Kubernetes 1.19 and one on Kubernetes 1.21, as our static integration test clusters. As for all the other testing platforms that we had, like k3s, k3d and minishift:
B
Those all stayed the same, so they're still created on demand and then shut down at the end of the run. Yeah, and this didn't go fully smoothly, so there were two bug fixes that I had to ship afterwards.
B
Basically, we had the issue that if you changed something and then ran the integration tests again, the image names would stay the same, and therefore the images weren't fetched fresh every time in the integration test run on the cluster. So basically you had a new test run, but it still used the old images, and you couldn't test your new changes. This should now be resolved.
B
And then one last thing, another fix for our pipelines. We already have UI tests in place for the bridge, and screenshots generated from them, but the screenshots were only generated when the tests succeeded; if they failed, it would just exit and not create any screenshots for us.
B
So what I did there, basically, is create a small script that runs the yarn test command, then moves the screenshots to the right folder so that we can extract them from the Docker image, and then actually checks the outcome of the tests and exits with a non-zero value if they failed.
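In the pipeline that wrapper is a shell script; the same logic as a small Go sketch (the test command and both paths are placeholders):

```go
package main

import (
	"os"
	"os/exec"
)

func main() {
	// Run the bridge UI tests, but don't abort on failure yet.
	test := exec.Command("yarn", "test:ui") // placeholder command name
	test.Stdout, test.Stderr = os.Stdout, os.Stderr
	testErr := test.Run()

	// Move the screenshots to the folder the pipeline extracts from the
	// Docker image -- regardless of the test outcome.
	if err := os.Rename("ui-test-screenshots", "shared/screenshots"); err != nil {
		os.Stderr.WriteString("moving screenshots failed: " + err.Error() + "\n")
	}

	// Only now propagate the result: exit non-zero if the tests failed.
	if testErr != nil {
		os.Exit(1)
	}
}
```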
And that's actually already all from me, so handing over to Florian. All right, thank you.
C
Yeah, this sprint for me was basically one research ticket. It was about figuring out how we can have multiple replicas of the shipyard controller. This was actually quite a tough nut to crack, but after a lot of head scratching I think I found a reasonable approach that we can use going forward.
C
The main challenge here was that the shipyard controller listens for events related to the tasks that should be executed throughout a sequence. For example, when it receives a finished event from a participant, this received event can have a further impact on the state of the sequence and its further execution.
C
And there we had the challenge that if we have multiple replicas of the shipyard controller, where each replica can potentially receive any of those events, those replicas might overwrite each other's changes and we would have some serious concurrency issues. The way to tackle this was to come up with a new data model.
C
This model has one important characteristic, which is being append-only, meaning this section here: the representation of the current task that's being executed and its events.
C
So now this will be the only property affected by incoming events received from task executors, and it means that only new events will be added to this list; nothing else will be changed as a direct result of the reception of such an event. This means the threads handling those incoming events won't overwrite each other's changes. This of course allows us to achieve the concurrency we need, and it also has a nice side effect.
C
The code of the shipyard controller has actually gotten a lot simpler, because previously we used a number of different collections within MongoDB to keep track of, for example, the started events related to a task, the finished events, etc. Now all of the information that's needed for the execution of a sequence has been consolidated into this single collection, where basically the complete state of a task sequence is represented.
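A rough sketch of what such a consolidated, append-only document could look like (all field names are illustrative, not Keptn's actual schema):

```go
// SequenceExecution is an illustrative shape of the single MongoDB document
// describing one sequence execution; the real shipyard-controller model may differ.
type SequenceExecution struct {
	ID       string `bson:"_id"`
	Project  string `bson:"project"`
	Sequence string `bson:"sequence"` // e.g. "delivery"
	Status   struct {
		State       string `bson:"state"` // e.g. "triggered", "started", "finished"
		CurrentTask struct {
			Name string `bson:"name"`
			// Append-only: handlers of incoming .started/.finished events only
			// ever add entries here (an atomic $push in MongoDB), so concurrent
			// replicas cannot overwrite each other's updates.
			Events []TaskEvent `bson:"events"`
		} `bson:"currentTask"`
	} `bson:"status"`
}

// TaskEvent is one received task event, e.g. sh.keptn.event.deployment.finished.
type TaskEvent struct {
	EventType string `bson:"eventType"`
	Source    string `bson:"source"`
	Result    string `bson:"result"`
}
```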
C
So, of course, if you want to have a closer look at the details of this new data model, you can check out the pull request I've opened here. In it I have an example of an actual sequence execution and its representation, with a short description below of how this is supposed to work going forward for the shipyard controller. Another aspect: we have components that regularly execute jobs within the shipyard controller, for example the sequence dispatcher, which periodically checks whether we can start new queued sequences.
C
If we have multiple replicas of the shipyard controller where each one has these components running, then obviously we might also get concurrency issues, since, for example, the same sequence could potentially be started multiple times, and events might be sent out multiple times, which is of course not wanted. For those kinds of jobs, a feasible approach is to have some leader election.
C
For this we can actually use the built-in leader election package that's part of the Kubernetes client library. Using it we can establish a leader among the replicas of the shipyard controller, and then only the leader would be in the position to execute those periodic background jobs; the other replicas would simply not have these jobs running. All right, are there any questions about that?
B
A question: is this actually already provided completely?
C
This is provided by the Kubernetes library. Maybe, okay, it's actually a really simple piece of code that's being used for that, so let's see if we can show it.
C
There we have it. Is the font size big enough?
C
I don't know the shortcut right now, but basically, as you can see here, only a couple of lines of code are needed. You refer to a Lease resource within the cluster, and then you have this RunOrDie function, where you have several callbacks, like OnStartedLeading, which is the place where we would start the dispatcher jobs, and OnStoppedLeading, where the signal to stop the dispatchers is sent. And then, after that...
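That package is client-go's leader election; a self-contained sketch of how it is typically wired up (lease name, namespace, identity source and timings are assumptions, not the exact Keptn values):

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// The Lease resource all shipyard-controller replicas compete for.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{Name: "shipyard-dispatcher", Namespace: "keptn"},
		Client:    client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: os.Getenv("POD_NAME"), // unique per replica
		},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			// Only the elected leader runs the periodic dispatcher jobs.
			OnStartedLeading: func(ctx context.Context) { startDispatchers(ctx) },
			// On losing the lease, signal the dispatchers to stop.
			OnStoppedLeading: func() { stopDispatchers() },
		},
	})
}

func startDispatchers(ctx context.Context) { /* start sequence dispatcher etc. */ }
func stopDispatchers()                     { /* signal background jobs to stop */ }
```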
D
So hopefully you're seeing my screen. Yeah, so in the last sprint, I actually managed to fix two bugs which were discovered by community members.
D
The first one is regarding the JMeter service sending double finished events. This was actually caused by, say, improper return values from the health check. For example, it was reproducible when the endpoint of the deployed service was not reachable. So if we look at the code:
D
We have runHealthCheck, which checks the health of the service, and if there is a problem it should return an error. But in this case, before our fix, it returned nil, and therefore the execution in the JMeter service continued to the run-tests and afterwards the send-test-results steps. So we received one finished event from the health check, saying that the health check had failed, and another one from the send test results, which was kind of unexpected.
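In essence, the fix boils down to returning a real error from the health check so that the caller stops after sending a single finished event; a hedged sketch (function name and endpoint are illustrative, not the exact jmeter-service code):

```go
import (
	"fmt"
	"net/http"
)

// runHealthCheck probes the deployed service and reports failure as an
// error instead of returning nil, as it effectively did before the fix.
func runHealthCheck(serviceURL string) error {
	resp, err := http.Get(serviceURL + "/health")
	if err != nil {
		// Endpoint not reachable -- exactly the case that used to slip through.
		return fmt.Errorf("health check could not reach %s: %w", serviceURL, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("health check failed with status %d", resp.StatusCode)
	}
	return nil
}
```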
D
The second bug: for example, when a user wanted to create a project and used an unknown command in the CLI, the Keptn CLI didn't print an error message or otherwise indicate that something was wrong; it proceeded with execution, and sometimes it also showed some unexpected behavior. So we added a check for unknown commands to each of the CLI commands, so the user will properly get a warning or an error if they use something that is not allowed.
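Since the Keptn CLI is built on cobra, such a guard can be as small as an Args validator on each command; a sketch (the command wiring is assumed, not the actual fix):

```go
import (
	"fmt"

	"github.com/spf13/cobra"
)

var createProjectCmd = &cobra.Command{
	Use: "project PROJECTNAME",
	// Reject unknown extra arguments instead of silently proceeding.
	Args: func(cmd *cobra.Command, args []string) error {
		if len(args) != 1 {
			return fmt.Errorf("unknown or missing arguments %v for %q", args, cmd.CommandPath())
		}
		return nil
	},
	RunE: func(cmd *cobra.Command, args []string) error {
		// ... create the project named args[0] ...
		return nil
	},
}
```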
D
This is implemented as a proof of concept so far; more info can be found in the research ticket, but I think in the next sprints or releases we will be able to add this functionality to the production code. The first part is that users will be able to authenticate to the remote Git repository not just with username and token, but also via SSH with a private key, as you know it from, for example, GitHub or GitLab; and they will also be able to authorize themselves via a proxy server.
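With a library like go-git, SSH key authentication looks roughly like this (paths, URL and clone target are placeholders; the proof of concept in the research ticket may differ):

```go
import (
	git "github.com/go-git/go-git/v5"
	gitssh "github.com/go-git/go-git/v5/plumbing/transport/ssh"
)

// cloneWithSSHKey clones the upstream config repo, authenticating with a
// private key instead of username/token.
func cloneWithSSHKey(dir, url, keyPath string) (*git.Repository, error) {
	auth, err := gitssh.NewPublicKeysFromFile("git", keyPath, "" /* key passphrase */)
	if err != nil {
		return nil, err
	}
	return git.PlainClone(dir, false, &git.CloneOptions{
		URL:  url, // e.g. git@github.com:my-org/my-project.git
		Auth: auth,
	})
}
```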
F
So actually, let me just directly show the PR. Can you see my screen fine, or is it too small?
F
So the evaluation starts without a git commit ID, because the shipyard controller is assigning it. After the shipyard controller has assigned the git commit ID, then, for instance, the lighthouse will take it and forward it to the SLI provider, and if the SLI provider has a decision for a different commit ID, it can happen that it passes a different one back, and this gets propagated through the whole chain. The same happens in all of the other services that retrieve this information. So, for instance, this is the delivery assistant. So this is...
F
How is it called... remediation. That also happens now in self-healing and in the sequences where we have JMeter or anything like that; all of these services work the same. The only thing that is different at the moment is the helm service, which is ignoring the git commit ID, and this is because we still haven't changed the way we are storing the helm service files in our repository. This PR involved changing a bunch of things in the resource service.
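Keptn events are CloudEvents, so the propagation amounts to copying the commit ID from the triggering event into every event a service emits; a hedged sketch (treating "gitcommitid" as an event extension is an assumption drawn from this discussion, not a confirmed schema):

```go
import cloudevents "github.com/cloudevents/sdk-go/v2"

// propagateCommitID carries the commit ID of the incoming event over to
// the outgoing one, so the whole chain works against the same repo state.
func propagateCommitID(in cloudevents.Event, out *cloudevents.Event) {
	if v, ok := in.Extensions()["gitcommitid"]; ok {
		// SetExtension only errors on invalid names/values; safe to ignore here.
		_ = out.SetExtension("gitcommitid", v)
	}
}
```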
F
So any kind of query parameter that is passed to it will be appended to the URL, and you can use all of them the same way. And then another tiny change that I did for all these PRs is a bit of restructuring of the helm service, changing the way the helm service retrieves the charts.
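Conceptually that means the resource URI is built from whatever parameters the caller passes; an illustration (the endpoint layout is assumed, not the exact go-utils API):

```go
import (
	"fmt"
	"net/url"
)

// resourceURL appends arbitrary query parameters -- e.g. a gitCommitID --
// to a resource endpoint.
func resourceURL(base, project, resource string, params map[string]string) string {
	q := url.Values{}
	for k, v := range params {
		q.Set(k, v)
	}
	return fmt.Sprintf("%s/v1/project/%s/resource/%s?%s",
		base, project, url.PathEscape(resource), q.Encode())
}

// resourceURL("http://resource-service:8080", "sockshop", "helm/carts.tgz",
//     map[string]string{"gitCommitID": "abc123"})
```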
F
This is done now with a retriever, and the retrieve function makes use of the new go-utils function I made. And then the very last thing I worked on this sprint was researching whether or not we can have multiple replicas of the core services. I had a few integration test runs for this, both locally and on our integration clusters.
F
Most of our services can actually be replicated right now, so you can test it on your own with 0.13. The only problem is the configuration service: for the resource service there are still a few things that have to be fixed, and I have done so in this PR.
E
Too close, okay, thanks. So let me share my screen; could you stop screen sharing?
E
Okay, which one is the right one? This one, okay. So let's begin. The first issue I was working on was the single sign-on. Previously we cached all the user information, like the access token and ID token, and also the validation data that is used for the OpenID flow; we just cached it.
E
So we changed the caching to the MongoDB we currently have for Keptn, and this is then used for the session. So each pod uses the same MongoDB and has the same user data, and there are no more inconsistencies in the OAuth data that is stored in MongoDB.
E
Everything is encrypted with a secret that is generated at install time, and there's also a session secret that is used for hashing the user ID. Yeah, and that's basically it for the SSO login. Any questions on that?
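The bridge itself is TypeScript, but the scheme is easy to sketch language-agnostically; here in Go, assuming AES-GCM for the cached token data and an HMAC for the user ID (the meeting doesn't state the actual algorithms):

```go
import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
)

// encrypt seals the cached OAuth data with the install-time secret
// (32 bytes -> AES-256) before it is written to MongoDB.
func encrypt(secret, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(secret)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Prepend the nonce so decryption can recover it.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// hashUserID derives the session lookup key from the user ID using the
// separate session secret.
func hashUserID(sessionSecret []byte, userID string) string {
	mac := hmac.New(sha256.New, sessionSecret)
	mac.Write([]byte(userID))
	return hex.EncodeToString(mac.Sum(nil))
}
```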
E
Next, we moved the uniform page that was shown here, and the integrations page, into the settings, and moved the settings from the bottom to the top; that's basically it, we moved it to the settings page. We also fixed the indicator for the error events. Before it was a bit inconsistent; now, if you read the logs of some error events, the indicator is immediately removed, and you can see here you have this one that is not read yet.
E
If not, then I'll go to the next one. This was a small bug fix; before, there was an error, just a small one: if the filter was set to something with no sequences and was then reverted back, a loading indicator was shown at the bottom. Now this is fixed so that it will not show up anymore. And that's basically it, that's it from my side. Are there any further questions?
G
Now I can hear you, yeah. I had connection issues.
G
I'm going to present two pull requests, two features for the Keptn bridge. The first one I was working on was using the new evaluation.finished payload, and with that we made some UI adaptations to the SLI breakdown. So let's see what happened here. Now the lighthouse service provides the compared value. So when...
G
...an evaluation happens, we get the result. Previously we calculated the compared-with value by retrieving the compared events' payloads and calculating the result there. Now the lighthouse service provides us with this value, and we don't need to aggregate that data on the client side, on the bridge side, anymore.
G
Furthermore, we improved the way the absolute and the relative change is displayed. Previously it was just in brackets behind the value itself, indicating plus or minus the absolute and the relative value, which we have now improved a little bit, and I can show it here live. When the SLI row is not expanded, we show the value and the relative change right next to it, and when you expand it, you can see the absolute change, the relative change, and the compared-with value here in the breakdown.
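The arithmetic behind those two numbers is just the delta against the compared value (a trivial sketch):

```go
// changes returns the absolute and relative (percent) difference between
// the current SLI value and the value it is compared with.
func changes(current, compared float64) (absolute, relativePct float64) {
	absolute = current - compared
	if compared != 0 {
		relativePct = absolute / compared * 100
	}
	return absolute, relativePct
}

// changes(399, 300) -> +99 absolute, +33% relative
```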
H
I mean, when it comes to coloring, I think that one is currently misleading, because when we take a look here at this example, we see an increase of 99 milliseconds or whatever, and this is green; but actually it's bad in the sense that it takes longer now, and it should be red from my point of view. Or we leave it out, or we just have no coloring.
G
Here it's currently, and I think previously it was, green when the number increased and red when it dropped; but it obviously depends on the SLI, because sometimes a higher value is better and sometimes a lower value is better, yeah.
G
For request throughput, for example, you might... you could...
G
That would be: if it's not failed, just grey, but if it failed, then mark it as red, for example. Yeah, something like that. Yeah.
G
Yeah, sure. This pull request is still not merged, so I can edit that here. Yeah, okay. Any more questions on this?
G
Cool, yeah. And the second pull request I want to show, which I created just a few minutes before the community meeting, is...
G
...the history of quality gates in the environment details screen. Previously we showed on the environment screen the score of the latest evaluation of a service on a specific stage, and we got the request to show on the details page a list of evaluations, a history, as we already have in the heat map, for example here in the bubbles. And as you can see, I will just...
G
Oh,
I
have
local
changes.
Sorry
cannot
show
you
the
live
demo,
but
as
you
can
see
from
the
screenshot
here
on
the
right
side.
So
when
you
click
on
a
stage,
for
example,
I
clicked
here
on
hardening,
then
you
have
on
the
right
side,
the
hardening
stage
and
all
the
services
listed
and
next
to
the
service.
You
see
the
current
evolution,
which
is
the
right
one,
and
then
you
see
at
most
five
last
evaluations
before
that
as
a
history.
So
you
can
see
how
your
score
changed.
Yeah.
B
Maybe some feedback: it would be really nice to have some differentiation, I don't know, in color or something, to make it clearer which one is the latest one and which ones are the old ones, because from here...
G
Yeah, true. In the heat map it's easier to read because we have the legend, the labeled x-axis; but here we could maybe, yeah, maybe...
G
These bubbles already have a fill feature: either they're filled with the color, or it's just the border in the red or green color. So maybe that would also be good enough: the historical values show only the border color, for example. But yeah, thanks for the feedback; this one is also still open, so I will change that as well.
H
If you follow this KEP... or let me just jump there first, because this is actually all about the idea of having a Keptn operator as well as a GitOps operator. This was born on an innovation day, and now Thomas is driving the topic, and he also came up with a very nice drawing giving us an overview of what this is all about.
H
And it's really the idea of having a Git operator here, which takes over the communication with the repositories, and then the Keptn operator, which applies Keptn CRDs to Kubernetes and also talks to the control plane.
H
And when you follow this KEP, one of the last action items and decisions made on it was actually to split it into two separate ones: one staying with the Keptn GitOps operator, and the new KEP then being exclusively for the Keptn operator.
H
So we now have two disjunct discussions, where the Git operator depends on the Keptn operator, meaning that the latter has a higher priority than the other one, so we can focus more on it and put more emphasis on driving this topic. And this is what I did last week: I split up the original KEP into two separate ones, and now we have really focused conversations on both topics.
H
It should just visualize the idea of having here a plus symbol, which allows me to add a stage in between the dev and the chaos stage, and also having here an icon which should allow me to duplicate this dev stage, to have it in parallel to dev. I think it's better to show it here where we have the hardening stage: when I duplicate this one, I get the same...
H
As I said, I think it's definitely not the final mock-up we want to have for this capability when it comes to the bridge. Actually, when uploading this image, I thought about moving this functionality into the settings page that Klaus showed us before, because this is currently our entry point when it comes to modifications of the project and, in the future, also of the stages, rather than having it in the environment screen. But I just wanted to show you the idea of what this Keptn enhancement proposal is about.
A
We'll take that offline, because I do have a few topics as a follow-up for the Datadog service. So, yeah.
A
Thanks, everyone, for the nice demos!