Description
We've barely caught our breath after one meetup, and the next QArantanna is already on the horizon. We'd like to invite you to this year's first English-language presentation. Our guest will be Andreas Grabner, who will present "Automated SLO-Based Performance Testing with Keptn". So this time we'll focus more on the technical aspects 💻💾.
You are warmly invited.
#keptn #QA #testautomationtools #testingtechnews #tests #qaautomation #qaengineers
You can read more about the upcoming meetup here:
https://www.meetup.com/KraQA-pl/events/275860300/
A: Okay, I think we are online already, we are live, so hello everyone, nice to meet you again. Today we'll have a guest from Austria, Andreas Grabner, whose name has been in this industry for quite a long time. He has experience as a developer, tester, architect and probably many other functions, and you may already know him, because we had the honor to host him in Krakow a couple of years ago, even twice, if I remember well. I'm very excited, because today he is going to show us a little bit of what site reliability engineering work looks like, and tell us about Keptn, I don't know if I spelled it well, a tool which he is contributing to. So nice to see you again, Andreas.
B: How are you? I'm good, I'm good. And, as you said, I think I was lucky to be in Krakow twice, and I would wish, obviously, that we are all back soon, because it's better to do these things in person than just doing it remote. But it is what it is, right? That's the situation we have to deal with, and it's great that you are offering these remote sessions and also streaming it live to YouTube. That's great. But overall, I'm good.
B: So let me share my screen then. All right, let's see, here we go. Yeah, as I said, thank you so much for having me. I have been giving a couple of talks over the last couple of months promoting our open source project, Keptn. I actually believe the last time I was in Krakow was about a year and a half ago.

Now, if you want to know more about me, you'll see all the information here: you can find me on LinkedIn and on Twitter, and also, on the bottom, the Keptn project. If you want to follow us, if you want to star us, if you want to join us, please do so. We are a CNCF project, a CNCF sandbox project; CNCF is the Cloud Native Computing Foundation.
So let's get started with my demo today. I actually want to kick it off with the demo, but before I do, let me explain one of the things I want to achieve. I want to be able to say: hey Keptn, I have an app that runs somewhere; it's accessible through some URL, internal or external, and it is monitored by your preferred monitoring tool.
In my case it's going to be Dynatrace, because my day job is with Dynatrace, and otherwise I'm doing a lot of work with Keptn. So I will be using Dynatrace, but you can also do it with Prometheus or any other APM tool. I believe testers, and performance engineers especially, need to look into APM, into monitoring and into observability.
So what I want to say is: hey Keptn, here is my app, and it is monitored by some tool. In my case it's going to be an app that is not all that pretty, but it's my sample app: a Node.js-based application that I've deployed on a Kubernetes cluster, and I'm monitoring it with both Dynatrace and Prometheus.
What else do I want to say? I want to say: hey Keptn, please execute my performance workload using my tool of choice. Keptn, as you will see later on, can integrate and trigger any type of testing tool, whether it's JMeter, NeoLoad, Gatling, Locust, LoadRunner, whatever it is. I will be using JMeter today. That means once I ask Keptn to execute the test, it will launch that tool and execute the test, and then the most important thing comes once the test is complete.
Keptn, please analyze my SLOs, my service level objective score, based on my SLIs, my indicators, my metrics, which you can retrieve either from the testing tool like JMeter or from Dynatrace, Prometheus and so on, and then give me back a result that is easy for me to understand, so that I don't have to go into all these different reports and dashboards.
So, in a nutshell, this is my end goal for today: I want to show you how you, as performance engineers and quality engineers, can use Keptn to automate your test execution and also your test evaluation, the result evaluation. Before I go on with the slides and explain a little bit more about what Keptn is, let me actually go into my environment. I just want to quickly show you what I've installed. Keptn itself I have installed on a Kubernetes cluster.
Keptn itself is a container-based architecture, and you install it on any type of Kubernetes flavor. I have chosen EKS here.
We also have installation scripts where you just need a Linux machine, because there are some very lightweight Kubernetes distributions, like MicroK8s, Minikube or K3s, which is one of my favorites.
So really, if you're not experienced with Kubernetes, don't worry, we have a single-line installer for you; the only thing you need is a Linux machine with two gigabytes of RAM and four gigabytes of disk space, and we install everything for you using K3s. But as you can see here, in case you're familiar with Kubernetes, in my case I've installed Keptn earlier. You install it in the keptn namespace by default, or typically, and you can see there are a couple of components.
The important thing for me today is the jmeter-service. In order for Keptn to talk to other tools, you integrate or connect your external tools to Keptn through a so-called service, and I will explain later on how you can write your own services, how you can extend Keptn and integrate your tools.

I also have the dynatrace-service and the dynatrace-sli-service here, because I'm using Dynatrace as my primary monitoring tool. What else is interesting? I have a notification service; I also have my Slack integration turned on, so every time Keptn does something, or Keptn is finished with a task, I can also get a Slack notification.
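To see which of these services are deployed on your own cluster, a quick sketch (assuming the default `keptn` namespace mentioned above; exact deployment names vary by Keptn version and by which integrations you have installed):

```sh
# List the Keptn control-plane and integration services. Names such as
# jmeter-service, dynatrace-service, dynatrace-sli-service, bridge and api
# are the ones mentioned in the talk; yours may differ.
kubectl get deployments -n keptn

# Inspect one integration, e.g. the JMeter service:
kubectl describe deployment jmeter-service -n keptn
```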
The datastore that Keptn keeps its data in also comes with the installation. What else is interesting here? One thing that I will also show you is that Keptn can not only test and evaluate, it can also deploy a service for you. So if you're going towards, for instance, Kubernetes and microservice container deployments, you can actually say: Keptn, before you run the tests, deploy.
Deploy the latest version, in a container, into the cluster, then run the tests; for this we can, for instance, use the helm-service. And there are one or two other things important for you: the bridge, which is the UI that we'll see in a second, and, very important, also an API, because we can control and trigger Keptn through an API as well. I will show this later on with Jenkins; I'm going to trigger the whole Keptn workflow through Jenkins. But this is what I've installed.
Let me go now to my browser; I just need to look at which browser I'm in. Where is my browser? I have a couple of browsers open. This is the bridge. Remember, I have Keptn installed now; when you install Keptn, you get what we call the bridge, its UI. Keptn is organized in projects, and I have a couple of projects here that I use for different use cases.
For instance, I have one project where I do multi-stage delivery. In that case I have Keptn move an artifact, a container for instance, into dev, into staging, into production, always running some tests and evaluating. The primary project that I am focusing on today is what I call performance as a service, because remember, my talk today is really about running performance tests.
Let me just show you, why doesn't this close? I want Keptn to run performance tests for me, so I have a project called perf-as-a-service, and I can onboard multiple apps here. The one that I'm going to use (let me just zoom out a little bit, it's a little bit too big) is called perfservice.
This is where I have a single stage called performance, and I want Keptn to run the performance test against the particular application behind a particular URL, anytime I want, and I also want Keptn to execute a particular workload for me. As you can see here, today at 15:27 I ran a test. Now I want to do the same thing. What time is it now? It's 18:11. So I want to ask Keptn: please run a test against an app. Now, first of all, drum roll: not that exciting.
This app is monitored by Dynatrace; this is my monitoring tool, where I have automated monitoring turned on. I built a dashboard of the metrics that I would normally look at, and what I want to do now is automate the whole thing. I want to say: remember, Keptn, I have an app on a particular URL, and I want you to execute my tests and then give me the results.
So this is what I'm doing now: I say "build with parameters". This is just a Jenkins pipeline; maybe you have Jenkins or whatever else, GitLab, Harness, CircleCI or any other tool, I don't care, it works with any pipeline. Basically, what this Jenkins pipeline does is make a call to Keptn and say: please, Keptn, do a particular thing, like automate the performance testing. Remember, I've told you that Keptn is organized in projects, stages and services.
So here I'm saying: Keptn, for this particular project, for the performance stage, for the perf-as-a-service application, I want you to use Dynatrace monitoring data later on, and I want you to execute a certain workload; I will show you what this workload means later on. I'm selecting performance now, but I also have 10, 50, 100 and long variants; these are just logical names for workloads.
I can also specify what type of SLIs I want to use, which list of metrics, and then the URL. I'm pasting it in to make sure it's really the right one; remember, this is the URL of my sample app. Perfect, and that's it, and then there is "wait for result". If I kick it off, remember, Keptn is doing something asynchronously. Keptn is an event-driven system: I'm sending Keptn an event, then Keptn is doing its thing, and I can constantly poll for the status.
I could also have a callback; in my case I'm running a Jenkins pipeline and there's no real callback to Jenkins, so I just let my Jenkins pipeline wait a maximum of 60 minutes, polling every couple of seconds until it's done. So if I click "build" now, what just happened is that my Jenkins has told Keptn: please execute a test against the URL with this particular workload. Which means, if I go back to my Keptn bridge, I should see on the left side here that today at 18:14, that's now...
...please do a performance test. And by the way, here's more information about the caller: we have the concept of labels, so I can see this was triggered by build number 55, because, remember, if I go back to Jenkins, if you are familiar with Jenkins, this is build number 55. I passed this over as a label. I actually have it twice here, the job name and also the job URL, so this actually links back to the Jenkins pipeline if I open this one up; everything is fully linked.
That's nice, and now Keptn will do its job: it will execute the tests, it will then do the evaluation, and it will come back to me with a result. Now, there's one additional thing I want to show you before I go back to the slides and explain a little bit more about the logistics and the architecture.
You may wonder about the test strategy "performance". If I go back to my Visual Studio Code here, the only configuration that I have to give Keptn in order for it to work is, obviously, my testing scripts. Keptn internally holds a configuration repository, a Git repo, so when you set up that initial project, you need to upload them. In my case I uploaded load.jmx.
This is my script, and then, for JMeter at least, you can also upload a so-called jmeter.conf.yaml. This is where I can specify different workloads. So when I ask Keptn "please execute test strategy performance" (where is performance... here we go), what Keptn will actually do is launch this particular JMeter script and then pass particular parameters, properties or variables to JMeter: in my case, 10 virtual users, running with a loop count of 100, with a particular think time and also a particular accepted error rate.
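As a rough sketch of what such a workload definition can look like (the field names follow the jmeter-service configuration format as I understand it for the Keptn 0.7 era; check the jmeter-service docs of your release for the exact schema, and treat the values as illustrative):

```sh
# Write an illustrative jmeter.conf.yaml next to load.jmx in the Keptn config
# repo: each workload maps a logical test strategy name to a script plus
# virtual users, loop count, think time and an accepted error rate.
cat > jmeter.conf.yaml <<'EOF'
spec_version: '0.1.0'
workloads:
  - teststrategy: performance
    script: jmeter/load.jmx
    vuser: 10
    loopcount: 100
    thinktime: 250          # milliseconds between requests
    acceptederrorrate: 1.0
  - teststrategy: performance_50
    script: jmeter/load.jmx
    vuser: 50
    loopcount: 100
    thinktime: 250
    acceptederrorrate: 1.0
EOF
```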
The nice thing is, you just specify your different workloads. Obviously you can specify different scripts here; you can upload multiple scripts, and also your data files, to that directory in Git, so that your script can be data-driven, you know, a real load test. Now, Keptn doesn't write the tests for you; you always have to bring your tests. But you basically say what type of workload you want to run, and this is what Keptn is now actually doing and running for me. And actually, look at this.
I talked long enough, which is great. Keptn is an event-driven system. That means it initially started with my Jenkins sending an event to Keptn saying: hey, I have a deployment for you. This is why this is actually called the deployment-finished event. Then Keptn knows that once a new deployment is done, the first thing it needs to do is execute the tests. The tests are now finished; in this case the jmeter-service said, you know, it took about two minutes.
Everything is good. The next thing is that Keptn starts with the next phase, which is retrieving performance data, or, as we call them, SLIs, service level indicators, from my monitoring tool; in this case it's Dynatrace. It will retrieve the data for me from my monitoring tool, from my dashboard, and once all the data has been retrieved, it will give me an automated assessment of the quality.
So let's see, I've again talked long enough. You can see, if I scroll down now: first Keptn said "I need to retrieve data", then my integration with Dynatrace reported back "hey, here is the data for you", and then Keptn did the evaluation for me. What you see here is what we call the heat map: every single metric that I specified, that I told Keptn to analyze (also through a config file), was retrieved.
Every single metric, or SLI, was evaluated against my expectations, or objectives; we call them SLOs. Then we calculate a total score, at the top here, which is a number between 0 and 100, and because everything is green, I assume I received 100 points, which is pretty cool, right? That means Keptn is now done, and remember my Jenkins from earlier: my Jenkins pipeline is now also finished.
It says everything is green, everything is done, because Keptn was executing and we retrieved the result. We also have some of the artifacts here: the evaluation result is also available as JSON. I guess I didn't have the plugin installed, but you can get all of the results, and you also get the link back to the Keptn bridge with the results.
So here's the link back, and that gets you directly to the perf-as-a-service project; from Jenkins you are taken directly back to Keptn. All right, so this was already a lot of talk and kind of a demo. Hopefully that showed you, if I go back to my slides, what I did: I said, hey Keptn...
...I have an app, please run a particular test with a particular workload; once it's done, retrieve metrics and then calculate an overall score. That's what I wanted it to do. Now I want to give you a little insight, a look behind the scenes at what this really is, because Keptn is also more than that, and I want you to understand what else you can do and how it works.
So for me, Keptn is actually different things depending on who I am, because what Keptn really does is allow you to pick a use case that you want to automate. One that is very important, and you saw this already, is what we call SLO-based quality gates. The other one is SRE automation; this is the whole thing you just saw. With SRE automation we mean test execution and, optionally or additionally, chaos tool execution...
...while the load test is running, and then evaluation. Then we have a use case that supports end-to-end delivery, and we have a use case for production where we can do auto-remediation. So you pick a use case that you want Keptn to automate for you. Then, for every use case, you need to bring your configuration. For instance, you need to bring the workload: you saw the jmeter.conf.yaml file earlier; that would be the workload definition. For the quality gates, for the evaluation to work...
...you need to bring your SLIs and SLOs, and there are also other files for different use cases. The most important thing is: you can connect your tools. Keptn itself doesn't do the test execution, it doesn't do the monitoring, it doesn't do the deployment; it orchestrates. You connect your tool that does the deployment, the testing, the monitoring, and then Keptn orchestrates all of these tools, so that you don't have to build your own integrations to do things like test execution followed by evaluation. We have done this for you.
So what Keptn does is automate the configuration of your tools, putting them into the use case and workflow, making sure that the monitoring is connected correctly; you can see all the rest here for yourself, with a little screenshot of the UI that you've seen. Underneath the hood, what's very important is that all the configuration is declarative.
That means you define everything in YAML or JSON files, and everything is stored in Git. Every time you want Keptn to do something, you first have to make sure that the Git repository Keptn connects to has the right test files, the SLIs and the SLOs, the workloads. Typically you will put these next to your own source code or your other tests, and as you trigger Keptn, you just give Keptn access to these files. Very important: SLOs, service level objectives; I'll come to this later.
This is very core to Keptn. And on the bottom right: standards. Keptn itself is open source, and the way we communicate with other tools, like with JMeter in what you saw earlier, is through an open source standard called CloudEvents. It's a JSON-based protocol, which makes it very easy to connect new tools, and I think the most important thing is: it's not a proprietary integration.
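For a feel of the protocol, here is a minimal sketch of a Keptn-style CloudEvent (the envelope fields `specversion`, `type`, `source`, `id` and `data` are standard CloudEvents; the event type string and the data payload are modeled on Keptn's 0.7-era events, so treat the exact values as illustrative):

```sh
# Illustrative CloudEvents payload; values are examples, not the exact
# schema of any specific Keptn release.
cat > event.json <<'EOF'
{
  "specversion": "1.0",
  "type": "sh.keptn.event.start-evaluation",
  "source": "jenkins",
  "id": "6de83d88-0d3b-4bbe-a2f0-fba2ee0c0418",
  "contenttype": "application/json",
  "data": {
    "project": "perf-as-a-service",
    "stage": "performance",
    "service": "perfservice",
    "teststrategy": "performance",
    "labels": {
      "buildId": "55",
      "jobURL": "https://jenkins.example.com/job/perf/55/"
    }
  }
}
EOF
```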
We are allowing you to build integrations based on that standard. We are in the CDF, the Continuous Delivery Foundation, we are in the CNCF, and we are also in the Interoperability Special Interest Group, because we want to establish an open standard to connect all tools through a common protocol. All right, quickly, a couple of stories of people, so that you know what's possible with Keptn. Sumit from Intuit:
he has been using Keptn in combination with Argo, Jenkins and Gatling for what they call large-scale distributed performance testing, and they're using Keptn just for the quality gates. That means they have already figured out how to trigger Gatling tests from Argo for all of their different microservices, but Keptn is there at the end to do the quality gates based on SLOs, like the calculation of the score that you saw earlier. Roman from Triscon:
they are working with a lot of companies, especially in the financial industry, using Jenkins, Azure DevOps and also NeoLoad as a testing tool, and they are now using Keptn to orchestrate all that. They use Keptn to trigger the execution of tests and then also do the analysis, providing it as a self-service back to their engineers as part of Azure DevOps pipelines, and with this they are increasing the number of test executions. And Christian from ERT, a senior DevOps engineer:
they are using Keptn for a little more: for delivery automation. They have GitLab, and they use Keptn to do, from GitLab, the whole orchestration of their Katalon tests, their JMeter tests, and also the deployment and the promotion. With this they are now automating the delivery, including full performance test automation, in their pipelines. All right, now, this is the only architecture slide I want to quickly give you; you've seen me show you my cluster earlier.
If you want to use Keptn just as you saw me do earlier, you have to create a so-called Keptn project, and in that project you need to specify what this project should be for. Should it be for delivery? You specify this in a so-called shipyard file: do you have a single stage for performance testing, do you want multiple stages, and, if you want to use Keptn for deployment, what type of deployment should happen?
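A minimal sketch of such a shipyard file, assuming the single performance stage from the demo (the schema is in the style of Keptn 0.7's shipyard; later versions changed the format, so check the docs for your release):

```sh
# Illustrative shipyard.yaml for a single-stage performance/quality-gate
# project in the style of Keptn 0.7; values are examples.
cat > shipyard.yaml <<'EOF'
stages:
  - name: "performance"
    test_strategy: "performance"   # which workload to run in this stage
EOF

# A multi-stage delivery project would add further stages, e.g.:
#   - name: "staging"
#     deployment_strategy: "blue_green_service"
#     test_strategy: "performance"
#   - name: "production"
#     deployment_strategy: "blue_green_service"

# Create the project from it with the Keptn CLI:
keptn create project perf-as-a-service --shipyard=shipyard.yaml
```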
If Keptn is made aware of a problem in production, if a system crashes or something is slow, then Keptn can trigger the right tools to fix the problem. So at the top you have somebody that defines your processes in your projects. The second thing, and this is a big separation of concerns in Keptn, is that there is no hard-coded integration between these projects and the tools that are used. You have a separate team that takes care of installing tools, extensions to Keptn, and then the tool extension, as we call it...
...the Keptn service, really listens to Keptn events, because internally Keptn is an event-driven system. So you can see here: Git for config, Helm or Jenkins for deploy, JMeter or Neotys for testing, Prometheus or Dynatrace for monitoring, and Keptn remediation or ServiceNow or whatever for remediation. So you have one side that defines the projects and processes, and another team that defines the tooling, and you can change the tooling without having to think about where these tools are used right now and in which processes.
That means you can switch from, let's say, JMeter to Neotys without having to figure out where the impact is, and you can also change your process. You can say: instead of doing just a performance test, we may want to first do a quick functional test and then the performance test. So you can change the process without having to think about the tooling. The beneficiary is either the developer or the tester, whoever you are, and then you can say: hey Keptn...
...I have a new artifact and I want you to execute that process: the delivery, just the testing, just the quality gate, or the remediation process. What Keptn does is send events for every stage in the process, and every event is then picked up by the right tool that you have configured to consume that event, and that tool does the work.
So this is a high-level overview of what Keptn is. If you want a more detailed description of the architecture: I've been doing a couple of meetups in the last couple of months, and in the Software Circus one that I did in October I really went into more of the details of the architecture and why we built it that way, and I also showed more of the continuous delivery use case and the auto-remediation use case.
What we have seen is that a lot of time is spent in manual verification, especially as your tests become more complex. Let me give you an example. I know a lot of you have invested in pipelines, in automation, whether you use Jenkins or GitLab or anything else, and building and running unit tests is easy: a unit test fails or succeeds, there's a clear definition of success or failure.
The thing is, though, if you add more and more tests to your pipeline, whether end-to-end functional tests or performance tests (this talk today is about performance engineering, performance tests), you get more and more data. If you also add monitoring to the mix, whether it's Prometheus or Dynatrace or Datadog or New Relic or any other tool, you get even more data, and it becomes more and more complex to really say: is this build now better than the previous build?
It seems to be as fast as before, but we're using twice as much CPU; or the garbage collection runs much hotter, but we have more hardware now, so everything is faster. What does this really mean, is this good or bad? And we're making 50% more database calls than before; it's still fast in our test environment, but is this a good or a bad thing? This is why we see that it is very hard to automate the analysis here.
However, we think we can automate that, we can definitely automate that. Essentially, what we want to solve with Keptn is this: instead of you having to build your dashboards, maybe one dashboard in one tool, maybe multiple dashboards in different tools if you're comparing performance test results with your monitoring data, and instead of manually looking at the same dashboards after every build (yes, you get better the more often you look at them, but this is not efficient)...
...we are applying what Google defines under their site reliability engineering practices, or at least one of the concepts: SLIs and SLOs. So what are SLIs and SLOs? An SLI, a service level indicator, is a metric, basically something that you can measure, like the error rate of login requests. If you're testing an app, or an app is in production, and you can somehow measure how many login requests fail, then this is an SLI. It's a fancy term for a metric.
Then we have SLOs, service level objectives. This is where you say what the success criteria are: when do you consider this particular SLI to be good? It could be that the login error rate must be less than two percent over a certain period of time, or that in the period of a test you are not allowing more than one percent. So it's basically just a threshold.
Now, those of you that have done any work in operations also know the term SLA, and you may wonder: SLO sounds very much like SLA. It's similar: an SLO basically says "this is what I'm expecting", and an SLA is then typically about what happens if we are not meeting our SLOs.
For instance, maybe there's a business contract, or maybe we just lose our users, because we are delivering a bad service. It basically says what happens if we miss our SLOs. Now, I didn't come up with this; Google did a great job talking about SLIs and SLOs, and there's a video that I at least found very good and very educational, so check it out on YouTube.
Essentially, at the top it says: SLIs drive SLOs, which then inform SLAs. Now, why am I talking about this? Because we don't want to reinvent the wheel. What we have seen when we work with people that are adopting Keptn (and remember, I also come from a Dynatrace background, so from a monitoring background) is a lot of people now looking into SLOs and actually sitting down and defining their success criteria for their individual services for a particular time frame, whether it's a month, a week, a day or an hour. And typically your monitoring tools...
...allow you to measure these SLIs, compare them against the SLOs and then, for instance, alert on them or report on them. That's already great if we have it in production. But what we are now saying is: if this is a concept where we finally have people sit down and define what good means, then why not shift it left? Shifting left means taking it into our delivery process.
Why don't we include the same SLIs and SLOs in performance testing? And if we integrate automated performance testing in CI/CD, why not do it after every commit? After we commit, we want to run some performance tests, even though they might be smaller in scope, and then evaluate the metrics against our thresholds. That's the whole idea of the Keptn quality gate capability: at any point in time you can ask Keptn, please look at these metrics, these SLIs, compare them against these SLOs, and then tell me how good all of these are.
How good are they? What is my score? Is everything green? Is just one thing red, and what is it? So, to give you a little overview and explainer, because I think it's an important concept to understand: the way we use SLIs and SLOs is that you define SLIs as metrics. As I told you, you basically say: from my monitoring tool I have these metrics, or from my testing tool, or from a security tool.
The nice thing is that Keptn, as you will see on the next slide, can talk to any type of data source. That means you can include monitoring, testing, security, code coverage; anything can be included as an SLI. So you define the SLIs, and then you define the SLO: what are your objectives? Here, in Keptn, we allow you two things.
You can either specify a static threshold for pass and for warning. The first one is an example where, if response time is faster than 100 milliseconds, it's green, or if it's at least faster than 250 milliseconds, it's a warning. That's one option. The other thing you can do is relative changes, for instance with the last one, where it says "test step login, number of service calls"; here we basically say we are not allowing any change.
Plus zero percent means we're not allowing this metric to change at all. In the previous line, for response time, you can also see the combination: you can say, hey, I want response time to be faster than 150 milliseconds, but I also don't want it to increase from one build to the next by more than 10 percent. So it's really flexible. This is what you specify on each individual SLI, and then overall Keptn calculates a total score and normalizes it between zero and one hundred.
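As a sketch, the criteria just described could look roughly like this in an slo.yaml (the pass/warning structure and the `<=` / `<=+10%` comparison syntax follow Keptn's SLO format as I understand it for the 0.7 era; the SLI names are illustrative):

```sh
# Illustrative slo.yaml combining a static threshold, a static+relative
# combination, and a no-change-allowed relative criterion.
cat > slo.yaml <<'EOF'
spec_version: '0.1.0'
comparison:
  compare_with: "single_result"
  include_result_with_score: "pass"    # only compare against previous good runs
objectives:
  - sli: response_time_p95
    pass:
      - criteria: ["<=150", "<=+10%"]  # fast AND not >10% slower than last build
    warning:
      - criteria: ["<=250"]
  - sli: error_rate
    pass:
      - criteria: ["<=1"]
  - sli: teststep_login_service_calls
    pass:
      - criteria: ["<=+0%"]            # no increase allowed at all
total_score:
  pass: "90%"
  warning: "75%"
EOF
```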
So you can say: overall I want to achieve ninety percent green, and then everything is good. Now, how does this work? You have your first build: you deploy it, you run a test, and then you say, Keptn, now you analyze it. Keptn pulls in the metrics from your tool or tools, one or two, however many; everything is green, that's a hundred percent. Build number two comes along and you do the evaluation again; you don't need to look at the dashboards anymore.
Keptn is doing it. Now two metrics are yellow, which means the score is only 75%. By default every metric is weighted equally, but you can also say one metric has more weight and can achieve more points, because it is more important. Now build three comes along. It seems that the response time problem and the failure rate problem from build number two are now fixed, but all of a sudden we have more calls to the back-end service.
It increased from one to two calls, which violates our SLO; that's why we're getting penalized, and we only achieve 62.5%, which makes this build red. And because this is so automated, and hopefully fully integrated in your automation, in build number four the developer fixes these things right away, makes the pull request, makes the build, you run the test executions automated, and you automatically get back to the state where you really want to be.
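To make the arithmetic concrete, a small worked example, assuming one point for a passed SLI, half a point for a warning, zero for a failure, and four equally weighted SLIs (this pass/warning weighting is my reading of Keptn's default scoring; the slide's exact metric count may differ):

$$\text{build 2: } \frac{1 + 1 + 0.5 + 0.5}{4} \times 100 = 75\%, \qquad \text{build 3: } \frac{1 + 1 + 0.5 + 0}{4} \times 100 = 62.5\%$$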
Keptn is an event-driven system, as I told you earlier. That means Keptn can pull data through events from many data sources. In order for every data source to know what to pull, you first have to specify your SLIs, your service level indicators, and they are specified per SLI provider. Here's an example of a Dynatrace SLI file, where you specify that error_rate is this particular query, and that the number of database calls, count_db_calls, is this other query. For Prometheus, this would be a list of PromQL queries.
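A rough sketch of such a Dynatrace SLI file (the `indicators:` map of logical name to provider-specific query is the part that matters; the query strings below are examples in the style of Dynatrace metric selectors, not verified selectors):

```sh
# Illustrative sli.yaml for the dynatrace-sli-service: each entry maps a
# logical SLI name to a provider-specific query string.
cat > sli.yaml <<'EOF'
spec_version: '1.0'
indicators:
  error_rate: "builtin:service.errors.total.rate:merge(0):avg"
  count_db_calls: "builtin:service.dbconnections.total:merge(0):sum"
  response_time_p95: "builtin:service.response.time:merge(0):percentile(95)"
EOF
```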
For NeoLoad it would be a list of metrics from NeoLoad, and for whatever other tool you have, you would have your individual queries, but always a logical name mapped to a query. Then there is the SLO file that I just showed you. This one is not specific to the tool that provides the SLIs; this is where you specify, for each SLI, which type of criteria you want, just as I showed you earlier in my Excel-like table format. Once you have specified this, you ask Keptn: please do the evaluation.
What that basically means is that you can use this facility to automate what is normally a manual process of analyzing data. Remember, earlier I said it's very hard to manually analyze these results, and it slows down your end-to-end delivery pipelines. What I'm telling you is: you can now use whatever automation you already built in Jenkins or GitLab or wherever you run your tests, and then, instead of manually looking at dashboards and reports, you can use Keptn to query all of these results from your data sources.
So you can integrate this with any type of pipeline, and this obviously reduces the normally manual time to a fraction of a second, because it is all automated. I want to quickly go back to the demo to show you a little bit of how this works, and then I have the other sections, and then I really hope we have a couple of questions. You may have wondered where all of these metrics come from. There are two explanations here.
As I mentioned, there are these SLIs and SLOs. One option, and this is, I would say, the more GitOps-style option, is that you let your performance engineers specify them: these are all the available SLIs, and then you add the query to each. So you specify "here are my metrics", and, depending on the tool, "this is the query", and then you give this to Keptn. Giving it to Keptn means you upload it to the Git repo that Keptn uses for the specific Keptn project.
So this is an SLI file; you have all of these particular metrics. And then we also have the SLO, and here's the SLO file, just as you saw on my slides: for every single metric, pass and warning criteria. You also see the weight here, so you can define a different weight for every metric. This one, if it succeeds, gets twice as many points as everything else, because the default weight is one.
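In the slo.yaml sketch from earlier, such a double-weighted objective might look like this (again illustrative):

```sh
# Fragment of an slo.yaml objectives list: an objective with weight 2 counts
# twice as much as a default-weight objective in the total score.
cat <<'EOF'
objectives:
  - sli: response_time_p90
    weight: 2
    pass:
      - criteria: ["<=200"]
EOF
```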
So you can upload this. The other option, which we found very useful especially when you're getting started, is if you're using a monitoring tool that has dashboards; maybe you use Grafana, for example.
We have built an integration with Keptn where I can build a Dynatrace dashboard like this. I just put the metrics I would normally look at on my dashboard, and then, if I zoom in here a little bit, you can see that this dashboard has been augmented with additional data. I can say: hey, this is the response time P95, that's the 95th percentile, and add the augmentation here, where I say "sli equals service response time p95".
It is then creating the SLI and the SLO underneath the hood, and asking the tool, in this case again Dynatrace, to give me the data. I think this makes it very easy to get started, until you get into a mode where you have figured out all of your SLIs, and then you just let your performance engineers or developers or architects modify them in Git.
While I'm talking: "build with parameters", and now I'm running a different load test, let's do the performance_50 build. This will now trigger the next test. As you know, this takes probably a minute or two; what we should see soon, however, is that new test coming in.
So let's wait for this, just so you see that I'm operating here on the live patient, so to speak. I think this screen refreshes every 30 seconds; let's just refresh it, I'm impatient today. There we go. So this is now the one that I just kicked off from Jenkins, now executing test strategy performance_50. This will take a couple of minutes, so let's go back to this one. All right, this was the previous test, so here is the heat map.
This was the latest test, build number 55; you also see it in the chart, and then the previous one, the one before that, the second to last, and so on. You may also wonder, when I click on this, why is this one kind of gray? Because you can specify a comparison strategy: when you have relative values, do you want them to be compared with the previous test, regardless of whether it was a good test or a bad test...
...or do you only want to include and compare yourself against the previous good one? This is why, if I select this one, the comparison test is this one here and not that one, because that one actually didn't do well. I can also specify that Keptn should calculate a baseline across multiple tests; in my case I've only specified "just take the last build that passed". So I see the heat map all down here, and I also see each individual value.
I also have a chart option, so I can look at these metrics over time in a chart. I can also query this data through REST. In case you wonder: there's a little thing here that actually shows you the underlying protocol, the data that is sent back and forth between Keptn and the different tools, and you can also query this data. So if you don't like our visualization, or you want to use this data somewhere else, you can extract it.
Okay, so that is this, and I guess this test will run... oh yeah, the test finished. The performance_50 was a short test, done, and now it's retrieving the latest data.
Let's come back to this in a second, because I want to make sure I get through all of this and still have time for questions.
So this was a core part: the analysis based on SLIs and SLOs. I've shown you how I triggered this from my Jenkins pipeline. I also want to bring in some of our users again. Christian: they are using GitLab, where they're doing deploy and testing, and then, in the verification stage, they basically make a call to Keptn and say: hey Keptn, I just ran a test, now you do the evaluation and give me back the results.
Important for you: I've shown you Dynatrace as an SLI provider; we also have Prometheus, we have Neotys and we have Wavefront. If you have any other data sources or any other tools that you want Keptn to use for the evaluation, then build your own Keptn SLI provider. It is very easy to do. I have a custom SLI provider that I built for a recent conference.
It really is easy, if you look at this; let me just open it up (sure, I trust my own content). If you want to build your own extension, you should know you can do it in any language; I prefer the templates that we've provided in Go. So this is my sample SLI provider that I built for that conference, and the slides for my talk are also online here.
So everything is here, and in the code, really the only thing you need to do is implement a couple of handlers. Actually, it's just this function here: if you start from the template, there's a function for handling the internal get-SLI event, and that is the function you see here. I've really tried to explain every single step, so it should be fairly easy, and if you don't like Go, you can do the same thing in any other language.
All right, how did my build go? Test 56 is finished, as you can see here.
So now you can also see what the dashboard would have looked like if I had looked at it myself, but Keptn did it for me, so I don't need to look at the dashboards after every build anymore. Cool. All right, now the next use case, and I think we've covered most of it already, so this is just going to be repetition. What I kicked off at the very beginning is a scenario where I have Keptn execute a test for me.
Now, this is possible, but there are a lot of questions, like: where do we run the tests, how much hardware is needed? There are also a lot of do-it-yourself approaches out there, a lot of great guides, so you can do all of it yourself if you want to, feel free. However, we thought we wanted to provide this capability as well, obviously combining it with the automated analysis.
So instead of building this in your own Jenkins pipeline or whatever tool you have, you can just say: hey, I'm just using my pipeline for deploying a particular application, and then I'll have Keptn also execute the test for me. Just as in my two examples from earlier, Keptn can execute the test, then validate the SLOs, and at the end bring back the result, which means you don't have to deal with building this yourself.
I think I've already shown this demo. The demo that I've shown is also available in a GitHub repo, the Jenkins tutorial under github.com/keptn-sandbox. I have a couple of tutorials: just for quality gates, for performance testing as a service, and I also have one for full end-to-end delivery.
So if you are interested in this, check out my tutorials and all the samples. An example, again from our real users: Christian, if you remember him. What he's doing is taking the things that they built into their GitLab pipeline earlier, like their test execution, and letting Keptn do them. So now, when they deploy, Keptn will run the test, do the evaluation and report back, which means the pipelines that they have to maintain in Git are much less complex, because Keptn takes care of it.
I think that's also something to remember. How can you build your own Keptn test-execution service, similar to the Dynatrace service and the SLI service from earlier? You can build your own service; it's very easy to build. We actually had somebody from your community, from Krakow, giving a Keptn user group talk last week: Adrian. He was using Locust, I believe, and they're building a Locust integration.
So if you install Keptn, and you have it hooked up with your monitoring in production, and your monitoring tool detects a problem, like in this case Dynatrace detecting a conversion rate drop because of a CPU problem on one of the services, then what Keptn can do is execute your remediation workflows. That means somebody has to upload what we call a remediation workflow, saying: if this problem happens, then do this action, this action and this action. In this case, it seems Keptn has found this remediation file.
Remember, Keptn itself doesn't do anything, it just orchestrates. That means Keptn would look at that file and say: for the conversion rate problem, trigger the first configured action, and then check it, as sketched below.
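A rough sketch of what such a remediation workflow can look like (modeled loosely on Keptn's remediation file format around 0.7; the problem type and action names are illustrative):

```sh
# Illustrative remediation.yaml: map a detected problem type to an ordered
# list of remediation actions, which Keptn triggers one by one, evaluating
# the SLOs after each action.
cat > remediation.yaml <<'EOF'
kind: Remediation
metadata:
  name: remediation-conversion-rate
spec:
  remediations:
    - problemType: "conversion rate drop"
      actionsOnOpen:
        - name: "Scale up"
          action: "scaling"
          description: "Add one replica to the affected service"
          value: "1"
        - name: "Escalate"
          action: "escalate"
          description: "Notify a human if scaling did not help"
EOF
```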
Did this actually solve the problem? For that it goes back to your SLIs and your SLOs (we also call them BLOs, business level objectives, when it's a business-level check); it looks at whether the monitoring tool has closed the problem, and so on and so forth. If that is not the case, so if the first action didn't solve the problem, it executes the next action and then evaluates again. So it's really what we call closed-loop remediation: action, evaluate, then again action, evaluate, and in case nothing solves the problem...
...the last action would maybe be to escalate it to a human being. Now, this is really exciting to think about, because you're basically building self-healing operations, self-healing production systems. It's also very scary, because this is code and logic that runs in production, and it should not go out there untested. Therefore, I think the job of the next generation of performance engineers is actually to do, call it, test-driven operations.
I really like that term. It's combining performance testing with chaos engineering, just as Adrian showed. The idea is: you're running a performance test, and while you're running the tests, Keptn can also launch your chaos tests. Then, as performance engineers, what you can do is, first of all, validate that your monitoring and observability platform picks it up correctly.
And: what would the right remediation action be that we can automate? You can then, with that team, define remediation actions that Keptn executes as you're running the test, while you're injecting chaos, and with this make sure that you have a remediation workflow in place that can bring the system back to a healthy state.
Last piece of information: you can integrate Keptn with your existing CD tools. There are a lot of different integrations we have out there, but in general there's a Keptn API. Let me quickly show you that. If I go back to the Keptn bridge, here on the top right you can get the token; let me copy the token, and then you can open up the Keptn API, which has a nice Swagger UI.
Here you can authorize, and then the important endpoints are for posting an event, and there's one particularly for evaluations, for a quality gate evaluation. Here I could say: I want to evaluate the last 30 minutes for a particular project.
What do we do here? You see, I could basically just say: project is perf-as-a-service, stage is performance, and then the service name, what did I call it... the perfservice, I was really creative with names, I guess. Then I would hit execute, and Keptn would trigger the evaluation.
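In script form, such a call might look roughly like this (the `x-token` header is how the Keptn API authenticates; the endpoint path and payload fields are assumptions to illustrate the shape of the call, since the evaluation endpoint differs between Keptn versions):

```sh
# Hypothetical sketch of triggering a quality-gate evaluation via the
# Keptn API; endpoint path and payload fields are illustrative.
KEPTN_API="https://api.keptn.example.com"
KEPTN_TOKEN="***"   # the token copied from the bridge

curl -X POST "$KEPTN_API/v1/project/perf-as-a-service/stage/performance/service/perfservice/evaluation" \
  -H "x-token: $KEPTN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"timeframe": "30m", "labels": {"triggeredBy": "manual"}}'
```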
Now, this one doesn't make a whole lot of sense, because I don't have load on it right now, but let's just do it anyway; I just want to make sure it's the perf-as-a-service project, the perfservice and the performance stage. Execute: it's been triggered. When you trigger an event like this, you get a response, because Keptn is event-driven, which means everything you kick off, every workflow, gets a unique ID.
We call it the Keptn context, and with this Keptn context you can then also do polling requests, where you can say: hey, is the process ready, yes or no? You can go in here and get the events for a particular context, and ask: is it done yet or not? If I go over here, you can see there was a new event coming in. Let me just refresh: there is the evaluation-started event, but it's already done.
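A sketch of that polling step (again, the endpoint and query parameters are illustrative rather than an exact Keptn API reference; the idea is to ask for the evaluation-done event by its keptnContext until it appears):

```sh
# Hypothetical polling loop: wait for the evaluation-done event belonging
# to our workflow's keptnContext, then print it.
KEPTN_API="https://api.keptn.example.com"
KEPTN_TOKEN="***"
CONTEXT="6de83d88-0d3b-4bbe-a2f0-fba2ee0c0418"   # returned by the trigger call

until curl -sf "$KEPTN_API/v1/event?keptnContext=$CONTEXT&type=sh.keptn.events.evaluation-done" \
        -H "x-token: $KEPTN_TOKEN" > result.json; do
  sleep 10   # the talk's Jenkins pipeline polls every few seconds, up to 60 min
done
cat result.json
```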
It was manual, that was me, and I just did a quick evaluation for a time frame of 30 minutes; obviously in the last 30 minutes I had some data, so I just kicked this off. If I wanted to, you see here there are additional properties. You can also see some examples: if I go back to my API service and go back down here and do "try out", this is also where you can add any type of metadata, these labels.
So if you call it from an external tool, add links back to your tool, add any metadata, because that metadata goes through the whole process. That means any tool that Keptn involves in the process will have access to all of these labels, and you can see some tools then add more metadata. That's really cool too. Good, I believe, let's wrap it up, and then hopefully we have some questions.
So hopefully you now know, first of all, what Keptn is. Keptn, for me, is a really cool way of automating processes around delivery, and with delivery I mean it can be the smallest thing, just doing a quality gate evaluation; it can also be test execution plus quality gate evaluation; it can also be delivery.
It can also be auto-remediation for production. So you pick your use case, and then Keptn connects all your tools and provides that type of automation to you, and you can trigger it from anywhere it makes sense. So have a look at this, and if you want to get started, there are great ways to get started with some tutorials, at tutorials.keptn.sh, and on the Keptn website, which will get a complete rework.
I think we are publishing it this weekend, so you should definitely check it out; start from there, contribute and let us know. Join the Slack channel, that's very important, my last thing: join the Slack channel at slack.keptn.sh, star us on GitHub, follow us.
A: Yes, I am still here, and yes, we are waiting for questions, so if you have any, please ask them in the chat right now. This was a really nice presentation; I especially liked the last use case with chaos engineering, and this self-healing process really looks great. I think it is worth trying.
B: Exactly, and you should actually try it. We had, in our last user group meeting... let me just see, because I'm pretty sure you know him. Oops, that was a wrong click.
Where's Adrian... here we go. Adrian has been doing a session on continuously evaluating application resiliency with LitmusChaos, Locust and Keptn. I think that's something that is really cool. Yeah.
A: I've seen a small icon of Slack in the integrations, and I'm wondering if it's possible to be informed in real time about what happened in the pipeline. Can I see that something happened without, you know, opening the website?
B: So, do I get this notification: Keptn is an event-driven system. That means every time Keptn does something and pushes, let's say, a process into the next stage, one or multiple tools or integrations of Keptn can consume that event, and the Slack integration is actually done through the notification service. Let me show you that as well: the Keptn notification service.
Here we go. This service allows you to integrate with MS Teams, Slack and Webex Teams; that's the integration we have right now. If you have any other tools that you want to integrate with, we also have what we call the Keptn generic-executor-service. If you enable this one, it allows you to upload...
...I think I have some usage examples here. It allows you to upload shell scripts, Python scripts or webhooks. That means you can say: hey Keptn, every time you have a configuration-change event, or every time you have a test-finished event, or for any event, please execute this particular script. The script can either be a bash file that is executed within the container where the service runs, it can be Python, or it can be a webhook.
First of all, Keptn is an open source project. That means we hope that the community will come to us and say "this is what's missing", and then either put requests into our Git repositories, and maybe some of them actually implement some of it, or we see such big demand that we find the time. What we see coming is integrations, as I said earlier, with Locust, which I think Adrian is working on; I want to see a Gatling integration from a testing perspective. From a monitoring perspective...
...we've also talked with folks from Spinnaker, Tekton and Flux about getting integrations there, so that Keptn, if it for instance orchestrates the whole end-to-end delivery process, can make a call to Spinnaker for the deployment and then go on with the performance testing. Another option would be integrations, I think we already have MS Teams here, with all the remediation tools.
One of the use cases that we have, remember, is the whole auto-remediation, fixing things. We already have integrations with ServiceNow and with Ansible, but I'm pretty sure there are others, like Chef and Puppet.
This is what we want. And if you look at the way the Keptn project is organized:
the core project is on github.com/keptn/keptn, so this is where the Keptn core can be found, and this is also where, I believe, we have them already. For instance, look for Spinnaker... exactly, we have it under integrations. I think that might be the better way to look at this: integrations.
Honestly, when I started with Keptn, the biggest challenge I had was understanding Kubernetes, because Keptn requires Kubernetes. That was like two years ago; for me Kubernetes was new and I really struggled, and there might still be people out there for whom Kubernetes is still "I don't like it, I don't know it, it's too complex". So, for people that are not familiar with Kubernetes:
I suggest two things. There is one tutorial called Keptn in a Box, which installs Keptn on a Linux machine using MicroK8s. You just execute a shell script and it installs everything for you: it installs Keptn, it installs Git, it actually installs a sample app, it does a lot of things. That's Keptn in a Box, and it's a great way to get started. Another thing, one of my tutorials, is called Keptn on K3s.
As
I
told
I
think
in
the
beginning,
I
really
like
k3s
keys
from
rancher,
so
this
is
kind
of
my
tutorial
on
on
installing
captain
on
a
very
small
machine.
So
the
only
thing
you
need
to
do
is
this
is
for
dynatrace
but
other
examples.
You
just
need
to
run
this
curl
command
and
that's
it.
You
can
say,
for
instance,
with
prometheus
and
then
it
will
just
install
on
your
local
linux
machine
captain
on
a
k3s
cluster,
and
it
really
literally,
I
think,
there's
some
other
requirements
up
here.
B
One
vcp
four
gigabyte
of
memory.
That's
all
you
need,
so
the
easiest
is
to
stand
up
some
environment
and
then
you
can
do
this
and
then
you
don't
have
to
deal
with
the
complexity
of
kubernetes.
Now,
if
you
have
kubernetes
and
if
you're
familiar
with
it,
then
I
suggest
there's
a
couple
of
tutorials
out
there
on
you
know
installing
captain
eks
on
aks
or
just
the
easiest
is:
if
you
go
to
captain.h
and
go
to
the
docs
release,
73
is
current.
B
0.8
is
coming
soon,
but
if
you're
clicking
on
operating
captain,
then
it
just
goes
to
the
quick
start.
So
you
are
just
downloading
the
captain
cli
and
then
you
install
captain
with
captain
install
and
if
you
don't
like
that,
because
what
captain
install
does
it
will
basically
just
install
it
on
the
on
your
cube
config.
B
A: Okay, great. I think the current version is 0.7, so my question is: since there is still no major version, no 1.0, is this tool stable already, or do you still experience some stability issues?
B: I don't want to lie: we are not bug-free. We also have a major change coming up with 0.8, which is already in an alpha version; I think GA is planned for mid-February.
That will change some of the events. But 0.8 also has two things. First of all, from a stability perspective, we learned a lot about the performance of Keptn: we're using MongoDB internally, and we ran into some performance issues with the way we were using it, because we hadn't optimized indexes and all that stuff. So 0.8 has all of these performance improvements. But in 0.8 we are also ripping apart (and I think there's a really great set of videos on the Keptn YouTube channel, "Keptn 0.8 alpha explained", with some examples)...
...we are ripping apart the control plane from the execution plane. What that means: right now, with 0.7.3, I install Keptn on a Kubernetes cluster, and everything Keptn triggers, like in this case JMeter, the jmeter-service and the load, will be executed from within that same Kubernetes cluster. This might be fine, but maybe you want JMeter to be executed from somewhere else, and that's what we are enabling with 0.8.
That allows you to use Keptn for executing tests somewhere outside of that Kubernetes cluster. You can also then use Keptn to do the deployment into other Kubernetes clusters, or into other environments that might not be Kubernetes at all; that all becomes possible. So 0.8 is a big change, and we hope this is going to be the last time...
...that we make major changes to the events themselves. Remember the events, these ones here, the ones we're sending in: all of these events have a particular type, and the type specifies exactly what you want to do and where Keptn should start in the process. We are introducing a couple of new events to allow some things we couldn't do before, like parallel execution of actions and then waiting until all the actions are done.
So there are going to be start and stop events; that's important. Now, our goal, obviously, is to hopefully get to a 1.0 release later this year. The other goal for us is that we want to become an incubation project: right now we are a sandbox project in the CNCF, and we want to become an incubation project, and for that we also need to get closer to that stage of maturity.
A: Okay, great, great. I can't see any questions in the chat, so I think we can slowly come to the end. We encourage every one of you to try this tool, because it looks really promising. Thank you very much, Andreas, for being here with us today and showing us this tool; it was a pleasure to host you.
B: Yeah, thank you. And you know, as you said, normally we would now grab a beer and drink it together. We can just all take our beers now and drink together in spirit, and hopefully, once the pandemic is over, we will all meet each other again. That would be great.