Description
Check out Easy Buttons for GitLab HA Scaling Runner Vending Machine for AWS: https://youtu.be/2dXw8Dx6ENw
- Self-Service Vending (SMA) of Runners by Developers
- Runners are built with IaC, rather than hand-crafted.
- Automatic Hot (2 hosts) or Warm (1 host that respawns) High Availability.
- Automatic availability scheduling (runner is off during off hours).
Many more details and the code: https://gitlab.com/guided-explorations/aws/gitlab-runner-autoscaling-aws-asg
It has a lot of other cool things as well, like automatic updates, latest-AMI selection, all kinds of little doodads, which I will eventually get into the README.
But since it is ready for external testing, I wanted to go through a little bit about how to use it and, you know, accomplish your own runner setups.
One of the things that's in view for this is to be able to use it for MLOps or AIOps. So it is intended to be able to do some pretty strong scaling, and for that scaling to be based on actual machine metrics such as memory utilization or CPU utilization.
A lot of details have been handled, as you'll hear as we go through. One small example: you can't get memory utilization off of Amazon instances of any kind unless you install the CloudWatch agent and start reporting on those metrics. That has been handled, so it's there, and some memory scaling has been done on Linux; I'm still having a little bit of a problem getting it working on Windows.
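To make that concrete: once the CloudWatch agent is reporting, memory shows up as a custom metric you can query. Here's a minimal boto3 sketch, assuming the agent's default Linux setup (namespace `CWAgent`, metric `mem_used_percent`); the instance ID and region are placeholders, and the exact dimensions depend on your agent config:

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Query the agent's custom memory metric for the last 15 minutes.
stats = cloudwatch.get_metric_statistics(
    Namespace="CWAgent",            # custom namespace the agent publishes to
    MetricName="mem_used_percent",  # default Linux memory metric name
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(minutes=15),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1), "% memory used")
```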
Without Docker in the mix, scaling is a little more challenging, so being able to scale a shell runner is helpful. Windows Docker is also available if you're actually using Windows containers, and there have been some optimizations applied for it specifically, since containers on the native Windows OS are much larger.
The pull policy is altered to allow more image caching. Obviously, the assumption here is that these runners will be run privately in your own organization, so you'll have more control over exactly what runs on them. There is a bit of a security concern with caching images, but compared to the amount of time it takes to pull a Windows image, the security concern kind of fades. At the root of the repository is the README, and the README is not comprehensive.
This project is based on another project called the Ultimate AWS ASG Lab Kit, and that project has a whole bunch of features, most of which were inherited into this one, so they're not restated in this README.
Another thing that's really cool about it: once you've stood up a runner, you can go back in, rerun the automation, change just one parameter, and cause it to rebuild the runner instances with the latest AMI, the latest patches, and the latest GitLab version, if necessary. So it's got kind of built-in maintenance, and that will allow you to very easily bring runners up to date, even if there's only one runner in the ASG; it will also let you update large runner farms if you have large scaled ones.
A few other features as we'll go through: you can simply load it in the CloudFormation console, and, as you'll see in a minute, the description for each parameter is very verbose, so that you can figure out what's going on just from the template itself.
Let me just quickly take a peek through the README here to see if there's anything else. Whether a runner is on spot or on-demand compute flows up in tags into GitLab, because you might be totally fine with running, say, an MLOps workload on spot compute that you might lose.
There is some rationale here — oh, one other quick thing: it uses ASG lifecycle hooks, both for spinning up and shutting down. What this does on spin-up is it lets us patch even the kernel and reboot the machine before it's put into service, so it won't be terminated when you try to reboot it because you took a kernel update on Windows or Linux.
On spin-down, it also lets us potentially drain the runner properly, as well as deregister it. Right now, draining is not in place, so it simply deregisters the runner and lets the instance terminate. One other thing, too: even if you are going to use docker-machine, you'd have to create your own runner script at this point — just that piece — but at least if you did a one-instance, non-scaling ASG and that docker-machine instance terminates by some fault of nature, it will automatically respawn.
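For reference, this is roughly what completing a lifecycle hook looks like from a script on the instance — a minimal boto3 sketch with hypothetical hook and ASG names, not the template's exact code:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# After patching/rebooting (launch hook) or deregistering the runner
# (terminate hook), tell the ASG the hook is finished so the instance
# proceeds to its next lifecycle state.
autoscaling.complete_lifecycle_action(
    LifecycleHookName="runner-launching-hook",  # hypothetical hook name
    AutoScalingGroupName="my-runner-asg",       # hypothetical ASG name
    LifecycleActionResult="CONTINUE",           # or "ABANDON" to bail out
    InstanceId="i-0123456789abcdef0",
)
```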
So one thing this gives you, even versus docker-machine, is that it brings back true HA — or what we call kind of warm HA. Docker Machine has been deprecated, and GitLab has deprecated the docker-machine executor in kind, so trying to use native scaling appliances on clouds is one way we're trying to look at plugging this gap. This is one of the efforts associated with seeing how we can go forward without having a dependency on docker-machine for scaling.
A couple of notes here about troubleshooting — actually, there's a lot of information here about troubleshooting. One of the toughest things about infrastructure as code is the long cycle it takes to actually run a test and then, when you have a problem, to be able to find it. Once you find the problem, what I usually do is test the fix right on the instance that had the problem. As long as it's not a timing error, or an issue with the account that spins up the instance, a lot of times you can perfect whatever was the problem on an instance that failed and then run another start. But because of that, if you don't know where to look for all the pieces and parts, it can be a challenge.
So within here, one of the things we highlight right away is that the instance is going to have the SSM agent on it, and what that lets you do is get a console into your runner instance — to debug it or whatever else you want to do — without any configuration of SSH, ports, routing, or internet gateways.
We'll get into this in just a bit, but here's a whole bunch of log locations, both for CloudFormation in general and for things specific to this template, such as a special runner config script that's downloaded and run, as well as a termination monitor that we install. So we tell you where all that stuff is. Also, in the spirit of full parity for Windows, all of the functionality that we'll talk about works for both Windows and Linux.
One thing that's a continued frustration is that Windows does not have a true console editor — I have actually complained about this bitterly to Microsoft — and because it doesn't, if you're running a Windows Docker container or Windows docker-machine, or you're using the console to get in, you have no text editor to open config files and change them to try to fix problems.
There is also the same information about the Windows stuff as there was for Linux — the same information about the CloudWatch agent, how it was installed, and how to debug it — and then there's a bunch of information about concerns and considerations for configuring scaling. So if you're going to be doing scaling with an Amazon ASG, versus other things you may have used before, it tries to give you some guidance and things to think about.
So, within the repository there are four scripts for the four different runner types: Docker on Linux, shell on Linux (this one is labeled wrong — it should say shell right here; I'll fix that up), and then Windows shell and Windows Docker. There are a couple of caveats: memory scaling is not working on Windows yet, so I don't have it scaling down based on memory utilization, but everything else is working.
Also, these are sourced, by default, from a Git raw URL, as you'll see inside the template. You could re-point them to S3 — so if you don't want whatever's in there to be on a public Git URL somewhere, you can put it in S3, or even embed it as a file in the process. One of the benefits of doing it this way is that you can easily see how to make your own. Also, if you're trying to perfect one of these, you don't have to keep standing up CloudFormation and shutting it down. What you can do is simply scale down to zero instances — which kills everything — update your script, which is out on a Git raw URL, and then re-trigger by saying "now I want one instance in the ASG." When you do that, it'll spin up and dynamically source the script that you just fixed. So this is also flexible in that it lets you perfect this part of the install more quickly.
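In API terms, that iterate loop is just two desired-capacity calls. A minimal boto3 sketch with a hypothetical ASG name, assuming the group's MinSize is 0:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
ASG = "my-runner-asg"  # hypothetical name; use your stack's ASG

# Scale to zero: terminates every instance without touching the stack.
autoscaling.set_desired_capacity(AutoScalingGroupName=ASG, DesiredCapacity=0)

# ...update the runner config script at its raw URL, then re-trigger.
# The new instance sources the fixed script as it boots.
autoscaling.set_desired_capacity(AutoScalingGroupName=ASG, DesiredCapacity=1)
```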
Here is a sample CI YAML that has all the scenarios, so the four different runner types are all covered. Here it is completing successfully on a test project, and this is showing all four runner types registered to the exact same project in GitLab.
Okay, so armed with that information, let's quickly take a look at the files, and then we'll hop into the console. Our main CloudFormation file is here, and the only other thing we really use is the runner-config subfolder and the raw URLs that tap into these files, as you'll see in the template. That's the main meat of the entire thing right there.
CloudFormation is interesting in that when you build parameters, they are useful both as a prompted form and as completely automated, headless input. We're going to go through the prompted form; once you've perfected what you want in the prompted form, you can simply call CloudFormation with the parameters you've determined, and it can be completely headless and run in a different way. I even have a CloudFormation runner that eventually will be able to run this template, so that you can deploy a GitLab shared runner from an existing runner.
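A minimal boto3 sketch of that headless launch; the template URL and parameter keys shown here are placeholders — use the actual keys from the template:

```python
import boto3

cloudformation = boto3.client("cloudformation", region_name="us-east-1")

# Headless launch: same template, parameters supplied instead of prompted.
# Keep the stack name lowercase — a bucket name is derived from it.
cloudformation.create_stack(
    StackName="asg-linux-docker",
    TemplateURL="https://example-bucket.s3.amazonaws.com/runner-asg.cf.yml",  # placeholder
    Parameters=[
        {"ParameterKey": "1OSPlatform", "ParameterValue": "Linux"},           # hypothetical keys
        {"ParameterKey": "3GitLabRunnerVersion", "ParameterValue": "latest"},
    ],
    Capabilities=["CAPABILITY_IAM"],  # the template creates IAM resources
)
```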
I hope to get this up in the cloud so you can do one of those single clicks that loads it right into the console for you, but right now this is how you get it in here. I'm going to hit Next. You have to give it a name, so I'm going to say something like asg-linux-docker, and we'll say number 100.
This version number is just a cheesy way of showing the template version. For some bizarre reason, the one thing they don't do is expose the template's overall description on this page, which would let me express the version to you without having to fake a parameter. This will also probably need an update to match the runner version that we have on this project.
Also, the README/blog-post parameter is pointing to the README file here, so if you happen to find this template in the wild, you can get back to the instructions. Here's where we select the operating system. As we go through, everything's defaulted to Linux, and then we will talk through the Windows changes you have to make on a second pass — so that's set to Linux.
This date parameter can really be anything; it's not doing any date arithmetic. All it does is embed a date in the user data, and that means if you go back into the stack and make a change — change this date; I could change it to "abc", as long as it's different — that causes CloudFormation to say, "Oh, you changed something that's in the user data; we've got to roll all the instances." So it's just a clever way to both humanly express the last time you ran a patch run and force the ASG to roll all instances to the latest version — when you change this, it does both. If you have it pointed at the latest AMI, then it will grab the latest AMI and patch it.
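A minimal boto3 sketch of that maintenance rerun — same template, one changed parameter; the parameter key here is a placeholder for whatever the template actually calls its patch-date parameter:

```python
import boto3

cloudformation = boto3.client("cloudformation", region_name="us-east-1")

# Re-run the stack with only the patch-date parameter changed; CloudFormation
# sees new user data and rolls or replaces the instances per the update policy.
cloudformation.update_stack(
    StackName="asg-linux-docker",
    UsePreviousTemplate=True,
    Parameters=[
        {"ParameterKey": "4PatchDate", "ParameterValue": "2021-03-01"},  # hypothetical key
        {"ParameterKey": "1OSPlatform", "UsePreviousValue": True},
        # ...repeat UsePreviousValue for the remaining parameters
    ],
    Capabilities=["CAPABILITY_IAM"],
)
```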
Patching everything ("all") could cost a lot of time and cause your image to grow bigger, but you can pick whatever scope you want, and these work for Windows and Linux. Then the AMI: "always use latest" is what it's set to by default.
If you leave it set to "always use latest", then there's a parameter at the bottom that uses Amazon's official SSM parameter pointers to find the latest image — in this case, since we're saying Linux, it's going to be Amazon Linux 2. I'm just going to go to the bottom real quick and show you that, and then we'll cover it again later. This is the name of the parameter that will get me the latest Amazon Linux 2.
If you read through all this stuff, there's a whole bunch of Windows ones we talk about, but it also tells you how to get these parameter strings. So if you need more, you can run a command to get, say, the latest Windows one; there's a different one for the latest Linux one and for the latest ECS-optimized image.
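For example, the Amazon Linux 2 lookup is a public SSM parameter anyone can read. A minimal boto3 sketch; the path shown is Amazon's published one for Amazon Linux 2, and Windows and ECS-optimized images have their own published paths:

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Amazon publishes public SSM parameters that always resolve to the latest AMI.
name = "/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2"
ami_id = ssm.get_parameter(Name=name)["Parameter"]["Value"]
print(ami_id)  # e.g. ami-0abcdef1234567890
```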
I would advise you to use Amazon's official AMIs, because they are optimized for their cloud, and a lot of times they have things pre-installed on them too — like NVMe drivers and ENA drivers — to optimize various parts of the system.
This is the number of GitLab Runner concurrent jobs. By the way, the parameters have weird names because one of the other faults here is that I can't sort them to be grouped together unless they literally alpha-sort together by their parameter name. So the "1OS" prefix groups all my OS stuff, and "3GitLabRunner" groups all the GitLab Runner stuff.
So that's why the parameter names are a little bit odd, but I preferred that over having the parameters for, say, the runner scattered all through here, or applying weird terminology to get a name that would make it sort right. So this is concurrent jobs: if you're familiar with GitLab Runner, this works for shell runners and container runners, and it tells it how many jobs can run on the runner at one time.
Here is your runner registration token. Right now you can specify multiples for Windows — I don't think I have that going for Linux yet — but basically, if you look in your GitLab console, in a project under Settings > CI/CD, or in a group under Settings > CI/CD, you'll find a runner registration token, and if you enter it here, that is where the runner will register into your system. Whatever group level you register it into, it'll be available to all the projects under that hierarchy. Then, tags.
These are just arbitrary example tags, but the idea here is that if this runner's got a specific purpose — it's for a certain team, or a certain account, or a certain purpose — you would add a tag here that would identify it. Now, only the Linux Docker executor is set to run untagged jobs by default, because that's the way it works on gitlab.com; for all the other ones, you have to specify their tags to get a job to target them. So just keep that in mind.
A
This
isn't
just
informational
in
order
for
a
runner
to
be
selected
for
a
job
you're,
going
to
specify
all
the
tags
that
are
that
are
on
the
built
on
to
the
runner,
so
don't
get
too
crazy
here,
then
you
have
the
get
lab
runner
version
and
the
get
lab
runner
version.
If
you
set
it
to
latest
it's
going
to
grab
the
latest,
otherwise
you
can
peg
it
to
a
version.
So
let's
say
you
have
a
private
gitlab
instance
and
you're,
not
quite
at
the
latest
version
of
gitlab
all
the
time.
then you can set it specifically to the one you want. In general, if you have a newer runner version than your GitLab instance version, that's usually okay; I've run large runner farms, so to speak, that were like that, and it didn't seem to cause a lot of problems — though getting too far behind can be a problem. It also tells you how to go look at the GitLab Runner releases so that you can get the specific version tags you might need; you don't include the quotes if you do specify one. Next, the runner installation/configuration script: right now this is self-referential back to the same repository — I showed you there was a folder with these in it.
It is free-form, though, so you could point it somewhere completely different if you wish. Just be sure it matches the OS: one of the primary mistakes you might make is setting up a Windows one and forgetting to change this, and it will go ahead and download the Linux script onto the Windows machine. Since we're not picking distros that have Linux services for Windows, it's not going to run — but we wouldn't want to do that anyway.
If you're going to run a Linux runner, you want to run it on Linux, because it's cheaper. "Install CodeDeploy": you can install CodeDeploy so you can service your runner instances with it if you want — I'll probably be removing that, because it's less likely to be used. Then the desired capacity: how many runners do you want to start up by default?
If you create any scaling, which we'll talk about in a minute, this will immediately be adjusted by scaling. So sometimes people put two here, and then as soon as the runner starts up, it scales down to one because there's no load on the system. So two is an example: in the case that scaling is working, it'll scale down, but if you turn off auto scaling, then this is how many you get.
If you set this schedule — and you don't have to use both schedules — an example would be: if you have a runner that you only use during deployment events, you could set a schedule in here that will be well beyond the deployment event; maybe once a week it shuts down, on Saturday at 4:00 a.m.
Then, whenever you need that runner, someone goes into the console, fires it up, you use it, and then you don't have to worry about it — it'll eventually shut down. The other scenario is smaller development teams on fairly common working hours. Developers might only work 10 hours out of a 24-hour day, even if they're in multiple time zones, and maybe they only work Monday through Friday; well, you could potentially set that runner to be available only at those times.
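Under the hood, that kind of availability window is expressed as ASG scheduled actions. A minimal boto3 sketch with a hypothetical ASG name — spin up weekday mornings, shut everything down Saturday at 4:00 a.m. UTC:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
ASG = "my-runner-asg"  # hypothetical name

# Spin up Monday-Friday at 07:00 UTC...
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG,
    ScheduledActionName="workday-up",
    Recurrence="0 7 * * 1-5",  # cron expression, evaluated in UTC
    MinSize=1, MaxSize=5, DesiredCapacity=1,
)

# ...and shut everything down Saturday at 04:00 UTC.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG,
    ScheduledActionName="weekly-down",
    Recurrence="0 4 * * 6",
    MinSize=0, MaxSize=0, DesiredCapacity=0,
)
```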
Then, when the schedule goes hot, how many instances do you want to start with? The next one is only ever used when you're updating — when you go back into the CloudFormation console and say, "I want to change that patch date" — do you want to replace the entire ASG, or roll through the instances? I'm not going to go through ASG mechanisms and how that works; you can read on Amazon's website about how those two update methodologies differ.
Then: do you want to auto scale? The reason you wouldn't want to auto scale is if you're really doing this for kind of warm HA, so that if whatever's in this ASG terminates, it'll be re-established. So let's say you don't want to bother with auto scaling, but you want two runners of a certain size: you can specify all that, and if one of them dies, a new one will be spawned. So you don't have to auto scale.
For scale-out, it has to hold at 40 percent for 60 seconds — and 40 percent of what is selectable down here. You can do CPU- or memory-based utilization scaling, and just by changing this parameter, it will change all the alarms and all the underlying capabilities. Those of you familiar with ASGs will automatically recognize — well, hey —
this is usually sufficient for most small organizations, and it's plenty complex enough as it is, so more isn't done by default. We also have the scale-in percentage and the scale-in number of seconds, and then, of course, the utilization target for whichever aspect you want. Memory utilization requires the CloudWatch agent to be installed and configured.
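For comparison — not necessarily how this template wires its alarms — here's what CPU-based scaling looks like expressed as a single target-tracking policy in boto3, with a hypothetical ASG name:

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Target tracking: the ASG creates and manages the CloudWatch alarms
# itself, scaling out and in to hold average CPU near the target value.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-runner-asg",  # hypothetical name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 40.0,  # hold roughly 40% average CPU
    },
)
```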
This template does that for you — that can be its whole own pain to get set up, debugged, and working. So if you pick memory, it's going to do memory; again, Windows or Linux, either one. As a sample, this template will create a bucket for you that the runner has access to. This is usually to grab binaries and use them during some sort of activity, or, if you are going to use CodeDeploy to update the runners themselves, you'd need a bucket for CodeDeploy to work against.
If you leave it at "create one for me", it creates a randomized bucket name. The bucket name is based on the stack name, and bucket names cannot contain capital letters — and there was no easy way for me to filter this or to change from uppercase to lowercase.
So you need to keep your stack name all lowercase, or it will fail when it tries to create the bucket. When we get down here, too, it says "create one for me", but you can also free-form a reference to a bucket you've already created that already has the permissions for the runner. A lot of things in here are like that: it will create one automatically for you, but if you have your own, you can put it in without having to start modifying the code.
The whole thing also works on a principle I call "least config", which means if you tell it not to auto scale, it's not going to create scaling alarms, because it knows it doesn't need them. Any time you say "well, I don't need this" or "I don't need that", it's going to trim down what it builds. So don't be alarmed if you go in and think, "Oh, there are no scaling alarms on this one" — and then, oh, what do you know, you set scaling to false. It's purposely, you know, creating the smallest surface possible in terms of security and configuration.
Also, for permissions to the bucket and permissions to anything else in the template, it creates those as custom managed policies, and the reason is that even if you are, say, using your own bucket or your own IAM profile, you can then either use these as a model — that's how I would set it up — or directly attach them to an existing bucket or an existing IAM profile, so that you can get the permissions you need.
So it does that as well, to make it easier. Self-monitoring termination interval: how often does the instance check to see if it's being terminated? This is necessary for that close-out hook to work and to deregister the runner. It's set to one minute by default here, because spot terminations give you two minutes.
This is how we try to catch that we've been notified we're going to be terminated because we're a spot instance: when we check every minute, we're able to say, "Oops, I'm going to be terminated — quickly deregister that runner," so we don't end up with a whole bunch of orphaned runner registrations in GitLab. You can expand that interval to something larger, or just simply disable it.
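The gist of such a termination monitor, sketched in Python: poll the instance metadata service once a minute; a response on the spot `instance-action` path means an interruption is scheduled. (This sketch assumes IMDSv1; IMDSv2 would additionally need a session token.)

```python
import time
import urllib.error
import urllib.request

# A 200 on this path means a spot interruption is scheduled (~2 minutes' notice);
# a 404 means nothing is scheduled.
URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

while True:
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            print("Interruption notice:", resp.read().decode())
            # deregister the runner here, e.g. shell out to `gitlab-runner unregister`
            break
    except urllib.error.HTTPError:
        pass  # 404: no interruption scheduled
    except urllib.error.URLError:
        pass  # metadata service unreachable (not running on EC2?)
    time.sleep(60)
```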
If you disable it, once again, there won't be any self-monitoring for termination. Here are the spot instance types: I expose four; technically Amazon lets you do eight, but I thought four would be sufficient to get started. These allow you to specify what instance type your runner should run on. Just a side point, which is also in the README: I do not feel you should use a bursty instance — t2s or t3s — for runners. They are bursty, and there are lots of possible problems with them.
I tend to stick with m5s for anything, because they just act like you expect kind of a hardware machine to act. It's a little more money, but you're going to be getting some density if you're using either the Docker runner, or you bump your shell runner concurrency up. Also, even if you turn off scaling and you turn off spot, this list functions to prevent you from failing when there's instance-type exhaustion.
Sometimes you'll say to Amazon, "Give me an m4.large in this region," and it'll say, "I don't have even one left for anyone at Amazon" — and then your automation fails. Well, this defaults to being a fleet request, so that when you make that request and it says "I don't have any m4s," it'll automatically ask, "What about m5s?" — and if you get an m5, then you succeed and you don't fail. So even without scaling, we solve that problem of instance-type exhaustion in a region. Next: spot on-demand base capacity.
If you set this to one, you'll get one instance in the ASG that will never be terminated, because it's not spot. If you want all spot — 100% spot, because you're running some sort of machine-learning loads, or this whole ASG is dedicated to things where it's okay if they terminate — then you'd set this to zero,
or if you just don't want any non-scaled capacity. So this is non-scaled capacity: it will always start up that many. Now, even though it says "spot" here — that's just because we're in the spot section — this next one is the on-demand percentage above base capacity. So when I have this set to 100, I'll get no spot, because it starts one on-demand, and then every other instance that it scales or launches after that is going to be on-demand, because I said I want one hundred percent on-demand.
If I put 50, then one of every two will be on-demand and one of every two will be spot. If I put 0, then everything above the base capacity would be spot; and if I put 0 and 0, then the entire ASG will be spot instances. So it's a little bit challenging on the nomenclature. I tried to preserve Amazon's terminology for this part, and for this part here — these are their terms and their concept of how to think about this.
You think about how much of the scaled capacity should be on-demand, rather than thinking about it the other way around — however else you could have semantically constructed it, I tried to follow their semantic construction of the idea. So I'm going to put 100 here and one here. Spot allocation strategy: if you're not familiar with spot, please go read about it.
Spot is already less of a concern than most people think — they assume instances are going to be terminated constantly, or that they'll always be suffering from "where did that runner go?" But then, to further reduce your terminations, Amazon provided this new allocation strategy called capacity-optimized, and what it means is: I'll pay a little more money to reduce my terminations even further. So you're getting to the point where you're at almost zero frequency of termination. That's what this is.
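In Amazon's own API shape, the knobs we just covered live in the ASG's MixedInstancesPolicy. A minimal boto3 sketch with placeholder names, matching the demo settings (base capacity 1, 100% on-demand above base, capacity-optimized spot):

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-runner-asg",  # hypothetical name
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": "my-runner-launch-template",  # placeholder
                "Version": "$Latest",
            },
            # fleet-style fallback list: try these types in order of preference
            "Overrides": [
                {"InstanceType": "m5.large"},
                {"InstanceType": "m4.large"},
            ],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 1,                   # first instance is never spot
            "OnDemandPercentageAboveBaseCapacity": 100,  # everything above base: on-demand
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)
```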
This is a key pair name to allow you to log in through SSH, which you can do in the Amazon console or through traditional SSH. I don't like this — it's not very secure compared to using SSM Session Manager — but I provided it for those of you who want to use a key pair or have other reasons for using it.
If you set troubleshooting mode to true, this is where we make sure that the SSM Session Manager agent is installed on the instance and that the right IAM permissions are attached to the instance profile, so that we can do the remote login to that instance through the console.
This next one is not working correctly. If you start it, it will work for CPU utilization only, but it prevents the CloudFormation stack from exiting until you terminate the utilization loading. Unfortunately, CloudFormation will only wait an hour for the ASG to stabilize before it starts saying, "Okay, I'm going into rollback." It is my intention to fix this and, if possible, add memory utilization as well.
Right now it only works with CPU utilization, when it does work, but I would advise: unless you understood what I just said, don't use this yet. Then, finally, this is the parameter from above — if you want to look up the latest AMI, you give it one of the parameter strings that Amazon publishes, which you can also discover through commands that are in the help here. And then, finally, the IAM role for the instance: by default, it will create one for you that works with a basic runner.
If you're doing more than that, you'd just specify the one you need right here. There's a bunch of policies — the managed policies I told you we created — that you can go look at and either incorporate into your existing profiles or just attach directly. Just be careful: if you destroy this template, it's going to destroy those as well, and so it could affect operations.
The other thing, just a small word of advice: one of the things you can do here is, if you already have IAM roles for your development teams, then simply using that IAM role here can be a great way to make sure of two things. One is that whatever they can do interactively in Amazon as a developer, they can also do with their runner — so there's no confusion or frustration of "I ran these commands and they worked fine
A
When
I
was
logged
in,
I
send
them
to
the
gitlab
runner
and
they
don't
work.
The
second
thing
is,
if
you
have
multiple
teams
sharing
accounts,
that
developer
can
do
everything
they
can
do
from
their
local
login
as
a
human,
but
nothing
more.
So
if
you
do
actually
have
restricted
rights,
then
they
won't
be
able
to
do
any
more
than
what
their
normal
role
can
do.
So
that's
a
reasonable
way
to
do
it.
Be aware of the flip side, though. You might have one person who has access to a certain role, and you give 10 people the ability to merge an MR into that branch — well, now all 10 of those people have that capacity indirectly, and if you understand how GitLab CI works, they could run arbitrary commands with whatever permissions that runner has. They could create themselves an IAM profile, if you've given that runner admin permissions to the account. So be careful here: you can have kind of a pass-through attack. This is true of all CI systems — it's not specific to GitLab — but I call it out here because sometimes people can be careless and just give it an admin profile, not realizing that now, if I can get code to run on that runner, I have admin on that account.
I'm just going to double-check everything here; I think we can let this one run. I need a token, so I'm going to go grab one from a project that only has a hello-world project in it, and then I will change it later — so the fact that you see my runner token is not going to be problematic for me.
Just briefly: you find your tokens, like I said, either in your group or in your project. So I'm in this test project; I go to CI/CD, then Runners, expand, and within here is the runner token that will attach runners. You can see I've run this before — I have a scaled runner for all four scenarios already running in this particular project. So I'm going to copy that token, and then we're going to paste it in here.
And I think that's everything. Click Next — it'll bark at me if certain things are wrong — then Next again. We are going to create IAM stuff, so we have to check this box, click Create, and away it goes. It starts creating resources; we can look back in time here at other runs I've done, where it's creating all kinds of resources.
Once this runs, I'm just going to go back to my project — we will eventually get runners registered here. I'll quickly go into CI/CD and show you where I ran that test job, and it passed. So we have a Linux Docker runner, a Linux shell runner, a Windows Docker runner, and a Windows shell runner, and they're each just little hello-world scripts that basically show you we were able to run on that runner type.
You can see that, with the exception of the Linux Docker runner, we have to specify the tags to let a job run on the other runner types, so that we can select them and not have, you know, a bash job run on a Windows machine or whatever — I mean, you can make that work if you want to. And then notice here that the ones running on Docker specify an image explicitly, even though they have a default image.
I also want to show you how this is somewhat managed, to allow you some capabilities in terms of management. If we go into the EC2 console here, we can see that I have a bunch of instances running, and when I open those instances, I can see under Tags that I've added some useful tags. All the GitLab tags are passed right into here — the same ones we saw when we were looking at the GitLab runners.
Also, the compute-type tag is one that goes in both places as well, so that makes them more manageable. If you start to get hundreds of runners, it can be a challenge just to figure out, you know: where is this runner, and how do I go manage it?
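Those tags also make the fleet scriptable. A minimal boto3 sketch that lists running runner instances by tag — the tag key and value here are illustrative; check the actual keys on your instances' Tags tab:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Find running runner instances by the tags the template applies.
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:ComputeType", "Values": ["spot"]},  # hypothetical tag key/value
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

for reservation in reservations:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["InstanceType"])
```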
So this creation is underway. I don't know if it will complete in time, so what I might do is cut the waiting part out of the video, and we can take a peek at it when it's done.
For Windows: when we come in here, the first parameter — the most important one — is to pick Windows in this OS platform parameter. It looks like a simple change, but if you look through the template, it actually cascades into a lot of template changes throughout. So it looks very simple, but it was actually fairly challenging to configure this so that you can just pick Windows and away you go.
The patch scope is something that also applies to Windows — both operating systems obey these different patch scopes if you want to apply patches — and we also have a pegged AMI ID, which will likewise be followed if you peg it to a Windows AMI ID. Here's the place it's easiest to make a mistake, and that is this runner configuration script: it needs to point to a Windows one. We have four that we included in this repository.
It's good to always use the Amazon AMIs, whether you're using Linux or Windows, due to the optimizations we mentioned earlier. So this would be the setup I'd do; hit Next now, and it would go ahead and set this up for Windows as well. That's just the few parameters that are a little bit different for Windows — everything else does what it's supposed to do on both operating systems, and there's no need for you to play with those items.
So that's how to get it launched for Windows. Let's take a look and see if we have finished up on our Linux setup.
So that is the basics I wanted to take you through with regard to this template. I hope it is helpful. I look forward to hearing from you if you're using it, or if you have enhancements or bugs — let me know, and I will be keeping a keen eye on making this a helpful part of the infrastructure as code you can source in Guided Explorations. Thanks for your attention.