Description
Join Naadir Jeewa with guest Jason DeTiberus from Equinix Metal as we explore Cluster API Provider Tinkerbell (https://tinkerbell.org/) for automated, declarative deployment of Kubernetes clusters on bare metal infrastructure.
Notes: https://hackmd.io/@randomvariable/HJ1Kiql3O
Photo by Callum Whale: https://unsplash.com/photos/3pH7TxHU6gw
A: Hello, welcome to TGIK, Thank God It's Kubernetes! I'm your host, Naadir Jeewa. I don't know why I'm trying to introduce myself like a TV presenter, so: I'll be your host this evening, and today we're going to be joined by Jason.
A: Cool, so we're going to be talking about Tinkerbell, which is a CNCF project that manages bare metal infrastructure. Do you want to introduce yourself, Jason?
C: Yeah, so for folks that don't know me, my name is Jason DeTiberus. I've been involved in the Kubernetes community, specifically around cluster lifecycle management, since about 2015 now, I think. I've been involved with the Cluster API project, formerly at Heptio and VMware and now at Equinix Metal, helping to bring these cloud native management ideas for Kubernetes clusters into the data center.
A: Cool. We'll just wait a few minutes for people to float in. If you don't mind, introduce yourselves in the YouTube chat and we'll see who we've got. We've got Martin from the Netherlands, hello there, and Rory from Scotland; I was just there last week doing some biking. Nice to see you. How are you doing?
A: All right, let's see what else. We'll be covering what's happened in the cloud native community, so if you've seen anything cool, maybe on Hacker News, which I refuse to read, throw it in the chat and we'll get going in a few.
B: It's a bit toasty over here too, Naadir, so don't be surprised if you see me start sweating too.
A: All right, let's get started. Let's have a look at what's happened in the news. The big news this week, well, last week, is that the Kubernetes 1.22 beta is out. It's got a couple of new features, and one thing to be especially aware of before you do anything else is that there is a bunch of deprecated APIs.
A: If you are using things like admissionregistration or the apiextensions stuff around CRDs, the beta APIs are going to be removed in 1.22, so make sure your applications are up to date. And one of the interesting things I saw this week was the interactive mode for exec credential providers.
A: A lot of people use kubectl with SSO or external authenticators, and that got much better support now. If you're working in a corporation or an enterprise, that's really how you want to authenticate with your Kubernetes API server, so that's good to see. But yeah, please be aware of the deprecated APIs. We also saw some sandbox projects arrive at the CNCF in the last two weeks, so I'm going to click on these.
A: I haven't looked at a lot of these before, so we're just going to see what they're about. The first one we've got is KubeVela. I had a quick look at this: it's using CUE, which is a declarative configuration language, I believe, and it's about helping you ship applications. If you want a bit more layering on top of just shipping standard Kubernetes manifests, this might be something you want to look at.
C: The next one, kube-vip, is particularly helpful for on-premises type deployments, supporting things like migrating an IP using VRRP, or using BGP and publishing BGP announcements to provide load balancing services where you don't generally have them. It's similar to other projects in the space like MetalLB. It came out of the Plunder work that Dan Finneran, the creator of it, did a while back for prototyping on-premises Kubernetes deployments.
C: Compared to something like MetalLB, I think the difference is that there's a growing, more active community around kube-vip, whereas with MetalLB some of the issues have grown quite stale, and getting fixes and changes in has become slow and problematic.
A: Yeah, that's cool. We use it in the vSphere provider, primarily for the Kubernetes API server, so you're doing that at quite an early stage in the cluster lifecycle. I haven't been paying close attention, so I didn't realize it now supports service type load balancing too. That's pretty exciting!
C: In some cases you can have issues with two different services trying to manage similar things, like BGP routes, between those two different use cases, and kube-vip is designed to do both the API server load balancing and the service type load balancer.
A: All right, let's see what else we've got. We've got KubeDL, which is a deep learning project from Alibaba. I've never done deep learning and don't know much about it. It's pretty cool, I suppose.
A: And we've got two projects which, again, I've not looked at much. They seem to be related to service mesh, particularly monitoring service mesh performance and the workloads underneath. All interesting to see. Let's move on, unless anyone's got anything exciting in the chat; just checking. Hello to the folks joining from Istanbul, hi there; hi everyone; hi Vlad. Cool.
A: So today we're going to talk about Cluster API, and in particular in the context of bare metal. I'm not going to tread over old ground; we've talked about Cluster API quite a few times on TGIK, and you'll find links to previous episodes in the show notes. Do not leave the live stream for them. Stick around, don't go away, look at them afterwards, please!
A: We're going to be looking at a project called Tinkerbell, which is what Jason works on, so I'll just get their website open. Do you want to introduce what the project is about?
C: Before we go there, did we want to bring up the new release of Cluster API and those things?
A: Oh yes, thank you. Keep keeping me honest there. So yes, Cluster API version 0.4.0 just got released this week. This is the new v1alpha4 release of Cluster API, and we've got big improvements to remediation of the control plane. If you're not familiar with Cluster API, just briefly: Cluster API is a way of using Kubernetes to deploy and manage Kubernetes itself.
A: We'll have a look at it running a bit later in the episode, but one thing about rolling control planes is that they're not just replaceable, throwaway items; they've got the data for the Kubernetes cluster. So we've done quite a lot of improvements to make sure that if something goes wrong, we're able to repair them. MachineHealthCheck was a way of detecting problems with nodes and replacing them, and quite a lot more flexibility has been added there.
A: A new thing that's been added is externally managed infrastructure. There are different providers for AWS or Azure, and if you want to bring up a Kubernetes cluster in any of those environments, you're going to have to bring up some supporting infrastructure, like networks, etc.
A: Currently the providers stand up a sort of default template and stamp out a usable set of resources for your cluster, but you might want to use something else. So we've got improved support for cases where you maybe use Terraform, Crossplane, or AWS Controllers for Kubernetes to build that infrastructure, and then you can tell Cluster API to use those bits of infrastructure instead and, importantly, not to touch them.
A: So do take a look. This is the core release, and some of the providers will be updating to v1alpha4 as we go along, so you'll need to wait a little bit for the providers to be compatible with this version of the core release. But yeah, go play around, I suppose.
C: Yeah, so the biggest question most people have when we start talking about Tinkerbell is: what is it? Basically, the goal of Tinkerbell is to provide declarative lifecycle management of hardware resources, in the same way that Kubernetes allows for declarative management of application services.
C: You can define things like the actual hardware that you want to manage, as well as the set of actions you want to perform on that hardware, through what we call templates, which are basically just a way of templating out a set of distinct actions that you want to perform on that hardware.
C: So that covers the templates, and then the last main component of Tinkerbell is workflows. A workflow is basically just a way to take a predefined template and apply it to a specific piece of hardware. Once you have your hardware defined, some templates defined, and some workflows created, at that point the other components come into play as well.
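For reference, a minimal template looks something like this. This is a sketch in the style of the Tinkerbell docs of the time, with illustrative names, so check the current docs for the exact schema (we come back to this exact hello-world example later in the demo):

    # hello-world.yml: one task with one action, run on whichever
    # hardware the workflow later binds to the device_1 placeholder.
    cat > hello-world.yml <<'EOF'
    version: "0.1"
    name: hello_world_workflow
    global_timeout: 600
    tasks:
      - name: "hello world"
        worker: "{{.device_1}}"
        actions:
          - name: "hello_world"
            image: hello-world
            timeout: 60
    EOF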
C: Boots is the microservice that provides DHCP services and TFTP services, and it basically performs the basic PXE booting environment for the hardware. If folks aren't already familiar with bare metal management, PXE stands for Preboot eXecution Environment, and what we're basically talking about is having a DHCP environment that gives out an IP address to a host that isn't already pre-configured. From that point, depending on how the hardware is configured, it could try to do this PXE boot process in one of multiple ways: it could be legacy, using the BOOTP protocol, which goes back many, many years to Windows-based network boot deployments, or it could be trying to do one of the various different types of network booting on the various different types of hardware.
C: Anybody who's managed infrastructure in a data center knows that there's very little consistency between different devices in how they support network booting as a concept. Because of that, one of the things we do in Boots is the initial network booting, and then we serve up over TFTP a standardized iPXE environment, which is basically just a small binary. Depending on what type of host is talking to Boots, if it's a UEFI one, it serves up a UEFI binary of iPXE.
C: In the case of the example we're going to use today, well, in the diagram up there it's mentioned as OSIE, which stands for Operating System Installation Environment. This was a larger initial OS image, and the idea is that it's just an image that's meant to be able to bootstrap on any hardware and run the tink-worker, to execute the workflows that are defined for the hardware.
C: However, for what we're talking about today, we're actually replacing that OSIE image with an image called Hook. This is another project within the Tinkerbell project; it's a minimal OS image based off of the LinuxKit project, and it's a much smaller OS image, whereas OSIE was in the neighborhood of...
C: ...I think two gigs or so. The Hook image is down to a few hundred megabytes, so it can bootstrap into the workflows and start running them quite a bit faster than the old OSIE image would. In addition to that, there's also a metadata service called Hegel, which gives us the ability to have cloud-native bootstrapping workflows like you would have in a general cloud provider: you can have per-system defined metadata that is available at run time for use within systems like cloud-init.
C: So Hegel gives us the ability to use cloud-init based bootstrapping, and when we get into actually running Cluster API Provider Tinkerbell, we'll see that we bootstrap very similarly to pretty much all the other providers, using cloud-init. There's also another component of Tinkerbell that we're not going to be using today, and that one's called PBnJ, which stands for Power, Boot, and Jelly.
C: With PBnJ, you can interact with the baseboard management capabilities in server-class hardware, remotely power systems on and off, and interact with them a little bit better.
A: Cool, thanks. In our standard TGIK format we'd normally just download the repo live at this point. Given that we're dealing with bare metal and we need some hardware, we decided that's probably not going to be the best approach, so I'll go through it. We've got two lab environments: Jason's, which is going to be a bunch of NUCs...
A: ...some Intel NUCs, I believe. And I've got mine. I don't have a bunch of spare bare metal machines, so I did this using VMs, because, as I said earlier, there's no reason why you can't use Tinkerbell to deploy to VMs. I'll just show you what my lab looks like, which is like this. Yeah, this is super. I am available, you know; my rates are competitive...
A: ...if you need me to do any cabling. Oh yeah, there we go. I live in a flat in London, be aware, so I don't have a lot of space. This is basically a drying cupboard, which has got my mechanical heat recovery ventilation unit; there's an iron next to it and a bunch of face masks and other random junk. So that's where it lives. I've got three desktops; they're Core i5s.
A: They do have a BMC in the form of Intel vPro, but they're all consumer grade; it's pretty much all consumer grade hardware otherwise. I think one is an i7, because I bought it newer, and two are Core i5 machines.
A: They've all got an Intel 10 gig card, which is connected to this switch here, and then we've got one gigabit networking for the VM network. If you're not familiar: vSphere has a replicated block storage technology (not file storage) called vSAN, so you need some fast networking for that. And then vMotion is the ability to move VMs between different machines, so you need to copy the memory across as quickly as possible.
A: This is what the vSphere environment looks like. We've got our three hosts, server three, server four, and server five; don't ask me what happened to one and two. What I've done here is I've created a folder in vSphere. It's not really anything more than a way to organize things in a file hierarchy. I've got one machine which is going to run...
A: ...the Tinkerbell environment and Cluster API (let me make that bigger), and we've got a bunch of machines which are going to act as the bare metal. We've got machines 0 through 4; they are powered on, and they just keep cycling, waiting for DHCP information. They're set up to boot from the network, they're set up as UEFI, and they're currently rebooting every 10 seconds because nothing's offering them anything.
A: This is all sitting on its own VLAN. Each of these hosts has a virtual switch, so we've created a Tinkerbell VLAN on 103 and configured that. I use a MikroTik switch, so we've got a VLAN here, 103; we've connected that to the three servers, and the switch itself has been given an IP address for routing.
A: That's going to live at 192.168.103.1, and one thing we'll get into a bit later is that we've added a route to the core switch's table to point 169.254.169.254 at the sandbox VM, which is living on .2. If you're not familiar with 169.254.169.254, that is normally the AWS IMDS endpoint.
A: Every virtual machine in AWS has that 169.254.169.254 endpoint that you can contact to get metadata about yourself, and Tinkerbell is doing that here in the form of Hegel, which we'll go through later.
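For reference, once Hegel is answering on that address, a machine on the Tinkerbell VLAN can query its own metadata roughly like this; the exact paths come from Hegel's EC2-style interface and may differ by version, so treat this as a sketch:

    # From a host booted on the 192.168.103.0/24 VLAN:
    curl http://169.254.169.254/metadata                   # Hegel's native endpoint
    curl http://169.254.169.254/2009-04-04/meta-data/      # EC2-compatible paths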
A: This network is fully isolated apart from having a router in it, so there's no conflict with my wireless network or anything like that. And then what we've used is the Tinkerbell sandbox. Now, I didn't need to use VMware at all, as I found out when I noticed you can use Vagrant with it, right, Jason?
C: Yeah, so we wanted to make the sandbox as approachable as possible, so there are two options in the sandbox right now. There's the Vagrant-based environment, which will spin up a minimal system so that you have a provisioner VM and at least one worker VM; I think you can specify some number of worker VMs as well. That's basically a way to kick the tires without any kind of hardware...
C: ...investment. There's also a Terraform-based one that lets you spin up an environment using actual Equinix Metal hardware, and I'm sure you used the bare docker-compose based workflow when you were standing up your environment. So if you do have the hardware or VMs available, you can go that route too.
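If you want to kick the tires the Vagrant way, it looks roughly like this; the directory layout and VM names are from the sandbox repo as of this episode, so check its README:

    git clone https://github.com/tinkerbell/sandbox.git
    cd sandbox/deploy/vagrant
    vagrant up provisioner   # VM running the full Tinkerbell stack
    vagrant up machine1      # blank worker VM that PXE boots against it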
A: That's mine, that's mine! There you go. Right, so I started off with the sandbox environment, and I ran this script, which I believe fiddled with my networking a bit, is that right? Did it set up some iptables rules wired into Docker or something?
C: It did. It was mainly intended for use within the Vagrant environment, so that you're starting with a machine that's not already pre-configured with the IP addresses; it's trying to set that up for you.
C: If it hasn't already landed, there's work being done to make it easier to run without overriding the network settings on the local host, because, as you saw, it can be problematic at times.
A: Let's just take a look. I'm not a regular Ubuntu user, folks; normally Fedora.
A: I believe there's a file in here. That's right, let's have a look. So on ens192, which is the network interface, we added 169.254.169.254/16. This is where this machine is going to pretend to be an IMDS service, and we'll go into why and how this actually works in a bit. That's that, and we have some rules in the NAT table.
A
It's
no
case
so
yeah
we've
got
some
that
rules
that
will
pass
ports
on
this
vm
to
docker,
so
this
got
postgresql
htc
people
and
that
so
we'll
go
through
some
of
those
components
as
well.
So
the
other
important
thing
is
in
this
sandbox
scale,
so
we
have
this
deployed.
First
of
all,
we've
got
this
end
file,
so
you
run
this
generate
m
that
with
some
stuff,
and
it
creates
a
yes,
it
originally
would
download
oc
and
then,
but
we
will
replace
the
well.
We
have
all.
A: ...in fact I have already replaced it with Hook, as you'll see in a minute. It downloads the Tinkerbell components from Quay, it's got the host IP, and then it's got some usernames and passwords that it generates. And finally, you can see my administrative password for vSphere. I will rotate it afterwards; I didn't know about this beforehand. Don't worry!
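The sandbox steps I ran beforehand look roughly like this; the script name and compose layout are from the sandbox repo at the time of recording, so double-check against its README:

    cd sandbox
    ./generate-envrc.sh ens192 > .env   # writes TINKERBELL_HOST_IP, registry creds, etc.
    source .env
    cd deploy
    docker-compose up -d                # tink-server, boots, hegel, nginx, registry, postgres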
A
It's
not
a
regular
password
that
I'm
using
and
you
go
it's
not
it's
not
on
the
internet,
so
you
can't
log
in
don't
try,
okay,
all
right
so
and
then
we
in
the
deploy
directory.
I
we
have
some
stuff
set
up
in
the
state.
Let's
have
a
look
at
that
state
directory.
So.
C: Yeah, so it's going to go ahead and create some certificates. It's also going to set up some of the persistent state that we need for the prerequisite services.
C: In addition to the services we talked about, there's a Postgres database that's a dependency, there's a web server, and there's an OCI-compliant registry that's needed. The state directory stores all of that, and if you're running the Vagrant environment, it mounts those directories into the Vagrant environment. So if you tear the Vagrant environment down and bring it back up, it comes back up quicker the next time, because it saved the state from the last run.
C: You don't have to download the OSIE or Hook image and get it in place on the web server again, you don't have to reload any images that you've loaded into the OCI registry for running the actions, and you don't have to regenerate TLS certificates for the purposes of talking to Tinkerbell, and that sort of thing.
A
Cool,
so
this
webroot
folder
is
contains
the
the
directory.
That's
going
to
be
served
up
by
nginx
and
in
that
I've
got
this
tinkymaster.gz,
which
is
actually
from
the
hook
project.
I'm
not
going
to
build
it
because
it
takes
quite
a
while
takes
quite
a
while.
So
it
was
a
bit
fiddly
as
well,
so
that
expands
I
in
it
ram
fs
and
everyone.
Do
you
want
to
just
explain
what
those
are
for
folks
who
are
not
familiar
with
those
items?
Yeah.
C
So
you
have
two
different
files
here.
The
first
one
is
the
actual
kernel
binary
and
then
the
second
one
is
going
to
be
an
initial
ram
disk
that
can
boot
up
for
the
os
and
since
we're
not
talking
about
an
os,
that's
being
installed
on
the
desk,
there
is
no
root
file
system.
So
everything
runs
in
that
initial
ram
disk
in
kind
of
ephemeral,
state
cool.
A
The
image
builder
project,
so
this
is
used
for
most
stuffing
cluster
api,
so
with
class
api,
we
are
creating
machines
from
disk
images
that
have
all
the
bits
pre-installed.
We
don't
want
to
be
because
in
most
cases
you
want
to
scale
quickly.
You
don't
want
to
spend
time
installing
all
of
the
components
when
you
boot
up.
So
we
we
rely
on
pre-generated
images
and
there
are.
There
is
a
recipe
for
building
a
raw
image
for
use
with
tinkerbell.
A: I ran that beforehand, because it uses QEMU and my environment is virtualized, so nested virtualization is somewhat slower. So I did not do that again live. Yeah, sorry.
C: It could definitely take a little bit. The other thing is that this was our second iteration for the Tinkerbell provider. In the first one, like you said, we downloaded all the bits at runtime, and that gets fiddly, because sometimes a binary is just not available: there are internet routing issues, or packets get dropped here or there.
A: Yeah, that sounds difficult. I think another thing you could do that would be interesting is to not have to start a VM to build the image, given that this is kind of a flat file system. You could potentially build it in a Dockerfile or something like that to produce the image and remove that one layer of steps that we currently do in image building.
C: Yeah, we explored a few different ways. We have an image-builder-like process for Tinkerbell in general, for folks that aren't using Cluster API, and I believe that's called Crocodile, because why not keep the Tinkerbell references going. You can build everything from an ESXi image to a Windows image to a Linux image and stream it to and from an OCI-compliant registry.
C: However, for the purposes of Cluster API, we wanted to maintain as much consistency with upstream as possible. Even though we could optimize some of those things, we saw more value in doing the same things with Image Builder that the other providers are doing, just for that consistency and to ensure you have a similar experience.
C: Yeah, that's basically all that is: a gzipped raw disk image. It'll eventually get written to disk just with dd.
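In other words, the provisioning step ends up doing something morally equivalent to the following; the image and device names are illustrative:

    # Stream the gzipped raw image onto the target disk, then flush caches.
    zcat ubuntu-2004-kube-v1.21.2.raw.gz | dd of=/dev/sda bs=4M conv=fsync status=progress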
A
Okay,
cool
yeah,
so
there's
a
bit
so
given
that
I
did
do
this
before
so
what
I
played
with
this
before
so
kind
of
reset
the
environment.
I'll
just
talk
you
through
the
things
that
I
did
to
reset
this
environment.
So
it's
today,
this
sandbox
repo
hosts
the
tinkerbell
components
using
docker
compose
and
I
did
see
a
repo,
that's
moving
it
to
a
kubernetes
deployment.
But
I
was
a
bit
too
afraid
to
try
that
just
yet.
C: Yeah. Internally within Equinix Metal we run Tinkerbell, and basically all of our management stack, on top of Kubernetes.
C: We're putting a lot of work into that currently to improve that experience, and eventually that will be the default method, rather than the docker-compose method that we have today.
A: Yeah, and if you want to have a look at that, it's the k8s-sandbox repo. So, going back to...
C: It generally expects to be able to talk layer 2 networking directly between the hosts. For the DHCP process, it's inspecting the MAC address of the machine that's talking to it, and ensuring that within Kubernetes, without just doing something like NodePorts, is problematic; and if you're tying things to a NodePort, you have other problems, like how you handle the reliability of those things.
C
So
when
you're
dealing
with
the
actual
ingress
or
service
networking,
you
lose
things
like
the
source,
ip
or
the
source
mac
address
of
things,
and
and
we
even
explored
things
like
using
network
plugins
like
multis
and
being
able
to
do
things
like
direct
l2
connectivity
and
things
get
really
complicated,
really
quick.
C
So
I
think,
where
we're
going
to
end
up
settling
is
probably
requiring
some
type
of
dhcp
relay
or
dhcp
proxy
when
interacting
with
it
deployed
on
kubernetes,
so
that
we
don't
add
in
weird
complex
net
cni
requirements
that
will
be
difficult
to
troubleshoot,
but
eliminate
that
direct
requirement
for
layer.
Two
networking
between
boots
and
the
actual
hardware
that
you're
bringing
up.
A: Even on my home network I was originally using a relay, but now it's direct, because it just was not working properly. We'll get into some of the interesting behavior around DHCP and PXE, which explains some of the architecture as well. So what we're going to do is bring up the docker-compose environment again. The Postgres data is a Docker volume, so I made sure that was cleaned off and powered off the VMs; we basically want to blank the disks.
A: Then we run docker-compose up, and then we've got one little hack which we need for Cluster API Provider Tinkerbell: there's an issue with events in Tinkerbell, which I believe is currently being overhauled, but we need the hack because of the size of the templates that we're going to be using.
C: Yeah, we initially created an event-driven workflow for Tinkerbell, and the way we built it had some limitations: we built it on top of Postgres triggers, which have a size limit.
C: When we first tried to do this integration, we created the cloud-init data and tried to put it within the metadata, and Tinkerbell just returned an error saying the message size was too large. So for the purposes of this demo, and on older versions of Tinkerbell, you have to disable the events. There will be a new Tinkerbell release where we've removed the events for now, and we'll reassess how we want to implement that in the future without that size limitation.
A: Cool. So I guess what I'm going to do is bring up this environment, ish. I'll bring up mine, just going to reset this, and we'll come on to the kind cluster in a minute as well.
C: And I will say, your setup is a whole lot more advanced and powerful than mine, because if I flip my camera over here, this is my home network setup: a small mini-ITX machine running the Tinkerbell components, and then these aren't even NUCs, they're small dual-core Celeron machines that I bought about four years ago now, so they are vastly underpowered CPU-wise.
C: I think they have eight gigs of memory, so they can actually run Kubernetes and some type of workloads. More importantly, for the purposes of running Tinkerbell and deploying Kubernetes to them, they have really slow eMMC disks, so some things take about five orders of magnitude longer than they would on an SSD-backed or NVMe-backed disk.
A: Yeah. The vSAN backing here is hybrid storage: one flash disk in each server operating as cache, and then magnetic storage; I think they're four terabyte drives or something. And then it's not quite, it's like RAID 1-ish; it's kind of weird with vSAN.
A: It's not regular RAID; it's actually doing object replication underneath. If you've looked at Ceph, it's similar in concept, but it's a lot more set-it-and-forget-it, really. So, kind of cool. I've got this environment, and I've got some stuff running now in docker ps.
A: I've got a bunch of things running, great. The next thing you have to do is define some hardware, so we're going to load that in. Where did I put that? Here we go: hardware. I've got my five machines here, and these are random UUIDs; just go to uuidgenerator.net and click the button enough times.
A
Lots
of
doesn't
have
to
be
secure
or
anything
doesn't
matter.
We
are
going
to
be
deploying
to
the
sda
disk
on
all
of
those
and
we're
going
to
give
them
ip
addresses.
So
given
machine,
0,
10,
machine,
1-11,
etc,
etc,
copied
the
mac
address
from
the
vm
information,
so
on
vsphere.
A: That's how we're going to know which one is which. We've told it UEFI true, and we've turned on allow PXE and allow workflow. So, going back to the vSphere environment...
C: One thing you mentioned there that I don't want to skim over too quickly: you mentioned defining a disk device in that hardware, and I just wanted to note that this is a requirement of the Cluster API provider and not of Tinkerbell itself. In our first iteration we actually hard-coded this, and the demo we did was based on virtual hardware, so if you had hardware that didn't have a /dev/vda device, it didn't work.
C
So
then
we
started
exploring.
How
do
we
make
it
work
more
generally
for
hardware
and
it
quickly
became
messy
figuring
out.
How
could
we
detect
what
this
device
to
try
to
do
and-
and
we
realize
that's-
probably
not
the
best
thing
to
do-
because
what
if
we
did
detect
the
wrong
thing,
you
know,
especially
if
you
look
at
some
like
arm
hardware
and
things
you
may
have
an
sd
card
device
in
there
and
then
you
that
is
meant
to
actually
hold
your
bootloader
rather
than
your
actual
file
system.
C
So
if
we
detect
that
over,
like
an
emmc
device
or
an
nvme
device,
we
could
actually
override
the
bootloader.
Instead
of
storage,
that's
meant
for
our
use
so
because
of
that,
we
require
that
the
user
to
find
within
the
metadata
for
the
hardware
that
they're
creating
what
actual
disk
device
that
they
want
to
use
for
the
purposes
of
cluster
api.
C
That
way
we
can,
when
we
generate
the
templates
and
the
workflows
later
and
the
cloud
init
data,
it's
using
the
right
device
later
on.
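For reference, one of those hardware definitions looks roughly like this; the values are illustrative and the schema is the sandbox-era hardware JSON, so check the docs for your Tinkerbell version:

    cat > hardware/machine-0.json <<'EOF'
    {
      "id": "d2c26e54-9b7a-4c8a-9f06-3b3e2f9e1a10",
      "metadata": {
        "facility": { "facility_code": "onprem" },
        "instance": { "storage": { "disks": [ { "device": "/dev/sda" } ] } },
        "state": "provisioning"
      },
      "network": {
        "interfaces": [
          {
            "dhcp": {
              "arch": "x86_64",
              "uefi": true,
              "mac": "00:50:56:aa:bb:01",
              "ip": {
                "address": "192.168.103.10",
                "netmask": "255.255.255.0",
                "gateway": "192.168.103.1"
              }
            },
            "netboot": { "allow_pxe": true, "allow_workflow": true }
          }
        ]
      }
    }
    EOF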
A: Great, yeah, I can fully see that being a problem. So we're going to load that in. I've been kind of following the instructions without really paying attention to what's going on underneath: we're going to exec into this CLI container, which I guess is a container that has the tink CLI in it, and we're going to push in these JSON objects which represent each of these machines.
C: Yeah, the docker-compose file does stand up a service running the container that contains the CLI, with the environment variables pre-configured for how to talk to Tinkerbell. You could download the binary and run it directly on your host if you pointed it in the right direction.
C: This was just a way for folks to avoid having to define those environment variables or command-line arguments correctly every time they want to run a command.
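Pushing the definitions looks roughly like this; the compose container name varies by sandbox version, so treat the names as assumptions:

    # Run the tink CLI inside the pre-configured container,
    # feeding each hardware JSON in on stdin.
    for f in hardware/machine-*.json; do
      docker exec -i deploy_tink-cli_1 tink hardware push < "$f"
    done
    docker exec -i deploy_tink-cli_1 tink hardware list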
A
All
right,
I
might
just
launch
that
see
with
this,
which
one's
the
container
there.
A: Okay, cool! So I've got this hardware. Oh, there we go: our five machines are there.
C: At this point you'll see that it starts the PXE boot process. It should get the Hook OS, start executing that, and stand by waiting to run workflows.
A
All
right,
so,
let's
do
that
turn
the
central
delete
there
we
go
so
we've
got
this
efi,
so
there
is
already
a
pixie
boot
in
vm
in
this
vm.
So
you
might
think
why?
Why
do
we
load
an
another
so
we're
doing
pixie
and
then
we're
doing
pixie
again?
C: Yeah. The network boot environments on various types of hardware vary widely in what they support. Generally, at the very bottom, you can rely on it at least being able to do BOOTP-based bootstrapping: talk to a TFTP server, retrieve files from it, and execute them.
C: That's about all you can rely on being consistent between PXE implementations across stock hardware. While you could take something like iPXE and re-flash it onto your hardware in some cases and get consistency that way, it's just not realistic to expect everybody to do that just to run something like Tinkerbell. And there are more things we may want to support within Tinkerbell: you can actually define, in the hardware, an iPXE script that you want to run directly on the host, and do things like that.
C
You
can
specify
a
url-based
booting
so
that
you
can
boot
from
an
http
server,
and
things
like
that.
So
that's!
The
purpose
of
bootstrapping,
from
whatever
exists
on
the
hardware
into
the
ipixi
environment,
is,
is
that
we
know
that
we'll
have
a
consistent
environment
with
support
for
all
of
the
network.
Boot
methods
that
we
want
to
support
within
tinker
bell
from
that
ipixi
environment.
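Conceptually, the iPXE stage that Boots hands out ends up running a script shaped something like this; a sketch, not the actual script Boots generates, and the URLs are illustrative:

    # Sketch of an iPXE script chainloading Hook over HTTP:
    cat > auto.ipxe <<'EOF'
    #!ipxe
    dhcp
    kernel http://192.168.103.2/misc/osie/current/vmlinuz-x86_64
    initrd http://192.168.103.2/misc/osie/current/initramfs-x86_64
    boot
    EOF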
A
Yeah
so
and
one
thing,
no
like
tftp,
so
many
many
years
ago
I
used
to
look
after
the
desktop
boot
and
install
process
for
a
university
and
tftp.
Boot
is
extremely
slow,
like
if
you
just
launching
a
200,
meg
windows,
so
windows
has
a
pre-installation
environment,
not
too
dissimilar
to
what
we're
looking
at
here.
It's
a
couple
of
a
couple
of
hundred
meg,
but
it
would
easily
take
10
minutes
to
download
off
a
pretty
decent
gig
network,
because
tftp
is
not
really
designed
around
performance
anyway.
A
So
one
of
the
benefits
of
using
ipixi
is
that
we
can
switch
over
immediately
to
http
protocol
and
get
that
performance
benefit,
and
I
think
you
can
do
strange
stuff
like
iscsi
boot
or
things
like
that
exactly
yeah.
So
I
just
got
some
comments
from
mali,
using
a
ftp
for
hpc
classes,
which
is
I've
not
heard
of
aftp.
That's
right.
Let's
have
a
quick
look.
What
that
is.
A
We
might
look
at
that
later
fight
and
I
notice
I've
got.
I
can
I
get
a
shell
in
this
and
it's
got.
I
don't
see
much
going
on.
Is
there
so
we
now
move
on
to-
I
guess,
maybe
a
workflow.
So
I
what
I
haven't
done
here
before
so
this
is
new
proper,
normal
tdik
style
with
doing
things
for
the
first
time
is,
how
do
I
run
their
workflow
on
a
machine
without
cluster
api
being
involved?
How
do
I
do
anything
here?
Yeah.
C
So
I
think
the
easiest
way
is:
if
we
flip
over
and
go
back
to
the
tinkerbell
docs,
we
can
go
through.
There's
an
example
there,
where
you
basically
just
run
the
hello
world
docker
container.
C
However,
if
we
do
run
this,
if
you
haven't
preloaded
that
hello
world
docker
container
into
the
registry,
it
will
actually
fail
to
run
on
us
and
that's
because
right
now,
tinkerbell
defaults
to
when
it's
running
the
workflows
defaults
to
pre-pending
the
registry
url
that
you
configure
for
it.
So
it
assumes
that
you're
gonna
have
that
standalone
kind
of
on-premises
registry.
C
You
can
see
it's
just
a
basic
yaml
document
that
you
create
and
you
basically
just
define
the
steps
that
you
want
and
those
steps
are
specific
actions
and
those
actions
link
to
some
type
of
container
image,
and
we
actually
publish
some
pre-configured
actions
to
our
action
hub,
which
actually
runs
on
the
cncf
through
the
cncf
artifact
repository,
so
that
you
can
go
there,
find
these
predetermined
actions
and
you
don't
necessarily
have
to
write
your
own
container
image
to
to
do
some
of
the
more
common
things
that
you'd
want
to
do
in
tinkerbell.
A
Okay,
so
I
might
I'll
just
give
this
guy,
so
I
need
to
pull
down
this
image
and
to
click,
and
then
I
need
to
tag
that.
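Preloading the action image into the sandbox's private registry looks roughly like this; the registry address and credentials come from the sandbox .env, and the values here are illustrative:

    docker pull hello-world
    docker tag hello-world 192.168.103.2/hello-world
    docker login 192.168.103.2 -u admin   # password is in the generated .env
    docker push 192.168.103.2/hello-world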
A
Okay,
so
we
have
it
in
the
repository
and
we'll
just
let's,
let's
get
running
from.
C: Yeah, and basically these containers are being run, in this case with Hook, within the LinuxKit environment. There are a couple of other additional services running in there, and you can mount in host directories; they're running in privileged mode, so you can access anything on the local system that you would otherwise be able to.
A: Okay, cool. So now we're going to try and load this in. Oh yeah, the container's not going to have access to my local disk, is it? How do I do that?
C: What you can do here is run it on the local host: use docker exec and redirect the contents of the file into standard input for the docker exec when you run the template create command with tink.
C: It should be similar to the command that you ran for the hardware, just with template create.
C: Okay, I think you're going to have an issue trying to use the pipe there; I think you need to redirect standard input after the docker command.
A: Oh, how do I do that?
C: It should just be similar to the command you ran up there for the hardware create.
A
One
yeah
is
that
people
people
watching
you
know
my.
I
thought
I
did
that.
Oh
without
the
t-
okay,
fine,
I'm
not
there.
I
I
I'm
not
a
batch
expert.
C
Well,
it
got
tricky
too,
because
I
I
believe,
if
you
use
the
depending
on
whether
using
the
pipe
or
redirecting
the
standard
in
it
could
it
operates
slightly
differently
as
well.
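The shape of the command that eventually worked is roughly this; the container name is from the compose file and is an assumption:

    # -i keeps stdin open; no -t, since there's no TTY when redirecting a file.
    docker exec -i deploy_tink-cli_1 tink template create < hello-world.yml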
C: That's something the system did; it'll automatically create that. Now that we have a template and we have hardware, at this point we can create the workflow, and there's a kind of template syntax that you have to use. If you take a look at the example there: inside of the template we defined a placeholder for the hardware itself, so with the workflow command...
C
What
we're
doing
is
we're
telling
it
to
take
the
template
that
we
created
and
apply
it
to
the
hardware
that
we're
specifying
via
this
other
argument.
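That step looks roughly like this; the IDs and MAC are illustrative, and the device_1 key has to match the placeholder name used in the template:

    # -t is the template ID returned by 'tink template create';
    # -r binds the template's hardware placeholder to a real machine's MAC.
    docker exec -i deploy_tink-cli_1 tink workflow create \
      -t 75ab8483-6f42-42a9-a80d-a9f6196130df \
      -r '{"device_1": "00:50:56:aa:bb:01"}'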
C: So, yes. However, you're not really going to be able to see much, because the hello-world example here runs in a container and basically just echoes "hello world" to standard out. We could go into this instance, go into the container, introspect the output, and do all of that, but that's kind of painful.
C
So
in
this
case,
what
what
is
probably
easiest
to
do
is
just
look
at
the.
What
is
the
state
of
that
workflow?
So
if
we
go
back
to
the
tank
command,
if
we
just
hit
enter
there,
you'll
see
that
there's
other
sub
commands
for
workflow
and
well.
If
you
do
a
get,
it's
just
gonna
show
the
workflow
itself.
There
are
other
sub
commands
that
let
you
see
things
like
the
the
events
associated
with
that
workflow
or
the
state
of
it.
C
So
if
we
look
at
the
state,
we
should
see
that
the
state
is
completed
for
it.
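For example (the workflow ID is illustrative):

    docker exec -i deploy_tink-cli_1 tink workflow state  a8984b09-566d-47ba-b6c5-fbe482d8ad7f
    docker exec -i deploy_tink-cli_1 tink workflow events a8984b09-566d-47ba-b6c5-fbe482d8ad7f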
C: This is the overall state of the workflow. If something went wrong, you would see that there was a failure, and then you can look at the individual events and see which events for that workflow actually completed and what happened.
C: It depends on how the failure occurs. One of the things we'll see when we investigate a little further is that, at this point, some of the actions like kexec and reboot don't actually finish, so the workflow stays in a partially completed state, and during a reboot it goes through and reruns that last action.
C: So for any hardware that's configured to run the workflow for Cluster API: if that hardware is rebooted, it'll start running the workflow again and kexec into the kernel that was installed on the machine, and it'll continually do that every time it's rebooted. But if it had a hard failure, it won't necessarily retry; you'd be expected to create a new template and/or workflow to re-execute it, just because we can't be sure whether it would be safe to retry or not.
C: Basically, what's happening is that when the machine is given that initial Hook or OSIE image and boots into it, there's a service running on there that does nothing but poll the Tinkerbell service, asking: is there a job for me to run? It's running this tink-worker process, just sitting there waiting to actually do something.
C
So
that's
the
only
purpose
really
of
either
hook
or
osi
in
this
case
is
just
you
know.
Is
there
any
work
for
me
to
do
or
not?
Do
I
continue
sitting
here
or
if
things
are
complete,
you
know
or
or
do
I
need
to
reboot
things
like
that?
You
know
all
of
that's
happening
through
the
definitions
of
the
workflow,
rather
than
the
hook
or
oc
image.
A
All
right,
so
my
machine
is
connecting
to
this
port,
I
suppose
and
pulling
it
figuring
stuff
out
what
it
needs
to
do
and
cool,
and
how
are
things
like
secured
between
the
different
components.
C
So
that
depends
right
now.
We've
been
mostly
focused
on
feature
completeness.
Most
people
are
running
this
in
a
data
center,
not
needing
to
worry
necessarily
about
traffic
being
sniffed
on
the
local
network.
That
sort
of
thing,
but
we
do
plan
on
doing
further
work
on
security
and
and
we'll
probably
try
to
push
it
out
more
towards
the
edge
with
the
kubernetes
based
deployment,
so
tls
would
be
terminated
at
you
know.
C
The
ingress
point
and
you'd
have
more
clear
traffic
going
on
beyond
there,
but
we
do
want
to
make
sure
that
we
enable
that
security
over
time.
Just
it
hasn't
been
a
priority
as
of
yet.
A
Yeah,
that's
that's
fair
enough!
So
and
at
any
moment
has
this
hegel
been
involved
this
one,
the
metadata
endpoint
or
is
that
going
to
be
used
when
we
start
bootstrapping
with
cluster
api.
C
So
right
now
it
really
hasn't
been
involved.
A: And I guess that's what we might give a go right now. So what I'm going to do is use kind. If you're not familiar with kind: it's a way to run Kubernetes clusters in your local environment using Docker (it's Kubernetes in Docker), and you can get it with brew; brew install kind will get you that, and you can go to the website.
A: It's going to live alongside all the other Docker containers. This node image is, I think, something a bit like a Debian or Ubuntu image; it looks a bit like a host image. It's got systemd in it, boots kubelet, runs containerd, and is going to run kubeadm, which is a Kubernetes bootstrapping tool. These are all components that we're using in Cluster API as well. It shouldn't take long; it's a bit slower than my desktop, but it gets there.
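The kind step is just this (the cluster name is illustrative):

    brew install kind                       # or grab a release binary
    kind create cluster --name tink-capi    # single-node Kubernetes cluster in Docker
    kubectl cluster-info --context kind-tink-capi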
A
And
what
we're
also
going
to
use
is
we're
going
to
use
something
called
tilt
which,
if
you
haven't
seen
the
video
from
ellen
korvs,
did
released
a
quite
amusing
video
on
what
tilt
is
all
about.
It's
in
the
show
notes,
so
go
take
a
look
at
that.
If
you
want
not
now
do
not
go
to
the
link
now
go
later
after
this
live
stream.
So
tilt
is
away.
It's
like
a
local
development
environment.
It
operates
against
the
kubernetes
cluster
that
you've
got
locally.
A
In
this
case
the
kind
cluster
it
allows
you
to
do
hot,
reliable,
hot,
hot,
real
life,
hot
reload
of
code.
So
you
can
make
the
changes
to
your
code
and
have
those
containers
rebuilt
within
seconds
and
running.
So
we
have
we've
supported
tilt
in
cluster
api
as
a
development
tool
for
a
long
time,
we'll
not
go
into
details
of
that
tilt
setup.
So
there's
a,
but
there
is
a
tilt
file
with
that
ships
with
cluster
api.
A
It's
written
in
starlock
language,
skylar,
starla,
the
one
that
bazel
uses,
and
we
have
some
settings
that
you
need
to
put
into
cluster
api
if
you're
using
it
with
tilt.
So
I've
you
have
to
configure
this.
I
don't
think
you
really
have
to
configure
this
default
registry,
but
that
just
happens
to
be
a
google
container
registry.
A
We're
going
to
load
in
the
tinkerbell
provider
goes
go
which
is
in
an
adjacent
directory
and
finally,
we're
going
to
enable
tinkerbell
and
the
different
components
of
cluster
api.
So
if
you
want
to
not
like,
I
said
we're
not
going
to
go
into
massive
details
around
cluster
api
architecture
and
stuff,
oh
that's
the
wrong
book,
but.
A
You
can
go
to
the
cluster
api
dot,
sig
stop
kate's,
I
o
and
we've
got
all
the
documentation
there,
and
so
it's
bunch
of
components
and
nevertheless,
there's
a
bunch
of
components.
There's
a
core
cluster
api
component,
there's
something
called
kubaydm
bootstrap
and
that
ties
in
with
the
hegel
stuff
with
just
discussing
a
moment
ago.
So
it
generates
cloud
init
boot
data
and
cube
adm
control
plane,
which
is
a
stateful
control,
plane
management
controller.
A
We're
gonna,
give
it
some
stuff.
We
can
ignore
this
x,
plus
the
resource
set,
we're
not
using
it,
really
we're
giving
it
the
we're,
basically
pointing
it
at
the
tinkerbell
server
and
then
finally
got
the
tinkerbell
ip
and
I,
given
that
I've
got
that
kill
not
that
kill.
That
kind
plus
has
come
up
successfully.
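My tilt-settings.json ends up looking roughly like this. This is a sketch: the key names follow the Cluster API Tiltfile conventions of the time, the substitution variable names are assumptions from memory of the Tinkerbell provider, and the paths and IPs are specific to my lab:

    cat > tilt-settings.json <<'EOF'
    {
      "default_registry": "gcr.io/my-project",
      "provider_repos": ["../cluster-api-provider-tinkerbell"],
      "enable_providers": ["tinkerbell", "kubeadm-bootstrap", "kubeadm-control-plane"],
      "kustomize_substitutions": {
        "TINKERBELL_GRPC_AUTHORITY": "192.168.103.2:42113",
        "TINKERBELL_CERT_URL": "http://192.168.103.2:42114/cert",
        "TINKERBELL_IP": "192.168.103.2"
      }
    }
    EOF
    tilt up   # run from the cluster-api checkout; builds and deploys the controllers into kind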
A
Yeah,
so
it's
now
doing
some
stuff,
and
hopefully
I
have
pipe.
I
have
tunneled
this
correctly,
so
this
is
running
on
that
sandbox
vm
in
this
other
ssh
I
import
forwarding
yeah.
So
I
should
be
able
to.
A
All
right,
so,
let's
not
do
it:
okay,
yeah!
So
it's
loaded
in
a
bunch
of
custom
resource
definitions
that
cluster
api
uses
we'll
go
through
some
of
those
what
those
are
in
a
minute
and
I
believe
it's
gonna-
stop
building
a
bunch
of
stuff.
I
have
previously
ran
this
before
so
it
does
take
advantage
of
cache
builds
or
at
least
the
way
that
we
set
up
in
cluster
api.
So
a
lot
of
this,
I
haven't
really
changed
the
code.
A
So
it's
going
to
take
use
of
the
it's
going
to
make
use
of
the
go,
build
cache.
The
docker
layers
won't
have
been
changed,
so
it
should
be
mostly
a
matter
of
loading
in
the
cluster
api
components
and
they
should
be
up
in
a
minute
or
two.
So
whilst
that's
happening
we'll
go
through
the
cluster
that
we
are
going
to
deploy
in
deploy
into
this
kubernetes
cluster,
so
let's
shrink
that
there
you
go
there.
You
go
right,
so
first
we're
going
to
give
it
a
pool
of
resources.
A
So
we
need
to
tell
the
cluster
api
provider
for
tinkerbell
and
what
hardware
we're
going
to
use.
It's
just
gonna
act
as
a
pool
of
hardware.
Is
that
right,
jason.
C: Yeah. We could have just created a provider implementation that talks directly to Tinkerbell itself, but one of the things we wanted to be cautious of is not just assuming that all hardware available in Tinkerbell is available for us to use within Cluster API. So we have what we've called a shim layer as well, where we define CRDs and have controllers that match up against the actual Tinkerbell resources themselves.
C
In
this
case,
the
hardware
that
we're
defining
matches,
hardware,
resources
and
it
and
it'll-
expect
that
that
hardware
already
exists
in
tinkerbell
and
once
you
create
that
you'll
actually
be
able
to
see
the
status
and
all
the
things
that
exist
and
are
defined
in
tinkerbell
directly
here,
but
that
shim
layer
also
allows
you
well
it
doesn't
allow
you
to
actually
create
hardware
in
tinkerbell
right
now.
It'll
actually
allow
you
to
create
those
templates
in
the
workflows
that
we
were
looking
at
doing
against
tinkerbell
itself.
C
You
can
create
those
as
kubernetes
resources,
rather
than
having
to
talk
directly
to
tinkerbell.
A: Okay, great. Next up we've got a cluster definition, which is in a weird order, but never mind. We have a Kubernetes cluster with a Cluster API Cluster resource, which is called demo, and it's going to be in its own namespace (I didn't need to do that, but anyway), and a pod CIDR.
A
So
that's
going
to
be.
You
know
just
what
what
pod
network
is
going
to
use?
What
services
and
services
idr
pog
standard
stuff
there
we've
got
a
definition
for
the
control
plane,
so
this
uses
cube
admin
underneath
so
what
way
class
api
is
going
to
work?
It's
going
to
create
some
machines.
A
A: ...and it's going to run kubeadm, which is a CLI that can instantiate a Kubernetes cluster. If the machine has kubelet and containerd installed, it will download the images from the GCR registry where all the Kubernetes artifacts live, and it will configure static pods: you can put pod definitions in /etc/kubernetes/manifests and kubelet will just run them. So it sets up etcd and the API server and just runs them.
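A trimmed sketch of the manifests being described; the API versions match the v1alpha4 era, and the names and CIDRs are from this demo, so treat the exact fields as illustrative:

    kubectl apply -f - <<'EOF'
    apiVersion: cluster.x-k8s.io/v1alpha4
    kind: Cluster
    metadata:
      name: demo
      namespace: demo
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["172.25.0.0/16"]
      controlPlaneRef:
        apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
        kind: KubeadmControlPlane
        name: demo-control-plane
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: TinkerbellCluster
        name: demo
    ---
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
    kind: KubeadmControlPlane
    metadata:
      name: demo-control-plane
      namespace: demo
    spec:
      replicas: 1
      version: v1.21.2
      machineTemplate:
        infrastructureRef:
          apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
          kind: TinkerbellMachineTemplate
          name: demo-control-plane
      kubeadmConfigSpec: {}
    EOF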
A: One thing that I did see that was interesting: if you're working with a public cloud, a provider ID is going to be something like your instance ID, and what you have in the Kubernetes cluster is a cloud provider integration, a mechanism that can connect to that API, whether that's AWS, GCP, or vSphere, get some information about that host, and find out what its unique identifier is.
C: Yeah, you mentioned the cloud provider interface. Eventually we may create a cloud provider for Tinkerbell, but we didn't want to make that a prerequisite for getting something going. So all we're doing today is passing it in as arguments within the template, and the controller itself will get the hardware ID and generate that provider ID itself.
A: Great, thanks. I'm not sure if I needed to set this (I think I kind of screwed up my deployment initially), but I've got a machine template for Tinkerbell, and I've set the image lookup base URL to the local repository. I originally misconfigured it as 192.168.1.1 when it wasn't 192.168.1.1 at all, and the whole thing exploded. So yeah, I've got that.
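The machine template portion looks roughly like this; the exact field names vary by provider version, so this is a sketch rather than a copy of the provider's API:

    kubectl apply -f - <<'EOF'
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
    kind: TinkerbellMachineTemplate
    metadata:
      name: demo-control-plane
      namespace: demo
    spec:
      template:
        spec:
          # Where the provider looks for the pre-built raw OS images served by nginx.
          imageLookupBaseRegistry: "192.168.103.2"
    EOF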
A: Yeah, that's what I noticed later. So, if you were looking at this for one of the other providers, for AWS you would declare what instance type you're using, what disks you're going to attach, those kinds of things. Given that these are bare metal, we don't really need a lot of information here.
C
So
we
figured
start
with
just
using
the
hardware
that
you
pre-define
through
the
shim
interface,
and
then
you
know
we
can
add
figure
out
how
we
want
to
add
selection
capabilities.
Later
I
did
want
to
skip
back
to
something
that
you
did
kind
of
fly
over
there,
but
in
general,
if
you're,
using
the
sandbox
deployment-
or
I
think,
even
with
the
network
configuration
that
you
have
on
your
local
machine.
A: Yeah, that's right, because I think it uses 192.168.1.0/-something. So yes, if you've ever had the misfortune of deploying a Kubernetes cluster where the pod IPs clash with the node IPs, very strange things start to happen with your network; always be aware of that! We've been thinking for a long time about putting some logic into Cluster API to detect when you might be about to clash; I think there is an open issue around that.
C
Welcome
well-
and
that's
always
challenging
too,
because
right
now
we
don't
assume
that
the
core
cluster
api
components
really
know
anything
about
the
infrastructure
providers
and
when
we're
talking
about
things
like
this,
you
actually
need
to
get
that
information
from
the
infrastructure
provider
generally.
What
we
would
call
the
cluster
infrastructure
provider
to
say
what
is
the
network
that
we
would
expect
these
hosts
to
be
deployed
on?
A: Yeah, that's right. Then we've got a MachineDeployment. If you're not familiar with Cluster API: a MachineDeployment is very much like a Deployment in Kubernetes, except we're using Machines. We use a similar abstraction and behavior in Cluster API: we have Machines, groups of machines in MachineSets, and then we're able to do upgrades or rollouts between machines using MachineDeployments, which will scale down one MachineSet, create a new one, scale it up, and so on. Very similar.
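Scaling workers later is the familiar kubectl verb; the MachineDeployment name here is hypothetical:

    kubectl -n demo scale machinedeployment demo-workers --replicas=2
    kubectl -n demo get machines   # new Machines get matched against spare Hardware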
A
If
you're
familiar
with
kubernetes
resources,
this
shouldn't
be,
the
idea
is
it's:
it
shouldn't
be
completely
abstract
for
you
and
then
we've
finally
got
a
cube.
Adm
config
template,
so
these
are
going
to
be
used
by
the
worker
nodes,
so
sep.
So
it
looks
a
bit
similar
to
the
control
plane
because
we're
using
cube
adm
underneath
so
it's
pretty
much
very
similar
to
that,
except
it
doesn't
need
the
information
about
the
stuff.
That's
only
going
to
be
on
the
api
server,
so
we
are
going
to
load
this
in.
C: Sorry, yeah, I was going to say: that actually didn't have an effect, because the Hardware, Template, and Workflow resources right now are cluster scoped, not namespace scoped. That's just because we haven't yet defined a multi-tenant story around how we want to talk to multiple Tinkerbell services. It's something we're looking to solve in the future, just not now.
A
Oh
okay,
I
wondered
why
it
was
working
that
explains
things
all
right,
so
you
can
see
I've
like
it's
pulled
information
from
the
tank
server,
so
in
the
status
and
one
of
the
cool
things,
one
of
the
handy
things
about
using
clip
for
this
is
we
can
browse
the
logs
as
well
for
for
these
controllers.
So
we
can
see
we
can
see
some
information
coming
in
here.
It's
all
nicely
json
formatted!
If
you
want
to
put
that
in
your
login
just
server.
A: See, we've got one control plane machine. What's happening is that the kubeadm control plane controller has templated out a Machine that's going to act as the control plane, and Tinkerbell has then matched this with one of the hardware pool. One thing: is there an easy way right now to find out which machine that is?
C
So
not
right
now
we
could
have
chosen
to
do
something
other
than
the
uuid
for
the
provider
id.
However,
I
wanted
to
avoid
any
potential
issues
that,
if
machines
are
mutated
in
some
way
and
ips
are
changed,
and
things
like
that
that
you
don't
generally
match
up
with
something,
and
since
that
uuid
is
the
distinct
identifier
for
tinkerbell.
A: Noted, yeah. So I guess what we've seen so far is that we have some stuff loaded and something's happened on this machine, and one of the things that was interesting... so, what I can do here...
C: Right now, because we are using Hegel for the metadata service and it is somewhat EC2 compatible, what this is doing is basically writing the configuration to tell cloud-init to use the EC2 metadata source, making sure not to treat it as a strict EC2 metadata instance (otherwise it'll fail), and it defines a default system user that you can then assign an SSH key or a password to through the cluster template...
C
If
you
wanted
to
be
able
to
access
it
and
because
cloud
init
is
a
bit
wonky
at
times,
you'll
see
that
right
below
writing
that
file.
We
write
another
file
which
forces
the
metadata
configuration
to
work
too,
because
when
it's
bootstrapping,
you
have
fun
things
like
determining
determining
which
metadata
service
to
use
is
done
multiple
times,
so
we
have
to
define
it
in
two
places
to
force
the
ec2
metadata
service
to
be
used.
C
The
other
thing
is,
is
all
of
the
things
that
are
happening
in
cloud.
Init
are
happening
before
the
network
service
comes
online
because
we're
not
fully
compliant
with
an
ec2
metadata
service,
and
if
you
do
a
detection
from
the
local
host
and
you
do
a
dmi
to
code,
it
doesn't
show
as
amazon
hardware
to
trigger
the
ec2
metadata
service.
A: Interesting, I didn't realize that. That's cool. I don't know how that's working at the switch level; I guess it works. I did notice this warning; I guess it's just an extraneous event from the controller.
C: Actually, another thing too: that last little bit there, where it's saying where to kexec into, was fun as well, because determining what the first partition on a device should be differs based on the type of block device. In the case of things like sda, hda, and vda, you just append a numeric digit to the end of it.
A
Right
yeah,
that
makes
sense,
so
I
believe
this
machine
yeah
it's
already
rebooted
into
so.
C
What I was going to say is, it's not actually rebooting: after it writes it, once it gets to that kexec action, it's going to load that kernel into memory and replace itself with the kernel that we just wrote to disk.
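For reference, that kexec step amounts to something like the sketch below, with the kernel path, initrd and command line as assumed placeholders:

```python
# Sketch: load the freshly written kernel with kexec and jump into it
# without a firmware reboot. Paths and the command line are
# placeholders for illustration.
import subprocess

subprocess.run(
    ["kexec", "-l", "/mnt/boot/vmlinuz",
     "--initrd=/mnt/boot/initrd.img",
     "--command-line=root=/dev/sda1 ro"],
    check=True,
)
# Replace the running kernel (this never returns on success).
subprocess.run(["kexec", "-e"], check=True)
```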
A
Right, so you can see here we've got the output. We probably need to check with the other Cluster API providers, actually: we probably should not be displaying the join token on the console, but I think that's an issue with all of our providers, so we need to check that. But you can see kubeadm has finished, and if we go here and do kg kcp, we can see we have an initialized control plane that's not ready, and there is a reason for that, which I can show.
A
All right, so we have a NotReady node. Cluster API is more or less happy to continue with that, and the reason it's not ready is because we haven't got a CNI installed, so I'm just going to quickly install a CNI.
C
And I think the biggest thing to keep in mind here when you're installing the CNI is that some of the CNI providers require you to pass in a configuration if the pod CIDR doesn't match the default one used by kubeadm and Kubernetes. In this case Antrea doesn't require that, so we can just apply it without worrying about it. If we used, I think, older versions of Calico, you would actually have to modify the CNI deployment before you deployed it, because we are using that non-standard pod CIDR.
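For those older Calico manifests, the modification is roughly a one-variable patch; a sketch assuming the stock calico.yaml layout and its CALICO_IPV4POOL_CIDR environment variable:

```python
# Sketch: point an older calico.yaml at a non-default pod CIDR by
# setting CALICO_IPV4POOL_CIDR on the calico-node DaemonSet. Assumes
# the stock manifest layout; PyYAML must be installed.
import yaml

POD_CIDR = "172.25.0.0/16"  # must match the cluster's pod CIDR

with open("calico.yaml") as f:
    docs = [d for d in yaml.safe_load_all(f) if d]

for doc in docs:
    if doc.get("kind") != "DaemonSet":
        continue
    for container in doc["spec"]["template"]["spec"]["containers"]:
        for env in container.get("env", []):
            if env.get("name") == "CALICO_IPV4POOL_CIDR":
                env["value"] = POD_CIDR

with open("calico-patched.yaml", "w") as f:
    yaml.safe_dump_all(docs, f)
```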
A
Yeah, I think they fixed that in the Calico operator, but I'm not sure. So yeah, we know this one works; well, at least it got me to a ready state eventually. It does take a little while to download, though. I only have ADSL here, so it's not quick.
A
The internet, yeah, so that's what that is. Back to the kubeconfig: oh, we haven't actually defined any worker machines; I forgot about that. I was sitting there watching it, wondering why.
C
Yeah. Anyway, once you scale up the machine deployment here, you'll be able to bring up worker machines. We haven't yet added support for HA control planes; that's one of the things we're going to be looking to do in the upcoming months. We're going to leverage the new CNCF sandbox project kube-vip to be able to deploy an HA control plane. However, there are some challenges there, because what IP do we use for that? We need to solve how we do
C
IP address management, because we've deferred it for the actual hardware instances by requiring that it be predefined in Tinkerbell; but for the purposes of assigning an IP to kube-vip, we'll need an actual usable IP address that's not going to conflict with anything else on the network.
A
Yeah, yeah, that makes sense.
A
Yeah. So those requirements of bare metal are not dissimilar to any sort of on-premises environment, if you're using vSphere as well. You might be thinking: what's VMware doing talking about bare metal? The way we see it is, if you're going to deploy a Kubernetes cluster, you want to have it on some form of programmable infrastructure.
A
All right, and vSphere provides that for you; AWS provides that for you. They have well-defined APIs that have spent many years in the making, and that makes it super easy to get a fully featured environment. You can deploy a Kubernetes cluster on vSphere and you're going to get things like vSAN, you're going to get provisioned volumes, and all those sorts of cloud-type benefits.
A
So if we're going to do that on bare metal, we're going to have to make that bare metal programmable in some way, and this is where projects like Tinkerbell are showing promise in terms of providing a programmable infrastructure for bare metal. It's a really interesting space for us, with a lot of similar concerns, so this is an exciting area.
C
And I think one of the most exciting things for me is this: you see a lot of the existing work that's been done on running Kubernetes within the data center, and a lot of that work tends to adopt the practices of how you managed hardware prior to cloud-native methodologies.
C
So you tend to want to support things like doing changes in place, because something like rebooting a server in a data center can take anywhere from 10 to 20 minutes, depending on what needs to happen to initialize the hardware at boot time. So what we really wanted to try to do when we developed the Cluster API provider for Tinkerbell is: could we take a real, modern, cloud-native approach and do it with hardware? And so far it looks really promising, with things like being able to do
C
OS images, like golden OS images, that are streamed on demand and treated like ephemeral instances. Sure, we do have some problems to solve, like how you run stateful workloads in this environment, but I think we're starting to show the promise of what it really means to do cloud-native management of bare metal infrastructure.
A
Oh, thanks. So I actually just got the kubeconfig back and it's now... not working.
A
Oh yeah, that's right. One of my hosts has got a big red alarm next to it, complaining about memory usage and CPU, so that's fun. That's all very normal; there's not a lot of memory in this consumer-grade hardware. It's just been slow; it's fine. I've probably saturated the network and overloaded the CPU, so it's fine. I've got some of the other machines coming up.
A
They will go to ready as they download Antrea and get it running, but we were just talking about actually deploying on hardware, so maybe I'll hand it over to you, Jason: you've actually got real bare metal, rather than my fake bare metal.
C
Yeah, I mean, my bare metal is still kind of fake bare metal, in that it doesn't have server-grade baseboard management, but I think that makes it kind of fun too. Let me switch my camera over here: there in the corner I have my actual infrastructure. I'm going to speed through this a little bit, because it's going to be a lot of what Naadir just showed, but let me go ahead and create the actual hardware resources.
C
So in my case I'm just going to use the clusterctl command, and because I built the image a little while back, I'll actually be deploying Kubernetes 1.18.15, to match the image that I pre-built. I'm going to override the pod CIDR to 172.25.0.0/16, because in my case the network that I'm deploying on is 192.168.1.0/24, so that would conflict with the default pod CIDR.
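The overlap check behind that choice is easy to reproduce with Python's ipaddress module; a small sketch, where 192.168.0.0/16 is assumed as the template's default pod CIDR:

```python
# Sketch: verify a pod CIDR doesn't overlap the physical network.
from ipaddress import ip_network

node_net = ip_network("192.168.1.0/24")      # the LAN being deployed on
default_pods = ip_network("192.168.0.0/16")  # assumed template default
override_pods = ip_network("172.25.0.0/16")  # the override used here

print(default_pods.overlaps(node_net))   # True: would conflict
print(override_pods.overlaps(node_net))  # False: safe to use
```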
C
So now, if I look at the Cluster API objects here, I'll be able to see which machine was assigned to it. In this case, I did say that I cheated with my UUIDs a little bit, so I know that this is the hardware that I've marked here as A. Let me go ahead and power that on. I mentioned that this hardware is a little bit older: it's not going to boot up as fast as Naadir's VMs did, so where his probably booted up in roughly a minute or two,
C
this machine will actually take about five minutes to fully bootstrap. And again, because I don't have remote management, I don't have a way to show the console on that machine nicely, but similar to everything else, it'll be going on in the background. I can go ahead and get the kubeconfig for this cluster, and it'll mostly just sit here if I try to do get nodes on it until that finishes bootstrapping, and I don't necessarily want to keep everybody waiting on that.
A
Yeah, let's go through some of the comments, because I have not been paying attention; this is not as polished as when some of the people like Joe do it. We've got one question: is the new bootstrap OS also written in C, or Go? I guess that's referring to Hook and OSIE.
C
Yeah, so OSIE is very much a more traditional type of OS image. I believe it's based off of Alpine, so it basically looks and feels a lot like an Alpine OS. There are some shell scripts involved with OSIE to bootstrap the tink-worker, and the tink-worker itself is written in Go. But when we look at what's going on with Hook instead, like I said, it's a much more streamlined OS.
C
So it's also sort of Alpine-based, in that it's based off of LinuxKit; but LinuxKit is actually a way to declaratively build, more or less, a Linux distribution.
C
So it has a look and feel similar to Alpine, but when you get into it, things like troubleshooting are slightly different. It runs containerd by default, and all of the services that run within the OS run within containerd, that sort of thing. So it's not as easy as saying whether it's written in C or Go, because, just like the normal Linux kernel, parts of it are written in C.
C
But the interesting thing for Hook is the services that we're running: we run a Docker service in there, so that you have the normal Docker interface instead of having to rewrite the actions and tink-worker to interact with containerd directly.
C
So that's just a standard Docker service running in there, and Docker is generally written in Go. Then there are some helper things, like the tool to run tink-worker; there's a restart helper to be able to help issue restarts and things like that, because you can't do that directly from a container without some additional help on the host side.
A
Yeah, that makes sense. So I guess LinuxKit's userland, where it differs from a standard Linux distribution, is Go-based, but it's going to be a long time before we move away from C; I mean, we are talking about Rust appearing in the Linux kernel, so we'll see where that goes. We had some questions about home labs. I am probably not the best person to talk about home labs; if you want demos, William Lam is, and I will put a link in the show notes.
A
William has a website which has everything you'd ever want to know about vSphere home labs and how to build them: how to build them cost-effectively, how to get discounts, and that kind of stuff. I'm sure there's a lot on YouTube too; I will try and dig out some videos where he's gone through home labs and put those up. We've also got a question from Jay, who's interested in the Vagrant recipe; I guess this is of particular relevance because of a lot of the Kubernetes-on-Windows work.
C
Yeah, so the Vagrant setup is kind of weird. Let me see here; let me open up a new window and I can switch over to the sandbox.
C
Oh no, this is fine; I've got it right here. So in the sandbox we do have the Vagrantfile. A lot of the things that actually configure Tinkerbell aren't happening in the Vagrantfile itself, but we did have to do some funky things in the Vagrantfile to make things work. One of them is making sure that we're not doing parallel instantiation, just because we don't want the workers coming up with the initial provisioner machine.
C
Otherwise you end up with some weirdness there around the type of networks that you're defining, because we define the network for communication between the workers and the provisioner when we define the provisioner, and then we just use it in the worker. So if you try to bring those up in parallel, you get weird failures. Then there's being able to do things like configure the number of workers, and whether you want to enable the GUI if you're using
C
VirtualBox. We do have some of the fun things, like being able to disable configuring the NAT. On my setup I don't configure the NAT; I opt out of that and don't have to worry about things forwarding from the worker machine through the provisioner. If you start looking at the way that VirtualBox does networking, when you stand up a separate network you can't define a gateway on that network to be able to directly NAT out of the bridge device; but you can do that in libvirt.
C
So this combination of the configure-NAT setting and the libvirt forward mode gives you a way to basically opt out of the NAT and do a direct network connection from the individual worker VMs out to the internet. But we can't do that with VirtualBox, and if we supported other providers, how you configure networking across those gets fractal and messy pretty quickly.
C
So I need to see if we can add something upstream to better support that use case, where it tries to detect the right thing. Then here, all we're doing is syncing the folder; this is making sure that the state information that's available on the local machine is available within that Vagrant box. For libvirt, it does that automatically over NFS by default.
C
I don't remember what VirtualBox does, but that's there so that things can bootstrap faster a second time if they have to. Things like the OSIE image were about two gigs, so if you had to bring the environment down and bring it back up, you had to wait for two gigs to download over whatever network connection you had, and that was messy.
C
Here's where we're doing some of the network stuff: we're basically telling it to use a private network for the purposes of being able to talk to the worker machines. There's always a default network adapter that's configured for being able to talk to the host and back, and getting everything to talk right, with the right traffic going over everything, took a lot of playing around and trial and error.
C
Other things we're doing: we're forwarding the ports, so that if you wanted to talk to Tinkerbell directly from the local host, you can do that. But other than that, there's not much to it. We do set a minimum amount of memory that we want to make sure the provisioner has, that sort of thing; all of that's pretty standard.
C
I think, though, the workers get pretty interesting, because we try to have just the one layer-2 network only, and telling Vagrant how to disable the host-only connection gets a little bit weird, so we had to play around there a little bit.
A
Cool, thanks; I hope that answered your questions, Jay. Oh, going back to an earlier comment: I was looking up the wrong thing before; it's a TFTP implementation to speed up TFTP downloads, and from the looks of it, it uses multicast, which is, I think, something the Windows pre-installation environment supported as well, back in the day. There was also a comment about HPC environments using BitTorrent to speed up boots for whole HPC classes, which is fairly interesting. So that's pretty cool. That's the end of the questions; how's your host doing, Jason?
C
Well, we do have the hardware available now, and we do have a NotReady node. In my case I'll be deploying Cilium to there, just because that's another network provider that doesn't require you to configure the pod CIDR by default. Similarly, I don't necessarily see a point in going too far through this, but once that CNI is available, this will show as ready, and I could also scale up those machine deployments in a similar way.
C
And that's running on this little funky hardware setup that doesn't actually support a lot. One of the few things it does support is default network booting, but other than that it's got no baseboard management, and they're really old dual-core Celeron processors.
C
I think there's somewhere between four and eight gigs of memory, and a really, really slow eMMC disk that doesn't perform anywhere near what an SSD or even most SD cards would do for throughput. So it generally takes about five minutes to bootstrap these instances, compared to something faster, which would be closer to a minute.
A
What about ARM hardware: is that supported?
C
So, from the perspective of Tinkerbell itself, yes, it does support being able to bootstrap ARM hardware.
C
I've had some fun, because all of the ARM hardware that I own is really funky around network booting. One of the devices I have is a MACCHIATObin, and depending on how you boot the instance, you can get UEFI firmware for it; they have a build of EDK2 that you can put on there. But what you see with it is, when it gets to Linux,
C
it doesn't get an IP address, because the MAC address doesn't match what you configured originally; it uses a weird default MAC address scheme that you can't even tell the hardware to use, because it's not even a legal MAC address. And I don't have any of the newer Raspberry Pi 4s that support proper network booting, and the Raspberry Pi 3s that I have don't quite have support for it. So, you know, Tinkerbell itself
C
does. For the Cluster API bits, the biggest thing that's holding us up right now is supporting image-builder for ARM-based devices.
C
There's a lot of stuff that's hard-coded to x86-specific packages and binaries right now, but we do plan on helping contribute support for ARM within image-builder, and then probably adding a field to be able to select what type of architecture you want for the OS, so that we can do the right thing: build the right image name to use to stream the image, and also determine what you want by default, which will probably default to x86 for now, because that's still the most common. But ARM is definitely something that we're going to be looking at doing later.
C
But if you look back, like a decade ago, if you didn't have the right network card in your machine and it didn't have a physical boot ROM chip on it, it couldn't actually network boot; I think with UEFI, most hardware is capable of it now.
A
Yeah, I think one interesting thing is that you probably need a physical Ethernet port: with a laptop and a USB Ethernet adapter, that's not going to have a boot ROM. If it's like a Lenovo or Dell with physical Ethernet ports built in, or you're able to plug in a PCI Express card with a boot ROM, then I think you should be good to go.
A
Yep. And to run the TFTP server, does it have to be on the host? You'd need to expose that TFTP server onto the network; I don't know if that's even possible in kind, especially if it's running on macOS on your machine. If you're able to port-forward it somehow, and maybe there are instructions in kind to do that, then maybe it might work.
C
So I went down a big rabbit hole with this, and I spent a couple of months trying to see if I could stand up everything you would need to be able to deploy Cluster API and Tinkerbell into just a kind cluster, and it gets really messy with the way that networking works.
C
Especially if you're looking at non-Linux OSes. On a Linux OS, you can tell kind to use a specific bridge on the Linux host, and then you can direct traffic to the kind instance there. So you could do things like expose Boots on a NodePort on that kind cluster, onto a specific network bridge, and then, as long as you have a VM or whatever on the machine connected to that bridge, you're fine.
C
On a non-Linux host, though, you can't just connect local VMs to that, and that got me down to: what if I just ran the VMs inside of kind as well, using KubeVirt? And that got ugly, because then you start having to deal with things like Multus as a CNI solution, to be able to enable direct L2 networking between the machines. Even when I did that, using things like Multus with the CNI bridge interface, you still get weird things like the IP address getting rerouted. I thought there was going to be a solution with one of the different types of networking settings that would allow you to basically short-circuit having to run through iptables for the bridge network, but that only works if you're using it on a physical machine, because it creates the interface on the actual physical host, even if you create it in a container.
A
Yeah, that makes sense; it will get tricky. I think Jay was asking if you have pointers or links for running it all from kind; I guess the sandbox repo is probably the best place to go.
C
Probably not today, only because that sandbox is also running in Vagrant right now; it's just setting up the Kubernetes deployment on the provisioner machine, and that was basically to avoid the same type of networking mess we're talking about here.
C
I don't know if I have any documentation for getting it all working from a Linux machine directly, but there's a flag that you can pass to kind, or maybe it's not a flag, maybe it's actually part of the config: when defining a cluster, you can tell it specifically which network device or network bridge you want to use instead of the default one. Then, if you just bring up the worker machines using libvirt on that same network bridge, you have direct L2 connectivity between those two things, so it'll work in that case.
A
Yeah. All right, well, I think we'll leave it there: we've gone for two hours, so it's quite a long episode, this one. Thanks, everyone, for sticking around, and join us for the next episode. I don't know when we're doing it, but it'll be on Twitter; you'll definitely see one from Joe's account and from the VMware Cloud Native Apps one. Click on the subscribe button, and click on the bell to get reminders; don't let the YouTube algorithm figure out whether you're interested or not, subscribe to all of the episodes.
A
Just yours! I am not on Patreon, so don't subscribe there. Thanks, everyone; we'll see you soon. Thank you so much, Jason, for spending your time with us today.