From YouTube: Developing with Docker
Description
This is the recording of a brown bag presentation discussing developing with Docker at GitLab.
https://gitlab.com/gitlab-org/secure/brown-bag-sessions/-/issues/25
A: Hello and welcome. My name is Moe. I am a software engineer on the Composition Analysis team at GitLab, and today I just wanted to share some insights from improving the License Compliance license scanning Docker image, and do a bit of a crash course on Docker. I've got 38 slides, and I'm hoping to average about a slide per minute. Some of them are going to be a little bit slower than others, and I just did a quick run last night, so I'm expecting a few things to fail. So bear with me, but we'll work through them.
A: Okay, so before getting deep into Docker and optimizing, I just thought it'd be good to go over a few things and make sure we level set. So I'm going to go over some of the definitions and Docker terms that I think are applicable to the work that we do in our group, and hopefully this translates to terms that are useful to others in development. I want to go over a little bit about the Docker ecosystem, from the client to the host, and what Docker commands actually do when we issue them. Then we're going to go ahead and build a few different Docker images; we'll go over the Dockerfile format, talk about the different ways of actually mounting images and what that means, and then we'll analyze an image, try to optimize it, and I'll show you some tools and ways to optimize a Docker image. So a few definitions that I want to talk about: a Docker image, a Docker container, and a Docker registry.
A
What
do
these
things
mean
and
the
way
that
I
think
it
makes
sense
for
me
is
when
we
talk
about
code
such
as
Ruby
when
we
define
a
class
a
class
is,
is
something
that
describes
the
behavior
and
the
data
associated
with
something
that
you
can
instantiate
as
an
object.
So
in
docker
terms,
the
docker
image
is
very
much
like
ade,
a
ruby
class
in
the
sense
that
you
describe
the
functionality
that
it
offers
and
it
works
as
a
template
and
it's
stored
as
a
template.
A: Objects can actually interact with one another, and in that sense Docker containers are very much the equivalent of objects, where containers can actually interact with each other. Containers are actually instances of an image. So if you instantiate the Ruby class Person as an instance called moe, the equivalent of that would be a Docker container called moe, and depending on how you set up those Docker containers, you can have them interacting with each other. I'm not going to go into too much detail about that today.
A
But
the
idea
that
I
want
you
to
take
aways
if
you're
a
Ruby
programmer
a
docker
image
is
very
much
like
a
ruby
class,
and
a
docker
container
is
very
much
like
an
instance
of
that
class
or
an
object
and
that's
running
alright.
So
how
do
you
represent
these
things?
Well,
for
classes,
you
can
identify
them
with
a
name.
A: So when we're describing a Docker image, its name is typically in the convention of name:tag. Everything on the right side of that colon is the specific tag or version of that image. The convention in the Docker ecosystem is to use the latest tag as the latest version of the image. However, you'll find many different conventions, and I don't think there really is a standard in terms of how to version and name them; at least I haven't looked too deeply. And then, on top of that, when you identify an image, you can also prefix it with the registry URL for where you can actually source that image. If you don't, the default will be to source that image from docker.io. A couple of examples are provided here.
A: The first image identifier includes the registry where you can fetch the image from, plus the name, as well as a tag. So license management is the name, latest is the tag, and the prefix before that is the actual registry where you can source it from. The second one has a name of alpine:latest; because I haven't actually prefixed it with the registry to source it from, it will default to fetching it from docker.io.
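As an aside on that naming convention, here is a small POSIX shell sketch (not from the talk) that splits an image reference into registry, name, and tag, applying the defaults just described: docker.io when no registry prefix is given, and latest when no tag is given. The heuristic that a registry prefix contains a dot or a colon is an assumption, not an official rule.

```shell
#!/bin/sh
# parse_image_ref REF -> prints "registry name tag"
parse_image_ref() {
  ref=$1
  # Split off the tag after the last colon (default: latest).
  case "$ref" in
    *:*)
      tag=${ref##*:}
      case "$tag" in
        */*) tag=latest; rest=$ref ;;  # the colon was a registry port
        *)   rest=${ref%:*} ;;
      esac
      ;;
    *) tag=latest; rest=$ref ;;
  esac
  # Treat the first path segment as a registry if it looks like a host
  # (contains a dot or a port); otherwise fall back to docker.io.
  first=${rest%%/*}
  case "$first" in
    *.*|*:*) registry=$first; name=${rest#*/} ;;
    *)       registry=docker.io; name=$rest ;;
  esac
  echo "$registry $name $tag"
}

parse_image_ref "alpine:latest"
# -> docker.io alpine latest
parse_image_ref "registry.gitlab.com/example/license-management:latest"
# -> registry.gitlab.com example/license-management latest
```

So a bare name like alpine:latest resolves to the default registry, while a prefixed reference is fetched from the named endpoint.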
A: All right, so I mentioned registries a few times. What is a registry? Well, in essence, a registry stores images. When we bake images, a registry contains both the metadata as well as the data included in each layer (I'll talk about layers momentarily). And then we need to be able to share them in some manner, so you can actually share that metadata and data by pushing it to a registry and also pulling it from a registry. The default registry I mentioned earlier was the docker.io registry.
A: A registry is something you can actually interact with as an HTTP API, for the most part. Most registries do require some level of authentication, so if you do actually execute the curl command that I have there, you'll probably get a challenge to issue basic auth or a bearer authorization header. But the idea here is that we're just interacting with HTTP endpoints: we're pushing and pulling over HTTP, with TLS as our transport.
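The endpoints in question come from the Docker Registry HTTP API v2. As a sketch, these helper functions build the common URLs (the host and image names here are placeholders, not from the talk); a pull is then just a series of authenticated HTTP GETs against them.

```shell
# Build the v2 endpoint URLs for a registry host and image name.
# The paths (/v2/, /v2/<name>/tags/list, /v2/<name>/manifests/<ref>)
# come from the Docker Registry HTTP API v2.
v2_ping_url()     { echo "https://$1/v2/"; }
v2_tags_url()     { echo "https://$1/v2/$2/tags/list"; }
v2_manifest_url() { echo "https://$1/v2/$2/manifests/$3"; }

# e.g. curl -s -u user:pass "$(v2_tags_url registry.example.com example/app)"
v2_tags_url registry.example.com example/license-management
```

An unauthenticated request to any of these typically answers 401 with a challenge header telling you how to obtain a basic-auth or bearer credential.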
A: The daemon typically runs on the same host, and so it makes it feel like, when you're typing docker commands, that the CLI is actually doing the work, when in fact it's actually just issuing HTTP requests across a socket to a daemon that's running; the daemon is what's actually starting and stopping containers and doing the building. And then the registry is the third component. The registry's main purpose is to actually host and share images, so it takes care of the authentication components for being able to connect and pull down images, as well as actually pushing up images.
A
One
thing
I
haven't
looked
at
is
the
authorization
component
of
registries,
so
there
is
a
docker
image
called
the
docker
registry,
which
is
I,
believe
the
default
registry
image
that
we
use
for
serving
docker
images.
But
in
a
nutshell,
these
are
the
three
components
and
I
think
it's
important
to
keep
these
in
mind
that
one
running
docker
commands
on
the
client.
This
is
just
the
CLI,
which
is
just
a
light
interface
over
a
HTTP
endpoint
which
we're
connecting
to
the
docker
demon.
A: All right. So here's an example of me actually just interacting with the local Docker daemon on my own machine, my development machine. I have the Docker daemon running on my host, as well as the CLI. So when I run commands like docker version, it's effectively doing something that's the equivalent of this, where in this example I'm running curl and telling it to actually send an HTTP request across the UNIX socket: /var/run/docker.sock is the UNIX socket where the Docker daemon is actually listening for requests. And you can see I'm actually getting back an HTTP response; in this case I got a 200 back, and it's giving me the list of containers, identifiers, and metadata available from the local Docker daemon that's running. So again, the Docker CLI is just an HTTP client that can connect over either a UNIX or a TCP socket; by default it's configured to connect to the UNIX socket.
A: So it's really just a client that can be written in any language, for the most part; if we want to interact with the Docker daemon using other tooling, it's perfectly possible, as long as we make sure that we write it against the HTTP API that's presented by the Docker daemon. In this case, you can see I'm running Docker Community Edition 19.03.
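The curl invocation being described looks roughly like this (a sketch; it assumes a local daemon listening on the default socket, and the exact JSON returned varies by daemon version):

```shell
# Ask the daemon for its version over the UNIX socket; this is what
# the docker CLI itself does under the hood.
curl -s --unix-socket /var/run/docker.sock http://localhost/version

# The equivalent of `docker ps -a`: list containers as JSON.
curl -s --unix-socket /var/run/docker.sock "http://localhost/containers/json?all=true"
```

The `http://localhost` part is only a placeholder host that curl requires; the request never touches the network, it goes straight through the socket.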
A: All right, so now let's start interacting a little bit with the Docker CLI. docker image ls gives me a listing of all the local images; sorry, I shouldn't say local: all the images that are present on the Docker host that my CLI is configured to connect to. In this case, you can see I've got two different images currently mounted and available. The listing omits some information, such as the actual full name with the URL, but the image IDs present a unique identifier that represents that particular version of that image.
A: So I've got an image with the tags 0.0.2 and latest; it's got a size on disk, compressed, of 2.3 gigabytes. And then I've got another image pulled from docker.io, which is debian:stable-slim at 69.2 megabytes. docker ps is a way to actually see what containers are currently running. This would be the equivalent of going into IRB and saying moe.object_id, give me the ID of that particular object, or going into ObjectSpace and getting a listing of all the objects that are currently live in the Ruby process. So docker ps gives us a listing of all the containers that are currently mounted and running. As you can see, currently I don't have any containers running; we'll come back to this a little bit later.
A: All right. So what actually happens when I run docker run? docker run is the subcommand that's used to actually take a Docker image and mount it as a container, so that you can interact with the container and whatever facilities are available within it. So when you run docker run alpine:latest, what that's actually instructing Docker to do is first go check to see if the Docker host has that image; remember, the docker command goes to the Docker host.
A: The Docker host is going to check: do I have this in my local copy, my local cache, or don't I? If it doesn't, it'll go and download it from the registry; if no registry is provided, it will use the default registry. It downloads it from the registry to the local host, and then it'll actually start up that container. And in this particular command, all I'm doing is just printing the content of os-release.
A: We can see that the alpine:latest image was downloaded, and after it's downloaded, it actually executed this command from within the container. So I should mention that my host machine is actually Fedora 32, so there's no reason why you should see this output from my host machine; cat /etc/os-release is actually telling us that this is an Alpine Linux container. So I was able to download the image, mount it locally, and then run a command from within that image.
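The command being run here is, roughly, the following (the --rm flag, which removes the container afterwards, is an addition for tidiness, not from the slide):

```shell
# Pull alpine:latest if the host doesn't have it, start a container
# from it, and run a one-off command in place of the default command.
docker run --rm alpine:latest cat /etc/os-release
```

On a Fedora host this still prints Alpine's os-release, because the filesystem inside the container comes from the image, while the kernel is shared with the host.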
A
Using
my
shirt
kernel
with
my
house,
alright,
so
we've
talked
about
interacting
a
little
bit
with
docker
image.
Our
containers
we
haven't
actually
talked
about
like
how
do
we
build
an
image,
so
I
want
to
go
through
a
few
different
items
that
I
think
are
important
in
a
docker
file.
The
docker
file
reference
is
comprehensive,
there's
quite
a
few
different
things
you
can
do
in
there,
but
the
things
that
I
think
are
more
important.
Are
the
ones
that
I'm
going
to
talk
about.
A: We submit that Dockerfile to the Docker host, and then, when it starts building the actual image, it'll go through it line by line, process that Dockerfile, and start to build the image. The first line in the Dockerfile that I provided earlier was FROM, and every Docker image has to start from a different Docker image. Now, I'm going to kind of skip scratch images and base images for now, and just focus on the context of people who are building on top of Docker, as opposed to building base images.
A
That's
a
different
topic
altogether,
but
we're
saying
is
we'd
like
to
use
Alpine
latest
as
our
starting
point.
So
Alpine
latest
is
an
image
that's
already
pre-baked
and
we're
declaring
that
we'd
like
to
use
as
our
base
image.
So
whatever
content
is
already
in
that
base
image
we'd
like
to
start
from
there
and
then
we'd
like
to
alter
and
add
additional
things
to
that.
I
do
see
a
question
where
our
registry
hierarchies
define.
So
where
is
Alpine
latest
yeah.
B
And
maybe
we
can
get
back
to
this
later,
but
I'm
just
curious,
you're
like
pulling
Alpine
later?
How
does
it
know
to
go
to
docker
hub
to
get
that
like
is?
If
I
have
my
own
registry
within
my
own
network?
Is
there
a
way
to
say
good?
My
own
registry
look
for
Alpine
latest
there.
If
you
don't
find
a
go
to
our
hub.
What's
the
hierarchy
there
A: Oh, so that's doing some sort of proxying, where you can act like a caching proxy: you check if it's local and use the local version, and go out if it isn't there. I believe there are different pull policies, but for the most part... okay, so let's rewind to just image identifiers. We identify an image, and it's optional to add the registry URL.
A
So
if
you
include
the
registry,
URL
you're
explicitly
stating
I'd
like
to
fetch
this
image
from
this
endpoint
now,
if
you're,
trying
to
sort
of
layer
a
name
where
you
can
say
this
image
with
this
name
could
be
sourced
either
from
the
default
registry,
which
is
docker
data
AO
or
it
can
fall
back
to
intermediate
registry
that
can
actually
cache
that
I
don't
actually
know
how
you
do
that
to
me.
That
sounds
more
like
a
caching
proxy
problem,
where
you
want
to
check
the
local
instance.
First,
then
the
next
so
I
actually
don't
know.
A: What I do know is that the configuration for your client typically ends up in your ~/.docker folder, and I'll show you mine, but it's got my credentials in it. Let me see if I can just take a snippet of it; it's got my initials in it. So, ~/.docker... let's see what I can show you without getting into too much trouble here.
A: Let me just bring that up. Okay, so within my home directory I've got this .docker config file. The config file contains configuration for your local client. So I mentioned earlier that the client by default will attempt to connect to /var/run/docker.sock; you can actually configure it to connect to different endpoints. So if you wanted to connect to a remote host by default, there are config entries that you could use to actually configure that, or you can modify that config file directly.
A: This is an example of the config.json with the auth tokens stripped out. It actually has a few different registries configured that I can pull from. One is docker.pkg.github.com, so I can use that as a Docker registry that I can fetch from, with the auth credentials associated with it; so if it's, say, a private image, I can fetch those. We've got the default already in there, which is index.docker.io; I think v2 is the latest in my config.
A: That's a great question, and I don't know if that's part of the installation, and whether it depends on if it's Mac or Linux. My assumption, which we'd have to verify, is that the first time you run the Docker CLI, if the configuration doesn't exist, it probably lays it down. We can test that theory by mounting a Docker image quickly, but I think what I'll do is save that question for the end, and then we can look at what happens there. Sounds good; hopefully that answers your question.
A: Dockerfiles. So I've talked quickly about FROM. FROM specifies: here's the base image that I'd like to continue building from. So we're going to go pull down alpine:latest, and whatever size that alpine:latest image is, we're going to have to inherit that; we can't alter that at this point without rewriting every single layer. And again, I mentioned layers; I'll get to layers. So FROM tells us: where do we want to start from?
A: So this is going to be our starting point. The next line in that Dockerfile is COPY, and we're saying: okay, let's introduce a new layer into the image. So we're going to start from alpine:latest, with whatever was there, and then we're going to copy this file from my host machine, hello.rb, to a specific location within the image itself. So I'm focused on this line right here.
A: So we're saying: copy from my host machine, relative to my present working directory, to the location within the image, /usr/local/bin/hello. And then the next thing is RUN. You can think of RUN as being like sh -c: run this particular command within a Bourne shell inside of the image. This can introduce a new layer; whatever happens in this layer, any mutations to the filesystem (additions, removals, changes), will be recorded in that particular layer.
A
So
in
this
case
we're
saying:
okay,
let's
we've
already
added
hello
as
a
file
in
the
previous
layer,
I'd
like
net
to
mark
that
isn't
executable.
So
it's
executable
and
because
user
local
bin
is
already
in
my
path
when
I'm
in
that
image,
I
could
just
run
hello
from
anywhere
I
want
and
it'll
actually
execute,
and
then
finally,
this
is
the
actual
default
command.
So
I
think
I
wanted
to
talk
about
entry
point
commands
and
the
different
forms
this
I
family
was
going
to
take
quite
a
bit
longer.
A
So
we
what's
important
here
is
that
this
is
the
default
command.
That
is
run
when
the
docker
container
is
instantiating
around.
You
can
override
command
quite
easily
by
providing
your
own
options.
The
example
I
gave
earlier,
which
I'm
just
gonna
rewind,
was
in
this
case.
Alpine
latest
has
its
own
default
command,
which
I
believe
is
just
SH
so
by
default.
If
you
don't
add
anything
after
the
name
of
the
image,
it
will
just
run
the
default
command,
which
will
give
you
a
bourne
shell.
A: ENTRYPOINT is a slightly different variation of that. ENTRYPOINT allows you to, in a sense, completely hijack that sh -c, and from there you can drop in your own entrypoint.sh. It's kind of a nice way to shim in things that you want to occur before any activity happens in the image itself, like decompressing any files that might be necessary. We'll talk about that.
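Putting those instructions together, the Dockerfile being described looks roughly like this; it is reconstructed from the narration, so treat the exact file names as assumptions. Written out from a shell:

```shell
# Reconstruct the example Dockerfile described above: a base image,
# a COPY that adds a layer, a RUN that mutates the filesystem, and a
# default command.
cat > Dockerfile <<'EOF'
FROM alpine:latest

# New layer: copy hello.rb from the build context into the image.
COPY hello.rb /usr/local/bin/hello

# New layer: record the permission change.
RUN chmod +x /usr/local/bin/hello

# Default command, run when a container starts from this image.
CMD ["hello"]
EOF

cat Dockerfile
```

Each instruction after FROM produces its own layer, which is why the build output later shows four steps.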
A: All right! So now let's build this image. I build it using docker build, and I use -t, which is a way to say: tag this image, or give this image this particular name. So I gave it the name developing-with-docker:latest, and I said: use examples/one as the build directory. So that's the directory that it's going to look in for the Dockerfile by default.
A
If
you
want
to
specify
a
different
file,
name
directly,
I
believe
there's
a
command
or
option
called
F,
but
by
default
whatever
build
directory,
you
provide
the
docker,
build
command.
It
will
search
for
the
docker
file
and
start
building
that.
So
you
can
see
it
did.
The
first
step,
I
pulled
down
the
latest,
it's
using
Alpine
latest
as
the
base
image
that
was
the
first
layer.
Second
layer.
We
copied
a
file
to
use
your
local
bin
third
layer.
A
We
changed
it
to
make
it
executable
and
then
fourth
layer,
we
added
the
hello,
is
the
default
command
and
we've
finished
building
our
image.
A: The image itself is now tagged, and this particular identifier, 3cac..., is the identifier for that image. Any mutations that we make to that Dockerfile will result in a new image identifier, because we've altered the actual rendered output of that image.
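As commands, the build step described is roughly the following (the alternate file name in the second command is hypothetical, shown only to illustrate -f):

```shell
# -t tags the resulting image; the last argument is the build context
# directory, where Docker looks for a file named Dockerfile.
docker build -t developing-with-docker:latest examples/one

# To point at a differently named Dockerfile, use -f:
docker build -t developing-with-docker:latest \
  -f examples/one/Dockerfile.alternate examples/one
```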
A: So now, if I run that image, I can go docker run developing-with-docker:latest, and when I run this, you can see I'm getting this error: can't execute ruby, no such file or directory. So when that happens, we need a way to be able to interact with the Docker container to find out what's going on, and so what we can do is actually get a shell into the container. So what I did here is run docker run with -it.
A: -it means: make this interactive and give me a pseudo-terminal, so I can actually interact with the Docker container itself. I'm overriding the default command, which was previously hello, and I'm saying: can you please just give me a shell into the image, or into the container? So now I have a shell in the container, and I can try to reproduce that defect and see what's going on. So what I'll do first is see: okay, where am I? Let's go cat /etc/os-release.
A: Okay, I am in Alpine Linux. Okay, well, I know my image should have this thing called hello in my path. So let's see if I can find hello. Okay, so we found hello. Let's just take a quick look and make sure it's the right content. Okay, so yeah, that's my file; it looks good. Okay, the shebang tells it: go get ruby, so execute this file using Ruby; it should just puts the output. Okay, let's just run ruby. Okay: ruby is not found. This is my problem.
A
I
don't
have
Ruby
in
my
image,
so
being
able
to
like
get
a
shell
into
a
running
container
is
a
great
way
to
be
able
to
debug
issues.
It's
hard
to
do
that
by
just
looking
at
a
docker
file
and
building
an
image
and
being
able
to
override
the
default
commands.
But
you
can
get
a
shell
is
very
useful.
I
find
in
terms
of
developing
and
be
able
to
debug
issues.
So
in
this
particular
case,
I
built
a
docker
file
that
was
actually
defective
because
it
had
dropped.
A: I'm already out of time; I am sorry, I started late. Okay, so apk update: I need to update my package index first before I can install Ruby. I fast-forwarded it a little bit, but you get the idea here. So in this particular case, we specified the image that we wanted to launch, which converted it to a running container.
A
You
can
go
it's
okay
and
then
we
overrode
the
default
command
so
that
we
could
get
a
shell
into
the
image
and
then
we
use
that
shell
to
debug
what's
going
on
so
that
we
could
improve
our
doctor
file
or
fix
any
defects,
all
right,
so
I
still
have
that
image.
Running
I
haven't
aborted.
It
sorry
I
keep
getting
my
own
terms,
because
I
still
have
that
container
running
I
haven't
aboard
with
it.
A: So you can see, when I run docker ps now, we can see that there is a container running.
A: What we can do is, let's say this container itself is running a web process, and in order to be able to debug that process, I need it to be running and using the default command. So what I can do is use docker exec to actually get a shell into that same container. So earlier we were looking at /usr/local/bin/hello; this is the same container that I launched previously. If I actually run hostname, you can see the name is 9c8b, and I'll rewind a little bit.
A: So you can see, when I ran docker ps previously, the actual container that was running was 9c8b. By using docker exec, I was actually able to get a shell into the same running container without having to start up a new container, which means that I can side-load processes: I can actually jump into a running container, take a look at what's going on, and debug it from the side without actually blocking whatever the default command is. And this gives me a better understanding of what's happening.
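The two debugging techniques described can be sketched as follows (the image name and container ID follow the talk's example; the commented commands are what you would type inside the container's shell):

```shell
# Override the default command to get an interactive shell (-i) with a
# pseudo-terminal (-t) in a NEW container from the image.
docker run -it developing-with-docker:latest sh

# Inside that shell: confirm the defect, then test the fix.
#   which hello
#   ruby /usr/local/bin/hello   # "ruby: not found" reveals the problem
#   apk update && apk add ruby

# Or side-load a shell into an ALREADY RUNNING container without
# disturbing its default command:
docker ps                  # find the container ID, e.g. 9c8b...
docker exec -it 9c8b sh
```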
A: So, what have we packed into this image? The tool that I've been using is called dive. Dive is amazing: it just needs to be provided with an identifier for a particular image, and it will analyze the image itself and give you a breakdown of each layer and the size of each layer. You can see the first layer, which was pulling from alpine:latest, has a size of 5.6 megabytes; the next layer, which was a COPY, took about 40 bytes; and then the chmod took about 40 bytes.
A
So
in
with
that,
I
can
actually
analyze
each
layer
jump
in
and
actually
see
what
files
were
added,
what
files
or
remove
what
files
were
changed,
and
this
gives
me
like
much
more
visibility
into
what's
happening
in
each
layer.
So
I
can
optimize
each
layer
individually,
remove
things
that
I
don't
need,
for
example,
like
man
pages,
which
may
not
be
necessary
for
the
the
type
of
work
that
we're
doing
since
we're
running
projects
in
CI.
It's
very
rare
that
people
are
going
to
run
man
in
that
case.
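Dive is invoked with an image reference:

```shell
# Opens an interactive terminal UI showing each layer, its size, and
# the files added, removed, or changed in that layer.
dive developing-with-docker:latest
```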
A
Mean
well
when
we
do
here's
an
example.
I
just
wanted
to
share
this
particular
image
that
I'm
pulling
down
has
seven
layers.
Now,
if
you
saw
six
of
those
layers,
we
were
able
to
pull
down
like
just
like
that,
like
it's
already
downloaded,
but
that's
seventh
layer,
it's
taking
a
little
bit
longer
to
download
and
because
that's
such
a
large
layer
that
one
layer
is
actually
costing
us
in
terms
of
total
time
to
download
this
image.
A
And
so
this
particular
case
you
can
see
we're
still
downloading
that
1.2
gigabyte
layer,
so
there's
an
opportunity
for
optimization
there
and
in
total
I
took
about
a
minute
to
download
and
a
bulk
of
that
time
was
spent
downloading,
just
a
single
layer,
so
more
layers
equals
more
parallel
downloads.
Smaller
layers
equals
faster
downloads
per
layer.
All
right,
let's
build
another
example
here.
So,
in
this
case
we're
gonna
build
a
docker
file.
That's
got
more
layers
than
are
necessary
and
it's
going
to
be
rather
big.
A: So in this case, I had pre-built this image, because it does take some time. We can see that in this particular case we're pulling from debian:stable as our base image. We do an apt-get update to just go update our local apt index, an upgrade to upgrade any packages; we install git, install ruby, install zstd, and then we clone this rather large data Git repository into /opt/db, and then we're echoing done. So we built that image, and we can see it on disk.
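Reconstructed from that description, the deliberately layer-heavy Dockerfile looks roughly like this (the repository URL is a placeholder; the real one isn't named in the talk). Written from a shell:

```shell
cat > Dockerfile <<'EOF'
FROM debian:stable

# Each RUN below is its own layer, so all intermediate files
# (apt caches, the .git directory) get baked into the image.
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y git
RUN apt-get install -y ruby
RUN apt-get install -y zstd
RUN git clone https://example.com/large-data.git /opt/db
RUN echo done
EOF

cat Dockerfile
```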
A: It has a size of about 2.3 gigabytes compressed, and it's got seven layers. But 2.3 gigabytes compressed is pretty large, and at the moment I'm not really sure where all that disk space is coming from, because we saw that the base image of Alpine only occupies, I thought, about five and a half megabytes. So whatever I did in my Dockerfile took that five and a half megabytes up to 2.3 gigabytes, so I'd like to find out what's going on there. So again, dive to the rescue; we can go into dive.
A: What I'll do is just open up dive in my shell on ac94, and then we're going to just interact with dive directly, and I'll switch back to the presentation in a moment. So, depending on the size of an image, it can take dive a little bit longer to analyze the image and produce an interactive terminal UI to analyze it with. We'll give it a few minutes to take a look at each layer, and then we can actually interact with each layer. All right.
A: So in this particular case, I can see the top layer is sixty-nine megabytes; that was the base image. The update took about 17 megabytes. By the way, my eyes are up here: I'm looking at the size of each layer, and on the right-hand side we can see all the files that are in that layer. So 17 megabytes; 69, not too big; 0, not a big deal; oh, 93, that's a bit big, for git; 31 megabytes for Ruby, okay; 2.6 for zstd. And what is this one way down here?
A
2.1
gigabytes
yikes:
let's
take
a
look
at
that
one.
So
if
we
look
at
the
2.1
gigabyte
layer,
I
can
actually
interact
and
take
a
look
at
like
what's
changed
and
if
we
look
at
this
very
from
a
high
level,
we
can
say:
okay,
the
bulk
of
the
change
has
happened
in
one
directory.
This
opt
directory,
so
let's
drive
down,
it
was
off
directory
and
you
say:
ok,
2.1
gigabytes,
an
op
DB,
let's
collapse
them
these
folders
where's,
most
of
the
disk
being
eaten
here.
Okay,
we
have
1.4
gigabytes
for
just
the
dot
git
directory.
A
Well,
in
this
particular
case,
it
turns
out
I,
don't
actually
need
the
dot
git
directory
for
for
what
I
want
to
accomplish
with
this
image.
So
that's
actually
something
I
can
just
drop
and
there's
a
savings
of
one
point:
Giga
four
gigabytes
right
there
and
no
perceivable
impact
at
run
time.
For
me
in
this
particular
scenario,
so
you
have
to
think
about
like
the
impact
of
these
files
and
how
they
affect
you
in
your
particular
needs
of
just
getting
this
level
of
visibility
into
the
image
to
identify
potential
things
that
can
be
removed.
A: For things that are unnecessary but are there, it's great just having that visibility. So we can optimize this; we can do a few things. One is: let's just get rid of this .git directory. And we can actually do a couple more things once we've completely gotten rid of that .git directory. If we can't, the other thing we could do is actually just compress it. So if we compress this db directory in the image itself, it reduces the size of the image, but it transfers a cost to the runtime.
A
We
probably
need
to
use
that
directory,
but
we
can
decompress
that
at
runtime,
so
depending
on
the
compression
algorithm
we
use,
we
have
to
be
careful
about
how
much
we
compress
and
what
is
the
cost
of
run
at
runtime.
Thankfully,
is
that
standard
is
a
fantastic
compression
algorithm
which
has
very
little
cost
in
terms
of
decompression
time
at
the
total
cost
of
decompression
time
is
fairly
constant,
regardless
of
the
size
of
compressions.
You
can
tune
it
to
okay,
I'm,
jumping
ahead,
let's
get
back
to
the
slides
okay.
A: There are a few things that were installed with git and Ruby and zstd that we can actually dump, like man files; I'm going to leave that for now. But what I'm going to do is an update. I didn't include it here, but you can actually delete all the apt sources lists as well, and any .deb files that were put into the /var apt cache can be dumped too, because you don't actually need them at runtime unless you're installing additional packages. But at a minimum, we can clear the cache.
A
So
what
this
means
is
like
there's
an
additional
cost
to
build
the
image
it's
going
to
take
a
little
bit
longer
to
build
the
image,
because
in
order
for
is
that
standard
to
be
able
to
compress
effectively,
it's
need
to
take
several
passes
at
the
data
to
to
find
out
how
to
compress
it
effectively.
So
I've
moved
the
cost
to
build
type,
which
means
it
takes
me
longer
to
build,
but
it
produces
a
smaller
image
and
I
have
a
hit
at
run-time
where
I
have
to
deflate
it.
A
So
I'm
overriding
the
entry
point
in
this
case
and
the
reason
I'm
doing
that
is
because
I
need
this
to
run
before
anything
and
everything.
Even
if
you
want
to
get
a
shell
on
the
side,
I
still
need
my
entry
point
to
run
so
when
I'm
doing
in
my
entry
point,
which
is
the
run
that
Sh
file
is
check
like
unpacking.
The
tar
file
deeps
are
inflating
the
compressed
file,
unpacking
the
tar
file
and
then
dropping
the
original
compressed
file
and
then
shelling
out
to
whatever,
whatever
the
original
command
that
you
wanted
to
run
with.
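A minimal sketch of that entrypoint pattern (file names and paths are assumptions; the talk's actual script isn't shown). The script inflates the archive once, then hands off to whatever command the container was asked to run; the Dockerfile would point at it with ENTRYPOINT ["/run.sh"]:

```shell
# Entrypoint sketch: decompress on first start, then exec the command.
cat > run.sh <<'EOF'
#!/bin/sh
set -e
if [ -f /opt/db.tar.zst ]; then
  # Inflate the zstd-compressed tarball, then drop the archive.
  zstd -d --stdout /opt/db.tar.zst | tar -x -C /opt
  rm /opt/db.tar.zst
fi
# Hand control to whatever command the container was started with,
# so the default CMD (or an override like `sh`) still runs.
exec "$@"
EOF

cat run.sh
```

Because ENTRYPOINT runs before the command, the decompression happens even when someone overrides CMD to get a debugging shell.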
A: So I've moved some of the time that was previously being spent on downloads into build time, but that happens less often than the downloads, so for me that's a pretty good trade-off. What it could mean, though, is increasing the CI build timeouts; in the case of License Management, I think it now exceeds an hour to build. There's still lots of room for improvement, but we've been able to compress it from the original upstream version, which was around nine gigs, down to just under two.
A: In the case of License Management, it depends. I still need to do some timing, because we're also installing multiple tools, and trying to install those tools in parallel. We're installing Ruby, Python, Java; two versions of Java, two versions of Python, as I mentioned, and some other tools. So for those installations, it would help if we had, you know, a Debian package that we could just install directly.
A: If you could go apt-get install, say, a prepackaged Ruby, then we wouldn't have to compile from source, because they would be precompiled binaries that you just unpack, which would be much faster. But right now, I think the time is being split between compiling and installing those tools, and compression; and because I set it to a compression level of 19, there's a good chance that a bulk of that time is being spent on that compression step. So, a few optimizations there: in cases where you don't have to compile, avoid it, right?
A
If
you
can
pre
compile
and
ship
them
as
a
debian
package,
then
when
you
go,
you
know,
dprk
dpkg.
I
give
it
the
dot
deb
file,
it's
effectively
already
a
compressed
file.
It's
just
going
to
take
the
compressed
binaries
and
just
drop
them
right
into
the
paths
in
the
filesystem
rather
than
trying
to
compile
and
Link
at
build
time.
So
that's
the
another
level
of
optimization
I'm
not
really
talking
about,
but,
yes
compression
is
going
to
add
additional
time
to
build.
A
B
A
A
A
A
There's a cost; there's a point of diminishing returns. As I mentioned earlier, more, smaller layers are better for downloads because you can download in parallel, and we saw with the license management image that we were able to take several large layers and compress them down to one still-large layer. So what's happening there, okay, so the question again, let me restate it: we're collapsing a layer. So yes, that is an optimization, because apt-get install puts things into the /var cache, and so we can install all of those things at once.
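A minimal sketch of what collapsing those apt-get layers looks like; the package list is illustrative, not the actual one from the image discussed here:

```dockerfile
# Before: each RUN creates its own layer, and the apt cache deleted in the
# last step still ships inside the earlier layers.
# RUN apt-get update
# RUN apt-get install -y ruby
# RUN apt-get install -y python3
# RUN rm -rf /var/lib/apt/lists/*

# After: a single RUN, so the cache under /var/lib/apt/lists is created and
# removed inside one layer and never lands in the shipped image.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ruby python3 \
 && rm -rf /var/lib/apt/lists/*
```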
B
Right, so if you were to go back to the cleanup, maybe this is the extreme example. One of those lines is removing the .git folder, which, when we looked at it, was like, you know, one point something gigs. If you do that as a separate layer, you're not saving anything; your total size would actually be bigger, because you've got all that data in one layer and then the next layer is going to delete all of it. So, correct, you have no cost savings there, exactly right.
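That .git example can be sketched like this; the repository URL and paths are placeholders:

```dockerfile
# No savings: the clone's .git data is baked into the first layer, and the
# second layer only records a deletion on top of it, so both layers ship.
# RUN git clone https://example.com/repo.git /app
# RUN rm -rf /app/.git

# Savings: delete inside the same RUN, so .git never enters any layer.
RUN git clone --depth 1 https://example.com/repo.git /app \
 && rm -rf /app/.git
```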
A
Each layer needs to be thought about carefully, making sure that it's as small as possible; saving your cleanup for the last layer doesn't yield the benefit that some may think it could have, right? All right, so I'm trying to buy time while this is still going; we don't have the big image yet, it's still compiling that image. But what we'll see, once we analyze the image again, which I think I will skip now, is that we did see the cost savings in that layer.
A
So in the case of license management, DockerSlim was able to drop it down to three gigs, whereas with the manual tuning we were able to drop it down to 1.7 gigs. So, you know, DockerSlim did great with very little effort in terms of dropping it, but to get it to just that extra level, you do have to spend a little bit more time tuning it. So DockerSlim, I don't know exactly how it works.
A
I haven't looked at that, so I can't say I recommend it, but my guess is that it is doing what we did manually: looking at each layer, making some guesses as to things that are not going to be required, like the .deb files and the cached source lists, those common things that you would typically find as bloat in images, and then trying to reduce those layers into fewer layers. So I think, if DockerSlim is what you're talking about, it's a great starting point. I'd love to look more into it, but just having a good understanding of where your container is going to run, and what the needs of your runtime environment are, I think you're probably going to see that you can actually improve the size of the image just by knowing that, because an automated tool isn't going to know whether or not that .git directory is necessary; it will probably think that it is.
A
It didn't optimize the Dockerfile when I ran it. It seems what it did is: I provided the image, and it produces a new image that's a mimic of that image, using the same name with ".slim" appended. Okay, got it. Yes, it was a new image from that.
B
A
I guess there are some patterns, like just doing static analysis on the Dockerfile; you could detect patterns that are common that can be remedied. All right, the image did finish baking. Let me just show you; we're just going to do the docker image ls, and where is it... so there: we went down to 265 megabytes from what was originally 2.3 gigabytes, so some pretty good savings in terms of disk space as well as bandwidth. Now the question is: will it run?
A
If I go to DB, I can see I've got everything else except for the .git directory, so it's functionally equivalent to what we had previously: much smaller space on disk, compressed, as well as far fewer bits that need to fly across the network, with the cost of having to do the inflating at extraction. But we saw, I didn't time that, but I would say that was negligible compared to the time it takes to actually download across the different network speeds. So to rewind... oh yeah, let's just quickly do a dive into that image.
A
The image, and so now we can actually see the size of each of those layers. Again, you can see 69 megabytes, 196, so these were like the five or six layers that we collapsed into one. So there's probably still room for improvement, but this is already a huge savings from where we were previously, and I already ran the image, so we can see that the image still ran.
A
This
was
actually
a
cached
version
of
the
slide,
so
it's
not
realistic
and
you
can
see
two
point:
three
gigabytes
down
to
255
for
a
functionally
equivalent
image
and
so
summary
is
to
keep
each
layer
small.
As
you
mentioned,
Seth
like,
if
you
have
a
layer,
that's
large
and
you
try
to
clean
it
up
in
a
following
layer.
You
don't
actually
yield
the
benefits
of
doing
that.
More
layers
do
provide
an
opportunity
for
more
parallel
downloads.
So
if
you
take
a
big
problem,
split
it
up
into
smaller
problems.
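That closing advice can be sketched as a Dockerfile fragment; the tool list is a placeholder, and this is one possible trade-off, not a prescription:

```dockerfile
# Each tool gets its own, already-cleaned-up layer: layers stay small and
# can be pulled in parallel, but no layer carries cache data that a later
# layer would have to delete.
RUN apt-get update \
 && apt-get install -y --no-install-recommends ruby \
 && rm -rf /var/lib/apt/lists/*
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*
```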