Kubernetes IoT Edge Working Group, 28 Oct 2022

From YouTube: WebAssembly Based AI as a Service on the Edge with Kubernetes - Rishit Dagli & Shivay Lamba

Description

WebAssembly Based AI as a Service on the Edge with Kubernetes - Rishit Dagli, Narayana Junior College; Incoming University of Toronto & Shivay Lamba, Meilisearch

WebAssembly (WASM) is being adopted at an increasing rate for edge applications. That allows WASM runtimes, such as WasmEdge (a lightweight and high-performance runtime for cloud-native, edge, and decentralized devices), to run serverless functions on the edge. Following the large-scale adoption and benefits of serverless computing, we focus on deploying these as a Function-as-a-Service on edge devices. Machine learning inference is often a computationally intensive task, and edge applications could greatly benefit from the speed of WebAssembly. Unfortunately, Linux containers end up being too heavy for such tasks. Another problem we face when demonstrating machine learning deployments in this fashion is that standard WebAssembly provides very limited access to the native OS and hardware, such as multi-core CPUs, GPUs, or TPUs, which is not ideal for the systems we target. The talk also shows how one could use the WebAssembly System Interface (WASI) to get security, portability, and native speed for ML models. To top it off, the talk ends with a demo of deploying a machine learning model as a serverless function using WASM on an edge device.

A

Okay, we'll get started. Hello, and in the next 30 minutes we'll be talking about WebAssembly-based AI as a service on the edge with Kubernetes. I'm Rishit, and I'm a CS undergrad student at the University of Toronto, and I also work on the machine learning team on Finch at SpaceX. Shivay?

B

Yeah, hi, I'm Shivay, and I'm a contributor and maintainer at Layer5, the service mesh community within the CNCF, and a developer relations engineer at Meilisearch, which is an open source, Rust-based search engine. So I'm really looking forward to sharing with all of you how we can leverage WebAssembly to enable very easy access to functions as a service on the edge.

A

So I want to start out with this meme when I start talking about WebAssembly, which is that WebAssembly is neither web nor assembly. It's an interesting meme, and it also depends on what you consider the definition of web. We want to run it on more than just the web. We want to run it on edge devices; that's what we are excited about.

A

So what is WebAssembly, if it's not web and not assembly, and why should you care about it if you are deploying edge applications, especially machine learning applications, which are quite compute intensive? That's what we want to talk about. Simply put, it's a binary instruction format for a stack-based virtual machine. What that means is it's analogous to machine code, at a higher level than machine code, and it's essentially designed as a compilation target. So you'd probably have some JavaScript functions or some Rust functions.

A

You then use WebAssembly as the compilation target for them: you make a WebAssembly module out of them and then run it with WebAssembly. So you can essentially take your code, your functions, or your modules from other languages and compile them down to WebAssembly, which is the idea.
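
To make the phrase "stack-based virtual machine" concrete, here is a minimal sketch of a toy stack machine in Rust. It is not a WebAssembly interpreter: the `Instr` enum and the `run` function are hypothetical names invented for this illustration, and only the general idea (instructions push and pop operands on a stack) carries over to real Wasm.

```rust
/// A toy instruction set, loosely modeled on how WebAssembly
/// evaluates expressions. Illustration only, not real Wasm.
#[derive(Debug, Clone, Copy)]
enum Instr {
    /// Push a constant onto the stack (compare Wasm's `i32.const`).
    Const(i32),
    /// Pop two values and push their sum (compare `i32.add`).
    Add,
    /// Pop two values and push their product (compare `i32.mul`).
    Mul,
}

/// Runs the program and returns the value left on top of the stack,
/// or `None` if an instruction finds too few operands.
fn run(program: &[Instr]) -> Option<i32> {
    let mut stack: Vec<i32> = Vec::new();
    for instr in program {
        match *instr {
            Instr::Const(v) => stack.push(v),
            Instr::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            Instr::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_mul(b));
            }
        }
    }
    stack.pop()
}
```

Calling `run` with the program `Const(2), Const(3), Add, Const(4), Mul` evaluates `(2 + 3) * 4`. A compiler targeting a machine like this only has to emit a flat list of such instructions, which is part of what makes a stack machine a convenient compilation target.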

B

The idea is that it's not just your functional programming languages, but also scripting languages like Python and JavaScript, that can be compiled into WebAssembly. So it's kind of a polyglot environment where you can use any of your standard languages, and they can be compiled to a WebAssembly target.

A

I want to talk about WebAssembly, especially in the context of edge, because it's efficient and fast, in the sense that it's higher level than machine code but still pretty efficient and fast.

A

It's also open and debuggable. WebAssembly is open source, which is why we are presenting some of the work we have been doing with WebAssembly. It can also run on non-web platforms, which is why you should think about using it for your edge applications. It's also based on the open web platform, so you can use it on the web for edge use cases where you might not have a lot of bandwidth or network.

A

You can also use it for that. The WebAssembly security model is something really interesting as well. It's not just simple virtualization, and you tend to get a lot of security with WebAssembly too. Shivay, would you like to talk about how Wasm comes into the picture?

B

Sure. To recap the past few slides, what we have seen is that you're primarily getting a lot of high-performance bytecode. And why is that the case? Because we are taking your high-performance functions, written in languages like Rust, and they're getting directly compiled to this bytecode. One of the common use cases, and how WebAssembly started, was on the web, where JavaScript functions couldn't be that performant.

B

You could use C++, with Emscripten, to run these highly performant C++ applications and functions on the web.

B

Now we are also expanding it to the server side and to the edge, and you're still doing the same thing: you take your high-performance functions, written in very popular languages like C++ or Rust, convert them into this bytecode, and then run them. That works because of the small size that comes with a WebAssembly module and because, as we mentioned, it's compatible with all of these various programming languages. And since it's a binary instruction format, destroying it and initializing it at the lowest level is very easy to do.

B

And again, because it is running as native bytecode, it runs at a very high speed without a lot of performance overhead to worry about.

A

On this point, I would also like to add something. If you think about something running at native speed, the first thing you think is that you cannot put the same thing on multiple platforms. Well, that's right to a certain extent, especially in the case of ahead-of-time compiling. Say you take one of the popular Wasm runtimes, like WasmEdge, and ahead-of-time compile your module into a .so file, which brings a lot of the native execution speed.

A

It's then also tied to the machine type. WebAssembly in general is not tied to what machine it is running on; it's just an instruction format, and you can run it anywhere. If you ahead-of-time compile it down, you essentially get native speed, but the result is tied to the kind of machine it can run on.
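
One way to picture the trade-off described above is a build pipeline that keeps a single portable module but one ahead-of-time artifact per target. The sketch below is only an illustration: the naming scheme and the functions `host_target` and `artifact_name` are hypothetical, and the `.so` suffix is assumed from the shared library format mentioned in the talk. Only `std::env::consts::ARCH` and `std::env::consts::OS` are real standard-library items.

```rust
use std::env::consts::{ARCH, OS};

/// Returns the (architecture, operating system) pair this code was
/// compiled for, for example ("x86_64", "linux").
fn host_target() -> (&'static str, &'static str) {
    (ARCH, OS)
}

/// Hypothetical naming scheme for build outputs. The portable module
/// has one name everywhere; an AOT artifact is tagged with its target
/// because it only runs on matching machines.
fn artifact_name(module: &str, aot_target: Option<(&str, &str)>) -> String {
    match aot_target {
        None => format!("{}.wasm", module),
        Some((arch, os)) => format!("{}-{}-{}.so", module, arch, os),
    }
}
```

With this scheme, `artifact_name("classify", None)` names the one module that every device can load, while `artifact_name("classify", Some(host_target()))` names an artifact that is only valid for the machine type it was compiled on.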

B

Yeah, and one of the other things to focus on over here is, since we are talking about function as a service, right, and WebAssembly as a service.

B

One of the other benefits that we get with WebAssembly is, again, the portability aspect of being able to run this Wasm bytecode anywhere we want. Usually, if you're using, let's say, a Python-based or a JavaScript-based function, it will be more limited and tied to the dependencies on the specific platform that you are running it on. So the platform dependency is also something that improves with using Wasm.

B

And of course, finally, when we're talking about machine learning, we know that it's very computation intensive, mathematical-computation intensive. So especially if you're running it on edge devices, you want high speed, given the constraints that you have on size and on the computation that you get with an edge device. So it's really good for being able to do model inference, conversion, and deployments on the edge as well, and to do that very quickly. And that brings us to WasmEdge. Rishit?

A

So WasmEdge is a popular runtime that allows you to easily run Wasm on edge devices, which is also what we will be showing in the demos and talking a bit about today. We want to talk a bit more about how Wasm runs on edge devices, which is very interesting, because we talked about all of these benefits, but we didn't really talk about how it essentially runs the same code on all devices.

A

How does it do this? What it allows you to do is have a fast, scalable, secure way to run the same code. And by safe and portable, we also mean that users and programs can only access what they have the right to access, and that one process should not create problems for other processes.
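
The rule that a program "can only access what it has the right to access" can be sketched as a capability check. The code below is a simplified model, assuming the host grants access to a list of directories up front; the `Sandbox` type and its methods are hypothetical names made up for this example and are not the WASI API.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical model of a sandbox that only exposes directories the
/// host has explicitly granted. This is not the real WASI API.
struct Sandbox {
    granted: Vec<PathBuf>,
}

impl Sandbox {
    /// Creates a sandbox that may touch only the listed directories.
    fn new(dirs: &[&str]) -> Self {
        Sandbox {
            granted: dirs.iter().map(|d| PathBuf::from(*d)).collect(),
        }
    }

    /// Returns true only if `path` lies inside a granted directory
    /// and does not try to escape it with a `..` component.
    fn can_access(&self, path: &str) -> bool {
        let path = Path::new(path);
        let escapes = path
            .components()
            .any(|c| matches!(c, Component::ParentDir));
        !escapes && self.granted.iter().any(|dir| path.starts_with(dir))
    }
}
```

A sandbox created with `Sandbox::new(&["/data/models"])` would allow `/data/models/mobilenet.tflite` and reject `/etc/passwd`. Everything that was not granted is denied by default, which is the property the speakers are describing.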

A

So essentially we want standard, platform-independent methods: we want our system calls standardized, and these should be platform independent. That is where the WebAssembly System Interface, WASI, comes in, which solves a lot of these problems. What WASI does is allow you to have standard, platform-independent system calls.

A

This is a very popular tweet from the creator of Docker, and he says that if Wasm plus WASI had existed in 2008, there would have been no need to create Docker. Could you go to the next one? So I want to motivate this and talk more about WASI by starting with a statement which is wrong: that WASI directly gives you access to all system resources. It does not, because that is far too important for stability and security.

A

So the way Wasm works, and I've taken this image from Lin Clark, is this. The way all applications work is that they ask the kernel, "Can I do this task?" And instead of having all of these platform-specific ways to do system calls, WASI allows you to have standard libraries.

A

You can now use WASI in Rust and in C, just as an example here, to do a system call which is not tied to what kind of system or what kind of CPU you are running the Wasm code on. Shivay, would you like to talk a bit more about how WasmEdge and Kubernetes can come together?
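
As a small illustration of the point about standard libraries, the following Rust function uses only `std::io`. The function name `greet` is made up for this sketch. The same source compiles for a native target and for a WASI target; in the WASI build, the standard library performs the write through the WASI interface instead of a platform-specific system call.

```rust
use std::io::{self, Write};

/// Writes a greeting to any `Write` sink, such as `io::stdout()`.
/// Nothing here names a specific operating system or CPU.
fn greet<W: Write>(out: &mut W, name: &str) -> io::Result<()> {
    writeln!(out, "Hello, {}!", name)
}
```

From a `main` function this would be called as `greet(&mut io::stdout(), "edge")`. The talk builds with the `wasm32-wasi` target; newer Rust toolchains name that target `wasm32-wasip1`.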

B

Absolutely. So what we discussed is how WasmEdge, as a popular runtime with ahead-of-time compilation, can be used to very quickly do things like model inference, especially from a machine learning perspective, but you can also do it for other highly computational tasks. So let's understand where Kubernetes comes into the picture.

B

We know that the main idea of Kubernetes is that it helps you manage and orchestrate your Docker containers. Now the proposal is that we can run these Wasm modules side by side with the Docker containers.

B

There has been a lot of debate going on about whether Wasm replaces Docker, but the ideal situation is that you run Docker containers alongside these modules, and they go hand in hand very nicely. WebAssembly by itself is very limited in terms of functionality because of the sandbox model and the security model that you get with it.

B

With Docker containers, you can use the system resources and a bunch of different libraries and functions directly, whereas with WebAssembly you get very fast execution and fast load times. Heavier Docker containers, especially, can take a lot of time to load up initially, in the warm-up stage. So that's the benefit that you get with Wasm containers: they are smaller in size and they have a much faster load speed.

B

So the idea is that we can run these side by side in the entire Kubernetes stack, and this is the goal. If you take a look at the entire Kubernetes landscape, you want to be able to have all of your different Kubernetes applications, then the high-level and the low-level container runtimes, and run Wasm as part of this entire ecosystem so that they complement each other.

B

And that's where we'll now go over to the demos, where we'll be showing you the functionality of WasmEdge and how you could configure it to run on your edge devices. So let's now head over to the demos.

A

So before I go on to the demos, I also want to mention that we have some benchmarks. I worked on creating some benchmarks for running TensorFlow Lite models directly with the TensorFlow Lite APIs on Android devices and iOS devices, as well as running TensorFlow Lite models with Wasm, and with Wasm and AOT compilation. We will not go very deep into those benchmarks right now, but if that's what you are excited about, we have a GitHub repository where you will find all about the differences in the time it takes. Another thing I wanted to quickly mention before we go forward is about WasmEdge and Kubernetes, or Wasm and Kubernetes.

A

We particularly want to share that the idea of having Linux containers and Wasm containers side by side is a great one. We want you to take a look at running your edge applications with Wasm, but we don't want you to run all your processes on Wasm. I would like to explain this with an analogy to CPUs, GPUs, and TPUs: if you are running a YouTube video, you probably don't want to use a TPU there, and it's the same case over here too.

A

You don't want to use Wasm containers for all your tasks; for a lot of your tasks, Linux containers would work just as well. So one thing we are focusing a lot on is having them run side by side, not having all your processes on just a Linux container or just a Wasm container.

A

So I'll start out with a demo about running Wasm jobs, and we have a few demos over here. I'll start with a demo I had been preparing just yesterday, which is running a MobileNet V2 model with WasmEdge. I first want to show how to run this app locally.

A

So we'll do that. This is also in a GitHub repository, and I have spent some time writing some docs so people can understand it. That's what we'll be doing today. This is actually a Rust application, so let me open what my main.rs looks like.

A

Let me zoom in a bit. Yes, so I'm actually using tract-tensorflow over here, which is a pretty popular library for running machine learning with Rust. It's actually multi-threaded, and Wasm doesn't support multiple threads, so we have made some changes to how tract is loaded and how tract does some things to still make it work. But essentially, what I want to quickly show over here is this.

A

We have a TensorFlow model over here, which is being loaded. We are doing some preprocessing, just the usual stuff, putting in an image, and then we are just running model.run. The tweak is something I also want to talk about, which is how Wasm interacts with the state. So I'm simply running a model inference over here in Rust, and what I want to do now is show running this locally.
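
The preprocessing and post-processing steps around the model call can be sketched with the standard library alone. This is not the code from the demo repository: the function names `normalize` and `top_class` are hypothetical, the scaling of pixels to the range 0 to 1 is an assumption about what a given MobileNet V2 export expects, and the model call itself is left out.

```rust
/// Scales 8-bit pixel values into the range 0.0 to 1.0.
/// The normalization a particular model expects is an assumption here.
fn normalize(pixels: &[u8]) -> Vec<f32> {
    pixels.iter().map(|&p| f32::from(p) / 255.0).collect()
}

/// Returns the index and score of the highest-scoring class,
/// or `None` if the model produced no scores.
fn top_class(scores: &[f32]) -> Option<(usize, f32)> {
    scores
        .iter()
        .copied()
        .enumerate()
        .fold(None, |best, (i, s)| match best {
            // Keep the current best when it is at least as high.
            Some((_, b)) if b >= s => best,
            _ => Some((i, s)),
        })
}
```

In an application like the one shown, the output of `normalize` would be reshaped into the input tensor, passed to the model, and the resulting score vector handed to `top_class` to pick a label.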

A

I'll first start out by building this Rust application as a WebAssembly module, and to do that, I'll specify my target to be wasm32-wasi. You do have to install the wasm32-wasi target, which I've already done, so I'll just start building this Rust application with the target set to wasm32-wasi.

A

That way it knows that it needs to create a .wasm module at the end. This will take a bit of time, and I would like to do this live.

B

It's worth noting over here that this is the step where we're taking a Rust function, and this could be done with any other programming language as well. For example, with C++ you'll be using Emscripten, and you could be using TinyGo or any other language. This is the step where we're generating the bytecode, the Wasm bytecode that can be used anywhere.

A

So we'll wait for it to run for a while. I was expecting it to go faster than this, but it seems the internet is a bit slow. Later we will also talk about WABT very quickly. Oh yeah, there we have... oh.

B

Always the case with live demos.

A

And the most interesting part is it worked yesterday.

B

And it didn't work out.

A

Okay, so regardless of that, I seem to have some dependency error, but the GitHub repository does have the right one, at least. What I wanted to show you is how you could easily compile it down to a WebAssembly module. What you'd get is a .wasm file.

A

Now the .wasm file can run anywhere. I also wanted to show you AOT compilation, which you can do with Wasmtime and WasmEdge. We are just showing an example of AOT compiling with Wasmtime here, which would compile it down to a .cwasm file. You can also get the Linux shared library format when you AOT compile this. And then I wanted to run it locally.

A

I also wanted to talk a bit about the WebAssembly text format. It allows you to get your WebAssembly module as fairly readable code, which is pretty helpful for debugging, so I just wanted to show that you can get a .wat file as well. All of this is up on a GitHub repository, so feel free to try it out for yourself. Unfortunately, I couldn't show it here. So I'll go back.

B

All right, and what we also want to showcase is the future of serverless functions. As we move ahead from writing functions and having to use managed hardware or resources to run these services, we are of course moving towards serverless platforms, which inherently have a number of benefits compared to the standard Linux containers or services that you might be using.

B

Compared to standard virtual machines, you get on-demand service, and of course it's very easy to scale up and down. So that's where WebAssembly is also becoming a really popular function-as-a-service platform, and that's what we want to showcase with edge devices specifically. As we mentioned earlier, some of the popular programming languages that are typically used to write these function-as-a-service calls, such as Python or JavaScript, are great.

B

But they do come with resource limitations and a lot of dependencies that might not run on various platforms. That's where WebAssembly comes into the picture, and the additional benefit that you get is the security sandboxing that is typical of WebAssembly, which provides a lot more isolation for functions written in WebAssembly.

B

So those are some of the benefits that come with serverless computing, when we are talking about how WebAssembly is now getting into this function-as-a-service space. Rishit, if you want to add anything.

B

For this also, we have a demo that will quickly demonstrate it.

A

I'll start by showing a demo of using Kubernetes to manage your WebAssembly modules. A very interesting talk right here at KubeCon yesterday was the Docker+Wasm preview, which they just announced yesterday. They had essentially written a shim for Wasm, which allows you to have containers and directly run them with Kubernetes as well.

A

There is also Krustlet, which is pretty popular and allows you to run WebAssembly modules with Kubernetes.

A

What we also want to show over here is something we had done to manage it with Kubernetes, and especially, we want to show how the shim layer works so you can run your WebAssembly modules. I have a Kubernetes cluster here in Azure, and I have two node pools.

A

If I go to node pools, you can see that I essentially have two node pools: pool one, which is the three nodes for system, and the mywasi pool, which is a single node and is a WASI node pool. This is the idea we want to talk about. You have a process, and you have multiple pods.

A

You run some of the pods, not on pool one, because that's the system node pool, but probably on a pool two, which would be Linux nodes, and then you run some pods on the mywasi pool. So you essentially run some of your processes on WASI, which can give you some pretty nice improvements over what you might be doing. So I have two node pools here, and right now what I'll do is schedule what I want on the WASI node pool.

A

That is what I do with this runtime class: I use it to schedule my pods to be run on the WASI node pool. I've also given the configuration for Spin over here, which is actually taken from the official Spin repository, but that's for if you want to go beyond edge use cases, and I will not really talk about that in this demo.
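
A Kubernetes RuntimeClass can carry a node selector, so pods that request it land only on nodes with matching labels. The Rust sketch below models just that matching step in a simplified way. It is not Kubernetes code: the `Node` type, the `eligible_nodes` function, and the label keys and node names in the usage note are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of a cluster node and its labels.
struct Node {
    name: String,
    labels: HashMap<String, String>,
}

impl Node {
    /// Builds a node from a name and a list of (key, value) labels.
    fn new(name: &str, labels: &[(&str, &str)]) -> Self {
        Node {
            name: name.to_string(),
            labels: labels
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
        }
    }
}

/// Returns the names of nodes whose labels satisfy every selector entry.
fn eligible_nodes<'a>(nodes: &'a [Node], selector: &[(&str, &str)]) -> Vec<&'a str> {
    nodes
        .iter()
        .filter(|n| {
            selector
                .iter()
                .all(|(k, v)| n.labels.get(*k).map(String::as_str) == Some(*v))
        })
        .map(|n| n.name.as_str())
        .collect()
}
```

With one node labeled `("runtime", "wasm")` and the system nodes labeled differently, a selector of `[("runtime", "wasm")]` returns only the Wasm node, which mirrors how the demo keeps Wasm pods on the WASI node pool and off the system pool.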

A

We'll just show how you can use slight as your handler, but I also put in the official code from the Spin repository. With that, we also have our container running over here. This is essentially a WebAssembly module converted to a container, and you can do that using wasm-to-oci, a pretty popular tool which allows you to take your WebAssembly modules and convert them to OCI-compliant containers.

A

So that is what has been done, and this is also using the containerd Wasm shim. This is actually one of the examples that uses the containerd Wasm shim, so we'll just try to run this.

A

This is one of the very popular publicly available containers for demonstrating WebAssembly modules inside Kubernetes, so we'll just show this. I'll start out with getting all the services I have. I do have one load balancer, which is called wasm-slight, created with this configuration itself, and I'll get the external IP.

A

So this is a very simple example to show WebAssembly modules running and being managed with Kubernetes. If I call the external IP and append /hello to it, I should just get a hello, which is what this container does. It just prints out hello, but it uses Wasm and WASI under the hood to do this, to make the system calls and so on.

A

We also have some more examples with running a TensorFlow model, all in Wasm and managed with Kubernetes, but we are already at 10 am. So that was our talk, and thank you so much for hearing us.

C

Thank you. We got time for some questions. If anybody's got any raise your hand and we'll bring you the mic.

D

You were saying before that you don't think we should run all the workloads as Wasm modules. There are some that are a better fit for a regular container and others that are probably more suitable for Wasm. For which workloads do you see Wasm shine? When would you say, yes, this is a good candidate for a Wasm container instead of a regular one?

B

Sure. So as we gave the example of machine learning over here: when we're doing machine learning inference, we need to be very quick. Those kinds of highly computational tasks, where it takes time to compute the inference, are the kind of tasks where you can use Wasm, because of the faster load time for spinning up a Wasm cluster and running a Wasm container, and because of the small size and quick inference.

B

You'll want to do all of those kinds of tasks using your Wasm container. With Docker, you could run all of your system files and all the system calls using your Docker container, because it's well supported, thanks to the entire ecosystem of Docker containers. When it comes to making system calls, that's where you're going to be using the interface between both Docker and WASI. But specifically, if you're asking for the use case for Wasm, it's mainly for doing all these heavy inference tasks.

E

Any other questions? Comments?

E

Is there an upper bound on how big a Wasm binary, or whatever you want to call it, can be? I mean, I'm thinking about AI models that get very, very large sometimes.

B

Do you mind repeating the question, please?

E

I'm wondering if there's an upper bound on the size of an executable, or not an executable, but a bytecode executable, for Wasm, for large AI models and things like that.

B

Oh, so is the question regarding Wasm executables for larger models? Rishit, do you want to take it up?

A

Oh sure. For larger models, you can still use Wasm, but I would urge you to first evaluate whether the model can be used on edge devices at all.

A

Wasm would allow you to raise the bar and take even larger models than what something like TensorFlow Lite would allow you to do, and I've been one of the contributors on TensorFlow Lite myself. Wasm definitely raises the bar when used with TensorFlow Lite, but for very large models, I think one of the right ways would be to work on optimizing the model itself, if it's a pretty large model.

A

So to get back on track, the Wasm counterpart will raise the bar because of the smaller container sizes. One of the examples we show is with the TensorFlow MobileNet V2 model: a Linux container to run the TensorFlow MobileNet V2 model, even if you try to make optimizations in size, would be at least 20 to 30 times larger than the Wasm container.

A

So Wasm already does a great job at raising the bar. I hope that answers the question.