From YouTube: wasmCloud Working Group - Machine Learning 04/14/22
Description
wasmCloud is a platform for writing portable business logic that can run anywhere from the edge to the cloud, and it boasts a secure-by-default, boilerplate-free developer experience with a rapid feedback loop.
https://wasmcloud.com
A
Welcome to the wasmCloud machine learning bi-weekly call. This is April the 14th, 2022. So, Christoph, Andrew, Steve: I guess we should probably start with the state of the demo. Christoph, you've been making a ton of progress, and we've now got pre-processing and post-processing, as well as the core work, and it's working with a bunch of different models. Steve, you did a demo on the wasmCloud community call on Wednesday.

A
Just yesterday. Is there anything else that we should review as far as what the state of the demo is now and where things are?
B
There are some minor tweaks, but I think those are probably lower priority than moving ahead with the next features for the demo. For tweaks, we can clean up the output.
B
Another tweak is doing performance benchmarks and figuring out places to optimize, but I think it's a little too early to put a lot of effort into optimization.
B
We could do a web-based demo, but that sort of goes in line with features of the demo. So I think it's pretty good as is, and until we decide on the next steps forward, we'll have to hear what other people think.
A
Christoph, what do you think? You've started to socialize this a bit with some broader communities. What was their response, or what was your perspective on that?
C
A
That's great. Andrew, I think you asked an interesting question around what this demo means for us in its current state, and so I think what we have now is...
A
The second thing I think distinguishes this demo, Andrew, is the ability to mount the capability providers, whether they're local or remote. So you could acquire an image from a local device but mount a capability provider that's remote. Another way you would do this is, you know, you would create some client-server structure.
A
You know, just submit it via a REST call or something like that. But I think this way is incredibly simple, because the developers don't need to change any of their code in order to design a process for consuming machine learning and then implement it. Deployment topology becomes a detail; it becomes a deployment detail, not an architectural criterion that I have to bring into my code.
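To make that point concrete, here is a minimal, purely illustrative Rust sketch (not the actual wasmCloud interface; all names are invented) of what "deployment topology as a detail" looks like from the actor's side: the business logic is written against an abstract inference contract, and whether the linked provider runs locally or remotely is decided at deployment time, not in code.

```rust
// Hedged sketch, not the wasmCloud API: the actor codes against a trait,
// and which provider fulfils it is a link-time / deployment-time decision.

/// The contract the actor's logic is written against.
trait Inference {
    fn classify(&self, image: &[u8]) -> String;
}

/// A provider running on the same device (e.g. next to the camera).
struct LocalProvider;
impl Inference for LocalProvider {
    fn classify(&self, _image: &[u8]) -> String {
        "label-from-local-provider".to_string()
    }
}

/// A provider reached over the network (e.g. in the cloud).
struct RemoteProvider {
    endpoint: String,
}
impl Inference for RemoteProvider {
    fn classify(&self, _image: &[u8]) -> String {
        format!("label-from-{}", self.endpoint)
    }
}

/// The actor's logic never changes: it only ever sees the trait.
fn handle_request(provider: &dyn Inference, image: &[u8]) -> String {
    provider.classify(image)
}

fn main() {
    let image = vec![0u8; 16];
    // Which provider is linked is a deployment detail, not an architectural one.
    let local = LocalProvider;
    let remote = RemoteProvider { endpoint: "cloud-host".to_string() };
    println!("{}", handle_request(&local, &image));
    println!("{}", handle_request(&remote, &image));
}
```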
A
So
that's
the
the
second
thing
that
I
think
that
it
means,
and
then
I
think,
there's
a
broader
third
thing
here
around
you
know.
The
connectivity
of
this
in
you
know
like
this
is
now
so
easy
to
take
a
set
of
code
and
because
it's
already
tied
into
all
of
the
other
ones.
I
think
that
this
really
opens
up.
You
know
an
architecture,
that's
very
popular,
you
think,
like
nest,
type
architecture
where
you've
got
a
lot
of
edges
and
end
points.
This
just
makes
building
those
things
very
easy.
A
So I think the combination of distributed wasmCloud plus machine learning does unlock a few powerful demos. Are there other things that this demo means today that I didn't accurately account for? My list was: pre- and post-processing, so all this stuff around the machine learning now has portability; the ability to locally or remotely mount capability providers; and the sort of architectural collaboration that unlocks. Is there anything else that you guys would highlight?
D
Sounds about right to me. Can I rewind to ask a couple of questions about the original, not to derail too much here, but with the pre-processing that you guys have built so far?
D
B
Okay. I've tried with about 15 different, varied images from the web, JPEG and PNG, and it looked fine on all of them. So I would add a caveat to Liam's statement about where we are in terms of architecture.
B
I think this architecture is still in the prototype stage, and even the block diagram that I showed in the community call yesterday, I think that could change once we get further along and into a demo that's more realistic and more full-featured, or plugged into a real-world application. I don't think we would want to declare that we've decided on the right architecture entirely and that people should just go and run with this now. I think it needs a little more.
B
It needs some more proving out in the real world, and that's partly related to performance issues. I know there are some things that Kevin and I have been talking about around response time that will probably rear their heads when the models get a little more complicated and their response time increases.
B
So there are a few subtle variations. I think we're going in the right direction, and I'm not aware of any completely wrong turns, but we might make a 30-degree course correction in the architecture once we get more experience with it.
A
B
Yeah, so I guess the term architecture is a little ambiguous. Nothing you just said is wrong; what I was thinking about was the topology, the way the message passing works. Right now we send the image to the API actor.
B
So some of those kinds of things might change, and those are the kinds of things where, if we were to tell ten developers to go off and build your application with this, the variations in those would cause a lot of duplicate work and maybe some frustration. And so those are things where it's not just that...
B
I want to test with bigger models or more complex models, but I also want to test in some real-world applications, ideally in more than one domain, so image plus something else. Getting that real-world application in there, along with all the other things it's going to require for the flow to work, I think is important before we say that we have...
C
Are you open for a feature request? Maybe this is a good time to introduce again what I once called a low-latency mechanism. I dream of something where maybe there's an intermediate actor, something which mediates between today's capability provider, which does the inferencing, and a new capability provider which represents a data source, regardless of whether it's a camera or something else.
B
Yeah, so that work is in progress. For the other folks on the call:
B
We still have a default two-second response time anywhere in the chain for any RPC message, and that includes from the HTTP server to the first API actor. All of the other things that happen after that, the pre-processing, the inference, the post-processing, have to complete cumulatively within two seconds before the API actor returns to the HTTP server. And if it takes longer than two seconds, it'll get a timeout.
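As a rough illustration of that cumulative budget (plain Rust, not wasmCloud code), the stages share one deadline rather than each getting a timeout of their own; the stage durations below are made up for the example.

```rust
// Hedged sketch: pre-processing, inference, and post-processing must all fit
// inside the single RPC timeout applied to the API actor's request.
use std::time::{Duration, Instant};

const RPC_TIMEOUT: Duration = Duration::from_secs(2); // default discussed on the call

fn stage(name: &str, cost: Duration, deadline: Instant) -> Result<(), String> {
    if Instant::now() + cost > deadline {
        return Err(format!("{name} would exceed the RPC timeout"));
    }
    std::thread::sleep(cost); // stand-in for real work
    Ok(())
}

fn handle(pre_ms: u64, infer_ms: u64, post_ms: u64) -> Result<(), String> {
    let deadline = Instant::now() + RPC_TIMEOUT;
    stage("pre-processing", Duration::from_millis(pre_ms), deadline)?;
    stage("inference", Duration::from_millis(infer_ms), deadline)?;
    stage("post-processing", Duration::from_millis(post_ms), deadline)?;
    Ok(())
}

fn main() {
    // 300ms + 400ms + 100ms fits the budget; a heavier model (1800ms) does not.
    println!("{:?}", handle(300, 400, 100));
    println!("{:?}", handle(300, 1800, 100));
}
```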
B
You can increase that timeout with a system-wide configuration, but what Kevin and I have talked about is a way to send fire-and-forget messages from actors. That'll allow different kinds of technologies, and it'll allow paths that take longer than two seconds. So that's in progress; we should see that change soon, probably within a couple of weeks. But that's separate, and it might actually influence the topology that we end up recommending. And there are also some other things.
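The difference being discussed can be sketched with plain std Rust channels: the request/reply path is bounded by a timeout, while the fire-and-forget path returns immediately and lets a slow pipeline deliver its result later. This is only an illustration of the pattern, not the wasmCloud messaging API.

```rust
// Hedged sketch of request/reply with a timeout vs fire-and-forget.
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

fn main() {
    let (tx, rx) = mpsc::channel::<String>();

    // A "provider" that takes longer than the two-second RPC budget.
    thread::spawn(move || {
        thread::sleep(Duration::from_secs(3));
        let _ = tx.send("inference result".to_string());
    });

    // Request/reply: the caller blocks and gives up after the timeout.
    match rx.recv_timeout(Duration::from_secs(2)) {
        Ok(reply) => println!("got reply: {reply}"),
        Err(_) => println!("request/reply timed out after 2s"),
    }

    // Fire-and-forget: the caller would simply send and move on; the result
    // arrives later through some other path (callback actor, message topic, ...).
    println!("fire-and-forget: caller returns immediately, result arrives later");
    if let Ok(late) = rx.recv_timeout(Duration::from_secs(2)) {
        println!("late result eventually available: {late}");
    }
}
```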
B
I know Andrew had brought up the idea of trying to get zero copy on the tensors. I put that under the category of performance optimization, and as we dive into it the API will probably change a little bit. So that's another thing we'd like to get a little more practice with before we tell a bunch of people to go run with it.
A
No, no. So Christoph, what I hear from Steve is that before we go down the zero-copy approach, we should maybe try to... you know, Steve, I don't know if you have a specific proposal here. Do you think we should spin up a couple of models ourselves and just run them? You know, point a camera out my window, run the dog recognition model, and figure out what's the most popular dog walked in the suburbs of Washington, DC?
A
What do you suggest? Or do we try to find and recruit a couple of folks that are building things? Like, Christoph, maybe you have some concepts that we could lean in on and help get up and going, to try to generate some data and some experience with this.
C
There are a lot of potential benchmarks out there; I would just grab some existing stuff. For example, the other week I switched on for the first time that rather new Coral Dev Board and ran the demos which are shipped with that board. There are some nice things, and many of them also deploy that MobileNet model. One of those we also have here, so we could more or less easily compare the results. Maybe that's a good start.
A
So you want to use some of the existing benchmarks that are already shipped with the Coral, set them up in wasmCloud, and see what the difference is running them through there. What's the test harness that the Coral has? Is it just some Python scaffolding or anything like that?
C
Yeah, it's C++ and it has a Python wrapper, but the core is C++. Of course TensorFlow Lite, that's a C++ library.
C
D
Yeah, that'd be pretty interesting to see what that looks like. In terms of your question on the Slack channel: are you envisioning that at the wasmCloud level users will be able to see the different hardware that's performing the inference? Would the fact that the inference is being performed on a TPU be exposed at the wasmCloud level or not?
B
It should be transparent, but we could expose it in terms of adding information to the response. If you want to use that information for routing the request, we could also try to come up with something somewhat abstract, maybe a choice of optimize for speed or cost or something like that, and then have some router decide.
B
We try to keep architecture-specific stuff out of the messages, but if there are any kinds of goal parameters that might make sense to add to the request, we could figure those out.
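A hedged sketch of what such goal parameters might look like, with all names invented for illustration: the request carries an abstract goal, and a router maps it to a provider without any hardware details leaking into the actor's code.

```rust
// Hedged sketch of abstract "goal parameters" on an inference request.
#[derive(Debug, Clone, Copy)]
enum OptimizeFor {
    Speed, // e.g. prefer an accelerator close to the data
    Cost,  // e.g. prefer a cheaper shared CPU instance
}

struct InferenceRequest<'a> {
    model: &'a str,
    tensor: &'a [f32],
    goal: OptimizeFor,
}

#[derive(Debug)]
enum Target {
    EdgeTpuProvider,
    CloudCpuProvider,
}

/// Stand-in for whatever component would do the routing; the actor itself
/// stays free of architecture-specific knowledge.
fn route(req: &InferenceRequest) -> Target {
    match req.goal {
        OptimizeFor::Speed => Target::EdgeTpuProvider,
        OptimizeFor::Cost => Target::CloudCpuProvider,
    }
}

fn main() {
    let tensor = vec![0.0_f32; 4];
    let req = InferenceRequest { model: "mobilenet_v2", tensor: &tensor, goal: OptimizeFor::Speed };
    println!(
        "model={} tensor_len={} goal={:?} -> {:?}",
        req.model,
        req.tensor.len(),
        req.goal,
        route(&req)
    );
}
```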
D
I think actually I was leaning the other way, towards not exposing that type of information at the wasmCloud level. I was just checking, and it sounds like the proposed path would be: the capability provider just runs either on the Coral Dev Board or on some other machine.
D
wasmCloud really doesn't know the capabilities of that machine, but the capability provider itself does, and it can choose. It can say: oh, I see I have a TPU, I'm going to use the TPU; oh, I don't have a TPU over here, I'm going to use the CPU. Is that sort of the model we're looking at here? Yeah.
B
Yeah, that's correct. The provider can decide which of the available resources on the host where it's running it wants to use. In addition, there are some network smarts built into wasmCloud, so if your Edge TPU is closer, say connected to the same NATS leaf node, then all requests will go to it by default. But if the Edge TPU is not on the network and there is something in the cloud, that is, if the same capability provider is running on a cloud server, then it will route to that. So we'll get the kind of failover scenario that you and Minku were talking about. Yeah.
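A minimal sketch of the selection logic described above, assuming, purely for illustration, that the Edge TPU shows up as a device node the provider can probe; nothing hardware-specific has to appear in the wasmCloud messages themselves.

```rust
// Hedged sketch: the capability provider inspects its own host and picks a
// backend, falling back to the CPU when no accelerator is present.
// The device path is an assumption used only for this example.
use std::path::Path;

#[derive(Debug)]
enum Backend {
    EdgeTpu,
    Cpu,
}

fn select_backend() -> Backend {
    // On a Coral board the Edge TPU typically appears as a device node; if it
    // is absent we quietly fall back to the CPU.
    if Path::new("/dev/apex_0").exists() {
        Backend::EdgeTpu
    } else {
        Backend::Cpu
    }
}

fn main() {
    println!("provider will run inference on {:?}", select_backend());
}
```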
D
A
Okay, so I'm kind of feeling like we have a consensus here on a path forward for next steps. It sounds like the demo, Christoph, that we're going to do is: we're going to maybe focus on the Coral Dev Board.
A
Steve, if you don't have one, I can send you one if I didn't already. We're going to pull some of the sample examples that they have, try to configure the same examples in wasmCloud locally, compare them side by side, and use that as a baseline for how we think we're doing. Then we'll use the results from that to understand what we think the next direction is for the demo. Christoph?
A
I guess you didn't hear that; you may have stepped away for just a minute. I said it sounded like we have some consensus that we're going to try to recreate or clone some of the default Coral Dev Board examples using the TPU, configure them in wasmCloud, and just run them side by side to see an initial pass at the differences between the two. Then we'll use the output of that to understand where we need to pursue and where we invest.
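A simple harness along these lines could time the same inference on each setup and report mean latency side by side; the workloads below are placeholders standing in for the stock Coral example and the wasmCloud pipeline.

```rust
// Hedged sketch of the side-by-side comparison: time each pipeline over a few
// runs and print the mean latency. Replace the closures with real calls.
use std::time::{Duration, Instant};

fn bench<F: FnMut()>(name: &str, runs: u32, mut f: F) {
    let start = Instant::now();
    for _ in 0..runs {
        f();
    }
    let mean = start.elapsed() / runs;
    println!("{name}: mean latency {:?} over {runs} runs", mean);
}

fn main() {
    // Simulated workloads; in practice these would invoke the Coral example
    // and the wasmCloud-hosted pipeline on the same input images.
    bench("coral-example", 10, || std::thread::sleep(Duration::from_millis(5)));
    bench("wasmcloud-pipeline", 10, || std::thread::sleep(Duration::from_millis(8)));
}
```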
C
That's a very good idea. I'm very curious not only to run models on the Edge TPU, but also on these ARM cores, because I read that the inference engine we use, tract, is supposed to be optimized for ARM cores; the people who wrote it had ARM in mind. You can also run the inference on the CPU on the dev board, right? So I'm really interested to compare.
A
All right. Steve, Andrew, are we all in agreement here on what our next steps are?
D
A
Well, Andrew, I think what we're heading towards, you know, sort of gives us the kind of test for your use case, which is...
C
A
Routing as needed. Do you feel this is a prereq, that this will get us one step closer to the demo that you want to see done with this?
D
Yeah, I think so, because a couple of weeks ago or a month ago I know I was talking about how we need the data, right? We need to understand what the performance is at the edge and what the performance is in the cloud. And as you guys are running these experiments, if there's a way to capture that in a wasmCloud-accessible way, then that can inform the QoS attributes that we were talking about earlier.
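One possible shape for capturing that data, purely as a sketch and not an existing wasmCloud metrics API: record observed inference latency per deployment location so a router, or the goal parameters discussed earlier, could consult it later.

```rust
// Hedged sketch: a small running record of inference latency per location.
use std::collections::HashMap;
use std::time::Duration;

#[derive(Default)]
struct LatencyStats {
    total: Duration,
    samples: u32,
}

impl LatencyStats {
    fn record(&mut self, d: Duration) {
        self.total += d;
        self.samples += 1;
    }
    fn mean(&self) -> Option<Duration> {
        (self.samples > 0).then(|| self.total / self.samples)
    }
}

fn main() {
    let mut by_location: HashMap<&str, LatencyStats> = HashMap::new();
    by_location.entry("edge-tpu").or_default().record(Duration::from_millis(12));
    by_location.entry("edge-tpu").or_default().record(Duration::from_millis(15));
    by_location.entry("cloud-cpu").or_default().record(Duration::from_millis(90));

    for (loc, stats) in &by_location {
        println!("{loc}: mean latency {:?}", stats.mean());
    }
}
```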
A
I like what we're thinking about here, like the QoS of ML. Yeah, there's something there.
D
A
C
A
Adaptive ML would be super extra credit, right? Because then we would get to call it AML, and we'd have a conflict with anti-money laundering on the abbreviation. Always attack when you can, you know, just muddle the communication even more. All right, I love this; this is great. Okay, so do we need to talk about who's going to... Steve and Christoph, the two of you seem to be doing a great job managing the work between the two of you.
A
Do we need to talk about any of that, or do you guys want to rely on anything, or are you just going to hand-wave and continue to make magic happen? Sort of like when the fairy godmother comes down and waves her wand, and the pumpkins get up and dance? That's what I feel like happens here. You guys just... it's amazing. Yeah.
B
Yeah, Christoph and I can follow up on Slack to fine-tune.