From YouTube: Apache TVM Community Meeting, March 18, 2021
Description
Apache TVM Community Meeting Topics:
Status update on µTVM
Status update on Project Generator API for µTVM
A
All right, so again, everyone, welcome to the TVM community meeting. Let me share the agenda. Okay, we'll get started off with some introductions. If there is anybody who is new to the community, or who hasn't come to the meetings before, we'd be happy to meet you and welcome you to the community. I'm Chris Hoge. I work for OctoML as the developer advocate for Apache TVM, so I'm here to help do things like run the community meetings and help write some docs, and hopefully make using TVM and being part of the TVM community an easier and better experience for everybody. So welcome to the meeting, everyone.
B
Hi, I think this is actually the first meeting I'm attending. I started to work with microTVM and TVM recently. I work as a software engineer at Linaro. So yeah, hi, everyone.
B
I had the chance to talk several times with Andrew before, so yeah.
D
Hi, I'm Chris. I've recently joined Arm, and I've started looking at micro TVM-type things. I'm very, very new; I've only been at Arm for a few months, so happy to be here.
A
Welcome
this
is
this
is
a
great
first
meeting
to
be
to
be
joining
since,
since
we're
going
to
be
largely
focused
on
a
bunch
of
micro
tvm
topics
as
we
move
into
the
meeting.
So
welcome
to
the
meeting
chris
into
the.
A
Community,
okay,
so
with
with
with
the
with
the
introductions,
completed
whoops
we're
going
to
move
on
to
some
announcements,
the
so
the
apache
tvm
community
recognizes
people
who
are
contributing
to
the
project
with
with
different.
You
know,
with
different
status
of
you
know,
as
a
reviewer
or
as
a
committer,
and
so
over
the
last
month,
we've
we've
promoted.
A
Two
members
of
the
community,
so
we'd
like
to
welcome
andrew
royce,
is
as
a
as
a
new
committer
to
the
tvn
project,
as
well
as
dmitry
smirnov
as
a
new
reviewer
to
the
project.
So
a
big
round
of
applause
to
them,
and
thanks
for
all
of
the
work
that
they've
done
on
the
project,
all
of
the
all
of
the
work
that
they've
done.
A
I don't know if there are any other announcements from the community that people might want to make. If there are any talks, any presentations, or any papers you've published with TVM, we'd love to hear about those here too, so that we can get the word out to everyone.
C
I think we're giving a number of talks from the OctoML side. Luis and Jared will be speaking at NVIDIA's GTC, and I'll be speaking at the Stanford MLSys seminar for some applied ML and TVM evangelism. There are a couple of others I'm blanking on right now.
A
Alrighty, so with that, let's move into the primary agenda of the meeting. We're going to start off with a micro TVM update and then move into a Project Generator update.
G
Cool, yeah, thanks so much for organizing, Chris. I'm Andrew Reusch; I work on micro TVM, for those of you who don't know me. Let's see, maybe I could share my screen. I don't have anything super formal put together, but I thought it would be a good time to walk through our M2 roadmap, which is what we're centering all of our work on. Actually, sorry, Chris, could you enable participant screen sharing?
A
I just turned it on.

G
Okay, cool, thanks. So, at a high level, the way we plan micro TVM is that we use these roadmaps to decide all the different projects that are active at one time. Our previous one, which we linked here, was a standalone micro TVM roadmap, and each roadmap has goals and things like that.
G
Last year we basically worked toward the goal of getting a model to run standalone on the device, without interaction from the computer, and we hit that goal. So now, hopefully in the first half of this year, without putting too exact a date on things, we're hoping to broaden our ability to support runtime environments.
G
Right now we support Zephyr fairly tightly, and we'd like to add more RTOSes, or platforms, or other ways of running models on devices. We'd also like to extend that idea of environments to also mean other architectures. We have Arm, we have RISC-V, and there are other microarchitectures out there that you could imagine adding.
G
So
we
just
wanna
sort
of
you
know,
prove
out
our
flexibility
there,
as
well
as
consider
architectures
that
are
kind
of
emerging
like
support
for
accelerators
on
socs,
which
are
sort
of
becoming
increasingly
close
to
reality,
these
days
in
the
microworld
and
okay.
So
then,
the
next
one
we
want
to
do
is
we
were
looking
to
build
a
an
integration
with
some
command
line
tool
for
tvm
and-
and
we
have
kind
of
some
initial
work
on
on
a
tool
called
tvmc
and
that's
actually
packaged
with
tvm.
G
You,
you
get
it
when
you
install
it
the
tvm
package,
but
we
don't
have
support
in
tvmc
right
now
for
micro,
tvm,
workflow,
basically,
and
so
right
now,
if
you
want
to
use
micro,
tvm,
you're
kind
of
writing
a
python
script,
and
that
makes
it
really
hard
to
use
micro
tvm
as
a
tool
in
embedded
development.
G
The
last
thing
we
wanted
to
do
is
was
provide
a
little
bit
better
estimates
from
the
tbm
side
on
kind
of
the
model
footprint,
how
much
ram
how
much
code
space
are
we
going
to
be
using
on
on
your
device
before?
Hopefully,
before
you
run
your
firmware
compiler,
at
least
on
the
ram
size,
it
should
be
possible
to
estimate
that
a
little
bit
better
and
so
those
are
kind
of
our
goals
and
then
and
then
to
those
goals.
G
To those goals, we define projects that we think individually contribute and, as a whole, move us toward them.
G
So I thought, as a start for the micro TVM update, we could just go through the projects that I know are in progress, and if anyone is working on any of these projects and I haven't heard about it yet, I'd love for you to speak up and give us an update. I'll start with the first one, which is the library generator, also called Model Library Format. I have a couple of the RFCs pulled up here.
G
I don't want to go through a lot of these in detail, because I think a lot of people here have looked over them, but briefly: this is a standard output format from TVM, and I actually have an example of it. Oh, I need to reshare.
G
Maybe I'll just share my whole desktop. Can everyone see that same window now? Can everyone see Finder? Yep, that looks good? Okay, great. So, Model Library Format: I've got a little example here. I don't know if this is too small for people, but effectively this is a set of output files that we think closely reflects the output of tvm.relay.build. Just to come back over here, that's this function here, which is our core build entry point.
G
You get graph JSON, which configures the graph runtime; you get a library, which contains the generated code or compiled code, depending on your target; and you get parameters, which I've been calling simplified parameters in my micro TVM talks, meaning parameters that have been optimized by the compiler. What I've done here is create a disk tree format that has a place for each of these things.
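To make the three outputs concrete, here is a minimal sketch, assuming a toy model like the one in the demo; depending on the TVM version, relay.build returns either a bare (graph, lib, params) tuple or a factory module wrapping the same three artifacts, as shown here:

    import tvm
    from tvm import relay

    # Toy model matching the demo: add two 1x2 tensors.
    a = relay.var("a", shape=(1, 2), dtype="float32")
    b = relay.var("b", shape=(1, 2), dtype="float32")
    mod = tvm.IRModule.from_expr(relay.Function([a, b], relay.add(a, b)))

    # Compile for a bare C target.
    with tvm.transform.PassContext(opt_level=3):
        module = relay.build(mod, target="c")

    # The three artifacts described above.
    graph_json = module.get_graph_json()  # configures the graph runtime
    lib = module.get_lib()                # generated/compiled code
    params = module.get_params()          # simplified parameters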
G
Here you see the graph JSON, which you saw before. At the top level we've got a summary of the model, and you can see this is a super simple model. I don't know if everyone can read this very well, but in here we're just adding two tensors of size one by two together. So this is a really simple example, just intended to make sure things are working for another demo I'll do a little bit later. There's also a metadata file at the top level.
G
That metadata file gives an idea of the memory, the different tensors that we need to allocate. For this simple example we've got a tensor for a, a tensor for b, and then a tensor for the sum of a and b. You can also learn about the target that was used to compile this and the runtimes that are enabled. There's a model name parameter that'll become more important a little later, and versioning, to allow downstream tools to make sure they can actually interpret this output.
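As a rough illustration of what such a metadata file carries, here is a sketch in Python dict form; the exact key names are assumptions, not the committed schema, but the fields mirror the ones named above:

    # Illustrative sketch of the top-level metadata contents.
    metadata = {
        "version": 1,                  # format version for downstream tools
        "model_name": "add_example",   # hypothetical name for the demo model
        "target": "c",                 # the target used to compile
        "runtimes": ["graph"],         # runtimes enabled in this build
        "memory": [                    # tensors to allocate at runtime
            {"storage_id": 0, "size_bytes": 8},  # input a: 1x2 float32
            {"storage_id": 1, "size_bytes": 8},  # input b
            {"storage_id": 2, "size_bytes": 8},  # output: a + b
        ],
    }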
G
Then, looking at the actual code, we've got a codegen directory. Underneath it, this host directory is intended to group together all of the code that's intended to run on the host CPU. If you were compiling a model in some sort of heterogeneous scenario, you might have another directory here that would contain code to run, say, on an accelerator or a GPU, that kind of thing. The intent is to accommodate that here.
G
Inside of host we've got src, and you can also have a lib directory if you were compiling with the LLVM backend. Here you can see the two source files we produce: one contains a function registry, which is for the C runtime when doing remote code execution, and the other is just the implemented addition. The last thing I should say is that there's also a very small parameters file, which contains the third output from tvm.relay.build.
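Putting the pieces described above together, the on-disk layout looks roughly like this; the file names are illustrative, only the structure follows the talk:

    model_library_format/
        metadata.json        # summary: memory, target, runtimes, model name, version
        graph.json           # graph runtime configuration
        parameters/          # simplified parameters (third relay.build output)
        codegen/
            host/            # code intended to run on the host CPU
                src/         # two C sources: function registry + the add operator
                lib/         # present when compiling with the LLVM backend
            <other-target>/  # e.g. accelerator or GPU code, in heterogeneous builds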
G
So that's, hopefully, a whirlwind tour of what we've got for Model Library Format right now. This format is not set in stone; we put a version number there on purpose, and we intend to expand it. One known issue I've discovered is that we don't currently support producing this from plain tvm.build, which is the function we use to build single operators, so we'll have to look at that.
G
So that's that project. Next we have autotuning support, and I've actually got a PR for this up. It's close to ready to merge, but it has a bit of a problem where it expands the set of TVM dependencies that are required, so I was letting this one sit for a moment while I work on the Project Generator API; as we'll discuss later, that's meant to resolve this problem.
G
Nevertheless, you can run this if you pull the commit and see how it works. So that's the status there. For the ahead-of-time runtime, I know we've got this RFC here from Giuseppe, and I believe there's been some implementation done. I don't know if you're interested in giving a short status update, or we can leave it at that for now and get a more comprehensive one at a later date.
H
Yeah, we are developing it; we're not too far from a first draft PR. There isn't much more I can add than that: it's getting there. I don't know exactly when; we don't have an estimate for sure.
H
But it's getting there, and the main thing is that I think we've got a sort of agreement. There are still a few minor points to be agreed upon, but we can discuss those on the PR when it's ready.
G
We've discussed here some of the different high-level aspects of the implementation, and as you're working through an RFC it's nice to get some high-level feedback and then go make a PR that people can look at in more detail. I think we're definitely in that phase of things, where it's just good to have more context, and then we can come back and have another discussion at that point.
H
Yeah, so basically there's this memory manager, which we basically borrowed from the graph runtime's memory management; we're not touching it, we're sort of reusing it for now. The memory still needs to be specified as a constant in the runtime, but once the work is done it's going to be very easy to have something a bit better than that.
G
Definitely. I think from our side we see the memory management, and I assume what you're asking about, Michael, is the desire to remove dynamic memory allocation from the runtime, as a little bit separate from the AOT runtime. For the AOT runtime, we want to focus on...
G
Let's see if I have an example up here. Basically, assembling a sort of run function that traverses a static network in order and just invokes the compiled operators one after another. As for how we supply the tensors, there's actually another entry on this M2 roadmap, which we can talk about in a minute, that sketches how we might do memory pinning, or some sort of memory allocation scheme that doesn't require dynamic memory.
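The generated artifact is C produced from TIR, but conceptually the AOT run function is just a fixed call sequence; here is a hypothetical Python rendering of that idea, with all names invented for illustration:

    import numpy as np

    def fused_add(a, b, out):
        # Stand-in for one compiled operator body.
        np.add(a, b, out=out)

    def tvm_run(a, b, out):
        # Static execution order baked in at compile time: no graph JSON
        # to parse and no dynamic dispatch, just direct operator calls.
        # A deeper model would simply have more lines, one per fused
        # operator, with intermediate buffers passed explicitly.
        fused_add(a, b, out)

    a = np.array([[1.0, 2.0]], dtype="float32")
    b = np.array([[3.0, 4.0]], dtype="float32")
    out = np.empty_like(a)
    tvm_run(a, b, out)  # out == [[4.0, 6.0]]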
I
Yeah, kind of. I was going in the direction of, I think you were mentioning something about unifying the Relay memory planning, so basically the intra-operator and inter-operator memory planning.
H
We're producing a sort of TIR model, so this function that we showed, this tvm_run function, is generated through TIR. We will have a TVM run function, and once you have that, everything is TIR, and then you can do the sort of unified memory planning we want.
H
That's an enabler, but for now we just deliver AOT, and after AOT we can work on unified memory planning.
J
Hi, my name is Ramana; I also work at Arm. I think another way of saying that is that we see the static memory planning, or the AOT memory planning, as a key part of enabling micro TVM on embedded systems, and we're also very interested in it from the point of view of supporting the Cortex-M architecture, as well as the Ethos-U microNPU that we are working on.
J
The plan is to go through the thin slice of ahead-of-time, right from the front end all the way to being able to execute stuff in an ahead-of-time fashion on a device, and then go from there.
G
Yeah, I think that's exactly right. The other thing I was going to say is that we found the JSON parsing itself actually takes considerable memory overhead, so even just switching to this AOT methodology will buy us quite a bit in terms of RAM utilization on the device.
G
Sorry, Ramana, I didn't mean to step on you.

J
No, no, that's absolutely correct.
G
The other thing I wanted to bring up here is that there's a parallel effort that has had a little bit of traction, and we're working on it as we can: it's called "remove compile engine," and we're sort of stalking Jared's branches here. One of the key things we need to be able to do, in order to pursue this world of a unified memory planner that knows about both intra-operator and inter-operator memory, shows up if you look at a lot of our operator implementations.
G
I don't know if I have an example here to show you, but you'll see calls to TVMBackendAllocWorkspace in our functions, and then there's a separate call, TVMDeviceAllocDataSpace, which on the micro TVM side funnels into TVMPlatformMemoryAllocate. TVMDeviceAllocDataSpace is used for the inter-operator memory: if you have an operator that produces a tensor, it uses that call. And if you want to get a temporary working area from within an operator, we have the other call, TVMBackendAllocWorkspace. One of these lives in the C backend API.
G
TVM expects the C backend API to be implemented just to run operators, because any of those functions are legal to call from operators. Meanwhile, the TVM runtime API call, TVMDeviceAllocDataSpace, is the one we use from NDArray and from our graph runtime, I believe, to allocate these inter-operator tensor memories. And for micro TVM, it all goes to the same place.
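A toy sketch of that point, with the two C entry points modeled as Python functions; the real calls are TVMBackendAllocWorkspace and TVMDeviceAllocDataSpace, and the shared-pool behavior below reflects the micro TVM case described next:

    # Both allocation paths, modeled in Python. On micro TVM both
    # ultimately draw from the same platform memory pool.
    _platform_pool = bytearray(64 * 1024)  # illustrative fixed pool
    _watermark = 0

    def _platform_allocate(nbytes):
        # Stand-in for TVMPlatformMemoryAllocate: bump-allocate from the pool.
        global _watermark
        offset = _watermark
        _watermark += nbytes
        assert _watermark <= len(_platform_pool), "out of memory"
        return memoryview(_platform_pool)[offset:offset + nbytes]

    def backend_alloc_workspace(nbytes):
        # Intra-operator scratch space, requested from inside an operator.
        return _platform_allocate(nbytes)

    def device_alloc_data_space(nbytes):
        # Inter-operator tensor storage, requested by NDArray / graph runtime.
        return _platform_allocate(nbytes)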
G
It goes to this TVM platform memory, and that's what Giuseppe is saying, but we do still have this sort of bifurcated concept at a higher level. Pursuing this compile engine refactor will create a sort of fused TIR function. What's the right way to say this: while we're compiling a Relay function, we partition the function into what I would call TIR tasks.
G
These are the individual units that you can autotune, and at the end of that process we don't ever, I believe, assemble a sort of glue function that then represents the call graph in TIR again. Once we partition that call graph, we don't have a good way to visit it with your traditional compiler visitor architecture. So this remove-compile-engine change is the first step toward fusing the graph back together.
G
At the end of the compilation step, we think that will, number one, enable a bunch of interesting graph-level optimizations, but also allow us to run the graph memory planner over the entire original Relay function as it's implemented there, and so that will let us do memory planning across the entire graph. So hopefully that clarifies where we're going with that, Michael.
F
Yeah, sorry, this is Manupa from Arm. In that proposed unified TIR module view, will the primitive function boundaries be there, or will they get inlined?
G
I believe they'll stay separated, and the idea is to make them appear as if they're combined in the visitor logic, but don't hold me to that, because I haven't read this change as closely as I should have. If you have an opinion on that, there will certainly be plenty of time for you to voice it. And with changes like this, it's worth pointing out to the community that as we develop them...
G
...we always want to go through an RFC process to make sure people are okay with these changes, and for this particular change, I'm showing you an early look at our proof-of-concept work. Especially as you're writing an RFC, it's really helpful to do a proof of concept to make sure that what you're writing matches reality and that you aren't talking past yourself.
G
So at some point, when we actually come to the point where we want to merge this, we'll develop an RFC and release it on the Discuss forum, and especially for points like this, I think that's a great place to bring it up. And just because we've finished the proof of concept doesn't mean the RFC is too late to ask for things like that. So, cool, okay.
I
One last thing: is the code available in the Git repo somewhere?

I
Yes, the PoC code.

H
So you mean the PR that backs this RFC, Michael, right?

I
Yes, that's what I mean.

H
No, not for now. It will be available soon.
G
Great, okay. So let's go back to our roadmap here. I think we've covered some of these topics out of order, I should say. Next up, we have this RISC-V ISA support line item.
G
I don't know that much has been done on this yet, but we'd still like to work on it. Zephyr has a couple of boards defined for RISC-V, and I think the easiest way to start tackling this would be just to work with that. Actually, I'll walk that back a little bit: I think there was a forum post on this last month, I want to say, where someone had worked through it. I need to revisit that first.
G
I couldn't give you a summary of exactly where that stands, but I think it did get some traction, so there is some work being done there. The comprehensive memory planner is what we just discussed: the idea that we have these two different ways of allocating memory from the compiler's perspective, and we'd like to at least see all of them in the memory planner. Whether or not things get fused into a single back-end call, or whatever, is sort of up for discussion.
G
I think that'd be another question. The project-level API we'll discuss after we're finished going through this milestone roadmap list, but just to give you a taste of it, because I want to talk a little about what it means for the next item here: the current way we integrate with platforms and RTOSes is as follows.
G
We use a set of Python interfaces, and we place the implementations of those interfaces inside of TVM. That means that if you ever need to import packages from your particular RTOS, say some Zephyr dependency like a serial library...
G
...well, technically, TVM has a dependency on that now, because we've got code in the TVM code base to do it. And it gets tricky, as we're trying to do these parallel efforts like cleaning up our CI dependencies, to have this code in the repo.
G
So with the project-level API, we're converting that Python interface into an RPC layer, as well as cleaning up the interface to map it more toward what you would traditionally think of as an embedded build flow. We'll talk a little bit more about that.
G
But one of the upsides of doing this, especially the interface cleanup, is that, as we discussed earlier, we want to start building these command-line tools for TVM. TVMC is what we're building toward, and with TVMC we want to be able to do things like take a Relay model, or some ONNX model, or any of the formats that TVM can import from, and compile it into C.
G
And then, if we're interested in the autoTVM flow, we want to be able to generate a firmware project, build the firmware project, flash the firmware project, and drive remote execution on a device. Doing all of that is in the domain of the project-level API, but it then allows you to also add those features to the TVMC command-line tool, rather than having these Python scripts that you run which contain a bunch of code. I wonder if I put up an example of that.
G
A lot of our demo code looks like this right now: it's a bunch of Python, and you have to write custom Python to match your workload and drive the entire compile-build-flash toolchain from there. That's a big ask for a developer who's just coming to TVM to try to add a machine learning model, particularly if they've got a bunch of other project-specific stuff already set up.
G
So the idea with this TVMC integration milestone is to make it possible to install TVM on your system, run tvmc from the command line, and produce at least one of these Model Library Format directory trees that we saw before.
G
That way, TVM does its job and gets to a point where developers can consume that in a format that they know will stick around for a while, and then we provide additional tools to build and flash if they want to do things like autotuning.
B
Andrew, thanks a lot for posting the RFC for the Project API. I do intend to have a look at it today and update the RFC for the TVMC integration very soon.
G
Oh, cool, yeah, that's right. Zoom hadn't popped you up here, so I couldn't tell for sure it was you. What Gustavo is saying is that he's posted an RFC on this TVMC integration, and thanks for bringing that up, because I should have linked it from here.
B
Yeah, it's not totally up to date, because it currently doesn't include the new Model Library Format and also doesn't include the Project API, so those need to be included, and I need to incorporate the comments and the discussions we had in that thread. But that's the RFC, yeah.
G
I think the idea here, as we discussed, is being able to see all the different tensors we'd like to use from the graph memory planner, and once we can see everything, then we can actually start thinking about defining exactly how much RAM we need and start doing things like either pinning those tensors to global variables or, maybe even better, defining memory regions and thinking of tensors as offsets into those memory regions.
G
It kind of depends on the situation, but that's future work we're hoping to get into in the next month or two. Footprint estimation comes directly from that: once you can do that, then you should be able to say that you need a certain amount of RAM. It almost comes for free from pinned memory.
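A small sketch of the pinning idea: once every tensor is visible to the planner, each one becomes a fixed offset into a static region, and the footprint estimate is just the size of that region. The tensor list here is hypothetical, and a real planner would also overlap tensors whose lifetimes don't intersect, shrinking the region:

    # Tensors the planner can now see, with sizes in bytes (illustrative).
    tensors = {"a": 8, "b": 8, "sum": 8, "scratch": 256}

    # Assign each tensor a fixed offset into one static memory region.
    offsets, cursor = {}, 0
    for name, size in tensors.items():
        offsets[name] = cursor
        cursor += size

    print(offsets)   # {'a': 0, 'b': 8, 'sum': 16, 'scratch': 24}
    print("RAM needed:", cursor, "bytes")  # the footprint estimate, for free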
G
Then there's auto-scheduling. Today you have to have this fairly complex schedule definition, and what I mean by that is that you describe to TVM, okay, here's a high-level description of how to implement a matrix multiply or something like that, and then you have to map that all the way onto the concept of how to implement a conv2d. Auto-scheduling allows some of that definition to be automatically computed and performed and, I guess, searched.
G
It's searched in the same way that autoTVM works, so we'd like to try that with micro TVM. And then the last thing here is accelerator-based inference and multi-core inference, the idea being that we should start looking at heterogeneous computation. I think Arm is doing some work on this with the Ethos-U55 accelerator, but I think there are also some really interesting use cases with dual-core SoCs that would be interesting to look at as well, because hopefully those two things appear somewhat similar to TVM. Okay, so now we've gotten through the roadmap.
G
On the Project API: I think I just posted the RFC yesterday, so I'll do a little bit of an intro and maybe show you a demo of the proof of concept. But I just wanted to stop for a second and see if anyone has any questions or things they wanted to bring up, or a "hey, why aren't you working on this?", that kind of thing.
G
Let me just briefly show the compiler interface. So, right now, the way that TVM integrates with... well, okay, maybe it's a good idea to back up and talk a little bit about why we even care about compiling code from TVM. Shouldn't we just generate code and be done with it?
G
Because, in some sense, for our projects you want to know the exact C code that you're running on your microcontroller. It may be that in a first cut of your implementation, it turns out that the memory you give it is off in the weeds, and maybe one of these add instructions triggers a hard fault. How could you debug that if we're just generating a binary blob? It's nice to have some handle on this code, so you can point your debugger at it.
G
So why should TVM even think about building and compiling code? This comes back to TVM's optimization strategy, which is search. At some level, in order to get good performance, we have to go through this test loop, where we take a high-level description of your model and break it up into tasks; this is the autoTVM process. And then, for each task...
G
...we try many different ways of implementing the task on device: things like reordering for-loops, changing the stride of a computation, unrolling for-loops, that kind of thing. So we basically generate a bunch of different candidate operators, and then the next thing we want to do is time those operators. And for that, we need to be able to build, flash, and run code on a microcontroller.
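That loop is what autoTVM automates; here is a condensed sketch of how the measurement loop is driven from Python, using the standard autoTVM API. The tuning task is assumed to have been extracted beforehand, and the builder/runner wiring for on-device micro TVM execution is exactly the part the project API work is cleaning up:

    from tvm import autotvm

    # 'task' is an extracted tuning task. The builder compiles each
    # candidate; the runner times it. For micro TVM, the runner is what
    # must build, flash, and execute each candidate on the board.
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10),  # on-device runner for micro TVM
    )

    tuner = autotvm.tuner.GATuner(task)
    tuner.tune(
        n_trial=100,
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("tuning.log")],
    )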
G
So this is why we even care about integrating with compiler toolchains, and right now, before this Project API, our integration is quite tight, I would say. We have this tvm.micro Compiler class; let me jump over here real quick to show it to you. It's an interface, so it's meant to be implemented, and we have a couple of different implementations in the repo, but TVM expects to compile a library separately and then a binary. And why do we do this?
G
Well, that's because for autoTVM there are several pieces that don't ever change: the RPC server, and all these common memory allocation functions that you saw before. And then there's the one changing piece between iterations, which is the different operator implementation. When I started writing this project...
G
...it seemed like a good opportunity to get some speedup between iterations: you just compile the new operator code for each iteration, and you go with that.
G
But that assumes an object is something you can move around, and that's not necessarily always true of embedded devices, particularly if you want embedded project build tools, and particularly if you want to run a debugger. If you want to run a debugger, you kind of expect that we're going to generate one firmware project, then build it, then flash it, and it's going to stay in place.
G
Further, it's worth saying that a lot of these build systems have an initial spin-up. When we're building for x86, you just launch GCC or g++ and let it do its thing and build a .o file, and there isn't much overhead. But when you're launching Zephyr, for example, it wants to do a bunch of board-specific discovery to figure out which modules should be enabled and how much RAM you have, and that process takes considerable time.
G
So when you're doing this once for each library or binary implementation, you get a lot of overhead. All of that came together to make me want to reevaluate the project API that we introduced: we have these Compiler, Flasher, and Transport classes that we were using to model the whole compilation flow.
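For orientation, here is a rough sketch of the shape of those current interfaces; the class names come from the talk, but the method names and signatures are illustrative rather than exact:

    # The three pieces micro TVM currently asks a platform to implement.
    class Compiler:
        def library(self, output, sources, options):
            """Compile one library (e.g. the changing operator code)."""

        def binary(self, output, libraries, options):
            """Link libraries plus the fixed runtime pieces into firmware."""

    class Flasher:
        def flash(self, binary):
            """Program the firmware image onto the attached board."""

    class Transport:
        def read(self, n, timeout):
            """Read bytes from the device-side RPC server."""

        def write(self, data, timeout):
            """Write bytes to the device-side RPC server."""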
G
And that's not exactly... sorry, let me go back here. If I were an embedded developer, I'd start with this and say: okay, first we want to generate a firmware project that has everything in it, and we just want to run a build step, not worrying about library versus binary. Just build the project, then program the project onto the device, and then somehow drive the execution.
G
So the Project API gets rid of those interfaces and centralizes around a single Project API interface. We've got generate project, build the project, flash the project, and then a set of functions to handle connecting. We're still retaining our approach of putting the microTVM RPC server on the device, and so these functions handle the process of setting up what we call a transport for the RPC layer, and then sending and reading data to and from the microTVM RPC server.
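A sketch of the server side of that interface; the method names follow the functions just listed, though the exact signatures are the RFC's to settle and should be treated as assumptions:

    # Implemented once per platform, inside the template project
    # (e.g. in microtvm_api_server.py), and driven by TVM over RPC.
    class ProjectAPIHandler:
        def generate_project(self, model_library_format_path, project_dir, options):
            """Expand the template plus Model Library Format into project_dir."""

        def build(self, options):
            """Invoke the platform's own build system (e.g. Zephyr's west)."""

        def flash(self, options):
            """Program the built firmware onto the device."""

        def open_transport(self, options):
            """Open the channel used to reach the on-device microTVM RPC server."""

        def write_transport(self, data, timeout):
            """Send bytes to the device over the open transport."""

        def read_transport(self, n, timeout):
            """Read bytes from the device over the open transport."""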
J
A question here: is this really about looking to make the autoTVM flow more streamlined with RTOSes?
G
Yes, I think that's the main ambition here; I would almost say that's the primary goal. I think the secondary goal is that, if you want to, you can also use this as a project generator for projects that don't just run autoTVM. But the primary goal is to enable this autoTVM generate-build-flash-execute workflow.
G
Yeah, I think that's basically the big question. One of the challenges we have with the current system is that right now, if we want to look at a particular compiler implementation, we look in tvm/micro/contrib, and if we're signing up to maintain one, we're putting it all in here. And this was non-trivial even just to get Zephyr working.
G
It meant that we had to create a new CI container, and none of this work is particularly unexpected. But I think what you're getting at is: from the TVM repo's perspective, how much work is it going to be to maintain the cross product of these things, especially in the sense that they're occupying the same virtual environment and the same CI?
G
Every project evolves in its own right, exactly. So I kind of see this as TVM putting a line in the sand: here's the API around which we expect the project to build, and it's a very loose API. It's not something that you need to really tightly integrate with your platform; these generate, build, and flash steps are very high-level steps that almost every platform is going to implement.
G
I guess maybe a good way to summarize your concern is: how challenging will it be to implement one of these Project API servers?
J
I guess the question I have, really, is that the other project also has a direction of its own.
G
Right. Let me talk a little bit more about where I see this living. Actually, this is a really good point to bring up, because it's one of the areas of feedback I wanted to solicit: there are some implications of doing this that mean it will change where code lives. So maybe let me quickly go through the rest of this implementation, and then we can come back to this.
G
Oh, no problem; thanks for bringing this up, it was a great question to ask. So you start with this template project, and underneath that template project, let's say you want to implement this server in Python, you have a microtvm_api_server.py. TVM will look for this file and just run it, if it finds it, in order to start the server.
G
The other thing you can do is have a shell script that launches the server, and it'll run that instead. The idea with this template project is that it has the generic glue you need to actually run computation on the device, so src/main would be very similar to our template Zephyr project here, in the Zephyr runtime.
G
So the idea is you take this sort of template project and add this microTVM Project API server.
G
Okay, so you've done that, and then in the generate step, you tell this API server to generate another project, pointing it at a directory. It would create a very similar directory, but with the Model Library Format inside of it, and then expand that Model Library Format, maybe copying source files around, however it needs to set up the generated project.
G
At this point you've got an artifact that you can interact with just like any other saved Zephyr project you're familiar with from the past. You're creating a project on disk that's standalone for micro TVM: you can build it with the Zephyr west tool, you can flash it with that tool, and you can debug it just like any other firmware project you're used to. That solves one big pain point of developing with TVM right now, which is that debugging is very entangled with TVM itself.
G
The next thing is that we want this API server to live in a separate virtual environment from TVM, and in order to facilitate that, we can't just import this Python API like we do today; we have to define an RPC system that sends and receives commands. I wrote a little summary of the options I considered, and I'd love to hear feedback if others prefer other options.
G
I picked JSON-RPC because it doesn't have any external dependencies, it's still fairly standard, and it has servers implemented in various different languages. But again, we're not super tied to that.
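Concretely, JSON-RPC 2.0 frames each call as a small JSON object, which is what keeps the server trivial to implement in any language; for instance, a build request and its reply might look like this, with the method and parameter names assumed from the interface sketched above:

    import json

    request = json.dumps({
        "jsonrpc": "2.0",
        "method": "build",          # one of the Project API methods
        "params": {"options": {}},  # platform-specific build options
        "id": 1,
    })

    response = json.dumps({
        "jsonrpc": "2.0",
        "result": None,             # or an "error" object on failure
        "id": 1,
    })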
A
I just want to interrupt really quickly: we're getting pretty close to the top of the hour, so I just want to make sure we're aware of the time box there.
G
Yeah, in fact, I think maybe that's enough to drive any initial discussion people wanted to have. The idea with this is that we would be moving Zephyr support at least outside of the Python directory in the TVM repo, if not to another repo. With that said, I don't want to drop support for testing Zephyr in the CI, and this RFC is, well, I just posted it yesterday.
G
I haven't completely thought through the best way to support this from a CI perspective in TVM just yet.
G
The RPC protocol and API are versioned, so that's one tool we can use to help us here, but I think we do need to think a little bit about how and when we would test multiple implementations, if we have them. And it may be that as we get closer to a more rapid release cadence, that answer changes as well. So, okay, that's what I wanted to say.
G
I guess we're getting close to the top of the hour; does anyone have any thoughts they wanted to mention about this so far?
A
The earlier you separate those kinds of third-party libraries and implementations out from the main code base, the better off you're going to be, because carrying the weight of a bunch of different implementations can begin to wear down on a project. Separating those out means that people who want to run Zephyr are able to get the libraries they need and support that, and it's a little bit independent for testing.
A
What I've seen in the past is that there's a test matrix that is run with some regularity: a separate job goes and builds, say, TVM with the third-party code and runs a suite of tests on it, a set of dedicated integration tests, so that everybody knows these are the tests we need to pass to conform to how the project works.
A
So I think we're on the right path there; I think other communities would say that's the direction you want to move in.
G
This is probably a little bit confusing, because the existing RPC tracker and server mechanism relies on having a TVM runtime on the other side of that RPC link, and here we're really just trying to connect to a script and drive some basic function calls. So this is a separate RPC from that; we're not expecting to place the TVM RPC server inside each of these projects.
G
Naming isn't easy, I know. That's a really good point; I'll definitely take it into consideration, and feel free, if you have ideas for names and such, to post them on the RFC.
G
Thanks, everyone, for coming, and I'm really sorry I talked for a long time; I meant to have a little more time for open discussion. I'd just like to encourage everyone to please post more in the Discuss forum if you want to engage with us a little bit more. I want to do this with community support and buy-in, so it's not something I want to dictate from our position at all. So please do post; we don't have to do it this way.
A
Yeah, and looking forward to more of the meetings. I think we're going to try to structure more of the topics around RFCs that are coming up, and try to have this kind of synchronous community engagement and discussion over the RFCs; as RFCs move in, we're going to be driving the community meetings more toward that. There's also been some interest in the community calendar, which lists all the community meetings right now.
A
That's the only thing that's on it, but we're considering opening the community calendar up to more of the subproject leaders, so that they can schedule meetings for larger discussion and have those appear on the public calendar. Then we can have larger discussions like this, especially around subtopics like micro TVM, and do so in a way that is public and makes everyone feel engaged.
A
But with that, we are at the top of the hour. So thank you, everybody, for coming, and we'll see you in the forums, on GitHub, and next month at the next meeting.