From YouTube: Apache TVM Community Meeting, May 20, 2021
Description
* Agenda
* Introductions
* Announcements
* New TVM Plugin for GDB, written by Eric Lunderberg
* Trial run of Discord Server
* Regular µTVM Community Meeting
* AOT Roadmap (slides)
* Open Discussion on AOT
A: Okay, everyone, welcome to the May edition of the Apache TVM community meeting. We have a really big agenda ahead of us today, so we're going to try to move through it as efficiently as possible. The majority of the meeting is going to be taken up with discussion of microTVM, so we're going to move quickly through the earlier agenda items, because I'm sure we'll spend most of the 45 minutes to an hour on that topic. We like to start off every meeting with introductions, so if this is your first time at the meeting, or you're new to the Apache TVM community, please feel free to introduce yourself. Is there anyone who wants to say hello right now?
E: Hello, I'm Jeffrey Spitz. I'm with SiMa.ai, and I work on our simulations and applications.
A: Great. It's great to see a lot of the SiMa and Linaro people coming. Welcome, Jeffrey.
F: Hi everyone, I'm Carl Evans. I'm the founder of Tercero Technologies, based in Pittsburgh; we do edge AI. I'm here with my colleague, David.
A: Have you been working with TVM much before, or are you fairly new to it?
F: I've been working with TVM since November or December-ish, so fairly new, but I have been digging a lot into the code, particularly the FPGA-related code.
A: Okay, fantastic! Well, feel free to reach out. Hopefully you've been reaching out to some of the engineers who have been working on this, but I'm the developer advocate for the community, so feel free to reach out to me if you have any questions or need pointers in any direction for the things you've been working on.
A: Okay, anyone else? All right, with that we're going to move on to some announcements. First off, I wanted to have Eric talk a little bit about a new plugin that he's written for TVM. It plugs into GDB and helps with debugging TVM-specific actions, including debugging models. So, Eric, if you want to take a few minutes to describe this plugin and how it works, that would be fantastic.
D: Oh, certainly, yeah. If you'd like, I can also share my screen to give a bit of a demo, if that helps.
D: Okay, share screen, and this window. Let me know when you can see my screen. Yep, we see it. Okay, so I have two windows side by side, and in both I'm going to open up a quick stack trace.
D: The difference is that on the right I'm going to source the extension file, and on the left I'm not; everything else will be identical between the two. So I'm going to put a quick breakpoint, just so I can simulate "something went wrong, and where does it happen?" So: import tvm, and then let's go somewhere that triggers that breakpoint.
D: So, just doing some query, in this case on the Vulkan API, and okay, I've hit a breakpoint. First question: where am I? You'll notice this looks very similar between the left, without the plugin, and the right, with it; the only difference is that a lot of these stack frames are indented.
D: So if I give the "-hide" option as well, then I can still see the full stack trace on the left, without the extension, but on the right I can see: here's where I am in the Vulkan API, where I hit my breakpoint, here's the C runtime API, and then let's skip a dozen frames until we get to PackedFunc, in packed_func.py. So, in addition to hiding stack frames that aren't necessarily that insightful for a given problem, instead of printing out the location in the C++ code, it looks up the location in the Python script that was running and shows that instead. This way, even as things go back and forth between things happening on the Python side and things happening on the C++ side, it shows the relevant stack frames that you yourself have put in, by calling Python functions or by calling the packed C++ functions, rather than showing the intermediates that handle those levels of interaction.
H: Uh-huh. So does that mean I can use this whole framework now from VS Code directly?
D: Right now I don't know how much interaction VS Code has with GDB. This interaction is done using GDB's frame filter API, so if VS Code is making those same calls, then it will be enabled through GDB's extensions.
A: Yeah, if anyone does try it out in VS Code, maybe posting to the Discuss forum whether or not it works would be a nice way to capture that information.
A: Yeah, thanks, Eric. I think this is really helpful for debugging, because what we're seeing on the right there is definitely a lot easier to understand than what's happening on the left.
A: Okay, so next on the agenda: next month's meeting. We talked a little bit about this at the last meeting, but for next month's meeting we're going to be selecting an Asia-Pacific-friendly time.
A: We've had some requests from users who are in Japan and China, and in similar regions, to have alternate meeting times so that it's easier for them to come to the meeting, because typically these meetings fall in the middle of the night for them. So look for an announcement on the exact time that the meeting is going to happen.
A: If anybody has specific suggestions, we can work some of this out on the Discuss channel too, about a time that works, and then we'll send that out and work towards adding it to the agenda. Another thing that I didn't quite add here, but which is also important to note: right now we are running the community calendar from a calendar that's being hosted by OctoML, and we want to open this community calendar up to more of the community so that you can schedule your own meetings and your own subgroup meetings. We'll get into this a little bit more during the regular microTVM community meeting item, so look for some calendar changes as we stand up that Apache TVM organization, start reproducing the calendars inside of it, and broadcast out to the community more. Another big announcement:
A: After a poll on the Discuss forum, we're going to do a trial run of a new Discord server for the Apache TVM community. We talked about it a little bit within Discuss. Essentially, this platform is a synchronous way for community members to talk to one another about TVM, largely about "hey, how do I work on something?" If you're collaborating with someone, or if you need a little bit of help doing things, we're hoping this will be a really nice place for you to drop in, ask a few questions, and interact with other people who are working on TVM a little bit more directly and one-on-one.
A: But we also want to remind everyone that all official decisions and all official discussions need to happen on the mailing list and within the Discuss forums, so that everybody who is involved in the community has an opportunity to participate. We're really thinking of the Discord server as: catch up with someone in the hallway, have a chat with them, maybe sort out how to work through some things or how to get some help, and then bring those decisions back to the community in the discussion forum. So please try it out; there's an invitation link there, and I have a link to the initial forum post about this. This link should be good for everyone, forever, and if you do run into any problems, feel free to reach out to me, either inside the Discord server or through my email, which I will put down here.
A: So, does anyone have any questions about this, or any observations about it that they'd like to share?
J: Chris, there is a channel on the tlcpack Slack called micro. How does it fit in with that new Discord server? What are the contexts; which should we use?
A: Yeah. One of the reasons we started up this Discord server: for those of you who are unaware, there actually is a tlcpack Slack server that some members of the community have been using, but we've run into some pretty significant limitations, particularly in the number of people who can actually be invited and in how we can manage that.
A: And so my recommendation is that discussions that would be happening on that Slack server move over to the Discord server, and we really try to seed that community there, so favor Discord over the Slack. And I think that's a really good point: there's a microTVM channel in there that people are talking in, so we need to create a microTVM channel within the new server.
J: Awesome, got it, Chris. That's a good move, in my opinion.

A: Yeah, I will create a channel. And to answer your question more: a lot of the frustration with the Slack is that we tried to make it more open, but without paying an incredibly large amount of money to Slack, it's really hard to get all the features out of it. I think in the old days a lot of us were meeting in person and people weren't using Slack that much, but it seems like, as we're growing across companies, it's really important to communicate for development synchronization.
A: We welcome criticisms of things that we could do better and requests for things that you would like to see; we really want to make it a community resource that is going to be vibrant and useful for everybody.
K: Yeah. So, one thing is: we currently have the Slack, and we're calling this a trial period of the Discord server. So let's all move to a micro channel on the Discord server and try stuff out. Again, for anything that's kind of a material decision that we're going to make within the TVM community, we can't make it on either one; we want to make those on the Discuss forum. So hopefully it shouldn't be too big of a deal if we move over to the Discord server and then decide, for some reason, a month from now that we didn't like something there and have to switch to some other platform. I'll monitor both going forward as we're in this trial period.
A: We can do a pinned message in the general channel too; I think that's a good idea.
A: ...so that people who are there and people who aren't at the meeting understand that we're doing that migration. I'm also going to be posting the link to the Discuss forum later on, so that it's more widely available and anyone who sees it there can join the group. Now, as with any of these other community platforms, when you post a public link like that, we also want to make sure that everyone in the community feels like they're safe.
A: Okay, so we're about 25 minutes in; let's move on to the next topic. For the rest of the meeting we're going to be mostly covering microTVM, and one of the first things we wanted to bring up: there have been some requests generally to have more online meetings, like the community meeting, that are a little bit more focused towards microTVM. To that end, we're going to be kicking off bi-weekly meetings over Zoom. So I'll turn this over: did you want to talk about this, Tom, or is this something that you wanted to talk about, Andrew?
M: Sure; we'll just keep it short and sweet, which I think is probably the most important thing. As velocity has picked up in microTVM development, and to sort of facilitate what's going on in the community, it makes sense that we should get together maybe on a more regular basis, sort of a little side huddle about microTVM, but of course with the provision that, again, it's the Discuss forum that counts when it comes to business.
M: So really, this is just meant to be a way to help what's going on already with microTVM, as speed has picked up on it. We've got all the information in here, and we'll try things out as an experiment. We are looking at a bi-weekly cadence; we'll dial that up or dial that back depending on how well things work out. Maybe monthly is the correct cadence; again, we'll make adjustments along the way.
M: As far as the meeting time, I just picked a time that would try to appeal to a good set of time zones. I realize it is not best for people on the west coast of the United States, and it's terrible if you're in China. So, as you alluded to earlier in the meeting, Chris, maybe we need to do something that flips back and forth.
H: But Tom, one question: would this be just microTVM? Because I think last time we also talked about an AOT meeting; would those topics be covered as well?
M: Yes. MicroTVM intends to make use of the AOT work, so I think it makes sense to mix those two things. And I wouldn't be surprised if, from time to time, other topics drift in, because microTVM in some way interacts with some feature in the larger framework; we'll just let those things go organically as they need to.
A: Okay. And also reminding everyone that I'm going to be setting up a new Apache TVM organization that will allow us to share some of these resources, particularly the calendars. So watch for the announcement on the Discuss forum about the new calendar; you should be able to subscribe to it, and my intention is to make sure that we put all of the public TVM events onto it. That's going to include the monthly community meeting and this microTVM bi-weekly meeting, and, since we're going to be doing TVMCon later this year, it's also going to include information about when TVMCon is happening and when the deadlines for it are going to be.
A: A big part of the reason we want to make this a community-owned resource is so that folks in the community who are planning events, say local meetups or other things that are going on, have a place to add them and broadcast them to the wider community in a repeatable and discoverable way.
K: I figured what we could do for this: there have been a lot of Discuss posts and forum posts, and I myself have at times become confused about just the number of things in flight. So, to start the discussion here, I threw together a couple of slides to give an overview of what we've been working on, some of the work that's been done, and some of the work that's outstanding, and then we can talk about whatever people feel is a concern from there. I know a number of the contributors to AOT, maybe even all of them, are on the call here, so if you have questions for them: I've certainly been serving in more of a reviewer role on this, so you can ask me questions, you can ask them questions, and we can go from there. Let me give a quick overview, and then we'll open it up to discussion. I thought I'd talk about what the AOT effort is, in case people here have not engaged with it, what the subprojects are, and then what RFCs we have outstanding right now. To start with, some background for AOT: what does this mean?
K: We're calling it ahead-of-time compilation, and what we mean is this: when you run the TVM compiler, you supply basically a Relay program at left. Even if you give it something like a TensorFlow model, TVM is first going to convert that TensorFlow model into Relay, which is TVM's internal model description language. After that, TVM will compile pieces of this program separately, and it currently relies on a sort of executor to handle running them; I'll talk about this.
K: AOT is an effort within TVM to generate basically the glue code that runs all the pieces, without the need for a runtime library to do it. Let me explain a little bit more. Like I said, we're going to compile a model and run it. Isn't compiling the model and calling the pieces what we're already doing? Not quite. Sorry, I think I skipped a slide; let me jump to this one. We break the program into scheduled pieces, with roughly one function for each piece, which I've illustrated with this graph at the right, and we export basically three things. We export the generated operator implementation for each piece; in this graph you've got conv2d + bias_add, which came from our Relay model before, as well as max_pool2d.
K: We then have a graph that explains how to link the pieces together, as well as how the data dependencies flow between the inputs. So, for example, if you want to run this conv2d + bias_add, the graph explains that you need to supply an input called conv2d_input and a p1, which is a parameter, and then the output is going to be placed in this intermediate buffer here. These three outputs are what's currently generated.
K: When you're using the graph runtime, which is the default thing to use in TVM, at least for micro applications, the problem with this approach is mainly that we encode the operator graph as JSON. The runtime basically has to load in this JSON, reconstruct the operator graph, and then call these operators in order, and doing this is pretty expensive, especially on a microcontroller. So what is AOT, then?
K: Why do we need to rely on a library for this? The answer is that we don't; we could do it at compile time. I made this slide to roughly show the TVM compiler architecture, and you can see I've got the outputs written on the right: we get simplified parameters, we get an operator graph, and we get the generated operator implementations. The thing that we wanted to do in AOT is take this operator graph and somehow output a function that implements it. So in the AOT world, instead of outputting the graph, we feed it forward into a new pass that creates a top-level function that lives alongside all of the operator pieces, and then this whole thing becomes the code-generated output from TVM.
K: Let me make sure... oh, I see. Okay, this is the example slide I wanted to show you; sorry, this is not quite as polished as I wanted it to be, but it gives you an example of what this top-level function looks like. People who have been working with TVM for a while will be familiar with this function signature at the top.
K: This is our packed C function signature, and all of our generated functions are invoked using this signature. But you can see that this function here is not actually an operator function; it's the top-level glue function that's meant to call all the operators in the graph, in the order that reconstitutes the original model. To do this, there are a couple of challenges.
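For readers following the transcript without the slides, here is roughly what that packed C call signature looks like. This sketch follows TVM's public C backend header (c_backend_api.h), with the TVMValue union simplified, so treat it as illustrative rather than authoritative:

```c
/* Sketch of TVM's packed C calling convention. Every generated function,
 * operator kernels and the AOT top-level function alike, is invoked
 * through a signature of roughly this shape. TVMValue is simplified. */
#include <stdint.h>

typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;  /* tensors and other objects are passed as handles */
} TVMValue;

typedef int32_t (*TVMBackendPackedCFunc)(
    TVMValue* args,          /* packed argument values */
    int* type_codes,         /* type code describing each entry of args */
    int num_args,            /* number of packed arguments */
    TVMValue* out_ret_value, /* slot for the return value */
    int* out_ret_tcode,      /* type code of the return value */
    void* resource_handle);  /* opaque context pointer */
```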
K: One: if you remember that intermediate buffer, we've got to allocate one of those in this top-level function. So the AOT project generates code that performs these allocations, and it performs them minimally, according to our graph-level memory planning algorithm.
K: It's then got to assemble a call stack that it can use to call sub-functions with that same signature, and then it's got to call the operator functions in order. I've just shown one example of a function call here; in the true implementation, of course, you can expect these last two blocks to repeat several times as it works its way through the model. And you can see that in calling this operator function, it's effectively using the same signature, so it keeps this packed C function call signature.
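To make that concrete, here is a hand-written sketch, in the spirit of the slide, of what such a generated top-level function could look like. It reuses the TVMValue sketch above, and the symbol names (tvmgen_run, tvmgen_fused_conv2d_add, sid_1) are hypothetical stand-ins, not actual TVM codegen output:

```c
/* Hypothetical AOT-style top-level "glue" function: it owns the
 * intermediate buffer, packs a call stack, and invokes one operator
 * kernel through the packed signature. In real generated code the
 * pack-and-call pattern repeats for every node in the operator graph. */
#define SID_1_BYTES 4096       /* size chosen by graph memory planning */
#define K_TVM_OPAQUE_HANDLE 3  /* type code for handle arguments */

extern int32_t tvmgen_fused_conv2d_add(  /* generated operator kernel */
    TVMValue* args, int* type_codes, int num_args,
    TVMValue* out_ret_value, int* out_ret_tcode, void* resource_handle);

int32_t tvmgen_run(TVMValue* args, int* type_codes, int num_args,
                   TVMValue* out_ret_value, int* out_ret_tcode,
                   void* resource_handle) {
  static uint8_t sid_1[SID_1_BYTES];  /* planned intermediate buffer */

  /* Assemble the call stack for conv2d + bias_add. */
  TVMValue call_args[3];
  int call_tcodes[3] = {K_TVM_OPAQUE_HANDLE, K_TVM_OPAQUE_HANDLE,
                        K_TVM_OPAQUE_HANDLE};
  call_args[0].v_handle = args[0].v_handle;  /* conv2d_input */
  call_args[1].v_handle = args[1].v_handle;  /* parameter p1 */
  call_args[2].v_handle = sid_1;             /* intermediate output */

  /* Call the operator kernel with the same packed signature. */
  if (tvmgen_fused_conv2d_add(call_args, call_tcodes, 3, out_ret_value,
                              out_ret_tcode, resource_handle) != 0) {
    return -1;
  }
  /* ...then max_pool2d, reading sid_1 and writing the model output... */
  return 0;
}
```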
K: Doing this basically allows us to get rid of the need for a large runtime library, because that runtime library, the graph executor, was what parsed the JSON, reconstituted the call graph, and then made all these calls; what we've done is effectively codify that in the output of TVM itself. Okay. I hope that was somewhat comprehensible to everyone here, but that's what AOT is.
K: I want to say one thing about this too: a lot of the AOT effort was started by the microTVM folks, and I think microTVM in particular has quite a bit of interest in the AOT project, because resources are pretty constrained on microcontrollers. But nothing we've presented so far is particularly specific to microcontrollers.
K: In developing the AOT project, there are certainly a number of folks, especially a number of folks here, who are focusing on microcontroller applications of AOT, but this isn't something that we're intending to make specific to microcontrollers. We intend to make this core part of the AOT effort more broadly reusable across the TVM code base. So particularly if there are people on the call who are interested in non-embedded applications, or applications outside the TVM C runtime, we'd especially love to hear any thoughts or concerns from your side as well.
K: Okay. This is a fairly large effort, so we had an initial RFC about, well, a number of things, but what we wound up doing was merging a core piece, and we expect several pieces to follow it. This is by no means a comprehensive list; I'm sure there will be additional pieces as we continue developing the effort, but just to give everyone a brain dump from me of where I think we are (and if I've forgotten something, please let me know): these all roughly correspond to RFCs. The first RFC, or project, that we've landed is basically making a TIR top-level function to mimic this graph executor run.
K: That's the top-level function that I just showed before. That RFC and PR have landed, as well as a couple of follow-on PRs, and at this point there's a bunch of follow-on RFCs that are all working their way through the Discuss forum. A lot of these are open and active, so if people have comments and questions, ultimately it would be great if we could materialize them on the Discuss forum. But just to give you an idea, from my mind, of where we are: there is stuff needed by the broader non-firmware or non-embedded effort, and that's basically implementing something that replaces the runtime API that you see with the graph runtime today. We call that the module-based model runtime API; if you search for that on the forum, you'll find a pretty lengthy RFC.
K: That RFC goes through the current consensus, which of course we can always change going forward, on what it should look like to load and run a model with TVM: what function do we call to allocate memory, how do we set a parameter, and how do we drive inference and get the output?
K: So that's kind of a PackedFunc-based thing, and this project hasn't even been started yet; there hasn't been an RFC written yet, but it's a to-do. Then the next few things are, I think, more specific to, or I would say needed by, microTVM, though they could be used elsewhere too.
K: There are a bunch of ideas, basically from the implementers of AOT, around reducing stack usage. You'll notice that in this example function we're allocating a bunch of things on the stack, and I've actually pruned this code example down quite a bit; there are a bunch of other things also allocated on the stack here as well.
K: All of this conspires to really blow up the stack usage requirements on embedded systems, which is kind of a no-no. One of the things we do is allocate DLTensor instances on the stack, and in the embedded world we'd like to avoid that, since for the most part we just need the data member. So that's one thing in flight; there's an RFC out about it, and we'll talk about it in a second.
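For reference, DLTensor is the DLPack tensor descriptor used at TVM call boundaries. A lightly simplified view of its layout (following dlpack.h, with enum types flattened to integers here) shows why stack-allocating one per tensor argument adds up on a microcontroller:

```c
/* Simplified sketch of DLPack's DLTensor descriptor (see dlpack.h). */
#include <stdint.h>

typedef struct { int32_t device_type; int32_t device_id; } DLDevice;
typedef struct { uint8_t code; uint8_t bits; uint16_t lanes; } DLDataType;

typedef struct {
  void* data;            /* often the only field an embedded kernel needs */
  DLDevice device;
  int32_t ndim;
  DLDataType dtype;
  int64_t* shape;
  int64_t* strides;
  uint64_t byte_offset;
} DLTensor;

/* Each stack-built descriptor costs sizeof(DLTensor) plus its shape
 * array; passing bare data pointers instead is one of the stack-usage
 * reductions discussed above. */
```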
K: Another effort: if you notice, this function signature here is not the most friendly thing for, say, a C firmware programmer. Of course it's a C function interface, but specifically: what are the arguments? What order do you present them in? There's none of the documentation that you would normally expect of a TVM function, and there's also a bunch of extra arguments, in a sense; the out-return value and return type code arguments are typically actually contained in the args here, I believe (I think I said that right). So basically there are just a lot of pieces here that could be minimized, and there's some interest in creating a smaller API that doesn't rely on this packed-function type signature and can be more oriented towards embedded development. Doing things like including model metadata is part of this.
K: Lastly: if you notice, the intermediate tensors here were allocated basically as scratch-pad tensors, and at some point we'd like to get to where we can do more comprehensive memory planning over those scratch-pad tensors. That's a work in progress.
K: So that's the initial RFC that implemented this core TIR bit, and then we have these outstanding RFCs, as well as a few more to implement some following pieces, such as the module-based model runtime interface.
K: Okay, that's what I wanted to cover, just to give a brief overview; I hope that was somewhat informative and not super confusing. We have some of the folks that are implementing the RFC here; I think Manupa and Giuseppe are on the call. I don't know if you had anything you wanted to bring up or talk about, or feedback you were interested in; I just wanted to open it up for discussion.
K: And one aspect that I think is one of the bigger pieces in development right now is this embedded C runtime interface. The idea here, again, is to provide a firmware-facing interface, and there's actually been a proposal, in a sense: the folks from STMicroelectronics have pushed a PR that shows basically a demonstration of how to integrate TVM with their embedded API, and I think there are a lot of really good things there that were built from a firmware developer's perspective.
K: There's metadata; if you need to write reusable functions to pre-process model input, a lot of the mechanisms for getting input and output are pretty well thought through, in terms of a world where the application developer wants to be able to manage all the memory on the system. So one of the things that I wanted to ask people here is whether they have specific requests for what a firmware interface to microTVM should look like going forward. Right now we've kind of confined you to this very PackedFunc-centric graph runtime interface.
H: Yeah, I'm not sure if this fits your question a hundred percent, but I think you covered the right points. Number one for me would be getting rid of the DLTensors and the TVMValues in the generated code; I think this is not a large issue. The second point, which would be interesting: do you think we should limit, from a design perspective, generating code only for the main function, or should we also consider the kernels, so, getting rid of the allocs and free operations?
K: Yeah, that's a good question. We've talked about this a little bit; I've been discussing it with Manupa and others on the Arm side, who have been implementing a lot of these operations themselves, and I know that memory planning is one of the big concerns here. There is a question there: what we want is to get away from a world where we have this sort of dynamic memory interface,
K: a malloc-like interface, if we have one at all. At best, we'd like to pre-plan every malloc call so that we can target it to a predefined memory location. What the AOT implementation today does is use an alternate, stack-based memory allocator.
K: You can pick that allocator because, as long as your model is feed-forward and straight-line, all of the memory allocations will be first-in, last-out, just like a stack, so you can use this stack memory allocator with very little waste.
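A minimal sketch of that kind of LIFO workspace allocator; this illustrates the idea rather than the actual allocator in TVM's C runtime, and the names and pool size are invented:

```c
/* Toy LIFO (stack) workspace allocator. A feed-forward model releases
 * workspaces in reverse order of allocation, so a bump pointer with
 * last-in-first-out frees wastes almost nothing. */
#include <stddef.h>
#include <stdint.h>

static uint8_t g_workspace[16 * 1024];  /* hypothetical pool size */
static size_t g_top = 0;

void* StackWorkspaceAlloc(size_t nbytes) {
  nbytes = (nbytes + 7u) & ~(size_t)7u;  /* keep 8-byte alignment */
  if (g_top + nbytes > sizeof(g_workspace)) return NULL;
  void* ptr = &g_workspace[g_top];
  g_top += nbytes;
  return ptr;
}

int StackWorkspaceFree(void* ptr, size_t nbytes) {
  nbytes = (nbytes + 7u) & ~(size_t)7u;
  /* LIFO invariant: only the most recent allocation may be freed. */
  if ((uint8_t*)ptr + nbytes != &g_workspace[g_top]) return -1;
  g_top -= nbytes;
  return 0;
}
```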
K: So how does this all relate to the original question, which is basically: should we allow these operator kernels to invoke malloc for scratch-pad memory? One of the proposals that has been pushed so far is: can we implement a TIR pass that goes into the generated kernels and hoists all of the allocations up into the main function?
K: This would look like each of these scratch-pad buffers being passed in as another argument, sort of a magically added argument, to each operator kernel. And then, within the main function, that gets you to the level of user interface and graph-level memory planning, and there are options: you can either make a set of memory pools accessible and use the memory planner to target those areas, or, if you want to continue calling the malloc function, you can do that. So at least that moves the workspace scratch-pad allocations closer to the top-level graph interface.
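A sketch of the shape that hoisting could take; the pass itself, the extra-argument convention, and all names here are hypothetical:

```c
/* After a hoisting pass, each operator kernel would take its scratch
 * pad as an extra argument instead of calling a malloc-like allocator
 * internally, and the top-level function (or the application) decides
 * where that memory lives. */
#include <stdint.h>

int32_t fused_conv2d(void* input, void* weights, void* output,
                     void* scratch);  /* hoisted workspace argument */

int32_t tvmgen_main(void* input, void* weights, void* output,
                    uint8_t* workspace_pool) {
  /* Graph-level memory planning assigns each kernel a fixed offset in
   * the pool, so no allocation happens at run time. */
  return fused_conv2d(input, weights, output, workspace_pool + 0);
}
```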
K: I don't know if any of the Arm folks want to say anything more about that, or if there are further thoughts there.
O: Yeah, thanks, Andrew; can you hear me? Yeah, so that kind of aligns with the vision. As for the original request, how we get rid of these alloc entries: that was a concern in the engineering design of AOT.
O: We are in the process of designing something that we hope to put in an RFC very soon, in which we try to make the main function accept some buffers, probably to be used as pools, so that all the intermediates in the main and operator kernels can be accessed using a predefined offset into those pools, and no alloc actually happens at runtime.
O: So yeah, as Andrew mentioned, we are currently using a lightweight stack allocator, because that's what TIR code could get lowered to, and it has this LIFO pattern. But we have planned to replace that with a user-provided workspace buffer, so the idea is that in the application itself, users would be able to pin these buffers into particular memories.
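Application-side, that design could look like the following hypothetical usage, reusing the tvmgen_main sketch above; the section name and sizes are invented for illustration:

```c
/* Hypothetical firmware usage: the application owns the workspace pool
 * and can pin it into a particular memory via the linker script. */
#include <stdint.h>

__attribute__((section(".tcm")))  /* e.g. a fast on-chip SRAM section */
static uint8_t model_workspace[24 * 1024];

extern int32_t tvmgen_main(void* input, void* weights, void* output,
                           uint8_t* workspace_pool);

int run_inference(void* input, void* weights, void* output) {
  /* Every intermediate lives at a planner-chosen offset in the pool,
   * so inference performs no dynamic allocation. */
  return tvmgen_main(input, weights, output, model_workspace);
}
```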
K: On implementation and interface: currently there's an implicitly user-facing aspect of the TVM graph memory planner, which is basically how we identify the memory region where a tensor is going to live. The TVM graph memory planner currently works by looking at all the memory allocations that are visible at the graph level, and currently that doesn't include scratch-pad buffers.
K: But with this idea of hoisting scratch-pad buffers out of operator functions, that would change. So it looks at all the graph-level allocations, and then it tries to map each graph-level tensor to a backing storage area by doing liveness analysis on those graph-level tensors. And then, of course, the implicit question is: how do we identify a storage area? There's a user-facing, but not exactly user-facing, identifier there called a storage ID, which you can see in the graph JSON
K: if you look at the JSON object that we generate. Moving forward, as we move into a world where we want to think about pinning tensors into memory, and especially as we start to consider things like heterogeneous memory systems, or systems with accelerators that have small areas of accelerator-visible memory versus larger areas of CPU-visible memory,
K: I think the first step in that direction is basically to make the storage identification process a first-class, user-level concept, and that's one thing I want to do moving forward. I see Giuseppe raising his hand, and I don't want to ramble on about this, so go ahead.
N: Yeah, I would just point out that there is another PR, which I pushed about 40 minutes ago, so you didn't list it here for obvious reasons, that is trying to decouple AOT from the memory planner just described. The Relay graph memory planner in TVM has the problem that it treats the output buffer as a temporary as well. This is bad because the output buffer is provided by the user.
N: You can share it, but you cannot change the size of the buffer; basically, you cannot expand the buffer. That's the problem. So there was an issue raised, and in order to solve it, we decided, as a first move in the direction of doing memory planning in AOT, to try to decouple the AOT main function from the TVM graph memory planner.
N: So we run a very sequential allocation, and then we use a TIR pass called storage_rewrite that basically does what the TVM memory planner does, but on the TIR function. It still doesn't take into consideration all these scratch buffers, so it's still a bit sub-optimal, let's say, but it's a step in the right direction.
K: Great, yeah, I'm glad to see that, and I'll take a look; of course I'm interested. Everyone, definitely please take a look and review it. What was I going to say...
K: Yeah, I was just going to speak a little bit more about the issue you raised, which is something that came up the other day as well. One thing that the module-based model runtime interface implies is that there's a TVM library that's managing all the memory.
K: So if you want to change the way that the memory is allocated, that logic is buried in shared TVM library code, and you have to change that code. As we're moving into this microcontroller world, we're going to have to change that paradigm, at least so that, potentially, there are memory blocks that TVM wants to generate for you; for example, it knows it needs, say, 32 kilobytes of CPU-facing RAM.
K: It should provide you some construct that you can use to easily allocate that memory, whether it's a defined constant that says how big the CPU-facing RAM needs to be, or a struct or something like that which contains it. Why this is important, in particular, is the thing that Giuseppe just raised with the AOT API.
K: One thing that we changed was that the tensor buffers that are inputs and outputs of your model are supplied by the user, by contrast with the previous module-based model runtime interface, where those were managed by TVM. As we moved to a world where we're considering that the user is passing memory in, what we did was just reuse the graph memory planner.
K: The input buffers were assumed to be "don't touch these", but for all following buffers in the graph, the graph memory planner will try to reuse them. So if there's an intermediate tensor that's just needed as a go-between between two subsequent calls, and then later there's another intermediate buffer, it will try to use the same memory for those two intermediate buffers, and that's great.
K: That's fine and well as long as those two buffers aren't an output of the graph. But the minute that one of them becomes an output, the user suddenly has to care, because if the planner does reuse these buffers, it will size them up.
K: This is where it bit us, basically, in the interface: if we allow the output tensors of the graph to be sized up by the graph memory planner, then the user has to know that somehow. So that was our bug there.
G: Oh, I also hit this issue. I don't know if you noticed, but in the CopyTo/CopyFrom API there's a DLTensor overload with which you can extract the shape of the output tensor, and so then you can grab out the slice of the output storage that you need for that particular output tensor. You shouldn't have to copy out the full storage pool; you only copy that slice. So I think that should solve the issue; it did for us.
K: That's great, yeah. Well, we'll have to see if we can mirror all those analogues across the interfaces there. So, looking forward, I guess there's just five minutes left. Am I missing anyone who's raising their hand, or did anyone want to say anything that I'm missing?
P: Yeah, hi, this is Michael Vogelseven. I'm very interested in this activity, but not necessarily for the full microcontroller case. What I'm interested in particularly is trimming down the current runtime that we have with TVM, especially when models are run within a streaming environment; let's say, for example, when you want to put a model generated by TVM as a filter inside GStreamer.
P: Currently, if you do that, the runtime that TVM generates serializes everything, and it creates quite a problem with GStreamer, and probably with some other libraries. So I see this work that you are doing as actually sitting a bit higher than the MCU, on a bit bigger devices: reducing the runtime to be really self-contained, so that it can be put inside a streaming environment like GStreamer and others.
P: That would be absolutely awesome. And here you've got the same problem you mentioned, with memory that is allocated by the external application, like GStreamer, that is going to be passed into this function generated by TVM, with the output going to other filters. So having it well self-contained, with the memory management and the multi-threading management handled outside of the TVM-generated function itself, is very, very paramount.
P: I have done this in the past, and unfortunately I had to completely change the runtime that TVM generates, because otherwise, as I said, it's just impossible: everything is serialized, you get plenty of clashes with memory, and you have to duplicate memory all the time. So that would be really helpful; I see the work that you guys are doing with AOT as a really great direction.
K: Yeah, that's awesome. In this case, just one question for you: when you're talking about serializing the runtime, are you trying to run multiple inputs through the graph simultaneously, or... okay, I see, yeah, cool.
P: What I mean by serializing the runtime: sorry, maybe I put it wrong. What happens with the TVM runtime currently is that, instead of running in parallel with GStreamer and the filter that wraps the code generated by TVM, the TVM runtime that you need to start within this filter to run the TVM code actually serializes everything; that's the issue.
P: So you end up with a big slowdown in performance when you are using GStreamer, if you don't change it, and that is really purely because of the runtime the way it is today. So slimming it down, making it pluggable, let's say, into other frameworks, would actually be a really, really good thing. Great.
K: Yeah, that makes sense; that's definitely aligned with the embedded stuff. Okay, we just have a minute left, so I guess the last thing I wanted to say was: AOT is, again, something we're working on for embedded development, which I guess was the impetus for it, but it's certainly something we want to target for other applications as well. Going forward, like we talked about, we're going to start having these microTVM bi-weekly meetups, and I expect a lot of the things that we talked about here to be microTVM-focused, things like this embedded C runtime interface. But one thing that I've found with microTVM is that it tends to touch a lot of different parts of TVM itself.
K: So I expect we'll also talk about different concerns, like quantization, AOT, and the memory planner, a bit at these meetings as well. I think what we'll do is post up agenda items, and I think the focus of the meetup should definitely be firmware-oriented, but people are welcome to attend just to learn what's going on in the firmware world, or to raise concerns.
H: Yep, okay. Can I follow up on your last comment, Andrew? Just one thing...
K: Yeah, so we're still working that out, but I think we definitely do want to create an interface that is clear about what you get. The idea is that what you would provide the memory planner is basically a TIR program, an IRModule containing TIR code, and then there's what the interface is expected to produce, and that's basically the first-class storage identifier:
K: basically a data structure that contains, I guess, buffer pools. Sorry, this is kind of hard to go through in a minute, but the idea is that we should be very clear about what we want the graph memory planner to consume and what we want it to produce. And then, I think you could always re-implement the interface as you wanted to.
K: Whether or not that's needed, I think we would probably still try to provide a more flexible implementation that can be retargeted for small changes in memory planning, and then, if you're trying to do something really crazy, you can re-implement it yourself as you need to. That's my view, and we'll definitely talk about it more in the microTVM meetups going forward, because I think it's related to things like accelerator offloading as well.
A: Okay, thank you, everybody. I'm super excited about this, and excited to see everyone at the upcoming microTVM meeting. Keep your eyes open for the new community calendar and resources coming out, and I hope everybody has a great day. Thanks a bunch.