From YouTube: ONNX Roadmap Discussion #3, 2021-09-22
Description:
1. Rajeev Nalawadi & Manuj Sabharwal (Intel) – Address gaps with opset conversions across a broad set of models
2. Rajeev Nalawadi (Intel) – ONNX Model Zoo example for an E2E distributed-training scenario for large models
3. Rajeev Nalawadi, Rodolfo Esteves (Intel) – Define the concept of federated learning for ONNX (multi-edge training and model aggregation)
A: Okay, hello everyone, welcome to the third roadmap discussion. If you've attended one before, I think you'll find it very productive. We have 30 minutes and three presentations of 10 minutes each, so to the extent you can, try to stick to the timing, because it's pretty tight, and leave some time for questions and discussion. Everything is recorded; it will be posted on YouTube, and the slides, if you agree, we'll repost on Slack. The first presentation is by Manuj, and it addresses gaps with opset conversions across a broad set of models.
B: Okay, so yeah, I'm Manuj from Intel. I don't know if Arpanad has joined, but we are on the same team with Rajeev, and currently I own ecosystem enablement for ONNX, especially with our partners, the software vendors working on the client side. So I'll go over some of the issues and gaps we are working on. This is across a large set of customers, not just one, and as they have started adopting ONNX models, we are seeing some issues and gaps on the conversion side.
B: One of the basic problems we are seeing is opset conversion. New use cases keep coming and there's a backlog. Once these companies, our ecosystem partners, deploy in production, they don't go back and keep updating the opset, because production doesn't move at the pace of research. So they are facing multiple issues with opset conversion, for example going from opset 7 or 10 to 13, and we need 13 for quantization, which I'll get to.
B: The most common issue we are seeing is that once we get the model from our customers, neither we nor they are able to convert it to a newer opset. We did try some workarounds, which were filed under this issue, but none of them worked for us either. I think an opset-converter unit test was added in 1.10 or 1.10.1, and that also has some issues.
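[Editor's note: the converter being discussed is presumably the onnx.version_converter module. A minimal sketch of the flow that fails in the way described, assuming a locally saved model; the file names are illustrative:]

```python
import onnx
from onnx import version_converter

# Load a model exported at an older opset (file name is illustrative).
model = onnx.load("model_opset9.onnx")

try:
    # Ask the built-in converter to upgrade the model to opset 13,
    # the minimum needed for the quantization work discussed here.
    converted = version_converter.convert_version(model, 13)
    onnx.save(converted, "model_opset13.onnx")
except Exception as e:
    # In the failure mode described above, conversion raises because
    # a version adapter for some operator is not implemented.
    print(f"Opset conversion failed: {e}")
```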
B
Okay-
and
this
becomes
as
we
are
not
on
offset
13,
several
customers
are
not
on
offset
13.
We
are
not
able
it's
a
show,
stop
or
a
blocker
for
them
to
start
doing
the
work
for
quantization.
We
want
them
to
move
to
int
8
leverage,
good
performance
on
the
it's
available
on
many
different
hardware,
vendors,
but
they
are
not
as
they
are
not
on
13.
They
have
to
stick
with
fp16
or
fp32,
which
they
are
deploying
and
intake
has
become
a
bottleneck
for
it.
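[Editor's note: for context, a minimal sketch of the INT8 path that opset 13 unblocks, using ONNX Runtime's dynamic quantization; the file names are illustrative:]

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization converts FP32 weights to INT8 offline; the
# quantization operators it emits require a sufficiently new opset,
# which is why models stuck on opset 9/10 cannot take this path.
quantize_dynamic(
    model_input="model_opset13.onnx",   # illustrative file name
    model_output="model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```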
B
Adoption
has
become
a
bottleneck,
so
the
feedback
we
get
from
customers
is.
They
can
go
back
to
pytorch
code
if
they
want
to
convert
it
right
if
they're
using
pytorch,
but
as
their
researchers
have
moved
to
different
projects
or
they're,
not
in
their
own
company,
then
we
are
at
the
end.
We
are
only
with
the
own
nx
model,
and
that
means
we
are
stuck
at
that
place.
B
So
the
request
is
to
see
if
we
can
solve
this
issue
of
offset
conversion
or
have
better
unit
testing
in
the
future
on
nx
conversion
side
a
converter,
so
we
are
able
to
move
older
offsets
to
newer.
So
we
can
take
advantage
of
performance,
good
accuracy
and
also,
of
course,
intake
quantization.
B
The
other
part
which
we
also
see
like
I
called
we
if
we
are
able
to
convert
these
offset
to
13
I
have
seen,
is
these
models,
and
this
may
be
again
related
to
the
unit
test.
The
python
verification
tool
says
this
is
corrupted
model,
even
though
we
have
converted
so
first
stage.
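[Editor's note: the verification tool referred to is presumably the ONNX checker. A minimal sketch of the check that fails after a nominally successful conversion; the file name is illustrative:]

```python
import onnx

# Load the model produced by the version converter.
converted = onnx.load("model_opset13.onnx")

try:
    # check_model validates the graph against the ONNX spec; in the
    # scenario described above, it rejects a model the converter
    # itself reported as successfully converted.
    onnx.checker.check_model(converted)
    print("Model is valid.")
except onnx.checker.ValidationError as e:
    print(f"Checker reports a corrupted model: {e}")
```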
B
It
should
have
not
converted
itself
if
there
was
an
issue
in
the
offset,
but
that
is
another
common
issue,
so
the
unit
test
or
the
all
the
tests
related
to
every
operators
or
layers
are
not
to
the
point
where
isvs
are
happy,
like
they're,
not
happy
with
with
the
own
and
excite,
and
that's
why
it
is.
B
If
I
see
currently,
most
of
them
are
on
10
or
9
offset,
and
this
has
not
just
scaled
customers
tier
one
customers
are
the
ones
we
say
they
have
big
market
share
in
the
company
for
creators
or
in
collaboration.
Video
collaboration
use
cases
who
are
adopting
on
nx,
and
that
has
become
one
of
the
bottleneck.
Plus
now
has
some
feedback
like
going
from
one
offset
to
another
offset.
B
We
have
to
go
through
another
offset
right
because
there's
some
layers
which
are
not
supported
and
that
also
isv
feedback
is
why
we
need
to
do
this
right.
I
mean
they
don't
have
expertise
or
time,
but
these
are
more
developers
who
are
integrating
ai
and
shipping
in
their
application
instead
of
their
researchers
side.
So
what
their
feedback
is
why
we
cannot
convert
from
one
offset
to
directly
offset
15
for
the
use
cases
they
are
doing.
B: And this all comes down, in the end, to how correct these conversion tools are right now and how the unit tests are implemented, because if these kinds of issues are this common across the ecosystem, it will hurt ONNX adoption of the newly optimized stack. New opsets will keep coming, and we want customers to move to the new opsets for new use cases, but this will hamper adoption.
D: All right, cool. So, a couple of models we had tried converting: specifically the ResNet-50 model and also some of the BERT-based models from the ONNX Model Zoo, and the issue we had in those cases was the opset conversion. For example, if we are jumping from opset 6 to opset 11, it goes through intermediate opset conversions: it goes from six to seven, seven to eight, eight to nine, and so on. So if either of those intermediate steps is missing, then in that case we are blocked.
D: We end up missing features needed to run the model in our inference. So we had issues where the ONNX adapters were not implemented, and we had to do that testing ourselves to resolve those issues. The second example I mentioned, BERT: the BERT-based model still has issues, one because it has a Squeeze operator for which the adapter is not yet implemented.
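[Editor's note: one way to see which intermediate step breaks is to walk the converter one opset at a time, mirroring what it does internally. A minimal sketch; the starting opset and file names are illustrative:]

```python
import onnx
from onnx import version_converter

model = onnx.load("resnet50_opset6.onnx")  # illustrative file name
current = 6  # opset the model was exported with, in this example

# Step through each intermediate opset, as the converter does
# internally, to localize the missing adapter.
for target in range(current + 1, 12):
    try:
        model = version_converter.convert_version(model, target)
    except Exception as e:
        print(f"Adapter missing for opset {target - 1} -> {target}: {e}")
        break
else:
    onnx.save(model, "resnet50_opset11.onnx")
```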
D: So I guess one of the biggest issues I have seen with the version converter currently is maintenance of the tool itself, as we keep adding operators to the ONNX repository.
C: And I would like to highlight the point that Manuj mentioned: the creator segment, as well as some of the intelligent-collaboration use cases.
A: So, you believe this, you know, you want better coverage. Can you share the models that break? Or are there, in your opinion, enough models out there that we can test whether it works? Because often, you know, you want coverage, but it needs models to go through.
B
Yeah,
I
can
bring
companies
get
the
approval
at
least
one
model,
because
it's
one
of
the
top
creator
company,
so
I
have
already
shown
them.
This
means
if
you
see
the
bug
or
I'm
I
added
to
the
bug
someone
opened,
but
I
put
as
a
ecosystem
partner
because
I
don't
have
approval,
but
I
can
try
to
get
one
model
which
is
failing.
Actually,
all
of
their
models
are
failing,
which
is
that's
why
they
are
not
moving
to
new
ops.
B: And the other part, which has improved with 1.10, is the FP16 conversion. That was one of the gaps relative to other IRs in different ecosystems: especially on the mobile side they are deploying FP16, but on the Windows side they were not deploying FP16, and the reason was the conversion. But with 1.10, and with some Microsoft converter tools that have been recently updated (they are not in the same repository), we are seeing those issues starting to get fixed. So it was not only opset conversion; before, it was FP16 as well.
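[Editor's note: the Microsoft tooling referred to is presumably the onnxconverter-common package, which lives outside the main ONNX repository. A minimal sketch of its FP32-to-FP16 conversion; file names are illustrative:]

```python
import onnx
from onnxconverter_common import float16

# Convert an FP32 model's tensors and weights to FP16 for deployment.
model_fp32 = onnx.load("model_fp32.onnx")  # illustrative file name
model_fp16 = float16.convert_float_to_float16(model_fp32)
onnx.save(model_fp16, "model_fp16.onnx")
```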
E: Hello. So, you're talking about the opset conversions, right?
C: Samples would provide enough starting momentum to showcase the differences as well as the similarities, and I think it would also set a path for showing distributed-training infrastructure with the other libraries and communication primitives that are out there in the industry for training. I think we can show the various techniques:
C
Parallelism
and
pipeline
parallelism,
as
applicable
and
onyx
has
a
very
good
opportunity
to
showcase
these,
by
providing
at
least
some
of
the
onyx
models
for
our
training
itself
and
the
other
big
area
that
we
are
seeing.
A
lot
of
involvement
is
the
quantization
aware,
training
and
again
that
is
another.
C
Thing
which
kind
of
quantization
aware
training
scenario.
We
would
also
like
to
request
the
original
fp32
model,
so
that
will
allow
for
for
better
accuracy,
comparisons
and
also
in
the
context
of
quantization,
aware
training.
Mixed
precision
is
also
becoming
prevalent
and
we
have
another
request,
probably
for
the
next
roadmap
session
on
how
do
we?
C: So I think the main thing is to at least start out with one sample, move over to distributed training as an example with the various backend collective libraries, then quantization-aware training, and then the mixed-precision usage: the combinations of FP16 and FP32, or bfloat16 and FP32, or INT8 and FP32. We are seeing a lot of these combinations come up when models are being considered for performance improvements.
C: And we would like to request help from the community: if there are other popular categories that can be contributed, that would really be helpful.
F: Is it with transformers? Oh, perfect.
G: Oh, I was just, this is Calvin here from Lightmatter. With respect to quantization-aware training, one thing that matters is, you know, the precision of accumulation often needs to be higher than, you know, the input and output precision. So that might be something that we need to enable in the ONNX model format.
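[Editor's note: as an illustration of Calvin's point (not part of the ONNX spec today), a small NumPy sketch showing why accumulation needs wider precision than the INT8 inputs and outputs:]

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(64, 256), dtype=np.int8)
b = rng.integers(-128, 128, size=(256, 64), dtype=np.int8)

# Accumulating 256 int8*int8 products overflows an int8 (and even an
# int16) accumulator; a wider int32 accumulator is needed, even though
# inputs and final outputs may be stored at 8 bits.
acc32 = a.astype(np.int32) @ b.astype(np.int32)
print(acc32.min(), acc32.max())  # values far outside the int8 range
```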
C
Thank
you
calvin
for
bringing
that
up
like
a
next
roadmap
session.
We
do
have
like
one
of
the
proposal.
There
is.
How
do
we
better
reflect
the
usage
of
mixed
precisions
within
a
model
as
part
of
metadata
and
trying
to
give
out?
More
probably
like
next
roadmap
session,
will
bring
in
a
detailed
proposal
from
our
site.
H: Okay, yes, yeah, that was a good guess. So, yes, this is a description of a deployment configuration that we are starting to take a look at at Intel. My name is Rodolfo Esteves; I work with Rajeev and other members of these forums. So thank you for the opportunity to describe this.
H
So
so
let
me,
let
me
first
just
very
briefly
touch
onto
the
the
actual
the
actual
characteristics
of
these
of
these,
like
training,
slash,
inference
configuration,
so
we
are
the
for
the
purposes
of
of
of
this
presentation
and
kind
of
inter-generally
calling
distributed
and
federated
learning,
but
but
the
the
topic
or
the
name
that
you
will
see
most
often
associated
with
these.
These
specific.
H
Is
federated
learning
and
like
the
idea
is
that
you
would
do
some
learning
at
the
edge?
So
so
you
you,
you
deploy
your
model
onto
onto
a
fleet
of
of
devices
and
some
of
and
those
devices
just
not
carrying
on.
They
carry
not
only
inference
but
but
they
they
have
a
training
group
themselves
which
which
it
has
implications
on
the
plasticity
of
the
model.
So
they
they.
They
enrich
the
model
with
the
data
that
they
are,
that
they
are
seeing.
H
So
the
the
advantages
of
doing
training
at
the
edge
are
like
well
documented,
and
so
so
here
I
have
a.
I
have
a
by
no
means
comprehensive
list,
but
you
know
like
important
things
include
the
privacy
or
the
technical
considerations,
such
as
latency
efficiency
or
like
use
of
use
of
computational
resources
that
would
otherwise
go
under
underutilized
and
that
sort
of
of
thing.
H
The
the
at
the
the
server
would
collect
like
a
bunch
of
of
such
notifications
and
then
then,
average
them
or
otherwise
consolidate
this
into
a
into
an
a
new
model.
That
includes
these.
These
findings
from
the
from
the
edge
devices
and
broadcast
the
new
the
new
model
for
the
edge
devices
to
to
to
continue
with
with
this.
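[Editor's note: a sketch of the server-side consolidation step being described, assuming the simplest federated-averaging scheme over ONNX models with identical architectures; all file names are illustrative:]

```python
import numpy as np
import onnx
from onnx import numpy_helper

def federated_average(model_paths, out_path):
    """Average the initializers of identically structured ONNX models."""
    models = [onnx.load(p) for p in model_paths]
    base = models[0]
    for i, init in enumerate(base.graph.initializer):
        # Average the i-th weight tensor across all edge models.
        stacked = np.stack(
            [numpy_helper.to_array(m.graph.initializer[i]) for m in models]
        )
        avg = stacked.mean(axis=0).astype(stacked.dtype)
        init.CopyFrom(numpy_helper.from_array(avg, name=init.name))
    onnx.save(base, out_path)

# Illustrative usage: consolidate updates from three edge devices.
federated_average(["edge0.onnx", "edge1.onnx", "edge2.onnx"], "global.onnx")
```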
H
So
so
so,
if
in
the
in
the
specific
case
of
onyx,
the
the
way
that
this
with
these,
this
looks
like
in
a
block
diagram
of
how
this
would
look
like
it's
similar
to
what
I
have
in
the
in
the
in
the
slide
here.
So
once
it
once
the
the
application
receives
the
model
from
from
from
an
update,
they
carry
out
their
basically
like
the
the
pre-processing
and
inference
loop
as
as
normal,
then,
and
then
they
they
somehow
updating
their
own
model.
H
With
a
training
like
a
local
training
loop
at
the
and
at
the
given
intervals,
they
they
update
their.
I
update
the
parameters
to
the
server
and
the
loop,
the
the
and
you
know
like
rinse
and
repeat
sort
of
deal.
So,
as
you
can
imagine
this,
this,
this
sort
of
configuration
is
not
this
well,
like
I
mean
this
is
not.
H
What
we're
proposing
here
is
like
some
some
modifications
I
mean
through
the
to
the
to
the
onyx
format
formatter
to
the
unix,
to
the
onyx
attending
libraries,
so
that
it
would
facilitate
this
this
particular
deployment
using
using
onyx.
So
so
we
believe
that
you
can
do
federated
learning
with
with
onyx
pretty
much
as
it
exists,
but
but
there
would
be
some
modification
that
would
make
it
make
the
the
user
experience
select
the
user,
as
in
like
the
the
developer
of
such
application,.
H
More
more
likely
to
use
to
use
onyx
if,
if
these,
these
modifications
or
these
facilities
existed,
so
in
particular
with
we
believe
that
that
a
minimal
set
of
of
additions
to
to
to
the
api
of
onyx
would
consist
of
is
basically
a
way
to
query
the
part
to
get
the
parameters
of
the
local
model,
to
update
those
parameters
and,
and
and
most
importantly,
it's
some
some
conventional
or
at
least
agreed
upon
metadata
format,
so
that
we
could.
H
We
could
very
much
like,
like
we
would
in
like
transfer
learning
annotate
what
parts
of
the
model
are
allowed
to
change
from
from
different
from
different,
like
from
from
a
learning
or
from
a
from
a
local
look.
That
would
presumably
be
trained
with
a
much
smaller
data
set
than
the
one
that
that
that
the
original
model
was
was
trained
on.
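[Editor's note: a sketch of the kind of API additions being proposed. None of this is an existing ONNX convention; the functions and the metadata key trainable_initializers are hypothetical, layered on top of today's onnx package:]

```python
import onnx
from onnx import helper, numpy_helper

def get_parameters(model):
    """Query the parameters of a local model (proposed API, sketched)."""
    return {t.name: numpy_helper.to_array(t) for t in model.graph.initializer}

def update_parameters(model, params):
    """Overwrite named initializers with new values (proposed API, sketched)."""
    for t in model.graph.initializer:
        if t.name in params:
            t.CopyFrom(numpy_helper.from_array(params[t.name], name=t.name))

# Hypothetical metadata convention marking which tensors a local
# training loop may change, in the spirit of transfer learning.
model = onnx.load("global.onnx")  # illustrative file name
helper.set_model_props(model, {"trainable_initializers": "fc.weight,fc.bias"})
onnx.save(model, "global_annotated.onnx")
```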
H
So,
as
as
you
can
imagine,
the
full
system
or
the
full
implementation
of
such
a
system
has
multiple
design,
design
decisions,
design,
design
points,
and
so
the
idea,
or
like
a
set
a
minimal
set
of
desired
characteristics
of
of
making
such
inclusions
in
the
in
the
the
honest
tonic.
Spec
good
good,
would
very
much
leave
a
lot
of
these
of
these
desires.
Desired
design
points
out
of
scope
so
that
the
the
application
specific
decisions
can
be
made,
and
basically
that
we
would
offload
all
of
these
decisions
to
the
implementers
of
such
a
system.
H
But
there
are
some
things
that
I
that
I
think
that
that
they
would
make
it
if
you,
if
they
were
included
in
the
in
the
unexpected,
they
would
facilitate
such
facilitate
like
kind
of
the
the
basic
infrastructure,
without
limiting
the
the
flexibility
that
would
be
needed
for
application
developers
like
the
three.
The
the
three
items
on
the
list
that
I
that
I
mentioned
before
are
such,
but
also
like
this.
H
This
notion
that
if
you
have,
if
you
are,
if
you're
distributing
your
model
across
a
fleet
of
devices,
probably
like
some
consistency
in
what
version
of
the
model
you
deployed
as
in
the
the
same
model,
would
have
to
have
undergone
the
same
optimization.
So
you
don't
you
know
you
cannot.
Oh
it's
it's
not
clear.
How
would
you
deal
with
parameter
updates
when
one
of
your
some
of
your
devices
are
running
a
quantized
model,
for
example,
versus
a
non-quantized
model
and
that
it
does
those
outstanding
questions?
H
So
so
those
those
set
of
set
of
concerns
seem
to
be
more
appropriate
to
the
two
to
two
encodings
inside
the
model,
rather
than
than
decisions
left
for
the
for
the
the
implementer
of
the
the
the
higher
order,
the
high
the
the
higher
level
system,
yeah
and
but
but
while
still
being
mindful
that
in
many
of
these,
in
many
of
these
cases,
it
would
be
very
useful
to
have
the
kind
of
a
mixture
of
of
or
at
least
not
mandate,
that
in
a
federated
federated
learning
system,
only
onyx
models
are
allowed
to
participate.
H
So
yeah,
as
long
as
the
architecture
is,
is
con.
The
the
model
architecture
is
consistent
and
the
some
edge
devices
would
be
allowed
to
have
a
tensorflow
model
versus
others
using
onyx
as
long
as
the
the
parameter
update
makes
sense
for
both
for
both
of
these
these
options.
In
any
case,
this
is
the
I
my
intention
in
presenting
this
is
kind
of
soliciting
feedback
of
whether
there
is
an
interest
in
in
from
the
from
the
spec
designers
to
pay
attention
to
this
sort
of
configuration
of
of
learning
and
whether
they
would
be
open
to.
A
And
I
made
it
to
6
p.m.
Thank
you
very
much
like
good
presentation,
any
any
questions
comments
for
this
presentation.
F
Yeah
thanks
for
bringing
this
up,
I
was
actually
curious
if
you
tried
to
do
this
with
onyx
runtime,
honest
runtime
supports
training
and
I
believe
we've
had
some
folks
who've
built
some
prototype
or
proof
of
concept
of
doing
federated
learning,
using
that,
oh.
F: I don't know if it's on GitHub or anything, but we can connect and talk about it and see if what was done matches what you're trying to do.
H: All right, I appreciate that. Thank you.
A
Well,
if
not,
I
I
want
to
take
the
speakers.
You
guys
obviously
put
a
tremendous
effort
in
the
presentation
and
that's
really
much
very
much
appreciated
by
the
community
and
we'll
be
looking
forward
to
the
follow
on.
So
the
the
goal
is
to
to
basically
to
present
to
have
those
ideas
presented
to
the
relevant
sig
for
further
in
detail
and
proper
proposals
and
and
eventually
implementations
and
look
forward
to
your
continued
involvement
with
those
ideas.