ONNX Training Working Group, 5 Jul 2023

From YouTube: ONNXCommunityMeetup2023: On-Device Training with ONNX Runtime

Description

We will be introducing On-Device Training, a new capability in ONNX Runtime (ORT) which enables training models on edge devices without the data ever leaving the device. The new On-Device Training capability extends the ORT-Mobile inference offering to enable training on the edge devices. The goal is to make it easy for developers to take an inference model and train it locally on-device—with data present on-device—to provide an improved user experience for end customers. We will be giving a brief overview of how to enable your applications to use on-device training.

A

That was an interesting conversation. Next we have on-device training, and we will have Kshama from Microsoft take us through that.

B

Good morning, everyone. Let's discuss on-device training now. I'm Kshama, and I'm from Microsoft. I'll also have Baiju with me on video, and we are both from the AI platform team at Microsoft.

B

So, let's dive briefly into the motivation for why we need on-device training with ONNX Runtime.

B

Before that, the outline: motivation, scenarios, then we'll do a brief tech deep dive, and we'll take questions at the end.

B

So, motivation. Assuming that I'm an app developer, I will have my apps running on multiple devices across different platforms, maybe even using different frameworks, and I have a lot of data that's being created across these devices. I would potentially want to use this data to enhance the end-customer user experience.

B

However, these devices are resource constrained. It could be a laptop or a mobile phone, and it may be constrained in terms of battery life, memory, and even the space it has to share with other applications running on the device. And, of course, privacy is very important, so I want to make sure the data does not leave the device and the training that happens to enhance these experiences is all local.

B

As a developer, I also want portable solutions. I want a solution that works across frameworks, and potentially even across different platforms like cloud, desktop, mobile, etc. So what the on-device training solution provides, using ONNX Runtime, is an efficient, framework-agnostic, local trainer that trains with device data on the edge. Let's break this down a little. It's efficient because we've strived to make it both performance and memory efficient, including to save battery life.

B

We've also made it framework agnostic, so it starts with an ONNX model. If you're using ORT inference, you already have an ONNX model; if you're using other frameworks, you can convert your model to ONNX and it's ready for use. And it's local: the data is not leaving the device, and all the training happens on the edge.

B

It's preserving privacy for your end customers, and again, it's training on the edge. So this is using the limited resources on the device, including the CPU: as long as you have a CPU on your device, you can train using the data on the edge. Now we'll quickly touch on the scenarios.

B

The major scenarios are federated learning and personalization. Personalization is where you fine-tune on the device. Typically you'll take your inference model and train it with local data so that it gives your customers a richer user experience. Then we have federated learning, which is more about training a global model. You train on the device.

B

You send the model differences up to a global model, which is potentially in the cloud, and then you update that model with all the necessary changes from the edge devices, again without the data leaving the edge devices.
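As a rough illustration of the federated flow just described, here is a minimal plain-Python sketch. It assumes simple averaging of per-device weight differences; the talk does not say which aggregation rule is used, and the function names, parameter names (`fc2.weight`, `fc2.bias`), and values are all hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def weight_delta(local: Weights, base: Weights) -> Weights:
    """Difference between locally trained weights and the global weights they started from."""
    return {
        name: [l - b for l, b in zip(local[name], base[name])]
        for name in base
    }


def apply_average_delta(global_weights: Weights, deltas: List[Weights]) -> Weights:
    """Update the global model with the mean of the device deltas (assumed rule)."""
    n = len(deltas)
    return {
        name: [v + sum(d[name][i] for d in deltas) / n for i, v in enumerate(values)]
        for name, values in global_weights.items()
    }


# Hypothetical global model and two devices that trained locally.
global_weights = {"fc2.weight": [0.0, 1.0], "fc2.bias": [0.5]}
device_a = {"fc2.weight": [0.5, 1.0], "fc2.bias": [0.5]}
device_b = {"fc2.weight": [-0.5, 2.0], "fc2.bias": [1.5]}

# Only the deltas leave the devices, never the training data.
deltas = [weight_delta(w, global_weights) for w in (device_a, device_b)]
new_global = apply_average_delta(global_weights, deltas)
print(new_global)
```

Only weight differences are passed to `apply_average_delta`, which mirrors the point that the raw data stays on each device.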

B

We touched on a lot of the key benefits before, but I just want to reiterate that this extends the ORT inference solution. If you are already in the ecosystem, this provides an end-to-end flow from inference to training and back to an inference model that you can run on the end user's device and, again, give them a better experience.

B

So with that, I'll jump into our video presentation, in which Baiju will take us through a tech deep dive, and he'll also show us a demo for ONNX Runtime.

C

Hello, everyone. I'm Baiju, and I'm thrilled to be here representing the AI Frameworks group at Microsoft. Over the past few months, our dedicated ONNX Runtime training team has been hard at work developing an innovative solution for training ONNX models on edge devices.

C

Today, I'm extremely excited to showcase the incredible progress we've made. On-device training allows us to train models directly on edge devices such as mobile phones, web browsers, and gaming consoles, all while ensuring the privacy of end users. ONNX Runtime's on-device training solution builds upon the robust ONNX Runtime inference framework, providing a memory- and performance-efficient way to train models without significantly impacting power consumption or battery life.

C

Now, let's delve into the process of learning on an edge device using ONNX Runtime, which is divided into two phases. First, we have the offline phase. In this initial stage, training artifacts are generated using Python in an offline script on a server or development machine. These artifacts include the training ONNX model, the evaluation ONNX model, the optimizer ONNX model, and the checkpoint file.

C

Next comes the training phase, which takes place directly on the edge device. This is where developers interface with ONNX Runtime's training API and define the logic for the training loop. To better illustrate these two phases, let's work with an example. Imagine we start with a simple ONNX model that consists of a linear layer, followed by a ReLU activation, followed by another linear layer.

C

For this example, let's assume that we want to train only the second linear layer in the model. By passing this model to the ONNX Runtime Python utility, along with the names of the model initializers we intend to train on the device, we can generate the necessary training artifacts.

C

One of these artifacts is the evaluation ONNX model, which is essentially the input ONNX model with the loss node attached. This node, in this case, is the softmax cross-entropy loss node, serving as a crucial component for evaluation.

C

Another artifact is the training ONNX model, which incorporates the gradient graph of the input ONNX model on top of the evaluation ONNX model. The training ONNX model outputs the gradients for each of the trainable initializers, allowing us to focus specifically on the parameters associated with the second linear layer in this example. To avoid duplicate definitions within the training and evaluation ONNX models, the checkpoint file abstracts the shared model initializers into a binary checkpoint file.

C

Lastly, we have the optimizer ONNX model. This model takes the trainable model initializers and their computed gradients, effectively updating the model parameters in the direction of the gradient.
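To make the roles of the three models concrete, here is a minimal plain-Python sketch of the linear, ReLU, linear example with only the second layer trainable. It is not ONNX Runtime code: the weights, the sample, the learning rate, and the plain gradient-descent update (which steps against the gradient to reduce the loss) are assumptions chosen for illustration.

```python
import math


def linear(x, w, b):
    # y[j] = sum_i x[i] * w[j][i] + b[j]
    return [sum(xi * wji for xi, wji in zip(x, row)) + bj for row, bj in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


# Frozen first layer (hypothetical values); it is never updated.
W1, B1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
# Trainable second layer: the only initializers that receive gradients.
W2, B2 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]


def eval_step(x, label):
    """Role of the evaluation model: forward pass plus softmax cross-entropy loss."""
    h = relu(linear(x, W1, B1))
    probs = softmax(linear(h, W2, B2))
    return -math.log(probs[label]), h, probs


def train_step(x, label):
    """Role of the training model: the loss plus gradients for the trainable initializers."""
    loss, h, probs = eval_step(x, label)
    dz = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    grad_w2 = [[d * hi for hi in h] for d in dz]
    return loss, grad_w2, dz  # dz is also the gradient of the bias


def optimizer_step(grad_w2, grad_b2, lr=0.5):
    """Role of the optimizer model: update the trainable parameters in place."""
    for j in range(len(W2)):
        for i in range(len(W2[j])):
            W2[j][i] -= lr * grad_w2[j][i]
        B2[j] -= lr * grad_b2[j]


losses = []
for _ in range(5):
    loss, gw, gb = train_step([1.0, 2.0], label=1)
    optimizer_step(gw, gb)
    losses.append(loss)
print(losses)
```

The loss shrinks at every step while `W1` and `B1` stay untouched, which is the effect of marking only the second layer's initializers as trainable.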

C

Now that we have these training artifacts in place, we are ready to deploy them and enter the second phase of on-device training: the training phase. ONNX Runtime offers a wide range of language bindings, including Python, C, C++, C#, and Java, with Objective-C and TypeScript bindings currently in development. Packages for Windows, Android, and Linux are already available, while packages for iOS and web can be expected soon.

C

The entry point for training is the training session, which takes all the training artifacts as inputs and exposes methods that enable developers to train their models effectively. These methods include train step, eval step, optimizer step, reset grad, and more. Once the model has been trained on the device, the training session also offers a convenient way to generate an inference-ready ONNX model directly on the device itself.
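The order in which these methods are typically called can be sketched with a toy stand-in. The class below is hypothetical and is not the ONNX Runtime API: it only borrows the method names mentioned in the talk and fits a one-parameter model `y = w * x` with a squared-error loss, so that the loop runs on its own.

```python
class ToyTrainingSession:
    """Hypothetical stand-in for a training session; not the ONNX Runtime API."""

    def __init__(self, w=0.0, lr=0.1):
        self.w = w        # the single trainable parameter
        self.lr = lr      # learning rate (assumed value)
        self.grad = 0.0   # accumulated gradient

    def train_step(self, x, y):
        """Forward and backward pass: returns the loss and accumulates the gradient."""
        err = self.w * x - y
        self.grad += 2.0 * err * x
        return err * err

    def eval_step(self, x, y):
        """Forward pass only: returns the loss without touching gradients."""
        err = self.w * x - y
        return err * err

    def optimizer_step(self):
        """Apply the accumulated gradient to the parameter."""
        self.w -= self.lr * self.grad

    def reset_grad(self):
        """Clear the accumulated gradient before the next step."""
        self.grad = 0.0

    def export_for_inference(self):
        """Return an inference-only function that captures the trained parameter."""
        w = self.w
        return lambda x: w * x


session = ToyTrainingSession()
data = [(1.0, 2.0), (2.0, 4.0)]  # hypothetical samples of y = 2x

initial_loss = sum(session.eval_step(x, y) for x, y in data)
for epoch in range(5):
    for x, y in data:
        session.train_step(x, y)
        session.optimizer_step()
        session.reset_grad()
final_loss = sum(session.eval_step(x, y) for x, y in data)

predict = session.export_for_inference()
print(initial_loss, final_loss, predict(3.0))
```

Resetting the gradient after each optimizer step matters in this sketch because `train_step` accumulates; skipping it would carry the previous gradient into the next update.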

C

The inference ONNX model can then be used with the inference session to perform inference tasks.

C

Now, let me demonstrate the power of on-device training through a practical application. For this demo, we have developed an Android application that utilizes the MobileNetV2 model. We specifically select the last classifier layer to be trained, and we use the model to learn to classify celebrity images.

C

This idea can be extended to tasks like tagging images of loved ones from their gallery.

C

In our demo, we will be using images of four popular celebrities: Tom Cruise, Brad Pitt, Ryan Reynolds, and Leonardo DiCaprio.

C

Initially, the model has no knowledge of these celebrities and randomly predicts their images.

C

However, as we start feeding in images and labeling them, the model begins training. We intentionally provide 16 images per celebrity, reserving a few that will be used later for inferencing.

C

Each input image is pre-processed, batched, and fed to ONNX Runtime's training API, which updates the model parameters, allowing it to learn and make accurate predictions.

C

This training process occurs for five epochs. At the end of training, ONNX Runtime seamlessly transitions from training to inference. Now, with the model fully trained, you can begin inferencing.

C

As you can see, the model has successfully learned to classify the celebrities using data that never left the Android device itself.

C

In closing, I want to emphasize the immense potential and impact of on-device training with ONNX Runtime.

C

This approach enables us to bring the power of machine learning directly to the edge devices, providing an efficient way to train models while safeguarding user privacy. Developers can unlock new opportunities for personalized and intelligent applications. Whether it's image classification, voice recognition, or natural language processing, on-device training empowers us to create smarter and more responsive experiences, and it can be used in a plethora of scenarios such as personalization and federated learning.

C

Thank you all for your attention and for joining me on this journey into the realm of on-device training. I invite you to explore our work by visiting our GitHub repository where we host the code for this demo. Thank you once again. That's all for now.

B

Thanks, Baiju. So, as Baiju mentioned, we have a lot of resources. I've listed them here in the slides, and I invite you to check out the repo, contribute, and reach out to us if you have questions. Again, as he mentioned, we support a large number of language bindings: C, C++, C#, Python, and Java. We also have JavaScript, Objective-C, and Swift coming up pretty soon. We have support for Windows, Linux, and Android, and we have iOS and ORT Web support coming pretty soon. So with that, thank you, everyone.

B

And if there are questions, we can take them now.

B

Go ahead, yeah.

A

Software and that's where the smart, sequential data.

B

So let me repeat the question. As I understand it, the question is: are there plans to support text-to-speech, that is, on-device training on sequential data?

B

So on-device training is not restricted to vision. We do support speech, along with images. For text-to-speech, there's no restriction in terms of the task itself.

B

So if your model supports text-to-speech, you can bring that down to on-device training. It's mostly the last couple of layers of your model that you will be training, and as long as you are able to train that offline and create a text-to-speech model, you can bring that down to the device and then train it there as well. It's just that using vision for the examples is more appealing to a large audience, but it's applicable to text as well. Yes?

D

Oh yeah, so this is an online user's question. There are actually two questions. The first one is: is the on-device fine-tuning done online, or do you first fine-tune and then use the personalized model?

B

So the question is: is the fine-tuning done online, or do you first fine-tune and then use the personalized model? There are two stages. One is the offline validation, where you essentially prepare the model and get it ready so that you can deploy it. But there is also a phase of validation online, where you make sure your model is ready for production testing. And then the actual fine-tuning, where you use the user's data and tune the model to make it ready for inference later on, is done on the edge.

B

So that's why the data never leaves the device: all of the tuning happens on the device. There was a question in the back. Yes?

E

I was asking whether all the training node operators have to be explicitly defined and included in the ONNX operator set. What if I have some very special data pre-processing for the training framework that cannot be represented by those ONNX operators, specifically ONNX training operators? How am I going to deal with this particular scenario?

B

Sure. So the question was: what if I have special training operators that are not part of my ONNX inference model? How am I going to deal with it? You do not have to have the training operators in your ONNX model. You start with an inference ONNX model, and then, in the artifact generation phase that Baiju mentioned, that's when the training graph gets created. So most of the standard operators for training will be available.

B

However, if there are instances where you have a special model, you can reach out to us and we can get that added into the system. But we don't expect a training graph to be present in the beginning. Yes?

E

Are there any power consumption measurements?

B

Sorry, I didn't get that.

E

Are there any experiments done on power consumption on edge devices?

B

We are in the process of that. We have a multi-part blog series that's being released. We are in the deep-dive stage right now, but we are also going to release the measurements for performance and battery, as well as memory requirements. So we'll let you know soon. And the question was: were there any power measurements done? Sorry about that.

D

Yes, one more, on the same topic: "I'm also very curious about the federated learning use case. How would a custom model incorporate the weights from other custom models (all originated from the same ONNX model)?"

B

So the question was: can we talk more about the federated learning use case? I'm not going to delve into the details, but I will share some resources.

B

Federated learning is work done by Microsoft Research, and there are other companies as well. But essentially, we transfer the model weight differences back to the cloud rather than sending any of the data itself, and that's how the global model gets updated. I will share some resources, maybe in the Teams channel, and then they can look into it further.

B

Thank you so much. Thank you, everyone. Thank you.