LF AI & Data Day - ONNX Community Meeting - October 21, 2021
Emma Ning (Microsoft)
ONNX Runtime Web for In Browser Inference
ONNX Runtime Web inherits what the native ONNX Runtime supports, including full ONNX operator coverage, quantized ONNX models, as well as the minimal ONNX Runtime build. ONNX Runtime Web also utilizes multi-threading and SIMD in WebAssembly to further accelerate model inferencing. Taking MobileNet v2 as an example, in this table, CPU inference performance can be accelerated by 3.4 times with two threads together with SIMD enabled, compared to pure WebAssembly without enabling these two features.
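The multi-threading and SIMD flags mentioned above are exposed through the `onnxruntime-web` JavaScript API. A minimal sketch follows; the model path `./mobilenetv2.onnx` is a placeholder, and multi-threading additionally requires the page to be cross-origin isolated so that `SharedArrayBuffer` is available:

```typescript
import * as ort from 'onnxruntime-web';

async function createWasmSession(): Promise<ort.InferenceSession> {
  // Use two WebAssembly worker threads for intra-op parallelism.
  ort.env.wasm.numThreads = 2;
  // SIMD is picked up automatically when the browser supports it;
  // the flag can be toggled to compare against plain WebAssembly.
  ort.env.wasm.simd = true;

  // Placeholder model path.
  return ort.InferenceSession.create('./mobilenetv2.onnx', {
    executionProviders: ['wasm'],
  });
}
```

These flags must be set before the session is created; they are browser-side configuration, so the snippet only runs inside a web page bundling `onnxruntime-web`.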
Operator support and platform compatibility are two important factors for AI development, as we discussed before. Since the whole ONNX Runtime CPU engine is built into the WebAssembly backend, all ONNX operators are supported by the WebAssembly backend. The WebGL backend only supports a subset of ONNX operators.
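This backend trade-off shows up at session creation time: if the WebGL backend is missing an operator the model needs, creating the session fails, whereas the WebAssembly backend can run any valid ONNX model. A sketch of selecting between the two, with an illustrative fallback strategy:

```typescript
import * as ort from 'onnxruntime-web';

// Prefer the GPU-accelerated WebGL backend, but fall back to the
// WebAssembly backend, which covers the full ONNX operator set.
async function createSession(modelUrl: string): Promise<ort.InferenceSession> {
  try {
    return await ort.InferenceSession.create(modelUrl, {
      executionProviders: ['webgl'],
    });
  } catch {
    return await ort.InferenceSession.create(modelUrl, {
      executionProviders: ['wasm'],
    });
  }
}
```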
By doing that, we can provide a consistent development experience for server-side and client-side inferencing with ONNX Runtime. To demonstrate web machine learning capability with ONNX Runtime Web, we built the ONNX Runtime Web demo website, where you can see several interesting in-browser vision scenarios powered by image models. Here is an example of running a MobileNet model in a browser; you can choose a different backend for CPU or for GPU.
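As a sketch of what such an in-browser run looks like with `onnxruntime-web` (the model path, the 1x3x224x224 input shape, and the preprocessing are assumptions for a typical MobileNet v2):

```typescript
import * as ort from 'onnxruntime-web';

async function classify(pixels: Float32Array): Promise<Record<string, ort.Tensor>> {
  // 'wasm' runs on CPU; switch to 'webgl' to try the GPU backend.
  const session = await ort.InferenceSession.create('./mobilenetv2.onnx', {
    executionProviders: ['wasm'],
  });
  // Assumed MobileNet v2 input layout: NCHW, 1x3x224x224, float32.
  const input = new ort.Tensor('float32', pixels, [1, 3, 224, 224]);
  // Feed by the model's actual input name rather than hard-coding it.
  return session.run({ [session.inputNames[0]]: input });
}
```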
Secondly, there are still a lot of opportunities to further optimize ONNX Runtime Web, including both performance and memory consumption. For example, WebNN is one promising technique ONNX Runtime Web could leverage in the future; some experimental results have already shown very promising performance gains.