From YouTube: 004 ONNX 20211021 Wang ONNX Intel Neural Compressor A Scalable Quantization Tool for ONNX Models
Description
LF AI & Data Day - ONNX Community Meeting, October 21, 2021
Intel® Neural Compressor: A Scalable Quantization Tool for ONNX Models
Speaker: Mengni Wang (Intel)
It is built on top of deep learning frameworks, including TensorFlow, PyTorch, ONNX and MXNet, and can yield a compressed framework model for deployment. In addition, it unifies the quantization APIs of the different deep learning frameworks and provides an auto-tuning mechanism to help users figure out the best low-precision solution on Intel hardware rapidly.
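For context, a minimal sketch of how an FP32 ONNX model might be quantized to INT8 with Intel Neural Compressor, assuming a recent release of its Python API; the model path, calibration data and dataloader class here are placeholders, not taken from the talk:

```python
# Minimal sketch: post-training static quantization of an ONNX model with
# Intel Neural Compressor (placeholder paths and random calibration data).
import numpy as np
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

class CalibDataLoader:
    """Tiny calibration loader yielding (input, label) batches of random data."""
    def __init__(self, shape=(1, 3, 224, 224), batches=10):
        self.batch_size = shape[0]
        self.shape = shape
        self.batches = batches

    def __iter__(self):
        for _ in range(self.batches):
            yield np.random.rand(*self.shape).astype(np.float32), None

config = PostTrainingQuantConfig(approach="static")   # INT8 static quantization
q_model = fit(model="resnet50_fp32.onnx",             # FP32 ONNX model (placeholder path)
              conf=config,
              calib_dataloader=CalibDataLoader())
q_model.save("resnet50_int8.onnx")                    # write the quantized model to disk
```

An evaluation function can also be passed to fit() so the auto-tuning loop keeps trying quantization recipes until the accuracy criterion is met.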
It can make great contributions to productivity. Until now, ResNet-50, VGG-16 and ShuffleNet-V2 have been upstreamed to the ONNX Model Zoo, and we got some positive feedback from the model zoo owner. He said the INT8 ResNet-50 is the first quantized model in the ONNX Model Zoo, and it shows a great performance improvement.
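For context, a quantized INT8 ONNX model like this can be run with ONNX Runtime; a minimal sketch, assuming a placeholder file name and a standard NCHW image input:

```python
# Minimal sketch: run an INT8 ONNX model with ONNX Runtime on CPU.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("resnet50_int8.onnx",
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name                 # input name depends on the model
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # NCHW image batch
outputs = sess.run(None, {input_name: dummy})
print(outputs[0].shape)                                # class scores, e.g. (1, 1000)
```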
A
This
is
our
contribute
contribution
plan.
We
plan
to
contribute
to
more
context
models
in
the
future,
which
will
enrich
the
diversity
and
the
skill
of
intent
models
in
onyx
model
zoo.
Our
goal
is
all
enable
the
fp32
models
in
onyx
model
2
would
have
corresponding
context
models
through
inter
neural
compressor.