ONNX June 2022 Community Meetup, 13 Jul 2022

From YouTube: ONNX Tools: Polygraphy and ONNX-GraphSurgeon

Description

Over the years, NVIDIA's TensorRT team has developed tooling that makes it easy to generate, transform, and debug ONNX models. Among other things, this includes a sanitizer that can simplify your models, and an automated bisector for debugging ('git bisect' for ONNX!). In this talk, I'll cover some of these tools and how you can effectively leverage them in your workflow.

Pranav has worked as part of the TensorRT team at NVIDIA since 2018, developing, among other things, ONNX tooling like Polygraphy and ONNX-GraphSurgeon.

A

Hello, everyone, and welcome to my lightning talk on Polygraphy and ONNX-GraphSurgeon. A little bit of background on me: I work on the TensorRT team at NVIDIA, and ONNX is our primary import path. So, as you can imagine, we've developed quite a bit of tooling for it over the years, and this talk will cover two of those tools.

A

The first is ONNX-GraphSurgeon, which is a Python library that lets you create and modify ONNX models, and the second is Polygraphy, which does many different things, but for the purposes of this talk I'll cover the ONNX-related aspects of it. We'll start with ONNX-GraphSurgeon. At the heart of it is a Python-based intermediate representation for bipartite graphs that consist of nodes and tensors, and we'll see exactly what this means on the next slide.

A

But you can essentially do any kind of modification on this intermediate representation using a simple Python API. In addition to that, it provides some conveniences like constant folding, topological sorting, and dead layer removal. I've linked the source code and examples on this slide. So here's a very small example of the IR.

A

The basic idea here is that nodes produce and consume tensors. If you look in the center here, nodes have ops, names, and various attributes, and tensors all have names, data types, and shapes. In addition to the fields shown above, inputs and outputs are also tracked. For tensors, the inputs and outputs are the nodes that produce and consume them, and for nodes, they are the tensors that the node takes as inputs and produces as outputs.
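The bipartite structure described here can be sketched in a few lines of plain Python. This is an illustrative toy, not ONNX-GraphSurgeon's actual classes: the Tensor and Node dataclasses and the connect helper are hypothetical names chosen for this example, and only the field layout (ops, names, attributes, data types, shapes, inputs, outputs) follows the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(eq=False)  # identity-based equality avoids recursing through the graph
class Tensor:
    name: str
    dtype: Optional[str] = None
    shape: Optional[Tuple[Any, ...]] = None
    inputs: List["Node"] = field(default_factory=list)   # nodes that produce this tensor
    outputs: List["Node"] = field(default_factory=list)  # nodes that consume this tensor


@dataclass(eq=False)
class Node:
    op: str
    name: str = ""
    attrs: Dict[str, Any] = field(default_factory=dict)
    inputs: List[Tensor] = field(default_factory=list)   # tensors this node reads
    outputs: List[Tensor] = field(default_factory=list)  # tensors this node writes


def connect(node: Node, inputs: List[Tensor], outputs: List[Tensor]) -> Node:
    """Hypothetical helper: wire a node to its tensors, keeping both sides in sync."""
    for tensor in inputs:
        node.inputs.append(tensor)
        tensor.outputs.append(node)
    for tensor in outputs:
        node.outputs.append(tensor)
        tensor.inputs.append(node)
    return node


# A one-node graph: x -> LeakyRelu -> y, with a dynamic leading dimension "N".
x = Tensor("x", dtype="float32", shape=("N", 3))
y = Tensor("y", dtype="float32", shape=("N", 3))
node = connect(Node(op="LeakyRelu", name="lrelu", attrs={"alpha": 0.1}), [x], [y])

# Traversal works in both directions without searching the whole graph.
consumers_of_x = [n.op for n in x.outputs]
producer_of_y = y.inputs[0]
```

Because every edge is recorded on both the node and the tensor, walking from a tensor to its consumers or from a node to its outputs is a plain attribute lookup.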

A

So this makes graph traversal very easy compared to something like the ONNX helper APIs, and editing inputs and outputs lets you restructure the graph and reconnect things in it. Once again, everything shown here can be edited freely using the Python API or constructed from scratch manually.

A

In addition to the lower-level APIs, there are also some higher-level wrappers, which make it easier to construct models. Here we'll see how to use the register API to add methods to the Graph class. They can be arbitrarily complex, and they can access the graph via a self parameter.

A

They're totally reusable, so you can stick them in a module or submodule and use them across your project. Here's a quick example of implementing LeakyRelu. This uses the layer API to actually define the node and tensors, and we'll see how to use this on the next slide.

A

Once we've registered one or more ops, we can use them directly from the graph instance, and this makes creating models almost as simple as constructing TF graphs, for instance.
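The registration pattern can be illustrated with a small stand-in written in plain Python. The Graph class below is a toy, not ONNX-GraphSurgeon itself: its register and layer methods only mirror the names mentioned in the talk, and the way tensors are represented (as strings) and the output naming scheme are assumptions made for the example.

```python
class Graph:
    """Toy graph that records nodes as dictionaries (hypothetical, for illustration)."""

    def __init__(self):
        self.nodes = []
        self.outputs = []

    @classmethod
    def register(cls):
        """Return a decorator that attaches a function to the class as a method."""
        def decorator(func):
            setattr(cls, func.__name__, func)
            return func
        return decorator

    def layer(self, op, inputs, outputs, attrs=None):
        """Append a node and return its output tensor names."""
        self.nodes.append({
            "op": op,
            "inputs": list(inputs),
            "outputs": list(outputs),
            "attrs": dict(attrs or {}),
        })
        return list(outputs)


@Graph.register()
def leaky_relu(self, x, alpha=0.01):
    # `self` is the graph instance, so the helper can add nodes to it.
    return self.layer(
        op="LeakyRelu",
        inputs=[x],
        outputs=[f"{x}_leaky_relu_out"],
        attrs={"alpha": alpha},
    )[0]


# Once registered, the op is callable directly on any graph instance.
graph = Graph()
out = graph.leaky_relu("input", alpha=0.1)
graph.outputs = [out]
```

The decorator simply sets an attribute on the class, which is why registered functions can live in any module and still show up on every graph instance.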

A

So here we're constructing a graph with a particular input, and then we give that to leaky_relu, which gives us an output tensor. Then we mark that as a graph output and finally export the model to ONNX. Next, we'll dive into Polygraphy. Polygraphy is essentially a toolkit for debugging deep learning models. It includes a Python API as well as various command-line tools.

A

There's lots of functionality included, but for this talk I'll focus just on the ONNX tooling, and I've linked the source code and examples on this slide. Before we get to the ONNX-specific tools, I just wanted to mention polygraphy run quickly. This is a tool that lets you run inference with various backends, like ONNX Runtime and TensorRT, and then compare results.

A

So here's an example of running model.onnx with the ONNX Runtime backend. Since there's only one backend here, it's not actually doing the comparison, but as you can see, it still runs inference.

A

Moving on to the ONNX-specific tooling, we'll begin with inspect model, which can show us a text representation of ONNX models. The display can be configured to show things like initializers, nodes, and attributes.

A

So here's a small example. We're doing polygraphy inspect model, giving it model.onnx, and we've configured it to show layers, attributes, and weights. As you can see, it tells us the name of the model, the ONNX opset we're using, and the inputs and outputs, and you can see all the nodes at the bottom.

A

So this is a nice alternative to tools like Netron if you're limited to the command line. Next, we'll cover tooling that can modify ONNX models on the command line. The surgeon sanitize subtool lets you simplify and fold constants in the model. This is a lot like ONNX Simplifier, but it has a few key improvements.

A

The first is that it can preserve dynamic shapes, whereas in ONNX Simplifier you actually have to fix the shapes to use it. Even as it preserves dynamic shapes, it can still simplify shape computations where possible. It's highly fault tolerant due to partitioning, and it has special optimizations, like removing conditionals whose outcome is known at build time and removing casts that do nothing.

A

Here's a quick usage example: you do polygraphy surgeon sanitize, you give it the model, which is model.onnx in our case, you give it an output path, for which we're using folded.onnx, and finally you set the fold constants flag. Here's an example with an actual ONNX model.

A

So this is a typical pattern that you might see in a PyTorch-exported model and, although it looks complicated, all it's really doing is flattening the last three dimensions.

A

You'll also notice that the leading dimension is dynamic, but the last three dimensions are all static, so this should be doable in a single reshape. Once we run it through surgeon sanitize, that's exactly what we get. The dynamic dimension is still dynamic, but the entire shape subgraph is gone.
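The arithmetic behind this folding can be sketched as follows. The function is hypothetical and is not Polygraphy's implementation; the talk does not show which constant the tool emits, so the target below is just one valid choice. It relies on the ONNX Reshape convention that a 0 in the target shape copies the corresponding input dimension, which is what lets the leading dimension stay dynamic.

```python
import math
from typing import Sequence, Tuple, Union

Dim = Union[int, str]  # a string such as "N" stands for a dynamic dimension


def fold_flatten_target(shape: Sequence[Dim], num_trailing: int) -> Tuple[int, ...]:
    """Compute a constant Reshape target that flattens the last `num_trailing` dims."""
    if not 1 <= num_trailing <= len(shape):
        raise ValueError("num_trailing must be between 1 and the rank of the shape")
    lead = shape[:-num_trailing]
    trail = shape[-num_trailing:]
    if not all(isinstance(dim, int) for dim in trail):
        raise ValueError("trailing dimensions must be static to fold them")
    # 0 means "copy this dimension from the input", so dynamic dims are preserved.
    return tuple(0 for _ in lead) + (math.prod(trail),)


# Dynamic batch dimension, three static trailing dimensions.
target = fold_flatten_target(("N", 3, 4, 5), num_trailing=3)
```

Since the product of the static dimensions is known ahead of time, the whole shape-computing subgraph collapses into one constant, while the batch dimension is never pinned to a fixed value.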

A

Next up is surgeon extract, which lets you extract subgraphs from a model, and this is super useful for debugging.

A

You can use inspect model or Netron to figure out the input and output tensors that you're interested in. For inputs, you also need to provide the shapes and data types, and for outputs, you just need the data types. So this is the format that the tool takes for the inputs and outputs: it's tensor name, colon, shape, colon, data type. Here's a quick example. If you don't know the shapes or data types, you can use auto to indicate that these should be automatically determined by the tool.
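To make the name:shape:dtype format concrete, here is a small parser sketch in plain Python. It is not Polygraphy's own parsing code: the function names are hypothetical, and the bracketed shape syntax (for example [1,3,224,224]) is an assumption about how shapes are written on the command line.

```python
from typing import Tuple, Union

Shape = Union[str, Tuple[Union[int, str], ...]]


def parse_shape(text: str) -> Shape:
    """Parse 'auto' or a bracketed, comma-separated shape such as '[1,3,224,224]'."""
    if text == "auto":
        return "auto"
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"expected 'auto' or a bracketed shape, got {text!r}")
    inner = text[1:-1].strip()
    if not inner:
        return ()  # scalar
    dims = []
    for part in inner.split(","):
        part = part.strip()
        # Integer dimensions become ints; anything else is kept as a symbolic name.
        dims.append(int(part) if part.lstrip("-").isdigit() else part)
    return tuple(dims)


def parse_input_spec(spec: str) -> Tuple[str, Shape, str]:
    """Inputs take the form name:shape:dtype."""
    # Split from the right so tensor names that contain ':' still parse.
    name, shape_text, dtype = spec.rsplit(":", 2)
    return name, parse_shape(shape_text), dtype


def parse_output_spec(spec: str) -> Tuple[str, str]:
    """Outputs take the form name:dtype."""
    name, dtype = spec.rsplit(":", 1)
    return name, dtype
```

For instance, parse_input_spec("x:[1,3,224,224]:float32") yields the name, a tuple of four integer dimensions, and the data type, while a spec ending in :auto:auto leaves both fields for the tool to determine.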

A

Let's do a quick example. Here's the model that we're interested in, and you can imagine that we want to extract the LeakyRelu from this model, maybe to debug it in isolation.

A

We can see the names from the Netron view, which are identity_out_0 and leaky_relu_out_1. We'll use those names, and we'll fill the shapes and data types with auto so that Polygraphy will determine them for us. So here's the command: we do polygraphy surgeon extract, we give it the input model, model.onnx, we tell it the output path, which is subgraph.onnx, and then we set the inputs, which are identity_out_0 with auto and auto for the shape and data type.

A

We give it the outputs, leaky_relu_out_1 with auto for the data type, and this is what we end up with: a model with just the LeakyRelu, which we can now debug in isolation.

A

Finally, we'll cover the model bisector. The tool is called debug reduce, and it's a lot like git bisect, but it works for ONNX models.

A

The basic algorithm, assuming we start with some failing model.onnx, is as follows: remove some number of nodes from the model and generate a new model. If the new model fails, we repeat that process.

A

If the new model succeeds, then we add back some number of nodes and generate another new model, and then we check if that fails. We keep doing this until we find the smallest failing model.
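The remove-and-add-back loop described above can be sketched as a bisection over a list of nodes. This is a simplified illustration, not the debug reduce implementation: the function names are hypothetical, the model is treated as a linear chain of node names, and the sketch assumes that once a truncated model fails, keeping more nodes on that side still fails.

```python
from typing import Callable, List, Sequence


def _smallest_failing_length(total: int, fails_with: Callable[[int], bool]) -> int:
    """Binary search for the smallest number of kept nodes that still fails."""
    low, high = 0, total  # keeping `high` nodes is known to fail
    while high - low > 1:
        mid = (low + high) // 2
        if fails_with(mid):
            high = mid  # still fails with fewer nodes, so keep removing
        else:
            low = mid   # passes, so add some nodes back
    return high


def reduce_failing(nodes: Sequence[str],
                   fails: Callable[[Sequence[str]], bool]) -> List[str]:
    """Shrink a failing chain of nodes, first from the end, then from the start."""
    nodes = list(nodes)
    if not fails(nodes):
        raise ValueError("the original model must fail")
    # Pass 1: drop as many trailing nodes as possible.
    keep = _smallest_failing_length(len(nodes), lambda k: fails(nodes[:k]))
    nodes = nodes[:keep]
    # Pass 2: drop as many leading nodes as possible.
    keep = _smallest_failing_length(len(nodes), lambda k: fails(nodes[len(nodes) - k:]))
    return nodes[len(nodes) - keep:]


# Hypothetical check: any model containing the bad Reshape node fails.
model = ["Identity", "LeakyRelu", "Reshape", "Add"]
reduced = reduce_failing(model, lambda sub: "Reshape" in sub)
```

Each pass needs only a logarithmic number of checks, which matters when every check means running a full model through a backend.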

A

I'm using fail and pass here very generally, because bisection works for any type of failure, so bisection can be useful for anything where an ONNX model is involved. That can range from ONNX Runtime accuracy bugs to issues with Netron display to even bugs in the model itself. To better understand this, let's see it in action.

A

So imagine we have the following ONNX model, and this is giving us an error when we run it. We can do polygraphy run model.onnx, and we see a very big error message from ONNX Runtime.

A

Reducing the model to something smaller can make it a lot easier to debug. Of course, in this case it's already pretty small to begin with, but for the sake of example, we'll start off with the interactive mode. In this mode, debug reduce will generate models successively and then prompt us to report whether each one passes or fails. We'll run each of these models manually using run and then report back what we see.

A

The debug reduce command in this case is pretty simple: it's just polygraphy debug reduce, the input model, which is model.onnx, and then the output path for the reduced model, which is reduced.onnx. Here's what it actually looks like in action.

A

I have two terminals open here. On the left side, I'm going to run polygraphy debug reduce, and on the right side, I'm going to check each intermediate model. If it passes, I'll go back to the left terminal and enter p for pass; if it fails, I'll enter f and move on to the next iteration. So let's see it. I've started debug reduce. I run the model, it fails, so I report failure. I run the next model and report failure again, and then we're left with the final reduced model. Here's what it looks like.

A

So we can pretty clearly see that the reshape is invalid, since the volumes of the input and output dimensions do not match. That was pretty easy, but we can actually make this process easier by doing it in an automated fashion. Instead of running the command manually ourselves, we simply tell debug reduce which command to run. So here's what that looks like.

A

We do polygraphy debug reduce, give it model.onnx, give it the output model, which is reduced.onnx, and give it the check command, which is exactly what we were doing manually before: essentially running the intermediate model with ONNX Runtime. The resulting model is exactly the same as what we got in interactive mode. And that concludes my talk. If you have any questions, feel free to email me at the address that I've posted on this slide, and thanks, everyone, for attending.