From YouTube: 22 - Geometric Deep Learning - Or Litany
Description
Deep Learning for Science School 2019 - Lawrence Berkeley National Lab
Agenda and talk slides are available at: https://dl4sci-school.lbl.gov/agenda
Thanks for the introduction. Should I just use the one I'm wearing? I'll close this. Can everyone hear me? Awesome. All right, so, yeah, thanks for the introduction. I always try to hide the fact that I did my bachelor's in physics, because then people expect I'll know something about physics. But no, you outed me, so feel free to ask questions about that.

So there's some abuse of notation about what geometric deep learning is, and it's actually fun to be in that field, because we can just collect all those things that fall outside traditional deep learning. It's a lot of fun. I like to think about it as: anything that doesn't fit in a regular grid, we'd like to call geometric deep learning. That's general enough, but specific enough at the same time.
So you can write grants and get research funding on these things. I'm currently a postdoc at Facebook AI, and I'm moving to Stanford in September for a second year of postdoc. I'd like this to be open for discussion, because I have many slides, and rather than just going through all of them, if some slides interest you, then we can open things up for discussion. So just feel free to interrupt me with questions, and it's okay if we don't finish them, because they're online.

Starting with 3D: I always like to go back to this paper; it's kind of a classic. I'm not part of this paper, but it took a bunch of images and then rebuilt a 3D model.
I always like to look at this as an example of how we keep representations in an efficient way. There's something a little bit unsatisfying about just keeping all those images, because in the end each of them is just some projection of what's there in 3D. So I always like to start by saying that 3D is just a better representation of the world.

It's also important to understand where we come from when we say 3D, because many times when researchers talked about 3D, we thought of a handful of clean meshes and synthetic objects that artists created, and not for us, actually, for themselves. But then we took those, and only a few, so there were no massive data collections to actually work on. Nowadays, though, something similar to the transition that images have seen is happening.

In the old days, people tried to create models of images using springs and masses and things like that. But then, you know, everyone has a camera in their pocket, so let's use that. I think this is something we're starting to see happen in 3D, so this is the right time to prepare for the 3D revolution. The new iPhone already has a scanner in the front camera that produces a pretty dense point cloud.
If you want to use it, there are apps out there that can already give you that data, so we're almost ready. And it's not just this camera: we have a bunch of sensors, and now they're smaller and cheaper. It's also not only about acquiring the data, because we also need to ask: okay, so we acquired data, what can we do with it? Now we have things like VR and AR devices, so we can enjoy that data.

Let me start by addressing one key property of 3D that's different from 2D. In 2D, you've probably heard a lot this week about different ways to process the data, but somehow everything starts with a convnet, and one reason that's possible is the fact that images are represented on a regular grid, and that's agreed on by everyone. So we know the camera manufacturers are not going to do anything radical next.
A
You
know
the
next
version
of
the
iPhone
will
still
have
an
array,
and
that
means
that
we
can
keep
our
algorithms
still
working
on
arrays.
That's
fine,
they're
gonna
be
2d
they're,
not
gonna,
invent
any.
You
know
some
sort
of
a
weird
thing,
because
that
will
influence
an
entire
complex
pipeline.
If
really,
we
don't
have
that
agreement
all
right
so
because
it's
kind
of
maybe
early
stage
and
also
not
because
it's
early
stage
also
because
it's
not
always
appropriate
to
use
the
same
representation
because
you
have
different
needs
in
there.
A
So
without
this
agreement
we
were
left
with
a
bunch
of
representations.
These
are
the
ones
that
are
that
I'm.
Showing
here
are
probably
the
most
common
ones,
so
right
so
I'll
talk
about
so
one
of
them
is
two
images
right,
like
the
first
slide,
I
show
they're
just
a
bunch
of
images
that
represent
some
underlying
3d
object,
that's
still
valid,
so
we
can
still
use
that.
A
If
not,
then
we
have
point
clouds
and
I'll
talk
about
each
of
them
right,
meshes,
voxels
and
I'll
briefly,
mention
also
level
sets,
maybe
not
so
briefly,
depending
on
your
questions,
all
right,
so
let's
maybe
take
extra
minute
to
maybe
go
through
a
bunch
of
things.
So
it
was
importantly
when
preparing
the
slides
to
kind
of
push
the
message
that
Phoebe
is
not
just
these
guys.
It's
not
always
the
case
where
you
get
a
3d
or
you're
trying
to
output,
something
it
VD
is
when
you
need
3d,
ok
and
what
do
I
mean
by
that?
A
So,
let's
take
this
example
from
you
know.
You
have
an
image
and
let's
say
you
want
to
do
some
kind
of
enhancements,
denoising
depth
of
field
whatever,
and
we
can
think
even
with
you
know,
even
if
your
model,
assuming
you're
doing
your
solving
that
problem
with
with
a
neural
network,
even
if
the
model
runs
in
2d
honor
to
the
input
and
outputs
2d,
it
has
to
understand
something
about
the
3d
or
the
depth
of
the
scene.
A
So
imagine
you
have
like
two
images
and
you
want
to
say:
are
they
the
same
and
again
like
2d
methods,
working
on
images,
doing
metric
learning
from
very
well
using
features
like
color,
but
once
you've
changed
the
view
angle
significantly,
then
maybe
you
can
benefit
from
seeing
many
other
3d
models
of
cars
right.
So
you
can
imagine
some
underlying
3d
model
of
a
car
is
placed
somewhere
in
the
memory
of
the
networks
that
so
that
you
can
utilize
it
a
little
more
towards
the
explicit
side.
A
A
A
That's the key message. And this is a very, very cool work from DeepMind from, I guess, two years ago. What's happening here is that they try to teach the network about the existence of this scene, but the way they do it is by only showing it images from different viewpoints, and then they want the network to be able to generate a new image. So again, the input is 2D and the output is also 2D, but conditioned on the camera direction.

That means that somewhere inside, the model needs to build a 3D understanding of the scene; otherwise, how could it generate it? But what type of 3D model? It's definitely not one of those four or five that I showed in the first slide. We don't know. Oops, that should have been 3D. Of course, when we go towards more explicit 3D, it's clear: here the input is 2D, but the output you're interested in is a 3D point cloud.

Then there are all those applications that start from 3D and try to do something like segmentation, correspondences, object detection; that's what you immediately think of, maybe. And to some extent it's maybe not fair to call this slide "3D" and then say that's "beyond 3D", but going outside the spectrum, we also care about other things, especially graphs: they can be used to represent 3D, but they're more general.
Okay, so let's briefly discuss the first two representations. The first one is multi-view, and the reason I don't want to touch too much on it is that you've heard the entire week about neural networks, so you kind of already know how to solve this thing. The other one is voxels.

So here, let's say the problem is: you take a bunch of images and you want to classify this thing, for example. You basically process each image independently using your CNN, and then essentially you don't care about the fact that it came from a 3D object. The first works that did this type of thing just said: okay, let's take the collection of images, and the only thing we need to care about is how to pool them together.
Right, this is another example, again one of those cases where we're somewhere in between the implicit and the explicit: each image is processed individually, but what matters is the way they're aggregated. And this is something I want to put out there as a general message; it doesn't necessarily have to do with this talk. If you don't care what happens inside the network at all, and you only care about one output, like a classification, then fine: that representation can lie anywhere, in the layer before the features, or the logits, or whatever you want. But if you want to do aggregation, then it may be a smart idea to ask the network to do its processing such that the operation you want to do is easy.

What do I mean by that? For example, here we took a bunch of images and represented each of them in a 3D frame, and the reason we did that is that we wanted some kind of 3D reconstruction. What's difficult is merging two images, because how do you do that? But if those are already 3D models, then it's just the union operator. So the thing I need to ask my network is: give me a representation where it's easy to unify things. That's a general message, but it's also, again, an example of using 2D images to represent an underlying 3D model.
So here it's classification, here it's 3D reconstruction, and that representation can also give you a novel-view render. Voxel grids were probably the first, the most naive generalization from 2D to 3D: you move from pixels to voxels, you just add another dimension. It's also easy in terms of the machinery: except for memory, which we'll discuss, you don't have to invent anything new; you just add one more dimension.

And what's one more dimension if you already work with a very deep per-pixel representation? Each pixel is no longer RGB; we have hundreds and hundreds of features, so we just add one more dimension, everything stays kind of the same, and it works to some extent. The main issue with this, and the reason we're trying to find other solutions, is mainly resolution. Here's one example; there are many such examples you can find online.
If a box is completely free of existing points, then it's just not occupied; if the number of points in that voxel exceeds a certain threshold, it's occupied, and there are many ways to decide whether a voxel is occupied or not according to the threshold. Oh, sorry, so this is the percentage of occupied voxels: this one says that if you use a 32 by 32 by 32 grid, then more or less 10% of the voxels are used, and the rest are zeros, just free space.

The issue is that you need to process this free space. Unless otherwise treated, your network doesn't know what's a zero and what's not a zero. In terms of computation, you're going to input that huge thing where ten percent actually has meaningful values and the rest are zeros, but you still process everything.

So if you start with large voxels and they're completely empty, you can imagine there's no reason to split them, to break them apart into smaller voxels; then you can do this iterative process. And this one goes one step further and computes objects in a hierarchical way, kind of like onion layers. The techniques are interesting, but in my mind, whatever is not simple enough is just not going to stay.

This community likes simple things; the computer wants to do simple operations because they can scale up very quickly. These are very interesting ideas, but I don't see a lot of adoption of those techniques in the community. So that's it for voxels. Does anyone have questions on those two, multi-view or voxels? All right, let's go into the more interesting stuff.
Hopefully it's convincing that voxels are not the best way to go, especially if you want a good representation of an object. Meshes: graphics people have used meshes for so long exactly because of this reason; you just have an efficient way to represent a single surface. And point clouds:

Okay, just a simple example: if you sampled my fingers, a point here and a point here can be as close as a point here and a point here; the difference is that you don't want to connect the first two, but you do want to connect the other two. Those two are situated on the same surface, and these are not.

In order to understand that... right, so in terms of the representation: yes, whether you can or cannot fold or unfold a 2D plane onto the entire 3D surface has to do with the topology of the 3D shape you're trying to represent, but yes, locally at least, you can build those atlases and do that.
So let's start with point clouds. I think this is something the community has cared about a lot in recent years, and I'd argue it's mainly the driving force of, no pun intended, autonomous driving. So what are the challenges when we want to work with point clouds, as opposed to, say, image pixels? First, we don't want to preprocess them. One thing we could do is convert a point cloud to a voxel grid: you just place your grid around the object and then you're back at the voxel grid, so we have a solution, great. But you want to go raw, because you don't want to invent data, you don't want to lose resolution, you want to be efficient, and you don't want to process zeros.
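(If you do want the voxel route, the conversion is straightforward. Here is a minimal NumPy sketch of turning a point cloud into a binary occupancy grid; the grid size and the one-point threshold are arbitrary choices for illustration, not anything prescribed in the talk.)

```python
import numpy as np

def voxelize(points, grid_size=32, min_points=1):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    A cell counts as occupied once it contains at least `min_points`
    points, mirroring the thresholding discussed in the talk.
    """
    # Normalize the cloud into the unit cube so the grid covers the object.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    normed = (points - mins) / (maxs - mins + 1e-9)

    # Map each point to an integer cell index in [0, grid_size - 1].
    idx = np.clip((normed * grid_size).astype(int), 0, grid_size - 1)

    # Count points per cell, then threshold.
    counts = np.zeros((grid_size,) * 3, dtype=int)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return counts >= min_points

pts = np.random.randn(2048, 3)
grid = voxelize(pts)
print(grid.mean())  # fraction of occupied voxels; often around 10% at 32^3
```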
So that's the unstructured part. Another thing is that point sets are sets: they don't have a defined order, and that's also something we need to address. If I put down a grid, I essentially hint that there is an up direction and that there's an order in which I can process those points; if I wanted to push them through a recurrent neural network and treat them like text input or audio input, that has an order.

This is just a set: whether I permute it or not shouldn't matter to the algorithm. So how do we handle this? A few years back the PointNet paper came out; I think it was the original paper that started the whole rush for point cloud networks. It starts with a very, very simple observation: if you want to be invariant to the order of the points, then essentially you need to work only with symmetric functions.
A symmetric function is essentially what I just said: a function that doesn't care about the order of the input; it produces the same output. Some easy examples of such functions: if I give you a set of values and I just ask you what the maximum value is out of all those values, then you don't care about the order in which I gave them to you.

Another thing you can do is just combine them, add them together; and there are many such functions. That's exactly what PointNet proposes to do. You start with your n by 3 representation of a point cloud, since you have n points in 3D, and the first thing the network does, if you can see...
A
I,
don't
have
a
pointer,
but
the
first
thing
that
the
network
does
is
processes
each
point
independently.
All
right,
that's
the
other.
That's
the
other
thing
that
you
can
do
that.
Keeps
you
equivalent
to
the
order
right.
So
one
option
is
to
you
know,
make
an
operation
like
adding
all
the
points
together
and
then
your
environment,
because
you
don't
care
about
the
permutation,
but
another
thing
you
can
do
is
be
equivalent.
So
you
do
the
operation
point
wise
right
and
if
you
per
moved,
then
it
changes,
but
it
changes
in
the
same
way.
A
So
if
you
first
do
the
operator
and
then
permute
or
if
you
first
permute
and
then
do
the
operation,
you
get
the
same
result,
and
so
that's
another
kind
of
a
legal
operator
that
you
can
do
so.
You
can
take
the
endpoints
and
process
each
of
them
individually,
say
by
passing
them
through
a
multi-layer
infrastructure
on
each
of
them
separately.
Right.
But
now
you
can
introduce
one
of
those
symmetric
functions,
say
the
the
max
per
channel
max
right.
So
now
you
let's
say
you
went
from
n
by
3
and
I'm
skipping.
A
A
Sorry,
oh,
it's
just
the
number
of
channels
at
each
layer
of
the
MLP
right,
so
you
start
with
3
and
then
you
have
like
essentially
a
matrix
multiplying
from
3
to
16
or
whatever.
So
this
will
be
like
take
your
64
to
124
1028,
yes,
just
on
design
choice,
yeah,
right,
yeah
and
there's
some
some
interesting
arm
observation
in
the
paper
about.
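To make the architecture concrete, here is a minimal PyTorch-style sketch of the idea just described: a shared per-point MLP (order-equivariant) followed by a per-channel max pool (order-invariant). The channel widths follow the 64/128/1024 progression mentioned above; everything else, like the single-layer classifier head, is a simplification, not the exact PointNet architecture.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Shared per-point MLP + symmetric max pool, in the spirit of PointNet."""
    def __init__(self, num_classes=10):
        super().__init__()
        # The same weights are applied to every point independently.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 1024), nn.ReLU(),
        )
        self.classifier = nn.Linear(1024, num_classes)

    def forward(self, pts):                    # pts: (B, N, 3)
        feats = self.point_mlp(pts)            # (B, N, 1024), equivariant
        global_feat = feats.max(dim=1).values  # (B, 1024), invariant
        return self.classifier(global_feat)

# Permuting the input points leaves the output unchanged:
net = TinyPointNet()
x = torch.randn(2, 1024, 3)
perm = torch.randperm(1024)
assert torch.allclose(net(x), net(x[:, perm]), atol=1e-5)
```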
The property of this thing essentially says that you don't lose a lot: you can be as close as you like to the function you actually want to represent while still being order-invariant. And maybe a little intuition about that, because when I first read the paper it was kind of puzzling to me why it's even useful to process each point individually. What's the point? What is a single point really going to tell you? If it doesn't communicate with any of the other points, what kind of information can it hold, even if you take the three dimensions and represent them in 1024?

One interesting intuition behind that is this simple example; after all, this is just choosing another parametric way to represent the point cloud, and here's the simplest example. Say that what the network is actually doing is effectively placing a grid around the point cloud. How can it do that? If each point is a point in space, then the network can simply say: okay, I have 1024 values to represent that point, so I can think of a 3D grid with indicator functions, and if the point XYZ is situated inside one of those boxes, I'm going to place a 1 at the corresponding position among the 1024 dimensions and 0 elsewhere.
I'm not saying that's what the network is doing; I'm saying that's one trivial way for a representation to take 3 coordinates and just represent them differently in a thousand dimensions, but in a way that now gives a meaningful representation of the shape: because if we have that, then just doing a max pool will give you exactly the voxelized shape back. Does that make sense?
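Here is a tiny NumPy demo of that intuition, assuming the per-point feature really is a one-hot indicator of the point's grid cell (a hypothetical feature, just for illustration): the symmetric max over points then recovers exactly the occupancy grid.

```python
import numpy as np

pts = np.random.rand(100, 3)      # points in the unit cube
cells = 8                         # 8^3 = 512 "feature" dimensions

# Hypothetical per-point feature: one-hot of the cell the point falls in.
idx = np.clip((pts * cells).astype(int), 0, cells - 1)
flat = np.ravel_multi_index((idx[:, 0], idx[:, 1], idx[:, 2]), (cells,) * 3)
features = np.eye(cells ** 3)[flat]          # (100, 512)

# Max-pooling this symmetric feature over points = voxel occupancy grid.
occupancy = features.max(axis=0).reshape((cells,) * 3)
```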
So that's one way that at least I could get my head around why it should even work. Now, allowing the network to train and do something fancier, you can think of things like: hey, maybe, similar to what people have done in dictionary learning and techniques like that, maybe there are more recurring structures in 3D. For example, if I see a bunch of points, it's wasteful to represent each of them in its own voxel; instead, I can think of them as multiple pieces of evidence for the existence of a plane. So now I'm not going to waste, whatever, 15 dimensions out of my 1024 to represent that plane; I'm going to just have one indicator function for a plane at a certain angle and a certain location.
You can also play some tricks with a transformation network: a first network that sees the point cloud and decides, okay, I'm going to rotate it by a certain amount. They added one both to handle the input and also to handle the features later on. The first one is very intuitive; the other one less so, but it still seems to help.

Another question you can ask is: okay, that global feature allows me to answer global questions, like what is that shape; I can do classification. But what if I care about going back to the original resolution and saying something about each point, maybe which part of the shape that point belongs to? To do that, they use a simple trick: they take the global feature and concatenate it to each of the individual points. So they go back to the original n points, or to some intermediate stage where there were, say, 64 dimensions, and they just concatenate the same feature over and over again.

So this is a duplicate of the same vector, but you can think of each row here as now having some information about the individual point and about all the others. That already allows communication between each point and the rest of the shape, if you will. And if you keep that resolution, you can keep processing each point individually, but now it's aware of the existence of all the other points, so you can get back results that are meaningful.
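A sketch of that concatenation trick, with made-up shapes for illustration: the single global vector is tiled once per point and appended to each point's intermediate feature.

```python
import torch

N, c_point, c_global = 1024, 64, 1024
point_feats = torch.randn(N, c_point)   # per-point intermediate features
global_feat = torch.randn(c_global)     # output of the global max pool

# Tile the one global vector per point and concatenate, so every
# point "sees" a summary of the whole shape.
tiled = global_feat.expand(N, c_global)              # (N, 1024), no copy
per_point = torch.cat([point_feats, tiled], dim=1)   # (N, 64 + 1024)
# per_point then feeds further shared per-point MLPs for part labels.
```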
For example, you can do part segmentation. First, any questions about those additions? Yeah. So for the task of object classification, as the first work that's not using voxel grids and not image-based, it was already impressive in terms of the results it got. One thing you learn the hard way is that whenever you want to invent something new, you have to fight with things that are very well engineered.

It's very hard to push a new idea to be state of the art. So you either find a reviewer that's willing to accept that novelty can come at the expense of maybe sometimes being the best, or you just have to sit on it for a year before you can push it. Sorry for using the stage for self-frustration.

And then, as I said: this shows classification, essentially what you can do once you have a global representation, and this shows what you can do once you've added that global representation and kept processing each point individually. Now we can do things like part segmentation.
This is also a cool experiment they show in the paper that gives some very nice intuition on what the network is learning, and it has to do with the specific type of global operator they chose, namely the max operator. If you remember, the symmetric function at the global pooling stage that they chose to use is a per-feature max operator. So let's say point number one screams the loudest at feature number fifteen; then that feature is what holds after the global pool.

One experiment you can do: start with your model and the original point cloud you provided, say a thousand points. Now, some points never screamed the loudest, so they're just not important for the global operation. Let's remove those; we can easily detect them and remove them. And then they ask: okay, so what are we left with?

Can we understand what the model has learned in terms of some internal representation? If you throw those points out, you get a set of what they refer to as, I think, the critical points, and the critical point set shows you that it at least makes some sense: you get the boundaries of the object, maybe the important parts that hint at what the thing is, because that was useful for the classification task. So that's one experiment; you can also do the opposite.
You can add points, as long as you add points that don't change the global pooling, namely the representation of the model. You can think of it as a way to do some kind of densification of the cloud, but I also like to think of it as giving you the minimal and maximal boundaries of what you're allowed to do with the object you were given without changing its meaning, the way the network thinks of it. I thought that gives a little intuition.

The way I think of this, though maybe there's a deeper difference here, is that in this case the object is your scene and the parts are your objects. And definitely there's a different behavior. For example, in image segmentation, maybe networks learn to do some kind of composition: if you detected people and cars and animals, they repeat at different places in different images, so you can learn to detect them.

Here, if you've seen a wheel of a skateboard, maybe that doesn't mean you're able to use that later on different models. It's not like a pedestrian in one scene, who will also be a pedestrian in a different scene: it's a part, but it's unique to that object. At the same time, there's much more context, because if you've seen a pedestrian in one image, it doesn't necessarily mean there will be more; but if you just detected, I don't know, a skateboard part and one of the wheels, then you know there should be more wheels. So you do have maybe more context when you're focused on a single object. It will not generalize to objects outside the class, but in terms of variability within the class, it generalizes.
A
If
people
here
like
care
specifically
about
parts
of
3d,
object
and
there's
a
recently,
a
very
large
collection
and
I
mean
ping
me
out
I'll
give
you
the
I'll,
give
you
the
links
right
so
like
4d
3d
in
in
time.
Yes,
yeah,
so
III
don't
show
here,
but
there
recently
is
a
is
a
technique
that
tries
to
focus
specifically
on
on
that
I
think
it's
called
a
Minkowski,
sparse,
convolution
Network
yeah
yeah,
following
like
the
other.
A
You
know
course
key
time,
yeah
yeah,
so
you
can
check
that
out
its
forum,
Silvio's
group
in
Stanford
and
they're,
specifically
yeah
I,
think
one
of
the
problem
is
the
datasets.
So
now
you
need
moving
dynamic
scenes
to
work
on
there's
some
recent
releases
I
forget
by
which
company,
but
there's
some
recent
releases
of
that,
but
not
as
many
as
you'd
find
otherwise.
A
A
A
So we could later see which features were very useful. We had to do some kind of segmentation of an indoor scene, and what we saw is that height almost immediately reveals desks: they always tend to be at the same height, so that's a very good hint.

Actually, one of the advantages of working in 3D is that you have the actual absolute sizes, as opposed to images, where depending on how far you are from the object it could scale; here you have the absolute sizes, so you can use that. Yes? Oh, that's a very good point; okay, I'll get back to that.

No, no, I think they were annotated on reconstructed meshes of those scenes, and then for the purpose of this they were sampled again to be point clouds. So if it's reconstructed as meshes and there's plane detection, then there are pretty easy tools out there where you can just kind of paint on top of the objects. It's not a fun job, don't get me wrong; you still have to rotate the surfaces and paint all over them.
It's just, in some sense, too easy. How do you generate a cluttered room with, you know, a bed that has clothes thrown on top of it? You want what they have in the COCO dataset, the image equivalent in 3D. So I think it very quickly becomes too easy to work on synthetic data in 3D for those tasks. But why do you say you want the same number of points? Because of the architecture I showed, right.

So, okay, there are a couple of answers to this, depending on what you care about. If your data came from the same scanner and it's similar scenes, then you can trust that, for example, density is not a property you need to worry about; it's pretty much constant. But sometimes you scan just a few points and sometimes lots of points.
Then you can either normalize everything to a maximal number of points and pad, or you can work on parts of the scene; if they're far enough apart, then maybe losing context doesn't hurt you so much. Or you can find a clever way to encode it. For example, what they did in the Minkowski network, I think, is that the representation is similar to what you have in sparse tensor representations: you just add, for each point, a label of where it is in the batch.

So you can imagine that all your points from all the scenes in the entire batch are just concatenated into one long vector, but you have the label for each point, so you know which scene in the batch it belongs to. That's just a practical solution, and it exists as well.
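A minimal sketch of that batching scheme, in the spirit of sparse-tensor coordinate formats; the exact column layout here is an assumption for illustration.

```python
import numpy as np

# Two scenes with different numbers of points.
scene_a = np.random.rand(1000, 3)
scene_b = np.random.rand(650, 3)

# Concatenate everything into one long array, and keep, per point,
# the index of the batch element (scene) it came from.
coords = np.concatenate([scene_a, scene_b], axis=0)            # (1650, 3)
batch_idx = np.concatenate([np.zeros(len(scene_a), dtype=int),
                            np.ones(len(scene_b), dtype=int)])

# Columns: [batch_index, x, y, z]; no padding to a fixed N is needed.
batched = np.column_stack([batch_idx, coords])
```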
As for different densities, that's a different problem, I'd say, because then you actually need a model that knows how to handle different densities, that will learn that type of thing. If it's RGB-D, you have a single viewpoint of a scene and you have depth per point; then it's still a regular grid in 2D, but with a depth per point, so you probably have similar density no matter how far the object is. But if it's a lidar that scans like this, you'll probably have a denser sampling at the nearby points than at the faraway points, so mixing the two together in one model is going to be very tricky.

Another option is to first have some kind of an encoder and then have your image sample from that latent representation; that's the important bit. And here, let's maybe skip this for one second, because what I want to discuss is: let's say you generated a point cloud. How do you know if it's a good point cloud? You need a way to measure, a way of measuring the loss between the sets. Again, if we were in the case of images, you could just place one image on top of the other and measure their differences, although you probably heard this week that this is also not always the best technique; but at least we have that as something to start with. But what do we do with sets?
We want to do this type of assignment. What is usually done is to use what's called the Chamfer distance, and a simple way to understand it is by visualization. Essentially, for each point you pick its closest point on the target, and you also apply the symmetric operation: for each point on the target, you pick its closest point on the source, and you're essentially trying to minimize both distances.
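Here is a minimal NumPy sketch of the (squared) Chamfer distance just described; real pipelines use batched GPU implementations, but the logic is the same.

```python
import numpy as np

def chamfer(src, tgt):
    """Symmetric Chamfer distance between point sets src (N,3) and tgt (M,3)."""
    # Pairwise squared distances, shape (N, M).
    d2 = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
    # Each source point to its nearest target, and vice versa.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```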
That seems simple enough; it's an expression you can differentiate your entire network through, and it works very well in practice. This is one of the important enablers of all those generative models, because once you've generated the chair, okay, let's see if I can compare it to what I started with, if it's a VAE for instance. There are other cool things you can do once you've generated some kind of a point cloud.

One application that goes more into the regime of entire scenes is object detection, and I'm going to touch upon that RGB question very soon. In this work we had a point cloud as input and we were trying to detect the locations of 3D objects. Let's assume we can all agree that the representation of a detected object is its center and its bounding box; let's not open a discussion about what a better representation would be. Does that make sense?
A
And
essentially
you
know,
one
thing
you
could
do
is
just
go
directly
from
the
points
into
those
things,
but
that
doesn't
seem
to
work
that
well
and
in
fact
like.
Instead
of
the
art
methods
they
currently
were
based
on
either
you
first
detect
the
objects
in
2d,
and
then
you
search
along
3d,
and
that
requires
RGB
or
you
do
everything
in
a
voxel
grid,
and
what
we
found
is
that
one
reason-
and
that
has
to
do
with
you
know
with
scanning
artifacts-
is
that
that
is
hard
for
you
to
do
object.
A
Detection
is
that
for
object,
detection
you
need
to
do
you
need
to
pass
through
a
stage
of
proposal
that
comes
from
a
point
inside
the
object
right.
So
if
you
I
think
yesterday,
you
probably
heard
about
object
detection
in
2d,
you
probably
were
told
that
you
know
steadily.
Art
methods
they
currently
proposed
from
a
pixel
and
a
pixel
belongs
to
the
object
is
something
you
usually
have.
If
you
have
an
object
in
a
scene
that
sorry
the
center
of
the
bounding
box
is
one
of
the
pixels
you're
observing.
A
That's
not
necessarily
the
case
in
in
3d.
Point
clouds
right,
for
example,
look
at
this
look
at
this
scan
of
this
scene
right.
So
in
many
cases
the
center
of
the
bounding
box
of
the
object
is
not
a
point
in
the
scene
right.
Even
if
you
have
all
the
points
of
the
object
the
center
it
doesn't
have
to
be
on.
A
The
is
not
necessarily
a
part
of
the
scan
right
just
floating
in
mid-air,
and
that's
like
kind
of
like
one
of
the
differences
and
and
maybe
when
I
said
earlier,
we're
not
using
voxels
because
they
have
too
many
zeros.
Sometimes
the
zeros
are
useful
right
because
you
want
to
hang
some
information
there,
and
so
how
do
you
deal
with?
How
do
you
cope
with
that?
In the case of point clouds, one option, and the reason I'm bringing this up is that until now I talked about points as evidence of existence: if a point is somewhere in space, that means there is an object there. But that's not necessarily the only choice. What we did here, for example, is we just invented new virtual points, and you can invent any meaning you want for those points.

We took the scene points, a subsample of them, and we allowed each scene point to vote for where it thinks the center of the object is. That created a bunch of new virtual points: they float in space, they have a 3D location, but the information they carry is where each point thought the center of an object is. The reason this is useful is that now you can look at those clusters of points and think: okay, this is a great meeting place for all those points. It's not a point I sampled, I couldn't process it in the beginning, but now I created it, so I have a place in space to hang all that information, and also a meeting place for all those points to discuss with each other and jointly agree on what's probably the more precise location, and on other features, like the bounding box dimensions of the object.
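A rough sketch of that voting step, loosely in the spirit of the work described; the offset network and the clustering comment are stand-ins, assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical per-seed regressor: predicts an offset from a seed point
# to the center of the object it belongs to.
offset_net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

seeds = torch.randn(256, 3)        # subsampled scene points ("seeds")
votes = seeds + offset_net(seeds)  # virtual points: votes for object centers

# Votes from points on the same object should cluster near its center;
# a proposal stage then aggregates each cluster into a box prediction.
```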
Instead of proposing a bounding box directly, each point in the scene tries to find some kind of feature representation of what it thinks the object looks like, and then uses a kind of generative way to represent that object. So it thinks: okay, here's that point on that object, and instead of proposing a box, I'm going to propose a bunch of points, hopefully close enough to where the object is.

This is just from the experience of myself, colleagues, and other researchers I've talked with: there's a huge gap now in 3D in how to use RGB to do proper object detection, say; it's not clear.
A
When
we
try
to
also
add
RGB,
we
couldn't
see
significant
improvement
and
one
reason
to
that
was
that
there's
not
a
whole
lot
of
data
here
and
RGB
sometimes
have
like
a
very
good
clue
on
where
the
object
is
so
you
can
over
fit
very
quick
another.
Another
explanation
is
the
difference
in
resolution.
So
in
2d
images
you
have
like
a
very
dense
resolution,
but
the
3d
point
cloud
is
usually
has
a
much
much
lower
resolution
and
also
we
often
subsample.
A
So
we
have
less
resolution
here
and
then
how
do
you
take
the
high
resolution,
information
from
the
image
and
push
it
into
the
low
resolution?
Point
cloud
is
another
question
writing
to
find
a
good
way
to
do
that.
A
third
explanation
I
think
all
of
them
coexist.
It's
not
like
that.
One
thing
you
know
is
true,
and
another
is
false.
A
Clinician
is
come
comes
from,
say
you
wanna
resolve
the
problem
of
fine-tuning,
so
you
say
of
a
story
of
overfitting
and
you
say:
okay,
let's,
let's
pre
train
the
network
on
something,
then
here's
like
a
very
general
question,
we're
trying
to
ask
what
is
a
meaningful
task
to
train
a
2d
image
on
so
that
the
clue
you'll
have
is
a
geometric.
You
understand
what
I'm
saying
so.
Essentially
there
works
showing,
for
example,
that
if
you
let's
say
you
want
to
train
you're,
gonna,
pre,
train
on
calcification,
say
I,
don't
care!
A
Give
me
two
images
from
imagenet
I'm
gonna
train
for
classification.
Is
that
going
to
be
helpful
to
solve
a
3d
detection
problem?
It's
a
good
question
right.
So
previous
works
show
that
these
networks
for
classifications,
they
care
a
whole
lot
about
a
whole
lot
more
about
texture
than
they
care
about
geometry.
I.
Think
one
example
was:
if
you
take
you
know
an
elephant
and
just
replace
this
texture
with
the
texture
of
a
giraffe.
Then
the
network
will
still
think
it
would
will
think
it's
a
giraffe
hinting
that
it
ignores
the
overall
geometry
of
the
same.
A
So,
in
a
case
where
the
task
you're
actually
interested
in
is
geometric
in
nature,
you
probably
don't
want
to
solve
classification
or
not
in
the
usual
way.
So
I
think
it
opens
up
a
very
interesting
kind
of
area
for
research
of
what
is
a
proper
geometric
task
to
perform
on
to
the
images
so
that
the
features
you
learn
are
transferable
to
other,
like
to
other,
to
3d
or
or
in
general,
are
useful
right,
so
I
think
I
think
the
problem
we
saw
is
that
it
improves
it
too
much
right.
A
So
it
learns
to
kind
of
to
ignore
the
3d
information,
or
you
know
very
quickly-
over
fits
your
train
set,
so
it
doesn't
generalize
to
your
test
set.
But
if
you
have
tons
and
tons
of
data
to
train
on
then
then
yeah,
then
that's
on
the
RGB
is
used,
and
maybe
I
should
have
also
said
that
just
adding
RGB
is
very,
very
simple,
because
until
now
we
talked
about
the
input
to
the
network
is
n
by
3.
But you can think of each point as carrying some RGB information, or some other handcrafted descriptor you can compute from it; then it's just n by 9 or whatever.

No, so it is annotated: the bounding box is given and annotated, so we can definitely supervise for the centroid. Even if it's not a point in the scene, it's the centroid of the bounding box of the object; the tight bounding box of the object is what we're after, plus the dimensions and orientation.

Yeah, but it's true that, for example, in one of those datasets they went the extra mile and gave us the supervision of the amodal box, namely both the seen and unseen parts of the model. So if you have, for example, just the back of a chair here, then the supervision for the box will be the entire chair, and that you can do from the image, for example.

You're not just given that in the other dataset we worked on; there the boxes are tight boxes. That one was produced from a fused scan of indoor scenes, so they walked around and sampled things, and for, say, a refrigerator, you only have the front side of the fridge; you never see the whole depth of the thing, and then you have a truncated box. That's very complicated, because now you have to tell your network what the ground truth is.
Ten minutes left? 20 minutes, 25. Okay. So probably the natural transition from point clouds to meshes would be to say that until now we only cared about the points, and now we also care about how the points are connected. Let's start with the case where you know the connectivity; that's simple: someone told you what the connectivity between the points is. But maybe, to push the understanding one step further:

It's not exactly the same, because, for example, if you want to represent the stage I'm standing on with points, you need a whole bunch of points; but if you have meshes, those triangles, then you can just take a few points and connect them, and that's enough. So you can go from a very dense point cloud to a mesh, that's great, but the most common case would be that you're given a mesh; namely, it's a more efficient representation of what's already there.
That means the shapes need to deform, and this here shows that if you keep using the same filters you learned on one model, but now you've deformed it a little bit, they're not going to be useful anymore. But if you can learn filters that kind of bend along, or deform along with your representation, that's more useful: you can reuse them. So again, a demonstration of what is intrinsic.

So where do we have known connectivity? Actually, in many cases. I work a lot on non-rigid 3D shapes, human shapes; we have very good models for those, we have decent meshes. And also social graphs: some companies have large social graphs and they know how people are connected. So what do we want out of those filters?

We want the same thing we have in 2D: we want convolutional filters, and when I say we want convolutional filters, I actually mean we want to do weight sharing; that's the key property here. How do we share the weights from one location to another location? There's lots of recurrence happening and we want to utilize it.

We want multiple layers, because deeper networks are better, and we again want some property of locality; I'll say something about that. I thought about how to present this topic, because it has some history, and maybe you, as physicists, will appreciate it more.
So these could again be just ones and zeros, depending on whether two points are connected or not, but they could be more meaningful. What we lack here that we do have in images is some kind of ordering: if you have a filter, you can place it on top of your image and you know which weight multiplies which image location. Here you don't have that, so you again resort to some kind of symmetric aggregation operator, like max or sum. So essentially what you have is this type of symmetric operator: a layer that consumes two vertices and an edge, performs the symmetric operation on top of them, and then you can follow with some kind of a nonlinear function.

So what did we say here? Essentially, we said something super simple: we want the output of a conv layer at that point. What we're going to do is place a filter on, say, the one-ring of that vertex; we know the adjacency matrix of the graph, so we can just combine all the features, say by averaging. We combine all the features from nearby points, push them through some kind of a nonlinear computation, and that's our new value for the next stage. And if we have a bunch of those filters, then we can represent things that are a little bit more complicated.
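A minimal sketch of that baseline layer: average the features over each vertex's one-ring (read off the adjacency matrix), combine with the vertex's own feature, and push through a learned nonlinearity. Dense adjacency is an assumption for readability; real implementations use sparse operations.

```python
import torch
import torch.nn as nn

class SimpleGraphConv(nn.Module):
    """Mean-over-neighbors aggregation followed by a shared nonlinear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # x: (V, in_dim) vertex features; adj: (V, V) 0/1 adjacency matrix.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = (adj @ x) / deg   # symmetric: mean over the one-ring
        # Each vertex is combined with its aggregated neighborhood.
        return torch.relu(self.lin(torch.cat([x, neigh], dim=1)))
```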
If you follow recent literature on graph neural networks, basically all methods use that as the baseline and then start complicating things from that point on, and the complications can come in many forms. For example, you can think: maybe I want to attend to some points more than others, so I can think about how I weight an edge between two points.

And this is where people realized, I can just say a few words about it, that it's very hard to define the shift operator on this structure, differently from a patch whose position you can change in a very consistent way. If you have a graph, how do you go from one point to the other?

The graph Laplacian allows you to take the signal and represent it as a bunch of coefficients in a Fourier transform, and then if you also represent your filters as a bunch of coefficients in the Fourier space, you just need to learn those coefficients, multiply them, and then resynthesize the function if you want to go back to the primal. And you do have to go back: the non-linearity is introduced in the primal space, so you have to go back and do the non-linearity there. So, setting aside computational complexity for a second, you have a solution.
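In symbols (a standard formulation, not tied to any one paper): with the graph Laplacian eigendecomposition $L = \Phi \Lambda \Phi^\top$, where the eigenvectors $\Phi$ play the role of the Fourier basis, spectral filtering of a vertex signal $f$ by a filter $g$ is

$$f \ast g = \Phi \, \hat{g}(\Lambda) \, \Phi^\top f, \qquad \hat{g}(\Lambda) = \mathrm{diag}\big(\hat{g}(\lambda_1), \dots, \hat{g}(\lambda_n)\big),$$

where the diagonal entries are the learned filter coefficients: $\Phi^\top f$ is the forward transform, the diagonal multiplication filters in the spectral domain, and applying $\Phi$ resynthesizes back to the primal domain.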
One problem with this was that people saw it's very sensitive to the graph: the graph Laplacian is unique per graph, different from the Fourier transform in 2D, which is the same for all images out there. In a graph it depends on the graph, and the filters that people learned weren't transferable to other graphs. That's the property of locality I was mentioning earlier, and one way they solved it in 3D was to define local neighborhoods on a 3D mesh.

Oh, sorry, or you can do a symmetric operation that also ignores orientation. But if you're lucky and you're on a mesh, then you can actually use properties like the largest-curvature direction to guide you as north. So now you can rotate all patches to have a consistent representation of the patches, and you can go back to learning those features that you like in 2D, but now on a mesh: you can translate them between different positions. And there are different ways to measure the distance between a point and the patch you're extracting or proposing, and that addition was very helpful in practice. This is still a symmetric operator, like the one I showed here, but this one already gives you a principal direction, so you can actually use consistent patch representations to learn more: each filter can hold more information and can distinguish between the hand oriented like this or oriented like that.
So, for example, let's imagine you have a graph representation of a mesh of a person, and then suddenly something like a topological change happens. Say you computed some filters on this shape, but now the hands are touching, so there are edges connecting those points, and you don't know that they're separate. You changed the topology, you changed something locally, but it influenced the entire global structure.

It depends on how you think of it. If you think of it in the primal space, then being local is just being local: a patch that looks no further than one or two hops. If you look in the Fourier space, you think of very smooth functions. If you try to do that, you can do that on images as well, and then there shouldn't be a problem.

The problem is that when you have a graph or a shape, you have to compute a graph Laplacian, and that is different depending on the input; it changes with the input. So you need filters that you can reliably transfer between different graphs.

So if you try to do partial matching, let's say I scan you from the front side and I want to find correspondences between each point on that graph and a model of a human shape; those filters will also be very sensitive to the graph change. So, what was the first part of what you said? Should I think of them as what?
Yes, right. I think I could return the question to you, because you're the one building the graph. Here, in the case where we assume the graph is given, it's up to the user to decide what's an edge, or what's the meaning of an edge. If you decided to take an image and build a graph from that image according to spatial proximity, that's one option; if you chose to build it according to color proximity, that's another option, and these will be different graphs.

You can mix the two and do kind of a bilateral thing, or whatever. And if you're not given the graph... let me give a couple of examples, and then I'll talk about when you're not given the graph and how that's different. These techniques I'm showing have been used mostly in shape matching applications, where you want to find correspondences between each point, say, on this graph and the object after it has been deformed.

This one I just had to show because we had such a cool visualization; we actually had an artist friend who generated those meshes. This was kind of an example of how you do that in an unsupervised fashion, but I don't have to go into that. So partiality, as I started to explain, is another place where you really saw a difference between using the Fourier-style graph neural networks and local filters.
So,
firstly,
I
mean
exactly
the
example.
I
gave
you
like
if
you
have
a
partial
mesh,
so
this
is
given
as
a
mesh,
but
it's
just
produced
from
one
viewpoint,
so
you're
not
saying
later
and
the
colors
here
are
just
the
texture.
So
these
should
match
some
global.
Some
complete
model,
yeah
I,
think
I'll
skip
yeah.
This
is
an
example
showing
one
you
know
concrete
application,
but
maybe
I'll
just
you
know,
spend
a
minute
on
this
slide.
A
This
is
like
one
concrete
example
of
the
usage
of
a
graph
neural
network,
so
just
to
understand
the
setting
of
the
problem.
The
setting
is
that
you're
given
partially
input.
In
this
case,
you
know
we
remove
the
limbs
of
a
person
and
we
wanted
to
complete
them
right
and
there
are
many
plausible
way
in
which
they
could
be
completed.
But
let's
yeah
forget
about
that
for
a
while,
just
just
think
of
a
partial
graph
and
you
wanna,
and
you
want
to
learn
how
to
encode
that
into
some
kind
of
a
representation.
A
So
here
we
use
the
technique
that
actually
uses
this.
The
graph
is
the
connectivity
vertices,
and
this
is
like
very
similar
to
actually,
if
you
think,
of
a
2d
convolution,
but
instead
of
you
know
a
now,
you're
not
allowed
to
place
like
one
filter
on
top
of
the
Imogen
and
ask
the
the
order
to
stay
constant.
Instead,
you
can
think
of
take
take
each
of
those
filters
along
the
its
dimensions
and
just
perform
its
computation.
It's a little bit hand-wavy, but I'm just saying that in terms of computational complexity it's the same as what you'd have with images; if you transpose the dimensions, the computation becomes a little bit different, but you get this thing to look exactly like a graph convolution. This gives you the output at each new vertex.

So again, you take the filters. The important bit, the main difference from images, is that now you can't commit to a location, so you just take each one; think of it like a one-by-one convolution, that's the best analogy, and that one-by-one convolution does its operation on all of your neighboring points, and then you aggregate. This way you're able to build from whatever input you have; you stack those layers together to get a code, and that way you can build an autoencoder, and that was the idea.
Yeah, let me get just to the point here. You asked earlier about what happens if you're not given the edges, and I wanted to talk about that anyway, so I'm using you as a surrogate. Until now, the examples I gave had the assumption that the edges are given; it was up to the user to say how to connect two vertices, either by spatial proximity or by some other features. But this work, which I think is very interesting, again touches 3D but says something a bit different. It basically says: I have a bunch of points, and we already saw that point cloud networks can embed each point with some high-dimensional feature by looking at some aggregation of a local neighborhood. So what if I now invent edges, and I invent them in a way that measures the difference between the features?
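A sketch of that edge construction, in the spirit of dynamic graph approaches: connect each point to its k nearest neighbors in feature space, and rebuild the graph as the features evolve. The value of k is a placeholder.

```python
import torch

def knn_edges(feats, k=8):
    """Connect each of N points to its k nearest neighbors in feature space."""
    d2 = torch.cdist(feats, feats) ** 2                  # (N, N) distances
    # Drop column 0 of topk: each point's nearest neighbor is itself.
    nbr = d2.topk(k + 1, largest=False).indices[:, 1:]   # (N, k)
    src = torch.arange(feats.size(0)).repeat_interleave(k)
    return torch.stack([src, nbr.reshape(-1)])           # (2, N*k) edge list

# The graph can be recomputed after every layer as the features change.
```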
Another option that I haven't discussed, but a very interesting extension that's been explored in recent efforts, is edge features. If you give me edges in the graph, then I can only say that edges either exist or don't exist; but if I want them to hold more information, maybe I also want to update the information they carry. So imagine that you give me a graph, but it has some errors in it.

Say it's some kind of citation network or social graph, and you accidentally connected two people; that edge shouldn't be there. How can you fix that? One option, which we proposed, though recently there are many other techniques, is to work on the line graph, or the dual graph as it's called, where each edge becomes a node and each node becomes an edge, using the same techniques of graph neural networks.

You can imagine each layer as going back and forth between primal and dual: you first use the dual to update the edge features, and then you use the updated edge features back in the primal graph and do another step of graph convolution. I think if we ever tested it on unweighted examples... I think that eventually, even if it's not weighted in the first iteration, it will become weighted in the second iteration, so you have to take that weight into account.
A
But
why
is
that
different?
Then?
All
right,
I,
don't
think
I
have
enough
time
to
kind
of
open
up
a
new
sort
of
representation.
But
let
me
just
so
you'd
know
it
exists
right.
So
we've
seen
box
upgrades,
we've
seen
point
tiles.
We've
seen
meshes
recently
like
in
past
one
year,
roughly,
there's
been
a
lot
of
interest
in
in
this
implicit
surface
representation
and
that
the
finger
I'm
excited
about
this
topic,
especially
because
I
feel
like
it's
very
natural,
to
ask
a
network
to
represent
this
implicitly.
A
A
So
some
nice
properties
that
we
get,
for
example,
the
fact
that
the
that
they'll,
naturally
using
a
stochastic
gradient
descent,
will
find
a
regularized
smooth
decision
boundary,
for
example,
hints
that
if,
if
I
represent,
if
I'm
trying
to
teach
my
network
about
the
existence
of
a
surface
by
giving
it
points
off
the
surface
and
on
and
on
the
surface,
and
let's
say
that
the
input
I
give
it
is
a
little
bit
noisy.
We
can
still
expect
the
network
to
behave
the
same
way
as
you.
If
you
would
feed.
A
Let's
say
2%
error
into
a
classification
effort
right,
take
em
nest,
take
2%
of
your
data,
replace
its
its
label,
the
network,
who
will
learn
to
to
cope
with
that,
so
something
similar
will
happen
here
as
a
recent
papers
show
which
gives
the
network's.
So
this
rigor
rigor
natural
regularization
that
happens
in
networks.
Give
you
this
nice
smooth,
diffusion,
boundary
representation
of
the
surface,
so
I
think
this
is
like
a
very
promising
direction.
A
Actually,
in
order
to
go
from
that
to
having
something
that
you
can
display,
you
have
to
use
outside
kind
of
procedures
like
marching,
cubes
or
stuff
like
that
that
take
occupancy
and
perform
transform
them
back
to
two
meshes,
so
you
can
display
them,
but
it
also
gives
a
good
opportunity
to
like
use
this
universality
approximation
theorem
of
networks
that
they
can
essentially
represent
any
function.
You
want
so
one
example
that
maybe
is
a
good
go-to.
If
you're
interested
in
that
in
that
topic,
is
there
isn't
a
deep
SDF
paper
you
can?
A
Think
of
this
is
just
you
know:
you're
learning,
to
overfit
to
one
specific
shape,
so
the
network
doesn't
see
a
collection
of
shape,
or
one
version
of
that
paper
is
that,
where
the
network
is
just
overfitting
on
a
simple
on
a
single
shape
and
what
you're
learning
is
a
code?
Is
they
call
it
an
o2
decoder
right?
A
So
you
learn
a
code
and
when
you
introduce
an
XYZ
point
that
you
could
think
of
like
having
as
like
a
cube
of
X,
Y,
Z's
you're,
just
telling
the
network
okay,
please
sample
whatever
implicit
representation
of
the
surface.
You
have
please
sample
it
at
location,
X,
Y,
Z.
So
from
the
existing
XYZ
point
you
have
or
the
dice
the
distance
functions
of
them
to
the
surface.
You
can
teach
that
network,
but
later
on,
you
can
potentially
sample
an
infinitely
infinitely
dense
resolution.
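A minimal sketch of a coordinate network in that spirit: an MLP mapping a latent code plus an XYZ query location to a signed distance value. This is a simplification with made-up sizes, not the exact DeepSDF architecture.

```python
import torch
import torch.nn as nn

class CoordSDF(nn.Module):
    """MLP mapping (latent code, xyz) -> signed distance to the surface."""
    def __init__(self, code_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim + 3, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )

    def forward(self, code, xyz):            # code: (D,), xyz: (N, 3)
        tiled = code.expand(xyz.size(0), -1)
        return self.net(torch.cat([tiled, xyz], dim=1)).squeeze(-1)

# After training, query on an arbitrarily dense set of locations and
# extract the zero level set (e.g. with marching cubes) as a mesh.
model = CoordSDF()
z = torch.randn(256)
queries = torch.rand(100_000, 3) * 2 - 1   # dense samples in [-1, 1]^3
sdf_values = model(z, queries)
```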
And if the network has learned something meaningful, then you should get, for free, a high-resolution version of whatever you started with, and it shows very promising results. I think this is a very cool direction to pursue. There are some interpolation results that they show... yeah, I think I'll finish here.