From YouTube: CLA Deep Dive (2013 Fall NuPIC Hackathon)
Okay guys, we're going to start in a few minutes after I get Subutai miked up. Thanks for joining in.
Okay guys, let me introduce Subutai Ahmad, the VP.
Okay, cool! This is quite a technology setup here, all right. Thank you. So Matt asked me to do a deep dive into some aspects of the CLA algorithm, and it's a bit of an experiment; I have no idea how much of this will be interesting to you or not. But my goal here will be to try to give you a little bit deeper understanding of some of the foundational stuff behind the CLA algorithm.
I'm not really going to touch the code too much; it's more about the algorithm principles around the CLA. What I've done is really just prepared a short 15-minute kind of piece, and then I'll open it up to questions, because I know a bunch of you have different questions around this, and already people have been asking me all sorts of things. So we'll go from there.
...happens at that level as well within the CLA. What I thought I'd do is, basically, you know, there's no way we're going to cover it all; there's no way. So what I'm going to do in the beginning is focus mostly on what happens in a single level, and it touches on some of the hierarchical aspects as well, and then I'm going to open it up for questions, and I imagine we'll do a lot of stuff on the white paper, because I'm really not sure exactly what aspects you guys are all interested in.
...the CLA quiz that we give to new employees who want to work on the algorithm. These go really, really detailed, and these are hard; these are not easy. So, for example, you know, if you have a temporal pooler that has only learned these two sequences, A-B-C-D-E and F-G-C-D, now suppose you present the sequence F-G-C-D: what is the temporal pooler predicting at that point in time? What is the exact state? What is the exact representation of the temporal pooler?
It's a property of sparse distributed representations, and it's a property that I think is really key to understanding exactly how temporal predictions work in the temporal pooler. I think it's key to getting a hierarchy to work, and I know a bunch of you are interested in that aspect of it. So I think understanding this piece is going to be really important.
So I think this property will help with that, and I think it'll also help you understand some of the numbers that we have in the CLA and in NuPIC. You know: why do we have 40 out of 2048? Why is the activation threshold set a certain way? Or why is the percentage of connections that are initialized in the spatial pooler set the way it is initially? Why is it that way?
I should say this is a totally optional session; if you're not interested in getting into this level of detail, feel free to hack or continue on your hack, and I won't be upset at all. So I'll talk about that one thing in a little bit of depth, maybe 10-15 minutes, and then I'll just open it up for questions and whiteboard sessions, and I'm happy to address...
...whatever you want. Okay, so quick background: I think most of you know about sparse distributed representations. This is from one of Jeff's talks.
A
So,
for
example,
you
know
with
two
thousand
bits
you
might
have:
two
percent
of
them
acted
and
in
new
pic,
many
of
our
setups
are
set
up
so
that
you
have
2048
bits,
and
you
know,
40
of
them
are
active.
That's
what
comes
out
of
the
spatial
cooler
today
and
then
each
bit
has
some
sort
of
semantic
meaning
to
it.
This is very different from dense representations like ASCII, where you typically have a small number of bits, and all combinations of ones and zeros are possible. They're very dense; eight-bit ASCII is an example, and each bit typically doesn't have a huge amount of meaning. So sparse distributed representations: that is the language of the CLA.
Given two SDRs, you can compare them. Because each bit has some semantic meaning, you can look at how many bits are shared across two different SDR representations; you just count them, and the more bits that are shared, the more similar they are. You can also store SDRs efficiently, because these are sparse.
You just need to store the indices of the active bits, and because that's a very small number, the actual number of bytes you need to store an SDR vector is only dependent on the number of on bits; it's not dependent on the size of the whole vector. We definitely rely on that quite a bit, both algorithmically and from a code-optimization standpoint, in a few places.
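Here is a minimal sketch of both of those points (illustrative only, not actual NuPIC code, and the names are mine): an SDR is kept as just the set of its active indices, 40 integers instead of 2048 bits, and similarity is the count of shared on bits.

```python
# A minimal sketch (illustrative, not actual NuPIC code): an SDR is kept
# as just the set of its active indices, and similarity is the number of
# shared on bits.
import random

def random_sdr(n=2048, w=40):
    """A random SDR: w active indices out of n total bits."""
    return set(random.sample(range(n), w))

def overlap(a, b):
    """Similarity between two SDRs: count of shared active bits."""
    return len(a & b)

a, b = random_sdr(), random_sdr()
print(overlap(a, a))  # 40: identical SDRs share every active bit
print(overlap(a, b))  # close to 0 for two unrelated random SDRs
```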
Another property of SDRs: I didn't talk about the distributed nature of the information, but in an SDR each bit has semantic meaning, the information is actually distributed across the bits that are on, and no one bit is critical to the meaning of the thing. So what you can do is subsample; you don't actually have to store every single on bit.
You can actually subsample and get the gist of what that representation is, and often you can get a very accurate representation of the underlying vector even though you're not storing every single bit.
How would you know which bits, of the ones that are turned on, to subsample? Yeah, so in most of the algorithms that we have you actually don't need to know; you just randomly subsample, and it's an interesting property that as long as you subsample enough of them, you're going to get the gist of the meaning in there.
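A sketch of that random subsampling (again illustrative; the names and the threshold are mine, not NuPIC's): keep a random 10 of the 40 active bits, and still recognize the original pattern by checking how many of the stored bits show up in a candidate SDR.

```python
# Sketch of random subsampling (illustrative; names are hypothetical):
# keep a random 10 of the 40 active bits and still recognize the original
# pattern by checking how many stored bits appear in a candidate SDR.
import random

def subsample(sdr, k=10):
    """Randomly keep k of the active indices."""
    return set(random.sample(sorted(sdr), k))

def matches(stored, candidate, theta=10):
    """Match if at least theta of the stored bits are active in candidate."""
    return len(stored & candidate) >= theta

original = set(random.sample(range(2048), 40))
stored = subsample(original)                  # only 10 of the 40 bits kept
print(matches(stored, original))              # True: still recognized
print(matches(stored, set(random.sample(range(2048), 40))))  # almost surely False
```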
Do those have any closer significance because of their proximity than, say, 256 and 310, or is it only between subsequent patterns that the indexes have any significance? Yeah, so it could be either; that's a good question, and it sort of depends on the structure of the problem. In a problem like vision there's a natural topology, and, mostly for efficiency, we lay out the columns in a topological manner, so neighboring columns will be more similar, or be looking at the same part of the input space, than something that's far away. Now, in NuPIC, in the OPF, we use a version...
So is it a code-efficiency purpose, or is there a neuroscience reason, to subsample from 40 down to 10 active bits to store those? So, at the least, it definitely helps with efficiency: in the temporal pooler you can reduce the number of bits you have to store, because you can subsample.
There is a neuroscience reason, although I don't think it's really that important for us; for us it's really just efficiency. But in the neuroscience, you can only form a synapse, which is what's going on here, a connection between two cells, and you can only form one if the axon and the dendrite of the cells are near each other, sufficiently close that you could grow a synapse between them. So there's a very small number of people in neuroscience who study what are called potential synapses.
These are: how many cells could this cell connect to? And those cells cannot connect to most other cells; they can only connect to some subset. That's the biology; it just can't physically do it, but there are lots of reasons we wouldn't want to do it anyway. Right, so from an anatomical standpoint there are constraints that whatever representation you use has to be robust to subsampling? That's right, and it's also about robustness in terms of cells dying and connections failing.
...take the union of them and then query to see if some other pattern is in that union or not. Okay, so let's say in this case there are 10 different patterns that are just OR'd together, so you have a new representation, which may not be sparse anymore, but it's got a number of on bits. Now you can take another vector and say: okay, is it part of that original 10 or not? You can query that vector, and the way you do that is again using the same technique.
Okay, so what I thought I'd do is walk through one aspect, and I can do that on the whiteboard here. Let's see if we can pull it up; it's close here.
Okay: is this pattern in here or not? And you're only allowed to choose one out of 50, okay. So for this pattern, what is the chance of an overlap here; you know, what is the chance of a false positive? So in this case, let's say this pattern is not that original pattern. What is the chance of a false positive? Because you're picking these bits randomly, and since it's only one out of 50, there's a two percent chance that the bit that corresponds to this pattern is the same as the one that's stored. Exactly; so there's a two percent chance of a false positive, of a mistake being made. Okay, is that pretty clear?
So I actually plotted this. This shows, if you have 50 columns...
...it's like never going to happen, and, you know, you can keep going: if you're at, you know, 10, 12, 15 bits on, the chance of a false positive is going to be minuscule. Okay.
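As a back-of-the-envelope check on that plot (my arithmetic, assuming uniformly random patterns; not the slide's exact numbers), the chance that a random pattern of w on bits out of n exactly matches a stored pattern is 1 over n choose w:

```python
# Back-of-the-envelope check (assumes uniformly random patterns): the
# chance a random pattern of w on bits out of n exactly matches a stored
# pattern is 1 / C(n, w).
from math import comb

n = 50
for w in (1, 2, 4, 10, 12, 15):
    print(w, 1 / comb(n, w))
# w=1 gives 0.02, the two percent case above; by w=10 it is already ~1e-10,
# and with n=2048, w=40 it is astronomically small.
```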
And this is for an exact match. Now, if you don't want an exact match, if you're looking at, say, a 99 percent match or a 90 percent match, these numbers still hold up; there's going to be a higher chance of a false positive, but it still drops off exponentially. Okay. So in order...
...Okay, so this new representation contains both of these patterns OR'd together. Okay, so now I can take a new, third pattern and see: does it correspond to this or not? So the question is: how many of these patterns can you store in this representation while still having a very small chance of a false positive occurring?
A
Okay,
so
I
plotted
that-
and
this
shows,
if
you
have
one
bit
on
at
a
time
two
bits
on
at
a
time
four
bits
on
at
a
time
ten
bits
on
at
a
time.
Oh
I'm,
sorry!
What
did
I
do?
Sorry, this is the number of bits on at a time that you have, and this is the number of patterns you store, and then that's the chance of a false positive, okay. So, as you might expect, if you have a large number of bits that are on, the number of patterns you can store by OR'ing them together without getting false positives is kind of low.
Okay, but if you just have a small number of bits on, then the chance of multiple patterns having the same bit on is lower, so you can store more; but there's a trade-off here. Okay, so the number of patterns you can store at a time by OR'ing things together is pretty low with 50 columns here. Okay, so the answer to this is to just increase...
...the number of columns. Okay, and so what I've shown here is: suppose you have a thousand columns, and again the x-axis is the number of bits on at a time; at that point the number of patterns you can store grows exponentially.
Okay, so there's another exponential here, and that is the number of patterns you can store without having a false positive. And so here, if you have a thousand columns with eight bits on at a time, and you store ten patterns in there, the chance of a false positive on an exact match is again really small, and you can play with these numbers.
It grows exponentially with the number of columns, and the chance of any false positive also grows as you increase the number of on bits, up to a certain point.
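A common approximation captures these curves (my sketch of the math, not the slide's exact figures): after OR'ing M random patterns of w bits out of n, a given bit is on with probability 1 - (1 - w/n)^M, so a fresh random w-bit pattern falls entirely inside the union with probability of roughly that value raised to the power w.

```python
# Approximation of the union false-positive curves (my sketch of the math,
# not the slide's exact numbers): after OR'ing M random patterns of w bits
# out of n, a given bit is on with probability 1 - (1 - w/n)**M, and a
# fresh random w-bit pattern lands entirely inside the union with
# probability roughly that value to the power w.
def union_false_positive(n, w, M):
    p_bit_on = 1 - (1 - w / n) ** M
    return p_bit_on ** w

print(union_false_positive(n=1000, w=8, M=10))  # ~1e-9: tiny, as described
print(union_false_positive(n=50, w=8, M=10))    # ~0.2: 50 columns is too few
```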
Okay, so what we're doing is this: the way we're storing patterns in this one fixed-width vector is by OR'ing multiple patterns together.
And the way you detect whether the pattern is in there or not is: you take a new pattern and you count the number of shared bits. So let's say you're storing nine bits per pattern; if the number of shared bits in this OR'd representation is nine, then you say, okay, this pattern is in my set, in my union that I've stored. Okay. So it will never give a false negative.
When you OR patterns together, you're always going to get nine bits that match, right, because they're in there. But the problem that can happen, a false positive, is that you can get another pattern that is not any of those in there, but now, because you've OR'd those bits together, you have many more bits on, so you could have a false match.
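Putting that union trick into code (an illustrative sketch, not NuPIC itself): OR the stored patterns into one fixed-width set, then declare a match when all of a probe's bits are present. Stored patterns always pass, so the only possible error is exactly the false positive just described.

```python
# The union trick in code (an illustrative sketch, not NuPIC itself).
import random

n, w = 2048, 9   # nine on bits per pattern, as in the example
patterns = [set(random.sample(range(n), w)) for _ in range(10)]
union = set().union(*patterns)          # the OR of all ten patterns

def in_union(probe):
    # All w bits present => report a match. Stored patterns always pass
    # (no false negatives); an unrelated pattern can only pass by chance.
    return len(probe & union) == w

print(all(in_union(p) for p in patterns))         # True, always
print(in_union(set(random.sample(range(n), w))))  # almost surely False
```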
So that's the error that you need to focus on, the false positive, in this case. Okay. So this is what happens if the numbers work out right.
A
Okay,
so
if
you
have
a
large
enough
number
of
columns
and
a
reasonable
number
of
on
bits,
it
turns
out
that
you
can
store
a
reasonable
number.
You
can
order
together
a
reasonable
number
of
patterns
and
actually
still
be
able
to
retrieve
them
or
be
able
to
detect
whether
they're
in
that
set
extremely
reliably.
The
chance
of
a
match
is
very,
very
low.
Okay, and this property of being able to take the union, or the superposition, of lots of patterns, and being able to still reliably answer whether a given pattern is in there or not, is extremely important to the CLA. And the fact that with SDRs you can do it with a fixed representation is really advantageous, because everything after that point doesn't need to know; you don't need dynamically growing structures or anything like that. Everything just works off of that fixed representation. This is a really important, nice property of SDRs. Yeah.
...a number of neurons, and some percentage of them are on: that's an SDR. It also works at the level of segments, so there's a number of synapses that happen to be on, and in how they map to the columns, you know, there's these properties coming up.
In the biology, you've got a set of cells; that's your fixed representation. The cells are either active or not active. We always find, everywhere in the brain, that you have very few cells that are very active; most of them are relatively inactive, so you have this sparse activation. But what Subutai was saying is that you don't get the state of these patterns from some buffer or some linked list or something like that; they have to be all in the same cells, yeah. So the same set of cells are representing all the different things.
Different states of the system: at any point in time that same set of cells can be representing multiple predictions at the same time, yeah. And so that's part of the key to how all that works; it's just this fixed set of resources that are used over and over. That's right. And some of you are really interested in the math behind this and stuff; I would love to sort of develop this part of the theory...
...more. Kanerva has done a bunch of work on this, so I definitely encourage reading his papers and his books. I think he's done some of the initial work here; some of the mathematical foundations he uses are applicable, not everything, but some of it. So let me just get one more question.
You said that in vision, right, the individual SDR bits have meaning with respect to other proximate bits. So in this case, even if we get a false positive in the union, will it have some relevance to the original? Yeah, that's a really good question. So what I, you know, mentioned here is just randomness and looking for an exact match.
What that means is that if you have similar inputs, you're going to have similar outputs, and so of course the chance of a false positive there is totally different from what I went through; but it may not be a bad false positive. You know, you'll have false positives with similar things, and that may not be a bad thing.
If you have your numbers right, you can store completely separate patterns in the same vector with minimal chance of false positives, okay. And there are basically two variables that are relevant here: the number of bits that are on, and the number of columns, or the dimensionality of the vector. Those are the two numbers that you play with, and together that's the sparsity level.
You could have very low sparsity, like point zero one percent, extremely low, in which case you could potentially store a lot of patterns, but...
...sparsities in that range seem to have really nice properties on this point. So in NuPIC we've chosen 40 out of 2048; that gives us about two percent sparsity. And one simple coding exercise is to just try to figure out how many random patterns you can store.
Or you could try to figure it out analytically, which would be nice; but it's easy to code this up. You just create random patterns, OR them together, create another one, check whether you've got a false positive or not, and just repeat, and see if you can come up with plots. That's something you can try doing if you don't have anything else to do. So this is just a simple coding exercise you can do. Okay, any more questions on that?
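The exercise he describes might be coded up like this (my sketch of it, with hypothetical names): OR M random patterns into a union, then probe with fresh random patterns and measure how often an exact match falsely fires.

```python
# One way to code up the exercise just described (a sketch): OR M random
# patterns together, then probe with fresh random patterns and measure
# the observed false-positive rate.
import random

def false_positive_rate(n=2048, w=40, M=50, trials=10000):
    union = set()
    for _ in range(M):
        union |= set(random.sample(range(n), w))
    hits = sum(
        1 for _ in range(trials)
        if len(set(random.sample(range(n), w)) & union) == w
    )
    return hits / trials

# Sweep the number of stored patterns and watch for the failure point.
for M in (10, 50, 100, 500):
    print(M, false_positive_rate(M=M))
```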
...wouldn't work. And you have to, again, keep the number of on bits and the number of columns and all that in a reasonable space, so that you can actually do it.
So if you go back to this exercise here: this will actually tell you how many random patterns the temporal pooler might be able to predict simultaneously without the chance of a false positive. So this is not just some abstract exercise.
And let me try to explain how that might happen, and you might have a lot of questions about that. Let me see... so, you know, why do I have all these... okay. So in a hierarchy you have a level that's feeding into another level, and the output of this level is the set of cells that are active at any point in time. So this is again the state of the temporal pooler, and that's fed into the next level.
...the images move, or the face rotates, or lighting changes, or whatever, you know, the cell is going to stay on. So the cell's activity is going to be slower when you're at the higher levels of the hierarchy.
Okay, so this slowness is a property of the hierarchy. Now, it's extremely easy to get slowness if that's all you want; you can just keep cells on. You know, it's very easy to get slowness. But what you want is things that are slower but also discriminative of the input: that face-detector cell is not going to detect chairs; it's only going to be on for faces.
And so again, I think the superposition, the ability to take a union of different inputs and maintain the discriminability at the same time, is an absolute necessity if you want a hierarchical representation that works well. So again, with SDRs, if you have the numbers right, you can get this property, and this is absolutely key. All right.
Okay, so the output of a level is actually a combination of those two: it's all the different things that might happen in the next step, as well as all the things that might happen multiple steps into the future, okay. So this whole spatial-temporal concept is represented all in one SDR, and I don't know of any other representation that can do that.
He's just saying that you might want to clarify a little bit about how we have the implementation. Yeah, I sort of glossed over that here, but in the CLA, if you look at the white paper, there's a mechanism in the neurons by which you can do this; it's sometimes called pooling.
You know, a cell that is predicting will become active not just to predict the current input, but will also look one step back at what happened at the previous time step, or two steps back, and try to become active when that comes on. If you do that recursively, you get cells that are predictive further and further steps into the future.
Yeah, I think, just to clarify his comment: the term temporal pooler came from this property; that's why we call it that. It's pooling over time; basically, the cells are staying on over time. But I think what Ian was trying to say is that in the current implementation, if you just install NuPIC and run with it, this property is not enabled right from the start, and...
...it's not well explored by us; we've tried it a little bit, and I can talk about what we've done. Yeah, there's definitely a lot of room for research and exploration here in creating hierarchies. But okay. So that's sort of the first part of the talk; I just wanted to talk about one concept in depth, because I think it's really cool, and we haven't really talked about it much, and it's sort of critical to SDRs and getting the whole thing working. So hopefully that's been helpful, and I'm open to any questions.
...to lines that are moving in directions, like across the screen; they're sensitive to, like, a line moving from left to right in a certain part of the visual field, and the cell will stay active. That's an instantiation of this mechanism, we believe: a cell has learned that, hey, I'm representing a line, and I can predict that the line's going to get to me, so I'm going to stay active throughout that entire sequence until it becomes unpredictable; and it can become unpredictable for various reasons.
Well, let's put it this way: the CLA today is an inference engine. We can play back sequences, but as it's implemented, it's...
...really just inferring sequences. And if you want to play them back, like to make a motor behavior, like my speech right now, where I'm playing back a sequence stored in this fashion: to do that I have to have very specific timing. I have to turn on my neurons and my muscles at precise times, with certain delays between the different activations, and we don't have any mechanism in here at all for that. That's what I was referring to earlier; there's no specific timing.
The CLA can't recognize a melody based on its rhythm today; it can only recognize it based on its sequence of notes. And although the team that did this in the last hackathon made it look really cool, it was a clever cheat; that mechanism is not in there. I have speculations about how specific timing works, but...
...I've forgotten a lot since we wrote the white paper. Well, I'm around too, yeah, yeah.
So I was trying to ask Jeff about that before, and he actually pointed me to you, because you know a lot about the CLA. And so the question is: is there a way to hybridize, you know, pattern learning and pattern recognition with unsupervised classification? So basically, is there a way to make the CLA learn...
...you know, classes of patterns in an automated way, yeah? Or is there a good way to cluster the CLA representation somehow, and maybe make the next level of hierarchy based on the names of the patterns, basically, rather than on the patterns themselves? Okay, so there are a few different...
...ones. One is, you know, you could be asking about invariance, and whether you can use the CLA to learn invariances. So this goes back to the edge example and the face: you want the CLA to have a very similar output regardless of whether the edges are shifted, or whether the faces are shifted or not. So that's one type of clustering.
If you will. Then there's kind of the more traditional machine-learning type of clustering, and Jeff's nodding, so is that what your question is about? Yeah. Okay, so that wouldn't apply to... it's not something you would feed to the next level of the hierarchy; it's just, given the CLA, you want to apply clustering to it, and that's very doable. So with clustering...
...the main thing is that you want an input to the clustering where there's a similarity metric, right? So you want: if two patterns are similar, the representations should be similar according to some metric. And as long as you have that, you can apply any traditional clustering to it. So you can actually take the output of the spatial pooler or the temporal pooler...
...they both satisfy that property, and then run clustering on that. And I've actually done that in the past; I did that with the energy data, for example. I ran energy streams through the CLA, looked at the columns that were activated, and then I just ran a traditional k-means clustering or something on that. It was a very high-dimensional space, but you can still run it.
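That workflow might look roughly like this (an assumed reconstruction, not the original code): each timestep's active columns become one binary row, and ordinary k-means runs on the resulting high-dimensional matrix.

```python
# Roughly what that experiment looks like (an assumed reconstruction, not
# the original code): each timestep's active columns become one binary
# row, and ordinary k-means runs on the high-dimensional result.
import numpy as np
from sklearn.cluster import KMeans

n_columns = 2048
rng = np.random.default_rng(0)
# Stand-in data; in practice each row would be the spatial or temporal
# pooler's active columns at one timestep.
active = [rng.choice(n_columns, size=40, replace=False) for _ in range(500)]

X = np.zeros((len(active), n_columns))
for t, cols in enumerate(active):
    X[t, cols] = 1.0

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(labels[:20])
```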
So there were some buildings that... these are gyms; that's where one of the hot gym examples comes from. There were some gyms that had swimming pools, and they had a much higher energy usage, or a very different pattern, and there were some gyms that didn't have swimming pools, and there were some other characteristics, I forget; but...
...with the same stream of data and two different discrete states, you know, like day and night: could you somehow, by clustering, infer that there are two different patterns and that there is a switch occurring between them, with, you know, a certain predictable or unpredictable period? You could do that, so...
...you'd need some mechanism to know when one ended and the next began, or you could...
...you know, you get one vector output per timestamp, and then feed the whole thing into a clustering system and just see what clusters come up, and see whether they're clustered similarly. So if there's any similarity in the patterns: if you do it at the level of the spatial pooler, you'll get kind of instantaneous or static similarity, but if you do it from the output of the temporal pooler, you might actually pick up similarities in sequences, which would be very interesting, yeah.
So we have a question from the IRC channel. Rick asks: you talked about how many SDRs you can store when you have 40 bits on out of 2048, and he understands the probability of false positives goes up with more SDRs, with more SDR bits. However, the question is: you assume that the false-positive probability remains below some threshold, and what is that threshold? So how do you decide what is a good threshold, and where do you come up with that number?
So the question is: what is the right threshold for error? I believe so. I imagine that's somewhat application dependent, and...
So one of the things is that errors in the CLA are kind of funny, because there are never any hard errors. As you overfill the system, in many different ways, whether you're overtraining or you're taking the union of too many patterns, what will happen is it'll start overgeneralizing. So what...
How bad is that? Well, it may not be bad at all; as Subutai said earlier, it may be actually what you want. What you've basically lost is the discrimination, the ability to discriminate between two things that are subtly different. You know, so things start blurring together: after a while you start saying, that's just like a bunch of other stuff I've seen before and I really can't tell it apart, and I'll start generalizing, saying, well, it's just like these other things. So it's not usually a hard error.
...I don't know, cabbage, and that has a cluster here and some other cluster here. Now, if you see cat again and it's followed by broccoli, in the SDR representation they're going to be very similar, so, you know, you might have a slightly bigger cluster here and maybe a smaller cluster here. And so once you train the system with these sequences, what the temporal...
...pooler will start to do is predict the superposition of these patterns, so the next time it sees something that looks like them, even if slightly different, you're going to get something that's reasonable. Yeah.
I was wondering if you had played with having sort of multiple, not columns, but sets of columns, wired to the same input with different sparsities? Like maybe it'd be useful to have more generalization or less generalization on the same input, and then kind of wire those together. It makes sense that you're sort of overgeneralizing by having a higher density and sort of undergeneralizing by having a low density, so making predictions with both things at the same time might be useful. Yeah, yeah.
And so, you know, you might see something very similar here: there might be some bits that are cat and tiger and so on, but they're not as relevant to making this prediction. Those bits will eventually die out, and you'll get higher density, naturally, within the areas that actually correspond to the signal in that sequence.
This is sort of a more advanced topic, but if you're following, okay. So, you know, that is a great topic to talk about, the density of unions and so on, and we talked about how you can fail if you have too many active bits and so on. But in the brain there's something going on that we do not do today, and this could be a very interesting thing to work on. I'm not sure you could get it...
...done in a day's hack; you might be able to, but it's certainly something someone could work on. What underlies all the predictions is that each of the cells' dendrites has these synapses on them, and there's a threshold: if the number of active synapses is over a certain threshold, let's say 10, then that dendrite becomes active and that cell goes into a predictive state.
And if the system wasn't predicting enough, you could lower the threshold and get more predictions. So if you were looking at a sequence and all of a sudden there really wasn't any good prediction coming out of it, maybe nothing, and the system says, look, this is bad, you might say: well, I want to force it to make a prediction, just do anything. You would lower the threshold until you got some predictions, and then you could go with that. I believe this is what happens.
I
believe
this
is
what
happens
when
I
made
this.
E
I
made
this
comment
in
talks
that
if
someone
says
hey,
do
you
see
that
you
know
animal
in
the
cloud?
Well,
there's
no
animal
in
the
cloud,
it
doesn't
look
like
the
dog,
but
if
you
lower
the
threshold
and
it'll
start
trying
to
make
predictions
and
eventually
you'll
see
the
dog
pop
out,
so
that's
an
interesting
thing
we
haven't
done
probably
need
to
do
some
time.
It's
related
to
all
the
stuff
that
I
was
talking
about.
That's
pretty
tricky
research
topic,
but
it's
really
might
not
be.
Maybe
you'd
implement
it
pretty
quickly.
Yeah, the inhibitory cells usually spread over some regional area, and so you would be looking at the cumulative prediction of some area. So if there's too many cells in a predictive state, or, you know, active, if you will, it would tune them down; if there's too few, it could tune them up, type of thing, yeah. That makes it easier to do from a code standpoint, because we have a single threshold; you can actually change that for the entire region.
And you could change that on an iteration-by-iteration basis, and you could play around with this.
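A toy version of that "lower the threshold until something predicts" idea (a hypothetical sketch; NuPIC exposes no such knob by default, and the names here are mine): count active synapses per dendrite segment and relax the activation threshold step by step until some cell predicts.

```python
# Toy version of "lower the threshold until something predicts"
# (hypothetical sketch; not an existing NuPIC API).
def predicted_cells(segments, active, theta):
    """segments maps a cell to the set of presynaptic cells its dendrite
    segment samples; a cell predicts when >= theta of them are active."""
    return {cell for cell, syns in segments.items() if len(syns & active) >= theta}

def force_predictions(segments, active, theta=10, theta_min=4):
    """Relax the threshold one step at a time until a prediction appears."""
    while theta >= theta_min:
        cells = predicted_cells(segments, active, theta)
        if cells:
            return cells, theta
        theta -= 1
    return set(), theta_min

segs = {"cellA": {1, 2, 3, 4, 5}, "cellB": {6, 7, 8, 9, 10, 11}}
print(force_predictions(segs, active={1, 2, 3, 6, 7, 8, 9}, theta=5))
# nothing clears theta=5; at theta=4 cellB predicts -> ({'cellB'}, 4)
```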
And another thing, which I thought I'd just throw out here: you danced around it a bit in your talk, but just to state it explicitly, someone asked earlier about the topology. One of the nice things about the CLA, and these properties we're talking about, is that they can all be implemented with local rules. So if you had a very, very large region, like a million columns or something, which is not very large for a brain, it's pretty small actually...
So you've got a million columns; you don't want to have all the columns talking together. You'd use topology, and you get pretty much almost all the exact same properties if you just make local connections, local inhibition, and local rules. And brains have to do this, of course, because cells can't connect over very long distances; they have to connect...
You know, I think... so what should that threshold be? It can be dynamic, and we have it set to some static thing; I forget what number we have it set to, like 12 or 15 or something like that. Actually, I think we have a range that we swarm over, typically, but that range has to be chosen so that...
...you know, otherwise you're going to have too many false positives, because basically what that threshold says is: if you have a pattern, should I predict this particular pattern next...
...or not. And if the threshold is too low, then you're going to make a lot of false predictions; and if it's too high, then, you know, you might be too sensitive to noise, or you might just slow down; it might be unnecessarily high. So the number that you set that segment threshold to has to be chosen with this property in mind. Same thing with our encoder bits: we set a reasonable number of on bits per field.
It's all because of this. So is the size of a region, in terms of number of columns, identical to the bit length of the vector, or not? So the output of the spatial pooler is, you know, 2048 columns, and some number, 40, will be on, typically, in our settings; so that would be one of the vectors. And then for the output of the temporal pooler...
...you have multiple cells per column, so the size of that vector, the output that's actually fed up to the next level, would be the number of columns multiplied by the number of cells per column, right? So it's a pretty big vector. But say you had a region of a million columns:
Would you still talk about a 2048-bit vector, or does it become a million? No, there would be a million.
If you have a region with a million columns, then it's a million; and then if you have 10 cells per column, the output from the temporal pooler will be 10 million, okay. So if you talk about a region with some number of columns, that automatically tells you the size of the vector, and then the sparsity tells you how many ones there are. Now, you probably don't want to have such a huge region; it might be just, you know...
And another question: so the collapsed 10 OR'd vectors that become one, is that then...
...it has some that it predicts right and some that are predicted wrong, yeah. So the output of the temporal pooler will be the prediction of all the different patterns that could happen next. So if you have a sequence, you know, A...
...the patterns of B and C, are those then those 10 bits that were 40, but you sort of... yeah, okay. Actually, let me... okay, this is a little more involved. So you have 2048 columns...
...are going to represent the temporal context for that sequence, right. So this is... I didn't explain that properly at all. This actually takes a little while to explain, and if people want, we can walk through the operation of the temporal pooler and exactly what this means; that might be better done in the smaller session, yeah, and I'm happy to do that: exactly how we construct the cells that come on, and how we predict high-order sequences, and all of that. Great, thanks.
...you can distinguish between repeated elements in these different sequences: at any point in time, you have to be able to tell me what the input is, and you also have to be able to represent it uniquely. So there's always going to be a place in the brain where you have the same pattern coming in but a unique pattern coming out, and I need to be able to go backwards too.
...a common input and a unique representation, which is what we're doing in the columns and the cells: the columns are the common input; the cells are the unique representation. And this idea that you go back and forth is an absolute requirement for any system like this, and it was...
It wasn't obvious how to do this at first, but in the end it turned out pretty nicely, and that gives us some confidence that this is probably what's actually going on in the brain. It does take several repetitions to really grok it; it takes a little while. And it's kind of this question here that we often go through in the quiz: you learn these sequences, now you present, you know, part of a sequence...
...what happens next? What is predicted, and how is that represented in different situations? So, yep, the output of the temporal pooler is all the active cells, right, the ones that got activated because they were in a predictive state or bursting, OR'd with all the cells in a predictive state, right? That's the first... I don't know about the predictive state.
They also predict. Could you repeat that last sentence about how they... The way you determine which cells are predicted is by looking at which cells are currently active, so actually it's redundant; you don't need both. Okay, yeah. Just to explain how this works: okay, when you...
...these cells are predictive now as a result of the activity at t equals zero, okay. So the next step that happens is you get input, and currently they don't do it, but that's what I did yesterday: the input is added to the predictive state, right, because there's a voltage coming in from the prediction and there's a voltage coming in from the input, so the ones that have the biggest total fire, okay. So it's either going to be A or B, and 70 percent of the time it's going to be B.
This is a great question, and this is a problematic part of the theory, because although we can make it work well, I can't get it to match the neuroscience exactly, and I think it's more than we should do as a group; I think we should just pull this offline.
If you're really interested in this piece of things, it gets pretty detailed. But the way we think about it now is that all the cells that are firing, those are the cells that are currently active plus the ones that are in a temporal-pooling state; that's the output to the next level. What Fergal was talking about is that the predicted cells are just depolarized.
No one knows that except the cells themselves, and that's the prediction for the next step in time. All this actually seems like it should work well, but as I said, there are some problems; it doesn't quite match the neurophysiology, and so I'm not happy with it, and I'm not willing to say, yes, this is the answer. There's something weird going on that I don't understand. I'm happy to talk about it, but maybe not in this big group, yeah.
The input is connected to the columns, yeah. And are those connections fixed, or are they learned? This is in the spatial pooler, so yeah: each column in the spatial pooler connects to some percentage of the inputs below, so that's another random sampling, and each connection there is represented by a permanence, and that is learned. The impact it has on inference is: either it's above a threshold, in which case it's connected, or it's not connected. But that is learned.
Part of that is learned, yeah. But is it... so is it for the cells that are within columns that you have a permanence, or also between...? Okay, so both. In the spatial pooler it's the columns that connect to the inputs, and so their permanences are there; and then in the temporal pooler you have multiple cells per column, and they have connections laterally to other cells within the temporal pooler. So those are also learned, and we can talk about the biological mapping for that.
Sorry, do you have an answer for this? I'll go ahead. No, no, no, no, you usually give better answers. But do you... I don't know if you had an answer for this; why don't you give the neuroscience answer and I can give... okay. I would argue that you actually don't have a very good probability of the things you're going to predict next. The example I use all the time is that you're constantly predicting what you're going to hear, what words you're going to hear, and so on.
So if you listen to someone speak, your brain is constantly predicting. We know that because if they say something really odd, then you know that was wrong. But there are many things you could hear at any moment in time, and I think it's actually challenging to say: oh, I have a probability of knowing what is the most likely word, and what's the next most likely word, and so on.
In fact, I think most of the time you're not conscious at all about what you're predicting; you're not even aware of it. It's only if I sit and say, now give me a prediction, and you do something to do that...
...then that's when you might get a probability. But the kind of predictions we're talking about here are ones where there's actually no activity in the cells that's visible external to the cell, and so you really don't know; there's normally no conscious awareness of what you're going to see or hear or feel. It just happens, and you just know it's right. But I don't deny that you can say, well, give me a likelihood of what word follows...
...you know, this. But I'm not sure we're trying to capture that here; I think it's beyond what we're doing here. Yeah, and I think the word "probability" has a very loaded meaning too; it's a very specific mathematical concept. But there are also some other ways you can get...
...kind of a likelihood from the SDR. So if you think about the SDR representation again: if you see animal followed by vegetable over and over again, you're going to see the vegetable areas really well predicted; but a few times you see, I don't know, steak, you know, different types of meat, but you see it very rarely.
That part of the SDR is going to be fairly sparsely sampled, and so you have some notion that vegetable is much more likely to happen than that. But, you know, if a particular type of steak happened over and over again, you wouldn't really be able to distinguish that versus just, you know, meat in general happening. You don't get the exact probability, but you can kind of get a rough sense that, yeah, vegetable is more likely.
I think that's good, and I think I could tell you a mechanism by which we could get the CLA to do what you want to do, again using thresholds on the dendrites. But from a practical standpoint: in NuPIC there's a classifier on top of the CLA, and that classifier tries to estimate the actual probability of the next steps happening, so it will actually give you a probability distribution across the values that it expects to happen one or n steps into the future.
Yeah, we're going to take a break. I want to take a quick poll real quick for the next session, which is mapping machine learning and artificial intelligence terminology to the CLA, and vice versa: who is planning on attending that? Okay, a lot, okay! So we're going to do it here, just making sure. Do you have slides at all? Okay, so we have a wiki page, so I'm gonna set that up, and we're just going to basically record the audience and, I don't know, pass the mics around a...