Numenta HTM School, 24 Jun 2016

From YouTube: Datetime Encoding (Episode 6)

Description

Now it's time to investigate datetime encoding, and explore how different semantic information from the same data point can be encoded into one output SDR.

Encoding Data for HTM Systems: http://arxiv.org/abs/1602.05925
HTM Forum: https://discourse.numenta.org/categories

Intro music: "Books" by Minden: https://minden.bandcamp.com/track/books-2

A

What's that? Bring it up here. Let's see what was so important it couldn't wait until after class. "Do you love HTM School?" Yes? Well, you're in luck, because it's time for [unclear].

A

Hello and welcome to HTM School. I am Matt Taylor from Numenta. Today I'm going to talk again about encoders, this time about the date-time encoder. But first, let me talk about a few principles of encoding that I didn't touch upon in the last episode. Encoders are like the outermost layer of an HTM system: they translate real-world data into SDRs so that it can be processed by HTM systems.

A

It's very important that a given input value be converted in a way that captures the important semantic characteristics of that data inside the SDR that's being created. There are generally four principles that we always want to pay attention to when we're talking about encoding data. The first one is that semantically similar data must have a high number of overlapping bits with other semantically similar data. Remember, we talked a lot about overlap scores in previous episodes.

A

Well, values that are semantically similar, like the numbers 462 and 463, might have a lot of overlapping bits in their representations. Number two: the same input should always create the same output, so the encoding algorithm used inside your encoder should be deterministic. Number three: the output should have the same dimensionality all the time. If you create an encoder and it's encoding data, then no matter what input it's getting, it should always produce arrays of bits with the same length. That length is never going to change over time. And number four:

A

All the SDRs an encoder creates should have a similar sparsity across the entire input space. It's not like you would get a much denser SDR in one area of the input and a much sparser one in another; it should have a similar sparsity throughout the entire input range. It also needs to have enough one bits, enough density, so that it can handle noise and subsampling, which we talked about in previous episodes on SDRs. For more details about encoders, there's a great paper by Numenta engineer Scott Purdy.
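The four principles can be checked against a toy encoder. The sketch below is not the encoder used in the episode: `encode_scalar`, its parameters, and the value range are all assumptions chosen for illustration. It places a contiguous run of `w` on bits inside an `n`-bit array and then tests each principle in turn.

```python
def encode_scalar(value, min_val=0, max_val=1000, n=1021, w=21):
    """Toy scalar encoder (hypothetical): n bits with a contiguous run of w on bits.

    n = 1021 is chosen so that each integer in 0..1000 gets its own start index.
    """
    value = min(max(value, min_val), max_val)  # clip to the supported range
    # Map the value to the index of the first on bit.
    start = round((value - min_val) / (max_val - min_val) * (n - w))
    return [1 if start <= i < start + w else 0 for i in range(n)]


def overlap(a, b):
    """Number of positions where both encodings have an on bit."""
    return sum(x & y for x, y in zip(a, b))


a, b, far = encode_scalar(462), encode_scalar(463), encode_scalar(900)

# 1. Semantically similar inputs share many on bits; distant inputs share few.
print("overlap 462/463:", overlap(a, b), "overlap 462/900:", overlap(a, far))
# 2. Deterministic: the same input always gives the same output.
print("deterministic:", encode_scalar(462) == a)
# 3. Fixed dimensionality: every output has the same length.
print("same length:", len(a) == len(b) == len(far))
# 4. Similar sparsity everywhere: every output has the same number of on bits.
print("on bits:", sum(a), sum(b), sum(far))
```

A contiguous run of bits is only one way to satisfy the principles; it is used here because it makes the overlap easy to reason about.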

A

It's called "Encoding Data for HTM Systems". It's available in the video description down here, and I highly recommend you read it. If you want to know lots more detail about encoders and the different types of encoders, please read that paper. We're now going to talk about another encoder, and this is going to be the last one we talk about: the date encoder, or date/time encoder. We're going to take a timestamp, or just a point in time, and try to semantically encode details about that time in an SDR.

A

So let's take a look, and I'll show you what we have here. Here is our visualization. The date that we are encoding is defined here; it's a little bit small, but it's June 8th, 2016 at 2:44 p.m. That's just the one we're starting out with. This entire encoding here is what the date encoder is producing. How it's doing that is by looking at the different semantic information it wants to extract from that date and encoding each piece individually.

A

For example, it's encoding the day of week in this representation, whether it is a weekend or not in this representation, the time of day in this representation, and finally the time of year, or the season, in this representation. This is something that any one of us could do. We could say: well, June, that's some time in the middle of the year. That's why this season bucket right here is in the middle of the representation. And 2:44 for time of day: that's somewhere in the middle of the day.

A

It is currently not a weekend. This one generally has two ways to be represented: a weekend or not a weekend. And the day of week: it is a Wednesday, so we're right in the middle of the week.
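The extraction step described above can be sketched with Python's standard `datetime` module. The function name `date_semantics` and the field names are hypothetical, and treating only Saturday and Sunday as the weekend is an assumption; the encoder in the demo may define the weekend differently.

```python
from datetime import datetime


def date_semantics(ts):
    """Pull four semantic values out of one timestamp (illustrative sketch)."""
    return {
        "day_of_week": ts.weekday(),                # 0 = Monday ... 6 = Sunday
        "is_weekend": ts.weekday() >= 5,            # assumption: Saturday or Sunday
        "time_of_day": ts.hour + ts.minute / 60.0,  # hours since midnight
        "day_of_year": ts.timetuple().tm_yday,      # 1..366, stands in for the season
    }


# The starting date from the visualization: June 8th, 2016 at 2:44 p.m.
semantics = date_semantics(datetime(2016, 6, 8, 14, 44))
for name, value in semantics.items():
    print(name, value)
```

Each of these values can then be handed to its own small encoder, which is what the rest of the episode walks through.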

A

Okay, let me get into this date and show you what it looks like when we start changing things around. I'm going to jump ahead one week. Look at all the bits on the screen here and watch what happens when I jump ahead one week, from the 8th to the 15th. I'm just going back and forth here.

A

You can see that the only thing that's changing is the season encoding. The day of week hasn't changed, because I'm still on Wednesday; I jumped ahead one week. The weekend encoding hasn't changed, because it's still not a weekend, and I haven't changed the time. So nothing is changing except for that season. And as you can see, as I go into the future, August, September, October, we're getting further along in the year, until we wrap around once we get towards November, December, January. That season SDR representation is periodic.

A

Like we talked about in the previous episode on scalar encoding, it's a periodic encoder, so it just wraps back around: January is semantically similar to December because they follow each other. You can also see that season encoding wrap around here in the output encoding as we move it a lot, right? So if you haven't figured this out by now, this encoding down here at the bottom, this grand encoding, which is the complete output of the encoder, is simply the combination of all of these different encodings.
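The wrap-around behavior of the season encoding can be sketched as follows. This is a minimal illustration, not the demo's implementation: `encode_periodic`, the 366-bit array, and the width of 21 are assumptions. The point is only that late December and early January share on bits, while June shares none with either.

```python
def encode_periodic(value, period, n, w):
    """Toy periodic encoder (hypothetical): w contiguous on bits that wrap around."""
    start = int((value % period) * n // period)  # index of the first on bit
    bits = [0] * n
    for i in range(w):
        bits[(start + i) % n] = 1  # indices past the end wrap back to the start
    return bits


def overlap(a, b):
    """Number of positions where both encodings have an on bit."""
    return sum(x & y for x, y in zip(a, b))


# Day of year stands in for the season; a period of 366 days is assumed.
late_december = encode_periodic(360, period=366, n=366, w=21)
early_january = encode_periodic(5, period=366, n=366, w=21)
june = encode_periodic(160, period=366, n=366, w=21)

print("December/January overlap:", overlap(late_december, early_january))
print("December/June overlap:", overlap(late_december, june))
```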

A

Now, we're not doing an AND or an OR on them; it's just a concatenation. You can see the season encoding somewhere right around in here; it's the first one that's being encoded, and I think the time of day encoding might be next. So this is the time of day encoding, this is the day of week, I believe this is the weekend, and this is the last one, whichever it was. As we start moving these things around, I'm going to move ahead in the week now, so I'm changing the day of week.

A

You can see the day of week change in the day of week encoding individually. You can also see the day of week blob in the entire encoding change, because these arrays are just concatenated together. It's a multiple encoding, with a bunch of different semantic values, or representations in different SDRs, just being concatenated together, and this will just be handed to the spatial pooler. Its job will be to decide how to normalize this and represent the actual semantics that exist in this SDR coming from the encoder.
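Concatenation, as opposed to a bitwise AND or OR, can be sketched like this. The sub-encoding sizes and bit positions are made up for illustration, and the field order (season, time of day, day of week, weekend) follows the order guessed at in the demo rather than a confirmed layout. Because each field owns its own slice of the output, changing the day of week only changes bits inside that slice.

```python
def run_of_bits(n, start, w):
    """Helper (hypothetical): n-bit list with w on bits starting at index start."""
    return [1 if (i - start) % n < w else 0 for i in range(n)]


def concatenate(*encodings):
    """Join sub-encodings end to end; no AND or OR between them."""
    out = []
    for enc in encodings:
        out.extend(enc)
    return out


# Made-up sub-encodings, one per semantic field.
season = run_of_bits(30, 12, 5)
time_of_day = run_of_bits(24, 14, 5)
day_of_week = run_of_bits(14, 4, 2)
weekend = run_of_bits(4, 0, 2)

full = concatenate(season, time_of_day, day_of_week, weekend)
print("total length:", len(full), "on bits:", sum(full))

# Move ahead one day: only the day-of-week slice of the output changes.
next_day = run_of_bits(14, 6, 2)
full_next = concatenate(season, time_of_day, next_day, weekend)
changed = [i for i, (x, y) in enumerate(zip(full, full_next)) if x != y]
print("changed indices:", changed)
```

The day-of-week slice starts at offset 30 + 24 = 54 in this layout, so every changed index falls between 54 and 67.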

A

One of the important things I want to talk about here... and I didn't do time of day. We can see time of day changing if I go into the time and keep moving the hour along, but it operates the same as the others. So you might be thinking: why do I need so many bits to represent the day of the week? Why don't I just have seven bits, with one of them on for each day of the week? Yeah, we could totally do that.

A

In fact, I have a slider here on each one of these so that I can control how big each representation is, or how many on bits it has. So let's try to minimize this stuff. I can do two bits for the weekend: it's either a weekend or it's not. Seven bits for the day: it's either one of the days or not. And we could even dial down the time of day and just have three buckets for the time of day.

A

I don't have to; I'd rather have time of day a little more, maybe season a little more. But the reason why this may not be what you want is that now the encodings for day of week and weekend are really small compared to the encodings for time of day and season, especially if we make these a lot bigger. Time of day and season could have a lot more bits.

A

So as we change the day, there's a significant number of bits moving now in the season encoding, and if we go into time of day and start moving hours, again a significant number of bits change just for a change in the hour. But for the day of week, if we move from Thursday to Friday, there are hardly any bits changing. In fact, there may be more bits changing in the season encoding than there are in the day of week encoding.

A

So there's a weighting problem here. The time of day and season encodings are weighted much more heavily now than the day of week and weekend encodings, so that semantic information is going to be conveyed more strongly in the HTM system than day of week and weekend. You might not want that. You might think: well, day of week is actually extremely important in my data. I want to make sure that all of these different representations of semantics in the data are equal.

A

So what we'll typically do is make sure that they're equal by providing a common bucket width across all of these different encodings that are coming out of this date. At this point, I'm going to use 21 for the common bucket width for day of week, weekend, time of day, and season.

A

So I know that as these things change, each of those semantic information categories gets the same amount of attention. When the HTM system processes this, when the spatial pooler tries to understand it and normalize it, it has the same ability to extract each one of these semantic bits of information. That's an important concept that I think you should think about when you're taking a point of data and trying to understand and extract its semantics into an SDR, a representation of bits.
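The weighting problem can be made concrete by comparing each field's share of the on bits. This sketch treats the on-bit count as a rough proxy for how strongly a field is represented, which is an assumption, and the unequal widths of 41 and 1 are hypothetical numbers; only the common width of 21 comes from the episode.

```python
def on_bit_share(widths):
    """Fraction of all on bits contributed by each field."""
    total = sum(widths.values())
    return {name: w / total for name, w in widths.items()}


# Hypothetical unequal widths: large season and time of day, minimal day of week and weekend.
unequal = {"season": 41, "time_of_day": 41, "day_of_week": 1, "weekend": 1}
# Common width of 21 on bits for every field, as chosen in the episode.
common = {name: 21 for name in unequal}

unequal_share = on_bit_share(unequal)
common_share = on_bit_share(common)

for name in unequal:
    print(name, round(unequal_share[name], 3), round(common_share[name], 3))
```

With the unequal widths, day of week contributes about 1% of the on bits; with the common width, every field contributes exactly a quarter.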

A

You want to make sure that there's a common number of on bits between the different things that you're trying to extract. So I have talked about very basic encoders. We've done a scalar encoder, a random distributed scalar encoder, and now date/time encoding. I saw a comment on a previous YouTube video that said, "Let's just skip encoders. Encoders are easy and I'm more interested in spatial pooling." So yes, the encoders that I've shown you so far are pretty easy. But encoding in general is not an easy thing to understand and do.

A

Take sound, for example. We don't have a sound encoder for HTM systems, and it's not because we haven't tried; it's because it's really hard. It turns out that your cochlea has had hundreds of millions of years of evolution to become the extremely complex system that it is. It's much more than just a bunch of hairs responding to different frequency ranges.

A

There are papers and papers about how the cochlea works and how it creates the representation it passes into the neocortex and the rest of the brain. So coming close to biological senses is very, very hard, but there are a lot of opportunities for the HTM community to really innovate and create new encoders that are specific to common problem areas. In the last episode, I mentioned a blood pressure encoder, but there could be lots of different biometric-type encoders.

A

They could focus in on one specific problem area and semantically encode specifically what needs to be encoded much, much better than a collection of scalar encoders could ever do. So this is a place in HTM that is ripe for innovation. I challenge you to think about different ways that you could encode data into sparse distributed representations, because you might be able to provide an encoder that the rest of the community could use and create amazing things with. Again, if you want to discuss encoders in general, you should go to the encoders section of the HTM Forum, which is linked in this video description.

A

Thank you for watching this last episode on encoders. I'm Matt Taylor, and I'll see you next time, when we talk about spatial pooling.