CEPT SDR Session with Francisco Webber (NuPIC 2013 Fall Hackathon)
B
Okay, welcome. Yesterday morning I had the opportunity to talk with many of you about one of the other aspects of our work, so now I would like to complete the picture and show you, more or less, why we started doing what we are doing and how we are doing it.
B
So, maybe to give you a bit of context: I have personally been following Jeff's work for many years now, since shortly after he published his book. At that time I was deeply into trying to find ways to better process textual information; I was involved in search engine development for various application areas.
B
The latest one was patent search, in fact, which is a pretty tough discipline in information retrieval, because you tend to have a lot of documents where even humans have problems reading them and getting what they are talking about. We basically tried endless numbers of algorithms and methods to improve the retrieval of such documents from repositories.
B
So if you want to improve this, if you want to actually find the five documents within 100 million that are actually important for what you're doing, then you need to invest a lot of energy. Every percent beyond the easy ones, I would say, is extremely expensive, because you need to do things like annotation.
B
So our approach needed to be completely different, because basically all the statistical approaches have been investigated pretty deeply, and we wanted to find another approach, even if it's not as good in the beginning, just to have another tool set to face the problems that we have there. We started with a pretty straightforward line of thinking, I would say: language is produced by humans, which means by brains, and language is composed of symbols.
B
So it's not information that you can understand if you don't have the whole education and knowledge about the language; you cannot decipher it, because the symbols themselves don't tell you what they are talking about. And that's exactly the problem the computer has when it processes text.
B
So this is one issue with language. And we are concerned only with text, so we are not concerned with speech recognition, with OCR, with all the other disciplines, so to say, that are necessary to bring the information in from its physical existence, which might be a tape recording of somebody saying something.
B
What we focus on is the actual text level, which means we start to reason at the moment when we actually know which word appears, so to say, and we don't care how it got there. The second issue is that knowledge, and the meaning of words, is a problem in that what we don't know, in fact, is how to represent meaning in a way that it speaks for itself.
B
The same is not true for words: if you know one word, it doesn't tell you how you should read the next one and where the meaning is. This representational problem is in fact very fundamental to artificial intelligence in general. How do you actually capture what we observe in intelligent behavior? How do you record this and make it happen somewhere else?
B
What you can do is capture what are called features: you try to create feature sets that you can work on, process, and store in a computer. But for the language case, or the text case to be more precise:
B
the features in the traditional approach are, for example, just statistical values: the word such-and-such occurred that many times in a text and that many times in the collection the text is part of. And that's a very slim feature, so to say.
B
In order to solve our problem, we need to find ways of actually learning features about text, and learning them in sufficient amount and in sufficient quality that we can reproduce, in quotes, what is meant by the text, or by the language.
B
So basically what language does, from the point of view of the brain, I would say, is capture a set of sensorial information that you got in the context of a number of real-world things. We try to capture this in order to send it to another brain and have that brain have the same experience that I had when I created the text, or the language that describes it. So it's a way of communicating stimuli.
B
In principle these stimuli should go to the brain, and we developed language, so to say, to capture this, to transfer it, and to allow another brain, another person, to experience something without actually experiencing it. This is in fact an interesting improvement if you compare it to animals, who are not able to do this; they lack this social dimension, I would say, and the social processing of information.
B
What is interesting is that if you observe neuroscience, or brain science, and the science of language understanding, you see that many of the complexities you encounter are very similar, and if you go into detail, you find out that they actually come from the same fundamental problems. This shows nicely that brain and language are extremely linked, and my idea was that better knowing how the brain works would help me better know how language works, and vice versa.
B
Good. I started by making a number of assumptions. I decided, so to say, to take them as given and to build a reasoning on them, without necessarily looking for the fundamental truth of these assumptions, because the field is so complex that defining everything by yourself is not feasible.
B
One assumption is that the neocortex is composed of modules, small structures that are repeated over and over again, and that the more of these modules you have, the more processing power, in quotes, you have. The second assumption is that these modules are built according to a building plan.
B
That plan is the same for all the modules. So there is, so to say, one type of module that is available, and this one type of module has an algorithm that does basically everything that is needed for the whole structure as such. You see this in many ways, for example in that the brain areas that do vision don't look very different from the brain areas that process auditory information, and so on.
B
The next assumption is that the inputs and outputs of the CLA are built of SDRs, so that there is a standard way of inputting data into a layer and a standard way of getting data out. By having SDRs at the input and at the output, you can actually stack layers and feed the output of one layer into the input of the next.
B
The next assumption is that the fundamental input, where new information comes in, is always through senses: data needs to be sensed somewhere, transformed into the SDR format, and brought to the cortex. And the last assumption, so far, is that these repetitive structures are organized in regions, layers and, finally, hierarchies that all communicate with each other via SDRs.
B
I think all of these assumptions are pretty obvious to you guys here; I just wanted to say that I tried to start with a very easy set of hypotheses, from my point of view, and to try to find some answers. So, as I said before, we are focusing on something very specific.
B
So this slide is a bit hard to read. In principle you could create a representation like this, where you get a symbolic input (symbolic because it is text at some point), and from the symbolic input the information is adapted, so to say, until we reach the word level. From there, a stream of word patterns is created that corresponds to the content of the actual text, or speech, or whatever.
B
That is what is understood, and based on this there is some reasoning and processing happening, and then the whole thing goes downwards through the same system, through the motor output, and we again produce symbolic information based on the result of our processing of the language. This layer of the word SDR is what we have tried to capture in a technical fashion, and we call it a retina. It's a bit ambiguous.
B
I know, but this sort of happened while working on this. The idea is that the CEPT retina, which maps words into word-SDR representations, has a very similar functionality, if you want, to the real retina, which converts images of the surroundings into something the brain can process. In fact the retina itself is not only a sensor but also somehow a processor that pre-processes the information accordingly.
B
So if we want to represent words by SDRs in order to be compatible, and assuming that somewhere in the brain this actually has to happen, given all the aspects of the brain I listed before, we have to see what is needed for such a word SDR to be a true SDR in the cortical sense.
B
One aspect is that it is a bit vector where every bit has an actual semantic meaning. That is precisely what words are trying to capture: mostly large numbers of fine-granular semantic meanings of where this word can be used, in what context of other words it can be used, and what it represents, in different ways at different moments, sometimes. Each of these features is represented, let's say, by one bit in the binary vector of the SDR.
B
What it needs in order to comply with the SDR is sparsity.
B
We have seen that in order to be properly processed in the cortex, the representation has to have high sparsity, in order to keep the combinatorial space very large. In fact, if you calculate what would be combinatorially possible in language, you would have basically endless variations of sentences, and still we only need a very thin layer of true sentences, so to say, compared to the combinatorial sentences that do not make any sense.
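As a rough illustration of how large that combinatorial space is, the sketch below counts the distinct sparse bit vectors of a given size. The vector size (128 x 128 = 16384 bits) and the ~2% sparsity are assumptions chosen to match figures commonly quoted around the CLA, not numbers stated in this talk.

```python
import math

def sdr_capacity(n_bits: int, n_active: int) -> int:
    """Number of distinct SDRs with exactly n_active of n_bits set."""
    return math.comb(n_bits, n_active)

n_bits = 128 * 128             # assumed retina size
n_active = n_bits * 2 // 100   # ~2% sparsity, as the CLA prefers

capacity = sdr_capacity(n_bits, n_active)
# capacity has several hundred decimal digits: far more patterns than
# there are meaningful sentences, which is the point made above.
print(f"{n_bits} bits at ~2% sparsity encode a number with "
      f"{len(str(capacity))} digits of distinct patterns")
```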
B
A very interesting functionality, I would say, is what you get by adding SDRs together, by having this compositionality.
B
This might seem a bit complicated, but as we found out, this is a fundamental functionality that actually allows you to do computations with the SDRs, because then they behave like a set according to set theory, and you can do many operations on SDRs following this set-theoretical approach. And if we manage to convert words into these SDRs, we can do this with words.
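A minimal sketch of that set-theoretic behavior, assuming an SDR is modeled simply as the set of its active bit positions (the positions below are made up for illustration):

```python
# Hypothetical active-bit positions for two word SDRs
cat = {3, 17, 42, 99}
dog = {3, 17, 55, 99}

union = cat | dog       # "cat OR dog": features of either word
shared = cat & dog      # features common to both words
only_cat = cat - dog    # features specific to "cat"

print(sorted(union), sorted(shared), sorted(only_cat))
```

Because the representations are just sets, every operation from set theory (union, intersection, difference) carries over directly to words once the words are SDRs.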
B
The second most important aspect is similarity: the way you could compute the similarity between two representations, and therefore the similarity between two words. This is very hard to capture, because two different people would have two different views on the similarity of two words. But here is what is interesting.
B
With all this blurriness that you find there, we still always find some common ground, and therefore we are actually able to have conversations. Because if you needed to clarify what every word actually means before continuing a conversation,
B
there wouldn't be any conversations. So similarity is a very subtle measure, and we have found that making it computable, so to say, is one of the biggest features we get from using a static SDR representation of words, which I will also show later. And then I have one last aspect.
B
You can't see this well; I tried to make it a different color, because it's my very personal take on it: I personally think that the topology in the SDRs is also very important. This has been discussed a lot in the last weeks.
B
I personally believe that the SDRs as we see them resulting from words are very similar to SDRs from the visual system, for example, where the fact that a certain number of features is positioned in a certain place on the SDR is part of the whole information, and therefore has to be considered. This is essential, as we will see later, for the question of how to actually build hierarchies that process this kind of data.
B
So I'll leave that for later. The key on the computing side, so to say, is that we want word vectors to actually be represented by SDRs; we describe the words using collections of features, as I said, and build a metric space. The metric space is basically not a space with a coordinate system, but a space where the measure is given by the similarity, by a distance between two points.
B
So the metric space is, if you want, an endless environment that allows you to compare any two entities by comparing the distance between their features, and as we will see, this works pretty well. The problem is to actually create word vectors that correspond to the meaning of words and that comply with the necessities of being SDRs.
B
This is pretty tough. It has in fact been tried in many different ways for many years, and many different approaches have been taken. The easiest approach would be to create the features manually: you take a list of all the words you want to use, and for each word you say, okay, I want to call this the color feature, I want to call this a specific taste feature, and then you would have to start and manually create a dictionary that basically associates a number of features with every word.
B
The problem is, even if you have the manpower to do so: if you give the same word-tagging task, so to say, to another group of people, you would possibly get pretty different interpretations, so it's very hard to get something objective that allows you to make more fundamental claims. So the next attempt was to find ways of generating these features; there is the recently published word2vec approach from Google.
B
It does this by using collocation information. For every word in my list, I look at which documents this word occurs in, and which words are collocated with it, two words before and two words after the word I'm trying to capture. If you do this, you can in fact capture some of the semantics of this word, so to say. But as you can imagine, by just looking at two words before and two words after, this is very shallow.
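The two-words-before, two-words-after collocation window described above can be sketched as follows. This only illustrates what "context" means in that setting; the actual word2vec training is a neural model and is not shown here.

```python
def window_contexts(tokens, target, radius=2):
    """Collect the words within `radius` positions of each occurrence of `target`."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - radius):i]
            right = tokens[i + 1:i + 1 + radius]
            contexts.append(left + right)
    return contexts

tokens = "the quick brown fox jumps over the lazy dog".split()
print(window_contexts(tokens, "fox"))  # [['quick', 'brown', 'jumps', 'over']]
```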
B
If you further process data represented like this, you can improve, so to say, the understandability of text, up to a certain point. I mean, it tends not to be as good in the NLP area as the traditional approaches, but it's interesting to see that it works at all, using random representations.
B
So we have tried to develop from there an approach that ensures a certain semantic richness representing every word. It might, of course, not be as rich as a true human understanding, but at least rich enough, so to say, to get some smart output from it.
B
And if we have many of these lists, because we have looked at many documents where this word occurs, we aggregate all these contexts to create some sort of standardized, amalgamated context out of these lists.
B
What is interesting to note here is that many words have contexts of different types. For the word "apple", you will find lists of contexts in the computer field, for example, but you will also find contexts about plants and biology, and about music, because it has been a record label, and so on. In fact, if you look carefully, you will hardly find a word that does not have many contexts; it's not even the exception, it's more or less the standard for words.
B
This also allows us to train it for specific types of language. It should be completely independent of which language you use: it should work in English as well as in German, French or any other language, but it should also work for languages in the sense of, let's say, medical English, or English for chemistry, or for aviation, or whatever.
B
For our first prototype we used Wikipedia, because we found that it is more or less a general-English corpus with a well-balanced weighting of all the different semantic meanings, and it would be easier to investigate, and easier for average users who look at our demos to understand what is happening there.
B
So we take the Wikipedia documents and we process them, and we have arrived at this basically by trial and error. We use all the machinery that exists in NLP, so to say; we try to apply it and basically get the most out of the meaning of the documents as we find them.
B
We do a lot of annotation, filtering, preparation and segmentation of the texts, and in the end we get a collection of documents where the most important aspect is that every document talks more or less about one thing, and this one thing is the word that it is a context of. This might sound a bit blurry, but I'm afraid that's as concrete as it can be.
B
It really has to be evaluated, so to say, how you prepare such a training corpus, but we have managed to automate this process. We don't need 300 people to actually do this; we have a set of algorithms that, at least for languages like English, German and French, does a good job in an automated fashion.
B
Typically what you find in Wikipedia, for example: you have Henry Ford, and then you have the life of Henry Ford, the work of Henry Ford, and so on, and we try to segment this so that we always have one statement about him in one snippet, so to say. And we basically don't need to explain every word that exists in that fashion, because there is not a Wikipedia document for every word that exists, but by using a sufficiently large number of topics,
B
we end up using all the words that we basically need to describe these different topics. And that's, by the way, more or less how we do our language learning too: if we go to school, we learn about a certain number of topics, and while we learn the topics, we learn more and more words to actually understand and argue about these topics.
B
We build our vocabulary not based on lists that we learn, but by using the words in a description, or by talking about topics that we have in common, and by choosing the topics we ensure that we cover a certain semantic space. So we have history and geography, and everything that is important in the world basically becomes a topic, and then we can decide for every topic.
B
So we do this with a number of topics. One interesting aspect I could add here: contrary to the approach behind word2vec, for example, where you need larger and larger training corpora, which is of course not easy to get for anyone other than Google,
B
what we try to do is to be more and more precise about the single snippets that we produce, and to use fewer and fewer of these snippets. If you again compare with how humans learn language: you don't learn language by reading 5 million books. Just look at how many books you had actually read by the time you learned to read at all.
B
So with our approach we also try to move toward using less and less training information, because it has the advantages of less information needed, less work to prepare it, fewer errors, and so on, and it will also be much easier to transfer this concept to other languages. The Thai Wikipedia, for example, is by far not as rich as the English one, so if we needed, I don't know, millions of Wikipedia documents, we would again limit ourselves to the few languages where that is available.
B
We have now started experimenting with other sources. The point is that Wikipedia is simply optimally structured, so to say, for the way we use it, and other sources need a bit more pre-processing. In fact, we are now investigating how important this encyclopedic aspect is, because if we find ways of not relying on this encyclopedic aspect, then we could use other, more general text material.
B
Very narrow. It's the same with dictionaries: they are good for defining a word, but they don't really give you a context. Maybe if it's a dictionary with many examples, then you get contexts again, but it would be a bit of a synthetic context, because those would be sentences stating the use or the meaning of the word. And what we found is this:
B
having really narrative descriptions is what makes the difference in the end.
F
Have you considered... I mean, I'm not an expert in linguistics, but my passing understanding is that there are theories that language understanding is genetically encoded in our brains, so it's not purely a blank slate. Have you taken that into consideration?
B
Absolutely, but to tell you: that is somewhere we want to go, not the starting point. It's more or less a place we want to reach by applying these techniques. The point is that we are actually not yet at the level where we want to do language understanding, which is yet more complex.
B
What we want to start with is understanding words, understanding the meaning of the atoms of language. I will talk later about why we want to link our retina to CLAs: that would be the next step, so to say, using words with a meaning to build sentences with a meaning, streams and sequences of these patterns. So this is just meant to capture the meaning, whatever that means, of single words in a realistic fashion. That's the goal of this approach.
E
So I can see how a noun would work, because if you have a noun in a text, then all the words around it are kind of describing that noun. But for, say, an adjective: the adjective is there, but you're not really describing the adjective.
B
It's not really about describing. Fundamentally, words occur together when I can formulate something that describes something by using this set of words. I'm not looking for a description of the word in the pure sense, so to say; I want to create realistic contexts, and one of the easiest realistic contexts is the explanation of a certain topic, for example. There you also find the adjectives, when you explain, again, the term "Henry Ford".
B
You will find that there is a bias in which adjectives are used, again across many contexts, but the adjectives of the document about Henry Ford are pretty different from those of documents about NASA. And this is precisely what describes the contexts in which they appear, and for every word in this description,
B
all the other words are the context. So even for the 25th adjective in the text about Henry Ford, all the other text is the context for this adjective, and if you have many of these contexts, then you can start to say something about the word, about the meaning of the word, so to say, or at least you can compare it to other words.
B
Yeah, so the point is that creating the retina corpus is the only place where we try to use all the tricks and tools that traditional NLP provides us with, even if they are very costly, even if it's things like part-of-speech tagging the documents, which is not an easy thing to do in good quality. But the point is: if we do it once for the retina, then it's done. After it's done,
B
we can use the retina to work on any other text, and we will never be obliged to do named-entity recognition or whatever on these other texts; the vocabulary is already defined, and all the occurrences of the words with their possible tags are already defined. That's the approach, so to say. So how does this work? As I said, we have a number of source documents.
B
We pre-process them in a pipeline, and in the end we have documents that contain terms. Out of these documents we create a corpus that we then represent as document vectors: basically, every document is described by a vector that says whether a word is present in the document or not.
B
This is very traditional, so to say. In the next step we take these document vectors and we use a variation of Kohonen networks to make a 2D map, a semantic map, of the documents we have created. The result is that every document from my training corpus now gets a coordinate that positions it somewhere on this map. I don't know if you have seen this; there are millions of ways of calculating these maps. What it basically does is that everything that talks about pets, let's say, is in a certain place.
B
Everything that talks about cars is in a different place, and all the documents are arranged such that documents that are nearby talk about similar things. This works pretty well, and in this step we create a map of our training documents. In the next step we need to link in the semantics that we have captured by organizing the documents, which is still relatively easy (it takes a lot of computation, but it's easy to understand), and now we map all the words onto this map.
B
I have a list of all the words in my document corpus; I take the first word and ask: in which documents do you occur? And because every document is positioned somewhere on the map, I get this representation of distributed pixels, if you want, in my word vector. So it's an extremely simple mechanism.
B
It's sometimes even a bit scary how simple it is. In the end we post-process, so to say, the bit vectors that we get and standardize them to become our SDRs. And then we have created a number of basic computational functions to basically access these SDRs and combine them.
B
And here again, to make the point: all we do is conversion of words into SDRs. We call this lexical semantics, so this is really just about words: what words mean, what they represent, what other words are close, and so on. We are not talking about grammar.
B
We are not talking about building sentences and so on, because there the sequence in which the words occur is of importance, and when you hear "sequence learning of patterns" you immediately think, of course, of the CLA, and that's the reason why we think that linking up the two systems might be extremely interesting.
B
Do we normalize it to get this? We don't normalize it in general. What we try to do is make the training corpus big enough that the less frequent words have a sort of minimum representation, and all the higher densities we basically calculate down to whatever level we need. So in the current API we compress that to five percent, more or less; the thinking behind that was that the CLA wants to have two percent.
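One way to sketch that "calculate them down" step is to keep only the highest-weighted positions until the target sparsity is reached. CEPT's exact mechanics are not described in the talk, so the weights and the top-k rule below are assumptions for illustration.

```python
def specify(weights: dict, sparsity: float, n_bits: int) -> set:
    """Keep the top sparsity * n_bits positions, ranked by weight."""
    k = int(n_bits * sparsity)
    ranked = sorted(weights, key=weights.get, reverse=True)
    return set(ranked[:k])

# Hypothetical per-pixel counts in a 40-bit vector, reduced to 5% sparsity
counts = {0: 5, 1: 1, 2: 9, 3: 4, 4: 7}
print(sorted(specify(counts, 0.05, 40)))  # [2, 4]
```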
B
So if you also want to throw away a number of bits, we give you some three percent of headroom, but it's just a matter of convention, so to say, and of experimentation.
B
In fact, we have had recent results where we worked on the way we do this; we call it the specification of the SDRs. We have found a new way of doing it by taking into account the locality of the pixels: not only the number of pixels per position, but also how many pixels are around it. Then it gets a different weighting and so on, and this obviously also improves the discrimination capacity of the system.
B
Percent, yeah. That's the way we basically compute it. It's due to speed: we want to do this dynamically, and the whole SDR computation internally works with integers, so we did not want to introduce doubles or floats and slow everything down.
B
So we sort of quantized it into steps. Another argument is that atoms do the same: they have their quanta, there is nothing in between. But no, it's just for performance reasons that we decided this, and it can be changed at any point. Yeah, another...
C
Two technical questions. When you create the Kohonen map, you know, there are a lot of parameters you can tweak and all of that, and we face this all the time: with different parameters you can get a different...
B
What we try to do is use up the space as well as possible, so we try to map it in a way that gives a sort of even distribution, and in fact this reduces to the selection of the features you take for creating the map, which is, so to say, a tough thing to do. In the first generation of SDRs that we created, we in fact used traditional tf-idf stuff again to calculate which are the most important words for this. In the more recent versions,
B
we have used the older retina to do this, because you can also use the retina to find out which are the most important words to reconstruct the SDR of the document, and that helped.
B
It improved the semantic specificity extremely. So it's a sort of bootstrapping that you need to do: at some point you have to decide it should be that way, and after having done it, you kind of revisit the whole thing and refine it step by step. I don't know where this leads, because we have only been doing this for a short time.
B
Okay, so the colors are a bit weak. The first fundamental function that we apply to these SDRs is similarity; that's in fact the reason we built it, to be able to calculate similarity. Here you have the representations of "cat" and "dog", and in the lower part you see the overlay of the two. Normally you should see the blue and the red dots separated, with black dots where a dot belongs to both. And in general terms, I mean, just take this as a symbol:
B
you again have areas that are specific to certain aspects of the word. Everything related to home, family and whatever is down here; here are social aspects, free-time aspects, biology aspects. And again, we don't know, and honestly don't care, what the single dots mean, but what we can say is that in the different areas of the SDR (that's what I meant before with the locality of the representation),
B
certain types of aspects are bundled. But there are also aspects, for example here it's just blue, here it's just red, that are specific to the one word or the other. So typically two words, regardless of how far apart or close they are, share a certain number of aspects, and they diverge in certain others.
H
I think that's an interesting point, because... so, I just finished reading Pinker's The Language Instinct.
H
In my head, these words are very ambiguous, and in how they're represented in your brain there are separate SDRs for all the different contexts of each word, and the way we're representing them here is like squishing them all together. So just the fact that "apples" removes the computer context, because it's plural, but still retains sort of the biological context, is interesting.
B
Well, on one level you can do it by doing set arithmetic with the SDRs, but this is only valid for static representations. I can show you; I could even show you right now, but I need my glasses.
A
Here, for example... no, that's the wrong one. I'll take this one, it's even open already.
B
Comparing to "apples", we see that except for this one cluster it looks very similar: here and here, all these distributed clusters are very similar. And in fact I can tell you what this remaining big blob here is: it's about Apple, the record company. So if I subtract...
B
B
Yeah
so
yeah,
this
is
basically
the
expression
engine.
It's
one
of
the
of
the
two
important
engines
that
we
use
for
manipulating
these
sdrs.
The
one
is
the
similarity
engine.
It's
basically
to
compare
that
and
the
other
one
is
the
expression
engine
that
allows
you
to
do
an
ore
and
sub
and
and
so
on,
with
with.
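The two engines can be sketched with ordinary set arithmetic, treating an SDR as the set of its active bit positions. This is a minimal illustration, not the actual CEPT implementation, and the fingerprints below are toy values:

```python
# Sketch of the two engines described above, treating an SDR as the set
# of its active bit positions. Toy fingerprints only -- not real retina output.

def similarity(a: set, b: set) -> float:
    """Similarity engine: normalized overlap of active bits (Jaccard)."""
    return len(a & b) / len(a | b)

def sdr_or(a: set, b: set) -> set:
    """Expression engine: OR is the union of active bits."""
    return a | b

def sdr_sub(a: set, b: set) -> set:
    """Expression engine: SUB removes the second term's bits from the first."""
    return a - b

apple = {3, 8, 15, 21, 42}      # toy fingerprint of "apple"
computer = {8, 21, 50, 61}      # toy fingerprint of "computer"

fruit_like = sdr_sub(apple, computer)   # bits of "apple" not shared with "computer"
```

Subtracting the "computer" bits from "apple" is exactly the operation demonstrated on screen: what remains are the senses of "apple" that have nothing to do with computing.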
B
I don't like to call it normalization, because it's not really that, but the re-sparsification here is done dynamically after the processing has happened, because we want to take as many pixels as possible to do the correct operations, and only the remainder is then sparsified down to the two-percent or the five-percent level. Internally, the system always works with the complete sets, because we don't need the sparseness factor as much as the CLA would need it.
B
Yeah, it's interesting: even with very rare words it still functions. So you really have to go and look for extremely rare stuff — specifically due to the Wikipedia, of course — and in practice, whenever you need words that are too sparse to be processed in a good fashion, you simply add training material containing these terms and you add to the pixels. But typically you want to use a retina in a specific environment, and you train it for that environment, because the other way around is also true.
B
In a given industry, let's say, I would not be interested in having apples and pears from the computer business mess in my things all the time, so I would like to train it in such a way that there are no pixels about Apple computer-industry-related stuff, and with that I can basically improve the signal-to-noise ratio dramatically.
B
I mean, I can show you — there is another approach we have used in the second demo. This one is basically just meant to do these operations and investigate what the outcome is. There is another thing that basically gives you the context terms for a given term, and it does this by converting the term you have given into an SDR; then it looks for similar SDRs, converts them back into words, and returns
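That lookup can be sketched as follows — a toy version assuming the retina is available as a plain dict from words to bit-sets (the real service of course runs over the trained retina):

```python
def context_terms(term, retina, top_n=3):
    """Convert a term to its SDR, rank all other terms by overlap with it,
    and convert the best matches back into words."""
    target = retina[term]
    scored = sorted(
        ((len(target & sdr), w) for w, sdr in retina.items() if w != term),
        reverse=True)
    return [w for _, w in scored[:top_n]]

# Toy retina with hand-made bit-sets standing in for word fingerprints.
retina = {
    "source":   {1, 2, 3, 4},
    "water":    {1, 2, 3, 9},
    "computer": {3, 4, 7, 8},
    "banana":   {20, 21},
}
print(context_terms("source", retina))
```

Ranking by raw overlap is the simplest plausible choice here; a normalized measure such as Jaccard would work the same way.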
B
the list of words. For example, I take the word "source", and I get many different words here, and what I also get is that the system can detect different clusters of meanings in the representation. So here, for example, there is a cluster that is named "water", and if I select it, I get context terms like water, temperatures, amount, carbon and so on, that are related to the concept of source and water.

B
I get the context terms like this, and if I select "computer" — because there's also a context between source and computer — I get proprietary software, GPL, development tools and so on: a whole other set of context terms.
G
H
G
B
J
B
Yeah, basically — but it's just a small change, because right now it expects terms, and we just have to change it to expect expressions, and then it should work. We could even try this again with "apple": here you see the most important context is obviously the computing context, and here I find the different senses. So there is a disambiguation function, which might also be interesting because of how simple the disambiguation actually works.
B
B
B
the Beatles, then I have only two clusters left, so to say, and now we have all the fruits and salads and so on. So the same thing works basically with the context terms, and the nice thing is that everyone of you who has been working in the NLP field knows how complicated it is to do this automatic disambiguation of terms.

B
Here we have a very simple feedback loop, so to say: feedback plus subtraction allows us to disambiguate any word. So you can type in the example from my title, the word "jaguar", and it finds that jaguar can be a vehicle, can be an animal, can be a character — obviously from cartoons or whatever — and I can do the same thing again. I can also restrict it: now I said I want just nouns as context terms.
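A hypothetical sketch of that feedback loop — identify the strongest sense cluster, subtract it, repeat — assuming the sense clusters are already available as bit-sets (the cluster detection itself is not shown):

```python
def disambiguate(term_sdr, clusters):
    """Repeatedly pick the sense cluster with the largest overlap with what is
    left of the word's SDR, report it, subtract its bits, and stop when
    nothing overlaps anymore."""
    remaining = set(term_sdr)
    senses = []
    while remaining:
        best = max(clusters, key=lambda c: len(remaining & clusters[c]))
        if not remaining & clusters[best]:
            break   # leftover bits belong to no known cluster
        senses.append(best)
        remaining -= clusters[best]
    return senses

# Toy fingerprint and toy sense clusters for "jaguar".
jaguar = {1, 2, 3, 10, 11, 20}
clusters = {
    "vehicle":   {1, 2, 3},
    "animal":    {10, 11},
    "character": {20},
}
print(disambiguate(jaguar, clusters))   # strongest sense first
```

The loop returns the senses in order of how many bits they account for, which matches the demo's behavior of surfacing the dominant context first.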
B
I could restrict this to be, let's say, adjectives — and you see the typical... this was to your question before: here you have typical adjectives that are related to vehicles, for example. If I select animals, I get —
B
B
Yeah, I mean, yes, we are finding them automatically, but with standard tooling — basically we use the Stanford tagger, as far as I know, to do this, for example. But as I said before, we are doing this in the sense of annotating the terms that are in the retina, and we do this to avoid annotating every single document we want to process later on; instead we create a part-of-speech tag for every word.

B
In fact, the part of speech is also an ambiguous issue, so the very same word might have several part-of-speech tags. What we do with that in the retina is that we try to find which of the occurrences correspond to which correct part-of-speech tag. So you will not get a single answer by saying, I don't know, "I want only nouns"; you might have words that also co-appear as verbs, for example — but the pattern that is associated should be the pattern that is true whenever the word is used as a noun.

B
So that's basically the approach, and the same thing would be true, for example, for doing an annotation of locations, of named entities and so on. You could basically restrict it and say: give me all the context terms, but only the terms that specify a location, or only terms that are related to a person, or whatever. So it brings a new usefulness, so to say, to the old-fashioned annotation job, by being applicable to more different areas.
B
Okay, another practical use of the SDRs is that we are using them to create so-called document fingerprints. So we decompose a document into all its words, every word gets replaced by the fingerprint of the word, then we OR all those word fingerprints together and we sparsify the result, and in the end we have a fingerprint.
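As a sketch, assuming word fingerprints are bit-sets, the pipeline — decompose, replace, OR, sparsify — looks roughly like this. The sparsification rule used here (keep the bits supported by the most words) is one plausible reading, not necessarily the exact CEPT rule:

```python
from collections import Counter

def document_fingerprint(text, retina, n_total=128 * 128, density=0.02):
    """Replace each word by its word fingerprint, OR them all together, then
    sparsify back to the target density by keeping the bits that appear in
    the most word fingerprints."""
    counts = Counter()
    for word in text.lower().split():
        if word in retina:
            counts.update(retina[word])
    k = max(1, int(n_total * density))
    return {bit for bit, _ in counts.most_common(k)}
```

Because the result is again an SDR of the same shape as a word fingerprint, every operation shown earlier — similarity, OR, SUB, disambiguation — applies to documents unchanged, which is the point made next.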
B
What we call a fingerprint, which is an SDR for a document — and the interesting thing is that all the magic we could do with the words, we can now do with the documents in the very same way. So we can disambiguate them, we can look for similar documents; in fact, that is the core of the search engine that we are about to build for documents. And just —
C
So each document also has a location in this map? These are not the same documents?
B
B
No — I think you would not be able to differentiate between a newly listed document and a training document. If you would look at the whole collection, you would see that it nicely fills up the space, but that's the only specificity of it. With the document fingerprints, as you can imagine, we can do many things on the document level, where there is a big business interest in doing so.
B
B
When I do a query, I don't formulate it by using a query language: I take a snippet of text or another document, and I say I want to find all the documents that are semantically closest to what I'm showing here, and then I get a list that is ranked according to the distance measure between my query document and the documents I've indexed before. And the nice thing is that I can use another feature that has existed in IR theory for a long time, but has been pretty complex to create.

B
I can ask the user to specify what kind of information he's interested in. So let's say I have a repository of technical-scientific publications, and my user gives me five documents about medical aspects, medical issues — so he's a medical doctor and that's what he's interested in. He gives me, let's say, five documents that he basically likes.

B
I take the documents and make a document fingerprint of them, and instead of ranking the result list after the search according to the distance measure to my query, I can rank it according to the distance measure to the fingerprint of the actual user profile, so to say. This basically allows document searching using two strategies: by refining the definition of what I'm interested in, and by defining what I'm looking for. But just to show you that this functionality is already there, so to say —
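Both ranking strategies reduce to the same operation — sort the result list by fingerprint similarity to some reference SDR, whether that reference is the query snippet's fingerprint or the OR of the user's liked documents. A toy sketch:

```python
def rank_results(results, reference_fp):
    """Sort document ids by similarity of their fingerprints to a reference
    fingerprint: either a query snippet's fingerprint, or a profile built
    from documents the user likes."""
    def sim(fp):
        return len(fp & reference_fp) / len(fp | reference_fp)
    return sorted(results, key=lambda d: sim(results[d]), reverse=True)

# Toy index of document fingerprints.
results = {"doc_a": {1, 2, 3}, "doc_b": {1, 9}, "doc_c": {7, 8}}
profile = {1, 2, 3, 4}          # e.g. the OR of five liked documents
print(rank_results(results, profile))   # most profile-like document first
```

Swapping `reference_fp` between the query fingerprint and the profile fingerprint is exactly the two-strategy search described above.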
B
We get this already from the static SDRs, and all of this I'm just saying to put the emphasis on the point that we believe there is a lot of information captured in the SDR, and that the CLA should do well in learning from this information.
D
Well, so that would probably be a good way to determine how well the system would work in any other fields; it would be very easy to assess them.
B
There is another way of doing it — I mean, just to evaluate the contents, so to say. One aspect would be to become more and more specific with the training corpora; another way is to change the resolution of the actual SDR.

B
I mean, we have chosen to use 128 by 128 because it's somewhere in the middle: we are able to actually compute it — because it takes quite some time to process the retina at this resolution — and it makes the use of the word SDRs more or less easy. The bigger they become —
B
Everything gets slower, basically, and we try to find the minimum resolution that is necessary to get a certain quality out. But if you have a very broad field, where you need a lot of resolution, then you would maybe need to go to 512 by 512 or even larger. So we basically scale whenever the processing becomes cheaper.
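The cost trade-off is easy to see in numbers: at 2 % sparsity, a 128×128 retina has 16,384 positions and roughly 327 active bits per fingerprint, while 512×512 already means 262,144 positions and over 5,000 active bits, so every set operation and similarity comparison touches proportionally more data.

```python
# Active-bit counts at 2 % sparsity for a few retina resolutions.
for side in (128, 256, 512):
    total = side * side
    active = int(total * 0.02)
    print(f"{side}x{side}: {total} bits total, ~{active} active")
```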
B
We can scale this accordingly. Maybe I would just like to make a short interruption here and ask my colleague Daniel to come over and — oh, there's another question.
B
B
Okay, yeah. So Daniel, who came with me from Vienna — they are the first users of our word SDRs. His company, phase6, works in the e-learning area, and what they do is provide English-learning functionality, so to say. They have to face a certain number of problems, and he wants to create the next generation of e-learning systems.

B
We have now been working together for some months, and they use the SDRs — which I find very interesting — to individualize the training environment for a specific language learner. Maybe you can just give us a short overview of what you're doing and what you're trying to find.
L
Okay, well, hello — my name is Daniel. As Francisco introduced me, phase6 is the company; we've been providing educational software in Germany, mostly, for about 10 years, and we're now looking to make the software smarter. Our ultimate goal is to create a digital tutor which fulfills many of the features that a human tutor would fulfill, and our principal aim —

L
Our principal goal is to personalize education, because every student learns differently, learns at a different pace, has different strengths and weaknesses, and the school system simply does not provide the kind of individual and personal service or tutoring that the student would need to fulfill his full potential. So the situation in education is that most kids do not live up to their potential and get frustrated about learning, about school, and I think technology can really help a great deal with personalizing education. We're specifically focused on English — let me get my water quickly.
L
We're specifically focused on English because it turns out that if you want to have a high-quality product, and if you want to really personalize along many dimensions and not just superficially, you need to go deep in the vertical. You need to have a lot of knowledge about the actual subject that you want to teach, and so we picked the English language as a subject.

L
A lot of people around the world are learning it, and it is also a vehicle to other things, because language is the vehicle to knowledge. So if we want to personalize the learning and teaching of English for every individual student, there are several aspects that need to be personalized, and I'm going to talk about two of them where the fingerprints — the semantic fingerprints, or SDRs, as we call them here — are actually extremely useful.
L
Content: to show the student exercises which actually engage the student, because they're interested in the topic — and every student would be interested in different topics. So we're going to use the similarities in order to find the type of content that the individual student would like to engage with, because countless research and studies show that if the student is engaged, they learn many times more and they remember many times more than when they are bored or simply not engaged and not interested.
L
G
We'll go to — let's see, what is it? So many things that we don't need... I'm looking.
L
And I'm going to make use of something that Francisco here showed us earlier, which is the capability to do sentence-level — well, you talked about document-level SDRs, and that means we can also do sentence-level SDRs. One second... there we go. I'm going to put here a whole sentence that includes the word "jaguar" — so we're sticking to the example — and the Maya also include some mentions of the jaguar in their mythology. And what the system created — let me scroll up —

L
— a little bit — is a fingerprint that contains the semantics of the whole sentence.
L
What this tells us is that the jaguar in this sentence very likely means the animal and not the vehicle, and it turns out that semantic context on the sentence level is a very important determiner of the actual dictionary definition of the word, to disambiguate it. This is something which, for us, is quite intuitive, because as humans we also capture the semantic context of the sentence in order to guess what "jaguar" actually means here. Now, why is word-sense disambiguation so important for the individualization of education?

L
Well, if you think about it, like Francisco said, almost every word has many, many meanings — especially simple words. If you think of the word "party", it can mean the fun party where you go to celebrate, or it can be the political party, and the student who learns English would learn those two different senses at very different stages of his education.
L
So I have here a little demo — I don't need this one... one second... where is the new one... oh, here we go. So let's take the sentence "to fire a worker", and the word that we want to disambiguate is "fire". You know, you can fire an employee, you can fire a gun, you can fire ceramics, you can fire somebody with enthusiasm — and it turns out that the semantic context is not the only factor that we need to consider in order to disambiguate it.

L
But it's one important one, and what we do here is map it back to dictionary definitions. We have a strategic collaboration with Oxford University Press — they believe in our new approach; it's basically the next generation of English learning — and so we're using their dictionaries as kind of information sources, in order to understand what the properties of the different word senses are. Sentence-level semantics is one of them — we are using the SDRs for that — but others are grammar structure.

L
For example: "missiles were fired at the enemy" — so "something was fired at somebody" is a different grammar structure than "you fire an employee" or "you fire somebody with enthusiasm". So you have different grammar frames, or verb frames, surrounding the head word that you want to disambiguate, which you also need to consider.
L
L
Word-sense disambiguation is extremely important in order to serve up those sentences and examples to the student which contain the word in the sense that the student is supposed to know at his level of learning, and not something which is only supposed to show up a few years later. Traditionally, this has been the turf of publishers: educational publishers, textbook publishers, ELT (English language teaching) publishers spend considerable resources and pay very expensive editors.

L
In order to do just that: in order to write books and texts which manage the progression — it's called progression — so, the progression of which word senses you introduce at which stage, and then selecting the content for that. So far there has not been a way to automate this, and I think we can come very close to automating it and revolutionizing the publishing industry a little bit.
L
By doing so. So this was the one use case for the fingerprints — we've seen this with the example of jaguar — and the second one that I talked about was to get content which is interesting for the student. I will demo this right now: we have a corpus with one billion sentences.

L
This corpus is composed of various sources. It includes the complete English Wikipedia; it includes 200,000 books from literature; it includes the British National Corpus; it includes every kind of corpus that we could get our hands on and that we're allowed to quote. And so we have one billion example sentences from which we can generate content — and select the content which is appropriate for the student — in order to generate exercises. I'm going to show this. So let's take any word that the student would need to learn.
L
L
It's thinking... so it just went through one billion example sentences, and it gave me — any number; I just put 100 here as the number to return, but it can give me any number — of sentences which include the word "achieve", which is the word that the student needs to learn, and which talk about a certain subject: "In municipal elections for local government boards, the Labour candidates were achieving a significant share of the vote." We can apply other methods to filter the resulting sentences even more, and now, if we put this back into the similarity engine —
L
B
L
L
So we have "apple sub computer" — I can put the term here, the term two, and then I get the similarity score. We're using APIs to do that, and I'm happy to share them. Our programmer basically created a whole set of CEPT APIs and integrated them into a Google spreadsheet, because then you can do overviews and computations, you can combine things, and you can do all kinds of things to play with it and to evaluate the results.
L
E
A
L
— was the showcase for your technology? Okay, great.
H
A
A
B
Our SDRs have been created — in fact, are modeled — along the development, so to say, of the CLA theories over the last years.
B
So it makes perfect sense now to use both functionalities together. As I said, the SDRs by themselves allow you to do a certain amount of static understanding of language, but as soon as we want to go to, so to say, the grammar level, if you want, we need a backend that actually manages the sequences — how they occur in real text and so on.

B
So we want to create, so to say — or extend — the functionality by being able to learn sequences, and in theory we should then be able to generate text. We have had the very first example of SDRs generated according to the training that happened in the CLA, and we have seen that what comes out does in fact make sense compared to the other meanings that we have seen there.
B
So I think that by doing this we can use several, so to say, SDR outputs on the quotes that occur. If we use the prediction functionality, we can basically do what I would call perceptive talk: it's basically understanding what a certain text is about by guessing what the next word in a sentence would be, and using this functionality to give an alarm as soon as the interpretation of what this sentence could mean differs from what actually was said.
B
B
I'm pretty sure that the match will be pretty good, and you can give the speech-recognition system good hints on what the actual word was that it was hearing. And I think this is very similar to the example of two persons standing in a discotheque with a lot of noise: they can still follow each other's conversation because they actively interpret what they would expect to be said next.
B
B
If you do that for, let's say, 400 answers and you count how many times it was right, you get a percentage of how good the algorithm works. If you try to do this with data from, let's say, images, vision, videos or so, you have the problem that you have to prepare the data: you have to tag it, you have to create a collection of, let's say, pictures where you have manually identified the things that you would expect your algorithm to detect.
B
If you do this with, I don't know, heating and temperature measures — it is very hard to create measuring sets using this sort of data — and I think that using language to create a measuring corpus is much more intuitive. It allows you, so to say, to see much faster whether there has been an improvement or not. And in our case, by using the SDRs, we even have a way of numerically measuring this phenomenon. So we have a reference text.
B
We pass it through the retina, we feed it into a CLA, and we look at what has been predicted. So every time we get a prediction, we compare the actual word that should have been found with the word that was predicted, and we can use the API to compute the Euclidean distance between the two SDRs, so we immediately get a numeric value.
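That numeric check can be sketched directly: over binary SDRs, the Euclidean distance mentioned here reduces to the square root of the number of differing bits (the Hamming distance), so the score needs nothing more than a symmetric set difference.

```python
def sdr_distance(predicted, actual):
    """Euclidean distance between two binary SDRs given as bit-sets: the
    square root of the count of bits active in one but not both."""
    return len(predicted ^ actual) ** 0.5

# A perfect prediction scores 0; the further apart the SDRs, the higher.
print(sdr_distance({1, 2, 3}, {1, 2, 3}))   # 0.0
print(sdr_distance({1, 2, 3}, {2, 3, 4}))
```

Averaging this score over a reference text gives the single benchmark number described next.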
B
That tells us how good the system performed. We have already sort of started doing such a collection with the kids' tales — which is a small collection, but which I think would be a very good first step in setting up a setup like this — and this would basically allow you to measure any variant or any improvement that you might add to the CLA code, and immediately measure whether it's actually improving or not.
B
B
I think that text data is very well suited for that. And maybe a more philosophical approach to this: if you want to test the cortical algorithm using some real-world physical data, so to say, you basically have to evaluate the whole embodiment of the system, if you want, which makes it very complicated.

B
If we just limit ourselves to this level of the word as it appears, so to say, in the human brain, we don't need to take the whole embodiment part into account in the measurement loop; we can stay, so to say, brain-to-brain to evaluate it, and I think this could be very efficient. And this is just the last aspect that is very important to me.
B
B
If you sort of link every input — potentially every sensor bit — to every input, there is no —

B
In my understanding at least, there is no obvious hierarchy that you can find there. But the fact that we have in the lower corner, let's say, aspects — as we have seen — about leisure, for example: saying that in the next level there is one bit that corresponds to these nine potential representations of the leisure aspect makes a lot of sense, and all the thoughts behind the hierarchical structure and so on make sense again in this context. So, yeah — but I think this has to be found out experimentally.
B
But I have a very strong belief that we could use this sort of trivial hierarchy that comes from the nature of the data that we work with, with the SDRs. Okay — so, any more questions?
J
Francisco — okay, if you include the depth of the columns in the first layer of the CLA, then you're actually getting that already, because you have 60,000 outputs generated from 16k inputs, and what they do is add in the sequence information. So they would be recognizing phrases, combinations of words, and sentences. So —
G
B
Yeah, I mean, the point is that I think already here, in the beginning, you could use the spatial pooler to decide for itself which of the nine bits is obviously the sub-pattern in this area that occurs more frequently than other sub-patterns.
B
I think it would make sense to use the spatial pooler also at the first level, because in reality it's much bigger — it's not nine pixels, it's much bigger than that, the areas that you have. And in the experiment, as far as I know, you have bypassed the spatial pooler in the first step, and in principle this contradicts the concept that you should use the spatial pooler basically as a part of every layer.

B
As far as I understood it — and I think the point is that the spatial pooling that it does in this sector means something different than the spatial pooling that it would do for the whole SDR, because in the locality it basically just says which of the sub-variations are more frequent for representing a feature within, let's say, leisure, than any others. And it would already be able to put some of the learning at this very low end and potentially prevent confusion at a higher level. But that's just a guess.
C
B
B
Nobody would have guessed, before it was measured, that the different angles of the edges that you see in the first visual layer end up differentiating faces, for example. So I think it would be very interesting to play around with this aspect and to find out how it can be used.
D
If you extend that algorithm to make a mapping into three-dimensional space, how would that affect the efficiency of this approach?
D
D
B
I mean, to tell you the truth, it is already extremely costly to compute a 2D map of this size, so to do that in three dimensions — this is —
D
For example, you scanned millions of words, very —
D
B
Yeah, but I think even if it's narrow — even if you, I don't know, take a hundred words out of the legal system — you still need, let's say, five thousand words of common English to embed them, to make reasonable sentences; and to have those five thousand words mean something reasonable in the representation, you need a certain size.
B
B
I
B
Yeah, this was in fact another example: to use translation software. So you could do, I would say, a dumb translation system that basically just tries to map one word to the other. That would be doable, I would say, just with the SDRs themselves.

B
This could be useful if you say: I just need to find out what this webpage is about; I don't want to know exactly what is written, I want to know if it's about chemistry or about aviation. But if you want to do real translation in a human sense, then you would actually need a sequence learner coming afterwards, and that could be —
I
You could create a similar document clustering, so you would actually be able to map the document clusters onto each other from one language to another — with your 2D representation of all the documents, I mean.
B
I mean, I'm pretty sure it ends up in enormous calculations in the end, but I think that would be the approach that is worth trying. And I believe — I mean, we have to investigate the behavior of the SDRs with the CLAs further, to sort of make better strategic planning for that — but just out of gut feeling, this would be an approach that makes sense.
E
So if you gave it two Wikipedias in different languages, would the article about "dog" look the same — like, visually — in German as it did in English, so that you would know if —
B
B
If you would have a lot of money, and if you would say, okay, I take 10,000 English Wikipedia documents and I have the best professional translation units translate them exactly — it should be very similar, yeah. And that could be another approach to creating translation, potentially.
C
B
It's very sensitive. So if you have a collection of, let's say, a million documents, and you add five documents and you recreate the semantic map — which is a competitive-learning algorithm — the map could look completely different: really, as if you had chosen a completely different set of documents. It's practically an iterative algorithm, so if you change, I don't know, two degrees in the first step, the hundredth step might end up completely differently. But for our use, the actual topology of the map is not relevant.

B
The only thing that is relevant is that we map all our words to the same map. It's the same as in humans: we have a completely different understanding, I'm pretty sure, of every word that we know, but still we share sufficient contexts that are sufficiently similar so that we can engage in a conversation. If we would not share any context, we could just talk to each other without transferring any meaning.
B
G
J
Got it — so you can get a fingerprint for any string, effectively. Okay. And as I was thinking that, I realized that the SDR will have a kind of invariance that comes from the sentence. So as you build up the sentence — if you imagine that there's a higher level that's looking at the sentence — it will actually have an SDR that corresponds to that sentence, and as each word is added, that word —

G

J
And then eventually, for the full documents — this could be a way of measuring, by the way — there is the SDR of the sequence so far. What you could do, for example, is connect up the sequence learned so far and get the SDR for it back from CEPT, so as to compare.
J
As they are — what would happen would be: if you were talking about politics, or if you started talking about jaguar — and, you know, we said "the jaguar" and then dot dot dot — then it would contain fingerprints for animals as well as fingerprints for cars; but then, as the next word appears, that would improve. So there's kind of an invariance that changes throughout the sentence, which in the brain is obviously helping us to disambiguate, because the previous sentence was about — was about —
G
J
So I think that the sentence-level SDRs would be invariant across the conversation so far, at a very high level, and that would be transmitted, possibly, downwards, and help to assist the upwards disambiguation of the different components of an SDR for "jaguars", for example. So it's quite possible that we would be boosting the recognition of animals if we had previously been talking about animals.
B
Something like that is probably the mechanism that drives us humans to create things like ontologies, for example: it's because we want to group stuff together. And the interesting thing with ontologies is that it's pretty easy to agree with others on a certain ontology if I actually explain it to them. Also, in the moment when we try to teach somebody something — if they are young, we give them examples of context.

B
We distribute context for different words to explain them. But with people who already have a certain understanding of language, we rather transfer a part of the ontology tree: we explain the first part of the ontology tree, and the person can then regroup the sub-patterns into it themselves. But again, it will be interesting to see how the CLA handles this, and what we can create by fingerprinting the different parts and comparing it to how the CLA handles it.
B
H
We wouldn't otherwise be here for the demo right now.