From YouTube: DevoWorm (2023, Meeting #26): Genomics of differentiation, Thompsonian grid divergence, DevoGraphing
Description
GSoC updates (nucleus segmenter pre-topological data, DevoLearn). Open problem: genomic signatures of differentiation trees. Protein space navigation, biology is more theoretical than physics, Thompsonian grids and connections to development. Papers on directed graphs, transformers, and topological data analysis. Attendees: Sushmanth Reddy Mereddy, Bradly Alicea, Morgan Hough, Himanshu Chougule, Susan Crawford-Young, Richard Gordon, and Jesse Parent.
A
Well, yeah, welcome to the meeting. We have Himanshu here, and Morgan, Dick, and Susan. Himanshu, can you give an update on your things? I don't know if Sushmanth will be here today, but.
B
Yep, yeah, okay, so I'm still continuing my work from last week on the nucleus segmenter and the membrane segmenter. Basically, my main issue with Colab was memory — the free resources were not there; I'd used up all my compute. I changed my workflow to just train the model, tried to change some things, and finally got it running on Kaggle as well.
B
So basically, right now I changed a few things in the code. I found two other ways of getting the bounding boxes from the masks: one is to find the x-min, x-max, y-min, y-max positions over all the coordinates of the mask, and the other is to use PyTorch's own implementation, which gives you similar results.
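A minimal sketch of the first approach — deriving a box from a binary mask by taking the min/max coordinates of its nonzero pixels; `torchvision.ops.masks_to_boxes` computes the equivalent per mask (the toy mask below is made up for illustration):

```python
def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) for a binary mask.

    `mask` is a list of rows, each a list of 0/1 values — the same
    min/max-over-coordinates idea described above.
    """
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    return (min(xs), min(ys), max(xs), max(ys))

# A toy 4x5 mask with a blob covering columns 1..3, rows 1..2.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(mask))  # (1, 1, 3, 2)
```

In PyTorch this would be one call per mask tensor, which is why the built-in route is usually preferable.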
B
But, you know, I found that it's better to use the built-in function, because it's quicker and more intuitive. Also, one thing I tried to do was to train for a longer period of time.
B
The advantage over Colab is that Kaggle gives you around 15 to 16 GB of GPU for around 30 to 40 hours of usage per week, and it's much better than what Google gives you, because in Colab the free resources get renewed every month, while here it's only one week.
B
So I tried to train the model once again, for more epochs, and I ran into a problem.
B
My learning rate changed to zero — it was zero and it was not learning further — so I tried to Google the error and see how to change a few things so that it gets up and going. One thing I noticed from this was that it was only finding one mask out of all the masks that are given. I'm not sure if I have an image to show.
B
So basically, if these are the bounding boxes, it was only focusing on one bounding box. The problem is with respect to the gradients: the gradients are vanishing while updating the weights during training, which is why it's only focusing on one bounding box.
B
The trick to overcome that is to hyperparameter-tune it much better than what I've been doing right now, and also to change the optimizer. I've used stochastic gradient descent here, which is known to do that. I also tried it with Adam, and with Adam the gradients were exploding: the loss was, for example, around 1.3 or 1.2, and then it just jumps to a really big loss, like 5000 or something.
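The exploding behavior described here can be reproduced on a toy problem, and gradient clipping (in PyTorch, `torch.nn.utils.clip_grad_norm_`) is one common mitigation. A minimal pure-Python sketch, assuming a simple quadratic loss just for illustration:

```python
def step(w, lr, clip=None):
    """One gradient-descent step on loss(w) = w**2 (gradient = 2*w),
    optionally clipping the gradient magnitude to `clip`."""
    g = 2.0 * w
    if clip is not None and abs(g) > clip:
        g = clip if g > 0 else -clip  # gradient clipping
    return w - lr * g

# Too-large learning rate: the iterates diverge (the "loss jumps to 5000" effect).
w_div = 1.0
for _ in range(20):
    w_div = step(w_div, lr=1.5)
print(abs(w_div) > 1000)  # True — exploding updates

# Same learning rate with clipping: updates stay bounded.
w_clip = 1.0
for _ in range(20):
    w_clip = step(w_clip, lr=1.5, clip=0.1)
print(abs(w_clip) < 5)    # True
```

This is only a caricature of the optimizer issue, but it shows why hyperparameter tuning (learning rate, clipping threshold) changes diverging loss into stable training.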
B
The main thing I'm focusing on right now is hyperparameter tuning.
B
And also optimizing things so that I overcome this. I'm also reading papers on the graph part of it. There is a paper on using persistent homology in graph neural nets — it's doing link prediction, which is similar to the cell tracking we're trying to do, as the step before the final task.
B
It's similar to what we were discussing last week, so yeah. Right now I'm just looking into the papers part of it — you could say the reading material. I'll get to the rest once that is done, and once my laptop actually starts working again, because it has an issue right now, so I've come on my brother's laptop. Basically, my CPU fans stopped working completely, so I have to change them. Yeah.
A
That's
great:
let's
go
back
to
the
paper.
What
is
that
paper
from?
Is
it
from
a
journal
or
oh,
it's
archive.
Okay,.
A
Yeah, we were talking about that last week; that'll be a good thing to go over. So they're doing link prediction here, but the code you're working on now is for, sort of, segmentation — getting to that point of creating links: we're creating nodes and then maybe creating edges from that. So you're having problems with the gradient vanishing and the gradient exploding, so it's clear something is wrong there with the hyperparameters. What's your strategy for solving that?
B
And
since
it
was
like
a
tifs
file
which
was
changed
into
like
some
like
more
images
like
you
know,
we
converted
the
trf
file
into
pngs
and
it
was
like
a
series
of
images
so
right
now,
I'm
Focus
I'm,
trying
to
like
take
the
best
images
with
which
has
the
like
kind
of
correct
bounding
boxes,
which
was
and
then
to
focus
only
on
the
pages
and
try
to
fine
tune
it
on
that,
and
also
remove,
like
all
the
noisy
uhness
in
the
data
like
some
more
but
and
after
that,
I'll
focus
more
on
changing
these
parameters
and
to
see
which
one
is
the
most
perfect
ones
like
now
I'm
using
the
step
LR
so
for
the
LR
scheduler,
which
basically
changes
the
learning
rate
with
respect
to
the
epochs
and
and
so
I'll
just
play
around
with
these
and
try
to
get
better
results.
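What PyTorch's `StepLR` scheduler (in `torch.optim.lr_scheduler`) computes can be sketched in a couple of lines: the base learning rate is multiplied by a decay factor `gamma` once every `step_size` epochs:

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate after `epoch` epochs under a StepLR-style schedule:
    base_lr decayed by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

# With base_lr = 0.01, halving every 10 epochs:
for epoch in (0, 9, 10, 25):
    print(epoch, step_lr(0.01, epoch, step_size=10, gamma=0.5))
# epochs 0 and 9 -> 0.01, epoch 10 -> 0.005, epoch 25 -> 0.0025
```

The `step_size` and `gamma` values here are made-up examples; they are exactly the knobs one would sweep when tuning the schedule.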
B
My main concern was the GPU problem in Colab; right now I can do more iterations of the parameter tuning and everything, so it's better now.
A
Okay, yeah, well, that's good. Keep working on that and, you know, keep us posted — hope your computer gets back to normal soon. Yeah, all right. Thanks for the update.
D
Right now, my progress is that I've almost implemented SAM. I'm following a tutorial to implement it — I'm using the Hugging Face Transformers library, and I'm using this person's dataset to implement it. I'm almost done with it; let me see — am I sharing my screen? Oh, so.
D
I almost have SAM implemented; I just need to write a training loop, and then the model will be training. I'm also having issues with the GPU, so yesterday I got some from Paperspace — it took 40 or something, so I took it; it was the cheap one. Then my plan is to start implementing it, and on DevoLearn also. Sorry.
A
So yeah — he was doing his update; he's also using Kaggle notebooks, and we're working on that. I'll meet with him and Maya tomorrow — you know, he's been working with Maya in a separate context, but we'll kind of bring those together tomorrow, so we'll see. Yeah. So, any updates from Susan or Morgan or Jesse or Dick?
C
Okay, now we have differentiation trees for C. elegans, and a partial one for axolotl. You know, they're not complete yet — yeah, okay, well, C. elegans might be the place to look. Okay, so I'm going by a very simple premise: one, that differentiation is represented in the DNA — great, okay — and two, that differentiation occurs by copying a group of genes and then having them diverge.
C
Okay, that's the basic idea. Well, you know, it's an untested idea. So, as I said in a note to you, the problem of finding the DNA responsible for differentiation might be approachable by saying: let's suppose that there's a motif in the DNA, right, which is similar for all steps of differentiation.
C
Yeah, I'm not sure how to go about it, obviously, or where in the DNA. We don't know the number of base pairs involved in differentiation, and once we know it, we don't know what varies in it and what's constant in it.
C
I don't know how many of the people joining us are familiar with BLASTing and stuff like that.
A
A little bit. So, you know, a differentiation tree is where we have —
A
Yeah, a differentiation tree is where we have a tree that diverges: we have, like, a single cell type, and then it splits into two cell types, and then four cell types, and we go down — but we'll just use this as a quick example. Okay, now, of course, DNA has a structure of base pairs, so you have these combinations, these codes that kind of come out: if you were to take, like, 100 base pairs, there'd be a code there — a sequence of different letters.
A
Let's take a six-base-pair code here — I'm just kind of making one up, just for the sake of simplicity. So this is a six-base-pair code. Now, what you're saying is that this might be involved in differentiation. It'd probably be longer than this, because this probably doesn't give you much. Sometimes you get repeats of things — you know, tandem repeats, which are like two letters repeated.
A
Well, yeah, so you can basically go through and look for different motifs. If you see that, you could design an algorithm to go through a draft genome and look for different repeats or different motifs of different types.
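A minimal sketch of such a motif scan — counting (possibly overlapping) occurrences of a candidate motif in a genome string; the toy genome and six-base motif below are made up, like the six-base example in the discussion:

```python
def count_motif(genome, motif):
    """Count occurrences of `motif` in `genome`, allowing overlaps
    (overlaps matter for tandem repeats)."""
    n, m = len(genome), len(motif)
    return sum(1 for i in range(n - m + 1) if genome[i:i + m] == motif)

genome = "ATGACGTACGTACGATGACGTT"     # hypothetical toy sequence
print(count_motif(genome, "ACGTAC"))  # 2 — a made-up six-base motif
print(count_motif("ACACAC", "ACAC"))  # 2 — overlapping tandem repeat
```

A real pipeline would run this over a draft genome (or use BLAST for approximate matches), but the counting idea is the same.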
A
If you predict that — you know, because we know what the codons should look like, and we know what the amino acids that come from the codons should look like — we could say, well, we're interested in this combination of amino acids, and that would give you a motif, and then you could see how many times you find it in a draft genome. So you could search for, say, these six bases at the top, and it would give you a number.
C
Yeah, okay, it might be possible to do this, but, you know, finding what to look for — it's sort of like, you know, Carl — the analogous case is Carl Woese, yeah, okay. He had to find something that was conserved enough to be in common and evolve slowly, and look for it, and he ended up with the 16S RNA from bacteria, right.
A
Yeah, so one thing — I mean, what I'm getting from this is that you have these sets of genes for each node, kind of drawing a Venn diagram over a tree, where each circle is a set of genes that are involved in, sort of, differentiation.
A
We don't need a quantum computer yet. Oh no, I mean, the way this works generally is that usually we have, like, a set of candidates. So people will do searches: they'll search for some string, or they'll search for a set of genes, or maybe different regions of the genome that they want to search for, and then they say, well, there's a certain probability that this is a match in this area or that area — we know, kind of, from functional studies.
A
— what different parts of the genome do, whether it falls within a gene or not; and then also, we can predict, kind of, its likelihood that it's, you know —
C
Something that may be involved in a proper new differentiation.
A
Right, okay, yeah. And then the other problem, too, is that if you have different cell types — are you talking about the things that make them unique, or the things that are involved in differentiation? I mean, a lot of people will go in and look at, like, different tissue types, and know that there are differences in tissue types, but that's just a bunch of cells in the tissue that they sequence — those aren't involved in differentiation per se; that's a different thing.
A
So
that's
that's!
The
other
thing
is
that
you
want
to
be
specific
to
that,
and
it
might
not
be
that
hard,
because
a
lot
of
things
are
just
kind
of
like
this
is
specific
to
bone.
This
is
specific
to
muscle,
and
it's
really
about
you
know
getting
like
upregulation
of
those
genes
more
than
like
the
actual.
C
Of
C
elegans
is
that
every
time
there's
an
asymmetric
division,
you
get
two
new
cell
types,
okay,
yeah,
and
you
only
have
about
50
cases
where
it
doesn't
happen
where
you
get
a
pair
of
cells
that
are
supposed
to
be
similar
right.
So
that's
why
I
think
C
elegans
might
be
the
right
right
organism.
A
So, you know, I don't know if people are interested in that — if people out there want to work on that problem, you know, we could talk. We've been trying to do something like this for a while, but it's hard to kind of get it going and do something interesting with it. It's hard work because it's a lot of bioinformatics, and not everyone has that skill set.
C
Yes, yeah — I mean, conceptually it's not that hard, but the combinatorics are tremendous, right.
C
So that's why I think this would be a major step towards testing whether differentiation trees are the basis of differentiation — and if they are, okay, what is the representation in the DNA? Yeah, yeah, okay. We have the strange situation of mapping a tree into a linear structure, right, okay, and that's why it might be more complicated than that, I imagine. But okay, yeah — if you try to take a tree and map it onto a linear structure, you'll see the problems, right.
A
So yeah, it's usually, you know, either going to be the same sequence that's conserved across different cell types, or — probably more likely — you're going to have some changes, depending, I guess. If the differentiation process is the same, it would be basically conserved. So, you know, it would basically be maybe like an instruction of change to the cell type; or, a lot of times, you have genes that are for things like stemness, or things that enforce the identity of a cell.
A
Well, in C. elegans you have a deterministic differentiation pattern, so, you know, certain cell types — they start from the one-cell stage as sort of developmental cells, which are basically pluripotent — well, they're not pluripotent in the conventional sense, but they're not yet in their adult, terminally differentiated form until they get to a certain point, and then they differentiate.
A
Up the tree or down the tree — I guess, somewhere on the tree.
A
I don't think so. I mean, there's a genome, and — because I know they do a lot of single-cell sequencing, so they may have single-cell data available. So it'd be interesting to see what the differences are in the actual sequence.
A
I think — yeah, people have worked on different things, like, I think, methylation markers and things like that in the genome. I don't know if people have also done things with epigenetic inheritance; but associated with that work, they've done some work on looking at the epigenome. I don't know where those data are — I'm sure those data are out there.
A
Yeah, it would be great. I mean, I don't think people have — I think people think about it more in terms of, like, the cell type, what's upregulated in the cell type, but that's kind of beside the point. Is there a mechanism that sort of guides the cell towards a fate? So, like, you know, obviously you have the cell and its associated things — things associated with that cell type.
A
Right, yeah, that'd be good. Well, maybe I'll take a look and see what kind of data there is for these types of things.
E
Okay, I tried the other day, and I said, well, skip the first, like, seven pages of this —
E
No, I decided that the paper I was trying to duplicate ignored the instabilities in the tensegrities they were using: they just mapped the graph before the instability and then graphed afterwards, made a nice smooth curve, and said, oh, there are instabilities — like, yeah, there are. So that's my half-baked idea.
E
I solved the rotation problem by placing the triangle in the negative y direction, so that the y is actually pointing out of the plate. Okay, I was trying to explain that last time, whether anybody cared, but anyway, I did solve that — good. So now, at least for stress-strain curves, you can make them point in any direction you want.
E
Doesn't
that's
not
very
good
either,
but
because,
apparently,
if
you
make
the
the
top
and
the
bottom
and
the
bars
infinite
elasticity,
then
you
get
a
J
curve,
which
is
what
you're
looking
for
in
a
cell.
E
So
somehow
cells
have
a
kind
of
an
infinite
elasticity
like
over
200
gigapascals,
wow
I'm
going.
Maybe
if
you
include
the
myosin
and
the
fact
that
they're
actually
Contracting
I,
don't
know
anyways,
it's
a
mystery
and.
A
So I'm going to go back to sharing my screen here — or have I shared my screen? I did share my screen. Okay, going back to sharing my screen. Actually, I'm going to talk about this first — and I know Morgan knows about this, because he has the T-shirt. There's this organization, of course, called Fermat's Library, and they have a paper of the week, every week.
A
Fermat's Library is like a set of papers where people can build a library in this specialized PDF interface, where you can mark up the document collectively — you can take notes on a paper, or whatever. So that's where this is coming from, and this last week they had a paper from John Maynard Smith, from 1970: he published a paper entitled "Natural Selection and the Concept of a Protein Space."
A
This
proposes
a
simple
analogy
for
the
incremental
process
of
adaptive.
Evolution
is
protein.
Space
analogy
contains
the
basic
basis
for
many
Central
ideas
in
evolutionary
genetics,
so
he
just
he
talked.
He
likened
The
evolutionary
trajectory
of
proteins
to
a
simple
word
transformation
game
in
this
letter,
this
game
with
the
goal
of
evolving
one
word
into
another
by
means
of
a
single
letter.
Substitution
serves
as
a
striking
model
for
protein
evolution,
some
of
the
work
I've
done
with
vehicles
and
composing
organisms,
there's
sort
of
a
quasi-developmental
process.
A
I
I,
you
I,
illustrate
this
with
these
kind
of
word
games
and
I.
Actually
wasn't
I
didn't
remember
that
this
was
this
sort
of
what
this
was
this
exercise.
But
it's
basically,
you
know
taking
like
a
string
of
letters,
it
could
be
DNA
or
it
could
be
like
the
English
alphabet,
which
has
26
letters
and
permutating
those
letters.
But
you
start
with
like
a
word,
and
then
you
permutate
the
letters
to
make
new
words,
but
they
have
to
be
words
that
mean
something
they
can't
be
gibberish
they
have
to.
A
You
know,
have
like
an
a
meaning
to
them.
So
that's
the
challenge,
and
so
it's
like
you
know,
in
a
protein
sequence
or
even
in
a
DNA
sequence,
where
you
have
a
string
of
letters,
you're
mutating
them,
but
you
don't
want
to
mutate
to
something:
that's
not
functional.
You
want
to
mutate
to
something,
that's
functional.
So
that's
the
idea,
so
you
have
26
letters
you
have
so
you
you
know
26
possible
States
in
each
position.
A
If
you
have
a
string
of
four,
you
have
four
to
the
26th
possibilities
which
you
don't
like
I
said.
You
don't
want
to
use
all
those
possibilities.
You
just
want
to
use
the
possibilities
that
have
some
meaning.
So
usually
what
that
means
is
that
you
can
build
a
transformation
path
from
one
word
to
another
by
mutating
one
letter
and
then
maybe
mutating
another
letter,
and
it's
a
little
bit
artificial,
because
you're
kind
of
doing
this
consciously
to
sort
of
to
hit
words
with
no
meaning.
A
So
an
example
would
be
like
word
to
war,
so
you're
mutating
the
last
letter
from
D
to
E.
Then
the
third
letter
is
Gore
or
the
third
word
is
Gore,
which
is
mutating
the
first
letter
from
W
to
G
and
then
the
fourth
word
is
gone,
which
is
where
you
mutate,
the
third
letter
from
R
to
n
and
then
the
fifth
word
is
Gene,
which
is
mutating
the
second
letter
from
o
to
e.
So
you
get
go
from
word
to
Gene
and
in
five
steps,
and
you
can
map
that
out
in
a
tree.
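The Maynard Smith word game can be checked mechanically: each step must be a single-letter substitution, and every intermediate must be a real word. A small sketch, using the path itself as the "dictionary" just for illustration:

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def is_valid_ladder(path, dictionary):
    """True if every word 'means something' (is in the dictionary) and
    each consecutive pair differs by exactly one letter."""
    return (all(w in dictionary for w in path) and
            all(hamming(a, b) == 1 for a, b in zip(path, path[1:])))

path = ["WORD", "WORE", "GORE", "GONE", "GENE"]
print(is_valid_ladder(path, set(path)))  # True
print(len(path) - 1)                     # 4 substitution steps
```

With a real English word list as the dictionary, the same function enumerates which mutational paths stay "functional" at every step — the point of the protein-space analogy.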
A
Oh, right, yeah — I mean, it probably has a lot of similarities to that. It's kind of a fun thing to do, because you see, kind of, similarities between things. There are other games that you can play with this: you can build, like, a sequence, and then you can build alternate sequences, so it's not just one sequence.
A
It's
like
you,
have
multiple
paths
from
word
to
Gene
and
you
can
just
mutate
different
letters
and
get
there
in
different
ways,
and
so
this
is
similar
to
the
concept
of
a
neutral
space
and
it
you
know,
do
not
familiar
the
neutral
space.
It's
basically
the
space
of
possible
genotypes
that
if
you
were
to
just
let
you
know
your
system
sort
of
go
at
random
and
explore
different
states.
You
would
end
up
in,
like.
A
Sometimes people characterize this with a hypercube: they have different states at the nodes of the hypercube, and you have to traverse a path through the hypercube to get to the other side, and you have to count how many steps it takes you to get there. So this is sort of similar to that as well, and you can look at optimality criteria and all that. And then, you know, there's this sort of analogy between functional proteins and words in English — or words in any language.
A
You
know
you
have
a
finite
alphabet,
you
can
look
at
their.
You
know
the
transitions
between
different
words
and
build
a
space
from
that,
and
so
that's
sort
of
treating
you
know.
Protein
sequences
and
Gene
sequences
as
like
a
linguistic.
A
— sort of dictionary. So this is kind of — so this was from 1970, and then this is the paper here, where the notes are taken. So these people just pop notes in here — and I don't think this is the paper; I think I just took a screenshot of it — but this is sort of where he starts talking about natural selection and the concept of a protein space.
A
So,
instead
of
thinking
about
it
as
a
neutral
space,
he
talks
about
natural
selection,
which
would
be
where
there's
a
directional
imperative
for
selection
to
go
to,
like
the
other
side
of
the
hypercube
in
that
neutral
space
in
a
minimal
number
of
steps.
So,
instead
of
just
exploring
every
mutational
pathway,
there's
an
imperative
for
natural
selection
and
if
you
think
about
like
the
sort
of
the
artificial
nature
of
this
or
you
go
from
word,
Gene
you're
really
imposing
sort
of
artificial
selection.
On
this
to
say,
I
I
want
to
go
to
a
certain
location.
A
It
is
similar
to
a
lot
of
games
that
people
played
it's
very
game-like.
You
know
you
have
a
lot
of
States,
you
can
visit
and
it's
yeah,
it's
nice,
but
that's
that's
something.
You
know
that
relates
to
what
we're
talking
about
before,
because
you
know
you're
looking
for
things
that
are
permutating,
maybe
from
different
for
different
cell
types.
You
might
have
differentiation
genes,
for
example,
that
all
are
sort
of
Highly
conserved.
So
you
have
this
one
process
that
unfolds
in
every
cell
type
and
it
just
tells
it
to
go.
A
One
way
you
know
tells
it
to
say
differentiate
now
or
you
could
have
something
that
mutates
a
lot
and
it
has
a
specific
code
to
say
it's,
this
cell
type
or
you
know,
which
is
maybe
less
likely,
because
you
know
you
it's
through
different,
like
throughout
Evolution
you'd,
have
to
really
kind
of
Target
or
it'd
have
to
be
under
high
selection
for
certain
cell
types.
So
a
little
bit
of
a
word
on
what
hyperspaces
are
so
a
lot
of
times.
A
If
you
want
to
Envision
a
sort
of
an
evolutionary
space
that
is
subject
to
neutral
processes
or
neutral
Theory
people
of
God,
with
this
idea
of
hyperspace,
so
I'm
going
to
draw
a
simple
example
here,
where
we
have
a
cube-
and
this
Cube
has
a
number
of
binary
States,
we
have
eight
nodes
and
we
characterize
it
with
a
three
bit
register.
So
we
have
this
path
from
zero:
zero
zero
to
one
one
one.
A
So
we're
going
from
the
bottom
left-hand
corner
to
the
top
right
hand
corner
of
this
Cube,
and
the
idea
is
that
you
can
navigate
across
from
node
to
node,
and
each
node
represents
some
mutation
on
zero
zero
zero.
So
you
could
have
zero
zero
one
zero
one,
zero
one:
zero
zero!
Basically
it'll!
Allow
you
to
move
to
111,
so
you
can
have
each
step
as
a
mutational
step
and
it
changes
what
we
call.
They
call
this
the
phenotype.
A
So
a
lot
of
these
models
have
been
developed
in
RNA,
looking
at
RNA
molecules
and
looking
at
their
phenotypes.
So
a
mutation
in
an
RNA
molecule
is
a
change
in
the
phenotype.
It
can
change
the
conformational
structure.
It
can
change
the
message,
that's
inherent
in
the
molecule,
so
people
also
use
hypercubes.
I.
Think
I
mentioned
this
in
the
lecture
where
you
have
a
hypercube.
A
So
you
have
a
cube
within
a
cube.
Basically,
sometimes
people
call
these
tesseracts
and
I
think
they
use
the
word
tesseracting
correctly
in
the
Marvel
series
of
comics
but
whatever.
So
this
is
our
hypercube
and
it's
basically
a
cube
within
a
cube.
A
So
has
a
lot
more
States
and
you
can
explore
more
space
with
this
and
there
are
other
sort
of
mind
bending
topologies
that
you
can
use.
It
gives
you
a
larger
number
of
states,
but
it
also
gives
you
this
larger
space
to
Traverse,
and
the
idea,
though,
again,
is
to
go
across
the
space
traversing
it
one
step
at
a
time
and
then
figuring
out
how
many
steps
you
need
to
take
to
get
to
your
endpoint
now.
A
The
other
thing
I
want
to
mention
is
that
you
know
I
talk
about
this
as
if,
like
it's
a
magical
process,
but
it's
really
about
you
know
what
what
are
the
constraints
that
you
impose
on
these
models?
So
you
know
we
talk
about
neutral
processes,
we're
just
basically
talking
about
a
system
that
explores
a
space
at
random,
so
this
space
has
eight
eight
possibilities,
and
so
you
can
explore
it
at
random.
So
you
could
start
with
zero
zero
zero
and
you
could
move
up
to
zero
one
zero.
A
That's
the
shortest
route,
but
in
you
know,
in
a
neutral
process,
you
might
end
up
achieving
that
in
seven
steps.
It's
just
a
matter
of
random
mutation
and
not
favoring
any
one
site
over
another,
not
favoring
anyone's
strategy
over
another,
and
so
that's
and
then
natural
selection
is
more
reflective
of
this
shorter
path
of
this
optimized
path
from
one
phenotype
or
phenotypic
state
to
another.
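The contrast here — a neutral random walk versus the direct route — can be sketched on the three-bit cube: the shortest path length between two corners is their Hamming distance, while an unbiased walk (no site or strategy favored) usually takes longer:

```python
import random

def hamming(a, b):
    """Shortest path length between two corners of the hypercube."""
    return sum(x != y for x, y in zip(a, b))

def neutral_walk(start, goal, rng):
    """Flip one randomly chosen bit per step until the goal genotype
    is reached; return the number of steps taken."""
    state, steps = list(start), 0
    while "".join(state) != goal:
        i = rng.randrange(len(state))            # pick a site at random
        state[i] = "1" if state[i] == "0" else "0"
        steps += 1
    return steps

rng = random.Random(0)
print(hamming("000", "111"))                      # 3 — the optimized route
walks = [neutral_walk("000", "111", rng) for _ in range(1000)]
print(sum(walks) / len(walks) > 3)                # True — neutral drift is slower on average
```

Every walk still needs at least three steps (the Hamming distance), but the average over many runs is well above it, which is the "seven steps instead of three" effect described above.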
A
The other thing I want to talk about is this.
C
Interesting,
maybe
one
approach
would
be
to
say,
make
a
particular
cell
type
and
you
get
for
an
organism.
That's
been
sequenced
and
then
pick
another
organism
with
the
same
cell
type
and
see
what's
similar
between
them.
Yeah.
A
Yeah, that would definitely yield an answer of some type. At least you'd have an idea of what to look for.
A
Well,
people
have
well
people
have
compared
different
species
and
cell
types,
I'm,
not
sure.
They've
looked
asked
that
for
that
specific
question,
so
they're,
probably
not
looking
for
you
know
they
might
be
looking
for
a
specific
functional
Gene,
like
maybe
a
muscle
Gene
in
you,
know,
muscle
specific
Gene
in
two
species
and
then
and
then,
but
then
you
know,
you'd
have
to
decide
whether
you
want
something
that's
closely
related
or
not,
because
a
lot
of
people
look
in
model
organisms
and
it's
like
you
know,
a
mouse
and
a
c
elegans.
E
For the organism — I was thinking the eye, the retina. Eyes in invertebrates might be interesting, yeah.
A
No — yeah, I guess, yeah. If you take a sample of tissue from any organ, you're going to get — well, I mean, you'll get cell types at different levels of differentiation sometimes, and sometimes you'll get, like, a single cell type, like in liver, versus, say, if you were to sample a retina, where you have, like, a bunch of cell types jumbled up in there.
A
But yeah, so, I mean, the dataset you would get would probably not be one cell type; we'd have to work with the data we have available. I mean, you could do the experiment — you could maybe separate out different cell types — but, yeah, that's a good point. There are, of course, classic experiments where you just take a cell type from one organism and put it in another organism, or, you know, yeah.
A
So, I think Susan threw a bomb earlier when she talked about physicists not having a lot of biology background — which is somewhat true — but there's this argument that biology is more theoretical than physics, and this is a newish paper. Yeah.
A
Yeah, yeah, okay. So this is a paper from 2013, actually — so it's about 10 years old now — and it's called "Biology Is More Theoretical Than Physics." And so it's kind of an argument that, you know, we think of physics as maybe really theoretical and biology as not being so theoretical, but this paper makes the argument that the opposite is maybe true. And so the abstract reads: the word "theory" is used in at least two senses —
A
— one, to denote a body of widely accepted laws or principles, as in "Darwinian theory" or "quantum theory"; and two, to suggest a speculative hypothesis, often relying on mathematical analysis, that has not been experimentally confirmed. So it's basically these widely accepted laws or principles that, you know, have been developed, versus speculative hypotheses. So this is just the way that people use "theory," and these are only two definitions — there are probably more.
A
It
is
often
said
that
there
is
no
place
for
the
second
kind
of
theory
in
biology
and
that
biology
is
not
theoretical
but
based
on
interpretation
of
data.
So
this
is
where
often
people
will
go
and
get
lots
of
data,
but
they
have
no
framework
or
they
may
have
a
framework
for
it
in
maybe
like
accepted
laws
or
principles,
but
they
don't
really.
You
have
a
lot
of
room
for,
like
you
know,
building
speculative
hypotheses
that
go
just
kind
of
Beyond
like
Data
Mining,
and
that
sort
of
thing.
A
Maybe
it's
sort
of
the
way
that
the
field
works,
I
don't
know,
but
anyways
here,
ideas
from
a
previous
essay
are
expanded
upon
to
suggest
to
the
contrary
that
the
second
type
of
theory
is
always
played
a
critical
role
in
that
biology,
therefore,
is
a
good
deal
more
theoretical
than
physics
and
says
kind
of
goes
through
some
of
these
things
that,
in
a
previous
lessee
I
pointed
out
the
Curious
Case
of
the
enzyme,
substrate
complex,
which
was
widely
used
to
understand
enzymes
before
any
enzyme,
substrate
complex
was
shown
to
exist.
A
So
this
is
where
you
have
this
idea
of
this
structure,
or
this
thing
we're
talking
about
with
differentiation
genes.
We
don't
know
if
it
exists
or
not.
No
one's
shown
it
exists,
but
you
kind
of
build
a
model,
and
you
say
this
exists
and
I'm
going
to
show
that
exists
and
in
a
way
that's
how
you
kind
of
do
biology.
Although
you
know
a
lot
of
tennis,
people
will,
like
I
said
you
know,
they'll
collect
data
on
you
know
some.
A
You
know
comparing
two
different
species
or
you
know
looking
for
all
functional
genes
of
a
certain
category
or
something
like
that.
But
this
is
where
you
actually
build
a
you
know
a
sophisticated
model
of
something,
and
then
you
kind
of
kind
of
realize
it
with
data
so
Britain
chance
who
brought
these
hypo
hypothetical
entities
into
existence
was
in
no
no
doubt
that
he
was
providing
the
first
experimental
confirmation
of
a
theory.
A
So
this
is
that
one
I
think
Britain
chance
was
also
the
person
who
developed
fnir
if
I'm
not
mistaken,
but
in
the
interviewing
intervening
30
years,
biochemist
happily
used
a
theoretical
entity
because
it
was
so
useful
and
explain
so
much
expediency
overcame
the
kind
of
philosophical
Scruples
that
would
make
a
physicist
swing,
and
so
this
is
something
where
perhaps
this
is
a
marginal
episode
among
enzymologists,
which
can
be
glossed
over
in
favor
of
the
party
line.
That
biology
is
not,
of
course,
theoretical.
A
I
claim
that
this
is
far
from
the
case.
In
fact,
similar
episodes
have
occurred
throughout
biology
involving
some
of
its
most
important
entities,
and
so
he
talks
about
the
receptor
and
some
of
the
equations
that
they
developed
to
describe
a
receptor,
and
these
are
all
theoretical
things
where
you
have
like
these
constants
and
variables
that
you're
kind
of
coming
up
with
you
know.
This
is
what
it
should
look
like.
A
So this is the Hill function, and this is just something that had to be experimentally verified: you build the model and then you eventually realize it. It took 30 years for the enzyme-substrate complex to become a chemical reality; the receptor took a good deal longer.
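The Hill function mentioned here fits in a few lines. A minimal sketch (the variable names L for ligand concentration, K for the half-occupancy constant, and n for the Hill coefficient are my own choices, not from the talk):

```python
def hill(L, K, n):
    """Fractional receptor occupancy: theta = L^n / (K^n + L^n)."""
    return L**n / (K**n + L**n)

# At L = K the receptor is half-occupied regardless of n,
# and a larger n makes the binding curve more switch-like.
```

The point of the passage stands out here: K and n are exactly the kind of theoretical constants that were posited first and only later pinned to chemical reality.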
A
So this is all thinking about theory and how you realize it in data. Related to that is this paper by Wallace Arthur, and this is sort of biological theory. This is on D'Arcy Thompson's morphological transformations: issues of causality and dimensionality.
A
This is the person we've talked about before, D'Arcy Thompson. He did a lot of things with mathematical transformations, looking at organismal phenotypes and how they change across species.
A
So if you take a fish and put it on a grid, and you map the intersections of the grid onto the phenotypic landmarks of the fish, then if you take a different fish and put it on that grid, you warp the grid to match the landmarks. You end up with this transformation between a straight-line grid and a warped grid. And this is interesting, because D'Arcy Thompson was building these theoretical models.
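As a toy version of this landmark-based warping, one can fit a single affine map from the landmarks of one form onto the other and apply it to any grid point. This is only a sketch; Thompson-style warps are generally nonlinear (e.g. thin-plate splines), and the landmark coordinates below are made up:

```python
import numpy as np

def affine_from_landmarks(src, dst):
    """Least-squares affine map sending src landmarks onto dst landmarks."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    X = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coords, (n, 3)
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)   # (3, 2) transform matrix
    return M

def warp(points, M):
    """Apply the fitted map to arbitrary grid points."""
    P = np.asarray(points, float)
    return np.hstack([P, np.ones((len(P), 1))]) @ M

# Hypothetical landmarks on species 1 and their positions on species 2
src = [[0, 0], [1, 0], [0, 1], [1, 1]]
dst = [[0, 0], [2, 0], [0.5, 1], [2.5, 1]]   # stretched and sheared
M = affine_from_landmarks(src, dst)
```

Every intersection of the straight grid can then be pushed through `warp` to draw the deformed grid for the second species.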
A
And it's something that is a nice tool, but it's also kind of a theory of transformational change. "D'Arcy Thompson's drawings showing coordinated differences between the shape of an individual of one species and the shape of an individual of another have been reproduced and discussed countless times. However, while they are widely regarded as inspirational, their interpretation in causal terms has proven difficult, and there is no consensus on the matter." So these models are somewhat predictive, but they're also mostly inspirational.
A
So it's not that we understand the cause of those transformations; we just understand that those transformations exist. We're not really at the stage of a predictive theory; we're at the stage of a nice tool that can demonstrate these transformations, but we don't really know why they exist in that form.
A
"Here I approach these Thompsonian transformations from a particular angle, namely their dimensional insufficiency. I argue that this problem must be solved before the issue of causality can be considered." So a lot of times in theory we want to know about causality, we want to know if one thing causes another; but maybe that's the wrong question to start with.
A
Maybe you need to understand what he calls dimensional insufficiency. This approach "leads to the conclusion that Thompsonian transformations," or morphological transformations (remember, those transformations of the grid), "have not taken place in evolution and logically can never do so, because they involve the direct conversion of the adult form of one species into that of another."
A
So this is really interesting: the whole idea of the Thompsonian grids is to show this transformation between the adult phenotype of one species and the adult phenotype of another species. You look at these mathematical transformations and you assume it must have gotten there in development, but we're not actually measuring that; we're measuring adult form versus adult form. So what he's saying here is that these transformations are all adult-to-adult comparisons, and so you can't necessarily say that these are...
A
Can these conversions represent what's happening, say, in development? That's where you have this problem: you use a nice approach like that, but you also have to have this developmental aspect. So what he does here is talk about developmental transformations, which are also important, and which would actually involve the phenotype developing along a certain trajectory. That's really the whole business of evo-devo: to explain, if you have adult forms that are very variable, and especially a trait...
A
...that's variable, how it gets there in development, or what the developmental mechanisms underlying it are. In contrast, developmental transformations do occur in the short term, and those developmental transformations don't necessarily show up in those grid transformations. The grid transformations are sort of the end product of the developmental transformations, so you have to think about it in a different way: it's not that those transformations are going to explain anything on their own.
A
They just show what happens when some X factor, which is the developmental transformations, unfolds. So there are these developmental transformations that occur in the short term, within the lifetime of a single individual, and evolutionary transformations that occur in the long term, "in a context that can be described, following Scott Gilbert, as having five dimensions." Scott Gilbert is a developmental biologist who described these as five-dimensional transformations. "I argue that these two kinds of transformation, developmental and evolutionary, have different causal agencies."
A
He considers the possible nature of these agencies and, related to that, the way in which Thompson's work connects with Darwinian evolutionary theory. So he really lays this out in terms of a lot of what D'Arcy Thompson did as it relates to developmental biology, and that involves the expression of Hox genes and other things.
A
So this is an example of these grids: laying different adult phenotypes on them, using different landmarks, like the tip of these antennae, and then warping the grid accordingly. So if there's a larger distance between this left appendage and the antenna, that grid will warp accordingly: it's shorter here and longer in this species, and then it diverges outward in this species.
A
On some of these issues: I'm not going to get into this too much today, but there is this document, "101 Unsolved Puzzles in Evo-Devo" (I mentioned evo-devo, or the evolution of development). These are all unsolved problems, problems that we've observed in biological systems but don't really have a theory of how they happened. So there are a lot of puzzles involving origins, evolvability, the genome, symmetry and asymmetry, digits, behavior, the nervous system.
A
What's the cellular basis for monozygotic twinning? How does twinning occur in armadillos? So you go from the level of different traits, to the level of things in specific species, to the cellular basis for epithelial fusion. You have a process where you want to know the mechanism: what causes the loss of tissue at symmetric fusion planes?
A
So these are numbers in brackets that denote references as enumerated in "Quirks of Human Anatomy"; for context and perspective on these topics, see "Quirks". For definitions of evo-devo terms, there's this book, "Keywords and Concepts in Evolutionary Developmental Biology", which is a good book if you're interested in the jargon of that area of evo-devo; they have a lot of things on evolution and development.
A
It's a good book as an introduction to that area. And so they're putting these puzzles together largely from these two sources, and I think it's a nice list, because it gives all these different examples of biological questions and potential theoretical questions. It gives a nice diversity of ways we think about theory and how we think about asking questions in biology.
A
I think there's a similarity in the math, so a lot of times they're using the same sort of techniques. Of course, D'Arcy Thompson was doing it by hand.
A
Yeah, well, they usually don't have it as a grid; it's like, if I take an image, I can warp parts of the image, so I have these features in the image that I can pull apart, or dilate, or make smaller.
C
The biogenetic law, right? What Haeckel noted is that embryos at early stages all look alike. Okay, I was joking about it: my wife's been watching some movies where people are raising birds from eggs, and they can't tell what bird it is until it grows up.
A
Yeah, and in the Wallace Arthur paper, I think he mentions Haeckel's work. I haven't read through what he says about it specifically, but...
A
Well, I think the point, though, is that the transformations between adults tell you something about the adult phenotypes, but the process from the embryo, or the process of evolving from the common ancestor between them, or rather what's going on in their development, is a separate thing. It may ultimately be responsible for that transformation, but we can't just say, okay, there's something that moved this here and moved that there; it's a much more complicated process than that.
A
All right, any other comments or questions?
E
No, I don't have comments or questions.
A
Now I'd like to talk about two papers: one is the one Himanshu talked about, and the other is something I found that's related to it. So let's get into it. The first one is link prediction with persistent homology; this is the one Himanshu talked about in the meeting. This is kind of moving from microscopy, segmenting things and finding features, to building nodes and then links. At that point you need to predict your links, and you can use persistent homology to predict links.
A
So this is what this paper is about. Again, you have this underlying graph structure of the data and you're trying to predict links, so "link prediction is an important learning task for graph-structured data." This is an interesting approach.
A
What they're saying is that the data you have is inherently structured as a graph: it has connections, it has interactivity inherent in the data, and so you need to find the important links in the data set. So it's an important learning task. When we talk about graph neural networks and topological data analysis, and that sort of intersection, we need to think of link prediction as a learning task.
A
"In this paper we propose a novel topological approach to characterize interactions between two nodes." So they're actually using a topological approach, instead of, say, a statistical approach like we saw in the hypercube example or the neutral-space example: simple associations based on the states of the nodes, asking what the path from one node to another is; is it mutational, is it something else? But this is a case where we don't necessarily have that natural guide.
A
We have to predict the links from the data, from the association of the features with one another. "Our topological feature, based on the extended persistent homology, encodes rich structural information regarding the multi-hop paths connecting nodes." So this is where, again, we have this cube: we have 000, we have 111, we have a path between them, and we need to predict these multi-hop paths.
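The cube example can be made concrete. A small stdlib-only sketch (my own construction, not the paper's code) enumerating the monotone multi-hop paths from 000 to 111 on the 3-cube, where each hop flips one bit that moves you closer to the target:

```python
from itertools import product

def hypercube(n):
    """Adjacency of the n-cube: bit strings joined by single bit flips."""
    nodes = [''.join(b) for b in product('01', repeat=n)]
    return {v: [v[:i] + ('1' if v[i] == '0' else '0') + v[i + 1:]
                for i in range(n)] for v in nodes}

def shortest_paths(nbrs, s, t):
    """All shortest multi-hop paths from s to t (Hamming-monotone walks)."""
    dist = lambda a, b: sum(x != y for x, y in zip(a, b))
    def go(v, path):
        if v == t:
            yield path
        for w in nbrs[v]:
            if dist(w, t) < dist(v, t):   # only hops that make progress
                yield from go(w, path + [w])
    return list(go(s, [s]))
```

On the 3-cube there are 3! = 6 such paths, one per ordering of the three bit flips; the richness of this path structure is exactly what the topological feature tries to capture.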
A
Multi-hop is a communication-networks term, but it basically means having multiple steps to get from one point to another in the network. So we need to be able to predict these paths, and in doing so we predict links based on this feature. "We propose a graph neural network method that outperforms the state-of-the-art on different benchmarks."
A
As
another
contribution,
we've
proposed
a
novel
algorithm
to
more
efficiently
compute
the
extended
persistence,
diagrams
or
graphs.
This
algorithm
can
be
generally
applied
to
accelerate
many
other
topological
methods
for
graph
learning
tasks,
so
they
kind
of
go
through
this.
They
show
two
example
graphs
here,
so
you
have
these
multi-hop
topologies,
where
you
go
from
one
red
dildo
to
another
red
node
and
you
can
see
as
they
get
more
complicated.
So
this
is
kind
of
analogous
to
our
Cube
versus
our
hypercube
example.
A
When I was talking about neutral spaces and that sort of thing. So in this graph on the left you have this multi-hop architecture: you want to go from one red node to the other, and you ask what the intervening nodes are, and then, obviously, the intervening links. The second one is a little harder because there's more interactivity: in the first one you have three alternate paths, and in the second one you also have three alternate paths, but those paths interact in different ways.
A
So that's what we have here: there's a lot of data that we can capture and characterize. This is an example, in figure two, of extended persistent homology. Panel A is where they plot the input graph with a given filter function, so they're plotting this over time: at different time points these nodes appear, and they are linked together in this way. Panel B shows the ascending and descending filtrations of the input graph.
A
So they have filtrations of it by time: at each time a new node pops up, and as those pop up you get new links. For example, node one appears at time one, and node two appears at time two, but you can't necessarily find a connection between nodes one and two. At time three, node three pops up, and of course there is a connection between three and two, and between three and one, but not between one and two.
A
So it's not just connecting anything; it's connecting things as they appear, and then the evidence appears that they're connected. At time four you have one, two, three and four, and they're connected, and then at time four you also have the disappearance of one, two and three. So they disappear here: these blue nodes die, you get a red node four, then four becomes a blue node, you get a red node three that appears at time three, and then time two. So you're going...
A
I guess you're going up to time four and then you're doing a reverse analysis. So "the bars in the brown and blue colors correspond to the lifespans of connected components and loops, respectively." There are lifespans: things die off, things come back to life, as you do this reverse process.
A
You get these connected components, that is, all the components that are connected in a graph. Sometimes you'll have nodes that are sort of out there without any connections, but a connected component is a part where you can go from any one node to any other node through the connections that have been built.
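The birth-and-death bookkeeping for connected components in the ascending pass can be sketched with a union-find structure: nodes are born at their filtration time, edges enter at the max of their endpoint times, and when two components merge, the younger one dies (the "elder rule"). This is a generic 0-dimensional persistence sketch, not the paper's actual implementation:

```python
def zero_dim_persistence(node_time, edges):
    """(birth, death) pairs of connected components in a graph filtration.

    node_time maps node -> birth time; an edge appears at
    max(node_time[u], node_time[v]), the ascending filtration rule.
    The oldest component never dies (death = None).
    """
    parent = {v: v for v in node_time}
    birth = dict(node_time)

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    pairs = []
    for u, v in sorted(edges, key=lambda e: max(node_time[e[0]], node_time[e[1]])):
        t = max(node_time[u], node_time[v])
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                 # edge closes a loop (a 1-dim feature)
        if birth[ru] > birth[rv]:    # elder rule: younger component dies
            ru, rv = rv, ru
        pairs.append((birth[rv], t))
        parent[rv] = ru
    survivors = {find(v) for v in node_time}
    return pairs + [(birth[r], None) for r in survivors]
```

With nodes 1, 2, 3 born at times 1, 2, 3 and edges (1,3), (2,3), the components born at times 3 and 2 die when the merges happen, and the component born at time 1 survives.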
A
"The first four figures are the ascending filtration, and the last four figures denote the descending filtration." So there's this filtration: like I said, you do a forward filtration and a reverse filtration.
A
In the ascending filtration, f(u,v) = max(f(u), f(v)), while in the descending filtration, f(u,v) = min(f(u), f(v)). So they're basically using a different criterion for each one, the maximum value and the minimum value, and you're getting two different ways of building the links for the graph. "In the resulting extended persistence diagram, red and blue markers correspond to zero-dimensional and one-dimensional topological structures." These are, of course, the zero-dimensional points and the one-dimensional points.
A
One axis is death time and the other is birth time. So you have nodes that are being born, nodes that are dying off, and you have this intersection: high death times with lower birth times are the zero-dimensional points, and higher birth times with lower death times are the one-dimensional points. So this is kind of their approach.
A
Their contribution is, actually: "In this paper we propose a pairwise topological feature to capture the richness of the interaction between a specific pair of target nodes. We compute topological information within the vicinity of the target nodes, which is defined as the intersection of the k-hop neighborhoods of the nodes. It has been shown that such locally enclosed graphs carry sufficient information for link prediction." So they're able to boil this down to these local graphs.
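The "vicinity of the target nodes" can be sketched with plain BFS: the intersection of the two k-hop neighborhoods gives the locally enclosed subgraph that the features are computed on. A stdlib-only sketch (the node names in the example are made up):

```python
from collections import deque

def k_hop(nbrs, source, k):
    """All nodes within k hops of source (breadth-first search)."""
    seen = {source: 0}
    q = deque([source])
    while q:
        v = q.popleft()
        if seen[v] == k:
            continue              # don't expand past the k-hop frontier
        for w in nbrs[v]:
            if w not in seen:
                seen[w] = seen[v] + 1
                q.append(w)
    return set(seen)

def enclosing_subgraph(nbrs, u, v, k):
    """Vicinity of the target pair (u, v): intersection of k-hop balls."""
    return k_hop(nbrs, u, k) & k_hop(nbrs, v, k)
```

On a path a-b-c-d-e, the 2-hop balls around a and e intersect only in the middle node c, which is exactly the region the multi-hop paths must pass through.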
A
This carries enough information for link prediction, so you're basically evaluating small parts of the graph; they have enough information to tell you about the links. You find the intersection of these neighborhoods in multi-hop relationships, you predict the links from that, and then you build up from those neighborhoods to a larger graph structure.
A
And that is here: this is the paper by Cohen-Steiner et al., "Extending Persistence Using Poincaré and Lefschetz Duality". This is the paper that they're drawing from.
A
And so, "in summary, our contribution is threefold: we introduce a pairwise topological feature based on persistent homology to measure the complexity of the interaction between nodes. We compute the topological features specific to the target nodes using a carefully designed filter function and domain of computation." So they define their topological features and their nodes very carefully, and they can predict links from that. They have these neighborhoods, they have this sort of birth-and-death process, and that's what they're using to make these predictions.
A
"We use the pairwise topological feature to enhance the latent representation of a graph neural network and achieve state-of-the-art link prediction results on various benchmarks." So they do this in a graph neural network context, running it as a graph neural network algorithm. "We propose a general-purpose fast algorithm to compute extended persistent homology on graphs. The complexity is improved from cubic to quadratic in the input size. It applies to many other persistent-homology-based learning methods for graphs." So they're able to do comparatively well. I'm not going to go through the rest of the paper; they do a lot of highly technical work here, and we can talk about it later. Himanshu might want to present on some of this later.
A
But this is a nice paper for making predictions of links. Link predictions generally involve some sort of strength of the connection, and then you're using different criteria, like a maximum value or a minimum value, to predict whether a link should be there or not. So this is a nice paper on how to construct these, especially in a developmental context, where we have cells that are being born, cells that are dying, and cells that are diverging from their ancestral cells.
A
The descendants are diverging, and we actually have two connectivity networks of interest: a lineage tree, which predicts which cells emerge from which other cells, so they're going to have an affinity; and a spatial network, which predicts where they are in space, where they are in the embryo, and so forth. So this is useful; I think this will be a useful guide to some of these things.
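A lineage tree is just a directed graph from mother to daughter cells; a minimal sketch using the early C. elegans founder-cell names for illustration (the spatial network would be a separate adjacency structure keyed by position):

```python
# Mother cell -> daughter cells: a directed graph (lineage tree).
children = {
    "P0": ["AB", "P1"],
    "AB": ["ABa", "ABp"],
    "P1": ["EMS", "P2"],
}

def descendants(cell):
    """All cells that emerge, directly or indirectly, from `cell`."""
    out = []
    for d in children.get(cell, []):
        out.append(d)
        out.extend(descendants(d))
    return out
```

Link prediction on this structure asks: given partial observations, which mother-daughter edges (or which spatial-adjacency edges) are present?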
A
Another paper I want to talk about is "Transformers Meet Directed Graphs". We talked about directed graphs: lineage trees as directed graphs, differentiation trees as directed graphs. You saw in the example with the differentiation tree that we have a mother cell, we have daughter cells, and we have this replicated, sort of binary division. Sometimes...
A
...you have things where multiple cells originate from an ancestral cell, and sometimes you have asymmetric divisions where you only have one daughter cell, but most of the time this can all be characterized as a directed graph. So we can use directed graphs, but we can also use Transformers, which are a machine learning technique; as opposed to, say, graph neural networks, we can use Transformers to understand these, or use them in a similar way.
A
So, from the abstract: "Transformers were originally proposed as a sequence-to-sequence model for text, but have become vital for a wide range of modalities, including images, audio, video, and undirected graphs." These are different types of input data: video you might want to analyze, audio, or still images, but also undirected graphs, which are just graphs with interactions, like we saw in the last paper.
A
You just have these interactions you're trying to predict: sometimes you're trying to predict maybe causal linkages, and sometimes something like ancestry, where you have some information about the direction of your graph. "However, Transformers for directed graphs are surprisingly underexplored, despite their applicability in ubiquitous domains including source code and logic circuits," and development, so we'll put that in there. "In this work, we propose two direction- and structure-aware positional encodings for directed graphs." The positional encoding is, like, what's the position in the order?
A
So, is it before or after? Basically, "the first is the eigenvectors of the Magnetic Laplacian, a direction-aware generalization of the combinatorial Laplacian, and the second is directional random walk encodings." A random walk is where you take a step, you draw from a random distribution, you take another step, and so on and so forth. A directional random walk is one where you have a certain direction: you might randomize the steps, but you have to go in a certain, like, forward direction.
A
In a true random walk you could go in any direction you wanted, and so you end up with these little clusters of movement around a central point. A directional random walk goes, maybe, all the way to the top of the screen, if you're doing a simulation, or to one side of the embryo or another, if you're in a biological system. Basically, you're randomizing your movement, but always in one direction, and this is good for directed graphs, because that's what directed graphs are: they proceed in a certain direction.
A
"Transformers have also had success in graph learning tasks, from predicting the properties of molecules to other things. However, virtually all prior work focuses on undirected graphs." So they tackle directed graphs here, and the attention mechanism needs to become aware of the graph structure (Transformers use an attention mechanism): "for example, prior work modified the attention mechanism to incorporate structural information" (I think that's a different paper) "or proposed hybrid architectures that also contain graph neural networks." And so this is where you have...
A
...the attention mechanism in the Transformer working, instead of on visual features, on some of these aspects of graph structure.
A
That is the way they're going to go forward in this paper. "Another complementary option are positional encodings, which are used by many, if not most, structural Transformers." These are positional encodings, which is where one thing comes first and then another thing. This Min et al. paper is actually about the properties of molecules, so they're using this structure where the Transformer is predicting molecules, and then there's this other paper, which is actually...
A
...about applying attention in graph Transformers; this is another example on arXiv. And this is the Min et al. paper, "Transformer for Graphs: An Overview from Architecture Perspective"; it's also on arXiv.
A
There are different ways you can do this, by merging attention mechanisms in graph Transformers and graph neural networks. Okay, so now we have the motivation for building directed graphs. This is an example of the first eigenvector of the magnetic Laplacian: node size encodes the real value, and the colors encode the imaginary value. So they have these different ways that they've implemented this: the sequence, the undirected sequence. You have the sequence, which goes in a certain direction.
A
It's like going from one to ten; the undirected sequence, which goes in different directions from a source node; a binary tree, which has splits between a single node and two different nodes; and then a "trumpet", which is this thing here, where you have a directed sequence but with a shortcut that cuts across. These are, I guess, examples related to sorting networks, but they also show the first eigenvector of the magnetic Laplacian. "The goal is then to predict whether the sequence is a correct sorting network based on the sequence of operations."
A
"Moreover, we show that ignoring the edge direction maps both correct and incorrect sorting networks to the same undirected graph, losing critical information." So, the main contributions here: they make the connection between sinusoidal positional encodings and the eigenvectors of the Laplacian; they propose spectral positional encodings.
A
They
extend
random,
walk,
positional
encodings
to
directed
graphs.
They
excessively
assess
the
predictiveness
of
structural
or
positional
encodings
for
the
set
of
graph
distances.
A
They
introduce
the
task
of
predicting
the
correctness
of
sorting
Networks
the
canonical
ambiguity,
free
application,
where
directionality
is
essential,
the
model,
a
sequence
of
program,
statements,
the
directed
graph
and
rethink
the
graph
construction
from
source
code
to
boost
predictive
performance
or
robustness,
and
then
they
do
the
benchmarking.
So
there
they
have
a
number
of
different
things.
A
They
introduce
here
basically
they're
trying
to
find
the
position
when
coding
through
these
kind
of
signal
processing
techniques,
then
they
have
the
eigenvector
or
laplacian,
which
is
this
graph
4A
transformation,
so
they're
taking
the
graph
and
they're
using
a
Fourier
transformation
on
it.
Then
they
have
directional
spectral
encodings.
A
They
Define
that
here
they
Define
this
in
terms
of
directedness.
So
we
next
illustrate
how
eigenvectors
of
magnetical
flossing
in
code
direction
for
the
special
case
of
the
sequence
the
eigenvectors
are
given
by
this
function.
This
corresponds
to
the
cosine
transformation
type
2,
with
additional
factor
that
encodes
the
node
position
V.
A
So
the
eigenvectors
of
the
magnetic
laplacian
also
encode
the
directionality
in
arbitrary
or
directed
graph
topologies
or
graph
topologies
that
are
directed
but
also
arbitrary,
where
each
directed
Edge
encourages
a
phase
difference
in
the
otherwise
constant
first
eigenvector
between
these
two
and
so
basically,
what
you
have
is
you
have
this
phase
difference,
so
each
Direction
Edge
is
sort
of
a
phase
difference
in
this
eigenvector.
So
then
they
tell
what
directional
random
logs
random,
logs
and
graphs,
which
you
often
use
to
like,
find
these
paths.
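The phase-difference idea can be seen in a few lines of numpy. This is a minimal sketch of a magnetic Laplacian (the paper's normalization and choice of the potential q differ in detail): on a directed path, the eigenvector of the smallest eigenvalue picks up a constant phase step per edge, which is exactly how direction gets encoded.

```python
import numpy as np

def magnetic_laplacian(n, edges, q=0.25):
    """Hermitian magnetic Laplacian on nodes 0..n-1.

    Each directed edge (u, v) carries a phase exp(2*pi*i*q);
    q = 0 recovers the Laplacian of the underlying undirected graph.
    """
    A = np.zeros((n, n), complex)
    for u, v in edges:
        A[u, v] += np.exp(2j * np.pi * q)
        A[v, u] += np.exp(-2j * np.pi * q)   # Hermitian conjugate entry
    D = np.diag(np.abs(A).sum(axis=1))
    return D - A

# Directed path 0 -> 1 -> 2: direction appears as a phase gradient
L = magnetic_laplacian(3, [(0, 1), (1, 2)])
w, V = np.linalg.eigh(L)
z = V[:, 0]   # eigenvector of the smallest eigenvalue
```

With q = 0.25, the zero mode satisfies z[u] = i * z[v] along each edge, so consecutive nodes differ by a 90-degree phase, and reversing an edge flips the sign of that phase step.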
A
We
showed
in
the
last
paper,
where
you
go
from
one
node
to
another,
you're,
basically
crossing
the
graph
and
you're
trying
to
find
these
pathways.
So
to
overcome
the
issue
of
only
walking
in
the
forward
Direction.
We
can
additionally
consider
the
reverse
Direction.
Additionally,
we
add
self-loops
to
sync
nodes.
This
avoids
that
a
might
be
no
potent
and
ensures
that
the
landing
probability
is
sum
up
to
one.
So
we
then
Define
the
positional
encoding
for
node
V,
and
so
then
that's
how
they
build
those
kind
of
random
walks
that
are
directional
making.
A
You know, sort of converting that into a directional encoding. So, one important point about directed graphs versus sequences: we showed the example of the sequence where you go around in a circle; each thing has something that it's connected to in a certain order, but that's not really a directed graph. A directed graph is where you have a branching process that is dependent in time, and that's a little bit different from a sequence, where the order is kind of obvious.
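A sketch of the random-walk side (my own simplification of the scheme just quoted): build the transition matrix of the directed graph, add self-loops at sinks so every row sums to one, and take the k-step return probabilities as positional features; the reverse-direction walk would use the transpose of the adjacency in the same way.

```python
import numpy as np

def rw_encoding(n, edges, k=3):
    """k-step random-walk positional features on a directed graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = 1.0
    for v in range(n):
        if A[v].sum() == 0:
            A[v, v] = 1.0                    # self-loop at a sink node
    T = A / A.sum(axis=1, keepdims=True)     # rows sum to one
    feats, P = [], np.eye(n)
    for _ in range(k):
        P = P @ T
        feats.append(np.diag(P))             # probability of landing back home
    return np.stack(feats, axis=1)           # shape (n, k)
```

On the directed path 0 -> 1 -> 2, only the sink node ever returns to itself, so the encoding separates the end of the path from everything upstream.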
A
"In figure seven we show the number of topological sorts over the sequence length p for a type of compact, deterministically constructed sorting network. For such networks, at a sequence length of just eight, the number of equivalent sequentializations already exceeds one million." So this is a pretty large combinatorial space. "Note that in the worst case, a directed graph has n! topological sorts." That's a big number; therefore, representing directed graphs as sequences can introduce a huge amount of arbitrary orderedness.
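The n! figure is easy to check by brute force on small DAGs; a sketch counting topological sorts (linear extensions) recursively:

```python
from math import factorial

def count_topo_sorts(nodes, edges):
    """Number of topological sorts (linear extensions) of a DAG."""
    preds = {v: set() for v in nodes}
    for u, v in edges:
        preds[v].add(u)

    def go(remaining):
        if not remaining:
            return 1
        # place next any node whose predecessors are all already placed
        return sum(go(remaining - {v})
                   for v in remaining if not (preds[v] & remaining))

    return go(frozenset(nodes))

# With no edges at all (the worst case), every ordering is valid: n! sorts.
```

A chain 0 -> 1 -> 2 -> 3 has exactly one valid ordering, while the edgeless graph on the same nodes has 4! = 24; the gap between those two extremes is the arbitrary orderedness a sequence representation has to absorb.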
A
A note on some of these different applications: there's an application to sorting networks. This is something that goes back to Donald Knuth. These are classic comparison-based algorithms with the goal of sorting any input sequence of fixed size with a static sequence of comparators.
A
In a sorting network you have these different lines, you're comparing between them, and out of that you can build a directed graph. It's a little bit hard to see how this applies to development, but I think it has a nice set of applications for thinking about Transformers, thinking about topological data analysis, and how to connect all of this with development.
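The sorting-network idea in one sketch: a network is a fixed list of comparators, and by the zero-one principle it sorts every input of length n if and only if it sorts all 2^n binary inputs. The 3-input example network below is a standard one, not taken from the paper:

```python
from itertools import product

def apply_network(network, seq):
    """Run a fixed list of comparators (i, j) over a sequence."""
    seq = list(seq)
    for i, j in network:
        if seq[i] > seq[j]:
            seq[i], seq[j] = seq[j], seq[i]
    return seq

def is_sorting_network(network, n):
    """Zero-one principle: check all 2^n binary inputs."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product([0, 1], repeat=n))

# A correct 3-input sorting network as a static sequence of comparators
net3 = [(0, 1), (1, 2), (0, 1)]
```

Interpreting the comparator list as a directed graph of data dependencies is what makes this a natural benchmark for direction-aware Transformers: dropping the edge directions collapses correct and incorrect networks onto the same undirected graph.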