South Big Data Hub Data Science Round Tables, 13 Jul 2017

From YouTube: Data Science Challenges for Cancer Immunotherapy

Description

Held on Thursday, April 13, 2017.

Panelists: Joel Parker of the Lineberger Comprehensive Cancer Center; Benjamin Vincent of UNC-Chapel Hill; and Victor Weigman of Q2 Solutions, a Quintiles Quest Joint Venture.

About: The age of immuno-oncology is upon us: new cancer immunotherapies are providing fresh hope to patients who previously had few treatment options. Combining these technologies with the “Cancer Moon-shot,” the sky’s the limit.
However, immuno-oncology exists at the intersection of oncology, immunology and molecular biology, each of which alone brings significant data science challenges.

For more information about the Data Science Roundtable series, visit bit.ly/SBDHroundtables.

A

Good afternoon, and thank you all for joining us. I'm Dr. Lea Shanley, the co-executive director of the South Big Data Hub. For those of you who may not be familiar, the South Hub and the Big Data Innovation Hubs generally, of which there are four, were launched by the National Science Foundation to serve as a catalyst to help build and strengthen public-private partnerships that apply data science to real-world challenges. Before we get started, I ask that those in the room mute your phones. For those who are watching online,

A

we ask you to mute your mics, so we don't have screaming babies or barking dogs throughout the panel. You can ask questions of the panelists when we move to that part of the program by typing your questions in the chat box. Carl will then convey the questions to the panelists and they will respond.

A

Let's see, to tweet: for those of you who are tweeting, use the hashtag SBDH, for South Big Data Hub, or the hashtag BDHubs, and then you'll catch all the hubs. So we welcome you to our South Hub Data Science Roundtable series. I think we're on number five or six. This is a monthly series that highlights emerging research challenges in data science and identifies potential solutions.

A

Today's discussion will focus on navigating questions of data management, data sharing, privacy and more, in order to best take advantage of the opportunities offered by the promising new field of immuno-oncology. I'd like to start things off by introducing today's moderator, Dr. Kimberly Robasky. Kimberly works on translational sciences and CI here at the Renaissance Computing Institute and is an adjunct professor in the UNC-Chapel Hill Department of Genetics. At the Renaissance Computing Institute, she supports best practices for cyberinfrastructure and new business development, especially in the domains of biomedical and genomic initiatives.

A

She received her PhD in bioinformatics from Boston University on a research fellowship from George Church's lab in the Department of Genetics at Harvard Medical School.

B

All right, thank you very much for that kind introduction, and welcome.

B

We are very, very pleased to have these three very distinguished panelists join us today, and I'd like to introduce them. Dr. Benjamin Vincent, in the center, is an assistant professor of medicine in the Division of Hematology and Oncology at the University of North Carolina at Chapel Hill. Dr. Vincent was trained in cellular immunology and immunogenetics in the laboratory of Dr. Jeffrey Frelinger, former chairman of the UNC Department of Microbiology and Immunology and past president of

B

the American Association of Immunologists. Dr. Vincent also completed his research fellowship in the lab of Dr. Jonathan Serody, and he is currently a member of the Lineberger Comprehensive Cancer Center immunotherapy group faculty, director of the immunogenetics facility, and leader of a preclinical immunotherapy program. So thank you. Dr. Joel Parker, to Dr. Vincent's left, is a director of sequencing, microarray and other genomic analysis for the Bioinformatics Shared Resource at Lineberger Comprehensive Cancer Center. His research is focused on the methodological development and integrated analysis of high-throughput genetic and genomic studies.

B

He previously led the development of algorithms and content that resulted in Prosigna, the only CE-marked and FDA 510(k)-cleared breast cancer diagnostic assay for FFPE tissue. Dr. Parker is currently involved in similar diagnostic development in Sydney, modern campus. And Dr. Victor Weigman

B

is the director of translational genomics at Q2 Solutions, a Quintiles and Quest joint venture. Dr. Weigman leads the group with a goal of continued facilitation of preclinical drug development through biomarker identification.

B

Ongoing research revolves around the molecular profiling of cancer using both DNA and RNA approaches, including the development and deployment of robust assays that are leveraged clinically as laboratory-developed tests, or LDTs. Dr. Weigman brings more than 13 years of biomarker discovery research with genomics, the majority of those dedicated to Expression Analysis (EA), a Q2 Solutions company. Dr. Weigman has published 14 papers on biomarker identification and assay development and has contributed to the development and launch of several genomic assays. Without further ado,

B

I would like to pass it over to Dr. Vincent to give us some framing and an introduction to the topic.

D

Thanks, Kim, for that introduction. We can go to the next slide. So this is a survival curve showing the clinical data that motivates our desire to use genomics approaches to understand responses in immuno-oncology. The y-axis here is overall survival by percentage; the x-axis is time in months. The numbers at the bottom are the numbers of patients in the various groups who are still being followed on the study. In all the groups,

D

they start at 100% survival and then, over time, as patients are lost to follow-up or unfortunately die from their disease or other causes, the curves go down. The three groups that you see there, colored blue, red and green,

D

are patients who have tumors where the tumor-infiltrating immune cells are positive at two or three plus, fairly positive, or negative for what is as of yet the best biomarker of response to immunotherapy in any tumor. I should say these data are from a large study in bladder cancer, probably the most robust data we have so far. As you can see, the patients whose tumor-infiltrating immune cells highly express this biomarker, which is PD-L1, do better: at 12 months about half of those patients are still surviving, whereas there's only about 30% surviving in the other two groups.

D

Now, that said, the real excitement of immunotherapy is not only that it can extend survival for a bit in a number of patients. It's really

D

what oncologists are calling the tail on the curve, the leveling off of a number of patients who have long-term, durable remissions at five years. Some oncologists may call that a cure. So the challenge, the real challenge of the field, is to understand why the vast majority of patients, even PD-L1-positive patients, still progress and succumb to their disease, whereas a small minority get long-term, durable responses. Can we understand that from pretreatment characteristics of the tumor microenvironment, from assays available from the blood, etc.? Let's go on to the next slide.
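Editor's illustration: curves like the one described here, which start at 100% and step down at each death while patients lost to follow-up simply leave the at-risk count, are commonly estimated with the Kaplan-Meier method. Whether the slide used exactly this estimator is an assumption. The sketch below uses made-up follow-up times and a hypothetical helper name, `kaplan_meier`; none of the numbers come from the bladder cancer study.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps of a Kaplan-Meier curve.

    events[i] is True if a death was observed at times[i], and False if the
    patient was censored (for example, lost to follow-up) at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)          # patients still being followed
    survival = 1.0               # every group starts at 100%
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Handle all patients who share the same time point together.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            # The curve only steps down when a death is observed.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients reduce the at-risk count without a step down.
        at_risk -= leaving
    return curve


# Made-up follow-up times in months; False marks a censored patient.
example_times = [2, 4, 4, 6, 8]
example_events = [True, True, False, True, False]
for month, prob in kaplan_meier(example_times, example_events):
    print(f"month {month}: {prob:.0%} surviving")
```

The "numbers at the bottom" of such a plot correspond to `at_risk` at selected time points, and a "tail on the curve" shows up as a long stretch with no further steps.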

D

The reason why genomics is attractive in tackling this problem is that the tumor immune microenvironment, shown here in cartoon form on the left panel, is highly complex. It's a mixture, a dynamic equilibrium of competing and reinforcing cell types with many various functions, all of them communicating with one another. And then the cancer-immunity cycle, shown on the right panel, is equally complex. This is a series of events that has to happen for a T cell to recognize and kill a cancer cell. Reading from the bottom left and clockwise:

D

There has to be cancer cell death of some kind: natural turnover, chemotherapy, radiation, or immune attack. That then leads to the elaboration of antigens that can be picked up by antigen-presenting cells, which migrate to the lymph node and prime naive T cells, leading to T cell clonal expansion and activation. Those T cells, which can potentially react to and kill tumor cells, have to get back into the circulation, traffic to the tumor, achieve ingress into the tumor microenvironment, find the specific target tumor cells, and actually achieve killing. And at each one of those steps,

D

there are multiple layers of molecular regulation, and public and private entities are looking at ways to augment anti-tumor effects at each one. So the reason that genomics is attractive is that genomics can allow us to see that complexity out of one or a small number of assays, in a way that we can't with traditional functional cellular immunology techniques, which largely require live cells. So to robustly develop biomarkers of response to immunotherapy, I think we need two things. We need to be able to assess the complexity of the immune microenvironment, and

D

we need to do that in a way that's accessible from FFPE, or formalin-fixed paraffin-embedded, tissues, because that's the way that most samples are archived for ninety-plus percent of patients, even at academic centers. Next slide. So what is immunogenomics, or genomics applications in immuno-oncology?

D

Well, it's classical genomics, like mutation detection, copy number variation detection, and gene expression profiling, on top of immune-specific genomic assays, like HLA type identification and T cell receptor and B cell receptor repertoire identification, and then modeling approaches that take all of your features, classical genomics and immunogenomics features, and try to associate them with outcomes and separate the signal from the noise in order to develop biomarkers of response. With that, I'll transition.

E

Thanks, Ben. That was a great introduction, and your point there is that we have to have this integration of multiple biomarkers in order to really power these trials. What I want to show is that that's possible, and I have a few examples of how we've used a large number of variables, taking those into account in proper context, in order to model them in such a way as to inform clinical decisions. The next slide, please.

E

So in some early work we developed a classification device for breast cancer that was reliant upon the expression of 50 genes. Measurement of these 50 genes (and you can go ahead and click to the next one) is really simply looking at a new sample, profiling it for these 50 genes, and asking what it is most similar to from what we know. Once we classify a sample in this way, using this multivariate approach, we can provide some assessment of not only its subtype but also its prognosis,

E

that is, how well that individual is going to do, or how long that individual is going to survive, in the absence of any therapy. On the top right, what we found was that producing this continuous score was actually a very strong and accurate predictor of relapse-free survival. What we're showing there is the linear relationship between the score that's provided to clinicians, on the x-axis, and the probability of survival at five years, on the y-axis.

E

What we also found is that using data in this way was much more accurate than any clinically based diagnostic that was currently available. So taking things into account that are shown in the bar plot, tumor size, marker status such as ER, and grade, we can do our best job of modeling these data, but it's still nowhere close to the accuracy of the genomic-based predictions, which are shown in the later bars, with the x-axis there being accuracy in determining the risk of relapse. Next slide, please.
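Editor's illustration: the idea of profiling a new sample and asking "what is it most similar to from what we know" can be sketched as a nearest-centroid classifier. Everything specific below is an assumption made for illustration: the similarity measure (Pearson correlation), the toy four-gene profiles, the subtype names, and the function names. It is not the published 50-gene algorithm.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def classify_by_nearest_centroid(
    sample: List[float], centroids: Dict[str, List[float]]
) -> Tuple[str, Dict[str, float]]:
    """Assign the sample to the reference profile it correlates with best."""
    scores = {name: pearson(sample, profile) for name, profile in centroids.items()}
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Hypothetical reference profiles over four genes (a real panel would use 50).
toy_centroids = {
    "subtype_A": [1.0, 2.0, 3.0, 4.0],
    "subtype_B": [4.0, 3.0, 2.0, 1.0],
}
new_sample = [1.1, 1.9, 3.2, 3.8]
label, similarity = classify_by_nearest_centroid(new_sample, toy_centroids)
print(label, similarity)
```

The per-subtype similarity values are also the kind of raw material from which a continuous score could be derived, although how the commercial score is actually computed is not described in the talk.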

E

So this was commercialized into a test called Prosigna, and the Prosigna commercialization turned this into a very easily interpretable report for the clinician, so that they can take this highly complex genomic measure that is evaluated with a computational model and project it back to the clinician in a very easy-to-interpret form. So in the top left,

E

you simply get a score, which is this risk of recurrence score, and it can be used to classify patients into low, intermediate or high risk. And again there's this continuum, which tells them not only about this categorization but, on the bottom right, the actual probability of relapse at a given time point given their continuous score megawatt grades. So that technique was FDA approved and has been distributed, and it's starting to be used to analyze samples that are in numerous clinical trials.

E

Here's an example of one clinical trial, led by Lisa Carey at the University of North Carolina, which was actually a negative trial, in that they used combinations of different Herceptin inhibitors, or HER2 receptor inhibitors, in order to look for a response in this particular subset of patients, who were called HER2-positive by a single IHC marker.

E

However, by subtyping it this way, what we see is that we get an enormous increase in the response rate. On the top left are those three drug combinations, and while there is some variation, there was no statistical difference between the three arms of the trial. However, on the bottom right, we get a significant interaction with those that are subtyped as the HER2-enriched group. That's the genomic marker instead of the single protein; this is a multivariate genomic marker. Pathologic complete response is our outcome here.

E

That means that after the drug is given, the surgeon goes in and looks for tumor and there's none left, so this is as close to a cure as we can get in a short timeframe, and seventy percent of the patients that receive this drug and are in that subtype achieve that result. Next slide, please. So in other work we're extending this, looking at other drugs and using genomics to develop these models of subtype, and we can show this repeatedly. Here it is with another drug, called enzalutamide. It's an androgen receptor antagonist.

E

It was approved in prostate cancer, and the thought was that there was some subset of breast cancer patients who may also be sensitive to this drug. Of course, it wouldn't be many, because it targets the androgen and not the estrogen receptor, but using genomics we can actually find that subset of patients who don't express estrogen receptor and in whom androgen receptor instead may be driving their breast cancer. The result is that this drug works very effectively in that small subset of breast cancer patients. So using the genomic-based biomarker predictor,

E

up here on the top left, we highly enrich for those patients that are sensitive to the drug, as opposed to a single protein marker by IHC, which is what would be used in the clinic right now, on the top right. In this case we increase our sensitivity by 10% and our positive predictive value by 10%, that is, the patients we predict to be sensitive to the drug are actually responding to the drug, and we also increase our negative predictive value.

E

That is, we are more accurate at not giving the drug to people who will not respond to it. Next slide, please. So the result of this work is, again, a multivariate signature that was built in the research space and taken into the clinic. There will be a phase three trial where the biomarker is going to be an entry criterion into the trial, and this will provide us a definitive answer as to the enrichment based on the biomarker.
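Editor's illustration: sensitivity, positive predictive value and negative predictive value, as used above, all come from the same two-by-two table of biomarker prediction versus actual drug response. The sketch below uses a hypothetical function name and made-up counts, not data from the enzalutamide study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute test metrics from a 2x2 table of prediction vs. response.

    tp: predicted sensitive, responded     fp: predicted sensitive, did not respond
    fn: predicted resistant, responded     tn: predicted resistant, did not respond
    """
    return {
        # Of the patients who truly respond, how many did the marker flag?
        "sensitivity": tp / (tp + fn),
        # Of the patients flagged as sensitive, how many actually respond?
        "ppv": tp / (tp + fp),
        # Of the patients flagged as resistant, how many truly do not respond?
        "npv": tn / (tn + fn),
    }


# Made-up counts for illustration only.
print(diagnostic_metrics(tp=8, fp=2, fn=2, tn=8))
```

A higher PPV means fewer patients receive a drug that will not help them, and a higher NPV means fewer likely responders are turned away, which is the trade-off the speaker is describing.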

E

The beautiful thing about this particular test is that these triple-negative breast cancers typically get chemotherapy, and as you've all heard, chemotherapy is not too good for you, whereas enzalutamide, the drug that we are going to give in substitution for chemotherapy for these diagnostic-positive patients, is a hormonal agent, where only ten percent of patients even have grade three fatigue.

E

So not only will we (go to the next slide, please) essentially double the lifespan of those biomarker-positive patients, but they're going to have a much better quality of life. This is really the promise of genomics: that we can take these high-dimensional measurements,

E

distill them down into some clinically actionable result, and that clinically actionable result is something that we can do right now, because we're taking a drug that's already approved and giving it to the patients who actually need it. The result is shown here: in this particular cohort, the median survival is 32 weeks for those who are diagnostic negative, while those who are diagnostic positive achieve seventy-five weeks of overall survival, and they're not even getting chemotherapy. It's a single-agent hormone therapy.

E

So this is what we want to be able to do in the immuno-oncology space, but the problem is even larger, because this is just looking at tumor genomics. As Ben just mentioned,

E

we now have to incorporate tumor genomics as well as all of the microenvironment in order to see this kind of result. And this was just a summary slide showing that, while I gave you a few examples within a few different disease subtypes of breast cancer, in the larger cohort of all breast cancer this kind of approach, developing a gene expression-based or general genomic model and commercializing it, has produced results for all of breast cancer, and because of that we're seeing improved survival across the entire spectrum of breast cancer.

E

So thanks for listening. I'll now pass it over. All right.

C

So yeah, this is a really good setup, because if immunology is hard, understanding how immunotherapy works is even harder, as we've kind of gotten a crash course in. You can go to the next slide. By the way, I'm so glad that I switched out my slide at the last minute to a microwave virus for this one; that way we don't all show the same one. But since we've already gone over a lot of this, I at least wanted to touch on it for the people who are not familiar with it.

C

What we like to hope is that when we get a disease or any other kind of affliction, our immune cells respond to it. We get sick, we kind of feel crummy for a little while, the immune cells build up the ability to kill whatever is not you, and it goes away. So it shouldn't be too far removed to think that,

C

if you get a cancer, which is a disease that is definitely not you, your immune cells should respond in that fashion, and you can imagine that if you have a heightened response, you should have better survival. In the figure on the bottom left here, by the way, with the brown stain (I blame myself for taking pictures of pictures; you can see it blurred out the brown), here are the cytokeratin markers that are highlighted in red, so again CD3, CD8, FoxP3. In this particular paper from Halama et al., you have a liver, and it's a colorectal metastasis, and you've got your margin here.

C

When you don't get the immune cells to actually start responding, the prognosis for this patient is not so great. Your body is not responding to it as an immediate danger. It's different when you've got immune cells giving a very strong response. And by the way, this is, you know, old school, and very low throughput. I mean, images are big data, but there's just one picture for one thing. How do we really know what's going on here? Why does this patient's immune system respond versus this one, even though they both have colorectal cancer? That's really the tough part.

C

Then you get this tumor microenvironment, which is not just the tumor, in the top figure here, but all these different cell types that are just sitting around doing what they're supposed to do, or not, because the tumor has evolved enough to where the recognition pathways don't exist as we would like them to. If you go to the next slide. Yeah, this involves all the things you're not supposed to put on a slide, but I'm doing this to make a point. So immunotherapy is very important.

C

You guys have seen it on the cover of magazines, people are talking about it widely, it's all over the place, and of course, coming from a clinical trial organization, we know a thing or two about what trials are being held for what immune therapies.

C

So in this figure, as you go radially out, each pie slice is a type of cancer: melanoma, non-small cell, renal cell, colon. As you go outward, you go from your large phase 3 to your small phase 1, and every dot here is a particular therapy, or a combo therapy for those that are yellow. Why that's important is that you have all these trials going on with all these immune therapies, primarily on, as the other speakers already mentioned, PD-1 and PD-L1. That's one marker.

C

So we're having lots of trials based on one marker, and as we've learned and explored, PD-1 is

C

not necessarily the best marker. As Joel showed with Prosigna and the combined gene expression, leveraging the fact that genomics can provide you multi-analyte testing gives you a much more robust response. So what we're learning from these trials is that when we're only running one type of analyte testing, we're missing things, and more patients aren't getting that nice leveling off that immune therapy has promised. Next slide.

C

So there are some thoughts here on all the things we test in the immune system and the immune response to the tumor. I can't go over all the slides, but the material here is the cancer immunogram that was published. The reason why I showed this here is that, given I come from a clinical testing organization, we actually have testing strategies for all these types of items: whether the tumor is riddled with mutations, whether the tumor cells

C

are actually seeing the immune system present in there, whether the immune cells are actually getting deep in and starting to kill. As we get around to these questions, we want to know how the immune system responds to that tumor. There are all kinds of tests that are available. The ones in blue, my specific lab does, and the ones in green are from the larger global laboratory structure.

C

But there are lots of ways to do this, and as was mentioned earlier, a lot of them rely on live cells, which are very difficult to get. I mean, you do a blood draw, that's live cells, but normally when we get the tumor, it's in FFPE, or you get this big mass, and it's a lot harder to do these things. So the idea of a one-test, one-fit

C

immune therapy, we're just not there. And clinical trials are coming. If you go to the next slide... matter of fact, they're starting to come alive. Every time I show these kinds of slides, I have to do more research, because so many more clinical trials are showing up trying to get at how the immune system is responding.

C

This is some work I did back in November, manually reviewing ClinicalTrials.gov, because it was too hard to find some magic query that gave me everything that I needed, so I just did it old school. As you can see: what is the study type, are you just observing patients or are you actually intervening with drugs? Is something in the immune system a primary outcome measurement? That's this slide right here.

C

Is something in the immune system a secondary outcome, yes or no?

C

Then, on the right here: are we actually measuring something besides one marker? Now, if you guys have ever played around with ClinicalTrials.gov, the descriptions are anything from meticulous and well thought out to "Oh my god, I'm meeting up with some friends at 4:30,

C

I've got to put these sentences into my description before I go catch up with them." So they're really, really widely different in describing what's going on in these trials. So let's take the first one here, for prostate cancer, that's recruiting. I've got the codes; I've got the whole list of these 80-some trials in a different publication. But here we're going to measure T cell diversity with T cell repertoire deep sequencing. That's great, and there is a whole slew of things

C

we haven't introduced here, all the different types of markers for what that is, and I will bring it up in my last slide. OK.

C

We're going to get a lot of the measurements about how the T cells really get in there and do the killing, how they're responding. OK, cool. What are we going to do in this CLL study? All right, maybe it has the immune system as a primary outcome. Definitely not the secondary. The primary one is "cell surface antigens." Great, which antigens? So it makes it really tough to figure out exactly what

C

complexity of data we could potentially mine to get at this kind of stuff. The reason why I'm setting all this up is that big data is great when it exists and is well characterized and organized. I'm sure this is a lot of preaching to the choir, but in the clinical trial realm, that

C

is not a consideration that's going on most of the time. This last one here, in lymphoma, is just an example. This is extra explicit: we're going to look at minimal residual disease, or MRD, with this clinically available assay. Great, I know exactly what's going on. Yes, it is indeed in genomics. And by the way, it's something they're exploring: we're not using this data to actively get people on these drugs. While some of those trials exist, there are

C

not a lot. I mean, even here it's an additional study, so obviously, based on some testing, they're going to get a drug or they're not going to get a drug. In this case, it's not going to be related to anything about their immune system; they're going to sit and watch. That's kind of where we're at right now, in the sitting-and-watching phase. In my last slide, I'm going to show the evolution of how this kind of

C

testing has come along in our laboratory. I won't say I have complete information about all the old-school testing mechanisms, but as we do PCR, and this is a hundred and seven different primers from a 2003 paper, you run those primers together, you look at how their size distributions sit, and you have the same patient doubled up to make sure you get a consistent response, and you kind of say, yep, those squiggles are totally different than those squiggles. There are fewer of them here; there are fewer immune cells that are different here.

C

It's not so different. And of course we love the gel, right? Very highly resolved, as you can tell from here, a smear with some dots. It makes it very hard to get something as resolved as what Joel had mentioned, because depending on what picture and camera you use, you might be getting different intensities, and depending on how good your fragment analyzer is or how

C

good your protocol is, these squiggles may shift. So this is when it's really diverse and very healthy. This is when it's not so diverse, like that big dark black mass from earlier on. Here's one that says the immune system is not really doing well. So you've got, between these two controls, something in the middle, something close to the end.

C

OK, that doesn't really work out so well as something we can mine, measure, and go back and forth with. Maybe there are a few computers that can do the image recognition, but

C

here we get exactly the CDR3, the recognition sequence for the tumor antigen. We get to know what kind of frequency we see for the specific sequence, how it counts. We get how many actual, not normalized, counts we get. So we're actually starting to get expression of the different antigen receptors from the T and B cells here, in a way that we can actually measure consistently.

C

We've gone from 2003, with 107 different primers, to now 25,000, so automatically we've exploded, pretty substantially, what we're measuring from the T cell receptor. Of course, now the trials are starting to come out. The last I heard, at least at AACR last week, there were maybe 150 trials in immune therapy that are using this marker. I couldn't look that up; it's really hard to do, but we're starting to see these things. But the question is: how are we actually going to be able to leverage this data?
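Editor's illustration: once receptor sequencing gives per-sequence counts, the "frequency we see for the specific sequence" and an overall measure of how diverse the repertoire is can be computed directly, instead of being read off a gel. The sketch below assumes the input is simply a list of receptor sequence strings, one per read. The sequences, the function name, and the choice of Shannon entropy with a normalized clonality score are illustrative assumptions, not the speaker's pipeline.

```python
from collections import Counter
from math import log
from typing import Dict, List, Tuple


def repertoire_summary(reads: List[str]) -> Tuple[Dict[str, int], Dict[str, float], float, float]:
    """Summarize a receptor repertoire from a list of sequence reads.

    Returns raw counts, frequencies, Shannon entropy (natural log), and
    clonality (1 minus normalized entropy; near 1 means one dominant clone,
    near 0 means an even, diverse repertoire).
    """
    counts = Counter(reads)                      # raw, not normalized, counts
    total = sum(counts.values())
    freqs = {seq: c / total for seq, c in counts.items()}
    entropy = -sum(p * log(p) for p in freqs.values())
    n_clones = len(counts)
    clonality = 1.0 - entropy / log(n_clones) if n_clones > 1 else 1.0
    return dict(counts), freqs, entropy, clonality


# Made-up placeholder sequences, not real patient data.
diverse = ["SEQ_A", "SEQ_B", "SEQ_C", "SEQ_D"]
dominated = ["SEQ_A"] * 97 + ["SEQ_B", "SEQ_C", "SEQ_D"]
for name, reads in [("diverse", diverse), ("dominated", dominated)]:
    _, _, entropy, clonality = repertoire_summary(reads)
    print(f"{name}: entropy={entropy:.3f}, clonality={clonality:.3f}")
```

Unlike band intensities on a gel image, these numbers do not depend on the camera or fragment analyzer, which is the consistency point made above.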

C

How is it actually going to be created in the first place in such a way that we can start understanding how these patients are responding to the immune therapy? It's almost like we've revisited, or re-engaged in, expression profiling from 15 years ago. So it's getting really difficult to mine this, and not only that, as I mentioned, the trials aren't describing things very well. The data organization isn't so hot, so there's a lot of room to grow in that space,

C

although the data is voluminous enough to really lead that charge. I will end on that. There's some Easter.

B

Thank you. Fantastic, a wonderful whirlwind introduction to the problems around immunotherapy. I think the takeaways are: there's loads of data, it's kind of messy, it's not well organized,

B

we really need lots of different assays, and we have to figure out how to integrate all this information. So I think there's much opportunity for data scientists to come in and help save the world with you guys, because the really exciting thing about immuno-oncology and immunotherapies is the results that are being seen in the clinic. I'm sure everybody's familiar with the story of President Carter and how he responded, having brain cancer, and his remission as a consequence of immunotherapy. OK, so I guess one question I'd like to ask you:

B

we saw the immunogram, we saw a number of different assays that we could be using to measure these things.

B

Dr. Vincent, what would you say are some of the primary things that need to be measured moving forward? Can you talk to us a little bit about what is being measured today for these things, what other things need to be added right away, and maybe, longer term, give us some sense of what you would like to see?

D

From a genomics perspective, I think we think in terms of the base-level assays and then the analytics. For base-level assays, we can get a lot out of RNA sequencing alone, but if you wanted to say, what would you have in your dreams that is actually reasonable to do, it would be RNA sequencing, whole-exome sequencing, and T cell receptor and B cell receptor amplicon amplification and sequencing. Those three sets of things are doable at a decent cost structure and can give us a huge wealth of information.

D

But on top of that, the analytics are more complicated than in classical genomics, and so that's the next layer: you need analytics to actually fish out the immune signal, especially in the RNA sequencing data, and then you need sophisticated modeling approaches, like what Joel has described, in order to essentially build your biomarker model, or your model

D

that will be a biomarker, from a large number of features, at each level deciding where the orthogonal information is. That's also a difficult problem. So maybe there are three layers: there are the base assays, then there are the analytics to get the base-level immune feature characterizations out of those assays, and then there's the advanced modeling to test and build the biomarker.

B

When you talk about performing these assays, I presume you're talking about on the FFPE tumor tissue, on DNA.

D

And RNA derived from FFPE, and.

B

Can you touch on what you would want to do with normal tissue? Because I heard you say also that you need to be looking at the normal immune environment, and I would presume you'd probably want to do it in multiple spots. How many do you think is sufficient?

D

If we had a choice, we'd want multiple spots. In fact, late last year there was a wonderful article in Science about heterogeneity in expression of tumor antigens across different spatial regions in a tumor, and so you may get different answers about what the tumor target space is if you're trying to predict that from one region versus another. So, multiple. This is actually a place where a genomics result may inform clinical biopsy strategy, because right now what happens, if the tumor is large enough,

D

Is the radiologist or the interventionist who is getting core biopsies with the long needle will just go in a piece of an arc; if they're getting four biopsies, it's one, two, three, four. Well, people are starting to consider actually requesting more heterogeneous biopsy strategies, because some of these studies are coming out about intratumoral heterogeneity in gene expression, including antigens. And then.

C

For me, to follow up on that, I found the hard part is actually getting that into the trial protocols. You mentioned RNA-seq, exome, and T-cell receptor. Over the last year and a half, our trials that are testing for exome and RNA-seq have grown to, you know, several thousand or tens of thousands of patients that are getting those two things done. The CMB is too expensive now for everybody to start doing, but out in the secondary arena, they're starting to understand the value of that data.

C

But the funny part is that when we try to deliver that data, it's: okay, go and put it in the database. What do you want in the database? RNA-seq data? What about it?

E

We want all of it.

C

So we send a file, but: we can't have 80,000 columns in our clinical trial databases. What do we do? Is there a use for this? You know, we were expecting PD-1 and CTLA-4; don't you have that? Well, we have that, and seventy-nine thousand nine hundred and ninety-six others. So it's been really difficult. We spend a lot of lag time in the data arena just getting people to understand what that information is, because the PIs,

C

The clinicians know, but the people on the back end are used to status-quo-type testing, and that makes it really tough to do these biomarkers. So.
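
Editor's illustration (not part of the discussion): one common way around the "80,000 columns" limit described above is to store expression values in a long layout, one row per sample and feature, instead of one column per gene. The sketch below uses Python's built-in sqlite3 module. The table name, column names, sample IDs, and values are all hypothetical; nothing here reflects the panelists' actual systems or any real trial database.

```python
import sqlite3

# Hypothetical long-format layout: one row per (sample, feature) pair
# instead of one column per gene.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expression ("
    " sample_id TEXT NOT NULL,"
    " feature TEXT NOT NULL,"
    " value REAL NOT NULL,"
    " PRIMARY KEY (sample_id, feature))"
)

# Made-up values for illustration only.
rows = [
    ("S001", "PDCD1", 4.2),  # PDCD1 is the gene encoding PD-1
    ("S001", "CTLA4", 1.7),
    ("S001", "CD8A", 9.3),
    ("S002", "PDCD1", 0.8),
    ("S002", "CTLA4", 2.5),
    ("S002", "CD8A", 3.1),
]
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)


def markers_for(sample_id, features):
    """Return {feature: value} for the requested features of one sample."""
    placeholders = ",".join("?" for _ in features)
    query = (
        "SELECT feature, value FROM expression "
        f"WHERE sample_id = ? AND feature IN ({placeholders})"
    )
    return dict(conn.execute(query, [sample_id, *features]))


expected_markers = markers_for("S001", ["PDCD1", "CTLA4"])
print(expected_markers)
```

With this layout, a site that only expects two markers can query just those, while the remaining features stay available without widening the table.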

B

That begs the question, then: what can clinicians do who are trying to interpret some of this data? Also, you know, how can they be assured that they're getting the highest quality possible?

C

That's a great question. It depends on how much people study and understand the assay that was used to generate that data. We get audited all the time, and of course, for UNC, I'd imagine, testing in an isolated arena, you understand how the sequencing group works, you understand how the samples go through Joel's pipeline.

C

You can follow that kind of chain of custody, but that's not always done everywhere, just because of all these trials; if you look at the places that are doing this testing, they're everywhere. So having a single way to follow all that information, to know that the patient that you're looking at is the right one, is actually a very difficult problem. So controlling that, having standards for the lab, you know, your CAP and CLIA validation, is a great way to filter these types of things out. And then getting into the interpretation,

C

Oh man, that is, yeah. There's no wand-waving to be like, ah, it was that antigen that I had up on there, that, you know, string of 60 bases; that's the one that responds to the therapy. I don't know.

B

That's just the rub: trying to take all this information and integrate it and report it, and potentially even share it with other biomarker developers, which is a whole 'nother can of worms. Joel, do you have any comments about that? Yeah.

E

Absolutely. So this is really kind of the new challenge with immuno-oncology: you want to integrate these different features. Whereas prior, with the biomarkers that I described earlier, we consider them all as just tumor markers, and while they are voluminous, typically simple, linear-based modeling strategies will allow us to produce a very directly interpretable

E

Result. However, when we go to immuno-oncology, because we need to integrate these different components, being the tumor and all the different components of the microenvironment, we have to consider interactions between them. As soon as we talk about interactions, the combinatorics get extraordinarily large, and the combinatorics are already hard when we were just looking for linear effects. So not only do our combinatorics blow up, but at the same time we want to condense it back down.

E

We need really good ways of condensing that information back down to simplify interpretation: here are the cells that are present; of those cells that are present,

E

Here are the ones that are active, and in what ways they are active in this particular sample. So we have to reduce all this data, with the knowledge of interactions, back down to these simple phenotypes that we know exist and we know are part of the tumor killing or not. Until we can get enough samples to understand the combinatorics, and to understand it at the genomic level to a point where we can reduce it back to these simple phenotypes and produce a clinically interpretable result, it's going to be very challenging.
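
Editor's illustration (not part of the discussion): the point about interactions making the combinatorics "extraordinarily large" can be made concrete by counting candidate terms. The sketch below counts how many distinct feature combinations exist at each interaction order. The figure of 20,000 features is an assumption chosen for illustration (roughly the number of human protein-coding genes), not a number given by the panel, and the function name is hypothetical.

```python
from math import comb


def interaction_counts(n_features, max_order=3):
    """Number of candidate terms at each interaction order (1 = linear effects)."""
    return {order: comb(n_features, order) for order in range(1, max_order + 1)}


# Assumption for illustration: about 20,000 measured genes.
counts = interaction_counts(20_000)
for order, n_terms in counts.items():
    print(f"order {order}: {n_terms:,} candidate terms")
```

Under that assumption there are 20,000 linear terms, about 200 million pairwise interactions, and over a trillion three-way interactions, which is why the number of available samples becomes the limiting factor long before compute does.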

B

We need more data and.

C

I'll say, I mean, but even then, getting into those phenotypes: if we go back to the figure of the microenvironment, with the gene expression signatures that are immune-cell-specific, we can actually start saying:

C

We have CD8 here, we have CD4 here, we have memory B. It's really at a basal level, but at least we can start characterizing each of those immune cells and ideally say it's there, it's active, or we don't see it at all, and maybe, getting to the pathology and the staining, something that's a lot more robust. Then automatically the dimensional reduction becomes much more tangible. I've not seen quite that predictive level yet, but I know there are definitely papers existing to help with that process. I think.

C

A remote participant has a pair of questions that are related. What

B

Are the key privacy challenges in this type of research, and how do you address these challenges?

D

Joel may be better able to speak to this than me; I think you probably understand the architecture a little bit better. But I think we are very careful to store sequencing data in a HIPAA-compliant way. So, the approach UNC takes: there are kind of two schools of thought about genomics data and privacy. One school of thought is, if it's not whole-genome sequencing, it's not personally identifiable data, thus I can do with it whatever I want, put it on my laptop, and whatever. There's another school of thought

D

That says essentially any NGS data, including, you know, exome variants or variant patterns, is potentially personally identifiable and so should be behind the HIPAA wall, and that's the approach, at least, that we take at Lineberger. So.

C

We have a number.

D

Of protections in place, but you don't know exactly how look old.

D

To accomplish that, for us yeah we tried.

E

But, you know, I think there was a cover of The Economist this past week that said something about nothing on computers is ever safe, and so I think we all have to bear that in mind. You know, ultimately,

B

Lock the door.

E

That's right. Ultimately, you know, our DNA is our identity; there's nothing more identifiable than DNA. However, in order to make it identifiable, it requires that you go out and have something that I know is you, and test it and compare it, right? And so the HIPAA regulations right now are taking the stance that it's not identifiable, because you would have to go resequence someone, and that is reasonable. However, I don't think it's going to be too far in the future before such technology is very amenable to me

E

Finding, you know, whatever it is, a piece of your hair or skin flakes, that allows us to be identifiable. So I think it's a short-lived stance, but at the same time we try to be more conservative in our stance at UNC. I'll talk to the clinical part of it, which is that a lot of privacy is also considered up front in the consent, right? And I

E

Think what we're really having trouble with right now is that, as you know, we're in this discussion of the day about consent from historic studies, where we have banks and banks of FFPE tissue, which we now have technologies to completely unlock, and they have years of outcome and clinical data available, but they were not consented in a way in which we can use that information correctly, right? Many of these people may by now have died. What do we do?

E

Plus, because it's DNA, that information has relevance to their families; it's not just them or their tumor. So we have to be very careful with how we approach those things. I think this is a real challenge, actually. I don't want to be optimistic or pessimistic, but there has to be a way that we can unlock that while still having some respect for confidentiality and what the patient consented to when they entered the study. And.

C

I'll add to this on another clinical level, for the trial testing and other items like this. So when a person contracts with us to run assays for their trial, or whatever testing we're doing, when you sign up for that particular test, a box arrives at that provider that already has sample collection materials and things like this, pre-barcoded with an anonymous ID. So my organization is holding, under lock and key, that identifiers database.

C

So once that data leaves the hospital and comes to the testing lab, or testing labs, plural, they're only known by that barcode ID as far as a privacy entity is concerned. One of the larger parts of our business is actually maintaining that sample-sourcing security aspect in the clinical databases, which we would then share back, by very private, encrypted means, to the clinical trial site and physician, about the result for that person.

C

So even my team, when we get a sample in and process it, all we know is a sample ID. Once we upload it back to our delivery portal, that's when the delivery portal can read the barcode, slap the patient ID back on, and go back to the Firefly. We never even have access to or knowledge of those kinds of things. And yes, of course, as far as how the computer infrastructure is set up, the monitoring of every single person, even our robots, on that system is completely locked.

C

Every single file made: it was this person at this time, logged into version control, etc., etc. So there are a lot of things in place where.
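
Editor's illustration (not part of the discussion): the barcode workflow described above, where the testing lab sees only an anonymous ID and a separate key holder re-attaches the patient identity at delivery, can be sketched as follows. This is a minimal sketch under stated assumptions: the class name, function names, patient ID format, and placeholder result are all hypothetical, and a real system would add encryption, access control, and audit logging that are omitted here.

```python
import secrets


class IdentifierVault:
    """Hypothetical stand-in for the key holder's identifiers database."""

    def __init__(self):
        self._barcode_to_patient = {}

    def issue_barcode(self, patient_id):
        """Create a random barcode that carries no information about the patient."""
        barcode = secrets.token_hex(8)
        while barcode in self._barcode_to_patient:  # regenerate on collision
            barcode = secrets.token_hex(8)
        self._barcode_to_patient[barcode] = patient_id
        return barcode

    def relink(self, barcode):
        """Look up the patient ID; only the key holder can call this."""
        return self._barcode_to_patient[barcode]


def run_assay(barcode):
    """The testing lab sees only the barcode; the result is a placeholder."""
    return {"barcode": barcode, "result": "placeholder"}


vault = IdentifierVault()
barcode = vault.issue_barcode("patient-0001")
lab_report = run_assay(barcode)

# The delivery step re-attaches the identity before returning the result.
delivered = {
    "patient_id": vault.relink(lab_report["barcode"]),
    "result": lab_report["result"],
}
```

The design choice worth noting is that the barcode is random rather than derived from the patient ID, so the lab report cannot be reversed into an identity without access to the vault.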

B

The requirements for CLIA validation, especially.

C

When you're, making diagnostic.

B

All right well, is that there.

C

Was a two-parter right.

B

Yes, that opens more questions, probably. How does one try to be farsighted in consenting for long-term research, and also, how can you be farsighted in terms of anticipating what will be called PHI with regard to genetic information in the future? I think that's a really good, very important point.

D

We've integrated now into our tissue banking specific consents for genomic studies, with the recognition that it's essentially identifiable material, so going forward, I think we're okay. But to Joel's point, what about the tissue bank that was started in 1997 and has a thousand patients, right? I mean, you haven't.

E

Just to say what a valuable resource that is, if it needs any additional statement: the initial slide I showed on getting that biomarker approved by the FDA, it was because we could run it on FFPE tissue, and so we were able to run that biomarker on retrospectively collected trials that already had 15-plus years of follow-up. So in that way, you know, the trial's already done.

E

The data, or the samples, are already there; the 15 years of follow-up are already there; so we can immediately take it to action, and that's where the real value is. Otherwise, no matter what great hypothesis you come up with right now, it's going to be that long before, especially for these long-outcome cancers like breast cancer, any action can be taken on it.

B

So then I think maybe the flip side of that question is: you have this data, it's not broadly consented in some cases, and you have multiple teams from different pharma companies and what have you, kind of running the race to see who can get the biomarker. How do you share data to facilitate discovery? What kind of data can you share? How can you incent organizations toward this kind of data sharing while at the same time respecting privacy, consent, and all these other things?

C

That's part of what this data composition I'm, looking at a job with someone who says well.

C

Probably touches it.

E

More than I for public data, but I mean, all these pharma-run trials aren't going to see the light of day.

E

But I think, speaking to what Kim said about incentivizing them, as more and more trials come out where they see that their particular drug has no great effect in a general selection process, but with a biomarker in place, now, all of a sudden, they increase that positive predictive value. And that positive predictive value,

E

What that means to the accountants at the drug companies is that they can spend a lot less on patients in the trial, because now, if I have a two-fold increase in effect size, it's going to cut the number of samples that I have to run through the trial in half in order to get the same effect and convince the FDA. And so I think, as more of these biomarker-based trials permeate medicine, pharma will start to realize that there will be these cases where we actually save money.

E

You know, that's their incentive, right? But as long as they can continue to push the drugs forward in the absence of biomarkers, because the general effect size is enough, it's going to be challenging. And.

C

I will say that it's going to happen. If, as speakers, we get to do a public service announcement, I would say: being able to change how we run, administer, and select patients for trials based on biomarkers is something we have to start doing a whole lot more of. I mean, the numbers that we see in our CRO space are surprisingly low.

B

On Obama sort.

C

Of certain of certain types, I mean, like all.

C

I mean, there are items where that's one marker, so that's really great. But in a lot of the cases for these, it's still wait-and-see, and they want to get the trial started right now. They hope that this is blockbuster enough and it's A-OK, they don't need a biomarker, because the phenotype is: cancer gone, high-fives for everyone. But when it doesn't work, and we know it doesn't work 70, 60 percent of the time, we know that because we're in the field, well, how do we solve that? How do we fix it?

C

How do we make sure we identify someone as not being able to benefit from that very expensive new therapy, or prevent them from getting the side effects? So I spend most of my time now, you know, begging and pleading that this extra cost of the trial actually saves this much money down the road. It's a very different role, but it's something that we have to do. So, if you guys know friends like, are you through the Bible?

C

You know, that'd be great. There's no local congressman for that kind of stuff, but I think that's something that we need to see a whole lot more of. Genomic testing isn't that bad; the value from mining this stuff down the road is better, and also interplay, opening up these pharma databases that are so discrete there are no sample volumes.

D

The other thing that has to happen is more clinical options have to be approved in order to actually show the value of biomarkers to clinicians who are actively treating patients. Because right now, if a clinician is faced with a patient with lung cancer who has failed multiple lines of therapy, and a biomarker can tell that clinician that the patient has a 10% chance or a 40% chance of responding to a certain drug, but that drug's the only option left,

D

The clinician will prescribe that drug and not use the biomarker test and not care about the biomarker test, because it doesn't change his or her decision-making calculus. Whereas, as I expect, five years from now there will be multiple combination immunotherapies available that work by different mechanisms across a number of tumor tissue types, and clinicians will be faced with: oh, I have double-digit possibilities to use.

D

How do I know which one is best for my patient? That is when I think immunogenomics-based biomarkers will start to shine. But in order to shine five to ten years from now, we have to be developing them yesterday.

D

And so I think that, in terms of being broadly applied, the field is still extremely young, but the work that's been done so far marks it as also extremely promising.

C

That's the real exciting part of the mining aspect: getting to that.

B

So, with regard to the kinds of datasets that are available, public datasets to support some of these decisions and some of the research that guides these decisions, you know, we have ClinicalTrials.gov, which is not as well organized. This is like.

C

A box of chalk.

B

And we have sequence databases, IMGT, where, you know, different types of HLA sequences are out there, and maybe we have clonal repertoire. But, you know, what kinds

B

Of databases would you... If you had your dream list, I would like to ask you to think about: is there a data set that you wish that you had, and that you know won't be corporate funded, but that might help? And I'm

B

Going to be the first person to say ClinicalTrials.gov

B

Stuff. You know, if there was one extra box that people were asked to fill in, of ten things that you might have in your head, what would be one of the things that you would want to have on ClinicalTrials.gov, especially listing,

C

Listing the test being used, so that you can at least backtrack and know that this does that thing. There are so many more, but already, as I showed earlier, it's like: we're going to look at presenting antigens. Wow, that's what we're going to look at? It's very difficult for me to understand which ones those are until five years later, or I go to ASCO and it shows me: oh, we are at least going to look at CTLA-4. Okay, well, I knew that.

C

That would be the one thing, just an inkling, that I would like to know.

B

Maybe a drop-down, something to help structure it.

C

Amazing. Or something like a CAPTCHA: it reads through it and, hey, you didn't put this test ID on there. I know it's kind of, sometimes, maybe a hard thing to do.

B

Do you have any other ones?

C

Oh yeah. So at least, and I'm trying to do this personally in my shop, when we look at the tumor microenvironment and we're running gene expression on there, which we do for the vast majority of our immunotherapy stuff, we're actually also staining it in a lot of the areas that are there, having more confirmation. You asked earlier whether this particular immune cell is up or not, or active. You know, we have flow, we have IHC,

C

We have NGS, to actually start building better-verified kind of hot and cold signatures. So a database that has the RNA-seq, which everybody knows and loves, and the flow, which everybody's comfortable with and most hospitals have, and in some cases the IHC, so that not just the phenotype and treatment but the molecular phenotypes are known along with the genomics: that would be a great database. Very hard to get. I mean, some of

C

These trials are just starting to pop up with some of those. It's not easy to do, especially if it's flow and you're sorting; you don't necessarily do this.

B

For the uninitiated, flow cytometry is?

C

You take lots of cells, and what you then provide is a tag that is a fluor of color, and there's a really smart camera, and you basically flow the cells through capillary tubes. Once the camera on the flow cytometer sees these particular colors, it shifts them down different chutes and actually starts counting the cells. So a tube ends up being: nothing, nothing, that's nothing, something, nothing, nothing, something, something, something else, something else. These things actually get pretty complex.

C

So that is the thick version of psychology and we're seeing that test in 2003 and ever and.

B

And I'm going to ask you: IHC?

C

Immunohistochemistry. So when you do an FFPE slide, you do the same thing as with flow: you're just staining for a particular marker on the cell, and you know that cell is very specific for that very marker. So if it shows up, then there are cells of that particular type in your thing.

B

And so you showed the immunogram before, and there were a lot of different wedges in that pie. But would it be fair to say that flow and IHC and genetics, they

C

Dominate this testing space? Right, they dominate it. Absolutely.

B

As a data scientist, if I wanted to learn about the assays, so I knew what kind of data there might be, some places it

C

Would be... Having databases of flow, I would love to know where those exist. No, they don't. OK, absolutely, I've never heard of them.

B

Unless Nicolas Cage in

C

National Treasure 5 finds any for me. So.

B

So then, you know, let's say some young and up-and-coming data scientist does create a flow database. What kinds of intellectual property issues would we talk about in terms of database ownership? Is that something that is even, you know, considered in this space at this point?

C

Not that I would know of. I guess I can imagine areas where that's identifiable, so yeah, but.

D

It is, it is pervasive, it is everywhere.

C

So it's something you're going to get the most out of and.

B

It's still a lot of data and.

C

Those are tower keepers of use, yeah yeah, all right.

C

Are there any questions from

B

Anyone in the room?

E

The challenges are all around of gave four days of different data and how much your challenges.

B

Is computational and you have usually.

E

Common controllable queries enable it like that, so we a lot of technologies are out there today when I talk to other former pharmacy I worked.

B

Other colleagues, they said technology's not an issue; it's up to speed and power. Would you agree with that, or are there

E

Other things that are kind of holding you back?

D

I think it depends on how loosely you define the term "not an issue." You know, we don't have all the algorithmics and all the software we need to produce the immune characterizations from genomics data that we want. But sometimes when people say that's not an issue, they mean: I've got a team that can get that done in a year or two with a high degree of confidence, as opposed to: I have the solution in front of me right now. But I mean, at least in my lab and in our work together,

D

I would say our computational problems are not all solved, unless

D

You solved them this morning. No?

E

Not yet, but to

E

Give you an example of where they have been solved: Lisle Mose, in the back of the room here, worked up an algorithm to reconstitute the B-cell receptor, which is extraordinarily complex, from RNA sequencing data, and he published that last year. As part of that, we wanted to assay the Cancer Genome Atlas set, which is about 10,000 tumors total and constitutes a few hundred terabytes of data, which would have taken quite a bit of resources at UNC,

E

In fact, all of our resources, for quite some amount of time. But through the use of things like Google Cloud, you can get it done in a very short amount of time, simply through rapid parallelization across, you know, most Google computers on the East Coast, right? So there are some

E

Problems we can tackle, if it's just a magnitude-of-computation problem, because of technologies like the cloud. However, there

C

Are other computational

E

Problems which are more algorithmic. You know, in order to get to that work, it was a year of development, of algorithmic development. And so even though we can tackle some of these things just by throwing more computers at it, and those are solvable and somewhat tractable, there are still all the combinatorics that even make Google's

E

Cloud not a solution for evaluating all combinations. And then it comes to issues like: really, do we have the data even to do this with? Especially in some of these problems where, even in the linear case, forgetting about combinations, forgetting about interactions, the number of samples that we're seeing is tens of fold smaller than the number of features we're looking at.

B

If it takes an individual a long time to discover good algorithms, can machine learning do that from the data?

C

It can, if it has an endpoint that you know is true, if it has a phenotype that you've checked the box off on. And

B

The hard part here is that, you

C

Know, in that old-school kind of smear thing I showed, we know that the immune repertoire for that patient is relatively diverse. Well, I'm now telling you exactly what the antigens are. So how do I know that it's correct? I don't know that what my fancy assay is actually saying is accurate. That's another thing that's hard to do: how do you go back and define the truth of what it is, orthogonally, with a different technology and different mechanisms, and also something that clinicians and regulatory bodies?

C

So that's why maybe this possibility for machine learning is there, but only with orthogonal testing. And since we already described the testing in general as expensive, adding another one for the purposes of algorithm tuning, I've still yet to have pharma solve that for me. But it creates the problem that you can't just throw machine learning at it, because it's hard to find the truth. You can have a best guess for the truth, but even then you're relying so much on these reference databases that themselves need curation.

C

We find bugs in them all the time, but that's OK; it's part of the growing process of having these things, right?

E

We rely extensively on machine learning in order to develop these biomarkers. Sometimes the algorithms we're using are quite simple, sometimes they're more complex, but in general machine learning is a highly utilized tool in our lab. However, what we're talking about here is really: how do you get the features that the machine is going to learn from? That's really the challenge, and that's where work is needed, because we can't just give it the raw data, even if we have all the supervision.

E

You know, there's a lot of domain-specific knowledge that goes into developing these features before they're input into the machine model or into the machine learning process.

B

Some reason there are any clinical trials or accessories for it, but you know the Couture sequencing and patience.

C

None that are available, but they are being done. They

D

Exist, but they're not available. So there are a couple of large pharma-funded trials in the immunotherapy space that have been published where RNA and DNA sequencing were done, and, you know, the survival curve I showed earlier was derived from one of them. Unfortunately, those data are not publicly available, and I have written emails and called authors on the study and drug company representatives and gotten shut down handily, over and over again.

D

Well, they say that they did, but whether they actually... I mean, it's not in the published reports. There are two trials that I'm aware of that are immunotherapy trials with associated RNA and DNA sequencing data, without amplicon repertoire data, but where we can run our algorithms to infer TCR and BCR repertoire profiles from RNA sequencing.

D

Unfortunately, one of them has 28 patients and the other has 40 patients, compared to these larger pharma-funded trials, where you've got hundreds of patients with medical annotation. So it's.

B

Patients out with.

D

With 28 patients and with 40 patients, it's just not enough to do robust model building for associations with response and biomarker discovery. Now, that doesn't stop people from trying. So one group published a paper where they used a pseudo-machine-learning approach to perfectly classify response versus non-response in these small datasets, and I came by Joel's office and I said, Joel, what do you think of this? And he looked at it and he said, I can't believe they published this garbage. That was a direct quote. And so now, in

D

A presentation to my lab group, I actually had a picture of Joel. Sorry, Joel. All right, I'll stop talking about this. What Joel meant to say is that it was an example of extreme overfitting, and, you know, it's not just Joel's opinion. Sorry to hang you out to dry there.
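
Editor's illustration (not part of the discussion): the overfitting concern raised above can be reproduced with pure noise. The sketch below trains a plain perceptron on 28 simulated "patients" with 500 random features and random response labels, then scores it on fresh random data. It is not the method from the paper the panelists mention; the sample size of 28 comes from the discussion, while the feature count, the choice of a perceptron, the random seed, and all function names are assumptions made for illustration.

```python
import random


def train_perceptron(X, y, max_epochs=1000):
    """Plain perceptron; stops after the first pass with no mistakes."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi))
            if yi * score <= 0:  # misclassified: nudge weights toward the label
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def accuracy(w, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    correct = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi))
        if yi * score > 0:
            correct += 1
    return correct / len(y)


def make_noise(rng, n_samples, n_features):
    """Random features and random labels: there is nothing real to learn."""
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.choice([-1, 1]) for _ in range(n_samples)]
    return X, y


rng = random.Random(0)
X_train, y_train = make_noise(rng, 28, 500)   # 28 "patients", 500 noise features
X_test, y_test = make_noise(rng, 200, 500)    # fresh noise for evaluation

w = train_perceptron(X_train, y_train)
train_acc = accuracy(w, X_train, y_train)
test_acc = accuracy(w, X_test, y_test)
print(f"training accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

Because there are far more features than samples, the model separates the 28 training labels perfectly even though the labels are random, while accuracy on new data stays near chance. A perfect fit on a dataset this small is therefore not evidence of a real biomarker.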

C

I get a second part.

B

So just say something to make: Karthik use an orbit like where you would have an on scale collection of the sequencing of the genome in profile. You know the tumors and if you.

D

Have something.

B

Like that, what would be the best way to put in whatever biomarkers you can will be the most useful, given that, and you know, problems of our community.

B

Rna didn't give me a choice. You know.

B

To say that yeah.

C

Sure I'll take a path.

B

Because, oh yeah whole.

C

Rna is actually.

B

Typing you'll be able to get it rather cheaply if you gather whole RNA, we.

B

Of ideas, that's what would be the most useful by Marcos.

C

If you knew that and you probably ahead of analysis working group.

D

Well, there is an immune response working group that Joel and I are actually a part of for TCGA, and we are trying to answer that very question. But without being able to actually look in big trial datasets and figure out what correlates with response, the best that we can do, and what we are doing, is looking for the combination of immune features that best predicts, say, overall survival, adjusted for tumor tissue type and other clinical factors.

C

Right, like the figure I showed earlier with the very weak response versus the, you know, heavier response: that person lived two more years than the person that had the weak response. With the RNA, you can kind of get to that point, but I was actually addressing access. Does TCGA have as much therapeutic outcome data, where you can do that kind of stuff? I imagine you can just do risk and this.

E

Back to Kim: you asked a minute ago, like, what's the one thing that we... I mean, this is the huge missing piece. We have beautiful genomics datasets like TCGA that allow us to make a lot of informed, I guess, associations within genomics, and then you have clinical datasets that have, you know, some level of demographic and clinical characteristics that you can start to look at.

E

The missing piece is really putting those two in the same place. I showed you some examples: when we have clinical data and when we have genomic data, it works well, but that's really what's missing, right? TCGA doesn't have this. It wasn't the goal of TCGA, and that's why we were funded. It's not supposed to be the next round of TCGA, but, I forget the name, it's another cancer program from the NCI, where we'll actually be doing a lot of this.

E

You know global RNA and DNA sequencing for clinical trials that were designated by the NCI. So in.

C

A way that we are leveraging things in TCGA is that, when we come up with a signature in these 40-patient or 28-patient cohorts, we cross-reference its prevalence in TCGA to then essentially estimate what percentage of patients with that marker, for, say, melanoma, could potentially get the same benefit. There's a lot of small phase one work with a cool biomarker being found, of course, with 40 patients, and, as was said, it can be garbage because the cohorts are that small and everything, you know. But expanding that out to TCGA,

C

You actually start getting at what population could potentially benefit from that same area, and that's where folks like myself, or even you guys, can go back to the pharma and say, "You probably already have data," and go around begging nicely, like an NPR fund drive or something.

E

What we're doing, you know, the idea is that we can use all the high-dimensional genomics data and try to reduce it to the features that we really care about. A lot of the RNA signatures are correlated with some of the protein markers, and so on and so forth. So how can we distill all this down to: here are the four or five things that we think we can measure clinically and that represent the diversity of genomics in these larger datasets?
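
The idea of distilling many correlated measurements down to a handful of representatives can be sketched with a greedy correlation filter. This is only an illustration under stated assumptions, not the approach the panelists describe using: the function names, the 0.8 correlation cutoff, and the toy feature values are all hypothetical, and the features are assumed to arrive already ordered by how much we care about them.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def pick_representatives(features, max_features, max_abs_corr=0.8):
    """Walk features in priority order, keeping one only if it is not strongly
    correlated with anything already kept."""
    kept = []
    for name, values in features.items():
        if len(kept) == max_features:
            break
        if all(abs(pearson(values, features[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept

# Toy measurements across six patients (invented values, hypothetical names).
toy_features = {
    "rna_sig_a": [1, 2, 3, 4, 5, 6],
    "protein_a": [2.1, 3.9, 6.2, 8.1, 9.8, 12.1],  # tracks rna_sig_a closely
    "rna_sig_b": [5, 1, 4, 2, 6, 3],
    "protein_b": [10, 2, 8.5, 4, 12, 6.5],          # tracks rna_sig_b closely
}
representatives = pick_representatives(toy_features, max_features=5)
print(representatives)
```

Here each protein marker is dropped because an RNA signature already kept carries nearly the same information, so the four toy features reduce to two.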

B

So I guess, if there are no more questions in the room, that's

B

The other thing I wanted to ask about was toxicity. It usually takes a long time, because you need an effective immune response, so the toxicities may not show up as early as a

B

I just wanted to get your perspective on that. It is not my genre, but

D

And, as you might imagine, and as implied by what you said, a majority of the toxicities are actually autoimmune phenomena. The classical one that came with the first immunotherapy agent was inflammation of the colon, or colitis, causing severe diarrhea that mimics autoimmune diseases like Crohn's disease or ulcerative colitis. But you can have autoimmune thyroiditis, adrenalitis, even lung disease, or pneumonitis, and so on: a multitude of autoimmune phenomena that can happen after you systemically activate the immune system to try to be effective against the tumor. But even with that, the overall burden of toxicity is still far less than with, say, chemotherapy.

D

So it's still substantially better, and I think one of the things that the field has to learn over the coming years is how to best manage those complications. If someone gets colitis from one of these drugs, right now we give them high-dose steroids, taper them off, and then hope it doesn't come back. Is that good enough? Is there a better therapy for that? Same thing for the thyroid disease or adrenal gland disease or lung disease, and so on.

D

So I think it remains to be seen what the best therapies are, but although there are autoimmune toxicities, the overall quality-of-life and mortality burden is still far less than with chemotherapy.

B

So, thank you again to our panelists. Would you have closing remarks, kind of to sum up the takeaway, if there's something that you wanted to leave with our data scientists, who are left with all this immuno-oncology information?

C

Even outside the immuno-oncology information, I mean, if we're going to really start to link clinical databases, with the patient and their marker history or their testing history, with genomic data, that's the thing.

C

Linking it, I think, is where I'm putting the quotes. There's just not a solution that I've seen, though maybe it does exist, where you're marrying the patient clinical information, which we need to get the survivability and to know what drugs they were put on, with all this wealth of genomics. As I mentioned earlier, we have pharmas that say, "Just throw that exome and RNA-seq into our cloud database, it'll be great," and you can't, because you're going several orders of magnitude beyond what the database can hold.

C

But if there were a mechanism that could dynamically link those things, that would be it. That's the challenge that I don't think is yet solved, even by

E

A lot of these large data.

C

Companies that will remain nameless, but I still haven't seen, I think, a solution that is going to yield the outcome that everybody anticipates from having these kinds of discussions.

B

You can do that and it'd.

C

There's just not an infrastructure to support what the physician needs and what the researcher needs in one place.

D

I think it's a very exciting time, but in summary, we need more data, and we need better algorithms for feature selection at the same time.

B

We already have well.

D

Well, I think it's more about the data: sequencing data from patients who are on immunotherapy trials, with robust clinical annotation. Pharma companies have it for big trials and aren't releasing it, so we need either to convince the NIH to give us, you know, millions of dollars to do these studies on the large trials that the NIH is supporting, or we need to take real

C

Action with a Pharma garma.

D

To release the data. That'd

C

Be like just taking one as award for you know, 17 years, if I.

D

Had a solution for that problem, I would have already executed yeah in a bit. You didn't have some not event to start up, be.

C

Like because that would be amazing yeah, oh yeah,.

E

Know not much where I got nothing, that more.

E

Okay, I would take well-annotated data over more data any day of the week, and I think really that's the key: to figure out some way where we can get, you know, better-annotated data, not just more data. This can happen in a couple of different ways: it's either, you know, government-sponsored research, or it's figuring out a business model that makes it attractive for the pharmaceutical companies to

E

You know, allow others to dig into their data. But neither one of those is happening right now, because, you know, it's either too expensive or it's too much of a risk, and that's really limiting. But speaking just from a data science perspective, that doesn't mean that we should just stop. That doesn't mean that we should say, "Oh, we can't get to the good data, so we're going to quit."

E

There's still enormous opportunity to take the data that are available, and companies are already doing this just from looking at public data. So there is, you know, good knowledge to be had, whether it's going into hospitals that are willing and linking up clinical data with treatment schedules, or going through and sequencing those samples, right, and kind of figuring out: here's a cohort that I can access, what data can I?

E

What data can I, you know, bring into this cohort in order to reach the goal that I want? So that's happening, and I think that's an exciting part. But back to the data science part: I think that we should continue to look for ways to aggregate data, either through, you know, old-school record linkage or newer ways of using technology, in order to figure out cohort-level associations, matching up whether it be on genetic ancestry or who knows what, right, in order to make predictions, which can be tested, about folks that are going to respond to these drugs. I think there are many opportunities.
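
As a concrete picture of what "old-school record linkage" can look like, here is a minimal sketch of deterministic linkage between a clinical table and a sequencing manifest. Everything in it is assumed for illustration: the field names, the choice of matching key (normalized last name, first initial, birth date), the toy patients, and the file path are hypothetical, and real linkage would need privacy safeguards and fuzzier matching. The sketch stores only a pointer to the sequencing file next to the clinical record, which is one way around the earlier point that genomic data can be far too large to load into the clinical database itself.

```python
def link_key(record):
    """Normalize identifying fields so formatting differences do not block a match."""
    last = "".join(ch for ch in record["last_name"].lower() if ch.isalpha())
    first_initial = record["first_name"].strip().lower()[:1]
    return (last, first_initial, record["birth_date"])

def link_records(clinical, genomic):
    """Attach a sequencing file pointer to each clinical record with exactly one match."""
    index = {}
    for rec in genomic:
        index.setdefault(link_key(rec), []).append(rec)

    linked, unmatched = [], []
    for rec in clinical:
        matches = index.get(link_key(rec), [])
        if len(matches) == 1:
            linked.append({**rec, "sample_path": matches[0]["sample_path"]})
        else:
            unmatched.append(rec)  # no match or an ambiguous one: needs review
    return linked, unmatched

# Toy records (invented); note the inconsistent name formatting across sources.
clinical_rows = [
    {"first_name": "Ana", "last_name": "O'Neil", "birth_date": "1961-03-02",
     "drug": "anti-PD-1", "responded": True},
    {"first_name": "Raj", "last_name": "Patel", "birth_date": "1955-11-20",
     "drug": "anti-CTLA-4", "responded": False},
]
genomic_rows = [
    {"first_name": "ANA ", "last_name": "ONeil", "birth_date": "1961-03-02",
     "sample_path": "samples/rnaseq_001.bam"},  # hypothetical path
]
linked, unmatched = link_records(clinical_rows, genomic_rows)
print(len(linked), "linked;", len(unmatched), "left for manual review")
```

The linked rows carry both the treatment outcome and a reference to the genomic data, which is the combination the panelists describe as missing; the unmatched list makes explicit how much of a cohort could not be joined.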

B

All right, thank you very much. Fantastic, great information. I know I've learned a lot. Thank you.

A

We have a few more announcements from the South Big Data Innovation Hub, if you hang on for five minutes. The South Hub will be opening up the application process for its Program to Empower Partnerships with Industry and government starting tomorrow. Companies and government agencies that are interested in hosting data science faculty, postdocs, and graduate students for up to 12 weeks in the summer will post descriptions of the internships and residencies available on southbigdatahub.org.

B

As of tomorrow.

A

Friday, April 14th. The PEPI program is in its second year. The big data hubs, as you may know, were awarded three million dollars in Microsoft Azure credits, along with training and technical support. The South Hub is getting 750,000 of those Azure credits, which will be opened up to competition quite soon. As part of that, Vani Mandava of Microsoft Research and her colleagues will be presenting on what they can do for our hub members, so be sure to check that out at the Data Sharing and Infrastructure working group meeting, Friday, April 28th, from 3:00 to 4:30 p.m.

A

Globus will also be presenting at that time. Again, check the website for that. For the big data innovation hubs, there will be a free, full-day Microsoft Azure training workshop up in Washington, DC, on June 8. The same day, the South Hub will be having a workshop at Intel on our digital intelligence, and then the next day, June 9, the South Hub will be hosting its second annual All Hands meeting at the Microsoft facility in Friendship Heights, right off the Metro. We'll be announcing the hold-the-date and registration shortly.

A

Sarah Davis will be sending that message out for you all, so stay tuned for details. And lastly, if you haven't seen it already, the National Science Foundation has issued a call for proposals for the next round of Spokes proposals as part of the hubs. Those grant awards range anywhere from a hundred thousand to a million. NSF is looking for cross-sector partnerships that connect data scientists with domain scientists and practitioners for real-world applications. It is a limited competition, meaning that each institution, each university, can submit one. So the universities are now having internal competitions, so look for an earlier deadline. The NSF deadline is September 18th, and you will need to obtain a letter of collaboration from your hub by June, so look for an FAQ that's coming out. The call for proposals announcement is already out there. So with that, thank you to Kimberly for a fantastic panel, and thank you to our distinguished guests who have joined us today. Those of you who are in the room, please feel free to come up to talk to our guests as you

A

Thank you very much.