From YouTube: CI WG demo: Big Data Trends for Health Care Analytics
Description
Date: 02/02/18
Presenter: Kelly Gaither
Institution: Texas Advanced Computing Center
South Big Data Hub
Moderator: And again, we'll have more people join as the call goes on, I know, but let's go ahead. I think we're all queued up for our first presentation, from Kelly Gaither. She's director of visualization and a senior research scientist, and the interim director of education and outreach, all at TACC, the Texas Advanced Computing Center, and an associate professor in women's health at the Dell Medical School at the University of Texas at Austin, where Dr. Gaither conducts research in scientific visualization, visual analytics, and augmented and virtual reality. She received her doctoral degree in computational engineering from Mississippi State, and her master's and bachelor's in computer science from Texas A&M. She has publications in fields ranging from computational mechanics to supercomputing applications to scientific visualization. She's currently a co-PI and director of community engagement and enrichment for the Extreme Science and Engineering Discovery Environment, which, as we all know, is XSEDE, and she's given a number of invited talks and keynotes, including one on our call today. With that, I'll let you take it away, Kelly. Thanks.
Kelly Gaither: So, you know, I apologize for the 2017 in the slides, but all I would have had to do was change it to 2018; these are big data trends for healthcare in 2018 as well. I do want to point out that I joined the medical school faculty this past summer, so it was an opportunity for me. My research, as she said, is primarily in visualization, but I also have sort of a computational engineering background, and Women's Health is very progressive.
Kelly Gaither: Here we have a medical school that was built, brick and mortar, from the ground up about five years ago, so they wanted to do things a little bit differently. They're trying to look at ways they can combat problems they know they have, and ways they can be very innovative by combining people from different disciplines, creating interdisciplinary and multidisciplinary teams. Just to point out some of the problems they are looking at: value-based, or patient-centric, care, where the care of the patient is actually judged.
Kelly Gaither: The problem in this space is that we've got so much data that, unless we take a small portion of it with these small, well-defined problems, it's very difficult to make sense of it. I know that there are a number of people looking at reducing fraud, waste, and abuse. I mean, you all probably already know this, but the U.S. ...
Kelly Gaither: I can't seem to... let me see... let me see if I can. Okay. So, I am working primarily, and directly, with a maternal-fetal medicine physician who is in women's health. His specialty is high-risk pregnancies, but he has a bigger vision: he knows that there are two primary problems that are driving healthcare costs and, really, at the end of the day, creating adverse outcomes. That's non-reproducibility of medical evidence, and there are a couple of factors that go into that, and overutilization. So, with non-reproducibility:
Kelly Gaither: It was a bit of an eye-opening experience. I don't have any other medical background than being a patient, so it was an eye-opening experience, a little bit like drinking through a firehose when I first got there. But it was really illuminating to understand that all of medicine is based on an averaging effect, and from my engineering background, there are a number of times where averaging means taking out some of the details that we really want to see.
Kelly Gaither: The other problem is that they rely on people understanding statistical significance. If you ask the same person whether they're willing to take a 30% chance of risk with their life versus a 30% chance of risk with their finances, it's shocking to see what they will say: for the same amount of risk, they're willing to take it with their life, but not with their finances. A little bit surprising there. We also have a problem with overutilization, and that's driven primarily by fee-for-service.
Kelly Gaither: So, you know, I went into this thinking that hospitals and clinics were nonprofits, and that was also very eye-opening. But with fee-for-service, basically, all of the private insurance pays for patients with no insurance. So there is an awful lot of ordering of extra tests, of extra diagnostic-type evaluations, that drives up private insurance fees to pay for those who are uninsured.
Kelly Gaither: Additionally, we also have doctors practicing defensive medicine. Since the 70s, when it became fairly litigious and we had malpractice suits, there are a number of doctors who don't really want to take the risk, so they'll be much more cautious; they'll prescribe way more diagnostic tests, and interventions as well. And as I said, this also contributes to rising healthcare costs and adverse outcomes.
Kelly Gaither: So what we are working on is taking a set of data. Right now we have an enormous amount of medical data, probably from a combination of sources, as much as 10 to 15 years' worth, that we are trying to put together. I will say, very honestly, that it's very ugly data, particularly coming from my engineering background, where things had a nice structure. Very rarely did we have any missing data, and that was kind of the first thing that you worked on:
Kelly Gaither: You made everything fit together in nice little puzzle pieces. With the medical data that we have, and some of the environmental factors that we have, we know from the get-go that we're going to have incomplete data. We know that it's messy. We know that it's unstructured. We know, oftentimes, particularly if the data comes from a public source, that some of the details have been taken out of it to maintain the privacy of the individuals, and that makes it really complicated.
Kelly Gaither: But what we're trying to do is individualize: given a person's baseline risk and their characteristics, individualize diagnosis and treatment effects so that we can communicate that and really put the decision where it belongs. There's been an awful lot of conversation in women's health about the fact that informed consent oftentimes isn't informed, or at least not well informed. And in fact, medical decisions become more and more emotionally based when there is the possibility of an adverse outcome; that's when people, oftentimes, really, truly don't understand
Kelly Gaither: what they're consenting to. We are using data analytics (this is certainly a big data problem), but also visualization, to communicate to the stakeholder population. That includes physicians, it includes patients, and it also includes policymakers and business and industry as well: all of the stakeholder populations in the decision-making process. So, we are developing... I like to put it this way:
Kelly Gaither: We are starting with particular projects in women's health. But let me back up for just a second and give you some idea of the scale of the data that we're looking at in the future. I'm just going to talk about the state of Texas, which is roughly 30 million people; it stays fairly constant over time. If you look at people's individual genetic code (I get a lot of questions about whether we are doing genetic variants):
Kelly Gaither: Well, we have, you know, only about 3 million variants that make us the special flowers, or special snowflakes, that we are; that's really only about 125 megabytes. Because we've been around as a species for a very long time, we share a lot of genetic overlap in our code; there is, relatively, only a very small amount that makes us individuals. So let's go and look at what we collect over time: EMR, electronic medical records; EHR, electronic health records; HIE, health information exchanges.
Kelly Gaither: If we look at what's collected there: if you are a healthy adult, they're going to collect roughly on the scale of less than a megabyte per year. If you are unhealthy, but they don't collect images for you, that's roughly about forty megabytes. If you are unhealthy and you also have images associated with your records, it's about 300 megabytes a year. Again, really not that big a deal. But what we also know is that the medical information we're collecting through the EMRs is really only a portion of the story.
Kelly Gaither: So there's a lot of information that we know goes into causation that we know we don't know. There are people working on life history trails: trying to collect the decisions people make, gathering information about environmental factors, about travel, about where you were, trying to piece these things together. For an individual's life history trail, we're talking about approximately 50 terabytes per year.
Kelly Gaither: Now that's getting into some sort of a reasonable scale. But if we look at the data for the population of the state of Texas, or a state this size, we're looking at 1.59 zettabytes of data per year. Here's the problem: there's an enormous amount of data here that we really probably don't need, and in fact absolutely don't need. But right now we don't know a lot of the factors that go into causation; there's an awful lot of historical assumptions, and a lot of conventional wisdom, that go into making decisions.
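The arithmetic behind these scale figures is easy to sanity-check. Here is a minimal sketch using only the approximate per-person numbers quoted in the talk; the figures are the talk's round estimates, not authoritative statistics:

```python
# Back-of-the-envelope check of the data-scale figures quoted in the talk.
# All numbers are the talk's approximations, not authoritative statistics.

MB = 10**6          # bytes in a megabyte (decimal convention)
TB = 10**12         # bytes in a terabyte
ZB = 10**21         # bytes in a zettabyte

texas_population = 30_000_000          # "roughly 30 million people"

genome_variants_bytes = 125 * MB       # ~3M variants -> ~125 MB per person
healthy_emr_per_year = 1 * MB          # healthy adult: < 1 MB/year
unhealthy_emr_per_year = 40 * MB       # unhealthy, no imaging: ~40 MB/year
unhealthy_imaging_per_year = 300 * MB  # unhealthy, with imaging: ~300 MB/year

life_history_per_year = 50 * TB        # "life history trail": ~50 TB/person/year

# Statewide life-history data per year, in zettabytes
statewide_zb = texas_population * life_history_per_year / ZB
print(f"{statewide_zb:.2f} ZB/year")   # prints 1.50 ZB/year, the zettabyte scale quoted above
```

So a single individual's records stay in the megabyte range, but a statewide life-history trail crosses into zettabytes; it is the trail data, not the EMRs, that drives the scale problem.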
Kelly Gaither: What do we need to keep over time? A little bit about what we're working on right now: it's in women's health, primarily. If we look at pregnancy, we're looking at problems that we know have a known time frame and a known outcome. Compare that with something like cancer: looking at drug trials for cancer, you're looking at ten to fifteen to twenty years before you really know what the outcomes are. If we look at pregnancy, we know that we have nine months.
Kelly Gaither: I maintain it's ten months, but nine months until you have a known outcome, at the latest, and then we have an awful lot of data that we can go back and look at. Right now, we're looking at the risk of stillbirth versus the risk of neonatal death, trying to determine an individual woman's optimal time of delivery and what happens.
Kelly Gaither: We've got some preliminary results that suggest it can shift as much as six to seven weeks, meaning that there are instances and populations we can characterize as needing to be induced at 36 weeks, all the way up to those where the baby probably does need to stay in utero up to 42 weeks, and it makes a significant amount of difference. We found that some of the conventional wisdom we took for granted, for example mother's age, does not have an overwhelming influence on the outcome.
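The logic behind an "optimal time of delivery" can be illustrated with a toy calculation: at each gestational week, weigh the neonatal risk of delivering then against the cumulative stillbirth risk of continuing to wait, and pick the week that minimizes the total. The per-week numbers below are invented purely for illustration; they are not the study's estimates.

```python
# Toy illustration of choosing an optimal delivery week by comparing two risks.
# The per-week risk numbers are invented for illustration only.

# Hypothetical risk per 10,000 pregnancies if delivered at that week (neonatal risk)
neonatal = {36: 18.0, 37: 10.0, 38: 6.0, 39: 4.0, 40: 3.5, 41: 3.8, 42: 4.5}
# Hypothetical risk per 10,000 ongoing pregnancies of stillbirth during that week
stillbirth = {36: 2.0, 37: 2.5, 38: 3.0, 39: 4.0, 40: 5.5, 41: 8.0, 42: 12.0}

def total_risk(week):
    """Risk of delivering at `week`: neonatal risk at that week plus the
    cumulative stillbirth risk accrued while waiting for that week."""
    waiting = sum(stillbirth[w] for w in stillbirth if w < week)
    return neonatal[week] + waiting

best = min(neonatal, key=total_risk)
print(best, total_risk(best))  # with these invented numbers, the minimum falls at week 38
```

Characteristic-specific risk curves shift this minimum, which is how the same calculation can recommend induction at 36 weeks for one stratum and waiting until 42 weeks for another.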
Kelly Gaither: Some of the conventional thoughts that we had, like weight gain: they really are using a rule of thumb with weight gain, so we're starting another project to go back and look at all of the factors that weight gain influences as well. We are currently using publicly available data sets, which come with all of their issues and problems.
Kelly Gaither: So, as you can imagine, data cleaning and data verification, really trying to go through 4.2 million births, or birth outcomes, a year, with roughly ten years of data, is quite challenging. It's something we're trying to develop visualization tools for, to help us with the analysis and then also to communicate to a broader population.
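The kind of cleaning and verification described here can be sketched in a few lines. The field names, sentinel codes, and validity ranges below are hypothetical examples, not the project's actual schema:

```python
# Minimal sketch of cleaning messy public birth-records data.
# Field names and validity ranges are hypothetical, not the real schema.

raw_records = [
    {"gest_weeks": "38", "mother_age": "29", "outcome": "live"},
    {"gest_weeks": "",   "mother_age": "31", "outcome": "live"},  # missing value
    {"gest_weeks": "99", "mother_age": "27", "outcome": "live"},  # sentinel code
    {"gest_weeks": "41", "mother_age": "abc", "outcome": "live"}, # garbled entry
]

def to_int(value, valid_range):
    """Parse a field; return None for blanks, garbage, or out-of-range codes."""
    try:
        n = int(value)
    except ValueError:
        return None
    lo, hi = valid_range
    return n if lo <= n <= hi else None

def clean(rec):
    return {
        "gest_weeks": to_int(rec["gest_weeks"], (20, 44)),
        "mother_age": to_int(rec["mother_age"], (10, 60)),
        "outcome": rec["outcome"],
    }

cleaned = [clean(r) for r in raw_records]
# Keep only records complete enough for the analysis at hand
usable = [r for r in cleaned if r["gest_weeks"] is not None]
print(len(usable))  # only the parseable, in-range records survive: 2
```

At 4.2 million records a year for ten years, every rule like this has to be validated against the codebook for each public source, which is where most of the effort goes.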
But I'll leave you with just a couple of items from my perspective. HPC and data science is not approachable by, or targeted for, other domains. It's not taught in an interdisciplinary context.
Kelly Gaither: One of the first things I noticed with the physicians was that an awful lot of time in the very beginning went into translation: the same word in my background's vernacular meant something completely different in theirs, and there was this translation period where we had to really learn how to communicate with each other. I think we can actually teach high-performance computing, data science, and even visualization from the perspective of problem solving, and then dive down into the guts. In medicine specifically, legacy decisions are strangling our progress.
Kelly Gaither: Data is viewed as intellectual capital, and it's very difficult to get the hospitals, some of the organizations that own data, to let go of it. Long-term decisions are being made on some very naive concepts of scale. They are hamstrung at this point by decisions they made when they did not fully understand how big this could grow, and now they have to try to architect around it. And computational science principles are just now being taught in medical schools. There are a couple of medical schools around the U.S.
B
that
are
actually
teaching
a
new
breed
of
doctor
or
physician
so
that
they
really
are
more
technically
savvy
and
more
comfortable
with
data
and
all
of
the
information
that
they're
trying
to
put
together,
but
the
one
thing
I
do
know
and
I'm.
Certain
of
is
that
medicine
is
already
moving
in
a
data-driven
direction
and
they
are
already
using
it
to
try
to
make
data-driven
evidence-based
decision-making.
Rather
than
just
going
with
their
gut,
so
thank
you
so
much
I
will
open
it
up
for
questions.
Moderator: Kelly, thank you, that was extremely interesting. I'd like to ask you a quick question while others think about what they might want to ask you. On your last slide, you said that there are some medical schools that are doing more to teach computational science techniques. Can you think of any, offhand, that are exemplars?
Kelly Gaither: Yeah, there's work being done, and I want to say it's Johns Hopkins. They're teaching visualization, primarily; not so much high-performance computing. It's really more what they call big data, or data analytics. These guys know statistics very well, but they don't know any of the computational methods for machine learning, or any sort of more exotic analytics, which to you and me might not be so exotic, but to them would be beyond sort of what you would do through MATLAB or R. It's a little bit of magic to them.
Moderator: Thank you. We have time for maybe one or two questions for Kelly.
Audience member: This is Florence, if I could ask a question, or make a comment. This was very interesting, Kelly, and it was great to hear what you're saying. Are you familiar with the Computational Approaches for Cancer workshop that occurs at Supercomputing every year? The Frederick National Laboratory for Cancer Research and Mount Sinai out of New York are kind of leading it; I'm on the program committee, still, for that. They're actually looking at how we marry, you know, people from DOE, who are used to working with high-energy physics computational algorithms, to apply them to cancer research.
Kelly Gaither: So, we are partnered up. In fact, there is someone, Tom Yankeelov, at UT who's doing computational cancer research. We, as Women's Health, but as a med school, are partnered up with the Institute for Computational Engineering and Sciences at UT Austin, and, like I said, my background is in computational engineering, so I'm very familiar with the simulation models based on physics, the physical simulations we are using for known outcomes.
Kelly Gaither: We are using inverse Bayesian methods, which have been used quite a bit by DOE, to measure uncertainty and to go back and fill in some of the gaps. So, for example, Omar Ghattas: we are partnered up with him to do inverse Bayesian methods, to go back and see if we can predict these known outcomes given what we know is an incomplete set of data. But yeah, absolutely, it's very similar; we're specifically in women's health.
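The core Bayesian idea, quantifying uncertainty about an unknown quantity by updating a prior with observed data, can be shown with a minimal conjugate-update sketch. This illustrates only the general principle with made-up counts; the group's actual inverse-problem machinery is far more involved:

```python
# Minimal sketch of Bayesian updating for an uncertain rare-outcome rate.
# Illustrates the general principle only; the counts below are invented.

def beta_posterior(prior_a, prior_b, events, trials):
    """Conjugate Beta-Binomial update: returns posterior (a, b) parameters."""
    return prior_a + events, prior_b + (trials - events)

# Weak Beta(1, 1) prior, then observe 30 adverse events in 10,000 pregnancies
a, b = beta_posterior(prior_a=1.0, prior_b=1.0, events=30, trials=10_000)

posterior_mean = a / (a + b)  # point estimate of the rate under the posterior
print(f"posterior mean rate: {posterior_mean:.4f}")  # prints 0.0031
```

The posterior carries not just a point estimate but a full distribution over the rate, which is what lets an incomplete data set still yield calibrated uncertainty about a rare outcome.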
Kelly Gaither: Right now we're really in the area of rare outcomes; a lot of what we do is there because, for example, neonatal death, maternal mortality, and stillbirth are relatively rare occurrences. But there is nothing to prevent us from going out to other rare diseases and rare outcomes in other medical fields.
Audience member: I was really interested in the difference between average effects and individual-specific effects, which are obviously what matters from the patient's perspective. But I'm curious, in sort of the big data context: we have a lot of routine data, but not necessarily an experiment that you're building this off of. How do you square that with the causal inferences that you would want to make at the individual level?
Kelly Gaither: So we're not... yeah, that's a good question. What we're trying to do primarily, now, is stratify risk, which doesn't necessarily get down to an individual person's level; it gets down to an individual set of characteristics. So, for example, the risk of... well, in the U.S., we know... like in Texas, we have the highest rate of preterm birth.
Kelly Gaither: We have the highest rate of maternal mortality, and it's increased dramatically since 2010. The chances of us finding the exact cause and being able to prevent an exact individual from dying, or from having a preterm birth, may be rather slim. But what we are able to do, given a large amount of data, is look at all the characteristics and see whether, in fact, your race is a factor, or whether having a baby boy or a baby girl is a factor, or whether there are certain sets of characteristics. The idea being that, if we can at the very least stratify risk, then we can actually point more intensive medical care towards that higher-risk population. And we've actually been able, in this massive set of data that we have, to at least identify that those risk stratifications do seem to hold steady; they do seem to bear out. Cause is a different story.
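Risk stratification as described here amounts to computing outcome rates within groups defined by sets of characteristics, then directing attention to the highest-rate strata. A minimal sketch with invented records and strata:

```python
# Minimal sketch of risk stratification: outcome rates per characteristic group.
# The records, strata, and outcome flags are invented for illustration.

from collections import defaultdict

records = [
    {"group": ("A", "boy"),  "adverse": True},
    {"group": ("A", "boy"),  "adverse": False},
    {"group": ("A", "girl"), "adverse": False},
    {"group": ("B", "boy"),  "adverse": False},
    {"group": ("B", "girl"), "adverse": False},
    {"group": ("B", "girl"), "adverse": False},
]

counts = defaultdict(lambda: [0, 0])   # stratum -> [adverse count, total count]
for r in records:
    counts[r["group"]][0] += r["adverse"]
    counts[r["group"]][1] += 1

rates = {g: adverse / total for g, (adverse, total) in counts.items()}

# Direct more intensive care toward the highest-rate stratum
highest = max(rates, key=rates.get)
print(highest, rates[highest])  # the ("A", "boy") stratum leads with these toy records
```

Note this yields group-level associations, not individual-level causal effects: the stratified rates say where to concentrate care, not what caused any one outcome, which matches the distinction drawn in the answer above.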