From YouTube: CI WG demo: Big Data Trends for Health Care Analytics
Description
Date: 02/02/18
Presenter: Kelly Gaither
Institution: Texas Advanced Computing Center
South Big Data Hub
Moderator: And again, we'll have more people join as the call goes on, I know, but let's go ahead. I think we're all queued up for our first presentation, from Kelly Gaither. She's director of visualization and a senior research scientist, and the interim director of education and outreach, all at TACC, the Texas Advanced Computing Center, and an associate professor in women's health at the Dell Medical School at the University of Texas at Austin, where Dr. Gaither conducts research in scientific visualization, visual analytics, and augmented and virtual reality. She received her doctoral degree in computational engineering from Mississippi State, and her master's and bachelor's in computer science from Texas A&M. She has publications in fields ranging from computational mechanics to supercomputing applications to scientific visualization. She's currently a co-PI and director of community engagement and enrichment for the Extreme Science and Engineering Discovery Environment, which, as we all know, is XSEDE, and she's given a number of invited talks and keynotes, including one on our call today. With that, I'll let you take it away, Kelly. Thanks.
Kelly Gaither: So, you know, I apologize for the 2017 in the slides, but all I would have had to do was change it to 2018; these are big data trends for healthcare in 2018 as well. I do want to point out that I joined the medical school faculty this past summer, so it was an opportunity for me. My research, as she said, is primarily in visualization, but I also have sort of a computational engineering background, and Women's Health is very progressive.
Kelly Gaither: Here we have a medical school that was built, brick and mortar, from the ground up about five years ago, so they wanted to do things a little bit differently. They're trying to look at ways they can combat problems they know they have, and ways they can be very innovative by combining people from different disciplines, creating interdisciplinary and multidisciplinary teams. Just to point out some of the problems they are looking at: value-based, or patient-centric, care, where the care of the patient is actually judged.
Kelly Gaither: The problem in this space is that we've got so much data that, unless we take a small portion of it with these small, well-defined problems, it's very difficult to make sense of it. I know that there are a number of people looking at reducing fraud, waste, and abuse. I mean, you all probably already know this, but the U.S. ...
Kelly Gaither: I can't seem to... let me see... let me see if I can. Okay. So, I am working primarily, and directly, with a maternal-fetal medicine physician who is in women's health. His specialty is high-risk pregnancies, but he has a bigger vision: he knows that there are two primary problems that are driving healthcare costs and, really, at the end of the day, creating adverse outcomes. That's non-reproducibility of medical evidence, and there are a couple of factors that go into that, and overutilization. So, with non-reproducibility:
Kelly Gaither: It was a bit of an eye-opening experience. I don't have any other medical background than being a patient, so it was an eye-opening experience, a little bit like drinking through a firehose when I first got there. But it was really illuminating to understand that all of medicine is based on an averaging effect, and from my engineering background, there are a number of times where averaging means taking out some of the details that we really want to see.
Kelly Gaither: The other problem is that they rely on people understanding statistical significance. If you ask the same person whether they're willing to take a 30% chance of risk with their life versus a 30% chance of risk with their finances, it's shocking to see what they will say: for the same amount of risk, they're willing to take it with their life, but not with their finances. A little bit surprising there. We also have a problem with overutilization, and that's driven primarily by fee-for-service.
Kelly Gaither: So, you know, I went into this thinking that hospitals and clinics were nonprofits, and that was also very eye-opening. But with fee-for-service, basically, all of the private insurance pays for patients with no insurance. So there is an awful lot of ordering of extra tests, of extra diagnostic-type evaluations, that drives up private insurance fees to pay for those who are uninsured.
Kelly Gaither: Additionally, we also have doctors practicing defensive medicine. Since the 70s, when it became fairly litigious and we had malpractice suits, there are a number of doctors who don't really want to take the risk, so they'll be much more cautious; they'll prescribe way more diagnostic tests, and interventions as well. And as I said, this also contributes to rising healthcare costs and adverse outcomes.
Kelly Gaither: So what we are working on is taking a set of data. Right now we have an enormous amount of medical data, probably from a combination of sources, as much as 10 to 15 years' worth, that we are trying to put together. I will say, very honestly, that it's very ugly data, particularly coming from my engineering background, where things had a nice structure. Very rarely did we have any missing data, and that was kind of the first thing that you worked on:
Kelly Gaither: You made everything fit together in nice little puzzle pieces. With the medical data that we have, and some of the environmental factors that we have, we know from the get-go that we're going to have incomplete data. We know that it's messy. We know that it's unstructured. We know, oftentimes, particularly if the data comes from a public source, that some of the details have been taken out of it to maintain the privacy of the individuals, and that makes it really complicated.
Kelly Gaither: But what we're trying to do is individualize: given a person's baseline risk and their characteristics, individualize diagnosis and treatment effects so that we can communicate that and really put the decision where it belongs. There's been an awful lot of conversation in women's health about the fact that informed consent oftentimes isn't informed, or at least not well informed. And in fact, medical decisions become more and more emotionally based when there is the possibility of an adverse outcome; that's when people, oftentimes, really, truly don't understand
Kelly Gaither: what they're consenting to. We are using data analytics (this is certainly a big data problem), but also visualization, to communicate to the stakeholder population. That includes physicians, it includes patients, and it also includes policymakers and business and industry as well: all of the stakeholder populations in the decision-making process. So, we are developing... I like to put it this way:
Kelly Gaither: We are starting with particular projects in women's health. But let me back up for just a second and give you some idea of the scale of the data that we're looking at in the future. I'm just going to talk about the state of Texas, which is roughly 30 million people; it stays fairly constant over time. If you look at people's individual genetic code (I get a lot of questions about whether we are doing genetic variants):
Kelly Gaither: Well, we have, you know, only about 3 million variants that make us the special flowers, or special snowflakes, that we are; that's really only about 125 megabytes. Because we've been around as a species for a very long time, we share a lot of genetic overlap in our code; there is, relatively, only a very small amount that makes us individuals. So let's go and look at what we collect over time: EMR, electronic medical records; EHR, electronic health records; HIE, health information exchanges.
Kelly Gaither: If we look at what's collected there: if you are a healthy adult, they're going to collect roughly on the scale of less than a megabyte per year. If you are unhealthy, but they don't collect images for you, that's roughly about forty megabytes. If you are unhealthy and you also have images associated with your records, it's about 300 megabytes a year. Again, really not that big a deal. But what we also know is that the medical information we're collecting through the EMRs is really only a portion of the story.
Kelly Gaither: So there's a lot of information that we know goes into causation that we know we don't know. There are people working on life history trails: trying to collect the decisions people make, gathering information about environmental factors, about travel, about where you were, trying to piece these things together. For an individual's life history trail, we're talking about approximately 50 terabytes per year.
Kelly Gaither: Now that's getting into some sort of a reasonable scale. But if we look at the data for the population of the state of Texas, or a state this size, we're looking at 1.59 zettabytes of data per year. Here's the problem: there's an enormous amount of data here that we really probably don't need, and in fact absolutely don't need. But right now we don't know a lot of the factors that go into causation; there's an awful lot of historical assumptions, and a lot of conventional wisdom, that go into making decisions.
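The arithmetic behind these scale figures is easy to sanity-check. Here is a minimal sketch using only the approximate per-person numbers quoted in the talk; the figures are the talk's round estimates, not authoritative statistics:

```python
# Back-of-the-envelope check of the data-scale figures quoted in the talk.
# All numbers are the talk's approximations, not authoritative statistics.

MB = 10**6          # bytes in a megabyte (decimal convention)
TB = 10**12         # bytes in a terabyte
ZB = 10**21         # bytes in a zettabyte

texas_population = 30_000_000          # "roughly 30 million people"

genome_variants_bytes = 125 * MB       # ~3M variants -> ~125 MB per person
healthy_emr_per_year = 1 * MB          # healthy adult: < 1 MB/year
unhealthy_emr_per_year = 40 * MB       # unhealthy, no imaging: ~40 MB/year
unhealthy_imaging_per_year = 300 * MB  # unhealthy, with imaging: ~300 MB/year

life_history_per_year = 50 * TB        # "life history trail": ~50 TB/person/year

# Statewide life-history data per year, in zettabytes
statewide_zb = texas_population * life_history_per_year / ZB
print(f"{statewide_zb:.2f} ZB/year")   # prints 1.50 ZB/year, the zettabyte scale quoted above
```

So a single individual's records stay in the megabyte range, but a statewide life-history trail crosses into zettabytes; it is the trail data, not the EMRs, that drives the scale problem.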
Kelly Gaither: What do we need to keep over time? A little bit about what we're working on right now: it's in women's health, primarily. If we look at pregnancy, we're looking at problems that we know have a known time frame and a known outcome. Compare that with something like cancer: looking at drug trials for cancer, you're looking at ten to fifteen to twenty years before you really know what the outcomes are. If we look at pregnancy, we know that we have nine months.
Kelly Gaither: I maintain it's ten months, but nine months until you have a known outcome, at the latest, and then we have an awful lot of data that we can go back and look at. Right now, we're looking at the risk of stillbirth versus the risk of neonatal death, trying to determine an individual woman's optimal time of delivery and what happens.
Kelly Gaither: We've got some preliminary results that suggest it can shift as much as six to seven weeks, meaning that there are instances and populations we can characterize as needing to be induced at 36 weeks, all the way up to those where the baby probably does need to stay in utero up to 42 weeks, and it makes a significant amount of difference. We found that some of the conventional wisdom we took for granted, for example mother's age, does not have an overwhelming influence on the outcome.
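The logic behind an "optimal time of delivery" can be illustrated with a toy calculation: at each gestational week, weigh the neonatal risk of delivering then against the cumulative stillbirth risk of continuing to wait, and pick the week that minimizes the total. The per-week numbers below are invented purely for illustration; they are not the study's estimates.

```python
# Toy illustration of choosing an optimal delivery week by comparing two risks.
# The per-week risk numbers are invented for illustration only.

# Hypothetical risk per 10,000 pregnancies if delivered at that week (neonatal risk)
neonatal = {36: 18.0, 37: 10.0, 38: 6.0, 39: 4.0, 40: 3.5, 41: 3.8, 42: 4.5}
# Hypothetical risk per 10,000 ongoing pregnancies of stillbirth during that week
stillbirth = {36: 2.0, 37: 2.5, 38: 3.0, 39: 4.0, 40: 5.5, 41: 8.0, 42: 12.0}

def total_risk(week):
    """Risk of delivering at `week`: neonatal risk at that week plus the
    cumulative stillbirth risk accrued while waiting for that week."""
    waiting = sum(stillbirth[w] for w in stillbirth if w < week)
    return neonatal[week] + waiting

best = min(neonatal, key=total_risk)
print(best, total_risk(best))  # with these invented numbers, the minimum falls at week 38
```

Characteristic-specific risk curves shift this minimum, which is how the same calculation can recommend induction at 36 weeks for one stratum and waiting until 42 weeks for another.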
Kelly Gaither: Some of the conventional thoughts that we had, like weight gain: they really are using a rule of thumb with weight gain, so we're starting another project to go back and look at all of the factors that weight gain influences as well. We are currently using publicly available data sets, which come with all of their issues and problems.
Kelly Gaither: So, as you can imagine, data cleaning and data verification, really trying to go through 4.2 million births, or birth outcomes, a year, with roughly ten years of data, is quite challenging. It's something we're trying to develop visualization tools for, to help us with the analysis and then also to communicate to a broader population.
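The kind of cleaning and verification described here can be sketched in a few lines. The field names, sentinel codes, and validity ranges below are hypothetical examples, not the project's actual schema:

```python
# Minimal sketch of cleaning messy public birth-records data.
# Field names and validity ranges are hypothetical, not the real schema.

raw_records = [
    {"gest_weeks": "38", "mother_age": "29", "outcome": "live"},
    {"gest_weeks": "",   "mother_age": "31", "outcome": "live"},  # missing value
    {"gest_weeks": "99", "mother_age": "27", "outcome": "live"},  # sentinel code
    {"gest_weeks": "41", "mother_age": "abc", "outcome": "live"}, # garbled entry
]

def to_int(value, valid_range):
    """Parse a field; return None for blanks, garbage, or out-of-range codes."""
    try:
        n = int(value)
    except ValueError:
        return None
    lo, hi = valid_range
    return n if lo <= n <= hi else None

def clean(rec):
    return {
        "gest_weeks": to_int(rec["gest_weeks"], (20, 44)),
        "mother_age": to_int(rec["mother_age"], (10, 60)),
        "outcome": rec["outcome"],
    }

cleaned = [clean(r) for r in raw_records]
# Keep only records complete enough for the analysis at hand
usable = [r for r in cleaned if r["gest_weeks"] is not None]
print(len(usable))  # only the parseable, in-range records survive: 2
```

At 4.2 million records a year for ten years, every rule like this has to be validated against the codebook for each public source, which is where most of the effort goes.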
But I'll leave you with just a couple of items from my perspective. HPC and data science is not approachable by, or targeted for, other domains. It's not taught in an interdisciplinary context.
Kelly Gaither: One of the first things I noticed with the physicians was that an awful lot of time in the very beginning went into translation: the same word in my background's vernacular meant something completely different in theirs, and there was this translation period where we had to really learn how to communicate with each other. I think we can actually teach high-performance computing, data science, and even visualization from the perspective of problem solving, and then dive down into the guts. In medicine specifically, legacy decisions are strangling our progress.
Kelly Gaither: Data is viewed as intellectual capital, and it's very difficult to get the hospitals, some of the organizations that own data, to let go of it. Long-term decisions are being made on some very naive concepts of scale. They are hamstrung at this point by decisions they made when they did not fully understand how big this could grow, and now they have to try to architect around it. And computational science principles are just now being taught in medical schools. There are a couple of medical schools around the U.S.
B
that
are
actually
teaching
a
new
breed
of
doctor
or
physician
so
that
they
really
are
more
technically
savvy
and
more
comfortable
with
data
and
all
of
the
information
that
they're
trying
to
put
together,
but
the
one
thing
I
do
know
and
I'm.
Certain
of
is
that
medicine
is
already
moving
in
a
data-driven
direction
and
they
are
already
using
it
to
try
to
make
data-driven
evidence-based
decision-making.
Rather
than
just
going
with
their
gut,
so
thank
you
so
much
I
will
open
it
up
for
questions.
Moderator: Kelly, thank you, that was extremely interesting. I'd like to ask you a quick question while others think about what they might want to ask you. On your last slide, you said that there are some medical schools that are doing more to teach computational science techniques. Can you think of any, offhand, that are exemplars?
Kelly Gaither: Yeah, there's work being done, and I want to say it's Johns Hopkins. They're teaching visualization, primarily; not so much high-performance computing. It's really more what they call big data, or data analytics. These guys know statistics very well, but they don't know any of the computational methods for machine learning, or any sort of more exotic analytics, which to you and me might not be so exotic, but to them would be beyond sort of what you would do through MATLAB or R. It's a little bit of magic to them.
Moderator: Thank you. We have time for maybe one or two questions for Kelly.
Audience member: This is Florence, if I could ask a question, or make a comment. This was very interesting, Kelly, and it was great to hear what you're saying. Are you familiar with the Computational Approaches for Cancer workshop that occurs at Supercomputing every year? The Frederick National Laboratory for Cancer Research and Mount Sinai out of New York are kind of leading it; I'm on the program committee, still, for that. They're actually looking at how we marry, you know, people from DOE, who are used to working with high-energy physics computational algorithms, to apply them to cancer research.
Kelly Gaither: So, we are partnered up. In fact, there is someone, Tom Yankeelov, at UT who's doing computational cancer research. We, as Women's Health, but as a med school, are partnered up with the Institute for Computational Engineering and Sciences at UT Austin, and, like I said, my background is in computational engineering, so I'm very familiar with the simulation models based on physics, the physical simulations we are using for known outcomes.
Kelly Gaither: We are using inverse Bayesian methods, which have been used quite a bit by DOE, to measure uncertainty and to go back and fill in some of the gaps. So, for example, Omar Ghattas: we are partnered up with him to do inverse Bayesian methods, to go back and see if we can predict these known outcomes given what we know is an incomplete set of data. But yeah, absolutely, it's very similar; we're specifically in women's health.
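The core Bayesian idea, quantifying uncertainty about an unknown quantity by updating a prior with observed data, can be shown with a minimal conjugate-update sketch. This illustrates only the general principle with made-up counts; the group's actual inverse-problem machinery is far more involved:

```python
# Minimal sketch of Bayesian updating for an uncertain rare-outcome rate.
# Illustrates the general principle only; the counts below are invented.

def beta_posterior(prior_a, prior_b, events, trials):
    """Conjugate Beta-Binomial update: returns posterior (a, b) parameters."""
    return prior_a + events, prior_b + (trials - events)

# Weak Beta(1, 1) prior, then observe 30 adverse events in 10,000 pregnancies
a, b = beta_posterior(prior_a=1.0, prior_b=1.0, events=30, trials=10_000)

posterior_mean = a / (a + b)  # point estimate of the rate under the posterior
print(f"posterior mean rate: {posterior_mean:.4f}")  # prints 0.0031
```

The posterior carries not just a point estimate but a full distribution over the rate, which is what lets an incomplete data set still yield calibrated uncertainty about a rare outcome.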
Kelly Gaither: Right now we're really in the area of rare outcomes; a lot of what we do is there because, for example, neonatal death, maternal mortality, and stillbirth are relatively rare occurrences. But there is nothing to prevent us from going out to other rare diseases and rare outcomes in other medical fields.
Audience member: I was really interested in the difference between average effects and individual-specific effects, which are obviously what matters from the patient's perspective. But I'm curious, in sort of the big data context: we have a lot of routine data, but not necessarily an experiment that you're building this off of. How do you square that with the causal inferences that you would want to make at the individual level?
Kelly Gaither: So we're not... yeah, that's a good question. What we're trying to do primarily, now, is stratify risk, which doesn't necessarily get down to an individual person's level; it gets down to an individual set of characteristics. So, for example, the risk of... well, in the U.S., we know... like in Texas, we have the highest rate of preterm birth.
Kelly Gaither: We have the highest rate of maternal mortality, and it's increased dramatically since 2010. The chances of us finding the exact cause and being able to prevent an exact individual from dying, or from having a preterm birth, may be rather slim. But what we are able to do, given a large amount of data, is look at all the characteristics and see whether, in fact, your race is a factor, or whether having a baby boy or a baby girl is a factor, or whether there are certain sets of characteristics. The idea being that, if we can at the very least stratify risk, then we can actually point more intensive medical care towards that higher-risk population. And we've actually been able, in this massive set of data that we have, to at least identify that those risk stratifications do seem to hold steady; they do seem to bear out. Cause is a different story.
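Risk stratification as described here amounts to computing outcome rates within groups defined by sets of characteristics, then directing attention to the highest-rate strata. A minimal sketch with invented records and strata:

```python
# Minimal sketch of risk stratification: outcome rates per characteristic group.
# The records, strata, and outcome flags are invented for illustration.

from collections import defaultdict

records = [
    {"group": ("A", "boy"),  "adverse": True},
    {"group": ("A", "boy"),  "adverse": False},
    {"group": ("A", "girl"), "adverse": False},
    {"group": ("B", "boy"),  "adverse": False},
    {"group": ("B", "girl"), "adverse": False},
    {"group": ("B", "girl"), "adverse": False},
]

counts = defaultdict(lambda: [0, 0])   # stratum -> [adverse count, total count]
for r in records:
    counts[r["group"]][0] += r["adverse"]
    counts[r["group"]][1] += 1

rates = {g: adverse / total for g, (adverse, total) in counts.items()}

# Direct more intensive care toward the highest-rate stratum
highest = max(rates, key=rates.get)
print(highest, rates[highest])  # the ("A", "boy") stratum leads with these toy records
```

Note this yields group-level associations, not individual-level causal effects: the stratified rates say where to concentrate care, not what caused any one outcome, which matches the distinction drawn in the answer above.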