From YouTube: Data Science Challenges for Cancer Immunotherapy
Description
Held on Thursday, April 13, 2017.
Panelists: Joel Parker of the Lineberger Comprehensive Cancer Center; Benjamin Vincent of UNC-Chapel Hill; and Victor Weigman of Q2 Solutions, a Quintiles Quest Joint Venture.
About: The age of immuno-oncology is upon us: new cancer immunotherapies are providing fresh hope to patients who previously had few treatment options. Combining these technologies with the “Cancer Moon-shot,” the sky’s the limit.
However, immuno-oncology exists at the intersection of oncology, immunology and molecular biology, each of which alone brings significant data science challenges.
For more information about the Data Science Roundtable series, visit bit.ly/SBDHroundtables.
A
Good afternoon, and thank you all for joining us. I'm Dr. Lea Shanley, co-executive director of the South Big Data Hub. For those of you who may not be familiar with the South Hub, or the big data innovation hubs generally, of which there are four: they were launched by the National Science Foundation to serve as a catalyst to help build and strengthen public-private partnerships that apply data science to real-world challenges. Before we get started, I ask that those in the room mute your phones. For those who are watching online,
A
we ask you to mute your mics, so we don't have screaming babies or barking dogs throughout the panel. You can ask questions of the panelists when we move to that part of the program by typing your questions in the chat box; Carl will then convey the questions to the panelists, and they will respond.
A
Let's see, to tweet: those of you who are tweeting, use the hashtag #SBDH17 (SBDH for South Big Data Hub) or the hashtag #BDHubs, and then you'll catch all the hub posts. So we welcome you to our South Hub Data Science Roundtable series. I think we're on number five or six of the series. It is a monthly series that highlights emerging research challenges in data science and identifies potential solutions.
A
Today's discussion will focus on navigating questions of data management, data sharing, privacy and more, in order best to take advantage of the opportunities offered by the promising new field of immuno-oncology. I'd like to start things off by introducing today's moderator, Dr. Kimberly Robasky. Kimberly works in translational sciences here at the Renaissance Computing Institute and is an adjunct professor in the UNC-Chapel Hill Department of Genetics. At the Renaissance Computing Institute, she supports best practices for cyberinfrastructure and new business development, especially in the domains of biomedical and genomic initiatives.
B
We are very, very pleased to have these three very distinguished panelists join us today, and I'd like to introduce them. Dr. Benjamin Vincent, in the center, is an assistant professor of medicine in the Division of Hematology and Oncology at the University of North Carolina at Chapel Hill. Dr. Vincent was trained in cellular immunology and immunogenetics in the laboratory of Dr. Jeffrey Frelinger, former chairman of the UNC Department of Microbiology and Immunology and past president of
B
the American Association of Immunologists. Dr. Vincent has also completed his research fellowship in the lab of Dr. Jonathan Serody, and he is currently a member of the Lineberger Comprehensive Cancer Center immunotherapy group, faculty director of the immunogenetics facility, and leader of the MP1U preclinical immunotherapy program. So thank you, Dr. Vincent. Dr. Joel Parker, to Dr. Vincent's left, is director of sequencing, microarray and other genomic analysis for the bioinformatics shared resource at Lineberger Comprehensive Cancer Center. His research is focused on the methodological development and integrated analysis of high-throughput genetic and genomic studies.
B
A
B
Ongoing research revolves around the molecular profiling of cancer using both DNA and RNA approaches, including the development and deployment of robust assays that are leveraged clinically as laboratory-developed tests, or LDTs. Dr. Weigman brings more than 13 years of biomarker discovery research with genomics, the majority of those being dedicated to [inaudible] for EA, a Q2 Solutions company. Dr. Weigman has published 14 papers on biomarker identification and assay development and has contributed to the development and launch of several genomic assays. And without further ado,
B
D
Kim, after that introduction, we can go to the next slide. So this is a survival curve showing the clinical data that motivates our desire to use immunogenomics approaches to understand responses in immuno-oncology. The y-axis here is overall survival, by percentage; the x-axis is time in months. The numbers at the bottom are the numbers of patients in the various groups that are still being followed on the study, and so there are all patients and then the subgroups.
D
These are patients who have tumors where the tumor-infiltrating immune cells are positive at 2 or 3+, barely positive, or negative for what is as yet the best biomarker of response to immunotherapy in any tumor. And I should say these data are from a large study in bladder cancer, probably the most robust data we have so far. As you can see, the patients whose tumor-infiltrating immune cells highly express this biomarker, which is PD-L1, do better: at 12 months about half of those patients are still surviving, whereas...
D
D
...what colleagues are calling the tail on the curve, the leveling off of a number of patients who have long-term, durable remissions. At five years, some oncologists may call that a cure. So the challenge, the real challenge of the field, is to understand why the vast majority of patients, even PD-L1-positive patients, still progress and succumb to their disease, whereas a small minority get long-term, durable responses. Can we understand that from pretreatment characteristics of the tumor microenvironment, from assays available from the blood, etc.? Let's go on to the next slide.
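[Editor's illustration] For readers who want to see how a survival curve of the kind described above is typically computed, here is a minimal sketch of the Kaplan-Meier estimator. The panel does not say which estimator the slide used, so that choice is an assumption, and the function name `kaplan_meier` and the toy cohort are hypothetical, not data from the bladder cancer study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time where a death occurred.

    times  -- follow-up time in months for each patient
    events -- 1 if the patient died at that time, 0 if censored (still followed)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)      # every patient leaves the risk set at their time
    at_risk = len(times)        # corresponds to the "numbers at the bottom"
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who survive time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort, invented for illustration only.
months = [2, 4, 4, 6, 9, 12, 12, 15]
died = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"month {t}: estimated survival {s:.3f}")
```

A flat "tail" in such a curve means that no further deaths are recorded among the patients still at risk, which is the pattern the speaker describes for durable responders.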
D
The reason why genomics is attractive in tackling this problem is that the tumor immune microenvironment, shown here in cartoon form on the left panel, is highly complex. It's a mixture, a dynamic equilibrium of competing and reinforcing cell types with many various functions, all of them communicating with one another. And then the cancer-immunity cycle shown on the right panel is equally complex: it is a series of events that has to happen for a T cell to recognize and kill a cancer cell. Reading from the bottom left and clockwise,
D
there has to be cancer cell death of some kind: natural turnover, chemotherapy, radiation, or immune attack. It then leads to the elaboration of antigens that can be picked up by antigen-presenting cells, which migrate to the lymph node and prime naive T cells, leading to T cell clonal expansion and activation. Those T cells, which can potentially react to and kill tumor cells, have to get back into the circulation, traffic to the tumor, achieve ingress into the tumor microenvironment, find the specific target tumor cells and actually achieve killing. And at each one of those steps
D
there are multiple layers of molecular regulation, and public and private entities are looking at ways to augment anti-tumor effects at each one. So the reason that genomics is attractive is that genomics can allow us to see that complexity out of one or a small number of assays, in a way that we can't do with traditional functional cellular immunology techniques, which largely require live cells. So to robustly develop biomarkers of response to immunotherapy, I think we need two things. We need to be able to assess the complexity of the immune microenvironment.
D
We need to do that in a way that's accessible from FFPE, or formalin-fixed paraffin-embedded, tissues, because that's the way that most samples are archived for ninety-plus percent of the patients, even at academic centers. Next slide. So what is immunogenomics, or genomics applications in immuno-oncology?
E
Thanks, Ben. That was a great introduction, and your point there is that we have to have this integration of multiple biomarkers in order to really power these trials. What I want to show is that that's possible, and I have a few examples of how we've used a large number of variables, taking those into account in proper context in order to model them in such a way as to guide clinical decisions. The next slide, please.
E
...so how well that individual is going to do, or how long that individual is going to survive, in the absence of any therapy. On the top right, what we found was that producing this continuous score was actually a very strong and accurate predictor of relapse-free survival. What we're showing there is the linear relationship between the score that's provided to clinicians, on the x-axis, versus the probability of survival at five years, on the y-axis.
E
What we also found is that using data in this way was much more accurate than any clinically based diagnostic that was currently available. So taking into account the things shown in the bar plot (tumor size, [unclear] status, ER and grade), we can do our best job of modeling these data, but it's still nowhere close to the accuracy of the genomic-based predictions, which are shown in the later bars, with the x-axis there being accuracy in determining the risk of relapse. Next slide.
E
Please, next slide. So this was commercialized into a test called Prosigna, and the Prosigna commercialization turned this into a very easily interpretable report for the clinician, so that they can take this highly complex genomic measure that is evaluated with a computational model and then project that back to the clinician in a very easy-to-interpret form. And so in the top left,
E
you simply get a score, which is this risk of recurrence score, and it can be used to classify patients into low, intermediate or high risk. And again there's this continuum, which tells them not only about this categorization but, on the bottom right, the actual probability of relapse at a given time point given their continuous score [unclear]. So that technique was FDA approved and has been distributed, and it's starting to be used to analyze samples that are in numerous clinical trials.
E
Here's an example of one clinical trial, led by Lisa Carey at the University of North Carolina, which was actually a negative trial, in that they used combinations of different Herceptin inhibitors, or HER2 receptor inhibitors, in order to look for a response [unclear] in this particular subset of patients, which are called HER2-positive by a single IHC marker.
E
However, by subtyping it this way, what we see is that we get an enormous increase in the response rate. So on the top left are those three drug combinations, and while there is some variation, there was no statistical difference between the three arms of the trial. However, on the bottom right, we get a significant interaction with those that are subtyped as the HER2-enriched group. That's the genomic marker instead of the single protein; this is a multivariate genomic marker. Now, pathologic complete response is what our outcome is here.
E
That means after the drug is given, the surgeon is going in and looking for tumor and there's none left, so this is as close to a cure as we can get in a short timeframe, and seventy percent of the patients that receive this drug and are in that subtype achieve that result. Next slide, please. So, in other work, we're extending this and looking at other drugs and using genomics to develop these models of subtype, and we can show this repeatedly. Here it is with another drug, called enzalutamide. It's an androgen receptor antagonist.
E
It was approved in prostate cancer, and the thought was that there was some subset of breast cancer patients who may also be sensitive to this drug. Of course, it wouldn't be many, because it targets the androgen and not the estrogen receptor, but using genomics we can actually find that subset of patients who don't express estrogen receptor and in whom androgen receptor may instead be driving their breast cancer. The result is that this drug works very effectively in that small subset of breast cancer patients, and so using the genomic-based biomarker predictor,
E
shown up here on the top left, we highly enrich for those patients that are sensitive to the drug, as opposed to a single protein marker by IHC, which would be used in the clinic right now, on the top right. So in this case we increase our sensitivity by 10% and our positive predictive value by 10% (that is, the patients we predict to be sensitive to the drug are actually responding to the drug), and we also increase our negative predictive value.
E
That is, we were more accurate at not giving the drug to people that will not respond to it. Next slide, please.
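[Editor's illustration] The three quantities named above (sensitivity, positive predictive value and negative predictive value) all come from the same two-by-two table of biomarker calls against observed response. The sketch below shows the arithmetic under stated assumptions: the function name `diagnostic_metrics` and the eight-patient cohort are hypothetical and are not the trial's data.

```python
def diagnostic_metrics(predicted_sensitive, responded):
    """Compare biomarker calls with observed response, one entry per patient.

    Assumes every denominator below is non-zero (each class is represented).
    """
    pairs = list(zip(predicted_sensitive, responded))
    tp = sum(1 for p, r in pairs if p and r)          # called sensitive, responded
    fp = sum(1 for p, r in pairs if p and not r)      # called sensitive, no response
    fn = sum(1 for p, r in pairs if not p and r)      # called resistant, responded
    tn = sum(1 for p, r in pairs if not p and not r)  # called resistant, no response
    return {
        "sensitivity": tp / (tp + fn),  # responders the marker finds
        "ppv": tp / (tp + fp),          # marker-positive patients who respond
        "npv": tn / (tn + fn),          # marker-negative patients who do not respond
    }


# Hypothetical calls for eight patients, invented for illustration only.
predicted = [True, True, True, False, False, False, False, True]
responded = [True, True, False, False, False, True, False, True]
metrics = diagnostic_metrics(predicted, responded)
print(metrics)
```

A higher negative predictive value is the formal version of the speaker's point about not giving the drug to people who will not respond to it.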
And so the result of this work is, again, a multivariate signature that was built in the research space and taken into the clinic. There will be a phase three trial where the biomarker is going to be an entry criterion into the trial, and this will provide us a definitive answer as to the enrichment based on the biomarker.
E
The beautiful thing about this particular test is that these triple-negative breast cancers typically get chemotherapy, and as you've all heard, chemotherapy is not too good for you, whereas enzalutamide, the drug that we are going to give in substitution for chemotherapy for these diagnostic-positive patients, is a hormone agent, where only ten percent of patients even have grade three fatigue.
E
...distill them down into some clinically actionable result, and that clinically actionable result is something that we can do right now, because we're taking a drug that's already approved and giving it to the patients that actually need it. The result is shown here: in this particular cohort, the median survival is 32 weeks for those that are diagnostic negative. Those that are diagnostic positive achieve seventy-five weeks of overall survival, and they're not even getting chemotherapy; it's a single-agent hormone therapy.
E
We now have to incorporate tumor genomics, as well as the microenvironment, in order to see this kind of result. And this is just a summary slide showing that, while we gave you a few examples within a few different disease subtypes of breast cancer, in the larger cohort of all breast cancer this kind of approach, developing a gene-expression-based or general genomic model and commercializing it, has produced results for all of breast cancer, and because of that we're seeing improved survival across the entire spectrum of breast cancer.
C
So yeah, this is a really good setup, because if immunology is hard, understanding how immune therapy works is even harder, as we've kind of got a crash course in this. You can go to the next slide. By the way, I'm so glad that I switched out my slide at the last minute [unclear] for this one. That way we don't all throw up the same one. But since we've already gone over a lot of this, I at least wanted to touch on it for the people that are not familiar with it.
C
What we like to hope is that when we get a disease or any other kind of affliction, our immune cells respond to it. We get sick, you know, we kind of feel crummy for a little while, immune cells are building up the ability to kill whatever's not you, and it goes away. So it shouldn't be too far removed to think about
C
C
...[unclear] metastasis, and you've got your margin here [unclear], when you actually get the immune cells to actually start responding. The [unclear] for this patient is not so great; you know, your body is not responding to it as an immediate danger, whereas when you've got immune cells giving a very strong response... and by the way,
D
C
...is, you know, old school but very low throughput. I mean, images are big data, but they're just one picture of one thing. How do we really know what's going on here? Why does this patient's immune system respond versus this one, even though they both have colorectal cancer? That's really the tough part.
C
That gets us to the tumor microenvironment, which is not just the tumor in the top figure here, [unclear], but all these different cell types that are just sitting around doing what they're supposed to do, or not, because the tumor has evolved enough to where the recognition pathways don't exist as we would like them to. If you go to the next slide. Yeah, this involves all the things you're not supposed to throw on a slide, but I'm doing this to make a point. So immune therapy is very important.
C
C
...trials being held for immune therapies. So in this figure, as you go radially out, each pie slice is a type of cancer (melanoma, non-small cell, renal cell, colon), and as you go outward you go from your large phase 3 to your small phase 1, and every dot here is a particular therapy, or a combo therapy for those that are yellow. Why that's important is you have all these trials going on with all these immune therapies, primarily on, as the other speakers already mentioned, PD-1 and PD-L1. That's one marker,
A
C
not necessarily the best marker. And as Joel showed in getting to Prosigna and the combined gene expression, leveraging the fact that genomics can provide you multi-analyte testing gives you a much more robust response. So what are we learning from these trials? If they're only running one type of analyte testing, we're missing why more patients aren't getting that nice leveling off that immune therapy has promised. Next slide.
C
So there are some thoughts here on all the things we can test in the immune system and in the immune response to the tumor. I can't go over all the material here; this is the cancer immunogram that was put out. But the reason why I show this here is that, given I come from a clinical testing organization, we actually have testing strategies for all these types of items: whether the tumor is riddled with mutations, whether the tumor cells are
C
actually seeing the immune system present in there, and whether the immune cells are actually getting deep in to start killing at the root. As we get around to these questions, we want to know how the immune system responds to that tumor. There are all kinds of tests available: the blue ones my specific lab does, and the ones in green are from the larger global laboratory structure.
C
But there are lots of ways to do this and, as was mentioned earlier, a lot of them rely on live cells, which are very difficult to get. I mean, you do a blood draw, that's live cells, but normally when we get the tumor it's in FFPE, or you get this big mass, and it's a lot harder to do these things. So the idea of a one-test-fits-all
D
C
for immune therapy: we're just not there. And as clinical trials are coming (if you go to the next slide), as a matter of fact, they're starting to come alive. Every time I show these kinds of slides I have to do more research, because so many more clinical trials are showing up trying to get at how the immune system is responding.
C
This is some work I did back in November, manually reviewing ClinicalTrials.gov, because it was too hard to find some magic query that gave me everything that I needed, so I just did it old school, like the biologists [unclear]. But that's really important. And as you can see: what study types are there? Are you just observing patients, or are you actually intervening with drugs? Is something in the immune system a primary outcome measurement? That's this slide right here.
B
C
Then, on the right here: are we actually measuring something besides one marker? Now, if you have ever played around with ClinicalTrials.gov, [unclear], the descriptions are anything from meticulous and well thought out to, "Oh my god, I'm meeting up with some friends at 4:30,
D
C
got to put these sentences into my description before I go catch up with them." So they're really widely different about what's going on in these trials. So let's take the first one here, for prostate cancer, that's recruiting. I've got the codes; I've got the whole list of these 80-some-odd trials in a different publication. But, you know, here we're going to measure T-cell diversity by T-cell repertoire deep sequencing, [unclear] helper T cell. That's great, and there is a whole slew of
C
D
C
measurements about how the T cells are really getting in there and doing the killing, how they're responding. Okay, cool. What are we going to do in this CLL study? All right, maybe it has the immune system as a primary outcome. Definitely not; it's the secondary. The primary one is cell-surface antigens. Right, which antigen? So
C
D
C
...is not a consideration that's going on most of the time. This last one here, in lymphoma, is just an example. This is extra explicit: we're going to look at minimal residual disease, or MRD, [unclear] T cell, with this clinically available assay. Great, I know exactly what's going on. Yes, it is indeed immunogenomics, and by the way, it's something they're exploring. And by the way, we're not using this data to actively get people on these drugs. While some of them exist,
B
C
not a lot. I mean, even here, in this one, consider: it's an interventional study, so obviously, based on some testing, they're going to get a drug or they are not going to get a drug. In this case, it's not going to be related to anything about their immune system; they're going to sit and watch. That's kind of where we're at right now, in the sitting and the watching phase. In my last slide, I'm going to show the evolution of how this kind of
C
I will say I have full, complete information about all the older-school testing mechanisms, but this is how we did PCR (and this is a hundred and seven different primers from a 2003 paper): you run those primers together, you look at how their size distributions sit, and you have the same patient doubled up so you make sure you get a consistent response, and you kind of say, yep, those squiggles are totally different than those squiggles. There are fewer of them here; there are fewer immune cells that are different here.
C
It's not so different. And, of course, we love the gel, right? Very highly resolved, as you can tell: going from here to the smear with some dots makes it very hard to get something as well resolved as what Joel had mentioned, because depending on what picture and camera you use, you might be getting different intensities of this kind of stuff, and depending on how good your fragment analyzer is, or how
E
C
[unclear] your protocol is, these squiggles may or may not shift. So this is when it's really diverse and very healthy; this is when it's not so diverse. When it's diverse, your immune cells see all kinds of stuff, like that big dark black mass from earlier on. Here's one that says the immune system is not really doing very well. So between these two controls you've got something in the middle and something close to the end.
C
Here, exactly to the degree of the recognition sequence for that tumor antigen, we get to know what kind of frequency we see for the specific sequence and how many counts we get, how many actual, not normalized, counts we get. So we're actually starting to get expression of the different antigens from the T and B cells here, in a way that we can actually consistently and specifically measure.
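[Editor's illustration] To make the idea of per-sequence counts and frequencies concrete, the sketch below tallies receptor sequences and summarizes how diverse the repertoire is. Shannon entropy and the derived "clonality" score are a common way to summarize repertoire diversity, but the panel does not name a metric, so that choice is an assumption; the function name `repertoire_summary` and the CDR3-like strings are hypothetical.

```python
import math
from collections import Counter


def repertoire_summary(sequences):
    """Summarize a list of receptor sequences (one entry per sequencing read)."""
    counts = Counter(sequences)                  # raw, non-normalized counts
    total = sum(counts.values())
    freqs = {seq: n / total for seq, n in counts.items()}
    entropy = -sum(f * math.log(f) for f in freqs.values())
    # Clonality: 0 = perfectly even repertoire, 1 = a single dominant clone.
    if len(counts) > 1:
        clonality = 1.0 - entropy / math.log(len(counts))
    else:
        clonality = 1.0
    return {
        "counts": counts,
        "frequencies": freqs,
        "shannon_entropy": entropy,
        "clonality": clonality,
    }


# Hypothetical CDR3-like strings, invented for illustration only.
diverse = ["CASSLGQ", "CASSPDR", "CASRTGE", "CASSQET"] * 5
skewed = ["CASSLGQ"] * 17 + ["CASSPDR", "CASRTGE", "CASSQET"]

diverse_summary = repertoire_summary(diverse)
skewed_summary = repertoire_summary(skewed)
print(diverse_summary["clonality"], skewed_summary["clonality"])
```

In this toy example the even repertoire scores near zero and the repertoire dominated by one sequence scores much higher, which is the sequencing-era analogue of comparing "squiggles" on a gel.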
C
From 2003, with 107 different primers, to now 25,000: we've automatically exploded, you know, pretty substantially what we're measuring from the T-cell receptor. Of course, now the trials are starting to come out. Last I heard, at least at AACR last week, there were maybe 150 trials in immune therapy that are using this marker. I couldn't look that up; it's really hard to go through the trials. But we're starting to see these things. The question is: how are we actually going to be able to leverage this data?
C
How is it actually going to be created in the first place, in such a way that we can start understanding how these patients are responding to the immune therapy? It's almost like we've revisited where we were in expression profiling 15 years ago. So it's getting really difficult to mine this, and not only that: as I mentioned, the trials aren't describing things very well. I will say, complaining about the space, the data organization isn't so hot, so there's a lot of room to grow in that space.
C
B
B
Dr. Vincent, what would you say would be some of the primary things that need to be measured moving forward? Maybe you can talk to us a little bit about what is being measured today for these things, what other things need to be added right away, and maybe, longer-term, give us some sense of what you would like to see.
sure.
D
D
From a genomics perspective, I think we think in terms of the base-level assays and then the analytics. So for base-level assays, we can get a lot out of RNA sequencing alone, but if you wanted to say what you would have in your dreams, but actually reasonable to do, it would be RNA sequencing, whole exome sequencing, and T-cell receptor and B-cell receptor amplicon sequencing. Those three sets of things are doable at a decent cost structure and can give us a huge wealth of information.
D
But on top of that, the analytics are more complicated than in classical genomics, and so that's sort of the next layer: you need analytics to actually fish out the immune signal, especially in the RNA sequencing data, and then you need sophisticated modeling approaches, like what Joel has described, in order to essentially build your biomarker model, or your model
D
that will be a biomarker, from a large number of features. At each level, deciding where the orthogonal information is, [unclear] and so on, that's also a difficult problem. So maybe there are three layers: there are the base assays; then there are the analytics to get the base-level immune feature characterizations out of those assays; and then there's the advanced modeling to test and build the biomarker. And so
D
B
D
If we had a choice, we'd want multiple spots. In fact, late last year there was a wonderful article in Science about heterogeneity in expression of tumor antigens across different spatial regions in a tumor, and so you may get different answers about what the tumor target space is if you're trying to predict that from one region versus another. So, multiple. This is actually a place where a genomics result may inform clinical biopsy strategy, because right now, what happens if the tumor is large enough?
C
For me, I've found the hard part, to follow up on that, is actually getting that into the trial protocols. You mentioned RNA-seq, exome and T-cell receptor. Over the last year and a half, our trials that are testing for exome and RNA-seq have scaled to, you know, several thousand or tens of thousands of patients that are getting those two things done. The [unclear] is too expensive now for everybody to start doing, but [unclear] in the secondary arena, they're starting to understand the value of that data.
C
C
...to send a file, but: "We can't have 80,000 columns in our clinical trial databases. What do we do? Is there a use for this? You know, we were expecting PD-1 and CTLA-4. Don't you have that?" Well, we have that and seventy-nine thousand nine hundred and ninety-six others. So it's been really difficult. We spend a lot of lag time in the data arena just getting people to understand what that information is, because the PIs...
D
C
C
C
That's a great question. It depends on how much people understand about the assay that was used to generate that data. We get audited all the time and, of course, [unclear] testing in an isolated arena. Do you understand how the sequencing group works? Do you understand how the samples go through [unclear]?
C
You can follow that kind of chain of custody, but that's not always done everywhere, just because of all these trials; and when you look at the places that are doing this testing, they're everywhere. So having a single way to follow all that information, to know that the patient that you're looking at is the right one, is actually a very difficult problem. So controlling that, having standards for your lab (you know, your CAP and CLIA accreditation) is a great way to filter these types of things out, and then getting into the interpretation
C
Oh man, that is, yeah, there's no wand-waving. It'd be like, "It was that antigen that I had up on there, that, you know, string of 60 bases, that's the one that responds to the therapy." I don't
B
E
Absolutely. So this is really the new challenge with immuno-oncology: you want to integrate these different features. Whereas prior, with the biomarkers that I described earlier, we consider them as all just tumor markers, and while they are voluminous, typically simple, linear-based modeling strategies will allow us to produce a very directly interpretable
E
C
I'll say, I mean, but even then, getting into the phenotypes: if we go back to the figure of the microenvironment (I mean, I didn't actually put the slide up here, on the assumption that [unclear]), there are actually gene expression signatures that are immune-cell specific, so we can actually start saying
C
we have CD8 here, we have CD4 here, we have memory B, really at a basal level. But at least we can start characterizing each of those immune cells and ideally say it's there, it's active, or we don't see it at all, and maybe getting to the pathology and the staining is something that's a lot more robust, and automatically dimension reduction becomes much more tangible. I've not seen quite that predictive level yet, but I know there are definitely papers out there to help with that process, I think.
B
D
Joel may be better able to speak to this than me; I think you probably understand the architecture a little bit better. But I think we are very careful to store sequencing data in a HIPAA-compliant way. So UNC takes the approach... there are kind of two schools of thought about genomics data and privacy. One school of thought is: if it's not whole genome sequencing, it's not personally identifiable data, thus I can do with it whatever I want, put it on my laptop and whatever. There's another school of thought
D
E
D
E
D
E
E
Right. Ultimately, you know, our DNA is our identity; there's nothing more identifiable than the DNA. However, in order to make it identifiable, it requires that you go out and have something that I know is you, and test it and compare it, right? And so the HIPAA regulations right now are taking the stance that it's not identifiable, because you would have to go resequence someone, and that is reasonable. However, I don't think it's going to be too far in the future before such technology is very amenable to
E
finding, you know, whatever it is, a piece of your hair or skin flakes, that allows us to be identifiable. So I think it's a short-lived stance, but at the same time we try to be more conservative in our stance at UNC. [unclear] I'll speak to the clinical part of it too, which is that a lot of privacy is also considered up front in the consent, right? And I
E
Think
what
we're
really
having
trouble
with
right
now
is
is
that
you
know,
as
you
know,
we're
in
this
discussion
of
the
day
about
consent
from
historic
studies
where
we
have
banks
and
banks
of
FFPE
tissue,
which
we
now
have
technologies
to
completely
unlock,
and
they
have
years
of
outcome
and
clinical
data
available,
but
they
were
not
consented
in
the
way
in
which
we
can
say
that
information
correctly
right,
many
of
these
people
are
maybe
deceased
now.
What
do
we
do?
E
Plus,
because
it's
DNA
that
that
information,
you
know,
has
relevance
to
their
families,
it's
not
just
them
or
their
tumor.
So
we
have
to
be
very
careful
with
how
we
approach
those
things,
and
you
know
it
may
be
I
think
this
is
a
real
challenge.
Actually,
I
don't
want
to
be
optimistic
or
pessimistic
is,
but
there
has
to
be
a
way
that
we
can
unlock
that,
while
still
having
some
respect
for
confidentiality
and
what
the
patient
consented
to
when
they
entered
the
study.
C
I'll,
add
to
this
on
another
clinical
level
for
the
trial,
testing
and
other
items
like
this.
So
when
a
person
contracts
with
us
to
russets,
it
exists
for
their
trial
or
whatever
testing
we're
doing
when
you
sign
up
for
that
particular
test.
A
box
arrives
through
that
provider
that
already
has
sample
collection,
materials
and
things
like
this
pre-barcoded
with
an
anonymous
ID
so
my
organization
is
holding
that
identifiers
database
under
lock
and
key.
C
So,
even
my
team,
when
we
get
a
sample
in
process
that
we
know
it's
a
sample
ID
once
we
upload
it
back
to
our
delivery
portal,
that's
when
the
delivery
portal
can
read
the
barcode,
slap
the
patient
ID
back
on
and
go
back
to
the
provider.
We
never
even
have
access
or
knowledge
of
those
kinds
of
things.
And
yes,
of
course,
as
far
as
the
computer
infrastructure
is
set
up,
the
monitoring
of
every
single
person,
even
our
robots
on
that
system,
is
completely
locked.
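The barcode workflow described above can be sketched in a few lines. This is only an illustration of the separation of duties, not the panelist's actual system: `IdentifierVault`, `run_assay`, and the patient ID `patient-001` are hypothetical names invented for the example, and a real deployment would add access control and auditing.

```python
import secrets


class IdentifierVault:
    """Hypothetical stand-in for the identifiers database kept under lock and key."""

    def __init__(self):
        self._barcode_to_patient = {}

    def issue_barcode(self, patient_id):
        # Random barcode carries no information about the patient.
        barcode = secrets.token_hex(8)
        self._barcode_to_patient[barcode] = patient_id
        return barcode

    def reidentify(self, barcode):
        # Only the delivery portal is meant to call this.
        return self._barcode_to_patient[barcode]


def run_assay(barcode, raw_sample):
    """Lab side: sees only the anonymous barcode, never the patient ID."""
    return {"sample_id": barcode, "result": len(raw_sample)}


vault = IdentifierVault()
barcode = vault.issue_barcode("patient-001")   # printed on the collection kit
report = run_assay(barcode, "ACGTACGT")        # lab works with the barcode only

# Delivery portal re-attaches the patient ID before returning the result.
delivered = {
    "patient_id": vault.reidentify(report["sample_id"]),
    "result": report["result"],
}
```

The point of the design is that the lab's `report` never contains the patient identifier; only the component holding the vault can map the barcode back.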
B
Yes,
that
opens
more
questions.
Probably,
you
know,
how
does
one
try
to
be
foresighted
by
consenting
for
long
term
research,
and
also,
how
can
you
be
farsighted
in
terms
of
anticipating
what
will
be
called
PHI
with
regard
to
genetic
information
in
the
future
and
I
think
that's
a
really
good,
very
important
point
that
Tomatoes
countries
we've.
D
Integrated
now
into
our
tissue
banking,
specific
consents
for
genomic
studies,
with
the
recognition
that
it's
identifiable,
essentially
identifiable
material,
so
going
forward,
I
think
we're
okay.
But
to
Joel's
point,
what
about
the
tissue
bank
that
started
in
1997
and
has
a
thousand
patients,
right?
I
mean
you
haven't.
E
Just
to
say
what
a
valuable
resource
that
is.
It
takes
any
additional
statements.
The
initial
slide
I
showed
on
making
getting
that
biomarker
approved
by
the
FDA.
It
was
because
we
could
run
it
on
FFPE
tissue,
and
so
we
were
able
to
run
that
biomarker
on
retrospectively
collected
trials
that
already
had
15
plus
years
of
follow-up.
So
in
that
way
you
know,
the
trial's
already
done.
E
The
data
is
or
the
samples
are
already
there,
the
15
years
of
follow-up
are
right
there,
so
we
can
immediately
take
it
to
action
and
that
that's
where
the
real
value
is
Otherwise,
no
matter
what
great
hypothesis
you
come
up
with
right
now,
it's
going
to
be
that
long,
especially
for
these
long-outcome
cancers
like
breast
cancer,
before
any
action
can
be
taken
on
it.
B
So
then
I
think
maybe
the
flipside
of
that
question
is
you
know
you
have
this
data
they're
not
broadly
consented
in
some
cases,
and
you
have
multiple
teams
from
different
pharma
companies
and
we'll
have
you
kind
of
running
the
race
to
see
who
can
get
the
biomarker?
How
do
you
share
data
to
facilitate
discovery,
you
know?
What
kind
of
data
can
you
share?
How
can
you
incent
organizations
to
do
this
kind
of
data
sharing
while
at
the
same
time
respecting
privacy,
consent,
the
United
and
all
these
other
things.
E
Don't
instead
I
think
speaking
to
what
Kim
said
about
incentivizing
them,
I
think
as
more
and
more
trials
come
out
where
they
see
that
their
particular
drug
has
no
great
effect
in
a
general
selection
process,
but
with
a
biomarker
in
place.
Now,
all
of
a
sudden,
they
increase
that
positive
predictive
value
and
that
positive
predictive
value.
E
What
that
means
to
the
accountants
at
the
drug
companies
is
that
they
can
spend
a
lot
less
on
patients
in
the
trial,
because
now,
if
I
have
a
two-fold
increase
in
effect
size,
it's
going
to
cut
the
number
of
samples
that
I
have
to
run
through
the
trial
in
half
in
order
to
get
the
same
effect
and
convince
the
FDA.
And
so
I
think,
you
know,
as
more
of
these
biomarker-based
trials
permeate
medicine,
pharma
will
start
to
realize
that
there
will
be
these
cases
where
we
actually
save
money.
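An editorial aside on the arithmetic in this answer: the "half" figure can be checked against a textbook power calculation. The sketch below assumes a two-arm trial comparing means with a standardized effect size (Cohen's d), a two-sided alpha of 0.05 and 80% power; the function name `n_per_arm` and the effect sizes 0.25 and 0.50 are illustrative choices, not numbers from the panel. Under this normal approximation the required sample size scales with 1/d², so doubling the effect size cuts enrollment to roughly a quarter, which would make "half" a conservative estimate.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sample comparison of means.

    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


unselected = n_per_arm(0.25)   # hypothetical effect in an unselected population
enriched = n_per_arm(0.50)     # hypothetical two-fold larger effect with a biomarker

print(unselected, enriched, unselected / enriched)
```

Real trial designs (binary response endpoints, survival endpoints, interim analyses) use different formulas, so this is only meant to show the direction and rough size of the saving.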
C
I
will
say
that
it's
going
to
happen.
If
we
have
speakers
we
get
to
do
a
public
service
announcement.
I
would
say
you
know,
being
able
to
change
how
we
run
and
administer
and
select
patients
for
trials
based
on
biomarker
is
something
we
have
to
start
doing
a
whole
lot
more
of.
I
mean,
the
numbers
that
we
see
in
our
CRO
space
are
surprisingly
low.
Like
team
there.
C
A
mean
their
items-
that's
one
marker,
so
that's
really
great,
but
a
lot
of
the
cases
for
these
it's
still
awaiting
seat
and
they
want
to
get
the
trial
started
right
now.
They
hope
that
it's
blockbuster
enough
and
it's
A-OK;
they
don't
need
a
biomarker,
because
the
phenotype
is
cancer
gone,
high-fives
for
everyone,
but
when
it
doesn't
work,
and
we
know
it
doesn't
work
60
to
70
percent
of
the
time;
we
know
that,
because
we're
in
the
field.
Well,
how
do
we
solve
that?
How
do
we
fix
it?
C
How
do
we
make
sure
we
identify
someone
as
not
being
able
to
get
that
very
expensive
new
therapy
or
prevent
them
from
getting
the
side
effects?
So
I
spend
most
of
my
time
now,
you
know,
begging
and
pleading
that
this
extra
cost
of
the
trial
actually
saves
this
much
money
down
the
road,
it's
a
very
different
role,
but
it's
something
that
we
have
to
do.
So,
if
you
guys
know
friends
like,
are
you
through
the
Bible?
C
You
know
I'll
be
great,
there's
no
local
congressman
for
that
kind
of
stuff,
but
that
would
be
I.
Think
that's
something
that
we
need
to
see
a
whole
lot
more
of
genomic
testing
isn't
that
bad
value
from
mining
the
stuff
going
down
the
road
is
better
and
also
interplay.
Opening
from
these
pharma
databases
that
are
so
discrete
there's
no
sample
volumes.
D
The
other
thing
that
has
to
happen
is
more
clinical
options
have
to
be
approved
in
order
to
actually
show
value
of
biomarkers,
to
clinicians,
who
are
actively
treating
patients,
because
now,
if
we
can
do
a
study,
clinician
is
faced
with
a
patient
with
lung
cancer,
who
has
failed
multiple
lines
of
therapy
and
a
biomarker
can
tell
that
clinician
that
the
patient
has
a
10%
chance
or
a
40%
chance
of
responding
to
a
certain
drug,
but
that
drug's
the
only
option
left.
D
The
clinician
will
prescribe
that
drug
and
not
use
the
biomarker
test
and
not
care
about
the
biomarker
test,
because
it
doesn't
change
his
or
her
decision-making
calculus,
whereas
if,
as
I
expect
five
years
from
now,
there
will
be
multiple
combination
immunotherapies
available
that
work
by
different
mechanisms
across
a
number
of
tumor
tissue
types
and
clinicians
will
be
faced
with,
oh,
I
have
double-digit
possibilities
to
use.
C
Listing
the
test
being
used
and
therefore
you
can
leave
backtrack
and
know
that
this
does
that
thing
there's
so
many
more,
but
already
as
I
showed
earlier,
it's
like
we're
going
to
look
at
presenting
antigens
wow.
That's
what
we're
going
to
look
at
very
difficult
for
me
to
understand
which
ones
those
are
until
the
five
years
later
or
I
go
to
ASCO,
and
it
shows
me
oh,
we
are
at
least
going
to
look
at
CTLA-4.
Okay,
well,
I
knew
that
what.
C
Least,
in
the
what
what
and
I'm
trying
to
do
this
personally
in
my
shop
is
when
we
look
at
the
tumor
microenvironment
we're
running
gene
expression
on
there,
which
we
do
for
the
vast
majority
of
our
immunotherapy
stuff,
actually
also
staining
it
in
a
lot
of
the
areas
that
are
there.
Having
more
confirmation
we
asked
earlier
about,
whether
this
particular
immune
cell
is
up
or
not
or
active.
You
know
we
have
flow,
we
have
IHC.
C
We
have
NGS
to
actually
start
building
a
better-verified
kind
of
hot
and
cold
signatures,
so
a
database
that
has
the
RNA-seq,
which
everybody
knows
and
loves,
and
the
flow
which
everybody's
comfortable
with
and
most
hospitals
have
and
in
some
cases,
the
IHC,
and
so
you
know
with
these
not
just
a
phenotype
treatment,
but
the
molecular
phenotypes
are
known
along
with
the
genomic.
That
would
be
a
great
database
very
hard
to
get
I
mean
some.
C
Take
lots
of
like,
would
you
then
provides
an
engine
there?
That
is
a
floor
of
color
and
there's
really
smart
camera
and
you
basically
flow
the
cells
through
capillary
tubes
once
it
sees
its
the
of
course,
all
cognition
camera
on
the
flow
cytometer
sees
these
particular
colors,
it
shifts
them
down
different
chutes
and
actually
starts
counting
the
cells.
So
tube
has
been
enough
to
wear
nothing.
Nothing,
that's
nothing,
something,
nothing,
nothing,
something
something
something
else
something
else.
These
things
actually
get
pretty
complex.
B
Then
you
know,
let's
say
some
some
young
and
up-and-coming
data
scientist.
does
create
a
flow
database.
What
kinds
of
intellectual
property
issues
that
we
talk
about
in
terms
of
database
ownership?
Is
that
something
that
is
even
you
know
considered.
In
this
basis,
this
pointer
is
still
laughter.
D
It
depends
on
how
loosely
you
define
the
term,
not
an
issue.
You
know
we
don't
have
all
the
algorithms
and
all
the
software
we
need
to
produce
the
immune
characterizations
from
genomics
data
that
we
want,
but
I
know,
and
sometimes
people
say
that's
not
an
issue.
They
mean
I've
got
a
team
that
can
get
that
done
in
a
year
or
two
with
a
high
degree
of
confidence,
as
opposed
to
I
have
the
solution
in
front
of
me
right
now,
but
I
mean
at
least
in
my
lab
and
in
our
work
together.
E
Give
you
an
example
of
where
they
have
been
solved.
It
would
be
while
modes
in
the
back
of
the
room
here
he
worked
up
an
algorithm
to
reconstitute
the
B-cell
receptor,
which
is
extraordinarily
complex
from
RNA
sequencing
data,
and
he
published
that
last
year.
As
part
of
that
we
wanted
to
assay
the
Cancer
Genome
Atlas
set,
which
is
about
10,000
tumors
in
total
and,
you
know,
constitutes
a
few
hundred
terabytes
of
data
which
would
have
taken,
you
know,
quite
a
bit
of
resources
at
UNC.
E
Problems
which
are
more
algorithmic
that
you
know
in
order
to
get
to
that
work.
It
was
a
year
of
development,
you
know
of
algorithmic
development,
and
so
so,
even
though
we
can,
we
can
tackle
some
of
these
things
just
by
throwing
more
computers
at
it
and
those
are
solvable
and
somewhat
tractable.
There
are
still
all
the
combinatorics
that
we
even
make
Google's.
C
Know
in
that
old-school
kind
of
smear
thing,
I
showed.
We
know
that
it's
relatively
diverse
the
immune
repertoire
for
that
patient.
Well,
I'm
now
telling
you
exactly
what
the
antigens
are.
So
how
do
I
know
that
it's
correct?
I
don't
know
that
what
my
fancy
assay
does.
What
it's
actually
saying
is
accurate.
That's
another
thing:
that's
hard
to
do.
How
do
you
go
back
and
define
the
truth
of
what
it
is,
orthogonally,
with
a
different
technology
and
different
mechanisms,
and
also
something
that
clinicians
and
regulatory
bodies?
C
So
that's
why
maybe
this
possibility
for
machine
learning
is
there,
but
only
with
orthogonal
testing,
and
since
we
already
just
described
the
testing
in
general
is
expensive,
adding
another
one
for
the
purposes
of
algorithm
tuning
I've
still
yet
to
have
Pharma
salt
for
me,
but
it
creates
the
problem
that
you
can't
throw
them
at
it,
because
you
don't
it's
hard
to
find
the
truth.
You
can
have
a
best
guess
for
the
truth,
but
even
then
you're,
relying
so
much
on
these
reference
databases
that
themselves
need
curation.
E
Rely
extensively
on
machine
learning
in
order
to
develop
these
biomarkers
and
you
need
sometimes
the
process
algorithms
are
using
are
quite
simple,
sometimes
they're
more
complex,
but
in
general
machine
learning
is
a
highly
utilized
tool
in
our
lab.
However,
what
we're
talking
about
here
is
really:
how
do
you
get
the
features
that
the
machine
is
going
to
learn
from,
and
that's
really
the
challenge,
and
that's
where
work
is
needed
in
order,
because
we
can't
just
give
it
the
raw
data,
even
if
we
have
all
the
supervision
week,
we
can.
D
Exist
but
they're
not
available,
so
there
are
a
couple
of
large
pharma-funded
trials
in
the
immunotherapy
space
that
have
been
published
where
RNA
and
DNA
sequencing
were
done,
and
one
of
them
was,
you
know,
the
survival
curve
I
showed
earlier
was
derived
from
one
of
them.
Unfortunately,
those
data
are
not
publicly
available
and
I
had
written
emails
and
called
authors
on
the
study
and
drug
company
representatives
and
gotten
shut
down
handily
over
and
over
again.
D
Well,
they
say
that
they
did,
but
whether
they
actually
I
mean
it's
not
in
the
published
reports.
There
are
two
immunotherapy
trials
with
associated
RNA
and
DNA
sequencing
data,
without
amplicon
repertoire
data,
but
where
we
can
run
our
algorithms
to
infer
TCR
and
BCR
repertoire
profiles
from
RNA
sequencing.
D
28
patients,
or
with
40
patients,
it's
just
not
enough
to
do
robust
model
building
for
associations
with
response
and
biomarker
discovery.
Now
that
doesn't
stop
people
from
trying.
So
one
group
published
a
paper
where
they
used
a
pseudo
machine
learning
approach,
to
perfectly
classify
response
versus
non-response
in
these
small
datasets
and
I
came
by
Joel's
office
and
I
said
Joel,
what
do
you
think
of
this
and
he
looked
at
it
and
he
said
I
can't
believe
they
published
this
garbage,
and
that
was
a
direct
quote
and
so
now.
D
A
presentation
to
my
lab
group
I
actually
had
a
picture
of
Joel.
Sorry,
Joel.
Mike,
alright,
I'll
stop
talking
about
this.
What
Joel
meant
to
say
is
this:
it
was
an
example
of
extreme
overfitting,
and,
you
know,
it's
not
just
Joel's
opinion.
It's
sorry
to
hang
you
out
to
dry
there
and
close
it
so.
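The overfitting point made above is easy to reproduce. The sketch below is an illustration under stated assumptions, not a re-analysis of the paper being discussed: it uses 28 simulated patients, 5,000 pure-noise features, and coin-flip response labels (all hypothetical), and fits a minimum-norm least-squares classifier with NumPy. With far more features than patients the model separates the training labels perfectly, while accuracy on fresh simulated patients stays near chance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_features = 28, 1000, 5000

# Pure noise: the features carry no information about response.
X_train = rng.standard_normal((n_train, n_features))
y_train = rng.choice([-1.0, 1.0], size=n_train)   # responder / non-responder
X_test = rng.standard_normal((n_test, n_features))
y_test = rng.choice([-1.0, 1.0], size=n_test)

# Minimum-norm least-squares fit; with n_features >> n_train it interpolates.
weights = np.linalg.pinv(X_train) @ y_train

train_accuracy = np.mean(np.sign(X_train @ weights) == y_train)
test_accuracy = np.mean(np.sign(X_test @ weights) == y_test)

print(f"training accuracy: {train_accuracy:.2f}")
print(f"held-out accuracy: {test_accuracy:.2f}")
```

A "perfect" classifier on a 28- or 40-patient cohort is therefore not evidence of a real biomarker unless it holds up on independent patients.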
D
Well,
there
is
an
immune
response,
working
group
that
Joel
and
I
are
actually
a
part
of
for
TCGA,
and
we
are
trying
to
answer
that
very
question,
but
without
being
able
to
actually
look
at
big
trial
datasets
and
figure
out
what
correlates
with
response
the
best
that
we
can
do
and
what
we
are
doing
is
looking
for
the
combination
of
immune
features
that
best
predicts
say
overall
survival,
adjusted
for
tumor
tissue
type
and
other
clinical
factors.
C
Right,
like
a
figure
I
showed
earlier
with
the
very
weak
response
versus
the
wild.
You
know
heavier
response
that
person
lived
two
more
years
and
the
person
that
had
the
weak
response
with
the
RNA.
You
can
kind
of
get
to
that.
Get
to
that
point,
but
I
was
actually
addressing
those
on
access.
You
tishe
have
as
much
like
therapeutic
outcome
where
you
can
do
that
kind
of
stuff.
I
imagine
you
can
just
do
risk
and
this.
E
Is
this
is
what
I'd
survival
back
to
Kim?
You
asked
aminika
like.
What's
the
one
thing
that
we
that
I
mean
this
is
the
huge
missing
piece
is
that
we
have
beautiful
genomics
data
sets
like
TCGA
that
allow
us
to
make
a
lot
of
informed,
I
guess,
associations
about
genomics
within
genomics,
and
then
you
have
clinical
data
sets,
that
have
you
know
some
level
of
demographic
and
clinical
characteristics
that
you
can
start
to
look
at.
E
The
missing
piece
is
really
putting
those
two
in
the
same
place.
I
showed
you
some
examples
of
when
we
have
clinical
data
and
we
have
genomic
data,
it
works
well,
but
that's
really
what's
missing
right
now.
TCGA
doesn't
have
this,
it
wasn't
the
goal
of
TCGA
and
that's
why
we
were
funded
in
it's
not
supposed
to
be
the
next
round
of
TCGA.
But
I
forget
the
name.
It's
another
cancer
program
from
the
NCI,
where
we'll
actually
be
doing
a
lot
of
this.
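The "missing piece" described above, putting clinical and genomic data in the same place, amounts to a join on a shared patient identifier. The sketch below is purely illustrative: the patient IDs, the field names (`therapy`, `responded`, `tcr_diversity`), and every value in it are made up for the example.

```python
# Hypothetical clinical annotations keyed by patient ID.
clinical = {
    "P1": {"therapy": "anti-PD-1", "responded": True},
    "P2": {"therapy": "anti-CTLA-4", "responded": False},
    "P3": {"therapy": "anti-PD-1", "responded": False},
}

# Hypothetical genomic features keyed by the same IDs.
genomic = {
    "P1": {"tcr_diversity": 0.82},
    "P3": {"tcr_diversity": 0.35},
    "P4": {"tcr_diversity": 0.61},
}

# Only patients present in BOTH tables are usable for biomarker discovery.
shared_ids = clinical.keys() & genomic.keys()
linked = {pid: {**clinical[pid], **genomic[pid]} for pid in sorted(shared_ids)}

# Patients with one kind of data but not the other.
clinical_only = sorted(clinical.keys() - genomic.keys())
genomic_only = sorted(genomic.keys() - clinical.keys())

print(linked)
print("clinical only:", clinical_only, "genomic only:", genomic_only)
```

In practice the hard part is not the join itself but having both tables for the same, adequately consented patients, which is the gap the panelist is pointing at.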
C
A
way
that
we
are
leveraging
things
in
the
TCGA
where,
when
we
come
up
with
the
signature
and
these
40
patients,
28
patient
items,
the
cross-reference
prevalent
TCGA
to
then
essentially
leverage
what
percentage
of
patients
from
that
as
a
you
know,
marker
for
say,
melanoma
potentially,
could
react
it.
The
same
benefits
there's
a
lot
of
small
phase,
one
work
with
a
cool
biomarker
being
said,
of
course,
with
48
it,
because
it's
the
garbage.
They
will
be
that
small
and
everything
you
know,
but
expanding
that
out
TCGA.
E
What
we're
doing
you
know
with
the
idea
is
that
we
can
use
all
that
high-dimensional
genomics
data
to
try
to
reduce
it
to
the
features
that
we
really
care
about.
So
you
know
a
lot
of
the
RNA
signatures
are
correlated
with
some
of
the
protein
markers
and
so
on
and
so
forth.
So
how
can
we
distill
all
this
down
to,
here
are
the
four
or
five
things
that
we
think
we
can
measure
clinically
and
represent
the
diversity
of
genomics
that
are
in
these
larger
data
sets.
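One common way to do the kind of distillation described above is principal component analysis. The sketch below is an assumption-laden illustration, not the panelists' pipeline: it simulates 50 correlated "signature" features for 200 patients that are all driven by 4 hidden factors (every number here is hypothetical) and uses NumPy's SVD to show that a handful of components capture nearly all of the variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients, n_latent, n_features = 200, 4, 50

# Simulated data: 50 correlated features generated from 4 hidden factors.
latent = rng.standard_normal((n_patients, n_latent))
loadings = rng.standard_normal((n_latent, n_features))
noise = 0.1 * rng.standard_normal((n_patients, n_features))
expression = latent @ loadings + noise

# PCA via SVD of the column-centered matrix.
centered = expression - expression.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

top4 = explained[:n_latent].sum()
print(f"variance explained by the first {n_latent} components: {top4:.3f}")
```

Real expression data are far noisier, and components are not automatically clinically measurable markers, so this only shows why many correlated signatures can often be summarized by a few features.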
D
So
it's
still
substantially
better
and
I
think
one
of
the
things
that
the
field
has
to
learn
over
the
coming
years
is
how
to
best
manage
those
complications.
So
someone
gets
colitis
from
one
of
these
drugs
right
now.
We
give
them
high
dosage
of
steroids
and
taper
them
off
and
then
hope
it
doesn't
come
back.
Is
that
good
enough?
Is
there
a
better
therapy
for
that
same
thing
for
the
thyroid
disease
or
adrenal
gland
disease
or
lung
disease,
and
so
on?
C
Linking
it
I
think
is
where
I'm
putting
the
quotes
in.
We
don't
have.
Oh
there's
just
not
a
solution
that
I've
seen.
Maybe
it
does
exist
where
you're
marrying
the
patient
clinical
information,
which
we
need
to
get
the
survivability.
We
need
to
know
what
drugs
they
were
put
on
and
then
all
this
wealth
of
genomics
as
I
mentioned
earlier,
we
have
pharma
say,
just
throw
that
exome
and
RNA-seq
into
our
cloud
database
it'll
be
great,
and
you
can't
because
you're
clowning
them
several
orders
of
magnitude
beyond
what
the
database
can
hold.
C
D
Well,
I
think
it's
more
the
sequencing
data
from
patients
who
are
on
immunotherapy
trials
with
robust
clinical
annotation.
Pharma
companies
have
it
for
big
trials
and
aren't
releasing
it,
and
so
we
need
either
to
convince
the
NIH
to
give
us
you
know,
millions
of
dollars
to
do
these
studies
on
the
large
trials
that
the
NIH
is
supporting
or
we
need
to
take
real.
E
Analyzed,
but
they
know
that
we
have
okay,
I
would
take
well
annotated
data
over
more
data
any
day
of
the
week,
and
I
think
really
that's
the
key:
to
figure
out
some
way
where
we
can
get,
you
know,
better
annotated
data,
not
just
more
data,
and-
and
so
this
has
been
illusion.
This
happened
in
a
couple
different
ways:
it's
either
you
know
government-sponsored
research
or
it's
figuring
out
a
business
model
that
makes
it
attractive
for
the
pharmaceutical
companies
to
E
You
know,
allow
others
to
dig
into
their
data,
but
neither
one
of
those
is
happening
right
now
and
because
I,
you
know,
there's
just
it's
either
too
expensive
or
it's
too
much
of
a
risk,
and
that's
really
limiting
now,
but
speaking
to
just
a
data
science
perspective
that
doesn't
mean
that
we
should
just
stop.
That
doesn't
mean
that
we
should
oh,
we
can't
get
to
the
good
data,
so
we're
going
to
quit.
E
There's
still
enormous
opportunity
to
take
the
data
that
are
available
and
companies
are
already
doing
this
from
just
looking
at
public
data.
So
there
is,
you
know,
good
knowledge
to
be
had
of
whether
it's
going
into
hospitals
that
are
willing
and
linking
up
clinical
data
with
treatment
schedules
or
taking
and
then
going
through
and
sequencing
those
samples
right
and
kind
of
putting
together
figuring
out
here's
a
cohort
that
I
can
access
what
what
data
can
I?
A
Have
a
few
more
announcements
from
the
South
Big
Data
Innovation
Hub,
if
you
hang
on
for
five
minutes.
The
South
Hub
will
be
opening
up
the
application
process
for
its
program
to
empower
partnerships
with
industry
and
government,
starting
tomorrow.
So
companies
and
government
agencies
are
interested
in
data
science,
faculty
postdocs
and
graduate
students
for
up
to
12
weeks.
In
the
summer
they
will
post
descriptions
to
the
internships
and
residencies
available
on
south
big
data
hubs.
Org.
A
Up
to
competitions
for
those
quite
soon
as
part
of
that
Vani
Mandava
of
Microsoft
Research
and
her
colleagues
will
be
presenting
on
what
their
computer
to
do
for
our
hub
members,
so
be
sure
to
check
that
out
on
the
data
sharing
and
infrastructure
working
group
meeting
Friday
April
28th
from
3:00
to
4:30
p.m.
A
Website
for
that
help
save
data
innovation
hub,
for
there
will
be
a
free
Microsoft
Azure
training
workshop
full
day
up
in
Washington,
DC
on
June
8.
The
same
day,
the
South
hub
will
be
having
a
workshop
at
Intel
on
our
digital
intelligence
and
then
the
next
day
on
June
9th,
the
South
hub
will
be
hosting
its
second
annual
All
Hands
meeting
at
Microsoft
facility
in
Friendship,
Heights
right
off
the
Metro,
so
we'll
be
announcing
the
hold
the
date
and
register
shortly.
A
Amazing,
Sarah
Davis
we'll
be
sending
that
message
out
for
you
all
so
stay
tuned
for
details.
And
lastly,
if
you
haven't
seen
it
already.
The
National
Science
Foundation
has
issued
a
call
for
proposals
for
the
next
round
of
its
Spoke
proposals
as
part
of
the
hub.
Those
grant
awards
go
from
anywhere
from
100
to
higher
styles
and
slippery
Millions.
It
was
looking
for
cross-sector
partnerships
and
connecting
data
scientists
and
domain
scientists
and
practitioners
for
real
world
applications.
There
are.
A
Are
now
having
in
terms
internal
competition,
so
look
for
an
earlier
deadline.
The
NSF
deadline
is
September
18th
and
you
will
need
to
obtain
a
letter
of
collaboration
from
your
hub
by
June
kind
of
heat,
so
look
for
FAQ,
that's
coming
out
as
a
call
for
proposals.
Announcement
is
already
out
there
so
with
that.
Thank
you
to
Kimberly
for
a
fantastic
panel
and
thank
you
for
our
distinguished
guests
who
have
joined
us
today.
Those
of
you
who
are
in
the
room
please
feel
free
to
come
up
to
talk
to
our
guests
as
you.