From YouTube: Healthcare Data Science in Clojure - Scicloj meeting 15
Description
This was our first in a series of public meetings about Clojure and data science in healthcare and medicine.
In this meeting, the main theme was knowledge management.
* Sivaram Arabandi: "Biomedical Ontologies - Design Patterns and Applications"
* Pier Federico Gherardini: "CANDEL: A platform for biological data science using Clojure, R, and Datomic"
* Discussion
Moderator: João Santiago
The text conversation was quite active during this meeting. You may find it useful to read through the text chat: https://tinyurl.com/y4ks7f6o
Clojurians Zulip discussion:
https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/healthcare.20meeting.20.231.3A.20knowledge.20management
A: My name is João Santiago, or, like I said, just Santiago for short. I am a medical doctor currently working as a data scientist in Berlin, Germany, and I'll be moderating the meeting today. Before we start with the actual presentations, I'd like to give everyone a couple of guidelines to make the meeting as pleasant as possible for everyone. Please mute yourself when you're not speaking. You can post, and please do post, any questions at any time in the chat. I will be collecting the questions throughout the meeting, and after each speaker's presentation I will pick one or two questions to present to the speakers. Then, at the end, we will go over the ones that were not selected, during the open discussion.

So, without further ado, I would give the stage to you, Sivaram, to talk about biomedical ontologies.
B: Thank you, Santiago, and thank you, Daniel, for setting this up. It's a pleasure to be speaking here at the Clojure-for-science and Scicloj data science group, and today I will be talking about biomedical ontologies: some design patterns and applications. I'll be going over a little bit of ontology basics, because from some of the previous conversations it looked like most people are not familiar with ontologies, so bear with me.
Okay, let's start by taking a look at what a clinical case looks like. Here's an example: a 60-year-old male patient presented with a sudden onset of chest pain, nausea, sweating, and dyspnea. Angiography was performed, which demonstrated a left anterior descending artery stenosis of more than 90%, and the patient was stented. So this is a kind of typical note you would see. Now, if you annotate this, or classify it into certain kinds of things, we have different kinds of data elements here: 60 years, male, chest pain, nausea, sweating, dyspnea, mitral valve stenosis, and so on. Now, if we categorize these a little further, this is what they look like.
B: So we have some demographic kinds of information, like age and gender; some symptoms and signs; lab information; diagnosis; and some management aspects. And these come in different kinds of data types. There are continuous types of data, values that have measurement units and a range, like age, weight, troponin, or FEV1, which is a respiratory volume measurement. And then, on the other hand, we have categorical types of values, like sex, diagnosis, medications, etc.
B: Now, boolean is a special type of categorical, where you have a true or false value, for things like: does the patient have a past history of stroke? Is there a history of coughing? You're going to say true or false there. So the scope of all this is that we have a variety of types of information. We're looking at clinical care data (symptoms, signs, diagnoses, labs, etc.), information from research (registries, studies, clinical trials, etc.),
B
Some
demographic
information
and
some
genomics
information.
So
this
presents
us
with
a
number
of
challenges.
Data
challenges
right.
The
good
news
is
that
it's
there's
a
lot
of
data
available
for
us.
The
bad
news
is
that
the
data
is
it's
all
segmented,
it's
all
fragmented
in
silos
and
in
addition,
there
are
because
we
are
dealing
with
a
number
of
different
data
sources.
B
We
we
have
structural
differences
right
because
of
the
localized
database,
schemas
that
are
used
and
even
within
that,
how
well
structured
is
it
right
and
you
might
get
some
text
data
as
well.
So
it
is
an
integration
challenge
for
us
for
sure,
but
that's
only
the
beginning,
because
we
then
have
to
we're
also
faced
with
the
the
challenges
with
the
semantics,
the
meaning
of
the
data.
So
we
have
we'll
be
having
different
labels
that
mean
the
same
thing
in
different
databases.
B
For
example,
if
you
take
male
gender
right,
it
could
be
recorded
as
as
man
it
could
be
as
male
as
m
as
one
as
zero.
So
there
are
different
variations
in
how
this
can
be
recorded
or
for
that
matter,
even
within
us
within
a
single
table.
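The label-harmonization problem described here can be sketched in a few lines. This is a minimal illustration, not any real project's code; the source names and mappings are hypothetical, and the point is that the decoding has to be per source, since the same raw code can mean different things in different schemas.

```python
# Each source database needs its own mapping, because a code like "1"
# may mean male in one schema and something else in another.
# All source names and codings below are made-up examples.
SOURCE_MAPPINGS = {
    "registry_a": {"man": "male", "woman": "female"},
    "registry_b": {"m": "male", "f": "female"},
    "trial_c":    {"1": "male", "0": "female"},   # numeric coding
}

def normalize_sex(source, raw):
    """Map a raw sex/gender value from a given source to a canonical label."""
    mapping = SOURCE_MAPPINGS.get(source, {})
    return mapping.get(str(raw).strip().lower(), "unknown")

print(normalize_sex("trial_c", 1))       # male
print(normalize_sex("registry_b", "F"))  # female
```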
B: You can come across variables like beta1 and beta2, and to a clinician, if you're talking about beta1 and beta2 blockers, you're thinking about beta-1-selective blockers and beta-2-selective blockers, two different types of medications. But in one of the use cases we had seen, beta1 and beta2 both represented beta blockers without a diuretic, and they actually represented different visits.
B: The first visit was called beta1 and the second visit was called beta2, so that was a complete curveball for us when we found out this information from one of the SMEs on the project. So now we have that the same label can be used with different meanings. Previously we looked at different labels that could mean the same thing, but now it's the same label with different meanings.
B: Take the word "cold": what does it mean to you? Well, we have at least three different meanings. One is, you could be feeling cold; it could be a cold infection, which is an infectious disease; or it could be chronic obstructive lung disease, which is another synonym for chronic obstructive pulmonary disease, or COPD. And then we have things like hypopnea, which is breathing that is very shallow or has an abnormally low respiratory rate.
B: So the English definition looks very simple, but when we look at the operational definitions, the American Academy of Sleep Medicine has two definitions. There is a recommended definition, which says: airflow reduction greater than 30% of baseline, lasting for 10 seconds, and a hemoglobin oxygen desaturation of greater than or equal to 4% from baseline. Now, this is one of the definitions.
B: There's an alternate definition: instead of 30%, they say greater than or equal to 50%, again for 10 seconds, but with a hemoglobin desaturation of greater than or equal to 3% instead of 4%. And from one of the projects that I worked on, we know that there are at least 10 other known definitions of this. So the challenges for us, when it comes to data, can be summarized this way: we have to deal with simple as well as complex definitions.
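The two AASM operational definitions just quoted can be encoded as predicates over a single scored breathing event; the sketch below is only an illustration (the field names are made up, not from any real scoring system), but it shows concretely how the same event can count as hypopnea under one definition and not the other.

```python
# The two AASM operational definitions of hypopnea quoted above,
# encoded as predicates. Field names are illustrative only.
def hypopnea_recommended(flow_reduction_pct, duration_s, desat_pct):
    """Airflow reduction > 30% of baseline, lasting >= 10 s,
    with hemoglobin O2 desaturation >= 4% from baseline."""
    return flow_reduction_pct > 30 and duration_s >= 10 and desat_pct >= 4

def hypopnea_alternate(flow_reduction_pct, duration_s, desat_pct):
    """Airflow reduction >= 50% of baseline, >= 10 s, desaturation >= 3%."""
    return flow_reduction_pct >= 50 and duration_s >= 10 and desat_pct >= 3

# The same event satisfies one definition but not the other:
event = dict(flow_reduction_pct=35, duration_s=12, desat_pct=4)
print(hypopnea_recommended(**event))  # True
print(hypopnea_alternate(**event))    # False
```

This is exactly the overlapping-definitions problem the talk describes: two cohorts scored with different definitions are not directly comparable.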
B: We need to deal with overlapping definitions, like we saw with the hypopnea operational definitions. We need to deal with data integration challenges, both across applications and across domains. We need to be able to handle querying these different data sources, and also make the data suitable for discovery. Now, this is exactly the situation where ontologies have proven to be extremely useful.
B: So in this presentation I'm going to talk about what an ontology is, give a small demo of some of the benefits of using an ontology, and we'll also look at some ontology patterns and uses. Okay, so let's start with what an ontology is, again going back to this definition: an ontology is a formal representation of knowledge as a set of concepts within a domain, and the relationships between those concepts.
B: When you're thinking of concepts, concepts have to do with reality, with domains within that reality. This is a semiotic triangle; some of you are probably familiar with it. This is the real world around us, and we refer to the real world using different terms, like "city" or "capital" or diseases, and things like that.
B: Now, when we communicate these words to other people, they think about them, and a word evokes a certain concept, which is about the world around us. So this is the semiotic triangle: how we interpret reality, how we talk about reality, and how we conceptualize reality.
B: Unfortunately, reality is not so easy to comprehend; there's often a lot that gets lost in translation. So here is a little bit of programmer humor, where a customer gave the requirements for a swing, and this is how we explained it, and you can see the transitions showing how it was finally interpreted. Interestingly enough, the last panel shows what the customer actually wanted, versus how we explained it.
B: So, for example, before the 16th century it was common knowledge that the Earth was the center, and there were pretty complex maps to track how the different planets moved and all that. Copernicus, in the 16th century, presented a different view, the heliocentric or sun-centric model, and that changed our perception of reality, of how we thought about Earth and the other planets.
B: Now, interestingly, Copernicus was very reluctant to present this view, because it was so different that he was afraid people would think he was crazy, but it was a good thing that it came out eventually. Now, going back to how we model information, or how we capture data with regard to the reality around us: we saw this description. A 60-year-old male patient presented with sudden onset of chest pain, nausea, sweating, and dyspnea.
B: So this is a very simple model. It is basically a list of terms that are there in our database, in the data model. There's no additional meaning given around chest pain, epigastric pain, cough, and dyspnea. For the most part, this is how it looks. Now, there may be definitions, but oftentimes it depends on who is seeing this list and how it is integrated.
B: Now, an ontology model, on the other hand, is a very rich network of relationships; it's a web of relationships. If you take the same information, or similar information, what I'm showing here is the anatomical aspects and the concepts related to pain.
B: Pain is subclassed into abdominal pain and chest pain, so abdominal pain is a type of pain and chest pain is a type of pain. Now, abdominal pain itself is subclassed as epigastric pain; there are other regions we could put here, but this is the only one I'm showing. Similarly, chest pain has ischemic chest pain, pleuritic chest pain, and sternal chest pain, depending on what area and what type of pain it is.
B: Now, epigastric pain is a type of abdominal pain and has site epigastrium, just as abdominal pain is a type of pain and has site abdomen. So you see the part-of structure here and the is-a structure here; they go together. Similarly, over here, chest pain has site chest; sternal pain is a type of chest pain and has site sternum; and the sternum is a part of the chest.
B: So if you connect these two things, you'll see how epigastric pain is connected in the ontology model. Now, the good thing about an ontology model is that this graph representation is actually pretty easily understood, and you can actually tell a story from it. And in addition to just the graph representation, each term has a textual definition to tell us what it means.
B: Controlled vocabularies are kinds of lists, which bring together some specific terms which you want to deal with together. A taxonomy is a little bit more structured, in that you have a hierarchical representation of the terms.
B: So in this case, as an example, cardiovascular system disease has two types: heart disease and vascular disease. And then again, heart disease can be of different types, mitral stenosis, aortic regurgitation, and so on: hypertension and peripheral artery disease are vascular diseases, while mitral stenosis and aortic regurgitation are heart diseases.
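The disease taxonomy just described is a plain hierarchy, and a subsumption query over it can be sketched in a few lines. This is only an illustration of the is-a closure idea, using the example terms from the talk.

```python
# The disease taxonomy described above, as parent -> children edges.
TAXONOMY = {
    "cardiovascular system disease": ["heart disease", "vascular disease"],
    "heart disease": ["mitral stenosis", "aortic regurgitation"],
    "vascular disease": ["hypertension", "peripheral artery disease"],
}

def descendants(term):
    """All terms transitively subsumed by `term` (the is-a closure)."""
    found = []
    for child in TAXONOMY.get(term, []):
        found.append(child)
        found.extend(descendants(child))
    return found

# Querying the root subsumes every specific disease below it:
print(descendants("cardiovascular system disease"))
```

This is what makes hierarchical querying work: asking for "heart disease" patients automatically includes the mitral stenosis and aortic regurgitation cases.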
B: So you have a taxonomy here which represents a hierarchical structure of diseases. The most complex of these is the ontological structure, where ontologies represent not just the taxonomic structure, which is in the form of the is-a relationship here, but also other relationships, like the part-of relationship here for anatomical structures, and other relations like has-site: abdominal pain has site abdomen, epigastric pain has site epigastrium, things like that.
B: So this is a rich web of relationships that we can use with our information model. Switching gears a little bit, I want to talk a little bit about the place of ontology, how it fits into the bigger picture. Ontology is one of the specifications that is part of the semantic web stack.
B: If you look at this middle structure in white, you can see that at the bottom is RDF, which is the foundational layer for the semantic web; then you have RDFS, which is the RDF schema language; then you have OWL, which is more expressive and allows us to specify some rules; and then there is the SPARQL language, which is basically the query language used for the semantic web.
B: Looking at the first one, the foundational one: the Resource Description Framework. The basic, fundamental part of the Resource Description Framework is the subject-predicate-object data structure. Every resource in the semantic web is described using this subject-predicate-object data structure. Basically, these are statements, and these triples are the fundamental structure of RDF.
B: So here, in this case, we are talking about Bob, and there are two relations, "lives in" and "knows", and then there is Houston, which is related to Bob via the "lives in" predicate. We're also talking about different kinds of concepts here, and if you look at each one of these entities, the subjects and the predicates and all that, they are identified using a Uniform Resource Identifier, or URI. They uniquely identify entities, and when it comes to URLs, these can often be resolved.
B: So, for example, Houston can be represented as dbpedia.org/page/Houston, and if you go to this website, you can see that it resolves into a description of what Houston is. Beyond that, RDF also provides an ability for us to tell what kind of thing Bob or Lisa, or basically any entity, is. So we can represent that Bob is of type Person, Lisa is of type Person, and Houston is of type City, and these types also get their own URIs.
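The Bob/Lisa/Houston statements above can be written down directly as subject-predicate-object triples, with a tiny pattern matcher standing in for a real triple store. In actual RDF each name would be a URI rather than a bare string; this sketch just shows the data shape and how wildcard queries over it work.

```python
# The statements from the example as subject-predicate-object triples.
triples = {
    ("Bob", "livesIn", "Houston"),
    ("Bob", "knows", "Lisa"),
    ("Bob", "type", "Person"),
    ("Lisa", "type", "Person"),
    ("Houston", "type", "City"),
}

def match(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard,
    much like a variable in a SPARQL basic graph pattern."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

print(match(p="type", o="Person"))   # who is a Person?
print(match(s="Bob"))                # everything stated about Bob
```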
B: Just to summarize this part of it: the semantic web has different parts. RDF is a standard model for data interchange, and triples form the fundamental data structure of RDF. RDF Schema, or RDFS, is the schema language that allows us to describe the instance data that we represent in RDF. So it allows us to define classes and subclass structures, like "a person is a subclass of organism",
B: or things like "reading is a subclass of activity". The Web Ontology Language, or OWL, provides us additional functionality, for describing rules and things like that, for authoring ontologies. And finally, SPARQL is the query language that we use for querying RDF data. Lastly, serialization formats: RDF, RDFS, and OWL data can all be serialized into different formats, like N-Triples, Turtle, RDF/XML, or even JSON-LD.
B: So these are all text formats, and the important thing to remember is that despite the different formats that an ontology or RDF data is serialized into, they all mean the same thing.
B: So what I did here is: I have a small ontology file that I've created for demonstrating some of these different aspects, and I'm going to show two different things, one at the class level and the other at the instance level. If you look at the things represented at the top level: body structures, clinical findings, drug products, geolocations, substances, and a value partition. The body structure part is pretty simple.
B
It's
a
straightforward
data
structure,
so
we
have
abdomen
chest
epigastrium,
pleura
and
external,
and
if
you
look
at
the,
if
you
look
at
the
definition
for
us,
for
example,
for
epigastrium,
we
say
it's
a
part
of
abdomen
and
sternum
is
a
part
of
the
chest.
B: That's what we've defined here. And similarly, in terms of clinical findings, I've defined pain and some subclasses of pain: abdominal pain, chest pain, epigastric pain, ischemic pain, pleuritic pain, and sternal pain. So if you look at sternal pain, you can see the definition: it's a subclass of pain and has site sternum.
B: So we have here different things: I've represented city and capital city, where capital city is defined as the capital of some country or state. I've defined continent, and this is a defined class, so you define the different continents; and I have some countries, some regions, some states, and things like that. Now, if you look down here, you can see some of the instances that I have defined, some cities: London, Hyderabad, Paris, Oslo, and so on.
B: So if you go into the definition of this, what I'll do is turn off the inferences, and we'll see here that London is defined as a city and has a property "capital of" the United Kingdom. Hyderabad is defined as a city and has a property "capital of" Telangana; Paris, similarly, is "capital of" France.
B: Similarly, you can see for other things: you can see that Hyderabad has been classified as a capital city, even though we didn't say it is a capital city, or of type capital city, and also that it is a part of Asia. These are some of the inferences that have been drawn. If I go to country, you can see here:
B: India has part Hyderabad, has part Andhra Pradesh, and such things. That India is a part of Asia is what I declared, but here are all the other things that it has inferred, everything in yellow.
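The two inferences shown in this demo can be mimicked in a few lines: classifying any city that is "capital of" something as a capital city, and computing the transitive part-of closure. This is only a sketch of what the OWL reasoner does; the geography facts mirror the demo's examples, and the chain Hyderabad-Telangana-India is an assumed reading of the slide.

```python
# Asserted facts, mirroring the demo. The reasoner's job is to derive
# what is NOT asserted: Hyderabad is a capital city and part of Asia.
PART_OF = {"Hyderabad": "Telangana", "Telangana": "India", "India": "Asia"}
CITY = {"Hyderabad", "London", "Paris", "Oslo"}
CAPITAL_OF = {"Hyderabad": "Telangana", "London": "United Kingdom",
              "Paris": "France"}

def part_of_closure(x):
    """Everything x is transitively a part of."""
    out = []
    while x in PART_OF:
        x = PART_OF[x]
        out.append(x)
    return out

# 'Capital city' as a defined class: any city that is capital of something.
capital_cities = sorted(c for c in CITY if c in CAPITAL_OF)
print(capital_cities)                 # ['Hyderabad', 'London', 'Paris']
print(part_of_closure("Hyderabad"))   # ['Telangana', 'India', 'Asia']
```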
B: So I hope this gives you a little bit of an idea of the kinds of things you can do with an ontology. Going back to the presentation:
B: I want to quickly go through some ontologies in healthcare. If you look at a proper definition for a disease term, something like acromegaly, this is what it would look like. We represent its clinical findings: acromegaly is a disease which presents with large jaw, large hands and feet, joint pains, excessive height, and all that.
B: It is a type of gigantism, which is a type of hyperpituitarism that has location pituitary; a part of the pituitary is the adenohypophysis, and acromegaly has location adenohypophysis. So you can see this is a very good graphical representation of a model of what the disease looks like. And we do this by using a number of reference ontologies, like SNOMED, which is a very large model covering all of medicine, with about 300,000 terms; the Foundational Model of Anatomy, with about 70,000 terms; the Gene Ontology, with about 30,000 terms; and then the Ontology for General Medical Science, the Infectious Disease Ontology, and so on. Now, reference ontologies represent domain models.
B: The benefits are that they are carefully curated, consensus-based, and they often have formal definitions, both English and logical. So here is a definition in English of what a disease means: a disease is a disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.
B: This is where application ontologies come in, where we're dealing with specific needs. For example, one of the projects that I did, SemanticDB, was in the cardiovascular space, dealing with ECG, cath, echo, and surgical procedures. PhysioMIMI was in the sleep domain, where we were dealing with polysomnography data. And then we have other needs, like user interface needs for data entry, measurement units, and even semantic search applications.
B: So just to summarize this: to build an application ontology, we take the use cases, which explain what the domain is; that gives us a top-down view.
B: We look at the content that is available, the data that is available, and that provides us the bottom-up view; and we look at other needs, like, for example, the user interface needs. And we use some modeling principles, like reuse, modularization, and frameworks, to do this. So I'll skip this one; the previous one was looking at the clinical use case I spoke about, from a top-down approach.
B: We can look at what kinds of questions we want answered; from the bottom up, some term lists, like gender, age, and so on; and between the two of them, that gives us a prioritization of how we should approach the modeling. Technically, again, there are the user interface requirements, like visual query and search and things like that; complex definitions; as well as calculations and derived values.
B: So the first approach is reuse. The goal is that we need to reuse existing models in a custom situation to solve the problem. But the reference models are pretty large, so what we do is use segmentation algorithms to create custom modules of them, and this results in custom models that are useful for our individual situations.
B: The thing is, the more and more custom models we use, it can again lead to fragmentation, and this is where frameworks come into the picture, with the goal of avoiding fragmentation. For this we use frameworks like Basic Formal Ontology, which is one of them; there are other ones, like BioTop; and then there are some principles laid out by the OBO Foundry, like reuse, modularization, and frameworks, which we use.
B: So here is an example of the sleep domain ontology that we built using Basic Formal Ontology, BioTop, the Clinical Patient Record Ontology, the Ontology for General Medical Science, NEMO for neurology, an anatomy ontology, and then some special ones for units and drugs. Now, here is the first use case we'll look at. This is from the PhysioMIMI project, and this is what we did.
B: We were accessing data from about six-plus data stores in four different institutions, across three different states, all of them built individually and separately. So none of the data schemas were matching, and we needed to build a platform whereby we could query the data across these four institutions.
B: So let's see how we did this. The first thing in the data we looked at were the units. We have units for things like age, represented in years; height in centimeters; weight in pounds; and hypopnea, which has, you know, the 30%, seconds, time duration, percentage, and things like that. So we have units both in SI as well as English.
B: So what we did was take a look at what kinds of ontologies were available for units. We had a couple of different models we could use, PATO and the Measurement Units Ontology, but our analysis showed that they were not adequate from a coverage point of view, and also that they were not expressive enough. So we were forced to build a custom ontology for units.
B: The ontology language does not support this, so we overcame it by using formula annotations, where we represented the formula as part of the definition, in an annotation. From this we did conversions like foot to inch (units of the same type), kilogram to gram, as well as conversions between English units and SI units, and based on these formulae we did an inferential expansion, so that we could get a conversion between any unit and any related unit.
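The "inferential expansion" idea, deriving a conversion between any two related units from a few stored formulae, can be sketched as a graph search over direct conversion factors. This is only an illustration of the concept (the project stored the formulae as OWL annotations, not Python), with a deliberately tiny factor table.

```python
# A few direct conversion factors; the expansion step derives the rest
# by chaining them, so any pair of connected units becomes convertible.
DIRECT = {  # (from, to): multiplicative factor
    ("foot", "inch"): 12.0,
    ("inch", "centimeter"): 2.54,
    ("kilogram", "gram"): 1000.0,
}

def factor(src, dst, seen=frozenset()):
    """Find a conversion factor by chaining direct factors (either way)."""
    if src == dst:
        return 1.0
    for (a, b), f in DIRECT.items():
        for u, v, g in ((a, b, f), (b, a, 1.0 / f)):
            if u == src and v not in seen:
                rest = factor(v, dst, seen | {src})
                if rest is not None:
                    return g * rest
    return None

print(round(factor("foot", "centimeter"), 2))  # 30.48, via foot->inch->cm
print(factor("foot", "gram"))                  # None: lengths vs masses
```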
B: The second challenge that we had to solve was complex definitions.
B: So here is something I showed previously: the sleep hypopnea finding. The American Academy of Sleep Medicine has this definition: airflow reduction greater than or equal to 30% of baseline, for at least 10 seconds, and hemoglobin desaturation greater than 4% from baseline; and you can see here the operational definition in OWL.
B: Now, like this, we had about ten different definitions that we found: sleep hypopnea finding for adults, hypopnea finding for children, AASM 1, AASM 2, CHAT, and all that. And as you can see here, we defined each one of them, basically, as a single simple list under the named hypopnea finding, and look what happens when you classify it.
B: It has these subclasses, the CHAT and AASM definitions. So the third thing that we did was use the ontology for user interfaces, where we looked at the different kinds of data: continuous data, categorical data, and boolean. Continuous data like age is provided with a formal definition, using the measurement unit year; it's a float with different ranges. And then for categorical:
B: Similarly, we define, in this case, race: we defined all these different races as categorical values, and when we generate the user interface from this, based on these kinds of patterns, we can draw these kinds of widgets. So if you pick age, in this case age at the time of study, you can see how a widget is drawn where you can set the age limits; and then for categorical kinds of values, you can see that you can represent them as checkboxes. Now, going back to this picture: in the user interface, you can see that we have a number of terms.
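The datatype-pattern-to-widget mapping just described can be sketched as a small dispatch: continuous variables get a range control, categoricals get checkboxes, booleans a toggle. The variable definitions below are illustrative stand-ins for the ontology's formal definitions, not the project's actual data.

```python
# Hypothetical variable definitions, standing in for the formal
# ontology patterns (measurement unit + range, enumerated values, bool).
VARIABLES = {
    "age at time of study": {"kind": "continuous", "unit": "year",
                             "min": 0, "max": 120},
    "race": {"kind": "categorical",
             "values": ["Asian", "Black", "White", "Other"]},
    "history of stroke": {"kind": "boolean"},
}

def widget_for(name):
    """Pick a UI widget from the variable's datatype pattern."""
    spec = VARIABLES[name]
    if spec["kind"] == "continuous":
        return ("range", spec["min"], spec["max"], spec["unit"])
    if spec["kind"] == "categorical":
        return ("checkboxes", spec["values"])
    return ("toggle", [True, False])

print(widget_for("age at time of study"))  # ('range', 0, 120, 'year')
print(widget_for("race"))
```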
B: So we have a search interface where, as you type in the letters, the terms start filtering, and these terms came directly from the sleep domain ontology.
B: We placed an adapter that provided the mapping layer between the ontology and each one of these data stores, and then, using the ontology, we generated this user interface. For the queries that were described using this user interface, we created an abstract query based on what the user wanted; that was sent over to each one of these data stores and, using the mapping layer, translated into the individual SQL dialect. The query ran, and then the results that were brought back were again translated back into the ontology terms and displayed in the user interface. So this was one use case. The second use case we did was ClinicalKey, a semantic search interface, where we were dealing with about 500-plus journals and 700-plus textbooks, which we put through an NLP pipeline for information extraction, and we used EMMeT, Elsevier's medical taxonomy, as the core of the ontology model for this NLP pipeline.
B: So this was a project that I did when I was at Elsevier. The results of this NLP pipeline were in two forms: one was a linked data repository, for which we had a SPARQL interface and could do interactive queries, and the other was a database with Apache Solr, with which we could do full-text search.
B: So this was a product that was developed and deployed into the market, the ClinicalKey search interface. Okay, so with that, I want to conclude by just going over the role of an ontology in describing reality. First, it is useful as a formal representation, and it is computable.
B: It forms the basis for an information model; it aids query formulation using SPARQL; we saw how we can integrate multiple distributed data stores with it; and we can bring together different standard terminologies, which provides data integration as well.
B: So I want to thank and acknowledge some of my colleagues with whom I worked on these projects: the SemanticDB project at Cleveland Clinic, the PhysioMIMI project at Case Western Reserve University, and the smart content project at Elsevier.
A: Awesome, thank you very much. I think we're all giving a virtual round of applause; a very interesting presentation. I am certainly inspired by these ideas, because this is actually useful for a multitude of situations, not just healthcare.
B: So it's a mix of both. To create good ontologies takes a lot of work, and you need to have good domain knowledge to represent the definitions well, and it's a very consensus-based approach also. Now, does that mean that we don't use any other tools and techniques to help us in this process?
B: No, we do use some NLP tools. If you have a body of text that you want to analyze, to see what kinds of terms we want to represent in the ontology, you can use some NLP, and you can do things like word frequencies, simple things that tell us what the most common things in there are. That helps us, from the bottom-up approach that I talked about, in understanding what the important terms are,
B: the areas that we want to represent first. Now, are those terms that we want to represent available in any existing ontology? Because if something is existing in an ontology, we want to be able to reuse it. A number of these ontologies are open source; they're openly available. SNOMED, RxNorm, FMA, and LOINC are some of the models that we use very often in medicine.
B: And these are fairly flexible with their licenses, so we can use them, and we don't want to reinvent the wheel. So the first thing is to search: look to see if something covers it, and if it is available, use it. That is one of the basic principles of the semantic web and of using ontologies.
B: If it is not available, then we will need to extend a framework or an existing ontology, because we are not creating this in isolation. So, for example, COVID recently was a good example: until recently we did not have a term for the disease coming out of COVID. So what we did is we extended the Infectious Disease Ontology: we subclassed the SARS virus to represent the coronavirus, the SARS-CoV-2 virus, and similarly the disease that is caused by the coronavirus.
C: Very interesting. Dr. Arabandi, Tom Hicks here. Thank you very much, a nice presentation introducing ontologies. But I think of an ontology as a tool that serves the purpose of achieving some higher goal, and I'm curious to know what your uses were for the ontology once you created it. For instance, in the first example, you gave the clinical notes.
B: And so, once we were able to interpret that clinical text, it was represented in the form of an RDF graph, and then the same ontology terms were used as part of the query interface to query the RDF graph.
B
To
extract,
it
is
a
little
bit
of
a
chickening
the
fortunately,
we
a
lot
of
the
reference
ontologies
provides
us
with
very,
very
good
domain
coverage,
and
so
a
lot
of
terms
are
already
there
existing
and
we
can
leverage
those.
B
But,
as
we
are
parsing
the
text,
we
are
going
to
find
terms
that
are
not
there
and
those.
We
definitely
need
to
add.
So
it's
not
a
it's,
not
a
one-off
process.
Where
you
you
look
at
some
text,
you
you
develop
the
ontology
and
you're
done
with
it.
So,
as
new
text
comes
into
play,
your
or
our
our
as
our
reality
changes
right,
you're
going
to
keep
evolving
the
ontology.
A
Thank you very much, guys. I know there are more questions, and in the interest of time, let's push this to after the final presentations when we have the open discussion; I'll bring back the questions that were not answered for now. Thank you very much, Sivaram. We will now move on with the presentation by Pier Gherardini on CANDEL, which is a very interesting platform — very cool infrastructure mixing Clojure, R, and Datomic. Pier, the stage is yours.
F
Yeah, so as Santiago mentioned, this is really a platform for doing biological data science that was built using a combination of Clojure, R, and Datomic. Before I go into the details of the platform — I work at the Parker Institute for Cancer Immunotherapy, so I'm going to give just a two-second introduction to the institute and the main work that we're carrying out there, because we really built the platform to power our work, and so it's important to understand what our work is.
F
First,
so
the
vision
of
the
institute
is
really
to
transform
the
way
that
medical
research
is
done
and
to
in
order
to
turn
our
cancer
into
curable
diseases,
and
the
institute
itself
is
particularly
focused
on
immunotherapy
as
a
treatment
modality,
which
is
the
idea
of
using
the
body's
own
immune
system
to
attack
cancer
and
then
and
eliminate
cancer
and
the
the
pisces
model.
So
I'm
gonna
call
it
pisces,
which
is
the
algorithm
that
we
use
internally
and
actually
in
our
law
as
well.
F
So
when
I
say
pisces,
I'm
in
the
parking
issue
for
cancer
immunotherapy,
so
the
the
model,
the
pisces
model,
is
really
built
around
collaboration
and
and
the
the
institute
was
born
as
a
as
an
alliance
between
some
of
the
major
cancer
centers
in
the
nation
that
you
can
see
in
this
slide
and-
and
you
know
some
of
the
major
really
leading
researchers
in
these
fields-
and
there
is
a
central
office
in
san
francisco,
which
is
where
I
work
and
where,
where
we
have
an
informatics
team,
that,
as
part
of
this
charge,
is
taking
data
from
all
these
different
sites
and
and
both
from
clinical
studies,
as
well
as
laboratory
studies
and
laboratory
research
up
and
all
these
sites
and
trying
to
maximize
the
amount
of
information
that
we
can
extract
from
this
data
set
in
order
to
advance
the
field
of
cancer
immunotherapy
in
general.
F
So
the
as
I
say,
data
is
really
you
know
the
fuel
that
powers,
everything
that
we
do
and
and
in
our
research.
What
we're
really
trying
to
do
is
go
to
go
from
a
bench
discovery
so
something
that
one
of
the
investigators
discover
into
in.
F
In
one
of
the
labs,
maybe
in
an
animal
model,
move
that
to
the
bedside
so
translate
this
into
a
clinical
trial
that
can
be
used
to
test
this
therapy
in
in
actual
human
patients
and
then
eventually,
if
the
clinical
trials,
successful,
move
this
therapies
to
the
market,
which
is
the
way
that
you
know
commercialization
is
really
the
way
that
medicine
go
out
in
the
general
population.
Besides
that,
besides
a
clinical
trial,
that's
that's
a
very
limited,
obviously
population,
just
for
testing,
and
you
know
as
part
of
this.
F
Sometimes
we
also
want
to
what
we
want
to
go
the
opposite
so
we're
going
to.
We
want
to
go
from
bedside
to
bench,
so
we
make
a.
We
make
an
observation
in
a
patient.
We
discover,
for
instance,
that
you
know
patients
that
don't
respond
to
a
certain
therapy,
have
a
very
high
level
of
expression
of
a
given
gene,
and
so
we
want
to
go
back
to
the
bench
and
trying
to
figure
out
the
experiments
that
can
explain
his
observation
that
we've
made
we're
making
patients,
and
so
in
all
of
these
we
use
data
extensively.
F
You
know
both
molecular
data
from
from
our
patient
sample,
as
well
as
clinical
data
and
really
candle
is
the
engine
that
uses
all
this
data
to
power
all
this
work,
and
so
it's
really
the
core
data
infrastructure
that
is
central
to
the
work
that
we
do
at
the
institute.
F
Here is how it actually looks in practice, in our specific example. We are running a number of studies as an institute — a number of clinical trials — and in doing these clinical trials we collect both tumor tissue from patients as well as blood. So we get biopsies and we get blood draws from these patients, and then we use a large suite of different molecular assays on these samples to get molecular measurements.
F
So,
for
instance,
on
the
tumor,
we
can
do
whole
exome,
sequencing
and
figure
out
all
the
mutations
that
are
in
the
tumor
or
we
can
do
multi-parameter
imaging
and
trying
to
you
know,
see
the
immune
cells
and
the
tumor
cells
and
the
relationship
between
the
immune
cells
and
the
and
the
tumor
cell
in
in
the
impact
tissue,
or
we
can
do
rna
sequencing,
which
is
essentially
measuring
the
expression
of
different
genes
in
the
tumor
and
similarly
on
the
blood.
We
can
measure
different
type
of
of
molecules
that
are
that
are
in
the
serum.
F
Different
types
of
proteins
are
in
the
serum
that
are
important
for,
for,
or
you
know,
for
the
working
of
the
immune
system
as
well
as
we
can
profile
the
all
the
immune
cells
that
are
in
the
blood-
and
you
know
knowing
these
patients,
how
many
cells
of
one
type
there
are
how
many
cells
of
the
other
type
there
are,
and
also
what's
the
activity
of
all
these
different
cells
in
the
in
in
the
blood
and
and
then
so.
F
All
of
this
generates
a
large
stream
of
molecular
data,
so
molecular
information
about
what's
going
on
in
the
cells
and
and
the
genes
of
of
this
patient,
and
then
we
really
marry
that
with
the
clinical
information
that
we
get,
we
get
from
our
studies,
which
are
more
kind
of
more
like
the
thing
that
might
the
previous
speaker
was
talking
about,
which
are
information
such
as
you
know
what
is
the
type
of
cancer
that
this
patient
has
you
know
what
other
comorbidities
this
patient
is
experiencing,
what
drug
treatments
he
received?
What
was
the
response?
F
How
long
did
he
leave?
How
long
did
it
pass
before
the
disease
recurred
in
the
case
of
cancer,
etc?
So
all
the
all,
this
type
of
clinical
information
once
again
very
similar
to
what
the
previous
speaker
was
talking
about,
and
so
our
job
and
the
job
of
this
you
know
black
actually
great
box
here-
is
to
bring
these
two
together
and
trying
to
figure
out
if
there
is
any
in
the
molecular
data
that
predicts
any
of
this
clinical
feature
right.
F
So,
first
of
all,
we
want
to
be
able
to
give
treatment
to
patients
that
work,
and
so
when
the
patient
comes
at
the
door,
if,
if
we
see,
if
we
do
a
molecular
test,
then
we
see
that
a
certain
treatment
is
not
going
to
work
on
this
patient.
We
don't
want
to
give
it
to
him
and
vice
versa.
If
we
have
a
library
of
treatment
to
choose,
for,
we
want
to
make
sure
that
we
give
them
the
treatment,
that's
appropriate
for
the
molecular
profile.
F
But
that's
one
reason.
The
other
reason
is
to
further
advance
the
field
of
clinical
research
right,
because
if
we
discover
that
the
patients
that
don't
respond
to
this
therapy,
they
all
have
a
very
high
expression
of
gene
x.
Then
maybe
the
next
step
could
be
trying
to
figure
out
if
you
can
develop
a
molecule
that
targets,
gene
x
and
so
can
be
combined
with
that
with
existing
treatment.
F
To
also
address
the
needs
of
this
specification
for
patient
population
where,
where
the
previous
treatment
wasn't
working
right,
so
the
way
that
this
we
proceed
is
by
making
all
of
these
observations
impatient
and
then
using
this
observation
to
sort
of
further
advance
the
transcendental
practice,
because
in
every
clinical
study
there
is
always
some
subset
of
patients
for
which
the
treatment
works
and
some
subset
of
patients
from
which
for
which
the
treatment
doesn't
work.
And
so
the
job
is
really
teasing.
The
two
apart
using
the
the
molecularity
that
we
collect.
F
So
typically,
we
have
a
kind
of
like
a
small
number
of
subjects
in
the
study,
so
maybe,
let's
say
50
to
100
subjects
and
for
for
the
for
each
one
of
this
subject,
we
have
several,
you
can
think
of
it
as
a
table,
so
spreadsheet
of
molecular
measurement.
That
represents
the
results
of
one
of
this
acid
that
we
carried
out
on
this
subject.
So,
for
instance,
the
gene
expression
assay
right.
F
So
in
the
case
of
gene
expression
assay,
we
get
a
big
table
that
contains
expression
of
you
know:
20
000
different
genes
in
the
in
the
samples
from
this
specification,
and
we
have
the
similar
thing
for
all
the
assets
that
we
run.
So
we
will
have
a
big
table
for
our
gene
expression
assay.
We
will
have
a
big
table
for
our
genome
sequencing
assay.
F
Well,
the
other.
The
other
thing
about
the
other
feature
of
this
is
it's
very
deeply
interrelated,
because
at
the
end
of
the
day,
when
we
do
these
molecular
measurements,
we're
measuring
a
lot
of
things
that
are
related
to
each
other.
So
when
we
measure
the
abundance
of
the
proteins
in
the
blood,
these
proteins
regulate
the
activities
of
the
cells
that
we're
also
measuring
in
the
blood
right
and
when
we
measure
the
composition
of
of
the
the
number
of
immune
cells
into
a
tumor.
F
Well,
this
immune
cell
come
from
the
blood
in
the
first
place,
so
they
migrate
from
the
blood
to
the
tumor,
and
so
all
of
these
different
measures
that
we
are
that
we're
measuring
are
really
kind
of
like
looking
at
at
the
same
biological
system,
the
same
new
methodological
system
just
from
a
lot
of
different
angles,
and
so
this
data
is
very
deeply
interconnected
and
related,
and
the
one
of
the
the
other
important
feature
of
this
data
is
it's.
It's
typically
sparse.
F
So
because
these
are,
you
know,
very
sick
patients
that
we're
getting
these
samples
from.
Sometimes
it
won't
be
possible
to
have
a
tissue
biopsies,
and
sometimes
you
know,
somebody
was
scheduled
to
to
get
a
blunt
wrong
blood,
throne
or
given
day,
but
he
had
to
skip
the
blood
draw
and
so
we're
not
gonna
have
that
sample
from
the
for
that
patient.
So
the
data
is
a
little
bit
of
a
swiss
cheese
of
what
is
what
we
can
get
from
from
from
from
these
very
sick
patients,
and
and
very
often
it's
not
complete.
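The "Swiss cheese" shape can be pictured as per-assay tables that each cover only a subset of subjects, so combining them yields explicit gaps rather than a dense matrix. A hedged sketch in Python — the assay names, subject IDs, and values are all made up:

```python
# Each assay covers only the subjects for whom a sample was available.
rnaseq = {"subj-1": 12.3, "subj-2": 8.1}    # no subj-3 biopsy was possible
cytof  = {"subj-1": 0.42, "subj-3": 0.55}   # subj-2 skipped a blood draw

subjects = sorted(set(rnaseq) | set(cytof))

# Joining across assays yields a sparse table with explicit gaps (None).
table = {s: {"rnaseq": rnaseq.get(s), "cytof": cytof.get(s)}
         for s in subjects}
```

Any downstream analysis then has to tolerate these `None` holes, which is why the speaker stresses that the data is never complete.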
F
Actually,
it's
never
complete.
Basically,
so
the
thing
that
we
like
to
say
is
this
is
this
is
not
really
big
data,
but
it's
deep
data.
So
it's
not
big
data,
because
it's
not
a
ton
of
data.
I
mean
this
is
not
like
facebook
level.
Big
data
on
you
know
billions
of
users
and
hundreds
of
thousands
of
clicks
every
day,
but
it's
sorry
some
it's
a
much
smaller
data
set,
but
it's
very
deep
because
we're
collecting
a
lot
of
lot
of
different
features
on
on
these
data
sets
on
on
this
patient.
F
So
what
we
typically
want
to
do
with
this
data?
This
is
something
that
I've
already
you
know
mentioned
mentioned
before.
So
I'm
going
to
go
quickly
to
this
slide
but,
as
I
said,
you
know
identify
specific
subset
of
patients
to
you
know,
figure
out
if
they're
going
to
benefit
from
the
therapy
or
or
or
if
I'm
not
going
to
benefit
from
from
the
therapy.
F
For
instance,
then
you
know
another
another
thing
we
want
to
do
is
we
determine
if
a
certain
observation
has
been
made
before
right,
because
we
are
running
lots
of
studies
and
other
people
in
the
field
are
running
a
lot
lots
of
these
studies,
and
so
when
we
see
that
you
know,
gene
x
is
over
expressed
in
the
people
that
do
not
respond
to
the
therapy.
We
want
to
be
able
to
see.
F
Has
this
thing
ever
been
seen
in
another
data
set,
so
there
is
a
component
of
sort
of
meta-analysis
and
going
back
to
the
existing
data
and
querying
for
specific
observations,
and
then
you
know.
Last
but
not
least,
we
want
to
use
use
all
of
these
to
build,
as
I
said,
predictive
models
that
give
us
insight
into
the
mechanism
of
action
of
therapy
right.
We
want
to
see
if
gene
x
is
important
for
the
patients
that
do
not
respond
to
this
therapy.
F
What
is
the
mechanic,
the
biological
mechanism
that
under
underlies
this,
this
importance
and
the
reason
why
genexism
is
expressed
so
in
order
to
so
we
have
a
team
at
the
the
parking
institute
that
basically
is
doing
all
this
kind
of
work
for
the
studies
that
we're
running
as
an
institute.
So
we
as
an
institute
sponsor
a
number
of
clinical
trials.
We
collect
all
this
data
set,
we
bring
it
in
house
and
then
there
is
a
team,
the
informatics
team
of
which
I'm
a
member.
F
Obviously,
that
is
tasked
with
doing
all
the
work
that
that
has
developed
now,
so
in
order
to
facilitate
and
really
power
our
work,
we
created
this
platform
that
we
call
candle
that
stands
for
cancer
data
and
evidence
library,
and
it's
really
a
platform
for
biological
data
science
in
general.
That
supports
a
lot
of
different
data
types,
and
I'm
gonna
talk
more
about
that
in
the
in
the
next
of
the
presentation
of
the
subsequent
slides.
So
the
platform
is
really
you
know
we
can.
F
We
can
conceptualize
it
as
it
is
in
three
steps,
so
we
start
with
raw
data.
So
we
get.
You
know
big
massive
data
files
from
our
sequencing
vendor
for
our
imaging
vendors
etc.
So
this
is
a
you
know,
raw
binary
files
that
we
need
to
associate
with
whatever
whatever
sample
they
came
from,
and
at
this
point
this
association
is
very
much
unstructured,
so
you
can
think
of
this
as
a
almost
like
an
object
database.
We
have
all
these
objects.
We
associate
the
metadata,
that's
completely
instruction,
it's
just
what
we
get
from
from
a
vendor.
F
They
can
they
get
summarized
down
to
a
very
small
spreadsheet,
of
maybe
a
few
hundred
k
that
contains
all
the
variants
that
are
in
in
a
specific
sample
right.
So
there
is
a.
There
is
a
a
step
here
where
the
data
is
taken
from
raw
into
a
set
of
features
that
are
much
more
distilled
down
in
size
and
once
again,
this
feature
will
be
things.
As
you
know,
what
is
the
abundance
of
a
specific
protein
in
the
blood
or
what
is
the
proportion
of
cells
of
a
given
type
in
the
blood?
F
Or
what
mutation
does
this
patient
have
etcetera,
etcetera
and
so
from
then
all
the
all
this
data,
all
these
all
this
feature
then
go
into
the
scandal
database,
which
is
a
highly
structured
database,
and
it's
really,
you
know
the
basis
that
we
use
for
doing
all
the
all
the
subsequent
data
science
work.
F
So
trying
to
answer
all
the
questions
I
was
talking
about
before
we,
we
have
all
the
data
in
the
candle
database
and
then
we
pull
the
data
out
of
the
database,
and
we
do
you,
know,
machine
learning
and
exploratory
statistics
on
this
data,
so
the
rest
of
this
presentation
and
even
though
we
are
using
closure
and
r
and
atomic
across
this
entire
infrastructure,
the
rest
of
this
presentation
is
really
going
to
be
mostly
focused
on
on
on
on
this
last
piece
here:
the
the
candle
database
and
how
we
built
it
and
how
we
use
it,
how
it's
organized
so,
as
I
said,
this
database
is
really
a
platform
for
biological
data
science
and
and
the
core
idea
is
to
really
break
down
the
silos
between
different
types
of
molecular
and
clinical
data.
F
So,
as
I
said
up
to
now,
we
get
a
very
broad
variety
of
data
sets
and-
and
we
want
them
to
have
them
all
in
a
single
place,
so
that
we
can
do
queries
and
interrogations
that
really
navigate
this
data
very
freely
and
and-
and
you
know,
without
impediment
in
silos,
so
breaking
down
periods.
Putting
everything
together
was
was
a
major
design
goal
of
this
project.
F
Another
very
important
thing
that
has
to
do
with
with
the
efficiency
with
which
our
team
can
work
and
work
on
the
data
was
really
enabling
the
whole
team
to
make
sure
that
they're
working
on
the
same
data.
So
this
is
like
a
data
version
problem
right.
We
have
five
different
data
scientists
that
are
working
on
a
specific
trial.
F
We
don't
want
all
of
them
to
have
copies
of
spreadsheets
on
their
computer
and
then
you're
never
sure
that
they're
really
working
on
the
actual
same
version
of
the
data
we
want
to
have
a
centralized
repository
where,
ultimately,
that's
probably
version,
and
so
everybody
accessory
can
be
100,
confident,
they're.
Looking
at
the
same
data,
the
other
thing
that
was
really
important
about
you
know.
That's
still
related
to
the
efficiency
of
the
team.
Is
this
idea
of
making
analysis
code
reusable
across
projects
right?
F
So
what
happens
typically
in
when
this
work
is
done
in
an
academic
environment
where
this
sort
of
infrastructure
does
not
exist?
Is
that
somebody
will
have
a
spreadsheet
on
it
on
his
laptop
for
a
given
project,
he
will
write
a
whole
script
that
does
a
whole
complicated
analysis
and
then
the
next
project
comes
along
and
that
now
the
files
are
made
completely
different.
F
Now
the
code
that
we
write
is
really
is
really
reusable,
because
if
I
you
know,
if
I
write
a
script
to
do
a
certain
horizon
or
given
a
given
study,
when
I
move
to
a
different
study,
I
know
that
the
data
is
always
in
the
same
shape,
because
the
shape
of
the
leader
is
dictated
by
the
data
model
of
the
of
the
of
the
database,
and
so
my
code
becomes
much
more
usable
across
projects
and
across
the
members,
and
you
know
last
but
not
least,
was
this
idea
which
really
leverages
atomic
for
those
of
you
that
are
familiar
with
it,
which
is
the
really
keep
a
history
of
the
data
to
make
sure
that
we
are
always
able
to
reproduce
our
results
right.
F
So
we
want
to
be
able
if
two
years
from
now
somebody
comes
along
and
says.
Oh,
I
want
to
reproduce
the
same
plot
that
you
produced
two
years
ago.
We
want
to
be
able
to
be
able
to
do
what
I
just
described.
That
really
requires
a
system
for
data
versioning
that
that
is,
that
is
very
granular,
because
otherwise
there
will
be
no
no
way
for
me
to
go
back
to
the
state
of
the
deal
two
years
ago
and
be
able
to
do
this
if
I
didn't
have
a
specific
system
for
it.
F
So
in
answering
all
of
this,
all
of
this-
let's
say
use
case
and
and
needs
our
approach
was
really
to
leverage
closure
and
atomic
unique.
You
know:
data
modeling
and
processing,
primitives
upstream
of
thing
that
standard
data
science
tools.
So
what
we're
doing
here
is
not
replacing
the
whole
data
science
stuck
with
closure
and
atomic.
F
It's
a
it's
leveraging,
closure
and
atomic
for
the
for
an
area
that
we
think
it's
very
well
suited
for
that
has
to
do
with
the
data
regularization
data
processing,
while
at
the
same
time,
building
a
bridge
for
for
the
standard
data
science
workflow,
which
in
our
field,
which
is
that
of
computational
biology,
means
working
in
r
okay.
F
So
we're
not
trying
to
use
quotient
atomic
for
everything
where
we're
we're
using
closure,
entertainment
for
what
we
think
is
really
good
at
and
then
we're
building
a
bridge
to
r
for
for,
for
all
these,
the
scenarios
where
r
is
actually
a
better
suited
environment
so
why
we
choose
the
atomic
specifically
is
for
a
number
of
reasons.
One,
and
I
think
one
of
the
most
important
one
is
that
the
schema
is
malleable
to
change.
So
the
the
problem
of
biology
is
that
it's
an
exceedingly
complicated
field
and
also
it's
continuously
evolving.
F
So
if
we,
if
you,
if
you
use
a
database
technology
which
is
very,
you
know,
inflexible
where
you
have
to
get
basically
the
schema
right
at
the
get-go
and
then
you're
stuck
with
it,
because
it's
very
hard
to
change,
you
are
really
in
for
some
trouble,
because
you're
never
ever
going
to
be
able
to
anticipate
what.
If
the
data
is
going
to
look
like
two
years
from
now,
let
alone
20
years
from
now
so
having
a
schema
that
was
malleable
to
change
was
extremely
important
and
one
of
the
main
reasons
why
we
choose
the
atomic.
F
Then there is this concept of the history of the data, and being able to access the whole history of the data, which is very important for the reproducibility goal that I mentioned in the previous slide. Another important one is performance, obviously: the performance that we get from the system is absolutely great for the kind of work that we're doing. And the last two are more technical aspects. One has to do with the expressiveness of the query language.
F
So
the
query
language
of
the
atomic
is
is
actually
it's
very,
very
simple,
but
but
very
expressive.
I
don't
have
time
to
go
into
it,
but
I,
if
you're
not
familiar
with
it,
I
will
I
will.
You
know,
suggest
all
you
look
into
it,
because
it's
it's
very
simple
and
very
elegant.
At
the
same
time
very
powerful-
and
it's
called
data
log-
it's
really
a
very
relative
of
prolog-
and
you
know.
F
Last
but
not
least,
the
economics
of
the
api
I
mean
working
with
the
atomic
api
is
really
is
really
nice,
and
you
know
that
was
a
huge
help
for
the
for
dev
velocity,
so
the
basics
of
the
atomic.
I
have
a
slide
here
just
to
give
you
at
the
basics
of
the
the
atomic
information
model,
for
those
of
you
that
are
not
familiar,
and
actually
the
previous
talk
was
the
perfect.
F
You
know
background
tool
of
this,
because
the
atomic
is
really
based
on
a
lot
of
the
same
concept
that
the
previous
figure
talked
about.
So
everything
in
atomic
is
is
modeled
as
a
as
datums,
which
are
really
doubles.
F
So
this
is
this
table
here,
represents
you
know,
a
collection
of
doubles
or
thetums
that
show
you
a
little
bit
of
the
structure.
So
we
have
an
entity
id
that
is
used
to
identify
entity
and
especially
to
identify
tuples
that
refer
to
the
same
entity.
F
Then
we
have
an
attribute
that
represents
what
we
are
saying
about
this
specific
entity
and
then
a
value
for
this
amp.
So,
for
instance,
the
first
tuple
tells
me
that
the
the
entity,
one
two
three-
is
a
subject
with
id
one,
two
three
five
x
and
then
the
same
entity
is
a
subject
that
has
disease
head
and
neck
cancer,
and
then
the
same
entity
has
been
subjected
to
a
therapy
called
789.
F
So
now
789
would
be
the
id
of
another
another
entry
in
the
system,
and
so
the
entity,
eight
nine
and
this
other
entity,
which
is
a
a
therapy
with
the
name
kituda,
okay.
So
the
way
the
way
that
is-
and
this
is
the
way
that
the
model
relationship
between
entities
and
atomic
by
having
by
having
attributes
that
represent
the
value
whose
value
represents
the
idea
of
another
entity.
And
so
you
can
create
a
you
know,
a
graph
essentially
of
all
the
different
entities.
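The entity/attribute/value structure can be mimicked with plain tuples; it is the attribute values that are themselves entity IDs which turn a flat set of datoms into a graph. A sketch of the idea in Python, using illustrative attribute names in the spirit of the slide (not Datomic's actual API):

```python
# Datom-like tuples: (entity-id, attribute, value).
datoms = [
    (123, "subject/id", "subj-123"),
    (123, "subject/disease", "head and neck cancer"),
    (123, "subject/therapy", 789),   # value is another entity's ID -> a graph edge
    (789, "therapy/name", "Keytruda"),
]

def entity(eid):
    """Collapse an entity's datoms into a single attribute map."""
    return {a: v for e, a, v in datoms if e == eid}

# Follow the reference from the subject entity to the therapy entity.
therapy_name = entity(entity(123)["subject/therapy"])["therapy/name"]
```

Chasing `subject/therapy` from entity 123 to entity 789 is the toy equivalent of navigating the entity graph the speaker describes.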
F
So
it's
important
to
note
here
in
you
know
the
the
attributes
come
from
a
schema,
so
the
attributes
cannot
be
anything.
The
attributes
have
to
be
defined
in
a
schema,
but
it's
very
easy.
If
you
want
to
add
another
attribute,
you
just
add
it
to
the
schema
and
you
can
use
it
from
now
on
without
having
to
modify
the
existing
data,
and
the
other
thing
that's
important
is
that
attribute
can
be
named
space
here,
as
you
can
see,
and
that
allows
you
to
sort
of
define
a
concept
a
little
bit
of
entity.
F
So
even
though
the
atomic
doesn't
know
doesn't
have
a
concept
of
entity
per
se,
so
there
isn't
such
thing
as
a
subject
entity
or
a
therapy
entity.
We
can
model
the
same
thing
using
attribute
namespaces,
essentially,
and
so
last
but
not
least,
as
I
was
saying,
the
the
time
is
actually
first
class
concept
in
the
tommy,
so
the
actual
table
in
the
database
looks
something
more
like
this,
which
is.
F
There
are
two
additional
fields
that
are
added,
which
represent
the
time
that
this
this
fact
was
inserted
in
the
database
and
also
whether
this
boolean,
that
whether
this
is
this
was
an
assertion
of
a
retraction
for
a
fact,
and
these
two,
these
two
two
different
things,
basically
allow
you
to
have
the
full
history
of
the
database.
Okay,
so
you
can
always
go
back
and
say
hey.
F
I
want
to
do
a
query
of
this
database
as
it
was
two
years
ago,
and
that
would
return
me
exactly
the
same
result
that
I
got
two
years
ago
and
that
because
of
the
because
of
the
way
that
time
is
treated
by
the
atomic
system,
so
we
we
get
all
of
this
history
for
free
by
using
the
timing.
Just
by
extending
these
two.
You
know
just
by
the
fact
that
the
atomic
extent
this
notion
of
rtf
apple
with
this
additional
time
and
then.
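That "history for free" behavior can be sketched by extending each fact with a transaction number and an added/retracted flag, and replaying the log up to a chosen point in time. This is just the idea, not Datomic's actual implementation; the facts are made up:

```python
# Five-tuples: (entity, attribute, value, tx, added?), ordered by tx.
log = [
    (123, "subject/disease", "melanoma", 1, True),
    (123, "subject/disease", "melanoma", 5, False),               # retraction
    (123, "subject/disease", "head and neck cancer", 5, True),    # new assertion
]

def as_of(tx):
    """Rebuild the set of current facts as they stood at transaction tx."""
    facts = set()
    for e, a, v, t, added in log:
        if t <= tx:
            (facts.add if added else facts.discard)((e, a, v))
    return facts

# The same query against the database "as of" an earlier tx gives the old answer.
old = as_of(1)
new = as_of(5)
```

Because nothing is ever overwritten in place, `as_of(1)` still returns the original fact years later, which is exactly the reproducibility property the speaker is after.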
F
Sorry,
okay,
so
most
of
the.
So
this
is
the
the
core
of
the
system.
So
most
of
the
of
the
of
our
work
as
a
as
as
developers
has
been,
you
know
how
to
like
how
to
facilitate
data
into
the
system
and
how
to
facilitate
using
data
and
so
getting
that
out
of
the
system
and
using
it.
So
in
the
in
the
next
few
slides
I'm
going
to
talk
a
little
bit
about
how
do
we?
The
system
will
be
to
for
getting
data
into
the
atomic
and
the
reason
why
we
built
a
specific
thing
here.
F
A
specific
system
or
specific
infrastructure
is
because
we
want
our
data
scientists
to
be
able
to
import
data
into
this
database
right.
So
our
data
scientists,
don't
know
closure,
don't
know
the
atomic,
don't
know
any
of
the
internals
of
the
database,
but
still
they
need
to
be
able
to
take
an
existing
dataset
and
put
into
the
system.
And
so,
in
order
to
to
facilitate
this,
we
developed
this
tool
that
called
prep
that
stands
for
programmable
etl
and
it's
really
an
a
configurable
etl
for
getting
data
into
the
atomic.
F
So what does the user need to do here? The only thing the user needs to do to import data into Datomic is write a configuration file — an EDN file — that specifies, essentially, how the columns and the files in the data set map to the attributes in the schema. So, for instance,
F
What
this
very
little
snippet
here
is
telling
me
is
that
the
you
know
the
barcode
column
matches
to
the
to
the
database
attribute
sample
id
and
that
the
participant
column
in
this
file
maps
maps
to
the
to
the
subject
attribute
in
the
in
in
the
schema
right.
So
this
is
basically
establishing
a
connection
between
the
column,
adders
and
the
attributes
in
the
schema.
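The effect of such a mapping file can be sketched as a dictionary from column headers to schema attributes, applied to each row of a flat file. The attribute names below are illustrative, not the real CANDEL schema:

```python
import csv
import io

# What the EDN config expresses: column header -> schema attribute.
mapping = {"barcode": "sample/id", "participant": "sample/subject"}

def rows_to_tx(csv_text):
    """Turn flat-file rows into attribute maps ready to transact."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{mapping[col]: val for col, val in row.items() if col in mapping}
            for row in reader]

tx_data = rows_to_tx("barcode,participant\nS-001,P-17\n")
```

The user only ever writes the `mapping` side of this; turning rows into transaction data is the tool's job.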
F
In
this
specific
example,
obviously
there
is
a
little
bit
more
than
that,
because
we
also
have
reference
between
the
different
files
etc.
But
this
this
should
be.
You
know
enough
to
provide
you
a
flavor
of
what
this
is
so
so
that
the
user
writes
a
configuration
file
doesn't
write
any
code.
F
They
just
write
this
configuration
file
and
then
the
tool
that
we
wrote-
prepped,
which
is
a
closure
command
line
tool,
takes
the
the
source
data,
the
configuration
file
and
the
knowledge
of
the
atomic
schema
and
metamodel,
which
is
some
additional
things
that
we
built
into
into
the
atomic
schema.
But
let's
say
the
atomic
schema
for
now,
so
it
takes
all
of
these
things
and
then
prepare
transaction
prepares
transaction
data
that
can
really
be
imported
into
the
input
database.
F
So
then,
in
a
subsequent
comment,
the
transact
command
print
will
take
all
of
this
transaction
data
and
put
it
into
a
database
so
taking
care
of
like
database
ml
transaction
with
rise
pop
up
all
of
that
stuff.
So
the
mechanic
of
putting
the
data
from
from
a
set
of
flat
files
into
a
database
and
then
the
last
step
is
that
performing
validation,
which
is
very
important
and
so
validation
includes,
like
validation
of
scalar
attributes.
F
Like
you
know,
a
percentage
can
only
be
a
positive
number,
for
instance,
then
referential
integrity,
so
making
sure
that
all
references
are
correct.
So
if
I
have
a
measurement
that
targets
a
specific
sample
with
the
temple
bar
code,
one
two
three,
then
I
need
in
my
sample
files.
I
need
to
have
decided.
I
need
to
have
defined
sample
one
two
three
for
this
reference
to
be
valid
and
also
stuff.
That
has
to
do
with
attribute
composition,
so
we
cut
which
combinations
of
attributes
are
valid
for
for
specific
entities.
F
So
this
step
is
really
important,
because
the
data
that
one
the
process
of
putting
the
data
into
the
database
really
does
a
ton
of
qc
and
standardization
of
the
data
which,
which
is
very
important
and,
for
instance,
as
part
of
this.
As
part
of
this
import
process,
we
standardize
a
lot
of
the
data
using
existing
ontologies.
F
So
you
know
there
are
ontologies
that
describe
the
name
of
different
genes,
the
name
of
different
proteins,
the
name
of
different
drugs,
etc,
etc,
and,
as
part
of
this
import
and
validation
process,
we
make
sure
that
everything
that
needs
to
be
validated
onto
ontology
has
been
actually
validated
on
the
relevant
quality
and
mapped
to
the
right
ontology.
F
I'm going to maybe skip this, because I don't want to go over time, and it's not really that important. Besides this system, we also built a separate system to do a branch-and-merge workflow for data, because, as I said before, we want users to be able to import their own data sets — but at the same time, we don't want users to just start dumping stuff into the production database.
F
And
so
we
built
a
system
whereby
user
can
request
a
copy
of
the
master
database
in
order
to
work
on
the
import
of
the
specific
data
set.
So
on
this
branch
data
database
they're
free
to
mess
it
up
or
it
doesn't
matter
if
they
import
broken
data.
Or
you
know
in
the
process
of
iterating
over
this
dataset
they're
going
for
program
data
multiple
times.
It
doesn't
really
matter
because
it's
happening
in
a
copy
of
the
database
and
then
there's
a
system
whereby
we
can
put.
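The branch-and-merge idea can be pictured as copying the master fact set, letting the import iterate destructively on the copy, and only folding it back once it validates. A toy model, not Candelabra's actual mechanics:

```python
import copy

master = {"facts": [("subj-1", "subject/disease", "melanoma")]}

def branch(db):
    """Give the importer a private copy of the master database."""
    return copy.deepcopy(db)

def merge(db, br, valid):
    """Fold a branch back into master only if its import validated."""
    if not valid:
        return db   # broken imports never touch production
    db["facts"].extend(f for f in br["facts"] if f not in db["facts"])
    return db

b = branch(master)
# Safe to mess up here: only the copy is touched.
b["facts"].append(("subj-2", "subject/disease", "lung cancer"))
master = merge(master, b, valid=True)
```

The deduplicating `merge` also means that re-importing the same rows several times while iterating does no harm, which is the point of branching in the first place.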
F
So we like to think of this as, as I said, a branch-and-merge workflow for data. So then, the data is all in there — and now, we're building all of this for data scientists to use, so we have to meet them where they live. And the data scientists, at least in our team — and, I would say, in the majority of computational biology — live in R.
F
So
we
accept
queries
obviously
over
over
over
the
wire
and
the
query.
Sources
can
be
either
our
library,
which
is
what
what
our
user
use.
So
our
user
use
issue
query
through
on
our
library
or
also
a
visual
query,
building
environment
that
I
think
has
been
it's
called
enflame
and
I
think,
has
been
presented
by
mike
travis
from
our
team
in
a
previous
meetup
and
then
the
the
the
data
is
really.
F
These
queries
are
serialized
to
json,
and
so
we
have
a
data.json
parser
on
the
other
end
that
accept
the
query
and
transform
it
from
json
into
something
that
the
atomic
can
understand
and
then
also
a
you
know,
a
system
to
automatically
improve
queries.
I'm
not
I'm
not
really
going
to
talk
about
these
two
aspects
today,
because
first
of
all
we'll
talk
about
it
before,
but
also
I
want
to
focus
on
on
the
r
functionality,
which
is
more
important
for
the
data
science
part.
F
So a Clojure query will look this way in R. As you can see, it's basically a very simple substitution of certain syntactic elements, but it basically looks the same way. This slide shows the transformation that we had to do going from the Clojure syntax to the R syntax, but it's really a one-to-one mapping, and I'm going to explain in a second why we had to do it.
F
So
basically,
you
can
write
a
query
like
this
in
r
looking
very
similar
to
closure
and
then
what
you
get
in
r
is
a
native
r
object
that
you
can
use
with
all
the
functions
that
exist
in
r,
so
you
want
to
use
r
for
plotting.
You
use
you
issue
a
query
like
this.
You
get
back
in
your
our
session.
You
get
back
a
native
r
object
and
then
you
use
it
for
downstream
plotting,
as
you
would
with
any
other
native
r
object.
F
But
the
other
interesting
thing
is
that
data
logging
queries
in
our
data
exactly
the
same
way
that
their
enclosure.
This
really
enables
composition
and
programmatic
programmatic
composition
of
queries
right.
So
this
is
this
is
something
that
allows
us
to
to.
You
know
as
developers
to
really
build
our
queries
that
are
that
are
built
programmably,
so,
for
instance,
to
give
an
example.
Here
we
have,
we
have
a
situation
where
we
have.
F
We
have
an
existing
query
which
we're
calling
q
here
and-
and
we
want
to
add
a
bunch
of
different
clauses
to
this
query
based
on
a
given
parameter
in
input
right,
and
so
we
have.
You
have
this
function
here
c
query
that
allows
to
take
an
existing
query
and
add
additional
clauses
to
it,
and
these
clauses
are
selected
based
on
some
other
logic,
and
so
this
is
very
useful.
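Because a query is a plain data structure rather than a string, adding clauses is just ordinary list manipulation. A sketch of that idea in Python — the query shape loosely mimics Datalog's find/where structure, and the clause contents are made up:

```python
def add_clauses(query, clauses):
    """Return a new query with extra where-clauses appended."""
    return {**query, "where": query["where"] + clauses}

q = {"find": ["?subject"],
     "where": [["?subject", "subject/disease", "head and neck cancer"]]}

# Clauses are chosen by ordinary program logic, then appended as data --
# no string interpolation anywhere.
responders_only = True
extra = [["?subject", "subject/response", "responder"]] if responders_only else []
q2 = add_clauses(q, extra)
```

Note that `add_clauses` returns a new query and leaves the original untouched, so partially built queries can be shared and specialized freely.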
F
You
know
it's
very
useful,
because
a
very
useful
consequence
of
the
fact
that
our
queries
are
data
structure,
the
same
way
that
they
are
in
closure,
it
would
be
something
would
be
much
more
complicated
to
do
if
queries
were
instead
strings
and
you
have
to
do
a
ton
of
string
interpolation
right.
We
can
instead
manipulate
queries
as
data,
so
this
dsl
really
takes
advantage
of
earth
lisp
origin.
So
I
don't
know
how
familiar
you
guys
are.
F
with R, that is. But R was really a Lisp initially, and the DSL takes advantage of the fact that symbols and expressions can be captured before evaluation and manipulated. One limitation is that you're still constrained by the fact that the expressions need to be valid R syntax, and that's the reason why we have to do some transformation of the syntax from Clojure to R: you still need to end up with valid R syntax.
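The "queries as data" idea being described can be sketched outside of R as well. Here is a minimal Python sketch of the pattern; the query shape and the helper name are illustrative, not CANDEL's actual API:

```python
# Sketch of "queries as data": a Datalog-style query held as a plain
# data structure and extended programmatically, instead of assembling
# strings. Keys and clause shapes here are illustrative only.

def add_clauses(query, clauses):
    """Return a new query with extra where-clauses appended."""
    extended = dict(query)
    extended["where"] = query["where"] + clauses
    return extended

base_query = {
    "find": ["?sample"],
    "where": [["?sample", "sample/subject", "?subject"]],
}

# Clauses chosen by some other logic, e.g. a user-supplied filter.
timepoint_filter = [["?sample", "sample/timepoint", "baseline"]]

extended_query = add_clauses(base_query, timepoint_filter)
# base_query is untouched; extended_query carries both clauses.
```

Because the query is an ordinary data structure, no string interpolation or escaping is involved, which is exactly the property the speaker highlights.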
F
The pull syntax that you have in Datomic is also available. The same way you can do a pull query in Datomic, you can do a pull query with our R library, with this syntax transformation.
F
We have this pret package that, as I was saying, is the programmable ETL pipeline written in Clojure, and an additional system called candelabra that contains all the machinery for doing the branch-and-merge workflow, as I said before. So this is the way the data gets into the system. And then, for getting data out of the system,
F
They
just
say
you
know,
give
me
all
the
g.
We
have
a
function,
a
pre-baked
function,
a
prepaid
query
in
our
library.
That
says
give
me
all
the
gene
expression
measurements
for
this
data
set.
So
the
user
actually
just
called
that,
and
so
it's
exposed
to
them
as
an
r
library.
They
get
the
data
back
and
then
they
do
whatever
analysis
they
want,
and
on
top
of
that,
we've
also
built
a
lot
of
we're,
starting
to
be
a
lot
of
tools
for
doing
interactive
data
exploration
and
visualization
right.
F
So
we
you
can
build
dashboards
and
we
just
that
talk
to
the
database
and
so
allow
somebody
that
doesn't
doesn't
know
r
but
knows
how
to
click
around
to
you
know,
generate
plots
and
visualization
from
the
same
system.
Using
the
same.
F
My last slide is acknowledgements. I really want to acknowledge my colleagues from our team and Ben Campos, who have been absolute partners in building the system up and absolutely essential contributors, as well as, in particular, George Kirsten from the Cognitect team, who has also been a huge help in implementing this. And with that, I'm only six minutes over time, so I'll take questions. Thank you for listening.
A
I think some of the questions kind of answered themselves, but Jordan is asking whether it was straightforward for the users living and working in R to learn this Datomic DSL.
F
If you're talking about the query language: yes, it wasn't too complicated. But I must say that the reality is that we've already built so much functionality that our users very rarely go down to the level of writing custom queries. A lot of the time, what I'm showing you here is
F
What
is
like
the
nuts
and
bolts
of
how
this
thing
works,
but
the
reality
is
that
most
of
the
time,
users
just
just
use
the
functionality
they
were
built
on
top
already,
but
yeah
I
mean
have
we
had
users
picking
this
up,
I
would
say
that
the
the
the
the
probably
what
what
what
users
have
had
to
more
learning
to
do
was
was
was
learning
how
to
use
spread.
But
you
know,
I
think
this
is
still
it's
still.
It's
still
less
than
I
would
have
have
to
learn.
F
If
we
had
to
learn
how
to
you,
do
closure
and
atomic
to
do
transactions
and
and
everything,
so
you
know
one.
One
thing
that
I
like
about
this
system
is
that
we
have
had,
and
we've
literally
had
high
schoolers
join
our
team
and
in
a
couple
of
weeks
as
inter
during
the
summer
in
a
couple
of
weeks,
they
were
able
to
use
the
system
to
import
data,
because
the
only
thing
that
you
really
need
to
understand
here
is
the
data
and
maps
to
the
schema.
You
don't
need
to
understand
the
atomics
internal.
A
Well,
yeah,
that's
yeah,
that's
really
cool!
To
hear
I
mean
if
you
can
get
to
that
level
of
quick
activity
and
productivity,
even
with
high
school
kids,
and
really
speaks
to
both
the
quality
of
the
atomic
and
then
the
the
interface
with
r.
This
is
really
interesting.
F
Yeah,
so
yes,
I
can.
I
can
certainly
explain
that
so
so
you
know,
if
you,
if
you
do
a
little
bit
of
google
searching,
you
will
see,
you
will
see
people
decrying
the
crisis
of
reproducibility
in
science
right
so
in
general,
in
science.
There
is
this
this
this
problem
in
biological
sciences,
especially
of
getting
reproducible
results
so
being
able
to
you
know
I
do
an
analysis,
then
I
give
you.
F
But
what
we
wanted
here
was
to
at
least
try
the
stuff,
that's
under
control,
so
the
stuff
that's
solvable
in
software
to
solve
it
as
much
as
possible
right
and
so
obviously,
when
you
have
a
piece
of
the
analysis
in
order
to
have
a
reproducible
result,
you
need
to
have
you
know
the
code
need
to
be
version
obviously,
but
we
have
kit
for
that.
So
that's
that's
very
easy,
then,
to
reprodu
the
environment
in
which
the
code
is
run
needs
to
be
version,
and
you
can
use
docker
for
that.
F
So that's where our system comes in. The reason why you want to do this is that this data is exceedingly complicated and errors are made all the time in the analysis, so it's very important, both for auditing and for being able to answer questions from other people in the field, that you are always able to at least get the same result.
F
Maybe
it
was
the
wrong
result
in
the
first
place,
but
at
least
at
least
you
can
you
can
get
you
can
get
the
same
result.
So
you
know
it
happens
all
the
time
that
you
know.
Maybe
somebody
reads
your
paper
and
that
will
see
a
result
and
and
won't
be
able
to
to
you
know,
will
download
your
data.
F
I
won't
be
able
to
reproduce
the
same
thing
right,
so
we
want
to
at
least
be
in
the
position
where
we
can
guarantee
that
whatever
went
into
the
publication,
whatever
was
in
the
figure,
we
can
reproduce
it
exactly
now.
Obviously
this
doesn't
guarantee
that
that
result
is
correct
because
there
could
be
still
bugs
in
the
code
or
whatever
the
analysis
code
could
be
incorrect,
but
having
it
be,
reproducible
is
the
first
step,
at
least
to
you,
know,
to
being
able
to
start
from
a
common
understanding
of
where
things
are.
A
Cool
yeah.
Thank
you
very
much.
There
was
also
a
question
by
jesus
regarding
this
branch
and
merge
system,
because
every
time
you
have
this
type
of
system,
you
can
have
conflicts.
When
you
try
to
merge
everything
together.
Could
you
comment
on
that.
F
Yeah,
so
the
the
reason
why
we
built
this
system
was
really
the
fact
that
when
you
do
this,
when
you
do
it,
when
you
spread
to
do
this
import,
obviously
you
know
you
import
the
data
the
first
time,
and
you
realize
that
you
know
you
made
a
mistake
on
your
configuration
file
or
the
data
is
wrong
in
some
way.
So
you
need
to
fix
it
before
you
import
it
right.
So
the
idea
it
was
it
was
impossible.
F
That doesn't mean that there isn't still, and I put it here in this diagram, an admin approval process. When there is a new data set coming in, an admin is essentially looking at it and making sure that the user has done the right thing, and then it gets merged.
F
So
this
is
actually
a
very
complicated
project
that
we
just
recently
finished,
and
in
order
for
this
to
be
possible,
we
really
had
to
to
make
sure
that
you
know.
There's
lots
that
goes
into
making
this
possible,
but
like
one
important
thing,
is
that
every
single
entity
in
the
database
needs
to
have
a
unique
domain
based
identifier.
So
not
not
just
a
db
identified,
but
we
have
domain
based
identifier,
so
that
you
can
able
to
say.
F
The
new
version
of
the
data
set
right.
So
actually
this
this
differ.
This
d
thing
of
existing
data
is
a
much
more
complicated
problem
than
just
the
idea
of
importing
a
whole
new
data
set
in
production
which
is
really
about.
You
know
an
admin
approving
and
then
triggering
a
series
of
processes
in
the
in
in
the
cloud
that
basically
take
all
the
data
importing
the
production.
This
is
semantic
is
much
more.
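As a loose illustration of why stable, domain-based identifiers matter for diffing: entities can be matched across dataset versions by a domain key rather than a database-internal ID. This is a hypothetical sketch, not CANDEL's actual diff machinery; the `sample/...` keys are made up:

```python
# Sketch: diffing two versions of a dataset keyed by stable,
# domain-based identifiers. With only DB-internal IDs, the same
# real-world entity could not be matched across imports.

def diff_by_domain_id(old, new):
    """Compare two {domain_id: attributes} maps; classify entities."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys()
               if old[k] != new[k]}
    return added, removed, changed

v1 = {"sample/S1": {"timepoint": "baseline"},
      "sample/S2": {"timepoint": "week-4"}}
v2 = {"sample/S1": {"timepoint": "baseline"},
      "sample/S2": {"timepoint": "week-8"},
      "sample/S3": {"timepoint": "baseline"}}

added, removed, changed = diff_by_domain_id(v1, v2)
```

Here `sample/S2` shows up as changed and `sample/S3` as added, which is the kind of per-entity decision an updated import has to make before an admin approves the merge.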
A
Very interesting. There is also a question, I guess, that both you and Sivaram can talk about. There was some discussion regarding the use of ontologies: we saw initially how Sivaram explained them, and it seems that in CANDEL the ontologies end up being more like a graph, or flatter, and there was some discussion that there are challenges in using hierarchical ontologies. I think this discussion is interesting.
F
We very much strive to use flat vocabularies, and the main reason for this is that, for our specific use case, this process of standardization, of mapping to ontologies, has to be done by data scientists who don't have the patience or the willingness to learn all the intricacies of these ontologies. For instance, I'll make a very simple example in terms of disease type. Obviously we want the disease type to be a standardized controlled vocabulary, so lung cancer should always be lung cancer.
F
Okay,
so
that's
that's
the
control
vocabulary
aspect
of
it,
but
at
the
same
time,
some
of
the
ontologies
that
are
using
the
biomedical
field
that
really
elaborate
in
terms
of
placing
this
lung
cancer
concept
in
a
very
complicated
tree,
as
civilaram
was
was,
was
explained
in
a
very
complicated
trees
and
taxonomy
of
classification
of
cancer.
F
We
decided
not
to
use
that
and
to
prefer
another
ontology,
that's
a
bit
rough,
more
rough
and
less
sophisticated,
but
that's
the
properties
of
being
essentially
a
flat
control
vocabulary,
because
we
really
you
know
in
order
to
sometimes
in
order
to
be
able
to
do
this
map
precisely
you
need
to
really
be
a
specialist
in
these
ontologies
and
and
our
scientists.
Our
data
scientists
are
definitely
not
special,
that
they're
not
gonna,
I'm
not
gonna,
be
kinda
anytime
soon,.
B
Yeah,
so
these
I
I
agree
with
what
freddy
korres
said.
You
know
using
the
hierarchy.
B
The
taxonomy
aspect
of
the
ontology
is
a
fairly
advanced
use
of
the
ontology
for
the
for
the
most
part,
the
most
common
use
is
to
flatten
the
the
the
structure
of
the
ontology
and
use
it
as
as
lists
for
multiple
ways
you
know
which
so
which
moves
more
towards
what
you
are
saying:
critical
as
control
vocabularies,
and
but
it
it
doesn't
mean
that
you
it's
a
wrong
way.
It's
an
incorrect
way
of
using
an
ontology.
B
It
is
one
way
in
which
an
ontology
can
be
used
right,
the
same
the
same
labels
and
the
same
words
and
terms
that
you're
using
that
you're
getting
from
the
ontology
once
you
have
mapped
them
into
your
data,
and
you
from
you
can
still
make
use
of
the
structure.
The
ontology
structure,
the
the
both
the
hierarchy,
as
well
as
the
other
relationships
that
actually
make
it
turn
it
into
a
rich
web
of.
You
know
a
rich
graph.
B
That's
you
can
use
that
for
querying
and
for
reasoning
and
for
for
for
doing
you
know
a
lot
of
for
for
generating
new
information
that
was
previously
not
or
or
not
asserted
in
the
data
itself
right.
So
a
simple,
a
simple
example
of
something
like
that
would
be.
You
know
in
your
data
set
if
you
had,
if
you
had
a
cancer
that
was
labeled
as
oral
cancer
right
and
you
have
other
cancers
which
are
labeled
as
a
nasal
cancer
or
cancer
of
the
error
and
another
one
like
ear
cancer.
B
But
if
you
are
going
to
now
start
querying
hey,
I
want
to
find
out
all
patients
with
head
and
neck
cancers
right.
The
the
word
head
and
neck
doesn't
doesn't
appear
in
any
one
of
these.
You
know
data
points
that
you've
just
stored
right,
one
says
oral
cancer,
another
says
nasal
cancer.
Another
says
year.
B
So
this
is
where
the
power
of
an
ontology
comes
in,
where,
even
though
your
data
is
at
the
level
of
the
more
granular
terms,
you
can
pull
back
up
and
do
a
subsumption
query
where
you
can
query
for
all
the
subclasses
of
what
a
head
and
neck
cancer
is,
and
that's
where
you
take
the
taxonomy
the
hierarchy
structure
and
do
that
query
and
and
bring
back
all
the
results
that
match
oral
cancer
and
all
these,
because
these
are
all
these
all
of
them
fall
under
head
and
neck
cancer.
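A subsumption query of the kind described can be sketched in a few lines of Python. The taxonomy below is a toy illustration, not a real ontology:

```python
# Sketch of a subsumption query over a tiny is-a hierarchy:
# find every record whose term falls under a given ancestor class.

TAXONOMY = {  # child -> parent ("is-a")
    "oral cancer": "head and neck cancer",
    "nasal cancer": "head and neck cancer",
    "ear cancer": "head and neck cancer",
    "head and neck cancer": "cancer",
}

def subsumed_by(term, ancestor):
    """True if `term` is `ancestor` or a transitive subclass of it."""
    while term is not None:
        if term == ancestor:
            return True
        term = TAXONOMY.get(term)  # climb one is-a link
    return False

records = ["oral cancer", "nasal cancer", "lung cancer", "ear cancer"]
matches = [r for r in records if subsumed_by(r, "head and neck cancer")]
# matches: the oral, nasal, and ear cancers; "head and neck" never
# appears in the records themselves, only in the hierarchy.
```

Real ontology reasoners handle multiple parents, equivalences, and much larger graphs, but the core "walk up the hierarchy" idea is the same.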
B
That's one way of doing it. It sounds very simple, but I think, as Federico probably knows, it's not the simplest of things to do.
B
Yeah
and-
and
this
is
something
that
you
can
do
in
a
triple
store
like
virtuoso
or
blaze,
graph
or
elegrograph,
which
are
which
are
pure
rdf,
triple
straws
semantic
ventricular
stores.
The
atomic
does
not
support
this
kind
of
reasoning
to
one
of
the
areas
that
I
have
worked
with
in
the
previously
is:
how
do
you
represent
this
kind
of
ontological
structure
into
an
atomic
schema
and
be
able
to
develop
custom
algorithms?
B
So
that
you
can
do
this
kind
of
reasoning
in
india
comic,
because
if
you
look
at,
if
you
I
think
I
didn't
mention
it
and
I
think
frederick
also
didn't
mention
it.
The
sparkle
language
is
actually
very,
very
close
to
data
log.
The
and
the
rdf
triple
structure.
Triple
structure
is
very
similar
to
the
eav
data
structure
in
in
the
atomic
and
and
when
I
met
rich
hickey.
B
I
asked
him
about
this
because
I've
been
I'm
coming
from
a
semantic
web
background
and
when
I
first
encountered
clojure
and
then
comic
and
met
rich
hickey
at
one
of
the
closure
conferences.
I
asked
him
about
it
and
he
said
yes,
closure
and
atomic
borrows
a
lot
from
the
semantic
web
space,
and
so
you
can
see
a
lot
of
that.
The
thought
process
that
went
into
developing
atomic
you
know.
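The parallel drawn here between RDF triples and Datomic's EAV facts can be made concrete with a toy sketch; all identifiers below are made up for illustration:

```python
# Sketch of the analogy: an RDF triple is (subject, predicate, object);
# a Datomic-style datom is (entity, attribute, value) plus a
# transaction. Both are flat facts, which is why Datalog-style
# pattern matching applies naturally to both. Toy identifiers only.

rdf_triple = ("ex:patient-42", "ex:hasDiagnosis", "ex:OralCancer")
eav_datom = (42, ":patient/diagnosis", "oral-cancer", 1001)

subject, predicate, obj = rdf_triple
entity, attribute, value, tx = eav_datom

# A trivial "pattern match" over a fact base, in the spirit of both
# SPARQL and Datalog: find all values for a given attribute.
facts = [eav_datom, (42, ":patient/name", "A. Smith", 1001)]
diagnoses = [v for e, a, v, _ in facts if a == ":patient/diagnosis"]
```

The extra transaction component is what gives Datomic its time dimension, one of the differences from a plain RDF store.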
C
I think this was in reaction to Ben's comment that he could sort of infer the relationships from the facts, and I was a little skeptical. I suppose if the set of facts is large enough, or diverse enough, you could do that, to cover the semantic field that you're trying to extract.
C
But if you had, say, a very narrow set of relationships, I'm not sure whether that would be sufficient to really get back to a real ontology.
B
So
you
know
in
a
very
small
space
if
it's
a
small
problem
that
you're
dealing
with,
I
would
definitely
say
that
you
know
if
you're,
using
an
ontology.
That
would
be
an
overkill.
It's
like
saying,
you
know
if
you
want
to
put
a
if
you
want
to.
If
you
want
to
dig
a
small
hole
to
to
plant
a
small
planter
right,
you
don't
want
to
get
a
big
tent
on
truck
digger
to
dig
that
hole
right,
so
you
have
to
use
the
right
tools
for
the
right
purposes.
B
So if you have a very small, well-defined space, then writing your own relations, how you want to find relations within them or what you want to infer from them, is probably going to be just as easy as, or maybe easier than, trying to use an ontology, or for that matter something like Datomic. It's a non-trivial process.
C
So
the
trade-offs
on
your
on
your
controlled
vocabularies
are
what
perhaps
you
could
speak
a
little
more,
I'm
assuming
something
like
you
can't
easily
identify
hierarchies
hierarchical
things
like
classes
and
super
classes
as
easily.
If
you
just
have
a
a
list
of
controlled
items,
yes,
I
mean
what
did
we
talk
about?
I
didn't
quite
follow
that.
Well,
so
you
said
you're
using
controlled
vocabularies,
which
are
really
just
lists.
C
D
B
Yeah
yeah,
so
the
the
thing
about
control
vocabulary
is
is:
is
that
you
can
you
can
pretty
much
put
anything
in
there
right?
It's
not
it's,
not
a
pure
taxonomic
structure.
Now
what
what
does?
What
does
the
hierarchy
mean
in
terms
of
taxonomic
structure
or
in
term
of
terms
of
an
ontology
right?
So
when
you
say
a
is
a
subclass
of
b?
What
does
what
does
that
mean?
So,
for
example,
you
say
a
dog
is
a
subclass
of
animal.
The
the
class
dog
right
is
a
subclass
of
the
class
animal.
B
That
what
it,
what
it
means
is
that
every
instance
of
a
dog
is
an
instance
of
an
animal,
and
there
is
no
instance
of
a
dog
that
is
not
an
instance
of
there's
no
instance
of
dog.
That
is
not
an
instance
of
animal.
It
has
got
that
that
kind
of
connotation
to
it,
formalism
associated
with
it.
So
when
when,
when
you
descri,
when
you
create,
when
you
create
an
ontology,
when
you
create
a
hierarchy
in
an
ontology,
this
is
the
kind
of
discipline
that
you
are
following.
B
But
when
you
look
at
control,
vocabularies
and
I'll,
give
you
an
example
from
icd-9
codes.
Right.
Icd-9
code
is
the
diagnostic
codes
that
are
used
in
medicine
and,
if
you
look
at,
if
you
go
to
icd-9
and
hypertension
area,
right,
you'll
see
hypertension,
which
is
a
disease,
but
you'll
also
see
something
like
hypertension
with
heart
disease,
hypertension
with
a
chronic
kidney
disease.
B
Okay.
So
if
you
think
of
this,
as
from
a
hierarchical
point
of
view,
you
are
seeing
you're
saying
that
hypertension
with
kidney
disease,
so
there
are
two
diseases
there.
Hypertension
and
kidney
disease
is
a
hypertension
and
that
doesn't
make
any
sense.
A
Kidney disease is not hypertension, yeah.
C
Why
I'm
sorry
you're
going
across
two
axises
right?
I
mean
you're,
saying
it's
a
it's
a
certain
disease,
but
it
also
is
related
to
another
disease.
What
does
that
sort
of
imply
that
you,
you
would
want
to
move
to
a
more
general
graph
structure
where
you
could
represent
more
relationships
than
simple
yeah.
B
Yeah
so
so
is
a
the
is.
A
relationship
provides
the
hierarchical
structure
right
and
along
with
that,
if
you,
if
you,
if
you
were
to
the
example
that
I
gave
right,
it
was
about
from
pain,
pain,
says,
chest
pain
is
a
type
of
pain
and
then
it
has
sight
the
chest
so
that
provides
the
other
kinds
of
relationships
with
goes
towards
enriching
this
graph
and
that's
what
you're
missing
out
when
you
use
just
a
control
vocabulary,
you
don't
have
any
of
those
things.
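A toy sketch of this point: an ontology graph carries relations beyond is-a (such as a has-site relation), which a flat term list cannot answer queries about. The terms and relation names below are illustrative:

```python
# Sketch: an ontology as a graph of labeled edges. Besides "is-a",
# other relations (here "has-site") enrich the graph, so queries like
# "where is chest pain located?" become possible. A flat controlled
# vocabulary is just the list of term strings and cannot answer this.

EDGES = [
    ("chest pain", "is-a", "pain"),
    ("chest pain", "has-site", "chest"),
    ("abdominal pain", "is-a", "pain"),
    ("abdominal pain", "has-site", "abdomen"),
]

def objects(subject, relation):
    """All objects reachable from `subject` via `relation`."""
    return [o for s, r, o in EDGES if s == subject and r == relation]

site = objects("chest pain", "has-site")
# the is-a edges still give the taxonomy; has-site gives the extra
# relationship a flat vocabulary would lose.
```

This is the "rich web" of relationships mentioned above, reduced to its simplest possible form.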
A
All right, thank you very much to both of you. Since we still have some time until the end of the meeting, and we gathered a list of topics and some questions during the RSVP process, I would like to just open the floor and see whether someone has any further questions. They may be data-science specific; someone asked, for example, about the state of data science tech in Clojure.
H
There is a pending question for Sivaram: in his architecture, how does he use Clojure?
B
Okay, so I don't use Clojure for most of my work. I use Clojure more on the personal side, for most of my own research projects, my own skunkworks kind of thing, and that's where I've used it all the way from building data science pipelines to even web applications, as well as for some NLP and, more recently, struggling with some machine learning areas. On the side of semantic web and ontologies,
B
I have done some work around using ontologies and ontology schemas and representing them in Datomic, and in Crux. I don't know if you've heard of Crux; Crux is another database similar to Datomic, so that's another one that I have used, to represent ontologies as a schema and to be able to leverage the hierarchies and other relationships within them.
B
With NLP, I have used OpenNLP for the main part; it seems to have pretty decent Clojure support. Other ones that I have come across are Stanford NLP and the like, but I haven't used those. I know that there are other ones where you can come across examples on the net, but I've really struggled to make them work, both for NLP as well as for machine learning.
F
In our team, we're all using R. But one could imagine, obviously, the same way that we built a system to use R, building a system to use Python. Even though I'm not as familiar with Python, I don't think the same level of symbolic manipulation is possible there, so the syntax might be a little bit more string-based. But no, in our case we all use R.
G
Yeah, hi, I'm Lacey. I also work at the Parker Institute and work on CANDEL; I've been chatting with some of you. I'll also say that a large reason we use R for all of our data science and analytics is the extensive library of computational biology tools that exist in R. Anytime a new paper comes out with a new method in computational biology,
G
It's
always
written
up
first,
as
an
r
package,
so
being
an
r
for
us
is
pretty
critical
to
staying
able
to
use
the
most
current
methods
in
the
field.
So
that's
why
we're
in
r.
F
Yeah, I mean, they end up published in scientific publications, in journals, the same way that they would be if you weren't using CANDEL. To all intents and purposes, this is an infrastructure detail that will maybe be mentioned in the publication,
F
Maybe,
but
it's
not
really,
you
know
most
of
the
time
it's
not
really
relevant
to
to
to
this
kind
of
paper
that
really
focus
on
the
results.
H
That's like the same question. And the other question is: I'm curious about your use case, Sivaram, and the two timelines that they provide.
H
Yes,
so
krogs
has
like
a
two
axis
for
time.
Okay,
right!
That's
what
I
understand
this
is
the
atomic
is
time
based
cross
is
time
based,
but
they
provide
like
a
second
second
line
of
time,
and
I
was
wondering
how
do
you?
How
would
somebody
take
advantage
of
that
and
if,
for
some
reason,
you're
using
that
or
you're
using
it
as
an
atomic.
B
I'm
using
it
more
closer
to
what
the
atomic
functionality
is
providing.
I
haven't
really
explored
a
lot
about
the
the
time.
Travel
aspect
that
crux
is
is
you
know,
is
featuring
yeah
yeah,
I'm.
H
Playing
a
bit
with
crocs,
but
really
in
the
same
place
right
just
using
it.
I
H
F
You know, in our field there is such a mountain of analytical routines already written in R that the idea of providing them in Clojure is, for us, a non-starter, really, because R is the standard in the field. So the approach that we have taken really has to do with using Clojure for all the data processing, and a lot of it is done in an abstract way. For instance, the tools that we built are all schema-agnostic: pret doesn't know anything about the Datomic schema.
F
The
atomic
schema
is
read
as
input,
so
a
lot
of
a
lot
of
these
computation
happens
at
a
pretty
high
high
level
from
a
from
a
semantic
point
of
view,
because
it
doesn't
actually
know
the
semantics
of
the
schema
interprets
the
semantics
of
the
schema,
and
so
you
know
all
for
all
of
this
stuff
closure
has
been
fantastic.
I
don't
even
know
how
we
would
have
written
this
in
in
any
other
language,
but
you
know,
and
then
so,
the
the
the
the
the
way
we're
using
did
is,
though,
is
really
confined
to
a
specific.
F
You
know,
problem
domain,
for
which
we
think
it's
really
well
suited,
and
then
the
the
you
know
the
then
we're
using
you
know
r
for
for
all
the
analysis,
because
yeah
once
again,
we
we
wouldn't
be
our
users
are
all
in
our
we're,
not
gonna
they're,
not
gonna,
learn
to
use
closure.
I
mean
that's,
that's
just
that's
just
the
reality.
F
But
I
will
also
say
that
so
what
one
thing
is:
that's
really
interesting.
So
what
we're
doing
is
not
it's
different
from
from
what
daniel
was
mentioned,
so
we're
not
trying
to
either
you
use
like
embed
our
enclosure
or
bad
closure
into
our
obviously
right.
So
the
system
is,
is
closure,
enclosure
and
atomic
is
a
web
server
and
you
could
query
the
web
server.
However,
you
wanted,
we
happen
to
be
querying
in
our
because
we
use
r,
but
I
guess
what
did
what
daniel
was
talking
about
and
what
maybe
other
people
here
are
hinting
about.
F
A
Unfortunately, we are at the end of our time today. I hope everyone learned something new; at least I know I did. I definitely encourage everyone to keep the discussions going, either by contacting the speakers, joining the Clojurians Zulip channel, or getting in touch with me, Daniel, or anyone in the Scicloj community. We are always eager to have more people talking about data science and Clojure, and to see what you guys are doing out there.
H
Yeah, no, I was just mentioning, and I didn't want to take the meeting's time for that: I'm really happy, because I've been able to gather a group of people interested in learning Clojure. We created this community called Clojure Hispanic, and we just recently revived it; it existed last year, but in the last month we revived the group, and it is really, really active.
H
Now
we
are
doing
inspired
by
you
with
this
idea
of
study
groups.
We
use
propose
it
and
we
are
doing.
We
are
reading
books
now
about
closure,
and
basically
I'm
here
like
spying
and
taking
notes
about
how
to
manage
study
groups,
but
we
are
like
26
people
and
all
of
us
have
been
doing
this
all
the
weeks
doing
the
the
classes
and
that,
I
think,
is
a
good
signer.
H
Well,
that's
it
so
we
are
people
from
spain,
obviously
and
all
the
americas,
I'm
from
from
venezuela
and
in
canada,
but
I'm
from
venezuela.
We
have
people
from
nicaragua,
peru,
argentina
and
spain
right
now,
and
it's
good
to
find
people
speaking
my
language,
but
I
can
do
a
better
level
in
communications
for
sharing
closure,
knowledge
and
ideas.
H
Well, I manage a group called Python Venezuela. It has more content; we are more mature there, with the page and everything, and many more people than the other group. I'm a manager and founder, and we have a foundation in Venezuela for that. Half the people are from many places, not only Venezuela; I don't know why, but it's good. But I'm all in now on Clojure: I help with the group every Friday, where I do something called a consulting hour.
H
I
spent
one
hour
in
gtc,
one
video
conference
and
people
freely
come
in
and
ask
questions
right,
and
I
just
try
to
help
or
try
to
answer
the
questions
about
python.
When
I
get
some
or
that's
an
idea
right.
If
so,
if
I
get
some
expertise
in
closure,
I
expect
to
do
the
same
thing
for
closures
on
daily.
It's
one
one
hour
a
week.
I
just
put
the
link
in
the
group
and
no
no
scheduling
I'm
just
there
waiting
for
them.
Then
randomly
people
come
in
and
ask
me
something
about
item
and
we
try.
H
They are the best example, I think the right example, of a community in a local language. They did the whole translation into Spanish of all the Python documentation, and everything is in Spanish, and it is wonderful. When we created Python Venezuela, we tried to mimic Python Argentina, but we didn't need to do the translation; everything was there, so I put links to Python Argentina whenever people need to read documentation in Spanish.
H
Anything
like
that
enclosure
not
going
near
so
that
would
be
the
most
important
I
think
initiative
to
just
have
a
translated
content.
I
I
I
know
by
first
hand
how
different
it
is
to
reading
you
know
in
spanish
and
in
english.
It's
her
to
heaven
right
so
yeah.
That
would
be
a
good
goal
if
we
collect
enough
energy,
I
suppose.
A
Yeah, cool. Then again, thank you very much, everyone; this was awesome. For those that are starting the day, have a nice day; for those that are finishing it, have a nice night. Bye, everyone. Thank you. Bye-bye.