From YouTube: CI WG demo: ImPACT, a toolkit for working with protected data in a multi-institutional environment
Description
ImPACT (Infrastructure for Privacy-Assured CompuTations) is an NSF funded BIGDATA project in its 3rd year of execution, bringing together experts from UNC and Duke to address some of the hard problems encountered by researchers, data providers and institutions when working on protected data. While focusing primarily on supporting analysis of PII data in social sciences as the primary use domain, tools developed for ImPACT are broadly applicable to other domains and types of data. This presentation describes what the project has been able to accomplish so far.
Date: 2/7/20
Presenter: Laura Christopherson
Institution: RENCI
South Big Data Hub
A
Why don't we get started. As usual, we'll start by going around the virtual room so that people can briefly introduce themselves: just say your name and where you're from. We'll do a little bit of news from around the hubs and other related news, and then we have Laura Christopherson here from RENCI to talk about ImPACT, which is a tool they're developing there. So why don't I start by giving people a chance to say who they are and where they're from, and I'll just go around.
C
A
B
A
B
Actually, I had a quick favor that I thought I had asked earlier. To help keep track of everybody that attends our presentations, it would be really helpful if you could rename yourself in the participants list on Zoom. You just open the participants list and hover over your name, and if you could add in your first and last name, that would do me a big favor.
A
A
A
F
B
A
B
G
B
C
G
A
So, just a couple of things from news around the hubs. Christine Kirkpatrick, who can't be here today from the West Hub, asked us to mention that the West Hub is seeking community input through an open-to-all online survey, which I will put in the chat window. So anyone who's okay responding to that, it would be much appreciated, and the first 500 respondents get swag, so that's an incentive. Oops, wrong link, but I'll get there.
A
F
Since the executive director is here, John, maybe I'll kick it over to you, all right, so you'd have it actually from the source himself? John is new, I should just say, maybe by way of introduction, so that he doesn't have to do this himself. I think this may be the first time that John is joining
F
this call as the newly appointed executive director of the Midwest Big Data Hub. I think that's true, and that's certainly welcome news, that we have new leadership and that he's doing a great job in terms of getting all the various activities that the Midwest Big Data Hub is doing organized and together. So with that introduction, John, maybe you want to jump in and give any updates specifically on what the Midwest Big Data Hub is up to.
D
Sure, yeah, thanks, Jim, I appreciate that introduction. For those of you who don't know me, I've been with the Midwest Hub for the past couple of years, working with Melissa when she was there as the ED. Just a couple of things to highlight for you that are coming up: we will have our All Hubs Summit at the end of May, so hold the date for that. It's going to be the 26th through the 28th of May, hosted at Ohio State University through the TDAI folks there.
D
G
Sorry, I was muted. Yeah, we have nothing in particular to report. We're looking forward to the all-hubs effort, and one thing we are doing is kicking off the discussion, I think with all the hubs, about the marketing activity, marketing and communication. So I'm going to be interviewed for that to see how we can help with that. If any of you have specific thoughts on that, feel free to send me an email, or if you don't know my email I can stick it in the chat.
A
A
E
No, I think Christine passed that on. Let's see, we've got a number of FAIR-related activities going on through the SDSC office. I can't give you updates on the West Hub as a whole. I think that the survey is probably the most important general announcement there, and otherwise, from the SDSC office, we'll be leading a number of FAIR-related activities that will be part of our West Hub grant for this year and next year. So that's probably it. Okay.
A
H
A
H
A
E
E
E
There are three different communities interested in this data set, and so the data are now available. It's about 10 terabytes or so of data. They're now available through the OSN, serving both the infrared imaging community, which is a fairly small community, and also clinical and computational biomedical researchers in the cancer space. And now that data set, because it's highly annotated, is also available for work by people directly in the machine learning space.
E
So that's just one example there. We also, at Illinois, have loaded up, so that it's available through an S3 API through the Clowder interface, one of the Critical Zone Observatory datasets. That's a hydrology data set, and that's available. The front-end interface to that will be Clowder. We have a fairly significant chunk of data from one of the big NASA instruments that's now available. I won't speak too much about that, but I can post some information and links to that.
E
There was a big presentation at AGU on that data, but that's about a hundred and fifty terabytes of data from this NASA instrument. It's integrated from several different instruments on one satellite. We've also now got the data from BCO-DMO, which is oceans data. The BCO-DMO group needed a place to be able to test their architecture for object store for their entire data collection.
E
So they've got a bucket now on OSN where they can begin to do testing of their new architecture related to both I/O and calls into the data set for queries, so they're testing that there. And then the Northeast pod, which John oversees, also has some ocean physics data coming from other NASA instruments, and the front end on that will be an iRODS front end. So we're testing a whole number of user interfaces and I/O mechanisms, and that's just a sampling there.
E
We've got another three or four data use cases that are going to be loaded up in the near term. So that's a handful of examples there of different disciplinary and project-based data sets that are now available for both public use and internal testing, so that they can then be opened up for public use in the not-too-distant future.
A
I think Alex Szalay, who's the PI, often points out that the Open Storage Network is 90% social experiment and 10% technology experiment. It's been kind of fascinating and fun to see engineers and data scientists work across five different sites to bring up a fairly complex piece of technology, and it's a lot of fun to see it start to work and start to deliver.
E
John, I might just add one more comment, because there are folks on the call that might actually get asked this question, because we're getting it regularly, and that is: what's the status of OSN? What are the next steps? Do you proceed moving forward? Because this is a pilot, our active funding ends in June.
E
We are anticipating putting in a no-cost extension to wrap up some work that we can do within the constraints of our current budget, but we are also in the process of starting to put together what would be a proposal to scale up the OSN. So we're thinking about various ways that we'll take what we're doing now, think about some bridge work that needs to happen, and then be able to put in a proposal to scale the OSN out.
E
I can't answer the question about what the size of that scale-out will be right now. We've got a couple of different pathways that we're considering, so whether, for example, we move to 30 petabytes or 50 petabytes, those will take different kinds of efforts to be able to pick one of those pathways. And I guess the other piece I would add: so yes, we're moving forward, we're continuing to grow, we're planning on the next-phase proposal.
E
That's one of the storage nodes at an institution, so requesting pod funding to be able to link into the network, or to purchase storage that would come into a standing pod. So there are proposals out and going into NSF now to continue to add to the network and grow the collaborative network as well.
A
A
A
G
What are some of the challenges? What are some of the practices you have on what you're doing? What helped? What do you wish you still had? So we're hoping we can get the survey ready to send out in March, so I could ask Jim Bosnia if he thinks it would be interesting to talk about it on the March 6th call. Go ahead, okay.
A
Let's move on then. Laura Christopherson is presenting on behalf of Ilya Baldin on the ImPACT project, Infrastructure for Privacy-Assured CompuTations. I am personally taken by this project, in that there are so many security and sensitive-data-oriented projects that focus on protecting the data, and so few that work on making it straightforward to protect the data and manage access. So this is, at least to me, a pretty fun talk, and I will not attempt to steal any thunder whatsoever. I'll just hand the mic over to Laura.
C
I just want to thank you all for having me, and I'll give Ilya's apologies for not being able to be here. I spoke with him to see how he wanted me to present this information to you, and I asked Mary Ann to send out those links earlier. I don't know if any of you had a chance to look at them, but Ilya wants me to have you guys look at the video that we've created. It's an animation. It's an entirely non-technical introduction to ImPACT, so I will show you the video.
I
And challenges largely around complying with various policies and requirements on data access and use. This is most especially the case for research projects using sensitive data: data, for example, that may contain personally identifiable information, or PII, and that must be specially guarded to protect the identities of the people that are described in the data. Many researchers work in teams across institutions and disciplines, which may add another layer of complexity to the research process.
I
For example, a research team may have to comply with data use policies and technological constraints on the protection of that data that differ between institutions and disciplines. ImPACT, Infrastructure for Privacy-Assured CompuTations, is a project that aims to simplify these complexities and ease the research process, so researchers can be free to focus more fully on their research. Specifically, ImPACT's goal is to support collaborative multi-institutional analysis of data, ensuring its protection and the satisfaction of policies and agreements governing its use. To explain ImPACT better, let's introduce some of the key players.
H
J
Hi, I'm a researcher, and for my next project I've been wanting to work with a particular set of data that is sensitive and has some strict policies around its use. I'd like to get access to it without delay, I don't want to violate data provider policies, and I will need a secure place to do my analysis to avoid the risk of data leakage.
I
am.
B
H
I
C
C
H
I
With ImPACT, you can keep Data safe and secure on your servers, but create a listing for her in Dataverse, a data repository. The listing would show all the metadata, or data about data, describing Data, but none of her actual details. Just download and install TRSA, the Trusted Remote Storage Agent. It will index Data, harvesting any metadata, and create that listing for you on Dataverse. All the while, Data will be safe and sound with you, data provider.
I.
H
B
Hi, data provider. I am the newest service in ImPACT. I work with you to store all your policies and keep an audit trail of what all the different players in the story need to do to satisfy their agreements. For example, if you want a policy that states researchers shouldn't share Data with others, you can register that with me, and when researchers promise to adhere to that policy, I will note it in my log with the date and a signature. Consider me your digital.
H
B
I
J
Remember me? I had my eye on a data set that I'd like to use in my next study. Today I was browsing through Dataverse and Data caught my eye. She's perfect for my study. Ordinarily, it takes quite a bit of time to obtain permission to use a data set because of all the back-and-forth through email and phone calls and all the paperwork, but within.
B
Releasing her to you, you can formalize your agreement to those policies with me. Wait a moment, please. I have a part to play in this as well. I need to make sure that the researcher submits a proposal to my office outlining her intentions with Data. Data contains some sensitive information, and I need to be part of ensuring she is adequately protected.
B
J
H
H
It's a remote desktop presented to you on your computer through a browser window. It's invisible to others on the internet. Access is restricted to you and only those collaborators that have been approved through the notary service, and login is encrypted. Any analysis software I install in the enclave will have been rigorously tested to ensure it doesn't make the enclave vulnerable to attack. Wow.
J
H
B
H
J
C
H
I
A
C
So this is sort of a diagram, a different way to view what was discussed in the video, although the diagram includes some components that are not covered in the video. So, for example, the different actors would be the data provider, the infrastructure provider, anything that you might want to call institutional governance or IRB, as well as the researcher, the most important one.
C
So the idea is that a data provider can have data that they own, that they keep on their server, that maybe they don't want to really release out to the world just yet, but they want to list it on Dataverse, down here. They want to do that where they have the listing but are able to keep their data on their own servers under their own protection. And so the component here, TRSA, a Trusted Remote Storage Agent, was developed in conjunction with Dataverse and the Odum Institute at UNC.
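As a rough illustration of the idea described here (publish a listing, keep the data at home), the following is a minimal sketch and not TRSA's actual implementation. The `Listing` fields and the `harvest_metadata` function are hypothetical names chosen for this example; the sketch assumes the listing carries only a file name, size, and checksum, and never the file contents.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """Hypothetical metadata record published to a repository; holds no file contents."""
    name: str
    size_bytes: int
    sha256: str


def harvest_metadata(root: str) -> list[Listing]:
    """Walk a local data directory and describe each file without copying it anywhere."""
    listings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            listings.append(
                Listing(
                    name=os.path.relpath(path, root),
                    size_bytes=os.path.getsize(path),
                    sha256=digest,
                )
            )
    return sorted(listings, key=lambda item: item.name)


def demo() -> list[Listing]:
    """Create a throwaway data file and harvest its metadata."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "survey.csv"), "wb") as handle:
            handle.write(b"id,answer\n1,yes\n")
        return harvest_metadata(root)
```

Calling `demo()` returns a single `Listing` for `survey.csv`; the records it produces are what would be shared for discovery, while the file itself stays on the data provider's disk.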
C
Basically, what Notary Service is, is it's like an auditor, or, you know, a notary service, where it basically keeps track of everything that's going on in this larger system. So, for example, the data provider may have policies about how people can use his or her data. For instance, they may say: okay,
C
Well, I'm going to give it to you un-de-identified; you're going to have to de-identify it in order to use it. I need assurances that you're going to keep it on a locked-down server and that you're not going to share it with people that are not on your project. So the data provider will have that kind of information in human-readable policies, stuff that we all understand when we just take a look at it.
C
Well, the data provider will then need to put those policies in more of a machine-readable form in Notary Service, and there are more intricacies with that that we can of course discuss offline if you're more interested in that. And then SAFE is this logic engine that sits behind Notary Service, and what it does is it helps manage the way that Notary Service governs the approval of, or attestation to, the various policies that the data provider may set up. Then Presidio, and the reason a wall was chosen is because that's exactly what it is.
C
It's like a gatekeeper, and it ultimately determines whether or not a person can have access to the data provider's data. So, theoretically, a data provider would then have Notary Service, SAFE, and Presidio installed on his own servers, and they would work together to protect the data and only allow its use by other folks, such as the researcher, once the researcher has attested, or agreed, that they are going to follow through on various policies: yes, I will de-identify the data.
C
Other actors in the system, such as an infrastructure provider or the IRB, can also play a role in this. So, for instance, what was first discussed was this idea of an enclave, a secure enclave, and that is another component of ImPACT, which is separate. It was originally developed at Duke University, and it has this groovy Pro console front end, which allows it to show you essentially a VM in a browser window. And that's the enclave, where you can go and have that private data downloaded into here.
C
It's secured, and only you and whoever you've listed in Notary Service as being able to have access to the data, that the data provider said is OK by way of Notary Service, only those folks can gain access to the enclave. They can go in there and use different types of secured analysis software that you might request. So, for instance, in the animation the researcher said: oh, I'd really like to use MATLAB and, I think it was, R.
C
So if you wanted that, part of the ImPACT process is that those kinds of analysis software would be securely packaged and then made available in the infrastructure, in this enclave, for you to then use with the data to conduct your research. The whole point of this is to keep any data from exfiltrating in ways you did not want. And then, of course, the infrastructure provider might be the one that then says: oh, OK, I created this enclave to specification.
C
Whatever those specifications were might have been laid out by the data provider, such as: it needs to be password-protected, it needs logins to be encrypted, you've got to use this, that, or the other, and so forth and so on. So the infrastructure provider can go into Notary Service and say, OK, for this project (and this can be like an IT person at your university, or it could even be a cloud service provider, somebody like Amazon): OK, I went in and I created
C
this enclave, and I've made it according to the specifications of the data provider as set down in the policies that are in Notary Service. The IRB can then say: okay, I've gone into Notary Service and I see that I need to look at the IRB proposal, which is basically a set of policies that the researcher will have agreed to. So those can be listed in Notary Service, and they can say: yes, I've reviewed this, I've exempted this person, let's say, and so they may move forward with their research.
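The gatekeeping described across these turns, where data is released only once every actor has attested to the policies that apply to them, could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not how Notary Service, SAFE, or Presidio are actually built: the `Project` class, the role names, and the policy strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical tracker of which policies each role must attest to before release."""
    required: dict[str, set[str]]
    attested: dict[str, set[str]] = field(default_factory=dict)

    def attest(self, role: str, policy: str) -> None:
        """Record that a role has agreed to one of the policies asked of it."""
        if policy not in self.required.get(role, set()):
            raise ValueError(f"{role!r} is not asked to attest to {policy!r}")
        self.attested.setdefault(role, set()).add(policy)

    def outstanding(self) -> dict[str, set[str]]:
        """Return the policies each role has not yet attested to."""
        missing = {}
        for role, policies in self.required.items():
            left = policies - self.attested.get(role, set())
            if left:
                missing[role] = left
        return missing

    def access_allowed(self) -> bool:
        """The gate opens only when nothing is outstanding for any role."""
        return not self.outstanding()


def demo() -> tuple[bool, bool]:
    """Show the gate staying closed until the last actor (here the IRB) attests."""
    project = Project(required={
        "researcher": {"de-identify data", "no sharing outside project"},
        "infrastructure provider": {"enclave built to specification"},
        "irb": {"proposal reviewed"},
    })
    project.attest("researcher", "de-identify data")
    project.attest("researcher", "no sharing outside project")
    project.attest("infrastructure provider", "enclave built to specification")
    before_irb = project.access_allowed()
    project.attest("irb", "proposal reviewed")
    return before_irb, project.access_allowed()
```

In this sketch `demo()` reports that access is refused while the IRB attestation is missing and granted once it is recorded. A real deployment would express the policies in the machine-readable form mentioned in the talk rather than as plain strings.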
C
C
C
Then there is this piece, which, as I said, these three go together: Notary Service, SAFE, and Presidio. Their goal is to help negotiate and log data-use-agreement-type policies and automate access to protected data that is stored on the data provider's own servers. And the last piece is the Dataverse TRSA piece, because TRSA of course is accessible via Dataverse, so what this does is allow you to have a nice secure way to list your data and offer it up for discovery. So I'm going to stop there and see.
C
And actually I'm glad you asked that, because one of the reasons why, I think, Shannon had asked if we would be interested in talking to you all is because we would really love to get some early adopters who will give us feedback, who will use it and implement it and say, you know, I don't like this piece, but this piece works real well for us. We'd like to see it put into practice and see how folks would use it.
C
F
Laura, this is Jim [inaudible]. Thanks for your presentation. I was just trying to get a sense, as a follow-up to that answer: are people using it in sort of beta use? Given the sensitivity around the data, it wasn't really clear to me whether this is in production or whether it's just in sort of proof-of-concept stage.
C
That's a wonderful question, and although we do not use the term proof of concept, we use a corollary. What we consider it to be in now is the minimal viable product stage. It's like proof of concept. Basically, what that means is we've gotten it to a point where, yeah, I guess you could call it a beta, I guess we could use that as a synonym, where it's a working, functioning entity that can be adopted and used.
C
The folks that are using it now are testing it out more so than actually using any real sensitive data as of yet. So, for example, we have a faculty member at UNC who's a math professor, and he also works with a sociology professor over at Duke, and they are currently using it, but they are using simulated data in it right now because we're getting them to. They have been testing Pradhan,
C
the actual enclave piece. And we've also got Trusted CI, I think I mentioned that earlier, and they're using it, but they're using it from the standpoint of trying to poke holes in it and make sure it's what we want it to be in terms of security. So there are folks using it, but for different reasons, and we'd love to have other folks. We're talking to Research Computing at UNC about them possibly offering it for use, and John Crabtree.
C
F
C
An interesting question. So, as far as audience is concerned, the grant was actually written to focus on social scientists and their needs, and that would include also, for example, we've also been giving demos and talking to and gathering requirements from people at the business school, for example, or in the School of Journalism. So in the business school, the folks there may not necessarily have sensitive data per se, but they may have commercial data, data that has commercial value, and so they don't really want it to be leaked out.
C
C
We need to know how to improve it, and we want to make sure that the way we spend the remaining grant money is so that we are attending to whatever the needs of those folks are that we haven't so far met within the system as it currently is in its minimal viable product stage. Did that answer your question?
F
C
C
It pretty much is open source, because, well, actually, I cannot say for certain if John Crabtree would consider TRSA open source, but the code for Presidio, SAFE, and Notary Service is something that ultimately would have to be given over to the data provider, if the data provider wanted to use it. Pro console is, again, one of the things that we've done is we've included some rather detailed instructions on how to implement each of these components.
C
So, for example, Don Sizemore, who works with John Crabtree over at the Odum Institute, is the one who's sort of in charge of creating some of these enclaves for UNC folks, and so basically what he's done is create a lot of helpful information about, you know, these are the different components you're going to need to add and here's how you're going to have to use it. So it is already open source.
B
H
E
Laura, this is Melissa Cragin. Thanks for your presentation. I watched the video the other day, and so I'm intrigued by the system here as a whole, but also by a couple of the components. I'm wondering, have there been conversations about the roles and responsibilities for checking compliance, and whether this fits or not into the tool or the service? You know, so who's responsible for actually, you know, the Notary Service holds the data use [inaudible] user signature.
E
C
That's a very good question, and that is one of the things that makes this rather complex, because the goal is to make something like Notary Service that can then be useful and streamline the process and make it more automated for all of the different actors. And so what that would ultimately involve is we need to get folks who were on all, you know.
C
We would need to get a test use case kind of scenario where we had actors from all of these to work with the system and to let us know if they think it's feasible, for example, for their larger institutions. So, as I said, we were trying to work with Research Computing over at ITS, Information Technology Services, at UNC, but so far what we're looking at is more this side of things. Mm-hmm.
H
C
Basically it's a log of who has agreed to what throughout this research approval and data use process, and it would be good if we could see how all that would interact. And I think I've lost the flow of what I was going to say in answer to your question, so tell me what I haven't answered. No.
A
Hello, this is John Goodhue. One of the challenges with sensitive data is liability, and I'm guessing you've thought through kind of the liability exposures for someone who's operating either a service like this in total or one of the components. I'd be curious to hear how you're thinking about it.
C
That's also a good question, very similar. I was also thinking about that when I was trying to answer the earlier question. So what Notary Service does is basically keep a log or a record of all the agreements made, you know, and the date and the time and who agreed to what. But it doesn't actually provide a means for, let's say, the data provider to then find out if the researcher has actually de-identified the data, to use the example I was using before. So it's based on the trust system.
C
Basically, the researcher agrees that he will de-identify the data, and he makes that agreement in Notary Service, and then the data provider releases it based on trusting that the researcher will actually do what the researcher says in Notary Service. The advantage of Notary Service, because, you know, that seems like, well, we're in the same boat we were before, the advantage of Notary Service is that it is a digital record. You now have a record that says the researcher agreed to this.
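To make the "digital record" point concrete, here is a small sketch of a log that stores who agreed to what and when, and nothing more. It is an illustration only, not Notary Service's real data model; `Attestation` and `AttestationLog` are hypothetical names. Note that, as described above, such a log can show that a promise was made but cannot show whether it was kept.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Attestation:
    """One hypothetical log entry: who agreed to which policy, and when."""
    who: str
    policy: str
    when: datetime


class AttestationLog:
    """Append-only record of agreements; it does not verify compliance."""

    def __init__(self) -> None:
        self._entries: list[Attestation] = []

    def record(self, who: str, policy: str,
               when: Optional[datetime] = None) -> Attestation:
        """Store an agreement, stamping it with the current UTC time by default."""
        entry = Attestation(who, policy, when or datetime.now(timezone.utc))
        self._entries.append(entry)
        return entry

    def agreed_to(self, who: str) -> list[str]:
        """List the policies a given party has agreed to, in the order recorded."""
        return [entry.policy for entry in self._entries if entry.who == who]
```

A data provider could later call `agreed_to("researcher")` to retrieve the record of promises; any consequence for breaking them remains an institutional policy matter outside the log.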
C
C
But yeah, if you wanted to add more of a stronger consequence sort of aspect to it, more at the organizational level, the institution level, that would probably need to be a policy decision among the institution's policy makers on how you would want to enforce anything. Did that answer your question?
A
Let's see, I'll answer that by saying we are close to the top of the hour. I'll stop there, and I was wondering, Marian, whether you might just say a little bit about the first. Thank you. I was looking forward to this, and it's just a remarkable thing that you're doing, so I'll leave you with that. And Marian, could you spend just a minute talking about.
C
Let Marian know if you are interested in either having a more detailed, more technical demo that will go through all the different components and explain what you would need to set them up and try them out, and let her also know if you're interested in trying any of this. We would love to talk with any of you more about how we can make this useful for you. That's all I have.