From YouTube: CI WG demo: QueryArrow
Description
Date: 02/03/17
Presenter: Hao Xu
Institution: Renaissance Computing Institute (RENCI)
South Big Data Hub
A
So it looks like we have a good quorum on the phone here, so we can get started. Welcome, everyone, to the South Big Data Hub Data Sharing and Infrastructure Working Group.
This is hopefully going to be one of the last in a series of demos we've been doing since last fall, at the recommendation of Reagan Moore, to showcase core components that would make up the cyberinfrastructure support for the hub, including data federation.
So with that, because we have wonderful speakers set up today, I'll introduce the first one.
I hope I'm pronouncing this correctly. Hao Xu is a research scientist at the Data Intensive Cyber Environments (DICE) Center at the University of North Carolina at Chapel Hill.
He has been working on improving the rule engine, the rule language, and the metadata catalog of the Integrated Rule-Oriented Data System, known as iRODS, since 2010. He developed the pluggable rule engine architecture that allows interoperability between different programming languages and the iRODS data management system.
His research interests include the theory of data management, automated theorem proving, programming languages, distributed data systems, and formal methods in software development.
He has a Bachelor of Engineering in computer science and engineering and a Bachelor of Science with a minor in applied math from Beihang University, and a PhD in computer science from the University of North Carolina at Chapel Hill. I'm going to ask this of all my speakers.
B
Thank you. So I'm going to start my presentation. Today I'm going to talk about QueryArrow. QueryArrow is software that I've been developing for, I think, about a year now.
The goal is to create software that allows us to do bidirectional integration of multiple metadata sources. I understand there's a lot of technical jargon here, but hopefully I'll explain it during the presentation.
So hopefully, by the end of the presentation, everybody will get a basic idea of what QueryArrow does and how it could potentially help your projects. So, acknowledgments, and some QueryArrow resources.
Because this presentation is supposed to be a very high-level one, if some folks in the audience would like more technical details on what makes QueryArrow unique compared to similar projects, there's a technical report.
If I have metadata outside iRODS, say I store this metadata in a graph database, how hard is it, or am I even able, to integrate my metadata stored outside iRODS with metadata stored inside iRODS and present a unified view of both?
Before QueryArrow, the answer was that it was possible, but very ad hoc. It was not easy to do, and it was also very error-prone, which means it might work for one case but not necessarily for another.
QueryArrow tries to solve that problem. The second one is policies for metadata access control, that is, access control for your data. People have all kinds of existing access control mechanisms, like Grouper and things like that. If they want to use those, how do you integrate them into your system? Next is discovery.
Basically, this is when you have a huge collection of data and some sort of metadata tags on that data. How do you efficiently search for your data objects based on the metadata tags? That's another thing QueryArrow helps solve.
Finally, last but not least, metadata migration. This is a big one, because when you think about a metadata catalog, you don't think about metadata as something you only need right now.
You may need this metadata ten years in the future, when the underlying technology has changed. So how do you migrate from one technology to another without changing the semantics of the metadata?
That's another question. QueryArrow provides you with semantic unification: not just a syntactically unified API, but also a unified meaning for what the API does. The semantics are also universal.
This way you have more of a guarantee that when you migrate your metadata into a new database technology, its meaning, that is, the semantics of your metadata, does not change.
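To make the migration point concrete, here is a minimal Python sketch under stated assumptions: metadata is modelled as (object, attribute, value) triples, and `MetadataStore`, `DictStore`, `SqliteStore` and `migrate` are hypothetical names, not QueryArrow's API or its query language. It only shows that when the meaning of a query is fixed once, on the interface, two very different backends can be checked to give the same answers after a migration.

```python
import sqlite3
from abc import ABC, abstractmethod

# Hypothetical schema: one metadata triple is (object id, attribute, value).
Triple = tuple[str, str, str]


class MetadataStore(ABC):
    """Hypothetical unified interface. Its meaning is defined on sets of
    triples, independent of whichever backend implements it."""

    @abstractmethod
    def insert(self, t: Triple) -> None: ...

    @abstractmethod
    def query(self, attr: str) -> set[Triple]: ...

    @abstractmethod
    def all(self) -> set[Triple]: ...


class DictStore(MetadataStore):
    """Backend 1: an in-memory set (set semantics: no duplicate triples)."""

    def __init__(self) -> None:
        self._rows: set[Triple] = set()

    def insert(self, t: Triple) -> None:
        self._rows.add(t)

    def query(self, attr: str) -> set[Triple]:
        return {t for t in self._rows if t[1] == attr}

    def all(self) -> set[Triple]:
        return set(self._rows)


class SqliteStore(MetadataStore):
    """Backend 2: SQLite. UNIQUE plus INSERT OR IGNORE reproduces the same
    set semantics as DictStore, which is what keeps the meaning stable."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE meta (obj TEXT, attr TEXT, val TEXT, UNIQUE (obj, attr, val))"
        )

    def insert(self, t: Triple) -> None:
        self._db.execute("INSERT OR IGNORE INTO meta VALUES (?, ?, ?)", t)

    def query(self, attr: str) -> set[Triple]:
        rows = self._db.execute("SELECT obj, attr, val FROM meta WHERE attr = ?", (attr,))
        return set(rows)

    def all(self) -> set[Triple]:
        return set(self._db.execute("SELECT obj, attr, val FROM meta"))


def migrate(src: MetadataStore, dst: MetadataStore) -> None:
    """Copy every triple; the shared interface is the only contract used."""
    for t in src.all():
        dst.insert(t)


old = DictStore()
old.insert(("obj1", "project", "south-hub"))
old.insert(("obj1", "project", "south-hub"))  # duplicate: dropped by set semantics
old.insert(("obj2", "format", "csv"))

new = SqliteStore()
migrate(old, new)
print(new.query("project") == old.query("project"))  # True
```

The detail worth noticing is the `UNIQUE` constraint with `INSERT OR IGNORE`: without it, SQLite would keep duplicate rows that the in-memory set silently drops, and the two backends would no longer mean the same thing.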
Okay, here are the elements of QueryArrow. There's the QueryArrow service, which is basically a service that runs on a server. It's like a web server or something, but it's not really a web server.
It's a system service that runs on a machine. What it does is allow you to connect your different data stores, and it also allows you to execute QAL. QAL is short for QueryArrow Language. It's basically a semantically unified language, which is also formally specified. QueryArrow gives you three things. One is configuration: it gives you a way to configure the QueryArrow service.
Then there's the component that allows you to go out to different data stores or databases. I'll show a diagram of how these come together on the next slide.
The other important component is an iRODS plugin, which allows iRODS to use QueryArrow as its metadata catalog, so that through QueryArrow, iRODS can get metadata from multiple sources.
This answers one of the motivating applications I talked about earlier, which is whether you can integrate a graph database, for example, into your iRODS. Then there's the QueryArrow specification.
This is also important: it basically gives you semantic unification instead of just an API. It's like a language. You have spoken languages: maybe you have German and English, and they both have...
On top there's the QueryArrow database plugin, which allows iRODS to interface with QueryArrow. On the bottom there are three QEPs, QueryArrow plugins that allow QueryArrow to go to PostgreSQL, to Neo4j, which is a graph database, and to a search engine. So basically we're providing a unified view: from iRODS's point of view, QueryArrow is just one database, so to speak. You can think of it that way, but underneath QueryArrow there may be multiple different data sources. This greatly simplifies the development of a client, because a client doesn't have to deal with multiple data sources. Of course, you can also use QueryArrow as a standalone service.
If you have something else that wants to query QueryArrow directly, you can do that.
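The unified view described above can be sketched as a small mediator. This is a hypothetical Python illustration, not QueryArrow's plugin interface: `TableSource` and `GraphSource` merely stand in for a relational and a graph backend, and `Federated` fans one query out to all of them and merges the results, so the client only ever talks to a single object.

```python
from typing import Protocol

Triple = tuple[str, str, str]  # (object id, attribute, value), a hypothetical schema


class Source(Protocol):
    def query(self, attr: str) -> set[Triple]: ...


class TableSource:
    """Stands in for a relational backend: rows of triples."""

    def __init__(self, rows: list[Triple]) -> None:
        self.rows = rows

    def query(self, attr: str) -> set[Triple]:
        return {r for r in self.rows if r[1] == attr}


class GraphSource:
    """Stands in for a graph backend: edges object -[attribute]-> value."""

    def __init__(self) -> None:
        self.edges: dict[str, list[tuple[str, str]]] = {}

    def add_edge(self, obj: str, attr: str, value: str) -> None:
        self.edges.setdefault(obj, []).append((attr, value))

    def query(self, attr: str) -> set[Triple]:
        return {(o, a, v) for o, out in self.edges.items() for a, v in out if a == attr}


class Federated:
    """What the client sees: one store whose answers are the union of all sources."""

    def __init__(self, *sources: Source) -> None:
        self.sources = sources

    def query(self, attr: str) -> set[Triple]:
        result: set[Triple] = set()
        for source in self.sources:
            result |= source.query(attr)
        return result


catalog = TableSource([("obj1", "owner", "alice"), ("obj1", "format", "csv")])
graph = GraphSource()
graph.add_edge("obj2", "owner", "bob")

view = Federated(catalog, graph)
print(sorted(view.query("owner")))  # [('obj1', 'owner', 'alice'), ('obj2', 'owner', 'bob')]
```

Because `Federated` exposes the same `query` method as its sources, it can itself be nested inside another `Federated`, which is one plausible reading of how a client "doesn't have to deal with multiple data sources".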
Currently there are 15 plugins. Notably, there's policy support, which is the translation QEP. Then there are the connectors: there's Neo4j, PostgreSQL, SQLite, [inaudible], and, added recently, one for file systems, which allows you to basically get all the metadata out of a file system and query it as if it were a metadata store.
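The file-system connector idea, exposing ordinary file metadata as something you can query, can be approximated in a few lines of standard-library Python. The functions `fs_metadata` and `select` and the attribute names `size`, `mtime` and `ext` are illustrative assumptions; the actual QueryArrow file system plugin may expose different attributes.

```python
import os
import tempfile


def fs_metadata(root: str) -> list[tuple[str, str, object]]:
    """Walk a directory tree and emit (path, attribute, value) triples."""
    triples: list[tuple[str, str, object]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            triples.append((path, "size", st.st_size))
            triples.append((path, "mtime", st.st_mtime))
            triples.append((path, "ext", os.path.splitext(name)[1]))
    return triples


def select(triples, attr, pred) -> list[str]:
    """Return the paths whose value for `attr` satisfies `pred`."""
    return sorted(p for p, a, v in triples if a == attr and pred(v))


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "a.csv"), "w") as f:
        f.write("x,y\n1,2\n")
    open(os.path.join(root, "b.txt"), "w").close()  # empty file

    meta = fs_metadata(root)
    big_names = [os.path.basename(p) for p in select(meta, "size", lambda s: s > 0)]
    csv_names = [os.path.basename(p) for p in select(meta, "ext", lambda e: e == ".csv")]

print(big_names)  # ['a.csv']
```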
I think I'm almost running out of time here, so I just want to show you on this slide some of the things you can query that are common tasks in data repositories: for example, you can return all data object IDs, names, etc.
You can not only query, but you can also insert or delete. I think I'm going to stop here; I think I'm at ten minutes now. I actually had prepared two examples, but I was trying to keep the talk to ten minutes. I can explain more about them if that's okay.
Okay, thank you. So this is an example of a policy: metadata access control. On the left-hand side is basically how iRODS currently works. iRODS currently has this Postgres plugin, which goes to PostgreSQL directly, and it's impossible to go to two databases at the same time, let alone two different types of databases.
But with QueryArrow, we can do a little bit more. We can enhance what's shown on the left with a new database, which is Neo4j, and the resulting system is shown on the right.
The idea here is that I added this new database, which contains the access control data, because currently iRODS doesn't provide it: iRODS allows you to set access control on data objects, but not on the metadata (the AVUs).
You can do it with QueryArrow by just putting a new QEP into QueryArrow, and then the access control metadata becomes available. However, that doesn't mean the access control is being enforced: just having the access control data available doesn't mean it is being enforced, so we need to write some policies.
In the middle is the translation QEP, which allows you to create policies that combine metadata from different sources and enforce certain kinds of policies based on that metadata. For example, here is a baseline system without access control.
I have this meta predicate that I define. We're not going to look into too much detail; you can think of this predicate as representing whether something is a metadata item or not. In the baseline, it doesn't consult the access control data; it just goes to the raw metadata directly, so you can insert, delete, and query as much as you want.
Then what we can do is import the access control data from Neo4j, as shown next to label 1, and then hide the raw metadata, as shown next to label 2, so that the user won't be able to access it directly. On the next slide, we define our new predicate.
It basically says: get the client ID of the client, and ask whether this client has read access to this metadata item or not. If the client has read or write access to the metadata item, then go to the raw predicate and get the metadata. This is to show you the flexibility of QueryArrow: if you want, you can say that having write access doesn't automatically give you read access. You can do that too; you just delete the line that basically gives you read access from write access. For inserts, it's a similar thing.
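The policy walked through above, a hidden raw predicate plus a wrapper that consults access-control data first, might look roughly like this in Python. Everything here is hypothetical (`RAW_META`, `ACL`, `can_read`, `visible_meta`, the permission strings); the talk expresses the policy in QueryArrow's own rule language on slides that are not part of this transcript. The `write_implies_read` flag models the one rule the speaker says can simply be deleted.

```python
# Hypothetical raw metadata items, hidden from users (the "raw predicate").
RAW_META = {
    "m1": ("obj1", "project", "south-hub"),
    "m2": ("obj2", "project", "internal"),
}

# Hypothetical access-control data, e.g. imported from a second store such as
# a graph database: (client id, metadata item id) -> permission.
ACL = {
    ("alice", "m1"): "read",
    ("alice", "m2"): "write",
    ("bob", "m1"): "write",
}


def can_read(client: str, item: str, write_implies_read: bool = True) -> bool:
    perm = ACL.get((client, item))
    if perm == "read":
        return True
    # Whether write access also grants read access is a single, removable rule.
    return write_implies_read and perm == "write"


def visible_meta(client: str, write_implies_read: bool = True) -> dict:
    """Policy-wrapped predicate: the client only sees items it may read."""
    return {
        item: triple
        for item, triple in RAW_META.items()
        if can_read(client, item, write_implies_read)
    }


print(sorted(visible_meta("alice")))                            # ['m1', 'm2']
print(sorted(visible_meta("alice", write_implies_read=False)))  # ['m1']
print(sorted(visible_meta("carol")))                            # []
```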
Delete is a little bit different from insert, because when you insert metadata, the metadata isn't there yet, but when you delete metadata, it already exists. So here I'm just saying: if you have access to the data object, then you can delete the metadata object.
However, when you delete a metadata object, you also have to delete the access control data for all other users that may have access to this metadata item, because the metadata item is gone. You don't want to keep dangling access control data there. That's basically it. Next.
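The delete rule and its cleanup step can be sketched the same way. The names and the rule itself (write access on the data object permits deleting its metadata) are assumptions for illustration; what the sketch demonstrates is the cascade that removes every client's ACL entry for the deleted item so nothing is left dangling.

```python
class MetaStore:
    """Toy store (hypothetical) holding metadata items plus per-item ACL entries."""

    def __init__(self) -> None:
        self.meta: dict[str, tuple[str, str, str]] = {}   # item id -> (object, attribute, value)
        self.item_acl: dict[tuple[str, str], str] = {}    # (client, item id) -> permission
        self.object_acl: dict[tuple[str, str], str] = {}  # (client, object) -> permission

    def delete(self, client: str, item: str) -> None:
        obj = self.meta[item][0]
        # Hypothetical rule: write access to the data object lets you delete its metadata.
        if self.object_acl.get((client, obj)) != "write":
            raise PermissionError(f"{client} may not delete metadata on {obj}")
        del self.meta[item]
        # Cascade: drop every client's ACL entry for this item, so no
        # access-control data outlives the metadata it refers to.
        for key in [k for k in self.item_acl if k[1] == item]:
            del self.item_acl[key]


store = MetaStore()
store.meta["m1"] = ("obj1", "project", "south-hub")
store.item_acl.update({("alice", "m1"): "read", ("bob", "m1"): "write"})
store.object_acl[("bob", "obj1")] = "write"

try:
    store.delete("alice", "m1")
except PermissionError as e:
    print("denied:", e)  # denied: alice may not delete metadata on obj1

store.delete("bob", "m1")
print(store.meta, store.item_acl)  # {} {}
```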
A
Maybe Carl can send out your contact information if people want to get in touch, because I want to be respectful of the other speakers and make sure they have time to talk as well. Well, actually, now that I'm looking at the lineup, we have some time, so we can go ahead if you wanted to give a quick rundown.
B
Okay, so I could give a quick rundown of these. Not all of them are very interesting; some of them are polyfill kinds of things.
For example, if you have a database that doesn't have regular expression capabilities, this regex QEP basically allows you to do the matching programmatically instead of in the database. That's the nice thing about these. Then there are things like the mutable map: if you just want something in memory, you can do this.
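A regex "polyfill" of the kind described can be sketched by layering a filter over a store that can only enumerate its contents. `MapStore` and `RegexPolyfill` are hypothetical names; the sketch only shows the general technique of applying a regular expression in the host language when the backend cannot.

```python
import re


class MapStore:
    """In-memory key/value store standing in for a backend with no regex
    support. The only read operation it offers is a full scan."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def scan(self) -> list[tuple[str, str]]:
        return list(self._data.items())


class RegexPolyfill:
    """Adds regex matching on top of any store with scan(): the backend returns
    candidates and the pattern is applied in Python instead of in the database."""

    def __init__(self, store: MapStore) -> None:
        self._store = store

    def match(self, pattern: str) -> list[str]:
        rx = re.compile(pattern)
        return sorted(key for key, value in self._store.scan() if rx.fullmatch(value))


names = MapStore()
names.put("obj1", "sample-2017-02.csv")
names.put("obj2", "notes.txt")
names.put("obj3", "sample-2016-11.csv")

print(RegexPolyfill(names).match(r"sample-2017-\d{2}\.csv"))  # ['obj1']
```

The trade-off is that every candidate value has to leave the store before it is filtered, so a backend with native regex support will usually be faster on large collections.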
And then there's this cache. The cache is basically something you can use to make your system run faster, because there's a translation cost: when you use the QueryArrow language, the query has to go through all the plugins and be translated from a QueryArrow query into queries for the different databases. So rather than do this every time for what is usually the same query, you can just say: okay, add a cache here.
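The translation cache can be illustrated with memoization from Python's standard library. `translate` is a toy stand-in (its query format `attr=<name>` and both output dialects are invented for this example), and `functools.lru_cache` is a swapped-in general technique, not necessarily how QueryArrow's cache plugin is built.

```python
from functools import lru_cache

translation_runs = 0  # counts how often the (expensive) translation step actually runs


def translate(query: str, backend: str) -> str:
    """Toy translator from a hypothetical unified query "attr=<name>" into one
    backend's dialect. String formatting is for illustration only; real code
    should use the backend's parameter binding."""
    global translation_runs
    translation_runs += 1
    attr = query.split("=", 1)[1]
    if backend == "sql":
        return f"SELECT obj FROM meta WHERE attr = '{attr}'"
    if backend == "cypher":
        return f"MATCH (o)-[:{attr}]->(v) RETURN o"
    raise ValueError(f"unknown backend: {backend}")


@lru_cache(maxsize=1024)
def cached_translate(query: str, backend: str) -> str:
    """Same result as translate(); repeated identical requests skip the work."""
    return translate(query, backend)


for _ in range(3):
    cached_translate("attr=project", "sql")
cached_translate("attr=project", "cypher")

print(translation_runs)                    # 2
print(cached_translate.cache_info().hits)  # 2
```

Caching like this is only safe when the translation depends on nothing but its arguments; if a policy made the translated query depend on the calling client, the client would have to become part of the cache key.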
The translation QEP is for policies, and the sum QEP is for combining QEPs. The nice thing about QueryArrow is that the core service is actually very simple. The service doesn't even see multiple QEPs; the server only sees one QEP, but that QEP could be a sum QEP, and under the sum QEP you can have multiple QEPs.
So you can even say the server doesn't do policy, and the server doesn't do caching; all these things are done in QEPs. And the QEP interface is very general, so this allows one instance of the QueryArrow service to communicate with another instance of the QueryArrow service over the internet through the QEP interface. The service still thinks it's just talking to a QEP, but behind it is actually another QueryArrow service.
So this allows you to distribute queries across instances. And this one is ElasticSearch. All these are per-database plugins, there's one for the file system, and there are some utility ones [inaudible].
A
[question inaudible]
B
Yes, well, the federation is very limited; it's not as sophisticated as iRODS federation and so on. QueryArrow doesn't handle any authentication or encryption, so the purpose of it is basically to let you stand up multiple instances and have multiple servers handle the same workload.