From YouTube: OCB AMA: Scribe - Asynchronous Data Replication with John Strunk and Scott Creeley (Red Hat)
Description
Scribe is exciting for its unique, lightweight, and storage-agnostic data movement capabilities for any storage type, including file, block, and object. Scribe also supports all Kubernetes-based storage drivers, both CSI and non-CSI compliant. It takes advantage of best-of-breed industry data replication technologies, using rsync and rclone controlled by a single CR-based interface. Scribe also utilizes CSI capabilities like snapshots and volume clones if supported by the driver. Join this briefing for an introduction, demo, and live AMA session with the Scribe project leads!
https://github.com/backube/scribe
A: All right, everybody, happy Monday! Welcome back to OpenShift Commons. Today, as we like to do on Mondays, we have an upstream project with us, along with many of the team leaders, and we're going to make them tell us all about their project. If I get this right, it's asynchronous data replication, which is what Scribe does for us. We have John Strunk, Ryan Cook, Parul Singh, I see Scott Creeley somewhere in the background, and Guy Margalit. I'll let you introduce your team, John, and there's going to be some live-ish demos here.
B: Awesome, thank you, Diane. So yeah, we're here today to tell you a little bit about Scribe, and there's a number of folks working on the project right now. As Diane said, I'm John Strunk, and we've got Ryan, Parul, Guy, and Scott also helping out. So let's start with a quick overview of what we're going to cover today.
B: We're going to start with a few intro slides on what exactly Scribe is about, and then we've got three demos, because everybody likes demos. So we've got three demos queued up for you, and then we'll finish it off with a little bit of Q&A. Let's get to it.
B: So let's talk a little bit about your data. Kubernetes and GitOps work really well for stateless applications: if your pod crashes or you lose a node, Kubernetes is more than happy to reschedule your pods somewhere else in the cluster, and in the event that your cluster goes down, if you're managing your application and configuration via GitOps, you have all that information and it's easy enough to just re-apply it.
B: So what Scribe is, is a Kubernetes operator that is designed to do cross-cluster, asynchronous data replication, and it does this in a storage-system-independent way. You don't actually need the underlying storage system to support the data replication; we handle it all on top. One of the nice bits about that is that you're not forced to run the same storage system on all of your sites. So, for example, if one of your clusters is running in the cloud, you can use a storage system that is optimized for that.
B
Scribe
makes
use
of
csi
capabilities
of
clones
and
snapshots
if
they're
available.
So
we
use
that
in
order
to
create
point
in
time
copies
of
your
data
to
replicate.
But
if
your
storage
driver
storage
system
doesn't
support
it,
that's
okay,
we
can
still
copy
your
data
without
and
as
well.
Scribe
is
designed
around
an
extensible
architecture
so
that
if
the
storage
system
does
support
optimized
replication
natively,
it
could
also
be
integrated
with
scribe.
B: When we think about where we might want to use Scribe, probably the first thing that comes to mind is disaster recovery: replicating your application's data from a primary cluster to a secondary cluster. But it's also useful for data distribution scenarios.
B: It's also useful for, say, data migration within your cluster. If you want to swap out your storage system, maybe change vendors, that sort of thing, you could use Scribe to move your data that way, as well as for migrating data between cloud and on-prem environments. You can also use it for off-site analytics, or even just replicating your production data for dev and test scenarios.
B: Scribe is built around this notion of data movers. We have one data mover that is based on rsync, and that's really optimized for one-to-one volume relationships; for example, that asynchronous disaster recovery scenario, where you're trying to replicate a volume from a primary to a secondary cluster.
B: We are in the process of adding a third data mover that is based on restic, and that is to handle more archive-type use cases. We're working on adding metrics to the operator, so that it's easy to keep track of the current status of the replication relationships, and we're also working on adding some helper programs to make it a little bit easier to replicate data into and out of Kubernetes environments as a whole.
B: Right, because we realize that not everybody's IT environment is 100% Kube at this point. And then, finally, where to find us: you can go check out the documentation, it's over at scribe-replication on Read the Docs, or you can check out our code on GitHub. We'll put the slide back up again at the end. So that is the overview.
B
And
now
what
we're
going
to
do
is
we've
got
three
demos
that
are
keyed
up
for
you,
so
the
first
one
is
going
to
be
showing
off
the
rsync
based
data
mover
specifically
in
a
disaster
recovery
scenario.
It
kind
of
takes
a
look
behind
the
scenes
a
little
bit
about
kind
of
what
scribe
is
doing
in
the
background.
B: What we have is two different clusters: a primary site that is running an application with some data volume, and a secondary site. We want to replicate that data over to the secondary site so that we could move our application.
B
And
so
what
we're
going
to
do
in
the
demo
is
we're
going
to
create
a
custom
resource,
a
replication
source
over
on
the
primary
side
that
points
at
that
data
volume
to
replicate
and
then
on
the
secondary
side,
we're
going
to
create
a
replication
destination
that
provides
a
target
for
us
to
replicate
the
data
to
once
those
are
in
place.
What
scribe
is
going
to
do
is
create
a
data
pipeline
from
the
primary
site
over
to
the
secondary.
B: This is a quick demo of cross-cluster replication using Scribe. What we see here is that I have two clusters: a primary cluster in this first window running in us-west, and over here in the other window a secondary cluster running in us-east. We're going to use Scribe to ensure that our data is replicated, so that we can move an application between clusters if necessary. So let's start by taking a look at our primary cluster.
B
So
what
we
see
in
this
cluster
is
that
I
have
a
simple
wiki
application.
That's
running
and
here's
the
pod
for
the
wiki
and
the
data
is
residing
in
a
pvc
also
in
this
namespace,
and
what
we
need
to
do
is
replicate
this
data
over
to
our
secondary
cluster,
so
that
we
could
fail
over
our
application
if
necessary.
If
we
take
a
look
at
the
secondary
cluster,
what
we'll
see
over
here
is
that
we
have
the
same
application
deployed.
B
However,
it's
currently
scaled
down
to
zero
and
you'll
also
notice
that
it
doesn't
have
any
pvc
associated
with
it,
because
we
haven't
replicated
our
data
over
here.
Yet
so,
like
we
saw
earlier
in
the
slides,
let's
go
ahead
and
set
up
scribe
to
do
the
replication,
so
first
thing
that
we're
going
to
do
is
set
up
the
replication
destination
over
here
on
the
secondary
cluster.
B
When
we
take
a
look
at
the
scribe
cr,
what
we
see
is
that
we're
asking
it
to
create
a
10
gig
volume
for
the
incoming
data
and
then
with
each
sync
iteration
to
preserve
a
point
in
time,
image
via
snapshot.
So
let's
go
ahead
and
add
that
to
the
cluster,
now
that
we've
inserted
that
into
the
cluster,
let's
go
and
check
out
our
namespace
again.
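The destination CR described here could be sketched roughly like this. The resource and namespace names are illustrative (not shown in the demo), but the capacity, per-sync snapshot, and load-balancer endpoint follow what is described:

```yaml
# Hypothetical sketch of the ReplicationDestination described above;
# metadata names are illustrative, not taken from the demo.
apiVersion: scribe.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: wiki-destination
  namespace: wiki
spec:
  rsync:
    # Ask Scribe to provision a 10Gi PVC to receive the incoming data
    capacity: 10Gi
    accessModes: [ReadWriteOnce]
    # Preserve a point-in-time image via snapshot after each sync
    copyMethod: Snapshot
    # Expose an endpoint the source cluster can connect to
    serviceType: LoadBalancer
```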
B: Now what we see is: here's that ReplicationDestination that we just created, and the operator has taken that and is working on setting up the infrastructure necessary to accept the incoming transfers. The first thing that we see is that the operator has created a PVC in the namespace that's going to receive the incoming data. It has also set up a load balancer that's going to act as the endpoint for our source to eventually connect to, and in a minute this PVC will finish binding.
B: Now that that's done, let's take a look at the custom resource. What we see here is that Scribe has added to the status field the connection parameters necessary for us to configure our source so that it can transfer data to this location. That information consists of the address to connect to (that's our load balancer), and it has also given us a set of SSH keys that exist in a secret here in this namespace. We need to transfer that secret over to our primary cluster.
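The status fields being read here look roughly like the following; the address and the secret name are placeholders:

```yaml
# Illustrative status reported on the ReplicationDestination;
# the address and secret name below are placeholders.
status:
  rsync:
    # Endpoint of the load balancer the source will connect to
    address: a1b2c3d4.elb.example.com
    # Secret in this namespace holding the SSH keys; it must be
    # copied over to the source cluster
    sshKeys: scribe-rsync-dest-src-wiki-destination
```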
B: So what I'm going to do is save that secret out to a file, and now let's go over to our primary cluster and set up the ReplicationSource. Over here, the first thing we're going to do is insert that secret, and now let's edit the ReplicationSource CR. What this is going to do is define what data we need to replicate. The first thing we'll notice here is that we're specifying a source PVC to replicate: that is our wiki's PVC.
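The ReplicationSource being edited could look roughly like this, assuming the address and key secret obtained from the destination in the previous step; all names here are illustrative:

```yaml
# Hypothetical sketch of the ReplicationSource; names are illustrative.
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: wiki-source
  namespace: wiki
spec:
  # The wiki's data PVC to replicate
  sourcePVC: dokuwiki-pvc
  trigger:
    # Replicate on a schedule (every five minutes here)
    schedule: "*/5 * * * *"
  rsync:
    # Address from the destination's status field
    address: a1b2c3d4.elb.example.com
    # The SSH key secret copied over from the destination cluster
    sshKeys: scribe-rsync-dest-src-wiki-destination
    # Take a point-in-time copy before each transfer
    copyMethod: Snapshot
```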
B: In the meantime, let's take a look at our application. This is the endpoint for our application running on our primary cluster. Here we see it's just a simple wiki; we can come in and edit the data, and eventually those changes will get replicated over to the other side. So it looks like right now Scribe is in the process of replicating data over to the remote site. While that happens, let's go and see.
B: Okay, here we see that our next sync iteration has started, so Scribe has taken a snapshot of the application's volume, and we're currently waiting for that to be processed into a usable snapshot. Once that succeeds, we'll again get a new persistent volume that will be used by the rsync data mover to update the volume on the remote side. Okay, so our snapshot is ready to use.
B: There we go. Now, over here on the remote site, it has updated the volume snapshot to contain the most recent data. Let's go and take a look again at our ReplicationDestination, and again take a look at the status. What we see is this latestImage field, and that always tells us what the most recent volume snapshot is, so that if we want to spin up our application, we can do that.

So we're going to take the name of that volume snapshot, and we're going to add that to our Kustomize and use that to scale up our application over here on the secondary site. What happens with that Kustomize is: it goes and creates a new PVC from that latest snapshot, and it also scales up the wiki deployment.
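Restoring the PVC from the preserved snapshot is standard Kubernetes: the Kustomize step boils down to a PVC whose dataSource points at the snapshot named in the latestImage field. The PVC and snapshot names below are placeholders:

```yaml
# PVC restored from the latest replicated snapshot; the snapshot name
# is a placeholder for the value read from status.latestImage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dokuwiki-pvc
  namespace: wiki
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: scribe-dest-wiki-destination-snapshot  # from status.latestImage
```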
B: Here's the pod for it; again, we're waiting for that, but as soon as it becomes ready, we'll be able to head back over to our browser, and we should be able to see the edit that we made to the wiki back over on our primary site, except it'll be here on our secondary. Okay, our pod is ready, so I'm going to copy the address here for our secondary site.
C: Oh no, we have... I have the controls, I am showing the slides.
C: Okay, so to wrap up the key points from the first demo: we saw the replication of a wiki application from primary to secondary, and to do that we saw how the Scribe operator replicates by using a point-in-time copy of the application data, and it preserves that image name in its CR on the secondary site. To restore the application on the secondary site, all we need to do is ensure that the destination (the secondary PVC) is restored from the snapshot that is preserved in the CR.
C: At the start of each replication iteration, what the operator does is create a snapshot or copy of the volume using the CSI driver, if available, or it does so in a non-CSI fashion if those capabilities are not available in the cluster. Once it has created the snapshot, it moves that data onto an intermediate storage.
C: So you can see that our rclone-based data mover job is a push-and-pull mechanism, where the central hub, or the primary side, pushes the data onto an intermediate storage, and the edge cluster pulls the data from the intermediate storage. For the demo that I'm going to show next, I have a kind cluster with two namespaces: the source namespace, which acts as the primary site containing the source of truth, and the dest namespace, which acts as the edge side that will pull data.
C: Okay, so it is creating the container. Let's give it a few seconds to get up and started.
C: Okay, as you can see, my MySQL application is up and running. Let's see what PVC it is using to store its data.
C: If you see here, it is using a PVC which is called mysql-pv-claim, and what Scribe will do now is create a point-in-time snapshot of this PVC and copy that data onto the intermediate storage, which is our S3 bucket.
C: So now that we have created a new database, it's time for us to deploy the ReplicationSource CR on the primary side. But before we do that, we need to create a secret, which is called the rclone secret, and the operator will be using that secret to push the data to the intermediate side, which in our case is an AWS S3 object store. So I am going to go ahead and deploy the secret.
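The rclone secret wraps an ordinary rclone config file. It might look something like the following; the config section name and all credential values below are placeholders:

```yaml
# Hypothetical sketch of the rclone secret; the section name,
# bucket region, and credentials are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: rclone-secret
  namespace: source
stringData:
  rclone.conf: |
    [aws-s3-bucket]
    type = s3
    provider = AWS
    access_key_id = <ACCESS_KEY_ID>
    secret_access_key = <SECRET_ACCESS_KEY>
    region = us-east-1
```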
C: So basically, the Scribe operator creates a snapshot based off of this PVC, and it creates a temporary PVC, which is called scribe-src-database-source, and the rclone-based data mover job uses this PVC to push the data onto the intermediate object store. So let's wait for the operator to finish moving the data.
C: Okay, as you can see, the Scribe operator has finished, and it's now time to verify whether the replication is happening, or has been initiated, on the destination side or not.
C: To do that, I am in my destination namespace, and just like on the source side, you have to create a secret, which you can see over here, and now I'm going to deploy the ReplicationDestination CR.
C: Let's go and verify what the Scribe operator has done on the destination side, and to do that, let's see what is happening in the ReplicationDestination CR.
C: Again, you see that it is trigger-based, scheduled to run every five minutes, and in the status field you can see that the first sync iteration is complete. Over here you can see that it took the data from the object store and created a snapshot out of it, and that snapshot image name is preserved in the latestImage field of the ReplicationDestination.
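The destination side of the rclone relationship could be sketched like this; the names and paths are illustrative, matching the five-minute schedule and the latestImage behavior described:

```yaml
# Hypothetical sketch of the rclone ReplicationDestination;
# names, bucket path, and the snapshot name are placeholders.
apiVersion: scribe.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: database-destination
  namespace: dest
spec:
  trigger:
    schedule: "*/5 * * * *"            # pull from the bucket every five minutes
  rclone:
    rcloneConfig: rclone-secret        # secret holding rclone.conf
    rcloneConfigSection: aws-s3-bucket # section within rclone.conf
    rcloneDestPath: scribe-demo-bucket # bucket/path to pull from
    copyMethod: Snapshot               # preserve each sync as a snapshot
    accessModes: [ReadWriteOnce]
    capacity: 10Gi
status:
  latestImage:                         # filled in by the operator after a sync
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: scribe-dest-database-destination-snapshot
```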
C: So now, to sync your database application on the edge side, all we need to do is ensure that the database application restores its PVC using this snapshot name. Let's do that and see whether the database that we created, called "synced", is present on the destination side or not.
C: So I will extract the latest point-in-time snapshot image, which is this, and I am going to create a PVC out of it, which the edge database application will be pointing to. I've taken this snapshot name and I'm going to substitute it in to create the PVC out of it.
C: Okay, so back to me. So we saw the rclone-based replication that Scribe uses, and we think that this has potential use cases in edge scenarios. What we did in this demo is a replication of a MySQL-based application from one namespace to a different namespace. The Scribe operator uses an intermediate storage, like an S3 object store: the primary site pushes the data to the intermediate storage, while the edge site pulls the data from the intermediate storage.
C: So far I have been talking about how this wide fan-out replication has potential for edge scenarios, but I didn't actually show you how you can move data between different clusters, did I? All I showed you is how you move from one namespace to another namespace. So, to prove my claim, we have a third demo coming, and Ryan will show you how you can integrate Scribe with Red Hat Advanced Cluster Management to easily scale applications across clusters.
D: There we go. All right, so as Parul said, this last demonstration is going to glue all of the pieces together. John and Parul both talked about and demonstrated how Scribe creates a Kubernetes-centric way of managing replication. Pretty much the easiest way to say it is: there's YAML files to control your replication.
D: The really cool part about that, and John mentioned it early on, is that we can use GitOps tooling, such as Red Hat Advanced Cluster Management or Argo CD, to manage our application placement, and then, with the addition of Scribe, we can handle our replication placement. With both of those combined, we can actually just scale out sites as we need, as we see fit.
D: So, as you see from this page here, we'll start out with the primary cluster; you'll see it's labeled within RHACM as local-cluster, and this will be running OCS as a storage class.
D: We have our local cluster, we have our AWS cluster, and we have a bare metal cluster. Our application will be created and updated here, and then any changes to that application will get a snapshot.
D: Scribe will move the data to a bucket, and then we'll actually be able to update both clusters at the same time. Even if one of those clusters is on a boat or on a plane, whenever that cluster comes back into connectivity with ACM, it will be replicated: the new data will be there, and any application updates will happen.
D: So, as you see, combining both of these technologies is just a huge strength, because you're no longer having to go to each cluster and kind of poke it. Like I said, at our primary site, any changes that we make are going to be sent to these other clusters. And with that, Diane, I think I'm ready for the video.
D: Jumping into the RHACM console, we will take a look at our clusters. Currently, there is only our source cluster that we showed earlier, which by default is named local-cluster. RHACM handles application placement based on labels, shown here: you will see storage=ocs and site=headquarters, as well as some various auto-generated labels. The storage and site labels are used to determine what storage class to use and whether the location is the replication source or destination.
D: To save some time, the Scribe components, the storage class modifications, our ReplicationSource and ReplicationDestination objects (which Parul and John showed earlier), and our DokuWiki site have already been defined within RHACM. Here's our DokuWiki page with a simple hello message. Now it's time to bring up the remote sites.
D: As you can see, the DokuWiki site has been deployed on our new cluster automatically, thanks to RHACM and Scribe, with the same hello message that matches our primary deployment. Now we will import a metal cluster. This cluster was too small to deploy OpenShift Container Storage, so we will use hostpath as our storage class, with the labels storage=hostpath and site=remote.
D: When the metal cluster becomes ready within RHACM, our DokuWiki application will deploy. We will now update our DokuWiki page; the changes will be synchronized to the remote clusters. When we are ready to update our DokuWiki site, we will update our PVC definition and deployment within our Git repository.
D: So actually, that's great that you asked that. I mean, if we can scale out clusters and things are that simple to do, that shows the strength of not only RHACM but Scribe. So that is perfect. There were parts that we did definitely cut out of that: if you have ever spun up an OpenShift cluster before, you know how long it takes to spin up a cluster.
D: So we did cut those parts, but it definitely shows that by implementing a mature process, having a GitOps tool like Argo CD or RHACM, and then with Scribe, you can just have this beautiful scenario. If you're talking about, say, a factory, you could just spin up new factory locations as you see fit. If you're a restaurant, you can pop up faster than you can even imagine. It's just the ability to scale and not have to worry about a new way to figure out how to get your data.
D: That's the key; that's a good thing. Keeping it simple is definitely going to make everybody's lives easier.
A: All right, well, are we ready for a little Q&A and conversation about this now, John and Parul?
A: This seems to me, and I'm going to need a few folks here, and Parul, if you want to join in as well. This seems to be at a very early stage. Someone is sharing their screen with me still and I'm getting their Slack channel; that's all right, happens all the time.
A: It seems to me a very new project, when you guys approached me to do this talk. How long has Scribe been around?
B: I guess we only really got started back around maybe October of last year. It's something that we wanted to get started for a while, but, you know, just in terms of scheduling and that kind of thing. So it's still really, really new, but we're excited about the potential.
A: It seems huge, and lately, no surprise to anyone, we've been having all these edge conversations. When you walked through the beginning of the talk, this edge data distribution problem, and solving that problem for the edge and IoT devices that are out there, is huge. It seems like it's a big part of what we're going to need to move forward successfully in the edge space anyway. So that part's really amazing and cool.
A: I can see lots of applications for that. Are you working with people who are deploying things on the edge? Is that one of the major reasons for building Scribe, or was it just the data replication that drove you to start it?
C: There was a time we were getting a lot of emails, and people were talking about edge clusters and single-node OpenShift, and we kind of thought: how can we have a solution that is storage independent? Because with Scribe, you don't even have to be dependent on the underlying storage system you're using. So, let's say one cluster is using gp2 and another is using hostpath: you can use Scribe to replicate data irrespective of the underlying storage. That is what we thought as we started to build Scribe.
A: So yeah, everybody's talking edge these days; that's sort of the new hotness right now, I think, in my little world of spheres. So it's amazing to see this done, and I totally appreciate it. So the code is in GitHub, correct? Yes, I saw a URL.
A: Besides things like this OpenShift Commons briefing, how are you interacting with the community? Is this something that you're going to try and grow a big community around? Is this just a piece of a bigger project?
B: Anybody that's interested, please come visit the GitHub repo. Open issues, start discussions, whatever; that's kind of our primary way right now. We are trying to get Scribe into various forums, to give talks and that sort of thing.
B: But definitely try it out, check it out on Artifact Hub, and then send us your feedback and open issues. I'm sure there's a bug or two in there somewhere, but we'll get it.
A: It seems there's a couple of SIGs, and maybe, if Paul's around, it would be great to get this in front of the CNCF. I can see a lot of interest coming from that, and maybe this is something we might want to throw into a sandbox in the CNCF in some not-too-distant future, because I think that's a great way to get other people to participate in a project, as well as to reach other Kubernetes folks.
A: So yeah, the Artifact Hub, Helm charts, the operator stuff sounds like it's coming soon to a theater near me, so that's good. Is there an operator section of the repo itself where you're working on that in public, or is that still behind some firewall or something?
B: Yeah, no, it's all there in the Scribe repo. There's the Kubernetes operator, which is one container whenever you build it, and then there are these other data mover containers: one for rsync, one for rclone, and then the one that we are working on for restic.
A: Yeah, it was a new topic to me, Parul; that's why I was really interested in getting you guys on, because it's like, whoa, where did this come out of? And this is because we've had, in the OKD working group, a number of conversations with people, especially from the Fedora IoT and the CoreOS teams, doing kind of interesting stories, especially around bare metal and edge stuff.
A: So it's definitely something we want to take to the OKD working group and make you show off as well; we'll share this with them. But I think this has got a very broad reach, so it'd be very interesting to see how other people respond to this. Is there anything out there that competes with this or is similar to this, any other projects?
B: So the thing is, asynchronous data replication, or just data replication in general, is something that has traditionally been done as a part of the storage system, because in traditional IT environments it was all up to the vendor to do that replication.
B: And this was actually kind of one of the reasons why I thought it was important for us to build this operator: to be a cross-vendor replication engine. There's kind of that lock-in of relying on the storage vendor to do the replication, and not all of them support it, and that sort of thing. Whereas Kubernetes is really good about abstracting away the underlying hardware and environment that you're in, and so we thought that it was really important to be able to provide those advanced data management capabilities.
A: So yeah, this is the question that I always ask myself: what overhead does it add to an application? Can you take a look at that? I noticed the reference to Prometheus and other monitoring things, but I'm wondering what overhead it adds to my application.
B: If the storage system does the replication natively, it's going to do that more efficiently than anything that a Scribe operator is going to be able to do. But the data movers that we've chosen, like rsync, go in and calculate just the changes that have been made in the volume, and so it does try to minimize the amount of traffic that goes over the network.
B: That sort of thing. And so we've tried to make it fairly lightweight in that way, and we think that, for a very broad spectrum of use cases, this will be a good solution.
A: So, any questions out there in the chat room? I'm not seeing any yet. I'm wondering if anyone's actually deployed Scribe yet in production. Is this still that new, that we don't have customers or end users giving you feedback yet?
B: Yeah, it's still pretty new, but we're talking with folks, trying to convince them to give it a shot.
A: I think you've got a really good shot. I can think, right off the top of my head, of two or three folks that have been talking about this problem with me, so I'm going to definitely hit them up. And we've got one question coming in: could the traffic traveling over be compressed, thus reducing the traffic?
B: Yeah, absolutely. In the case of the rsync data mover, for example, it's just the rsync protocol over an SSH tunnel between the two sides, and the SSH connection itself does compression. And rsync is doing the deltas of the files. So, yes.
F: Not a question, but I just wanted to say, before we get too far afield of this topic, that I think the demos and presentation you gave today would be great to show to CNCF SIG Storage when you feel like you're ready.
A: Yeah, definitely, okay. We can set you up with that. That's what I was trying to figure out, which one; definitely Storage, but there's a few others too. I think there's even an edge SIG coming around soon. So there's another question coming in: is it possible to encrypt the data?
B: Whenever the data is going over the wire, at least with the rsync protocol, it is going over an SSH connection. In the demo, we had to copy that secret from one side to the other, and that was basically moving the SSH keys from one side to the other, so that both sides could authenticate each other and properly encrypt that traffic.
F: Any limits on the number of edge sites that can pull right now?
B: So, one of the... sorry, I'm just trying to put up the slide again.
B: So there's that side of it. Now, in terms of the clones and snapshots with CSI, there is a configuration, both on the source side and on the destination side, that allows you to basically enable or disable whether you want to do snapshotting.
B: On the source side, you've actually got three options. You can get your point-in-time copy of the source volume either via clone or via snapshot. The most efficient way to do it is to just directly clone the volume on the source side and hand that to the data mover. In the demos that we did, we were using the EBS CSI driver, which doesn't actually support clone, so we had to use the snapshot mode, where it takes a snapshot and then restores that snapshot as a volume.
B: That gets you point-in-time copies. But then there's the third mode, which is basically to just use the live volume and replicate that on a schedule. That gets you around requiring CSI snapshots or clones, but you lose the instantaneous view of the volume when you do that, so you can potentially get a little bit of skew in there while it's being replicated. And then, over on the destination side, you have your choice of whether to snapshot after each sync iteration or to just leave the volume as-is, which you could do if your storage provider on the destination doesn't support snapshotting.
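The three source-side options described above map to a single field on the CR; a minimal sketch, with everything but the relevant field elided:

```yaml
# The point-in-time behavior is selected per side via copyMethod.
# Source side, pick one (illustrative fragment, other fields elided):
spec:
  rsync:
    copyMethod: Clone      # most efficient: directly clone the volume
    # copyMethod: Snapshot # snapshot, then restore it as a volume
    #                      # (needed e.g. for the EBS CSI driver)
    # copyMethod: None     # replicate the live volume on a schedule;
    #                      # no CSI features required, but the copy is
    #                      # not an instantaneous point-in-time view
```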
E: I just have to think some more, but thanks, John. Okay.
A
All
right
more
in
the
chat,
is
it
reasonable
to
use
for
huge
volumes?
Do
you
have
you
done
benchmarks
or
tests.
B: Right, so we haven't really done a lot of benchmarking of it yet. Obviously, the bigger your volume and the higher your change rate and stuff like that, the more latency you can potentially see, in terms of how long it's going to take to replicate your data.
B: I would say I'm not all that concerned about just having a big volume in terms of data. It could take a while to get the first copy of that over to your secondary site, but less so once the replication is ongoing.
A: So benchmarking will be interesting when we get to that stage, and also what it is we should actually be benchmarking, because everyone will have a different scenario. It's going to be interesting to figure out what the best thing to benchmark is. All right, so we're almost at the end of the hour. Anyone got any more questions? We'll give them a second; otherwise we're going to say thank you, and we're going to have you back in a few iterations.
A: We'll see what comes out in the next releases, and if folks are interested, please do go to the GitHub repo and reach out to John or Parul or Ryan, or anybody on the storage team over here at Red Hat; you can get a hold of them all. We'll definitely keep you posted, because I know this is something near and dear to a number of folks' hearts, and I'm really pleased to see this solution.
A
So
thank
you
for
taking
the
time
today,
everybody
and
for
the
wonderful
demos
and
the
especially
the
very,
very
short,
acm
rackham,
one
ryan
that
was
impressive,
even
with
the
minor
edits
of
launching
the
clusters.
That
was
great.
So
thanks
very
much
and
thanks
everybody
for
joining
us
today
and
we
will
keep
you
all
posted
on
scribe's
progress
and
see
what
we
can
do
about
getting
you
in
front
of
sig,
storage
and
other
places
to
get
the
word
out.
So
thanks
again,.