From YouTube: Using Apache CouchDB Operator for Data Portability — Josh Mintz and Will Holley, IBM | OpenShift Commons
Description
Using Apache CouchDB Operator for Data Portability
Josh Mintz and Will Holley, IBM
OpenShift Commons Briefing
@openshiftcommon
Link to slides: https://github.com/openshift-cs/commons.openshift.org/blob/master/briefings/slides/OpenShift%20Commons%20Briefing%20Apache%20CouchDB%20Operator%20Data%20Portablity%20Josh%20Mintz%20Will%20Holley%20IBM.pdf
A: Hello, everybody, and welcome to another OpenShift Commons briefing. This time we're going to talk about some of my favorite things, one of which is operators and data portability. We have with us Josh Mintz, Will Holley, and Mike Breslin, all from IBM, and they're going to talk about using the Apache CouchDB operator for data portability. I'm going to let them introduce themselves. There's a bit of a demo at the end of this session, and we'll have time for Q&A.
B: Thank you, Diane. Hi everyone, my name is Josh Mintz. I'm a product manager at IBM Cloud; I sit in Boston alongside Mike Breslin, and we're also joined by Will Holley from the UK. Today we're going to be talking about the operator for Apache CouchDB. The whole team that's here is part of the organization that delivers IBM Cloudant, which is a database as a service in the IBM Cloud. That database as a service is built on Apache CouchDB, and we took that experience running CouchDB at scale in a public cloud environment.
B: We wanted to transform that into the operator paradigm, so people could take our lessons learned and easily run it in their own OpenShift cluster. I apologize if you hear any squeaking — with the work from home going on, I'm here with my 16-week-old puppy, who has just gotten a new toy. She's very cute, I promise, so thank you in advance for your understanding. So, a little picture to go with the names; we can also get a picture of Mike's beautiful face on here later.
B: So there's Will and I. If you have any questions or concerns about the presentation, or want to talk to us, we hang out in the Apache CouchDB Slack — I have a link at the end of the presentation to join us there. We're definitely happy to talk on Slack, on the phone, or in the open source community at any time; we're here to help. Before getting to the operator, I just want to give a bit of background, i.e. why you should trust our opinion —
B
The
opinion
in
design
that
goes
into
the
operator
pattern
right
and
so
at
IBM
Cloudant
we're
the
data
backbone
of
the
IBM
cloud
across
50
data
centers
all
over
the
world.
We
have
petabytes
upon
petabytes
of
data
under
management
and
again
cloud
is
at
its
core.
You
know,
code
couchdb,
and
you
know
part
of
us
running
it
as
a
service
is
the
years
of
experience
that
we
have.
B
You
know
operating
monitoring
and
scaling
these
systems
for
hyperscale
use
cases
and
it's
fully
compatible
with
patchy.
Couchdb.
There's
some.
You
know
API
differences
that
you
might
expect
in
sort
of
a
public
cloud
versa.
A
piece
of
software
you'd
run
on
a
server
or
a
Raspberry
Pi,
but
you
can,
you
know,
use
them
interchangeably
for
the
most
part,
and
we
have
there's
lots
of
information
on
the
web
about.
You
know
that
the
minor
differences
between
them.
B
So
I
want
to,
instead
of
focusing
about
clouded
I,
want
to
focus
on
apache
couchdb
today
and
we're
gonna
talk
about
the
operator
and
do
a
little
bit
of
a
live
demo
from
well,
but
before
we
get
into
that,
I
just
want
to
cover
the
basic
high-level
feature
set
that
you
get
when
you
use
Apache
Catch
TV
and
for
people
that
have
been
around
the
database
community
for
the
last
decade.
Couchdb
was
one
of
the
first
no
sequel
data
stores.
B
To
really
you
know,
carry
that
movement
forward
right,
couch
TB
in
MongoDB,
where
we're
very
popular
a
number
of
years
ago
and
couchdb
is
still
continue
to
improve
and
become
even
more
reliable
and
you've,
even
more
feature-rich,
so
we're
still
there
under
the
the
Apache
organization.
As
a
you
know,
an
open
source
project
governed
by
that
at
PMC
and
those
those
standards
so
at
a
high
level
to
JSON
document
store
with
an
HTTP
API.
B
So
it
speaks
the
language
of
the
web
for
very
ease
of
use
for
web
and
mobile
application
development
in
places
a
premium
on
data
durability.
So
it
uses
you
know,
structures
and
paradigms
or
in
you
know
the
way
it
sets
up
clusters
and
how
it
deals
with
crash
failures
to
make
sure
that
we're
focusing
on
keeping
the
gold
of
the
database
the
data
as
safe
and
durable
as
possible
and
couchdb
2.0
and
3.0.
It
uses
multi
master
clustering.
So
it's
kind
of
a
master
master
architecture
that
allows
you
to
scale
up
and
out
very
easily.
B
You
know
start
with
one
node
and
add
another
and
add
another
add
another.
A
similar
paradigm
would
be
like
Apache
Cassandra.
The
other
best
part
about
couchdb
is
its
ability
to
sync
data.
Though
there's
this
thing
called
the
CouchDB
replication
protocol
and
it
allows
you
to
very
easily
move
data
wherever
you
need
it
to
go,
whether
it's
in
public
cloud
on
a
store
or
a
point-of-sale
device
at
the
edge
or
an
oil
rig
out
in
the
middle
of
the
ocean.
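What "moving data wherever you need it" looks like in practice: a replication is started entirely over HTTP, by POSTing a JSON description of a source and a target. A minimal sketch in Python — the URLs and credentials below are placeholders, not anything from the talk:

```python
import json

def replication_body(source, target, continuous=False, create_target=False):
    """Build the JSON body for CouchDB's /_replicate endpoint."""
    body = {"source": source, "target": target}
    if continuous:
        body["continuous"] = True      # keep syncing as new changes arrive
    if create_target:
        body["create_target"] = True   # create the target database if missing
    return body

# One-off pull of a database from a cloud instance into a local one:
body = replication_body(
    "https://user:pass@cloud.example.com/movies",  # placeholder remote URL
    "http://localhost:5984/movies",                # placeholder local URL
    create_target=True,
)
print(json.dumps(body))
# POSTing this body to http://localhost:5984/_replicate would start the replication.
```
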
B: You can just use PouchDB as you would the CouchDB API — it's API compatible with CouchDB, which Will is very familiar with, so if we have folks on the call that want to talk more about that, I'm sure he'd be happy and interested to answer any questions there. It's software for running the CouchDB protocol in mobile apps or on small devices. RxDB is kind of a newcomer to this space; it's for JavaScript applications, I think for running something on your phone as well.
B
It's
also
API
compatible
with
pouch
pouch,
DB
and
Cloudant
and
lastly,
as
we
discussed
earlier,
IBM
Cloudant
compatible
with
apache
couchdb.
A
lot
of
the
folks
I
work
with
are
also
contributors
to
the
open-source
project
summer
on
the
PMC
and
it's
a
it's
an
awesome
community
and
we'd
love
for
people
to
come,
say:
hi,
learn
more.
If
you're
interested
in
sort
of
joining
the
fold
and
helping
develop
a
patchy,
CouchDB
sure
there's
a
few
people
in
the
community
that
would
love
to
help
steward
your
involvement
or
answer
any
questions.
B
You
may
have
one
of
the
last
cool
things
about
how
GB
is
it
scales
down
to
small
devices
like
raspberry
PI's,
but
also
we
run
CouchDB
eg
Cloudant
a
database
in
the
public
cloud
that
have
many
many
terabytes
in
them
per
instance,
so
it
scales
up
and
down
very
nicely
if
anyone
has
any
questions,
feel
free
to
pause
me
throughout
the
session.
I
know
people
drop
in
drop
out,
so
no
problem
at
all.
B
If
you
need
me
to
cover
something
or
go
back,
one
of
the
cool
things
that
you
can
do
with
apache,
couchdb
and
Cloudant,
because
of
that
data,
replication
protocol
is
sort
of
this
open,
hybrid
multi
cloud
architecture
and
I've
recognized
the
jargon
there.
But
it's
pretty
descriptive
for
what
we're
actually
trying
to
trying
to
deliver
and
what
we
see
as
a
common
use
case
for
people
we
work
with
and
that's
partially,
why?
B
We,
we
went
and
did
the
development
on
the
operator
for
apache
couchdb
to
help
take
what
we
learned
in
the
public
cloud
and
allow
people
to
use
that
knowledge
through
the
operator
framework
on
openshift,
wherever
they
want
to
run
so
couch
TVs.
Strong
replication
protocol
allows
you
to,
for
example,
run
the
operator
for
apache
couchdb
on
an
on
premise:
open
ship
cluster
in
your
own
data
center.
B
Then
you
can
easily
just
kind
of
replicate,
bi-directionally
or
one-way
to
any
other
environment,
though
that
might
be
a
managed
database
as
a
Service
Cloud
and
on
the
IBM
cloud,
where
you
don't
really
have
to
worry
about
running
the
environment.
You
just
pay
for
provision
throughput
and
off
you
go,
and
then
you
can
also
only
build
your
apps
on
openshift
as
well
in
the
IBM
cloud,
because
we
have
a
managed
open
offering
there
as
well,
and
then
you
can
take
that
data.
B
That's
been
replicated
to
the
IBM
cloud
and
you
could
also
replicate
it
over
to
Azure.
Let's
say
you
have
a
footprint
Azure
where
you're
running
Red
Hat
open
shift,
and
you
want
to
use
the
operator
pattern
again.
So
basically,
what
we
want
to
do
is
you
know:
let
people
get
their
data
wherever
they
needed
to
be.
We
are
big
believers
in
kubernetes
at
IBM
and
an
open
shifts
and
those
help
dramatically
with
application
portability.
One
of
the
things
that
we've
seen
to
be
a
problem
for
our
customers
is
data
portability.
B
So
you
can
move
the
application
now
seamlessly
between
clouds,
but
it's
not
super
easy
to
move.
The
data
we
feel
strongly
that
the
patchy
couchdb
is
a
technology
is
well
suited
for
that,
given
its
Data
Sync
technology
and
the
fact
that
it's
open
source
under
the
Apache
foundation
and
if
you'd
like
to
use
it
as
a
managed
service
in
the
public
cloud,
it's
there
under
IBM
Cloudant.
But
if
you
want
our
expertise
in
the
operator,
pattern
feel
free
to
use
the
operator
for
apache
couchdb.
The
more
the
merrier
here.
B: So you might be running Apache CouchDB in the cloud. Let's say you have stores all over your continent, whether it's the US or Europe, and you need to replicate data from the cloud into those stores, or there are devices at those stores that need to get data from the cloud, and you want to make sure that you're able to do that
B
Even
when
there's
not
any
internet
connectivity
which
CouchDB
does
very
well,
it's
able
to
sort
of
understand
that
maybe
it's
lost
Internet
connectivity
and
when
it
comes
back
up,
stable
to
resume
replication
and
carry
on
like
there
hasn't
been
much.
That's
happened
to
the
the
network
infrastructure,
so
what
you
can
do
is
sort
of
run
it
in
the
public
cloud
or
anywhere.
B
You
want
and
replicate
that
data
to
your
stores,
where
you
might
also
be
running
apache
couchdb
and
not
those
stores
they
need
then
mabel,
if
you
want
replicate
them
to
like
point-of-sale
device,
is
whether
it's
like
maybe
an
Android
phone
or
an
iPad,
using
something
like
CouchDB
running
on
the
devices
which
speaks
the
apache
couchdb
replication
protocol.
So
if
there's
any
questions
on
that,
I
can
pause.
B
Hopefully
that
was
sort
of
a
useful
use
case
for
you
to
help
couch,
no
pun
intended
how
you
might
use
this
technology
and
with
that
I
will
pass
it
on
along
too
well
for
what
you're,
probably
all
here
for
the
demo
and
how
to
get
it
going
and
used
in
production.
So
well,
you
want
to
add
anything
or
take
it
away.
C: So, as Josh mentioned, one of the big things is that you interact with CouchDB entirely through an HTTP API, so exposing the database, load balancing the database instances, and reaching the database are all fairly straightforward, given that HTTP applications are a very common use case for Kubernetes. Its data durability story is another thing that makes it well suited, because CouchDB uses a copy-on-write storage engine, which is extremely fault-tolerant.
C: As long as you've got a POSIX-compatible storage back-end, it's very tolerant of database instances just being stopped abruptly, so it copes well with Kubernetes checking health and moving things around. That also massively simplifies the deployment, and the main potential weak points are then covered.
C: Okay, and I guess the first thing is: where can you get the operator from? The operator is published to two locations. If you're not using OpenShift at all, you can go to operatorhub.io and install the operator from there; that will work with vanilla upstream Kubernetes, or it will work with OpenShift 3, and you install it directly.
C: The default configuration of the operator is to only allow one CouchDB node per OpenShift worker or Kubernetes worker, so if I try to have a three-node CouchDB cluster, it will fail to deploy. You can override that using a dev-mode flag, for development use cases where you want to try a three-node cluster — particularly because some of the consistency properties can actually differ between single-node deployments and multi-node deployments.
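For reference, a custom resource roughly in this shape is what drives the operator. The group/version and spec field names below are illustrative placeholders, not the operator's documented schema — check the operator's own CRD for the real field names:

```yaml
# Illustrative sketch only — group/version and spec fields are placeholders.
apiVersion: couchdb.example.com/v1
kind: CouchDBCluster
metadata:
  name: my-couchdb
spec:
  size: 3   # three CouchDB nodes, spread across workers by anti-affinity
  # A dev-mode style flag (name is a placeholder) would relax the default
  # one-node-per-worker rule for development clusters:
  # devMode: true
```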
C
You
can
do
that,
but
in
production
the
default
is
that
we
will
try
and
spread
the
nodes
across
workers
using
an
T
affinity
rules.
They
create
that
and
that's
going
to
go
off
and
create
my
cache,
we
cluster
and
so
under
the
hood.
What
that's
going
to
do
is
create
a
resort
achill
deformation,
which
is
the
piece
that
does
the
real
work.
C
If
I
go
into
my
cache
to
be
cluster,
and
just
we
look
at
the
yeah
Mille
here,
and
we
see
that
it's
got
a
formation
generation
which
is
basically
the
generation
of
the
formation
that's
being
created,
and
this
observed
generation
will
get
updated
as
the
formation
is
expanded
into
its
its
sub
resources.
Though,
when
is
to
number
when,
when
observed
generation
equals
two,
then
it
means
that
the
formation
is
fully
representative
of
the
couchdb
cluster
resource.
C: It's also integrated with the OpenShift service certificates, so it exposes an HTTPS service on port 443 which uses those OpenShift certificates, and it will be able to be validated by any clients within the OpenShift cluster. The other neat thing with that is that it makes it very easy to expose to the outside world if you want to. So I can go in and create a route for CouchDB, point it at my service here on port 443, secure the route with re-encrypt termination, and create that.
C
That
script
created
near
route
that
I
can
use
directly
from
outside
of
the
openshift
cluster
I
launched
that
gonna
give
me
a
warning
because
I'm
running
code
ready
container,
so
it's
it's
using
a
self-signed
certificate.
But
I
can
go
past
that
and
it's
given
me
my
couch
DB
instance
the
base
URL
for
couch
DB.
Is
it's
just
adjacent
endpoint?
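The JSON greeting at that base URL can also be checked programmatically. A small sketch — the sample body below shows the typical shape of CouchDB's root response, though exact fields vary by version:

```python
import json

# Typical shape of CouchDB's root ("welcome") response:
sample = '{"couchdb": "Welcome", "version": "3.1.0"}'

def parse_welcome(body):
    """Return (greeting, version) from CouchDB's root endpoint response."""
    doc = json.loads(body)
    return doc.get("couchdb"), doc.get("version")

print(parse_welcome(sample))
# Against a live cluster you would GET https://<route-host>/ with
# urllib.request.urlopen and feed the response body to parse_welcome.
```
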
C
So
if
I
go
to
the
utils
endpoint,
that
will
give
me
the
dashboard
and
his
McCarthy
beak
can
log
in
with
the
credentials
which
we
specified
when
we
created
our
cache
to
be
cluster
resource
log
in
and
you
can
see,
it's
created
a
couple
of
distant
databases,
the
replicator
and
the
users
database
and
I'm
not
going
to
funny
kind
of
goes
the
dashboard.
But
we
can
see
from
here
that
the
Academy
says
it
setup
is
configured
for
producting
production
usage
as
a
clustered
node.
C: I've got an OpenShift instance deployed to IBM Cloud as well, which also runs a CouchDB — this is an OCP 4 cluster — so I can show you how to replicate between the two. I'll launch my IBM Cloud OpenShift URL, and I've got a route to a CouchDB instance somewhere here, under Networking, Routes.
C
And
I've
got
just
a
demo
database
here,
which
is,
which
is
a
movies
database.
Stuff
I
run
a
quick
query
here.
That's
noon
and
I
can
replicate
that
to
my
local
cache
dB.
So
in
my
local
cache,
TV
I
will
set
up
a
replication,
though
the
source
is
a
remote
database.
This
one
that
I
just
saw
said
movies
dinner.
C: And I'm going to set this up to run continuously, so this database will continuously sync, using HTTP over the internet. Now, I have to run this replication from my local deployment, because my local deployment, whilst it can pull data from the internet, is not exposed to the internet directly — I wouldn't be able to access this CouchDB instance from outside of my network here. I'll start the replication, and that should go off and pull all that data from the OpenShift-deployed CouchDB in IBM Cloud.
C: But here's the other nice part of replication: a CouchDB replication by its nature is one-directional, so this is just pulling data from that external CouchDB instance hosted in the cloud. If I want, I can also create a second replication to make it bi-directional. So I'll take that source, and this time I'm going to make the source my local movies database.
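Persistent replications like these are themselves just documents stored in the `_replicator` database. A minimal sketch of the pull/push pair being set up in the demo — the URLs and credentials are placeholders:

```python
import json

def replicator_doc(doc_id, source, target):
    """A persistent, continuous replication: a document in _replicator."""
    return {
        "_id": doc_id,
        "source": source,
        "target": target,
        "continuous": True,  # keep running and resume after interruptions
    }

REMOTE = "https://user:pass@cloud.example.com/movies"  # placeholder
LOCAL = "http://localhost:5984/movies"                 # placeholder

# Active-active: one replication pulls, the other pushes.
pull = replicator_doc("movies-pull", source=REMOTE, target=LOCAL)
push = replicator_doc("movies-push", source=LOCAL, target=REMOTE)

for doc in (pull, push):
    # Each of these would be PUT to http://localhost:5984/_replicator/<_id>.
    print(json.dumps(doc))
```
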
C
That's
continuous
as
well.they.
That's,
basically
how
you
would
configure
an
active,
active
deployment
across
multiple
clouds
or
multiple
regions
and
within
the
same
cloud,
though
bi-directional
replication,
one
that
pulls
data
and
one
that
pushes
data.
But
here
I've
got
I,
think
I've,
fully
populated
my
and
my
database
here.
But
if
I,
if
I
run
a
search
here
and
let's
say
day,
one
got
my
Anna
Kendrick
search
now.
C
D: I have a question — hi, this is Shanna. When you set it to continuously sync or replicate, what's the time interval in between? Or does it just recognize that there's new data and automatically replicate? What does continuous mean — like, you know, when we do incremental replications once per day — what's the context around that?
C
There's
there's
a
few
different
ways:
you
can
do
it
depending
on
what
your
requirements
are
on.
Sorry,
there's
a
few
different
ways:
you
can
do
it
depending
on
what
your
requirements
are
for
latency,
though
CouchDB
does
support
a
fully
continuous
replication,
where
it
basically
maintains
a
connection
over
HTTP.
C
You
can
also
another
way
of
doing
it
would
be
to
schedule
replications
and
so
you'd
have
to
do
that.
Programmatically
have
something
that
replicates
they
at
certain
times
of
the
day
or
or
just
periodically.
If
the
data
meant
changing
that
that
frequently,
but
that's
kind
of
a
way
to
do
it
with
less
load
on
the
cluster
and
but
probably
the
continuous
replication
is
the
most
common
way
to
do.
C
So
you
yeah,
the
replication,
isn't?
Isn't
it
transactional
in
that
sense
that
you
won't
casually
be
doesn't
have
have
transactions
anyway
in
that,
in
that
sense,
so
there
are
sort
of
there
are
patterns
you
can
use,
get
close
to
it,
but
ultimately
it
the
way
that
most
people
work
around
that
is
to
just
have
their
own
monitoring.
That
will
test
that.
You
know
you
can
have
a
document
that
gets
updated,
say
once
a
minute
on
one
side,
and
then
you
can
test
that
that
has
propagated
on
the
other
side
figure
out.
C: what the lag is, that kind of thing. But if you need a hard guarantee that the data is replicated across regions, that is difficult with CouchDB. The way I've seen it done before is, instead of using replication for that, to actually just have your application make writes to two different databases at the same time. What CouchDB does do for you — and what the CouchDB clusters deployed by the operator will do —
C
This
is
that
it
maintains
multiple
replicas
within
the
cluster,
though
you've
you've
already
got
three
rep
because
of
the
data
by
default.
Within
your
cache
to
be
trust
it
so,
if
you're
deploying
to
a
cloud
or
an
open
shift
cluster,
this
has
multiple
fault
zones,
then
your
cache
should
be
node
within
the
cluster
will
be
placed
across
those
fault
zones
and
it
will
attempt
to
distribute
the
replicas
across
them
as
well.
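The heartbeat-document monitoring Will describes a little earlier can be sketched like this. The dicts below stand in for two replicated CouchDB databases — in reality you would PUT and GET the document over HTTP on each side:

```python
import time

# Stand-ins for two replicated CouchDB databases.
source_db, target_db = {}, {}

def write_heartbeat(db, now=None):
    """Update the heartbeat document with the current timestamp."""
    db["_heartbeat"] = {"updated_at": now if now is not None else time.time()}

def replication_lag(db, now=None):
    """Seconds since the heartbeat document last propagated to this side."""
    beat = db.get("_heartbeat")
    if beat is None:
        return float("inf")  # heartbeat never arrived
    now = now if now is not None else time.time()
    return now - beat["updated_at"]

write_heartbeat(source_db, now=100.0)
target_db.update(source_db)  # pretend replication has run
print(replication_lag(target_db, now=103.5))  # → 3.5
```
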
A: So, Will, I have a quick question, if you're at the end of your demo. I hear that you are one of the people that wrote the operator itself for CouchDB, and I was wondering if you could talk a little bit about that process and any lessons that you learned — as you mentioned earlier, it was an opinionated version — and how that went.
C
Mean
really
the
couch
TV
operator
is
standing
on
the
shoulders
of
giants
and
turtley
it
that
we
have
been
using
operators
at
IBM
for
a
long
time
to
support
our
IBM
cloud.
Databases
offering
and
part
of
the
work
for
the
couch
to
be
operated
was
to
take
the
patterns
that
have
been
established
in
in
doing
that
work
and
bringing
them
into
a
framework
that
could
be
used
for
standalone.
C: So the operator essentially has what we call a recipe: a series of actions that it needs to perform to get to a state, and it can instruct the sidecar to perform each action on every replica of the database. The complexity of that depends on the database you're deploying, but we've used that pattern successfully for things like Redis and Postgres and MySQL and MongoDB, and now CouchDB as well.
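A toy illustration of the "recipe" idea described above: the operator holds an ordered list of actions and asks an agent on every replica to perform each one. This is a sketch of the pattern only, not the actual operator code — the action and replica names are made up:

```python
def run_recipe(recipe, replicas):
    """Apply each action of the recipe to every replica, in order."""
    log = []
    for action in recipe:  # e.g. "init-storage", "join-cluster"
        for replica in replicas:
            log.append(f"{replica}: {action}")  # stand-in for an RPC to the sidecar
    return log

steps = run_recipe(["init-storage", "join-cluster"],
                   ["couchdb-0", "couchdb-1", "couchdb-2"])
print(steps[0])  # → couchdb-0: init-storage
```
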