From YouTube: CNCF SIG Storage 2020-04-08
A: Hello, everyone. We're just waiting for a couple of folks to join, and then Derek and Flavio will be giving us a presentation of Pravega, so we'll wait for a couple more minutes; we'll probably start at five past the hour.
D: Can everyone see my slide? Okay, all right. So how much time would you like for this? I suppose you don't want me to use the whole time, is that right? So how much time do you think I have, just so that I calibrate?
D: Okay, all right, so I won't rush then. My name is Flavio, and I'll be talking about Pravega, as Alex just introduced. This is pretty much the same presentation I gave at the CNCF webinar yesterday. I did make a few changes: I removed some of the discussion on Flink and I added some new content, so hopefully for the many of you who have seen that presentation, some of it will still be new to you. Before I get into Pravega:
D: A bit about myself. I am a senior distinguished engineer at Dell, and I've been working on the Pravega project since towards the end of 2016, so I'll complete four years on the project towards the end of the year. My background is in distributed computing; I was in research for a number of years.
D: I was a researcher at Microsoft Research and earlier at Yahoo Research, and I've worked on a number of Apache projects. The most prominent ones, which I actually helped build from scratch, are ZooKeeper and BookKeeper, both in the ASF. I have some contact information on the slide in case you want to reach out to me or follow me on Twitter. So now let me move on and talk about motivation.
D: The main motivation for Pravega, and for many of the systems you hear about that deal with streams and stream processing in general, is the many sources of continuously generated data out there. I'm sure you have come across a good number of them, and that's not a surprise. Being more concrete about the sources: they can be events that end users generate. We can think of the traditional social networks, where users are posting events, or you can think of online shopping.
D: Your users are purchasing items, performing online transactions, or searching for products. All of those generate data that you might want to capture. But it's not only about end users; you can think of machines also being sources of continuously generated data. You have servers that are continuously producing telemetry that you want to capture so that you can spot problems early on in your fleet of servers. That would be one use case, but it's not only about servers either.
D: There are other types of machines that many users and applications care about: sensors in IoT, which holds the promise of connected cars, autonomous cars and so on. It's not quite a reality yet, but we are going in that direction, so hopefully it will become a reality eventually. All of those will be continuously generating data, and ingesting and processing that data could be interesting, or even a requirement, for a good number of users.
D: Now, if I put those comments into what I'm calling a landscape here, what we have on the left-hand side is various sources of data: end users, machines, drones, sensors, connected cars. All of these things produce a continuous flow of data that I want to capture and that I want to process.
D: So, visualization, where you represent raw data in ways that are more intuitive or easier to extract insights from. You can produce alerts: if we're talking about a fleet of servers and bad things are happening in your infrastructure, you want to know about it. You can generate insights about your users or your applications; for front-end applications you might want to know that there's a spike in traffic, or events of the like. And recommendations:
D: Again, if we're talking about end users: what other users are looking at, or users with a similar profile. And finally, just actionable analytics, where you present data or results that could be useful for any action you want to take: you go visit a customer and you want to know more about that customer, or anything related to the customer you're about to visit.
D: Other examples are looking at the health of your cattle, or inspecting airplanes between flights. And you want to do that not only by tailing the stream, by tailing the data, that is, by processing it as soon as the data is available for processing, but you might also want to go back and reprocess data. Maybe you found a bug, or you found an issue that you want to fix.
D: You want to spot those problems or defects as soon as possible, so you probably want to tail the stream and process the data as soon as you get it, but there might again be situations in which you want to go back in time, revisit the data and reprocess it. The same concept applies for such use cases.
D: Now, focusing on streams themselves: that's what I wanted to say about use cases, but now let's turn our attention to what these streams actually look like, in an abstract way. The natural way to think about streams is that they are sequences of events, or records, or messages, whatever concept matters to the application. It's a sequence of data items, and as they are produced:
D: We keep appending them to the sequence. But in reality it's not just one single flow. If you think of a lot of the scenarios I have mentioned, with servers and with sensors, you have a number of these parallel flows. So it's not one sequence; you can have many of those in parallel, and this parallelism gives us another degree of, say, realism. It's closer to what:
D: We'd expect to see in a real application. But it doesn't stop there, because we can also have fluctuations in the traffic we're observing. You have the parallelism, but the traffic in the parallel flows can grow and shrink over time, and that's because you have daily cycles, weekly cycles, maybe monthly or yearly cycles, but you can also have spikes on Black Friday or Christmas, events that get people to access your system more.
D: Now, it's also important to note that if we're talking about continuously generated data, it can be continuously generated for a very long time. We could be talking about years, if an application has been running for a long time, and so we might want to capture this stream from the beginning and keep it as a stream.
D: So there is the recent data that you're capturing and processing, and the older historical data that you may have already processed and might want to reprocess in the future; for that, you might even use a different system to store it. I'm going to call that the lumped way, in reference to what people call the Lambda architecture. But the reality is that it would be ideal for applications if they didn't have to make such a distinction, and they could just ingest the stream.
D: Now, about these streams: I have focused a lot so far on the write part, one sequence, parallelism, traffic fluctuations and ingestion, but a big part of it is also on the read side: making sure that an application that wants to process the stream is actually able to cope with it, to deal with the flow of data no matter in what form, whether it fluctuates, whether it's parallel, and so on.
D: Traditionally, storage systems have focused on objects and files, and we thought that, given the nature of a lot of the applications using them, it is more natural that they use streams as their core primitive. Now, using the concepts I have just explained, those streams have to be implemented in such a way that the system is able to accommodate an unbounded amount of data, that the stream is elastic, and that it is consistent.
D: We don't want to duplicate events or miss events, and applications should be able to both tail the stream and process data historically. I'm referring to this as a cloud-native way of exposing streams, because those are all concepts that we find very important when building cloud systems. So let me now move on to talk about Pravega specifically. Pravega builds on the concept of segments. A segment is a single sequence of bytes, and it's our storage unit: it is the unit of data that we store in our lowest storage layer.
D: A segment is an append-only sequence of bytes. It's bytes, not events; I have mentioned events before, and events appear at the API level, but internally we treat a segment as a sequence of bytes. To convert from events to bytes and back, we use serialization: we expect the application to provide a way of serializing the data on the way in and deserializing it on the way out. Segments enable us to have parallelism: I can have a number of segments in parallel.
A: Flavio, someone just asked a very good question here.

D: Yeah, of course.

A: Does Pravega just focus on the storage of the streams, or does it have any functionality to do some of that serialization? So, as an example, is it only doing the raw streams, or does it have some higher-level functionality, similar to what, say, a message queue might do?
D: You do have the ability of writing events. The Pravega client expects you to pass a serializer, and it uses the serializer on the way in and the deserializer on the way out. So the application writes and reads events, but internally we store them as bytes; Pravega internally does not understand the events, it only understands the bytes, okay?
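The contract Flavio describes, the application supplies the serializer and the store only ever sees opaque bytes, can be sketched as a toy Python model. This is an illustration only, not Pravega's actual API; all class and method names here are made up:

```python
import json

class JsonSerializer:
    """Toy serializer: the application supplies the to/from-bytes logic."""
    def serialize(self, event) -> bytes:
        return json.dumps(event).encode("utf-8")
    def deserialize(self, data: bytes):
        return json.loads(data.decode("utf-8"))

class ToySegment:
    """The store itself only ever sees opaque byte sequences."""
    def __init__(self):
        self._appends = []                  # append-only list of byte strings
    def append(self, data: bytes):
        self._appends.append(data)
    def read(self, index: int) -> bytes:
        return self._appends[index]

class ToyEventWriter:
    """Serializes on the way in; the segment stores only bytes."""
    def __init__(self, segment, serializer):
        self.segment, self.serializer = segment, serializer
    def write_event(self, event):
        self.segment.append(self.serializer.serialize(event))

class ToyEventReader:
    """Deserializes on the way out, advancing its own position."""
    def __init__(self, segment, serializer):
        self.segment, self.serializer = segment, serializer
        self.pos = 0
    def read_next(self):
        event = self.serializer.deserialize(self.segment.read(self.pos))
        self.pos += 1
        return event
```

The point of the sketch is the division of labor: the event abstraction exists only at the writer/reader layer, while the segment layer is byte-oriented.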
D: So the segments that we store internally enable us to have parallelism: you can have the various clients writing to those segments in parallel, and we use routing keys to map the appends of data to those segments. Now, segments also enable us to vary that degree of parallelism, which I'm going to call scaling. If I start from the head of the stream, which in this representation starts from the right: if I start with two segments, then at some point my traffic goes up.
D: I decide that I need a larger number of segments and I transition from two to five. At some later point, traffic goes down and I drop from five to three. That's viable with a Pravega stream, and, as I mentioned, we have this notion of scaling, and the scaling can be done in an automatic manner: when you configure a stream, you can say that you want auto-scaling enabled, and Pravega will track the traffic and do that scaling automatically for you.
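The shape of an auto-scaling rule like the one described can be sketched with a toy threshold policy. To be clear, this is a made-up rule for illustration, not Pravega's actual scaling policy or its parameter names:

```python
def scaling_decision(rate_bytes_per_s: float, target_rate: float,
                     hysteresis: float = 2.0) -> str:
    """Toy auto-scaling rule: split a segment whose observed rate is well
    above the per-segment target, flag ones well below it as merge
    candidates, and otherwise leave the segment alone. The hysteresis
    factor keeps the stream from flapping between split and merge."""
    if rate_bytes_per_s > hysteresis * target_rate:
        return "split"
    if rate_bytes_per_s < target_rate / hysteresis:
        return "merge-candidate"
    return "keep"
```

A real system would average rates over a window before deciding; the hysteresis band is the essential idea, so that daily fluctuations don't trigger constant splits and merges.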
D: Segments also allow us to implement transactions efficiently and effectively. When you start a transaction from an application, Pravega creates temporary segments for that transaction, and any append in the context of the transaction will go to those segments.
D: Now, if the transaction commits, then those segments are merged into the main segments of the stream, and in the case of an abort, we just discard those segments. So the data of a transaction does not interfere with the data in the primary segments of the stream until the transaction is committed, and if it's aborted, we simply discard those segments and the data in them. That's another benefit of having cheap, transient segments in the way we have in Pravega.
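The commit-by-merge, abort-by-discard mechanism can be modeled in a few lines of Python. This is a toy in-memory model of the idea, not Pravega's implementation, and the names are invented:

```python
class ToyStream:
    """Toy model of transactions via temporary segments."""
    def __init__(self):
        self.main_segment = []       # events visible to readers
        self.txn_segments = {}       # txn_id -> temporary segment

    def begin_txn(self, txn_id):
        self.txn_segments[txn_id] = []          # a fresh temporary segment

    def write_in_txn(self, txn_id, event):
        self.txn_segments[txn_id].append(event) # invisible to readers

    def commit(self, txn_id):
        # Merging the temporary segment into the main segment makes all of
        # the transaction's events visible at once.
        self.main_segment.extend(self.txn_segments.pop(txn_id))

    def abort(self, txn_id):
        self.txn_segments.pop(txn_id)           # discard segment and data
```

Because writes land in a separate segment, neither commit nor abort ever has to rewrite the primary segments; commit is a merge and abort is a delete, which is what makes the transactions cheap.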
D: We do that via a primitive that we call the state synchronizer, which we both expose in the API and use internally, and one of the things that you can do with it is build replicated state machines. Note that we are doing this using optimistic concurrency. So we have these two primitives that we expose: one is revisioned streams, and the other one is the state synchronizer, and the state synchronizer builds on revisioned streams.
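The optimistic-concurrency idea behind a revisioned stream can be sketched as a conditional append that succeeds only when the caller holds the current revision. Again, this is a minimal toy model of the concept, with invented names, not Pravega's state synchronizer API:

```python
class ToyRevisionedStream:
    """Toy model of optimistic concurrency on a revisioned stream: a
    conditional append succeeds only if the caller's revision is current."""
    def __init__(self):
        self.updates = []            # ordered log of state updates

    @property
    def revision(self) -> int:
        return len(self.updates)

    def conditional_append(self, expected_revision: int, update) -> bool:
        if expected_revision != self.revision:
            return False             # another client won the race
        self.updates.append(update)
        return True

def synchronized_update(stream, make_update):
    """The retry loop a synchronizer client runs: read the revision,
    compute an update, and retry if the conditional append fails."""
    while True:
        rev = stream.revision
        if stream.conditional_append(rev, make_update(rev)):
            return
```

Every client that replays the log of updates in order arrives at the same state, which is what makes this a building block for replicated state machines.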
D: So, for example, if you expect some traffic and you want to increase the degree of parallelism of your stream, you can go and manually do it before the events, before you have that spike. Now, illustrating how that looks for a particular stream: say that we start with a stream with a single segment, and what this graph is showing is the routing key space versus time. Remember that the routing keys are the elements we use to map events that are being appended to segments.
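The routing-key-space picture can be made concrete with a small sketch: hash each routing key to a point in [0, 1), and send the event to whichever segment currently owns the range covering that point. This is a simplified illustration of the idea, not Pravega's exact hashing scheme:

```python
import hashlib

def key_position(routing_key: str) -> float:
    """Hash a routing key to a point in [0, 1), the routing key space."""
    h = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def segment_for(routing_key: str, segments: dict) -> int:
    """segments maps segment id -> (low, high) key-space range; the event
    goes to whichever segment's range covers the key's position."""
    p = key_position(routing_key)
    for seg_id, (low, high) in segments.items():
        if low <= p < high:
            return seg_id
    raise ValueError("segment ranges must cover [0, 1)")
```

Splitting a segment just replaces one range with two narrower ones, and merging does the reverse, which is why scaling never changes how a key's position is computed, only which segment owns it.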
D: So in this case, if I'm starting with a single segment, all the keys in the routing key space are mapped to the same segment. Now say that I have two hot keys, and those two hot keys induce enough load that Pravega decides that it needs to split that one segment. Say, for the sake of example, that those routing keys are representing geographic locations.
D: For example, I have some taxi ride application, or some taxi ride data, and I'm looking at the geolocation of the taxi rides, where they're starting or where they're ending, and those two locations turn out to be hot. For some reason there is an event, or people are just congregating there; I guess these days that's probably not happening, but before all this, I suppose that was a common thing. So Pravega splits the segment into two new segments.
D: Segments two and three. Now, let's say that that was not enough, and the load induces enough flow that it requires a higher degree of parallelism, so now I split again into segments four and five. And at a later time, say that those keys go back to cold; Pravega now goes and merges four and five back into just segment six. So that would be the final state of the stream, at least for the time frame that we're looking at.
D: One interesting thing to observe from this is that a single routing key does not always map to the same segment; it can vary over time if you have auto-scaling enabled. If I pick, for example, key 0.9, then it started with segment 1, then it was mapped to segment 2, then 4, then 6. So at different points in time a particular routing key mapped to different segments, and even though this may sound like a complication for the application, the application does not need to observe these changes.
D: Now let me show you a graph that illustrates the changes to the segments over time, but now from a real run. We have this heat map that shows segments and the load on the segments; the color represents load, so light blue means lightly loaded and bright red means heavily loaded.
D: The white lines represent the separation between segments, and what we observe here, if we start from the left, is that we have a number of segments, and slowly those segments are merging, so we see fewer and fewer segments, down to a minimum that starts around 2:30 a.m. and goes all the way to around 5:30 a.m. We have only two segments during that period, and then from around 5:30 a.m. to 6:00 a.m. it starts to pick up.
D: We took half a day of that data and we just ran it through Pravega to observe those changes to the segments, and we see precisely that: it starts with some amount of traffic, it slowly drops down to a minimum, and then at some point early in the morning it starts picking up again. And if we put them both together, we can observe that effect, where the change in traffic causes the segments to merge initially and then split again when the traffic picks up.
D: Let me now move to talk about the Pravega architecture. As I mentioned before, we have event writers; that's one of our APIs. You have other APIs, like a byte stream API, but the event API is an important one, since many applications have the abstraction of events, or similar abstractions that can be mapped to events. So, using the event:
D: API, writers append to the segments of a Pravega stream, and we track the position of the writer so that, in the case of a disconnection followed by a reconnection, the writer is able to resume from the right position. Then, to consume the data, we have the notion of a reader group: we group event readers into groups, and we use the group to split the segments across readers to balance the load.
D: That gives me the ability of growing and shrinking the set, so that if I need more capacity for reads I can add readers, and if I don't need as much and I want to reclaim some resources, I can remove some readers as well. Reader groups operate even in the presence of scaling, so the rebalancing of the assignment of segments happens in the presence of scaling, and the readers are not aware of those changes; it's coordinated internally using a state synchronizer.
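The balancing a reader group performs can be sketched with a toy round-robin assignment of segments to readers; the real coordination is more involved (readers pick up work incrementally via the state synchronizer), so treat this as an invented illustration of the outcome, not the mechanism:

```python
def assign_segments(segments: list, readers: list) -> dict:
    """Toy reader-group balancing: spread segments round-robin across the
    readers, so adding or removing a reader redistributes the load."""
    assignment = {r: [] for r in readers}
    for i, seg in enumerate(sorted(segments)):
        assignment[readers[i % len(readers)]].append(seg)
    return assignment
```

Rerunning the assignment with a different reader list models what happens when the group grows or shrinks: every segment is still owned by exactly one reader, and no reader holds much more than its fair share.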
D: Now, the two main components of Pravega itself are the controller and the segment store. The controller manages the lifecycle of streams; it commands the segment store, for example, to create segments when it needs to, and it also manages the transactions that we run against streams. The segment store is responsible for managing the lifecycle of segments and for storing them; that's our underlying storage layer, so the segment store doesn't know anything about streams.
D: We use tiered storage. The first tier of storage we expect to be a low-latency option for small writes, so we have chosen to use Apache BookKeeper, and for the second tier, which we call the long-term storage tier, we have different options: we can configure it to use either file or object storage. In principle the system is agnostic to what is being used there, as long as we have a binding that connects it to such a system.
D: So, for example, we can use HDFS there, or we can use an NFS mount. We also use Apache ZooKeeper for coordinating the assignment of what we call segment containers. That's not to be confused with Linux containers; those are an abstraction we use to represent groups of segments, and that's the unit we use to assign work to the different segment store instances.
D: Let me talk a bit more about this. The controller is the one responsible for assigning segment containers to the different segment store instances. Each segment container is responsible for a group of segments, and to determine where a particular segment is going to land with respect to segment containers, we hash the name of the segment. In this particular example I'm showing the controller assigning three segment containers to each one of the segment store instances.
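The two mappings just described, segment name hashed into a container, and containers dealt out across segment store instances, can be sketched as a toy model. The hashing details here are invented for illustration, not Pravega's actual scheme:

```python
import hashlib

def container_for(segment_name: str, num_containers: int) -> int:
    """Segments land in a fixed set of containers by hashing their name,
    so the owning container never changes as instances come and go."""
    digest = hashlib.sha256(segment_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_containers

def map_containers(num_containers: int, stores: list) -> dict:
    """Toy controller logic: deal the containers out evenly across the
    currently live segment store instances."""
    mapping = {s: [] for s in stores}
    for c in range(num_containers):
        mapping[stores[c % len(stores)]].append(c)
    return mapping
```

Because the segment-to-container mapping is pure hashing, only the container-to-instance mapping has to change when an instance is added or removed, which keeps rebalancing cheap.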
D: In the case that, say, I add another segment store instance, what the controller will do is remap the segment containers: it will shut down some of the segment containers in the existing segment store instances and map them to the new one. That way we redistribute the load, taking into account new segment store instances. You can also remove instances; I'm illustrating adding one, but of course you can be removing them as well, and we redistribute.
D: Now let me talk a bit more about the write path. On the write path, the first thing that an event stream writer needs to do, if it wants to append data, is to determine which segment store hosts the segment it wants to append to, based on the segment container. It finds that information from the controller, and at that point it connects to the segment store and starts appending the bytes.
D: We persist the data in a journal, so it's guaranteed that it is on disk by the time the event stream writer receives the acknowledgment, and the data is propagated to long-term storage, Tier 2, asynchronously. As I mentioned before, we have a few options there: HDFS, NFS, all of those built on file or objects. For the read path, we have a similar structure: the event stream readers get information about segments from the controller, and they read bytes from the corresponding segment store.
D: The segment store responds with data from the cache if it's a cache hit; if a reader is tailing the stream data, it will return immediately. If not, it needs to read the data from Tier 2. The data in BookKeeper is not used for reads at this point; it is only used for recovery purposes. So if a segment store instance crashes and it needs to recover the data for a particular segment container, or set of segment containers, then it will use the data in the Apache BookKeeper ledgers.
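The read path just described, cache for tailing reads, long-term storage for historical reads, and the Tier-1 journal touched only during recovery, can be modeled with a toy class. This is an invented simplification, not the segment store's real design:

```python
class ToySegmentStore:
    """Toy read path: serve tailing reads from cache, fall back to
    long-term storage on a miss; the Tier-1 journal exists only so that
    un-flushed data survives a crash."""
    def __init__(self):
        self.cache = {}       # offset -> bytes, recently appended data
        self.long_term = {}   # offset -> bytes, asynchronously flushed
        self.journal = []     # Tier-1 log, replayed only after a crash

    def append(self, offset: int, data: bytes):
        self.journal.append((offset, data))  # durable before we ack
        self.cache[offset] = data

    def flush_to_long_term(self):
        self.long_term.update(self.cache)    # async in the real system

    def read(self, offset: int) -> bytes:
        if offset in self.cache:             # cache hit: tailing reader
            return self.cache[offset]
        return self.long_term[offset]        # miss: historical read
```

Note that `read` never consults the journal; that matches the point Flavio makes, that BookKeeper data is only replayed when an instance has to recover a segment container.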
A: So, Flavio, could I just ask, there's a very good question about Apache BookKeeper: in effect, is Apache BookKeeper kind of your transaction log, or write-ahead log, kind of thing?

D: Exactly.
D: So, to connect Pravega to applications, we typically use connectors. Let me put it this way: if you're talking about an application you're building from scratch, then of course it can just go and use the clients directly. But you have, say, generic frameworks that you want to connect to Pravega, and for those you want to build sink and source connectors. The sink connector will allow you to output data to a Pravega stream.
D: The source connector will allow you to read data from a Pravega stream. One example is the Flink connectors that we have developed; the reference to the repository is at the bottom of the slide, but that's the general concept for connectors that we can use for systems like Flink or other stream processors. As for existing connectors that we have implemented or are aware of: we have one for Apache Flink, which I have just mentioned, and then we have one for Hadoop.
D: I have skipped a good amount of slides that I had on Flink; if anyone is interested in talking more about this, I have backup slides about it, but I'll skip it for now, move on, and talk about Pravega on Kubernetes. We have implemented operators. An operator is a custom controller for managing the lifecycle of an application; that would be a general definition. We have used that in a few places, and our operators do a number of things.
D: They worry about deployment and configuration, so we're talking about pod disruption budgets, pod affinity and anti-affinity rules, validating and making sure that we are satisfying those, assigning default values to variables: all of those things it takes care of. Also scaling, in the case of the Pravega operator. It is responsible for upgrades, so upgrading from one version to another; that would be the responsibility of the operator, to take care of it, and also monitoring the health of the individual components.
D: We have implemented three different operators for the various parts of the system: we have the Pravega operator, which covers the controller and the segment store, and then we have the BookKeeper operator and the ZooKeeper operator. All three are open source at the moment. What I wanted to do is quickly show a cluster that we have deployed. It's running a longevity workload, and this particular longevity workload is characterized by a small set of routing keys.
D: Okay, so I have set this up before the call, because there are a number of steps I need to go through to get it ready, so I wanted to have it running already; I won't be running a lot of the other commands, also in the interest of time. But let's see.
D: We have the Pravega operator running, then we have a single controller and a single segment store, and we have five ZooKeeper servers running here, and the ZooKeeper operator. This version of the operator was incorporating the management of BookKeeper as well, so I don't have the BookKeeper operator running separately here, but the operator I have described is in a separate repository; it's available.
D: So this is Grafana; this is the graph on a dashboard for that cluster. Let me start with the operational dashboard. This is the write traffic we're imposing: we are putting between 6 and 8... hold on a second, I'm not seeing the pane quite yet. Okay, there you go.
D: Right, so as you can see in this graph, segment write bytes per second, we're putting a load of between 6 and 8 megabytes per second consistently. As I mentioned, this is a longevity test that we run continuously, and one of the interesting things that we can see here is the variation of the number of segments. Remember that the distribution of load across keys is skewed, so I am sending to a small set of keys.
D: All right, so that's what I'm going to show quickly, just to see that it is running, and some of these graphs. Let me go back to the presentation; if you have any questions about this, I can come back to it and show you some more.
D: Alright, so that was a quick view of a Pravega cluster live. Now, to wrap up: the main motivation for us to pursue a system to store streams was the observation that we have a very good number of applications out there that have sources that are continuously producing data, and we felt that a lot of those applications would rather map the abstraction that they have of these sources to streams, rather than to files or objects, which are the traditional primitives you find in storage systems.
D: We have put the effort into making these streams unbounded, elastic and consistent from a storage perspective, and we have also done the work of connecting them to stream processors so that we can extract value out of the data. It's not only about ingesting and storing, but also being able to derive value out of the data. I gave one example, which is effectively Flink; I mentioned some others, but Flink is the main one that we have been working with.
D: The project is open source under the Apache license. At the moment it's hosted on GitHub, and we are looking for a home for incubation; we are looking at the best options for incubation at this time. And before I close, just a few comments for anyone who could be interested in starting with Pravega; I want to give a few pointers. Check the website; there's a good amount of documentation there.
D: You even have videos and blog posts, in addition to the project documentation. Check the organization on GitHub and the main repository; there are a number of repositories there. Pravega is the main one with respect to what I presented today, but you also have the connectors I have mentioned. Then you can run Pravega standalone locally if you want to do some quick testing or even some development, and you can run that along with the Pravega samples.
D: We have a number of samples in the Pravega samples repository, so I suggest that you go look at the repository and the instructions there, and throughout that process feel free to give feedback, and even contribute if you see anything that you'd be interested in changing or improving. And with that I conclude my presentation; this last slide gives a good number of references for all the things I have mentioned during the presentation. Thank you.
A: Thank you for that presentation, that was very informative. It's interesting; it's, I guess, slightly different from some of those sort of typical storage projects that we've discussed so far, so this is a very interesting alternative. You mentioned you were looking for a place to donate the project to; are you familiar with the sort of project graduation structure in the CNCF?
D: A little bit; I'm not very familiar. I'm more familiar with the ASF way of doing things; I'm not as familiar with the CNCF, or even the Linux Foundation in general. I mean, we have spoken with people across the CNCF, or rather the Linux Foundation and the CNCF, but if there's anything you want to make sure of, and if you want to give any information about that, it would definitely be useful, I'm sure.
A: Perfect, that's really helpful. So just to quickly summarize, there are sort of three levels of projects. The starting level is a sandbox project, and this has a relatively low bar to entry, so it's kind of good if you're trying to help build the community, address maybe IP-policy-related changes, or help grow the number of maintainers of the project, for example.
A: The next level up is the incubator level, and that has a higher bar, and there are a number of different criteria. And then, finally, there are the graduated projects, but graduation then requires additional things, like security audits, for example.
A: It's tied to project maturity, so the incubation level requires a number of criteria, like, for example, having maintainers from different organizations and end users, and having the project being used in production, those sorts of things. So if you can't get some of those references, or maybe the project is very focused on just one organization, that might be an opportunity to go in at sandbox level, to kind of grow the community further.
C: And, yes, sorry, my audio was breaking up; I don't know if it's me or everyone else as well. Thanks for presenting; I think it's definitely, as Alex said, a very different project than we're used to looking at, so I think it'd be a good asset to add to our portfolio in the CNCF.
B:
D: Excellent question. I would say that I personally haven't entirely made up my mind. Except for being a board director, I have been everything in the ASF: I am a committer on projects, I am part of PMCs, I am an Apache member, and I have been part of incubators, so I know their stuff pretty well. And I have heard great things about the Linux Foundation, and the CNCF in particular.
D: I have been very impressed with the infrastructure, the group of people, and the projects, and so I decided to explore; I thought it'd be a good idea to explore. So it's a strong contender on my list. It's looking pretty solid: all the work that various people have been doing around the projects, and again the infrastructure; I think that all counts and helps projects to be successful.
C: I think we should pass on the sandbox templates, at least to start; it helps answer some of these questions, like the ones Louis was answering, by structuring it in a way that would help you understand which level of acceptance you think the project would go into. So I'll go ahead and forward that on to you, Flavio, and then I think we'll go from there once it's structured in that way.
B:
C: That's not accurate. Projects generally, I mean, they don't come in as graduated, I think that's kind of a given, but they can start as incubation, provided they have a lot of support within the community, and sandbox is meant to be that springboard. And so, without really understanding it in terms of all the other different aspects, I don't think I could make a recommendation one way or the other. Okay.