From YouTube: Ceph Month 2021: Ceph Project Update
Description
By Sage Weil and Josh Durgin
Slides: https://www.slideshare.net/Inktank_Ceph/202106-ceph-project-update/Inktank_Ceph/202106-ceph-project-update
Ceph month schedule: https://pad.ceph.com/p/ceph-month-june-2021
Sage: All right, welcome everyone to Ceph Month 2021. Josh and I are going to give a quick update on the Ceph project, and then we'll be moving on to a few other talks. I'm going to start with a little bit of background that will be familiar to anyone who's been using Ceph but should be helpful for those who are new; talk a little bit about Ceph Month; give an update on the Ceph Foundation and what we've been up to; give a quick update on the Sepia lab, where we do all our upstream testing and release work; talk a little bit about telemetry; and then talk a bit about the future and what we see coming next. So, first, a little about Ceph.
These are all nice words, but they're a little bit meaningless on their own, so in more concrete terms: Ceph is open source software, and it's software-defined, meaning it doesn't rely on any particular piece of hardware because you can run it on any sort of commodity parts: commodity servers, IP networks, hard drives, SSDs, VMs, and so on. And it's unified, in that a single cluster can serve object, block and file workloads. Ceph is free and open source software; that means you have the freedom to use the software, free as in beer.
A
You
have
the
freedom
to
introspect,
modify
and
share
free
is
in
speech,
so
you
get
source
code
and
you
can
modify
it
as
you
see
fit.
That
gives
you
the
freedom
from
lock-in
to
a
particular
vendor
and
gives
you
the
freedom
to
innovate
by
extending
and
improving
the
platform.
Ceph is also reliable. We believe Ceph should be a reliable storage system built out of unreliable components, with no single point of failure.
We provide data durability via replication or erasure coding, and there are no interruptions of service from rolling upgrades or from online expansion or contraction of the cluster. Ceph is always designed to favor consistency and correctness over performance, although obviously we try to get both.
Ceph is also designed to be scalable. It's an elastic storage infrastructure, which means the cluster may grow or shrink over time as your needs change or as hardware is refreshed.
A
That
means
you
can
add
a
remove
hardware
while
the
system
is
online
and
under
load,
and
we
allow
seth
to
scale
up
with
bigger
and
faster
hardware,
we
scale
out
by
adding
more
hardware,
more
systems
and
more
racks
to
a
single
cluster
to
provide
more
capacity
and
performance,
and
we
also
allow
you
to
federate
multiple
clusters
across
multiple
sites,
with
asynchronous
replication
features
and
disaster
recovery
capabilities
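As a sketch of what online growth and shrinkage looks like in practice, assuming a cephadm-managed cluster and the ceph CLI, with hypothetical host names and addresses:

    import subprocess

    def ceph(*args: str) -> None:
        """Run a ceph CLI command, raising on failure."""
        subprocess.run(["ceph", *args], check=True)

    # Scale out: enroll a new host and let the orchestrator create OSDs
    # on its unused devices; data rebalances while the cluster stays online.
    ceph("orch", "host", "add", "node5", "10.0.0.5")
    ceph("orch", "apply", "osd", "--all-available-devices")

    # Contract: drain a host's daemons before removing it from the cluster.
    ceph("orch", "host", "drain", "node5")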
On top of all this, Ceph is a unified system. That means we can provide object, block and file interfaces from the same storage cluster, all built on the RADOS component, which handles all the replication and data distribution in the system.
We release Ceph every year in March, so we have a stable named release every 12 months, and we provide backports (bug fixes and security updates) for two releases. That means a release reaches its end of life shortly after the release that comes two years later; for example, Nautilus is reaching end of life right about now, now that Pacific is out, and we just did a security and bug fix release a couple of weeks ago.
We provide Debian and RPM release packages on the upstream Ceph website, as well as container images, and recently we've been working on some process improvements in this area: we're getting better at doing security hot fixes quickly and getting them out simultaneously for multiple stable branches, and we're working on a more regular cadence for those stable releases. So, a little bit about Ceph Month, which is where you are right now, June 2021.
We did not plan an in-person event this year either, obviously, because the pandemic is still continuing, but we wanted to give people an update on the project and provide a venue. We didn't want a full-blown virtual conference that took all day and had you sitting in front of Zoom for extended periods of time, so the idea is to spread it out over several weeks in bite-sized chunks.
A couple of hours at a time, with a couple of talks, to bring people in and try to make it more interactive, so we can have more questions and interactivity with the speakers, and also some freeform discussion so users can network and share experiences.
The schedule pad should be in the chat; feel free to add questions there. At the bottom of each particular day there's also an area to add any topics you want to discuss after the presentations.
After these presentations we'll have all the developers online, and other users and so on, and we can talk about whatever you want. This first week we have this project update, and we'll talk about RADOS and the new Windows support we're working on. Next week will be RADOS Gateway and a number of performance talks. The week after that is RBD, the dashboard and more lightning talks, and then the final week we'll have CephFS and cephadm.
A
So
please
open
up
the
pad
and
add
whatever
you'd
like
next
year.
We
would
like
to
do
another
cephalocon
and
assuming
we
do,
it
would
be
in
march
2022,
but
we
don't
have
any
definite
plans
yet,
but
we
don't
know
where
we
would
hold
it.
We
could
go
back
to
doing
it
in
seoul,
which
is
where
we'd
plan
to
do
it
two
years
ago
or
one
year
ago,
but
didn't
we
could
also
go
for
something
in
north
america
after
seoul
was
canceled.
So if you have suggestions about which region and so on, please let us know. We hope to have an in-person event, but it's an open question whether we want to invest in making it more of a hybrid event where you can also participate remotely, or whether we should just focus on a more traditional conference. We're interested in your feedback, so please let us know. Next, a bit of an update on the Ceph Foundation.
We now have a number of premier members, twelve of them, with a couple of new entrants: Bloomberg joined in the last year. We have a number of general members as well; new members there are Cloudbase and Vexxhost. And then there are a number of associate members too.
These are educational institutions, government institutions, nonprofits, and so on. As for current projects the foundation is focused on: one that we started a couple of years ago is an effort to improve the Ceph documentation, and we have a full-time technical writer, Zac Dover, who has been on contract for a couple of years now. Hopefully you've noticed some improvements in the Ceph docs as a result of his efforts.
We also have a website update that's been in progress for a while now, the last several months, maybe even a year. That effort has been spearheaded by SoftIron, one of our premier members, and we're very grateful for their efforts. We're shifting away from WordPress, so it's going to be a static site generated from GitHub.
A
You
can
go
ahead
and
take
a
little
source
code
there
and
there's
a
there's,
a
sort
of
a
tentative
site,
whatever
development
site,
we're
hoping
to
launch
this
within
the
next
month
or
so
so
stay
tuned.
There's also a training effort. JC Lopez, who was responsible for a lot of the training material way back in the Inktank days, is helping put together a lot of that, and we're hoping to get an initial set of Ceph 101-type courses online. These could be hosted either on the edX platform or on the Linux Foundation's own platform.
A
They
can
support
both
sort
of
cell
flat
or
instructor
led
courses.
So
there's
the
potential
to
have
more
advanced
courses
later,
even
like
paid
certifications
and
so
on.
We
don't
really
have
any
specific
plans
here.
We're
focused
right
now
on
just
having
some
initial
free
material
there
to
spread
the
level
of
suffix
expertise
within
the
community,
but
we're
excited
about
pursuing
this
new
direction.
Sorry, my headset always turns off. We're drastically reducing the amount of money the foundation has been spending with public cloud providers to host a lot of the infrastructure, by buying hardware to put in the Sepia lab instead; it's much more cost effective. At this point we're basically only putting public-facing content in OVH, so things like the tracker, the Ceph website and download.ceph.com.
A
So
we're
pretty
happy
about
that.
We're
saving
a
bunch
of
money
there
and
buying
lab
hardware.
So
that's
hardware
for
doing
builds
for
doing
ci
tests,
all
the
github
checks
and
stuff
that
we
do
and
we're
also
expanding
the
stuff
cluster.
That's
in
the
lab
that
we
use
to
store
all
of
our
stuff
results,
and
so
on.
Windows
support
is
coming
along
we're
contracting
with
cloud
base
to
do
a
lot
of
this
initial
development
and
to
set
up
all
the
upstream
ci
and
testing
and
stuff
to
do
that.
We're very excited about that; we're going to hear more about it from Alessandro in about 45 minutes. There's also a new marketing committee. Right now SoftIron, Canonical and Red Hat are participating, but we're eager to have anybody else join in. The idea here is to coordinate upstream, project-focused marketing activity, press, and so on for the community.
So we're excited about that. And Josh, I can turn it over to you if you want to talk about new stuff in the lab.

Josh: Sure.
I had a Google Summer of Code student last summer who changed the scheduling model in the lab so that we could actually run larger-scale tests by locking many machines at once, without competition for lab resources. There's more work in progress on this by another developer, Aishwarya, who is moving the current in-memory queue into a Postgres database, which will allow much more intelligent scheduling of our test lab.
That should help us use the lab much more efficiently and get a lot more interesting testing done, especially at larger scales. Another area we want to focus on going forward is downgrade testing. This would mean testing downgrades within a given major release, so from one minor point release to the previous one, or perhaps the previous few.
Another major aspect of the future is different architectures. A big one is Arm: Ampere donated a bunch of hardware to the lab, and we're now building packages and container images for CentOS 8 and Ubuntu 20.04 (Focal) there.
We had to address a number of issues in podman and Quay to support multiple architectures, but things are improving there, and we've already seen a number of users try this out on Arm hardware, as small as Raspberry Pis for home clusters. We're excited to see where Arm goes in the future.
In terms of broader usage, one topic we wanted to talk about today is telemetry. Ceph has had a telemetry module that allows users to opt in to sharing some of their cluster data with the Ceph developers, and there's a public dashboard that displays a lot of information, such as which versions are running. Currently we have over a thousand clusters reporting telemetry, with, as you can see, over 300 petabytes of storage.
The orange here is Octopus (these are the different versions) and the little bit of red there is Pacific. Pacific had just been released around that time; we see its line starting to increase while Octopus levels off, so users are upgrading to Pacific now that it's out, which is great to see.
Telemetry is all opt-in; it requires you to explicitly acknowledge the license for it, and there are a number of different channels you can choose to opt in or out of. There's the basic channel, which has basic metadata about the cluster, like how big it is and what version it is. There's crash metadata, which really helps us developers understand what kinds of problems people are actually running into in real life, and whether something we're seeing maybe once out of 100 runs in our tests is a very common problem in the field or really limited to our hardware.
We've already seen some great use of this: when we had a buggy version of tcmalloc in the container image that was causing a bunch of crashes in RGW, we saw a bunch of those show up in the crash telemetry.
There's also a device channel for capturing metrics about which kinds of disks are in use, as well as health data like SMART from those disks. This helps us improve our disk failure prediction model, which tries to give you an idea of whether your device is likely to fail soon or not.
There's an optional identity channel that lets you provide information like an email address, in case developers would like to contact you about your cluster. And in the future we're working on a performance channel, to add more granular information about how the cluster is used: what kinds of read-versus-write patterns we see, what kinds of rates there are, and other information like that.
That will help us understand how well Ceph is working at different levels with real workloads, and what kinds of real workloads we see in real life. This will help developers optimize Ceph, and could potentially make tuning for users more of a closed-loop feedback process.
You can grab the JSON report that includes all this information and inspect it yourself, to see precisely what is being shared and what isn't.
On the next slide we have a bit of information from the most recent Ceph user survey, asking whether folks enable telemetry. About a quarter of the respondents have it enabled on every cluster, another 15% or so have it enabled on some clusters, and around 60% don't have it enabled yet. There's a wide range of reasons for this.
The most common answer is that the cluster is on a protected network that doesn't have access to the internet. Fortunately, we have a solution for this: you can configure a proxy (such as a SOCKS proxy) for the telemetry module and have it report through that, so that one's pretty easy to solve.
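A sketch of that workaround; the module's documented knob is a proxy URL on the mgr, and the host and port here are hypothetical:

    import subprocess

    # Route telemetry uploads through a proxy reachable from the cluster.
    subprocess.run(["ceph", "config", "set", "mgr",
                    "mgr/telemetry/proxy", "https://10.0.0.1:8080"],
                   check=True)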
The second most common answer is "I haven't gotten around to it yet."
I'd also like to talk a little bit about where we see Ceph going in the future in general. I think there are three major aspects we want to focus on. First off, of course, is basic reliability, which is the baseline for a storage system: you expect it to be stable and not lose your data. Secondly, there's the out-of-the-box user experience.
That means helping you set up more end-user protocols that would otherwise take more effort than with other systems. So you can really easily set up a small cluster as a kind of NAS replacement, and we're working on turnkey support for NFS, object storage and Samba coming in Quincy.
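A rough sketch of what that turnkey flow looks like with the orchestrator-backed `ceph nfs` commands; the names are hypothetical and the exact argument order has varied between releases, so treat this as illustrative and check `ceph nfs export create cephfs -h` on your version:

    import subprocess

    def ceph(*args: str) -> None:
        """Run a ceph CLI command, raising on failure."""
        subprocess.run(["ceph", *args], check=True)

    # Deploy an NFS-Ganesha service named "nfs1" via the orchestrator ...
    ceph("nfs", "cluster", "create", "nfs1")
    # ... then export a CephFS filesystem "myfs" at the pseudo-path /myfs.
    ceph("nfs", "export", "create", "cephfs", "myfs", "nfs1", "/myfs")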
There are also a number of new classes of devices coming down the pipe, at varying stages of development; some are already available for purchase today, some are still in standardization phases. We're looking at a number of these, especially in the context of the Crimson rework of the OSD, to take advantage of these incredibly fast storage devices, which have really shifted the bottleneck over the last five to ten years from the disk to the CPU.
We'll see a lot more discussion about what exactly that means for Crimson later in Ceph Month, but for now, be aware that this is where we're dealing with all these different kinds of new hardware and new devices, and Crimson is being designed in a way that's flexible and composable enough to work with any of them.
Another major emerging technology is NVMe over Fabrics. The NVMe protocol has been around for a long time as a local storage protocol for devices, especially flash devices; more recently, the fabrics side of it has enabled access to remote devices as well, potentially without involving the CPU on the other side.
There's a project underway already to look at tackling this on the client side, presenting an RBD device over NVMe over Fabrics as an alternative to iSCSI. This could be used for something like bare metal as a service for cloud infrastructure, where you're exposing a disk directly to a host and it looks like a plain NVMe disk, so almost any kind of host could talk to it directly.
There's some new hardware out there, like SmartNIC-style hardware such as NVIDIA's BlueField chip, that allows you to do this without involving the host CPU at all.
There's also some discussion of further future plans for Crimson, a kind of second phase, which would involve changing the replication style of the OSD to avoid involving the CPUs on the replicas in the replication path at all. That would be a massive decrease in CPU requirements: if you have three-times replication, cutting out the CPU on those two replicas would be a two-thirds reduction, which would be huge. That's a very large project.
Finally, there's a large ecosystem of software around Ceph that Ceph integrates with. Some of these things are maturing, like Rook in the Kubernetes space. There's also more support for Knative via RGW, especially with the bucket notification APIs, and things like S3 Select, which let you work with all kinds of software, like Spark, that treat a very large scale object store almost like a database. There's also a lot of effort around data movement in general and multi-site, with interoperability between a private cloud and a public cloud in particular.
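As a sketch of the S3 Select idea against RGW, using boto3; the endpoint, bucket, object and query here are hypothetical, and RGW implements a subset of the S3 Select SQL surface:

    import boto3

    # Point the standard S3 client at an RGW endpoint (hypothetical URL).
    s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8000")

    # Run a SQL filter server-side instead of downloading the whole object.
    resp = s3.select_object_content(
        Bucket="logs",
        Key="2021/06.csv",
        ExpressionType="SQL",
        Expression="SELECT s._1, s._3 FROM s3object s WHERE s._3 > '500'",
        InputSerialization={"CSV": {}},
        OutputSerialization={"CSV": {}},
    )
    # The response payload is an event stream; print the matching records.
    for event in resp["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())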
B
This
will
probably
be
described
a
bit
more
in
the
rgw
talks
later
this
month,
but
some
of
it
involves
some
kind
of
replication
between
a
private
data
center
and
a
public
cloud,
as
well
as
within
a
public
cloud
and
being
able
to
choose
different
backends
for
each
of
those
there's.
Also
some
newer
emerging
de
facto
standards
in
in
the
industry
like
apache
aero
4k.
These are general data formats used by a large number of tools, especially in the machine learning and data science space. They store data in a very efficient format for queries, and they enable queries to be saved as something like what you might call a materialized view in the relational database world, so you can have multiple steps in your pipeline that create these views of the data and expose them to subsequent steps, without any of the data, or any computation, leaving the cluster.
B
So,
what's
coming
after
quincy?
Well,
we
haven't
figured
out
a
name
exactly
yet,
but
there
we
have
an
ether
pad
of
the
url
there
and
the
current
leader
is
seth
rogen.
So
please
go
and
vote
because
I'm
not
sure
that
that's
the
last
name
we
can
come
up
with
here.
That's apparently the name of the squid at the Vancouver Aquarium.
Yeah, I think, well, I know I will discuss Crimson at least briefly in a minute, and we want to schedule a larger Crimson update later in the month.
Sage: Right, I put the links in the chat for the pad with our names. Usually I go through Wikipedia or something similar and try to find all the common names for cephalopod species that start with R. I haven't.