From YouTube: Running Ceph on Flashcache - Paweł Sadowski
Description
Cephalocon APAC 2018
March 22-23, 2018 - Beijing, China
Paweł Sadowski, OVH DevOps
The first case is cache pollution. Cache pollution happens when you put data into the cache that you don't need. For example, when deep scrub is running, it reads all of your data and fills the cache with data that won't be needed again until the next deep scrub. This is the problem with most caching solutions: you have to tell the caching mechanism that this data won't be needed soon.
So there's no need to cache it. Deep scrub requires large reads, and those large reads destroy the hot data set in the cache. The same story happens during backups, and also during recovery. During recovery not only reads but also writes happen, so they also push the hot data set out of the cache. So what can we do to avoid that situation?
All the layers on top of the I/O path would have to support it: it has to be supported by the OSD, it has to be supported by the caching device, it has to be supported by the kernel, so it's hard to make that change. So what can we do? Flashcache has a mechanism that is called PID blacklisting, so you can disable caching for I/O from a particular process.
It has some advantages: if you have a backup tool that is running locally on the machine, you can just pass its process ID to tell Flashcache that I/O from this process shouldn't be cached. But it's not that easy to find the thread ID that is actively doing the work, and PID blacklisting works only with direct I/O. So when you have writes that were delayed and are later flushed from the system buffers, they come from the kernel, so you don't know the process ID. So we couldn't use that solution.
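For illustration, PID blacklisting is driven through ioctls on the cache device. Below is a minimal Python sketch assuming the command layout from flashcache's flashcache_ioctl.h (ioctl type 0xfe, FLASHCACHEADDBLACKLIST as command 200 taking a pid_t); verify those values against the header shipped with your flashcache build before relying on this.

```python
import fcntl
import os
import struct

# Assumption: FLASHCACHEADDBLACKLIST = _IOW(0xfe, 200, pid_t), per the
# flashcache_ioctl.h layout assumed here; check your flashcache source.
def _IOW(ioc_type: int, nr: int, size: int) -> int:
    # Linux _IOW encoding: direction=write (1) in bits 30-31, payload
    # size in bits 16-29, type in bits 8-15, command number in bits 0-7.
    return (1 << 30) | (size << 16) | (ioc_type << 8) | nr

FLASHCACHEADDBLACKLIST = _IOW(0xFE, 200, struct.calcsize("i"))

def blacklist_pid(cache_dev: str, pid: int) -> None:
    """Ask flashcache not to cache direct I/O issued by this PID."""
    fd = os.open(cache_dev, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FLASHCACHEADDBLACKLIST, struct.pack("i", pid))
    finally:
        os.close(fd)

# Example: exempt a locally running backup process from caching.
# blacklist_pid("/dev/mapper/cachedev", 12345)
```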
Another mechanism that Flashcache supports is the skip-sequential threshold. When you do large reads or large writes in sequence, Flashcache can detect such a situation and can skip caching of such I/O. All of the operations that I mentioned before, like deep scrub, recovery, and backup, are usually big sequential operations, so it's easy to detect them and skip caching for such I/O.
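As a sketch of what that tuning looks like: Flashcache exposes the threshold as a per-cache-device sysctl, skip_seq_thresh_kb. The cache-device name and the 1024 KB value below are assumptions; list the real knobs with `sysctl -a | grep flashcache` on your host.

```python
from pathlib import Path

def set_skip_seq_thresh(cache_name: str, thresh_kb: int) -> None:
    """Sequential runs larger than thresh_kb bypass the cache (0 caches all)."""
    # Same knob as `sysctl dev.flashcache.<cache_name>.skip_seq_thresh_kb`,
    # reached here through procfs; needs root.
    knob = Path(f"/proc/sys/dev/flashcache/{cache_name}/skip_seq_thresh_kb")
    knob.write_text(f"{thresh_kb}\n")

# Example: skip caching for sequential runs larger than 1 MB, which
# catches the big deep-scrub, backup, and recovery I/O.
set_skip_seq_thresh("sdb+nvme0n1", 1024)
```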
The second story is about matching the block size on the storage layers. When we were tracking latency on our clusters, we noticed that client latency was becoming higher and higher for an unknown reason. We found out that the client was doing some snapshots and forgot to remove those snapshots, and multiple snapshots on the same image were causing large sets of data to be copied.
C
Those
fences
such
effect
that
each
right
to
the
new
right
that
will
copy
that
will
try
to
copy
and
write.
Some
blood
will
have
to
also
update
the
exact
tributes,
and
this
took
way
much
too
much
time
like
300
milliseconds,
and
so
after
adding
this
this
option,
we
saw
that
the
latency
has
dropped
to
the
normal
normal
levels.
So
this
this
option
was
not
was
not
mentioned
in
the
documentation.
So
we
that
was
one
of
the
first.
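The transcript doesn't name the option here. One plausible reading of "matching the block size on the storage layers" is the cache block size chosen when the Flashcache device is created; a hedged sketch of that follows, with placeholder device names (flashcache_create's -b flag sets the block size).

```python
import subprocess

def create_cache(cachedev: str, ssd: str, hdd: str, block_size: str = "4k") -> None:
    """Create a writeback flashcache with an explicit cache block size."""
    # -p back selects writeback mode; -b sets the cache block size so it
    # can be matched to the block size used by the layers above and below.
    subprocess.run(
        ["flashcache_create", "-p", "back", "-b", block_size, cachedev, ssd, hdd],
        check=True,
    )

# Placeholder devices; adjust to your SSD partition and backing HDD.
create_cache("cachedev", "/dev/nvme0n1p1", "/dev/sdb")
```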
We used another file system for the OS, so we were able to log into that server, and that's why we could find out what was happening. So, how Flashcache performs in our case: we get over 50% more I/O from the HDDs. Writes are more localized, so they are merged before being sent to the device, which means that we have fewer seeks on the hard drives, and we use NVMe for the journal.
The only change we did: we noticed that the old kernel doesn't allow you to increase the disk queue depth on the HDDs; it's fixed at 128 by default. On the new kernel we could increase that to 8000, which allows the kernel to reorder the I/O to the device, which improves the performance. So reads are also merged before being sent to the device.
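A minimal sketch of that queue-depth change, assuming a rotational device at /dev/sdb; the knob is nr_requests under /sys/block/<dev>/queue/.

```python
from pathlib import Path

def set_nr_requests(dev: str, depth: int) -> None:
    """Raise the block-layer request queue depth for a device (needs root)."""
    # A deeper queue lets the I/O scheduler reorder and merge more
    # requests before they reach the spinning disk.
    Path(f"/sys/block/{dev}/queue/nr_requests").write_text(f"{depth}\n")

# 128 is the usual default; 8000 is the value from the talk.
set_nr_requests("sdb", 8000)
```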