From YouTube: ZFS Caching: How Big Is the ARC? by George Wilson
Description
From the 2020 OpenZFS Developer Summit
slides: https://drive.google.com/file/d/19th2JHeITp1Iefc-JffIDqn_4oy_JfVx/view?ts=5f7b7499
Details: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2020
Great, well, thanks everyone for joining virtually. This is going to be an interesting thing: this is the first time I've ever done a virtual conference, as is probably the case with many of you.
So what I wanted to talk about today was some things that we encountered at Delphix regarding the ARC. For those that don't know, we have recently switched our product to Linux, so this was a big change for us, and one of the things that we were trying to do was to make sure that we get the same performance as what we're used to seeing on illumos, because that's what our product was based on. And so we were working through this big, giant migration schedule.
One of the things that we noticed was that the ARC size on Linux is quite a bit smaller than what we were expecting and what we were used to on illumos. So we said: okay, no big deal, we know the code, let's go in there and change it.
We had noticed that on Linux the ARC was limited to half of memory, and here you can see a diagram of what we're used to on illumos, which is the ARC taking up nearly all of memory. We rely on that primarily because we're really read-intensive for certain workloads, so we want to have that caching.
A
Well,
no
big
deal,
you
know,
hey
we're
we're
all
developers,
let's
just
go
in
there,
make
a
change,
simple
fix
and
sure
enough.
It
was
a
simple
fix.
We
actually
got
it
changed
and-
and
so
now
our
delfix
platform
kind
of
looks
like
this,
which
matches
some
of
the
other
platforms
that
you
may
be
used
to
so
like
freebsd
and
lumos,
also
kind
of
do
a
similar
type
of
configuration
where
they're
using
most
memory
and
so
great
we
we're
done
and
that's
the
end
of
my
talk.
No
sorry!
It's
not!
So what I'm showing here is a graph of the ARC, in this case arc_c, the target size of the ARC, and what happens to it over a period of time. We noticed this steep drop-off, this steep decrease in the ARC target size, and that started us investigating what was really happening here.
We had made the change to increase the size of the ARC months and months ago, and we were only now starting to see this on real systems, so we had to step back. There were two types of things we looked at. We were familiar with what you see on the left side of your screen, which is the expected workflow: we knew the ARC has these two threads that run.
A
What
we
discovered,
however,
is
that
the
actual
workflow
had
this
other
component
called
the
shrinker,
and
this
was
relatively
new
to
us
because
again,
this
is
a
kind
of
a
linux
specific
component
and
not
something
that
we're
used
to
seeing
in
illumos
and
what
we,
when
we
dug
into
it.
We
noticed
that
the
shrinker
has
kind
of
these
two
modes
where
one
of
them
is
to
just
get
some.
One is just a count: how many objects can the shrinker actually free, or how many are you going to tell the kernel that you can free? The other is to actually go in and do the shrinking of the target size.
In addition to that, we noticed two different ways that the shrinker was being called. There are two different types of memory pressure, and you're probably used to this on any platform: direct memory reclaim and indirect memory reclaim. This graph on the left is something that I wanted to speak to, because the indirect memory reclaim actually happens first.
So kswapd will run until it gets down to this minimum watermark, and at that point in time is when you actually start seeing these direct reclaims. So you'll get the synchronous reclaim, which happens in the context of whatever thread is running: you may be in the middle of doing a VFS read, and all of a sudden you need memory, and it's coming in and calling back into the ARC, and the ARC is having to go do some stuff.
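The two reclaim paths can be modeled with a small Python sketch (this is an illustrative userspace model of the Linux watermark behavior, not kernel code; the names and thresholds are mine):

```python
# Simplified model of Linux memory watermarks: kswapd (indirect reclaim)
# runs in the background once free pages drop below the "low" watermark,
# but if free pages fall below the "min" watermark, the allocating thread
# itself has to do direct (synchronous) reclaim inline.

def reclaim_mode(free_pages, wmark_min, wmark_low):
    """Return which reclaim activity a new allocation would see."""
    if free_pages < wmark_min:
        return "direct"   # allocator blocks and reclaims synchronously
    if free_pages < wmark_low:
        return "kswapd"   # background thread reclaims asynchronously
    return "none"         # plenty of memory, no reclaim needed

def kswapd_target(wmark_high):
    """kswapd keeps reclaiming until free pages recover to the high mark."""
    return wmark_high
```

The point of the graph in the talk is that direct reclaim, the painful synchronous case, only starts once kswapd has already fallen behind.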
The first thing, count_objects, effectively just returned all evictable memory to the kernel. It was saying: hey, if you need memory, here is everything that's evictable, you can have it. And that's all count_objects does: it just returns the number of pages that you think you can give back if the kernel needs them. Scan_objects is really the one that does the real work: when the kernel calls you back, it says, here, go do some work.
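A Linux shrinker is a pair of callbacks, count_objects and scan_objects. As a sketch of the original behavior being described (a userspace Python model, not the actual ZFS code), it looked roughly like this:

```python
# Model of the original ZFS shrinker behavior: count_objects advertises
# everything evictable, and scan_objects just lowers the ARC target size
# without waiting for any eviction to actually complete.

class Arc:
    def __init__(self, target, evictable):
        self.target = target        # arc_c, the target size (in units)
        self.evictable = evictable  # evictable data currently cached

    def count_objects(self):
        # Tell the kernel we could free *all* evictable memory.
        return self.evictable

    def scan_objects(self, nr_to_scan):
        # Old behavior: shrink the target and return immediately; the
        # actual eviction happens later, in a different thread.
        self.target = max(0, self.target - nr_to_scan)
        return 0  # reports no reclaimed objects to the caller
```

Note that scan_objects here reports zero progress, which foreshadows the "kernel was never satisfied" problem described below.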
So as a result, any time the shrinker called back into the ARC, it simply reduced the target size, and away you went on that steep decline. As we started monitoring this, we noticed, here's a sample: over a 20-second period, with just some small memory pressure, we were seeing 198,000 calls to go and shrink the ARC. We were being bombarded with these calls coming in from kswapd, over 9,000 per second. And what was interesting is that, even though we were trying to keep up and do some eviction, it seemed like the kernel was never satisfied that we were doing any work. So that was another mystery for us to go solve.
So let's step back and go back to our original problem. I mentioned we had these iSCSI LUN resets; how does this actually relate, and what was really happening here? If we take the same graph and now map the ARC size onto it, things started to become a little clearer. You can see here that the ARC target size decreased on this steep decline, while the size is not able to keep up and is only slowly going down. These are happening via two different processes.
Before we go into the real crux of the problem, I wanted to make sure that people understand what happens when you're in this ARC-full condition or, if you've looked at the code, the ARC-overflowing condition, where an ARC-overflowing condition is when the size of the ARC is bigger than the target size.
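In code terms the check is just a comparison of the current size against the target (the real arc_is_overflowing() in OpenZFS also tolerates a small slack margin before reporting overflow; the margin parameter here is illustrative):

```python
def arc_is_overflowing(size, target, margin=0):
    # The ARC is "overflowing" when its current size exceeds the target
    # size (arc_c); a small margin avoids flapping right at the boundary.
    return size > target + margin
```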
A
So
when
you're
actually
going
in
and
trying
to
do,
for
example,
a
reed
and
you
need
a
block
in
the
arc,
the
first
thing
you're
going
to
ask
is:
is
the
arc
overflowing?
If
the
answer
is
yes,
then
you're
going
to
be
asked
to
block
giving
the
the
arc
evic
thread
the
ability
to
kind
of
make
some
progress
and
find
a
block
for
you,
so
that
you
can
make
so
you
can
actually
have
it
and
and
do
your
I
o
if
you're
not
overflowing
great,
you
get
a
block.
So with that, we set off to try to improve how we actually do memory pressure detection, and the first place we looked was revamping the way the shrinker logic works. Because we could have lots of evictable memory, we felt it was a little unfair to simply say, hey, here are all these gigabytes of evictable memory, every time the shrinker was called. So instead we said: well, let's lie to the shrinker and give back a certain number of pages.
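The "lie" is simply a cap on what count_objects reports per call. A sketch of that idea (the cap value and names here are illustrative; OpenZFS exposes a tunable for this, but treat the exact default as an assumption):

```python
PAGE_SIZE = 4096
SHRINKER_LIMIT_PAGES = 10_000   # illustrative cap per shrinker call

def count_objects(evictable_bytes, limit_pages=SHRINKER_LIMIT_PAGES):
    # Instead of advertising every evictable byte, report at most a fixed
    # number of pages per call, so kswapd cannot ask the ARC to collapse
    # all the way down in one burst of shrinker callbacks.
    evictable_pages = evictable_bytes // PAGE_SIZE
    return min(evictable_pages, limit_pages)
```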
A
Likewise,
the
work
that
was
actually
happening
with
the
shrinker,
instead
of
just
going
back
and
adjusting
the
target
size,
we
wanted
to
make
sure
that
when
we
adjusted
the
target
size,
we
actually
waited
for
those
evictions
to
happen,
and
then
there
was
one
little
nuance
that
we
uncovered,
which
is
there
is
a
way
to
actually
let
the
colonel
know
that
we're
making
some
progress.
It
just
wasn't
what
we
expected
instead
of
the
kernel
tracking,
like
free
pages,
it
wants
to
know
that
things
are
in
this
reclaimed
state.
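The revised scan path can be sketched like this: shrink the target, wait for the eviction to actually happen, and report the reclaimed count back so the kernel credits the progress. (In the kernel this reporting flows through the task's reclaim-state accounting; the Python below is only a model with illustrative names.)

```python
# Model of the revised scan_objects path: lower arc_c, let eviction
# actually free the bytes, and return the reclaimed page count.

def scan_objects(arc, nr_to_scan, page_size=4096):
    want = nr_to_scan * page_size
    arc["target"] -= want              # lower arc_c by the requested amount
    freed = min(want, arc["size"])     # model: eviction frees the bytes...
    arc["size"] -= freed               # ...and we wait for it to complete
    return freed // page_size          # progress the kernel will credit
```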
A
We
also
introduced
this
lock
step
type
of
mechanism,
so
I
mentioned
that
we
want
to
make
sure
that
we're
making
some
progress
any
time
that
you
shrink
the
that
you're
actually
shrinking
the
arc
size.
So
with
that
in
mind,
we
introduce
this
new
function
called
arc
weight
for
eviction,
and
the
way
it
works
is
that
it's
just
a
list
of
how
many
bytes
have
been
evicted
since
the
system's
booted
and
any
time
you're
requesting
for
some
eviction.
A
You
just
add
yourself
to
this
list
and
you
kind
of
increment,
so
this
this
diagram
is
showing
here
that
there's
four
different
consumers,
they've
added
and
just
been
accumulating
on
here.
The
number
you
see
in
here
is
the
number
of
bytes
that
it's
expecting
for
eviction
to
get
to
before
they're
woken
up,
then,
from
the
archivic
thread
side.
It's
going
to
process
this,
it's
going
to
keep
you
know
if
there's
memory
pressure
and
it
needs
to
do
some
work,
it's
going
to
be
woken
up.
It's going to go through the ARC and say: okay, I need to evict some pages. And as I evict, every single time I complete some eviction, I'm going to look and see who I need to wake up. In the past, the old logic was simply that you waited until the ARC size got below arc_c, and then you woke everybody up. Here we now have an opportunity to wake people up as we're making progress, which allows us to do this lockstep thing.
Also, any consumers out there that are waiting for blocks to be allocated can make some forward progress once we've freed enough memory; they don't have to sit there for long periods of time. So in this example, we can see that we started our eviction count at 10,156 bytes; we freed 384 bytes and, as a result, were able to wake up these two threads.
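The mechanism can be modeled as a monotonically increasing eviction counter plus per-waiter wake thresholds. This Python sketch mirrors the description above (illustrative names, not the actual arc_wait_for_eviction implementation), including the example of a 10,156-byte starting count and a 384-byte eviction waking two waiters:

```python
# Model of the arc_wait_for_eviction list: a cumulative count of bytes
# evicted since boot, plus waiters tagged with the count at which each
# should be woken.

class EvictionWaitlist:
    def __init__(self, evicted_so_far=0):
        self.evict_count = evicted_so_far   # bytes evicted since boot
        self.waiters = []                   # (wake_at_count, name), sorted

    def wait_for_eviction(self, nbytes, name):
        # A blocked allocator asks to be woken once `nbytes` more bytes
        # have been evicted on its behalf.
        self.waiters.append((self.evict_count + nbytes, name))
        self.waiters.sort()

    def note_evicted(self, nbytes):
        # Called by the evict thread each time it completes some eviction;
        # returns the waiters whose thresholds have now been reached.
        self.evict_count += nbytes
        woken = [n for at, n in self.waiters if at <= self.evict_count]
        self.waiters = [(at, n) for at, n in self.waiters
                        if at > self.evict_count]
        return woken
```

Walking the talk's example: with the counter at 10,156 and three waiters needing 128, 256, and 4,096 more bytes, freeing 384 bytes wakes exactly the first two.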
This also changed the way we do the allocation when you're going and reading a block: now, every time the ARC is overflowing, we will put you on this list and you're going to wait for a specific amount of memory to be freed, rather than waiting for the ARC size to get below arc_c. So you can make some forward progress, and we don't have those long delays anymore.
So for our simple test, just running the same workload, filling the ARC and then adding some memory pressure: we now see that the ARC size and arc_c are able to stay in lockstep. In this case we were applying 30 gigabytes' worth of memory pressure, and we saw the ARC slowly come down.
A
You
know
and
and
reduce
itself
by
30
gig,
give
that
memory
to
somebody
else
to
consume
and
were
able
to
drive
off
without
any
long
delays,
so
that's
kind
of
where
we
ended.
But
it's
not
the
end
of
the
story.
There's
still
more
to
be
had
here,
in
particular
one
of
the
things
that
that
we
did
was
we
actually
introduced
kind
of
this
minimum
threshold
of
memory
that
we
should
always
leave
for
the
system
to
run
in
currently
in
upstream.
It's
set
to,
I
think,
132nd
of
of
all
memory.
A
Something
about
that
size
has
a.
It
has
an
interesting
calculation
in
delfix.
However,
we
found
that
having
that
value
set
ended
up,
causing
us
to
actually
see
out
of
memory
conditions,
so
we've
increased
it,
but
we
still
want
to
dig
into
that
further.
Our
main
goal
here
really
is
to
make
it
so
that
the
arc
sizing
is
not
something
that
people
have
to
worry
about.
You
know
we
want
consumers
to
be
able
to
say
I
install
open
cfs
and
the
arc
is
going
to
adjust
to
your
workload
and
to
your
environment.
There's this component called arc_no_grow, and it will detect memory pressure and slow down the ARC's ability to grow once it finds that memory pressure has been incurred. Then, after a period of time, it says: let me check again, let me see if that memory pressure is still there. If so, then I don't grow; but if it's gone, then I'll slowly start allowing the ARC to continue to grow. So it will adjust over the course of time.
A
And
then
alan
asked
is
this
related
to
the
freebsd
tunable's
arc
free
target
which
triggers
arc
reclaim
when
the
kernel
free
pages
get
below
this
target.
Alan
I'm
not
too
familiar
with
arc
free
target.
So
I'd
have
to
look
at
that,
but
it's
possible
that
there's
some
similar
similarities
there
in
like
for
for
linux.
Right now we have arc_sys_free, which is kind of the bottom end of how much memory you leave for the system. That's the piece that determines at what point we won't allow the ARC to grow: we treat available memory as all memory minus that amount, so that we always leave that much around.
Unfortunately, there isn't a good test suite that I'm aware of. I know some people have been looking at trying to figure out ways to verify and validate some of the ARC's algorithms and heuristics, because there are quite a few things here. As I mentioned, with this shrinker logic the arc_reap thread doesn't really do very much now; even when we looked at this, when we first encountered the problem, we had expected the arc_reap thread to be the detector of memory pressure.
When we started tracing it, we found that it never woke up; it just wasn't doing anything. Every single call that was coming through to shrink the ARC was actually coming through kswapd, with maybe rare exceptions. I also mentioned briefly that there was this kind of cumbersome interaction between the shrinker and the way that we would detect memory pressure.
The pressure counter only incremented under certain conditions, and so, as a result, we were too late to the party whenever memory pressure came in. So the arc_reap thread is probably something that we have to go back and revisit, to see if it still has value. Then another question: this new list of waiting allocations, is it in arc_os.c, or is it generic across platforms? It is generic; it's actually used, I think, both for FreeBSD and Linux today.
Did you look at the case where memory requests on Linux, like from QEMU, now easily fail their allocation? We have looked at that, and this is one of the areas that probably needs investigation.
I haven't looked specifically at the QEMU use case, but it's my understanding that that was one of the reasons why Linux had never really gone to a full...
Hey George, yeah, 2.1. You can stop sharing your slides and people will be able to see you a little bit bigger. And the other thing is: I answered somebody's question, but I kind of dismissed it before I should have. The question was: which release of OpenZFS will have the fixes that you're talking about?
Oh, these are all in OpenZFS 2.0, so I think they've already been pulled in.
What is the largest ARC size that you've tested, and how much memory is too much memory? We've had systems that have been using up to a terabyte of memory; that's the largest that we've seen. I don't know if others have gone much larger than that; I'd be very interested to hear about experiences that people may have had with some extremely large systems. We know that larger-memory systems are coming out.
Okay, I have one more minute. Do you expect this to remove the need to limit the ARC size on virtual machine hosts? We would like to get to that. I think, as we start to dig into this more and make it more robust, we're going to be looking for feedback from the community as people start to run their environments. I mentioned the QEMU case; we know that's problematic.
A
We
we
run
on
virtual
machines.
You
know
vmware
all
the
cloud
environments
and
we're
not
seeing
that
to
be
an
issue,
but
you
know,
obviously
everybody's
workload
might
be
a
little
bit
different,
so
we
would
love
to
kind
of
get
a
feel
for
how
people
are
kind
of
seeing
this
in
the
wild,
and
you
know
experiences
you
may
have
had
in
the
past
and
try
to
go
back
and
see
if,
if
these
things
are
now
addressed
with
the
changes
that
we've
that
we've
implemented.