From YouTube: DirectIO for ZFS by Brian Atkinson
Description
From the 2021 OpenZFS Developer Summit
slides: https://docs.google.com/presentation/d/1f9bE1S6KqwHWVJtsOOfCu_cVKAFQO94h
Details: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2021
A: Cool, yeah. So I'm Brian Atkinson, I'm from the HPC storage design group at Los Alamos National Lab. Today I'm going to be presenting on the addition of Direct I/O to ZFS.
A: It says: try to minimize the cache effects of I/O to and from the file, and the I/O is done directly to and from user-space buffers. What this really means in Linux is that most file systems will directly map user pages into the file system and then read and write directly in and out of those pages. By doing this they're completely bypassing the page cache, so no copies are ever made of the actual buffers the user is using.
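As a minimal user-space sketch of what that looks like from an application's point of view (the file path and request size below are illustrative, not from the talk): the buffer handed to the kernel is page-aligned, and with O_DIRECT the file system transfers those pages directly instead of copying through the page cache.

```c
#define _GNU_SOURCE            /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 1024 * 1024;          /* 1 MiB request, illustrative */
    void *buf;

    /* O_DIRECT transfers must use a suitably aligned buffer. */
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return 1;
    memset(buf, 0xab, len);

    /* Hypothetical path; O_DIRECT asks the file system to bypass the page cache. */
    int fd = open("/tank/fs/checkpoint.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;

    /* The kernel maps these user pages in and writes from them directly. */
    ssize_t n = pwrite(fd, buf, len, 0);

    close(fd);
    free(buf);
    return (n == (ssize_t)len) ? 0 : 1;
}
```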
A: But outside of this there are really loose rules and semantics around Direct I/O. If you look at different file systems, they'll have their own alignment restrictions, or even decisions like: do we want to be coherent with buffered reads and writes, and all that kind of stuff, with Direct I/O? So really it's very loose semantics outside of just that main idea of mapping in user pages and directly reading in and out of them. Currently ZFS actually accepts the O_DIRECT flag, but what it actually does today is just ignore it and fall back to buffered I/O.
A: And I can honestly say, while working on this code, I have felt like I was going crazy from time to time, because the ARC is such an integral part of how ZFS works. I wanted to start out by saying that in no way is this presentation saying never use the ARC. The ARC is super important to ZFS and you can get great performance out of it. However, there are certain times where it is beneficial to bypass the ARC.
A
But
I
also
want
to
talk
about
just
our
workloads
at
lanl
specific
to
hbc,
and
we
have
these
run.
Launching
run
long
run
parallel
simulations
and
these
things
can
run
for
sometimes
on
the
scale
of
weeks
to
months
and
highly
parallel.
And
so
often
what
these
simulations
will
do
is
periodically
throughout
time
is
they're
going
to
checkpoint
their
data,
and
this
is
just
a
right
one
situation
in
the
hopes
that
we
never
have
to
read
that
data
back.
A: It's merely to save the state of the simulation itself, and we call this checkpointing. So if that's going into the ARC, there's really no benefit for us there, because our intention is to never have to read that data. But unfortunately, we all know hardware failures happen, and in that case we actually do eventually have to read that data back out of ZFS to restart the simulation and get it back to its old state.
A: We got really surprising results. The first thing I want to highlight on this graph is this top line here, the dashed line, which is the maximum amount of sequential read bandwidth we could get out of 12 Samsung PM1725 NVMe drives, right around 42-43 GB/s. However, when we actually put these NVMe drives inside of a zpool, then we started using different configurations: striping, RAID-Z1, or RAID-Z2.
A: We found we were completely bottlenecked, and the bottleneck got worse as we continued to decrease I/O parallelism, which is the x-axis here. Each one of these data points along the x-axis is sequential readers, each reading its own individual file out of the zpool. In the best-case scenario, which is the low I/O thread counts, we're leaving about 48 to 57% of all the available NVMe bandwidth on the table.
A: The main difference here is that the top line is for all 12 NVMe drives, and that applies to striping, because in that case we actually have all 12 NVMe drives' bandwidth to work with. However, when you go down to RAID-Z1 you lose a disk's worth of bandwidth for the writes, and then in the RAID-Z2 case you lose an additional disk's worth of bandwidth for the writes. And we saw something very similar.
A: Just like we saw with the reads, as we increased I/O parallelism we were just a completely flat line, and in the worst case we're leaving 47% of available NVMe bandwidth on the table, in the best case 34%. But in either of these cases, that's still a lot of available bandwidth that we wanted to capture. This actually led to a meeting back in August of 2019 between Cray, Livermore, Oak Ridge, and LANL, because, funny enough, we were all investigating the same issue and we had all tried different parameters.
A: We had shared that with each other, and we tried just doing small patches of code, but we could never get past these bottlenecks in ZFS with these NVMe zpools. So over the course of the week we decided, all right, let's actually investigate this and figure out what we're missing here. What's our bottleneck? We used a tool called flame graphs.
A: The way you read a flame graph is that the very bottom is the beginning of the call stack, and as you go further up in the flame graph you're getting deeper and deeper into the call stack. What you're trying to find in these flame graphs are plateaus: the longer a plateau is, the more execution time is being spent in that call. We found two places in particular where, with the sequential reads for buffered ZFS I/O, we were getting stuck in memory copies.
A: But this time with writes, again we found all of our execution time was being stuck in this memory copy, but in this case it was simply taking the user's buffer and copying it into kernel space and into the ARC. So when we looked at this, we thought, all right, there's actually a pretty easy solution to this: we just need to actually implement Direct I/O in ZFS, and we can completely avoid these memory copies.
A: Before I get into the dirty details of how we got all this working, I just want to go over the big picture: what do we mean when we say we added Direct I/O to ZFS? I'm going to start with the reads first, and I've got a really simplistic diagram over here on the right of the internals of ZFS. On the left-hand side we have the normal buffered path that ZFS takes with reads.
A: The big picture here is that when that read system call comes in through the ZPL, we'll enter the DMU and we're just going to directly issue that into the ZIO pipeline and read that data off of the vdevs, and we do this by directly mapping the user pages into an ABD. I just want to quickly state one thing here, and then I'll go into way more detail about what I mean by this: with Direct I/O reads we can copy out of the ARC.
A: We allow that because we have this thing called ARC coherency between buffered and Direct I/O, but again I'm going to go into that in much more detail. I just wanted to quickly mention it here.
A: The big idea on the write side is that the typical write path for ZFS buffered I/O is that a user buffer comes in and is immediately memory-copied into the ARC. As soon as that copy is done, we return out to the write system call and we're done. That buffer is then assigned to a transaction group, and eventually, after some time period or once enough dirty data has accumulated in the ARC, the transaction group will transition to its sync phase.
A: What we're saying is: take that user buffer, again directly map it into an ABD, and then immediately send it through the ZIO pipeline. We're still going to do all those transformations just as we did before, and then we're going to issue that write immediately down to the underlying vdevs, and it's not until after we've put that data on disk that we return back to the write system call. So at that point we can guarantee to the user that the write is on stable storage.
A: Okay, your data has landed on the underlying vdevs. So now, to get into the actual details of how we got all this working in ZFS, I just want to stick to general details first, and then I'm going to go into much greater detail on the write side of Direct I/O. And just to start up front, I want everybody to understand:
A: O_DIRECT does not imply O_SYNC, and this is common across file systems. What we mean when we say this is that even though we're guaranteeing the data has landed on the vdevs with O_DIRECT, that says nothing about the indirect block pointers and the metadata of that write. If you want those to go out with that O_DIRECT data, you have to either set sync=always or pass the O_SYNC flag.
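A small hedged sketch of what that means for an application (the file path and size are illustrative): an O_DIRECT write alone only guarantees the data blocks have reached the vdevs, so if the metadata also needs to be durable the application should either add O_SYNC at open time or follow up with an explicit fsync().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    size_t len = 1 << 20;                    /* 1 MiB, illustrative */
    void *buf;
    if (posix_memalign(&buf, (size_t)sysconf(_SC_PAGESIZE), len) != 0)
        return 1;
    memset(buf, 0, len);

    /* One option is to request synchronous semantics up front:
     *   open(path, O_WRONLY | O_CREAT | O_DIRECT | O_SYNC, 0644);
     * The path below is hypothetical. */
    int fd = open("/tank/fs/state.bin", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0)
        return 1;

    /* With O_DIRECT alone, this only covers the data blocks themselves. */
    if (pwrite(fd, buf, len, 0) != (ssize_t)len)
        return 1;

    /* The other option: an explicit fsync() to push out the block pointers
     * and other metadata before depending on the write being fully durable. */
    fsync(fd);

    close(fd);
    free(buf);
    return 0;
}
```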
A: The other big thing is that we have alignment restrictions with O_DIRECT. For now we've chosen the page size; it's common amongst a lot of file systems, though some can even go as low as the LBA, but for now we stuck with page size. If a user does request Direct I/O through the O_DIRECT flag and the request is not page-size aligned, in that case we're going to return EINVAL.
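As a hedged illustration of one policy an application might choose when it hits that EINVAL (this fallback is the application's decision, not something ZFS does for it): clear O_DIRECT with fcntl() and retry the same request through the normal buffered path. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try a direct write; if the file system rejects it as misaligned (EINVAL),
 * drop back to buffered I/O for this descriptor and retry once.
 */
static ssize_t write_direct_or_buffered(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0 && errno == EINVAL) {
        int flags = fcntl(fd, F_GETFL);
        if (flags >= 0 && (flags & O_DIRECT)) {
            /* Clear O_DIRECT and let the normal buffered/ARC path handle it. */
            if (fcntl(fd, F_SETFL, flags & ~O_DIRECT) == 0)
                n = pwrite(fd, buf, len, off);
        }
    }
    return n;
}
```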
A: So users can actually verify exactly how their request got issued out in ZFS. There's also, of course, arcstat, and you can see through arcstat how your data was delivered, but the extra accounting that we've added is much more fine-grained, if anybody wants to look at that while they're issuing their Direct I/O. The other big thing I wanted to mention is that all Direct I/O requests are issued at sync priority down in the vdev queues, so the ZFS vdev sync max/min active parameters are the ones that apply to any Direct I/O.
A: In the case that, for whatever reason, the user is mixing buffered and Direct I/O operations, we do have a little bit of logic for that. Once the data is on the vdev, in the best-case scenario we have no dirty data in the ARC, and at that point it's like, great, let's just update the block pointer. That means every future read is going to have to issue down to the vdevs below. However, worst case, users will sometimes mix and match buffered and Direct I/O.
A: Although I don't suggest it, they will occasionally do this. So if we do have these dirty records, the first thing we have to check is: okay, are we actually syncing out the dirty record that's associated with this data buffer? If that's the case, we have to wait, because with Direct I/O we want to promise the same consistency semantics that ZFS normally promises.
A: We want that previous transaction to go out and be written to disk. If there is no dirty record syncing, what we'll do is remove all the dirty records, remove the data from the ARC, and then update the block pointer. We do have one other alignment restriction, specifically with Direct I/O writes: each Direct I/O write also has to be record-size aligned, and there's a reason we do this.
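A small hedged sketch of that restriction from the caller's side (the 1 MiB recordsize in the comment is just an example): before issuing a Direct I/O write, check that the offset and length line up with the dataset's recordsize, so the request only ever rewrites whole records and never triggers the read-modify-write cycle described next.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Illustrative check: a direct write that starts and ends on recordsize
 * boundaries rewrites whole records, so no existing record has to be
 * read back, modified, re-checksummed, and written out again.
 */
static bool dio_write_is_record_aligned(off_t offset, size_t len, size_t recordsize)
{
    if (recordsize == 0 || len == 0)
        return false;
    return (offset % (off_t)recordsize == 0) && (len % recordsize == 0);
}

/* Example: with a 1 MiB recordsize, a 4 MiB write at offset 2 MiB qualifies. */
```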
A: It's because we want to avoid a read-modify-write cycle. To explain what I mean by this, I wanted to show two back-to-back Direct I/O writes, one after the other. If we came in and took the Direct I/O path, we went into the ZIO pipeline, we would calculate a checksum based on that user data, place it in the block pointer, and then that data would be on the vdev.
A: If a second write then only updates part of that record, you have to go down to the vdev, read that data off, bring it back up into the DMU layer, modify it, and issue it back out through the ZIO pipeline, because we have to recalculate the checksum, update it inside the block pointer, and then issue the data back out to the vdev. And honestly, we did implement it this way at first: we allowed these sub-record-size updates with O_DIRECT.
A: In that case, the other thing I did want to mention is that for the first block that's written into a file, ZFS actually allows that first block to slowly grow up to a full record size. This is really entwined with how the ARC works, and because we already have this record-size alignment restriction, we decided, you know what, while that block size is still growing, just go through the ARC like you normally would.
A: What we're saying here is that we actually have to write-protect the user's pages, because it's important to remember we've directly mapped in the user's pages with O_DIRECT. To explain why this is important: again, when we go into the ZIO pipeline, we're calculating the checksum and writing it into the data's block pointer. However, whether maliciously or, let's be honest, because users of parallel code sometimes just do weird things,
A: if, for some reason, after we've calculated that checksum and set it in the block pointer, the user modifies the contents of the buffer being written out, everything is fine as long as the data is just sitting there on the vdev. But the issue comes in when we go to read that data back, either because we've issued a read on it or there's a resilver happening or anything of that nature: what we're going to get is a checksum failure, and there's no way for us to actually fix it, because the checksum was calculated based upon the original contents of the user's pages.
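This is why the implementation write-protects the mapped user pages for the duration of the write. As a loose user-space analogy only (not the actual kernel mechanism, which operates on the pages ZFS has mapped in), mprotect() shows the idea: once a region is marked read-only, an attempt to scribble on it faults instead of silently invalidating data that a checksum has already been computed over.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    /* A page-aligned buffer standing in for the user's O_DIRECT write buffer. */
    unsigned char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return 1;
    memset(buf, 0xab, (size_t)page);

    /* Analogy: "checksum computed, write in flight" -> make the pages read-only. */
    if (mprotect(buf, (size_t)page, PROT_READ) != 0)
        return 1;

    /* buf[0] = 0; */   /* Modifying now would deliver SIGSEGV rather than
                         * silently corrupting data the checksum already covers. */

    mprotect(buf, (size_t)page, PROT_READ | PROT_WRITE);   /* "write completed" */
    munmap(buf, (size_t)page);
    puts("pages were protected while the (simulated) write was in flight");
    return 0;
}
```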
A: As a last note on Direct I/O in ZFS, we did add a new dataset property called direct. The default is standard, which follows all the semantics that I've outlined so far, but we also added always. What that really allows the user to say is: okay, I don't want to modify all my applications to use the O_DIRECT flag, let me just simply try it out. always is kind of a best effort that we do here.
A: In that same vein we also added disabled, because if this patch gets merged, people that were already passing the O_DIRECT flag would be like, what the heck just happened? The performance may be all over the place and all of a sudden they're not getting what they want. Setting disabled just means: hey, just ignore the O_DIRECT flag, just how ZFS does it today.
A: So it basically falls back to the old behavior and just forgets that the O_DIRECT flag was even being passed. Now I want to actually show our performance results with all these additions of O_DIRECT to ZFS. I'm going to start out looking at the sequential read results from the NVMe zpools, and on this graph I have the previous results that we could get with sequential reads.
A: What we actually see is that for each of the different vdev configurations, striping and the RAID-Zs, we get pretty close to capturing all that available NVMe bandwidth. At the higher I/O thread counts, where we are thrashing the ARC, we actually get about a 3x speedup for the reads, and in general, for all cases, the O_DIRECT reads scale pretty well and stay consistent as we continue to decrease that I/O parallelism.
A: With the sequential write results for these NVMe zpools, it's again important to remember that, based on the particular vdev configuration, there's a certain maximum you can achieve. So I just wanted to grab the one case where all three of the different vdev configurations were at their maximum bandwidth, and that was at 512 sequential I/O write threads. For each of the different vdev cases I have a green bar showing the amount of available NVMe bandwidth.
A: But for all three of the different vdev configurations we got over a 1.5x speedup. I also just wanted to quickly state that, unfortunately, I don't have enough time in this presentation to go over all the scaling results with O_DIRECT sequential writes, but if anybody's interested in this data, I do have addendum slides where all that data is available.
A: I also have O_DIRECT write scaling results with dRAID configurations, so anybody can look at that if they're interested. There was one last performance case I actually wanted to go over here, and that was O_DIRECT with disks. The main difference between this graph and the previous graphs I've shown is along the x-axis: this is actually the number of sequential readers.
A: The lighter-colored lines are the buffered cases, and what I really want to highlight here is that for all the O_DIRECT cases we're actually performing worse than buffered. This goes back to what I started the presentation with: we're not arguing to never use the ARC. In fact, with O_DIRECT, if you're looking at it just from a performance perspective, there are a few things to take into consideration.
A: Basically, you want to think about: what is my I/O workload, and will I benefit from this? Even with that I/O workload, what's my vdev configuration, and along those lines, what is the underlying hardware? Because we can see here that even at the low I/O thread counts, the ARC prefetching is doing a pretty good job with this JBOD, and even though it does flatline a little bit and come down, it's still really outperforming the O_DIRECT results.
A: People can go out there, grab it, try it out, possibly discover bugs, give it a whirl. We definitely would like some feedback on the pull request from anybody who does try it. We are aware of some bugs in the code; there are some corner cases we're still working through, in particular with Linux the stable pages stuff.
A: The kernel doesn't really document stable pages too well, so we've been struggling to get that to work for all cases on the Linux side, but we're hopeful we're getting close to getting that done. And on FreeBSD, for some strange reason, with Direct I/O reads I've been occasionally getting EFAULTs when mapping in the user pages. Really, anybody in the FreeBSD community, I'm more of a Linux person and I'm trying to learn FreeBSD as much as I can, but anybody that could help out on that side, it would be greatly appreciated.
A: Mark Maybee initially did the Direct I/O implementation. Matt Ahrens really assisted in getting the semantics nailed down for how he wanted Direct I/O to work in ZFS. Matt Macy did all the FreeBSD porting for Direct I/O, and I really appreciate his help there. And Brian Behlendorf has been great in helping me try to get this work across the finish line, a lot of stuff with the ARC coherency and just general Linux-side implementation details, including the stable page issue that we have right now.
C: So Brian, this is fantastic to see this work coming along. It's been a while, I know.
C: So, in your analysis, I mean, you already highlighted that the Direct I/O work is really a performance effort. You have to be careful when you use it, because it may not pan out in all configurations, and you demonstrated that, obviously, with your examples running on disks and then on NVMe. Have you also done any analysis on small-block versus large-block workloads? Because I would assume that would be another category where you might not see the performance you might expect.
A: No, I haven't done a full analysis there yet, but that is something we're interested in, because actually Matt Ahrens and I were talking about those JBOD results, and really I should have pushed up the record size. I didn't say this, but all the results that I shared were 1 MiB record sizes and 1 MiB request sizes. If I'd actually pushed that up to, say, an 8 MiB record size...
C: Yeah, exactly. I think we have to be careful when we release this feature to make the documentation very clear about that: obviously your performance may vary depending on your workload, exactly.
A: So yeah, for the I/O workload we used the xdd tool, and the reason we used xdd is because...
B: Cool. Joshua says thanks. The next question was from Saji Nair. He notes that O_DIRECT doesn't give you O_SYNC, which means that if you crash, you could lose the O_DIRECT writes if they weren't synced.

A: Yeah, that statement is exactly right.
A: But all of that is still going to be in the ARC. With O_DIRECT we're strictly talking about the data's user buffer itself; we're not talking about any of those indirect blocks, and that again goes back to that O_SYNC idea.
B: Cool. James Simmons asks: what's the timeline to getting this merged?
A: Yeah, hopefully soon. I mean, the stable page stuff has definitely been frustrating on the Linux side. Every time we think we've got it, it's like, oh well, we just tested a case and now we're stuck waiting around and just deadlocking. So really I think that's the last remaining hurdle on the Linux side. Again, there are a few little bugs.
A: We actually do have a random ARC leak that can happen, which has been super hard to trace down; it could be like one out of 100 runs that all of a sudden we leak something in the ARC, but that's only when you're mixing the Direct I/O and buffered stuff. And FreeBSD, again, needs more love and nurturing care than I have given it.
B: Cool, we have a few more questions and I think we have time for them, so I'll keep going. Richard Laager is asking about alignment: if the page size is less than the record size, then what are the alignment requirements? Doesn't it still need to do read-modify-write, and does that fall back to buffered access, or is it still going to be doing Direct I/O?
A: We actually do address that case. Say you have a record size of 1K: as long as you've issued the request as 4K, we will send that O_DIRECT.
A: I think that's too broad of a statement, to be honest, because again, when we were going over the results it was like, well, we should have gone with the bigger record size. And so again, this is where I'd love for the community to grab the pull request, mess with it, and give some feedback. We're obviously doing more in-house to get more results and interpret them, but yeah, I think it's too broad a statement to say, no, this is only going to be great for NVMe.
B: Yeah, I agree. It seems like primarily you can think of this as: O_DIRECT reduces CPU usage and reduces memory usage, and there are some latency trade-offs for making that happen in some cases. But if your workload is not super latency-sensitive, then having more CPU available is good, regardless of the back-end performance.
B: Yuri Vaultwatchkov asks: how about the use case of virtual machine storage using a zvol? Would that benefit from Direct I/O, assuming that you carefully select the record size?
A: It could, and in fact Behlendorf and I were talking recently; they've had a case come up at Livermore where they could see benefits with zvols and O_DIRECT.
A: So we definitely want to get that work in, it's just, again, I don't know if it'll be in this initial pull request; there may be a separate pull request after this gets merged. Without fully testing it out it's hard to say. We first would have to get everything hooked in, and actually we thought we had it working at one point, but we were surprised it wasn't as easy as just putting the hooks into the zvol code, even though the hooks are there.
B: Right. We just have time for maybe these two last questions that are here, a question relayed from YouTube.
B: xrd3k is asking if the speedups would apply to special vdevs as well, special vdevs being those that are used for small blocks and metadata. I think normally those would not be storing regular file data.
B: Ted Kabine is asking: is this largely intended for workloads where the data being accessed is much larger than the memory available for the ARC? In other words, it can't be cached, and the blocks being accessed aren't going to be accessed again soon. So he's basically asking, is this for uncached workloads?
A: Yeah, that's kind of what O_DIRECT implies. And again, the reason we even got this work kicked off, between Cray, Oak Ridge, Livermore, and LANL, is that sometimes our I/O workloads are not common: we're sequentially streaming out large amounts of data, and that's where we saw the benefits of this. So for us it's like, well, we're not really wanting to cache it anyway.
A: And I didn't stress this in the presentation, but ZFS has the range locks out there, so that prevents a lot of bad things from happening with O_DIRECT writes. Because we have the record-size alignment thing, it's like, all right, great, we can have multiple Direct I/O writes going to the same file, and in the N-to-1 cases that we've tested at LANL it works completely fine. You're never going to have two Direct I/O writes overlapping the same record.
B: All right, well, thanks a lot, Brian, for your talk.