From YouTube: Tim Feldman - Host-Aware SMR - OpenZFS Dev Summit 2014
Host-Aware SMR (Tim Feldman from Seagate)
Very compelling and broad, a lot of very interesting subjects. I also had the chance to pick up a little free piece of paper this morning, just thought I might learn something from it. It happened to open to the very middle, and I found this quote that Brian Anderle and Robert Roback talk about.
People here have heard about shingled magnetic recording before, okay, but I'm going to give a brief preview of all of this so that we'll all have the same basis. But this, basically, is the call to action: that SMR devices work without any modification.
We measure this, and we see the future of where we are shipping. Today we produce a drive about once per second, and at today's capacities that's the rate of production of storage, 24/7/365. That's the appetite that the world has for storage, and that's the opportunity for a file system with innovative ideas to have advantages. Now, there are different types of SMR devices that I'm going to talk about.
I'm mentioning two right here, drive managed and host aware, and this is sort of the conclusion slide up front, to make sure I get to it. There are some applications that work well with an autonomous, drive-managed solution; you don't necessarily have to change the software stack. But in other places the software stack will work much better with upgraded file systems. So thank you very much for the opportunity for me to tell you our side of the story, our perspective, and to have an exchange. So I'll give you this introduction.
And then I'll get into more and more of the details of the host-aware devices. Conventionally, what we do: I'm showing a set of tracks here, which of course are full circles on a real disc, but for the simplicity of this cartoon I'm just showing them as a set of rectangles. Their width is defined by the width of the write element. We could have fun with physics here, but we don't have time to talk about that; we're actually changing the magnetic state of the media.
So that's how we're recording the information, but suffice it to say that on the end of the arms are these heads that have a writer element and a reader, two separate elements, and they have different widths. Conventionally we space the tracks so that when you come back and read, you're just reading down the middle of the track, and you can see that you can overwrite in place. You can pick any sector and it will write in place without any interaction with the neighboring sectors.
B
But
we
are
now
at
the
point
where
we
can
keep
scaling
the
reader
smaller,
but
we
can
no
longer
scale
the
writer
any
smaller.
It's
because
the
flux
density
we
need
to
overcome
the
coercivity
of
the
media.
It
gets
into
all
this
interesting
material
science,
but
this
is
the
point
of
that
we
hit
is,
is
just
this
fact.
B
We
want
to
break
that
up
every
once
in
a
while
and
and
restart
the
shingles.
So
it's
sort
of
like
at
the
the
ridge
of
the
roof
and
and
so
each
shingled
region,
most
commonly
is
called
van.
B
Building
these
as
fast
as
we're
flipping
burgers
but
but
just
about
drive
managed,
is
the
first
type
of
smr
drive.
It
autonomously
handles
all
of
the
bakeries
of
trying
to
take
an
arbitrary
right
to
any
set
of
ldas
and
store
that
so
it's
retrievable
seagate
has
shipped
millions
of
these
millions
and
millions
of
these,
but
so
far
they've
all
been
in
the
drive
managers
to
so
the
process
that
we
go
through.
If you try to write an arbitrary piece of data, we first read through the shingled region and retrieve all the old data, and then bring in this new data from the host, which may already have been buffered, or maybe it's still coming in, but it's already just sort of scattered, so it's an operation of linking a set of pointers.
B
So
we
now
have
this
representation
of
what
needs
to
be
in
the
shingle
region
and
then
we
come
back
and
we
rewrite
the
data
with
the
new
data
there
and
the
old
data
being
restored.
If
we
stop
this
process
halfway,
we
have
tracks
where
the
data
that
has
been
clobbered
and
in
the
old
day
the
customer
data
is
no
longer
recoverable.
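That read-modify-write cycle can be sketched in a few lines; this is an illustrative model in Python, not Seagate's firmware, and all names and sizes here are made up.

```python
# Sketch of the drive-managed read-modify-write cycle described above.
# Names and sizes are illustrative, not actual firmware.

BAND_SECTORS = 512  # sectors per shingled band (made-up number)

def rmw_write(band, new_data):
    """Apply {offset: sector} updates to a shingled band.

    Shingled tracks cannot be updated in place, so the drive:
      1. reads the whole band (the old data),
      2. merges in the incoming sectors,
      3. rewrites the band sequentially from the start.
    """
    # 1. Read through the shingle region and retrieve all the old data.
    image = list(band)
    # 2. Link in the new data (in a real drive, a set of pointers into
    #    buffered host data rather than a copy).
    for offset, sector in new_data.items():
        image[offset] = sector
    # 3. Rewrite the band front to back; interrupting here is what
    #    risks leaving clobbered tracks mid-band.
    for i, sector in enumerate(image):
        band[i] = sector
    return band

band = ["old"] * BAND_SECTORS
rmw_write(band, {0: "new0", 100: "new100"})
```

The interrupted-halfway failure mode the talk mentions corresponds to stopping in step 3, which is why the incoming data must be made persistent before the rewrite begins.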
Big caches are getting onto drives, right; the DRAMs on drives are now 128, 256 megabytes, and we're now talking about a cache that's on the disk itself: it's tens of gigabytes and it's persistent. So as soon as the data is in this disk cache, the data is safe from an unexpected power loss, and the write is coherent.
If the workload is sequential, then we don't need to use this write cache; we can go straight to being stored in the shingled regions. But there are a bunch of caveats in there, right? It's not a trivial detection. You have to wait for data to come, so there's a latency period before this starts up; the drive is wondering, is this sequential write going to persist, or has it just paused for a moment? So it's far, far from easy. But the observations: number one, no change is required, right?
B
This
autonomously
handles
inside
the
drive
everything
to
make
all
current
reads
and
writes
work,
just
as
they
always
have
in
terms
of
integrity,
but
with
performance
confidence.
Now,
sometimes
the
performance
is
actually
faster.
I
retain
this
disk
cache
and,
as
random
writes,
writes
to
random
lbas
are
coming
in.
We
aggregate
them
and
write
them.
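As a sketch of that aggregation idea (the structures here are hypothetical, not the drive's actual data layout): scattered random-LBA writes land as one sequential append into a cache log, with an index kept for later lookup and cleaning.

```python
# Sketch of random-write aggregation into a persistent disk cache.
# Structures are illustrative; a real drive tracks this in firmware.

class MediaCache:
    def __init__(self):
        self.log = []        # (lba, data) pairs, appended sequentially
        self.index = {}      # lba -> latest position in the log

    def write(self, lba, data):
        # Random-LBA writes are aggregated into one sequential append,
        # which is why cached random writes can actually be faster
        # than in-place updates on a conventional drive.
        self.index[lba] = len(self.log)
        self.log.append((lba, data))

    def read(self, lba):
        pos = self.index.get(lba)
        return None if pos is None else self.log[pos][1]

cache = MediaCache()
for lba in (907, 3, 511):    # scattered LBAs, one sequential log
    cache.write(lba, f"data@{lba}")
```

Cleaning, discussed next, is the process of migrating these logged writes out of the cache and into their home bands.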
B
If
you,
if
you're,
only
accessing
primarily
you're,
only
accessing
a
very
small
logical
space,
and
it's
going
to
tend
to
belong
to
a
very
few
number
of
bands
and
the
cleaning
would
be
very
efficient,
and
this
this
has
been
extremely
effective
in
what
we
think
of
as
personal
compute.
The
client
market
notebook
drives
external
usb
drives
desktop.
This
is
where
the
millions
of
millions
of
drives
ship,
so
it's
great,
it
works
or
it
ordinates
markets.
But
the
challenges
are
this
disk.
Cache
is
limited
resource
and
it
takes
up
physical
space.
B
The
more
physical
space
used,
the
less
there
is
for
the
user,
or
really
because
no
one's
going
to
buy
a
4.9
terabyte
drive.
It
forces
us
to
have
higher
investment,
which
just
makes
the
product
lower,
yielding
or
later
with
being
available,
or
you
know
all
the
various
effects
of
trying
to
move
forward.
The
aerial
area
that
cleaning
process
from
the
disk
cache
that
makes
doors
complex,
and
there
are
all
these
limitations
of
actually
being
able
to
get
this
right
around
behavior.
B
We
have
a
lot
of
customers
that
say:
oh
yeah,
we're
doing
sequential
rights
as
some
of
your
dry
manchester
marsh
drives.
We
just
find
folks
and
then
we
hook
it
up
and
we
find
a
combination
of
things
right,
the
first
of
which
is
what
you
think
of
as
a
sequential
right
to
this
turns
out
not
to
be
sequential
from
the
disappointed
view
and
then
all
the
other
sunday
things
of
oh.
But
the
file
system
is
introducing,
I
know,
updates
and
just
just
plenty
of
challenges,
of
trying
to
get
to
parity
within
conventional
level.
That was the introduction to drive managed, and here's the rest of the story. As I mentioned earlier, there are different drive types: drive managed, host aware, and host managed. It's a mouthful, but drive managed, as I described, is backwards compatible and autonomous.
Host managed is the drive type that uses extensions to the ATA and SCSI command sets, which are being standardized right now. But the host-managed drives don't do just any reads or writes: they only allow writes within certain LBA ranges, only allow sequential writing, and don't allow reads before the data's been written, so various reads and writes would just systematically fail. It's not a random-access, direct-access device, so it has a completely new device type identity that regular code has never seen before. So, host aware is the type that this discussion is about.
This is the type that Seagate is evangelizing, but this is not a marketing pitch; hopefully you'll feel that way. It's a technical presentation of what the opportunities are here. So this is a superset of the two other types: it retains the backwards compatibility, so it uses the drive-managed techniques as a backbone, as a foundation, to deal with any write that might come in, and it uses the same extensions.
B
So
by
the
way
you
know
what
the
committee
system
recognized
the
standardizations
recognized
is
that
there
are
multiple
different
technologies
that
people
wanted
to
go
after
and
we
decided
they're
close
enough
so
that
we
will
use
exactly
the
same
command
sets
as
much
as
possible
and
so
far
that's
100
possible.
There's.
There
are
differences
in
some
of
the
models,
but
it's
it's
the
exact
same
commands
and
fields.
Instead of having a noticeably slow and sluggish experience (maybe you notice it most easily on your desktop, but eventually the data center would see it too), what do we have to do to have parity with conventional writes? How do we enable the world to consume these higher-capacity drives? We're adding capacity without adding components, right; we're not going to solid state, so the cost is very important.
We want to minimize the interface changes; we're trying to make it easy for software to evolve to make use of these. We want these devices to be general purpose, and we want to enable the devices to be consumed outside the personal compute space, where we've already been successful.
We'll get into that more in just a little bit. The devices have some key characteristics that will be exposed programmatically as parameters at the interface: how many different places you can be sequentially writing while keeping high performance, and how much random writing you can do. There are a bit more details there too. So we return to the story about SMR bands, the abstraction.
So we've taught you all, for decades, well, it's been about 25 years: in the 80s a drive was the same speed throughout the whole LBA space. In the 90s we started to introduce some technologies that gave us more capacity at the outer diameter and less capacity per track at the inner radius, and we now have this tribal knowledge, which has never been an outright contract, that low LBAs outperform high LBAs.
B
We
may
we're
about
to
vibrate
this,
so
I
would
actually
caution
again.
The
experience
would
be
interesting,
but
I
caution
against
trying
to
make
performance
architectures
that
are
based
on
the
expectation
of
this
relationship
between
lda
and
sustained,
so
so
zones
there
are
just
like.
There
are
different
device
types.
There
are
different
zone
types
most,
notably
a
device
can
have
conventional
space.
A device could say: I'm going to provide some amount of conventional space where there's no particular advantage to sequential writing and there's no write pointer. But the new idea right here is to have zones that each have their own write pointer and their own state of how they've been used so far. That state basically looks like a little state diagram for every zone. A drive that's brand new out of the factory, every zone is empty; as the zone is written to, it is open and it's being filled.
So the idea is that if you write at the write pointer, that's the most performant place to write, and the write pointer moves as a side effect of that write. So if you're doing sequential writes, the host doesn't have to ask the drive where the write pointer is; it knows.
The drive decides how to handle it, but the write succeeds. The reset write pointer operation then sends the write pointer back, puts the zone back into the empty state, and makes all the LBAs read as unwritten, like trim or unmap. If you do a read of an LBA right after you've done a reset write pointer, the drive will return all zeros, or whatever the initialization pattern is, so you don't have to access the media. But the reads will work, and they'll work deterministically; you know what data you'll get.
You can filter this based on a certain state: you can ask just for a list of the empty zones, or a list of just the closed ones. There's also a shortcut here, the "same" flag: if you receive a report of the zones with the same flag set, the drive is telling you that all of the zones are the same type and size. So this is what we consider to be general purpose.
The amount of space needed to isolate zones from each other is well below a percent, as far as we can see, but it's a number that you discover programmatically, like the capacity of the product. We think the only thing that's really useful is for the zone size to be a power of two and for it to be as small as practical, and that number seems to be 256 megabytes. Reset write pointer: I already talked about this; you specify a zone.
I talked about device capabilities. There are two parameters that you can query a device for, either through an ATA log page or a SCSI VPD page. There's an open zones number; the full formal name of it is a mouthful.
Tens are not enough, and thousands are probably not needed, so our current design target is 128. So it's possible that you may want to consider the number of metaslabs, if I understand the concept properly, to be 128, or to be equal to this value when you query the drive and ask what it is on that drive. The other value that's of interest is how much random writing you can do.
Within the LBA space, any of the tens of thousands of zones can be randomly written, but not all of them at once if you still want to have super performance. So how many? This number we think is going to be 16, so again, give us some feedback. It means, for instance, in a more conventional system: if you wanted to have a 10-terabyte drive where each file system had no more than one terabyte, so you have 10 partitions, then each file system can have a zone for random overwrite updates for its own metadata, and this will work.
The usage model is to start off by issuing report zones and find out what the zone configuration is, and if you find out that not all zones are the same size, maybe you want to reject that and say that's a niche-market box. Maybe there are some niche markets, for surveillance or deep archive or something, where there's good reason not to have all zones the same, but I think it would be perfectly fine for a general-purpose file system to say all zones should be the same size. Then limit your random zones.
Allocate zones to be randomly written as needed, limiting yourself, if you can, to that advisory parameter. For your random zones the write pointer may be hard to predict, but you don't care; you're never going to use the pointer there, I presume, because I don't think these zones are going to tend to ever get to the state where they're completely filled.
So the expected usage model is: just take some zones and do your random writing there; for the rest of the zones, use them for sequential writing. In these, the write pointers move implicitly, and again, you don't need to issue report zones with any frequency. Control how many zones are open; that will be up to your host.
Now, if you have an application that stores things only once, if you're in a cloud storage application that is storing data that nobody ever wants to throw away, or tells you they'll never throw away.
Financial records are not going to be thrown away, or at least not within the life of a drive, maybe after some years. For those systems, trying to reuse space may not be a big deal; but for other systems, you may want to reuse your space.
So here are some resources; you can get hold of me. The short form is: sample drives are available. We've announced our 8-terabyte drives, there will be host-aware variants of that, and samples are available.
Performance will get iteratively better over the coming months and quarters. The standards bodies' websites have the current drafts of the specifications. The stuff is going to go to letter ballot soon, and let me say that there are some extensions that I didn't mention here that just got approved last week, so it's richer than I described; that's me trying to keep the time short.
HGST has put out some user-space libraries, so people can start playing around with this with some emulation. For the SCSI layer components of Linux there are some improvements that apply to open source, so you can grab those. And consider coming to a new conference from the Linux Foundation, the first of which is next March, if possible. And back to the conclusions slide.
First of all, what I just described, about the timing and about what we were led to, is for the first generation of these commands and extensions, and people have been talking about additional complexities that we have said would be second generation, so maybe a ZBC-2. One of those things is the notion of a circular zone: some people want to have a circular reuse of the zone, where, some distance behind the write pointer, maybe the next eight megabytes is unrecoverable, and that would be just systematically true.
The drive would know it, but what you're talking about is even richer than that.
But right now there's no expectation that that would actually be brought to market, because of the many complications inside the drive of trying to understand how the drive can know whether data is healthy, given all these processes that are happening in the system to do scrubbing and checksums and all that.
Capacity, which has its own issues, but yes, to make sure: we often want to have the same functionality, including the performance. So people often regard a suitable replacement drive as having not just the same capacity but also, say, the same RPM. I believe that these device characteristics that I described are just another set of parameters: a suitable replacement drive would have not only the same capacity and RPM, but also the same zone size and the same parameters.
B
There
were
enough
ideas
that
the
minis
didn't
want
to
shut
down,
that
we
made
those
things
parameters
that
are
driving
exposed.
We
may
find
that
the
market
just
settles
on
a
certain
set
of
parameters
that
everyone
accepts
as
being
good
parameters,
and
then
every
driver
will
have
the
qualities
that
the
drive
being
replaced
will
have.
But
it's
too
early
in
the
technology
to
know
that's
going
to
happen.
So if you want to write a file system to a key-value object store, which some file systems and file services, such as Ceph, for instance, are going to do, now you can just buy this key-value device. And we have a team inside Seagate that is going to take on all these challenges that I just exposed to you and solve them for the device that you're going to ship.