From YouTube: Video and Metadata Standards with Livepeer - Yondon Fu
Description
Yondon will share learnings and observations from Livepeer, as they work across ecosystems to help steward shared open standards.
My name is Yondon and I'm one of the co-founders of the Livepeer project, which is a decentralized video streaming network. I'll talk a little bit more about Livepeer as we go through the presentation, but the title of today's presentation is Video and Metadata Standards, and I'm going to go through a few topics today.
Taking a step back, I'm going to talk about a decentralized storage video compute pipeline, which we think we can enable for the broader web3 community to support the next generation of enhanced video experiences on the internet. But before getting into all of that: metadata standards. This is the starting point, and the first question that I think is worth asking is, well, what do we mean when we say metadata? This might be an obvious question to a lot of folks here, but in the interest of establishing firm ground before moving forward: at the end of the day, metadata is the set of data that describes another piece of data, hence the term "meta." I think the concept that best illustrates this today is NFT metadata.
On this screen, on the top, we have a screenshot of OpenSea showing an Azuki NFT, and on the bottom we can see the actual metadata associated with this NFT. An NFT is an interesting form of on-chain asset that also carries on-chain property rights, and this notion of ownership on chain is super important. However, the off-chain metadata linked with that asset is just as important, because that information gives the NFT additional meaning, whether it's traits (in this case, the different attributes associated with the Azuki NFT) or links to other pieces of off-chain content such as images, videos, and so on. So I think this is a nice example that illustrates why metadata is relevant.
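To make that concrete, here is a minimal sketch of ERC-721-style NFT metadata, written in TypeScript for illustration. The field names follow the common marketplace metadata convention; the specific values and the CID are hypothetical placeholders, not the actual Azuki metadata.

```ts
// Metadata is data that describes another piece of data: here, the NFT.
interface NftMetadata {
  name: string;
  image: string; // often an ipfs:// URI pointing at off-chain content
  attributes: { trait_type: string; value: string }[];
}

// Hypothetical example in the spirit of the Azuki NFT shown on the slide.
const metadata: NftMetadata = {
  name: "Azuki #1234",
  image: "ipfs://<image-CID>", // placeholder, not a real CID
  attributes: [
    { trait_type: "Type", value: "Human" },
    { trait_type: "Hair", value: "Pink Hairband" },
  ],
};
```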
Generally speaking, this gets us to the question of, well, why do metadata standards matter in the first place? The answer I have on this slide is that if metadata formats are standardized, then multiple applications can build upon the same metadata, because they understand how to consume it, and we get a level of interoperability and portability where the same metadata format can be consumed anywhere. So this NFT here has the name Superfluous, and the same metadata is being read in both applications, but the user interface and user experience are different in each context, because they're targeting different types of consumer preferences and different product experiences that they'd like to create. This is one of the powerful things that standards help enable: these two applications can coexist on the same shared, fundamental data layer.
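As a rough sketch of what "consuming the same metadata" looks like in practice, here is one way an application might read an NFT's standardized metadata. This assumes ethers.js v6; the RPC URL, contract address, token ID, and gateway are hypothetical placeholders.

```ts
import { Contract, JsonRpcProvider } from "ethers";

// The only ABI fragment a consumer needs: ERC-721's standard tokenURI.
const ERC721_ABI = ["function tokenURI(uint256 tokenId) view returns (string)"];

async function fetchMetadata(address: string, tokenId: bigint) {
  const provider = new JsonRpcProvider("https://rpc.example.com"); // placeholder RPC
  const nft = new Contract(address, ERC721_ABI, provider);
  const uri: string = await nft.tokenURI(tokenId);
  // Resolve ipfs:// URIs through a public gateway for illustration.
  const url = uri.replace("ipfs://", "https://ipfs.io/ipfs/");
  return (await fetch(url)).json();
}
```

Any number of applications can run this same read path and then render the result however their own product experience calls for.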
Now, this is the gaming and metaverse track, so a good question is, well, why do metadata standards matter for gaming and metaverse applications? The answer I'd like to give here is itself a question: in the context of games and metaverse applications, how should content such as images and videos be referenced in asset metadata such that that content can be built upon in different contexts and different applications in this game world, in this metaverse world? One of the most powerful things about all the work that gaming and metaverse entrepreneurs and application developers are doing is that, given a simple starting state for a game world or a metaverse world, instead of having to build all the content and all the experiences yourself, you can drive that forward with the community, and developers can, in a permissionless way, extend the game world, extend game clients, and extend metaverse clients by reusing the same content and logic building blocks that were established previously by their predecessors. The example that hopefully illustrates some of the interesting things you can do when you're able to build permissionlessly on the work of others is this very simple one: imagine you have a trophy asset in a game that comes attached with an instant video replay of the victory.
What if we, as a developer, want to build an extension of the game client where, every single time the player holding the trophy asset enters a particular special zone in the grid, the instant video replay of the victory is played back in a virtual movie theater for everyone in the general proximity of that player to watch? This can be an experience that no one ever expected to build when the game world was first created, but someone can decide that it's an interesting experience they'd like to enable.
However, this requires you to be able to permissionlessly access that content. It requires you to be able to parse the metadata of these assets that are being created by other players and build on top of that. So I think that's one of the interesting ways we can approach metadata standards for gaming and metaverse applications. And lastly, this is a talk about video, so I'd be remiss not to talk about why metadata standards matter for video. The main thing I'm hoping people take away from this slide is this interesting question: in a video context, how should videos be referenced in metadata when there are so many different video formats and so many different possible renditions of the same exact visual content? When it comes to video, oftentimes the video that's produced is very rarely the video that you see. This might be because you have an MP4 version of the file and an MOV version of the file, or it might be because you've processed the file into different qualities.
Or you might have applied different filters. And this comes at odds with another property that we like in web3, which is verifiable data. The nice thing about CIDs in the IPFS ecosystem is that we can establish links in the metadata to a verifiable piece of content, because the CID is just the hash of that content. However, when it comes to dealing with this problem of video formats and renditions, a single CID on its own is not going to be able to represent all the possible renditions you might need today, nor, thinking forward, all the different versions of that video you might need in the future. What happens if someone comes out with a new format that you'd like to support, one that's only supported on certain devices? What happens if you want to enhance the video into a new quality that wasn't previously supported?
All of those things are extensions of the same content, and if you bake into the metadata just the CID of the single piece of content you have today, there are questions around how you should handle that in the future. On the other hand, in a web2 context, we can solve that problem: if we link a YouTube URL in the metadata, many possible renditions can be served from it. But unfortunately, that's a reference to a location and not to the data itself, and the link could break.
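A small sketch of that contrast, with illustrative type names:

```ts
// Location-addressed: one URL can serve many renditions, but the reference
// can rot, because the bytes behind it may change or disappear.
type LocationRef = { kind: "url"; url: string }; // e.g. a YouTube watch URL

// Content-addressed: verifiable (re-hash the bytes and compare to the CID),
// but it pins exactly one encoding of the video.
type ContentRef = { kind: "cid"; cid: string }; // hash of one specific file
```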
So this leads to an idea that we've been working on at Livepeer, and generally a way we can support the web3 ecosystem: this notion of video compute on top of decentralized storage, where everything starts with decentralized storage as the shared verifiable data layer, but you layer compute on top in order to augment and enhance the content that was already anchored into decentralized storage.
The nice thing about CIDs in metadata is that, as shared verifiable data, they can be the root of content built on by others. The CID, being just the hash of a particular piece of data, can serve as the original reference for the content, and additional renditions, whether they're different qualities or filtered versions of the original content, can be produced by the compute layers you stack on top of the CID. The ultimate root and input is still the CID of the original content, and a way that I like to think about it is that we can view CIDs as the base ingredients for enhanced renditions.
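One way to picture that, as a sketch with illustrative field names rather than a published Livepeer schema: renditions are derived records that always point back to the original content hash.

```ts
// A derived rendition always carries the CID of the root content it came from.
interface Rendition {
  cid: string;         // hash of the derived file
  label: string;       // e.g. "720p" or "4k-upscaled"
  derivedFrom: string; // CID of the original content (the root input)
  transform: string;   // the compute that produced it
}

const rootCid = "bafy...original"; // placeholder CID
const renditions: Rendition[] = [
  { cid: "bafy...720p", label: "720p", derivedFrom: rootCid, transform: "transcode" },
  { cid: "bafy...4k", label: "4k", derivedFrom: rootCid, transform: "super-resolution" },
];
```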
So something that I'll highlight here: let's say we have an NFT and the metadata for that NFT. I'm sure many people are familiar with the Rick Roll video, so I chose to use that as an example here, because there's actually an interesting news article that came out last year that I'll get into momentarily. Let's say we have a Rickroll NFT: we have a CID that references the Rickroll MP4 file, and now that file is permanently linked with the NFT.
The Rick Roll video, as many people know, is a Rick Astley music video that came out a long time ago. When it was created, we as video technologists didn't really have tools sophisticated enough to create really high-res versions of it. But in 2021 there was a Verge article noting that someone, using new AI-based video techniques, upscaled the original Rick Roll video. So now you can watch Rick Astley in 4K.
Now you can get rickrolled in 4K. That's an example of a post-processing step that was not available at the time the content was released. But naturally you want to see if you can enhance the experience of these videos, so you're going to want to explore applying these forms of compute in order to enhance what you already have. In this case, we can apply compute on top of the CID in order to get this enhanced video: Rickroll in 4K.
So this brings me to a demo that I want to show, and hopefully it will work. But before getting into it, the feature I talked about earlier is IPFS CID video streaming playback. The general idea here is that the plumbing for video streaming online is a process called transcoding. Transcoding is the process of taking an input video and transforming it into all of these different qualities, renditions, and formats, so that you could be on your Android device or your iOS device, on a high-bandwidth or a low-bandwidth connection, and continue watching the same content seamlessly. If you go to a website like YouTube and look at the UI, you can actually see that there are multiple different qualities you can pick from, and by default the player on the front end intelligently chooses for you, so as a user or a consumer you never have to touch anything.
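To illustrate, here is a sketch of a typical transcoding "rendition ladder" and a deliberately naive version of the quality selection a player performs; the profile numbers are illustrative, and real players use much smarter heuristics.

```ts
// One input video fans out into several output profiles, highest first.
const ladder = [
  { name: "1080p", width: 1920, height: 1080, bitrateKbps: 6000 },
  { name: "720p",  width: 1280, height: 720,  bitrateKbps: 3000 },
  { name: "480p",  width: 854,  height: 480,  bitrateKbps: 1200 },
  { name: "360p",  width: 640,  height: 360,  bitrateKbps: 600 },
];

// Pick the highest-bitrate rendition that fits the estimated bandwidth,
// falling back to the lowest rung if nothing fits.
function pickRendition(estimatedKbps: number) {
  return ladder.find((r) => r.bitrateKbps <= estimatedKbps) ?? ladder[ladder.length - 1];
}
```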
So this comes to what we've been doing at Livepeer. Livepeer is a protocol and a network for supporting global, open, decentralized video infrastructure, and one of the primary tasks and forms of compute that the network is responsible for today is video transcoding. Video transcoding is a very heavy process, but it can be hardware accelerated, so for someone such as a crypto miner with access to a GPU, that GPU can actually be quite efficient for this task that we work on in the network.
So, coming back to transcoding: as I mentioned previously, in a lot of cases, if you want this efficient playback experience, if you want things to work regardless of what device you're on and what internet connection speed you're on, transcoding can help there. But what happens if all you have is an MP4 file? What happens if you have an NFT whose only link is a CID pointing to a file that already exists on IPFS?
Okay, so I don't think I have sound here, so I'll just narrate instead. What is happening here is that you often see the effects of transcoding a lot more prominently when you're on a shaky internet connection. So what you can do in the browser, and what I did here, is simulate a 3G connection, so it feels like you're on a mobile connection, as opposed to your high-speed internet, which might be fiber optic wherever you are. First I'm going to switch over to that mobile connection, and then, while on it, I'm going to show the playback experience when you take an MP4 asset and play it back from IPFS, and then using the video streaming approach I mentioned just now.
Next, what I'm showing here is that we have a docs page for the Livepeer JS SDK, and what you can do is take the same CID that was being played back from the IPFS gateway and plug an ipfs:// URL with that CID into this page. What this shows is that it's going to try loading the video-streaming-based playback, and from here we can actually see the same exact content being played back via the CID that was inputted. The nice thing here is that we don't actually get any buffering. We're still on the same 3G connection, so it's still shaky; it's still not going to be fast enough for a lot of types of data downloads. But because we transcoded the video, we get access to these different versions that the player can switch between, and from an end-consumer point of view, playback continues and you don't get the spinning circle of death anymore.
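For a sense of the wiring involved, here is a sketch of adaptive playback in the browser. This is not the Livepeer SDK's API; it uses hls.js directly, and the manifest URL (a multi-rendition stream derived from the original CID) is a hypothetical placeholder.

```ts
import Hls from "hls.js";

const video = document.querySelector<HTMLVideoElement>("#player")!;
const manifestUrl = "https://playback.example.com/hls/<CID>/index.m3u8"; // placeholder

if (Hls.isSupported()) {
  const hls = new Hls();
  hls.loadSource(manifestUrl); // load the multi-rendition manifest
  hls.attachMedia(video);      // hls.js switches qualities automatically
} else {
  video.src = manifestUrl;     // Safari can play HLS natively
}
```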
So here you can get shared verifiable data via IPFS, but you can also apply compute on top in order to get this enhanced video experience. And that leads me to the last portion of my talk: we just showed this notion of IPFS CID video streaming playback, but taking a step back, we think we can generalize this feature, and when we generalize it, we end up with what we call a decentralized storage video compute pipeline. Transcoding is what I just demoed, and transcoding is a big part of supporting video on the internet.
Cool. So a first question is: with this video compute pipeline, what other forms of video compute could actually be interesting? What does that even mean? I present a few examples here that hopefully illustrate a little bit of what you can start exploring when you have this pipeline.
One example is transcription: even if you didn't start off with a transcription of the dialogue in a video, you can auto-generate one now, and it can actually work pretty well. Someone actually did that for a whole slew of videos on YouTube. The second screenshot shows the Lex Fridman podcast; all the videos are publicly available on YouTube, and someone downloaded all of them, transcribed them, and generated captions for every episode, which is pretty cool.
The next step you can take from there: once you have the captions, well, you want to support them in your video as well. So something you can do is auto-generate the captions and then also automatically insert them into the video stream, so that, similar to what you see on YouTube, where you can turn on closed captioning for a video, you can support this in more applications too; you don't necessarily need to be YouTube.
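As a sketch of that insertion step, here is how auto-generated transcript segments might be turned into WebVTT captions that standard players understand; the Segment shape is a hypothetical stand-in for speech-to-text output.

```ts
interface Segment { startSec: number; endSec: number; text: string }

// Format seconds as an HH:MM:SS.mmm WebVTT timestamp.
function toTimestamp(sec: number): string {
  const h = String(Math.floor(sec / 3600)).padStart(2, "0");
  const m = String(Math.floor((sec % 3600) / 60)).padStart(2, "0");
  const s = (sec % 60).toFixed(3).padStart(6, "0");
  return `${h}:${m}:${s}`;
}

// Emit a WebVTT file: header, then one cue per transcript segment.
function toWebVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${toTimestamp(seg.startSec)} --> ${toTimestamp(seg.endSec)}\n${seg.text}`
  );
  return ["WEBVTT", ...cues].join("\n\n");
}
```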
I think another good example is something called super resolution. That's just a fancy term for: can I take an original video that I have and increase its quality, so it looks crisper and more quote-unquote modern? We've seen a lot of examples of this where people try to restore old footage and then basically enhance it, so that maybe an animation that came out in 1995 can be restored to look like it came out in 2015 or 2020.
There's some interesting compute work here on the AI side, and one of the most popular models for this is called ESRGAN. On the left, I show a screenshot of something I did locally: I took an input video and then applied ESRGAN to it, so that we could upscale it and increase the resolution. You might not be able to see it perfectly here.
I think another interesting example: many people here might have bought into, or been checking out, the generative AI hype on Twitter. What's interesting about generative AI, and some of the stuff people have been working on with Stable Diffusion, is that on one hand people are working on models to transform text to video, but even before we get to the text-to-video models, we can already generate interesting videos from text using text-to-image models.
At the end of the day, the output of this pipeline is encoded video that can be transmitted on the internet, and we think this is an interesting way to think about how to extend the data you have in decentralized storage, augment it, and enhance it, so that you can not only have the experiences you're used to with media in web2, but go beyond that as well. And at the end of the day, the CIDs remain.
The last thing I'll mention, and I'll close with, is this notion of verified inputs and verified outputs. I think an important question to ask here, and one that we're looking into, concerns a world of many formats and renditions of the same original video. A CID is a verified input, in that you can verify that the data matches the hash of the CID; signed data is another, where someone takes their private key and signs the hash of the data.
Given a verified input, how do we link the video output of the pipeline back to that input? Ideally you have something that looks like the illustration I have on this slide, where, transmitted with the video output, there is a linkable provenance chain through the transformations that were applied, back to the verified input. So: verified input in, verified output out.
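As a sketch of what such a provenance chain might look like (an illustrative shape, not a published standard; hashes and signatures are placeholders):

```ts
// Each step records what it consumed, what it did, and what it produced.
interface ProvenanceStep {
  inputCid: string;          // CID of the data this step consumed
  transform: string;         // e.g. "transcode:720p" or "super-resolution"
  outputCid: string;         // CID of the data this step produced
  operatorSignature: string; // signature over (inputCid, transform, outputCid)
}

// The audit chain travels alongside the video; a verifier walks it from the
// verified input forward, checking that each step consumes the previous output.
function verifyChain(chain: ProvenanceStep[], rootCid: string): boolean {
  if (chain.length === 0 || chain[0].inputCid !== rootCid) return false;
  for (let i = 1; i < chain.length; i++) {
    if (chain[i].inputCid !== chain[i - 1].outputCid) return false;
  }
  // Per-step signature verification is omitted in this sketch.
  return true;
}
```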
That's something we're spending cycles thinking about at Livepeer, and we think there are some interesting things you can do here with cryptographic proofs: providing audit chains for the media, and having those audit chains transmitted alongside the video or piece of media, so that when you're able to view it, you're also able to consume, display, and use that provenance chain as a developer.
So CIDs can actually serve as the shared verifiable data layer that then provides inputs into this video compute pipeline based on decentralized storage. I mentioned CID video streaming playback as an example of what you can do with that pipeline rooted in decentralized storage. And lastly, this general video compute pipeline, anchored in decentralized storage, we think can be a general approach to creating enhanced video experiences with different types of video compute on top of decentralized storage.