From YouTube: Software-Defined Assets Demo
Description
This video shows off how to get started with Dagster’s Software-Defined Assets, and the features they enable. It walks through creating assets in Python, as well as loading them from tools such as dbt.
Read more about Software-defined Assets in Dagster here: https://dagster.io/blog/software-defined-assets
Hi, my name is Owen and I'm a software engineer working on Dagster. This demo will walk through how to get started with software-defined assets and the powerful features that they enable. Through it, we'll build out a sample data platform, showing how to create our own assets in Python, as well as how to load them from external tools such as dbt.
With that, let's jump into the code. Let's start from the very beginning: our team wants to take some raw data and store it in a data warehouse. We'll start with a few imports, then define a function to generate our raw data. To keep things simple, we'll just grab some data from Wikipedia using pandas, but this is completely arbitrary Python code: we can use whatever libraries we want, and load, operate on, and return data of any type. By annotating our function with the asset decorator, it becomes an asset.
The key, or name, of the asset is country_population, and its contents are computed using the function we just defined. You'll notice that we haven't yet told Dagster how and where to store this data. We could include that logic inline, but it's often useful to keep this business logic separate from I/O concerns. By default, our assets will be stored as pickled files on our local file system, but this behavior is completely customizable, and Dagster has built-in support for storing assets with major cloud storage systems such as S3, ADLS, and Snowflake.
We can monitor its status directly from this view, seeing that a run is currently refreshing it, or we can jump into a live-updating timeline to view detailed logs as they come in. Once the run successfully completes, our data is stored and ready to be used. From here, it's natural to want to take our new data and do something with it. For example, we might want to use the raw population data and aggregate it per continent.
We define a data dependency simply by adding an argument to our function with the name of the asset that we want to depend on. Once we turn this into an asset, Dagster will handle the rest, from creating a lineage link between these two assets to loading the contents of country_population as input to this function when it comes time to run it.
While that's running, it's useful to take a step back and consider the benefits we're already getting out of this declarative model after just a few lines of code. At no point in this process did we need to think about tasks: we just wrote the code necessary to compute the contents of our assets, and the orchestrator was able to string these definitions together to execute them in the proper order.
In addition, our orchestrator has a direct understanding of the assets that it's responsible for, giving us insight into how they're computed and how up to date they are. Now that we have a couple of assets working smoothly, let's fast forward a bit and see how this scales. To keep everything organized, we'll break things out into separate files, then combine our assets into a single Dagster repository. First, we'll load in the population data assets that we just created.
Finally, we'll bring in a machine learning team that will use the data processed by dbt to train a machine learning model. These assets were defined in Python, just the same as our original assets. Once again, the code inside these assets is completely arbitrary, and you can use whatever tools or libraries you want. With all these assets added to our repository, we now have a number of assets which are represented as pandas data frames in Python, then serialized as files on local disk.
You might have noticed that changing the storage location of our assets didn't require modifying the definitions themselves. This makes it easy to write unit tests for our assets, as the business logic stays decoupled from the external systems where the asset will be stored. Each asset can be assigned a different I/O manager, allowing precise handling of storage behavior.
Now that we have some more assets defined, let's head back to the UI. The first thing we'll notice is that our original population assets now have some downstream dependencies.
A
If
we
go
to
the
global
lineage
graph,
we
get
a
total
view
over
all
of
the
assets
in
our
data
platform,
this
graph
spans
groups,
jobs
and
code
locations,
giving
you
insights
into
your
data
dependencies.
Regardless
of
how
you
choose
to
organize
your
code
and
execution
clicking.
Any
of
these
assets
will
bring
up
a
sidebar
containing
some
high
level
information
about
the
asset.
This is one of the many views throughout the UI that allows you to see which partitions have been computed for an asset, making it easy to tell if any data is missing. If we want to learn more about a specific asset, such as our population summary table, we can search for it by name and get taken to its asset details page. From here, we can see information on every time it's been materialized, its definition, and which assets it relates to. If we want to refresh this asset, we can do that directly from this page.
This sort of workflow is also easy to put on a schedule. If we go back to our code, we can define a new job that targets a selection of assets that are upstream of our population summary table, and put it on a schedule that will run it once a day. Back in the UI, we can see our asset is now on a schedule, and the job we created shows up, ready to be run.