From YouTube: Data Ingestion as Code - a showcase of managing managed services. Dagster Community Day - Dec 2022
Description
In this session, Ben Pankow provides a demo of 'Data Ingestion as Code', and Nick Schrock explains the broader context of 'managing managed services' from the perspective of the data orchestration layer.
Ben: Hi everyone, my name is Ben and I'm an engineer working on Dagster. Today, along with Nick, I'm excited to share our new data-ingestion-as-code functionality for our Airbyte and Fivetran integrations. This feature allows you to manage your Fivetran and Airbyte connections without leaving your Python code base.
Ben: Here we have an Airbyte resource that's pointing at our local Airbyte instance. We'll go ahead and define an AirbyteConnection object. We'll provide a name and then define the source and destination using typed Python classes, which are automatically generated from your Airbyte spec files. We'll input config, passing credentials from the environment. Then we'll specify the list of streams to sync, including the sync mode, and finally we'll tell Dagster to load this connection.
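The steps Ben walks through can be sketched in plain Python. This is a toy model of the pattern, not the actual dagster-airbyte API: the class names, fields, and sync-mode strings below are hypothetical stand-ins for the generated typed classes he describes.

```python
import os
from dataclasses import dataclass, field

# Toy model of "ingestion as code" -- hypothetical class names, NOT the
# real dagster-airbyte API. The idea: a connection is typed Python data,
# with credentials pulled from the environment rather than hardcoded.

@dataclass
class PostgresSource:          # stand-in for a generated source class
    host: str
    database: str
    username: str
    password: str

@dataclass
class SnowflakeDestination:    # stand-in for a generated destination class
    account: str
    warehouse: str
    password: str

@dataclass
class Connection:
    name: str
    source: PostgresSource
    destination: SnowflakeDestination
    streams: dict = field(default_factory=dict)  # stream name -> sync mode

conn = Connection(
    name="postgres_to_snowflake",
    source=PostgresSource(
        host="localhost",
        database="app",
        username="reader",
        password=os.getenv("PG_PASSWORD", ""),         # from the environment
    ),
    destination=SnowflakeDestination(
        account="acme",
        warehouse="LOADING",
        password=os.getenv("SNOWFLAKE_PASSWORD", ""),  # from the environment
    ),
    streams={"users": "incremental", "orders": "full_refresh"},
)
```

Because the connection is an ordinary Python object, it can be diffed in a pull request, reviewed, and versioned like any other code.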
Ben: If we move over to our Dagit instance and reload our asset definitions, we can see a new set of software-defined assets associated with the tables that Airbyte is generating in our destination in Snowflake. If we click on them, we even get metadata, such as the table schema that Airbyte is going to generate once we run a sync. Selecting our assets, we can go ahead and kick off the materialization.
Nick: Thanks, Ben, that was an awesome demo. First of all, for those who don't know me, my name is Nick Schrock. I'm the CTO and founder of Elementl. What Ben demonstrated here today was not just a feature of a couple of integrations, though what features they were; we really think it's a massive leap forward for practitioners in the modern data stack.
B
So
why
do
we
think
that?
Well
what
Dem?
What
this
demo
showed
is
that
now
ingestion
tools
can
be
a
first-class
citizen
in
your
engineering
workflow.
You
can
manage
their
behavior
in
modern
type
to
python.
You
can
manage
change
with
Source
control
and
get
all
the
associated
benefits.
Cicd.
You
can
review
changes,
you
can
roll
them
back.
You
can
test
them.
You
can
build
your
own
abstractions
on
top
of
them,
and
Dexter
remains
a
source
of
Truth
for
your
asset
definitions
rather
than
having
them
be
defined
as
state
in
a
managed
service
or
app.
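The "you can test them" point is literal once connections are plain Python data: an ordinary assertion can validate ingestion config in CI before it ever reaches production. A minimal sketch, with a hypothetical config shape:

```python
# Hypothetical connection config expressed as plain Python data.
connection = {
    "name": "postgres_to_snowflake",
    "streams": {"users": "incremental", "orders": "full_refresh"},
}

ALLOWED_SYNC_MODES = {"incremental", "full_refresh"}

def check_sync_modes(conn: dict) -> None:
    # The kind of guard a point-and-click UI can't run in CI:
    # fail the build if any stream has a typo'd sync mode.
    bad = {s: m for s, m in conn["streams"].items()
           if m not in ALLOWED_SYNC_MODES}
    assert not bad, f"invalid sync modes: {bad}"

check_sync_modes(connection)  # passes; a typo would fail the CI run
```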
Nick: This is the way that engineers want to work, but it will not end with Airbyte, Fivetran, and other ingestion tools. There are, and will be, other managed services that define and control the behavior of asset definitions within their tool. The question is: how do you want to manage change with those tools?
Nick: In other words, what manages the managed services? Let's start with our fundamental assumptions. We at Dagster believe a few things, and our work is centered around these beliefs. We believe that data management is a software engineering discipline, which means that all data assets should be defined in software, meaning code, because data assets are fundamentally business logic. And change in these systems, because it's software, should be managed through the software engineering life cycle.
Nick: So how does that apply to these ingestion tools and other managed services? Well, if you believe that data management is a software engineering discipline, you shouldn't be using your mouse, pointing and clicking around a UI, to make production changes and deploy them. Put another way: a data practitioner should not be forced to point and click in a UI to make changes to, and deploy, business logic. It's incredibly dangerous and fragile. A lot of this work actually stemmed from our own internal data platform experience.
Nick: We extensively use ingestion tools, and it became increasingly scary and nerve-wracking to make changes to our own ingestion logic. What if someone screwed it up, mistyped something, or clicked the wrong thing? How do you roll that back? How do you figure out what happened? How do you figure out who did it? Maybe you have an in-app audit feature that may or may not be complete, but even if it exists, it's totally disconnected from the rest of your processes.
Nick: Additionally, everything that's encoded in your ingestion tool is completely interconnected with what is going on in the rest of your platform. You have downstream computations that depend on it, so we really want to manage this with code. Well, isn't this infrastructure as code? That is in fact a commonly held belief, and it's a reasonable assumption.
Nick: This is in fact a discussion on one of Airbyte's forums, and they plainly say: "We would like to be able to manage and update our data sync operations as code," which is exactly what we just showed you. So we should just use Terraform, right? Nope, we don't think so. Let's talk about why for a second. Terraform is a bespoke, custom DSL designed for managing infrastructure.
Nick: It was designed so that infrastructure engineers could set up load balancers, EC2 instances, databases, and the like. It's at a fundamentally different layer of the stack, and it's designed for a completely different persona. It's for infrastructure, not business logic. As a result, we don't think a data practitioner should be forced to learn it in order to define data assets.
Nick: They shouldn't have to learn a completely foreign toolchain and language to make changes to what is fundamentally business logic. Furthermore, Terraform is a tool that has no knowledge of the rest of your data platform and assets. It's a completely siloed black box. How does one declare a dependency on an entity defined in Terraform? You can't; you'd have to double-encode it. You'd have to write it in Terraform, then probably write it again in your orchestrator, and then set dependencies on it.
Nick: We believe that the orchestrator is the ultimate source of truth for defining and operating your data assets. It's where all your dependencies are defined, and it's where all those dependencies are enforced, because it's the orchestrator that enforces the order of execution. And it's that single operational pane of glass for your entire data team. Additionally, in our view, the orchestrator is at the center of your data team's engineering and deployment life cycle. It's where everything has to come together.
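The "enforces the order of execution" point can be illustrated with a toy dependency graph, using only the Python standard library (hypothetical asset names, not Dagster's actual API): downstream assets declare their ingestion-produced upstreams in one place, and a topological sort yields the run order.

```python
from graphlib import TopologicalSorter

# Toy model: each asset maps to the set of assets it depends on.
# "airbyte_users" stands in for a table produced by the ingestion tool.
deps = {
    "airbyte_users": set(),              # produced by ingestion
    "users_cleaned": {"airbyte_users"},  # downstream transformation
    "daily_report": {"users_cleaned"},   # further downstream
}

# The orchestrator's job in miniature: ingestion runs first,
# then each downstream computation in dependency order.
order = list(TopologicalSorter(deps).static_order())
```

Because the graph lives in one system, there is no double encoding: the same declaration that defines a dependency is the one that schedules it.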
Nick: As a result, we think it's very natural for ingestion tools and other managed services to be peers to dbt, Spark, Python-driven assets, and all the other tools, and to have a single, cohesive workflow and system for defining your data platform. So thanks for your time, and thanks for coming to Dagster Day.