From YouTube: Bundlegate - To bundle, unbundle, rebundle the data stack? w/ Nick Schrock & Scott Breitenother
Description
The last few weeks have featured a loud debate in the data community - should the data stack be bundled, unbundled, or rebundled? We're going to dive into this debate with Nick Schrock (Elementl) and Scott Breitenother (Brooklyn Data Co). This should be full of fireworks, so tune in!
Streamed live on YouTube and LinkedIn
A
What's up, everybody? It's Friday! Let's talk about some data. Today we have Nick Schrock and Scott Breitenother, so let's kick things off. What's up, everybody? Hello, hello.
A
So yeah, I think this chat came about because we were talking about a bunch of stuff happening around bundling over the past few weeks. So, a little on the bundle, I guess, for people who aren't in the know on what we're talking about. Nick, do you want to maybe introduce the topic?
B
He published a piece called "The Unbundling of Airflow," where he described the emergence of these more domain-specific tools, and he framed it as unbundling: Airflow is being unbundled. What was previously a hundred tasks in Airflow is now 100 dbt models. What was before, like, 50 ingest jobs in Airflow is now a single node that talks to Fivetran or Airbyte, or something like that.
B
That phenomenon was explained as unbundling. I would have framed it a little differently: it's more like the DAG is being pushed down into the constituent tools and federated across them, because each one of those tools has its own baby orchestrator in it. I haven't done it, but if you grep the dbt codebase, there's probably a topological sort somewhere in there that takes the DAG of dbt models and then runs a SQL command for each one.
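The "baby orchestrator" idea can be sketched in a few lines. This is only an illustration of a topological sort over a model graph, not dbt's actual implementation; the model names and the `execute` callback are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical model graph: each model maps to the set of models it depends on.
MODEL_DEPS = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "daily_revenue": {"orders_enriched"},
}


def run_order(deps):
    """Return model names so that every model comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def build_all(deps, execute):
    """Call execute(model) once per model, in dependency order.

    In a real tool, execute would run the SQL for that model; here it is
    whatever callable the caller passes in.
    """
    for model in run_order(deps):
        execute(model)
```

Calling `build_all(MODEL_DEPS, print)` prints each model name after the models it depends on, which is the per-model "run a SQL command" loop described above.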
B
So there's some notion of orchestration within each of those things, and this is kind of a phase shift in tooling. Airflow would still exist in those worlds, but it just becomes much more of a coarse-grained tool.
B
But I saw this, and it was very well timed for a new release that we were going to do around a new concept called software-defined assets. What he was saying was really in line with a lot of what we were thinking internally on the Dagster team. We didn't phrase it as unbundling, but we saw that so much knowledge was being pushed down into these constituent tools, and they were all much more asset-oriented than task-oriented.
B
And we had the opportunity to, in a first-class way, take all that knowledge embedded in all those tools and resurface it in a single operational plane. And so I took the opportunity to frame that as rebundling the data platform.
B
Thank you. But no, we wouldn't have used the term "rebundle" without that post, but the thrust of the content was exactly the same. We felt that as the logic of all these DAGs got pushed down into these constituent tools, it actually reduced observability and understanding, and there's really no cohesive surface to understand what's happening in your data systems. That's where the "Rebundling the Data Platform" post came from, which effectively describes the software-defined asset system.
D
And I mean, this totally resonates with me. To be honest, I found the article very intriguing, a gorgeous article, but I also felt like it didn't answer the question of how you could actually try to discard Airflow. To be quite frank, and I've been using Airflow since 2016 or 2017, a lot of what he was talking about would, in the original conception of Airflow, be considered abuse of Airflow.
D
If that makes any sense. And it was this dirty secret, really. I would always train people: do as little in Airflow as possible. It's a babysitting layer, and that sounds unsexy, but it's actually really, really important. If you've ever had cron jobs go down on Thanksgiving Day (how many of you on the show have had that?) and someone calls you to fix it, then you understand why this is important. But I think your idea, Nick, and what you guys are doing with Dagster, is phenomenal.
B
It's definitely in that spirit. I actually think a better name for infrastructure as code is software-defined infrastructure, because it doesn't mean just one thing: Terraform is one implementation of it. There's another system called Pulumi, where you actually write it in JavaScript and you create representations of software. But yeah.
B
We definitely view ourselves in that lineage. A bunch of domains of computing have been really well served by these more declarative approaches, where you declaratively say what you want and then some centralized system makes that into reality.
C
Well, a question: are we limited by the tools that you're orchestrating here? I think my challenge is that, with Dagster and this latest generation of tools, you are essentially orchestrating private SaaS, public SaaS, self-hosted applications, and your own applications.
B
Away without it. And the degree of control they allow us in the API affects the level of granularity of operational control you have over things. But even in the simple case, where all you have is an API call and it just materializes some static set of tables, say from Airbyte or Fivetran, we still think that this programming model really serves it well, because even with that, we still have nodes we can plug into the global asset lineage graph, and even that's a huge step function in terms of observability and understandability.
D
Well, let's say all you could do with very basic API access is ask a question: are you done with what you promised to do, as scheduled in the console? Just "are you done?" and "what's your schema?" With those two questions already, there's a lot you can do, and having that information solves a lot of problems where we're just trying to use a cron job when the other system might or might not be done with its task.
C
But I mean, I'm excited about this bundling and unbundling, and it sounds like everybody is pretty much saying the exact same thing but in a different language: everything is getting complex. But I'm excited about what's next. When you have either some sort of centralized single tool that does all the jobs, or something like Dagster that is so aware of all the tasks, what are the cool things that we can start to do?
C
Like, you know, materialization strategies: what grain, what system should this be materialized in? CDN, load balancing, all this kind of stuff, like frequency. Really, truly, if a system knows that I need to support dashboards or this kind of recommendation algorithm downstream, why can't we let something like a Dagster, or a Dagster 3000 in a year or two, make all those optimization decisions that humans are making?
C
Why should I have to decide whether I should materialize my intermediate models or not in dbt, or whether I should have a roll-up table or not? Shouldn't humans just know the logic? Shouldn't there be a computer that can run a thousand different scenarios and decide on the best way to deploy this infrastructure to optimize for my goals? You know what I mean? That's the thing that gets me excited.
B
Yeah, no, we totally want to push in that direction. Our dream is to be able to say, "Hey, I need this asset materialized by some policy." You just set declaratively what the business requirement is, and then the system takes care of scheduling, optimizations, materialization strategies, all that.
C
I mean, just drop a copy of it in ClickHouse or something like that, something that's really snappy for my dashboards, and materialize it before Monday morning, before the reports come in. I think that would be very, very exciting to be able to do in the future. That's what I'm pumped about, for sure, to work on that.
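A minimal sketch of what "declare the requirement, let the system decide when to run" could look like. The `MaxLagPolicy` class and `needs_refresh` function are hypothetical names invented for this illustration; they are not Dagster's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class MaxLagPolicy:
    """Hypothetical declaration: the asset may be at most max_lag old."""
    max_lag: timedelta


def needs_refresh(policy: MaxLagPolicy,
                  last_materialized: Optional[datetime],
                  now: datetime) -> bool:
    """True when the asset has never been built or is older than the policy allows."""
    if last_materialized is None:
        return True
    return now - last_materialized > policy.max_lag
```

The person declaring the policy only states the business requirement (for example, "never more than a day old"); a scheduler would call something like `needs_refresh` to decide when work actually has to happen.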
D
Let me take that off your hands, you guys. So what I've been thinking lately, and it's not just in the context of data, just up front, but it was driven by thinking about Airflow and Dagster and data orchestration tools, is fundamentally this: let's think about what made Linux so successful. You had Linux come out, and it was this very open software ecosystem. Anyone could write software for it, which was not as true with Unix, right?
D
For example, the modern version of this is called systemd. So what does systemd do? Well, Linux has to run a bunch of different services to communicate with networks, to present the display, to do all kinds of things that you want to do in your software, whether it's a server or a desktop operating system. systemd takes care of all that stuff for you. It turns on different services. It turns them on in the correct order. It manages cron jobs. It manages essentially what you would call orchestration, but in a desktop environment.
D
So what I was thinking about increasingly with data orchestration tools is that what we're really talking about is like an operating system for the cloud. In other words, instead of managing services on a single box, I'm now managing services across an entire cloud, and that could mean that I'm managing BigQuery, Fivetran, dbt, Docker, maybe all kinds of things together. To do that, I feel like fundamentally there are three components that aren't yet unified but will probably be unified in the future.
D
So the first one, at least for me, being a data engineer, is going to be something like Airflow: a data orchestrator that manages tasks, that kind of thing. The next one is going to be something like Terraform. So not only are you orchestrating and timing things, but you're standing up the services and infrastructure that you need and making sure they're turned on.
D
Third, I'd probably throw a build system in there too, because I think at some point not only will you orchestrate these processes, but you'll check for the latest version of the code and run it. So there's going to be a build process in the mix, so that every time a data engineer changes a SQL query, it just auto-deploys in a common platform and you're not having to manage this across multiple platforms.
D
So yeah, the discussion around what you guys are doing with Dagster was just music to my ears. It doesn't fully realize that vision of being an operating system in the cloud, but I think we're moving in that direction. There seems to be this momentum towards solving, at a higher level, these problems that we've had in Unix and traditional operating systems.
A
Take over this whole conversation. I'm actually trying to troubleshoot some streaming connections; we're not working on LinkedIn for whatever reason again. So yeah, continue.
B
You probably know better than anyone here how all the data services and SaaS services are inching towards having Terraform providers. I think dbt Cloud has one now, and maybe Fivetran does. But do you use Terraform in the context of data platform deployment with your clients at all?
C
It's not there yet. It's funny: we do a lot of infrastructure as code, but we use Git-managed SQL, like grant statements and things like that. I guess there are two reasons: I think we've found that our clients aren't quite ready for Terraform, and the tools aren't really ready for Terraform. There aren't a lot of folks using Terraform to manage cloud data resources yet. I think we're just not there yet, and a lot of the services themselves aren't ready. I think Snowflake and some of the cloud data warehouses are further along, but Fivetran and dbt Cloud? I don't necessarily feel like using Terraform to manage them.
C
I mean, just making sure everything's working and, if it's broken, knowing when it's broken and re-running it: everything is a cron right now. We do manage things through infrastructure as code using SQL statements, but it's a challenging job to manage this much cloud architecture and cloud resources in the current state.
C
And I don't want to push Terraform on any of our clients yet, because they wouldn't be ready for it either.
A
Yes. A couple of minutes ago, well, about half an hour ago, I was talking with somebody who was trying to figure out the orchestration between Azure and AWS. They're using Amazon Managed Airflow there, and they're like, "Well, now we have to think about how we're going to do that in Azure, and this is really going to suck." And I'm like, "Yeah, it's totally going to suck for you."
A
"Now I've got to find a database. They're probably going to push me to use Synapse, and I really don't want to use that. I'd rather have something that interops between the clouds." I'm like, "Yeah, it's going to suck for you too. Snowflake's going to work for you." Even that's pretty simple: just set up cross-account. But yeah, interop.
C
I mean, to your point about that operating system on the cloud, Matt: you didn't explicitly talk about it being cloud interoperable, but I think that's huge too, because I chat with a lot of people that are on multi-cloud, and I wouldn't describe it as an intentional multi-cloud strategy.
C
You know, one where the probability of each of these clouds going down is 0.001, and by being on all three clouds you've reduced it. It's actually that every cloud is dependent on the others, so the risk of going down is kind of additive. And so managing cloud data infrastructure across clouds is such a pain right now.
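The "additive" point can be checked with a little arithmetic. This sketch assumes each cloud fails independently with the same probability, which is a simplification; the function names are made up for the example, and 0.001 is the figure quoted above.

```python
def outage_if_all_required(p: float, n: int) -> float:
    """Chance the stack is down when it needs all n clouds to be up."""
    return 1 - (1 - p) ** n


def outage_if_redundant(p: float, n: int) -> float:
    """Chance the stack is down when any single one of n clouds is enough."""
    return p ** n


p = 0.001
accidental = outage_if_all_required(p, 3)   # roughly 3 * p: the risks add up
intentional = outage_if_redundant(p, 3)     # far smaller: all three must fail
```

With accidental multi-cloud, where every cloud is a dependency, the outage probability is about three times that of a single cloud. Only the intentional, redundant setup actually reduces it.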
D
It really is. A lot of multi-cloud is just going to the data. In other words, people don't intentionally spin up Google, maybe, but it turns out that Google Cloud Platform is the best place to manage any kind of Google media or Google Ads data; it's just integrated with BigQuery.
D
But you're actually an AWS shop, so you're in AWS, but you've got some obscure services running on Azure because you have some Microsoft stuff around, and pretty soon you're in three or four clouds. And to your point, there's no central management layer, no central pane of glass to make sure that all those services are up and running. The other nightmare that you run into is when you're planning mitigation strategies. How do you actually execute a mitigation strategy? So you go through the simulation.
D
Let's say Azure goes down, or AWS us-east-1 goes down. There really aren't great tools to handle that process for you. You can have a playbook for DevOps, but that's about it. And a lot of what an operating system is supposed to do is manage service failures, for example, and try to mitigate them to some extent.
A
Oh, it totally is. I think it's one of those things where the clouds, well, there's definitely documentation for it, but they don't really tell you a lot about this kind of stuff, because it's a nice way to make money. Especially with AWS: go across regions, it's money. Go out of the cloud, it's more money, and so forth.
A
Yeah, so that's the big one. With interop and stuff, those are the costs that concern me, because even if you get interop working, that's still going to cost you some money. I'm still betting that at some point one of the clouds is going to drop egress fees, just like cell phone carriers.
D
And that's where I have to give it to Google. I'll give Google a shout-out here: they're the only cloud provider with basically their own private internet that you can run over, and so I wonder if we're going to see the other cloud providers do that in the future as well. It just seems like it makes sense, right? Just to your point.
A
No, man. Yeah, actually, that's a good point, thank you. But I think everyone made the move over. Martin asked: "I was wondering what is your view on all-in-one platforms, basically providing the majority of modern data stack pieces, especially the ability to orchestrate and run all the pieces in one place?" Go for it.
B
It depends: what do you define as an all-in-one platform? I think there are some startups who just kind of pick: okay, we're going to do one of the ingest tools and one of the transform tools and one of the reverse ETL tools, and it's kind of a platform in a box. But I think it's too brittle.
B
It kind of depends on what dimension, because you need uniformity on some dimension in order to make sense of things. But actually, the Meltano journey is kind of instructive.
B
It was an acronym for, like, a bajillion different technologies, because they were going to provide a data stack in a box. But some people want some stuff, some people want other stuff and have different requirements, so they had to move on from that approach.
C
Yeah, I mean, this is why I have a business. Literally, at Brooklyn Data, we don't build tools anymore. We are architects; we implement tools. We design an architecture, select the right combination of tools for a business, and configure them through code. And I consider dbt to be kind of code configuration, and things like that.
C
I've seen a lot of these all-in-ones, and I think for earlier-stage companies it actually can be quite freeing and easy, but I find folks tend to grow out of them, for the exact same reason that you said, Nick: it's not the extreme utopian version that I'd love to see, which is essentially an MDS, a modern data stack, marketplace, where you pick which ingestion tool, you pick which warehouse, you pick the transformation tool, and then it instantly configures it in the cloud and region of your choice. Nothing goes on the public internet, and it just clicks together, and it also has observability and dependency management across the cloud. Literally, I'm out of a job when that happens, but I'm happily out of a job. Right now, it's kind of pick one of each and that's it. I would love to see us get to this kind of MDS marketplace, but I don't think we will get there.
C
I think you'll start to see Snowflake, Databricks, these companies just picking up the tier-two or tier-three versions of each of these tools for a combination of stock and equity, just because they're so valuable that they can drop 20, 30, 50, 100 million dollars in stock on Fivetran's number-five competitor and then offer something like that. I think we're going to see some sort of wave of consolidation.
C
I think it's going to start with the smaller players being gobbled up by the giant big data players, but it's inevitable in every industry that there's going to be some sort of consolidation. It cannot go on like this. You see Matt Turck's diagram, which is phenomenal, and I feel like it must take more and more time every single year.
A
I wrote about this yesterday in the newsletter (the Ternary Data newsletter, by the way: exclusive stuff there, rants and so on). My data rant was about the pendulum swing between bundling, unbundling, and rebundling, and consolidation on consolidation. The thesis was: what's the next Informatica? Informatica is still a company with about a billion dollars a year of revenue, which is insane. That's actually about as much as Snowflake makes.
A
To me, that's both the meme and the essence of extreme bundling: Informatica does everything, but they're also considered kind of long in the tooth. So what's next? My whole thesis was: okay, to Scott's point and Nick's, as consolidation starts happening, who's the next Informatica? Is it that, basically, Snowflake does dbt, has a marketplace, and they become Informatica?
A
I don't know, but it seems like that's going to happen. And then all of a sudden it's like, "Yeah, that company's old. Let's go on to the next thing."
B
Yeah, I think the operating system analogy is apt in terms of the dynamics that you want the consolidation to have, because I don't think the world is well served by having the Snowflake stack, the Databricks stack, and these siloed things. Like you kind of said yourself, Scott, they acquire the fourth- or fifth-place tool, which isn't as good as the best tool. You can't do best of breed anymore.
B
So you have lots of players, and what this takes is a coherent horizontal layer that people can plug into, as opposed to pure vertical integration. Not to pivot back to talking about my startup, obviously, but there are many different theses around what this cross-cutting layer will be. I don't know if you need fewer boxes in that Turck diagram; I think you need a coherent framework, both conceptually and in terms of software, where we can make sense of the world.
B
It's this level of chaos, and everyone feels like it's out of control. Not to mash up every piece of data content that's been written in the last year, but Benn Stancil had a piece on a data OS, and I think he posited that dbt could be that layer. I think dbt can be a big part of that.
B
So I think what dbt is doing on the metrics side of things makes a ton of sense. But I just don't want to live in a world where there's a Snowflake stack and a Databricks stack, and people have to choose one or the other for their career. That just doesn't feel great to me.
A
Or even worse, the world of Linux, where you have a bajillion flavors of that too. But yeah, point taken. It's interesting you mentioned dbt. It's something Matt and I have been talking a lot about too: we're increasingly of the opinion that the data stack needs to live more at the application layer, at least the next-gen modern data stack. Data warehousing is great, but that kind of fights yesterday's battle.
A
So in our opinion, if you can couple things closer to the application, then it solves a lot of problems. For one, and we hear it over and over, metrics. The metrics layer, I think, exists in absolutely the wrong spot. The data warehouse is great; it solves that problem. But it really does need to exist at the application layer, or wherever the data is first generated.
A
It also solves the problem, incidentally, of master data management, whatever the hell that is these days. But yeah, especially with the rise of data apps and whatnot, I think there's actually a fundamental rethink going on right now. At least from our perspective, what we're seeing is a next-gen data stack happening before our very eyes, while everyone's focused on the modern data stack, which in our opinion is maybe kind of mid-century.
D
One thing to bring up here, and this is almost more of a business conversation and a financial conversation than a data one in a pure sense, is that what we're seeing is what I'll call startup speciation. Someone has an idea at one startup, or they're at Google, and they can get money, and they run off and start a startup, and that's part of what's driving all these data tools. That's fantastic, right? The unbundling is awesome for data users, because there's so much great activity going on. The question is: does that stop at some point? Does the funding slow down? Do all these small companies get bought up because they're not getting new rounds of funding? Sorry, this is probably making you guys uncomfortable.
A
Actually, there's a kind of related question here: thoughts on Snowflake's acquisition of Streamlit?
C
What is that one, the 170th or 180th these days?
B
Yeah, I think the acquisition of Streamlit is pretty exciting. The notion of having that sort of platform embedded directly in the data warehouse is super cool, so if they execute it well, it will be a huge win for customers. I think it can also push them into more ML use cases, which is super interesting. So I think it's a potentially brilliant acquisition by Snowflake.
D
Okay, Nick, can I put you on the spot real quick here and ask: do you think there's a scenario in which Dagster integrates more into the application layer, where somehow you guys are defining data that goes all the way down to the application layer, not just the analytics layer?
B
Yeah, so it depends. It was actually one of the original theses here: I built Dagster so that it made sense to software engineers, so to speak. One of my theories is that more developers are being expected to do their own ops via DevOps tools; "you build it, you deploy it" is kind of the Amazon way of doing things. By the same token, I think application developers at some point should be responsible for the analytics, or at least for structuring the data so that it can be used in analytical contexts. If a good product developer is doing their job, they should understand how the product is instrumented, what their code does, how it affects actual user behavior, and how that gets surfaced in the internal tooling.
B
So I think this entire discussion is pretty interesting, because the way that I define the data platform is: it's what you do with the data once it's ripped from its original context.
B
You're taking the data, and it used to be surrounded by all the software that manages it, right? And then you're literally pressing a button in an ingest tool and copying it, and then it's completely unprotected and divorced from the original business logic. Effectively, it's the job of an analytics team to reconstruct a lot of that business logic, in a way. And then there's also the problem that the upstream application teams can break you whenever they want. So I've been going around and around, but I think it makes a lot of sense, at a minimum, for the application teams to be much more aware of analytics.
B
From the application source, and that way, the moment they change it, they know they're going to break the analytics team. So having an expression of the metadata on both sides of the house, so to speak, is, I think, a very positive next step on that front. There should be, at minimum, awareness between the two of them.
A
Some of that I've been riffing on too. We kind of call it the definitions layer: at each stage where data is mutated or changed, there are schema definitions and so on; there's a semantic, a schema, and a metrics layer. So even at the application point, all metrics are defined by the application team first. Then you have a central source of truth, or somebody defines them, right? But then changes happen progressively, because you have data lineage, but you don't really have definitional lineage at this point.
A
I don't know. But that's the thing that plagues data teams. It's the trite stat that data scientists spend 80 percent of their time munging data and all this stuff, and I'm like, yeah, that really sucks, and maybe we should focus on fixing that one. I used to be a lean process junkie back in a former life, so when I see things like this, it makes my eyes bleed.
D
Back in the early days, what was Hadoop originally designed to do? Well, log processing, right? Because the assumption was that these applications were feeding you total garbage in their logs, and you had to make sense of it. There was no assumption they were going to feed you good-quality data. We're at least now mostly at the point where you assume that you're going to get these web events in maybe a nice JSON format, or something like that that you can handle. And I think, Joe, you've talked a lot about this too: in the streaming world, it becomes very hard to fix data issues downstream of the application, so you need to think about that process much, much earlier.
A
Yeah, I've talked to people who have designed a lot of the fast databases we use now, like Druid and others like that, and I ask them, "What do you think about data modeling?" They're like, "Oh, I don't even bother with streaming data. Just assume it's correct at the application level." And it's like, all right, that's cool, I'll just do that.
C
Yeah, I mean, we're very much in a batch way of thinking right now. I think you touched on a couple of points, which are essentially pushing accountability for data quality upstream, and streaming, which are two concepts that are just so different and new to the folks currently working in data. They're not bad concepts; they're, in fact, good concepts. (Terrible concepts, Joe, bad ideas.) They're good concepts, but I don't think this generation of folks that are just getting into data and the modern data stack, most of them, can even conceptualize events and pushing this stuff upstream. I think it'll happen, but we're talking about both a technology shift and a paradigm shift in the type of tools and how you approach data. Streaming data, streaming analytics, event analytics is a completely different thing.
C
Cool
I
mean
I,
I
think
I
think
it's
like
right
now,
it's
just
if
any
time
someone
says
they
need
real
time
or
near
real
time,
yeah,
you
know,
we're
like,
can
you
do
it
in
your
production
application
like
we
could
do
it
because
we
could
do
it
downstream.
We
could
do
it
in
their
kind
of
data
stack
but
they're
not
investing
in
the
tooling,
the
people,
you
know
just
the
budget
to
get
that
five
nines
that
uptime
that
level
you
know
most
companies
don't
have
the
reliability
of
their
data.
C
I
guess
a
compounding
factor
is
that
I
don't
think
the
technologies
have
gotten
good
enough
in
the
data
stack
they're
kind
of
five
to
ten
years
less
advanced
than
the
kind
of
the
production
data
technologies,
and
so
it's
like,
not
only
do
the
organizations
have
to
put
in
more
money,
it's
harder,
and
so
it's
like
you
know,
do
it
upstream
and
I
and
I
do
think
tools
like
I
think,
platforms
like
Firebolt.
B
Scott,
I
have
a
question
when
you
ask
your
clients
uh-huh
what
real
time
when
you
say
you
know,
I
imagine
you're
in
a
meeting
and
client
says
I
want
real-time
data
and
you
say,
okay,
can
you
define
what
you
mean
by
real-time?
What
do
they
actually
want,
usually?
C
Yeah
I
mean
I,
I
think,
that's
exactly
what
I
do,
because
you
know
everybody
wants
everything
as
fast
as
they
can
get
it.
Like,
a
hundred
percent,
instant
data
and
like
nanosecond,
millisecond
real
time
is
definitely
better
for
every
single
organization
than
five
minutes,
which
is
better
than
15
minutes,
which
is
better
than
an
hour.
But
the
question
is
like
how
much
better,
and
so
it's
like
you're
not
asking
exactly
what
real
time
means,
it's
like,
C
What
is
that
point
where
the
effort
and
the
value,
like,
is
worth
it
and
because
they'll
always
want
as
soon
as
you
can
have
it,
and
typically
in
a
typical
organization,
every
15
minutes
is
probably
as
kind
of
fresh
as
I've
seen
it.
Maybe
every
10
minutes,
but
most
organizations
we
work
with
is
every
one
to
four
hours,
and
they
don't
even
call
that
real
time.
They
just
call
it
like
ongoing
updating.
A
It's
a
slippery
term,
though,
right?
I
mean,
yeah,
for
the
audience,
Scott
and
Matt
and
I,
we
run
very
similar
companies
different,
but
we
think
we
deal
with
a
lot
of
the
same
types
of
problems.
A
So
if
you,
if
you
could
theoretically
get
data
every
picosecond,
for
example,
what
would
you
do
with
that
right?
And
so
when
it
reaches
an
extreme,
then
I
start
asking
okay,
so
at
a
certain
point,
are
you
doing
analysis,
or
are
you
trying
to
drive
a
behavior?
And
if
you're
trying
to
drive
a
behavior
or
action,
A
Why
don't
you
try
and
automate
that?
Like,
why
do
you
have
to
use
a
report
right?
So
if
you
need
data
that
quickly,
that's
not
something
a
human
can
process,
and
so
it's
really
that
continuum
of
are
you
trying
to
do
analysis
or
are
you
trying
to
take
an
immediate
action?
So
what
does
that
look
like?
And
then
you
know
from
there
sort
of
the
spectrum
of
you
know,
actions
that
you
could
possibly
take
up
to
and
including
just
automating
stuff,
which
Scott
would
love,
because
he
likes
automated
things
so,
but.
C
What
I
mean,
wouldn't
you
say
like
I,
I
agree
that
and
I
have
that
exact
same
conversation,
but
I
feel
like
every
organization
unless
they
indulge
in
bad
habits
like
watching
the
pot
boil,
every
organization
would
theoretically
benefit
from
having
as
real-time
data
as
possible.
I
mean
you
know.
I
agree
if
there's
something
they're
trying
to
trigger,
they
should
automate
that,
but,
like
every
organization
will
benefit
as
much
as
possible
from
knowing
as
accurately
as
possible.
A
Yeah,
I
remember
there's
a
local
company
here
that
you
know
has
a
data
analytics
tool
and
this
is
in
Salt
Lake
and
they
you
know
they
have
this
billboard
that
says
binge
watch
your
data
or
binge
watch
your
business
or
something,
and
I
was
like,
if
you
have
to
binge
watch
your
business
like
every
second,
your
business
probably
sucks,
actually.
So,
to
me,
you
should
start
automating
that,
like
why
do
you
have
to
pay
that
much
attention
to
a
chart?
C
Those
things
like
I'm
surprised
that
no
one's
cracked
this
so
far
because
I
do
think
there
is
a
lot
of
pot
watching
and
people
putting
rules
or
triggers
or
something
like
that
to
understand
anomalous
conditions
or
changes
or
whatever,
and
there's
a
lot
of
people
that
spend
the
first
30
minutes
of
their
day
looking
over
the
same
dashboard
to
see
if
anything
changed.
Why
do
you
think
that
hasn't
been
automated
yet?
It's,
C
It's
it's,
you
know
intriguing
on
what
sales
went
down,
what
you
know
was
it
and
I'm
gonna
go
back
to
my
mattress
days.
It's
like
was
it
mattresses
or
sheets
or
queens
or,
like
you
know,
was
the
conversion
rate
on
this,
and
just
there's
like
a
lot
of
people
that
just
have
great,
they
have
the
data,
they
have
all
the
dashboards
and
they
literally
have
the
combination
of
charts
that
they
can
triangulate
what
probably
happened.
It's
just
like,
why?
That
feels
like
a
computer
problem.
B
That
it
could
be
like
interesting
on
this
matter.
But
I
mean.
D
It
is
that
there's
a
cultural
problem
so,
for
example,
like
these
ads
platforms
right
for
the
most
part,
you're
gonna
have
a
hard
time
beating
Google
Ads
automation
for
advertising
for
you.
There
are
certainly
cases
where
you
can,
especially
if
you
have
a
very
sophisticated
data
science
team,
but
so
many
times
people
just
don't
want
to
let
go
of
the
wheel,
they're
like
no,
no,
I
need
to
tune
this
bid
manually.
D
Will
we
be
able
to
let
go
of
the
wheel
and
just
say
you
know
what
I
as
a
human
am
going
to
focus
on
other
things
besides
driving
all
the
time
or
I'm
going
to
let
this
part
of
my
life
be
automated
or
my
business
and
then
there's
the
technology
aspect
of
getting
the
technology
out
there,
but
yeah,
I
think
the
culture
part
is
a
huge
slice.
A
The
other
day
who
was
it
Ryan
from
thatDot?
It's
a
new
startup,
but
he
created
this
software
package
called
Quine,
which
is
it
has
a
direct.
It
came
out
of
DARPA
of
all
places,
but
what
it
is
is
like
a
streaming
graph
technology,
and
so
what
it
does
is
anomaly
detection,
using
a
graph
on
data
streams
and
on
the
surface
I
was
like
okay,
this
sounds
pretty
dang
cool,
solves
a
lot
of
problems.
A
Like
bad
actors
or
something
like
that,
but
I
think
it's
kind
of
the
general
case
of
a
lot
of
these.
You
know
we'll
take
anomaly
detection,
for
example.
I
think
there's
no
shortage
of
companies
doing
this.
In
fact,
every
data
app
database
you
talk
to
is
like,
yeah,
one
of
the
things
we
want
to
do
is
predict
fraud,
for
example,
or
some
sort
of
anomalous
behavior
with
streams,
and
then
I
think
it's
maybe
part
of
it
is
messaging
or
something
I
don't
know
because
then
it's
like,
well,
why
don't
A
I
just
use
like
the
tools
that
already
do
monitoring,
for
example,
instead
of
this
bi
tool
or
this
new
data
tool,
and
so
I
think
part
of
it
might
be
marketing,
because
I
do
agree
like,
if
you
take
the
humans
out
of
the
loop
on
this
stuff
like
why
not
like
this
is
like
the
perfect
case
for
automation,
if
there
ever
was
one
right
yeah
totally,
so
I
don't
know
it's
a
good
question,
we'll.
B
Yeah
I
mean
insofar
as
yeah
like
an
anomaly
detection
system
has
to
plug
into
orchestration
and
do
computations
on
a
regularized
basis,
but
certainly
not
something
on
our
near
term
radar.
Makes
sense.
C
And
it
just
everything
feels
I
think
my
theme
of
the
day
is
everything
relies
too
much
on
people
at
the
moment.
We
need
this.
We're
so
close,
I
think,
to
just
having
a
lot
of
really
cool,
having
the
people
focus
on
the
logic
and
the
business
logic
and
the
context
and
the
things
that
are
uniquely
hard
for
machines
to
do
and
letting
kind
of
machines
run
whether
it's
DevOps
or
kind
of
at
least
level-one
observations
of
data.
Interesting.
A
I
was
writing
about
this
this
morning,
actually.
I
was
writing
the
final
chapter
of
our
book
and
it's
about
the
future
of
data
engineering,
and
one
of
the
things
I
was
writing
about
was
how
you
know,
data
engineering,
I
suspect,
is
gonna
get
a
lot
more
enterprisey,
i.e.,
as
tools
become
simpler,
as
things
become
automated,
data
engineers
will
simply
just
move
up
the
value
chain
and
focus
on
next-order
problems,
right,
things
like
governance,
you
know
kind
of
data
management
related
stuff
things
that
were
traditionally
actually
very.
A
You
know
kind
of
blue
shirt
and
khaki
sort
of
tasks
that
were
done
at
you
know
kind
of
more
stodgy
companies,
but
now
this
has
been
democratized.
Governance
is
like
a
pretty
cool
word
these
days,
which
I
never
thought
I
would
see
ever
so.
A
You
know
data
management,
that's
another
one
where
I
see
that
as
a
field,
that's
going
to
see
a
renaissance
very
very
soon,
precisely
because
you
know
I
think
people
are
trying
to
write
off
the
death
of
you
know
like
data
engineers,
oh
that
field's
done
like
what's
left
after
you've
done
Fivetran
and
Snowflake,
and
I'm,
like,
I
think,
you're
missing
the
bigger
picture
here.
There's.
A
Across
the
chasm
you
know,
on
the
other
side,
with
the
MLOps
community,
we're
all
homies
and
there's
a
different
parallel
universe
going
on
there
right
now
you
know,
and
then
I
talk
to
software
engineers
and
they're
fascinated
about
data.
So
what
I
see
happening,
there's
going
to
be
a
convergence.
B
Yeah
too
yeah
this
data
ML
bridging
is
really
a
place
where
actually,
I'm
really
excited
about
what
is
going
on
in
Dagster
land,
because
we
kind
of
divide
our
customers
into
a
couple
of
personas,
and
so
one
is
what
I'll
call
the
modern
data
stack
platform
engineer
so,
like.
B
An
engineer
and
my
job
is
to
like
bring
in
these
different
tools
that
have
been
set
up
and
like
cohere
it
into
a
platform,
but
the
other
interesting
set
of
users
we
have
are
what
I
call
the
ML
engineer,
the
MLEs
or
ML
platform
engineers,
and
they
also
gravitate
towards
the
system,
and
then
actually,
we
have
some
users
where
they're
crossing
that
boundary
a
bit
right.
B
So
I'm
the
modern
data
stack
platform
engineer
I
have
like
got
my
data
engineering
team
spun
up,
but
now
my
mandate's
expanding
to
also
support
an
experimentation
platform
and
they
can
use
the
same
set
of
tooling
to
kind
of
cross
both
because
in
the
end
we
call
it
ETL
on
one
side
of
the
house,
you
call
it
feature
engineering
on
the
other
side
of
the
house.
It's
approximately
the
same
process,
you're
doing
computation
and
data
processing
to
produce
intermediate
artifacts
that
are
used
downstream
for
other
things.
B
Right-
and
you
know
we,
we
imagine
a
world
where,
like
the
ML
people
can
cross
and
use
dbt
to
like
change
the
data
warehouse
and
then
pull
data
out
of
that
to
do
ML
modeling
and
like
kind
of
do
it
in
all
one
surface
and
yeah
and
where
we
kind
of
see
ourselves
playing
in
this
space
to
kind
of
expand
on
the
previous
question
is
like
we
want
to
be
like
the
system
of
record,
where
all
your
assets
are
defined
in
software,
so
like
attaching
like
metadata
and
information
on
those
like
that.
B
Canonical
representation
of
the
data
asset,
like
you
could
like
attach
like
hey
this
rule
about,
like,
this
is
what
we
consider
an
anomaly
you
know
to
basically
like,
and
that
can
be
code,
and
you
can
like
effectively
attach
it
to
that
software
definition,
but
yeah
no,
and
I
think
that
you
know
we
see
this
data
ML
kind
of
boundary
fuzzing,
which
I
think
is
actually
really
positive.
D
Yeah,
I
totally
agree
yeah,
it's
kind
of
a
distinction
without
a
difference
in
some
ways
right,
like
yeah
ML
engineers
are
probably
also
managing
GPU
clusters
or
something,
but
I
think
right
now
what
we
see
is
a
lot
of
repeated,
wasted
labor,
where
the
data
engineering
or
even
data
warehouse
side
of
the
house
is
doing
stuff
that
then
the
ML
engineers
kind
of
repeat
in
their
pipelines.
A
I
got
into
data
engineering
from
ML
engineering,
yeah.
I
was
building
ML
engineering
systems
back
in
the
early
2010s
and
that's
how
I
got
into
data
engineering
so,
like
the
hard
part,
was
the
engineering
which
did
not
exist
back
then,
you
had
to
build
everything,
so
that
was
you
know
and
in
my
mind,
having
done
both
it's
kind
of.
I
don't
really
see
that
much
of
a
difference
honestly,
they're
titles,
but
it's
like
back
in
the
day.
A
You
were
just
called
a
software
engineer
there
wasn't
even
an
ML
or
a
data
engineer,
just
a
software
engineer.
So
you
know
to
me,
it's
almost
a
false
dichotomy
in
some
ways,
but
then
looping
it
all
back
to
the
application
too,
where
I
think
you're
going
to
be
more
tightly
bound
to
that.
That's
where
I
think
it's
going
to
get
very,
very
interesting.
So
who
knows.
C
I
mean
it's
very
interesting.
I
feel
like
most
of
the
organizations
I've
worked
with
that
have
had
a
strong
ML
function
and
not
like
you
know,
exploratory
research
ML,
but
like
production
ML,
it's
a
completely
different
team
that
rolls
up
to
a
completely
different
part
of
the
organization.
Like,
how
do
we
bridge
that?
Every
single
time
I'm
like,
why
aren't
these
on
the
same
teams?
And
the
answers
I
get
C
are,
you
know,
different
stacks
and
then
there
aren't
experienced
data
leaders
that
can
run
that
have
experience
in
both
areas.
There
are
some
but
there
are
not
many
that
you
know
you
either
come
up
the
ML
track
or
you
come
up
the
data
engineering
kind
of
you
know
now
analytics
engineering
track,
but
it's
only
if
you're
at
you
know
very,
very
large
organizations
that
have
like
a
chief
data
officer
or
chief
algorithms
officer
that
really
kind
of
rolls
up
a
big
part
of
the
ecosystem.
B
Yeah,
I
think
it
will
be
solved
not
by
merging
those
teams,
but
I
think
the
better
model
is
to
deploy
like
data
engineers
and
ml
engineers,
like
closer
to
their
business
function
and
kind
of
have
that
capability
spread
across
the
org
and
then
common
like
platform
and
services
teams
that
ensure
that
those
people
are
more
efficient
and
productive.
That's
like
the
other
way
you
can
think
of
it
like
merging
those
two
orgs
or
you
think
about
like
like.
D
And
I'll
kind
of
give
orchestration
a
shout
out
here
again
and
say
that
I
see
orchestration
platforms
as
what
should
be
a
collaboration
layer
right.
So,
instead
of
me
as
an
engineer
just
consuming
someone
else's
table,
I
should
basically
consume
their
dag.
In
other
words,
I
should
look
at
their
tasks
and
say:
actually
I
want
to
piggyback
off
of
this
stage
of
their
data
before
it
becomes
final,
because
then
it
hasn't
been
aggregated
yet
and
I
can
get
what
I
need
without
having
to
repeat
all
that
work.
A
So
it
looks
like
we're
coming
up
on
time.
I
think
we
also
wanted
to
give
a
shout
out
as
well
to
you
know,
what's
happening
kind
of
globally
right
now,
I
think
Nick,
Scott,
and
I
were
kind
of
talking
offline,
and
Matt
as
well.
You
know,
I
think,
we're
definitely
privileged
to
be
be
to
talk
about
data
right
now
while
there's
a
lot
of
chaos,
you
know,
and
really
bad
things
happening
in
the
world.
A
So
just
I
don't
know
if
anyone
has
anything
to
add
to
that,
but.
C
If
I
can
be
helpful
in
kind
of
grand
scale
of
life
events
or
kind
of
smaller,
smaller
individual
life
events,
don't
hesitate
to
reach
out.
B
I
think
it's
always
a
good
reminder
to
put
all
your
problems
into
perspective,
yeah
that
you
know,
we
all
think
our
lives
are
stressful
and
you
know
we
have
our
trials
and
tribulations,
but
this
is
a
reminder
that
we
should
always
be
thankful
for
what
we
have.
100%.
D
Yeah,
it's
funny
I
mean
I
I
think
running
a
business
is
a
very
narcissistic
process
and
probably
you
know
Joe
is
my
business
partner.
So
we
can
talk
about
this
very
directly,
but
it's
like
you're
running
this
business.
You
have
health
insurance
costs.
You
have
employees,
you
have
clients
you're
all
stressed
out,
but
then
you
step
back
and
you're
like
what's
the
worst
that
happens?
Oh
well,
I
stop
being
in
business
and
I
probably
just
go
find
another
job
like
and
yeah
when
you
start
to
think
about
stuff
other
people
are
going
through.
A
A
lot
so
yeah
so
great
talk
thanks
to
the
audience
for
the
questions.
I'm
sorry
things
didn't
work
on
LinkedIn,
I'm
trying
to
work
with
streamr
to
diagnose
it,
but
there's
a
chance
I'm
about
to
find
another
platform
provider
of
choice
if
this
doesn't
work
so.
A
Oh
boy
cool
well
anyway,
I'll
see
you
guys
and
also
on
the
Monday
Morning
Data
Chat.
We
have
Fabiana
Clemente
out
of
Portugal,
she's
with
YData,
we're
going
to
talk
about
synthetic
data.
So
if
synthetic
data
is
your
jam,
come
check
it
out.
She
knows
a
lot
about
this
topic,
so
thanks
have
a
great
weekend.
We'll
see
you
all
next
time,
thanks.