From YouTube: Data Pump Demo for Salesforce Team
Description
A demo from data engineering reviewing the data pump framework, discussed as a solution for a Snowflake-to-Salesforce integration.
A: Great, so let's go ahead and jump in. We're going to jump into a working example of how data pumps work today. All right, so the list that I have here in the agenda is — let me make this bigger — we're going to start with the model, so you can kind of see how this corresponds to this list. So first we're going to look at a model. (I broke my thing.) So this model is here. This is our dbt documentation.
dbt is a tool that we use to do all of our modeling. We also use it for our docs. So here are the dbt docs for the pump_marketing_contact model, which is the model that we're using to pump data from Snowflake to Marketo. You can see we have the column names here and what the data type is, and the way that we've been using the docs is that we use this description field to tell you what field we're mapping to, right?
So this way the integrations team that Daniel is on has a reference point when they're building out those fields and those mappings. If we're working with someone like Jack or Jim on the systems side, or the technical owners' side, they have a source to see what we're building and how we're imagining a mapping, and they can help us contribute to that mapping. So here we have a documented, version-controlled record of it.
This exists in the YAML file, also in our public repository for the data team, which we can point to later if you guys are interested. But yeah, this is the model, let's say — and if you go into Snowflake, you can query pump_marketing_contact, and this is the SQL that generates it, etc.
You can even, if you really care to see how our warehouse works, see a lineage graph of where the data comes from. So we're trying to leverage and resurface as much of what we already have in place as possible. So this is the model. Then the next step is in pumps.yaml.
We have this pump_marketing_contact model record, and then we have some attributes. We have a timestamp column: this is the column that the pump framework will use to increment the data. So if I want to query for everything that's changed since the last time I got data up to now, for this model we're going to use the last_changed column.
If there isn't an incrementing timestamp column — meaning we always want to get all the data every time we query — we would just put a null here and data pump will handle that. If the data is sensitive — in this case it is, and I think in many of the cases where we're going to a CRM we'll probably have some sensitive data as well —
we're going to mark that as true, and what that means is that this will look at the sensitive schema. So in the process here — we'll go through this in more detail in a second — the first step is to create a data model. Once that's done, or during that process, we'll evaluate what the data is like. If it's sensitive data, then as part of that model creation process we make sure it goes to the right place in the warehouse, right?
So if it has personal information in it, we have to be careful about where it goes. We have a dedicated schema in our warehouse for that sensitive data; it'll go there, and then this flag just tells the data pump that's where it is. This owner field is not handled at all by the data pump framework or by what's happening in the orchestration; it's just for documentation in this record.
So if something goes wrong, we want to know who to reach out to, and it's in the same place as this other information, so it kind of keeps it all together. That's that — any questions about pumps.yaml?
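(For reference, a record in pumps.yaml along the lines described above might look like the following sketch; the exact field names are illustrative, not taken from the actual file.)

```yaml
# Hypothetical pumps.yaml entry -- field names are illustrative,
# reconstructed from the attributes described in the demo.
pumps:
  - model: pump_marketing_contact
    timestamp_column: last_changed  # set to null to re-pull the full table each run
    sensitive: true                 # read the model from the sensitive schema
    owner: data-engineering         # documentation only; ignored by the framework
```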
A: Once it's in pumps.yaml, it'll be available in the Airflow DAG. So this is what Airflow looks like: each of these squares is a task instance, so each one is a time that it runs. You can see that we've been running this pump_marketing_contact — and this pump_hash_marketing_contact, which is a hash-de-identified version for testing — for some time. Just last week, Michael Walker on the engineering team added a product usage pump. This could be what we use for Salesforce, by the way.
I think this is the data that we're actually interested in; we'd just have to plug it in. So we can start using that to test if we'd like. Once it's in here, this is going to be running. We can look at a log here, and under the hood it's going to... I don't know if it's really obvious here what it's doing — maybe not!
It's a joke, yeah — just assume! I guess I could make the log verbose, but I don't think we benefit much from that. So here's the S3 bucket where it's landing, right? This one's actually from previous testing, so this isn't getting anything — actually, this might still be getting stuff; I have another task running in the background from testing that does this.
We have the marketing contact, and then we have this new one that Michael just added, pump_subscription_product_usage, and it's just going to spit out CSVs. If you're curious, the name here is the query ID from Snowflake, and it'd be fairly easy for us to add anything to the object naming here.
So one of the things we'll talk about at the end is that we need to add some testing, development, and monitoring to this as well. Part of the development that we'll probably add is a separate object name in S3 for when we're in a testing or development phase, right? So these are some things we haven't added yet, but we will — and it's really easy to add.
So let's say we wanted to add the target to the object name in S3. Let's say pump_subscription_product_usage is relevant to a bunch of places — which it might be — and maybe we want to split that out into separate files for some reason. Maybe we don't, but maybe we could. We could quickly add, say, a target variable to the pump.yaml and then feed that into the way that this name is constructed. It's all happening via Python, so it's pretty straightforward to make those changes.
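(A minimal sketch of how the object naming could fold in a target variable, as described; all names here are hypothetical, since the real naming code isn't shown.)

```python
from typing import Optional

# Hypothetical sketch of the S3 object naming described above. The
# framework names objects by Snowflake query ID today; a "target"
# variable from pump.yaml could be folded into the key like this.
def build_object_key(model: str, query_id: str,
                     target: Optional[str] = None, env: str = "prod") -> str:
    """Construct the S3 key for one pump run's CSV output."""
    parts = [env, model]
    if target:
        parts.append(target)  # e.g. split one pump's output per target system
    parts.append(f"{query_id}.csv")
    return "/".join(parts)

# build_object_key("pump_subscription_product_usage", "01a2b3c4", target="salesforce")
# -> "prod/pump_subscription_product_usage/salesforce/01a2b3c4.csv"
```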
Now, after that point, like I said, Workato can read S3; it's got permission. I just have a link here to all the integrations that are currently supported by Workato. Anyone who's ever used a tool like Workato before — whether it's MuleSoft or Workato or, what's the other one, Informatica —
knows that not all of these are created equal, right? So even if something's in this list, it doesn't necessarily mean that we could do it the way that we'd hope; this is just a list of what's there. Daniel Parker's on the call — he's the technical owner for Workato at the moment. Thanks for joining, Daniel. If you guys have questions about Workato, we can jump into that now, or if you kind of get it, we can keep going.
B: Well, I have a whole slew of questions about the CSV output into the S3 bucket, but that's kind of — you know what, hold on, Daniel might be able to answer something by showing off. Daniel, how are you ingesting those CSVs — the format and the timing? It wasn't like one per day, or that's what it looked like. How are you ingesting those into Workato and doing anything with them? If you want to show it off, that'd be amazing.
A: You're already sharing, so you can go first, and then we'll jump to mine. — Yeah, so the way that Workato works, the last time I set something like this up (and I imagine this hasn't changed), is that Workato basically lists the objects in the bucket at some interval — I think it's like five minutes or something — and then if it notices a change, so it knows this is a new object, then it will go run — I guess it's the download to get the file, or the read — and then it'll work with that batch that way. All right.
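(The trigger behavior described here is roughly the following polling pattern; this sketch is illustrative only and is not Workato's actual implementation.)

```python
import time
import boto3

# Illustrative sketch of the S3 polling pattern described above
# (not Workato's actual code): list the bucket at an interval and
# treat any unseen key as a new object to process.
s3 = boto3.client("s3")
seen = set()

def new_objects(bucket, prefix):
    """Return keys that have appeared since the last poll."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    keys = [obj["Key"] for obj in resp.get("Contents", [])]
    fresh = [k for k in keys if k not in seen]
    seen.update(fresh)
    return fresh

while True:
    for key in new_objects("data-pump-bucket", "prod/"):
        print(f"new object: {key}")  # a real recipe would download and process it
    time.sleep(300)  # roughly the five-minute interval mentioned
```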
B: So yeah — well, Dave's got... maybe I can ask, because I'll just drill in while we're right here. So currently, you know, the way we're doing it again is a snapshot model, which is: we have the complete data set and we're snapping it once a week — just an arbitrary amount of time; that's what made sense. Again, it's a manual process. So when this runs — Justin, before, you said that it's actually an appending process — when we clicked into these CSVs, is that a portion of the data set, or is that a snapshot of the data we —
A: That would be determined by two things: the model that's created, right — so what's in that model. For example, we could have a model that, you know, replicates data — so every time, you've got a timestamp and this is all the data for yesterday — you could have rows in that same table for the same data, but what it was yesterday. So that could be one way.
The other way is it could just be the current state of the table. And then what we would do — let's see if I still have it up — in pumps.yaml here, again, we have this timestamp column. So if you give a null value here, then our pumps job, or our pumps framework, will just say: well, there's not one, so I'm just going to query the whole thing. And I can show you really quickly what that looks like; it's just in this module.
What is it... well, actually — never mind, keep going, I'm interested. Yeah, so basically all this is doing here is generating a COPY command, which is the command you'd run in Snowflake. And if it doesn't have this timestamp column — where is it? right here — if timestamp equals None, then it's just select star from the table, right? If it does have a timestamp, then it will add the WHERE clause, which would have the time frame.
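(A minimal sketch of the COPY-generation logic just described, with hypothetical names — the real module isn't shown in full here.)

```python
# Hypothetical sketch of the COPY-command generation described above.
# Table name, stage path, and parameter names are illustrative.
def build_copy_command(table, stage, timestamp_column=None, last_run=None):
    """Generate the Snowflake COPY INTO command for one pump run."""
    query = f"select * from {table}"
    if timestamp_column is not None and last_run is not None:
        # Incremental pull: only rows changed since the last successful run.
        query += (f" where {timestamp_column} > '{last_run}'"
                  f" and {timestamp_column} <= current_timestamp()")
    # COPY INTO <stage> unloads the query result to S3 as CSV.
    return (f"copy into {stage} from ({query}) "
            "file_format = (type = csv) header = true")
```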
A: Just imagine Snowflake is a database like any other database — like MySQL — and you can have arbitrary whatever in it, right, whether it's good or bad. And because that's arbitrary is the reason why, in pumps.yaml, we make you tell us what the name of the timestamp column is — because you could name it, you know, gerald, and maybe gerald is the column that has the last time the data changed, or when you want to pull — what it really means is when you want to send it in the pump. You just tell us the name of that column, and as long as it's a date or timestamp field, it'll work. Yeah.
B: And you have the ability to say: listen, we'd like a full snap of the data at an interval. And the only thing that S3 kind of fails at is that the only clue we have to say, like, "this data, this snap, is for this date" is hiding it in the title, right? I know you can append — can this append metadata to anything, to the files dumped into S3, or do we have to rely on the title?
A: It depends on what you mean by metadata. I mean, there are a few things in what you're asking, and maybe we could do this in a separate conversation, because it could take a while to get through this. But basically, what's happening is: the way that we can tell what we're sending and when would be a combination of —
B: How do we say that this is — OK, now it's a two-parter. Let's say it's a single table; the goal is just to get that table into a CSV where someone can consume it up on S3 — awesome. We just need to know metadata about what this is, such as: is it complete? when did it fire? even the rows — obviously we'd get that in about two seconds reading it, but, you know, any —
B: Almost positive — and again, we're using the real example of us firing off an Apex job that can go up to S3 and talk. I would hope we were able to get at the metadata; that's why I asked. I would much, much rather have it there than, like, parse a CSV title. Sadly, in the past I've, you know, resorted to the latter. We can't — no, thank you. Can your code that pumps into S3 set this metadata?
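(For what it's worth, S3 does support user-defined object metadata set at upload time; a sketch of how the pump's upload step could attach it, assuming boto3 and hypothetical key names.)

```python
import boto3

# Illustrative sketch: attach user-defined metadata when uploading a
# pump CSV to S3. Bucket, key, and metadata fields are hypothetical.
s3 = boto3.client("s3")
s3.upload_file(
    "/tmp/01a2b3c4.csv",
    "data-pump-bucket",
    "prod/pump_marketing_contact/01a2b3c4.csv",
    ExtraArgs={
        "Metadata": {  # stored as x-amz-meta-* headers on the object
            "snapshot-date": "2021-06-01",
            "complete": "true",
        }
    },
)

# A consumer can read it back without downloading the file:
# s3.head_object(Bucket="data-pump-bucket", Key="prod/...")["Metadata"]
```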
B: So it'd go off when it arrived — and that's okay, that's okay, we can roll with that. It's like, "our data, as of this date" — it's a fact. Awesome — that brings me to my next question. You mentioned this very quickly: for data that might be dependent upon even intervals, right — if there was a problem, or if there was something where people are expecting it to come in on a certain day and it's not there — it's not all the time, it's not always, right — what would we, you know...?
A: There are two ways — there are two parts of this; I can talk to one part and Daniel can speak to the other part. The ideal way — the way that we want to do this — is we want to keep Workato and S3 as agnostic staging intermediaries, and keep the mappings within Airflow. What would happen is: if there was a failure in Airflow, this would be red, and we would get an error that would pump out to our monitoring.
A
It
would
show
up
enough
for
us
in
slack
it
would
be
up
to
whatever
data
engineer
was
triaging
that
day
they
might
end
up
sending
it
to
me
at
this
point,
since
this
is
new
to
go
and
resolve
it
and
fix
it,
and
then
it's
really
easy
for
us
to
rerun
these
jobs,
whether
that
be
through
this
interface
or
I
can
also
exec
into
the
container.
This
is
running
on
and
I
can
run
a
command
that
can
give
me
like
a
window
of
like
run
all
of
the
tasks
between
these
time
frames.
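(Rerunning a window of tasks like this is typically Airflow's backfill command; a sketch — the DAG ID is hypothetical and the exact flags depend on the Airflow version in use.)

```bash
# Re-run every task instance for a DAG between two dates
# (Airflow 2.x CLI; the DAG id "data_pumps" is illustrative).
airflow dags backfill --start-date 2021-06-01 --end-date 2021-06-07 data_pumps
```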
A
That
would
be
the
ideal
way
now
there's
a
possibility
when
this
is
very
new,
so
we're
not
sure
exactly
how
it's
going
to
work
out,
we're
confident,
positive
or
confident
and
optimistic,
but
there's
also
some
possibility.
We
don't
know
what
it
is
that
everything
works
well
in
airflow
and
in
snowflake
and
an
s3
but
workato
for
some
reason
has
a
problem.
Maybe
it's
with
the
api
or
something
else
sure
absolutely
rocado
does
have
a
way
to
do.
Error
handling
and
they've
got
some
retry
setup
over
there,
but
I'll.
B: — that would create this, but yeah — oh, there are about a billion things that can go wrong. All right, excellent. So — is there a thought here, just while we're on it? And then, if we want to, we can go on to snapshot versus appending. Obviously we have a live data set that's appending inside S3, meaning when we show up and ask for it, we do the snapshotting, if you will, on our side — meaning that's when we pulled it.
B
The
data
set
updates,
updates,
updates,
that's
when
we
pull
it
here
and
so
this
now
the
the
consuming
system
is
now
the
only
keeper
of
those
slices
right,
because
it's
just
one
appending
table
up
on
s3.
B: Right — and so, what I'm saying is: Workato is going to move the data; I don't think it's going to store the data.

A: It will, for a time — Daniel can speak to it better than I can.

B: Okay, neat — whatever that thing is that goes through there. Okay, but that's not going to be the new keeper of the snapshots.
B
What
I'm
saying
is
is
that
if
we
did
it
where
s3
is
constantly
just
here's,
the
most
recent
data
set
that
you
can
get
at
mr
consumer
consumer
awesome
is
that
now
that
if
the
consumers
are
using
it
in
the
snapshot
model,
not
just
like
a
live,
oh
here's,
the
data,
but
we
need
it
for
trending,
a
very
important
part
of
a
lot
of
the
stuff
we're
doing
is.
We
need
to
know
that
answer
at
a
specific
point
of
time.
So
therefore
we
need
snaps.
A: So, in my opinion, if we had a more rigorous need for auditability and we wanted to do some sort of snapshots, I would actually prefer to handle that up in the model in Snowflake, right? So the idea would be: hey, maybe we do a snapshot model in the sense of, like, we're always sending all the data that we have available today, but maybe we still send that on a timestamp, and what we have in that model is the actual full history, right? And then that would give us full auditability.
So we would have the ability to do that based on the model we had there, but — in my opinion — I want to keep that out of the orchestration and the data push here.
B: Which — and I love that answer — my question: currently in Snowflake, if someone asks, "I need to know the number of licenses we had 10 days ago," how is that solved by the — ?
A: Yeah, so there are two ways: there's, like, the emergency way, and then the way where we knew about it ahead of time, so we were supporting it, right? Yeah — so Snowflake... let's —
We have — so dbt, our transformation tool, has built-in capabilities for snapshotting the data. So we basically just point the snapshot feature at the model we want snapshotted, and then every time you run the job it's like: hey, just keep copying, snapshotting, and time-stamping, and there's a valid —
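(A minimal dbt snapshot sketch of what "pointing the snapshot feature at the model" looks like; the snapshot name and unique_key here are illustrative. dbt adds dbt_valid_from/dbt_valid_to columns giving each row its validity window — likely the "valid" being referred to above.)

```sql
-- Minimal dbt snapshot sketch; name and unique_key are illustrative.
{% snapshot pump_marketing_contact_snapshot %}
{{
    config(
        target_schema='snapshots',
        unique_key='contact_id',
        strategy='timestamp',
        updated_at='last_changed'
    )
}}
select * from {{ ref('pump_marketing_contact') }}
{% endsnapshot %}
```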
B: Going forward — because I love the idea of solving it in the model; love it — there could be things where, hey, we see it differently, things change, and what we don't want to do is become the keeper of what is effectively, you know, the license data that we have just moved over into Salesforce. If we're the only ones with the historical data, people are like, "How did you get this number?" and we'd be like, "It's —"
B
What
was
on
the
s3
bucket
on
that
thing,
right,
yeah
that
that
that's
the
best
answer
we
got,
which
I'm
not
saying
that's
a
terrible
answer.
You
know
we're
we're
doing
this,
to
attempt
to
inform
our
end
users,
the
sales
folks
on
how
to
best
serve
our
our
customers,
it's
all
for
for
good,
but
the
the
it
person
in
me
deep
down,
says
like
okay,
that
there's
a
issue
where
we're
now
the
keeper
of
data
that
we
don't
own
and
we're
the
only
one
with
the
copy
of
it
and
that's
bad
yeah.
We
could.
Workato is obviously more on-rails — I don't want to call it low-code, but it could be called a low-code-style thing. So I'd love to see how you get around all the humps and hurdles Justin just talked about — like, how does it handle that? I know how a MuleSoft would handle it, because I have experience with that, but Workato — how does that work?
A
Well,
given
that
we're
at
time
now,
maybe
it'd
be
worth
having
a
separate
meeting
and
I'm
happy
to
help
or
be
involved
with
daniel
parker
he's
he
and
his
team
he
gruner
went
through
and
they
set
up
a
pretty
smart
retry
framework
for
how
they
want
to
handle
this
in
general
in
work,
auto
and
obviously
like
you're
kind
of
pointing
at
instead
of
building
it
ourselves.
B
All
right,
excellent,
we
are
at
time
we'll
go
in,
did
you
want
to
stay
on
and
we
could
we
could
catch
up?
I
haven't
talked
to
dan
in
a
long
time
cool
all
right.
Justin.
Does
this
conclude
the
recording
yeah?
I
think
so.
Yep.