From YouTube: Snowflake Ingestion Improvements
Description
Shirshanka Das (Acryl Data) shares recent speed and functionality improvements to ingesting metadata from Snowflake during the July 2022 Town Hall.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
Awesome, it's raining ingestion today. We've gotten a lot of feedback about things that are working well, and also things that are not working quite as amazingly as we would expect. One common issue that comes up often, and we see this both with open-source DataHub adopters and with actual customers, is that people forget we have two connectors for the same warehouse, such as Snowflake and BigQuery.
We have two different connectors that bring in slightly different metadata for each warehouse. For Snowflake, the primary connector brings in table schemas, columns, descriptions, and lineage; it uses SHOW TABLES and additional queries to get all of that metadata out. Then there's the Snowflake usage connector, which gets your usage counts for both tables and columns, so the top users as well as the operational history of a table.

Historically, the reason we split those apart was that the first warehouse we did this for was BigQuery, and with BigQuery usage, the permissions you needed to access the logs, as well as the protocol being used, were so different that it felt weird to mash both of these sources into one connector. So we repeated the pattern for Snowflake, and then repeated it for a few more warehouses.
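In practice that split means maintaining two recipes against the same warehouse. A sketch of what that looks like (the `snowflake` and `snowflake-usage` source types are the ones named in the talk; the account, credential, and sink values below are illustrative placeholders, not real settings):

```yaml
# Recipe 1: technical metadata (schemas, columns, descriptions, lineage)
source:
  type: snowflake
  config:
    account_id: my_account          # placeholder
    username: datahub_user          # placeholder
    password: "${SNOWFLAKE_PASSWORD}"
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
---
# Recipe 2: usage statistics (query counts, top users) for the same warehouse
source:
  type: snowflake-usage
  config:
    account_id: my_account
    username: datahub_user
    password: "${SNOWFLAKE_PASSWORD}"
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```

Two recipes means two sets of credentials, two schedules, and two chances to forget one of them, which is exactly the failure mode described next.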
But one year down this road, we're realizing it's not quite what we had anticipated in terms of ease of use, and it's not ideal. Most people set up one connector, then come to a Snowflake dataset page and ask: why don't I have my queries in here? Why don't I have top users and all these amazing things? How do I get them? So we're making big improvements to combine these connectors, so that you only have one connector to configure and you get the benefits of both streams of metadata from a single connector. Next slide. There's another common mistake that people end up making.
So those are two common problems that people run into with the split-connector approach. If you go further down, even the base connector, the one that pulls out technical schemas and the like, has issues. The first time we wrote these connectors, we layered them on top of SQLAlchemy. SQLAlchemy is great because it is a generic connector system: as long as you have a dialect for a database, you can get a bunch of metadata about anything you can connect to.
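As a rough illustration of that dialect-driven genericity, here is a minimal SQLAlchemy inspection loop. SQLite stands in for the warehouse here purely so the sketch is self-contained; a Snowflake dialect is driven through the same Inspector API:

```python
from sqlalchemy import create_engine, inspect, text

# SQLite as a stand-in: any database with a SQLAlchemy dialect is
# inspected through the exact same generic interface.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE pets (id INTEGER PRIMARY KEY, name TEXT)"))

inspector = inspect(engine)
tables = inspector.get_table_names()                      # one round-trip to list tables
columns = {t: inspector.get_columns(t) for t in tables}   # one more round-trip per table
```

Note the shape of the loop: one call to list tables, then one call per table for its columns. That per-object fan-out is precisely what gets expensive at warehouse scale, as described next.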
But it's actually pretty expensive. Essentially, the first query does the equivalent of SHOW DATABASES; then for each database it looks at all the schemas in that database; then for each schema it looks at all the different tables; and then for each table it goes and gets descriptions and other metadata. So the number of queries is on the order of the number of databases, times the number of schemas, times the number of tables, and that can take a while for most production warehouses.
We end up seeing it take anywhere from five minutes for small warehouses, to 20 or 30 minutes, sometimes even an hour, and that's not ideal.
Moving on: what we wanted to do was move to a much more efficient way of ingesting where, instead of depending on the SQLAlchemy connector, we just use the official client. For example, for the Snowflake source, if we use the official Snowflake client and make more efficient queries, we can do it on the order of D times S: the first query just gets the databases, and then for each database we either get all the schemas and table metadata with one query or, if that's too much, get the table metadata per schema. So we tried this out to see how much we could actually shave off these latencies, and the results were pretty surprising.
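To see why this helps, here is a back-of-envelope sketch of the query counts under both strategies. The exact query plans of the real connectors may differ, and the example numbers are made up purely for illustration:

```python
# Rough query-count model for the two metadata-extraction strategies.

def per_table_queries(d: int, s: int, t: int) -> int:
    """SQLAlchemy-style: 1 (list databases) + d (schemas per database)
    + d*s (tables per schema) + d*s*t (per-table descriptions etc.)."""
    return 1 + d + d * s + d * s * t

def batched_queries(d: int, s: int = 0, per_schema: bool = False) -> int:
    """Native-client style: 1 (list databases) + one bulk metadata query
    per database, or one per schema if a database is too large."""
    return (1 + d * s) if per_schema else (1 + d)

# Hypothetical warehouse: 5 databases x 10 schemas x 100 tables.
naive = per_table_queries(5, 10, 100)   # 1 + 5 + 50 + 5000 = 5056 queries
fast = batched_queries(5)               # 6 queries
```

Since per-query latency dominates small metadata queries, collapsing thousands of round-trips into a handful is where the wall-clock savings come from.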
We have an open PR that proposes adding a new connector, which we're calling the Snowflake beta connector. Mayuri ran some tests against the Long Tail deployment, and Long Tail is just a simple demo warehouse, not really a production warehouse, and even there we were able to shave the latency from about four and a half minutes to just 30 seconds, which is amazing. I can only imagine how much we'll be able to improve the latency of real production warehouses with thousands and thousands of tables.
So, good news: the PR is up, and we'll probably merge it sometime later this week. It is compatible with the current Snowflake connector in terms of getting schemas, lineage, and table-level profiling. So if that's all you're using your current Snowflake connector for, you can actually move to this right away.
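For those cases, migrating may be as small as changing the source type in the recipe. A sketch, assuming the new source type follows the "Snowflake beta" name used in the talk (the exact type name and config fields may differ in the merged PR, and the other values are illustrative placeholders):

```yaml
source:
  type: snowflake-beta       # assumed name; check the release notes
  config:
    account_id: my_account   # placeholder
    username: datahub_user   # placeholder
    password: "${SNOWFLAKE_PASSWORD}"
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
```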
We're first going to roll out the connector as part of the next release, with these caveats: try it out, give us feedback, and help us improve it. We'll follow up really quickly with the addition of stateful ingestion, column-level profiling, and the usage capabilities from the usage connector. That will give us everything we wanted: a fast, efficient connector with a unified config that lets you get both usage statistics and regular technical metadata.