From YouTube: BigQuery Lineage and Lineage Performance Improvements: Oct 29 2021 Community Town Hall
Description
Varun Bharill and Gabe Lyons from Acryl Data share performance improvements to complex lineage views, and a new feature to infer dataset lineage for BigQuery datasets.
Join us at our next Town Hall - RSVP here: https://forms.gle/g8EpCLnohtPLLtdg6
A
Varun and I had a few updates about the lineage view on DataHub. The first update we want to give is performance improvements: just like in profiling, we've also made some performance improvements on the lineage side. So if you've had issues in the past loading your lineage view, especially for complex lineage views, I would encourage you to try again. The page is going to be a lot more responsive and it's going to load a lot faster, so definitely give it a go. We also added a fun little easter egg.
A
Sometimes you'll see some very complex lineage views, just like this one, and it can be hard to understand exactly what's going on. So we've enabled drag and drop, which lets you move your nodes around: pull things out, separate them, and hopefully get a better understanding of what is going on in very interconnected graphs. We're excited about both of these improvements, so definitely check them out on the demo site, or give lineage another go on your own.
B
All right, so thanks a lot, Gabe. What I'm going to demo today is some of the recent improvements we've made to the BigQuery ingestion process, but just so that I honor the time, I'm going to kick off the ingestion process and then continue talking.
B
So the way we have set up this demo is that we have some derived tables that are created on top of some raw tables. After we ingest the BigQuery datasets, what we want to see is lineage between the derived tables and their corresponding raw tables.
B
So, as you can see, there are two tables, seller categories and seller earnings, that I've just created as sample derived tables, and you can also see them in the query history logs. This seller categories table is built on top of these three or four raw tables, so ideally we want to see lineage edges going from seller categories to the raw tables. Here's how we do this.
B
Sorry, I'm just looking for a slide. Yeah, so what we're doing is inferring the lineage information from the Google Cloud audit logs. Whenever we issue a query, a log entry gets recorded, and we look at that log information, which specifies which tables were referenced and what the destination table was.
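To make the idea concrete, here is a minimal sketch of turning one query log entry into lineage: the destination table becomes the downstream node and every referenced table becomes an upstream. It assumes a hypothetical, flattened entry with `referenced_tables` and `destination_table` keys; real Cloud Audit Logs entries nest this information much more deeply, and the exact field names depend on the log format version. The table names and the `extract_lineage` helper are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for one job-completed audit log entry.
# Real entries nest these fields deeply; the flat keys here are assumptions.
SAMPLE_ENTRY: Dict = {
    "job_id": "job_123",
    "referenced_tables": ["demo.raw.sellers", "demo.raw.orders", "demo.raw.orders"],
    "destination_table": "demo.derived.seller_earnings",
}


def extract_lineage(entry: Dict) -> Tuple[str, List[str]]:
    """Return (destination, upstreams) for a single query log entry."""
    destination = entry["destination_table"]
    # De-duplicate references and drop self-references
    # (e.g. a query that reads from and writes to the same table).
    upstreams = {t for t in entry.get("referenced_tables", []) if t != destination}
    return destination, sorted(upstreams)


if __name__ == "__main__":
    print(extract_lineage(SAMPLE_ENTRY))
    # → ('demo.derived.seller_earnings', ['demo.raw.orders', 'demo.raw.sellers'])
```

Dropping self-references is a design choice in this sketch, not something stated in the talk; it keeps a table from appearing as its own upstream.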
B
So when we created the derived table, a typical log entry looks like this. As you can see, it has information about what the referenced tables were and what the destination table was, and it also has a ton of other metadata, like how long the query took and how many rows were output, so at some point we can also leverage that. Going back to our ingestion process, you can see that all the tables were ingested, and along with that we have two lineage work units as well.
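Two derived tables yielding two lineage work units suggests grouping log entries by destination table. The sketch below shows one plausible way to do that aggregation, assuming the same hypothetical flattened entry shape as above: repeated runs of the same query add no duplicate edges, and entries without a destination table are skipped. `build_lineage_units` and the table names are hypothetical; this is not DataHub's actual work-unit API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical flattened log entries for the two demo tables.
ENTRIES: List[Dict] = [
    {"destination_table": "demo.derived.seller_categories",
     "referenced_tables": ["demo.raw.sellers", "demo.raw.products", "demo.raw.categories"]},
    {"destination_table": "demo.derived.seller_earnings",
     "referenced_tables": ["demo.raw.orders", "demo.raw.sellers"]},
    # Re-running a query for the same table adds no new edges.
    {"destination_table": "demo.derived.seller_earnings",
     "referenced_tables": ["demo.raw.orders"]},
    # An entry with no destination table contributes nothing.
    {"referenced_tables": ["demo.raw.orders"]},
]


def build_lineage_units(entries: Iterable[Dict]) -> Dict[str, List[str]]:
    """Group upstream tables by destination: one 'unit' per derived table."""
    upstreams: Dict[str, Set[str]] = defaultdict(set)
    for entry in entries:
        dest = entry.get("destination_table")
        if not dest:
            continue
        for ref in entry.get("referenced_tables", []):
            if ref != dest:  # ignore self-references
                upstreams[dest].add(ref)
    return {dest: sorted(refs) for dest, refs in upstreams.items()}


if __name__ == "__main__":
    for dest, refs in build_lineage_units(ENTRIES).items():
        print(dest, "<-", refs)
```

Using a set per destination keeps the result stable no matter how many times the same derived table is rebuilt within the log window.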
B
So if we look at, let's say, seller earnings and look at its lineage, we see that we have been able to connect it to the underlying raw tables. So that's just a quick demo. Please try it out, and if there's any feedback, please let us know; we're happy to brainstorm future improvements.
C
Yep, this does require you to have access to the Google Cloud API, and there's a document about it on the ingestion source page. You'll need to talk to your IT department, or whatever the right word is, to get Google Cloud API credentials and set them up exactly the way BigQuery usage ingestion works. So these are the same requirements, now moving up into the regular BigQuery source as well.
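Because the credentials step is easy to get wrong, a quick local pre-flight check can save a failed ingestion run. This is a minimal sketch, assuming you authenticate with a service-account JSON key file and that it is located either by an explicit path or through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable used by Google's Application Default Credentials. The helper name `check_service_account_key` is hypothetical, and it only checks the key file's shape; it does not verify that the account is allowed to read the audit logs, which is covered on the ingestion source page.

```python
import json
import os
from typing import Dict, Optional

# Fields present in a Google service-account JSON key file.
REQUIRED_KEYS = ("type", "project_id", "client_email", "private_key")


def check_service_account_key(path: Optional[str] = None) -> Dict[str, str]:
    """Sanity-check a service-account key file before running ingestion.

    Falls back to GOOGLE_APPLICATION_CREDENTIALS when no path is given.
    Raises ValueError describing the first problem found.
    """
    path = path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("no key path given and GOOGLE_APPLICATION_CREDENTIALS is unset")
    if not os.path.isfile(path):
        raise ValueError(f"key file not found: {path}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    if key["type"] != "service_account":
        raise ValueError(f"expected a service_account key, got {key['type']!r}")
    # Return the non-secret identifiers so they can be logged safely.
    return {"project_id": key["project_id"], "client_email": key["client_email"]}
```

Returning only the project and client email means the result can be printed in ingestion logs without leaking the private key.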