From YouTube: Improved Deletion Support in DataHub
Description
Surya Lanka & Pedro Silva (Acryl Data) share the latest advancements to managing deletes within DataHub during the April Town Hall.
Learn more about DataHub: https://datahubproject.io
Join us on Slack: http://slack.datahubproject.io
Follow us on Twitter: https://twitter.com/datahubproject
A
As part of improved deletion support, we have added three things. One is the deletion APIs: all REST endpoints have been updated to support time series aspect deletion. We also have support for rollback and retention. This is the new stuff coming that we had managed to ignore so far for time series aspects, which are special things that live only in Elasticsearch right now because of their nature. So that's the improvement.
A
You will see in the docs, once the feature rolls out, exactly what the changes are, but at a high level, for delete we allow you to specify additional parameters for time series aspects. For rollback, there's no change in the API; it just deletes time series aspects as well. And retention is based on index lifecycle management policies that let you say when to start cleaning up old data.
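For readers unfamiliar with Elasticsearch index lifecycle management, a retention rule of the kind described here might look like the sketch below. The policy body follows the general Elasticsearch ILM format; the 90-day threshold and the helper name are assumptions for illustration, not values from the talk or DataHub defaults.

```python
import json


def build_retention_policy(max_age_days: int) -> dict:
    """Build an ILM policy body that deletes indices older than max_age_days.

    Hypothetical helper: the threshold is illustrative, not a DataHub default.
    """
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    return {
        "policy": {
            "phases": {
                # The delete phase starts once an index reaches min_age.
                "delete": {
                    "min_age": f"{max_age_days}d",
                    "actions": {"delete": {}},
                }
            }
        }
    }


policy = build_retention_policy(90)
print(json.dumps(policy, indent=2))
```

A body like this would be sent to Elasticsearch's `_ilm/policy/<name>` endpoint; how DataHub wires such a policy to its time series indices is not covered in the talk.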
A
That's what it is at a high level, so I'll get to the demo. I'm going to demo the two most commonly used use cases for these, starting with deletion. Okay, here is my demo dataset. For this, I want to use the dataset profiles time series aspect. There is no data here right now, so let's go ahead and magically ingest some data in there. Okay.
A
Now say I don't want this. One very common use case is deleting the entire platform itself. What happens today is that when that is done, the time series aspects don't get deleted, so let's now see that they do get deleted with the new code. So this is files. If we look at the dataset profiles after ingestion, there are 10 docs, yes, as expected.
A
It asks, we go ahead, and that deleted 13 rows. Earlier it used to say three, because the time series aspects were not being deleted as part of that. So let's go check our Elastic index. Okay, they are gone. If you go check here, it should show nothing. So that's deletion. Next, to quickly show how rollback works: for this one I'm going to ingest operational stats through this recipe. This is my recipe for the demo.
A
The run is this one, the test Redshift usage run, and that is the one I'm going to roll back.
B
All right, thank you very much for that, Surya. Continuing on this idea of deletes, I want to give you a little bit of background on some of the missing features or behaviors that you might naturally expect and might have encountered in the past, particularly around deleting metadata references. This is about what happened in the past if you tried to delete some metadata that was being referenced somewhere else.
B
Typically, let's say you have a tag that has been used in multiple datasets. If you deleted that tag using the DataHub CLI, by default we didn't delete these references, so we had what I would call ghost references in the UI: the tag would still appear on your datasets in the UI. This is what we call dangling metadata.
B
Up until now, the way we had to fix this issue was to manually identify where all these references were, and then issue MCPs to actually unset that property, which was no longer valid. This was confusing for users, since it was not the expected behavior in some cases, and it was also occupying space in DataHub's databases, in our graph and indices. So this was taking a lot of space, but that's no longer the case. With that in mind, let me show you a little demo. Hold on here.
B
Suppose that you have some dataset. It is currently tagged with nothing, has no glossary terms, and has no domains. Let's say that I want to add a tag here to exemplify what has been happening up until this stage. In this case, I'm going to say that browser id is PII, so I'm going to create a tag based on that.
B
Okay, I have this tag, and what I'm going to do now is get its identifier, which is up here, and delete it the way that would happen right now. So: delete --urn, and we pass it the id, or the urn, of the resource I want to delete. This has to be a hard delete, and then I'm going to add an "old" flag, which I implemented just for demo purposes to exemplify what has been happening in the past. So, do I want to delete this?
B
Yes. It says that it deleted it. Okay, that's all fine. I refresh this page, and yet I still have a reference to PII. However, if I search for it (let me just search for everything and look at my tags), it's not here. This does not exist. So this is the inconsistency that we had in the past.
B
If, on the other hand, I do something like set a domain for this: let's say that I want to create a domain called Sales, for instance, and I want to tag that dataset with this domain. So I'm just going to come here and add it.
B
It's being referenced right here, and I'm going to do the exact same thing: I'm going to try to delete this domain, deleting not only the entity itself but also the reference that this dataset has. That is done simply by calling delete and passing it the urn and --hard. Notice that I'm not using the old version now. This is the new logic.
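The command described in the demo can be sketched as follows. This only builds and prints the argument list rather than running anything. The `--urn` and `--hard` spellings are inferred from the spoken description and may differ between CLI versions, and the domain urn is a hypothetical stand-in for the one used on screen.

```python
import shlex


def build_hard_delete_command(urn: str) -> list[str]:
    """Return the argv for a hard delete of a single entity by urn."""
    if not urn.startswith("urn:li:"):
        raise ValueError(f"not a DataHub urn: {urn!r}")
    return ["datahub", "delete", "--urn", urn, "--hard"]


# Hypothetical urn standing in for the demo's Sales domain.
command = build_hard_delete_command("urn:li:domain:sales")
print(shlex.join(command))
```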
B
When the entity is being referenced somewhere else, we will provide you with a summary of where those references exist. For now we're just showcasing 10 references, but you will have the total count across the entire metadata graph. So this is a matter of: do you want to delete these references, yes or no? If you say no, it's the old behavior. If you say yes, we will clean up everything for you. So if I now refresh the page, the domain is no longer set, and if I go into Domains, Sales is gone.
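The flow just described (summarize the references, show up to 10, then clean up on confirmation) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not DataHub's implementation: the `Entity` class, both function names, and the shortened urns are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    urn: str
    # Urns of other entities (tags, domains, ...) that this entity points at.
    references: set[str] = field(default_factory=set)


def summarize_references(entities, target_urn, preview_limit=10):
    """Return (total_count, preview) of entities that reference target_urn."""
    referrers = sorted(e.urn for e in entities.values() if target_urn in e.references)
    return len(referrers), referrers[:preview_limit]


def hard_delete(entities, target_urn, remove_references):
    """Delete target_urn; optionally remove every reference to it as well."""
    entities.pop(target_urn, None)
    if remove_references:
        for entity in entities.values():
            entity.references.discard(target_urn)


domain = "urn:li:domain:sales"
dataset = "urn:li:dataset:demo"  # hypothetical, shortened urn
entities = {
    domain: Entity(domain),
    dataset: Entity(dataset, {domain}),
}

total, preview = summarize_references(entities, domain)
print(f"{total} reference(s), showing {len(preview)}: {preview}")
hard_delete(entities, domain, remove_references=True)
print(entities[dataset].references)
```

Passing `remove_references=False` reproduces the old behavior: the domain entity disappears, but the dataset keeps a dangling pointer to it.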
B
So this applies when you specify --urn, and specifically when you do a hard delete, because that's when you delete from the databases. When you do a soft delete, it doesn't appear in the UI, but you still have it in the database, and that still holds. The second thing is that computing these dangling pointers, that is, figuring out where they are in our metadata graph and generating the necessary updates across the entire graph, is right now a synchronous operation.
B
It might be a heavy operation if the entity that you are deleting is referenced in tens of thousands or millions of places. To give you a notion of the performance, removing a thousand references takes roughly eight seconds; removing fifteen thousand takes roughly fifty. So, as you can see, this scales. However, we will very shortly be working on a second iteration of this project, where we will address these limitations in the very near future.
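To put those two timings in perspective, the arithmetic below converts them to throughput: roughly 125 and 300 references per second. The extrapolation to one million references is a naive assumption (that the higher observed rate holds at larger sizes), not a figure from the talk.

```python
# Timings quoted in the talk: (references removed, approximate seconds taken).
measurements = [(1_000, 8), (15_000, 50)]

for references, seconds in measurements:
    throughput = references / seconds
    print(f"{references} references in ~{seconds}s is about {throughput:.0f} references/s")

# Naive extrapolation, assuming the higher observed rate holds at larger sizes.
best_rate = max(refs / secs for refs, secs in measurements)
estimated_seconds = 1_000_000 / best_rate
print(f"1,000,000 references would take roughly {estimated_seconds / 60:.0f} minutes")
```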
B
So you don't even have to worry about it: we'll make the processing completely asynchronous, and by making it asynchronous, it's no longer in the blocking path. So operations like delete by platform, delete by registry, and potentially even rollbacks are something that we will be able to support, and do so consistently.
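The idea of moving reference cleanup out of the blocking path can be sketched with a queue and a background worker. The talk does not describe the planned design, so everything here is an assumption for illustration: the function names, the sentinel-based shutdown, and the tag urn are hypothetical.

```python
import queue
import threading

cleanup_queue = queue.Queue()
cleaned = []


def cleanup_worker():
    """Background worker: handles reference cleanup for each deleted urn."""
    while True:
        urn = cleanup_queue.get()
        if urn is None:  # sentinel: stop the worker
            break
        cleaned.append(urn)  # stand-in for the real reference cleanup


def delete_entity(urn):
    """Delete the entity right away and defer reference cleanup to the worker."""
    cleanup_queue.put(urn)  # returns immediately; the caller is not blocked
    return f"deleted {urn}"


worker = threading.Thread(target=cleanup_worker)
worker.start()
print(delete_entity("urn:li:tag:pii"))  # hypothetical tag urn
cleanup_queue.put(None)
worker.join()
print(cleaned)
```

The caller gets its response as soon as the urn is queued, and the expensive graph traversal happens afterwards in the worker.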
A
Thanks so much, Pedro. Thanks, Surya. Great job, y'all.