From YouTube: Inside the Indexer - Louis DeLosSantos (Red Hat)
Description
Inside the Indexer
How Clair extracts and persists your container contents
Louis DeLosSantos (Red Hat)
2021-04-26
OpenShift Commons Briefing #Upstream #AMA
For more information about Clair:
https://github.com/quay/clair
Slides: https://github.com/openshift-cs/commons.openshift.org/blob/master/briefings/slides/Inside%20The%20Indexer.pdf
A
Hey everybody, welcome to another Monday OpenShift Commons briefing with another wonderful upstream project, this one Clair. We have Louis DeLosSantos here, who's a principal software engineer working on the Clair project for Red Hat, and he's going to take us inside the indexer. I'm going to let Louis introduce himself and everything that's going on in the Clair world today, and if you have questions, throw them in the chat and we will answer them at the end of the presentation.
B
What this talk is really trying to do is get the community, or the watchers of this presentation, more acquainted with the internals of how Clair works.

B
Clair's fundamental goal is to provide insights about containers to the client, whether that's a developer or an operations team. We want to show you exactly what is inside the container and what might be vulnerable, so your teams can patch those things or act accordingly.
B
To do this, it becomes obvious that we need to understand what's inside the container, extract the contents, and place them into some kind of schema which is searchable. That's what this talk focuses on: inside the indexer. The indexer is the service which takes all the layers from a container, looks inside them, pulls out the contents, and creates a report.

B
So what is indexing? Indexing is the term we use for the process of extracting the contents of the container itself.
B
It is the first step in Clair's analysis pipeline. Inside Clair's pipeline we're trying to take a container and understand what content is vulnerable. We split this pipeline into several phases, and indexing is the very first phase. It's responsible for creating an index report, which we're going to go into in detail in just a bit.

B
If we're looking at the complete Clair pipeline to create a vulnerability report, this is what we're looking at: the 30,000-foot view. I have highlighted the portion of the pipeline which we're going to cover today in this talk. What you'll notice is that we take a container manifest.
B
We feed that to the indexer. The indexer performs a bunch of work, which we're going to go into in detail in this talk, and then it generates an index report, which is the findings of the work it just performed on the container manifest.

B
There are a couple of key components here, if you'd like to follow along or you come back and look at this talk later. We're in claircore, which is our project; this is the engine, this is what's really doing the scanning in the Clair project.
B
If you do want to follow along in our source tree, the indexer code is in this internal package, and then there's the indexer directory here. Almost everything we're going to cover in this talk is laid out within this indexer directory, and there will be a lot of references back to it. So if you are interested in following along, or you're looking at this talk at a later date and trying to map what we're talking about onto the code, this is the directory of interest.

B
As for the key components: in this little section I'm going to cover the data models, basically how we structure our data to accomplish this goal of extracting the contents and reporting what we found inside the container.
B
First we have a manifest. The manifest represents a container image for us. You'll notice it's made of a slice of layers, and those are order dependent. If you went and created a container with Docker or Podman, those layers are created with a parent-child relationship, and we represent that with the slice.
B
So when you submit a manifest to us, you're expressing the same concept as the container's hierarchy of layers: you represent that hierarchy with a slice of layers, and then there's the hash digest of the manifest, the content-addressable hash signifying the manifest as a whole.
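For readers following along in Go, here is a minimal sketch of the manifest shape just described; the field names are illustrative and the authoritative definitions live in the claircore repository:

    // Sketch of the manifest submitted to the indexer.
    type Digest string // content-addressable hash, e.g. "sha256:..."

    type Layer struct {
        Hash Digest // digest of this individual layer
        URI  string // where the indexer can fetch the layer's tarball from
    }

    type Manifest struct {
        Hash   Digest   // hash signifying the manifest as a whole
        Layers []*Layer // order dependent: mirrors the container's layer hierarchy
    }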
B
Now, this is the index report. This is how we communicate to clients exactly what we found inside containers. We'll start again with a hash. This is a hash of the manifest as a whole, so you can think of it as a unique identifier for the container and its layers in that unique ordering. And what happens to obtain this hash?

B
You might know about this if you know how Docker images are built, but if you don't: this hash is actually computed by taking the hash over each individual layer inside the container, which produces a final content-addressable hash. Then there's the state. This is used internally, and it is exposed to HTTP clients who might want to query the indexer.
B
The way this works is that when you submit a job to the indexer, if you were to try to submit the same job again, we would actually give you back this structure and give you the state of the index. So Clair is smart enough to know, hey, we're working on this right now, but here's the state. If you want to poll the state, just wait until you see an error or you see that it is successful.

B
We don't do this in Quay right now, but as a usability factor you could write clients which just sit there and poll on their job; it's part of the design specification for the indexer itself. Packages: this acts as a... let me actually take a step back. You'll notice packages, distributions, and repositories are actually maps: a map from a string ID to the actual structure of the package, distribution, or repository.
B
So the index report is really acting like a portable database. We do this for de-duplication reasons. It would be unfortunate if we just continued to write the same package strings for every layer we found them in, or had to duplicate that information.

B
So when you're looking at the index report, you actually want to treat it as a database with key values that you can string together to understand where certain packages were found.
B
The way this works is you'll look at the packages (we'll call it a database) and it has the ID, and then it has the package name. So you can picture this as a de-duplicated database of all the packages that were found inside your container. Same thing with distributions: we could technically identify more than one distribution.

B
We typically don't if it's a normal container, but sometimes there are dist upgrades, or sometimes there will be more than one file that gives us a hint about the distribution of the actual container. This is whether the container is RHEL, whether the container is CentOS, whether the container is Debian; that's what this is representing. Then repositories: these are usually language repositories. So if we find pip, if we find npm, they'll be represented here. And then the environments are what string this all together.
B
When you're looking at environments, this basically gives you the ability to say, okay, we found this package in this layer at this filesystem path. We needed this to support language packages, because once you start supporting language packages you have the predicament where the same package could exist in multiple directories across the filesystem. For instance, if you are using npm and you have a forms library, you might use that in five projects that are scattered around the container's environment.
B
So we record each one uniquely, without having to duplicate the package's identity, by compressing them into these small databases. That's really the bulk of what the index report is providing you. Then there's just some bookkeeping: whether you had a success or not. Again, this is for clients that are polling and want to know whether we had a successful index or not.
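Putting the above together, here is a hedged sketch of the index report's "portable database" shape; the names below are illustrative, not the exact claircore definitions:

    // Sketch of the index report: small de-duplicated "databases" keyed by ID,
    // plus the environments map that strings them together, plus bookkeeping.
    type Package struct{ ID, Name, Version string }
    type Distribution struct{ ID, Name, Version string } // e.g. RHEL, CentOS, Debian
    type Repository struct{ ID, Name string }            // e.g. pip, npm

    type Environment struct {
        PackageDB    string // filesystem path where the package was found
        IntroducedIn string // digest of the layer it was found in
    }

    type IndexReport struct {
        Hash          string                    // manifest hash this report describes
        Packages      map[string]*Package       // package ID -> package
        Distributions map[string]*Distribution  // distribution ID -> distribution
        Repositories  map[string]*Repository    // repository ID -> repository
        Environments  map[string][]*Environment // package ID -> where it was found
        State         string                    // bookkeeping for polling clients
        Success       bool
        Err           string
    }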
B
Now we have scanner interfaces. This is a very important concept when dealing with the indexer, because these are the externally implemented sections of code.

B
Each scanner is in charge of taking a container layer, parsing through it, and finding the desired content it's interested in. We wrote these as interfaces, allowing other teams and other upstream contributors to come in and say, okay, I want a JAR package scanner. You'd come in, implement this interface, which takes the layer, looks for JARs, parses them into packages, and returns them to the claircore code. Very simple, and this has been proving itself useful with the CRDA integrations we've done.
B
I was working with Arun on the CodeReady integration, and I discussed this with him; I asked how the interface was working out, and he showed us the PR. It was all code added, nothing needed to be changed, so this abstraction has been working pretty well for us. The same goes for the distribution scanner and the repository scanners.
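As a rough sketch of what implementing one of these interfaces involves (the method set here is illustrative; the real interfaces live under claircore's internal/indexer directory, and Layer and Package are as in the earlier sketches):

    // A package scanner identifies itself and turns one container layer into
    // the packages it found there. Distribution and repository scanners follow
    // the same shape, returning distributions and repositories instead.
    type PackageScanner interface {
        Name() string    // e.g. "jar"
        Version() string // bumped when the detection logic changes
        Kind() string    // "package"
        Scan(ctx context.Context, layer *Layer) ([]*Package, error)
    }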
B
This plays an important role later on in the talk, but, as you can imagine, the indexer is taking container layers and trying to understand what's inside them, and this does the bulk of that work. We hand each implementation a layer and it can go ahead and scan through it and understand it with its own business logic. The claircore code proper isn't too concerned about what's happening in there.

B
So there's a lot of flexibility there to perform package scanning, distribution scanning, and repository scanning the way the ecosystem sees fit, whether that's npm or Python or whatever. And then there's the coalescer. This is probably my favorite part of the indexer, and it took a little bit of time to figure out how to do this right. So let me think of the best way to explain this.
B
When you have a normal container, it's a series of discrete tarballs, and they represent filesystem layers.

B
There might be situations where the packages I found in layer one don't even exist in the later layers. We don't want to put those in the final index report, because they were deleted in some intermediate layer. So the coalescer is another interface which handles this business logic. It looks at layer artifacts, which are similar to the index report, but they represent the individual packages,
B
distributions, and repositories found inside an individual layer. The coalescer takes a list of these artifacts and, with its own business logic, decides whether it should keep or remove particular artifacts from the final index report, in a similar fashion to a container runtime applying a set of layers on top of each other to get the final container filesystem image that's going to run on the host. We do the same thing, only obviously not with the filesystem in mind.
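A sketch of the coalescer contract as described here, with illustrative names (Package, Distribution, Repository, and IndexReport as in the earlier sketches):

    // LayerArtifacts holds what the scanners found in one individual layer.
    type LayerArtifacts struct {
        Hash          string // layer digest
        Packages      []*Package
        Distributions []*Distribution
        Repositories  []*Repository
    }

    // A Coalescer receives the per-layer artifacts in layer order and merges
    // them, with its own business logic, into a single index report: dropping
    // content removed by later layers and backfilling distribution information
    // onto packages discovered in earlier layers.
    type Coalescer interface {
        Coalesce(ctx context.Context, artifacts []*LayerArtifacts) (*IndexReport, error)
    }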
B
We have to do it with the end goal of creating an index report in mind. A little bit of in-depth detail there: there are currently two implementations of the coalescer, one specifically for RHEL, and then there's a generic one. So let's take a quick look at that inside our root directory.

B
If you go into internal/indexer/linux (because this is a Linux-focused coalescer), the coalescer is inside here, and this would be a really valuable piece of code for understanding how we actually go about creating these final index reports. There's a little bit of heuristics in here.
B
We now have to somewhat backfill that information to previous layers, and then attribute the packages found in those previous layers with the distribution information we found later on. And this is just the nature of Clair; this is kind of what makes Clair a unique application, in the fact that it's dealing with piecemeal information the entire way through, and we're finding novel ways to stitch this information together and create a cohesive result that represents the final image.
B
So, the architecture of the indexer itself: it has a RESTful HTTP API, and we've written this in such a way that, theoretically, if your application need were simply "I want to know what's inside the container," and you don't care about vulnerabilities or matching them against anything, you could take the indexer and use it as a discrete service. It has no other dependencies.
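To illustrate using the indexer as a discrete service, here is a hedged example of submitting a manifest and reading back the report over HTTP. The route names follow Clair v4's documented indexer API, but check the project's OpenAPI spec for the authoritative paths; all host names and digests below are placeholders:

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // The manifest: content-addressable hash plus the ordered layers.
        manifest := []byte(`{
            "hash": "sha256:<manifest-digest>",
            "layers": [
                {"hash": "sha256:<layer-digest>", "uri": "https://registry.example/v2/app/blobs/sha256:<layer-digest>"}
            ]
        }`)

        // Submit the manifest for indexing.
        resp, err := http.Post(
            "http://clair-indexer.example:8080/indexer/api/v1/index_report",
            "application/json",
            bytes.NewReader(manifest),
        )
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // The body is the index report (or its current state, if the same
        // manifest was already submitted and work is still in flight).
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(resp.Status)
        fmt.Println(string(body))

        // The report can later be fetched again by manifest digest:
        //   GET /indexer/api/v1/index_report/sha256:<manifest-digest>
    }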
B
So if, for some reason, you had the idea of, okay, I'll do my own vulnerability matching, given that I have this little piece of code, this service that's able to give me the contents of a container, then you can simply use this alone, and there's a RESTful HTTP API to do that. It is also architected and modeled as a finite state machine, and the reason we did this... well, in case you're not quite sure what a finite state machine is:

B
It's a set of logical steps, states if you will, that house business logic, and as you're moving through this business logic you're transitioning via states. What this allows you to do is, when we were re-architecting Clair v4, we wanted to be able to quickly say, okay, there's something else we need to do.
B
We don't want to refactor the entire application. So if we model it in a state diagram like this (don't worry about knowing all of it, we're literally going to go over every step), and then we decide, hey, after scan-layers we actually need to do this new thing, we just pop it into the diagram and then do the plumbing necessary.

B
Almost no code has to be refactored, which has worked out very well for us, because when we were re-architecting, for instance, index-manifest came in as a requirement much later in our development cycle, and almost no refactoring was necessary because we just created a new state and popped it into the state diagram.
B
It is a common pattern; I'm just explaining it in case you're not too aware of what that looks like. I have a little snippet of code here about how the state machine runs, and let me go back to the source code in case you do want to follow along: the actual state machine is in the same directory, internal/indexer, and we call it a controller, to follow along with a lot of the semantics around Clair.

B
Maybe a smaller aside is that a lot of the time you'll see interfaces and then controllers. The way we architected Clair as a whole is that upstream individuals very simply implement interfaces, and the controller handles most of the business logic. This separation has made contribution pretty seamless, because contributors don't need to worry about databases, and they don't need to worry about how Clair actually stitches things together. All they really worry about is implementing interfaces, and then we have controllers which drive those interfaces.
B
So I'm just going to go over the actual run method here, because I think, if you are trying to follow along, it can give you good insights.

B
We have this dictionary, or map, of state names to state functions, and as you can assume, the state functions are the actual business logic of each of these states. So you'll see, for example, a fetch-layers function.
B
What we do is get the current state of the state machine and then run that state function. The state function is going to return a new state; it's a very recursive algorithm here. It's going to return a new state, we're going to see if we need to do any error handling, and we're going to check if we're at the terminal state, which is a canonical way of saying everything's done and you can halt the machine.

B
Once we determine it's not the terminal state, we set the machine's state to the one that was just returned. We do a little bit of bookkeeping here which writes the new state to the database; this goes back to a client polling the indexer.
B
So it's a recursive algorithm, but again, what's nice is that when we update the state diagram we don't really refactor anything. We just add a new state function and then update the state map.
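A condensed, self-contained sketch of that run loop follows. This is not the verbatim controller code (see internal/indexer/controller in claircore for the real thing), but it shows the state-name-to-state-function map and the loop that drives it:

    package main

    import (
        "context"
        "fmt"
    )

    type State string

    const (
        CheckManifest State = "CheckManifest"
        FetchLayers   State = "FetchLayers"
        ScanLayers    State = "ScanLayers"
        IndexFinished State = "IndexFinished"
        Terminal      State = "Terminal"
    )

    // Each state's business logic is a function that does its work and returns
    // the next state to transition to.
    type stateFunc func(ctx context.Context) (State, error)

    // The transition table: adding a new step to the pipeline is a new entry
    // here plus its stateFunc, with no refactor of the loop below.
    var states = map[State]stateFunc{
        CheckManifest: func(ctx context.Context) (State, error) { return FetchLayers, nil },
        FetchLayers:   func(ctx context.Context) (State, error) { return ScanLayers, nil },
        ScanLayers:    func(ctx context.Context) (State, error) { return IndexFinished, nil },
        IndexFinished: func(ctx context.Context) (State, error) { return Terminal, nil },
    }

    func run(ctx context.Context, current State) error {
        for {
            next, err := states[current](ctx) // run the current state's logic
            if err != nil {
                return err // the real controller transitions to an error state
            }
            if next == Terminal {
                return nil // canonical "everything is done, halt the machine"
            }
            current = next
            // The real controller also persists the new state to the database
            // here, so polling clients can observe the indexer's progress.
            fmt.Println("transitioned to", current)
        }
    }

    func main() {
        _ = run(context.Background(), CheckManifest)
    }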
B
Okay, so now I want to dig into each of these states individually to give you an idea of what the indexer is doing. The very first state that we enter when you submit a container manifest to the indexer is called check-manifest, and it's exactly what you think it is: we determine if we've ever seen this manifest before.

B
We can do this because of content addressability. I don't know if anyone has watched previous talks with me on Clair, but we're always hammering on this content-addressability aspect. What it really means is that if we see a manifest with a particular hash, it's content addressable: it's the same content no matter when we scan it again, no matter when we see it. We can always be sure that the same layers with the same content make up that manifest. Therefore, if Clair sees it, it can go:
B
Oh okay, I've seen this manifest. I don't need to do anything else; I can literally just return the index report that I've already computed for this manifest. Now, in this case, we're going to say Clair has not seen this manifest, so it's going to go, okay, move it through the pipeline; we need to move it forward. Now, there's a bit of a subtlety here, and something that's nice to know when you're working with multiple scanners.
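In sketch form, the check-manifest decision amounts to a lookup keyed by the content-addressable hash (illustrative code, not Clair's actual store API):

    package main

    import (
        "context"
        "fmt"
    )

    type IndexReport struct{ Hash string }

    // store stands in for Clair's database layer.
    type store struct{ reports map[string]*IndexReport }

    // checkManifest: a hit means the exact same layers were already indexed,
    // so the stored report can be returned without doing any new work.
    func checkManifest(ctx context.Context, s *store, manifestHash string) (*IndexReport, bool) {
        r, ok := s.reports[manifestHash]
        return r, ok
    }

    func main() {
        s := &store{reports: map[string]*IndexReport{
            "sha256:aaaa": {Hash: "sha256:aaaa"},
        }}
        if r, ok := checkManifest(context.Background(), s, "sha256:aaaa"); ok {
            fmt.Println("seen before: returning stored index report for", r.Hash)
            return
        }
        fmt.Println("not seen before: continue through the pipeline")
    }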
B
What we'll actually do here is, let's say Clair has seen the manifest, but now the implementer of the JAR scanner for Clair has made some changes to it and it might detect things a little differently.

B
So this adds to the ability of doing as little work as possible, and you can actually see that in the source code: there's a little section where we clip off scanners if they have already scanned the manifest before. Just a little tidbit of information that I think is nice to bring along, in case you see it in the source code.
B
In this state, Clair is trying to determine which layers it actually needs to go out and spend system resources on: fetching, downloading, possibly decompressing, and then scanning. In this diagram I express the common case of Clair deciding, this base layer, I've seen it already, I don't need to go and grab it.

B
This is a small change which adds a lot of benefits from Clair v2 to Clair v4. Clair v2 would actually do all the work in memory, which can be problematic if you're trying to pull down gigs of layers. So now we buffer to disk. When you're running Clair, we advise you to definitely have at least 100 gigs of scratch space; SSDs will help, because we actually use the disk quite a bit for buffering data, especially for very large layers.
B
Now we go into the next state. We have the layers, they're local on the filesystem, and now we take the scanners which were computed in the check-manifest state. We say, okay, we have this list of scanners, we know what we want to scan inside the container, now we do that work. The way the scanning state works is that it takes that list of scanners and it will concurrently, via goroutines, fan out the scanning business logic.
B
The controller does this: it knows the implemented scanners that are configured, it fans them out and hands them each the layers, and they begin scanning the layers and return their contents back to the controller. The controller will then write those contents to the database. So the scan-layers phase is when we are actually computing what's inside each layer and then storing the partial results for each layer in the database.
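A simplified, self-contained sketch of that fan-out: each configured scanner is handed each layer on its own goroutine, and the results are gathered for the controller to persist. The real code also handles errors and the database write; the types here are stand-ins:

    package main

    import (
        "context"
        "fmt"
        "sync"
    )

    // Minimal stand-ins for the real claircore types.
    type Layer struct{ Hash string }
    type Package struct{ Name string }

    type PackageScanner interface {
        Name() string
        Scan(ctx context.Context, l *Layer) ([]*Package, error)
    }

    // scanLayers fans each (scanner, layer) pair out onto a goroutine and
    // collects the per-layer findings.
    func scanLayers(ctx context.Context, scanners []PackageScanner, layers []*Layer) map[string][]*Package {
        var (
            mu      sync.Mutex
            wg      sync.WaitGroup
            results = map[string][]*Package{} // layer hash -> packages found
        )
        for _, s := range scanners {
            for _, l := range layers {
                wg.Add(1)
                go func(s PackageScanner, l *Layer) {
                    defer wg.Done()
                    pkgs, err := s.Scan(ctx, l)
                    if err != nil {
                        return // the real controller records the error and fails the index
                    }
                    mu.Lock()
                    results[l.Hash] = append(results[l.Hash], pkgs...)
                    mu.Unlock()
                }(s, l)
            }
        }
        wg.Wait()
        return results
    }

    // A trivial scanner so the sketch runs end to end.
    type demoScanner struct{}

    func (demoScanner) Name() string { return "demo" }
    func (demoScanner) Scan(ctx context.Context, l *Layer) ([]*Package, error) {
        return []*Package{{Name: "example-pkg"}}, nil
    }

    func main() {
        out := scanLayers(context.Background(),
            []PackageScanner{demoScanner{}},
            []*Layer{{Hash: "sha256:aaaa"}, {Hash: "sha256:bbbb"}})
        fmt.Println("layers scanned:", len(out))
    }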
B
To touch on this a little bit: I went over this in the components section, but just as a refresher, because there is a lot of data throughout the talk, I wanted to put in little reminders. This is what the package scanner, the distribution scanner, and the repository scanner look like. As you can tell, when you call their scan methods, they're given a layer.
B
If we go back just a bit, you'll remember that the layer was buffered to disk, so it's a little bit abstracted, but the scanner can get a tar handle to the layer if it wants, or we do have some abstraction methods on the layer that say, hey, just give me this file. So, an example of this:
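For example, a distribution scanner that only cares about etc/os-release might look roughly like the following. The Layer type and its Files helper here are stand-ins for the layer's "just give me this file" abstraction, not the exact claircore signatures:

    package example

    import (
        "bytes"
        "context"
        "strings"
    )

    // Layer is a stand-in; the real type exposes the buffered-to-disk layer
    // either as a tar handle or via file-access helpers.
    type Layer struct {
        files map[string][]byte
    }

    // Files is a hypothetical helper mirroring the abstraction described above.
    func (l *Layer) Files(paths ...string) (map[string]*bytes.Buffer, error) {
        out := map[string]*bytes.Buffer{}
        for _, p := range paths {
            if b, ok := l.files[p]; ok {
                out[p] = bytes.NewBuffer(b)
            }
        }
        return out, nil
    }

    type Distribution struct{ Name, Version string }

    // Scan pulls etc/os-release out of the layer and derives a distribution
    // from it: the typical shape of a distribution scanner.
    func Scan(ctx context.Context, l *Layer) ([]*Distribution, error) {
        files, err := l.Files("etc/os-release")
        if err != nil {
            return nil, err
        }
        buf, ok := files["etc/os-release"]
        if !ok {
            return nil, nil // not present in this layer; nothing to report
        }
        d := &Distribution{}
        for _, line := range strings.Split(buf.String(), "\n") {
            switch {
            case strings.HasPrefix(line, "ID="):
                d.Name = strings.Trim(strings.TrimPrefix(line, "ID="), `"`)
            case strings.HasPrefix(line, "VERSION_ID="):
                d.Version = strings.Trim(strings.TrimPrefix(line, "VERSION_ID="), `"`)
            }
        }
        return []*Distribution{d}, nil
    }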
B
That's the level of abstraction you can expect if you're trying to implement these scanners yourself for your own purposes inside Clair; that's all you have to do. Then, when we get back these claircore packages, distributions, and repositories, we simply write them to the database with our own database-handling logic, all internal to Clair, so implementers will not have to worry about that.

B
This is a look inside the Clair data model, at how we actually stitch the items found during scanning together into an ERD for searchability. We created the idea of scan artifacts inside the Clair database, and what this is really doing is tying together the ability to search, saying, okay, we found this package in this layer and it was found by this scanner.
B
This data model makes it possible to say a scanner has been changed, so let's scan that layer again, because we're recording exactly the scanner name, version, and kind which found these artifacts.

B
I mentioned coalescing a little bit, and this is how Clair computes all this partial data into a final index report. The way this works is that the business logic in the controller will go ahead and ask for the scan artifacts for both of these layers, the two layers that we scanned. It's going to get this layer-artifact structure, which has the packages, the distributions, and the repositories, it's going to get the other layer artifacts, from df0, and it's going to feed both of these layer-artifact structs to the coalescer.
B
Now, what the coalescer wants to figure out is: how can I attribute packages to distributions? We touched on this a little bit, but the distribution information might be in layer 10 and the package database might be in layer 2. So we have to somehow coalesce this and backfill the distribution information.

B
We also have to figure out which packages should remain in the final index report and which packages should be deleted, based on the state of each individual layer. The coalescer works similarly to the scanners, in that the business logic in the controller will spawn coalescers with goroutines and run them in parallel.

B
They'll each create their own representation of the final index report, and then we just merge them together to get the final index report, with the final set of contents that are left inside the image.
B
The index-manifest state is where we make the contents of a container searchable.

B
It's not a super complex data model or ERD diagram; basically, what we are doing is we just have a giant link table that says we found this package in this manifest, we found this distribution in this manifest,

B
we found this repository in this manifest. Where this comes in handy is when a new vulnerability enters the Clair system. Vulnerabilities are usually tied to packages and distributions, right? So if you have the RHEL Pulp security database, when you look at a vulnerability it's going to say something like OpenSSL, RHEL 8. You can take that vulnerability and ask the indexer, hey, which manifests have OpenSSL and are of the distribution RHEL 8? This index-manifest step makes it possible to give you that answer.
B
It happens after coalescing, so we get the final computed results of what's available inside the container image, we index that (hence the name indexer), and then it becomes searchable in the aforementioned way, when a vulnerability is attributed to a particular package and distribution. This is exactly what the data model looks like at the end; I'll go through the ERD diagram and, basically, the database code. And then, finally, we have index-finished, which is a very simple state.

B
It basically just massages the state and success keys, along with the values in the index report, and then writes them to the database.
B
Deferring work: we touched upon this throughout the talk, but one of Clair's main goals is to do as little work as possible, so we can compute results and give them to the client as fast as possible.

B
So I want to quickly review some of the ways Clair v4 does this deferment of work. The first big part is just the manifest-seen check right at the start, because of content addressability.

B
If we have seen a manifest before, we're simply not going to do any work; we're going to go right to the database and say, okay, I have an index report for this manifest hash, and I'm just going to return it. Now, again, this excludes the case where scanners might have changed, or Clair is configured in a different way.
B
Another way of deferring work is determining which layers to actually scan. This is fundamentally the same as the check-manifest state, just on an individual layer basis. So again, content addressability indicates that if I see this hash, whenever I see it again, the contents haven't changed; therefore I don't need to rescan it. It's another way that we're able to do less work, and another reason why indexing large amounts of images might not be as scary as it sounds.
B
As long as they are sharing, you know, several layers. A very common thing to do is to have a base layer, then a dependency layer, and finally a third layer that just changes your application. If that's the case, then Clair is really only doing work on a single layer every time you push an application update, as long as your dependencies aren't changing and you write your Docker containers in a sane way which makes use of this separation between base, dependencies, and application.

B
This is just touching upon the point that when we do decide we're only going to scan particular layers, we just go right to our own database, which already has this information in it, grab the information, and then bring it with us for the other steps, the other portions of the pipeline.
B
So that's all I have for the presentation. I'm up for either doing a little bit of code digging, or we can go right to questions. What do you think, Diane?

A
A little Q&A here, because there's one question. The other thing that I would have you do is go to your site and show the schedule for community meetings, because you've just done an amazing run-through to give people insights into how to contribute and how it all works.
A
So I want to make sure people know how to find you and get into the community and get started, and while you're doing that I'll read off Andre's question here; a couple of them are coming in. So: which phase of the general CI/CD pipeline would be the appropriate position for Clair scanners? After some deployment, or somewhere as a testing, linting, health-check phase of CI/CD, or as part of the security, vulnerability management, and QA process? I know you have opinions about that, but I think everybody holds an opinion about that.
B
Yeah, definitely, there are opinions about that. Me personally, if I have a build system and I am performing, you know, staging builds: when you create those staging containers and they get pushed to a repository, that's really your time to do that scanning, to understand the vulnerabilities that might be inside your container before they ever hit production. If you don't have a staging environment and you simply push containers and then deploy them to production, you still have that period of time where you've built a container and pushed it to a registry.

B
So in the CI/CD pipeline I would say, as a general best practice, do it as early as possible: as soon as you have the container built, and obviously before you push it out to an environment. I would do the scanning as early as possible in your CI/CD pipeline.
A
And Narandev has posted what I'm sure is an interesting question: wait, there's a state management solution for Golang? Can you talk a little bit about that? You must have mentioned it earlier.

B
I'm not exactly sure what you're referring to, but we have written that state machine code just in pure Go, as an incarnation of our own development; we're not using a library for state management. But if you are interested in state management and you'd like to see how Clair does it, I would definitely check out that code.
B
It's probably a decent representation of what an FSM, or finite state machine, implementation in Go could look like. It was written to serve a purpose; it might not be the shiniest, cleanest thing, but it works, and it works well. So if you do want to take a look at how we work that finite state machine architecture in, again, you can go to our source, which is claircore.

B
It's the internal directory, then indexer and controller, and what would be really interesting to you is controller.go and state.go. This is basically how we created the state transition tables which map states to functions. But yeah, no library, no external state management solution for that; we just coded it.
A
But I think what you've just done, which I wish I could get every upstream project to do, is to really explain how Clair works internally. In order to contribute to a project, that's often one of the missing pieces. You know, a bunch of engineers from Red Hat and elsewhere have been contributing over and over, and taking the time to really explain how it works is wonderful, so I can't thank you enough.

A
I'm hoping that will drive people who watch this and want to use Clair in whatever projects or products or states they want, to come to these community meetings and then, you know, take a look at it, whether you want to take a look at the state management code (or the code base, I guess, rather than solution) or contribute to this.
A
That would be a lovely thing. So thank you for powering through your power outage there this morning, Louis, and making this happen ("my pleasure"). Anything else you want to add in terms of what's next for Clair and the Clair community?
B
Yeah, maybe just a couple of touch points on what's coming up on our internal agenda. Right now we have the 4.1 release baking, and this release has a pretty paramount feature, what we're calling enrichments. What you might have noticed when we redesigned Clair v4:

B
We wanted to remove false positives as much as we can, and by doing so we removed NVD as a vulnerability data source. That was a somewhat opinionated decision.

B
I think a lot of people share our opinion that NVD might not be the best source of data. However, when we did that, we removed a lot of the severity information that people had become accustomed to. So the enrichment specification and 4.1 roadmap goal is all about allowing auxiliary data to enrich our vulnerability reports. We took kind of a best-of-both-worlds approach, in my opinion, in that we're sticking with the official upstream vulnerability data, but now we're enriching that data with NVD metadata.
B
So it's a little different from going to NVD and trusting all of it. Instead, we have the trusted source and then we're adding information to the information that we already trust. This is a 4.1 goal, and you'll notice that a lot of information for vulnerabilities will become richer. If you'd like to follow that development in any way, shape, or form, you can go to quay/clair.

B
Perfect. So you can go to our Discussions, and inside this Design tab right here you'll see the Clair enrichment specification. And just by the way, we practice open design, so any big-ticket changes that are going to happen to Clair will be in this section; it's just a good area to watch. This is the Clair enrichment specification, and there's a link to our GitHub repository, so the spec is here and the implementation details are here. I'm working mostly on this implementation, but community contributions are completely welcome.
B
Every single detail, to the best of my ability, is outlined here. Some things may come up just from, you know, implementing software; it's not always so easy to foresee everything that's necessary. But the majority, the chunk of work that needs to happen, is all here and open for community development. So if you'd like to speed up the rate at which NVD data winds up back in Clair, it's a good one to be abreast of and take a look at. Other than that,

B
I think it's just kind of being aware that we have a community development meeting every second Tuesday of the month.
A
So cool. I'm looking to see if anyone else has any questions, whether you're out there in Twitch land or on BlueJeans, or wherever you're YouTubing and watching this, or on Facebook even; post your questions. Otherwise we're all clear on questions, and we'll let you go back to your day, Louis. If you can share your slides with me, I'll share them with the community as well, and we'll upload this to YouTube. Hopefully, now that everybody understands how the indexer works, they'll be excited about contributing to it and come to a community meeting.