A
Please
feel
free
to
drop
them
there,
we'll
get
to
as
many
as
we
can
either
during
or
at
the
end.
Whichever
way
works
out
best,
this
is
an
official
webinar
of
the
CNCF
and,
as
such
is
subject
to
the
CNCF
code
of
conduct.
A
Please
do
not
add
anything
to
the
chat
or
ask
any
questions
that
would
be
in
violation
of
that
code
of
conduct
and
please
be
respectful
of
your
fellow
participants
and
our
lovely
presenters
who
are
Sridhar,
vincr,
vincractrumen
and
Luke
Rota
and
I've
totally
butchered
it
I'm
going
to
let
you
say
it
beautifully
with
OpsCruise
and
Chicago
Trading
Company
to
kick
us
off
today.
B
Yes,
hi,
hey
welcome
all,
and
you
got
my
name:
okay,
no
problem.
This
is
Sridhar.
B
You
know
we
have
an
opportunity
for
us
to
sit
down
and
discuss
Chicago
Trading
Company's
journey
to
the
cloud
and
observability
with
our
own
Luke
Rotor,
who's
the
manager
of
SRE
and
observability
at
the
Chicago
Trading
Company.
Right.
I'm
the
founder
and
the
chief
architect
for
OpsCruise
and
look
forward
to
talking
to
you.
So,
Luke.
C
Hi,
thank
you
really
appreciate
the
invite
here
to
speak
about
CTC's
cloud
journey,
and
so
I'm
gonna
be
going
into
some
some
deep
dives
about
our
technology
as
well
as
overall,
the
company
itself
and
and
what
it
is
that
we
do
and
then
how
that
intersects
with
OpsCruise.
All
right.
C
So
next
I
will
get
into
here
who
is
Chicago
Trading,
so
we
were
founded
in
1995,
so
we've
been
around
for
for
quite
some
time.
C
Our
mission
is
to
make
markets
better
and
provide
liquidity
when
it
matters
most.
So
you
know
being
in
the
market
and
I'll
talk
about
what
that
is,
is
paramount
to
us.
So
Chicago
Trading
Company
is
a
market-making
proprietary
trading
firm.
So
you
know
what
does
that
mean?
It
means
we
represent
both
the
buyer
and
and
the
seller
in
the
marketplace,
and
those
marketplaces
are
still
today
some
trading
floor
venues,
but
largely
electronic
venues
run
by
exchanges
such
as
CME
Group,
BATS,
New
York
Stock
Exchange,
you
know,
Chicago
Board
of
Options.
C
We
are
headquartered
in
Chicago
and
we
have
offices
in
New
York
and
London.
Right
now,
we're
around
600
people
or
so
and
and
we're
rapidly
growing.
So
you
know,
we've
doubled
in
size
almost
in
in
the
last
four
years,
so
a
lot
going
on
at
CTC,
definitely
exciting
times
one.
C
So
since
markets
you
know,
are
really
always
on
at
any
one
point
in
time.
In
the
world
there
are
a
few
market
pauses,
depending
on
which
exchanges
you're
trading
on.
For
the
most
part
you
know,
markets
are
trading
23
hours
a
day,
or
at
least
the
markets.
We
trade
in
we're
trading
about.
You
know
23
hours
a
day
in
over
20
markets
across
the
world,
and
you
know
some
of
those
markets
are
even
open
on
the
weekends.
Many
of
them
are
closed.
C
So
overall
we
have
a
very
narrow
window
of
time
in
which
we
can
release
software
changes.
So
observability
is
really
important
to
us,
as
we
have
to
understand
the
state
of
our
software
at
any
one
point
in
time
so
I'll
get
into.
You
know,
where
kind
of
why
that's
important
as
we
go.
We
have
hundreds
of
applications
that
make
up
our
trading
platform
that
our
traders
and
quants
use
every
day
to
run
our
strategies,
engineers
and
and
quants,
and
even
traders
write
code.
C
You
know
most
of
our
code
is
written
in
Python,
C++
and
Java,
and
so
you
know
I'll
go
into
here
now,
a
little
bit
about
the
technology
stack
and
some
of
the
you
know,
challenges
that
we
currently
face.
C
So
our
current
environment
is
made
up
of
a
mixture
of
things.
We
pride
ourselves
on
our
research
that
we
do
our
pricing
and
our
risk
management.
C
You
know
that
may
change
over
time.
Depending
on
how
market
structure
changes.
There
are
exchanges
that
have
struck
deals
with
big
cloud
providers.
C
You
know
New
York
Stock
Exchange,
as
well
as
CME
Group,
have
recently
struck
deals
to
extend
their
systems
into
into
the
cloud
providers.
So
you
know
we'll
have
to
see
where
that
takes
us,
but
for
right
now
there
is
still
a
need
to
be,
you
know,
co-located.
Some
other
things
that
have
traditionally
kept
us
on
premise.
C
Are
things
like
multicast
type
protocols,
customized
hardware
configurations,
data
locality.
Right,
these
things
can
be
challenging
at
times
in
a
cloud
environment.
Similarly,
and
kind
of
also
combined
with
the
on-prem
need,
are
lower
latency
requirements.
So
we
aren't
a
shop,
necessarily,
that,
you
know,
is
worried
about
every
nanosecond,
but
we
do
need
to
compete
for
speed.
Again,
it
is
table
stakes
these
days
to
have
some
level
of
speed
in
order
to
compete
with
others
in
the
marketplace.
C
So
low
latency
is
important
to
us.
There
are
customized
server
and
switching
configurations
that
aren't
available
in
the
cloud
as
well.
You
know,
we
have
specialized
algorithms
that
take
advantage
of
this
hardware
capability
and
so
low
latency
is
also
traditionally
in
our
kind
of
on-prem
data
centers
and
doesn't
run
in
a
cloud
environment.
C
So,
like
I
said
before,
you
know,
research
is
one
of
the
things
that
we
pride
ourselves
on
and
there's
a
lot
of
computing
that
needs
to
be
done
when
you're
researching
and
doing
things
like
backtesting.
Backtesting
is
a
common
practice
in
a
trading
environment
to
see
how
your
strategies
are
working
over
time,
and
that
requires
a
lot
of
compute.
C
So,
in
order
to
do
that,
on-prem
we
would
have
to
really
scale
and
that
that
can
be
very
costly
and
difficult,
and
so
that's
one
of
the
areas
where
the
cloud
has
become
front
and
center
because
we
can
scale
there
quicker.
We
can
take
advantage
of
tooling
and
native
cloud
functions
in
a
cloud
environment
and
we
can
also
leverage
economies
of
scale.
C
Those
tools
are
being
challenged
and
there's
gaps
in
those
tools,
and
so
that's
where
a
tool
like
OpsCruise,
that's
where
they
can
come
in
and
provide
some,
not
only
technical
value
but
business
value.
C
So,
as
we've
begun
to
adopt
containers
and
change
our
architecture,
you
know
we've
really
had
to
rethink
our
approach
around
monitoring
and
observability.
C
So
what
has
the
cloud-native
shift
looked
like
for
us?
So
really,
you
know,
the
markets
are
ever
changing,
they're
always
moving
faster.
C
The
data
sets
are
ever
increasing
and
we
are,
you
know,
trying
to
always
stay
ahead
of
the
curve,
and
so
one
of
the
things
we've
been
challenged
with
over
the
years
and
continue
to
be
challenged
with
is
getting
our
ideas
into
production
as
quickly
as
possible,
so
that
we
can
understand
the
impact
that
it's
having
on
our
business.
Is
it
working?
Is
it
not
working?
C
So
it's
really
important
that
we
can
iterate
quickly.
With
that
said,
we
also
need
to
keep
our
outages
low
right.
So,
like
I
started
out,
saying
you
know
being
in
the
market
is
really
paramount
to
us.
Not
only
do
we
provide
a
service
to
the
markets
for
our
customers,
but
you
know
there's
also
opportunity
cost
for
really
any
trading
firm
when
they're
not
participating
in
the
marketplace
right.
So
if
you're
not
participating,
there
isn't
an
ability
to
capture
opportunity.
C
So
really
we
had
to
think
about
how
we
can
scale,
right,
and
so
we
started
moving
down
the
path
of
microservices,
which
initially
we
started
breaking
up
monolith
applications
into
smaller
things,
but
they're
still
highly
dependent
on
each
other,
so
it
has
become
more
of
a
distributed
monolith
and
so
now
we
need
to
continue
to
modernize
our
application
architecture
and
adopt
a
cloud-native
approach
and
start
using
things
like
containers
and
K8s
and
cloud
providers.
C
Things
like
you
know
Azure
and
AWS.
So
we
really
started,
you
know,
to
modernize
once
again
and
really
to
reduce
our
slow
iteration
cycles
and
be
able
to
get
our
ideas
into
production
faster.
C
But
this
has
significantly
changed
the
way
in
which
we
monitor
and
observe
our
applications.
So
we've
had
to
really
rethink.
How
do
we
fill
these
gaps
right?
How
do
we
know
what
a
container
is
doing?
How
do
we
know
how
Kubernetes
is
working
right?
There's
a
lot
of
a
lot
of
things
in
play
when
you
introduce
these
new
technologies
that
traditional
monitoring,
observability
tools,
weren't
necessarily
built
from
the
ground
up
to
handle
these
situations?
C
So
we've
really
evolved
our
focus
on
telemetry
right,
so
focus
on
instrumentation
what
we
should
be
logging,
what
are
our
metrics
preparing
for
things
like
tracing
and
so
as
we've
done
that,
we've
really
looked
to
focus
in
on
open
source
tools.
C
So
we've
started
to
focus
in
on
open
source
tools,
and
really
that's
because
we're
trying
to
solve
some
pain
points,
one
of
them
being
the
swivel
chair
right.
So
we
have
logs,
we
have
metrics.
We
have
dashboards.
A
lot
of
traditional
monitoring
tools
today,
they
do
a
pretty
decent
job
of
bringing
all
of
these
things
together
into
one
dashboard,
but
there's
still
a
bit
of
context.
C
As
we
know,
logging
data
typically
doesn't
decrease
right,
typically,
at
least
in
CTC's
case,
we're
always
creating
more
applications
right
and
we're
logging
more
and
trying
to
understand
more,
what's
what's
happening
with
an
application
right,
but
you
can
only
store
this
data
for
so
long
and
it
can
become
very
costly
to
store
this
data
in
a
closed-source
vendor,
and
it
can
be
hard
to
assimilate
this
log
data
with
other
metric
and
tracing
data.
C
They
allow
you
to
avoid
vendor
lock-in
and
they
give
you
flexibility
with
that.
You
know
you
do
need
to
have
some
knowledge
about
these
open
source
tools
and
at
least
at
CTC,
you
know,
we're
continuing
to
build
our
knowledge
right,
but
you
know
it's
certainly
been
a
journey.
We
don't
have
all
the
knowledge,
and
so
it
can
be
difficult
to
hire
for
those
skill
sets
or
or
build
it
internally.
C
As
you
start
to
build
the
knowledge-
and
you
start
to
figure
out
what
data
you
want
to
collect,
it's
still
trying
to
assimilate
that
data
right,
and
so
that's
where
something
like
a
smart
layer
on
top
of
it
that
can
natively
plug
into
open
source
tools.
So
you
don't
you
don't
have
to.
C
You
can
still
continue
to
use
your
investment
in
open
source
and
and
the
flexibility
that
it
provides.
But,
in
addition,
you
can
get
a
smart
layer
on
top
of
that
and
I'll
talk
about
that
in
a
little
bit
and
how
OpsCruise
comes
in
and
some
of
the
business
value
that
it
provides
there
all
right.
So
next
I
will
talk
about
here.
C
So
so
here's
kind
of
you
know
the
layout
of
some
open
source
tools
out
there
right
things
like
Prometheus,
Loki,
Jaeger.
OpsCruise,
out
of
the
box
works
with
all
of
these
natively.
So,
even
if
you
have
these
tools
today,
there's,
you
know,
not
much
you
really
need
to
change
and
you
can
leverage
all
of
your
investment
in
your
current
telemetry
collection
right.
C
So
telemetry
collection
itself
has
really
become
commoditized
by
a
lot
of
these
open
source
tools,
and
so
you
don't
have
to
lock
in
with
a
vendor
when
it
comes
to
collection,
right?
You
can
get
that
in
an
open
source
way.
C
So
there's
also
OpenTelemetry.
There
are
several
standards
within
OpenTelemetry,
different
protocols
that
you
can
use,
which
work
with
Prometheus,
Loki,
Jaeger.
Right,
you
can
use
it
for
logs,
metrics
and
tracing.
C
You
can
use
the
OpenTelemetry
libraries
within
your
application,
and
so
everything
really
from
a
data
collection
and
sending
standpoint
can
be
done
in
an
open
source
way
and
you
don't
have
to
worry
about
locking
into
a
vendor.
The
only
thing
you
don't
get
is
a
smart
layer
and
that
smart
layer
being.
C
things
like,
you
know,
machine
learning
and
and
telling
you
more
insights
into
your
data
right
things
that
traditional
tools,
traditional
monitoring
and
observability
tools,
they
don't
always
have
those
capabilities
right,
they're,
very
good
at
collecting
the
data
and
giving
you
ability
to
graph
the
data
but
contextualizing
the
data
really
is
the
next
evolution,
at
least
from
my
perspective,
when
it
comes
to
observability.
C
All
right,
we
can
go
to
the
next
one,
all
right
so
CTC
and
OpsCruise.
This
is
where
the
intersection
really
happens
here
and
where
the
business
value
comes
in,
so
one
of
the
things
OpsCruise
provides
is
telemetry
unification
and
support
right,
so
they
can
bring
all
of
your
logs
metrics
and
traces.
C
All
of
that
can
be
collected
in
an
open
source
non-vendor
provided
way.
They
will
gather
that
data.
They
will
display
that
data.
They
will
contextualize
that
data
and
you
know,
it
also
leverages,
like
I've
said,
the
collectors
today
that
are
out
there
like
Prometheus,
Loki
and
Jaeger.
So
if
you
have
prior
investments
there,
those
will
not
be
wasted,
it
also
has
flow
tracing
in
it.
C
It
uses
this
to
present
the
application
map,
which
Sridhar
will
go
into,
and
so
you
don't
have
to
do
any
customized
tracing
to
get
an
application
map
and
how
everything
is
interconnected.
C
It
is
done
without
any
development
time
at
all.
Architectural
governance,
right,
so
it
provides
you
inventory
of
where
your
containers
are
running,
how
they're
running
what
they're
doing
where
they're
running
you
have
a
lot
of
view
into
that,
and
so
that
it
really
brings
it
all
together.
It's
an
easy
way
for
you
to
understand
inventory,
of
where
things
are,
at
least
for
me.
C
So
this
is
a
really
important
piece
and
there's
a
lot
of
business
value
here,
because
the
quicker
I
can
find
something
and
understand
what
is
going
on,
the
quicker
I
can
understand
root
cause
and
solve
the
issue,
and
then
it
brings
ML
into
the
fold
right.
So
I
don't
have
to
apply
a
lot
of
human
power
to
understand
and
assimilate
data.
C
The
ML
learns
over
time
and
can
present
to
you
issues
that
it
has
found
right
and
a
lot
of
times
that
can
take
on
on
the
order
of
days
for
engineers
to
find
a
configuration
that
might
be
causing
an
issue
in
the
environment.
I
personally
just
ran
into
this
a
few
days
ago.
I've
had
engineers
spending
hours
or
days
actually
on
trying
to
find
an
issue
within
kubernetes
that
a
tool
like
OpsCruise,
through
its
ML,
could
easily
provide
in
minutes.
C
All
right
next.
C
Go
to
then
yep
all
right,
so
the
features
that
I
really
enjoy
about
OpsCruise:
one
is
this
application
map
right.
It's
really
intuitive.
It's
amazing!
It's
out
of
the
box.
I
don't
have
to
do
any
custom
tracing
any
custom
development
right.
I
don't
have
to
invest
any
development
time.
I
can
just
install
the
agents
and
I
can
start
my
applications
and
I
get
an
application
map.
C
The
next
thing
that
I
really
like
about
the
OpsCruise
tool
is
fault
isolation
and
cause
analysis
right.
So
there's
a
lot
of
business
value
here,
right?
You
know
anyone
can
manage
open
source
tools
and
collect
data.
That's
pretty
easy
to
do
right,
assuming
you
have
the
skill
sets
to
do
it.
C
What's
not
easy
to
do
is
how
to
assimilate
that
data
and
find
issues
within
the
data
right.
It's
really
powerful
when
a
tool
can
tell
you
something
quicker
than
what
you
could
find
out
researching
on
your
own.
In
CTC's
line
of
business,
every
minute,
every
second
really
counts.
C
When
there's
an
outage
right
when
time
is
ticking
away,
we're
losing
the
ability
to
capture
opportunity,
and
so
this
is
where
the
business
value
of
OpsCruise
really
comes
in
to
be
able
to
contextualize
this
data
and
find
faults
quicker.
The
quicker
you
find
them,
the
sooner,
at
least
for
ctc,
we're
back
in
the
market
and
capturing
opportunity
all
right.
The
next
feature
here
is,
you
know,
it
makes
more
data
available,
right?
It
pulls
all
of
your
log,
tracing
and
metrics
data
together
right.
C
It
pulls
it
into
one
view,
and
you
and
really
anybody
can
log
in
and
see
this
data
right,
and
so
you
don't
have
to
have
as
much
operational
expertise
of
how
a
dashboard
was
built
or
how
it's
being
presented.
C
All
of
that
is,
you
know,
pulled
together
in
an
easy
to
read
view
within
OpsCruise,
and
then
the
final
thing
is
being
able
to,
you
know,
look
back
with,
you
know,
topology
and
understand
where,
in
the
topology
of
an
application,
things
may
have
broken
down
or
where
you
might
be
experiencing
the
error,
and
then
I
guess
not
necessarily
a
screenshot
for
this
right,
but
one
of
the
things
that's
hard.
C
That's
really
invaluable
in
my
mind,
is
that
OpsCruise
has
an
extreme
amount
of
knowledge
and
expertise
with
open
source
tools
and
kubernetes
itself,
as
well
as
running
and
operating
in
a
cloud
native
world.
Their
expertise
is
invaluable,
they're,
an
excellent
partner.
They
can
really
help
guide
you
with
your
challenges
and
either
collecting
or
presenting
data,
and
so
you
know
I
just
wanted
to
mention
that
as
well,
as
an
additional
thing
that
I
really
appreciate
about
OpsCruise.
C
That's
so
I'll
hand
it
off
here
to
Sridhar,
and
he
can
jump
into
a
more
technical
deep
dive
about
the
OpsCruise
tool
itself
and
I'm
happy
to
answer
any
questions
afterwards.
Thank
you.
B
Hey,
thanks,
Luke.
That
is
great.
Thanks
for
speaking
so
highly
about
us.
So
I'm
going
to
spend
a
few
minutes
going
through
some
slides
that
give
you
an
idea
of
why
we
do
what
we
do
and
what
we
do.
Right,
so,
though
all
of
us
know
this,
it's
worth
spending
a
minute
catching
up
on
what
the
fundamental
challenges
are
today,
where
observability
has
to
make
some
changes.
B
The
first
thing
is,
things
are
very
complex,
right,
there's
a
lot
of
abstractions
everywhere
and
that
abstraction
creates
points
of
performance
loss.
Performance
issues
also,
just
like
traffic
bunch-ups,
move
around
the
system,
they're
no
longer
static
and
easy
to
catch.
There
are
also
dependencies.
We
build
systems
with
a
lot
more
dependencies
today
than
we
did
some
time
ago,
and
that
brings
in
a
set
of
problems
that
you
have
to
deal
with.
And
finally,
we
mentioned
this
before,
Luke
also
mentioned
B
The
fact
that
you've
got
to
recognize
and
understand
that
the
speed
at
which
we
update
and
and
move
the
products
forward
is
increasing
and
that
these
are
the
three
complexities
that
really
undergird
the
observability
goals
of
today.
But
all
this
actually
can
be
seen
to
have
a
problem
of
disjointed
monitoring.
So
there's
no
problem
for
data.
Today,
data
comes
out
from
everywhere
we
look,
right?
And
you
can
bring
the
data
together.
Say
I've
got
it
here.
B
I've
got
it
in
the
dashboard
here.
I've
got
a
dashboard
there,
but
that
is
a
data-rich,
information-poor
situation
that
we
have
seen
in
the
market
right
and
because
of
that,
and
partly
in
spite
of
that,
it's
manual
and
lacks
closed
loop
resolutions
and
that's
a
problem
that
needs
to
be
dealt
with.
B
So
these
were
the
challenges
we
thought
we
should
try
and
solve
as
OpsCruise,
and
as
these
complexities
and
dependencies
and
dynamics
increase,
the
problem
becomes
a
multiplier.
So
instead
of
building
a
small
number
of
thick
things,
where
your
focus
was
looking
inside
that
thing,
we
have
a
larger
number
of
thin
things
and
that
changes
the
way
things
behave,
right.
So
the
whole
emergent
behavior
becomes
a
part
of
the
problem
scenario
in
today's
systems.
B
Now,
so
what
else
do
we
need
in
addition
to
the
normal
telemetry
that
we
look
forward
to
and
know
and
love
like
metrics
and
logs,
and
all
that?
Clearly,
traces
have
added
a
lot
of
value
and
traces
are
an
important
aspect
of
telemetry.
But
there
are
other
things
we
need
to
know
right.
We
need
to
know
the
structure
and
dependencies
of
the
application
right.
It's
not
just
about
moving
a
trace
from
one
container
to
another.
It's
also
about:
is
it
going
to
Lambda?
What
sort
of
RDS
is
it
accessing?
B
So
there's
a
dependency
and
structure
that
you
have
to
deal
with.
We
also
need
to
look
at
every
element,
whether
it's
a
container
in
a
360-degree
view,
right.
What
is
coming
in
what
is
going
out?
What
resources
is
it
using?
What
services
does
it
provide?
You've
got
to
look
at
that
completely
right.
The
third
thing
is
you've
got
to
have
a
a
way
by
which
you
can
understand
the
behavior
of
the
application.
B
SLOs
and
thresholds
are
useful
and
provide
value,
but
they
don't
represent
the
behavior
of
the
app,
of
a
container.
They
don't
represent
what
it's
all
about,
because
things
are
non-linear.
It's
not
just
a
linear
situation,
so
understanding
and
having
a
better
way
by
which
we
can
look
at
a
container
is
important.
And
finally,
we
see
that
in
all
of
these
systems,
expertise,
human
expertise
has
to
be
figured
out.
Human
expertise
has
to
be
laid
into
the
architecture
so
that
it
can.
B
It
can
complete
the
story
and
and
make
it
valuable
for
everybody.
So,
having
said
this,
what
do
we
do?
We
have
two
things.
One
is
clearly
observing
and
following
the
standards
is
important,
the
more
we
have
standards,
the
better
it
is
for
everyone
and
quality
goes
up
and
costs
come
down.
So
that's
one
thing
that
we
all
agree
that
we
should
do.
Then,
beyond
that,
there
are
so
many
great
tools
which
are
conforming
to
those
standards
as
well
as
open
source.
B
So
we
also
felt
that
we've
got
to
lay
our
strategy
on
top
of
these
two
things,
the
standards
and
the
open
source
tools.
So
what
we
do
is
we
are
not
in
the
collection
business.
We
don't
even
intend
being
the
organization
which
keeps
your
data
for
long
term
right.
We
only
keep
it
for
the
purpose
of
providing
operational
support
in
the
short
term,
so
you
have
the
freedom
to
move
your
data
wherever
you
want
and
keep
it
for
whatever
other
purposes.
B
So
some
of
the
data
that
we
all
know,
and
the
types
of
tools,
are
shown
in
this
slide.
In
addition
to
the
metrics
logs
and
traces,
we
get
configurations
right,
the
orchestrators
like
Kubernetes,
which
we
support,
and
then,
of
course,
Nomad
and
other
things
which
are
there.
Also
they
provide
a
level
of
configuration
information
and
in
a
standardized
way,
which
is
very
important
for
observability.
B
Then
you
obviously
need
to
get
changes
and
topology
as
the
systems
are
more
constructed
with
small
things.
You
need
the
topology
without
which
you
can't
deal
with
with
observability
completely
and
finally,
the
knowledge
of
each
application.
How
do
we
understand
each
particular
application?
How
does
this
one
work
differently
from
the
other
one?
These
are
the
pieces
of
information.
We
feel
we
have
to
add
to
the
story
of
observability
before
we
can
handle
it
all
together.
So
that's
what
we
do.
B
We
bring
in
all
this
other
information,
in
addition
to
the
metrics
logs
and
traces,
to
give
you
a
complete
story.
How
do
we
deploy?
This
is
a
diagram
which
has
got
lots
of
colors
in
it
and,
like
a
Mercator
projection,
it
may
show
our
tools
much
larger
than
your
application.
Really,
we
just
provide
a
set
of
pods,
which
are
either
DaemonSet
replacements,
like
in
the
case
of
Node
Exporter
and
cAdvisor.
B
So
all
of
that
we
monitor.
There's
a
simple
deployment.
Deployment
is
easy
to
do
in
a
few
minutes,
and
instantaneously
data
is
available
for
you
to
see.
So
some
of
the
problems
we
try
to
solve,
and
we
succeed,
are,
you
know,
Kubernetes
day-one
problems,
day-two
problems,
and
a
lot
of
people
are
in
the
day
one
and
day
two
situation,
so
you've
got
to
figure
out
basic
things,
but
also
the
Kubernetes
itself
makes
it
very
complex
because
of
these
interdependencies
and
multiple
objects
that
it
defines.
B
So
we
solve
a
lot
of
kubernetes
problems.
We
solve
a
lot
of
technology-related
issues.
Serverless:
we
do
some
in
serverless,
so
that
you
can
see
if
an
application
is
talking
from
a
container
to
a
Lambda
function.
You
want
to
include
that
in
your
overall
observability
in
an
integrated
way.
We
do
that
too.
So
these
are
some
of
the
types
of
problems
that
we
had
to
deal
with
in
in
the
interest
of
time
I'm
going
to
switch
over
to
oh
sorry,
I
clicked
the
wrong
button.
B
Okay,
so
we
discover
this
automatically.
All
of
this
is
picked
up
by
our
system
within
five
six
minutes
of
you
installing
it,
and
you
automatically
get
a
picture
of
your
environment,
and
here's
the
Death
Star
view,
and
when
you
look
at
each
one
of
these
things
and
it's
expanding
it
a
bit
for
you
to
see
you,
you
get
to
see
these
different
pieces.
For
example,
you
have
an
RDS
sitting
there.
We
have
ELBs
in
another
part
of
the
system
and
we
have
different
containers.
B
Figure
out
where
it
is
yeah,
so
you
can
pick
any
one
of
these.
You
know
tools.
For
example,
you
can
see
a
container
immediately.
The
container
information
is
available
for
you
to
see
and
not
only
that,
it
gives
you
all
the
metadata
and
all
that
you
get
from
Kubernetes.
B
It
also
gives
you
metrics
that
are
picked
up.
You
can
also
pick
up
its
labels
that
are
there
and,
and
at
the
same
time
I
mean
right
there.
You
can
see
its
logs,
connections,
traces
and
even
a
three
layer
view.
A
3d
view
tells
you
that
this
application
is,
you
know,
is
running,
and
this
is
the
information
about
that
container,
for
example,
and
it
also
shows
you
on
which
node
in
Kubernetes
it
is
running
and
its
neighbors.
B
In
case
you
have
a
noisy
neighbor
problem
and
at
the
same
time,
what
infrastructure,
node
you're
running
on
say
in
this
case
it's
AWS,
your
instance
name
and
the
metrics
on
that
aspect
of
your
orchid.
So
you
get
all
these
things
integrated
and
made
available
instantaneously
right
from
from
the
system.
Now,
this
also
happens
for
other
things.
For
example,
if
you've
got
you're
looking
at,
say,
RDS,
I'm
a
little.
B
Everything
is
slow
when
this
thing
is
going
on.
So
here
you
get
information
about
RDS,
which
we
also
pick
up
and
make
make
it
available
to
you
right.
So
you
have
different
ways
of
servicing
you.
For
example,
this
gives
you
a
service
view
the
service
interactions
without
the
pods
and
see
how
they
are
all
connected
together,
and
you
can
filter
them
with
all
of
these
filters
and
and
save
them
and
all
those
standard
things
that
you
see
in
in
dashboards,
but
just
quickly
moving.
B
You
get
the
node
map
which
discovers
all
your
nodes
and
shows
you
the
information
that
you
need
about
the
nodes,
including
its
metrics,
configurations.
Even
for
example,
you
want
to
see
when
we
talked
about
providing
more
data
to
more
eyes,
it's
much
easier
for
people
to
use
such
a
system
than
to
get
into
kubernetes.
You
don't
even
have
to
teach
them
the
operational
tools.
The
app
folks
can
access
the
system
and
be
able
to
see
what's
happening
with
their
part
of
the
application.
B
Now
we
also
provide
some
analysis
which
helps
in
capacity
planning,
looks
at
all
your
pods
in
the
node
and
tries
to
estimate
the
usage
of
your
cpu
memory
and
all
of
that
and
tells
you
whether
evictions
are
possible
whether
you
need
to
provide
a
better.
Look
at
this:
it's
burstable,
right,
so
you
may
be
evictable.
So
these
types
of
things
are
also
provided.
I'm
just
running
through
this
quickly.
To
give
you
a
quick,
quick
idea.
Similarly,
we've
got
other
things
right.
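[Editor's illustration] The "burstable" and "evictable" remarks above refer to Kubernetes QoS classes, which influence which pods are evicted first under node pressure. Below is a minimal sketch of how a pod's class follows from its containers' requests and limits. The `qos_class` helper and the dictionary input shape are hypothetical simplifications for illustration, not OpsCruise's implementation; quantities are compared as literal strings, so "500m" and "0.5" would not be treated as equal here.

```python
from typing import Dict, List

Resources = Dict[str, str]  # e.g. {"cpu": "500m", "memory": "256Mi"}


def qos_class(containers: List[Dict[str, Resources]]) -> str:
    """Classify a pod as Guaranteed, Burstable or BestEffort (simplified)."""
    any_set = False          # does any container set a request or limit?
    all_guaranteed = True    # do all containers have request == limit for cpu and memory?
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # Assumption: a missing request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

A Burstable pod, for instance one that requests less CPU than its limit, sits between the other two classes in eviction preference, which is presumably why the tool flags it during capacity analysis.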
B
We
have
a
K8s
view,
which
gives
you
all
the
K8s
things
you're
looking
at
the
node
you've
got
five
nodes,
click
on
the
nodes.
You
can
see
all
the
node
information
provided,
and
then
we
have
a
service
performance
dashboard
here.
Here,
what
we
do
is
the
the
same
technique
that
helped
us
build
the
app
map.
What
it's
doing
is.
B
I
don't
know
what
right
now,
but
I'll
come
back
to
in
a
second,
so
we
have
events
logs
functions
which
are
easy
to
understand.
You
can
see
all
your
logs
and
search
on
them
and
look
at
the
labels
that
they've
been
with.
These
are
functions
that,
I
think,
we're
all
familiar
with.
B
These
are
Kubernetes
events,
and
it's
also,
you
know
the
processing
of
these
events
in
the
system
and
I'm
having
an
issue
here.
But
what
you
see
in
the
service
performance
dashboard
is
we
see
the
urls
right,
every
service
to
service
interaction
and-
and
maybe
that's
what
this
is.
B
Yeah,
so
what
we
see
here
is,
for
example,
you
can
pick
up.
B
As
soon
as
you
install
the
software,
so
in
total
we've
got
logs,
events,
time
travel,
search,
Kubernetes.
And
time
travel
is
something
very
unique.
What
we
do
is
every
five
minutes.
We
take
a
snapshot
of
the
system
and
we
store
it.
So
you
can
go
back
any
time
in
the
past
and
look
at
how
your
topology
looked.
For
example,
you
have
a
problem
at
12
o'clock
in
the
afternoon
and
at
four
in
the
evening.
B
You
want
to
look
at
what's
the
problem,
what
took
place
at
12
o'clock.
What
you
find
is
that
the
topology
has
changed.
Your
scale
scale
on
your
pods
are
not
available
anymore.
So
what
would
you
look
at?
You
need
a
way
by
which
you
can
go
back
in
time
and
see
the
topology
the
way
it
was
and
that's
what
our
time
travel
feature
provides.
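[Editor's illustration] The time travel idea described above, periodic topology snapshots that can be looked up for any past moment, can be sketched as follows. This is a minimal sketch under stated assumptions: the `SnapshotStore` class, its method names and the snapshot shape are hypothetical and are not OpsCruise's API; timestamps are plain integers in seconds and snapshots are assumed to arrive in increasing time order.

```python
import bisect
from typing import Dict, List, Optional, Set

Topology = Dict[str, Set[str]]  # e.g. {"nodes": {...}, "pods": {...}}


class SnapshotStore:
    """Keeps periodic topology snapshots and answers 'what did it look like at time T?'."""

    def __init__(self) -> None:
        self._times: List[int] = []
        self._snapshots: List[Topology] = []

    def record(self, timestamp: int, topology: Topology) -> None:
        # Assumption: called in increasing timestamp order (e.g. every 300 seconds).
        self._times.append(timestamp)
        self._snapshots.append(topology)

    def at(self, timestamp: int) -> Optional[Topology]:
        # Latest snapshot taken at or before the requested time, if any.
        index = bisect.bisect_right(self._times, timestamp)
        return self._snapshots[index - 1] if index else None


def gone_since(then: Topology, now: Topology, kind: str) -> List[str]:
    """Objects of one kind (e.g. 'pods') that existed then but no longer exist."""
    return sorted(then.get(kind, set()) - now.get(kind, set()))
```

Comparing the noon snapshot with the current one via `gone_since` would list the pods that were present when the problem occurred but have since been scaled away.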
B
You
can
go
back
and
say:
okay,
there
were
four
more
nodes
and
23
more
pods
at
12
o'clock
and
those
were
the
ones
with
problems
and
maybe
not
the
ones
which
are
running
today
and
we're
running
at
this
point.
So
that's
our
idea
of
time
travel,
which
we
think
adds
value
in
a
dynamically
changing
world.
So
this
is
a
quick
summary.
B
Finally,
there's
alerts
right.
These
alerts
are
are
done
in
multiple
ways.
We
generate
alerts
from
logs
and
metrics
and
in
the
traditional
way
we
have
that
too,
and
we
also
generate
alerts
from
our
machine
learning,
which
provides
different
ways
by
which
you
can
see
how
it,
without
setting
thresholds,
we're
able
to
tell
you
that
there
is
something
wrong
in
this
container.
Something
is
off,
it's
not
normal.
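[Editor's illustration] Threshold-free alerting of the kind described above can be approximated by comparing current metric values against a learned baseline and reporting the metrics that deviate most. The sketch below uses a simple per-metric z-score, which is a stand-in technique chosen for illustration; the talk does not say which model OpsCruise uses, and the `top_deviations` function and metric names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def top_deviations(
    history: Dict[str, List[float]],
    current: Dict[str, float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k metrics whose current value is furthest from its past behavior."""
    scores = []
    for name, samples in history.items():
        baseline = mean(samples)
        # Guard against division by zero for metrics that never varied;
        # any change in such a metric then scores very high.
        spread = pstdev(samples) or 1e-9
        scores.append((name, abs(current[name] - baseline) / spread))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]
```

No fixed threshold is configured per metric here; what counts as "off" is defined relative to each metric's own history, and the top-ranked metrics serve as the explanation attached to the alert.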
B
You
know,
and
and
that
available,
for
example,
here
here
is
an
alert
which
has
come
by
automatically
without
any
thresholds,
say
that
something
is
off
here
and
it
also
gives
you
an
explanation,
saying
that
these
three
metrics
have
sort
of
created
a
this
situation,
where
we
think
you're
doing
something
wrong
or
something
off.
Now
we
look
at
a
lot
more
metrics.
We
look
at
about
34
metrics,
just
for
a
container,
and
our
machine
learning
algorithms
B
Look
at
all
of
this,
but
then
identifies
that
because
of
a
combination
of
metrics,
based
on
our
learned
past
behavior,
you're
not
normal,
and
these
are
the
three
things
you
ought
to
look
at.
So
that's
our
ML
model.
The
other
type
of
alert
that
we
do
is,
for
example,
a
chain
model.
We
look
at
the
chain
of.
B
A
response
time
has
taken
place
right
and
our
analysis
system
runs
through
and
looks
at
all
the
containers
that
are
in
the
path
and
and
suitably
you
know,
prepares
the
information
makes
it
available
to
you
saying
there
is
a
problem
that
you
had
a
problem
in
the
SLO
here,
but
we
see
down
the
chain.
You
have
a
problem
in
this
cart
cache
and
that's
got
a
problem,
and
here
this
cart
server
has
a
problem.
B
So
if
I
just
click
on
that-
and
I
look
at
its
its
analysis,
I
get
to
see
more
about
what
that
particular
one
segment
where
that
that
last
piece
in
the
in
the
chain
had
a
problem
right.
So
you
get
to
see
the
whole
chain,
and
this
is
discovered
right.
This
is
discovered
on
the
fly
based
on
the
problem
that
took
place
in
that.
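[Editor's illustration] The chain analysis described above, starting from a service that breached its SLO and following its dependencies down to the components that are themselves unhealthy, can be sketched as a graph walk. This is a minimal sketch under stated assumptions: the `fault_chains` function, the dependency map and the service names are hypothetical, and the set of unhealthy services is assumed to come from some other detector.

```python
from typing import Dict, List, Set


def fault_chains(
    depends_on: Dict[str, List[str]],
    unhealthy: Set[str],
    start: str,
) -> List[List[str]]:
    """Paths from `start` through unhealthy dependencies, ending at the deepest unhealthy one."""
    chains: List[List[str]] = []

    def walk(service: str, path: List[str], seen: Set[str]) -> None:
        # Follow only dependencies that are unhealthy and not already on this path.
        suspects = [
            dep for dep in depends_on.get(service, [])
            if dep in unhealthy and dep not in seen
        ]
        if not suspects:
            chains.append(path)  # nothing deeper is unhealthy: likely origin
            return
        for dep in suspects:
            walk(dep, path + [dep], seen | {dep})

    if start in unhealthy:
        walk(start, [start], {start})
    return chains
```

Even with hundreds of services in `depends_on`, only the unhealthy path is returned, which mirrors the idea of isolating the tree down to the segment that needs attention.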
B
So
you
could
have
hundreds
of
containers,
hundreds
of
pods
running
and
services,
but
it
isolates
the
tree
down
based
on
that
current
situation
and
presents
it
to
you
to
see
where
the
problems
are
and
then
isolate
it
to
to
a
is
action
that
you
take
to
fix
it
now
now
there
are
other
normal
things
like,
for
example.
Let
me
just
give
you
one
more
a
it's
a
replica
count
that
is
not
okay.
B
It
sort
of
figures
that
out
and
also
does
some
level
of
analysis,
and
it
can
give
you
some
information.
There's
a
you
know,
problems
many
things
we
could
live
in
a
from
the
ionosphere
image
tags,
invalid
image
name,
volume
mount
failed,
missing
ConfigMap.
All
this
is
analyzed
and
provided
to
you
as
part
of
our
RCA
process,
all
right.
So
that
comes
to
the
end
of
of
the
quick
demo
that
I
wanted
to
do
today.
Let
me
go
back
to
the
slides.
B
Okay,
so
any
questions?
It's
time
for
questions.
We're
close
to
the
time
here.
So
any
questions
from
anybody.
C
I
think
there
are
some
questions
here,
Sridhar,
around,
you
know,
its
interaction
with
Istio,
as
well
as
any
any
kind
of
overall
system
performance
impact
which
I
think
Sridhar
largely
addressed
as
well,
and
then
I
think,
a
search
option.
Yeah.
B
Let
me
just
I
just
saw
the
questions
now,
it's
cool!
Yes,
let
me
just
answer
the
one
by
one
quickly.
Yes,
we
do
work
with
Istio.
When
Istio's
there,
we
are
able
to
pick
up
flow
data
from
Istio.
We
set
up
certain
configurations
in
Istio
which
allow
us
to
pull
useful
metrics
from
Istio.
Yes,
we
do
work
with
it.
Then,
if
you're
already
running
the
open
source
tools,
there
are
customers
who
already
use
the
open
source
tools
that
we
suggest.
So
we
don't
have
to
install
anything.
B
But
if
you
don't,
we
would
do
that
for
you
and
we
just
connect
your
existing
tools
that
you
are
already
running,
right.
So,
if
you're
running
the
open
source
tools
in
a
cluster,
the
data
movers
or
gateways
use
up
less
than
1 GB.
So
that
sort
of
addresses
that
question
about
the
system
impact
right.
C
Right
yeah,
I
think
with
you
know,
I
don't
see
any
other
questions
here,
but
I
think
just
just
to
add
a
few
things
to
what
Sridhar
was
demoing
and
why,
at
least
from
a
customer
perspective,
my
perspective,
it's
important
is
you
know
those
being
able
to
diagnose?
C
You
know
those
you
know
like
he.
He
showed
what
is
what
is
the
container
doing
right,
showing
faults
and
things
like
that?
Those
are
things
that
my
engineers
spend
a
lot
of
time,
trying
to
figure
out
right.
C
So
there's
a
lot
of
value
if
I
can
find
those
things
quicker
through
a
tool
right
and
that's
where
OpsCruise
really
provides
some
uniqueness
in
being
able
to
find
those
issues
quicker,
as
well
as
to
understand,
you
know
things,
since
we're
in
such
a
dynamic
environment
when
it
comes
to
kubernetes,
it
can
be
hard
to
piece
together
what
happened
in
time
right
so
seeing
what
what
it
was
versus.
C
So
this
is
where
you
know,
OpsCruise
is
really
unique
in
in
observability
overall,
rather
than
trying
to
provide
dashboards
that
help
you
find
root
cause,
but
you
still
have
to
figure
it
out
on
your
own.
It
really
contextualizes
all
that
information
for
you
and
that's
where
it's
it's
really
unique
and
set
apart
from
other
tools,
at
least
in
my
mind.
B
Just
you
can
view
us
as
a
smart
layer
that
sits
on
your
telemetry,
but
leaves
you
with
all
the
options
that
you
want
to
manage
your
telemetry
yourself
right.
So
you
could
be
using
your
data
for
purposes
other
than
just
basic
observability.
It
could
be
for
product
management,
it
could
be
for
customer
and
market.
There's
so
many
reasons
that
you
need
to
use
your
telemetry
that
you
don't
necessarily
have
to
leave
it
with
us
right.
You
can
keep
it
on
your
own
cloud
and
but
we
what
we
take.
C
Yep,
and
it's
also
really
important
from
a
user
perspective
too
right,
because
a
lot
of
observability
tools,
you
know
really
people
lean
on
folks
and
sre
operations
and
things
like
that
to
really
determine
what's
going
on
and
how
to
interpret
the
data.
In
OpsCruise,
you
know,
developers
can
and
operations
they
can
all
play
in
the
same
tool
and
it's
very
easy
to
understand.