From YouTube: Infrastructure Group Conversation (Public Livestream)
A
Okay, hello everyone, welcome. My name is Gerry Lopez and I'm the director of infrastructure. We are doing a live group conversation for infrastructure, and we already have... we had an offline question about continuous delivery, but I'm going to ask Marianne to take it on, since he's the one that provided the detailed answer.
C
CD, CI... if I had missed him in the list of participants, I would have been able to skip this. Okay, the question: are there plans to increase deployments to production from weekly to daily, or even to switch to continuous deployment for GitLab.com? And what are the blockers right now to implementing this change?
A
Good, thanks.
C
So yeah, I wrote a couple of things in there that are interesting. Basically, I would say we can tweak the frequency of deployments right now if we want to; the problem is how this will affect the platform, and how it will affect all the teams involved. I wrote up some examples there and gave you some links.
C
Basically, we need to depend more on automated systems and metrics than on our gut feeling, which is partially what we are doing right now when we are promoting to production. What release managers do when promoting to production is talk with SREs, see what the state of the platform is, get the approval, and then click the button for the deployment to go through. We are working to make sure that we remove as much human interaction as possible, and for that we need multiple stakeholders to contribute: from quality,
C
The
rest
of
the
infrastructure
or
end
development
is
always
so.
There
is
a
lot
of
responsibility
in
development
to
know
exactly
what
they're
merging
and
ensuring
that
when
a
change
is
deployed
is
where
the
work
actually
is
completed
rather
than
when
you
click
that
merge
button.
So
I'll
ask
you
to
ask
any
additional
questions
from
the
comments.
I
wrote
up,
so
I
don't
go
over
them
again.
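To make that concrete, here is a minimal sketch, not GitLab's actual tooling, of the kind of automated gate the answer describes: a pipeline step that consults a platform metric instead of a human before letting a production deployment proceed. The Prometheus URL, query, and threshold are invented for illustration.

```python
# Illustrative sketch only: an automated promotion gate that checks a
# platform health metric in Prometheus before allowing a deployment,
# replacing the "ask an SRE and click the button" step.
# The server URL, query, and threshold are placeholders, not real values.
import sys
import requests

PROM = "http://prometheus.example.com"   # hypothetical metrics server
ERROR_RATE_QUERY = 'sum(rate(http_requests_total{code=~"5.."}[5m]))'
MAX_ERROR_RATE = 0.5                     # requests/sec, made-up threshold

def current_error_rate() -> float:
    # Query the Prometheus HTTP API for the instantaneous error rate.
    r = requests.get(f"{PROM}/api/v1/query", params={"query": ERROR_RATE_QUERY})
    r.raise_for_status()
    result = r.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

if __name__ == "__main__":
    rate = current_error_rate()
    if rate > MAX_ERROR_RATE:
        print(f"error rate {rate:.2f}/s above threshold; blocking promotion")
        sys.exit(1)   # non-zero exit fails the pipeline job
    print(f"error rate {rate:.2f}/s OK; promotion may proceed")
```

Run as a pipeline job, a non-zero exit would block the promotion automatically instead of relying on gut feeling.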
D
Yeah, we want to kind of dogfood metrics in GitLab. GitLab has the ability to display Prometheus metrics, and we want to use that for GitLab.com. How are we thinking of getting there? I see a few roadblocks: we use multiple Prometheus servers, we use Thanos for the long-term retention of metrics, and we have some extra infrastructure, Alertmanager and Grafana, that we might have alerting set up through. So we would then have to have some alerting set up in GitLab. What is a simple way to start?
A
So I specifically don't have the details; if others from the Monitor team are on the call, they can talk about some of these things. But one simple start that we decided to begin with is essentially displaying our uptime on GitLab.com itself, as part of the application. I think the application should have an understanding of whether it's functioning or not, and then be able to display that data.
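As a rough sketch of that idea, assuming nothing about GitLab's actual implementation: an application that tracks whether it is functioning and exposes that as a Prometheus metric it can then display. The metric name, port, and health check below are placeholders.

```python
# Minimal sketch (not GitLab's implementation): an application that
# tracks and exposes its own availability as a Prometheus gauge.
import time
from prometheus_client import Gauge, start_http_server

# 1.0 while the app considers itself healthy, 0.0 otherwise.
UP = Gauge("app_self_reported_up", "Self-reported application health")

def health_check() -> bool:
    # Placeholder: a real check would probe the database, queues, etc.
    return True

if __name__ == "__main__":
    start_http_server(9100)          # metrics served on :9100/metrics
    while True:
        UP.set(1.0 if health_check() else 0.0)
        time.sleep(15)               # refresh on a typical scrape cadence
```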
D
The uptime, that seems super useful, and that's a thing we still struggle to have a single source of truth for. So that seems super useful. But my idea with dogfooding: obviously it's great to use GitLab for that, but it's a whole new project, as we've been trying to get a better understanding of our uptime for over a year now. So it's probably a complex new thing. Is there also some existing metric that we can move, something we already know, so that it's not figuring out the metric that's hard?
F
And I'll add one of the considerations here when we're talking about dogfooding Monitor with respect to metrics and graphs. Our workflow isn't that SREs are sitting with dashboards open, or have, you know, Chrome extensions that rotate from one to the next so they can visually detect an anomaly. But the fact that a lot of our reactive response to alerts has SREs going to dashboards has led us to say, well, we need to keep these in an external, non-GitLab.com location, because GitLab.com the product monitoring GitLab.com is a cyclical, chicken-and-egg dependency. So in the conversations with Monitor, I do think we have consensus generally (I know we're not consensus-driven) that the ops server might be the right place to do this, because a lot of the metrics tend to be a byproduct of the incident management workflow: we have an incident, we have something that we're trying to discern about the status of a system that's not working properly.
D
I think we should be very clear: the goal is not to have some nice metrics on .com that nobody uses in incidents. The goal is to have our core workflow live within GitLab, which is something we're expecting our customers to do with their applications. So having a metric on .com, I don't think, should be in scope for this at all.
D
I
think
it
should
be
very
much
focused
on
the
observer
and
it
should
be
a
replacement
for
the
graph
on
a
metric,
because
if
you
do
things
on
comm-
and
you
say-
hey
okay,
now
we're
going
to
deprecated
this
graph
on
our
dashboard.
It's
like
well,
if
comm
is
down,
where
do
I
look
so
I
think
we
should
be
much
more
crisp
about
this.
F
Okay, one additional consideration that we have is to continue to embody our value of transparency. A lot of our incident communications that are public-facing generally depend on updates and comments to what we call our production issue, or an incident issue. If that workflow moves to ops, then we would have to rely on a different medium to communicate the status of the ongoing incident out to the public.
F
Because the metrics would find their way into the issue: in addition to the static endpoints, there are a lot of integrations where the graphs can be embedded into the issue, and while you're working on the incident it's, here's the snapshot from Prometheus from this time, and then you can pop out and go look at a different dashboard elsewhere. But that would all be done with the issue as the primary record of truth and live state.
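A hedged illustration of that pattern, with an invented Prometheus server, GitLab instance, token, project ID, and issue IID: capture a metric value at incident time and attach the snapshot as a comment on the incident issue.

```python
# Sketch of the embedding workflow described above: pull a Prometheus
# snapshot at incident time and record it as a comment on the incident
# issue. Hostnames, token, IDs, and the query are all placeholders.
import requests

PROM = "http://prometheus.example.com"          # hypothetical server
GITLAB = "https://gitlab.example.com/api/v4"    # hypothetical instance
TOKEN = "glpat-..."                             # placeholder API token

def prometheus_snapshot(query: str, at: str) -> str:
    # Instant query against the Prometheus HTTP API at a fixed time.
    r = requests.get(f"{PROM}/api/v1/query", params={"query": query, "time": at})
    r.raise_for_status()
    results = r.json()["data"]["result"]
    return "\n".join(f"{m['metric']} = {m['value'][1]}" for m in results)

def comment_on_issue(project_id: int, issue_iid: int, body: str) -> None:
    # Post the snapshot as a note on the incident issue.
    r = requests.post(
        f"{GITLAB}/projects/{project_id}/issues/{issue_iid}/notes",
        headers={"PRIVATE-TOKEN": TOKEN},
        data={"body": body},
    )
    r.raise_for_status()

snapshot = prometheus_snapshot('sum(rate(http_requests_total{code="500"}[5m]))',
                               "2020-01-01T00:00:00Z")
comment_on_issue(42, 7, f"Prometheus snapshot at incident time:\n{snapshot}")
```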
D
And that's an interesting question: I don't think it gets any worse. Currently we have private dashboards, because we're using Grafana, and our dashboards on gitlab.net get used a lot more than the dashboards on GitLab.com, so this is not a change. I do think for incidents we have a communications officer, or a person who's designated to do the communication. Is that correct?
A
That's correct! Thank you.
So, okay, since we're having this conversation: we've waded into a number of different aspects about new metrics. One comment that I understood you to be making was that if GitLab.com goes down, then having this uptime on GitLab.com makes no sense. And I think Anthony's point is that we do use production issues as one of the ways in which we communicate. So if the logic for the first one applies, then the same concern is raised there as well.
D
I think it doesn't make sense: if there's a serious incident, and that's one that people are going to hold us accountable for, you can't be using an issue on .com. So it makes sense to move them to the ops server, and instead of linking to the production issue via Twitter, we should link to a live stream of the communication person.
G
That would create extra overhead in the process to communicate publicly. It seems a little bit inefficient: if we already have everything going into issues, we're communicating there, and that's actually the primary place, then having the additional responsibility of somebody reading that out and showing it seems inefficient to me.
C
It's not an extra overhead, because the manager on call responsible for that communication already has to do something like that. They need to ensure that the discussion in the incident is flowing while the engineers are working; sometimes you go down different alleys instead of keeping focused on the challenge at hand. So they already have to do those things: they need to communicate over Twitter, they need to update the public on what's happening. So that overhead should be minimal; it's mostly just that you need to start up a YouTube livestream for the incident.
H
Communication is the most important thing in an incident; I don't know how to stress it enough. Fixing the issue is super important, and figuring out a workaround or how to quickly address the issue is also important, but actually communicating to customers what's going on, and that you're aware of it, is in many instances more important than actually resolving the issue, assuming that you're going to resolve it quickly thereafter.
H
I understand what you're saying, but we're splitting hairs. Generally, what happens in those situations is that the person who's trying to solve the problem spends more time trying to solve the problem than actually communicating, and then people are like, okay, are they really working on it? Or is this not a concern?
F
Sure, and that's why we have defined the roles of the CMOC and the IMOC. For those that aren't aware: we have an incident manager on call, which is the IMOC acronym, and then there's a communications manager on call, which is a specific role that's less involved with actually discovering the underlying cause of, or the solution to, the incident, and only concerned with the communication to internal and external stakeholders, in this case the business and our customers.
F
So I think it's worth splitting hairs in saying that for that second role, the communications manager on call, what they're communicating is simply current status and expectations. Provided those two needs are met for the stakeholders, I think the way that it's communicated out is something that we can iterate on, because, frankly, that person needs an inlet of communication as well, and whether they can feasibly screen-share while also being in another Zoom where they're consuming communication may not be realistic. But I think there's definitely an opportunity to explore there.
D
Let's change that. So, first of all, very importantly: the manager of the incident is not the communication person. This is full-time, as in completely focused on communicating. And it shouldn't be a separate Zoom call; there should not be an incident Zoom and a communication call, there's one Zoom call.
A
So let me try and restate that a little bit. That means having people join, you know, live-streaming the incident call, which we cannot do, because sometimes the information that flows across people's screens is, you know, sensitive. That makes sense? So we stopped doing that a long time ago, and I am NOT going back there. Second, sort of parroting an incident over a call makes no sense; incidents tend to be very chaotic events by their own nature.
A
So
you
would
have
someone
who's
trying
to
read
an
issue
much
like
Dylan
point
it
out
and
then
there's
gonna
be
times
where
we
were
going
down
a
path.
Then
we
realize
that
some
of
the
paths,
then
we
go
back
that
gets
communicated
on
the
incident
call.
So
this
individual
we're
actually
have
to
be
on
two
calls
at
the
same
time,
trying
to
make
sense
of
what's
happening
in
one
filter
in
real-time
passing
things
to
the
other
I
mean
it
would
be
messy,
I
think
having
an
issue
that
people
can
trail
they
control.
the comments, is the right way. Everybody does it that way, and we've been doing it that way for a long time. We tried the whole live Google Docs thing; that crashed in a fantastic way, because lots and lots of people would get onto that document. So I think the issue, plus Twitter, has worked well to keep end users well informed. We fail sometimes at keeping the clock and saying, okay, every 15 minutes we're going to give an update.
D
The concern is that we're going to reveal confidential information. Okay, so is there another way? I'm going to ask an obvious question that I think I know the answer to, but is there a way to open up ops.gitlab.net in such a way that we make a public project for the incidents, where we open up the issue tracker only to the public? Maybe not, maybe that whole server is cordoned off, also to prevent DDoS or something like that. But is there a way to have public issues on that server?
A
Let me ask one quick question before that. If the objective is to have something that is out of band from GitLab.com to provide updates, so people can see the issue and not necessarily participate: what about building something that builds sort of static versions of that issue on a regular basis, and simply, you know, sits in front of it? We could have a static page that points to the right issue at the root, and every minute it just scrapes whatever comments there are and posts them.
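A minimal sketch of that static, out-of-band mirror, with an invented instance URL, project ID, and issue IID: poll the issue's comments through the GitLab notes API once a minute and regenerate a static page that can be served from anywhere.

```python
# Rough sketch of the static mirror floated above: every minute, scrape
# the incident issue's comments from the GitLab API and render them as
# a static page. Instance URL, project ID, and issue IID are placeholders.
import time
import requests

GITLAB = "https://ops.example.com/api/v4"   # hypothetical ops instance
PROJECT_ID = 42                             # placeholder incident project
ISSUE_IID = 7                               # placeholder incident issue

def fetch_comments() -> list:
    # List the issue's notes, oldest first.
    url = f"{GITLAB}/projects/{PROJECT_ID}/issues/{ISSUE_IID}/notes"
    r = requests.get(url, params={"sort": "asc"})
    r.raise_for_status()
    return r.json()

def render(notes: list) -> str:
    # Flatten the comments into a single static HTML page.
    rows = "\n".join(
        f"<p><b>{n['author']['username']}</b> ({n['created_at']}): {n['body']}</p>"
        for n in notes
    )
    return f"<html><body><h1>Incident updates</h1>{rows}</body></html>"

while True:
    with open("index.html", "w") as f:
        f.write(render(fetch_comments()))
    time.sleep(60)   # refresh once a minute
```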