From YouTube: Public - Infrastructure Group Conversation
A: Okay, I hope you can all see that, if I did this right; I've been known not to do this right. So I've split this group conversation in two. I'll talk about reliability and some of the foundational aspects we're focusing on, because I believe those need a little bit more attention, and wider attention from the company. They're important, but because they're foundational they're kind of hidden. Then Marian will talk about delivery, because they're doing a lot of very significant work that is exciting and is moving how we're doing continuous delivery and continuous integration.

So first, welcome to our new hires. We have two new SREs and one DBRE, coming from all over the globe, and we just recently heard that we actually have another hire on the way from the UK. So welcome, everybody.
A: We're very excited to have you here. I wanted to take a little bit of time to talk about the storage nodes, which today are running the ext4 filesystem. ext4 is actually a great filesystem, but I think we can do better, and in discussions with the team we've decided to move to ZFS, which is essentially a volume manager and filesystem in a single tool. It has some advanced capabilities that we would really like, it has a long history, it's well tested, and it supports multiple platforms.
A: A number of people on the team have had long experience with this filesystem. And, taking a page from the security team's approach of security in depth, we want to implement data protection in depth: we want to add as many protection layers to the data that we host as we can. There are four aspects that we're worried about: filesystem corruption; deletions, meaning things we didn't mean to delete and quickly need to recover; and disaster recovery itself.
A: Geo has been working well for us, and the Geo team has been amazing in addressing issues that we've run into, but there are some things where we can add additional layers of protection by using ZFS. And then there's testing, which is something we've been trying to implement more extensively for a while, but it's expensive because you end up having to replicate data, and what we want to do is do so in a smart way. So ZFS gives us a very simple tool that implements a volume manager and a file system.
A: Snapshots...

B: I'm really sorry, but this is terrible. What is going on here? I've run database systems that are ten times the size of GitLab, and we didn't need ZFS. This is ridiculous. These kinds of data protections should happen at the database layer, not the filesystem layer. This is a solution looking for a problem, and I really don't understand it.
B: No, you don't ever run fsck on a database server; you shouldn't. You should fail over to a clean database server. Literally, in the years and years and years that I've been running database servers, I never did an online fsck of a database; it was always done offline, in triage, after we had failed over to a replica. There's no way we should ever be worrying about fsck on a database.
B: Why are we wasting time on this?

A: For instance, we've had data that was deleted and took us hours to recover, because we didn't have local snapshots. That's a seconds-long operation on something that has local snapshots. We also want to replicate terabytes and terabytes of data in a smart way.
A: I don't want to run three environments at three times the size of the production environment; I want to use clones. Once I do a clone, if I expose that clone, people can use actual data at the scale of GitLab.com, and when they're done using it, it's just a very tiny delta from the production data set. These are things that I cannot do with only ext4; these are things that the application cannot do for me.
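(To make the snapshot-and-clone workflow being described concrete, here is a minimal sketch using the standard `zfs snapshot`, `zfs clone`, and `zfs destroy` commands invoked from Python; the pool and dataset names are invented for illustration and are not the real production layout.)

```python
import subprocess

def run(cmd):
    """Run a ZFS command and fail loudly if it errors."""
    subprocess.run(cmd, check=True)

# Hypothetical dataset name; not the real production layout.
DATASET = "tank/gitlab-data"

# 1. A point-in-time snapshot: near-instant, and lets us recover
#    accidentally deleted files without a full restore.
run(["zfs", "snapshot", f"{DATASET}@before-change"])

# 2. A writable clone of that snapshot: it shares blocks with the
#    original, so a test environment only pays for the delta it writes.
run(["zfs", "clone", f"{DATASET}@before-change", "tank/gitlab-data-test"])

# 3. When the test environment is done, destroy the clone; the
#    production dataset is untouched.
run(["zfs", "destroy", "tank/gitlab-data-test"])
```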
C: I'm going to interrupt here a bit. I think there's something in what you say that makes a really good point, but I think we should be kinder to each other when presenting the point. We should definitely talk about this, but saying "this is terrible" without any qualifiers is a way to get into a very heated debate, and we need dispassionate debate.

B: I agree; I apologize.
A: No worries. I know we can be passionate about these things. I suppose I also come from environments where ZFS has actually done wonderful things for us. We wrote the blueprints, we wrote the design, and they've been available for a long time; I'm happy to sit down and talk about it in detail. Again, I'm not saying that this solves everything, or that this is the only solution. I'm saying that we can build multiple layers that help us better protect the data.
B: My concern is opportunity cost: is this work blocking other work? Actually, let me look at the slides and see how many slides mention the Kubernetes migration.
A: No, we still have to do the migration and all that, but again, there are a number of things here. Kubernetes is an important thing and no one's denying that, but if I speak to people who want to do testing against production datasets, I cannot offer that today at even a remotely reasonable cost or performance, because I need to keep copying this data over and over again. So this isn't solving all the problems perfectly; it's solving some problems that I can't solve any other way.
A: Yes, it's taking some resources, because we're making some investments in things like testing environments, and the number one thing that folks have asked for over time is being able to test with production-scale data. And, you know, Geo doesn't do that. Geo gives me a copy, but I can't touch that copy, and if I want to copy that copy, then now I have three copies.
A: With ZFS I only need two, plus whatever little deltas, and I can create a gazillion clones for people to test their apps against in ephemeral environments. You know, my dream is for an engineer to say, "I've built this new feature and I want to test it; here is my big data set", and for that to be super cheap to do. Again, I'm more than happy to have a deeper conversation about why we went down this path and why we decided to invest some resources in this.
A: I'll set up a coffee with you, Ben. All right, on to the next one. Another very foundational thing we've been working on, with Amar and other members of the team, is our services information: it's very imprecise, and there's a lot of tribal knowledge about what things are and how they work. So we've invested time in essentially standardizing how we represent services in terms of metadata. We use this for automation, we use this for monitoring.
A: We use this for figuring out incidents and dependencies, and for some other metadata that we need to carry out our business. So we've developed a service inventory with an API in front of it, and it offers structured data about services. This will allow us not just to do better automation, but also to perform auditing: when services have to meet certain characteristics, we can ask this service for information and decide whether a service is actually in an operational state that we can work with, versus...
A: ...you know, runbooks and handbooks, where we keep a lot of wiki pages and have to go dig through them. So this is up and running, run by Amar; it's called the Service Catalog, I believe, and we helped with that. They put a UI in front of it, which is super nice, because now we can go and ask questions and get very structured answers.
A: So this is a way to do away with tribal knowledge and have it in a form that is consumable by tools. If we eventually find that we need to extend it to external services, the model is simple: it's, you know, just a bunch of attributes. So if we needed to do that, I think it's just a matter of adding the data, and there may be some other attributes we'd need, and that would be fine as well.
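(As an illustration of "just a bunch of attributes", a single catalog entry might look something like the sketch below; every field name here is hypothetical, not the actual Service Catalog schema.)

```python
# Hypothetical service entry; field names are illustrative only.
service_entry = {
    "name": "gitaly",              # canonical, lower-case service name
    "owner": "gitaly-team",        # team that owns the service
    "tier": "storage",             # rough functional grouping
    "depends_on": ["postgres"],    # upstream dependencies
    "runbook": "https://example.com/runbooks/gitaly",  # placeholder URL
}
```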
A: I mean, again, this started as a YAML file that Andrew created, and then Amar expanded it, including the definitions, and then built a thin layer that allows us to ask questions about services, and then they went and gave it a really nifty UI on top of the API. I mean, it's simple, but now I can go and find things out. So when I'm cranking through our budgets, this is a lifesaver. Great, thanks.
A: I'm looking at the doc: on the service information, are we considering tracing? I know we have instrumentation that gives us that. Yes; I don't know whether Andrew is actually wiring these two together, but if experience has shown me anything, it's that once we have this more authoritative catalog of services, more things will consume this data. Andrew and others on the team may be able to answer that way more thoroughly than I can.
A: I don't know that the tracing system would use the service directory directly, but I would imagine it does use, for instance, how we represent a service name, so that we have a single representation of a service name, however you capitalize it, or maybe by ID, and then you can look it up. I know that we're looking at using this with our budgets, for instance. So instead of me saying "this is Gitaly" and capitalizing the G or not, we say: that is the thing that had an issue, and whatever its name is, I can find it later. And if we ever change a name, or in the case where, for instance, we decide to break up services for some reason, by using the centralized service directory we can do those things and the tools will consume the data. We're not hard-coding service data everywhere.
A: And there's the UI that the team built for it. Essentially, that was the first thing they built, so that we could actually play and interact with it, and then they built an API so that we can consume the JSON that the API calls return.
A: So if I'm writing a tool that requires the name of a service, or its attributes, for instance, then it makes those API calls against the service to get the information it needs. Because maybe, okay, I know from an incident that it was caused by a particular service, and then I need to do attribution to the team: instead of hard-coding that in a spreadsheet or everywhere, we just know that that team owns that service. The great thing about that, too, is that let's say a service moves teams, so some other team takes over a specific service: that team needs to update the directory, and so when I ask that same question three months later, I get the right answer.
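(A rough sketch of the kind of tool being described: asking the catalog which team owns a service, so that incident attribution isn't hard-coded anywhere. The catalog URL, endpoint path, and JSON field names are assumptions, not the real API.)

```python
import requests

# Hypothetical catalog location; the real API may differ.
CATALOG_URL = "https://catalog.example.com/api/services"

def owning_team(service_name: str) -> str:
    """Ask the service catalog which team owns a service."""
    resp = requests.get(f"{CATALOG_URL}/{service_name}")
    resp.raise_for_status()
    # Assumed field name; the actual JSON schema may differ.
    return resp.json()["owner"]

# Attribute an incident to whoever owns the service today,
# not whoever owned it when a spreadsheet was last updated.
print(owning_team("gitaly"))
```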
A: This is less concerned with tracing in terms of following a request. Think of it more as: if the trace says I hit service X and then service Y, and I need more metadata about service X and service Y, then I get it from the service directory. So this is very contextual, right? If I need to set up, let's say, some dependency monitoring, and I say service A depends on service B, then if service B is having an issue, don't page me about service A; just let me know that service B is broken. So we can build some of these functions with the directory. Right now the directory is very simple: we really want it mostly to capture the breakdown of the services to begin with, and who the owners are, and then there are a bunch of other attributes that the team decided we needed for service automation. But we'll continue to iterate on this.
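(A minimal sketch of that dependency logic, assuming the dependency data comes from the catalog; the service names and data shape are made up.)

```python
# Hypothetical dependency data as it might come from the catalog.
depends_on = {
    "service-a": ["service-b"],
    "service-b": [],
}

def should_page(service, firing):
    """Page for a service only if none of its upstream
    dependencies are already known to be broken."""
    return not any(dep in firing for dep in depends_on.get(service, []))

firing = {"service-a", "service-b"}
for svc in sorted(firing):
    if should_page(svc, firing):
        print(f"page on-call for {svc}")
    else:
        print(f"suppress {svc}: an upstream dependency is already firing")
```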
A: Robert asks: how are we minimizing the risk of ZFS not having direct Linux kernel support, and has that changed in the last year? I believe it has, because I believe they now actually ship ZFS with the kernel. I know we had some conversations about the legal aspects of this, and we decided that yes, we could do it. I don't know the details; I believe they're in the group, and if not, I can definitely get the details.
A: Then: what is our Kubernetes plan, what is the timeline, what is the holdup? Kubernetes is being worked on, and I think we're shipping two services on Kubernetes this quarter, so it is in progress. It's one of our key results, so it's being worked on, but if we want more details about it, Dave would be the person to talk to, because right now I don't have all the details. But we're working on it this quarter.
B: For example, I spent the better part of two weeks simply trying to scale up the number of web workers, because the process required a significant amount of overhead just to say "I'd like some more web workers and some more web nodes", and getting that spun up and available was not something I could do easily with the current infrastructure. On Kubernetes it would be a significantly simpler change to say "make me some more pods", and they would appear, right?
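(For contrast, a sketch of what that scale-up could look like on Kubernetes: a single `kubectl scale` call against a Deployment. The deployment name, namespace, and replica count are placeholders.)

```python
import subprocess

# Hypothetical deployment name and namespace.
subprocess.run(
    [
        "kubectl", "scale", "deployment/web-workers",
        "--namespace", "gitlab",
        "--replicas=12",  # ask for more web pods; the scheduler does the rest
    ],
    check=True,
)
```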
B: No, not yet. We've been working on getting the data storage transferred off the local disks and into Thanos storage. I actually just completed one step of stepping down the production data retention, so that the Prometheus servers only need to store a week of state in order to operate, and we'll be stepping that down further to 24 hours of state locally within each Prometheus server, and that will allow us to migrate those to Kubernetes pretty easily.
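(The retention step-down described here corresponds to Prometheus's local TSDB retention flag; below is a sketch of launching a server with 24 hours of local state, with the paths as placeholders and long-term history assumed to live in remote storage.)

```python
import subprocess

# Placeholder paths; real config lives elsewhere.
subprocess.run(
    [
        "prometheus",
        "--config.file=/etc/prometheus/prometheus.yml",
        "--storage.tsdb.path=/var/lib/prometheus",
        # Keep only 24h of state on the node itself so it can be
        # rescheduled (e.g. onto Kubernetes) without a huge local disk.
        "--storage.tsdb.retention.time=24h",
    ],
    check=True,
)
```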
F: Cool; that, obviously, is impacting our costs as well. I'll just quickly run through the single codebase effort, where we're making quite a bit of progress. The graph of the diff between CE and EE is trending down, and we're somewhere around where we were in 2015: the diff between CE and EE is the same now as it was then.
F: We introduced new environments where we're going to be testing some of our new tooling (I'm just checking the time), and we are on our way with automated weekly deploys. Whether that's done with Kubernetes or with something else doesn't really matter at this point; it will very much matter as we start speeding up this deployment process. So we need to change some processes around development, as well as how things end up in production. And finally, just a plug there...