From YouTube: Project Operations 2019-08-19
A: Stephen has enough credits; he's done. Welcome to the IPFS Bifrost, gateway-as-a-service product. Ethos: let's run IPFS really well for lots of users. Regular weekly call. On the agenda today we have the OKR check-in; we don't need to spend too much time on that. Let me just share my screen.
B: Terraform has a different view of the world and the config than nginx, or than Ansible, and so when building nginx it writes an invalid config file. That config file fails when you try and restart the daemon; that causes user data to fail; that causes Terraform to fail, and, like, the box is just in a sad state. So you go massage it and get things sorted. Can we... let's.
B: So it's... it is not able to deploy the certs that the nginx config file mentions, and so you can't restart the service. So we partly migrated off of it; we just left Terraform in its sad state. Should not have let it be there. It would be nice if we could wire up a nightly to catch that a little earlier, but I'm midway through fixing it and should have the PR for that up within the hour. It's just deleting stuff from Terraform, basically.

A: Nice.
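One way that nightly could look, as a sketch (the job wiring and paths are hypothetical, not what ends up in the PR): `terraform plan -detailed-exitcode` exits 2 when live infrastructure has drifted from the config, which is exactly the sad state described above.

```sh
#!/usr/bin/env sh
# Hypothetical nightly drift check; run from the terraform directory.
terraform init -input=false >/dev/null
terraform plan -detailed-exitcode -input=false >/dev/null
rc=$?
# Exit codes: 0 = no changes, 1 = error, 2 = config/infrastructure drift.
if [ "$rc" -eq 2 ]; then
  echo "terraform drift detected" >&2
  exit 1
elif [ "$rc" -ne 0 ]; then
  echo "terraform plan failed" >&2
  exit 1
fi
echo "no drift"
```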
A: I've raised the issue for it, but I was wondering if you guys have had any thoughts about it, to make sure that my assessment is correct: we're relying on BIRD BGP and DNS to do a kind of passive failover when a machine completely dies. Like, the BIRD daemon has to fail for the machine to be taken out, and then there's still a kind of latency of a few minutes for the DNS to stop routing traffic. Is that correct?

C: Yep.
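As a concrete picture of that failover path (a sketch using the standard BIRD and systemd commands; the service name on our boxes may differ): the route is only withdrawn when bird itself stops, so taking a machine out deliberately looks like this, followed by the few-minute DNS drain.

```sh
# Inspect the BGP session state BIRD is maintaining:
birdc show protocols
# Withdraw this machine's route by stopping the daemon; DNS-routed
# traffic still takes a few minutes to drain afterwards.
systemctl stop bird
```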
C: Yeah, yes. It would potentially be a priority to deliver the redesign, as in have the nginx and BIRD daemons coupled on one machine, and then I think some people have suggested having nodes upstream of those, so that we can at least have, like, some sort of backend checks and failovers. So I think right now, yeah, the idea that BIRD is our only health check is a big flaw and probably contributes to a lot of our issues.

A: Cool.
A: At this point, what would a proper health check look like? My suggestion was, like, an IPFS get request for the empty directory, which seems pretty light. Stephen, is that a reasonable health check at this point? Do we have anything more satisfactory, or is that a good one?
Sorry, what was that? Okay: we're talking about useful health checks for whether an IPFS daemon is in a functional state, so that nginx can stop routing traffic to a stuck IPFS process, and...
D: Due to the layers of caching everywhere, I think you'd actually just, like, force it to go through Bitswap occasionally (yep), and, like, time an add on one machine and the fetch on the other machine. Though if you just want to test "is this thing responding", that's the lighter check.
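A minimal sketch of that lighter probe, assuming the daemon's local gateway is on the default 127.0.0.1:8080; the CID is the well-known empty unixfs directory, so it exercises the daemon without touching Bitswap. An occasional fetch of a CID that is not locally cached would be the Bitswap-forcing variant mentioned above.

```sh
# Is the local ipfs daemon responding at all? The empty directory is
# nearly free to serve, so this only proves basic liveness.
curl -sf --max-time 10 \
  "http://127.0.0.1:8080/ipfs/QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn/" \
  >/dev/null || echo "daemon unresponsive" >&2
```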
A: At the level of: we want it to serve get requests for data. So this is on a gateway node; it's fronted by nginx; it's high traffic; and we keep seeing IPFS daemons getting kind of intermittently frozen up or unresponsive. Or maybe they are responsive, I think... yeah, we don't have the answer on the failure mode.
D: Okay, isn't, like, the first question: have they frozen up and stopped serving data that they have, or are they just not serving things [they'd need to fetch]? Exactly; do we have a way to get that level of granularity? I'm assuming "frozen up" means not serving even things they have locally cached, which means, yeah, the test is that you find the machine, you add a file [on another node], and you try to fetch it on the first machine...
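A sketch of that two-machine test (hostnames and paths hypothetical): add fresh random data on a healthy node, then fetch it through the suspect one, which forces a Bitswap round trip rather than a local cache hit.

```sh
# Add ~1 KiB of fresh data on a healthy gateway; -q prints only the hash.
CID=$(ssh gateway-b 'head -c 1024 /dev/urandom > /tmp/probe && ipfs add -q /tmp/probe')
# Fetch it on the suspect machine with a hard timeout.
ssh gateway-a "ipfs --timeout=30s cat $CID >/dev/null" \
  && echo "gateway-a can serve uncached data" \
  || echo "gateway-a is stuck" >&2
```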
D: ...piping data through; otherwise there's, like, not that much to it. So, sorry, we don't really even have that many locks. Okay, so there is one potential problem here: maybe you run out of file descriptors somewhere, or nginx refuses to open more than a certain number of connections, something like that. Then it's, like, literally not responding to connections, and that's a big issue.
A: That was something that I wondered about, for sure: whether we can just use some garbage collection tuning. It seems like it doesn't run that frequently, and then, when it does run, it just totally freezes up the box. We got an out-of-memory error on the AMS box, so it could be due to that kind of thing, right? If the GC queue gets too large or everything... yeah, yeah, well...
D: It's not that GC is a piece of crap; the way GC works is it locks the entire datastore, which means everything else has to wait while it walks. Although in this case, this is a funky case... is there anything actually pinned on these gateway machines or not?

A: There is now.

D: Okay, so that would make GC slightly slower. GC first has to walk the pinned data, and then walk the datastore and figure out what's not in that pinned set. All right, just...
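A quick way to gauge what that walk costs on a live box (a sketch; numbers will vary per gateway):

```sh
# How many recursive pins does GC have to walk before sweeping?
ipfs pin ls --type=recursive | wc -l
# Time a manual collection; the datastore is locked while this runs.
time ipfs repo gc >/dev/null
```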
A: The thing that's changed on the gateways recently: because we're seeing slow response times for content that is on cluster, even though we're trying to maintain connections to cluster (we regularly see just slow response times), we have deliberately pinned all of the PL websites and the Filecoin proof parameters to the gateways. In total that's about 20 gigabytes of content, including dist.ipfs.io, which is the lion's share of it.
D: It reads the pinned set into a "do not delete" cache; then, once it's done that, it walks through everything in the datastore, but it just reads the hashes ahead and deletes everything that's not within the set of things it's already read. So if there's nothing to keep, it's a clean sweep; if you have a few things to keep, it's still much faster than actually reading all the data. But the second users pin a lot of things, GC will slow down.
D: We haven't really followed the package-managers use case, because there you actually just need to be able to add and retrieve files quickly; they don't really need to GC that much. It's somewhat important, but, like, actually a lot of actors want to keep old packages around. It's people running IPFS infrastructure who do care about GC and all these, like, bigger system issues.

A: Yep.
A: Anywho, okay, let's put a pin in that. So the outcome of that is: because we are now pinning things on the gateways, GC is now costing us more in terms of RAM and CPU time when it does a GC. So we may trigger GC more proactively, so that it's not operating on a full repo. But wouldn't we just shrink down the high-water mark, like the max storage value? Rather than triggering it ourselves, just give it a smaller max; that'll...
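The knobs in question, assuming a stock go-ipfs config (automatic GC also requires the daemon to run with --enable-gc); the sizes here are just illustrative:

```sh
# Shrink the high-water mark instead of triggering GC by hand:
ipfs config Datastore.StorageMax "40GB"
# GC kicks in when usage reaches this percentage of StorageMax:
ipfs config --json Datastore.StorageGCWatermark 90
# How often the periodic GC check runs:
ipfs config Datastore.GCPeriod "1h"
```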
C: They were seeing the download start, but then it would taper off to, like, you know, zero bytes being transferred; people would cancel, so they're getting frustrated. So they decided to try using ipfs get, so they would be able to download the proof parameters without involving the gateway. And what was interesting about that: those that got it working (there were some problems, but those that got it working) still had major problems with downloading the proof parameters. So some people reported that a 1.5 gigabyte file
would take two and a half hours or more, and they would see the same behavior of the download tapering off and then starting again. So this is a little bit of anecdotal evidence that some of the issues we've been facing on the gateway are potentially not unique to the gateway and are more global to the IPFS network. So just some context: some of the things that we've been seeing aren't necessarily remedied by using a direct connection or by bypassing the gateway.

A: Yep.
A: My understanding of this is that we've got patches onto the gateway nodes and we have the DHT boosters, but correct me if I'm wrong, Stephen: we sort of know that there are still performance issues with the DHT, really.

C: Yes, yeah.

A: So we know for sure that, like, we are not out of the woods; we're not even...
C: Just wanted to say that my main area of interest from Petra's planning document seems to be around service levels and indicators, and I just want to point out that I'm still quite blocked on making any progress on that, because we still don't have a lot of great metrics coming in. So, like, one blocker is definitely the log exporter thing. I just want to make sure that that's still a prioritized thing, because I'd like to be able to have a success-rate indicator for the gateway, but currently that's impossible with the metadata we have.
C: Because the only metadata, or, like, labels, that are associated with those status codes... there's barely any. So you can see that we have, say, for example, a 499 code, but there's no other information that is relevant to it. So we don't have, like, what the request path is, the referrer, any of that sort of stuff.
C: So the 499s are a big potential source of, like, "people are having problems", but because it could be anything, we have no way of trying to make sense of what those 499s mean without some more relevant labels attached to them. So the log exporter would export all the fields that we currently see in our logs in Kibana, which would give us a lot more options, so we'd be able to, like, dig into and make sense of the 499s, which is critical for our success-rate indicator. Thank you.
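Until that exporter exists, a rough interim slice of the 499s straight off the access log, assuming nginx's default combined log format (field 7 is the request path, field 9 the status):

```sh
# Count 499s (client closed the connection) by request path:
awk '$9 == 499 { print $7 }' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head
```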
C: Exactly, and, as I say, that also blocks incident response, being able to declare an incident. I made a comment on that issue, and I think, yeah, that's true: we only have black-box metrics still, and that's, like, extremely scary. You want to have some indicators that are actually pointing to some, you know, precise data that we have. Okay.
A: ...a no-op, but it does pass a weight through: when you do a call to connect, it passes through this kind of score, this, like, fixed, hard-coded 100 score, that it passes through to the connection manager. And what I'm not clear about is what happens inside the connection manager with that score: does it then reset the score back to 100 if you...
D: It just gets set, so it doesn't matter.

A: Gotcha. So even if I call it continually: the tag is "user-connect" and the value is 100, and if I keep calling that it's always 100; it can never be anything other than 100.

D: Yes.

A: Okay, good to know. All right, we're out of time; we're four minutes over. If anyone has any questions you can ask now, but time's up. Alright, there's so much more to talk about; these meetings are too short at 30 minutes; it keeps us from... See you later.