From YouTube: Infrastructure Orchestration from the Ground Up
Description
Slides from the workshop: https://docs.google.com/presentation/d/1VJ09Bz4NQ6ud3XIH_0RMWrC3THqt1A-iZhoKgjo-FN8/edit#slide=id.p
The Graph is an indexing protocol for querying networks like Ethereum and IPFS. Anyone can build and publish open APIs, called subgraphs, making data easily accessible.
Follow The Graph on social media
Twitter: https://twitter.com/graphprotocol?s=20
Instagram: https://instagram.com/graphprotocol
LinkedIn: https://www.linkedin.com/company/thegraph/
GitHub: https://github.com/graphprotocol
Website: https://thegraph.com
A: Welcome, everyone, to another MIPs workshop. Today we'll be talking about infrastructure orchestration from the ground up. Overall this is quite a big and complicated workshop — the content is quite large. So, next slide, please.

A: On Tuesday we talked about indexer best practices, and most of that had to do with software configuration and the specifics of the application stack. In today's workshop we'll try to cover all aspects of setting up your indexer: from picking hardware, to spinning it up, to picking storage, all the way to day-to-day operations.
A: I do want to note that this is one perspective among many — drawn from research into the many different options out there, from running an indexer, and also from the community. I also want to welcome anyone to challenge any of the information we'll be talking about, and feel free to jump in with questions at any point.
A: Our agenda follows the structure of what you should think about when navigating a sea of options in a very complex environment. We'll start with a very quick review of the indexer software stack — this is something you've seen before; we'll just remind you, so that you know what we're talking about. Then we'll talk a bit about hardware, the decisions to make, and how to pick that hardware, and then about understanding the different types of blockchain nodes.

A: We'll then cover the importance of storage when it comes to indexing, and then move on to orchestration: you've picked your hardware and your storage, and now you need to make things happen. Here we'll talk about declarative versus imperative infrastructure. We'll also briefly touch on running containers versus self-compiling, then look at Docker versus Kubernetes, then quickly cover the tools available and give a quick intro to SRE and what that means. At the end we'll have more time for Q&A. Next slide, please. Okay — quick review of the indexer software stack.
A: Let's quickly remind you how this diagram works. As the indexer operator — the guy with the sunglasses emoji — you first have to decide which subgraphs you'll be indexing. Once you've decided that, to interact with the orchestrated indexer stack you use the indexer CLI to instruct the indexer agent on which subgraphs to allocate to, as well as their cost models; these details and metadata are stored in the Postgres indexer database.

A: The index node then stores indexed data in a PostgreSQL subgraph database. At this point the subgraph data is indexed and available for querying. In the indexer infrastructure block of this diagram you can also see the different software components that you'd be expected to run. Now, on the next slide — next slide, please — you can see which of those software components are stateful, that is, require persistence: the blockchain node and the PostgreSQL databases.

A: This is important because later on we'll talk about storage, and this is where storage comes into play. Next slide, please.
A: Then, in order to scale — to be able to maintain all of those chains — you need the stateless pieces of your application: graph-node for the index node, graph-node for the query node, and the indexer service. You need to scale those as well to sustain the capacity of multiple chains, and you would ideally scale them horizontally. Next slide, please.

A: As for the indexer agent, you will only ever need one instance of it, even when you are indexing multiple chains and as your stack grows. So, next slide.
A: Now that we've reminded you what the actual stack looks like, let's talk a bit about hardware and the decisions to make, based on the software stack that's available. Next, please. Okay — here we'll be talking about the cost of doing business.

A: The cost of doing business, because running an indexing operation is running a business, and anyone who has run a business at C-suite level will know the terms capex and opex. Capex, or capital expenditure, refers to investment in the actual physical resources behind the technology — for example installing servers, network pipes, or hardware, in the old, traditional IT world. Opex, operating expenses, are the expenses of running the day-to-day business. Think of opex as how cloud computing services are procured, and capex as the standard model for traditional IT procurement — building a data center where you own the servers. And, as Chris reminds us, capex is generally a one-off that happens at the beginning of the business.
A: Okay. Here we have a diagram of the hardware options with initial versus ongoing cost. If you decide to pick self-owned bare-metal servers, those have a really high capex, because you buy them upfront — they'll cost you a certain amount of money — and then every month from there on your only costs are things like electricity and rent, if you pay rent where you are. So: very high capex (initial cost) and very low opex (ongoing monthly cost).
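To make the capex/opex trade-off concrete, here is a minimal break-even sketch. All of the prices are made-up, illustrative numbers, not quotes from any real provider:

```shell
# Hypothetical figures for illustration only.
capex=6000      # one-off cost of buying the server outright
own_opex=80     # monthly electricity + space when self-owned
rent_opex=380   # monthly cost of renting a comparable dedicated server

# Months until the upfront purchase beats renting.
months=$(( capex / (rent_opex - own_opex) ))
echo "break-even after ~${months} months"   # prints: break-even after ~20 months
```

After that point the self-owned machine keeps winning on monthly cost — but weigh that against the depreciation and hardware-failure risk discussed under self-hosted servers.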
A
If
for
looking
at
bare
metal
cloud
and
by
bare
metal
cloud
here,
I
specifically
mean
servers
that
you
are
renting
from
a
provider.
Think
of
there
are
so
many
providers
out
there
digital
ocean.
A
I
think
someone
was
pointing
out
to
vulture
or
something
like
that
just
yesterday,
so
there
are
many
options
out
there.
So
for
that
option
for
bare
metal
cloud
where
you
are
just
granting
servers
on
a
monthly
basis,
you
have
no
capex
to
begin
with,
but
your
opex
is
will
be
fairly
high,
a
lot
a
lot
higher
than
than
the
opex
for
cellphone
bare
metal,
and
we
specifically
didn't
put
any
numbers
here,
because
the
cost
variables
depend
on
provider
depend
on
the
location.
A
Your
servers
are
hosted
in
and
on
the
hardware
spec
as
well,
and
the
last
option
is
public
cloud
which
refers
to
refers
to
managed
usually
virtual
machines
somewhere
in
a
data
center,
and
it
has
its
advantages,
we'll
cover
them
in
a
bit,
but
that
test
stands
to
have
a
higher
cost
higher
opex
cost
than.
A
A
So let's talk about managed public cloud — your GCloud, your AWS, and so on. Some of the pros: you're not exposed to hardware failure. You don't have to worry if a server fails; these platforms are usually set up in highly available ways, so even if there is a failure you don't have to fix it — someone else fixes it for you, and it resolves fairly fast. Managed public clouds usually offer an amazingly seamless user experience.

They have automated live migration of your VMs between physical servers. They provide strong SLAs — service-level agreements — for uptime and support, which will matter when we talk about latency. They're low-maintenance from the user's point of view, very easy and quick to provision, and most of the big cloud providers have really good documentation.
A: Some of the cons are limited capability and flexibility: because these are managed servers, you don't have the same level of customizability as you would with, say, a self-hosted server or a server rented from another provider. They also have very high opex — they tend to cost quite a bit more than the alternative, which is renting servers.

A: I'm just going to close the chat, because it's distracting me. I'll pause in a bit and we can go through some of the questions. Let's go to the next slide, please.
A: Okay, so then: bare-metal cloud. This is where we talk about rented dedicated servers, and there are many providers out there — I'll try not to name them, so that we don't influence people. Some of the pros: they are highly customizable, a lot more so than managed public cloud.

Again, they have no initial capex, and low opex — though the opex varies from provider to provider. You get remote control over the physical hardware: while you can't physically go and turn the server on and off, you do have remote control over it via SSH or other means. They're fairly easy to provision — not as easy as public cloud, but close.

And they're better value for money. In terms of cons: they have higher maintenance than public cloud, you're in charge of the security, and you're exposed to the risk of hardware failure. Let's go to the next slide, please. Okay — the last option here, in terms of picking hardware, is self-hosted servers. Self-hosted servers are highly customizable.
A: They have the lowest opex: as I said, once you decide to go self-hosted you only worry about electricity and rent. On the cons side, they require a lot of expertise to set up and a lot of knowledge to operate on an ongoing basis. You're also in charge of security, networking, power, and redundancy. A very important thing to mention: self-hosted servers depreciate over time, they have a high capex to begin with, and you're exposed to the risk and cost of hardware failures.

A: I'm going to stop here really quick — Chris, I see you're typing. Do you have anything else to mention that I haven't said?
C: No, no, I think you've covered it well — sorry for distracting you with all the chat messages. I just mentioned that there is one more option, co-location, which is kind of a combination of this slide (self-hosted servers) and bare-metal cloud: you basically buy the server, but then you rent space in a third-party data center to host it. So you buy the server but rent everything else — the power, the space, the networking, everything.

A: Yes, thank you. And someone else mentioned data-center connectivity — I'm sorry, I'm not going to try to pronounce your name, because I won't do it justice, but thank you for your contribution as well. Next slide, please.
A: Okay, now that we've spoken a bit about hardware, let's talk about understanding the different blockchain nodes, because there can be some confusion there and it's worth quickly covering. Keep in mind this diagram — this is where the blockchain node sits. Next slide, please.

A: First there is the concept of a light node, which is only useful for sending transactions. That means it's not useful for indexing: with a light node you cannot index any type of subgraph at all.
A
Then
we
have
a
full
node
which
keeps
all
log
history
but
prunes
everything
else
that
this
historical
state
and
other
data,
so
you
can
use
a
full
node
to
send
transaction
and
to
sub
graph
and
to
index
subgraphs
that
only
consume
events
and
don't
need
to
access
historical
data.
Then
we
have
archival
nodes,
the
archival
nodes.
Let
you
send
transaction,
let
you
sell
index
sub
graphs
that
only
consume
events,
but
also
index
sub
graphs
that
query
chain
state,
so
they
keep
everything
a
full
node
would
keep
plus
historical
state.
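One practical way to see the difference is to ask an endpoint for state at a very early block. This is a sketch — the URL and the zero address are placeholders — but on a pruned full node a request like this typically fails with a missing-state error, while an archive node returns the historical balance:

```shell
# Placeholder endpoint; point this at your own node's JSON-RPC port.
curl -s -X POST http://localhost:8545 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"eth_getBalance","params":["0x0000000000000000000000000000000000000000","0x1"]}'
```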
C: I guess maybe the distinction worth making is that we're talking about EVM nodes — so that's Ethereum, Gnosis Chain, Optimism, Arbitrum, and many of the networks that will come to the MIPs program.

A: Yes, thank you very much. For the next question: does this apply to Gnosis Chain? Yes. For MIPs, what kind of Eth node will we need? For MIPs it depends on the kind of subgraphs you'll be querying — Chris?
C: Yeah — the bare minimum to be able to index a subgraph is a full node, but if you'd like to seriously compete, you probably want to be able to index whatever subgraphs have lots of curation or are receiving many queries, and in that case you'll want an archival trace node, to be able to index everything.
A: Moving on to hosted RPC services. The pros: it's really easy and quick to set up — you don't need any infrastructure expertise around setting up servers, deploying applications, and so on. The caveat is that you still need to run the other components: you're buying a hosted RPC service to replace only the blockchain node in the diagram we showed earlier. Another pro: you don't have to worry about the security of the blockchain node. In terms of cons: very, very high opex — the monthly operating expense is high.

Then, you're likely to have latency issues between the graph-node location and the blockchain node location. To put that into perspective: cumulative latency can be up to 100 times lower when graph-node and the blockchain node are on the same server, compared to using an RPC service — and, as we said in the previous presentation, latency is very important.
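As a back-of-the-envelope illustration of why that matters (the per-call latencies and the call count here are assumptions for illustration, not measurements): graph-node makes a huge number of sequential RPC calls while syncing, so per-call latency multiplies.

```shell
calls=100000      # assumed number of RPC calls to sync a subgraph
local_us=300      # assumed ~0.3 ms per call, node on the same server
remote_us=30000   # assumed ~30 ms per call, remote hosted RPC

echo "same server: $(( calls * local_us  / 1000000 )) s cumulative"
echo "hosted RPC:  $(( calls * remote_us / 1000000 )) s cumulative"
echo "ratio: $(( remote_us / local_us ))x"
```

With these made-up numbers the same sync work takes 30 seconds against a local node versus 50 minutes against a remote one — the "100 times" ratio quoted above.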
A: Another point to make: on many providers, archival state and trace data are premium add-ons — not on all of them, but on most. I had a look this morning with Chris and, for example, QuickNode has...

A: So with a hosted RPC service you're not contributing to the bigger ecosystem, but at the same time, if you seriously have the cash to spend on a hosted RPC service and you want to get started, you might look at it. But the best approach is to run your own nodes — and save money at the same time. You may also need to use different providers for different chains: for example, Infura and Alchemy do not support Gnosis Chain. So not every chain is supported by every provider.
C: Thanks, Ana — next slide, please. I'll also note there are a few people on this call who have far greater storage expertise than me, so please do chip in; this is the only slide we have, so there's definitely a bit of time to discuss other perspectives when it comes to storage. We've already seen from the diagram that there are three components in particular that are stateful and where storage matters. Two of those components are particularly performance-sensitive: the blockchain node, which synchronizes with the chain and keeps up to date with all of the latest blocks, and the Postgres database that holds the subgraph data. For both of these use cases, high-speed physical storage media is really high-leverage — it makes a very big impact — and generally storage is one of the most important and largest cost components of running the infrastructure, so it's worth thinking about.

For blockchain nodes in particular, any storage solution that offers high random-IOPS performance is going to be particularly well suited, because as blockchains sync they do a lot of random state accesses. So yeah — random I/O performance is important.
C: Similarly, when subgraphs are indexing, they're constantly writing to the database, and at the same time you're issuing queries against the database as queries come in from the gateway. Again — and just to recap, I'm talking now about the Postgres DB storage that actually holds the subgraph data — this is an area where high-performance storage has a high impact: it lets you both index subgraphs faster and answer queries faster. So that's the physical media. I see XFS is mentioned in the chat — it's a file system I've personally never used; please do check it out. And actually, yeah — I'm hesitant to try saying your name for fear of butchering it.
C: But if you're open to saying a few words about XFS, please do — I'll just get to the end of the next section first. Other than the physical media, the other thing that matters is how you actually use these devices, particularly for some of the larger chains. For Gnosis Chain, for example, a full archival trace node is, I think, on the order of three to five terabytes, depending on whether you're using compression and things like that. So you may not have a single physical disk that can hold all of that state — and with a single disk you obviously have a single point of failure.

So generally it makes sense to use some technology to aggregate individual physical disks into a larger volume, and at the same time get some other features around redundancy — the ability to tolerate a drive failure. RAID or LVM are great beginner options for that, but ZFS is also a fantastic option that comes with some other benefits as well.
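For a flavor of what that looks like with ZFS — the device and pool names are placeholders, and you'd want to experiment on scratch disks first:

```shell
# Mirror two NVMe drives into a pool named "tank" (placeholder devices).
zpool create tank mirror /dev/nvme0n1 /dev/nvme1n1

# Alternatively, raidz2 across six drives survives any two of them failing:
#   zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

zfs set compression=lz4 tank       # cheap compression, discussed below
zfs snapshot tank@pre-upgrade      # instant point-in-time snapshot
```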
C: LVM and ZFS both offer quality-of-life features like snapshotting that are definitely worthwhile, particularly when it comes to the operational tasks associated with your indexer — taking backups, moving nodes between machines, things like that. Having those features really does make a difference. And yes, we do actually have a couple of people running Btrfs in the ecosystem — Cohen is one of them; I don't know if he's on the call. At GraphOps we're using ZFS, and I know a few people use it quite successfully.
A: And Vince from Notify said that Btrfs leaves performance to be desired. Vince, would you like to say a few words, if you've used it?

C: He can't speak — no worries; thanks for sharing your perspectives anyway, Vince. Not to put people on the spot — it's totally okay if you can't speak — but if Jim has any thoughts... Jim's a storage master.
B: That's not true — I'm not a storage master; I'd say I'm a storage enthusiast, maybe. Can you hear me okay?

B: You know, the one thing I think is interesting about ZFS is all the features that come with it — I think you mentioned that, Chris, but it really can't be overstated how often features like snapshotting, and the fact that it's a copy-on-write file system, save you. If things go really wrong — some kind of really awful thing happens on your server — ZFS generally has your back, because it's copy-on-write.
B: If you were to lose power suddenly, the likelihood of actually corrupting something in the pool without it auto-correcting is super, super low. And then of course there's snapshotting, and compression — compression is free space and free performance on ZFS; just enabling it, even with uncompressible data, can improve your performance and IOPS quite considerably. But as with all things, ZFS is not magical, right?

B: It doesn't really matter what type of low-level file system you decide to go for — there's always going to be that trilemma. There are three things you're planning for, trying to architect for. The first is capacity: how much usable space will I have on my arrays?
B
How
performance
is
my
array?
Is
it
you
know?
Is
it
suitable
for
a
really
high
read
read
streaming
intensity?
Is
it
is
it
performant
for
random
rewrite?
Is
it
going
to
match
the
types
of
workload
that
I
need
to
do
with
my
stack
right
and
then
the
last
one,
of
course
is
integrity?
B
How
much
of
your
hardware,
your
very
expensive
micron!
You
know
nvme
u.3
ssds,
that
you
paid
huge
amounts
of
money
for
or
huge
amounts
of
to,
rent.
How
many
of
them
do
you
want
to
sacrifice
in
the
you
know,
for
the
sake
of
making
sure
that
you
have
a
very
resilient
zfs
pool,
you
know,
do
you
want
to
do?
Do
you
want
to
mirror
your
drive,
so
you
have
an
exact
copy
or
multiple
exact
copies?
B: Do you want to do RAID-Z2, where maybe you can afford to have two drives in part of the pool fail? Or do you want to go balls-to-the-wall, forget about all of that, and just go for pure performance — put loads of drives in parallel and take a really serious hit to your wallet?

B: Those are all things to consider, and what I'd say is: if you're really serious about this, you want to play with this stuff — even if you're going down the co-hosting or the cloud route, you want to have something at home where you can play with these things, experiment, benchmark, so that you make all your mistakes on a system that does not matter. Take some cheap drives, some cheap SSDs, and play around with them in an old computer at home.
B: Don't go straight into production with what you think is going to work, because ZFS lets you organize things in a near-infinite number of ways — same with any sort of low-level file system. Don't go in assuming your theory is right, and don't just believe what the benchmarking tool is telling you — fio is the tool we often use for benchmarking. Do some experimentation at home before you spend the big bucks, either on your own servers or on servers you're renting in the cloud.
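For reference, a typical fio run for the random-read pattern discussed earlier might look like this — a sketch only; the file path, size, and runtime are arbitrary choices, and per Jim's advice, treat any single benchmark number with skepticism:

```shell
# Random 4k reads at queue depth 32, roughly the access pattern of a
# syncing blockchain node. Creates a 4G scratch file at the given path.
fio --name=randread --filename=/tank/fio-scratch --size=4G \
    --rw=randread --bs=4k --iodepth=32 --ioengine=libaio \
    --direct=1 --runtime=60 --time_based --group_reporting
```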
C: Yeah — thank you so much for sharing, Jim, some great stuff in there. A couple of questions in the chat; at least some of them have been answered there, but Christopher from Lemonade asked: wouldn't compression decrease throughput and thus negatively affect performance? It's really interesting to see how ZFS impacts workloads — and ZFS with Postgres in particular.

C: If your queries are reading large amounts of data, compression can actually significantly increase your query performance. Compression is obviously not free — there's a CPU trade-off you're making — but if you're pulling large amounts of data off the disk, the benefit of reading and writing less data can outweigh the CPU cost. So yeah.
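A commonly cited starting point for a Postgres dataset on ZFS looks something like the following — a sketch only; the pool and dataset names are placeholders, and the right values depend on your workload, so benchmark before adopting them:

```shell
# Postgres writes 8k pages; a small recordsize limits write amplification,
# lz4 trades a little CPU for less physical I/O, and atime updates are noise.
zfs create -o recordsize=16k -o compression=lz4 -o atime=off tank/postgres
```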
C: There are many configurations and, as is being said in the chat, with ZFS you often want to tune it per workload if you want to eke out maximum performance. One last thing I'll say: people make a career out of managing ZFS — and indeed out of many of the components in this stack — but you don't need to run some crazily optimized ZFS setup in order to be a successful indexer. There are many indexers that operate well with just a bog-standard ext4 file system. One of the fantastic things about this ecosystem is that there's so much room for technical growth — don't feel you need to go all-in at every layer from day one; it's something you can level up on as you get more comfortable with the stack. Okay — if anyone has anything else to say about storage, please say it now; otherwise I'll hand back.
G: Sorry — I'm really sorry to interrupt, but I have a question. I'm wondering a little bit about the pattern of reads and writes. Is it consistent, because blocks are produced consistently, or does it follow more of a burst pattern?
C: It's hard to answer precisely — there are aspects of the workload that are consistent, and aspects that are highly variable. If you're indexing a large number of subgraphs, then with each block that comes in your graph-node is consuming and transforming data and saving it into the database, so there's going to be some consistent write workload that scales roughly linearly with the number of subgraphs you index.

C: The side that's less easy to answer is the query side, and that's not just a matter of what demand exists in the market for different types of data — you can also express different preferences to the market about which queries you serve, using cost models. For example, you might want to optimize for serving queries that are otherwise massive and super-expensive for other indexers to run — maybe they have to scan over very large amounts of sequential data.
A: ...how you're going to spin it up, how you're going to automate the provisioning, and so on. So, next slide, please.
A: Here we talk about infrastructure as code. I'm sure many of you have heard this term; basically, it means that instead of going around clicking, or running one-off commands ad hoc to set something up, you have everything written in code — whether that's YAML with something like Ansible, or Terraform with HCL, or even bash scripts. It still counts as infrastructure as code as long as you have a repeatable way of recreating the same thing.
A: With imperative infrastructure, by contrast, you specify a list of commands to run to create the resources you want. Imperative infrastructure can be bad here because it can introduce configuration drift and because it's very specific — for example, running a command with the AWS CLI.
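To make the contrast concrete: a declarative tool is given the desired end state and converges on it, rather than replaying commands. A Terraform-style sketch — the resource type and arguments here are invented for illustration; real providers each define their own:

```hcl
# Hypothetical resource; real cloud providers define their own types.
resource "example_server" "indexer" {
  name   = "indexer-0"
  plan   = "16c-64g-2tb-nvme"
  region = "eu-west"
}
```

Applying this twice changes nothing the second time — the tool reconciles its recorded state with reality, which is what prevents configuration drift.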
A: As Chris said, we strongly recommend the approach of version-controlled, declarative infrastructure as code. Examples of declarative infrastructure-as-code tools: Terraform, which maintains a state file and is very widely used — I think it's the preferred tool for many SREs when it comes to provisioning infrastructure — and CloudFormation, which is an AWS product. My personal opinion is that Terraform is much better than CloudFormation. Then there's Ansible, which is configuration management but can be used for infrastructure as code as well. Again: use version-controlled, declarative infrastructure as code to provision your infrastructure.
A: Any questions about this before we move to the next slide?

A: As Chris said, Launchpad — our Kubernetes toolkit for indexers — is built on Ansible, Kubernetes, Helm, and helmfile, which are all declarative infrastructure as code. In case you're wondering why Ansible and not something else: we're only using the Ansible part to install Kubernetes across your different nodes, plus a few other bits of initial setup, once you've installed the operating system. There will be a link for Launchpad in the next few slides.
A: Okay, next slide, please. I wanted to quickly address the whole Docker-versus-Kubernetes debate, and there are many ways to address it. First: Docker is a suite of software development tools for creating, sharing, and running individual containers, where containers are standardized packages for microservice applications with all their needed dependencies inside. Kubernetes, on the other hand, is a system for operating containerized applications at scale, where an application consists of many containers; to orchestrate them in a production environment — over many servers, many data centers — you'd use Kubernetes.
A: What I want to point out here is that "Docker" has recently become the word people use when they talk about containers at all. Think of the Google example: we don't say "I'm going to do an internet search", we say "I'm going to google this". That's the equivalent of what people mean when they talk about Docker — they just mean containers. But the reality is that there are many Docker tools: there's Docker for Mac, there's Docker Compose, there's Docker Swarm, there's Docker as a container runtime, and so on.

A: Colson — IaC means infrastructure as code; I should have made that clearer.
A: So, going back to Docker versus Kubernetes: a fair comparison would be Docker Swarm versus Kubernetes, since Docker Swarm is also a system for operating containerized applications at scale. As Chris is saying, the key difference is that Docker by itself doesn't provide any host abstraction or clustering, but Kubernetes does. Docker Swarm does provide host abstraction, but it has been deprecated, and there's also Docker Compose, which doesn't provide clustering either. Okay, next slide.

A: Maybe it would have been better if I'd switched these slides around, but here we are. We've talked about containers, but let's quickly touch on self-compiling. Your options are running containers — which run from an image, a specification of what you're going to install in that container and how — or self-compiling applications...
A: ...self-compiling packages on an operating system. Using containers is highly recommended, because containers come with a 100% reproducible environment, and because it's very hard to replicate the same behavior across many different servers and nodes outside of containers — especially when different machines and environments have different dependencies, and especially when running on hardware sometimes means there's something really specific to one machine that you don't see on the others.

A: So again: please run containers instead of self-compiling. We know that Cohen, I think, self-compiles everything — I'm not sure if he's here to comment, but maybe we'll get his comments in the future, to have a more balanced view of the pros and cons. Next slide, please.
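As a minimal illustration of the container route, here is a fragment of a Docker Compose service definition. This is a sketch, not a working setup: a real graph-node container also needs environment variables pointing at Postgres, IPFS, and your chain RPC, which are omitted here.

```yaml
version: "3"
services:
  graph-node:
    image: graphprotocol/graph-node:latest   # pin a specific version in practice
    restart: unless-stopped
```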
A: Okay. We were initially going to do an intro to SRE and day-to-day operations in general, but we've run out of time. Still, even the fact that you have to think about running infrastructure as code, and running everything version-controlled — even that is day-to-day SRE. You also want to make sure that you have monitoring systems.
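As a sketch of where a monitoring setup can start, here is a minimal Prometheus scrape config. The target host names and ports are assumptions about your deployment — check which metrics port your graph-node build actually exposes:

```yaml
scrape_configs:
  - job_name: graph-node              # application visibility
    static_configs:
      - targets: ["graph-node:8040"]  # assumed metrics port
  - job_name: node-exporter           # hardware/OS visibility
    static_configs:
      - targets: ["indexer-host:9100"]
```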
A: You want visibility into what your applications are doing, but also into how your hardware is behaving. So, to wrap up this whole conversation, a reminder of the different indexing stacks available. There's the Docker Compose setup from StakeSquid, which has recently been updated to support MIPs. However, the limitation of the Docker Compose setup is that you can't scale it beyond one host — one server — so when it comes to supporting multiple chains, you can only get so far. Then, soon, we'll be revealing the Kubernetes Launchpad from GraphOps, which a lot of work has gone into. The purpose of it is that you give it an IP address and a way of connecting to your servers, and it takes those servers and creates clusters for you.
C: Maybe — I don't know whether the FAQ will be in the recording. We'll hand it back to Abel — over to you, Abel.

E: Sure thing, yeah. I'd say we should keep recording through the FAQ. People should feel free to drop off if you're busy, but I think it's valuable, so we'll keep the recording going for the FAQ. So keep going — but obviously try not to keep it for too long.
C: Yeah — and indeed, to everybody who's shared their opinions and perspectives on the call and in the chat: I really appreciate the diversity of perspective; keep contributing. I hope everybody can feel that that's a big part of this community.

A: What is VictoriaMetrics? I've never heard of that.
F
It allows you to horizontally scale your storage for metrics, and it's more efficient and faster. It doesn't require as many resources as Prometheus and is less CPU-intensive, so it's easier to use.
A
And have you used that?
F
Yes, there's the StakeMachine stack that has been running it; I've seen it since genesis. We don't use Prometheus; it's fully reliant on VictoriaMetrics storage.
A
And okay, cool. So, most of the Helm charts, and in general the Kubernetes setups out there for different applications, usually support Prometheus out of the box.
Do you need to do a migration, or are you able to use Prometheus metrics with VictoriaMetrics?
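For context on that question: VictoriaMetrics is designed as a drop-in Prometheus replacement. Exporters keep exposing metrics in the normal Prometheus format, and either Prometheus itself or vmagent scrapes them and remote-writes into VictoriaMetrics, so no application-side migration is needed. A minimal sketch (the hostname is a placeholder; 8428 is single-node VictoriaMetrics' default port):

```yaml
# prometheus.yml fragment: forward everything Prometheus scrapes
# into VictoriaMetrics over the standard remote_write protocol.
remote_write:
  - url: http://victoria-metrics:8428/api/v1/write
```

Grafana can then point at VictoriaMetrics as a Prometheus-type data source, since its query API is PromQL-compatible.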
F
A
So now we have a name for you: we'll call you StakeMachine. Okay, anyone else? Any questions, anything?
E
Everyone should have the ability to unmute, so feel free to do so. It looks like we have a comment in the chat, so, Ana or Chris, feel free to grab that.
C
I'll grab that. Hey Dan, hoping your name is Dan; it's not a huge stretch, right? So the question is: as a newbie, I'm confused as to whether the intention is for MIPs participants to run through every chain that is specified in the program, or, if you think you can only support one chain, whether you should just pick one and forget the rest. You can participate in whichever chain phases you would like to.
C
As far as I understand, and I'm sure this won't be new information: through the MIPs program there will be a number of phases, and you can think of each chain as having its own phase. A phase has a cycle that starts at testnet, and so MIPs participants are expected to participate in testnet and meet mission and QoS objectives there (QoS is quality of service, for anyone unsure). Then, once testnet QoS gets to a certain threshold...
C
But each chain phase is basically largely independent from any other.
C
So you can pick the chains that you want to participate in the program for, but the important part is that you finish the complete phase, so you get from testnet all the way through to mainnet for every chain. And as for whether it makes sense to set up and then tear down for every chain phase: to me that doesn't make sense, because the objective of the program is obviously, you know, to add sustainable indexers to the network.
C
Follow-up question from Dan: I might not be ready for the Gnosis chain, but I can start getting things ready in order to jump on a future phase. Yes, that's correct, so long as you applied before the deadline and submitted all your KYC. But yeah, you can pick the chains that you would like to participate in.
A
Any more questions? And since we're waiting for questions, just a request for feedback: what was missing from this presentation? Obviously, there are more workshops coming up, but is there something that you need more help with in terms of understanding everything?
A
The name of the node: it's an archival trace node, to index all sorts of subgraphs, Colson.
C
Maybe just to clarify: the node client is Nethermind, but you can run Nethermind in those modes that were covered, so as a light client, as a full client, as an archive, etc. So you want to run Nethermind for the Gnosis chain, and if you would like to support any subgraph you need to run it in archival trace mode, which, yeah, I'm sure there is documentation on, and we will actually cover this in our Launchpad workshop.
C
But yeah, you just pass a config flag to Nethermind saying xdai_archive, and that should be what you want.
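To make that concrete, a sketch of the invocation using Nethermind's predefined config selector (the data directory path is a placeholder, and config names have changed across releases, so double-check against current Nethermind docs; newer versions name this config gnosis_archive):

```shell
# Start Nethermind with its predefined xDai/Gnosis archive configuration,
# which enables the archive + trace support needed for indexing subgraphs.
./Nethermind.Runner \
  --config xdai_archive \
  --datadir /data/nethermind-gnosis
```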
A
And is there a way to sync your Gnosis archival node without having to wait a few days? I don't think there is, unless you're using an RPC node. But again, we are not recommending using RPC providers.
B
Someone the other day said that they were uploading a snapshot; I think it was Sun Tzu. I don't know if he's on the call, but maybe one of the community members might be providing a snapshot; we'll see. There was also some talk about, I think, Gnosis providing them, but whether that actually came to be or not, I'm not sure.
C
Yeah, there are a few efforts to organize snapshots. Honestly, one of the biggest challenges with snapshots isn't so much making them, it's distributing them, because, yes, as StakeMachine said, a 1.7-terabyte snapshot is just a lot of bandwidth to pay for if 100, 150, 500 people are downloading it.
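To put that distribution cost in perspective, a quick back-of-the-envelope calculation; the per-GB egress price here is purely an assumed illustration, since it varies widely by provider:

```python
# Rough egress estimate for serving a 1.7 TB snapshot to many indexers.
snapshot_tb = 1.7      # snapshot size in terabytes (from the discussion)
downloaders = 150      # mid-range of the 100-500 figure mentioned on the call
egress_per_gb = 0.05   # assumed USD per GB of egress; illustrative only

total_tb = snapshot_tb * downloaders            # total data served
cost_usd = total_tb * 1000 * egress_per_gb      # using 1 TB = 1000 GB

print(f"{total_tb:.0f} TB of egress, roughly ${cost_usd:,.0f}")
```

Even at the mid-range estimate that is hundreds of terabytes of egress, which is why distribution, not creation, is the hard part.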
C
A
And to all of these questions... question two: is there a recommended list of regions where we should start the servers? It's hard to provide recommendations on locations, especially because once we start recommending locations, the chances are that most indexers might go towards the same location.
A
C
Yeah, no, I think spot on. You know, we don't really want to recommend locations, but it's not rocket science: think about where the major markets are for cryptocurrency. It's not a stretch to think that the United States generates a lot of query demand.
C
Also, parts of Asia generate a lot of demand, so you can kind of just think about it from that perspective. It's also worth saying that GraphOps, in collaboration with Edge & Node, is working on a gateway quality-of-service oracle, and this will, you know, give all of us a bit more quantitative data to understand.
C
A
So Lex Crime has shared StakeMachine's map with the indexer locations, and he's saying to look at this map and choose the location with the lowest concentration of indexers, but also remember that...
A
Oh sorry, I don't know what I was going to say; my mind went blank.
A
StakeMachine is also saying that during MIPs you will get synthetic load, so latencies to gateways will be very, very important.
B
Look at the map and you'll get that joke.
E
I think so too, so I think it might be a good opportunity to wrap up. Thank you, everyone, for all of your participation. This has certainly been a very educational and informative conversation. So yeah, thank you, Ana; thank you, Chris; thank you to everyone. This has been an absolute pleasure, and we look forward to seeing you all on Tuesday for the next IOH/MIPs workshop. Thank you all. Take care.