From YouTube: Buzzword Bingo: Architecting a Cloud-Native Internet Time Machine by Ross Kukulinski, NodeSource
Description
Attend this interactive session (including real Bingo!) where Ross Kukulinski, a Node.js evangelist, container enthusiast, and NodeSource technical product manager, will share his cloud-native implementation of the Internet Archive’s Wayback Machine, including a live demonstration.
Ross will share his goals, architecture, and technology stack choices to implement a scalable, containerized, microservice implementation of the Wayback Machine using Node.js, Docker, and Kubernetes.
Topics covered in the talk include: Architecture, Scaling, Monitoring, CI/CD
Hey everybody, so today I'm going to be talking about architecting a cloud-native internet time machine. Quickly, about me: I'm currently a product manager at NodeSource, and I'm also a member of the Node.js Evangelism Working Group, so I like to go to meetups and help with NodeSchool events to help evangelize Node. I think everybody should be using Node, and that's why I'm here. I also have an Introduction to CoreOS video tutorial series with O'Reilly Media, and if you're interested in finding me on the internet, I am @rosskukulinski. Very original. So, getting started.
What does cloud native mean? Well, it means lots of things, depending on who you ask and what they're trying to accomplish. But someone I follow quite a bit on Twitter, Joe Beda, who just started a new company, Heptio, had this to say: "At its root, cloud native is structuring teams, culture, and technology to utilize automation and architectures to manage complexity and unlock velocity."
Many of the developers I work with are not really sure where to begin with learning operations, or metrics, or logging, or distributed tracing. They want to learn, but they're not sure how, and many of them aren't lucky enough to be in an organization that already has these types of systems in place, that's already trying to do DevOps or cloud native. So how are those people supposed to learn and hone these skills?
Now, there's an engineer, Angelina Fabbro, who did a really, really awesome talk at JSConf 2013. I was not there, but I watched online, along with 120,000 other people, on YouTube. I highly recommend watching this video if you're interested in trying to level up from a beginner or intermediate developer to an advanced developer, and it's not just about JavaScript. The big key takeaway that I really took from Angelina's talk was this idea of experimenting recklessly: throw caution to the wind.
Try something dangerous and see what happens. If it doesn't work, that's okay: figure out what went wrong and how you might change it to make it better, and try again. But recklessness can be very, very dangerous for an organization, so as a developer you really want a safe environment to do this. Really, you want a sandbox.
I have experimented recklessly and felt the pain, and I realized that there are so many resources out there for us as developers, right? You can go buy a book; there are ebooks; there are video courses. I have a video course, and the moment I shipped my video course it was outdated. The core concepts were right, but the actual implementation details were no longer accurate. It wasn't best practice anymore, because we work in a space that moves so quickly. So I wanted to create a sandbox that was entirely open source.
One that would enable developers like myself and other people to experiment recklessly. We should be able to publish and document everything that we learn and experience: what are the best practices, what were the lessons learned? We should have a supportive and inclusive mentorship and discussion system, so everyone should feel welcome to contribute, or try, or learn, or just lurk.
Just read the design docs and see what you can figure out. But more importantly, I've read so many hello-world apps in my life that I didn't want to have a hello-world app. I wanted something real. I wanted something that was solving a real problem using cloud-native technologies and procedures. So after some soul-searching and brainstorming and talking with other people that I've been working with, we stumbled across the Wayback Machine. So who here is familiar with the Wayback Machine? Raise your hand. All right.
So a lot of people are. If you're not familiar, that's okay. Essentially, what the Wayback Machine does is crawl the internet based on users' requests, cataloging an entire website and caching it all. Then you can go to archive.org, search any URL, and see what Facebook looked like in 2009 or 2008 when it first came out, very different from today. And the Wayback Machine is run by an entire open-source organization, the Internet Archive, which is a non-profit.
Their goal is to catalog and archive the internet for all of us, so we can see what the history of the internet is. And from an engineering perspective, the Internet Archive is huge. There has been a ton of research papers published on the Internet Archive and its architecture. This paper that's up here, "The Architecture of the Internet Archive" by Elliot Jaffe and Scott Kirkpatrick, describes that in 2009 the Internet Archive was serving 2.3 gigabytes of data per second to its users.
In 2014 they had cataloged 15 petabytes of information, and by 2016, which is early this year, 430 billion web pages had been archived, and they had added 5 billion new URLs in the last year. So this is a real system at real scale, and for me this is an interesting problem: how might I go and build and implement a cloud time machine? Now, when they first built the Internet Archive, they had some very, very simple rules, very simple design goals, which, for me as an engineer, I loved. Number one:
The system should not rely on any commercial software, so it's either all open source or they build it themselves and then make it available. And they actually went ahead and built their own file store. So, like we've got S3 on Amazon, this sort of next-generation cloud file storage, they built their own in 1996, and if you read the research papers, they're actually remarkably similar. Number two: the system should not require a PhD to implement or maintain. This is good.
I don't have a PhD, but I want to be able to understand how these components fit together and how they work, and anyone should be able to approach and work on this. And the system should be as simple as possible. Obviously there are constraints, but we should strive to make it as simple as possible. So the Internet Archive provided a tractable problem, and in theory we should be able to build something that's easy to maintain, and hopefully it's fun. So we built Cloudy Time Machine. It's a cloud-native, microservice-based implementation of the Wayback Machine.
Initially, when we created this, we actually called it the Lazy Internet Archive, and I'll get to that in a little bit. And I do want to make an important note: this is not production-ready. We are not trying to put the real Internet Archive out of business. They are a huge organization, and they're doing the right things for us as an internet community. This is a sandbox, right? We want to experiment recklessly.
It is entirely open source; it's all on GitHub, under the Cloudy Time Machine organization. We leverage CircleCI for an automated CI and CD system, and all the images are hosted on Quay (quay.io), which is a Docker registry, and they're all public. All of this is public. So let's take a look and see. Oops, let's see if we can take a look at this; it might have opened on the wrong window.
There it is. Oh, now I can't see it. That's funny. So here, down here, I should be able to type in something. I'm a big fan of xkcd.com. In our lazy Wayback Machine, instead of crawling and pulling out the index.html and all the HTML and CSS and images, transpiling all of them and rewriting all the URLs, we were lazy: we just used PhantomJS, took a nice big snapshot of the page, and then shoved it into Google's file store.
All right, so what does it look like? Well, Cloudy Time Machine is a React and webpack front end. We have a public-facing REST API, documented entirely with Swagger. In theory, we could generate client-side libraries for this using Swagger's tools; we haven't done that yet.
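As a rough illustration only (this is not the project's actual spec, and the path and fields are hypothetical), a Swagger 2.0 description of a snapshot-request endpoint might look something like this:

```yaml
# Hypothetical Swagger 2.0 fragment for a snapshot-request endpoint.
swagger: "2.0"
info:
  title: Cloudy Time Machine API
  version: "1.0.0"
paths:
  /snapshots:
    post:
      summary: Request a snapshot of a URL
      parameters:
        - name: body
          in: body
          required: true
          schema:
            type: object
            required: [url]
            properties:
              url:
                type: string
      responses:
        "202":
          description: Snapshot job accepted
```

A spec like this is what Swagger's codegen tooling would consume to produce client-side libraries.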
It is a Node.js microservice architecture. It is Node, but it doesn't have to be, because this is all in containers, and it has a well-documented, microservice-based message bus system. We could, in theory, rewrite any part of this in a different language if we wanted to.
In fact, we have someone who's working on a Go implementation of the screenshot software. We do use message queues for our internal API, so all the services communicate using a message queue; in this case it's currently Redis, and I'll talk more about that in a little bit. And from day one this has been 100% containerized, leveraging Docker and Kubernetes. This production deployment is running in Google Container Engine, but it doesn't have to be, right? It is technically cloud-agnostic.
However, we are leveraging Google's cloud storage API, so right now we write directly to Google's file store. In theory, the service that is responsible for writing could take an environment variable and switch between AWS or GCP or Azure, or maybe even a local file store solution.
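A minimal sketch of that idea; the backend names, bucket paths, and `makeStore()` helper are hypothetical, not the actual Cloudy Time Machine code. The same container image picks its file store from an environment variable:

```javascript
// Hypothetical storage backends; in reality each upload() would call the
// corresponding cloud SDK instead of just building a path.
const backends = {
  gcs:   { name: 'gcs',   upload: (key) => `gs://ctm-snapshots/${key}` },
  s3:    { name: 's3',    upload: (key) => `s3://ctm-snapshots/${key}` },
  local: { name: 'local', upload: (key) => `/var/ctm/snapshots/${key}` },
};

// Pick the file store from an environment variable, defaulting to Google
// Cloud Storage, so the same image works against any provider.
function makeStore(env) {
  const choice = (env.SNAPSHOT_STORE || 'gcs').toLowerCase();
  const backend = backends[choice];
  if (!backend) throw new Error('Unknown SNAPSHOT_STORE: ' + choice);
  return backend;
}

const store = makeStore({ SNAPSHOT_STORE: 'local' }); // e.g. set per deployment
console.log(store.name, store.upload('xkcd.com.png'));
```

Swapping providers then becomes a deployment-time configuration change rather than a code change.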
We're leveraging RethinkDB for all the metadata about all the snapshots, and then, as I mentioned, we're using CI/CD for all the deployments. So here's the top-level architecture.
At the top we have our NGINX proxy layers. Those are simply proxying to the two major front-end services. First is the front-end single-page app, the React app, which is actually served up by NGINX, so we've got an NGINX talking to another NGINX. On the API side, it's Node-based, using restify to communicate, and RethinkDB is providing, again, that caching layer. Redis is the jobs system, the job processing.
So when a request comes in to take a snapshot, it gets written to Redis, and then one of the screenshot workers pulls from Redis and starts doing that processing: talk to the internet, take the screenshot, and then upload it to Google Cloud Storage. That information is then relayed back through Redis to the API service, which then writes to RethinkDB.
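That request/worker flow can be sketched with a toy in-process queue. This is an illustration only: the real system uses Redis via the Bull library, and the upload path here is made up.

```javascript
// Toy in-process stand-in for the Redis-backed job queue.
const jobs = [];     // the API writes snapshot requests here
const results = [];  // workers relay results back here

// API side: a snapshot request comes in and is enqueued.
function enqueueSnapshot(url) {
  jobs.push({ url });
}

// Worker side: pull jobs, "take the screenshot", report where it was stored.
function runWorker(takeScreenshot) {
  while (jobs.length > 0) {
    const job = jobs.shift();
    const storedAt = takeScreenshot(job.url); // PhantomJS + cloud upload in reality
    results.push({ url: job.url, storedAt }); // relayed back via Redis in reality
  }
}

enqueueSnapshot('https://xkcd.com');
runWorker((url) => `gs://snapshots/${encodeURIComponent(url)}.png`);
console.log(results[0]); // the API service would now persist this to RethinkDB
```

The point of the pattern is that the API never talks to a worker directly; both sides only know about the queue.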
So what are some of the lessons learned? Well, first of all, it's fun to experiment recklessly.
We've tried a lot of different things that have worked, and a lot of things that haven't worked. One of the big takeaways is that we recognized that Kubernetes is really great for developers, and containers themselves, too. If developers are going to have to deploy something, they want to think about their application. They want to deploy an app; they don't want to deploy a virtual machine or some other thing. They just want to say, "I've got my app and I want to get it out there." Well, a container is a one-to-one mapping to that application you have, so for developers it's very easy to understand how your application then behaves in this containerized space.

A nod to NodeSource and to the community of the Kubernetes open source project: design docs are critical. So when we're proposing an idea, or we have an idea of something we want to improve or change, we write a design doc: what are our goals, what's the executive summary, what do we think?
So we had automatic building and deploying. Initially it wasn't deploying anywhere, but at least it was building whenever we had tests, and the next stage was, all right, let's stand up a cluster and actually deploy to staging and then to production. It made our iterative cycle, as we're testing and validating, incredibly fast.
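In the CircleCI 1.x style of that era, a build-and-deploy pipeline along those lines might look roughly like this circle.yml; the image names and commands are hypothetical, not the project's actual configuration:

```yaml
# Illustrative circle.yml: test every push, deploy master to the cluster.
machine:
  services:
    - docker
test:
  override:
    - npm test
deployment:
  production:
    branch: master
    commands:
      - docker build -t quay.io/example/ctm-api:$CIRCLE_SHA1 .
      - docker push quay.io/example/ctm-api:$CIRCLE_SHA1
      - kubectl set image deployment/api api=quay.io/example/ctm-api:$CIRCLE_SHA1
```

Tagging images with the commit SHA is what makes each merge traceable to the exact build running in the cluster.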
Another important element that we recognized is that containers scale independently of your cloud provider. If you're using a container orchestration system like Docker Swarm or Mesos or Kubernetes, and your systems are already deployed onto some number of hosts and you realize that you need to scale up, maybe your front-end application, scaling up is just a question of adding more instances of that container on the existing machines you have. If you have available capacity already, you just add more containers.
Obviously, if you're out of capacity, you have to add new machines, but this is different from a system where you are deploying new virtual machines. If you're deploying virtual machines, you have to make sure, hoping and praying, that your cloud provider has the ability to deploy those machines. If they have downtime, you can't deploy; if they're out of capacity in us-east-1, you can't deploy.
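For example, with Kubernetes, scaling a hypothetical front-end service is just a matter of changing a replica count on a Deployment; the scheduler packs the extra containers onto nodes you already have. This fragment is illustrative, not from the actual project:

```yaml
# Illustrative Deployment fragment (extensions/v1beta1 was current at the time).
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3   # bump this, or run: kubectl scale deployment frontend --replicas=5
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: quay.io/example/ctm-frontend:latest
```

No new VMs are requested unless the cluster is actually out of capacity.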
Another important element is that microservices are not a magic bullet; they shift complexity around. They move complexity out of your application into the orchestration layer, and that's why we need things like distributed tracing, monitoring, and distributed logging, to have an understanding of what your overall system is doing. Each individual application is much simpler, but the overall system is still complex.
Finally, documenting and versioning streaming APIs is still kind of difficult. For the REST API world there's the Swagger documentation spec, which is pretty great for developing and building REST APIs, but there really isn't an equivalent that I've found for streaming APIs or message-bus-style APIs. There might be something specific to a particular message bus system, but there's no generic one that I've found. We've also built some open source tools.
Containerizing stateful services is still hard; there's no way around that. Because we're using RethinkDB, we built out a pre-baked Kubernetes deployment of RethinkDB, and that made it very easy for us to manage and deploy these things. It's actually gotten picked up; there are a number of companies that are now using it in production, which is cool. It's also okay to have opinions. As we were adding more and more services to our system, we realized that we were copying and pasting a lot of deploy code.
How do we build and deploy and manage the CI/CD system? So we ended up building something that we call KS, or "Kate's", Kubernetes scripts. It's basically a declarative syntax for managing the building and deploying of your applications to Kubernetes.
So what's next? Well, more tests. We always need more tests; in fact, we have very few tests. Big, big shame on us. We're also experimenting with ephemeral environments for PR testing.
So when we get a new PR for one of our modules, we really want to spin up a whole new instance, one instance of every container, so we have a complete functioning instance of Cloudy Time Machine with the only change being that one PR. We're also looking to switch to a language-agnostic message bus. Right now we're relying on a library called Bull, which is built on top of Redis. It's an npm module; it has worked very, very well for us, but it is specific to Node.js.
There is no equivalent for Go or Java or Ruby or anything else, so if we have other developers who want to come on board and try to experiment with a different language, say I want to experiment with Go, I have to go write a Go implementation of Bull before I can actually get any work done. So we're looking to replace Redis with a new message bus system called NATS, which I've been following for a long time. And finally, we're going to try to dive more into streaming API documentation and validation.
Now, if any of this is interesting to you and you want to see more, see what it's actually like: tomorrow morning at 9am I'm running a workshop with my coworker Nathan White called Deploying and Scaling Node.js with Kubernetes, and we will actually be deploying Cloudy Time Machine on Kubernetes in the workshop. I've got a whole lot of resources here; I've done a bunch of research while learning more about and building Cloudy Time Machine, and I think it's important to always tip your hat to the resources you've used. And thank you.
My credit card is paying for Cloudy Time Machine. So if there's a foundation that is interested in this, for benchmarking or other things, I would love to have a conversation with them. But in reality, you know, I use this for talking with customers and people who are interested in learning about this. It turns out to not be that expensive, mostly because it's not very popular; no one really goes to it. And so, I mean, we have...
We have automated machines that pre-seed data. We've got a list of, I think, a thousand URLs that we cache every day, so we have consistent traffic spread out through the day, sort of fake data, to make the system actually usable and to actually work with it. It's actually pretty cheap. I should also note that the first prototype was built in one day, whereas I've actually been working on this for a while now.
We built the first version the first day, and that first day we had CI/CD deploying to Google Container Engine. We didn't have a staging environment at first; we were just, all right, let's just ship to production, because no one's using it, so if we break it, who cares? But now we have a process. So if I ship something new, and one of the other people working with me on the project is trying out their thing and something is broken, they can say, "Hey Ross, did you break something?" and I go, "Yeah, probably." So I think having that immediate feedback is important. I ship something and I want to know: do my tests pass? Do my integration tests pass? Does it ship to staging, and is it workable? And then let's watch it for a day; we have automated systems that are basically doing queries against it. Let's see what the graphs look like. Is the system still happy? Oh, it's still happy, great, let's press the big scary ship-to-production button.