From YouTube: IPFS Weekly Call 2019-03-18 🙌🏽📞
Description
Newsletter: https://tinyletter.com/ipfsnewsletter
A: All right, everyone, howdy hello and welcome to the IPFS weekly call, where we get to learn about the great things that are happening in our community and the stuff that we're building today. Let's see, so before we begin: if everyone can just fill out the IPFS Weekly Call attendance list, so if you're attending the call, just put your handle in under the attendees section. We don't have any announcements for today, so we'll begin the main presentation. But before we begin I want to thank Ollie for taking notes. Thank you, Ollie. Today's main presenter is Michael Rogers, and he's going to talk about GitHub ecosystem metrics, the project he's been working on for the past several weeks. So Michael, please take it away.
B: Cool. I'm sharing the broadcast now, let me know when you can see it. Cool, okay. Awesome, awesome! Okay, let's go to the content. So this is going to be a little bit informal anyway.
B: So the system that we built uses GH Archive. Basically, every hour they put out a file, and it looks like this. It's an hour of public data: metadata for every single action that anyone takes across all of GitHub's public resources. So any push, people watching a repo, people commenting on things, anything like that. There are only a couple of things that it doesn't really pick up.
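To make that concrete, here's a minimal sketch of pulling down and walking one hour of GH Archive data. The data.gharchive.org URL scheme is GH Archive's documented one; the particular hour and the fields printed are just illustrative:

```python
# Read one hour of GH Archive events: a gzipped file of newline-delimited JSON.
import gzip
import json
import urllib.request

url = "https://data.gharchive.org/2019-03-18-15.json.gz"  # one hour of public events

with urllib.request.urlopen(url) as resp:
    with gzip.open(resp, "rt", encoding="utf-8") as lines:
        for line in lines:
            event = json.loads(line)
            # Every public action shows up here: pushes, watches, comments, ...
            print(event["type"], event["repo"]["name"])
```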
B: If you, you know, emoji-thumbs-up something, it doesn't pick that up. But it's a very large amount of data: if you go back maybe two or three years, you end up with a couple of terabytes' worth of data. But because of rate limiting and a bunch of other stuff, this is really the only effective way to look at entire ecosystems across GitHub.
B: And then if we want to look at a repo set across all of GitHub Archive, we might use BigQuery or something like that. With BigQuery we could actually use a snapshot that they update, I think monthly, of every single file in every master branch across GitHub. That would allow us to say, like, what does the Docker ecosystem look like? What are all the repos that have a Dockerfile in them? Things like that. So that can be really useful information.
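That snapshot is Google's public `bigquery-public-data.github_repos` dataset, so the Dockerfile question looks roughly like the sketch below. This is not the project's actual query, just the shape of it; BigQuery bills by bytes scanned, which is where the dollar figures that follow come from:

```python
# Find every repo in the GitHub snapshot whose default branch has a Dockerfile.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT DISTINCT repo_name
    FROM `bigquery-public-data.github_repos.files`
    WHERE path = 'Dockerfile' OR path LIKE '%/Dockerfile'
"""

for row in client.query(query):  # iterating waits for and streams the results
    print(row.repo_name)
```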
B: BigQuery is quite expensive, though, so we really need to limit those queries. That query for the Dockerfiles is probably like $15 to run once. So we identify repos in one step, and then we basically want to get a filtered set of data, and I'm going to explain how that system works a bit here. We use Lambda, and what we do is ask Lambda for a month of data.
B: That fans out 24 requests per day to another Lambda function called, no sorry, it's not 'filter', it's called 'pluck'. And what pluck is going to do is first check: do we have that GH Archive file for that hour in S3? If not, we go and get it, and then once we put it in S3, we use this thing called S3 Select.
B: S3 Select actually allows you to do SQL queries on either CSV or JSON data, and that data can even be lines of JSON in a gzip file. So it actually works perfectly for GH Archive files, except that every couple of months some people do a gigantic push to GitHub with a lot of file updates, and even the metadata about that push is larger than one meg.
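A sketch of what such an S3 Select call looks like in Python, assuming the archive hour has been mirrored into a bucket; the bucket, key, and selected attributes are illustrative. The one-meg problem above matches S3 Select's documented 1 MB per-record limit, which is what forces the fallback described next:

```python
# Run a SQL projection over one gzipped hour of JSON-lines events in S3.
import boto3

s3 = boto3.client("s3")

resp = s3.select_object_content(
    Bucket="gharchive-mirror",        # hypothetical mirror bucket
    Key="2019-03-18-15.json.gz",
    ExpressionType="SQL",
    Expression="SELECT s.type, s.repo.name, s.actor.login FROM S3Object s",
    InputSerialization={"CompressionType": "GZIP", "JSON": {"Type": "LINES"}},
    OutputSerialization={"JSON": {}},
)

for msg in resp["Payload"]:           # an event stream of Records/Stats messages
    if "Records" in msg:
        print(msg["Records"]["Payload"].decode("utf-8"), end="")
```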
B: So there's a fallback, pluck-fallback, that just does the exact same operation. And with both of these, what they're going to do is return just the attributes that we want out of the objects, back to filter day. Okay, so coming back to this initial piece here: when we call filter month, we also give it what we call a filter, and that's basically an encoded sieve object, and we store it in S3.
B: We store it in S3 so we don't have to pass it to the Lambda functions, because these objects can actually get too big to be passed around between the Lambda functions a lot of the time. That sieve object has not only the pluck values in it, but also any repositories that we want to filter on. So what filter day will do, after it gets this plucked set back, is filter out all of the information, keeping just the repositories that it cares about, and then filter day stores that in S3.
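The store-the-sieve-in-S3 trick is worth spelling out: synchronous Lambda invocations cap the request payload at 6 MB, so a filter naming tens of thousands of repos can't ride along in the event itself. A sketch of the pattern, with every name here (bucket, function, fields) illustrative rather than taken from the real system:

```python
# Park the sieve in S3 and hand the Lambda a small key instead of a big payload.
import json
import uuid
import boto3

s3 = boto3.client("s3")
lam = boto3.client("lambda")

sieve = {
    "pluck": ["type", "repo.name", "actor.login"],  # attributes to keep
    "repos": ["ipfs/go-ipfs", "ipfs/js-ipfs"],      # repositories to filter on
}

key = f"sieves/{uuid.uuid4()}.json"
s3.put_object(Bucket="metrics-scratch", Key=key, Body=json.dumps(sieve))

lam.invoke(
    FunctionName="filter-month",                    # hypothetical entry point
    InvocationType="Event",                         # async, fire-and-forget
    Payload=json.dumps({"month": "2019-02", "sieveKey": key}),
)
```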
B: Then another Lambda function called scan-cat concatenates all those files together and stores the new value in S3. So when we go and get a year's worth of data, we basically just do a month at a time. A month is going to generate between 700 and 1400 Lambda functions, depending on caching, and those are all going to run in parallel. If it has to hit pluck-fallback it'll take a little bit longer, because the whole set is going to go as slow as the slowest request.
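The fan-out shape is simple: one pluck per archive hour, all in flight at once, with the batch gated on the slowest hour. A rough sketch with a stub standing in for the pluck Lambda; the counts line up with the roughly 700 invocations per month mentioned above:

```python
# Fan out one "pluck" per hour of the month and run them all in parallel.
import calendar
from concurrent.futures import ThreadPoolExecutor

def pluck_hour(year, month, day, hour):
    # Stand-in for invoking the pluck Lambda for one GH Archive hour.
    return f"{year}-{month:02d}-{day:02d}-{hour}"

def filter_month(year, month):
    days = calendar.monthrange(year, month)[1]
    hours = [(year, month, d, h) for d in range(1, days + 1) for h in range(24)]
    with ThreadPoolExecutor(max_workers=len(hours)) as pool:
        # The whole batch finishes when the slowest hour does.
        return list(pool.map(lambda a: pluck_hour(*a), hours))

results = filter_month(2019, 2)  # 28 days * 24 hours = 672 pluck calls
```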
B: But, you know, in less than about 10 seconds we'll get back an entire month's worth of data, and then we do a month at a time. I would love to pull a year at a time. We got the Lambda rate limit increased, but then, as soon as the rate limit was increased for Lambda, we noticed that there's also an S3 rate limit. So then we were blowing out the S3 rate limit once we started to actually use our new Lambda limit, so we're trying to figure that out.
B: Hopefully we'll eventually just be able to do a year at a time, and then we could pull a year in about ten seconds. But as it stands, you know, we can get three years' worth of data in less than ten minutes, so it's pretty fast. The nice thing about this system is that because we're taking a sieve object here, we're not limited in how many repos we can query for. We can literally filter for a hundred thousand repositories.
B: It would take a little bit longer, because each of those functions is going to have to decode this giant sieve object out of S3, but we can do really, really huge sets of data. That's one of the reasons why we couldn't use some of the off-the-shelf stuff like BigQuery. BigQuery also has all this activity data in it, but it's cost prohibitive.
B: There's also a limit on the size of a single query that you can do. So, given that it's a single query, we can't jam, you know, a hundred thousand repos in there. We'd have to chunk through them, and then it would be insanely expensive. So yeah, this is basically how it works: it allows us to pull in a very reduced data set of just the values that we care about, for just the repositories that we care about.
B: So what ends up happening, now that we have this system, is that as long as we can turn an ecosystem into a set of repositories, we can then filter out datasets for those, and then it's just a matter of actually processing the metrics. That's the stage that we're at now: trying to figure out what useful data we can get from this. So you can do things like getting the unique people that are engaged in different kinds of activity, or in all activity.
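For instance, here's a sketch of that unique-people metric over the filtered output, assuming the JSON-lines layout from the sketches above and GH Archive's field names; the input file name is hypothetical:

```python
# Count distinct actors overall and per event type in a filtered event file.
import json
from collections import defaultdict

def engagement(path):
    all_actors = set()
    by_type = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            login = event["actor"]["login"]
            all_actors.add(login)
            by_type[event["type"]].add(login)
    return len(all_actors), {t: len(s) for t, s in by_type.items()}

total, per_type = engagement("ipfs-2019-02.json")   # hypothetical filtered output
print(total, per_type.get("IssueCommentEvent", 0))
```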
B: Who are all the people that engaged in any way? If they're even in, you know, issue comments and stuff like that, you can assume that they're, like, users of the system. You can also just look at the overall level of activity: is it coming down, or is it still, you know, rising in terms of overall activity? And then, most importantly, you can take all of that data and start to look at, well, what is the growth rate of that ecosystem?
B: How much are activity and unique users growing, over whatever time slices you want to look at?
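A sketch of that growth-over-time-slices idea, bucketing the same filtered events by month; taking raw event count as "activity" is an assumption here, not the project's definition:

```python
# Month-over-month growth rate of activity in a filtered event file.
import json
from collections import Counter

def monthly_growth(path):
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            counts[event["created_at"][:7]] += 1   # "2019-02" from the ISO timestamp
    months = sorted(counts)
    return {
        m2: (counts[m2] - counts[m1]) / counts[m1]
        for m1, m2 in zip(months, months[1:])
    }
```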
B: So yeah, that's what we're doing with the system now. The main reason that I wanted to show people the system and how it works is that it's a little bit different from traditional metrics systems, where you log the metrics in some database and then you can do queries on it.
B: We actually have to, you know, go and filter this giant data set from GitHub and then get interesting metrics out of it from there. But with it we can look at not just our own ecosystems, so IPFS and libp2p and all these other systems, but also ecosystems that we're interested in maybe getting involved in.
B: So, with this work we can say, okay, what is the growth of Dockerfiles compared to package.json files, and how many people are engaged in those? One of the interesting things here is that when you look at different ecosystems, the number of packages in a package manager may be growing, or just the number of Dockerfiles may be growing in different repositories, but there may be really big differences in the number of users of those ecosystems.
B: When I did this analysis four or five years ago, using much worse tools, just separating back-end JavaScript from front-end JavaScript, there was a noticeable difference in the number of people engaged in those front-end projects. So you could sort of correlate that to there being more people using those, that there are more users of those projects than of the back-end projects. So this gives us a really interesting sort of competitive analysis between different open source ecosystems as well. So yeah, I think at this point I can open it up for questions.
B: So, if you go into your GitHub repo, there's a dependents tab, and you can see all the repositories that depend on your repository. There are a lot more repos there than in the libraries.io data, probably because libraries.io is doing a really complicated operation, where they're trying to look at the repo data that's in the package.json and then correlate it back through the dependency graph. But not everybody has that metadata up to date, whereas GitHub, you know, has all of the repos themselves.
B: So they know if a package.json depends on that package name somewhere; they can even pull in, you know, repositories that are depending on it that aren't published in any way. So that's really useful. But there's a problem with that data from GitHub.
B: The problem is that there's no API, so I've been poking them to try to get them to give me that data in some usable way without me literally writing a scraper for the website. And yeah, so I've looked at sort of the data for the systems in our ecosystem. One interesting thing is that so far, the projects that are in a similar space to us have very similar sorts of curves in their growth. So it looks like, you know, the space that we're in, the decentralization space, is growing.
B: It's growing at a particular rate, and we're all growing in it really well. So that was one interesting kind of takeaway, but I'm sort of waiting to get more data and to refine our repo identification before I make a lot of other judgements about it. Also, I really want to look at more mature ecosystems and see where their growth hit particular spikes and how the different curves went, and see if we can find correlations between them. Then we can look at our own growth.
B: And we can say, okay, what phase of adoption are we in? Which phase of maturity are we in? We don't have great stuff on that yet; we literally just got the system sort of working, and every time that it works, we make some improvement to make it a little bit faster, and then that blows out a new rate limit that we didn't know about somewhere.
B: And then we hunt that down and add a new layer of caching, and that blows out some new rate limit, because things are faster. So if you look at the repository, there's just a lot of churn in the code and the methods that we're using, to get to something relatively stable. Now, essentially, at the filter month phase there's effectively a rate limiter inside of it that tries to estimate the potential number of Lambda functions.
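A sketch of what a guard like that might look like: estimate how many invocations a request implies, then gate them with a semaphore so a burst stays inside a concurrency budget. The budget value and the double-for-cache-misses heuristic (which happens to match the 700-to-1400-per-month spread mentioned earlier) are illustrative, not the real system's logic:

```python
# Estimate the fan-out for a month and throttle invocations to a fixed budget.
import calendar
import threading

MAX_IN_FLIGHT = 500                        # assumed safe concurrency budget
gate = threading.Semaphore(MAX_IN_FLIGHT)

def estimate_invocations(year, month, cached):
    hours = calendar.monthrange(year, month)[1] * 24
    # On a cold cache each hour may also hit pluck-fallback, roughly doubling calls.
    return hours if cached else hours * 2

def invoke_throttled(fn, *args):
    with gate:                             # blocks once MAX_IN_FLIGHT are running
        return fn(*args)
```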
A
This
question
is
very
similar
phone
Jonny
Quest,
so
many
insight
so
far.
B: Yeah, I think I kind of covered that, or covered as much as there is so far. My main insight is that Lambda is, like, super powerful and can do this kind of stuff really fast. If you were running this locally on your own machine, it would take like a month; with Lambda you can do it a lot quicker. However, there are really, really nasty, fairly undocumented rate limits around everything. And also, this is not a common use case in the Lambda world: bursting from zero to thousands of functions and back down to zero is not that common, so we're hitting a lot of, you know, weird things with that.
A: Next question, from Ollie: have you got any exciting headline stats out of it so far, any surprises? And I guess that would be "not yet", because you're still in the early phases of analyzing the data, right?
B: Oh yeah. So right now I'm trying to just provide the data to the projects, and then I'll let the projects decide what kind of top-line numbers they want to pull out of that. I will just generally caution against the top-line numbers: all of these are estimates, right? Not everyone, not all of our users, engages in a public repository, you know.
B: We also just had a sort of quick metrics sprint where we were just trying to get a bunch of data out of a few different projects, so that gave us some basic info on the packages in the orgs for libp2p, IPFS and IPLD. We were able to see, like, okay, what is the growth rate right now in just the activity on our own projects, stuff like that. But yeah, as for the whole across-the-organization metrics and KPI situation:
B: Right now it's just "log data": figure out how we get data. Then, once we have a lot of data, we can start to talk about the best ways to use it and analyze it. So yeah, a lot of what this presentation is meant to do is to open up the possibility of questions that you can ask about this data, because now we can capture this entire ecosystem's data. So please feed these kinds of questions back into the process.
A: The next comment is from Jim Pitts. He said: I'd love to see some cohort analysis, to see how many people are contributing and then churning out.
B: So we would use BigQuery for that. BigQuery can look at the contents of all of the Dockerfiles on GitHub, and then from that we would get a set of repos. So if we wanted to have some kind of matching inside of the Dockerfile, then yeah, that would tell us: if a Dockerfile has this thing in it, tell me about it, tell me the repo name, tell me the file, whatever.
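That content-matching step would be the snapshot's files table joined to its contents table. A sketch against the real public tables, with the match string purely illustrative:

```python
# Find repos whose Dockerfile contains a given string, via the public snapshot.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT DISTINCT f.repo_name
    FROM `bigquery-public-data.github_repos.files` AS f
    JOIN `bigquery-public-data.github_repos.contents` AS c
      ON f.id = c.id
    WHERE (f.path = 'Dockerfile' OR f.path LIKE '%/Dockerfile')
      AND c.content LIKE '%ipfs%'
"""

repos = [row.repo_name for row in client.query(query)]  # feed these into the sieve
```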
B: And then we could do a filter analysis across all of those repos in the archive. But the system itself doesn't have the contents of all the repositories in it. BigQuery is quite good at that, and replacing it would be pretty massive and not a huge win for us, because the types of queries that we need to do on the actual file contents right now are pretty minimal.
B: Like, you know, we usually just need to see that something matches. So we can, you know, pay fifteen or twenty dollars or whatever per query, get that data, have the set of repositories, and then have this much more cost-effective system to look at all the meta-analysis about those repositories.
C: So what I was thinking there was, rather than just a repo that has a match on some content in a file: are there kind of clusters of repositories that all, say, use these five packages, and what is then the crossover with other groups of packages that would often be used together, to kind of highlight collections of similar or shared bits of code? So, like, IPFS libraries always being required along with these kinds of things, probably libp2p-related stuff, for example.
C: But also, can we highlight similar kinds of groups of dependencies that people are using with IPFS, to get an idea of where there might be almost like an Amazon recommendation for stuff that is closely related to a given set of packages? I'm really mostly interested in IPFS packages there, particularly.
B: Then you would just look at their package.json and start to basically log all of the packages and see what the top ones are that were required alongside, and then you could use this system to do an analysis of those repositories and find the people that are the most active as well. So you can start to, like, include the IPFS...
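The first half of that, the co-required packages, is easy to sketch: walk a pile of package.json files, record what appears alongside a target package, and rank it. The directory layout and the target name here are illustrative:

```python
# Rank the packages most often required alongside a target package.
import json
from collections import Counter
from pathlib import Path

def co_occurring(package_json_dir, target="ipfs"):
    counts = Counter()
    for path in Path(package_json_dir).glob("*/package.json"):
        pkg = json.loads(path.read_text(encoding="utf-8"))
        deps = set(pkg.get("dependencies", {})) | set(pkg.get("devDependencies", {}))
        if target in deps:
            counts.update(deps - {target})   # everything required alongside it
    return counts

print(co_occurring("repos").most_common(10))  # top 10 packages used with ipfs
```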
A: We'll see everyone next week at next Monday's IPFS weekly call. Have a great week, take care, bye.