From YouTube: Advanced patterns for GitHub's GraphQL API
Description
Presented by Rea Loretta, CEO at Toast
The GitHub API is a key part of accelerating workflows at scale. This session will leave you with tactical tips on how to paginate effectively, create and plan queries, use tech-preview features, and manage costs, learned from years of practice and iteration at Toast and beyond.
About GitHub Universe:
GitHub Universe is a two-day conference dedicated to the creativity and curiosity of the largest software community in the world. Sessions cover topics from team culture to open source software across industries and technologies.
For more information on GitHub Universe, check the website:
https://githubuniverse.com
Hi everyone, welcome to Advanced Patterns for the GitHub GraphQL API. I'm Ria, co-founder and CEO of Toast, and I'm super thrilled to be here in front of all of you, and that all of you want to take your GraphQL to the next level. I really hope that you'll enjoy this talk as much as I enjoyed making it, and definitely tweet me if you have any questions; I'll make sure to get back to you.
Okay, so I wanted to start off with some pre-talk hype. I know it's mid-afternoon and it's probably tiring; you're probably craving that nap pod. So work with me, I need some energy in the room. Here are some pictures of what happens to people who listen to this talk: you'll become a master of GraphQL. You will manipulate code with just your mind. You will put out production fires with a snap of your fingers, and you will be a 10x engineer. But you probably still won't understand monads. That's okay, that's another talk.
So my goal with this talk is to enable you to build your own data fetcher for GitHub using the GraphQL API, and then you can go crazy, retrieve massive amounts of data, and, you know, it's all good. We've also open sourced our code into a reusable starter kit, so everyone can play. I believe data should be freely accessible to everyone who is curious to learn, so this is really, really important to me. Disclaimer, though: don't use it for evil. It's not a real license, don't look it up.
So, like other talks today have mentioned, data is pretty amazing and does cool things: you can use it to identify bottlenecks and unblock your team. This is good. You should be doing this. But data can also be taken out of context and used to make up arbitrary metrics to track engineer performance. This is bad, very bad, and the consequence for misusing the content of this talk and the open source library is that I will personally be very disappointed in you.
Let that sink in. Okay, so now that we've squared that bit away, I want to provide some context about myself and how I got interested in GraphQL. For starters, Toast integrates with GitHub and Slack and notifies engineers when to unblock teammates. As you probably all know, GitHub has a lot of activity, and the bigger your team is, the more activity, so it's very important for us to not simply pass everything through, because that would just create distractions for everyone.
So Toast filters through this noise and delivers relevant notifications to the person needed for unblocking the team. We started out as a zero-setup notification bot for individual contributors, and since then we've evolved to empower the entire review process by allowing teammates to respond directly in Slack. This past year we've learned so much, and we're continuously growing to meet demands at the team level; I'm sure you've all been on healthy teams and unhealthy teams.
To do this, we learn from a variety of orgs with different workflows and habits, and we've needed to build out a robust analytics pipeline for it. So in this talk I'll share some of the technical challenges that we faced along the way. Personally, I have a hard time following super abstract talks, so I thought it'd be nice, and fun, to learn through building something meaningful.
So let's define the problem and the scope. What data should we pull first? Arguably, one of the most interesting aspects of GitHub is code collaboration, where all the learnings and drama happen. So here's a familiar story: we write some code, we send it off for review, it lands in the reviewer's inbox, and after some time they get to it. If we're lucky, they give us meaningful feedback, plus bonus nits, and we get the review back. Oh boy.
We're excited to make all these changes and grow as engineers, and maybe we go through a few more rounds of commits, re-reviews, and more changes, and eventually it ends up like that, right? But in all seriousness, pull request history tells a fascinating story, and we can learn a lot from the timeline events, the people involved, and how quickly concerns were raised and resolved.
This leads to fascinating insights. Now, I'm not saying that you're going to be able to infer all of this magically, but you can at least start looking through some data, and then you can detect potential triggers that lead to a high-stress working environment, or even get deeper insights, like what the best time of day to ask for a review is, or learn that large PRs are three times less likely to be approved within the first 24 hours. With this you can encourage best practices and healthy habits for your team.
So let's look at our data shape. At the root there's our organization and some repos, and each repository has pull requests; y'all know this. Each pull request has lists of associated entities, such as timeline events, reviews, commits, etc. These are the ones we thought were interesting.
So this is a good match for GraphQL, right? It's in the shape of a graph. We're at a talk about GraphQL, so naturally you're all here to learn about it, but I still want to take some time and point out the benefits of using GraphQL. For one, we can pull much more data with fewer round trips, so the entire process is more efficient.
On top of this, the GitHub GraphQL API is one of the better GraphQL APIs out there, and they've really tried, in my opinion, to provide a good developer experience. That in itself is a good reason to try it out. But if you're not really into any of these benefits, then this talk is probably not very useful for you. I don't know, maybe you're here for the doodles; I'll be okay with that.
Okay, so we've got our data shape. Let's look at the schema next. The schema describes our graph nodes and their relationships to each other, and the good news is GitHub's GraphQL schema is structured exactly like we want. So can we just fetch all the data? It's not that simple. According to our schema, if we want to pull all the pull requests for the org, we first want to pull all the repos and then all the associated PRs.
So a very simple query would look like this.
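The slides aren't captured in the transcript, but a query consistent with this description would be shaped roughly like the sketch below, with "my-org" standing in for the organization login:

    query {
      organization(login: "my-org") {
        repositories(first: 100) {
          nodes {
            name
            pullRequests(first: 100) {
              nodes {
                number
                title
              }
            }
          }
        }
      }
    }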
Notice we specify the organization by name, then the repositories with the argument first: 100, and this is our page size upper limit, which specifies fetching only the first 100 repos; that's fine for some orgs. We also request the first 100 PRs for each repo, and the repos that have more than 100 PRs are out of luck for now.
So, coming back to our query, there's an obvious problem with this: we're not fetching all the data. We can get the first 100, but the volume of data that we want may be much larger. How large? Well, it's not uncommon to have thousands of PRs in one repo, or more, and for analytics we need to fetch everything to get that complete picture. How do we tackle this?
You probably all know that APIs and databases have solved this by allowing us to paginate, and there are various types of pagination supported by GraphQL, but the most flexible one is cursor-based pagination. So here's a quick refresher. Let's represent the pull requests we want to fetch with these squares and number them for convenience. Now let's line them up.
As I mentioned previously, the max page size is a hundred, but for the purposes of illustration let's just pretend we're going to pull three at a time, arbitrarily choosing a page size of three. So these will be the first three pull requests that we fetch. Taking a look at our updated query, we just update that number to three, and also notice that we pull our specific repo by name to simplify the query.
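The updated query would presumably look something like this; "my-org" and "my-repo" are placeholders:

    query {
      repository(owner: "my-org", name: "my-repo") {
        pullRequests(first: 3) {
          nodes {
            number
            title
          }
        }
      }
    }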
Our return would look like this, but we still want a cursor. To get the cursor as part of our return, we need to update our query to pull an object called pageInfo.
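Likely along these lines, requesting endCursor inside pageInfo:

    query {
      repository(owner: "my-org", name: "my-repo") {
        pullRequests(first: 3) {
          nodes {
            number
            title
          }
          pageInfo {
            endCursor
          }
        }
      }
    }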
That's it right there. Now we get a cursor along with the pull requests in our result set. Nice. For context, the results would look like this.
Our cursor is just an opaque pointer. It's called endCursor because it points to the last item in the data set we just fetched, so that's what it looks like now: it's pointing at PR number three in this case. For the second page we just repeat. Simple enough: the page size is still three, we provide the cursor that we got from page one, and we use the after parameter, passing it the cursor.
Here's how our query looks now.
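A sketch with the cursor as a query variable; the actual cursor string from page one gets passed in the variables payload alongside the query:

    query ($cursor: String) {
      repository(owner: "my-org", name: "my-repo") {
        pullRequests(first: 3, after: $cursor) {
          nodes {
            number
            title
          }
          pageInfo {
            endCursor
          }
        }
      }
    }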
Note the highlighted dollar sign: it denotes the cursor variable in the query, which has to be passed into the query separately. How you do that will depend entirely on your language and library of choice, but that's what it looks like for us. As a result, this query would fetch PRs four, five, and six, plus the new cursor, which now points to PR six.
Finally, we can fetch our last two PRs by repeating this process, like this. Notice that we're using our cursor from step two, right there. So it's pretty straightforward, this whole path, and now our cursor points at PR eight. This is our result. But how do we know we're at the end? Because we don't want to just keep fetching forever; how do we know we need to stop? So in our return, the list of PRs is contained within the nodes object.
And the cursor is actually living in pageInfo, like we already know, under the full name endCursor. pageInfo also has another property called hasNextPage, which behaves exactly as it sounds: it's true when there are more pages and we can keep fetching, and false when we've fetched everything. Straightforward. The updated final query would look like this.
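A sketch of the full pagination query; the idea is to loop, feeding each endCursor back in as $cursor until hasNextPage comes back false:

    query ($cursor: String) {
      repository(owner: "my-org", name: "my-repo") {
        pullRequests(first: 3, after: $cursor) {
          nodes {
            number
            title
          }
          pageInfo {
            endCursor
            hasNextPage
          }
        }
      }
    }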
Right, now we can paginate with cursors, yay! So I know what you're thinking. You're probably like, "Ria..."
So hold on, yes, it sounds absurdly simple so far. But what makes it hard? Why is this the basis of a talk at all? Well, for one, pagination alone is not enough to pull all the data we need. Pagination is a big part, but there are pitfalls with nested queries, and GraphQL is all about nested queries. We can't just paginate on any query we want. Why? Well, meet this little guy from earlier. You may have noticed him terrorizing my engineers in the opening slides; he's a gotchasaur.
This little monster lives in every codebase, architecture, and complex system, in the nooks and crannies, ready to jump out and delight you with all the gotchas he can find. That's not terrible; he's actually kind of cute to me. But what can surprise you is that these gotchas can turn into much bigger blockers which can't be resolved without some serious consideration. So let's take a look at the gotchas that are bigger than they seem. One of the biggest strengths of GraphQL also presents one of the biggest technical challenges.
That strength is nesting, and the first gotcha is the node limit. Regardless of how many nodes actually exist on each layer, the node limit looks at the projected total, so a hundred repos times 100 PRs gives us 10,000 total nodes. What if we decide to fetch comments with our pull requests? My calculation yields 1 million total nodes: 100 repos times 100 PRs times 100 comments.
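For reference, a query of that shape might look like the sketch below; the projected total is what the node limit checks, regardless of how much data actually comes back:

    query {
      organization(login: "my-org") {
        repositories(first: 100) {          # 100 nodes
          nodes {
            pullRequests(first: 100) {      # 100 x 100 = 10,000 nodes
              nodes {
                comments(first: 100) {      # 100 x 100 x 100 = 1,000,000 nodes
                  nodes {
                    body
                  }
                }
              }
            }
          }
        }
      }
    }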
That's a lot. GitHub's node limit is 500,000, which is already very generous, and if we run our query we'll see the following error. Okay, and the good part is that GitHub is very explicit about what went wrong.
So how do we keep an eye on cost? There's a special type that we can add at the top level, called rateLimit.
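A sketch of attaching it to a query:

    query {
      rateLimit {
        cost        # what this query costs
        remaining   # how much of the hourly allowance is left
      }
      repository(owner: "my-org", name: "my-repo") {
        pullRequests(first: 100) {
          nodes {
            number
          }
        }
      }
    }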
Here we ask for two properties, cost and remaining, and remaining is how much of our query allowance is left. Quick note: this is the primary way that GitHub imposes rate limits on GraphQL requests. Where the REST API has a limit of 5,000 requests per hour, the GraphQL API has an hourly cost limit of 5,000.
The simplest queries, in my experience, usually cost one, for comparison. So notice that I've commented out comments, while we're still pulling 10,000 nodes at max right now; this is still a lot. And how much does this cost? It's one! That's a bit unexpected; we're fetching a lot of data, right? So let's add comments back in, and go ahead and make a guess in your minds: how does the cost change? Yeah, that's a huge cost difference for one query.
At this cost you won't be able to exceed 50 requests per hour, so be wary. There are many things that can influence cost; the biggest one, in my experience, is the projected number of nested entities. In this case it's almost 500k, so use this as an indicator of potential cost, but do experiment; it will vary from case to case. We will discuss strategies for minimizing cost, but for now let's just look at the next gotcha. So, now that we satisfy the node limit and cost constraints, the final test is the actual runtime performance of the query.
It's not something that even GitHub can predict ahead of time, and sometimes a perfectly fine query that doesn't look like much will consistently time out. And the timeout is fairly aggressive: if a query takes more than 10 seconds, it will fail, unfortunately. So this query only costs 6 points, but 9 times out of 10 it would time out with this error.
In our experience, this happens when a query actually returns a lot of data. Here it's attempting to return about 30,000 lines of JSON, which is not that much, only about 12 kilobytes gzipped, but it's a lot for GitHub to compute in a single API call that's not pre-cached. Note also that cost is just an estimate of how heavy the query is for GitHub's servers, and even low-cost queries can time out. This makes designing our strategy complex, because we don't know upfront exactly which queries will time out.
So these gotchas are weak opponents on their own, but in a team fight they can be overwhelming and present quite the challenge. Let's take a look at the whole request lifecycle in context. We write our GraphQL query that pulls nested data and send it to GitHub's servers. They check that our query is below the node limit; okay, that passes. Then they compute the cost, add it to our current running cost, and check that we're still within the hourly allowance.
Now that I've convinced you this is a sufficiently challenging problem, how do we solve it? To answer this question, let's recall our data model. The entities on PRs are three levels deep; let's try to query for comments in this case. Notice we picked 40 as the upper limit for PRs just to satisfy the node limit (it could have been 100) while we limit comments to 40; it's fine, it doesn't really matter much for this example.
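A sketch of that three-level query; with these page sizes the projected total stays under the 500,000-node limit:

    query {
      organization(login: "my-org") {
        repositories(first: 100) {
          nodes {
            pullRequests(first: 40) {
              nodes {
                comments(first: 40) {   # projected: 100 x 40 x 40 = 160,000 nodes
                  nodes {
                    body
                  }
                }
              }
            }
          }
        }
      }
    }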
A
What
does
matter
is
that
this
query
Unisa
paginate
on
an
s,
identity
PRS
in
this
case,
and
it's
very
common
to
have
more
than
40
PRS
in
a
repo,
and
if
your
company
has
more
than
100
repos.
That
complicates
things,
but
even
if
we
can
handle
that
some
PRS
are
more
than
100
comments,
that's
even
more
nesting,
so
we
need
a
better
strategy.
So this query would fetch the IDs and names of the first hundred repos.
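Something like this first-pass query:

    query {
      organization(login: "my-org") {
        repositories(first: 100) {
          nodes {
            id
            name
          }
        }
      }
    }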
Now, with IDs and names, we can create a new query where we provide one ID at a time and paginate on pull requests. Simple enough, but how do we do this? Typically, GraphQL APIs have a rigid schema; for this to work, we need to find a way to construct a query that fetches repositories by ID, or some similar alternative.
This is from the API schema documentation: we can either paginate over repos and apply various filters, or we can just pull one repo at a time. The latter seems handy, so let's use that. Now we can run one query for each repository and then paginate on the nested entity, pull requests, which we learned how to do previously. This approach works, for sure, but it's kind of slow, and it doesn't fully leverage the power of the GitHub GraphQL API. Let's think about how we can optimize this further.
If we look closely at a typical organization, we notice that the number of PRs across repos is very non-uniform in distribution: it's common to have repos with thousands of PRs and repos with zero to ten. We can plan our queries better by taking into account the total number of PRs in repos, and if we can get this number without fetching all the PRs, that's great. It turns out we can, so let's modify our query. This query will fetch three repos and the total count of PRs each one has.
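Likely using the totalCount field that GraphQL connections expose, along these lines:

    query {
      organization(login: "my-org") {
        repositories(first: 3) {
          nodes {
            id
            name
            pullRequests {
              totalCount   # how many PRs exist, without fetching them
            }
          }
        }
      }
    }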
Okay, so first we run a query to get all the repo IDs and how many PRs they contain. To illustrate the result, let's use an adorable puppy. We separate out the repos that have fewer than 100 PRs and save them for batch fetching, so we don't have to fetch each one separately; this reduces the total number of requests. And if a repo has more than 100 PRs, it goes on the right, and we fetch one repo at a time, paginating on pull requests.
Then, when we're done, we can merge the results of those two strategies and end up with a lot of puppies. Straightforward enough. Let's see it with some data. Here are our repos with their counts; I've arranged them in increasing count so we can quickly sort them into two groups, and we're batching these ones into groups where the total is still less than a hundred. It doesn't really matter how we group them, as long as we keep below the max of 100.
We merge this together, and we have all of our PRs that we set out to fetch. Nice. There's still one piece to flesh out, so let's deep dive into this part: the repos that don't have a lot of PRs. We need to construct a query that pulls a subset of repos and their PRs. How do we do this, specifically? How do we batch?
We need to find a way to construct a query that fetches multiple repositories by ID. We've seen this documentation before; it does not allow us to pull data by multiple IDs or by multiple names. What if we just combine multiple repository types into one query? Okay, let's be less confusing with some examples. One approach would be to use GraphQL aliases to fetch several repositories by name.
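For example (the repo names here are made up for illustration):

    query {
      repoBackend: repository(owner: "my-org", name: "backend") {
        pullRequests(first: 100) {
          nodes {
            number
          }
        }
      }
      repoWeb: repository(owner: "my-org", name: "web") {
        pullRequests(first: 100) {
          nodes {
            number
          }
        }
      }
    }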
Notice these repoBackend and repoWeb labels: they're aliases, which are just a way to name your data to avoid conflicts in the resulting data set.
This would work, but it's a lot more hands-on. We would have to figure out alias names and how to stitch this together, and it will either result in a heavier query or require us to use partials so we're not duplicating our subqueries for each repo. Most importantly, this approach only works because there happens to be a type that allows us to fetch a repo by name; what about all the other entities we might need to pull? Like this.
Enter the top-level nodes field. This allows us to query any item by ID; it's like a global key-value store. So, for example, consider this query.
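A sketch; the IDs come in through a variable:

    query ($ids: [ID!]!) {
      nodes(ids: $ids) {
        id
        __typename
      }
    }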
Notice the top-level field that accepts an array of IDs and returns the given entities as the generic Node type. But because it's a generic type, we can't access any of the Repository type's properties; we can only get the id and __typename fields. So in order to solve this, we need to use inline fragments, which are like a type cast for GraphQL, and we can now rewrite our query accordingly. That's what the syntax looks like.
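A sketch of the rewritten query with an inline fragment:

    query ($ids: [ID!]!) {
      nodes(ids: $ids) {
        id
        __typename
        ... on Repository {   # "cast" each node to Repository
          name
          pullRequests(first: 100) {
            nodes {
              number
              title
            }
          }
        }
      }
    }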
This lets us access properties on a specific type, and it's a very useful technique; we can use it to fetch pretty much anything, and we can just keep adding IDs to the list. Simple and easy. So when we batch, we'll just feed all the IDs from each batch into this parameter and fetch all those entities at once. Now you might be thinking: hey, this nodes thing looks like cheating.
Can I just provide a million IDs and fetch everything in one query? You are right: normal pagination rules don't apply to this type, and it's not bound by the upper limit of a hundred items. But in our experience it's easy to construct queries this way that lead to timeouts, and it really depends on how heavy the objects are. In our experience, a reasonable limit is between 30 and 80 IDs at once; for pull requests with a lot of nested properties, fetching 50 is a good middle ground.
That should provide some context for your experimentation. All right, so this all sounds good, but let's make sure we've covered and accounted for all the gotchas that we discussed before, these guys. The nested data problem is solved by the double pass, by definition, although depending on what you're hoping to do you might need a triple pass, or a quadruple pass, more passes. The node limit is simple: we just make sure we don't exceed it during query composition. We also discussed optimizing for cost reduction, so let's make sure we handle that.
Let's consider this query to fetch reviews and comments for every pull request. If we run it as is, the cost is going to be the following, which is a lot. We can cheat a little, though. What do we know about this data if we think about the query? Well, it's very rare for a pull request to have more than 100 reviews, or more than 40 comments per review; intuitively, that just seems abnormal. So I would say it's safe to decrease those page sizes.
Let's assume there are no more than 20 reviews, for starters, run our updated query, and look at that: the cost went down by a factor of five. Optimizations like this require knowing your data, of course, plus intuition and experiments, but they can make a drastic difference, so definitely make sure you play around with this.
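The reduced query might look roughly like this; the exact page sizes on the slides aren't captured in the transcript, so treat these numbers as illustrative:

    query {
      repository(owner: "my-org", name: "my-repo") {
        pullRequests(first: 50) {
          nodes {
            reviews(first: 20) {        # was 100; cutting this drove the cost down
              nodes {
                comments(first: 40) {
                  nodes {
                    body
                  }
                }
              }
            }
          }
        }
      }
    }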
We at Toast analyzed data from a bunch of companies and picked sensible defaults, so if you're planning to use our library, that's all covered there. Lastly, we have timeouts. This is purely trial and error from running queries in production, and what we learned is that the easiest strategy is to reduce page sizes, at the cost of making more requests.
We can reduce page sizes from, say, a hundred to 50 and double the number of requests, but in general 50 seems like a good default to set as a pagination limit for heavy queries. And the last bit I want to mention is schema previews. The GitHub GraphQL API is constantly evolving, and new features become available often. For example, things like "is my pull request a draft?" and "are my CI checks passing?" are still in preview mode, so to access these you need to execute your queries while sending HTTP Accept headers.
It would look something like this.
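The slide itself isn't in the transcript; assuming GitHub's published preview names from around the time of this talk, the draft pull request preview was enabled with an Accept header like the following:

    Accept: application/vnd.github.shadow-cat-preview+json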
This is the header I use for draft pull requests. And cool, that's it! That was a lot of stuff, a lot of learnings that we just went through, so give yourself a pat on the back if you're still with me. High five, and go build your own GitHub data fetcher. Here's the link to the open source library; it's a work in progress and currently contains starter code.
You can use it as a one-off script to pull data for your org and get it as a JSON file. We'll be adding more to it as we build more, so, you know, feel free to send in some pull requests. And if you're like, "Ria, where's the advanced stuff?" and this talk is apparently still way too baby-level for you, please, please, please, PLEASE say hi to me afterwards; I'd love to get your expertise on stuff that we haven't solved yet. Thanks, and you can send me homework reading on Twitter. Yeah.