Description: An introduction to the GitHub Import tool by George Koltsov.
Hello and welcome to this GitHub importer overview. This is a GitLab feature that allows you to migrate projects from GitHub to GitLab. You can find more details under these URLs: the user guide, as well as the developer documentation.
What it does: like I said, it migrates GitHub repositories to GitLab projects, and it migrates this list of data. Mostly it's, you know, pull requests, issues, labels, milestones, stuff like that, and comments and LFS objects.
Yes, it's highly asynchronous, and I will show more detail, but when we execute this importer, the importer overall happens in different stages; it imports data in stages. So the first stage is the repository. This is the first step for many of our importers; that's the step we need in order to migrate merge requests, so it's pretty central to all of our importers.
So, as you can see here, the parallel importer enqueues the job for the import repository worker, which does the import. Then the next stage is base data, and base data is essentially labels, milestones and releases; it executes every one of those importers within this single job, and then it enqueues the pull requests worker.
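As a rough sketch of that chain (class names here are simplified stand-ins for the real workers under `Gitlab::GithubImport::Stage`, and the bodies are illustrative):

```ruby
require 'sidekiq'

# Simplified sketch of the stage chain described above. The real workers live
# under Gitlab::GithubImport::Stage and do considerably more.
class ImportPullRequestsWorker
  include Sidekiq::Worker

  def perform(project_id); end # the parallel pull request stage, covered below
end

class ImportBaseDataWorker
  include Sidekiq::Worker

  def perform(project_id)
    # Labels, milestones and releases are small collections, so each of those
    # importers runs inline, one after another, inside this single job...
    # ...and only then is the next stage scheduled.
    ImportPullRequestsWorker.perform_async(project_id)
  end
end

class ImportRepositoryWorker
  include Sidekiq::Worker

  def perform(project_id)
    # Clone the GitHub repository into the GitLab project (omitted here),
    # then hand over to the next stage.
    ImportBaseDataWorker.perform_async(project_id)
  end
end
```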
So that's the next stage, and after that, for all of these, we have an advance stage worker: that's the worker that is used in order to wait for a number of jobs to complete before we go on. Prior to the advance stage worker, all of these workers would, you know, as the last step, enqueue the next worker; but in the case of the advance stage worker, that's no longer the case, because of the way we import data. And here's the list of stages that we go through.
Yeah, it's not necessarily that all the jobs have been completed, but it waits for jobs to be completed, and then it re-enqueues itself. So the main collection worker executes an importer. Okay.
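Condensed into code, that waiting loop looks something like this (a sketch of the pattern; the real class is `Gitlab::GithubImport::AdvanceStageWorker` and its exact signature and intervals differ):

```ruby
require 'sidekiq'

class AdvanceStageWorker
  include Sidekiq::Worker

  # Maps stage names to the worker that starts the next stage (illustrative).
  STAGES = {
    'pull_requests' => ImportPullRequestsWorker
  }.freeze

  # waiters: { job_waiter_key => number_of_jobs_still_expected }
  def perform(project_id, waiters, next_stage)
    remaining = wait_for_jobs(waiters)

    if remaining.empty?
      STAGES.fetch(next_stage).perform_async(project_id)
    else
      # Some jobs have not reported back yet: re-enqueue ourselves and check
      # again later instead of blocking a Sidekiq thread for a long time.
      self.class.perform_in(30, project_id, remaining, next_stage)
    end
  end

  private

  def wait_for_jobs(waiters)
    waiters.each_with_object({}) do |(key, count), still_pending|
      waiter = Gitlab::JobWaiter.new(count, key)
      waiter.wait(10) # drains finished job IDs from the Redis list
      still_pending[key] = waiter.jobs_remaining if waiter.jobs_remaining > 0
    end
  end
end
```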
Every importer has this module, ParallelScheduling. All of the importers across the entire GitHub importer follow a similar pattern, which utilizes inheritance heavily, as well as defining these kinds of classes. So there's a number of methods that you need to define when you include the ParallelScheduling module. What ParallelScheduling does is fetch a resource collection endpoint from GitHub page by page and, for every object to import, use a separate job.
We take that and we enqueue the Sidekiq worker for each individual object, whatever it may be: issue, note, pull request, etc. Right, and then we create a waiter for it, and then we return it; I will cover the waiter later on.
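Here is that flow in miniature (a condensed sketch of what `Gitlab::GithubImport::ParallelScheduling` does; the paging helper and its options are approximations of the real code):

```ruby
module ParallelScheduling
  # Pages through the GitHub collection endpoint and schedules one Sidekiq
  # job per object, all tracked under a single JobWaiter key.
  def parallel_import
    waiter = Gitlab::JobWaiter.new

    each_object_to_import do |object|
      sidekiq_worker_class.perform_async(
        project.id,
        representation_class.from_api_response(object).to_hash,
        waiter.key # each job notifies this key when it finishes
      )
      waiter.jobs_remaining += 1
    end

    waiter # returned so the stage worker can hand it to AdvanceStageWorker
  end

  def each_object_to_import
    page = 1

    loop do
      batch = client.public_send(
        collection_method, project.import_source, page: page, per_page: 100
      )
      break if batch.empty?

      batch.each { |object| yield object }
      page += 1
    end
  end
end
```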
Actually, let me cover it right now: the JobWaiter is used to keep track of state.
If you've seen my previous recordings on import/export, for example, there we use a database record to keep track of import state, but here we use Redis state to keep track of all these jobs. Because, if you imagine, if you have a project with 10,000 pull requests, each pull request is a separate Sidekiq job; and if you had a separate database record for every single pull request to keep track of its state, that would be a lot of extra load on the database.
So yeah, here is Gitlab::JobWaiter; it utilizes Redis for this.
And that's how it keeps track of the progress that's been made. You know, the advance stage worker checks this list by its key: if it sees it being empty, then that indicates to the worker that the job is done; if there is something in the list, then it knows that it's not done yet.
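In usage terms the contract is small. A sketch of how `Gitlab::JobWaiter` is used, as I understand it (`ImportIssueWorker` here is just an illustrative consumer):

```ruby
waiter = Gitlab::JobWaiter.new # generates a unique Redis key

# Producer side: schedule jobs and count them against the waiter key.
issues.each do |issue|
  ImportIssueWorker.perform_async(project.id, issue.to_hash, waiter.key)
  waiter.jobs_remaining += 1
end

# Worker side: the very last thing each job does is push its job ID onto
# the Redis list behind the key:
#   Gitlab::JobWaiter.notify(waiter_key, jid)

# Consumer side: block for up to 10 seconds, popping finished job IDs off
# the list; jobs_remaining tells you how many never reported back in time.
waiter.wait(10)
puts waiter.jobs_remaining # 0 means this batch is done
```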
Okay, and every importer, yeah, like I said, every importer class kind of looks like this, where it includes ParallelScheduling and defines the importer class. So this is the collection importer, right, but it defines the individual object importer, the representation class, the Sidekiq worker class, the collection method (this is the method that is used to fetch data from GitHub), and so on.
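So a typical collection importer reads something like this (a sketch loosely based on the issues importer; each real importer defines a few more methods, for example for caching already-imported IDs):

```ruby
module Gitlab
  module GithubImport
    module Importer
      class IssuesImporter
        include ParallelScheduling

        # Imports a single issue into the project.
        def importer_class
          IssueImporter
        end

        # Wraps the raw GitHub API payload in a plain Ruby object.
        def representation_class
          Representation::Issue
        end

        # The per-object Sidekiq job scheduled by parallel_import.
        def sidekiq_worker_class
          ImportIssueWorker
        end

        # The GitHub client method used to fetch the collection, page by page.
        def collection_method
          :issues
        end
      end
    end
  end
end
```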
You can have a look at ParallelScheduling. I mean, at first it is kind of difficult to grasp and navigate, but over time it becomes easier to locate the certain bits of logic that you're looking for when something is wrong with the import. But that's... oh no, never mind.
Bulk insert. So, unlike import/export, where for everything that we insert into the database, for everything that we save, we run ActiveRecord callbacks (except for a few exceptions, but the overall picture is that all of the callbacks run; we never skip anything, well, almost anything), in the GitHub importer it's exactly the opposite, actually. Why?
Well, primarily it's for historic reasons: that's how it was introduced. But mainly, yeah, because there might be tens of thousands of notes, and in order to relieve the database from additional pressure, we bulk insert. And here's the example of the diff note importer, where we format the note and then we use bulk insert to insert these attributes.
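That path boils down to something like this (a sketch; the attribute list is trimmed, and `bulk_insert` plus the two lookup helpers stand in for GitLab's internal equivalents):

```ruby
# Build plain attribute hashes instead of ActiveRecord objects: no
# validations, no callbacks, no one-INSERT-per-row overhead.
rows = diff_notes.map do |note|
  {
    noteable_type: 'MergeRequest',
    noteable_id: merge_request_id_for(note), # hypothetical lookup helper
    project_id: project.id,
    author_id: author_id_for(note),          # hypothetical user mapping
    note: note.note,
    created_at: note.created_at,
    updated_at: note.updated_at
  }
end

# One multi-row INSERT for the whole batch of notes.
bulk_insert(Note.table_name, rows)
```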
I guess the main difference between this and import/export is that import/export has a lot more nested associations. You know, you can imagine a merge request with notes; inside of notes you have award emojis, user references, system note metadata, events, whatever it might be. It can be a bit of a bigger package as opposed to what we fetch from the GitHub API.
Now, you may have noticed that the advance stage worker has these two stages: pull requests merged-by and pull request reviews. Like, why? Why are they needed? Why not just have them as part of, let's say, diff notes, right, or notes? And the main reason is because these are separate API endpoints coming from GitHub.
They only have individual API endpoints and not collections. So, let's say we imported 10,000 merge requests: for every merge request, we need to go back and, one by one, fetch the merged-by information, which can be quite slow. So yeah, it does not have a collection API, so we have to fetch MRs one by one, which is not efficient.
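In client terms, that one-by-one shape is roughly this (Octokit-style calls, since the importer's client wraps Octokit; the update helper is hypothetical, and I'm assuming the imported MR iid matches the GitHub PR number):

```ruby
# No collection endpoint exists for merged-by, so every imported merge
# request costs one extra API round trip.
project.merge_requests.each do |merge_request|
  pull_request = client.pull_request(project.import_source, merge_request.iid)

  # The detailed PR payload carries merged_by, which the list payload lacks.
  update_merged_by(merge_request, pull_request.merged_by) # hypothetical
end
```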
We have here, obviously, issues and diff notes combined: so we have the issues importer together with the diff notes importer in one stage.
I guess another thing worth noting is ObjectImporter; that's another module that is being used to execute the import of each individual object.
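The per-object workers are thin shells around it; approximately (a sketch of the ObjectImporter idea, not its exact method names):

```ruby
# ObjectImporter drives the actual import and reports back to the waiter.
module ObjectImporter
  def perform(project_id, hash, waiter_key)
    project = Project.find(project_id)
    object = representation_class.from_json_hash(hash)

    importer_class.new(object, project).execute # real code also passes a client
  ensure
    # Whatever happens, tell the waiter this job is accounted for.
    Gitlab::JobWaiter.notify(waiter_key, jid)
  end
end

# Each per-object worker just declares what to import and how to rebuild it.
class ImportIssueWorker
  include Sidekiq::Worker
  include ObjectImporter

  def representation_class
    Representation::Issue
  end

  def importer_class
    IssueImporter
  end
end
```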
Well, the difference is this: let me just open them side by side. If you take a look at the collection method, this one uses pull requests comments, plural, versus pull request comment, which indicates that this method of importing fetches diff note comments one by one, MR by MR. Okay: MR number one, give me all the comments; MR number two, give me all the comments.
While this endpoint returns all the comments across all of the merge requests, right? So it's convenient, because you have one endpoint, but you need to sort out which diff note belongs to which merge request yourself, and that's what we do during the import.
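With the Octokit client that the importer wraps, the two shapes look like this (repository name and PR number are just examples):

```ruby
require 'octokit'

client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])

# Collection endpoint: every review comment in the repository, in one
# paginated stream; the importer must work out which MR each one belongs to.
client.pull_requests_comments('nodejs/node')

# Single-object endpoint: review comments for one pull request at a time,
# so the mapping is implicit but the import needs one call per MR.
client.pull_request_comments('nodejs/node', 1)
```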
It will tell you this: "In order to keep the API fast for everyone, pagination is limited for this resource." Meaning that you simply cannot fetch all the available comments for this project (and this is the node.js project, by the way). So we stumbled upon this problem somewhat recently, where an import was missing comments, and after a lot of digging around, we found that the GitHub API just simply does not return
all of the comments that there are. So we had no choice but to introduce an alternative way of importing notes into the GitHub importer, and this way fetches them one by one, right, which is way slower, but that's the trade-off. If, let's say, you have a massive project, you want to import all of the comments, and the current approach does not provide all the comments, then an alternative solution is to try this way of importing. And this is behind two feature flags.
Number one is the single endpoint notes import flag, which changes the way we import notes. And the other one is the lower per-page limit, and that's another GitHub limitation, where sometimes, if you fetch a page with the default page size of 100, GitHub will return a 500 error and will not return the result. So another feature flag was added to just reduce the page size, in order to try and help with stability and try and fetch all the data there is, even though the price is speed.
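Reconstructing the flag names from the description above (they should be close to `github_importer_single_endpoint_notes_import` and `github_importer_lower_per_page_limit`, but treat both names and the wiring below as a sketch):

```ruby
# Sketch of how the two escape hatches could plug into a notes importer.
def collection_method
  if Feature.enabled?(:github_importer_single_endpoint_notes_import, project)
    :issue_comments   # per-object endpoint: slow, but returns everything
  else
    :issues_comments  # repository-wide endpoint: fast, but may be capped
  end
end

def per_page
  # A full page of 100 sometimes makes GitHub respond with a 500, so this
  # flag trades speed for stability by shrinking each page.
  Feature.enabled?(:github_importer_lower_per_page_limit, project) ? 50 : 100
end
```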