Description
Presented by Tobias Gabriel and Nikolas Krätzschmar, SAP.
GitHub Satellite: A community connected by code
On May 6th, we threw a free virtual event featuring developers working together on the world’s software, announcements from the GitHub team, and inspiring performances by artists who code.
More information: https://githubsatellite.com
Schedule: https://githubsatellite.com/schedule/
A
So, what we have coming up: credential access at large-scale organizations, and we're gonna have two amazing speakers joining us: Tobias Gabriel, developer at SAP. Yeah, you know me: I can't wait for that AI rap battle. Sorry, it's been on my mind all day: 12:40 Pacific, gonna watch it. And Nikolas Krätzschmar, who's a student SAP developer, and they're gonna tell you all about what they've been learning and doing, and how to make Git and GitHub more effective for people that are learning. So take it away, Tobias and Nikolas.
B
Hi, and welcome from my side as well. Today we want to give you a quick overview, in the next 20 minutes, of what we learned from our adventures in credential mitigation at SAP and what things we did there, as this was and still is a topic in motion for us. We will not go into too much detail about the actual findings, but rather focus on the tools and processes we used for that. That being said, let me quickly introduce ourselves.
B
My name is Tobias and I'm a developer in the tools team at SAP, where my main focus is on everything around GitHub: from administrating our internal GitHub servers, over doing Git and GitHub trainings, to working on cross topics like this one. I'm joined today by Niko, who is a master's student in our team and did most of the technical implementation of the scanner; he'll also go into detail about that later.
B
We both work for SAP, which is one of the largest enterprise software vendors in the world and has over 30,000 developers. With that many developers, we also have a quite big codebase: currently we have over 250,000 repositories hosted on our main GitHub server, amounting to roughly five terabytes of compressed source code. With a push towards inner source, meaning that we would like to open up more of these repositories so that every colleague can see them and reuse code, we wanted to make sure that no credentials get leaked there by accident.
B
It
happens
and
probably
already
happened
every
one
of
you
myself,
including,
but
you
by
accident,
commits
on
fire,
push
it
up
and
then
never
for
never
come
back
to
clean
that
up.
So
you
have
some
password
or
some
credentials
leaked
via
and
by
limb
private
repositories,
but
not
critical.
When
opening
up
were
two
organizations
or
whole
enterprises,
we
wanted
to
make
sure
to
reduce
the
risk
bear
as
much
as
possible.
B
However, with that many repositories, our challenge was to first figure out how big of an issue this actually is and how many credentials there really are in our source code. To focus on that, I'm now handing over to Niko, who will go into detail about how we implemented it, what things we had to keep in mind, and what considerations we needed to take from there. Niko, take it away.
C
Thanks, Tobias. I'm Nikolas; welcome everyone from my side as well. OK, first, to define what we were actually scanning for: we opted to limit our search to static patterns that can be easily identified by regular expressions, as this appeared to be sufficient for most types of authentication tokens. For example, all AWS keys always begin with the same leading character sequence, or take Google Cloud certificates.
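As an illustration, patterns of this kind can be expressed as a small rule set. The sketch below is generic, not SAP's production rules: the `AKIA` prefix for AWS access key IDs is well documented, while the other two patterns are simplified assumptions.

```python
import re

# Simplified credential patterns. The AKIA prefix for AWS access key IDs
# is well documented; the other shapes are illustrative approximations.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "slack_webhook": re.compile(
        r"https://hooks\.slack\.com/services/T\w+/B\w+/\w+"
    ),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan(text):
    """Return (pattern_name, matched_string) pairs found in a piece of text."""
    return [(name, m.group(0))
            for name, rx in PATTERNS.items()
            for m in rx.finditer(text)]
```

Static prefixes like these are what make a pure-regex approach viable at all: the scanner never has to understand the file, only to recognize the token shape.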
C
Then
there
is
a
pretty
nice
tool
called
get
leaks,
which
uses
a
slightly
more
sophisticated
approach
to
identify
secrets
than
just
plain
regular
expressions
by
also
employing
entropy
measurements
and
now,
while
both
of
these
tools
are
great,
and
we
would
in
fact
highly
encourage
anyone
facing
a
similar
problem
to
give
them
a
try
first,
they
were
simply
not
performant
enough
at
our
scale.
So
instead
we
decided
to
implement
our
own
solution,
drawing
inspiration
from
those
existing
tools,
but
with
a
more
performance,
focused
approach.
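The entropy measurement used by gitleaks-style tools can be sketched as a Shannon entropy score over a candidate string: random-looking tokens score high, while repetitive or natural-language strings score low. This is a generic illustration, not gitleaks' exact implementation, and the threshold value is an assumption.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Bits of entropy per character of the string s."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_random(s, threshold=4.0):
    """Heuristic: flag strings whose per-character entropy is high.
    The threshold is illustrative and would need tuning in practice."""
    return shannon_entropy(s) >= threshold
```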
C
However, this comes at the cost of producing more output, meaning a substantial amount of additional data to be scanned in the next step. Therefore, whichever of these two options is preferable will highly depend on the regex scanner's throughput capabilities when performing the actual pattern matching. We looked at various regex tools out there, and during some initial research we quickly discovered that standard grep was just not going to be up to the task, primarily due to its lack of multi-line pattern support.

So instead we looked at pcregrep as a mostly comparable alternative that does indeed support multi-line mode, as well as some more complex patterns. With that, we were able to scan all blobs from the hundred test repositories in just over a hundred seconds. In this process we also got some support from GitHub's professional services team, who pointed us towards using Intel's Hyperscan library. It's a high-performance regular expression matching library that works by precompiling patterns and tuning them to a specific CPU's microarchitecture, using vector instructions and some other magic optimizations.
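The multi-line case that rules out standard grep is, for example, a PEM private-key block spanning many lines. A minimal sketch of such a pattern (illustrative, not the speakers' production rule):

```python
import re

# A PEM private key spans multiple lines, so the pattern must match
# across newlines; re.DOTALL makes "." also match newline characters.
PEM_KEY = re.compile(
    r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"
    r".*?"
    r"-----END (?:RSA |EC )?PRIVATE KEY-----",
    re.DOTALL,
)

def find_pem_keys(blob_text):
    """Return every PEM private-key block found in a blob's text."""
    return PEM_KEY.findall(blob_text)
```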
C
Now, looking back at the two options discussed earlier for how to extract the repositories' content, it becomes clear that the scanner throughput is not the limiting factor. Therefore, the option of just outputting all objects, that is, the git cat-file based approach, should be preferable, because the slightly longer time needed to perform pattern matching on the additional data is more than made up for by the time saved by not computing differences between files. Putting this all together, it takes 40 seconds to both extract and scan the contents of all hundred repositories.
C
Okay, with all this in place, we decided to run this on our entire codebase of over 250,000 repositories. At first we considered cloning them to a separate machine to perform the scanning there, like we did for the test repositories, but, as you can imagine, that quickly ran into a couple of problems: primarily how to get access to all the repositories, including the private ones. But more importantly, this approach would essentially be equivalent to spamming our own GitHub instance, and it would also just take too long.
C
With the setup running on 128 worker threads, we were able to perform a full scan of the entire five terabytes of compressed repository data in just four hours, and that left us with a list of findings, each potentially being a leaked secret. Now, with this, I'm handing back to Tobias to talk to you about what we did with those findings, how we post-processed them, and also about some of the non-technical observations we made while rolling out this tool.
B
Yeah, thank you very much, Niko. With the scanner now in place, we actually run the scan on a daily basis, so every day we get a list of potential secrets which match our given patterns. Probably to no one's surprise, we found quite a few more than one; actually so many that we didn't want to manually follow a mitigation process or send around some Excel sheets or things like that.
B
So we now needed to take our findings, our matched patterns, and go into more detail about them: what they are, whether they are indeed valid, and things like that. To show you one example of what we did: Slack webhooks look like this and already contain a secret value at the end, which you can use to post to a specific channel. So you only need this URL and can make an HTTP request, and the message gets sent to the channel without any further authentication.
B
So this is a credential, and you probably don't want it accessible to everyone. While it is not the end of the world if somebody has it, they can still spam your channel, and you probably don't want that. But if you just look at these URLs, say one is valid and the other one is invalid, you can't see which one is actually still valid and should probably be mitigated.
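Validity can only be established by trying the credential out. Here is a sketch of such a check for Slack-style webhooks; the response codes and bodies assumed here (200 with `ok` for a live hook, 404 for a revoked one) are illustrative assumptions, not a documented contract, so treat the classification as a starting point.

```python
import json
import urllib.request
import urllib.error

def classify_response(status, body):
    """Interpret a webhook probe response (illustrative mapping)."""
    if status == 200 and body.strip() == "ok":
        return "valid"
    if status == 404:
        return "revoked"
    return "unknown"

def probe_webhook(url):
    """POST a harmless test message to the hook and classify the result.
    Note this posts a visible message to the channel if the hook is live."""
    data = json.dumps({"text": "credential scan test"}).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req) as resp:
            return classify_response(resp.status, resp.read().decode())
    except urllib.error.HTTPError as e:
        return classify_response(e.code, e.read().decode())
```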
B
The same process of verification can also be applied to what we call service credentials, which are credentials for accounts from a central service: cloud accounts, GCP accounts or AWS accounts, or even the Bitcoin keys Niko mentioned earlier. Because Bitcoin keys are Base58 encoded, you can match for them, and if you receive a list of candidate strings, you can just try them out to see if they are valid. We actually found a single Bitcoin key in our codebase. Unfortunately, the corresponding Bitcoin wallet was already empty.
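Checking whether a Base58 candidate is even well-formed can be done entirely offline via the Base58Check checksum, which is the first four bytes of a double SHA-256 over the payload. A self-contained sketch, independent of the speakers' actual tooling:

```python
import hashlib

# The Bitcoin Base58 alphabet: no 0, O, I, or l.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58check_encode(payload: bytes) -> str:
    """Append a 4-byte double-SHA256 checksum and encode in Base58."""
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    raw = payload + checksum
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Leading zero bytes are represented as leading '1' characters.
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + out

def b58check_decode(s: str):
    """Return the payload if the checksum is valid, else None."""
    try:
        n = 0
        for ch in s:
            n = n * 58 + ALPHABET.index(ch)
    except ValueError:
        return None  # character outside the Base58 alphabet
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    raw = b"\x00" * (len(s) - len(s.lstrip("1"))) + raw
    payload, checksum = raw[:-4], raw[-4:]
    good = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return payload if checksum == good else None
```

Candidates whose checksum fails can be discarded immediately; only well-formed keys need an (online) balance check.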
B
After having this list of verified credentials, the next step would be to start a mitigation process. To quickly summarize what we have now: we have a scanning process in place which takes the repository data, scans it on a regular basis, and then tries to validate the findings, if they are service credentials, against the central service.
B
However, with that many findings and so many development teams, we didn't want to create Excel lists or manually send emails around, but rather opted to implement a full service for that. We call it the audit service; it takes in these findings and then notifies the responsible service owners, meaning, if we can identify them, cloud account owners, Slack owners, and things like that. In the case of more generic secrets, like private keys or even passwords, we opted to notify the responsible repository owners, so they can review the findings and decide if they want to mitigate them.
B
There is even token scanning, which GitHub announced earlier today; we think that it can replace some parts of our scanner, so we can focus more on the specific things like our audit service. We actually noticed, when we started rolling out the audit service to our development teams, that we received quite a bit of feedback on what things were good and what things were bad, and so I want to give you a bit more insight into the things we found to be important considerations. The most important thing we noticed is that you need to try to have as few false positives as possible. Probably everybody has already had a security scanner that sent out like 300 messages of which only two were valid, and this ends up with development teams ignoring the messages. So of the highest importance to us was to ensure accuracy that is as high as possible.
B
What we also noticed is that a lot of credentials are in dependency folders, like vendor for Go or node_modules for Node.js, and are probably imported from other sources like github.com. So we opted to exclude them: if they are valid credentials, they should probably already have been mitigated at the source, and we didn't want to report them again. The second thing we noticed when sending out notifications to our development colleagues was that the first question we received was: yeah, and what should we do now?
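The dependency-folder exclusion can be sketched as a simple path filter. The folder names "vendor" (Go) and "node_modules" (Node.js) come from the talk; the rest are common additions assumed for illustration:

```python
from pathlib import PurePosixPath

# "vendor" and "node_modules" are mentioned in the talk; the other
# entries are assumed examples of vendored-dependency folders.
VENDORED_DIRS = {"vendor", "node_modules", "bower_components", ".yarn"}

def is_vendored(path: str) -> bool:
    """True if any component of the path is a known dependency folder."""
    return any(part in VENDORED_DIRS for part in PurePosixPath(path).parts)

def filter_findings(paths):
    """Drop findings located inside vendored dependency folders."""
    return [p for p in paths if not is_vendored(p)]
```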
B
Additionally, it is important to have somebody as a contact whom teams can reach out to in case they have any questions, or if they notice abuse of some accounts, so that this can be escalated and properly handled and doesn't end up in a void. And the last thing we implemented, mainly to track progress on our side, was an automatic revalidation of all the potential credentials we matched, every day.
B
So every day we have a list of credentials of which we know whether they have been mitigated or not. This also takes the burden off the responsible development colleagues of manually flagging findings as mitigated or not, and we don't bug teams about credentials which are no longer valid. That said, we can also make sure that, if credentials don't get rotated or mitigated in time, we can either take a look at them ourselves or escalate as necessary. And with that in mind, this concludes our presentation.
D
Now, that was the push to inner source, and it was really good to hear about how you do it in real life. We've got a session coming up (gosh, it's very early in the morning our time here in Europe): Danese Cooper is coming on later today. She's obviously a master of inner source, and she'll be talking about how inner source is going to be good for free and open source sustainability. But what was really fantastic was to kind of see it done in practice.
A
I'm gonna tell you, finding that Bitcoin key: I'm just bummed that y'all aren't crypto rich. You should have been crypto rich, standing up there crypto rich right now with everybody watching. So, how excited did y'all get when you found that? Oh, this is my own question, not from the audience. I would have been ecstatic.
B
Actually, when we found that, we first found rather many patterns which matched, so I was surprised how many there were. Then we implemented some further verification to see how many of these were actually valid matches or just false positives, and after starting that and applying it, in like the first hundred matches we saw no positive. I was already pumped, and then, at the very last point, we still found one, and then I also copy-pasted it over to check if there was anything in the wallet, and yeah.
B
Yeah, so currently we don't have concrete plans to open source it, mainly for the reason that we have a lot of very specific hacks in there which apply to our specific use case and our internal services. But if you are interested in some of the details, I think we can write up some of the more generic things. The core implementation of the scanning is rather straightforward, and I think we can see if we can provide something there, but I'll take a look at that in detail.
D
And then we've got another question here from K robots. They were asking: is it possible to have these scans run before users are able to make pull requests? So is there any way, you know, like a pre-commit hook, that sort of thing, to stop them getting into a branch? Or what do you do if they're in the pull request; you know, aren't they then in the history?
B
Yeah, so the first thing, and I already planned to include that in my presentation but actually missed it: scanning for credentials in repositories is only the second step. The first step is that you would like to prevent them getting there at all, so you don't need to trigger a mitigation process, because as soon as credentials are on GitHub, you should probably rotate the credentials and not only remove them from the history. That being said, you can for example use the gitleaks utility or, if I remember correctly, similar tools as a pre-commit hook for that.
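A pre-commit check of the kind the question asks about can be sketched as a script that scans the staged diff before the commit is created. This is a generic illustration, not the tooling SAP uses; the patterns and hook wiring are assumptions.

```python
import re
import subprocess
import sys

# Illustrative patterns; a real hook would load a maintained rule set.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                      # AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM key header
]

def find_secrets(diff_text):
    """Return secret-looking strings on added lines of a unified diff."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" is the file header, not content.
        if line.startswith("+") and not line.startswith("+++"):
            for rx in SECRET_PATTERNS:
                hits.extend(rx.findall(line))
    return hits

def main():
    # Installed as .git/hooks/pre-commit, a non-zero exit blocks the commit.
    diff = subprocess.run(["git", "diff", "--cached"],
                          capture_output=True, text=True).stdout
    hits = find_secrets(diff)
    if hits:
        print("potential secrets staged, aborting commit:", hits)
        sys.exit(1)

if __name__ == "__main__":
    main()
```

Since only the staged diff is scanned, the check stays fast; the server-side scan then remains the safety net for anything that slips through.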
D
That's really interesting! So do you have any (this is a question from me now, not from the audience; sorry, I'll get back to you folks in a minute) tooling that you use for developers to kind of set up their local Git environments: getting their email addresses set properly, any pre-commit hooks, anything like that sort of thing set up properly in their local environment?
B
So we don't have a full script to run for that, as most people are already rather familiar with Git and how to set it up. But we have a very handy quick start guide which says which email address you should set and how to set it, so that development colleagues can quickly get ramped up on Git. The great thing about Git is that most people are actually already rather familiar with it and don't need a long onboarding trail, so the internal documentation we have around it is rather limited, and we rely quite a lot on the open knowledge which is already available publicly. That's also a great thing about Git: everything you can find online, and you don't have to have some paid tools around it. Yes.
D
Fantastic, yeah. So, I worked over at Microsoft a while ago, and I helped out when the Windows team were adopting Git, and one of the things that they had was some posters around, you know, the five stages of Git as they were coming up to speed, instead of the five stages of grief. So it's quite interesting sort of seeing the training that different people need as you migrate them, and it's brilliant that your community were just able to pick it up and run. So that's fantastic!
C
Really nice. I like both, but learning to actually program properly the way you do at work is usually far more advanced than what you do at university. Also, I think I probably learned more while working than at university, actually, because if you have a problem and then you're researching information for it, you directly know how to apply the new knowledge, so it just sticks better, I think.
A
You know, I just have such a sweet spot in my heart for students and learners, and, you know, no matter if you're going to university or taking an apprenticeship or an internship (you know I love our GitHub interns), getting real-world experience is so awesome and important: getting to do real, cool stuff in real life. I mean, for me, when I think about it, it's been like 20 years since I've actually, you know, done some studying at university.
C
Actually, the scanning part that we implemented is not GitHub-specific; it just uses plain calls to Git, so it can be run on anything. But some of the metadata extraction parts, which we use to find who the repository owner is and that kind of stuff, are GitHub Enterprise specific, though you could adapt them for reuse on github.com.