From YouTube: Q&A with Brendan Dolan-Gavitt on AI Code Suggestions
Description
This is a Q&A with Brendan Dolan-Gavitt, the author of FauxPilot, in which GitLab's Product, Engineering, and Incubation team asks questions about AI code suggestions, the future of smart code suggestions, and large-scale models.
A
All right, hi everyone. We are at a Q&A with Brendan Dolan-Gavitt — I hope I'm not butchering the name — in reference to our work with GitLab AI Assist, as well as FauxPilot and anything to do with smart, secure code suggestions. I'm going to start off by saying what Brendan would probably say in his own words about who he is: Brendan is an assistant professor in the computer science and engineering department at NYU. He can ultimately be found posting pictures of cats and making very bad jokes on Twitter at — actually, how do you say it?
B
Sure. This is just "moyix" — this is one of those things where you chose a username in high school and it's been with you ever since. So yeah, thank you very much for the introduction.
B
Sure, yeah. So I guess FauxPilot kind of came about because I had been using GitHub Copilot and thought it was very helpful in my own programming. But as researchers, my lab wanted a way — if we train our own model or fine-tune our own model for different purposes — to actually use it in something that looks like a GitHub Copilot kind of use case. At the same time, whenever I saw people discussing things like Copilot online, one of the things they were concerned about is the fact that you have to send your code up to a remote server hosted by GitHub to actually get suggestions: it sends the code that you're currently working on, and maybe code that is in other files open in your editor, so that it can get code suggestions.
B
There was a lot of interest in being able to run your own locally hosted version of it, and around — I think this was sometime in maybe January or February — Salesforce released these fairly high-quality code models that were open in the sense that you could actually download them and run them locally yourself. They had reported, at least in their own evaluation, that some of the models, like the 16-billion-parameter Python model, had comparable or better performance on standard code benchmarks like HumanEval or APPS for solving programming tasks. So yeah, that was essentially it. A lot of the research I'm doing these days focuses on: okay, here are these code models, and it seems like a lot of people are going to be using them over the next few years.
B
I think I saw a stat the other day that GitHub Copilot got 400,000 new users in its first month, which is a lot of people suddenly using AI code suggestions. So we want to do research on: how can we make sure that the suggestions it produces don't make your code worse, and ideally, can it suggest things that are more secure than what you might be inclined to write by hand — questions like that. So I guess I see it as kind of a combination: this is our research platform for running future studies.
B
We've already done one user study where we gave half of the users access to the code model and half the users wrote things by hand. We didn't use FauxPilot for that one, because we ran it before FauxPilot existed, but we're planning on doing more of those in the future, and we're going to try to use FauxPilot for them, because it gives us very fine-grained control: we know the exact model that they used, and we know what it was trained on. We can use models that were fine-tuned on particular code bases, or only trained on code that we've maybe tried to scan for security vulnerabilities, or things like that. So yeah, that was kind of the main thing. I also don't want to oversell it: it is a project that I put together over the course of maybe a month or two over the summer, when I didn't have to teach, and the main components come from other, more mature projects — NVIDIA's Triton inference server and the FasterTransformer library are what we're using to get this very nice low-latency inference. I guess that's the main thing, so yeah — hopefully that clarifies how this thing came to be.
B
Yeah, cool. So if we want, we can keep going through the questions.
A
Yeah,
we
can
come
back
into
that
as
well.
We
can
skip
to
probably
buy,
which
would
be
my
question
and
we've
spoken
about
it,
but
I
think
we
are
looking
into
building
out
of
a
POC
using
full
palette.
So
what
would
you
think
we
would
need
to
consider
in
actually
using
full
pilot
and
testing
it
with,
let's
say
even
internally,
with
our
gitlab
audience
of
developers
with
writing
in
different
languages
with
RoR,
BJs
Python
and
all
of
that
and
also
keep
in
mind
compute
storing
what?
B
Yeah, I mean, I think some of the things you've already mentioned in your question, like load balancing and batching, will probably be pretty important. Batching is a little bit tricky, because you have to have enough people using it that it makes sense to combine multiple requests: if you have to wait, say, 30 seconds for the next request to come in before you try to batch them together, then all of a sudden your latency is 30 seconds for that first user. But assuming you have a lot of requests coming in simultaneously, batching them together makes a lot of sense.
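
As a concrete sketch of that tradeoff, here is roughly what a dynamic batcher has to do: collect requests up to a batch size, but cap how long the first request waits. This is an illustrative sketch, not FauxPilot code — Triton's dynamic batcher exposes the same idea as a configurable maximum queue delay.

```python
import queue
import time

def collect_batch(requests: "queue.Queue", max_batch: int = 8,
                  max_wait_s: float = 0.01) -> list:
    """Gather up to max_batch requests, but never hold the first request
    longer than max_wait_s, bounding the latency cost of batching."""
    batch = [requests.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```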
B
As long as the batch fits in GPU memory alongside the model, that is. Let's see — other things that it would make sense to do: load balancing is definitely useful, and I think that's something you can do pretty trivially, just by running multiple servers and having either a front-end proxy that redirects to one of them, or by changing the little Flask app to redirect to one of the inference servers. Either of those should work pretty well. The models don't have to store any state — they're all kind of one-shot: you give them the context, and then they produce a suggestion. So there's nothing where, if a user was making multiple requests, they would always have to go to the same server. It's really a very nice use case for load balancing. Going beyond that, you can start to think about what can be done to make things faster and more performant, and hopefully have lower resource requirements.
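
A minimal sketch of that front-end-proxy idea, assuming two identical FauxPilot inference servers behind it — the hostnames and the OpenAI-style endpoint path here are assumptions for illustration:

```python
import itertools
import requests
from flask import Flask, Response, request

app = Flask(__name__)

# Round-robin over a pool of stateless, identical inference servers
# (hypothetical hostnames).
backends = itertools.cycle([
    "http://inference-1:5000",
    "http://inference-2:5000",
])

@app.route("/v1/engines/codegen/completions", methods=["POST"])
def completions() -> Response:
    # Every completion request is one-shot, so any backend can serve it;
    # no session affinity is needed.
    upstream = requests.post(
        next(backends) + "/v1/engines/codegen/completions",
        json=request.get_json(), timeout=30)
    return Response(upstream.content, status=upstream.status_code,
                    content_type=upstream.headers.get("Content-Type"))
```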
B
With quantization, you can actually convert the weights to 8-bit integers and do the same inference — you don't lose any quality in the output, but it runs faster and uses less memory. And very, very recently — like three days ago — one group even managed to get it down to four bits, so 4-bit quantization. At that point you can run the 16-billion-parameter model in just eight gigs of GPU memory, because four bits means you're only storing half a byte per parameter, and that could potentially really speed things up. So I think there are lots of ways to make this even faster and use fewer resources.
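
The arithmetic behind those memory figures is simple — note that he corrects the fp16 number for the 16-billion-parameter model to 32 GB a little later in the conversation:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """GPU memory needed just to store the model weights, in GB."""
    return n_params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

n = 16e9  # 16-billion-parameter model
print(weight_memory_gb(n, 16))  # fp16:  ~32 GB
print(weight_memory_gb(n, 8))   # int8:  ~16 GB
print(weight_memory_gb(n, 4))   # 4-bit:  ~8 GB
```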
B
For a given model, if it's a 16-billion-parameter model, you need basically 16 gigabytes of GPU memory just to store the model.
A
Yeah, thanks for that, Brendan. I think Fred is back — we just skipped your questions, right? If you wanted to take up number four.
C
Yeah, sorry about that. I was wondering if you have some metrics and insights into usage.
B
Yeah — I don't know if I can... yeah, I think I can do screen sharing, and I can just show you the traffic page, both for the GitHub repository and for the Hugging Face page. ...That is the wrong one, sorry — I think it's this one. There we go. So this is the page on Hugging Face where the actual models that it downloads are stored. I think people who are just trying it out might be using the really small models, so we get some downloads there for the Python-only version and the multilingual version, and that's a pretty decent proxy for whether people are actually using it. Almost no one is using the natural-language versions of these — I think they're not even exposed in the setup script, and there aren't very many people trying to use FauxPilot to help them write English. Over on the traffic page — it's a little bit annoying that GitHub only gives you the last month of usage stats here — it looks like it's settled into around 100-ish clones per month, and then more people than that at least sort of checking it out and having a look at it.
E
A quick question while you're on Hugging Face. My job as a product manager is speaking with our customers, and I've not run into any customer that uses Hugging Face today. I suspect that's largely due to enterprise use cases, but what's your take on Hugging Face?
B
They host the models for free, so that was kind of very attractive. As far as what I think of them as a company — I like a lot of the things they do. They have made it a lot easier to train models and to host a bunch of fairly different models for inference. That said, I guess I'm still a little bit confused, if they're a company, about how they are planning to make money. But that's, I guess, their problem rather than mine. I think plausibly what they're doing is something like: they have a bunch of people who know a lot about AI and about building and training models, and they may be doing a sort of contracting thing, where they say, oh, you can hire us to help you train or deploy models that you have or are interested in having.
A
Oh, we'll move on — I think it would then be back to me and Alexander on the models. We would love to know a little bit more about the models themselves. Obviously we have four sizes, all the way up to the 32-gigabyte one, and we'd like some more insight as to where the model works well — in what areas — and in what areas it wouldn't. Then I think Alex and I have a few more questions on how we would go about taking a model and optimizing it — what would that look like? So yeah, sure.
B
For the smallest one, the code suggestions it gives are pretty bad, and then it goes all the way up through two billion, six billion, and 16 billion parameters. I think I misspoke earlier when I said the 16-billion-parameter model would take just 16 gigs of RAM — it would actually be 32 gigs, because the weights are 16-bit floats. The two flavors are the multilingual one, which I believe was trained on C, Python, Java, and some other languages, and the Python-only one.
B
But that one was trained, I think, on a larger set of languages, and it uses GPT-NeoX — so that might be another one that could easily be added, because, again, it uses FasterTransformer: you can convert their model into FasterTransformer and use it. So that one might be a good one to add in the very near future.
B
That would let you then run any of those models as well. Those would include things like — Facebook has this one called InCoder; Hugging Face itself has one called CodeParrot, which is focused on Python; and they are currently — what's that? Yeah, PolyCoder is the one, yeah — PolyCoder is the GPT-NeoX one. I think that could be — that would be like half a day or something like that to add to FauxPilot as it is. Hugging Face is also gearing up right now to train another large open code model that they're calling BigCode, and I have been working a bit with them on that.
B
Just on questions like what kind of training data, what sort of model, and things like that. That one, I think, is probably not going to be released for a couple more months, but when it is, it would be a very nice one to be able to add as well — in particular because it's going to support some things that are really useful for actually using this in an IDE. It's going to be able to do things like this fill-in-the-middle task, where you give the model not just the code up to the cursor, but the code before and after the cursor, and then ask it what the best way to fill in the code in between is. That's really helpful, just because a lot of the time, if you're editing a file, you're not writing it from scratch top-down — you're making changes to what's already there, and the changes you're adding have to be consistent with what comes before and what comes after. So I guess — yeah, are there other things that you wanted to know about the languages and models?
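
To illustrate what a fill-in-the-middle prompt looks like: the sentinel-token names below follow the convention used by FIM-trained code models and are an assumption here, not something the current FauxPilot models support.

```python
# Code on either side of the cursor is wrapped in sentinel tokens, and the
# model is asked to generate the missing middle (hypothetical token names).
prefix = "def mean(xs):\n    total = "
suffix = "\n    return total / len(xs)\n"

prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"
# A FIM-trained model would ideally complete this with: "sum(xs)"
```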
A
No, I think that's good — I'm good for now. I think, Alexander, you're next with the questions on the model.
D
So I saw that, to get recommendations, we need to provide several — let's say — hyperparameters. One of them is temperature; another one is max tokens. So I guess the question was: is there a way, maybe, to tune these hyperparameters automatically for each project, or how should we set them?
B
Yeah, I mean, right now this is very much more art than science. People have mostly been going by rules of thumb — you know, we want sort of low temperatures for code generation — so I don't know that there are exact ways to derive this. I have mostly not found it to be something that I've wanted to change on a per-project basis, but I think you might want to, say, take some kind of standard benchmark and do a bunch of generations at different temperatures to decide at least what the default is. There are benchmarks available, certainly for Python and I think a few other languages — the sort of smallish little programming problems that can be used for evaluation — and you just basically generate code and then run a bunch of tests to see if the code worked. As far as the number of tokens —
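
A minimal sketch of that kind of temperature sweep, assuming an OpenAI-compatible client pointed at a local FauxPilot server — the endpoint, engine name, and toy benchmark here are placeholders:

```python
import openai

openai.api_base = "http://localhost:5000/v1"  # assumed FauxPilot endpoint
openai.api_key = "dummy"                      # no real key needed locally

def pass_rate(temperature: float, benchmark: list) -> float:
    """Fraction of benchmark problems whose generation passes its tests."""
    passed = 0
    for prompt, run_tests in benchmark:  # run_tests: callable(str) -> bool
        completion = openai.Completion.create(
            engine="codegen", prompt=prompt,
            max_tokens=256, temperature=temperature)
        passed += run_tests(completion.choices[0].text)
    return passed / len(benchmark)

BENCHMARK = [
    ("def add(a, b):\n    return ",
     lambda code: "a + b" in code),  # toy stand-in for running real tests
]

# Pick the default temperature that scores best on the benchmark.
best = max([0.1, 0.2, 0.4, 0.8], key=lambda t: pass_rate(t, BENCHMARK))
```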
B
The way that GitHub Copilot does it is, they always ask for a fixed 512 tokens, and the reason this doesn't take ages to generate is that they change the stop sequence. A stop sequence just says: as you're generating, if you see this in the output, then you're done generating and you can return immediately. The way they do it is, most of the time it operates in a kind of one-line-at-a-time mode, where the stop sequence is just a newline. So as you're typing an individual line, maybe it only has to generate something like four tokens to complete that line. Then, I think, if you sort of stop and wait a little bit, it kicks into this "oh, maybe we should try to generate a larger amount of code" mode, where it sets the stop sequence to something that looks like the end of a function or the end of an if statement or something like that — which is more language-specific — and then it will try to generate that whole block of code. I think that certainly makes a lot of sense, particularly in terms of performance, because much of the benefit that I get from Copilot is the one-line completions as I'm typing, and you want those to be very, very fast — so fast that you can even see them change as you type one character at a time, as it goes through different completions. So I think that does tend to work quite well if you are just trying to generate one line.
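
For example, a one-line-at-a-time completion request of the shape he describes could look like this against FauxPilot's OpenAI-compatible API — the endpoint and engine name are assumptions:

```python
import openai

openai.api_base = "http://localhost:5000/v1"  # assumed FauxPilot endpoint
openai.api_key = "dummy"

resp = openai.Completion.create(
    engine="codegen",
    prompt="import os\n\n# list all files in a directory\nfiles = ",
    max_tokens=512,   # ask for a generous budget...
    stop="\n",        # ...but stop at the first newline, so single-line
                      # completions return after only a handful of tokens
    temperature=0.1,
)
print(resp.choices[0].text)
```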
D
So what are the requirements for the inputs to get recommendations? Should it be, I don't know, a good block of English, Chinese, or another language? Or can you just pass it, let's say, a Python code block and it will be completed — or something else?
B
Yeah — so, generally speaking, you just pass it the code you already have, and if users want to instruct it in English, the way you do it is by just writing a comment. I have actually found — and this may be a little bit surprising — that my code is much better commented now, because I write a comment saying what I want the model to fill in, and then it fills in based on that comment.
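
For instance, a prompt that ends with a descriptive comment — a made-up example, not one from the talk — steers the model toward the matching implementation:

```python
# The prompt sent to the model ends with a comment describing the intent...
prompt = (
    "import csv\n"
    "\n"
    "# read points.csv and return a list of (x, y) float tuples\n"
    "def read_points(path):\n"
)
# ...and a good completion fills in the body the comment asks for, e.g.:
#     with open(path) as f:
#         return [(float(x), float(y)) for x, y in csv.reader(f)]
```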
B
So yeah, that's the usual scenario. I think it is not very good with languages other than English in terms of comments, just because there's not nearly as much non-English comment data in the data set.
A
You also said, basically, that it fills in based on what you've written at the beginning — and not necessarily if the cursor is in the middle?
B
Right. And I think the main limitation here is that the model supports 2048 tokens at a time, and that includes the input plus whatever it needs to generate. That means if you are asking it to generate up to 512 tokens, you're left with around 1500 tokens of input, and so often — particularly in a larger source code file — you're not going to be able to fit the whole thing into the context. So the simplest strategy is to just give it the most recent 1500 tokens. You can get more sophisticated about this, because sometimes there might be extra context that you want to include. Assuming you know a little bit about what language they're using, you can go up to the top of the file and look for import statements, and then have the prompt be: here are my import statements, and then here's as much of the code as I can fit — so that it has things like library definitions or structs or things like that. You can even extend this by saying: okay, I'm writing C and they include foo.h, so I'll go to foo.h and try to pull in some declarations.
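
A rough sketch of that strategy for a Python file — the 4-characters-per-token estimate stands in for a real tokenizer, and the budget mirrors the ~1500-token figure above:

```python
def build_prompt(source: str, budget_tokens: int = 1500) -> str:
    """Keep import lines from the top of the file, then spend the rest of
    the budget on the most recent code before the cursor."""
    imports = [line for line in source.splitlines(keepends=True)
               if line.startswith(("import ", "from "))]
    header = "".join(imports)

    # Crude stand-in for a real tokenizer: assume ~4 characters per token.
    char_budget = budget_tokens * 4 - len(header)
    tail = source[-char_budget:] if char_budget > 0 else ""
    return header + tail
```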
B
Right — I mean, I think this is something that, for now, I have not done very much on: the client side. This would all be done within, say, a Visual Studio Code plugin or an IDE plugin, and I have not done very much with that. I know that the GitHub Copilot plugin does do a lot of this, because I see it pulling in pieces of other libraries and other files that I have open. There's a decent bit of implementation work there, particularly if you want to support a bunch of languages, because you have to know that, okay, in PHP it's require, and in Python it's import, and in C it's include — and then you have to know where to find those dependencies.
D
Let me ask you: how was the model tested? I mean, maybe several use cases were collected, just to understand in which cases the model performs better or worse — let's say, I don't know, unit test generation, or a web service, or something else?
B
Right — so right now, tests are a wish-list item. But as far as evaluating the quality of the model output, there's the paper that Salesforce released — it's called "A Conversational Paradigm for Program Synthesis" — where they did a full evaluation of all of these models on a bunch of benchmarks, and that's what I've been relying on for my general sense of which model to use. Somewhat unsurprisingly, the answer is that you should use the biggest model that you have, or that you can feasibly run, and if there is a version of it trained on your specific language, you should use that one in preference to one that tries to support many languages at once. Aside from that, just using it interactively, I've found that it's pretty decent — it's much better with Python than it is with, say, writing C code, and I think that makes sense just given how much Python code there is out there. But I think it would be really nice to have some automated harnesses for checking, even just: does the model still work to generate code at all? Can we run it on a small benchmark and make sure we haven't regressed on the quality of the recommendations?
B
Right, yeah — it's absolutely possible for it to produce suggestions that don't compile, because, for example, maybe it refers to a variable that doesn't actually exist in your program. Generally speaking, it's not going to make mistakes like forgetting a semicolon or doing something obviously syntactically wrong. The mistakes are really more of the form: well, it doesn't have any context, so it doesn't know what the field name of this data structure is, so it'll just make one up — and sometimes it guesses right, and sometimes it guesses wrong. But that's where this kind of prompt engineering comes in, where you try to figure out: okay, what do I need to show it so that it can give me reasonable suggestions?
A
Cool
I
think
on
that
also
I'm,
also
conscious
of
time,
so
we're
gonna
go
through
the
questions
of
people
who
are
also
not
in
call
Dinesh
similar
to
the
testing.
How
would
you
compare
the
usefulness
of
full
Pilot's
suggestion
with
respect
to
co-pilot
ones,.
B
Yep. So I think Copilot is certainly currently better at producing code. Some of that comes from the fact that I think they are using a larger model — it might be a larger model, but it's certainly trained on more data. I think they have trained it on basically all the code they could get their hands on, and as a result it is pretty good at generating suggestions. That said, FauxPilot has been perfectly fine for writing the kind of code that I usually write, which is like: okay, I'm in Python, I want to read in a bunch of data from somewhere, do some analysis on it, and create some visualizations and graphs. That's the kind of stuff I do most often, and it works very well for that. Unfortunately, I don't do a lot of stuff like writing web apps in JavaScript or things like that, so I have less direct experience there.
B
Yeah — so there are sort of two ways you could think about this. One is: what do you put into the prompt to generate the output? And maybe this is covered in a different question, but I think the other strategy is that you could take a model and try to do what's called fine-tuning, where you train it on additional data for a much shorter amount of time and with a lower "learning rate" — which just controls how dramatic the changes to the model are at each training step. Fine-tuning these models is possible, which is something that is not possible with Copilot, because you need to have the actual weights of the model available to do fine-tuning, and the weights for Codex and Copilot are not available. That said, fine-tuning is not totally trivial. Data-wise, you really just make a data set of your code in JSON form — it's a file where each line is a new JSON dictionary whose "text" key is the contents of one of your source code files — and then you can pass that to a standard script they have, and it will fine-tune the model on the code you gave it.
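
In other words, a JSON-lines file along these lines — a minimal sketch; the exact key name expected by the fine-tuning script may differ:

```python
import json
import pathlib

# One JSON object per line, each holding the contents of one source file.
with open("train.jsonl", "w") as out:
    for path in pathlib.Path("my_repo").rglob("*.py"):
        record = {"text": path.read_text(errors="ignore")}
        out.write(json.dumps(record) + "\n")
```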
B
As a data point, we fine-tuned the 16-billion-parameter model on a data set of Verilog code, which is used for CPU design and hardware design. That was a 400-megabyte code data set, and it took three A100 GPUs, each with 80 gigabytes of VRAM, about six days to fine-tune on it. So yeah — the computational requirements are definitely not at the level of "I can take my souped-up gaming desktop and fine-tune a big model on it."
A
Yeah, I think that actually even answers the next question, on how feasible it is to fine-tune — so thank you for that. And now to Taylor.
E
Kite kind of seems to be falling off too, so I'm just curious about your thoughts on the industry in general.
B
Yeah — I mean, I think it is only going to get bigger. The other one that is available now is Amazon CodeWhisperer, which is yet another one — I have not even had a chance to play with it at all yet; I've just read a blog post about it. And we do actually know the Tabnine folks pretty well; we've been talking a lot with them about collaborating on some things, like user studies with their users, and figuring out whether, if we've trained models that try to produce more secure code, we can have them deployed with Tabnine to see if that helps their users as well. That said, I have not personally used Tabnine, so I don't know as much about how it compares in terms of code quality. I do think their approach has been to use much smaller models, and then try to make up for that by training on your own code and by using language-specific models — so I think they have a smaller model, but one that's been trained specifically on Java, and another that's been trained on Python, and another that's been trained on C, and that's sort of how they get around it. But, anecdotally, I have heard that the quality of the output it gives you is not up to what Copilot currently does, and I can believe that, because the Codex model is very, very good. So yeah — I'm definitely not the person to ask about business questions, because I know nothing about that. All I will say is that it seems like this is going to be very, very popular over the next few years, because when it works, it works really, really well.
E
Awesome, thanks for that. On to the next question — and this is actually kind of funny: as I was looking at the articles I've been researching, you were actually referenced in a number of them. There's been a lot of research done on Copilot and the detection of, you know, bad, insecure, malicious, or verbatim code.
B
Yeah, so this is a great question, and I've kind of gone back and forth on this. Initially, we did this big evaluation of Copilot on a whole bunch of different classes of vulnerabilities, where we made these little toy scenarios like: I'm about to insert some data into a database — Copilot, how would you do that? And it would say, oh, I will use string concatenation to put your SQL into this database. That was obviously a very bad idea, because it was a SQL injection vulnerability, and when we measured the rate at which it does that in that study, it was something like 40% of the time, which was quite bad — quite alarming. But then we thought about it a little bit more and said: well, okay, but (a) we don't know how often human users would make that kind of mistake, and (b) if it's being used by a human, then presumably they also have a chance to look at the output and say, oh, that doesn't make any sense, or, oh, that's obviously vulnerable — and fix it.
B
So we actually did do a user study — and this paper we just submitted two days ago, so I'm happy to share a copy of it. We did this user study comparing the functionality and security of code produced either by hand or with the assistance of Codex, and, somewhat surprisingly to us, the two groups came out very close. These were undergraduate and master's students in computer science, and they were writing in C, where it's very hard to write good code in the first place, but the rate of vulnerabilities between the two groups was basically the same, as far as we could determine. So that's maybe a little bit encouraging on the one hand, but also discouraging on the other — encouraging because it means it's not making things much worse.
B
But our working hypothesis at the moment is that, because the models were trained not to write good code but to very accurately predict what would come next in a source code file, if the code that you write is not very secure or not very high quality, it will very faithfully write insecure and low-quality code for you — and so it did tend to match the quality of the code written by the user. So that's kind of my main ongoing research area, and we have a few fun ideas for how to address it. One is that we can try to actually patch these models, and there's a bunch of different ways you can think of doing that.
B
For example, you could collect a new training data set and annotate it based on some judgment of its code quality, and then train with a quality tag followed by the training data. That would let the model have some idea of "this is how I produce high-quality code," and then, when you are generating actual suggestions for a user, you would always include the high-quality tag, so the model is strongly influenced to produce higher-quality code, even if what's already there is not as good. Something like that could work. We've also been looking at ways of doing this without having to do any additional training.
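
A toy illustration of that tag-conditioning idea — the tag strings and code snippets here are hypothetical, not from a released system:

```python
# Training examples are prefixed with a judgment of their quality...
secure = 'cur.execute("INSERT INTO t VALUES (?)", (val,))'      # parameterized
insecure = 'cur.execute("INSERT INTO t VALUES (" + val + ")")'  # concatenation
train_examples = [
    "<|quality:high|>\n" + secure,
    "<|quality:low|>\n" + insecure,
]

# ...and at suggestion time the prompt always leads with the high tag,
# steering generation toward the high-quality region the model learned.
prompt = "<|quality:high|>\n" + "def insert(cur, val):\n    "
```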
B
There are some very, very recent approaches that try to directly edit things the model knows by just updating the floating-point weights, and this works in natural-language models for things like changing facts. The example they had in this paper, called ROME, was: you say, "The Eiffel Tower is located in," and the model says "Paris," and you want to update it so that it says the tower is in Rome — and they actually figured out a way to do that, and it then affects other things the model knows: you can ask what landmarks are near the Eiffel Tower, and it will say things like the Colosseum. So I'm currently working on trying to make that work in the context of code, where I'm hoping to be able to say things like: "the function you use for hashing passwords is" — the model currently wants to say SHA-256, and that's not really correct. I want it to say bcrypt, which is what you should use for hashing passwords, as opposed to other kinds of data. But again, that is definitely research — it is not a product that can be deployed today.
B
Unfortunately, the recommendations in the meantime are mostly the same ones you would give to people writing code by hand: you should test this, you should use security scanning tools, you should maybe try fuzzing your software. And I guess, pessimistically, I expect that advice to be taken up about as well as it is for human-written code.
B
Yeah — I mean, I think it definitely works very poorly for languages that are not well represented on GitHub. We found it's awful at writing Verilog, which is why we ended up fine-tuning our model on Verilog, and I would expect this to extend to other languages that are more niche or less popular — probably including things like Haskell, or OCaml, or other things in the Lisp family, which have just never become all that popular — and even newer languages like Rust, which is becoming very popular, but there's still just not nearly as much Rust code out there as there is Python code. So I would definitely expect it to do pretty badly on languages like those. I'm trying to think of other cases where it will definitely do very poorly. Without additional prompt engineering, it's going to do worse and worse the further down you get in your code, because these long-range dependencies get further and further away and fall out of the context. It's just not going to be able to know anything about them, right, because it can only remember up to the last 2048 tokens — beyond that, it's just not going to know anything about anything outside that window.
A
If you can just quickly — I know we've talked about this, but if you can share with the audience: what do you think, where are we heading with large-scale models and quantization? Where are we going with this?
B
Yeah — I'm actually very excited about where this has been going so far, because it seems like people keep finding ways to make these models use less and less memory and do inference faster and faster, without seemingly harming the quality of the output at all. It was sort of shocking to me that you could take a language model and compress it down to just four bits per parameter without really hurting its performance, and I think that trend — I mean, it has to stop somewhere, because you can't use zero bits per parameter — but I think we will still be able to go down even a little bit further. So I think that's going to be really nice: quantization is going to be a big area, and right now only a few models have been quantized, but I think many more could be. And then, as far as just plain speeding up inference, this is something we actually have a couple of students looking at right now.
B
They're looking at what optimizations we can do on the existing code to make it run faster on existing GPU architectures. That's going to be a little bit tough, just because, for example, FasterTransformer is written by someone at NVIDIA who presumably knows the NVIDIA graphics architecture really, really well — but we do think there are some opportunities for speeding things up even there. We're also looking at how you speed up inference in the case where, say, you've got a data center with some fixed capacity, you've got lots of requests coming in, and at some point you start getting overloaded — so, is there a small reduction in quality that you would be willing to sacrifice in favor of keeping inference time very low?
B
There are some more advanced strategies you can start to deploy at that point. For example, much of the time when you're doing inference, you don't even have to go all the way through the model before it's pretty clear what the next prediction is going to be, so you can bail out halfway through that inference step and say: yep, the next token is, with 90% probability, going to be this one — just return that one instead. That seems to work pretty well. So I think things are only going to get faster and cheaper, and pretty quickly.
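
A conceptual sketch of that early-exit idea — a simplification of the research direction he is describing, not an existing API; in practice each layer would need its own calibrated prediction head:

```python
import torch

def next_token_with_early_exit(layers, lm_head, hidden, threshold=0.9):
    """Run decoder layers one at a time; stop as soon as some token already
    has probability >= threshold under the language-model head."""
    top_id = None
    for layer in layers:
        hidden = layer(hidden)
        probs = torch.softmax(lm_head(hidden[:, -1]), dim=-1)
        top_prob, top_id = probs.max(dim=-1)
        if top_prob.item() >= threshold:
            break  # confident enough: skip the remaining layers
    return top_id
```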
A
Yeah, agreed — thank you for that. I think, from both me and Taylor and everyone: thank you so much for the time. But we also want to know if you have questions for us first.
B
Yeah, I'm trying to think. So the main thing is really just: are there things that you've found about the way things are currently implemented where you'd say, "oh gosh, it would be much better if it were done this way," or "this is clearly not something we could deploy, because something is fundamentally broken here," or something like that? I know that Fred has been very kind in providing pull requests to help improve it in many ways so far, and I've been trying to make sure I give some attention to his code suggestions. But yeah — basically, how can I — and I will caveat this by saying that I don't have tons of time — but are there ways that I can help make FauxPilot better for the use cases that you have in mind?
C
I think you are right on the money with prompt engineering — I think that's going to be the major differentiator, because I've been playing around with it, and whether what you get back is actually meaningful really greatly depends on what you give it in the prompt. I'm currently working on authentication, because we don't want just anyone using the compute we're going to host. But I've also been working on the VS Code extension for the official GitLab workflow; I think that will be pretty awesome. If we can also promote that within the project, we can get contributions there, because I think there are a lot of people with really clever ideas about prompt engineering — right now it's just the most simplistic thing there is: I think it just takes the last however many tokens and passes them along, and yeah, it's not optimal.
B
I've definitely wanted to be able to point users to an extension that works well, particularly with FauxPilot, because right now you can hack up the GitHub Copilot plugin in various ways to make it talk to FauxPilot, but a lot of things might be a little bit broken. It'd be really, really nice to be able to say: hey, look, here's an actual Visual Studio Code extension that you can install that will work with FauxPilot specifically. Then I'd definitely be very happy to start asking for — and hopefully contributing — some ideas for how to make the prompt engineering side better.
B
Wonderful, okay. I would absolutely love to tell, you know, my 17,000 followers on Twitter about that, and promote it on the project page. That's really cool.
C
That
will
be
awesome,
yeah,
yeah
and
I.
Think
we
all
also
discussed
is
like
cicd.
That's
probably
something
that's
going
to
be
quite
crucial,
so
right
now
I'm
hosting
it
for
on
gitlab
that
does
have
cicd
but
yeah.
Maybe
we
could
have
a
follow-up
conversation
on
that
on
how
to
make
that
more
publicly
available.
B
Yeah, yeah — I would be very happy to do that, and I may actually even be able to run some kind of CI/CD server, just because I think a GitHub Action still doesn't support things like actually having a GPU available.
A
Yeah, we can definitely do that with GitLab, with our GPU-enabled runners. So yes, that's something we do support.
A
As said, we already use GPU-enabled runners inside our team, so we could definitely help you with that as well, for sure. But other than that — anything else, Brendan, we can support? And really, thank you so much for this.
B
Sure, yeah — and I guess we're a little bit over time, so I don't want to keep everyone, but thanks very much. It's great to talk with folks, and I'm very excited to hopefully be able to make FauxPilot a lot better and have a lot more people actually being able to use it to build things publicly.
A
Thank you, thank you. Before we actually wrap up — there's a feature wish list for FauxPilot, which is basically everything we can use if anyone wants to contribute: a whole list that Brendan actually put together for us. So thank you for that. And then, yeah — well, thank you everyone for the time, and thank you, Brendan.