From YouTube: Infrastructure sync for Code Suggestions accelerated GA
A: For code suggestions, I'll start. I have the first item: I want to take a look at the current state of the infrastructure, what's running in the cluster, and make sure there aren't any problems. The first thing we see is this "can't scale up nodes" error. I think this is for the GPU instances, so I don't think it's anything to worry about.
A: I wanted to check out the model gateway to see where we're at. I don't think we ever had a problem with the model gateway, and we increased the number of replicas, so I don't think there are any problems here. I think once we have requests and limits set for this service, it'll probably scale down, because right now we statically set the number of replicas for the service. But I don't see any problem.
C: Yeah, I guess on that as well. The only part is, we were looking at and analyzing Triton, and we definitely need to optimize that. The only way, I think (I've got to put it in the corrective actions) is dynamic batching. That's something we would probably look into next week, and it would also help with the load on the Triton side. That would be nice. Okay.
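The dynamic batching mentioned here is a built-in Triton Inference Server feature, configured per model. A minimal sketch of the relevant `config.pbtxt` fragment (batch sizes and queue delay are placeholder values, not measured ones):

```protobuf
# config.pbtxt (fragment): enables Triton's dynamic batcher
dynamic_batching {
  # batch sizes the scheduler should prefer to build (placeholders)
  preferred_batch_size: [ 4, 8 ]
  # how long a request may wait in the queue to be batched
  max_queue_delay_microseconds: 100
}
```

Tuning `max_queue_delay_microseconds` trades a little per-request latency for larger batches and better GPU utilization.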
A: Super. I also have number two: I wanted to go over what the high-priority reliability or infrastructure tasks are for the next two weeks. Really, these are probably high priority for this week. The first one is to get the K8s manifests into CI. I'm going to try to set a goal to have this done in staging by the end of today.
A: I think Devin is able to help now, so that will help. I'm also doing less disaster recovery today and more of this, so I can probably help as well. As far as temporarily interrupting the service: do we think that's a big issue? If it's just for a few seconds, I assume it isn't? As in, it's beta, and we expect blips of downtime. Do you agree with that?
A: I'm hoping that we're talking about no more than seconds of service interruption, but I'll impress on Devin that if he does do it today (I'm not sure if it's going to happen) he should make an announcement and let people know what's going on. We'll probably just do staging today, and then we can talk about production tomorrow.
C: Right. And with users in production, if it's just not working for one second, we don't really notice, yeah.
A: And then infrastructure as code: I'm going to start by importing the project into our Terraform pipeline. That will allow us to start selectively importing the clusters and other resources in that project into CI. So this is separate from moving the K8s manifests into CI, yeah. I'll do this as well.
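The import step described here could look like the following sketch, assuming a GKE cluster managed with the `google` Terraform provider (the resource address, project, and cluster name are hypothetical):

```sh
# Bring an existing cluster under Terraform management without recreating it
terraform import google_container_cluster.ai_assist \
  projects/example-project/locations/us-central1/clusters/ai-assist

# Then confirm the plan is a no-op before wiring it into CI
terraform plan
```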
A: The third item is requests and limits for the model gateway. Andres, did you have something for that?
A: We've adopted a couple of different conventions. One is that we just set the request the same as the limit, so that you basically have a generous request value and a limit that's very close to it. But this really depends on the workload, whether its usage is spiky or not. I don't know if this one is spiky; I don't think it is. For the model gateway, I assume it's fairly stable. But what have you seen so far?
B: Slowly, but I'm trying to wrap it up this week.
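The request-equals-limit convention described above maps to a pod spec along these lines (a minimal sketch; the values are placeholders, not the model gateway's actual sizing):

```yaml
# Setting requests equal to limits gives the pod the Guaranteed QoS class,
# which suits a stable, non-spiky workload.
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "1"
    memory: 2Gi
```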
A: Okay, and then the last item is resolving the Prometheus recording rules. Bob, since you're here, do you have any status update on this? I know you've been following it a little bit.
D: There are two ways around that. One is doing all the recording in Thanos, which now gets all the metrics through Thanos Receive. That's something I'm trying out right now, because it ties into other work I'm doing. We had a blocker there, and I don't know yet how big it is; I intend to look into that today. The alternative approach we can take is to deploy a select set of rules to this Prometheus server, and that's probably the more boring solution.
D: We need to have the manifests and so on in CI first, I think, so as soon as that's done, I think we should explore both options in parallel. The other thing that needs to be addressed is the labeling of the Kubernetes-type metrics: they need the type label and so on applied. I think Nick added a comment with some details to one of those issues. I haven't looked into that yet, but that will make our saturation metrics show up on the dashboards and get the service into capacity planning.
D: Yeah, label_type, so then we can create the recording rules. That does it, yeah.
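A recording rule carrying that type label might look like the following sketch (the group, rule, metric, and job names are assumptions, not the actual rules):

```yaml
groups:
  - name: code_suggestions_slis
    rules:
      - record: sli:request_rate:rate5m
        expr: sum(rate(http_requests_total{job="model-gateway"}[5m]))
        labels:
          type: model-gateway   # the label the saturation dashboards key on
```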
D: I'm focusing on the SLIs. I hope to get, like, a call on the Thanos thing and…
A: Okay, sounds good. Monterey, you have number three?
C: Sorry, yeah. So we do have a request. Based on how we changed the settings from having them globally on to default, and with the migration and everything, we have a whole lot of users who had enabled it and now need to re-enable it. So we want to know, I think it's: can we pull all the user IDs from the last 30 days for whoever made authentication requests for code suggestions?
A: Possible. We would have to pull the logs out of object storage into BigQuery and then do a query that way. I can maybe look into this. Do we have the query that we need to run in order to get the user IDs, like, looking at Elasticsearch? Do we know? Can we get the last seven days already, so we can take that and look back 30 days?
C: Yeah, I can send that to you as well, or I can ask John to look into that. Yeah, I'll post this on our code suggestions Slack channel and then I'll tag you in there.
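Once the logs are in BigQuery, the 30-day pull could be a query along these lines (the dataset, table, and field names are assumptions, not the real schema):

```sql
-- Distinct users who made authentication requests for code suggestions
-- in the last 30 days (all identifiers here are placeholders).
SELECT DISTINCT jsonPayload.user_id
FROM `example-project.logs.code_suggestions_auth`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY);
```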
C: I believe it is as urgent as possible, based on the fact that there are users who've been disabled. Well, they don't know they're disabled, and we're not even sending an announcement to them; we're just doing it on the back end.
C: Are we going to send them… I believe, I mean, I think this is just a guess, that's just me, but yeah, we are. I believe we took the last seven days and re-enabled through the Rails console. Oh, I don't think that's the right way to do it, but I'm…
A: Before we do this, because it's going to take a little bit of work, could you just point me to the Slack thread or issue or something where this decision was made? Because it doesn't feel right to me. I feel like if people want to use this, they should just enable it themselves.
C: I am also, I don't know how to put it, for lack of a better word, very shocked by all of these decisions happening while I'm sleeping.
A: Okay, if you could point me to where…
A: I'll dig into it a bit more. As far as the timeline on us receiving more traffic: we have the Web IDE coming up soon, right? And that's going to happen this week?
C: As of this morning, we are incrementally rolling out. I'll have to check where we are, whether we're at a hundred percent.
A: I'm trying to think: okay, if we have X number of users, maybe a very small percentage of them will turn on the setting. Because even after this is enabled, they still have to turn on the setting to enable the code suggestions integration, right?
A: Okay, so now, when they use the Web IDE, code suggestions will be enabled by default. In other words, we'll have X number of users using the Web IDE, they're all typing code, and they're all going to be sending prompts to code suggestions. It's going to increase load quite a bit if there are a lot of people using the Web IDE, right?
A: Okay, sounds good. I think that'll help a lot if we can enable it incrementally.