From YouTube: 2023-06-29 Scalability Demo
A: I didn't have anything planned, but I'll take the chance to talk a little bit about Tamland, because I happen to have been assigned as the coordinator for the capacity planning issues this week, and so I'm trying to figure out how to get Tamland to adjust to changes. So, for example, one of the alerts that I was looking at was this one for potential saturation of PgBouncer client connections, which forecasts that we will reach 80 percent somewhere in August or September.
A: But this doesn't seem to match what the current behavior is. It seems that we had a baseline that was around 20, and our lowest level is now like 25 percent, but it seems pretty stable at that point.
A: But I guess Tamland likes to be pessimistic about stuff and said: well, if we had that, then we project that we'll continue to have small bumps and then things will get out of control. But I don't see a reason to believe that, so I was talking with Bob, and he talked about this file.
A: Something I already have open, which is the forecast parameters file, where you can add change points and see how that affects things. I'm still trying to figure out if that makes Tamland more or less biased towards changes if you add a change point. I was trying to generate things locally, but they seem really slow, but yeah, I'm still trying to figure it out.
A: So let me show you what I have so far. So I changed the thing to have only the component that I want, yeah.
A: Yeah, I removed it for now. And then, how am I running it? Okay, so I tried the thing that you said, which was to run it with the downloaded data. So I have now this data folder, which is a couple of gigs, and then I can run it. So I was doing this, and the thing that I was looking at is that there is this flag in the documentation that you mentioned, the Tamland only-cache one.
A: I was playing around with it, and it doesn't seem to... I don't know if it's checking whether the data folder is there, because it seems that regardless of whether I put the flag or not, it is using the cache. But let's run it like this, which... yeah, now, with the cache downloaded, it is really fast. Maybe not.
A: So I think before I took that out, I think it worked. I mean, let me just... because it was working before. Obviously, when you demo something, it's bound to break. Okay, so it wasn't that. Did I make any other changes? Let me just leave this here.
A: And then I was, yeah, I was debugging this, but that doesn't matter. I don't know if I still have the SSH terminal up, even if I'm using the cache.
A: It didn't matter, but something's wrong with my setup, so I had to...
A: ...set this up again. And one reason is that apparently, and this was something that I think could affect a couple of people: when I migrated from my Intel MacBook to this M1 MacBook, I used Migration Assistant, and that copied a bunch of Homebrew installation stuff that was x86. I had to just remove all of that, and I think that broke a couple of things in my setup, including my SSH.
A: So the tunnel is not working now, but I'll have to figure that out offline. But I guess I'll just take the chance to ask Bob, because you mentioned that adding a trend change point could help. Can you speak more about what the logic is behind that?
B: Yeah. So, a change point: by default, Tamland (or rather Prophet) does this: it adds, I think it's 25 change points, in the first 80 percent of the data. So that means that this change that we see at the end of...
B: ...the end of June-ish, yep, won't have a change point yet, which is why it says: okay, it's trending upwards. Now, if you add a change point there, then it changes the trend line, and the mean will change towards... because that's where (there's a trick in Tamland so you can see them) the mean line changes. So every time you see this blue line change direction...
B: ...that's where a change point is. So what you're doing now, by adding a trend change point, is allowing it to change direction here.
B: So with that, what I'm expecting that change to do is that it'll make the growth less steep at the end, but it will make the confidence interval wider. So I don't think it's going to remove the alert, or maybe it will, because yeah, the confidence range is not that wide.
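
A minimal sketch of the change-point behavior described here, using Prophet directly (which Tamland builds on); the series, dates, and the manually added change point are invented for illustration, not taken from the real alert:

```python
import pandas as pd
from prophet import Prophet

# Fake daily saturation series: stable near 20%, with a small bump to 25%
# near the end of the history (the "end of June" situation above).
dates = pd.date_range("2023-01-01", "2023-06-28", freq="D")
values = [25.0 if d >= pd.Timestamp("2023-06-01") else 20.0 for d in dates]
df = pd.DataFrame({"ds": dates, "y": values})

# Prophet's defaults: 25 potential change points, all placed in the first
# 80% of the history, so a shift near the end gets no change point and is
# extrapolated as a continuing upward trend.
pessimistic = Prophet()  # n_changepoints=25, changepoint_range=0.8

# Adding an explicit change point at the start of the bump lets the trend
# line change direction there, flattening the forecast while typically
# widening the confidence interval.
adjusted = Prophet(changepoints=["2023-06-01"])

for model in (pessimistic, adjusted):
    model.fit(df)
    future = model.make_future_dataframe(periods=90)
    forecast = model.predict(future)
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(1))
```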
A: Got it, okay, I'll keep playing with that. I know that wasn't too informative, but at least take this home: if you're going from an Intel MacBook to an M1, don't do the migration thing, it brings problems.
B: The in-between thing they have for x86 is pretty good, I thought. Anyway, I have another question: I just came out of a call with some people that work on the cost service; apparently that talks to Redis persistent?
B: It's been... no, no, it's okay. I was just asking if we knew about it. I'll find the issues and mention it, but there's Workhorse, which also does this. I think that's probably something that we need to keep in the same instance as Rails, because it's something with uploaders, I don't know.
A: Oh, so it's not searching the same data that Rails does; it's just a distinct, separate subset of data.
B: The idea is to have the AI Gateway that GitLab instances, including GitLab's gitlab.com SaaS, Dedicated, and self-managed, can talk to. Not a single one, but multiple ones, depending on the deployment. In the beginning, I suspect it's going to be GitLab SaaS and self-managed instances talking to a single AI Gateway, and this is going to be my suggestion: that they talk gRPC to that Gateway. The reason I would suggest gRPC is because we don't need to do versioning; that means we can keep the API pretty stable.
B: The idea is that the GitLab instances provide all the data that they can to the Gateway, and the AI Gateway decides what to do with it, how to generate whatever response it needs. So that would mean that we create a separate gRPC service for each of the things that we want to do, and here I've added an example for the code suggestion service.
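
A rough sketch of that per-feature service shape in Python; the proto-generated modules, message fields, and the model-provider helper below are hypothetical stand-ins, not the definitions from the actual design document:

```python
from concurrent import futures

import grpc

# Hypothetical modules, as if generated by grpcio-tools from a proto like:
#   service CodeSuggestions {
#     rpc Complete(CompletionRequest) returns (CompletionResponse);
#   }
import code_suggestions_pb2 as pb2
import code_suggestions_pb2_grpc as pb2_grpc


class CodeSuggestionsService(pb2_grpc.CodeSuggestionsServicer):
    """One servicer per feature; chat or summarize-issue would be siblings."""

    def Complete(self, request, context):
        # The gateway decides what to do with whatever context the GitLab
        # instance could provide, and talks to the model provider here.
        text = call_model_provider(request.prefix)  # hypothetical helper
        return pb2.CompletionResponse(suggestion=text)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    pb2_grpc.add_CodeSuggestionsServicer_to_server(
        CodeSuggestionsService(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```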
B: Yeah, that's about it. For code suggestions, we also have a component in Visual Studio and the Web IDE; the idea would be that those things talk to their respective GitLab instances, which then forward the request to the AI Gateway. Right now this communication happens over just regular REST. I'm wondering if that could also, in a future iteration, become something that does protobuf, because if we do that, then we would be able to forward things that the GitLab instance it's talking to doesn't know about. Imagine a situation where we've got a developer...
B: ...who has the most recent version of the code extension, but they're working at a bank whose GitLab instance is two versions behind (their self-managed instance is behind), and they could still use newer features that we've introduced to the VS Code extension, if the GitLab instance that sits in between the VS Code extension and the AI Gateway just transparently passes on whatever it gets, which I think we could do with protobuf 3.
B: With that out of the way: I mentioned that I would recommend building a service for each feature that we build. So we have a code suggestion feature; we could have a chat feature; we could have a summarize-issue feature. The reason I'd do that...
B: ...is that it means we can iterate faster, and people can see improvements faster, regardless of what version of GitLab they're running. But I also want to call out: currently we've built some features in GitLab, inside the monolith. What these features do is just call out to OpenAI with a regular request to provide the AI things, and the idea is that they are using our keys for that. So this doesn't work for self-managed.
B: That's why we need them to go through the AI Gateway. But because these features are already built inside the monolith and directly tied to the provider, I'm suggesting we build a proxy in gRPC to account for them, though I am recommending to migrate them over to feature-specific RPCs when we can. That's about how far I got. Any thoughts?
A: So I think the current situation is a bit messy, right? Because there are some features built in the monolith that call out to OpenAI or to Google Vertex, but I think Code Suggestions is different, in that it speaks directly to Vertex and Google, right?
B: Yes, but it's also different because the clients that use Code Suggestions reach out immediately to the model gateway, which is going to become the AI Gateway, and it doesn't have any information on the GitLab instance. So we want to start routing requests from Visual Studio through gitlab.com, if you are using gitlab.com. If you are working on a project that is hosted on gitlab.com, you're going to proxy the request to gitlab.com, and gitlab.com is going to add any information that it has about your project or whatever before forwarding it to the AI Gateway.
F: I can maybe inject some product context. (Yes, please.) If you look at one of the hottest competitors in the market, from Sourcegraph: they're doing really, really well on performance with their code suggestions, because they're able to take the source graph and use it as a vector embedding to improve the results they get back. Because then their, you know, OpenAI model or whatever it is suddenly has a lot of context about what your repository is, the file structure, and all of those sorts of things.
F: So, in order to not be left out of that market, I think that's why we're putting a gateway in place, right? So that when we're on that round trip, we can think about gathering that information and maybe turning it into vector embeddings, to make that easy for, you know, stage teams or the AI-enabled team to do. And if we go straight to the model, we obviously can't inject that additional information into the prompts.
B: That's a discussion that we're going to have in the future as well: where are the embeddings going to live? Because we don't do any embeddings now. We've built some database tables for it in a separate database, but I would hope that we could do this inside the... I haven't written a lot about this...
B: ...yet, because I don't know a lot about it. But I was hoping that we could also do this in the AI Gateway, because then we've taken a database dependency out of the critical path for gitlab.com. Right now we have this experimental embeddings database that may or may not be hosted on Cloud SQL, and if it becomes unavailable, GitLab doesn't boot, which is, yeah, annoying.
B: But I don't know enough about this yet to really see if that will work properly.
B: The idea of going through the GitLab instance for code suggestions as well is to enrich what we send off to the model, even without using embeddings or anything, with context information. For example, we could add stuff like: we are building this, and this file is going to contribute to this issue that's being worked on, and yeah, information about the issue could then...
F: Yeah, and further down the line, in terms of opportunity cost: the unique value I guess we're bringing in this space is the data we have, and the insight we have, on the GitLab platform, right, whether .com or self-hosted. And if we're not injecting that in somehow, then pick any LLM and you should get similar results, which aren't, you know, a USP for us.
B: The main concern that I still need to address somehow in the document is the latency that's added by the GitLab instance, because the model gateway that we were using before is super lightweight, and all the latency is added from talking to the model. And we need to look into... like, the model gateway hop is very fast; that's not really what I'm worried about. But I suspect that the gitlab.com (or GitLab instance) hop is going to be more expensive, and it's not something we always control, if we don't control the instance.
B: The initial idea, which I discussed with Andrew, and which was not thinking enough about the self-managed installations and so on, was the design where the VS Code extension keeps talking directly to the model gateway, and the model gateway requests information from gitlab.com to enrich the prompts that it sends on...
B: It's kind of like the zeroth iteration. I've talked about this with Stan and with Andras from the... I don't know, there are two AI teams, and he's from the one that works on the model gateway.
B: So it's not just me, but it's not enough people yet. I'm going to work on this with Matthias and Andras.
F: Thinking about what you showed this morning, Igor, in having to switch some hard-coded stuff in the extension to get the demos working and things like that: have we considered how we're going to approach the versioning in a way that's not going to cause multi-version incompatibility at the extension/IDE layer?
B: For the extension, first: I've mentioned this in the doc a little bit. I think it would be super cool if the communication between the IDE and whatever GitLab instance is also gRPC, because that makes it easy to not do versioning. If the thing sends information that their GitLab instance understands, great, you'll get better results. If it doesn't, then that's too bad.
B: We'll work with what we do know. And the cool thing, I think, would be that, in theory, with protobuf v3 the GitLab instance doesn't need to know about everything that it receives. Like, if we keep the...
B: ...if we reuse the same protobuf specification across the three hops, we can have the editor extension call out to GitLab with everything it knows. GitLab can add stuff, but it doesn't need to open up what it reads from the IDE. So that means that if the messages are well-formed protobuf, the AI Gateway will just get them, even if the GitLab instance in between doesn't know about them.
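
That pass-through property is checkable with a short script. A minimal sketch, assuming a recent protobuf runtime (where message_factory.GetMessageClass exists); the message names and field layouts are invented for the demo:

```python
from google.protobuf import descriptor_pb2, descriptor_pool, message_factory


def make_class(filename, field_names):
    """Build a demo.Request message class with the given string fields."""
    pool = descriptor_pool.DescriptorPool()
    fdp = descriptor_pb2.FileDescriptorProto(
        name=filename, package="demo", syntax="proto3")
    msg = fdp.message_type.add(name="Request")
    for number, fname in enumerate(field_names, start=1):
        msg.field.add(
            name=fname, number=number,
            type=descriptor_pb2.FieldDescriptorProto.TYPE_STRING,
            label=descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL)
    pool.Add(fdp)
    return message_factory.GetMessageClass(
        pool.FindMessageTypeByName("demo.Request"))


# The editor speaks a newer schema than the GitLab instance in the middle.
NewRequest = make_class("new.proto", ["prefix", "open_tabs"])
OldRequest = make_class("old.proto", ["prefix"])

wire = NewRequest(prefix="def foo(", open_tabs="app.py").SerializeToString()

# The middle hop parses with the old schema: open_tabs is an unknown field,
# but proto3 keeps unknown fields and writes them back out on re-serialize.
middle = OldRequest.FromString(wire)
assert middle.SerializeToString() == wire
assert NewRequest.FromString(middle.SerializeToString()).open_tabs == "app.py"
```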
B: ...yeah, yeah, if we need a clean cut. But I think for this kind of work, where we're basically just gathering information and then massaging it into a way to present it to whatever, in this case, Google Vertex...
A: I don't know if I'm overestimating or underestimating the abilities, the capacities, of IDE extensions, but I will just double-check, because I know there's VS Code, but there's also JetBrains and others like that. Just double-check that all of them will allow you to add gRPC; I don't know if they just have... yeah.
B: The first iteration doesn't do that. I think we could get the same thing done with just JSON, but then we need to figure out a way, at the GitLab layer, to translate JSON that I don't understand into protobuf that I don't understand.
E: Yeah, I had another question. If we're going to go with gRPC, what does that mean for the implementation of the AI Gateway? The early discussions I saw around that were to kind of repurpose what we have for the model gateway and extend that code base. Is that still the plan? Would we then do a gRPC server in Python, and are the language runtime and the gRPC support a concern to be considered there?
B: But it's something to look into. And for the first iteration of Code Suggestions, I don't think it matters, because I think that's just going to stay REST, so we can make the GitLab instance just a proxy for what the IDE extension is already sending; that stays the same. So the code suggestion service, as it's currently running at codesuggestions.gitlab.com, keeps working for a while while we update the extension. So I don't think it changes anything in the short term.
G: Well, I think maybe I missed the... what do we gain by using gRPC instead of JSON?
B: Easier versioning; easier bi-directional streaming, if we wanted that for a lot of data.
G: I mean, okay, so you keep adding fields and the payloads get larger, and if you want to change the semantic meaning of any field, then you still have to go through a deprecation process, and you'll continue to have that bloat. Especially with early-adoption stuff, where, you know, we have to assume that we're going to include things that we're going to care about in version two, three, four and whatnot.
G: I guess I'm not seeing how that's a win that we get over having fields that can be present or absent in JSON.
B: Nothing, really. We need to add an API between two services that we run, and generally, like a lot of the time at GitLab, we've picked gRPC for that. So the first version of the document was using JSON, and then people said: well, why not gRPC? Okay, okay. I don't have strong feelings.
E: I think one of the potential benefits is that, by means of having a well-defined schema, it kind of solves the schema question. It doesn't solve it completely, because if you ever want to remove fields, you kind of have to define them as being optional, and...
B: I'm not going to argue about the protocol, to be honest. If somebody prefers JSON, I'm very happy to do it that way.
G: I was mainly thinking of it in terms of whether I was missing some benefit, and also the client support that we were talking about a few minutes ago.
A: Yeah, I had a question. Maybe this is not too scalability-related, or I guess it is, because we want to report metrics on it. One thing that I was looking at on Code Suggestions is that, now that we are using external models, one key thing that we have to handle all the time is the length of the input, which is not in characters, because for language models you have to grab your string of characters and divide it into tokens.
A: But the thing is that the tokenizer is different for each model, and we're using an external model, so we don't know what they're using. So what we're doing right now is just saying: let's allow the user to only send up to 2048 characters. And if we look at the logs, we're hitting that limit.
A: But then, when we go to the Google documentation, they have a limit that's bigger than that, and it is not in characters but in token length. And so the problem that we're facing is that, because this is a black box, we are not able to grab the string, tokenize it, and then send the tokens to the model; the model only accepts the string and then tokenizes it internally. So one strategy we could use here would be to just grab some other tokenizer...
A: For example, we've talked about using an OpenAI tokenizer, which is obviously not what Google is using, just to get an estimate, because tokenizers are similar, of how many tokens are contained in the input string, and then we can do things with that. So, for example, Stan was working on something like:
A: If we're under Google's token limit, then we can include some more context, like the import statements at the top of the file, and that should give you better suggestions, because now it has context about what libraries are being used and that kind of stuff. I also needed to report metrics to Prometheus. I'm currently reporting token... sorry, character length, but that's not that useful, because tokens are the real measure.
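
A sketch of that estimating-tokenizer idea, using OpenAI's tiktoken as a stand-in for Google's unknown tokenizer; the 8,192-token budget is an assumed figure for illustration, not the provider's documented limit:

```python
import tiktoken

encoder = tiktoken.get_encoding("cl100k_base")  # not Google's tokenizer!

def estimated_tokens(text: str) -> int:
    """Rough token count, suitable for a Prometheus metric."""
    return len(encoder.encode(text))

def truncate_to_budget(text: str, budget: int = 8192) -> str:
    """Keep only the latest tokens, mirroring how the provider reportedly
    drops the earliest ones when the input is too long."""
    tokens = encoder.encode(text)
    return encoder.decode(tokens[-budget:])

prompt = "import os\n\ndef main():\n    " * 1000
print(estimated_tokens(prompt))           # estimated tokens in the input
print(len(truncate_to_budget(prompt)))    # characters actually sent
```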
B: Where does the truncation happen now, at the AI model gateway or on the VS Code side?
A: At the Gateway. So now we are... wait, actually, I don't know. I know this is what they were discussing here.
A: They were saying: oh, it's 2048 UTF-8 characters, which, depending on encoding, translates to more or less bytes. Yeah, I don't remember at the moment. What I can say for sure is: I don't know if it's happening in the editor or in the Gateway, but we're sending at most 2048 characters to Google, and we have a much larger capacity that we're just not using. One thing that we have seen is that... well, we haven't hit the upper limit in Google.
A: We tried sending like a thousand, sorry, 10,000 characters, and Google didn't complain. It just apparently dismisses some of the earlier tokens that you send and only processes the (what was the number?) 8,000 latest tokens that you send. So I was also looking at... I don't know... here it is. I was also looking at, maybe this is not the best, but there's also this tokenizer.
A: It might be a better fit. I have very little to go by, except that Google publishes models that say that they are...
A: ...that way. But yeah, one of the things that I still can't wrap my mind around is that this is, in any case, very inefficient, because we're going to tokenize, but that's not going to influence in any way what the model does, because the model is just going to take the whole string. So it's kind of unnecessary, but that's the best way we have to measure things on our side, now that we're using a model that we don't control.
G: Okay, got it, got it. And with regard to the tokenization: I kind of didn't follow at what point in the data flow tokenization happens. Does the model consume the tokens, or does it actually do the tokenization itself?
A: So when we had our own model, what we did was take the input string, and then, before passing it to the model, there were three steps: pre-processing, the model, and post-processing. In the pre-processing we took the string and changed it into tokens, and that was what the model processed.
A: Yes, yeah. Definitely, there's never going to be a case where a character is divided into multiple tokens. Yeah, okay. And if you look at that page that I was showing, it does say that for the models that we're using, a token is roughly four characters. So a beginning heuristic would be to just divide your input string length by four, but I think we can do better, maybe.
G: So, this is again about the memory bloat that we're seeing on Dedicated. We at this point have two code paths that are associated with freeing this bulk memory when we terminate a single individual socket. Let me make this a...
G: When we kill the... so these are two code paths in the kernel for freeing socket buffers. This one is what we'd use to assassinate a particular socket by injecting a TCP reset into the stream. The context here is: we've got a long-lived TCP connection where the client has set its receive window to zero bytes, and therefore we're not allowed to send any data to it.
G: But the server is still producing data that it wants to send to the client, and so it accumulates an ever-growing backlog of data in this socket, which is why it accumulates a lot of memory usage. And when we kill that session, this is the specific code path that we get for releasing those socket buffers. So at the top you can see it's got the generic page-freeing code, and in the middle here we can...
G: ...we can see that when we use tcpkill, a small utility that just injects TCP resets into an existing socket, using the sendto syscall to do so, the way that ends up actively freeing the socket buffer pages during that syscall... we get to see that here.
G: It processes the backlog of received packets in that direction, detecting the reset and purging the write queue, which is related to, but not identical to, the transmit buffer. And so that's how we end up getting to free a bunch of this bulk memory.
G: This particular example was from a single socket that had over 100 megabytes of data accumulated in its backlog, which is way more than what the limit should have been. And then the more generic path is when we have a clean shutdown that uses a FIN exchange instead of a hard TCP reset.
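
As a sketch of how such sockets can be spotted from user space: a peer advertising a zero receive window shows up as an ever-growing Send-Q in Linux ss output. The 10 MB threshold here is arbitrary, not the limit discussed above:

```python
import subprocess

THRESHOLD_BYTES = 10 * 1024 * 1024  # arbitrary "way too backlogged" cutoff

def backlogged_sockets():
    # With a state filter, `ss` prints: Recv-Q Send-Q Local:Port Peer:Port
    out = subprocess.run(
        ["ss", "-tn", "state", "established"],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 4 and int(fields[1]) > THRESHOLD_BYTES:
            yield fields  # candidates for inspection (or for tcpkill)

for sock in backlogged_sockets():
    print(sock)
```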
G: I'd been hunting for a while to find the exact allocation path and the exact free paths, and I was pretty confident that there were going to be at least two relevant cases for freeing pages, and we got both of them yesterday, so that was good confirmation. We're going to do a kind of summary write-up on this soon, maybe today, maybe tomorrow, so there'll be some writing that gives more context around it.
G: Let's talk about that; I would love to get your opinion on this, Bob. So, as mitigations, we're going to do the sysctl adjustment to stop using compound pages, because that greatly reduces, but doesn't eliminate, the bloat rate.
G: Periodic restarts of nginx are probably going to be the simplest Band-Aid. And as a long-term strategy, we're thinking about asking our development team to give websockets an upper bound on how long they will patiently wait when the server sends pings to the client and the client isn't responding. Because the reason these sockets are staying held open for so long is that there's a TCP proxy layer, probably a firewall or other network security device...
G: ...that's injecting TCP keepalive packets into the TCP stream. But it can't do that within the TLS tunnel, which is how the web traffic gets carried.
G: So if we, at the websocket layer of the protocol stack, require clients to respond, the machine in the middle isn't going to be able to spoof that, and so that would be a reliable way to detect whether or not the client really is there. And if that's feasible, we could set that as a configurable timeout with a really generous upper bound, as long as it's not days, which is how long it takes to fill up memory. And this would help out.
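
A minimal sketch of that application-layer liveness check, using the Python websockets library rather than whatever the actual app stack is; the interval and timeout values are placeholders, not tuned recommendations:

```python
import asyncio

import websockets

async def handler(ws):
    # Echo loop; stands in for the real application traffic.
    async for message in ws:
        await ws.send(message)

async def main():
    # The server pings inside the (TLS-protected) websocket stream, so a
    # middlebox spoofing TCP keepalives cannot answer on the client's
    # behalf. A peer that never pongs is dropped after ping_timeout
    # seconds, far sooner than the days it takes to exhaust memory.
    async with websockets.serve(
        handler, "0.0.0.0", 8765, ping_interval=20, ping_timeout=60
    ):
        await asyncio.Future()  # run forever

asyncio.run(main())
```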
G: We were kind of brainstorming yesterday, and that was kind of where we were leaning. We don't think it's going to be feasible to deal with this at the nginx layer, because... well, anyway, I don't want to take up too much time.
G: We're pretty confident that dealing with it at the nginx layer is not going to be feasible, for multiple reasons, but doing it within the websocket stream seems like it would be a significant win. And it does look like it's the websocket TCP connections that are responsible for the large majority of this memory leak, possibly all of it.