From YouTube: Mimir Community Call 2023-07-27
A
So hi everyone, we are doing our monthly community meeting. We are really doing it this month because we skipped the last one; it was only Peter and me on the call, so we decided to just skip it. Today we have some points, some news around the new upcoming version coming out, and maybe we can discuss some of them. The first point is about the Helm chart; I guess you added this one.
B
Yeah, this was a surprise release. I guess we surprised everybody; we surprised ourselves a little bit as well. So in Kubernetes we are dealing with Pod Security Policy objects, and apparently Helm is not great at handling deprecated APIs, meaning that if you have a deprecated object in your Helm release and then you upgrade Kubernetes to a new version that doesn't have that API anymore.
B
Then the Kubernetes upgrade will work, but you cannot upgrade the Helm chart anymore, because Helm basically keeps a history of what it installed, and the history will contain the deprecated object, which is impossible to handle at that point. So basically you have to make sure that you remove the Pod Security Policy objects from your release, or from the deployment, before doing an upgrade to the new Kubernetes version. So that's basically what we did.
B
Down
well,
yes,
it's
in
the
reason,
not
sorry
it's
1.25,
so
1.25
removed
pod
security
policy
and
we
don't.
We
don't
render
Port
Security
Police
objects
on
1.25
since,
like
a
year
now,
but
apparently
that's
not
enough
for
him
for
him.
You
need
to
stop
rendering
it
in
the
on
some
previous
Series.
B
So
you
have
to
not
have
that
even
before
the
upgrade
of
kubernetes,
so
basically
what
we
did
at
5.0,
we
were
forced
to
do
this
major
release
and
this
major
release
basically
stops
rendering
or
security
policy
on
1.24
already,
which
is
a
breaking
change.
But
you
can
force
it
to
do
that.
But
you
have
to
be
aware
that
you
will
run
into
issues
with
kubernetes.
B
That said, this API has been deprecated for a long while now, so I don't think it causes any issues for anybody, and also the Helm chart can run within the restricted Pod Security admission control for Kubernetes, so you can switch to that admission control. So it shouldn't be a big deal, but according to the rules of Helm versioning and compatibility we had to do the 5.0, and it contains this removal. So that was 5.0.
A
And let's talk about the upcoming release. We are still deciding the date for this release; we are talking about switching to a quarterly release, rather than the six weeks that we've been doing up until now. So we will announce soon, probably in the community Slack, when we are going to do the transition.
A
Document
and
some
news
we
have
here-
I-
don't
know
who
has
some
context
on
this,
because
I
have
no
context
any
of
the
changes.
B
Yeah, I've added the changes, but the changelog is huge, so I tried to select some interesting ones. The first one is a minor thing, which is just conforming to our, you know, policy to always be Prometheus API compatible, so they adopted this filtering of the rules API. And then I listed a bunch of experimental features, and maybe Marco will be much better at talking about those.
D
Yeah, sure, okay. So the next one in the list is that we've added experimental support to cache the query responses for the cardinality API endpoints and the label names and label values API endpoints. Now, this caching is very, very simple and it works like a CDN cache: given the same input parameters, we cache the response for a short period of time. This cache is not invalidated when the data changes in Mimir.
D
So
the
idea
is
that
you
may
configure
this
cache
with
a
short
TTL
we've
rolled
out
these
caches
in
production
at
profound
labs,
with
a
TTL
of
one
minute,
so
very
short,
but
we've
seen
a
pretty
good
benefit.
Just
to
give
you
an
idea,
30
of
the
label
name
and
label
values.
Api
in
point
are
now
picked
up
from
the
the
cache.
Even
if
the
you
know,
the
TTL
is
is
pretty
short.
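A minimal sketch in Go of the CDN-style response caching described here, assuming a cache keyed only on the request parameters with a short TTL and no invalidation when data changes; all names are illustrative, not Mimir's actual code.

package main

import (
	"sync"
	"time"
)

// cachedResponse pairs a serialized API response with the time it was stored.
type cachedResponse struct {
	body     []byte
	storedAt time.Time
}

// resultsCache caches cardinality / label-values responses keyed on the request
// parameters, CDN-style: the same parameters within the TTL are served from the
// cache, and entries are never invalidated when data changes, they simply expire.
type resultsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cachedResponse
}

func newResultsCache(ttl time.Duration) *resultsCache {
	return &resultsCache{ttl: ttl, entries: map[string]cachedResponse{}}
}

// get returns the cached response for key if it is younger than the TTL.
func (c *resultsCache) get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || time.Since(e.storedAt) > c.ttl {
		return nil, false
	}
	return e.body, true
}

// put stores a response; the key would typically combine the tenant, the
// endpoint, and the request parameters (label matchers, time range, and so on).
func (c *resultsCache) put(key string, body []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cachedResponse{body: body, storedAt: time.Now()}
}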
D
The typical example is Grafana dashboards: they call the label values API endpoint to populate the drop-down menu to select, I don't know, the cluster name or the namespace (the variables, basically). And if multiple people open the same dashboard within one minute, or if the same person refreshes the dashboard within one minute, the second time it will be picked up from the cache. Someone raised their hand.
D
We haven't noticed any issue with the latest version, so hopefully it will turn stable relatively soon, maybe the next release. By next release I mean 2.11, not the one that we're going to publish. Any question on this? Otherwise we can move to the next one.
D
The next one is another experimental feature, and the idea is to protect the write path in the ingesters by rejecting queries if we detect that the ingester is overloaded. When this feature is enabled (by the way, it's disabled by default), you can configure a CPU utilization and a memory utilization threshold. The ingester continuously monitors the process CPU and memory utilization, and if the utilization is above the configured threshold, it will start rejecting query requests but will keep ingesting write-path data.
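A rough sketch of the kind of check described here, assuming the ingester samples its own CPU and memory utilization (the CPU figure smoothed, for example with the exponential moving average mentioned later in the call) and rejects read requests while above the configured thresholds; the names are illustrative, not Mimir's actual implementation.

package main

// utilizationLimiter is a sketch of per-ingester read-path protection: queries
// are rejected while the process is above the configured CPU or memory
// thresholds, but the write path keeps accepting samples.
type utilizationLimiter struct {
	cpuLimit    float64 // smoothed CPU utilization threshold (e.g. cores)
	memoryLimit uint64  // memory utilization threshold in bytes
}

// canServeQuery reports whether a query should be accepted given the current
// smoothed CPU utilization and memory usage of the process.
func (l *utilizationLimiter) canServeQuery(cpuUtilization float64, memoryBytes uint64) bool {
	if l.cpuLimit > 0 && cpuUtilization >= l.cpuLimit {
		return false
	}
	if l.memoryLimit > 0 && memoryBytes >= l.memoryLimit {
		return false
	}
	return true
}

// The CPU figure would be smoothed, for example with an exponential moving
// average over periodic samples: ema = alpha*sample + (1-alpha)*ema.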
D
It's something we are still playing with. We haven't fully rolled it out to production yet; we are still doing quite a lot of testing on this feature. We have done some load testing to see how effective it could be, and it looks like it's working as designed.
D
I
mean
if
the
injection
is
overloaded
because
of
some
heavy
queries.
We
can
prevent
the
Injustice
from
being
either
overloaded
or
even
booming
at
the
cost.
Obviously,
of
starting
rejecting
queries.
D
Rolling
out
to
production
the
memory
based
limit,
we
haven't
rolled
out
the
production,
the
CPU
based
limit,
because
we
are
still
observing
some
edge
cases.
We
want
to
to
improve
in
how
we
compute
the
CPU
utilization.
We
are
currently
using
the
so-called
exponential
moving
average,
but
yeah
I
expect,
in
a
couple
of
releases
to
be
to
be
stable
and
ready
to
use
for
in
production
for
for
everyone,
any.
D
Yeah, the next one is something I worked on, and it's what we call the TSDB head early compaction.
D
As
you
know,
the
most
recent
series
data
is
kept
in
the
ingester
memory
and
specifically,
the
data
structure
where
the
series
are
stored
is
the
tsdp
head,
which
is
basically
an
in-memory
data
structure
inside
tsdb
and
then
every
two
hour
we
run
the
so-called
gstb
head
compaction,
which
takes
all
the
samples
in
the
tsdb
head
with
a
timestamp
between
minus
three
hours
and
minus
one
hour
ago
and
compact,
a
new
block
which
is
stored
on
disk
and
uploaded
to
the
object
storage.
D
This
means
that
if
you
have
a
spike
in
the
in
the
in-memory
series,
it
can
take
up
to
three
hours
before
this
series
are
compacted
into
a
tstv
block
and
the
number
of
in-memory
series
drop.
D
Now
the
idea
of
the
family
compaction
is
to
add
the
another
dimension
or
another
trigger
to
decide
when
to
compact
the
tstp
head
before.
It
was
just
by
time
every
two
hour
regularly
with
the
tsdb
head
compaction,
it
all
other
than
the
regular
to
our
compaction.
D
We
can
also
compact
by
space
and
what
I
mean
by
space
is
when
we
detect
that
the
the
number
of
in-memory
series
grows
above
a
configured
threshold,
but
the
number
of
active
series
is
significantly
lower
than
the
number
of
in-memory
series
and
that's
mean
that
we
could
drop
the
in-memory
series.
If
we
trigger
an
early
compaction,
then
what
we
do
is
triggering
an
early
compaction.
So
basically
we
compacted
all
the
series
up
until
20
minutes
ago
and
20
minutes
is
not
a
random
number.
It's
actually
the
the
active
series
threshold.
D
So
since
this
logic
is
based
on
an
estimation
of
the
number
of
series
we
may
drop
if
we
trigger,
if
we
trigger
an
early
compaction
to
have
this
estimation
accurate,
we
use
the
active
series
tracker
to
detect
the
actual
number
of
active
series
over
the
past
20
minutes
and
then,
when
the
early
compaction
trigger.
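A simplified sketch of the trigger just described, under the stated assumptions: compact early only when the in-memory series count exceeds the configured threshold and the active series estimate (from the active series tracker, over the last 20 minutes) is significantly lower, so compacting data older than 20 minutes would actually free series. The function name and the "significantly lower" ratio are illustrative, not Mimir's actual code.

package main

// shouldTriggerEarlyCompaction decides whether to compact the TSDB head ahead of
// the regular 2-hour schedule. inMemorySeries is the current head series count,
// activeSeries is the estimate from the active series tracker (last 20 minutes),
// threshold is the configured per-ingester limit, and minReductionRatio is an
// illustrative knob for "significantly lower" (e.g. 0.15 = expect to drop >= 15%).
func shouldTriggerEarlyCompaction(inMemorySeries, activeSeries, threshold uint64, minReductionRatio float64) bool {
	if inMemorySeries < threshold {
		return false // under the limit, wait for the regular compaction
	}
	if activeSeries >= inMemorySeries {
		return false // nothing would be dropped by compacting old data
	}
	estimatedDrop := float64(inMemorySeries-activeSeries) / float64(inMemorySeries)
	// Compact everything up to 20 minutes ago only if it's worth it.
	return estimatedDrop >= minReductionRatio
}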
D
Then, when the early compaction triggers, we compact all the series data until 20 minutes ago. In the agenda I've shared a couple of screenshots, just to give you an idea. Here you can see one single Mimir cluster deployed multi-zone: we enabled the TSDB early compaction in one zone and kept it disabled in the other zones, so the same exact data is stored across the ingesters.
D
You
can
see
that
without
the
helicopaction
sorry,
the
early
compaction
trigger
was
set
to
2
million
in-memory
series
pairing
gesture.
So
without
the
Hurley
compaction
in
just
a
real
memory,
series
grows
up
to
2.5
million,
with
the
in
with
the
early
compaction
enabled
we
see
that
it
keep
it
push
the
in-memory
series
down
close
to
the
2
million
threshold,
because
whenever
the
ratio,
the
number
of
in-memory
series
goes
about
this
2
million,
it
checks
if
there's
an
opportunity
to
trigger
an
early
compaction
again.
D
If
the
number
half
active
series
is
significantly
lower
than
the
number
of
in-memory
series
and,
if
so,
trigger
heneral
compaction
to
push
down
the
in-memory
series.
F
So
I
guess
the
the
goal
here
is
to
to
reduce
these
spikes
of
of
in-memory
series
to
produce
memory
usage
right.
Is
it
does
it
only?
Does
it
like
compute
these
early
Confections
just
to
get
back
below
the
threshold
or
does.
D
I mean, I guess, yeah, it drops just below; it's just enough. To answer the question about the views here: here we're just looking at the ingester with the maximum number of in-memory series.
D
We don't have a long history of results to show yet, but so far it's working as expected, and it's helping us keep the ingester memory utilization, which on the write path is mostly driven by the in-memory series, under control whenever there's a customer with a high series churn rate, so they create series which live for a short period of time, like in this case. Here you can see that the blue line is the maximum number of active series across the ingesters.
E
Was the 20 minutes based on a previous architecture of the storage with chunks, or was there a different reason for that number?
D
I think it's related to how it was built at Grafana Labs historically.
D
Okay, back to myself; hopefully the last one from me. Yeah, this is something done by Yuri, another engineer working at Grafana Labs.
D
If you run Mimir, you've probably noticed that the number of in-memory series between ingesters is not perfectly balanced.
D
Now
we
spent
quite
a
lot
of
time
investigating
why
the
number
of
in-memory
series
are
not
perfectly
balanced
between
between
investors,
and
there
are
a
couple
of
reasons
one,
which
is
the
one
we
addressed
is
related
to
the
Token
ranges.
Basically,
the
ranges
of
the
tokens
assigned
to
each
ingester
inside
the.
D
Tokens are generated randomly, and even if, you know, with an infinite number of random numbers you may get a fair distribution, in the specific case of Mimir, where we generate 512 tokens per ingester, you may end up with some imbalance in the number of tokens owned by every single ingester.
D
The
second
reason
is
actually
related
to
shuffle
sharding
and
do
how
it
works,
and
we
don't
have
a
solution
for
that
yet
so
the
problem
we
solved
is
the
imbalanced
token
ranges
registered
by
Injustice
in
the
ring.
So
if
you
don't
use
Shuffle
sharding
with
the
new
token
generation
strategy,
which
we
call
the
spread
minimizing
strategy,
you
may
get
almost
perfectly
balanced
series
between
Injustice.
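To illustrate the difference, here is a toy sketch, not Mimir's actual algorithm: random token generation can leave ingesters owning uneven slices of the ring, while a spread-minimizing approach hands out evenly spaced, interleaved tokens so each ingester owns an almost equal share. The function names and the interleaving scheme are assumptions for illustration only.

package main

import "math/rand"

const ringSize = 1 << 32 // token space of the ring (uint32)

// randomTokens is the classic strategy: tokens drawn uniformly at random, which
// in practice leaves some ingesters owning noticeably more of the ring than others.
func randomTokens(n int) []uint32 {
	tokens := make([]uint32, n)
	for i := range tokens {
		tokens[i] = rand.Uint32()
	}
	return tokens
}

// evenlySpreadTokens is a toy "spread-minimizing" variant: ingester ingesterID out
// of numIngesters gets n tokens spaced evenly around the ring, offset so that
// different ingesters interleave, keeping ring ownership nearly perfectly balanced.
func evenlySpreadTokens(n, ingesterID, numIngesters int) []uint32 {
	tokens := make([]uint32, n)
	step := uint64(ringSize) / uint64(n*numIngesters)
	for i := range tokens {
		tokens[i] = uint32(uint64(i*numIngesters+ingesterID) * step)
	}
	return tokens
}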
D
If you use shuffle sharding, like we do in many of our production clusters, you will still experience an imbalance. So we introduced, again, a new strategy which we call spread-minimizing. In the agenda you can see the screenshot of a production cell, sorry, production cluster, before and after the migration to this new strategy. Before, we had about 20 percent imbalance of in-memory series between ingesters.
D
Of
in-membraces
between
investors,
after
migration,
modulating
to
the
new
tokens
generation
strategy,
we
dropped
the
imbalance
to
below
0.5
percent.
If
you
don't
use
Shuffle
sharding,
like
you,
have
a
single
tenant,
for
example
in
your
cluster
or
a
few
tenants,
but
you
don't
need
a
shuffle
sharding.
Then
you
may
consider
migrating
to
this
strategy
was.
B
This has been in the product for several releases now, but now we see an uptick in people starting to use it, and when profiling this endpoint we noticed there are some things that are not very optimal. So there were a couple of optimizations done in the memory utilization and the algorithms that we use. And the reason we have to do this is that the OTLP endpoint actually converts everything that you send to it in the OpenTelemetry format into Prometheus metrics: Prometheus series and metadata and everything.
B
So
it
is,
it
is
doing
a
transformation
from
open
Geometry
to
to
promote
use.
Metrics-
and
another
thing
to
note
note
here,
is
that
it
already
supports
the
open,
Telemetry
exponential
histograms,
which
is
almost
but
not
exactly
the
same
as
Prometheus
native
histograms.
So
we
do
a
translation
for
that
as
well,
and
that
translation
was
missing
a
feature
which
is
the
dark
scaling.
B
So
there
is
a
feature
of
feature
difference
between
exponential
histograms
and
Native
programs,
which
is
that
the
open
gravity,
exponential
histograms
can
have
any
resolution.
So
basically,
the
buckets
that
you
had
in
those
histograms
can
be
arbitrarily
small,
but
for
practical
reasons,
the
promoters
native
histograms
restricted
to
to
a
certain
value.
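As a rough illustration of what the downscaling does (this is the general exponential-histogram rule, not Mimir's exact code): lowering the scale by one halves the resolution, and each pair of adjacent buckets at the higher scale collapses into one bucket at the lower scale, so a histogram that is too fine-grained can be reduced until it fits the maximum scale the target accepts. Function names are illustrative.

package main

// downscaleBuckets merges exponential-histogram buckets when lowering the scale
// by one: bucket index i at scale s maps to index i>>1 at scale s-1, so pairs of
// adjacent buckets collapse into one. counts maps bucket index -> count.
func downscaleBuckets(counts map[int]uint64) map[int]uint64 {
	merged := make(map[int]uint64, (len(counts)+1)/2)
	for idx, c := range counts {
		merged[idx>>1] += c // arithmetic shift also floors negative indices
	}
	return merged
}

// downscaleToMaxScale repeatedly halves the resolution until the histogram's
// scale is within what the target (e.g. Prometheus native histograms) accepts.
func downscaleToMaxScale(counts map[int]uint64, scale, maxScale int) (map[int]uint64, int) {
	for scale > maxScale {
		counts = downscaleBuckets(counts)
		scale--
	}
	return counts, scale
}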
B
That downscaling was added in this release. So if you run into that use case where you, for example, try to use span metrics generated through OpenTelemetry, those span metrics would be too high resolution, especially when you start the measurement, and you would be losing those because we would reject them. But now that the downscaling works, you will get those metrics.
B
Now, there's still some discrepancy between OpenTelemetry and Prometheus regarding histograms, because we don't support the delta temporality, only the cumulative temporality histograms, but so far that hasn't been an issue, and there's already a processor in the OpenTelemetry Collector to actually convert from delta temporality to cumulative. So I'm not sure there will be support for that in the future.
B
Yeah, that's what happens. The specific use case where this could be a bit tricky is when your server is autoscaled and you suddenly start accumulating your, you know, delta temporality metric in a different server than before the scaling. But then again, histograms support detecting counter resets and such things, so it should work. Like I said, we are not quite sure yet if we need to do anything with this; I'm just mentioning it here to get some feedback.
B
If somebody runs into it, then, you know, please tell us and reach out. All right.
C
One
sorry
to
interrupt
only
one
note
on
this
point:
it's
slightly
related
with
orthogonal,
though,
which
is
that
in
Prometheus
we
are
about
to
merge
native
Auto
ingestion
as
well.
So
it's
like
similar
to
what
memory
is
doing
and
I
wondered
if
some
work
could
be
reused
or
if
there's
something
to
be
shared
here,
I
don't
know,
because
I
haven't
looked
in
detail
to
Mimi's
implementation
of
the
all
the
ingestion,
but
just
so
that
you
are
aware
that
this
is
happening.
B
Right,
though
yeah
the
name,
your
implantation
is
just
using
the
open,
termite,
contribute
Poland
or
contribute
GitHub
wrap
up
basically
GitHub
project.
Okay,.
D
Yeah
Jesus,
can
you
share
some
details
about
how
will
work
in
Prometheus.
C
So
it's
still
very
early.
Basically
what
we've
done.
We've
talked
with
the
hotel,
Community
folks,
and
so
they
have
in
the
other
collector.
They
have
code
for
remote
writing
into
Prometheus.
So
we've
basically
copied
that
code
to
create
our
own
native
endpoints,
and
the
idea
is
to
improve
support
for
auto
metrics.
C
So
the
first
step
was
just
copying
the
code.
Eventually,
it
will
will
be
removed
from
the
hotel
collector,
but
only
after,
like
we've
reached
a
certain
degree
of
stability
on
the
endpoint
yeah,
so
those
are
more
or
layers
the
state
chart
in
which
we
are
right.
Now.
B
You know, we never promise any release numbers, but these are coming sometime. So one big chunk of work that we're working on, which is really work in Prometheus actually, is the support for out-of-order ingestion of native histograms. The only thing I would say about it is that out-of-order isn't a trivial thing to solve, and native histograms aren't trivial either: they are a new data structure and there are some details there that are quite deep, and we are trying to match the two together.
B
So
this
is
some
quite
important,
but
we
are
working
on
it
and
it's
going
to
come
in
some
pull
up
release.
B
Another
thing
I
wanted
to
mention
is
that
there's
an
open
PRS
for
a
couple
of
months
now
on
mimir
about
adding
Auto
scaling
to
a
couple
of
simpler
components
for
mimir
and
another
engineer,
and
myself
are
working
on
trying
to
get
this
into
the
the
official
ham
chart,
because
now
we
feel
that
enough
time
passed
and
we
have
enough
kind
of
experience
with
with
the
auto
scaling
on
the
Json
net
side
and
in
production,
so
that
we
can.
B
So we are working on that. Yeah, again, I don't have a timeline for it.