From YouTube: Grafana Agent Community Call 2022-08-17
Description
Discuss the operator's future and continuing flow work.
A
Hello everyone, and welcome to the August agent community call. As always, we're happy for anybody to jump in with any questions or comments; if you don't feel like just jumping in, feel free to add them to the chat. We'll go ahead and jump in on our agenda: we have the state of Flow. Robert?
B
Thanks, Matt. I'm trying to think where you want to start. So: we've been running Grafana Agent Flow in our dev clusters now for 13 days. Does that sound right? 4 minus 17 is negative 13, yeah, okay, so 13 days, and things are looking pretty good.
B
We ran into a few issues at first, where it was using too much memory. In fact, let me actually show everyone; that might be more interesting than me just saying it's happening. We have a few new dashboards for Flow, but I'm going to start with...
B
...the operational one, okay. So if I switch over to the agents that are running Flow.
B
We've been running it so long now that I don't think we have data anymore for a comparison view, but really this is pretty comparable to the normal agents. Bytes per series is around six-ish, which is pretty normal.
B
We
have
eight
agents,
but
with
like
two
replicas
each
and
then
across
those
it's
like
five
million
active
series,
so
I
think
we've
been
really
trying
to
hammer
flow
and
get
as
much
value
as
we
can
out
of
it,
and
it
looks
like
it
really
doesn't:
have
that
much
operational
overhead
compared
to
the
normal
agent,
which
is
which
is
good
and
pretty
promising
for
for
its
results.
We
also
have
this
new.
Let's
just
switch
to
light
mode.
That's
fine!
Let's
change
that!
B
We also have this new dashboard for the Flow controller. You can see that we're running eight agents, and across those agents there are 368 components.
B
Every component is healthy, and those components are being updated roughly—how would you say this—one-and-a-half-ish times a second. Most evaluations, where a component gets updated and the graph is re-evaluated, take about 120 milliseconds, which is pretty quick, but sometimes the really big components take a little bit longer to update. We've been trying to get this number down as much as we can, and I think it's in a pretty good state.
B
For those who weren't aware of the way Flow works: you have different components which interact with each other using a declarative config, and if one component updates while another component is referencing that component's config, the component that's referencing it will update itself. So that's what we mean by graph re-evaluation, where all the components that are referencing other components get updated.
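As a rough sketch of the referencing behavior just described (component and argument names here are illustrative assumptions, not necessarily the exact schema that shipped), a River config wires one component's export into another component's argument:

```river
// local.file watches a file on disk and exports its content.
local.file "api_key" {
  filename = "/var/lib/secrets/api-key"
}

// metrics.remote_write references the file component's export;
// whenever the file's content changes, Flow re-evaluates this
// component with the new value.
metrics.remote_write "default" {
  remote_write {
    url = "https://example.com/api/prom/push"

    basic_auth {
      password = local.file.api_key.content
    }
  }
}
```

The reference `local.file.api_key.content` is the edge in the dependency graph: updating `local.file "api_key"` triggers the re-evaluation of everything that reads from it.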
B
So sometimes it's slow, and I think if you come across this, the recommendation would be to try to restructure how your components are organized—maybe to reuse information as much as possible. What we're running right now is a one-to-one translation from the existing config to Flow, which means we have something like eight service discoveries which are all doing the same thing, because that's how Prometheus would work.
B
We have the same targets being relabeled over and over again, because that's how Prometheus would work, and I think that adds up—sometimes relabeling pods takes quite a long time. So all of that can be improved, but we wanted to test a one-to-one match before we tried optimizing into something you couldn't do in Prometheus. Matt?
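The kind of deduplication being described could look something like this (a hypothetical sketch; the component and argument names are assumptions): a single service discovery component is declared once, and several scrape components reference its targets, instead of each scrape running its own identical discovery and relabeling:

```river
// One Kubernetes pod discovery, declared a single time...
discovery.k8s "pods" {
  role = "pod"
}

// ...and referenced by multiple scrape jobs, so the expensive
// discovery work is shared rather than repeated eight times.
metrics.scrape "app_a" {
  targets    = discovery.k8s.pods.targets
  forward_to = [metrics.remote_write.default.receiver]
}

metrics.scrape "app_b" {
  targets    = discovery.k8s.pods.targets
  forward_to = [metrics.remote_write.default.receiver]
}
```

This is exactly the kind of reuse a one-to-one Prometheus translation can't express, since each Prometheus scrape config carries its own service discovery section.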
A
Yeah, and I believe if we actually wrote it Flow-first, instead of converting the config, it would be something like a third to two-thirds smaller—considerably smaller.
B
Oh yeah. We already kind of knew this: because Flow trades off the hierarchy of the current agent for a flatter config, the new config will be a lot bigger, and it turns out to be something like twice as many lines. Although a lot of that is whitespace that you wouldn't really have in YAML, so that size can be cut down even further to be more equivalent to the YAML size.
B
So, should I move on to the fancy UI we're working on?
B
All right. So this might not make it into main—just warning everyone—but the current workflow today would be... oh sorry, here's, like, I'm running an agent right now. Here's its config file, where we have three components: one gets a file from disk, the other one is going to scrape Brian Brazil's Robust Perception demo and forward it to this third component, which is the write-ahead log plus remote write.
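That three-component pipeline might look roughly like this in River (a sketch reconstructed from the description, not the actual demo file; the names, target format, and arguments are assumptions):

```river
// Component 1: read a label value from a file on disk.
local.file "extra_label" {
  filename = "/tmp/extra-label.txt"
}

// Component 2: scrape the Robust Perception demo target,
// wiring the file's content in as a dynamic label.
metrics.scrape "demo" {
  targets = [{
    "__address__" = "demo.robustperception.io:9090",
    "source"      = local.file.extra_label.content,
  }]
  forward_to = [metrics.remote_write.default.receiver]
}

// Component 3: the write-ahead log plus remote write.
metrics.remote_write "default" {
  remote_write {
    url = "https://example.com/api/prom/push"
  }
}
```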
B
So you can see here we're wiring this dynamic label to the file's content. The file content was changed, and that change got merged into the dynamic label of the target. But anyway—this is pretty big, and here's what we found out for our kind of dev cluster workload.
B
This file was 500 megabytes, and it was so big that you couldn't even load it into a browser. The reason it's so big is all of those repeated service discoveries: we have eight service discoveries, all of them finding 22,000 pods, so that's 22,000 pods being listed in a fancy indented format, eight times. It's a huge file, so we're trying to find ways to work around that, which the UI I'm going to show you in a second will help with. The other kind of debug endpoint is this graph, which uses Graphviz to render out the dependency graph. It's really basic, but taking the lesson that this one giant file is way too big, Matt Durham and I have been working on a UI to explore what a better world might look like.
B
So I'm running the UI right now with mock data—this isn't real data, it'll be replaced with real data soon, but this is just fake data for now. You can see the main page shows a list of components, where I'm running four, and I kind of faked it so they're in all four states—local file is healthy, this one's unhealthy, this one's exited—and aside from this view, we also have the DAG view, which is a similar view of the graph.
B
But now it's laid out with the health of those components, and you can kind of see if something is feeding into an invalid state, or if something might not be healthy because it's depending on something that's unhealthy. Here the arrows mean reference direction: the metric scrape for Kubernetes pods is referencing metrics remote_write "default", and remote_write "default" is referencing the API key to write with. So if the API key was unhealthy, that would kind of mean you might have problems soon with the metrics remote write.
B
Does that make sense to everyone? I should also state that the health of a component is independent from its dependencies. So if local file was unhealthy, that doesn't mean anything referencing it is also unhealthy, because health is calculated local to a component. These are all links, so I could click on a component from either here or here to go to the component page. I'll click on—I forget which one I mocked up.
B
I'm trying to make it look like a real demo. I think it's this one, the metric scrape. So here's the component page. At the top we have the raw River block for what that component is, with fancy syntax highlighting: its targets are coming from Kubernetes service discovery, and it's forwarding metrics to the remote write receiver. You can see that its evaluated arguments are this list of targets, which only has the one fake target in it, and it's forwarding to a River capsule value.
B
We might make that a little bit nicer in the future; right now it just kind of says, "hey, this is a value we don't really know how to represent to you." Inner blocks get shown as indented config, so the job name is being put in a separate section here. Then, at the bottom, we have the components that this component is referencing, and the health of those components, which might be interesting for figuring out...
B
...whether the whole chain is working. On the left we have the navbar. Each of these links—this page will probably get pretty big, so you could click on... you know, right now I'm not scrolling it, but if I resize my window, you could click to go to each section, jump from the nav on the left, which is helpful. Okay. So there are some extra pages we haven't—oh, that's a bug with the z-index—there are some pages we haven't mocked out yet.
B
We're also expecting to have runtime and build information, like Prometheus would show you—right now that's an empty page—and also to show the command-line flags that were used to launch the agent; right now that's an empty page too. Same thing with the config file: this would be the raw, unevaluated, entire config file that the agent currently has loaded successfully into memory.
B
I think all of these things will be generally pretty useful for the agent—really, this would have been nice to have today—but with Flow it's a lot easier, because we can have these generic pages for something like this. And if we wanted a more specific, customized page for what it means for a metric scrape to be shown to a user, then this could be a different rendering.
B
One that maybe expands this targets argument out into its own table or whatever for debugging. But I think, because of the component-based concept, we were able to make this UI more easily than we would have been able to with the agent today, given that the agent today is just a conglomeration of different things.
C
I think my question is more about Flow in general. I'm seeing a lot of things that are not compatible with the agent as it is now. Is Flow going to replace the agent as we know it in the near future, or what are our plans?
B
Near future, no. Okay, so the plan is to try to launch Flow in an agent release next month, before the next community call.
B
We need a way to do clustering with Flow, you know, to kind of replace the scraping service; there's a lot of work that needs to be done. So this path of Flow becoming the default and only path depends on, one, people liking Flow, but then, two, a lot of time. So I think it'll probably be about a year before the existing agent as it is today goes away.
B
Yes—I mean, not necessarily. There will be functionality that, you could say, only makes sense in Flow. Like, I prototyped a component to get keys from Vault and expose those secrets to other components that might need Vault secrets. That's something we really couldn't do within the agent today, because it really depends on this idea that some part of the agent can reach out and ask another part of the agent for a value, and have those things be wired together.
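A hypothetical sketch of what that kind of wiring could look like (the Vault component described was only a prototype, so every name and argument below is an assumption):

```river
// Hypothetical component that fetches a secret from Vault
// and exposes it as an export to the rest of the graph.
remote.vault "prom_creds" {
  server = "https://vault.example.com"
  path   = "secret/data/prometheus"
}

// Another component asks the Vault component for the value;
// Flow wires the two together and re-evaluates this component
// whenever the secret changes.
metrics.remote_write "default" {
  remote_write {
    url = "https://example.com/api/prom/push"

    basic_auth {
      password = remote.vault.prom_creds.data["password"]
    }
  }
}
```

The point is the wiring, not the specific component: one part of the agent exports a value, and any other part can reference it declaratively.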
B
But for most things—the fact that we're working on Flow does not mean we stop working on the current agent, and the way Flow is built, there's a lot of code being shared. So bug fixes for how metrics get scraped will impact both Flow and the agent by default. We should keep doing it like that: we should not be duplicating code right now, we should be trying to share as much as we can.
C
Yeah, I'm going to drop a link here—wow, where is chat? There it is. Basically, a number of months ago we made an RFC titled "let's deprecate the agent operator," and that caused a lot of conversation and, frankly, a lot of concern that we're not committed to the agent operator or that it's going to go away. It turns out people generally like it for what it is, so we just made a document here to express our commitment to the operator and talk about our plans.
C
The primary thing people said they really like about the operator is that it lets them use the Prometheus CRDs—PodMonitors, ServiceMonitors, that kind of thing. It's just a declarative way to monitor cluster resources, and people really like that. Our immediate plan is to bring that into the Grafana agent itself, so that the agent itself would be watching those custom resources, generating the scrape configs internally, and doing the service discovery—and this works really well as a Flow component.
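In Flow terms, the idea being described is a component that watches the Prometheus custom resources and emits targets directly; a purely hypothetical sketch (no such component existed at the time of this call, so everything here is an assumption):

```river
// Hypothetical component watching PodMonitor custom resources
// and turning them into scrape targets inside the agent itself,
// with no external operator generating config.
discovery.podmonitors "all" {
  namespaces = ["default", "monitoring"]
}

metrics.scrape "from_crds" {
  targets    = discovery.podmonitors.all.targets
  forward_to = [metrics.remote_write.default.receiver]
}
```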
C
Maybe it would need to be back-ported into the—I don't know what we're calling the existing agent—but the plan will be: if we implement that—not service discovery, the custom resource discovery—in the agent, so that it's handling all of that, then a large part of the operator, the part that's reloading the agent...
C
...every time you deploy a new pod, is simplified a lot, and the operator then becomes primarily for deploying the agent itself. That's still an important job, and people are using it for that—I think particularly where there's a complicated sharded config, or you have a lot of integrations, or you have multiple agent deployments.
C
The operator can be really useful to declaratively define what Grafana agents you want—there are DaemonSets or StatefulSets or Deployments that can get pretty complicated, so the operator is a good way to do that—but it's also a lot more static config. So at that point we may be able to assess alternatives; we may say, "oh, this Helm chart does this really well." But that's not to say we're ever going to say...
C
..."oh, the operator is dead"—unless it turns out the operator really isn't useful, but I think it will be, and we can continue to make it useful. So we've had a lot of issues with the operator: there are some documentation issues, some usability issues, and some maintenance issues. But I think the bottom line is we can address all of those internally, without making a lot of big external changes, and that's really the crux of it.
C
I don't want people to be saying, "oh, it's going to go away, so we shouldn't use it." If it's useful, it'll stay; we're not going to take anything away. We're committed to these use cases of declaratively defining your monitoring, especially the PodMonitors and ServiceMonitors and such—I think those are really valuable. We're going to keep supporting those, we're going to make the best way possible to deploy the agent, and we don't like breaking stuff.
C
So hopefully we can assuage some of those fears and just make this project really good.
A
Since this question's been asked a few times: the Grafana operator is currently labeled as beta. Do we have any idea...?
C
Yeah, I put a little note about that, because that question has come up as well. It is labeled beta. I think it started as an experiment—we weren't sure if it was going to catch on, and it turns out it did catch on; people like it. So it's not beta as in "we're not committed to it."
C
It's beta as in it still needs work. And since I've outlined some major refactorings—I hope they'll all be internal changes, but I don't want to commit us to a style of "we're not going to change anything at all, ever." Certainly any changes would need to be well justified and have a transition, and once those are done, I assume we'd revisit the beta designation. Go ahead, Robert.
B
If something is beta, that means we're more confident about the use case being supported, but the way that use case gets exposed to users might change over time. So if you use the operator because you want the use case of Prometheus CRDs, that is something we have committed to, but the operator as a delivery mechanism could potentially change whenever, right? That is what's in beta.
C
So, TL;DR: bringing the Prometheus operator CRDs into the agent, I think, is pretty uncontroversial—people seem to really like that idea. It means the operator is not required to take advantage of those features, which I think is universally a good thing, and the operator doesn't really need to change externally to handle that. So that's, I think, a good change we're looking at in the near future, and from there it just lets us re-evaluate...
C
What is the agent operator—okay, yeah—what is the operator for? We have a lot of options, and we're going to reassess that, but kind of constantly. The goal is: make it easier to monitor your stuff, particularly with those CRDs that people like, and make it easy to deploy the agent. We're not going to change anything that makes any of those wrong, and we're not going to rip anything out from under people that are depending on it. So.
A
All right, any questions about the operator before we move on?