From YouTube: Ceph Developer Monthly 2023-02-01
Description
Join us monthly for the Ceph Developer meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contrib...
What is Ceph: https://ceph.io/en/discover/
A
We can kick it off, or wait a few more minutes.
A
So I shared the agenda in the chat. I see Ilya on the call, which is awesome, because the first topic is Ilya's.
A
This first topic is about test failures caused by health warnings, and there's a PR attached — or associated — with the topic. Ilya, I'll let you take that away.
E
I'll give a brief summary of where I was with it. So, in the cephadm task in QA, I discovered that the ignore list — the badness check — wasn't doing two things correctly: it wasn't checking standard error, and it was pointing at the wrong log location when checking for health warnings.
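For illustration, a minimal Python sketch of the kind of "badness" check being discussed — scan a cluster log for health-warning lines, skipping ones matched by ignorelist regexes. This is not the actual teuthology code; the function name, pattern, and example entries are illustrative (entries of this shape do appear in the qa yaml files).

```python
import re

# Entries like these get copied between suites in the qa yaml files.
IGNORELIST = [
    r"\(OSD_DOWN\)",
    r"\(MON_DOWN\)",
    r"overall HEALTH_",
]

WARNING_PATTERN = re.compile(r"HEALTH_WARN|HEALTH_ERR|\[WRN\]|\[ERR\]")


def scan_log(lines, ignorelist=IGNORELIST):
    """Return log lines that look like health warnings and are not ignored."""
    ignored = [re.compile(pat) for pat in ignorelist]
    bad = []
    for line in lines:
        if not WARNING_PATTERN.search(line):
            continue
        if any(pat.search(line) for pat in ignored):
            continue  # an expected warning for this test; skip it
        bad.append(line)
    return bad


sample = [
    "cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)",
    "cluster [WRN] Health check failed: Reduced data availability (PG_AVAILABILITY)",
]
print(scan_log(sample))  # only the PG_AVAILABILITY line is reported
```

The failure mode described above is a check like this one being pointed at the wrong log (or never looking at stderr), so the warnings were simply never seen.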
E
There's a tracker — I just posted it in chat — that shows the failure condition. I did create a draft PR to take into account and check standard error and the correct log location, and from there I wanted to test it against the librbd API suite. So, like a lot of the other components in the product, the ignore list had standard entries that we kind of just copy all over the place for each of our tests.
At least that's what we do for the librbd API, so I used that standard list of about five entries that we've had, to run another test in teuthology with the draft PR I mentioned — the one that fixes the log location and standard error. It ended up running, and it checked and saw that those five entries were ignored; but then there was an additional slew of entries that were unrelated to the librbd API test itself.
This seems not present in other tasks: with the vanilla ceph task's badness check, those five entries that I included before, the ones that aren't highlighted, are actually all that's needed. And this behavior has been present in cephadm since inception, I believe — it was copied from the ceph task — and it's been there for about three years. So with that there are some challenges: if we merge that draft PR, there will be additional checks that other components will have to account for, or we could go a different avenue as well. That's up for discussion at this point — how the RADOS team would like to proceed.
Laura did a similar analysis on librados, like I did with the librbd API. Did you want to share that now, Laura, and then we can have a discussion afterwards?
A
Sure. So the biggest concern on the RADOS side of things was just that, with merging this PR, we don't want to end up breaking the rados suite with warnings. So — excuse me — I ran this PR through the rados suite, and I noticed that it mostly affected tests that had the cephadm task in them, which is natural, since that's what your PR is making changes to.
The rados suite duplicates a lot of those tests from the orchestrator suite, so a lot of tests that were specific to rados weren't affected; but a lot of tests that are, you know, like rados/cephadm — those are the ones that failed due to your PR catching cluster badness, like an OSD down warning or things like that.
So I'd say a good amount of tests failed, but there was a clear pattern to which tests were failing, and it was clear to me that they were all cephadm-related and could likely be fixed with whitelisting. So just from the rados side of things, that's the biggest thing for us — we wouldn't want the PR to break things upon merging. But that was my main analysis.
D
In RBD it's actually the reverse: there's only a couple of jobs that are using cephadm, and everything else is still using the old, kind of teuthology way of deploying Ceph, and therefore the set of whitelists in the RBD suite is actually complete.

Nothing is going to break when this PR gets merged, but the reason I wanted to bring this into a wider forum is that we basically have an instance of a, you know, fairly major suite where some parts of it are also duplicated in other suites, like you said, and then some of the other suites were actually migrated to cephadm, which means they're no longer using the old ceph task but the new cephadm task for deploying Ceph. And the situation is basically that for, I think, almost three years, the check that checks for cluster badness, as you put it — the check for these warnings — has been completely absent from the cephadm task. And I think there is now...
...you know, a sentiment that maybe we don't need to grep for these warnings — maybe we don't need to fail tests based on these warnings, as a project as a whole. So basically we're at a kind of decision point here: either we go ahead and add a ton of whitelists to the cephadm suite, basically duplicating all of the rados whitelisting — because cephadm doesn't do anything special, you know, with regards to higher-level functionality; it's basically just a repeat of the rados suite with some cephadm-specific stuff in it.
D
So
either
we
do
that
and
just
overhaul
this
fadm
Suite
with
with
whitelists
or
I
guess
allow
lists
as
they're
called
now
or
with
just
drop
them
all
together
from
All
Suites
and
stop
caring
about
them.
That's
the
kind
of
the
project
quiet
decision
that
that
needs
to
be
made,
and
that's
what
I
want
to
get
out
of
this
meeting.
Personally, for RBD, like I said, nothing is going to break when this PR is merged, so we'll be fine either way, and I would actually prefer that these warnings stay and that we continue failing tests if something pops up. But I do agree that it may seem tedious, and it's basically the same set — or almost the same set — of allowlists that gets copied from job to job, sometimes rather meaninglessly.
So when I saw what Chris was looking into, I kind of went ahead, and I could see that in a bunch of jobs — in particular in the upgrade suite, where it's not really curated; it's basically just a recursive copy every time a release is made, with some tweaks to make it work — there is a bunch of allowlist entries that are clearly excessive and shouldn't be there, and then also a bunch that are missing. Are they still useful, or should we just get rid of them?
A
And also it would help to explain, or identify, what types of warnings we're seeing — this is part of what I did in my analysis here. I can link this in our main tracker for this meeting as well, but I looked through the tests that failed and why they failed, and kind of categorized what the badness warning was, and the majority were something like an OSD down health warning.
So of course we'd have to see exactly — or maybe it makes more sense to look at what the test is actually testing for — but most of the warnings fell into those categories, and I don't know if we can consider those as something to whitelist: like, okay, an OSD went down, but as long as it came back up, we don't care. So we can whitelist — or allow — that.
D
No — I think for the OSD down case, if the test doesn't actively mark those OSDs down, that shouldn't happen, and that's actually part of the reason I think these checks are useful. Because stuff like, for example, slow ops — I think it would be hard for us to find a job that runs, you know, a good enough amount of IO and doesn't occasionally hit slow ops; those warnings just tend to pop up, and that's the one I've never objected to allowlisting. But things like PG degraded, or OSD down, or, you know, mon down...
...you know, when just one monitor is up and then three monitors are up, and then there are no OSDs and then there's one OSD, and so on — I think in some cases some folks may have run into those kinds of occurrences and then added these warnings to the allowlist for the entire job.
I
It's been a long time since I spent much of my workday doing these, but lots of tests depend on these checks: like, we'll run a file system scrub and then shut down, and assume that if there was an error, it will have populated the log as a warning, and so the test will fail. So I think we still have enough tests that don't run in cephadm that we've been getting good coverage.
D
No — I don't think so. If there is a test that is not generating any warnings, then does some things, and then expects no warnings to be generated — that's actually a good test, and it doesn't need any allowlists in this.
D
This PR just fixes it in the cephadm task. The checking is happening today, as expected, for the ceph task — the old way of doing things — but the cephadm suite as a whole, because it obviously is based on cephadm, and then any job in any other suite that was converted to the cephadm way of deploying Ceph — those jobs currently, essentially, just don't grep for these warnings.
So in that case, I think the action is on the cephadm team to basically pick this PR up and fix the orchestrator suite so that it remains green when this PR is merged. And I guess there's a bit of work on the rados side — like Laura said, there were some jobs that have been converted to cephadm and may also need fixing.
A
And a lot of it, I predict, will translate between the orch suite and the rados suite.
A
Maybe we can start an email chain and loop Adam in after this.
D
I think he's aware of this PR — he actually commented on it, and I think he was the one who raised the option of just going the other way and always ignoring all those failures. So yeah, I just need to ping him and let him know that the warnings are there to stay and it's the orchestrator suite that needs fixing.
A
And yeah — whenever this goes through testing, even though a lot of it's been linked, it should go through the orchestrator suite and the rados suite, and others. For instance, the fs suite uses the cephadm task, so that should go through too — but those are the two that for sure come to mind.
A
All right — was there anything further that anybody wanted to comment on this? I wrote our conclusion down, but if there's anything else, please go ahead.
A
All right, Nitzan, go ahead — your topic is next.
J
It's a brief thing to discuss, so I can do it. The topic is basically this: when we create temporary PG mappings on the primary OSD during peering, we don't consider primary affinity at all. So arguably that's the thing we should probably fix, although that logic is pretty complicated and modifications there have significant ramifications.
So it's arguably not that important. The upshot here is that when we mark an OSD out, it has a fairly high probability of coming back in as a primary while its replacement is being backfilled, no matter what. So I guess the first topic, to the extent that there is one, is: how big a deal is that? And the second topic — which Nitzan noticed when he was trying to make a reproducer for this — is that if you mark all of the OSDs with a primary affinity of less than one, you have a modest probability of ending up with a primary-affinity-zero OSD as the primary, because if all of the OSDs reject, we always pick the first one that CRUSH chose. My argument is that that's actually not worth fixing, since that scenario is pretty weird. So — thoughts on those two topics.
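A toy model of the behavior being described — not the real OSDMap code, which uses a hash-based test rather than a random draw. Each candidate primary is accepted with probability equal to its primary affinity; if every candidate "rejects", the first OSD in the set is used regardless, which is how an affinity-zero OSD can end up primary.

```python
import random


def choose_primary(acting, affinity, rng=random):
    """acting: list of OSD ids; affinity: dict of osd_id -> value in [0.0, 1.0]."""
    for osd in acting:
        if rng.random() < affinity[osd]:
            return osd
    return acting[0]  # fallback: everyone rejected, pick the first anyway


# With every affinity below 1.0 there is a real chance the fallback fires,
# which can select an OSD whose affinity is zero:
affinity = {0: 0.0, 1: 0.5, 2: 0.5}
hits = sum(choose_primary([0, 1, 2], affinity) == 0 for _ in range(100_000))
print(f"osd.0 (affinity 0) chosen as primary {hits / 1000:.1f}% of the time")
# roughly (1 - 0.0) * (1 - 0.5) * (1 - 0.5) = 25% of trials fall through to osd.0
```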
D
Yeah, I completely agree with the second one — not only because this behavior just makes very little sense: if there's not a single OSD with a primary affinity of one, then this means you don't want any OSD to be primary, and that leaves the cluster to choose whichever OSD it sees fit to be the primary.
So that actually makes sense to me. And the other reason is that this would be a client-facing change that would need to propagate to all the clients — it would need to be feature-bitted and all that.
Right — and for pretty much nothing, right? This is just so that this very weird case is handled slightly better, so I don't see any value in that. The first topic that you brought up, the temporary mappings — you've said that it's complex, but at the very least it wouldn't need to go all the way up to the clients.
J
For replicated PGs, yes, but erasure-coded PGs are more complicated. The challenge here is that the OSDMap just assumes that, for the temporary mapping, the first one is the primary. That's sufficient for replicated PGs, since we can control the order, but it's not sufficient for erasure-coded PGs.
D
Yeah, but that would still be, you know, contained within the cluster — not client-facing — which means that this is actually feasible.
J
This one might actually be worth it, but it's definitely not urgent — it would be a sometime-in-the-next-year kind of thing. The reason it's weird is that the workflow we expect people to follow is to mark a failing disk with primary affinity zero and then mark it out. So it's disconcerting for them to see that OSD pop back in as primary immediately, which is what happens.
J
No, no — just a single one, because we don't consider primary affinity literally at all in this scenario; we just choose one.
A
Thanks for bringing that up. So it looks like we had a PR that was closed for now, but we still want to address it at some point.
A
All right — if not, thanks, Sam, for bringing it up. The next topic is Gal Salomon's updates on S3 Select.
H
Okay, hi — my name is Gal Salomon, and I'm a member of the RADOS Gateway team; for the last three years we have been developing s3select. This talk is mainly an introduction to s3select: I'm going to talk about the concept, and I will explain what it is, why we need it, and how it works.
Please feel free to ask any question at any point. So, first: in July 2020, s3select was introduced to Ceph upstream — AWS had actually done it two years before that. S3 Select is another S3 request: it enables the client to push down an SQL statement into the object storage, and I must note that this SQL dialect follows the AWS spec.
S3 Select is a new S3 capability. It's designed to pull out only the subset of the data that you need, and this can dramatically improve the performance and reduce the cost of applications that need to access data in S3. It should be noted that the s3select request implementation inherits from the GET object operation, so we can say that s3select is kind of a GET object on steroids.
The SQL expression and its processing is the main difference versus GET object, and all of this I'm going to discuss in later slides. Okay — so why do we need that? What is it good for? The pushdown paradigm is about moving, or pushing, the operation close to the data; it's contrary to what is commonly done, which is moving the data to the place of operation. Pushdown does not consume memory, while caching does consume memory, and in a big-data ecosystem that makes a big difference.
Upon using the cache paradigm, the flow is about what data to keep in cache and what data to invalidate; upon using pushdown, which is what s3select actually is, the flow is about what operations to push down and how to use the returned results. So let's take an example — an SQL query, something like: select the sum of x plus y where a plus b is bigger than c.
An SQL query that needs to operate on a CSV or JSON object containing many columns and rows, but needs only a very small portion of it — only a few columns, for a small percentage of the rows — will return a small amount of data. That actually means fewer serialization and deserialization operations. It should also be noted that pushing operations down close to the data is not an easy thing to do: the storage is usually segmented and broken into random pieces per record, which makes it quite difficult to execute a single query on top of many random pieces. Moreover, the type of these objects could be binary in some cases.
Okay — and what is it capable of? It does not turn our S3 storage into a database — that's good to keep in mind — but it does greatly improve the efficiency of the SQL processing.
It is read-only — only the SELECT statement portion — and there is no notion of schema or table; the object being read should enable tabular operation. S3select is embedded into the GET object flow, which makes it highly efficient for the pushdown operation: upon an SQL query being pushed down, the object is fetched and each fragment of the object is processed by the s3select module. And since s3select is embedded into the Ceph system, there are no redundant copies of bytes — the object is processed immediately and the results are sent back to the client.
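A minimal client-side sketch of what issuing such a request looks like against an RGW endpoint, using boto3's select_object_content (the same API shape as AWS S3 Select). The endpoint URL, bucket, key, credentials, and column names are placeholders, not anything from the talk.

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8000",  # hypothetical RGW endpoint
    aws_access_key_id="ACCESS",
    aws_secret_access_key="SECRET",
)

resp = s3.select_object_content(
    Bucket="mybucket",
    Key="data.csv",
    ExpressionType="SQL",
    # Only the matching subset of the object is sent back to the client.
    Expression=(
        "SELECT s.x, s.y FROM S3Object s "
        "WHERE CAST(s.a AS INT) + CAST(s.b AS INT) > 100"
    ),
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream: Records events carry the result bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")
```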
H
Moreover, the s3select system is capable of processing different types of objects, such as CSV, JSON and Parquet — Parquet is a unique structure, and we're going to discuss the Parquet format later — and note also that the same engine processes these different object types. Okay — and maybe another question somebody might ask is: why use SQL and not something else?
So the question we should ask here is: what is needed for machine learning? And SQL is needed for machine learning — in practice, this is the language for querying data. It should be noted also that SQL is a domain-specific language, more than 40 years old, designed initially for manipulating data. In a machine-learning workflow the data is the main thing — it is the source — and the more relevant and accurate it is, the better. You should also keep in mind that the data must be formatted so that it can be used by the machine-learning algorithm, for improved pattern detection.
All of this makes SQL the starting point for machine learning, and adding the capability to read and process different common data sources — such as CSV, JSON and Parquet — makes it a really powerful tool for machine learning.
As you probably know, a typical Ceph user may aggregate, over the years, tons of objects. These objects contain huge amounts of data, structured and unstructured, and these massive amounts of data sit there in a kind of cold state. The value in that data could be quite precious, but to realize it you need to extract the data from the storage, and that could be very, very expensive. So using s3select on this cold — you could say rarely accessed — data makes s3select cost-effective for analytical operations.
Our goal is to make Ceph object storage more attractive for analytical operations, and not just for storing data.
It should be noted that the engine itself is header-only — there are no library dependencies, with one exception, which is Apache Arrow, for the Parquet reader. The memory consumed by the engine is enough for the row being processed, so for that reason the engine is able to process objects of any size. The readers for the different object types are decoupled from the SQL engine, so future readers for other data types, such as ORC, will reuse the same engine code.
If you have any questions so far, I'll be glad to answer them. We'll continue now to describe the different flows for the different data types; we'll start with CSV.
Okay, so some explanation about the flow for each type, and the differences between the different data sources. CSV is the most simple format.
Both formats — CSV and JSON — must be read and scanned completely, the entire object. On the other hand, Parquet is a highly advanced object: it contains enough metadata to enable the reader to access only the data it needs for the query processing. That means that by using a Parquet object you're not just saving network bandwidth — you also save a lot of IOPS on the server side.
Okay, so now we'll discuss a bit about the CSV flow itself. In the case of CSV, the RADOS Gateway fetches chunk after chunk; the s3select engine processes each chunk, and the results are sent back immediately upon completion. In this type of processing a chunk may break a row in the middle; these broken rows are not processed, but they are saved for later and merged with the next chunk. So, to summarize: the s3select engine processes row after row and sends an immediate response, with no preloading.
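A sketch of the chunked-CSV handling just described: rows are processed as chunks arrive, and a row broken at a chunk boundary is saved and merged with the next chunk. The names are illustrative, not RGW code.

```python
def process_csv_stream(chunks, process_row):
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        # Everything up to the last newline is complete rows; the rest is a
        # possibly-broken row that waits for the next chunk.
        head, sep, leftover = data.rpartition(b"\n")
        if sep:
            for row in head.split(b"\n"):
                process_row(row)
        # If no newline was found, the whole buffer stays in `leftover`.
    if leftover:
        process_row(leftover)  # final row with no trailing newline


process_csv_stream([b"a,1\nb,", b"2\nc,3\n"], lambda r: print(r.decode()))
# prints: a,1  b,2  c,3  -- the broken row "b,2" spanned two chunks
```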
H
Okay — so, Parquet. Parquet is a binary object; it's a columnar object, and I'll explain that a bit more. This type of object contains a lot of metadata that enables the reader to skip unnecessary information. When I say unnecessary information, think about a Parquet object which contains a lot of columns — let's say 100 or 200 columns — and your query only needs to access a few of them.
Let's say three of them: Parquet actually enables the reader — and the SQL parser — to extract only these three columns, to skip the data and not scan it. That's a huge reduction in IO, in IOPS. Okay, so at first the engine just verifies the magic bytes — it makes sure it's a valid Parquet object — and in the next step the engine fetches the predicate columns; by predicate columns I mean the WHERE clause. In case the predicate returns true, it fetches the projection columns and calculates the projection statement. A Parquet object, as mentioned, is structured — it contains a schema. This unique columnar structure enables, as I said before, skipping data, and eventually saves a lot of IOs as well. The main difference between the Parquet reader and the CSV reader: in the CSV flow, the RADOS Gateway fetches chunk after chunk and processes each separately, while for Parquet the RADOS Gateway controls the start and the end of the process.
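For a local feel of the "fetch only the predicate/projection columns" idea, here is a pyarrow sketch (pyarrow wraps the same Apache Arrow code the Parquet reader builds on). The file name and column names are placeholders.

```python
import pyarrow.parquet as pq

pf = pq.ParquetFile("data.parquet")  # placeholder path
print(pf.metadata.num_columns, "columns,",
      pf.metadata.num_row_groups, "row groups")

# Reading three columns out of, say, 200 touches only those column chunks;
# the rest of the object is never scanned.
table = pf.read(columns=["a", "b", "c"])
print(table.num_rows)
```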
H
Okay — in the following slides I will explain about JSON. JSON is quite a unique structure, and quite challenging, I must say. We have, let's say, already finished about 80 to 90 percent of its development.
So in this — a bit weird — slide, I try to explain why the JSON reader is unique compared to the other readers. Upon querying a JSON document, the s3select engine focuses only on specific parts of the nested and complex JSON data, not on the whole JSON data. So only specific parts of the JSON document are loaded into the SQL engine for processing.
You can see here in this example that the SQL statement focuses its processing on specific parts of the document, according to the FROM clause. The FROM clause here — saying from S3Object, then .glossary.GlossDiv.GlossList.GlossEntry — actually triggers the reader to look for a specific path and to start from there, loading specific columns into the s3select engine. So you can say that the FROM clause defines the boundaries from which to get the projection and predicate columns.
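Two illustrative statements over the classic "glossary" JSON sample document, showing how the FROM clause sets the reader's boundaries. The exact paths follow the well-known JSON example; treat the statements as sketches rather than verbatim slide content.

```python
# The FROM clause pins the reader to GlossEntry, so only that subtree is
# loaded into the engine:
q1 = """
SELECT _1.GlossTerm
FROM S3Object[*].glossary.GlossDiv.GlossList.GlossEntry AS _1
"""

# Same document, different FROM clause: the "row" boundary is one level
# higher, so different parts of the document are returned.
q2 = """
SELECT _1.GlossList.GlossEntry.GlossTerm
FROM S3Object[*].glossary.GlossDiv AS _1
"""
```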
H
In the next slide you can see a statement which is different from the previous one, but on the same document; its FROM clause, defined differently from the previous one, sets the boundaries of the row, so it actually returns different parts of the document compared to the previous one. This makes the JSON reader quite unique compared to the other readers.
So, to summarize all of this: the JSON reader is similar to the CSV one in its logic — the whole object must be scanned and read completely, because there's no way to skip data in JSON; it's a text file — and it means that the gateway, of course, fetches chunk after chunk.
Okay — another important issue is scaling, specifically scale-up. The scale-up is already designed and actually also already implemented; it's not yet upstream, but it's really important.
I mean, there's no way to run an SQL query without the data flowing through our system, so I must say that the ability to process a data stream without preload is essential for processing huge amounts of data, from a memory-consumption perspective. This is achieved by processing row after row until the end of the stream — it's quite similar to the Linux pipe concept. Adding to that, the s3select functions consume really little memory, and s3select is able to split the query processing and merge the results upon completion into a single result. You can see here the way to do it.
It should be noted that the data-splitting process is different per data type. In the case of CSV, the object is split according to the object size and the row delimiter. In the case of Parquet, its unique columnar structure is already divided into logical groups, and that's why it's quite easy to divide it into non-overlapping ranges.
The JSON use case is more complex, so one way to handle it is to take the data as many objects and split the work according to the number of objects. In this slide I want to demonstrate why scaling up could be quite complex: you can see here an aggregation query — a select with a substring, with aggregation functions like a minimum, or an average, or a count.
The minimum and average are actually arguments to the substring function, and in this case the select is divided into three parallel ones. The mid stage calculates and summarizes the results of the first stage, and the final stage merges the results. You need to understand that the s3select engine needs to analyze the query and to know, for each function, how to actually merge the results: in the case of minimum, it's the minimum of all the minimums; in the case of average, you need a specific flow for how to make an average of all the averages of the different parts. And what you get in the end, of course, is much more powerful — in this case it could be three times faster than a single query.
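A sketch of merging partial aggregates from parallel sub-queries, as described: min is the min of the per-part minimums, while average must be carried as (sum, count) pairs so the final average is total_sum / total_count — averaging the averages directly would be wrong for unequal parts.

```python
def merge_min(parts):
    return min(parts)


def merge_avg(parts):
    """parts: list of (partial_sum, partial_count) tuples."""
    total = sum(s for s, _ in parts)
    count = sum(c for _, c in parts)
    return total / count


# Three parallel workers over disjoint ranges of the object:
print(merge_min([7, 3, 9]))                    # -> 3
print(merge_avg([(10, 2), (30, 3), (20, 5)]))  # -> 60 / 10 = 6.0
```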
H
Okay — some words about the challenges ahead of us: integration with Spark and Trino. Applications such as Spark, Presto or Trino may use object storage for their backend storage, and, as I said, that's quite an inefficient use of the storage, because they need to retrieve all the objects participating in the query. But if we taught the analytic application, it would know how to push the relevant parts of that query down to Ceph using the s3select request.
Instead of the GET object, this will improve the query performance automatically — s3select is, you can say, kind of an optimizer for these higher-level applications. You also need to understand that there is a great advantage in targeting SQL analytic applications, since so many end-user applications are written for Spark and Trino. And what is needed from...
...the analytical application is to identify which parts of the query can be pushed down to the S3 storage, and to send them as an s3select request. I must mention that this type of optimizer already exists in AWS EMR with this kind of integration — and you might pay attention to this article.
This article was published last November; it's about AWS and Trino — they improved and upgraded the Trino software and also contributed it back to open source.
Okay — another future implementation that we're thinking of is a single query on many objects, and I want to discuss a bit what the advantage of that is. I think the advantage of processing huge objects using s3select is quite obvious; we already discussed that.
H
So
these
kind
of
features
allow
you
to
perform
queries
without
any
size
limitation,
which
is
really
a
great
thing
and
also
it
should
be
noted
that
AWS
has
to
select
operates
only
on
a
single
object
and
not
on
many.
H
Another
thing:
it's
quite
easy
to
experiment
as
to
select
you
know
you
don't
need
to
run
the
whole
set
or
something
like
that.
You
can
actually
run
it
from
the
share
command
and,
of
course,
you
should
review
the
the
GitHub.
The
geta
trip
of
sselect
I
will
show
some
example
how
to
to
run
this
on
your
actually
on
your
local
file
system.
Okay, so I've put here some examples of how to run s3select as a standalone application.
The first example, as you can see here, is an SQL statement processing the standard output of the ps command; it processes the input as a CSV type. Now, please note that the SQL statement uses column names — like PID and CMD — and those column names are extracted from the ps command output using the CSV header-info environment variable, which indicates that there is a header row: the first row of the ps output holds the column names.
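A conceptual stand-in for that example in plain Python — take the output of ps, use its first (header) row as the column names, and filter rows, which is the same idea as running the s3select standalone binary with a header-info option. This is not the actual s3select CLI.

```python
import subprocess

out = subprocess.run(["ps", "-e"], capture_output=True, text=True).stdout
lines = out.strip().splitlines()
columns = lines[0].split()           # e.g. ['PID', 'TTY', 'TIME', 'CMD']
idx = {name: i for i, name in enumerate(columns)}

for line in lines[1:]:
    fields = line.split(None, len(columns) - 1)
    if fields[idx["CMD"]].endswith("sh"):      # like: WHERE CMD LIKE '%sh'
        print(fields[idx["PID"]], fields[idx["CMD"]])
```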
H
The next example does the same, but this time it uses an already-existing container, and the input data is piped into the container. I must mention that this container contains all the necessary packages, including Apache Arrow, and also build utilities.
So it's quite easy for anyone to use this container: to view the code, to change it, to rebuild it — whatever you want to do. The last example here also uses the same container, but in this case it mounts a host directory into the container directory, to enable the user to run s3select on any file — you can use it on Parquet, JSON, CSV. Yeah — that's it, and I'd really be glad to hear any comments or questions or anything else.
A
A really comprehensive and well-put-together presentation, by the way. My question was: you talked about S3 Select operating on different types, like CSV versus JSON.
Would there ever be an instance where you're dealing with mixed types at once? Or is it always — I saw how you issued the commands — you would always issue them separately: if you wanted to parse them, a CSV and then a JSON; you would never... yeah.
H
Think about it: an application may produce a mix of data in different formats — CSV, JSON, and maybe also Parquet — all of them together.
Some data scientist may say: look, I want to make a single query on all of this. So — since I know a bit about s3select — it is possible to do that, but with JSON it's a bit difficult. Parquet and CSV join together quite naturally, because it's the same syntax, but JSON — if you look at JSON, you can see that it impacts the syntax of the SQL.
You actually need to describe the FROM clause, which is different from the CSV one. And also in the projections — again, you need to describe the path of each data item, and this is a problem because you don't have that in CSV. So a query which defines how to query a JSON document will not work on CSV.
H
But
maybe
12
checking
it.
Maybe
there
is
a
way
to
do
that.
I,
don't
know
some
kind
of
a
flat
Json,
maybe
I,
don't
know.
A
Yeah — I wasn't sure if it should be thought of as a limitation, or whether you wouldn't even have that use case and it wouldn't be worth it, you know.
H
Now, JSON is the problem: JSON simply defines a different syntax in some parts. I mean, it's almost the same flow — it's the same core engine for all of them — but there is a part in the SQL parser where each path, each value, becomes some kind of index in the structures of the SQL engine, and this part is very unique to JSON. So you cannot combine them when defining an SQL query.
It won't work — well, it might not work on CSV — but maybe there is a way to think about it: maybe a special plugin which would reduce something, and then you could run it. Also, if, let's say, the suffix is the same, you could do something — I don't know; it's something to think about, about joining. As you're saying — if I understood you correctly — you're joining the same query on different formats, JSON and CSV; obviously, in that case, yes.
D
I have a question that is more related to Parquet itself than to s3select. My understanding is that this is a columnar format, meaning that the columns are stored together — presumably contiguously — so that they can be easily skipped, and also compressed and whatnot. But you also mentioned that for parallelization — right, the divide-and-conquer style of processing — Parquet also provides some efficiency benefits, but...
H
A good question — I don't have a slide for that, but there is a notion in Parquet called row groups. What happens there is that the data which resides in the Parquet object is not one contiguous run per column across the whole file; it's actually packaged into chunks, chunk after chunk, and you have the notion of a row group. A row group means that you have a chunk of column...
...one residing next to a chunk of column two, residing next to a chunk of column three, and so forth. Try to imagine that you have a group of chunks, where each chunk belongs to a different column, and upon query processing you use the metadata to jump to the specific chunk that you need. So in case you want to parallelize the processing of such an object, you need to divide it — let's say your Parquet has a thousand row groups...
...on the same object: you divide these row groups, say 125 per worker. And within these row groups, you have a chunk that resides next to another chunk, where each chunk is a contiguous list of values for one column. I hope I'm making myself clear — with a good slide it's actually quite obvious.
You can actually see this kind of thing in Apache Arrow — there is a nice slide there that explains all of this: the file is divided into columns, row groups and chunks.
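Row groups make the non-overlapping split trivial: each worker can be handed a disjoint set of row-group indices. A pyarrow sketch (in the RGW case the same partitioning would drive ranged GETs instead); the path and worker count are placeholders.

```python
from concurrent.futures import ProcessPoolExecutor

import pyarrow.parquet as pq


def count_rows(path, group_indices):
    pf = pq.ParquetFile(path)
    return sum(pf.read_row_group(i).num_rows for i in group_indices)


if __name__ == "__main__":
    path = "data.parquet"  # placeholder
    n = pq.ParquetFile(path).metadata.num_row_groups
    # e.g. 1000 row groups split across 8 workers -> 125 groups each
    splits = [list(range(i, n, 8)) for i in range(8)]
    with ProcessPoolExecutor(max_workers=8) as ex:
        print(sum(ex.map(count_rows, [path] * 8, splits)))
```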
H
By the way, I want to mention another thing about the integration of Trino with s3select. Why is it so important to have Trino integrating into our system — and why did AWS think it's so important, and why do you get up to nine times better performance? One reason is that Trino can actually parallelize its operation: it can divide what it does — and I'm testing that on our system now — it actually divides the object into several parts.
It actually knows how — because Trino is a full engine — to split the data, run a dedicated worker with the same query for the different parts of the object, and then merge the results. And that is why it's so important to have this Trino integration in our system — not just because there are so many end users using Trino.
That is a very good reason, of course, but also because it's very efficient: if you run Trino on some file on your local system, you can see that it runs parallel queries on that object.
Compared to the row-group parallelism I described, it does it in a different way: it uses the scan range of the GET object. It actually measures the CSV — let's say it's one gigabyte — so it says: I will divide it into several parts, of course non-overlapping parts, and it will issue a different request for each of these parts and then merge the results. That is the Trino capability, and it's why having a good integration with it matters.
L
Hey Gal, I had a question. In the past there had been interest in the idea of pushing queries all the way down to the OSDs — is there still interest in that?
H
Although I heard some say that the only question about that — and maybe you know better — is about scheduling on the OSD: can it consume this? Doing it on the OSD is more difficult, but we can cope with that. The question is whether you can schedule all of these pushed-down queries correctly — I mean, is a query going to run in a serial manner or in a parallel manner?
That's one question somebody raised, and another thing somebody raised was: okay, because the data is segmented, you need a specific flow — but that could be resolved. Directly to the question: yes, I think we should push down — hopefully — though we're going to load the CPU of the OSD.
We need to figure out if it's a good thing to do — maybe to split it between the RADOS Gateway and the OSD; it seems like something we need to check. But yeah, I think we should push down to the OSD.
H
That's something to check. It was some time ago, as I recall, that somebody raised an issue about that — I think I recall now: if the data fragment that should be processed on the OSD takes too long, the processing might break in the middle. I'm not sure if that was the exact concern, but I think I remember something like that — about too-long processing on the OSD.
Probably, yeah — somebody mentioned that, about the scheduling issue, yeah.
H
And in that case, we would need to build something to avoid hitting the suicide timeout — to know to return the result, and, in order to resume, to get back to the place where we stopped the processing. That's a different thing, which we don't have right now. But again, I think it's worth doing, because it's pushing down even further, and we probably will gain some more efficiency by pushing down to the OSD.
A
Gal, by the way — if you have a link to your presentation, totally fine if you don't, but if you do, send it in the chat and I'll link it to the tracker for this meeting. Okay.
A
All right — if there's nothing more to say on S3 Select, I can go to the next topic.
A
So, with the goal of allowing more disk space for OSDs so that we can run tests on them: right now there's a limitation on how many OSDs we can allot for tests in teuthology, and we would like to run tests with multiple OSDs — many OSDs, more than what we're doing now.
So there have been a lot of groundwork steps that have gone into it, the first step being the zero-block detection feature, which has been merged. It's a feature that can be enabled via a BlueStore configuration option; it's off by default, since it's meant for testing purposes.
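The essence of zero-block detection: an all-zero buffer can be recognized cheaply and its write elided, so a test can "write" huge amounts of data without consuming disk. The BlueStore option in question is bluestore_zero_block_detection; the Python below is only a model of the check, not the BlueStore code.

```python
def is_zero_block(buf: bytes) -> bool:
    return buf == bytes(len(buf))  # compare against an all-zero buffer


print(is_zero_block(b"\x00" * 4096))            # True  -> write can be skipped
print(is_zero_block(b"\x00" * 4095 + b"\x01"))  # False -> must be written
```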
A
But
that's
if
you
need
to
to
visualize
it.
That's
so
where
it
could
be
found
in
the
config
settings
for
Blue
Store,
it
can
be
enabled
on
osds
and
Aishwarya.
Has
done,
is
working
on
more
groundwork
now
to
actually
set
up
the
logical,
large
scale,
Suite,
so
I'll.
Let
her
talk
on
some
steps
that
are
going
into
that
and
some
groundwork
that
is
laid
out
for
the
for
the
future.
M
Thanks, Laura. So I started off working on seeing how we can increase the number of OSDs in our tests, and I started looking into how we currently do the disk partitioning on our smithi nodes — it's something that's statically defined currently. With David Galloway and Zack's help, I was able to do a POC sort of experiment where I was able to change the number of disk partitions I wanted and, with the ansible tasks, make that work.
But we've realized that if we want this to be part of our entire test flow, we need a separate ansible task which looks at the size of a disk; we can tell it the number of partitions we want, according to how many OSDs the test has, and then we partition the disk. So that's what I'm currently working on. Yeah, I think that's pretty much it from me — I'll let Laura talk more about how the BlueStore zero-detection stuff will work once we can run multiple OSDs.
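A sketch of what such an ansible-driven partitioning step would compute: given a disk size and the number of OSDs a test wants, derive equal, aligned partition boundaries. Purely illustrative — not the actual task.

```python
def partition_bounds(disk_bytes, num_osds, align=1 << 20):
    """Return (start, end) byte offsets for num_osds equal partitions,
    each aligned down to `align` bytes."""
    size = disk_bytes // num_osds // align * align
    return [(i * size, (i + 1) * size) for i in range(num_osds)]


for start, end in partition_bounds(1_000_000_000_000, 6):
    print(f"{start}..{end} ({(end - start) / 1e9:.1f} GB)")
```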
A
Yeah — and just to clarify, Aishwarya: you've been able to do this locally using the ansible secrets repository, but the blocker is that it's not accessible publicly for, like, a new teuthology suite, right?
M
Yeah, that's correct. So I was able to modify the secrets repository locally and use that file — there's a smithi yaml file inside it which is used to look up the disk partitioning — but the problem is, because it's a private repository, we can't really specify: okay, pull this repository and use this branch, with my changes, instead of the main branch. So it can't work with the teuthology-suite command, and to make that work is why we're looking into an ansible task where we could do this more dynamically.
A
But this PR is a new workunit I've designed around BlueStore zero-block detection. Right now in teuthology we have it enabled for some of our tests, just so we have some coverage there, but this PR is a specific workunit designed around the feature that demonstrates the use of zero objects and how they don't fill up the space.
So the goal of this workunit is, one, that we can simply get more test coverage in the regular teuthology suite — I designed it so it can be run along with the other rados singleton tests; we have some PG autoscaler tests in that suite as well, and other things around there, so it would just pull this workunit in — and this workunit doesn't depend on a certain size or number of OSDs to perform its goal.
This would also be for Aishwarya, once we have the actual partitioning available publicly: run with the feature disabled, see that the disks fill up a lot, then try enabling it and check the difference in how the disks fill up. And of course the overall goal would be that the disk space given to an OSD could be smaller, so that we could perform tests like we do on the Gibba cluster — or take the same logic that's going on in the Gibba cluster, with how large it is, and translate it into our tests.
And that's where we're at right now. We plan to give more updates as this comes into action, but it's mostly just a lot of groundwork steps that have to be laid for it to be set up. We've made a lot of progress since we last spoke at CDM, though, so we wanted to give an update.
Does anybody have any questions about the feature, or the work that's gone into it, or the overall goal — stuff like that? Or if there's anything I need to clarify, just let me know.
Well, it looks like no questions — but if that changes, feel free to email me or ping me, and I can answer more questions as needed. Since this is a config option, perhaps somebody will have another use case for it that they'd want to try, and I'd be happy to answer questions about that.
A
Thanks, Aishwarya, for speaking on that.
A
Cool — the next topic, our final topic, is from Ali, Casey and Ilya, on perf dump JSON output: can this be set in stone, or can it be tweaked with a release note in a major release? And there's an associated PR here.
N
So I just spoke with Yuri, who commented on this PR, and it appears that, at least for Reef, we could tweak the format — though we should test it and make sure nothing breaks, and if something does, fix that as soon as possible. Taking one step back: would we want a separate command for just the labeled counters, or have it all in the same command? At first I wanted to separate the two, but now, since there doesn't seem to be much of a distinction — if the formats are very similar — maybe we keep them in the same command. I'm not sure — Casey or Ilya, or anybody else following, any opinion on this?
D
Well, the problem with emitting both types of counters in the same JSON is that, unless we make the output irregular, it wouldn't be backwards compatible. And this question is really not so much about this particular PR, but more about whether the output of an admin socket command — in this case perf dump — can change. I guess perf dump is kind of a very special admin socket command, because it's the oldest one.
It goes back — at this point I think it's like 12 years old. So there may be people who are parsing this output on their own, and perhaps rely on selecting or querying a particular value of a particular group of perf counters; and if we change the output format for unlabeled counters, even a very minor change — such as just adding an additional labels key...
Yeah, so this is the question, right? We've never said that this would be set in stone, and we generally reserve the right to change output — whether it's a CLI command or, in this case, an admin socket command — on major releases, as long as it's documented. As long as there is a release note, that is considered acceptable, and we've done that in the past.
The reason I'm a little worried about this particular one is, like I said, that this is a very old command — perhaps the oldest command that people might actually be relying on and manually parsing the output of, just because it's been there for so long, way before any of the other monitoring capabilities showed up.
So I think this one is just a little bit special. That's one aspect of it. The other aspect is that the current proposal is actually not just adding a labels key to a dictionary — it's not just a single key; Ali is considering a more invasive change to how perf counters are dumped.
If you look at one of the most recent comments, there are examples there, and we're basically talking about taking the entire perf counters dictionary — where it makes sense — and moving it under a key named "counters". So essentially there's going to be an outer dictionary around what is there today, and that is definitely going to break any jq query — or, if it's not jq, then it's going to break even faster if there's an additional layer of indirection there. So the change as currently proposed is definitely going to break existing tools. And the use case I have in mind in particular is someone fetching a particular perf counter — maybe not a single counter, but a set of counters, say something related to BlueStore — and then processing that, as opposed to the kind of blanket thing that ceph-exporter would do, where it would just slurp all counters and dump them without paying attention to what is actually there.

That's just a textual-processing exercise, whereas the tools I have in mind are actually making sense of a particular counter and expect it to be, you know, one level or two levels deep into this structure that perf dump returns. If we change the number of levels there, that's obviously going to break.
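The shapes under discussion, written out as Python dicts: "old" is roughly what perf dump emits today, while "proposed" adds an outer "counters" dictionary plus a "labels" dictionary — which is what breaks one-liner jq-style queries. Both are schematic examples, not verbatim Ceph output.

```python
old = {
    "bluestore": {
        "kv_flush_lat": 123,
    },
}

proposed = {
    "bluestore": {
        "labels": {},          # empty for an unlabeled counter group
        "counters": {
            "kv_flush_lat": 123,
        },
    },
}

# A consumer doing the equivalent of `jq .bluestore.kv_flush_lat`:
print(old["bluestore"]["kv_flush_lat"])                   # works today
print(proposed["bluestore"]["counters"]["kv_flush_lat"])  # one level deeper
```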
D
That said, the counter-argument is this: once the labeled-counters infrastructure lands, there's probably going to be a gradual conversion, at least for the perf counters for which it makes sense to be labeled — the ones that currently embed some sort of name in the perf counter itself. Those would be natural candidates for conversion to labels, because that would actually align with established practices in Prometheus, for example, and other monitoring tools. Once we do that, it would obviously break that particular perf counter anyway — because unless we go so far as to provide both the old counter and the new counter and dump the same values under different names, which I don't think we want to do, this use case would be broken one way or the other, just slightly down the road. So the question is: can we just pull the plug today — or not today, but in Reef — with a big release note around it, or do we need to engage in...
...you know, trickery, and potentially try to make this more accommodating — such as, instead of adding this extra layer in the dictionary, maybe getting by with adding just a single key, or maybe even introducing a whole new command that dumps both labeled and unlabeled perf counters and leaves the existing perf dump command as is. I don't know. So there's a bunch of options here, and the input to the decision is really...
...how attached are we actually to the current output, and do we know of any consumers of the kind I've just described out there in the wild? Are we afraid of breaking this, or should we basically just go ahead? I'm looking for the level of sensitivity with regards to it.
G
I think there are a lot of different monitoring systems out there. I'm not sure how many would need, like, all the counters dumped all the time, that sort of thing — but if we're considering a systemic change to how the interface works, shouldn't we just use a new command?
D
Right — a new command is appealing, but it's really unclear how to name it. "perf dump" is a very self-describing name, and having something else that does the same thing but adds some more counters is potentially confusing. And then, going back to the issue I raised: as we start to convert these perf counters — again, unless we go as far as to maintain the old ones — they're necessarily going to disappear from the old dump command and be moved into this new command as time goes by anyway, making the old command potentially useless in the long term anyway.
C
From the telemetry point of view, we can accommodate both, on the module and on the backend side.
D
Yeah, telemetry is kind of a weak use case here, because that's what we control, and we can do pretty much anything with it — exactly. So really, what I'm concerned about is, you know, people — not even an actual monitoring tool, but people running their own scripts that they're quite accustomed to, and obviously very attached to, because it's their own thing — and that suddenly breaking just because we're going to change some JSON around. That sounds bad.
A
Like the pg dump admin socket command, where there's, like, dump PGs full and dump PGs brief — different flavors, sort of. And I could see it just using, like, a Python formatter for the JSON. So you could have a case where — I'm just imagining a "perf dump labeled" — and then, if labeled, add the labeled part and stuff that into the formatter. That's what I'm envisioning. But that makes sense.
D
Well, the approach where we just selectively add labels for the things that actually have labels has a disadvantage, again, in the long run: the output would be irregular. In one case you would have a labels sub-dictionary — a bunch of key-value pairs that represent labels — and in another, instead of an empty dictionary signifying that there are no labels, this dictionary would just be absent. And that's more pain to parse: a jq query, for example, would be slightly more complicated than if this dictionary were always there and could just be either empty or not empty.
Right — so it sounds like Sam is saying that he's looking in the direction of a new command. But then my concern — the concern I raised — is with the old command: once we actually move a counter for which it makes sense to be labeled into the label framework... we don't want to maintain both, because that would mean the daemon would need to bump two instances of the same perf counters, and they may potentially get desynchronized — like if we have, say, five on one side and six on the other — which is probably not a good idea. And it's just more code in the daemon itself, because there would be two things to bump instead of one.
N
Well, in the case of the perf dump command, there are already two arguments that can follow, which are the instance name of the perf counters and then the actual counter name within that instance — that's for filtering — and I'm going to guess we can't alter that order either.
D
Right, but the primary concern I have with "perf dump labeled" is that it's a very bad name, because this command has to dump all counters, not just the labeled ones. The thing that I think...
D
There's
no
question
about
is
whether
there
should
be
a
single
command
that
dumps
all
counters,
both
labeled
and
unlabeled,
because
that's
that's
the
that
would
be
like
having
having
it
split
between
this
here
is
where
all
the
old
stuff
lives-
and
here
is
where
the
new
stuff
lives
and
the
user
and
any
monitoring
tool
and
including
self-exporter,
which
we're
working
on
having
to
run
those
two
commands
and
again
getting
potential
inconsistent
resources.
It's
not
good,
so.
L
I also agree that we should feel free to change the format as long as we release-note it, but I still haven't been clear about whether other components of Ceph are consuming these in the JSON format. Does anybody know how these counters get shipped to the manager and consumed by modules? Is that in JSON, or is there a different channel? Because if we're changing the format, that's going to cause issues for upgrades, and I do think we would care about that.
D
Well, I would be really surprised if it was actually a JSON string that was shipped to the manager, just encoded as a string. I'm pretty sure the perf counter collection gets encoded according to the type of each counter.
So if it's an int, it's an int; if it's an average, it gets encoded with, like, three different values, one of which is a float, and whatnot. I don't think it's JSON. So it's for that reason that I'm not really worried about anything internal to Ceph or around Ceph, such as the telemetry project — it's really the external users that I think...
...are the major concern here, with their own scripts or monitoring tools that, let's say, they have running around their Ceph clusters.
A
For clarity's sake: if we were to just change the schema of the perf dump command, would there potentially be a mix of labeled and unlabeled perf counters — like, all of the counters would be there, but some would be changing to labeled perf counters?
D
Right — as time goes by, that's what would happen. I kind of like Greg's suggestion that we keep the "perf dump" name, allocate it to the new command — which will behave however we want — and have the old behavior be invoked with, you know, "perf" underscore "dump" or something of that sort. All right then — in that case, can we use a few minutes here to come up with a name that would be as good as "perf dump", but not "perf dump", that we could use for the new command?
There's also having it as an option, like we talked about — like --all or something — to signify that it's both unlabeled and labeled perf counters. The problem with that is that the option-parsing framework is not flexible enough for that, at least not out of the box, and it might require some tinkering to get done. Also, I don't think any of the admin socket commands actually accept options that look like options, with the minus-minus prefix; the only thing accepted is positional arguments that are either verbs or nouns. In the case of perf dump, like Ali mentioned, there are two additional optional arguments that are accepted: one is the logger subsystem — so if you want all perf counters produced by the MDS, you would specify mds — and the second optional argument is the name of the perf counter itself.
G
Yeah — "perf" doesn't really communicate any meaning unless you already know what it is.
J
And I'd argue against an option — I'd argue in favor of keeping the existing command — for one thing, because there's a lot of tooling that will need to deal with both versions of this command during an upgrade, and I think being deliberate about which version is being targeted is worthwhile. And to Ronen's point, I agree: we shouldn't tie ourselves needlessly to any specific things. The commands we have like that are just single, underscored words, though, right?
A
There's also the histograms command — I don't know if that's related.
D
Okay — well, the direction is clear: we need a new command. We will, I guess, deliberate on the name some more in the PR, and the decision for the old command is that we keep it completely as is, but as the counters are converted, they will just disappear from the old command.
I
Yep. I do have one more question: in theory — okay, we're not going to break it exactly, but we are going to move counters around — and in theory I don't mind breaking interfaces on major releases, but it's kind of late, like...
D
Right — with the exception of three or four perf counters that were added not too long ago to the rbd-mirror daemon. We're pretty sure no one is aware of them except for our monitoring team, who really want those converted to labels, so I'm going to convert those to labels in Reef — but I don't think that even warrants a release note, because nobody is aware of them; it's not like some well-known BlueStore counter.
J
And incidentally, we should release-note this anyway, because the communication to users going forward is: they should use the new command, and we deprecate use of the old one.
D
Right, right — that was the intention. It's just that, to your point, the direct statement that none of your perf counters should change in this release — unfortunately, some will.
D
Well, I would make it "counters", with an "s" at the end. Other than that, that sounds good. But just to make it clear: would "counter dump labels" — the last one — dump just the labels? I'm not sure what the purpose would be, because we've agreed that the dump should dump everything, both unlabeled and labeled. Yes.
A
I wrote our conclusion to that on the CDM tracker, so we have a record of it. Was there anything else that we wanted to wrap up? It sounds like we hit a conclusion, but if anybody else has anything to say, go for it.
L
Just thanks to Ali for sticking with this one.
A
Yeah, I think that sounds good to me.
A
That was our last topic on the agenda, so I think that's the end — good job, everybody. Next month is the APAC-friendly meeting, and then the subsequent month is going to be EMEA again.