From YouTube: Ceph Performance Meeting 2021-10-28
A: All right, let's start this off. Okay, I saw three new pull requests this week. One sets the minimum allocation size to the optimal I/O size for certain devices. The idea behind this is that there may be new device types coming out (in fact, there already are new device types coming out) that want larger-than-4K allocation units. Having said that, they may not require it, so there's a trade-off between performance, wasted space, and all these different things.
A: So there's some discussion happening in that PR; definitely an interesting discussion if you're interested in this whole topic. So yeah, there's the PR. Another new one, from Igor, makes the shared-blob fsck process much less RAM-greedy. Igor, that was a huge win. It looks like, I think you said, 12 gigs down to like half a gig.
B: But yeah, in a couple of deployments I could see like millions of shared blobs in the RocksDB database in BlueStore, so...
B: Yeah, well, maybe we should do some additional investigation into why so many blobs are present there, but anyway, it shouldn't grow that much.
B: Nobody cared about the efficiency of this process during the initial development, and it looks like we missed this thing.
B: Well, I haven't tried that in the field; I did some artificial testing. Actually, I changed the design of this internal data structure, so I believe it should be fine.
A: Nice, excellent job, looks great, Igor. All right, the last PR that I've got here was from Mark Kogan, and that is work on the DBStore that we were just talking about a little bit ago. It looks like it adds config options to set SQLite performance tuning parameters. So yeah, active work on that; looks really neat. I'd definitely encourage the RGW folks to talk more about it and what they're doing.
A: Okay, let's see. I didn't see any closed PRs this week. It's possible I missed something, but I didn't see anything specifically performance-related. We did have three updated PRs, though.
A: Let's see, this one to optimize PG peering latency: is Neha here? No, I don't see her. Okay, so basically she's going to take a look at it. It looks like maybe it needs to be retargeted to master, but in the very narrow case that they have, it looks like they saw a fairly good reduction in peering latency. So yeah, in this very specific case it looks like it might be a win. Next PR: the MDS one, to remove the subtree map from the journal.
A: For this one, a week or two ago design documentation was provided. I haven't looked through it yet, but it looks like there's still ongoing work being done on it, so that's really good. It did receive an update recently; I'm not quite sure what changed, I didn't look that deeply into it, but it's still being actively worked on, which is really good. Okay, last one. Oh, this is an older PR.
A: From this past summer: optimize object memory allocations using pools. That one would have been kind of stale for a little while, but Gabi went through and added some discussion to the PR, and then I also added a little bit there, Gabi.
A: I really like the idea behind this PR and, more generally, the idea of doing more to allocate memory from pre-allocated regions (or at least use memory, I should say, from pre-allocated regions), or maybe even better, to pre-allocate objects and reuse them. Any more thoughts on that? Yeah.
C: So when you look at a big object like an onode, I think you're going to end up doing like 20 or 30 calls to malloc. So even with this optimization you're still going to pay all this lock/unlock overhead, fragmentation, and so on. If we could design something like a standard object for some kind of operation, which we can do, we could say: you know what, we should allow up to 75 entities, but normally we don't need more than eight.
C: Yes, and that's going to cover probably the vast majority, because while we do allow a lot of crazy options, the vast majority don't use them. So you should have something for the normal case: try to get a single call, recycle the object, and if you need more, then you say, you know what, in that case I'm going to do some allocation.
C: Then you don't pay fragmentation. And even if, say, you pre-allocate assuming you're going to have eight sub-objects while in many cases you only have two or four, keep in mind that every time you allocate an object, if it's a linked list, you still have to pay some extra work for all the link objects and such; but if you put them in an array, you don't have to pay for this. So it needs a little work.
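To illustrate the pattern C is describing, here is a minimal sketch (hypothetical code, not from Ceph; the names and the inline capacity of eight are assumptions) of a container that keeps a fixed number of entries inline for the normal case and only falls back to the heap for the rare oversized one:

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Entries for the common case live inside the object itself: one
// allocation, contiguous memory, no per-node link overhead. Only when
// the inline capacity is exceeded do we pay for a heap allocation.
template <typename T, std::size_t N = 8>
class SmallVec {
public:
    SmallVec() = default;
    ~SmallVec() {
        for (std::size_t i = 0; i < size_; ++i) data()[i].~T();
        ::operator delete(heap_);
    }
    void push_back(const T& v) {
        if (size_ == cap_) grow();          // rare path: heap allocation
        new (data() + size_) T(v);          // common path: placement-new
        ++size_;
    }
    T& operator[](std::size_t i) { return data()[i]; }
    std::size_t size() const { return size_; }

private:
    T* data() {
        return heap_ ? static_cast<T*>(heap_)
                     : reinterpret_cast<T*>(inline_);
    }
    void grow() {
        std::size_t new_cap = cap_ * 2;
        void* p = ::operator new(new_cap * sizeof(T));
        T* src = data();
        for (std::size_t i = 0; i < size_; ++i) {
            new (static_cast<T*>(p) + i) T(std::move(src[i]));
            src[i].~T();
        }
        ::operator delete(heap_);
        heap_ = p;
        cap_ = new_cap;
    }
    alignas(T) unsigned char inline_[N * sizeof(T)];
    void* heap_ = nullptr;
    std::size_t size_ = 0, cap_ = N;
};
```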
A: And in wall-clock profiling we see the evidence of this regularly: new and delete, and object creation and object destruction, are often some of our biggest consumers overall when you look at the amount of time spread across the code base where we're doing work.
A: Gabi, I mentioned it in the PR, but Radek and I were working on, well, it doesn't have to be specific to this, but we were looking very narrowly at encode/decode: allocating memory from a thread-local ring buffer rather than allocating from malloc to start with, but using malloc as a fallback when you couldn't allocate from the ring buffer; then you just fall back. And, if I remember right, we saw some advantage.
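The scheme reads roughly like the following sketch (assumed sizes and names, not the actual experimental code): a per-thread ring of memory serves short-lived allocations without any locking, and anything that does not fit goes to malloc.

```cpp
#include <cstddef>
#include <cstdlib>

// Short-lived encode/decode buffers come from a per-thread ring; anything
// that does not fit falls back to malloc. Recycling relies on the caller's
// guarantee that every ring allocation is dead before the ring wraps back
// around, which is what the per-op lifetime of encode/decode buys you.
class ThreadLocalRing {
    static constexpr std::size_t kSize = 1 << 20;  // 1 MiB per thread (assumed)
    alignas(16) unsigned char buf_[kSize];
    std::size_t head_ = 0;

public:
    void* alloc(std::size_t n) {
        n = (n + 15) & ~std::size_t(15);           // keep 16-byte alignment
        if (n > kSize)
            return std::malloc(n);                 // fallback: too big
        if (head_ + n > kSize)
            head_ = 0;                             // wrap; old data assumed dead
        void* p = buf_ + head_;
        head_ += n;
        return p;
    }
    void release(void* p) {
        // Ring memory is reclaimed implicitly by wrapping; only fallback
        // allocations need an explicit free.
        if (p < buf_ || p >= buf_ + kSize)
            std::free(p);
    }
};

// One ring per thread: no locks, no contention with other threads.
inline ThreadLocalRing& local_ring() {
    thread_local ThreadLocalRing r;
    return r;
}
```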
A: Actually, it's been a while since I did this. The data is here, but it's very difficult to parse through it all. I'll paste it in case anyone cares enough to look at it, but the gist of it was that we saw an incredible amount of allocation, I mean many, many gigabytes per second going through this thing, and I think there's a huge amount of opportunity for us to do much better.
A: Or even the third tab, that total fallback length: that was, I think, more or less the amount of data that was coming from malloc; that was like column H. And then column D, the total alloc length: I think that was basically how much we were allocating from the ring buffer, if I remember right, aggregated across ring buffers.
A: Let's see, that was probably the aggregate in total, if I remember right. But maybe it was actually per second; I hope not. In any event, I'll have to go back and look at it again. Like I said, it's been a while since I did this, but that's the gist of it.
A: Overall, from what I recall when we were working on it, we had a huge amount of memory allocation coming through for encode/decode, and we did see the behavior we wanted to see: we were getting more allocated from the thread-local ring buffer as we increased the buffer size. We could get to a point where we were doing a lot of it from the ring buffer, and then you avoid the lock contention overhead and a lot of the problems you have with tcmalloc when you have to fall back to allocating from the central cache, which we did see happening.
A: I think what happens is that we see a lot of fragmentation because of the way tcmalloc has to handle many allocations across the entire process, whereas with the thread-local buffer, for encode/decode, we know we're guaranteed that these are short-lived; they only live as long as the current op. So if we do it in our own allocator, we know it's going to go away and that the ring is going to progress, you know, fairly reasonably.
A: We work one by one, but we do one memory allocation up front, because we pre-walk the entire structure, figure out what the sizes are going to be, and then we actually go and do one memory allocation. That was an optimization we made early on in BlueStore to avoid making lots and lots of small memory allocations for every single structure.
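The two-pass idea reads roughly like this sketch (hypothetical types, not Ceph's actual denc interface): walk the structure first to compute a size bound, then fill a single allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Extent { uint64_t offset, length; };

// Pass 1: pre-walk the structure and compute an upper bound on the
// encoded size, so the output buffer can be allocated exactly once.
static std::size_t bound_encode(const std::vector<Extent>& v) {
    return sizeof(uint32_t) + v.size() * sizeof(Extent);  // count + payload
}

// Pass 2: one allocation, then a straight fill; no incremental growth,
// no per-field allocations.
static std::vector<uint8_t> encode(const std::vector<Extent>& v) {
    std::vector<uint8_t> out(bound_encode(v));            // single allocation
    uint8_t* p = out.data();
    uint32_t n = static_cast<uint32_t>(v.size());
    std::memcpy(p, &n, sizeof n);
    p += sizeof n;
    if (!v.empty())
        std::memcpy(p, v.data(), v.size() * sizeof(Extent));  // flat copy
    return out;
}
```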
C: And you don't need the exact size. If it's going to be 7.5 kilobytes, we don't need to have the exact size; just give it a constant 8K and keep allocating 8K, you understand? I mean, even the stack could have enough space... actually, sorry, take it back: the stuff that we encode and decode, do we pass it as a buffer list? Because if so, it couldn't stay on the stack.
A: But the entire way that we do encode/decode is kind of not ideal. Even just going through and doing that pre-computation of the entire, you know, memory usage is not cheap. Radek said he's seen cases where that ended up actually being slower than just doing the memory allocations up front.
C: For something like this I would definitely use a slab allocator or some kind of buddy allocator, where you keep using powers of two: you use them, and then you combine them afterwards when you free them. You don't need an exact allocation size here; it's okay to have some space wasted.
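A minimal sketch of the power-of-two size-class idea (illustrative only; a real buddy allocator also merges adjacent free blocks, which this skips):

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Round each request up to the next power of two and keep a free list per
// size class, so frees can be recycled without searching and fragmentation
// is bounded by the rounding waste.
class PowerOfTwoPool {
    static constexpr int kMinShift = 6;           // 64-byte minimum class
    static constexpr int kClasses = 16;           // largest class: 2 MiB
    std::vector<void*> free_[kClasses];

    static int class_of(std::size_t n) {
        int c = 0;
        while ((std::size_t(1) << (kMinShift + c)) < n) ++c;
        return c;
    }

public:
    void* alloc(std::size_t n) {
        int c = class_of(n);
        if (c >= kClasses) return std::malloc(n); // huge request: fall back
        if (!free_[c].empty()) {                  // recycle a cached block
            void* p = free_[c].back();
            free_[c].pop_back();
            return p;
        }
        return std::malloc(std::size_t(1) << (kMinShift + c));
    }
    void free(void* p, std::size_t n) {
        int c = class_of(n);
        if (c >= kClasses) { std::free(p); return; }
        free_[c].push_back(p);                    // keep for reuse
    }
    ~PowerOfTwoPool() {
        for (auto& fl : free_)
            for (void* p : fl) std::free(p);
    }
};
```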
A: We'll just need to figure out how to change it. It doesn't help that what we do in BlueStore with denc is really the only place we're using this new thing anyway, whereas most of the other parts of the code, like the MDS, are still using the traditional scheme where we basically just allocate as much space as we need for each individual structure.
A: Yep, yep. And even for MDS journaling we see both fragmentation and that the journaling process is very slow, because it's spending so much time in memory allocation.
C: And that goes back again to the first item. If the objects are not dynamic but static, with static tables, then for this kind of object you could just hand over the pointer and it gets stored as binary. The reason we need encoding and decoding is that we have a B-tree, so you have to walk the tree; then we have a linked list, so you have to walk the list; and the memory is not contiguous.
C: So if you're going to say your request is going to look like this, everything is going to... so when you use the version-two protocol, we could just use structures: do one allocation, get the whole struct, and when we need to do serialization, we just take it as-is and there is no work that has to be done. And we could still do backward compatibility.
C: If you don't conform to this new API, then we still do the old thing, but hopefully everybody would eventually move to this fixed-size thing. Like, if you look at the mainframe protocol, CKD: in theory you could define a lot of possibilities in how you're going to make the disk layout, but in reality...
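As a sketch of the fixed-layout idea with a version field for the backward-compatibility fallback (every name and field choice here is hypothetical, not an actual Ceph wire format):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// A packed, versioned request with a fixed attribute table: the struct
// *is* the wire format, so serialization is a bounds check and a memcpy
// rather than a field-by-field encode walk with per-node allocations.
#pragma pack(push, 1)
struct RequestV2 {
    uint8_t  version = 2;     // wire version, for mix-and-match fallback
    uint8_t  num_attrs = 0;   // how many of the 8 fixed slots are in use
    uint16_t flags = 0;
    uint32_t reserved = 0;
    struct { uint32_t key; uint64_t value; } attrs[8] = {};  // fixed capacity
};
#pragma pack(pop)

inline std::size_t serialize(const RequestV2& r, uint8_t* buf, std::size_t len) {
    if (len < sizeof(RequestV2)) return 0;
    std::memcpy(buf, &r, sizeof r);      // one copy, no walking
    return sizeof r;
}

inline bool deserialize(RequestV2& r, const uint8_t* buf, std::size_t len) {
    if (len < sizeof(RequestV2)) return false;
    std::memcpy(&r, buf, sizeof r);
    return r.version == 2;               // older versions take the legacy path
}
```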
A: I think a version two is a good idea too. Currently we have all these conditional situations for older versions of encoding, like in the MDS (it's not as bad in, like, BlueStore, since it's much newer), but there are all these things to support stuff from 10 years ago, right, encoding that was different ten years ago. You know, with a version two you could do away with all of that and just start over from scratch. It'd probably be better than trying to shoehorn in all this old stuff.
C: And I mean, it requires that people sit and look at the protocol and say: okay, in the past we gave a lot of flexibility, but what do people really use? Because if there are some options that people use once a day and others that people use all the time, then maybe you should eliminate the once-a-day possibility in version two.
C: If you need more than that, then you have to go and use the old API, and you could maybe even mix and match. Maybe per request you could say: this request is using the new API, that one was using the old API. So everybody who can fit within eight attributes would use the new API, and people who need, I don't know, 27 attributes would use the earlier one.
A: And really, is it more wasted space than we suffer right now from fragmentation?
C: Or, I don't know, the red-black trees that they have: every object has a lot of overhead, because you've got the right pointer, the left pointer, yeah.
C: You pay in the memory layout, and then you pay again when you're doing a dump or marshalling and demarshalling: you need to walk the whole thing, while if you keep everything as a vector, everything will be sequential.
C: If you look at the SCSI protocol, the structures are fixed. You could have some flexibility, but it's like 64 bytes for the structure in FCP, in Fibre Channel. Maybe you don't need 64, maybe you could fit in 48, but 64 is enough, and you cannot have more than 64; you're not going to have 47 bytes.
C: I think memory access is a big killer if every object needs to be fetched from a different location, especially if the memory is fragmented. Say you allocate a linked list with eight objects, but because the memory is so fragmented, the first one comes from one page, the second from another page, and you keep bringing in a cache line every time you access one as you walk the list.
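The cost C is describing is easy to demonstrate; here is a toy comparison (illustrative, not a rigorous benchmark) of walking the same data as a contiguous vector versus a node-per-element list:

```cpp
#include <chrono>
#include <cstdio>
#include <list>
#include <numeric>
#include <vector>

// Sum the same integers twice: once from a vector (contiguous, so the
// prefetcher streams cache lines) and once from a list (one heap node per
// element, so each hop can be a cache miss when memory is fragmented).
int main() {
    constexpr int N = 2'000'000;
    std::vector<int> v(N, 1);
    std::list<int> l(v.begin(), v.end());

    auto time = [](auto&& f) {
        auto t0 = std::chrono::steady_clock::now();
        f();
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0).count();
    };

    long sv = 0, sl = 0;
    double tv = time([&] { sv = std::accumulate(v.begin(), v.end(), 0L); });
    double tl = time([&] { sl = std::accumulate(l.begin(), l.end(), 0L); });
    std::printf("vector: %f s, list: %f s (sums %ld/%ld)\n", tv, tl, sv, sl);
}
```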
C: One by one; you don't have to do the whole system at once. You could start with one object type, measure what kind of impact it has, and then go from there. It can start easy, and once you do it for one object you'll realize how to do it, and then you can start doing it one object after another. But it's not going to be done as nicely and generically as that thing from IBM; the IBM thing was, you don't understand the system, and they don't claim to understand it.
A: Yeah, that was kind of the same way with the thing that Radek and I were working on. We're not going to change the entire code base; we're just going to try to make it a little better behind the scenes, because we have some knowledge that, at least for encode/decode, the memory is going to go away soon, as opposed to other things that may or may not go away soon. So maybe, with that knowledge, we can make it a little better.
C: And if you find which objects are your worst hitters, then you deal with them, and there will still be a lot of small things, I don't know, management and telemetry and all this stuff that doesn't happen so often. Okay, so that stays on dynamic allocation, but your big objects are going to be pre-allocated and used from a single pool.
C: Usually you build the system in a way that you say: I can service, I don't know, 1000 requests in parallel while giving you some predictable response time. I don't want to go to a spike of 5000, because it's going to cause the performance to be very slow. So once you reach 1000 you're going to start rejecting things, or just queue them but not process them, because processing them would impact your performance; so just keep the minimal request around, but don't start processing it.
C: But that also means that you know how much space, how much memory, you need, because you're going to set your own limit. You say, I don't know, 1000 parallel 4K IOPS are going to happen (I don't know, you set the numbers), and anything above that you just queue, and you don't allocate resources aside from the request itself, which is relatively small.
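A minimal sketch of that admission-control pattern (names and limits are assumptions): pre-allocate the per-request resources for a fixed number of in-flight requests, and queue only a cheap descriptor beyond that, so peak memory is known up front.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct RequestCtx { std::vector<unsigned char> io_buf; };  // per-request state

class AdmissionPool {
    static constexpr std::size_t kMaxInFlight = 1000;  // chosen limit
    static constexpr std::size_t kBufSize = 4096;      // e.g. 4K I/Os
    std::vector<RequestCtx*> free_;
    std::deque<int> waiting_;                          // queued request ids only
    std::mutex mu_;
    std::vector<RequestCtx> slab_;

public:
    AdmissionPool() : slab_(kMaxInFlight) {
        for (auto& ctx : slab_) {
            ctx.io_buf.resize(kBufSize);               // allocate once, up front
            free_.push_back(&ctx);
        }
    }
    // Returns a context if the request is admitted, nullptr if it must wait.
    RequestCtx* admit(int request_id) {
        std::lock_guard<std::mutex> l(mu_);
        if (free_.empty()) {
            waiting_.push_back(request_id);            // cheap: the id only
            return nullptr;
        }
        RequestCtx* ctx = free_.back();
        free_.pop_back();
        return ctx;
    }
    void release(RequestCtx* ctx) {
        std::lock_guard<std::mutex> l(mu_);
        free_.push_back(ctx);                          // recycle, never free()
        // a real system would now admit waiting_.front(), if any
    }
};
```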
A: It's nice, too, because if we do this right, it means that all of this memory auto-tuning stuff that I've written becomes obsolete, and I want it to go away, right? It's trying to dynamically account for changes that are happening to keep us below a certain memory limit, but if we're just smart about the allocations up front, then we don't need to do any of that. We just have our pool of memory that is static anyway.
C: That's like the basic design principle, actually. Since the early 80s people have been using this kind of principle: you know how your system behaves, and then, like in the old Unix, you have your limits, and when you change a limit it just assigns how many objects you can process at once. If you limit that, you cannot have more than that many processes happening, and everything is pre-allocated and everything works. If you need more than that, then you need to change the settings.
A: All right, I think we've almost beaten this one to death. Yeah.
A: Encode/decode is definitely a high-impact area. There are others too, but that's one of the ones where we have two ways of doing it, the old way and the new way, and both are kind of bad in different ways. So that'd be, you know, somewhere to start: to see what we are encoding and what we are decoding.
A: I get the feeling that for a number of these lists, or even unordered maps, that we have, we have the ability to hold an arbitrary number of entries, but we typically only use maybe one or two of them.
A: All right, well, we're almost done here, so I'll just add the two discussion topics. I'm doing a Q3 update for Crimson, just trying to get more data for that; I'll talk about it later on, but right now I'm just trying to collect it.
A: There are a bunch of PRs that just went in for CBT that add some support for Crimson, and other things kicking around that I needed to get in, so maybe some CBT improvements are coming. And that was all I really had. So, in the last eight minutes here, is there anything else we want to talk about before we wrap this meeting up?
A: You've looked at the BlueStore data structures more than I have. Do you remember, as you look through them, what the bigger dynamic elements were? Blobs and shared blobs and these things? I remember there's a fair amount, but it's been a little while since I looked at them.
E: Yes, we based a lot of our algorithms on entries being dynamically added and tweaked, then split dynamically, and changing that will really impact not just data structures but also even some decision processes, like when to recompress compressed chunks; that will also be affected if you change data structures.
B: Well, it definitely doesn't wait for additional confirmation, but it could postpone sending additional data: so instead of sending, like, eight megabytes, it might send less, in smaller portions. But it actually depends on the client; different clients might behave differently.
A: If we don't have the space, or we've already filled the number of items we can handle, or the amount of space we can handle for ingest, I think we just reject it. Reject it: that's what I remember happening.
B: I'm not sure about the details, but to be honest, the current bottleneck is not in the allocation itself. It's...
B: ...interlocking between different tasks and things like that. So these dynamic allocations bring some additional overhead, but it's not the primary one. That's my feeling and my experience from what I've observed before.
B: I agree, but I think the difference between this and other devices is that we've got a pretty powerful CPU here, and we are not getting that many IOPS at the moment, due to different reasons, and that's why we are not affected that much by these inefficient allocations and things like that. So Mark, please correct me if I'm wrong, but I think we can hardly reach more than 100,000 writes.
A: So that number, the 100,000 writes per second: you're right, Igor. And now, especially with Gabi's work on the allocation improvements, we've gotten to the point where the tp_osd_tp threads are actually very busy when we do that. The kv sync thread is still kind of a bottleneck, but those worker threads are actually very busy.
A: As for what they're busy doing: there's a lot of contention involved, there's memory allocation, there's object creation and destruction. It's spread over lots of stuff, but, I think, CRC is another thing that shows up. We've got, you know, various random things, but we are very busy. CPU is probably near the bottleneck: with 16 threads, a hundred thousand write ops will use almost all of it.
B: Well, at this point I would bring up a bit of a different idea. A while ago I was doing some experiments with the messenger, and it looks like we have pretty significant overhead in the messenger itself.
F: I think you're absolutely correct. We should find one component which is easy to change and then see how...
F: And, well, I...
B: Again, my feeling is that the messenger might be one bottleneck, the PG logic and all the interlocking there is another bottleneck, and RocksDB is the third one; but in the end, underneath all of them, is probably all this inefficiency in allocations.
A: Side note: I pasted in the chat the osd_client_message_size_cap and osd_client_message_cap options. I think those are what govern when we stop pulling new data off the network.
A: We should move forward on this, though, one way or another, even just trying to make small progress. I think it's a worthwhile pursuit.
C: Yeah, so I think at first we need to find the best candidate. Maybe that's the messenger that Igor was pointing at, maybe something else. So we should try to look for a good candidate which is simple: define a big fixed-size object, then set the number of them, then start using them, and see what kind of impact there is, if any.
A: Sure. I'll try to get some more wall-clock profiles available for folks, too, and then maybe that will help us determine areas that might be worth looking at in more detail.