From YouTube: W11 TEC Lab! cadCAD and Stablebaselines for Energy Web
Description
🌱Join the Community🌱
on Discord https://discord.gg/DDr5kYU
or say hello on Telegram http://t.me/CommonsStack
Join the conversation https://forum.tecommons.org/
Follow us on Twitter: http://twitter.com/CommonsStack
Learn more http://tecommons.org/
A
The web3 sustainability loop is an idea put out by Ocean Protocol. Trent McConaghy writes about this. It's an idea about how to manage an economy, inspired by the corporation and the government, the kinds of models we've seen over the past 100 years. It's basically monetary policy: money should be minted and allocated effectively. So it hinges on the effective allocation of money. If money is printed and not efficiently allocated, then that is inflation.
A
That's what causes inflation, and if money is printed and allocated efficiently, then that is growth. So this is the key thing to nail down.
A
Right, it's not necessarily about how many tokens we're making; it's really about how they're distributed, how they're issued, and what they go towards. If they go towards productive means, then that production should be able to produce enough inflows. This is where it's been really nice tracking the 1Hive Honey model, because it's really about inflows, outflows, and production. That's it, that's the recipe.
A
If you can take the tokens that are being minted and make stuff with them, fund the community, fund projects: exactly what we're doing here at the TEC, with the Hatch and the bonding curve and the proposals and this whole process. I was lecturing some people last night about all this stuff, and I realized how many holes there are in my own knowledge. I'm not even sure what 1Hive has with the common pool. Did they launch an augmented bonding curve?
B
No, not at all. They just have issuance. Well, they want to change it to dynamic issuance, where it has awareness of how much is in the funding pool versus the total supply, but right now they just have issuance pumping into the pool. They took out the bonding curve, they only have conviction voting, and then they just print money.
B
Currently they just print money at a steady rate into the funding pool. There's an issuance contract, and at any time someone can push a button, and based on that time and the last time the button was pushed, money will be minted and sent to the fund.
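A minimal sketch of that pattern, assuming a steady linear rate; the class name and the rate are hypothetical stand-ins, not 1Hive's actual contract:

```python
import time

ISSUANCE_PER_SECOND = 0.01  # hypothetical steady rate, tokens per second

class IssuanceContract:
    """Mint tokens proportional to the time elapsed since the last call."""

    def __init__(self):
        self.last_mint_time = time.time()
        self.funding_pool = 0.0

    def mint(self):
        # Anyone can "push the button"; the amount depends only on elapsed time.
        now = time.time()
        elapsed = now - self.last_mint_time
        minted = elapsed * ISSUANCE_PER_SECOND
        self.funding_pool += minted
        self.last_mint_time = now
        return minted
```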
B
It
can't
so
the
the
automated
part
is
the
quantification
of
how
much
can
be
set
at
any
moment,
but
it's,
but
unfortunately
the
evm
requires
an
external
account
to
take
action
on
contracts
for
actions
to
occur.
So
it's
like
the
best
tech
doesn't
always
win.
Let's
just
put
it
that
way.
Ethereum
has
a
lot
of
problems.
B
Man
there's
a
lot
of
really
annoying
technical,
like
hacks
to
make
it
due
to
all
the
cool
stuff
we
wanted
to
do
you
know,
and
one
of
the
big
things
is
you
just
can't:
have
a
contract
just
like
at
a
certain
time,
do
something
you
can
have
a
bot
that
has
an
external
key
and
then
that
bot?
You
know
the
web
2
bot
just
like
triggers.
You
know
it
has
gas
money
and
it
configures
something
you
can
do
that,
but
you
can't
have
the
contract.
A
But yeah, overall Boson's been great, and it's a lot of work to just keep up with everything that's happening, but I think their vision is wonderful and the founders are great. They have good mindsets, and I think there's going to be a lot of collaboration between the TEC and Boson. I see Angela just joined.
A
Nothing happens with the sheet yet; it's just a fun project, really, to see what kind of emojis people enjoy, but then it's available. I think the idea is we could do a lab on it sometime. Well, there's not that much rich data here, other than I guess the social media platforms or operating systems.
A
Maybe
we
could
do
something
with
that,
but
the
idea
was
just
to
generate
a
data
set
of
of
the
tec
lab
that
maybe
one
day
we
can
actually
go
ahead
and
analyze
and
do
some
data
science
on
this
data
set
itself.
A
But nothing's been done yet. It's nice to track this attendance, though, so we could definitely do a collab or something along those lines. Yeah.
A
All right, good. If there are any questions, anyone can just let us know. And the lab worksheet: check the calendar. Today is February 19th.
D
Well, thanks. Or should I, Sean, what do you prefer? Yeah, you can call me that.
D
Yeah, or "ich" in German, okay. Angela, yeah, sure. Today we're going to talk about our reinforcement learning agents in the loop of Energy Web. Last lab I talked about how we got on with the Energy Web submission.
D
As
a
matter
of
fact,
we
are
already
collaborating
on
that,
together
with
lior
from
the
data
guys
and
he
reached
out
to
to
to
join
and
well
basically
join
efforts
in
in
order
to
maybe
get
something
up
and
running
with
energy
web
and
data
down
together
we're
in
the
process
of
setting
up
a
two
page
or
something
which
I
will
share
in
due
time
for
this
lab.
D
I think we need to go a bit again through the intro of the TokenSPICE cadCAD migration, just a short overview of what we discussed last time, and then head over to the reinforcement learning stuff. But maybe, Sean, you can give a quick intro about the reinforcement learning agents you did for us, for the TE Academy, with your Notion page, to get everyone up to speed: what are we talking about when we talk about reinforcement learning agents?
A
I think it's just a Notion thing. Okay, I'm in now.
A
Okay, so in this page from last week (I'll link to that; I'll also just drop it in the TEC labs channel), this was a fun presentation that I gave in the Ocean Protocol study group about the potential of combining reinforcement learning frameworks with token engineering frameworks, particularly TokenSPICE. I gave some background on reinforcement learning; maybe I'll go through this briefly.
A
Briefly,
I
talk
about
this
primary
framework,
slash
library
that
is
available,
it's
produced
by
openai,
and
they
have
a
library
called
openai
gym
which
creates
these
so
back
up
a
little
bit.
What
is
reinforcement
learning
it's
this?
It's
two
standard
data
structures.
We
have
an
agent
and
we
have
an
environment.
A
An
agent
has
an
action
space
that
it
can
operate
in
this
environment
and
when
you're
thinking
about
token
engineering.
This
is
really
useful.
It's
a
nice
matching,
because
the
action
space
are
all
the
contract
functions.
That's
that's!
How,
when
you
think
about
the
evm,
the
state
of
the
the
blockchain
in
the
world,
all
the
things
that
we
can
do
are
essentially
the
functions
that
are
defined
in
the
contracts
or
the
or
the
protocol
itself.
So
we
can
interact
with
the
protocol
itself.
A
We
can
do
things
like
deploy
contracts
or
once
those
contracts
are
deployed,
we
can
interact
with
them
and
so
the
action
space.
I
I
like
to
actually
show
people
this
if
we
go
open
zeppelin,
maybe
here
so
if
you
don't
know
open
zeppelin,
it's
a
standard
audited,
secure
contracts
and
if
we
go
into
contracts
and
token
then
erc20,
then
this
this
is
an
awesome
resource,
and
if
we
take
check
out
the
interface,
then
we
have
all
the
functions
available
on
the
standard
erc20.
A
And
so
if
we
were
making
an
agent
in
a
token
engineering
simulation
where
maybe
there's
only
erc20s
deployed,
then
this
would
actually
be
the
action
space
for
the
agent
to
interact
with
the
contracts,
and
some
of
these
are
sort
of
read-only.
A
You could encode each action with a number, one through six, and you could have a basic agent that randomly samples: a uniform distribution, like rolling a die from one to six. Maybe this agent takes a random action, and you could have thousands of these agents that just continuously take random actions, and then you'd be able to see what all the wallets of these agents do. How many tokens do they have?
A
Are
they
transferring
these
tokens
and
you'd
get
to
see
how
this
system
plays
out?
Maybe
if
every
agent
was
just
acting
randomly
and
that's
usually
your
baseline
when
running
and
deploying
ai
algorithms
is
like
what
would
happen
if
we
had
a
completely
random
policy,
and
once
you
get
that
working,
then
you
can
start
to
actually
use
the
sort
of
reinforcement,
learning,
algorithms
that
will
change
the
policy
over
time
based
on
based
off
feedback
from
the
environment.
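A minimal sketch of that random baseline, assuming the six ERC20 interface functions are encoded as discrete actions (the mapping in the comment is illustrative):

```python
import gym

# Hypothetical encoding: the six ERC20 interface functions as actions 0..5
# (e.g. 0=totalSupply, 1=balanceOf, 2=transfer, 3=allowance, 4=approve, 5=transferFrom).
action_space = gym.spaces.Discrete(6)

def random_policy(observation):
    """Baseline policy: ignore the observation and roll a die."""
    return action_space.sample()

# Thousands of random actions; counts come out roughly uniform.
actions = [random_policy(None) for _ in range(10_000)]
print({a: actions.count(a) for a in range(6)})
```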
A
So that's the action space: each agent has the opportunity to take an action. An action affects the environment; like we said, if you do a transfer, then you're actually changing the state of, in this case, the EVM. And then the agent gets some reward in return. So say the agent takes a series of actions, and the reward function is to maximize the number of tokens that I hold in my wallet: the more tokens you have, the more reward the agent is going to get. And then the agent also gets to observe the environment. This is encoded similarly to the action space: the observation space is going to be maybe a vector or a matrix.
A
The
observation
space
could
simply
be
what's
the
balance
of
my
wallet,
so
we
could
encode
a
very
simple
erc20,
interacting
agent,
where
it's
taking
random
actions
on
an
erc20
token
and
after
every
action
it
takes,
it
gets
to
see
what
what
it's,
what
it's
balance
is.
So
it's
good
and
if
it's
tr
and
if
the
reward
is
to
maximize
its
balance,
then
it's
going
to
learn
quickly
to
stop
transferring
funds.
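A minimal sketch of that toy setup as a custom Gym environment; the token logic here is a stand-in, not a real EVM connection:

```python
import gym
import numpy as np
from gym import spaces

class ERC20Env(gym.Env):
    """Toy environment: the agent holds a token balance and can act on it.
    Action 0 = hold, action 1 = transfer 1 token away. Reward = balance."""

    def __init__(self, start_balance=100.0):
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Box(low=0.0, high=np.inf,
                                            shape=(1,), dtype=np.float32)
        self.start_balance = start_balance

    def reset(self):
        self.balance = self.start_balance
        self.t = 0
        return np.array([self.balance], dtype=np.float32)

    def step(self, action):
        if action == 1 and self.balance >= 1.0:
            self.balance -= 1.0  # transfer funds away
        self.t += 1
        reward = self.balance   # maximize balance => learn to stop transferring
        done = self.t >= 100
        return np.array([self.balance], dtype=np.float32), reward, done, {}
```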
A
So
this
is
this.
Would
be
a
very
simple
implementation
of
a
reinforcement,
learning
agent
in
a
in
a
in
a
token
engineering
simulation
or
a
token
economy,
and
the
purpose
of
this
presentation
was
to
give
this
background
on
reinforcement,
learning
and
then
also
talk
about
the
opportunities
with
token
spice,
because
it
connects
directly
to
the
ethereum
virtual
machine.
We
can
actually
execute
these.
We
can
do
exactly
what
I
was
just
describing.
A
We
can
define
the
action
space
of
an
agent
to
be
actual
smart
contract
interactions,
and
then
we
can
run
these
these
agents
through
our
simulations
to
test
our
smart
contracts
and
the
token
ecosystem,
and
so
what
I
did
as
an
example
is
I
have
here-
is
all
all
the
agents
defined
in
token
spice
to
test
the
ocean
protocol
ecosystem
and
do
some
tokenomics
analysis
and
verification
and
simulation
to
get
these
sort
of
results
where
we
can
say
well,
what's
the
annual
revenue
of
the
dow
over
time.
A
So
we
have
monthly
ocean,
dow
income
ocean
minted
and
burned,
and
this
is
interesting,
griff
and
I
were
just
talking
about
one
hive
and
the
honey
issuance
before
the
call.
So
I'm
not
sure
exactly
their
issuance
rate,
but
trent
found
that
there's
you
can
do
sort
of
a
linear
issuance.
You
just
print
the
same
amount
of
tokens
over
time
I
say
monthly
and
on
the
opposite
side
of
that,
you
could
do
like
an
exponential
decay
in
your
token
issuance.
So
this
is
what
we
see
in
bitcoin
and
then
trent
found.
A
He
says
by
far
the
optimal
is
a
sort
of
ratcheted
exponential.
So
you
do
this
ratcheting.
He
calls
it
and
I
I
don't.
I
quite
know
the
precise
definition
of
what
ratcheting
is,
but
you
can
clearly
see
it
in
this
graph
here
where
it's
this.
I
think
it's
like
a
pseudo
manual,
token
minting
process
over
time
in
the
early
days
and
then
an
exponential
decay
over
time
and
trent
mentions
that
in
the
simulations
that
they
run
for
ocean
protocol.
A
This
is
by
far
the
best
issuance
policy
and
how
they
modeled
this
in
the
simulator.
Is
that
actually
the
agent
they
have
an
agent?
That
is
a
minting
agent.
So
this
agent
gets
to
decide
how
tokens
are
minted
over
time
and
they
just
had
three
pro
pre-programmed
options.
They
had
the
linear,
the
exponential
decay
and
the
ratcheted,
and
so
they
found
these
results
that
the
ratchet
did
the
best.
But
this
agent
itself
could
be
a
reinforcement,
learning
policy
that
you
could
train
over
simulations.
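A sketch of the three schedule shapes for intuition; the rates and ratchet steps are made-up numbers, not Ocean's actual parameters:

```python
import numpy as np

T = np.arange(120)  # months

# Linear: the same amount minted every month.
linear = np.full_like(T, 100.0, dtype=float)

# Exponential decay: Bitcoin-style, issuance shrinks over time.
exp_decay = 300.0 * np.exp(-0.03 * T)

# Ratcheted: roughly manual step-ups early on, then exponential decay.
ratchet_steps = np.select([T < 12, T < 24, T < 36], [50.0, 150.0, 250.0], default=0.0)
ratcheted = np.where(T < 36, ratchet_steps, 250.0 * np.exp(-0.03 * (T - 36)))

for name, sched in [("linear", linear), ("exp_decay", exp_decay), ("ratcheted", ratcheted)]:
    print(name, "total minted:", round(sched.sum(), 1))
```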
A
Typically in RL we have this idea of an episode, and many RL problems are trained on video games. So one episode is maybe the agent playing through an entire level of a game; at the end it'll have its total reward, and then there will be some credit assignment part of the algorithm that decides, at each step of that game, what the reward was: what the credit attribution was for the action that the agent took in various states at various points in time.
A
So
this
this
is
a
overview
of
this
rl
process
and
then
here
I
have
a
coded
up.
Example.
I
took
one
of
the
agents
and
showcased
how
it
could
how
we
could
modify
this
and
augment
this
to
have
some
reinforcement,
learning
and
I'm
just
wondering
mark.
Do
you
want
to?
How
am
I
doing
do
you
want
to
jump
in
at
this
point
or.
D
Yeah,
sorry
yeah.
This
is
a
good
point
to
to
jump
in
yeah.
As
a
matter
of
fact,
I
looked
at
your
example
first
because
they're
not
not
so
many
examples
about
reinforcement
environments,
besides
the
let's
say,
the
the
standard
gym
environments
like
the
carpo
and
this
kind
of
stuff,
so
it
was
really
a
journey
in
in
how
to
set
up
custom
environments,
because
this
is
what
we
need
for
for
our
for
our
reinforcement.
D
Learning
agents,
as
you
can
see
here
in
this
picture,
sean
is
showing
here
that
he
makes
use
of
a
model
it's
given
into
the
into
the
initialization
function
of
the
of
the
agent
of
the
protocol.
Speculator
agent,
you
give
him
a
model
and
this
model
is
being
trained
beforehand.
I
guess,
and
this
model
will
predict
the
next
action
to
take
and
if
you
scroll
down
a
bit
further,
as
shown,
I.
A
Don't
see
that
I
import
did
I
forget
to
import
the
model,
the
actual
yeah.
D
The model has to be somewhere, but there's no problem there, because we suppose that a trained model is available for this agent. And you see then, in the takeStep function (that's a normal TokenSPICE simulation step function), that you first calculate the reward, you append this reward, and then you do an action, and this action is predicted by the model. So the model itself has a function; it's called predict.
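A rough sketch of that pattern, assuming a TokenSPICE-style agent class and a pre-trained Stable Baselines model passed in at construction; the class and helper names are illustrative, not the exact code on screen:

```python
class RLStakerAgent:
    """Illustrative TokenSPICE-style agent driven by a pre-trained RL model."""

    def __init__(self, name, model):
        self.name = name
        self.model = model      # e.g. a trained stable_baselines PPO2 instance
        self.rewards = []

    def takeStep(self, state):
        # 1. Compute and record the reward for the previous action.
        reward = self._compute_reward(state)
        self.rewards.append(reward)
        # 2. Ask the model which action to take next.
        obs = self._observe(state)
        action, _ = self.model.predict(obs)  # predict() returns (action, hidden_state)
        # 3. Execute the chosen action against the simulated protocol.
        self._act(state, action)

    def _compute_reward(self, state):
        return 0.0          # placeholder: e.g. change in OCEAN staked

    def _observe(self, state):
        return [0.0] * 5    # placeholder: e.g. OCEAN staked per pool

    def _act(self, state, action):
        pass                # placeholder: stake/unstake on a pool
```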
D
This
is
a
function
I
really
needed
to
flesh
out
and
to
see.
Where
does
this
predict
function
is
coming
from?
I
think
this
is
a
nice
opportunity
to
share
my
screen
in
how
my
journey
went.
This
far.
D
Okay,
now
you
should
see
the
code
screen
correct,
yeah,
no.
D
All
right,
yeah
so
and
to
pick
up
where
we
left.
Last
week
we
talked
about
the
the
energy
web
agents,
quick,
refresher,
energy,
web
agents,
tokenized
power,
balancing.
We
need
to
power
the
balance
within
a
microgrid
of
energy
consumers
and
producers.
D
So
we
use
the
energy
web
origin
toolkit
for
that,
and
we
need
some
sort
of
external
interaction
and
also
the
ocean
market
to
to
do
that.
Well,
basically,
if
you,
this
is
a
high
level
overview
of
the
the
synergy
I
I
presumed
for
my
submission.
On
the.
On
the
one
hand,
you
have
a
energy
web
with
where
you
can
register
power
devices,
and
you
have
some
sort
of
marketplace
over
there-
that
you
can
sell
and
buy
power.
D
So
we
can
predict
what
the
outcome
will
be,
what
the
power
production
will
be
in
the
next
hour
or
the
next
day.
Maybe-
and
of
course,
you
can
have
all
sorts
of
staker
agents
around
that
that
are
going
to
predict
how
that
power
producing
device
will
behave
in
the
future
for
the
next
hour
or
the
next
day.
So
it's
quite
a
dynamic
system.
We
talked
about
that
last
week.
D
Okay, that's the business side of it. Now the technical side. Last week I said: okay, we have some Energy Web agents; I fleshed them out in a separate directory. So first of all, in the cadCAD migration, I just copied and pasted all the agents of the TokenSPICE environment into this directory.
D
So this is basically the agent that is really interacting with the Ocean marketplace, and after its action of publishing a data token pool, we have an Energy Web pool agent, which is basically the manager of the data token pool, and we can interact with the pool agent when we want to stake on it. That is done by the Energy Web staker agent.
D
I
tweaked
a
bit
about
his
staking
behavior
at
this.
At
this
moment,
I'm
I
am
going
to
simulate
a
randomized
staker.
That
means
that,
once
in
a
while
it
stakes
on
the
data,
token
pool
and
the
other,
and
in
50
percent
of
the
cases,
the
mistake
and
in
the
other
50
percent
of
the
cases
it
will
unstay
and
it's
there
is
some
sort
of
sophistication
in
the
how
he
will
stake
or
unstake.
D
Let's say it stakes on the pool with the fewest OCEAN staked on it, and the other way around: it will unstake from the pool that has the most OCEAN staked on it. So basically it's a randomized behavior, but we need some sort of dynamics in the simulation. This is what I modeled; you can tweak around and, let's say, alter the magic numbers. Maybe, Sean, you can elaborate a bit on what a magic number is.
A
Yeah,
so
I
mentioned
this
last
time.
This
is
a
nice
aspect.
Of
the
token
spice
framework
is
that
we
leave
these.
So
in
general
software
engineering,
a
magic
number
is
a
bad
thing.
You
don't
want
to
have
any
magic
numbers,
it's
when
you
just
have
a
floating,
int
or
or
float
in
your
code
somewhere,
and
usually
people
don't
know
what
it
means
or
why
it's
there.
A
So
you
want
to
abstract
those
out
into
sort
of
a
configuration
space
where
things
are,
they
have
variable
names
and
you
can
get
all
these
magic
numbers
all
in
one
place
in
this
configuration
file
that
then
gets
loaded
into
your
system.
But
trent
inverted
this
principle.
He
actually
leaves
all
these
magic
numbers
in
the
engine
in
the
system
itself
and
so
to
really
augment
your
experiments.
A
You
need
to
go
through
the
source
code
and
look
find
all
these
magic
numbers
and
get
to
tweak
them
appropriately,
and
this
is
kind
of
unorthodox
from
like
a
software
engineering
practice,
but
from
a
simulations
practice.
A
It
makes
a
lot
of
sense
because
it
reduces
the
abstractions
we're
not
abstracting
everything,
away,
we're
actually
leaving
the
the
knobs
in
the
system
where
they
make
the
most
sense
right
directly
on
the
policies
themselves,
on
the
agents
themselves
on
the
environment
themselves,
and
it
ends
up
working
quite
nicely
because
it
does
reduce
these
like
abstractions,
that
you
have
to
follow
with
all
these
imports
and
sort
of
separating
the
components.
It
leaves
the
components
together
with
their
parameters
that
can
be
tuned
and
at
any
point
you
could
you
could
take
these
out.
A
You
could
find
all
the
magic
numbers
and
you
could
put
them
in
one
configuration
file
and
then
import
them,
and
maybe
I
could
see
that
happening
when
people
are
applying
token
spice
to
very
large
scale
simulations
large-scale
environments,
but
as
of
now
it
works
great.
It
makes
it
really
easy
to
just
modify
the
code
on
the
fly.
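To make the contrast concrete, here is a small, hypothetical illustration of the two styles; these numbers and names are placeholders, not TokenSPICE's actual values:

```python
# Conventional style: magic numbers extracted to a named configuration.
CONFIG = {"stake_probability": 0.5, "stake_fraction": 0.1}

def take_step_configured(agent, config=CONFIG):
    if agent.rng.random() < config["stake_probability"]:
        agent.stake(agent.ocean_balance * config["stake_fraction"])

# TokenSPICE style: the tunable knobs live inline, right where they act.
def take_step_inline(agent):
    if agent.rng.random() < 0.5:                 # tune me: how often to stake
        agent.stake(agent.ocean_balance * 0.1)   # tune me: fraction staked
```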
D
Yeah
so,
as
you
said,
as
you
see
here,
I
put
some
print
statements.
Just
you
know,
for
for
debugging
purposes
or
just
to
to
show
what's
happening.
I
will
start
out
a
a
simulation
now,
so
you
can
see
what's
happening.
D
Simulation
so
when
the
simulation
is
starting
up,
it
starts
to
publish
two
data,
token
pools
to
stake
on,
and
you
see
that
we
have
some
energy
web
publishers
staking
and
unstaking.
Also
and
later
on.
We
can
see
that
energy
web
stakers
are
also
going
to
stake
on
it.
So
we
have
an
energy
web,
staker
david
who
is
going
to
stake
on
that
and
they
yeah
an
energy
webster
in
brazil
and
once
in
a
while
every
three
hours,
new
data,
token
pools
are
going
to
be
published.
D
So
basically,
this
is
the
setup
for
the
reinforcement
learning
agent
to
have
some
sort
of
a
couple
of
data.
Token
pools
to
stake
on
according
to
some
sort
of
strategic
behavior,
of
course,
and
I
will
let
it
run
for
a
while-
you
can
play
with
it
as
I
will.
I
will
commit
this
this
also
to
to
the
repo,
so
he
can
fork
it
or
clone
it,
and
you
can
play
around
with
it
with
it
yourself.
D
So
what
you
need
to
remind
is:
okay,
we
have
some
some
sort
of
pool
staked
on.
We
have
the
energy
red
pool
zero
until
three
four,
maybe,
and
we
have
some
oceans
staked
on
and
you
see
we
had
some
strategic
behavior
of
the
staker
agent,
but
also
the
publisher
agent.
I
forgot
to
mention
that
is
showing
some
sort
of
dynamics
within
the
ocean
being
staked.
So
this
number
is
going
to
to
change
okay,
so
not
to
keep
you
waiting
too
long.
D
We need what I call the Energy Web optimizer agent. He is going to be the reinforcement learning agent that is staking, or is showing some sort of strategic staking behavior. I tweaked around a bit with the action spaces, and here you can already see that it's getting quite complicated. Sean, you mentioned the action space is a tuple; I played around with that, but I actually couldn't get it working, so I resorted to a multi-discrete action space, which is also fine.
D
So
basically,
these
two
things
the
action
space
in
the
observation
space
are
the
thing
things
you
need
to
take
care
of
in
the
first
place.
So
what
do
I
mean
by
a
multi-discrete
space?
We
have
multi-discrete
means
you
have
an
array
of
discrete
spaces,
so
we
have,
for
instance,
three
distinct
action
types,
four,
distinct
energy
web
pools
and
four
distinct,
maybe
staking
person
percentages.
D
We
could
model
this
differently,
but
for
for
the
sake
of
simplicity,
I
kept
it
for
four
discrete
values
and,
as
sean
mentioned,
the
action
types
are
just
plain:
integers
zero,
for
I
do
nothing
one.
I
stake
two
I
unstake
and
where
do
I
stake
or
unstake?
These
are
the
is
the
second
that's
the
second
action
space
and
parameter
one
of
the
five
tools.
Okay,
I
I
presume
we
have
five
tools
to
stake
on
or
to
unstage
from
the
observation
space.
On
the
other
end,
that's
what
I
showed
why
I
showed
this
simulation
is.
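A sketch of that action space in Gym terms; the sizes follow what's described here (3 action types, 5 pools, 4 staking-percentage buckets), but treat them as illustrative:

```python
from gym import spaces

# [action_type, pool_index, stake_percentage_bucket]
# action_type: 0 = do nothing, 1 = stake, 2 = unstake
# pool_index:  which of the 5 Energy Web pools to act on
# percentage:  4 buckets, e.g. 25%, 50%, 75%, 100%
action_space = spaces.MultiDiscrete([3, 5, 4])

sample = action_space.sample()
print(sample)  # e.g. array([1, 3, 2]) -> stake on pool 3 at the 75% bucket
```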
D
We
need
to
take
care
of
how
how
on
what
kind
of
pool
will
the
optimizer
agent
stay,
calm
or
unstake
from,
and
that
is
dependent
on
how
good
or
how
bad
the
pool
is
behaving
or
the
pool
is
being
staked
upon.
So
it
needs
basically
the
amounts
of
ocean
being
staked
on
the
pool
to
get
some
sense
of.
D
I
really
spent
a
lot
of
time
setting
this
up.
As
I
mentioned,
there
are
not
a
lot
of
examples
in
the
stable
baselines
of
let's
say,
custom
environments,
because
you
need
for
this.
This
optimizer
agent.
You
need
some
sort
of
customized
environment
and
you
need
that.
You
need
to
set
that
up.
Quite
by
yourself.
D
So this is the setup. We have an episode length; Sean talked about how you train in episodes. Basically, this kind of stuff is quite standard. I also set a balance, because in order to stake you need to have some OCEAN, of course, or another cryptocurrency. And you need to have a reset function; you can look this up in the Stable Baselines documentation, in the custom environments section. And what I did is:
D
I
also
played
around
with
some
sort
of
a
chaos
monkey.
I
have
some
sort
of
a
randomized
action
that
is,
let's
say,
is
influencing
influencing
the
state,
but
first
of
all,
let's
see
how
we
define
the
state
after
a
step
function,
and
basically
I
say:
well,
I
take
the
action
space.
The
first
parameter
of
the
action
space
is,
of
course,
do
I
do
something
or
not:
do
nothing
stake
or
unstack?
That's
the
first
parameter
the
zero,
the
second
parameter.
The
one
is,
of
course,
which
pool
I
need
to
stake
on.
D
So,
if
you
imagine
that
the
state,
the
observation
space
is
actually
the
state-
and
these
are,
this
is
a
box
space
and
we
say
it
has
a
shape
of
five.
It
means
that
we
have
five
values
in
it
and
they're
continuously
growing
from
zero
to
a
hundred,
but
I
noticed
they're,
not
that's,
not
a
real
threshold
in
this
stable
base
lines
and
what
I
also
did
is
put
a
gradient
in
it.
What
do
I
mean
by
that?
I
need
to
have
some
sort
of
idea
of
okay.
D
How
should
that
strategic,
behavior
look
like
and
how
I
modeled
it
is
to
put
the
to
have
some
sort
of
a
gradient
of
the
observation
state
of
all
the
stake
tools.
So
basically
the
state
or
the
observation
space
is
being
modeled
by
five
pools.
D
Five
energy
wet
pools,
and
we
fill
this
up
with
an
amount
of
10
ocean
to
start
and,
of
course,
if
you
can,
if
you
are
going
to
stake
on
it
or
unstake,
these
values
are
going
to
change
and
the
the
manner
or
the
the
the
let's
say
the
rate
of
change
I
put
in
the
gradient.
So
basically,
if
this
is
this
state
is
growing
from
10
to
11.
Let's
say
we
have
10
ocean
and
in
the
next
step
it's
11
ocean.
D
So
if
the
gradient
is
big,
so
if
there's
a
big
change
in
it
and
that
would
that
could
be,
you
could
see
that
as
a
signal
of
hey,
this
pool
is
behaving
correctly
because
the
staking
is
really
increasing
fast.
D
So
the
reward
is
a
function
of
this
gradient
and
also,
if
you
unstack
from
it,
of
course,
if
the
gradient
is
negative,
so
people
are
taking
their
stake
away.
They're
unstaking,
it's
a
signal
of
hope.
This
pool
is
behaving
badly.
I
need
to
get
out
of
it.
So
basically,
it's
the
other
way
around.
The
reward
is
a
negative
if
you
unstake
on
it,
but
the
gradient
is
positive.
So
then
you
have
a
negative
reward,
but
of
course,
if
the
gradient
is
negative,
you
have
a
positive
reward.
So
you
do
the
right
thing.
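Putting the pieces together, here is a condensed sketch of such an environment; it's a hypothetical reconstruction from the description, not Mark's exact code, and the background pool dynamics are a stand-in:

```python
import gym
import numpy as np
from gym import spaces

class EnergyWebStakingEnv(gym.Env):
    """Five pools; reward follows the gradient (change) in OCEAN staked."""

    def __init__(self, episode_length=20):
        self.action_space = spaces.MultiDiscrete([3, 5, 4])  # type, pool, pct
        self.observation_space = spaces.Box(0.0, 100.0, shape=(5,), dtype=np.float32)
        self.episode_length = episode_length

    def reset(self):
        self.pools = np.full(5, 10.0, dtype=np.float32)  # 10 OCEAN per pool
        self.prev_pools = self.pools.copy()
        self.t = 0
        return self.pools.copy()

    def step(self, action):
        action_type, pool, _pct = action
        self.prev_pools = self.pools.copy()
        # Background dynamics: stand-in for the other stakers and publishers.
        self.pools += np.random.uniform(-1.0, 1.0, size=5).astype(np.float32)
        self.pools = np.clip(self.pools, 0.0, 100.0)

        gradient = self.pools[pool] - self.prev_pools[pool]
        if action_type == 1:      # stake: rewarded when the pool is growing
            reward = float(gradient)
        elif action_type == 2:    # unstake: rewarded when the pool is shrinking
            reward = float(-gradient)
        else:                     # do nothing
            reward = 0.0

        self.t += 1
        done = self.t >= self.episode_length
        return self.pools.copy(), reward, done, {}
```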
A
Yeah, I'm definitely following the intuition, and it seems to make a lot of sense to me, and I'm sure, I mean, this is really a space for creativity in modeling the reward function. But for me, yeah, I would just run this. It's always good to have a baseline.
A
Yeah,
well,
you
notice
this
about
reinforcement,
learning,
there's
a
lot
of
sort
of
rationalizing.
That
happens
because
there
is
in
the
very
middle
of
what's
going
on,
there's
a
black
box
right
at
the
end
of
the
day,
we
have
an
rl
algorithm
and
it
would
takes
a
lot
of
time
and
energy
to
sit
down
and
read
the
paper
behind
it
and
understand
all
the
math
and
and
really
think
about
what's
happening.
A
So
we
have
this
sort
of
black
box
in
the
middle
and
on
one
end
we
put
our
intuition
behind
the
reward
function
and
then,
on
the
other
end,
we
run
the
experiment
and
get
the
results
and
then
there's
this
sort
of
rationalizing
that
happens
between
what
we
expected
given
the
reward
function
and
then
the
actual
results
that
are
output
and
often
there's
surprises
so.
D
Yeah,
so
I'm
so
I
noticed
because
I
played
around
this
with
this
all
day.
I
I
should
mention
that,
because
I
couldn't
get
it
right
and
maybe
that's
maybe
that's
the
whole
idea
of
playing
around
with
it.
You
know
just
I
will
just
start
it
up.
You
can
see
it
here
in
the
control.
What's.
D
Actually, this is the outcome of the training step: a mean reward that's negative, so it doesn't inspire too much confidence. But let me say something about what kind of algorithm I used for training this: I used the PPO2 model. Don't ask me exactly what that means, but I took the PPO algorithm from your Notion page, Sean.
D
So
I
guessed
this
was
a
some
sort
of
a
right
algorithm
to
take,
but
there
are
several,
and
so
you
can
you
can
do
as
you
like.
I
had
some
difficulties
in
saving
and
loading
the
model,
the
complaints
about
vectorizing
environments.
This
is
the
stuff.
I
really
need
to
dig
into
further,
but
then,
when
I
I
did,
I
could
use
the
model
without
saving
it.
Just
you
know
to
go
through
it
and,
let's
see,
do
some
10
steps
of
predictions.
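For reference, the usual Stable Baselines (v2) pattern looks roughly like this; the vectorization complaint typically comes from skipping the DummyVecEnv wrapper. The environment class is the sketch from above, and the save-file name is made up:

```python
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Stable Baselines expects a vectorized environment, even for a single env.
env = DummyVecEnv([lambda: EnergyWebStakingEnv()])

model = PPO2("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=25000)

model.save("ppo2_staker")                   # persist the trained policy
model = PPO2.load("ppo2_staker", env=env)   # reload it later

# A few steps of predictions with the trained model:
obs = env.reset()
for _ in range(10):
    action, _states = model.predict(obs)
    obs, reward, done, info = env.step(action)
    print(action, reward)
```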
D
So,
as
you
can
look
here,
this
is
something
you
might
have
noticed
out
of
the
presentation
sean
gave.
This
is
the
actual
prediction
of
the
action.
What
to
take
for
what
observation.
So
the
observations
are
here:
that's
the
state
and
it
takes
some
actions
according
to
the
gradient,
and
if
the
gradient
is
positive
for
some
pool,
he
needs
to
stay
calm,
then
he
will
stay
calm.
So
if
you
see
here
the
accident
space
in
step,
one
means
zero
is
no
action
taken
on
pool
one
with
a
fifty
percent
staking
amount.
D
So basically this is how I put the simulation together, and what you can see is that I try to accumulate the reward and see what's happening with it, and it's really not going anywhere over 10 steps. But maybe we can increase this. And sorry, I would have to train it again, because I didn't flesh that out.
A
Yeah
I've
seen
this
I've
seen
this
before
it's
somewhat
familiar,
there's
like
a
wrapping
and
unwrapping
I'm
just
looking
at
an
old
example
that
I
had
ran
on
a
project,
I'm
just
trying
to
get
it
running
kind
of
in
the
background
here.
A
So
I
don't
mean
to
take
away
too
much
attention,
but
I
do
remember,
there's
some
key
concepts
in
here
that
I
had
dealt
with
so
see.
If
I
can
get
this
running.
Oh
that
might
work.
D
Yeah. I also said: okay, if you are out of balance, if your amount of OCEAN is getting too low, you're also penalized, by subtracting one from the reward. So basically, I did most of the modeling within the reward function, but you can also do something, of course, about the observation space. I have a really simple observation space now: I'm looking at five Energy Web pools, but in reality there could be hundreds of thousands of
D
These
things
and
the
reinforcement
learning
agent
just
need
to
pick
one
in
order
to
maximize
its
reward,
so
you
can
model.
Basically,
these
these
are
the.
These
are
the
tuning
parameters,
I
would
say,
of
your
reinforcement,
learning
agent,
so
the
action
space
and
the
the
observation
space
and
the
reward
function.
Of
course,
I
think
you
need
to
spend
a
lot
of
time
tweaking
around
the
reward
function,
but
you
don't,
as
I
recall,
sean
you,
you
shouldn't
be
too
specific
about
the
reward
function.
D
I
recall
because
otherwise
you
are
going
to
give
the
agents
too
many
clues
and
you
you
need
to
to
have
him
figure
it
out
by
himself
right.
A
Yeah
yeah,
exactly
because
essentially
a
reinforcement,
learning
algorithm
is
a
search
algorithm
through
a
very
large
search
space.
It's
like
a.
We
have
we're
generating
a
function
that
maps
actions
to
no,
it
maps,
observations
to
actions,
and
these
we
want
to.
A
We
want
to
do
a
comprehensive
search
of
that
search,
space
of
that
possible
function
and
if
we
pre-program
in
the
the
reward
function
to
be
too
specific,
then
we're
sort
of
constraining
the
search
of
all
possible
behaviors
of
all
possible
policies
to
something
that
we're
kind
of
imposing
our
own
bias
on,
and
you
know
maybe
sometimes
we
do
want
to
do
that-
to
search
like
a
local
area
or
a
specific
area
of
policies,
but
in
general,
reinforcement
learning
is
going
to
shine
when
it
has
this
sort
of
organic
search
through
diverse
policy,
a
diverse
policy
space.
A
So
it's
it
is
a
yeah
it's.
This
is
one
of
the
key
aspects
that
we
have
at
our
disposal
is
like
how
we
engineer
the
reward
function.
So
there
is
this
balance
of
like
not
being
too
specific,
but
and-
and
so
one
way
you
want
to
might
you
might
want
to
do?
This?
Is
you
could
code
up?
You
could
kind
of
save
this.
You
could
have
a
whole,
maybe
even
a
whole
file,
or
you
could
just
make
various
reward
functions.
A
You
could
have
a
whole
collection
of
them
and
then
in
the
actual
reward
function
here
you
could
just
sort
of
call
one
of
your
collection
of
reward
functions,
so
we
could
only
do
like
this
a
b
kind
of
testing,
but
what
I'm
thinking
about
here
off
the
bat
is
the
training,
because
how
much
training
are
we
running
on
this?
How
like
how
many
episodes,
for
example,.
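A small sketch of that idea: keep a named collection of candidate reward functions and select one per experiment. All of the names and signatures here are hypothetical:

```python
def reward_gradient(gradient, action_type, balance):
    """Follow the gradient: stake into growth, unstake out of decline."""
    if action_type == 1:
        return gradient
    if action_type == 2:
        return -gradient
    return 0.0

def reward_balance(gradient, action_type, balance):
    """Alternative variant: just reward the agent's OCEAN balance."""
    return balance

REWARD_FUNCTIONS = {"gradient": reward_gradient, "balance": reward_balance}

# Inside the environment's step(), pick the variant for this experiment:
reward_fn = REWARD_FUNCTIONS["gradient"]
print(reward_fn(1.5, 1, 10.0))  # -> 1.5
```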
A
Yeah, so that's an interesting point to consider. I think the TokenSPICE modeling comes with this idea that one time step is one day. Was that right, or was it a month?
A
I
think
it's
one
day,
so
we
can
think
we're
running
this
simulation
for
a
hundred
days,
so
we're
theorizing
that
we
have
these
agents
and
they
can
stake
and
unstake
on
these
data
pools
and
we're
giving
them
100
days
to
do
so
and
getting
some
summation
of
rewards
over
those
100
days
and
then
we're
running
them
on.
Was
it
25,
000
episodes.
D
So basically I'm taking the episode length here, in the implementation of the class. Let's see: we have a standard episode length of 20, and the episode is done once we've reached the episode length, of course. So do you just train for 100 time steps each episode? I think 20. No, okay, yeah: if I don't pass this parameter, it gets overridden.
A
So that'll be 20. So what we want to do is run it with 20, and then 100, and then 500, and then 1000, and sort of plot the performance, or maybe the final reward, against how many time steps we take, and then you can start to get an idea of: okay, are more time steps increasing the performance?
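A sketch of that sweep, reusing the training pattern from above; the step counts match the ones suggested here, and evaluation is simplified to a single 20-step rollout:

```python
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

def final_reward_after_training(total_timesteps):
    env = DummyVecEnv([lambda: EnergyWebStakingEnv()])
    model = PPO2("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)
    # Evaluate: one episode with the trained policy.
    obs, total = env.reset(), 0.0
    for _ in range(20):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        total += float(reward)
    return total

for steps in [20, 100, 500, 1000]:
    print(steps, final_reward_after_training(steps))
# Plot total reward vs. steps to see whether more training helps.
```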
C
I was wondering, since you're starting from scratch here, without that many heuristics on the pools, the pool sizes, the number of actors, the number of agents, and so on: is there a best practice in the process of optimization for tracking all the various parameters you're changing and then comparing the results? I think there's a danger of going all over the place and no longer understanding the results, or not being that sharp on interpretations.
A
What's the observation space of the agents? We could run experiment after experiment; we could tweak things; we could go down this rabbit hole of running so many experiments, and then, at the end of the day, how do we track all of our progress? How do we remember that it was important for the agents to track the number of other agents, or the number of pools, or the total staked, all these different ways?
C
Or do it vice versa, I don't know: create a roadmap and, as a best practice, follow your roadmap, at least for 10 variations, until you change your roadmap. Whatever. I don't know if there are any kinds of frameworks or best practices established.
A
Yeah,
I
don't
know
of
any
particular
not
like
a
framework
that
I
can
point
to,
but
I
think
it's,
the
general
scientific
method
and
I
think
what
we've
been
seeing
in
the
the
git
coin.
With
the
get
coin
methodology
with
danilo,
I
really
enjoy
how
he
starts
all
his
sessions
with
the
hackmd
file,
with
everything
laid
out.
A
You
know
the
experiment
is
designed
and
then
coded
and
then
ran
and
then
there's
this
review
process,
so
yeah
sort
of
taking
a
step
back
and
outlining
the
expectations
of
the
observation,
space
and
the
reward
function
and
the
agents
and
then
and
then
running
those
experiments
and
comparing
the
results.
So
I
think
we
do
see
this
when
you
look
at
the
literature
in
reinforcement
learning
there
is
this
process
there's
also
sort
of
a
more
programmatic
way,
which
is
like
a
grid
search.
A
You
know,
so
there
are
algorithms
like
hyper
or
or
meta
algorithms,
that
you
can
throw
on
top
of
your
reinforcement,
learning
algorithm
that
will
strategically
search
the
parameter
space.
You
can
define
how
this
is
a
lot
like
what
cad
cad
does
with
a
b
testing
in
the
system
parameters,
but
instead
of
just
a
b
testing,
there's
algorithms
that
will
actually
search
through
the
hyper
parameter
space
and
there's
often
the
class
of
algorithms
that
get
applied
to
this
is
evolutionary
inspired
by
evolution.
A
So
you
can
set
up
multiple
reinforcement,
learning
experiments
with
differing
hyper
parameters
and
then
combine
the
results
in
a
way
that
is
similar
to
that
of
like
evolutionary
traits.
So
you
can
randomly
you
can
randomly
scramble.
A
So
you
could
take
your
observation
space
or
you
could
have
your
set
of
parameters
that
you
can
can
be
included
in
your
observation
space
and
you
can
do
a
random
selection
over
them
or
maybe
you
could
do
100
random
selections
and
then
you
could
run
all
of
these
reinforcement,
learning,
simulations
and
sort
of
select
for
the
top
10
and
then
take
maybe
an
average
between
their
hyper
parameters.
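A toy sketch of that evolutionary-style loop; the candidate field names and the scoring stub are made up for illustration:

```python
import random

CANDIDATE_OBS_FIELDS = ["ocean_per_pool", "num_agents", "num_pools", "total_staked"]

def random_config(rng):
    # Randomly select which fields go into the observation space,
    # plus a hypothetical numeric hyperparameter.
    fields = [f for f in CANDIDATE_OBS_FIELDS if rng.random() < 0.5]
    fields = fields or CANDIDATE_OBS_FIELDS[:1]  # keep at least one field
    return {"obs_fields": fields, "learning_rate": 10 ** rng.uniform(-5, -3)}

def evaluate(config):
    # Stand-in for "train an RL agent with this config, return final reward".
    return random.random()

rng = random.Random(0)
population = [random_config(rng) for _ in range(100)]
survivors = sorted(population, key=evaluate, reverse=True)[:10]  # keep top 10

# Combine survivors, e.g. average their numeric hyperparameters:
avg_lr = sum(c["learning_rate"] for c in survivors) / len(survivors)
print("next-generation learning rate:", avg_lr)
```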
A
So
there's
this
sort
of
automated
approach
and
then
there's
the
methodology
approach
and
I
think
both
are
very
important
because
of
the
complexity,
the
like
the
nature
of
the
complexity
of
these
problems.
It's
so
important
to
document
along
the
way.
I
I
just
this
is
speaking
from
experience,
how
many
thousands
of
hours,
I've
you
know,
had
the
most
beautiful
experiment
results
and
then,
after
all
night
of
hacking,
I
completely
forget
how
I
set
it
up
or
or
what
I
was
what
you
know
what
I
had
to
begin
with
so
yeah.
A
Sure, yeah, I'll drop a link. Good stuff, Mark. It's awesome that you got this running. You've got this reinforcement learning,
A
Cad
cad
inside
of
token
spice.
This
is
quite
the
the
feat
for
for
a
hacker,
very
inspiring
and
well.
A
Yeah, I'm wondering if we should continue these lab sessions, the series. We can chat offline; there's nothing rigidly blocked. I have a couple of ideas for slots to be filled with the labs, like some more work with 1Hive, but it's pretty flexible.
A
So
I
think
maybe
now
that
we
have
this
foundation
and
that
you
have
here,
maybe
we
could
do
a
whole
lab
session
where
we
actually
just
program
on
this
and
try
different
reward
functions
and
maybe
do
some
of
this
experiment
methodology
that
we've
been
learning
from
the
git
coin
track.
D
Yeah, it would be nice to be an observer.
A
Yeah, yep, that's important too. Okay, everyone, that's the hour. So, Mark, thank you so much for this amazing two-part series. I think you've blown everyone's minds: you've put TokenSPICE inside of cadCAD inside of Stable Baselines. It's quite amazing, and I think it would be awesome to continue this work in the labs, so we'll figure out how we can set that up. Thanks, everyone, for coming, and have a wonderful weekend.