►
From YouTube: Multi canary release and load test
Description
No description was provided for this meeting.
If this is YOUR meeting, an easy way to fix this is to add a description to your video, wherever mtngs.io found it (probably YouTube).
A
B
Let's
start
with
the
simplest
case
canary
is
a
technique
used
to
reduce
the
risk
associated
with
releasing
new
versions
of
software.
The
idea
is
to
first
release
a
new
version
of
the
software
to
a
small
number
of
users
and
then
gradually
iterate
through
the
upgrade,
for
example,
in
this
diagram
we
test
10
percent
of
the
traffic
first
then
gradually
move
more
traffic
to
the
new
version
and
finally,
the
old
version
is
cleared
and
taken
offline.
Throughout
the
testing
process,
we
can
label
the
traffic
with
various
business
tags
such
as
android
devices,
location
of
beijing,
etc.
B
Also
note
that
user
tags
should
not
use
it
addresses
which
are
inaccurate
and
inconsistent.
Then
we
can
specify
canary
traffic
rules
to
schedule
route
a
certain
part
of
the
user's
traffic
to
a
certain
canary,
for
example,
the
android
user
from
beijing
is
scheduled
to
the
2.0
canary
version
of
service,
a.
B
The
scenario
of
a
single
service
canary
is
still
limited.
In
reality,
it
is
more
common
to
have
full
stack
canary
testing.
For
example,
a
user
client
cannot
be
forwarded
directly
through
the
router
to
the
canary
version
of
the
service.
This
is
because
the
service
is
very
far
back
in
the
whole
chain,
separated
by
other
services,
as
shown
in
the
figure
we
have
published
canary
for
delivery
with
user
and
order
services
spaced
in
between.
In
this
case,
we
need
to
do
two
things
to
ensure
that
the
traffic
is
scheduled
correctly.
B
The
first
is
to
pass
through
the
user
tags
and
the
second
is
to
route
traffic
to
the
correct
version
of
the
next
service
at
any
end
point
of
the
chain.
If
you
consider
the
implementation
level
a
little
bit
here,
you
will
notice
that
there
are
two
categories
of
approaches
to
do
full
stack
canary
release,
either
by
changing
the
code
or
by
a
non-intrusive
platform
level
solution
and
changing
the
code
can
be
very
cumbersome
and
verbose
and
prone
to
bugs.
B
For
example,
there
are
now
two
services
order:
v,
1.0
and
email
v
2.0
the
service
order,
calls
the
service
email
and
the
service
email
uses,
the
third-party
email
provider
tencent
provider,
and
we
decided
to
add
some
information
to
the
order
entity
as
a
test
version
for
android
users.
Only
since
the
changes
to
the
order
entity
affect
both
services
which
need
to
be
changed,
we
added
order
v,
1.1
and
email
v
2.1
to
apply
this
change.
Then
another
team
decided
to
replace
the
tencent
provider
with
google
provider
in
the
email
service.
B
So
we
added
email,
v,
2.0
one
to
test
users
from
beijing
only
at
this
time.
There
is
a
dilemma
that
android
users
and
beijing
users
are
overlapping,
which
is
traffic
from
android
phones
in
beijing.
Considering
the
complexity
of
the
traffic
and
the
inconsistency
felt
by
the
users,
it
will
led
to
a
heavy
operation
burden.
B
Let's
take
one
step
back
and
analyze
different
cases
of
canary
release.
Let's
first
concentrate
on
the
figure
on
the
left,
for
example,
and
b
rely
on
z,
while
it
tests
android
traffic
and
this
testing
iphone
traffic.
They
test
two
different
user
groups
and
if
they
both
rely
on
stow
test
stakes
on
two
different
canary
traffic
and
will
become
a
source
of
confusion,
then
there
are
two
better
approaches
seen
in
the
figures
in
the
middle
and
on
the
right.
B
B
Even
if
the
canary
conflict
problem
is
solved,
there
is
still
a
problem
that
the
traffic
rules
may
overlap.
The
previous
example
is
the
user
traffic
of
android
and
iphone,
but
if
one
canary
tests
android
and
the
other
tests
beijing,
there
will
be
a
common
subset
of
traffic
rules
for
both
canary
releases.
B
B
B
So,
let's
use
an
example
to
demonstrate
all
the
problems
mentioned
before.
For
example,
there
is
a
back-end
service
stack
of
a
food
delivery
application.
The
service
consists
of
three
micro
services
order,
service,
restaurant
service
and
delivery
service
order.
Service
has
no
canary
restaurant
service,
has
two
canaries
first
canary
is
for
android
traffic
from
beijing,
and
second
canary
is
for
all
android
traffic
delivery
service
also
has
two
canaries
first
canary
is
for
traffic
from
beijing,
and
second
is
for
all
android
traffic.
B
B
Then,
on
the
other
hand,
the
traffic
with
android
user
tag
matches
the
routing
rules
of
android
canary
for
both
restaurant
and
delivery.
The
traffic
follows
the
blue
path,
but
what
happens
for
traffic
that
contains
both
android
and
beijing
tags?
It
matches
all
three
canaries
and
there
is
no
unambiguous
way
route
the
traffic.
B
B
A
simple
and
easy
way
to
solve
this
problem
is
to
specify
the
priority
of
canary
each
traffic.
Rule
has
a
number
indicating
the
priority
from
small
to
large,
as
you
can
see,
from
the
figure
traffic
beijing
and
android
matches
three
canaries,
but
because
restaurant
beijing
and
android
has
the
highest
priority
priority.
One.
The
red
canary
is
therefore
selected,
even
though
the
priority
solves
the
problem
of
multiple
matching.
There
is
still
a
problem
of
misused
configuration
traffic
shadow
problem.
In
this
example,
the
red
rule
is
shadowed
by
the
blue
rule,
as
blue
has
higher
priority.
B
Finally,
let's
explain
the
technical
details
of
isemesh
and
give
an
overview
of
how
we
implement
multiple
canary
releases.
First
of
all,
all
our
services
are
running
in
kubernetes
pods.
The
three
services
in
here
correspond
to
three
services
in
isemesh
and
even
different
versions
under
the
same
service
are
part
of
one
service,
so
a
mesh
service
will
have
multiple
versions
running
at
the
same
time.
In
order
to
route
the
traffic
to
correct
canaries,
two
things
need
to
be
accomplished.
B
First
thing
to
ensure
is
to
pass
through
user
tags
without
losing
any
user
information
throughout
the
service
chain.
This
involves
the
sidecar
and
the
business
application.
The
sidecar
naturally
knows
all
the
canary
traffic
rules
and
user
tags
such
as
some
specific
http
headers,
and
it
will
forward
them
with
the
traffic.
Also,
the
business
application
itself
needs
to
pass
through
the
user
tags,
which
can
be
done
by
our
officially
supported
javagent
in
cooperation
with
sidecar
and
does
not
require
user
awareness.
B
Sidecar
will
notify
the
javagent
to
pass
through
all
the
information
as
for
other
languages
such
as
golang,
since
there
is
no
ticket
technology,
only
a
simple
sdk
is
enough
to
forward
the
user
tags.
So
isamesh
also
supports
multiple
languages
in
this
advanced
feature.
As
long
as
the
user
tags
are
available.
Second
requirement
for
isa.
Mesh
canary
releases
is
traffic,
routing
all
components,
including
ingress
controller
of
isemesh,
and
the
sidecar
in
each
service
pod
have
the
ability
to
route
canary
traffic
to
the
next
service,
corresponding
canary
version.
B
You
can
see
that
all
the
service
components
in
this
figure,
whether
receiving
requests
or
sending
requests,
will
pass
through
the
sidecar
and
when
sending
requests
outbound
the
sidecar
observes
the
traffic
characteristics
and
decides
whether
the
traffic
needs
to
be
dispatched
to
one
of
canary
versions
of
the
next
service,
and
this
is
all
done
by
sidecar.
Without
the
involvement
of
agent
and
sdk.
C
C
C
C
C
B
B
B
A
Now
is
the
full
stack
stress
test
part.
The
topic
of
this
part
is
how
to
do
stress
testing
in
a
production
environment.
Today's
production
environment
has
become
very,
very
complex.
Just
like
the
picture
on
the
right.
There
are
many
components
in
it,
ranging
from
dozens
or
hundreds
to
thousands,
and
these
components
are
developed
by
different
development
teams
and
in
different
languages,
which
makes
the
communication
between
them
very
complicated.
A
No
one
can
tell
the
relationship
between
all
of
them.
The
complexity
from
a
technical
point
of
view,
makes
debugging
difficult.
In
addition,
the
business
has
also
changed
a
lot.
For
example,
during
the
black
friday
promotion,
the
traffic
pressure
on
the
online
shopping
systems
is
dozens
or
even
hundreds
of
times
higher
than
usual.
A
Now,
let's
look
at
the
problem
of
traditional
stress
test
methods.
The
first
is
to
build
a
test
environment,
identical
to
the
production
environment,
for
stress
testing
in
the
era
of
standalone
applications.
This
is
a
very
good
solution,
but
in
the
age
of
the
internet
there
are
at
least
two
problems.
The
first
is
money.
We
can
count
how
many
servers
there
are
in
our
production
environment
and
then
how
much
we
need
to
spend
to
buy
these
servers
and
that's
just
the
cost
for
servers.
A
The
cost
will
be
higher
when
counting
other
hardware.
Most
companies
should
not
be
able
to
afford
such
a
test
environment.
Even
if
duplicating
the
cloud
resources
for
the
test
environment
is
not
an
issue.
Is
it
enough
to
get
reliable
results?
I
think
the
answer
is
still
no,
because
it
is
difficult
for
our
test
environment
to
be
exactly
the
same
as
the
production
environment.
A
A
Therefore,
we
cannot
simply
use
simulated
data
for
testing.
The
second
point
is
the
proportion
of
different
users-
users,
like
me,
may
account
for
90
and
celebrities
may
only
be
one
in
hundreds
of
thousands.
Only
by
simulating
the
proportion
of
users
with
different
degrees
of
followers
can
we
get
a
reliable
test
result.
The
easiest
way
is
to
take
the
production
data
to
the
test
system
for
testing,
but
it
also
brings
the
problem
of
data
security.
The
production
data
generally
contains
a
lot
of
sensitive
information.
A
Because
of
these
issues,
people
turn
their
eyes
to
the
production
environment
and
try
to
use
the
low
traffic
period
of
the
production
environment
for
testing,
but
it's
also
a
huge
challenge,
because
it
is
an
intrusive
solution
that
involves
modifying
or
even
redefining
business
logic.
Let's
take
an
example,
assuming
it
is
an
online
shopping
system,
including
a
user
module
and
order
module
to
test
it.
We
need
to
modify
these
modules
first.
We
need
to
add
test
logic,
and
then
we
need
to
add
the
logic
to
detect
whether
we
are
in
a
test
or
not.
A
A
This
should
do
the
trick
when
the
request
comes
to
the
order
module.
We
may
still
want
to
use
the
user
id
to
determine
whether
the
test
logic
should
be
taken,
but
the
actual
situation
may
be
after
a
series
of
complex
processing
the
user
id
has
been
discarded,
so
the
order
module
cannot
see
it
at
all.
Then
how
to
write
the
judgment
logic.
A
The
second
question
is
how
our
test
logic
differs
from
production
logic,
it's
easier
for
us
to
think
about
accessing
different
data
sets
or
simulating
a
third
party
service,
such
as
payments,
because
we
don't
want
to
actually
spend
money
on
testing,
but
what
is
really
complicated
is
preparing
data
for
subsequent
components.
This
relates
to
the
first
problem.
That
is
because
the
order
module
cannot
see
the
user
id,
the
user
module
needs
to
mark
the
request
sent
to
the
order
module
so
that
the
order
module
knows
this
is
a
test
request.
A
However,
in
a
complex
system,
it
is
not
easy
for
the
user
module
to
know
all
the
modules
that
the
subsequent
process
will
go
through.
So
we
have
to
spend
a
lot
of
effort
to
ensure
the
test.
State
is
correctly
transmitted
between
modules
to
avoid
disturbing
the
production
logic.
Please
notice.
This
is
just
the
work
required
for
one
function
point,
and
there
are
thousands
of
function,
points
in
a
normal
system.
A
We
believe
that
the
key
lies
in
isolation,
which
is
to
isolate
the
production
system
and
the
test
system
from
the
four
dimensions
of
business
data,
traffic
and
resources
to
prevent
them
from
affecting
each
other
business.
Isolation
means
that
we
should
not
use
the
form
of
adding
conditional
judgments
to
decide
whether
to
use
production,
logic
or
testing
logic,
but
to
distinguish
them
clearly
from
the
beginning
data
isolation
means
the
same:
copy
of
data
cannot
be
accessed
both
by
the
production
system
and
the
test
system.
A
Traffic
isolation
means
that
normal
requests
and
test
requests
can
only
enter
the
corresponding
system.
The
resources
in
resource
isolation
mainly
refer
to
hardware,
for
example,
the
test
system
and
the
production
system
cannot
be
deployed
on
the
same
server
so
as
not
to
compete
for
hardware
resources
such
as
cpu
and
memory.
This
is
mainly
a
hardware
issue,
but
kubernetes
has
given
a
very
good
solution
at
the
software
level.
A
Let's
take
a
look
at
the
solutions
given
by
ease
mesh
first,
because
ease
mesh
is
implemented
based
on
kubernetes.
It
achieves
resource
isolation
with
the
help
of
kubernetes
for
business
isolation,
ease
mesh
can
replicate
existing
services,
except
for
adding
a
shadow
mark.
The
replicated
copy
is
exactly
the
same
as
the
original
one,
and
these
mesh
can
replace
the
connection
information
of
various
middleware,
including
miskal
kafka,
readies,
etc.
A
According
to
the
configuration
and
thus
change
the
target
of
data
requests,
thereby
realizing
data
isolation
when
creating
a
service
copy
ease.
Mesh
also
automatically
creates
a
canary
rule
to
forward
the
request
with
the
x
dash
mesh
dash
shadow
header
to
the
replicated
service
copy
as
a
test
request
and
forward
other
requests
to
the
original
service
to
achieve
traffic
isolation.
A
The
above
three
isolations
are
implemented
by
the
shadow
service
feature
of
ease
mesh.
It
should
be
noted
that
canary
is
also
a
feature
of
isa.
Mesh
the
canary
in
the
figure,
only
means
that
shadow
service
will
automatically
deploy
a
canary
rule.
In
addition
to
shadow
service,
we
also
need
another
feature
of
ease
mesh
to
make
a
full
stack
stress
test
possible
mock,
because
we
cannot
replicate
some
third-party
services
for
testing
such
as
the
payment
service
mentioned
above
we
need
to
mock.
It
now
take
a
look
at
what
will
be
demonstrated
today.
A
This
is
a
scenario
where
a
user
uses
a
coupon
we
can
find.
There
are
three
services
in
it.
The
first
is
coupon
service,
the
second
is
user
service
and
the
third
is
verification
code
service,
which
will
send
a
verification
code
to
the
user's
mobile
phone
and
coupon
service
user
service
has
their
own
database
middlewares.
A
The
entire
system
is
deployed
in
kubernetes
and
you
should
have
found
that
our
traffic
entry
is
mesh
ingress
and
there
is
a
java
agent
and
a
sidecar
with
each
service
in
the
system,
which
means
that
these
services
are
also
subject
to
the
management
of
ease
mesh.
The
java
agent
is
mainly
to
hijack
various
requests
sent
by
the
application,
including
both
http
requests
and
requests
to
middlewares
sidecart
is
implemented
based
on
easegress.
A
It
is
mainly
for
various
processing
of
traffic
and
also
for
things
like
service
discovery,
monitoring
and
tracing.
It
is
this
management
of
ease
mesh
that
makes
it
possible
for
us
to
hijack
various
requests
sent
by
applications
to
achieve
the
aforementioned
business,
isolation,
data,
isolation
and
traffic
isolation
for
stress
testing
in
this
system.
A
When
a
user
request
comes
in,
it
will
first
go
to
our
mesh
ingress,
then
to
the
coupon
service
and
the
coupon
service
will
send
a
request
to
the
user
service
to
verify
the
user's
identity
and
then,
if
it
passes
to
the
verification
code,
service,
send
a
request
to
send
a
verification
code
to
the
user.
So,
let's
look
at
the
steps
we
need
to
take
for
a
stress
test.
A
The
second
step
is
to
replicate
services
through
the
shadow
service
and
automatically
deploy
a
canary
rule.
As
we
can
see,
the
coupon
service
and
user
service
have
now
been
replicated
and
during
the
process
we
have
also
rewritten
their
connection
to
the
middlewares
through
the
sidecar
and
java
agent,
allowing
them
to
access
the
replicated
middlewares
instead
of
the
production
middlewares.
A
This
rewritten
can
be
done
through
the
configuration
of
the
shadow
service
or
through
the
confine
map
of
kubernetes
for
the
test
traffic.
We
will
add
an
x,
slash,
mesh,
slash
shadow
header
to
it.
Any
request
with
this
header
goes
to
the
replicated
services
according
to
the
canary
rules.
We
just
deployed
following
the
orange
lines
and
the
normal
user
requests
still
go
to
the
production
services.
A
That
is
follow
the
blue
lines.
Now
we
have
the
coupon
service
and
user
service
replicated,
but
haven't
the
verification
code
service,
because
it
will
eventually
call
a
third
party
service
to
send
the
verification
code
to
the
user's
mobile
phone.
Although
the
cost
of
each
verification
code
message
is
not
very
high.
If
we
send
a
lot
of
requests
in
the
test,
it
is
also
a
big
cost.
A
Therefore,
we
hope
not
to
send
the
verification
code.
This
requires
the
mock
feature
we
mentioned
just
now
to
mock
the
verification
code,
instead
of
replicating
it
directly.
Generally
speaking,
we
need
to
mock
services
like
payment,
because
their
implementation
is
complex,
involving
various
verifications
and
encryptions
which
make
them
difficult
to
mock.
Therefore,
we
need
to
make
a
service
in
our
system
to
wrap
these
third-party
services,
because
these
wrapper
services
are
inside
our
system.
We
can
make
the
interface
simpler
by
saving
a
lot
of
security
verification.
A
A
A
We
can
see
that
the
output
on
both
sides
is
exactly
the
same
in
while
I
will
also
show
the
topography
generated
by
our
mega
cloud
system
from
the
graph.
We
can
also
see
that
the
processing
process
of
the
two
requests
is
exactly
the
same,
but
because
mega
cloud
requires
a
little
time
to
sync
data.
Let's
take
a
look
at
the
content
of
these
two
scripts.
First.
A
We
can
see
these
two
scripts
are
exactly
the
same,
except
that
the
right
side
carries
the
x-mesh
dash
shadow
header
when
sending
each
request.
These
two
scripts
execute
the
get
token
at
the
beginning,
because
the
demo
system
requires
a
user
to
log
in
first.
After
getting
the
token,
they
start
sending
the
get
coupon
request.
A
A
A
A
Now
I
will
deploy
the
shadow
services.
Please
note
in
the
slides
we
say
replicating.
The
middleware
is
the
first
step,
but
for
this
demonstration
I
prepared
the
middleware
replicas
in
advance
and
in
order
to
show
the
difference,
I
revised
the
replicated
data,
but
in
practice
we
can
just
replicate
the
production
data
directly
without
any
modification.
A
Now,
let's
create
the
shadow
service
just
run
the
m
control
apply
command.
We
can
see.
It
says
that
both
the
coupon
shadow
service
and
the
user
shadow
service
have
been
created
successfully
now
run
the
cube
control
command.
Again,
we
can
see
that
there
are
two
more
pods
in
the
system,
namely
coupon,
shadow
and
user
shadow,
and
if
we
run
the
m
control,
get
shadow
service
command
again,
we
can
also
see
that
there
are
two
more
shadow
services
in
the
system.
A
A
As
we
can
see,
there
are
two
shadow
services.
The
first
one
is
named
coupon
shadow
service
and
the
second
one
is
user
shadow
service,
with
your
shadow,
copies
of
coupon
service
and
user
service
respectively,
and,
as
mentioned
before,
our
service
supports
rewriting
the
configuration
of
the
middleware
directly.
We
can
also
see
this
from
this
yaml
in
the
spec
of
each
shadow
service.
We
have
rewritten
the
connection
information
for
miscall
and
readies.
In
this
way,
we
replace
the
middleware
access
by
these
two
shadow
services.
A
A
A
Now,
let
me
refresh
the
page.
We
can
see
that
there
are
some
gray
nodes
in
the
system
which
are
the
replicas
of
the
original
service
and
middleware,
including
coupon
service
user
service,
miskal
and
reddies,
and
the
middlewares
being
accessed
by
the
two
replicated
services
are
also
the
replicated
ones.
The
only
problem
now
is
that
these
two
coupon
services,
the
original
coupon
and
the
replica
both
access,
the
same
verification
code
service,
because
we
haven't
mocked
the
verification
code
service.
Yet
let's
do
it
now.
A
The
m
control
apply
command
again.
This
mocks
the
verification
code
service.
Now,
let's
execute
the
command
with
shadow.
Again,
you
can
see
that
the
verification
code
becomes
a
b
c
d
and
when
executing
the
command
without
shadow,
the
verification
code
is
still
123
456..
Let's
take
a
look
at
the
content
of
the
yaml
file.
A
A
A
A
A
A
Back
to
slides
what
advantages
does
our
shadow
service
have
over
traditional
testing
methods?
I
think
there
are
five
points.
First,
zero
code
changes.
Everything
is
done
through
configuration,
no
code,
modification
is
required
and
no
new
bugs
second
low
cost.
In
the
case
of
using
a
cloud
server,
the
hardware
resources
used
for
testing
can
be
applied
before
the
test
and
released
after,
and
we
only
need
to
pay
for
the
actual
usage
period.
A
Third,
clean
environment,
except
for
a
few
services
that
are
mocked.
The
test
system
is
completely
consistent
with
the
production
system,
which
avoids
errors
caused
by
differences
in
business
logic
to
the
greatest
extent.
Fourth,
true
data:
the
data
of
the
test
system
and
the
production
system
are
completely
consistent,
which
ensures
the
reliability
of
the
test
results.
Fifth
secure.