Description
02:00 JupyterFlow Demo with HongKun Yoo
14:30 Argo Dataflow preview with Alex Collins
A
Okay, is everybody ready? Am I ready? I hope so. Good morning, and welcome to the June Argo Workflows and Events community meeting. I hope everybody's very excited at the moment that they're all going to be able to get out of their houses, get back on with their lives and enjoy a bit of the summer sunshine.
A
I know that I am. We're going to have two demos-slash-previews today. One will be from, and correct me if I pronounce this incorrectly, HongKun, who is going to demonstrate a little piece of software he's written called JupyterFlow, which is one of the really interesting bits of software we've discovered inside the Argo Workflows ecosystem. I always really enjoy things that build on top of Argo Workflows; I love to see what people are doing with it.
A
So we can understand the kind of different directions that things like data processing and machine learning are going in. And then I'm going to give a demo of a proof of concept, or preview, alpha piece of software in argoproj-labs called Argo Dataflow. It's oriented around data processing, and I'm hoping it will be quite interesting to anybody using Argo Workflows or Argo Events, although you're actually not going to see very much in the way of Argo Workflows or Argo Events at the moment.
A
Go for it. I'll stop sharing my screen so you can take over.
B
Okay, first of all, thank you for having me. I want to share with you JupyterFlow, a better way to scale your ML job. My name is HongKun Yoo, and my target audience is as follows. First of all, please understand my English, because English is not my first language.
B
Okay, the target audience is as follows: data scientists who are not fully familiar with Kubernetes but want to use the power of Kubernetes, MLOps engineers who want to provide an efficient ML environment, and anyone looking for a better ML tool. So I want to introduce an ML tool for data scientists.
B
My name is HongKun Yoo and I run a blog named Coffee Whale, mainly about machine learning and Kubernetes. I work at LINE, a messenger widely used in Asia, including Japan, Taiwan and Thailand, where I work as a data platform engineer based on Kubernetes. Before I start: this project is not related to my company at all, it is totally my personal project, and I do not speak on behalf of my company.
B
Okay, everyone knows that Kubernetes is great and that it's very useful for ML projects. It handles model management, node management, job scheduling, resource management and monitoring. It's very good and everyone loves it. So a lot of people use Argo Workflows for data pipelines or machine learning pipelines, but I think there is no free lunch.
B
Not everyone is happy with Kubernetes, especially data scientists who are not very familiar with software engineering skills. I came up with two main reasons for this issue: first, containerization, and second, writing manifest files is a little difficult. In more detail: you already know what containerization is, but I want to emphasize that every time I update my machine learning model, I have to rebuild my container image, push it again and run the container.
B
I know how to do it, but it was really tiresome. Every time I had to do it, I thought: is there a better way to do this? The other thing is YAML hell. Not every time, but I have to write YAML to run my ML job, and somebody who is familiar with it might find it easy, but for data scientists it won't be very easy to learn: they need to learn about Kubernetes and very detailed stuff about it.
B
There are also unicorns who are actually good at both data science and software engineering, but not everyone is that unicorn. So I want to introduce JupyterFlow, with which you can run your ML job right away on Kubernetes.
B
So I prepared a live demo, so you can go to this website and try it yourself. I will show you on my computer. Can you see my JupyterHub page? Yep? Okay, I will share.
B
...with my GitHub account. So this is just a regular JupyterHub platform, and first you have to install JupyterFlow. This is not all you need, but this is the only thing that the data scientist needs to do. Oops.
B
You can also run a script. I made a train.py; this is a regular MNIST Keras training file, just a simple training script, and you can run it on your JupyterLab.
B
It works well, but in a lot of cases we want to scale our machine learning job, so we want to run this script on Kubernetes. So how could we do that? We just run JupyterFlow: jupyterflow run, and then write python train.py, and then the actual workflow will appear.
B
The Argo Workflow from JupyterFlow starts, and then, after it finishes, there will be an MNIST file in my JupyterLab. You can also write a more complex DAG workflow.
B
So say you have three commands, a hello-world again, and this is similar to the Airflow dependency expression: you run the first job and then the second, and the first job and then the third. So you run jupyterflow run with the -f option and workflow.yaml, and then you can...
B
You can run more complex workflows.
B
This is it, so you can run this kind of more complex stuff. So that is JupyterFlow. I think JupyterFlow is an interface to Kubernetes for data scientists. Data scientists would normally need to use Kubernetes directly, but if you use JupyterFlow, it will translate your machine learning code into a Kubernetes Argo Workflow YAML file, so I think JupyterFlow is some sort of translator for data scientists.
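(A minimal sketch of that translation idea in Python. The manifest shape follows Argo Workflows' DAG template, but the helper, its parameters and the default image are invented for illustration; this is not JupyterFlow's actual code.)

```python
# Illustrative only: build an Argo Workflow DAG manifest (as a dict) from
# shell commands and Airflow-style "1 >> 2" dependencies.
import yaml

def build_workflow(jobs, dags, image="jupyter/base-notebook"):
    tasks = []
    for i, cmd in enumerate(jobs, start=1):
        tasks.append({
            "name": f"job-{i}",
            "template": "run",
            "arguments": {"parameters": [{"name": "cmd", "value": cmd}]},
            # depend on every job that points at this one in a "(a, b)" pair
            "dependencies": [f"job-{a}" for a, b in dags if b == i],
        })
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "jupyterflow-"},
        "spec": {
            "entrypoint": "dag",
            "templates": [
                {"name": "dag", "dag": {"tasks": tasks}},
                {"name": "run",
                 "inputs": {"parameters": [{"name": "cmd"}]},
                 "container": {"image": image,
                               "command": ["sh", "-c", "{{inputs.parameters.cmd}}"]}},
            ],
        },
    }

# e.g. three commands, where job 1 fans out to jobs 2 and 3
manifest = build_workflow(
    jobs=["python train.py", "echo hello", "echo world"],
    dags=[(1, 2), (1, 3)],
)
print(yaml.safe_dump(manifest))
```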
B
So, this is why I started this project. I also run machine learning code, and using Kubernetes was a great idea because it handles scaling, node management, scheduling, all kinds of good stuff. But using Kubernetes and Kubeflow directly was a little bit tiresome for me, because every time I changed my code I had to rebuild my image and run my code again. That was a little troublesome, so I wanted a better, more efficient way to run my model.
B
Whenever I write my code on JupyterLab, I want to run it right away. That is why I made JupyterFlow. So, to wrap up: this is my personal open source project and it's at an early stage of development; it has bugs and lacks features, but I think there's still no de facto standard ML tool in this field. Maybe the current one could be Kubeflow, but I think Kubeflow is slightly difficult for me to run in a very lightweight way.
B
So I think JupyterFlow has great strength and opportunity in this area of training machine learning models. These are the blogs that are the sources for my presentation.
A
I certainly have a question. Okay, I want to know: is it written in Python, and does it use a transpiler to convert it into Workflow YAML under the hood?
B
Yes, it reads Python and translates it to a YAML file, and it throws the YAML file at Kubernetes using the Kubernetes Python SDK. Nice, nice.
B
Actually, I have an architecture diagram. It's really simple: there is JupyterHub and there is Argo, and if you fetch the pod spec in JupyterFlow, you can get the image and the storage volume. So JupyterFlow just does this: it fetches this information, builds the YAML file and throws it at Kubernetes, and the Argo Workflow controller runs the rest of it. Cool, okay.
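(A minimal sketch of that flow with the Kubernetes Python SDK, assuming in-cluster access and illustrative names such as the pod name and namespace; it is not JupyterFlow's actual code. It reads the notebook pod's spec, reuses its image and volumes, and submits a Workflow custom resource for the Argo Workflow controller to pick up.)

```python
# Illustrative sketch only: reuse the notebook pod's image/volumes and submit
# an Argo Workflow for the controller to run.
from kubernetes import client, config

config.load_incluster_config()  # or config.load_kube_config() outside the cluster
api = client.ApiClient()

namespace = "jupyterhub"  # hypothetical
pod = client.CoreV1Api().read_namespaced_pod("jupyter-hongkun", namespace)
notebook = pod.spec.containers[0]

workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "jupyterflow-"},
    "spec": {
        "entrypoint": "main",
        # serialize to camelCase dicts so the manifest matches the CRD schema
        "volumes": api.sanitize_for_serialization(pod.spec.volumes or []),
        "templates": [{
            "name": "main",
            "container": {
                "image": notebook.image,  # same image as the notebook pod
                "command": ["sh", "-c", "python train.py"],
                "volumeMounts": api.sanitize_for_serialization(notebook.volume_mounts or []),
            },
        }],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="argoproj.io", version="v1alpha1",
    namespace=namespace, plural="workflows", body=workflow,
)
```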
A
And does that say Argo Dataflow? I can't see what it's sharing.
A
Let's try that. I'm going to do that again, because I don't trust it... there we go, there we go, the green line has appeared around it. Can you guys see my slides? Yeah? Excellent, thank you, Barna. So today I'm going to talk a bit about a new project we've been working on, to help build out a solution for some of our internal needs at the company that the core team works for, and the project's called Argo Dataflow.
A
This is quite a complicated example, but in this example there is a container writing data to a NATS Streaming subject. That's then read and run through two filters, one that filters out cats and one that filters out dogs, and written to another NATS Streaming subject, and finally there's some processing, and the processing on the cats and the dogs is different, and that's written to an output topic. That's pretty common: you're reading from a data source and writing into a data sink.
A
Optionally, you can read from more than one source and you can write to more than one sink, so it also allows you to do a kind of fork-join processing on those items of data. The blue icons on this, we don't quite have a name for them, but you can call them a processor, and we have a number of processors out of the box; we'll talk about that in a second. Okay.
A
So what are the currently supported sources? We can have a cron schedule as a source, which produces an item of data every, you know, minute or two minutes; a Kafka topic; a NATS Streaming subject, which is basically the same as a topic, both of which are durable; or an HTTP endpoint, so you can put an HTTP service in front of a step within your pipeline and consume data from that. And then you can go through...
A
We've got several built-in operations that you can use. Filter and map, which I don't think I even need to explain, I'm sure you can guess what they are; and flatten and expand, which turn out to be popular operations: flattening large structured data down to key-value pairs, and expanding key-value pairs back up into structured data, which makes it easy to process data. And grouping, so grouping data as it comes in and then emitting single chunks of data that have been grouped together.
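(As a rough illustration of what flatten and expand do to a message, a small self-contained Python sketch; it only shows the dotted key-value idea, not Argo Dataflow's actual implementation.)

```python
# Illustration of the flatten/expand idea: nested JSON <-> dotted key/value pairs.
def flatten(obj, prefix=""):
    out = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out

def expand(flat):
    out = {}
    for path, value in flat.items():
        node = out
        *parents, leaf = path.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return out

msg = {"pet": {"kind": "cat", "name": "Oreo"}, "count": 2}
flat = flatten(msg)   # {'pet.kind': 'cat', 'pet.name': 'Oreo', 'count': 2}
assert expand(flat) == msg
```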
A
A container just runs a container image that you've specified; maybe that could be called image. And a handler allows you to actually write your code directly into your pipeline, it allows you to code it in the YAML, and it'll compile that code for you and run it for you. There's no need to actually build and publish an image, which I think we all know is a bit of a pain.
A
Oh, and then we sink it to pretty much the same kinds of places as the sources, so Kafka, NATS or an HTTP endpoint, and also a log sink. Anybody who's familiar with Argo Events will know there's a... sorry, a sensor? I don't think I mean sensor, I mean something else.
A
Yeah, a sensor... there's one called log which allows you just to write those messages out, which is intended for debugging, because obviously one of the challenges in a kind of distributed system with a lot of messages is that you need to be able to trace your messages through the system. And then you can scale your processors up by using either HPA, so if you want to scale up and down using CPU or memory, that's one option, or scale them manually, or scale them based on the number of pending messages in the queue.
A
Okay, so you can't really get away from YAML if you're in the cloud native universe. So this is an example of a pipeline specified in YAML.
A
It contains some pretty conventional metadata, somewhat inherited from Argo Workflows, that allows people to describe their pipeline and who owns it, so who's responsible if it goes wrong. A lot of people have trouble determining who owns a particular resource inside Kubernetes, so there's an out-of-the-box annotation for that, and a description as well. And then you can see that this particular specification...
A
STAN is an acronym for NATS Streaming, so it's NATS backwards, and then there's a second step called b, which reads from the subject and writes to an output topic. But nobody likes that, so we've written a nascent Python library that you can use to write your pipelines in Python in a kind of builder format. So this is actually not the same pipeline.
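(A rough sketch of the two-step pipeline just described, written here as a Python dict rather than YAML; the apiVersion, annotation keys and field names such as cat, sources, sinks, cron, stan and kafka are recalled from the early Dataflow API and should be treated as approximate.)

```python
# Rough sketch of a two-step Dataflow Pipeline resource, expressed as a dict.
# Field names are approximate; treat this as illustrative, not authoritative.
pipeline = {
    "apiVersion": "dataflow.argoproj.io/v1alpha1",
    "kind": "Pipeline",
    "metadata": {
        "name": "example",
        "annotations": {  # hypothetical keys for the description/owner metadata
            "dataflow.argoproj.io/description": "Example pipeline from the demo",
            "dataflow.argoproj.io/owner": "argoproj-labs",
        },
    },
    "spec": {
        "steps": [
            {   # step "a": emit a message on a schedule, write it to a STAN subject
                "name": "a",
                "cat": {},
                "sources": [{"cron": {"schedule": "*/3 * * * * *"}}],
                "sinks": [{"stan": {"subject": "a-b"}}],
            },
            {   # step "b": read the subject and write to an output topic
                "name": "b",
                "cat": {},
                "sources": [{"stan": {"subject": "a-b"}}],
                "sinks": [{"kafka": {"topic": "output-topic"}}],
            },
        ],
    },
}
```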
A
This one contains two steps, both reading from a cron schedule and passing to a handler, which is a Python function specified right here in the source code. If you've used Kubeflow you'll probably recognize this way of doing things, of having a pointer to a function. And then there's a second step which does kind of the same thing, and this one also showcases a retry policy, which I'll come back to shortly.
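(A hypothetical, runnable stand-in for that builder style, to show the authoring experience; the PipelineBuilder class and its methods are invented for illustration and are not the actual argo-dataflow Python library.)

```python
# Invented, minimal stand-in for a builder-style DSL; it only collects a dict,
# it is NOT the real library.
class PipelineBuilder:
    def __init__(self, name):
        self.spec = {"metadata": {"name": name}, "spec": {"steps": []}}

    def step(self, name, source, handler, retry="Always"):
        self.spec["spec"]["steps"].append({
            "name": name,
            "sources": [source],
            "handler": {"code": handler.__name__},  # the real DSL ships the code itself
            "retryPolicy": retry,                   # "Always"/"Never" is approximate
        })
        return self

    def dump(self):
        return self.spec

def greet(msg, context=None):
    return b"hi " + msg

p = (PipelineBuilder("example")
     .step("a", {"cron": {"schedule": "*/3 * * * * *"}}, greet)
     .step("b", {"cron": {"schedule": "*/3 * * * * *"}}, greet, retry="Never"))
print(p.dump())
```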
A
Okay, so it's Python, so you can use it in a Jupyter notebook. This again is very new; I'm very interested to get people's feedback on how they might want to author their pipelines. This decision has been based on the fact that we know people don't really like YAML, they want to write things in Python.
A
That came back very strongly in the survey earlier this year, and so here is an example of running a pipeline in a Jupyter notebook. We also ship some Prometheus metrics out of the box, so each step within your workflow, sorry, your Dataflow pipeline, will emit metrics such as the number of messages going through, the rate at which they're going through, the number in flight and the number of replicas, so you can easily monitor your system to make sure it's processing.
A
This is especially useful if your processing pipeline uses downstream systems, so it has to reach out to another service to get some kind of data; this will help you understand that. You can also look at pending messages and other things to see if data is building up in your system, and build very standard Wavefront or Grafana dashboards with very standard alerting on that, so you know when things go wrong.
A
Basically, there's a quick start YAML that you can apply into a namespace called argo-dataflow-system, and that'll create the controller whose responsibility will be to execute the pipelines. By default we've provided basically a namespace-scoped install, so it'll just go into a single namespace and only listen to pipelines and steps created in that namespace, and then there's a user interface that comes with it as well.
A
It may be familiar to some of you, I don't know if you've seen this particular user interface style before. There's basically a new option on the left-hand side, above the events option, for pipelines, and it lists all the pipelines. I'm just going to go through now; the pipelines are all provided as a series of examples, so it's quite easy to go through them, and the examples start at 101 for easy examples and go up to 301 for advanced examples, kind of showing you each of the new features.
A
So you can have a look at the Python that produces this particular pipeline, and this particular one reads from a cron schedule, cats it to the output, so it's an identity map operation, then writes it to a log, and this will then be represented in the user interface here, showing the sources it's reading from and the places it's writing to. Now, cron and log are quite useful for experimentation because, of course, Kafka is typically quite heavyweight, and NATS is just moderately heavyweight.
A
So if you're experimenting, this can be quite useful to help you. If you just click on a particular step within the pipeline, you've got a couple of different tabs containing some useful information. The first one just contains an overview of the status, telling you how many replicas you're running and the last time this particular step was scaled up.
A
It also contains some information for each of the sources and sinks, giving you the total number of messages, the current messages per second, transactions per second, and an example of a recent message, so you can see what's going through a particular step. And it's the same with the sinks here as well, you get a similar kind of information. Typically, with a cat operation you might expect the number of messages to be the same.
A
You can also have a little look at the logs here, so there's a couple of tabs for the different types of logs. You can see this is the main container inside this step, and, like in an Argo Workflow where you have a wait container sidecar, this has a sidecar whose responsibility is reading and writing messages to and from the topics.
A
Let's go back... I'm just going to go back to the examples, as I'm just going to talk a little bit about them. This second one is an example of a pipeline with two nodes, or two steps, in it. I'm just going to make sure that's created.
A
This one reads from a Kafka topic, performs some kind of processing, writes it to a NATS Streaming subject and then writes it to an output topic, and the status on this will just show you... you can see this is currently grey, because it's waiting to actually schedule this particular pipeline, because my cluster probably doesn't have enough space to do it at the moment.
A
We talked a little bit about things like filter, flatten, expand and map. I'm just going to dive into filter, which shows you a filter operation. This particular pipeline will filter to only include messages that contain the word capybara, which is a type of rodent, I think, and it reads them from a Kafka topic and writes them out.
A
This expression syntax is the same expression syntax we now use quite commonly in Argo Workflows, so it should be familiar to a lot of people.
A
Let's just go back here as well. We talked about flatten and expand, so flattening down to dot-separated key-values and expanding back out of them. Then a map operation; I'll dive into this one as well. This one's a map operation which basically prepends the string "hi" to the message that comes through the pipeline.
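(A tiny, illustrative Python equivalent of those two steps, assuming messages arrive as byte arrays as described below; this is just the semantics, not the expression syntax Dataflow actually evaluates.)

```python
# Semantics of the filter and map examples, on byte-array messages.
def keep_capybaras(msg: bytes) -> bool:
    # filter step: only pass messages mentioning "capybara"
    return b"capybara" in msg

def say_hi(msg: bytes) -> bytes:
    # map step: prepend "hi " to every message
    return b"hi " + msg

messages = [b"a capybara appears", b"a cat appears"]
out = [say_hi(m) for m in messages if keep_capybaras(m)]
print(out)  # [b'hi a capybara appears']
```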
A
Now, I haven't talked about the format of messages. The lowest common denominator message format is a byte array: NATS Streaming only uses byte arrays and doesn't have the ability to add metadata in the way that Kafka does, or even HTTP requests, so messages are all currently byte arrays.
A
One open question at the moment is whether we need some kind of internal format for messages, to support something like, for example, the CloudEvents message format, or something else that allows us to add additional metadata to each message, so it'll be interesting to see what people's thoughts are on that. Then we have an auto-scaling pipeline, an example of the auto-scaling here. The way that scaling works out of the box is you can define a replica ratio, i.e. the number of replicas...
A
...we should be running for each N pending messages. So, for example, if you think you're able to process 500 messages a second, then you might have a ratio of 500, and then, when the number of pending messages goes to a thousand, you'll be running two replicas; one thousand five hundred, three replicas; two thousand, four replicas; and so forth, up to a bound. But it also just implements the standard scaling that's used by HPA, so you can also just use HPA-based scaling.
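(The ratio arithmetic as a one-line sketch, matching the numbers just described; the upper bound and the floor of one replica are hypothetical.)

```python
import math

def desired_replicas(pending: int, ratio: int = 500, max_replicas: int = 4) -> int:
    # e.g. 1000 pending / ratio 500 -> 2 replicas; 1500 -> 3; 2000 -> 4 (bounded)
    return min(max_replicas, max(1, math.ceil(pending / ratio)))

assert desired_replicas(1000) == 2
assert desired_replicas(1500) == 3
assert desired_replicas(2000) == 4
```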
A
That's if you want to scale it using something more sophisticated, which is typically more complicated to set up, because you'll probably need to install the metrics server and HPA as well, so that may be more complicated than a lot of people need by default.
A
So this is a Python handler: basically, you define the code that you want to run in terms of a handler function and a runtime, and what that will do is build and compile that code for you. So this is suitable for very simple use cases, where you can probably write the code that you want in 20 or 30 lines and you don't have any significant external dependencies; every dependency you have is kind of out of the box. This will probably be familiar to people who've used AWS Lambda as well.
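(A sketch of what such a handler can look like; the exact signature in the Dataflow runtimes may differ, so treat the context argument as an assumption.)

```python
# Sketch of a Dataflow-style Python handler: 20-or-so lines, no external deps.
# The (msg, context) signature is assumed here; check the runtime docs.
import json

def handler(msg, context=None):
    # msg is a byte array; parse it, enrich it, and return a byte array
    record = json.loads(msg)
    record["greeting"] = "hi"
    return json.dumps(record).encode()

# quick local check
print(handler(b'{"pet": "capybara"}'))  # b'{"pet": "capybara", "greeting": "hi"}'
```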
A
Then we have a Git option, and these are both intended to address the difficulty of having to build your own image. The Git option will actually check your code out of Git and run it with a particular image. So this one here checks out this particular repository, checks out the sub-path examples/git on branch main, because your branch is called main these days, and then will actually run that inside this particular image, and I think it's usually pretty interesting for us to have a look at what this contains.
A
It's slow, and I'm not even on the corporate VPN today, so I'll have to wait for it to load. Here we go, so here's an example of the Git step. Basically, this allows you to provide something very similar to a Dockerfile here, and in this example I'm basically providing an entrypoint, which is the code to run. I need to provide a handler function, as I mentioned before, and then a main function, which is kind of copy-and-paste code.
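(An illustrative layout for such a checked-out entrypoint, assuming the handler/main split described above; the file layout and the main loop are invented for the example, not the actual Dataflow contract.)

```python
# Illustrative entrypoint for a Git step: a handler plus boilerplate "main".
# The run loop below is a stand-in; the real runtime wires the handler to the
# step's sources and sinks for you.
import sys

def handler(msg: bytes, context=None) -> bytes:
    return b"hi " + msg

def main():
    # stand-in loop: read lines on stdin, hand each one to the handler
    for line in sys.stdin.buffer:
        sys.stdout.buffer.write(handler(line.rstrip(b"\n")) + b"\n")

if __name__ == "__main__":
    main()
```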
A
It's possible to run a pipeline to completion, or not, so there are two ways to run a pipeline. One that runs to completion would be one where you're just processing a finite amount of data, and the way that you run to completion is simply by exiting zero in your container to indicate you've finished. In a pipeline that runs to completion you can mark steps as a terminator, so if a particular step within your pipeline runs to completion, the whole pipeline is terminated.
A
So if you've got three or four steps, you might have the steps passing information between one another, or from the first step through to the last step, and if the last step exits then the whole pipeline will be shut down for you automatically. Or you can have them run forever: if you're processing an infinite, unbounded stream of data, then obviously your pipeline would run ad infinitum. And we talked a little bit about containers, so I'll skip over that as well.
A
Simply, you can specify the container that your pipeline runs, and then I've got a couple of more sophisticated examples, including a word count one as well. So again these can be seen in the user interface; here's the Go one, you can see that's running there, processing a small number of messages at the moment, and you can have a look at some of the more complicated ones. This is an error-handling pipeline demonstrating retry policies, so this is actually kind of an interesting example where the steps aren't connected.
A
Typically you'd expect to see the steps connected, but I disconnected them in this one. The top sub-pipeline reads from a cron source, and the handler itself randomly emits an error, so it's running: if random of two equals zero, then raise an exception, otherwise just return the value. The first one has a retry policy, because if you retry randomly you'll ultimately get success; that uses a back-off policy, a retry back-off. And this second one here has a retry policy of never, so it never retries.
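(The randomly failing handler, roughly as described; the raise-or-return logic is the point, and the signature is assumed as before.)

```python
import random

def handler(msg, context=None):
    # roughly the demo's behaviour: a coin toss decides failure vs. success;
    # with a retrying policy this eventually succeeds, with "never" about
    # half the messages fail
    if random.randint(0, 1) == 0:
        raise Exception("deliberate random failure")
    return msg
```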
A
So you can see around half the messages fail, because it's a coin toss as to whether they fail, and that's reported here. That's kind of useful, because it allows you to have a step that's robust to your container being killed by Kubernetes, which can happen at any point. Anybody who's written workflows knows that one of the biggest challenges of writing a workflow is getting your retries set up successfully, because, of course, any particular step within a workflow can randomly fail.
A
It could be killed by Kubernetes because it wants the resources for something else, and Dataflow has that same kind of challenge. So it has the ability to read reliably from a topic and retry if there are any kind of issues with the processor; for example, if the processor is unable to connect to a downstream service, then it can just retry, and that means it can give a very high reliability guarantee: not quite 100%, but pretty close to 100%, you know, within one one-thousandth of a percent.
A
If you want to find out more, you can find Argo Dataflow in argoproj-labs. We're really keen to get people's feedback and to think about the use cases, so we can make sure we build out the right kind of features and capabilities for what people need. We've got some idea from speaking to our own customers, but we know the feedback from the community is really invaluable.
A
Hopefully you'll see that some of the concepts have been borrowed or adapted from both Argo Workflows and Argo Events, you know, some of the best concepts around starting and stopping pods and containers and doing things reliably on Kubernetes, which is an interesting challenge in its own right.
C
Maybe, did you want to explain some of the use cases that we're trying to solve by writing this?
A
Yes, so we're not looking to replace stream processing tools like Apache Beam, and we're not looking to replace tools like Argo Events and Argo Workflows; I think it sits in between those two. The very general use case is processing items of data from some kind of topic or subject, but we're aiming at operational analytics processing.
A
So in our initial case that would be events about application deployments, about requests going through API gateways, that need some kind of pre-processing or adaptation or, you know, enrichment, those kinds of operations, before they are put into some kind of AI tool, or some kind of data processing tool, to extract anomalous events from the list. That would be the initial one, but anything that processes streams of events is targeted.
A
Okay, so I will just go back to our menu for today. I hope you guys enjoyed the demos that we had today. If you want to learn more, we'll include some links in this document, so you can go and read the slides yourselves and also be able to ask more questions. You can obviously come and ask questions on the CNCF Slack, which we've now migrated to, in the argo-workflows channel.
A
I think there's an argo-dataflow channel too, but there's not much going on there yet. If you are interested in presenting at the community meeting, we always love to see people talking about what they're doing. It's always great to see the tools that people have built on top of Argo Workflows and Argo Events, but it's also really good to see people's other use cases and what they're doing with it; we all find that really interesting, and we appreciate it.
A
Okay, thank you very much for joining today. Oh, and the other thing they always ask us: is this being recorded? Yes, it's being recorded, and the video will be available on YouTube later today.