From YouTube: Automate & Scale Data Pipelines the Cloud Native Way | Guillaume Moutier OpenShift Commons Briefing
Description
Automate & Scale Data Pipelines the Cloud Native Way
Guillaume Moutier (Red Hat)
OpenShift Commons Briefing
March 12, 2020
A: Hello everybody, and welcome to yet another OpenShift Commons briefing. Today we're going to tell you how to automate and scale your data pipelines the cloud native way. Guillaume Moutier from Red Hat will be reintroducing himself, telling you a little bit about himself, and then giving us a bit of a deep dive into our data pipelines initiative. So Guillaume, please take it away. There will be live Q&A at the end of this, and I will get the slides from him and we will post it all on blog.openshift.com and on YouTube, as usual.
B: So, let's get started. First, cloud native: as you said, to set the stage I want to go back a little bit over what the characteristics of a cloud native platform can be. Here I'm listing the things that are most important to me, but what you must never forget is why you are doing these things: the business outcomes. What are we trying to achieve when we implement this kind of architecture? For me, the most important things are speed, efficiency and, foremost, adaptability.
B: We know now that technology is moving fast, really fast, and we have to adapt our businesses and organizations to be able to handle that kind of change. Adaptability is and was my main concern all through my career, and now we have the tools and the technology to be able to achieve these business goals. So let's take a look at what I would call a legacy data pipeline architecture.
B: I call it legacy, but I know for sure that for most organizations it is still the standard way to do things. We're looking at architectures that are very tightly coupled and not easily scalable. For example, take a very basic application where a user saves a file to some storage so that it can be processed by an application. The way this works for most applications is that there is some storage mounted on a computer; it can be a shared folder or something like that. The file is sent to the storage, which again has to be mounted on the server, over CIFS or iSCSI, but in any case some kind of hard connection between the storage and the application server, and then it's consumed by an application, let's say some Java application. The problem with this architecture is, first, that I have to keep things very close to one another because of this mounting requirement.
B
I
cannot
Manta
CIFS
over
thousands
of
kilometers.
Doesn't
work
well
also
the
scalability
problem
when
I
am
using
this
type
of
connection.
That
means
that
if
I
want
to
put
up
another
application
server,
for
example,
because
I
want
to
scale
my
application
capabilities
well,
it
has
to
have
exactly
the
same
configuration
as
my
first
server
exactly
the
same
storage
connection,
exactly
the
same
man
point
and
behavior.
So
that's!
B: That's okay if you have one or two servers, but if you have tens or hundreds of them, that's a burden you have to take care of. Now let's look at a more cloud native way to do this kind of thing. We can think of an application, again where a user just sends a file, but this time to an object storage, and here it's a fully disconnected mode.
B: When you use object storage, consuming the storage is only an HTTP connection: it's just a PUT or a GET, and then that's it, I'm finished. I have no remaining connection between my user application and the object storage. The same goes for the data processing functions: they can consume the storage directly, as they need it. That means they can be wherever you want and they can scale; it will be much easier to do. And now we have what I would call intelligent storage: in the latest release of Ceph we now have bucket notifications.
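To make the "only HTTP" point concrete, here is a minimal sketch of that interaction using Python and boto3 against an S3-compatible endpoint. The endpoint URL, credentials and bucket name are placeholders, not the ones used in the demo.

```python
import boto3

# Any S3-compatible object store works here (Ceph RGW, NooBaa, AWS S3, ...).
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8080",   # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# The "upload" side: a single HTTP PUT, no mount, no persistent connection.
with open("payment-batch-001.txt", "rb") as f:
    s3.put_object(Bucket="uploads", Key="payment-batch-001.txt", Body=f)

# The "processing" side: a single HTTP GET, from wherever the function runs.
obj = s3.get_object(Bucket="uploads", Key="payment-batch-001.txt")
data = obj["Body"].read()
```

Because both sides only speak HTTP, the uploader and the processing function can live in completely different places and scale independently.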
B: That means that whenever something happens in the object storage, it can send a notification, let's say to a Kafka bus, which will itself trigger some data processing function. I like to put Kafka in the middle of this kind of architecture because it can act in two different ways. First, as a buffer: let's say my data processing function is not ready, or not ready yet; the notifications keep coming in on the Kafka bus, and when the function is ready, the notifications on the topic are consumed and the function can run its process. But Kafka can also act as a hub for all those notifications. We can imagine that we have different processing functions, maybe in different places, different data centers, performing different operations, but every one of them feeding on the same topic. So we'll try to do it for real and build an application that works like this. For this demo I took the example of ACH payments.
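As a sketch of how a processing function might drain those buffered notifications, here is a small example using the kafka-python client; the topic name, bootstrap address and group id are illustrative, not the demo's. Because Kafka retains the messages, the function can start late and still catch up (the buffer behaviour), and a second function subscribing with a different group_id would receive the same notifications independently (the hub behaviour).

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Notifications buffered on the topic wait here until a consumer is ready.
consumer = KafkaConsumer(
    "bucket-notifications",                            # illustrative topic name
    bootstrap_servers="my-cluster-kafka-bootstrap:9092",
    group_id="ach-split",                              # a different group_id gets its own copy of the stream
    auto_offset_reset="earliest",                      # pick up anything that arrived while we were down
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    print("object event received:", event)
    # ... fetch the referenced object from storage and process it ...
```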
B
For
for
you,
people
who
are
not
in
the
in
the
United
States
ACH
can
be
seen
as
electronic
check
electronic
payments,
so
it
can
be
used
by
a
customer,
paying
a
service
provider
and
employer
depositing
money,
and
your
your
checking
account
for
payrolls.
All
those
kind
of
things
happening
electronically
for
my
demo
here
I
will
try
to
implement
this
very
basic,
very
basic
pipeline,
where
someone
buys
something
from
merchants
and
there
is
an
electronic
payment
happening.
B: The way it works is that the transaction has to be sent to the bank of the merchant, and this bank will produce what is called an ACH file. It's a standard file, we'll come to it in a minute, that is sent to the Federal Reserve. There it is processed and made available to the receiving bank. The receiving bank is the customer's bank, so it is the one that processes the transaction and debits the account of the customer.
B
Ok,
so
that's
the
the
the
basic
process
of
ACH
and
as
a
reference
here
is
the
the
ACH
file
itself
and
which
works.
You
know
very
unfashionable
transaction
with
the
first
line,
giving
information
about
the
the
bank,
the
bank
itself
and
some
basic
information
about
the
company.
Second
line
more
details
about
the
company,
and
then
you
have
all
those
transaction
fields
with
the
different
customers
here:
the
amount
of
money
that
they
have
that
they
have
spent
and
which
bank,
which
receiving
bank
this
transaction
should
be
sent.
Okay.
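As an illustration of how a processing function can read those records, here is a small Python sketch that sums the entry-detail lines of a NACHA-style ACH file. The field offsets follow the standard 94-character entry detail layout (record type '6', receiving bank routing number, then the amount in cents); the files generated in the demo may not match this exactly.

```python
def summarize_ach(path):
    """Count entry-detail records and total their amounts, grouped by receiving bank."""
    totals = {}
    with open(path) as f:
        for line in f:
            if not line.startswith("6"):       # '6' marks an entry detail record
                continue
            receiving_dfi = line[3:12]         # receiving bank routing number + check digit
            amount = int(line[29:39]) / 100    # amount field is 10 digits, in cents
            totals[receiving_dfi] = totals.get(receiving_dfi, 0) + amount
    return totals

# Example result: {'021000021': 1234.56, '071000013': 987.65, ...}
```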
B
This
is
how
I
have
implemented
it
inside
openshift,
so
I
have
here
some
kind
of
generator,
we'll
come
to
it
that
generates
fake
transactions
and
send
send
those
files,
enzymes
inside
an
object,
storage
bucket.
Then
this
one
will
trigger
a
notification
that
will
be
sent
to
kefka
bus,
and
here
I
will
be
using
kenneth,
eventing
and
carrot
of
serving
that's
a
way
in
kubernetes
and
in
up
and
shift
to
create
on
demand
paths
on
demand
function.
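With Knative Eventing, the Kafka source forwards each notification to the service as an HTTP POST, and Knative Serving scales its pods up from zero to handle it. Here is a minimal sketch of what such a function could look like, assuming a small Flask app and an S3-style notification payload; the payload shape, environment variables and names are assumptions for illustration, not the demo's actual code.

```python
import json
import os
import boto3
from flask import Flask, request

app = Flask(__name__)
s3 = boto3.client("s3",
                  endpoint_url=os.environ["S3_ENDPOINT"],
                  aws_access_key_id=os.environ["S3_ACCESS_KEY"],
                  aws_secret_access_key=os.environ["S3_SECRET_KEY"])

@app.route("/", methods=["POST"])
def handle_notification():
    # The event source delivers the bucket notification in the request body.
    event = json.loads(request.get_data())
    for record in event.get("Records", []):            # S3-style event structure (assumed)
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # ... split or process the ACH file here ...
        print(f"processed {key} ({len(body)} bytes) from {bucket}")
    return "", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```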
B
So
I
have
a
service
that
will
be
listening
for
Kefka
events
and
then
spinning
up
a
deployment
of
the
container
that
will
process
the
file.
If
we
process
the
transaction,
so
here,
what
you
will
do
is
create
an
ACH
file
for
for
the
transactions
and
send
them
to
to
the
bank.
So
the
bank
of
the
merchant,
so
here
I,
will
have
a
few
packets
I.
Do
my
Gmail
with
seven
different
banks
or
seven
different
buckets
to
which
the
different
files
will
be
sent
depending
and
the
merchant
sending
sending
the
file
at
the
origin
bank.
B
Those
files
will
be
processed.
Basically,
what
it
will
do
is
look
at
all
the
transactions
and
recreate
new
ACH
files,
this
time,
sending
it
to
to
a
destination
to
the
destination
bank
to
the
receiving
bank.
All
those
files
will
be
created
and
burst
into
the
into
different
buckets.
So
this
was
this
time
buckets
billing
belonging
to
the
the
receiving
banks
where
they
will
be
processed.
So
here
the
the
standard
process
will
be
to
look
at
the
transaction
and
they
beat
gee
I
can't
after
after
customer.
B
What
we
will
do
in
this
demo
is
that
will
only
look
unjam
unprocessed
and
we
will
just
sum
them
up
in
some
wine
in
some
some
big,
some
big
bucket,
just
to
see
how
many
transactions
were
processed
in
how
many,
how
much
amounts
of
money
was
was
processed
all
throughout
all
throughout
this
pipeline.
So
to
implement
these
few
things
that
I
need
some
calf
care
topics
to
be
able
to
send
my
notifications.
So
here
you
can
see
at
the
bottom.
B
They
have
the
American
upload
topic
and
and
the
ODF
a
topic
where
I
will
send
a
file.
Then
I
have
some
buckets
that
I
have
created
in
my
in
my
storage.
Here
are
all
the
buckets
that
I
have
and
don't
worry.
You
will
have
access
to
the
two
to
the
code
and
everything
to
be
able
to
to
reproduce
the
demo.
B
So
I
wasn't
going
to
into
too
many
details
on
this
right
now
and
then
we
will
program
the
back
identifications
themselves,
how
it
how
its
done
in
in
in
surf
and
RHCs
in
the
reddit
surf
storage.
You
can
do
what
you
do
is
to
create
a
topic
that
will
point
to
your
Kafka
to
your
calf
cap.
End
point
okay,
so
here
I
will
create
a
topic
with
the
name.
Rg
Fi
and
I
will
point
it
to
my
craft,
like
a
calf,
calf,
cluster,
okay
and
then
for
each
bucket.
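Ceph RGW exposes this through its S3-compatible and SNS-like APIs, so it can be scripted with boto3. Here is a sketch of the two steps, creating a topic that pushes to Kafka and then attaching a notification configuration to a bucket, with placeholder endpoint, credentials, topic and bucket names; the exact attributes supported depend on your Ceph/RHCS version, so treat this as an outline rather than the demo's configuration.

```python
import boto3

endpoint = "http://rgw.example.com:8080"   # RGW endpoint (placeholder)
creds = dict(aws_access_key_id="ACCESS_KEY", aws_secret_access_key="SECRET_KEY")

# 1. Create a topic that pushes events to the Kafka cluster.
sns = boto3.client("sns", endpoint_url=endpoint, region_name="default", **creds)
topic = sns.create_topic(
    Name="achfile",
    Attributes={"push-endpoint": "kafka://my-cluster-kafka-bootstrap:9092"},
)

# 2. Attach a notification to the bucket: every object creation raises an event on the topic.
s3 = boto3.client("s3", endpoint_url=endpoint, **creds)
s3.put_bucket_notification_configuration(
    Bucket="merchant-uploads",
    NotificationConfiguration={
        "TopicConfigurations": [
            {"Id": "achfile-notif",
             "TopicArn": topic["TopicArn"],
             "Events": ["s3:ObjectCreated:*"]}
        ]
    },
)
```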
B: Finally, before we go on to the live demo, this is the transaction job. The way it works is that it launches a container that generates our transactions, and it runs 60 times with a parallelism of five. That means I will be able to create five files at a time inside my OpenShift cluster. So let's go, let's do this. Here I am in my project.
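For reference, the generator container's work can be imagined roughly like this: each run builds one file of random transactions and PUTs it into the upload bucket, and the Kubernetes Job (completions 60, parallelism 5) simply runs the container 60 times, five at a time. This is an illustrative sketch, not the demo's actual generator; the file format, bucket name and environment variables are placeholders.

```python
import os
import random
import uuid
import boto3

s3 = boto3.client("s3",
                  endpoint_url=os.environ["S3_ENDPOINT"],
                  aws_access_key_id=os.environ["S3_ACCESS_KEY"],
                  aws_secret_access_key=os.environ["S3_SECRET_KEY"])

def generate_file(bucket="merchant-uploads"):
    """One Job run: write one file of fake transactions into the upload bucket."""
    lines = []
    for _ in range(random.randint(300, 500)):           # transactions per file
        amount_cents = random.randint(100, 200_000)     # $1.00 to $2,000.00
        merchant = f"merchant-{random.randint(1, 7)}"   # seven merchant banks in the demo
        lines.append(f"{uuid.uuid4()},{merchant},{amount_cents}")
    key = f"transactions-{uuid.uuid4()}.csv"
    s3.put_object(Bucket=bucket, Key=key, Body="\n".join(lines).encode())
    print(f"uploaded {key} with {len(lines)} transactions")

if __name__ == "__main__":
    generate_file()
```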
B: I can see that I have three pods, which are the Knative pods, the OpenShift Serverless pods that are listening for events. I also have, in OpenShift Serverless, three different services which will split the ACH files or process them. You can see those services are ready, but there are no pods running, so right now we are scaled to zero.
B
So
let's
create
these
just
transactions
here,
I
will
use
that
the
exact
same
file
just
showed
you
and
now
it's
being
put
into
motion.
So
here
we
can
see
that
we
have
five
containers
creating
based
on
the
the
transaction
transaction
image
transaction
container,
that
I
that
I
have
designed
and
they
will
begin
to
create
new
transactions
and
as
new
transaction
files
are
created.
Well,
it
triggers
containers,
it
triggers
the
orifice
plate.
That
means
looking
at
the
ACH
file
and
splitting
them
and
putting
them
inside
the
red
bucket.
B
It
also
triggers
our
GFI
split,
that's
what's
happening
when
it
looks
inside
the
ACH
file
and
splitting
them
together
to
send
it
to
the
receiving
banks
and
then
the
our
GFI
process.
So
here
I'm
processing
the
transactions
themselves.
It
will
be
better
with
a
live
view
like
this.
Here,
it's
a
graph
on
a
dashboard
where
I
have
my
pipeline.
We
can
see
that
we
have
already
generated
15
15,
different
transaction
fires.
16
now
so
for
16
have
been
processed
and
dispatched
to
the
different
Bank
of
origin,
and
so
far
we
have
treated
8.
B
We
have
processed
8
8
of
them.
Those
fires
are
are
splitted
in
for
the
different
specific
banks
and
they
are
sent
here
to
to
the
receiving
banks
buckets
where
they
are
processed
and
so
far
we
have
processed
75
of
them.
Of
course,
we
have
many
more
files
in
this
process
because
we
take
H
originating
file
and
split
them
split
each
transaction
towards
its
its
own
receiving
bank.
We
can
see,
as
the
process
is
going
on,
that
the
the
CPU
usage
is
increasing.
Of
course,
we
are
spinning
more
clouds
as
we
as
we
need
them.
B
We
have
also
the
RAM
usage
going
on
and
I
have
some
lags
here
on
the
deployments,
but
it
should
keep
up
in
a
few
seconds
and
we
can
see
here
the
the
value
of
the
transactions
that
have
been
processed
so
far,
so
we
can
see
it's
going
up.
We
are
now
at
about
9
million
dollars
what
I
Jen
right
here
for
transactions?
It's
it's
a
random
number
of
transactions
between
300
and
500
of
them
for
each
guy
and
the
amount
itself
is
between
$1
and
$2,000.
Okay.
B
So
that's
the
kind
of
transactions
and
generating-
and
here
we
can
see
the
different
deployments
now
that
we
have.
We
are
now
up
to
15
parts.
We
can
see
that
we
have
five
deployments
of
the
create
transaction
part.
That's
the
maximum
parallelism
that
I
authorized
for
this.
We
have,
of
course,
my
listeners
for
the
Kefka
events,
but
the
treatment
themselves.
B
The
processing
itself
is
how
do
you
have
a
GFI
split
is
what's
happening
here
at
this
point,
so
here
it
doesn't
consume
much
resources,
because
it's
only
looking
at
the
files
and
depending
on
of
the
the
American
banks,
sending
it
to
the
different
buckets
here.
So
not
many
resources
involved.
So
there's
only
one
deployment
of
these
this
process,
but
here,
if
I
look
at
our
deifies
plate
here,
that's
what's
happening
and
in
this
box.
B: That's why the serverless function has automatically been scaled to two deployments, because that's what it needs to be able to handle the traffic coming in. Same for the ACH file process: it looks at the files and processes them, adding up the amount of money that all those transactions represent, and it also needs two of those pods to do the processing. And here is what's happening now: we can see that we have reached the maximum number of files that we wanted to generate, 60, so our create-transaction pods have scaled down to zero.
B: Of course, that's what we wanted to do. And here we have also reached 60 for the first step of processing, so the ACH file split pods should come down to zero in a few seconds; we can see that we are already consuming a little bit less memory. So this is a neat way to demonstrate that, using only bucket notifications and serverless functions, you can fully automate your data pipelines.
B
It
doesn't
require,
you
know
some
kind
of
application
that
will
orchestrate
everything
and
will
take
care
of
everything
here.
It's
only
a
few,
a
few
files,
a
few
configuration
files
that
you
put
into
motion
that
allows
you
to
to
create
very
simply
this
kind
of
pipelines.
So
speaking
of
files
and
I
will
go
back
here.
B
Speaking
of
files,
you
will
have
all
the
code
and
all
the
all
the
different
configuration
files
and
containers
images,
and
things
like
this
in
this
repo
I-
will
also
put
it
in
a
few
days,
a
full
full
world,
true
to
be
able
to
reproduce
this
kind
of
demo
and,
of
course,
feel
free
to
to
reach
out
for
some
more
information
or
if
you
have
questions
or
problems
implementing
this
kind
of
things
it
will
be.
It
will
be
a
pleasure
to
to
reply
to
this.
B: Everything is there: the container code to build the pods that create or process the transactions, the Kafka topic creation; there is everything you need to go from scratch, starting from a brand-new OpenShift installation and installing everything that you need.

A: Awesome.
A
So
look
forward
to
other
people
taking
this
for
a
test
run
and
drying,
and
Emily
and
I
really
appreciate
you
taking
the
time
today,
Jim
and
look
forward
to
having
you
back
for
update
new
updates
on
this
topic.
So
thanks
again
and
hey,
everybody
would
like
to
re-watch
this.
There
will
be,
it
will
be
uploaded
on
the
YouTube
channel
later
today
and
I'll
steal
the
slides
from
Guillaume
shortly
and
also
link
them
up
there
as
well
and
put
a
blog
post
with
some
other
resources
up
on
blog
that
openshift
com.
A
So
look
for
that
in
the
next
coming
days
and
we
will
continue
to
provide
you
with
entertaining
and
educational
briefings
over
the
coming
weeks
to
take
place
of
some
of
the
conference's
that
have
been
cancelled.
So
look
for
that
on
the
events
page
at
open,
Commons,
openshift,
org
so
take
care
everybody,
and
thank
you
very
much.