From YouTube: OpenShift Commons Briefing: Deploying KPMG Ignite on OpenShift - Kevin Martelli, Hongfei Cao (KPMG)
Description
OpenShift Commons Briefing
Deploying KPMG Ignite on OpenShift - Overview and Deep Dive Demo
Kevin Martelli and Hongfei Cao (KPMG)
2020-06-03
A
All right, everybody, welcome back to another OpenShift Commons briefing. We'll be live-streaming on multiple channels, Twitter, Facebook, and YouTube, and we'll collect your questions from there. Today we have a new, hopefully, Commons member, KPMG, joining us to give us a talk on their Ignite platform, and, well, we'll find out what it's all for. It's machine learning, data science, all kinds of good stuff. We have Kevin Martelli and Hongfei Cao.
B
Thank you so much, and thanks everyone for taking the time to go through this presentation. Quickly, before we dive into the details, I want to introduce myself: my name is Kevin Martelli, I'm a principal software engineer at KPMG, overseeing our big data software engineering team, with a lot of work related to cloud, containerization, and end-to-end deployments with machine learning. With me I also have a colleague, Hongfei Cao. Hongfei, maybe a quick introduction on yourself?

C
Yes.
B
Great, thank you. So, as was mentioned, we're going to give you a little bit of background on the motivation we had for building what we call KPMG Ignite, which is our data science platform and ecosystem for bringing use cases from POCs into production. We'll drill a little bit into what Ignite is, so the audience can get a feel for what we're dealing with.
B
The demonstration will be representative of how the business can interact with the platform, how data scientists can interact with it, as well as how engineers can interact with it, and how we can productionize pipelines on top of that, which we ship within containers. So, looking forward to this, and again, any questions, please feel free to type them in as we go.
B
So the first question many of you may ask is: why did we decide to build and invest so heavily in this platform that we use for our internal data science as well as client initiatives? As many of you could probably guess, one of the main drivers was just the explosion, if you will, of AI in the marketplace, with many clients looking at it and at leveraging it.
B
How can they use it? We saw that there was a need to bring some of these technologies together, and not only from the standpoint of production-ready use cases. We see a lot of people who are able to build cool POCs, but they're not always able to get them into a production-ready format. So one of the goals within our Ignite platform is: how can we bring these capabilities from POC to realized production?
B
In addition, making sure we have the right hooks: how does the business interact with the platform, and how do our scientists and engineers? Again, we'll drill a little bit into that detail. But as we looked at that, there are really about five or so areas where we see that enterprises need to be aggressive in order to support AI and to bring AI more seamlessly into the organization.
B
Some of these areas are covered by Ignite, and some of these areas have to be augmented with separate business processes. One of the important things here, number one, is around data literacy and building out your data expertise. We used to think about this more along the lines of something that an engineer, a scientist, or a business analyst would do, but this is now more holistic across the organization: how do business folks understand their data? Can they get the right training data?
B
The next is around technology, and I think we can all agree that there's been a technology explosion in this space. Every day you wake up, there are new technologies available to produce similar types of outputs. What technology should you use? What technology shouldn't you use? Where do you want to invest your R&D?
B
One of the bets we took for Ignite is that we realized this market was going to expand so quickly that we needed a very open ecosystem: microservices, containerization, easy to plug in and plug out as new AI tools and capabilities came to the marketplace. That's something we'll show you within this demonstration. Then there are business processes: how do business processes now change? Are people relying on the data from data science? Are people embedding this into their day-to-day work activities? How are enterprises adopting this? And then the workforce.
B
It's important that we enhance some of our legacy skill sets, that we hire new folks, and that our workforce supports this. How do we move from the legacy, monolithic approach into this more agile, quick development and productionization of machine learning pipelines? And then the final part I'll just briefly touch on is around risk and reputation. I think we all know there can be a lot of risk associated with some of this.
B
There's the importance of understanding your model and understanding the details that are going into your model. How are you managing this? We see a lot of organizations coming up with ways to have explainability and to make sure their models are free of bias, and how do you fix these things? So there is risk, and reputational risk, that organizations are facing in this era, and again, these are some of the capabilities we hope we can help support through the deployment of our KPMG Ignite platform.
B
With that, I'll jump down to: what is KPMG Ignite? We touched a little bit on these topics earlier, but essentially we have the who, the what, and the why. For the who: who was this platform built for? Who did we have in mind that could use it? It was a platform predominantly built for our data scientists and data engineers, so they could build the pipelines. However, without the business, and the ability for others, analysts, etc., to put input into the system...
B
...it just wasn't necessarily as useful, I would say. So there are business hooks: whether that comes from annotating and creating training data, or from validating model results, there are different areas where the business and business analysts can come in to work within the platform. Then the what part: what is it? As we talked about a little before, this is a global AI platform. We have a very modular, microservice type of delivery, so each module (and we'll dive into that) could be, say, an OCR job.
B
A module can be a model; a module can be a data extraction. We'll jump into what these modules are, but they're built in such a way that they're interchangeable. So if a new capability comes out in the marketplace and we want to take advantage of it, you can plug it into the platform seamlessly, and if its capabilities are deprecated, it can be removed from the platform.
B
Finally, the why. What we noticed, as we mentioned before, is that there's high demand for these types of capabilities. But more importantly, as organizations took their journey into the AI space, there was a lot of work around unstructured and semi-structured data sets: loan documents, contract documents, PDF documents, voice documents, etc. How could organizations get the right information out of them and make the correct business decision?
B
This is, I would say, the crux of what we built our platform on. We're really building it on the foundation of very small capabilities and services that can be interchanged with other services, and that you can string together into a pipeline to produce some type of output. For example, maybe a pipeline is: you need to OCR a PDF document; you then need to break down that PDF document so you can start making business decisions.
B
So maybe you add some spaCy into it to enhance it, you might do some sectioning of the document, and then ultimately you might make some type of business decision on it. All these different components are modular and can be executed by themselves, or you can call different components that may reside in different cloud providers, whether that's GCP, AWS, or Azure.
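The modular, interchangeable pipeline idea described here can be sketched in a few lines of Python; the step names below are hypothetical stand-ins, not actual Ignite components:

```python
# Minimal sketch of the modular-pipeline idea: each step is a small,
# interchangeable callable, and a pipeline is just an ordered list of steps.

def ocr_step(doc):
    # In a real pipeline this would call an OCR engine on a scanned PDF;
    # here a placeholder transformation stands in for text extraction.
    doc["text"] = doc.get("raw", "").upper()
    return doc

def section_step(doc):
    # Break the extracted text into sections for finer-grained decisions.
    doc["sections"] = doc["text"].split(". ")
    return doc

def decision_step(doc):
    # Final business decision derived from the earlier steps' output.
    doc["accepted"] = len(doc["sections"]) > 1
    return doc

def run_pipeline(doc, steps):
    for step in steps:  # steps are interchangeable and reorderable
        doc = step(doc)
    return doc

result = run_pipeline({"raw": "clause one. clause two"},
                      [ocr_step, section_step, decision_step])
```

Because each step only takes and returns a document, swapping one OCR engine for another means replacing a single callable, which is the "plug in, plug out" property described above.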
B
Another area was that it could be deployed rapidly as containers, and it supports RESTful services into the platform, based on the demand you need to push through. And finally, it's around reusability. All these components are reusable: if one person creates a component in the community, and that component has the capability of doing something, that component gets checked in, can be reviewed, and then rebuilt into an image and ultimately brought back into the platform.
B
Within the data science notebooks, you have the ability to build and own your own pipelines and workflows, to test them, to build them, and then to deploy them. The data science notebooks themselves are served through JupyterHub, and each user gets their own individual notebook. Then we talked a little bit about how the business and its users can get in there, and we have two main mechanisms. We have the annotation UI.
B
Understanding
the
statistics
associated
to
your
model
many
times,
feeding
into
your
model
governance
process,
as
well
as
serving
up
models
and
I
guess
one.
This
thing
I
want
to
show
on
this
slide
is:
if
we
sort
of
understand
the
platform
I
want
to
walk
through
from
the
bottom,
for
the
pools
are
persistent
volumes
up
through
kind
of
the
the
application
layer.
Just
to
give
you
a
feel
for
how
the
platform
itself
works.
B
So at the bottom we have persistent volumes for our OpenShift cluster, attached into the cluster. If you move one layer up, there's our infrastructure. We use distributed MinIO to facilitate object storage, which gives us better latency, faster read and write times. We also have a Postgres database that stores a lot of the metadata associated with the processing and the workflows. We have our logging and reporting within Kibana and Elasticsearch. We use Kafka as our message broker.
B
Kafka is really set up in a way that allows you to execute one component, and when that component finishes, its output goes onto the queue for the next component to pick up. And finally, everything is executed across the container orchestration platform, with orchestration through OpenShift. All right.
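The component-to-component hand-off Kafka provides here can be illustrated with an in-memory sketch; the topic names and message shapes are hypothetical, and a real deployment would use a Kafka client against the broker rather than local queues:

```python
# In-memory sketch of the Kafka-style hand-off: each component consumes a
# message, does its work, and publishes the result for the next component.
from queue import Queue

topics = {"ocr-done": Queue(), "ner-done": Queue()}  # stand-ins for Kafka topics

def ocr_component(doc_id):
    # ... run OCR on the document ...
    topics["ocr-done"].put({"doc": doc_id, "text": "extracted text"})

def ner_component():
    msg = topics["ocr-done"].get()   # next component picks the message up
    msg["entities"] = ["KPMG"]       # ... run entity extraction ...
    topics["ner-done"].put(msg)

ocr_component("contract-001")
ner_component()
final = topics["ner-done"].get()
```

The queue decouples the two components: the OCR stage neither knows nor cares which component consumes its output, which is what makes the modules interchangeable.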
C
So in this architecture, as Kevin covered, we have the persistent volumes and the infrastructure. On top of the infrastructure we have our microservices: each infrastructure component is deployed into a secured OpenShift environment and exposed as an API for users to access. And on top of that we have our machine learning pipeline components; you can see there are some pre-built modules.
C
So here is another view. I want to show you the high-level architecture of Ignite and how we deploy it on OpenShift; after that, I will show you the actual deployment on our own OpenShift cluster. As we mentioned, the whole deployment is container-based: everything is Dockerized and can be shipped to any cloud infrastructure, whether that's IBM, Azure, on-prem, or a private cloud, and we deploy the whole platform using a CI/CD pipeline.
C
That means tooling like Jenkins or Tekton, etc. Once we have the infrastructure components deployed, we can use a Jupyter notebook as our data science platform to customize and build any ad-hoc machine learning pipeline workflow using the predefined components, or images. Here we show one workflow we built using Ignite: it starts from raw scanned PDFs, and we can visualize it.
C
At the end, we have a prediction and extraction component. The whole model is a machine learning model, and it is also deployed and version-controlled using MLflow. For those not familiar with MLflow: it is an open-source project which offers version control and centralized storage for machine learning models. It supports Python scikit-learn models, as well as Spark MLlib models, in pickle format, and TensorFlow, PyTorch models, etc.
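MLflow handles this at production scale, but the core service it provides to a platform like this, a serialized model binary stored under a name and version so any component can load it back, can be sketched with just the standard library. The model class and registry below are toy stand-ins, not MLflow's API:

```python
# Sketch of model version control: serialize a trained model and store it
# under a (name, version) key, exactly as a central model store would.
import pickle

class ThresholdModel:
    """Toy stand-in for a trained model (e.g. a scikit-learn estimator)."""
    def __init__(self, threshold):
        self.threshold = threshold
    def predict(self, x):
        return x >= self.threshold

registry = {}  # in-memory stand-in for centralized model storage

def log_model(name, model, version):
    # "Log" the model: persist the serialized binary plus version metadata.
    registry[(name, version)] = pickle.dumps(model)

def load_model(name, version):
    # "Load" a specific version back, as a serving component would.
    return pickle.loads(registry[(name, version)])

log_model("start-date-extractor", ThresholdModel(0.5), version=1)
restored = load_model("start-date-extractor", version=1)
```

Pinning consumers to an explicit version is what lets a retrained model be rolled out (or rolled back) without touching the components that call it.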
C
I'll show how we can use our annotation UI to prepare the training data, generating the labels for our supervised learning model, all the way through model prediction and classification. We have an interface and annotation tools for any user to correct the model output. And in the last demo, I will show you how to use a Jupyter notebook to customize and create an arbitrary machine learning workflow pipeline using predefined Ignite components, for example spaCy, OCR, and entity detection.
C
Without further ado, let me jump to the first demo. As you can see, we deploy the whole Ignite platform onto a secured OpenShift platform. Let me quickly jump to the components we deployed. As mentioned earlier, we are using the CI/CD pipeline to deploy the various infrastructure components for the Ignite platform, and we also set up services and routes for any API or web application we expose to the end user.
C
This deployment pattern also helps us customize and configure the service accounts and role bindings for the namespace, and make sure the secrets are placed properly. As you can see, we have several pods holding up the endpoints, with multiple pods running at this point. And for the data scientists and data engineers, we have Jupyter notebooks for them to use to customize or develop their machine learning pipelines.
C
Each data scientist or engineer can log into our JupyterHub environment, which creates their own pod, so they have a segregated environment for testing and development. They don't need to worry about access control, or about other users accidentally changing or removing their code.
C
Thank you, yes. So let's quickly jump to the Ignite IQ demo. Here is the tool for the data scientists and data engineers to manage machine learning models. On the backend, we are using MLflow to store and version-control the trained models.
C
It actually includes the whole serialized binary of your model, so this tool basically uses MLflow as a model store. You can use it to check and review all the existing models in MLflow, and you can also manage and create new models with this web application. Here, I've already logged in as admin; in this UI you can add a model workspace, and we already have three models created.
C
Let me jump into one model here; this model is called "start date". What happened is, we were processing a bunch of financial-services contract documents, and this is a model that will extract the start date of the contract from the raw PDF content. We don't rely on any predefined template; this is a purely trained NLP model.
C
Okay, so the first thing you want to do is create the model using this Ignite IQ admin tool, by giving the model name. At the start of this model's lifecycle it is always in a "setup" state, meaning you give the model a name, set the target accuracy you want to reach for the model, and then you start to prepare the training and testing data sets.
C
At that point we move to the annotation stage. Once the training and testing data sets are ready, we'll move on to the modeling stage, which is where we train the model and validate it against the test data set. At the end, we have a holdout data set for you to really validate and test your model on data that never ran through the labeling. And at this stage we have a user interface for any user.
C
They
can't
go
to
the
model,
result
and
amenity,
garage
or
oxides
and
multiple
result.
It's
a
las
that
is
complete.
So
when
we
save
the
model
in
using
these
admin
tools,
what
happened
is,
in
the
back
hand,
the
ml
flow
or
it
will
create
entering
a
project
in
the
end
up
flow
for
this
model.
It
quickly
show
you
here:
this
is
the
back
and
enough
flow
engine,
as
I
mentioned
earlier.
This
is
used
as
over
centralizing
mode
of
storage
model
database.
B
One thing to note here: a lot of the datasets that are part of this MLflow store are ultimately feeding back into the governance processes that organizations may have. So a lot of these statistics and data sets that are coming out of your confusion matrix, etc., can then feed back into the overall governance process of your model management.

C
Yep.
C
So here, we use our MLflow model store to track model performance, as well as to save and manage the actual model binary in a serialized format; for example, a scikit-learn model will be saved in pickle format. Let me jump back to Ignite IQ. At this point we have already created a model using the Ignite IQ admin tool.
C
As you can see, we have three models already loaded; again, the backend is MLflow, and the model for this demo is the start-date model. The first thing is, we want to have a set of labeled data for our model training. As Kevin mentioned, this is often the part where the data scientist and the SME prepare the training data.
C
Since this is a supervised learning model, and for the start-date model in particular, we need a bunch of contract data as raw PDFs, and the SMEs on the business side, who understand the content, use this Ignite IQ tool to manually label our target result in each document. As you can see here, they have the label for the start date, and the actual text is shown here. The reason we can show that extracted text is that all the documents are raw scans.
C
The PDFs have already been OCRed, so we have OCR text with region detection. If I draw another bounding box in a different area, you will see that it returns the actual text data from OCR. In this way the SME sees what the content is and can quickly label the documents, just by drawing a bounding box around the information they need, and move on to the next one.
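The bounding-box lookup behind that annotation flow can be sketched as a simple containment filter over OCR output: given OCR words with coordinates, return the text inside the user-drawn box. The word and coordinate layout here is illustrative, not the actual Ignite schema:

```python
# Sketch of the annotation UI's box-to-text lookup over OCR results.
def words_in_box(ocr_words, box):
    """box and word coords are (x0, y0, x1, y1) rectangles."""
    bx0, by0, bx1, by1 = box
    hits = []
    for word in ocr_words:
        x0, y0, x1, y1 = word["bbox"]
        # Keep the word if its rectangle lies fully inside the drawn box.
        if x0 >= bx0 and y0 >= by0 and x1 <= bx1 and y1 <= by1:
            hits.append(word["text"])
    return " ".join(hits)

ocr_words = [
    {"text": "Start",      "bbox": (10, 10, 40, 20)},
    {"text": "date:",      "bbox": (45, 10, 75, 20)},
    {"text": "2020-06-03", "bbox": (80, 10, 150, 20)},
    {"text": "Other",      "bbox": (10, 40, 40, 50)},  # outside the drawn box
]
label = words_in_box(ocr_words, (5, 5, 160, 30))
```

Because the OCR pass already happened, the UI only has to intersect rectangles, which is why the labeler gets the text back instantly when they draw a box.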
B
Just to add a little business context here: what we traditionally see is that business users might want to get certain information out of an unstructured data set. For instance, if it's a contract, they might want to pull out the contracting terms, or the effective date or the completion date of the contract, and be able to make business decisions on that. Are they getting the right services for the contract?
B
Those questions have answers that are in the documents, and the annotation process allows us to annotate those answers, to specify where they are inside the particular document. That then allows the data scientist to figure out whether they want to use some type of machine learning model to extract the information, or some other technique, to consistently find the right information, so decisions can be made on the business side. That's one of the areas of it.
C
Okay, so once you finish the manual labeling for the training data set, we can execute the model training. On the backend, MLflow tracks the multiple experiments and multiple runs, and it also tracks the accuracy and its distribution for each run, from this history chart down to all the metadata.
C
Moving on to the testing: similar to the training data set, we also want to prepare labeled data for the test set. Again, the data scientist or SME can log into each document and draw the bounding box for the actual label. Once that is complete, we can generate the test data set accuracy for the trained model. Once the model is fully tested, we can move on to the last step, which is the holdout data set.
C
Once the SME prepares a holdout data set, it can be used to test the model: the model processes each document and generates the predicted result with a confidence score. At this point we sample the results to see whether the model output looks correct to us. For the sampled results there is a manual determination step, to go through each model result and either accept the result or correct it. For example, for this document, the result looks good to me, so I can accept the result.
C
Okay, moving on to the next one: here the detection is off, so we can reject it and manually draw the right start date. Let me pick this one up; the start date is here, so let's take this result. What happened is, we just corrected this document's result, which was generated by the pre-trained model, with the correct information, and for the next training, the next model update, we can use this corrected information as a label to retrain our model.
B
So when the data scientist gets the output of the corrections that the business is making, and sees the errors they had in their prediction, they get to feed that information back into the loop to then retrain their models, or update their rules, or whatever they're doing to try to extract that information. So again, one of the things we had under data literacy is: how do you have an understanding of data through its lifecycle? That's one of the concepts here.
C
At this stage we are pretty much done with the whole machine learning lifecycle, and we can go back to our admin tools to update our model status to complete, or keep it as-is if the model still needs more holdout testing via the manual verification step. And this whole tool is deployed on top of the Ignite platform.
B
Can you bring up the PowerPoint one quick time before you go back into OpenShift? I just want to show where you were in that lifecycle, right there. If you think about what we were doing: there was the OCR activity, which was done already, and that OCR then feeds into the business input functions, where you can start doing the annotations and start marking up your documents.
B
That then goes back to the data scientist, and once they build and train the model, there might be a smart sectioning model, for instance: they might want to break the document down into sections so they can make predictions more granularly on the data sets being highlighted. They could add some spaCy there to enhance the data, to better find the information. But these steps come along the path after you OCR, after the business comes in and does the annotations and the markups.
B
The next part, then, is to start breaking that problem down, to be able to get out the information that you ultimately want to use to make your business decision. So again, we've run one part of Ignite here to OCR it, so you can see the document on the screen; like you did, the business will do the annotation, and you start creating different models, whether it's a model for smart sectioning or you want to reuse this.
B
You enhance your intelligent domain engine, you add spaCy to get better enrichment across it, and then finally, what Hongfei will now show is that these components execute in a workflow that can scale up or down. We know OCR is a heavier process: if you have, say, a thousand documents to OCR, you could have a thousand components running. As each of those components completes, its output goes back into Kafka, to the Kafka node, and then the next component in the workflow picks it up.
C
We also have a number of persistent, deployment-type components shown here, including the ontology component, a load balancer, a Java HBase API for reads and writes from HBase, the annotation tools, the component builder, etc. So after using the CI/CD pipeline to deploy the complete Ignite platform on OpenShift, we can let data scientists or engineers use this platform to launch or create an arbitrary Ignite workflow for their machine learning pipelines.
B
Just one quick thing: you're going to be using a Jupyter notebook to show the ability to create a workflow, specifying each component you want to execute, and then executing it. As we've seen with some other clients, and as we do internally, you can also call our RESTful service, sending in your workflow, and that also executes the pipeline.
C
Yeah, so this is the Python SDK we created, called Ignite Connect. As Kevin mentioned, you can also define your workflow in a JSON format and submit your machine learning pipeline directly to the Ignite API. But this time I'm going to show you how to use our Python SDK to create a workflow and execute it from a notebook. The first thing is, you need to import several Python libraries for Ignite, and then we will define our workflow here.
C
First, we will ingest some scanned PDF images from local disk, and then we come to the workflow definition. We have several components we want to execute in this workflow. The first thing is, because the PDFs are scanned PDFs, we need to convert from PDF to images; we are using a PDF tool for that, and we call it the scanner component.
C
You need to specify the full package name, the component name, and the Docker image with the tag information. And, based on the number of documents to process, we can scale horizontally: you can set the number of instances you want to execute for this component in OpenShift, and also how many documents you want to run in one batch. It is very flexible and highly customizable for your own processing.
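A component definition of the kind described, package name, Docker image and tag, instance count, and batch size, might look roughly like the following; the field names are illustrative, not the actual Ignite Connect API:

```python
# Hypothetical component spec plus a rough sizing rule for horizontal scale.
scanner_component = {
    "package": "ignite.preprocess",                   # full package name
    "name": "pdf-scanner",                            # component name
    "image": "registry.local/pdf-scanner:1.2.0",      # docker image with tag
    "instances": 8,                                   # max pods to launch
    "batch_size": 25,                                 # documents per batch
}

def pods_needed(total_docs, batch_size, max_instances):
    # One pod per batch of documents, capped at the configured scale.
    batches = -(-total_docs // batch_size)  # ceiling division
    return min(batches, max_instances)

n = pods_needed(1000, scanner_component["batch_size"],
                scanner_component["instances"])
```

The cap matters: a thousand documents would want 40 batches, but the spec above limits the component to 8 concurrent pods, so the remaining batches queue behind them.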
C
Okay, the next component the workflow picks up is the Tesseract OCR component. We also need to define some parameters that we want to pass through to the Tesseract command, and, similar to the first component, you need to define the package name, component name, Docker image with tag information, instances, and batch size.
C
We have two more components: spaCy, which runs the NLP processing, and the intelligent entity engine, to extract the fields from the financial contract documents. Okay, once you define each component using the SDK, the next step is to create the workflow. You actually define the workflow pipeline by defining a directed acyclic graph, similar to Spark. At this point it is like a lazy evaluation: you just create the edges between the components.
B
The idea is that the infrastructure is the foundational piece the platform sits on, but none of these other containers are deployed onto the platform until the execution of the workflow. Then there's a determination of how many are needed: how many OCR jobs do I need to run, how many containers do I run? They'll run in parallel, complete their job, and then they'll all shut down. So it's only using the capacity when it needs it, and then it shuts down and produces the output. Yes.
C
Okay, so you'll see we just finished the first stage: the first component completed, its pods are terminating, and the workflow moves on to the second component, which is Tesseract, right here, and it's launching right now. Also, if you don't want to use the OpenShift portal to check the status, you can use our SDK, which has a workflow status call that tracks this for you.
B
Great. So again, just to recap on what we saw: we went over a little bit of why we decided to invest in the Ignite platform, and apologies, I did get kicked off a little bit there, my connection dropped partway through. But we saw why we built Ignite, and the kinds of capabilities Ignite has: an open ecosystem, built on containers and microservices, that can plug into other offerings, whether other cloud offerings...
B
Other
data
science
offerings
on
Prem
and
and
helps
to
manage
the
fall
and
lifecycle
deployment
and
production
on
station,
as
well
as
model
management
of
a
model
concepts
built
on
top
of
a
loom
so
luma
in
the
amount?
It's
easy
for
everything
to
communicate
with
that
and
then
at
the
end
of
it
it
produces
some
type
of
output.
That's
usually
then
fed
to
a
downstream
application.
B
So
there
could
be
like
an
exception
process
where
any
of
the
predictions
that
need
to
go
be
reviewed,
go
into
an
exception
queue
once
I
could
feed
through
feed
through
to
in
the
business
system
to
help
make
those
business
decisions.
But
that
was
I
think
everything
that
we
wanted
to
show
and
again
apologies
for
the
a
little
often
on
there
with
connectivity.