Description
ML Microservices with gRPC and Python by Andrej Baranovskij, Founder/Senior Developer at Katana ML
Andrej Baranovskij, Founder/Senior Developer at Katana ML, will present "ML Microservices with gRPC and Python". ML logic flow can be complicated, and it is advisable to split the implementation into different services: for example, a microservice for data preparation, another for model training, and so on. When the logic flow is split into different services, we need to transfer large amounts of data between them. Andrej will show how this can be achieved with gRPC.
Okay, so thanks for joining, everyone. For me right now it's quite late, actually half past midnight (I'm based in Lithuania, Europe), but I'm not sleepy, because I'm a developer and I never sleep, I just go to standby sometimes.
So it's fine for me, and thanks for the invite. I'm happy to talk about our experience with gRPC and how we use it. A very quick introduction to what we do: we are a machine learning startup, and we implement various solutions that help automate the enterprise.
One of the things we are working on right now, for example, is automatic data extraction from documents and automatically posting invoices, things like that. My background is software development, so I jumped into machine learning from software development, and what I saw from the start is that when you begin with machine learning, you usually start with notebooks and CSV files: you load a CSV file into the notebook, process the data, train a model, and then run inference.
If you have this spaghetti code where everything is in a single place, it becomes very hard to manage, especially because in machine learning you have to put quite a lot of logic into data processing, to clean the data, process it in a certain way, and so on. And if you do processing, model training, and inference in a single place,
it works up to some point, but when complexity grows it becomes hard to manage. By the way, if you have any questions in the meantime, just stop me and ask, and I'll be happy to follow up. So what we saw is that it doesn't work to run everything in one monolithic application with machine learning.
Yet that is what you would typically see when you start with ML: everything runs in a single place. So we decided that for our product we needed to split, to have separate services, each responsible for its own job.
Data fetching, data transformation, data cleaning, and data processing might be done by service A, and once the data is ready, it can be transferred to another service which does the model training based on that data. When the model is ready, there will be another service which serves an API and allows you to execute inference requests to do the processing. And then the question comes: okay, we can split, but there is one thing that is specific to ML.
You usually operate with large sets of data. When you train a model, you need a training set, a validation set, and a test set, and there is usually quite a lot of data, so you somehow need to transfer this data from one service to another. The first option that comes to mind would be to use JSON and REST calls over HTTP.
It works, but it's not convenient: the datasets are quite large and may have lots of attributes as well, and if you use JSON to send data between services, it adds extra complexity, because you receive JSON text and then have to parse all the data. The problem is especially acute with numerical data, where you may lose some precision, and so on.
So there are a lot of small details that are kind of hidden, but as you start to work with this, they come to the surface and things become quite complicated.
Then we started to look for alternatives, for other options we could use to implement communication between services, and we found that gRPC is quite suitable for our case; I'll show why later in the demo. With gRPC you can quite easily transfer data from one service to another, and you don't need to play with data parsing, because you are able to define a method with a return type, and this type can encapsulate multiple datasets in a single call.
The demo is based on an article that I wrote back in December on Towards Data Science, and the idea is to keep it simple. It uses a standard dataset that you will find in many ML examples, the Boston Housing dataset. It comes with a set of attributes that describe the price of a certain piece of real estate, a house, for example. To make the model slightly more interesting,
we are training it to forecast not only the single attribute, price, but also an additional attribute, the pupil-teacher ratio. So, based on a set of attributes, the model is able to predict a price for a house, using the neighborhood, maybe when the house was constructed, and so on, and additionally it is able to predict another attribute, called PTRATIO, just to make it more fun.
If you are interested in reading more about this sample model after the webinar, go to that article; all the source code is provided on GitHub and is referenced at the bottom of the article.
Okay, so let's jump to the source code. The demo application is quite simple, because I made an effort not to overcomplicate it, so that you can understand what we are doing here. This demo application is not our main product; I implemented a separate application, obviously, which specifically shows the main solution we are using, and it highlights the advantage of gRPC for the ML domain.
Okay, so first, there are two applications: the first one implements the data service, and the second one implements the training service.
The idea of the data service is that it should read and load the data, clean it up, remove the attributes that you are using as target attributes, then split the data into training and test sets, and do data normalization as well. In ML, when you have a dataset with certain numbers, it doesn't work to just send the raw array for training: if the scale differs between attributes, say one attribute ranges from one to ten and another from one to ten thousand, the model will not train very effectively.
So you need to do data normalization: you need to translate all the attributes onto the same scale, for example from minus one to one. This is a common task in ML, and this job is done in the data service as well.
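As a rough illustration, a z-score style normalization might look like the following sketch. This is my illustration, not the exact helper from the repo, and the exact scheme in the article may differ:

```python
import numpy as np

def normalize(train: np.ndarray, test: np.ndarray):
    # Compute scaling statistics on the training set only, then apply the
    # same transform to the test set, so no test information leaks into
    # the preparation step.
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (train - mean) / std, (test - mean) / std
```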
Once all the data is prepared, it is sent to the caller, and the caller in our case is the training service: another application, which uses gRPC to make a call to the data service.
It gets the data and, based on that data, runs the training. Essentially this is a very common ML flow, but here is the main difference: in most ML tutorials, or in any overview material for ML, you will see these two steps done in the same application.
So the idea here is to split that up and have two different applications do the job. Okay, if you look into the data service application, we have defined a data service proto file, and in this proto file we define messages for the request and the response. The request is quite simple, just one attribute, the test size.
With this parameter we have the option to split the original dataset into a training set and a test set, based on the percentage we specify in this test size parameter. For the response, we return multiple attributes: the normalized data for the train and test sets, the validation set, normalized as well, and additionally the target data for train, test, and validation.
The target attributes are used by the ML model when training runs: it tries to match the input attributes from the train, test, and validation sets against the corresponding target attributes, builds patterns, or rules, and later follows those patterns to work with unseen data.
Okay, and to keep it simple, a single service, BostonHousing, is defined, with one method, PrepareData, which accepts a parameter to set the test size and returns the response with all the data that was prepared. Then the next step, once we have this proto file, is to generate the gRPC client and server from it; a rough sketch of what such a proto file looks like follows.
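The definition in the repo will differ in its details, but based on the description above it is along these lines (the field names here are my illustration, not copied from the repo):

```proto
syntax = "proto3";

// Request: what fraction of the data goes into the test set.
message BostonHousingRequest {
  float test_size = 1;
}

// Response: normalized train/test/validation sets plus their target
// attributes, each field carrying a serialized NumPy array as raw bytes.
message BostonHousingResponse {
  bytes train_data = 1;
  bytes test_data = 2;
  bytes validation_data = 3;
  bytes train_targets = 4;
  bytes test_targets = 5;
  bytes validation_targets = 6;
}

service BostonHousing {
  rpc PrepareData(BostonHousingRequest) returns (BostonHousingResponse) {}
}
```

With grpcio-tools installed, the client and server stubs are typically generated with something like `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. data_service.proto`; the README in the repo has the exact command used for this demo.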
For those of you who are interested in how we do it, I included a README file over here. And by the way, the source code for the demo I'm showing is also available on GitHub; this example is available to anyone, so you can try it out and play with it after the webinar as well. Now, what I like about gRPC: in the past I worked a lot with SOAP web services, and with SOAP, when you want to generate clients for the web service, you have the option to generate them,
but the generated code was always very cumbersome, with many generated classes, and it was hard to manage all of that. What I love about gRPC is that when we generate code on top of this proto file, there are just two files generated, and they are quite simple. This is what I like: you don't overcomplicate your application, and it stays easy to manage.
This is the standard code that you would see in the gRPC documentation, nothing special here: we call the start method on the server so it starts listening for incoming connections. And this class implements BostonHousingServicer, the one that was defined in the proto file; we import the script that was auto-generated from the proto file, and in it we implement the PrepareData method, the method that was defined in the proto file.
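For orientation, the skeleton of such a server looks roughly like this; the module and class names assume stubs generated from my proto sketch above:

```python
# Skeleton of the data service server; data_service_pb2 / _pb2_grpc are
# the two files generated by grpc_tools.protoc from the proto definition.
from concurrent import futures

import grpc
import data_service_pb2
import data_service_pb2_grpc


class BostonHousingServicer(data_service_pb2_grpc.BostonHousingServicer):
    def PrepareData(self, request, context):
        # Real logic: load, clean, split and normalize the data, then pack
        # each NumPy array into a bytes field (that step is shown later).
        return data_service_pb2.BostonHousingResponse()


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    data_service_pb2_grpc.add_BostonHousingServicer_to_server(
        BostonHousingServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()                 # start listening for incoming connections
    server.wait_for_termination()  # block the main thread


if __name__ == "__main__":
    serve()
```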
Right, and this is the main logic, where we handle everything. The first thing we do is call the prepare datasets method from the data helper, and if you look into the data helper, it is standard code that you would see in most ML examples.
Then we normalize the data for the train, test, and validation sets, and finally we return all those sets back to the caller. So the data is prepared, and in this case, as I mentioned already, standard ML code is used, the same code you would typically see in any ML implementation. Then we get the datasets back and print out information about them, just for debugging purposes.
At the end, we construct a response of the same type that was defined in the proto file, BostonHousingResponse.
Then, for each element of that response, we assign the value, and here is the tricky part: when we prepare the datasets we operate with NumPy arrays, not just simple Python arrays, and by default
gRPC doesn't support the NumPy type; you cannot simply send such an array through gRPC out of the box. The standard way to solve this is to create a byte array: the NumPy library allows you to save a NumPy array into a byte buffer using the standard save method, which is what we do here. We create a BytesIO object and then, using numpy.save, we basically copy the original NumPy array into that byte buffer.
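A minimal sketch of that packing step (the field names follow my proto sketch above, and the placeholder arrays just stand in for the prepared sets):

```python
from io import BytesIO

import numpy as np
import data_service_pb2  # generated from the proto sketch above


def pack(array) -> bytes:
    # np.save writes the array, including its dtype and shape, into the
    # buffer; getvalue() hands gRPC a plain bytes payload for the field.
    buffer = BytesIO()
    np.save(buffer, np.asarray(array), allow_pickle=False)
    return buffer.getvalue()


# Placeholder data standing in for the prepared Boston Housing sets.
normed_train_data = np.random.rand(404, 13)
train_targets = (np.random.rand(404), np.random.rand(404))  # price, PTRATIO

response = data_service_pb2.BostonHousingResponse(
    train_data=pack(normed_train_data),
    train_targets=pack(train_targets),
    # the test and validation fields are filled the same way
)
```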
Right, and the same is done for all the datasets, and then this data, converted from NumPy arrays to byte arrays, is sent in the response.
Now to the training service. Over here we have the same gRPC files that were generated in the data service; the same files are copied over. Then, just for demonstration purposes, a training service test script is created which initiates the training service, and then we call the run training method. That is the main method here, and what we do in it is, first of all, fetch the data: we call the fetch data method,
and this method gets the data from the data controller over here. In the fetch data method we use the gRPC API: we pass the input parameter, the test size, which will be used to split the data into training and test sets, and we make a call.
We call the PrepareData method and get back the response, and the response returns byte arrays. We then need to load them back, because when we run training we want to keep operating with NumPy arrays, so we load the byte arrays back into NumPy, again using the standard NumPy method, np.load. No hacking, nothing special, just the standard API.
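Sketched out, the fetch step might look like this; the channel address and field names are assumptions consistent with the earlier sketches:

```python
from io import BytesIO

import grpc
import numpy as np
import data_service_pb2
import data_service_pb2_grpc


def unpack(payload: bytes) -> np.ndarray:
    # Mirror of np.save on the server side: np.load restores dtype and shape.
    return np.load(BytesIO(payload), allow_pickle=False)


def fetch_data(test_size: float = 0.2):
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = data_service_pb2_grpc.BostonHousingStub(channel)
        response = stub.PrepareData(
            data_service_pb2.BostonHousingRequest(test_size=test_size))
    # (the validation fields are unpacked the same way)
    return (unpack(response.train_data),
            unpack(response.train_targets),
            unpack(response.test_data),
            unpack(response.test_targets))
```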
The only tricky thing here concerns the target attributes array and how it was converted to a byte array. Since our model trains for multiple attributes,
two attributes to be precise, the targets object was a tuple containing two arrays. When this object was converted to a byte array, it was written as one outer array with two arrays inside, and that is how it was sent. So when we converted it back to a NumPy array, it did not come back as exactly the same object it was originally: it came back as a NumPy array with two arrays inside. We therefore do an extra step to convert it back to exactly the same
tuple object it was originally, because otherwise TensorFlow's fit method, the method which trains the model, would not recognize this target as one that is suitable for training.
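Under the same assumptions as above, that restore step is small; a sketch:

```python
import numpy as np

# After np.load, the two target series come back as one (2, n) array ...
loaded = np.array([np.arange(5.0), np.arange(5.0) * 2.0])

# ... so we split it back into the original tuple of two 1-D arrays, the
# layout the two-output training code expects (a list works equally well).
train_targets = tuple(loaded)  # (price_array, ptratio_array)
```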
So we do this little trick to return the data to its original shape, and once that is done, we return all the sets back to the main method from where it was called. The rest is kind of straightforward: we build the model, compile it, and then run a standard TensorFlow fit API call, which trains the model, and we pass it the data as NumPy arrays, the ones we brought back from the byte arrays.
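A compact sketch of that final step; the real architecture from the article will differ, this just shows a two-output model accepting the restored targets:

```python
import numpy as np
import tensorflow as tf

# Placeholder arrays in the layout the gRPC fetch returns.
x_train = np.random.rand(404, 13).astype("float32")
y_price = np.random.rand(404).astype("float32")
y_ptratio = np.random.rand(404).astype("float32")

# Two-output model: one regression head per target (price and PTRATIO).
inputs = tf.keras.Input(shape=(13,))
hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
outputs = [tf.keras.layers.Dense(1, name="price")(hidden),
           tf.keras.layers.Dense(1, name="ptratio")(hidden)]
model = tf.keras.Model(inputs=inputs, outputs=outputs)

model.compile(optimizer="adam", loss="mse")
# fit() neither knows nor cares that the arrays arrived over gRPC.
model.fit(x_train, (y_price, y_ptratio), epochs=100, verbose=0)
```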
Then it works the same as with the original datasets: TensorFlow doesn't know that the data actually traveled over gRPC from another service; it is transparent in this case. Okay, and just to show you that it actually works,
I can run the training service test. It makes a call to the other application, the data is prepared over there and sent back, and then the training loop runs; the training executes its 100 epochs and reports back information about the quality of the training, which is a standard thing as well. So we see that the communication works: we are able to make a call from the training service to the data service.
In a real application the logic for data processing would probably be way more complex, and you might have various checks and other things running as part of data processing. Having this logic runnable in separate services is good for maintenance, but not only that: it's also good at runtime, because when you have data processing running in one service and training running in a different one, you have more options for scalability.
You could run the data processing service on one machine, or in one container with certain resources, and run the training on a GPU, for example, to improve training performance, and so on.
It is also kind of natural, because compared to those ERP applications where you have a database in the background, in machine learning you typically don't have a database, so it's easier to split the logic into different services: you are not constrained by a database, which makes the split more natural.
So, that was my demo. The main point was to explain how we applied gRPC in the ML domain, to explain the specifics of ML, and why I think gRPC is useful for the ML use case.