From YouTube: Development and Testing Autonomous Vehicles at Scale - Frank Kraemer (IBM) OpenShift Commons 2022
Description
Development and Testing Autonomous Vehicles at Scale - Frank Kraemer (IBM)
OpenShift Commons Gathering on Automotive
April 6th 2022
full agenda here: https://commons.openshift.org/gatherings/OpenShift_Commons_Gathering_on_Automotive.html
Yeah, development and testing of autonomous vehicles at scale, for the next 20 minutes. You can reach me at the email, or look me up on LinkedIn. I'm Frank Kraemer, IBM Systems architect, a long-time IBM guy, the last three and a half years working very closely with automotive companies, OEMs and tier ones, specifically in Germany but also around the world. The idea of this short presentation is to give you, let's say, a little bit of a view of the projects that we see.
Some of these are references where we work with customers and the customers agreed to share their name. Of course, development needs AV data, or data for AV development. I want to share my experience with data center design, where we work specifically together in this area with Equinix and also with NTT Data, and then some use cases, if we find the time for that. So, what have we learned, what are the challenges in AV data management, more or less? I think we will touch some of these points.
Data ingestion and preparation cycles are very time consuming. This is mainly because the IT metadata is different from the car engineering metadata, and I think the picture here shows it in a very clear way. This is the poor engineer who knows that there is data available to solve his problem, but he is not able to find it, because he has not the right technology, not the right software, not the right processes in order to do so. As a result, you see many silos of infrastructure, which is cost intensive.
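The gap described here, IT metadata (paths, sizes) versus engineering metadata (vehicle, sensors, scenario tags), can be bridged with a small searchable index, so the engineer queries by scenario instead of by file path. A minimal sketch; all field names and sample records are invented for illustration:

```python
# Minimal sketch of an engineering-metadata index over recorded drive data.
# All field names and records here are hypothetical illustrations.

RECORDINGS = [
    # IT metadata (path, size) joined with engineering metadata (vehicle, tags)
    {"path": "/lake/2022/03/drive_0147.mdf", "size_tb": 0.8,
     "vehicle": "dev-car-03", "sensors": ["lidar", "camera"],
     "tags": ["rain", "urban", "pedestrian"]},
    {"path": "/lake/2022/03/drive_0148.mdf", "size_tb": 1.1,
     "vehicle": "dev-car-07", "sensors": ["radar", "camera"],
     "tags": ["highway", "night"]},
]

def find(tags):
    """Return recordings whose engineering tags contain every requested tag."""
    return [r["path"] for r in RECORDINGS if set(tags) <= set(r["tags"])]

# The engineer searches by driving scenario, not by storage path.
print(find(["rain", "pedestrian"]))
```

In practice such an index would sit in a catalog service in front of the data lake; the point is only that both kinds of metadata live in one queryable place.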
So what we are looking for is to be reproducible, to be efficient, and to be resilient, and this over a long period of time, because these cars are on the road probably for the next 15 to 20 years, more or less, okay. So what do we need? Yeah, the old saying that data is the new oil. Well, that is a little bit too limited for me. I think all data is valuable, but you have to refine it before you can use it; otherwise it is just useless, more or less. And the refinement of the data, and this is use case number one, has to be done not in refineries like on the right side; the new refineries are colocation data centers. And I think this brings us to the idea of what we need, or what we see, in this game. Yeah, and I think, first of all, you need a colocation data center.
This location needs to be connected with a high-speed networking connection, and I will tell you in a second what I mean by high-speed networking connection, because these high-speed networking connections are required to reach the public cloud providers, and, as we already said, public cloud is a typical play here. So we see the major players in this game: AWS and Azure as the two big ones, Google Cloud, Oracle Cloud, IBM Cloud and several others, maybe also Tencent Cloud, but all of these are important. And what we are doing in the colocation data center space: first we put the data into the data space. We try to do some analytics on it. We have to do some CPU computing. We have to do some GPU computing. We have to use this data for testing, HIL testing, SIL testing, simulation and all the game, more or less. And whether this colocation data center is big or small depends a little on, let's say, the concept and the costing structure there. But I think putting everything in cloud is possible; if you have a golden credit card, that's the way to go.
If you want to make it a little bit smarter and faster, I think hybrid cloud is a very nice play, and, of course, in this game containerization, OpenShift, Kubernetes is the way to go, okay. What do I mean by high-speed networking connection? On the left side you see not a high-speed networking connection; this is a slow networking connection. This is typically what we see when we talk to car manufacturers. So that's reality. What do we need for AV development? In this game it means highways: data highways, fast lanes, fast networking, high-speed parallel networking, the latest technology, more or less. This makes it possible to cope with the data, and this is what we have to do, and this is where physics and reality come into play. These clouds are made of, or these are, existing data centers. There are servers, there is networking, there are GPUs, and physics applies, more or less. Everybody knows this, okay.
What we also see in this game is that this is a distributed play around the world, which means we typically have three locations around the world: in the US multiple sites, and we aggregate these multiple locations into a single colo or into twin colos, more or less. We also see this in Europe, and we also see this in Asia, and specifically if this is in China. China is a special thing, because everything has to stay in China, but it's a multi-place game and we have to interconnect that. And the interconnection, this is very important, not using old MPLS technology, which is very expensive, but using new software-defined WAN technology. This is a perfect match to the Red Hat product portfolio: SD-WAN and all these new technologies, where software-defined is the way to go, using overlay networking and a very smart way of orchestrating these overlay networks, more or less, okay.
What does it look like, then, if we go into such a colocation data center? This is now, let's say, some work we did together with Equinix. On the left side you see heavy-duty in-car data capturing, and we did this together with a partner here; we did this with a company called b-plus and with Siemens, and we can also do this with other companies, it's not a big deal. Seagate, NI, same play: cars driving around collecting data. We see about 50 to 80 terabytes per development car per eight-hour shift, so this is big data, more or less.
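To put 50 to 80 TB per car per eight-hour shift in perspective, a quick back-of-the-envelope calculation of the sustained network rate needed to move that data off the fleet (the fleet size here is an invented example, not a figure from the talk):

```python
# Back-of-the-envelope: sustained network rate needed to move one shift's data.
TB = 1e12  # bytes (decimal terabyte)

data_per_shift_tb = 80          # upper end quoted in the talk
shift_hours = 8
cars = 10                       # hypothetical fleet size, for illustration

bytes_total = cars * data_per_shift_tb * TB
seconds = shift_hours * 3600
gbit_per_s = bytes_total * 8 / seconds / 1e9
print(f"{gbit_per_s:.0f} Gbit/s sustained")  # ~222 Gbit/s for this 10-car example
```

Even a modest fleet saturates multiple 100 Gbit/s links if the data has to move within the same shift, which is why the talk keeps coming back to high-speed networking and uploading stations.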
We have to have some uploading stations in order to receive the data, and then, of course, we put it in a data lake. And it is very smart, from our perspective, to work on this data as soon as possible, in order to find the right pieces of data which are relevant for the later stage of the AI training. And then there is still something missing: I think the software gap that we see in this area has to be closed.
If we find the right data, then we can put it in a fast file system, and this fast file system is typically connected to AI training; I'll show you this in a second. Or we can upload the data to the cloud. We can also store the data on local, cheap tape drives, with the same costing structure as, or a cheaper costing structure than, AWS S3 Glacier. Or we can also use it in combination with, and this is something new, Equinix Metal. These are servers out of Equinix; they bought the company packet.com, and, of course, they run Linux and they run Kubernetes, and that is a very good fit in order to mix and match between your own servers, the infrastructure rented from or provided by Equinix, and also the interaction with cloud providers there. To make this a little bit clearer, here is a picture.
This is how it looks in reality from an Equinix perspective. Yeah, we do have multiple OEMs or multiple tier ones, or any mix, and they are collecting data in a very, very extensive way. We have to have multiple facilities interconnected around the world. We have to use firewalling between them, ready-to-go firewalling technology, and I think this picture shows it from a complexity side. This is really big, this is expensive, and you have to fine-tune this, and you have to make really sure that this is what the customer wants, and also the interaction, the multi-tenancy, bringing data together over multiple tier ones, things like that. So this is all important, more or less, okay. Use case number two: why are we collecting this data like crazy? Because we want to do AI training.
AI training is the holy grail of robotic cars in this game, and we all know that, and I think this picture also shows it very well. You can have the fastest car in the world, but if you are stuck in a traffic jam, well, it does not help, more or less, and this is really relevant for AI training. You can have hundreds of GPUs, but if you do not have the right data, well, you just wait.
What does it need? This is work we have done together from an IBM perspective, from our data side, file system side, NVMe, software-defined data, in combination with Red Hat, because containerization is also very, very important here, and the certification that we have to get from NVIDIA in order to feed these systems, like the DGX A100 and, in the future, the very new H100 system. We provided reference architectures, certifications, best practices, performance guides, etc. With the data lake that we already talked about, we can split it a little bit, mix and match, and fine-tune the data lake in order to have hot data very closely available, using very, very fast InfiniBand with low latency in order to feed these GPU systems, and keep the bulk of the data on a colder tier, which is a little bit less expensive, more or less. These are the optimization tricks that we can play here.
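The hot/cold split described here can be thought of as a simple placement policy: pin the small, actively used training sets to the fast NVMe tier and spill the rest to the colder, cheaper tier. A sketch under invented assumptions (dataset names, sizes and the recency-based heuristic are all illustrative):

```python
# Sketch of a tiering policy: keep the "hot" training sets on fast NVMe,
# spill the bulk to a colder, cheaper tier. All values are invented examples.

def place(datasets, hot_capacity_tb):
    """Greedily pin the most recently used datasets to the hot tier."""
    hot, cold, used = [], [], 0.0
    for ds in sorted(datasets, key=lambda d: d["last_used"], reverse=True):
        if used + ds["size_tb"] <= hot_capacity_tb:
            hot.append(ds["name"])
            used += ds["size_tb"]
        else:
            cold.append(ds["name"])
    return hot, cold

datasets = [
    {"name": "night-drives", "size_tb": 40, "last_used": 20220405},
    {"name": "rain-urban",   "size_tb": 25, "last_used": 20220406},
    {"name": "archive-2020", "size_tb": 300, "last_used": 20210101},
]
hot, cold = place(datasets, hot_capacity_tb=80)
print(hot, cold)  # recent sets go to NVMe, the old archive stays cold
```

Real systems (information lifecycle management in a parallel file system, for example) apply policies like this automatically; the sketch only shows the shape of the decision.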
Reference customers, as I said: it's Continental, based in Germany, at Equinix in Germany, a big installation, lots of GPUs, lots of DGX systems, big file systems. So we do this together with them. It's a very well-known reference customer; they extended the installation multiple times now, on the GPU side and also on the storage side. It's publicly available as a PDF, just have a look at it. I think it's a very good reference, and we can talk in detail if there's a need for that.
How does it look in such a colocation data center? On the upper right side you see the real, actual picture from the Equinix site; they posted this on the web, so it's freely available. These are very, very large data centers. Colocation means there are other customers in the same building or on the same campus, and specifically for Europe, all the major cloud providers are in colo data centers. They are just on the same campus, and this makes it very easy to have a fast networking line, because your own data center is more or less physically very close to the hyperscaler data centers. Now, they span over multiple colocation providers, but typically, from a networking side, it is very, very close. The cloud has to be close in order to use these high-speed connections, more or less.
They put it in a rack, and this hardware and software combination just acts the same as it would in a car. And then you have not only one, you have hundreds of these HIL rigs, HIL stations, and you feed them with the real data that you have recorded on the road. Not only the data that we use for AI training: you feed your complete data stack to those HIL systems. So this typically means hundreds of petabytes for each HIL run, and if you keep all your data in cloud, well, you have to pay the egress charges for that.
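A rough sense of why egress dominates at this scale. The per-gigabyte rate below is a hypothetical placeholder chosen only to show the order of magnitude, not a quoted price from any provider:

```python
# Illustrative only: what cloud egress charges can look like at HIL scale.
# The per-GB rate is an assumed placeholder, not a real quoted price.

petabytes_per_run = 100            # "hundreds of petabytes" order of magnitude
egress_usd_per_gb = 0.05           # hypothetical blended egress rate

gb = petabytes_per_run * 1e6       # 1 PB = 1e6 GB (decimal)
cost = gb * egress_usd_per_gb
print(f"${cost:,.0f} per HIL run")  # $5,000,000 at these assumptions
```

Whatever the exact rate, pulling the full data stack out of the cloud for every HIL campaign multiplies quickly, which is the argument for keeping a copy in the colo next to the rigs.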
So that makes it a very expensive operation, and this is what several of these customers have found out. So I think it's very smart to put some of the data close to your HIL testing, where it is available for less cost. And we can also combine this, and this is typically done: the more modern guys are using more software-in-the-loop testing, where the hardware is replaced by a software model, and typically the software model is using Kubernetes and OpenShift. But still, I think there is some need for HIL testing. We see a tremendous increase in software-in-the-loop testing, software only, but HIL testing is still very relevant, and this is one of the pictures we can create, more or less, okay. Now let's put it all on a chart.
This is, well, not the greatest chart in the world, but I think it shows what we have to do from an automotive perspective. Yeah, we have to collect data, we have to do data preparation and AI training, we have to do hardware-in-the-loop and software-in-the-loop, and we have to do simulation, which I will come to as use case number four. And I think this fits very, very nicely with the overall architecture of OpenShift, that we can do all these things.
Also something important, specifically in the HIL space, is Windows Server, because some of these HIL rigs are still using Windows operating systems that we cannot get rid of, at least not very fast. But this is still very, very good from an integration point: using Kubernetes also in the Windows environment should be no big deal either. This runs physical, virtual, in the private cloud and the public cloud, or any mix, on the edge or in a co-located system; it does not really make any difference.
As long as we have the right software concept for that, and as Jill already said, this is why containerization, containers and operators are the big thing, and this is also what we see here in this game. We also see, of course, AWS and Azure being the dominant cloud players here, and if we're using the Elastic Kubernetes Service or the Azure Kubernetes Service, we can also very easily intermix and play, depending on the costing structure.
The last use case, and I think this is tremendously active at the moment, and still it's kind of new and the market is still evolving, is simulation. Testing on the road is very, very expensive, and specifically in the last year, or the last two years, where the test tracks had to be shut down and there was a lack of getting around in the world, people started saying: well, how can we do the simulation here?
Can we use gaming technology, virtual worlds, and add the physics and the reality and the models of the sensors which are on these cars? If we have a software model which is good enough, can we combine this in order to verify that the testing that we did on the road is correct, and can we match it, and can we extend it, and can we do variations? And I think this graphic, which I stole from the NTU project in Singapore, puts it together very nicely.
You need a virtual test orchestrator. Typically we have to have the vehicle dynamics, we have to have the right interfaces, and then we need scenarios. OpenSCENARIO comes out of the ASAM consortium; I think this is very, very good work. OpenDRIVE, OpenSCENARIO and the related ASAM standards: I think this is the right way to go in order to have a unique, common understanding and language which the engineers understand. We need traffic models, we need environment models.
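For a sense of what these standards look like: an ASAM OpenSCENARIO file is plain XML, with the road geometry referenced from an OpenDRIVE file. A heavily abbreviated fragment to show the flavor only; the file names and values are illustrative and this is not a complete, valid scenario:

```xml
<OpenSCENARIO>
  <FileHeader revMajor="1" revMinor="0" author="example"
              description="cut-in maneuver, rain"/>
  <RoadNetwork>
    <!-- road geometry comes from a separate OpenDRIVE (.xodr) file -->
    <LogicFile filepath="highway_example.xodr"/>
  </RoadNetwork>
  <Entities>
    <ScenarioObject name="Ego"/>
    <ScenarioObject name="CutInVehicle"/>
  </Entities>
  <!-- Storyboard with initial positions, events, triggers and stop
       conditions omitted for brevity -->
</OpenSCENARIO>
```

The point of the common format is exactly what the talk says: the same scenario file is understood by different simulators, toolchains and engineering teams.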
This is typically done with co-simulation, and also with sensor models which are close to the real thing, and which have to be provided by the real sensor people there. And we put everything together, and what we see in this area is using the gRPC standard from Google in order to bring everything together very, very closely, like a shared memory space, working together as a good model which runs not as a single executable but as multiple different containers, which have to be scheduled in combination on the same cluster, very, very interconnected. And then, of course, this workload has to be scaled out.
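The co-simulation pattern described here, several models advancing in lockstep and exchanging state every tick, can be sketched in-process. In a real deployment each model would live in its own container behind a gRPC interface on the same cluster; the model classes below are invented stand-ins for vehicle dynamics and a sensor model:

```python
# In-process sketch of lockstep co-simulation. In production each model would
# be a separate container exposing a gRPC step() call; these are stand-ins.

class VehicleDynamics:
    def __init__(self):
        self.position = 0.0
    def step(self, dt, speed):
        self.position += speed * dt      # trivial dynamics for illustration
        return self.position

class SensorModel:
    def step(self, ego_position, obstacle_position):
        return obstacle_position - ego_position   # "measured" gap to obstacle

def run(ticks, dt=0.1, speed=10.0, obstacle=50.0):
    """Orchestrator: advance all models one tick at a time, in lockstep."""
    dynamics, sensor = VehicleDynamics(), SensorModel()
    gap = None
    for _ in range(ticks):
        pos = dynamics.step(dt, speed)   # state from one model feeds the next
        gap = sensor.step(pos, obstacle)
    return gap

print(run(ticks=30))  # gap after 3 simulated seconds: 50 - 30 = 20.0
```

The orchestrator role is the key design point: one scheduler owns simulated time, and every model, whatever container it runs in, only advances when told to, so the run stays reproducible.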
When it runs for a single engineer, typically they look at the visuals, they look at the screen. But when everything is okay, then the screen gets detached and they run the simulation a million times, or even more, in order to find the edge cases: they change the weather, they change the conditions, they change the cars, and so on. But it has to be matched to the reality.
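Those million headless runs are typically generated as a combinatorial sweep over a few variation axes of one validated base scenario. A sketch with invented axes and values; real sweeps would be submitted as thousands of parallel cluster jobs:

```python
# Sketch of scaling one validated scenario out into many headless variations.
# The variation axes and values are illustrative examples.
from itertools import product

weather      = ["clear", "rain", "fog", "snow"]
daytime      = ["day", "dusk", "night"]
traffic      = ["light", "dense"]
cutin_speeds = [20, 30, 40, 50]   # km/h, hypothetical variation axis

variants = list(product(weather, daytime, traffic, cutin_speeds))
print(len(variants), "variants from one base scenario")  # 4*3*2*4 = 96
```

A few axes with a handful of values each already yields ~100 variants; add more axes and finer steps and the million-run figure from the talk follows quickly, which is why the workload has to be scheduled on a cluster rather than on one workstation.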
And this is still very, very critical, because we also have two different ways of thinking here: the real car guys think only driving is the real thing, and the people at the computer, they think, well, sometimes it's gaming. Yeah, and this is not gaming. This is reality, modeling and simulation, and it has to be accurate. That's the problem, more or less, okay. So what do we need for this?
I think a Kubernetes platform is the platform of choice for autonomous vehicle development. It has everything that we need, including the integration with the NVIDIA playground, the NVIDIA NGC container registry, where there is lots of software available from the open source which is GPU-enabled, ready to go and ready to be consumed, and can be very, very easily constructed. And I think there is still some way to go, because the poor car engineers are quite new to this modern way of computing, but I think there is no alternative any longer.
Most people have heard something, they know something about Docker, but, well, Docker is a little bit old now, and you need, of course, much more. But I think time will tell, and I think we are on the right track here, and everything which has been said before at this conference fits very nicely into this picture. Okay, well, last picture, just to give you an example, and we're happy to show you the in-car recording on the left side, with our partners Siemens and b-plus.
So thanks for that, Frank. We're hoping that the back of my car does not look like the back seat of that car in the future. I saw that and I'm like: oh no, please not that.

So this is the demo, it's a demonstration vehicle, yeah. But they can record with a very, very high bandwidth there. So it's not really needed; it's more than you need, more or less.