From YouTube: Development and Testing Autonomous Vehicles at Scale - Frank Kraemer (IBM) OpenShift Commons 2022
Description
Development and Testing Autonomous Vehicles at Scale - Frank Kraemer (IBM)
OpenShift Commons Gathering on Automotive
April 6th 2022
full agenda here: https://commons.openshift.org/gatherings/OpenShift_Commons_Gathering_on_Automotive.html
Yeah, development and testing of autonomous vehicles at scale, for the next 20 minutes. You can reach me at the email, or look me up on LinkedIn. I'm Frank Kraemer, IBM Systems architect, a long-time IBM guy, the last three and a half years working very closely with automotive companies, OEMs and tier ones, specifically in Germany but also around the world. The idea of this short presentation is to give you, let's say, a little bit of a view of the projects that we see.
Some of these are references where we work with customers and the customers agreed to share their name. Of course, development needs AV data, or data for AV development. I want to share my experience with data center design, where we work specifically together in this area with Equinix and also with NTT Data, and then some use cases, if we find the time for that. So, what have we learned, what are the challenges in AV data management, more or less? I think we will touch some of these points.
Data ingestion and preparation cycles are very time consuming. This is mainly because the IT metadata is different from the car engineering metadata, and I think the picture here shows it in a very clear way. This is the poor engineer who knows that there is data available to solve his problem, but he is not able to find it, because he has not the right technology, not the right software, not the right processes in order to do so. As a result, you see many silos of infrastructure, which is cost intensive.
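The gap described here, IT metadata (paths, sizes) versus engineering metadata (vehicle, sensors, scenario tags), can be bridged with a small searchable index, so the engineer queries by scenario instead of by file path. A minimal sketch; all field names and sample records are invented for illustration:

```python
# Minimal sketch of an engineering-metadata index over recorded drive data.
# All field names and records here are hypothetical illustrations.

RECORDINGS = [
    # IT metadata (path, size) joined with engineering metadata (vehicle, tags)
    {"path": "/lake/2022/03/drive_0147.mdf", "size_tb": 0.8,
     "vehicle": "dev-car-03", "sensors": ["lidar", "camera"],
     "tags": ["rain", "urban", "pedestrian"]},
    {"path": "/lake/2022/03/drive_0148.mdf", "size_tb": 1.1,
     "vehicle": "dev-car-07", "sensors": ["radar", "camera"],
     "tags": ["highway", "night"]},
]

def find(tags):
    """Return recordings whose engineering tags contain every requested tag."""
    return [r["path"] for r in RECORDINGS if set(tags) <= set(r["tags"])]

# The engineer searches by driving scenario, not by storage path.
print(find(["rain", "pedestrian"]))
```

In practice such an index would sit in a catalog service in front of the data lake; the point is only that both kinds of metadata live in one queryable place.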
So what we are looking for is to be reproducible, to be efficient, and to be resilient, and this over a long period of time, because these cars are on the road probably for the next 15 to 20 years, more or less, okay. So what do we need? Yeah, the old saying that data is the new oil. Well, that is a little bit too limited for me. I think all data is valuable, but you have to refine it before you can use it; otherwise it is just useless, more or less. And the refinement of the data, and this is use case number one, has to be done not in refineries like on the right side; the new refineries are colocation data centers. And I think this brings us to the idea of what we need, or what we see, in this game. Yeah, and I think, first of all, you need a colocation data center.
This location needs to be connected with a high-speed networking connection, and I will tell you in a second what I mean by high-speed networking connection, because these high-speed networking connections are required to reach the public cloud providers, and, as we already said, public cloud is a typical play here. So we see the major players in this game: AWS and Azure as the two big ones, Google Cloud, Oracle Cloud, IBM Cloud and several others, maybe also Tencent Cloud, but all of these are important. And what we are doing in the colocation data center space: first we put the data into the data space. We try to do some analytics on it. We have to do some CPU computing. We have to do some GPU computing. We have to use this data for testing, HIL testing, SIL testing, simulation and all the game, more or less. And whether this colocation data center is big or small depends a little on, let's say, the concept and the costing structure there. But I think putting everything in cloud is possible; if you have a golden credit card, that's the way to go.
If you want to make it a little bit smarter and faster, I think hybrid cloud is a very nice play, and, of course, in this game containerization, OpenShift, Kubernetes is the way to go, okay. What do I mean by high-speed networking connection? On the left side you see not a high-speed networking connection; this is a slow networking connection. This is typically what we see when we talk to car manufacturers. So that's reality. What do we need for AV development? In this game it means highways: data highways, fast lanes, fast networking, high-speed parallel networking, the latest technology, more or less. This makes it possible to cope with the data, and this is what we have to do, and this is where physics and reality come into play. These clouds are made of, or these are, existing data centers. There are servers, there is networking, there are GPUs, and physics applies, more or less. Everybody knows this, okay.
What we also see in this game is that this is a distributed play around the world, which means we typically have three locations around the world: in the US multiple sites, and we aggregate these multiple locations into a single colo or into twin colos, more or less. We also see this in Europe, and we also see this in Asia, and specifically if this is in China. China is a special thing, because everything has to stay in China, but it's a multi-place game and we have to interconnect that. And the interconnection, this is very important, not using old MPLS technology, which is very expensive, but using new software-defined WAN technology. This is a perfect match to the Red Hat product portfolio: SD-WAN and all these new technologies, where software-defined is the way to go, using overlay networking and a very smart way of orchestrating these overlay networks, more or less, okay.
What does it look like, then, if we go into such a colocation data center? This is now, let's say, some work we did together with Equinix. On the left side you see heavy-duty in-car data capturing, and we did this together with a partner here; we did this with a company called b-plus and with Siemens, and we can also do this with other companies, it's not a big deal. Seagate, NI, same play: cars driving around collecting data. We see about 50 to 80 terabytes per development car per eight-hour shift, so this is big data, more or less.
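To put 50 to 80 TB per car per eight-hour shift in perspective, a quick back-of-the-envelope calculation of the sustained network rate needed to move that data off the fleet (the fleet size here is an invented example, not a figure from the talk):

```python
# Back-of-the-envelope: sustained network rate needed to move one shift's data.
TB = 1e12  # bytes (decimal terabyte)

data_per_shift_tb = 80          # upper end quoted in the talk
shift_hours = 8
cars = 10                       # hypothetical fleet size, for illustration

bytes_total = cars * data_per_shift_tb * TB
seconds = shift_hours * 3600
gbit_per_s = bytes_total * 8 / seconds / 1e9
print(f"{gbit_per_s:.0f} Gbit/s sustained")  # ~222 Gbit/s for this 10-car example
```

Even a modest fleet saturates multiple 100 Gbit/s links if the data has to move within the same shift, which is why the talk keeps coming back to high-speed networking and uploading stations.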
We have to have some uploading stations in order to receive the data, and then, of course, we put it in a data lake. And it is very smart, from our perspective, to work on this data as soon as possible, in order to find the right pieces of data which are relevant for the later stage of the AI training. And then there is still something missing: I think the software gap that we see in this area has to be closed.
If we find the right data, then we can put it in a fast file system, and this fast file system is typically connected to AI training; I'll show you this in a second. Or we can upload the data to the cloud. We can also store the data on local, cheap tape drives, with the same costing structure as, or a cheaper costing structure than, AWS S3 Glacier. Or we can also use it in combination with, and this is something new, Equinix Metal. These are servers out of Equinix; they bought the company packet.com, and, of course, they run Linux and they run Kubernetes, and that is a very good fit in order to mix and match between your own servers, the infrastructure rented from or provided by Equinix, and also the interaction with cloud providers there. To make this a little bit clearer, here is a picture.
This is how it looks in reality from an Equinix perspective. Yeah, we do have multiple OEMs or multiple tier ones, or any mix, and they are collecting data in a very, very extensive way. We have to have multiple facilities interconnected around the world. We have to use firewalling between them, ready-to-go firewalling technology, and I think this picture shows it from a complexity side. This is really big, this is expensive, and you have to fine-tune this, and you have to make really sure that this is what the customer wants, and also the interaction, the multi-tenancy, bringing data together over multiple tier ones, things like that. So this is all important, more or less, okay. Use case number two: why are we collecting this data like crazy? Because we want to do AI training.
AI training is the holy grail of robotic cars in this game, and we all know that, and I think this picture also shows it very well. You can have the fastest car in the world, but if you are stuck in a traffic jam, well, it does not help, more or less, and this is really relevant for AI training. You can have hundreds of GPUs, but if you do not have the right data, well, you just wait.
What does it need? This is work we have done together from an IBM perspective, from our data side, file system side, NVMe, software-defined data, in combination with Red Hat, because containerization is also very, very important here, and the certification that we have to get from NVIDIA in order to feed these systems, like the DGX A100 and, in the future, the very new H100 system. We provided reference architectures, certifications, best practices, performance guides, etc. With the data lake that we already talked about, we can split it a little bit, mix and match, and fine-tune the data lake in order to have hot data very closely available, using very, very fast InfiniBand with low latency in order to feed these GPU systems, and keep the bulk of the data on a colder tier, which is a little bit less expensive, more or less. These are the optimization tricks that we can play here.
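The hot/cold split described here can be thought of as a simple placement policy: pin the small, actively used training sets to the fast NVMe tier and spill the rest to the colder, cheaper tier. A sketch under invented assumptions (dataset names, sizes and the recency-based heuristic are all illustrative):

```python
# Sketch of a tiering policy: keep the "hot" training sets on fast NVMe,
# spill the bulk to a colder, cheaper tier. All values are invented examples.

def place(datasets, hot_capacity_tb):
    """Greedily pin the most recently used datasets to the hot tier."""
    hot, cold, used = [], [], 0.0
    for ds in sorted(datasets, key=lambda d: d["last_used"], reverse=True):
        if used + ds["size_tb"] <= hot_capacity_tb:
            hot.append(ds["name"])
            used += ds["size_tb"]
        else:
            cold.append(ds["name"])
    return hot, cold

datasets = [
    {"name": "night-drives", "size_tb": 40, "last_used": 20220405},
    {"name": "rain-urban",   "size_tb": 25, "last_used": 20220406},
    {"name": "archive-2020", "size_tb": 300, "last_used": 20210101},
]
hot, cold = place(datasets, hot_capacity_tb=80)
print(hot, cold)  # recent sets go to NVMe, the old archive stays cold
```

Real systems (information lifecycle management in a parallel file system, for example) apply policies like this automatically; the sketch only shows the shape of the decision.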
Reference customers, as I said: it's Continental, based in Germany, at Equinix in Germany, a big installation, lots of GPUs, lots of DGX systems, big file systems. So we do this together with them. It's a very well-known reference customer; they extended the installation multiple times now, on the GPU side and also on the storage side. It's publicly available as a PDF, just have a look at it. I think it's a very good reference, and we can talk in detail if there's a need for that.
How does it look in such a colocation data center? On the upper right side you see the real, actual picture from the Equinix site; they posted this on the web, so it's freely available. These are very, very large data centers. Colocation means there are other customers in the same building or on the same campus, and specifically for Europe, all the major cloud providers are in colo data centers. They are just on the same campus, and this makes it very easy to have a fast networking line, because your own data center is more or less physically very close to the hyperscaler data centers. Now, they span over multiple colocation providers, but typically, from a networking side, it is very, very close. The cloud has to be close in order to use these high-speed connections, more or less.
They put it in a rack, and this hardware and software combination just acts the same as it would in a car. And then you have not only one, you have hundreds of these HIL rigs, HIL stations, and you feed them with the real data that you have recorded on the road. Not only the data that we use for AI training: you feed your complete data stack to those HIL systems. So this typically means hundreds of petabytes for each HIL run, and if you keep all your data in cloud, well, you have to pay the egress charges for that.
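A rough sense of why egress dominates at this scale. The per-gigabyte rate below is a hypothetical placeholder chosen only to show the order of magnitude, not a quoted price from any provider:

```python
# Illustrative only: what cloud egress charges can look like at HIL scale.
# The per-GB rate is an assumed placeholder, not a real quoted price.

petabytes_per_run = 100            # "hundreds of petabytes" order of magnitude
egress_usd_per_gb = 0.05           # hypothetical blended egress rate

gb = petabytes_per_run * 1e6       # 1 PB = 1e6 GB (decimal)
cost = gb * egress_usd_per_gb
print(f"${cost:,.0f} per HIL run")  # $5,000,000 at these assumptions
```

Whatever the exact rate, pulling the full data stack out of the cloud for every HIL campaign multiplies quickly, which is the argument for keeping a copy in the colo next to the rigs.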
So that makes it a very expensive operation, and this is what several of these customers have found out. So I think it's very smart to put some of the data close to your HIL testing, where it is available for less cost. And we can also combine this, and this is typically done: the more modern guys are using more software-in-the-loop testing, where the hardware is replaced by a software model, and typically the software model is using Kubernetes and OpenShift. But still, I think there is some need for HIL testing. We see a tremendous increase in software-in-the-loop testing, software only, but HIL testing is still very relevant, and this is one of the pictures we can create, more or less, okay. Now let's put it all on a chart.
This is, well, not the greatest chart in the world, but I think it shows what we have to do from an automotive perspective. Yeah, we have to collect data, we have to do data preparation and AI training, we have to do hardware-in-the-loop and software-in-the-loop, and we have to do simulation, which I will come to as use case number four. And I think this fits very, very nicely with the overall architecture of OpenShift, that we can do all these things.
Also something important, specifically in the HIL space, is Windows Server, because some of these HIL rigs are still using Windows operating systems that we cannot get rid of, at least not very fast. But this is still very, very good from an integration point: using Kubernetes also in the Windows environment should be no big deal either. This runs physical, virtual, in the private cloud and the public cloud, or any mix, on the edge or in a co-located system; it does not really make any difference.
As long as we have the right software concept for that, and as Jill already said, this is why containerization, containers and operators are the big thing, and this is also what we see here in this game. We also see, of course, AWS and Azure being the dominant cloud players here, and if we're using the Elastic Kubernetes Service or the Azure Kubernetes Service, we can also very easily intermix and play, depending on the costing structure.
The last use case, and I think this is tremendously active at the moment, and still it's kind of new and the market is still evolving, is simulation. Testing on the road is very, very expensive, and specifically in the last year, or the last two years, where the test tracks had to be shut down and there was a lack of getting around in the world, people started saying: well, how can we do the simulation here?
Can we use gaming technology, virtual worlds, and add the physics and the reality and the models of the sensors which are on these cars? If we have a software model which is good enough, can we combine this in order to verify that the testing that we did on the road is correct, and can we match it, and can we extend it, and can we do variations? And I think this graphic, which I stole from the NTU project in Singapore, puts it together very nicely.
You need a virtual test orchestrator. Typically we have to have the vehicle dynamics, we have to have the right interfaces, and then we need scenarios. OpenSCENARIO comes out of the ASAM consortium; I think this is very, very good work. OpenDRIVE, OpenSCENARIO and the related ASAM standards: I think this is the right way to go in order to have a unique, common understanding and language which the engineers understand. We need traffic models, we need environment models.
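For a sense of what these standards look like: an ASAM OpenSCENARIO file is plain XML, with the road geometry referenced from an OpenDRIVE file. A heavily abbreviated fragment to show the flavor only; the file names and values are illustrative and this is not a complete, valid scenario:

```xml
<OpenSCENARIO>
  <FileHeader revMajor="1" revMinor="0" author="example"
              description="cut-in maneuver, rain"/>
  <RoadNetwork>
    <!-- road geometry comes from a separate OpenDRIVE (.xodr) file -->
    <LogicFile filepath="highway_example.xodr"/>
  </RoadNetwork>
  <Entities>
    <ScenarioObject name="Ego"/>
    <ScenarioObject name="CutInVehicle"/>
  </Entities>
  <!-- Storyboard with initial positions, events, triggers and stop
       conditions omitted for brevity -->
</OpenSCENARIO>
```

The point of the common format is exactly what the talk says: the same scenario file is understood by different simulators, toolchains and engineering teams.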
This is typically done with co-simulation, and also with sensor models which are close to the real thing, and which have to be provided by the real sensor people there. And we put everything together, and what we see in this area is using the gRPC standard from Google in order to bring everything together very, very closely, like a shared memory space, working together as a good model which runs not as a single executable but as multiple different containers, which have to be scheduled in combination on the same cluster, very, very interconnected. And then, of course, this workload has to be scaled out.
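The co-simulation pattern described here, several models advancing in lockstep and exchanging state every tick, can be sketched in-process. In a real deployment each model would live in its own container behind a gRPC interface on the same cluster; the model classes below are invented stand-ins for vehicle dynamics and a sensor model:

```python
# In-process sketch of lockstep co-simulation. In production each model would
# be a separate container exposing a gRPC step() call; these are stand-ins.

class VehicleDynamics:
    def __init__(self):
        self.position = 0.0
    def step(self, dt, speed):
        self.position += speed * dt      # trivial dynamics for illustration
        return self.position

class SensorModel:
    def step(self, ego_position, obstacle_position):
        return obstacle_position - ego_position   # "measured" gap to obstacle

def run(ticks, dt=0.1, speed=10.0, obstacle=50.0):
    """Orchestrator: advance all models one tick at a time, in lockstep."""
    dynamics, sensor = VehicleDynamics(), SensorModel()
    gap = None
    for _ in range(ticks):
        pos = dynamics.step(dt, speed)   # state from one model feeds the next
        gap = sensor.step(pos, obstacle)
    return gap

print(run(ticks=30))  # gap after 3 simulated seconds: 50 - 30 = 20.0
```

The orchestrator role is the key design point: one scheduler owns simulated time, and every model, whatever container it runs in, only advances when told to, so the run stays reproducible.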
When it runs for a single engineer, typically they look at the visuals, they look at the screen. But when everything is okay, then the screen gets detached and they run the simulation a million times, or even more, in order to find the edge cases: they change the weather, they change the conditions, they change the cars, and so on. But it has to be matched to the reality.
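Those million headless runs are typically generated as a combinatorial sweep over a few variation axes of one validated base scenario. A sketch with invented axes and values; real sweeps would be submitted as thousands of parallel cluster jobs:

```python
# Sketch of scaling one validated scenario out into many headless variations.
# The variation axes and values are illustrative examples.
from itertools import product

weather      = ["clear", "rain", "fog", "snow"]
daytime      = ["day", "dusk", "night"]
traffic      = ["light", "dense"]
cutin_speeds = [20, 30, 40, 50]   # km/h, hypothetical variation axis

variants = list(product(weather, daytime, traffic, cutin_speeds))
print(len(variants), "variants from one base scenario")  # 4*3*2*4 = 96
```

A few axes with a handful of values each already yields ~100 variants; add more axes and finer steps and the million-run figure from the talk follows quickly, which is why the workload has to be scheduled on a cluster rather than on one workstation.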
And this is still very, very critical, because we also have two different ways of thinking here: the real car guys think only driving is the real thing, and the people at the computer, they think, well, sometimes it's gaming. Yeah, and this is not gaming. This is reality, modeling and simulation, and it has to be accurate. That's the problem, more or less, okay. So what do we need for this?
I think a Kubernetes platform is the platform of choice for autonomous vehicle development. It has everything that we need, including the integration with the NVIDIA playground, the NVIDIA NGC container registry, where there is lots of software available from the open source which is GPU-enabled, ready to go and ready to be consumed, and can be very, very easily constructed. And I think there is still some way to go, because the poor car engineers are quite new to this modern way of computing, but I think there is no alternative any longer.
Most people have heard something, they know something about Docker, but, well, Docker is a little bit old now, and you need, of course, much more. But I think time will tell, and I think we are on the right track here, and everything which has been said before at this conference fits very nicely into this picture. Okay, well, last picture, just to give you an example, and we're happy to show you the in-car recording on the left side, with our partners Siemens and b-plus.
So thanks for that, Frank. We're hoping that the back of my car does not look like the back seat of that car in the future. I saw that and I'm like: oh no, please not that.

So this is the demo, it's a demonstration vehicle, yeah. But they can record with a very, very high bandwidth there. So it's not really needed; it's more than you need, more or less.