Cloud Native Computing Foundation CNCF Webinars, 3 Oct 2019

From YouTube: Webinar: Feeding the Kubernetes Beast: Bringing Locality Back to Data Workloads

Description

While adoption of the cloud and Kubernetes has made it exceptionally easy to scale compute, the increasing spread of data across different systems and clouds has created new challenges for data engineers. Effectively accessing data from AWS S3 or on-premises HDFS becomes harder, and data locality is also lost. How do you move data to compute workers efficiently? How do you unify data across multiple or remote clouds? These are just some of the questions that arise.

The open source project Alluxio approaches this problem in a new way. It helps elastic compute workloads realize the true benefits of the cloud, while bringing data locality and data accessibility to workloads orchestrated by Kubernetes. Alluxio can orchestrate data locality from any persistent storage, including object stores such as Ceph and cloud storage such as AWS S3 or GCS, and make it accessible to compute running in Kubernetes pods. As a stateless data access layer, Alluxio runs as a native service, making data-intensive compute workloads Kubernetes friendly.

In this webinar, Adit will present this new approach of bringing data locality to data-intensive compute workloads in Kubernetes environments, and demo how to set up and run Apache Spark and Alluxio in Kubernetes.

A

I think we can start recording now.

A

Yes, welcome everybody. I'd like to thank everyone for joining us today for another webinar from the Cloud Native Computing Foundation. The title, as you can see, is "Feeding the Kubernetes Beast: Bringing Data Locality Back to Data Workloads." My name is Alessandro Vozza. I'm a software engineer at Microsoft and a Cloud Native Ambassador, and I'll be moderating this webinar. I will collect your Q&A and interact with the speaker on your behalf, and I hope everybody's going to have a great time.

A

We would like to welcome our presenter today, Adit Madan, an engineer at Alluxio. Before we get started, of course, just a few housekeeping items. There's a Q&A box in Zoom at the bottom of your screen. That's where you can write your questions, and we will collect them for after the webinar. The webinar will last about 45 minutes, and then we will have time to answer your questions. This is an official webinar of the Cloud Native Computing Foundation.

A

So please, everybody, be aware that there's a code of conduct, and we will not allow any question that violates that code of conduct. Now, if Adit is ready, he may start the presentation.

B

Thanks, Alessandro. Welcome, everyone, to the webinar today. Like Alessandro mentioned, I will be talking about feeding the Kubernetes beast with Alluxio. I'm an engineer at Alluxio, and I've been here for about three years now. I hope we all learn something new today.

B

So here's the agenda for today. I will start with a quick introduction of Alluxio for those who don't know about it. Then I will move on to describing some fundamental Kubernetes concepts which I will use in the rest of the presentation. Then I'll talk about different deployment options and use cases for Alluxio on Kubernetes, and I'll end it all with a demo of Spark and Alluxio running in Kubernetes.

B

Okay, so here we go. If you look at the Alluxio project, it began as a research project in 2013 at the AMPLab at UC Berkeley.

B

Now it's been about four years that the company has been established. It's well funded, and the goal that we have at Alluxio is to orchestrate data at memory speed for the cloud. What data orchestration even means is something I'll get into in some detail in the rest of the talk, so stay tuned for that.

B

We have a fast-growing open source community, with a variety of contributors from both industry and academia spread across different parts of the world. Our GitHub repository is extremely active, and in case you want to learn more about the project, feel free to get in touch with us on our community Slack channel. The link is there on the screen for you.

B

Okay, so to give you some context on why you need a project like Alluxio, and on the evolution of the big data ecosystem, let's start with what the first iteration of the big data ecosystem looked like. When we started, we only had one compute framework, Hadoop MapReduce, which was co-located with one storage system, the Hadoop Distributed File System. Your data and compute would reside on the same cluster, and you would grow both compute and storage together.

B

The whole premise of running compute on the nodes which have the storage was that you would obtain data locality. Data locality is critical for performance, so that you gain timely insights from the data that you have.

B

But if you look at the big data ecosystem today, we have a proliferation of compute frameworks, including Presto, Spark, MapReduce, and Flink, and also a variety of storage systems. These include storage systems both on-premises, such as EMC ECS and the Hadoop Distributed File System, and in the cloud, including Amazon S3, Microsoft Azure, Google Cloud Storage, and so on. Now each of these compute frameworks serves a specific purpose.

B

They are good at a certain kind of workload, but they still need to interface with the variety of storage systems that we have listed at the bottom.

B

Like I mentioned, initially in our journey we had a co-located compute and storage cluster. In this picture, I have MapReduce as the compute framework running on HDFS.

B

Typically, what happened in big enterprises and also in the cloud was that people observed that most of the clusters were compute bound. But in order to add compute capacity, people also had to add storage capacity, since storage and compute were co-located. You couldn't grow compute and storage independently.

B

Then we moved on to a world with disaggregated compute and storage, in which you can add capacity based on your needs. If you need only storage, you add more nodes to your storage cluster, and if you need only compute, you add more compute capacity, which is more economical. But what you lose in the process is data locality, and remember, data locality was the whole premise of why MapReduce and the Hadoop ecosystem started with co-locating compute and storage.

B

Also, what happened was that when people had big HDFS clusters on-premises and wanted to grow out the compute, they could either add more nodes to their on-premises clusters, or they could choose compute in the cloud.

B

So you could say that now I want additional compute capacity, and I want to use that in Google's cloud, Amazon's cloud, or Microsoft's cloud. Also, like we mentioned before, with the proliferation of compute frameworks, people now want to use many more compute frameworks, in addition to just Hive or MapReduce, which are more suitable to their workload.

B

The other thing which happened with the move to the cloud was that now we have cheap object storage solutions available, and the cost of data storage is much lower than it is to buy and provision a physical node and store your data in HDFS.

B

So this entire stack is also deployed on top of Kubernetes in many deployments.

B

Okay, so this is exactly where Alluxio comes in. With different compute frameworks talking to different storage systems, Alluxio acts as a virtual abstraction layer which sits in between the compute frameworks and the storage systems.

B

Alluxio exposes different APIs, including the Java native file system API, a Hadoop-compatible API, which we have labeled as the HDFS interface in this picture, and other interfaces such as the POSIX interface. This allows applications to access storage systems, including object stores, with the same familiar interface that they are used to. The applications do not need to make any code changes, but they can still work with any of the storage systems that we have labeled below.

B

So some of the key innovations of Alluxio are the three pillars that I have on the screen. The first one is data locality. Like I mentioned, as you increasingly move from a world where there was co-located compute and storage into a world with disaggregated compute and storage, data locality is not easily preserved.

B

By having a layer of Alluxio in the middle, and I will talk about the solution in the following slides, we are able to have data locality without making additional copies or migrating data to a compute cluster, which may not be the same as your storage cluster. The second pillar that I have over there is labeled data accessibility. You can continue to use the popular APIs which your frameworks are familiar with to access storage, which could be located anywhere. It could be in the cloud, or it could be in a cloud in a different region, spread across geographies.

B

The third pillar I have over there is data elasticity. What this means is that, within the single file system namespace that Alluxio provides, you can have access to data spread across storage systems. Imagine you have a Hive table which is stored across different storage systems: some partitions in HDFS, some partitions in S3, and some partitions in yet another storage system.

B

Here are some examples of how an end application interacts with Alluxio. If you're familiar with Spark, you can see that accessing data through Alluxio is very similar to how you'd access data from HDFS.

B

The only difference is that, instead of the hdfs scheme, you have the alluxio scheme pointing to the Alluxio master. So with only configuration changes and no code changes for your end application, you can switch from using HDFS to using Alluxio, which enables you to access data across storage systems and also access data on object stores.

B

Similarly, if you're familiar with Presto, it looks exactly the same. The only change is that you move from the hdfs scheme to the alluxio scheme.
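
The scheme swap described above can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function name `to_alluxio_uri` and the host names are hypothetical, and port 19998 is assumed to be the Alluxio master RPC port (its usual default).

```python
from urllib.parse import urlsplit


def to_alluxio_uri(uri: str, master_host: str, master_port: int = 19998) -> str:
    """Rewrite an hdfs:// URI as an alluxio:// URI, keeping the path unchanged.

    master_host and master_port are assumptions for illustration; they would
    point at whatever address the Alluxio master is reachable on.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URI, got {uri!r}")
    # Only the scheme and authority change; the application's path stays the same.
    return f"alluxio://{master_host}:{master_port}{parts.path}"


if __name__ == "__main__":
    # Hypothetical NameNode address and file path.
    print(to_alluxio_uri("hdfs://namenode:8020/warehouse/events.parquet", "alluxio-master"))
```

The path component is untouched, which is why the application itself needs no code changes beyond the location it is configured to read from.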

B

The third bullet that I have over there is the POSIX API. You can interact with Alluxio using a familiar POSIX API by simply issuing calls as you would to a local file system, and this is an interface that we have seen become increasingly popular for machine learning applications such as TensorFlow running on Kubernetes.

B

The most flexible way of interacting with Alluxio, which gives you the most control, is using our native Java API, which is our Java client library. In case you want to roll out some fresh applications running directly on Alluxio, you could choose to use that.

B

Okay, so since I'll be talking about the deployment of Alluxio on Kubernetes, I just want to give you a high-level architecture of what Alluxio looks like. Alluxio is a distributed file system which has master and worker components. The master component stores the metadata for the distributed file system, and the workers are the component which stores the actual data cached by Alluxio. In this picture, I have two applications, Presto and Spark, accessing a single Alluxio cluster.

B

The end data that's being accessed resides in the two pillars I have on the right, which are the object store and HDFS. If we are talking about a cloud deployment in which Alluxio and the compute application are deployed in a cloud, we could have a situation in which HDFS resides on-premises. You are accessing data in the cloud from your on-premises HDFS cluster, and at the same time, you have some additional data which you want to store in the object store provided by that cloud.

B

So what would happen is, when you access data from Alluxio, you cache the data in Alluxio, whatever its original location is. If you want to access HDFS data, you can set policies which control how Alluxio stores the data that you're accessing.

B

For high availability in Alluxio, we have a couple of options. For on-premises deployments of Alluxio, we have the option of using ZooKeeper for consensus, or for environments like Kubernetes, we have an embedded quorum consensus algorithm between the Alluxio masters, which ensures that Alluxio clusters are highly available. I'm sure this picture reminded you of the HDFS architecture: Alluxio masters are similar to Hadoop NameNodes, and Alluxio workers are similar to Hadoop DataNodes.
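
As a rough illustration of what a quorum among masters implies, here is a small Python sketch. The talk does not name the consensus algorithm, so this assumes the usual majority rule used by quorum-based protocols; the function names are hypothetical.

```python
def quorum_size(num_masters: int) -> int:
    """Smallest number of masters that forms a strict majority (assumed rule)."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def tolerated_failures(num_masters: int) -> int:
    """How many masters can be lost while a majority is still reachable."""
    return num_masters - quorum_size(num_masters)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption, three masters tolerate the loss of one and five tolerate the loss of two, which is why odd master counts are the usual choice for this kind of setup.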

B

Okay, next I will talk about a few Kubernetes concepts which I will mention in the rest of the talk. These are some components that we use to deploy Alluxio on Kubernetes, and for people who need a quick refresher, here we go. Like most of you know, Kubernetes is a system for deploying and managing containerized applications.

B

This could include applications like Spark and Presto, and also stateful applications like Alluxio. In the next couple of slides, we'll cover some basics and talk about different options for deploying Alluxio on Kubernetes. Like I mentioned before, we'll have a demo running Spark on Alluxio, accessing data from Amazon S3, in a Kubernetes cluster deployed on Amazon EC2.

B

Okay, so some of the concepts that Alluxio uses from Kubernetes as the container orchestration platform are listed here on the slide. Kubernetes abstracts away the physical infrastructure, so you can run containers on different physical hosts. This makes the deployment of applications like Alluxio uniform, regardless of what your physical host operating system or infrastructure is. To make it easy for applications to connect to each other on Kubernetes, especially when containers are launched across different hosts, there is a mechanism called a Service.

B

The other thing that Kubernetes provides and Alluxio uses is self-healing capacity. Let's say an Alluxio master goes down: Kubernetes provides you with the ability to relaunch the desired number of pods and containers on the cluster. Secrets are a way of managing credentials. When Alluxio connects to different storage systems such as Amazon S3, sensitive credentials such as the access key and the secret key can be stored in the secret store, so that any sensitive data is not readily available to anyone.

B

Kubernetes also has different options for storage management, such as persistent volumes, and this is also something that we use to store the Alluxio journal in Kubernetes.

B

Okay, so all these terms should look familiar. Containers are basically a Docker image with a lightweight operating system and the application execution environment.

B

Once an image is in a running state, we call it a container. Pods in Kubernetes are the basic schedulable unit. Multiple containers can be combined into the same pod, and this is something that we do for Alluxio as well. Controllers are a way to specify the desired state for pods, and Kubernetes ensures that, regardless of the failure scenarios, the desired state of the pods is still maintained on the Kubernetes cluster.
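
The desired-state idea behind controllers can be sketched as a tiny reconciliation function. This is a toy simulation in plain Python, not the Kubernetes API: the `reconcile` function and the pod names are hypothetical and exist only to show the loop's logic.

```python
from typing import List


def reconcile(desired: int, running: List[str], prefix: str = "alluxio-worker") -> List[str]:
    """Return the pod list after one reconciliation pass.

    Missing pods are "launched" (added) until the desired count is reached,
    and surplus pods are dropped. Real controllers do this continuously.
    """
    pods = list(running)
    counter = 0
    while len(pods) < desired:
        name = f"{prefix}-{counter}"
        if name not in pods:
            pods.append(name)  # stand-in for scheduling a replacement pod
        counter += 1
    return pods[:desired]


if __name__ == "__main__":
    # One of three pods has failed; the pass brings the count back to three.
    print(reconcile(3, ["alluxio-worker-0", "alluxio-worker-2"]))
```

The point of the sketch is that the operator of the cluster states only the desired count; the controller works out which pods to start or stop.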

B

We already talked about persistent volumes. They are used by Alluxio as well to store the journal, or any state that should be maintained regardless of whether a pod is restarted or a pod fails.

B

Okay, so there are different ways of launching, deploying, and managing a Kubernetes application. The most basic way of deploying an application is using a declarative YAML file which specifies the controller, the container image, which containers are combined into a pod, and also the different persistent volumes and resources used by an application like Alluxio.

B

Typically, we have a set of YAML files which we would need to modify to deploy an application like Alluxio, and there is a lot of redundancy within these files. For example, the container image and the tag would be duplicated across the set of these files.

B

What Helm provides is a thin wrapper over the declarative specifications. It reduces the complexity by specifying any redundant values in a single configuration file. This single configuration file kind of compiles into the multiple declarative YAMLs that we used in the previous step, so that when you make any configuration change, you have a single location to modify and all the necessary YAMLs are modified.
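
The idea of one values file expanding into several manifests can be illustrated with a short Python sketch. This is an analogy only: Helm itself uses its own templating, and the `render_manifests` function, the image name, the tag, and the choice of a StatefulSet for the master are all assumptions made for the example (the talk only confirms that workers run as a DaemonSet).

```python
import json
from typing import Dict, List


def render_manifests(values: Dict) -> List[Dict]:
    """Expand one values dict into two manifests that share the same image."""
    image = f"{values['image']}:{values['imageTag']}"

    def workload(kind: str, role: str) -> Dict:
        labels = {"app": "alluxio", "role": role}
        return {
            "apiVersion": "apps/v1",
            "kind": kind,
            "metadata": {"name": f"alluxio-{role}"},
            "spec": {
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    # The image string is defined once above and reused here.
                    "spec": {"containers": [{"name": f"alluxio-{role}", "image": image}]},
                },
            },
        }

    return [workload("StatefulSet", "master"), workload("DaemonSet", "worker")]


# Hypothetical values; a real chart would have many more settings.
VALUES = {"image": "alluxio/alluxio", "imageTag": "2.1.0"}

if __name__ == "__main__":
    print(json.dumps(render_manifests(VALUES), indent=2))
```

Changing `imageTag` in the single values dict updates every rendered manifest, which is the redundancy reduction the speaker describes.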

B

Another abstraction over the declarative specifications, and something that's increasingly used to deploy applications on Kubernetes, is Kubernetes operators. Any domain knowledge that is specific to the application can be built into the operator, and this provides you with the most flexible way of deploying any application on top of Kubernetes. For example, if you are doing upgrades and you want to improve troubleshooting during the upgrade process, this kind of domain knowledge can be built into your operator, so that it makes any DevOps kind of operations easy for your admins.

B

Okay, so in the next few slides I'll go into some more detail on what the solution for Alluxio on Kubernetes looks like.

B

Okay, so when you have a Kubernetes cluster which is segregated from the data store, the original source of data, one way of making the data accessible on your Kubernetes cluster is by copying the data over into the Kubernetes cluster. But this means that you need to set up an ETL job and make multiple copies of the data to make that data accessible on your Kubernetes cluster.

B

Now, in order to migrate the data and for the ETL to work, you need some kind of stateful storage system on Kubernetes, and having this kind of storage system on Kubernetes can be very hard. To tolerate elasticity when you scale your Kubernetes cluster up or down, you might have to either rebalance or migrate your data. For example, if you are scaling your cluster up, you might want to rebalance your data so that you have an even distribution and you get the performance that you want.

B

And if you scale your cluster down, in order to not lose any data, you would need to migrate data which is present on the larger cluster to the smaller cluster, so that no data is lost during the elasticity process.

B

Also, changing applications to use any new storage system which is deployed on Kubernetes can be very hard as well. If the storage system that is available on Kubernetes does not provide a familiar API, your applications will need to change. And it's not just the modifications that are needed to your applications; tuning your applications for performance can be equally challenging.

B

So this is why we need a solution like Alluxio on Kubernetes. When you have your compute cluster running with Kubernetes and accessing data which is not present on the Kubernetes cluster, Alluxio brings a few useful features which we have described on this slide. The first thing that Alluxio brings is data locality, even if your storage is not deployed on Kubernetes.

B

In this picture that I have on the right, Spark and Alluxio are running on a Kubernetes cluster, and the data being accessed is in an object store like Amazon S3, or it could be in a different object store such as GCS. When you access data from any of these object stores, what happens on the first access is that the data is fetched from your object storage onto the Kubernetes cluster and into Alluxio. Alluxio then shares this data across different jobs, and your application is scheduled on Alluxio with data locality for any subsequent accesses.
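
The fetch-on-first-access behavior described above is essentially a read-through cache. The following Python sketch is a toy model of that pattern, not Alluxio's implementation: the `UnderStore` and `ReadThroughCache` classes and the path used are hypothetical.

```python
from typing import Dict


class UnderStore:
    """Stand-in for remote storage such as an object store; counts remote reads."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = dict(objects)
        self.reads = 0

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self._objects[path]


class ReadThroughCache:
    """Serves reads locally after the first access to each path."""

    def __init__(self, under_store: UnderStore) -> None:
        self._under = under_store
        self._cache: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            # First access: fetch from the remote store and keep a local copy.
            self._cache[path] = self._under.read(path)
        return self._cache[path]

    def is_cached(self, path: str) -> bool:
        return path in self._cache


if __name__ == "__main__":
    store = UnderStore({"/s3a/data": b"payload"})
    cache = ReadThroughCache(store)
    cache.read("/s3a/data")   # goes to the remote store
    cache.read("/s3a/data")   # served from the local copy
    print(store.reads)
```

Only the first read reaches the remote store; later jobs reading the same path share the cached copy, which is where the locality benefit comes from.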

B

Okay, so let's talk about another important use case for Alluxio with Kubernetes. What we've seen increasingly is that big enterprises who traditionally store their data in HDFS want to burst their compute clusters into the cloud. Like we mentioned before, we've observed that many of the on-premises clusters are typically compute bound, and what they want to do is provision additional compute in the cloud and access their on-premises data.

B

For a situation like this, one option that we talked about previously is to set up an ETL pipeline and copy all of the data that you need onto the Kubernetes cluster, and only then can you start running your applications. But the solution with Alluxio provides you with a couple of useful things. The first thing is that the data is accessible immediately.

B

As soon as you spin up your compute cluster in the cloud with Kubernetes, your data can immediately be accessed from your on-premises cluster without setting up any ETL pipeline. The other thing is that data is fetched on access. So, even if the storage capacity on your Kubernetes cluster is not high, you would still be able to access data, and only the data which you access is cached in Alluxio. You don't need to guess or predict what data an analyst will want to use at any time in the future and migrate all of that.

B

So fetching data on access is also a reason why you would want to use a solution like Alluxio. It's called zero-copy bursting because you are not making any persistent copies on your Kubernetes cluster and you don't need to set up an ETL pipeline.

B

Okay, so getting closer to the demo, I'll talk a little bit about the architecture of Alluxio on Kubernetes. In this picture, I have two physical hosts running different containers. The host on the left is running the Alluxio master container, and the host on the right is running the Alluxio worker container. Like I mentioned, the Alluxio master is the daemon which stores the journal and the metadata for the distributed file system. An Alluxio service is used by clients to identify and connect to the Alluxio master.

B

So even if the Alluxio master switches from the host on the left to the host on the right, an Alluxio client would still be able to connect to the Alluxio master using the Alluxio service, which provides the DNS hostname for any client, regardless of the physical location of the Alluxio master container. To store the persistent metadata, Alluxio can be configured to work with persistent volumes such as Amazon EBS if you're running in Amazon, or you can use any other persistent volume, depending on the cloud that you're working with.

B

When you're running Spark on Alluxio in Kubernetes, we will have a Spark driver deployed on some node in the cluster, and we'll also have a set of Spark executors. The Spark driver and the Spark executors both have the Alluxio client jar embedded in them. They connect to the Alluxio masters to identify the location of blocks, and then they access the data from the Alluxio workers.

B

For deploying Alluxio on Kubernetes, we have different options. You can choose to use the declarative YAML specifications, and you can deploy Alluxio with the default configuration using the set of commands that I've listed on this slide.

B

Recently, we have added support for the Alluxio Helm chart which, like I mentioned, is a single location for specifying any redundant configuration, and it makes it much easier to install Alluxio on your Kubernetes cluster. On the slide, the repository for Helm that we're using is an Alluxio repo, which should be available to your Kubernetes cluster for installing Alluxio using Helm. This will be in the stable Helm repository starting with Alluxio 2.1, which is scheduled for later this month.

B

Some of the ongoing work that we have related to Kubernetes at Alluxio includes support for large production deployments by improving the high availability solution that we use in the absence of ZooKeeper. Like I mentioned before, Alluxio masters have an embedded quorum consensus algorithm, which can be used for HA in the absence of ZooKeeper. The other major thing that we have validated recently is the off-heap metadata layer, which allows an Alluxio deployment in Kubernetes to store metadata for your files, which could number in the billions.

B

So you could have files in HDFS and S3, and for Alluxio to be able to handle data across multiple storage systems, the metadata layer in Alluxio actually needs to be much more scalable than the metadata layer of any storage system that it's accessing. This is something that we have included recently. Now, Helm charts, like I already mentioned, are a more convenient way of deploying Alluxio, and the Helm chart now has parity with Alluxio deployments in a non-containerized environment.

B

We also have a CSI driver for Alluxio coming soon, which makes it easy to access Alluxio using the POSIX API. Applications like TensorFlow or any other machine learning application can simply mount a persistent volume of type Alluxio and start using Alluxio, without the need to distribute the Alluxio client jar into the application that is accessing Alluxio.

B

Okay, so I have prepared a demo for running Alluxio and Spark in Kubernetes, and I will jump into that right away.

B

Okay, so, like I mentioned before, the setup for the demo is a four-node Kubernetes cluster deployed on Amazon EC2.

B

Let's just make sure nothing is running on the cluster as of now.

B

Okay, so the way that we'll deploy Alluxio on Kubernetes right now is using the YAMLs, the declarative YAML specification. The first thing that we need to do is deploy the Alluxio configuration, which is the Alluxio config map. If you look at the config map, it's a set of configurations that we have for the Alluxio cluster. It specifies the storage system, which is an Amazon S3 bucket, and it also specifies different parameters which are needed for Alluxio to run in Amazon EC2.

B

So once the config map is deployed, the next thing that we do is create a journal volume.

B

Let me just delete that first.

B

So the journal volume is used by the Alluxio masters to store any persistent state, such as the metadata for the file system cluster, regardless of whether the Alluxio master pod restarts or not.

B

Now, once the configuration and the volume have been deployed, what we do next is create the Alluxio master.

B

If you look at the state of the deployment, we see that we have an Alluxio master pod running, and it has two containers running inside.

B

Now, once the Alluxio masters are up, we can also launch the Alluxio workers. In this case, we use DaemonSets for the Alluxio workers. Using a DaemonSet means that an Alluxio worker is launched on every single node in the cluster.

B

Now, once the Alluxio master and workers are running, we exec into the Alluxio master container.

B

And now we can access the Alluxio cluster using the CLI. So if I type in alluxio fs ls, we can see the contents of the Alluxio cluster.

B

So default_tests_files is something that was present in the S3 bucket which was mounted at the root of the Alluxio file system namespace.

B

So, in addition to that, in the demo we'll be accessing a two gigabyte file which is in the bucket that I just mounted at the location /s3a. What this means is that, if you look at the Alluxio file system tree, at the location /s3a we'll have access to the bucket that I just specified on the highlighted line. So any contents of that demo public S3 bucket are now accessible in the Alluxio namespace at the location /s3a/data.

B

So at this location, as you can see, is a two gigabyte file. The annotation PERSISTED means that the data is present in Amazon S3, and zero percent means that none of the data is cached in Alluxio at the moment.

B

Okay, so this tab that I have opened on the right is something we'll use to run Spark in Kubernetes. We have a Docker image for running Spark deployed on this cluster. The Spark image contains the Alluxio client jar, and this can be used by the Spark driver and executors to interface with Alluxio.

B

Okay, so running Spark on Alluxio is as simple as a spark-submit job. When we run this job, we just specify some configurations needed to access Alluxio. As you can see, we specify that we will access the Alluxio master, and we'll access the data at the location /s3a/data, which is the two gigabyte file.
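
The exact command used in the demo is not visible in the transcript, so the following Python sketch only shows the general shape of a spark-submit invocation against Kubernetes with an Alluxio input path. Every concrete value (API server URL, image name, service name, jar path, main class) is a hypothetical placeholder, and port 19998 is assumed to be the Alluxio master RPC port.

```python
import shlex
from typing import List


def spark_submit_args(
    k8s_api_url: str,
    image: str,
    alluxio_master: str,
    input_path: str,
    app_jar: str,
    main_class: str,
    executors: int = 2,
) -> List[str]:
    """Build a spark-submit argument list that reads its input through Alluxio."""
    input_uri = f"alluxio://{alluxio_master}:19998{input_path}"
    return [
        "spark-submit",
        "--master", f"k8s://{k8s_api_url}",
        "--deploy-mode", "cluster",
        "--name", "spark-alluxio-demo",
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,
        input_uri,  # the application receives the Alluxio path as its argument
    ]


if __name__ == "__main__":
    args = spark_submit_args(
        k8s_api_url="https://kubernetes.example:6443",        # placeholder
        image="example/spark-alluxio:latest",                 # placeholder
        alluxio_master="alluxio-master",                      # assumed service name
        input_path="/s3a/data",
        app_jar="local:///opt/spark/examples/jars/app.jar",   # placeholder
        main_class="com.example.LineCount",                   # placeholder
    )
    print(" ".join(shlex.quote(a) for a in args))
```

The Alluxio master is addressed by its Kubernetes service name rather than a host IP, so the job keeps working if the master pod moves to another node.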

B

Actually, let me just make a quick modification there to specify the correct service.

B

Once the container finishes, we should be able to see the logs too.

B

Okay, so it looks like we are a little short on time, and I'm also running into some issues with the demo. Since we have limited time remaining, I would like to wrap up the remaining presentation, and I'll just walk you through what would have happened in the demo if it was working. What we saw so far was that Alluxio was deployed on the Kubernetes cluster.

B

We had a single Alluxio master pod and a set of Alluxio workers. The workers store the actual data, and the Alluxio master stores the metadata. We mounted an S3 bucket, which we attempted to access through Spark. The first access through Spark would have cached the data, and any subsequent accesses through Spark would show you a performance boost, because the data is now available locally on your Kubernetes cluster.

B

So in the interest of time, let me just wrap up, and we can come back to the demo if we have any time remaining towards the end.

B

In this talk, we gave an overview of what data orchestration is. Alluxio acts as an abstraction layer for accessing data from multiple storage systems such as Amazon S3 or HDFS. It enables access to data in the Kubernetes cluster, regardless of the location of the data. We ran through a guide to deploying and managing Alluxio in Kubernetes using the declarative YAML specification, and we also showed you a demo of running Alluxio in Kubernetes.

B

So in case you have any questions left after the session, feel free to reach us on the community Slack. The Slack address is alluxio.io/slack. Feel free to find me there; my name is Adit Madan. I'll open the floor to any questions now.

A

Awesome, Adit, thank you very much for the presentation. The demo was cool.

A

So, right, there were just a couple of questions in the Q&A window. Some of them have been answered already: of course, Alluxio is open source software and is under the Apache license. And Farzad asked a question: does Alluxio remove the space assigned to a container once the container is not running or is removed? So, does it clean up the disk after the container is gone?

B

So, if you're talking about the storage space that Alluxio uses to cache the data on: once the Alluxio container is gone, Alluxio would clean up after itself and any storage would be removed. But we also have the option of running Alluxio with persistent volumes, so in case you want to preserve the data, that is also a viable alternative.

A

Okay, I think just now another question popped up, from Vishnu: the master and slaves architecture looks similar to Ceph. So the question is about the similarity to Ceph and

A

the issue of rebuilding the master specifically. So how does it compare to Ceph? Is the similarity there?

B

So the architectures of Alluxio and Ceph are a little different. The way that we rebuild the Alluxio master is by depending on persistent volumes. In the configuration that we have, any metadata for the Alluxio cluster is stored in persistent volumes, and once a master goes down and is brought up on a different node, or once a secondary master is started, the state of the Alluxio cluster can be rebuilt from the persistent volume.

B

So there are no issues in rebuilding the state of the Alluxio cluster. This is something that we have worked on extremely hard recently, and it definitely does not have any issues that I'm aware of. In case there is any specific issue or specific kind of issue that you have in mind, I would love to hear from you. Please get in touch with me on the Slack channel or with a follow-up question.

A

I see. So you're saying that, of course, the masters of Alluxio are protected by the same mechanisms, whether they are StatefulSets or Deployments under Kubernetes. Is this what you're trying to say? Yeah, okay. Another question is also from Farzad.

A

If the container starts after long hours of downtime, should the data be preserved somewhere so the container can start where it stopped? So, if the container is stopped and restarted, is the data preserved, I suppose.

A

I think that is the question, yes.

B

Like I mentioned before, for the data itself, there are various ways in which you could preserve the data across restarts of the worker pods. Let's say for the Alluxio workers, the data could be stored in volumes of type emptyDir, which is lost on restart, but Alluxio would still be able to access the data from the underlying storage system. So, let's say Alluxio was acting as an abstraction layer between your compute applications and an HDFS cluster.

B

Even if you restart the Alluxio workers and you used volumes of type emptyDir, which is cleared on restart, Alluxio would still be able to fetch the data from your HDFS cluster. If that is not an acceptable alternative, what you can do is provision the Alluxio workers with persistent volumes, and regardless of how long your clusters were stopped for, the persistent volumes can be used to recover the data.

B

So Alluxio can store data in different tiers. Memory is just one tier, which is lost on restarts, but Alluxio can also manage data in SSDs and HDDs, or any other persistent storage that is available on your Kubernetes cluster, and that will be preserved across restarts of the Alluxio processes.
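The restart behavior of the different tiers can be modeled in a few lines. This is a toy model under stated assumptions, not how Alluxio is implemented: `TieredWorker`, the tier names, and the paths are hypothetical, and the underlying storage system is simulated with a dictionary.

```python
class TieredWorker:
    """Toy worker that caches blocks in named tiers above an under store."""

    def __init__(self, under_store, tiers):
        # tiers: list of (name, survives_restart) pairs, fastest tier first
        self._under = under_store
        self._tiers = {name: {} for name, _ in tiers}
        self._survives = {name: keep for name, keep in tiers}
        self.under_reads = 0  # counts fetches from the underlying storage

    def _fetch(self, path):
        self.under_reads += 1
        return self._under[path]

    def cache(self, tier, path):
        """Load a block from the under store into the given tier."""
        self._tiers[tier][path] = self._fetch(path)

    def read(self, path):
        """Serve from the first tier holding the block, else from the under store."""
        for blocks in self._tiers.values():
            if path in blocks:
                return blocks[path]
        return self._fetch(path)

    def restart(self):
        """Volatile tiers lose their contents; persistent tiers keep them."""
        for name, blocks in self._tiers.items():
            if not self._survives[name]:
                blocks.clear()


under = {"/a": b"A", "/b": b"B"}
worker = TieredWorker(under, [("MEM", False), ("SSD", True)])
worker.cache("MEM", "/a")
worker.cache("SSD", "/b")
worker.restart()

reads_before = worker.under_reads
from_ssd = worker.read("/b")      # still cached on the persistent tier
reads_after_ssd = worker.under_reads
from_under = worker.read("/a")    # memory copy is gone, fetched again
reads_after_mem = worker.under_reads
```

After the restart, the block on the persistent tier is served without touching the under store, while the block that lived only in memory is still readable but has to be fetched again.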

A

That's interesting. Okay, I think I have only one last question from myself: is it possible to run Alluxio in a multi-cloud environment, so to say, having the storage in S3, but then providing that storage to a cluster running in Azure cloud?

B

Yes, that is definitely possible, and Alluxio brings the value of not locking you into a specific cloud provider. So if you have data available in Amazon S3 and you're running your compute on a Kubernetes cluster on Amazon resources today, you can just as easily migrate your Kubernetes cluster to a Google cluster and still access your data, which is present in S3. The first access would have high latency, since Alluxio

B

has to fetch data from a cluster which is not on the same resources, but for any subsequent accesses, you will still be able to run Kubernetes on a Google cluster that accesses your data from Amazon.

A

Interesting, very interesting.

A

Let's see, we have around three minutes left, if you can take a very last question. Oh, this seems to be answered already. Well then, in that case, I will thank you very much, Adit. As you can see, there's a Data Orchestration Summit in Mountain View in November. You can register; we also posted the link in the chat. And I would like to thank Alluxio and Adit for the great presentation, and I hope you had a good time.

A

Thank you for joining us. The webinar and the recording will be shared online later today. We look forward to more Cloud Native Computing Foundation webinars, and to everybody, have a great day. Thank you.