Description
Lightning Talk: Hybrid/Multi Cluster Storage with Erin Boyd of Red Hat.
Filmed on October 28th 2019 in San Francisco.
Thank you, Diane. My name is Erin Boyd; I'm a senior principal engineer at Red Hat. I work in the Office of the CTO, and currently I'm working on hybrid and multi-cluster storage, and in particular the multi-cluster story we have coming out of Red Hat. Thank you, Diane, for setting this up.
I'm talking about hyper-converged infrastructure. That's one of our main focuses, and it's a powerful message for why we entered the realm of hybrid and multi-cluster storage.
So why would you want a hybrid setup, or even multiple clusters? Well, especially in terms of AI and ML, you might have performance considerations where you want to run some of your workload on a very specific cloud. You need fault tolerance; you want to back things up, maybe between different zones. And maybe you need specialized hardware to run some of your workloads.
The problem with that, though, is that if you happen to choose one vendor for all of those services, you're locked in. Then there's the question of regulation, and of wanting to collaborate. So you want to be able to run your workload wherever you have those services, wherever they might be: they might be in GKE, they might be in AWS, and due to regulation you might need to run them on-prem.
So those are the considerations where you want to look at hybrid cloud, or at multiple clusters within your Kubernetes deployment, to facilitate running your workload where it runs best. So with that: there are lots of different applications, of course, in AI and ML, and how do we share data between them? The typical four things we share are an object store, a database, a file system, or a queue.
Almost every slide I've seen today has shown evidence of that: I've seen Postgres, I've seen Kafka, I've seen Splunk. I've seen a lot of these already, showing that this is what we're using, and this is how we use them between clusters. Today I really want to focus just on object storage and on what we're doing that's a little bit different in the community.
So object storage is convenient in an AI/ML setting in that you can have a bucket in something like S3 and have many different applications either feeding into or reading from that bucket. It's pretty easy to use the API, lots of different cloud vendors support that API, and you can easily write your application to get and put objects from that bucket and to apply policies.
So within that, you're able not only to avoid being tied to the particular S3 version you're using on AWS; you're able to abstract that out and use NooBaa as an interface to any S3 provider. On the backend, NooBaa also provides the ability to dedupe the data and to move sections of the data over, so you basically have your replication, snapshotting, and backup as well. All of this is still tied into Kubernetes, because, as I mentioned before, we don't really have a consistent API in Kubernetes for object storage.
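As a rough sketch of how that kind of cross-provider placement can be declared, here is what a NooBaa mirroring policy might look like. The resource kinds follow the NooBaa operator's CRDs, but the names and backing stores below are hypothetical, illustrative values, not taken from the talk:

```yaml
# Hypothetical example: a NooBaa BucketClass that mirrors objects
# across two backing stores (e.g. one on AWS, one on-prem).
# The backing store names are made up for illustration.
apiVersion: noobaa.io/v1alpha1
kind: BucketClass
metadata:
  name: mirror-to-two-clouds
spec:
  placementPolicy:
    tiers:
      - placement: Mirror
        backingStores:
          - aws-store        # hypothetical BackingStore on AWS S3
          - on-prem-store    # hypothetical BackingStore on local storage
```

Buckets created against such a class would then have their data written to both stores, which is where the replication and backup behavior mentioned above comes from.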
How many of you use persistent volumes and persistent volume claims today? Yeah, a lot of people are very familiar with those. So what my team at Red Hat did is come up with a CRD that allows you to create an ObjectBucket and an ObjectBucketClaim. These concepts are just like your PersistentVolume and PersistentVolumeClaim, but specific to the particular needs of object storage, and this way it provides a consistent control path, so that I can create an object bucket claim just like I create a persistent volume claim.
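As a minimal sketch of that control path, an ObjectBucketClaim is requested much like a PVC. The claim name, storage class, and image below are assumptions for illustration; the ConfigMap and Secret consumption reflects how the OBC provisioner exposes the bucket coordinates and credentials to a workload:

```yaml
# Hypothetical ObjectBucketClaim; the storage class name is an assumption
# and depends on which object-store provisioner is installed.
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ml-training-data
spec:
  generateBucketName: ml-training-data
  storageClassName: noobaa.io-storage-class   # illustrative value
---
# The provisioner creates a ConfigMap and a Secret named after the claim;
# a workload picks up the bucket endpoint and credentials from them.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # hypothetical image
      envFrom:
        - configMapRef:
            name: ml-training-data   # BUCKET_HOST, BUCKET_NAME, BUCKET_PORT
        - secretRef:
            name: ml-training-data   # AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
```

The application then talks plain S3 to whatever endpoint the claim was bound to, which is what keeps the control path consistent across providers.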
So what if I'm not using object data? What if I want to keep my application the way that it is, like all of you who raised your hands when you said you're using persistent data? Rook is another great open source project that allows you to automate the installation of things like Ceph or MinIO, and soon Longhorn. What it does is provide an operator to deploy the components, take the complexity out of the storage, and provide this consistent backbone across many different clusters.

I believe in the Discover talk we heard earlier today, they were talking about using shared storage, using things like NFS or EFS. Rook also now provides a plugin for EFS, so then you have the shared storage. So even if you're not using object data, you can still use the power of an operator to deploy your storage system consistently across many different clouds.
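To give a flavor of that operator-driven deployment, here is a minimal cluster definition of the kind the Rook operator acts on. This is a sketch with illustrative values (image tag, mon count, device selection); real manifests should come from the Rook project's examples for your version, and the operator and CRDs are assumed to be installed already:

```yaml
# Hypothetical minimal Rook CephCluster; assumes the Rook operator and its
# CRDs are already running in the rook-ceph namespace.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v14   # illustrative image tag
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3               # three monitors for quorum
  storage:
    useAllNodes: true      # let the operator discover nodes
    useAllDevices: true    # and their raw devices
```

Applying the same small manifest on each cluster is what gives you the consistent storage backbone across clouds, with the operator handling the Ceph-specific complexity.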
So what are we doing in the community? I'm also part of the Kubernetes Storage SIG and the CNCF Storage SIG, and the community is working to enhance how we use data more agilely. Things like snapshots, cloning, and volume transference are coming out soon, to be deployed and used to leverage the ability to have hybrid cloud. Because, as you know, Kubernetes has always claimed: we're completely stateless, we're agile, we can move anywhere; and then that all fails when we start talking about persistent storage.
So look forward to many of these features helping improve the way we can manage our data within these systems. And then lastly, I think this is great: this actually ties back to what Monty was talking about. When you have hybrid cloud, and when you have these challenges of managing different clusters, you need consistent administration across all of it. When you talk about your applications, you have to have the network; if you don't have the network, you don't have the ability to have distributed storage.
It simplifies administration and allows you to enforce things like quotas, which I believe Discover was talking about too, and then apply those policies across all of them. So with that, hopefully I came in under five minutes.

They did an excellent job of that. Thank you very much, and so we're going to get Kyle to come up.