From YouTube: CNCF SIG Storage 2021-02-10
A: Okay, I think we should start. Today we have two main items on the agenda. The Project Vineyard team are going to follow up on the presentation they gave us at the last meeting with a quick demo, and then we'll move on to further technical discussion of Raphael's DR document, which has been coming along nicely. So with that, I'll hand over to the Alibaba team for the Project Vineyard demo.
B: Thank you, Alex. Hello, everyone. My name is Wenyuan Yu and I'm from Alibaba DAMO Academy. Today my colleagues Andy and Tao and I are going to demonstrate our system as a follow-up to the last presentation.
B: Andy, can you share the screen? Yeah. Vineyard is an in-memory immutable data manager that provides out-of-the-box high-level abstractions and zero-copy in-memory sharing for distributed data in big data tasks. If you want to know more technical details, you can refer to our repo, and you can also look at the deck from the last SIG meeting. I will first introduce the big data task.
B: We are going to demo it, and then I will hand the microphone to my colleague Andy, and he will then walk through running it on Kubernetes as well. Okay, the task we focus on is a simplified version of what we do at Alibaba for fraud detection.
B: Here a fraud transaction indicates that a customer deceptively purchased an item, hoping to inflate the item's rating. In the demo example, we use pandas, and Mars for distributed processing, to prepare the dataset, and then use PyTorch to train a fraud-transaction classifier, and we integrate Vineyard with these two systems.
B: The data we use actually comes from a real business scenario, typically an application designed to handle large data, but today, for demo purposes, we will just use small data to speed up the process. You can imagine this kind of data living in HDFS or something similar. The schema of the data is as follows: "item" is something like the items listed on e-commerce websites such as Amazon or Alibaba.
B: The first two columns are the user IDs and item IDs, and the third column is a label used to indicate whether the transaction is labeled as fraud or not. We use this label for training. It's followed by a few transaction features used for the actual classification. We join those two to three tables together to get a very wide table, with lots of attributes and features, for the transactions.
B: Let's look at the data. Sorry, the code. A simple single-machine version uses pandas: we load the data as pandas DataFrames, and we use a join to expand them and get a new DataFrame called the dataset, and then combine_features does this, and then we can go on. Can you scroll down, please?
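[A minimal sketch of the single-machine preprocessing just described; the file names and join keys below are assumptions for illustration, not the demo's actual code:]

    import pandas as pd

    # Load the three tables (transactions, users, items) as pandas DataFrames.
    transactions = pd.read_csv("transactions.csv")
    users = pd.read_csv("users.csv")
    items = pd.read_csv("items.csv")

    # Join the transactions with the user and item feature tables to get one
    # very wide table of features per transaction, as described above.
    dataset = transactions.merge(users, on="user_id").merge(items, on="item_id")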
B: Yeah, and then, sorry, okay: we use the make_datasets function to convert those pandas DataFrames to NumPy arrays and then into a PyTorch tensor dataset, and then we import PyTorch. It's just a simple one-layer network for classification, taking the data as input.
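[A hedged sketch of that step, assuming hypothetical column names and a two-feature input; the demo's actual make_datasets may differ:]

    import pandas as pd
    import torch
    from torch import nn
    from torch.utils.data import TensorDataset

    def make_datasets(df: pd.DataFrame, label_column: str = "label") -> TensorDataset:
        # Convert the pandas DataFrame to NumPy arrays, then wrap them as a
        # PyTorch TensorDataset of (features, label) pairs.
        labels = torch.tensor(df[label_column].to_numpy(), dtype=torch.float32)
        features = torch.tensor(df.drop(columns=[label_column]).to_numpy(),
                                dtype=torch.float32)
        return TensorDataset(features, labels)

    dataset = make_datasets(
        pd.DataFrame({"f0": [0.1, 0.9], "f1": [1.2, 0.3], "label": [0, 1]}))

    # A simple one-layer network for binary (fraud / not-fraud)
    # classification, sized here for the two feature columns above.
    model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())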
B: But what if those datasets are large? Let's just imagine they are too large to be handled on a single machine.
B: We need to do distributed processing. In this case we use Mars. It's also a project from Alibaba; you can think of it as a parallel or distributed NumPy, pandas and scikit-learn package. Okay, let's go back to the code. The code itself looks pretty similar to the pandas version you saw, but there are a few notable differences.
B: First, the DataFrame here is a Mars DataFrame; it is different from the pandas DataFrame here.
B: Mars will slice the actual DataFrame into multiple chunks and distribute the results onto multiple nodes and machines, and Vineyard provides the I/O ability for Mars, because Vineyard not only supports local data I/O, but also supports I/O from remote HDFS and various other data sources. And for the actual joins, let's see. Andy, can you select the, yeah. The actual joins are quite similar to pandas, except that for Mars...
B: ...Mars will actually do the joins of those chunks and reshuffle the data if necessary. In this script we're only taking care of the pre-processing part; we just output a Vineyard object as a global DataFrame. We will come back to the PyTorch part later.
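[A minimal sketch of the distributed version, relying on Mars mirroring the pandas API; file names and keys are again assumptions:]

    import mars.dataframe as md

    # Same code shape as the pandas version, but these are Mars DataFrames,
    # sliced into chunks and distributed across the nodes of the cluster.
    transactions = md.read_csv("transactions.csv")
    users = md.read_csv("users.csv")
    items = md.read_csv("items.csv")

    dataset = transactions.merge(users, on="user_id").merge(items, on="item_id")
    dataset.execute()  # trigger the distributed join / reshuffle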
C: Thank you, Wenyuan, and thanks, Tao, for the code demo. Okay, now let's first check our environment. Here we have a Kubernetes cluster with eight nodes, and there are no pods running. Now we install Vineyard with Helm.
C: Then we run Mars. Now, oh, okay, Vineyard is now running as a DaemonSet on the Kubernetes cluster. Okay, now we can run Mars.
C: Mars is running on nodes 192 and 193. Now we check the pods; we can see that the Vineyard pod on node 192 is this one, so we log in to this pod to get a chunk and see exactly what the chunk looks like. We first import vineyard, then we establish a client connecting to the IPC socket of vineyardd.
C: Then we can get the chunk. We will choose a chunk lying on this node. Okay, now we get the chunk. Actually, the chunk data is mapped from the vineyardd process into the Python process with shared memory, in a zero-copy fashion, and the chunk data is automatically resolved into a pandas DataFrame, since we have already registered the pandas DataFrame resolver with Vineyard. For the details about the memory mapping and the resolver-registration mechanism...
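[A minimal sketch of the client steps just described, using Vineyard's Python API; the socket path and object ID are illustrative assumptions:]

    import vineyard

    # Connect to the local vineyardd instance through its IPC (UNIX domain)
    # socket; objects are shared into this process via mmap, zero-copy.
    client = vineyard.connect("/var/run/vineyard.sock")

    # Because a resolver for pandas DataFrames has been registered, get()
    # returns the chunk directly as a pandas DataFrame.
    chunk = client.get(vineyard.ObjectID("o000d1b5e15ebe4d2"))
    print(chunk.head())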
C: ...please refer to the Vineyard documentation. And now we try to get another chunk that is not located on 192, for example, this one. Okay.
C: Now we find that we can't get that chunk. This means that when we try to get chunks from vineyardd, we can only get the local chunks.
C: Now let's look at the PyTorch code; that will be the next step we are going to run. Okay, compared with the single-machine version, the make_dataset function, which here builds from Vineyard, is different. Okay, here we have to first connect to the Vineyard IPC socket and get all the local chunks, and we can concatenate these chunks into a merged DataFrame. Then the rest is just the same as the single-machine version. Okay.
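[A hedged sketch of that step; get_local_chunks is a hypothetical stand-in for however the demo lists the chunks of the global DataFrame stored on this node, since the IPC client only sees local chunks:]

    import pandas as pd
    import vineyard

    def get_local_chunks(client, global_df_id):
        # Hypothetical helper: return the vineyard.ObjectIDs of the chunks
        # of the global DataFrame that live on this node (placeholder body).
        return []

    client = vineyard.connect("/var/run/vineyard.sock")
    # Hypothetical ID of the global DataFrame produced by Mars.
    global_df_id = vineyard.ObjectID("o000d1b5e15ebe4d2")

    chunks = [client.get(oid) for oid in get_local_chunks(client, global_df_id)]

    # Concatenate the local chunks into one merged pandas DataFrame; from
    # here the training code is identical to the single-machine version.
    merged = pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()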
C: So when we deploy the PyTorch pods on Kubernetes, what if the pods are scheduled to nodes that don't have any chunk of the data? Then PyTorch can't get any data from Vineyard. So let's look at the StatefulSet YAML of our PyTorch job.
C: Okay, let me find it. Okay.
G: Can I ask a question while we're waiting? Okay. Andy, can you describe your memory allocation, how that's going on, what kind of rules you're using, how that global memory is getting allocated, and how it's decided whether there's enough to do the application, and things like that?
C: Okay, currently we don't have a very sophisticated strategy here. Right now, if the memory is not enough when we create the object, we will return a message to the client to tell them that the memory is not enough.
B: Yeah, let me share a bit more on this. Actually, with Helm we need to specify how much memory can be used, in the Helm command, and after Vineyard is deployed, the memory consumed by Vineyard is fixed; it's no larger than eight gigabytes in our demo case. But when launching the Vineyard DaemonSet you can specify how much memory you want to give Vineyard.
B: Yeah, yeah, you can show us the spec, the StatefulSet YAML.
C: Okay, so I just described the case where the pod is not scheduled to the place where the chunks are located. In this case we add the Vineyard migration to the init container, and this is a function provided by Vineyard to migrate the data chunks when the chunks are not located on the same node as the pod. Okay, so now let's run the PyTorch job.
C: Okay, now we check the pods.
C: We can see that these two PyTorch pods are scheduled to nodes 188 and 187. They are not on the same nodes as the chunks, which are located on 192 and 193. So let's check the logs of worker 0.
A: Just a quick question there. So effectively you have a set of source data which then gets sharded across a number of nodes, and then you can start your workload that consumes that data on any node within the cluster, and it will pull the relevant segments to the node where the job is running, right?
H: I have a question about the design. Why did you decide to have an explicit step to pull the relevant shards, as opposed to maybe just letting the client, in this case the Python application, say "I want this global shard", and having Vineyard locate the chunks it needs and do the migration on demand, instead of doing an explicit init-container step to do the shard migration or replication?
B: I can explain. In our case, those tensors need to be used immediately after the process is launched, just because in this case we first need to convert those DataFrames to tensors, and we need all the DataFrames, or at least a whole chunk, to get the job started.
B: We do have a mechanism for pipelining in Vineyard, using streams, which we didn't demo today. In that way we can do some pipelining: the streams are just chunk streams, a sequence of chunks, and you can consume a chunk at a time, and we can organize that as a stream, so we only need to migrate one chunk at a time in that case. So yeah, it's just the data.
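[A purely hypothetical sketch of the chunk-stream pattern described here, not Vineyard's actual stream API:]

    def consume(stream, process):
        # `stream` is a hypothetical chunk-stream handle: next_chunk()
        # blocks until the producer appends the next chunk, and returns
        # None at the end of the stream. Only one chunk needs to be
        # resident (or migrated) at a time.
        while (chunk := stream.next_chunk()) is not None:
            process(chunk)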
E: Just a quick question, and I'm learning from this: the data frames are on certain nodes, is that correct? Yes, okay. And then, as the application, the Python application, consumes them, would it be beneficial to have the Python application hint this in the pod specification?
C: Okay. Okay, thanks. Okay, so this new global DataFrame has chunks located at 192 and 193 as well. Okay, now, this time, before we run PyTorch, we first install the scheduler plug-in into Kubernetes. Okay, yeah.
C: Okay, later we will inspect the logs from this scheduler plug-in to see how we schedule the pods. Okay, now let's go back to the StatefulSet YAML.
C: So actually, when the scheduler is trying to schedule the pod, it will first get the Vineyard object ID from the pod spec, and then, since we are using a StatefulSet, each pod will have a rank. Based on the rank and the object ID, the scheduler can work out which chunks are required by that pod. Then it will inspect the location info from the CRDs to know which nodes have those chunks.
C: ...objects, and there are no replicated local objects, because no data migration happens. So here, the data migration and the scheduler plug-in are just some of the functionalities provided by Vineyard on Kubernetes, and we want to integrate the abilities of Kubernetes to achieve advanced functions like checkpointing, fault recovery and so on. We hope we can build a new cloud-native way of building and running big data applications on Kubernetes. Thanks.
A: This was really informative. It really helped me visualize a bit more how Vineyard works and some of the architecture, because I think it's a complicated concept.
B: You mean about Spark? Oh yeah, it's quite different. Actually, the Vineyard interface is very low level.
B: It's actually the shared memory. You can use any language of your choice, and the runtime and execution model of your choice, to build big data applications on top of Vineyard: you can use a dataflow system like Spark, or you can use MPI jobs, or OpenMP parallel computation, or you can even use some single-machine algorithms. So you can choose everything. But Spark actually has a fixed programming model, a fixed communication model, a fixed running mode.
A: Just to summarize that, am I getting this right: would it be fair to say that Vineyard is almost like an in-memory kind of MapReduce capability which can plug into multiple apps? Can multiple apps be accessing those same shared-memory chunks in parallel, when you have a multi-step pipeline for your analytics, for example?
B: Yeah, yeah, that's correct, basically.
B: Yeah, the flexibility is what we aim for. For example, there are some HPC tasks using MPI, where the application itself is an MPI job; it's very hard to migrate that job into, say, a Spark pipeline, and getting rid of the JVM completely is very hard. So Vineyard basically doesn't provide any computation support; you choose your way of communication. You can use RDMA, you can use some custom tunnel, you can use the common TCP socket.
B: It's your call. Vineyard doesn't enforce any execution model either; for example, if you want to run on TPU, it's fine, it's your call. So basically, Vineyard is just a memory storage engine that can provide cross-process memory sharing.
A: Did anybody else have any other questions for the team?
B: Okay, thank you. Thank you, Eric. We would like to submit our project to the Sandbox, as we mentioned in the last meeting, and we are wondering if there is any way we can get feedback from the SIG Storage community in some form. I'm not sure whether that's necessary, but yeah, we would really like some feedback here.
A: So, I think Amy's on the, or I'm not sure if Amy's still here, actually; she might have dropped off. So, strictly speaking, a SIG recommendation isn't needed for the Sandbox submission. But what we can do is we can provide...
A: ...you know, a little snippet of a comment, to kind of say that you have presented, and we think it's interesting and worth going into the Sandbox, and also provide a recording of the SIG meeting, which you can use in your Sandbox submission, because the Sandbox submission is effectively just a form that you fill in, and then the TOC review it at their next meeting.
A: The next thing we had on the agenda was to follow up on the discussion around the disaster recovery document that Raphael has been focusing on and contributing to. Just so that everybody is on the same page, I will...
A: So we could definitely do with more feedback and, you know, comments, as we continue to refine and bring more ideas into the document.
A: We had a discussion about documenting more clearly some of the advantages of cloud-native disaster recovery as a general concept: in terms of, for example, having standardized versions of all of the software in the Kubernetes clusters, by virtue of the fact that everything's containerized; standardized versions of deployments and configuration through the use of YAML; and being able to have declarative and composable application deployment and storage.
A: And that kind of leads to a number of advantages that we saw. So, for example, it dramatically simplifies testing failover and DR processes, and it dramatically simplifies keeping multiple clusters in sync, because they're all built on the same config. So, Rafael, have I captured that thought process? Was there anything else in relation to that?
A: ...with perhaps their customers, or anything like that, whether there were any other things that maybe we can focus on in terms of the advantages of DR in cloud native?
J: I can talk about that a little bit. It's quite challenging to say what counts as an outage, especially in cloud environments, because in clouds there are different services, like load balancers and compute, and "outage" means different things in each context, especially for something like a zonal outage. There is really no good way of defining or detecting outages, so what we usually end up with is some arbitrary thresholds.
J: As in, if x percent of nodes are down, we would consider it a zonal outage. And obviously the semantics of an outage also mean different things depending on whether you're looking at it from the perspective of infrastructure versus application. From an application's point of view, applications like etcd or distributed data stores do replication, and they obviously have a different view of what an outage means, versus trying to look at it just from the infrastructure's point of view.
H: I think what we're trying to do with this document is to provide guidance that abstracts from the underlying root cause, or possible root cause, of an outage, and just gives you guidance on how to organize your stateful workloads to survive an outage, whatever the root cause is, right? We want to be able to do it without human intervention, like I was saying, without losing state, or, you know, having zero RPO and the lowest RTO possible.
A: I think it's still useful, though, to capture the things that are hard to do as well, right? Because, on the one hand, having a coordinated cluster-recovery pattern defined is important, and there are a lot of pros, but obviously, from an operational point of view, things like defining what constitutes an outage, what the thresholds for failover are, and the differences between degraded service versus an outright failure of a zone, are probably worth articulating.
J: I mean, it all depends on how abstract we want to keep the document, or how generic you want to keep it. I haven't had a chance to read the whole thing in detail, so I don't know exactly what your goal, or larger goals, are: if it's just providing some basic level-setting for people, so they can be familiar with the concepts, or if we're actually trying to come up with a set of guidelines so people can build solutions on top of them.
H: It's the latter that you said. Let's do this: if you're willing to contribute, take your time to read it and maybe drop comments, and maybe we can also collaborate.
A: Well, a deck to summarize would always be extremely valuable, and we can also use that content, perhaps, to create a blog post in future, or a webinar or something like that, because we have done webinars through the CNCF as well in the past, so that sort of thing would definitely be useful, I'm sure.
D: Raphael, it's Aaron. I think it's good to give a little bit of history. Is this the same doc that has kind of been in progress for a while, from when I was still there? In that, I think it's really hard for people who are adopting Kubernetes in a not-completely-cloud deployment to understand the complexities of running it on-prem and on cloud, and of taking our traditional storage concepts and trying to apply them to cloud native. And I think that's...
D: Maybe the goal of this is to take what people are normally used to as far as their RTO and RPO in terms of storage, and what that looks like if we run that in cloud, and how we can achieve that capability. And to Arslan's point, we have to have replication. I think some perspective is also needed on the costs associated with that, because that seems to be a concern when I talk to a lot of different people about deploying things this way, you know, at huge scale.
D: What are the benefits and possible drawbacks? It would be helpful if we also outlined those for people. And, Arslan, what do you mean by, like, you know, being specific? Are you talking about creating use cases for each one of the clouds, and how we know that they work, or trying not to be specific?
J: So I think, I mean, it kind of depends on how we look at it. So, for example, you know, I work at NetApp, and in storage people usually talk about five nines or six nines of availability, an RTO of a few seconds; the way we talk about high availability is somewhat different from the cloud world.
J: Whereas, for example, if you look at the SLAs of AWS or GCP, they usually talk about something like 99.5 or 99.95 availability, and if you're building a cloud-native solution, unlike on-prem storage solutions, where you can control the whole stack and networking, it's a different kind of beast in the cloud world, right? So I think some of it is a cultural thing.
H: You know, let's work together on that, because, sure, I think you will find it interesting. It was an interesting insight for me, as I was doing this research, in what I call cloud-native DR, which is defined in this document, right, and you don't have to agree: in what I call cloud-native DR, disaster recovery, the features, the capabilities that enable disaster recovery don't come from storage.
H: Storage still has to be there in an important role; we still need backups for logical failures, but not for DR. We probably don't need volume replication in this new world, because the responsibility of keeping the state in sync belongs to the application.
A: I just want to make an important point, because I feel we're in danger of going off on a dangerous tangent. The point that we're trying to talk about here is not the capabilities of individual platforms; it's more about the architectural patterns that would be implemented.
A: The architectural patterns that we're discussing in this document are not linked to any specific cloud provider, or on-prem provider, or anything like that. In fact, it's more about: this is how you would engineer a system and cater for consistency, and make sure that data is available in multiple places, and then allow for failovers across different Kubernetes clusters, etc. That's kind of the concept of what we're talking about here.
A: We're not specifically saying at any point that you would do volume replication in a certain way, or database replication in a certain way, but more around the point of saying: you need load-balancing capability, and you need a middleware or a database layer that can do the replication, or you can have replication happening at the volume layer.
H: Right, thank you, Alex. Yes, in fact, from the internal documents that Aaron was referring to before, I stripped away everything that was Kubernetes-related or OpenShift-related. It's all very, very generic now; the pattern should work anywhere, whether you use machines on-prem or machines in the cloud, whether you use containers on-prem, wherever you are, this pattern that we are presenting should work.
H: Actually, I think it will work. And that's not to say, in this document, that it's the only way to do it; we are not saying that's the only way to do it.
D: I think it's ripe for discussion, to be perfectly honest. I'm part of the end-user group now as well, in the CNCF, and there is a lot of discussion around storage and durability and availability, and the consensus for most users of Kubernetes is that the storage is not, I don't think they used the word dependable, but they haven't had good success.
D: So I think it's also one of the topics I brought up with the TOC yesterday: I'd like to have a little more focus from them, as well as projects and ideas, on how we take these things and socialize them a little bit better, so that people can achieve the same things on-prem or in private cloud that they can do in cloud, so they have a consistent way of doing things. I think that's what we're trying to achieve.
D: I'd love to talk about these things, because I think I definitely have my old Red Hat perspective and now Apple's perspective, and I think it's interesting to see the two meld together. So I'd love to hear the diversity of thought around this, and what people are doing and customers are expecting. So, okay.
A: No, no, no, so look, we're coming up on time. I just wanted to say: these are all good points, but we should make sure that we contribute to the document and, if need be, add paragraphs, add comments, and we can then further discuss and merge them together, and we can have separate follow-up calls if we want to discuss any particular points in a bit more detail.
H: No, I'm okay. Aaron, I'll contact you privately so we can continue that. That's good.