Ceph CDS Infernalis, 4 Mar 2015

From YouTube: CDS Infernalis (Day 2.2) -- OSD: Tiering

Description

Videos from Ceph Developer Summit: Infernalis (Day 2.2)

04 March 2015

https://wiki.ceph.com/Planning/CDS/Infernalis_(Mar_2015)

A

All right, now we're on to the next one. This is a double whammy: the next take on tiering, as well as the dynamic data relocation for cache tiering. All things tiering. So Sam, do you want to start with yours, and then we can hear from the Intel guys on theirs as well? Sure.

B

Okay. Can you guys hear me? Yep. An oft-requested feature is the ability for the OSD to offload cold data, not necessarily to another RADOS pool in the same data center, but to something completely different and hopefully cheaper: for example, a cheaper, crappier RADOS cluster in a different data center, or, I don't know, S3 or something. The way the current cache tiering system works is that we pretty much just make librados calls to the next pool down, but in principle we could use really anything else to write the objects out.

B

We don't actually need it to be a RADOS pool. So the idea would be to create a plugin system where you could define a backing pool that hands off to whatever your specified plugin is, and the tiering agent (this wouldn't really be a cache tier, it would be more of a tier) would use this opaque plugin interface to actually do the object demotion. It would store within the RADOS tier a metadata object, a redirect, indicating which plugin was used to offload it and some information from the backing system about where it went. So I have here a very low-thought crack at what the interface might look like. The details aren't particularly important.

B

I think the only useful bits are that the interface is append-only and you don't get to define the name of the object; you get an opaque handle back. I think this might be useful to allow the backing tier to cram placement information into the handle it gives you. Let's see.

B

And similarly, read is stream-based with no seeking, so you promote a whole object or nothing at all. This may make it easier for backends to implement this interface efficiently. So.
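The interface described above (append-only writes, a backend-chosen opaque handle, and whole-object streaming reads) could be sketched roughly as follows. This is only an illustration: `InMemoryBackend`, `demote`, and every method name are hypothetical stand-ins, not anything from the blueprint or from Ceph.

```python
from typing import Dict, Iterator, List


class InMemoryBackend:
    """Toy stand-in for a backing store such as S3 or tape (hypothetical)."""

    def __init__(self, zone: str) -> None:
        self._zone = zone
        self._open: Dict[int, List[bytes]] = {}
        self._closed: Dict[bytes, bytes] = {}
        self._next_id = 0

    def open_for_append(self) -> int:
        """Start a new object; the caller does not get to choose its name."""
        token = self._next_id
        self._next_id += 1
        self._open[token] = []
        return token

    def append(self, token: int, data: bytes) -> None:
        """Append-only: there is no offset argument, so no overwrites."""
        self._open[token].append(data)

    def close(self, token: int) -> bytes:
        """Seal the object and return an opaque handle chosen by the backend."""
        payload = b"".join(self._open.pop(token))
        # The backend is free to embed placement information in the handle.
        handle = f"{self._zone}/{token}".encode()
        self._closed[handle] = payload
        return handle

    def read_stream(self, handle: bytes, chunk_size: int = 4) -> Iterator[bytes]:
        """Stream the whole object front to back; there is no seek."""
        payload = self._closed[handle]
        for start in range(0, len(payload), chunk_size):
            yield payload[start:start + chunk_size]


def demote(backend: InMemoryBackend, data: bytes) -> dict:
    """Write data out and return the redirect metadata the RADOS tier would keep."""
    token = backend.open_for_append()
    backend.append(token, data)
    return {"plugin": "in-memory", "handle": backend.close(token)}


backend = InMemoryBackend("dc2")
redirect = demote(backend, b"abcdefghij")
promoted = b"".join(backend.read_stream(redirect["handle"]))
```

Promotion here reads the entire stream back, matching the "whole object or nothing" rule from the discussion.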

C

You could make this so that the implementation, if it were RADOS, for example, gets to decide what the name is, but it actually chooses the name that you provided. Yeah.

B

I mean, for RADOS that part isn't important. This is more for a system where, for example, you don't have RADOS-style fancy dynamic object placement and you actually want to put something like the data center it's in into the name. I don't know.

B

That part is something I was hoping to get feedback on. Or in S3, it might indicate the, what do they call it, the region, zone, whatever.

C

You might hand it off to tape, and it gives you back which library, and which tape in the library, it ended up on.

B

Or might be that questions.

C

Yeah sorry I think systems like laws to also have that that model where you look back at NR was, is a camera company. This the one end but yeah I got the yeah yeah.

B

Anyway, yeah, it seemed there was no reason for us to impose a name. At least, I can't think of a good reason for us to impose the name on the backend, so we might as well let the backend choose the name. I also don't want these to be overwritable. Again, I think it would probably be easier for a system like a RADOS erasure-coded pool not to allow partial overwrites. So the objects are append-only, and they are immutable once closed.

B

So one thing that is a question for me: there are slow backends like S3, and then there are hyper-slow backends like Glacier or a tape bot. Do we care about extending this plugin interface to be capable of handling something like a tape bot? It would mean that RADOS would have to be able to propagate some kind of EINPROGRESS error code to clients, one that indicates "I'm working on it, check back later," because with Glacier it might take four hours to retrieve an object; I think I read that one time. A tape bot would similarly take time to get an object.

B

Well, I mean, this is another one of those things where I don't think we should move on it until someone gives us a good reason to either do it or not do it. That's a question we want a motivating use case for. Go ahead.

C

I think for a tape bot, I mean, the latencies are like tens of seconds to minutes. I think that's, you know, excruciating but tolerable. For a tape robot it wouldn't be hours. But I mean, if you have like an HSM-type system, where you have just random files in your file system that get archived off to tape, and then you go try to read them, users kind of expect that it's going to take a while.

C

They know that there's tape on the back end, so they'll just wait. And for the file systems that are sitting on top, it's not like the system call is returning EINPROGRESS and you retry to reopen the file. They just

B

Block. We have to return EINPROGRESS because we can't let requests build up on the OSD like that. I see. Oh, I see, okay. So, while this is in progress,

C

I'm going to return.

B

An error to the client. The client's free to block on whatever it wants to block on, but we can't have the messages sitting on the OSD.

C

So this would be extra exchanges between librados and the OSD, but the librados call itself would still be blocking? I

B

Think the librados call itself returns EINPROGRESS, and anything above that that wants to block can block. Yeah, okay. I don't want librados even keeping track of outstanding in-progress object promotions either. If the client thinks that's important, it can probably just do that. I mean, for something like this, polling on an interval of like tens of seconds is probably fine, and that's good enough. We don't need to keep enough state for a notify. That's my feeling.

C

Ok yeah, I would start simple, but but we can go there too. I.

B

Don't want to. It seems simple, except that we're building up all these requests. That happens enough on the OSD as it is. I think going forward, any designs we come up with should really minimize anything the OSD holds in memory, and it's not really that much more complicated. It just means that librados returns EINPROGRESS, and if the client library wants to poll, it can do that. Polling is easy to do anyway, right?
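A client-side polling loop of the kind described here might look like the sketch below. It assumes a hypothetical read call that returns a negative errno plus data; `read_with_polling` and `SlowTier` are invented for illustration and are not librados APIs.

```python
import errno
import time
from typing import Callable, List, Tuple

ReadFn = Callable[[str], Tuple[int, bytes]]


def read_with_polling(read_once: ReadFn, name: str, interval: float = 10.0,
                      max_attempts: int = 6,
                      sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Retry a read while the (hypothetical) call reports -EINPROGRESS."""
    for _ in range(max_attempts):
        rc, data = read_once(name)
        if rc == 0:
            return data
        if rc != -errno.EINPROGRESS:
            raise OSError(-rc, "read failed")
        # The client holds the wait state; nothing is queued on the OSD.
        sleep(interval)
    raise TimeoutError(f"{name} still in progress after {max_attempts} polls")


class SlowTier:
    """Toy backend that needs a few polls before the promotion completes."""

    def __init__(self, polls_needed: int) -> None:
        self.remaining = polls_needed

    def read_once(self, name: str) -> Tuple[int, bytes]:
        if self.remaining > 0:
            self.remaining -= 1
            return -errno.EINPROGRESS, b""
        return 0, b"contents of " + name.encode()


waits: List[float] = []  # records requested sleeps instead of really sleeping
result = read_with_polling(SlowTier(2).read_once, "obj1", sleep=waits.append)
```

The `sleep` parameter is injected only so the example runs instantly; a real client would use the default and an interval in the tens of seconds, as suggested in the discussion.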

B

Okay, so another question is how this interacts with snapshots, if at all. In principle, it's not really that hard. We just write all the relevant snapshot information into the metadata object, and that even allows us to do snapshot trims without promoting, but it'll be kind of tedious. So that's extra work we'll have to do. We could also just disallow snapshots. Another one is, because of the anonymous-object-name nature of this, if you open an object, start writing, and then there's a peering interval, we will have lost that information. So we probably need to write into the PG log that we're starting a demotion to one of these things, so that the next primary can look backwards through the log and clean up after any canceled promotions or demotions. Those are all the gotchas I've got. I don't know if anyone else has any other ones, or use cases that I missed. That's the big one.
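The idea of logging the start of a demotion so that a later primary can clean up might be sketched like this. `ToyPGLog`, the entry kinds, and the transaction ids are all hypothetical; the real PG log has a very different structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogEntry:
    kind: str          # "demote_start" or "demote_done" (hypothetical entry types)
    object_name: str
    txn_id: int


@dataclass
class ToyPGLog:
    entries: List[LogEntry] = field(default_factory=list)

    def start_demotion(self, object_name: str, txn_id: int) -> None:
        # Logged before any bytes reach the backend, so a new primary can see it.
        self.entries.append(LogEntry("demote_start", object_name, txn_id))

    def finish_demotion(self, object_name: str, txn_id: int) -> None:
        self.entries.append(LogEntry("demote_done", object_name, txn_id))


def unfinished_demotions(log: ToyPGLog) -> Dict[int, str]:
    """Return txn_id -> object name for demotions that started but never finished."""
    pending: Dict[int, str] = {}
    for entry in log.entries:
        if entry.kind == "demote_start":
            pending[entry.txn_id] = entry.object_name
        elif entry.kind == "demote_done":
            pending.pop(entry.txn_id, None)
    return pending


log = ToyPGLog()
log.start_demotion("obj_a", 1)
log.finish_demotion("obj_a", 1)
log.start_demotion("obj_b", 2)   # a peering interval interrupts this one
to_clean_up = unfinished_demotions(log)
```

A new primary would walk `to_clean_up` and ask the backend to discard whatever partial object each interrupted demotion left behind.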

F

Family, big whoop.

B

Alphabetical order.

B

Who's trying to talk that's enough.

E

Hey yeah: it's me: okay, mm-hmm. You hear what I say: yep.

B

Yeah sure go ahead.

B

She can go, go go up. Okay, yeah yeah topic.

E

Yeah, I have a question. I mean, if we want to do this for a lot of kinds of storage, why don't we do this as a new object store? We can implement this over the tape bot, or whatever others, as an object store backend, and then we create a pool as a cold-storage pool and then make

B

A cache tier. Okay, so the main reason is that there isn't a reason to do that. If you're using S3, for example, it already handles object placement. There's no reason to partition the namespace of S3 objects into PGs that only proxy off to S3; that doesn't seem worthwhile to me. We might as well have the cache tier, or rather the tier, simply write directly to the backing store. The other reason is this is a little bit different.

B

This isn't really that similar to cache tiering. It's similar in that it acts like a tier, but, unlike cache tiering, we don't use the absence of an object to indicate that you should look at the next tier down. The top-level tier would actually contain a metadata object for every object.

B

If that makes sense. So it would act like a metadata cache. But to answer your main question, I just don't see an advantage to doing that. Maybe I'm missing something.
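The contrast drawn here, between falling through on absence and always keeping a metadata object, can be illustrated with two toy lookup functions. The `Redirect` class and the dictionary layout are assumptions made up for this sketch.

```python
from typing import Dict, Optional, Union


class Redirect:
    """Hypothetical metadata object left in the top tier after a demotion."""

    def __init__(self, plugin: str, handle: bytes) -> None:
        self.plugin = plugin
        self.handle = handle


Entry = Union[bytes, Redirect]


def cache_tier_read(cache: Dict[str, bytes], base: Dict[str, bytes],
                    name: str) -> Optional[bytes]:
    """Cache tiering: a miss in the cache means 'look at the next tier down'."""
    if name in cache:
        return cache[name]
    return base.get(name)


def redirect_tier_read(top: Dict[str, Entry],
                       backends: Dict[str, Dict[bytes, bytes]],
                       name: str) -> Optional[bytes]:
    """Redirect tiering: the top tier has an entry for every existing object."""
    entry = top.get(name)
    if entry is None:
        return None  # absent here means the object does not exist anywhere
    if isinstance(entry, Redirect):
        return backends[entry.plugin][entry.handle]
    return entry


backends = {"s3": {b"bucket/42": b"cold bytes"}}
top = {"warm_obj": b"warm bytes", "cold_obj": Redirect("s3", b"bucket/42")}
cold_data = redirect_tier_read(top, backends, "cold_obj")
```

In the second function the top tier behaves like the "metadata cache" mentioned above: it never needs to probe the backend to learn whether an object exists.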

E

Maybe okay well.

G

I mean so you're gonna matter.

E

Go go, go up, go ahead and also.

C

He gues don't ask.

G

A lot of question.

G

Ahead, the data you upload.

E

The data offloaded into the cold storage: I mean, when the data has been offloaded to the cold storage, should it be possible to read it from some other kind of client? I mean, is the data recognizable by others?

B

That's another question. The way I've described this interface, there isn't really any relationship between the name that the backend gets or generates and the name for the object in the front end, that is, in the RADOS pool. There's also not necessarily any really good relationship between the name the object has in RADOS and its actual user-facing manifestation. If it's a RADOS GW object, for example, the head object is named after the user-visible S3 name.

B

That's the RADOS GW S3, not the backend S3; sorry for choosing that example. But the other objects, if it's a big object, are named after some kind of monotonically increasing sequence or something else. So the only way to do that would be to propagate some kind of information about what this object is and how it relates to the higher-level user-facing object. I mean,

B

Does that make sense? If it's a RADOS object and the user has control over what the object is, then maybe it's a little easier.

B

No, I guess for me that's the tricky part. It's not clear to me how you would, because by the time it got to RADOS in the first place, it's already pretty well digested.

B

Okay, so basically, whatever process was looking at the cold storage and wanted to perform reads would have to (a) be capable of tolerating the fact that some of the pieces of the thing that it's looking for might not be there, because they haven't been demoted yet, and (b) be able to reconstruct the jigsaw puzzle. So it would have to know what the user-facing application was and how it was naming objects.

B

It seems like a desirable property, though, so if there's a way around that, or an interface change that would allow us to do that more cleverly, that might be useful.

F

Yes, maybe a dumb question, but the RGW bucket tiering that Yehuda is planning to implement, right: it puts separate buckets on different storage. So how is he actually implementing that without this infrastructure underneath, like putting different buckets on different storage?

B

I'm sorry I'm having trouble hearing you yeah.

F

So so, basically, what I'm saying is that it's probably did it. What you do is actually trying to implement on the s3 bucket right on that, like cheering the court hearing on that in the whole story, general look wait.

B

Are you talking about a thing RADOS GW is implementing?

F

That actually moving it support of of you during on the laying right.

B

Not necessarily I don't actually know how it works. Sorry, maybe you could.

F

But, but even even that that right, so if somebody wants to put the data on the desk from different schools, which is actually for retains of seconds to access it right so that that could be the instance of this is putting different different objects on on different storage. I.

B

So am I understanding correctly that RADOS GW is implementing a thing where you might specify that a bucket is located on a different kind of storage than RADOS?

F

Otherwise, how actually don't be saying that if somebody says that ok I want this object will be accessed only once in a year, so they have to look at it in some stories. Actually, it's very frequently fix a star or a little bit, even eventually my turret, we're straight because we raised you so how they actually will be doing good without the help of this was the devil suppose I.

B

I don't know, I'm not sure how the RADOS GW thing works. So, okay, basically you're saying Yehuda has talked about adding to RADOS GW a policy system where you can say that this object will only be read once a year, and you're asking how you can implement that without help from the OSD?

B

OK, so RADOS GW has the freedom to write to any pool it wants. So from its point of view, if it has a replicated pool and an erasure-coded pool available to it, there's absolutely no reason it couldn't just choose to write to the erasure-coded pool instead. It doesn't need OSD help for that. Similarly, it could write to S3 on its own. That would be a little bit pointless, but maybe it could.

B

This is more for a system where either RADOS GW doesn't know up front which objects will be cold and you'd like, over time, the cold objects to be demoted, or a system where the front end simply isn't that clever. If that makes sense.

F

I understand individual decision day on the way he 3rd, but if somebody says that I want this project to be stored on it in a pool safer on the HP website. So will not save the whole day. Sure.

B

RADOS GW would just write to that pool instead.

F

So I think that's the same question the wrong with of Phoenix illegal fun, figured a pool on double dates, every right side, and it said they.

B

So, no, my argument against that is that RADOS GW wouldn't write to RADOS in that case. It would write directly to the cold storage system.

F

But might if you, if you read attorney, said it's something like try again or something right so like.

F

So we are saying that we will be returning some in in progress right then she turns straight us to the upstream so in that, so why we're actually planning to do that? Why not actually wait till this subjects, no really cube and then send the deck for getting here so painful, okay, what is actually knew ed knows. Ppl lose any time. Oh no, the.

B

EINPROGRESS thing is only about something like a tape bot or Amazon Glacier, where it simply takes a preposterous amount of time to actually retrieve an object. So yeah, we could just wait for the I/O to complete; maybe that's the best thing to do. But the question was basically: does it make sense to extend the interface to deal with that, to instead return an error and say, here's a handle you can use to check on the progress of this operation, come back later?

B

Yeah, maybe it makes more sense to just accept that it'll take a while.

F

The same interface but.

B

How that relates to RADOS GW.

C

I mean, it might be perfectly reasonable to have all the ingest come into a RADOS pool and then, you know, slowly in the background have that get spilled off to something really slow. That's actually the typical model for tape, because it gives you, like, random access to the objects, and then they get staged out after an hour or something. So I think that then you might have that as a pool policy that does that.

C

Maybe you set a hint when you're doing the put that this object is going to be cold, and so you hint that to RADOS so it knows which of its menu of slow backends to use.

F

Instead of actually play flying directly saying that I don't need that I over 250 to say so, slow.

B

You're saying you would like it to not be stored on the regular tier in the first place? You would like it to be directly offloaded?

F

I can do the blue, nissan esflow storage and then rgw for me 90-degree to the dinner than this game, giving him to the OSD nosql actually about the time.

B

Well, if the pool's on the slow storage, then it's already there; there isn't anything else to do.

F

I, don't quite.

F

That's it. One question is actually worth it of this like giving a gig and their lecture will be bring that then we have a pool that will be that we can create. It would be way. Oh this.

B

Is for a whole different... This is for a whole different level of slowness. This is for a level of slowness so slow that it doesn't make sense to run RADOS on top of it.

B

Basically, you use the RADOS pool as a metadata cache for the even more monumentally slow backend tape or S3 system. That's what we're talking about here.

B

Does that make sense.

C

So there's a second half to the session. There's a second blueprint, on dynamic data relocation for cache tiering, right?

E

Yeah, yes, okay, yeah.

E

Actually, yeah, I think you have talked about part of this in the previous blueprint. What I want to do is to implement multiple tiers of storage inside Ceph. Currently we have the cache tier and then we have the base tier. Maybe we can go a little further and do multiple tiers: we have the hot tier, the warm tier, and the cold tier, maybe as many as we want. And then what we can do with these multiple tiers is dynamically relocate data between the tiers. Maybe you can do it manually or do it automatically. That's what this is for.

C

I think what Sam was talking about gives you the third tier, at least. So you'd have the base tier, which would be the warm one; the cache would be the hot one; and it would point off to something colder. And, um, we haven't really contemplated

E

Multiple tiers. I don't know whether the current implementation supports them. Maybe we can layer the cache tiering.

E

We have a suite courses that we have to report and then the first one is to catch the office. Animal a second ones that catch up to settle on we supply run out.

E

There was up at this right now. Well, I haven't.

C

Tried that yet.

C

Mr. Burke aster is the wind already.

C

Yet ok, students.

G

To catch you up.

E

Another here, oh not yet not.

E

Cash: it's what they sizing yeah.

E

Ok, maybe you can talk to to support this right.

C

Yeah, I'm not sure about layering caches, but I think definitely, once you get down to the base tier, having lots of additional tiers makes sense. I mean, the key thing is that for any object you have to go somewhere to decide where it's located, right?

C

So that's effectively your metadata pool, or something that says where the object is, and that's effectively what the base tier is. It just could also actually store the object itself.

C

So you can imagine that you could have, you know, a 30-tier system where you go to the base pool and look at the pointer, and it tells you which other tier to go look in. Except that with what Sam is describing, they're sort of immutable: once you follow that pointer, it's a read-only copy, and you have to pull it back out in order to modify it. So you're basically limited to a cache tier and a base tier as the only ones that accept writes, and then everything else is sort of a colder tier.
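A toy model of that arrangement, where the base tier either holds the bytes or a pointer into a read-only cold tier and any write pulls the object back first, might look like this. `BaseTier` and its pointer layout are assumptions for illustration only.

```python
from typing import Dict, Tuple, Union

# (cold tier name, key inside that tier); a hypothetical pointer layout
Pointer = Tuple[str, str]


class BaseTier:
    """Holds either object bytes or a pointer into a read-only cold tier."""

    def __init__(self, cold_tiers: Dict[str, Dict[str, bytes]]) -> None:
        self.cold_tiers = cold_tiers
        self.entries: Dict[str, Union[bytes, Pointer]] = {}

    def read(self, name: str) -> bytes:
        entry = self.entries[name]
        if isinstance(entry, tuple):
            tier, key = entry
            return self.cold_tiers[tier][key]   # follow the pointer
        return entry

    def write(self, name: str, offset: int, data: bytes) -> None:
        # Any write promotes: read the whole object, then store it locally.
        current = bytearray(self.read(name)) if name in self.entries else bytearray()
        if len(current) < offset:
            current.extend(b"\0" * (offset - len(current)))
        current[offset:offset + len(data)] = data
        self.entries[name] = bytes(current)     # the pointer is replaced by data


cold = {"tape": {"k1": b"hello world"}}
base = BaseTier(cold)
base.entries["obj"] = ("tape", "k1")
before = base.read("obj")
base.write("obj", 0, b"J")
after = base.read("obj")
```

The cold copy is never modified in place; after the write, the base tier simply stops pointing at it.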

E

So you think, with sweetie RCC not well.

C

Two writable tiers, and then as many cold tiers, sort of the third tier, as you want. I mean, that's currently what we're describing. I think the question is whether that's sufficiently general to capture all that stuff. I

C

Mean, okay, so in your example, having an SSD tier, an HDD tier, and an EC tier, I think that definitely works. Having like a fourth tier of tape, I think it works there too.

C

The only restriction is that what Sam's describing means that, in order to move it from erasure-coded to tape, the base tier would actually read it in and then write it back out again. And it's not clear what would make it decide that, how it would know that it's so cold that it should just move off to tape. Maybe some external hint or agent or something would have to come along and say this data set is ancient, and I

C

That's probably maybe to Caesar would have to do it right. Yeah, the den.

E

Is the second kind of my ring and yeah we can do some I do some automatically at all yeah, maybe time to do automatically, but we can manually to relocate the data to the code here in decline. Sign using some common ions phone status, form of the you can add some cumin 'god need you to send a bottom to the female MP. No bottom yeah, as we told a parable, appear in the bottom in the in the previous book cream, and then you have a question folder or for that done.

E

If we do this, when we don't do it to that, when we get a lot of 4 and then ha standard attitude to the SSD poor right yeah, but sometimes we sometimes where you want to amp in this this hotbed, huh they will have a video we want to paint and in the cast here and and then later. Maybe it is becoming a hot. We want manually to an aunt in the cast here. Yeah then buddy, but he do you a case.

E

You want to great for later from the SSP for to another pool, right, yeah.

C

Yeah, I mean, I think so. Before, the example was like a database that you know is always going to be hot, and in that case you'd just put it on the other pool. But I think in the video example, that makes a lot of sense: you know something is about to be hot, you pin it, and then you know when it's done. So I think that makes sense. Having a pin operation seems pretty reasonable.

E

C

G

The two layers.

C

Go ahead. I mean, the other half of this is that I think the hints that we've already added also will give you most of what you need, right? We just want to add an additional pin and unpin.
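How pin and unpin might interact with an eviction agent can be sketched with a toy cache tier. The class and its methods are hypothetical; they only illustrate that a pinned object is skipped when choosing what to evict.

```python
from typing import Dict, List, Set


class ToyCacheTier:
    """Tracks last-access times and a set of pinned object names (hypothetical)."""

    def __init__(self) -> None:
        self.last_access: Dict[str, int] = {}
        self.pinned: Set[str] = set()

    def touch(self, name: str, now: int) -> None:
        self.last_access[name] = now

    def pin(self, name: str) -> None:
        self.pinned.add(name)

    def unpin(self, name: str) -> None:
        self.pinned.discard(name)

    def eviction_candidates(self, count: int) -> List[str]:
        """Coldest objects first, never including pinned ones."""
        unpinned = [n for n in self.last_access if n not in self.pinned]
        return sorted(unpinned, key=lambda n: self.last_access[n])[:count]


tier = ToyCacheTier()
tier.touch("video", now=1)   # oldest access, would normally be evicted first
tier.touch("log", now=3)
tier.touch("db", now=5)
tier.pin("video")
while_pinned = tier.eviction_candidates(2)
tier.unpin("video")
after_unpin = tier.eviction_candidates(2)
```

This matches the video example from the discussion: pin before the object is expected to be hot, unpin once it no longer is.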

C

Mean generally, you.

E

You mean the DONTNEED or WILLNEED ones?

C

WILLNEED and DONTNEED and NOCACHE, all that. Okay, yeah, I mean

C

Just extend that with two more. I mean, maybe pin and unpin are separate operations, I'm not sure.

E

Missing it better to expose this as another, another operations is.

C

This person point.

E

Okay, Yeah Yeah, right, yep, okay, okay, so yeah.

C

Yeah I think I mean this is kind of coming back to sams, deep in a conversation with on that, if you're still here Sam that I am sorry.

B

I just couldn't hear him over the mic at all.

C

Yeah, you might invest in a new microphone; it's pretty muffled. So the one thing that I probably should have done but didn't is contrast what you wrote up, Sam, with what we sort of dreamt up forever ago, in like the Firefly CDS, with the cold tiering. Did you by chance look at that one? The

B

Only real difference is... so they actually cover kind of disjoint areas. I haven't read it recently, but if I recall, they cover disjoint areas. That one was more about policies, or choices for how to implement policies, which we still need here. So all of that is still applicable here. The only contribution here is that we might not use a RADOS pool as the cold tier. That's the only difference.

C

That is definitely a big difference, but I think the other one is... So this is the RADOS tiering one; I'll post the link in the chat. The other difference is, I think in this original proposal, you could actually write to the backend.

F

Yeah but something.

C

I don't know, it was complicated. I just remember that there were all these PG log events and

C

Yeah, because we were allowing writes to the cold tier, basically. I think that's the main difference, and what you're doing is simpler than that: if you touch it with a write, it has to promote it. That's

B

All I'm saying. Okay, I guess that's okay. So there might be the intermediate system where we do know something about the backing tier and we can be more clever. It seems to me that that overlaps kind of a lot with the current cache tiering implementation.

B

Yeah, the only real difference is redirect objects. So how much do we get by having redirect objects, or how much do we lose by having them, I suppose. Redirect objects are kind

C

Of nice to have, actually. I mean, that's what you're proposing, basically, right? You have to have

B

Them here, because with this system we need to maintain RADOS's own metadata using these objects. But with the cache tiering approach, we're using the backing pool to encode the RADOS-level metadata, knowing that we can recover it during

B

Here we wouldn't even have control over the object name, so we have to write at least that down.

C

Right, so it's a redirect. I guess I'm not sure I see the difference. Oh no, sorry.

B

It is a that stuff's what I mean. That's.

C

Okay and different all.

B

Right, yeah. I'm sorry, I just keep using different words because I can't remember which one I used last time.

C

All right, yeah I, feel like we should reread this, because.

B

Yeah, see, this one involved a lot of cooperation from the backing tier, though, right? The client was actually able to talk directly to the backing tier in this design, right?

C

I'm battery dead I'm here yep bit.

B

Yeah, in this case, the client is able to talk to the backing tier, and it's able to speak RADOS and get RADOS-like replies. That wouldn't be true for a tape system. Got it. Okay, so in that sense it's different. So, oh, that's

C

That's why it was so complicated. Okay, then.

B

The next question would be: is there value in having the cache tiering system, the Firefly variant of tiering, where you're able to get useful information back from the backing tier (incidentally, you could have a RADOS pool in a different data center, which you could talk to like a RADOS client, so there would be a place for that), and yet a third system where it's an opaque plugin and the primary handles all of it? So do we need all three of those, or maybe just the cache tiering and the third one?

B

That's the next part. So we'd want sort of motivating implementations or motivating use cases to make a decision, I would think.

C

I mean, I think the two scenarios that seem reasonable to me are something like what you propose, where it's, you know, a plugin that's some other backend. It could be S3; I think looking at something with S3-like latencies makes sense, because it could be RGW in another data center, or geo-distributed, whatever it is. That scenario, and another RADOS pool: so you could have a cache tier that's SSD, a base tier that's replicated HDD, and a very widely striped erasure-coded cold tier that's still RADOS. But I mean, if it's a plugin mechanism, then you could just make a RADOS plugin, right? Yeah, I guess, but

B

You lose the RADOS specifics. So if we actually do want to have the richer protocol that allows the client to receive a redirect and then go talk to the pool directly, for performance and/or offloading-the-primary-CPU reasons, then we would need the intermediate, yeah, Firefly-style tiering, which is just about the worst name I could assign to that.

C

I think everyone knows what you mean. You're talking about the, yeah, the old redirects.

B

Yeah, the word tiering has become distressingly overloaded.

C

Do we have any sort of feedback as far as what makes most sense what people actually want? Yeah.

B

Because all we've had so far is just sort of vague rumblings that they'd like to be able to put cold data on something cheaper than another RADOS pool, which probably isn't enough to go on. So we'd want a more concrete motivator.

B

Anyway, mailing list: come and discuss if you have ideas there. Or now; don't let me stop you.

G

I like the idea that the base tier is the master copy: it has all the redirects, and if you were having agents running policy, they would run on the base tier.

B

And that's another question. At this point the policies get kind of more complicated, because with cache tiering you can really only have one base tier. That's just the way it goes, because we're not keeping any metadata. But with something like this, we could very reasonably have a situation where different kinds of objects, for different reasons, would be offloaded to different kinds of cold tiers. So another blueprint, for maybe the next CDS, would be an idea of how users might be able to specify and extend the placement and tiering agent's decisions, either by exposing it through RADOS or through some kind of plugin API that would allow us to extend the existing online or asynchronous cache tiering agent, which again, in this case, would apply to things other than just the cache tiering application. That part is just an asynchronous process which runs in the background, scans objects, and does stuff based on that. We would sort of generalize it to that point, I think.

C

Yep. So the one other thing I want to throw out here while we're talking about this is that the other nice thing about this concept, where the base tier has all these pointers or redirects, is that that model also would allow us to do some sort of deduplication underneath it, where that pointer says, oh, this object is composed of this chunk over here and that chunk over there, which are in some colder, reference-counted RADOS pool or something where all the deduplicated chunks are stored. That's

B

True. You don't even lose anything by doing it that way, because you wouldn't want to talk to that pool directly anyway. You'd want to go through the base

C

Tier. It's its own pool, and they'd be sort of immutable, reference-counted, whatever. And then, if you wrote to it, it would promote it again: pull it back into the base tier and eliminate the redirect. Right.

B

For this kind of chilliness, I think the promote on small write isn't going to bother anyone.
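The deduplication idea raised in this exchange, where a redirect lists chunks held in a reference-counted pool, could be sketched as below. `ChunkPool` and the fixed-size chunking are assumptions chosen for brevity; nothing here reflects an actual Ceph design.

```python
import hashlib
from typing import Dict, List


class ChunkPool:
    """Toy content-addressed, reference-counted chunk store (hypothetical)."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}
        self.refs: Dict[str, int] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(key, data)          # identical chunks stored once
        self.refs[key] = self.refs.get(key, 0) + 1
        return key

    def release(self, key: str) -> None:
        self.refs[key] -= 1
        if self.refs[key] == 0:                    # last reference gone
            del self.refs[key]
            del self.chunks[key]


def demote_chunked(pool: ChunkPool, data: bytes, chunk_size: int) -> List[str]:
    """Return the manifest (list of chunk keys) the base tier would keep."""
    return [pool.put(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def read_chunked(pool: ChunkPool, manifest: List[str]) -> bytes:
    return b"".join(pool.chunks[key] for key in manifest)


pool = ChunkPool()
manifest_a = demote_chunked(pool, b"AAAABBBB", 4)
manifest_b = demote_chunked(pool, b"AAAACCCC", 4)
stored_chunks = len(pool.chunks)   # the shared b"AAAA" chunk is stored once
```

A write to either object would promote it by reading through its manifest and then releasing its chunk references, which is the promote-on-write cost mentioned above.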

C

Yeah well, don't complain anyway, but.

E

Excuse me yeah that the conclusion that I might be P by the.

G

Second part: I.

E

Mean that the relocation, let me look at data I'm different, he respond. If that then is worth doing right.

G

The date Hannah cast.

C

Yeah, and maybe.

C

I mean, I think pinning and unpinning makes sense to me. What about you, do you have an opinion there, Sam?

B

I think that is an example of something that could live at the RADOS level that we can use to implement smarter policies. We haven't even really begun to explore that kind of interface, so that sounds like a good idea, I think.

C

And that would fit in within the current cache tiering, right? It'd just be like a bit that we set on the object info. That's like.

B

Yeah, no, well, I would prefer it be a little more general, but.

B

But yeah, I mean, we haven't explored that much making the policies more expressive, I guess is the right way to say it. So yeah, that seems like a very reasonable thing you'd want to be able to do to an object.

D

Hi Sam, I have one question. OK, so this plugin interface, can we make it more intelligent, saying that if it is a RADOS plugin interface then it probably can allow overwrites too, and so the plugin tells you whether it can overwrite the object or not? Something like that.

B

So there are sort of two ways you can go with that. We can make it so that the plugin actually supports both kinds and it's the OSD's problem, and it's just up to the OSD to query the plugin as to whether it supports overwrites or not and act accordingly. But the larger question is, it wasn't clear to me that we gain a lot by doing it that way, because the advantage of doing overwrites is kind of lost if you expect a lot of locality in the object anyway.
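
A small Python sketch of the "OSD queries the plugin" option. The class and method names (`BackingPlugin`, `supports_overwrite`, `partial_write`) are made up for illustration and are not part of any Ceph API; the only properties taken from the discussion are that the plugin hands back an opaque handle and that the OSD asks whether overwrites are supported and acts accordingly.

```python
from abc import ABC, abstractmethod
from typing import Dict


class BackingPlugin(ABC):
    """Hypothetical backing-tier plugin interface."""

    @abstractmethod
    def supports_overwrite(self) -> bool: ...

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store data and return an opaque handle chosen by the backend."""

    @abstractmethod
    def get(self, handle: str) -> bytes: ...

    def overwrite(self, handle: str, offset: int, data: bytes) -> None:
        raise NotImplementedError("backend is append-only")


class AppendOnlyPlugin(BackingPlugin):
    """Stands in for an object store that cannot modify data in place."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def supports_overwrite(self) -> bool:
        return False

    def put(self, data: bytes) -> str:
        handle = f"h{len(self._store)}"
        self._store[handle] = data
        return handle

    def get(self, handle: str) -> bytes:
        return self._store[handle]


class RewritablePlugin(AppendOnlyPlugin):
    """Stands in for a backend (such as another RADOS pool) that can overwrite."""

    def supports_overwrite(self) -> bool:
        return True

    def overwrite(self, handle: str, offset: int, data: bytes) -> None:
        buf = bytearray(self._store[handle])
        buf[offset:offset + len(data)] = data
        self._store[handle] = bytes(buf)


def partial_write(plugin: BackingPlugin, local: Dict[str, bytes], name: str,
                  handle: str, offset: int, data: bytes) -> str:
    """OSD side: proxy the write if the backend can take it, else promote."""
    if plugin.supports_overwrite():
        plugin.overwrite(handle, offset, data)
        return "proxied"
    buf = bytearray(plugin.get(handle))  # promote: fetch the whole object
    buf[offset:offset + len(data)] = data
    local[name] = bytes(buf)
    return "promoted"
```

The cost being weighed in the discussion is visible here: the promote path reads the entire object back for a small write, while the proxy path touches only the written range.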

B

So this is a very, very cold tier, right? This is meant to capture the case where ninety-five percent of the objects ever written are literally never read, and in that case, as long as those objects never leave the cold tier, you've kind of won, right? Having an extra promote on a small write when you didn't necessarily need it might or might not be that big a deal.

B

I mean, you're right, it would depend on the use case, and it's also not clear that that wouldn't be a better argument for implementing the level-two tiering one, where you know it's a RADOS pool, and that allows the client to actually be clever and send the write directly to the backend pool.

B

Does that make sense?

E

We are winding them more.

B

Complicated protocol where the client would negotiate with the base tier or with the hot tier OSD and receive, you know, enough information to allow it to conclude that it can safely write to the base tier right now, and in that case it would be able to do partial writes and also other things that would be faster. Does that make sense? Did you have a use case where it would be useful?

B

Well, okay, so any backend that supported partial overwrites, I suppose, is the answer to that question.

C

Yeah, the worry is just that, as the complexity of the proposal sort of suggests, anything that involves redirects and the sort of nonlinear path through the system means that it just gets really complicated when you start thinking about all the races. In this case, you could proxy it. Yeah, proxying simplifies it.

B

Simplifies it a lot, actually, yeah, exactly. So yeah, no, you're right. If it supported partial writes, we could conditionally proxy the write through to the backend, if it was the kind of thing the interface was capable of supporting. So I guess we'd want a use case where that was a win, I suppose. Yeah, and I don't think it's super, well, it was less words, but I feel.

B

Complicated. There are a couple of gotchas, like, our omap interface is obnoxiously rich compared to most other systems. Most other systems don't give you an arbitrary key-value thing. So we get around that here by just streaming it out as a packetized byte stream to some buffer stream in whatever the backend implementation is.
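
To make the "packetized byte stream" idea concrete, here is a minimal Python sketch that flattens an object's data and its key-value payload into one byte string any blob store could hold. The framing (4-byte big-endian length prefixes, data first, then a pair count, then sorted key/value frames) is an assumption for illustration, not the format discussed or used by Ceph; `pack_object` and `unpack_object` are hypothetical names.

```python
import struct
from typing import Dict, Tuple


def _frame(payload: bytes) -> bytes:
    """Length-prefix one payload with a 4-byte big-endian size."""
    return struct.pack(">I", len(payload)) + payload


def pack_object(data: bytes, omap: Dict[bytes, bytes]) -> bytes:
    """Serialize object data plus its key-value map into a single stream."""
    parts = [_frame(data), struct.pack(">I", len(omap))]
    for key in sorted(omap):  # sorted so the output is deterministic
        parts.append(_frame(key))
        parts.append(_frame(omap[key]))
    return b"".join(parts)


def unpack_object(blob: bytes) -> Tuple[bytes, Dict[bytes, bytes]]:
    """Inverse of pack_object."""
    pos = 0

    def read_frame() -> bytes:
        nonlocal pos
        (size,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        payload = blob[pos:pos + size]
        pos += size
        return payload

    data = read_frame()
    (count,) = struct.unpack_from(">I", blob, pos)
    pos += 4
    omap: Dict[bytes, bytes] = {}
    for _ in range(count):
        key = read_frame()
        omap[key] = read_frame()
    return data, omap
```

Because the backend only ever sees an opaque byte stream, it needs no key-value support of its own; the trade-off is that it cannot update a single key without rewriting the stream.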

B

But if we allow partial overwrites on an object that has an omap payload, then that gets a little complicated. But that's not to say that we couldn't simply have a policy where we record, when we offload it, whether or not it has an omap payload, and always do a promote if it has one. So it's not really that bad. Does that answer your question, or did I just make it more confusing?
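
The policy just described can be sketched in a few lines of Python. The `Redirect` record and its field names are hypothetical; the only rule taken from the discussion is "record at offload time whether the object has an omap payload, and always promote if it has one."

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Redirect:
    """Hypothetical metadata left in the base tier for an offloaded object."""
    plugin: str      # which plugin offloaded the object
    handle: str      # opaque handle returned by the backend
    has_omap: bool   # recorded once, at offload time


def make_redirect(plugin: str, handle: str, omap: Dict[bytes, bytes]) -> Redirect:
    return Redirect(plugin=plugin, handle=handle, has_omap=bool(omap))


def on_partial_overwrite(redirect: Redirect, backend_supports_overwrite: bool) -> str:
    """Decide how to serve a partial overwrite of an offloaded object."""
    if redirect.has_omap:
        return "promote"  # omap payload present: always pull the object back
    if backend_supports_overwrite:
        return "proxy"    # plain data and a capable backend: pass the write through
    return "promote"
```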

B

About the question.

C

I think they dropped maybe.

B

If we're willing to proxy, it's not that big a deal, yeah.

C

Which we would be, yep. I just worry about making everything a proxy, because that can be limiting too. Anyway, I think we probably all have enough to chew on. Yep, yep, okay, all right.

A

So we're actually into our 15-minute break. But do you want to take five minutes here and go grab a drink or use the facilities or whatever? Yep.

C

Yeah second break all.

A

Right, now back, so we'll take five minutes and then come back and then resume. Okay.