
A

Well, we'll just spare a few minutes to let people join.


A

Hello, everyone. This is the SIG Storage call, and we're just waiting for a couple more people to join, and then Derek and Flavio will be giving us a presentation of Pravega. So we'll wait for a couple more minutes; we'll probably start at five past the hour.


A

Okay, I think we should be able to start. Erin is going to be joining in a minute. Quinton, unfortunately, had a power...

B

And.

C

...internet outage, so.

A

He might not be able to join, so in that case, Flavio, I'll hand over to you to give a bit of an intro for the Pravega project. I assume you have some slides to share.

D

Yeah, hi, this is Flavio. You can all hear me, right?

A

We can, okay.

D

Alright, okay, so let me see, let me share my screen.


D

Can everyone see my slide? Okay, all right. So how much time would you like for this? I suppose you don't want me to use the whole time, is that right? How much time do you think I have, just so that I calibrate?

A

This is the main thing on the agenda for this meeting, so you have most of the hour.

D

Okay, all right, so I won't rush then. My name is Flavio, and I'll be talking about Pravega, as Alex just introduced. This is pretty much the same presentation I gave at the CNCF webinar yesterday; I did make a few changes: I removed some of the discussion on Flink and added some new content. So hopefully many of you haven't seen that presentation, and it will be new to you. Before I get into Pravega...

D

A bit about myself. I am a senior distinguished engineer at Dell, and I've been working on the Pravega project since 2016. I started towards the end of 2016, so I've completed three years working on the project and will complete four towards the end of the year. My background is in distributed computing; I was in research for a number of years.

D

I was a researcher at Microsoft Research and, earlier, at Yahoo! Research, and I have worked on a number of Apache projects. The most prominent ones, which I actually helped build from scratch, are ZooKeeper and BookKeeper, both in the ASF. I have some contact information on the slide in case you want to reach out to me or follow me on Twitter, my email and Twitter handle. So now let me move on and talk about motivation.

D

The main motivation for Pravega, and for many of the systems you hear about that deal with streams and stream processing in general, is the many sources of continuously generated data that we have out there. I'm sure you have come across a good number of them, and that's not a surprise. To be more concrete about the sources: they can be events that end users generate. We can think of the traditional social networks, where users are posting events, or you can think of online shopping.

D

Users are purchasing items, performing online transactions, or searching for products. All of those generate data that you might want to capture. But it's not only about end users; you can think of machines also being sources of continuously generated data. You have servers that are continuously producing telemetry that you want to capture so that you can spot problems early on in your fleet of servers. That would be one use case, but it's not only about servers either.

D

There are other types of machines that many users and applications care about: sensors in IoT, which holds the promise of connected cars, autonomous cars, and so on. It's not quite a reality yet, but we are going in that direction, so hopefully it will become a reality eventually. All of those will be continuously generating data, and ingesting and processing that data could be interesting, or even a requirement, for a good number of users.

D

Now, if I put those comments into what I'm calling a landscape here, on the left-hand side we have various sources of data: end users, machines, drones, sensors, connected cars. All these things produce a continuous flow of data that I want to capture and that I want to process.

D

But the processing might not be as simple as just filtering or normalizing the data; it can have various stages. So in the end, you need at least two core components to achieve this goal of ingesting and processing the data.

D

One is storage: capturing the data and storing it so that it is available for processing. The second is a stream processor that is able to take that data and make sense out of it. Those things can be combined in a number of ways; you can think of the processing as a directed graph, where you have interleaving stages of storage and processing, and the output of those pipelines can be a number of things.

D

Visualization, where you are representing raw data in ways that are more intuitive or easier to extract insights from. You can produce alerts; if we talk about a fleet of servers, bad things may be happening in your infrastructure and you want to know about it. You can generate insights about, say, users or your applications: for front-end applications you might want to know that there's a spike in traffic, or events of the like. Or recommendations.

D

Again, if we talk about end users: what other users are looking at, or users with a similar profile. And finally, just actionable analytics, where you present data or results that could be useful for any action you want to take. Say you go visit a customer and you want to know more about that customer, or anything related to the customer you are about to visit.

D

So that's the general landscape, but let me talk a bit about a few use cases that we have seen in the field.

D

One class of applications that we find very interesting is the ones related to drones, where you ingest video produced from the cameras on the drones along with telemetry; both of them are sent directly to some infrastructure that is used to ingest and process them. The applications for that vary, from

D

looking at the health of your cattle to inspecting airplanes between flights. And you want to do that not only by tailing the stream, by tailing the data, processing it as soon as it is available for processing, but you might also want to go back and reprocess data. Maybe you found a bug or you found an issue that you want to...

D

You want to revisit the data and extract some new information. So in applications like this, you're interested not only in the low-latency aspect of it, tailing the stream and processing the data as soon as it is available, but also in going back in time to some arbitrary point and reprocessing. Along similar lines, in a factory you can have cameras recording pictures of parts that are being manufactured, and you want to spot, for example, defects in those parts, and so the same concept applies here.

D

You want to spot those problems or defects as soon as possible, so you probably want to tail the stream and process the data as soon as you get it, but there might again be situations in which you want to go back in time, revisit the data, and reprocess it. The same concept applies for such use cases.

D

Now, focusing on streams themselves. That's what I wanted to say about use cases; now let's turn our attention to what these streams actually look like, in an abstract way. A natural way to think about streams is that they are sequences of events, or records, or messages, whatever concept matters to the application. This is a sequence of data items, and as they are produced,

D

we keep appending them to the sequence. But in reality it is not just one single flow. If I think about a lot of the scenarios I have mentioned, with servers, with sensors, you have a number of these parallel flows. So it's not one sequence; you can have many of those in parallel, and this parallelism gives us another degree of, say, realism. It's closer to what

D

we'd expect to see in a real application. But it doesn't stop there, because we can also have fluctuations in the traffic we're observing. You have the parallelism, but the traffic in the parallel flows can grow and shrink over time, and that's because you have, say, daily cycles, weekly cycles, maybe monthly or yearly cycles, but you can also have spikes on a Black Friday or Christmas, or some events that get people to access your system more.

D

All of those are examples of where you can have changes to the traffic of your application, and in consequence to your stream.

D

Now, it's also important to note that if we talk about continuously generated data, it can be continuously generated for a very long time. We could be talking about years, if an application has been running for a long time, and so we might want to capture this stream from the beginning and keep it as a stream.

D

In the recent past, a lot of applications have tended to split the stream data into, say, fresh, recently ingested data, which is the part that you tail,

D

so the recent data you are capturing and processing, and the older, historical data that you may have already processed and might want to reprocess in the future. For that, you may even use a different system to store it; I'm going to call this the lambda way, just in reference to what people call the Lambda architecture. But the reality is that it would be ideal for applications not to have to make such a distinction, so that they could just ingest the stream.

D

Of course, they can get rid of data in the stream, truncate it, and so on. But if they need to keep the data, then they should be able to ingest the stream and keep it as a stream for as long as the data is needed, with no such distinction between fresh and historical data.

D

Now, so far I have focused a lot on the write path: one sequence, parallelism, and then traffic fluctuations and ingestion. But a big part of it is also on the read side, making sure that an application that wants to process the data is actually able to cope with the flow of data, no matter in what form, whether it fluctuates, whether it's parallel, or whatever. So read

D

scalability is another important aspect. With all those concepts in mind, the main goal of Pravega, or the vision we had for Pravega when we started, is that we wanted to have a storage system that has the stream as a primitive. It has streams going in and streams going out.

D

Traditionally, storage systems have focused on objects and files, and we thought that, given the nature of a lot of applications, it's more natural that they use streams as their core primitive. Now, using the concepts I have just explained, those streams have to be implemented in such a way that the system is able to accommodate an unbounded amount of data, that the stream is elastic, and that it is consistent. We don't want to

D

duplicate events or miss events, and applications should be able to both tail the stream and process data historically. I'm referring to this as a cloud-native way of exposing streams, because those are all concepts that we find very important when building cloud systems. So let me now move on to talk about Pravega specifically. Pravega builds on the concept of segments. A segment is a single sequence of bytes, and it's our storage unit; it's the unit of data that we store in our lower-level storage system.

D

It's an append-only sequence of bytes. It's bytes, not events; I have mentioned events before, and those appear at the API level, but internally we treat it as a sequence of bytes. To convert from events to bytes, and from bytes back to events, we use serialization, so we expect the application to provide a way of serializing the data on the way in and deserializing it on the way out. Segments enable us to have parallelism: I can have a number of segments in parallel.

A

Flavio, can I just ask a quick question here? Does Pravega just focus on the storage of the streams, or does it have any functionality to do some of that serialization? As an example, is it only doing the raw streams, or does it have some higher-level functionality, similar to what, say, a message queue might do, for example?

D

So you do have the ability to write events. The Pravega client expects you to pass a serializer and a deserializer; it will use the serializer on the way in and the deserializer on the way out. So the application writes events, but internally we store them as bytes; Pravega internally does not understand the events, it only understands the bytes. Okay?
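For reference, here is a minimal sketch of what writing an event looks like with the Pravega Java client of roughly this era. The scope, stream name, controller address, and payload are made-up placeholders, and exact signatures may differ between client versions.

```java
import java.net.URI;

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventStreamWriter;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.impl.JavaSerializer;

public class WriterExample {
    public static void main(String[] args) {
        // Placeholder controller endpoint, scope, and stream name.
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090"))
                .build();
        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", config);
             // The application supplies the serializer; Pravega itself only stores bytes.
             EventStreamWriter<String> writer = factory.createEventWriter(
                     "sensor-readings", new JavaSerializer<String>(),
                     EventWriterConfig.builder().build())) {
            // The routing key decides which segment of the stream the event lands in.
            writer.writeEvent("sensor-42", "temperature=21.5").join();
        }
    }
}
```

The serializer hand-off is exactly the split described above: events at the API boundary, bytes inside the system.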

D

So the segments that we store internally enable us to have parallelism: you can have various clients writing to those segments in parallel, and we use routing keys to map appends of data to those segments. These segments also enable us to vary that degree of parallelism, which I'm going to call scaling. If I start from the head of the stream, which in this representation starts from the right, I start with two segments, and then at some point my traffic goes up.

D

I decide that I need a larger number of segments, so I transition from two to five. At some later point, traffic goes down and I drop from five to three. That's possible with a Pravega stream, and, as I mentioned, we have this notion of scaling, and the scaling can be done in an automatic manner. When you configure a stream, you can say you want auto-scaling enabled, and Pravega will track the traffic and do that scaling automatically for you.
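As a rough illustration of configuring auto-scaling, here is a sketch using the StreamManager admin API; the target event rate, scale factor, and minimum segment count are arbitrary example values, not anything from the talk.

```java
import java.net.URI;

import io.pravega.client.admin.StreamManager;
import io.pravega.client.stream.ScalingPolicy;
import io.pravega.client.stream.StreamConfiguration;

public class CreateStreamExample {
    public static void main(String[] args) {
        // Placeholder controller endpoint, scope, and stream name.
        try (StreamManager streamManager =
                     StreamManager.create(URI.create("tcp://localhost:9090"))) {
            streamManager.createScope("examples");
            streamManager.createStream("examples", "sensor-readings",
                    StreamConfiguration.builder()
                            // Auto-scale based on traffic: aim for roughly 1000 events/s
                            // per segment, split or merge by a factor of 2, and never go
                            // below 2 segments. ScalingPolicy.fixed(n) would disable it.
                            .scalingPolicy(ScalingPolicy.byEventRate(1000, 2, 2))
                            .build());
        }
    }
}
```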

D

Segments also allow us to implement transactions efficiently and effectively. When you start a transaction from an application, Pravega creates temporary segments for that transaction, and any append in the context of the transaction will go to those segments.

D

Now, if the transaction commits, those segments are merged into the main segments of the stream, and in the case of an abort we just discard them, so the data of a transaction does not interfere with the data in the primary segments of the stream until the transaction is committed; if it's aborted, we simply discard those segments and the data in them. That's another benefit of having cheap, transient segments the way we have in Pravega.
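A sketch of the transactional writer described here, again with placeholder names; the exact factory method signature has changed across client versions, so treat this as illustrative rather than definitive.

```java
import java.net.URI;
import java.util.UUID;

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.stream.EventWriterConfig;
import io.pravega.client.stream.Transaction;
import io.pravega.client.stream.TransactionalEventStreamWriter;
import io.pravega.client.stream.impl.JavaSerializer;

public class TxnExample {
    public static void main(String[] args) throws Exception {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build();
        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", config);
             TransactionalEventStreamWriter<String> writer =
                     factory.createTransactionalEventWriter(
                             "txn-writer-" + UUID.randomUUID(), "sensor-readings",
                             new JavaSerializer<String>(),
                             EventWriterConfig.builder().build())) {
            Transaction<String> txn = writer.beginTxn();
            // Events written inside the transaction go to temporary segments.
            txn.writeEvent("sensor-42", "reading-1");
            txn.writeEvent("sensor-42", "reading-2");
            // Commit merges those temporary segments into the stream's main segments;
            // txn.abort() would discard them instead.
            txn.commit();
        }
    }
}
```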

D

Yet another thing that we can do with segments is conditional appends, which we use in revisioned streams; they work by comparing the offset.

D

If you're trying to append and you want to make sure the append happens against a state that reflects your observation of the state, the append will be accepted only if the offset hasn't moved. If the offset has moved, which means you don't have the latest state, then that append is rejected. This is a way of implementing consistent state at the application level.

D

We do that via a primitive we call the state synchronizer, which we both expose in the API and use internally, and one of the things you can do with it is build replicated state machines. Note that we are doing this using optimistic concurrency. So we have these two primitives that we expose: one is revisioned streams, and the other is the state synchronizer; the state synchronizer builds on revisioned streams.
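For the revisioned-stream primitive, a conditional append might look roughly like the following sketch; the method names and the rejection behaviour are recalled from the client of that era and should be double-checked against the version you use, and the stream and value names are placeholders.

```java
import java.net.URI;

import io.pravega.client.ClientConfig;
import io.pravega.client.SynchronizerClientFactory;
import io.pravega.client.state.Revision;
import io.pravega.client.state.RevisionedStreamClient;
import io.pravega.client.state.SynchronizerConfig;
import io.pravega.client.stream.impl.JavaSerializer;

public class ConditionalAppendExample {
    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090")).build();
        // Assumes the scope and the backing stream "shared-state" already exist.
        try (SynchronizerClientFactory factory =
                     SynchronizerClientFactory.withScope("examples", config)) {
            RevisionedStreamClient<String> client = factory.createRevisionedStreamClient(
                    "shared-state", new JavaSerializer<String>(),
                    SynchronizerConfig.builder().build());
            // Read the latest revision, i.e. our observation of the current state ...
            Revision latest = client.fetchLatestRevision();
            // ... and append only if nothing has been written since. If another writer
            // got in first, the conditional write is rejected and we re-read and retry.
            Revision result = client.writeConditionally(latest, "new-value");
            System.out.println(result != null ? "append accepted" : "append rejected, retry");
        }
    }
}
```

This optimistic read-then-conditionally-write loop is the building block the state synchronizer uses internally.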

D

Let me talk a bit more about one of the key features I have mentioned, stream scaling; I want to go into more depth about it. I mentioned that when scaling a stream we can split segments, going from one to many, or we can

D

merge segments and go from more segments to fewer. Scaling can be done both automatically and manually. Auto-scaling reacts to changes in the traffic: you configure the stream to auto-scale, and if Pravega observes important changes to the traffic, it will scale up or down. But you can also do it manually, so you can also proactively scale the stream.

D

For example, if you expect some traffic and you want to increase the degree of parallelism of your stream, you can go and do it manually before the event, or before you have that spike. Now, illustrating how that looks for a particular stream: say we start with a stream with a single segment, and what this graph is showing is the routing key space versus time. Remember that routing keys are the elements we use to map events that are being appended to segments.

D

In this case, starting with a single segment, all the keys in the routing key space are mapped to the same segment. Now say that I have two hot keys, and those two hot keys induce enough load that Pravega decides it needs to split that one segment in two. For the sake of example, say those routing keys represent geographic locations.

D

For example, I have some taxi ride application or some taxi ride data, and I'm looking at the geolocation of the taxi rides, where they're starting or where they're ending, and those two locations turn out to be hot. For some reason there is an event, or people are just congregating there; I guess these days that's probably not happening, but before all this I suppose that was a common thing. So Pravega splits the segment into two new segments,

D

two and three. Now let's say that that was not enough, and the load induces the need for a higher degree of parallelism, so now I split segment two into four and five. At a later time, say those keys go back to cold, so Pravega goes and merges four and five back into just segment six. That would be the final state of the stream, at least for the time frame we're looking at.

D

One interesting thing to observe from this is that a single routing key does not always map to the same segment; it can vary over time if you have auto-scaling enabled. If I pick, for example, key 0.9, it started in segment 1, then was mapped to segment 2, then 4, then 6. So at different points in time a particular routing key mapped to different segments, and even though this may sound like a complication for the application, the application does not really observe these changes

D

to segments; they are completely hidden from the application, and we deal with them under the hood.

D

Now let me show you a graph that illustrates the changes to segments over time, but now from a real run. We have this heat map that shows segments and the load on the segments; the color represents load, so light blue means lightly loaded and bright red means heavily loaded.

D

The white lines represent the separation between segments. Starting from the left, we observe that we have a number of segments, and slowly those segments are merging, so we see fewer and fewer segments down to a minimum that starts around 2:30 a.m. and goes all the way to around 5:30 a.m.; we have only two segments during that period. Then from around 5:30 or 6:00 a.m.

D

we start seeing segments splitting again, splitting more and more, and you see a good amount of red in the segments, which means that there is a good amount of workload in those

C

Segments.

D

And this is precisely what we observe when looking at the traffic we used to generate that figure. We took data from the New York City yellow taxi trip records.

D

We took half a day from it and just ran it through Pravega to observe those changes to the segments, and we see precisely that: it starts with some amount of traffic, slowly drops down to a minimum, and then at some point in the early morning it starts picking up again. If we put them both together, we can observe that effect where the change in traffic causes the segments to merge initially and then split again when the traffic picks up.

D

Let me now move on to talk about the Pravega architecture. As I mentioned before, we have event writers; that's one of our APIs. We have other APIs, like a byte stream API, but the event API is an important one: many applications have the abstraction of events, or similar abstractions that can be mapped to events. So, using the event

D

API, writers append to a Pravega stream, to the segments of a Pravega stream, and we track the position of the writer so that, in the case of a disconnection followed by a reconnection, the writer is able to resume from the right position. Then, to consume the data, we have the notion of a reader group: we group event readers into groups that we use to split the load of segments, to balance the load.

D

That gives me the ability to grow and shrink the set, so if I need more capacity for reads I can add readers, and if I don't need as much and want to reclaim some resources I can remove some readers as well. Reader groups operate even in the presence of scaling, so the balancing of the assignment of segments happens even while the stream scales, and the readers are not aware of those changes; it's coordinated internally using a state synchronizer.
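A sketch of the reader-group side, with the same placeholder scope and stream as the earlier examples; adding a second reader to the same group would simply rebalance the segment assignment without application changes.

```java
import java.net.URI;

import io.pravega.client.ClientConfig;
import io.pravega.client.EventStreamClientFactory;
import io.pravega.client.admin.ReaderGroupManager;
import io.pravega.client.stream.EventRead;
import io.pravega.client.stream.EventStreamReader;
import io.pravega.client.stream.ReaderConfig;
import io.pravega.client.stream.ReaderGroupConfig;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.impl.JavaSerializer;

public class ReaderExample {
    public static void main(String[] args) throws Exception {
        URI controller = URI.create("tcp://localhost:9090");
        ClientConfig config = ClientConfig.builder().controllerURI(controller).build();

        // A reader group splits the stream's segments across its member readers.
        try (ReaderGroupManager rgManager = ReaderGroupManager.withScope("examples", controller)) {
            rgManager.createReaderGroup("analytics",
                    ReaderGroupConfig.builder()
                            .stream(Stream.of("examples", "sensor-readings"))
                            .build());
        }

        try (EventStreamClientFactory factory =
                     EventStreamClientFactory.withScope("examples", config);
             // Each reader joins the group under its own id; segment assignment is
             // coordinated internally, even while the stream scales.
             EventStreamReader<String> reader = factory.createReader(
                     "reader-1", "analytics", new JavaSerializer<String>(),
                     ReaderConfig.builder().build())) {
            EventRead<String> event = reader.readNextEvent(2000);
            while (event.getEvent() != null) {
                System.out.println("read: " + event.getEvent());
                event = reader.readNextEvent(2000);
            }
        }
    }
}
```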

D

Now, the two main components of Pravega itself are the controller and the segment store. The controller manages the lifecycle of streams; it commands the segment store, for example, to create segments when it needs to, and it also manages the transactions that we run against streams. The segment store is responsible for managing the lifecycle of segments and for storing them, so that's our underlying storage layer. The segment store doesn't know anything about streams;

D

a stream is a concept of the controller, and the controller is the one responsible for exposing that concept to applications; the segment store only deals with segments.

D

We use tiered storage. The first tier of storage we expect to be a low-latency option for small writes, so we have chosen to use Apache BookKeeper. For the second tier, which we call the long-term storage tier, we have different options: we can configure it to use either file or object storage. In principle the system is agnostic to what is used there, as long as we have a binding that connects to such a system.

D

So, for example, we can use HDFS there, or we can use an NFS mount. We also use Apache ZooKeeper for coordinating the assignment of what we call segment containers. Those are not to be confused with Linux containers; that's the abstraction we use to represent groups of segments, and it's the unit we use to assign work to the different segment store instances.

D

Let me talk a bit more about this. The controller is the one responsible for assigning segment containers to the different segment store instances. Each segment container is responsible for a group of segments, and to determine where a particular segment is going to land with respect to segment containers, we hash the name of the segment. In this particular example I'm showing the controller assigning three segment containers to each one of the segment store

D

instances. In the case that, say, I add another segment store instance, what the controller will do is remap the segment containers: we shut down some of the segment containers in the existing segment store instances and map them to the new one. That way we redistribute the load, taking into account new segment store instances. You can also remove instances; I'm illustrating adding one, but of course you can remove one as well and we redistribute.

D

Now, a bit more about the write path. On the write path, the first thing an event stream writer needs to do if it wants to append data is to determine which segment store hosts the segment it wants to append to, based on the segment container. It finds that information from the controller, and at that point it connects to the segment store and starts appending the bytes.

D

The segment store writes to BookKeeper, and only when it receives a response from BookKeeper that the data is persisted will it respond back to the event stream writer. BookKeeper, in its turn,

D

persists the data in a journal, so it's guaranteed that it is on disk by the time the event stream writer receives the acknowledgment. The data is propagated to long-term storage, tier 2, asynchronously, and, as I mentioned before, we have a few options there, HDFS and NFS among them, all built on file or object storage. For the read path we have a similar structure: the event stream readers get information about segments from the controller, and they read the bytes for a segment from the corresponding segment store.

D

The segment store responds with data from the cache if it's a cache hit, which is typical when it's tailing stream data, so it returns immediately. If not, it needs to read the data from tier 2. The data in BookKeeper is not used for reads; at this point it is only used for recovery purposes. So if a segment store instance crashes and it needs to recover the data for a particular segment container or set of segment containers, then it will use the data in the Apache BookKeeper ledgers.

D

All right, so that's what I wanted to cover about Pravega: segments, concepts, features, and a bit of its architecture. Let me now say a few words about stream processors, which is actually about connecting Pravega to applications.


A

So, Flavio, could I just ask a quick question about the Apache BookKeeper? In effect, is the Apache BookKeeper kind of your transaction log or write-ahead log kind of thing?

D

Yes, you can look at it that way, right.

A

It's okay, okay! Thank you.

D

So, to connect Pravega to applications, we typically use connectors. If you're talking about an application you're building from scratch, then of course you can just go and use the clients directly. But for, say, generic frameworks that you want to connect to Pravega, you want to build sink and source connectors. The sink connector will allow you to output data to a Pravega stream.

D

The source connector will allow you to read data from a Pravega stream. One example is the Flink connectors that we have developed; the reference to the repository is at the bottom of the slide, but that's the general concept for connectors that we can use for systems like Flink or other stream processors. As for existing connectors that we have implemented or are aware of, we have one for Apache Flink, which I have just mentioned, and then we have one for Hadoop.

D

We have Logstash plugins, there is one contributed by the community for Alpakka, and there are a good number of other ones that we are working on, and we expect the community to contribute as well. (A rough sketch of reading from a Pravega stream in a Flink job follows.)
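As an illustration of the source-connector idea mentioned above, here is what a Flink job reading from a Pravega stream might look like with the Flink connector of that era; the class and builder method names are recalled from memory and may differ between connector versions, and the scope, stream, and controller address are placeholders.

```java
import java.net.URI;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import io.pravega.client.stream.impl.JavaSerializer;
import io.pravega.connectors.flink.FlinkPravegaReader;
import io.pravega.connectors.flink.PravegaConfig;
import io.pravega.connectors.flink.serialization.PravegaDeserializationSchema;

public class FlinkSourceExample {
    public static void main(String[] args) throws Exception {
        PravegaConfig pravegaConfig = PravegaConfig.fromDefaults()
                .withControllerURI(URI.create("tcp://localhost:9090"))
                .withDefaultScope("examples");

        // The source connector turns a Pravega stream into a Flink DataStream;
        // a FlinkPravegaWriter would play the sink role in the other direction.
        FlinkPravegaReader<String> source = FlinkPravegaReader.<String>builder()
                .withPravegaConfig(pravegaConfig)
                .forStream("sensor-readings")
                .withDeserializationSchema(
                        new PravegaDeserializationSchema<>(String.class, new JavaSerializer<String>()))
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(source).print();
        env.execute("pravega-flink-example");
    }
}
```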

D

So I have skipped a good amount of slides that I had on Flink. If anyone is interested in talking more about this, I have backup slides about that, but I'll skip it for now and move on to talk about Pravega on Kubernetes. We have implemented operators. Operators are custom controllers for managing the lifecycle of an application; that would be a general definition. We have used that in a few places, and our operators do a number of things.

D

They worry about the deployment and the configuration, so we're talking about pod disruption budgets, pod affinity and anti-affinity rules, validating to make sure we are satisfying those, and assigning default values to variables; the operator takes care of all those things. It takes care of scaling, and in the case of the Pravega operator it is responsible for upgrades, upgrading from one version to another; that would be the responsibility of the operator, and it takes care of it, and also monitors the health of the individual components. We have implemented

D

three different operators for the various parts of the system: we have the Pravega operator, which covers the controller and the segment store, then we have the BookKeeper operator and the ZooKeeper operator. All three are open source at the moment. What I wanted to do is quickly show a cluster that we have deployed; it's running a longevity workload. This particular longevity workload is characterized by a small set of routing keys.

D

It targets traffic using a small set of routing keys, which gives a non-uniform load distributed across those keys; we're not using all keys, it's a small set, so it gives me a skewed distribution of the workload. So let me show you.

D

Give me a minute.

A

I'd just say it's particularly good to have a live demo during a project presentation, yeah.

D

Okay, yeah, so I have set this up before the call, because there are a number of steps I need to run to get it ready, so I want to show it running already; I won't be running a lot of other commands, also in the interest of time. But let's see.


D

Okay, so these are the pods that are running in this cluster. As you can see, let me see if I can annotate this, we have Grafana and InfluxDB running, then we have a set of three bookies.

D

We have the Pravega operator running, then we have a single controller and a single segment store, and we have five ZooKeeper servers running here, plus the ZooKeeper operator. This version of the Pravega operator was incorporating the actions of the BookKeeper operator as well, so I don't have the BookKeeper operator running separately here, but the operator I described is in a separate repository; it's available.

D

Now let me show you. Okay.

D

So this is Grafana; this is the Grafana dashboard for that cluster. Let me start with the operational dashboard. This is the write traffic we're imposing; we are putting in between 6 and 8... hold on a second, I'm not seeing it displayed quite yet. Okay, there you go.

D

Right, so as you can see in this graph of segment write bytes per second, we're putting in a load of between 6 and 8 megabytes per second consistently. As I mentioned, this is a longevity test that we run continuously, and one of the interesting things that we can see here is the variation of the number of segments. Remember that the distribution of load across keys is skewed, so I am sending to a small set of keys; if I

D

remember correctly, it's four. So it's expected that we get a good number of splits, so increases in the number of segments, and then at some point this starts dropping again until it converges. That's expected behavior.

D

All right, well, that's what I'm going to show quickly, just to see that it is running, and some of these graphs. Let me go back to the presentation. If you have any questions about this, I can come back to it and show you some more.

D

All right, so that was a quick view of a Pravega cluster live. Now, to wrap up: the main motivation for us to pursue a system to store streams was the observation that we have a very good number of applications out there that have sources continuously producing data, and we felt that a lot of those applications would rather map the abstraction they have of these sources to streams, rather than to files or objects, which are the traditional primitives you find in storage systems.

D

We have put the effort into making these streams unbounded, elastic, and consistent from a storage perspective, and we have also done the work of connecting them to stream processors so that we can extract value out of the data; it's not only about ingesting and storing, but also being able to derive value out of the data. I gave one example, which is Apache Flink; I mentioned some others, but Flink is the main one we have been working with.

D

The project is open source under the Apache license at the moment. It's hosted on GitHub, and we are looking for a home for the project, so we are looking at options for incubation at this time. Before I close, just a few comments for anyone who could be interested in getting started with Pravega; I want to give a few pointers. Check the website: there's a good amount of documentation there.

D

There are even videos and blog posts in addition to the project documentation. Check the organization on GitHub and the main repository; there are a number of repositories there, and pravega is the main one with respect to what I presented today, but you also have the connectors I have mentioned. Then you can run Pravega standalone locally if you want to do some quick testing or even some development; you can run that along with the Pravega samples, so there are a number of samples in the pravega-samples

D

repository. To run it on Kubernetes, I suggest you go look at the repository and the instructions there, and throughout that process feel free to give feedback, and even contribute if you see anything that you would be interested in changing or improving. With that I conclude my presentation; this last slide gives a good number of references for all the things I have mentioned during the presentation. Thank you.

A

Thank you for that presentation; that was very informative. It's interesting and, I guess, slightly different to some of the typical storage projects that we've discussed so far, so this is a very interesting alternative. You mentioned you were looking for a place to donate the project to; are you familiar with the project graduation structure in the CNCF?

D

A little bit. I'm not very familiar; I'm more familiar with the Apache way of doing things, not as familiar with the CNCF or even the Linux Foundation in general. I mean, we have spoken with people across the CNCF and the Linux Foundation, but I don't know everything, so if you want to give any information about that, it would definitely be of use, I'm sure.

A

Right, so perhaps I could send

B

you an email after

A

this.

A

Perfect, that's really helpful. So, just to quickly summarize, there are sort of three levels of projects. The starting level is a sandbox project, and this has a relatively low bar to entry, so it's kind of good if you're trying to help build the community, address maybe IP-policy-related changes, or help grow the number of maintainers of the project, for example.

A

The next level up is the incubation level, and that has a higher bar and a number of different criteria. Then, finally, there are the graduated projects, but graduation requires additional things, like security audits, for example. So it would be

A

useful to understand your thinking on this and at what level you are considering, because obviously there's a different workflow, a different process, and different levels of due diligence that we would need to consider.

D

Sure. So, in your view, what would be the difference again between sandbox and incubation?

D

Well, what would make anyone go for sandbox rather than incubation directly? That would be my question.

A

It's down to project maturity. The incubation level requires a number of criteria, like, for example, having maintainers from different organizations and end users, and having the project being used in production, those sorts of things. So if you can't get some of those references, or maybe the project is very focused on just one organization, that might be an opportunity to go in at the sandbox level to grow the community further.

D

Got it okay, that makes perfect sense all right.

C

Okay.

D

That's interesting.

C

And yes, sorry, my audio was breaking up; I don't know if it's me or everyone else as well. Thanks for presenting. I think it's definitely, as Alex said, a very different project than we're used to looking at, so I think it'd be a good asset to add to our portfolio in the CNCF.

D

Yeah, thank you, Amy. It's a...

C

Me right.

D

Ah, that was Erin? Erin, okay, sorry, I guess I saw the wrong name.

B

Erin, thank you. Hello, this is Luis. I just want to thank you for this; this is really nice and very different. I think we need to expand the landscape document that we have for storage systems. So one of the questions is: you

C

have a lot

B

of Apache projects, so I'm just curious why go after CNCF approval instead of becoming part of the ASF?

D

Excellent question. I would say that I personally haven't entirely made up my mind. Except for being a board director, I have been everything in the ASF: I am a committer on projects, I am part of PMCs, I am an Apache member, I have been part of incubators, so I know their stuff pretty well. And I have heard great things about the Linux Foundation and the CNCF in particular.

D

I have been very impressed with the infrastructure, the group of people, and the projects, so I decided it would be a good idea to explore. It's a strong contender on my list; it's looking pretty solid, all the work that various people have been doing around the projects, and again the infrastructure. I think all of that counts and helps projects to be successful.

B

Thank you.

D

Thank you.

A

So, thanks again for presenting. Amy, or Erin, unless there are any other things that we need to raise, I think this covers our agenda today. Yes?

C

I think we should pass on the sandbox template, at least to start, so it helps answer some of these questions, like the ones Luis was asking, and structure it in a way that would help us understand which level of acceptance you think the project would go into. So I'll go ahead and forward that on to you, Flavio, and then I think we'll go from there once it's structured in a way...

C

Oh, thank you. Once it's structured in a way that we can move forward and understand what you guys are looking to get out of it.

C

Thank you.

B

This is a question for most of us, not really Flavio, but I thought projects always started as sandbox first, no?

C

That's not accurate. Generally, projects don't come in as graduated, I think that's kind of a given, but they can start at incubation, provided they have a lot of support within the community; sandbox is meant to be that springboard. And so, without really understanding it in terms of all the other different aspects, I don't think I could make a recommendation one way or the other.

A

Vitess started as an incubation project, yeah.

D

Incubation, okay.

B

Yeah, Vitess, but that was like the early days of this process, and I was wondering if we have changed that at all. But I understand now. Okay, thank you.

C

Not that I know of, Luis, but it's fluid.

B

It's all good, thank you.

A

Cool, okay, so I think we're done, and we're coming up to time too. So thanks, everybody, and we'll speak to each other

C

in the

A

next couple of weeks, and I hope everybody's keeping well and staying healthy, yeah.

D

Likewise, thank you for the opportunity of presenting to

A

this

D

group.

A

All right, thanks again, Flavio.

D

Be well.

A

Don't see.

D

You online.

From YouTube: CNCF SIG Storage 2020-04-08