From YouTube: Inside the Indexer - Louis DeLosSantos (Red Hat)
Description
Inside the Indexer
How Clair extracts and persists your container contents
Louis DeLosSantos (Red Hat)
2021-04-26
OpenShift Commons Briefing #Upstream #AMA
For more information about Clair:
https://github.com/quay/clair
Slides: https://github.com/openshift-cs/commons.openshift.org/blob/master/briefings/slides/Inside%20The%20Indexer.pdf
A
Hey everybody, welcome to another Monday OpenShift Commons briefing with another wonderful upstream project, this one Clair. We have Louis DeLosSantos here, who's a principal software engineer working on the Clair project for Red Hat, and he's going to take us inside the indexer. I'm going to let Louis introduce himself and everything that's going on in the Clair world today, and if you have questions, throw them in the chat and we will answer them at the end of the presentation.
B
What this talk is really trying to do is get the community, or the watchers of this presentation, more acquainted with the internals of how Clair works.

B
Clair's fundamental goal is to provide insights about containers to the client, whether that's a developer or an operations team. We want to show you exactly what is inside the container and what might be vulnerable, so your teams can patch those things or act accordingly.
B
To do this, it becomes obvious that we need to understand what's inside the container, extract the contents, and place them into some kind of schema which is searchable. That's what this talk focuses on: inside the indexer. The indexer is the service which takes all the layers from a container, looks inside them, pulls out the contents, and creates a report.

B
So what is indexing? Indexing is the term we use for the process of extracting the contents of the container itself.
B
It is the first step in Clair's analysis pipeline. Inside Clair's pipeline we're trying to take a container and understand what content is vulnerable. We split this pipeline into several phases, and indexing is the very first phase. It's responsible for creating an index report, which we're going to go into in detail in just a bit.

B
If we're looking at the complete Clair pipeline to create a vulnerability report, this is what we're looking at: the 30,000-foot view. I have highlighted the portion of the pipeline which we're going to cover today in this talk. What you'll notice is that we take a container manifest.
B
We feed that to the indexer. The indexer performs a bunch of work, which we're going to go into in detail in this talk, and then it generates an index report, which is the findings of the work it just performed on the container manifest.

B
There are a couple of key components here, if you'd like to follow along or you come back and look at this talk later. We're in claircore, which is our project; this is the engine, this is what's really doing the scanning in the Clair project.
B
If you do want to follow along in our source tree, the indexer code is in this internal package, and then there's the indexer directory here. Almost everything we're going to cover in this talk is laid out within this indexer directory, and there will be a lot of references back to it. So if you are interested in following along, or you're looking at this talk at a later date and trying to map what we're talking about onto the code, this is the directory of interest.

B
As for the key components: in this little section I'm going to cover the data models, basically how we structure our data to accomplish this goal of extracting the contents and reporting what we found inside the container.
B
First we have a manifest. The manifest represents a container image for us. You'll notice it's made of a slice of layers, and those are order dependent. If you went and created a container with Docker or Podman, those layers are created with a parent-child relationship, and we represent that with the slice.
B
So when you submit a manifest to us, you're expressing the same concept as the container's hierarchy of layers: you represent that hierarchy with a slice of layers, and then there's the hash digest of the manifest, the content-addressable hash signifying the manifest as a whole.
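For readers following along in Go, here is a minimal sketch of the manifest shape just described; the field names are illustrative and the authoritative definitions live in the claircore repository:

    // Sketch of the manifest submitted to the indexer.
    type Digest string // content-addressable hash, e.g. "sha256:..."

    type Layer struct {
        Hash Digest // digest of this individual layer
        URI  string // where the indexer can fetch the layer's tarball from
    }

    type Manifest struct {
        Hash   Digest   // hash signifying the manifest as a whole
        Layers []*Layer // order dependent: mirrors the container's layer hierarchy
    }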
B
Now, this is the index report. This is how we communicate to clients exactly what we found inside containers. We'll start again with a hash. This is a hash of the manifest as a whole, so you can think of it as a unique identifier for the container and its layers in that unique ordering. And what happens to obtain this hash?

B
You might know about this if you know how Docker images are built, but if you don't: this hash is actually computed by taking the hash over each individual layer inside the container, which produces a final content-addressable hash. Then there's the state. This is used internally, and it is exposed to HTTP clients who might want to query the indexer.
B
The way this works is that when you submit a job to the indexer, if you were to try to submit the same job again, we would actually give you back this structure and give you the state of the index. So Clair is smart enough to know, hey, we're working on this right now, but here's the state. If you want to poll the state, just wait until you see an error or you see that it is successful.

B
We don't do this in Quay right now, but as a usability factor you could write clients which just sit there and poll on their job; it's part of the design specification for the indexer itself. Packages: this acts as a... let me actually take a step back. You'll notice packages, distributions, and repositories are actually maps: a map from a string ID to the actual structure of the package, distribution, or repository.
B
So the index report is really acting like a portable database. We do this for de-duplication reasons. It would be unfortunate if we just continued to write the same package strings for every layer we found them in, or had to duplicate that information.

B
So when you're looking at the index report, you actually want to treat it as a database with key values that you can string together to understand where certain packages were found.
B
The way this works is you'll look at the packages (we'll call it a database) and it has the ID, and then it has the package name. So you can picture this as a de-duplicated database of all the packages that were found inside your container. Same thing with distributions: we could technically identify more than one distribution.

B
We typically don't if it's a normal container, but sometimes there are dist upgrades, or sometimes there will be more than one file that gives us a hint about the distribution of the actual container. This is whether the container is RHEL, whether the container is CentOS, whether the container is Debian; that's what this is representing. Then repositories: these are usually language repositories. So if we find pip, if we find npm, they'll be represented here. And then the environments are what string this all together.
B
When you're looking at environments, this basically gives you the ability to say, okay, we found this package in this layer at this filesystem path. We needed this to support language packages, because once you start supporting language packages you have the predicament where the same package could exist in multiple directories across the filesystem. For instance, if you are using npm and you have a forms library, you might use that in five projects that are scattered around the container's environment.
B
So we record each one uniquely, without having to duplicate the package's identity, by compressing them into these small databases. That's really the bulk of what the index report is providing you. Then there's just some bookkeeping: whether you had a success or not. Again, this is for clients that are polling and want to know whether we had a successful index or not.
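Putting the above together, here is a hedged sketch of the index report's "portable database" shape; the names below are illustrative, not the exact claircore definitions:

    // Sketch of the index report: small de-duplicated "databases" keyed by ID,
    // plus the environments map that strings them together, plus bookkeeping.
    type Package struct{ ID, Name, Version string }
    type Distribution struct{ ID, Name, Version string } // e.g. RHEL, CentOS, Debian
    type Repository struct{ ID, Name string }            // e.g. pip, npm

    type Environment struct {
        PackageDB    string // filesystem path where the package was found
        IntroducedIn string // digest of the layer it was found in
    }

    type IndexReport struct {
        Hash          string                    // manifest hash this report describes
        Packages      map[string]*Package       // package ID -> package
        Distributions map[string]*Distribution  // distribution ID -> distribution
        Repositories  map[string]*Repository    // repository ID -> repository
        Environments  map[string][]*Environment // package ID -> where it was found
        State         string                    // bookkeeping for polling clients
        Success       bool
        Err           string
    }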
B
Now we have scanner interfaces. This is a very important concept when dealing with the indexer, because these are the externally implemented sections of code.

B
Each scanner is in charge of taking a container layer, parsing through it, and finding the desired content it's interested in. We wrote these as interfaces, allowing other teams and other upstream contributors to come in and say, okay, I want a JAR package scanner. You'd come in, implement this interface, which takes the layer, looks for JARs, parses them into packages, and returns them to the claircore code. Very simple, and this has been proving itself useful with the CRDA integrations we've done.
B
I was working with Arun on the CodeReady integration, and I discussed this with him; I asked how the interface was working out, and he showed us the PR. It was all code added, nothing needed to be changed, so this abstraction has been working pretty well for us. The same goes for the distribution scanner and the repository scanners.
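As a rough sketch of what implementing one of these interfaces involves (the method set here is illustrative; the real interfaces live under claircore's internal/indexer directory, and Layer and Package are as in the earlier sketches):

    // A package scanner identifies itself and turns one container layer into
    // the packages it found there. Distribution and repository scanners follow
    // the same shape, returning distributions and repositories instead.
    type PackageScanner interface {
        Name() string    // e.g. "jar"
        Version() string // bumped when the detection logic changes
        Kind() string    // "package"
        Scan(ctx context.Context, layer *Layer) ([]*Package, error)
    }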
B
This plays an important role later on in the talk, but, as you can imagine, the indexer is taking container layers and trying to understand what's inside them, and this does the bulk of that work. We hand each implementation a layer and it can go ahead and scan through it and understand it with its own business logic. The claircore code proper isn't too concerned about what's happening in there.

B
So there's a lot of flexibility there to perform package scanning, distribution scanning, and repository scanning the way the ecosystem sees fit, whether that's npm or Python or whatever. And then there's the coalescer. This is probably my favorite part of the indexer, and it took a little bit of time to figure out how to do this right. So let me think of the best way to explain this.
B
When you have a normal container, it's a series of discrete tarballs, and they represent filesystem layers.

B
There might be situations where the packages I found in layer one don't even exist in the later layers. We don't want to put those in the final index report, because they were deleted in some intermediate layer. So the coalescer is another interface which handles this business logic. It looks at layer artifacts, which are similar to the index report, but they represent the individual packages,
B
distributions, and repositories found inside an individual layer. The coalescer takes a list of these artifacts and, with its own business logic, decides whether it should keep or remove particular artifacts from the final index report, in a similar fashion to a container runtime applying a set of layers on top of each other to get the final container filesystem image that's going to run on the host. We do the same thing, only obviously not with the filesystem in mind.
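A sketch of the coalescer contract as described here, with illustrative names (Package, Distribution, Repository, and IndexReport as in the earlier sketches):

    // LayerArtifacts holds what the scanners found in one individual layer.
    type LayerArtifacts struct {
        Hash          string // layer digest
        Packages      []*Package
        Distributions []*Distribution
        Repositories  []*Repository
    }

    // A Coalescer receives the per-layer artifacts in layer order and merges
    // them, with its own business logic, into a single index report: dropping
    // content removed by later layers and backfilling distribution information
    // onto packages discovered in earlier layers.
    type Coalescer interface {
        Coalesce(ctx context.Context, artifacts []*LayerArtifacts) (*IndexReport, error)
    }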
B
We have to do it with the end goal of creating an index report in mind. A little bit of in-depth detail there: there are currently two implementations of the coalescer, one specifically for RHEL, and then there's a generic one. So let's take a quick look at that inside our root directory.

B
If you go into internal/indexer/linux (because this is a Linux-focused coalescer), the coalescer is inside here, and this would be a really valuable piece of code for understanding how we actually go about creating these final index reports. There's a little bit of heuristics in here.
B
We now have to somewhat backfill that information to previous layers, and then attribute the packages found in those previous layers with the distribution information we found later on. And this is just the nature of Clair; this is kind of what makes Clair a unique application, in the fact that it's dealing with piecemeal information the entire way through, and we're finding novel ways to stitch this information together and create a cohesive result that represents the final image.
B
So, the architecture of the indexer itself: it has a RESTful HTTP API, and we've written this in such a way that, theoretically, if your application need were simply "I want to know what's inside the container," and you don't care about vulnerabilities or matching them against anything, you could take the indexer and use it as a discrete service. It has no other dependencies.
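To illustrate using the indexer as a discrete service, here is a hedged example of submitting a manifest and reading back the report over HTTP. The route names follow Clair v4's documented indexer API, but check the project's OpenAPI spec for the authoritative paths; all host names and digests below are placeholders:

    package main

    import (
        "bytes"
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // The manifest: content-addressable hash plus the ordered layers.
        manifest := []byte(`{
            "hash": "sha256:<manifest-digest>",
            "layers": [
                {"hash": "sha256:<layer-digest>", "uri": "https://registry.example/v2/app/blobs/sha256:<layer-digest>"}
            ]
        }`)

        // Submit the manifest for indexing.
        resp, err := http.Post(
            "http://clair-indexer.example:8080/indexer/api/v1/index_report",
            "application/json",
            bytes.NewReader(manifest),
        )
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        // The body is the index report (or its current state, if the same
        // manifest was already submitted and work is still in flight).
        body, _ := io.ReadAll(resp.Body)
        fmt.Println(resp.Status)
        fmt.Println(string(body))

        // The report can later be fetched again by manifest digest:
        //   GET /indexer/api/v1/index_report/sha256:<manifest-digest>
    }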
B
So if, for some reason, you had the idea of, okay, I'll do my own vulnerability matching, given that I have this little piece of code, this service that's able to give me the contents of a container, then you can simply use this alone, and there's a RESTful HTTP API to do that. It is also architected and modeled as a finite state machine, and the reason we did this... well, in case you're not quite sure what a finite state machine is:

B
It's a set of logical steps, states if you will, that house business logic, and as you're moving through this business logic you're transitioning via states. What this allows you to do is, when we were re-architecting Clair v4, we wanted to be able to quickly say, okay, there's something else we need to do.
B
We don't want to refactor the entire application. So if we model it in a state diagram like this (don't worry about knowing all of it, we're literally going to go over every step), and then we decide, hey, after scan-layers we actually need to do this new thing, we just pop it into the diagram and then do the plumbing necessary.

B
Almost no code has to be refactored, which has worked out very well for us, because when we were re-architecting, for instance, index-manifest came in as a requirement much later in our development cycle, and almost no refactoring was necessary because we just created a new state and popped it into the state diagram.
B
It is a common pattern; I'm just explaining it in case you're not too aware of what that looks like. I have a little snippet of code here about how the state machine runs, and let me go back to the source code in case you do want to follow along: the actual state machine is in the same directory, internal/indexer, and we call it a controller, to follow along with a lot of the semantics around Clair.

B
Maybe a smaller aside is that a lot of the time you'll see interfaces and then controllers. The way we architected Clair as a whole is that upstream individuals very simply implement interfaces, and the controller handles most of the business logic. This separation has made contribution pretty seamless, because contributors don't need to worry about databases, and they don't need to worry about how Clair actually stitches things together. All they really worry about is implementing interfaces, and then we have controllers which drive those interfaces.
B
So I'm just going to go over the actual run method here, because I think, if you are trying to follow along, it can give you good insights.

B
We have this dictionary, or map, of state names to state functions, and as you can assume, the state functions are the actual business logic of each of these states. So you'll see, for example, a fetch-layers function.
B
What we do is get the current state of the state machine and then run that state function. The state function is going to return a new state; it's a very recursive algorithm here. It's going to return a new state, we're going to see if we need to do any error handling, and we're going to check if we're at the terminal state, which is a canonical way of saying everything's done and you can halt the machine.

B
Once we determine it's not the terminal state, we set the machine's state to the one that was just returned. We do a little bit of bookkeeping here which writes the new state to the database; this goes back to a client polling the indexer.
B
So it's a recursive algorithm, but again, what's nice is that when we update the state diagram we don't really refactor anything. We just add a new state function and then update the state map.
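A condensed, self-contained sketch of that run loop follows. This is not the verbatim controller code (see internal/indexer/controller in claircore for the real thing), but it shows the state-name-to-state-function map and the loop that drives it:

    package main

    import (
        "context"
        "fmt"
    )

    type State string

    const (
        CheckManifest State = "CheckManifest"
        FetchLayers   State = "FetchLayers"
        ScanLayers    State = "ScanLayers"
        IndexFinished State = "IndexFinished"
        Terminal      State = "Terminal"
    )

    // Each state's business logic is a function that does its work and returns
    // the next state to transition to.
    type stateFunc func(ctx context.Context) (State, error)

    // The transition table: adding a new step to the pipeline is a new entry
    // here plus its stateFunc, with no refactor of the loop below.
    var states = map[State]stateFunc{
        CheckManifest: func(ctx context.Context) (State, error) { return FetchLayers, nil },
        FetchLayers:   func(ctx context.Context) (State, error) { return ScanLayers, nil },
        ScanLayers:    func(ctx context.Context) (State, error) { return IndexFinished, nil },
        IndexFinished: func(ctx context.Context) (State, error) { return Terminal, nil },
    }

    func run(ctx context.Context, current State) error {
        for {
            next, err := states[current](ctx) // run the current state's logic
            if err != nil {
                return err // the real controller transitions to an error state
            }
            if next == Terminal {
                return nil // canonical "everything is done, halt the machine"
            }
            current = next
            // The real controller also persists the new state to the database
            // here, so polling clients can observe the indexer's progress.
            fmt.Println("transitioned to", current)
        }
    }

    func main() {
        _ = run(context.Background(), CheckManifest)
    }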
B
Okay, so now I want to dig into each of these states individually to give you an idea of what the indexer is doing. The very first state that we enter when you submit a container manifest to the indexer is called check-manifest, and it's exactly what you think it is: we determine if we've ever seen this manifest before.

B
We can do this because of content addressability. I don't know if anyone has watched previous talks with me on Clair, but we're always hammering on this content-addressability aspect. What it really means is that if we see a manifest with a particular hash, it's content addressable: it's the same content no matter when we scan it again, no matter when we see it. We can always be sure that the same layers with the same content make up that manifest. Therefore, if Clair sees it, it can go:
B
Oh okay, I've seen this manifest. I don't need to do anything else; I can literally just return the index report that I've already computed for this manifest. Now, in this case, we're going to say Clair has not seen this manifest, so it's going to go, okay, move it through the pipeline; we need to move it forward. Now, there's a bit of a subtlety here, and something that's nice to know when you're working with multiple scanners.
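In sketch form, the check-manifest decision amounts to a lookup keyed by the content-addressable hash (illustrative code, not Clair's actual store API):

    package main

    import (
        "context"
        "fmt"
    )

    type IndexReport struct{ Hash string }

    // store stands in for Clair's database layer.
    type store struct{ reports map[string]*IndexReport }

    // checkManifest: a hit means the exact same layers were already indexed,
    // so the stored report can be returned without doing any new work.
    func checkManifest(ctx context.Context, s *store, manifestHash string) (*IndexReport, bool) {
        r, ok := s.reports[manifestHash]
        return r, ok
    }

    func main() {
        s := &store{reports: map[string]*IndexReport{
            "sha256:aaaa": {Hash: "sha256:aaaa"},
        }}
        if r, ok := checkManifest(context.Background(), s, "sha256:aaaa"); ok {
            fmt.Println("seen before: returning stored index report for", r.Hash)
            return
        }
        fmt.Println("not seen before: continue through the pipeline")
    }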
B
What we'll actually do here is, let's say Clair has seen the manifest, but now the implementer of the JAR scanner for Clair has made some changes to it and it might detect things a little differently.

B
So this adds to the ability of doing as little work as possible, and you can actually see that in the source code: there's a little section where we clip off scanners if they have already scanned the manifest before. Just a little tidbit of information that I think is nice to bring along, in case you see it in the source code.
B
In this state, Clair is trying to determine which layers it actually needs to go out and spend system resources on: fetching, downloading, possibly decompressing, and then scanning. In this diagram I express the common case of Clair deciding, this base layer, I've seen it already, I don't need to go and grab it.

B
This is a small change which adds a lot of benefits from Clair v2 to Clair v4. Clair v2 would actually do all the work in memory, which can be problematic if you're trying to pull down gigs of layers. So now we buffer to disk. When you're running Clair, we advise you to definitely have at least 100 gigs of scratch space; SSDs will help, because we actually use the disk quite a bit for buffering data, especially for very large layers.
B
Now we go into the next state. We have the layers, they're local on the filesystem, and now we take the scanners which were computed in the check-manifest state. We say, okay, we have this list of scanners, we know what we want to scan inside the container, now we do that work. The way the scanning state works is that it takes that list of scanners and it will concurrently, via goroutines, fan out the scanning business logic.
B
The controller does this: it knows the implemented scanners that are configured, it fans them out and hands them each the layers, and they begin scanning the layers and return their contents back to the controller. The controller will then write those contents to the database. So the scan-layers phase is when we are actually computing what's inside each layer and then storing the partial results for each layer in the database.
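A simplified, self-contained sketch of that fan-out: each configured scanner is handed each layer on its own goroutine, and the results are gathered for the controller to persist. The real code also handles errors and the database write; the types here are stand-ins:

    package main

    import (
        "context"
        "fmt"
        "sync"
    )

    // Minimal stand-ins for the real claircore types.
    type Layer struct{ Hash string }
    type Package struct{ Name string }

    type PackageScanner interface {
        Name() string
        Scan(ctx context.Context, l *Layer) ([]*Package, error)
    }

    // scanLayers fans each (scanner, layer) pair out onto a goroutine and
    // collects the per-layer findings.
    func scanLayers(ctx context.Context, scanners []PackageScanner, layers []*Layer) map[string][]*Package {
        var (
            mu      sync.Mutex
            wg      sync.WaitGroup
            results = map[string][]*Package{} // layer hash -> packages found
        )
        for _, s := range scanners {
            for _, l := range layers {
                wg.Add(1)
                go func(s PackageScanner, l *Layer) {
                    defer wg.Done()
                    pkgs, err := s.Scan(ctx, l)
                    if err != nil {
                        return // the real controller records the error and fails the index
                    }
                    mu.Lock()
                    results[l.Hash] = append(results[l.Hash], pkgs...)
                    mu.Unlock()
                }(s, l)
            }
        }
        wg.Wait()
        return results
    }

    // A trivial scanner so the sketch runs end to end.
    type demoScanner struct{}

    func (demoScanner) Name() string { return "demo" }
    func (demoScanner) Scan(ctx context.Context, l *Layer) ([]*Package, error) {
        return []*Package{{Name: "example-pkg"}}, nil
    }

    func main() {
        out := scanLayers(context.Background(),
            []PackageScanner{demoScanner{}},
            []*Layer{{Hash: "sha256:aaaa"}, {Hash: "sha256:bbbb"}})
        fmt.Println("layers scanned:", len(out))
    }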
B
To touch on this a little bit: I went over this in the components section, but just as a refresher, because there is a lot of data throughout the talk, I wanted to put in little reminders. This is what the package scanner, the distribution scanner, and the repository scanner look like. As you can tell, when you call their scan methods, they're given a layer.
B
If we go back just a bit, you'll remember that the layer was buffered to disk, so it's a little bit abstracted, but the scanner can get a tar handle to the layer if it wants, or we do have some abstraction methods on the layer that say, hey, just give me this file. So, an example of this:
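For example, a distribution scanner that only cares about etc/os-release might look roughly like the following. The Layer type and its Files helper here are stand-ins for the layer's "just give me this file" abstraction, not the exact claircore signatures:

    package example

    import (
        "bytes"
        "context"
        "strings"
    )

    // Layer is a stand-in; the real type exposes the buffered-to-disk layer
    // either as a tar handle or via file-access helpers.
    type Layer struct {
        files map[string][]byte
    }

    // Files is a hypothetical helper mirroring the abstraction described above.
    func (l *Layer) Files(paths ...string) (map[string]*bytes.Buffer, error) {
        out := map[string]*bytes.Buffer{}
        for _, p := range paths {
            if b, ok := l.files[p]; ok {
                out[p] = bytes.NewBuffer(b)
            }
        }
        return out, nil
    }

    type Distribution struct{ Name, Version string }

    // Scan pulls etc/os-release out of the layer and derives a distribution
    // from it: the typical shape of a distribution scanner.
    func Scan(ctx context.Context, l *Layer) ([]*Distribution, error) {
        files, err := l.Files("etc/os-release")
        if err != nil {
            return nil, err
        }
        buf, ok := files["etc/os-release"]
        if !ok {
            return nil, nil // not present in this layer; nothing to report
        }
        d := &Distribution{}
        for _, line := range strings.Split(buf.String(), "\n") {
            switch {
            case strings.HasPrefix(line, "ID="):
                d.Name = strings.Trim(strings.TrimPrefix(line, "ID="), `"`)
            case strings.HasPrefix(line, "VERSION_ID="):
                d.Version = strings.Trim(strings.TrimPrefix(line, "VERSION_ID="), `"`)
            }
        }
        return []*Distribution{d}, nil
    }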
B
That's the level of abstraction you can expect if you're trying to implement these scanners yourself for your own purposes inside Clair; that's all you have to do. Then, when we get back these claircore packages, distributions, and repositories, we simply write them to the database with our own database-handling logic, all internal to Clair, so implementers will not have to worry about that.

B
This is a look inside the Clair data model, at how we actually stitch the items found during scanning together into an ERD for searchability. We created the idea of scan artifacts inside the Clair database, and what this is really doing is tying together the ability to search, saying, okay, we found this package in this layer and it was found by this scanner.
B
This data model makes it possible to say a scanner has been changed, so let's scan that layer again, because we're recording exactly the scanner name, version, and kind which found these artifacts.

B
I mentioned coalescing a little bit, and this is how Clair computes all this partial data into a final index report. The way this works is that the business logic in the controller will go ahead and ask for the scan artifacts for both of these layers, the two layers that we scanned. It's going to get this layer-artifact structure, which has the packages, the distributions, and the repositories, it's going to get the other layer artifacts, from df0, and it's going to feed both of these layer-artifact structs to the coalescer.
B
Now, what the coalescer wants to figure out is: how can I attribute packages to distributions? We touched on this a little bit, but the distribution information might be in layer 10 and the package database might be in layer 2. So we have to somehow coalesce this and backfill the distribution information.

B
We also have to figure out which packages should remain in the final index report and which packages should be deleted, based on the state of each individual layer. The coalescer works similarly to the scanners, in that the business logic in the controller will spawn coalescers with goroutines and run them in parallel.

B
They'll each create their own representation of the final index report, and then we just merge them together to get the final index report, with the final set of contents that are left inside the image.
B
The index-manifest state is where we make the contents of a container searchable.

B
It's not a super complex data model or ERD diagram; basically, what we are doing is we just have a giant link table that says we found this package in this manifest, we found this distribution in this manifest,

B
we found this repository in this manifest. Where this comes in handy is when a new vulnerability enters the Clair system. Vulnerabilities are usually tied to packages and distributions, right? So if you have the RHEL Pulp security database, when you look at a vulnerability it's going to say something like OpenSSL, RHEL 8. You can take that vulnerability and ask the indexer, hey, which manifests have OpenSSL and are of the distribution RHEL 8? This index-manifest step makes it possible to give you that answer.
B
It happens after coalescing, so we get the final computed results of what's available inside the container image, we index that (hence the name indexer), and then it becomes searchable in the aforementioned way, when a vulnerability is attributed to a particular package and distribution. This is exactly what the data model looks like at the end; I'll go through the ERD diagram and, basically, the database code. And then, finally, we have index-finished, which is a very simple state.

B
It basically just massages the state and success keys, along with the values in the index report, and then writes them to the database.
B
Deferring work: we touched upon this throughout the talk, but one of Clair's main goals is to do as little work as possible, so we can compute results and give them to the client as fast as possible.

B
So I want to quickly review some of the ways Clair v4 does this deferment of work. The first big part is just the manifest-seen check right at the start, because of content addressability.

B
If we have seen a manifest before, we're simply not going to do any work; we're going to go right to the database and say, okay, I have an index report for this manifest hash, and I'm just going to return it. Now, again, this excludes the case where scanners might have changed, or Clair is configured in a different way.
B
Another way of deferring work is determining which layers to actually scan. This is fundamentally the same as the check-manifest state, just on an individual layer basis. So again, content addressability indicates that if I see this hash, whenever I see it again, the contents haven't changed; therefore I don't need to rescan it. It's another way that we're able to do less work, and another reason why indexing large amounts of images might not be as scary as it sounds.
B
As long as they are sharing, you know, several layers. A very common thing to do is to have a base layer, then a dependency layer, and finally a third layer that just changes your application. If that's the case, then Clair is really only doing work on a single layer every time you push an application update, as long as your dependencies aren't changing and you write your Docker containers in a sane way which makes use of this separation between base, dependencies, and application.

B
This is just touching upon the point that when we do decide we're only going to scan particular layers, we just go right to our own database, which already has this information in it, grab the information, and then bring it with us for the other steps, the other portions of the pipeline.
B
So that's all I have for the presentation. I'm up for either doing a little bit of code digging, or we can go right to questions. What do you think, Diane?

A
A little Q&A here, because there's one question. The other thing that I would have you do is go to your site and show the schedule for community meetings, because you've just done an amazing run-through to give people insights into how to contribute and how it all works.
A
So I want to make sure people know how to find you and get into the community and get started, and while you're doing that I'll read off Andre's question here; a couple of them are coming in. So: which phase of the general CI/CD pipeline would be the appropriate position for Clair scanners? After some deployment, or somewhere as a testing, linting, health-check phase of CI/CD, or as part of the security, vulnerability management, and QA process? I know you have opinions about that, but I think everybody holds an opinion about that.
B
Yeah, definitely, there are opinions about that. Me personally, if I have a build system and I am performing, you know, staging builds: when you create those staging containers and they get pushed to a repository, that's really your time to do that scanning, to understand the vulnerabilities that might be inside your container before they ever hit production. If you don't have a staging environment and you simply push containers and then deploy them to production, you still have that period of time where you've built a container and pushed it to a registry.

B
So in the CI/CD pipeline I would say, as a general best practice, do it as early as possible: as soon as you have the container built, and obviously before you push it out to an environment. I would do the scanning as early as possible in your CI/CD pipeline.
A
And Narandev has posted what I'm sure is an interesting question: wait, there's a state management solution for Golang? Can you talk a little bit about that? You must have mentioned it earlier.

B
I'm not exactly sure what you're referring to, but we have written that state machine code just in pure Go, as an incarnation of our own development; we're not using a library for state management. But if you are interested in state management and you'd like to see how Clair does it, I would definitely check out that code.
B
It's probably a decent representation of what an FSM, or finite state machine, implementation in Go could look like. It was written to serve a purpose; it might not be the shiniest, cleanest thing, but it works, and it works well. So if you do want to take a look at how we work that finite state machine architecture in, again, you can go to our source, which is claircore.

B
It's the internal directory, then indexer and controller, and what would be really interesting to you is controller.go and state.go. This is basically how we created the state transition tables which map states to functions. But yeah, no library, no external state management solution for that; we just coded it.
A
But I think what you've just done, which I wish I could get every upstream project to do, is to really explain how Clair works internally. In order to contribute to a project, that's often one of the missing pieces. You know, a bunch of engineers from Red Hat and elsewhere have been contributing over and over, and taking the time to really explain how it works is wonderful, so I can't thank you enough.

A
I'm hoping that will drive people who watch this and want to use Clair in whatever projects or products or states they want, to come to these community meetings and then, you know, take a look at it, whether you want to take a look at the state management code (or the code base, I guess, rather than solution) or contribute to this.
A
That would be a lovely thing. So thank you for powering through your power outage there this morning, Louis, and making this happen ("my pleasure"). Anything else you want to add in terms of what's next for Clair and the Clair community?
B
Yeah, maybe just a couple of touch points on what's coming up on our internal agenda. Right now we have the 4.1 release baking, and this release has a pretty paramount feature, what we're calling enrichments. What you might have noticed when we redesigned Clair v4:

B
We wanted to remove false positives as much as we can, and by doing so we removed NVD as a vulnerability data source. That was a somewhat opinionated decision.

B
I think a lot of people share our opinion that NVD might not be the best source of data. However, when we did that, we removed a lot of the severity information that people had become accustomed to. So the enrichment specification and 4.1 roadmap goal is all about allowing auxiliary data to enrich our vulnerability reports. We took kind of a best-of-both-worlds approach, in my opinion, in that we're sticking with the official upstream vulnerability data, but now we're enriching that data with NVD metadata.
B
So it's a little different from going to NVD and trusting all of it. Instead, we have the trusted source and then we're adding information to the information that we already trust. This is a 4.1 goal, and you'll notice that a lot of information for vulnerabilities will become richer. If you'd like to follow that development in any way, shape, or form, you can go to quay/clair.

B
Perfect. So you can go to our Discussions, and inside this Design tab right here you'll see the Clair enrichment specification. And just by the way, we practice open design, so any big-ticket changes that are going to happen to Clair will be in this section; it's just a good area to watch. This is the Clair enrichment specification, and there's a link to our GitHub repository, so the spec is here and the implementation details are here. I'm working mostly on this implementation, but community contributions are completely welcome.
B
Every single detail, to the best of my ability, is outlined here. Some things may come up just from, you know, implementing software; it's not always so easy to foresee everything that's necessary. But the majority, the chunk of work that needs to happen, is all here and open for community development. So if you'd like to speed up the rate at which NVD data winds up back in Clair, it's a good one to be abreast of and take a look at. Other than that,

B
I think it's just kind of being aware that we have a community development meeting every second Tuesday of the month.
A
So cool. I'm looking to see if anyone else has any questions, whether you're out there in Twitch land or on BlueJeans, or wherever you're YouTubing and watching this, or on Facebook even; post your questions. Otherwise we're all clear on questions, and we'll let you go back to your day, Louis. If you can share your slides with me, I'll share them with the community as well, and we'll upload this to YouTube. Hopefully, now that everybody understands how the indexer works, they'll be excited about contributing to it and come to a community meeting.