GitLab Continuous Vulnerability Scans, 7 Sep 2023

From YouTube: Prototype: Ingesting OS packages and advisories from Trivy db

Description

This is a prototype demo of how we can ingest OS packages and advisories from trivy-db into the external license-db.

It shows how we read trivy-db data and feed it to the advisory processor, which is responsible for storing it in the database.

A

Okay, so let me start with a brief introduction of what we are trying to do with this epic. Let me start by sharing my screen.

A

This epic is about advisory ingestion, that is, ingesting OS packages. To be more specific, we want to ingest OS package information and how it relates to advisories, and the source is trivy-db. You might be familiar with this architectural diagram, where you see the flow for advisories. Until now we just had the GLAD advisories, which are advisories coming from gemnasium-db.

A

Now we want to extend the whole system with advisories coming from the trivy database. There are some fundamental differences between GLAD and trivy-db. The first one is that gemnasium-db, so GLAD, mainly refers to application-level packages, so you have packages for Go, for Conan, this kind of package, while trivy-db also refers to application-level packages, but mainly it refers to packages from distributions.

A

So you can imagine that we have distributions there: different versions of Alpine, Ubuntu, Red Hat, you name it, it's there. There are two kinds of information that trivy-db contains. It contains vulnerabilities, so raw advisories exactly like GLAD has them, and it also contains all the affected packages. For every distribution there are packages, and every package has one or more advisories that affect that package. That's a bit of background so that you know what trivy-db is about.

A

The idea is very similar: we use trivy-db as an advisory source. We again use a cursor to see which version of trivy-db we processed last. Then we get that old version, the one we processed the last time the advisory feeder was executed. We also download the latest trivy-db, then we compare the two and publish only the differences to the advisory processor through Pub/Sub. Yes?
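The compare-and-publish step described above can be sketched roughly as follows. This is a minimal illustration, not the feeder's actual code: the `Diff` class, the `diff_snapshots` function, and the placeholder advisory IDs are all made up for the example, and each snapshot is assumed to be a plain dictionary keyed by advisory ID.

```python
from dataclasses import dataclass, field


@dataclass
class Diff:
    """Changes between two snapshots (hypothetical structure)."""
    added: dict = field(default_factory=dict)
    edited: dict = field(default_factory=dict)
    deleted: list = field(default_factory=list)

    def message_count(self) -> int:
        # Assumes one message per added, edited, or deleted entry.
        return len(self.added) + len(self.edited) + len(self.deleted)


def diff_snapshots(old: dict, new: dict) -> Diff:
    """Compare the previously processed snapshot with the latest one."""
    diff = Diff()
    for key, value in new.items():
        if key not in old:
            diff.added[key] = value
        elif old[key] != value:
            diff.edited[key] = value
    diff.deleted = [key for key in old if key not in new]
    return diff


# Placeholder data, not real advisories.
old = {"CVE-0000-0001": {"severity": "low"}, "CVE-0000-0002": {"severity": "high"}}
new = {"CVE-0000-0001": {"severity": "medium"}, "CVE-0000-0003": {"severity": "high"}}

result = diff_snapshots(old, new)
print(sorted(result.added), sorted(result.edited), result.deleted)
print("messages to publish:", result.message_count())
```

Only the entries in `result` would need to be published, which is why an incremental run sends far fewer messages than a full ingestion.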

B

Just a quick question. Looking at the diagram there, one thing that stands out to me is that it looks like there's a single advisory feeder rather than multiple advisory feeders, like we do with license-db. I just want to clarify: is that the case?

A

There is a single code base. The project is actually named license feeder, not advisory feeder; I'm using "advisory feeder" just to make clearer what I mean. We have one code base that we can run against licenses or against advisories; in this case it runs against advisories. What will happen is that we will have scheduled jobs for trivy-db. You might be familiar with the advisory feeder jobs that we have for GLAD. They run once per day.

A

So once every 24 hours on both environments, dev and prod. We will do exactly the same for trivy-db: we will have a scheduled job that runs the advisory feeder code, and as a source we will use trivy-db.

B

Perfect. Thank you.

A

Once the advisory feeder has the diff, it sends it over Pub/Sub to the advisory processor, and the advisory processor is responsible for storing that data in the external license database. Again we have a scheduled job for the advisory exporter, which reads the data from the external license database and stores it in the advisory bucket, which is a public bucket. So this is more or less the usual flow.

A

What I want to demonstrate today is the part that contains the advisory feeder, the advisory processor, and storing the data in the license database. I have prototyped this part of the system. Let me scroll down a bit; I have another diagram here. Until now we just had one advisory processor, and that was for the GLAD source. Now we have a small change there, because we use different topics depending on the source.

A

For instance, when we have GLAD as a source, we send all the advisories over the GLAD topic, and those messages are processed by the GLAD advisory processor, which stores them in the external license-db. Now, when we use trivy-db as a source, there are, as I mentioned before, two different things: there are the advisories and there are the packages that we want to store.

A

So we are going to use two different topics for this, and every topic sends its messages to a dedicated Cloud Run instance that will process them. For trivy-db advisories, we have a dedicated topic that sends the data to a dedicated Cloud Run instance, which stores it in a table in the database where we store trivy-db advisories. For trivy-db OS packages, again we have a dedicated topic that sends the data to a dedicated Cloud Run instance.

A

That instance stores it in the trivy-db package table in the license database. So that's a bit about the infrastructure. I don't think it makes much sense to show you the schema of the database, but maybe what's nice to do, at least for the demo, is to show you how things are right now in the database. So let me take this...
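The source-to-topic mapping described here can be pictured as a small lookup table. The sketch below only models that lookup; the topic names, the `Kind` enum, and `topic_for` are hypothetical and chosen for illustration, and the actual delivery from Pub/Sub to the Cloud Run instances is not modelled.

```python
from enum import Enum


class Kind(Enum):
    ADVISORY = "advisory"
    PACKAGE = "package"


# Hypothetical topic names; the real ones are not given in the recording.
TOPICS = {
    ("glad", Kind.ADVISORY): "glad-advisories",
    ("trivy-db", Kind.ADVISORY): "trivy-db-advisories",
    ("trivy-db", Kind.PACKAGE): "trivy-db-packages",
}


def topic_for(source: str, kind: Kind) -> str:
    """Return the topic that carries messages of this kind for this source."""
    try:
        return TOPICS[(source, kind)]
    except KeyError:
        raise ValueError(f"no topic configured for {source!r} {kind.value}") from None


print(topic_for("glad", Kind.ADVISORY))
print(topic_for("trivy-db", Kind.ADVISORY))
print(topic_for("trivy-db", Kind.PACKAGE))
```

Keeping one topic per source and message kind means each processor instance only ever sees one message shape and writes to one table.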

B

So, a quick question.

A

Yes, of course.

B

I just wanted to know: have there been any changes to the existing schema as part of these changes?

A

Yes, we are making changes to the existing schema, but we are only extending it. We don't change anything that was there. What we actually did was add two tables, the trivy-db advisory table and the trivy-db package table, and we also added some permissions for the advisory processor user, which is the database user used by all these Cloud Run instances, so that the user can actually write data to those tables.

B

Perfect. Thank you.

A

No problem. Let me go back to the database. Everything you will see here, the whole demo, runs in my GCP sandbox project. These are the two tables, trivy-db advisory and trivy-db package.

A

If I count the rows right now to see how much data there is, it should be zero. Exactly: for the advisories the count is zero, and for the packages... oh, it's one. Okay, interesting! Let me make sure that it's zero for the packages as well.

A

Oops, that was me experimenting right before our call. Okay, so, as you see, we have no data. What I want to demo is running the feeder: the feeder will create messages for the advisories and the packages from trivy-db, and they will eventually end up here in the license database.

A

Since we have no data, in theory we should do a full ingestion, but a full ingestion is a bit more than one million Pub/Sub messages and takes around six minutes, so I'm not going to do that. I made sure that the cursor says we processed a database from two or three days ago, so that we only process the difference and have fewer Pub/Sub messages.

A

So let's go into the feeder code; not this one, but this one. This is the command that I'm going to run. I'm specifying that the source is trivy-db, I'm specifying the two topics, and I'm specifying that this should not be a dry run. So let's see. I'm just going to say debug.

B

There's one thing I noticed there, just if you are sharing this: I noticed there was a credential in the last screen that you had.

A

Yeah, good one. After this video I'm going to deprecate it and create a new one. Thanks for sharing that, good catch. Let me explain a bit what is happening here. The feeder has already finished running. What you see here is that it says the cursor tag is this. This is actually a date: 2023, September, so the 5th of September. The latest tag that exists out there is September 07, so we have two days of difference.

A

It will create a new database: basically it downloads the database for the cursor tag and opens it, and it does the same for the latest tag. Then you see here that it prints some statistics. The first statistics are about the advisories: 75 advisories have been added, and there are 1416 edits on existing advisories.

A

So in total this is the number of Pub/Sub messages that the feeder sent. When it comes to packages, so OS packages, you see that we have this number of additions and 298 edits, so in total we have this number.

B

That's really nice.

A

Yeah, I actually wanted to know how many deletes we have, but it looks like we don't get too many deletes. But it's nice and informative, I think. Now let me go here: what I have here is the Cloud Run instance for packages. Let me refresh this page. The time is 11:18, so it has probably already been two minutes since this happened, because I see here a lot of messages from 11:16 with status 200.

A

That means that all those Pub/Sub messages were correctly acknowledged and inserted in the database. I'm not sure if we can see metrics, because this gets updated a bit slowly. Yeah, we cannot see them yet. So what I would say is, let's go to the database.

A

Let me refresh. Okay, I think I need to connect again. Sorry for this.

A

Again, this is a password that gets deprecated after an hour, so by the time the video is published you won't be able to do much with this password.

B

No worries.

A

That's why my connection also timed out. Let's see if this will work.

B

Just on that: I won't labor the point because I know you're recording, but the credential that I saw, I thought, was a GitLab credential, and I don't know if it is short-lived or not.

A

That's indeed a GitLab token credential, so that's something that I definitely need to deprecate. Okay, I think it refreshed. Let me run this. So we have this number of trivy-db packages. Let's see if this maps to what we see here. Yeah, it actually maps. Now let's do the same for advisories.

A

You see that we have 1491, which is the same as this. So basically, this is a small demo of how it works. As for next steps: right now all this code is a prototype, so we need to do a proper merge request and have it on dev and then on production. But for now this has been validated, and we know that we can do it.

A

I think that six minutes for a full ingestion of 1 million entries is very nice, and we haven't even optimized the advisory processor to do batch inserts, so I think we are quite good already. The next step for this is going to be the advisory exporter: we need to extend the advisory exporter.
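At the quoted figures, one million messages in six minutes works out to roughly 2,800 messages per second. The remark about inserts is read here as batching several rows into one write; a rough sketch of that idea follows. It is not the processor's code: it uses an in-memory SQLite database as a stand-in for the real database, and the `package` table, its columns, the `batched` and `insert_in_batches` helpers, and the batch size are all assumptions made for the example.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows using one transaction per batch instead of one per row."""
    inserted = 0
    for chunk in batched(rows, batch_size):
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO package (name, advisory_id) VALUES (?, ?)", chunk
            )
        inserted += len(chunk)
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (name TEXT, advisory_id TEXT)")

# Placeholder rows, not real package data.
rows = ((f"pkg-{i}", f"ADV-{i}") for i in range(1200))
total = insert_in_batches(conn, rows)
print("inserted:", total)
```

Whether batching helps in the real setup depends on how messages arrive at the processor, which the recording does not cover.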

B

Yeah, that's really cool. When you showed me the count, I was wondering: is there any chance you could show me the contents of one of the rows in the database? Just out of curiosity.

A

Sorry, what do you want to see exactly? I didn't get that.

B

Instead of showing a count, could you show me one of the rows in the database? I'm just curious.

A

Yeah, of course. That's a good one. I actually thought that maybe it's a good idea to do it, but then I said maybe it's not very interesting. Okay, you can see my screen, right?

B

A

Schemas, public, tables. Let's look first at the advisory. What you see here is the CVE of the advisory, the last updated value, so a timestamp, and then the actual content. It's JSON; the column has a JSONB data type, and you see the contents. Then, if I go to the trivy-db package table, we have an ID as a primary key.

A

We have the distribution name, the distribution version, the package name, the advisory ID, and then the node value. Basically, the node value says which versions are...
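The two tables as described on screen can be approximated as follows. This is only a sketch under stated assumptions: the table and column names are guesses based on the spoken description, SQLite stands in for the real database (so the JSONB column becomes JSON text), the choice of the CVE as the advisory key is an assumption, and the inserted rows are placeholders.

```python
import json
import sqlite3

# Hypothetical names; the real schema was shown but not spelled out.
SCHEMA = """
CREATE TABLE trivy_db_advisory (
    cve          TEXT PRIMARY KEY,
    last_updated TEXT NOT NULL,
    content      TEXT NOT NULL  -- JSONB in the real database, JSON text here
);
CREATE TABLE trivy_db_package (
    id                   INTEGER PRIMARY KEY,
    distribution_name    TEXT NOT NULL,
    distribution_version TEXT NOT NULL,
    package_name         TEXT NOT NULL,
    advisory_id          TEXT NOT NULL,
    node_value           TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows, not real advisory or package data.
conn.execute(
    "INSERT INTO trivy_db_advisory VALUES (?, ?, ?)",
    ("CVE-0000-0001", "2023-09-07T00:00:00Z", json.dumps({"title": "placeholder"})),
)
conn.execute(
    "INSERT INTO trivy_db_package "
    "(distribution_name, distribution_version, package_name, advisory_id, node_value) "
    "VALUES (?, ?, ?, ?, ?)",
    ("alpine", "3.18", "example-pkg", "CVE-0000-0001", "placeholder"),
)

# Look up which advisory affects a package in a given distribution.
row = conn.execute(
    "SELECT p.distribution_name, p.package_name, a.cve "
    "FROM trivy_db_package p JOIN trivy_db_advisory a ON a.cve = p.advisory_id"
).fetchone()
print(row)
```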

B

Nice.

A

Okay, then I will stop this recording.

B

Yeah, sounds...