Description
This is an updated workflow of adbcurate, i.e., our tool for semi-automated advisory generation from NVD.
A: Okay, cool, yes, so welcome everybody. Today I just wanted to give you a short demonstration of the advisory generation tool. This is actually a presentation, or demo, I already gave in September last year, I think, but this is the updated version, because a few things changed in between: the handling of the tool changed, and the structure of the YAML files that we have for the advisories changed a little bit. So maybe I will share my screen.
A: Can you see the slides? Perfect. So at the beginning I just wanted to give you a little bit of context. adbcurate is the name of the tool. The idea to automate the advisory generation process is based on the procedure that was applied before, where we actually had to manually check data sources for security-relevant information, then write the advisories ourselves, and then create a merge request, which had to be reviewed and was merged at a certain point.
A: So it was a largely manual process. I also linked the related epic here in the title. The process looked a little bit like this: there was some sort of data source, a data feed, let's say the NVD data feed, that had to be checked, and relevant entries had to be filtered out. With relevant I mean that there is a vulnerability report in a data feed like, for example, NVD.
A: The advisory was merged, and back then we also had a client-server architecture, which meant that for every advisory that was merged to Gemnasium DB, the related advisories also had to be pushed to Gemnasium Server, because back then Gemnasium Server was the server that hosted all the vulnerability information with respect to composition analysis and vulnerable packages. Okay, and these are the data fields that are contained in an advisory. So we have a data field for an identifier; this is a unique external identifier.
A: This would be, for example, a CVE, but it can also be another external identifier, external meaning that it is not controlled by us: it is some identifier that is put on a vulnerability report by an external entity. Then there is a package slug that we use internally to identify an advisory; it is usually composed of a package type and the package name. With the package type we usually refer to the language or the package registry. So for Python we have pypi (or PyPI, I'm not sure what the right pronunciation is), for Go we have just go, then we have Maven, we have Packagist for PHP packages, and so on. Those are the package types, and the name is the name of the package as it is hosted on a certain package registry. Then the title is a short description of the vulnerability.
A: For the affected range we are using the version constraint syntax of the respective package registry. So, for example, when you are adding an advisory that is related to Python, and the package is hosted on some package registry related to Python, we are using the version constraint syntax for this particular package registry, and they all use different version constraint syntaxes: Maven has its own syntax, Python has its own syntax, and this is basically what the affected range field contains. Then affected versions is the same, but it's just human readable.
A: So this is a string in natural text, comprehensible by humans, that contains the same information. Fixed versions is a list of versions that have been fixed and are not impacted. Not impacted is similar: it is basically just a different kind of range, saying that this version range is not impacted by the given vulnerability. So it's also a version range.
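(For illustration of the registry-specific constraint syntaxes mentioned above, here is the same invented interval written in a few common notations; the interval itself is made up, but the syntaxes are the ones these registries commonly use:)

```
Maven:  affected_range: "[1.0,2.0)"       # Maven interval notation
PyPI:   affected_range: ">=1.0,<2.0"      # pip-style specifiers
npm:    affected_range: ">=1.0.0 <2.0.0"  # semver range
```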
A: The solution contains the solution string, saying something like: please upgrade to a certain version; that's usually what we are adding there. Credit is just to give some honorable mentions to people that were involved in fixing a certain vulnerability or that reported it. URLs is the list of URLs that are related to, or provide additional information about, the vulnerability. We also have the CVSS vectors that are used to express, or to compute, the severity of a vulnerability, and an internal identifier.
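(To make the field list concrete, here is a minimal sketch of what such a YAML advisory could look like; the field names follow the ones just described, but the exact key spelling and all values are illustrative, not taken from the real schema:)

```yaml
identifier: "CVE-2021-12345"              # unique external identifier (invented)
package_slug: "pypi/pillow"               # package type + package name
title: "Out-of-bounds Read"               # e.g. pulled in from CWE
description: "An out-of-bounds read in ..."
affected_range: ">=1.0,<2.0"              # registry-specific constraint syntax
affected_versions: "All versions starting from 1.0 before 2.0"  # human readable
fixed_versions:
  - "2.0"
not_impacted: "All versions before 1.0, all versions starting from 2.0"
solution: "Upgrade to version 2.0 or above."
credit: "Jane Doe"
urls:
  - "https://nvd.nist.gov/vuln/detail/CVE-2021-12345"
cvss_v3: "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"
```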
A: Great. This slide just gives you an overview of the different data fields that we have; some of them are optional, some of them are not. What adbcurate does now is: it goes through an NVD data feed, tries to identify the relevant entries, and then, based on that, generates an advisory, providing information for these several data fields that you can see here. There are certain challenges associated with that.
A: So, as I mentioned at the beginning, the feed is very generic, so you have to figure out which entries are relevant with respect to composition analysis, and to which packages they can be related, because the related package names and types are usually not provided in the descriptions you're going to find in the CVEs contained in the NVD feed. You also have incomplete information: as an example, the CVEs that are coming from the NVD feeds don't contain a title, they only contain the description.
A: So, therefore, I'm also relying on CWE as another data source, to basically pull in proper titles that can be added to the advisories. Another challenge is that these data feeds tend to be relatively large: for example, the 2019 feed is about 60 megabytes, with 14,000 CVEs, and because we want to be able to reprocess the feeds once in a while, you have to figure out a way to do that efficiently.
A: Then the main challenge, I would say the biggest one in this list, is to figure out how to relate the vulnerability descriptions, the CVEs, to packages. That was basically the main thing; the other items on this list we could solve more directly, for example the title, which we could pull from CWE. For the size issue we are relying on streams: we consider the whole JSON file as a stream, and while we are going through it, we generate advisories.
A: One particular item I forgot to mention is that NVD entries usually contain both version ranges and explicit versions. So where you have an affected version range from 1.0 to 2.0, it might be that in the NVD feed they exhaustively list every single version that has been impacted. This is not what we are doing: in our schema we assume that you have a version range, and so we have to translate these explicit versions into ranges.
A: We have to translate this, which is not a big challenge; it is doable, but it was a bit nasty to do.
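(Translating explicit version enumerations into ranges roughly amounts to collapsing runs of consecutive releases; here is a minimal Python sketch of the idea, assuming we know the package's ordered release history. This is not the tool's actual code:)

```python
def parse(v):
    """Naive version parser for dotted numeric versions, e.g. "1.2.3"."""
    return tuple(int(x) for x in v.split("."))

def squeeze(affected, releases):
    """Collapse explicitly listed affected versions into (first, last) ranges.
    `releases` is the package's full release history, used to decide whether
    two affected versions are consecutive releases."""
    ordered = sorted(releases, key=parse)
    index = {v: i for i, v in enumerate(ordered)}
    hits = sorted(set(affected), key=index.__getitem__)
    ranges, start, prev = [], hits[0], hits[0]
    for v in hits[1:]:
        if index[v] == index[prev] + 1:      # consecutive release: extend range
            prev = v
        else:                                # gap: close the current range
            ranges.append((start, prev))
            start = prev = v
    ranges.append((start, prev))
    return ranges

# ["1.0", "1.1", "1.2"] out of releases 1.0..2.0 -> [("1.0", "1.2")]
print(squeeze(["1.0", "1.1", "1.2"], ["1.0", "1.1", "1.2", "1.3", "2.0"]))
```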
A: Then we come to the biggest challenge: how could we identify the entries that are relevant to certain packages, and how do we figure out these mappings? For that we are using information that is provided in every single CVE.
A: Every CVE contained in an NVD data feed carries what is called a CPE, a Common Platform Enumeration entry, and it contains information about the vendor, the product and the version. This vendor and product information is very useful to us, because in most of the cases there is a one-to-one mapping between vendor/product and a concrete package. It's not always the case; there are situations where you have 1-to-n mappings, so it could be that one CPE covers 10 or 20 packages.
A: So while we are going through the NVD feeds and looking at the CVEs, we also see that the CVEs are linked to certain CPEs, and we maintain a separate database where we store the relation between those CPEs and certain packages. This is, let's say, a separate database that we are using for package resolution, more or less.
A: The next question is where we got this information from, because when we started we didn't have a CPE map yet, so we didn't know about the mappings between CPEs and certain packages. For that we used some bootstrapping, based on information that we already had in our advisory database: in Gemnasium DB we already had a couple of YAML files.
A: Back then it was about a thousand advisories that we had in the database, a thousand CVEs I should say. For bootstrapping the first CPE map, what we did is: we went through the CVE files that we have in our advisory database, we checked the correlated NVD entry, extracted the CPEs from it, and then stored this correlation in a separate database. Then we had the CPE map that contains a mapping between CPEs and packages.
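(A rough Python sketch of that bootstrapping pass; the file layout and helper names are assumptions for illustration, not the tool's real code. It walks the advisory database, looks up each CVE in a prebuilt NVD index, and records which CPE prefixes co-occur with which package slug:)

```python
from pathlib import Path

import yaml  # PyYAML

def bootstrap_cpe_map(advisory_root, nvd_index):
    """advisory_root: checkout of the YAML advisory database.
    nvd_index: dict mapping CVE id -> list of "vendor:product" CPE prefixes,
    e.g. built beforehand from the NVD JSON feeds."""
    cpe_map = {}
    for path in Path(advisory_root).rglob("CVE-*.yml"):
        advisory = yaml.safe_load(path.read_text())
        slug = advisory["package_slug"]                  # e.g. "pypi/pillow"
        for prefix in nvd_index.get(advisory["identifier"], []):
            cpe_map.setdefault(prefix, set()).add(slug)
    return cpe_map
```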
A: Okay, maybe I will just start with a short demo of how to run the bootstrap process. This is something that you usually don't have to do; it is basically something that I'm doing once a month or so, just to add new entries to the CPE map. I'm also adding this information as a JSON file to the git repository, so that it can be easily imported, but I will just clone fresh.
A: Let's look at the CPE map JSON file; this is the JSON file, I can also show it to you here. So this is basically the prefix that contains the vendor and the product, and here we say that we would like to include this CPE mapping, that it is used as a valid mapping, and this is the list of packages to which this CPE may be related. This one, for example, maps to a Python package. So there are some includes in there.
A
Also
some
excludes
in
so
this
one.
So,
for
example,
if
we
didn't
come
across,
we
didn't
came
across
a
package,
yet
that's
related
to
Python
hyper,
or
they
are
also
other
other
ones
that
are,
for
example,
OpenStack
Octavia.
Where
we
wear
this,
where
we
didn't
find
a
relation,
they
are,
they
excluded.
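(Pieced together from what is visible on screen, an entry of that JSON map might look roughly like this; the key and field names are guesses for illustration:)

```json
{
  "python:hyper":         { "include": false, "packages": [] },
  "openstack:octavia":    { "include": false, "packages": [] },
  "python-pillow:pillow": { "include": true,  "packages": ["pypi/pillow"] }
}
```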
A: We still add them, because once in a while I'm going through them to check whether I can maybe add additional information to the CPE map, so that these packages could be resolved in the future. This is also the reason why reprocessing is so important: whenever we change something in the CPE map, we want to be able to reprocess all the data feeds, in order to pick up CPEs that we maybe didn't know about before.
A: Okay, so the next thing I would like to illustrate is the advisory generation process. This is basically the interesting part. Here we assume that there is a CPE map present that we can use, and that there's an NVD data feed that we would like to process. The advisory generation consists basically of two steps.
A: The first one is called filter and split. What happens here is that we go through the NVD data feed and check every single CVE, and for every CVE that has a hit in the CPE map, so that we know it has a package relation, we extract the CVE: we generate a new JSON file that only contains the information related to the CVE that we just hit and that we know is related to a certain package.
A: This happens basically here. I could have also generated a single JSON file, but it's nicer if you split them into single JSON files for debugging: you can navigate through the single files and see what was the CPE that caused this particular CVE to be identified as one that relates to a package we know about. This is the first step, the filter and split; it just identifies relevant entries and extracts them.
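(A simplified Python sketch of the filter-and-split idea; the real implementation streams the feed rather than loading it at once, and the CPE-map lookup shown here is a toy version. The NVD field names follow the public JSON 1.1 feed layout:)

```python
import json
from pathlib import Path

def cpe_hits(item, cpe_map):
    """Resolve the CVE's CPE URIs against the CPE map (NVD JSON 1.1 layout)."""
    slugs = set()
    for node in item.get("configurations", {}).get("nodes", []):
        for match in node.get("cpe_match", []):
            # cpe:2.3:a:<vendor>:<product>:<version>:... -> "vendor:product"
            parts = match.get("cpe23Uri", "").split(":")
            entry = cpe_map.get(f"{parts[3]}:{parts[4]}") if len(parts) > 4 else None
            if entry and entry.get("include"):
                slugs.update(entry.get("packages", []))
    return slugs

def filter_and_split(feed_path, cpe_map, out_dir):
    """Write one JSON file per CVE that resolves to a known package."""
    feed = json.loads(Path(feed_path).read_text())
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for item in feed["CVE_Items"]:
        cve_id = item["cve"]["CVE_data_meta"]["ID"]
        slugs = cpe_hits(item, cpe_map)
        if slugs:
            item["packages"] = sorted(slugs)     # annotate with resolved packages
            Path(out_dir, f"{cve_id}.json").write_text(json.dumps(item, indent=2))
```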
A: So here we extract basically all relevant CVEs, the CVEs that are related to packages, and then the next step is the advisory generation, where we take these files and translate, or transform, them into an advisory according to the YAML schema that we are using, with the data fields that I mentioned at the beginning. We are also taking some information from CWE here; that is basically the title that we are taking from an external source.
A: So filter and split is basically cherry-picking, and advisory generation is generating advisories according to our advisory format. Now I want to go through the single steps and show you how you can run them, and how it looks when you do this with adbcurate. This is the overview of the whole approach; the parts that I'm going to show you are mostly focused on filter and split and advisory generation, because I think these are the most important ones.
A: This first part here is about the bootstrapping that I showed you at the beginning, how you can bootstrap the CPE map. This is something that you don't do on a regular basis, usually, because in most of the cases we are mostly interested in the CPEs that are related to the NVD feed we're looking at at the moment, while the bootstrapping is more like taking in all the CPE mappings that you can get.
A: That would only be possible if you could get to the related CPE information. You could basically collect all the packages from npm, for example, but you still wouldn't know which CPEs they actually correlate with, because there's no unique, fixed schema that is used for defining the CPEs.
A: So the first thing I'm doing is calling the make target that collects the NVD feed, and what it does is: it downloads the most recent data feed from NVD and stores it in the feed directory, and we can have a look at it. So this is basically the most recent version of the NVD data feed, in JSON format.
A
After
after
I've
downloaded
this
file,
the
the
next
step
is
this
sufficient
ajith
step
which
I'm
invoking
so
I'm,
calling
to
make
filter
and
split
NBD?
And
what
does
this
target?
Does
it's
going
through
the
the
JSON
file
and
looks
for
CVS
that
are
relevant
to
certain
packages,
and
this
information
is
then
stored
in
nvg
feeds
it?
So
you
can
see
that
we
have
a
couple
of
JSON
files
now
here.
That
is,
contain
the
same
information
as
SG
as
data
from
the
MVD
feet.
A
The
only
difference
is
that
we
basically
also
are
already
added
information
regarding
the
the
package
to
which
the
CV
relates
to,
because
we
already
have
this
information.
We
added
the
study
to
the
JSON
file
so,
for
example,
the
CVE
that
you
can
see
here.
This
is
2021
9
to
5.
This
one
corresponds
to
the
package
or
data
server
core,
which
is
a
just
making
package.
Basically,
so
we
had
already
added
this
information
here
in
this
in
this
flux
area,
and
you
can
see.
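(Such a split-out, annotated file might look roughly like this; the NVD fields follow the JSON 1.1 feed layout, while the added "packages" key and all concrete values are illustrative assumptions:)

```json
{
  "cve": {
    "CVE_data_meta": { "ID": "CVE-2021-12345" },
    "description": { "description_data": [{ "lang": "en", "value": "..." }] }
  },
  "configurations": {
    "nodes": [{
      "cpe_match": [{
        "cpe23Uri": "cpe:2.3:a:vendor:product:*:*:*:*:*:*:*:*",
        "versionStartIncluding": "1.0",
        "versionEndExcluding": "2.0"
      }]
    }]
  },
  "packages": ["maven/com.example/some-artifact"]
}
```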
A: This is also information that's already contained in the NVD feed. In this case they were using the range syntax, which is much nicer than the other syntax they also have: here they give the start and the end of a version range as the affected range, which is quite helpful in this case. So these are now the CVEs that we cherry-picked from the data feed.
C: And does that also work when the affected versions break down into multiple ranges?
A: Yes, it also works if you have multiple ranges. It's implemented in such a way that, if multiple ranges are reported, we essentially put them into a set of ranges, so we have a list of ranges that we are using. This also works if 20 versions are mentioned explicitly: we squeeze them into a version range. And it can also be used if you have both cases, a situation where you have multiple ranges and some of them are reported using the range syntax.
A: Yeah, I just wanted to add that sometimes there can be cases where we will have some false positives, where this squeezing doesn't properly work, or where some versions are missing. This does happen, but in most of the cases it works; that's the reason why we still have the manual reviewing step in between, to spot these problems. In most of the cases this version detection mechanism works pretty well.
A
Basic
for
the
jason
for
the
jason
formats
give
me
the
jason
or
the
general
formats,
because
they
because
the
json
format
here
that
for
these
JSON
files
that
are
generated
after
this
filter
and
split
step,
is
essentially
exactly
the
same
as
they
appear
in
the
NVDA
data
filters.
Or
is
it
just
extracting
them
and
using
the
same
the
same
data
fields?
This?
A
After
doing
the
the
filter
and
split
would
be
the
advisor
generation
so
for
that
we
also
have
an
extra
extra
target
called
make
generate,
generate
advisories
nvg
and
when
I'm
invoking
this
this
call
target,
it
will
basically
go
over
these
JSON
files
and
will
translate
them
into
our
own
own
format
and
it
will
put
them
into
the
advisories
out
directory.
So
you
can
see
that
we
have
now
the
advisories
are
kind
of
organized
and
in
the
way
that's
specified
in
our
schema.
A
So
we
have
like
two
package
types,
maven
and
pipe
I
on
the
top
level
and
then
the
the
package
names
for
maven.
It's
a
bit
more
complicated
because
you
have
cooked
by
the
Infanta
fact
ID,
but
for
pie.
We
just
have
pillow
as
a
package
name
and
then
CVE,
which
is
the
identify
that
this
is
basically
tea.
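(So the output tree looks roughly like this; the concrete names are invented for illustration:)

```
advisories/
├── maven/
│   └── com.example/
│       └── some-artifact/
│           └── CVE-2021-12345.yml
└── pypi/
    └── pillow/
        └── CVE-2021-23456.yml
```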
A: The organization is based on our schema, and when I look into one of these advisories, you can see that we have an identifier, which is the CVE; the package slug, which is basically the same as the directory structure leading to that advisory; the title, which is based on the CWE; and the description, which is for now the same description as the one that you also see in the NVD entry. Then the date, and the affected range; this is now the affected range expressed in the Maven version constraint syntax.
A
Like
these
few
see
affected
versions,
not
impacted
and
solutions,
they
are
essentially
generated
from
affected
range
because
it
is
kind
of,
and
you
can
think
of
them
as
redundant
fields,
essentially,
because
you
can
generate
them
from
information
that
you
already
have
in
your
own
advisory.
So
affected
versions
is
basically
string.
That's
generated
from
affected
range
because
we
know
lower
and
upper
bounds.
We
can
basically
just
generators
and
not
impacted
is
essentially
the
inverse
of
affected
range.
So
we
can
also
compute
this
automatically
and
was
like
what,
if
the.
C: And what if there are multiple affected ranges?
A: If you have multiple ranges, what we're doing is: we essentially invert them, and then we check for the first version that falls into the inverted set. So as we invert them, we generate a new list of ranges based on that, and whenever there is a version that falls into this new list, we consider it as fixed. That's basically the approach we're using there.
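(A minimal Python sketch of that inversion, assuming affected ranges are kept as sorted, non-overlapping half-open intervals, with None for an unbounded endpoint; again, not the tool's actual code:)

```python
def parse(v):
    return tuple(int(x) for x in v.split("."))

def invert(ranges):
    """Complement of sorted, non-overlapping half-open intervals [(lo, hi), ...]."""
    out, cursor = [], None
    for lo, hi in ranges:
        if cursor != lo:
            out.append((cursor, lo))   # gap before this affected range
        cursor = hi
    out.append((cursor, None))         # everything above the last range
    return out

def in_gap(gap, v):
    lo, hi = gap
    return (lo is None or parse(v) >= parse(lo)) and (hi is None or parse(v) < parse(hi))

affected = [("1.0", "1.5"), ("2.0", "2.3")]
gaps = invert(affected)                # [(None, "1.0"), ("1.5", "2.0"), ("2.3", None)]
releases = ["1.0", "1.5", "2.0", "2.3", "2.4"]
# Versions falling into a gap *above* an affected range count as fixed:
print([v for v in releases if any(in_gap(g, v) for g in gaps if g[0] is not None)])
# -> ['1.5', '2.3', '2.4']
```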
A: The review is just to remove information that's not really needed at this point, and to make it consistent with what we already have in our database. But just for the sake of time, I will go on to the next step, which is to generate merge requests. We have the advisories now, but we still need to generate merge requests according to our contribution guidelines.
A
They
say
that
you
have
to
that
that
your
title
has
to
adhere
to
certain
formats
that,
yes,
that
you
have
to
add
certain
labels
to
your
via
merge
request.
For
example,
package
type
has
to
be
added
as
a
label,
and
we
also
have
one
merge
request
per
advisory
policy
so
saying
that
every
single
advisor
we'll
be
adding
is
basically
added
through
a
single
budget
request
and
therefore
I'm,
there's
an
another
make
target
that
we
can
use
just
and
just
just
make
targets
called
prepare
Amar's
and
what
it
does
it's
it's.
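(Conceptually, that target does something like the following python-gitlab sketch; the project path, branch naming, title format and labels here are all assumptions for illustration, not the real implementation:)

```python
import gitlab

gl = gitlab.Gitlab("https://gitlab.example.com", private_token="...")
project = gl.projects.get("some-group/advisory-db")       # hypothetical path

def open_mr(advisory_path, package_slug, cve_id):
    """One branch and one merge request per advisory, per the guidelines."""
    branch = f"adbcurate/{cve_id}"                          # assumed naming
    project.branches.create({"branch": branch, "ref": "master"})
    project.files.create({
        "file_path": advisory_path,
        "branch": branch,
        "content": open(advisory_path).read(),
        "commit_message": f"Add {cve_id} for {package_slug}",
    })
    project.mergerequests.create({
        "source_branch": branch,
        "target_branch": "master",
        "title": f"Add {cve_id} in {package_slug}",         # assumed title format
        "labels": [package_slug.split("/")[0]],             # package type as a label
    })
```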
A
There's
also
another
target
when,
whenever
you
you're
pushed
because
you
know
when
you,
when
you're
processing
energy
data
feed,
it
can
be
that
you
have
like
20,
30,
40
branches
and
a
certain
point.
You
want
to
get
rid
of
them.
So
there's
a
make
protuberances,
which
just
deletes
all
the
branches
that
are
related
to
a
DB
curate,
so
that
you
can
make
start
from
scratch.
A: So a merge request has now been created for pillow, and this was the one I edited. What I'm doing is: this is basically the old checklist, which I'm still adding to the MR description, but it's not really required anymore, because all of these fields are generated automatically, so we don't have to check them anymore. I'm also adding a couple of useful links at the top that make reviewing a bit easier.
A
If
you,
when
you,
whenever
you're,
not
percent
sure
about
an
advisory
about
content
of
when
advisories
of
there's
some
contradiction,
then
you
can
always
go
to
the
main
repository
page
or
to
the
original
CDE,
a
page
to
to
check
for
for
consistency
or
for
other
other
issues
and
yeah.
Usually
what
I'm
doing
here
then
is
I'm
doing
the
last
pass.
I'm
checking
whether
there
are
some
some
issues,
I'm,
also
looking
at
the
at
the
CI
jobs
that
are
running
on
the
advisories,
so,
for
example,
for
this
particular
one
everything
is
okay.
A: Okay, this is basically the workflow: starting from the NVD data feed, through the filter-and-split step and the advisory generation, then the MR generation, and then hopefully it ends up in the Gemnasium database. Once in a while I'm also applying the bootstrapping step again, just to pick up new CPE mappings.
A
Yeah,
this
was
basically
just
a
short
walk
through.
If
you
have
like
questions
or
suggestions,
maybe
yeah.
A: For that, I'm editing them locally, so usually I have a batch of a couple of hundred advisories, maybe 100 or 200, and once I've reviewed this batch, I generate MRs for them; this just makes the reviewing task easier for whoever does the final check on Gemnasium DB. But putting this into a CI job would definitely be nice to have, especially when it comes to checks for updates on the NVD feeds.
A
So
you
could,
you
could
have
some
sort
of
scheduled,
CI
job
running
checking,
whether
something
changes
on
the
MDG
data
feeds
that
are
published
on
the
m
BD
page
and
then
fetch
them
and
extract
the
simulated,
advisories
and
then
creates
MMR's,
but
before
we
could
do
that,
we
still
have
to.
We
would
have
to
figure
out
a
way
how
we
could
efficiently
mitigate
false
positives.
A
Have
some
or
having
some
sort
of
black
listing
would
be
would
be
great
for
start,
I
have
some
very
primitive
approach,
but
it's
not
good
enough,
because
you
would
have
to
have
some
sort
of
loop
that
whenever
you,
you
notice
false
positive.
Maybe
while,
while
reviewing
on
good
luck,
for
example,
you
would
have
to
mark
this
somehow
as
the
false
positive
so
that
we
could
kind
of
get
this
information
for
the
next
iteration.
A
You
have
to
do
it
for
maiden
too,
because
they
they
also
like
to
submit
they
compose
their
packages
into
sub
packages.
So
we
have
like
one
parent
maven
project
and
then
you
have
a
lot
of
different
smaller
packages,
but
it's
still
one
single
CPE
and
that
then
this
is
problematic
for
for
some
package
types
more
than
for
others
for
NPM,
for
example,
sorted
the
bigger
problem
for
for
gems.
It's
usually,
you
also
have
one-to-one
mapping
and
Python
too,
but
especially
maven
go.
They
are,
and
sometimes
yeah
for
for
PHP.
A
C: Thinking more about what I mentioned before about doing spell checking, to help it be as automated as possible: I think there are some grammar-checking command-line tools we could use as well. Or you check the spelling and filter out anything that's related to a product name, basically anything in the CPE, because the checker is going to say all of those are misspellings, and then do a grammar check.
C
Are
the
descriptions
do
we
use?
The
title
comes
from
the
cwe
right
in
the
description
we
just
take.
Whatever
is
in
the
NVD
database,
read
yeah.
A
Yeah
we
take
it,
but
I
check
every
single
one
of
them,
usually
for
because
of
just
marked
own
issues
and
version
issues,
because
I
have
to
delete
the
versions
from
them.
This
is
also
good
for
cross
validation,
because
I
can
check
whether
the
versions
mentioned
in
description
actually
match
the
versions
that
are
explicitly
listed
in
the
enve
in
the
CVE.
A: Actually, most of the work, and the reason why further automation is useful, is, I think, the backlog processing for all the NVD feeds: 2018, 2017, 2016 and so on. When it comes to new advisories, I would say that per week we usually have maybe 10 or 15, not much more than that.
A
Yeah
so
I
would
say,
and
why
not,
the
percent
of
the
cases
versions
have
to
be
deleted
from
the
description.
Ninety
nine
percent
in
regarding
the
backticks
this
is
if
I
should
I
would
say
that
maybe
in
50
percent
of
the
cases
they
are
class
names
or
other
keywords
that
we
could
put
these
around.
But
the
nice
thing
is
to
you
could
actually
detect
this
I
think
automatically
for
some
languages
for
Java.
In
many
cases
you
can,
if
there's
like
a
camel
case,
word
sorry
problem
I,
think
I
got
trapped
out
so.
A
That
yeah
so
for
I
think
for
functions
and
class
names.
You
could
actually
use
some
sort
of
regular
expressions
to
detect,
detect
them.
That
would
be
doable,
I.
Think
what's
a
bit
nasty
sometimes,
is
that
one
ability
reports
they
sometimes
get
generated
automatically,
so
I
think
that
they
are
generated
from
tools
and
then
then
they
are
basically
incomprehensible.
So
they
use
some
internal
structure
where
they
were.
It
say
that
affected
component
Colin
and
vulnerability
type
Colin,
and
then
they
they
put
something
he
ever,
but
that
I
mean
this.
A: These cases, I think, have to be edited manually, but it's maybe just ten percent, it's not that much. The biggest things are versions, because they are always mentioned specifically in the description as well, and class names and functions, and that can be done automatically.
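(A quick Python sketch of that idea, wrapping likely code identifiers in backticks with a regular expression; the patterns are a starting point, not an exhaustive detector:)

```python
import re

IDENTIFIER = re.compile(
    r"\b\w+\(\)"                   # calls, e.g. read()
    r"|\b[a-z]+[A-Z]\w*\b"         # camelCase words, e.g. parseHeader
    r"|\b[A-Z]\w*(?:\.\w+)+\b"     # dotted names, e.g. HttpRequest.read
)

def mark_identifiers(description: str) -> str:
    return IDENTIFIER.sub(lambda m: f"`{m.group(0)}`", description)

print(mark_identifiers("A flaw in parseHeader() of HttpRequest.read allows a crash."))
# -> A flaw in `parseHeader()` of `HttpRequest.read` allows a crash.
```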
C: Okay, all right. If that was a common problem, I was going to say we could use... so on Reddit there's the autotldr bot that summarizes articles for you, and it does a great job. If that was a problem we had, I would consider maybe using something like that to automatically generate a summary, but it doesn't sound like that's too big of a problem.
A: I just wanted to walk you a bit through the workflow; I didn't go into implementation details or anything, because I just wanted to show how I'm using it at the moment. It's also nice to get some feedback about how we could improve this process itself, the workflow that we have at the moment, to increase the throughput and the number of advisories.