A: Okay, great. So we'll start with just some quick points of order. Zach, are you here?
A: Zach's not here, but I'll take this from Zach. So, the EMEA-friendly meetings have been rescheduled: they are at a new time that is more EMEA-friendly, and they'll also take place on Thursday. So, if you're in East Coast or other U.S. time zones, this is early Thursday morning, and Zach will be chairing those meetings, I think indefinitely going forward, because they're outside my working hours; but he might appreciate some fellow chairs and folks to help him with the chairing.
A: Those have been updated on the OpenSSF public calendar, and also the shared calendar invites that have RSVPs, and we'll also update links and stuff elsewhere when those happen. So that'll be the next meeting, in two weeks, on November 16th; which, I guess, will actually be the 17th, but yeah, it'll be a Thursday, and it'll be updated. Any questions about that?
A: The meeting will be on December 15th. Okay, cool. Real quick, I want to welcome new friends. So, if you haven't been to this working group meeting before, can you unmute real quick, just say hi, and introduce yourself and your affiliation?
C: Hello, I'm Zach Steindler (not to be confused with Zach from Chainguard), and I'm working at GitHub on npm, linking packages to their source code and build instructions using Sigstore.
H: Hi, I'm Ian Malloy, from IBM Research.
A: I guess maybe you want to give us a little preview of what you're going to talk about, or a little more detail about your organization at IBM?
E: Sure, maybe I can start with that, and then Ian and Jiong can bring up the presentation and take you through it. You know, we've been represented at the OpenSSF by two of our colleagues who are on this call: there's Jeff Borek, and there's Matt Rutkowski. And, of course, we know Jamie Thomas, one of our vice presidents, is one of the leaders at the OpenSSF, working with Brian, who's also on the call.
E: We've worked on Sigstore, for example; that's one of the projects in the OpenSSF, which is more based around the reputation of developers, credentials, and so on. We've been working on a different angle on this, which we actually call a code genome. It's really based on the idea that, even if you raise the bar on security, it's possible to spoof things like developer credentials, and other threats and other attacks become possible.
E: Is there a way that you can go down to the code level, and what is it that you're able to do at the code level? For the better part of this year, we've been working on some of these ideas and sketching them out, and Jiong will take us through this. Maybe you can start presenting, guys. And our request to you, since you guys are spending a fair bit of time across the industry in this space:
E: I would love to get some feedback from you on some of these ideas, on how we are pursuing this, and on how we might be able to make it much more usable, consumable, and practical going forward. So I'll pause here. Let me also ask Jeff: do you want to weigh anything in, please, before we turn this over to Ian and Jiong to present? Jeff Borek.
K: Thanks, JR. Yeah, we're excited to preview this technology with the Securing Software Repositories working group, and we're interested in finding ways to effectively share some of this, to help remediate some of the challenges that span the open source ecosystem. So we're here to preview it and, again, be flexible on the approach, be radically transparent, and see where we go from here.
K: Over to you.

H: Right, thanks. Can everyone see my screen? I had to do a quick shuffle there, actually, while JR was speaking. Okay, great. So, again: I'm the department head of the security research group in Yorktown, and I want to bring forward some of the work we've been doing.
H: These ideas have been kicked around for a number of years. I think some of the recent events that happened, probably in the last, let's say, six to twelve months, really spurred this on to develop it a whole lot more, and you'll understand that as we go through. It's an interesting time. Let's just get started; we probably don't have to go through any of this.
H: This is one of our slides for when we bring people up to speed who aren't familiar with it, but supply chain attacks are pretty prevalent these days. We actually see quite a few of them, at all different levels of the supply chain: from attacks against developers, developer credentials, and repositories, to attacks going after IT systems.
H: You know, when you go from the developers who are going to add that code, they're going to build it, compile it, and then eventually distribute it, and there's a whole bunch of tooling that has built up to help secure that: whether that's looking at vulnerabilities, or all the great things that integrate directly with tools like GitHub and let you find them as early as possible.
H: You know, integrity of the CI/CD pipelines; reproducible builds; all the great work that's been going on with in-toto and SLSA; and then eventually the SBOMs, the bills of materials, that you can actually provide to the end users to give some form of assurance of what's going on. What we're actually going to present and propose here is something that's complementary to these technologies.
H: So it's not one of those things where you wouldn't need these anymore, but it might address some additional security gaps that we actually see. JR already mentioned this a little, but what we actually want to propose is something like a software fingerprint, to help raise assurance of the software, the code, the binaries that we actually see. When we think about this, we start with a hash: it's something we can use to verify that files match precisely.
H: If there's a single bit of difference, there are large differences in the hash. Signatures then allow us to assign some form of trust to that: assuming you trust whoever signed it, they're effectively attesting that, yes, this is what they have, it's a blessed copy, it hasn't been modified in any way, and you know who it claims to have come from. But this actually requires you to have full trust in the signer, and, as we saw on the previous chart, sometimes developers do actually get compromised.
H: They might sign a commit, but sometimes it might not be something that any given package maintainer can actually have; it's very, very costly, and some vulnerabilities or breaches can still make it through. Fuzzy hashes, something like ssdeep, are another, similar technique, which can provide a partial match for a file against known files. What this tends to do is find small differences between files, but it really lacks a semantic understanding of the files, their purpose, and what they're actually trying to do. Just to show a quick example:
H: I was playing around with running ssdeep on different builds, different distributions, of ssdeep itself: the RHEL version of 2.14, two different Ubuntu versions of 2.14, and one I compiled with Brew on my laptop. They all come up with wildly different fuzzy hashes; none of them actually match. So it's really difficult to verify that they're all different builds of effectively the same binary, the same code.
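The exact-match limitation described above is easy to demonstrate. The sketch below uses only a standard cryptographic hash (ssdeep itself is a third-party tool, so it is not shown): flipping a single bit of a file changes essentially the whole digest, which is exactly why plain hashes cannot say "these two builds are almost the same".

```python
import hashlib

# A cryptographic hash verifies only *exact* matches: flipping a single
# bit yields a completely different digest (the avalanche effect), so it
# cannot distinguish "slightly modified" from "totally unrelated".
original = b"#!/bin/sh\necho hello world\n"
tampered = bytearray(original)
tampered[0] ^= 0x01  # flip one bit

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(bytes(tampered)).hexdigest()

# Roughly half the bits (so most hex digits) differ after a 1-bit change.
differing = sum(a != b for a, b in zip(h1, h2))
print(h1)
print(h2)
print(f"{differing}/64 hex digits differ")
```

Fuzzy hashes such as ssdeep trade this avalanche behavior for similarity scoring, but, as the speaker notes, they still have no notion of what the code actually computes.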
H: We can also think back to the old Ken Thompson paper, "Reflections on Trusting Trust", where you might not know where you've been compromised: was it the developer that was compromised? Is there something the compiler is actually doing?
H: We've actually done some tests recently where we thought we had disabled certain features, but they still make it through to the final binary. So what we really want is a way to verify, from the source code to the binary, the full integrity of everything; to do that across deployments; and to address some of the recent things we've seen, where you might have a version number that doesn't actually match, because someone added an additional patch, or backported some security patch or extra functionality, because they had to do that for your specific system.
H: So we go back to the supply chain view. While there are all these tools doing a great job at locking it down, there are still small places where there are potential gaps, potential weaknesses, that attackers can actually leverage: again, anything from the point of entry, compromising developers; and then, at the end, you can think of it this way:
H: Well, we have a huge legacy problem. Until we're all at a good SLSA level for build systems, there can be tons of code out there, tons of legacy deployments. Not everyone has great CMDBs or knows exactly what's being deployed, so the ability to go back and verify and fingerprint all of it is eminently useful.
I: All right, thank you. Again, we call our project the code genome, because, fundamentally, what we're trying to achieve is to find a meaningful fingerprint that represents the functionality of the code; this goes beyond syntactic matching. We want to understand, or we're trying to find, the inner functionality of the code.
I: These are actually the same computation: as you can see, we injected some assembly code, or sometimes changed the control flow, doing some kind of obfuscation or modification of the entry point. And, of course, when you compile them, they result in different binaries; even the sizes are different, as you can see from the objdump screen behind.
I: What we're doing here is generating the code genome from each of the binaries, which is at the bottom center: you see the black screen that looks like some kind of IR format, and that generates a graph, a graphical representation. This is actually the code genome we generate, via IDA Pro, from the binary; and then everything comes down to the same representation, so that we can search across different architectures, compilers, and optimization levels. Of course, this is a quite challenging problem, as you may already know.
I: So we cannot claim 100% coverage, but we are getting there: we keep trying to improve our technology to handle many different corner cases, and we'll keep improving our code genome functionality and quality. So this is the high-level overview; maybe we'll go on to the next chart.
I: This is how we generate the code genome, in terms of the pipeline, the steps. On the bottom left you see there is source code, which is optional, because we mainly start from the binary. Of course, if we have the source code, we can compile it to get the binary. Eventually, what we care about is the IR, the intermediate representation, because, as you may already know, that is architecture- and platform-independent intermediate code.
I: So this is how we generate the code genome, starting from the actual machine code or the source code; eventually we get to a representation, the code genome, with which we can compare code against code. The main benefit is that we don't necessarily rely on the source code. Of course, if we have the source code, that's great, and then we can do some kind of ground-truth matching.
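The speakers don't show their actual implementation; purely as a much-simplified, hypothetical sketch of the idea, one could canonicalize an IR-like instruction stream by dropping registers, immediates, and addresses and hashing only the opcode sequence, so that two builds differing only in register allocation or layout map to the same fingerprint:

```python
import hashlib

def gene(ir_instructions):
    """Toy 'gene' for one function: keep only the opcode sequence,
    dropping registers, immediates, and addresses, so builds that differ
    only in register allocation or layout normalize to one fingerprint."""
    opcodes = [insn.split()[0] for insn in ir_instructions]
    canonical = "\n".join(opcodes)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Same logic, two "compilations" with different registers and addresses.
build_a = ["load r1, [0x1000]", "add r1, r2", "store r1, [0x2000]", "ret"]
build_b = ["load r7, [0x5550]", "add r7, r3", "store r7, [0x6660]", "ret"]
# A changed computation (add -> xor) should produce a different gene.
tampered = ["load r1, [0x1000]", "xor r1, r9", "store r1, [0x2000]", "ret"]

assert gene(build_a) == gene(build_b)   # same functionality, same gene
assert gene(build_a) != gene(tampered)  # changed computation, new gene
```

A real system would of course operate on graphs (control flow, data flow) rather than a flat opcode list, which is what makes it robust to reordering and obfuscation.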
I: That's why we focus on binary analysis. Also, as Ian mentioned, there might be a compromise of the compiler; XcodeGhost is one example. That's why we don't necessarily want to trust the whole build process: we ultimately verify the binary code, which is the code that actually runs on the system in the end, because source code and binary code aren't necessarily guaranteed to correspond to the same thing. So that's why we want to inspect and verify the binary code.
I: Okay. Using this technology, the team has been exploring many different use cases, and we picked maybe the two most relevant. The first use case was, as you know, of course, last year, at the end of the calendar year, at the perfect time, over the holidays: the big news about Log4j. And, as I mentioned, the software genome is about finding the core functionality, the core representation, of the code.
I: So we took that idea: from the vulnerable versions of Log4j, we find where the vulnerability is, and use that to search through our systems, infrastructure, and organization, to find where the vulnerable project is still deployed and used. This also raises quite interesting challenges: software is not necessarily just a single binary. There are always packages, and packages of packages, and, of course, packages of packages of packages.
I: Like zip files inside zip files, you can imagine: there are lots of dependencies, and also a lot of layers of software. We have to peel the onion to get to the actual code, and the dependencies get quite complicated. So there is a lot beyond the surface; we're trying to come up with a robust technology, a framework, for analyzing down to the nugget of actual code.
I: So that's how we scanned our organization, and we found a lot of matches in the deployed systems. Here we presented Log4j specifically; of course, the same technology can be used for other types of vulnerabilities. The team is currently pulling in the latest versions: whenever a new vulnerability comes out, we pull it, generate a quote-unquote signature, the genome representation, and then search through to find the vulnerability.
I: This one might be more interesting for today's audience. Another use case we're focusing the genome technology on is SBOM verification. As Ian mentioned, the SBOM is a great technology, a standard for describing what the actual ingredients in the software are whenever it's deployed or delivered. The issue here is that the user, or end user...
I: Did I miss something? Well, sometimes, with some of the vendors, and we know there are many instances of this, the vendor embeds open source, like GPL-licensed code, but does not include or claim it as part of the SBOM, because they don't want to get into the legal side. So there's always a chance the SBOM might be incomplete or incorrect, and that's why we want to verify whether the SBOM really matches what's inside the source code, or inside the software.
I: That's why we use our technology to understand what's inside the software: meaning, we do software composition analysis using the genome technology. Given the software, we verify, or generate, what's inside, and then regenerate the SBOM, a verified SBOM, to be able to say: okay, this is the correct SBOM, this is what they claim, this matches; or, maybe, this does not match. That's the capability we are currently actively building. So, on the next chart:
I: These are great sources of information for generating an SBOM, but the issue is that it might not be complete. For example, in the top right corner there is an article about official Docker images, where some of the images claim an incorrect version of the software they ship.
I: As you see in this simple Dockerfile: starting from the Ubuntu image, you update, and then you install wget; so, of course, from the SBOM generation you'd expect wget to show up as part of your SBOM, which is true. But on the right-hand side you see line number six, which effectively rips out the package manager: the metadata information is removed. When that happens, the generated SBOM doesn't contain wget at all.
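Whatever the scanning machinery, the verification step being described reduces to comparing the component set the SBOM declares against the component set actually observed inside the artifact. A minimal sketch (component names are invented for illustration):

```python
# Toy SBOM verification: compare what a vendor's SBOM declares against
# what a (hypothetical) scan of the artifact actually observes.
declared = {"ubuntu-base", "openssl"}           # components the SBOM claims
observed = {"ubuntu-base", "openssl", "wget"}   # components the scan finds

undeclared = observed - declared   # shipped, but missing from the SBOM
stale = declared - observed        # claimed, but not actually present

print("undeclared:", sorted(undeclared))  # ['wget']
print("stale:", sorted(stale))            # []
```

In the Dockerfile example above, deleting the package-manager metadata makes a metadata-based SBOM generator miss wget, while a scan of the binaries themselves would still place it in the observed set.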
H: So, Jiong, in the interest of time, let's start jumping to the asks; I'll show just a couple of other things. We had a demo planned where we were going to show that we could download and recompile something like wget as an unknown package, and have our tool automatically identify and recognize it, but we'll skip that.
H: What we currently have is: several different techniques that compute these different genes; support for multiple formats, for binaries, packages, and even different types of bytecode and interpreted code; and a very large cloud-native application for processing all this. We're trying to process as much software as we can, to make sure that the genes are robust and that we can correctly identify all the different software packages, and we're currently trying to perform a large-scale evaluation.
H: What are we actually planning on releasing? Well, we're hoping that some of the initial versions of the gene-creation techniques we have are things we can actually open-source, plus the service whose screenshots I just showed, which would demonstrate the technology.
H: Sorry about that. And utilities for being able to handle and query the large database that we're building up.
K: I think we can come back for a demo, or set up a special one, but I'll defer to the chair of the working group, given the time allowed.
A: Yeah, I'd like to take questions, and then maybe run through the rest of the agenda quickly and see if we have more time at the end for a demo. Okay, go ahead.
C: Yes, first of all: very, very cool technology. I'm wondering how you're proposing to generate all of the hashes associated with the vast number of software libraries that are in use in the world.
H: We're probably going to start a little small, looking at all the major packages that we see in major distributions; that's probably where we're going to start building this up. And if people have packages they think we definitely need to ingest, we're willing to take that as thoughts and feedback.
H: I think we have different mechanisms where we'll be syncing different repositories, and different projects that we'll start pulling in and using as a base, and then building on that as we go, making sure the genes we compute are meaningful first, before we go and scale up.
J: Thank you all for the presentation; I think this is a really exciting approach. I'm curious whether you've considered how the genomic fingerprint of software relates to dependency resolution. Is it, for example, somehow inferable from a gene sequence, if you will, what the contents or the dependencies were? Or is there some way to connect them, or map them? You've probably seen my work on GitBOM, which is really focused on dependency resolution; this looks not quite the same, but I'm curious whether there is overlap.
I: I think that's a really great question. In the backend, exactly to provide the capability to infer dependencies, we are not storing things as a single relational database. Instead, we are actually building a knowledge graph behind the scenes, where we represent the dependencies, the relationships: between the binaries, binary to package, package to Docker container. We want to keep those relationships between the binaries, and then, from there:
I: Of course, we can do many interesting things, like provenance of the code, where the code came from; or, when a vulnerability comes out, whether that vulnerable code is also used in other packages or projects; all kinds of interesting reasoning we can do on top of it. But yeah, that is a really great question.
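The kind of reasoning described, walking from a vulnerable gene back up to every container that ships it, can be sketched with a toy graph of "contains" edges. All names below are invented for illustration; a real system would use a graph database rather than a dictionary:

```python
# Toy knowledge graph: container -> package -> binary -> function gene.
contains = {
    "container:app-a": ["pkg:liblog-2.14"],
    "container:app-b": ["pkg:liblog-2.17", "pkg:libssl-3.0"],
    "pkg:liblog-2.14": ["bin:liblog.so.2.14"],
    "pkg:liblog-2.17": ["bin:liblog.so.2.17"],
    "pkg:libssl-3.0": ["bin:libssl.so.3"],
    "bin:liblog.so.2.14": ["gene:jndi-lookup-vuln"],
    "bin:liblog.so.2.17": ["gene:jndi-lookup-fixed"],
    "bin:libssl.so.3": ["gene:tls-handshake"],
}

def affected_containers(vuln_gene):
    """Invert the edges once, then walk ancestors up to the containers."""
    parents = {}
    for node, children in contains.items():
        for child in children:
            parents.setdefault(child, []).append(node)
    frontier, roots = [vuln_gene], set()
    while frontier:
        node = frontier.pop()
        for p in parents.get(node, []):
            if p.startswith("container:"):
                roots.add(p)
            else:
                frontier.append(p)
    return roots

print(affected_containers("gene:jndi-lookup-vuln"))  # {'container:app-a'}
```

This is exactly the query shape behind "which deployments still ship the vulnerable Log4j code," regardless of how many packaging layers sit between the gene and the container.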
H: One thing I wanted to chime in with there, which was kind of interesting: I guess two things. One would be, you know, we used Log4j as an example. It was the gift that kept on giving, and one of the reasons for that is, when you look at a jar, very, very infrequently...
H: ...do you actually have, like, a log4j.jar that we could find, because lots of people build with Maven, and it packages all the dependencies inside, and so it actually became very, very difficult. I think Jiong mentioned it, but we scanned hundreds, if not thousands, of systems within Research, with a very, very early version of our tool, and actually found hundreds of running instances of Log4j that all the other scanners they had tried had missed; partially because we were, you know (I like to use the word "turducken"), unpackaging, unrolling, and unwrapping all the different layers of the software dependencies, until you actually find that, hey, within this jar are the following class files, and those class files actually match the gene sequence. And I'm really curious about eventually applying this to things like Go applications, which tend to be statically compiled, and being able to look at one and say, hey:
H: We've got little bits of code from here, little bits of code from there, and they've all been added together into this larger package. So it's kind of about the dependencies, but looking at these different layers.
H: I'm not sure if we actually mentioned it, but we hope to have genes computed at multiple different levels of granularity. The preliminary version is going to be at the file level; we have things at the function level, and we're also evaluating what the right level of granularity is.
A: Well, that goes right into my question, which I think was next in the queue, which is about granularity. You showed very small snippets in the slides, and I think that makes sense, but what level of granularity are you looking at? And how do you account for, say, a malicious snippet or a vulnerable snippet that's included in one file, but is also at a completely different position in another file? Does this account for that?
H: The hope is that we do. At the function level we would have large sequences, large collections of genes, that we actually see within a given file, and if you reorder the positioning, the hope is that, yes, the gene would still be there. The hope is that it is robust to different forms of obfuscation, as Jiong showed.
H: We've had that in mind as we go through into the evaluation.
I: So maybe there are too many genes we would need to create, and maybe some of the genes may not be that meaningful; that's the reason we want to start from the function level, which is probably the right level, because a function, by definition, should be one of the units that provides a meaningful computation. That's why we decided the function would be the right granularity to target. But we definitely keep track of function to file, file to package, and package to container; those relations should still be captured in the knowledge graph.
L: Could you talk a little bit about that obfuscation and how you're handling it, and, actually, more generally, what the threat model is there? You've mentioned a couple of times trying to keep some of the malicious suppliers honest, if they're lying about stuff in the SBOM, and all of that sort of thing.
L: How confident are you in being able to handle, effectively, a malicious supplier? Is that part of your threat model, and how is it going to work?
I: Yeah, that is, of course, a part of the threat model. I guess there is a reason we say we're starting with a preview of our technology: we are developing, evolving, our technology to address many different corner cases. But we are pretty sure that in this space, as you've experienced, there will be a lot of very creative ways to bypass all the different kinds of de-obfuscation technology, so we're trying to incorporate many different techniques, many different optimizations, at the IR level, as much as possible.
I: We are trying to develop more and more technology to address many different corner cases, and this is actually one of the reasons we want to interact with the community: there's lots of development in binary analysis, I know, from the research community and, of course, the industry community. We want to learn, adapt, and incorporate that, so that we can leverage and benefit from the developments in the community; and, of course, we want to give back, as a kind of service. Because this is a challenging problem.
H: Maybe two more proof points. Some of the folks on Jiong's team have done some work in malware analysis; that's actually how some of the ideas originally started. They're looking at how two different disassemblers might come to wildly different conclusions, based off of non-word-aligned instructions and some interesting little things like that that malware tends to do.
A: I'll add that one thing this seems particularly useful for, and that might require a much smaller data set, is malware detection; I'm assuming you've probably already thought about that. Maybe a useful first step here would be a service where, you know, we have a data set of known malware fingerprints and all that, and then we can provide a service that can scan some arbitrary piece of code or software and see if any of the malware is present.
I: That would be a really useful security technology, or use case, but at the moment, as we presented in use case two, we're currently focusing on the SBOM, more focused on open source. The reason is that, since we are still developing the technology, we want a good mechanism to evaluate whether our findings, or the results, are meaningful. That's why we're focusing on the open source side, where we have the ground truth from the source code, and can check whether our findings really make sense or not.
K: Thanks, Dustin. Yeah, I think we're probably ready to yield time now, but I wanted to conclude by basically doing a quick show of hands: if the folks attending the call want to either provide a thumbs-up or raise their hand, as an indication that this looks like an interesting technology that the working group would like to follow up on, we can talk about coming back and doing a demo.
K: We also want to continue to socialize this with perhaps another working group at the OpenSSF, too, because we think it has broad applications. But if the working group here is interested in this, Dustin, whatever the most appropriate way to capture some feedback would be, that would be appreciated. We'd like to continue to preview this, and we may even discuss it publicly next week at the member summit, but we'd like to come back and line up a demonstration as appropriate.
A: Yeah. So, this is a working group that's focused on the software repositories themselves, which is why I was sort of asking a little bit about malware, because that's something we've discussed in the past. So, yeah, I think there are maybe some ideas about how a software repository could use a service like this; malware is the first thing that comes to mind for me. But I think we also have a working group around identifying security threats.
A: So that opens things up, and they might be interested to see this. And, yeah, I can't think whether there's another group that might be a good fit as well, that might like to hear about this. But...
A: So, yeah, I'll say thanks for coming and chatting with us. This seems super interesting. I'd say definitely come back when there's a service that we could play with; I think we'd be very interested to see that and try to experiment with it a little bit.
K: Yeah, we'll also try to come back at a time that's maybe more EMEA-friendly as well, so we capture that side of the group too.
A: Yeah, you're welcome to do that. We do record these as well, so, ideally, folks can watch the presentation; but it's hard to do Q&A over a recording. Okay. So, moving on with the agenda real quick: let me just take over the screen.
A: Okay, you can hear me now? Yeah. So, we have Justin here on the call, and Justin's been working with folks in the Python ecosystem for a while on a proposal, one that's been around for a while, to integrate TUF into the Python Package Index, to do repository-signed metadata and artifacts.
A: This has been in progress for a very long time, but we're at the point now where one of the things that sort of came out of this work was the creation of a design doc that very specifically describes how an existing repository would add support for TUF. I wanted to share that with this group here, for folks from other language ecosystems that might be thinking about integrating with TUF, because I think this would be a really helpful resource; but, also, Justin's here to answer questions.
A: I think so; I wasn't sharing anything, but it's this live line item in the notes. It's actually quite a long document, but it's about how PyPI will integrate with TUF.
G: Hi, this is Simona. I'm one of the people at NYU who've worked with Justin a little bit on this, and I just wanted to mention that there are a number of people who worked on this; you'll see their names in the comments, especially.
G: On the document, I just want to give a shout-out to the people at NYU and elsewhere who have helped contribute to this, to make it more likely that we'll be able to work out all the design details so that we can integrate TUF here.
F: Okay, yeah, I'm happy to take that. So, we've been talking with the Sigstore folks, and I think the likely path forward for Sigstore is going to be that the key type in Sigstore, which is handled by Fulcio, will become a key type inside of TUF. So, effectively, people that want to integrate Sigstore will integrate TUF and then use that key type, rather than just doing Sigstore and not getting the namespacing and related things from the TUF side.
F: So right now we're in the process of working through some of those details. As for how in-toto relates to this: there was a very excellent blog post put out by someone at Datadog. Let me think. Oh yeah, it was Trishank, who described a lot of the integration work and things like that, and really did some fantastic work in that area.
F: So, you know, we would probably follow that model of integration, because that's one of the common pathways that we've had for TUF and in-toto integration in the past, and it will probably be that way in the future, given some of the related work for doing it and adopting it in other places.
M
I don't have a pretty presentation, but I talk better when I have some points, so I hope you bear with me with my Google Doc presentation. Okay, well, hello, I'm Betty from Shopify. As Dustin mentioned, I'm just giving an update on how the proposal presentation went at the TAC meeting yesterday.
M
Also, as Dustin mentioned, the shared repository help desk proposal is largely championed by Jacques, so I'm the less charismatic person that you're stuck with for the moment. I also don't have as much context on this, so I can't speak to it as well as y'all can. That said, here we go: some background context on what the proposal is, for anyone who's not aware.
M
So we have a shared help desk proposal that was shared and surfaced, and the key goal of it was to propose a solution to help software repositories manage MFA reset requests, since that is high risk and time consuming. The goal is for software repositories in general to roll out MFA requirements to larger and larger cohorts, and in order to support that, we proposed creating a shared help desk to alleviate some of the MFA reset request portion of the support burden that comes with it.
M
And so, as a recap from the last meeting of the Securing Software Repositories working group: Jacques had asked the working group if they were in favor of supporting bringing this proposal to the TAC. There were no objections; everyone supported it, and so the next piece was actually bringing it to the TAC.
M
No one was, and so I, as one of his team members, along with other team members, kind of held that torch for him in his absence. Ashley Pierce was the one who presented to the TAC, and the goal was to bring it to the TAC and ask them to bring it to a vote on whether they support the proposal.
M
It's currently too ambiguous for the TAC to vote on. I think that's fair feedback, and they had really great questions that they asked. So, instead of them voting on something they're unclear about, the goal was to take it back; we need to do some rework.
M
That said, there is some good news to share, which is that the sentiment and the theme that I think we took away from that meeting was that members of the TAC support this idea conceptually and agree that the problem is worth solving. So what I heard was that no one was saying no.
M
So that's what I'll be planning to do, probably later today if not tomorrow: I'll create an issue in our working group's repo, which is this repo right here. I'll share the link in the doc once I'm done presenting, and in there we'll have an issue where we can have the discussion, get more suggestions, and get feedback. The goal is to create a concrete list of questions that we can then focus on addressing. I also wanted to call out one more thing.
M
I know we're short on time, but in the conversations with the TAC and other members in that meeting, they had mentioned that today could have been a good opportunity for them to join us and have some discussions here. I know we're short on time, so it might not be viable at this moment. So, given that, I think the area where I'm going to focus everyone to start getting that feedback would be the issue that we're going to create in the working group's repo. I do have an ask.
M
You know, as I mentioned, this is largely Jacques's initiative; he's worked with this working group and he's gotten feedback from other ones as well. But I was wondering if anyone is available over the next few weeks, while Jacques is away, to continue making progress on it. If not, I think the best path forward is to, one, collect more feedback in that issue and gather those questions, and then, two, wait for Jacques's return to work on the proposal.
M
One more thing I want to highlight or call out is that the governing board meets to plan out the 2023 budget in early December. I think Jacques comes back around late November, so I don't know if that'll be sufficient time to get a more fleshed-out proposal in for the budget planning. But from what I understand, even if it isn't presented before then, it is possible, even early next year, to continue this conversation. So the door to getting this funded is not immediately closed.
A
If someone wants to lead that in the interim until Jacques returns, feel free to take ownership of the issue in that repository when it's created. And Betty, I'd ask you to just drop a link to that issue in the Slack channel and the notes and that kind of thing when you create it. Cool, awesome. Any questions?
M
I don't think I have any follow-up questions yet, but if we do have any, we'll funnel them your way.
A
Okay, five minutes left. Last thing on the list: Zach Newman is not here, but had thrown in an idea to talk about RSTUF at a future meeting. For the TUF folks around the call, maybe you could tell us what this is, and we can decide if we want to hear more about it.
N
It's a simulator for TUF. The idea is basically for it to be used as kind of a sandbox for the repository side of TUF, to experiment with the way repository setup could go for PyPI and others in the same sandbox. Should that work? I haven't worked on it myself; I think a lot of the folks at VMware have done a lot of the work there, and they're all, I think, in Europe, and so probably aren't here today. So maybe we can do it at some future meeting.
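The sandbox idea, experimenting with repository-side setup without touching a real deployment, can be illustrated with a toy in-memory repository. This is a made-up sketch of the concept only; it is not the API of the actual simulator or of python-tuf, and the class and method names are invented.

```python
# Toy sketch of a "repository sandbox": metadata versions live in memory,
# adding a target bumps the roles that describe it, and a fake client can
# "fetch" metadata without any network or real signing involved.
class InMemoryRepo:
    def __init__(self):
        # Version numbers for the four top-level TUF roles.
        self.metadata = {"root": 1, "timestamp": 1, "snapshot": 1, "targets": 1}
        self.targets = {}

    def add_target(self, path, content):
        # Register a target file and bump the metadata chain that lists it.
        self.targets[path] = content
        for role in ("targets", "snapshot", "timestamp"):
            self.metadata[role] += 1

    def fetch_metadata(self, role):
        # A real client would download and verify role metadata;
        # here we just return the current version number.
        return self.metadata[role]

repo = InMemoryRepo()
repo.add_target("packages/demo-1.0.tar.gz", b"...")
print(repo.fetch_metadata("timestamp"))  # → 2
```

The value of a sandbox like this is that repository operators can try out delegation layouts and update flows for an ecosystem like PyPI and see the metadata consequences immediately.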
A
Cool, okay. So, yeah, interesting. I'm guessing the IBM demo doesn't fit into four minutes; is that an accurate assumption?
I
So, a quick preview: we don't have a fancy UI yet. We are contributing one on top of it, but this is more about what's behind the scenes, how it's developed, kind of the core service. So, with this first UI, what I'm going to do is this: one day, while we were developing our technology, Ian gave me some random Debian packages, and he just asked me, can you tell me anything about these Debian packages?
I
So, okay, we are currently building this; let's try to figure out what this package is. We submit it to the system, and what it does behind the scenes is analyze this binary and try to collect all the information. This is the job ID I just submitted, and this will show the progress for the job.
Of course it started, that is great. And for the Debian packages, as you may know if you've worked with deb packages, a .deb is actually an ar archive, with the contents inside.
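The point about a .deb being an ar archive can be shown with a few lines of code. The sketch below builds a minimal fake archive in memory (a real .deb holds `debian-binary`, `control.tar.xz`, and `data.tar.xz` members) and lists its member names by walking the 60-byte ar headers; the helper names are ours, not from any demo.

```python
def ar_members(data: bytes):
    """List member names of an ar archive (the container format of .deb)."""
    assert data[:8] == b"!<arch>\n", "not an ar archive"
    names, off = [], 8
    while off + 60 <= len(data):
        header = data[off:off + 60]          # fixed-size ASCII member header
        name = header[:16].decode().rstrip(" /")
        size = int(header[48:58].decode().strip())
        names.append(name)
        off += 60 + size + (size % 2)        # member data is 2-byte aligned
    return names

def make_member(name: bytes, body: bytes) -> bytes:
    # Build one ar member header (name/mtime/uid/gid/mode/size + "`\n" magic).
    header = (name.ljust(16) + b"0".ljust(12) + b"0".ljust(6) + b"0".ljust(6)
              + b"100644".ljust(8) + str(len(body)).encode().ljust(10) + b"`\n")
    return header + body + (b"\n" if len(body) % 2 else b"")

# A minimal fake .deb: just an ar archive with two members.
fake_deb = (b"!<arch>\n"
            + make_member(b"debian-binary", b"2.0\n")
            + make_member(b"data.tar.xz", b"dummy"))
print(ar_members(fake_deb))  # → ['debian-binary', 'data.tar.xz']
```

Unpacking `data.tar.xz` from the archive is then what exposes the actual installed files, which is the step the demo walks through next.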
I
Of course, there are many dependencies inside; these are the two child nodes of the analysis, which I'm going to skip for now. If I just refresh: it needs to unpack what's inside the Debian binary, and there's the data.tar.xz, which also has children of its own. Okay, it's still unpacking it.
I
Yeah, so sorry about that; it already cleaned up somehow. This thing decided that. So, after that, what happened is that one of the files is actually this binary inside it, and then we query our knowledge graph: do we know anything about this binary? And this is the query wizard, for example.
What
do
we
know
anything
about
this
binary,
and
this
is
the
coriander
wizard,
for
example.
I
This query: this is one of the files we just submitted from the Debian package, and inside there is a file called "unknown". The first hit in the results is actually the match with wget 1.19.4, which has a match count of 118. These are the list of functions that matched with the specific binary we just submitted to the system, and then there is more detail about what each one's type is, what its size is, and what its value is.
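The matching step being demoed can be sketched as set intersection: count how many function signatures an unknown binary shares with each known package in the knowledge base and rank the candidates. This is a toy illustration with made-up signatures and scores; the real system's features and counts are its own.

```python
from collections import Counter

# Toy knowledge base: package → set of known function "signatures"
# (here just placeholder tokens; real systems hash normalized code).
knowledge_base = {
    "wget 1.19.4": {"f_read", "f_retr", "f_url", "f_host", "f_ftp"},
    "curl 7.58.0": {"f_read", "f_url", "f_easy", "f_multi"},
}

def closest_matches(unknown_sigs):
    # Score every known package by how many signatures it shares
    # with the unknown binary, best match first.
    scores = Counter({pkg: len(sigs & unknown_sigs)
                      for pkg, sigs in knowledge_base.items()})
    return scores.most_common()

unknown = {"f_read", "f_retr", "f_url", "f_host"}
print(closest_matches(unknown))
# → [('wget 1.19.4', 4), ('curl 7.58.0', 2)]
```

With real data the leading match count (118 functions in the demo) is what lets the analyst say the unknown file is almost certainly wget.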
I
And then this will show what the closest match is. That was the top one, and the second one is actually coming from the official deb package from Ubuntu. This is the package name, wget 1.19.4, and this is the wget inside that deb package, which also has the 118 functions matching the signature, or the genome, of the binary. And then, from here, I can confirm: yeah, looks like this is wget, and I think that would be my best guess.
A
That's perfect. No, that's great, yeah. I would say also, if you want to put together a little screen recording or video or something, we could share that in Slack; I think folks would be interested to see that as well. But yeah, that's great. All right, awesome. All right, we're out of time. Thanks, everyone, for joining, and see you in two weeks at the media-friendly meeting. Thank you all.