A
I think Zach said he's gonna be a few minutes late, so I suppose let's give it a minute or two for people to come in, and someone has to volunteer to run the show for a while.
A
Yes, while we're waiting for Zach, I guess I'll volunteer; I'm going to run the show a little bit until he arrives. Here are the meeting notes; please put your attendance there, if you don't mind.
A
Cool, so I guess we could get started instead of waiting a whole five minutes, just to make good use of everyone's time. So it looks like the first thing on the agenda is intros for new friends. Hi, I'm Trishank, by the way; I'm a staff security engineer at Datadog. So, if there are new people here, would you like to introduce yourselves? Please.
C
So, I think I'm new to this. I'll turn my video on since I don't have a picture yet; I'm working on that. Laurie Williams, just my camera. I work at NC State University as a professor, working in the area of supply chain security with my PhD students. One of them is here: Nusrat Zahan, who's been working with lots of people. So I have PhD students who work in supply chain security, and we also have a big NSF grant, nine million dollars, on supply chain security.
B
Hello, everyone. I have been in this business for the last couple of months, probably. I'm Nusrat, a fourth-year PhD student, and my focus is on software supply chain security, especially the identification of security metrics that detect risk in the software supply chain. I'm in this group to learn how ecosystem maintainers and managers are working towards getting more data from different ecosystems, and especially how we can get malicious-package data from different ecosystems. So I talked with Zach, and Zach invited me.
A
Cool, nice, thanks. Sorry, I wasn't aware that you've been here before. And Miles has a shout-out for you: he says it's nice to be on a call together, he loved the research you did with the GitHub npm team, and he very nicely linked to the paper there.
D
Yeah, you as well. It's always fun to see someone think that they discovered this. You know, we get vulnerability reports pretty regularly, every three or four weeks, and people get pretty angry when we're like: yeah, we're not gonna do anything more than we've already done, and we're not going to talk about it more extensively than what we have on the website. But we're very aware of this attack because of this excellent research that was done, you know, like a year before people started trying it en masse.
E
I'm actually part of the Open Technology Group at IBM. I've been acting in the open source and open standards arena for almost 30 years, and I have been focusing on the OpenSSF for about a year and a half now. Generally speaking, I try to make this organization successful in what it's trying to do. I've been focusing on different groups here, trying to help them as of late, and I'm mostly active in the SLSA spec working group, but I always try to attend.
A
Anyone else who would like to introduce themselves, or should we move on? Zach's finally here. Okay, very good. Zach, would you like to take over?
F
Yes, I just heard "Zach, would you like to take over," so I would appreciate a moment to get my bearings. And apologies for the delay, everyone; I am without internet at home, so I'm currently outside. So, I see we did intros. Kairo, did you cover the RSTUF stuff? I don't know what the canonical name is.
G
Yeah, well, we say "our stuff"; RSTUF, we'll see. For those who are new here: RSTUF is the acronym for Repository Service for TUF.

G
This is an implementation of TUF as a service. TUF is a framework to secure software repositories. The update that I have here: we opened the pull request for the TAC; it's in draft there.
G
Also
it's
that's
the
update
that
I
have
for
the
the
work
group,
but
also
I
would
like
to
to
share
that
the
progress
for
the
roadmap,
it's
going
very
well
so,
probably
end
of
this
month.
We
have
the
the
first
beta
and
also
we
will
submit
the
pull
request
to
the
warehouse
warehouse,
the
pipi
with
the
artist
of
implementation.
Then
we
can
evaluate
this.
F
That's awesome, and I just want to chime in real quickly with a slightly administrative update. I went to the TAC meeting for the OpenSSF on Tuesday, as is kind of the recommended procedure for when you're getting a project adopted; they don't have to vote on it in a meeting, but just to give them a heads-up.
F
Everyone basically said: looks good, keep going with the pull request and that process. Once the legal stuff is sorted out, which we're working on and where I don't see any big roadblocks, it should be just a matter of getting a thumbs-up from the TAC, and then it's an official project under this working group. So I'm excited to see where that goes, both technically and bureaucratically. Thanks.
B
Yeah, I can, actually. When Zach and I were talking, one of the things we identified, and probably we all know this, is that we don't have any central repository where we can collect different ecosystems' data, and most of the research that we have seen so far was based on npm, because it makes things a bit easier.
B
It's easier to get the data there. So, if we want to extend supply chain research, even for industry, we need a platform where we can get different types of data for different ecosystems, and also, if possible, host malicious package data historically, because that will also help from a research perspective. That's why we're trying.
B
We
are
currently
in
the
process
of
writing
a
proposal
to
have
a
central
repositories
and
what
type
of
data
we
should
include
on
that
Central
repositories
and
how
we
can
incorporate
different
ecosystems
to
get
the
data
and
I
also
proposed
one
of
the
research
that
currently
I
am
doing
and
I
want
to
extend
it
for
the
different
ecosystems.
So,
if
everyone
wants,
probably
I
can
talk
little
bit
about
this
research
or
else
I
can
send
it
over
to
Jack.
F
Yeah, I think that's a great summary. And then again I, as usual, have the bureaucratic updates, which is: there seems to be a lot of hunger for such a resource across working groups.
F
So there's an open issue, I think on PyPA, I'm not sure which repository exactly, basically for Python to publish a feed of malware packages that they're taking down. Just the metadata, so that it can kind of turn into, you know, advisories from the OpenSSF and other tools, in case you're using one of those packages that has since been taken down. Like, at what point do you notice that that has happened, if you were unlucky enough to download something pre-takedown? So I think there's a very narrow version of this question, especially as it pertains to malware.
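A feed like the one described could be as little as newline-delimited JSON of takedown metadata. A minimal sketch of how a consumer might answer the "at what point do you notice?" question; the field names here are assumptions for illustration, not a published schema:

```python
import json

# Hypothetical takedown-feed entries; the field names are illustrative,
# not an agreed-upon schema.
FEED = """
{"ecosystem": "PyPI", "name": "totally-not-malware", "versions": ["1.0.0"], "reason": "malware"}
{"ecosystem": "PyPI", "name": "some-typosquat", "versions": ["0.1"], "reason": "typosquat"}
""".strip()

def affected(feed_text, installed):
    """Return installed (name, version) pairs that appear in the takedown feed."""
    hits = []
    for line in feed_text.splitlines():
        entry = json.loads(line)
        for name, version in installed:
            if name == entry["name"] and version in entry["versions"]:
                hits.append((name, version))
    return hits
```

A tool could run a check like this against a lockfile periodically and alert if anything installed pre-takedown later shows up in the feed.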
D
Zach, I think the tall buildings are impacting your cell service.
F
Sorry about that; I think I'm back now. Where did you lose me?
F
Great, yeah. So there's sort of demand for that very narrowly, but I think there's also general demand: a lot of interest in metadata about malware, and interest in the malware samples themselves, and which ones you'd be using depends on who you are, if you're doing malware research and so on. Miles?
D
Yeah
so
a
couple
things
that
I
I
wanted
to
mention,
and
first
for
our
malware
takedowns
that
we're
doing
at
mpm.
That
was
no.
That
was
no
those
now
directly
connect
to
the
GitHub
advisory
DB,
creating
malware
advisories
for
each
of
them.
Now
one
thing
I've
conflicting
thoughts.
While
I
talk
about
this
one
would
be.
It
sounds
like
we
should
get
the
python
folks
and
our
advisory
database
folks
connected
both
figuratively
and
literally,
to
see
if
we
can
have
that
stream
piping
to
create
malware
advisories.
D
But
one
of
the
things
that
we
have
noticed
internally
can
be
an
issue
and
I
would
be
somewhat
skeptical
of
of
hooking
up
more
things
to
this
automated
system
and
is
false,
positives
or
or
any
sort
of
malware.
That's
you
know
like
falsely
identified.
D
We
had
an
incident
recently
without
getting
into
too
many
specifics,
where
a
very
reasonable
updater
tool
looked
like
a
dropper
and
resulted
in
setting
off
a
whole
bunch
of
malware
tools
that
have
resulted
in
something
being
erroneously
identified
as
malware,
which
then
results
in,
because
everything
is
like
so
connected
package
goes
down.
D
Advisory
goes
up
when
we
had
to
take
down
the
advisory
we
removed
them
when
this
happens
not
not
withdraw
them,
because
that
would
not
be
fair
to
the
people
impacted,
but
it
does
raise
a
couple
concerns
that
I
have
overall
about
these
kinds
of
Thieves,
especially
in
the
day
of
AI,
where
a
lot
of
people
are
probably
subscribing
to
these
and
building
models
out
of
them,
and
then
maybe
making
assumptions
about
usernames
or
email
addresses
or
anything
that's
associated
to
these
things
and
like
we,
we
see
this
overall
with
npm.
D
We
have
this
couch
TV
feed,
which
is
gnarly,
but
you
could
think
of
it
almost
like
a
large
feed
that
you
could
subscribe
to
and
a
lot
of
the
people
that
subscribe
to
it,
don't
actually
Implement
like
unpublished
properly,
so
never
nothing
ever
ends
up
being
taken
down,
and
so
that
always
is
a
risk
when
you're
kind
of
creating
these
seeds
and
following
these
feeds
that,
like
you,
only
get
the
signal
of
new
things
and
there's
no
like
withdrawal
capacity
and
so
I
just
kind
of
competing
thoughts.
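The concern about subscribers that never implement unpublish can be made concrete: a well-behaved consumer of such a feed has to replay removal events, not just additions. A small sketch, with a hypothetical event shape:

```python
def apply_events(events):
    """Replay advisory events, honoring withdrawals as well as additions.

    Each event is a hypothetical (action, advisory_id) pair. A consumer
    that ignores "withdraw" events keeps erroneously flagged packages
    listed forever, which is exactly the failure mode described above.
    """
    live = set()
    for action, advisory_id in events:
        if action == "publish":
            live.add(advisory_id)
        elif action == "withdraw":
            live.discard(advisory_id)
    return live
```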
F
Yeah, that experience is really valuable, Miles, and we're really grateful for it. And that's, I think, why this group makes sense as a sort of natural coordinating point for efforts like this: so that we can learn from those experiences and figure out best practices, be those in the actual publication itself, in what goes into that data set initially, in documentation, or even just technically in how we expose it. So I'm happy to piggyback on GitHub security advisories.
F
But
if
there's
a
data
format
we
can
make
that's
even
maybe
more
like
more
specific
right,
I
think
in
to
some
extent,
there
are
worries
that
come
from
shoving
this
into
a
vulnerability.
You
know
format
which
a
lot
of
those
were
kind
of
intended
for
I
I.
F
You
know,
I
think
the
GHSA
schema
does
a
good
job,
but
like
it's
not
what
it's
for
initially
right,
and
so,
if
we
can
just
have
a
really
like
feed
design
like
tailor-made
for
this
perfect
purpose,
especially
if
we
can
get
standardization
across
languages,
I'm
gonna
put
stick
my
sort
of
former
academic
hat
on.
We
see
a
lot
of
research,
that's
repeated
across
ecosystems
or
Worse,
never
repeated
across
ecosystems,
just
because
it's
a
bunch
of
additional
work
to
check.
F
You
know
each
additional
language
ecosystem,
and
so
we
don't
know
a
lot
of
the
research
findings
to
what
extent
they
do
generalize
and
so
the
more
uniformity
we
can
get
I
think
the
better.
But,
as
you
point
out
miles,
it's
really
important
to
kind
of
make
sure
that
the
data
gets
used
responsibly
and
so
again.
I
think
I
think
that's
something
that
we're
in
a
position
in
this
group
to
do.
D
And
I
just
shared
a
link
to
our
advisory
DB,
specifically
from
our
and
a
couple
things
that
I'll
call
out
there.
One
is
our
malware.
Advisories
are
not
the
same
as
our
normal
advisories.
So
like
we're
not
trying
to
like
shove
malware
into
like
the
CBE
format.
D
We
don't
include
when
you
look
at
like
the
reviewed
advisory
like
count
lists
that
are
there
the
malware
advisories
like
don't
show
up
and
in
fact
on
the
default
order.
When
you
just
go
there,
it
doesn't
show
you
need
to
put
in
explicitly
type
malware
up.
D
The
reason
for
this
is
like
especially
with
something
like
npm,
where
we
have
so
much
drive-by
malware
that
we're
taking
down
all
the
time
it
really
is
just
like
it
would
pollute
everything.
It's
like.
You
know
it's
bad
enough
with
all
the
redos
attacks.
We
don't
need
to
add
malware
to
pollutants
as
well.
D
So we use a thing called Entitlements at GitHub for permissions; we actually made it available online. It's a little Ruby app that uses metadata inside of a GitHub repo to manage permissions. We use Entitlements to manage the permissions of our chat ops, and we have a chat-ops channel specifically for malware removals, so any malware removal can be audited and we can see who did it,
D
What
was
taken
down
and
if
there's
problems
like
it's
all
in
there
and
auditable,
only
a
subset
of
our
staff
have
access
to
those
specific
commands,
even
like
a
smaller
subset
than
our
overall
support
team
is
just
the
folks,
mainly
from
trust
and
safety,
and
a
few
other
folks
who
have
those
permissions
when
something
is
taken
down.
Nusrat
mentioned
a
security
holding
package,
so
we
removed
the
package
and
all
the
versions
We
store
that
data,
and
we
replace
the
package
with
a
holding
package
that
only
has
like
a
basic
readme.
D
The
intention
is
to
just
like
not
completely
disrupt
the
supply
chain
with
like
an
empty
package
or
something
that
would
blow
up
if
the
package
didn't
exist,
although
if
someone's
relying
on
a
version
scheme
like
it's,
it's
not
perfect,
and
then
we
now
have
a
pipeline
that
essentially
creates
an
event.
One
that
happens
that
goes
to
our
advisory
database
team.
With
the
information
they
need
to
automate
the
creation
of
advisory
in
that
malware
advisor
database.
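The pipeline described here (a takedown fires an event, and advisory automation turns it into a malware advisory) might look roughly like the following. All names and payload fields are hypothetical; the real npm pipeline is internal to GitHub:

```python
def takedown_event(package, versions, actor):
    """Build the event a takedown could emit for advisory automation.

    Purely illustrative: the real pipeline and its payload shape are
    internal to GitHub, and the field names here are made up.
    """
    return {
        "type": "malware_takedown",
        "package": package,
        "versions": versions,
        "actor": actor,  # recorded so the chat-ops removal stays auditable
        "replacement": "security-holding-package",
    }

def to_advisory(event):
    """Turn a takedown event into a minimal malware advisory record."""
    return {
        "type": "malware",  # kept distinct from reviewed CVE-style advisories
        "package": event["package"],
        "affected_versions": event["versions"],
    }
```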
F
Yeah, that's all really helpful context, Miles, thanks. And it's likely that, as this effort kind of gets off the ground within the OpenSSF, we'll turn back to GitHub, because it sounds like you have a lot of experience.
F
And
again,
our
goal
is
not
to
like
necessarily
take
over
this,
where
github's
doing
a
good
job,
but
just
just
sort
of
like
act
as
a
coordinating
point
to
make
sure
we
have
access
to
data
from
from
various
ecosystems
in
this
format
and
again,
with
the
lessons
learned
from
from
npm.
Of
course,
cool.
So
I
think.
F
That's probably going to be pretty limited in scope, to just getting help from the repository maintainers, though we, I guess, reserve the right to ask for funding from the TAC and the governing board of the OpenSSF, and they have indicated at least a theoretical willingness to help with issues along these lines.
F
Folks are clamoring for this from across working groups, so it's highly anticipated work, and I'm excited to kind of see where this is going. Any other questions or comments on this topic?
B
One
sorry
about
that.
One
thing:
I
wanted
to
brought
up
so
for
npm
I
think
we
have
I
think
we
have
some
resources
where
they
have
been
collecting
npm
data.
Historically,
they
are
part
of
our
year
right
and
our
group
also
doing
the
same
part,
but
for
in
last
couple
of
days,
I
have
reached
out
to
multiple
package,
maintainer
or
manager
to
ask
them
like
whether
there
is
a
way
to
collect
historical
data
for
other
ecosystems.
So
what
I
have
learned
so
far?
B
They
all
want
to
have
a
structural
platform
where
they
can
host
the
malicious
package
data,
but
right
now
they
don't
have
it.
So
in
that
case,
what
is
the
like
chicken
and
egg
problem?
Is
it
something
package
registry
should
start
collecting?
First,
then,
the
centers
reposited
are
yeah.
That's
something.
I
was
thinking
like
how
to
go
ahead
from
that.
H
Hello,
everyone
I
can
do
some
updates
from
ruby
gems
perspective
for
now,
as
mentioned,
we
don't
really
collect
much
details
on
like
remove
packages
and
stuff,
but
we
are
going
to
start
really
soon.
We
are
in
process
of
building
new
admin
dashboard
when
this
things
are
going
to
happen
and
there
will
be
I
also
audit
a
stored
next
to
our
election.
So
we
will
know
who
did
some,
for
example,
removing
the
package
and
make
a
reason
what
has
happened.
F
That's really awesome, and I am excited to hear you're going ahead with that. I think it's our hope that, again, you're doing the hard work to, like, come up with the schema and figure out what needs to be added, and to some extent that's gonna have to be unique per ecosystem, because everyone's got their own underlying data stores and, you know, data models and so on.
F
But
if
we
can
also
come
up
with
common
formats,
that's
going
to
be
you
know,
shared
I
think
it
can
save
you
all
a
little
bit
of
work,
so
I
don't
want
to
stop
you
I
think
rubygems
should
definitely
go
on
ahead,
but
once
we
have
a
few
examples
of
that
that
we
can,
we
can
sort
of
generalize
and
hopefully
make
the.
F
We don't have examples yet; I think we could talk to see some of what, like, npm is doing here. But Joseph, do you have design docs or anything you can share out, or is this just kind of... okay, cool. We would love to see that once it's ready, but obviously do that at your own pace.
F
Yeah
cool
all
right,
I
think
I.
Think
that
just
about
covers
that
for
now
look
forward
to
more
concrete
updates,
I
hope
in
the
in
the
future
and
yeah
thanks
again
to
all
the
folks
who
are
actually
building
these
things
that
we
can
learn
from
as
we
as
we
try
to
generalize
can
I
turn
it
over
to
tree
Sean.
A
Yeah
thanks
I
just
wanted
to
add
something
that
to
the
previous
point,
which
is
that
datadog
has
open
source.
This
tool
called
guard
dog
and
maybe
I
should
send
a
link
here.
Let
me
see
if
they
can
find
it
kind
of
cute
name,
isn't
that
card
doc.
Let
me
see
what
I
can
find
it
there.
We
go
okay,
so
what
this
tool
does?
Is
it's
compiled
a
bunch
of
heuristics
that
we
found
tends
to
identify
malicious
packages?
Malicious
python
packages
I
think
we're
working
on
an
npm
version
of
it.
A
What
I'm
thinking
we
should
do
is
help
npm
and
Pipi
scan
all
of
the
packages
and
contribute
guard
dog
findings.
Back
to
this
shared
data
repository
that
you're
all
thinking
of.
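For flavor, heuristics-based scanners of this kind tend to flag patterns like obfuscated code execution in install scripts. The toy rules below illustrate the idea only; they are not GuardDog's actual rules or API, which are far richer:

```python
import re

# Toy heuristics in the spirit of source-code scanners like GuardDog;
# real tools use many more rules and proper code analysis.
SUSPICIOUS = [
    (re.compile(r"base64\.b64decode"), "base64-decoded payload"),
    (re.compile(r"\bexec\s*\("), "dynamic code execution"),
    (re.compile(r"https?://\d+\.\d+\.\d+\.\d+"), "hard-coded IP URL"),
]

def scan_source(source):
    """Return the labels of all heuristics that match the given source."""
    return [label for pattern, label in SUSPICIOUS if pattern.search(source)]
```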
F
Yeah,
so
this
this
opens
a
little
bit
of
a
can
of
worms.
I
would
say
in
general,
the
sort
of
like
automated
or
semi-automated
malware
detection
is
tricky,
I
think
to
get
right,
and
so,
if
I
can,
if
I
can
do
my
own
horn
for
a
little
bit,
not
plugging
this.
F
Let
me
see
this
is
a
preprint
of
a
paper
that
I'm
going
to
drop
in
the
chat
and
it
will
appear
in
much
prettier
much
better
edited
form
at
the
Dixie,
the
conference
on
software
engineering
this
year
and
basically,
we
said,
let's
take
a
look
at
a
few
python,
malware
detection
tools
and
learn
how
well
they
really
work.
But
we
did
this
by
talking
to
some
of
the
folks
involved
in
the
python
sort
of
malware
ecosystem
and
how
much
we
can
generalize
from
that.
I.
F
Don't
know
right,
not
like
npm
as
miles
was
mentioning.
It
has
like
a
support
team
associated
with
it
of
of
paid
folks,
which
is
really
awesome.
Pipi
has
more
or
less
Dustin,
who,
unfortunately,
isn't
around
today,
but
and,
and
so
Pipi
has
in
the
past,
experimented
with
automated
malware
detection
and
basically,
what
they
found
is
that
almost
any
non-zero
false
positive
rate
is
something
that
they
consider
unacceptable
just
because
there
are
20
000,
plus
packages
and
I
think
the
numbers
even
gone
up
since
I
last
checked
every
week
published
Pi
Pi.
F
So
okay,
so
you
have
a
tenth,
a
hundredth
of
a
percent
that
still
results
in
like
a
a
non-trivial
burden
on
maintainers
if
they
are
going
to
to
sort
of
review
everyone,
and
they
do
prefer
to
err
on
the
side
of
not
avoiding
false
positives
by
kind
of
manually,
looking
at
things
and
so
on.
So
I
come
down
a
little
bit
skeptical,
not
on
the
value
of
tools
like
guard
dog,
but
on
the
deployment
in
an
automated
fashion.
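The arithmetic behind that skepticism is worth spelling out: at PyPI's publication volume, even a false positive rate that sounds tiny turns into a steady weekly review burden (the 20,000-per-week figure is the one quoted above):

```python
def weekly_false_positives(packages_per_week, false_positive_rate):
    """Expected number of benign packages flagged per week."""
    return packages_per_week * false_positive_rate

# Roughly 20,000 packages published per week: even a 0.1% false positive
# rate means about 20 manual reviews per week landing on a tiny team.
burden = weekly_false_positives(20_000, 0.001)
```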
F
That
said,
I
think
signals
from
tools
like
this
could
be
extremely
useful
in
sort
of
highlighting
things
to
take
a
second
look
at
if,
if
that
makes
sense,
so
I
don't
I,
don't
want
that
to
come
across
as,
like
a
you
know
me
being
a
against
this
and
I
think
it's
awesome
that
datadog
is
released
and
this
is
open
source.
We
ran
into
a
problem
with
this
paper,
where
a
lot
of
other
companies
have
tools
like
this
and
don't
open
source
them.
So
I'm
excited
actually
to
take
a
look
at
guard
dog.
F
At
some
point.
Sorry,
Joseph.
H
We'd definitely be interested in maybe contributing some common patterns we see in RubyGems to GuardDog, to be able to scan RubyGems packages too; super nice. And we use it, as Zach mentioned, currently in manual mode, so we are getting manual notifications: we manually evaluate the severity and then decide what next action to perform.
H
So
it's
just
sending
our
alerts,
but
not
doing
anything
in
automated
way,
and
it
works
really
well
for
now,
there's
just
some
kind
of
delay
like
when
anyone
is
around
to
be
able
to
to
evaluate
this
one
so
yeah.
What's
up
super
thanks
for
sharing
this
question,
I
will
take
a
look
if
there's
anything,
we
can
contribute
as
well
to
maybe
support
Ruby
James
as
well
in
the
future
on
this
tool.
H
Maybe
you
can
answer
then
later
use
it
to
automate
some
stuff,
and,
as
I
mentioned,
we
are
building
the
new
admin
dashboard
for
rubygems.
So
this
is
I.
Think
planned
to
incorporate
right.
It
sends
a
lot
of
notifications
in
there
for
now
manually
evaluated,
I'm
actually
left
right
to
block
proper
packages.
So
we
can
do
some
manual
effort
first
and
then
maybe
see
maybe
like
partially
right
something
you
will
see.
Maybe
some
patterns
we
can
rely
on
one
other
person,
so
we
can
automatically
block.
Some
of
them
will
be
still
manually
manually,
evaluated.
F
Trishank, do you have anything on that you want to add here?
F
Yeah, anyways. So I think there is a lot of room for, again, this: if this group is going to start collecting shared data across ecosystems, I think, you know, signals from our scanners, but in both directions, right? Both helping those scanners run more easily on lots of packages, without having to write, like, a parser for every repository's feeds, which would be nice, and, in the other direction,
F
Once
those
findings
are
there
storing
them
in
a
common
format
so
that
we
can,
we
can
sort
of
pull
signals
across
and
again
all
of
this,
we
would
want
to
review
who
gets
access
right
pretty
carefully
before
we
open
it
up
to
the
entire
world.
Obviously
we're
the
open
ssf
we're
in
favor
of
openness
by
default.
But
but
again
you
know
when
it
comes
to
malware
stuff.
I
always
want
to
be
a
little
bit
careful
there.
F
Yeah
I
think
I
think
that
could
be
another
again
useful
function
of
like
let's
collect
package
metadata
in
uniform
formats
and
again
there
are
a
bunch
of
initiatives
here
and
I
think
this
group
is
a
great
way
to
unify
a
lot
of
those
cool
all
right.
Anything
else
on
that
point
before
we
we
skip
to
trishank's
next
item.
F
Going
once
going
twice
all
right:
cool
trishank
back
over
to
you
cool.
A
Thanks
sorry,
I
came
a
bit
unprepared
so
for
forgive
me,
I'm
gonna,
I'm
gonna
share
my
screen,
and
hopefully
with
enough
waving
of
hands.
You
get
what
I'm
trying
to
say.
I
think
I
think
it's
a
very
interesting
topic
for
this
group.
So
I
got
inspired
a
few
weeks
ago,
New
Year
I
was
bored
and
I've
had
enough
of
reading
about
dependency.
Confusion
attacks
like
enough
once
and
for
all.
A
Let's
stop
calling
the
supply
chain
attacks
and
and
try
to
fix
this
problem
once
and
for
all
right,
one
of
those
grandiose,
New,
Year
ideas,
and
so
it
dependency
confusion
attack
happens
when
so.
This
is
a
real
real
example
of
what
happened
to
a
torch,
Triton
I.
Think
a
lot
of
you
are
familiar
with
this
already,
but
exactly
why
this
happened
around
New
Year
around
Christmas.
What
happened
to
George
Triton
is
that
they
have
their
own
nightly
index
their
own
pipei
repository
that
has
some
of
their
packages
and
what
happens?
A
What
happened
is
that
someone
seems
to
have
reserved
the
same
name
that
didn't
exist
on
Pi
Pi,
the
public
python
Repository.
A
So
what
happened
when
you
use
two
of
the
repositories
is
that
if
pip
the
package
manager
is
not
careful
about
it,
it
would
end
up
preferring
the
package
from
the
public
index,
which
you
may
not
want,
because
it's
not
actually
authored
or
developed
by
the
original
authors,
but
rather
someone
who
has
reserved
the
name
and
is
hosting
malicious
packages.
In
fact,
this
is
what
happened.
It
was
actually
malicious
code.
A
I
forgot
what
it
did
exactly,
but
it
was
malicious,
so
they
had
to
clean
all
of
these
things
up
that
has
been
removed
from
their
nightly
index.
They've
also
reserved
the
name
on
on
on
Pipi
and
and
and
so
on.
So
does
this
make
sense
so
far.
Basically,
the
attack
comes
down
to
if
you
use
a
package
manager,
and
you
use
more
than
one
repository
more
than
one
server
to
try
to
get
packages.
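The failure mode is easy to sketch: a resolver that merges candidates from every configured index with equal priority will happily prefer a higher version of a squatted public package over the internal one. This toy resolver illustrates the pre-fix behavior described here; it is not pip's actual code:

```python
def pick(package, indexes):
    """Naively choose the highest version across all indexes, like a
    resolver that gives every configured index equal priority."""
    candidates = []
    for index_name, contents in indexes.items():
        for version in contents.get(package, []):
            candidates.append((version, index_name))
    if not candidates:
        raise LookupError(package)
    return max(candidates)  # highest version wins, wherever it came from

# An internal package squatted on the public index with a huge version.
indexes = {
    "internal": {"fb-utils": [(1, 0, 0)]},
    "pypi": {"fb-utils": [(99, 0, 0)]},  # attacker-registered squat
}
```

Running `pick("fb-utils", indexes)` selects the attacker's copy from the public index, which is the confusion being described.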
A
If
you're
not
careful
about
it,
you
could
end
up
trusting
the
wrong
repository
by
accident
for
a
package
you're.
Looking
for
the
the
canonical
example
is
when,
when
the
original
author
wrote
about
dependency,
confusion
is
that
if
you
have
an
internal
package,
for
example,
let's
say
you
work
at
meta,
not
not
picking
on
them
in
particular,
but
let's
say
you're
working
at
meta
and
your
private
packages,
your
private
python
packages
start
with
FB
hyphen,
whatever
star,
let's
say
so.
A
All
your
private
packages
starts
with
FB,
star,
FB,
hyphen,
whatever
let's
say:
utils,
FB,
FB,
hyphen,
noodles
and
and
attacker.
Who
knows
this
could
reserve
the
same
same
package
on
on
same
projects
on
Pi
Pi,
and
if
pip
is
not
careful
about,
it
would
install
the
one
from
Pi
Pi,
not
your
private
company
server.
A
So
my
purpose
of
solving
this
and
I'm
not
saying
this
the
best
one,
but
it
was
the
idea
of
using
something
that
we
call
the
map
file.
It's
an
idea
from
Duff
we're
not
planning
to
use
stuff
here,
but
we're
planning
to
use
this
idea
from
Duff.
The
idea
is
to
map
packages
to
their
respective
repositories.
A
So
here,
for
example,
how
we
could
have
avoided
the
python
situation
is
that
we
would
say:
look
if
you're
looking
for
any
package
that
looks
like
it
starts
with
torch,
anything
that
looks
remotely
like
torch,
whatever
Dodge,
Trident
or
Doodles,
whatever
you
would
trust
it
only
from
the
pie,
torch,
Repository
and,
more
importantly,
you
would
terminate
the
search
it's
it's
it's
it's
listed
in
priority
of
indices.
You
should
look
for
so.
A
The
first
thing
you
would
look
for
is
in
by
torch
if
you're
looking
for
torch
torch
package-
and
you
would
more
importantly,
here's
the
thing-
here's
the
subtlety-
you
want
to
say
that,
even
if
you
don't
find
torch
Triton
on
pytorch,
you
want
to
stop
your
search.
You
don't
want
to
backtrack.
You
don't
want
to
make
the
mistake
of
backtracking
and
looking
for
a
look
for
it
on
pipei,
just
because
it's
not
on
your
private
server,
so
you
might
want
that
safety
guarantee
in
some
use
cases
not
all
anyway.
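A minimal version of the map-file idea, patterned loosely on TUF's TAP 4 map file but with made-up field names, pins name patterns to a repository and, crucially, terminates the search there:

```python
import fnmatch

# Hypothetical map file in the spirit of TUF TAP 4: glob patterns map to
# repositories, and a terminating rule stops any fallback search.
MAP = [
    {"paths": ["torch*"], "repositories": ["pytorch-nightly"], "terminating": True},
    {"paths": ["*"], "repositories": ["pypi"], "terminating": True},
]

def repositories_for(package, mapping=MAP):
    """Return the only repositories allowed to serve `package`.

    Because the torch* rule is terminating, a torchtriton that is missing
    from the PyTorch index must fail rather than fall back to PyPI; that
    is the safety guarantee being described.
    """
    for rule in mapping:
        if any(fnmatch.fnmatch(package, pat) for pat in rule["paths"]):
            return rule["repositories"]
    return []
```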
A
I
think
you
get
the
idea.
This
is
basically
a
nice
way
to
to
to
to
segregate
packages
and
and
into
different
repositories.
So
you
don't
run
into
this
confusion.
Anyway,
I
started
this
whole
discussion
and
the
too
long
didn't
read
is
that
it's
not
that
straightforward
I
lend
a
lot
of
lessons.
So
this
was
nice.
I
learned
a
lot
here
and
I
like
to
share
his
findings
with
you.
A
Basically,
what
I've
learned
and
I'm
gonna
paste
some
links
here?
Yes,
I've
I've
pasted
the
Biden
discussion
in
a
big
issue.
What
I've
learned
so
far
are
things
like
this?
The
first
thing
is
that
it's
it's
not
the
easiest
thing
in
the
world
to
change
a
package
and
especially
an
open
source
package
manager
for
very
good
reasons.
A
I'd
say
the
inertia
is
high,
but
for
very
good
reasons,
reasons
I
won't
get
into
so
sorry.
Let
me
there
we
go
so.
A
Oh
great,
okay,
thanks
thanks,
I
appreciate
it.
So
what
are
some
things
you
can
do
here?
One
is
do
nothing
the
status
quo,
which
I
think
everyone
would
agree
is
like
not
great
The
Other
Extreme
is
to
use
something
like
the
map
file,
but
the
complexity
is
high.
You
do
have
to
change
your
package
manager.
People
have
to
learn
like
this
new
thing
understand
the
semantics.
It's
not
gonna
happen
overnight.
Basically
forget
about
it.
A
Sorry-
and
actually
this
is
subtlety
here-
that
took
a
lot
of
argument
back
and
forth
to
try
to
find
out,
but
I'm
I'm
glad
that
that
we
all
finally
agree
It's
Tricky
when
you
have
something
like
the
mapping
file,
when
you
have
multiple
okay,
let's
say
your
package
manager
does
dependency
resolution,
which
almost
all
modern
package
managers
do
and
it
backtracks.
So
basically,
what
happens
is
you
have
to
explore
different
sets
different
choices
for
packages,
namely
different
versions,
to
see
whether
they're
compatible
with
each
other?
A
And
if
you
think
about
it,
if
you
add
mapping,
if
a
package,
if
a
project
can
come
from
multiple
indices,
you've
increased
the
complexity
of
your
back
of
your
dependency
resolution
here.
So
basically,
you
need
to
change
your
dependency
resolver
to
to
consider
choices
of
repositories,
which
adds
more
complexity.
A
So
there's
that
that's
something
to
keep
in
mind
of
okay,
the
the
third
option,
I,
would
say,
and
Donald
stuffed
came
up
with
it.
If
you,
if
you
look
at
a
GitHub
issue
here,
is
simply
erroring
out
and
and
Dan
Lorraine
from
chain
guard
had
actually
implemented
a
very
simple
prototype.
But
what
a
weekend
to
do
this?
It's
it's
quite
simple.
To
do
what
you
do
is
in
PIP,
if
it
sees
that
okay,
so
pip
the
way
it
works
right
now,
I
had
a
lot
of
fun
digging
into
the
code.
A
If
you
have
multiple
indices,
multiple
repositories,
it
treats
all
of
them
with
equal
priority.
So
if
it
comes
from
any
one
of
them,
it
works
so
now
to
avoid
the
torch
Triton
situation.
What
you
can
do
is,
if
you
see
a
project,
come
from
multiple
indices.
You
just
failed.
You
just
error
out
and
say:
I
refuse
to
handle
this.
This
might
lead
to
a
dependency
confusion,
attack
yeah.
So
that's
one
thing
that
you
could
do
to
start
with
right
now
and
Donald's
stuff
has
done
a
great
job
of
yeah.
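The error-out option is simple to express: before resolving, check whether any requested project is served by more than one configured index, and refuse to proceed if so. A sketch, not pip's actual implementation:

```python
def check_ambiguity(packages, indexes):
    """Raise if any package is available from more than one index,
    since that is exactly the setup dependency confusion exploits."""
    for package in packages:
        serving = [name for name, contents in indexes.items() if package in contents]
        if len(serving) > 1:
            raise RuntimeError(
                f"{package} is served by multiple indexes {serving}; "
                "refusing to guess (possible dependency confusion)"
            )
```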
A
I'm gonna link it here. This is the... where's the link... it's a long discussion, as you can see. So basically the Python community seems to be going towards this. It's a layered approach; it's not any one thing that they're planning to do right now, and I can't say I fully understand it yet, but it's a layered approach.
A
You do this first, and then the second thing, and then the third thing. So I highly recommend all of you read it, because I suspect PyPI is not the only repository that's concerned about dependency confusion; npm or RubyGems might be interested too. So I highly recommend that you all take a look at this and maybe take a look at some of the lessons that we've learned here.
H
Yeah, thanks. I think it will be best to share my screen to explain, since in RubyGems we already took some steps, some of which you already mentioned in your talk.
H
This is the Gemfile; like package.json or requirements.txt, it's the similar thing for RubyGems. We specify the top-level source, which is the public repository we maintain, rubygems.org. Then we specify individual dependencies, and you can also specify a separate block with an additional source, like a secondary source, and specify within this block which dependencies must be coming from it; these are packages on GitHub, for example. And if you do this and let Bundler resolve your Gemfile, there are two sections in the lockfile, one per source.
H
So
if
anyone
will
Acclaim
this
name
of
a
public
jam,
it
will
not
be
considered
during
Rich
solvent
candidate
for
this
one,
because
it's
coming
from
different
Source,
then
explicitly
you
you
mentioned
in
in
gem
file
before
this
is
actually
changed,
which
happened
like
a
few
months
ago,
after
reaction
to
the
dependency
chain
attacks
before
it
was
common
to
make
two
top
level
sources,
and
then
the
problem
was,
as
you
mentioned,
they
had
the
same
priority.
H
So
if
I
do
this
today,
try
to
do
bundle,
I
will
get
a
big
warning
and
on
the
top,
just
cancel
it
so
I
print
this
out
and
began
expected.
Oh
let's
do
it
again.
So
if
you
do
this
get
wrong,
we
are
trying
to
to
keep
it
possible
to
make
it
for
easy,
easy
migration.
But
it's
explicitly
mentioned
right.
You
have
two
Global
Sources.
You
have
no
idea
which
one
should
be
there
should
be
selected
and
it
may
result
into
security
risk
of
installing
unexpected
gems.
H
Those gems should be found only in this alternative source, and for the rest, that source is actually ignored, right. So that's, sorry, that's how we tackle this in RubyGems these days, and once this happened, I think it solved the majority of the problems we were aware of. In short, we are explicitly saying these dependencies belong to this source, and during resolution we are not looking at any other source to find them.
I
So one of the things that pops into mind immediately: I know that a lot of corporate organizations set up something like JFrog Artifactory, and they usually mirror a bunch of dependency servers.
I
Now, I haven't used a corporate mirror in a long time, but my experience with JFrog, for example, is that a lot of times people will set that one mirror up and mirror a lot of different repositories, so it's not usually the dependency resolver that has the problem. It's usually the mirror, because the mirror is mirroring in such a way that it's referring both to the public repositories and to internal components that are being served from that one URL. I'm not saying that this is something you can solve; I'm just saying that's been my experience as a corporate user, that that's how people end up setting up their artifact resolution, and it's probably something that needs to get changed as well. But this seems like a good approach.
I
Yeah, I need to switch to my laptop. I don't know if anybody's shown the Gradle way of doing this, but I'll stop for a second so that other people can jump in and chat.
A
Oops, sorry, there we go. Great, thanks, I'm glad we're having this discussion. Joseph, that looks quite close to, I dare say, mapping. So it's very interesting. Do you have documentation or a blog post that we could take a look at?
A
Thanks, appreciate it. Yeah, sorry, just to add to that: exactly right, I just added it to the notes. There are a few other solutions that came up as well. One was: why not use lock files? I think Dustin actually brought this up, and Zach has noted some of the downsides there.
A
Most notably, I would say, one of the downsides is that it's not going to help you when you're first building the lock file itself: if you have multiple repositories, you still run into the priority problem, so we have a bootstrapping problem. The second approach that came up quite a lot is: why not just use a network proxy, like an index proxy, that you can use to mediate all your requests to the public service?
A
There's a subtlety there about backtracking dependency resolution that I don't want to get into right now, but it's not going to work for that reason. I think Jonathan has a remark.
I
Yes, I just want to give a quick demo, now that I'm on my PC, of how Gradle handles this. So, Gradle traditionally...
I
Oh, all right, good call. Nope, not that way. Oh, for the love of... there we go. Okay, so in Gradle build files you'll see this repositories block, declaring the repositories you depend on: mavenCentral, google. You can also declare multiple repositories, where you're depending on multiple different artifact servers. Your repositories block is independent of your dependencies block. The thing about the Java ecosystem is that very heavily ingrained in it is a namespace: artifacts have a group, name, and version. Groups are usually reverse-DNS paths, and you traditionally have had to own the DNS record to publish to Maven Central, to be able to prove that you own the domain for that group. And then filtering was implemented as an additional feature.
I
It's repository content filtering. So you can say, for mavenCentral: only include the group my.company, or include by regular expression, or exclude. You know, for Maven Central: exclude by regular expression my.company. And then you can also say exclusiveContent: artifacts matching the filter come only from this repository, and no other repository will be searched for them. So, Maven Central, and then exclusiveContent for a repository with a filter saying "only do this". This will prevent Maven Central from being searched for this group at all, even if it's declared first. And then, yeah, you can also do release-only, snapshot-only, stuff like that. If you're curious about this, I can drop the link into the chat and, I don't know, into the doc, but yeah.
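A sketch of what that looks like in a Gradle build script (Groovy DSL); the internal repository URL and the my.company group are placeholders:

```groovy
repositories {
    // Plain content filtering: never serve my.company artifacts from
    // Maven Central, even though it is declared first.
    mavenCentral {
        content {
            excludeGroupByRegex "my\\.company.*"
        }
    }

    // exclusiveContent: artifacts matching the filter may ONLY come
    // from this repository; no other repository is searched for them.
    exclusiveContent {
        forRepository {
            maven {
                url = uri("https://repo.example.com/internal")
            }
        }
        filter {
            includeGroup "my.company"
        }
    }
}
```

The two mechanisms compose: the exclude on mavenCentral is a belt-and-suspenders measure, while exclusiveContent is the stronger guarantee, since it makes the internal repository the only possible origin for the filtered group.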
F
Yeah, I think I dropped it in the doc. But yeah, that's quite interesting. Okay, so here's where I'm at with this question: I think one very useful thing that we could do as this group would be to basically, rather than having each package manager or repository solve this problem from scratch (because this is kind of a concern for both)...
F
To start, I wonder if we could systematize this knowledge somehow. Yeah, I think, you know, we could do a compare and contrast of how this works in a bunch of ecosystems, pros and cons, and maybe we'd even feel strongly enough to come up with a recommendation: if you were able to do this from scratch...
F
...what would you want to do here? Including, possibly, Trishank, I think the baby step that you highlighted is really nice, where it says: okay, there's a nice heuristic that, sure, doesn't handle every edge case exactly how you'd want, but in cases where there's ambiguity we can just freak out and try to stop that. I've learned better than to volunteer to write this myself, but I can file an issue, and if anyone has interest in trying to pull that together, I can assign it out to you. I'm also open to other suggestions for things that we could do in this group to have influence here and help solve this problem.
F
Sweet. And it would be great, if you do that, to follow up in the Slack channel too, because I'm sure plenty of folks would love to contribute, and anyone who's not here would be interested to hear about this as well. Cool, all right. That's very interesting; a useful, thought-provoking discussion to have, I think. Anything else on this? Or have we solved it once and for all, or at least made a plan to solve it once and for all?
F
Going once, going twice: okay, cool. I don't see Mike Lieberman on this call. I, unfortunately, wasn't able to make it last week. Was anyone able to make the APAC meeting a couple weeks back, who listened to the SLSA distribution and discovery item and feels like they want to summarize that for us? Otherwise we can wait until he can make it, or he briefs someone.
H
I just filled in the links to the document and the deck. Also, there was one additional topic quickly discussed: the proxying of gems, or packages in general. I also left a link for one of the gem proxies, showing how they applied the same logic at the proxy level, so it works equally: it prefers private over the public sources. There's a link in the document, so check its README if you would like to get more details on this.
F
Cool, awesome, thanks. Yeah, thanks for dropping that link in. I will just make sure it's in the calendar invite, but I can also make sure everyone in the chat has these links too. Okay, so then I'm going to table this as not discussed, because of insufficient... okay.
F
...and see if we can't... I can try to make sure either he's here next time or I get a briefing from him and can present this on his behalf. One final point of administrivia: I don't know that there's a super formal process for this, but I am not currently listed as a maintainer of this working group, and I would like to add myself.
F
This has made things a little awkward when I go to the TAC and they're like, "who are you?", and I'm trying to present things on behalf of this group. So I guess I just wanted to float that here before I made any moves. I will go through the trouble of, I don't know, making a PR to the repo and getting approval from the existing maintainers. We don't have a super formal process for amending that, but I figure, if we get consensus of everyone who's in charge...
F
...no one can fight that. But if you are terribly offended by that, I wanted to give you some notice, so you can, I don't know, start your smear campaigns.
F
All right then, I am happy to wrap up a couple minutes early, not least because I am slightly underdressed for the weather. But thanks for showing up; as usual, a fun discussion today. See you all in two or four weeks, depending on time zones, and in between now and then on Slack. Any other business before we leave?