From YouTube: IETF113-PPM-20220325-1130
Description
PPM meeting session at IETF113
2022/03/25 1130
https://datatracker.ietf.org/meeting/113/proceedings/
A
Well, I don't have any advice on that.
A
It's start time, and I think nobody will feel slighted if they miss the first chair slide, so let's get started. Hi, my name is Ben Schwartz. Together with Sam Weiler, we are the chairs of Privacy Preserving Measurement, and this is the very first meeting of this brand new IETF working group. Thank you to Joe Salowey for being our surrogate chair in Vienna.

If you have any questions in the room, please take them to Joe. Sam and I are both on East Coast time in the United States, so if we say anything that seems a little... Okay, how's it going, Joe? Are we ready to go in the room?
A
Okay, well, as long as it's not too distracting in the room, maybe we can keep moving.

It's always relaxing to be the last session at IETF, since most people have had a chance to really look at this quite a few times this week, but just in case you haven't: this is legally binding, so please understand what's going on here and abide by these rules when participating at the IETF.

This is a hybrid meeting. If you're in the room, you probably know that you need to check in with your mobile device in order to be counted as attending this session; please do that.

So that's all we've got for this meeting. If anybody wants to raise questions about this agenda, please speak now; otherwise we'll hand it over to Eric Rescorla.
D
I am talking next, obviously, but I did have a question about the agenda, which is that we probably need to talk about adoption of the draft, because of that charter item. So that should go at the end of this, or, you know, at the beginning of this, along with the recent changes, et cetera.
A
Okay, I think we will note that: we will definitely raise the question of draft adoption after the presentations during this session. Thanks, and now it's yours; how do you want to handle slides?
D
I think I just press this button, which I can't find... then I will be good. Oh, now you have to press a button, yep.
A
Oh, I see, I need to stop sharing before you can start sharing. Or...
D
All right, you're right. So, for those of you who attended the BoF: this is actually a nearly identical presentation, so my apologies, not for giving it, but for the fact that you had to wake up 20 minutes earlier than you otherwise would have. If you've already seen this, you can feel free to go get a coffee or something, because really it's just a subset.

Apologies to the STAR people; I didn't actually cover STAR in this, but they're going to, so that'll be fine. And then I'll talk a little about MPC, the specific technologies, which are the ones that are larger than the scope of this working group. Again, this is mostly stuff in draft-gpew-priv-ppm.
D
I'm expecting Chris to cover a lot of the detail; this is just to give you some of the background. And I think we'll start there: there are a large number of situations where it's desirable to learn information about people. This is not only true on the internet but in life generally, in various forms of public research; the census is a good example.

You wish to learn information which is quite sensitive, like demographics, income, medical issues, those kinds of things. Having a sense, for instance, of how many people have had or currently have COVID is obviously relevant to the COVID response, as an example. Then there are commercial reasons, like product development: if you make a product, you'd like to know which features people use and don't use, so you can make them better or take them out.

You'd like to understand where your product is failing to work, and this is obviously very important for consumer products that people use and that go wrong. So if your browser doesn't work on a website, it's a pain to have customers have to report it to you; you'd like to know it didn't work, and you'd like to measure where it didn't work, so you know which failures are most important to address.

This is a big issue for Firefox, by the way. And then there are various behavioral measurements: how are people using your product? So, for example, discovering websites you didn't previously know about (people are now going to this website a lot; maybe it should be, I don't know, in the search index) and which information people are most interested in, again.

You know, maybe there's something that should be in a search index, or that you'd like to tell them about proactively. So, measurement is all over the place.
D
So this information is obviously super useful, but the problem is that it's also, in many cases, very sensitive. Take my example from earlier: having it known that someone had COVID, or has some other medical issue, is information they may not wish to be known. Income, sexual orientation, gender identity: these are things that people would like to know in aggregate, but that individually people do not necessarily want to disclose.

It turns out that even much less sensitive data, seemingly innocuous data, can be very, very revealing, especially when you take a lot of "less sensitive" data together. I've got this in scare quotes for a reason: it's not emphasis, it's reminding you that "less sensitive" is a matter of judgment when you put lots of data together from a lot of people. And so there's this famous incident where Target looked at somebody's shopping history and figured out that this girl was pregnant, and I've heard other reports from people of the same thing:

they'll have just been through some life change, and they'll suddenly start getting advertising that matches that life change. This concept is the foundation behind modern web advertising: that you should take a bunch of small pieces of information, glue them together, and use them for targeting. So it is definitely the case that collecting a lot of information, even if it's not normally sensitive, can be a real problem.
D
So there's this tension between information gathering, the power of the information gathering you'd like to have, and the privacy problems that it creates. The good news, however, is that the tension is created by technology; it's not inherent, because of what you mostly want to gather. I want to caveat that: that's not always the case (for web advertising, for tracking, it's super complicated), but for measurement you typically want to measure aggregates.

So you want to measure, say, the distribution of people's income. I don't care about anybody's individual income, just the distribution of income, or the distribution of household expenses. You want to know what fraction of people have had COVID; you don't want to know whether anybody individually had COVID, in general. Obviously there are cases, like the people in Indiana, where perhaps you might want to know whether somebody individually has COVID, but from my comfortable position here in my room, I don't care; I don't need to know which individuals have COVID. Then there are more complicated aggregates: you might want to measure the relationship between income and height, which, by the way, there is one; taller people, on average, have higher incomes.
D
Or, what are the most popular websites? Again, I don't need to know which websites someone individually went to; all I need to know is which ones are most popular. It's often necessary, however, not just to have gross aggregates like that: people slice the data in multiple ways. So you say, well, I want to know the distribution of people's income in California, or I want to take two arbitrary variables and compare them. I gave this income-and-height-distribution example, but you might want to say, well, COVID rates sliced by age, or by income, or by demographics, or other things. So again, the key point is that the individual values aren't necessary to do this work; what's necessary is to be able to work on the data in various ways.

As long as you can work on the data, the individual values are actually not particularly helpful, and, of course, they're a privacy problem. As someone who's done quite a bit of this work, I can tell you that one actually very rarely looks at data values individually once you get past even modest sizes, because it's simply not practical: you have hundreds of thousands of data points, and one data point doesn't help you very much. The only times you really do look at them...
D
...is when there's a giant outlier or something, and you're like: why is there one person who appears to have visited 50 million websites? That's obviously probably not a person. So, there are a variety of different measurement types that you'd like to collect. There are the simple statistical metrics and aggregates, like mean, median, sum, histograms, those kinds of things; the typical things you'd learn in a Stats 101 class. And then there are things like relationships between multiple values. These are still Stats 101 kinds of things: correlation coefficients, ordinary least squares, blah blah. This can, of course, get very complicated, up to machine learning algorithms; nothing we're talking about today will really let you do deep learning, unfortunately. And then there's a very specific task, but actually quite a common one, which is common strings, often called heavy hitters. The problem here is to say which strings are most common across people, and that can be a lot of different things.
D
A common example that Google gave in their RAPPOR paper is: what home pages are most common on people's machines? You don't want to know what home page someone has if they're the only person who has it, but if half the population has it, you'd like to know that. And the nice thing about all these values, just to pull back, is that once again they don't actually depend on the individual values; they only depend on knowing the desired aggregate. This heavy hitters problem obviously has the property that you only want the most common ones, not the individual ones, and that helps, by the way, mitigate the privacy problem with these heavy hitters. Because if someone has as their home page some Google Doc that they accidentally set as their home page (which happens), some Google Doc which is sensitive, well, you don't want to learn that, necessarily.

So one example use case is user interests: what kinds of sites are users visiting? And again, I don't need to know, and typically no one needs to know, exactly which site it is.
D
But you say, okay, bucket sites by topic (so, how many people are on medical sites), so you have a bucket per topic, and a number of visits and minutes people spent on each topic. But again, even if you're not measuring individual sites, the topics themselves can be sensitive. If someone is visiting a website associated with a particular medical condition, you might worry that they, or someone in their family, has that condition, and you don't want to have to collect that information. So one problem statement is: the distribution of time spent on each type of site. It's obviously possible to generalize this kind of problem to any time you have categorical information from users and you'd like to collect counts per category, so it's not just sites, necessarily; there are lots of other things as well. Okay, one thing you're going to hear throughout this presentation is that these sort of stylized problems can often be generalized: a lot of different kinds of measurement can fit these kinds of stylized problems.
D
Another use case, which we see very often, is this web compatibility problem. The web is really big, and websites sometimes will not work on a given browser: very often the website works fine on Chrome and doesn't work on Firefox, or it works fine on Firefox release builds and doesn't work on Firefox Beta. Ideally users will report these problems, but often they do not, and so even fairly gross breakage problems often don't get reported; and even if they do get reported, the latency of the system means that a lot of people experience the problem before you've had any chance to fix it.

This is a really quite large problem for browsers like Safari and Firefox, where sometimes web developers don't test on the product. The good news is that often you can detect breakage on the client directly: either directly (an API fails, or the site tries to use an API which doesn't exist) or because the user does what they call rage clicking, trying to reload the page over and over and over again in the hope that it will fix itself, which sometimes it does, by the way.
D
A very specific case of this (it isn't quite web compatibility, but it's a similar kind of problem) is that many websites do this thing called fingerprinting, where they measure persistent properties to create a browser fingerprint. This is an obvious threat to privacy; it's an alternative to cookies, and we see it being used sometimes when cookies aren't available, or when people don't trust cookies, or something. This also is often detectable on the client, because you can see uses of APIs that don't make any sense. An example would be: the site uses the WebRTC APIs, but then never actually makes a peer connection. So it's like, why are you collecting the IP address but not making a peer connection?

Now, it's very hard to learn about these issues at mass scale, even though modern browsers have what's called telemetry, which is to say they report data back to the manufacturer. They do it only with basically non-sensitive data, because for obvious reasons we wish to preserve the user's privacy, and we don't wish to learn which sites they're actually on. So this is the problem statement here: collect the sites where the client sees issues, but do it in a way where... actually, I've written the problem statement wrong here. The problem statement is to collect the sites on which clients, in aggregate, are seeing issues.
D
I don't care which sites you individually are seeing issues on, for the reasons indicated previously. So, pulling back: there are multiple kinds of privacy problems. One privacy problem is collecting sensitive data which is directly tied to identifying information. For instance, a concrete example: if you have a program which reports back what website everybody is going to, even if it's not explicitly tied to them, even though there's no email address in there, the IP address is enough to see who it is, right? So that's the first privacy problem: gathering sensitive data with user-identifying information directly attached to it in some way.

That's problem one. Problem two is, even if you don't have an identifier, collecting sensitive data along with some non-sensitive-appearing identifying information. As an example, Latanya Sweeney pointed out that if I just have your zip code, your gender, and your date of birth, that's enough to identify something like 87% of people in the United States. So a good example here: people's income individually isn't problematic, and their birthday isn't, and their zip code and initials aren't, on their own; but when you glue them all together, now I actually have everybody's (or not everybody's, but a lot of people's) income, individually. So we have to fix both of these problems or we're not going to be out of the woods.
D
Okay, so the sort of natural thing everybody thinks to do at this point is to say: well, what if we just collected this information without any identifiers?

Practically speaking, there's the sort of dumb way to do this, where you say: well, we'll strip the identifiers on our side, and we just promise we don't misuse them. People do that, and it's better than nothing. But technical controls are better than policy controls, and so the better thing to do is to strip identifiers on the client side: on a web browser, the client will strip out cookies and email addresses and stuff like that. That still leaves you with a lot of networking metadata, and you strip that out with a proxy: you've got some proxy which is not associated with, or at least not colluding with, the data collector, and you encrypt the report to the data collector, so the proxy can't see the data; then the proxy strips off the network metadata, the IP address, so the collector doesn't see the identifying information. The idea here is that the data is never concurrently identified and available: at any point it's either encrypted or de-identified. There are multiple technical ways to do this.
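A minimal sketch of the pattern just described, in Python. Everything here is illustrative: the field names are made up, and encrypt_to_collector() is a placeholder for a real public-key sealing operation (something HPKE-like), not any particular library's API. The point is only who can see what: the proxy sees the client's network identity but not the payload, and the collector sees the payload but not the network identity.

```python
from dataclasses import dataclass

def encrypt_to_collector(collector_public_key: bytes, payload: bytes) -> bytes:
    # Placeholder: a real deployment would seal `payload` to the collector's
    # public key so the proxy cannot read it. Here we just tag the bytes.
    return b"sealed:" + payload

@dataclass
class ProxiedSubmission:
    client_ip: str     # visible to the proxy only
    ciphertext: bytes  # readable by the collector only

def client_submit(collector_public_key: bytes, report: bytes, client_ip: str) -> ProxiedSubmission:
    # The client has already stripped its own identifiers from `report`;
    # it then encrypts so the proxy never sees the measurement.
    return ProxiedSubmission(client_ip=client_ip,
                             ciphertext=encrypt_to_collector(collector_public_key, report))

def proxy_forward(sub: ProxiedSubmission) -> bytes:
    # The proxy drops the network metadata (the IP address) and forwards only
    # the opaque ciphertext, so the collector never learns who sent it.
    return sub.ciphertext
```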
D
There are connection-level proxies (you know, IPsec, CONNECT, MASQUE) and there are application proxies like OHAI. So this is a great technology, and a technology which is very valuable in a number of cases. Unfortunately, it is imperfect. Good use cases here are when you have semi-sensitive data and you want to boost the privacy. As I mentioned, browsers do telemetry now, but they just throw away the IP address, hopefully, at the other end, which we try to do, but you don't know it's happening. Also: individual values where you don't need to dig into them (I'll get to this in a minute), and freeform data like JSON blobs, which is hard to compute on cryptographically. Also, by the way, anything where you need an answer back, because the multi-party computation stuff I'm going to talk about in a minute doesn't work for that; it's one-way only. So things like DNS requests and Safe Browsing queries and stuff like that do well with proxies. And do I have the bad use cases here? Yeah, good, bad use cases, fantastic.
D
So there are a number of cases where this is really useful. Unfortunately, there are cases where it falls down. One place it falls down is if you have what's called high-dimensionality data: say I have data with a lot of variables, and I want to look at the relationships between them, going back to my income-and-height example, for instance. It's also the case when you want subgroups (say, look, I want to look at only people with this nationality) or if you want to do correlation, regression, any kind of statistical processing. The reason is that, as I was saying earlier, the more you glue together these individually-low-sensitivity but high-dimensionality data sets, the more you can identify people. There's the example I gave of, you know, birthdays, whatever: it's perfectly natural to want to ask what the correlation is between zip code and income, and it's perfectly natural to ask what the relationship is between birthday and income, but if I glue all those things together, suddenly I have identifying information. There's this other famous example, of the Netflix data set, where Narayanan demonstrated that you could look at very small amounts of information about people's Netflix viewing histories and figure out who they were.
D
So the problem is that if you're just going to blanket-anonymize, you need to break the data apart; you need to take each value and send it separately, unlinkably. But then, when you do that, you can't do any of this kind of analysis.

It's also not very good for heavy hitters, because if you want to know only the top N values, then in order to actually figure out which ones are the top N, you have them all reported. So you say: look, what I'd like is to not see the Google Doc somebody has as their home page, but actually what happens is they all get sent to the server and you stack-rank them. So some technique for fixing that is really helpful. And, not to foreshadow this too much, that's one thing that STAR does (that's coming later): it tries to collect the data, but not see the values you don't care about. That's something that's in scope for the GPEW draft, for PPM, as well.
D
So the good news is that this is a situation where cryptography can help us. There's been quite a bit of work on how to address this situation in the past ten years, and we have cryptographic mechanisms to apply to this problem. The basic technology here is called MPC, multi-party computation, and the idea is that you have two servers, and the servers are non-colluding: what that means is that they interact with each other, but they're not working together to reveal your information.

The basic idea is that each client takes its data and splits it up between the two servers: it sends one piece to server A and one piece to server B, and the servers take all the data from all the clients and individually aggregate it. But the data is still encrypted, so what they're doing is a kind of homomorphic encryption: they're computing an aggregate over the encrypted data, and then they take their aggregate shares and bring them together, and the shares can be reconstructed to produce the actual value. So, just to think about where that leaves us: the servers individually know who the clients are, but they don't see the clients' individual values; and the collector gets to see the aggregates, but never gets to see any individual value associated with anybody's data.
D
What's really important to the trust model here: the client's requirement is that the two servers don't collude. If the servers collude, they can compute individual values, and it's totally game over. And the servers, in order to guarantee that they don't accidentally "collude", enforce various invariants like minimum batch sizes and query limits.

This is the hard privacy requirement for the client side. The collector's requirement is different: the collector doesn't care if the servers collude, but it does care that the servers execute the protocol correctly, and either server can distort the results. So, just to recap: for the client to be unsafe, both servers have to cheat; for the collector to be unsafe, only one server has to cheat. But from the privacy perspective, that's okay, because one server cheating can't actually break the privacy invariants. This is, by the way, difficult: both these properties are really hard to verify, especially collusion, which can happen via side channels, but point-in-time audits are sort of the state of the art. And again (sorry, I'm also losing my train of thought; it's five in the morning for me), this is not necessarily straightforward to verify, but generally the idea is that you can keep piling on more servers, and at some point the clients are satisfied.
D
Sorry, the first one of these that's really viable is Prio, and it's useful for computing simple numeric aggregates. The basic idea here (and sorry, there's a little bit of math) is that each client has an individual value x_i; say it's my height or my income, something like that, and I want to get the aggregate over those values.

This is all high-school math, which is very nice. Actually, it's elementary-school math. I generate some random value r_i in a finite field of size p, and I send server one basically my value minus r_i, modulo p (again, that's the elementary-school-math part of it), and I send server two r_i. You should be able to convince yourself pretty easily that if I take x_i minus r_i and I add r_i, then I get x_i back, so this is information-theoretically fine. Each server adds up the shares: server one adds up all of the (x_i minus r_i) values, and server two adds up all of the r_i values, right. Now, if you take those two sums and add them up (congratulations, addition is commutative), the r_i's and the minus-r_i's basically cancel out, and you get the sum of the x_i's. So what we've done is create a situation where neither server sees all the data, but you have the aggregate anyway.
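A concrete illustration of the share arithmetic just described, as a minimal Python sketch. This is a toy under obvious assumptions (one modulus, no proofs, no network), not the Prio wire format; it only shows that the random r_i values cancel when the two servers' sums are combined.

```python
import secrets

P = 2**61 - 1  # a prime; the real protocol fixes a specific finite field

def share(x: int) -> tuple[int, int]:
    """Split x into two additive shares mod P, so x = s1 + s2 (mod P)."""
    r = secrets.randbelow(P)
    return (x - r) % P, r          # share for server 1, share for server 2

def aggregate(values: list[int]) -> int:
    server1_sum, server2_sum = 0, 0
    for x in values:
        s1, s2 = share(x)
        server1_sum = (server1_sum + s1) % P   # server 1 only ever sees s1
        server2_sum = (server2_sum + s2) % P   # server 2 only ever sees s2
    # The collector combines the two sums; all the r values cancel out.
    return (server1_sum + server2_sum) % P

heights_cm = [175, 168, 182, 190]
assert aggregate(heights_cm) == sum(heights_cm) % P
```

In the real protocol the shares additionally carry validity proofs, which is the "not elementary school" part discussed next.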
D
As I say, this is also elementary-school math. The not-elementary-school part of this is what happens if the clients lie, and so there's a zero-knowledge proof indicating that the client didn't claim to be, you know, some absurd number of meters tall.

The good news is that even this really quite dumb encoding, this very limited system, can compute an enormous number of things. Just to give you a flavor: the arithmetic mean is easy to compute; once you have the sum, you divide the sum by the cardinality, that's obvious. You compute products by working in log space instead of linear space; this, by the way, is how slide rules work. If you remember slide rules: you can compute a geometric mean from the product the same way as the arithmetic mean; you can compute variance and standard deviation the same way; you can do boolean OR and AND, and min and max, and even ordinary least squares. It's just about finding the right encoding for the data. That's really, really nice, because you've got the same basic structure, and you just basically say: oh, now I'm encoding the data in a new way, and I can give you new things, right.

So, as I was saying, we have this problem with bogus data. There are two kinds of bogus data, one of which is plausible but false data. So I say: look, I'm really 175 centimeters tall, but I state 180 centimeters. This is a problem any surveying technique has.
D
People lie, and if you're going to trust them, you're going to trust them. The solution to this, as with any surveying technique, is that you live with some noisy data and you hope that the lying is unbiased (which, by the way, for height it probably isn't; people probably say they're taller than they are). And then there's the question of completely ridiculous data: say I claim I'm a kilometer tall.

Well, I'm not a kilometer tall, nobody is a kilometer tall, or worse, maybe they say they're negative one kilometer tall, and if you can't see the individual values, what do you do? In a standard system, what you do is take the data that comes in, look at it, and say: well, I'm not going to accept anybody who says they're a kilometer tall; I'll just throw it out. But obviously in Prio you can't do this; the data is encrypted.

So the solution to this, and this is the fancy part, is that each submission comes with a zero-knowledge proof of validity. In advance you say: I'm only going to accept people who say they're between 100 centimeters tall and 200 centimeters tall. Maybe there's someone who's more than 200 centimeters tall, which I think there is, but we'll just say, look, we'll call them 200 centimeters and it'll be okay, and similarly at the bottom of the range.
D
The zero-knowledge proofs just prove that the value is correctly encoded, with math that I will not attempt to explain, but perhaps Chris Patton can. The servers work together to validate the proof, because individually they shouldn't be able to learn anything real about the data, and you only aggregate submissions with valid proofs. So there's basically a filtering stage that I haven't shown.
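The proof math itself is out of scope here, but the filtering stage is easy to picture with a toy sketch. Note that jointly_verify_proof() below is a made-up stand-in for the real interactive check the two servers run together; in the actual protocol no single server can range-check a plaintext value the way this toy does.

```python
def jointly_verify_proof(value_cm: int, proof: object) -> bool:
    # Stand-in for the cooperative zero-knowledge validity check; here it just
    # range-checks the plaintext ("between 100 cm and 200 cm tall").
    return 100 <= value_cm <= 200

def filter_and_sum(submissions: list[tuple[int, object]]) -> tuple[int, int]:
    total, accepted = 0, 0
    for value, proof in submissions:
        if jointly_verify_proof(value, proof):   # the filtering stage
            total += value
            accepted += 1
    return total, accepted

total, n = filter_and_sum([(175, None), (180, None), (1_000_000, None)])
# The kilometer-tall submission is rejected; only the two plausible ones count.
```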
D
So you can also collect, you know, user interests with this technology. Every user gets a bucket (sorry, every user interest gets a bucket), so if I have 100 user interests, there are 100 buckets, and the user reports the amount of time spent in each bucket, including zeros, by the way: if I didn't spend any time on car websites, it still has to say zero. So let's say I wasn't on a car site, and you just use Prio to sum them up. It's really straightforward.

You get histograms (it's beautiful, right), and the servers again only learn the aggregates, not which category any individual value was in. This is all fine. As I noted in the footnote, if you also report the times squared, you can compute the standard deviation, in case you really need to compute standard deviations. So that's all sort of okay, except that it doesn't scale well.
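Concretely, the per-topic encoding described above amounts to each client submitting a fixed-length vector with one entry per bucket, zeros included, which the servers sum position-wise. A toy plaintext sketch (ignoring the secret sharing shown earlier) of why the aggregate is a histogram, and why report size grows with the number of buckets:

```python
TOPICS = ["cars", "medical", "sports", "news"]   # one bucket per user interest

def encode_minutes(minutes_by_topic: dict[str, int]) -> list[int]:
    # Every client reports a value for every bucket, including explicit zeros;
    # omitting a bucket would itself reveal "no interest".
    return [minutes_by_topic.get(t, 0) for t in TOPICS]

def sum_reports(reports: list[list[int]]) -> list[int]:
    # Position-wise sum; with Prio each addend would be secret-shared, but the
    # resulting aggregate is the same.
    return [sum(col) for col in zip(*reports)]

reports = [
    encode_minutes({"medical": 30}),
    encode_minutes({"cars": 10, "news": 5}),
    encode_minutes({"news": 20}),
]
print(dict(zip(TOPICS, sum_reports(reports))))
# {'cars': 10, 'medical': 30, 'sports': 0, 'news': 25}
# Report size is one integer per bucket, which is why this stops scaling
# once the category space gets large.
```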
D
If there are a hundred user interests, I have to report a hundred integers; if there are a million user interests, I have to report a million integers. The intuition for this is that if there are categories for which I have no value at all and I just report nothing, that's fine, you can certainly conclude there was no value, but now you know I wasn't interested, and you can assume that anything I did report a number for, I was interested in. So you have to report all the values, and it doesn't scale well at all.

So there's this newer technology called Poplar (I don't know why I still have it called "hits" here; I thought I replaced all of those) from the same people as Prio, which basically works by collecting strings. So you could do the same thing, but instead you basically say each interest is mapped to a string.

The way this works is that you can basically take the set of submissions (and, because the strings are unknown, you've got to figure out which ones are most popular) and you're able to ask this question: how many strings have prefix P, and how many have prefix P with a zero on the end versus a one on the end, right? So now you have a binary tree, and you can refine your search on the tree and find the most popular things. This is a very clever intuition and a very, very powerful scheme: it lets you collect, you know, the most popular URLs, for instance, or, as I say, basically any open-ended string.
D
So how would I imagine using this, what does a real use case look like? Every time a site is broken, the client creates a report for each site that's broken, and then you use this technology to learn the top sites, and then you go and try to investigate them, right. The nice thing is that the servers collectively only learn the most important sites, the ones you want to report on, and they don't learn the low-cardinality ones that you don't care about. Because, as I said, this is arranged in a binary tree, you start and say: here are all the reports, and then maybe half of them are on the left side of the tree; but at some point you get down and say, there are only 50 reports on the left side of the tree, and you're still way, way, way up at the top of the URL, because most of the URLs you still haven't seen; you're only going down one bit at a time. Okay.
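The binary-tree intuition can be sketched in a few lines. This toy counts prefixes over plaintext strings, which is exactly what a single Poplar server cannot do on its own, so it only illustrates the search strategy: keep extending prefixes that enough submissions share, and never enumerate the rare ones.

```python
def heavy_hitters(bitstrings: list[str], threshold: int) -> list[str]:
    """Toy prefix-tree search: keep extending prefixes that at least
    `threshold` submissions share; drop the rest without ever looking at them."""
    if not bitstrings:
        return []
    length = len(bitstrings[0])
    candidates = [""]
    for _ in range(length):
        next_candidates = []
        for prefix in candidates:
            for bit in "01":
                p = prefix + bit
                # In Poplar the servers compute this count from secret-shared
                # reports; here we just count plaintext for illustration.
                if sum(s.startswith(p) for s in bitstrings) >= threshold:
                    next_candidates.append(p)
        candidates = next_candidates
    return candidates

reports = ["0110"] * 60 + ["1010"] * 45 + ["1111"]    # one rare value
print(heavy_hitters(reports, threshold=40))            # ['0110', '1010']
```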
D
So the second piece of this puzzle is that sometimes you want to, as I said, dig into the data, right. I just described a sort of one-way, pre-programmed kind of measurement, but you can also have measurement where you want to dig into the data later. So you say: look, I have a distribution of income, but I really want to look at, you know, birthdays or whatever. The way you do this is you put the demographic data in the submission itself, and it basically gets carried along unencrypted (that's because it's basically non-sensitive, so you don't worry about it so much), and then the servers can slice the data: only ask for the aggregates over, say, a given set of birthdays, or a given set of zip codes, right.

So this is a powerful technology. The challenge is that if I allow you to make as many queries as you want, obviously you can slice it down to individual data values; and even if I say that you can only ask for sets of more than a thousand, what I can do is create two sets, one with a given user and one without, and then I can pull out user i's value. So there's a bunch of possible defenses here, which probably all get used together: minimum batch sizes; anti-replay, so you can't ask about the same data multiple times, or too many times; randomization noise for differential privacy; those kinds of things.
D
Okay, so those are the techniques; let's go through each of these proposals. As I said, there's a main document here, and the one which I believe is specifically in the charter is draft-gpew-priv-ppm, which I guess has its own version numbering here. It's a generic, modular protocol for any of these MPC-flavored schemes; it initially implements Prio and Poplar (which, I really thought I did a search-and-replace on these slides, but maybe I forgot to rerun it). Basically it's a pluggable system: it's compatible with anything that can fit into this MPC flavor. To give you a concrete example of that, the proposal Facebook and Mozilla worked out for Interoperable Private Attribution also fits reasonably well into this framework. It's built on top of web-service infrastructure, so it's easy to implement with existing stuff, et cetera, et cetera, the usual things.

And just very briefly: here's what the architecture looks like. Like I said before, you have these clients; each client sends its shares; the aggregators have this back-and-forth aggregate computation; and it goes to the collector. I'm not going to bother to really go into this at all, because I know Chris Patton has plenty of material on it. So now I'll take questions, if anybody has any.
A
Hi everyone, we're running a little bit behind schedule, and I'd love to get into the next presentations. Thanks, EKR, for that really clear and educational introduction. If anybody has burning questions, feel free to jump into the queue now, but also, let's start getting ready for our next presentation.
E
Oh! I guess I can share them from this.
F
Everyone can see that, right? It's working, cool, and I have full control; I am ready to go. Okay, yeah, thank you, EKR, that was a wonderful introduction to the crazy world we've just dived into with PPM. Okay, so I'm going to start off the next few talks, which are kind of describing some open issues that have come up over the last several months that we've been kicking around on this draft we're talking about. I have the pleasure of talking about one of these open questions, and it has to do with how clients upload their reports to the aggregators, the two servers that EKR talked about.

Okay, so I wanted to give a quick overview of how we think about the architecture of the system. We really think of PPM as being three sub-protocols that are executed simultaneously. The upload flow is: clients push their reports (these contain the input shares, the secret shares of their input) to the leader, and these are encrypted under the public keys of each of the aggregators, so the leader doesn't end up seeing them in the clear. Then there's the aggregate flow, which is where the aggregators interact with one another in order to verify the validity of the inputs they're consuming, as well as aggregate them and compute shares of the aggregate results.
D
If you could take two minutes and cover the material that I foolishly thought you were going to cover, which is basically just briefly describing how the architecture works in a slightly more general sense... You're only showing... I mean, I know this material, but make sure you cover exactly how people should think about the system as a whole; that would be useful.
F
Okay, yeah, that's what I'm attempting to do here. So if anything's not clear at the end of the slide, please get in the queue and ask questions, because we're going to bring up the pros and cons of two different approaches here. Okay, so, yeah: in the upload flow, clients upload their reports to the leader. The report contains the input shares, and they're encrypted to the public keys of the leader and the helper; that's so that the leader doesn't ever see the input shares in the clear. The aggregate flow is where the input validation that EKR described happens, as well as the aggregation of the input shares. And then, finally, the collect flow: there's the data collector, which we think of as a different entity, that interacts with the leader in order to get the final aggregate result, so it's asking the leader for the encrypted aggregate shares. Okay, so does anybody have questions about the architecture?
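For readers following along, here is a rough sketch of the shape of a report in the leader-upload architecture just described. The field and function names are illustrative assumptions, not the draft's wire format: the client produces one input share per aggregator, seals each to that aggregator's key, and sends the whole bundle to the leader, which can store but not read the helper's piece.

```python
from dataclasses import dataclass

@dataclass
class Report:
    task_id: bytes                 # the PPM task all parties agreed on
    timestamp: int                 # places the report in a batch window
    share_sealed_to_leader: bytes  # only the leader can decrypt this
    share_sealed_to_helper: bytes  # only the helper can decrypt this

def leader_handle_upload(report: Report, pending: list[Report]) -> None:
    # Upload flow (leader-upload model): the client sends the whole report to
    # the leader, which queues it; it stores but cannot read the helper share.
    pending.append(report)

def leader_start_aggregation(pending: list[Report]) -> list[tuple[bytes, int, bytes]]:
    # Aggregate flow: the leader forwards each helper share, plus enough
    # context to identify it, to the helper, and processes its own shares.
    return [(r.task_id, r.timestamp, r.share_sealed_to_helper) for r in pending]
```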
F
Yeah, so, as EKR pointed out, something that we're going to be assuming is that the leader and helper are not colluding. I think for now we should take that as granted, but the working group might want to discuss ways of actually verifying non-collusion. EKR, go ahead.
D
I have a leading question, which is: how do the leader and helper get coordinated, so that they're doing the same thing and know which measurements are being collected and what anything individually means, and those kinds of things?
F
Okay, yeah, so I guess what's missing here is what the PPM protocol is meant to specify. In the CFRG we're working on a document that describes the underlying crypto bits (so, Prio and Poplar, and other instantiations of the multi-party computation step), and all of the parties have to be configured to execute the same, what we call, verifiable distributed aggregation function, a VDAF. I was hoping to talk about tasks for 30 seconds. Okay, so, a task is... sorry, it's also early in the morning for me. When a client is uploading reports, it needs to know where to send reports and how to generate them, and this is determined by what we call a PPM task. The task is supposed to specify all of the things that all the parties need to agree on in order to do the computation. I'm not sure what else there is to say about that.
H
Do you hear me? Yes, okay, sorry, this resets all the time. So my question is: are you envisioning allowing any variations of this architecture? Maybe there are applications where the party collecting the data, the reports, is different from any of the aggregators, right? There might be a reason that it's easier to have a party online, different from the aggregators, that is collecting all the reports. An example of such a system is the deployment that we have in Exposure Notifications, where our collectors, our ingestion servers, are neither of the aggregators. And I can also imagine other applications where maybe you want to avoid the additional communication cost of the leader sending reports to the helper.
F
Yeah, sorry to interrupt, but that's what this presentation is about: an alternative architecture that's been proposed, and the goal of my talk is to weigh the pros and cons of the two different approaches. To your first question: yeah, I think that's totally on the table, but it's something that also needs to be discussed. The perspective I'm coming from is the current draft (which, I guess, we could have spent more time describing at a high level), but yeah, that's totally on the table; it would just need to be in the draft.
J
Yeah, in the previous slide deck we were talking about aggregators, and...
F
Yeah, I apologize for that. The leader and helper are both aggregators. There are three different kinds of roles: the client, the aggregator, and the collector; the leader and helper are just two different types of aggregator, and the only difference is that the leader is kind of holding the state of the aggregation flow. I was hoping that would be clear on EKR's slides, but I guess not; I apologize for that, I should have labeled this better.
F
Okay, I will go through this very quickly; I want to be sure to save time for everybody else. So this is what we have today, this leader upload flow, where clients send both encrypted input shares to the leader. An alternative flow, as Mariana suggested, is that we could instead have the client send a share of its report to each of the aggregators, so each of these report shares would just have the encrypted input share for that aggregator.

I think that in some sense the split upload model is a little bit more natural, and, as Mariana mentioned, this is what was already deployed for an earlier iteration of Prio. So why did we do it this way? I think the main motivation was that we wanted to make it as cheap as possible to stand up the helper aggregator. With the current architecture, only the leader has to be able to handle a lot of bandwidth and is exposed, like a normal web service, to client traffic. The aggregation flow has less bandwidth because, first of all, the leader doesn't have to send both encrypted input shares to its peer, so we save a little bit of bandwidth, and also the leader can throttle traffic if the helper is falling behind; and the helper doesn't have to be totally online the way the leader has to be in the upload flow. And then the collect flow is just the collector getting aggregate results, which is fairly cheap to do.

Another motivation is that there's kind of a race condition in split mode, where the leader is receiving report shares and can initiate the aggregation flow at any time: if the helper doesn't receive its report share before the leader receives its report share and begins aggregation, then we might have to drop that report, and there are ways to fix this in the protocol. And then the third point is that in the split upload world, the upload flow is more likely to fail, because there are two...
F
The big downside to the leader upload model, as Mariana suggested, is that the aggregation flow has higher-than-necessary bandwidth, because basically we're sending via the leader what the client could have sent itself, the report share. This is a significant problem for Poplar, because input shares are big and we're going to run the aggregation flow several times on the same set of reports in order to finish the computation of the heavy hitters; and higher bandwidth between the leader and helper means higher cloud egress costs between cloud providers, which is an important consideration for us.

So I think we have mainly two options, which is what I'd like to discuss, but I guess we should maybe just take it to the list, because we're running low on time. We can stick with the leader upload model and try to mitigate its downside, that is, not require re-transmitting the report shares in the protocol, and there's a question of whether that's enough to reduce the egress costs. Or we can consider adopting the split upload flow and... yeah, I don't have much time to talk about this; there are options for it, and it's possible that we can mitigate the downsides by kind of leaving it up to the deployment. There's been the suggestion of putting an ingester between the client and the leader and the helper, which can coordinate transmission of report shares and solve some of the coordination problems; and then there's this question that EKR brought up on the list, which is: in what sense is the ingester trusted or untrusted? So that's my time; I'll kick it back to Ben.
A
Okay, thank you, Chris. I want to make sure that we get through all the presentations on this topic. We did build in time for questions after the three discussions on this draft, so if folks have really high-priority, brief questions, jump in, but also, let's get set up for the next set of slides. Oops, Eric.
D
I just want to make sure (I've seen a bunch of discussion on the chat) that people understand what's at stake here. Primarily, what's at stake is whether or not the role of distributing the data to the helpers... So the role the leader has to perform is orchestrating the computation, and the role it is also performing is distributing the data to the helpers, right. At stake here is whether we should separate out those roles, and have the role of distributing the data to the helpers be done separately, directly by the client in this case, from the role of orchestrating the computation. So that's what's at stake here. It's not a security question; it's largely an operational question.
J
I think the other comment in the chat was that, if it's possible to describe the system such that the role coordinating the computation does not require participating in the computation, that might also make it cleaner, right; you could say there's the coordinator, and the coordinator may also play a role as one of the participants in the computation. But I don't understand the actual mechanics of the computation well enough to know whether those roles could be split out; I think that's the thing that was...
F
I think that's an interesting suggestion. I think it would be nice to cleanly separate those things; I don't know if it's that easy, at least to specify, but yeah, I think there's a lot to discuss here, and hopefully we can have a good discussion about this on the list.
D
Just to answer that question a little bit: I think the challenge is that, from the perspective of running the computation, it's actually quite straightforward to specify as if the leader were not one of the aggregators, because you're really just sending messages saying "compute this, compute this". But in order to actually do the computation, you have to know which shares are available, in order to describe them to each side so they can be aggregated. And so we'd have to invent some mechanism by which the leader learns which shares are available in order to orchestrate the computation, and that would be new protocol mechanics, or we'd have to just say it's magic. I'm sort of loath to describe protocol mechanics that no one's actually going to implement that way, but perhaps you could just say it was a magic channel.
K
All right, good morning, everyone. This is going to follow on the heels of some of the comments and questions that came up during Chris's presentation: in particular, how the collect piece of the protocol works, and what sort of requirements you need to have in place in order to ensure that the resulting PPM protocol has the desired privacy properties that EKR kind of alluded to in his overview.

It's going to get a bit technical in terms of how the collect flow works for the current instantiation of PPM, so if you have any questions along the way, feel free to chime in or pop into the queue or whatever, and I'll try to answer them. So, at the highest of levels: you've seen this diagram in different shapes and forms before, but this is how the collect flow basically works. Once aggregation is done, each aggregator produces an aggregate share, and then the collector will eventually query the leader for these aggregate shares and combine them together to get the aggregate result. Each aggregator also maintains, during the aggregation...
K
...flow of this particular interaction, all of the individual reports that were submitted by clients (the individual report shares themselves) as well as the aggregates that are accumulated as the aggregation happens, such that it can present them to the collector. The requirement here is: in what ways can the collector ask the aggregators for different aggregate outputs such that the privacy requirement is satisfied? By the privacy requirement, what we mean, basically, is that the aggregation output ensures that there's some minimum number of reports that went into it, where the minimum number of reports is something that's configured as part of a measurement task in the system, so all of the entities participating in a particular measurement agree on what this min batch size, this minimum threshold, is. And the requirement is that, no matter how the collector chooses to query and interact with the aggregators, it cannot produce or derive, in any way, an aggregate that was based on fewer reports than this batch size threshold.
K
There's also, of course, a correctness requirement, which is that the aggregation, whether it's triggered by collection or happens before collection, actually includes the same reports when the collect request comes in. So if there are n reports from n different clients (n shares of n reports from n different clients), those are all included in the same aggregate, and we don't have one aggregator aggregating some set of report shares and another aggregator aggregating a different set of report shares, because then the output would be garbage. So these are kind of the two informal goals of collection, and I'll try to walk through why PPM currently does not satisfy this, and ask some questions that hopefully get us towards thinking about how it might satisfy it later down the road.

During aggregation, aggregators keep track of individual report shares in batches, and batches are divided over time, where the length of time is some parameter we call the min batch duration; it might be a day, might be a week or an hour, whatever. A report is tagged with a particular timestamp, effectively, and that puts it in one of these time windows; and collect requests, when they are issued by the collector, indicate the time window over which the collector wants collection to occur.
K
Currently, the parameters of the collect request, in particular the time parameters, must align with batch window boundaries, so they must align, on the picture here, on t minus one, or t, or t plus one, or whatever, and that's effectively kind of the only constraint. The current validation for collect requests is composed of two steps: the first is to check that the time parameters do indeed align on these batch window boundaries, and the second is to check that, for the given time window specified by a single collect request, independent of any previous collect requests, the minimum batch size is met.
K
So, as an example, imagine you had a collect request for the window of t minus one to t. If you look at the two criteria here, it's indeed a valid window: it aligns on the boundaries, and (totally arbitrarily, but I tried to draw the picture this way) yes, it's indeed a valid size; there are enough reports, not too few, that go into this particular window. So both criteria are met, and this collect request would be satisfied.

Similarly, you could ask for this second collect request: again, the time window is valid because it aligns on the batch window boundaries, and the size is also valid, because it covers simply more reports than the previous collect request. Martin, I just noticed you're in the queue; I don't know when you joined, but is it a clarifying question?
L
How does the collector know? So, if the collector is going to ask for, say, t plus one to t plus two, which appears to have a total of one sample submitted, that's not going to meet your minimum batch size, right?
K
Right. I mean, the collector doesn't know how many things go into the aggregate until it asks, I guess, and it gets an output from the system indicating either that there were indeed enough things in that particular window, or no, there were not enough things in the window, so an aggregate could not be produced. But anyway: if you look at these two collect requests independently, they seem valid; they check both criteria. Unfortunately, however, if you only validate that these collect requests are valid in isolation, it's pretty trivial for a malicious collector to use the outputs to compute an aggregate that is composed of fewer than the threshold number of reports. If you think of each collect request as yielding a set of reports, then you can just compute the set difference between the output of the first collect request and the output of the second collect request, and in this particular example that set difference would yield exactly one report, which is unique to some client, breaking the informal privacy goal that we had at the beginning.
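The attack just described is easy to demonstrate. In this toy sketch, each collect request is treated as returning the set of reports in a time range; both queries individually satisfy a minimum batch size of four, yet their set difference isolates a single client's report, which is why validating requests only in isolation is not enough.

```python
MIN_BATCH_SIZE = 4

# Toy data: report id -> (batch window index, value).
reports = {
    "r1": (0, 170), "r2": (0, 180), "r3": (0, 175), "r4": (0, 182),
    "r5": (1, 190),                      # the lone report in window [1, 2)
}

def collect(start: int, end: int) -> set[str]:
    batch = {rid for rid, (t, _) in reports.items() if start <= t < end}
    # Validation as currently specified: each request is checked in isolation.
    assert len(batch) >= MIN_BATCH_SIZE, "batch too small"
    return batch

first = collect(0, 1)     # 4 reports: passes
second = collect(0, 2)    # 5 reports: passes
leaked = second - first   # {"r5"}: one client's report isolated by differencing
print(leaked)
```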
K
So
this
is
clearly
a
problem
and
something
we
need
to
fix.
So
it's
kind
of
an
open
issue
in
the
draft
right
now.
K
K
So,
for
example,
you
might
want
to
say
give
me
the
aggregate
for
this
particular
time
window
that
came
from
all
clients
that
have
the
specific
user
agent
string,
or
you
might
want
to
say,
give
me
all
the
aggregates
in
this
time
window
for
reports
that
came
from
this
particular
geographic
region.
To
allow
you
to
sort
of
drill
down
into
errors,
if,
like,
for
example,
these
are
like
measurements,
you're
collecting
from
the
perspective
of
what
web
browser
and
currently
the.
K: So, given these two things, the desire to maintain this informal privacy goal stated previously, as well as the potential flexibility you might want in time and space:

K: given any particular sequence of collect requests, the collector must not be able to produce, deduce, compute, whatever, an aggregate that is composed of fewer than the threshold number of report shares. And it would be fairly easy to enforce this rule if the protocol were aware of the space dimension. How that's actually done mechanically, how reports are tagged in space like they are in time, is sort of an open question.
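One way to picture tagging reports in space is to extend the collect request with an attribute field. This is purely a hypothetical shape, since the draft leaves the question open:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a collect request carrying both a time window and
# a "space" constraint; nothing like this is specified in the draft.
@dataclass
class CollectRequest:
    batch_start: int    # must align on a batch window boundary
    batch_end: int      # must align on a batch window boundary
    attributes: dict = field(default_factory=dict)

req = CollectRequest(batch_start=0, batch_end=3600,
                     attributes={"user_agent": "browser-x"})
# The privacy threshold would then have to be enforced per
# (window, attributes) batch, since each such batch is another set the
# collector could difference against the others.
```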
K: Whether we even want to do that is an open question, but I think we understand fairly well how we would implement it. It's just a question of how these different constraints and parameters are expressed in the collect request.

K: Sybil attacks are a different issue entirely, I think, and I'm not trying to address them in this particular discussion.

K: We do have to have accommodations for Sybil attacks, like either clients themselves introducing random reports or leaders themselves introducing random reports, but that's separate from how the collector queries for things and tries to violate the privacy requirements.
K
Okay,
so
the
questions
I
have
for
the
group-
basically
I
first
is-
is
the
validation
problem.
Clear
in
particular,
is
as
like.
The
current
issue
in
the
draft
I
have
described
it.
K
Is
it
clear
and
understandable
to
folks
is
the
sort
of
informal
privacy
requirement
also
clear,
and
if
so,
how
do
we
want
to
sort
of
augment
the
protocol,
if
at
all,
to
accommodate
queries
to
or
allow
people
to
query
on
the
basis
of
time,
as
well
as
potentially
on
the
basis
of
space
or,
like
you
might
imagine,
just
simply
relying
on
the
fact
that
the
aggregation
protocol
itself
will
always
yield
or
the
output
of
the
aggregation
protocol
will
always
ensure
that
both
aggregators
agree
on
the
same
the
same
reports
that
went
into
a
particular
aggregate,
so
you
may
not
need
to
specify
in
in
full
detail,
for
example,
how
how
collection
requests
are.
K
Exactly
how
the
validation
criteria
is
is
to
be
enforced.
You
might
just
rely
on
the
fact
that
the
aggregate
protocol
sort
of
enforces
that,
for
you
and
there's
probably
other
questions
as
well.
Stephen.
I: Hi, Stephen Farrell. So, Chris, you talked about time and space; in Privacy Pass yesterday those were a different kind of space.

K: Yeah, I mean, I tend to think of it as just a bit string. You know, the bit string might encode the user agent, and the bit string might include geographic data, but I guess it depends on your definition of what space is. I think going back to the examples is probably the easiest way to think of space.

K: So, you know, imagine the space constraints being a fixed set of user agents, and each report being associated with a particular user agent or something.

K: That's right. I think that's a necessary requirement: if you want to allow people to query based on these additional parameters, then those parameters need to be visible to the people who enforce the validation criteria.

K: I think that's a good point, and I think this goes to how much query flexibility you want in the collect flow: whether or not you want to allow any of this additional data to be expressed and, as a result, be exposed to the aggregators. The utility of the collection depends on what information is available.
A: Thanks. In the interest of time, I think we should move to Tim's presentation.

N: All right, let's get into it. So bear with me: a lot of our material today was written under the assumption that folks have a lot of familiarity with the draft, so bear with me if a lot of this gets too technical or too far into the detail of the state of draft 01. Okay.
N: So we're going to talk now about the current authors' draft of the specification that we've been working on, as well as some work that a few of us have been doing that goes beyond that draft. Cool, all right.

N: So, in anticipation of this week's meeting, we submitted a new authors' draft of the PPM specification, draft 01, which contains quite a lot of changes relative to draft 00, which was submitted back at IETF 112 for the BoF. We're going to cover a number of those changes as we work through this deck. But first, let's talk about the status of this current draft. To briskly recap: this draft specifies PPM, which is a protocol framework for privately computing aggregate functions.
N: It's based on Prio, but we've generalized from there so that it can work with any instantiation of a verifiable distributed aggregation function (VDAF), which is a specification that's being discussed in the CFRG. PPM is designed to coordinate the execution of VDAFs across multiple non-colluding aggregators, and the sharing of inputs across both servers is how PPM deployments provide meaningful privacy to users.

N: So, as was explained to us earlier, this targets a variety of motivating use cases, ranging from simple statistics to the heavy hitters problem.
N: All right. We also covered earlier, very briskly, how PPM is composed of three different sub-protocols that execute simultaneously. The upload flow is for clients to secret-share their input and upload the shares to the aggregators.

N: The aggregation flow is where the leader and helper interact to verify the validity of inputs and then aggregate the reports and compute aggregate shares. And finally the collect flow, which Chris just gave us some insight into, is how the collector gets aggregate shares from the leader to produce final results.
N: So we know from some experience that draft 01 is almost fully implementable; at least the happy path of uploading reports and computing aggregates is implementable, and we feel it satisfies the key deliverables defined in the PPM working group's charter.

N: In particular, we have mechanisms for client submission of measurements, for the joint evaluation of proofs of validity of those measurements, and for the computation of aggregates to be delivered to some recipient, and all of these are defined in a way that makes them flexible enough to accommodate multiple underlying algorithms through the VDAF abstraction.
N: So we're going to discuss adoption of this draft later, I think, but I suppose I want to say at this stage that we think this draft is good enough to be adopted by the working group. Okay, more on that later, I suppose. Moving on: going past draft 01, we at ISRG and our colleagues on the Cloudflare Research team have been working on a set of proposed changes that we're calling the interoperability target.

N: Our goals here are to run a deployment of PPM using the Prio3-based VDAFs, since the Poplar1 VDAF isn't quite ready yet. It's also to let us hammer out a bunch of interesting protocol edge cases and some important error handling scenarios, and we're hoping to learn a lot about which parts of the specification are difficult, either to implement or operationally.
N: And what we learn can then be fed back into discussions in the working group as proposed changes to the specification. Okay, so let's look at where we're at today with development. The most substantial piece of progress we can report is that the specification of the Prio3 VDAFs has matured quite a bit. You can take a look at that in draft 01 of the VDAF document, which was presented to CFRG, I believe yesterday, by Chris Patton and his co-authors. We also have a complete implementation of that draft of the Prio3 VDAFs, along with test vectors for them, in libprio-rs, which is going to be used in both the Cloudflare and ISRG PPM implementations, and hopefully others in the future. Of course, you can find those documents and the implementation up on GitHub or on the Datatracker at these links.
N: We at ISRG also currently have a toy implementation of PPM at somewhere around draft 01. It's missing a number of important protocol features, and it's really just a toy that couldn't actually be deployed onto the internet. It can only talk to itself, not to any other implementations, and it has no persistence story, but it does demonstrate that the happy path of the protocol can be implemented end to end.

N: That also is up on GitHub if you're interested. And finally, both Cloudflare and ISRG are working on actual deployable implementations of the evolving interop target.
N
Okay,
now,
let's
turn
to
looking
at
like
what
what
proposed
changes
are
we
proposing
in
the
interoperable
protocol?
So
far?
So
let's
talk
about
the
aggregate
phase,
so,
as
we
discussed
earlier
right,
ppm
is
made
up
of
three
sub
protocols.
Upload
aggregate
and
collect
the
meat
of
the
complexity
lives
in
the
aggregate
flow,
and
it's
consists
of
the
coordinated
execution
of
a
vdaf
across
the
two
aggregators.
N
We
also
refer
to
this
frequently
as
preparing
inputs
and
what
that
means
is
taking
input,
shares
and
transforming
them
into
output
chairs
that
can
be
summed
into
aggregates.
So
what
that
means
depends
on
the
particular
vdaf.
It
could
just
mean
evaluating
the
the
validity
proof
or
in
some
other
vdfs
there
might
be
some
more
significant
transformation
all
right,
so
I
spelled
out
currently
in
draft
zero
one.
The
specification
lacks
sufficient
detail
to
really
be
implemented.
N
So
one
of
the
things
that
the
interop
target
does
is
to
be
more
detailed
about
how
to
detect
and
handle
disagreements
between
the
aggregators
we've
also
updated
ppm
to
use
the
current
verbs
and
message
types
are
defined
in
draft
zero.
One
of
the
vdif
specification.
N
So,
above
the
vdaf
level,
we
we
also
relative
to
draft
zero
zero.
Excuse
me:
relative
drop,
zero
one
have
eliminated
what
was
called
the
helper
state
blob.
So
in
back
and
draft
zero
zero,
we
had
the
goal
of
having
no
storage
requirements
for
helpers.
So
the
goal
there
was
to
foster
a
diversity
of
aggregator
operator
by
making
it
as
easy
as
possible
for
anyone
to
run
a
helper
with
minimal
infrastructure
requirements
or
operational
overhead,
but
aggregating
shares
into
vdaf
is
inevitably
a
stateful
process.
N
That's
because
the
coordinated
evaluation
of
the
proofs
is
a
multi-round
protocol
and
there's
a
state
that
carries
over
from
one
one
round
to
the
next,
the
vdfs
that
we
envisioned
currently
so
again,
that's
pre
the
different
303
based
ones
and
popular
one
are
all
two
round
protocols,
but
you
could
have
arbitrarily
many
rounds
and
bdafs
that
come
along
in
the
future.
N
Our
solution
back
and
direct
in
draft
zero
zero
drop
zero
one
was
to
shift
the
burden
of
storage
onto
the
leader
by
having
hold
on
to
the
helpers
encoded
state
across
the
sequence
of
aggregate
protocol
requests.
So
we
see
this
illustrated
in
the
sequence
diagram
over
here
on
the
right.
In
the
first
request,
the
leader
is
sending
to
the
helper
a
a
sequence
of
reports
of
encrypted
report
chairs,
along
with
some
other
parameters
needed
for
aggregation
upon
receipt.
N: The leader next combines the helper's message with its own and sends a sequence of combined prepare messages to the helper, that's one prepare message for each report, and it also sends to the helper the serialized state blob that it received previously. In this illustration, the second request from the leader happens to be sent to a different instance of the helper than the first time around; you might imagine that there are multiple replicas behind a load balancer.

N: It looks like this is fine, since all the state is in the serialized state blob. But of course there were several downsides to this design. The first is that you're going to spend extra bandwidth transmitting the state back and forth over multiple rounds of the protocol.
N
Further
its
contents
are
secret,
which
means
that
the
helper
implementation
has
to
be
responsible
for
encrypting
its
serialized
state
to
protect
it
from
the
leader
tampering
with
it
or
just
seeing
it.
And
of
course
we
have
to
stop
the
leader
from
replaying
old
states
into
the
helper,
so
anti-replay
means
the
helper
has
to
store
a
counter
or
something
like
that
to
prevent
state
rollbacks,
and
so
we've
already
failed
to
meet
our
goal
of
no
helper
storage.
N: We also have to prevent the leader from replaying a client report into the helper, to rule out attacks that would enable the leader to learn something about the client input. In draft 00 this was solved by defining a total ordering over report nonces and then requiring the leader to send nonces to the helper in ascending order.

N: The helper would then defend against replays by keeping track of the highest nonce it had ever seen and refusing any reports older than that. This doesn't work, though, if you have multiple helper instances working in parallel. In this illustration, the leader has carved up the work of aggregating the k reports that fall into some aggregation into three chunks. Each chunk meets the requirement of ascending nonces, but to meet the anti-replay requirement the helper instances have to share the highest-nonce counter, which is shown in this ambiguous cloud of storage on the right. So if helper 3 happens to do its work first and commits k as the highest nonce, then helpers 1 and 2 have to reject all the reports they get, which is obviously bad.
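A toy model of the draft-00 rule shows the failure mode: each chunk is internally ascending, but a shared highest-nonce watermark makes chunks that finish late look like replays:

```python
# Toy model of the draft-00 anti-replay rule: a single shared
# "highest nonce seen" watermark, rejecting anything not strictly newer.
class HighWatermark:
    def __init__(self):
        self.highest = -1

    def accept(self, nonce):
        if nonce <= self.highest:
            return False      # treated as a replay
        self.highest = nonce
        return True

shared = HighWatermark()
chunk1, chunk2, chunk3 = [0, 1, 2], [3, 4, 5], [6, 7, 8]

# Helper instance 3 happens to finish first and commits the highest nonce...
assert all(shared.accept(n) for n in chunk3)
# ...so instances 1 and 2 now reject every report in their chunks, even
# though none of them is actually a replay.
assert not any(shared.accept(n) for n in chunk1 + chunk2)
```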
N
So
in
draft
zero
one,
we
acknowledged
that
we
had
over
indexed
on
the
goal
of
lightweight
helpers
and
we
accepted
that
requiring
that
helpers
have
a
trusted
database
or
some
kind
of
trusted.
Storage
isn't
really
all
that
bad,
especially
since,
as
we
discussed
draft
zero,
zero
required
them
to
have
some
kind
of
storage
anyway.
N
They've
ever
seen
up
to
some
reasonable
data
retention
period
so
that
they
can
refuse
to
aggregate
a
report
if
they
know
they've
already
been
included
in
that
aggregation
helpers
also
have
to
keep
track,
of
which
batch
intervals
they've
serviced
a
collect
request
for
so
they
can
refuse
reports
new
reports
that
fall
into
those
intervals
to
mitigate
some
of
the
attacks
that
were
talked
about
earlier
by
chris
wood,
the
drop
zero
one
still
has
the
helper
state
blob
and
the
attended
anti-replay
counters.
N: Since we had already accepted non-trivial helper storage requirements, we decided in the interop target to take the next step and do away with the helper state blob altogether.

N: So instead we require helpers to store their own intermediate state. But to preserve the nice property that different rounds of the prepare protocol don't have to be serviced by the same helper instance, we introduced instead the concept of an aggregation job ID, which is assigned by the leader when it constructs aggregation requests and can be used later by a helper to look up the state associated with the preparation of a set of shares. Unlike the old helper state, the job IDs aren't secret and they don't require any replay attack mitigations.
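A sketch of the idea, with invented names: the helper keeps its own state table keyed by the leader-assigned job ID, so any replica with access to that storage can service the next round:

```python
# Sketch of helper-side state keyed by aggregation job ID. All names are
# invented for illustration; in practice the table would live in a shared
# database reachable by every helper replica.
helper_state_table = {}

def start_aggregation_job(job_id, report_shares):
    # Round 1: begin preparing the report shares and remember where we are.
    helper_state_table[job_id] = {"round": 1, "pending": report_shares}

def continue_aggregation_job(job_id, leader_prep_messages):
    # A later round: any replica looks the state back up by job ID. The ID
    # itself needs no secrecy and no anti-replay machinery.
    state = helper_state_table[job_id]
    state["round"] += 1
    # ... combine leader_prep_messages with state["pending"] here ...
    return state["round"]

start_aggregation_job("job-123", ["share-a", "share-b"])
continue_aggregation_job("job-123", ["prep-a", "prep-b"])
```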
N: Okay, in the interest of time I'm going to skip over this and jump ahead to the topic of gracefully recovering from input preparation failures. All right. This is a problem that we learned quite a lot about while operating the Exposure Notifications Private Analytics system over the last couple of years.
N: So first, let's recall some of the math about how aggregation over the secret shares works. Suppose we have n values, where each is sharded into one share for aggregator A and another for aggregator B, such that the two shares sum back up to the original value modulo some prime number p.

N: We compute aggregates by having each aggregator sum its sequence of shares; then the sum of those sums is congruent to the sum over the original inputs, again all modulo p. Okay, but for some large number of reports n, we expect that errors will occur, such that there will be cases where one aggregator happens to accept a different set of shares than the other.
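In symbols: each value v_i is split as v_i = (v_i_A + v_i_B) mod p, and the two per-aggregator sums recombine to the true total mod p. A runnable toy version:

```python
import secrets

p = 2**31 - 1               # a prime modulus (toy choice)
values = [3, 1, 4, 1, 5]    # the clients' true measurements

# Shard each value into two additive shares that sum to it mod p.
shares_a, shares_b = [], []
for v in values:
    a = secrets.randbelow(p)
    shares_a.append(a)
    shares_b.append((v - a) % p)

# Each aggregator sums only its own shares...
sum_a = sum(shares_a) % p
sum_b = sum(shares_b) % p

# ...and the two sums recombine to the aggregate of the original values.
assert (sum_a + sum_b) % p == sum(values) % p
```

The same arithmetic is why the two aggregators must agree exactly on which reports they summed, which is the failure mode discussed next.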
N
So,
if
10
shares
out
of
a
million
get
rejected,
we
still
would
like
to
be
able
to
aggregate
over
the
other
999
990
shares,
because
that's
still
a
lot
of
good
data,
but
we
have
to
make
sure
that
both
aggregators
are
using
the
same
set
of
shares
so
recall
that
for
each
share
v
sub,
I
a
that
is
to
say,
like
the
I
you
know,
report
with
the
iaf
report
and
the
share.
N
N
N
Like
say
you
know,
you're
measuring
something
like
how
many
times
did
a
user
click
a
button,
and
you
get
some
of
like
absurdly
huge
number,
so
in
the
scope
of
the
interrupt
target,
what
we
are
aiming
to
do
is
to
have
each
aggregator
include
some
data
in
aggregate
chairs
that
will
allow
us
to
detect
this
kind
of
problem
and
measure
how
bad
it
is.
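The talk doesn't say what that data is; one plausible shape, shown purely as an illustration, is a count plus an order-independent checksum over the nonces of the reports each aggregator included:

```python
import hashlib

# Hypothetical illustration only: a report count plus an XOR-combined
# hash over report nonces, which is independent of processing order.
def batch_fingerprint(nonces):
    checksum = 0
    for nonce in nonces:
        digest = hashlib.sha256(nonce.encode()).digest()
        checksum ^= int.from_bytes(digest, "big")
    return len(nonces), checksum

agg_a_included = ["n1", "n2", "n3"]
agg_b_included = ["n1", "n2"]        # aggregator B rejected one share

# Comparing the fingerprints attached to the two aggregate shares reveals
# the disagreement before a garbage aggregate is published.
if batch_fingerprint(agg_a_included) != batch_fingerprint(agg_b_included):
    print("aggregators disagree on the report set; do not publish")
```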
N: All right, so those are some of the highlights of what we've been batting around as we work towards an interoperability deployment. To reiterate, our goal here is to operate an experimental deployment and then come out of that with some learnings, some experience, and some data that will allow us to start some discussions in the working group and propose some changes to the protocol, and that hopefully will let us answer... oh, sorry, Benjamin.

N: No, no, finish your sentence. Okay, yeah, I'm just wrapping up the last slide. So the interesting questions that we're hoping people will discuss going forward, with what we learn, are these. For which interactions, and to what extent, should the PPM protocol specify authentication for requests and messages? Or, instead of doing that, maybe we should be specifying transport security requirements and letting deployments choose for themselves how to meet them. Also, there are some places where the protocol introduces shared secret parameters between actors, particularly between the aggregators.
N: How are we going to go about negotiating those, and particularly rotating them? And then, operationally, what is the life cycle of reports, or of the state associated with their processing? When is it acceptable for one of the participating servers to discard old data? And, to make that clear, might we need an explicit commit phase during the preparation protocol, such that both aggregators can have high confidence that they're aggregating over the same set of shares?

A: So if there are very brief clarifying questions, feel free to get in the queue, but otherwise let's bring up the STAR presentation. I believe Alex Davidson is presenting.
O: Yeah, okay, cool. So I'm Alex, and I'm going to be talking about STAR, which, as has been referenced previously, is an alternative protocol idea that could potentially fit into the PPM framework. So I'll just get straight to it. STAR is very similar to this Poplar1 approach that has been mentioned.

O: The idea is that we want to come up with a system that can find heavy-hitting, arbitrary data. Specifically, in building STAR we essentially want to provide k-anonymity for clients submitting arbitrary data. So the idea is that a number of clients would send data, and then, once you've received k reports all containing the same data point, the aggregating server will be able to reveal them.
O: The issue with Poplar, if we just focus on that, seeing as the functionality is quite similar, is that it's quite expensive to run, and also that having this aggregation process between multiple non-colluding servers during the aggregation phase was something that was quite difficult for us to get off the ground.

O: So STAR was an attempt to build a privacy-preserving system that allows you to reveal these heavy-hitting data points without having to run the aggregation collaboratively between non-colluding entities. Along the way, obviously, you want to preserve privacy, and there was also a goal of trying to use simple cryptographic primitives and techniques where possible, rather than introducing novel mechanisms for running this aggregation.
O: So the idea behind STAR, hopefully, is very simple. It's split into three phases. The first phase is the randomness phase, and the idea behind that phase is that we're trying to establish a scenario where different clients can, non-interactively, establish secret shares of the same value, using any old threshold secret sharing scheme, shares that you'll be able to combine together. So this randomness phase essentially allows the clients to establish correlated randomness depending on their measurement data point.
O: Later on, in the measurement phase, the clients sample secret shares of that measurement, it's a little bit more complicated than that, but roughly, and then send these measurements to the aggregation server. Then, in the aggregation phase, when the server receives k of these shares, where k is the threshold in the secret sharing scheme, it will be able to reveal the measurement. The randomness phase can be done in cooperation with a randomness server or, alternatively, locally, just derived from the measurement, except that that has known security issues, something that was already raised by previous protocol designs like Prochlo by Google.
O
One
of
the
nice
things
about
star
is:
you
can
include
like
additional
auxiliary
data
as
well
with
your
measurements,
but
the
the
threshold
itself
is
only
imposed
on
the
measurement
itself
and
then
I've
written
there's
this
notion
of
an
epoch
which
I'll
also
talk
about
which
is
defined
by
the
randomness
server
and
they're.
Randomly
serving
the
aggregations
say
that
here
are
non-colluding,
but
the
idea
is
that
they
don't
have
to
collaborate
in
the
aggregation
phase.
O: So what exactly is happening here? The idea is that you can sample randomness locally, using just a hash function defined over your measurement space. But that's only really going to work if you have high-entropy measurement distributions, and it's not clear whether such measurement distributions exist.

O: So one of the things that we're introducing with STAR is this remote way of running it, via an oblivious pseudorandom function (OPRF) controlled by the randomness server. Clients submit their measurements obliviously, and the server evaluates the oblivious pseudorandom function over them using a secret key which is tied to the epoch.

O: This randomness that the clients sample then goes into the message construction algorithm: the clients encrypt their data point along with any auxiliary data, deriving the encryption key from some portion of the randomness that they get. They secret-share the randomness that they used to derive the key, and that is the correlated aspect: you can construct shares consistently without the clients interacting. They also include a tag, which is important for security. The clients then send all these messages to the server; the server groups together all the messages with common tags, because these are deterministically derived from the measurement, and it can then recover the measurements themselves using the secret sharing recovery process. Cool.
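Here is a toy end-to-end sketch of the flow just described. It uses the hash-derived "local" randomness that the talk warns about (a real deployment would use the OPRF), a hash-derived polynomial for the k-out-of-n sharing, and XOR in place of real encryption; none of this is STAR's actual wire format:

```python
import hashlib, secrets

P = 2**127 - 1   # prime field for the toy Shamir sharing
K = 3            # threshold: measurement revealed once K reports arrive

def h(*parts):
    d = hashlib.sha256()
    for part in parts:
        d.update(part)
    return d.digest()

def make_message(measurement: bytes):
    # "Local" randomness derived by hashing the measurement (toy model;
    # the remote model gets r from the OPRF server instead).
    r = h(b"rand", measurement)[:15]    # 120 bits, fits in the field
    tag = h(b"tag", r)                  # deterministic tag for grouping
    key = h(b"key", r)                  # encryption key from part of r
    ct = bytes(a ^ b for a, b in zip(measurement.ljust(32, b"\0"), key))
    # Shamir-share r: coefficients derived deterministically from r, so
    # clients with the same measurement share one degree-(K-1) polynomial.
    coeffs = [int.from_bytes(r, "big")]          # secret in constant term
    coeffs += [int.from_bytes(h(b"coef", r, bytes([i])), "big") % P
               for i in range(1, K)]
    x = secrets.randbelow(P - 1) + 1    # each client picks a random point
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % P
    return tag, (x, y), ct

def recover_r(points):
    # Lagrange interpolation at x = 0 recovers the shared value r.
    acc = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        acc = (acc + yi * num * pow(den, -1, P)) % P
    return acc.to_bytes(15, "big")

# Once K clients report the same value, the server can decrypt it.
msgs = [make_message(b"example.com") for _ in range(K)]
assert len({t for t, _, _ in msgs}) == 1          # common tag groups them
r = recover_r([pt for _, pt, _ in msgs])
key = h(b"key", r)
plaintext = bytes(a ^ b for a, b in zip(msgs[0][2], key)).rstrip(b"\0")
assert plaintext == b"example.com"
```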
H: So I just have a clarifying question. The server here learns all the counts on tags in the clear?

O: Yes, that's correct. We're not protecting counts here.
O: Cool. So the security model that we use is obviously comparable with Poplar1, in that we need non-collusion of the randomness and aggregation servers for this to work. For clients, we consider a malicious adversarial server that may also control a subset of clients, to model this Sybil attack capability. And, as Mariana just pointed out, the sets of messages that encode the same measurement are leaked.

O: So even if you don't reveal the measurement itself, the deterministic tag leaks which subsets the messages belong to, and you can see that. But the goals here are confidentiality of measurements sent by at most k clients, and robustness of the aggregation. Cool.
O: So I just wanted to talk a little bit about the Sybil attack window for STAR, as this is the most damaging attack. Essentially, because this deterministic tag is present in the messages, it potentially allows the aggregation server to try to learn the message just by running an offline dictionary attack over the measurement space.

O: Obviously, with the local model, this is absolutely possible, and so one of the reasons that we use this remote model of sampling randomness is to shorten this attack window, and also to move the offline dictionary attack to something online that has to be carried out as queries to the randomness server. In doing so, we ensure that, as long as the client messages are sampled only in this window and then received afterwards, the aggregation server's attack capability is also limited to that window. Cool. Hello?
J: Sorry, can you go back a slide? I just wanted to check if I'm understanding the... sorry, one more slide. I'm just trying to wrap my head around this randomness server. The previous slide makes it look like the client is sending x, the secret value, directly to the randomness server itself.

O: Sorry, yeah, this diagram is not that helpful for explaining how the randomness server actually works. But okay, yeah: this OPRF is as defined in the oblivious pseudorandom functions draft that's coming out of the CFRG. So essentially the client's input is blinded beforehand; the server evaluates the function on the blinded value; and the client receives the response, unblinds it, and gets the real output value. Got it.
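For intuition, here is a toy discrete-log version of that blind/evaluate/unblind round trip. The parameters are deliberately tiny and insecure; the CFRG draft uses proper prime-order groups with real hash-to-group encodings, not anything like this:

```python
import hashlib, secrets

# Toy DH-OPRF in the subgroup of squares mod a safe prime.
p = 1019    # safe prime: p = 2q + 1
q = 509     # prime order of the subgroup of squares mod p
g = 4       # generator of that subgroup (a square mod p)

def hash_to_group(x: bytes) -> int:
    # Toy stand-in for a real hash-to-group map: hash into the exponent.
    e = int.from_bytes(hashlib.sha256(x).digest(), "big") % (q - 1) + 1
    return pow(g, e, p)

# Client: blind the measurement before sending it.
x = b"my measurement"
r = secrets.randbelow(q - 1) + 1
blinded = pow(hash_to_group(x), r, p)       # the server sees only this

# Server: evaluate with its epoch-scoped secret key k.
k = secrets.randbelow(q - 1) + 1
evaluated = pow(blinded, k, p)

# Client: unblind to obtain H(x)^k without ever revealing x.
unblinded = pow(evaluated, pow(r, -1, q), p)
assert unblinded == pow(hash_to_group(x), k, p)
output = hashlib.sha256(x + unblinded.to_bytes(2, "big")).digest()
```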
O: Okay, yeah. So, just as a quick comparison with Poplar1: in STAR, clients can send arbitrary auxiliary information with their data point, which may or may not be useful. As I mentioned, STAR's leakage reveals all the sets of messages that hide the same measurement, even if the threshold is not satisfied, which is obviously very important.

O: One of the things that Poplar1 does, and it may even be part of the functionality, is reveal all the heavy-hitting prefixes of strings, and for some of our use cases we only wanted to reveal the heavy-hitting string in its entirety, rather than the prefixes themselves.

O: And obviously STAR requires only a single aggregation server during the aggregation phase, which makes things a lot more cost effective, because you don't use as much bandwidth, and the computation itself is very minimal. In terms of how STAR could potentially fit into the PPM framework, or at least how we envision it:
O: the idea is that the leader and the collector in this diagram are, in STAR, the same entity, and there are just no helpers. Clients submit their reports via some mechanism, either OHAI or some other anonymizing proxy (it doesn't even have to be an anonymizing proxy, but using one massively improves privacy), to this entity, and the entity just performs the aggregation and learns the output.
O
So
we
we
we,
you
know,
as
I
mentioned
before,
star
is
kind
of
like
a
trade-off
between
trying
to
reduce
the
costs,
while
also
trying
to
maintain
some
like
meaningful
privacy
guarantees
so
and
also
not
having
like
noncluding
entities
work
together
to
perform
the
aggregation,
and
so
some
of
these
are
some
of
the
things
that
we're
trying
to
emphasize.
O
When
we
talk
about
the
advantages
of
star
and
also
for
functionality,
we
allow
auxiliary
data
which
may
or
may
not
be
useful
and
simple
cryptography
in
the
sense
that
we
don't
have
to
implement
quite
complex
new
protocols
in
order
to
build
the
aggregation
process,
and
so
just
to
conclude,
yeah.
O
We
think
star
provides
kind
of
some,
for
at
least
for
us
like
provides
like
a
private,
preserving
reporting
mechanism,
for
you
know,
entities
with
limited
resources
and,
without
you
know,
expert
implementation,
knowledge,
and
we
think
that
some
of
the
trust
assumptions
are
preferable
to
those
made
by
either
prior
or
popular,
and
so
just
to
finish.
O
Up
like
some
of
the
questions
we
wanted
to
ask
is
whether
the
working
group
is
kind
of
interested
in
this
as
a
alternative
protocol
spec,
and
if
it,
if
it
was,
would
this
star
draft
fit
into
like
the
working
group
and
also
the
ppm
kind
of
specification.
A: Thanks. Okay.

A: I have two things I wanted to say, the first not as chair. Okay, let me step back one more step. We have 12 minutes by my clock before the end of the session, and we have a lot of things that people want to talk about, so I'll try to be fast.

A: Could we represent that as one helper, and represent the aggregator as the other helper, and lay it out as a VDAF protocol, even if it's not exactly natural?
O: I think if there was some way of building extra functionality into the OHAI proxy, then maybe there would be some way of interacting with that in order to create some information and then sending the reports through to the leader, which could also be a single entity in the case of STAR, perhaps. But yeah.
A
Okay
and
secondly,
as
chair
I've,
I've
heard
several
comments
about
draft
adoption,
especially
for
the
priv
ppm
draft,
so
I'd
appreciate,
hearing
from
star
authors
and
other
people.
If
people
could
comment
on
whether
they
think
the
priv
ppm
draft
is
ready
for
working
group
adoption
thanks.
F: Alex, you mentioned the differences in trust assumptions. Do you need the randomness server and the aggregation server to not collude?

O: Yeah, so you require both of those entities to not collude. Okay.

O: Right, it's comparable, but because they don't have to collaborate during the aggregation phase, it's easier to split out the functionality. One of the hopes with OPRF-based functionality is that we would have entities running OPRFs as a service, and at that point it's easier, from a practical perspective, to argue that an application server and an entity that just runs an OPRF as a service are not colluding.
H: I want to make a comment that I'm not sure I'm buying that. There are two different ways to view non-collusion. For me, if there is a non-collusion assumption, there is a non-collusion assumption in either protocol, so I would challenge the statement that these are different trust models.
A
Thanks
mariana,
it
seems
like
the
something
strange
happened
with
the
queue
a
bunch
of
people
were
in
the
queue
and
then
got
out
and
back
in.
A
Eric
I
saw
you
at
the
top
of
the
cube
before.
If
you
want
to.
D: Well, I want to clarify, to make sure I understand your question, because you asked whether we thought the priv-ppm document was ready for acceptance, but then people were still talking about STAR. So do you want to talk about both, or what? Just to be clear.
A: It sounds to me, as chair, like the authors of the priv-ppm draft largely feel that it's ready for adoption, although I haven't heard everybody comment on that. I wanted to also ask the STAR draft authors, and everybody else in the working group, what they thought of adopting the priv-ppm draft, because, of course, we adopt by rough consensus.
D: So, since I am here, I will say that the priv-ppm draft is ready for adoption. I think that people were asking in the chat whether or not these were mutually exclusive. They're not; they're complementary. Each is good for some tasks. I don't think we have to choose: if we adopt priv-ppm, we can still adopt STAR later.

D: It's really clear that they're not subject to each other entirely, so I think they don't preclude each other.
J: Sorry, I've lost my train of thought while trying to describe it, so I will pass.

L: I have to wait for the wonderful Meetecho delay. So I sort of came into this thinking that the group would be adopting something; I think we'd be foolish not to do something that fits within our charter.
L: I don't see any alternatives for the sort of things that are being achieved by the Prio work, and I'd like to see a draft for that. I'm a little less clear on the practical aspects of the Poplar stuff, but I'm happy to take the PPM draft where it is right now. I'm not quite sure about the applicability of the STAR stuff just yet; there's a bunch of usage constraints there that need to be better understood.

L: I think that's also true to some extent for the Prio work as well. There are a number of questions that came up through the presentations which, I have to say, left me less confident rather than more confident at the end of them, which is not usually how you want these things to go. But overall I think we should be taking something on at some point; let's do it through a call for adoption on the list after this.
P: So, I guess it was remarked that STAR and priv, or Prio, I'm not sure what we're calling it, are compatible, and we can potentially do both. But looking at the charter, the charter says that we will deliver one or more protocols which can accommodate multiple PPM algorithms. So I wasn't really sure: if we did both STAR and Prio, would those be different protocols, or would those be different algorithms?
D: Do you want me to jump in? I think they'd be different protocols. That was the question that was being asked earlier, I think, about whether or not you could cram, you know... I mean, the priv framework was designed to accommodate multiple, as Patton was saying, VDAFs, which is the specific term it's using for algorithms, but I don't think it's very practical to cram STAR in as a VDAF.
K: I wanted to note the distinction between PPM, the framework that drives the VDAF execution, and the VDAF specification itself, which is happening in the CFRG, wherein the Poplar- and Prio-specific bits are being standardized. The question is whether or not PPM, as a wrapper around VDAFs, is an appropriate, you know, a good place to start. All of the complexities that come with Poplar and whatnot, I think, or most of the complexities that come with Poplar, can be relatively isolated and constrained within the context of the VDAF draft.

K: That said, there are certain aspects of the VDAF execution that bubble up into PPM, like how the collector needs to drive a collection, and there are multiple iterations of that. But in general I wanted to make that split: this is the protocol engineering work, and the CFRG VDAF stuff is more the crypto-specific bits.
J: So, I remembered what my question was, sorry. This is a question about STAR. It was unclear to me what the specific collusion risk is between the randomness server and the aggregator in STAR, and I wondered if anybody from the STAR team, or Alex particularly, wants to try to summarize what you think the collusion risk is there.

O: Yeah, sure. Essentially, if the randomness server and the aggregation server collude, then the randomness server can reveal the OPRF secret key to the aggregation server, and that moves the online attack back to being a local offline dictionary attack against the measurement space.
Q: David Gennazi, Google. I have to admit I feel a bit more confused now than I did two hours ago, but I'm...

Q: Okay, let me get the mic. I have to admit that I feel more confused now than I was two hours ago, but I'm going to attribute that to it being Friday and to my not having slept enough this week. I just wanted to say that this is a great starting point for the working group, and so I hopefully support adopting the PPM document.

A: Okay, Mariana: we tried to close the queue, but what I'll ask is that everybody who's read the draft, could you please comment in Jabber whether you've read the PPM draft, so we have some idea of how many people have read it. And maybe, while that's going on, Mariana can chime in.
H
I
guess
I
just
wanted
to
make
the
point
that
we
actually
implemented
both
prior
and
popular
and
in
the
question
of
whether
they
they
are
really
fitting
together
in
a
single
framework.
I
would
say:
yes,
I
think
there
are
like
traders
between
communication
and
computation,
but
the
main
challenges
that
we
kind
of
see
kind
of
seen
across
all
of
the
previous
presentations
are
common
across
this
protocol.
So
I
would
strongly
say
that
they
do
fit
in
the
same.
A
Okay,
thanks
to
everybody
for
for
telling
us
that
that
you've
reviewed
the
draft,
it's
great
to
see
that
there's
been
been
so
much
review.
Unfortunately,
we
don't
have
time
to
run
a
live,
a
live
poll.
I
think
it's
too
late
for
a
hum.
We
are
officially
out
of
time,
but
but
we
will,
I
think,
run
a
call
for
adoption
on
the
mailing
list.
A
I
want
to
make
sure
to
thank
dkg
for
volunteering
ahead
of
time
to
to
take
to
take
notes.
That
was
incredibly
incredibly
helpful.
It
saved
us
a
bunch
of
time
and
made
time
for
all
of
this
great
conversation,
roman.
G: Roman here. I just wanted to jump in at the end, because I should have done it in the beginning. I wanted to thank you, Ben, and Sam for stepping up to serve as chairs, and to convey how exciting it is to have a working group. I also wanted to thank, as I did in SAAG, the proponents of the working group for really helping us get from a BoF to a working group by the next meeting, which often doesn't happen, and that all comes through hard work and preparation.

A: Yes, thanks again to Joe for being on site and making sure everything ran smoothly today. Okay, I think that this session is over. IETF 113 is over. Thanks, everybody, for coming, and see you on the list.