From YouTube: Lunch + Fireside Chat with David Mazières and Juan Benet - Juan Benet, David Mazières
Description
David Mazières is a computer scientist, best known as creator of the Stellar Consensus Protocol and coauthor of Kademlia, the peer-to-peer distributed hash table used in IPFS. He will be joined by Juan Benet, inventor of IPFS and Filecoin, in a dynamic and wide-ranging conversation about challenges and opportunities in designing large-scale distributed systems.
A
David and many others have gone on to build other distributed systems and file systems and operating systems, and now consensus protocols like Stellar and so on. What I wanted to talk about today is a set of ideas in peer-to-peer and distributed systems and secure systems that are really good and promising but haven't quite taken hold yet. There are all these really great ideas and research results that make a ton of sense but, for whatever reason, get stuck in the R&D pipeline and don't make it all the way into production use. Sometimes there are good reasons for that: actually the idea wasn't as good, or it's just not the right time, or there's no good implementation, or it's just difficult to shift the world. Content addressing is one great example of this. It made a lot of sense from the beginning, but shifting the entire structure of the internet from location addressing to content addressing is quite difficult, so you have to chip away at the problem for a while. So maybe let's just start with: how do you think about R&D? How do you approach research ideas and producing work, and then how do you want them to get fleshed out and productionized in the world?
B
Okay, well, there are several questions there. How do I approach R&D? Well, I have the luxury, being at Stanford, of being able to work with super smart people and do whatever I want. So the question is: what should we do? It's kind of a blank slate, and I always come down to the same three questions when we have an idea, or a student approaches me with an idea. One: why would anyone want this? There's stuff that you think is good, but then you dig into it and realize you wouldn't ever use it in practice. Two: is there any chance that we can actually pull this off? I could come up with some amazing-sounding thing, but if it's actually impossible, our chance of success is zero, and then it's not worth it. And the third question is: okay, let's do the really half-assed thing and try to solve it with today's existing ideas. Can we get 90% of the way there? Why isn't the half-assed solution good enough? Once you answer those three questions, then you can identify an idea that's maybe worth putting time into.
A
Once you identify those ideas and put some time into them and maybe get some results, how do you then go about getting that thing adopted? I'm sure you can think of so many ideas that are just stuck in paper form but haven't made it. How do you arrange the pipeline to get some of those ideas built out faster?
B
I mean, particularly as an OS researcher, backwards compatibility is kind of the number one impediment to getting stuff out there. You can build a brilliant new operating system kernel, but if it doesn't run a web browser, you're not going to have a lot of traction on, say, people's desktops. So the answer is: try to slice the problem up in such a way that you don't have to solve everything. One way to do that, taking the example of operating systems, is to look at new, emerging areas. Maybe it's too late for the desktop, but you don't need backwards compatibility for whatever is running the embedded processor on your fridge, so you can use the new area.
B
But maybe there are these other areas too, so the other thing you can do is try to make the problem easier. My then student Adam Belay, who's now a professor at MIT, had this project called Dune, and the idea of Dune was: let's use the virtualization hardware in x86 servers not to implement a virtual machine abstraction, but to implement a Linux process abstraction. So basically you run a Linux process, but it's running in the kernel mode of the guest mode of the CPU, and if you use a VMCALL instruction you can actually get a Linux system call. Because the problem with operating systems is that you have a brilliant idea for, say, how to redo the network stack, so you implement it, and then to make it useful you have to implement a file system, and you have to implement a windowing system, or whatever. Here the idea is that you can do just the part that's interesting, like just redo the networking stack, say, and if you need a file system you can just pass that through using VMCALLs. So you can literally call printf in the middle of an interrupt handler, and there is a standard output connected to the standard output of that Linux process. That has been really useful. There have been a bunch of research projects that managed to use Dune, and it made their lives a lot simpler. Sometimes problems just require a huge effort, but I think often you can figure it out. The nice thing about working at the level of software and operating systems is that often you can figure out how to do it with less effort.
A
Yep. And when you think about the range of work that you've been doing, from peer-to-peer file systems to operating systems to consensus and so on, there's probably a set of ideas there that you thought were really great but just haven't made it through. Can you maybe mention two or three of those, and we can dig into them?
B
What are great ideas that have not made it out there? I don't know. Well, there are principles and then there are ideas. There are technologies that I wish were out there, and it's just unfortunate that they aren't. Just as an example, not that I've contributed in this area: password-authenticated key exchange. To me it's grotesque that we're typing passwords and sending them in clear text to servers, when we literally have better solutions than this, even with passwords. Of course a lot of people now are saying, oh, passwords don't work, get rid of them, but actually we could do passwords better. As for what's standing in the way: I had a student, Quinn Slack, who did a really pretty tasteful implementation of PAKE, password-authenticated key exchange, in the browser, and we went and talked to the people at Mozilla, and they said, well, the problem is that the people who are building websites want to have control over the failure path.
B
You know, the small advantage that they currently have, in exchange for a much, much bigger advantage. In terms of principles, though, I would say there are principles that guide my work that cut across a lot of areas. One is that I really deeply believe in egalitarian interfaces and APIs. One really unfortunate thing that you can do in a system design is have privileged and unprivileged programmers, users, or software with access to qualitatively different APIs, because then you get this thing like on Unix and Linux where, essentially, root is the garbage can for all that stuff. For example, there's protection built into Linux, called user IDs, and yet applications for the most part can't use user IDs to sandbox themselves, because it would be more of a pain to install them and it wouldn't just work: you'd have to create a new user ID and so on. So we do other things to sandbox software, and we're giving up the ability to use the hardware to do that. The principle would be: instead, try to make sure that privilege is a quantitative difference, not a qualitative difference. That's certainly something we've taken to heart in the work at Stellar.
B
What we're trying to do there is make sure that any random person who wants to innovate and, say, a big bank have access to the same APIs, even if they have different assets, so there's just more ability to innovate. And then I guess the second principle would be, I don't know, I guess you could call it the weight-and-balance principle.
B
If you're designing an airplane, you can't lard it up with too many features, because if it's too heavy it's not going to fly. Unfortunately, we've kind of lost that with software: you just pile in the libraries and whatever. And the other thing people do is just throw more engineering effort into something, and it's easier to write code than to delete code. You see it at, say, Stanford, where we're doing research in small teams: you have to make these design decisions.
B
You can't do everything all at once. I guess if I could add a third one: if you're designing systems, think about the graph of the number of hours of experience people have with your system, or as developers using your platform, versus the sort of power that they have using it. What you want is a high y-intercept but also a kind of constant slope. Some systems, like, say, C++ as a programming language: to really do stuff you need something like two years of experience. There's something wrong with your design if that's the shape of that graph.
A
Yeah. How does that apply to, say, software libraries? C++ makes sense: you have to learn an enormous amount. Python: you can get started really quickly and already leverage a lot of power. How do you see that applying in the design of software? Does that go into API design, or into feature-set design?
B
It goes into API design and it also goes into tooling. One of the things I spent a fair amount of time on at Stellar, which is the blockchain I'm involved with, is stuff that's not super research-worthy, like implementing a kind of assembly language for Stellar transactions. Because it looked like either you're really advanced and you're basically implementing to our Go APIs, or you're basically using a wallet, and there wasn't a gradual path in between. If you wanted to just write a script to generate a bunch of transactions, it was a pain to do that; you'd actually have to write a Go program. With my silly tool called the Stellar transaction compiler, you can suddenly write shell scripts, or just simple command lines, to generate transactions, to try to fill in that gap. So I guess: look at the gaps. People who have, say, 9 to 15 months of experience with your system: what can they do if they aren't experts? If there's a trough there where there's not a payoff, find ways to fill in that area, and you create a smoother on-ramp for people to use your system and become experts in it. You've got to get those endorphin rushes, right? "Oh, I did it, it works!" That's what motivates a lot of us developers.
A
Right. Let's dive into content addressing. There's actually a question that I've never quite been able to figure out: where did the idea of hash linking in file systems start? I've traced it back to SFSRO, and then probably before that, Fossil and Venti in Plan 9 at Bell Labs already had the beginnings of it. I think you interned there for a while, so I was wondering if you brought that idea there, or learned it there.
B
I think SFSRO might actually have predated Venti. But yeah, I remember it specifically. For my dissertation I'd done this file system called SFS, the self-certifying file system. The key idea, again following the principle of filling in the gaps and making the system more accessible, was to embed the public keys of servers in path names. Then certificate management would just be creating a bunch of symbolic links, because a certificate is really just a way to bind a human-readable name to a public key. Rather than learn how to fill out an X.509 certificate request or whatever, everybody who can do a little bit of shell programming understands what a symbolic link is.
B
So it's much easier to do that, and if you did it that way, a certificate authority just became a directory full of symbolic links assigning human-readable names to these other path names that contain public keys. And so we built that. Of course, for a certificate authority to be secure, you need to have the signing key be offline, so we had two dialects of the protocol.
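[Editor's note: the symlinks-as-certificates idea can be sketched in a few lines. This is an illustration, not SFS code; the `/sfs/host:fingerprint` layout and SHA-256 fingerprint below are stand-ins chosen for the example.]

```python
import hashlib
import os
import tempfile

# A self-certifying path embeds the server's public-key fingerprint in the
# name itself, so resolving the path also tells you which key to expect.
def self_certifying_path(hostname: str, pubkey: bytes) -> str:
    fingerprint = hashlib.sha256(pubkey).hexdigest()[:32]
    return f"/sfs/{hostname}:{fingerprint}"

# A "certificate authority" is then just a directory of symbolic links,
# each binding a human-readable name to a key-bearing path.
ca_dir = tempfile.mkdtemp(prefix="ca-")
target = self_certifying_path("files.example.org", b"demo-public-key-bytes")
os.symlink(target, os.path.join(ca_dir, "example-files"))

# Resolving the friendly name recovers the self-certifying path (and key).
assert os.readlink(os.path.join(ca_dir, "example-files")) == target
```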
B
A read-write dialect where the server is online, and then a read-only one where the stuff was signed. In the first version of the read-only protocol, we signed all the different pieces of the file system individually. Then I remember talking to my advisor, Frans, at the time, and he was talking about how we were going to redo this, and I said: no, no, no, I've got the way we're going to redo this, much simpler. There are only going to be two RPC calls. One: get me a digitally signed message that contains the hash of the root directory. Two: here's a hash value, get me the preimage. That's all you need server-side, so we could make the server side super simple. That's what led to SFSRO, which I think was in OSDI 2000, so I think it was a little bit before Venti.
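[Editor's note: the two-RPC read-only design can be sketched as a toy in-memory server. The keyed hash standing in for the signature, and the JSON root directory, are simplifications for illustration; the real system used public-key signatures and its own on-disk format.]

```python
import hashlib
import json

SECRET = b"demo-signing-key"  # stand-in for a real private signing key

def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ReadOnlyServer:
    """The entire server side is two calls: signed root, and hash -> preimage."""
    def __init__(self, files: dict):
        self.store = {}                        # hash -> preimage
        root = {}
        for name, content in files.items():
            self.store[h(content)] = content   # leaf blocks
            root[name] = h(content)
        root_bytes = json.dumps(root, sort_keys=True).encode()
        self.store[h(root_bytes)] = root_bytes
        self.root_hash = h(root_bytes)

    def get_signed_root(self):                 # RPC 1
        return self.root_hash, h(SECRET + self.root_hash.encode())

    def get_preimage(self, digest: str):       # RPC 2
        return self.store[digest]

# Client: verify the root "signature", then walk hashes; every fetched
# block is self-verifying, so the server need not be trusted for integrity.
srv = ReadOnlyServer({"README": b"hello, world"})
root_hash, sig = srv.get_signed_root()
assert sig == h(SECRET + root_hash.encode())
root = json.loads(srv.get_preimage(root_hash))
data = srv.get_preimage(root["README"])
assert h(data) == root["README"] and data == b"hello, world"
```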
A
Yeah, and that's the core hash linking for data that IPFS and git and so many other things are based on. I think we're still on the quest of adding content addressing to the network. We're now at, what, tens to hundreds of millions of people benefiting from this, but not yet billions; we'll get there. But as you think about that set of ideas: hash linking is this extremely powerful primitive, and yet it's 2022 and so many systems out there are still linking to all these mushy, dynamic, not certified, not authenticated data structures. What has gone wrong here? Is it too hard? Is the idea too complex?
A
We sometimes talk about it as people having to go through their Merkle journey of understanding that when you hash-link everything, everything gets better: you can move everything into different spots, and you get all the security properties and distribution properties. And that idea space is, for whatever reason, not as prevalent as something equally complex, like encapsulation in protocols and the protocol stack in networks, or reliability in reliable transport. All these ideas seem equally complex, but for whatever reason hash linking isn't quite there.
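[Editor's note: for readers partway through that Merkle journey, the core of hash linking fits in a few lines. A generic sketch, not IPFS or git internals.]

```python
import hashlib
import json

store = {}  # content-addressed block store: hash -> bytes

def put(data: bytes) -> str:
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest

def put_node(links: dict) -> str:
    # A directory-like node whose children are referenced by hash.
    return put(json.dumps(links, sort_keys=True).encode())

def get(digest: str) -> bytes:
    data = store[digest]
    assert hashlib.sha256(data).hexdigest() == digest  # self-verifying fetch
    return data

leaf = put(b"some file contents")
root = put_node({"file.txt": leaf})
# Anyone holding only `root` can fetch blocks from any untrusted replica
# and still verify every byte; changing the leaf changes every hash above.
assert json.loads(get(root))["file.txt"] == leaf
```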
B
One is that it's actually providing security at the transport layer by encrypting your communications, and the second is that most of the implementations build in this X.509 certificate stuff. What you really would like to do is disaggregate these things and be able to authenticate both contents and endpoints in different ways. So a problem with the web is that you can't name a public key. You can do this key management in X.509 and get your HTTPS URL, and then you can get a web page, which is a bunch of HTML, but then you can't use what you get back to actually name another public key. So the key management can't be part of the system itself; it doesn't have this self-referential property.
B
But I'm saying: the systems a lot of people use might change, but way more people use the web, and the thing that is probably many people's first experience of the internet now, using a web browser, just doesn't have the flexibility to implement your own key management or content authentication.
A
As you think about blockchains and where they might go, not just the public large-scale cryptocurrency blockchains, but in general this idea of using consensus with hash linking and maintaining these logs: you could redo the CA system this way, you could redo DNS this way, you could have secure time this way. How do you see that sinking into the network stack?
B
In the end nodes. And, throwing in a fourth principle here, I view a lot of systems design as keeping in mind a more generalized version of that: you don't just have end nodes and the core of the network, you have routers that are closer to the user and that the user has more control over, and within an end node you have stuff that you could do in the kernel versus stuff you could do in a library versus stuff you can do in the application. The more you can move that functionality out, say towards being in a library, the better, I think. So the question is: should content addressing replace IP? I think probably not.
A
But HTTP, for example. Today you use HTTP to pull content: you ask for a file path and you get back some bytes. Maybe you should ask for a path, get back a hash, take the...
B
Hash,
and
so
your
question
is
how
should
that
sink
into
the
network
core
and
so
I
think
maybe
like
an
interesting
open
resource,
question
and
I
do
have
a
couple
of
students
at
Stanford
were
thinking
about
sort
of
thinking
about
this?
B
Is
that
a
lot
of
these
dhts
that
you
can
use
are
not
Byzantine
fault,
tolerant
and
building
like
a
fully
decentralized
Byzantine
fault,
tolerant,
distributed
hash
table
seems
like
a
very
hard
problem,
and
so
the
question
is:
if
we
could
make,
you
know,
ask
a
little
bit
more
of
the
core
Network.
A
So,
as
you
think
about
dhcs,
one
of
the
things
that
we've
run
into
is
just
the
speed
like
when
you
want
to
use.
So
in
epiphast
we
use
a
DHE
when
we
use
gadamia
to
do
the
discovery,
so
the
the
content,
routing
part
of
the
problem.
So
if
you
have
a
hash,
how
do
you
identify
what
nodes
in
the
network
have
the
content?
So
you
can
go
get
it
right.
We
call
this
providing
so
content.
A
Providing
we
create
these
provider
records,
we
put
them
in
the
DHC,
a
user
trying
to
look
it
up,
looks
at
the
DHT
gets
a
provided
record
can
go
and
find
them,
but
the
HDs
have
a
little
space
per
node
and
then
they
have
really
large.
You
know
round
trip
time
distance
to
all
of
the
nodes,
so
traversing
these
dhds
gets
really
expensive
when
you
want
to
do
something
like
a
page
load
right.
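[Editor's note: the providing flow described here can be sketched with the DHT abstracted to a dictionary. In a real Kademlia DHT, each put/get below is an iterative lookup costing multiple network round trips, which is exactly the expense being discussed.]

```python
import hashlib

dht = {}    # content-hash -> set of provider ids (the DHT's job)
peers = {}  # provider id -> {content-hash: bytes}  (the peers' job)

def cid(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def provide(provider: str, data: bytes) -> str:
    digest = cid(data)
    peers.setdefault(provider, {})[digest] = data
    dht.setdefault(digest, set()).add(provider)  # publish a provider record
    return digest

def fetch(digest: str) -> bytes:
    for provider in dht.get(digest, ()):         # content-routing step
        data = peers[provider].get(digest)
        if data is not None and cid(data) == digest:  # self-verifying
            return data
    raise KeyError(digest)

digest = provide("peer-1", b"a block of content")
assert fetch(digest) == b"a block of content"
```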
A
You want it to be incredibly fast; can you get close to O(1), or do you end up with DHTs that are a lot smaller? One route is this content-indexing pathway we've gone down, which is: instead of having, say, hundreds of thousands or millions of DHT nodes, have tens to hundreds, but make those nodes extremely large. You lose some of the security of the decentralization of those nodes, but you now have tons of records in a smaller set of machines. Which of these pathways do you think we're going to take?
A
Can we hack the massive-scale DHT systems with millions of nodes and somehow solve the speed-of-light problem, or do we really want to go for tens to hundreds to maybe thousands of nodes on interconnects in data centers, and make these massive routers there?
A
So it's speed of light when you have millions of nodes and you don't yet have a secure communication setup, a secure channel. You end up in this problem where, if you want to get the information from the next party, you have to establish a new secure connection to them, and that ends up costing a bunch of round-trip times.

B
Well, why does it have to be secure?
A
Because,
because
of
privacy
reasons,
you
don't
want
to
necessarily
just
tell
everybody
so
two
two
parts:
one
is
you
don't
want
to
flood
the
network,
so
you
want
to
do
iterative
discovery
which
is
slower.
If
you
do
recursive,
you
then
need
some
other
incentive
structure
in
the
DHT
to
make
sure
that
you
know
you
don't
flood
it
and
then
second,
you
want
privacy
in
the
queries,
and
you
want
to
be
careful
about
who
you
necessarily
convey.
What
information
that
you're
looking
for.
B
I,
don't
know
I
guess
I
would
I
would
push
back
on
that
assumption
a
little
bit
so,
first
of
all
like
it
depends
what
you're
using
right
you
could.
You
could
certainly
Implement
a
custom
protocol
that
would
that
would
require
fewer
round
trips.
B
Yeah, and the other thing is, suppose at the time you learn a node's ID you also learn its public key. You could potentially do a non-interactive Diffie-Hellman key exchange, so you can basically send the thing encrypted. You lack forward secrecy there, but given the level of casual threat we're talking about, this is only going to work against a sort of casual attacker who's eavesdropping on the Wi-Fi anyway, so you have to find the right balance. Anyway, the point being: already we're talking about stuff other than the speed of light.
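[Editor's note: the non-interactive idea can be sketched as follows. If a node's routing-table entry already carries its long-term public key, a querier can derive a shared key and encrypt its request in the very first packet, with zero extra round trips, at the cost of forward secrecy. The small prime group below is a toy for illustration only, not a secure choice; real systems would use something like X25519.]

```python
import hashlib
import secrets

P = 2**127 - 1  # a Mersenne prime; toy group, NOT for real use
G = 3

def keypair():
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

# The DHT node publishes (node_id, node_pub) once, e.g. in routing tables.
node_priv, node_pub = keypair()

# The querier derives a shared secret from the published key alone...
q_priv, q_pub = keypair()
shared_querier = pow(node_pub, q_priv, P)

# ...and the node derives the same secret from q_pub in the first packet,
# with no handshake round trips.
shared_node = pow(q_pub, node_priv, P)
assert shared_querier == shared_node

# Hash the shared secret down to a symmetric key for encrypting the query.
sym_key = hashlib.sha256(str(shared_querier).encode()).digest()
```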
A
But
but
the
way
this
connects
is
that
you
have
these
problems
that
cause
that
add
another.
You
know
round
trip
and
when
you
have
other
round
trips
and
you're,
trying
to
Traverse
a
very
large,
you
end
up
end
up
in
the
sequential
problem
where
you
have
to
like
find
out
a
bunch
of
information
and
in
order
to
find
out
the.
B
Information
I
understand,
but
but
what
I'm
saying
is
that
we're
it's
important
to
say
that
these
round
trips
aren't
like
this
experience?
The
the
performance
bottleneck
isn't
the
speed
of
light.
It's
other
things.
It's
like
node
failure
right.
Why
is
node
failure?
What's
you
know?
What's
probably
like
the
number
one
concept
of
node
failure?
Well,
like
would
a
nightmare
not
traversal
is
to
implement
right.
B
So
if
you
were
to
sort
of
in
terms
of
what
could
you
ask
in
the
network,
it's
like
well,
you
know,
start
rolling
out
IPv6
and
you
know
have
some
range
of
ports
that,
like
applications,
can
use
that
are
basically
not
not
firewalled
so
like
these
are
things
that,
like
you,
could
ask
from
from
the
network
and
and
I
think
that
that
you
could
do
okay,
I
mean
I,
I,
I
I.
A
You could look at the grapevine of the internet and put these wherever there's high connectivity in the pathways. If you look at the tree structure and you put content routers in the vertices that have lots of downstream parties, you can then increase the amount of content there, because storage is getting super cheap.
A
We
these
things
are
going
to
carry
terabytes
now
and
then
it'll
be
tens
of
terabytes
within
not
too
long
a
server
rack
is
now
a
petabyte.
So,
like
you
can
put
a
petabyte
of
Records
into
you,
know
a
your
basement
or
a
pair
device
records
into
this
building
and
then
suddenly
have
this
massive
content.
Router.
B
Then it's going to be a big deal. So there's an argument to be made that, okay, maybe if Keccak, you know, SHA-3, is the hash to end all hashes, then we're happy to just commit to that and do this, and maybe the interface is simple enough that we feel we have the right thing, and we're ready to bake it in stone.
B
Baking in unupgradeable protocols, though, I'm a little wary of that, so I'd want to make sure that we could have competing versions of the protocol at end nodes if necessary.
A
No, I mean, instead of committing to Keccak, you just add a byte ahead of time telling you which hash function you're going to use.
A
We already have that in IPFS: we use SHA-256, we use BLAKE2, and we use a bunch of other hash functions, and it's already all integrated, so you can navigate through the pathways, and then you just disable hashes as they break or get weaker.
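[Editor's note: the byte-prefix idea, multihash in IPFS terms, can be sketched like this. The codes below are illustrative placeholders, not the actual multihash table.]

```python
import hashlib

# Self-describing digests: a prefix byte names the hash function, so the
# format can outlive any single hash algorithm.
ALGORITHMS = {0x12: hashlib.sha256, 0x13: hashlib.sha512, 0xB2: hashlib.blake2b}
DISABLED = set()  # algorithms get switched off here as they weaken

def digest(data: bytes, code: int = 0x12) -> bytes:
    d = ALGORITHMS[code](data).digest()
    return bytes([code, len(d)]) + d          # <fn-code><length><digest>

def verify(data: bytes, mh: bytes) -> bool:
    code, length, d = mh[0], mh[1], mh[2:]
    if code in DISABLED or len(d) != length:
        return False
    return ALGORITHMS[code](data).digest() == d

mh = digest(b"hello", code=0xB2)              # pick BLAKE2b for this object
assert verify(b"hello", mh)
DISABLED.add(0xB2)                            # later: the algorithm "breaks"
assert not verify(b"hello", mh)               # old digests stop validating
```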
B
But
so
maybe
we're
at
a
point
where,
where,
if
you
could
like
basically
commit
to
your
software
and
say
like
I'm,
comfortable,
never
upgrading
this
again,
modular
minor
bug
fixes,
like
you'd,
be
happy
doing
that.
B
And
so
then
you
just
have
to
make
the
argument
that
show
that
it's,
like
the
the
benefit
you
get
from
pushing
into
the
network
is
worth
the
lack
of
upgradability
and
the
lack
of
competition
from
other.
Potentially,
let's.
A
Let's talk about privacy for a moment. As you think about preserving privacy on reader and writer paths: writer privacy is hard, and reader privacy is way harder. When you think about how to make content publishing and content viewing properly private, of all the approaches that you've seen tried, which do you think is the most promising?
B
I'm not sure, because there are a bunch of different approaches, and none of them is ideal, and they don't really compose. Private information retrieval, for example, is really cool, but it's really expensive on the server side. Another thing you could do is split trust and say, well, these two organizations would need to collude to figure out what I'm looking up; but that's not a great thing either. And then you can do sort of half-assed things where you have at least plausible deniability because you're fetching something random, but statistically people are still going to figure out what you're doing. So I don't know, because none of these is really...
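[Editor's note: the split-trust variant mentioned here is classically realized as two-server PIR. A minimal XOR-based sketch, with a made-up database of equal-length records: each server alone sees a uniformly random subset of indices, and only by colluding can the two learn which record was wanted. Note that each server still touches a large fraction of the database, which is the server-side cost discussed above.]

```python
import secrets

DB = [b"rec-0...", b"rec-1...", b"rec-2...", b"rec-3..."]  # equal-length records

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def server_answer(indices: set) -> bytes:
    # Each server XORs together the records at the requested indices.
    out = bytes(len(DB[0]))
    for i in indices:
        out = xor_bytes(out, DB[i])
    return out

def query(wanted: int) -> bytes:
    mask = {i for i in range(len(DB)) if secrets.randbits(1)}  # random subset
    q1 = mask                 # sent to server 1
    q2 = mask ^ {wanted}      # sent to server 2: same subset, wanted flipped
    # XOR of the two answers cancels everything except DB[wanted].
    return xor_bytes(server_answer(q1), server_answer(q2))

assert query(2) == DB[2]
```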
A
I mean, the servers are pretty fast now; we're talking about using zero knowledge, and in five or ten years we'll be using fully homomorphic encryption, so it's...
B
Super private, but for truly private information retrieval, it basically only works if the server touches every single piece of data, because if it doesn't touch every piece of data, then it knows that you didn't fetch a particular record. So if you want perfect privacy, there's no way to do it sublinearly.
A
No, no, I mean the lookups that you might be doing. A human does a certain amount of lookups, and sure, that might increase somewhat...
A
You can probably add some kind of structure to this to break down the problem, right? You could do some bucketing first, and then private information retrieval within buckets.
B
So
maybe,
and
then
you
have
to
look,
but
so
that's
my
point
right
that
there's
all
these
trade-offs
and
you
know
another
thing
you
could
do
like
first,
you
also
have
to
think
about
censorship
resistance.
So
one
of
the
things
that
we
did
I
had
a
student
at
NYU,
Mark
Waldman,
who
did
he
built
a
system
called
tangler,
and
the
idea
was
that
all
of
the
data
that
was
stored
was
kind
of
information
theoretically
unrelated
to
the
content.
A
A 3x overhead is not terrible; that's pretty good. So, one thing: we're at lunch now, and David has kindly agreed to do office hours with folks here who are doing distributed-systems work, working on content routing or DHTs or CDNs or hash linking and so on.
A
So
we'll
right
after
lunch,
we'll
go
to
the
terrorists
or
some
somewhere
up
there
and
if
you're
interested
in
like
talking
about
some
of
the
problems
that
you're
working
on
some
of
the
tech
that
you're
building
and
get
feedback
from
David
we'll
we'll
do
that
we'll
kind
of
just
to
keep
it
to
like
five
five
to
ten
minutes
and
then
so
that
other
people
get
get
that
and
yeah
after
lunch.
I'll
take
maybe
two
questions
from
the
audience
and
then
we'll
we'll
stop
for
lunch.
C
Hey, good afternoon. I landed in the middle of the talk about private data retrieval, which I'm very interested in. I imagine you could mix data, right? You could mix data from various sources, so that one file has information from five different sources mixed together, and then you can request a bit from one server, from another server, from another server, and the specific combination of data that you retrieve from the various servers could be reintegrated into the exact data you need, and that way none of the servers...
B
That was the Tangler idea, and there are kind of two things there. One is the fact that all the data is tangled up, so when you request blocks, those blocks could be used to reconstruct multiple documents: it doesn't say what you're doing, and you couldn't censor a document without causing collateral damage to other documents. The second part of it is that you need a way of shuffling the data around such that if a server cheats on even one data block, it can get kicked out of the system, and such that nobody is permanently responsible for any block, so that even if someone is being bad, the data will eventually move to other places.
A
Yeah, that depends on the system design and what guarantees you want to preserve, and sure, you're going to end up with many, many nodes involved in the protocol.
B
Can
check
out
our
tangler
paper
and
it's
from
like
1997
or
something.
But
you
know
from
by.
A
The
way
just
tons
of
amazingly
good
ideas
are
in
the
literature
like
just
have
not
been
implemented
for
some
reason
or
have
been,
but
like
got
stuck
somewhere
in
the
r
d
Pipeline
and
you
know
waiting
to
be
to
be
distributed.
All
right
final
question
just
raise
your
hands.
D
You
talked
earlier
a
bit
about
the
like
sort
of
combining
Byzantine
fault
tolerance
with
distributed
hash
table.
Could
you
talk
about
like
I
guess
the
security
model
that
like
or
distributed
hash
tables,
have
right
now
and
how
Byzantine
fault
tolerance
like
comes
into
play.
B
Yeah,
it
was
basically
not
good,
so
there's
we
don't
really
have
a
good
solution
to
this
problem
right.
So
what
you
know,
people
can
do
things
with
sort
of
admission
control
to
try
to
prevent
civil
attacks,
but
it's
fairly
unsatisfying-
and
you
know
my
then
student
Michael
Friedman
who's
a
professor
Princeton.
Now
he
built
this
thing
called
the
choral
content,
distribution
Network
and
the
goal
is
always
to
like
open
this
up
to
a
bunch
of
you
know
and
let
anyone
participate
we
never
ended
up.
B
Doing
that
and,
and
in
part,
is
because
we
couldn't
trust
the
the
nodes
right.
Someone
could
mess
it
up
and
well,
it
was
it
was
that
and
the
fact
that
you
couldn't
authenticate
content
in
a
in
HTTP
and
HTML
because,
like
what
you
really
want
to
do,
is
you
know
fetch
content
from
this
caching
layer
and
not
have
to
trust
the
caching
layer
for
Content
integrity
and
just
because
again,
the
browsers
aren't
designed
in
the
way
that
I
wish
they'd
been
designed.
You
can't
do
that
very
easily.
Yep.
A
Well,
thank
you
very
much
David,
thank
you
for
being
here
with
us
and
sharing
the
knowledge
so.
B
What
1pm
Terrace.
B
The
Terrace,
okay,
yeah,
maybe
see
some
of
you,
then
thanks.
So
much
thanks.