Description
Slides: https://drive.google.com/drive/folders/1ZLdT20HFyMu7E-DxXZfQNHpFetD0OOcp?usp=sharing
Berlin Ethereum Meetup on May 25, 2022:
Joel Thorstensson from Ceramic Network on "Designing a web-scale data network, secured by Ethereum"
B: Cool — well, it was super cool to see your full presentation about Bitcoin and all the ideology that came into it. So I'm going to look one layer in here, at how we can actually store all the data we need to create these pluralistic profiles of the individuals that contribute to these systems. The network we've been building is really designed to allow us to have web-scale data throughput in a decentralized way, and I'll dive into what that means.
B: So if you missed that, it was a really good conference. And while this is talking from the perspective of a knowledge graph, it really generalizes to a graph of contributions and application data generally.
B: So, what we set out to achieve — maybe to take a step back: when we started building Ceramic, we were really trying to build a system for identity, but we realized that identity is not really about you going to some institution and getting some credential — getting a passport officially stamped by a government.
B: If you think about your real-world identity, it's more about the relationships you have with people and the relationship you have to the world — your interactions, essentially. So we wanted to capture that in a digital form, and we started thinking about it as a knowledge graph of the internet, or more generally a contribution graph. It needs to be living and relational: the data needs to relate to each other, but the relationships between people need to be captured too.
B
How
they
interact
online
cannot
also
be
captured,
and
one
way
to
think
about
this
is
as
an
emergent
web
of
trust.
Some
of
you
might
be
familiar
with
like
this
old
pgp
web
of
trust
project.
B
But
if
we
kind
of
can
have
interactions
be
digitally
signed,
we
just
start
to
kind
of
have
an
emerging
type
of
trust,
and
from
that
we
can
start
to
extract
this
kind
of
like
social
proximity
which
which
the
kevin
and
the
git
going
guys
talked
about.
Just
now,
and
once
we
have
this
kind
of
open
data
graph
for
webp,
we
start
to
do
like
collaborative
sense
making
in
that,
and
so
I'm
going
to
talk
about
some
of
the
properties
that
is
really
needed
to
achieve
this.
B
First
of
all,
we
want
to
be
able
to
share
data
across
applications
and
across
the
organizations.
We
don't
want
the
data
to
be
logged
into
like
big
stakeholders
that
just
hold
the
data
port
data
for
themselves
and
kind
of
keep
it
captured,
and
we
want
people
to
plug
into
this
system
and
optimize
for
their
specific
workflows,
everyone's
not
going
to
want
to
query
data
in
the
same
way
from
the
system,
and
we
want
the
data
to
be
composable
if
a
git
coin
creates
their
kind
of
profile
that
adds
verifiable
credentials.
B: Finally, we need authenticity. The system needs to be censorship-resistant — we don't want arbitrary actors to be able to remove stuff — and we want every action to be authenticated. This essentially means that users, or rather accounts, sign the data. An important piece to realize here is that this system doesn't require any real-world identity information, whatever that would mean. And we also want secure timestamping.
B: Well, the problem with blockchain systems, as some of you might have noticed, is that they don't scale very well. The reason for this is that they favor something called strong consistency. Essentially that means all transactions need to be ordered in a particular way — ordered completely — and we can't have two nodes with different ideas of what the state of the blockchain is.
B
This
great
prevents
double
spends
allows
us
to
do
all
these
nice
financial
things.
We
can
do
fund
management
through
those
we
can
fund
public
goods.
We
can
have
cool
nfts
that
can
have
a
lot
of
use.
Cases
beyond,
like
you
know,
only
like
pictures
online,
but
yeah.
The
the
main
limitation
here
in
throughput
is
that
at
every
block
there
needs
to
be
an
individual
block
producer,
that's
choosing
by
whatever
consensus
mechanism
you
have.
B
It
produces
the
block
which
essentially
makes
the
scalability
limit,
whatever
one
node
in
the
network,
the
smallest
node
in
the
network
can
produce
or
can
compute,
so
there's
essentially
two
ways
that
different
projects
are
taking
to
scale
blockchains
for
data
and
there's
two
camps.
So
big
block
camp
is
solana,
celestia
army.
They
basically
have
different
mechanisms
of
kind
of
convincing
themselves
in
the
community
that
like
hey.
B
This
is
secure
and
that's
all
well
good,
but
the
problem
is,
you
still
have
a
big
computer
that
needs
to
process
all
the
transactions
and,
if
you
think,
about
applications
such
as
twitter
and
facebook
and
messenger,
and
these
things
they
can't
scale
by
having
like
one
centralized
server,
they
actually
need
to
have
a
huge
distributed
system.
So
thinking
that
we
can
scale
to
web
scale
with
with
a
blockchain
system
that
you
set
big
bots,
that's
pretty
far
we're
gonna
get
pretty
far
off
the
mark.
There's
another
approach
called
proof
of
storage.
B
Essentially
that
means
that
you,
as
the
user
of
the
system,
you
can
make
a
an
agreement
with
some
node
or
set
of
nodes
in
the
network
like
hey,
you
can
store
this
big
chunk
of
data
and
there's
like
proofs
that
makes
kind
of
guarantees
that
the
data
will
be
there
when
you
want
it
back
just
great
now
we
have
a
much
more
throughput
in
the
system,
but
for
every
kind
of
agreement
you
make
you
need
to
make
a
transaction
that
needs
to
go
into
the
block
and
most
of
the
time,
if
you
have
like
millions
or
hundreds
of
millions
of
users,
all
of
those
updates
are
not
going
to
be
like
big
data
updates,
it's
going
to
be
small
updates
that
each
individual
makes
so
we're
still
kind
of
like
not
going
to
be
able
to
get
that
through,
like
every
all.
B: One big key thing to note here is that we can achieve parallel data production, with eventual consistency, by focusing on non-financial data — because the financial use case really needs strong consistency to function correctly. So let's focus on non-financial data.
B
I
I
kind
of
tend
to
think
of
that,
as,
like
soul,
bound
data,
because
you
can't
like
trade
it
so
with
ceramic
we're,
building
a
solution
for
this,
and
we
essentially
split
it
into
three
pieces
event
streaming
at
the
bottom,
basically
like
a
hash,
linked
log
of
events
on
top
of
that,
an
indexing
system
that
basically
builds
a
view
on
top
of
the
event
streams.
On
top
of,
that
is
like
a
graphql
api
that
allows
you
to
like
easily
query
data
within
the
system,
so
at
the
bottom
we
focus
on
making
event
streams
available.
B: I can sync the state and verify the validity of one individual event stream without having to know anything about the rest of the state of the network. This is quite different from a blockchain, where you need to sync the entire chain to know what's going on. Each of these event streams is produced by an individual account, and we use DIDs as a way to represent accounts, which allows us to support any sort of blockchain wallet.
B
So
right
now
supporting
metamask,
but
we're
working
on
extending
that
to
like
any
any
blockchain
wallet.
That
really
can
sign
a
message
and
the
ide
is
like
a
really
good
way
of
making
an
abstraction
for
that.
So
each
individual
account
produces
their
own
event
strings.
So
you
can
choose
to
index
essentially
like
one
account
or
across
multiple
accounts,
and
we
use
the
peer-to-peer
network
to
synchronize
event
streams.
So
you
can
connect
to
the
network
synchronize
only
the
event
streams
which
you
care
about
and
yeah
a
result
of.
B
This
also
is
that
all
data
is
sold
on.
Like
you
as
an
as
a
user
of
the
system,
you
would
produce
your
own
investments
and
there's
no
way
to
create
those
event
streams.
Let's
say
you're
tied
to
your
account
to
your
ethereum
address
or
to
whatever
other
other
address
you
have,
and
an
interesting
thing
also
is
that
I
can
produce
an
event
stream
that
makes
verifiable
credentials
and
claims
about
other
individuals.
B
I
can
claim
that
you
know
you're
here
and
kevin
could
also
have
done
that
and
explained
that
about
everyone
here
as
well,
and
then
you
can
build
an
index
on
top
of
event
streams
for
like
the
presenters
of
tonight
and
kind
of
query,
information
about
kind
of
social
context,
and
that
data
will
also
be
sold
on,
because
you
can't
really
trade
that
either
so.
B: It probably looks like this: every event here is put into...

B: ...and then you can build this hash-linked stream of events. The genesis event is just the creation, which ties the event stream to your account, and then the subsequent signed events are updates to this event stream. We also periodically anchor these event streams into the blockchain, and this is key for the security of the system, because we can now get secure timestamps. You might wonder: okay, but do I now need to make a transaction every time I want to anchor into the blockchain?
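The stream structure described above — a genesis event tied to an account, followed by hash-linked updates — can be sketched roughly like this (the DID string and field names are made-up placeholders, not Ceramic's actual event format):

```python
import hashlib
import json

def event_id(event: dict) -> str:
    # Content-address an event by hashing its canonical JSON encoding.
    return hashlib.sha256(json.dumps(event, sort_keys=True).encode()).hexdigest()

# The genesis event ties the stream to an account (placeholder DID).
genesis = {"type": "genesis", "controller": "did:example:alice", "data": {"name": "alice"}}
log = [genesis]

def append_update(log: list, data: dict) -> dict:
    # Each update points at the previous event by hash: a hash-linked log.
    update = {"type": "update", "prev": event_id(log[-1]), "data": data}
    log.append(update)
    return update

append_update(log, {"name": "alice", "bio": "researcher"})
append_update(log, {"name": "alice", "bio": "builder"})

def verify(log: list) -> bool:
    # One stream verifies on its own -- no global network state needed.
    for prev, ev in zip(log, log[1:]):
        assert ev["prev"] == event_id(prev)
    return True

assert verify(log)
```

This is the property mentioned earlier: a single stream can be synced and validated in isolation, unlike a blockchain where the whole chain must be synced first.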
B
You
know
that
seems
like
a
limitation.
It
was
pretty
simple
to
get
around
that
we
could
just
take
a
bunch
of
updates
to
a
bunch
of
different
streams,
build
a
merkle
tree
or
some
sort
of
vector
commitment.
You
just
put
the
the
root
of
the
tree
or
like
the
vector
commitment
on
chain,
so
we
can
basically
group
a
bunch
of
updates
and
put
put
them
on
the
blockchain.
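A minimal sketch of that batching idea — many stream updates collapsed into one Merkle root that goes on chain (the odd-level padding rule here is an assumption for illustration, not Ceramic's actual anchoring format):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list) -> bytes:
    # Hash the leaves, then pair-and-hash level by level up to a single root.
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by duplicating the last hash
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Updates to many different streams...
updates = [f"stream-{i}:update".encode() for i in range(1000)]
# ...collapse to one 32-byte commitment, committed on chain in one transaction.
root = merkle_root(updates)
assert len(root) == 32
```

Each stream keeps a Merkle proof from its own update up to the anchored root, which is how the secure timestamp carries over to every update in the batch.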
B
And
so,
on
top
of
the
event
streaming
layer,
we
have
the
indexing
layer,
and
so
each
node
in
the
network
can
choose
to
build
an
index
on
top
of
the
event
streams,
and
this
index
is
essentially,
they
can
essentially
choose
which
data
to
index.
B
So
we
we
have
like
an
abstraction
called
data
models
which
allows
you
to
create
a
subset
of
data
that
describes
essentially,
semantically
describes
some
some
data
that
you
might
use
for
your
application
and
and
each
node
can
choose
like
which
models
index
and
they
don't
need
to
index
this
data,
the
entire
network.
B
And
add
data
to
the
system,
so
we
call
this
data
models
that
you
can
create
them
using
kind
of
standard,
graphql
schema,
definition,
language.
You
can
query
the
data
using
graphql
and
these
models.
You
can
discover
models
that
have
already
been
created
and
you
can
kind
of
compose
them.
You
can
take
existing
model,
create
a
new
model
that
maybe
reference
the
whole
data
or
just
add
some
additional.
B: Which, yeah, is the same sort of composability you have with smart contracts, but without the financial pieces. And finally, the interesting piece of data models is that you, as a developer, define the data model, but you don't define who writes to it. It's like a database where any user can create an event stream that writes data to this data model, and as a developer you can choose to query all of the data across all of the users, or query data across, maybe, only specific NFT holders or something like that.
B
So
you
can
kind
of
choose.
It's
like
an
open
open
system
where
anyone
can
write,
but
you
can
choose
like
which
things
to
include
in
your
view,
so
just
a
quick
example
of
what
this
would
look
like
is
that
you
have
here
potentially
have
a
proposal
and
it
has
an
author
which
is
provided
automatically
by
the
system.
It
has
some
text,
then
there's
a
reference
to
another
data
model,
which
is
a
comment.
This
is
pretty
much
the
same,
but
it
has
a
proposal
id
which
basically
a
reference
back
to
the
original
proposal.
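Sketched in GraphQL SDL — which the talk says is how data models are defined — the two models might look something like this (type and field names are illustrative guesses, not Ceramic's exact schema):

```graphql
# Hypothetical data models; DID and StreamID are assumed scalar types.
type Proposal {
  author: DID!    # filled in automatically by the system
  text: String!
}

type Comment {
  author: DID!
  text: String!
  proposalID: StreamID!  # reference back to the original Proposal
}
```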
B
So
you
can
see
kind
of
like
how
you
can
create
two
different
data
models
here
and
how
they
reference
each
other
and
propose,
and
potentially
someone
could
come
in
now.
If
I
build
this
application
for
for
dab
proposals
or
whatever
someone
else
could
come
in
like
hey,
I
want
to
be
able
to
like
proposals
and
upload
and
download
comments,
so
you
can
add,
like
a
new
data
model
that
allows
for
that.
B
All
right
so
quickly,
some
use
cases.
These
are
kind
of
like
more
tailored
to
the
decentralized
science
thing
because
there's
a
slightest
problem,
but
I
think
it's
really
interesting
to
think
about
this
sort
of
way
of
modeling
data
as
a
semantic
knowledge
graph,
where
you
describe
kind
of
the
data
you
put
into
the
system
and
the
relations
shifts
between
data.
B
Another
interesting
aspect
is:
you
can
have
more
real-time
collaboration
on
things,
because
you
have
this
kind
of
open,
write,
access
to
write
to
a
data
model
and
any
user
can,
for
example,
like
create
a
proposal.
Any
user
can
create
a
comment
and
it's
up
to
kind
of
the
application
to
choose
which
data
to
query
and
it's
useful
for
citizen
science,
it's
useful
for
building
apps
in
general.
B
I
think
I
think
the
last-
and
I
think
most
interesting
piece
of
this-
is
that
once
we
have
this
kind
of
graph
of
interactions
between
users,
how
people
contribute
to
communities,
maybe
they
made
a
proposal.
A
Hi
there
thanks
for
the
talk
I
just
wanted
to
check
my
own
understanding,
primarily
so
event
sourcing
is
about
you
only
ever
add
that
you
don't
edit
to
your
data
sets
is
that
right.
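Event sourcing in the sense the question describes — state derived by folding over an append-only log rather than edited in place — can be sketched as (names are illustrative only):

```python
# State is never edited in place; it is derived by replaying an
# append-only list of events.
events = []

def emit(event: dict):
    events.append(event)  # append-only: no update, no delete

def current_state() -> dict:
    profile = {}
    for ev in events:
        profile.update(ev)  # later events shadow earlier fields
    return profile

emit({"name": "alice"})
emit({"bio": "researcher"})
emit({"bio": "builder"})  # a "correction" is just another appended event
assert current_state() == {"name": "alice", "bio": "builder"}
```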
E
I
was
doing
some
research
and
I
didn't
want
it
to
lose
any
data,
so
I
thought
okay
maybe
store
that
data
on
the
blockchain.
You
had
some
projects
in
your
slides,
like.
B: So Ceramic actually, for the event streams — the way they're represented is using the same data model as IPFS uses. It's called IPLD, and it's basically a way to put data into a hash-linked graph, refer to it, and create DAGs — directed acyclic graphs — in a standard way. Using this way of representing things, you can actually represent the Ethereum blockchain, or the Bitcoin blockchain, inside of IPLD.
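The hash-linked-graph idea can be shown in a few lines (a toy content-addressed store; real IPLD uses CIDs with multihash/CBOR encodings, this only shows the linking-by-hash principle):

```python
import hashlib
import json

store = {}  # toy content-addressed block store: hash -> node

def put(node: dict) -> str:
    # The "CID" here is just a SHA-256 of canonical JSON; real IPLD uses
    # multihash/CBOR, but the linking-by-hash idea is the same.
    cid = hashlib.sha256(json.dumps(node, sort_keys=True).encode()).hexdigest()
    store[cid] = node
    return cid

# Build a small DAG: children first, then a parent that links to them by hash.
a = put({"value": 1})
b = put({"value": 2})
parent = put({"left": a, "right": b})

# Links are plain hashes, so any node can be fetched and verified on its own.
assert store[parent]["left"] == a
```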
E: And the nodes — is there a token?
B
No
ceramic
token
of
right
now,
but
there's
there
is
going
to
be
because,
like
on
in
here
like
right
now
in
the
event
streaming
layer,
you
need
to
run
your
own
node
and
choose
which
event
streams
to
keep
available
as
an
end
user.
That's
the
hassle
right!
So
I
want
to
be
able
to
pay
the
network
to
keep
my
data
available
as
long
as
I
want,
and
so
we're
adding
an
incentive
layer
using
using
a
token
to
achieve
that.
E
Did
you
think
about
like
a
lot
of
other
projects,
have
like
single-pointed
failures
and
if
there's
an
emergency
like
nuclear
war,
we
maybe
have
some
soon
or
not?
Yeah
did
you
think
about
if
the
price
from
a
token
drops,
maybe
your
node
will
turn
off
or
nodes
will
turn
off,
because
there's
no
incentive
of
running
a
node,
so
yeah
the
files
are
lost,
maybe
because
yeah.
C: So, other than Bitcoin — it sounds like you're kind of keeping their souls — do you have other, like, real applications people are starting to use with this, or anything else that kind of brings it to life?
B
Yeah
I
mean
one.
One
interesting
example
that
that
recently
got
built
in
communities
is
cyber
connect,
they're
building
kind
of
a
profile
page
for
projects
in
the
workplace.
Space,
like
I
think,
as
the
web
3d
space
have
grown,
it's
become
like
much
harder
and
harder
to
like,
actually
know
what's
going
on
and
like
which
projects
are
relevant
because
we
have
no
source
of
truth
and
I
think
like
if
they
can
achieve
like
one
source
of
truth.
That
would
be
interesting.
B: Social use cases are interesting, I think, and generally projects around DAO coordination are interesting as well. I talked to a bunch of projects at the DeSci conference that would be interested in using Ceramic — one of them, for example, essentially wants to create a marketplace for people to run lab experiments, so someone that has an experiment could put up a proposal and people could make requests for it. Cool, thanks.
D: I guess in the social app space? Yeah, interesting — do you have some examples? Well, like, Lens Protocol is trying to do some social media stuff, but I mean, they're trying to do it on-chain.
B: Yeah, so I don't see it as strict competition. I think Lens does interesting things, because you can have incentives through the NFT mechanics, and Ceramic is not really about NFTs — it's about building a generally high-throughput data system. I think if you try to build a system that makes all the data into NFTs, you're going to run into this problem, right? You can't scale a strongly consistent system to the scale of the internet — we can't even do that in web2.