Description
Caching IPFS & Bitswap Workers - presented by @vasco-santos at IPFS þing 2022 - Connecting IPFS - https://2022.ipfs-thing.io
Everyone, happy to be speaking to you today about caching on IPFS — really, all the caching. This talk is backed by our journey at DAG House while building the dot-storage products that you know as NFT.Storage and web3.storage.
So yeah, a talk really about caching — a lot of caching, multiple caching layers and angles — through the lens of the dot-storage products' journey so far. We have been building and maintaining these products for over a year now, and we had growth pains over time, as expected, especially when you build a product in a really small time frame.
As you can see in these Grafana dashboards — they're a little bit outdated — we have over 100 million uploads so far. One of the big pain points for users over time was always retrieval: retrieval was not reliable, and retrieval was quite slow. When you expect people to build on top of your technology, you can't have an NFT marketplace open their website and not even be able to retrieve their NFTs.
So we needed to do something about it, and one of the things we thought about in the beginning was: what about CDN caching, how can it help? You should already be familiar with CDN caching, so I'm not going deep into that; I will focus on the concepts that were nice for our use case. First, user experience: really fast content retrieval, and also reliability.
Previously, with ipfs.io — as in Alan's talk earlier — retrieval might take a while, and it's not constant, so we want something that has predictable performance. On the network and infrastructure side of things, we want to cut down on bandwidth costs — with caching there is less need to overload the public gateways in the IPFS network — and we also get an out-of-the-box geo-dispersed content delivery network.
So why is a CDN really a great fit for IPFS? As you know, with location addressing, content can mutate at any time. You can cache the content, but you might need to revalidate that it's still the valid content, and you need a careful cache-control configuration to guarantee you don't serve outdated content. In IPFS, everything is immutable thanks to its nature, so it's really an excellent fit.
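A minimal sketch of what that immutability buys a gateway: the response for a CID can carry the most aggressive cache headers HTTP allows. The function name and header choices here are illustrative, not taken from the nftstorage.link code.

```javascript
// Sketch: response headers for an immutable, CID-addressed response.
// Because the bytes behind a CID can never change, every cache in the
// chain can hold the response for the maximum time without revalidation.
function immutableCacheHeaders(cid) {
  return {
    // one year is the conventional maximum; `immutable` tells browsers
    // not to revalidate even on a reload
    'Cache-Control': 'public, max-age=31536000, immutable',
    // the CID itself makes a perfect ETag: same CID, same bytes
    'ETag': `"${cid}"`,
  };
}
```

Contrast with a location-addressed URL, where a short `max-age` plus revalidation is the best you can safely do.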
We can just cache for the maximum amount of time because, at the end of the day, a given IPFS HTTP URL will always return the same response. With this, I want to present our edge gateway. This was our first iteration, a kind of proof of concept, which we built specifically for NFT.Storage and called nftstorage.link. It's essentially not a typical IPFS gateway as you know it, but a CDN on top of one, built and deployed on Cloudflare Workers.
Cloudflare has hundreds of data centers, so you'll have one near your house all the time and things will be fast. Essentially what happens is: we receive an IPFS request — either subdomain or IPFS path resolution — and at first we go through our multiple caching layers (I will go into them in a bit). If it's not cached, we do a race between multiple gateways: we have ipfs.io, we have a dedicated Cloudflare gateway, and also a dedicated Pinata gateway.
We just race all of them at the same time; the first one that responds back is the response that we proxy and give back to the user. With this we can offer a more reliable solution — we have multiple ways of getting the content — and also better performance, because we can give the fastest response instead of having the user go to one gateway and wait for it to respond. So, zooming into the actual edge gateway and all the caching things.
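The race described above can be sketched in a few lines. The gateway URLs and the injected `fetchFn` parameter are illustrative assumptions, not the actual nftstorage.link source.

```javascript
// Sketch of the gateway race: fire the same request at several public
// gateways and resolve with the first *successful* response.
async function raceGateways(ipfsPath, fetchFn, gateways = [
  'https://ipfs.io',
  'https://cloudflare-ipfs.com',
  'https://gateway.pinata.cloud',
]) {
  const attempts = gateways.map(async (gw) => {
    const res = await fetchFn(`${gw}${ipfsPath}`);
    if (!res.ok) throw new Error(`${gw} responded ${res.status}`);
    return res; // the first fulfilled attempt wins
  });
  // Promise.any rejects only if *every* gateway fails,
  // which is exactly the reliability property we want
  return Promise.any(attempts);
}
```

In the worker, the winning response would then be proxied to the user and written into the caching layers on the way out.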
First, we get the Cloudflare HTTP server cache for free: even before a request gets to our code, a lot of things will already be cached. In case we actually need to go to the Cloudflare Workers part of things, we first have the denylist, just to make sure we don't serve bad content.
It is synced with the badbits denylist, which of course adds a bit of extra latency, but that is needed — otherwise we'll end up blocked by Google or other security vendors, like we have been in the past. So, lots of knowledge from operating kind of a gateway, even though it's not really a gateway.
Our first layer of resolution is divided into two parts. Both are CDN caching, and we basically do both in parallel and take the first response we get. We essentially rely on the Cloudflare Cache API, which is an LRU cache: if we already got a request for the content and saved the response there, it's really fast to get it back. But we also shipped SuperHot recently, which is basically a premium perma-cache for when you want your content to be cached all the time.
You don't want to rely on your content being popular and therefore always in the Cache API; you want it perma-cached everywhere. We have this premium feature in our read pipeline. If it's not cached, as I mentioned, we just do the gateway race. A glimpse at the metrics.
Our cache-plus-race, of course, makes things considerably faster than any individual public gateway. As you can see in some metrics here, we are at least four times faster than public gateways, sometimes up to ten times — it really depends on the day and on the type of content we are getting, like whether it's already cached or not.
So it's not super predictable, even though, as you see, for the 95th percentile response time here — at the instant of my screenshot — we are a bit over 100 milliseconds, 115. In terms of cache hits, the first part, the HTTP server cache, gets around 40 percent; it sometimes drops, it sometimes gets a bit higher. And when the worker runs and gets to the Cache API or to SuperHot, we hit for another 60 percent of the remaining 60 or 55 percent.
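Those two layers compound. Taking the rough numbers from the slide (they fluctuate, as noted), the back-of-the-envelope arithmetic looks like this:

```javascript
// How the two cache layers compound, using the approximate figures
// from the talk: ~40% of requests are answered by the HTTP server
// cache, and ~60% of the *remainder* hit the Cache API / SuperHot.
const httpCacheHit = 0.40;
const workerCacheHit = 0.60;
const overallHit = httpCacheHit + (1 - httpCacheHit) * workerCacheHit;
// 0.40 + 0.60 * 0.60 = 0.76 — so only about a quarter of requests
// ever reach the gateway race at all.
```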
Essentially this makes the whole thing really, really fast. We have a lot of other metrics comparisons with all the gateways — we track all the requests that we make to every single public gateway, and we can pull all those metrics. If you want to talk about them and look into them, I'm happy to share that later on, but I didn't want to go too deep into this.
What's next? Maybe some more caching, right. nftstorage.link has been quite a success so far — users are really happy about it — so now we are reframing this proof of concept to be part of the whole reads pipeline of the dot-storage products, and not specific to one product.
For this, we have been relying on a lot of beta features from Cloudflare. They shipped this new feature called Worker Bindings, which is still in beta, and we are using it to build w3s.link — while at the same time making infrastructure where we are not operating two different things and maintaining two different codebases. So the goals are:
We want to offer the same excellent retrieval experience as nftstorage.link. We want a CDN LRU cache for both products, scoped to each, so that each product has its own initial caching layer. But we also want a common dot-storage caching layer: if the first cache layer misses, we can still have a shared cache between these two products — and eventually more in the future. And, as I said, we don't want to maintain two full gateway implementations.
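The scoped-then-shared lookup described in the goals above can be sketched like this. Plain `Map`s stand in for the real CDN caches, and all names are illustrative.

```javascript
// Sketch of the two-tier lookup: a cache scoped to the product's own
// Cloudflare zone first, then a shared dot-storage cache behind it.
function makeTieredCache(scoped, shared) {
  return {
    async get(cid) {
      if (scoped.has(cid)) return scoped.get(cid); // product-scoped hit
      if (shared.has(cid)) {
        const value = shared.get(cid);
        scoped.set(cid, value); // warm the scoped tier for next time
        return value;
      }
      return undefined; // miss on both tiers: fall through to the race
    },
    async put(cid, value) {
      scoped.set(cid, value);
      shared.set(cid, value); // the other product benefits too
    },
  };
}
```

The payoff is that content uploaded through one product and fetched through the other still gets a warm cache on the second tier.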
We just want to maintain most of the code in one place, and we also want to iterate on all the learnings we had building our first gateway — so, try not to get blocked by Google and other security vendors.
We already know there are some risks in shipping a new domain; we're well aware of them, and we are working on hopefully not getting blocked this time. So: two different domains, w3s.link and nftstorage.link.
This essentially gives us, out of the box, two different scoped LRU caches, one within each Cloudflare zone. Then we rely on the worker binding I mentioned, where basically all the gateway code lives in one place and is used by both products. Even though, in the future, we might want a specific feature for one product or the other, we can easily add product-specific logic in each gateway.
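A sketch of the shape this takes with Cloudflare service bindings: each zone's worker keeps its product-specific logic and delegates the rest to one shared gateway worker. The binding name `GATEWAY` and the surrounding structure are assumptions for illustration.

```javascript
// Sketch: a product worker (e.g. the w3s.link zone) delegating to a
// shared gateway worker through a service binding. Calling
// env.GATEWAY.fetch() invokes the bound worker directly, without a
// public round trip.
const productWorker = {
  async fetch(request, env) {
    // product-specific logic (scoped cache lookup, metrics, denylist
    // tweaks, ...) would run here first, then hand off:
    return env.GATEWAY.fetch(request);
  },
};
```

Both zones can bind the same gateway worker, which is how most of the code stays in one place.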
So that's that, and now the last bit of caching I have for you today. Let's talk about Claudio. This is the new caching thing Alan and I have been hacking on recently. You can think of it as reframing what we did with gateways and caching, but applied to Bitswap and caching.
Yes, things change so fast. You will hear later about Elastic IPFS; some of the motivation for this comes from there. Essentially, egress is almost always free in Cloudflare, as opposed to AWS — where Elastic IPFS currently lives — where egress is really expensive. So as we move to having Elastic IPFS as part of our pipeline, we are looking into ways of both having a really good retrieval experience and also reducing the cost of getting data out of the network.
So, Claudio is essentially a libp2p node running, once again, in Cloudflare Workers — we really like Cloudflare in our pipeline. It is a long-lived worker that exposes a WebSocket port, and any IPFS or libp2p node can just dial us. Once we get a WebSocket connection, a custom WebSocket transport that Alan wrote upgrades the raw WebSocket connection to a libp2p connection, with all the security and multiplexing negotiation that libp2p then offers us for free.
Then the libp2p identify protocol kicks in. We add our minimal implementation of Bitswap, which is basically a fork of the Elastic IPFS Bitswap with some tweaks. That means we use another Cloudflare-backed thing called Minibus, which you can think of as a blockstore in the edge: R2 buckets with blocks stored by multihash, so multiple CIDs can point to the same block. And, as a separate worker, we can also have another caching layer to serve blocks even before going to R2.
The flow is: a dial from any node to Claudio, a Bitswap stream between the nodes, and then we receive messages from Bitswap saying "I want this block", and we just go to Minibus and get it. In the future we will probably look into having a full Bitswap implementation, or into other things.
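The server half of that flow — receive a wantlist, answer with the blocks you have — can be sketched like so. Real Bitswap messages are protobufs over a libp2p stream; here a `Map` stands in for Minibus and the message shapes are simplified assumptions.

```javascript
// Sketch of the minimal Bitswap exchange: look each wanted multihash
// up in the blockstore and answer with the blocks we have. Anything we
// don't have can be reported back (Bitswap 1.2's DONT_HAVE), so the
// peer doesn't wait on us.
function handleWantlist(wantlist, blockstore) {
  const blocks = [];
  const dontHave = [];
  for (const { multihash } of wantlist) {
    const data = blockstore.get(multihash);
    if (data !== undefined) blocks.push({ multihash, data });
    else dontHave.push(multihash);
  }
  return { blocks, dontHave };
}
```

Note there is no "go find it on the network" branch — that is exactly the difference between this minimal server and a full Bitswap implementation.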
This is really just a proof of concept that we did in a week or so. So let's do a live demo, to be more fun. Can I get my terminal on this side? Yes, I can. OK.
So, my new tool — I'll put the logs on for you to be able to see stuff. We have on the top the Claudio worker running with Minibus, and I'm running a js-ipfs daemon as well.
Here on the top left, I will put a block with this string of content into Minibus, so that we can then get it from Claudio. We just POST it — yeah, it worked; we have the multihash of that content. My network will not like me for this. You can see here: first, swarm peers for our node — no peers. We can try to — no, first we need to do this conversion.
We can't use js-ipfs to get by a bare multihash — we still don't know why, we didn't look into it yet — so we need to transform it into a CID. But that's a different story.
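What that conversion actually does: a CIDv1 is just `<version><codec><multihash>`, so wrapping a bare multihash in a CID with the `raw` codec means prepending two varint bytes. Real code would use the `multiformats` library and base32-encode the result; this byte-level sketch is only to show the structure.

```javascript
// Sketch: wrap a bare multihash in CIDv1 binary form with the `raw`
// multicodec. Both 0x01 and 0x55 fit in a single varint byte.
function rawCidBytesFromMultihash(multihashBytes) {
  return Uint8Array.from([
    0x01, // CID version 1
    0x55, // multicodec: raw (uninterpreted bytes)
    ...multihashBytes,
  ]);
}
```

This is also why Minibus can key blocks by multihash alone: any CID over the same bytes — raw, dag-pb, dag-cbor — carries the same multihash inside it.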
If I try to get the block now, of course I won't be able to — I'm not even connected to Claudio. So if we try swarm connect — yes, OK, now we are connected to Claudio.
Now — I forgot to tell you that when I posted from that side, it went to Minibus — when I try to get it from js-ipfs, you will see that Minibus gets contacted. So: give me your block — a Bitswap get — will you give me that block? Yes!
So we have "IPFS þing 2022". That was it — let's go back to the slides.
OK, so Claudio unlocks a lot of opportunities. We will still discuss this week which use cases we want to unlock with it. Some things: we can place it in front of Elastic IPFS, which would basically give us CDN caching for the blocks requested from the IPFS network.
We can get more fancy and do a libp2p protocol — something like circuit relay, but for proxying content as well. If we want to get more fancy with libp2p, we can also create a full Bitswap implementation and actually go to the network and try to resolve blocks, or we can do proactive block caching.
Yeah, so one of the things that also made us go for the worker-bindings architecture, with multiple servers, was that this piece on the right that I showed previously could, in theory, be kind of dweb.link.
So in theory — we have ipfs.io here, which is an alias to dweb.link — it could even move here, behind dweb.link, and then we could also use it as part of the race, and we'd have even more caching layers on top. We are building this in a way that dweb.link could then use it.
We actually had our team retreat a few months ago, and we hacked on some stuff around that. The motivation was actually more about not getting blocked, because you could just do local requests to the service worker, and then go from the service worker to our stuff — and of course, indirectly, we would have more cache as well. So we have been hacking on that kind of stuff.
For now, our strategy for IPNS is quite simple: we just redirect to dweb.link. But at some point we will integrate with w3name, and we would resolve from w3name.
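The current strategy is small enough to sketch in one function; the domain and function name are illustrative, and the w3name branch is left as the future step the talk describes.

```javascript
// Sketch of the interim IPNS handling: no local resolution yet, just a
// redirect of /ipns/ paths to dweb.link. Later this branch would become
// a w3name lookup that resolves the name to a CID and re-enters the
// normal (cacheable) /ipfs/ path.
function ipnsRedirect(requestUrl) {
  const url = new URL(requestUrl);
  if (!url.pathname.startsWith('/ipns/')) return null; // not an IPNS path
  return `https://dweb.link${url.pathname}`;
}
```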
Does w3name end up being the cache already? Well, yeah — w3name would be the cache for the actual name resolution. But then, once the IPNS record resolves to a static identifier, we would still hit the same caching, exactly.
Here it's all about gateway requests. So, the different caches: the first is scoped by product because each Cloudflare domain has an LRU cache of a given size, and we want more size — this means we get more LRU cache per product. We also have the perma-cache for people who don't want to rely on "maybe it will be cached if it's popular" — they just want it there all the time. But then there's the Bitswap part.
A
It's
also
like
what,
if
requests
are
not
even
coming
to
the
gateway,
but
you
are
just
running
your
own
ipfs
node
and
you
want
to
bit
swap
the
content.
We
would
just
surf
stuff
from
elastic
ipfs,
which
will
be
like
more
expensive
in
terms
of
egress,
but
we
can
just
put
this
other
bit
swap
cache
on
top
to
make
things
faster
and
cheaper.