Description

For several years, Protocol Labs has operated one of the busiest public IPFS gateways in the world.
Web3 is here, right? At least all of us here know that peer-to-peer is a thing. Web3 is the thing, so why would we even need a gateway? Let's just have everyone run their own Kubo node; I mean, why not, right? Those are nice thoughts, but we do have a problem, and, as William Gibson once said, the future is already here; it's just not evenly distributed. We are living in it, but not everyone is yet. At some point it might be evenly distributed, but not yet. So, basically, yeah.
We need gateways for all that. So let's change this a little bit; this talk is called "Bridging the Webs". The gateway is there because most people are still living in web 2, even though we are living in web 3; something needs to bridge between them, and the gateway is our current chosen medium for doing that.
So now you might be asking: who's this guy? Why is he talking to me? Why can't I just be out there getting coffee, or doing something more fun than listening to this guy? Well, my name is Mario, and I'm working for PL. You can find me on the FIL Slack, you can find me on Twitter; well, if you search for me, you can probably find me anywhere.
I've been working at PL for almost a year; I started in December of last year, and I'm currently part of the Bifrost team, or "beef roast" if you like. We are basically the gatekeepers, the keepers of the gateways. We are the people who actually try to keep them healthy, try to keep them working, and try to keep them from keeling over; not always successfully.
We are also the keepers of the preload nodes, the bootstrap nodes, some clusters; almost everything that is not Filecoin-related, basically, that's us. We are now five people in the team, up from two people about a month ago, so it's getting better. Now, how did I wind up here? Well, before that I was actually an SRE at a company that's part of the PLN, which is Actyx.
We were building industrial systems on top of IPFS, using IPFS as a transport layer; a lot of fun there.
Now, let's talk a little bit about how things were when I joined, a year ago, or a little bit more than a year ago; probably, yeah, a year and a half ago, something like that.
The going philosophy was kind of what I mentioned at the beginning: we're all distributed, so why do we even need infrastructure? The gateways were a little bit of a, you know, necessary evil. So we could actually see the problems, right? I mean, we were dealing with about 500 million requests per week, which, if you compare it to web 2, is not... sorry, I keep moving... yeah, so, 500 million requests per week.
Yep. I mean, not really a huge amount by web 2 standards, but we were having a time to first byte of about 25 seconds, which is definitely not acceptable. Things were bad. So what do things look like now? Some of you might have been in the talk at the beginning, or in the other talks, where this has been harped on every single week in the sitreps, the Ingress sitreps.
They talk about our time to first byte and the number of requests: about five seconds now, while dealing with four times the number of requests. So things are getting better. Well, we're not there yet; we're not at web 2 levels yet, and we might never be, just because the DHT is a DHT and distributed systems are hard. But, you know, things are getting better. Now, how did we get to this point? What did we do? Well, one thing is: this is just the way we are doing it.
This is the public gateway at ipfs.io and dweb.link. You've heard people from Pinata and people from Cloudflare talk about some of the things that they're doing. This is the way we are doing it. It's not the only way to do it, as evidenced by the whole bunch of other people that are running gateways.
You'll probably get the link from the presentation, but on GitHub you can look at the public gateway checker, and anybody can go in there, register their gateway, and it will tell you things like, you know: is this gateway doing CORS or not?
There are some interesting IPFS configuration considerations that go into how things are set up. Oh, sorry about that... there we go; the thumbnail was too small, that's why I kept doing this: I could not read my own screen. Also: what metrics we are gathering, and some things that we need to watch out for. So, first of all, what we call separation of concerns.
Well, the first thing that we are doing is separating the web 2 part from the web 3 part. The first thing you could think about: you want to run a gateway, so why not just, you know, start up a Kubo node, expose port 8080 to the internet, and you're done, right?
A
Have
you
ever
tried
to
do
that?
Has
your
Kubo
node
survived,
you
probably
don't
want
to
do
that.
That
is
I
mean
what
could
happen
right.
Well,
as
a
matter
of
fact,
multiple
things,
multiple
bad
things
can
happen.
First
thing
is
well
you
don't
want
access
to
the
API
right,
I
mean
API.
V0
ad
is
probably
the
least
of
your
problems.
If
you're
doing
this,
so
you
don't
want
to
expose
everything.
There
are
some
things
you
want
to
expose.
You want, for example, to be able to gather metrics, so you do want to be able to get at the API port, but you just don't want to expose it to everybody else. There's also what the folks from Pinata talked about, right? There's all this content blocking that you have to do: legal stuff, security stuff, things like phishing, things like malware. And there's also the fact that, well, Kubo, you can scale it vertically.
I mean, if you give it memory, it will happily grab all the memory that you can give it, until it crashes, until it actually OOMs. You can also have goroutine leaks. That's gotten a lot better with Kubo 0.16, but the previous versions? Yeah, not so much. It would basically start up, using goroutines, using goroutines, continuously, until you actually went in and restarted the thing, because it got completely blocked.
So there are some things that prevent you from wanting to expose Kubo directly. Plus, you want to scale horizontally: you want to have multiple Kubo nodes, so you need to have something in front of them, which of course, in web 2 terms, is called a load balancer. Now, load balancing: what are we doing here?
Well, we have a load balancer layer on top, and then at the bottom we have the Kubo layer, which also has nginx on it; we'll get a little bit deeper into how this works in a moment.
So we have, you know, multiple load balancers for failover: in case one of them falls over, the other one will take over, blah blah blah; again, normal web 2 stuff. If you have ever done this in web 2, this should be old hat. Now, the load balancer is doing multiple things. It's doing SSL termination.
Normally you don't want Kubo to do your SSL termination, because it's less efficient; a dedicated server does it better, and so on and so forth. So you want that. Of course, HTTP load balancing: you want to do retries, so that if one of the downstream nodes falls over, one of the Kubo nodes, well, you want to retry on one of the others.
You want to do redirects, at least from HTTP to HTTPS, for example; things like that. Some rate limiting, like what our friends at Cloudflare are doing. And one interesting thing that we're doing here: we are not using DNS for geographical balancing. We actually have six points of presence: two in the US, two in Europe, two in Asia. Instead of using DNS for load balancing, for redirecting the traffic, we're actually terminating BGP on all of these nodes.
That means that we have a single IP address; all nodes share that same IP address, and then internet routing protocols (the internet is a wonderful thing) take care of delivering your packets to the closest node, network-wise.
The big advantage that we get, at least that I've seen when maintaining this, by using BGP, is that if you need to take a node offline, you don't have to, you know, update DNS and then wait for the DNS caches to propagate and hope that all of your clients are actually respecting the TTL, the time to live.
You just take down the BGP daemon (we're using BIRD), and that node disappears from the public network; then nobody's balancing to it, and everything is fine, everything is wonderful. So that's one of the things we're doing at the nginx level, on the load-balancing nodes: we're terminating BGP.
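As a rough illustration of the anycast setup described above (this is a hypothetical sketch; the ASNs, addresses, and neighbor are placeholders, not the actual production configuration), an announcement in BIRD 2 might look something like this:

```
# /etc/bird/bird.conf -- illustrative only; ASNs and IPs are placeholders
protocol bgp upstream {
  local as 64512;               # this node's ASN (placeholder)
  neighbor 192.0.2.1 as 64511;  # upstream router (placeholder)
  ipv4 {
    # Announce the shared anycast address from every point of presence:
    export where net = 198.51.100.10/32;
  };
}
```

Stopping the daemon then withdraws the route and drains the node: traffic shifts to the remaining points of presence with no DNS change and no TTL to wait out.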
Of course, they're terminating SSL: they all share a wildcard certificate. One of the important things we've seen in web 2 is that one of the most important tools you have for getting good performance is caching. So we cache at the load balancers, and right now we're seeing between 15 and 25 percent of requests served from cache, which is not bad. It's not, you know, really good, but we are not seeing much more caching than that, because people, it turns out, are requesting different things. But anyway, that's 15 or 25 percent of traffic that does not reach Kubo, which is good. And one of the nice things about IPFS is that it's content-addressed. That means that you can cache a CID, a /ipfs/ URL, or a subdomain gateway URL, basically forever, because it will never change. You don't have to worry about cache invalidation; you don't have to worry about "will this content be updated, so how long can I cache it?". You cache it, basically, until you run out of space. That is really nice.
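The "cache immutable content, basically forever" idea sketches out naturally in nginx terms. This is a hypothetical fragment: the upstream name, cache sizes, and durations are made up for illustration, not production values.

```nginx
# Content-addressed /ipfs/ paths never change, so long lifetimes are safe.
proxy_cache_path /var/cache/nginx/ipfs keys_zone=ipfs_cache:100m
                 max_size=500g inactive=30d;

location /ipfs/ {
    proxy_cache       ipfs_cache;
    proxy_cache_valid 200 30d;     # effectively "until evicted for space"
    proxy_pass        http://kubo_upstream;
}

location /ipns/ {
    proxy_cache       ipfs_cache;
    proxy_cache_valid 200 1m;      # mutable names: cache only briefly
    proxy_pass        http://kubo_upstream;
}
```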
Then, well, like we mentioned: the redirect from HTTP to HTTPS, and the retrying on a different Kubo node if, for example, one of them is unresponsive for any reason. We're also doing rate limiting, actually at both levels: at the load balancer level and at the Kubo level. At the load balancer level, we are rate limiting per IP, and we're also limiting per IP and CID.
Basically, if you are on a computer and you're repeatedly requesting the same CID, something fishy is going on, because you already have the content; why are you re-requesting it every 10 seconds?
So we are not rate limiting as aggressively as we could, just because people are behind NATs; it could be two different computers behind the same IP that are actually requesting that same content. But if you get, you know, 100 requests for the same CID during 30 seconds, then probably something fishy is going on. And that's basically it; that's our load-balancing layer. Now, a request then goes down: it gets to the load balancer and goes on to a Kubo node.
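A hedged sketch of the two-key rate limiting described above, in nginx terms; the zone names, rates, and bursts here are invented for illustration, not the real thresholds.

```nginx
# Per client IP: generous, because many users can sit behind one NAT.
limit_req_zone $binary_remote_addr      zone=per_ip:50m      rate=100r/s;
# Per client IP *and* path: the same client re-fetching the same CID
# over and over is the fishy pattern described in the talk.
limit_req_zone $binary_remote_addr$uri  zone=per_ip_cid:100m rate=10r/s;

location /ipfs/ {
    limit_req zone=per_ip     burst=200 nodelay;
    limit_req zone=per_ip_cid burst=20  nodelay;
    limit_req_status 429;
    proxy_pass http://kubo_upstream;
}
```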
And then we also have another nginx that's running on the Kubo node, and the responsibility of that nginx is basically to protect Kubo. Kubo just listens on localhost and we forward things via nginx. For example, to be able to get the metrics, that nginx takes care of allowing only requests from our Prometheus instance, which will scrape those metrics, and it only allows gateway requests from the load-balancing nodes.
Again, it's protecting Kubo; it's just so that, you know, Kubo doesn't get hammered. The other thing that it's doing: we are actually using basic auth, so that the load balancers are the only ones that can talk to the Kubo nodes. That's the other reason; it's not on the slide, but this nginx is also doing basic auth, so that we don't get somebody who figures out the IP address of one of the upstream nodes and starts requesting from it.
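Putting those pieces together, the protective nginx next to each Kubo node might look roughly like this. A hypothetical sketch: the addresses, ranges, and file paths are placeholders, not the real deployment.

```nginx
# Gateway traffic: only from the load balancers, and only with basic auth.
location / {
    allow 198.51.100.0/24;              # load balancer range (placeholder)
    deny  all;
    auth_basic           "gateway";
    auth_basic_user_file /etc/nginx/htpasswd;
    proxy_pass http://127.0.0.1:8080;   # Kubo listens only on localhost
}

# Metrics: only the Prometheus instance may scrape them.
location /debug/metrics/prometheus {
    allow 203.0.113.5;                  # Prometheus (placeholder)
    deny  all;
    proxy_pass http://127.0.0.1:5001;   # Kubo API port, never public
}
```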
The other thing that it's doing is some rate limiting, again to protect Kubo, just to not bombard Kubo with requests; just in case you get many long-running requests while you're just doing round robin between the Kubo nodes, and, for some reason, because of statistics, one of the Kubo nodes is actually getting hammered.
The first thing we mentioned about content addressing: aside from the fact that you can cache content-addressed URLs essentially forever (until you run out of disk space), there are a couple of other things that are interesting about IPFS. The first thing is: if a Kubo node has fetched the content, it will have it in its block store until it's garbage collected. So, if we get a request for that same CID, why should we try a different node?
We know that it's already cached in that node, so we're using consistent hashing. Consistent hashing is basically an nginx configuration such that a specific request will always go to one specific upstream node, which means that we are trying to make the best possible use of the Kubo block store.
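In nginx terms, consistent hashing is a one-line upstream setting. A hypothetical sketch: the addresses are placeholders, and in practice you would hash a normalized CID rather than the raw URI.

```nginx
upstream kubo_upstream {
    # ketama-style consistent hashing: the same key maps to the same node,
    # and only a fraction of keys move when a node is added or removed.
    hash $request_uri consistent;
    server 10.0.0.11:8080;   # Kubo nodes (placeholder addresses)
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;
}
```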
And, like we said, well, IPFS can be aggressively cached; IPNS, not so much. We actually have a configuration so that IPNS is not cached as aggressively: the TTL for IPNS is less than for IPFS. Now, peering.
So, for the DHT to work correctly, you need to be judicious with your peering. One of the things that we're doing is peering with all the Hydras. I'm not sure how familiar you are with the Hydra nodes, but they're basically there for making the DHT faster. They are sharing (what's the correct name for it?) the provider record store: it's a node that has a single provider store but multiple peer IDs, which means that it reduces the number of hops you have to go through in the DHT to find a specific piece of content. If what I'm saying does not make much sense and you are interested in that, you might want to look at ipfs.io to see how the DHT works; but it's basically a way of reducing the number of hops you have to go through to get to a particular piece of content.
We're also peering the gateway nodes with each other. Now, since we introduced consistent hashing, that is probably not necessary anymore, but we're also peering with a few judiciously chosen service providers, such as Pinata, such as Cloudflare, and we're also peering with, you know, bootstrap nodes, etc.
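Explicit peering is a standard Kubo config knob; a hypothetical fragment (the peer ID and address below are placeholders, not real peers):

```json
{
  "Peering": {
    "Peers": [
      {
        "ID": "12D3KooWExamplePeerIdPlaceholder",
        "Addrs": ["/dns4/gateway.example.net/tcp/4001"]
      }
    ]
  }
}
```

Kubo keeps persistent connections to peers listed here and reconnects if they drop, which is what makes peering with providers and with the other gateway nodes cheap.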
What else? Now, the file system. We have been experimenting with file systems, because Kubo, well, not only Kubo but also nginx, is I/O intensive: the cache and the block store are quite I/O intensive.
You also need quite a bit of I/O, and you are usually on spinning disks, just because of capacity and cost. So how do you make the best use of them? We have experimented: we actually started with ext4; while we were having some performance problems, we tried Btrfs, and we didn't get the increase in performance that we really wanted. So now we're experimenting with ZFS. ZFS, so far, is nice.
The good thing about ZFS is that it gives you a lot of knobs. One of them is SSD caching, and that was not a good thing for us, because we had so much cache churn that the cache actually made things slower. Again, it will be different depending on your workload, so the best way of figuring out what works for you and what doesn't is: benchmark, benchmark, and benchmark. But, for us, with how busy the gateways are, SSD caching was not a good idea.
We have been scaling: originally we were evenly distributed between the six points of presence; now we're basically scaling them up as they start falling over. And yes, they do start falling over, because you get increasing traffic, increasing traffic, increasing traffic. So what sort of things are we tracking? Our most important metric: time to first byte, time to first byte P95. Usually we use a histogram.
Histograms are better than averages; they just give you a lot more information. So we check our P95; P95 TTFB is kind of our mantra, our most important metric.
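A small, self-contained illustration of why a percentile beats an average for TTFB (the numbers below are made up): a handful of very slow requests barely moves the mean, but shows up clearly at P95.

```python
# Minimal sketch: track TTFB as a percentile, not an average.

def percentile(samples, p):
    """Nearest-rank percentile of a list of samples (0 < p <= 100)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # nearest-rank method
    return ordered[rank - 1]

# Mostly-fast requests with a slow tail, as a gateway typically sees:
ttfb_seconds = [0.2] * 90 + [5.0] * 9 + [25.0]

average = sum(ttfb_seconds) / len(ttfb_seconds)
p95 = percentile(ttfb_seconds, 95)

print(f"average: {average:.2f}s")  # the slow tail is hidden in the mean
print(f"p95:     {p95:.2f}s")      # the slow tail is visible at p95
```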
We also track HTTP codes. For example, we alert whenever we get an excessive number of 500s, of 5xxs, because it might be a bug in some of our Lua code that we're using to, for example, lop off... oh, we'll get into that.
Oh yeah, I put it in the slide and I completely forgot to mention it. Remember consistent hashing? One thing we're doing is lopping off the beginning of the URL, just because it's always constant: /ipfs/bafy blah blah blah, /ipfs/Qm blah blah blah.
Well, we're lopping that off to get better caching performance, but that is done in Lua code inside nginx, where we're also calling into Kubo to convert all CIDs to the same format, so that even if we get requests for two CIDs that look different but are actually the same CID (because one of them is CIDv0 and the other one CIDv1), we're actually converting them to the same one. And we have had bugs there, because, you know, software has bugs.
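A very rough sketch of what that nginx + Lua step could look like (OpenResty-style; the internal endpoint, variable names, and response handling are all hypothetical, not the actual production config):

```nginx
location ~ ^/ipfs/(?<cid>[^/]+)(?<rest>.*)$ {
    access_by_lua_block {
        -- Ask the local Kubo node to normalize the CID (e.g. to base32
        -- CIDv1), so the Qm... and bafy... forms of the same content end
        -- up with one cache key and one consistent-hashing target.
        local res = ngx.location.capture(
            "/kubo-api/v0/cid/base32?arg=" .. ngx.var.cid)
        if res.status == 200 then
            -- parse res.body and stash the normalized CID in an nginx
            -- variable ($normalized_cid) for use as the cache key below
        end
    }
    # Cache key drops the constant "/ipfs/" prefix and uses the
    # normalized CID plus the remaining path:
    proxy_cache_key $normalized_cid$rest;
    proxy_pass http://kubo_upstream;
}
```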
Oddly enough, we have had bugs there that result in 500s. So whenever we start getting a lot of 500s, we have to go look into that. We also have 410s: 410 is the HTTP code that we use for content that's blocked. So: are we getting a lot of blocked content?
We also track how much content is rate limited, which is a 429. And one discussion that we are still having at this point is: what does uptime mean, especially in a network such as the IPFS network? Because the fact that you suddenly get some requests that take five minutes to resolve, or 10 minutes to resolve, or never resolve, doesn't mean that your service is down.
It might be that someone is hosting the content on a Raspberry Pi at the slow end of a 33.6k modem, and they're trying to get that CID; well, guess what, it will take forever, or it will never get there. So we're still having that discussion. It's one of the difficult things: to define exactly what we mean by "service down", what we mean by uptime, which metrics are useful for us. Again, this is still an ongoing discussion within Bifrost.
Okay, so, some things to watch out for. First thing: limits, especially for Kubo. One thing we're doing, via systemd (or, in the case of the preloads and the bootstrappers, Docker), is limiting Kubo's memory usage and having the system kill it before the system actually dies. We have been tuning that; well, we did tune it sometime this year, because the nodes were basically going out of memory and becoming completely unresponsive, which makes no sense, which is not useful.
It's better if you actually have the system kill the offending process before it becomes completely unresponsive. So we have had to tune the Kubo memory limits. The thing is, you have to be careful, because you have to set them low enough that the system will have time to react, but high enough that you're making the best use of your memory. Right now, if I remember correctly, we're giving Kubo 75 percent of our RAM.
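With systemd, that kind of cap is a small unit override. A hypothetical sketch: the service name and values here may differ from the real deployment.

```ini
# /etc/systemd/system/ipfs.service.d/limits.conf
[Service]
# Cap Kubo at a fraction of physical RAM so the cgroup OOM-kills Kubo
# itself, instead of the whole box becoming unresponsive.
MemoryMax=75%
MemorySwapMax=0
Restart=on-failure
```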
Same thing with GC, same thing with disk space: you don't want your block store to become full. So what do you do? You use GC: you enable GC and you set a GC percentage. You have to set it high enough that you're making good use of your space, but low enough that it will have time to react.
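The corresponding Kubo knobs look roughly like this (a hedged sketch: the storage size is a placeholder, and 70 is only the watermark mentioned from memory in the talk):

```json
{
  "Datastore": {
    "StorageMax": "800GB",
    "StorageGCWatermark": 70,
    "GCPeriod": "1h"
  }
}
```

GC also has to be enabled at all: the daemon needs to be started with `ipfs daemon --enable-gc`, since garbage collection is off by default.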
It needs to have time to actually go in and do the GC. We have had cases where the GC percentage was a bit too high, or a lot too high, and we got a bunch of large requests coming in a burst, one after the other, and basically the file system filled up completely; then you have to SSH in there and run "ipfs repo gc", or just go in there and use the other form of GC, which is "rm -rf", to actually clean things up, because Kubo was not able to do the GC quickly enough. We were at around 70, if I remember correctly, for the GC percentage.
So that's why I keep saying "if I remember correctly". Some more stuff to watch out for: like our friends at Pinata mentioned, people will abuse a system. So yes, this is a free CDN; it's a free, open, public CDN. What's there not to like? So people will make use of that CDN.
A
The
CDN
will
be
abused
now
you
need
to
set
up.
What
do
you
consider
abuse?
Is
a
website
using
your
CDN
or
using
your
ipfs
notes
to
host
some
videos
because
they
don't
want
to
pay
Amazon
to
do
it?
Is
that
abuse
or
not?
In our case, it's not. In our case, we're defining abuse as basically requesting the same content over and over again: making more than X requests per second. I really don't remember the number right now; it's actually fairly high. So yeah, we're rate limiting just because we want people to make reasonable use of a shared resource. The other thing is what was mentioned earlier in the Pinata talk: bad bits.
We call it bad bits. Bad bits is basically our content blocking. Yes, we're supposed to be completely open; yes, the internet is supposed to be free; information wants to be free. The problem is that governments don't agree. So we have the GDPR in Europe, we have the DMCA, we have takedown requests because of, you know, malware, phishing, so on and so forth. So we actually have an email address, abuse@ipfs.io.
For those emails, on the one hand, there's a job that will automatically scan them for any CIDs, and it will create a pull request on our internal bad bits repo. Now, one thing we are doing for bad bits is that we have actually published our bad bits list; you can barely see it here, but you can see it in the slides: it's badbits.dwebops.pub.
I can show it to you here. Let me just show it to you here... oh, come on... there we go. So it's actually a JSON file, but one thing that you will notice is that the JSON file does not contain CIDs.
So what we're actually doing with the CIDs that come in there and get reported: we don't want people to use this as a menu of interesting things that you might find on the IPFS network, because people will do that, right? I mean: "this is blocked; oh, it must be blocked for some salacious reason; let's go in there and get it, because, you know, it might be the latest Marvel movie or whatever." So what we do is: we hash it.
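The hashing step can be illustrated in a few lines. Note: the exact anchor format of the real badbits list (e.g. whether the CID is first normalized, and the trailing slash) is an assumption here; the point is only that publishing a hash hides the CID while still allowing exact-match lookups.

```python
import hashlib

def badbits_anchor(cid: str, path: str = "") -> str:
    """Sketch: publish a hash of a blocked CID instead of the CID itself.
    ASSUMPTION: the real list's format (normalization, the "/" separator)
    may differ; check the published spec before relying on this.
    """
    return hashlib.sha256((cid + "/" + path).encode()).hexdigest()

# The anchor is 64 hex chars; the original CID is not recoverable from it,
# but a gateway can hash an incoming CID and check for membership.
anchor = badbits_anchor("bafyexamplecid")
print(anchor)
```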
Yeah, come on, I hate this... there we go. Why did it go all the way to that slide? That's the next slide; you don't want to be there. Okay, forget you just saw this; let's go back. Well, anyway, we're almost finished there, and one thing that we have seen is that, depending on the week, between 20 and 50 percent of requests are for content that is blocked.
Some of that might be phishing campaigns; some of that might be people downloading illegal content; some of it might be whatever. So yeah: between 20 and 50 percent, between abuse rate limiting and bad bits, so 410s and 429s. Okay.
Well, this is also a little bit of trying to look at the future, but, like Yogi Berra once said, it's tough to make predictions, especially about the future. Yeah, predicting the future is a bit hard. But one thing we can do to get rid of the gateways, or to make a better gateway: well, you know what, let's use IPFS. Let's do what we said at the beginning: have everyone run their own IPFS node.
That is getting easier and easier. If you're running Brave, you just go into settings and tell it to enable IPFS, and now you have an IPFS node running in your Brave browser, and it will automatically grab things from IPFS instead of going to the web. Yeah, that works. You can use IPFS Companion; that, of course, requires you to have an IPFS node running that you need to start up yourself, but it will also automatically redirect anything that is on an IPFS gateway.
It will redirect it to your node, to your local node, which is nice. You can also, if you're a developer, incorporate js-ipfs in your browser, or, whenever it comes out, the new version of IPFS in JS, whatever it ends up being called. There was a talk about that yesterday; IPFS in JS is getting better. For example, with all the work on WebTransport, it will now support QUIC. So it's getting better. But there are multiple problems, aside from the added friction.
It will also not work on non-interactive computers: set-top boxes, a Raspberry Pi that somebody's running somewhere, right? It will not run everywhere. These solutions work as long as you have a browser in front of you, and a browser that supports any of this; if you're running Safari, then tough luck, unless you want to do the work manually. There's something else, and I'm just going to plug a talk that's going to happen in this same room at 2 p.m.
2 p.m. is Ansgar's talk, which is Saturn, Filecoin Saturn. What Saturn is: it's a crowdsourced CDN for Filecoin and IPFS content.
The idea is that, once Saturn is completely working, you will be able to earn Filecoin by running a node in your home that will help people actually get content from IPFS and Filecoin. So you want to see Ansgar's talk at 2 p.m. if you're interested in this. I hope that this was interesting.