IPFS GUI and Browsers Weekly, 28 Aug 2019

From YouTube: Revisiting Distributed Wikipedia and topics for Q4 - IPFS GUI and Browsers Weekly, 2019-08-28

Description

IPFS Mirror: https://ipfs.io/ipfs/bafybeib26xf6nqf5vrlyl6h6baxrtbqg7lp6etniubezaegmpa4oeumj3a/

A

Welcome to the IPFS GUI and Browsers Weekly for 11th September 2019. We'll be discussing some things related to web browsers. I don't think there's anything related to GUI, but we shall see. I have the first agenda item, so I'll share my screen now, but feel free to add your items

A

if you want to discuss something. So, in the past week I've been mostly focused on embedding js-ipfs in Brave, but as a side project I started triaging the Distributed Wikipedia project, so I think it's a pretty good idea to give a short update on the project. So, the Distributed Wikipedia Mirror.

A

It's a mirror of Wikipedia that's put on IPFS. We did this project a few years back and it was sort of put on autopilot: we did not set up any automation or schedule any checkpoints for updating the content. We are in 2019 and we still have an English Wikipedia snapshot from 2017. The world does not change that much, but the latest news is not there, and some mirrors just did not receive updates.

A

This also translates to problems for public gateway operators when they get DMCA requests for takedowns of specific Wikipedia pages in all the versions, all the heavy stuff. So I started looking at how we were generating those mirrors and what the possible paths forward are. A quick background on how we were generating those mirrors: there are snapshots of Wikipedia created in the ZIM format.

A

The ZIM format is one huge file created specifically for offline browsing of wikis, not only Wikipedia but any MediaWiki-compatible wiki, and there are public snapshots available for download provided by the Kiwix project. So that's the base, the source of truth. We picked that years back, and we added a thin orchestration layer where we unpack those ZIM files and put the unpacked wikis on IPFS, with small changes such as fixing links, making sure relative linking works as expected, and adding a footer on each page.

A

So, for example, if we go to the English wiki and go to the very bottom, you will see there's a footer with a short description saying that this is a snapshot made on a specific date from those files, and things like that.

A

So that's more or less where we are right now. We have a list of wikis right now; it's in this configuration.

A

The problem is those are sort of out of date, so I took some time and did some issue grooming and triaging: I filed some new issues, closed old ones, and basically identified that the first need is to just update those snapshots. But that's a semi-manual process, so I created issues for updating specific mirrors. I believe English and Turkish and the other existing ones should be updated as soon as possible.

A

However, that's sort of a time investment. What I want to invest in first is updating those templates to ensure the process works, because it has not been used for some time, and we also need to update the script that modifies the page content to reflect issues that we've identified.

A

Since then, the main one is to ensure a canonical link is present on every page, so we don't pollute search indexes with duplicated Wikipedia content: each page points at the canonical Wikipedia URL as its preferred URL, and that collapses search results.
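
A minimal Python sketch of the canonical-link step described above. The helper name `add_canonical_link`, the `https://en.wikipedia.org/wiki/` URL pattern, and the plain string insertion are assumptions made for illustration; the project's actual script may work differently.

```python
import html
from urllib.parse import quote


def add_canonical_link(page_html: str, title: str,
                       base: str = "https://en.wikipedia.org/wiki/") -> str:
    """Insert <link rel="canonical"> pointing at the original article.

    Hypothetical helper for illustration, not the mirror's real script.
    """
    if 'rel="canonical"' in page_html:
        return page_html  # already present, leave the page untouched
    # Assumed URL shape: spaces become underscores, the rest is percent-encoded.
    url = base + quote(title.replace(" ", "_"), safe="")
    tag = f'<link rel="canonical" href="{html.escape(url, quote=True)}">'
    head_end = page_html.lower().find("</head>")
    if head_end == -1:
        return page_html  # no <head> section to extend
    return page_html[:head_end] + tag + page_html[head_end:]


page = "<html><head><title>IPFS</title></head><body>...</body></html>"
print(add_canonical_link(page, "InterPlanetary File System"))
```

Running the helper twice on the same page leaves a single tag, which matters if the post-processing script is re-run over an already processed snapshot.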

A

Another one is to ensure the footer is updated, and things like that. So I believe the process of going through those steps would be like this: we probably want to do that on the English Wikipedia manually. There is a smaller snapshot with, let's say, the 100 most popular pages on English Wikipedia, which is just a smaller file, and you can run your scripts on that. So I believe that's something we could do manually, and then we would move on.

A

When we make sure the manual process works, we should figure out a way to automate it. The way I see it, there should be automation in place that detects new snapshots, not too often, but let's say once a month or once a week. If there's a new snapshot for one of our supported wikis, it should automatically build it, put it on IPFS, pin it somewhere so there's at least one source, and then open a PR in the repo.

A

That way the maintainers are able to review it and confirm that the new Wikipedia snapshot works as expected, and then, when we merge it, the DNSLink would be updated.
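
The detection step of that automation could be sketched roughly as below, in Python. The filename shape `wikipedia_<lang>_all_..._<YYYY-MM>.zim`, the function name `find_updates`, and the example dates are assumptions for illustration only; building, pinning, and opening the PR are left out.

```python
import re

# Assumed filename shape (illustrative): wikipedia_<lang>_all_..._<YYYY-MM>.zim
SNAPSHOT_RE = re.compile(
    r"^wikipedia_(?P<lang>[a-z]+)_all.*_(?P<date>\d{4}-\d{2})\.zim$"
)


def find_updates(available, published):
    """Return {lang: filename} for supported wikis that have a newer snapshot.

    available: iterable of snapshot filenames seen upstream
    published: {lang: "YYYY-MM"} date of the snapshot currently mirrored
    """
    newest = {}
    for name in available:
        m = SNAPSHOT_RE.match(name)
        if not m or m["lang"] not in published:
            continue  # unsupported wiki or unexpected filename
        lang, date = m["lang"], m["date"]
        # YYYY-MM strings sort chronologically, so plain comparison is enough.
        if lang not in newest or date > newest[lang][0]:
            newest[lang] = (date, name)
    return {lang: name for lang, (date, name) in newest.items()
            if date > published[lang]}


# Made-up example data, not real mirror state.
published = {"en": "2017-05", "tr": "2017-04"}
available = [
    "wikipedia_en_all_maxi_2019-08.zim",
    "wikipedia_tr_all_maxi_2017-04.zim",
    "wikipedia_de_all_maxi_2019-07.zim",
]
print(find_updates(available, published))
```

A scheduled job (weekly or monthly, as suggested above) would call something like this and only start a build when the result is non-empty.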

A

So that's sort of the basic housekeeping to get the project into a place where it's self-sustaining and does not require too much overhead, but still keeps those snapshots updated. And that reuses the current setup, where we unpack ZIM files. A separate topic that we discussed this week was to use ZIM files directly, which is sort of interesting, because the ZIM file itself is a special format.

A

It's just one long file optimized for random access. When we put that file on IPFS, we split that file, which is already internally optimized for random access. So we split that huge file (it can be tens of gigabytes), build a balanced tree, and put it on IPFS. Then, when someone wants to access a specific byte range from that file, it needs to be fetched from IPFS. We have an API for accessing specific byte ranges of files.
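
To make the byte-range idea concrete, here is a small Python sketch of how a requested range maps onto fixed-size leaf chunks of a split file. The 256 KiB chunk size and the function name `chunks_for_range` are assumptions for illustration; this is not the IPFS API itself, and real chunker settings may differ.

```python
CHUNK_SIZE = 256 * 1024  # assumed fixed-size chunking, for illustration only


def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE):
    """Return (chunk_index, start_in_chunk, end_in_chunk) for each chunk touched.

    end_in_chunk is exclusive. Only these chunks would need to be fetched
    to serve the byte range [offset, offset + length).
    """
    if offset < 0 or length <= 0:
        return []
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    parts = []
    for index in range(first, last + 1):
        chunk_start = index * chunk_size
        start = max(offset, chunk_start) - chunk_start
        end = min(offset + length, chunk_start + chunk_size) - chunk_start
        parts.append((index, start, end))
    return parts


# A 20-byte read that straddles the boundary between chunk 1 and chunk 2.
print(chunks_for_range(2 * CHUNK_SIZE - 10, 20))
```

The point of the sketch is that a reader asking for one article out of a file of tens of gigabytes only touches the few chunks that cover that range.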

A

However, I'm not sure whether there is a performance impact when you put one data structure on top of IPFS, which is a separate data structure. A potential problem I identified is with data deduplication. The ZIM file itself is just a flat file, but it has sections of content which are compressed, which means those bytes are usually unique, and those compressed sections are not compressed deterministically. So when you have two snapshots of Wikipedia, the same content could be compressed together with some other content and produce totally different bytes, which defeats the deduplication that we get from

A

putting Wikipedia on IPFS. Right now, if we create a new snapshot, all the existing images, honestly, most of them did not change. So everyone who has those old images in cache is automatically co-hosting those files for new and future snapshots. So that's an open problem, but it's interesting whether we are able to put ZIM files on IPFS and whether it's performant enough. What if we have a ZIM reader in pure JavaScript that is not reading over HTTP? What

A

if we have embedded js-ipfs that just fetches specific byte ranges of those ZIM files from IPFS directly? So that's interesting, and we've got some feedback from the Kiwix project that they actually have a JavaScript client, which sort of works as a browser extension. So those are spaces we will probably look at at some point. Of course, the most pressing matter is to just refresh those snapshots in the old way.
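
The deduplication concern raised above can be illustrated with a simplified Python sketch. It uses SHA-256 as a stand-in for content addressing and zlib as a stand-in for whatever compression a ZIM cluster uses, and it treats each file or cluster as a single block; all of these are assumptions for illustration, not how IPFS or ZIM work in detail.

```python
import hashlib
import zlib


def block_id(data: bytes) -> str:
    """Stand-in for a content address: SHA-256 of the bytes (not a real CID)."""
    return hashlib.sha256(data).hexdigest()


image = b"unchanged image bytes " * 100
old_text = b"article text from the old snapshot " * 100
new_text = b"article text from the new snapshot " * 100

# Unpacked mirror: each file is addressed on its own, so the image is shared.
old_unpacked = {block_id(image), block_id(old_text)}
new_unpacked = {block_id(image), block_id(new_text)}
shared = old_unpacked & new_unpacked
print(len(shared))  # 1: the unchanged image has the same address in both

# Compressed cluster: the image is compressed together with neighbouring content.
old_cluster = zlib.compress(old_text + image)
new_cluster = zlib.compress(new_text + image)
print(block_id(old_cluster) == block_id(new_cluster))  # False: nothing is shared
```

In the unpacked layout, anyone caching the old image is already hosting it for the new snapshot; in the compressed layout the unchanged image no longer has an address of its own.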

A

However, it's pretty interesting if we are able to create a Wikipedia snapshot where there's a pure JavaScript-based reader on IPFS, and put ZIM files on IPFS. Then people could download those ZIM files and open them in an offline reader on their machine. However, if we have this IPFS-capable ZIM reader published on IPFS, you actually just need a web browser, and that web browser would be able to fetch specific parts of the ZIM file from IPFS.

A

So you don't actually need... Right now, when you browse the Distributed Wikipedia, I think you just cache specific pages on your local IPFS node. With this, you would be caching specific ranges of ZIM files.

A

Those files do not update that often. I've seen snapshots being done monthly, or sometimes once a quarter or something like that, so those are fairly stable on the network when you have multiple people seeding them. It's just an interesting change of approach to look at or to experiment with. That's the update on Wikipedia. I probably skipped some details, but if you have any questions, yeah.

B

I looked at them, and it looks like these snapshots are actually not generated very often; even the newer snapshots are pretty old. So I wonder if there's some way that we could... Can we support Kiwix? Do we need to create our own snapshots? Can we use their infrastructure? Could we do something so that they actually create more up-to-date snapshots? A snapshot a day would be a really huge improvement on what they have going on there, yeah.

A

So right now, I believe the Kiwix project just has its own infrastructure that builds those. The English Wikipedia especially is huge: even without videos it's over 60 gigabytes or something like that. Yeah, it's just a file, but generating it probably takes some resources.

B

Yeah, I mean, can we give them a grant or something? I'd love to explore finding ways that, instead of taking people like you off the core implementation, we can help accelerate things. I love the idea of just automating it too, and I think there are probably a lot of other people that would love to see this working. There's a way we can move the whole space forward.

A

I believe in the issue where we discussed reading ZIM files directly, I sort of mentioned what the very, very first step would be. Right now, the Kiwix project is providing those ZIM snapshots for free for everyone over HTTP and over BitTorrent, so the most obvious

A

first step would be to just add the same data to IPFS. go-ipfs supports the filestore, where you can add a pointer to an existing file without duplicating data in your repo, or, I believe it was added for the Internet Archive or some other partner, the urlstore, where you add a URL and the ranges are fetched on demand when someone requests those specific ranges. So, putting ZIM files on IPFS and exposing IPFS links next to BitTorrent.

A

That would probably be the first step, and we totally should support them in that, and see about this experimentation with a static, JS-only reader which is capable of fetching data from IPFS. It's interesting, because when ZIM files are already on IPFS, that would be the obvious next step to check, yeah. So that's more or less the update. I'll be trying to support it.

A

I'm not sure how much time I will personally be able to invest, but I will be trying to at least triage issues and answer questions, and in spare time push the regenerating of those snapshots.

B

Yeah, I mean, let's be honest: your spare, spare, spare, spare, spare time. Yep, thanks for jumping on it. I saw that when Wikipedia got DDoSed last week, and I was like, surely we should be set up for this, and no, we're obviously not.

A

Interesting fact: we did not have en.wikipedia-on-ipfs.org. For some reason we had our Turkish Wikipedia in DNSLink, but we did not have English. So that's fixed now.

B

It seems eminently within our capabilities and reach to get that set up, and I think it would be such a powerful statement around both the values of our organization and the capabilities of our technology if we had this automated and set up, and were supporting making Wikipedia available and less prone to this type of attack, which is probably only going to happen more and more.

B

Actually, moving to the next agenda item, I would love to see some things like this as OKRs, to be forcing functions on us dogfooding and using our own technologies to do things in ways that have really positive side effects and...

A

That's a reason why I sort of jumped in on Wikipedia, because it's honestly the best test case I have right now when I want to test browsing websites in Brave or IPFS Companion or things like that, because it has everything: there's a DNSLink, and it's so big that it does not fit in a single directory, so you have HAMT sharding.

A

It has non-ASCII characters in paths. There are various edge cases that we've identified. So it's a very important test case: if Wikipedia works, most stuff works, and we totally should do our best to make sure it's useful, apart from just being a proof of concept, yeah.

B

So that's a great segue into the next agenda item, which is: what are the things that we want to do in Q4? We have things around Brave for Q3, and I suspect those will still be ongoing.

A

Yeah, probably, it's a never-ending story. However, the OKR to have two Brave browsers see each other and be able to exchange files, that should land. Local discovery works right now. The missing piece is to expose a port and announce that port to others, but I have an idea: Alan experimented with that around the libdweb project last year, and he stumbled on a similar block to the one I'm struggling with right now, so I'll probably reuse

A

his notes, yeah. The next steps, generally: in web browsers, when we look at js-ipfs, the most painful thing right now is the lack of DHT, which is sort of on the libp2p side. However, we are the main consumers. And, generally, the problem of peer and content discovery.

A

Also the old issue of sunsetting websocket-star servers and moving to a circuit-relay-based future, which is also mostly on the libp2p side, but still,

A

we need it more than other libp2p users do.

B

I mean, would that be something like a js-libp2p quarter? What if we spent a quarter just focusing on those two issues, or even just one of those issues? Because DHT and sunsetting websocket-star are things that have been around for a long time. Wouldn't it be possible to say, let's dedicate a quarter to that and actually get it done? Yeah.

A

Yeah, I don't see any other way. We need to tackle it either way. Another way would be to figure out: are there other ways of doing content and peer discovery in a web browser? Are there existing services or standards or APIs available for web browsers that we did not think about, from which we could create a libp2p discovery module, perhaps?

B

Yeah, is there any prior work or thought around that, or are you just blue-skying here? Should we explore laying waste and starting from scratch, tabula rasa, or are there some previous investigations and threads we could pull on? Yeah.

A

Prior things are, like, WebRTC as a sort of improvement on top of WebSockets, because you just use it for signaling. And there's Web Bluetooth, which I think is just in Chrome, and it's still an origin trial or something. But it's...

B

Sure, it's extensions-only, right?

A

No, I think it's available to websites as long as you include the header for the origin trial.

A

So then Chrome exposes those APIs. Another thing is fringe stuff like audio, using some frequency signaling, so the website requires access to your microphone or something.

B

Yeah, it would be interesting to look at what the use cases are that would drive those types of discovery mechanisms. Hong Kong is the great example; concerts; places where you have a lot of people in one place; really dense urban areas where that type of physical signaling and discovery would be, yeah, really interesting. We've

A

Been thinking about.

A

Reusing DNS for our purposes. However, the problem with DNS is that it is super easy to block, yeah.

B

Every time we fall back to that, we fall back to the existing conditions. We know that's maybe the first thing to fail in the fallback. Yeah.

A

The thing is that we have this built into IPFS: you can have multiple discovery methods, and they can fall back or run in parallel, and that's always best-effort, yeah. So.

A

Relays, DHT, I mean.

B

You said... is that really it? Like, if we can fix or solve discovery, peer discovery and content discovery? Yeah.

A

I believe that's the... maybe I'm biased, because that's the biggest problem for what I do right now. But I believe, looking at forums, people, especially people who have just started working with js-ipfs or go-ipfs, want to add IPFS to a page. They just add something, get the CID, and save that CID in the database of their app or whatever. They add it locally, then they try to load it from the public gateway, and that's the moment when it just does not work, or works

A

so-so. So there are two pieces of that puzzle. One is the situation at the gateway, with people using our gateway on ipfs.io. The other is that, generally, the way embedded js-ipfs on a regular website does discovery is limited to, honestly, just websocket-star and being connected to bootstrap nodes. In the past that worked perfectly fine, because those bootstrap nodes were the same go-ipfs instances that provide the HTTP responses on our public gateway.

A

So actually every js-ipfs was connected to the gateway, so the gateway immediately had a connection to the nodes that had the data, so it worked, but that does not scale. So we detached bootstrap nodes from our gateway cluster, and then it's not one-to-one; it's, yeah, one hop away, and that's the problem. We just ask peers that we are connected to directly, and not whether they are connected to a peer that has the data, yeah.
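
The one-hop problem described above can be shown with a toy Python model. The node names, the `find_providers_direct` helper, and the idea of modelling connections as a plain dictionary are all assumptions for illustration; this is not how the real block exchange is implemented.

```python
def find_providers_direct(blocks, connections, node, cid):
    """Ask only the directly connected peers of `node` whether they hold `cid`."""
    return [peer for peer in connections.get(node, ()) if cid in blocks[peer]]


cid = "example-cid"  # placeholder identifier, not a real CID
blocks = {"gateway": set(), "bootstrap": set(), "browser": {cid}}

# Before: the bootstrap nodes were the gateway instances themselves,
# so every browser node was a direct peer of the gateway.
before = {"gateway": ["browser"]}

# After: bootstrap nodes were detached from the gateway cluster,
# so the browser node that has the data is one hop away.
after = {"gateway": ["bootstrap"], "bootstrap": ["browser"]}

print(find_providers_direct(blocks, before, "gateway", cid))  # ['browser']
print(find_providers_direct(blocks, after, "gateway", cid))   # []
```

Without some routing layer that can look beyond direct peers, the gateway in the second topology never learns that the browser node holds the block.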

B

But I mean, I think at that point you're talking about the DHT algorithm. If that is not a solvable problem, then libp2p as a whole is not tenable as a solution to this at all, right? It is designed for exactly that. The one thing that you just described is its sole purpose of existing. Yes.

A

Yes, the moment we started to rely on DHT, those embedded js-ipfs nodes were in trouble, because actually there's no DHT in js-ipfs.

B

That's why we have delegated routing and relay nodes and bootstrappers, yeah.

A

I mean, I played with it. It's there, but it's hogging CPU for some reason, so it's not ready yet. OK.

B

So, trying to reel it back in: what would be a big chunk? Over the last couple of quarters since I've been here, I feel like we've had a few different things; we've got our hands in this and this and this, and I would love to be able to instead collapse and coalesce our attention onto whatever we think the biggest blocker is, take a quarter, push on that, and remove some of these,

B

you know, long-standing things that block our progress within web browsers. We're going to have an on-site; let's have a plan in hand, a better idea. I think in the project operations meeting we talked about the OKRs, next quarter's OKRs; we're going to start doing them next week or the week after next. I think the week after next is when we're in the same room together, yeah.

A

I think so. I think it was that by the end of next week we should have maybe a draft or a set of OKRs.

A

Usually just one.

B

Yeah, and that's just browser land, and I don't have the right people here. We don't have Hugo here to talk about desktop stuff, but from the desktop and GUI side it's around ensuring stability and performance, keeping up to date with the new release process, and making sure that everything is still compatible and working with each new release. Now that there's a regular release cadence, it becomes more important to have things like regression detection. So I think, like, I wonder...

B

My suggestion in that area was going to be around looking at the test matrix again and figuring out the best high-value regression detection and CI combo that we could have, to make sure that, even if we're not actively building a bunch of stuff into desktop and GUI, it's tested daily against js-ipfs and go-ipfs to make sure that everything is still functional. Even if we're down to just detecting regressions, we'd be detecting them. Sorry, I said...

A

I believe, from a pragmatic point of view, before we jump into fixing DHT and discovery, we should ensure that we don't spend time on fixing regressions, so my vote would be on the testing matrix. OK.

B

So, instead of attacking things like the functioning of IPFS in the browser, just put on the brakes and spend the quarter checking all the boxes in that matrix.

A

That sounds like something we need in order to move forward.

D

Steps back yeah.

A

Because, if you need ideas, I have a long list on the project board for our week. I started dropping in topics that we can discuss. I would not even go through the list and fit it in the very...

A

Oh yeah, I'm dropping those into the column on the project board for web browsers. It's pretty general, but those are things I want to triage with you and Hugo when we meet, just to decide whether those are things that we should discuss or maybe sprint on, yeah.

B

In fact, I think, let's do that before then. I'll put 30 minutes on the calendar, and let's go over those things and do that triage before we get together. Then, when we get together, we can do the things instead of talking about doing things. Cool, thanks for creating this board. GitHub project boards, they're not as easy as Trello boards, but they are also not another...

A

The cool thing is, when you hover your mouse on top of a card and press the spacebar, it assigns you to that card. There's no... can you just...

B

Yeah, but I also do like the fact that this is built into the place where we already work. I was looking at some bots that do automation around these types of things too, so you can pre-populate columns with issues, so when you're managing different issues, the columns autofill.

B

Like that, there's some nice stuff that you can do. A lot of it is built on Actions, separate from the board itself. These boards are pretty bare-bones.

B

OK, so that's an interesting idea, to spend a quarter on that, especially since we have Lab Week during the quarter and 2020 planning during the quarter. Maybe the better approach is to list all of those stability and testing infrastructure requirements that we would always love to have, but that there's never a good time to do. There are also the holidays in Q4.

B

So it's the trifecta of things that will be pushing against our ability to dig deep and take a big bite out of a large technical problem. So maybe it's a good time to think about cleaning house, yeah. All right.

D

I believe we are at the end of our agenda this week.

B

One more OKR thing: I did list Wikipedia down there, and I would love, if we could slide that into the OKRs, to have that as a side effect of that quarter too, especially since you say it is such a good test case.

B

What if we even integrate the Wikipedia stuff into that CI, as part of what we test against? It's testing against a live Wikipedia, a way to make sure that we're not breaking stuff actually out in the wild on the network. Yeah.

A

Honestly, we already are using Wikipedia paths in our tests, for testing things like special characters in paths and the way you escape those, because IPFS paths are not URL paths, but we sort of started using similar conventions. So that's an interesting problem space. So yeah, Wikipedia is a part of our test suite already.
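
As a small illustration of the escaping issue mentioned above, the Python sketch below turns a content path into a gateway URL by percent-encoding each segment. The helper name `ipfs_path_to_url` and the example path are made up for illustration; the escaping rules actually used in the test suite are not shown in this meeting.

```python
from urllib.parse import quote, unquote


def ipfs_path_to_url(gateway: str, path: str) -> str:
    """Percent-encode each path segment so the path survives inside a URL.

    In a content path, characters such as '?' or '#' are ordinary name
    characters; in a URL they would start the query or the fragment.
    """
    segments = path.split("/")
    return gateway.rstrip("/") + "/".join(quote(s, safe="") for s in segments)


# Hypothetical path with non-ASCII characters and a literal question mark.
path = "/ipns/example.org/wiki/Łódź/Who's_Who?"
url = ipfs_path_to_url("https://ipfs.io", path)
print(url)

# Decoding the URL path gives back the original content path.
print(unquote(url[len("https://ipfs.io"):]) == path)
```

Encoding per segment keeps the `/` separators intact while making sure nothing inside a page name is mistaken for URL syntax.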

B

Hey Terry, I think we have a local, an offline meeting coming up soon, right?

B

Next week, wow.

B

Not coming up like immediately.

C

Yes, we do. The person I have lined up at the moment is a guy who's focused on what the internet limitations are for people on native lands in the States, so I think that's maybe next Wednesday or something like that. But yeah, and then on the ProtoSchool side, Diogo left a couple of weeks ago, and now there's a guy named Jill who joined us, who's also at Moxie. He was working on the IBM project, on the front end of it, so I

C

think he has less deep knowledge about IPFS than Diogo did, which means that the two of us will need to lean on some other people when we get to the PRs that aren't about front-end code but are instead about, like, OK, are we validating the IPFS stuff under the hood, or is it even possible to do this lesson, or add the libp2p validation, or those kinds of things. We're going to need some volunteer helpers on some of that. I'm not sure how quickly I'll get to it, but my biggest next step is going through the camp content to figure out what's most reusable as tutorial content.

C

So it's part inventory and part doing the work to make it happen. At the moment I'm working on the inventory side of it, but I definitely see some fast connections. One of the things that I think would be cool would be to use this CID inspector thing that somebody built; I don't know if it was Alan who built it or just showed it. Right now, sometimes, when the result is a DAG thing, you can click a button that says "View in IPLD

C

Explorer", and I don't see why we couldn't do that for any result that's a CID: view it in the CID inspector. We also want a tutorial about the anatomy of a CID.

C

Based on Alan's session, and then we could link from that tool back into this tutorial about the anatomy of the CID. So there are a lot of things that we could do, but I'm just still trying to find the time to go through all the...

C

But I'm very open to help with any of that. I just...

B

So I just shared one of the KRs that we had. In the Project Operations group it was a communications plan, to be able to share out camp content, and Jonathan Victor spent a couple of days helping put together this master list. It's a sheet of just everything that happened at camp, with links to the things, and then links to whether or not somebody already did a blog post about it or already shared something about it. And then Jonathan is actually going to move

B

all of this. We talked with Zack yesterday, and Zack has an Airtable of all the video stuff, and we are talking about using that as the single source of truth for all of this content. So Jonathan's going to migrate the contents of the sheet into Zack's Airtable, and then in Zack's Airtable you will have a single database that has all of the stuff, not just camp-related, and you'll be able to search by topic or by keyword: if you want to find all CID-related stuff, you'll be able to find links to it.

C

I haven't used Airtable before, but conceptually that sounds great, and I've been going through and trying to make some tweaks to the camp README, as you've been helping with, to make stuff more obvious.

B

Yeah, thanks for doing that.

C

I clearly have an idea, yeah. And then we'll need to add the video links. But the other place, one of the things that we can do very easily now that we have those resources pages at the end of each tutorial: for stuff that's related, but that we either haven't had time to shove into ProtoSchool or never formatted correctly, we can be linking out to those resources more.

C

Already. So there will be many ways to do it. So that'll be, I think, one of our biggest things for Q4, whatever we don't get to in the next few weeks before I head off to, uh, fly camp, and then I'm sure we'll have plenty extra. So yeah.

B

Cool. Sad to see Diogo go, but new faces. You folks may...

B

Going to meet all of Moxie sooner or later. Someday.

C

Someday he'll rotate through.

C

All right. I do like the idea that, in theory, Jill could go tap Diogo on the shoulder when I'm not awake yet: "What's with this thing, dude? What did you do?" Whether he's actually going to do it, I don't know. It's fun to think about.

B

Yeah, I think this is my last meeting for the day.

A

Did we just fit into 45 minutes only?

C

If you hang up real quick.

D

I'll stop recording and let's pretend that's it. Bye.