From YouTube: Delivery: Workhorse object storage acceleration
Description
A "brief" introduction to how workhorse perform request body hijacking to upload files in object storage
OK, if this is working, let's try it. What I'm starting with now is how we handle an incoming request with a file: Workhorse, Unicorn and everything involved. I'll start by explaining the problems we had, how we solved them, and then the problem we have now. So let's say we have this.
A request comes into our system. We have our load balancers and other things in front of Workhorse that are not important for this discussion, so I'll skip those details. What happens is that the request goes into Workhorse. Workhorse here acts as a kind of smart reverse proxy: mostly it just forwards the requests and passes them through to Unicorn.
When a request has a file, Workhorse does what we call request hijacking, because handling an incoming file in Unicorn is going to be expensive: we would have a whole process locked up just writing something to disk. Instead we do this in Workhorse, which is more efficient, and we are going to actually...
...we are going to write it to the NFS storage. Workhorse writes the content of the body, only the file, to the NFS storage, and then it adds some headers to the request before sending it to Unicorn, telling where Unicorn can find this file. Then the request goes on and reaches Unicorn.
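As a rough illustration of the hijacking idea in Go (the language Workhorse is written in), here is a minimal sketch, not the real Workhorse code: the proxy drains the body to a file on shared storage and forwards a body-less request that tells Rails where the file is. The header name and the shared path are made up for the example.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func hijackingProxy(upstream *url.URL) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Drain the incoming body to shared storage instead of letting a
		// Rails worker block on it. Path is illustrative only.
		tmp, err := os.CreateTemp("/shared/uploads", "body-*")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer tmp.Close()

		if _, err := io.Copy(tmp, r.Body); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		// Strip the body and tell Rails where the file ended up.
		// The header name is hypothetical, not the real Workhorse one.
		r.Body = http.NoBody
		r.ContentLength = 0
		r.Header.Set("X-Uploaded-File-Path", tmp.Name())
		proxy.ServeHTTP(w, r)
	})
}

func main() {
	unicorn, _ := url.Parse("http://localhost:8080")
	http.ListenAndServe(":8181", hijackingProxy(unicorn))
}
```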
What happens there is that, in the Rails controller, we know that this file is already on disk and we have to do something with it. Now, if we want to store it on object storage, we can't just upload it from the Unicorn controller to the object storage, because we would have the same problem again: we would be waiting on an IO operation. So what Unicorn basically does is hand the actual upload off asynchronously, through Sidekiq.
Now we have a problem here. The problem is that we need NFS storage to do this: we need shared drives across the fleet, so that everyone is doing synchronous operations over the same file system. This was the situation about one year ago, more or less, and it was a complete blocker for the Kubernetes Helm charts, because we can't use NFS there. Well, we can, but it's too complex and it's not really cloud native. So we needed to remove the NFS storage from the mix.
OK, so we changed a couple of things here. If we go through this again, the change we made is that Workhorse is going to upload directly to the object storage. How? You can imagine that Workhorse is going to receive pre-signed URLs from Rails. So once the request comes in to Workhorse, Workhorse asks Unicorn whether this request is allowed, without passing the content along; it's just asking for authorization.
For the object storage, we are using Google Cloud Storage, but we support S3-like implementations, so it works on AWS, MinIO and a couple of other providers out there that basically expose an S3 API. The thing is that this request reaches Unicorn and, instead of a local disk path, it has a remote path telling it that the file is already available on the object storage. But the behavior is the same; it's just about where the file is looked up, and this time you're not going through Sidekiq but uploading directly.
So this is, very briefly, what we do. There are a couple of details and a lot of problems here, because when I say we support S3-like APIs: not every S3-like API is equal, and this is the problem. On top of that, we also had a very big problem: most of our clients do not send the Content-Length with the upload request, so we don't know up front how big the file is going to be.
OK, so that's why I told you it can be either the web or the API fleet, because both can receive these requests directly. Now, the problem we have now is that MinIO broke S3 compatibility in the name of performance. They broke the S3 compatibility, and so we started rethinking what we are doing. So let me first tell you this... oh, I think I can stop sharing. This was awesome by the way, great tool, I love it.
What we have now is this. As I was saying before, not all the S3 APIs are equal. When we started, we had AWS and Google Cloud Platform in mind. Now, as I told you, not all of our requests have a Content-Length, so we don't know the content length up front. Google allows us to just start streaming content.
Without knowing the length up front, we can use chunked transfer encoding, and so we can just start streaming and Google Cloud Storage will save the object without any problem. This is not true for AWS and all the open source implementations out there, so MinIO and other S3 providers: they require it, it's part of the API specification that the Content-Length must be provided.
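The difference is visible from the client side. Here is a minimal Go sketch, assuming we already have a pre-signed URL: passing a plain io.Reader with no known length makes Go's HTTP client send the body with chunked transfer encoding, which Google Cloud Storage accepts but a plain S3 PutObject rejects.

```go
package objectstore

import (
	"fmt"
	"io"
	"net/http"
)

// streamPut issues a PUT with no Content-Length: with a generic io.Reader the
// request goes out with "Transfer-Encoding: chunked" instead of a known size.
func streamPut(presignedURL string, body io.Reader) error {
	req, err := http.NewRequest(http.MethodPut, presignedURL, body)
	if err != nil {
		return err
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upload failed: %s", resp.Status)
	}
	return nil
}
```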
When we start uploading to an S3-compatible storage, we have a more complex situation: instead of performing a single API call, which is PutObject, where you just do a PUT to a path on the object storage while streaming the body of the file, we do what is called a multipart upload. Multipart upload is a set of APIs from S3-compatible storage where you actually write several parts.
So what happens is this: if Unicorn returns just one pre-signed URL (it's not that simple, the response is more structured, but that's the point), then you know you can start streaming directly to it and the underlying object storage supports files of unknown length. If you get a set of pre-signed URLs, you know that this is part of a multipart upload, so you have to start reading the file in chunks.
In the answer you get from Unicorn you also get the part sizes, so Workhorse starts reading the first, let's say, five megabytes of the object. Once we reach the end of the first chunk, we rewind it and start uploading it to the object storage. Once the first five megabytes are uploaded, we go to the next five-megabyte chunk, and so on.
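A minimal sketch of that loop, with a hypothetical uploadPart callback standing in for the pre-signed PUT of a single part (this is not the actual Workhorse code):

```go
package objectstore

import (
	"errors"
	"io"
	"os"
)

const partSize = 5 * 1024 * 1024 // e.g. 5 MiB per part

// uploadPartFn uploads one buffered part to its pre-signed URL.
type uploadPartFn func(url string, body io.ReadSeeker, size int64) error

// uploadInParts buffers the source one part at a time on disk, rewinds the
// buffer and hands it to uploadPart, stopping when the source is exhausted.
func uploadInParts(src io.Reader, partURLs []string, uploadPart uploadPartFn) error {
	for _, url := range partURLs {
		tmp, err := os.CreateTemp("", "part-*")
		if err != nil {
			return err
		}

		// Read at most one part's worth of data into the temporary file.
		n, copyErr := io.CopyN(tmp, src, partSize)
		if copyErr != nil && !errors.Is(copyErr, io.EOF) {
			tmp.Close()
			os.Remove(tmp.Name())
			return copyErr
		}

		if n > 0 {
			// Rewind the buffered part and upload it.
			if _, err := tmp.Seek(0, io.SeekStart); err != nil {
				tmp.Close()
				os.Remove(tmp.Name())
				return err
			}
			if err := uploadPart(url, tmp, n); err != nil {
				tmp.Close()
				os.Remove(tmp.Name())
				return err
			}
		}

		tmp.Close()
		os.Remove(tmp.Name())

		if errors.Is(copyErr, io.EOF) {
			return nil // source exhausted
		}
	}
	return nil
}
```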
We upload all the parts with the same implementation that we use for Google Cloud Storage, because each part is a single PutObject call, and at the end we have another pre-signed request from Unicorn that allows us to complete the upload. We just send the list of the parts and all the ETag headers that AWS, or whatever the object storage is, gave us in return for each part upload.
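For reference, completing the upload is a standard S3 CompleteMultipartUpload call: the part numbers and ETags are marshalled into an XML body and sent to the pre-signed completion URL. A rough sketch with simplified error handling:

```go
package objectstore

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"net/http"
)

// completeMultipartUpload mirrors the S3 CompleteMultipartUpload body:
// one <Part> element per uploaded part, with its number and the ETag the
// storage returned for it.
type completeMultipartUpload struct {
	XMLName xml.Name        `xml:"CompleteMultipartUpload"`
	Parts   []completedPart `xml:"Part"`
}

type completedPart struct {
	PartNumber int    `xml:"PartNumber"`
	ETag       string `xml:"ETag"`
}

// completeUpload POSTs the part list to the pre-signed "complete" URL.
func completeUpload(completeURL string, etags []string) error {
	payload := completeMultipartUpload{}
	for i, etag := range etags {
		payload.Parts = append(payload.Parts, completedPart{PartNumber: i + 1, ETag: etag})
	}

	body, err := xml.Marshal(payload)
	if err != nil {
		return err
	}

	resp, err := http.Post(completeURL, "application/xml", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("complete multipart upload failed: %s", resp.Status)
	}
	return nil
}
```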
After this request is completed, the file has reached its destination and we can pass control to Unicorn and say: please finalize this upload. Then Unicorn will move it to its final destination. OK, so this is what we have now. When we implemented this, we had some constraints that we had to respect: we didn't want to store the file on disk.
We also wanted to verify that what the object storage received matched what the client sent. So we added a check at the end where we match, on our side, the MD5 hash that we computed during the upload against the MD5 ETag that was computed by the object storage implementation. If they are not the same, we say: OK, something went wrong, we abort the whole procedure, delete everything and return an error to the client that is uploading. That was the straightforward part.
The complex part is that this is true for a PutObject request, but it's not true for a complete multipart request. The point is that the ETag of a file uploaded with a single request is going to be the MD5 of that file, but if it was built by a multipart upload, composed from several parts, it will not be: it will be something else. And here we made our first mistake.
We found a reverse-engineered implementation of this algorithm and we implemented it in Workhorse, so that we could still check that the final multipart upload was completed correctly. It is based on the MD5s of all the parts; it's kind of a hash of the hashes, something like that, so it was still reasonable. OK, that's a lot of information.
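For the curious, the reverse-engineered rule is roughly this: the multipart ETag is the MD5 of the concatenated binary MD5 digests of the parts, followed by a dash and the number of parts. A small sketch of that calculation (this is observed behavior, not part of the S3 specification):

```go
package objectstore

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// multipartETag takes the hex-encoded MD5 of each uploaded part and returns
// the ETag the storage is expected to report for the completed object.
func multipartETag(partMD5s []string) (string, error) {
	concatenated := make([]byte, 0, len(partMD5s)*md5.Size)
	for _, hexDigest := range partMD5s {
		raw, err := hex.DecodeString(hexDigest)
		if err != nil {
			return "", err
		}
		concatenated = append(concatenated, raw...)
	}

	// MD5 of the concatenated part digests, plus "-<number of parts>".
	sum := md5.Sum(concatenated)
	return fmt.Sprintf("%s-%d", hex.EncodeToString(sum[:]), len(partMD5s)), nil
}
```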
So right now it is not working. I mean, when MinIO broke it, they broke it, but they added a compatibility flag, so you can run MinIO in a backwards S3-compatible mode, fully as written, and it would still work in that case. But this is a good opportunity to rethink what we did in the first iteration and make it more resilient, because we can't simply remove the check: it's the only check that we have, the only integrity check.
OK, shall we go to the code? I'll try to; I hope we can make this consumable, because you may not know the codebase and this is also a very small part of it, but maybe it will make sense together with the story I just told you. This is the merge request, workhorse 398, and it's about removing the ETag check on the multipart upload to object storage.
It gives a bit of context about when MinIO broke compatibility and things like that. We discussed a lot about what was the right thing to do here; we are in some kind of middle ground, and then we moved the discussion to an issue. So I will just go through the relevant files, so that you can see where we handle this kind of operation.
Yeah, maybe if I do something like this? No... oh yeah, because it's about the upload stream, so let's start with something that actually makes sense. I mean, just going through the files in order doesn't help, so I'll try to make some sense out of it. I will start with this one: internal/objectstore, the multipart file.
This is the file that uploads the multipart: it breaks the input into chunks and does the final call. I don't want to go into too much detail, but here we have a loop, and for each part we are going to read just as much as the part size, and read and upload one part. So we go in a loop, we upload one chunk at a time, and once we are done, we...
...we complete the request. I'm just glossing over details; I just want to give you a sense of what we are doing. If we go to the read-and-upload-one-part function, which is somewhere here, just below: what we do here, without going into the details, is create a temporary file and start writing to it. Then we seek back to the beginning of the file and we upload it.
OK, so this part upload goes through the single PutObject path: we're just breaking the file into chunks and uploading each one with a plain PUT to the object storage. But the thing that actually broke is here, in the complete request. Here we are building the request for completing the upload, and this is basically here.
We create this CompleteMultipartUpload... no sorry, this is the result, my bad. It was provided as a parameter: we get the CompleteMultipartUpload as a parameter, we marshal it as an XML body and we send it with a POST request. OK, and the thing that we changed is here. We used to extract the ETag from the result and then verify it; we had this implementation, the part here on line 166, that could be removed.
That was reverse-engineering the S3 implementation: we were reverse-engineering it and checking whether the ETags matched. We need to remove this because it's over-engineered and it's not stated in the documentation anywhere. I mean, we were lucky that MinIO reverse-engineered this as well and implemented it in the same way, but since it's not part of the API specification, we were just being too restrictive here.
The proposal here was about adding a parameter to the new object constructor where we can tell whether we expect the ETag to be an MD5 or not, so that if we know it's not an MD5 we will not check it, and if we know that it is the MD5, we will check it at the end of the upload. So, as you can see, all the signatures here have a new argument, which is the MD5 hash. It's called here in the same way as well.
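A hypothetical sketch of what such a constructor parameter could look like; the names newObject and etagIsMD5 are made up for illustration, not the real Workhorse signatures:

```go
package objectstore

import "errors"

var errMD5Mismatch = errors.New("ETag does not match the computed MD5")

// object carries the MD5 we computed during the upload and a flag saying
// whether the storage's ETag can be trusted to be the MD5 of the content.
type object struct {
	expectedMD5 string // hex-encoded MD5 computed on our side
	etagIsMD5   bool   // false for multipart completions and unknown backends
}

func newObject(expectedMD5 string, etagIsMD5 bool) *object {
	return &object{expectedMD5: expectedMD5, etagIsMD5: etagIsMD5}
}

// verifyETag is a no-op unless we were told the ETag is an MD5.
func (o *object) verifyETag(etag string) error {
	if !o.etagIsMD5 {
		return nil
	}
	if etag != o.expectedMD5 {
		return errMD5Mismatch
	}
	return nil
}
```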
Sorry, this is the object we create, and this is the check line: we use it to compare the ETag with the MD5 sum that we computed, that we calculated on our side, and here we check it only if we know that the ETag is supposed to be an MD5. OK, now, why is this not working, and why did we decide to move to an issue to discuss this better? The problem is that we use the single PutObject also on S3.
If we know the size up front, for instance if you are pushing an LFS object, the object size is provided as a parameter as part of the LFS API. In that case we know it, and if we know it, Rails is going to give us just a single pre-signed URL. We don't go through all the chunking, it's not needed; we can do the simplest thing of just doing one single upload, because we have all the information.
OK, so Workhorse has no control here; it doesn't know what's happening or who is on the other side, which implementation of the object storage we are talking to. We designed this to be completely Rails-driven. So the first thing that I proposed here was about checking the response header, because we know there's a Server header, which is part of the API, and so we know what we are talking to.
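A tiny sketch of that idea, checking the Server response header; the exact value to match on ("MinIO" here) is illustrative, since the detection logic was still under discussion:

```go
package objectstore

import (
	"net/http"
	"strings"
)

// looksLikeMinIO guesses the backend implementation from the Server header
// of a response we already received from the object storage.
func looksLikeMinIO(resp *http.Response) bool {
	return strings.Contains(resp.Header.Get("Server"), "MinIO")
}
```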
OK, now, the final part of the story. All of the object storage implementations support the opposite workflow, where you can send the expected MD5 along with the request, so the server will check it. Even MinIO will do it in that case, and it will fail the upload if what it receives doesn't have the same MD5. So instead of checking it on our side at the end of the upload, we can provide the MD5 when we begin the upload and the server will validate it.
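Mechanically this means setting the Content-MD5 header, which carries the base64-encoded binary MD5 digest of the body. A minimal sketch for a single buffered part:

```go
package objectstore

import (
	"bytes"
	"crypto/md5"
	"encoding/base64"
	"fmt"
	"net/http"
)

// putWithContentMD5 uploads an already-buffered part and asks the server to
// validate it: the storage rejects the upload if the body does not hash to
// the digest announced in Content-MD5.
func putWithContentMD5(presignedURL string, part []byte) error {
	digest := md5.Sum(part)

	req, err := http.NewRequest(http.MethodPut, presignedURL, bytes.NewReader(part))
	if err != nil {
		return err
	}
	req.Header.Set("Content-MD5", base64.StdEncoding.EncodeToString(digest[:]))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upload rejected: %s", resp.Status)
	}
	return nil
}
```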
OK, now, what's the problem here? That we have to read the whole file in order to compute the MD5 hash. There's a lot of discussion here, maybe you can read it, but the point is that we kind of agreed on this: in the context of the multipart upload we are already reading a part of the file, seeking back and uploading it.
So in that context we can just send the MD5 hash, because we already have it. And actually there's a patch here proposing this, more or less, because, as I told you, this read-and-upload-one-part function is already computing an MD5 hash. I'll show you; some of it maybe will not make that much sense.
Instead of copying from the source to the file, we are going to use this thing, which is a TeeReader, which behaves like the tee binary: we can pipe two things together. So this is reading from the source, going through a sum, which is here, which is an MD5 hasher, and whatever goes into the hasher comes out exactly as it went in and reaches the file.
So what happens with this two-line change is that we are building a pipeline: instead of reading from the source and writing straight into the file, we are reading from the source through an MD5 hasher, but what comes out of the MD5 hasher is still the source content, so we can copy it to the intended destination. At the end of the copy we can ask the hash object to give us the MD5 of the body that went through. So it's kind of computed in between.
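This is roughly the two-line change in Go: io.TeeReader makes every byte read from the source also pass through an MD5 hasher before it is written to the part file. A minimal sketch, not the exact Workhorse diff:

```go
package objectstore

import (
	"crypto/md5"
	"encoding/hex"
	"io"
	"os"
)

// bufferPartWithMD5 copies up to size bytes from src into the temporary part
// file while computing the MD5 of exactly those bytes on the fly.
func bufferPartWithMD5(src io.Reader, tmp *os.File, size int64) (string, error) {
	hasher := md5.New()

	// Everything read from src is also written to hasher before reaching tmp.
	tee := io.TeeReader(src, hasher)
	if _, err := io.CopyN(tmp, tee, size); err != nil && err != io.EOF {
		return "", err
	}

	// The hasher has now seen exactly the bytes written to tmp.
	return hex.EncodeToString(hasher.Sum(nil)), nil
}
```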
Exactly, it's computed on the fly. And because here we read five megabytes, go back and upload them, we can, for each chunk, ask the server to validate the MD5, because we already have it. This will remove the need for validating the complete multipart request, because if each part is validated, then we expect the sum of all the parts to have integrity too.
That's one point. The other point is the Google Cloud Storage route, where we don't know the length of the file but we want to stream it. Sorry, I don't know if I mentioned this before: Google Cloud does not implement the multipart upload, so we can't just say, OK, we are going to use multipart also on Google Cloud Storage, even where it is not needed, so that we have a single implementation. No, they don't support it.
So we still need to do the single PutObject operation, but in that case we can validate at the end, because the Google Cloud Storage documentation tells us that the ETag will be the MD5 of the content. So we still have two slightly different implementations, but that's already the case today. The problem was that we were starting to need a third implementation just for MinIO; going this route we will keep only two implementations, Google Cloud Storage and all the S3-compatible APIs.
What's the problem here? There is a big problem. As I told you before, when Workhorse asks Unicorn to validate an upload, Unicorn provides a set of pre-signed URLs that we can use without knowing the object storage credentials. But these pre-signed URLs have the headers of the request encoded into them. So if we want to add some new headers, it will not be a valid pre-signed URL anymore, because the headers are part of the signature.
The hash of the headers would be modified, yeah. So the thing is that we have two options here that we are discussing. One is to move the object storage credentials to Workhorse as well, so that Workhorse can re-sign the URL itself on the spot once it has the MD5 header: it has already calculated it, so it can sign a new URL for this request with this header. That way we can keep the old implementation and just add a function call to get there. The other option is...
...asking Unicorn: can you pre-sign me another request with these headers? So we have the MD5, we compute all the headers on Workhorse and we ask Unicorn to pre-sign a new URL so that we can upload with it. But this is going to be more API calls, because for each part we'd have to ask for a new pre-signed URL. I think this is the worst part, sorry.
This is not completely free because, as you know, once we go through all the Rack middleware in Unicorn there's a lot of things that happen: we are going to log, we are going to validate tokens, we are going to do everything that we do on each incoming request. The pre-signing part itself is trivial, but it has an overhead, and I don't know which direction we will take. I think it will mostly be a decision for whoever is going to work on this.
The thing is that if we go with the API route, installations will just work out of the box. If we go with the shared credentials route, on the Omnibus installation Omnibus can take care of moving secrets here and there, so it would just work, but this will not be true for installations from source, where they'd need to update the configuration file. Also, Google supports two different sets of credentials: it can be a JSON key file or the same access key and secret scheme that AWS uses.