From YouTube: CNCF Storage WG 2017-12-13
Description
Join us for KubeCon + CloudNativeCon in Barcelona May 20 - 23, Shanghai June 24 - 26, and San Diego November 18 - 21! Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy and all of the other CNCF-hosted projects.
A
Thanks. Okay, I want to reiterate the expectations that we have around collaboration, around discussion, around professionalism, around not personally attacking people, around using the right tone, and around being generally pleasant and enjoyable to have on these calls, chatting and working with everyone. If I'm not seeing that, I'm going to pull you aside and ask that we have a discussion, and potentially ask that you leave the working group. So with that said, any questions that folks have?
C
B
All right, so my name is Sugu, and I've been at YouTube for about eleven years now. I've worked on various scalability and infrastructure projects, and my last project has been Vitess, which I'm going to talk about. I put this presentation together to address many of the questions that CNCF has for due diligence. It may not feel like it flows well, but it should cover all the information that you need, so hopefully it won't be too rough.
B
So Vitess's mission statement has been to be the best-performing and most scalable MySQL solution in the cloud. Actually, our real mission statement is to be a NewSQL storage solution, but there are a few items that we are missing, which we'll get to as we address the issues. Before I jump into what Vitess is, I wanted to quickly cover MySQL, because not everyone here is familiar with its significance.
B
So MySQL is basically an RDBMS. People usually choose an RDBMS over a traditional NoSQL system because they want secondary indexes, joins, or transactions, which basically make the application simpler to write; without these, the application has to take on those burdens. And MySQL itself, by DB-Engines ranking, is the second most popular database engine in the world, so it still has a pretty huge following. One of the main disadvantages of MySQL is that it's a single-instance server.
B
B
It's actually not a big-data server by itself, because its sweet spot is somewhere from 100 GB to 1 terabyte per server. If your database gets bigger than that, you start to run into various issues with MySQL. And in terms of cloud support, there are only the hosted solutions like RDS and Cloud SQL; MySQL itself doesn't give you much help. If you want to just take stock MySQL and run it in the cloud—
B
There is not much help, and Vitess is actually trying to solve that problem. So what is Vitess? It basically solves three problems for MySQL. First, it helps you manage a large number of MySQL instances: as you scale, when the number of instances gets really big, there is not much help from MySQL for managing them.
B
Second, if you are on bare metal and want to move to the cloud, you can use Vitess to move your MySQL instances to the cloud. And last but not least, it gives you NewSQL-style scalability: when your database runs out of steam, you can just shard and continue to scale. So those are the three main features that Vitess gives you, in terms of categorization.
B
So, prior to this whole NewSQL wave that's coming up, people had to choose between an RDBMS or a NoSQL system. Basically, you either choose scalability or you choose data-integrity features, but you can't have both, and that has been a problem, because many complex applications do want both. Hopefully the NewSQL wave will solve that problem. There are actually many NewSQL systems: there's now TiDB, there is also Citus; I need to update the slide.
B
Some people were questioning whether Vitess is, or should be, a NewSQL system, and there are many definitions; it's still gelling. So what I've done is put Vitess slightly below the bar, saying that we are almost there but not quite there yet, but it does play in that space. Most people that want to scale an RDBMS- or MySQL-style system indefinitely tend to converge on Vitess.
B
In terms of history, this project is pretty old; we started in 2010. This was a time when YouTube was having some severe scalability problems: the number of outages was growing and pretty much getting out of control. So myself and my colleague Mike Solomon took ourselves out of the day-to-day operations and decided to see if we could come up with something that would let us stay ahead of all these problems. That is basically how Vitess was born.
B
We mostly used it for ourselves. We used to develop it in the open-source world and then import it for our own use; nobody else was participating. But over time the number of features we added became substantial, and Flipkart, which was also facing scalability issues, somehow found us, contacted us, and said: hey, do you think we can use this? We said: sure, give it a try.
B
We said we'd help to the extent we could, and a year later they went into production, which was pretty exciting. After that, that credibility attracted more users, and it has been pretty much growing organically since then. There hasn't been much hype around Vitess, so most of Vitess's growth has been very quiet, almost secret, but it has grown pretty fast.
B
We now have a Slack channel that has about 200 members. If you look at our community, these are the companies that are currently using or adopting Vitess. The group that you see on the left are people that are already in production, and the group you see on the right are people that are either in the pipeline or evaluating.
B
Actually, there are more companies than this; we only have permission to use these logos, which is why I've used them. There are other companies, some of them quite big, that have gone into production but haven't given us permission to use them as a reference yet. So this part is pretty exciting. And there are some testimonials, both from Slack and Square, that I collected, because people wanted to hear from them about how they support CNCF. You can see the full text.
B
I will link the document; these are just some excerpts. So what are the things that Vitess does for you in the area of making MySQL better? It allows you to run MySQL in multiple data centers, and it takes care of failovers and load balancing. One problem that MySQL has is that it cannot handle a large number of connections.
B
Or, if there is a huge spike in requests, it just falls apart. So Vitess actually protects MySQL by using connection pools that limit and throttle the traffic. And sometimes expensive queries hit MySQL and take the entire database down; MySQL is a system where one query can affect everything else. So when these things happen, Vitess intervenes and kills queries that are taking too long or transactions that are lingering.
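The two protections described here, a bounded connection pool and killing queries that run past a budget, can be sketched roughly as follows. This is an illustrative sketch only, not Vitess's actual implementation; the class and parameter names are made up, and the "kill" is modeled as an up-front budget check rather than a mid-flight kill:

```python
import threading

class TabletPool:
    """Sketch of VTTablet-style protection: cap concurrent queries and
    refuse work beyond the cap instead of letting MySQL fall apart."""
    def __init__(self, max_concurrent, max_query_ms):
        self.sem = threading.BoundedSemaphore(max_concurrent)
        self.max_query_ms = max_query_ms

    def execute(self, run_query, estimated_ms):
        # Throttle: if all pool slots are busy, fail fast rather than
        # piling more connections onto MySQL.
        if not self.sem.acquire(blocking=False):
            raise RuntimeError("throttled: too many concurrent queries")
        try:
            # "Kill switch": refuse queries that would blow the time budget
            # (the real system kills them while they run).
            if estimated_ms > self.max_query_ms:
                raise RuntimeError("killed: query exceeds time budget")
            return run_query()
        finally:
            self.sem.release()
```

The point of the sketch is the failure mode: overload surfaces as a fast, explicit error at the proxy instead of a slow collapse of the database underneath.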
B
We have been developing these features at YouTube based on issues that we have been seeing in production, and these are things that anybody using MySQL will relate to, because these are problems that all of these people face. It is now so well tuned that people who go into production with Vitess find that their MySQL generally runs better than it used to run before.
B
These are actually not cloud features per se, but the entire Vitess software is written for the cloud, so there are subtle cloud-related behaviors in all of these features. That's actually my next slide. The way we wrote it is: Vitess was built to run in Borg. Even though we built it in the open source, we had to import it and adapt it to run in Google's Borg cloud.
B
So we had to use lock servers, the discovery mechanism, health checks, everything that Google required for software to run in its cloud; we had to do all of it for Vitess. So we were actually cloud-ready before Kubernetes was even born, and when Kubernetes came about, it was very trivial to make Vitess work under it, because all the plugins were in place. Actually, running in a cloud is not just checking these boxes, because there is a specific way you have to think about software.
B
For example, software cannot rely on config files and expect them to be sprayed around to change the way it behaves; you have to use the lock server mechanism to publish changes. The file system is not something that you always have access to; you cannot log into a box and look at what's going on all the time. And when you use a lock server, you have to be careful about not overloading it, because it's not something that you can afford to spam.
B
Those are low-QPS systems that you shouldn't abuse. All these things we learned the hard way, and we changed Vitess to treat these external systems correctly so that it works well. The best feature Vitess has is definitely the sharding, and that's the one that attracts the most users. Basically, once you outgrow one single instance, what Vitess lets you do is shard your database and continue scaling indefinitely.
B
The way it does this is that it lets you view the sharded database as if it were a single instance. So when you send a query, it figures out where and how it should route that query, or it splits it into parts, sends them to the different shards, and then collects the results. And there are database-like features in the sharding: we have cross-shard indexes. MySQL users rely a lot on auto-increment, and we have sequences that behave just like auto-increment but work across shards.
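The routing behavior described above (send a query to one shard when the sharding key pins it there, otherwise scatter it to all shards and gather the results) can be sketched like this. This is a toy model, not Vitess's planner; shards are plain dicts, and the hash "vindex" is an illustrative md5 mapping:

```python
import hashlib

def keyspace_byte(user_id):
    # Illustrative hash mapping from a sharding key to a shard choice.
    return hashlib.md5(str(user_id).encode()).digest()[0]

class Router:
    """Sketch of VTGate-style routing over a list of shard dicts."""
    def __init__(self, shards):
        self.shards = shards  # each shard: user_id -> row

    def shard_for(self, user_id):
        return self.shards[keyspace_byte(user_id) % len(self.shards)]

    def insert(self, user_id, row):
        self.shard_for(user_id)[user_id] = row

    def select(self, user_id=None):
        if user_id is not None:            # single-shard route
            shard = self.shard_for(user_id)
            return [shard[user_id]] if user_id in shard else []
        rows = []                          # scatter-gather across all shards
        for shard in self.shards:
            rows.extend(shard.values())
        return rows
```

The application only ever talks to the `Router`, which is the "single giant database" illusion the speaker describes later for VTGate.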
B
The other one is the pluggable sharding scheme. This is something that MySQL users prefer, and I think it's something that most users should prefer, because I don't think systems are there yet where they can be good at figuring out how to shard your data. If the system made the decision about how data should be sharded, it is most likely not going to be the most optimal way. Hopefully machine learning will help us solve this problem, but for now this decision is best made by a human.
B
So therefore we not only let you decide how you want to shard your data; you also decide the sharding scheme that you want to use, which is something that many MySQL users want, because not everybody likes a hash-based sharding scheme. Somebody wants to use mod-based, somebody wants to use an md5, or somebody just wants a sequence. That's one thing that Vitess is very good at, and the most important thing is that the sharding scheme and the cross-shard index feature work hand in hand.
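The "pluggable scheme" idea boils down to a small interface: the router asks a human-chosen scheme object for the shard, and the scheme can be swapped without touching the rest of the system. A minimal sketch, with invented class names (Vitess's real mechanism, vindexes, is much richer):

```python
import hashlib

class HashScheme:
    """Hash-based scheme: even distribution, no control over key locality."""
    def shard(self, key, num_shards):
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16) % num_shards

class ModScheme:
    """Mod-based scheme: numeric ranges stay predictable and human-readable."""
    def shard(self, key, num_shards):
        return key % num_shards

def route(scheme, key, num_shards):
    # The scheme is plugged in by the operator; routing code stays generic.
    return scheme.shard(key, num_shards)
```

Swapping `ModScheme()` for `HashScheme()` changes where rows land but not any of the routing code, which is the property the speaker is describing.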
B
You can even reshard your database after you plug in your sharding scheme, and when you reshard a database, Vitess can do it with almost no downtime. The application will almost not know what's going on: there are probably a few seconds of downtime when you switch over the masters, during which the proxies will just buffer those requests, so the application sees almost no downtime. And finally, because you are sharded, you are probably going to have transactions that span across shards, so for that Vitess has support for 2PC transactions.
B
There are limitations. These are all addressable limitations, but we just haven't prioritized them yet. One thing is that 2PC is a new feature that we developed; it's the first version, and it is expensive. Somebody that really wants it can have it, but they have to pay the price: it essentially results in about a 50% increase in throughput requirements, so it does reduce your serving capacity. But then, if you can shard, you can just go wider.
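The cost mentioned here comes from the shape of two-phase commit: every participating shard must prepare before any of them commits, and one refusal rolls everything back. A minimal coordinator sketch (not Vitess's implementation; names are illustrative):

```python
class Participant:
    """One shard's view of a cross-shard transaction."""
    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.fail_prepare = fail_prepare
        self.state = "idle"

    def prepare(self):
        if self.fail_prepare:
            self.state = "aborted"
            return False
        self.state = "prepared"   # durably staged, not yet visible
        return True

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_pc(participants):
    """Phase 1: everyone prepares. Phase 2: commit all, or roll back all.
    The extra round trips are why cross-shard transactions cost more
    than single-shard ones."""
    if all(p.prepare() for p in participants):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A single `fail_prepare` participant aborts the whole transaction, which is the safety property being bought with the extra throughput cost.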
B
C
B
Nobody has really asked for it; they don't seem to think it's that important, so maybe we just haven't encountered users that need it yet. But this is something that is on our radar. And when you issue a SQL statement, if all of it can be served by one shard, then Vitess is very good at figuring that out and sending it to that one shard.
B
B
From the enterprise-style users, the main complaint has been that it's very complex. That's basically because we have been going for seven years: it has a lot of options, and sometimes people get confused about what works with what. So right now they do need to talk to us sometimes, and they have problems configuring Vitess. This is another big area that we have to focus on. These last two are currently our biggest priorities: the documentation, and reducing the complexity of bringing up a Vitess cluster.
B
But the good news is that once you have configured Vitess, it runs really smoothly; that's been the experience of our users. All right, so this is the Vitess architecture; it shows how we do what we do. The way Vitess is configured is that to every MySQL instance we attach what we call a VTTablet. All queries go through the VTTablet, and the VTTablet makes sure that the queries do not abuse MySQL.
B
There are lots of protections. For example, if two identical queries hit MySQL at the same time, it won't send them both; it'll send only one of them and then share the result with everybody that requested it. VTTablet can also perform housekeeping work like taking backups and restores, and it can take care of all the cluster-management work. And on the proxy level there is this thing called VTGate, which is actually stateless.
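The "identical queries" protection is a query-consolidation pattern: among simultaneous requests for the same query, only one actually reaches the database, and everyone shares its result. A deterministic sketch that models a batch of requests arriving at the same moment (illustrative only; the real mechanism works on concurrent in-flight queries):

```python
def consolidate(batch, db_fetch, calls_log):
    """Serve a batch of simultaneous queries with one database call per
    distinct query; followers reuse the leader's result."""
    cache = {}
    results = []
    for query in batch:
        if query not in cache:
            calls_log.append(query)      # this request is the "leader"
            cache[query] = db_fetch(query)
        results.append(cache[query])     # duplicates share the result
    return results
```

With three simultaneous requests, two of them identical, MySQL sees only two queries instead of three, which is exactly the load-shedding effect described above.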
B
So it's something that can scale up and down as needed, as the load on the application goes up and down. The application knows only about the VTGates; it does not know about all the details underneath. It'll just send the query as if VTGate were one big giant database, and VTGate will figure out how to get that query served. So it greatly simplifies the view the application has of the cluster underneath. And how we tie all this together is through a lock server here.
C
B
I've put etcd here, but Vitess actually supports all the lock servers. It keeps track of which tablets are going up and down, informs VTGate of those changes, and then VTGate routes traffic accordingly. And vtctld is our dashboard, which we use to look at the whole cluster and perform operations like reparenting, resharding, etc.
B
B
D
B
Actually, a lot; it's pretty elaborate. You can use multiple lock servers: we have one called the global lock server, which is unique across all data centers, and we let you configure one lock server per data center. The idea is that when one data center goes down, it does not take everything else with it. And the global lock server has very little information; it only tells you where the master database is.
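That split, a tiny cross-datacenter global topology plus a full per-datacenter one, can be pictured with a small lookup sketch. All of the key names and layouts below are invented for illustration; they are not Vitess's actual topology schema:

```python
# Global topo: cross-datacenter, deliberately tiny (master location only).
GLOBAL_TOPO = {
    "commerce/-80": {"master_cell": "us_east"},
}

# Cell topo: one per datacenter, holds the full tablet lists.
CELL_TOPO = {
    "us_east": {"commerce/-80": ["tablet-101 (master)", "tablet-102 (replica)"]},
    "us_west": {"commerce/-80": ["tablet-201 (replica)"]},
}

def tablets_in_cell(cell, shard):
    # A cell outage only loses that cell's listings; other cells keep serving.
    return CELL_TOPO.get(cell, {}).get(shard, [])

def find_master(shard):
    # Two hops: global topo says which cell, that cell's topo says which tablet.
    cell = GLOBAL_TOPO[shard]["master_cell"]
    return next(t for t in tablets_in_cell(cell, shard) if "(master)" in t)
```

Keeping the global store small is what makes the cross-datacenter dependency cheap: losing `us_west` above removes its replicas from view but never touches the master lookup.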
B
C
D
B
When a VTTablet comes up, it will register itself with etcd, saying: I am now here, I'm a replica, and I'm ready to serve traffic. The VTGates have a watch set on etcd, so when the tablet comes up they'll receive the notification and start including that VTTablet in the load balancing.
D
B
Actually, we recently implemented the MySQL protocol at the VTGate level, so a client can just connect to VTGate as if it were a MySQL database, and it's transparent; there is no difference. Well, there are actually some differences, because things are sharded, and because the VTGate here represents all the databases underneath: masters, replicas, and so on. So the app server can, for example, say: I want to connect to my keyspace, but I want to connect to the master.
B
C
B
There are extensions, but otherwise, query-wise, it's the same: once you've specified your database, you just send queries as normal. We also have some non-standard extensions; for example, you can issue show statements that return information about shards, like: show me all the keyspaces, or show me all the shards inside a keyspace. So there are a few extensions which people can optionally use, but typically, if you just point your app at VTGate, it should work as is.
B
B
The native client lets you use bind variables at the app layer itself, which makes things a little more efficient. Not only that, it makes the VTGates fully stateless, which means that you could start a transaction on one VTGate, and if that VTGate goes down in the middle, you can complete the transaction using another one. So it's truly stateless and load-balanced. But we have noticed that most people don't care much about this feature, so most of them just use the MySQL protocol.
B
Alright, so the whole idea of this thing is that we worked very hard at reducing the number of moving parts here. These are the things that we felt are absolutely necessary: anything less would be a problem, and anything more would be a problem. A lot of thought has gone into this architecture, and it has been tuned over many years, so I think we are very happy with what we have here.
C
B
We are probably going to stay with it for a very long time. In terms of versatility, this is where Vitess really shines: it can pretty much run anywhere. Right now, YouTube runs Vitess on Borg, Stitch Labs runs it on GKE, BetterCloud uses Mesos and GCE, HubSpot uses AWS and Kubernetes, and others like Booking and Flipkart basically run on bare metal; there's actually a bunch of users that run on bare metal. It can also work with any lock server.
B
When we import Vitess, we basically swap in Chubby. We originally developed it with ZooKeeper, but now we also support etcd and Consul, and all of these are used in production by someone or other. In the same way, in the open source we use gRPC, but when we import it, we swap it out for Stubby. And there are plugins that people have provided for monitoring with Prometheus, InfluxDB, and collectd, and obviously Borgmon internally.
B
One thing that we spent a lot of time on in our software design is that Vitess is fully pluggable. It's not just these things that are pluggable; there is a whole bunch of things that we do when we import Vitess into Google: we add throttlers, we add monitoring, health checks, ACLs, and the list is pretty big. Those are some other big things that I wanted to mention, and most importantly, this pluggability does not affect efficiency.
B
Typically, if you deploy Vitess on top of MySQL, what people have reported is under about two milliseconds of overhead added to what would otherwise have been a direct round trip to MySQL, which most of them find acceptable. And over time, we have brought in discipline: when we started we were not very disciplined about coding, but now our coding standards are pretty high and stringent, and readability is the most important thing.
B
When somebody writes software, we spend a lot of time making sure that it's structured in such a way that anybody else who comes in and reads the code can understand what's going on. This has actually attracted more contributors, and they've all been very happy with Vitess's code. We follow all the Go coding standards.
B
We require unit tests to be written, with strict coverage requirements; where needed, end-to-end tests have to be written, and they should not affect performance. And we integrate with Travis and Code Climate, which run all the tests, unit and end-to-end, before we actually merge code in.
B
Our master branch is something that we import from regularly and deploy at YouTube, and many of our users do the same thing, so we require the master branch to be absolutely stable all the time. There are only one or two users that use an official release of Vitess, which is currently at 2.1, and we are about to release 3.0.
B
A
Thank you for the presentation. I think this is something that the TOC is going to ask, so just to play devil's advocate with you, and something to prepare for: why do you want Vitess to go to the foundation? What do you see as the trajectory for Vitess if it's accepted as a project?
B
Actually, the main thing that we are looking for, and the reason why YouTube is very insistent on this happening, is that Vitess is being seen as a project owned by YouTube, and we want to change this to be a project that is owned by the community, by virtue of the fact that all these people depend on it and contribute to it.
B
B
A
B
It will. So basically YouTube is committed to it. The way Vitess has grown is through the features that YouTube uses. I don't know if you have heard, but YouTube has made a decision to migrate to Spanner; it's a multi-year project, though, and YouTube is still committed to supporting Vitess, and therefore it wants to make sure that it grows the community and establishes leadership.
B
B
Yeah, so architecturally they are very different, because Vitess relies on an RDBMS, on a relational database underneath, and that makes it a very different product compared to Spanner. However, we do get pitched against it; we do get compared with Spanner, CockroachDB, TiDB, all these NewSQL databases.
B
Vitess is better at handling SQL as a language, mainly because it relies on a relational DBMS underneath. In that respect, for example, it supports DMLs, which allow you to change data using SQL statements, which systems like Spanner and CockroachDB don't support. The place where it falls short is something that I mentioned in the limitations.
B
Spanner uses an LSM system, and it can obtain a consistent cross-shard view of the data, which Vitess cannot do right now, because for that the underlying database has to support that feature, and MySQL doesn't support it yet. But otherwise, if you are just looking top-down, there is very heavy feature parity between Spanner and Vitess; they both scale equally well.
B
There are some differences in how we operate. For example, Spanner has quorums, cross-data-center quorums, so they give you better durability for your transactions, but you also pay the price for it in terms of latency. Vitess relies on a different mechanism called semi-sync replication, which relies on within-data-center quorum-style consistency, which is actually faster.
D
B
It keeps coming up all the time, because if you look at the architecture, nothing prevents us from replacing this pair with Postgres plus a VTTablet that can handle Postgres. Yes, that is definitely something that is on our radar. But the thing is, right now we have our hands full supporting the MySQL community; every week a new user comes up and says, oh, I need help here. So it's been pretty busy lately just supporting what we have now.
B
Because if you actually looked at the VTGate code, all it does is look at SQL by the SQL standards and figure out how that SQL needs to be broken out into smaller parts and sent to the different instances underneath. So what you would need to write is a new VTTablet component for Postgres that can manage Postgres on this side. Once you have that, from the VTGate perspective, it is just routing the query. That's right, thank you.
E
C
B
B
They said that both HubSpot and Stitch Labs use GKE. They said that they did have to create a few scripts to deploy Vitess, so that part is something that we can definitely do more work on. Within YouTube it's a pretty big instance: I think we have tens of thousands of nodes, pretty high QPS, and a setup of about 20 data centers.
A
B
Right now, the way I think they have done it is: once you have the Vitess setup done, as you can see, there are very few moving parts here per se, so scaling Vitess basically involves just bringing up more and more replicas here, and those are the operations.
B
That's correct, yeah. The reason is that VTGate actually autoscales; it scales up and down automatically based on the load generated by the app. But here they actually have to provision, because they have to decide how much disk they want and so on. So you are right: this is one place where we could probably support autoscaling, but we don't do that right now. A human has to say: oh, looks like our read traffic has gone up, we need to bring up more instances on this side.
D
B
Nobody at YouTube has ever asked to autoscale these things. Our operators actually want complete control over how many instances there are. These are planned in a weekly meeting where we say: looks like we need to reshard, looks like we need to add more replicas. So at least at YouTube, humans have preferred to make these decisions.
D
E
And then one other question on the VTTablet configuration. Once you've got this obviously very tightly coupled relationship between the VTTablet and an instance of MySQL, does the VTTablet handle any of the functionality in terms of actually managing the lifecycle of that MySQL instance, i.e. keeping the service alive, and/or maybe upgrades of the software, or any of those sorts of things? Yeah.
B
That is actually VTTablet's job right now. The main housekeeping work that it does is taking backups and performing restores. It also helps you with schema deployment, and then there is also coordinating who the current master is. Those are the housekeeping tasks that VTTablet does for MySQL; other than that, all it does is serve queries. Okay.
E
B
Yes, so you can actually issue manual failovers, or reparents. The one thing here, though, is that when a master failover happens, you have to inform the VTGates immediately; you cannot afford to go through etcd, because the SLA for something like etcd disseminating information is usually 15 to 30 seconds, and sometimes even longer. So here is the way we do it.
B
These VTGates are connected to the VTTablets, and there is a health-check stream that flows from all the VTTablets to the VTGates. So as soon as a VTTablet becomes the master, the health check immediately informs the VTGates, and within seconds all the VTGates move the traffic over to the new master. So etcd is mainly used for passive discovery.
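The push-based failover described here can be sketched as a gate that updates its routing table from tablet-initiated health-check events, with no topology polling on the hot path. A simplified model with invented names (the real stream is gRPC-based and carries much more state):

```python
class GateSketch:
    """Sketch of VTGate's view, kept fresh by streaming health checks."""
    def __init__(self):
        self.master = None
        self.replicas = set()

    def on_health_check(self, tablet, tablet_type):
        # Called whenever a tablet reports its current type; a tablet that
        # just won a failover reports "master" and traffic shifts at once.
        if tablet_type == "master":
            self.master = tablet
            self.replicas.discard(tablet)
        else:
            self.replicas.add(tablet)
            if self.master == tablet:   # demoted old master
                self.master = None

    def route(self, target):
        if target == "master":
            if self.master is None:
                raise RuntimeError("no master available")
            return self.master
        return sorted(self.replicas)[0]  # trivial stand-in for load balancing
```

Because the tablets push state changes directly, the gate's master pointer flips in the same call that reports the promotion, rather than waiting on the 15-to-30-second topology-store propagation mentioned above.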
D
B
Vitess actually has a tool called mysqlctl; "my" stands for MySQL, so MySQL control. With it we have simplified how you configure MySQL, and a lot more people than we expected use it: we thought that most DBAs like to configure their own MySQLs, but a large number of people have adopted that tool. It makes it very easy to spin up MySQL instances.
B
The way Vitess works is that we run small instances of MySQL, so on a particular box you could probably see 20 or 30 instances of MySQL running, and what mysqlctl can do is isolate all of them. By default, MySQL is expected to run as one instance per box, but we have changed the configuration to make sure that the instances are each isolated from one another.
A
Say, if you were to compare this: when someone is making a decision in the public cloud, whether to leverage Amazon's RDS or Google's Cloud SQL, what kinds of things do they consider when they decide to either use Vitess or the public cloud SQL instances?
B
Actually, Vitess can make both RDS and Cloud SQL better. I was talking to somebody from Amazon, and they said that Vitess can even help with Aurora, because Vitess can do sharding and none of these systems can. So if you deploy Vitess in front of RDS or Cloud SQL or Aurora, you solve the big scalability problem that they have, where the write throughput is limited by the fact that only so much can be handled by one box.
D
B
This is how we presented it here, but VTTablet can run outside of the box that MySQL runs on; it can just connect through a regular port. Square actually uses it that way: they have a huge MySQL instance, they are coming from a monolith MySQL and they're slowly sharding out of it, and that's the way they are set up.
A
B
Yeah, so for RDS, Cloud SQL, or Aurora, the limitation is that each instance is only one machine. You can choose the size of your machine, but once you've gone to the largest machine you can get, you are pretty much stuck there, and if your data size blows up out of control beyond that, then you're in trouble. We do want to support RDS; fortunately or unfortunately, nobody has asked for it yet, and we've been busy supporting the current community.
D
D
B
B
Every time, they mention that there are places where they could use Vitess. I actually met the head of that team last week, and he was still saying the same thing. But there's also the fact that things are working now at Facebook, and there is inertia about taking the effort to replace what's currently working with something else. So I think the way they, both Facebook and other companies, will go through this is: when they encounter big problems, that's when they'll say—
B
A
A
Alright, one more. Regarding the interaction with MySQL: has anyone who's contributed to MySQL expressed interest in absorbing some of the features of Vitess? Because you described the VTTablet, and it seems like some of that, the throttling or other logic that's in place there, could be beneficial to MySQL on its own. Any discussions there about what pieces make sense to stay in Vitess and what pieces may just need to be absorbed into MySQL? Yeah.
B
I talk to the Oracle MySQL team at least twice a year, and for one reason or another, none of them think that it's important; I don't know why. There is some relational trade-off that we make when we do a connection pool, for example: at that point, each individual connection is actually stateless, which is a move away from a relational concept. So I have a feeling—
B
They don't want to go that route. They want to remain a pure relational database that honors everything, because, for example, if you connect to MySQL, on that connection you can create session variables and state, and all those things make every connection unique and expensive. Because they are committed to supporting those features, they cannot move to something like what Vitess does. In Vitess's case, what we say is: those are features that most people don't use anyway.
B
So yeah, if you ask me, they should be part of MySQL. We actually tried to change MySQL, to push some of these features inside there, but what we realized was that as soon as you patch MySQL, nobody wants to touch it, because it's very hard to build; people have problems downloading and building it. We started with a 5-line patch and nobody wanted to use it, so we quickly moved away from that and made a decision that we will work only with stock MySQL.