Description
Cephalocon APAC 2018
March 22-23, 2018 - Beijing, China
Lars Marowsky-Brée, SUSE Distinguished Engineer, Ceph Advisory Board member
Marc Koderer, SAP OpenStack Evangelist
A: SUSE has been in the Linux business for over 25 years. For four or five years of that time we have journeyed with Ceph, but even before that we had done a lot of work on storage. I personally have built a high-availability product where we focused very much on enterprise availability, and our products are now in use everywhere, including the airport control tower I flew through, which is always a little worrisome to me. Linux and SUSE, and especially Ceph, are really chameleons.
A: You find them everywhere these days, and the mimic octopus is just a perfect match for the show today. But first, let's review some of the challenges that we've already heard about in many of the previous sessions. Why are people looking at software-defined storage in the first place? Primarily because the solutions we have been using in the past, and that I have been working on for the larger part of my career, don't scale. It's impossible to bring a single traditional storage system up to 100, 400, 500 petabytes or even an exabyte.
A: We have data coming in from our mobile devices, the Internet of Things, video surveillance, medical data that has to be not just generated and stored briefly but kept around for the lifetime of the patient plus ten years. Video is growing, resolution is growing, email too: I don't remember when I last deleted an email. I just keep it around forever, because I never know when I might need it again. Nobody deletes data; everyone just wants to add more, and data protection multiplies the volume further.
A: Once you have a lot of data, it becomes hard to manage. It becomes hard to store, and it becomes expensive to store. You have to keep it available. Yes, I have this big email archive, but if I have a really big one, say as a big provider, or I am storing data for other purposes, I don't quite know which part of my data my clients are going to access. I have to make sure all of this data is online.
A: I also have to make sure all of this data survives if one of my data centers has a problem, so I have to keep backup, recovery and redundancy in mind, and that's why I really appreciate Ceph. One of the reasons we chose Ceph over others is that I come from a world of high-availability clustering, where the protection of the data is the most important thing of all.
A: Eventual consistency usually translates to very little consistency, and if you don't have the data, you will not get the service back up, so the data is really at the core of everything. Another important reason why we chose Ceph over proprietary solutions is that SUSE is an open source business, so we only do open source, but also that open source is the only sustainable option. If you have a proprietary solution, you are competing with this huge community of contributors.
A: We now see geo-replicated clusters for mission-critical core business, and I don't quite know what happened in between, but it was kind of inevitable. It's possible to sometimes keep a little bit of an edge over open source when you are really focused, but eventually the community catches up. This is a marathon, not a sprint, so in the end we really believe that contributing to the community is the only way of solving our customers' problems. With that, I would like to take a brief look at the product itself.
A: When we first shipped Ceph, it was a technology preview, so we have a long, long history with this, and eventually we realized that Ceph is really great not just for OpenStack, which is why we called the product SUSE Enterprise Storage. We make it useful beyond OpenStack; that's still the most common use case for sure, but we have a long history with this and we keep building.
A: Sometimes people wonder why they should choose a Linux vendor over just using the community project, and if I were CERN, I would probably go with the community version: I would have a lot of cheap, very well trained labor. That's how Linux started; it started at the universities. But then you realize that just having the software out there is not good enough. It needs to be tested, it needs to be validated, you need certifications, and certifications are really not something community members enjoy.
A
They
do
not
always
have
to
bend
twist
to
interact
with
the
community
members
directly.
They
do
not
have
the
bandwidth
to
interact
with
all
the
software,
vendors
and
hardware,
vendors
and,
of
course,
sometimes
customers
have
problems
and
really
want
somebody
to
fix
them
now,
and
that
also
is
one
of
those
value-adds
that
vendors
provide.
But
besides
business
Suzy
is
very
active
in
the
safe
community
as
well.
A
We
are
strong
contributor
to
the
safe
community,
I'm,
very
happy
that
my
company
was
able
to
sponsor
this
conference,
but
we
have,
in
the
past
sponsored
many
safety
events
as
well.
We
have
hosted
them.
We
have
sponsored
them.
We
are
a
member
of
the
surf
advisory
board.
We
do
everything
in
open
source.
All
our
safe
work
is
open
source.
We
aspire
to
an
upstream
first
policy.
A: We have contributed to iSCSI support: the solution we chose supported multipathing very early on, and we are now adding that to the solution the upstream community has chosen to go with in the next upstream release. That's great, and really useful; iSCSI is really important if you are interoperating with systems that are not quite ready for native Ceph yet. We have also supported Ceph on ARM64.
A: We have supported CephFS, and that CephFS deployment was actually a great scenario, because something being ready for the community is not always the same as something being ready for production use by customers. So we initially put guidelines around CephFS use based on our own testing, so that customers could feel confident that the use cases they had would work stably and reliably with CephFS.
A: We call the deployment project DeepSea, because an octopus swims through the deep sea, and that's also fully open source, of course. It includes tooling for upgrades, and it includes tooling for FileStore-to-BlueStore migration, which takes care that a cluster is migrated one OSD at a time and converted from FileStore to BlueStore while the entire system remains online and continues serving customer data.
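The one-OSD-at-a-time constraint is what keeps the cluster serving data during the conversion. Here is a minimal sketch of that invariant using a toy in-memory model; the class and function names are hypothetical, not DeepSea's actual interface:

```python
# Rolling FileStore -> BlueStore migration: at most one OSD is ever
# offline, so the cluster stays online and keeps serving client data.
# `Cluster` is a toy in-memory model, not a real Ceph client.

class Cluster:
    def __init__(self, osd_ids):
        self.backend = {osd: "filestore" for osd in osd_ids}
        self.offline = set()

    def drain_and_stop(self, osd):
        # Invariant: never take a second OSD down mid-migration.
        assert not self.offline, "another OSD is still migrating"
        self.offline.add(osd)

    def redeploy_as_bluestore(self, osd):
        self.backend[osd] = "bluestore"

    def restart_and_backfill(self, osd):
        self.offline.remove(osd)

def migrate(cluster):
    for osd in sorted(cluster.backend):
        cluster.drain_and_stop(osd)          # mark out, wait for drain
        cluster.redeploy_as_bluestore(osd)   # zap, recreate with BlueStore
        cluster.restart_and_backfill(osd)    # bring back, wait for HEALTH_OK

cluster = Cluster(range(4))
migrate(cluster)
assert all(b == "bluestore" for b in cluster.backend.values())
```

In the real tooling the drain and backfill steps wait on actual cluster health; the sketch only preserves the ordering that keeps data available.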
And of course we have openATTIC, which is great tooling, with monitoring based on Prometheus and Grafana.
A: We try to leverage open source projects as much as possible instead of reinventing them, and this is now being merged into Ceph core. Here is a screenshot of the openATTIC monitoring dashboard; that is actually an embedded Prometheus/Grafana instance. Prometheus is an excellent monitoring time-series tool, and it's also cloud native, so we are already prepared for containerization there. I will talk a little bit more about that later. We also realize how important it is that our software is accessible to everyone.
A: So, together with a partner in China, we have localized the openATTIC dashboard and the management functionality. We have translated our documentation, and we are making sure that the Ceph management dashboard from day one inherits the capability to be localized and translated into all the languages that are needed, because sometimes the local operator may need this functionality, and it's generally just good practice. And yes, openATTIC is now merging; this slide is actually outdated.
A: It is now merged into the Ceph dashboard, though it is not yet in a shipping release, and we are adding functionality to it. This validates our choice both of openATTIC and of upstreaming it: we are already seeing the first contributions from outside our team, and for that we are deeply, deeply grateful. It just highlights that if you are not doing open source, eventually open source will catch up with you, and that's great.
A: Sometimes things are prototyped outside the Ceph core. openATTIC took some time, and frankly, we took inspiration from many open source management platforms that may not be around anymore, but ultimately the community catches up and re-creates it. Strong industry partnerships are also important: we have interactions with many, many large companies. SUSE is part of Micro Focus, so we are a truly global company, and we have ties to the other companies as well. But instead of talking about all the partners that we have, for which again we are very grateful, let me single one out.
A: I would like to talk about a partner who is actually here with us on stage. Before I hand over to Marc, let's take a brief look at the growth of Ceph deployments. Initially, the deployments were very tiny and tentative; customers were just trying out Ceph. These were small environments, development setups, just like the grassroots way Linux started, and then we had customers deploying it further.
A: In the second stage, customers deployed it for traditional workloads, replacing traditional storage arrays that were getting too expensive with Ceph-based solutions. That's great; it's really what helps a product grow, helps us understand the market better, and gets customers familiar with it. But where Ceph really shines is when you go all the way in.
A: Not just dropping Ceph into an environment that you already have, but making a true move to a cloud-native environment; then you can really get the full benefit of scale-out solutions. And with that, I would like to invite Marc Koderer from SAP (I'm sure you may have heard of this company), which has chosen Ceph as a core part of its new architecture. After his presentation, I will be back with a few slides about the future.
B: Hello, everyone. I'm really happy to be here. I want to talk about the architecture that we chose for the cloud. SAP has a long tradition, more than 40 years of history in software and computer science, and we are basically in the middle of an industrial revolution to transform everything to a cloud-native world. As I said, since we have a long history, we have a lot of brownfield applications that need to be transformed into the new architecture.
B: So how can we do that? The answer is a product called SAP Cloud Platform. The SAP Cloud Platform is a platform-as-a-service offering which enables you to build your cloud-native application with a variety of backing services that are also cloud native, like a database, a Cassandra database and so on; you put your microservices on top and can build your application. It is also integrated with the traditional SAP products for ERP and with the SAP HANA system.
B: My team is basically focusing on the layer below: we are running the OpenStack cluster in a DevOps mode, and we are taking care of the bare-metal provisioning and all the bare-metal servers that the SAP Cloud Platform itself runs on. It can run on public clouds and it can also run in SAP data centers with OpenStack, and we have been working on enabling that also for on-premise private-edition workloads, so that customers can build the same stack on their own premises.
B: The primary usage for this kind of stack is IoT workloads; we already have two years of experience with IoT workloads on top of OpenStack. Now we have come up with a new architecture, and since last year we have started to use Ceph. So the question is: why do we need Ceph, and why do we need a special architecture for cloud native? Basically, if you have a look on the left, you see a monolithic payload, and that's the traditional thing.
B: You have an application, you have a database that is active/standby, and you have one central storage. What you do, and that's the usual approach for monolithic payloads, is choose hardware that should never fail, and you never test failovers all that extensively. In a distributed or cloud-native world, you will have microservices that can scale almost without limit, and you have databases that do their job active/active, so they can scale too. So you need software-defined storage underneath that scales with your needs.
B: For instance, we are onboarding now one customer that has 60,000 IoT sensors, and these 60,000 sensors will increase every month by 20,000.
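That growth rate compounds quickly; a quick projection of the sensor count, using straight-line growth and only the figures from the talk:

```python
# 60,000 IoT sensors today, growing by 20,000 per month.
def sensors_after(months, initial=60_000, per_month=20_000):
    return initial + months * per_month

print(sensors_after(12))  # 300000: the fleet quintuples within a year
```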
B: So you see that there will be a huge amount of data coming in, and we need to be prepared and enable scaling in that sense. Let's have a look at our architecture. Within our deployment we chose three availability zones. Why three availability zones? Because we are hosting cloud-native applications, and all cloud-native applications need quorum.
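The quorum argument in one line: a strict majority of zones must survive, which is why two zones are not enough. A sketch:

```python
# Quorum is a strict majority of members. Losing one zone out of three
# still leaves a majority; losing one zone out of two does not.
def quorum(members):
    return members // 2 + 1

def survives_one_zone_loss(zones):
    return zones - 1 >= quorum(zones)

assert quorum(3) == 2
assert survives_one_zone_loss(3)
assert not survives_one_zone_loss(2)
```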
B: For our instance, we have a setup with two data centers that are quite close, about 300 meters apart, and in one of them we have two fire compartments, so that we get three availability zones. For Ceph too, we have distributed the cluster across those availability zones.
B: Let's have a closer look at our installation. What is really important here is that we should not underestimate that software-defined storage needs network; that's a really crucial point. You don't need just storage expertise in your team if you want to run this stack: you also need networking expertise, and compute or Linux expertise.
B: What we have here is a spine-leaf architecture on the networking side, which means it doesn't matter whether data is flowing from one data center to another or within the same data center: it will be the same number of hops and the same bandwidth. Our Ceph cluster in our production landscape has 108 storage nodes, with 24 disks per OSD node and 2 NVMes, and networking-wise we have 2 x 25 gigabit for the front-end network and 2 x 25 gigabit for the back-end replication network.
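The dedicated back-end network matters because every client write is re-sent to the other replicas. A rough sizing check, assuming 3x replication (the replica count is my assumption; the talk does not state it):

```python
# Each byte written by a client produces (replicas - 1) bytes of
# replication traffic on the back-end network.
def backend_gbit(frontend_gbit, replicas=3):
    return frontend_gbit * (replicas - 1)

# A node ingesting at its full 2 x 25 Gbit front end would generate
# twice that on the back end under 3x replication:
assert backend_gbit(50) == 100
```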
B: Primarily we are focusing on RADOS Gateway object store usage, because our IoT workload hits the object store. That's the thing we have actually tuned a lot recently. We measured the performance overall; at the end we had, I think, 10 compute hosts really putting load on it, and we came to something around 50 gigabytes per second, but this was not the end of it: we estimate the maximum performance at about 60 gigabytes per second with 4-megabyte writes.
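Translating those figures into the request rates the gateways must sustain (decimal units, as in the talk):

```python
# Objects per second at a given aggregate throughput and object size.
def requests_per_second(throughput_gb_s, object_mb):
    return throughput_gb_s * 1000 / object_mb

assert requests_per_second(50, 4) == 12_500  # measured: 50 GB/s of 4 MB writes
assert requests_per_second(60, 4) == 15_000  # estimated ceiling
```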
B: As I said, the RADOS Gateway is really important for us; we use the Swift interface here. Just last week we had some problems with scaling, so we did a lot of performance work, and as our customers now start to onboard AI workloads, we saw that the performance was not good enough. So we scaled up: we started with 3 RADOS Gateways and we have scaled up now to 30 RADOS Gateways.
B: These are virtual machines, so we can easily scale them up; I think later on it would be nice to have them on Kubernetes to really scale them on the fly. So what do you see here, if you have a look at the stack? Something quite obvious: Ceph scales quite well, and you can scale each layer individually. You can add more OSDs, you can add more monitors, or you can add more RADOS Gateways, but there is one layer here that does not.
B: The layer that is not really scaling is the load balancer. We had problems with the load balancer doing SSL termination, so we changed where termination happens: it is in the RADOS Gateway right now. We have the first measurements, really fresh from Friday last week, where we basically tried out how much the RADOS Gateway scales.
B: We start with one RADOS Gateway, a virtual machine with 16 vCPUs, and run the getput benchmark, which is a nice tool for benchmarking, by the way. You see from the numbers that the RADOS Gateway itself can scale up quite dynamically, so basically we can reach and saturate the Ceph cluster quite easily by extending the number of RADOS Gateways.
B: What is also important is concurrent connections: there will be more and more IoT sensors in place, so there will be more and more traffic. We came up with a maximum of 512 worker threads per RADOS Gateway, so this also scales up with each individual RADOS Gateway.
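Putting the two knobs together, the 30 gateways mentioned earlier and 512 worker threads each, gives the fleet's concurrent-connection ceiling:

```python
# Each RADOS Gateway handles up to `worker_threads` concurrent
# connections; fleet capacity scales linearly with gateway count.
def max_concurrent(gateways, worker_threads=512):
    return gateways * worker_threads

assert max_concurrent(30) == 15_360
```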
B: That's just one detail that I wanted to share.
B: Basically, it's important to understand that if you want to operate a Ceph cluster, you need to benchmark it and tune it for the way that you want to use it, and I think for IoT we are now well prepared. That's basically what I wanted to present, and I want to thank you for giving me the opportunity to be here.
A: Thank you, Marc. That is very exciting; we are really happy to see that Ceph is being deployed in these use cases. So that's where we were and where we are; let's see where we believe this might go.
A: I already mentioned that we will have a release out based on Ceph Mimic. Our focus is improved interoperability, improved localization and management. The scale-out user experience is also interesting: sometimes you notice that a UI that worked well for a cluster of 100 nodes really needs some changes for a cluster with thousands of nodes. There is eventing, alerting, metric reporting and telemetry too, and I never get tired of seeing those charts; they are so beautiful.
A: We need an abstraction layer that is more agile than installing packages and less heavyweight than virtual machines, and containers and Kubernetes seem to be the way this goes. It would help us address a number of the issues that we see in managing large-scale cluster lifecycles, so that is certainly coming. We've had meetings all this week around this; it's a really exciting technology.
A: We love the openATTIC dashboard and we love Ceph. As part of this management experience, we want to make it better: add features, make complex tasks easier. Yes, initially some of the workflows are going to be manual, but ultimately there will be wizards guiding you through and assisting you along the way. The computer and the human really have to work better together, from a monitoring perspective and from a management perspective, and it also means that we have to make sure that our management interface can address users of different levels.
A: We have to make sure that we can address users that are very experienced, and also address users that just want to provision an additional instance of their workload, or see why their workload isn't performing and what affects it, and we have to expose the more relevant metrics to the administrator. But those metrics are really complicated, and we hear a lot about machine learning and AI: I don't think AI and machine learning are relevant just as workloads running on top of our cluster.
A: How many IOPS do you need? It becomes really complicated, and we need telemetry data from real-life clusters, as was mentioned in the previous presentation as well; there is currently an upstream project on this with Wido den Hollander. We need real-life data on how much memory a Ceph cluster needs and how many disks it needs, and we also need to be able to track global performance: is the new Ceph release faster on average? We know it is on our synthetic benchmarks in our lab, but is it true in the real world?
A: So that is important: analyzing and understanding that telemetry data from a performance perspective, and also from a failure-prediction perspective. When will a drive fail? Do I have to order drives before one dies? Will 10% of my cluster fail in the next week?
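A first cut at the "will part of my cluster fail next week?" question, assuming independent drive failures; the 2% annualized failure rate is an illustrative assumption, not a number from the talk:

```python
# P(at least one of n drives fails in a window) = 1 - (1 - p)^n,
# where p is the per-drive failure probability over that window.
def p_any_failure(n_drives, afr=0.02, days=7):
    p_window = afr * days / 365  # crude linear scaling of the annual rate
    return 1 - (1 - p_window) ** n_drives

# For a fleet the size described in this talk (108 nodes x 24 drives),
# at least one drive failure in any given week is more likely than not.
print(round(p_any_failure(108 * 24), 2))
```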
A: Should I be prepared? And we've heard a lot about all the knobs that Ceph exposes: tuning pg_num, tuning all those quality-of-service parameters. All those options are really, really complicated.
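As one concrete example of those knobs, the long-standing pg_num rule of thumb from the Ceph documentation is roughly 100 placement groups per OSD, divided by the replica count and rounded up to a power of two (3x replication assumed here):

```python
# Suggested total pg_num for a pool: ~100 PGs per OSD / replica count,
# rounded up to the next power of two.
def suggested_pg_num(osds, replicas=3, pgs_per_osd=100):
    target = osds * pgs_per_osd / replicas
    power = 1
    while power < target:
        power *= 2
    return power

assert suggested_pg_num(108 * 24) == 131_072  # the cluster from this talk
assert suggested_pg_num(12) == 512
```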
A: I am not Sage; I'm not smart enough to understand them all. But there is already interesting research out there that treats this as a game problem: you tell the system, your deep Q-network, that you want to optimize for latency and latency stability, or maybe for throughput, and over time it learns and generates data. This is actually really exciting, and it seems so far out, and then you look at it and somebody has already done it.
A: These kinds of technologies have already been documented for other storage solutions, and it would be really great to bring them to Ceph. With that, I would like to conclude: the question is not whether you should be using Ceph; I believe the question really is only when. I hope to speak to more of you about these questions during the rest of the conference, and with that I would like to again thank you for your time. Please come find me and talk to us. Thank you.