Description
Big Fast SQL with Presto, with Kyle Bader of Red Hat and Kamil Bajda-Pawlikowski of Starburst Data.
Filmed October 28th, 2019 in San Francisco.
Kamil: Another great thing... so, first of all, I'd like to talk about Presto as a SQL-on-anything engine. It's an open source project that was first started about seven years ago at Facebook and then spread here in the valley and beyond very, very quickly, and my team and I have been involved in this project for almost five years by now.
What's unique about Presto is that it's a compute-only distributed SQL engine, which means you can deploy it almost anywhere, and you can allow Presto to access data from many, many different data sources. Some of those are object storage, like Ceph, or, you know, Amazon S3, or Google Cloud Storage, or Azure Blob Storage, and other technologies like this. You can also query HDFS and Hadoop, obviously known for storing big data, but you can also connect to a variety of different databases, like Oracle, Teradata, SQL Server, and Postgres, and so on, and also NoSQL engines like Cassandra and, most recently, Elasticsearch as well.
So it's a very, very powerful mechanism, where you separate compute and storage: you can provide scalable processing using multiple machines in your Presto cluster, and then, from the user's perspective, it all looks the same.
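This separation is what enables federated queries: a single statement can join data from different connectors. A minimal sketch, assuming a `hive` catalog backed by object storage and a `postgresql` catalog for an operational database (all catalog, schema, table, and column names here are hypothetical, not from the talk):

```sql
-- Join Parquet data on object storage with rows from Postgres.
-- All names below are illustrative.
SELECT c.name,
       sum(o.total) AS lifetime_value
FROM hive.web.orders AS o              -- e.g. Parquet files on S3/Ceph
JOIN postgresql.crm.customers AS c     -- e.g. an operational database
  ON o.customer_id = c.id
GROUP BY c.name
ORDER BY lifetime_value DESC
LIMIT 10;
```

The catalog prefix on each table name is what tells Presto which connector, and therefore which storage system, to pull from.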
Now, why do people like Presto, or why are so many companies deciding to leverage Presto rather than alternatives? I think there are several different reasons, and some of them are summarized on this slide. First of all, it's a community-driven open source project used by a number of big players who bet their SQL analytics needs on Presto: guys like Airbnb, Netflix, Lyft, LinkedIn, and many, many others, right, who are part of the community driving this forward.
That means, you know, making sure that the project survives despite any changes at a single individual company deciding to go further or not. It's a very powerful, high-performance SQL engine proven at scale. The largest deployments of Presto are, you know, approaching about a thousand machines in a single cluster, and many companies are actually running many, many clusters, because, since it's compute-only, it's very easy to spin them up and down and give access to certain data sources without sort of creating data silos.
As I mentioned, a fundamental piece of the architecture is the separation of compute and storage, which means Presto itself doesn't have any favorite storage mechanism; it doesn't come with its own mechanism to store the data. It relies on wherever your big data is, whether that's object storage or HDFS. You may keep some of your older data in Oracle, Teradata, and other data warehouses. You can keep some of your operational data in Cassandra or SQL Server, or anywhere it lives right now.
With that, we also like to say, you know, that it represents a big value in having no vendor lock-in. First of all, it's an open source project, so you can run it and use it without any vendor, if you like. You're also free from being tied to any Hadoop distribution; it works across any distribution. You can change the storage underneath Presto, and your applications and your end users will still be interacting with the same data without knowing you actually moved from, you know, HDFS to object storage, for example.
You can move from an on-premises deployment to the cloud, or the other way around, and things for your users do not change, because Presto is isolating them from that entirely. And again, you're not tied to any specific infrastructure, so you can move between clouds, for example, if that's your choice. So it provides great insulation and flexibility.
Okay, so Starburst: as I mentioned, we have been involved in the Presto community for many years already. We have large customers in production, both on-premises and in various cloud deployments. With Kubernetes, we are now enabling a very similar experience across any cloud and on-premises environments like OpenShift, for example, which is really great both for customers and for us as developers, since we don't have to handle a custom deployment mechanism for each cloud separately as an enterprise vendor.
So, as I mentioned, Presto is a very, very high-performance SQL engine, and it was built like that from the beginning. The objective for the team implementing it was to make interactive analytics at big scale a reality. Before Presto, there was Hive, you know, an obviously very highly respected engine that can handle petabytes of data.
Kyle: They basically wrote their own deployment tools for deploying these different Presto clusters, and one of the great things about having OpenShift is alleviating this burden from folks, right. Instead of having to write, you know, some scripting and some sort of configuration management tooling, they can use something like an operator. And having written, you know, Ansible playbooks for Presto, I can appreciate not having to do that anymore.
So all the things that Kubernetes is good at, you kind of get once you start using the operator framework to deploy clusters, and particularly Presto clusters. Instead of having to worry about, you know, provisioning new nodes or doing fault tolerance, Kubernetes kind of handles that for you. You can say how many Presto workers you want online, and it'll bring that many up. If one goes down, then it'll provision a new one, and it'll get bound to a different node. You can trivially scale it, right, so I can go in,
I can change the number of replicas for workers up, and then, you know, I have more. So you can potentially make it so that, you know, if you have a higher query volume, you scale out the cluster to be able to, you know, keep your query response times low, and then, if the volume of queries kind of subsides, you can, you know, scale it back in. And because it's compute-only, you don't have to worry about it, right.
So we kind of connected the dots, and they made it happen with a little bit of help, but mostly them; it was like 90% done by the time we started having the conversation with them. So what the operator does is it deploys the coordinator and the workers, which then work together, and you submit your queries to the coordinator.
At this point, this is a screenshot from one of my OpenShift 4.2 clusters. If you go into the catalog, under the Big Data section, there's the Presto operator, so it's under the OLM, and you can click and install, and, you know, then you can submit CRs and in effect create a Presto cluster for your environment and begin to experiment with it.
So where does Ceph come in? Well, I had a little lightning talk earlier about the scalability of Ceph, but Ceph and Presto actually work really great together, because Ceph is just an object store and Presto is just a compute engine. So there aren't really, you know, opinions around using a particular storage or using a particular query engine, because it's not a verticalized stack. And originally, you know, I learned about Presto by way of customers, right. So we had...
In an OpenShift environment, we have OpenShift Container Storage, which is the packaging of Ceph with an operator that can manage Ceph, called Rook-Ceph. And then, additionally, there's another component called NooBaa, which is kind of a multi-cloud gateway: you can have multiple different object stores, on-prem or in the public cloud, and it can kind of route, and have sophisticated policy around, where data should be placed.
And so you don't really know... like, if I'm just a data scientist and I'm interacting with the data, I don't necessarily know if the tables have already been created, and I'm just doing SQL queries. I don't really know where the data is coming from, and that's one of the nice things about Presto, right. You can have multiple different data sources: you can have some relational databases, you could have an object store.
You could have some older data that's in HDFS, and from the data scientist's perspective, they don't know that it's being sourced from one place or another, and so this is kind of nice. If you want to create tables that map to an object store, you know, it's as simple as running a few statements, and then you provide basically an external location, right.
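Those few statements, with Presto's Hive connector, look roughly like this; it's a sketch, and the catalog, schema, bucket, and column names are all hypothetical:

```sql
-- Declare a table over Parquet files that already exist in a bucket.
-- Names and columns are illustrative; the WITH properties
-- (format, partitioned_by, external_location) are Hive connector options.
CREATE TABLE hive.web.page_views (
    user_id bigint,
    url     varchar,
    ts      timestamp,
    dt      varchar               -- partition column, e.g. '2019-10-28'
)
WITH (
    format            = 'PARQUET',
    partitioned_by    = ARRAY['dt'],
    external_location = 's3a://analytics-bucket/page_views/'
);
```

No data moves when this runs; Presto just records where the existing objects live and how to interpret them.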
If the data is laid out by date, then it'll read all the files that are in... like, it'll filter on the path, right. So it'll query for the list of all the objects that are in the bucket with this particular prefix that match, you know, based on the time range, and then it'll read in all those files. And then, you know, Parquet has metadata and so on and so forth, so it'll bring all that in, and the person writing the SQL has no idea.
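That path filtering is usually called partition pruning. As a sketch against a hypothetical table partitioned by a `dt` date column, a predicate on the partition column limits which object prefixes Presto even has to list:

```sql
-- Only objects under the dt=2019-10-21 ... dt=2019-10-28 prefixes
-- are listed and read; other partitions are never touched.
SELECT url, count(*) AS views
FROM hive.web.page_views
WHERE dt BETWEEN '2019-10-21' AND '2019-10-28'
GROUP BY url
ORDER BY views DESC
LIMIT 20;
```

The query author just writes a WHERE clause; the pruning against the object store happens underneath, exactly as described here.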