S
3
-
EPISODE
6
Data Masters Podcast
released
September 21, 2023
Runtime:
23m50s

Bayesian Optimization for Databases

Dana Van Aken
co-founder and CTO at OtterTune

With the exponential increase in data generation and the need for real-time data access, traditional data storage solutions are becoming inadequate and economically unviable. Cloud-based solutions offer scalability, flexibility, and cost-effectiveness, making them a superior choice for businesses of all sizes. As more businesses migrate to the cloud, the ability to deploy and administer cloud databases without requiring expertise in database management becomes crucial.

Enter machine learning, specifically Bayesian optimization, to automatically tune database configurations based on performance metrics and user-defined optimization goals— and now, meet OtterTune!

In this episode, we welcome Dana Van Aken, co-founder and CTO at OtterTune. Their SaaS solution leverages AI to streamline database tuning, addressing configuration challenges and enhancing the performance of PostgreSQL and MySQL—all while reducing costs. Join us to delve into the challenges faced by database administrators and explore the exciting future of database management with advancements in machine learning and language models.

I'd rather read the transcript of this conversation please!

Intro - 00:00:03:

Data Masters is the go-to place for data enthusiasts. We speak with data leaders from around the world about data, analytics and the emerging technologies and techniques data savvy organizations are tapping into to gain a competitive advantage. Our experts also share their opinions and perspectives about the hyped and over hyped industry trends we may all be geeking out over. Join the Data Masters podcast with your host Anthony Dayton, Data Products General Manager at Tamer.

Anthony Deighton - 00:00:38:

Welcome to another episode of Data Masters. Today we have the privilege of hosting an exceptional guest, Dr. Dana Van Aken. The brilliant co-founder and CTO of OtterTune. Ottertune is a database optimization tool created by a team of database researchers at Carnegie Mellon University. The company envisions a future where developers can deploy and administer cloud databases without needing to be database management experts. OtterTunes's customers include Booking.com, Nordigen, and Postscript, along with prestigious institutions such as Carnegie Mellon University. Before embarking on the OtterTune journey, Dr. Dana Van Aken was a PhD student at Carnegie Mellon University and also undertook notable research internships at Microsoft and Google. With a wealth of knowledge and experience in the field of computer science, Dr. Dana brings a unique perspective to the realm of data optimization. We're thrilled to have you here with us, Dana. Welcome to the show.

Dana Van Aken - 00:01:41:

Great, thanks so much for having me. Happy to be here.

Anthony Deighton - 00:01:44:

So you have a great sort of an impressive background in data and technology. We've been working in this space for many years. Maybe you could share a little bit about your personal journey and also how that personal journey intersects with OtterTune, which obviously you're a founder of directly. Maybe share a little bit about that journey.

Dana Van Aken - 00:02:04:

Absolutely. Interestingly, OtterTune started out an academic research project. So my journey sort of with OtterTune started around 2014. I had applied to a number of PhD programs and one of the programs that was accepted into was Carnegie Mellon University's computer science department. So that's when I met a professor at Carnegie Mellon University and also OtterTune  CEO and co-founder Andy Pavlo, who is well known in the database space. And after speaking, he recently had this great vision about database tuning using machine learning and had recently submitted a proposal and was trying to get me to join him as a PhD student and decided to join him in 2014. And that's around the time where we began the work on OtterTune, the academic research project. So the goal of OtterTune, you know, in the research project was to do automatic database configuration tuning using machine learning. So we worked on this for a few years. We published our first paper at SIGMA conference, which is an academic conference in 2017.Of course, this was an academic conference, although it was well received. We didn't get very much feedback from the industry. It wasn't until later that year where we had one of the VPs at Amazon visit Carnegie Mellon University. And I believe that they spoke in one of Indy's database classes. So what happened is that Andy went up and was thanking them for providing us with so many cloud credits because I went through a ton of cloud credits, running all of the experiments at the university, you know, it was a big deal for us. I mean, they gave us several thousand dollars worth of credits. And at that point, they invited us to write a guest blog post on AWS's machine learning blog. So we went forward and we published this guest blog post, you know, a number of people from the industry just came out of the woodworks. There was a lot of interest in the product, you know, people really wanted to try OtterTune. Now at the time they were contacting Andy trying to see if they could get somebody to come out and set up this academic prototype at their company, which there was only one of me and we also had a few other research students working with us at the time. As you can imagine, that doesn't scale. We did do a POC with Society General  and when we published that work in 2021, we had some really positive results there. And then we had also started a POC with booking .com at that point also had very positive results. And at that point we decided that it made sense to spin out this academic research project into a commercial product. So we founded the company in June 2020. And you know. That's when the company started. That's sort of the background of how we got here.

Anthony Deighton - 00:05:15:

So listeners, I'm sure, are drawing a parallel here, because of course, Tamer was founded out of an academic research project at MIT, Dr. Stonebreaker, with a set of PhD students, thinking through some of the original machine learning behind Tamer's today commercial offering OtterTune sounds like a very similar experience coming out of Carnegie Mellon, really interesting academic research specifically, and also in parallel here with machine learning, then commercializing that. But maybe if you could share a little bit of your experience of these different worlds, the worlds of academia and the worlds of commercial startup, and what it's meant and how it's been to found a company within an academic institution with very much an academic bent, and then what it means to then take that and bring it to become a commercial business.

Dana Van Aken - 00:06:17:

Absolutely. So I will say that the transition from academia to a commercial product doing a startup has been really interesting. There are a lot of assumptions that we made in academia that didn't hold up in the real world, as you might imagine. I can give a few examples. So one of the main things that we realized, you know, because we had previously, as I mentioned, done POCs with booking.com and Society General. Now these are very large companies. What we're actually able to do is deploy Ottertune on a copy of the database and replay the workload. That was what we did at Society General and at booking.com, they had a staging environment that was very similar to their production environment, which is where we deployed Ottertune. So we kind of made this assumption that all companies had this and that was not the case at all. Society General was running Oracle Ottertune, the commercial product, we only run on AWS RDS databases. So that includes Aurora and Aurora Serverless and for MySQL and Postgres. So for MySQL and Postgres you just, they're not commercial databases. So you don't have the set of tools to be able to make a copy of the database and then also, you know, the bigger pieces like replaying the workload accurately. On top of that, being able to spin up similar instances, there's a lot of costs associated with this, and it becomes very expensive. And the other lesson there was that we have learned that most customers do not have a dev or staging environments that look anything like their production environments. So we tried that initially. What we did is, we deployed OtterTune on a customer. At that time, they were on Aurora,they wanted to reduce costs by reducing IOPS. That's what we optimized for, the configuration tuning piece. And we were able to get a 15% improvement in cost, so 50% reduction. So we took the same configuration and we applied it to the production database and we saw about a 2% improvement. This was a big realization for us. So interestingly, like what we've done with OtterTune, we transitioned it to a self-service product. And the other thing that we did is a lot of our customers are running on their production databases What this means is that, I'm sure we'll get into OtterTune more, but for some background,OtterTune can automatically apply changes on the behalf of the user to sort of close the optimization loop. So what this means is that we have to be really, really, really careful about the optimizations that we're applying. So we built a bunch of safeguards.

Anthony Deighton - 00:09:14:

This idea that what works in the ivory tower of academia, then when it hits the real world, maybe some of these assumptions we made or test cases that we had are very different, probably resonate with anyone who's worked on an academic project.

Dana Van Aken - 00:09:30:

Absolutely. I mean, in academia too, we were very guilty of making these assumptions in the same vein. I understand both perspectives. In academia, it's very hard to get access to real world data or anybody's database for that matter. So that's where the mismatch is.

Anthony Deighton - 00:09:51:

Yeah, 100%. Let's talk a little bit about OtterTune.So, you're the chief technology officer. Maybe share a little bit about the technology behind OtterTune and just to say it, it seems quite counterintuitive that there should even be a problem here in the sense that a database is a database. Sure, there are a bunch of settings, but this is why we have database experts in the world. And if you're me, of course, you just hit all the defaults and it's going to be perfect. Everything will be wonderful. Clearly, I'm wrong, but maybe share a little bit about both the problem and a little bit about how you've come to the solution.

Dana Van Aken- 00:10:28:

So database tuning has typically been sort of the solutions for that have been very manual. So as you mentioned, what you would traditionally do is hire a database administrator or a team of them to manage your databases. And that includes performance optimization. The problem these days is that there's more than problems overall, but two high level problems, which is that there's, you know, these days companies have many databases. And so the tuning that, you know, a human can do is very reactive in that sense. They're not proactively able to keep the databases optimized. When there's a performance issue, they go in and fix it. And that's, you know, the tuning. The other issue is just the complexity of databases has continued to grow. For example, back in the early 2010, Postgres and MySQL both had, I believe, less than 100 configuration knobs. That's still a lot, but these days they're in the hundreds. So to actually understand all of those knobs and how to configure them correctly for different workloads and hardware becomes a very challenging problem even for experts. There's just not enough experts in the world with that knowledge. And these days, another thing is that we're seeing a lot of people are managed, you know, in cloud database services, like  Azure, etcetera. They're moving away from having, like a dedicated DBA or team of DBAs to DevOps and, you know, and or having an engineer in general manage the database. So we find, speaking with a lot of our customers, that the person managing the database, is the poor person that just happened to know the most about Postgres, for example, and kind of got stuck with the job. So that's sort of the challenge. So the solution we propose is looking, is doing automatic configuration tuning using machine learning. We've also used our knowledge to kind of come up with an initial recommended set of knobs for people to tune based on what we see being most effective for different workloads. So that's sort of the solution that we propose in the configuration tuning space.

Athnony Deighton - 00:12:44:

So this might be a silly question, but why machine learning in this context? Wouldn't a set of rules be as effective? And clearly they're not, but in my mind drawing a bit of an analogy to Tamer in that we think about using machine learning because we wanna be able to learn from prior examples, the strong sort of benefit to prior history in terms of doing in our case entity resolution. And it largely would be a manual human task. And as we always like to joke, the problem with people is they like to eat and sleep. And so, and I see some of those parallels in what you were describing where even if you could find the Dadaism. Sounds like they're hard to find. They only go to where the problem  is, you know, they don't work 24 by seven across thousands of databases. That's not physically possible. And you're not going to hire hundreds of them that would be necessary to do that. But talk a little bit about the use of machine learning in this context.

Dana Van Aken - 00:13:46:

Right, so machine learning, you know, database tuning is, configuration tuning is perfectly paired with machine learning. So basically, like I mentioned, there's so many configuration knobs to tune. And what we find is that the right configuration really varies with the workload, especially you can kind of come up with some rules for the hardware, but to come up with rules for all of these different configuration knobs for the different workloads that we see at the company is, you know, really impossible. There are popular open source products like PGTune, for example, is a popular configuration tuning, a solution for Postgres. You basically just input the hardware characteristics of the system, so it'd be like the number of CPUs, the memory, you know, storage size and a few other settings and it spits out a, you know, configuration just based on the hardware. So while that's a good starting point, we see, I believe improvements of like, you know, anywhere from 10 to 30% in the experiments that we run over PGTune. And this is just because like the amount that one workload varies from, you know, one to the next is just huge. So at a high level for the algorithm, we use Bayesian optimization, which is sort of the black box optimization algorithm. In this way, we do need to provide it with some inputs, you know, for example, like the range of knob values to look in, a little bit of information about the type, you know, the data type of the knobs. But over time, you know, there's this feedback loop where we basically read, you know, we're looking at the metrics from the database system. And also, and that includes performance metrics, whatever the user wants to optimize for like latency or throughput. And by observing, you know, how different knob configurations impact the system, over time, the models converge on a good configuration for the user. And this has proven much more effective in our research.

Anthony Deighton - 00:15:52:

And probably much more cost effective because you're not having to have a human being sit there and do their best guess.

Dana Van Aken - 00:15:59:

That's right, yes, absolutely.

Anthony Deighton - 00:16:02:

So you've talked a lot about this in the context of configuration of the database, but there's a lot more to database performance than the underlying hardware and configuration. How does OtterTune think about the broader set of ways you can make a database work better?

Dana Van Aken - 00:16:22:

So like I mentioned, the academic version of OtterTune was only focused on configuration tuning. But that was another realization when we got into the real world. Ultimately, if you have really poorly written queries or missing key indexes, there's only so much you can do with configuration tuning. So in the past year and a half, we've been working on expanding the service to provide sort of a more holistic database optimization solution. We also provide recommendations with respect to indexes, queries, and resource utilization. So we just released our query recommendation piece, which is exciting. That's more of a data-driven solution in that, you know, it basically does anti-pattern matching with SQL queries. But so far, you know, it's quite helpful. On the index side, we've released recommendations for identifying unused indexes and also redundant indexes. So these would be, you know, if you have two essentially identical indexes, we can highlight those. So although these aren't machine learning back like the configuration tuning piece, they have helped customers, you know, identify a number of problems and improve database efficiency. And we're currently working on an initial version of recommending indexes that you're, which we're hoping to release later this year.

Anthony Deighton - 00:17:51:

Yeah, I was going to ask about that. So first of all, that's super exciting, this idea that, to your point, it's more than just the knobs. That's a lot to a database. From a roadmap perspective, and again, given your role leading the CTO office and the technology, how do you think about where OtterTune goes from here? From a roadmap perspective, how do we think about the future of database management? What's it going to look like in five or 10 years?

Dana Van Aken - 00:18:23:

So I guess the initial roadmap of course is to, or the sort of short to midterm roadmap is to improve and expand our current database optimization offerings. So like I said, we would like to release a product around missing indexes. Now this is a really hard problem. I mean, there's been research on database tuning since the 1980s and index tuning is a very hard problem. Microsoft has released an index tuning tool. It's probably the most state of the art, I guess commercially available in my opinion, that I know of at least. And so it's a very hard problem. We're gonna solve the problem piecewise. Start very simple, but ultimately, like we definitely want to provide, you know, or expand and provide holistic tuning for all of the services. In addition to that, I mentioned the ability to complete the feedback loop with a configuration tuning. So one of the reasons that we chose to only provide our offering on AWS, you know, the cloud services, because it standardizes a lot of the APIs we use. Now, would you, a lot of, or, you know, prospective customers come to us, they're running on-prem. The problem with on-prem databases is everybody has their own custom setups. So it's really hard to, you know, we would have to write connectors for each individual customer, and that's just not practical for us. So this is, you know, why we focus on cloud databases. But getting back to the point is that the problem with completing the loop with, you know, missing or, you know, adding or removing indexes or rewriting queries, is that a lot of times that this needs to be done in the application code. So, and for example, like a lot of our customers use ORMs like Django, Ruby, so if you drop an index through PSQL or whatever command prompt for the database, it will just add it back next time you do a migration if you don't actually update the application code. So this is the major reason why we can't complete the loop right now for that. So in the future, I think it would be really cool to do GitLab integration where we actually do some analysis of the application code as well. And so we would be able to look at, we'd be able to detect when you made changes to the database and even sort of connect that to improvements or degradations in performance. I would love to see like a full-fledged integration where we're able. Do optimization at both the database and the application level.

Anthony Deighton - 00:21:01:

I am by law required on every podcast to ask about chat GPT, the new rule. But the point you're making there about connecting to database performance application code, strikes me as a great example of really where large language models and generative transformer models could be quite interesting. I'm sure you have a thought and perspective here.

Dana Van Aken - 00:21:27:

Yes, absolutely. In fact, one of our engineers who's leaving to do their PhD, won't be as soon has been working on this exact problem. So you can go so far as to rewrite queries. I actually gave a talk on this recently at a Postgres conference where he showed a few examples to see if these large language models can go in, rewrite some of our Django code actually to see if we could find some problems. And the results were Okay, you know, at this point, I believe the finding was that you really had to prompt it multiple times and feed it, you know, feed the code in multiple times and provide it with hints and be very specific to get any solution. But as they continue to, you know, improve these large language models, this will absolutely become possible. And so this is the direction that they're working in because, of course, you're right, like being able to just fix the sequel, there's actually quite a bit of work. And I've been seeing a number of companies providing this at present, but I don't believe that they can rewrite the application code yet. That is certainly an interesting research direction.

Anthony Deighton - 00:22:41:

First, we get rid of the database administrators, and eventually we'll get rid of all of the developers all the way up the stack. And it will just be able to hang out at the beach and our databases will work and our applications will hop.

Dana Van Aken - 00:22:53:

Yes, absolutely.

Anthony Deighton - 00:22:54:

Hey, Dana, it's been a thrill to have you here and appreciate you sharing the OtterTune story. I'm really fascinated to see this academic research out of Carnegie Mellon really, you know, flourishing and helping to contribute to better database performance for everyone.

Dana Van Aken - 00:23:11:

Anthony, thank you again so much for having me. It's great being here today and talking about  OtterTune, of course.

Outro - 00:23:17:

Data Masters is brought to you by Tamer, the leader in data products. Visit tamer.com, to learn how Tamer helps data teams quickly improve the quality and accuracy of their customer and company data. Be sure to click subscribe so you don't miss any future episodes. On behalf of the team here at Tamer, thanks for listening.

Suscribe to the Data Masters podcast series

Apple Podcasts
Google Podcasts
Spotify
Amazon