Key Takeaways: Michael Stonebraker’s Big Data, Disruption, and the 800 Pound Gorilla in the Corner

As businesses across all sectors navigate this time of rapid change and uncertainty, we wanted to relay the advice shared by Michael Stonebraker, adjunct professor at MIT CSAIL and database pioneer, during his recent webinar on Big Data, Disruption, and the 800 Pound Gorilla in the Corner.

In his presentation, Michael covered the 3V's of Big Data: Volume, Velocity, and Variety, and made the case for why data variety remains the big unsolved challenge for businesses among the three. If you're curious about the unique challenges that variety presents and want to hear Michael discuss the solutions that are available, listen to the webinar on-demand.

But given the current environment, we also gave attendees the chance to ask Michael some of their pressing questions about the state of data in the enterprise. Here's a glimpse into the valuable insights captured during the Q&A session with Michael, moderated by Mingo Sanchez, Sales Engineer at Tamr.

Q: We've seen many organizations forced to enable their entire workforce to work from home because of the coronavirus. This is having a huge impact on businesses, especially those that follow more traditional work patterns. For those organizations, what advice would you have as they face unplanned downtime or closures because of COVID-19?

A: We're all in the same situation. I think the basic answer to that question is to stay safe, figure out how to practice social distancing, and stay healthy. If you can't stay healthy, nothing else matters. Now is a good time to do things that will give you leverage over your business problems. So get going on the big problems I discussed in this presentation, especially the data variety problem.

Q: Do you have a favorite story about an organization using data mastering to solve one of these problems?

A: I talked about Toyota Motor Europe, which is mastering customers, and GE, which is mastering suppliers. So let me talk about a company that is mastering parts. Carnival Cruise Line turns out to be nine different brands, and as you would surmise, each has its own parts depot system and supply chain. That requires them to integrate nine different parts databases, each written independently, and they are in the process of doing exactly that with Tamr's help.

Q: Where do you see the future of DataOps in the next 10 years? What do companies need to do in order to adapt and be successful?

A: I'll give you an answer in the Tamr context. If you look at the problems faced by large organizations like Toyota and GE, the solution is a pipeline of data operations. From ingesting the data to performing transformations, you can call this a pipeline of DataOps. They run this pipeline initially to put all of their entities together. Then, as things change, they run the pipeline in incremental mode to keep their output correct. They also have to send this output downstream, typically to an analytics platform, so it's a DataOps platform and pipeline that will be run for a long time, for years. It will change over that lifetime as you add and subtract new kinds of cleaning, transformations, and data sources. DataOps basically says to structure your problem as a sequence of these data operations, and then maintenance becomes a lot easier. It's a very good way to think about any data-oriented problem: think of it as a pipeline of DataOps.
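The pipeline Michael describes can be sketched as an ordered sequence of data operations. The sketch below is purely illustrative: the function names, record shapes, and matching logic are assumptions for the example, not Tamr's actual product or API.

```python
# A minimal sketch of a DataOps pipeline: ingest -> clean -> dedupe -> output.
# All names and record shapes here are illustrative assumptions.

def ingest(sources):
    """Combine records from several independently written source systems."""
    records = []
    for source in sources:
        records.extend(source)
    return records

def clean(records):
    """Normalize fields so records from different sources are comparable."""
    return [{**r, "name": r["name"].strip().lower()} for r in records]

def dedupe(records):
    """Collapse records that refer to the same entity (naive key match)."""
    seen = {}
    for r in records:
        seen.setdefault(r["name"], r)  # keep the first record per entity key
    return list(seen.values())

def run_pipeline(sources, steps=(clean, dedupe)):
    """Run the ordered sequence of operations over all sources.

    Rerunning this as new or changed sources arrive is the idea behind
    'incremental mode': the same pipeline keeps the output correct over time.
    """
    data = ingest(sources)
    for step in steps:
        data = step(data)
    return data

# Two hypothetical parts databases, each written independently,
# with overlapping entities recorded slightly differently.
db_a = [{"name": " Gasket "}, {"name": "Bolt"}]
db_b = [{"name": "gasket"}, {"name": "Valve"}]

mastered = run_pipeline([db_a, db_b])
```

Because each stage is a plain function in a sequence, adding or removing a cleaning or transformation step over the pipeline's lifetime means editing the `steps` list rather than rewriting the whole flow, which is the maintenance benefit Michael points to.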

Watch the full webinar recording above to learn how to help your organization solve data challenges at scale.