"Today's data challenges need new solutions. MDM needs Machine Learning"
— Dr. Michael Stonebraker
Who is Dr. Michael Stonebraker?
Dr. Michael Stonebraker is an adjunct professor at MIT CSAIL and a database pioneer who specializes in database management systems and data integration.
Through a series of academic prototypes and commercial startups, Stonebraker has produced research and products that are central to many relational database systems. He has founded nine database startups (so far) over the last 40 years, an extraordinary achievement for a computer scientist. They include Ingres Corporation, Illustra (later acquired by Informix), Paradigm4, StreamBase Systems, Vertica, VoltDB, and Tamr, where he currently serves as CTO.
Michael Stonebraker was awarded the 2014 A.M. Turing Award (known as the “Nobel Prize of computing”) by the Association for Computing Machinery for his “fundamental contributions to the concepts and practices underlying modern database systems as well as their practical application through the start-ups he has founded.”
Fierce Pragmatism & Guiding Principles
Michael’s guiding principles are evident in everything he does. His colleague and co-founder Andy Palmer described his ‘fierce’ pragmatism through this anecdote from the early days of Vertica: ‘We agreed that our system had to be at least 10× faster and 50-plus% cheaper than the alternatives. If at any point in the project we hadn’t been able to deliver on that, we would have shut it down. As it turned out, the Vertica system was 100× faster, a credit to the brilliant engineering team at Vertica.’
Michael stands behind the premise that mastering data at large scale requires machine learning. He feels strongly that relying on a ‘rules-based’ approach to data management will waste some of your organization’s most valuable resources. Stonebraker emphasizes that data scientists spend 80% of their time doing data integration work, leaving little time for the analytics needed to fuel successful business outcomes.
“You can’t deduplicate six million records manually… hell will freeze over before you finish,” according to Turing Award Winner Dr. Michael Stonebraker.
Stonebraker invests tremendous time, energy, and effort in developing young people and giving them life-changing opportunities.
The 10 Big Data Analytics Blunders
Michael Stonebraker is no stranger to the speaking circuit. He has been a keynote speaker at many of the most prestigious data analytics conferences internationally. He presented a session titled How to Avoid the 10 Big Data Analytics Blunders at the O’Reilly Strata Data Conference and Big Data London in 2019, and delivered it remotely at the IQPC Chief Data and Analytics Officer Exchange and MIT’s CDOIQ in 2020.
Here are three of the top 10 blunders Michael has witnessed and helped customers like Toyota, GE, and Carnival overcome.
Not moving to the cloud
If your organization isn’t planning to become cloud-exclusive, you could be backing losing technology. The cloud is more elastic than your in-house solution and more cost-effective in the long run.
The cloud will save your organization a raft of money, allow your business to take advantage of new technologies with elastic compute, and open your organization to new geographies.
Not planning for AI/ML to be disruptive
Make no mistake: AI will displace some of your workers and has the potential to upend how you handle your operations. The choice is stark: you can be a disruptor, or you can be disrupted.
If you want to lead, you must be willing to pay for talent and act quickly, because the best talent is being snapped up fast. HR won’t like what you need to pay for machine learning (ML) experts, but spending money on experts now nets you a much greater return in the long run. And don’t make the mistake of contracting this essential skill out.
Not solving your real data science problem: dirty data
You’ve hired data scientists, so you think you’ve got big data analytics covered. However, it’s crucial to look at how they are spending their time. Unfortunately, most of it goes to cleaning data and integrating it with other sources rather than to analysis.
Without clean data, your data science is worthless. So, have a clear strategy for dealing with data cleaning and integration, and have a chief data officer on staff.