Machine learning is everywhere. From the apps on our phones to the searches we conduct on Google, most people use machine learning in some capacity every single day, even if they don’t realize it.
But machine learning is often perceived as a magical black box. You send data in, and clean data comes out, with little to no transparency as to how it works. This approach makes it easy to resolve large amounts of data quickly and at scale, but it lacks the human feedback needed to improve the models.
On the other end of the spectrum, we find processes that are 100% human-driven. Companies hire tens, even hundreds, of people, often in low-cost areas, and ask them to resolve the data and the entities in the data. Human-driven processes work, but they are labor-intensive and don’t scale.
While both of these approaches are options when it comes to mastering data, I believe that there is a middle ground: a place where the machine takes the lead and humans provide guidance and feedback to make the machine – and the results – better. This is supervised machine learning, in the sense that humans supervise and correct the machine, and it’s the data mastering approach that delivers the best outcomes.
Supervised machine learning combines the best of the machine with the best a human has to offer. Machines are very good at resolving data and data entities at scale and with speed. And they don’t get tired. This is a benefit, especially as data volumes continue to grow at a rapid pace.
Humans, on the other hand, are very good at providing feedback and ensuring that the machine’s results are accurate. And the more feedback they provide, the better the machine becomes. Another benefit of human involvement is trust. When humans participate in the process and have a hand in training the machine, they are more likely to trust the data. And when they trust the data, they are much more likely to use it in analytics and to drive decisions.
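To make that feedback loop concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular product works: the `Matcher` class, its threshold-fitting rule, and the sample company names are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for a real matching model.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; a stand-in for a real model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class Matcher:
    """Toy 'machine' (hypothetical): two records match when similarity >= threshold."""

    def __init__(self, threshold: float = 0.9):
        self.initial_threshold = threshold
        self.threshold = threshold
        self.labels = []  # (similarity score, human verdict) pairs

    def predict(self, a: str, b: str) -> bool:
        return similarity(a, b) >= self.threshold

    def add_feedback(self, a: str, b: str, is_match: bool) -> None:
        """Record a human verdict, then re-fit the threshold to all verdicts so far."""
        self.labels.append((similarity(a, b), is_match))
        # Deliberately crude: only observed scores and the starting threshold are tried.
        candidates = sorted({s for s, _ in self.labels} | {self.initial_threshold})
        self.threshold = max(candidates, key=self._agreement)

    def _agreement(self, threshold: float) -> int:
        """Count the human verdicts that this threshold would reproduce."""
        return sum((s >= threshold) == verdict for s, verdict in self.labels)


matcher = Matcher()
print(matcher.predict("Acme Corp", "Acme Corporation"))  # False: score 0.72 is below 0.9
matcher.add_feedback("Acme Corp", "Acme Corporation", is_match=True)
print(matcher.predict("Acme Corp", "Acme Corporation"))  # True: threshold moved to 0.72
```

One correction from a reviewer changes how the machine treats every similar pair from then on, which is the sense in which more feedback makes the machine better.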
Supervised Machine Learning: An Analogy
Let’s take a look at an analogy to illustrate my point: self-driving cars. Today, companies like Tesla are touting the benefits of their self-driving cars. And they believe that the black box model delivers the better outcome. This is probably the wrong ambition.
See, self-driving cars work really well…until they don’t. When they encounter a situation that they’ve never seen before, they don’t know what to do. And they don’t know how to anticipate the outcome. This happened with a Tesla. The car was driving itself, and up ahead was a stopped fire truck. The Tesla didn’t stop and ended up crashing into the fire truck. Why didn’t it stop? Because the situation was unknown to the machine and the algorithm didn’t anticipate a crash as the outcome.
One could also argue that fully human-driven cars are not much better. Accidents happen all the time when humans take the wheel.
But human-supervised driving is the best of both worlds. It combines the power of self-driving cars with human oversight to guide the machine when a new or unanticipated situation arises. In the case of our fire truck accident, if a human had been guiding the machine, that person could have applied the brake and stopped the car before it crashed. Then, moving forward, the machine would recognize that situation and know to apply the brake before hitting the fire truck.
Human-Guided, Machine Learning Data Mastering
Just as human-supervised driving delivers the best outcome, so, too, does human-guided, machine learning data mastering. Organizations benefit from the power of the machine to clean and curate data from a myriad of sources across multiple data silos while also reaping the value of human feedback to ensure that the machine stays on track and delivers the best result.
Here’s an equation I like to use to illustrate how the best modern data mastering solutions work:
Modern Data Mastering = 80% machine + 10% humans + 10% rules. And that is Tamr.
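As a rough illustration of how those three ingredients can divide the work, here is a minimal Python sketch. It is not Tamr's implementation: the `apply_rules` and `master` functions, the `tax_id` field, and the 0.6 and 0.9 thresholds are hypothetical choices made for the example, and the share of records handled by each path depends entirely on the data rather than being fixed at 80/10/10.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]; a stand-in for a real model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def apply_rules(a: dict, b: dict):
    """Deterministic rules (hypothetical): return a verdict, or None if no rule fires."""
    if a.get("tax_id") and a.get("tax_id") == b.get("tax_id"):
        return True  # same tax ID means same entity, whatever the names say
    return None


def master(pairs, low: float = 0.6, high: float = 0.9) -> dict:
    """Send each candidate pair to rules, the machine, or a human review queue."""
    decisions = {"rules": [], "machine": [], "human_review": []}
    for a, b in pairs:
        verdict = apply_rules(a, b)
        if verdict is not None:
            decisions["rules"].append((a["name"], b["name"], verdict))
            continue
        score = similarity(a["name"], b["name"])
        if score >= high or score < low:
            # Confident either way: the machine decides without help.
            decisions["machine"].append((a["name"], b["name"], score >= high))
        else:
            # Uncertain middle band: ask a person.
            decisions["human_review"].append((a["name"], b["name"]))
    return decisions


pairs = [
    ({"name": "Acme Corp", "tax_id": "12-345"}, {"name": "ACME Inc.", "tax_id": "12-345"}),
    ({"name": "Acme Corp"}, {"name": "Zebra Ltd"}),
    ({"name": "Acme Corp"}, {"name": "Acme Corporation"}),
]
for path, items in master(pairs).items():
    print(path, items)
```

Rules settle the cases that are known with certainty, the machine handles the confident bulk, and people see only the pairs in the uncertain middle band.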
Through its cloud-native, machine learning-driven approach to data mastering, Tamr provides the clean, curated data your organization needs to power analytic insight and accelerate business outcomes. With modern data mastering, you benefit from the power that machine learning provides and the valuable feedback only humans can contribute. It solves many of the challenges traditional MDM solutions cannot overcome, allowing businesses to accelerate critical analytical insights by reconciling internal and external data at scale. The result is better, more accurate data and higher levels of trust.
So remember, machine learning is great. But humans (still) need to apply.