To be competitive, businesses today must ask the right questions and make decisions much faster, with the ultimate customer experience always top of mind. (Think Amazon.com.)
However, in enterprises, this agility has been painfully impeded by the primitive process of data scientists toiling away in obscurity, spending (too) much of their time cleaning and curating data for use but removed from the people who actually are responsible for both the data and the decisions. If the cleaning/curating process could be more fully and intelligently automated and the data scientist become an integral part of the team, decision-makers could think and act much faster and with more confidence. Framing questions and making decisions could now be focused on the customer experience in a way that was not possible before for traditional companies.
Defining Analytic Velocity
Analytic velocity is the concept of providing ever-faster time-to-answers to critical questions that help you achieve business goals, like reducing procurement costs or lowering compliance risk. More importantly, it involves being able to get more, ever-more-nuanced insights–all ultimately aimed at serving customers better or saving more money (why you’re in business). In the case of procurement, it’s not just “What are we buying?” but “Who should we be buying from?” “Where is the market heading?” and “How should we be sourcing?”
Analytic velocity starts with clean, integrated and classified (easily usable/digestible) data from diverse enterprise data silos. It picks up speed with DataOps principles and pipelines. It achieves full power when you can easily layer new questions on top of this foundation and start answering them in weeks, instead of months or years. The payoff? The ability to be more data-driven in your analytics initiatives, leading to better and sharper insights that can literally transform your business.
Getting off on the right foot is crucial, which means thinking differently. With data scientists spending as much as 90% of their time cleaning data before they can use it, something’s clearly got to give. Clean, integrated and classified data can’t be achieved with traditional data integration methods like MDM and ETL–we’ve all been trying and failing for too long. These methods are neither scalable nor easily repeatable when dealing with the huge variety of data in enterprises. They depend on top-down, rules-based methods of integrating data, and involve far too much human labor (time). One-off hacks and heroics by your data scientists–while common and valuable–are also not scalable or repeatable. Nor is slapping some ML package on the backend of your software, having it do something, and making a few recommendations. Analytic velocity is a process, not an event.
Enterprises like Societe Generale, Toyota Motor Europe and GSK have successfully bet big-time on machine learning and AI to expedite clean, integrated and classified data. As a result, they’ve reduced procurement costs, achieved a 360-view of their customers, and accelerated drug discovery and development, respectively.
At Tamr, we’ve used human-guided machine learning (ML) to clean, label and connect data for enterprise customers, avoiding problems downstream in delivering readily digestible data for analytics at scale. It’s an approach tailor-made for analytic velocity. ML models do the heavy lifting of data integration, taking a probabilistic approach (“scientific guessing”) that invokes just the right amount of human expertise and rules if and when needed. Because Tamr’s models are constantly and actively learning with use, they get smarter and smarter over time while data gets cleaner and cleaner. Less and less human effort and fewer rules are required over time. Adding new data sources becomes easier and cheaper as does adding new, nuanced questions.
Without using ML to clean, label and connect data, as my partner Mike Stonebraker so eloquently told SiliconAngle, “You’re toast….You’re going to have to write a huge number of rules that no one can possibly understand” to enable people to access and use data.
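To make the "human-guided" idea concrete, here is a minimal sketch in Python. It is not Tamr's implementation: the string-similarity score is only a stand-in for a trained model's match probability, and the thresholds, function names and supplier names are all hypothetical. What it shows is the shape of the approach: confident pairs are resolved automatically, uncertain pairs go to a person, and the person's answers are reused so the review queue shrinks over time.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical thresholds, chosen only for illustration.
AUTO_MATCH = 0.90   # at or above this score, accept the pair without review
AUTO_REJECT = 0.50  # at or below this score, reject the pair without review


def similarity(a: str, b: str) -> float:
    """Stand-in for a model's match probability: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def triage(names, labels=None):
    """Sort every pair of records into matches, non-matches, or a review queue.

    `labels` maps frozenset({a, b}) -> True/False and holds answers that a
    human expert has already given; labeled pairs are never asked about again.
    """
    labels = labels or {}
    matches, rejects, review = [], [], []
    for a, b in combinations(names, 2):
        key = frozenset((a, b))
        if key in labels:
            # A human has already decided this pair: reuse the answer.
            (matches if labels[key] else rejects).append((a, b))
            continue
        score = similarity(a, b)
        if score >= AUTO_MATCH:
            matches.append((a, b))
        elif score <= AUTO_REJECT:
            rejects.append((a, b))
        else:
            review.append((a, b))  # uncertain: ask a person
    return matches, rejects, review


suppliers = ["Acme Corp", " acme corp ", "Acme Corporation", "XYZ"]
matches, rejects, review = triage(suppliers)

# One expert answer is fed back in, and the review queue gets shorter.
feedback = {frozenset(("Acme Corp", "Acme Corporation")): True}
matches2, rejects2, review2 = triage(suppliers, feedback)
```

In a real system the expert's answers would also be used to retrain the model, not just cached, which is where the "smarter over time" effect comes from. The sketch leaves that step out.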
The Importance of Human-in-the-Loop
Given that we’re talking “velocity” here, you may be surprised to hear me talk now about people.
Hear me out.
For enterprises that want to achieve true analytic velocity, human-in-the-loop is a pillar throughout the DataOps cycle (not “just” in continually producing clean, unified and classified data). The most-sophisticated pursuers of analytic velocity will think first about how it translates into the customer experience (like internet companies).
Data Consumers: If the people on the receiving end of your newly cleaned, unified and classified data don’t understand it, trust it, or know what to do with it, they won’t use it. In some enterprises, they’ve had no access to data, access to poor data or both–giving them little natural incentive to trust or use the “new” data. For such enterprises, it’s advisable to start with data that your data consumers can immediately use, act on and get value from, rather than a more-sophisticated insight that ultimately might not translate into enough clear business value or be adopted. Then, gradually layer on more sophisticated questions as data consumers become more adept. The proof is in the practice.
In addition, provide an easy way for data consumers to comment on the data or report errors, e.g., via an automated bi-directional link to data stewards from their analytics tools or dashboards. This will enhance adoption and deepen trust in the data. At Tamr, we’ve launched “Steward,” which enables your data consumers to provide feedback on data no matter where in the enterprise the data is being consumed and regardless of what toolset they are using for visualization.
A customer in the oil and gas industry used Tamr to create a master data source for essential data about wells, much of it from external sources. Mastered wells data is vital to the company in reducing the risk and increasing precision in selecting where to drill next: mega decisions. The company launched an internal campaign to promote the new data source and encourage analysts to explore and query it.
Another example: With data cleaning, integration and classification automated for what they’re buying, procurement category managers at a global manufacturing company hope soon to be able to understand why they’re buying, from more nuanced insights (e.g., intelligence about market trends) and eventually recommendations (“What should we be sourcing?”). This is behavior- and business-changing analytics.
Data Scientists: Data scientists have pretty much been in charge of analytic velocity in the past, mostly by default. They’ve usually gotten stuck with data cleaning, with an estimated 80-90% of their expensive time going into data integration and cleaning before they could get to model creation. Rather than buy yet another piece of expensive software, they’ve become accustomed to doing data cleaning on their own to produce models that are “good enough.” It’s a hack, but it’s all they’ve had.
But it’s a problem. For example: Every time a data scientist takes data from Salesforce.com, puts it into her own database, and changes a few values to “clean things up,” she’s created another “source of truth” (more data variety!). Integrating with data scientist workflows–meeting data scientists on their turf (with scripting-oriented tools for your ML-powered data integration system that help them spin up new models quicker, for example)–can speed adoption and analytic velocity.
Data Owners: Analytic velocity involves both business and technology, so having champions from both sides will help ensure success. For example: in a procurement analytics project, an ideal combination would be someone focused on procurement excellence (big picture), a procurement team member focused on the project deployment (big incentive), and a data science partner (big ideas). Bonus: a senior-executive champion who understands analytic velocity and will push for those breakthrough, business-changing insights.
It all boils down to this:
Take a thoughtful, data-first approach to analytic velocity, one with attention to the customer experience. Start with the questions, and work back from there. You’ll speed your path toward answering those big questions that deliver true business insights and business-changing outcomes.
To learn more about how Tamr uses human-guided machine learning (ML) to clean, label and connect data for enterprise customers, schedule a demo today.
*Thanks to my colleague Matt Holzapfel, who’s on the front lines with enterprises helping to improve their analytic velocity, for contributing to this post.