Written by Andy Palmer
A recent Forbes Insight/Teradata survey of 316 large global company executives found that 47% “do not think that their companies’ big data and analytics capabilities are above par or best of breed.” Given that “90% of organizations report medium to high levels of investment in big data analytics,” the executives’ self-criticism raises the question: Why, with so many urgent questions to answer with analytics every day, are so many companies still falling short of becoming truly data-driven?
Here’s a look at the quandary, and some thoughts about what’s needed to liberate businesses from its effects.
Analytics projects start from the wrong place.
Many analytics projects start with a look at some primary data sources and an inference about what kinds of insights they can provide. In other words, they take the available sources as a constraint, then go from there. That is understandable, but running a project like this skips a crucial step.
Analytics projects must start with the business questions you’re trying to answer, and then move into the data. Leading with your data necessarily limits the number and type of problems you can solve to the data you perceive to be available. Stepping back and leading with your questions, however, liberates you from such constraints, allowing your imagination to run wild about what you could learn about customers, vendors, employees and so on.
Analytics projects end too soon.
Through software, services or a combination of both, most analytics projects can indeed get to the answers for the questions they’re asking at any given time. But I’d argue that a successful analytics project shouldn’t stop with the delivery of its answers. For all the software and services money they’re spending, businesses should expect every analytics project to arm them with the knowledge and infrastructure to ask, analyze and answer future questions with more efficiency and independence.
Analytics projects take too long … and still fall short.
Despite improved methods and technologies, many analytics projects still get gummed up in complex data preparation, cleaning and integration efforts. Conventional industry wisdom holds that 80% of analytics time is spent on preparing the data, and only 20% on actually analyzing it. In the Big Data Era, that wisdom’s hold feels tighter than ever. Massive reserves of enterprise data are scattered across variable formats and hundreds of disparate silos. Integrating information for analysis through manual methods can significantly delay attempts to answer mission-critical questions.
Or worse, it can significantly diminish the quality and accuracy of the answers, with incomplete data risking incorrect insights and decisions. Faced with a long, arduous integration process, analysts may be compelled to take what they can (i.e., the cleanest data from the closest sources), leaving the rest for another day and leaving the questions without the benefit of the full variety of relevant data.
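To make the 80/20 rule of thumb concrete, here is a back-of-the-envelope sketch of how the split drives total project length. The function name and the two-week figure are illustrative assumptions, not numbers from the survey or from any real project.

```python
def total_weeks(analysis_weeks: float, prep_percent: int) -> float:
    """Total project length when data prep consumes prep_percent of all time."""
    if not 0 <= prep_percent < 100:
        raise ValueError("prep_percent must be between 0 and 99")
    # Analysis fills the remaining (100 - prep_percent) percent of the project.
    return analysis_weeks * 100 / (100 - prep_percent)

# Hypothetical project: two weeks of actual analysis work.
conventional = total_weeks(2, 80)  # 10.0 weeks, 8 of them spent on prep
streamlined = total_weeks(2, 20)   # 2.5 weeks, 0.5 of them spent on prep
print(conventional, streamlined)
```

Under these assumptions, the same two weeks of analysis arrive in a quarter of the calendar time once preparation stops dominating the schedule.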
So what can companies like Tamr do for businesses that are awash in data and the tools to analyze it, but continually frustrated by incomplete, late or useless answers to critical business questions?
We can create human-machine analytics solutions designed specifically to get businesses more and better answers, faster and continuously. In other words:
- Speed/Quantity — get more answers faster, by spending less time preparing data and more time analyzing it
- Quality — get better answers to questions, by finding and using more relevant data in analysis – not just what’s most obvious/familiar
- Repeatability — answer questions continuously by leaving customers with a reusable analytic infrastructure
Fortunately, a range of analytics solutions are emerging to give businesses some real options.
Data Preparation platforms from the likes of Informatica, OpenRefine and Tamr have evolved tremendously over the last few years, becoming faster, nimbler and lighter-weight than traditional ETL and Master Data Management solutions. These automated platforms help businesses embrace, not avoid, data variety by quickly pulling data from many more sources than was historically possible. As a result, businesses get faster and better answers to their questions, since so much valuable information resides in “long tail” data. To ensure both speed and quality of preparation and analysis, Tamr’s analytics solution pairs our automated Data Unification platform for discovering, organizing and unifying long tail data with the advice of business domain and data science experts.
Cataloging software like Enigma, Socrata and Tamr can identify much more of the data relevant for analysis. The success of my recommended Question First approach, of course, depends on whether you can actually find the data you need for answers. That’s a formidable challenge for enterprises in the Big Data Era, as IDC estimates that 90 percent of big data is “dark data”: data that has been processed and stored but is hard to find and rarely used for analytics. At Tamr, we’ve decided to offer our Catalog product — which quickly locates and inventories all data that exists in the enterprise regardless of type, platform, or source — as both a free downloadable application and a pivotal part of our overall analytics solution.
At Tamr, we’ve decided to take one additional step, designing a full-scale machine-driven, human-guided analytics solution that enables you to answer your questions quickly, accurately and repeatedly by conversing with your data:
- Ask the questions to be answered, and identify the specific data and analytics needed to answer them
- Find all relevant data available to answer the question
- Organize this data for analysis with unprecedented speed and accuracy (spending 20%, rather than 80%, of the time on data prep)
- Analyze the organized data through a combination of Tamr solutions experts and software
- Answer questions continuously through infrastructures that are reusable even as the data changes
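The steps above can be sketched as a toy, in-memory pipeline. This is only an illustration of the Ask/Find/Organize/Analyze/Answer flow under simplifying assumptions: it is not Tamr’s software or API, and every name in it (`ALIASES`, `find`, `organize`, `analyze`, `answer`, the sample sources) is hypothetical.

```python
from collections import defaultdict

# Hypothetical alias table: maps source-specific column names to unified names.
ALIASES = {"vendor": "vendor", "supplier_name": "vendor",
           "amount": "spend", "total_usd": "spend"}

def find(catalog, needed):
    """Find: keep every source whose columns cover the needed unified fields."""
    hits = {}
    for name, rows in catalog.items():
        cols = {ALIASES.get(col) for row in rows for col in row}
        if needed <= cols:
            hits[name] = rows
    return hits

def organize(sources):
    """Organize: rename columns to one schema and normalize vendor names."""
    unified = []
    for rows in sources.values():
        for row in rows:
            rec = {ALIASES[col]: val for col, val in row.items() if col in ALIASES}
            rec["vendor"] = rec["vendor"].strip().lower()
            unified.append(rec)
    return unified

def analyze(records):
    """Analyze: total spend per vendor across all unified records."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["vendor"]] += rec["spend"]
    return dict(totals)

def answer(catalog, needed=frozenset({"vendor", "spend"})):
    """Answer: rerunnable end to end, so it stays current as the data changes."""
    return analyze(organize(find(catalog, needed)))

# Ask: "How much do we spend with each vendor?" needs vendor and spend fields.
catalog = {
    "erp": [{"vendor": "Acme ", "amount": 100.0}],
    "procurement": [{"supplier_name": "ACME", "total_usd": 50.0},
                    {"supplier_name": "Globex", "total_usd": 30.0}],
    "hr": [{"employee": "Pat", "salary": 1.0}],  # irrelevant to this question
}
print(answer(catalog))  # {'acme': 150.0, 'globex': 30.0}

# The data changes; the same infrastructure answers the question again.
catalog["erp"].append({"vendor": "Globex", "amount": 20.0})
print(answer(catalog))  # {'acme': 150.0, 'globex': 50.0}
```

The point of the sketch is the shape rather than the details: the question determines which sources are pulled in, and the answer is a function that can be rerun instead of a one-off report.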
Our business domain and data science experts help businesses ask the right questions, then shepherd the full analytics process, leveraging Tamr’s catalog/connect/consume software along the way. We then leave our customers with the data engineering infrastructure to update their answers continuously as the data changes.
As the Forbes/Teradata survey implies, businesses and analytics providers collectively have a substantial gap to close between “analytics-invested” and “data-driven.” If we follow the simple design vision of getting businesses more and better answers faster, and leaving them the infrastructure to keep answering questions continuously, we’ll close that gap much faster than we’d think.