The O’Reilly Strata Data Conference is an event that “helps you put big data, cutting-edge data science, and new business fundamentals to work.” Attendees at the event dissect case studies, develop new skills through tutorials, and share best practices in data science. At this year’s event, I spent time attending sessions and speaking with attendees about the current state of the data science and big data industry. I gained more clarity on where organizations are succeeding, the tools required to derive the most value, and why the vast majority of today’s data initiatives still fail.
Organizations Are Making Data Initiatives a Top Priority
One message from Strata was clear: attendees see significant value in creating an analytics-driven organization. In fact, 90% of organizations reported plans to invest in data strategy as a primary deliverable for fiscal year 2020.
Organizations also realize that they can derive the full benefits from their data only if they put the muscle of artificial intelligence (AI) and machine learning (ML) behind their initiatives. AI was a major focus throughout the conference sessions, a focus reinforced by industry projections that AI will generate $3 trillion in revenue within the next two years.
85% of Data Science Projects Still Fail
The mindset of seeing data as an asset is now pervasive, but the reality and the execution of data-driven initiatives paint a less positive picture. A disappointing 85% of data science projects today still fail due to lack of AI and ML adoption, untrustworthy data, poorly connected or integrated data, and erroneous data.
These shortcomings are due in large part to the misconception that a single AI or ML solution can clean and unify data. A recent Gartner report outlines the flaw in the single-solution approach, stating that “data and analytics leaders must invest in new data management solutions that leverage aggregated and integrated capabilities.” According to Gartner, success will require more than one data management solution to unite and clean the data silos that have plagued data initiatives to date.
Dependence on Outdated Data Mastering Approaches
Other efforts falter due to reliance on 15-year-old master data management (MDM) techniques that use outdated Extract, Transform, and Load (ETL) processes for data mastering. ETL is great at solving small problems, but the technique does not work at scale.
Similarly, traditional rules-based mastering does not scale. It is well known that rule systems work as long as the rule base is small (around 500 rules). When a mastering project requires substantially more rules than that, it tends to fail. Companies are learning that data mastering at scale requires machine learning, backed by human data stewards.
The Talent Gap
Limited talent is cited as another reason for the high failure rate of data-driven initiatives. Organizations are hiring data scientists and chief data officers (CDOs) at high salaries without defining what those roles should accomplish or what data they should leverage.
In many instances, the talent gap is not the fault of the CDO. Instead, a lack of clear direction is to blame. At one event we attended earlier this year, a speaker mentioned that when he asked in an interview what would be expected of him if he were hired as the organization’s CDO, he was met with the response, “We don’t know, you tell us.” The challenge is that many organizations know they need to get a handle on their data, but they aren’t sure where to start.
A Promising Future
Despite the stalls and setbacks, the future for organizations that adopt AI and ML to obtain clean, unified data is full of promise. One company that implemented human-guided machine learning and AI saved $80 million by unifying and fully making sense of its data for the first time. Another is accelerating development of life-saving medicines by uniting information from assays, clinical trial data, and genetic data. Yet another organization transformed customer data into a valuable asset to improve sales, retention, and customer service.
These are just a few of the transformative, meaningful, and profitable outcomes available to organizations that achieve data unification at scale. Strata showed that the industry is heading in the right direction. The next step is to adopt new strategies and technologies that yield genuine results and build a path toward becoming a truly analytics-driven organization. To learn more about the role Tamr can play in improving access to real-time data across your organization, schedule a demo today.