Written by Michael Collins
The promise of the Big Data era revolves around the ability to quickly access complete, trusted information for business analysis, ultimately leading to more informed decisions and substantial growth. There is a lot of market buzz and innovation around the consumption points of this information: analytic tools, for example, allow business users to “slice and dice” insights in a virtually limitless number of ways. On the other side of the coin, new technologies are emerging for storing and processing data, such as Hadoop ecosystems. However, the “middle ground” of data management holds the real key to unlocking insights, because it gives organizations quick access to complete information that can be trusted. Without it, companies will simply be using novel approaches to make decisions based on incomplete and inaccurate information.
Unfortunately, although progress has been made in data management, few technologies can operate with the speed, scale, and flexibility that customers demand. This is particularly true of Master Data Management (MDM), which seeks to create a single, trusted view of the entities in an organization (suppliers, customers, products, etc.). These single views of mastered entities are critical to fueling both analytic and operational initiatives, but as data environments grow larger, they become more difficult to generate.
As discussed in “Scalable Data Curation and Data Mastering”, a technical whitepaper written by Dr. Michael Stonebraker, generating mastered views of entities requires a new approach in the Big Data era. Traditional, rules-based approaches to matching and merging entities become too slow and costly to develop and maintain as data grows, ultimately causing frustration and, potentially, the abandonment of projects. Gartner notes this trend in its MDM research, stating: “MDM programs incur significant risk and expense in delivering on these visions. Gartner’s numerous client interactions strongly suggest that many enterprises no longer have the appetite for initiatives that take multiple budget cycles to deliver on their value propositions, as attractive as those strategic benefits may be.” Moreover, many technologies are built for a particular domain, so organizations must purchase multiple products to create analytics that span entities (“which products is each of my customers buying?”). A new, more agile approach is needed.
Tamr’s agile data mastering solution addresses these issues by applying machine learning to the tasks of matching and merging entities within datasets to create single “golden records”. This drastically improves speed and scale, while human expertise is included in the process where needed to ensure the highest levels of accuracy. Moreover, the solution gives organizations tremendous flexibility: the same process can be used to add datasets quickly and easily, classify mastered entities downstream, and span domains of usage (i.e., “multi-domain mastering”).
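To make the match, merge, and review workflow described above more concrete, here is a minimal Python sketch. It is not Tamr’s implementation: a simple string-similarity score stands in for a trained machine learning model, and the sample records, the function names (`match_score`, `triage`, `golden_records`), the two thresholds, and the “keep the longest value” merge rule are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical supplier records from different source systems (illustrative data only).
RECORDS = [
    {"id": 1, "name": "Acme Corporation", "city": "Boston"},
    {"id": 2, "name": "ACME Corp.", "city": "Boston"},
    {"id": 3, "name": "Globex Inc", "city": "Chicago"},
    {"id": 4, "name": "Acme Co", "city": "Austin"},
]


def match_score(a, b):
    """Stand-in for a learned match probability: mean string similarity of name and city."""
    def ratio(x, y):
        return SequenceMatcher(None, x.lower(), y.lower()).ratio()
    return (ratio(a["name"], b["name"]) + ratio(a["city"], b["city"])) / 2


def triage(records, auto_match=0.8, needs_review=0.5):
    """Split candidate pairs into confident matches and pairs routed to a human expert."""
    auto, review = [], []
    for a, b in combinations(records, 2):
        score = match_score(a, b)
        if score >= auto_match:
            auto.append((a["id"], b["id"]))
        elif score >= needs_review:
            review.append((a["id"], b["id"]))
    return auto, review


def golden_records(records, matched_pairs):
    """Cluster matched records (union-find) and merge each cluster into one golden record."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in matched_pairs:
        parent[find(b)] = find(a)

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r)

    # Assumed survivorship rule: keep the longest value seen for each attribute.
    return [
        {
            "source_ids": [r["id"] for r in members],
            "name": max((r["name"] for r in members), key=len),
            "city": max((r["city"] for r in members), key=len),
        }
        for members in clusters.values()
    ]


auto_pairs, review_pairs = triage(RECORDS)
masters = golden_records(RECORDS, auto_pairs)
```

In this sketch, pairs that land in `review_pairs` would be shown to a subject-matter expert, and any pairs the expert approves can be appended to `auto_pairs` before calling `golden_records` again. In a learning-based system those answers would also be fed back as training data, which this sketch does not attempt to model.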
This agile mastering approach to generating golden records has proven successful in many large enterprise deployments, and its speed, scale, and agility generate tremendous business value for customers. Using this approach, customers can embark on enterprise-wide initiatives and quickly and confidently answer questions that give them full visibility into their business operations, such as “which products is each of my customers buying?” One of the best examples is Tamr’s work with GE: agile mastering has enabled GE to generate hundreds of millions of dollars in cost savings and, because of this success, the team continues to expand the application of Tamr to other entities such as customers.
While certainly a disruptive approach to MDM, agile mastering has a role to play in a larger ecosystem of technologies, including those that provide comprehensive data governance features, and enterprises often see value in deploying many of these in a complementary fashion. What is clear, however, is that to capitalize on the promise of Big Data, organizations need to rethink their approaches to data management, and to MDM in particular. New approaches and technologies, like agile mastering, are needed to adapt to new environments and challenges. They are the disruptive, foundational building blocks for ensuring quick access to complete, trusted information, which will always be the fuel for making better business decisions and driving stronger growth.