Written by Matt Holzapfel
How Tamr Solves Data Unification Challenges for Media and Entertainment
The story of Netflix sounds like something out of Hollywood: one of the most valuable and fastest-growing companies of the past decade is a $16/month video subscription service. But there is nothing fictional about its meteoric rise, which has been fueled by a data-driven approach to every aspect of its business.
Netflix's rapid growth, and the higher customer expectations it has set, have forced the media and entertainment industry to rethink the role of data. Players in the industry are now wrestling with core data management challenges for the first time, and have turned to Tamr to help them transform complex data assets into a catalyst for growth.
Challenge: external data is tremendously valuable…but is full of variety
The media and entertainment industry is driven by talent and content. Companies invest heavily to understand which talent appeals to which audiences and what content they should buy or produce. This is a highly data-driven process, but most of the data required to gain this insight lives in external sources, such as social media or third parties (e.g., Nielsen, cable providers).
Across all of these sources, the same entity is described in widely varying ways. “Daniel Day-Lewis” on IMDb might be trending as “#DanielDayLewis” on Twitter. The Entertainment Identifier Registry returns over 350 results when searching for “Batman”. Without a way to disambiguate, or master, this data, it’s impossible to get a reliable signal about what audiences care about. The problem is exacerbated by the fast pace of the industry: even a one-month-old view of who and what is popular can lead to poor decisions.
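To make this variance concrete, here is a minimal Python sketch of the most naive fix: collapsing surface variants of a name into a normalized comparison key. The function `name_key` is hypothetical and purely illustrative. It is not how Tamr Unify matches entities.

```python
import re
import unicodedata


def name_key(raw: str) -> str:
    """Collapse surface variants of a name into a crude comparison key.

    Strips accents, then lowercases and removes every character that is not
    a letter or digit (so '#', '-', spaces, and punctuation all disappear).
    """
    # Decompose accented characters and drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and keep only ASCII letters and digits.
    return re.sub(r"[^a-z0-9]", "", text.lower())


variants = ["Daniel Day-Lewis", "#DanielDayLewis", "daniel day lewis"]
print({name_key(v) for v in variants})  # {'danieldaylewis'}
```

Note the limit of this approach: the same key would also lump together all of the 350-plus “Batman” results, so the name string alone cannot disambiguate them. Telling those records apart means weighing other attributes as well.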
Solution: data unification powered by machine learning
Many of the customers we work with in this industry have learned the hard way that building accurate, up-to-date 360-degree views of talent and media titles is not a problem that human effort alone can solve. No amount of hand-written SQL can reliably unify this data.
Four reasons consistently bubble to the top when we ask our customers why they choose to partner with us to establish their foundational data assets:
- New data attributes are constantly being introduced. As viewership patterns change, so does the available metadata. This drives constantly changing data models, which must incorporate new attributes to unify the data effectively. Our machine learning-based platform, Unify, treats each new attribute as just another input to the model once it becomes available. No new project needs to be kicked off, and no new set of rules needs to be developed.
- Multiple levels of granularity are required. It is not enough to simply group together every episode of “American Gladiators” across all 7 seasons. Companies need records grouped at every level: each individual episode, all of the episodes in a season, and all of the episodes in a series. This is hard to get right with hand-written rules because of the massive variation in episode titles (e.g., “Episode #1.1” vs “S01 E01”). Tamr Unify reviews all of the available data to identify patterns and relationships that are difficult for humans to discern. Further, since a new model can be trained in a matter of days, our customers aren’t forced to compromise on having their data mastered at multiple levels of granularity.
- Data volumes are growing fast. The more consumers shift to streaming and digital services, the more data is available to analyze. Our platform’s modern big data architecture allows customers to match new incoming records with sub-second response times, ensuring data is always fresh and reliable.
- Integrating external data is essential. Our open, API-oriented platform allows customers to seamlessly build data pipelines that unify internal and external data. We also partner with key industry data providers, such as EIDR, to give customers a ready-to-deploy solution to the problem.
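The multi-granularity target described in the list above can be sketched in Python as a rule-based parser that maps two title conventions to a (season, episode) pair and then groups records at the episode, season, and series level. The regex patterns, `parse_episode`, and `group_levels` are hypothetical illustrations that assume the series name is already clean. This is not Tamr Unify's machine learning approach; it shows the kind of hand-written rules that approach is meant to replace.

```python
import re
from collections import defaultdict
from typing import List, Optional, Tuple

# Hypothetical patterns for the two title conventions mentioned in the text.
PATTERNS = [
    re.compile(r"Episode\s*#\s*(\d+)\.(\d+)", re.IGNORECASE),  # "Episode #1.1"
    re.compile(r"S(\d+)\s*E(\d+)", re.IGNORECASE),             # "S01 E01"
]


def parse_episode(title: str) -> Optional[Tuple[int, int]]:
    """Return (season, episode) if the title matches a known pattern."""
    for pattern in PATTERNS:
        match = pattern.search(title)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


def group_levels(records: List[Tuple[str, str]]):
    """Group (series, title) records at episode, season, and series level."""
    by_episode = defaultdict(list)
    by_season = defaultdict(list)
    by_series = defaultdict(list)
    for series, title in records:
        parsed = parse_episode(title)
        if parsed is None:
            continue  # titles matching no rule are skipped in this sketch
        season, episode = parsed
        by_episode[(series, season, episode)].append(title)
        by_season[(series, season)].append(title)
        by_series[series].append(title)
    return by_episode, by_season, by_series


records = [
    ("American Gladiators", "Episode #1.1"),
    ("American Gladiators", "S01 E01"),
    ("American Gladiators", "S01 E02"),
    ("American Gladiators", "Episode #2.1"),
]
episodes, seasons, series = group_levels(records)
print(episodes[("American Gladiators", 1, 1)])  # ['Episode #1.1', 'S01 E01']
print(len(seasons))  # 2
```

Every new title convention means another pattern here, and titles that match none are silently dropped. That maintenance burden is why rules like these struggle as sources multiply.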