Andy Zimmerman
Chief Marketing Officer
Updated October 29, 2025

The Critical Importance of Data Quality for Generative AI Applications: Practical Strategies for Success

I recently attended a panel session at Corinium CDAO Fall 2025 in Boston, moderated by Tamr’s CEO, Anthony Deighton. The session, “The Critical Importance of Data Quality for Generative AI Applications: Practical Strategies for Success,” featured an impressive line-up of data industry experts including:

  • Elena Alikhachkina, Chief Data and AI Officer, TE Connectivity
  • Tyler Frieling, Director—PMG—DS&S Alternative Data Research, BlackRock
  • Victoria Gamerman, Global Head of Data Governance and Insights, Boehringer Ingelheim
  • Tarun Sood, CDO, American Century Investments
  • Steve Boras, Head of Model Risk Management & Validation, Citizens Financial Group

The conversation covered both the challenges and opportunities facing companies as they explore the use of large language models (LLMs) and AI agents in data management. Here are some of the key insights and perspectives that stood out.

LLMs Are Like Interns

LLMs are still in the relatively early stages of development, with much of their potential yet to be realized. Multiple panelists likened them to interns who are just beginning their careers.

This makes sense. When compared with seasoned employees, interns know relatively little. They lack the depth of experience and expertise that comes from years of working at a company or within an industry. And the risk with interns is that you give them a task and they work really hard on it, only to discover that they’ve worked on the wrong thing. It’s not that they did that intentionally—they just didn’t know any better.

And the analogy can go one step further. Imagine that the LLMs are MBA interns. In addition to working hard without sufficient knowledge and context, they are extraordinarily confident and eloquent! (I say this as someone who was once an MBA student.) They state incorrect information boldly and articulately, which makes you want to believe them, even when that information is based on inaccurate data, lacks context, or is just plain wrong.

When the audience was polled for their take on the state of LLMs, many agreed with this analogy, describing LLMs as:

  • 50%: Intern level (hardworking, not that smart, no business knowledge)
  • 29%: MBA intern level (hardworking, kind of smart, general business knowledge, sounds smart)
  • 14%: Average employee level (smart, knows their business domain)
  • 7%: Your best star performer (someone you can point at any problem, and they’ll get it fixed) 

The panelists emphasized that understanding where LLMs and AI agents fit within an organization is only part of the battle; ensuring they have the right context is equally critical.

Little Bit of Garbage In. Lots of Garbage Out.

We’ve all heard the adage “garbage in, garbage out” used in the context of analytics and dashboards. But it takes on a whole new meaning when it comes to scaling poor-quality results. Because when you put a little bit of garbage into an AI model, you can potentially get a whole lot of garbage out. That means we need to think about data quality differently for AI than we have historically for analytics. 
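A toy simulation makes the amplification concrete. The numbers and the retrieval setup below are illustrative assumptions for this sketch, not figures from the panel: if an AI assistant answers each question from a handful of retrieved records, a 2% contamination rate in the underlying data taints a far larger share of the answers.

```python
import random

random.seed(0)

TOTAL_RECORDS = 10_000
BAD_RECORDS = 200          # 2% "garbage" records (assumed rate)
RECORDS_PER_ANSWER = 5     # records retrieved per question (assumed)
QUESTIONS = 1_000

bad_ids = set(random.sample(range(TOTAL_RECORDS), BAD_RECORDS))

tainted_answers = 0
for _ in range(QUESTIONS):
    retrieved = random.sample(range(TOTAL_RECORDS), RECORDS_PER_ANSWER)
    # One bad record in the retrieved context is enough to taint the answer.
    if any(r in bad_ids for r in retrieved):
        tainted_answers += 1

print(f"{BAD_RECORDS / TOTAL_RECORDS:.0%} garbage records -> "
      f"{tainted_answers / QUESTIONS:.0%} tainted answers")
```

With five records per answer, roughly one in ten responses touches a bad record: a little bit of garbage in, several times as much garbage out.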

Not only do we need to define what we mean by “data quality,” but we must also think about quality in the context of disconnects between systems and teams. And while GenAI is changing data management, many CDOs are still not thinking about data quality in the context of data that is constantly moving and changing. This needs to change; even the most pristine data can fail to deliver value if it’s disconnected and lacks context. Which brings us to the next insight.

Context Over Quality

As important as data quality is, in the realm of GenAI, context is probably even more important. Even the cleanest, most accurate data can lead to poor outcomes if it is stripped of the context that gives it meaning or is disconnected from the rest of the organization. 

Context captures the nuanced and idiosyncratic information needed to train the AI models to interpret enterprise data correctly. Without that layer of understanding, even high-quality data risks being misapplied or misunderstood by an AI agent.  
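One way to picture that context layer, using hypothetical field names and definitions, is to pair each raw field with the business meaning a human analyst would bring, before the record ever reaches the model:

```python
# A raw record can be perfectly accurate yet ambiguous without business context.
raw_record = {"cust_id": "C-1042", "rev": 1.2, "status": "A"}

# Hypothetical context layer: the definitions an analyst carries in their head.
context = {
    "cust_id": "Unique customer identifier from the CRM",
    "rev": "Trailing-12-month revenue in millions of USD",
    "status": "'A' = active contract, 'C' = churned",
}

def build_prompt(record: dict, field_context: dict) -> str:
    """Pair each field with its definition so the model can't misread it."""
    lines = ["Customer record (field: value -- meaning):"]
    for field, value in record.items():
        meaning = field_context.get(field, "no definition available")
        lines.append(f"- {field}: {value} -- {meaning}")
    return "\n".join(lines)

print(build_prompt(raw_record, context))
```

Without the definitions, nothing stops a model from reading `rev` as a revision number or `status: A` as a letter grade; with them, the same data carries its meaning along.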

Scrubbing Plates vs. Creating Masterpieces

As LLMs and AI agents continue to gain ground, there is a temptation to apply them to every data challenge that comes our way. But the panelists cautioned against this approach, stating that we should use AI to do the “dirty work” so we, as humans, can do the “sexy work.” 

As one of the panelists pointed out, author and video game enthusiast Joanna Maciejewska may have said it best: “I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.”

This quote underscores the importance of deploying AI agents to handle the mundane, boring, thankless work of cleaning up messy data, while reserving the interesting, strategic work for humans—not the other way around. And because right now LLMs are behaving much like hardworking, yet ill-informed interns, that seems like good advice. 

Data Quality (Still) Matters

The most important takeaway from the panel discussion was that the path to stronger, more reliable AI agents starts with better data. When you use clean, consistent, context-rich information to train the models, LLMs have the foundation they need to minimize hallucinations and deliver valuable output you can trust.

Clearly, LLMs and AI agents represent a massive opportunity for data management. But it’s also important to remember that, at their core, these applications still need to be driven by clean, connected, knowledge-graph-level data so we can turn these “interns” into our top employees.
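To make the “connected, knowledge-graph level” idea concrete, here is a minimal sketch that links three disconnected records into one entity an AI agent can reason over. The sources and fields are hypothetical, and the name-matching rule is deliberately naive compared to real entity resolution in an MDM system:

```python
# Three records about the same company, scattered across systems.
records = [
    {"source": "crm",     "name": "Acme Corp",  "industry": "Manufacturing"},
    {"source": "billing", "name": "ACME CORP.", "annual_spend": 250_000},
    {"source": "support", "name": "acme corp",  "open_tickets": 3},
]

def canonical_key(name: str) -> str:
    """Naive entity-resolution key: lowercase and strip punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()

# Merge records that resolve to the same entity into a single graph node.
graph: dict[str, dict] = {}
for rec in records:
    key = canonical_key(rec["name"])
    node = graph.setdefault(key, {"sources": []})
    node["sources"].append(rec["source"])
    node.update({k: v for k, v in rec.items() if k not in ("source", "name")})

print(graph)
```

The result is one entity with its industry, spend, and open tickets connected across CRM, billing, and support, rather than three rows a model would have no reason to relate.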
