Written by Matt Holzapfel
Great analytic dashboards require great data. This places a heavy burden on analytic developers, who are rarely given perfect data that aligns with their users’ expectations. Data variety makes it challenging to easily assess if data is fit for consumption. Asking analytic developers to sift through all of their company’s data to identify inaccuracies before publishing a dashboard is unreasonable and unrealistic. Companies that recognize this, and establish analytic quality assurance processes that close the loop between dashboard users, analytic developers, and source owners, are far more likely to drive analytic adoption & business outcomes. Let’s dig into what this process looks like.
The Data Readiness Assessment
The excitement of building a slick dashboard often makes it easy to overlook basic principles about what makes data trustworthy. As the analytic developer, you should be well-versed in where your data is coming from and how it gets created. In the absence of that comprehensive understanding, you should review the data along a few key dimensions before going too far into the development lifecycle.
How far can the data being used for analytics be traced back to its source? Ideally, there is a clear path from the transformed, unified dataset back to the sources, but that’s rarely the case. When the path is unclear, it’s essential to identify the stakeholders who prepared the data being used in the analysis so they can be looped in when the data is challenged. Once an analytic goes out into the wild, users will expect issues to be fixed quickly.
A comprehensive, up-to-date representation of the data assures users that all relevant and available data are being displayed, so they’re getting the whole story. Taking only slices of data that fit a certain storyline, or omitting relevant time periods, is a dishonest illustration of events and runs counter to the goals of analytics. Metrics and figures should draw on all the data you can gather.
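One simple way to spot omitted time periods is to look for calendar months with no records at all. The following is a minimal Python sketch of that idea, not a prescribed method; the function name `missing_months` and the sample order dates are hypothetical and chosen only for illustration.

```python
from datetime import date


def missing_months(record_dates, start, end):
    """Return (year, month) pairs between start and end that have no records."""
    covered = {(d.year, d.month) for d in record_dates}
    gaps = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        if (year, month) not in covered:
            gaps.append((year, month))
        # Advance one calendar month, rolling over at year end.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


# Hypothetical sample data: no orders were recorded in March 2023.
order_dates = [date(2023, 1, 15), date(2023, 2, 3), date(2023, 4, 20)]
gaps = missing_months(order_dates, date(2023, 1, 1), date(2023, 4, 30))
print(gaps)  # [(2023, 3)]
```

A gap is not automatically an error, since some months may legitimately have no activity, but it is a useful prompt for a conversation with the source owner before the dashboard is published.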
Finally, data should come from reliable and credible sources. If the user cannot trust the source of the data, or spots obvious errors when working with the data, they have no reason to trust the conclusions drawn. Ideally, data are collected internally with documented data collection practices and dictionaries.
Establishing the Analytic Quality Assurance Process
Analytic quality assurance (QA) is, as its name suggests, the process of ensuring that the data behind analytics, business intelligence, or similar models are up-to-date and accurate. Analytic QA is the housekeeping that allows businesses to make high-level, data-driven decisions. Improving analytic QA improves data quality and provides more accurate insights. Data can contain multiple types of error: values may simply be incorrect or inaccurately entered, they may lack metadata or documentation, or they may be out-of-date. Analytic QA catches these errors before they surface in an analytic.
Once you’ve established a baseline level of trust in the data and have a foundational understanding of its lineage and quality, you can begin the analytic QA process. This starts with establishing the QA team, an informal group organized around a single dashboard or set of analytics. In many cases, business users have the most knowledge about the data behind the dashboard and can improve the dashboard if they see incorrect figures or visualizations. Identifying 2-3 business users who can serve as reviewers of sandboxed analytics is essential to ensuring the data makes sense. Further, by bringing business users into the mix early, you can get stronger buy-in once you roll the dashboard out to a broader audience.
It’s also important to identify the data stewards & source owners who will need to act when issues are identified. If an erroneous value is identified, or aggregations don’t make sense, you should avoid the temptation to manually override a value in a separate spreadsheet just to get the dashboard over the line. Creating additional versions of the truth makes subsequent dashboards less reliable, since other developers are likely to point to the source system.
A best practice during this stage is to have a closed-loop process so that issues identified get resolved upstream, and there is an auditable log of the issues identified during the QA process along with the action taken. Further, the team should agree on SLAs for making fixes to upstream data sources. One of the biggest risks during the analytic QA process comes from source owners being indifferent to disagreements about the veracity of the data. If expectations aren’t well-defined, issues with the source systems can linger for days or weeks and threaten the viability of your analytics effort.
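As a rough illustration of what an auditable issue log with an agreed SLA might look like, here is a minimal Python sketch. It is not a description of any particular tool; the `QAIssue` class, its field names, the two-day SLA, and the sample issue are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class QAIssue:
    """One entry in a hypothetical analytic QA issue log."""
    description: str
    source_system: str
    owner: str
    reported_at: datetime
    sla: timedelta
    resolved_at: Optional[datetime] = None
    action_taken: str = ""

    def resolve(self, when: datetime, action: str) -> None:
        # Record what was done upstream and when, so the log stays auditable.
        self.resolved_at = when
        self.action_taken = action

    def is_overdue(self, now: datetime) -> bool:
        # Open issues are measured against "now"; resolved ones against
        # the time they were actually fixed.
        deadline = self.reported_at + self.sla
        finished = self.resolved_at if self.resolved_at is not None else now
        return finished > deadline


# Hypothetical issue with an agreed two-day SLA.
issue = QAIssue(
    description="Regional revenue total is negative",
    source_system="crm",
    owner="source-owner@example.com",
    reported_at=datetime(2024, 3, 1, 9, 0),
    sla=timedelta(days=2),
)
print(issue.is_overdue(datetime(2024, 3, 4, 9, 0)))  # True: still open past the deadline
issue.resolve(datetime(2024, 3, 2, 17, 0), "Corrected sign error in source system")
print(issue.is_overdue(datetime(2024, 3, 4, 9, 0)))  # False: fixed within the SLA
```

Keeping the action taken alongside each issue gives reviewers and source owners a shared record of what was fixed upstream, rather than relying on overrides made in a separate spreadsheet.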
Who Should Adopt Analytic QA Processes?
A well-defined analytic QA process is a best practice for all organizations, but becomes increasingly important as there is more distance between developers and source owners. It’s difficult to expect users to adopt a dashboard if the data seems wrong, or if the data in a dashboard is different from that in a source system. Analytic QA processes help bridge the gap between developers and source owners, ensuring both parties are equally accountable for the quality of data being consumed by users.
Additional Best Practices for Analytic QA
There is no definitive workflow for analytic QA, because the size, scale, and needs of a business all necessitate different procedures. Nonetheless, a few best practices will guide your analytics groups toward success.
A key first step to effective analytic QA is proper communication. In many instances, knowledge is spread throughout an analytics team, and the individual primarily in charge of analytic QA may not have the best expertise on the original dataset. Data stewards and owners should work within their team to keep counts and figures up-to-date and relevant.
Documentation goes hand-in-hand with communication. With many end users engaging an analytical tool, it is important that documentation is available as a reference. Even simple categories or tags can be misunderstood or confused by users who do not have the same prior knowledge as the builder of the analytic. A basic data dictionary can go a long way to minimizing the amount of confusion and debate once a dashboard is published.
Finally, but perhaps most importantly, analytic QA requires strong collaboration. If analytics were a one-person task, none of this would be necessary, but it is not that simple. Analytics represent many teams coming together to improve business outcomes, so collaboration is at the heart of successful analytical tools. Analytic QA requires a team effort to deliver a quality result.
We built Tamr Steward to help facilitate best practices around analytic QA. If you’re interested in getting started for free or learning more about how we’ve helped customers shorten the cycle time on their analytics, please reach out.