Fraud Detection with Data Mastering

Harnessing the power of federal data and building a data-driven culture is a focus for the U.S. Federal Government. And while it’s a big undertaking, it’s a worthwhile effort that will provide far-reaching benefits to all parts of the government and public sector. 

Having clean, curated data will certainly help all areas of government make better decisions. But I believe that one area in particular will benefit from being more data-driven: fraud detection. 

Fraud takes many forms, from individuals filing false tax returns to repeat offenders who consistently elude the system, and even electoral fraud. Today, agencies struggle to identify fraudulent behavior and act on it. 

Today, the most common way to improve fraud detection is to hire more federal agents to monitor individuals suspected of fraud. But what if there were a better, more efficient way to spend taxpayer dollars? 

Reducing Fraud through Data Mastering

There are many ways that the government can use its funding to help spot and prevent fraud. Hiring more agents to investigate fraudulent claims is one way to address the issue. But I would argue that investing in data mastering is a better, more effective use of funds. 

Think about it. The government captures data about individuals and organizations suspected of committing fraud in systems spread across various departments and agencies. But often there is no easy way to connect the data across these systems to identify red flags, spot patterns in fraudulent behavior, or recognize repeat offenders. 

With data mastering, all that changes. Data mastering employs machine learning (ML) to consolidate, clean, and categorize data across systems and agencies so that the government can unify it in a centralized data warehouse or data lake. Agencies can then curate and enrich the unified data and assign each individual or entity a persistent identifier, making it easy to recognize those who show signs of committing fraud, no matter which system a record came from. 
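To make the idea of a persistent identifier concrete, here is a minimal sketch. It is an illustration only, not how Tamr or any agency implements mastering: it swaps the ML model for simple string similarity from Python's standard library, and the record fields, the `assign_entity_ids` helper, and the 0.85 threshold are all assumptions made for the example.

```python
import uuid
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase the name, drop punctuation, and collapse whitespace."""
    kept = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def assign_entity_ids(records, threshold=0.85):
    """Attach an entity_id to each record; near-duplicate names share one ID."""
    clusters = []  # each cluster: {"key": representative name, "id": identifier}
    mastered = []
    for record in records:
        key = normalize(record["name"])
        # Reuse the first existing cluster whose representative name is similar enough.
        match = next(
            (c for c in clusters
             if SequenceMatcher(None, key, c["key"]).ratio() >= threshold),
            None,
        )
        if match is None:
            # Derive the ID from the representative name, so the same
            # representative name always maps to the same identifier.
            match = {"key": key, "id": str(uuid.uuid5(uuid.NAMESPACE_DNS, key))}
            clusters.append(match)
        mastered.append({**record, "entity_id": match["id"]})
    return mastered


# Invented sample records standing in for rows from different agency systems.
records = [
    {"system": "tax", "name": "Acme Holdings LLC"},
    {"system": "loans", "name": "ACME Holdings, L.L.C."},
    {"system": "benefits", "name": "Acme Holding LLC"},
    {"system": "tax", "name": "Jane Doe"},
]
mastered = assign_entity_ids(records)
```

In this toy example, the three spellings of the Acme name end up with the same `entity_id`, while the Jane Doe record gets its own. A real mastering pipeline would compare many more attributes and learn its matching rules from reviewer feedback rather than rely on a fixed threshold.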

While this approach is data-driven, it still requires humans in the loop to provide feedback on the machine learning models and to spot anomalies in the data. This human feedback also helps to train the models, enabling them to deliver more accurate results over time. The better the models, the better the results, and the more confidence decision makers have in the data itself. 

Case in Point: COVID-19 Stimulus Funds

During the height of the COVID-19 pandemic, the government provided stimulus checks to millions of Americans as well as to small businesses hit hard by the pandemic. But it soon found that some entities were receiving multiple checks in error. 

To address this issue, the government created an entire unit dedicated to investigating stimulus check fraud and recovering the funds. Data played a role in helping to spot cases of suspected fraud, but the process was manual and time-consuming. There was also no easy way to identify when an individual or entity applied for multiple rounds of relief, because the data lived in multiple systems and in many formats. 

With data mastering in place, this situation would unfold differently. The government could start by pulling a list of addresses that received stimulus checks, regardless of where this data lives. Then, they could use machine learning to curate and clean this data, allowing them to identify clusters that indicate which addresses received multiple checks. Humans would still be involved to provide feedback on the model results so that the model can improve and provide better, more accurate results over time. 
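The address step can be sketched in a few lines. This is a simplified stand-in, assuming exact matching on cleaned-up addresses rather than the ML-based clustering described above; the abbreviation table, the field names, and the sample payments are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical abbreviations used to make address variants comparable.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "apartment": "apt", "road": "rd"}


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, and abbreviate common address words."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", raw.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def addresses_with_multiple_checks(payments):
    """Group payments by normalized address and keep groups with more than one."""
    clusters = defaultdict(list)
    for record in payments:
        clusters[normalize_address(record["address"])].append(record)
    return {addr: recs for addr, recs in clusters.items() if len(recs) > 1}


# Invented sample payments standing in for data pulled from separate systems.
payments = [
    {"source": "system_a", "payee": "Acme LLC", "address": "12 Main Street, Apt. 4"},
    {"source": "system_b", "payee": "ACME L.L.C.", "address": "12 main st apt 4"},
    {"source": "system_a", "payee": "Jane Doe", "address": "98 Oak Avenue"},
]
flagged = addresses_with_multiple_checks(payments)
```

Here `flagged` holds only the Main Street address, with both payment records attached for a reviewer to inspect. Multiple checks at one address are not proof of fraud (think of apartment buildings or shared households), which is why human review of the clusters remains part of the process.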

Using this approach, the department would narrow down the list of individuals or entities to investigate, allowing them to save time and realize results faster. 

The same approach applies to other fraud-prone areas such as income tax, veterans benefits, and federal student loans. In each of these examples, the relevant data lives in multiple systems that span multiple government departments and agencies. Data mastering can help to unify the data using machine learning so that officials can more easily identify and investigate potentially fraudulent claims.

Through the use of data mastering, government agencies will not only benefit from increased efficiency and better decision making, but they’ll see improved citizen satisfaction as well. Citizens can feel more confident that the government will identify fraudsters and take action against them.

To learn more about Tamr and how our data products help public sector organizations master data at scale, please visit tamr.com.
