Written by Bernie Kuan
In the life sciences industry, R&D for new drugs is a long, expensive process: most drugs have less than a 10% chance of being approved by the FDA after multiple years of investment. Expensive drug failures have a wide variety of causes, but key among them is the inability of scientists to efficiently process study data and drive faster, more data-driven processes in the drug development life cycle. Without access to unified clinical data, scientists either cannot find the data they need or must manually clean, transform, and harmonize study data from different sources and formats. More often than not, this burden is compounded by data needs that span multiple clinical studies.
To succeed, life science companies must modernize R&D processes with data at the core of their strategies. They have to take a DataOps approach in R&D to process data faster, design better trials, and learn from mistakes (and successes) early and often. Instead of spending monumental effort on long-term projects that may not succeed, companies can incrementally curate and consolidate better study data over time to improve their drug development processes and success rates. With this approach, life science companies can greatly reduce the costs of drug failure, increase their chances of success, and release life-saving drugs faster.
Data in Life Sciences is Messy
Integrating disparate study data is notoriously difficult in life science companies, whether due to data being captured and stored in multiple systems and diverse formats, or the data itself featuring different levels of completeness and accuracy. As a result, even if critical information exists among the hundreds or thousands of study datasets, scientists may not have access to it. To resolve these challenges, data standards are necessary.
The Clinical Data Interchange Standards Consortium (CDISC) is a widely recognized authority on managing life science data and provides data standards to guide drug development. Yet converting diverse data to these standards is traditionally highly manual, requiring a tremendous amount of time from subject-matter experts and SAS developers. Delays, inadvertent human errors, and high costs ensue as companies struggle to convert and standardize their study data.
The Tamr CDISC Conversion Solution
To create a more modern R&D pipeline for pharmaceutical companies, Tamr has introduced the Tamr CDISC Conversion Solution. Driving this solution is our experience working with the world’s largest life science companies, as well as a number of Tamr-hosted pharma user group meetings to collaborate with experts on tackling the industry’s data challenges.
As a result, we have created a powerful CDISC conversion platform, driven by human-guided machine learning, to integrate study data in a way that overcomes the challenges of traditional in-house tools that are prone to delays and errors. The solution harmonizes messy, disparate data sets into unified CDISC models through extendable CDISC conversion scripts and the integration of existing schema mapping specifications.
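To make the idea of a schema mapping specification concrete, here is a minimal Python sketch of mapping one source record to the CDISC SDTM Demographics (DM) domain. It is an illustration only, not Tamr's implementation or API: the source column names, the `DM_MAPPING` specification, the `to_dm_record` helper, the simplified sex coding, and the way USUBJID is built are all assumptions made for this example. Only the target variable names (STUDYID, DOMAIN, USUBJID, SUBJID, SEX, AGE) come from the SDTM standard.

```python
from typing import Dict

# Hypothetical mapping specification: source column name -> SDTM DM variable.
# In practice such a spec is curated by experts or suggested by a mapping tool.
DM_MAPPING: Dict[str, str] = {
    "study": "STUDYID",
    "subject_id": "SUBJID",
    "gender": "SEX",
    "age_years": "AGE",
}

# Simplified value harmonization for SEX (assumed subset of coded values).
SEX_CODES: Dict[str, str] = {"male": "M", "m": "M", "female": "F", "f": "F"}


def to_dm_record(source_row: Dict[str, str],
                 mapping: Dict[str, str] = DM_MAPPING) -> Dict[str, str]:
    """Rename mapped source columns to SDTM DM variables and harmonize values."""
    record: Dict[str, str] = {"DOMAIN": "DM"}
    for source_col, sdtm_var in mapping.items():
        if source_col in source_row:
            # Trim stray whitespace that often appears in raw study exports.
            record[sdtm_var] = source_row[source_col].strip()
    if "SEX" in record:
        # Unrecognized values fall back to "U" (unknown) in this sketch.
        record["SEX"] = SEX_CODES.get(record["SEX"].lower(), "U")
    if "STUDYID" in record and "SUBJID" in record:
        # Assumed convention: unique subject ID = study ID + subject ID.
        record["USUBJID"] = f"{record['STUDYID']}-{record['SUBJID']}"
    return record


row = {"study": "ABC-101", "subject_id": " 0042 ", "gender": "Female", "age_years": "57"}
dm_record = to_dm_record(row)
```

Writing such mappings by hand for thousands of source tables is the manual effort described above; a machine learning component would instead propose entries like those in `DM_MAPPING` for experts to confirm or correct.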
With the Tamr solution, companies have a robust platform to manage all their CDISC models across studies and domains as they incrementally improve their CDISC conversion capabilities. Moreover, Tamr’s CDISC conversion solution improves as machine learning components capture and apply domain knowledge from the company’s uploaded data sets, further accelerating the conversion of new study data over time.
CDISC conversion is also enhanced by Tamr’s powerful data transformation engine, which leverages an optimized implementation of Apache Spark that easily scales data processing to billions of records a day.
A Transformational Impact on Drug Development
Tamr’s data unification platform eliminates the inefficiencies of manually harmonizing disparate data silos within and across multiple studies. With Tamr, our customers have enabled their data scientists and researchers to access curated, standardized study data from unified data repositories, leading to reduced costs and faster breakthroughs, the primary goal of any life science company.
Amgen leveraged Tamr as a key component of their data integration pipeline: “Tamr made a huge impact in our ability to process over 200 Amgen legacy clinical study data in less than a year with only two resources. When you consider that such schema mapping was conducted on over 4000 source tables, the benefit of machine learning really added up.”
GlaxoSmithKline (GSK) implemented Tamr and saw incredible initial results in their goal of unifying data from three different domains (assays, clinical trial data, and genetic data). With Tamr, GSK mapped over 40,000 datasets across 1,000 studies to 36 different SDTM domains. “We are doing a step change on machine learning […]. We simply have to have more machine learning skills to deal with all the available data now.”
For many of our customers, Tamr’s CDISC solution is a critical piece of the data pipeline for managing and converting study data. We’re genuinely excited to help our customers in their efforts to discover treatments faster and enable better outcomes for patients. To learn more about our new Tamr CDISC Conversion Solution, please download our white paper below, reach out, or schedule a demo.