CDISC Data Sheet

Clinical Trial Data: Automated, Replicable CDISC Conversion


+ IND and NDA programs necessitate recurrent submission of clinical trial data to regulatory agencies in specific formats.

+ Trial data is stored in a tangled web of standards, versions, and file formats, requiring conversion to CDISC or another standard every time data needs to be submitted.

+ Manual methods of converting this data to approved formats are time-consuming and expensive, and they create a human bottleneck that dramatically slows trial velocity.

For most pharmaceutical companies, converting clinical trial data is a time-consuming and expensive ordeal. As research progresses through the multiple phases of a clinical trial, data must be submitted and resubmitted to agencies such as the FDA on a continuous basis. Unfortunately, the processes used to aggregate, clean, and convert this data are largely manual, relying on teams of contractors or employees to convert data from proprietary formats to standards accepted by regulatory agencies. In fact, up to 80% of the cost and effort in answering regulatory queries is spent finding, retrieving, and shaping the underlying data.

This whitepaper examines why conversion remains so costly and presents a new solution that promises to dramatically reduce the time and resources required to convert trial data into accepted formats.

The Human Bottleneck

For pharmaceutical companies, the need to convert trial data into formats accepted by regulatory agencies never ends. Common triggers include:

Submission of New Trial Data

As trials progress through their various phases, regulatory processes necessitate submission of new data.

Updated Formats

New versions of standards used for data interchange (e.g., SDTM, ADaM, and SEND) require re-converting legacy data to updated formats.

New Domains

New domains (e.g., PGx for pharmacogenetics) mean re-converting old data.

Unfortunately, the processes used for handling these conversions make converting the 100th dataset just as expensive and time-consuming as converting the first. This is largely caused by:

Manual Nature of Conversion Projects – The learnings and knowledge gained from past conversions cannot be readily applied to future projects. Companies end up asking the same questions over and over again instead of automating what they already know.

Inability to Apply Institutional Knowledge – When data managers need to poll subject matter experts (e.g., “What domain does this new data source map to?”), they have no centralized workflow and resort to one-off emails that are difficult to audit.

Proprietary File Formats – Data is locked in proprietary SAS file formats and other third-party standards, requiring developers with specific skill sets for conversion.

Together, these factors create a major bottleneck to trial velocity, one that costs millions of dollars annually to support.

Introducing Replicable, Scalable CDISC Conversion

Tamr offers a scalable, replicable process for unifying and converting clinical trial data to CDISC format. By understanding SAS input and output formats, in addition to controlled terminologies, Tamr can leverage machine learning to automatically convert your clinical trial dataset to a specific standard (e.g., SDTM, ADaM, or SEND).

When human intervention is necessary, Tamr generates questions for data experts (e.g., “What domain does this new data source map to?”), aggregates responses, and feeds them back into the system. This feedback enables Tamr to continuously improve its accuracy and speed while building institutional knowledge as new trial datasets are introduced.
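The feedback loop described above can be illustrated with a minimal Python sketch. This is not Tamr's implementation or API. The class name `MappingAssistant`, the `TARGETS` table, and the 0.6 threshold are all hypothetical, and simple string similarity from the standard library stands in for a trained machine learning model. The sketch shows only the general pattern: confident mappings are applied automatically, uncertain ones become questions for an expert, and each answer is stored for reuse on later datasets.

```python
from difflib import SequenceMatcher

# Hypothetical target variables with short labels (a tiny SDTM-style subset,
# chosen for illustration only; not a complete or authoritative list).
TARGETS = {
    "USUBJID": "unique subject identifier",
    "AESTDTC": "adverse event start date",
    "SEX": "sex",
}


class MappingAssistant:
    """Suggests a target variable for each source column; asks when unsure."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold  # assumed cutoff for automatic mapping
        self.confirmed = {}         # expert answers: source column -> target

    def _score(self, source, target):
        # Stand-in for a learned model: best string similarity against
        # either the target variable name or its label.
        s = source.lower()
        by_name = SequenceMatcher(None, s, target.lower()).ratio()
        by_label = SequenceMatcher(None, s, TARGETS[target]).ratio()
        return max(by_name, by_label)

    def suggest(self, source):
        # Expert-confirmed answers always take precedence over the heuristic.
        if source in self.confirmed:
            return self.confirmed[source], 1.0
        best = max(TARGETS, key=lambda t: self._score(source, t))
        return best, self._score(source, best)

    def map_columns(self, columns):
        """Return (automatic mappings, columns that need an expert answer)."""
        mapped, questions = {}, []
        for col in columns:
            target, score = self.suggest(col)
            if score >= self.threshold:
                mapped[col] = target
            else:
                questions.append(col)
        return mapped, questions

    def record_answer(self, source, target):
        """Store an expert's answer so the same question is not asked again."""
        if target not in TARGETS:
            raise ValueError(f"unknown target variable: {target}")
        self.confirmed[source] = target


if __name__ == "__main__":
    assistant = MappingAssistant()
    mapped, questions = assistant.map_columns(["usubjid", "sex", "zzq_17"])
    print("mapped:", mapped)
    print("needs expert review:", questions)

    # An expert answers the open question; the answer is reused next time.
    assistant.record_answer("zzq_17", "AESTDTC")
    print(assistant.map_columns(["zzq_17"]))
```

In this sketch the stored answers play the role of institutional knowledge: once a source column has been confirmed, later datasets that contain it are mapped without asking again.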

As a result, Tamr enables pharmaceutical companies to better leverage their existing knowledge and investment, making future transformations significantly faster and more efficient. In addition, turning conversion into a replicable, automated process that programmatically leverages subject matter experts helps mitigate risks associated with data quality (e.g., patient and site data).


Converting clinical trial data to regulatory standards is a time-consuming and expensive endeavor for pharmaceutical companies. Delays in conversion can slow review and approval processes for new drugs, dramatically increasing time to market. Tamr takes a novel approach to this problem, providing a scalable, replicable way to automatically unify and convert clinical trial data to CDISC-approved formats.