3 Trends in Life Sciences Data Management

In a previous blog post, we wrote about how Tamr has begun hosting user group meetings for life sciences customers to support our customers’ desire to learn from each other. These meetings are a great opportunity to learn how many of the top pharmaceutical companies are using Tamr to tackle data unification challenges in areas that range from R&D to commercial applications.

In March, we hosted our second user group meeting and have some interesting takeaways to share. Here are three important trends in the life sciences industry as they relate to data:

Data Needs to be FAIR

Here, FAIR stands for Findable, Accessible, Interoperable, and Reusable. The term is increasingly important in the industry: as organizations look for ways to make clinical study data more available and to link it to other important datasets, such as biomarkers or biospecimens, they are finding that the only way to make data FAIR is to harmonize it. Harmonization is a prerequisite for scientists at these organizations to analyze and mine data across studies, populations, and diseases. The result is a faster discovery process and less time and cost to bring a drug to market, which improves both profitability and clinically significant outcomes for patients.

Tamr is a critical piece of the answer to this data unification challenge for two reasons. First, datasets within Tamr can be tagged with metadata, giving users a comprehensive view of the data available to them. Second, Tamr’s data unification solutions enable life sciences organizations to integrate data from disparate sources into unified datasets that adhere to the SDTM standard (the CDISC Study Data Tabulation Model) while also linking to other dataset types (e.g., biomarker or real-world data). By leveraging human-guided machine learning, our solutions allow an organization’s data to be integrated across domains and reused to flexibly support downstream analysis.
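
To make the idea of harmonization concrete, here is a minimal Python sketch of mapping records from two studies onto a shared set of SDTM Demographics variable names (STUDYID, USUBJID, AGE, SEX). It is only an illustration: the study names, source column names, and the hand-written mapping tables are hypothetical, and this is not how Tamr derives mappings (Tamr learns them with human-guided machine learning).

```python
# Hypothetical per-study column mappings onto SDTM Demographics (DM) variables.
# In practice these mappings would be learned or curated, not hard-coded.
STUDY_MAPPINGS = {
    "study_a": {"patient": "USUBJID", "age_years": "AGE", "gender": "SEX"},
    "study_b": {"subj_id": "USUBJID", "age": "AGE", "sex": "SEX"},
}

# Assumed normalization of free-text sex values to single-letter codes.
SEX_CODES = {"male": "M", "m": "M", "female": "F", "f": "F"}


def harmonize(study, record):
    """Rename one source record's columns to the shared variable names."""
    mapping = STUDY_MAPPINGS[study]
    unified = {"STUDYID": study}
    for source_column, value in record.items():
        target = mapping.get(source_column)
        if target is None:
            continue  # column has no counterpart in the shared schema
        if target == "SEX":
            # Anything unrecognized is coded "U" (unknown) in this sketch.
            value = SEX_CODES.get(str(value).strip().lower(), "U")
        unified[target] = value
    return unified


records = [
    harmonize("study_a", {"patient": "A-001", "age_years": 54, "gender": "Female"}),
    harmonize("study_b", {"subj_id": "B-017", "age": 61, "sex": "m", "site": "Boston"}),
]
for row in records:
    print(row)
```

Once every study shares the same variable names and value codes, records from different studies can be stacked and queried together, which is what makes cross-study analysis possible.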

Scalability is Key

As the amount of data and the number of studies that life sciences organizations are tasked with managing continue to increase, the ability to scale, in terms of both the variety of sources and the amount of data, is critical. Many of our customers and user group participants described hitting a scale cap with the traditional data management solutions they were using, typically because those solutions relied on deterministic rules that can break when a new dataset is introduced. In contrast, Tamr uses probabilistic machine learning models, which enable organizations to scale much more quickly and easily.
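
The difference between the two approaches can be sketched in a few lines of Python. The example below is a deliberately simplified stand-in, not Tamr's model: it contrasts an exact lookup table (a deterministic rule) with a similarity score from the standard library's `difflib`, and the attribute names, lookup table, and 0.6 threshold are all hypothetical. A real probabilistic matcher would be a trained model refined by expert feedback rather than a string-similarity heuristic.

```python
from difflib import SequenceMatcher

# Hypothetical attributes of the unified schema.
TARGET_ATTRIBUTES = ["subject_id", "birth_date", "sex", "study_site"]

# Deterministic rule: an exact lookup table written for the first few datasets.
EXACT_RULES = {"SUBJID": "subject_id", "DOB": "birth_date", "GENDER": "sex"}


def rule_based_match(column):
    """Return the mapped attribute, or None when no rule covers the column."""
    return EXACT_RULES.get(column)


def normalize(name):
    """Lowercase a column name and drop underscores and spaces."""
    return name.lower().replace("_", "").replace(" ", "")


def scored_match(column, threshold=0.6):
    """Return (best attribute or None, similarity score) for a column name."""
    scores = {
        target: SequenceMatcher(None, normalize(column), normalize(target)).ratio()
        for target in TARGET_ATTRIBUTES
    }
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]  # low confidence: leave for a human to review


# A new dataset arrives with a column name no rule anticipated.
new_column = "Subject ID"
print(rule_based_match(new_column))
print(scored_match(new_column))
```

The lookup table has to be extended by hand for every new naming variant, while the scored approach returns a ranked guess with a confidence value, and low-confidence cases can be routed to an expert instead of silently failing.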

Further, Tamr’s compute engine leverages an optimized version of Apache Spark, so data processing scales easily to billions of records.

Think API First

Modern data pipelines require, above all else, interoperability. As more life sciences organizations strive to adopt DataOps principles, they are looking for solutions that integrate easily with other platforms. To meet this need, vendors must be ‘API first’. At Tamr, we pride ourselves on building products that can be adapted to any platform and are easily extensible to support any organization’s needs. Backing this up is the recent release of our first publicly available Python API. Our life sciences customers are already using it to build complex, scheduled, and robust data unification pipelines to support their analysts’ needs.

To learn more about how Tamr’s solutions help life sciences companies address data challenges, please reach out or schedule a demo.



Clint is a Senior Data Scientist/DataOps Engineer at Tamr, where he leads several efforts across the life sciences space. Prior to Tamr, he earned his PhD in Particle Physics, and he is a co-author of one of the most widely downloaded reviews of machine learning in physics.