Implementing DataOps in Life Sciences
The volume of clinical data is growing at an astonishing rate, largely because of our increased ability to generate and collect data that was previously left untouched. While this new data holds great potential, it also presents new challenges: frequent changes to ontology standards, uncurated backlogs of clinical data assets, and underused reference data. Data organizations within life science companies are struggling to keep pace with all this change.
Given the problems that data volume and variety present, now is the time for the industry to embrace DataOps: the automated, process-oriented best practices that improve the quality and agility with which highly curated data can be consistently delivered to data consumers for operational and analytic impact.
To stay competitive in today’s data environment, making data FAIR (findable, accessible, interoperable, and reusable) by leveraging cloud services and best-of-breed technologies is critical.
The principles of DataOps can be framed around the key concepts of process, technology, and people. At a high level, the framework components can be summarized as follows:
Process
- Agile – incremental delivery model
Technology
- Architecture – selection of tools that make up the data supply chain
- Infrastructure – selection of a platform to support the architecture
People
- Roles – division of labor across mixed-skill teams
- Structure – working model for projects across technical and business teams
The DataOps approach tackles modern data initiatives with all three components in mind to guarantee successful, transformational outcomes within organizations.
How DataOps Accelerates Life Science Initiatives
Compared to traditional approaches to clinical data harmonization, Tamr’s DataOps approach enables life science companies to deliver curated data products at a pace and scale that was previously impossible, opening up a wide variety of use cases that were not feasible before.
At GSK, for example, Tamr helped harmonize data from more than 1,500 legacy clinical studies, across 30 domains, into GSK’s SDTM standards within one year. This meant processing millions of source attributes across more than 40,000 sources of clinical trial data in a data pipeline that can process 10 billion records a day. The result gave GSK tremendous leverage across a variety of operational and analytic applications in its R&D teams. DataOps played a key role in achieving this.
As David Cowen, Director, Data and Computational Sciences at GlaxoSmithKline, pointed out during his presentation at DataMasters:
“I think the DataOps changes that we’ve seen in the industry have been led by companies like Google and Amazon that have a real data focus. They understand data is a key factor in their product.
In the life science industry, that’s not so much the case. GSK to some extent, is pharmaceutical with drugs, and so data is on the periphery and while for years we’ve known that we got a goldmine of data that’s available to us, really having the will to go after that and make it manageable that’s something that has not been there. It’s only been recently here at GSK that the value of data assets and information assets has been realized and we’re in the process of trying to capitalize on that.”
Tamr’s Role in a DataOps Framework for Life Sciences
Tamr plays an important role in the DataOps technology landscape; it has been built with all three components of the DataOps principles in mind. As a technology platform, Tamr promotes the DataOps principles in its approach to architecture and infrastructure:
Cloud-native scale-out across enterprise infrastructure – Tamr has been built and tested to leverage modern best-of-breed technology to reliably manage large data volume, variety, and velocity. With strong partnerships across all major cloud vendors, Tamr’s underlying storage and processing components use native cloud services that can be managed and scaled on demand.
Effectively retain and utilize subject matter knowledge within the organization – Tamr’s emphasis on human-guided machine learning to automate data curation ensures the flexibility to meet the organization’s needs. Tamr’s interface for capturing subject matter expert feedback allows for an accurate depiction of how the organization curates its data, whether that is the sponsor-specific SDTM data models used during CDISC conversion or how specific business units organize their clinical partners, sites, and investigators. This approach ensures that up-to-date data, accurate for the organization, is always available.
Best of breed to integrate with modern technology stacks – With technology evolving more quickly now than ever, modern data stacks avoid locking an organization’s entire data flow within a monolithic software solution. Instead, the modular, best-of-breed approach in DataOps has been critical not only to future-proof technology solutions, but also to provide the best results. Tamr’s RESTful, API-oriented design supports interoperability within an organization’s existing infrastructure, so Tamr can excel at what it does best, data harmonization, while leveraging complementary tools to fill gaps in an organization’s data flow.
Assume data will change over time, and adapt to it – Tamr’s machine learning approach assumes data will always change: data standards will change, new clinical data will be acquired, new vendors will be onboarded, and so on. It is therefore important that the organization’s data curation technology be flexible and scale automatically with this change.
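To make the human-guided curation idea above more concrete, here is a minimal sketch of a feedback loop for mapping source attributes to target variables. It is not Tamr’s implementation: simple string similarity from Python’s standard difflib module stands in for a trained model, and the target variable list, the suggest_mapping function, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Illustrative target variables, loosely modeled on SDTM Demographics names.
# This list is an assumption for the example, not a sponsor's actual model.
TARGET_VARIABLES = ["USUBJID", "BRTHDTC", "SEX", "RACE", "COUNTRY"]


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_mapping(source_attr, expert_feedback, threshold=0.6):
    """Return (target_variable, status) for one source attribute.

    expert_feedback maps source attribute names to targets confirmed by a
    subject matter expert; it always takes priority over the automatic guess.
    """
    if source_attr in expert_feedback:
        return expert_feedback[source_attr], "expert"
    best = max(TARGET_VARIABLES, key=lambda t: similarity(source_attr, t))
    if similarity(source_attr, best) >= threshold:
        return best, "auto"
    # Low confidence: queue the attribute for expert review instead of guessing.
    return None, "needs_review"


feedback = {}
print(suggest_mapping("Sex", feedback))
print(suggest_mapping("date_of_birth", feedback))

# An expert resolves the open question once; later runs reuse the answer.
feedback["date_of_birth"] = "BRTHDTC"
print(suggest_mapping("date_of_birth", feedback))
```

The design choice worth noting is that expert answers are stored separately from the matching logic, so when a standard changes or a new vendor uses unfamiliar names, only the uncertain attributes are routed back to people.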
Modern data ecosystems have become more complex than ever, so DataOps principles provide the best practices to ensure a successful approach to all data transformation initiatives.
A Final Thought: How Life Science Companies are Responding to DataOps
From what we’ve seen with our customers, the DataOps approach has proven to be critical in transforming a wide variety of data initiatives across the life science industry. One of the most important yet elusive measures of success that we have seen is the fundamental shift in data culture throughout an organization. Tamr’s involvement in many of our customers’ digital transformation journeys has enabled their data consumers to have (and even expect) highly curated data to succeed in their roles for faster drug discovery and more efficient clinical development.
We’d love to help with your DataOps initiatives, whether you’re far along in the journey or just beginning. We offer tailored workshops where we can evaluate your data and then work together to identify areas where we can transform your latent data into an asset.