9 Key Principles of a DataOps Ecosystem


1. Cloud-First – Scale-Out/Distributed

Modern cloud database systems are designed to scale out natively and simplify operations and maintenance of large quantities of data.


2. Highly Automated, Continuous & Agile

Build an infrastructure that supports a continuous flow of data, from the instruments to all potential consumption endpoints.


3. Open/Best of Breed

Open ecosystems lead to better software being adopted broadly and give you the flexibility to replace individual components with minimal disruption to your business.


4. Loosely Coupled

Best-of-breed tools and software artifacts should be integrated primarily through loosely coupled interfaces. The design patterns for those interfaces should include data access services, messaging services, and REST services.


5. Lineage/Provenance is Essential

Establish as much lineage/provenance for data as possible. It enables the reproducibility that is essential for data science practices and teams operating at any significant scale.
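As a rough illustration of what recording lineage can look like, the sketch below wraps each transformation step and logs a content hash of its input and output. The names `LineageLog`, `fingerprint`, and `drop_missing_ids` are hypothetical and chosen only for this example; a real lineage system would also capture code versions, timestamps, and source locations.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, List


def fingerprint(rows: list) -> str:
    """Return a stable content hash for a list of JSON-serializable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class LineageLog:
    """Hypothetical in-memory lineage log: one entry per transformation step."""
    entries: List[dict] = field(default_factory=list)

    def run(self, step_name: str, transform: Callable[[list], list], rows: list) -> list:
        """Apply a transform and record what went in and what came out."""
        output = transform(rows)
        self.entries.append({
            "step": step_name,
            "input_hash": fingerprint(rows),
            "output_hash": fingerprint(output),
        })
        return output


def drop_missing_ids(rows: list) -> list:
    """Example transform: keep only rows that have an id."""
    return [r for r in rows if r.get("id") is not None]


log = LineageLog()
raw = [{"id": 1, "name": "Acme"}, {"id": None, "name": "?"}]
clean = log.run("drop_missing_ids", drop_missing_ids, raw)
```

Re-running the same step on the same input yields the same `output_hash`, which is the property that lets a team check that a result was reproduced rather than merely re-created.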


6. Bi-Directional Feedback

Feedback collection methods that are broadly embedded in all analytical consumption tools are essential — enterprises need a “Jira for Data.”


7. Deterministic, Probabilistic, and Humanistic Data Integration

The only viable method of bringing data together is the use of machine-based models (probabilistic) + rules (deterministic) + human feedback (humanistic) to bind the schema and records together.
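One way to picture how the three approaches combine is a record-matching decision that checks them in order. This is a minimal sketch, not a description of any particular product: the function `match_decision`, the `tax_id` field, and the 0.85 threshold are assumptions made for illustration, and the string-similarity score from Python's standard `difflib` stands in for a trained probabilistic model.

```python
from difflib import SequenceMatcher


def match_decision(a: dict, b: dict, human_labels: dict, threshold: float = 0.85):
    """Decide whether two records refer to the same entity.

    Returns (is_match, source), where source names the approach that decided.
    """
    pair = frozenset((a["id"], b["id"]))

    # Humanistic: an expert's verdict on this pair overrides everything else.
    if pair in human_labels:
        return human_labels[pair], "human"

    # Deterministic: a rule, here an exact match on a shared identifier.
    if a.get("tax_id") and a.get("tax_id") == b.get("tax_id"):
        return True, "rule"

    # Probabilistic (stand-in): a similarity score compared to a threshold.
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return score >= threshold, "model"


r1 = {"id": 1, "name": "Acme Corp", "tax_id": "12-345"}
r2 = {"id": 2, "name": "ACME Corporation", "tax_id": "12-345"}
r3 = {"id": 3, "name": "Acme Corp.", "tax_id": None}

no_feedback = {}
expert_feedback = {frozenset((1, 3)): False}  # an expert says 1 and 3 differ

rule_result = match_decision(r1, r2, no_feedback)
model_result = match_decision(r1, r3, no_feedback)
human_result = match_decision(r1, r3, expert_feedback)
```

In this sketch the model's guess for records 1 and 3 is a match, but the expert's correction takes precedence once it exists. In practice, such corrections would also be fed back to retrain the model.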


8. Aggregated and Federated Storage

The modern enterprise requires an overall architecture in which sources and intermediate storage of data will be a combination of both aggregated and federated data. Evaluate cloud storage.


9. Batch and Streaming

Together, these design patterns give you the best of both worlds: the ability to process batches of data as required, and the ability to process streams of data for more real-time consumption.
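A common way to support both patterns is to write the record-level logic once and reuse it on both paths. The sketch below assumes a hypothetical sensor reading with a `temp_f` field; the function names are illustrative only.

```python
from typing import Iterable, Iterator


def to_celsius(reading: dict) -> dict:
    """Record-level logic shared by the batch and streaming paths."""
    return {**reading, "temp_c": round((reading["temp_f"] - 32) * 5 / 9, 1)}


def process_batch(readings: list) -> list:
    """Batch path: transform a complete collection at once."""
    return [to_celsius(r) for r in readings]


def process_stream(readings: Iterable[dict]) -> Iterator[dict]:
    """Streaming path: transform and emit each record as it arrives."""
    for r in readings:
        yield to_celsius(r)


readings = [
    {"sensor": "a", "temp_f": 212.0},
    {"sensor": "b", "temp_f": 32.0},
]

batch_result = process_batch(readings)
stream_result = list(process_stream(iter(readings)))
```

Because both paths call the same `to_celsius` function, the batch and streaming outputs agree, which avoids maintaining two diverging copies of the business logic.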

The DataOps Ecosystem

The direction is clear: more data, more technology advances, and more vendors are all increasing the need to implement DataOps successfully.


People - You can have the best tools and mindset, but if you don’t have the right skill set for DataOps, it won’t work at scale. Learn more about the Eight People of DataOps.

Process - DataOps initiatives must be automated, incremental, and collaborative. Rules-based approaches that rely on modeling and testing are too labor-intensive and will not scale as your organization grows.

Technology - Choose open, interoperable, best-of-breed technologies that can be customized to serve the organization’s goals and drive business outcomes.

Read the Key Principles of DataOps blog post here. 

See Tamr in Action

Learn more about how Tamr exercises these principles of DataOps with our customers.