9 Key Principles of a DataOps Ecosystem
1. Cloud-First – Scale-Out/Distributed
Modern cloud database systems are designed to scale out natively and simplify operations and maintenance of large quantities of data.
2. Highly Automated, Continuous & Agile
Build an infrastructure that supports a continuous flow of data, from the instruments to all potential consumption endpoints.
3. Open/Best of Breed
Open ecosystems result in better software being adopted broadly, and they give you the flexibility to replace individual components with minimal disruption to your business.
4. Loosely Coupled
Integrate best of breed tools and software artifacts primarily through well-defined interfaces. The design patterns for those interfaces should include data access services, messaging services, and REST services.
5. Lineage/Provenance is Essential
Establish as much lineage/provenance for data as possible, enabling reproducibility that is essential for any significant scale in data science practice/teams.
6. Bi-Directional Feedback
Feedback collection methods that are broadly embedded in all analytical consumption tools are essential — enterprises need a “Jira for Data.”
7. Deterministic, Probabilistic, and Humanistic Data Integration
The only viable method of bringing data together is the use of machine-based models (probabilistic) + rules (deterministic) + human feedback (humanistic) to bind the schema and records together.
8. Aggregated and Federated Storage
The modern enterprise requires an overall architecture in which sources and intermediate storage of data will be a combination of both aggregated and federated data. Evaluate cloud storage.
9. Batch and Streaming
These design patterns can give you the best of both worlds: the ability to process batches of data as required and also to process streams of data for more real-time consumption.
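To make principle 7 more concrete, here is a minimal Python sketch of how rules, a model score, and human feedback might be combined to decide whether two records describe the same entity. Everything in it is an illustrative assumption rather than any vendor's implementation: the `Record` fields, the email rule, the 0.85 threshold, and the use of `difflib` string similarity as a stand-in for a trained model are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape, chosen only for illustration.
    record_id: str
    name: str
    email: str


def deterministic_match(a, b):
    """Rule (deterministic): identical non-empty emails mean the same entity."""
    return bool(a.email) and a.email.lower() == b.email.lower()


def probabilistic_score(a, b):
    """Stand-in for a trained model (probabilistic): name similarity in [0, 1]."""
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def is_match(a, b, human_labels, threshold=0.85):
    """Combine human feedback, rules, and a model score, in that order."""
    # Humanistic: an explicit reviewer decision overrides everything else.
    key = frozenset((a.record_id, b.record_id))
    if key in human_labels:
        return human_labels[key]
    # Deterministic: a rule that fires is trusted outright.
    if deterministic_match(a, b):
        return True
    # Probabilistic: otherwise fall back to the model score (assumed threshold).
    return probabilistic_score(a, b) >= threshold


acme = Record("1", "Acme Corporation", "sales@acme.com")
acme_dup = Record("2", "ACME Corp.", "Sales@Acme.com")
acme_inc = Record("3", "Acme Corporation Inc", "")
globex = Record("4", "Globex", "info@globex.com")

# Reviewer feedback, keyed by an unordered pair of record IDs.
feedback = {frozenset(("1", "3")): False}
```

In this sketch, `is_match(acme, acme_dup, {})` is decided by the email rule, `is_match(acme, acme_inc, {})` by the similarity score, and `is_match(acme, acme_inc, feedback)` by the reviewer's decision. Checking human feedback first is one possible ordering; a real system might instead feed those labels back into model training.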
The DataOps Ecosystem
More data, more technology advancements, and more vendors are inevitable, and all of them increase the need to implement DataOps successfully.
People - You can have the best tools and mindset, but if you don’t have the right skill set for DataOps, it won’t work at scale. Learn more about the Eight People of DataOps.
Process - DataOps initiatives must be automated, incremental, and collaborative. Rules-based approaches that rely on modeling and testing are too labor-intensive and will not scale as you do.
Technology - Choose open, interoperable, best of breed technologies that can be customized to serve the organization’s goals and drive business outcomes.