Companies have invested an estimated $3-4 trillion in IT over the last 20-plus years. Most of that investment has gone into developing and deploying single systems, applications, functions and geographies to automate and optimize key business processes. Each such deployment adds another data silo, and the automated business processes themselves generate still more data.
Companies are now investing heavily in Big Data and Analytics 3.0 to begin the analytic prosecution of all this data. Data Variety – the natural, siloed nature of data as it’s created – is becoming a bottleneck. Its cost becomes apparent when companies attempt to ask simple questions across many business silos: divisions, geographies, functions. Current top-down, deterministic data unification approaches (such as ETL, ELT and MDM) weren’t designed to scale to the variety of hundreds, thousands or tens of thousands of data silos. These approaches depend on highly trained architects developing “master” schemas – “the one schema to rule them all.” The single master schema is a red herring.
The fundamental diversity and mutability of enterprise data and semantics call for a bottom-up, probabilistic approach to connecting data sources from various silos. That approach also has to engage source owners, the people who know the data best, to curate it at scale. Overcoming data silos demands a more scalable, open and collaborative approach to getting data to work together – one that respects the need for data quality, provenance and fidelity.
A new bottom-up, probabilistic approach to data unification provides the scalability to exploit Big Data Variety. Finding and connecting siloed data into unified views starts to look more like a Google search circa 2014 than a Yahoo index crawl circa 1995.
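To make the idea concrete, here is a minimal Python sketch of what bottom-up, probabilistic matching with owner curation could look like. It is an illustration under stated assumptions, not any particular product's method: the function names, the equal weighting of name and value evidence, and the 0.8 / 0.3 thresholds are all hypothetical, and the score is a simple heuristic in the range 0 to 1 rather than a calibrated probability. Each silo is modeled as a mapping from attribute name to a sample of its values.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity of two attribute names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(xs, ys):
    """Jaccard overlap of two samples of attribute values, in [0, 1]."""
    sx, sy = set(xs), set(ys)
    if not sx or not sy:
        return 0.0
    return len(sx & sy) / len(sx | sy)


def match_score(name_a, vals_a, name_b, vals_b, w_name=0.5):
    """Blend name and value evidence into one score (weights are assumed)."""
    return (w_name * name_similarity(name_a, name_b)
            + (1 - w_name) * value_overlap(vals_a, vals_b))


def triage(score, accept=0.8, reject=0.3):
    """Accept confident matches, drop unlikely ones, and route the
    uncertain middle band to the source owner for curation."""
    if score >= accept:
        return "accept"
    if score < reject:
        return "reject"
    return "ask_owner"


def propose_links(silo_a, silo_b):
    """Score every attribute pair across two silos, best candidates first.

    Each silo is a dict mapping attribute name -> list of sample values.
    """
    links = []
    for name_a, vals_a in silo_a.items():
        for name_b, vals_b in silo_b.items():
            s = match_score(name_a, vals_a, name_b, vals_b)
            links.append((name_a, name_b, round(s, 3), triage(s)))
    return sorted(links, key=lambda link: link[2], reverse=True)
```

The design point is that no master schema is written up front: candidate links emerge from the data itself, and human expertise is spent only on the pairs the scoring cannot settle.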