Tamr In Depth: Hadoop and Data Lakes

Hadoop’s distributed approach has driven down the cost of storing and processing massive amounts of heterogeneous data from diverse structured and unstructured sources.

To deal with the staggering volume and variety of data in their organizations, enterprises turn to Hadoop to help form a ‘Data Lake,’ allowing these companies to store all of their raw data in one place for future analysis. While compelling, this approach comes with an obvious problem: you end up accumulating vast amounts of data and promising yourself you’ll get back to it eventually. Pretty soon you’re overwhelmed with dark, murky data that you’ll need to invest a whole lot of money in just to see (much less use). Getting data into the lake is easy, but getting it out in an efficient manner is extremely difficult.

The thing is, much of that dark, murky data has the potential to drive valuable insights, whether it sits in unaudited vendor part and pricing info, hyper-local customer account records, or research results trapped in individual scientists’ CSVs. Dumped into Data Lakes, this kind of data has a way of settling on the bottom. For it to be useful – i.e., ready for analytics – the data needs to be surfaced, cleaned and integrated with both scale and precision. Which is exactly what Tamr’s Data Unification platform does, helping return to enterprises the full value of their Hadoop investment.

Download Tamr’s white paper and learn how to unleash the power of your Hadoop implementation.

You’ll learn how Tamr unifies data within Hadoop via two core components:

  • A module for training, administration, and expert sourcing that runs on top of a relational database on an edge node of the customer’s Hadoop cluster.
  • A matching engine that runs distributed on the Hadoop cluster where pertinent data is stored.

The paper walks through an example of a company wishing to generate a 360-degree view of its customers by integrating disparate data sets (CRM, Clickstream, Transactional) currently stored in a Data Lake – and how a Tamr deployment can get this data integrated and ready for analytics with scale and precision in a cost- and time-efficient manner.
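
To make the 360-degree idea concrete, here is a minimal Python sketch of grouping records from three sources under one customer key. It is an illustration only, not Tamr’s matching engine: the field names, the sample records, and the `normalize_email` and `unify` helpers are all hypothetical, and exact matching on a normalized email stands in for the trained, expert-guided matching described above.

```python
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Lowercase and trim an email so trivially different spellings match."""
    return email.strip().lower()


def unify(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Group records from every source under one normalized customer key."""
    unified = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = normalize_email(record["email"])
            # Keep each source's records under its own name to avoid field collisions.
            unified[key].setdefault(source_name, []).append(record)
    return dict(unified)


# Hypothetical sample data standing in for CRM, clickstream, and transactional sets.
sources = {
    "crm": [{"email": "Ana@Example.com ", "name": "Ana Diaz"}],
    "clickstream": [{"email": "ana@example.com", "page": "/pricing"}],
    "transactional": [
        {"email": "ANA@example.com", "amount": 120.0},
        {"email": "bo@example.com", "amount": 35.5},
    ],
}

customer_view = unify(sources)
```

Real data rarely shares a clean key like this, which is why matching at scale usually needs fuzzy comparison across many attributes plus human review of uncertain pairs.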

Download the white paper here.

Click these links to read more on Tamr and Hadoop and Data Lakes.