Tamr Insights
April 2, 2019

An Introduction to Machine Learning

Machine learning, although not a new technology, has dominated news cycles over the past few years across multiple industries and applications. And with good reason: artificial intelligence (AI) and machine learning are changing the way we, as consumers, workers, and human beings, perform tasks and even interact with one another. Applied correctly, the technology can answer big questions and solve business-critical challenges for large enterprises.

What is Machine Learning and How Does it Work?

The terms ‘artificial intelligence’ and ‘machine learning’ are sometimes used interchangeably, but it’s important to understand that machine learning is a subset of artificial intelligence. At its core, machine learning is about teaching machines to make data-driven decisions. How does a computer know how to distinguish an orange from an apple, for instance, or to mark certain emails as spam? It’s because the computer is leveraging machine learning to predict certain outcomes based on models that enable it to infer patterns, much the way a human brain would.

Machine learning algorithms use available sample data, or training data, to create a model. This model allows the computer to make predictions or decisions about new data without someone needing to explicitly program the machine to do the required task.
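As a rough illustration of this train-then-predict flow, the Python sketch below builds a nearest-centroid model from a few labeled fruit samples and uses it to label a new one. The two features (redness and roughness, both on a 0 to 1 scale), the sample values, and the function names fit and predict are assumptions made for this example; they do not come from any particular product or library.

```python
import math  # math.dist requires Python 3.8 or later
from collections import defaultdict

def fit(samples):
    """Build a model: one average feature vector (centroid) per label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    # zip(*rows) walks the rows column by column, so each column is averaged.
    return {
        label: [sum(column) / len(column) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest to the new sample."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical training data: ((redness, roughness), label)
training_data = [
    ((0.90, 0.20), "apple"),
    ((0.80, 0.10), "apple"),
    ((0.85, 0.15), "apple"),
    ((0.50, 0.80), "orange"),
    ((0.40, 0.90), "orange"),
    ((0.45, 0.70), "orange"),
]

model = fit(training_data)
new_fruit = (0.80, 0.20)  # a sample the model has not seen before
print(predict(model, new_fruit))
```

Nothing in predict mentions apples or oranges; the rule for telling them apart comes entirely from the training data.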

The best machine learning problems are those where enough data exists for patterns to emerge. Data volumes don’t need to be massive (machine learning can be applied to hundreds of records for simple problems), but they do need to be large enough for patterns to exist.

An example of machine learning

Consider the example of a data analyst who wants to predict the products a new customer will buy. To start, the analyst wants to understand whether age is a determining factor for purchasing particular product lines. To build a machine learning model, the analyst collects a sample dataset from the customer base that includes customer ages as well as products purchased. This sample dataset will be used as the training data. That training data is then used to build a model that can predict future purchases. As more customer data is added to the training data and the model is retrained, its predictions generally become more accurate over time.

In this example, the data involves only two data fields, or features: age and products purchased. But in most cases, there would be several additional features, such as income and location. The analyst could also choose to include publicly available, industry-wide data to expand the dataset. As a general rule, the more representative training data is available, the more accurate the machine learning model is likely to be.
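The age-based example can be sketched in a few lines of Python. This is a deliberately simple model, assuming ten-year age bands and predicting the most frequently purchased product in each band; the customer records, product names, and helper names (age_band, train, predict) are hypothetical, and a real analysis would likely use more features and a more capable algorithm.

```python
from collections import Counter, defaultdict

def age_band(age, width=10):
    """Map an age to the lower bound of its band, e.g. 27 -> 20."""
    return (age // width) * width

def train(records, width=10):
    """Learn the most common product per age band from (age, product) pairs."""
    per_band = defaultdict(Counter)
    for age, product in records:
        per_band[age_band(age, width)][product] += 1
    model = {band: counts.most_common(1)[0][0] for band, counts in per_band.items()}
    # Fallback for age bands that never appeared in the training data.
    fallback = Counter(product for _, product in records).most_common(1)[0][0]
    return model, fallback

def predict(model, fallback, age, width=10):
    """Predict the product a customer of the given age is most likely to buy."""
    return model.get(age_band(age, width), fallback)

# Hypothetical training data: (customer age, product purchased)
purchases = [
    (22, "headphones"), (25, "headphones"), (28, "backpack"),
    (33, "backpack"), (36, "backpack"), (38, "headphones"),
    (41, "backpack"), (45, "cookware"), (47, "cookware"),
]

model, fallback = train(purchases)
for age in (24, 46, 70):
    print(age, predict(model, fallback, age))
```

Adding a feature such as income would mean keying the counts on a pair like (age band, income band) instead of the age band alone, which is also why more features usually call for more training data.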

Types of Machine Learning

There are three general types of machine learning: supervised learning, unsupervised learning, and reinforcement learning. The situation described above is an example of supervised learning.

Supervised Learning uses a set of human-labeled training data to develop a model. The algorithm learns from a set of inputs paired with their corresponding correct outputs. The training data used to create a machine learning model is assumed to be ground truth, meaning that its validity is not questioned. However, the model must still be tested for accuracy before it can be deployed.
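One common way to test a supervised model is to hold back part of the labeled data, train on the rest, and measure how often the model gets the held-back labels right. The sketch below assumes a one-nearest-neighbor classifier on a single feature (age) and a 25% hold-out; the records, the labels "A" and "B", and the function names are hypothetical.

```python
import random

def split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the labeled records and hold back a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def predict_1nn(training_set, age):
    """Predict the label of the training record with the closest age."""
    nearest = min(training_set, key=lambda record: abs(record[0] - age))
    return nearest[1]

def accuracy(training_set, test_set):
    """Fraction of held-back records whose label is predicted correctly."""
    correct = sum(
        1 for age, label in test_set if predict_1nn(training_set, age) == label
    )
    return correct / len(test_set)

# Hypothetical human-labeled data: (customer age, product line)
labeled = [
    (22, "A"), (25, "A"), (28, "A"), (31, "A"),
    (44, "B"), (47, "B"), (52, "B"), (58, "B"),
]

training_set, test_set = split(labeled)
score = accuracy(training_set, test_set)
print(f"accuracy on held-back data: {score:.2f}")
```

The key design choice is that the test records never influence the model, so the score estimates how it would behave on new customers.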

Unsupervised Learning infers patterns from unlabeled data to create a machine learning model. While this type of machine learning can uncover previously unknown patterns in data, there are no labels to check the results against, so the results are usually poorer approximations than what can be achieved with supervised learning.
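Clustering is a typical unsupervised technique. The sketch below is a minimal one-dimensional k-means, assuming the data is a list of customer ages with no labels attached; the values, the evenly spaced starting centers, and the function name kmeans_1d are choices made for this illustration.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters; k must be at least 2 in this sketch."""
    low, high = min(values), max(values)
    # Deterministic start: centers spaced evenly between the min and max.
    centers = [low + (high - low) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical unlabeled data: customer ages only
ages = [21, 23, 24, 26, 48, 51, 53, 55]

centers, clusters = kmeans_1d(ages, k=2)
print(centers)
print(clusters)
```

The algorithm finds two age groups without being told they exist, but nothing in the data says what the groups mean; interpreting them is left to the analyst.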

Reinforcement Learning is based on the underlying idea of learning by doing. As with unsupervised learning, the machine is presented with unlabeled data, but it is also given positive or negative feedback depending on the solution it proposes. Over time, the machine learns to choose the actions that lead to the desired outcome based on this positive or negative reinforcement.
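A small way to see learning from feedback is the multi-armed bandit, a simplified reinforcement learning setting. The sketch below assumes three possible actions, each with a hidden chance of earning positive feedback (+1) rather than negative feedback (-1), and uses an epsilon-greedy strategy; the probabilities, parameter values, and the function name run_bandit are illustrative assumptions.

```python
import random

def run_bandit(reward_probs, steps=3000, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the best feedback."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action was tried
    values = [0.0] * n_actions  # running average of feedback per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            # Exploit: pick the action with the best average feedback so far.
            action = max(range(n_actions), key=lambda a: values[a])
        # The environment answers with positive or negative feedback.
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0
        counts[action] += 1
        # Incremental update of the running average for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hidden from the learner: chance of positive feedback for each action
hidden_probs = [0.2, 0.4, 0.9]

values, counts = run_bandit(hidden_probs)
best_action = counts.index(max(counts))
print("most chosen action:", best_action)
print("times each action was tried:", counts)
```

The learner is never told which action is correct. It only sees feedback on what it tried, and the occasional random exploration keeps it from settling too early on a mediocre choice.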

Machine Learning and Big Data

Enterprises need ways to quickly and efficiently make decisions based on hundreds of thousands of datasets stored across different regions and business units. This is where machine learning can help, by providing the scalability needed to tackle the volume, velocity, and variety of Big Data.