How Can Oil and Gas Companies Extract Value From FracFocus Using Tamr

In my last post, I covered how Tamr’s software can be applied to many areas of the oil and gas industry to master information and find hidden value in messy data sets.  One of the examples from that article centered around using it to clean the FracFocus database.  In this post, I’ll summarize what FracFocus is, what the data output looks like, how it can be cleaned using Tamr, and show some visualizations I created by transforming the data into a much more usable state.

What is FracFocus?

FracFocus is a free database that contains the hydraulic fracturing “recipes” for many of the wells, both horizontal and vertical, drilled in the United States. It comes in a .bak format, can be restored in MS SQL, and even has query instructions. The original use for this data set was to monitor what chemicals were being used around the public for health considerations. However, E&P companies soon found the information incredibly useful for monitoring their competitors’ well completion techniques. As a result, the database contains not only regular, everyday mistakes but also wording intentionally entered to mislead people who are trying to figure out what operators are doing to optimize their frac jobs.

Working in the Database

After you download the database (which is a little over 3 GB in size) and run the SQL query, you will get output like this:


It is a lot to take in. As of the database version I am using in this article (11/1/2019), it is about 4.6 million rows of data, and the file grows considerably larger every month. The columns of most interest are:

  1. API number
  2. State name
  3. County name
  4. Operator
  5. Well name
  6. TVD (True Vertical Depth)
  7. Latitude/Longitude (along with its “Projection”)
  8. Total base water volume (very important)
  9. Percent HF job (HF – hydraulic fracturing)
  10. Trade name
  11. CAS number (unique chemical identifier)
  12. Purpose
  13. Supplier

Don’t worry about the other columns; many of them are there only as a byproduct of the query joining various tables together to produce the output you see.

Those 13 columns are probably going to be the most important to you, but I will add a caveat about one in particular – True Vertical Depth. TVD is really hard to judge as correct from the database alone because the people entering the value sometimes report a horizontal well’s measured depth instead (MD, the length of the vertical portion plus the horizontal lateral length). If you have a lot of experience in oil and gas, especially within a specific basin, you can probably make an educated decision on using it. But if you don’t, you should pass on guessing. Typically, you will be using this database with a set of producing wells within a specific basin. That is where Tamr can easily help you discern the correct value between this database and any dataset with complementary information.

Once you start getting around in the database, you will come across a substantial number of spelling errors.  For example:

Chemical Name Count


The above table is a small portion of a much larger list of spellings for just one chemical, along with the count for each unique term. This particular compound is used as a biocide in frac water to make sure there are no clogging issues due to microorganisms when the well is put on production, and one of the correct spellings would be 2,2-Dibromo-3-nitrilopropionamide. If you tried to find all of the various spellings of this compound, along with a sizable number of other compounds that will have similar lists, it would take you hundreds of hours of work to properly identify everything. Also keep in mind that the database is updated monthly with new batches of wells from every hydrocarbon-producing state. Mastering the data using Tamr will cut down the time needed by more than 90%.
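To make the matching problem concrete, here is a minimal sketch of grouping spelling variants under one canonical name. It is not how Tamr works internally; it is only a toy illustration built on Python's standard difflib module. The function names, the 0.85 threshold, and the variant strings are my own assumptions for illustration, not rows copied from FracFocus.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits so punctuation and spacing differences vanish."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_canonical(name: str, canonical: List[str], threshold: float = 0.85) -> Optional[str]:
    """Return the canonical name most similar to `name`, or None if nothing clears the threshold."""
    best_name, best_score = None, 0.0
    for candidate in canonical:
        score = SequenceMatcher(None, normalize(name), normalize(candidate)).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= threshold else None


# Hypothetical reference list; a real one would cover every chemical of interest.
CANONICAL = ["2,2-Dibromo-3-nitrilopropionamide", "Citric acid"]

# Made-up variants for illustration only.
variants = [
    "2,2 Dibromo-3-nitrilopropionamide",
    "2,2-dibromo-3-nitrilopropionamide",
    "2,2-Dibromo-3-nitrilopropionamid",
    "Hydrochloric acid",
]

for variant in variants:
    print(f"{variant!r} -> {match_canonical(variant, CANONICAL)}")
```

A simple similarity score like this breaks down quickly on abbreviations, trade names, and deliberately misleading entries, which is why the CAS number column is worth using as a second signal wherever it is filled in correctly.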

This chemical, like every other additive besides water and proppant (primarily some form of sand), doesn’t make up a very large portion of a frac job – typically <1% by total fluid volume. Still, a lot of companies and research groups care about this information because sometimes a little bit of something makes a big difference in the quality of a completion method. Citric acid makes up a very small part of some companies’ frac recipes, but some engineers claim that adding more of the acid to the fluid mixture has a beneficial effect on the well’s initial production. The only way to know if this is true is to measure it. Getting a statistically significant answer without hundreds of hours of work will require cleaning and joining data from a data source like FracFocus using Tamr.

As I mentioned above, the main components in a hydraulic fracturing job are fresh water and sand. Surely there can’t be more than about 30 different ways to spell proppant, right? Going through this version of the file, I found 1,002 unique terms for proppant.

In terms of difficulty, fixing the chemical names is the biggest challenge, probably followed by purpose and supplier.

Calculating Water and Sand Amounts

Along with scrubbing the database for incorrect spellings and reclassifying items, you will also have to fix the “Percent HF job” column. It has the common issue of percentages being entered in both whole number and decimal formats. Tamr has the ability to perform transformations on numeric values, so this is an easy fix for the software to perform.
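As a rough illustration of that kind of numeric transformation, here is a short Python sketch. It is not Tamr's transformation syntax, and the field names (job_id, percent_hf_job) are hypothetical. It also leans on a loud assumption: every disclosure lists its water, which is far more than 1% of the job, so a disclosure whose largest value is 1 or less was entered as decimal fractions, and the format is consistent within a single disclosure.

```python
from collections import defaultdict


def standardize_percents(rows):
    """Return copies of the rows with 'percent_hf_job' expressed on a 0-100 scale.

    Assumption: a disclosure whose largest value is <= 1 was entered as
    fractions (0.91) rather than whole-number percentages (91.0).
    """
    # First pass: find the largest reported value in each disclosure.
    largest = defaultdict(float)
    for row in rows:
        job = row["job_id"]
        largest[job] = max(largest[job], row["percent_hf_job"])

    # Second pass: rescale only the disclosures that look like fractions.
    cleaned = []
    for row in rows:
        scale = 100.0 if largest[row["job_id"]] <= 1.0 else 1.0
        cleaned.append({**row, "percent_hf_job": row["percent_hf_job"] * scale})
    return cleaned


# Made-up rows for illustration: job A uses whole numbers, job B uses decimals.
rows = [
    {"job_id": "A", "ingredient": "Water", "percent_hf_job": 91.0},
    {"job_id": "A", "ingredient": "Silica", "percent_hf_job": 8.6},
    {"job_id": "B", "ingredient": "Water", "percent_hf_job": 0.91},
    {"job_id": "B", "ingredient": "Silica", "percent_hf_job": 0.086},
]

for row in standardize_percents(rows):
    print(row)
```

Deciding per disclosure rather than per row matters here: a trace additive at 0.5% in a whole-number disclosure would otherwise be mistaken for 50%.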

Once those percentages are in a standardized format, you can calculate the amounts of sand and water used.

Sample Calculation:

Total Base Water Volume (gal) (TBV):  16,489,452

Max Ingredient Concentration in HF Fluid for water (% by mass):  91.0

Max Ingredient Concentration in HF Fluid for silica (% by mass):  8.6

Gallon of water weight (lbs./gal):  8.34

Barrel (bbl) = 42 gallons

Total weight of water (lbs.) = TBV * Gallon of water weight = 137,522,030

Total weight of frac job (lbs.) = Total weight of Water/Water concentration in frac = 137,522,030 / (91.0/100) = 151,123,110

Proppant weight, Silica, (lbs.) = 151,123,110 * (8.6/100) = 12,996,587

Barrels of water = 16,489,452 / 42 = 392,606 
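The same arithmetic is easy to script once the percentages are standardized. Here is a minimal Python sketch that reproduces the sample calculation to within a pound of rounding; the function and key names are my own, and it assumes the concentrations are already on a 0-100 scale.

```python
LBS_PER_GALLON_WATER = 8.34
GALLONS_PER_BARREL = 42


def frac_job_amounts(total_base_water_gal, water_pct_by_mass, proppant_pct_by_mass):
    """Estimate water, total job, and proppant weights from one FracFocus disclosure."""
    water_lbs = total_base_water_gal * LBS_PER_GALLON_WATER
    # Water is a known share of the total job mass, so divide to get the total.
    total_lbs = water_lbs / (water_pct_by_mass / 100)
    proppant_lbs = total_lbs * (proppant_pct_by_mass / 100)
    water_bbl = total_base_water_gal / GALLONS_PER_BARREL
    return {
        "water_lbs": water_lbs,
        "total_lbs": total_lbs,
        "proppant_lbs": proppant_lbs,
        "water_bbl": water_bbl,
    }


result = frac_job_amounts(16_489_452, 91.0, 8.6)
for name, value in result.items():
    print(f"{name}: {value:,.0f}")
```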

Visualizations from the data

Now that we have a much cleaner dataset, we can start pulling together some visualizations regarding hydraulic fracturing jobs over the past few years.

Maps – As mentioned earlier, latitude and longitude are in the database, so it is easy to generate a map of the wells. I have colorized the points based on formation/basin classifications from this well-known graphic of North American basins from the EIA. Of course, there will be individual points that don’t correspond with the various basins because the borders aren’t hard lines and there can be overlap.


Historic Averages – Looking at averages over time, hydraulic fracturing jobs have greatly increased in size, as E&P professionals are well aware, both in the amount of proppant and in the amount of fluid used. The more the formation is cracked with larger volumes of water and sand, the more oil and gas it will initially produce. In some formations, like the Bakken and Utica, there is a small decrease in the average for 2019. That decrease could stem from new knowledge of the formation leading to a more efficient use of proppant, a desire to lower capital investment costs, or operators working more in a part of the basin that responds better to fracturing and won’t require as many materials. Job size is also greatly affected by the lateral length of the well. The longer the horizontal, the more proppant and fluid needed. As I mentioned before, we cannot easily discern the depths and lengths of wells from FracFocus, so it will take something like state well data to get better visibility on those metrics.
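For readers who want to build a similar chart, the averaging step itself is simple once each job has a formation, a year, and a computed proppant weight. Here is a minimal Python sketch; the field names are hypothetical and the numbers are placeholders for illustration only, not FracFocus results.

```python
from collections import defaultdict
from statistics import mean


def yearly_averages(jobs, field):
    """Average `field` for each (formation, year) pair, sorted by formation then year."""
    grouped = defaultdict(list)
    for job in jobs:
        grouped[(job["formation"], job["year"])].append(job[field])
    return {key: mean(values) for key, values in sorted(grouped.items())}


# Placeholder values; these are not real averages from the database.
jobs = [
    {"formation": "Bakken", "year": 2018, "proppant_lbs": 10_000_000},
    {"formation": "Bakken", "year": 2018, "proppant_lbs": 12_000_000},
    {"formation": "Bakken", "year": 2019, "proppant_lbs": 9_000_000},
    {"formation": "Utica", "year": 2019, "proppant_lbs": 20_000_000},
]

averages = yearly_averages(jobs, "proppant_lbs")
for (formation, year), avg in averages.items():
    print(formation, year, f"{avg:,.0f} lbs")
```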

Recent Completion Statistics – This chart shows more recent data for the five operators in each basin with the most horizontal wells.  A number of the basins have operators whose choices on the amount of sand and water to use are relatively close in value, but some basins/formations like the Utica, STACK, and Niobrara have some clear differences or outliers.

In the Utica, Ascent and Gulfport have overlapping acreage positions in the more eastern dry gas and liquids-rich windows of the formation, which would easily lead to them having similar completion methods. Eclipse Resources merged with Blue Ridge Mountain Resources in late 2018 and was renamed Montage Resources, but while Eclipse was its own company, it was known for drilling laterals that were over 2 miles long and believed in big frac jobs. Chesapeake is more in the northern part of the Utica play, while Antero is further south near the West Virginia border.

In terms of the STACK (Sooner Trend Anadarko Basin Canadian and Kingfisher Counties), Alta Mesa Resources is on the eastern edge of the play, in a shallower, oily portion of the trend. As opposed to the more talked about Woodford formation within the trend, they target the Meramec and Osage formations, which sit above it.

The outliers I found most interesting were Anadarko and Crestone Peak in the Niobrara.  I try to keep up with everything going on in the industry, but there are a lot of formations with a lot of companies trying new techniques all of the time.  While Anadarko does some big frac jobs on the Delaware side of the Permian, they go pretty light, along with Crestone, in the Niobrara. Instead of going too deep into it here, this article is a great summary of why these companies are doing what they are doing.

Final Word

The FracFocus database is a phenomenal resource if you can clean the data – and that is where Tamr can do all of the work for you. Instead of spending weeks of employee time and the associated cost cleaning the database, companies can have one person use Tamr to sanitize the millions of rows of data and distill the most pertinent information in hours. On top of that, companies would have an excellent view of all of the other chemicals competitors are using in order to determine if there is any hidden value in the smaller additives.

We just evaluated this data source by itself, but combining it with others like state and proprietary data can yield powerful results. With FracFocus and state data alone, you can tie the amount of sand and fluid used in the frac job to drilling results and determine optimal completion methods for each area of a basin. In the next article, we will combine the information from here with state data for one of the more popular basins right now. Doing that will help clear up horizontal lengths and allow us to come up with more metrics that will aid in valuing properties.