How Can Oil and Gas Companies Use Data Mastering to Uncover New Sites in the Southwest?

There are a number of oil and gas basins in the United States, but the one that E&P operators keep coming back to is the Permian.  Other basins have their own geologic, operational, and financial advantages, but the Permian has continually proven itself with its multibench formations, prolific wells, and a large quantity of untapped resources.  In my last post, I covered how you can use Tamr to clean the FracFocus database and extract the data to determine how much proppant and fluid companies are using in their hydraulic fracturing jobs.

Building on that post, we will take a look at a sub-basin of the Permian called the Delaware, and more specifically, its New Mexico side. I will show you where to find production data, general well data, and other pieces of information that are key to evaluating properties. Afterwards, I will detail how Tamr is essential to cleaning and unifying the data, things to beware of when evaluating the information, some general thoughts on the area, and finally some interesting statistics about it.

A recent US Geological Survey report about the Delaware basin placed the amount of undiscovered, technically recoverable resources at 46.3 billion barrels of oil, 281 trillion cubic feet of gas, and 20 billion barrels of natural gas liquids.  For an idea of the geographic extent of the basin, check out the graphic on Concho’s website.  Concho (COG) is one of the premier operators in the region, especially in the focus area of this article.

I have mentioned the word “basin” a few times now, so let’s cover that.  I will keep the answer simple as I don’t want to incur the wrath of any geologist or geophysicist reading this by getting fancy.  From the Schlumberger Oilfield Glossary:

“A depression in the crust of the Earth, caused by plate tectonic activity and subsidence, in which sediments accumulate. Sedimentary basins vary from bowl-shaped to elongated troughs. Basins can be bounded by faults. Rift basins are commonly symmetrical; basins along continental margins tend to be asymmetrical. If rich hydrocarbon source rocks occur in combination with appropriate depth and duration of burial, then a petroleum system can develop within the basin. Most basins contain some amount of shale, thus providing opportunities for shale exploration and production.”

Free Data

As I mentioned above, we will be looking strictly at the New Mexico side of the Delaware Basin. Why not Texas, too? Great question, and it leads to a longer discussion of what types of data each state makes available, in what format, and whether it costs anything to get. New Mexico makes a lot of data available for free and in a great format. Texas also makes a lot of data available for free, but the format can be difficult to get your arms around. Texas allocates oil and gas production not to the well but to the lease. If you pull production from Texas’ treasure trove of oil and gas information, then to calculate well-level production you will have to look at the production history of the lease, take any well tests into consideration, and write an allocation script in your favorite programming language. It can be done, but it is very time consuming. Comparatively, New Mexico is easy. Here are the locations of the various sources:
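Because Texas reports production at the lease level, you need an allocation step before doing any well-level analysis. Below is a minimal Python sketch of one simple approach. It assumes each month’s lease oil volume is split in proportion to each well’s most recent test rate on or before that month. Real allocation scripts usually account for more (shut-in days, gas and water, different test types), and the `WellTest` record layout here is hypothetical, not the Texas data format.

```python
from dataclasses import dataclass


@dataclass
class WellTest:
    well_id: str
    month: str       # "YYYY-MM" of the test; this format sorts correctly as a string
    oil_rate: float  # tested oil rate, bbl/d


def allocate_lease_oil(lease_volumes, tests):
    """Split monthly lease oil volumes across the wells on the lease.

    lease_volumes: {"YYYY-MM": bbl produced by the whole lease that month}
    tests: list of WellTest records for wells on the lease.
    Assumption: each month's volume is shared in proportion to each well's
    latest test rate on or before that month.
    Returns {well_id: {month: allocated bbl}}.
    """
    allocated = {}
    for month, volume in sorted(lease_volumes.items()):
        # Find the latest test per well that is not after this month.
        latest = {}
        for t in tests:
            if t.month <= month:
                prev = latest.get(t.well_id)
                if prev is None or t.month > prev.month:
                    latest[t.well_id] = t
        total_rate = sum(t.oil_rate for t in latest.values())
        if total_rate <= 0:
            continue  # no usable test yet, so this month stays unallocated
        for well_id, t in latest.items():
            share = t.oil_rate / total_rate
            allocated.setdefault(well_id, {})[month] = volume * share
    return allocated
```

For example, with tests of 60 bbl/d on well A and 40 bbl/d on well B, a 3,000 bbl lease month is split 1,800 / 1,200 bbl.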

  • For a bulk download of data up until early 2019, go here and grab the two “volumes” files for the production.
    • The XSD file folder has 13 other files that provide general well header data, formation depths, location data, etc., all in XML format.
  • For the rest of the 2019 data, and from here on out, you can pull the data from New Mexico Tech.  If you are interested in production data, I suggest getting it by county.  If you want to look at the areas covered in this article, pull the data from Eddy and Lea counties.
  • If you just need to look up one-off wells and want something quick and easy to read, you can search for them here. I haven’t gone through all of the information these sites offer, but one interesting thing about the single well search is that it consistently offers things like casing schedules, which company is transporting the product from the wellhead, and permitting information, all in one spot.
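If you work from the bulk XML files, it helps to stream them and keep only the two counties of interest. The sketch below filters on the API number. New Mexico API numbers start with state code 30, and Eddy and Lea Counties are 015 and 025. The element names `Well` and `ApiNumber` are hypothetical placeholders, so check the XSD for the real tag names (and any XML namespace, which would change `elem.tag`) before using it.

```python
import io
import xml.etree.ElementTree as ET

# New Mexico API numbers start with state code 30; Eddy County is 015, Lea County is 025.
TARGET_PREFIXES = ("30015", "30025")


def wells_in_target_counties(source, record_tag="Well", api_tag="ApiNumber"):
    """Stream an XML file (path or binary file object) and yield
    (api, fields) for each well record in Eddy or Lea County.

    record_tag and api_tag are hypothetical; check the XSD for the real names.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        api = (elem.findtext(api_tag) or "").replace("-", "").strip()
        if api.startswith(TARGET_PREFIXES):
            # Copy the child fields into a dict before the element is cleared.
            yield api, {child.tag: (child.text or "").strip() for child in elem}
        elem.clear()  # drop the finished record's children and text to keep memory low


# Tiny made-up sample in the hypothetical layout.
SAMPLE = b"""<Wells>
  <Well><ApiNumber>30-015-12345</ApiNumber><County>Eddy</County></Well>
  <Well><ApiNumber>30-025-54321</ApiNumber><County>Lea</County></Well>
  <Well><ApiNumber>30-045-11111</ApiNumber><County>San Juan</County></Well>
</Wells>"""

if __name__ == "__main__":
    for api, fields in wells_in_target_counties(io.BytesIO(SAMPLE)):
        print(api, fields["County"])
```

Streaming with `iterparse` rather than loading the whole tree matters mostly for the larger production files.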

Using Tamr to Ease Your Headache

So, from the previous section, I am sure you can tell we are going to have a lot of different data sources that will be challenging to combine, deduplicate, and standardize. In addition to the NM state data, you will definitely want to use the FracFocus dataset I talked about cleaning in my last post.

While API numbers are typically used as the standard column for joining files in oil and gas, those values can be wrong, duplicated, or flat out missing. On top of that, many API numbers have multiple entries, and some of the values in those individual rows can be incorrect. For instance, when I pulled formation depth data from one of the available sources, there were 5-10 rows per API detailing formation depths, but the formation names were not standardized. Keep in mind, this is a relatively small set of wells, as we have narrowed our evaluation down to two counties, and the list of wells with their tagged formations is still a little over 86,000 rows.
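Before any matching, it helps to put the join key and the formation names into a consistent shape. This is a minimal rule-based sketch, not how Tamr does it: it normalizes API numbers to the 10-digit state/county/well form (dropping the optional sidetrack and event digits of the 12- and 14-digit forms) and maps formation spellings through a small alias table. The alias table is a hypothetical example; a real one would come from an SME or a learned model.

```python
import math
import re

# Hypothetical alias table; keys are upper-cased, punctuation-stripped spellings.
FORMATION_ALIASES = {
    "BONE SPRING": "Bone Spring",
    "BONESPRING": "Bone Spring",
    "BONE SPRINGS": "Bone Spring",
    "WOLFCAMP": "Wolfcamp",
    "WOLF CAMP": "Wolfcamp",
}


def normalize_api(raw):
    """Return a 10-digit API string (state + county + well), or None if unusable."""
    if raw is None:
        return None
    if isinstance(raw, float):
        if math.isnan(raw):
            return None
        raw = int(raw)  # spreadsheets often turn API numbers into floats
    digits = re.sub(r"\D", "", str(raw))  # drop dashes, spaces, etc.
    if len(digits) in (12, 14):
        digits = digits[:10]  # strip sidetrack / event suffix digits
    return digits if len(digits) == 10 else None


def standardize_formation(name):
    """Map a raw formation name to a canonical one; unknown names pass through."""
    key = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    key = re.sub(r"\s+", " ", key).strip()
    return FORMATION_ALIASES.get(key, name.strip())
```

Any API that comes back as `None` is a candidate for manual review rather than a silent join failure.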


You will find similar issues in the portion of the data that contains the “top perf” and “bottom perf,” where many API numbers are tied to 2 or 3 lateral length values. The top and bottom perfs mark where the perforated interval, the part of the lateral where the frac job occurs, starts and ends. Subtracting the top perf from the bottom perf gets you the “effective lateral length.” It won’t be the exact length of the lateral, but it gives you the portion of the well that is in direct contact with the formation, which is essentially what you want. Tamr will find these problem duplicate APIs, group them, and make the correct choice based on the questions the subject matter expert (SME) has answered.
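Here is a simple sketch of the effective lateral length calculation (bottom perf minus top perf, both as measured depths in feet) with a naive duplicate-resolution rule. The rule, keep the most frequently reported length and break ties with the longer one, is an assumption for illustration only; it stands in for the SME-driven choice described above and is not Tamr’s logic.

```python
from collections import Counter, defaultdict


def effective_lateral_lengths(perf_rows):
    """perf_rows: iterable of (api, top_perf_ft, bottom_perf_ft) measured depths.

    Returns {api: effective lateral length in ft}. When an API has several
    candidate lengths, keep the most frequently reported one and break ties
    with the longer value (a hypothetical stand-in rule).
    """
    candidates = defaultdict(list)
    for api, top, bottom in perf_rows:
        if top is None or bottom is None:
            continue  # can't compute a length without both perfs
        length = bottom - top
        if length > 0:  # skip reversed or zero intervals, likely entry errors
            candidates[api].append(length)

    result = {}
    for api, lengths in candidates.items():
        counts = Counter(lengths)
        # Highest count wins; among equal counts, the longer length wins.
        result[api] = max(counts, key=lambda length: (counts[length], length))
    return result
```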

Why Lateral Length and Proppant Amounts Matter

Here are some notes on why we care about the FracFocus data and the lateral length, and how they are used. Say I told you that one well was projected to make 500 Mbbl (thousand barrels) during its economic life and another well was projected to make 700 Mbbl: which would you say is better? On that information alone, you would say the 700 Mbbl well. Now, add in that the 500 Mbbl well has a 5,000’ horizontal lateral and the 700 Mbbl well has a 10,000’ lateral. On an EUR/ft (estimated ultimate recovery per foot of lateral) basis, the former is 100 bbl/ft and the latter is 70 bbl/ft. There are a lot of variables that determine the actual value of a well, but everything else being equal, the first well is looking like the better deal.
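The EUR/ft comparison above is just a normalization by lateral length. A minimal Python version, using the two example wells:

```python
def eur_per_foot(eur_bbl, lateral_ft):
    """Normalize a well's EUR (bbl) by its lateral length (ft) -> bbl/ft."""
    if lateral_ft <= 0:
        raise ValueError("lateral length must be positive")
    return eur_bbl / lateral_ft


# The two wells from the example: 500 Mbbl on 5,000 ft vs 700 Mbbl on 10,000 ft.
wells = {"Well 1": (500_000, 5_000), "Well 2": (700_000, 10_000)}
for name, (eur, lateral) in wells.items():
    print(f"{name}: {eur_per_foot(eur, lateral):.0f} bbl/ft")
# Well 1: 100 bbl/ft
# Well 2: 70 bbl/ft
```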

Same thing for proppant amounts. If you put the same amount of proppant into a 5,000’ lateral and a 10,000’ lateral, the proppant per foot will be greater in the shorter lateral. That shorter lateral will have more and better flow channels because of its higher proppant/ft. Again, all other things being equal, you might get a better well with the shorter lateral. In oil and gas, finding the sweet spot of just the right amount of proppant over the right length of lateral is key to an economically strong well. It is typical in company reports and at conferences to hear that forecasted production is better now than this time last year because of an increase in proppant/ft. The thing to watch out for is the point of diminishing returns, where an increase in proppant yields only a very slight gain.

Development in Southeast NM

Eddy and Lea Counties, NM, contain the northern extension of the Delaware Basin. While this area has become a hot spot for drilling horizontal wells over the past 5-7 years, the records show that operators have been drilling conventional wells (verticals) here since 1928. In fact, pulling all of the active wells from both counties for this article yielded ~28,000 entities. The charts that I include at the end of this post only reflect horizontal development from 1/2017 to 10/2019, as I just wanted to give you an idea of the numbers people are seeing currently. Don’t get me wrong, though: I am one of those E&P professionals who strongly believes vertical wells should be in every company’s portfolio of assets. They are quick to drill and complete, have smaller footprints than horizontals, typically pay out faster, have fewer operational issues, and average a higher rate of return. So, why don’t we drill them? That is probably a “couple of posts” explanation.

In terms of horizontal development, the Delaware sees huge operational changes and new challenges on a yearly basis.  Some of those are:

  1. Large acreage positions trading hands
  2. Experimentation in completion design
  3. Volatile oil and gas pricing leading to boom and bust development situations
  4. Issues with infrastructure, most notably a lack of capacity in oil and gas pipelines
  5. Disagreement between operators on well spacing – can you fit 4, 5…8 horizontal wells on a square mile of land without seeing interference from a neighboring well?

Looking at that list, I see Tamr being able to help with numbers 1, 2, 4, and 5 by properly mastering and classifying the data, so you can make better long-term decisions and feed better data into research projects. If you can nail down #3, give me a call…

Some Statistics on Southeastern NM

Besides the data issues I touched on earlier, it is also tough to get all of the data you want for each well. You might have all of the general and production data, but the operator didn’t submit its completion data to FracFocus, or perhaps didn’t report the top and bottom perfs, so you cannot calculate the effective lateral length.

One metric I was able to add to the data set that will be of interest is the forecasted gross estimated ultimate recovery (EUR). If you are new to oil and gas, forecasting wells is one of the more subjective aspects of the industry. While we do have formulas that describe how wells decline, humans typically pick where forecasts start and the shape of the curve based on factors like what “looks” correct, their intimate knowledge of individual wells, and the trends they know of in a basin. In the case of this data set, I forecasted production for about 2,000 wells. The numbers you see are based strictly on the forecast assumptions I detail below, not on economic factors such as investments, expenses, taxes, and company interests.

My statements/assumptions about the EUR calculations:

  1. Assumes the end of life is reached at 50 years or at a rate of 1 bbl/d for oil wells and 10 Mcf/d for gas wells, whichever comes first
  2. A terminal decline rate of 5% – when the hyperbolic forecast hits an annual rate of 5% it switches to an exponential decline, holding that 5% decline rate constant
  3. The b-value of the hyperbolic curve is capped at 1.6 – most reserve auditors use that number as their ceiling value.
  4. Only uses wells with first production on or after 1/2017
  5. Only uses wells with at least 6 months of production, to better estimate the shape of the forecast curve.
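The assumptions above describe a modified Arps hyperbolic decline: hyperbolic until the instantaneous decline falls to the terminal rate, then exponential at that rate. Below is a minimal sketch of that idea, not the forecasting software used for this article. It assumes the 5% terminal decline and the initial decline `di` are nominal annual rates (many tools quote effective annual declines instead, which shifts the numbers slightly), integrates with a one-day step, and forecasts from first production rather than adding a separate production history. Fitting `qi`, `di`, and `b` to the first months of production is not shown.

```python
import math


def modified_arps_eur(qi, di, b, d_min=0.05, q_limit=1.0,
                      max_years=50.0, b_cap=1.6):
    """Approximate EUR from first production (volume = rate units x days).

    qi       initial rate per day (bbl/d for oil, Mcf/d for gas)
    di       initial nominal decline, 1/year
    b        Arps b-value, capped at b_cap
    d_min    terminal nominal decline, 1/year (assumed nominal here)
    q_limit  end-of-life rate (1 bbl/d oil, 10 Mcf/d gas per the assumptions)
    """
    b = min(b, b_cap)
    if b <= 0.0:
        # b = 0 is a plain exponential decline at di; no switch needed.
        t_switch, q_switch, d_term = 0.0, qi, di
    elif di <= d_min:
        # Already at or below the terminal decline: go straight to exponential.
        t_switch, q_switch, d_term = 0.0, qi, d_min
    else:
        # Hyperbolic decline D(t) = di / (1 + b*di*t) reaches d_min at t_switch.
        t_switch = (di / d_min - 1.0) / (b * di)
        q_switch = qi / (1.0 + b * di * t_switch) ** (1.0 / b)
        d_term = d_min

    def rate(t):
        """Rate per day at time t (years since first production)."""
        if t < t_switch:
            return qi / (1.0 + b * di * t) ** (1.0 / b)
        return q_switch * math.exp(-d_term * (t - t_switch))

    dt = 1.0 / 365.25  # one-day step, in years
    eur, t = 0.0, 0.0
    while t < max_years:
        q = rate(t)
        if q < q_limit:
            break  # economic limit reached
        eur += q  # one day of production at rate q
        t += dt
    return eur
```

For a gas well, pass rates in Mcf/d and `q_limit=10.0`. A left-endpoint daily sum slightly overstates the exact integral, but the error is negligible at this resolution.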


Pulling data from FracFocus and knowing the lateral length allows you to calculate the amount of proppant used per foot of horizontal lateral. There are typically three general lateral length categories: 5,000’, 7,500’, and 10,000’. Keep in mind, though, that you will definitely see lateral lengths that fall between these values, as well as some that are shorter or longer than that range. Length depends on how much of a lease position a company owns and what distance the company feels is economically optimal based on past experience in the area. In this case, it looks like Matador and Marathon probably drilled more 5,000 footers, while Cimarex, XTO, Mewbourne, and COG drilled laterals in the range of 5,000-7,500’. The others drilled laterals in the 7,500-10,000’ range. When it comes to proppant loading in the Delaware Basin, anything over 2,500 lbs/ft would be considered high, but that is where proppant loading in horizontals here is trending.
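The proppant-per-foot calculation and the length categories can be sketched as follows. The ingredient-row keys (`api`, `purpose`, `mass_lbs`) are hypothetical stand-ins for the FracFocus fields, and counting any row whose purpose mentions “proppant” as proppant is a simplifying assumption. The three categories and the 2,500 lbs/ft threshold come from the text.

```python
from collections import defaultdict

LATERAL_CATEGORIES = (5_000, 7_500, 10_000)  # general lateral buckets, ft
HIGH_LOADING_LBS_PER_FT = 2_500              # "high" loading in the Delaware


def proppant_lbs_by_well(ingredient_rows):
    """Sum proppant mass (lbs) per API from FracFocus-style ingredient rows.

    Each row is a dict with hypothetical keys 'api', 'purpose', 'mass_lbs'.
    A row counts as proppant if its purpose mentions 'proppant' (an assumption).
    """
    totals = defaultdict(float)
    for row in ingredient_rows:
        purpose = (row.get("purpose") or "").lower()
        if "proppant" in purpose and row.get("mass_lbs"):
            totals[row["api"]] += row["mass_lbs"]
    return dict(totals)


def nearest_category(lateral_ft):
    """Snap a lateral length to the closest general category (ties go shorter)."""
    return min(LATERAL_CATEGORIES, key=lambda c: abs(c - lateral_ft))


def loading_summary(proppant_lbs, lateral_ft):
    """Proppant intensity for one well, plus its length bucket and a high-loading flag."""
    if lateral_ft <= 0:
        raise ValueError("lateral length must be positive")
    lbs_per_ft = proppant_lbs / lateral_ft
    return {
        "category_ft": nearest_category(lateral_ft),
        "lbs_per_ft": lbs_per_ft,
        "high_loading": lbs_per_ft > HIGH_LOADING_LBS_PER_FT,
    }
```

Pairing this with an effective lateral length (rather than the permitted length) keeps the lbs/ft figure tied to the portion of the well that was actually fracked.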


The forecasts are in barrels of oil equivalent (BOE), but I have broken out the well counts by well type so you can take into account the companies’ product mix.

A note for those unfamiliar with the BOE unit: it is typically calculated as:

(past + forecasted oil) + ((past + forecasted gas)/6) = total amount in BOE

The “divided by 6” part reflects that 6 Mcf (thousand cubic feet) of gas is roughly equal to 1 barrel of oil on an energy basis. Sometimes reports divide gas by a much larger number because they are basing the conversion on market value. In that case, if oil is $60/bbl and gas is $3.00/Mcf, on a value basis you would divide by 20 ($60 / $3.00).
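The BOE conversion on both an energy basis and a value basis, as a small Python sketch. The oil and gas volumes below are hypothetical and stand for past plus forecasted volumes; the prices are the ones from the example above.

```python
def boe(oil_bbl, gas_mcf, gas_to_oil_ratio=6.0):
    """Combine oil (bbl) and gas (Mcf) into barrels of oil equivalent.

    A ratio of 6 is the energy basis (6 Mcf ~ 1 bbl); pass a price ratio
    for a value basis instead.
    """
    return oil_bbl + gas_mcf / gas_to_oil_ratio


def value_ratio(oil_price_per_bbl, gas_price_per_mcf):
    """Mcf of gas worth the same as 1 bbl of oil at the given prices."""
    return oil_price_per_bbl / gas_price_per_mcf


ratio = value_ratio(60.0, 3.00)        # 20.0
print(boe(300_000, 1_200_000))         # energy basis: 500000.0
print(boe(300_000, 1_200_000, ratio))  # value basis: 360000.0
```

The same well can look gas-heavy on an energy basis and oil-heavy on a value basis, so it is worth checking which convention a report uses.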


This map depicts where you will find mainly oil or gas wells. You can see that the south-central portion of the map has a mix of oil and gas, while the Northwest Shelf is primarily natural gas.



This is an EUR heat map that I put together in QGIS.  There are a large number of great wells down near the TX border, so you may have to zoom in to see where some of the wells fall on the color gradient.

Final Thoughts

This was a very brief overview of a small portion of one oil and gas basin in the Lower 48. Evaluating even this small area requires many data sets with missing, disjointed, and incorrect data. If you were to include the Texas portion of this sub-basin, along with the other two sub-basins found in the Permian, you would be looking at handling hundreds of thousands of wells.

If you were a company looking to acquire assets or better understand your own,  getting a clean data set to even start either endeavor will take a considerable amount of time – and the data doesn’t stop coming.  Tamr can help your organization master all of your engineering, geologic, financial, and land data to lead you to the hidden value and cost savings you are not able to see yet.