Finding the Spacing Signal in the Noise: How to Debias a Model
Note: This is the second part of a multi-part series on Novi’s approach to creating forecasting models that respond accurately and sensibly to downspacing and parent-child depletion.
Understanding the Bias in Well Forecasting Models
In Part 1, we identified trends in spacing data that do not agree with physical intuition. These trends, which show production rising with tighter spacing, stem from operators’ decisions to downspace and increase completion intensity in higher-quality rock. As a result, purely associative pattern recognition models struggle to decouple the positive effects of rock quality from the negative effects of tighter spacing.
Now that we have examined the bias in the underlying data, let’s have a look at the results of a purely associative machine learning model. The tree-based associative model treats all input variables equally and has no special knowledge of causation or physics. This model uses 11,176 horizontal wells in the Delaware Basin, with the characteristics shown in Figure 1, and the following input variables:
Fluid Per Foot
Proppant Per Foot
Target Formation
Depth of Target Formation
Water Saturation
Total Organic Carbon
Clay Content
Thickness of Target Formation
Distance to Lateral Farther Neighbor
Distance to Stagger Farther Neighbor
Age of Parent Well
Distance to Parent Well
Figure 2 shows the definitions of lateral farther and stagger farther neighbors that were used as spacing features in this model. Using the second-closest neighbors creates a delineation between exterior, single-bounded wells and interior, double-bounded wells on a pad. Other spacing definitions, such as total wells within a radius or average neighbor distance, can be used, but these are the two interwell features we chose for this particular model.
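As a rough illustration of why a second-closest-neighbor distance separates exterior from interior wells, the sketch below computes that distance for a hypothetical four-well pad in a single bench. The well names, the 660 ft spacing, and the helper `second_closest_distance` are assumptions made for illustration, and the sketch covers only the lateral direction; the feature definitions actually used in the model are those shown in Figure 2.

```python
def second_closest_distance(positions, well):
    """Distance (ft) from `well` to its second-closest neighbor on the pad.

    `positions` maps a hypothetical well name to its lateral offset in feet.
    Returns None when the well has fewer than two neighbors.
    """
    distances = sorted(
        abs(positions[well] - offset)
        for name, offset in positions.items()
        if name != well
    )
    return distances[1] if len(distances) >= 2 else None


# Hypothetical pad: four wells in one bench, evenly spaced at 660 ft.
pad = {"A": 0.0, "B": 660.0, "C": 1320.0, "D": 1980.0}
features = {name: second_closest_distance(pad, name) for name in pad}

for name, dist in features.items():
    print(f"{name}: {dist:.0f} ft")
```

In this toy pad, the exterior wells A and D find their second-closest neighbor at twice the pad spacing, while the interior wells B and C find it at the pad spacing, so a single feature distinguishes the two groups.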
Impact of Proppant Loading on Production: SHAP Analysis
Figure 3 shows the SHAP dependence plot for proppant in the Delaware Basin. The SHAP impact isolates the effect of a variable on the model, showing how much the value of an input variable can move a forecast away from the average well in the dataset. In this case, increasing proppant loading moves the forecast up in a mostly linear trend between 1000 and 3000 lbs per foot. As we saw in the raw data, some local variations occur around round numbers with large sample sizes (2000, 2500, etc.), but the trend is mostly uniform, and proppant alone can swing a forecast by 20% in this range. An operator would likely scale fluid and proppant together, creating an even larger impact for completion intensity.
Figure 4 shows a different story. Figures 4A and 4C show the same raw data trend from Part 1 of this series, while 4B and 4D show the equivalent SHAP trends for these variables in the model. The model is able to determine that downspacing has a negative impact on production, but the magnitude of change is much smaller than we would expect. Adding together these two features gives a downspacing impact of less than 10% when moving from a purely unbounded well to a sub-440 ft cube development. The purely associative model is able to isolate part of the trend we expect to see, but it is unlikely to be useful for understanding the economic impacts of downspacing.
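To make the idea of a SHAP impact concrete, the sketch below computes exact Shapley values by brute force for a tiny hypothetical forecast function. The function `toy_forecast`, its coefficients, and the three-well background set are invented for illustration and are not Novi's model or data; SHAP libraries use much faster algorithms for tree models, but the quantity being attributed is the same kind of shift away from the average forecast.

```python
from itertools import combinations
from math import factorial


def toy_forecast(well):
    """Hypothetical forecast with made-up coefficients (not a real model)."""
    proppant, spacing, thickness = well
    return (
        100.0
        + 0.02 * proppant
        + 0.01 * spacing  # wider spacing raises the forecast
        + 0.5 * thickness
        + 0.00001 * proppant * thickness
    )


def coalition_value(x, background, subset):
    """Mean forecast when features in `subset` take x's values and the
    remaining features take each background well's values in turn."""
    total = 0.0
    for row in background:
        mixed = tuple(x[i] if i in subset else row[i] for i in range(len(x)))
        total += toy_forecast(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumerating every coalition of features."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(x, background, set(subset) | {i})
                without_i = coalition_value(x, background, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical wells: (proppant lbs/ft, spacing ft, thickness ft).
background = [(1000.0, 880.0, 200.0), (2000.0, 660.0, 300.0), (3000.0, 440.0, 250.0)]
x = (2500.0, 500.0, 280.0)

baseline = coalition_value(x, background, set())  # average forecast
phi = shapley_values(x, background)

for name, value in zip(("proppant", "spacing", "thickness"), phi):
    print(f"{name}: {value:+.2f}")
print(f"baseline + sum(phi) = {baseline + sum(phi):.2f}")
print(f"forecast for x      = {toy_forecast(x):.2f}")
```

The per-feature values sum to the difference between this well's forecast and the average forecast, which is the sense in which a SHAP impact shows how far one input moves a forecast away from the average well.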
Now that we have identified the underlying data issue and the results of an associative machine learning model, what tools can we use to create models that respond in a physically sensible way to spacing and depletion? Do data scientists in other industries face similar problems? As it turns out, these types of issues are common in medicine and economics, where fully randomized and controlled trials are impossible due to cost, ethics, or time constraints.
Figure 5, borrowed from Wikimedia, demonstrates the concept we are trying to elicit from our forecasting models for shale wells. In this example dataset, the overall trend is down and to the right, with a correlation coefficient of -0.74. This is analogous to the basinwide trend, created by sampling bias, in which downspacing is correlated with higher production. Within this dataset, there are cohorts showing the expected behavior of a positive correlation. Similarly, within our basinwide dataset, there are cohorts of wells with similar rock quality and completions that show degradation with tighter spacing. As an operator, you have probably observed this by comparing two pads in nearby areas with different spacing strategies.
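The pattern in Figure 5 is easy to reproduce with a few made-up points. In the minimal sketch below, the cohort names and values are invented for illustration and are not the data behind the figure: each cohort trends upward on its own, yet pooling the cohorts produces a negative correlation, a reversal commonly known as Simpson's paradox.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up cohorts of (x, y) points: y rises with x inside each cohort,
# but cohorts with larger x sit at lower y overall.
cohorts = {
    "cohort_1": [(1, 10), (2, 11), (3, 12), (4, 13)],
    "cohort_2": [(5, 5), (6, 6), (7, 7), (8, 8)],
    "cohort_3": [(9, 0), (10, 1), (11, 2), (12, 3)],
}

within = {
    name: pearson([x for x, _ in pts], [y for _, y in pts])
    for name, pts in cohorts.items()
}
pooled_points = [pt for pts in cohorts.values() for pt in pts]
overall = pearson([x for x, _ in pooled_points], [y for _, y in pooled_points])

for name, r in within.items():
    print(f"{name}: r = {r:+.2f}")
print(f"pooled:   r = {overall:+.2f}")
```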
Because of the inherent sampling bias in the data, we need a way to teach our models that geology and spacing will be correlated. One way to do this is to segment the dataset. Figure 5 demonstrates the concept of stratification: manually bucketing the dataset using some prior knowledge of the shape of the bias. For our dataset, this might take the form of bucketing wells by rock quality or creating a model without unbounded or single-bounded wells.
Moving from the conceptual example (Figure 5) to the true dataset (Figure 6) presents the challenge at hand. The subpopulation groupings exist in the dataset, but it is extremely difficult and time consuming to isolate the subpopulations and then build individualized models. At Novi, we have experimented with subpopulations defined by spacing, rock quality, and location (county or formation), with varying success. Much like manually selecting type curves, this process is subject to human bias, and the dataset spans too many wells and too many dimensions for anyone to select subpopulations by hand.
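A minimal sketch of stratification on synthetic data follows. Everything in it is an assumption made for illustration: the three rock-quality tiers, the rule that better rock is drilled at tighter spacing, and the made-up coefficients in the production formula. It compares the pooled slope of production against spacing with the slopes fitted inside each rock-quality bucket.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sum((x - mean_x) ** 2 for x in xs)


random.seed(0)

# Hypothetical tiers: (typical spacing in ft, production uplift from rock quality).
# Better rock is developed at tighter spacing, which creates the sampling bias.
tiers = {"low": (1100.0, 0.0), "mid": (800.0, 60.0), "high": (500.0, 120.0)}
TRUE_SPACING_EFFECT = 0.05  # assumed production gain per extra foot of spacing

wells = []
for tier, (typical_spacing, uplift) in tiers.items():
    for _ in range(200):
        spacing = typical_spacing + random.uniform(-150.0, 150.0)
        production = (
            100.0 + uplift + TRUE_SPACING_EFFECT * spacing + random.gauss(0.0, 5.0)
        )
        wells.append((tier, spacing, production))

# Pooled fit ignores rock quality entirely.
pooled_slope = ols_slope([s for _, s, _ in wells], [p for _, _, p in wells])

# Stratified fit: one slope per rock-quality bucket, then a simple average
# (the buckets are equal in size here, so no weighting is needed).
bucket_slopes = {}
for tier in tiers:
    rows = [(s, p) for t, s, p in wells if t == tier]
    bucket_slopes[tier] = ols_slope([s for s, _ in rows], [p for _, p in rows])
stratified_slope = sum(bucket_slopes.values()) / len(bucket_slopes)

print(f"pooled slope:     {pooled_slope:+.3f}")
for tier, slope in bucket_slopes.items():
    print(f"{tier:>4} tier slope:  {slope:+.3f}")
print(f"stratified slope: {stratified_slope:+.3f}")
```

In this synthetic setup the pooled slope comes out negative, suggesting that tighter spacing improves production, while the slopes inside each bucket are positive and close to the assumed spacing effect. The sketch works only because the bucket labels are known in advance, which is the part that is hard to do by hand on real data.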
Conclusion
Creating unbiased forecasting models in the Delaware Basin is challenging, especially when separating the effects of downspacing from rock quality. Traditional associative models often fail due to data biases. By stratifying datasets into subpopulations based on rock quality and completion strategies, we can develop models that better reflect physical realities. This approach enhances the accuracy of well forecasts and provides more reliable data for strategic decisions.
In the next post, we will describe Double Machine Learning, a process that uses a model to select the subpopulations that will debias the forecasting model.
Written by Kiran Sathaye