# How Model-Assisted Sampling Can Reduce Fieldwork

This article first appeared in the August issue of The Forestry Source.

In the previous Biometrics Bits article, "Sources of Error in Forest Inventory," we mentioned how a model-assisted (MA) approach can reduce sampling error as a source of overall error. We received a lot of questions about how model-assisted inventories "actually work," so in this article, we'll run through a simple worked example. We hope this will help you develop an intuition for how model-assisted inventory can potentially help you get better information with less fieldwork.

The main motivation behind the adoption of MA inventory approaches in forestry is the availability of remotely sensed (RS) data from satellites, planes, and drones. Of course, foresters have been using aerial imagery for decades to help with stand delineation and typing. However, to take full advantage of the explosion of freely available satellite and aerial imagery, foresters need a quantitative approach. We can use long-established MA methods to combine cruise data with RS data in a rigorous, statistically valid manner. If you're interested in learning more and feeling brave, you can dive into the full details in Swedish statistician Carl-Erik Särndal’s 1992 textbook "Model Assisted Survey Sampling."

**What is Model-Assisted Sampling?**

You can think of MA sampling as a way of extending a traditional grid-based forest inventory. After installing the cruise, you pair the plot data with the corresponding “pixels” in the RS data which covers the entire stand. You then use those pairs to develop a statistical model that relates what you see in the RS imagery to what was actually measured on the ground. In a sense, the RS imagery is filling in the gaps between the sample plots, making sure you don't miss anything because of where the plots happened to land.

At best, there is a strong relationship between what was measured and the imagery, resulting in far better inventory information. At worst, there’s not a good relationship and you still have the solid inventory information from your probability-based sample. The beauty of MA sampling is that it's backed up by a traditional cruise. This standard probability-based sample guarantees the statistical validity of your overall inventory estimate.

**How Does It Work?**

Let's dive into an example to see how MA sampling works in practice. If you'd like to really get into the details, we've made the Python code for running this example available in a Jupyter Notebook at https://github.com/SilviaTerra/forestry_source.

*Figure 1. A stand with a traditional cruise (9 orange plots) and some imagery data covering the entire stand. Note the landing in the bottom left.*

We'll be inventorying the very rectangular stand shown in Figure 1. We'll start by putting in a traditional grid with 9 plots. You can see the measured basal area (BA) for each plot in Table 1. Working this up just like a normal cruise, we get an average BA of 127 ft2 and a standard error of 19 ft2. Nothing special so far.
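For readers who want to see the "normal cruise" workup spelled out, here is a minimal sketch of the mean and standard error calculation. The plot values below are made up for illustration and are not the Table 1 data; the function name `cruise_summary` is also our own.

```python
import math
import statistics

# Hypothetical basal-area measurements (ft2 per acre) for 9 plots.
# Illustrative values only, NOT the data from Table 1.
plot_ba = [60.0, 90.0, 110.0, 120.0, 130.0, 140.0, 150.0, 170.0, 200.0]

def cruise_summary(values):
    """Return (mean, standard error of the mean) for a set of plot values."""
    n = len(values)
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, std_err

mean_ba, se_ba = cruise_summary(plot_ba)
print(f"mean BA = {mean_ba:.1f} ft2, standard error = {se_ba:.1f} ft2")
```

The standard error shrinks only with the square root of the number of plots, which is why buying precision with fieldwork alone gets expensive quickly.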

Now let’s add in some imagery. As we discussed in an earlier Biometrics Bits article, "Imagery in Forest Inventory - Platforms, Sensors, and Analysis," there are many options and tradeoffs with different types of imagery. With an MA approach, you can work with just about any type of imagery. For this example, we're using a fairly low-resolution image. As you can see in Figure 1, it looks like a bunch of small squares with different shades of green - not much detail. Inside the digital image file, each of these shades of green corresponds to a number (e.g. 0.053). In Table 1, we've listed the numeric value of the pixels that each of our 9 plots happens to land on.

*Table 1. Sample Data*

In Figure 2, we've used orange dots to chart the basal area of each plot against the pixel value from the RS image. It looks like the smaller the pixel value (the darker the shade of green) in the RS image, the more BA there is. We can develop an equation (a "statistical model") that tells us how much BA we can expect for each shade of green in our RS image. The blue line in Figure 2 shows our model's BA prediction for each pixel value. You can see that this simple example model isn't perfect, but it does a reasonable job of capturing the overall trend. As the saying goes, all models are wrong, but some are useful. Let’s see if this one’s useful.
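To make the idea of a "statistical model" concrete, here is one way such an equation could be fit. This sketch assumes a straight-line model fit by ordinary least squares, which may not be the model form behind Figure 2, and the paired values are hypothetical rather than taken from Table 1.

```python
# Hypothetical (pixel value, plot BA in ft2 per acre) pairs.
# Illustrative values only, NOT the data from Table 1.
pixel_values = [0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
plot_ba = [200.0, 170.0, 150.0, 140.0, 130.0, 120.0, 110.0, 90.0, 60.0]

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

intercept, slope = fit_line(pixel_values, plot_ba)

def predict_ba(pixel_value):
    """Predicted BA for a single pixel value under the fitted line."""
    return intercept + slope * pixel_value

print(f"BA = {intercept:.1f} + ({slope:.1f}) * pixel value")
```

A negative slope here matches the pattern described above: smaller pixel values go with more basal area.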

*Figure 2. Plot BA vs. Pixel Value*

We now apply this model to all 1,706 pixels in our stand to get a prediction of the BA for each pixel. Taking the average of all of these predictions, we arrive at an average BA of 118.11 ft2.
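In code, this step is just a loop over the stand's pixels. The sketch below uses hypothetical straight-line coefficients and randomly generated pixel values as a stand-in for the real raster, so its result will not match the 118.11 ft2 figure above.

```python
import random

# Hypothetical model coefficients: BA = INTERCEPT + SLOPE * pixel value.
INTERCEPT = 220.0
SLOPE = -1500.0

# Stand-in for the stand's raster: 1,706 made-up pixel values.
rng = random.Random(42)
stand_pixels = [rng.uniform(0.02, 0.10) for _ in range(1706)]

# Predict BA for every pixel, then average the predictions.
predicted_ba = [INTERCEPT + SLOPE * p for p in stand_pixels]
mean_predicted_ba = sum(predicted_ba) / len(predicted_ba)

print(f"pixels: {len(predicted_ba)}, mean predicted BA: {mean_predicted_ba:.2f} ft2")
```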

All done? Not quite yet - we need to account for bias in our model. Let's compare what we measured in the field to what we predicted for the pixels in which our plots were located. Statisticians call this difference the "residual."

*Table 2: Measured vs. Predicted BA*

The average of these residuals (predicted minus measured) is -12.92 ft2, meaning the model underpredicts BA at our plots on average. This is the estimated bias in our predictions. We simply subtract this bias from the pixel BA average we calculated above to get our final estimate of the average BA: 118.11 ft2 - (-12.92 ft2) = 131.03 ft2.
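The bias correction can be sketched in a few lines. We assume here that each residual is computed as predicted minus measured, which is the sign convention that makes the subtraction above work; the helper name `model_assisted_mean` is ours.

```python
def model_assisted_mean(mean_pixel_prediction, measured, predicted):
    """Bias-corrected mean: the pixel-prediction average minus the
    average (predicted - measured) residual at the plots."""
    residuals = [p - m for p, m in zip(predicted, measured)]
    bias = sum(residuals) / len(residuals)
    return mean_pixel_prediction - bias

# The article's numbers: average pixel prediction and average residual.
mean_pixel_prediction = 118.11
mean_residual = -12.92
ma_estimate = mean_pixel_prediction - mean_residual
print(f"MA estimate of mean BA: {ma_estimate:.2f} ft2")  # 131.03
```

Because the average residual is negative, subtracting it raises the estimate above the raw pixel average.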

*Figure 3. Predicted BA for each pixel (darker blue means more BA)*

Not only do we end up with a statistically valid estimate of the mean BA for the stand as a whole, but as shown in Figure 3, we also now have quantitative estimates of the BA in every pixel within the stand. This sort of high-resolution inventory data can be helpful when planning harvest operations, fertilizations, etc.

**When Is It Helpful?**

We've done a lot of work to get to this point. Was it worth it? In his 1992 textbook, Särndal gives us a straightforward way to check. The variance of our MA estimate comes from the variance of our residuals. Plugging our residuals from Table 2 into the standard variance equation yields a variance of 46.55. Dividing by our 9 plots and taking the square root gives a standard error of 2.27 ft2. Note that this is considerably better than the standard error we got from using just the plots with no RS data. We've achieved better precision with the same number of plots. In fact, it turns out to be 8.68 times better, meaning we get more than an 8x reduction in the size of the standard error of our mean BA estimate given the same number of plots. Not bad!
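Here is a minimal sketch of that standard error calculation. It uses a simplified form (the sample variance of the residuals divided by the number of plots) and leaves out refinements such as design weights or finite population corrections that a full treatment would cover; the function name `ma_standard_error` is ours.

```python
import math
import statistics

def ma_standard_error(residuals):
    """Simplified MA standard error: sqrt(sample variance of residuals / n)."""
    n = len(residuals)
    return math.sqrt(statistics.variance(residuals) / n)

# The article's numbers: residual variance of 46.55 from 9 plots.
residual_variance = 46.55
n_plots = 9
se_ma = math.sqrt(residual_variance / n_plots)
print(f"MA standard error: {se_ma:.2f} ft2")  # 2.27
```

Given a list of residuals like those in Table 2, `ma_standard_error` does the same calculation in one call.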

*Table 3: Traditional Cruise vs. MA Sampling*

It's not always the case that the standard error of the MA estimate will be lower than that of the underlying cruise. As McRoberts (2012) notes, "The primary advantage of MA estimators is that they capitalize on the relationship between the sample observations and their model predictions to reduce the variance of the estimate of the population mean." If the relationship between the plot data and the RS data isn't strong, then the variability of the MA estimates may be so high that you're better off sticking with the original cruise. That's part of what makes MA sampling so safe - you can always just choose to use the underlying boots-on-the-ground cruise if the auxiliary data (RS imagery, soils maps, etc.) isn't as helpful as you had hoped.

**In the Real World**

While the theory of model-assisted sampling is well-established, its application requires good biometric judgement and statistical skill. As mentioned above, the stronger the correlation between your RS data and the parameter you're trying to estimate (e.g. BA), the better results you'll get. This is where skill in sample design, imagery selection, and statistical modeling comes into play.

In the real world, you'll want to include a whole stack of imagery rather than just a single image (and more plots too). Typically, different types of remote sensing are correlated to different aspects of the forest (for example species composition vs. trees per acre). Having more data available can help you build better predictive models. You'll also want to predict far more than just basal area and use more sophisticated modeling approaches.

In this example, we've looked at how MA sampling can help you use the same number of plots to achieve better precision. You can also choose to put in fewer plots while maintaining your prior level of precision.
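To get a rough feel for that tradeoff, you can turn the standard error formula around and solve for the number of plots. The sketch below is a back-of-the-envelope calculation with hypothetical variances, not figures from this example, and it ignores the practical point that you still need enough plots to fit a trustworthy model.

```python
import math

def plots_needed(variance, target_se):
    """Smallest n such that sqrt(variance / n) <= target_se."""
    return math.ceil(variance / target_se ** 2)

# Hypothetical values for illustration only.
plot_variance = 1600.0      # variance of plot BA (traditional cruise)
residual_variance = 400.0   # variance of model residuals (MA sampling)
target_se = 10.0            # desired standard error, ft2

n_traditional = plots_needed(plot_variance, target_se)
n_model_assisted = plots_needed(residual_variance, target_se)
print(f"traditional: {n_traditional} plots, model-assisted: {n_model_assisted} plots")
```

The ratio of plots needed follows the ratio of the two variances, so a model that cuts the variance to a quarter cuts the plot count to roughly a quarter as well.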

Here's the bottom line: there's a lot of freely available imagery out there that you have already paid for with your tax dollars. There are well-established statistical methods for integrating it into your inventory process. Model-assisted sampling is a powerful approach that deserves a place in your forest inventory toolbox. Applied correctly, MA sampling can potentially save you a lot of fieldwork.

**Citations**

McRoberts, R. E., Næsset, E., & Gobakken, T. (2012) Inference for lidar-assisted estimation of forest growing stock volume. Remote Sensing of Environment 128, 268-275.

Särndal, C., Swensson, B., & Wretman, J. (1992) Model Assisted Survey Sampling. New York: Springer-Verlag.

**Clarification on the previous Biometrics Bits**

In our prior article on "Sources of Error in Forest Inventory," we illustrated coverage error with an example. The text should have read:

To understand coverage error, let's examine a common practice that you’ve probably seen. A forester lays out a grid of 50 plots across a stand and starts cruising at the northern end. So far, so good. But as our forester measures plots, he's also keeping a rolling calculation of the confidence interval of the BA in his sample. After he's worked his way through 30 plots on the stand, the CI falls below 10% of the mean, so he declares "mission accomplished!" and heads for the truck, leaving the remaining 20 southern plots unmeasured.