Sources of Error in Forest Inventory

This article first appeared in the June issue of the Forestry Source.

I remember a time in forestry school at Mississippi State when the professor had each student try to count every single tree in a 10-acre stand. We were all surprised (except the professor) to find that we had as many different counts as we had students.

Error is a given in forest inventories, but not all errors are created equal. This is particularly true now that many foresters are integrating remote sensing and other new technologies into their inventories. Traditional cruising and remote-sensing-assisted methods introduce error in different ways. By understanding the sources of error in various inventory designs, you can make an informed decision about when and where it makes sense to include remote sensing in your inventory process.

In this article, we'll consider error in three broad categories of inventory design: traditional cruises, model-assisted remote sensing, and model-based remote sensing. All foresters are familiar with traditional cruising, where you lay out a grid of plots across a stand or stratum. The other two categories both use remote sensing data (satellite images, aerial photos, LiDAR, etc.) and statistical models that relate plot data to the remote sensing imagery. Model-assisted methods still rely on a grid of plots and use imagery to "fill in the gaps" between the plots of a traditional cruise. Model-based methods, on the other hand, are not backed by a traditional (design-based) cruise.

Measurement Error
Measurement error is the simplest type of error to understand. It occurs when your measurement instrument does not correctly register a measurement. This can happen in many different ways, from a DME that isn't properly calibrated to a diameter tape misread because of fatigue. This is what happened in the story above, where the students miscounted the total number of trees. We didn't measure perfectly: some of us overcounted and some undercounted.

Troublingly, measurement error can frequently be directional: systematically biased toward undercounting (or overcounting). For example, a careless cruiser might consistently call borderline trees "out" of a plot, artificially undercounting the number of stems in each sample.

To account for these types of errors, we turn to a familiar practice: check cruising. By revisiting plots and remeasuring the stems, we can check for measurement error against a second (perhaps more careful) measurement.
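A check cruise can be summarized by pairing each original plot with its remeasurement and looking at the differences. The sketch below is a minimal illustration, assuming the same plots were remeasured in the same order and the values are stem counts; the function name `check_cruise_bias` and the data are hypothetical. A mean difference that is large relative to its standard error suggests directional (biased) measurement error rather than random noise.

```python
import math
import statistics

def check_cruise_bias(original, check):
    """Compare an original cruise to a check cruise, plot by plot.

    Hypothetical helper. original and check hold per-plot values for the
    same plots in the same order. Returns (mean difference, its standard
    error), where difference = original - check.
    """
    if len(original) != len(check) or len(original) < 2:
        raise ValueError("need at least two paired plots")
    diffs = [o - c for o, c in zip(original, check)]
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, se_diff

# Hypothetical stem counts on five check-cruised plots.
cruiser = [12, 9, 15, 11, 10]
checker = [13, 10, 15, 13, 11]
bias, se = check_cruise_bias(cruiser, checker)
print(f"mean difference: {bias:.2f} stems/plot (SE {se:.2f})")
```

Here the original cruiser comes in low on four of five plots, which is the kind of consistent undercount a check cruise is meant to catch.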

Remote sensing data can also be susceptible to measurement error. Poorly calibrated sensors and errors in post-processing can both contribute to it.

Sampling Error
Sampling error occurs when you take a sample (hence the name) rather than a census. When you measure only a fraction of the population you're trying to describe, the areas you don't measure contribute to your sampling error. This is less of a problem in very homogeneous plantation stands because the unmeasured areas are likely very similar to the measured areas. All else equal, more variable mixed-structure and mixed-species stands usually end up with more sampling error because there's a greater chance that the area you measured is not representative of the area you didn't measure. We use the standard error calculation to estimate how much sampling error we can expect in a given cruise. The equation for and description of the standard error for a cruise were covered by the good Dr. DBH in a prior issue of the Forestry Source.
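As a rough illustration of how stand variability drives sampling error, the sketch below computes the standard error of the mean and the sampling error as a percentage of the mean for two hypothetical stands. It assumes simple random sampling, ignores the finite population correction, and uses t = 2 as a stand-in for the roughly 95% Student's t value; the function name and plot values are made up.

```python
import math
import statistics

def sampling_error(plot_values, t_value=2.0):
    """Return (mean, standard error, sampling error as % of the mean).

    Assumes a simple random sample and ignores the finite population
    correction. t_value=2.0 approximates the 95% Student's t value;
    look up the exact value for n - 1 degrees of freedom in practice.
    """
    n = len(plot_values)
    mean = statistics.mean(plot_values)
    se = statistics.stdev(plot_values) / math.sqrt(n)
    return mean, se, 100 * t_value * se / mean

# Hypothetical basal area per plot (ft^2/acre), five plots per stand.
plantation = [98, 102, 100, 99, 101]    # uniform stand
mixed_stand = [80, 100, 120, 90, 110]   # variable stand, same mean

for name, plots in [("plantation", plantation), ("mixed", mixed_stand)]:
    mean, se, pct = sampling_error(plots)
    print(f"{name}: mean {mean:.0f}, SE {se:.2f}, sampling error +/-{pct:.1f}%")
```

Both stands average 100 ft²/acre, but the more variable stand carries ten times the sampling error from the same number of plots.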

Note that sampling error only occurs in traditional cruises. Remote-sensing approaches typically avoid it because the imagery covers each stand wall to wall: it is effectively a census of the whole area, so nothing is left unsampled. But there's no free lunch! Remote sensing methods trade sampling error for a different type of error: modeling error.

Modeling Error
Many foresters have encountered modeling error in the context of subsampling tree heights. Rather than measuring the height of every tree in a plot, we measure a few heights and then build a model to predict height from DBH. Because DBH is closely, but not perfectly, related to height, there will be some modeling error in the predicted heights.

Modeling error also occurs in remote sensing forest inventories. Remote sensing data are rarely direct measurements of the tree attributes you actually care about (DBH, species, etc.). Instead, it's necessary to build a model to translate the raw remote sensing data into useful forest inventory information.

For example, several studies have shown that radar measurements of forests are correlated with timber volume. By cruising a few plots, we could build a mathematical model that relates the volumes we measure on the ground to the radar signature measured by the European Space Agency Sentinel-1 satellite (don’t worry, a US radar satellite is going up in 2020). However, our model won't be perfect. Sometimes it will predict more volume than is actually there and sometimes it will underpredict. The expected difference between the actual and predicted value is the modeling error.

We can develop a quantitative metric for model performance by looking at the difference between the actual and predicted values, known as the "residual." If the residuals are highly variable, our model's quality is inconsistent, which is a troubling sign. However, if the variability in the residuals is low and they don’t show any directional bias, our model is fairly reliable and our modeling error is low.
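The residual checks described above can be reduced to a few numbers. The sketch below is one simple way to do it, with a hypothetical function name and made-up values: the mean residual flags directional bias, the standard deviation of the residuals measures how consistent the model is, and the root mean squared error (RMSE) combines both. Ideally these are computed on plots that were not used to fit the model, so the model isn't graded on its own training data.

```python
import math
import statistics

def residual_summary(actual, predicted):
    """Return (mean residual, residual standard deviation, RMSE).

    Residual = actual - predicted. A mean far from zero suggests
    directional bias; a large spread suggests inconsistent predictions.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    bias = statistics.mean(residuals)
    spread = statistics.stdev(residuals)
    rmse = math.sqrt(statistics.mean([r * r for r in residuals]))
    return bias, spread, rmse

# Hypothetical check plots: measured vs. model-predicted volume per acre.
measured = [100, 120, 90, 110]
predicted = [98, 125, 88, 109]
bias, spread, rmse = residual_summary(measured, predicted)
print(f"bias {bias:.2f}, spread {spread:.2f}, RMSE {rmse:.2f}")
```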

Note that traditional cruising is largely free of modeling error for attributes that are measured directly: cruisers measure DBH on each plot tree, with no model involved. However, modeling error can sneak into traditional cruises through height and volume models. These attributes are often not measured on every tree, so they must be predicted.

Coverage Error
Coverage error is perhaps the most insidious type of error in a forest inventory because it's often impossible to quantify or correct. You can avoid coverage error in a well-designed traditional cruise and in a model-assisted remote sensing inventory (because it is underpinned by a traditional cruise), but when it comes to model-based remote sensing approaches there’s no guarantee.

To understand coverage error, let's examine a common practice that you’ve probably seen. A forester lays out a grid of 50 plots across a stand and starts cruising at the northern end. So far, so good. But as our forester measures plots, he's also keeping a rolling calculation of the confidence interval (CI) for basal area (BA) in his sample. After he's worked his way through 30 plots, the CI falls below 10% of the mean BA, so he declares "mission accomplished!" and heads for the truck, leaving the remaining 20 southern plots unmeasured.

This cruise suffers from coverage error. The problem is that those 30 plots are not a valid sample of the full stand. The southern part of the stand could be significantly different and we wouldn't know (and without going back out to the woods, there's no way to fix this problem or even understand how big of a problem it is). Coverage error occurs when your cruise is not a statistically valid sample of the whole population.
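A small simulation makes the problem concrete. The stand below is hypothetical: basal area is deliberately higher in the 30 northern plots than in the 20 southern ones, and the helper `ci_percent` assumes t = 2. Stopping after the northern plots produces an estimate that looks precise (a tight CI) but is badly biased; the simulation can reveal this only because it knows the full grid, which a real cruiser who walked away would not.

```python
import math
import statistics

def ci_percent(values, t_value=2.0):
    """CI half-width as a percent of the mean (t = 2 assumed)."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return 100 * t_value * se / statistics.mean(values)

# Hypothetical 50-plot grid listed north to south; basal area in ft^2/acre.
north = [110, 120, 130] * 10   # 30 northern plots, mean 120
south = [65, 75] * 10          # 20 southern plots, mean 70
grid = north + south

measured = grid[:30]           # the cruiser quits after the northern plots
print(f"first 30 plots: mean {statistics.mean(measured):.1f}, "
      f"CI +/-{ci_percent(measured):.1f}%")
print(f"all 50 plots:   mean {statistics.mean(grid):.1f}")
```

The truncated cruise reports about 120 ft²/acre to within roughly ±2.5%, while the full grid averages 100 ft²/acre: a tight CI says nothing about the plots you never visited.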

Luckily, it's straightforward to avoid coverage error in traditional cruises and in model-assisted remote sensing approaches. Ask the simple question, “Do all areas this inventory is meant to describe have a known probability of being sampled?” If the answer is yes, no problem. If the answer is no, then you might want to rethink your design.
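One common way to give every area a known probability of being sampled is a systematic grid with a random start. The sketch below assumes a rectangular stand and square plot spacing; the function name and dimensions are hypothetical. Because the grid's offset is drawn uniformly within one grid cell, every location in the stand is equally likely to receive a plot center. The design keeps that property only if every plot on the grid is actually measured.

```python
import random

def systematic_grid(width, height, spacing, rng=None):
    """Plot centers on a square grid with a uniformly random start.

    Assumes a rectangular stand, width x height, in the same units as spacing.
    """
    rng = rng or random.Random()
    x0 = rng.uniform(0, spacing)   # random offset within one grid cell
    y0 = rng.uniform(0, spacing)
    points = []
    y = y0
    while y < height:
        x = x0
        while x < width:
            points.append((x, y))
            x += spacing
        y += spacing
    return points

# Hypothetical 1000 ft x 600 ft stand with plots every 200 ft.
plots = systematic_grid(1000, 600, 200, random.Random(2024))
print(len(plots), "plots; first at", tuple(round(v) for v in plots[0]))
```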

However, coverage error can be a big problem in model-based remote sensing approaches. The plot data used to train the remote sensing model are often selected to build a good model, but they may fail to cover the full range of oddities in the target population. The plots (or individual trees, in the case of single-tree LiDAR methods) may not be a statistically valid subsample of the population being measured.

To see why this could be a major problem, consider building a model for a LiDAR individual tree segmentation project. If the data used to train this model come only from nice, straight trees, how will the model perform on the messy, forked, twisted trees that surely exist in our forest? Will the model split a forked tree into two trees? Will it clump two adjacent trees into a single tree? If the training data aren't drawn from a statistically valid sample, there's nothing to guarantee that the "splitting" will balance out the "clumping." As with measurement error, this kind of coverage error can frequently be directional, biasing your estimates either high or low.
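There is no general test for coverage error, but one partial sanity check is to ask whether the conditions the model will be applied to fall inside the range of its training data. The sketch below, with a hypothetical function name and made-up 95th-percentile LiDAR canopy heights in meters, reports the share of target cells that fall outside the training range. It can only flag gaps in a variable you thought to measure; it cannot detect the oddities you didn't anticipate, and it is not a substitute for a probability sample.

```python
def share_outside_training_range(training_values, target_values):
    """Fraction of target units whose value falls outside the training range."""
    lo, hi = min(training_values), max(training_values)
    outside = [v for v in target_values if v < lo or v > hi]
    return len(outside) / len(target_values)

# Hypothetical 95th-percentile LiDAR canopy heights (m).
training_plots = [18, 20, 22, 24, 25]            # where the model was trained
stand_cells = [15, 19, 21, 23, 26, 28, 24, 20]   # where it will be applied
share = share_outside_training_range(training_plots, stand_cells)
print(f"{share:.0%} of stand cells require extrapolation")
```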

Parting Thoughts
So how can all of this help you make better decisions about your inventory strategy? All inventories are subject to error, but understanding the tradeoffs between different types of error can save you a lot of grief later on.

Measurement error exists in all types of inventories, but a good check-cruising program can help minimize that problem.

A big difference between traditional cruising and remote-sensing approaches is that you're trading sampling error for modeling error. If the modeling error is less than the sampling error, this can be a good trade and can save you a lot of fieldwork. The appropriate choice of training plot data, imagery inputs, and modeling approaches can significantly reduce modeling error.
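To get a feel for how much fieldwork is at stake, the classic sample-size approximation for simple random sampling, n = (t × CV / E)², estimates the number of plots needed to reach an allowable error E (percent) given a coefficient of variation CV (percent). The sketch below assumes t = 2 and ignores the finite population correction; in practice t depends on n, so the calculation is usually iterated. The function name and the 50% CV are hypothetical.

```python
import math

def plots_needed(cv_percent, allowable_error_percent, t_value=2.0):
    """Approximate plots for a simple random sample: n = (t * CV / E)^2."""
    return math.ceil((t_value * cv_percent / allowable_error_percent) ** 2)

# Hypothetical stand with a 50% coefficient of variation in basal area.
for target in (20, 10, 5):
    print(f"+/-{target}% -> {plots_needed(50, target)} plots")
```

Because sampling error shrinks only with the square root of the number of plots, halving the allowable error roughly quadruples the plots required, which is the fieldwork a sufficiently accurate model could offset.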

Of the four, coverage error is perhaps the most pernicious because it gives you no visibility into the magnitude of the error: you have no idea what you're missing. As Donald Rumsfeld famously said, it's an "unknown unknown." Fortunately, you can avoid this type of error by using a statistically valid sampling design and not relying solely on a model-based remote sensing method. Having a design-based cruise underpinning your inventory (as in traditional cruising and model-assisted remote sensing) is a safety net that keeps you from being blindsided by unexpected coverage errors.

It's an exciting time to be a forester because there are lots of new data sources, imagery platforms, and modeling methods becoming available all the time. For most tree attributes, low-cost remote sensing data are available to help us improve our estimates. By being aware of the sources of error in different remote sensing approaches, you can take advantage of these new technologies while avoiding some common pitfalls.


Zack Parisa is the president of SilviaTerra.