This article first appeared in the February 2021 issue of the Forestry Source: Vol. 26 - No 2.
I was working on a project recently where we needed to estimate stem-level volumes for trees in Northern Minnesota. A query of some contacts in the forestry world yielded a promising result: 1984 USFS Research Paper NC-250 by Jerold T. Hahn. Available in the dusty file archives of the USFS online library, the document outlines a method for obtaining estimates of gross and net volume for tree species in the Lake States, complete with some example calculations and some contextual details about the data used to develop the equations.
RP NC-250 is intended to be a practitioner’s guide to computing stem-level volumes and it is an elegant example of how scientists can synthesize decades of research into a succinct format. However, implementing the methods outlined in the paper requires translation from the equation forms and tables of model coefficients into a digital format. I am sure many have begrudgingly faced the challenge of copying relevant digits into an Excel spreadsheet, but I thought I might spare humanity from this exercise once and for all and follow my millennial scientist instinct: turn it into an R package that anyone can use!
rpnc250 is now available on GitHub: https://github.com/SilviaTerra/rpnc250
Many of the foundational documents in our profession are intended to be carried into the field and referenced by foresters as they survey the heights and ages of trees to estimate the productivity of a stand using site index. These documents are often photocopied, annotated, tattered, laminated, then photocopied again as they are passed between generations. While they have considerably less romantic zest, R packages are a natural extension of the technical bulletin format that we foresters have relied upon during our careers. They can transform complex systems of equations and tables into point-and-shoot analytical tools that can easily be applied to large amounts of information.
The task of translating RP NC-250 into an R package can be broken down into five steps:
- represent the equations for tree height and volume as R functions
- copy the appendix tables with species-specific coefficients into digital format
- create a system for relating the species labels from the paper to a commonly-used system (FIA species codes)
- write unit tests to check the accuracy of the new code
- write some documentation for new users that demonstrates how to use the tool and how to interpret the results
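To make the first three steps concrete, here is a minimal sketch in R. Everything in it is an assumption for illustration: the function name estimate_volume, the table names, the placeholder equation form (a simple combined-variable equation), and the coefficient values are all hypothetical and are not taken from RP NC-250 or from the rpnc250 package. The FIA codes shown (125 for red pine, 746 for quaking aspen) should be checked against the FIA species reference before being relied upon.

```r
# Step 2 (sketch): coefficient table keyed by the paper's species labels.
# The labels and coefficient values are PLACEHOLDERS, not values from RP NC-250.
coef_table <- data.frame(
  paper_species = c("red pine", "quaking aspen"),
  b0 = c(0.10, 0.20),
  b1 = c(0.002, 0.0018),
  stringsAsFactors = FALSE
)

# Step 3 (sketch): crosswalk from FIA species codes to the paper's labels.
species_crosswalk <- data.frame(
  fia_spcd = c(125L, 746L),
  paper_species = c("red pine", "quaking aspen"),
  stringsAsFactors = FALSE
)

# Step 1 (sketch): an equation written as a vectorized R function.
# The form b0 + b1 * dbh^2 * height is a generic stand-in,
# NOT the equation form published in the paper.
estimate_volume <- function(fia_spcd, dbh, height) {
  # look up the paper's species label for each FIA code
  label <- species_crosswalk$paper_species[
    match(fia_spcd, species_crosswalk$fia_spcd)
  ]
  # find the matching row of coefficients (NA if the species is unknown)
  row <- match(label, coef_table$paper_species)
  coef_table$b0[row] + coef_table$b1[row] * dbh^2 * height
}

# Usage: two trees at once; an unknown species code would return NA
estimate_volume(fia_spcd = c(125, 746), dbh = c(10, 8), height = c(60, 50))
```

Keeping the coefficients in a table rather than hard-coding them inside the function means each value can be compared line by line against the appendix of the source publication.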
Each of these steps is important for creating a reliable system, and writing code that efficiently produces reliable results takes a lot of practice. Arguably the most important step, though, is writing tests for the new code. Unit tests are used in software development to confirm the expected output of a function and are meant to help us catch “bugs” in our code before they propagate into egregious errors. For example, if I have a function square_root that returns the square root of the input value, I could write a test to check that square_root(4) returns the value 2. If the function returns 16 or any number that is not 2, then something is wrong with my function. While writing the code for the rpnc250 package, I wrote tests that compared intermediate calculations to the examples worked through in the paper, which gave me confidence that the functions were working as expected and that nothing was lost in translation between scanned PDF and R code!
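The square_root example can be sketched in a few lines of base R. This is an illustration of the idea only, not the test code from rpnc250; in a real package, checks like these would more commonly live in a dedicated testing framework such as testthat.

```r
# Toy function from the example in the text
square_root <- function(x) {
  if (any(x < 0)) stop("x must be non-negative")
  sqrt(x)
}

# Unit test: square_root(4) should return 2.
# all.equal() compares numbers with a small tolerance, which avoids
# false alarms caused by floating point rounding.
stopifnot(isTRUE(all.equal(square_root(4), 2)))

# Unit test: invalid input should raise an error instead of
# silently returning a wrong answer.
result <- tryCatch(square_root(-1), error = function(e) "error")
stopifnot(identical(result, "error"))
```

If either check fails, stopifnot() halts with an error, which is the signal to go looking for a bug. The same pattern applies to a volume function: feed it the inputs from a worked example in the paper and assert that the output matches the published value.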
Projects like rpnc250 are meant to be open source. The code for implementing the equations in the paper and the coefficient values copied from the PDF are all stored in the publicly available source code for the R package. There are many advantages to open source development in this context, including:
- transparency: users of the package can inspect the source code and compare to the original publication to make sure that they understand and trust the contents
- reliability: code that is available to the public can be improved when individuals find errors or have ideas for new features
- education: individuals who are getting started writing their own packages can work from existing examples
There are many more papers like RP NC-250 that are waiting to be translated into open source R packages. For students who are interested in learning more about programming, translating one can be an excellent way to sharpen your skills while delving into the technical details of a specific topic like volume estimation, site productivity, or some other quantitative topic. For practitioners, it can be an opportunity to give something back to the community. In any case, the practice of translating published research into a form that can be more easily used by the public is a good way to improve accessibility of current and historic research efforts.
Would you, dear reader, like to see other tools made available as R packages? Perhaps there is a daily-driver technical bulletin that you would like to be able to use in R. Or are you a student who wants to tackle a project like this? SilviaTerra would love to hear from you - send me an email at firstname.lastname@example.org. We have several public repositories with work from previous Biometrics Bits articles, and are committed to working with our community to make automatic functions and packages more available to people who can make the best use of them in their practice of forestry.
Hahn, Jerold T. 1984. Tree volume and biomass equations for the Lake States. Research Paper NC-250. St. Paul, MN: U.S. Dept. of Agriculture, Forest Service, North Central Forest Experiment Station. Available online at https://www.nrs.fs.fed.us/pubs/rp/rp_nc250.pdf
Henry Rodman is a data engineer at SilviaTerra.