
Volume 63, Issue 6 p. 2372-2383
Article

Combining nutrient, productivity, and landscape-based regressions improves predictions of lake nutrients and provides insight into nutrient coupling at macroscales

Tyler Wagner (Corresponding Author)

U.S. Geological Survey, Pennsylvania Cooperative Fish and Wildlife Research Unit, Pennsylvania State University, University Park, Pennsylvania

Erin M. Schliep

Department of Statistics, University of Missouri, Columbia, Missouri

First published: 13 July 2018

Abstract

Empirical nutrient models that describe lake nutrient, productivity, and water clarity relationships among lakes play a prominent role in limnology. Landscape-based regressions are also used to understand macroscale variability of lake nutrients, clarity, and productivity (hereafter referred to as nutrient-productivity). Predictions from both types of models are used to inform eutrophication management globally. To date, these two classes of models have generally been fitted separately, which ignores the known dependencies among nutrient-productivity variables. We present a statistical model that integrates nutrient-productivity and landscape-based regressions by modeling lake nutrient, productivity, and clarity variables jointly. We fitted a joint nutrient-productivity model to over 7000 lakes with three nutrients (total phosphorus, total nitrogen, and nitrate concentrations), chlorophyll a concentrations, and Secchi disk depth as response variables and landscape features as predictor variables. Because lakes in different regions respond to landscape features differently, we focused our analysis on two subregions with different dominant land uses, the agricultural Midwest and the forested Northeast U.S. Predictive performance was enhanced by modeling nutrient-productivity variables jointly. We also found strong evidence that nutrient-productivity variables were coupled, and that only nitrate may be decoupled from other nutrient-productivity variables in the forested region. We speculate that these regional differences may be related to differences in the strength of biogeochemical cycles and stoichiometric controls between these regions. Jointly modeling nutrient-productivity variables in lakes effectively integrates the two dominant approaches for studying lake nutrient-productivity relationships and provides novel insight into macroscale patterns of the coupling of nutrients, chlorophyll, and water clarity in lakes.

The development of empirical models to predict nutrient concentrations, measures of primary producer biomass (e.g., chlorophyll a concentration; CHL), and water clarity (e.g., Secchi disk depth) has a rich history in limnology (Dillon and Rigler 1974; Canfield and Bachmann 1981; Peters 1986). The classic example, a log-linear relationship between CHL and total phosphorus (TP), is commonly used to inform the development of lake water quality criteria (Havens and Walker 2002; U.S. EPA 2010; Huo et al. 2014). These models of lake nutrient, productivity, and clarity relationships (hereafter referred to as empirical nutrient-productivity models) are integral to informing lake management decisions, testing basic limnological principles, and advancing our understanding of the controls and drivers of water quality in lakes (Pace 2001). Nutrient-productivity models have been widely applied to individual lakes and, more recently, to populations of lakes to improve our understanding of spatial heterogeneity in how lakes respond to environmental stressors (Malve and Qian 2006; Phillips et al. 2008; Wagner et al. 2011).
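The classic CHL–TP relationship is a straight line on log-log axes, $\log(\mathrm{CHL}) = a + b\log(\mathrm{TP})$. The NumPy sketch below fits it by ordinary least squares. The intercept and slope used to generate the data are hypothetical, not values from the cited studies. Natural logs are used to match the $\log_e$ scale adopted later in this paper; the slope $b$ is the same for any log base applied to both variables.

```python
import numpy as np


def fit_log_linear(tp, chl):
    """Fit log(CHL) = a + b * log(TP) by ordinary least squares.

    Returns the intercept a and slope b on the natural-log scale.
    """
    x = np.log(np.asarray(tp, dtype=float))
    y = np.log(np.asarray(chl, dtype=float))
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the highest-degree coefficient first
    return a, b


# Hypothetical TP values (ug/L); CHL generated without noise from a = -1.0, b = 1.1
tp = np.array([5.0, 10.0, 20.0, 50.0, 100.0])
chl = np.exp(-1.0 + 1.1 * np.log(tp))

a, b = fit_log_linear(tp, chl)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = -1.00, b = 1.10
```

With real lake data the points scatter around the line, and the residual spread is what the joint model discussed later partly explains using the other nutrient-productivity variables.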

Although the development and application of empirical nutrient-productivity models are common, models that include information about lake morphometry and the natural and anthropogenic watershed features of lakes are increasingly being used to identify and understand the importance of landscape drivers of lake nutrients and measures of productivity (Wagner et al. 2011; Nielsen et al. 2012; Read et al. 2015). For instance, landscape-based regressions are used to identify the relative importance of different land use types as non-point sources of nutrients to lakes and for explaining macroscale variation in nutrients, primary producer biomass, and other indicators of water quality (Arbuckle and Downing 2001; Jones et al. 2004; Wagner et al. 2011). These landscape-based regressions are also used to highlight potential management activities that could help meet water quality goals and have been shown to be useful for prediction (Meeuwig and Peters 1996). Because landscape-based regression models use predictors that are largely derived from widely available geospatial datasets, these models are well suited for predicting nutrients and productivity across large spatial extents—at regional to continental scales (Cheruvelil et al. 2013; Collins et al. 2017).

The increasing use of landscape-based regression models is driven by the importance of lake morphometric properties and landscape characteristics to the source, delivery, and processing of nutrients in lakes (Soranno et al. 1996; Carpenter et al. 1998; Collins et al. 2017). Likewise, empirical nutrient-productivity models are widely used because many nutrient-productivity variables are correlated with one another (Ostrofsky and Rigler 1987; Phillips et al. 2008). The correlation among nutrient-productivity variables is partly due to coupled biogeochemical cycles (Schlesinger et al. 2011; Gibson and O'Reilly 2012). For example, nutrients and other elements do not cycle independently, as illustrated by co-limitation of primary producer growth (Sterner 2008; Harpole et al. 2011). For other variables, such as Secchi disk depth, the correlation with nutrients arises because water clarity partly measures an outcome (i.e., algal biomass) that is driven by the coupling of biogeochemical cycles. This correlation among nutrient-productivity variables has important implications for limnological modeling and prediction and for furthering our understanding of the ecological processes that influence lake water quality.

From a limnological modeling perspective, combining nutrient-productivity and landscape-based regressions is desirable because both approaches further our understanding of nutrient dynamics and water quality in lakes and support informed predictions for unobserved lakes. If the two approaches are not integrated, important relationships among nutrient-productivity and landscape variables may be ignored or missed. Efforts to integrate nutrient-productivity and landscape-based regressions have been limited by two challenges: missing data and multicollinearity. Instead, researchers study the different nutrient-productivity variables individually by developing univariate nutrient regressions. For example, separate TP, total nitrogen (TN), and CHL regressions are each modeled as a function of one or more landscape predictors and then compared (Jones et al. 2004; Carle et al. 2005; Chen et al. 2015; Soranno et al. 2015). Including additional nutrient-productivity variables as predictors in these models can be problematic because nutrient-productivity data are often missing (Soranno et al. 2017), and because these variables are likely to be confounded with each other and with other landscape predictors. Missing nutrient-productivity data are problematic because, although accommodating missing data is fundamentally a statistical issue, commonly used statistical software for fitting regression models requires values of all predictor variables for each lake, either observed or imputed. Missing data are less problematic for landscape-based predictors, which are often derived from satellite or other remotely sensed data sources. Missing predictor values are most commonly handled with a complete-case analysis, in which lakes with missing data are removed from the analysis (Little 1992; Fergus et al. 2016). Discarding lakes with missing data results in a loss of information.
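Complete-case analysis keeps only the lakes in which every variable is observed. The short NumPy sketch below, on a hypothetical five-lake table, counts how many lakes survive that filter and how many observed values it throws away; `np.nan` marks a variable that was not measured.

```python
import numpy as np

nan = np.nan
# Hypothetical table: rows are lakes; columns are TP, TN, NO3-N, CHL, Secchi.
data = np.array([
    [12.0, 400.0, 20.0, 4.1, 3.2],
    [30.0, nan, nan, 9.8, 1.5],
    [nan, nan, nan, nan, 2.7],
    [8.0, 310.0, 50.0, 2.9, 4.4],
    [55.0, 1200.0, nan, 21.0, 0.8],
])

observed = ~np.isnan(data)
complete = observed.all(axis=1)                          # lakes with all five variables
n_complete = int(complete.sum())
n_observed_values = int(observed.sum())                  # every measured value
n_discarded_values = int(observed[~complete].sum())      # measured values in dropped lakes
print(n_complete, n_observed_values, n_discarded_values)  # 2 18 8
```

Here 8 of 18 measured values (44%) are discarded even though every lake has at least one measurement.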

The second major challenge with integrating these two types of empirical models is the high correlation that often exists among predictors (particularly among nutrient-productivity variables and between nutrient-productivity variables and landscape predictors), which produces multicollinearity and therefore uninterpretable regression coefficients (Doubek et al. 2015). This matters because regression coefficients are often interpreted to understand the drivers of lake nutrients and productivity. In addition, modeling nutrient-productivity variables independently ignores the inherent dependence among them, which may be due to coupling, similar environmental drivers, or some combination of both. This dependence affects predictive performance and limits our ecological understanding of how these indicators covary over space and time. Ideally, we could use both approaches at once, nutrient-productivity and landscape-based regressions; that is, we could jointly model nutrient-productivity variables as a function of landscape-based predictors.

Jointly modeling multiple response variables allows for the integration of nutrient-productivity and landscape-based regressions and overcomes some of the above challenges (Clark et al. 2014; Warton et al. 2015; Schliep et al. 2017). To date, however, jointly modeling multiple nutrient-productivity variables is rarely done (but see Cha et al. 2016 for an example of jointly modeling N and P). These models, which we refer to as joint nutrient-productivity models, quantify the effects of landscape drivers on nutrient-productivity variables while accounting for nutrient-productivity dependence through the residuals. The joint models allow observed nutrient-productivity variables to inform the prediction of unobserved variables after accounting for the effects of landscape predictors. For example, by understanding how relationships between nutrient-productivity variables such as TP and Secchi disk depth (which is widely measured; Lottig et al. 2014) vary across a population of lakes, we can leverage this information to make more informed predictions of TP at lakes where it was not measured. Joint nutrient-productivity models can also provide new insight into fundamental limnological relationships by separating correlations among nutrient-productivity variables that are due to shared landscape drivers from those that may be due to ecological processes that result in strong ecological coupling.
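For intuition about how an observed Secchi depth can sharpen a TP prediction, consider only these two variables on the log scale at one lake. Suppose the landscape predictors give means $\mu_{\mathrm{TP}}$ and $\mu_{\mathrm{S}}$, and the residuals are bivariate normal with variances $\sigma^2_{\mathrm{TP}}$ and $\sigma^2_{\mathrm{S}}$, covariance $\sigma_{\mathrm{TP,S}}$, and correlation $r = \sigma_{\mathrm{TP,S}}/(\sigma_{\mathrm{TP}}\sigma_{\mathrm{S}})$. This is a simplified sketch under those assumptions, not the full model. Standard bivariate normal results give the conditional distribution of log TP once log Secchi depth $y_{\mathrm{S}}$ is observed:

$$
\begin{aligned}
E(y_{\mathrm{TP}} \mid y_{\mathrm{S}}) &= \mu_{\mathrm{TP}} + \frac{\sigma_{\mathrm{TP,S}}}{\sigma^2_{\mathrm{S}}}\,(y_{\mathrm{S}} - \mu_{\mathrm{S}}),\\
\operatorname{Var}(y_{\mathrm{TP}} \mid y_{\mathrm{S}}) &= \sigma^2_{\mathrm{TP}} - \frac{\sigma^2_{\mathrm{TP,S}}}{\sigma^2_{\mathrm{S}}} = \sigma^2_{\mathrm{TP}}\,(1 - r^2).
\end{aligned}
$$

Because the variance shrinks by the factor $1 - r^2$, any nonzero residual correlation narrows the TP prediction. With negative $r$, a Secchi depth shallower than the landscape predictors imply ($y_{\mathrm{S}} < \mu_{\mathrm{S}}$) shifts the predicted TP upward.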

We applied a joint nutrient-productivity model to over 7000 lakes and five nutrient-productivity variables, and we compared its predictive performance to that of traditional univariate models, in which residual dependence is ignored. Previous research has shown that lakes in regions with different dominant land uses and eco-climatic zones respond differently to landscape drivers. Our study area contains two regions with different dominant land uses, one agriculture-dominated and one forest-dominated. We fitted a joint nutrient-productivity model to each region to test our expectation that lake nutrient and productivity measures in different regions are coupled in different ways and respond to different dominant landscape drivers.

Methods

Nutrient-productivity data

We used lake nutrient-productivity data for 7184 lakes located in the Midwestern and Northeastern United States. Nutrient-productivity variables included nutrients (TP [μg L−1], TN [μg L−1], and nitrate concentration [NO3-N; μg L−1]), an indicator of algal biomass (CHL concentration; μg L−1), and water clarity (Secchi disk depth [m]). All data came from the Lake Multi-Scaled Geospatial and Temporal Database (LAGOS) of the Northeast U.S. (LAGOS-NELIMNO v. 1.087.1; Soranno and Cheruvelil 2017a,b; Soranno et al. 2017), accessed using the LAGOS R package (Stachelek and Oliver 2017). LAGOS-NE is a subcontinental-scale database covering approximately 1,800,000 km² across a 17-state region in the Midwestern and Northeastern United States. We used a subset of lakes with a surface area ≥ 4 ha that had at least one of the five nutrient-productivity variables quantified. Lake nutrient data were restricted to epilimnetic samples taken during the summer months (15 June–15 September) from 1990 to 2011, and we retained the most recent sampling occasion for every lake. For lakes sampled more than once (n = 5081 lakes), we used a two-stage approach to ensure that we retained lakes sampled for TN, because TN was measured less often than the other nutrient-productivity variables. First, we selected all lakes that had observed values of TN and used the most recent observation for those lakes. Second, for all other lakes sampled multiple times, we used the most recent observation. This process resulted in a single observation of at least one of the five nutrient-productivity variables per lake (Fig. 1).
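The sampling-window filter and two-stage selection can be sketched in plain Python. This sketch reads "used the most recent observation for those lakes" as the most recent occasion on which TN was measured. The `Sample` record layout and the function names are hypothetical and do not follow LAGOS field names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One epilimnetic sampling occasion (hypothetical record layout)."""
    lake_id: str
    sample_date: date
    tn: Optional[float]  # None when TN was not measured on this occasion
    # TP, NO3-N, CHL, and Secchi fields omitted for brevity


def in_study_window(d: date) -> bool:
    """True for dates from 15 June to 15 September, 1990 through 2011."""
    return 1990 <= d.year <= 2011 and (6, 15) <= (d.month, d.day) <= (9, 15)


def select_one_sample_per_lake(samples: List[Sample]) -> Dict[str, Sample]:
    """Stage 1: most recent sample with TN observed; stage 2: otherwise the most recent sample."""
    by_lake: Dict[str, List[Sample]] = {}
    for s in samples:
        if in_study_window(s.sample_date):
            by_lake.setdefault(s.lake_id, []).append(s)
    chosen: Dict[str, Sample] = {}
    for lake_id, lake_samples in by_lake.items():
        with_tn = [s for s in lake_samples if s.tn is not None]
        pool = with_tn if with_tn else lake_samples
        chosen[lake_id] = max(pool, key=lambda s: s.sample_date)
    return chosen


samples = [
    Sample("A", date(2009, 7, 1), 450.0),
    Sample("A", date(2011, 8, 1), None),   # more recent, but no TN
    Sample("B", date(2008, 7, 15), None),
    Sample("B", date(2011, 7, 20), None),
    Sample("B", date(2011, 10, 2), None),  # outside the summer window
]
chosen = select_one_sample_per_lake(samples)
print(chosen["A"].sample_date, chosen["B"].sample_date)  # 2009-07-01 2011-07-20
```

Lake A keeps its older TN-bearing sample; lake B, which never had TN measured, falls back to its most recent in-window sample.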

Fig. 1. Map showing study lakes across the LAGOS-NE study extent and the Midwest and Northeast subregions.

Study subregions

We selected two subregions within the LAGOS-NE study area that were previously delineated by Collins et al. (2017) in a study of the drivers of lake nutrient stoichiometry across the LAGOS-NE extent. The regions were created to capture the gradient in forested and agricultural land use present in the LAGOS-NE study extent and to represent regions dominated by the extremes of these two land use/cover types. The focus was on contrasting agricultural and forested landscapes because of the strong relationship between agricultural land use and nutrient inputs into inland lakes (Collins et al. 2017). The two regions (Fig. 1) were created by combining adjacent major river watersheds (HUC 4) with similar proportions of agricultural land use. This approach resulted in two contrasting regions: the Northeast region (n = 1655 lakes), composed of ten HUC 4 watersheds with very low proportions of agricultural land use (< 10%), and the Midwest region (n = 434 lakes), composed of seven HUC 4 watersheds with relatively high proportions of agricultural land use (> 50%).

Landscape predictor variables

We chose landscape predictor variables that represented important sources of nutrients (e.g., land use) or the transportation of materials to lakes (e.g., stream density), and that are associated with internal processing of nutrients in lakes (e.g., lake depth; Collins et al. 2017). All geospatial lake predictor variable data came from LAGOSGEO v. 1.05 (Soranno and Cheruvelil 2017b). Except for lake maximum depth and lake area, which are lake-scale properties, all geospatial summaries were derived at the lake watershed scale.

Statistical model

We modeled nutrient-productivity variables jointly to account for correlations among variables. Let $\mathbf{y}_i = (y_{i1}, \ldots, y_{iK})'$ denote a vector of length $K$ of lake nutrient-productivity variables for lake $i$. The joint nutrient-productivity model can be written as:

$$\mathbf{y}_i = \mathbf{B}\mathbf{x}_i + \boldsymbol{\epsilon}_i, \qquad (1)$$

where $\mathbf{x}_i$ is the length-$P$ vector of predictor variables for lake $i$ and $\mathbf{B}$ is a $K \times P$ matrix of coefficients such that $B_{kp}$ is the coefficient of the $p$th predictor variable for the $k$th nutrient-productivity variable. Additionally, $\boldsymbol{\epsilon}_i$ is an error vector of length $K$. We model

$$\boldsymbol{\epsilon}_i \sim N_K(\mathbf{0}, \Sigma),$$

where $\Sigma$ is a $K \times K$ covariance matrix capturing the dependence between nutrient-productivity variables that is not accounted for by the regression. These errors are assumed independent and identically distributed across lakes.
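Eq. 1 says that each lake's $K$ log-scale responses equal a coefficient matrix times the lake's predictor vector, plus a multivariate normal error with covariance $\Sigma$. The NumPy sketch below simulates data from that structure with hypothetical values of $\mathbf{B}$ and $\Sigma$ and then recovers them by least squares. This is not the authors' fitting method (they used WinBUGS). It relies on the standard result that, when every response is observed and all responses share the same predictors, equation-by-equation least squares gives the same estimate of $\mathbf{B}$ as a joint generalized least squares fit. What the joint model adds is $\Sigma$ itself, which drives conditional prediction and lets lakes with missing responses contribute.

```python
import numpy as np

rng = np.random.default_rng(42)
n, P, K = 5000, 3, 2  # lakes, predictors (including an intercept), responses

# Hypothetical coefficient matrix B (K x P) and residual covariance Sigma (K x K)
B = np.array([[1.0, 0.8, -0.3],
              [0.5, -0.6, 0.4]])
Sigma = np.array([[0.5, -0.3],
                  [-0.3, 0.4]])

X = np.column_stack([np.ones(n), rng.standard_normal((n, P - 1))])  # n x P
E = rng.multivariate_normal(np.zeros(K), Sigma, size=n)             # n x K errors
Y = X @ B.T + E  # row i holds (B x_i + eps_i)'

# Least squares for all K responses at once; lstsq returns a P x K solution
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T                      # K x P
resid = Y - X @ B_hat.T
Sigma_hat = resid.T @ resid / (n - P)                               # K x K residual covariance
print(np.round(B_hat, 2))
print(np.round(Sigma_hat, 2))
```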

Nutrient-productivity variables were modeled on the natural-log ($\log_e$) scale. Because of their highly skewed distributions, all proportion predictor variables (e.g., land use) were logit transformed and standardized, while all other predictors were $\log_e$ transformed and standardized prior to analysis. The model was fitted to three datasets: the entire LAGOS-NE study extent, the Midwest subregion, and the Northeast subregion. All models were fitted using the program WinBUGS (Lunn et al. 2000) called from within the program R (R Core Team 2017) using the R2WinBUGS package (Sturtz et al. 2005). Independent, diffuse normal priors were used for all coefficient parameters in $\mathbf{B}$, and the variance-covariance matrix, $\Sigma$, was modeled using the scaled inverse-Wishart distribution (Gelman and Hill 2007). We ran three parallel Markov chains, each beginning with random starting values. Each chain was run for 15,000 iterations, and the first 5000 samples of each were discarded, leaving 3 × 10,000 = 30,000 samples to summarize posterior distributions. Convergence was assessed visually using trace plots and quantitatively using the Brooks-Gelman-Rubin statistic. Residual plots were examined to assess the assumption of normality. We considered a predictor significant when the 95% credible interval of its coefficient did not overlap zero.
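A minimal NumPy sketch of three steps named in this paragraph follows: the logit and log transforms with standardization, a basic potential scale reduction factor (the Gelman-Rubin statistic without chain splitting or the Brooks-Gelman degrees-of-freedom refinement), and the 95% credible-interval rule. The clipping constant `eps` is a hypothetical choice, because the text does not say how proportions of exactly 0 or 1 (which occur in Table 1) were handled. All function names are illustrative; this is an independent sketch, not the R2WinBUGS implementation.

```python
import numpy as np


def logit_standardize(p, eps=0.005):
    """Logit-transform proportions, then center and scale to unit SD.

    eps is a hypothetical clipping constant: logit(0) and logit(1) are infinite.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    z = np.log(p / (1.0 - p))
    return (z - z.mean()) / z.std(ddof=1)


def log_standardize(x):
    """Natural-log transform strictly positive predictors, then standardize."""
    z = np.log(np.asarray(x, dtype=float))
    return (z - z.mean()) / z.std(ddof=1)


def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: array of shape (m, n), i.e., m chains of n post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    V_hat = (n - 1) / n * W + (m + 1) / (m * n) * B
    return float(np.sqrt(V_hat / W))


def ci_excludes_zero(draws, level=0.95):
    """True when the central credible interval of the draws does not contain 0."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(draws, [alpha, 1.0 - alpha])
    return bool(lo > 0 or hi < 0)


rng = np.random.default_rng(0)
mixed = rng.normal(size=(3, 10_000))             # 3 well-mixed chains, 30,000 draws total
stuck = mixed + np.array([[0.0], [0.0], [3.0]])  # third chain centered elsewhere
print(gelman_rubin(mixed), gelman_rubin(stuck))  # near 1, then well above 1

effect = rng.normal(0.5, 0.1, size=30_000)       # hypothetical coefficient draws
print(ci_excludes_zero(effect))
```

Values of the statistic close to 1 indicate that the chains agree; the shifted third chain inflates the between-chain variance and pushes it well above 1.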

Model performance measures and decomposing correlations

We calculated root mean squared prediction error (RMSPE) using 10-fold cross-validation, in which the model was fitted 10 times to 90% of the data, with the remaining 10% withheld for out-of-sample prediction. To evaluate the predictive power potentially gained by modeling nutrient-productivity variables jointly, we compared marginal predictions ($\mathrm{RMSPE}_M$) to conditional predictions ($\mathrm{RMSPE}_C$) of the nutrient-productivity variables at out-of-sample lakes. Under the multivariate normal distribution assumed in Eq. 1, the marginal predictive distributions are equivalent to the predictive distributions that would result from modeling each nutrient-productivity variable independently. Therefore, marginal predictions are obtained for each nutrient-productivity variable without reference to the values of the other variables. Conditional predictions are obtained for each nutrient-productivity variable by conditioning on the observed values of the other variables; e.g., we predict TN conditionally at lake $i$ given the observed TP, NO3-N, CHL, and Secchi disk depth at that lake. If information is shared across variables, the conditional predictive distributions should yield more accurate and more precise predictions than the marginal predictive distributions. If there is very little dependence among the nutrient-productivity variables after accounting for the predictors in the model, the conditional and marginal predictions will be approximately equivalent. As an additional measure of model performance, we calculated both the marginal and conditional predictive $R^2$, denoted $R^2_M$ and $R^2_C$, respectively.
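The marginal-versus-conditional comparison rests on the standard formula for a conditional multivariate normal distribution. The NumPy sketch below skips the 10-fold refitting loop and treats the residual covariance as known. It uses a hypothetical 3 × 3 $\Sigma$, with zero means standing in for the landscape-based predictions. The predictive $R^2$ is computed as 1 − SSE/SST. That definition is an assumption, chosen because it can be negative, as some values in Table 2 are.

```python
import numpy as np


def conditional_normal(mu, Sigma, k, obs_idx, y_obs):
    """Mean and variance of y_k given observed y[obs_idx] when y ~ N_K(mu, Sigma)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    obs_idx = np.asarray(obs_idx)
    s_ko = Sigma[k, obs_idx]                    # Cov(y_k, y_obs)
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]      # Cov(y_obs)
    w = np.linalg.solve(S_oo, s_ko)             # weights on the observed deviations
    mean = mu[k] + w @ (np.asarray(y_obs, dtype=float) - mu[obs_idx])
    var = Sigma[k, k] - s_ko @ w
    return float(mean), float(var)


def rmspe(pred, obs):
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def predictive_r2(pred, obs):
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))


rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.6, -0.5],    # e.g., log TP (hypothetical values)
                  [0.6, 1.0, -0.4],    # e.g., log CHL
                  [-0.5, -0.4, 1.0]])  # e.g., log Secchi
mu = np.zeros(3)
Y = rng.multivariate_normal(mu, Sigma, size=2000)

marginal = np.full(len(Y), mu[0])      # ignores the other responses
conditional = np.array([conditional_normal(mu, Sigma, 0, [1, 2], y[[1, 2]])[0] for y in Y])
print(rmspe(marginal, Y[:, 0]), rmspe(conditional, Y[:, 0]))
print(conditional_normal(mu, Sigma, 0, [1, 2], [0.0, 0.0])[1])  # about 0.56, vs. 1.0 marginally
```

The conditional variance does not depend on the observed values, so its reduction from 1.0 to about 0.56 directly reflects how much the residual correlations narrow each prediction.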

To quantify how spatial variability in landscape drivers (i.e., the estimated regression coefficients) and in the residual covariance structure among the response variables affects predictive performance, we also calculated $\mathrm{RMSPE}_M$, $\mathrm{RMSPE}_C$, $R^2_M$, and $R^2_C$ for the Midwest and Northeast subregions using parameter estimates from the model fitted to all lakes in the LAGOS-NE study extent. Differences in marginal predictions between the region-specific and LAGOS-NE models would suggest the need for spatially varying coefficients. Differences in conditional predictions, with similar marginal predictions, may suggest the need for a spatially varying covariance structure. Differences in both marginal and conditional predictions would suggest the need for both spatially varying coefficients and covariances.

Nutrient-productivity variable correlations were decomposed into residual correlations and correlations due to shared environmental drivers. Residual correlations were obtained from the off-diagonal elements of $\Sigma$, scaled by the corresponding standard deviations. The correlations due to shared environmental responses were calculated following the methods of Pollock et al. (2014), where $\rho^{\mathrm{env}}_{kk'}$ denotes the correlation between nutrient-productivity variables $k$ and $k'$. This correlation is a function of the regression coefficient vectors $\boldsymbol{\beta}_k$ and $\boldsymbol{\beta}_{k'}$ (the $k$th and $k'$th rows of $\mathbf{B}$) and the covariances of the environmental variables. Strong residual correlation may indicate strong coupling of nutrient-productivity variables or the need to include additional predictors in the model. Strong correlation due to the environment may indicate similar landscape and lake-scale drivers.
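The two correlation types can be sketched with NumPy under stated assumptions. Residual correlations rescale $\Sigma$ by its standard deviations. Environmental correlations are computed here as correlations between the responses' linear predictors, $\boldsymbol{\beta}_k'\mathbf{C}\boldsymbol{\beta}_{k'} / \sqrt{(\boldsymbol{\beta}_k'\mathbf{C}\boldsymbol{\beta}_k)(\boldsymbol{\beta}_{k'}'\mathbf{C}\boldsymbol{\beta}_{k'})}$, where $\mathbf{C}$ is the sample covariance of the predictors. This is one reading of the description above, not necessarily the exact Pollock et al. (2014) procedure. In practice, the calculation would be repeated for each posterior draw to obtain credible intervals; the sketch uses hypothetical point values. Intercepts are excluded from `B` because a constant has zero covariance with everything.

```python
import numpy as np


def residual_correlation(Sigma):
    """Convert a residual covariance matrix into a correlation matrix."""
    Sigma = np.asarray(Sigma, dtype=float)
    sd = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(sd, sd)


def environmental_correlation(B, X):
    """Correlations among responses induced by shared predictors.

    B: K x P slope matrix (no intercept column); X: n x P predictor matrix.
    """
    B = np.asarray(B, dtype=float)
    C = np.cov(np.asarray(X, dtype=float), rowvar=False)  # P x P predictor covariance
    V = B @ C @ B.T                                        # K x K covariance of linear predictors
    sd = np.sqrt(np.diag(V))
    return V / np.outer(sd, sd)


rng = np.random.default_rng(7)
X = rng.standard_normal((500, 2))   # two hypothetical standardized predictors
B = np.array([[0.8, 0.4],           # hypothetical response 1
              [-1.6, -0.8],         # response 2: proportional slopes of opposite sign
              [0.0, 1.0]])          # response 3
env = environmental_correlation(B, X)
print(np.round(env, 2))
print(residual_correlation([[4.0, 1.0], [1.0, 1.0]]))
```

Responses 1 and 2 have proportional slope vectors of opposite sign, so their environmental correlation is exactly −1 regardless of the predictor values, mirroring how Secchi depth and CHL can respond in opposite directions to shared drivers.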

Results

Study lakes

Lakes across the study region varied substantially in their geophysical, chemical, and biological properties, and anthropogenic settings (Table 1). Median values of TP, TN, and NO3-N across all 7184 study lakes were 16.0 μg L−1, 600 μg L−1, and 20 μg L−1, respectively. Median CHL was 5.1 μg L−1 and median Secchi disk depth was 2.4 m. Not all lakes had all five nutrient-productivity response variables observed. The proportion of lakes with missing observations was 0.29 (n = 2093), 0.44 (n = 3135), 0.55 (n = 3983), 0.32 (n = 2309), and 0.18 (n = 1265) for TP, TN, NO3-N, CHL, and Secchi disk depth, respectively.
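The missing-data proportions follow directly from the counts. A short Python check, which also gives the number of lakes with each variable observed (7184 minus the missing count):

```python
n_lakes = 7184
n_missing = {"TP": 2093, "TN": 3135, "NO3-N": 3983, "CHL": 2309, "Secchi": 1265}

proportions = {var: n / n_lakes for var, n in n_missing.items()}
for var, n_miss in n_missing.items():
    print(f"{var}: missing {proportions[var]:.2f}, observed n = {n_lakes - n_miss}")
# TP: missing 0.29, observed n = 5091
# TN: missing 0.44, observed n = 4049
# NO3-N: missing 0.55, observed n = 3201
# CHL: missing 0.32, observed n = 4875
# Secchi: missing 0.18, observed n = 5919
```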

Table 1. Medians, followed by the first and third quartiles, of the landscape and lake-scale predictors and nutrient-productivity response variables for the LAGOS-NE study extent and Midwest and Northeast subregions.
LAGOS-NE Midwest Northeast
Maximum depth (m) 8.6 (4.9, 14.2) 5.8 (3.0, 10.3) 7.6 (4.2, 13.4)
Lake area (ha) 54.5 (21.0, 145.5) 82.1 (30.0, 234) 35.1 (13.9, 108)
Watershed : lake area 8.4 (3.9, 21.6) 8.5 (3.6, 28.7) 10.2 (5.5, 22.9)
Proportion urban land use 0.06 (0.03, 0.12) 0.08 (0.06, 0.16) 0.05 (0.01, 0.09)
Proportion agricultural land use 0.05 (0.0, 0.35) 0.66 (0.40, 0.80) 0.01 (0.0, 0.05)
Proportion wetland land cover 0.07 (0.02, 0.16) 0.03 (0.0, 0.07) 0.04 (0.02, 0.08)
Road density (m ha−1) 25.8 (16.3, 40.4) 28.2 (20.1, 44.2) 21.7 (12.0, 35.2)
Stream density (m ha−1) 3.7 (0.02, 8.0) 4.0 (0.7, 8.0) 6.5 (2.1, 10.3)
TP (μg L−1) 16.0 (10.0, 34.0) 68.2 (33.5, 138) 10.0 (7.0, 16.0)
TN (μg L−1) 600 (366, 1000) 1450 (994, 2300) 284 (195, 420)
Chlorophyll a (μg L−1) 5.1 (2.7, 14.0) 25.9 (9.1, 62.3) 3.9 (2.5, 6.6)
Secchi disk depth (m) 2.4 (1.3, 3.9) 0.9 (0.5, 2.0) 3.9 (2.4, 5.6)
Nitrate (μg L−1) 20 (5.0, 50.0) 60 (18.0, 200) 50.0 (20.0, 50.0)

Lakes within the study extent also varied widely in the amount of urban and agricultural land use in their watersheds, which ranged from 0% to 95% urban and from 0% to 100% agricultural land use. The median percentage of agricultural land use in a lake's watershed was 66% in the Midwest subregion and only 1% in the Northeast subregion. As expected, the two subregions also differed substantially in lake chemistry and landscape settings (Table 1). For example, median TP was 68.2 μg L−1 for lakes in the Midwest subregion and 10.0 μg L−1 for lakes in the Northeast subregion.

Landscape predictors and predictive performance

For all three models (LAGOS-NE extent, Midwest, and Northeast) and across all five nutrient-productivity variables, the predictive $R^2$ due to landscape predictors alone ($R^2_M$ values; Table 2) ranged from 6% (NO3-N in the Northeast subregion) to 61% (Secchi disk depth in the Northeast subregion). On average, predictive $R^2$ was greatest for Secchi disk depth (average $R^2_M$ = 47%), followed by TP (average $R^2_M$ = 41%), TN (average $R^2_M$ = 38%), CHL (average $R^2_M$ = 22%), and NO3-N (average $R^2_M$ = 11%). Some important similarities and differences in predictors of nutrient-productivity variables were detected across indicators and regions (Fig. 2). For instance, lake depth was consistently negatively associated with TP, TN, and CHL and positively associated with Secchi disk depth. In addition, across the LAGOS-NE extent, the proportion of agricultural land use was positively associated with TP, TN, CHL, and NO3-N and negatively associated with Secchi disk depth; however, the effect of agricultural land use on nutrient-productivity variables varied by region. The largest differences among the three models were observed in the agriculturally dominated Midwest subregion, where fewer landscape predictors were important in predicting the five nutrient-productivity variables than in the full study extent or the Northeast subregion. The larger uncertainty in parameter estimates in the Midwest subregion primarily reflects its smaller sample size compared with the other two models.

Table 2. Marginal (M) and conditional (C) root mean squared prediction error (RMSPE) and predictive $R^2$ from joint nutrient-productivity models for the Midwest and Northeast subregions. RMSPE and $R^2$ values with a LAGOS subscript were calculated for the Midwest and Northeast subregions using the model fitted to the entire LAGOS-NE study extent.
TP TN CHL Secchi NO3-N
Midwest
$\mathrm{RMSPE}_{M}$ 0.762 0.552 1.125 0.743 1.689
$\mathrm{RMSPE}_{C}$ 0.637 0.428 0.775 0.527 1.404
$\mathrm{RMSPE}_{M,\mathrm{LAGOS}}$ 0.927 0.658 1.148 0.770 1.804
$\mathrm{RMSPE}_{C,\mathrm{LAGOS}}$ 0.662 0.437 0.765 0.531 1.498
$R^2_{M}$ 0.48 0.26 0.13 0.34 0.12
$R^2_{C}$ 0.64 0.55 0.59 0.67 0.39
$R^2_{M,\mathrm{LAGOS}}$ 0.23 −0.06 0.10 0.29 0.00
$R^2_{C,\mathrm{LAGOS}}$ 0.61 0.53 0.60 0.66 0.32
Northeast
$\mathrm{RMSPE}_{M}$ 0.640 0.431 0.805 0.391 0.962
$\mathrm{RMSPE}_{C}$ 0.588 0.349 0.714 0.332 0.878
$\mathrm{RMSPE}_{M,\mathrm{LAGOS}}$ 0.722 0.549 0.847 0.504 1.120
$\mathrm{RMSPE}_{C,\mathrm{LAGOS}}$ 0.614 0.400 0.773 0.397 1.103
$R^2_{M}$ 0.34 0.44 0.21 0.61 0.06
$R^2_{C}$ 0.44 0.63 0.38 0.72 0.21
$R^2_{M,\mathrm{LAGOS}}$ 0.16 0.09 0.12 0.34 −0.28
$R^2_{C,\mathrm{LAGOS}}$ 0.39 0.52 0.27 0.59 −0.24

Fig. 2. Estimated effects of landscape and lake-scale predictors on nutrient-productivity variables for the entire LAGOS-NE study extent (row 1), the Midwest subregion (row 2), and the Northeast subregion (row 3). Circles are posterior means and horizontal bars are 95% credible intervals. Effects with 95% credible intervals that overlap zero are shown in blue. TP, total phosphorus; TN, total nitrogen; CHL, chlorophyll a; Secchi, Secchi disk depth; NO3-N, nitrate.

Jointly modeling nutrient-productivity variables and leveraging information about the dependence among them led to substantial gains in predictive performance and, in particular, in the precision of predictions. The increased precision of the conditional predictions can be seen by comparing the marginal and conditional posterior predictive distributions in Fig. 3 for two lakes from the Midwest subregion, chosen at random for illustration. For a given lake, the conditional distributions are obtained by conditioning on whichever other nutrient-productivity variables are observed. For lake #1, all five nutrient-productivity variables were observed, so the conditional distribution for each variable was obtained given the other four. For this lake, the mode of each conditional posterior predictive distribution is closer to the observed value (improved accuracy), and the distributions are narrower (increased precision) than the marginal posterior predictive distributions (Fig. 3). Similar patterns are observed for lake #2; however, TP and TN were not observed for this lake. The conditional posterior predictive distributions for these unobserved quantities have less uncertainty and predict greater concentrations than the marginal predictions (Fig. 3). In addition, the observed Secchi disk depth for this lake is slightly lower than the mean of the predictive distribution given the predictors. Because the residuals of Secchi disk depth are negatively correlated with those of TP, TN, CHL, and NO3-N, the conditional distributions of TP, TN, CHL, and NO3-N for this lake are shifted to the right. The shift is least pronounced for NO3-N, which had the lowest residual correlation with Secchi disk depth.

Comparisons of the marginal and conditional RMSPE and predictive $R^2$ values in Table 2 summarize the gain in predictive performance; smaller RMSPE and larger predictive $R^2$ values indicate better predictive performance. For example, in the Midwest subregion, RMSPE for TP decreased from 0.762 to 0.637 when predictions were conditioned on the observed values of the other nutrient-productivity variables, and the predictive $R^2$ increased from 0.48 ($R^2_M$) to 0.64 ($R^2_C$; Table 2). Similar gains in predictive performance were observed for all nutrient-productivity variables and across all three regions. Importantly, however, the subregions differed in predictive performance for some nutrient-productivity variables. For instance, the predictive $R^2$ for NO3-N increased substantially in the Midwest subregion ($R^2_M$ = 0.12 vs. $R^2_C$ = 0.39), whereas the gain in the Northeast subregion was smaller ($R^2_M$ = 0.06 vs. $R^2_C$ = 0.21; Table 2).
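The size of these gains can be expressed as relative reductions in RMSPE (on the $\log_e$ scale of the responses). A short Python calculation using the marginal and conditional RMSPE values from Table 2:

```python
# (marginal, conditional) RMSPE pairs from Table 2
rmspe = {
    "Midwest": {"TP": (0.762, 0.637), "TN": (0.552, 0.428), "CHL": (1.125, 0.775),
                "Secchi": (0.743, 0.527), "NO3-N": (1.689, 1.404)},
    "Northeast": {"TP": (0.640, 0.588), "TN": (0.431, 0.349), "CHL": (0.805, 0.714),
                  "Secchi": (0.391, 0.332), "NO3-N": (0.962, 0.878)},
}

# Relative reduction = (marginal - conditional) / marginal
reduction = {
    region: {var: (m - c) / m for var, (m, c) in values.items()}
    for region, values in rmspe.items()
}
for region, values in reduction.items():
    print(region, {var: f"{r:.0%}" for var, r in values.items()})
```

In the Midwest the reductions range from about 16% (TP) to 31% (CHL); in the Northeast they range from about 8% (TP) to 19% (TN).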

Fig. 3. Marginal (solid black lines) and conditional (dotted lines) posterior predictive distributions from a joint nutrient-productivity model for two lakes in the Midwest subregion. TP, total phosphorus; TN, total nitrogen; CHL, chlorophyll a; Secchi, Secchi disk depth; NO3-N, nitrate. Vertical lines are observed values. Lake #1 (upper row) had all five nutrient-productivity variables observed, whereas TP and TN were not observed for lake #2 (bottom row).

Residual and shared environmental correlations

Pairwise shared environmental correlations were plotted against residual correlations for all nutrient-productivity variable pairs and models (Fig. 4). The partitioning of the effects of shared environmental drivers from residual interactions revealed which nutrient-productivity variables responded similarly to environmental conditions and which ones may be correlated due to ecological processes not accounted for by the predictor variables. For instance, for lakes in the Midwest subregion, TP, TN, and CHL tended to respond similarly to landscape drivers ( urn:x-wiley:00243590:media:lno10944:lno10944-math-0041jj′: TP,TN = 0.47, TP,CHL = 0.67, and TN,CHL = 0.32) and were also indicative of variables potentially driven by similar ecological processes (residual correlations: TP,TN = 0.35, TP,CHL = 0.54, TN,CHL = 0.41). Whereas, Secchi disk depth tended to be negatively correlated with shared environmental drivers when compared with TP, TN, and CHL ( urn:x-wiley:00243590:media:lno10944:lno10944-math-0042jj′: Secchi disk depth,TP = −0.68, Secchi disk depth,CHL = −0.91, and Secchi disk depth,TN = −0.35), and to respond in the opposite direction to shared environmental processes (residual correlations: Secchi disk depth,TP = −0.53, Secchi disk depth,CHL = −0.72, and Secchi disk depth,TN = −0.35). Similar patterns were observed in the other regions for these nutrient-productivity variables, although the magnitude of the correlations varied. One noticeable difference in the correlation partitioning across the two subregions was between NO3-N and the other nutrient-productivity variables. In the Midwest subregion, the correlations due to shared environmental drivers between NO3-N and TP and TN were much larger compared to the Northeast subregion (Midwest: NO3-N,TP = 0.37, NO3-N,TN = 0.76; Northeast: NO3-N,TP = 0.01, NO3-N,TN = 0.06). 
Residual correlations between NO3-N and TP and TN were also larger in the Midwest subregion than in the Northeast subregion, especially for NO3-N and TN (Midwest: NO3-N,TN = 0.49; Northeast: NO3-N,TN = 0.29). In addition, in the Northeast subregion, NO3-N shared environmental and residual correlations tended to cluster more closely around zero than in the Midwest subregion (Fig. 4).
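The partitioning described above can be sketched in a few lines. This section does not spell out the exact computation, so the sketch assumes a multivariate regression in which each lake's response vector has mean XB (landscape predictors times coefficients) and residual covariance Sigma. It takes the shared environmental correlation to be the correlation, across lakes, of the fitted means XB, a convention common in joint species distribution modeling, and the residual correlation to be Sigma rescaled to a correlation matrix. The function names `partition_correlations` and `cov_to_corr`, the predictor labels, and all numeric values are hypothetical.

```python
import numpy as np

def cov_to_corr(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

def partition_correlations(X, B, Sigma):
    """Split response correlations into shared environmental and residual parts.

    X     : (n_lakes, p) predictor matrix (may include an intercept column)
    B     : (p, k) regression coefficients, one column per response
    Sigma : (k, k) residual covariance matrix
    Assumes every response has at least one non-intercept effect;
    otherwise its environmental variance is zero and the correlation is undefined.
    """
    eta = X @ B                               # fitted mean of each response in each lake
    env_cov = np.cov(eta, rowvar=False)       # covariance of fitted means across lakes
    return cov_to_corr(env_cov), cov_to_corr(Sigma)

# Hypothetical example: responses are [TP, TN, NO3-N]
rng = np.random.default_rng(1)
n = 5000
Z = rng.normal(size=(n, 3))                   # standardized "agriculture", "depth", "deposition"
X = np.column_stack([np.ones(n), Z])
B = np.array([
    [2.0, 1.5, 0.5],    # intercepts: constant across lakes, so no effect on correlations
    [1.0, 0.8, 0.0],    # agriculture affects TP and TN
    [-0.5, 0.0, 0.0],   # depth affects TP only
    [0.0, 0.0, 0.7],    # deposition affects NO3-N only
])
Sigma = np.array([
    [1.0, 0.3, 0.0],
    [0.3, 0.25, 0.0],
    [0.0, 0.0, 2.0],
])
env_corr, res_corr = partition_correlations(X, B, Sigma)
print(np.round(env_corr, 2))
print(np.round(res_corr, 2))
```

In this toy setup, TP and TN load on the same hypothetical agricultural predictor, so their environmental correlation is high (about 0.89), whereas NO3-N depends only on a separate deposition-like predictor, so its environmental correlations with TP and TN are near zero, qualitatively resembling the Northeast pattern. The residual TP,TN correlation is 0.3 / (1.0 × 0.5) = 0.6. In a Bayesian fit, the same calculation could be applied to each posterior draw of B and Sigma to obtain posterior means and credible intervals such as those shown in Fig. 4.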


Estimated residual and environmental correlations between pairs of nutrient-productivity variables for the LAGOS-NE study extent (a) and the Midwest (b) and Northeast (c) subregions. Circles are posterior means and error bars are 95% credible intervals.

Discussion

Our results demonstrate that jointly modeling nutrient-productivity variables effectively integrates nutrient-productivity and landscape-based regression approaches to understand macroscale drivers of water quality, while simultaneously accounting for dependence among indicators. This approach also readily accommodates missing nutrient-productivity data, which allows lakes to be included in analyses from which they otherwise might have been excluded. To date, nutrient-productivity variables have rarely been modeled jointly. One exception is Cha et al. (2016), who jointly modeled TN and TP to examine N and P limitation in aquatic systems. Their work focused on the spatial and temporal dynamics of N and P limitation but did not include landscape predictors or decompose the correlation structure. They emphasized the utility of jointly modeling nutrients for furthering understanding of the potential (de)coupling of nutrients across space and time. Our results also highlight the gain in predictive performance achieved by jointly modeling nutrient-productivity variables, which is important given the widespread use of predictions from limnological models to help inform management decisions (Jones and Bachmann 1976). Specifically, conditioning on other observed nutrient-productivity variables substantially improves the precision and accuracy of predictions.
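Why conditioning helps can be illustrated with the standard conditional distribution of a multivariate normal. For this sketch only, assume that a lake's (e.g., log-transformed) nutrient-productivity variables are jointly normal, with a mean vector given by the landscape regression and a covariance matrix Sigma. The unobserved variables (u) given the observed ones (o) are then normal, with mean mu_u + Sigma_uo Sigma_oo^-1 (y_o - mu_o) and covariance Sigma_uu - Sigma_uo Sigma_oo^-1 Sigma_ou. The helper `conditional_mvn` is hypothetical and works with point values. A full Bayesian treatment would repeat it for each posterior draw.

```python
import numpy as np

def conditional_mvn(mu, Sigma, y, observed):
    """Conditional mean and covariance of unobserved variables given observed ones.

    mu       : (k,) mean vector (e.g., from landscape predictors)
    Sigma    : (k, k) covariance matrix
    y        : (k,) responses; entries for unobserved variables are ignored
    observed : (k,) boolean mask, True where the variable was measured
    Requires at least one observed and one unobserved variable.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    y = np.asarray(y, dtype=float)
    o = np.asarray(observed, dtype=bool)
    u = ~o
    S_oo = Sigma[np.ix_(o, o)]
    S_uo = Sigma[np.ix_(u, o)]
    S_uu = Sigma[np.ix_(u, u)]
    # K = S_uo @ inv(S_oo), computed with solve() instead of an explicit inverse
    K = np.linalg.solve(S_oo, S_uo.T).T
    cond_mean = mu[u] + K @ (y[o] - mu[o])   # shift toward what was observed
    cond_cov = S_uu - K @ S_uo.T             # never larger than the marginal S_uu
    return cond_mean, cond_cov

# Hypothetical two-variable example: [TP, CHL] on a log scale
mu = np.array([3.0, 2.0])
Sigma = np.array([[1.0, 0.54],
                  [0.54, 1.0]])
y = np.array([np.nan, 3.0])        # TP unobserved, CHL observed
observed = np.array([False, True])
tp_mean, tp_cov = conditional_mvn(mu, Sigma, y, observed)
print(tp_mean, tp_cov)             # approximately [3.54] and [[0.7084]]
```

With hypothetical unit variances and a correlation of 0.54 (the Midwest TP,CHL residual correlation reported in the Results), observing a CHL value one unit above its expected value shifts the predicted TP mean up by 0.54 and shrinks its variance from 1.0 to 1 − 0.54² = 0.7084. Under this normality assumption, the same mechanism allows observed CHL, Secchi disk depth, and NO3-N for a lake such as lake #2 to sharpen predictions of its unobserved TP and TN.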

The effects of landscape predictors on nutrient-productivity variables were as expected and similar to those reported by other studies that examined landscape drivers of lake nutrients across large spatial extents (Wagner et al. 2011; Read et al. 2015; Collins et al. 2017). In addition, the effects of landscape predictors varied spatially (i.e., between subregions), which suggests regional differences in the dominant drivers of lake nutrients and productivity and in the effects of those drivers. Spatially varying effects of landscape predictors on lake nutrients and productivity have been identified previously (Soranno et al. 2014). For example, Fergus et al. (2016) used spatially varying coefficient models to explicitly account for spatial heterogeneity in the effects of TP and water color when predicting CHL; allowing predictor effects to vary over space improved model fit and predictive performance compared with models that did not (Fergus et al. 2016). Our results likewise support the importance of incorporating spatially varying or region-specific coefficients: predictive performance for lakes in the Midwest and Northeast subregions was lower when predictions were made with the model fitted to the full LAGOS-NE extent than with the region-specific models.

We observed relatively large, positive residual correlations among TP, TN, and CHL across all analyses. In the case of TP and TN, these positive residual correlations may be the result of coupled biogeochemical cycles and the response of algal communities to increased nutrient loading (Schindler 1978; Cha et al. 2016). Cha et al. (2016) also observed large correlations between N and P in Finnish lakes, which they concluded were indicative of similar rates of N and P biogeochemical cycling. In addition, relatively large negative residual correlations were observed between Secchi disk depth and each of TP, TN, and CHL. This was expected, as Secchi disk depth is generally negatively correlated with nutrients and CHL (Canfield and Bachmann 1981). Interestingly, however, the residual correlations between NO3-N and the other nutrient-productivity variables varied spatially: they were near zero in the Northeast subregion and positive in the Midwest subregion. These regional differences could result from differences in the strength of the coupling between the N and carbon biogeochemical cycles, a coupling that may play a larger role in northeastern lakes through stoichiometric controls and microbial processes (Taylor and Townsend 2010). Differences in atmospheric chemistry, in the nutrient processing dynamics of dominant land cover types, and in lake internal processing may also contribute to the apparent decoupling of NO3-N from other nutrients in Northeast lakes, particularly from TN, compared with Midwest lakes (Bernhardt et al. 2002; Goodale et al. 2003). Understanding this potential (de)coupling of nutrients across environmental gradients is important in the context of global change. For example, extreme temperature events may decouple some biogeochemical cycles by altering microbial processes (Mooshammer et al. 2017). These results suggest that further investigation of spatially varying covariances among nutrient-productivity variables may be warranted, both to understand their implications for model predictive performance and to understand how (de)coupling may vary spatially and in response to global change.

In addition to relatively large positive residual correlations among TP, TN, and CHL, we also observed strong positive shared environmental correlations among these variables and strong negative shared environmental correlations between these variables and Secchi disk depth. The positive environmental correlations reflect the positive relationship between nutrient loading and primary production (Schindler 1978). These patterns were consistent across all analyses and highlight the similarity between our subregions in the dominant landscape (e.g., agricultural land use) and lake-level (e.g., lake depth) drivers that influence the observed spatial variability of key nutrients and, thus, of algal biomass and water clarity. As with the residual correlations, the correlations due to shared environmental drivers between NO3-N and the other indicators varied spatially: they were closer to zero in the Northeast subregion and positive in the Midwest subregion. These regional differences may be related to the dominant source of NO3-N to lakes in each region. In the Midwest subregion, the dominant source of NO3-N is agricultural land use practices (Van Metre et al. 2016), which are also a significant source of P; this may produce similar shared environmental correlations among nutrients and biological responses (e.g., CHL concentrations). In contrast, the dominant source of NO3-N in the forested Northeast subregion is atmospheric deposition (Aber et al. 2003), which may weaken the shared environmental correlations of NO3-N with landscape-derived nutrients (e.g., P). These patterns also partly reflect the smaller number of landscape-based predictors that were important for predicting nutrients in the Midwest than in the Northeast subregion, with lake depth, stream density, and agricultural land use playing important roles as drivers of nutrients and lake productivity in the Midwest. Conversely, a more diverse set of predictors was important in the Northeast subregion, suggesting that factors beyond agricultural inputs and internal lake processing may drive nutrients and productivity in those lakes.

Summary

Understanding the dominant drivers of lake nutrients and productivity and the coupling of biogeochemical cycles, and using empirical models to predict water quality, are necessary to help guide management and conservation of lake ecosystems. This is particularly true when examining lakes across macroscales, because some of the primary stressors of freshwater ecosystems operate across large spatial extents. For example, understanding how land use change, climate change, and increasing human demands on freshwater systems will affect water quality at regional, continental, and global scales is of growing importance (Woodward et al. 2010). Indeed, changes in environmental factors, such as increasing temperatures, may interact with increased nutrient loading to exacerbate the symptoms of eutrophication (Moss et al. 2011). Jointly modeling nutrient-productivity variables provides a useful analytical framework for increasing knowledge of environmental drivers and of the coupling of nutrients across space and time, and for improving limnological predictions at macroscales.

Acknowledgments

We thank the Continental Limnology team for discussions that helped improve this work. We also thank Pat Soranno, Emily Stanley, and Craig Stow for comments on an earlier draft that improved this manuscript. This research was funded by the National Science Foundation (EF-1638679; EF-1638554; EF-1638539; and EF-1638550). Use of trade names is for identification purposes only and does not imply endorsement by the US government.

Conflict of Interest

None declared.