Updated methods for global locally interpolated estimation of alkalinity, pH, and nitrate

We have taken advantage of the release of version 2 of the Global Data Analysis Project data product (Olsen et al. 2016) to refine the locally interpolated alkalinity regression (LIAR) code for global estimation of total titration alkalinity of seawater (A_T), and to extend the method to also produce estimates of nitrate (N) and in situ pH (total scale). The updated MATLAB software and methods are distributed as Supporting Information for this article and referred to as LIAR version 2 (LIARv2), locally interpolated nitrate regression (LINR), and locally interpolated pH regression (LIPHR). Collectively they are referred to as locally interpolated regressions (LIRs). Relative to LIARv1, LIARv2 has an 18% lower average A_T estimate root mean squared error (RMSE), improved uncertainty estimates, and fewer regions in which the method has little or no available training data. LIARv2, LINR, and LIPHR produce estimates globally with skill that is comparable to or better than regional alternatives used in their respective regions. LIPHR pH estimates have an optional adjustment to account for ongoing ocean acidification. We have used the improved uncertainty estimates to develop LIR functionality that selects the lowest-uncertainty estimate from among possible estimates. Current and future versions of LIR software will be available on GitHub at https://github.com/BRCScienceProducts/LIRs.

The locally interpolated alkalinity regression (LIAR) method and software were developed to estimate A_T globally from other measurable seawater properties (Carter et al. 2016b). The original application for the method was providing A_T estimates as a second carbonate parameter for use with data from the emerging network of biogeochemical floats that measure pH (Johnson and Claustre 2016; Wanninkhof et al. 2016). However, LIAR may also prove useful for studies or models interested in estimating a climatological A_T baseline with limited variability or deviations from such a baseline (e.g., Carter et al. 2016a).
Locally interpolated nitrate regression (LINR) and locally interpolated pH regression (LIPHR) are primarily intended to provide cross-comparisons for nitrate (N) and pH sensor measurements that can be used to assess potential float sensor errors or measurement drifts. Profiling biogeochemical floats cannot typically be retrieved for sensor recalibration, so it is important to have independent means to assess such problems that may arise during or after float deployment. A common approach to this problem is to use known atmospheric, surface, or climatological concentrations (Takeshita et al. 2013; Bushinsky et al. 2016; Plant et al. 2016) to recalibrate sensors, but such known values are not always available for N and pH. LINR and LIPHR are designed to provide estimated values in the stable 1000-2000 m depth range of the ocean as alternatives. All three locally interpolated regressions (LIRs) have secondary scientific applications when A_T, N, or pH estimates are desirable and some seawater property information is available.
By default, LIRs have the limitation that they are unable to capture changes in the relationships between the estimated properties and the predictor properties. An example of such an unresolved change comes from the influence of ocean acidification (OA), the effect of continually increasing ocean storage of anthropogenic carbon dioxide (CO2) on seawater pH. LIPHR contains an option to adjust for the effects of OA on pH, but we expect OA-induced pH changes to result in LIPHR estimates becoming less skillful over time even when this adjustment is used, because the adjustment does not account for regional or temporal variations in the rate of OA. All three LIRs are expected to be most skillful at reproducing measurements below the ocean surface where the effects of OA and other changes are smaller, or for estimates made close in time and space to the measurements used to train the LIRs. Another limitation of these algorithms is that they break down whenever relationships between predictors and the estimated properties become significantly nonlinear. An example of a region where estimate skill would be expected to be diminished by this limitation is the margins of O2-deficient zones, where the influences of both denitrification and aerobic respiration can be important.
Regressions for estimating pH, N, and A_T have been reported numerous times. A_T regressions are the most common variant (e.g., Millero et al. 1998; Lee et al. 2006; Alin et al. 2012; Bostock et al. 2013; Sasse et al. 2013; Velo et al. 2013; McNeil and Sasse 2016), with regressions for pH being less frequently reported (e.g., Juranek et al. 2011; Alin et al. 2012; Williams et al. 2016) and nitrate regressions being less frequently reported still (e.g., Williams et al. 2016; Supporting Information). The LIRs presented here make improvements over earlier versions with respect to global applicability, ease of use, and the ability to scale uncertainty estimates based on input uncertainties. Critically, they also produce estimates that reproduce pH measurements at least as skillfully as earlier versions. The bulk of the improvement results from the larger quantity and span of data available through the Global Data Analysis Project version 2 (GLODAPv2) data product (Olsen et al. 2016) than was available to train earlier methods. A similar method to the LIRs developed recently is the "carbonate system and nutrients concentration from hydrological properties and oxygen using a neural-network" (CANYON) approach (Sauzède et al. 2017). CANYON was also trained using the GLODAPv2 data product and is capable of estimating pH, A_T, silicate (Si), N, total dissolved inorganic carbon (C_T), and pCO2 globally from O2, temperature, salinity (S), latitude, longitude, depth, and day of year. We expect the LIRs we propose here will provide complementary estimates to those provided by CANYON for most applications, and note that the LIRs presented here do not require O2 and temperature as measurement inputs.
In the remainder of this article, we describe version 2 of the LIAR software (LIARv2) in the context of the improvements relative to version 1 (LIARv1: Carter et al. 2016b), and extend the LIR approach to nitrate and in situ total scale seawater pH estimates with LINR and LIPHR. Particular attention is paid to new procedures required to address complications with extending the LIR framework to pH measurements.

Summary of LIR methods
As with LIARv1, the LIR methods developed here use regression coefficients that are determined at each location on a 5° latitude and longitude grid with 33 depth surfaces (44,957 total locations). Each set of regression coefficients is determined using a robust multiple linear regression (MLR) of the subset of measurements from the global training dataset that are found within a volume defined by latitude, longitude, and depth/density windows of the grid coordinates (the same grid used by Carter et al. 2016b). The windows used are 5° for latitude, 10°/cos(latitude) for longitude, and either 0.01 kg m⁻³ for potential density or 50 m for depth (whichever is more inclusive). The dimensions of these windows are iteratively scaled by a factor of the iteration number until at least 100 measurements are selected to train each regression. When generating estimates, the LIAR software then interpolates between regression coefficients specific to these grid locations to arbitrary locations where the user desires regression estimates. LIARv2 works with 16 different combinations of the predictor variables: salinity S, potential temperature θ, nitrate N, apparent oxygen utilization (AOU), and silicate (Si). LINR uses the same combinations as LIAR with phosphate P in place of N in the eight regressions that included N. LIPHR uses the same predictors as LIAR, but also includes depth (z) in meters as a predictor. This additional predictor is intended to allow for the effects of pressure on in situ pH. The specific combinations of variables used are indicated in "LIARv2," "LIPHR," and "LINR" sections. A full description of the LIARv1 method is provided by Carter et al. (2016b). In this update, we focus on how LIARv2, LIPHR, and LINR adapt and improve upon the LIARv1 methods.
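The window-scaling step described above can be sketched in a few lines of Python. This is a minimal illustration and not the distributed MATLAB code: the function name `select_training_subset`, the treatment of the windows as half-widths, the use of the depth window only (the density alternative is omitted), and the iteration cap are all assumptions made for the example.

```python
import numpy as np

def select_training_subset(lat, lon, depth, grid_lat, grid_lon, grid_depth,
                           min_count=100, max_iter=50):
    """Return (mask, k): measurements inside the window after k expansions.

    Assumed base windows: 5 deg latitude, 10/cos(latitude) deg longitude,
    50 m depth, each multiplied by the iteration number k.
    """
    lat, lon, depth = (np.asarray(v, dtype=float) for v in (lat, lon, depth))
    # Longitude window widens toward the poles; guard against cos(lat) = 0.
    lon_window = 10.0 / max(np.cos(np.radians(grid_lat)), 1e-6)
    # Wrap longitude differences into [-180, 180).
    dlon = (lon - grid_lon + 180.0) % 360.0 - 180.0
    mask = np.zeros(lat.shape, dtype=bool)
    for k in range(1, max_iter + 1):
        mask = ((np.abs(lat - grid_lat) <= 5.0 * k)
                & (np.abs(dlon) <= lon_window * k)
                & (np.abs(depth - grid_depth) <= 50.0 * k))
        if mask.sum() >= min_count:
            return mask, k
    # Cap reached: return whatever the largest window selected.
    return mask, max_iter
```

The selected subset would then be passed to a robust MLR to obtain the coefficients for that grid location.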
In some instances where spectrophotometric pH measurements are unavailable, we use in situ total scale pH as calculated from A_T and C_T. These calculations were made with carbonate constants from Lueker et al. (2000), borate dissociation coefficients from Dickson (1990), total borate from Lee et al. (2010), and the HF dissociation constant (KF) from Perez and Fraga (1987). Calculations were performed using the CO2SYS for MATLAB routine by van Heuven et al. (2011).

Data products used to train and test LIRs
The primary improvement in LIARv2 relative to LIARv1 stems from regression coefficients having been re-estimated using the GLODAPv2 data product. All measured and calculated values in GLODAPv2 were used except those from 161 cruises (40,303 measurements) that had A_T quality control (QC) adjustments of ±10 μmol kg⁻¹ or greater, were flagged as poor data, or were not quality controlled for A_T (Olsen et al. 2016). The new training data set comprises 236,852 A_T measurements and A_T estimates from CO2 system calculations based on other CO2 parameters, 211,704 of which had the property measurements required for training all 16 regressions (Fig. 1). The LIAR test data set omits the 2279 calculated A_T values that are included in the training data set. We use the coefficient re-estimation strategy used by Carter et al. (2016b) to allow overlap between our training and test data sets without compromising the validity of the assessments (described in "Assessment" section).
LINR regression coefficients were estimated using 684,475 N measurements, 569,761 of which had associated property measurements required for training all 16 regressions. This training dataset is all GLODAPv2 data product N measurements except those from 187 cruises that had multiplicative adjustments greater than 10%, that were not QC'd, or that were flagged as having poor quality measurements. GLODAPv2 QC protocols changed reported negative N values to 0 μmol kg⁻¹. The LINR code does likewise. The LINR test data set is identical to the training data set.
There are several additional difficulties for constructing a consistent data product for training LIPHR that originate from changes in ocean pH and in pH measurement practices over time. Dealing with these inconsistencies requires understanding several adjustments that we and others (Olsen et al. 2016) have made to pH measurements and estimates. We list these adjustments here and explain them in this section and the next.
1. GLODAPv2 adjustments: These are recommended adjustments to cruise pH, A_T, and C_T measurements based on deep crossovers (Olsen et al. 2016). We do not use these adjustments for pH, though we do use them for A_T and C_T.
2. Impure-dye adjustments: These are adjustments to pH measurements that we make for pH values measured using impure dye (i.e., commercially available indicator dye that has not been specially purified). These adjustments are intended to bring these values in line with pH calculated from A_T and C_T. They are detailed below.
3. Calculation-to-purified-dye-pH adjustment: This is a single adjustment we apply to impure-dye measurements (after they have been adjusted by the impure-dye adjustment) and to calculated pH values. This adjustment is intended to bring these values in line with pH measurements made with purified dyes. LIPHR includes optional code to apply the inverse of this adjustment to returned pH estimates if the user desires estimates that match what would be expected for pH calculations from A_T and C_T. This adjustment is also detailed below.
4. OA adjustment: This is an optional adjustment applied to LIPHR pH estimates to reflect the impacts of ongoing OA on seawater pH (detailed in "An OA adjustment for pH estimates" section).
The primary additional difficulty for pH stems from the variety of ways pH is measured or calculated, as well as the evolution of accepted best practices for pH measurement over the decades for which GLODAPv2 contains data. GLODAPv2 contains a mixture of pH calculated from carbonate system measurements, pH measured using electrodes, and pH measured spectrophotometrically. Also, although the spectrophotometric pH method has been used since the early 1990s, Yao et al. (2007) revealed that impurities in the indicator dye used can significantly bias spectrophotometric pH measurements, and Liu et al. (2011) subsequently published calibration equations that allow seawater pH measurements to be made using purified m-cresol purple dye. Others (Carter et al. 2013; Patsavas et al. 2015; Williams et al. 2017) have since shown that measurements with purified dyes appear to have an (unexplained) broadly consistent but pH-dependent discrepancy from the pH calculated from combinations of A_T, C_T, and pCO2, whether calculated at in situ or laboratory conditions (Fig. 2c). This pH-dependent discrepancy is not unique to a single pH sample handling approach, as it exists for both manual and automated pH measurements. It exists also for multiple carbonate constant sets (Carter et al. 2013). It exists for multiple characterizations of the properties of purified dyes: there is a small pH-dependent discrepancy between spectrophotometric pH obtained from various sets of purified dye coefficients (Liu et al. 2011; DeGrandpre et al. 2014), but the discrepancy (ranging from ~0.006 at a pH of 8.2 to ~0.002 at a pH of 7.4) is too small to account for the differences between calculated pH and pH measured with purified dyes. The pH-dependent pH discrepancy is less apparent for electrode pH measurements (Fig. 2a) and impure dye measurements (Fig. 2b) considered collectively across many cruises.
However, there are many strongly differing discrepancy relationships visible when impure dye measurements are considered on a cruise-by-cruise basis (see Supporting Information Figures), with some discrepancies increasing and some decreasing with pH. It should be noted that Fig. 2c includes no measurements from the subset of research groups that produced impure dye measurements showing a relationship between the pH discrepancy and pH with a negative slope.
A second complication arises in the GLODAPv2 data product QC process. This data product relies on deep crossovers to obtain measurement adjustments intended to bring measurements from various cruises in line with one another. However, the variety of pH-dependent pH discrepancies found in various cruises casts doubt on the comparability of deep-ocean pH measured on different cruises. Adjustments based on forcing an agreement at depth between pH distributions obtained with different approaches could therefore create, exacerbate, or inadequately capture discrepancies at the surface.
Our approach to these challenges is to first divide the data into three subsets and then apply linear adjustments to the first two subsets to make them comparable to the third. The first subset is the earlier measurements made with impure dyes. The second subset is pH calculated from A_T and C_T. These two subsets collectively comprise the majority of the GLODAPv2 data product. The third subset is the subset of the GLODAPv2 data product where pH was measured with purified dyes. We augment the purified dye subset with 11 cruises conducted too recently to appear in the GLODAPv2 data product (Expocodes: 096U20160108, 096U20160426, 29HE20130320, 318M20130321, 320620140320, 320620151206, 33AT20120324, 33RO20150410, 33RO20150525, and 33RO20161119). We further add data from two recent cruises measured with impure dye to the impure-dye subset (33RO20130803, 33RO20131223). Data from one additional recent cruise using purified dyes along the I09N transect (33RR20160208) is withheld from the pH training data set entirely to provide a completely independent assessment ("Example section"). Linear pH-dependent adjustments (Δ_1→2, adjustment 2) are applied separately to each cruise measured with impure dyes to make the pH measurements comparable to the "calculated pH" subset. The coefficients for these adjustments are determined with a robust linear regression of the pH discrepancy (measured minus calculated) against measured pH. Coefficients for these adjustments are supplied as Supporting Information. Next, a single pH-dependent adjustment (Δ_2→3, adjustment 3, ~ +0.004 to −0.020, Fig. 2b) is applied to the combination of the second subset and the adjusted first subset to make them comparable to the third "purified-dye" subset. The adjustment is given in Eq. 1 (Fig. 2c). After applying Δ_2→3, the combined training pH data set has a pH-dependent pH discrepancy with calculated pH (Fig. 3).
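As an illustration of the per-cruise impure-dye adjustment, the following Python sketch fits a robust line to the pH discrepancy (measured minus calculated) against measured pH and removes the fitted discrepancy. The choice of Huber-weighted iteratively reweighted least squares is an assumption made for the example; the text specifies only "a robust linear regression," and the function names `robust_line` and `adjust_impure_dye_ph` are hypothetical.

```python
import numpy as np

def robust_line(x, y, n_iter=50, c=1.345):
    """Fit y ~ b0 + b1*x with Huber-weighted iteratively reweighted least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)
    beta = np.zeros(2)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ beta
        # Robust scale from the median absolute deviation of the residuals.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / (c * scale)
        w = 1.0 / np.maximum(u, 1.0)  # Huber weights: 1 inside the cutoff, 1/u outside
    return beta

def adjust_impure_dye_ph(ph_measured, ph_calculated):
    """Remove the fitted pH-dependent discrepancy from one cruise's measurements."""
    ph_measured = np.asarray(ph_measured, dtype=float)
    ph_calculated = np.asarray(ph_calculated, dtype=float)
    b0, b1 = robust_line(ph_measured, ph_measured - ph_calculated)
    return ph_measured - (b0 + b1 * ph_measured)
```

Applied cruise by cruise, this brings the impure-dye values in line with calculated pH before the single purified-dye adjustment is applied to the combined subsets.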
Adjustments to the impure data are designed to take the place of the recommended GLODAPv2 adjustments (adjustment 1), and, except when noted, pH data presented herein do not include the GLODAPv2 adjustments. Supporting the decision to omit the GLODAPv2 pH adjustments, the algorithms we produce have a ~3% smaller RMSE and 4% smaller average bias when reproducing the unadjusted data than adjusted-data-trained algorithms have when reproducing adjusted data. Our use of the purified-dye adjustment (adjustment 3) reflects our need for a consistent training data product and not any confidence that purified dye measurements are necessarily more accurate representations of the "true" seawater pH than pH calculations. The apparent pH-dependent pH discrepancy remains an unresolved challenge to our carbonate system knowledge. Our strategy is to allow LIPHR users to decide whether pH estimates specific to purified dye measurements or pH calculations with Lueker et al. (2000)'s carbonate chemistry coefficients are more appropriate for their own applications. LIPHR therefore includes an optional counter-adjustment for adjustment 3 (Δ_3→2) derived from Eq. 1 to return pH estimates that are consistent with pH calculated from A_T and C_T. Broadly, we recommend the default "purified dye estimates" without this counter-adjustment when pH is the parameter of interest, and "calculation-pH estimates" with this adjustment when LIPHR estimates are being used as one of two constraints to estimate another carbonate system parameter. Whichever is used, the user should be aware of this mismatch in our understanding of carbonate system chemistry.
In total, the LIPHR training data set consists of 51,325 impure-dye measurements (adjusted with Δ_1→2 and Δ_2→3); 99,061 calculated pH values (adjusted with Δ_2→3); and 35,383 unadjusted purified dye measurements (185,769 total measurements). The test data set contains only the 35,383 purified dye measurements. These data sets exclude 416 electrode pH measurements and 14,983 impure dye measurements for which no calculated pH value was available. These totals also exclude measurements and calculations from cruises that had GLODAPv2 pH adjustments estimated to be larger than ±0.015 pH units, that were calculated from cruises with (applied) total dissolved inorganic carbon (C_T) or total seawater titration alkalinity (A_T) GLODAPv2 adjustments greater than ±10 μmol kg⁻¹, or that were flagged as having poor quality pH measurements. When viable pH measurements and calculations were both available for a sample, only the pH measurements were included. We also omitted data from seven cruises (Expocodes: 49K619990523, 49HG19950414, 49HG19940413, 49HG19930807, 49HG19930413, 33RR19971202, 318M19940327) either because they came from series of cruises with large and variable GLODAPv2 adjustments or because the calculated and measured pH values did not agree to within a ±0.03 root mean squared (RMS) or ±0.015 average difference. A full list of cruises and how they were classified is provided in Supporting Information.
An OA adjustment for pH estimates
Johnson et al. (2017) find that recent profiling float sensor pH measurements are significantly lower than most nearby pH stations in the GLODAPv2 record, and that these disagreements are largest in the better-ventilated surface ocean. LIPHR includes an optional adjustment (on by default) to reflect these expected effects of OA on modern and future seawater pH (adjustment 4). For this adjustment, the rate of pH change (γ_OA) is approximated from the robust regression:
$$\mathrm{pH}_{\mathrm{TestData}} - \mathrm{pH}_{\mathrm{LIPHR}} = \gamma_{\mathrm{OA}} \left( D_{\mathrm{TestData}} - D_{\mathrm{TrainingData}} \right) \qquad (2)$$
This is a regression between the reconstruction error (pH_TestData − pH_LIPHR) as the dependent variable and the difference (D_TestData − D_TrainingData) between the mean decimal years of the training measurements used to estimate the regression coefficients (D_TrainingData) and the decimal years of the test data (D_TestData) as the independent variable. The term "decimal years" is used to mean the year (C.E.) with a decimal added to represent the fraction of 365 d elapsed in that year (such that a measurement on the 200th day of 2020 would be represented by ~2020.55). This regression has been performed for the reconstructions of 10 subsets of the GLODAPv2 data product separated at every 10th percentile of potential density (σ_θ) (Fig. 4). If the OA adjustment is enabled in the LIPHR code, γ_OA is linearly interpolated to the σ_θ estimated for the query data location and the adjusted LIPHR estimate (pH*_LIPHR) is supplied as given in Eq. 3. The OA pH change rates we find here are consistent with previous estimates (e.g., Feely et al. 2009). These simplistic OA adjustments may be poor estimates of the impacts of OA on seawater pH generally because they treat all water of a given density identically despite strong regional differences in the degree of water mass ventilation and C_anth storage.
Nevertheless, we believe the optional adjustment is useful for LIPHR pH estimates made in the coming decades, and note that including the adjustment decreases mean estimate bias by 85% and RMSE by ~51%. Due to the progressive effects of OA, we contend this adjustment will be yet more important for modern estimates than for our test data set. Limited experimentation suggested additional cruises would be needed to adequately constrain regional differences in this adjustment. The LIPHR code therefore contains an option for users to input γ_OA estimates that are specific to the OA rates found in their study regions, if desired. The assessment values we report in "Assessment" section include the OA adjustment.
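The decimal-year convention and the way an OA rate could be applied to an estimate can be sketched as follows. This is a hedged illustration: the form of the adjustment (unadjusted estimate plus the interpolated rate times the elapsed decimal years since the training data) is an assumption that follows the regression definition in this section, the function names are hypothetical, and the rate values in the usage note are invented placeholders, not the values shown in Fig. 4.

```python
import numpy as np

def decimal_year(year, day_of_year):
    """Year plus the fraction of 365 d elapsed, as defined in the text."""
    return year + day_of_year / 365.0

def oa_adjusted_ph(ph_liphr, sigma_theta, query_decimal_year,
                   training_decimal_year, sigma_nodes, gamma_nodes):
    """Apply an assumed linear-in-time OA adjustment to a pH estimate.

    sigma_nodes must be increasing; gamma_nodes are pH change rates (per year)
    at those potential densities and are interpolated linearly to sigma_theta.
    """
    gamma = np.interp(sigma_theta, sigma_nodes, gamma_nodes)
    return ph_liphr + gamma * (query_decimal_year - training_decimal_year)
```

For example, `decimal_year(2020, 200)` evaluates to about 2020.55, matching the example in the text. With placeholder rates of −0.002 and 0.0 per year at densities of 25 and 27, an estimate of 8.0 at a density of 26 queried 20 years after the mean training date would be lowered by 0.02.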

Update to uncertainty estimation
The LIRs generate uncertainty estimates for each property estimate returned. As with LIARv1, uncertainty estimates (E_Est) are quantified as given in Eq. 4. E terms refer to the RMS uncertainties as assessed in the "Assessment" section. E_Meas represents A_T, N, and pH measurement uncertainties in our data product, and is assumed to be a constant 2.8 μmol kg⁻¹ A_T, 0.3 μmol kg⁻¹ N, and 0.005 pH units, respectively (Olsen et al. 2016). U_j are the n input uncertainties for the predictor properties provided by the user, or default uncertainties if no U values are provided. The default uncertainties are now 0.005 for S, 0.005°C for θ, 1% for O2, and 2% for N, P, and Si. The α_j terms are the n regression coefficients used in the estimate. E_MLR represents the component of the overall uncertainty inherent to regression-based estimates. It is estimated for LIR outputs using estimates of E_MLR that are specific to each of the 16 equations and to 10 depth ranges (for N and pH) or 50 ranges of depth and S (for A_T). These ranges correspond to every 10th percentile of depth and/or salinity in the training data product (with a single range spanning the 20th through 80th percentile of salinity). The E_MLR estimates for these ranges are obtained by solving Eq. 4 for E_MLR using assessment data with known E_Est. These range-specific E_MLR estimates are then interpolated by these properties to the depth and/or salinity inputs for the E_Est calculations. LINR and LIPHR errors also scale slightly with salinity, but not as strongly as LIAR errors do because of the smaller impact of freshwater cycling on N and pH than on A_T. All LIR uncertainties increase near the surface due to a larger impact of seasonality, episodic biogeochemical cycling, and gas exchange.
Fig. 4. The average annual rate of OA-related impacts on LIPHR estimate errors (γ_OA) calculated for every 10th percentile of potential density (σ_θ) in the GLODAPv2 data product. If the optional OA adjustment is used (Eq. 3), LIPHR uses user-provided dates with this relationship to adjust estimates it returns for the effects of OA. The green envelope indicates 95% confidence intervals of the fits. The blue envelope shows the larger confidence intervals obtained if one degree of freedom is assumed for each cruise rather than each measurement. Values in this figure are calculated using regression 7 (of the 16 regressions LIPHR can employ, see Table 2). Values for the other 15 regressions would be within ~ ±0.0005 yr⁻¹ of these.
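The way the uncertainty components described above might be combined can be sketched in Python. The exact form of Eq. 4 is not reproduced in this text, so the sketch assumes the conventional root-sum-of-squares combination of independent terms, with the input term summed over the products of each regression coefficient and its input uncertainty. The function names are hypothetical.

```python
import numpy as np

def estimate_uncertainty(e_mlr, e_meas, coefficients, input_uncertainties):
    """Combine uncertainty terms in quadrature (assumed form, see lead-in)."""
    alpha = np.asarray(coefficients, dtype=float)
    u = np.asarray(input_uncertainties, dtype=float)
    # Summed input term: each predictor uncertainty scaled by its coefficient.
    e_input_sq = np.sum((alpha * u) ** 2)
    return float(np.sqrt(e_mlr ** 2 + e_meas ** 2 + e_input_sq))

def solve_for_e_mlr(e_est, e_meas, coefficients, input_uncertainties):
    """Invert the same assumed relationship to recover the regression term."""
    alpha = np.asarray(coefficients, dtype=float)
    u = np.asarray(input_uncertainties, dtype=float)
    e_input_sq = np.sum((alpha * u) ** 2)
    # Clip at zero so noise in the assessment data cannot yield a negative variance.
    return float(np.sqrt(max(e_est ** 2 - e_meas ** 2 - e_input_sq, 0.0)))
```

Under this assumption, larger-magnitude coefficients or larger user-supplied input uncertainties raise the returned uncertainty, which is the behavior the minimum-uncertainty selection relies on.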

Minimum uncertainty estimates
One difficulty with LIRs is choosing between up to 16 possible estimates. We have added (optional, on by default) functionality to all LIR routines that automatically picks the estimate with the smallest estimated uncertainty from among all estimates it is possible to generate using the suite of input predictor data provided by the user. This feature is intended in part to address a limitation of the method, namely that some LIR equations have too many terms (i.e., are over-fit) for some of the >2 million combinations of predicted variables, predictor variables, and grid locations. Over-fitting leads to larger-magnitude regression coefficients due to "Variance Inflation." Larger-magnitude coefficients (α_j) propagate through Eq. 4 to return larger uncertainty estimates. Once the increase in E_Est from having more and larger-magnitude coefficients (i.e., from over-fitting) balances the typically lower E_MLR values for the equations with more terms, this functionality automatically selects the less complex and less over-fit equation. This feature therefore selects an equation that minimizes overall error from over-fitting, input uncertainties, and method errors generally. This option modestly decreases estimate RMSE by 0-11% and, more importantly, makes the function easier to use without compromising estimate skill. The estimate improvement becomes more marked with (known) larger input uncertainties such as those that will be common with sensor measurements. For example, the A_T estimate RMSE improvement with this feature increased from 3% to 10% after simulated errors were applied to AOU (these were normally distributed offsets with a mean of 0 and a standard deviation of 5 μmol kg⁻¹ O2).
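The selection logic described above amounts to a row-wise minimum over the candidate equations. A minimal Python sketch follows, assuming estimates and uncertainties are arranged as arrays with one row per query point and one column per equation, and that equations that cannot be evaluated from the supplied predictors are marked with NaN. The function name and array layout are assumptions for illustration.

```python
import numpy as np

def pick_lowest_uncertainty(estimates, uncertainties):
    """Return the estimate, uncertainty, and equation index with the smallest uncertainty."""
    est = np.asarray(estimates, dtype=float)
    unc = np.asarray(uncertainties, dtype=float)
    # Equations with missing predictors (NaN) must never win the comparison.
    unc_masked = np.where(np.isnan(est) | np.isnan(unc), np.inf, unc)
    best = np.argmin(unc_masked, axis=1)
    rows = np.arange(est.shape[0])
    return est[rows, best], unc[rows, best], best
```

A row in which every equation is unavailable returns NaN for both the estimate and its uncertainty.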

Assessment
Estimate bias and RMS errors are calculated in the same way as the error estimates provided by Carter et al. (2016b), except using the subsets of the GLODAPv2 data product and additional cruises specified as "test data" sets in "Data products used to train and test LIRs" section. These values are presented as "bias (± RMSE)." The bias is the mean residual for the assessment and can be positive or negative. LIR bias estimates are small compared to RMSE at the global level, suggesting the LIR estimates are appropriately centered on the measured values. However, bias grows (in an absolute sense) as the number of measurements averaged decreases, so the bias estimates are presented alongside RMSE as potentially useful indicators of how correlated LIR errors are for various regions. Bias estimates are also useful when comparing assessments from various algorithms. In particular, lower biases for LIPHR than for other pH algorithms highlight the importance of the OA adjustment and the dye-impurity-related adjustments applied to the training data set. An important feature of the error estimation method used is that a separate set of regression coefficients is estimated for each data point in our test data sets, and is estimated without using any data from the cruise that produced that particular test pH value. Data from the same cruise is omitted to avoid under-estimating error by including numerous measurements in the training dataset found proximally in time and space to the test measurement.
Table 1. Error estimates expressed as "bias (± RMSE)" with units μmol kg⁻¹ for the subset of our data product found within the open-ocean salinity range of 33-38. E_MLR is uncertainty inherent to the use of a MLR approach, E_Input is error arising from uncertainties in the input data (i.e., the summed term in Eq. 4), E_LIARv2 is the overall estimate uncertainty for LIARv2. GLODAPv2 data product is used as test data for all estimates. Errors are expressed as standard errors in μmol A_T kg⁻¹.
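The "bias (± RMSE)" statistics and the exclusion of same-cruise data can be sketched in Python. The residual is taken here as measurement minus estimate, following the reconstruction error defined in the OA adjustment section; that sign convention for the bias, along with the function names, is an assumption made for the example.

```python
import numpy as np

def bias_and_rmse(measurements, estimates):
    """Mean residual and root mean squared residual (measurement minus estimate)."""
    resid = np.asarray(measurements, dtype=float) - np.asarray(estimates, dtype=float)
    return float(np.mean(resid)), float(np.sqrt(np.mean(resid ** 2)))

def training_mask_excluding_cruise(cruise_ids, test_cruise):
    """True for training rows that do not come from the test point's own cruise."""
    return np.asarray(cruise_ids) != test_cruise
```

In a full assessment, the coefficients for each test point would be re-estimated from the rows selected by the mask before the estimate is compared with the measurement.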

LIARv2
The updates to LIAR decreased the overall reconstruction errors (E_LIARv2) for all 16 regressions relative to E_LIARv1 by 7-26% (average 18%) when both sets of errors are calculated using the newer test dataset. The largest improvements are for regressions with the fewest predictors. We attribute the majority of the improvements to the increased size, quality, and consistency of the subset of the GLODAPv2 data product we used relative to the merged data product we used for LIARv1 (Fig. 5). LIARv1 compared favorably to regional A_T regressions in literature (many are compared in Carter et al. 2016a,b) and Table 1 shows LIARv2 does somewhat better still. CANYON A_T estimates reproduce our entire test dataset with errors of −0.2 (±5.4) μmol kg⁻¹ while LIARv2 (Regression 7) has errors of −0.1 (±5.1) μmol kg⁻¹. These errors are slightly smaller at −0.5 (±5.2) μmol kg⁻¹ for CANYON and 0.2 (±4.4) μmol kg⁻¹ for LIARv2 when limited to the open ocean test regions used by Sauzède et al. (2017).
Interestingly, regression 3 (S, θ, AOU, and Si) slightly outperforms regression 1 (S, θ, N, AOU, and Si) on average, and there is little difference between the error estimates for the various equations for A_T. This suggests that regression 1 and possibly others are over-fitting A_T in places (this observation does not hold true if we include the test data in the training data set).

LIPHR
[...] Fig. 6). We separately estimate error between 1000 m and 2000 m as these estimates are more likely to be used to compare with float data (Table 2).
Table 2. LIPHR error estimates expressed as "bias (± RMSE)" for the subset of our data product found within the open-ocean salinity range of 33-38. E_MLR is the uncertainty inherent to the use of a MLR approach, E_Input is error arising from uncertainties in the input data (i.e., the summed term in Eq. 4), and E_LIPHR is the overall estimate uncertainty. E_LIPHR2000m is the uncertainty estimate for pH measurements between 1000 m and 2000 m, or the approximate depth range at which biogeochemical floats will require pH estimates for cross-comparison.
LIPHR estimates compare well to the few published pH regression estimates. Williams et al. (2016) designed regression estimates for south of 45°S between 2006 and 2017 and between 0 m and 2100 m depth. For the subset of our data product within these bounds and omitting their S04P and P16S training cruises, their published regressions have errors of −0.006 (±0.017) and −0.006 (±0.016), while similar LIPHR regressions (6 and 7, respectively) have errors of −0.001 (±0.010) and −0.001 (±0.011). Williams et al. (2016) also report a regression for estimates in the same region but trained specifically for estimates between 1000 m and 2100 m depth, the depth range most useful for assessment of biogeochemical profiling float sensor performance. For the relevant subset of our test data product, their algorithm has errors of −0.001 (±0.005), while the LIPHR regression 7 has errors of 0.002 (±0.005). LIPHR (also regression 7) estimates have errors of 0.004 (±0.014) in the California Current Ecosystem specific window of 114°W to 124°W, 27°N to 36°N and 15-500 m depth after 1994, where the algorithm from Alin et al. (2012) uses temperature and O2 measurements to generate estimates with errors of −0.008 (±0.015). CANYON pH estimates reproduce our entire test dataset with errors of 0.009 (±0.017) while LIPHR (Regression 7) has errors of 0.000 (±0.010). At mid depths (1000-2000 m), these estimates are 0.013 (±0.017) for CANYON and 0.000 (±0.006) for LIPHR. The CANYON error estimates are the same at this precision when the GLODAPv2 adjustments are retained.
Table 3. LINR error estimates expressed as "bias (± RMSE)" with units μmol kg⁻¹ for the subset of our data product found within the open-ocean salinity range of 33-38. E_MLR is the uncertainty inherent to the use of a MLR approach, E_Input is error arising from uncertainties in the input data (i.e., the summed term in Eq. 4), and E_LINR is the overall estimate uncertainty. E_LINR2000m is the uncertainty estimate for N measurements between 1000 m and 2000 m, or the approximate depth range at which biogeochemical floats will require N estimates for cross-comparison.

LINR
LINR estimates also reproduce the test data product well (Table 3; Fig. 7). Williams et al. (2016) provide an N estimation algorithm specific to the Pacific sector of the Southern Ocean south of 45°S between 1000 m and 2100 m. This algorithm has errors of 0.42 (± 0.65) µmol kg⁻¹ for the portion of our data product in the target region for this regression. LINR (Regression 7) has an error of −0.11 (± 0.45) µmol kg⁻¹ for this same subset. CANYON nitrate estimates reproduce our entire test dataset with errors of −0.01 (± 0.89) µmol kg⁻¹ while LINR (Regression 7) has errors of −0.02 (± 0.86) µmol kg⁻¹. These errors are slightly smaller at 0.03 (± 0.66) µmol kg⁻¹ for CANYON and −0.02 (± 0.65) µmol kg⁻¹ for LINR when limited to the open ocean test regions used by Sauzède et al. (2017).
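The "bias (± RMSE)" summaries quoted in this section can be illustrated with a minimal Python sketch. It assumes that errors are computed as estimate minus measurement and that the RMSE is taken about zero rather than about the mean; both conventions, the function names, and the sample values are illustrative assumptions and are not taken from the LIR software.

```python
import math

def bias_and_rmse(estimates, measurements):
    """Return (bias, RMSE) of the estimate-minus-measurement residuals.

    Assumed conventions: bias is the mean residual and RMSE is the
    root mean squared residual about zero.
    """
    residuals = [e - m for e, m in zip(estimates, measurements)]
    n = len(residuals)
    bias = sum(residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return bias, rmse

def format_error(bias, rmse, digits=2):
    """Format an error summary in the 'bias (± RMSE)' style."""
    return f"{bias:.{digits}f} (± {rmse:.{digits}f})"

# Hypothetical nitrate values in µmol kg⁻¹ (not from the data product).
measured = [30.0, 31.0, 32.0, 33.0]
estimated = [30.5, 30.5, 32.5, 32.5]

bias, rmse = bias_and_rmse(estimated, measured)
print(format_error(bias, rmse))  # prints "0.00 (± 0.50)"
```

Restricting the inputs to a depth or regional subset before calling `bias_and_rmse` would give the kind of subset statistics compared above.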

Uncertainty estimation skill
With the changes to the error estimation strategy noted in the "Update to uncertainty estimation" section, the overall standard error estimates provided by the software are now greater than or equal to the test data set reconstruction error for 76% of the data product for LIARv2, for 75% for LIPHR, and for 80% for LINR. By comparison, the LIARv1 uncertainty estimates were greater than or equal to the reconstruction error for 87% of the data product. For perfectly estimated normally distributed RMS uncertainties, this number would be 68%.
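The coverage statistic described here can be sketched as follows. The sketch assumes that coverage is the fraction of estimates whose stated uncertainty is at least as large as the absolute reconstruction error; the function name and the sample values are hypothetical. The 68% reference value is the probability that a normally distributed error falls within one standard deviation of zero.

```python
import math

def coverage_fraction(errors, uncertainties):
    """Fraction of estimates whose stated uncertainty is >= the absolute error."""
    covered = sum(1 for e, u in zip(errors, uncertainties) if u >= abs(e))
    return covered / len(errors)

# Expected coverage for a perfectly estimated 1-sigma normal uncertainty.
expected_one_sigma = math.erf(1 / math.sqrt(2))

# Hypothetical reconstruction errors and uncertainty estimates.
errors = [0.2, -1.5, 0.8, -0.3, 2.4]
uncertainties = [1.0, 1.0, 1.0, 1.0, 2.0]

print(coverage_fraction(errors, uncertainties))  # prints 0.6
print(round(expected_one_sigma, 3))              # prints 0.683
```

Under this definition, a coverage fraction above 0.68 suggests that the uncertainty estimates are conservative on average, and a fraction below it suggests that they are too small.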

Example section
Example LIAR, LIPHR, and LINR estimates are derived from hydrographic measurements from the 2016 occupations of the I09 section in the Indian Ocean by the Global Ocean Ship Based Hydrographic Investigations Program (GO-SHIP) (Fig. 8). These estimates provide an independent validation when compared to the measurements made along the cruise because the data from these cruises were not included in either the test or training datasets for the LIRs. The LIRs do an excellent job of reproducing the measurements with errors of −0.6 (± 4.2) µmol kg⁻¹ for A T , 0.001 (± 0.008) for pH, and 0.14 (± 0.32) µmol kg⁻¹ for N. LIPHR errors increase to −0.014 (± 0.017) when the OA adjustment is omitted.

Future directions
Climatological distributions of carbonate parameters from LIAR A T and LIPHR pH (or calculated from this pair of properties) may be of interest and could be calculated simply for the measurement-dense World Ocean Atlas climatology (Locarnini et al. 2013;Zweng et al. 2013;Baranova 2015) or similar products. Such a regression-based climatology, like the A T climatologies created by Lee et al. (2006) and used by Takahashi et al. (2014), would be one step further removed from the measurements than gridded climatologies like those provided by Lauvset et al. (2016) and Key et al. (2004). However, it would have the advantage that it could be based on property measurements (such as O 2 , S, and temperature) that are more numerous, more broadly spatially and temporally distributed, and less seasonally biased than the carbonate measurements.
With LIAR and LIPHR, it is now possible to estimate two parameters for the carbonate system, thus, in principle, providing a complete carbonate system description. While measurements would be preferable for most applications, this pair of algorithms allows additional context to be added to historical data products.
As Velo et al. (2013) pointed out, regressions can be potentially powerful tools for data QC. An algorithm that uses many measured properties to estimate many other measured properties and then assesses the various residuals may provide a fast method for identifying apparent outliers and interesting anomalies in property measurement sets. Such automated measures designed to assist human-QC efforts may be of increased importance as growing sensor networks increase the quantity of data being produced relative to the amount of human-effort available for data QC.
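A residual-based screening step of the kind described above might look like the following minimal sketch. It is not part of the LIR software: the function name, the threshold of three times the estimate uncertainty, and the sample pH values are all assumptions chosen for illustration.

```python
def flag_outliers(measured, estimated, uncertainty, k=3.0):
    """Return indices where |measured - estimated| exceeds k times the uncertainty.

    The multiplier k is a hypothetical tuning choice; flagged points are
    candidates for human review, not automatic rejections.
    """
    return [
        i
        for i, (m, e, u) in enumerate(zip(measured, estimated, uncertainty))
        if abs(m - e) > k * u
    ]

# Hypothetical pH measurements, regression estimates, and estimate uncertainties.
measured_ph = [7.85, 7.90, 8.20, 7.95]
estimated_ph = [7.86, 7.89, 7.92, 7.95]
estimate_uncertainty = [0.01, 0.01, 0.01, 0.01]

print(flag_outliers(measured_ph, estimated_ph, estimate_uncertainty))  # prints [2]
```

Repeating the check for each measured property that can be estimated from the others would give the set of residuals that a human reviewer could then inspect.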
The OA rate estimation strategy used (Eq. 2) provides a means to incorporate a large number of measurements that are disparate in space and time into unified global trend estimates. This framework could perhaps be applied to examine low-signal-to-noise scientific questions such as whether long-term trends are occurring in A T (cf. Carter et al. 2016a), N, or O 2 relative to other measured parameters.