Locally interpolated alkalinity regression for global alkalinity estimation

We introduce methods and software for estimating total seawater alkalinity from salinity and any combination of up to four other parameters (potential temperature, apparent oxygen utilization, total dissolved nitrate, and total silicate). The methods return estimates anywhere in the global ocean with comparable accuracy to other published alkalinity estimation techniques. The software interpolates between a predetermined grid of coefficients for linear regressions onto arbitrary latitude, longitude, and depth coordinates, and thereby avoids the estimate discontinuities many similar methods return when transitioning from one regression constant set to another. The software can also return uncertainty estimates scaled by user‐provided input parameter uncertainties. The methods have been optimized for the open ocean, for which we estimate globally averaged errors of 5.8–10.4 μmol kg−1 depending on which combination of regression parameters is used. We expect these methods to be especially useful for better constraining the carbonate system from measurement platforms—such as biogeochemical Argo floats—that are only capable of measuring one carbonate system parameter (e.g., pH). It may also provide a useful way of simulating alkalinity for Earth system models that do not resolve the tracer prognostically.

An emerging strategy for monitoring the ocean carbon cycle involves using sensors on profiling floats. The primary advantages of this strategy are significant cost savings relative to shipboard measurements and the possibility of extending data coverage to regions and seasons that ships cannot routinely access with current resources. However, while float-capable sensors can now measure several biogeochemical properties including oxygen (O 2 ) and nitrate (N) (e.g., Johnson et al. 2010), float sensors can only currently measure one of two carbonate system parameters required to constrain the carbonate system in seawater. Ion-Sensitive Field Effect Transistor (ISFET)-based sensors now allow pH measurements on moorings, autonomous underwater vehicles (AUVs), and profiling floats (e.g., Johnson et al. 2012, unpubl.;Bresnahan et al. 2014;Talley et al. 2014;Schuller et al. 2015). However, while options exist for moor-ings (Sutton et al. 2014;Fassbender et al. 2015), inexpensive, small volume, low-power draw, fast response time, reagentfree, and pressure-tolerant sensors are not yet available for profiling float measurements of total seawater titration alkalinity (A T ) , total dissolved inorganic carbon or (C T ), or partial pressure of CO 2 (pCO 2 ).
The need for additional carbonate system constraints led many (e.g., Millero et al. 1998;Lee et al. 2006;McNeil et al. 2007;Alin et al. 2012;Sasse et al. 2012;Bostock et al. 2013;Velo et al. 2013) to regress A T data measured on hydrographic cruises against measurements of other seawater properties. The regression constants obtained allow A T to be later estimated from other property measurements where A T measurements are not also available. A T is an ideal carbonate system parameter to estimate in this manner and use with pH for several reasons: it is nearly orthogonal to pH as a constraint for the carbonate system; it mixes linearly and is unaffected by temperature, gas exchange, or the continuing ocean uptake of anthropogenic carbon; and it varies predictably and linearly with other seawater properties. A T has similar measurement uncertainties to C T and pH (Bockmon and Dickson 2015).
A T regression estimates have advanced since Millero et al. (1998) showed that A T could be estimated across large Additional Supporting Information may be found in the online version of this article.
regions of the surface ocean from simple regressions with high accuracy, and similar regressions have been used to simulate A T distributions in Earth system models (e.g., Galbraith et al. 2015). Scientists have worked to develop regressions for new regions and to incorporate new A T measurements into regression estimates. Sasse et al. (2012) and Velo et al. (2013) recently showed that superior fits can be obtained using neural networks or self-organizing maps that divide measurement sets into optimized "neurons" instead of regions, where a neuron can be thought of as a collection of measurements that share similarities in their meta, physical, or chemical data. This approach has the advantage of eliminating the need for arbitrarily prescribed regional boundaries, but the disadvantage of needing to optimize the arbitrary number of allowed neurons.
We argue boundaries between regions and between neurons limit the usefulness of the A T estimates obtained from some of these methods and are unnecessary. One can imagine a float drifting from one region to another, or transitioning into a new neuron when measuring across a thermocline. In these cases, A T estimates will show a discontinuity where the transition between one set of regression coefficients to another occurs. A simple example is the boundary between the Pacific and other sectors of the Southern Ocean where the estimates returned by equations from Millero et al. (1998) change by 9 lmol kg 21 . A neuron transition can also happen over time at a fixed location provided there is warming or freshening. These discontinuities could show up as abrupt and spurious changes in the C T calculated from, for instance, measured pH and estimated A T . We therefore argue that A T estimate consistency is at least as important as A T estimate accuracy. Lee et al. (2006) recognized this limitation and forced second order polynomials for sea surface temperature and salinity, applied to regimes of both physical and property space, to return identical estimates at regime boundaries. This approach has the drawback of biasing regression fits away from the values that return the smallest residuals.
We present methods for estimating A T and associated uncertainty globally at all ocean depths. These methods are similar to the 3DwMLR method advocated by Velo et al. (2013), which circumvents the need to carve datasets into regions by considering a window around each estimate to be its own region. We take the 3DwMLR approach a step further by recognizing that linear coefficients can be linearly interpolated, or triangulated in three dimensions, onto any given location of interest. This ensures that transitions between regression coefficient sets happen smoothly from location to location. This aspect of our approach is very similar to the recently published methods for estimating carbonate mineral saturation in the North Pacific by Kim et al. (2015). We call our approach LIAR, for "locally interpolated A T regression." The traditional meaning for our acronym serves as a reminder that the A T values generated are only estimates, not measurements.
In the "Procedures" section, we detail the methods we use to generate regression coefficients. Then we detail how one can estimate A T from LIAR coefficients. In the "Assessment" section, we estimate the uncertainty of LIAR A T estimates and how estimate uncertainty varies with changing inputs and input uncertainties. In the "Discussion" section, we enumerate the advantages of LIAR strategy with respect to scope, convenience, consistency, and accuracy over similar estimation strategies.

Procedures
The merged data product We merged the PACIFICA (Suzuki et al. 2013), GLODAPv1.1 (Key et al. 2004), and CARINA (Velo et al. 2009)  datasets, and then eliminated duplicates of bottle measurements that appear in more than one of these products. We eliminated any data flagged with a quality control code corresponding to "bad" or "questionable" for all of our regression parameters. We removed data collected before 1990-or the approximate advent of seawater reference materials which were later certified for A T (Dickson et al. 2003)-for the data product we use to derive regression constants, although we retain this data in a separate data product and use that set for regression error estimate calculations in "Assessment" section. We omitted data from the PACIFICA dataset falling along the Ocean Station Papa line because some of the station profiles disagree with neighboring A T profiles at depth (> 3000 m) by as much as 100 lmol kg 21 . We also removed measurements from GLODAPv1.1 stations 24048 through 24065 because the station profiles appeared noisy. Finally, we removed data from CARINA stations 11000 through 11013 because these Southern Ocean station profiles disagreed with neighboring profiles in GLODAPv1.1 (there were no neighboring profiles in CARINA). We are left with a merged data product with 204,110 sets of A T measurements and other regression parameters. The locations of stations at which we have data are mapped as x's in Fig. 1.
We select a subset of the data product to use for each regression at each coordinate set. To prevent using data from seawater measured on opposite sides of Central America or the Bering Strait in a single regression, we exclude data in the Arctic and Atlantic Oceans when estimating regression constants outside of these oceans and exclude data from outside of these oceans when estimating regression constants within them. Exceptions are that data in the Southern Atlantic, south of 08 N, is never excluded for this reason and no data is excluded for this reason for regression constant estimates in the Southern Atlantic. The latitude-longitude polygons we use for these basins are provided as Supporting Information. We also exclude all data with a salinity less than 30 to ensure our open ocean coefficients are not overly biased by river water. Finally, we exclude all data not within latitude, longitude, depth, and potential density r h windows of our measurement values. Window sizes W are given by criteria (1-4): Data are included if they satisfy criteria (1), (2), and either (3) or (4). We iteratively increment the integer i whenever exclusion criteria result in fewer than 100 viable measurements. We divide by the cosine of the latitude in (2) to maintain an approximately constant window width at all latitudes.
Once we have selected a subset of our merged data product, we perform regressions with 16 combinations of regression parameters. Like Velo et al. (2013), we use robust linear regression, which iteratively re-estimates regression coefficients, following an initial traditional least squares estimate, by assigning smaller weights to measurements with larger residuals. The iterative outlier un-weighting step addresses inaccuracies in the assumption that measurement errors are adequately described by a normal distribution. We use a bisquare outlier test with the turning constant of 4.685 for this step, meaning data with residuals in excess of 4.685 the standard residual are given no weight. The first of the 16 regressions has all regression parameters we consider: Here a terms are regression coefficients we estimate for the subscripted properties, S is salinity, h is potential temperature in 8C, N is nitrate concentration in lmol kg 21 , AOU is apparent oxygen utilization in lmol kg 21 , and Si is total dissolved silicate concentration in lmol kg 21

:
We use apparent oxygen utilization in place of O 2 concentration because preliminary testing found this to be a slightly more powerful predictor and one that is less correlated with temperature. We do not use phosphate since these are highly correlated with N-for which sensor measurements are more common-and including both measurements would therefore risk overfitting A T . We henceforth call Eq. 5 "Regression 1." Predictors used in Regressions 1 through 16 are indicated in Table 1.
Regressions 9 through 16 do not include potential temperature because McNeil et al. (2007) found temperature terms can create large spurious surface seasonal A T estimate swings in estimates calibrated using data with incomplete seasonal coverage. a Si , a AOU , and a N are omitted from some regressions because the related measurements are frequently unavailable.

Estimating A T
The LIAR method requires two steps to estimate A T . The first step is interpolating the regression coefficients to the location of interest. Linear interpolation can be done easily once appropriate points are chosen to interpolate between, and choosing points that bound the location of interest is Carter et al.
Locally interpolated alkalinity regression made simple by our use of a regular grid. However, there are edge cases to consider when interpolating near holes in our grid (e.g., Greenland). For this reason, we use MATLAB Delaunay Triangulation 3D linear interpolation routines (see Lee and Schachter 1980) after dividing depth differences by a factor of 25 to equate 100 m depth with 48 latitude in the triangulation distance calculation. Delaunay triangulation selects nearby points that bound the location of interest while avoiding sets of points that make "skinny" polygons with small minimum interior angles. Triangulation is faster and performs less smoothing than objective mapping. We note that additional smoothing is unnecessary because of the smoothing inherent in our regression constant estimation process. The changing window sizes with latitude and depth (see "Estimating regression coefficients" section) ensure our estimates are not strongly sensitive to the choice of 25 for the depth-to-latitude conversion or to our assuming latitude differences equate to longitude differences regardless of coordinate latitude in this step. The latter of these simplifications allows us to use a single interpolant for all estimate coordinates for each regression coefficient, and greatly reduces the LIAR estimation routine computational burden. Extrapolated values outside the domain over which we have regression constant estimates (e.g., near sediments and some coasts) are set equal to the regression constants interpolated at the nearest location inside the domain. As when selecting data for a regression, we interpolate the Atlantic and Arctic Oceans separately to avoid interpolating across Central America or the Bering Strait. In the second step, the interpolated regression coefficients are used to estimate A T directly from the intended equation (e.g., Eq. 5) in the second step.

Estimate requirements
The LIAR code is written for MATLAB R2014b, with backward compatibility tested through version 2012. Computation time varies with application and machine, although our desktop 64 bit PC operating with a 2.0 GHz processor returns an average of 35,000 A T estimates per second.
LIAR estimates require any combination of the following measurements, with S being the only mandatory input: S, Si, N, O 2 or AOU, h or in situ temperature. Concentrations can be provided in molar or molal units. Temperature and O 2 are converted to h and AOU, respectively. These conversions are made using the CSIRO MATLAB seawater package version 3.1 (Morgan and Pender 2006). The (now deprecated) seawater package is used in place of updated Thermodynamic Equation of Seawater 2010 (TEOS-10) functions to ensure converted measurements are consistent with the GLO-DAPv1.1, CARINA, and PACIFICA data products used to estimate regression coefficients. Input uncertainties are an optional input; default values, corresponding to the uncertainties in Table 3 scaled to typical deep water properties, are assumed if none are provided.

Assessment
We use two variants on LIAR to assess aspects of the estimation strategy. For both variants, we estimate regression coefficients at the locations of each of our 204,110 A T measurements for which we have measurements of all regression parameters instead of at the 44,957 WOA coordinates, allowing us to bypass the interpolation step when estimating A T at the measured locations. Variant 2 also does not use measurements from a cruise as training data for regression coefficients estimated along that cruise track. For assessments with the "unmodified LIAR method," we interpolate regression coefficients from the 44,957 WOA coordinates to the 204,110 measurement coordinates. Figure 2 has two-dimensional (2D) histograms of unmodified LIAR-estimated A T against measured A T for measurements in our merged data product for regression 1 (see Supporting Information for histograms for all 16 regressions). The strong linear relationships demonstrate that the LIAR method estimates the global A T field well. R 2 statistics are given for the 16 regressions for LIAR Variant 2 estimates in Table 2 for all data in our merged data product. Variant 2 estimates, like estimates that will be made when the LIAR method is applied, do not benefit from using the measured alkalinity at the location of interest to estimate the regression constants used for that location. Variant 2 estimates are therefore a more appropriate test for the strength of the fit than unmodified LIAR estimates.
There is also no need to adjust Variant 2 R 2 values in light of the differing degrees of freedom since a larger number of predictors does not guarantee a better fit to data withheld from the regression constant estimation procedure. We caution that estimate uncertainty can increase with an increasing number of predictors when collinear regression parameters are used, but note that our uncertainty estimation procedure (detailed later) accounts for this by propagating uncertainty with the regression coefficients. Variant 2 R 2 values suggest the relative importance of non-salinity regression parameters is h < AOU < N < Si. All four parameters improve the fit.
We rank the 16 regressions by their Variant 2 R 2 values in Table 2, but note that the best regression to use depends both on which regression parameters are available and the uncertainties of the input parameter measurements. We therefore develop an approach to estimating LIAR estimate uncertainty from the bottom up. This approach can be applied to measurements of arbitrary quality, so we are able to return estimate uncertainties from user-provided input uncertainties as part of the LIAR software.
We consider four sources of error E for our bottom-up error uncertainty estimate: 1. E Alk from errors in the A T data used to fit the regression constants, 2. E Input from input parameter measurement uncertainties, 3. E MLR from the inadequacies inherent to the use of multiple linear regression to reproduce the global A T distribution, 4. and errors associated with interpolating regression coefficients E Interp .
We combine these errors as the square root of the sum of squares to produce the overall uncertainty, E LIAR : We start with an E Alk estimate of 3.3 lmol kg 21 for the uncertainty of the A T measurements in our merged data product (Velo et al. 2009). This is the minimum possible uncertainty for LIAR estimates. For reasons discussed shortly, over or underestimation of this uncertainty is not of great concern for this assessment.
Next we estimate E Input . We assume measurement uncertainties (other than A T uncertainties) have negligible influence on the regression constant a values due to the large number of measurements used to estimate each constant. E Input is due rather to uncertainties in the singular sets of parameter measurements used to estimate A T from the 16 regressions. We estimate E Input as the input uncertainty propagated through the regression equations (e.g., Eq. 5). For a regression with n predictors, E Input is: shown as color on a log scale for Regression 1 (see Supporting Information for histograms for other regressions). More than 90% of measurements fall within the darker bins corresponding to log histogram frequencies > 2. Thin blue 1 : 1 foreground lines and a background grid are provided for reference. No measurements fall within bins where the blue-grey background is visible.
Here, U j is assumed uncertainty for the jth parameter used. Here we assume our measurement uncertainties are independent despite potential small correlations between errors in, for instance, temperature and AOU calculated from temperature. Our input parameter uncertainty estimates for our merged data product are given in Table 3. These estimates are inferred from Suzuki et al. (2013)'s minimum adjustments for the PACIFICA data product.
We estimate E Interp by comparing the root mean squared error (henceforth: error) for Variant 1 estimates (or E 1 ) to the error for estimates from the unmodified LIAR method (or E 0 ). Variant 1 has no interpolation step, so E Interp is 0. The methods are the same otherwise. This allows us to write: E Interp is statistically indistinguishable from 0 over the open ocean salinity range of 33-38, and small relative to E MLR (<10%) outside this range. We henceforth assume E Interp is 0.
The overall LIAR uncertainty estimate is the error for Variant 2 A T estimates, or E LIAR . We use error estimates from Variant 2 in place of similar estimates from the unmodified LIAR method or Variant 1 because the Variant 2 estimates are not derived from regression coefficients determined using the target A T values, as will be the case for future LIAR estimates.
We rearrange Eq. 6 and neglect E Interp to solve for E MLR : This E MLR estimate is highly sensitive to the assumed E Alk estimate, although this is unimportant for our final uncertainty estimates because E MLR and E Alk will always be combined in Eq. 6; an underestimation of one is compensated for by an overestimation of the other.
In Table 4, we report mean E MLR , E Input , and E LIAR for each of the 16 regressions for the portion (>95%) of our data product found within the open-ocean salinity range of 33-38.
Critically, LIAR estimates have similar or superior accuracy to alternative estimates. Velo et al. (2013) suggested their neural network and 3DwMLR methods have average absolute residuals (note: not standard deviations) of < 5 lmol kg 21 . Regression 1 is our most comparable regression to Velo et al. (2013)'s method-they use O 2 in place of AOU and include pressure and phosphate as additional regression parametersfor which we have an average absolute residual of 4.1 lmol kg 21 . McNeil et al. (2007) reported an error of 8.1 lmol kg 21 for the depth range, regression, and region over which we estimate an error of 5.8 lmol kg 21 . Bostock et al. (2013) obtain an error of 9.8 lmol kg 21 for the region south of 208 S while the LIAR regression with the same parameters (regression 7) returns an error of 6.4 lmol kg 21 for this region. Lee et al. (2006) used six constant terms in five 2 nd degree regional regressions to estimate surface (<30 m) alkalinity with a combined error of 8.1 lmol kg 21 , while we achieve smaller errors for this depth range with all regressions except 8 and 16 (i.e., with only h and S and with just S, respectively). The selforganizing map approach of Sasse et al. (2013) achieved an error of 9.2 lmol kg 21 for a similar region to that considered by Lee et al. (2006). Alin et al. (2012) used a four-term function of temperature and salinity to estimate A T with an error of 6.4 lmol kg 21 above 500 m depth in the CALCOFI region, and the LIAR error in this region is 3.9-6.2 lmol kg 21 (depending on regression). We apply Millero et al. (1998)'s estimate for the Pacific Gyres to data in our data product shallower than 50 m depth, between 208S to 308N, and between 1508 and 2408E and estimate an error of 7.5 lmol kg 21 . LIAR error for this subset of our data product using the equivalent Regression 16 is 6.4 lmol kg 21 . We do not exclude data beyond the open ocean salinity range or measured before 1990 for these error comparisons.
As with other estimation strategies (e.g., Velo et al. 2013) LIAR estimation performs substantially worse in the 4% of our dataset that does not fall within the open ocean salinity range. For example, the RMSE values in Table 4 increase by an average of 3% when we extend the minimum salinity measurement used for the calculation to 32, and increase by Table 4. Error estimates for the subset of our data product found within the open-ocean salinity range of 33-38. E MLR is error arising from the use of a MLR approach, E Input is error arising from uncertainties in our input data, and E LIAR is the overall estimate uncertainty. Errors are expressed as errors in lmol A T kg 21 .

Reg. #
Parameters used an average of 36% when all measurements are included. Salinities between 32 and 33 are common in the surface North Pacific, and LIAR only performs slightly worse for these estimates (32% higher RMSE values for this surface subset). Very large errors are typically found in regions with unusual A T to S relationships resulting from river water (e.g., the Arctic or Bay of Bengal) or in evaporative marginal seas where there are not enough measurements to estimate LIAR coefficients locally (e.g., the Red and Mediterranean Seas) (Carter et al. 2014). While the LIAR method could, at higher resolution, be adapted to reflect the distinct A T to S relationships characteristic of these regions, we decided to instead optimize our method for the open ocean with a coarse resolution grid, the requirement of > 100 measurements per regression, and the omission of data with S < 30 from the data used to estimate regression constants. Nevertheless, we develop methods to estimate the greater uncertainties for LIAR estimates for unusually fresh or saline seawater. We do this by calculating E MLR separately for all data product measurements falling within each 1 unit S bin that our dataset spans. For bins for which we have no measurements, we linearly interpolate between estimates for neighboring bins. Our final error estimate then uses the E MLR estimate appropriate to the salinity bin the measurement is found within. Figure 3 shows E MLR for Regression 1 as well as the number of data product measurements within each bin. Other regressions have similar distributions. Our software therefore returns A T estimate uncertainty E Est : where E Alk is the constant 3.3 lmol kg 21 , E MLR is determined from the histogram in Fig. 3 (or an equivalent histogram appropriate to the regression used), and U values are provided by the user (or assumed to equal the values in Table 3 if not provided). LIAR estimate bias is indistinguishable from 0 for all 16 regressions. However, LIAR errors are not normally distributed about 0 due to large errors for a small number of measurements. For all 16 regressions, more than 94% of LIAR errors are less than the standard error estimates in Table 4, whereas for normally distributed errors we would expect 68% of deviations to be less than or equal to the standard error. However, 87% of errors are less than E Est . This suggests Eq. 10 does scale estimate uncertainty with estimate error for our data product to a degree, but not sufficiently to ensure that the ratio of the deviations to E Est is normally distributed. We do not anticipate this will be a problem for applications in the open ocean, but note that LIAR uncertainties are likely underestimated for river plumes, marginal seas, and areas without many historical A T measurements.
We tested the LIAR method on the recent occupation of the P16 repeat hydrographic line. Data from this occupation were collected over two cruises with three total legs from early 2014 through mid-2015 (Talley 2014;Cross 2015;Macdonald 2015). These data were not used for estimating the LIAR regression constants, so they provide a preliminary demonstration of how well the method could perform in regions where there are ample measurements available in the training dataset. Figure 4 maps (preliminary) measured A T , LIAR-estimated A T (from Regression 1), and differences between these values. It can be seen that LIAR does an excellent job of capturing the broad-scale patterns observed on this cruise, including several localized features of the fronts in the Antarctic Circumpolar Current and the North Pacific. The measurement-estimate disagreement averaged 20.1 6 3.2 lmol kg 21 A T for data from this cruise.

Discussion
LIAR A T estimates improve on the previously available suite of estimation strategies in several important ways without compromising the high estimate accuracy characteristic of recent A T estimation efforts. First, LIAR estimates are applicable globally at all ocean depths. Second, lacking regional or neural boundaries, LIAR provides estimate precision when transitioning between regions of physical or property space. Third, LIAR can be used with many combinations of parameter measurements, including combinations that are measureable by float-capable biogeochemical sensors. Fourth, LIAR provides uncertainty estimates that scale with input uncertainties and seawater S. Fifth, LIAR regression coefficients are determined using more data and more recent data than some alternatives. Finally, we provide MATLAB code and documentation that make LIAR estimates accessible. It is our hope LIAR estimates will be used to supplement incomplete constraints of the seawater carbonate system from, for example, sensor measurements and Earth system models that do not include prognostic A T due to computational or data storage constraints (e.g., Galbraith et al. 2015).
The high, and in some cases improved, accuracy of LIAR estimates relative to other A T estimates is likely due to the larger quantities of data used to produce the regression coefficients, the large fraction of data collected in the years following the introduction of reference materials for alkalinity A T in our data product, and to the circumvention of the limitation of alternate approaches (except 3DwMLR) that each measurement in the training data set be used to constrain only one set of regional regression constants. Also, LIAR implicitly relies on sample position information through the regression constant interpolation step. This is the reason LIAR achieves comparatively small errors even for regressions with few parameters (e.g., 10.4 lmol kg 21 globally using only S as a predictor) and is able to automatically adopt regression coefficients appropriate for both dynamic frontal regions and stable subtropical gyres. The local-interpolation step is added for the convenience of deriving regression coefficients appropriate to arbitrary locations in the ocean and has little impact on the estimate accuracy.
An important question for our estimation strategy is how well the methods reproduce temporal A T changes from natural variability and long term changes. LIAR does capture natural variability to an extent. For example, the standard deviation of surface (<25 m) A T is 13 lmol kg 21 and 11 lmol kg 21 at ocean stations ALOHA and BATS, respectively (Joyce and Robbins 1996;Karl and Lukas 1996), while the standard deviation between measured and LIAR-estimated A T is only 6 lmol kg 21 and 6 lmol kg 21 , respectively. Lacking a temporal component, LIAR cannot capture the long term changes expected with biogeochemical feedbacks with ocean acidification. However, these impacts have been estimated to only become detectable after 2040 (Ilyina et al. 2009), so they are not a large concern for the immediate future. Furthermore, LIAR estimates may be of use as a baseline for detecting such changes. For instance, regression of surface A T at BATS normalized to a salinity of 35 against time reveals a statistically significant (at 95% conf.) increase of 0.24 lmol kg 21 per year over the record, while no increase is found in the difference between measurements and LIAR estimates. These observations suggest this observed surface increase can be attributed to captured natural variability rather than to long term changes. Contours in (c) demark regions where the average offset between measured and estimated A T exceeds 6 5 lmol kg 21 for a version of the same plot smoothed with weighted average gridding (with 8 and 9 permille length scales in the X and Y directions, respectively).

Comments and recommendations
The next step for development of the LIAR method is to update regression coefficients with the planned version 2 of the GLODAP data product. This data product will have quality controlled data from over 700 cruises including data from more recent cruises than those included in GLO-DAPv1.1, CARINA, and PACIFICA. This data product will also likely have cruises from several under-represented regions in our merged data product, such as the Gulf of Mexico, the Mediterranean, and the South China Sea.
It would be desirable to extend LIAR to other programming platforms commonly used by oceanographers, especially freely-distributed platforms such as Python, Fortran, Ocean Data View, and R. Implementations in Fortran, a common language for Earth system models and ocean circulation models, would allow the LIAR method to more easily be used to simulate A T distributions in models that do not resolve A T prognostically.
It may be useful to estimate and interpolate E MLR regionally instead of against S. This would allow uncertainty estimates to increase where residuals are larger due to enhanced measurement uncertainty or variability that is not well captured by our regression approach. We expect such a strategy would further reduce the non-normality of our uncertaintyestimate-normalized error distribution.
Methodological adaptations near river mouths and in marginal seas may also allow the LIAR method to return better estimates for these regions. For instance, deriving regression constant sets specific to riverine or estuarine outflows and placing these regression constant sets in the LIAR regression constant grid at the locations of river mouths may allow for better estimates to be returned in these areas. Currently, LIAR uncertainties in these regions are quite large, and possibly underestimated.