Improving surface heat flux estimation for a large lake through model optimization and two‐point calibration: The case of Lake Geneva

Net Surface Heat Flux (SurHF) was estimated from 2008 to 2014 for Lake Geneva (Switzerland/France), using long‐term temperature depth profiles at two locations, hourly maps of reanalysis meteorological data from a numerical weather model and lake surface water temperatures from calibrated satellite imagery. Existing formulas for different heat flux components were combined into 54 different total SurHF models. The coefficients in these models were calibrated based on SurHF optimization. Four calibration factors characterizing the incoming long‐wave radiation, sensible, and latent heat fluxes were further investigated for the six best performing models. The combination of the modified parameterization of the Brutsaert equation for incoming atmospheric radiation and of similarity theory‐based bulk parameterization algorithms for latent and sensible surface heat fluxes provided the most accurate SurHF estimates. When optimized for one lake temperature profile location, SurHF models failed to predict the temperature profile at the other location due to the spatial variability of meteorological parameters between the two locations. Consequently, the optimal SurHF models were calibrated using two profile locations. The results emphasize that even relatively small changes in calibration factors, particularly in the atmospheric emissivity, significantly modify the estimated long‐term heat content. The lack of calibration can produce changes in the calculated heat content that are much higher than the observed annual climate change‐induced trend. The calibration improved parameterization of bulk transfer coefficients, mainly under low wind regimes.

Surface Heat Flux (SurHF) and wind forcing control stratification dynamics and have a major influence on the physical, chemical, and biological properties of lakes (e.g., MacIntyre et al. 2002;Churchill and Kerfoot 2007;Bonvin et al. 2013;Finlay et al. 2015). For many lakes, changes in heat content are mainly due to SurHF variations, as shown in both short-term investigations (Van Emmerik et al. 2013) and long-term climate change studies (Arvola et al. 2010;Fink et al. 2014). SurHF temporal variations are often obtained from measurements taken at a single location (e.g., Heikinheimo et al. 1999;Laird and Kristovich 2002;Rouse et al. 2003;Rouse et al. 2008;Nordbo et al. 2011;Van Emmerik et al. 2013;Woolway et al. 2015a), using bulk formulas (e.g., Schertzer 1978;Henderson-Sellers 1986;Lenters et al. 2005;Woolway et al. 2015b) or 1D numerical modeling (e.g., Tanentzap et al. 2007; Momii and Ito 2008;Austin and Allen 2011;Stepanenko et al. 2014;Thiery et al. 2014a;Thiery et al. 2014b;Yang et al. 2017). Such quasi onedimensional (1D) estimates are then considered representative for the whole lake. Although the single-location approach might be suitable for small water bodies, spatial variability of SurHF due to variable meteorological conditions can be important for large lakes (e.g., Lofgren and Zhu 2000;Xue et al. 2015; Moukomla and Blanken 2017). Data from multiple locations permit the investigation of SurHF spatial variability, and the availability of such data is growing (e.g., Rimmer et al. 2009;Verburg and Antenucci 2010;Spence et al. 2011). A systematic evaluation of the validity and performance of SurHF models at more than one location received little attention up until now, especially for longterm studies. Here, we examine the impact of using data taken at two locations on the bulk SurHF model optimization and calibration and then compare it to the common onepoint approach.
An additional source of uncertainty is the selected SurHF model itself. SurHF models involve several terms for the relevant physical processes. For each of these terms, different formulations exist. Some studies aimed to improve and optimize individual SurHF terms (e.g., Fairall et al. 1996;Zeng et al. 1998;Crawford and Duchon 1999;Rimmer et al. 2009;Wang et al. 2014). However, the effect of combining the different equations for all the relevant SurHF terms and optimizing them as a set has not been evaluated.
In this study, we calibrated different combinations of SurHF term equations (with each term describing a different heat exchange mechanism) to quantify the heat content variation of a large lake for a 7-yr period (2008)(2009)(2010)(2011)(2012)(2013)(2014). We also assessed the impact of using time series of temperature profiles taken at two measurement points, instead of at only one point, for the calibration and estimation of SurHF. Lake Geneva, the largest freshwater lake in Western Europe, is a suitable study site as the required data are available. Specifically, the following questions were addressed: (1) What is the optimal combination of bulk formulas for modeling SurHF in a given lake?
(2) What is the impact of estimating the lake's heat content based on profile data from two measurement locations, as opposed to calibrations based on a single location?
(3) Using the example of Lake Geneva, how sensitive is the lake heat budget variation to the optimal calibration factors in a long-term analysis?
The methodology is developed here using two locations. However, it can be extended to more than two locations. In fact, by increasing the number of locations, it can be expected that the performance of the lake-wide SurHF model will be further improved.

Study site
Located between Switzerland and France, Lake Geneva (Local name: Lac Léman) is a large, deep perialpine lake with a mean surface altitude of 372 m. It is approximately 70-km long, with a maximum width of 14 km, a surface area of 582 km 2 , and a volume of 89 km 3 . The lake is composed of two basins: an eastern, large basin called the Grand Lac, with a maximum depth of 309 m, and a western, small narrow basin, the Petit Lac, with a maximum depth of approximately 70 m (Fig. 1).
The main inflow (Rhône-in) and outflow (Rhône-out) of the lake are shown in Fig. 1. The lake is surrounded by the Jura Mountains in the northwest, and by the Alps in the south and, to a lesser extent, the northeast (Fig. 1). This topography leads to two dominant wind fields, namely the Bise coming from the northeast, and the Vent from the southwest (Lemmin and D'Adamo 1996). On average, due to topographic sheltering, the eastern part of the Grand Lac experiences lower wind speeds than the western part of the Grand Lac and most of the Petit Lac. However, the two monitoring locations, SHL2 located in the Grand Lac and GE3 in the Petit Lac (Fig. 1), for which all the required data for model calibration are available, are located in the part of the lake where the surrounding topography is flat and topographical sheltering effects can be neglected.

Energy balance in a water column
The total heat content of a water column, G o (Jm −2 ), is given by: where ρ w and C p,w are the density (kgm −3 ) and the specific heat capacity of water at constant pressure (Jkg −1 K −1 ), respectively, and T[z,t] represents vertical temperature profile ( C) at time t down to water column depth z = H (m). The heat content variation from time t 1 to time t 2 then can be quantified by: The heat content variation in the water column over the full lake depth is the sum of the net energy flux into it and includes net SurHF, Q N , advective (lateral) heat transport, Q ad , and geothermal heat flux, Q ge : The contribution of the Q ad and Q ge terms to ΔG m were compared with that of Q N . For advective heat transport, note that the measurement locations in this study are far from shore and the river mouth, and that river input effects can be considered to be small, as is evident from the long theoretical residence time of Lake Geneva (11.3 y, Commission Internationale pour la Protection des Eaux du Léman [CIPEL], last accessed 5 July 2018). The contribution of advective heat flux, Q ad , to the heat content variation in the water column, ΔG m , could be important in lakes with short flushing times due to river inflows (Carmack 1979). In large lakes, the advective heat flux could be significant in certain areas where persistent currents move water between lake regions with different thermal regimes, such as the Keewenaw current in Lake 577 Superior (Zhu et al. 2001), or the bidirectional flow in the Straits of Mackinac (Anderson and Schwab 2017). However, the mean circulation pattern that is most often observed in large lakes is primarily a cyclonic circulation, with welldeveloped along-shore currents in the near shore zone and mostly weak currents of more random orientation in the central region of the lake (e.g., Emery and Csanady 1973;Simons 1980;Boyce et al. 1989;Beletsky et al. 1999;Beletsky and Schwab 2008). With this circulation pattern, steady shoreperpendicular currents advecting heat toward the center of the lake are rare. In their numerical modeling of Lake Superior, Bennington et al. (2010) observed that cyclonic circulation dominates and that there is no correlation between the daily anomalies of the local temperature gradient in the meridional direction and the daily anomalies of the current speed in the zonal direction outside the near shore boundary layer. Derecki bathymetry data (SwissTopo, last accessed 5 July 2018). SHL2 (46.45 N,6.59 E) and GE3 (46.3 N,6.22 E) are two monitoring locations in the lake used for model calibration and validation. The thick black arrows indicate the direction of the two dominant strong winds over lake called Bise and Vent. Rhône-in and Rhône-out show the lake's main river inflow and outflow locations, respectively.

578
(1976) reported that heat advection can be ignored for Lake Erie during most of the year. Based on numerical modeling, a two-gyre (Simons 1980) or three-gyre (Akitomo et al. 2004) large-scale circulation pattern is sometimes found in large lakes.
In the literature, contributions by Q ad and Q ge are ignored and it is assumed that ΔG m can be approximated by only considering Q N . To determine whether such a quasi-1D approach is justified in Lake Geneva, we estimated the contribution of Q ad using a three-dimensional (3D) hydrodynamic model. A detailed simulation was performed for a representative period (January to October 2010), as described in the Supporting Information section. The results (Supporting Information Fig. S3) show that far from shore and the main river inflow, the Rhône (Fig. 1), at the locations where this study is carried out, the contribution of SurHF, Q N , to the heat content variation is much higher than that of lateral advection, Q ad . For the whole lake, the heat content variation due to advective heat flux is high in the near shore zone and low in the center of the lake (Supporting Information Fig. S4), in agreement with Bennington et al. (2010). Thus, in the present analysis, advective heat flux, Q ad, is ignored.
Although the geothermal heat flux, Q ge , is not known for Lake Geneva, it is reported to be small in many Swiss lakes, typically~0.1 Wm −2 (Finckh 1976), and its contribution is ignored here. However, we briefly quantify below the impact of this (potential) flux on the estimated parameter values.
The net input energy, ΔG m (Jm −2 ), is calculated by integrating the net SurHF, Q N (Wm −2 ), for a given period (Eq. 3). The energy balance in the water column can then be written as (Van Emmerik et al. 2013;Fink et al. 2014;Nussboim et al. 2017): The net heat flux at the air-water interface (positive when directed into the water), Q N , in Eq. 4 is given by: where the right-hand side terms describe the flux due to solar shortwave radiation, Q sn , incoming long-wave radiation from the sky, Q an , back long-wave radiation, Q br , latent (evaporation, Q ev ), sensible (convection, Q co ), and precipitation (Q pr ) heat fluxes. The effect of precipitation, Q pr , on the SurHF of European lakes is neglected due to its minimal influence on SurHF (Livingstone and Imboden 1989;Rimmer et al. 2009;Fink et al. 2014). To use Eqs. 1, 4, and 5, water and atmospheric field data input are required.

Satellite data
Lake surface water temperatures (LSWTs) are needed for Eq. 5. Riffler et al. (2015) determined the LSWT for 25 lakes in and near the Alps from a long-term archive of Advanced Very High Resolution Radiometer (AVHRR) satellite imagery (~1.1 × 1.1 km pixel size). The satellite-based temperatures agreed well with the near-surface in situ measurements for our study with a bias and root mean square error (RMSE) within the range of −0.5 to 0.6 C and 1.0 to 1.6 C, respectively. This range of values favorably corresponds with another long-term LSWT calibration for Lake Geneva (Oesch et al. 2005). We use the same data set as Riffler et al. (2015) (4384 images from 1 March 2008 to 31 December 2014) to retrieve the LSWT at the SHL2 and GE3 locations. Images with more than 70% lake coverage were selected, resulting in a total of 856 diurnal images and 308 nocturnal images. Missing pixels in these images were interpolated spatially using Barnes interpolation (Koch et al. 1983;Liston and Elder 2006). The present analysis requires a pixelwise spatially resolved time series of surface temperature. These were derived from the spatially interpolated LSWT maps. Time series were produced using piecewise cubic hermite polynomials (Fritsch and Carlson 1980). Fig. 2a shows the variation of LSWT (labeled T w in the figures) at the SHL2 and GE3 locations (the nearest pixels in the satellite images) smoothed with a 30-d moving average. Generally, SHL2 has a higher skin temperature than GE3 in summer and winter (Figs. 2a). A shift in the T w distribution at the two locations is evident from the smoothed probability density function (PDF) of the LSWT (Supporting Information Fig. S6a).

Meteorological data
Meteorological data over Lake Geneva are not measured. However, since 2008, the Swiss Federal Office of Meteorology and Climatology (MeteoSwiss, last accessed 5 July 2018) has run a numerical weather model, COSMO-2, which provides hourly output on a 2.2 × 2.2 km grid. Lakes are distinguished from land by using a lake model in COSMO-2 (Mironov 2008). COSMO-2 data include spatiotemporal maps of wind speed (10 m above the lake), air temperature (2 m above the lake), relative humidity (2 m above the lake), cloudiness, global radiation and air pressure. Model results are systematically verified against over-land surface data in Switzerland and Europe (MeteoSwiss). This study is based on reanalysis COSMO-2 datasets (assimilated results are based on past field observations) for the period from 1 March 2008 to 31 December 2014. To investigate the quality of COSMO-2 results further, we compared these data with measurements from meteorological stations located around the lake (Supporting Information Fig. S7a). Our analysis indicates a high correlation between these measurements and the COSMO-2 outputs (Supporting Information Fig. S7b), with the exception of wind speed, which has a higher local spatiotemporal variability. For wind speed, the cross-correlation between different stations is similar for COSMO-2 results and measurements, which confirms the capability of the COSMO-2 model to represent realistic large-scale wind patterns over Lake Geneva (Supporting Information Fig. S7c).
Smoothed meteorological data with a 30-d moving average are shown in Figs. 2b-f, for the SHL2 and GE3 locations (PDFs of the raw hourly data are presented in the Supporting Information S6b-f). The differences between the two locations in variation and distribution of wind speed, U 10 ( Fig. 2b; Supporting Information Fig. S6b), and relative humidity,ϕ rel , ( Fig. 2e; Supporting Information Fig. S6e), are pronounced. In particular, the average wind speed is higher at GE3 than at SHL2 (Fig. 2b). The probability density of low wind speeds (1-3 ms −1 ) is also lower at GE3 (Supporting Information Fig. S6b). This is due to the differences between the characteristics and fetch of the two dominant winds, Bise and Vent, as described earlier (Fig. 1).

Model calibration procedure
The net SurHF, that is, the air-water heat exchange in Eq. 5, is usually estimated using bulk formulas. These formulas are based on different concepts and require specific parameters. Ideally, these parameters are known for a given set of conditions (e.g., a lake). However, in reality, this is rarely the case and they have to be determined by calibration.
For each of the five SurHF terms in Eq. 5 that remain to be solved, there are various formulations available in the literature, and their possible combinations give rise to numerous net SurHF models. All of them contain coefficients, which can be a source of error and uncertainty. Model intercomparison is a good way to explore some of the uncertainties and determine the optimal model configuration. In this study, we evaluated 54 net SurHF model combinations and report here in more detail the results for the six combinations considered optimal. Details of the formulas and parameters are presented in the Supporting Information (Tables S1-S4). The model intercomparison indicated that the net SurHF variation, and consequently, the water column heat content are more sensitive to the spatiotemporal variability of atmospheric longwave radiation, Q an , and turbulent heat fluxes, Q ev and Q co , than to the remaining heat flux terms. For Q an , several formulas (e.g., Brutsaert 1975;Satterlund 1979;Crawford and Duchon 1999) with a relatively wide range of values for the corresponding coefficients (see, e.g., Supporting Information Table S5,) were reported. Studies over large lakes already indicated a significant spatial variability of turbulent heat fluxes, Q ev and Q co , in particular the latent heat flux, compared to the radiative terms (e.g., Lofgren and Zhu 2000;Verburg and Antenucci 2010;Moukomla and Blanken 2017). There is a similarity between the formulas for the Q ev and Q co estimation that is attributed to the physical analogy between processes 580 controlling humidity and air temperature. On the other hand, Q br is usually modeled with a constant water surface emissivity value in a relatively narrow range, that is, 0.95-0.97 (Davies et al. 1971;Sweers 1976;Octavio 1977). Thus, for back longwave radiation, Q br , and also solar short-wave radiation, Q sn , we selected the formulas in the literature reported for previous studies in Switzerland. For the model calibration, therefore, we focused on these three SurHF terms, Q an , Q ev , and Q co . We used one model for Q sn and Q br , two models for Q an , and three models for the Q ev /Q co calculations in Eq. 5. Table 1 summarizes the references and the corresponding calibration factors for each SurHF component. The combination of equations used in each of the six SurHF models is given in Table 2. It can be seen that any combination requires optimizing four calibration factors. A more complex equation set with additional calibration factors could be imagined. However, in a recent study, Dommenget and Rezny (2018) reported that, in a model tuning problem, a parameter space with higher dimensions will reduce the cost function, but the "modeled physics" are not necessarily closer to the "true physics." Their results indicated that tuning will be more successful by limiting the complexity of the problem to 1-5 calibration factors. Therefore, we calibrated those four parameters for each of the SurHF models, which were found to be most critical in our preliminary 54 model intercomparison.
The calibration factors listed in Table 1 were optimized based on energy conservation over time (Eq. 4). The generalized likelihood uncertainty estimation (GLUE) methodology (Beven and Binley 1992) was applied to calibrate the six SurHF models in Table 2. This methodology requires a validity range for each parameter, a sampling strategy for the parameter space and a likelihood measure for each parameter set. The range of different parameters was chosen according to physical limitations and their reported ranges in the literature, particularly for Swiss lakes. The details on the range of parameters and the sampling strategy can be found in the Supporting Information. To evaluate the temporal variation of the model heat content, the RMSE with respect to observations was selected as the minimized optimization metric: where N is the number of total observations at the two measurement locations (SHL2 and GE3, Fig. 1) at time t i . However, in cases where RMSEs are close for different models, other metrics, such as correlation coefficient and standard deviation were used. The approach used here is based on the optimization of Q N by minimizing the uncertainties with respect to the observed water column heat content G o . However, Q N is the sum of different SurHF components and minimizing the uncertainty of the sum may create erroneous values of the individual terms, which may cancel each other. To determine the errors in the estimates of each SurHF component, direct measurements of each component are required. These errors cannot be estimated in the present study, because such data are not available for Lake Geneva. Instead, we will further investigate this point below by examining the dynamics of the individual components and compare their values with those reported in the literature.  Murakami et al. (1985) C mur and C b (Q ev + Q co ) 2 Supporting Information Eq. S6 Ryan et al. (1974); Gill (1982) C e,r and C h,r (Q ev + Q co ) 3 Supporting Information Eq. S7 Monin and Obukhov (1954); Woolway et al. (2015b) C m2 and C q1 Table 2. Equations used in each of the six selected SurHF models. See Table 1 for references for each heat flux term (details are given in the Supporting Information Tables S1-S4).

581
Another source of error in RMSE calculations (Eq. 6) is the uncertainty of the measured temperature profiles used to calculate G o . In the Error analysis in the Supporting Information, we estimate this error and it will be demonstrated below that it is much smaller than the smallest RMSE values. Thus, RMSEs essentially report errors in G m .

Model calibration and assessment
Uncalibrated versus calibrated net surface heat flux models Various combinations of SurHF models were studied applying the water column energy balance First, we examined the performance of the different models using coefficient values given in the literature, with an emphasis on those used in other lake studies in Switzerland. These include C cloud = 0.17, C an = 1.0592, and C b = 0.62 as in the study by Fink et al. (2014), C lt = 0.06 and C lc = 1.22 (Crawford and Duchon 1999), C mur = 1.2×10 −7 (Murakami et al. 1985), C e,r = 2.1×10 −3 and C h,r = 1.45×10 −3 (Wahl and Peeters 2014), C m2 = 0.11 and C q1 = −2.67 (Zeng et al. 1998).
A Taylor diagram (Taylor 2001) was used to determine how well the results of the six SHF models matched the observations (Eq. 4). The Taylor diagram (Fig. 3) provides a comparison between a group of models and a reference observation by combining correlation coefficients, RMSE and standard deviations in a single figure. Here, the reference is the heat content variation at the SHL2 and GE3 locations, ΔG o , calculated by Eqs. 1 and 2. The comparison groups are the heat content variation calculated with these six different net SurHF models, ΔG m , estimated by Eq. 3. These models and their corresponding SurHF terms are described in Tables 1 and 2. The results reveal that the models using the predefined (uncalibrated) values lead to a high standard deviation and RMSE (1.2-7.7 GJm −2 ), and a low correlation coefficient (less than 0.4) compared to the observations. These substantial deviations emphasize the importance of performing a pre-analysis before applying the SurHF models for long-term air-water heat exchange investigations. For example, model results indicate that employing predefined calibration factors results in nonphysical atmospheric emissivity values ε an > 1 and consequently significant model-observation divergence. The modelobservation deviation is higher for uncalibrated models using atmospheric emissivity as proposed by Crawford and Duchon (1999), ε an2 (Supporting Information Eq. S3b, Table S1), resulting from an overestimation of Q an2 due to a presumably high (uncalibrated) C lc value, 1.22. Crawford and Duchon (1999) reported a monthly mean bias error of −9 to 4 Wm −2 based on their data for a 1-yr period using this formula. In summary, all six models using these predefined coefficients failed to reproduce the heat content variation at both the SHL2 and GE3 locations.
Therefore, we followed the optimization and calibration procedure for all model combinations, as explained above, using a two-location calibration approach. The zoomed lower panel of Fig. 3 presents the model-observation comparison for the optimum (calibrated) net SurHF models and the combined observations at SHL2 and GE3. The results show a great improvement over the uncalibrated estimations. Models 1 and 4 (Table 2), which calculate turbulent heat fluxes using (Q ev + Q co ) 1 have a higher correlation coefficient, lower RMSE and smaller standard deviation than Models 2 and 5 using (Q ev + Q co ) 2 . The simple Murikami's and Bowen's formulations, (Q ev + Q co ) 1 , yield a better estimation of Lake Geneva's surface heat exchange than considering the effect of both free and forced convection in (Q ev + Q co ) 2 . For the model of Ryan et al. (1974), turbulent heat fluxes, (Q ev + Q co ) 2 , were derived under laboratory conditions where forced convection is not significant compared to free convection and the air-water temperature difference was greater than in natural systems. However, the similarity theory, (Q ev + Q co ) 3 in Table 1, applied in Models 3 and 6, reproduces the temporal variation of turbulent heat fluxes far better than the other two algorithms. This model, (Q ev + Q co ) 3, iteratively determines the atmospheric boundary layer condition at each point and at each time step based on the LSWT and meteorological data, and better resolves the spatial variability of SurHF. Results indicate that for approximately 75% of the time, the atmospheric boundary layer over the lake is unstable.
Since the calibrated Models 3 and 6 have similar RMSEs, we applied additional statistical methods, i.e., correlation coefficient and standard deviation, to select the best model for SurHF estimation. We also calculated the bias between the observed and modeled heat contents for each model. These two models have nearly identical correlation coefficients and RMSEs (purple lines and green lines in Fig. 3). However, Model 3 gives a slightly better standard deviation than Model 6 (orange lines in Fig. 3) while the estimated bias using Model 6 is slightly smaller than Model 3. Although the temporal variation of heat contents by implementing Model 3 and Model 6 are not noticeably different (results not shown), Model 3 was selected as the best model resulting from the Taylor diagram comparison (Fig. 3).

Two-point versus one-point calibration
Our analysis (Fig. 3) shows that, regardless of the chosen model, recalibration greatly improves the estimation of the long-term surface heat exchange for Lake Geneva. In order to determine whether there is a significant difference between the one-point and two-point calibration, we calibrated the six SurHF models using either SHL2 or GE3 temperature profiles. Fig. 4 shows the heat content variation comparison between observations ΔG o and Model 3 results ΔG m . The SurHF model calibrated at SHL2 overestimates the SurHF at GE3 (Fig. 4a), while SurHF values are underestimated at SHL2 using only GE3 temperature profiles for model calibration (Fig. 4b). Therefore, SurHF models calibrated using only one temperature profile location fail to predict the profile at the other location. Similar results were obtained using the other five net SurHF models (results not shown). This confirms that there is a significant difference between the one-point and two-point calibration.
Intercomparison of lake heat content variation A group of four optimal calibration factors was obtained for each of the six models, listed in Table 2, using the two-point calibration. For the remaining analysis, only these optimal values are employed. The observed heat content variation, ΔG o , and the corresponding heat content variation using the six different calibrated models, ΔG m , are compared in Fig. 5.
Here, the individual model performances at SHL2 and GE3 are investigated separately. Using Models 2 and 5, the SHL2 and GE3 results are roughly distributed below and above the optimal dashed line, respectively, and have the largest scatter. This demonstrates that the worst combination of sensible-latent heat flux terms (Q ev + Q co ) 2 (Supporting Information Eq. S6, Table S3), underestimates the SurHF at SHL2, while it is overestimated at GE3. Although this two-point separation is less pronounced using Models 1 and 4 (left panels of Fig. 5), these models still have a relatively higher RMSE and lower correlation coefficient compared to the best models (Fig. 3). Models 3 and 6, which are the best models in terms of RMSE, use similarity theory (Q ev + Q co ) 3 (Supporting Information Eq. S7, Table S3), for turbulent heat flux estimation. Model 3 uses Brutsaert's formulation (Supporting Information Eq. S2, Table S1), while Crawford-Duchon's approach (Supporting Information Eq. S3, Table S1) quantifies atmospheric radiation in Model 6. Therefore, using a more advanced model for the sensible-latent SurHF calculation, Supporting Information Eq. S7, Table S3, not only it leads to better model-observation statistics but also reproduces the heat content of individual points better than the other models.
When comparing Models 1-3 (top panels) with Models 4-6 (bottom panels) in Fig. 5, it should be noted that in the top panels, the results for SHL2 mainly cluster above the optimal line, whereas those for GE3 are mainly below that line. Such a clear separation between the results of the two stations is less obvious in the bottom panels. Models 1-3 apply the Brutsaert (1975) formulation for cloud cover, and the remaining models use the more complex model of Crawford and Duchon (1999). This latter formulation gives more realistic results in terms of the model bias.
Using the best and worst models (respectively, Model 3, closest to "obs" in Fig. 3, and Model 2, furthest from "obs" in Fig. 3), time series of the modeled lake heat content at the calibration locations are compared with observations in Fig. 6. As stated earlier, the calibration factors are assumed constant. However, if each location is treated independently in the calibration process, with a different set of calibration parameters for each model at each of the two stations, there is little or no difference between the models (results not shown). This, again, demonstrates the significant improvement resulting from the twopoint calibration and the use of optimal heat flux term concepts. The failure of some models, for example, Model 2, to resolve the spatial variability of heat content, that is, two-point calibration in this study, could be due to incorrectly modeling some relevant physical processes or not modeling them at all. As the long-wave radiation Q an shows little spatial variability over the lake, the turbulent heat flux model (Q ev + Q co ) 2 (Table 2) can be considered the source of the deviation of Model 2. It was already shown earlier (Figs. 3 and 5) that Model 2 significantly differs from the observations.

Intercomparison of net surface heat flux
In order to compare the net SurHF, Q N (Wm −2 ), at SHL2 and GE3 obtained with the six selected models using the bestfit parameters, the mean and standard deviation of Q N were calculated for the period 1 Jan 2009 to 31 Dec 2014 (due to missing data, the January-February 2008 results were excluded). The mean value of SurHF at GE3 is higher than at SHL2 in all models (Table 3). The mean SurHF difference ranges from 4 to 10.7 Wm −2 for Models 6 and 2, respectively. In terms of mean net SurHF, Model 6, and to a lesser extent Model 1, are close to Model 3, while Model 2 has the largest deviation. The results of Models 5 and 6 have the maximum and minimum standard deviation differences with respect to Model 3 at the two locations. Consequently, there will be  The individual formulas used in each model are given in Table 2 (see Supporting Information Tables S1-S4 for more details).

584
marked differences between the models if integrated over the entire lake and considered over an annual cycle.
The mean net SurHF at SHL2 and GE3 are −1 and 3.4 Wm −2 , respectively, for 2009-2014. The results also indicate that on average, some parts of the lake (GE3) are warming at the same time as other regions (SHL2) are cooling down during this period. This emphasizes the importance of using multiple locations (or ideally the entire lake) instead of only one location for SurHF and lake heat content studies, especially for long-term analyses. The high standard deviation values in Table 3 result from using hourly estimates. When the data are smoothed with a 30-d moving average, standard deviations of 88.9 and 95.6 Wm −2 for SHL2 and GE3, respectively, are obtained with Model 3. Although the difference between these standard deviations is small, it shows that the SurHF distribution can vary systematically from one point to another over a large lake. Furthermore, the SurHF values are spread over a wider range at GE3 than at SHL2. This distribution is the response to the combined contribution of various meteorological parameters, for which spatial variations are likely to occur. Fig. 2; Supporting Information Fig. S3, for example, illustrate this variability at the two studied locations.

Surface heat flux estimation
We calculated SurHF terms at two locations using the best model (Model 3), with the following four optimal coefficient values: C cloud,opt = 0.11, C an,opt = 0.98, C m2,opt = 0.01, and C q1,opt = −1.52. The results converge to a minimum RSME value for all four parameters with a narrow range of variation in the neighborhood of these optimal parameter values (Supporting Information Fig. S9). Furthermore, the uncertainty analysis also showed a negligible error of the observation data, G o , compared to the above minimum RMSE value (see error analysis in the Supporting Information). The C cloud,opt value we obtained (C cloud,opt = 0.11) is lower than C cloud = 0.17 determined by Kuhn (1978) and used in other studies in Switzerland (Livingstone and Imboden 1989;Fink et al. 2014). Our parameter C an,opt (C an,opt = 0.98) is also slightly lower than the values of 1.09 and 1.0592 used in those studies. To verify the calibrated values for C cloud and C an , we evaluated the computed atmospheric emissivity for the SHL2 and GE3 stations. The emissivity values, ε an1 (Supporting Information Eq. S2b, Table S1), fluctuate between 0.6 and 1, with the lower emissivity values being valid for clear skies and low air temperatures in winter and the highest values during warm and fully cloudy summertime conditions (Supporting Information Fig. S10). This shows the improvement resulting from our calibration for this SurHF component, as it was previously mentioned that uncalibrated emissivity may reach values >1 which are physically unrealistic. The C m2 and C q1 factors will be further discussed in the next section.
The calculated temporal variation of the five SurHF terms and the net SurHF for the study period using Model 3 is presented in Fig. 7 (smoothed with a 30-d moving average). There is a high correlation (>0.97) for the radiative heat fluxes, Q sn , Q an , and Q br at SHL2 and GE3. However, due to the difference in variation of Fig. 6. Temporal evolution of the lake heat content. Model results compared to observations (points) at (a) SHL2, and (b) GE3 (see Fig. 1 for station location). Only the best (Model 3, black lines) and the worst (Model 2, gray lines) models are presented. Note that due to the difference in depth at the two locations, that is, 309 m at SHL2 and 70 m at GE3, the absolute values of heat content (vertical axis) have different magnitudes in (a) and (b). Table 3. Mean and standard deviation of net SurHF, Q N , at SHL2 and GE3 using the six SurHF models specified in Table 2  585 wind speed, relative humidity and LSWT at SHL2 and GE3 (U 10 , ϕ rel and T w in Fig. 2; Supporting Information Fig. S6), the correlation coefficients of sensible, Q co , and latent, Q ev , heat fluxes between SHL2 and GE3 are 0.8 and 0.63, respectively, based on hourly values. A correlation coefficient of 0.93 was found between the net SurHF Q N at SHL2 and GE3. The mean and standard deviation of the different SurHF terms at both locations for the period 1 Jan 2009 to 31 Dec 2014 are given in Table 4. Examining the results for different SurHF terms reveals that radiative components mainly contribute to the variation of the mean net SurHF at each location. The higher standard deviation of the net SurHF at the GE3 location compared to SHL2 (Table 3) can be explained by the higher standard deviation values of turbulent heat fluxes at this location (Table 4).

Discussion
Bulk transfer coefficients The model assessment shows that using an appropriate model for the sensible-latent heat flux estimation is essential for the two-point calibration. The results indicate that some models fail to reflect the long-term heat content variation at two points. Based on the RSME, the bulk aerodynamic algorithm using the similarity theory and empirical relationships (Q ev + Q co ) 3 in Table 2 (details in the Supporting Information Eq. S7, Table S3), was the best model. Two components must be defined in this algorithm: turbulence stability functions, f m , f e, and f h , and roughness lengths for momentum, temperature and humidity, z 0 , z 0t , and z 0q , respectively. Details of the calculation procedure are given in the Supporting Information. Unlike the other selected models, that is (Q ev + Q co ) 1 and (Q ev + Q co ) 2 in Table 2, the heat transfer coefficients are not constant in this algorithm. The spatiotemporal values of the transfer coefficients, C d,m , C e,m , and C h,m , are defined as a function of the atmospheric stability parameter (ζ, Supporting Information Eq. S7e, Table S3) using the Monin-Obukhov theory (Monin and Obukhov 1954). Even though wind speed is often considered to be the main physical parameter affecting the transfer coefficient values, other processes such as air thermal stratification (e.g., Deardorff 1968;Xiao et al. 2013),  586 waves and relative humidity may also contribute (Toffoli et al. 2012). The transfer coefficients are higher under unstable atmospheric boundary layer conditions. However, the effect of humidity and wave parameters on the air-water heat and momentum exchanges was found to be different under weak and strong wind conditions. In order to include these processes in the calibration procedure, field measurements are needed which are not available for Lake Geneva. The obtained optimal values, C m2,opt = 0.01 and C q1,opt = −1.52, are, respectively, lower and higher than the optimum values of Zeng et al. (1998), that is, C m2,uc = 0.11 and C q1,uc = −2.57. Fig. 8 illustrates the variation of C e,m and C d,m as a function of wind speed, U 10 , when the uncalibrated and calibrated factors are compared. The coefficient C h,m has a similar shape as C e,m (not shown). These curves were obtained by randomly sampling 2000 points from the dataset under unstable conditions (negative stability parameter, ζ < 0). Following the sensitivity analysis in the Supporting Information Fig. S8, the difference between calibrated and uncalibrated factors results in lower (higher) humidity and temperature bulk transfer coefficients under low (high) wind speed conditions (Fig. 8a). We assumed a constant value of 0.013 for the Charnock parameter (Charnock 1955) in the calculation of the momentum roughness length, z 0 (Supporting Information Eq. S10). Therefore, the drag coefficient, C d,m , under high wind conditions (> 7 ms −1 ) is similar for uncalibrated and calibrated conditions (Fig. 8b).
To verify the calibration factors, we compare the shape and range of these curves with some measurements taken over water. The general form of theses curves is similar to the measured values over inland and open waters (e.g., Wüest and Lorke 2003;Wei et al. 2016). As we obtained lower values for both C m2 and C q1 than Zeng et al. (1998), the calibrated transport coefficients are smaller (higher) at low (high) winds than the uncalibrated coefficients in Fig. 8. These coefficients are sensitive to the choice of parameterization, especially in low wind regimes (Webster and Lukas 1992). For weak winds, the measured data cover a wide range of drag coefficients from 2 × 10 −3 to 2 × 10 −2 . Higher coefficient values (~2 × 10 −2 ) are reported mainly for ocean experiments (Geernaert et al. 1988;Bradley et al. 1991) while lower values (~2 × 10 −3 ) were observed over shallow coastal waters (Mahrt et al. 1996) and estuaries (Lin et al. 2002). Wei et al. (2016) found a high drag coefficient value of~10 −2 at a wind speed of~1 ms −1 for Lake Kasumigaura (Japan), while a value of~2.5 × 10 −3 was measured for Lake Neuchatel (Switzerland) for the same wind speed (Simon 1997). For Lake Kasumigaura, however, the humidity transfer coefficient was~3 × 10 −3 at 1 ms −1 wind speed, which is closer to our estimated values (Fig. 8a). Under high wind regimes, the calibrated value (~2 × 10 −3 ) and the almost linear variation of the transfer coefficients are in agreement with reported measurements (Graf and Prost 1980;Merzi and Graf 1988;Mahrt et al. 1996;Babanin and Makin 2008;Toffoli et al. 2012;Wei et al. 2016). Although the transfer coefficient values are consistent with other studies, there is a shift in the wind speed under which these minima are observed. The minimum values of the transfer coefficients (~10 −3 ) were obtained at a low wind speed of~2 ms −1 (Fig. 8), whereas the minima reported in the literature are, for example, at~4 ms −1 (Mahrt et al. 1996;Lin et al. 2002) or~5 ms −1 (Wüest and Lorke 2003;Wei et al. 2016).
The large scatter seen in the transport coefficients under weak wind regimes could be due to the measurement technique, method of statistical calculation, or site-dependent parameters. Recently, Wei et al. (2016) suggested some possible reasons for high values at low wind speeds, for example, lake surface currents, wave effects, and gustiness. However, the behavior of transfer coefficients under low wind speeds is less clear. Mahrt et al. (1996) investigated the influence of fetch on the drag coefficient curve. They reasoned that the 587 drag coefficient for a short fetch (<4 km) is greater than for a long fetch, particularly for high wind speeds. They associated this variation to the wave field difference under different fetches. Their results also indicate that the drag coefficient for short fetches has a minimum at a wind speed of~3 ms −1 . Our results agree with the short-fetch results of Mahrt et al. (1996), especially for low wind speeds (< 3 ms −1 ). The variation of transport coefficients in this regime is mainly due to the second term in the right-hand side of the Supporting Information Eq. S10, C m2 v a /u * . Compared to measurements by Zeng et al. (1998), Wüest and Lorke (2003), and Wei et al. (2016), the lower transport coefficients at low wind speeds reported here are due to the small C m2 value, 0.01, compared to the commonly used value of 0.11. Other parameterizations for this term are proposed, for example, C m3 σ w =ρ w u 2 * (Jin 1994;Bourassa et al. 1999), where C m3 is a calibration factor and σ w is the surface tension. Better insight into realistic parameterizations could be obtained from systematic direct measurements of the atmospheric boundary layer.

Sensitivity of the calibration factors
We quantified the effect of uncertainty associated with the four calibration factors using a straightforward sensitivity analysis in which one parameter was varied over a specific range while keeping the remaining three constant. The difference in heating/cooling caused by the different parameters is expressed as the temperature change in the near surface water layer, for example, the upper 10 m of the water column, ΔT ( Cy −1 ), using: where ΔG m,sen (Jm −2 y −1 ) is the mean annual heat content change due to variation of the corresponding parameter, and H sl is the surface layer depth (m). This metric was used to compare the cooling/heating induced by employing uncalibrated SurHF box models with a warming water temperature trend. In this context, a AE 1 Cy −1 variation is approximately equal to a AE 1.3 Wm −2 bias in the estimation of the net SurHF. Fig. 9 shows the effect on the heat content variation at SHL2 and GE3. The results indicate that relatively small deviations in calibration factors, particularly in C an , affect the lake's temperature trend significantly and can be much higher than the annual climate change trend of 0.065 Cy −1 as reported by Gillet and Quetin (2006) for the near surface layer. Variations in C an and C cloud have a linear influence on the heat flux estimation (Figs. 9a,b) while the effect of an uncalibrated C q1 and C m2 is non-linear (Figs. 9c,d). This is due to nonlinearity inherent in the turbulent heat flux parameterization. The sensitivity of the results to C an variations is striking (Fig. 9a), whereas variations of C q1 are less marked (Fig. 9c). A small deviation in C an (< 1%) results in a noticeable change in the SurHF estimation (> 2.5 Wm −2 ) and consequently the lake heat content.
The results are also sensitive to C cloud . A AE 10% variation of C cloud leads to a bias of~AE1.6 Wm −2 in the estimation of net SurHF.
The results also indicate that the responses to C q1 and C m2 at SHL2 and GE3 are different. The C m2 variation has a more pronounced effect at SHL2 (Fig. 9c) while GE3 is more sensitive to C q1 (Fig. 9d) due to the spatial variability of meteorological forcing and LSWT over Lake Geneva. The differences in wind speed, LSWT, and relative humidity are quite noticeable between the two locations ( Fig. 2; Supporting Information  Fig. S6). Wind speed variations (Fig. 2b), for example, are on average higher at GE3 than at SHL2. As the value of C m2 mainly affects the intensity of turbulent heat fluxes at low wind regimes (Supporting Information Fig. S8a), the tendency for weak winds to occur at SHL2 reflects its sensitivity to C m2 . In contrast, stronger wind forcing at GE3 makes it more sensitive to the value of C q1 , which controls heat flux for high wind regimes (Supporting Information Fig. S8b). Again, these differences underscore the possible significant errors arising from single-point model calibration.
The same analysis can be applied to quantify the effect of geothermal heat flux on the model calibration. The water column heat content variation due to geothermal bottom heating is~0.003 GJm −2 y −1 using the typical value for the geothermal heat flux, that is, $0.1 Wm −2 (Finckh 1976), Fig. 9. Sensitivity analysis of model-induced cooling/warming for the upper 10 m of Lake Geneva at SHL2 (green lines) and GE3 (blue lines) depending on the optimum calibration factors for (a) C an , (b) C cloud , (c) C q1 , and (d) C m2 . The vertical dashed lines indicate the position at the optimal values for each calibration factor resulting from our analysis. It is noted that there is a high amount of overlap between the green and blue lines in panels (a) and (b).

588
which is 100 times smaller than the calculated RMSE employing Model 3 (legends in Fig. 5). As this variation is small, we did not repeat the optimization procedure with the updated heat content values. Instead, we used the sensitivity analysis results to estimate its effect on the obtained calibration factors. The calculated geothermal-induced slight warming (0.003 GJm −2 y −1 ) corresponds to a 0.08 Cy −1 variation in a 10-m water column.

Comments and recommendations
Large lakes can be characterized by considerable spatial variability in meteorological parameters. For example, the surrounding topography can exert a strong influence on the wind patterns and solar radiation, and hence on SurHF. LSWT may likewise exhibit significant spatial variability, mainly during the summertime. Therefore, the determination of SurHF at a single location only provides a partial understanding of the energy exchange dynamics over the whole lake surface and could result in significant errors in the estimation of SurHF for the whole lake. In addition, various formulations and parameterizations exist for different heat flux components, in particular for sensible and latent heat fluxes. Thus far, a systematic analysis of their optimal combination, which determines the net SurHF, was not reported. In this study, we addressed these questions by expanding the net SurHF estimation for a 7-yr period using a two-point calibration, instead of the commonly used onepoint calibration. We tested 54 different net SurHF models Q N , and presented the results for the best six models associated with different combinations of one model for solar shortwave radiation Q sn , two models for atmospheric radiation Q an , one model for longwave radiation from the water surface Q br , and three models for turbulent heat fluxes, that is, sensible Q co and latent Q ev heat fluxes. For the period of 2008-2014, we evaluated the heat content response of Lake Geneva to these models by implementing frequently used coefficients given in the literature for comparable water bodies.
Our analysis emphasizes the importance of choosing appropriate calibration factors when estimating the heat budgets of large lakes. Since none of the coefficients given in the literature provided acceptable SurHF estimates, optimization was used to find the best calibration factors for the selected SurHF models. However, the common approach of computing SurHF based on a single location did not result in satisfactory SurHF predictions at both locations. We found a high sensitivity of SurHF estimations to certain calibration factors indicating that a systematic calibration of bulk models is required for each study site. We demonstrated that a small variation in calibration factors, especially those controlling atmospheric radiation, leads to a significant change in the heating/cooling estimation of the lake.
The results indicate that multipoint (two point in this study) calibration is best performed using a comprehensive model for sensible-latent heat flux calculation. The parameterization in the bulk formula based on similarity theory was found to account for the spatial variability adequately. On the other hand, the temporal variation of the air-water heat exchange is highly sensitive to the atmospheric radiation modeling. Note that all the tested models gave reasonable RMSE values for short periods, that is, < 2 yr (Fig. 6). However, only a few of them gave satisfactory calibration over two points for longer periods, that is, more than 3 yr. Therefore, an accurate model selection and calibration is important for long-term climate studies, assuming that the calibration remains valid for a shifting temperature pattern.
The results show that for the heat exchange analysis of a large lake, a properly calibrated atmospheric radiation model and, in particular, an appropriate turbulent SurHF model are essential. The quality of the results is affected by model simplifications/limitations, errors in temperature measurements and satellite data retrieval, and uncertainties associated with meteorological data. However, the results are still reliable in terms of showing the need for optimized SurHF models, the advantage of two-point versus one-point calibration, and the sensitivity of the lake heat exchange to the actual values of different calibration factors. We used a 3D numerical simulation of Lake Geneva to quantify the negligible contribution of lateral heat exchange to the heat content variation of the water column far from the shore. In general, direct measurements of different heat flux terms and advective heat transport in the water body should be carried out, as this may further refine the validation of the different SurHF terms and the heat balance algorithm applied. Likewise, measurements from more than two points would extend and enhance the present survey. The methodology developed in this study and the results obtained improve the computation of the spatiotemporal SurHF over large lakes and consequently give a better estimation of the total energy exchange at the air-water interface. At the same time, these results may increase the reliability of numerical weather modeling by better accounting for lake-atmosphere heat exchanges.