Volume 17, Issue 6 p. 362-375
New Methods

An empirically validated method for characterizing pelagic habitats in the Gulf of Mexico using ocean model data

Matthew W. Johnston (Corresponding Author)
Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Dania Beach, Florida

Rosanna J. Milligan
Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Dania Beach, Florida

Cole G. Easson
Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Dania Beach, Florida
Department of Biology, Middle Tennessee State University, Murfreesboro, Tennessee

Sergio deRada
U.S. Naval Research Laboratory, Stennis Space Center, Mississippi

David C. English
College of Marine Science, University of South Florida, St. Petersburg, Florida

Bradley Penta
U.S. Naval Research Laboratory, Stennis Space Center, Mississippi

Tracey T. Sutton
Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Dania Beach, Florida

First published: 11 May 2019
Associate editor: Craig Lee

Abstract

Mesoscale oceanic features such as eddies generate considerable environmental heterogeneity within the pelagic oceans, but their transient nature makes it difficult to identify both their spatial and temporal extent and their effects on the distribution of pelagic fauna. Simplifying these complex features using a biologically meaningful classification system will likely be a useful first step in understanding the extent of their influence in structuring open-ocean ecosystems. In this study, we present a tool to classify the pelagic environment in the Gulf of Mexico using sea-surface height and temperature-at-depth data from the 1/25° GOM HYbrid Coordinate Ocean Model (HYCOM). Three “water types” were identified: Loop Current-origin water (LCOW), Gulf common water (CW), and mixed (MIX) water, where the latter represents an intermediate state during the degradation of LCOW to CW. The HYCOM-derived classifications were validated against in situ CTD data and microbial samples collected during 2015–2016 by the Deep Pelagic Nekton Dynamics of the Gulf of Mexico (DEEPEND) consortium. The validation data comprised classifications derived from both temperature-depth (TD) and temperature-salinity (TS) profiles and from microbial community analyses from the surface to mesopelagic depths. The HYCOM classifications produced an overall agreement rate of 77% with the TS/TD classifications and 79% with the microbial classifications. Because the system is applicable across a wide range of spatial and temporal scales, we believe that it provides a useful, complementary tool for biological oceanographers and resource managers interested in better understanding the effects of major mesoscale features on the pelagic biota.

Mesoscale oceanic features such as cyclonic and anticyclonic eddies generate regions of heterogeneity (e.g., in nutrients, chlorophyll, zooplankton, and trophic resources) at scales of tens to hundreds of kilometers within the pelagic realm. Cyclonic eddies can contain locally enriched phytoplankton biomass (as chlorophyll a) relative to the surrounding waters (Falkowski et al. 1991; McGillicuddy et al. 1998), and persistent eddies may harbor localized populations of organisms. Taking the oceanic Gulf of Mexico (hereafter GoM) as a low-latitude example, these features can result in “hot spots” or “oases” of biological activity (Biggs and Ressler 2001), ostensibly through the shortening of pelagic food chains: increased nutrient (nitrate) input favors new production by large phytoplankton (e.g., diatoms), which in turn favors production of large zooplankton (e.g., calanoid copepods) and of higher trophic levels (e.g., fishes, macrocrustaceans, and pelagic molluscs). Many important habitats also occur at boundaries between different zones, for example, at eddy margins (Bakun 2006). Convergent frontal regions can concentrate planktonic food resources (Olson and Backus 1985; Zimmerman and Biggs 1999) and enhance vertical mixing, but may also concentrate pollutants. Commercial fishers often rely on satellite and model results to predict the locations of these boundary areas because they tend to concentrate plankton (Lee et al. 1991), which attracts schooling predatory target fishes (Lindo-Atichati et al. 2012). As frontal boundaries and mesoscale eddies are also the preferred spawning habitat and nursery grounds for many commercially important fish species, such as scombrids and carangids (Ditty et al. 2004; Lindo-Atichati et al. 2012; Muhling et al. 2013; Rooker et al. 2013; Mohan et al. 2017; Cornic et al. 2018), these features may increase the exposure of their eggs and larvae to environmental toxins (Hazen et al. 2016; Mohan et al. 2017).

Although biological oceanographers and fisheries managers are increasingly interested in the role of mesoscale features in many oceanic processes (e.g., structuring of pelagic assemblages, larval transport, and food resource availability), it remains difficult to generalize their effects across different oceanic regions. In part, this is due to the highly complex, four-dimensional nature of the pelagic realm and the inherently transient nature of mesoscale features, whose properties are determined by the particular oceanographic and meteorological settings in which they occur. Nonetheless, despite (or perhaps because of) this complexity, classifying individual mesoscale features in a simple manner can help when making initial predictions about their effects on biophysical processes, as well as on faunal assemblages and populations of interest. Accounting for this variability is also important when developing spatially explicit models (e.g., of food webs) for ecosystem-based management.

One region where it is particularly desirable to understand the influence of mesoscale oceanographic features on the oceanic fauna is the GoM. The GoM is a semi-enclosed, subtropical oceanic basin lying between the Caribbean Sea and the Atlantic Ocean. Most oceanic water enters the GoM through the Yucatan Channel (Rivas et al. 2005; Sturges and Kenyon 2008; Chang et al. 2011), forming the Gulf Loop Current, before exiting through the Florida Straits into the NW Atlantic to form the Gulf Stream. The Loop Current is a major source of environmental heterogeneity in the upper circulation (to ca. 1000–1200 m; Cardona and Bracco 2016) of the offshore GoM through the semi-regular formation of Loop Current eddies. Loop Current eddies are relatively large (tens to hundreds of kilometers in diameter), persistent (months to years), downwelling features that have separated from the Loop Current and gradually propagate westward through the GoM, mixing with the surrounding water mass (Gulf Common Water [CW]) until they ultimately dissipate (Vukovich 2007). Loop Current eddies across the GoM are readily identified by elevated mean sea-surface height (SSH; e.g., Vukovich 2007), increased temperatures at depth (Herring 2010), and low chlorophyll a (Chl a) concentrations (e.g., Biggs 1992). Loop Current-origin water (LCOW) may be distinguished from CW by the presence of the Subtropical Underwater water mass, which is absent from CW. Characterized by a pronounced salinity maximum (ca. 36.7) at ca. 150–200 m and ca. 23°C (Rivas et al. 2005), Subtropical Underwater provides a useful means of discriminating between the two water types (CW and LCOW) in the GoM when in situ temperature and salinity data are available. The lateral and vertical mixed boundaries of mesoscale features form transitional gradients between LCOW and CW, exhibiting intermediate temperature-depth (TD) profiles, pycnocline temperatures (Herring 2010), and SSHs (Lindo-Atichati et al. 2012).

Microbial community structure provides another means of tracking mesoscale features and discerning their boundaries through the use of indicator taxa. Because microbes have limited or no mobility and grow and die rapidly, their communities must respond quickly to the constraints imposed by the surrounding oceanic environment. The microbial community in a water mass can be measured directly and can therefore provide clues to the water's composition and origin. Previous research (e.g., Rabalais et al. 2001; Hewson et al. 2006; Lam and Kuypers 2011; Joye et al. 2014; Karl 2014; Mason et al. 2016) provides strong evidence of microbe-environment coupling at both local and global scales, related to a suite of environmental variables including nitrogen and phosphorus concentrations, irradiance, temperature, and depth (e.g., Sunagawa et al. 2015). Because the microbial community is expected to respond most rapidly to physical and chemical conditions (i.e., compared with higher-order, mobile organisms), these data can integrate multiple physical and chemical variables into a single biological metric and may therefore be a valuable complementary indicator of mesoscale features. Moreover, microbial plankton community structure can be related to higher trophic levels (large fishes, cetaceans, and seabirds) because it reflects mesoscale variation in pelagic food web structure (Cho and Azam 1990; Guidi et al. 2016). In cases of mesoscale enrichment (e.g., cyclonic upwelling, offshore transport of coastal waters), algal composition can shift from the picoplankton-sized flora that normally dominate oligotrophic regimes to larger cells (e.g., diatoms; Goldman and Dennis 2003). This shift favors grazing by metazoans because of particle-size selectivity (e.g., grazing by large calanoid copepods, the preferred prey of oceanic zooplanktivorous fishes, which are in turn the primary prey of oceanic top predators; Hopkins et al. 1996). Thus, microbial “indicator” assemblages are potentially useful proxies when examining the dynamics of pelagic ecosystem structure in an end-to-end sense.

Collecting continuous temporal data across an area as large as a pelagic ocean basin is costly and difficult. However, computing power has advanced sufficiently in recent years that it is now possible to create relatively high-resolution, temporally and spatially explicit physical models of ocean basins at relatively low cost. In the GoM, one such model is the HYbrid Coordinate Ocean Model (HYCOM) (Chassignet et al. 2007). HYCOM simulations have been validated against observations, perform well at seasonal and annual scales (Kourafalou et al. 2009), and are widely used by the scientific community (e.g., Chassignet et al. 2003; Prasad and Hogan 2007; Le Hénaff et al. 2012; Johnston and Purkis 2015). Models such as HYCOM can provide greater temporal and spatial coverage than is feasible with in situ measurements, and they simulate three-dimensional conditions rather than being restricted to the sea surface (as satellite-derived measures are). As a result, they can be extremely useful in identifying and predicting oceanographic features such as mesoscale eddies.

The Loop Current and Loop Current eddies have been associated with reduced abundances and biomass of pelagic fauna within the GoM (e.g., Biggs 1992; Biggs and Muller-Karger 1994; Zimmerman and Biggs 1999; Wells et al. 2017), and may be an important mechanism for transporting planktonic and low-mobility fauna into and out of the GoM (e.g., Olson 1991). The presence of two water masses (i.e., CW and LCOW) with distinct biological properties provides the primary impetus for developing a discrete classification system for the GoM. Given the limited understanding of how the population and assemblage structure of offshore fauna change through space and time, a classification tool that combines the accuracy of in situ measurements with the broad spatiotemporal extent of oceanographic model data could help resolve the role of mesoscale features as drivers of pelagic faunal assemblages at a range of scales. In the present study, we aim to produce a robust, standalone classification system, based on publicly available HYCOM simulation output, that identifies mesoscale oceanographic features in the GoM and has been validated with in situ physical data (conductivity, temperature, and depth [CTD] casts) and biological data (microbial community structure). The system aims to classify the GoM into three primary water types: LCOW, CW, and mixed water (MIX; representing an intermediate state between CW and LCOW). We anticipate that this system will be useful for biological oceanographers working in the GoM, for example, in designing surveys to understand how water mass structure affects biological communities.

Materials and procedures

Empirical classification of mesoscale features

Temperature, conductivity, density, depth, and water column microbial plankton data were collected between 0 and 1500 m in the northern GoM during four research cruises (DP01–DP04) conducted in May and August of 2015 and 2016 by the Deep Pelagic Nekton Dynamics of the Gulf of Mexico (DEEPEND) Consortium (www.deependconsortium.org) (Figs. 1, 2). CTD casts (N = 64) were made at each sampling station to generate TD and temperature-salinity (TS) profiles (“TS/TD classifications” section) and to concurrently collect water samples for analysis of microbial plankton community composition (“Microbial classifications” section). The TD profiles were used to inform the starting parameters for the HYCOM-based classification system, and all empirical data were used to validate the modeled predictions.

Fig. 1. HYCOM SSH for the GoM for the first day of sampling (08 August 2015) by the DEEPEND Consortium for cruise DP02. Sampled stations are indicated with black dots. Chart datum: WGS84.
Fig. 2. CTD casts showing the mean (a) TS and (b) TD profiles (only 50–1000 m shown, for clarity) following visual classification of the CTD casts collected during DP01–DP04. LCOW is colored red, MIX orange, and CW gray. The mean profiles for LCOW (solid line) and CW (dashed line) are overlain. The shaded box and dashed line in (b) highlight the 200–600 m depth range and 300 m, respectively.

TS/TD classifications

Temperature and salinity profiles were generated from 64 CTD casts made during DP01–DP04 (Fig. 2). A Sea-Bird Scientific SBE-911plus CTD was used on all cruises, and Sea-Bird Scientific's SBE Data Processing software (version 7.25.0) was used to convert the raw data to calibrated units, apply operational corrections (e.g., sensor alignment relative to pressure and thermal-mass corrections), and test the data against variability and descent-rate thresholds. Information from rosette-bottle water samples (e.g., Chl a concentration) was then applied to the CTD fluorometer measurements to improve the accuracy of the in situ fluorometry estimates. The filtered and corrected CTD values were binned into 1-m depth increments, taking the median value within each increment. Because the water masses of interest occurred at mesopelagic depths, only data collected below 10 m were considered, to reduce the variability of the data. Erroneous data points and severe outliers were manually removed from each profile before the data were plotted. Each cast was visually classified as LCOW or CW according to the presence or absence, respectively, of Subtropical Underwater. The TD profiles through the upper permanent thermocline (ca. 200–600 m depth) were then compared between the two classes and used to help identify MIX water. Temperatures within the permanent thermocline discriminate well between water types throughout the GoM (Herring 2010), while reducing the influence of seasonal temperature differences and seasonal mixed layers within the upper 200 m.
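The 1-m median binning and the exclusion of the upper 10 m can be sketched in Python. This is a minimal illustration, not the SBE Data Processing implementation: it assumes a single cast that has already been corrected and filtered, supplied as parallel lists of depth (m) and one measured variable, and the function name `bin_profile_1m` is hypothetical. The visual Subtropical Underwater classification is not automated here.

```python
import math
import statistics
from collections import defaultdict


def bin_profile_1m(depths, values, min_depth=10.0):
    """Median-bin one CTD variable into 1-m depth increments.

    depths, values: parallel sequences for a single, already corrected cast.
    Scans at or shallower than min_depth (m) are discarded before binning.
    Returns {bin_top_depth_m: median_value}, ordered by depth.
    """
    if len(depths) != len(values):
        raise ValueError("depths and values must have the same length")
    bins = defaultdict(list)
    for depth, value in zip(depths, values):
        if depth <= min_depth:
            continue  # exclude near-surface scans
        bins[math.floor(depth)].append(value)  # e.g. 10.0 < depth < 11.0 -> bin 10
    return {top: statistics.median(vals) for top, vals in sorted(bins.items())}


# Hypothetical temperature scans (degC) from one cast
depths = [5.2, 10.4, 10.9, 11.1, 11.6, 11.8]
temps = [28.0, 27.5, 27.25, 27.0, 26.75, 26.5]
print(bin_profile_1m(depths, temps))  # {10: 27.375, 11: 26.75}
```

Taking the median rather than the mean within each bin keeps a single spiky scan from shifting the binned value.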

Following visual classification of the CTD casts, the approximate minimum mean LCOW temperature and maximum mean CW temperature (200–600 m) were used to parameterize the initial HYCOM-based classification and to guide its subsequent calibration (Table 1). Casts that met neither the LCOW nor the CW criteria were classified as MIX.

Table 1. HYCOM classification scheme values: SSH and T300 parameter values for the HYCOM classification scheme separated by water type and by initial and final calibrated criteria.
Water type | SSH, initial criteria | T300, initial criteria | SSH, calibrated criteria | T300, calibrated criteria
LCOW | > daily mean SSH + 0.073 | > 16.5°C | > daily mean SSH + 0.067 | > 15.92°C
MIX | > daily mean SSH and < daily mean SSH + 0.073 | 13.0–16.5°C | > daily mean SSH and < daily mean SSH + 0.067 | 13.46–15.92°C
CW | ≤ daily mean SSH | < 13.0°C | ≤ daily mean SSH | < 13.46°C
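The decision rules in Table 1 can be expressed as a small function. This Python sketch assumes that a grid cell must satisfy both the SSH and the T300 criterion of a row to receive that class, that cells matching no row are labeled UNK, and that cells without a T300 value (depth < 300 m) are labeled as shelf water, following the description in “HYCOM classification: Initial parameterization”. The names `Criteria`, `classify_cell`, and the label `SHELF` are illustrative only, not part of any published code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criteria:
    ssh_offset: float  # LCOW threshold as an offset above the daily GoM-mean SSH
    t300_low: float    # CW is colder than this at 300 m (degC); lower MIX bound
    t300_high: float   # LCOW is warmer than this at 300 m (degC); upper MIX bound


INITIAL = Criteria(ssh_offset=0.073, t300_low=13.0, t300_high=16.5)
CALIBRATED = Criteria(ssh_offset=0.067, t300_low=13.46, t300_high=15.92)


def classify_cell(ssh: float, daily_mean_ssh: float, t300: Optional[float],
                  c: Criteria = CALIBRATED) -> str:
    """Return LCOW, MIX, CW, UNK, or SHELF for one grid cell on one day."""
    if t300 is None:
        return "SHELF"  # shallower than 300 m, so no T300 value exists
    anomaly = ssh - daily_mean_ssh
    if anomaly > c.ssh_offset and t300 > c.t300_high:
        return "LCOW"
    if 0.0 < anomaly < c.ssh_offset and c.t300_low <= t300 <= c.t300_high:
        return "MIX"
    if anomaly <= 0.0 and t300 < c.t300_low:
        return "CW"
    return "UNK"  # no row matched, e.g. SSH and T300 disagree


print(classify_cell(0.27, 0.20, 16.0, INITIAL))  # MIX
print(classify_cell(0.27, 0.20, 16.0))           # LCOW
```

With the same inputs, the initial criteria return MIX and the calibrated criteria return LCOW, which illustrates how calibration moved the class boundaries.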

Microbial classifications

Seawater samples were collected with Niskin bottles at three depths within the 0–1500 m casts: surface waters (0–10 m), the epipelagic zone (at the chlorophyll maximum, 40–130 m), and the mesopelagic zone (at the O2 minimum, 230–750 m). Immediately after collection, three seawater samples from each depth (1–2 L each) were filtered through a 0.45-μm membrane filter under low pressure to separate the bacterial microbiome from the water. After filtration, the membranes were frozen at −80°C until laboratory processing at Nova Southeastern University. Genomic DNA was extracted from half of each filter membrane using the MoBio PowerSoil Powerlyzer DNA extraction kit, following the manufacturer's protocol, and the remaining half of each membrane was archived at −80°C. Polymerase chain reaction (PCR) amplification, primer design, amplicon cleaning, and normalization were performed according to the standard protocols of the Earth Microbiome Project (EMP; www.earthmicrobiome.org; Caporaso et al. 2010, 2012). Sample preparation and loading onto the Illumina MiSeq sequencer followed the standard Illumina protocol, except that custom EMP primers were added to the default primers. Sequencing used a 500-cycle V2 chemistry Illumina kit, which yields paired-end 250-bp reads.

Initial sequence processing was done using QIIME (Caporaso et al. 2010). In QIIME, forward and reverse sequences were paired, quality filtered (q-score ≥ 25), and clustered into operational taxonomic units (OTUs) using the QIIME default settings for open reference OTU picking. OTUs were classified using the SILVA database version 128 (Quast et al. 2013).
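Before the random forest step described under “Statistical analysis”, rare taxa are removed from the OTU table and the remaining counts are converted to relative abundance. A minimal Python sketch of that preprocessing follows; it assumes the OTU table is held as a nested dictionary of read counts per sample, and `filter_and_normalize` is a hypothetical helper, not a QIIME or randomForest function.

```python
def filter_and_normalize(otu_table, min_prevalence=0.10):
    """Drop rare taxa, then convert each sample's counts to relative abundance.

    otu_table: {sample_id: {taxon: read_count}}.
    A taxon is kept if it occurs (count > 0) in at least min_prevalence of
    all samples; taxa below that fraction are treated as rare and removed.
    """
    samples = list(otu_table)
    if not samples:
        return {}
    n_samples = len(samples)
    taxa = {taxon for counts in otu_table.values() for taxon in counts}
    keep = {
        taxon for taxon in taxa
        if sum(otu_table[s].get(taxon, 0) > 0 for s in samples) / n_samples
        >= min_prevalence
    }
    relative = {}
    for s in samples:
        kept = {t: c for t, c in otu_table[s].items() if t in keep}
        total = sum(kept.values())
        relative[s] = {t: (c / total if total else 0.0) for t, c in kept.items()}
    return relative


# Hypothetical table: 20 samples share taxa A and B; taxon R occurs in only one (5%)
table = {f"s{i}": {"A": 3, "B": 1} for i in range(20)}
table["s0"]["R"] = 4
print(filter_and_normalize(table)["s0"])  # {'A': 0.75, 'B': 0.25}
```

Filtering before normalizing means the relative abundances of each sample sum to one over the retained taxa only.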

Statistical analysis

Random forest analysis was used to classify microbial communities into discrete categories using the “randomForest” package (Liaw and Wiener 2002) in RStudio (version 1.0.153; R version 3.4.1; RStudio Team 2015, R Core Team 2017). Samples from each cruise were analyzed separately, and LCOW and CW designations were made for samples in each of three depth bins: surface water, epipelagic, and mesopelagic. For brevity, samples and their classifications are referred to by the CTD cast from which they were collected. Before analysis, rare taxa (those occurring in < 10% of all samples) were removed, and the remaining features were converted to relative abundance. Within each cruise, the TS/TD profiles (see “TS/TD classifications” section) were used alongside satellite-derived Chl a concentrations (NASA Goddard Space Flight Center et al. 2014; MODIS/Aqua; NASA Worldview) to select microbial samples from CTD casts that most likely occurred exclusively in either LCOW or CW (Figs. 2, 3). These samples formed the training data set for the random forest analysis (N = 5000 trees). Preliminary analysis showed good agreement (0% out-of-bag error) between all a priori classifications and the training data. The trained model was then used to classify the remaining (test) data using the “predict.randomForest” function in R. The output of the predict function was used to conduct a supervised random forest classification (N = 5000 trees) of all stations and to calculate confidence scores for the predicted classifications. Each CTD sample was classified as either LCOW or CW, and the proportion of “votes” for each class was recorded as a measure of classification confidence. Initial classifications were based on majority rule, but to account for boundary conditions we also considered class confidence, and final classifications of LCOW, CW, or MIX were determined from the confidence score of the supervised classification. Because we treated MIX water as a transitional gradient from one water type to the other, we assigned three categories (i.e., confidence cutoffs) to the random forest confidence results to reflect this gradient: > 80%, 60–80%, and < 60%, representing a single water type, mostly one water type, and mixed water, respectively. Whole-cast classifications were then determined by combining the classifications of the individual samples in the cast (up to three samples per cast). LCOW and CW casts were those that contained only one water type, and MIX casts were those that contained either a combination of LCOW and CW at different depths or low confidence scores for individual samples.
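The confidence cutoffs and the cast-level rule can be sketched in Python. The sketch assumes that the random forest output for each sample is the fraction of trees voting LCOW (for example, the vote proportions returned by `predict` with `type = "prob"` in the R randomForest package). It also assumes that “mostly one water type” samples keep their majority class and that only samples below 60% count as low confidence; the text does not state this last point explicitly. The function names are hypothetical.

```python
def sample_class(p_lcow):
    """Classify one water sample from the fraction of trees voting LCOW (0-1)."""
    if not 0.0 <= p_lcow <= 1.0:
        raise ValueError("p_lcow must be a proportion")
    winner = "LCOW" if p_lcow > 0.5 else "CW"
    confidence = max(p_lcow, 1.0 - p_lcow)  # share of votes for the winning class
    if confidence > 0.80:
        return winner, "single water type"
    if confidence >= 0.60:
        return winner, "mostly one water type"
    return "MIX", "mixed water"  # < 60%: treated as transitional water


def cast_class(p_lcow_by_depth):
    """Combine up to three samples (surface, epipelagic, mesopelagic) into one cast class."""
    if not p_lcow_by_depth:
        raise ValueError("at least one sample is required")
    calls = {sample_class(p)[0] for p in p_lcow_by_depth.values()}
    if "MIX" in calls or len(calls) > 1:
        return "MIX"  # a low-confidence sample, or different water types by depth
    return calls.pop()


# Hypothetical vote fractions: CW at the surface, LCOW below
print(cast_class({"surf": 0.05, "epi": 0.95, "meso": 0.75}))  # MIX
print(cast_class({"surf": 0.02, "epi": 0.10, "meso": 0.30}))  # CW
```

The first hypothetical cast mirrors the kind of depth-related shift reported for some DP02 stations, which the cast-level rule labels MIX.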

Fig. 3. Locations of microbial training data set collections, with a background of chlorophyll satellite imagery, for cruises DP01 (a), DP02 (b), DP03 (c), and DP04 (d). Stars indicate stations where training data set samples were collected: red stars indicate CW training samples and yellow stars indicate LCOW training samples. Surface chlorophyll concentration is colored from blue (low) to red (high).

HYCOM classification algorithm

HYCOM is a three-dimensional, eddy-resolving circulation model that depicts ocean conditions (e.g., SSH, zonal velocity, meridional velocity, temperature, and salinity) in near real time from the surface to the seafloor. Outputs are interpolated to 40 standard z-levels (depths) and are available in nonassimilated or data-assimilated versions, the latter integrating satellite altimeter observations, in situ and satellite sea surface temperature, and in situ vertical temperature and salinity profiles. In the GoM, HYCOM data are available at 1/25° (ca. 4 km) horizontal resolution at hourly intervals from 1993 to the present day (publicly available at: http://hycom.org). SSH and water temperature data were derived from HYCOM simulations for 2015 and 2016 spanning the entire GoM (HYCOM + NCODA Gulf of Mexico 1/25° Analysis, GOMl0.04/expt_32.5). Tidal variations in SSH were removed by calculating daily averaged SSH for a model domain bounded by 98°W to 79°W and 18°N to 31°N.
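The daily averaging used to suppress tides, and the GoM-wide daily mean used to normalize SSH, can be sketched in Python. The sketch assumes hourly SSH snapshots have already been extracted for a fixed list of grid cells, with None marking land or missing values; reading the HYCOM files themselves is outside its scope, and all function names are hypothetical. Each valid cell is weighted equally in the domain mean, which is a simplifying assumption. The usage example uses two hourly snapshots for brevity, whereas a real day would have 24.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import fmean

# Study-domain bounds from the text (longitudes in degrees east, so west is negative)
DOMAIN = {"lon_min": -98.0, "lon_max": -79.0, "lat_min": 18.0, "lat_max": 31.0}


def in_domain(lon, lat, box=DOMAIN):
    """True if a grid-cell centre lies inside the study domain."""
    return (box["lon_min"] <= lon <= box["lon_max"]
            and box["lat_min"] <= lat <= box["lat_max"])


def daily_mean_fields(hourly):
    """Average hourly SSH snapshots into one field per calendar day.

    hourly: iterable of (datetime, [SSH per cell, or None for land/missing]),
    with the same cell order in every snapshot.
    """
    by_day = defaultdict(list)
    for stamp, field in hourly:
        by_day[stamp.date()].append(field)
    daily = {}
    for day, fields in sorted(by_day.items()):
        mean_field = []
        for i in range(len(fields[0])):
            values = [f[i] for f in fields if f[i] is not None]
            mean_field.append(fmean(values) if values else None)
        daily[day] = mean_field
    return daily


def ssh_anomaly(field):
    """Subtract the day's domain-mean SSH (equal weight per valid cell) from each cell."""
    valid = [v for v in field if v is not None]
    domain_mean = fmean(valid)
    return [None if v is None else v - domain_mean for v in field]


hourly = [
    (datetime(2015, 8, 8, 0), [0.25, 1.0, None]),
    (datetime(2015, 8, 8, 1), [0.75, 1.5, None]),
]
daily = daily_mean_fields(hourly)
print(daily[date(2015, 8, 8)])               # [0.5, 1.25, None]
print(ssh_anomaly(daily[date(2015, 8, 8)]))  # [-0.375, 0.375, None]
```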

HYCOM classification: Initial parameterization

The classification system was initially parameterized using data from 08 August 2015 (day 1 of DP02). This date was chosen because the Loop Current extended into the northern GoM, a large Loop Current eddy was in the process of shedding, and all three water types of interest could be approximated visually using SSH and water temperature at 300 m depth (T300). The main LCOW region was visually identified from an SSH map (Fig. 1), and locations near the edges of the feature were queried using a GIS (ESRI ArcMap 10.3) to obtain the approximate SSH value that delineated LCOW and MIX from CW. This value was then normalized by subtracting the mean GoM-wide SSH on that day from the identified SSH threshold. The initial T300 values were derived from the CTD temperature ranges at 300 m for each water type (see “TS/TD classifications” section): the boundary between LCOW and MIX was taken to be the lowest temperature at 300 m recorded in an LCOW cast, and the boundary between CW and MIX was taken to be the highest temperature at 300 m recorded in a CW cast. Locations that did not meet these criteria were flagged as unknown (UNK), and shelf waters (i.e., < 300 m deep and therefore lacking T300 data) were also identified. All initial values are shown in Table 1.
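The initial T300 bounds and the SSH normalization described above reduce to simple summary statistics. In the following Python sketch, the example casts are made up, averaging several edge queries into one “approximate” SSH threshold is an assumption (the text does not say how the queried values were combined), and the function names are hypothetical.

```python
from statistics import fmean


def initial_t300_bounds(casts):
    """Derive initial T300 class boundaries from visually classified casts.

    casts: iterable of (water_type, t300_degC) pairs.
    Returns (t300_low, t300_high): the warmest CW cast sets the CW/MIX boundary,
    and the coldest LCOW cast sets the MIX/LCOW boundary.
    """
    casts = list(casts)
    cw = [t for water_type, t in casts if water_type == "CW"]
    lcow = [t for water_type, t in casts if water_type == "LCOW"]
    if not cw or not lcow:
        raise ValueError("need at least one CW and one LCOW cast")
    return max(cw), min(lcow)


def normalized_ssh_threshold(edge_ssh_values, domain_mean_ssh):
    """Express an edge-derived SSH threshold as an offset from the day's GoM-mean SSH.

    Averaging several edge queries into one threshold is an assumption.
    """
    return fmean(edge_ssh_values) - domain_mean_ssh


casts = [("LCOW", 19.0), ("LCOW", 17.5), ("CW", 10.0), ("CW", 12.5), ("MIX", 14.0)]
print(initial_t300_bounds(casts))                          # (12.5, 17.5)
print(normalized_ssh_threshold([0.5, 0.25, 0.75], 0.375))  # 0.125
```

MIX casts do not enter either bound; they simply fall between the two boundaries once the scheme is applied.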

The above classification system (the HYCOM classifications) was then applied to all GoM waters within the study domain from May 2015 to August 2016 to cover the period sampled by DP01–DP04. The HYCOM classifications at each of the empirical sample locations were then extracted for comparison to the classifications derived from in situ TS/TD profiles (“TS/TD classifications” section) and microbial data (“Microbial classifications” section).

HYCOM classification: Calibration

Model calibration was conducted across a gradient of SSH and T300 values to refine the HYCOM classification parameter values and to improve the agreement between classification schemes. First, the minimum SSH criterion for LCOW was increased in 1% increments from its initial value (daily mean SSH + 0.073) while the minimum and maximum T300 criteria were simultaneously increased in 0.5% increments. Next, the SSH and T300 criteria were reduced from their starting values using the same increments. In total, this testing scenario produced predicted classifications for 200 combinations of SSH and T300 values. The predicted classifications for each of the 200 scenarios were compared with the TS/TD and microbial classifications, and the percentage agreement was calculated across all CTD casts in the data set. The combination with the highest agreement when both years were combined was chosen as the final (“best”) classification criteria. These criteria were applied to the entire 2015 and 2016 daily HYCOM data sets over the cruise periods to classify all GoM waters (Fig. 4, Supporting Information Animations S1–4).
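The calibration is, in effect, a grid search that scores each candidate set of thresholds by its percent agreement with the empirical classes. The Python sketch below is a simplified illustration, not the study's code: it scales the SSH offset in 1% steps and each T300 bound independently in 0.5% steps (negative step counts cover the reductions), scores against a single list of empirical classes, and keeps the first best-scoring candidate. The exact 200-combination grid and its tie-breaking rule are not reproduced, and all names, including the toy `predict` function and station data, are hypothetical.

```python
import itertools


def agreement(predicted, observed):
    """Percent of stations whose predicted class matches the empirical class.

    Stations with no empirical class (None) are skipped.
    """
    pairs = [(p, o) for p, o in zip(predicted, observed) if o is not None]
    if not pairs:
        raise ValueError("no stations with an empirical class")
    return 100.0 * sum(p == o for p, o in pairs) / len(pairs)


def candidate_grid(base, ssh_steps, t_steps):
    """Yield (ssh_offset, t300_low, t300_high) triples scaled from base."""
    offset, low, high = base
    for i, j, k in itertools.product(ssh_steps, t_steps, t_steps):
        cand = (offset * (1 + 0.01 * i), low * (1 + 0.005 * j), high * (1 + 0.005 * k))
        if cand[1] < cand[2]:  # keep the MIX temperature range non-empty
            yield cand


def calibrate(base, ssh_steps, t_steps, predict, observed):
    """Return the candidate with the highest agreement and its score."""
    best_params, best_score = None, -1.0
    for params in candidate_grid(base, ssh_steps, t_steps):
        score = agreement(predict(params), observed)
        if score > best_score:  # strict '>' keeps the first of any ties
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stations: (SSH anomaly, T300 in degC) with empirical classes
stations = [(0.07, 16.0), (0.10, 18.0), (-0.05, 10.0)]
observed = ["LCOW", "LCOW", "CW"]


def predict(params):
    """Toy classifier applying Table 1-style rules to the hypothetical stations."""
    offset, low, high = params
    out = []
    for anomaly, t300 in stations:
        if anomaly > offset and t300 > high:
            out.append("LCOW")
        elif 0 < anomaly < offset and low <= t300 <= high:
            out.append("MIX")
        elif anomaly <= 0 and t300 < low:
            out.append("CW")
        else:
            out.append("UNK")
    return out


print(agreement(predict((0.073, 13.0, 16.5)), observed))  # 66.66666666666667
best, score = calibrate((0.073, 13.0, 16.5), range(-10, 11), range(-10, 11),
                        predict, observed)
print(score)  # 100.0
```

In the study, agreement was evaluated against both the TS/TD and the microbial classes; a fuller version could score each separately and combine them.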

Fig. 4. Locations of sample stations surveyed during each cruise in relation to the HYCOM classification scheme for a typical day: (a) DP01 (03 May 2015); (b) DP02 (19 August 2015); (c) DP03 (01 May 2016); (d) DP04 (06 August 2016). LCOW is colored red, MIX orange, and CW dark gray; UNK and waters shallower than 300 m are shown in light gray. Black symbols indicate stations where the empirical data agree with the HYCOM classes; gray symbols indicate partial agreement (i.e., either the microbial or the TS/TD classes agree); white symbols indicate no agreement. Circles indicate stations where both TS/TD and microbial data were available; squares indicate TS/TD only.

Assessment

TS/TD classifications

The TS/TD profiles suggested that 13 CTD casts occurred within LCOW, 38 within CW, and seven within MIX (Fig. 2). The MIX profiles were highly variable, though the TS and TD profiles typically showed degraded Subtropical Underwater profiles, with intermediate temperatures in the TD profiles (compared to LCOW and CW) through the upper thermocline, or a steep, local thermocline suggesting a transition between LCOW and CW within the cast. Six casts could not be classified with confidence or contained missing data and were excluded from further consideration. Two casts (DP03-CTD-038 and DP03-CTD-039) classified as CW contained notably colder water than the other samples through the permanent thermocline.

T300 measurements from each cast showed good separation among the three water types. In the present data set, T300 values ranged from 7.49°C to 13.59°C for CW, 12.71°C to 18.28°C for MIX, and 16.81°C to 20.63°C for LCOW. LCOW may be warmer, and CW cooler, than recorded here without affecting their classification. The highest MIX temperature (18.28°C) was recorded at DP02-CTD-027, which clearly transitioned from LCOW to MIX/CW at ca. 400 m; if this data point is excluded, the range for MIX water becomes 12.71–15.66°C. The CTD and HYCOM T300 values showed a positive linear relationship (Pearson's r = 0.897), but the HYCOM predictions underestimated the observed temperatures by 1.36°C on average (range: −2.5 to 4.5°C; Supporting Information Fig. S1).
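The CTD versus HYCOM comparison rests on two statistics: Pearson's correlation coefficient and the mean observed-minus-modeled difference, whose sign indicates under- or overestimation. A minimal Python sketch follows, using made-up paired values rather than the study data; the function names are hypothetical.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson's correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bias_summary(observed, modeled):
    """Mean, min, and max of observed - modeled; a positive mean means the model underestimates."""
    diffs = [o - m for o, m in zip(observed, modeled)]
    return fmean(diffs), min(diffs), max(diffs)


ctd_t300 = [12.0, 14.5, 17.0, 19.5]    # hypothetical CTD values (degC)
hycom_t300 = [11.0, 13.0, 16.0, 17.5]  # hypothetical HYCOM values (degC)
print(round(pearson_r(ctd_t300, hycom_t300), 3))
print(bias_summary(ctd_t300, hycom_t300))  # (1.375, 1.0, 2.0)
```

A high correlation and a nonzero mean bias can coexist, as in the reported results: the model tracks the spatial pattern of T300 while running systematically cool.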

Microbial classifications

Random forest analysis of the microbial samples showed that distinct communities can be used to distinguish LCOW and CW at each depth. Significant differences between sampled stations were largely attributable to variation in the relative abundance of microbial taxa, and only partially to the presence of taxa unique to either LCOW or CW. For each training data set, individual taxa that distinguished water masses and depths were identified. These indicator taxa were ranked by how much their removal affected model accuracy. For example, in DP02, surface CW contained a higher relative abundance of many photosynthetic taxa than LCOW did, including some Stramenopiles (diatoms), Synechococcus (Cyanobacteria), and SAR86 Gammaproteobacteria (labeled as Candidatus Portiera in the Greengenes database), as well as other taxa in Bacteroidetes, Verrucomicrobia, and Actinobacteria. The random forest model used the higher abundance of these taxa in surface CW to distinguish CW from LCOW (Fig. 5 and Supporting Information Table S1). Although these indicator taxa were relatively more abundant in CW, other taxa from some of the same groups were abundant in LCOW. Within depth zones, taxonomic differences were often due to shifts among closely related taxa rather than the broad taxonomic shifts observed across depths. Such shifts among closely related taxa may indicate shifts in microbial ecotypes related to the environmental conditions (e.g., optimal irradiance) in each water mass.

Fig. 5. The relative abundance of the 30 most abundant taxa in the training samples for the random forest model for (a) DP01, (b) DP02, (c) DP03, and (d) DP04. Sample groups are separated into depth zones based on collection depth, representing the ocean surface (surf), the Chl a maximum in the epipelagic (Epi), and the oxygen minimum zone in the mesopelagic (Meso). Microbial communities are classified to genus or the lowest classification possible based on the GreenGenes database.

A total of 58 CTD casts (out of 64) yielded 153 discrete samples across the three depths (DP01: N = 12, DP02: N = 29, DP03: N = 67, DP04: N = 45) that were used for analysis. Of these 58 CTDs, 43 were classified as CW, eight as LCOW, and seven as MIX.

In DP01, all samples were classified as primarily CW, but two stations (DP01-CTD-006 and DP01-CTD-007) appeared to contain some mixing with LCOW. No clear LCOW sample was present in DP01. In contrast, DP02 CTDs indicated both LCOW and CW. The prominent LCOW feature during DP02 intruded far into the northern GoM, and the cruise track was designed to sample a gradient from LCOW to CW across this feature. A total of 19 sampled depths showed a single water mass classification, and seven samples indicated mostly one water mass; six of these seven were from the mesopelagic zone. Three samples were mixtures of LCOW and CW: DP02-CTD-16-MESO, DP02-CTD-24-EPI, and DP02-CTD-24-MESO. Some stations in DP02 exhibited a shift in water mass type along the depth gradient. For example, DP02-CTD-22, which appeared near the edge of the LCOW feature in satellite imagery, contained CW at the surface, LCOW in the epipelagic, and mostly LCOW in the mesopelagic. DP02-CTD-16 also showed a depth-related shift, with LCOW at the surface, CW in the epipelagic, and mixed water in the mesopelagic.

In DP03, both LCOW and CW were sampled, though CW comprised the majority of samples during this cruise. LCOW was detected at casts DP03-CTD-44 through DP03-CTD-47; however, at DP03-CTD-45 and DP03-CTD-47, only the surface and epipelagic samples were classified as LCOW. Mixed LCOW and CW were detected in DP03-CTD-36-EPI, DP03-CTD-48-EPI, and DP03-CTD-51-EPI.

HYCOM classifications

From 64 CTD casts over all DEEPEND cruises, the HYCOM classification yielded 11 LCOW, eight MIX, and 45 CW classifications (Table 1; Fig. 4). During cruises DP01 and DP03, the HYCOM scheme did not identify LCOW water at the sampled locations. However, several stations (DP01 [N = 2], DP03 [N = 3]) bordered LCOWs and met the MIX criteria. The remaining stations (DP01 [N = 6], DP03 [N = 20]) were identified as CW. In DP02, the Loop Current featured prominently within our sample grid, so eight stations were classed as LCOW, two as MIX, and the remainder (N = 8) as CW. In DP04, two stations were within an aging Loop Current eddy in the north-central GoM and were classed as LCOW. The remaining DP04 stations comprised one MIX and 11 CW.

When applied to the entire study domain spanning cruises DP01–DP04, the HYCOM scheme classified 21.4% of the domain as LCOW, 13.0% as MIX, and 29.4% as CW. Notably, only 1.9% of the domain could not be classified (UNK). About one-third (34.3%) lay outside the deep-pelagic zone and was identified as shelf water (i.e., < 300 m).
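Domain-wide fractions like these are shares of the classified grid cells. As a consistency check, the five reported fractions (21.4 + 13.0 + 29.4 + 1.9 + 34.3) sum to 100%. The text does not say whether the cells were area-weighted. The sketch below therefore supports both plain cell counts and a cos(latitude) weighting, which is the first-order area correction for a regular latitude-longitude grid. The function name and the four-cell domain are hypothetical.

```python
import math
from collections import Counter


def class_fractions(cells, area_weighted=True):
    """Percent of the domain in each water type.

    cells: iterable of (latitude_deg, label) pairs, one per grid cell.
    area_weighted=True weights each cell by cos(latitude), the first-order
    area factor for a regular latitude-longitude grid; False counts cells equally.
    """
    weights = Counter()
    for lat, label in cells:
        weights[label] += math.cos(math.radians(lat)) if area_weighted else 1.0
    total = sum(weights.values())
    return {label: 100.0 * w / total for label, w in weights.items()}


# Hypothetical four-cell domain at one latitude.
cells = [(26.0, "LCOW"), (26.0, "CW"), (26.0, "CW"), (26.0, "SHELF")]
print(class_fractions(cells, area_weighted=False))
# {'LCOW': 25.0, 'CW': 50.0, 'SHELF': 25.0}
```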

Comparison between methods

The initial HYCOM classifications showed 60% agreement with the TS/TD classifications and 54% with the microbial classifications overall. Following calibration, the maximum overall agreement with the TS/TD classifications increased to 77% (44 of 57 stations) and agreement with the microbial classifications increased to 79% (46 of 58 stations) (Table 2, see also Supporting Information Table S2 for a detailed comparison between methods). In 2015, the calibrated HYCOM classifications agreed with 65% of the TS/TD classifications (15 of 23 stations) and 75% of microbial classifications (15 of 20 stations). In 2016, agreement was higher; the calibrated HYCOM classifications agreed with 88% of the TS/TD classifications (30 of 34 stations) and 82% of microbial classifications (31 of 38 stations).
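The agreement rates above are simple proportions: the share of stations where two schemes assign the same class. The sketch below computes that proportion either from station counts or from paired labels. Its helper names are hypothetical. It reproduces each rounded rate quoted in the paragraph from the stated counts.

```python
def percent_agreement(agreed, total):
    """Percent of stations where two classification schemes assign the same class."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * agreed / total


def agreement_from_labels(pairs):
    """pairs: iterable of (hycom_label, empirical_label), one per station."""
    pairs = list(pairs)
    agreed = sum(1 for hycom, empirical in pairs if hycom == empirical)
    return percent_agreement(agreed, len(pairs))


# Station counts quoted in the text for the calibrated classifications.
reported = {
    "TS/TD overall": (44, 57),
    "microbial overall": (46, 58),
    "TS/TD 2015": (15, 23),
    "microbial 2015": (15, 20),
    "TS/TD 2016": (30, 34),
    "microbial 2016": (31, 38),
}
for name, (agreed, total) in reported.items():
    print(f"{name}: {percent_agreement(agreed, total):.0f}%")
# prints 77%, 79%, 65%, 75%, 88%, and 82% for the six entries, in order
```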

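Table 2. Agreement between the HYCOM, TS/TD, and microbial classification schemes. Each cell gives the number of stations in an empirical class that were assigned to each HYCOM class, with the percentage of that empirical class in parentheses.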
Empirical class HYCOM: LCOW HYCOM: MIX HYCOM: CW
Microbial: LCOW 6 (75%) 2 (25%)
Microbial: MIX 1 (11%) 2 (22%) 6 (66%)
Microbial: CW 3 (6%) 48 (94%)
TS/TD: LCOW 9 (69%) 4 (31%)
TS/TD: MIX 2 (33%) 4 (67%)
TS/TD: CW 2 (5%) 36 (95%)
  • Cells where the empirical class and the HYCOM class match are positive agreements between methods.
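The percentages in Table 2 are row percentages. Each one is the share of stations in one empirical class that HYCOM assigned to a given class. For example, 9 of 13 TS/TD LCOW stations (69%) were also classed as LCOW by HYCOM. Below is a minimal sketch of building such a cross-tabulation from per-station label pairs. The function and variable names are illustrative, and only the TS/TD LCOW and CW rows are rebuilt as input.

```python
from collections import Counter

CLASSES = ("LCOW", "MIX", "CW")


def confusion_table(pairs):
    """Cross-tabulate (empirical_label, hycom_label) pairs, one per station.

    Returns {empirical: {hycom: (count, row_percent)}}; row_percent is the share of
    stations in that empirical class that HYCOM assigned to each class, so the
    diagonal (empirical == hycom) gives per-class agreement.
    """
    counts = Counter(pairs)
    table = {}
    for emp in CLASSES:
        row_total = sum(counts[(emp, h)] for h in CLASSES)
        table[emp] = {}
        for h in CLASSES:
            n = counts[(emp, h)]
            table[emp][h] = (n, 100.0 * n / row_total if row_total else 0.0)
    return table


# TS/TD LCOW and CW rows of Table 2, expanded back into station pairs.
pairs = ([("LCOW", "LCOW")] * 9 + [("LCOW", "MIX")] * 4
         + [("CW", "MIX")] * 2 + [("CW", "CW")] * 36)
table = confusion_table(pairs)
for emp in ("LCOW", "CW"):
    print(emp, {h: f"{n} ({pct:.0f}%)" for h, (n, pct) in table[emp].items() if n})
# LCOW {'LCOW': '9 (69%)', 'MIX': '4 (31%)'}
# CW {'MIX': '2 (5%)', 'CW': '36 (95%)'}
```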

Overall, 13 of the 57 TS/TD classifications disagreed with the HYCOM classifications. Of these, the HYCOM system classified two stations as MIX where the TS/TD data classed them as CW, four as MIX rather than LCOW, five as CW rather than MIX, and two as LCOW rather than MIX. Similar results were observed for the microbial data, where 12 stations disagreed with the HYCOM classifications. Of these, the HYCOM system classified four stations as MIX where the microbial data classed them as CW, two as MIX rather than LCOW, and six as CW rather than MIX. Disagreements never occurred between the LCOW and CW classes using either the microbial or TS/TD classifications.

The microbial and TS/TD classifications agreed in 92% of cases (48 of 52 stations), with the microbial classifications more likely than the TS/TD plots to class a sample as MIX. The microbial data classed two stations as MIX where the TS/TD data classed them as LCOW, and two stations as MIX where the TS/TD data classed them as CW. In the former disagreements, the microbial and HYCOM classifications agreed; in the latter, the TS/TD and HYCOM classifications agreed.
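When the two empirical schemes disagree, the comparison above asks which of them HYCOM matches. A small per-station helper makes that three-way check explicit. The helper name is hypothetical, and labels are plain strings. The two example calls mirror the two kinds of disagreement described in the paragraph.

```python
def arbitrate(hycom, tstd, microbial):
    """Describe how the HYCOM class relates to the two empirical classes at one station."""
    if tstd == microbial:
        return "empirical methods agree"
    if hycom == tstd:
        return "HYCOM matches TS/TD"
    if hycom == microbial:
        return "HYCOM matches microbial"
    return "all three differ"


print(arbitrate(hycom="MIX", tstd="LCOW", microbial="MIX"))  # HYCOM matches microbial
print(arbitrate(hycom="CW", tstd="CW", microbial="MIX"))     # HYCOM matches TS/TD
```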

Discussion

Understanding how environmental heterogeneity influences the pelagic fauna is a complex question, requiring accessible methods that can reliably identify and discriminate between physical and chemical water features of biological relevance at appropriate scales. Most prior studies that aimed to classify pelagic water types were limited to single methods such as temperature at depth (e.g., Herring 2010), microbial community structure (e.g., Djurhuus et al. 2017), or satellite measurements (Lindo-Atichati et al. 2012). The classification system we present, however, allows the identification and discrimination of LCOW, CW, and MIX water types in the GoM using only publicly available HYCOM simulation data, without the need for field sampling, and has been validated using in situ physical, chemical, and microbial data collected during 2015–2016. The HYCOM classification system agreed with the empirical data classifications in 77% (TS/TD profiles) and 79% (microbial community data) of cases, with the strongest agreement observed for the CW and LCOW classes. The TS/TD profiles from the present study also showed good agreement with water mass records taken throughout the GoM (Rivas et al. 2005; Herring 2010), suggesting that the proposed classification system should be generally applicable to the wider GoM, beyond the limits of the surveyed locations. Taken together, these findings indicate that the HYCOM classification system detailed here may be a useful tool to identify biologically meaningful water types in the GoM, particularly where in situ or remote-sensing data are either unavailable or cannot provide adequate temporal, spatial, or depth resolution for the study being considered.

We anticipate that this tool will be particularly relevant in addressing questions concerning how such mesoscale features may influence the distributions and biodiversity patterns of pelagic fauna, and also the exposure of their offspring to pollutants, from the surface to mesopelagic depths. By coupling the classification system to the HYCOM model, it will also be possible to address how mesoscale features might influence processes such as the dispersal and connectivity of planktonic (including microbial plankton) and low-mobility fauna, which may play roles in structuring pelagic food webs (Mohan et al. 2017). Our microbial plankton community data from GoM surface waters indicate clear differences between LCOW and CW, with LCOW dominated by small photosynthetic cyanobacteria in the genus Prochlorococcus, whereas CW communities contained a much higher relative abundance of larger phytoplankton such as diatoms. In oligotrophic waters, dominance of bacteria (heterotrophic and small phototrophic taxa) over eukaryotic phytoplankton can lead to a food web that transports very little carbon to higher trophic levels (Cho and Azam 1990; Legendre and Rassoulzadegan 1995; Gasol et al. 1997; Guidi et al. 2016). Conversely, microbial plankton communities dominated by groups such as diatoms support enhanced production of mesozooplankton (e.g., copepods and euphausiids), which in turn support enhanced production of intermediate trophic levels (epipelagic and mesopelagic “baitfishes”), and so on, up to top predators (cetaceans, seabirds, and commercially important fishes; Biggs and Ressler 2001). Additionally, numerous economically important species in the GoM produce planktonic larvae in the spring and summer, including yellowfin, blackfin, and Atlantic bluefin tunas (Hazen et al. 2016; Cornic et al. 2018).

Understanding the spawning behaviors of these and other pelagic species in relation to mesoscale features that may concentrate zooplankton prey and serve as nursery areas (Ditty et al. 2004) can help predict the timing and habitats of spawning and the subsequent size and distribution of cohorts. Such data are important, for example, to inform population models that are used for stock assessments and to plan the recovery of populations impacted by natural and human-mediated disasters, and accessible methods for classifying pelagic habitats will likely be of value to such efforts. Furthermore, as the use of habitat models in marine conservation and fisheries management expands, combining adult telemetry data with fisheries observer bycatch data and larval samples collected in defined pelagic habitats can provide a more holistic picture of GoM habitat use by pelagic fauna that inhabit epipelagic and/or mesopelagic depths (Hazen et al. 2016).

Overall, the majority of sampled stations were classed as either CW or LCOW, and the agreement rate between methods was high for these classes (see Table 2). All disagreements between the three classification systems involved MIX water, highlighting the greater difficulty of accurately predicting the behavior of LCOW boundaries, where small, ephemeral variations in physical and chemical properties occur as the parent LCOW mixes with the surrounding CW. Such processes likely occur at finer spatial and temporal scales than the present HYCOM system can simulate (i.e., < 3 h and < 4 × 4 km). How problematic these disagreements are will depend on the requirements of a given study. However, at the spatio-temporal scales considered here, our findings suggest that the more stable CW and LCOW water masses can be consistently identified with a high degree of confidence, provided caution is exercised near boundary regions.
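The ~4 × 4 km scale follows from the 1/25° model resolution. The worked arithmetic below makes two assumptions. It treats the Earth as a sphere with a mean radius of 6371 km. It also treats the grid as regular in latitude and longitude, which may not match HYCOM's native grid exactly. The latitude 25°N is used as a representative value for the GoM.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def grid_spacing_km(resolution_deg, latitude_deg):
    """Approximate (north-south, east-west) size in km of one cell of a regular
    latitude-longitude grid on a spherical Earth."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0  # ~111.19 km per degree of latitude
    dy = resolution_deg * km_per_deg
    dx = dy * math.cos(math.radians(latitude_deg))  # meridians converge poleward
    return dy, dx


dy, dx = grid_spacing_km(1 / 25, 25.0)
print(f"1/25 deg at 25N: {dy:.2f} km N-S x {dx:.2f} km E-W")
# 1/25 deg at 25N: 4.45 km N-S x 4.03 km E-W
```

Features smaller than a few grid cells across cannot be resolved at this spacing, which is consistent with the boundary-mixing argument above.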

The TS/TD and microbial classifications agreed in 92.3% of cases (48 of 52 stations). All four differing classifications were instances where TS/TD classifications indicated LCOW or CW whereas microbial classifications showed MIX water, but this discrepancy may simply reflect the different spatio-temporal scales inherent to each method. Of the two methods, the microbial classifications are likely to be the more sensitive to fine-scale environmental variability through the water column (Whitman et al. 1998; Fuhrman 2009; Sunagawa et al. 2015), whereas the TS/TD profiles produce a more integrative summary of the entire water column. The strong agreement between these metrics suggests that microbial community data are an accurate biological indicator of water mass structure within pelagic ecosystems from the surface to mesopelagic depths.

A classification system of this type is useful for simplifying some of the environmental variability associated with LCOWs in the GoM, but it is important to note that the classes presented here should be considered representative of broadly similar water types across the GoM. For example, the physical and chemical properties of MIX water may depend on factors such as the age and size of the parent LCOW and prevailing conditions within the GoM. An older, low-energy LCOW (present during DP04, Fig. 4D) propagating to the western GoM may have a larger and more diffuse boundary of MIX water around its core than a younger, high-energy, and more cohesive LCOW (present during DP03, Fig. 4C) in the eastern GoM, which could have quite different effects on the rates of biotic processes. Similarly, the shape and vertical extent of an LCOW are not uniform across its diameter, which may give rise to complex vertical structuring, as was observed particularly in the TS/TD profiles classed as MIX. The microbial community analysis showed similar complexity within MIX waters, with irregular shifts in composition occurring across sampling depths. In fact, all of the microbial community disagreements with the HYCOM classification scheme were due to shifts across depths, demonstrating the sensitivity of the microbial classification in particular to fine-scale variability within the water column. While the proposed HYCOM classification system does not explicitly consider the fine-scale vertical structure of the water column, the use of both T300 and SSH was intended to mitigate some of this variability (at least to 300 m depth), since both metrics should covary if the water mass structure is the same at the surface and at depth. Locations where this covariance does not occur (e.g., some locations classed as UNK) could then indicate discrepancies caused by depth effects and, in turn, the vertical boundaries in MIX waters.
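The covariance logic described here can be sketched as a per-cell rule. SSH and T300 are each mapped onto the water-type scale, and cells where the two metrics disagree fall into UNK. Shelf cells (< 300 m) are excluded. This is a minimal sketch that rests on several assumptions. The threshold values are hypothetical placeholders, because the calibrated cutoffs are not given in this section. The 300 m test is assumed to use bottom depth. Treating MIX as "both metrics intermediate" is an assumption about the MIX criteria, not the published rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """HYPOTHETICAL per-metric cutoffs (placeholders, not the calibrated values).

    For each metric, values >= *_lcow_min read as LCOW-like, values <= *_cw_max
    as CW-like, and values in between as MIX-like.
    """
    ssh_lcow_min: float   # sea-surface height, m
    ssh_cw_max: float
    t300_lcow_min: float  # temperature at 300 m, degrees C
    t300_cw_max: float


def band(value, lcow_min, cw_max):
    """Map one metric onto the LCOW / MIX / CW scale."""
    if value >= lcow_min:
        return "LCOW"
    if value <= cw_max:
        return "CW"
    return "MIX"


def classify_cell(ssh, t300, bottom_depth_m, th):
    """Assign a water type to one model grid cell."""
    if bottom_depth_m < 300:  # outside the deep-pelagic zone
        return "SHELF"
    surface = band(ssh, th.ssh_lcow_min, th.ssh_cw_max)
    at_depth = band(t300, th.t300_lcow_min, th.t300_cw_max)
    # SSH and T300 should covary when surface and subsurface structure match;
    # when they point to different water types, leave the cell unclassified.
    return surface if surface == at_depth else "UNK"


th = Thresholds(ssh_lcow_min=0.25, ssh_cw_max=0.05, t300_lcow_min=15.0, t300_cw_max=12.0)
print(classify_cell(0.40, 16.2, 2500, th))  # LCOW: both metrics LCOW-like
print(classify_cell(0.40, 11.0, 2500, th))  # UNK: surface and depth disagree
```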

Although CW has been classified as a single water mass, it is more heterogeneous in terms of its physical properties than LCOW, particularly in the upper waters (< 200 m) of the GoM, which are subject to marked seasonal changes in temperature, salinity, and productivity (e.g., Muller-Karger et al. 2015). Upwelling cyclonic rings (CRs) also commonly form within CW regions, on the boundaries of Loop Current eddies, and share the water mass characteristics of CW (Rudnick et al. 2015), but are typically cooler at depth than the surrounding CW and are relatively small (tens of kilometers) and ephemeral (days to weeks) compared to LCOWs. From a biological standpoint, CRs have been shown to support higher rates of productivity than surrounding waters (Shulzitski et al. 2015), and may work in conjunction with LCOW to physically draw highly productive, low-salinity coastal water hundreds of kilometers offshore (Biggs and Muller-Karger 1994). While a lack of sufficient data meant it was not possible to identify and classify CRs in the present classification scheme, alternative methods for the identification of CRs and the extent of the coastal plume, for example, using 18S environmental DNA that captures microeukaryotes, will be considered in the future.

Last, the HYCOM classification scheme presented here was derived from SSH and T300 metrics estimated from the HYCOM simulations, and while HYCOM generally performs well in surface waters, few studies have tested HYCOM simulation accuracy at depth. When comparing HYCOM estimates to in situ measurements obtained via the CTDs in this study, we found that HYCOM underestimated T300 by an average of 1.36°C overall (Supporting Information Fig. S1). As the intent was to produce a classification system that can be implemented using unaltered public HYCOM simulation data, the T300 values we used as delimiters contain this inherent HYCOM error. It follows that as HYCOM simulations evolve and mature, the cutoff values we used may need adjustment if future HYCOM simulations more accurately forecast in situ temperatures in deep water. Nonetheless, incorporating deep-water characteristics into the classification system means it can be usefully applied to investigations into the distributions of both surface- and deep-living fauna.
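The T300 offset described here is a mean bias, and its effect on fixed cutoffs can be worked through directly. Suppose a cutoff c was tuned on a model with mean bias b_old, where the bias is negative when the model runs cold. The equivalent cutoff for a model with bias b_new is then c - b_old + b_new. Only the idea of a mean HYCOM-minus-CTD difference (about -1.36 °C) comes from the text. The helper names, the paired temperatures, and the 14.0 °C cutoff below are all hypothetical.

```python
from statistics import mean


def mean_bias(model, observed):
    """Mean model-minus-observation difference for paired values.

    A negative result means the model runs cold relative to the observations.
    """
    if len(model) != len(observed) or not model:
        raise ValueError("need equal-length, non-empty paired samples")
    return mean(m - o for m, o in zip(model, observed))


def adjust_cutoff(cutoff_c, old_bias_c, new_bias_c=0.0):
    """Translate a T300 cutoff tuned on a model with bias old_bias_c to a model
    with bias new_bias_c (0.0 means unbiased against in situ data).

    A cutoff c in old-model space corresponds to an in situ temperature
    c - old_bias_c, which the new model reports as c - old_bias_c + new_bias_c.
    """
    return cutoff_c - old_bias_c + new_bias_c


# Hypothetical paired T300 values (degrees C) at three stations.
hycom = [14.1, 15.0, 12.3]
ctd = [15.5, 16.4, 13.6]
print(f"{mean_bias(hycom, ctd):.2f}")                     # -1.37
print(f"{adjust_cutoff(14.0, old_bias_c=-1.36):.2f}")     # 15.36
```

Under these assumptions, removing a cold bias of 1.36 °C would raise each T300 cutoff by the same amount.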

Comments and recommendations

In a post-Deepwater Horizon oil spill GoM, there is a need to better understand the biological effects of this and future disasters, particularly in the deep-pelagic realm where there are few baseline data against which to assess the impacts of human activities relative to other natural phenomena (St. John et al. 2016). Such data can also help improve dynamic approaches to fisheries management when aiming to conserve and rebuild populations after disturbances. The classification system in the current study takes a first step in this process by developing an automated way to discern the spatial and temporal extents of different water masses in the GoM. Using two independent, empirical data sets for validation, we show that our classification system can reliably categorize LCOW and CW and is unaffected by seasonal variations in surface waters. Going forward, it will be important to test our classification system for biological significance. This could be done, for example, by collecting fauna within discrete pelagic habitats as defined by our system, and then testing for correlation between water type and community composition. While HYCOM ocean climate data were used in this study, we suggest that our system could be used with any contemporary or future oceanographic forecasting model that is able to realistically emulate in situ ocean conditions. Finally, we believe that the system will be a complementary tool for biological oceanographers and resource managers interested in better understanding the effects of major mesoscale features on the pelagic biota of the GoM across varying spatial and temporal scales.

Acknowledgments

This research was made possible by a grant from The Gulf of Mexico Research Initiative. Data are publicly available through the Gulf of Mexico Research Initiative Information & Data Cooperative (GRIIDC) at https://data.gulfresearchinitiative.org (doi:10.7266/N7QR4VK0; doi:10.7266/N73R0QSX; doi:10.7266/N7PV6HS1; doi:10.7266/N7R49P43; doi:10.7266/N7MC8XDC). We acknowledge the use of imagery from the NASA Worldview application (https://worldview.earthdata.nasa.gov/) operated by the NASA/Goddard Space Flight Center Earth Science Data and Information System (ESDIS) project. We also thank Nina Pruzinsky for her insight and contributions to the final version of this paper.