Influence of commercial DNA extraction kit choice on prokaryotic community metrics in marine sediment

Commercial DNA extraction kits are widely used for cultivation‐free surveys of marine sediment. However, the consequences of popular extraction‐kit choices for sequence‐based biological inferences about marine sedimentary communities have not previously been exhaustively assessed. To address this issue, we extracted DNA from multiple paired subsamples of marine sediment using two popular commercial extraction kits (MO BIO Laboratories PowerSoil® DNA isolation kit and MP Biomedicals FastDNA™ Spin Kit for Soil). We report comparisons of (1) total DNA yield, (2) extract purity, (3) gene‐targeted quantification, and (4) post‐sequencing ecological inferences in near‐seafloor (< 1 meter below seafloor [mbsf]) and subsurface (> 1 mbsf) marine sediment. In near‐seafloor sediment, the MP Biomedicals FastDNA™ Spin Kit for Soil exhibits higher extraction yields, higher 16S rRNA gene loads, higher taxonomic diversity, and lower contaminant loads. In subseafloor sediment, both kits yield similar values for all of these parameters. The MO BIO Laboratories PowerSoil® DNA isolation kit generally co‐extracts less protein with the DNA in both near‐seafloor and subseafloor sediment. For samples from all depths, both kits exhibit similar depth‐dependent community richness patterns, taxonomic composition, and ordination‐based similarity trends. We conclude that, despite kit‐specific differences in extract yields, purity and reagent contaminant loads, ecological inferences based on next‐generation sequencing of DNA extracted using these popular commercial kits are robustly comparable, particularly for subseafloor sediment samples.

Microbes in marine sediment comprise a sizable fraction of Earth's biosphere (D'Hondt et al. 2004; Kallmeyer et al. 2012). Community-wide adaptations to starvation (Jørgensen and D'Hondt 2006), extreme energy limitation (Hoehler and Jørgensen 2013), and slow biomass turnover rates (Lomstein et al. 2012) make cultivation-based ecological interrogation of this habitat prohibitively difficult (Cragg et al. 1990; Parkes et al. 2014; Jørgensen and Marshall 2016). In contrast, the cost of DNA sequencing, an alternative to cultivation-based surveys, is at an all-time low (Muir et al. 2016) and fuels the "next-generation sequencing (NGS)" revolution (Park and Kim 2016). NGS has contributed significantly to understanding of subseafloor sedimentary life (Biddle et al. 2008, 2018; Jørgensen et al. 2012; Orsi et al. 2013, 2018; Spang et al. 2015; Starnawski et al. 2017; Karst et al. 2018) and is now a standard tool for ecological studies of this and other marine habitats (Orcutt et al. 2011).
The first step in NGS is nucleic-acid extraction from the sample matrix (Lombard et al. 2011). Given the remarkable complexity of marine sediment, a universally optimal DNA extraction method is unrealistic. Therefore, numerous protocols exist that aim to optimize specific aspects of the extraction process (Lipp et al. 2008; Kallmeyer and Smith 2009; Lloyd et al. 2010; Morono et al. 2013, 2014; Lever et al. 2015). Different methods of DNA extraction are known to differently affect DNA yield and quality (Mahmoudi et al. 2011; Knauth et al. 2013) and these consequences must be considered in cross-study comparisons (Felczykowska et al. 2015). Standardization of nucleic-acid extraction technique is a known challenge for surveying microbial biogeography across subseafloor habitats (Orcutt et al. 2011). Although formal efforts have been made to bring this issue to light (Orcutt et al. 2013), no consensus has been reached.
Commercial DNA-extraction kits provide standardized, organic-solvent-free alternatives to the laborious tasks of in-house reagent preparation and protocol optimization (Tan and Yiap 2009). They are increasingly popular for studies of subseafloor habitats. Drawbacks of commercial kits include the unknown composition of proprietary reagents, protocol inflexibility to address sample-specific needs (Lever et al. 2015), and the pervasiveness of kit-specific contaminants (Salter et al. 2014). Surprisingly, given their popularity, a complete assessment of the effect of marine sediment extraction kit choice on NGS-based results, from physical characterization of extracts to post-sequencing ecological inferences, is lacking. To begin to address this issue, we extracted DNA from marine sediment using two commercial extraction kits (MO BIO Laboratories PowerSoil® DNA isolation kit and MP Biomedicals FastDNA™ Spin Kit for Soil) that are widely used for studies of this habitat (Francis et al. 2005; Schippers et al. 2005; Edgcomb et al. 2011; Jørgensen et al. 2012; Spang et al. 2015; Walsh et al. 2016). To assess the consequences of kit choice, we compare (1) total DNA yields, (2) extract purities, (3) gene-targeted quantification, and (4) post-sequencing ecological inferences.

Sample collection
All samples are from Site SR1703-10-17PC/TC, cored by R/V Sally Ride expedition SR1703 to the USA-Mexico Border Lands in the Eastern Pacific Ocean in February 2017 (mobilization and demobilization: San Diego, Scripps Institution of Oceanography). Site SR1703-10-17PC/TC is located at latitude 33°00.959′ N, longitude 117°52.114′ W, in a water depth of 915 m (Fig. 1A). The samples span sediment depths from 0 cm to 513 cm below seafloor. The samples were taken immediately after core recovery, stored at −80°C, and transported to shore for DNA extraction, 16S rRNA gene quantification and sequencing. Subsamples of each sample were separately processed with the MO BIO PowerSoil® DNA Isolation Kit (hereafter the MB kit) and the MP Biomedicals FastDNA™ Spin Kit for Soil (hereafter the MP kit), as detailed below.

Total organic carbon and geochemical description
Briefly, we measured total organic carbon (TOC) in sediment and pore water by pyrolysis, following the procedures of Verardo et al. (1990). Ammonium concentrations were measured fluorometrically, following the procedures of Holmes et al. (1999). We measured oxygen and nitrate using optodes and ion chromatography with UV-absorbance detection, respectively, as detailed elsewhere (D'Hondt et al. 2015).

Sediment characterization
A full physical and geochemical description of the sediment from a nearby site (Tanner Basin, ODP Site 1014), down to hundreds of meters below seafloor (mbsf), is found elsewhere (Lyle et al. 1997). Briefly, sediment from this area is siliciclastic clay containing calcareous nannofossils and foraminifers. At our coring site, oxygen was not detected, nitrate decreased to below detection within a few centimeters below seafloor (cmbsf), ammonium concentrations increased with sediment depth, and TOC (%wt) remained within a 2-5% range (Fig. 1B-D), matching the observed range of TOC values at ODP Site 1014 at equivalent depths (4-5% for 30-490 cmbsf). Thus, we examined a high-biomass anoxic habitat where organic-matter oxidation is likely coupled to sulfate reduction. A constant sedimentation rate of 11.5 cm kyr⁻¹ throughout the late Quaternary (Lyle et al. 1997) dates our deepest sample (513 cmbsf) to 44.6 kyr.

DNA extraction: MO BIO Laboratories PowerSoil® DNA Isolation Kit
Briefly, as recommended by the manufacturer, 0.25 g of sediment sample was vortexed horizontally in a PowerBead Tube (2 mL volume) using a flat-bed vortex adapter pad (MO BIO Catalog# 1300-V1-24) at maximum speed for 10 min. All steps involving proprietary reagents and spin filters were performed as outlined in the manufacturer's protocol. Final extracts were eluted in 100 µL of sterile PCR-grade DNA-free water and stored at −80°C. Extraction blanks were extracted in parallel.

DNA extraction: MP Biomedicals FastDNA™ Spin Kit for Soil
Chromosomal DNA extractions were performed in triplicate on all samples using the MP Biomedicals FastDNA™ Spin Kit for Soil (MP Biomedicals, Santa Ana, California, U.S.A.). Briefly, as recommended for marine sediment by the manufacturer, 0.5 g of sediment sample was automatically homogenized by bead beating in Lysing Matrix E tubes (2 mL volume) using a FastPrep-96™ homogenizer at speed 5.5 twice for 40 s (with a 2 min rest on ice). All steps involving proprietary reagents and spin filters were performed as outlined in the manufacturer's protocol. Final extracts were eluted in 100 µL of sterile PCR-grade DNA-free water and stored at −80°C. Extraction blanks were extracted in parallel.

Extraction yield and purity
DNA extraction yields were quantified using a Qubit® 2.0 fluorometer with BR (broad range) dsDNA assay reagents (Invitrogen, Life Technologies). Both A260/A280 and A260/A230 absorbance ratios were measured using a Nanodrop 1000 spectrophotometer (Thermo Scientific).

Sequence analyses
All sequences were processed with mothur v.1.34.4 (Schloss et al. 2009) following the mothur Illumina MiSeq Standard Operating Procedure (SOP) (Kozich et al. 2013). Briefly, forward and reverse reads were merged into 1.7 million contigs. Only contigs meeting the following criteria were retained: maximum homopolymer length of 6 bp, minimum length of 288 bp, maximum length of 294 bp, and zero ambiguities. Only 1.03 million contigs met these criteria. All sequences from extraction blanks were removed from all other groups, thus eliminating blank extractions completely from further analyses. The remaining 688,196 sequences were aligned to the mothur-recreated SILVA SEED v119 database (Yarza et al. 2010), trimmed to the V4 hypervariable alignment region, and subsequently preclustered at 1% dissimilarity using the pre.cluster command (diffs = 3, for 300 bp amplicons, as suggested in the mothur SOP). Spurious sequence generation was mitigated by ranking sequences by abundance and merging rare sequences with more abundant ones if they differed by no more than three base pairs, as outlined elsewhere (Kozich et al. 2013). Chimera screening and removal were performed using the de novo mode of UCHIME (Edgar et al. 2011). Using the average neighbor method, a distance matrix was generated and 642,732 sequences were clustered into operational taxonomic units (OTUs) at a 3% dissimilarity cutoff. Taxonomic classification of OTUs was done with mothur using the SILVA v119 database. A summary of group-specific sequence numbers following quality control commands is provided in the Supporting Information Table S1.
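The contig screening criteria listed above can be illustrated with a minimal Python sketch. This is not mothur's implementation; the thresholds come from the text, while the function and constant names are hypothetical and chosen only for illustration.

```python
import re

# Thresholds taken from the text; all names below are hypothetical.
MAX_HOMOPOLYMER = 6   # longest allowed run of a single base (bp)
MIN_LENGTH = 288      # minimum contig length (bp)
MAX_LENGTH = 294      # maximum contig length (bp)
MAX_AMBIGUOUS = 0     # allowed number of non-ACGT characters


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated character."""
    longest = 0
    for match in re.finditer(r"(.)\1*", seq):
        longest = max(longest, len(match.group(0)))
    return longest


def passes_screen(seq: str) -> bool:
    """Return True if a contig satisfies all four screening criteria."""
    seq = seq.upper()
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    return (
        MIN_LENGTH <= len(seq) <= MAX_LENGTH
        and ambiguous <= MAX_AMBIGUOUS
        and longest_homopolymer(seq) <= MAX_HOMOPOLYMER
    )
```

Applied to a list of merged contigs, `[s for s in contigs if passes_screen(s)]` would keep only the contigs that meet every criterion.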
Ramírez et al.
DNA extraction kit choice and subseafloor biology

Statistical analyses and visualizations
All community analyses were performed on a subsampled dataset (n = 32,031 sequences per group). Estimations of sample richness (Observed OTU counts, Chao1, and Shannon), principal coordinate analysis (PCoA), using Bray-Curtis distances, and OTU abundance heatmaps were generated in RStudio version 0.98.1091 (Racine 2012).
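For readers unfamiliar with these metrics, the following Python sketch shows how they can be computed from per-sample OTU count vectors. It is an illustration only: the analyses in this study were run in R, the function names are hypothetical, and the bias-corrected form of Chao1 is assumed here because the text does not state which variant was used.

```python
import math
from typing import Sequence


def observed_otus(counts: Sequence[int]) -> int:
    """Number of OTUs with at least one sequence."""
    return sum(1 for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 richness estimate (assumed variant)."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts: Sequence[int]) -> float:
    """Shannon diversity index using the natural logarithm."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a: Sequence[int], b: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity between two count vectors with the same OTU order."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    return numerator / (sum(a) + sum(b))
```

Because all of these metrics are sensitive to sequencing depth, the count vectors passed to them should first be subsampled to a common depth, as was done here with 32,031 sequences per group.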

Pore-water geochemistry and TOC
The sediment was free of dissolved oxygen at all sampled depths. Nitrate was 36 µM at the sediment/water interface and was not detected at 17.5 cmbsf (Fig. 1B). Ammonium generally increased with sediment depth from 2 mM near the sediment/water interface to 4.9 mM at 513 cmbsf (Fig. 1C). It increases to 40 mM at 450 mbsf at the nearby Tanner Basin ODP drill site 1014 (Lyle et al. 1997). TOC generally increased with sediment depth from 2% at 0 cmbsf to 4-5% at 232 cmbsf and 513 cmbsf (Fig. 1D).

Chromosomal DNA yields and extract purity
Extracted DNA quantities per cm³ of wet sediment decrease as a function of sediment depth (Fig. 2A). Significantly higher DNA quantities were extracted using the MP kit in shallow (0-8 cmbsf) horizons (p < 0.05). In deeper horizons (> 8 cmbsf), extraction yields are similar for both kits and drop rapidly in magnitude with increasing depth, as expected with lower biomass loads (Fig. 2A). For both kits, extract concentrations from horizons deeper than 32 cmbsf are nearly two orders of magnitude lower than concentrations recovered from the sediment-water interface (0 cmbsf). The MB kit yielded higher concentrations than the MP kit at 32 cmbsf and 513 cmbsf. The 260/280 absorbance ratios are lowest at depths < 10 cmbsf, independently of the extraction kit used (Fig. 2B). In deeper horizons (32-513 cmbsf), mean 260/280 absorbance ratios are 1.8 but exhibit greater variability than at shallower depths. The 260/230 absorbance ratios, a secondary DNA purity measurement, are significantly different for MB and MP extracts (p ≤ 0.05) at all sample depths, with MB 260/230 absorbance ratios consistently above 0.20 and MP ratios consistently below 0.05 (Fig. 2C).

Domain-specific 16S rRNA gene quantification
16S rRNA gene counts for Bacteria and Archaea are highest near the sediment/water interface and decrease as a function of sediment depth (Fig. 3). Count magnitudes for each domain are similar at the sediment-water interface. In deep samples, Archaea comprise an average of 18% of the total prokaryotic 16S rRNA genes, independently of the extraction kit used, and thus the Archaea to Bacteria ratio is lowest. Generally, significantly higher Bacterial and Archaeal gene counts were recovered with the MP extraction kit in the 0-32 cmbsf depth range (p ≤ 0.05); however, at 3 cmbsf, domain-specific counts from both kits are statistically identical. In deeper horizons, significantly higher Bacterial gene counts were recovered with the MB kit (Fig. 3A) but Archaeal gene counts were not consistently kit-dependent (Fig. 3B).

Contaminant sequences
Near-seafloor (0-32 cmbsf) samples processed with the MP kit contain a small fraction of sequences observed in MP extraction blanks (3.0-5.6% of total sequences) (Fig. 4). A larger fraction of the sequences in near-seafloor samples processed with the MB kit are also present in the MB extraction blanks (8.6-30.0% of total sequences). For samples from greater depths (232 cmbsf and 513 cmbsf), sequence overlap between blanks and sediment samples is very high (> 45% of the sediment sequences) and does not depend on the kit (Fig. 4).
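The overlap percentages reported above can be thought of as the share of a sample's sequences that also occur in the matching kit blank. The sketch below illustrates that idea in Python; it assumes exact sequence matching and uses hypothetical function names, and it does not reproduce the mothur commands used in this study.

```python
from typing import List, Sequence


def blank_overlap_percent(sample_reads: Sequence[str], blank_reads: Sequence[str]) -> float:
    """Percentage of sample reads whose sequence also occurs in the blank."""
    if not sample_reads:
        return 0.0
    blank_set = set(blank_reads)
    shared = sum(1 for read in sample_reads if read in blank_set)
    return 100.0 * shared / len(sample_reads)


def subtract_blank(sample_reads: Sequence[str], blank_reads: Sequence[str]) -> List[str]:
    """Drop every sample read whose sequence occurs in the blank."""
    blank_set = set(blank_reads)
    return [read for read in sample_reads if read not in blank_set]
```

Each sample would be compared only against the blank extracted with the same kit, since the contaminants are kit-specific.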

Prokaryotic community composition
Community composition from both MB and MP extracts is nearly identical for each sample (Fig. 5). Shallow samples (0-32 cmbsf) have sequences assigned to both Archaeal and Bacterial classes. Archaea are represented by unclassified Archaea, Thaumarchaeota, and Euryarchaeota. The Proteobacteria, Planctomycetes, Firmicutes, Chloroflexi, and candidate divisions OP8 (Aminicenantes) and JS1 (Atribacteria) represent the Bacteria. Deeper samples are exclusively dominated by Bacterial sequences; specifically, by Proteobacteria, Firmicutes, and Actinobacteria (> 2% relative abundance). Proteobacteria are abundant at all sampled depths. In shallow samples (0-32 cmbsf), Delta- and Gamma-proteobacteria are present (Fig. 5). At sediment depths greater than 32 cmbsf, the Gamma-proteobacteria constitute nearly 50% of the sequences and exclusively represent the Proteobacteria. In shallow samples, the Firmicute class Bacilli is only present in the MB communities, despite being extracted with both kits from sample depths > 32 cmbsf. Last, less abundant taxonomic guilds (< 2% of community) and the relative fraction of unclassified prokaryotic sequences decrease as a function of sediment depth for both extraction kits.

OTU-specific prevalence
Both kits yield generally similar abundances for the most abundant 100 OTUs in the dataset (Fig. 6). Despite this trend, a few exceptions exist, defined by substantially different (one order of magnitude or more) kit-specific OTU abundances within the same depth horizon (Fig. 6, red boxes). For the 10 most abundant OTUs, these differences are most accentuated in the 3 cmbsf and 8 cmbsf horizons, where the MP kit yields much lower normalized abundances for OTUs 1-5 and 8. For the next 90 most abundant OTUs (11-100), across all sampled depth horizons, neither kit consistently reports the higher abundance: within a single sediment depth, the kit that reports the high value differs from one discordant OTU to the next (e.g., at 232 cmbsf; OTUs 12, 73, 82, 86, 89, 90, and 92).
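The "one order of magnitude or more" criterion can be expressed as a threshold on the log10 ratio of kit-specific abundances. The Python sketch below illustrates this for a single depth horizon; the function names and the pseudocount of 1 (added so that zero counts do not produce undefined ratios) are assumptions for illustration, not details taken from this study.

```python
import math
from typing import Dict


def log10_ratio(mb_count: float, mp_count: float, pseudocount: float = 1.0) -> float:
    """log10 of MP/MB abundance; positive values mean MP reports more."""
    return math.log10((mp_count + pseudocount) / (mb_count + pseudocount))


def flag_discordant_otus(
    mb_counts: Dict[str, float],
    mp_counts: Dict[str, float],
    threshold: float = 1.0,
) -> Dict[str, float]:
    """Return OTUs whose kit-specific abundances differ by >= 10**threshold."""
    flagged = {}
    for otu in mb_counts.keys() & mp_counts.keys():
        ratio = log10_ratio(mb_counts[otu], mp_counts[otu])
        if abs(ratio) >= threshold:
            flagged[otu] = ratio
    return flagged
```

The sign of each flagged value indicates which kit reports the higher abundance, which makes it easy to check whether discordant OTUs consistently favor one kit.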

Alpha- and beta-diversity metrics
For near-seafloor sediment (0-32 cmbsf), OTU counts, Chao1, and Shannon diversity estimators are consistently higher for DNA extracted with the MP kit than for DNA extracted with the MB kit. For deeper horizons (232 cmbsf and 513 cmbsf), both extraction kits yield nearly identical diversity estimates (Fig. 7A). PCoA resolves shallow (0-32 cmbsf) samples as a function of extraction kit along axis 1 (47% of variance). Axis 2 (18.9% of variance) separates each shallow group as a function of sediment depth (Fig. 7B). The two kits yield slightly different results (different

Discussion
MP extracts more DNA under near-seafloor conditions, but the same amount as MB under subseafloor conditions
The MP kit extracted more DNA from near-seafloor samples than the MB kit, specifically in the 0-8 cmbsf depth range (Fig. 2A). Similar observations have been made by others (Lever et al. 2015). Physical force alone may explain this observation, since the automated "bead beating" homogenization step of the MP protocol is more vigorous than the homologous horizontal vortexing step used in the MB protocol (see "Materials and procedures" section). Interestingly, for deeper horizons, extraction efficiency appears to be independent of homogenization vigor, since the MB kit reports higher yields at 32 cmbsf and 513 cmbsf. However, previous studies have shown that DNA extraction efficiency depends on both extraction protocol and taxonomic group (Morono et al. 2014). A detailed mechanism to explain our observations is unattainable because the reagent identities and chemical mechanisms of commercial extraction kits are proprietary; however, we speculate that (1) a decrease in biomass load reduces the amount of physical force needed for efficient community-wide lysis and (2) a decrease in diversity lowers inter-taxonomic variation in membrane strength, together eliminating the advantages observed for the MP protocol with near-seafloor sediment.

MP kit co-extracts more protein at all depths
In addition to DNA yield, extract purity is an important consideration for downstream applications (e.g., qPCR, tag sequencing, shotgun metagenomics). The presence of co-extracted proteins and/or phenolic compounds (humic and fulvic acids) in near-seafloor (0-8 cmbsf) extracts from both kits is suggested by 260/280 nm absorbance ratios lower than 1.8 (Fig. 2B). This is a commonly reported issue for organic-rich marine sediment (Lloyd et al. 2010). For both kits, DNA extracts from samples of deep sediment (32-513 cmbsf) were notably cleaner (260/280 nm averages of 1.8) than those from near-seafloor samples. A secondary nucleic acid to protein ratio measure (260/230 nm), generally expected to be equal to or higher than 1.8 for "pure samples," shows that both kits co-extract protein but that the MP kit consistently co-extracts higher protein amounts relative to the MB kit across all sampled depths (Fig. 2C). Thus, despite the higher DNA yields obtained using the MP kit in shallow sediment, the vigorous homogenization in the MP protocol may also result in higher levels of inadvertent proteinaceous co-extraction.

Domain-specific gene quantification: Same depth-dependent trends with both kits
With both extraction kits, counts of Bacterial and Archaeal 16S rRNA genes decrease with increasing sediment depth (Fig. 3). Kit-specific differences are generally larger in magnitude for Bacteria, where kit choice significantly influences the gene quantities reported at all depths except at 3 cmbsf (p < 0.05) (Fig. 3A). In surface sediment, 16S gene count magnitudes for each domain are comparable (1-5 × 10⁸ genes cm⁻³); at depth, however, counts for Archaea (Fig. 3B) decrease relative to those of Bacteria (10⁶ genes cm⁻³ vs. 10⁷ genes cm⁻³, respectively). Previous work has shown that complex DNA-substrate interactions may favor the extraction of Bacterial over Archaeal DNA from marine basalt (Wang and Edwards 2009). Given that both extraction kits in this study yield gene counts for both Domains within the same range in shallow (0-32 cmbsf) samples, the dominance of Bacterial over Archaeal genes in deeper sediment (> 32 cmbsf) is likely not due to extraction bias with either kit. It should be noted that lower Archaea counts at depth may still be a technical artifact because all available primers are inadequate at capturing the true breadth of diversity (e.g., Asgard Archaea) of this Domain (Karst et al. 2018). Overall, despite potential primer bias and kit-specific significant differences in absolute quantities reported for Bacteria, both kits provide nearly identical narratives regarding the relative prevalence and abundance of Domain-specific 16S rRNA genes at all sampled sediment depths.

Contaminant sequence loads
Co-extraction and sequencing of kit-specific blanks allowed the in silico subtraction of contaminant sequences. Pre- and post-contaminant subtraction sequence loss shows that both kits yield extracts containing amplifiable kit-specific contaminant DNA (Fig. 4). The MB kit has higher contaminant-sequence percentages in surface sediment. Regardless of kit used, contaminant amplification increases substantially with sediment depth, to comprise nearly 50% of all sequences in the two deepest samples. Since (1) each extraction is assumed to contain the same initial amount of kit-contaminant DNA, (2) all sequencing occurred in a single sequencing run, and (3) the two subseafloor sediment horizons show the highest contaminant fractions, contaminant amplicon surges likely result from competitive PCR amplification (Salter et al. 2014). This is a longstanding problem for marine sediment that is exacerbated in low-biomass settings (Webster et al. 2003). We suggest addressing this issue in vitro, via larger extraction volumes to avoid template competition, and/or with arduous controls for efficient in silico subtraction.

Community composition is generally the same with both kits
Both extraction kits yield nearly identical community composition at the class level (Fig. 5). However, from near-seafloor samples (0-32 cmbsf), Firmicutes (Bacilli class) are only represented in DNA extracted by the MB kit. Firmicutes are recovered by both kits in deeper samples. It is unknown if the Firmicute sequences extracted by the MB kit represent spores or vegetative cells. Given that (1) Firmicutes dominate our two deeper samples (Fig. 5) and (2) marine sediment may be supplied with 10⁸ Firmicute spores per m² per year (Hubert et al. 2009), we speculate that Firmicute spores are present in the near-seafloor sediment and more efficiently detected with the MB extraction protocol. Given that the MP kit more efficiently extracts bacterial spore DNA from soils compared to the MB kit (Dineen et al. 2010), an unsurprising result given the more vigorous homogenization step of the former, it is rather unexpected that the MB kit may be more efficient at this task in marine sediment.

Generally, extraction kits agree on OTU counts for discrete depths
Although normalized OTU counts are imperfect assessments of taxonomic distribution in nature (Weiss et al. 2017), comparing this metric for each sampled horizon as a function of extraction kit reveals insightful patterns. Kit-specific abundances for the top 100 OTUs clustered from the combined dataset show that relative abundances recovered with either kit generally agree to within one order of magnitude (Fig. 6). Exceptions to this trend, however, are observed (Fig. 6, red boxes). In near-seafloor samples (3 cmbsf and 8 cmbsf), six of the 10 most abundant OTUs are consistently under-recovered with the MP kit relative to the MB kit, despite both kits reporting similar values for this OTU subset at all other depths. Among the next most abundant 90 OTUs, across all sampled depths, kit-specific differences are sporadically seen; however, high vs. low values may be attributed to either extraction kit.

Near-seafloor diversity is better captured with MP, subseafloor diversity is similar for both MP and MB
The number of OTUs per 32,031 reads is higher for the MP kit extractions in near-seafloor (0-32 cmbsf) sediment (Fig. 7A). Despite higher DNA yields with the MB kit at 32 cmbsf (Fig. 2A), OTU counts are higher for the MP extracts at this depth (Fig. 7A). This observation suggests that in high-biomass horizons and independently of bulk DNA yield, the MP extraction protocol favors lysis of a more diverse cellular cohort, resulting in higher OTU counts and better capturing alpha diversity. However, in the subseafloor sediment (at 232 cmbsf and 513 cmbsf), where 16S gene counts are lowest, both extraction kits yield nearly identical OTU counts (Fig. 7A). This convergence at depth may be explained by (1) subseafloor microbial populations being comprised of taxa equally susceptible to cell membrane disruption by both extraction protocols, (2) longer entombment times in deeper sediment weakening a fraction of the community that during early burial was more resistant to lysis by the MB kit (513 cmbsf cells are 44.6 kyr old), and/or (3) a lower silica matrix saturation point for MB relative to MP, an effect that exaggerates richness differences in near-seafloor horizons, where biomass is highest, but is negligible in deeper sediment with fewer than 10⁸ 16S rRNA genes cm⁻³.

Ordination patterns not affected by extraction kit choice
Beta-diversity patterns, as inferred from multivariate ordination, follow similar depth-dependent trends as OTU counts; near-seafloor (0-32 cmbsf) horizons show kit-specific differences while subseafloor (232 cmbsf and 513 cmbsf) horizons cluster tightly (Fig. 7B). The MP extracts more clearly resolve ordination patterns (the MP-derived data are spread over a larger area in two-dimensional space, relative to the MB-derived data). Overall, the major conclusion drawn from ordination patterns, that near-seafloor communities are more dissimilar to one another than subseafloor communities are, may be drawn from the DNA extracted with either kit.

Comments and recommendations
Despite differences in DNA yield, protein co-extraction loads and absolute 16S rRNA gene counts, two popular DNA extraction kits (MO BIO Laboratories PowerSoil® DNA isolation kit and MP Biomedicals FastDNA™ Spin Kit for Soil) generate similar depth-dependent prokaryotic community composition, OTU-specific abundances and diversity trends for subseafloor sediment. Higher average extraction yields

Dedication
This manuscript is dedicated to Augustus Huitzilin Ramírez-Tatone; welcome to Earth.