Ultra‐small and abundant: Candidate phyla radiation bacteria are potential catalysts of carbon transformation in a thermokarst lake ecosystem

The candidate phyla radiation (CPR) is a diverse group of uncultured bacterial lineages with poorly understood metabolic functions. CPR bacteria can represent a large proportion of the total planktonic microbial community in subarctic thermokarst lakes, but their functional roles remain unexplored. We applied sequential water filtration and metagenomic shotgun sequencing to a peatland permafrost thaw lake, and found high proportions of CPR bacteria in both summer and winter (> 40% of 16S rRNA reads in the 0.02–0.22 μm pore‐size fraction). The metagenome‐assembled genomes of CPR bacteria representatives showed capacities to degrade and ferment permafrost‐ and peatland‐derived organic matter. Potential products of their metabolic activities included acetate, CO2, and hydrogen, implying a syntrophic relationship with other community members, including methanogens and methanotrophs. The results indicate biogeochemical interdependencies in organic matter utilization within thermokarst microbial communities, with CPR members playing a key intermediate role in carbon and methane cycling.

The candidate phyla radiation (CPR) is a large and diverse radiation of bacteria that to date consists exclusively of uncultured microorganisms. CPR bacteria tend to be absent or underestimated in 16S rRNA gene amplicon surveys, likely because of introns within their 16S rRNA genes that interfere with polymerase chain reaction (PCR) amplification, and mismatches with standard "universal" primers (Brown et al. 2015). However, genome-resolved metagenomics and single-cell sequencing of environmental samples have revealed this large and previously hidden bacterial radiation. With more than 74 phyla clustered within two major superphyla, the Parcubacteria and the Microgenomates, CPR taxa may represent up to half of global bacterial biodiversity (Rinke et al. 2013;Brown et al. 2015;Hug et al. 2016;Castelle et al. 2018).
High proportions of CPR bacteria have recently been reported in subarctic thermokarst lakes and ponds (Wurzbacher et al. 2017;Vigneron et al. 2019). These small waterbodies form by the thawing and collapse of ice-rich permafrost and are abundant throughout the subarctic region. In these ecosystems, permafrost-derived organic matter is mobilized and transformed by microbial activity, leading to high rates of greenhouse gas emission to the atmosphere (Matveev et al. 2016). Geochemical conditions such as oxygen, methane, and sulfide concentrations contrast markedly between the ice-free and ice-cover periods (Matveev et al. 2019), and this seasonal change is accompanied by pronounced shifts in microbial community structure and metabolic capacities (Vigneron et al. 2019). In the oxygenated surface waters in summer, the microbial communities are dominated by aerobic bacteria (Betaproteobacteria and Actinobacteria, Fig. 1), many of which have the capacity for phototrophic metabolism and the enzymatic potential to degrade labile organic matter or to oxidize one-carbon compounds (Vigneron et al. 2019). In contrast, the microbiome of the fully anoxic water column in winter is composed of anaerobic microorganisms (including Planctomycetes, Deltaproteobacteria, Chloroflexi, and methanogens, Fig. 1) that may be involved in syntrophic and methanogenic degradation of complex organic matter (Vigneron et al. 2019). Metagenomic data sets showed that CPR bacteria, including Parcubacteria and Microgenomates, represent a large proportion (15–25%) of the microbial community in a subarctic thermokarst lake in both seasons (Vigneron et al. 2019), raising questions about their potential roles and lifestyles in such a poorly explored environment.
CPR bacteria were initially recovered from groundwaters and aquifers, where they have been found to account for up to 20% of the microbial community (Kantor et al. 2013;Luef et al. 2015). They have also been detected as minority members of the community in other environments, including surface freshwaters (Linz et al. 2017), terrestrial systems (Parks et al. 2017) and marine habitats (Dombrowski et al. 2017;Tully et al. 2018), as well as in animal microbiomes (Camanocha and Dewhirst 2014). Consistent with their enrichment in groundwater samples that have been prefiltered through 0.22 μm filters (Luef et al. 2015;Anantharaman et al. 2016;Danczak et al. 2017;Castelle et al. 2018), transmission electron microscopy indicates that some of the Parcubacteria members are extremely small (0.1 μm in diameter; Luef et al. 2015). This "ultra-small" cell size could also explain their relatively low proportions in aquatic microbial samples collected on 0.22 μm filters, since they would be poorly retained on such filters.
The functional capacities of CPR phyla in natural ecosystems remain unclear, as their apparently reduced genomes (~1 Mb; Brown et al. 2015) are mainly composed of uncharacterized genes. In addition, some core metabolic genes thought to be essential in other bacterial phyla are consistently absent from the metagenome-assembled genomes (Castelle et al. 2018). Extensive genome data mining of Parcubacteria and Microgenomates has indicated a lack of known respiratory pathways for the large majority of phyla, suggesting syntrophic/symbiotic and fermentative lifestyles for these bacteria, which could also help to explain the absence of cultured representatives (Wrighton et al. 2012;Castelle et al. 2018). However, genomes of some Parcubacteria species indicate a potential involvement in nitrogen cycling (Castelle et al. 2017;Danczak et al. 2017;León-Zayas et al. 2017). Analysis of glycoside hydrolase (GH) families suggests that the metabolic capacity for amylose and cellulose degradation is widespread within the radiation, indicating a role in carbon cycling and complex organic matter degradation. Other catabolic capabilities have also been reported, implying differences in ecological niches and carbon substrate utilization among CPR lineages (Danczak et al. 2017).
In the present study, we combined sequential filtration of thermokarst waters on 0.22 and 0.02 μm pore-size filters with PCR-free metagenomic sequencing to determine whether the distribution of CPR members in thermokarst lakes might have been previously underestimated due to the small cell size of some representatives. We then used metagenomic binning to investigate the metabolic capacities of these microbial lineages and to address the question of their role in permafrost organic carbon cycling in thermokarst lake ecosystems.

Methods
Water samples were collected from a 2.8 m deep thermokarst lake in the Sasapimakwananisikw (SAS) River Valley, located in the sporadic permafrost zone of subarctic Quebec. Samples were collected from the anoxic water column under ice in winter 2015 and at the oxycline (0.5 m depth) in summer 2016, as previously described in Vigneron et al. (2019). Further site details are given in the Supporting Information. Three independently sampled replicates were analyzed per season, and all water samples were collected by serial filtration through 0.22 μm Sterivex™ filter units (EMD Millipore, MA, U.S.A.) and then 0.02 μm Anotop™ filters (Whatman, GE Healthcare, UK) using three independent lines of a peristaltic pump system. The filters were immediately frozen and stored at −70 °C until DNA extraction.
For the six 0.22 μm Sterivex samples, nucleic acids were extracted using the Qiagen AllPrep DNA/RNA Mini Kit with modifications (Cruaud et al. 2017). For the six 0.02 μm Anotop filters, nucleic acids were extracted using the MasterPure Complete DNA purification kit as in Mueller et al. (2014). Because the two filter types differ in construction (membrane or glass microfibers), a single DNA extraction procedure could not be used; nonetheless, both protocols combined similar enzymatic (lysozyme and proteinase K) and chemical lysis steps. Metagenomic libraries were prepared using the Illumina Nextera XT kit (for Sterivex samples) and the Swift Accel-NGS® 1S kit (which allows lower DNA input, for Anotop samples) following the manufacturers' recommendations. DNA was sequenced on an Illumina NextSeq for the 0.22 μm filters (at the CGEB-Integrated Microbiome Resource, Dalhousie University) and on an Illumina HiSeq for the 0.02 μm filters (Génome Québec sequencing facility, McGill University), both producing 150 bp reads. The bioinformatics methods for microbial community composition analysis and metagenomic assembly, binning, and analysis are given in the Supporting Information. Bacterial 16S rRNA gene abundance was determined by quantitative PCR using the 1369f/1492r primer set, as detailed in the Supporting Information.

Results and discussion
Ultra-small bacteria in thermokarst lake waters
Specific bacterial lineages that passed through the 0.22 μm pore-size filters but were retained by the 0.02 μm pore-size filters were identified from 16S rRNA genes extracted from metagenomes of winter and summer samples (Fig. 1). The sequential filtration enriched for small cells, but spores, bacteria with flexible membranes, viruses, and environmental DNA adsorbed onto small particles would also have been recovered in this fraction. CPR bacteria represented on average (± SD) 55.7% ± 7.8% and 43.0% ± 7.7% of the 16S rRNA gene reads on the 0.02 μm filters in winter and summer samples, respectively (Fig. 1). For comparison, CPR bacteria represented on average 16.2% ± 3.6% and 13.4% ± 2.3% of the community on the 0.22 μm filters in winter and summer samples, respectively. In particular, Parcubacteria lineages (including Candidatus Azambacteria and Candidatus Nomurabacteria) were significantly enriched (12 times greater) in metagenomes from the 0.02 μm filters compared to the 0.22 μm filters (t-test, p = 0.003), regardless of season (Fig. 1). These results extend previous observations from groundwater samples to permafrost pond environments and are consistent with the small cell size previously inferred or observed for these lineages (Brown et al. 2015;Luef et al. 2015;Danczak et al. 2017). Nonetheless, other Parcubacteria lineages, including Ca. Moranbacteria and Ca. Falkowbacteria, and Microgenomates lineages, including Ca. Daviesbacteria and Ca. Levybacteria, were detected in similar proportions on both filter sizes (Fig. 1).
The abundance of bacterial 16S rRNA genes on the 0.02 and 0.22 μm filters was estimated by quantitative PCR (Fig. 2). For the 0.02–0.22 μm fraction, our results indicated an average of 9.9 × 10³ and 4.16 × 10⁵ 16S rRNA genes mL⁻¹ in winter and summer, respectively, while the bacterial abundance quantified on the 0.22 μm filters averaged 4.03 × 10⁵ and 6.74 × 10⁵ 16S rRNA genes mL⁻¹ for the same seasons. Assuming the same gene copy number per cell in both fractions, this implies that bacteria passing through 0.22 μm filters represent 2.4% of the total bacterial community (0.22 and 0.02 μm filters together) in winter and 38.2% in summer (Fig. 2). However, all complete CPR genomes reported to date have only one copy of the 16S rRNA gene, and some CPR bacteria likely also escape quantification because of primer mismatches (Brown et al. 2015). These estimates therefore represent lower bounds, both for total bacterial abundance and for the relative abundance of CPR bacteria compared with lineages that carry multiple 16S rRNA gene copies.
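The fraction estimate above is a simple ratio of 16S rRNA gene copies. The minimal Python sketch below, using only the mean qPCR values reported in the text, reproduces the 2.4% and 38.2% figures. The optional per-cell copy-number arguments are hypothetical and serve only to illustrate why single-copy CPR genomes make these values lower bounds; they are not parameters used in this study.

```python
# Mean qPCR values reported in the text (16S rRNA gene copies per mL).
QPCR_MEANS = {
    "winter": {"filtrate_0.02_0.22": 9.9e3, "retained_0.22": 4.03e5},
    "summer": {"filtrate_0.02_0.22": 4.16e5, "retained_0.22": 6.74e5},
}


def small_fraction_share(
    small_copies: float,
    large_copies: float,
    copies_per_cell_small: float = 1.0,
    copies_per_cell_large: float = 1.0,
) -> float:
    """Percentage of cells in the 0.02-0.22 um fraction relative to both filters.

    With equal copy numbers (the default) this is the ratio used in the text.
    The copy-number arguments are hypothetical: if cells retained on the
    0.22 um filters carry more 16S copies than single-copy CPR cells, the
    share of small cells rises, so the equal-copy estimate is a lower bound.
    """
    small_cells = small_copies / copies_per_cell_small
    large_cells = large_copies / copies_per_cell_large
    return 100.0 * small_cells / (small_cells + large_cells)


for season, means in QPCR_MEANS.items():
    share = small_fraction_share(means["filtrate_0.02_0.22"], means["retained_0.22"])
    print(f"{season}: {share:.1f}% of 16S copies passed the 0.22 um filter")
```

With equal copy numbers the loop prints 2.4% for winter and 38.2% for summer, matching the values in the text; raising the hypothetical `copies_per_cell_large` above 1 increases both shares.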
The metagenome analyses indicated that Parcubacteria were predominant members of the 0.02–0.22 μm community in both summer (31% of the 16S rRNA reads) and winter (51% of the 16S rRNA reads), despite differences in microbial community composition on the 0.22 μm filters (Fig. 1) and geochemical changes between summer and winter, including in oxygen and sulfide concentrations (Vigneron et al. 2019). This result is also supported by previous RNA-based amplicon sequencing of the same samples, which identified Parcubacteria members in both summer and winter (Vigneron et al. 2019). Although this result concerns only the Parcubacteria lineages that can be amplified by standard primer sets, it suggests that at least the PCR-detectable Parcubacteria are probably active in both seasons. While the lower abundance measured by quantitative PCR (qPCR) in winter compared with summer should be taken into account, these results imply that Parcubacteria members may be more resistant than other lineages to changes in oxygen concentration and to other variable factors in permafrost ponds, such as light availability, microbial grazers and viruses, and inhibitory compounds including hydrogen sulfide. This resistance might explain their success in thermokarst lake waters, where strong seasonal environmental variations restructure the rest of the microbial community (Vigneron et al. 2019).
Based on their limited metabolic repertoire and absence of biosynthetic genes, previous studies have proposed an episymbiotic relationship for Parcubacteria whereby they are attached to the surface of other cells (Nelson and Stegen 2015;Castelle et al. 2018). However, the microbial communities were markedly different between summer and winter, and no other microbial lineage was found to covary with Parcubacteria. This weakens any hypothesis of a stable symbiotic host for Parcubacteria at least in this ecosystem type. Nonetheless, we cannot exclude interdependencies or syntrophy among Parcubacteria lineages.
In contrast to Parcubacteria, the relative proportions of Microgenomates lineages and Candidatus Berkelbacteria were enriched in the 0.02 μm filter samples only in summer, possibly contributing to the roughly 40-fold difference in abundance detected by qPCR (9.9 × 10³ vs. 4.16 × 10⁵ 16S rRNA genes mL⁻¹ in winter and summer, respectively; Fig. 2). These results suggest that these lineages respond to the thermokarst aquatic environment differently from the predominant Parcubacteria members, with the possibility of symbiotic relationships with taxa in the summer microbial community.

Carbon metabolism by thermokarst lake CPR bacteria
After binning of the coassembled metagenomes, we analyzed seven genomic bins affiliated with Parcubacteria and one bin affiliated with Microgenomates (Supporting Information Table S1). The completeness of these bins ranged from 47% to 73% based on a set of 101 bacterial markers that includes genes consistently not detected in CPR bins (Nelson and Stegen 2015), and contamination was below 7% (Fig. 3). Based on analysis of 16S rRNA genes and ribosomal protein (rps) genes, two of the Parcubacteria bins were related to Candidatus Yanofskybacteria (bin13 and bin20), three to Candidatus Moranbacteria (bin9, 23, and 91), and a single bin to Candidatus Kaiserbacteria (bin137). In addition, bin120 was reconstructed, but its 16S rRNA gene was missing and rps gene analysis was unable to assign it to a specific taxon within the Parcubacteria superphylum. Finally, the single bin affiliated with the Microgenomates superphylum was related to Candidatus Daviesbacteria (bin78).
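Completeness and contamination estimates of this kind are typically derived from single-copy marker genes: the share of expected markers found in a bin, and the share found in extra copies. The sketch below is a simplified, hypothetical version of that calculation (dedicated tools such as CheckM use more elaborate, lineage-aware marker sets); the marker IDs and bin contents are invented for illustration and do not come from this study.

```python
from collections import Counter


def completeness_contamination(marker_hits, marker_set):
    """Return (completeness %, contamination %) for one genomic bin.

    marker_hits: marker IDs detected in the bin, one entry per gene copy.
    marker_set: markers expected exactly once per genome (e.g. a 101-gene set).
    """
    expected = set(marker_set)
    counts = Counter(m for m in marker_hits if m in expected)
    # Completeness: fraction of expected markers seen at least once.
    completeness = 100.0 * len(counts) / len(expected)
    # Contamination: every copy beyond the first suggests a foreign genome.
    extra_copies = sum(n - 1 for n in counts.values())
    contamination = 100.0 * extra_copies / len(expected)
    return completeness, contamination


# Hypothetical example: 101 markers, 60 found once, 5 of them duplicated.
markers = [f"M{i:03d}" for i in range(101)]
hits = markers[:60] + markers[:5]
comp, cont = completeness_contamination(hits, markers)
print(f"completeness {comp:.1f}%, contamination {cont:.1f}%")
```

Because the 101-marker set described above includes genes that are consistently absent from CPR bins, completeness computed this way may understate how complete a CPR bin actually is, so the 47–73% range may be conservative.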
Using the IMG/MR and KAAS servers for genome annotation and pathway reconstruction (Moriya et al. 2007;Markowitz et al. 2009), characterized genes were identified on approximately a third of the contigs (> 2000 bp), representing 249 ± 57 different KEGG orthologies (KOs) per bin. The gene content of the bins mirrored their taxonomic affiliation, with members of the same phylum sharing similar gene compositions (Fig. 3). Although the incompleteness of the bins may have led to nondetection of certain genes, none of the Parcubacteria and Microgenomates genomic bins recovered from our samples encoded a respiratory chain or a tricarboxylic acid cycle, consistent with a fermentative metabolism suited to the anoxic winter thermokarst lake environment. Nonetheless, catalase (K03781), peroxiredoxin (K11188), and superoxide dismutase (K04565) genes were identified in the genomic bins, suggesting enzymatic resistance to oxygen in thermokarst lake Parcubacteria lineages and potentially explaining their high relative proportion in oxic summer waters (Supporting Information Table S2). All genomic bins encode the lower part of the glycolytic pathway (phosphofructokinase genes were not detected, excluding the possibility of gluconeogenesis by the usual enzymatic pathway), with upper glycolysis completed in three bins (bins 20, 78, and 137) by the nonoxidative pentose phosphate pathway (Fig. 3). Genes involved in the downstream conversion of pyruvate to acetyl-CoA and acetate were detected in Candidatus Moranbacteria and Candidatus Yanofskybacteria, suggesting that these lineages might ferment carbon compounds to acetate (Fig. 3). The capacity to generate H₂ via NiFe hydrogenases was also found in Candidatus Yanofskybacteria bin13. A lactate dehydrogenase gene was detected in Candidatus Kaiserbacteria, suggesting that lactate may be an end product of its fermentation.
In addition, the capacity for CO₂-producing (decarboxylative) malate fermentation was identified in Candidatus Moranbacteria and Candidatus Kaiserbacteria (Fig. 3). Although these results describe only a limited part of the CPR bacterial community identified in our samples, these metabolic capabilities align with those identified in Parcubacteria bins recovered from other freshwater environments (Nelson and Stegen 2015;Castelle et al. 2018) and extend our knowledge of CPR bacterial metabolism to thermokarst lakes and ponds, one of the most abundant ecosystem types in the circumpolar North.
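The observation that gene content mirrors taxonomic affiliation can be quantified by comparing KO profiles between bins; the Fig. 4 caption names the Bray-Curtis similarity index for ordering bins. The sketch below computes that index for KO count profiles. The bin names and KO labels are hypothetical placeholders, not data from this study.

```python
from itertools import combinations


def bray_curtis_similarity(a: dict[str, int], b: dict[str, int]) -> float:
    """Return 1 - Bray-Curtis dissimilarity between two KO count profiles."""
    keys = set(a) | set(b)
    diff = sum(abs(a.get(k, 0) - b.get(k, 0)) for k in keys)
    total = sum(a.get(k, 0) + b.get(k, 0) for k in keys)
    return 1.0 - diff / total if total else 1.0


# Hypothetical presence/absence profiles (1 = KO detected in the bin).
profiles = {
    "bin_x": {"KO_a": 1, "KO_b": 1, "KO_c": 1},
    "bin_y": {"KO_a": 1, "KO_b": 1, "KO_d": 1},
    "bin_z": {"KO_e": 1, "KO_f": 1},
}

# Rank bin pairs from most to least similar.
pairs = sorted(
    combinations(profiles, 2),
    key=lambda p: bray_curtis_similarity(profiles[p[0]], profiles[p[1]]),
    reverse=True,
)
for x, y in pairs:
    print(x, y, round(bray_curtis_similarity(profiles[x], profiles[y]), 3))
```

Here bin_x and bin_y, which share two of their three KOs each, score about 0.67, while bin_z shares nothing and scores 0. On presence/absence data such as this, Bray-Curtis similarity reduces to the Sørensen-Dice coefficient.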

Degradation of permafrost-derived organic matter by CPR bacteria
An average of 29 ± 12 carbohydrate-active enzyme (CAZy) genes per CPR genomic bin was identified (Fig. 4), indicating the potential for degradation of complex carbon substrates (Cantarel et al. 2009). The number of CAZy genes was lower than in other lineages from the thermokarst waters that degrade complex organic carbon (Vigneron et al. 2019) but is consistent with previous analyses of CPR genomes (Danczak et al. 2017). Genes coding for amylases, which usually occur in Parcubacteria genomes from marine and freshwater samples, were detected in most of the bins, along with the capacity for degrading cellulose (GH1) and mannose (GH130), supporting previous results from aquifer samples (Danczak et al. 2017). Cellulose and mannose are the most abundant and stable constituents of Sphagnum peats (Theander et al. 1954), which constitute most of the available or permafrost-stored organic carbon at our peatland permafrost study site (Arlen-Pouliot and Bhiry 2005;Matveev et al. 2016). Our results may therefore indicate a potential for degradation and fermentation of permafrost- and peatland-derived organic carbon by CPR bacteria in these permafrost thaw lakes. A capacity for chitin degradation (GH18) was also detected in Candidatus Moranbacteria, suggesting that some Parcubacteria members might also degrade fungal and invertebrate remains. In addition, various genes coding for glycosyltransferases (GTs) were identified, suggesting that conversion of (poly)saccharides, (poly)peptides, and glycerolipids might be an important mechanism in CPR bacteria, which frequently lack de novo synthetic pathways for these compounds. Such conversions could benefit these small bacteria by aiding cell membrane synthesis and cell-to-cell interactions (León-Zayas et al. 2017), supporting a close relationship with other members of the community.
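The substrate inferences above follow from mapping CAZy families to the polymers they are taken to act on. A hypothetical Python sketch of that bookkeeping is shown below: it tallies a bin's annotated families by substrate class, using only the family-to-substrate assignments stated in the text (GH1, GH130, GH18), counts glycosyltransferases separately, and expresses CAZy genes as a percentage of all coding regions, as described in the Fig. 4 caption. The annotation list and coding-region count are invented for illustration.

```python
from collections import Counter

# Family-to-substrate assignments as interpreted in the text above.
SUBSTRATE_BY_FAMILY = {
    "GH1": "cellulose",
    "GH130": "mannose",
    "GH18": "chitin",
}


def summarize_cazy(families: list[str], total_cds: int):
    """Tally CAZy families by substrate and return CAZy genes as % of all CDS."""
    by_substrate = Counter(SUBSTRATE_BY_FAMILY.get(f, "other") for f in families)
    # Glycosyltransferases (GT families) are counted separately.
    glycosyltransferases = sum(1 for f in families if f.startswith("GT"))
    percent_cazy = 100.0 * len(families) / total_cds
    return by_substrate, glycosyltransferases, percent_cazy


# Hypothetical bin: six CAZy annotations out of 600 coding regions.
annotations = ["GH1", "GH1", "GH130", "GH18", "GT2", "GT4"]
counts, n_gt, pct = summarize_cazy(annotations, total_cds=600)
print(dict(counts), n_gt, f"{pct:.1f}%")
```

A real analysis would take the family assignments from a CAZy annotation pipeline and a curated family-to-substrate table rather than the three-entry dictionary assumed here.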

Who benefits from Parcubacteria metabolism?
Our results indicate that acetate would be one of the major end products of Parcubacteria metabolism. We therefore searched the genomic bins from the same samples (Vigneron et al. 2019) for potential acetate-utilizing microbial lineages, based on the presence of acetyl-CoA synthetase (K01895), acetate kinase (K00925), or phosphate acetyltransferase (K00625) genes, which confer the ability for assimilatory or dissimilatory use of acetate (Wolfe 2005). Various microbial lineages were identified as potential acetate utilizers and together represented up to 21% ± 1% and 55% ± 1% of the community on 0.22 μm pore-size filters in winter and summer, respectively (Fig. 5). Although acetyl-CoA synthetase is bidirectional and might also be involved in acetate production (Wolfe 2005), the potential acetate-utilizing microorganisms include members of the Actinobacteria, Betaproteobacteria, Planctomycetes, and Chloroflexi, as well as methanogens and methanotrophs. In addition, genes involved in acetate transport were identified in most of these lineages, supporting this potential (Fig. 5). These results suggest that, in addition to their CO₂ production by fermentation, certain end products of Parcubacteria metabolism might fuel greenhouse-gas-producing members of the microbial community.

Fig. 4. Genes for carbohydrate-active enzymes (CAZy) identified in the CPR bacteria bins. Bins were sorted according to their genome similarity, based on identified KOs and the Bray-Curtis similarity index. Taxonomic affiliation of the bins was determined using 16S rRNA genes and ribosomal protein genes. The size of the filled circles represents the number of detected genes (min: 1, max: 16). Green: plant or algal detritus degradation; Orange: chitin degradation; Blue: cell envelope/necromass degradation; Yellow: all classes of degradation. Percentages of CAZy genes were calculated based on the number of total coding regions per bin.
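The acetate-utilizer screen described in this section amounts to a presence test for three KOs, followed by summing the relative abundance of the flagged lineages. A minimal sketch, assuming each bin is represented by its set of KO identifiers and a relative abundance; the bin names, KO sets, and abundances below are hypothetical.

```python
# KOs used in the text to flag potential acetate utilizers.
ACETATE_KOS = {
    "K01895": "acetyl-CoA synthetase",
    "K00925": "acetate kinase",
    "K00625": "phosphate acetyltransferase",
}


def flag_acetate_utilizers(bin_kos: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return bins carrying at least one acetate-related KO, with enzyme names.

    Presence only indicates potential: acetyl-CoA synthetase is bidirectional,
    so a hit does not prove that the lineage consumes acetate.
    """
    flagged = {}
    for name, kos in bin_kos.items():
        found = sorted(ACETATE_KOS[k] for k in kos & set(ACETATE_KOS))
        if found:
            flagged[name] = found
    return flagged


def summed_abundance(flagged: dict[str, list[str]], abundance: dict[str, float]) -> float:
    """Sum the relative abundance (% of community) of all flagged bins."""
    return sum(abundance[name] for name in flagged)


# Hypothetical bins (K00001 and K02000 stand in for unrelated KOs).
bins = {
    "bin_a": {"K01895", "K00001"},
    "bin_b": {"K00925", "K00625"},
    "bin_c": {"K02000"},
}
abundance = {"bin_a": 10.0, "bin_b": 5.5, "bin_c": 20.0}
flagged = flag_acetate_utilizers(bins)
print(flagged, summed_abundance(flagged, abundance))
```

In this toy example bin_a and bin_b are flagged and together account for 15.5% of the hypothetical community; bin_c, despite being the most abundant, is excluded because it carries none of the three KOs.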

Conclusions
Our results show that CPR bacteria are a major component of the thermokarst lake microbial community, accounting for more than 20% of the microbial reads detected on 0.22 μm filters and up to 56% of the microorganisms on 0.02 μm filters. These high percentages were maintained both in the aerobic surface waters in summer and under the anoxic under-ice conditions in winter. These ultra-small microorganisms were also potentially abundant, representing up to a third of the total bacterial community quantified in this study. Reconstruction of Parcubacteria genomes was consistent with previous reports and suggested that they may degrade and ferment permafrost- and peatland-derived organic matter, thereby providing volatile fatty acids and hydrogen to methanogenic and other thermokarst pond microorganisms. They may also contribute to thermokarst pond CO₂ emissions through their fermentative metabolism. Although our approach remains to be applied to a range of permafrost lakes and ponds before these conclusions can be extrapolated more widely, the results imply that Parcubacteria could play a major role in the carbon cycling network of these ecosystems and may act as catalysts for carbon transformation in all seasons. Given that CPR bacteria are poorly resolved by the standard water filtration methods and molecular protocols of aquatic microbial ecology, and given their role suggested here in generating labile organic carbon substrates, the importance of the CPR group has likely been greatly underestimated or overlooked in lake ecosystems in general.