Expression of genes involved in phagocytosis in uncultured heterotrophic flagellates

Environmental molecular sequencing has revealed an abundance of microorganisms that were previously unknown, mainly because most had not been cultured in the laboratory. Within this novel diversity, there are the uncultured MAST clades (MArine STramenopiles), which are major components of marine heterotrophic flagellates (HFs) thought to be active bacterial grazers. In this study, we investigated the gene expression of natural HFs in a mixed community where bacterivory was promoted. Using fluorescence in situ hybridization and 18S rDNA derived from metatranscriptomics, we followed the taxonomic dynamics during the incubation, and confirmed the increase in relative abundance of different MAST lineages. We then used single cell genomes of several MAST species to gain an insight into their most expressed genes, with a particular focus on genes related to phagocytosis. The genomes of MAST‐4A and MAST‐4B were the most represented in the metatranscriptomes, and we identified highly expressed genes of these two species involved in motility and cytoskeleton remodeling, as well as many lysosomal enzymes. Particularly relevant were the cathepsins, which are characteristic digestive enzymes of the phagolysosome and the rhodopsins, perhaps used for vacuole acidification. The combination of single cell genomics and metatranscriptomics gives insights on the phagocytic capacity of uncultured and ecologically relevant HF species.

Heterotrophic flagellates (HFs) are widespread throughout the eukaryotic tree of life and may represent the most ancient eukaryote lifestyle (Cavalier-Smith 2006;Jürgens and Massana 2008;Adl et al. 2019). These colorless flagellated protists are important consumers of primary and secondary production in marine ecosystems (Arndt et al. 2000;Worden et al. 2015), and play a pivotal role in microbial food webs by ensuring the recycling of nutrients. Although they make a significant contribution, it is difficult to assess their diversity because they cannot be easily differentiated by microscopy and most remain uncultured. Moreover, HFs are often ignored in quantitative studies because they are less abundant than photosynthetic protists, and are not well represented in sequence databases, especially of sequenced genomes ). To study the gene expression and elucidate functional characteristics of these diverse and complex assemblages, metatranscriptomics provides a promising but also challenging opportunity.
Most HFs are considered to be bacterivorous, that is, they consume bacteria by phagocytosis. The engulfment and digestion of a foreign organism as prey, is an important nutritional process in many species of unicellular eukaryotes, and consists of three main steps: (1) motility and prey recognition, (2) internalization, formation, and maturation of the phagosome, and (3) digestion of prey within the phagolysosome (Levin et al. 2016;Uribe-Querol and Rosales 2017). In phagocytosis, prey is internalized by invagination of the plasma membrane to form an intracellular vacuole known as the phagosome (Niedergang and Grinstein 2018). Engulfment is controlled by the actin cytoskeleton and coordinated by phagocytic receptors that activate the GTPases Rac, Rho, and Cdc42 genes (Vieira et al. 2002;Niedergang and Grinstein 2018). The phagosomes then undergo a maturation process, acquire different proteins (like the Rab GTPase) (Rink et al. 2005;Fairn and Grinstein 2012), and become acidified. Mature phagosomes become functional phagolysosomes by fusing with lysosomes, which provide digestive enzymes and further acidify the environment to optimize the performance of these enzymes. Thus, a mature phagolysosome is characterized by the presence of a range of lysosomal acid hydrolases such as proteases, lysozymes, and lipases (Vieira et al. 2002;Fairn and Grinstein 2012). Most of our understanding of phagocytosis at the genomic level is limited to studies on a few species of eukaryotes: several mammals (Boulais et al. 2010), choanoflagellates (Dayel and King 2014), green algae (Burns et al. 2015), and amoebae (Okada et al. 2005). The molecular machinery of phagocytosis in environmentally relevant HFs has not been described, which prevents us from quantitatively evaluating particular traits representative for this lifestyle.
A substantial part of marine HF assemblages are MArine STramenopiles (MASTs) ), a set of largely uncultured clades within Stramenopiles, a taxa-rich supergroup including autotrophic (Ochrophyta) and heterotrophic (Pseudofungi, Sagenista, and Opalozoa) high-rank lineages. To date, 18 MAST clades have been identified within these three heterotrophic lineages, and each one may have a distinct ecological niche . Some MAST lineages are widespread and highly abundant in the surface ocean (de Vargas et al. 2015;Mangot et al. 2018). Most MASTs are assumed to be bacterial feeders, as this has been demonstrated in a few of them (e.g., MAST-1 and MAST-4) by microscopic inspection of bacterial preys inside the cells (Massana et al. 2009;Lin et al. 2012). However, it is not clear that all MASTs can perform phagocytosis, nor has it been confirmed which sets of genes and proteins are used for this process. Because of the difficulty of cultivating the most relevant species of marine HFs, it is not possible to perform direct physiological and gene expression studies on single species, so there has been little progress in understanding the genetic basis of phagocytosis in these organisms.
In this article, we circumvent the need for culture-based approaches using a combination of molecular tools to study a set of MAST species growing in near-natural conditions. First, we established an unamended incubation of a coastal surface sample in which active HF cells were growing by feeding on bacteria (Massana et al. 2006), and obtained metatranscriptomic data at several time points during the incubation. Second, by using the cell counts and the 18S rDNA signatures of the metatranscriptomes, we analyzed the temporal dynamics of the taxonomic groups succeeding in the incubation, tracing their positive or negative response. Third, using genomic data obtained by single cell genomics (SCG) (Mangot et al. 2017), we recruited transcripts from several MAST species, allowing the analysis of expressed genes likely contributing to the phagocytosis pathway. Thus, the combination of metatranscriptomics with single cell genomes, available for a few dominant species, allowed us for the first time to study the expression profile of uncultured HFs in natural assemblages.

Growth of marine HFs in an unamended incubation
Approximately 100 L of surface seawater were sampled from Blanes Bay (41 40 0 N, 2 48 0 E) on 4 th July 2017, prefiltered by gravity through a nylon mesh of 200 μm, and transported to the institute within 2 h. In the lab, 50 L of seawater were gravity-filtered through 3 μm pore size polycarbonate filters (47 mm diameter) into a polycarbonate bottle (Nalgene). The bottle was incubated for 5 d in the dark at 24 C, the in situ temperature of the sampling site, and sampled twice a day for cell counts and once a day for molecular data. For total cell counts, aliquots were fixed with glutaraldehyde and stained with 4 0 ,6-diamidino-2-phenylindole (DAPI). Cell counts of heterotrophic bacteria, Synechococcus, and phototrophic and heterotrophic flagellates (2-3 μm in size) were obtained using epifluorescence microscopy, with excitation by UV radiation (DAPI stained DNA signal) and blue light (to confirm the presence of chlorophyll) (Giner et al. 2016). For cell counts by fluorescence in situ hybridization (FISH) (Amann et al. 1995), aliquots were fixed with formaldehyde and hybridized with oligonucleotide probes specific to MAST-4, MAST-7, Minorisa minuta, and Prymnesiophyceae, as described previously (Cabello et al. 2016;Giner et al. 2016). Samples were then examined by epifluorescence microscopy with blue light excitation. For molecular analyses, 2 L of the incubation were filtered through 0.6 μm pore size polycarbonate filters of 47 mm diameter, which were then flash frozen in liquid nitrogen and stored at −80 C until RNA extraction.

RNA extraction and Illumina sequencing
For RNA extraction, the filters were shattered and vortexed in a tube containing Power Soil beads (Mobio) as described previously (Alonso-Sáez et al. 2018). RNA extraction was performed using the RNeasy Mini Kit (Qiagen). About 60 μL of the primary RNA extract were processed using Turbo DNAse (Ambion, Turbo DNAfree kit) to completely remove residual DNA. The RNA extract was purified by ethanol precipitation, and the pellet resuspended in 40 μL 10 mmol L −1 TRIS. Metatranscriptomic sequencing was performed using 200-400 ng of total RNA extract. Illumina RNASeq libraries were prepared at CNAG (https://www.cnag.crg. eu/) using KAPA Stranded mRNA-Seq Illumina (Roche-KAPA Biosystems). The polyadenylated eukaryotic transcripts were first isolated using poly-T oligonucleotides attached to beads. Then, the mRNA was fragmented using heat and magnesium, converted to complementary DNA (cDNA) by reverse transcription, and sequencing adaptors were ligated to the ends of cDNA molecules. Ligation products were enriched by 15 polymerase chain reaction (PCR) cycles and final libraries were validated with an Agilent 2100 Bioanalyzer. Sequencing was performed at CNAG using an Illumina HiSeq 2500 (TruSeq SBS Kit v4) system, which generated 73 million paired-end reads (2 × 100 bp) for a final sequencing depth of 15 Gbp per sample. The sequence data are available as raw reads at the NCBI BioSample database (SAMN11783926).

Taxonomic characterization of the microbial community
Focusing on the entire eukaryotic domain, we built a reference database of the hypervariable V4 region of the 18S rRNA gene using sequences from SILVA (Quast et al. 2013), and from data sets of environmental marine protists based on 454 (Massana et al.

Labarre et al.
Phagocytosis expressed in uncultured protists 2015), and Illumina sequencing (Giner et al. 2019). The database is available at https://github.com/aleixop/eukaryotesV4. These references were assigned to several "class level" taxonomic ranks using manual curation. We used local alignment by USEARCH (Edgar 2010) to retrieve reads from the metatranscriptomes related to this reference database (> 97% similarity and > 90% alignment coverage). For each individual read, we selected all hits with the same maximal score and filtered them to obtain the correct taxonomic classification: we assigned a read to a taxonomic class when all top hits belonged to that class; otherwise, the read remained unclassified. The relative abundance of a taxonomic class was obtained by dividing its number of V4 reads by the total number of V4 reads in the sample. To obtain a finer classification of reads within MAST lineages, we prepared a second reference data set using sequences classified within separate MAST lineages ) and applied the same criteria as for read classification.
Assembling the metatranscriptome and read mapping We performed quality/adapter trimming of the Illumina HiSeq raw reads using Trimmomatic v0.33 (Bolger et al. 2014) with default settings. We then used SortMeRNA v2.1 (Kopylova et al. 2012) to identify and remove ribosomal RNA reads by comparing all reads against the SILVA SSU and LSU rDNA databases (Quast et al. 2013) and the PR 2 database (Guillou et al. 2013), using a match score e-value of < 0.01. The resulting set of rRNA free reads allowed us to perform a de novo metatranscriptomic coassembly of the six time points sampled using Trinity (Grabherr et al. 2011) with default parameters. We then used the Burrows-Wheeler aligner BWA software (Li and Durbin 2009) to map the individual cleaned paired-end reads from each sample back to the metatranscriptome assembly, allowing up to two mismatches per read, and then we estimated the expression level of each transcript in TPM units (transcripts per million) using the Salmon software (Patro et al. 2017). We computed TPM values for each isoform defined by Trinity (Grabherr et al. 2011), and for subsequent analyses we summed the signal from all isoforms from the same gene and kept the longest isoform as the representative sequence of each gene.

Extracting MAST transcripts using functionally annotated genes from single amplified genomes (SAGs)
To perform taxonomic binning of the final metatranscriptome, we used the Bowtie2 algorithm with default parameters (Langmead and Salzberg 2012), to map the assembled transcripts to predicted open reading frames of reference genomes from 10 representative MAST species obtained by SCG (Supporting Information Table S1). The single cell amplified genomes of MAST-4A and -4E are published (Mangot et al. 2017), while others have been sequenced, assembled and annotated as part of a separate study (Labarre et al. unpubl.). Gene prediction and functional annotation was performed using Augustus (Stanke et al. 2004) and InterProScan (Jones et al. 2014). Based on the gene annotation obtained using the single cell genome, we associated each MAST-specific transcript to a gene family that has a given function. We also assigned the proteins encoded by the genomes to the eggNOG database (Huerta-Cepas et al. 2016) that provides general functional overview classified into main categories of biological metabolism. Genome assemblies are available at doi.org/10.6084/m9.figshare.c.4534562.

Growth of marine HFs in an unamended incubation
We examined a mixed community of picoplanktonic microbes (≤3 μm) growing in a closed system, where higher trophic levels like larger predatory flagellates or ciliates had been filtered out. Using direct epifluorescence microscopy counts, we evaluated the temporal dynamics of several components of this mixed assemblage (Fig. 1a). Bacteria developed progressively during the incubation, obtaining their highest abundance at day 3 (~2 × 10 6 cells mL −1 ), and appearing to decrease at day 4. During the 5 d of incubation, HFs increased continuously, multiplying by nearly 10-fold (from 10 3 to 10 4 cells mL −1 ). Photosynthesis was inhibited by incubating the samples in the dark, such that the abundance of photosynthetic flagellates and Synechococcus decreased markedly from their initial abundance. As intended, the dark unamended incubation promoted the growth of HFs, which became the most important eukaryotic component of the system.
We constructed metatranscriptomic data for samples taken at six time points during the incubation. Despite the enrichment of mRNA in these transcripts (reverse transcription was based on the mRNA poly-A tail), we obtained many 18S rRNA reads (5-12% per sample), which we used to assess the taxonomic composition and dynamics of the assemblage by classifying individual reads to broad taxonomic classes (Fig. 1b). Initially, the samples were dominated by groups composed mainly of photosynthetic species, principally Prymnesiophyceae, Dictyochophycea, and Dinoflagellates, and these decreased markedly over time. In contrast, groups with heterotrophic taxa became more abundant, namely Choanomonada and several MAST lineages (MAST-1, -7, and -3). The increase in abundance of Chlorarachniophyta, which is a generally photosynthetic class, was due to the presence of its only known heterotrophic member, M. minuta.
We used taxonomy profiles based on 18S rRNA data to characterize the dynamics of the 40 most abundant classes, which collectively accounted for 99% of the community (Fig. 2a). The metatranscriptomic data set showed a large reduction in relative abundance of virtually all autotrophic groups, including Archaeplastida, Cryptomonadales, Prymnesiophyceae, Diatomea, Pelagophyceae, and the three MOCH lineages. In contrast, heterotrophic groups showed a more varied response to the incubation, with some decreasing in abundance (Picozoa, MAST-11, MALV-II), others remaining stable (Telonema, Katablepharids, Cercozoa), et al increasing (Choanomonada, most MAST Labarre et al. Phagocytosis expressed in uncultured protists S151 lineages, Bicosoecida and Labyrinthulomycetes). As expected, groups containing both autotrophic and heterotrophic species did not show a clear trend. To support these observations, we targeted a few groups using FISH (Fig. 2b), and observed a marked decrease in prymnesiophytes (from 400 to 3 cells mL −1 ), and an increase in M. minuta, MAST-7, and MAST-4 (increasing from 65 to 1300 cells mL −1 , 11 to 300 cells mL −1 , and 47 to 200 cells mL −1 , respectively). Our metatranscriptomics data revealed a very complex assemblage containing a large diversity of taxonomic groups and confirmed the growth of HFs and the decrease of photosynthetic groups.

Characterization of HFs using metatranscriptomics
After excluding rRNA reads, we built a de novo metatranscriptome assembly by merging the paired-end RNA-Seq data for the six time points sampled during incubation. We obtained 3,812,907 transcripts with an average length of 414 bp. We computed the TPM value for each transcript in every sample, and summed the values for all isoforms from the same gene, resulting in 3,338,309 transcript values. The resulting gene expression profile of the mixed community is very heterogeneous, as it represents hundreds of species from distant phylogenetic groups, each one expressing a different set of specific genes.

Labarre et al.
Phagocytosis expressed in uncultured protists S153 To assess gene expression and their putative function in individual species, we mapped the assembled transcripts to the predicted open reading frames of the SAGs of 10 MAST species. The two species that retrieved most transcripts were MAST-4A and MAST-4B (10,419 and 3789, respectively), and these had the highest expression level (Fig. 3a). Noticeably, the two closely related MAST-4C and -E retrieved very few transcripts and very little expression signal, as well as the remaining MAST genomes tested, with perhaps the exception of MAST-11 for which the expression signal was still considerable. Overall, we obtained a low recovery of transcripts mapping the SAGs, with about 0.22% of the entire metatranscriptome assigned to MAST-4A and 0.17% to MAST-4B. This low retrieval was generally consistent with the temporal changes in MAST relative abundances identified by the finest taxonomic classification of 18S rDNA reads (Supporting Information Fig. S1), which suggested little dominance of the 10 species represented by our SAGs. Thus, while MAST-7 increased in relative abundance (Figs. 1b, 2), only 144 transcripts mapped to the MAST-7A SAG (Fig. 3a), as the subclades growing in the incubation were MAST-7B, -D, and -E (Supporting Information Fig. S1). Similarly, few transcripts were fetched using the MAST-3A and -3F SAGs (the subclades growing during incubation were -3E, -I, and -J) and MAST-9A SAG (the subclade growing was -9D). The signal represented by MAST-4 was moderate and increased during the incubation both by FISH cell counts and by 18S rDNA mapping mostly to clades-A and -B. Finally, results were not consistent in two cases: MAST-11 was barely detectable by the 18S rRNA mapping but retrieved 823 transcripts with a moderate expression profile, while MAST-1D retrieved very few transcripts but abundant 18S rDNA reads.

Gene expression of two uncultured HFs
The transcripts associated to MAST-4A (10,419 transcripts) and to MAST-4B (3789) were annotated using the eggNOG database, which categorizes their functions into roles in metabolism, cellular and signaling processes, and information storage (Supporting Information Table S2). The large majority of transcripts were identified through the eggNOG database (only 1652 transcripts in MAST-4A and 707 in MAST-4B did not have a match) but many of them affiliated to uncharacterized proteins (3679 and 1544 transcripts). Most general functions were well represented by the expressed genes, being posttranslational modification and cytoskeleton the most represented categories. The next step was to target expressed genes potentially involved in the phagocytosis pathway, focusing on the three main steps: prey recognition and motility, phagosome maturation, and degradation in lysosome (Table 1). Many of these genes were associated with motility and/or cytoskeleton functions (475 transcripts in MAST-4A and 219 in MAST-4B). Genes coding for actin and tubulin, which are structural components of the flagella and the cytoskeleton, were abundant and highly expressed in both species. We also detected genes associated with microtubule formation, such as myosin, dynein, and kinesin, as well as several flagella-associated proteins and intraflagellar transporters that are essential for flagellar growth. In both species, we found highly expressed genes likely involved in the phagolysosome maturation step  some have more diverse cellular functions. These include phosphatidylinositol 3/4 kinase and several proton pumps that are potentially responsible for phagosome acidification (Table 1). For example, a rhodopsin was the third most highly expressed gene in MAST-4A, along with the vacuolar pyrophosphatase and GTPase Arf type in MAST-4B, which in turn are well known as regulators of vesicular traffic and actin remodeling. Although less expressed, the presence of the SNARE complex (Soluble N-ethylmaleimide-sensitive factor activating protein receptor) is very informative, as it is responsible for intracellular membrane fusion and trafficking steps interacting with vacuolar protein sorting. Finally, we also observed the expression of a set of genes encoding digestive enzymes (437 transcripts in MAST-4A and 126 in MAST-4B), such as the glycoside hydrolase family and peptidases, especially the lysosomal proteases cysteine cathepsins. These genes were among the most expressed in MAST-4B and were also important in MAST-4A (Table 1).
Finally, as we followed the gene expression on several time points along the incubation, we explored the possibility of changes in the expression profiles of the two species. The abundance of transcripts was relatively constant over time for MAST-4A, and increased markedly for MAST-4B (Fig. 4a). This was consistent with the 18S rDNA signal (Supporting Information Fig. S1), and suggested that the proportion of MAST-4A cells remained stable in the community, while that of MAST-4B cells increased. Then we focused on the subset of the 100 most highly expressed genes, with the expression signal normalized within each species to account for the changes of species abundance along the incubation (Fig. 4b). No clear changes seemed to occur during time, with the few temporal clusters appearing unrelated to any gene function in particular. Therefore, it seemed that the two species were transcriptionally at a similar stage throughout the incubation. Interestingly, in this subset of most highly expressed genes, we detected several that were related to the three main steps of phagocytosis described, and more specifically the Labarre et al.

Phagocytosis expressed in uncultured protists
presence of digestive enzymes as the cathepsins, confirming the relevance of this process for the flagellate ecophysiology.

Discussion
Using a mixed natural community sampled from surface water in Blanes Bay, we successfully promoted the growth of bacterivorous HFs in order to study their gene expression profiles. Based on both cell counts and 18S rDNA analyses, we observed a pronounced change in community composition during the incubation, most notably an increase in the abundance of HFs and a decrease in photosynthetic taxa. The switch from autotrophy to heterotrophy was expected as the experiment was conducted in the dark. This suppressed the growth of photosynthetic taxa, which in turn could be grazed by the heterotrophs. The development of the HFs was further supported by the growth of bacteria that proliferated early in the incubation as well as the absence of higher trophic grazers, following similar dynamics to those previously observed in other incubations with different initial communities (Massana et al. 2006;Weber et al. 2012). Larger taxa (i.e., dinoflagellates and ciliates) were nevertheless detected at the beginning in our incubation, and this signal most likely derived from broken cells, explaining the modest 18S rDNA signal detected and their decrease along time (Fig. 2). By integrating taxonomic classification of transcripts with microscopical cell counts, we detected a large diversity of efficient grazers in a predator dynamics scenario.
While 18S rDNA sequences can be classified phylogenetically and are useful for ecological studies on species distribution, many lineages such as the marine stramenopiles are understudied because they remain uncultured. In particular, they generally lack genomic information because most genomic research is biased toward a few cultured model species (Pawlowski et al. 2012;del Campo et al. 2014). This gap can now be filled by SCG, which allows retrieving the genomes of individually sorted cells without the need for culturing (Mangot et al. 2017). When combined with metatranscriptomics, which is a reliable approach for investigating metabolically active cells, SCG allowed the distinction between unicellular individual lineages present in the incubation that were so far barely recognized as most genomics focuses on a few model eukaryotes . While SCG is essential to assign gene functions to uncultured species, it is also known to have some limitations, such as lack of coverage in some genomic regions (Rinke et al. 2014), and the presence of contaminant sequences that can compromise the quality of the final assembly. Thus, the absence of a particular gene in the metatranscriptomics analysis could be because it was not amplified by the multiple displacement amplification, was not assembled, or was not annotated in the final SAG used as reference. At any rate, in this study, SCG has allowed unprecedented insights into MASTs gene expression profiles, which could have not been revealed with the metatranscriptomics on its own.
Phagocytosis is a specific form of endocytosis (uptake of extracellular material) that involves engulfing large particles (Niedergang and Grinstein 2018) and that originated billions of years ago (Yutin et al. 2009). It is a complex process found in diverse eukaryotes, and that involves a variety of functional genes that are often not unique to the phagocytosis pathway but are shared with other processes (e.g., actin filament, ABC transporters). This mode of feeding has been studied in depth in macrophages in the mammalian immune system, and also in some unicellular microbial eukaryotes (Dayel and King 2014). Gotthardt et al. (2002), performed a targeted study of the proteins involved in phagocytosis (and their corresponding genes) using direct proteomic analyses of extracted phagosomes. Recently, a complementary effort using comparative genomics has attempted to identify the set of genes that are unique and representative of different trophic modes, including phagotrophy (Burns et al. 2015(Burns et al. , 2018. These insights showed a complex process controlled by regulatory mechanisms involving numerous genes, revealing the potential for molecular detection of specific markers of phagocytosis. Linking gene expression and ecosystem function is feasible in marine bacteria, where marker genes for given biogeochemical functions in the oceans have been identified (Ferrera et al. 2015), allowing targeted studies of, for example, ammonia oxidation or phosphorous uptake (Imhoff 2016). Similar efforts toward the identification of genes indicative of a trophic strategy in microbial eukaryotes have recently been published (Alexander et al. 2015;Liu et al. 2016), including the search of specific traits driving heterotrophy (Beisser et al. 2017;Hu et al. 2018). Phagocytosis offers a unique case where genes can be related to microbial food webs. Therefore, to analyze genes that participate in phagocytosis in uncultured HFs, we used a metatranscriptomics approach on a community enrichment combined with a selection of specific transcripts provided by single cell genomes. Our method allowed to identify genes that control multiple aspects of the phagocytosis. We have detected many known characterized genes, and even though we sampled at different times during the enrichment, we did not see noticeable transcriptional changes. Therefore, in our targeted species, the phagocytosis machinery showed similar functional signal along the sampled time.
For two uncultured HF species, MAST-4A and MAST-4B, we detected genes involved in several of the main steps of the phagocytosis pathway as have been described mostly for mammalian macrophages. Phagocytosis is an actin-dependent process that is required to initiate prey capture. We identified genes that control the nucleation of new actin filaments, namely the assembly factors of the Arp2/3 complex (May and Machesky 2001;Lai et al. 2008). Actin polymerization is also promoted by the small GTPases of the Rho family, of which we detected Cdc42 here; this is generally followed by Rac1 and Rac2 activation, but these were not observed. These GTPases interact with proteins of the WASP and Scar/WAVE family that stimulate the Arp2/3 complex (Castellano et al. 2001), and both were actively expressed in the community assemblage. Involved in phagosome maturation, the Rabfamily GTPases (Vieira et al. 2003;Fairn and Grinstein 2012) participate in the formation of the phagolysosome, and here we failed to detect important genes involved in this step, such as Rab5 and Rab7. These gene absences could be due to genome incompleteness of the SAGs. In addition to GTPases, we found a putative phosphatidylinositol 3-kinase (PI3K), which is necessary for successful phagolysosome formation (Vieira et al. 2003). The food vacuole seems to be acidified by highly expressed proton and cation pumps that are organized by vacuolar ATPases (Kissing et al. 2015). In addition, we identified high expression of a rhodopsin gene in MAST-4A, and we hypothesize that the corresponding protein could act as a light-driven proton pump contributing to phagosome acidification (Slamovits et al. 2011;Kandori 2015). Microbial rhodopsins were initially found in Archaebacteria and are now known to be widely dispersed light-driven ion transporters across all domains of life (Beja et al. 2000;Finkel et al. 2013). Finally, the strongly acidified phagolysosome becomes rich in hydrolytic enzymes that promote the degradation of the ingested microbial prey, and here we found several highly expressed digestive enzymes such as cathepsins and glycoside hydrolases. Our results indicate a set of genes related to phagocytosis that were highly expressed in environmentally relevant bacterivorous uncultured HFs. The genes defined here are often not exclusive to phagocytosis, but represent a continuum of proteins involved in different types of fusion, vesicle transport, and digestive processes. In addition to markers of phagosome acidification (V-ATPase and rhodopsins), the digestive enzymes cathepsins would be ideal candidates for detecting phagocytosis in assemblages of marine HFs. Future work will need to assess their suitability to target this trophic mode in natural communities.

Labarre et al. Phagocytosis expressed in uncultured protists
Historically, the study of microorganisms, including ecogenomics and gene expression profiling, has focused on single species in pure culture. Metatranscriptomics allows us to circumvent the culture-dependent analysis of many microbial species and to perform functional studies in complex communities. In our analysis, SCG has facilitated the targeting of the transcriptional profile of uncultured HFs. This approach is a new opportunity to examine the heterogeneity of microbial communities, recover their true diversity, and better understand specific biological processes performed by particular species.