Analysis of Microbial Community Structure and Interactions in a Natural Metal-Contaminated Ecosystem
Principal Investigator: Jill Banfield
Co-Principal investigator: Gary Andersen
Collaborators: Robert Hettich, Michael Thelen, and Forest Rohwer
Program Manager: Dr. Dan Drell
Brief summary of project objectives:
The natural environment is populated by multi-species microbial consortia whose function is likely to vary with species membership and environmental conditions. Our research investigates this concept through analysis of simple, well-defined microbial communities that are largely responsible for the formation of acid mine drainage, a problem related to energy resources and thus of direct relevance to the Department of Energy. Our work leverages genomic data, obtained in the prior grant period, from an extremely acidophilic (pH ~0.8) natural biofilm at the Richmond Mine, Iron Mountain, CA.
Approach:
We are developing microarrays to determine how the expression of both strain-specific and community-essential genes depends on the mixture of organism types and on geochemical conditions. This requires arrays that can detect gene expression in members of natural species populations. Consequently, our first focus has been to characterize strain diversity in the environment and to develop arrays that are sensitive to this variation.
In a second approach, we have tested mass spectrometry-based methods for identifying the abundant proteins in the natural biofilms. Our research includes development of methods to resolve species-species and species-environment interactions and to identify the most abundant proteins, which are likely to be the most important in environmental adaptation.
Research progress:
1. Reconstruction of genomes from the biofilm community: In a collaborative effort with the JGI, we acquired and assembled an initial 76 Mb of shotgun sequence data, which allowed us to largely reconstruct the genomes of the 5 dominant biofilm members and analyze their metabolic potential (Tyson et al. Nature, 2004). An important finding of this work was extensive homologous recombination among members of one archaeal population. In the current grant period, the binning (assignment of contigs to organisms) of the genomic data has been refined using tetranucleotide frequency analysis and additional manual genome assembly. Homologous recombination has also been analyzed in the F. acidarmanus population.
The (environmental) genome assemblies of the two dominant organism types, Leptospirillum group II and Ferroplasma type II, have been greatly improved, and both genomes are now close to complete. The Leptospirillum group II genome is in about 70 pieces, a small subset of which appear to derive from a plasmid (probably integrated). This genome assembly is included in the supplementary materials of Ram et al. (2005). The Leptospirillum group II population is close to clonal (few single nucleotide polymorphisms); however, linking of contigs still required consideration of strain heterogeneity. The initial assembly was confounded primarily by the presence of genes (usually single genes, often transposases) that occur in only a subset of population members. Most of the more significant variability in gene content appears to be associated with the putative plasmid fragments. The Ferroplasma type II genome assembly is in about 15 pieces, most of which have been tentatively ordered. This population exhibits considerable strain heterogeneity, both in gene sequence and in gene content.
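The idea behind tetranucleotide frequency binning can be illustrated with a minimal Python sketch. It assumes contigs are plain DNA strings and that each bin is summarized by a reference profile (for example, one computed from contigs already assigned with confidence). The function names and the nearest-profile Euclidean rule are illustrative choices only; they are not the binning pipeline used in this project, which also relied on manual curation.

```python
import math
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def tetra_profile(seq):
    """Tetranucleotide frequencies counted on both strands, normalized to sum to 1."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - 3):
            word = strand[i:i + 4]
            if word in counts:  # skips words containing N or other ambiguity codes
                counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]


def distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_bins(contigs, bin_profiles):
    """Assign each contig to the bin whose reference profile is nearest."""
    assignments = {}
    for name, seq in contigs.items():
        profile = tetra_profile(seq)
        assignments[name] = min(bin_profiles,
                                key=lambda b: distance(profile, bin_profiles[b]))
    return assignments
```

Counting both strands makes a contig's profile independent of the orientation in which it was assembled, which matters because contigs from the same genome can be reported on either strand.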
2. Genomic heterogeneity in the Ferroplasma acidarmanus population: Reconstruction of the fer1 isolate genome was largely completed before the current grant period. In the current grant period, the focus of work was a comprehensive comparison of the isolate genome with its associated strain population in order to quantify genome heterogeneity. This involved analysis of variation in gene content, gene order, and gene sequence across > 90% of the genome, reconstructed using environmental genomic data (Allen et al. PNAS, in review).
3. Strain population gene content: To obtain an inventory of genes for microarray and proteomics studies, we have developed methods for reconstructing gene variants from natural populations. A collaborative effort has resulted in a new program (‘GenomeStrainer’), which we can now use to identify and reconstruct strain variant sequences for genes and write them to a database (Eppley et al. in prep.). Recently, we have been developing approaches for longer-range reconstruction of strain variant types, which requires explicit consideration of the population's recombination structure.
We have developed a number of scripts to analyze sequences and calculate genome-wide dN/dS values (the ratio of nonsynonymous to synonymous substitution rates) for use in studies of selection (results reported in Allen et al. in review). For the F. acidarmanus population, results indicate that essentially all genes sampled are under strong purifying selection.
We have used two approaches to determine the extent to which sequences reconstructed from the genome sequence data are representative of those in most environmental biofilms at the site. The first involves screening of small-insert (shotgun) libraries at both UC Berkeley and JGI (in the latter case, as a preliminary to further sequencing). In the second, PCR-based methods were used to verify the discovery that only a small number of strains dominate the sites sampled recently (Whitaker et al. in prep; note that the PCR component is partly funded via a postdoctoral scholarship to Rachel Whitaker from NSF). Verification of limited strain heterogeneity across sampling sites and biofilm growth stages is vital to the design of microarrays that can be used to monitor natural populations in situ.
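The first step of strain variant reconstruction, finding polymorphic positions and grouping reads by the alleles they carry, can be sketched as follows. This is a generic illustration and not the GenomeStrainer algorithm; it assumes reads are already aligned to a common reference window as equal-length strings with '-' where a read gives no coverage, and the `min_minor` support threshold is a hypothetical parameter.

```python
from collections import Counter, defaultdict


def polymorphic_columns(reads, min_minor=2):
    """Alignment columns where the second most common base occurs in >= min_minor reads."""
    columns = []
    for i in range(len(reads[0])):
        counts = Counter(r[i] for r in reads if r[i] != "-")
        common = counts.most_common(2)
        if len(common) == 2 and common[1][1] >= min_minor:
            columns.append(i)
    return columns


def variant_haplotypes(reads, min_minor=2):
    """Group reads by their alleles at polymorphic columns; return columns and counts."""
    cols = polymorphic_columns(reads, min_minor)
    groups = defaultdict(int)
    for r in reads:
        alleles = "".join(r[i] for i in cols)
        if "-" not in alleles:  # keep only reads that span every polymorphic site
            groups[alleles] += 1
    return cols, dict(groups)
```

Because alleles are linked only within single reads, this approach recovers short-range variants; as noted above, longer-range reconstruction has to take recombination into account, since linkage between distant sites is not preserved across a recombining population.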
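For readers unfamiliar with dN/dS, the calculation can be sketched in the style of the Nei-Gojobori method with a Jukes-Cantor correction. This is a simplified illustration, not the scripts used for Allen et al.: it assumes two aligned, gap-free coding sequences, uses the standard genetic code, counts mutations to stop codons as nonsynonymous sites, and averages differences over mutational pathways that avoid intermediate stop codons.

```python
import itertools
import math

# Standard genetic code; codons ordered TTT, TTC, TTA, ... GGG (bases in TCAG order).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Synonymous sites in one codon (0-3); changes to stop count as nonsynonymous."""
    aa = CODE[codon]
    syn = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos] and CODE[codon[:pos] + base + codon[pos + 1:]] == aa:
                syn += 1
    return syn / 3.0


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over stop-free pathways."""
    diffs = [p for p in range(3) if c1[p] != c2[p]]
    if not diffs:
        return 0.0, 0.0
    paths = []
    for order in itertools.permutations(diffs):
        current, sd, nd = c1, 0, 0
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathway passes through a stop codon; discard it
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        else:
            paths.append((sd, nd))
    if not paths:
        return 0.0, float(len(diffs))  # fallback: treat all changes as nonsynonymous
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple substitutions at a site."""
    if p <= 0:
        return 0.0
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS, dN/dS) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue  # skip stop codons and codons with ambiguous bases
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    dn = jukes_cantor(Nd / N) if N else float("nan")
    ds = jukes_cantor(Sd / S) if S else float("nan")
    return dn, ds, (dn / ds if ds else float("nan"))
```

A dN/dS ratio well below 1 indicates that amino acid changes are removed by selection, which is the signature of purifying selection reported above for the F. acidarmanus population.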
4. Microarray design and construction: In consultation with Affymetrix Inc., a high-density microarray for use with Ferroplasma acidarmanus populations is in the final stages of development. This array will serve two purposes: i) determining the environmental populations of the different Ferroplasma strains and ii) analyzing Ferroplasma gene expression. Development of the gene sequence database is well advanced (see above). Dr. Ping Hu, in the laboratory of Gary Andersen at LBNL, has optimized the method for selecting probes for this array. For gene expression, 11 unique probe pairs are used to measure mRNA expression for every non-duplicated gene in the Ferroplasma genome. Dr. Hu has written microarray analysis scripts that allow for gene sequence polymorphisms that may occur when analyzing previously unsequenced strains of Ferroplasma. Additional probes are used to identify nucleotide polymorphisms that are known to correspond to distinctive environmental isolates. Because it uses total genomic DNA and so avoids the complications that may be caused by PCR amplification, the array approach will allow an unprecedented view into the species complexity of different environmental samples.
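The probe-pair layout can be illustrated with a short sketch. On Affymetrix expression arrays each probe pair consists of a 25-mer perfect-match (PM) probe and a mismatch (MM) probe in which the central (13th) base is complemented. The selection rule below (keep 25-mers that occur in only one gene, then take up to 11 evenly spaced ones) is a hypothetical simplification; it is not Dr. Hu's optimized method, and for simplicity it works on sense-strand sequence rather than designing antisense probes.

```python
from collections import Counter

PROBE_LEN = 25  # Affymetrix probe length
N_PAIRS = 11    # probe pairs per gene, as stated for this array

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mismatch_probe(pm):
    """MM partner of a perfect-match probe: complement the central base."""
    mid = len(pm) // 2
    return pm[:mid] + pm[mid].translate(COMPLEMENT) + pm[mid + 1:]


def kmers(seq, k=PROBE_LEN):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def select_probe_pairs(genes, n_pairs=N_PAIRS, k=PROBE_LEN):
    """Choose up to n_pairs evenly spaced PM/MM pairs per gene from k-mers unique to it."""
    occurrences = Counter()
    for seq in genes.values():
        occurrences.update(set(kmers(seq, k)))  # count each k-mer once per gene
    design = {}
    for name, seq in genes.items():
        unique = list(dict.fromkeys(p for p in kmers(seq, k) if occurrences[p] == 1))
        step = max(1, len(unique) // n_pairs)
        design[name] = [(pm, mismatch_probe(pm)) for pm in unique[::step][:n_pairs]]
    return design
```

Excluding k-mers shared between genes is what restricts the expression measurements to non-duplicated genes: a probe that matches two genes cannot tell their transcripts apart.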
5. Proteomic characterization of a natural microbial community: To explore a second approach for evaluating microbial activity, we analyzed the proteome of a biofilm sample similar to the one for which genome sequence data are available (same biofilm type, different sampling location, two years later). The biofilm was collected, frozen on dry ice on site, and transported back to UC Berkeley. Proteins extracted from the multi-organism biofilm were separated into three major fractions: cytoplasmic, membrane, and extracellular. Abundant proteins from the extracellular fraction were purified, heme-stained, and N-terminally sequenced to identify abundant putative cytochromes. Whole-proteome fractions were sent to ORNL for mass spectrometric analysis. At ORNL, proteins were digested with trypsin, the resulting peptides were separated by liquid chromatography, and fractions were analyzed by tandem mass spectrometry (LC-MS/MS) on two different instruments. Peptides were assigned to proteins (in most cases uniquely) by comparison with peptides predicted from the genomic data. A false positive protein identification rate well below 4.5% was established in a test that involved predicted proteins from over 200 microbial genomes.
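The genome-to-peptide side of this workflow can be sketched as an in silico tryptic digest followed by unique peptide-to-protein mapping. This is an illustration under simplifying assumptions: real LC-MS/MS searches match spectra rather than peptide strings and allow missed cleavages and modifications, and the length window and function names here are hypothetical. Trypsin's cleavage rule (after K or R, but not before P) is standard.

```python
import re
from collections import defaultdict

# Trypsin cleaves C-terminal to K or R, except when the next residue is P.
CLEAVAGE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(protein, min_len=6, max_len=40):
    """Fully cleaved tryptic peptides (no missed cleavages) within a length window."""
    return [p for p in CLEAVAGE.split(protein) if min_len <= len(p) <= max_len]


def peptide_index(proteome, **digest_options):
    """Map each predicted peptide to the set of proteins that could produce it."""
    index = defaultdict(set)
    for name, seq in proteome.items():
        for pep in tryptic_peptides(seq, **digest_options):
            index[pep].add(name)
    return index


def assign_unique(observed, index):
    """Credit each observed peptide to a protein only if it maps to exactly one."""
    hits = defaultdict(set)
    for pep in observed:
        owners = index.get(pep, set())
        if len(owners) == 1:
            (owner,) = owners
            hits[owner].add(pep)
    return dict(hits)
```

Peptides shared by several predicted proteins (for example, conserved stretches in paralogs or close homologs from different organisms) are set aside, which is why assignment is unique only in most, not all, cases.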
We confidently detected (i.e., 2 or more different peptides were matched to a protein) 2,033 proteins, representing 17% of the predicted proteins encoded by the 5 characterized genomes and 48% of the predicted proteins encoded by the genome of the dominant biofilm member (Ram et al. Science, 2005). Although the latter represents detection comparable to that obtained previously for isolates, we found that MS spectral assignment was not possible where the protein variant in this sample differed from the one in the genomically characterized sample. This problem can be tackled via PCR on a case-by-case basis, but strain divergence remains a significant challenge for future proteogenomic work (addressed, in our case, initially by moving from composite genome-based to strain variant genome-based analysis).
The level of protein detection achieved allowed some evaluation of functional partitioning (nitrogen fixation, biofilm polymer formation) and extensive analysis of resource allocation by Leptospirillum group II (major investments in defense against radicals and in maintenance of correctly folded proteins). We validated 572 predicted novel proteins, yet found that novel (unique) hypothetical proteins are under-detected relative to proteins from other functional groups. Hypothetical proteins expressed at low levels or not at all tend to be encoded in blocks of novel genes of putative phage or plasmid origin, suggesting that many of these laterally acquired genes are nonfunctional or rarely used. The most abundant hypothetical protein in the extracellular fraction was shown to be a novel cytochrome central to iron oxidation and acid mine drainage formation. For a full description of these and other results, see Ram et al. Science, 2005 and the associated extensive online supplementary materials.
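The detection criterion and the strain-variant problem can both be illustrated in a few lines. The sketch assumes peptide hits are already grouped by protein; the function names are hypothetical, and the digest repeats the standard trypsin rule (cleave after K or R, not before P) without missed cleavages.

```python
import re


def tryptic_peptides(protein):
    """Set of fully cleaved tryptic peptides (cleave after K/R, not before P)."""
    return {p for p in re.split(r"(?<=[KR])(?!P)", protein) if p}


def confident_proteins(peptide_hits, min_distinct=2):
    """Proteins supported by at least min_distinct different peptides."""
    return {prot for prot, peps in peptide_hits.items() if len(set(peps)) >= min_distinct}


def detection_fraction(detected, predicted):
    """Fraction of predicted proteins that were confidently detected."""
    predicted = set(predicted)
    return len(predicted & set(detected)) / len(predicted)


def unmatchable_peptides(reference_protein, sample_variant):
    """Peptides predicted from the reference genome that the sample's variant cannot yield."""
    return tryptic_peptides(reference_protein) - tryptic_peptides(sample_variant)
```

A single amino acid substitution alters the mass of the peptide that contains it, so that peptide's spectrum no longer matches any prediction from the reference genome; this is why searching against strain variant sequences rather than a composite genome recovers such assignments.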
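The two observations about hypothetical proteins, lower detection rates per functional category and clustering of undetected genes in blocks, correspond to simple tabulations that can be sketched as below. The inputs (a protein-to-category mapping, a set of detected proteins, and genes listed in genome order) and the `min_run` threshold are hypothetical; the sketch shows the kind of summary involved, not the analysis reported in Ram et al.

```python
from collections import defaultdict


def detection_by_category(categories, detected):
    """Fraction of predicted proteins detected, per functional category.

    categories: protein id -> functional category label
    detected: iterable of protein ids confidently detected
    """
    detected = set(detected)
    totals, hits = defaultdict(int), defaultdict(int)
    for protein, cat in categories.items():
        totals[cat] += 1
        if protein in detected:
            hits[cat] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}


def undetected_blocks(gene_order, detected, min_run=3):
    """Runs of >= min_run consecutive genes (in genome order) with no detected protein."""
    detected = set(detected)
    blocks, run = [], []
    for gene in gene_order:
        if gene in detected:
            if len(run) >= min_run:
                blocks.append(run)
            run = []
        else:
            run.append(gene)
    if len(run) >= min_run:
        blocks.append(run)
    return blocks
```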
6. Novel lineages: We reconstructed the sequence of a genome fragment, a few kb long, that encoded a previously undetected 16S rRNA gene sequence. Based on phylogenetic analysis of this and other genes on the fragment, the sequence derives from a member of a novel archaeal lineage (WTF-1). The rRNA gene sequence was not recovered in previous extensive PCR-based surveys because it contains several key mismatches to commonly used primer sets. Ribosomal FISH probes designed for this organism revealed a very small cell size and indicated that this group is present in most biofilm communities within the Richmond Mine. Subsequently, redesigned PCR primers established the existence of a radiation that includes three distinct WTF lineages branching near the base of the Euryarchaeota. Enrichments of WTF cells were obtained by filtration. TEM imaging of the resulting cell fraction confirmed their extremely small size, but some uncertainty remains about the correspondence between cell morphology and cell type. Although Science has indicated willingness to accept the paper reporting these results (Baker et al.), we have requested suspension of the manuscript until the cell morphology issue is fully resolved. Regardless of cell morphology, the enrichment includes a large fraction of WTF cells. A JGI CSP proposal requesting up to 100 Mb of sequence to characterize this concentrate via shotgun sequence analysis has been awarded (B. Baker, proposer, 2005).
A gene from WTF-1 identified as a possible arsenate reductase was cloned into an arsenate reductase-deficient E. coli strain, and its ability to confer arsenic resistance has been confirmed (Flanagan et al. in prep.).
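Why a divergent rRNA gene can be missed by PCR surveys can be illustrated by scoring primer-template mismatches, with degenerate positions expanded using the IUPAC nucleotide codes. The primer and target below are toy sequences, and the failure rule (any mismatch in the last five 3' bases, or more than two mismatches overall) is a hypothetical heuristic; it reflects the general observation that mismatches near the 3' end are the most disruptive to primer extension. The target site is assumed to be given in the same 5'-to-3' orientation as the primer.

```python
# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, site):
    """0-based positions (from the primer 5' end) where the site base is not allowed."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    return [i for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
            if s not in IUPAC[p]]


def likely_to_fail(primer, site, three_prime_window=5, max_total=2):
    """Hypothetical rule: any 3'-terminal mismatch, or too many mismatches overall."""
    mm = primer_mismatches(primer, site)
    near_3prime = [i for i in mm if i >= len(primer) - three_prime_window]
    return bool(near_3prime) or len(mm) > max_total
```

Screening candidate primers against reconstructed genome fragments in this way is one route to redesigned primers that amplify lineages missed by standard sets.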
7. Phage analysis: As noted above and in prior annual reports, genome sequence analysis indicates the importance of phage or phage-like entities as sources of blocks of novel genes. Recent analysis has established that variation in phage-like gene content is a major source of variability within strain populations. In a supplement-funded effort with Forest Rohwer's lab, we have undertaken to characterize the phage population at the site. Multiple samples have been sent to the Rohwer lab for viral counts and initial characterization. Researchers have encountered difficulties in handling DNA from these samples, a major obstacle to genome sequencing, and problems arose in all phases of library construction for prokaryote sequencing. We are confident that these problems can be overcome with more focused effort. Because of these challenges, little of the supplement funding has been spent. A renewed effort is planned this summer, taking advantage of sampling trips in June and July to obtain fresh material.
8. Cultivation and isolation of representatives of new lineages: Our group has isolated the first representative of Leptospirillum group III and characterized its basic physiology. The isolation was directed via genome sequence data from Tyson et al. (2004), which revealed that only members of this group have the ability to fix N2 (the first environmental genome sequence-directed isolation, to our knowledge). The paper describing this work was submitted to Applied and Environmental Microbiology in late 2004 and is now in press (Tyson et al. in press).
9. Biofilm characterization: To understand microbial communities in their natural environments, we have investigated biofilm structure and growth in situ. This has included determination of the amounts of C and N fixed per unit area over a known time since biofilm initiation (13 days) and estimation of the average doubling rate for this community. Experiments planned for a pair of field trips in June and July 2005 will reproduce and extend these measurements. Results indicate that the rate of carbon fixation is comparable to that in the more productive areas of the ocean, on both a per unit area and a per cell basis. The average cell doubling time estimated from field-based observations is close to that obtained in the laboratory under optimal conditions (15 hr). Characterization has involved fluorescence in situ hybridization, 16S rRNA gene library analysis, construction of small-insert libraries for genomic screening, and scanning electron microscopy. Results of this work will be reported by Belnap (Ph.D. student, work in progress).
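The relationship between a growth interval and an average doubling time can be made explicit with a short sketch. It assumes exponential growth between two measurements of cell number (or a biomass proxy); the function names and inputs are illustrative, and the report does not specify which quantities were used in the field estimate.

```python
import math


def doubling_time(n_start, n_end, hours):
    """Mean doubling time (h), assuming exponential growth between two counts."""
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("need 0 < n_start < n_end")
    return hours / math.log2(n_end / n_start)


def generations(hours, td):
    """Number of doublings in `hours` at a constant doubling time td."""
    return hours / td
```

For scale, sustained 15 hr doublings over 13 days (312 hr) would amount to 312 / 15 = 20.8 doublings; actual biofilm growth is unlikely to remain exponential for that long, so such a figure is only an upper-bound illustration.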
In addition to determination of community membership and organization, we have established a collaboration with Dr. George Cody at the Carnegie Institute to characterize biofilm polymers. Genome sequence data suggested cellulose synthesis, so several biofilms have been characterized by NMR. Initial results confirmed the presence of cellulose, although other polysaccharides are also present.
10. Other relevant developments: The other source of support for this project is the NSF Biocomplexity program (PIs Banfield, Power, Getz, Richardson). The focus of this work has been on laboratory experiment-based tests of ecological theories related to colonization and response to environmental perturbation. The research includes ecological prediction (Power) and mathematical modeling of ecosystems (Getz). Relevant to ongoing work under this grant, Chris Belnap, working under the NSF project, has refined bioreactor development and has now achieved a design capable of growing multi-species biofilms that closely resemble those sampled in the field. Sufficient biomass can be cultivated for both proteomic and microarray experiments.
Researchers supported or partially supported:
Gene Tyson, Rachna Ram, Chris Belnap (fraction), Brett Baker (fraction), Eric Allen (NSF postdoctoral fellowship, partial support), Rachel Whitaker (NSF-funded postdoctoral fellowship, partial research costs only), Judith Flanagan (now assistant research professor at UC San Francisco), Philip Hugenholtz (now at JGI), Michael Thelen (LLNL sabbatical visitor in 2004, research costs only).
Via LBNL: Ping Hu, via ORNL supplement: Nathan VerBerkmoes
Publications and manuscripts in advanced preparation from this grant:
Includes 2004 papers stemming primarily from the prior grant period:
Tyson, G.W., Lo, I., Baker, B.J., Allen, E.E., Hugenholtz, P. and Banfield, J.F. (2005) Genome-Directed Isolation of the Key Nitrogen Fixer, Leptospirillum ferrodiazotrophum sp. nov., from an Acidophilic Microbial Community. Applied and Environmental Microbiology, in press.
Tyson, G.W. and Banfield, J.F. (2005) Cultivating the uncultivated: a community genomics perspective. Trends in Microbiology, in press.
Allen, E.E. and Banfield, J.F. (2005) Community genomics in microbial ecology and evolution. Nature Reviews Microbiology, in press (online June 1, 2005)
Ram, R.J., VerBerkmoes, N.C., Thelen, M.P., Tyson, G.W., Baker, B.J., Blake, R.C. II, Shah, M., Hettich, R.L. and Banfield, J.F. (2005) Community proteomics of a natural microbial biofilm. Science, May 5 Science Express.
Baker, B.J., Lutz, M.A., Dawson, S.C., Bond, P.L., and Banfield, J.F. (2004) Metabolically active eukaryotes in extremely acidic mine drainage. Appl. Environ. Microbiol. 70, 6264-6271.
Macalady, J.L., Vestling, M.M., Baumler, D., Boekelheide, N., Kaspar, C.W., and Banfield, J.F. (2004) Tetraether-linked membrane monolayers in Ferroplasma spp.: a key to survival in acid. Extremophiles, 8, 411-419.
Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E., Ram, R.J., Richardson, P., Solovyev, V., Rubin, E., Rokhsar, D., and Banfield, J.F. (2004) Insights into microbial community structure and metabolism by reconstruction of genomes from a natural environment. Nature, 428, 37-43.
Allen, E.E., Tyson, G.W., Whitaker, R.J., Detter, J.C., Richardson, P.M., and Banfield, J.F. Genome dynamics in a natural microbial strain population. PNAS, in review.
Baker, B.J., Tyson, G.W., Webb, R.I., Hugenholtz, P., and Banfield, J.F. An acidophilic ultra-small archaeon revealed by community genome sequencing. Science, accepted but placed on hold for confirmation of the TEM result.
Flanagan, J., Baker, B., and Banfield, J.F. Characterization of Arsenate Reductase from an acidophilic ultra-small archaeon. Extremophiles, in prep.
Eppley, J.M., Tyson, G.W., Banfield, J.F., and Getz, W. Reconstruction of strain variant sequences from community genomic data to enable gene expression and population genomic analyses. Genome Biology, in prep.