A wide variety of non-coding RNAs have been identified in many species known to science. However, RNAs have also been identified in "metagenomic" sequences, which are derived from DNA or RNA extracted from environmental samples that contain unknown species. Initial work in this area detected homologs of known bacterial RNAs in such metagenome samples.[1] [2] Many of these RNA sequences were distinct from sequences found in cultivated bacteria, and could therefore provide additional information about the RNA classes to which they belong.
These distinct environmental sequences were used to detect previously unknown RNAs in the marine bacterium Pelagibacter ubique. P. ubique is extremely common in marine samples, so many DNA sequences extracted from the ocean are inevitably derived from species related to P. ubique. These sequences were used to support the analysis of possible secondary structures of RNAs predicted in this species.[3]
Subsequent studies identified novel RNAs using only sequences extracted from environmental samples. The first study determined the sequences of RNAs extracted directly from microbial biomass in the Pacific Ocean.[4] The researchers found that a large fraction of the extracted RNA molecules did not appear to code for protein, but instead appeared to form conserved RNA secondary structures. A number of these were shown to belong to known small RNA families, including riboswitches. A larger fraction of these microbial small RNAs appeared to represent novel non-coding RNAs not yet described in any database. A second study used DNA sequences extracted from various environments and inferred the presence of conserved RNA secondary structures among some of these sequences.[5] Both studies identified RNAs that were not present in the then-available genome sequences of any known organism, and found that some of these RNAs were remarkably abundant.[4] [5] In fact, two of the RNA classes (the IMES-1 RNA motif and IMES-2 RNA motif) exceeded ribosomes in copy number, which is extremely unusual for bacterial RNAs. Using different techniques, IMES-1 RNAs were also found to be highly abundant in coastal waters of the Atlantic Ocean.
RNAs that have been identified only in environmental sequence samples include the IMES-1, IMES-3, IMES-4, Whalefall-1, potC, Termite-flg and Gut-1 RNA motifs. These RNA structures have not been detected in the genome of any known species. The IMES-2, GOLLD and manA RNA motifs were discovered using environmental DNA or RNA sequence samples, but are also present in a small number of known species. Additional non-coding RNAs have been predicted in marine environments,[4] although no specific conserved secondary structures have been published for these other candidates. Other conserved RNA structures, such as the glnA RNA motif, were originally detected using environmental sequence data but were subsequently found in numerous cultivated species of bacteria.
The discovery of RNAs that are not detected among currently known species mirrors findings of protein classes that are currently unique to environmental samples.[6]