Members of the kingdom Fungi, long thought to be primitive plants, are now known to be more closely related to animals (metazoans), forming part of the larger group Opisthokonta (organisms with a single posterior flagellum). Fungi are a hugely diverse group of eukaryotes that includes chytrids, molds, mushrooms, lichens, rusts, smuts, and yeasts. They form close symbioses with plants (as mycorrhiza and endophytes) and algae (as lichens), break down woody and leafy matter on forest floors, play roles in weathering rocks, and parasitize animals, plants, and other fungi. Fungi straddle both the macroscopic and microscopic observational scales. Most people are familiar with mushrooms or cup fungi that grow on rotting logs and soil in spring or fall. However, the majority of fungal species remain hidden from view as microscopic structures. Although some species can be identified from macroscale structures, mycologists must rely on the shapes, colors, and development of fungal cells observed under the microscope to study microfungi. Most fungi exist only as microscopic forms, growing as single cells (yeasts) or as a series of cylindrical cells (hyphae), periodically bearing structures for propagation (spores). Traditionally, great importance was placed on comparing fungal spore–producing structures, arising after either sexual or asexual division, for identification. Often, though, only a part of the fungal life cycle is observed. As an added complication, very similar structures sometimes occur in unrelated species. For such microorganisms, therefore, it is not surprising that scientists rely increasingly on DNA sequence comparisons to link different disconnected parts of the life cycle of a fungus and to distinguish between different species.
Mycologists have used DNA sequences for species identification in several ways since the early 1990s, but DNA barcoding, as it is known today, was first proposed by insect scientists at the University of Guelph, Canada, in 2003. The aim was to make species identification more efficient by borrowing the idea behind the Universal Product Codes used to identify retail products. DNA barcoding involves determining a string of 500–700 DNA letters (nucleotides) from a single gene or region of DNA, standardized across large groups of life. These sequences are then compared with databases of reference sequences that have been firmly connected to accurately identified and preserved cultures or specimens, with associated data on geographical distributions and biology. The Barcode of Life Data Systems (BOLD) now includes close to a million DNA barcodes, representing more than 70,000 species; however, few of these are fungal.
How does DNA barcoding work?
The first step in creating a DNA barcode is to select a defined DNA sequence, such as a gene or region, which will be unique for each species across a large group of organisms. For animals, the gene cytochrome c oxidase I (COI or cox1) was chosen as a marker. This gene is found in the energy production center of the cell, the mitochondrion, on a set of DNA molecules that is separate from the chromosomes of the nucleus. Because each cell contains many mitochondria, and therefore many copies of the gene, it is easy to use a technique called the polymerase chain reaction (PCR) to select and amplify COI for analysis. In addition, only a small amount of tissue or cells is needed. In animals, COI varies enough to distinguish closely related species, but the sequences are also similar enough to allow statistical comparisons of sequences from very different species. One pitfall of using a single gene is that its rate of evolutionary change may differ from one group of organisms to another. For example, COI may be evolving too slowly in most plant species to separate them. In fungi, the problem is different: vast differences in the length of COI make it challenging to amplify using PCR. Hence, the gene area of choice in most fungi is a short structural gene (the 5.8S gene) that forms part of the nuclear ribosome, with two “spacer” regions [internal transcribed spacer 1 and 2 (ITS1 and ITS2)] on either side that do not encode functional products (see illustration). While the rate of change of the 5.8S gene is very slow, the two surrounding ITS1 and ITS2 spacers evolve much more quickly. The whole combined region is often referred to as the ITS. The ribosome is the cellular structure that builds proteins from messenger RNA templates and therefore is integral to biological life. Ribosomal structural genes, located in the nucleus, exist in multiple copies and, similar to mitochondrial genes, are extremely easy to amplify by PCR. Other DNA areas also are used for species identification in fungi. For example, researchers working on yeasts (single-celled fungi that occur in several branches of the fungal tree of life) favor a variable stretch of another ribosomal structural gene, the 28S (LSU, or large subunit).
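The comparison step at the heart of barcoding, matching an unknown sequence against reference barcodes, can be illustrated with a minimal Python sketch. It rests on several simplifying assumptions: the 20-letter sequences and species names are invented toy data rather than real ITS or COI barcodes, the sequences are taken to be already aligned and of equal length, and the 97% cutoff is an illustrative default only. The function names `percent_identity` and `identify` are hypothetical. In practice, researchers would use a dedicated alignment or search tool rather than a position-by-position comparison.

```python
from typing import Dict, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def identify(
    query: str,
    references: Dict[str, str],
    threshold: float = 97.0,  # illustrative cutoff, not a recommended standard
) -> Tuple[Optional[str], float]:
    """Return (best-matching reference name, score), or (None, score) below the cutoff."""
    best_name: Optional[str] = None
    best_score = -1.0
    for name, seq in references.items():
        score = percent_identity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # no reference is similar enough to name the sample
    return best_name, best_score


# Toy reference library: invented 20-letter sequences, not real barcodes.
REFERENCES = {
    "species_A": "ACGTACGTACGTACGTACGT",
    "species_B": "ACGTTCGTACGAACGTACCT",
    "species_C": "TTGTACGAACGTACGGACGT",
}

if __name__ == "__main__":
    unknown = "ACGTACGTACGTACGTACGA"  # differs from species_A at one position
    print(identify(unknown, REFERENCES, threshold=90.0))
    print(identify(unknown, REFERENCES))  # default cutoff rejects the match
```

The cutoff expresses the trade-off described above: a marker must vary enough between species for the best match to stand out, yet vary little enough within a species for genuine members to clear the threshold.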
Organizations and databases
The ITS region was selected as the barcoding region to be recommended for fungi by a diverse group of mycologists at an international workshop organized by the Consortium for the Barcode of Life (CBOL) in 2007. CBOL is funded by the Alfred P. Sloan Foundation and has its offices at the Smithsonian Institution. The Consortium's mandate is to promote the use of barcoding technologies in all major biological disciplines, oversee the international standardization of barcoding markers for all groups of organisms, and provide technical protocols for scientists who wish to incorporate barcoding in their research. The Consortium includes 170 members from natural history collections and museums, government and nongovernmental organizations, biotechnology companies, and academia. The International Barcode of Life initiative, iBOL, is an international research network centered at the University of Guelph and is scheduled to begin its five-year mandate in October 2010. iBOL involves scientists from 25 countries, studying all groups of living organisms, except bacteria.
After the selection of standardized marker genes, barcoding requires the development of reliable databases based on sets of specimens of known identity and origins, which then can be used for identification purposes by other scientists. GenBank, located at the National Center for Biotechnology Information (NCBI) in the United States, and its international database partners, the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ), house the majority of DNA sequences produced by researchers investigating fungal diversity. GenBank presently lists more than 150,000 sequences from the ITS in fungi, representing approximately 14,000 species. Although a careful verification process is performed for data quality, GenBank policies allow submitters to determine the identity of organisms associated with DNA sequences. Erroneously identified sequences complicate the identification of unknown samples. Therefore, NCBI is developing a well-validated set of reference sequences (RefSeq). An actively curated subset is already available for identification of bacterial sequences and this is now being expanded to fungi. Additionally, a number of sequence databases exist to aid fungal data gathering and identification with their own sets of reference sequences. The UNITE database is focused on well-verified root-associated fungi, but is expanding to other groups of fungi from soil. The Assembling the Fungal Tree of Life (AFTOL) database includes a diverse set that is representative of the entire fungal kingdom, including ITS sequences (also deposited at GenBank). Among economically important molds, there are online DNA sequence–based identification databases for the plant pathogenic genus Fusarium and the biocontrol genus Trichoderma. Several other specialized databases focusing on specific groups of fungi are housed at the Fungal Biodiversity Centre in the Netherlands as well as elsewhere.
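The effect of a misidentified reference sequence, and the reason for a curated reference set, can be shown with a small Python sketch. Everything in it is hypothetical: the record fields (including the `verified` flag), the accession labels, the species names, and the toy sequences are invented for illustration and do not reflect the actual structure of GenBank, RefSeq, or UNITE records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReferenceRecord:
    accession: str   # hypothetical identifier
    species: str     # name supplied by the submitter
    sequence: str    # toy pre-aligned sequence, not real data
    verified: bool   # hypothetical flag: identity checked against a specimen or culture


def identity(a: str, b: str) -> float:
    """Fraction of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_hit(
    query: str,
    records: List[ReferenceRecord],
    verified_only: bool = False,
) -> Optional[ReferenceRecord]:
    """Return the most similar record, optionally searching only verified records."""
    pool = [r for r in records if r.verified or not verified_only]
    if not pool:
        return None
    return max(pool, key=lambda r: identity(query, r.sequence))


QUERY = "ACGTACGTACGTACGTACGA"

RECORDS = [
    ReferenceRecord("REC1", "species_A", "ACGTACGTACGTACGTACGT", verified=True),
    # An unverified submission carrying the wrong name for this sequence.
    ReferenceRecord("REC2", "species_X", "ACGTACGTACGTACGTACGA", verified=False),
    ReferenceRecord("REC3", "species_B", "ACGTTCGTACGAACGTACCT", verified=True),
]

if __name__ == "__main__":
    print(best_hit(QUERY, RECORDS).species)                      # name from the unverified record
    print(best_hit(QUERY, RECORDS, verified_only=True).species)  # name from the curated subset
```

Searching every record returns the name attached to the closest sequence, whether or not that name is correct. Restricting the search to verified records gives up some coverage in exchange for names that can be traced to a checked specimen.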
Importance of DNA barcoding
The most common, conservative estimate for fungal species diversity is 1.5 million species. Given that approximately 100,000 species are now known to science, this means that more than 90% are undocumented. By enabling identification of fungi by anyone able to access the necessary technology, DNA barcoding will allow exploration of the full range of fungal diversity and the interactions that fungi have with other groups of organisms. New sequencing technologies have rapidly increased the amount of DNA sequence data that can be collected from the environment without collecting physical specimens. Mycologists continue to discover new diversity inside plants and rocks, and even at the bottom of the ocean. An understudied environment, the soil, is a particular focus for several research groups. One example is an enigmatic group of very common fungi known informally as “soil clone group I” (SCGI). Members of this group have been detected in soil on at least three continents and fall in the fungal phylum Ascomycota, but they are unrelated to any known class of Fungi. So far, they are known only by their unique DNA sequences, and nothing is known of their biology or their appearance. In addition to addressing important gaps in our knowledge of the ecological roles of fungi in carbon cycling and other processes, there are several economic and practical implications to having an effective barcoding system in place for fungi. Governments are concerned with minimizing movement of pathogenic fungi as crops, lumber, and agricultural produce are traded around the globe. Hidden fungi, such as endophytes living inside plants, can be effectively detected and identified without labor-intensive culturing and microscopy. Last, barcoding will help the biotechnology sector increase the efficiency of discovery of novel fungi that can be used for biological control of harmful pests or pathogens and for production of industrial enzymes [as in bioconversion (including biofuels)], antibiotics, and probiotics, thereby benefiting human health.
See also: Biodiversity; Biotechnology; Deoxyribonucleic acid (DNA); Fungal ecology; Fungal genomics; Fungal phylogenetic classification; Fungi; Genetic code; Genetic mapping; Mycology; Ribosomes