Systems Biology Approaches to Understanding and Predicting Fungal Virulence



Fig. 3.1.
The “bench–desk–bench” loop of SysBio. A typical SysBio workflow. Based on an observation, such as a difference in yeast colony size and phenotype under two different conditions, a hypothesis is formed from the experimental results and further supported by prior knowledge in the published literature. This hypothesis can be addressed using a number of qualitative and quantitative methods, the results of which are deposited in publicly available databases. With these data, modeling approaches attempt to mimic, predict, and visualize the data. Once a model exists, experimental verification and refinement of the model create the bench–desk–bench loop, in which iterative cycles of prediction and verification are undertaken until the model and the experimental validation are representative of one another.



The integration of data from these various technologies into SysBio models has remained a formidable challenge. SysBio approaches to this challenge have been classified as “top-down,” “bottom-up,” or “middle-out” (Bray 2003; O’Malley and Dupre 2005; Bruggeman and Westerhoff 2007; Petranovic and Nielsen 2008). A top-down approach aims to extract principles from experimental data representing molecular properties of the studied system; it focuses on the comparison of genome-wide data sets, such as the transcriptome and proteome, to formulate a focused and testable biological hypothesis. A bottom-up approach uses knowledge of the molecular properties of the system components to predict the behavior of the system as a whole. In short, it connects smaller entities to predict, identify, and simulate the behavior of a bigger system. It usually starts with prior knowledge of a specific gene, from which a model is generated to investigate the system as a whole. A middle-out approach starts at the level for which the best information on the process of interest is available and then combines higher and lower levels of structural and functional information, essentially breaking out of a stricter top-down or bottom-up loop in order to validate the hypothesis at the current state of biological understanding (Brenner et al. 2001). Regardless of the approach used, SysBio strategies differentiate themselves from more classical biological methods by consciously taking into consideration different levels and dynamics of biological data (DNA, RNA, protein) simultaneously.

Fungal SysBio was ushered in with the completion of the Saccharomyces cerevisiae genome sequence, the first completely sequenced eukaryotic genome (Goffeau et al. 1996). Since then, the genomes of many of the most common fungal pathogens including, but not limited to, Candida albicans, Aspergillus fumigatus, and Cryptococcus neoformans, have become available (Loftus et al. 2005; Nierman et al. 2005; van het Hoog et al. 2007). Community-wide initiatives such as the Fungal Genome Initiative (Cuomo and Birren 2010) and the 1,000 Fungal Genomes Project (http://1000.fungalgenomes.org/) have been useful tools for studying the evolution of fungal virulence. The discovery of key genes positively and negatively regulated during the infection process, and understanding the function of their products, will drive the design of new strategies to combat fungal pathogens.

In this chapter, we provide a comprehensive overview of recent SysBio methods suitable for the study of fungal virulence, including genome sequencing, -omics technologies, and bioinformatics tools, with an emphasis on computational and modeling-based approaches. We focus on the genera Candida and Saccharomyces; the latter stands out as a “workhorse” of fungal SysBio in which many of the methods described herein were originally established or tested (Mustacchi et al. 2006; Santamaria et al. 2011). These approaches are used to identify molecular wiring and dynamics in biological networks, with the goal of determining their biological function and eventually revealing novel therapeutic options. We describe the applicability of each method to specific experimental questions using numerous case examples and critically discuss some of the current pitfalls in the analysis of SysBio data sets.



II. High-Throughput and –Omics-Based Methods for Studying Fungal Virulence


SysBio approaches, especially top-down analyses, incorporate genome-wide data sets such as comparative genomics, transcriptomics, and proteomics data. These approaches fit into the category of “–omics” or “–ome” studies, which attempt to analyze a genome-wide response to a specific condition. –Omics studies represent an important shift in the way biological data is both produced and interpreted, complementing traditional hypothesis-driven research (Weinstein 2001). In order to understand SysBio as a whole, it is important to understand the types of data sets that it utilizes to address a given question. Several important methods have established themselves in this field over the past decade and have been used extensively to investigate fungal virulence. For clarity, we have divided these methods into qualitative and quantitative approaches. We address some of the most popular methods used at different levels of biological understanding, including DNA, RNA, and protein, as well as epigenetic modifications and validation methods, and examine the key contributions they have made to the understanding of fungal virulence.


A. Genomics


The initial genomic sequencing of S. cerevisiae was a monumental international collaboration that included some 600 scientists worldwide (Goffeau et al. 1996).

This sequencing was performed using a series of hybrid plasmids, called “cosmids.” Cosmids had the advantage that much longer stretches of DNA could be incorporated than with normal plasmids, and at the same time longer DNA stretches could be sequenced to build up the genomic library. Sequencing of polymerase chain reaction (PCR) fragments then filled the remaining gaps between sequence stretches of the assembled genomic library to complete the genome (Dujon 1993). The Candida Genome Sequencing project began directly after the S. cerevisiae sequencing in 1996, ending in 2004 with the C. albicans genome assembly known as Assembly 19 (Jones et al. 2004). This genome assembly was divided into 412 contigs (consensus stretches of DNA that are assembled to form the scaffold of the genome assembly) and sequenced with a shotgun-based sequencing strategy. In order to obtain a more complete view of the diploid sequence, Assembly 21 was created using a fosmid library, which is conceptually similar to a cosmid library, except that it is based instead on a bacterial F-plasmid and is more stable than a cosmid because of its low copy number (Hall 2004). These early sequencing projects took years because of the low throughput.

Today, genome and transcriptome sequencing has become routine, with ever-increasing stability, coverage, and speed (several fungal genomes can now be sequenced in a couple of days), and with bioinformatics assembly tools publicly available. As DNA sequencing technologies have become more efficient, there has been a surge in the number of sequenced genomes, with over 150 fungi sequenced so far (Marcet-Houben and Gabaldon 2009). These sequences facilitate functional and comparative genomics studies. Functional genomics aims to understand relationships between genotype and phenotype. Comparative genomics attempts to identify genes or genetic rearrangements between closely related species based on their DNA sequence; in the case of fungal pathogens, this often involves comparing a highly virulent species with a significantly less virulent or even avirulent species. This is done in order to identify genetic transitions that might explain the evolutionary divergence of pathogens or to identify novel virulence factors.

Comparative genomics studies use two main techniques: comparative whole genome sequencing or hybridization-based microarrays. Comparative whole genome sequencing attempts to identify genetic elements present in one species and absent in another based on the genome sequence; this is done by aligning the genome sequences and identifying outlier sequence stretches that do not match between them. Comparative genomic hybridization (CGH) arrays identify genome-wide variation in gene copy number. CGH experiments assume that the binding ratio of the experimental sample to the control is proportional to the ratio of sequence concentrations in the samples. These methods have provided significant insight into the evolution of pathogenicity for many fungal species. For example, early comparative studies of fungi identified strong sequence homology among 228 genes in the S. cerevisiae, Schizosaccharomyces pombe, Aspergillus niger, Magnaporthe grisea, C. albicans, and Neurospora crassa genomes for which no homology was found in the human or mouse genomes, representing potential targets for pan-fungal treatment (Braun et al. 2005).
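The proportionality assumption behind CGH can be made concrete with a small sketch. The function names, the pseudocount, and the gain/loss cutoffs below are illustrative assumptions rather than values from any published CGH pipeline; the intensities are made up.

```python
import math

def cgh_log2_ratios(test_signal, reference_signal, pseudocount=1.0):
    """Per-probe log2(test / reference) hybridization ratios.

    The pseudocount (an illustrative choice) avoids division by zero
    for probes with no detectable signal.
    """
    if len(test_signal) != len(reference_signal):
        raise ValueError("both channels must cover the same probes")
    return [
        math.log2((t + pseudocount) / (r + pseudocount))
        for t, r in zip(test_signal, reference_signal)
    ]

def call_copy_number(log_ratio, gain_cutoff=0.4, loss_cutoff=-0.4):
    """Label a probe as gain, loss, or neutral (cutoffs are hypothetical)."""
    if log_ratio >= gain_cutoff:
        return "gain"
    if log_ratio <= loss_cutoff:
        return "loss"
    return "neutral"

# Made-up intensities for three probes: equal, doubled, and halved signal
test = [1000.0, 2000.0, 500.0]
reference = [1000.0, 1000.0, 1000.0]
ratios = cgh_log2_ratios(test, reference, pseudocount=0.0)
calls = [call_copy_number(r) for r in ratios]
print(ratios, calls)
```

Working on the log2 scale makes gains and losses symmetric around zero, which is why the ratio is usually transformed before any cutoff is applied.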

Numerous studies have investigated the evolution of pathogenicity within a single fungal clade. For example, in the Candida clade, eight genomes were sequenced, including the C. albicans WO-1 strain (which is characterized for white-opaque switching and is associated with specificity to host tissues), along with the de novo sequencing of C. tropicalis and C. parapsilosis, Lodderomyces elongisporus, C. guilliermondii and C. lusitaniae, many of which are now classified as emerging fungal pathogens (Butler et al. 2009). These strains were compared to the previously sequenced genomes of C. albicans clinical isolate SC5314, the marine yeast Debaryomyces hansenii, and species from the Saccharomyces clade. Some 21 gene families emerged that were enriched in pathogenic species as compared to nonpathogenic fungi. A related study investigated the closest known relative of C. albicans, C. dubliniensis, which, despite its similarities, is significantly less virulent than C. albicans. Comparative sequence analysis has identified almost 200 species-specific genes in C. albicans, including the absence of the key C. albicans invasion gene ALS3 in C. dubliniensis, and members of the aspartyl proteinase family SAP4 and SAP5 (Jackson et al. 2009). ALS3 is among the most important virulence factors in C. albicans. It is a cell surface protein that plays a major role in adhesion to host cells and in maintenance of infection (Hoyer 2001; Hoyer et al. 2008). Notably, numerous translocations were identified in C. dubliniensis, especially in the SAP family, which is known to play a role in Candida pathogenesis. Comparative genomics has even lent itself to the investigation of genetic variations at chromosome level, using a single C. albicans isolate that had been passaged multiple times in an in vivo model organism using CGH (Forche et al. 2009), showing the environmental impact on the host strain evolution. 
Together, these studies demonstrate that even closely related species have diverged significantly at the genomic level, suggesting mechanisms for the evolution of fungal virulence factors.

A number of resources for fungal genomics research have recently been made available. Large genome databases, including the Broad Institute (http://www.broad.mit.edu/annotation/fgi/), the Sanger Center (http://www.sanger.ac.uk/Projects/Fungi/), the Institute for Genomic Research (http://www.tigr.org/tdb/fungal/), and the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/genomes/FUNGI/funtab.html), are all publicly available.


B. Transcriptomics


Analysis at the DNA level reveals how the information stored in the genetic code can be translated into protein molecules; however, it does not provide information on the diverse molecules actually produced in the cell, either normally or in response to different environmental conditions. This information is obtained at the RNA level by examining the complete set of RNA species produced in a given population of cells at a specific time. This field is referred to as transcriptomics.

There are two main methods to investigate transcriptional dynamics, also known as expression profiling. These include genome-wide microarrays and, more recently, next-generation sequencing (NGS) technologies, most notably, RNA-seq. Microarray experiments employ special microarray chips carrying printed copies of the entire genome and are used for assessing the relative differences in gene expression between a control sample and a treated sample. A microarray chip is, in principle, able to measure the relative changes in expression levels of all known genes simultaneously. Southern blotting was the inspiration for the microarray technology (Maskos and Southern 1992). Southern blotting involves the hybridization of a DNA probe to a specific DNA fragment on a solid substrate. Microarrays use the same principle but cover a genome-wide scale rather than single genes. The chip itself is made up of probes, corresponding to short DNA stretches for all genes in the genome. Depending on the specific experimental question, there may be multiple probes for each gene of interest and the number of spotted probes is often well into the thousands. In a typical microarray experiment, RNA is collected from an organism of interest, transcribed into cDNA (the sample is also referred to as the “target”), and hybridized to the chip where the target forms hydrogen bonds with the probes. In order to determine the relative abundance of the transcripts, the chip is then scanned with the hybridized sample. In theory, if a gene is expressed in the organism, it will have hybridized to the probe on the chip. The abundance of a gene product is then measured by the detection of chemiluminescent-labeled targets. Based on the intensity of the target–probe hybridization, the relative abundance of the RNA produced by the organism in response to a condition can be measured. 
Because microarray technology is chip-based, its ability to detect a specific gene or transcript is limited to the original spotting on the chip itself. This is especially important to keep in mind for certain organisms, where only incomplete genomic sequences are available, leading to low quality annotations and unknown alternative spliced products, which would remain undetected if not already taken into account in the original design of the microarray.

The first genome-wide array was developed for S. cerevisiae in 1997 (Lashkari et al. 1997). Since then, microarrays for fungi have evolved into high-density tiling arrays (Sellam et al. 2010) and splicing-sensitive exon-junction arrays (Inada and Pleiss 2010), among many others. Microarray technology has been extensively used to investigate global changes in gene expression in response to changing environmental conditions and genetic knockouts. It has also been used in conjunction with immune cell and animal models, in vitro and in vivo, to investigate different infection stages. For example, the transcriptional response of both S. cerevisiae and Candida glabrata to antifungal agents and other chemical stress agents in vitro was profiled (Lelandais et al. 2008). To identify responses specific to the pathogen C. glabrata, the authors compared the transcriptional profiles of both species after treatment. Surprisingly, they found high conservation among the regulated genes during infection, as well as a subpopulation of genes that were pathogen-specific. Further, in vitro infection experiments of human blood cells with C. albicans identified a number of differentially expressed genes, which may be important in the survival of Candida during bloodstream infections (Fradin et al. 2003). Transcriptional profiling has been used to identify effects of phagocytosis of C. albicans by immune cells, including neutrophils (Rubin-Bejerano et al. 2003) and macrophages (Lorenz et al. 2004). These studies identified the extent of the amino-acid-deficient environment within the phagosome, and characterized the dynamic starvation response of Candida over the time course of infection. The first dual transcriptional profiling using microarrays for a host and pathogen interaction was performed with conidia of A. fumigatus during infection of human airway epithelial cells. This work confirmed the upregulation of inflammatory interleukin (IL)-6 and the immune response to conidia, as well as pathways whose activation had previously only been investigated from either the host or pathogen perspective alone (Oosthuizen et al. 2011).

Transcription profiling using RNA-seq is conceptually similar to microarray analysis, insofar as the end result of the experiment is often a list of differentially expressed genes. However, the sample is sequenced using a parallel sequencing approach referred to as next-generation sequencing (NGS) instead of using hybridization-based methods. Based on Sanger sequencing methods, high-throughput technology began with tag-based methods that were developed so that multiple sequencing reactions could be run in parallel. These included serial analysis of gene expression (SAGE) (Velculescu et al. 1995), cap analysis of gene expression (CAGE) (Kodzius et al. 2006), and massively parallel signature sequencing (MPSS) (Reinartz et al. 2002). In order to increase the scale of reactions taking place, a number of novel sequencing strategies and commercially available platforms have been developed. These include Roche/454, Illumina/Solexa, Life/APG, Helicos BioSciences, and Pacific Biosciences. Each system has pros and cons, depending on the biological application (Metzker 2010).

In a typical RNA-seq experiment, cDNA is first fragmented; these templates are then attached to a substrate (which will vary with the technology used) with the aid of adaptor sequences. The immobilization of the template samples gives the advantage of allowing billions of simultaneous sequencing reactions, differentiating itself from first generation sequencing technology in terms of capacity and cost (Metzker 2010). Templates can be sequenced either from one end (single-end sequencing) or both ends (paired-end sequencing). The resulting sequencing reads can vary in length, depending on the technology used, from less than 30 bp to over 300 bp (Wang et al. 2009; Metzker 2010). Reads are then mapped back to the reference genome to determine gene expression and, when compared to other samples, differential gene expression.
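Once reads have been mapped and counted per gene, the counts are usually normalized before samples are compared. The sketch below shows two common normalizations, counts per million (CPM) and reads per kilobase per million mapped reads (RPKM), plus a log2 fold change; these specific measures are not named in the text above and are used here only as an illustration. Gene names, counts, lengths, and the pseudocount are hypothetical.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no mapped reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no mapped reads")
    return {
        gene: c * 1e9 / (total * gene_lengths_bp[gene])
        for gene, c in counts.items()
    }

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of two normalized values (pseudocount is an assumption)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Hypothetical read counts and gene lengths for one sample
counts = {"geneA": 500, "geneB": 1500}
lengths = {"geneA": 1000, "geneB": 3000}
cpm = counts_per_million(counts)
expr = rpkm(counts, lengths)
print(cpm, expr)
```

In this toy sample geneB attracts three times as many reads as geneA only because it is three times longer, so the two genes receive the same RPKM value even though their CPM values differ.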

RNA-seq has rapidly gained broad popularity over the past few years, especially because of its ability to sequence to a high depth and to detect low-abundance transcripts, offering a more complete view of the transcriptional profile of an organism than microarrays.

The sequencing technology has significant advantages over microarray, especially for non-model organism species, as the detection of expressed genes is not dependent on having a priori knowledge of the gene investigated. Moreover, RNA-seq does not have intrinsic limitations to the dynamic range of detection (Royce et al. 2007). RNA-seq has been especially important in the detection of novel noncoding RNA species and small RNAs, as well as for de novo annotation (Wang et al. 2009). Under in vitro conditions, a de novo annotation of the C. albicans transcriptome under nine different environmental conditions was recently performed and was able to identify over 600 novel transcriptionally active regions and introns from a total of 177 million uniquely mapped reads (Bruno et al. 2010). Similarly, with A. fumigatus, RNA-seq was used to investigate planktonic and biofilm growth to identify differences in pathological and morphological characteristics in these two stages. Numerous biofilm-specific genes were identified as being regulated, representing targets for biofilm development.

Most recently, the first dual-species RNA-seq approach, in which an RNA mixture comprising both host and fungal pathogen transcriptomes was sequenced over a time course of infection, has been accomplished. Furthermore, this study used mathematical approaches to predict novel host–pathogen regulatory networks implicated in the interaction and verified them experimentally. The combination of sequence analysis and network inference enabled this dual-systems approach (Tierney et al. 2012). This study presents the first adaptation of network inference to model host–pathogen interactions, validating the use of network inference for the analysis of multiple-species data sets.


1. Clustering Gene Expression Data Sets


The most common output of transcriptomics is a list of differentially expressed genes in one condition versus a control condition. Differential gene expression analysis begins with testing for the statistical significance of the variation within the sample. Statistical approaches for determining differential expression have been extensively reviewed elsewhere (Cui and Churchill 2003), as have a number of freely available tools to aid in statistical analysis for both microarray (Steinhoff and Vingron 2006) and RNA-seq data sets (Sun and Zhu 2012). This testing reduces a genome-wide comparison to only those genes significantly affected under a specific condition. Convenient analysis pipelines, especially for RNA-seq data, have recently been created to help non-computational biologists take sequencing data from the raw data file to a list of differentially expressed genes (Oshlack et al. 2010; Garber et al. 2011). High-dimensional data such as microarray or RNA-seq samples complicate data analysis because the number of variables measured far exceeds the number of samples. Because the list of differentially expressed genes can still contain several hundred genes, additional methods to reduce complexity are often necessary.
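Because thousands of genes are tested at once, per-gene p-values are normally corrected for multiple testing before a gene list is drawn up. The text above does not name a specific correction; the sketch below uses the Benjamini-Hochberg false discovery rate procedure as one widely used option, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_genes(p_by_gene, fdr=0.05):
    """Genes whose adjusted p-value falls at or below the chosen FDR."""
    genes = list(p_by_gene)
    adjusted = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= fdr]

# Hypothetical per-gene p-values from a differential expression test
p_by_gene = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.5}
adjusted = benjamini_hochberg(list(p_by_gene.values()))
hits = significant_genes(p_by_gene, fdr=0.05)
print(adjusted, hits)
```

Here geneB and geneC have raw p-values below 0.05 but adjusted values just above it, which illustrates how the correction shortens the list of candidate genes.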

Partitioning expression data into subgroups of genes, called clusters, facilitates visualization and interpretation of the data underlying a biological process of interest. Depending on the approach used, the groupings can then be visualized by scatter plots, histograms, dendrograms, or heat maps. Genes are clustered into specific categories, which can be functional, structural, temporal, or a combination of these. A number of clustering approaches have been developed, including principal component analysis (PCA), hierarchical clustering, fuzzy clustering, biclustering, and mutual information analysis, each of which tackles different potential sources of bias in the data set (Eisen et al. 1998; Kerr et al. 2008).

PCA identifies data trends within samples, called principal components, such that very large data sets can be graphically represented using a smaller number of dimensions (Ringner 2008). This technique is especially useful for visually identifying batch effects or noise between samples, which may otherwise negatively affect downstream analysis.
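As a rough illustration of how a principal component is obtained, the sketch below centers a small samples-by-genes matrix, builds its covariance matrix, and finds the first component by power iteration. Power iteration is a stand-in chosen to keep the example dependency-free; established numerical libraries would normally be used instead. The expression values are made up and mimic two batches of samples.

```python
import math

def first_principal_component(samples, iterations=200):
    """Return (loadings, scores) for the first principal component.

    samples: list of rows, one row per sample and one column per gene.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix between genes
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration; a uniform start vector is an assumption and would
    # fail only if it were exactly orthogonal to the leading component
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores

# Two hypothetical batches of two samples each, three genes per sample
samples = [
    [1.0, 2.0, 0.0],
    [1.2, 2.1, 0.1],
    [5.0, 6.0, 0.0],
    [5.1, 6.2, 0.1],
]
loadings, scores = first_principal_component(samples)
print(scores)
```

The scores of the first two samples fall on one side of zero and those of the last two on the other, which is the kind of separation that reveals a batch effect when the first component is plotted.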

Hierarchical clustering aims to create a hierarchy of gene groups, whereby relationships among genes are represented by a dendrogram. The shorter the length of dendrogram branches between objects, the more closely related the gene expression patterns are. These differences are assessed by pair-wise similarity functions. In this way, the method builds a hierarchy of gene groups by progressively merging clusters (Eisen et al. 1998). One of the major limitations of hierarchical clustering is that the decision-making for gene assignments is focused locally, without considering the data set as a whole, which can affect downstream interpretation (Tamayo et al. 1999).
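The progressive merging described above can be sketched in a few lines. This minimal version assumes average linkage and a Pearson correlation distance (1 minus the correlation coefficient) as the pair-wise similarity function; both are common choices but are assumptions here, as are the gene names and expression profiles.

```python
import math

def pearson_distance(x, y):
    """1 - Pearson correlation; undefined for constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (left, right, distance)."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        names = sorted(clusters)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                # Average distance over all cross-cluster gene pairs
                total = sum(
                    pearson_distance(profiles[g], profiles[h])
                    for g in clusters[a]
                    for h in clusters[b]
                )
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters["(" + a + "," + b + ")"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges

# Hypothetical profiles: g1/g2 rise over time, g3/g4 fall
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 2.5],
}
merges = average_linkage(profiles)
for left, right, dist in merges:
    print(left, right, round(dist, 4))
```

The merge distances correspond to branch heights in the dendrogram: the two co-regulated pairs join at very small distances, and the final merge of the two anticorrelated groups happens near the maximum distance of 2. Each merge is chosen greedily from the current clusters, which is the local decision-making noted as a limitation above.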

Fuzzy clustering (also referred to as soft clustering) was developed to partially counteract the local bias of hierarchical clustering approaches. Fuzzy clustering allows data elements to belong to multiple groups simultaneously with respect to a given criterion. Instead of being assigned to an individual cluster, each data element has a “degree of belonging” to each cluster, and this degree represents how close the fit is in multiple clusters (Dembele and Kastner 2003; Fu and Medico 2007). This is in contrast to hard clustering, where data elements are allowed to belong to only one group.
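The “degree of belonging” can be illustrated with the membership formula used in fuzzy c-means, one common soft clustering algorithm. The sketch assumes fixed cluster centroids and Euclidean distance, and computes only the membership step rather than the full iterative algorithm; the fuzziness parameter and the coordinates are hypothetical.

```python
import math

def fuzzy_memberships(point, centroids, fuzziness=2.0):
    """Degrees of belonging of one data element to each cluster centroid.

    Memberships are non-negative and sum to 1. A fuzziness value close
    to 1 approaches hard clustering; larger values give softer assignments.
    """
    if fuzziness <= 1.0:
        raise ValueError("fuzziness must be greater than 1")
    dists = [math.dist(point, c) for c in centroids]
    if any(d == 0.0 for d in dists):
        # The point coincides with one or more centroids
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (fuzziness - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]

# A gene profile lying between two hypothetical cluster centroids
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
print(memberships)
```

A hard clustering would assign this element entirely to the nearer centroid, whereas the fuzzy memberships keep a small share for the second cluster.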

Some of the newest clustering approaches have attempted to incorporate prior biological knowledge into the clustering algorithm. This has been attempted with a form of biclustering (Madeira and Oliveira 2004), a matrix-based clustering approach that includes both genes and conditions in the algorithm. One example algorithm, called cMonkey, was used to identify and cluster sequence motifs in Helicobacter pylori, S. cerevisiae, and Escherichia coli based on microarray data sets (Reiss et al. 2006). A similar clustering approach that incorporates prior knowledge, called mutual information analysis, has also been shown to identify transcriptional interactions with high fidelity in mammalian cells (Margolin et al. 2006). Finally, a number of standardized tools and analysis techniques are already publicly available (Table 3.1) to facilitate transcriptional data analysis. To date, they have provided detailed views of changing transcriptional landscapes in response to different environmental conditions on a functional level, and have been highly beneficial for the identification and prediction of virulence factors in fungi.
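The quantity at the heart of mutual information analysis can be sketched as follows. This is a simple plug-in estimate on discretized expression states, not the estimator used in the cited study; the state labels and profiles are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x_states, y_states):
    """Mutual information (in bits) between two discretized profiles."""
    if len(x_states) != len(y_states):
        raise ValueError("profiles must cover the same conditions")
    n = len(x_states)
    px = Counter(x_states)
    py = Counter(y_states)
    pxy = Counter(zip(x_states, y_states))
    mi = 0.0
    for (a, b), joint in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with counts
        mi += (joint / n) * math.log2(joint * n / (px[a] * py[b]))
    return mi

# Hypothetical expression states of three genes across four conditions
regulator = ["up", "up", "down", "down"]
target = ["up", "up", "down", "down"]
unrelated = ["up", "down", "up", "down"]
mi_coupled = mutual_information(regulator, target)
mi_unrelated = mutual_information(regulator, unrelated)
print(mi_coupled, mi_unrelated)
```

Unlike a correlation coefficient, mutual information also captures non-linear dependencies, which is one reason it is used to score candidate transcriptional interactions.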


Table 3.1.
OMICS resources

| -Omics | Methods | Standards | Databases | Analysis resources and tools | Fungal-specific resources |
|---|---|---|---|---|---|
| Genomics | CGA, NGS | MIGS | 1000 Genomes; ENSEMBL; GOLD | ClustalW2; UCSC Genome Browser; IGV | Candida Database; Aspergillus Database; Saccharomyces Database |
| Transcriptomics | Microarray, RNA-seq | MIAME, MINSEQE | Array Express; GEO | Galaxy; Bioconductor | YEASTRACT; Filamentous Fungal Gene Expression Database |
| Proteomics | 2D-PAGE, MS | MIAPE | PRIDE; MassBank | InterProScan; APEX | Proteopathogen; NetPhosYeast |
| Metabolomics | NMR, HPLC-MS | MIAMET, CIMR | HMDB; GOLM | Arcadia; MetaboAnalyst | YEASTNET; FunSecKB |
| Epigenomics | ChIP-chip, ChIP-seq | MIAME, MINSEQE | Roadmap; ENCODE | STAR Genome Browser; RMAP | ChromatinDB; Nucleosome Acetylation and Methylation in Yeast |

The table includes a nonexhaustive list of -omics methods and resources with an emphasis on those available for fungi. Reporting standards are abbreviated as follows with their corresponding reference: MIGS, minimum information about a genome sequence (Field et al. 2008); MIAME, minimum information about a microarray experiment (Brazma et al. 2001); MINSEQE, minimum information about a high-throughput sequencing experiment (http://www.mged.org/minseqe/); MIAPE, minimum information about a proteomics experiment (Taylor et al. 2007); MIAMET, minimum information about a metabolomics experiment (Bino et al. 2004); and CIMR, core information for metabolomics reporting (http://msi-workgroups.sourceforge.net/).


C. Proteomics


The term “proteome” was coined in 1996 to describe the complete set of proteins that is synthesized by a cell (Wilkins et al. 1996). The proteome provides the highest level of functional information about a cell, revealing the end products of transcription and downstream transcriptional processing. The use of proteomics data sets is also becoming a popular approach for studying proteins involved in virulence.

The major areas in proteomics research include identification of proteins and their posttranslational modifications as well as protein–protein interactions.

These areas are investigated using two main methods, traditional two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and, increasingly, mass spectrometry (MS). In 2D-PAGE, protein samples are resolved by two intrinsic properties: first in one dimension and then, at a 90° rotation, in a second dimension, typically on an SDS gel (O’Farrell 1975). These properties can include the isoelectric point, the mass of the protein complex in its native state, and the protein mass; the properties chosen depend on the specific experiment. Proteins are then visualized by staining of the gels, often using silver, Coomassie Blue, or Ponceau S staining techniques. Once visible, spots can be picked out by hand or, more often, by automated detection software based on their location on the gel. The identified spots are then excised, proteolytically digested, and subjected to MS analysis. Briefly, MS measures the mass-to-charge ratio of charged particles such as peptides, and this information can then be used to identify the composition of the peptide and the gene from which it is derived. Experimentally, MS samples are first vaporized and then ionized using an electron beam. The resulting ions are sorted by the mass analyzer according to their mass-to-charge ratios and recorded by the detector, which measures the identity and quantity of the ions present; these measurements are processed into mass spectra. Variations of MS, including liquid chromatography tandem MS (LC-MS/MS) (Yates et al. 1999) and gel-free proteomics techniques (2012; Stastna and Van Eyk 2012), are also widely used. These facilitate the analysis of proteins that are not easily separated in 2D gels due to their high hydrophobicity or high molecular weight, as in the case of many integral membrane proteins (Aebersold and Mann 2003; de Godoy et al. 2008). MS/MS involves additional rounds of ionization; however, the reproducibility between technical replicates of a sample remains in the range of 35–60% overlap (Tabb et al. 2010). Unfortunately, absolute protein quantification remains out of reach at the moment (Peng et al. 2012). Major hurdles remain in improving the reproducibility and standardization of MS-based methods (Kniemeyer et al. 2011). Nonetheless, since 1996, the percentage of protein-coding genes in S. cerevisiae for which some biological function has been identified has increased to over 80%, greater than for any other sequenced eukaryotic genome (Botstein and Fink 2011). Proteomics studies have been highly beneficial in achieving this.
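The mass-to-charge ratio that MS measures can be computed for a known peptide sequence, which is the basis for matching observed spectra to candidate peptides. The sketch below assumes unmodified peptides that are ionized by protonation and uses standard monoisotopic residue masses; the example sequence is arbitrary and the function names are illustrative.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565   # added once per peptide for the free termini
PROTON_MASS = 1.007276   # mass of each charge-carrying proton

def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS

def mass_to_charge(sequence, charge):
    """m/z of the peptide carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON_MASS) / charge

# Arbitrary example peptide, singly and doubly charged
mass = peptide_mass("PEPTIDE")
mz1 = mass_to_charge("PEPTIDE", 1)
mz2 = mass_to_charge("PEPTIDE", 2)
print(round(mass, 4), round(mz1, 4), round(mz2, 4))
```

The same peptide appears at different m/z values depending on its charge state, so identification software has to consider several charge states when comparing measured peaks with theoretical masses.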

Proteomics approaches have led to the identification of a number of fungal virulence factors. Using an in vitro approach, the proteome of C. albicans yeast-form cells in the exponential or stationary growth phase was investigated in response to nutrient limitation using 2D-PAGE. The authors aimed to identify metabolic response patterns in these two cell types that might confer a tolerance phenotype (Kusch et al. 2008) similar to that observed in S. cerevisiae in response to stress (Herman 2002). They observed that the stationary phase cells upregulated a number of proteins, including those involved in the defense against reactive oxygen species and heat stress, as compared to exponentially growing cells. The ability to undergo morphological transitions between yeast and hyphal cells is an important virulence trait of many but not all Candida spp. This is especially important as the cell wall itself is always subjected to recognition by the host cell surface and is thus exposed to immune recognition.

For example, a number of proteins are expressed in the yeast or hyphal stage only, suggesting a potential mechanism for secretion of cytosolic proteins, which may contribute to the overall virulence of the fungus in these different morphological states (Ebanks et al. 2006). These data further support the idea that the regulation of Hsp90, an essential chaperone protein that is activated in response to stress, is posttranscriptional in hyphal cells.

Additional variations of MS, such as matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) MS, have been shown to be useful tools in drug susceptibility screening of C. albicans to fluconazole (Marinach et al. 2009). A complete map of the S. cerevisiae proteome has most recently been generated using high-throughput peptide synthesis in conjunction with MS. This study provides insight into the evolution of yeast proteins and protein complexes (Picotti et al. 2013).

For A. fumigatus, conidia mediate the initial contact with the immune system of the host and are therefore an interesting target for proteomics studies looking for fungal virulence factors and posttranscriptional responses upon host recognition. For example, comparison of the proteome profiles of A. fumigatus conidia and mycelial cells revealed some 50 conidia-specific proteins (Teutschbein et al. 2010). Interestingly, the data suggested that many proteins that are not needed during the resting stage are stored, perhaps for a rapid response to the activation of metabolic processes or in response to recognition by the immune system.

In vitro co-culture approaches with immune cells have also been influential in investigating differential protein expression in response to infection conditions. Using a time course of interaction between C. albicans and macrophages, a combination of proteomics and transcriptomics techniques highlighted specific pathways related to the virulence of Candida spp., including the regulation of apoptosis (Fernandez-Arenas et al. 2004; Fernandez-Arenas et al. 2007). The authors used a C. albicans strain of attenuated virulence, in which the kinase HOG1, important for the oxidative stress response, was absent. They were able to identify several novel C. albicans antigens and further characterized the protective antibody response of mice against C. albicans infection. The use of proteomics methods, in general, has been useful in validating transcriptional data sets. However, these methods have also revealed a number of discrepancies between the transcriptome and the proteome, which remain an active area of research in the validation of fungal virulence factors. Finally, many resources have recently become available, including the Proteopathogen Database for studying host–pathogen interactions with C. albicans (http://proteopathogen.dacya.ucm.es) and Compluyeast (http://compluyeast2dpage.dacya.ucm.es/cgi-bin/2d/2d.cgi), which catalogues 2D-PAGE data sets from C. albicans, Mus musculus, and S. cerevisiae for comparative proteomics.


E. Metabolomics


Metabolites are the products of metabolism or reaction intermediates and are usually small molecules serving a number of functions within the cell, including signaling and the inhibition or stimulation of enzymes. As reaction intermediates, metabolites provide the “missing link” between DNA, RNA, and protein interactions within a cell. One of the major themes of metabolomics is to investigate the influence of metabolites on cellular phenotypes. The metabolome is composed of intracellular metabolites and the exo-metabolome, also referred to as the secretome, which contains all small molecules secreted from a cell. It has been estimated that over 70% of metabolites participate in more than two biological reactions and therefore represent interesting molecules for SysBio approaches (Nielsen 2003). Furthermore, from an evolutionary perspective, a number of the filamentous fungi are expected to share their primary metabolism with the yeast S. cerevisiae, suggesting a broad applicability of metabolomics research in the fungal research community. Fungal cells are estimated to contain more than 1,000 metabolites in the steady state (Smedsgaard and Nielsen 2005), some of which are extremely short-lived or of low abundance, making their quantification a formidable challenge.
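The estimate that most metabolites take part in more than two reactions can be made concrete by counting, for each metabolite, the reactions in which it appears. The following is a minimal sketch under stated assumptions: the four-reaction network, the function names, and the threshold parameter are hypothetical and serve only as illustration; they are not taken from Nielsen (2003) or from any curated reconstruction.

```python
from collections import Counter

# Hypothetical toy network (illustration only): reaction -> participating
# metabolites, substrates and products pooled together.
reactions = {
    "hexokinase": {"glucose", "ATP", "glucose-6-phosphate", "ADP"},
    "phosphoglucose_isomerase": {"glucose-6-phosphate", "fructose-6-phosphate"},
    "phosphofructokinase": {"fructose-6-phosphate", "ATP",
                            "fructose-1,6-bisphosphate", "ADP"},
    "pyruvate_kinase": {"phosphoenolpyruvate", "ADP", "pyruvate", "ATP"},
}


def participation_counts(network):
    """Count the number of reactions in which each metabolite appears."""
    counts = Counter()
    for metabolites in network.values():
        counts.update(metabolites)
    return counts


def fraction_in_more_than(network, threshold=2):
    """Fraction of metabolites that appear in more than `threshold` reactions."""
    counts = participation_counts(network)
    hubs = [name for name, n in counts.items() if n > threshold]
    return len(hubs) / len(counts)


counts = participation_counts(reactions)
fraction = fraction_in_more_than(reactions, threshold=2)
```

In this toy network only the cofactors ATP and ADP exceed the threshold. Applying the same count to a genome-scale reconstruction is how a figure such as the one quoted above could, in principle, be checked.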

A number of methods are in use to identify metabolite profiles in cells. The most common are nuclear magnetic resonance (NMR) spectroscopy and MS (see Sect. II.C), as well as metabolic labeling with radioactive isotopes (Niittylae et al. 2009; Zamboni and Sauer 2009). Another method is gas chromatography coupled to mass spectrometry (GC-MS), in which compounds separated by GC are detected by MS. GC is used in analytical chemistry to separate and identify molecules based on their migration within a capillary system. The sample is vaporized and carried through the capillary by an inert carrier gas. The time it takes for each molecule to elute from the column varies according to its molecular properties and can therefore be used to identify compounds. Combining this elution with MS gives a highly detailed description of the molecule. High performance liquid chromatography (HPLC) is also often used in combination with MS (HPLC-MS). HPLC is a chromatographic purification technique using a high-pressure capillary tube system, allowing for the fine separation of molecules. These methods, among others, provide a comprehensive way to identify the structure of metabolites on a genome-wide scale.
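The idea that elution time can identify a compound can be sketched as a lookup against reference retention times. This is a minimal illustration, not a description of any particular software: the reference library, its retention times, the tolerance value, and the function name are all hypothetical. In practice, retention time alone is rarely conclusive, which is why it is combined with the mass spectrum as described above.

```python
# Hypothetical reference library (illustration only):
# compound -> expected retention time in minutes.
REFERENCE_RT = {
    "alanine": 7.42,
    "glycerol": 9.10,
    "citrate": 15.83,
    "trehalose": 24.67,
}


def match_peaks(observed_rts, library=REFERENCE_RT, tolerance=0.05):
    """Assign each observed retention time to the closest library compound.

    A peak is left unassigned (None) when even the closest reference
    differs by more than `tolerance` minutes.
    """
    assignments = {}
    for rt in observed_rts:
        best = min(library, key=lambda name: abs(library[name] - rt))
        within = abs(library[best] - rt) <= tolerance
        assignments[rt] = best if within else None
    return assignments


# Two peaks fall close to library entries; the third has no plausible match.
result = match_peaks([7.40, 15.85, 12.00])
```

Leaving unmatched peaks as None rather than forcing the nearest name reflects that an unknown peak is a legitimate result; a real workflow would then inspect its mass spectrum.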

The identification and function of metabolites is highly relevant for a better understanding of fungal virulence. Fungi, more so than other pathogenic species, are well known for the diversity of metabolites produced in response to host immune defense, and are thus useful organisms for studying metabolic diversity (Jewett et al. 2006). Notably, about a dozen A. fumigatus secondary metabolites have been implicated in niche adaptation and virulence (Galagan et al. 2005). To date, significant progress has only been made in metabolic profiling of fungi such as S. cerevisiae. The first metabolic network reconstruction of S. cerevisiae used an extensive data-mining approach of previous literature in combination with mathematical techniques to identify approximately 600 metabolites (Forster et al. 2003). Shortly thereafter, GC-MS methods were able to verify the presence of approximately 100 of these metabolites under standard laboratory growth conditions (Villas-Boas et al. 2005). Analysis of metabolic flux in over 30 S. cerevisiae mutants demonstrated the robustness and inherent redundancies built into yeast metabolism (Blank et al. 2005). In C. albicans, LC-tandem MS was used to profile the regulation of the secretome under standard laboratory conditions (Sorgo et al. 2010) and in response to the antifungal agent fluconazole (Sorgo et al. 2011), identifying numerous immunogenic peptides as novel vaccine candidates for antifungal therapy. Recently, the metabolome of A. fumigatus was investigated using 1H-NMR metabolomics under infection conditions (Grahl et al. 2011). Using this technique, the authors detected ethanol in the lungs in a murine model of invasive pulmonary aspergillosis, suggesting a role for fungal alcohol dehydrogenase in pathogenesis (Grahl et al. 2011). 1H-NMR metabolomics also enabled the identification of pneumococcal or cryptococcal meningitis without prior sample culture, which, if implemented in a clinical setting, would shorten the time to diagnosis for patients (Himmelreich et al. 2009).


F. Epigenomics


Among the biologically relevant –omics approaches, the most recent addition, epigenomics, has entered center stage. The epigenome describes the global epigenetic modifications that take place within a cell. Epigenetic modifications take place on the DNA, histones, and chromatin in its various functional states. They comprise numerous posttranslational modifications, including, but not limited to, the addition of single or multiple methyl residues, ubiquitination, acetylation, phosphorylation, or adenylation, to name only the most common (for review see Hnisz et al. 2011). Most importantly, many modifications are reversible, providing an additional and even heritable level of cellular regulation.

The most common methods for investigation of the epigenetic landscape are studies on the variation of chromatin states using chromatin immunoprecipitation (ChIP).

Combinations of ChIP with microarray technology, known as ChIP-Chip or ChIP-on-Chip, and a similar combination of ChIP with NGS technology, termed ChIP-seq, have been recently introduced. ChIP identifies transient in vivo protein–DNA complexes by crosslinking DNA and associated proteins within a cell lysate. The DNA is then fragmented either by sonication or nuclease digestion. The proteins of interest are then selected using an antibody, precipitated, purified, and the associated DNA is either sequenced or placed on a microarray, depending on the technology used.

ChIP-Chip has been used to investigate genome-wide changes in patterns of histone methylation in the fission yeast S. pombe. A complex composed of two proteins, Swm1 and Swm2, mediates demethylation of lysine 9 in histone H3 (H3K9) (Opel et al. 2007). Epigenetic regulation via this complex, in concert with additional histone deacetylases and chromatin remodelers, is a major factor in the transcriptional regulation of S. pombe (Opel et al. 2007). In C. albicans, Nobile and colleagues identified the transcriptional network for controlling biofilm formation using a combination of ChIP-Chip and in vivo animal models. The six identified core transcriptional regulators, regulating over 1,000 target genes, provide insight into biofilm formation during host infection (Nobile et al. 2012). In yeast, ChIP-Chip was used to investigate histone and gene deletion mutants during environmental stress, highlighting the importance of epigenetic regulation in this process (Weiner et al. 2012).

In C. neoformans, the size of the capsule increases under infection conditions and is a well-established virulence factor of the species. The direct targets of Ada2 in C. neoformans were recently investigated using ChIP-seq (Haynes et al. 2011). Ada2 is a member of the Spt-Ada-Gcn5 acetyltransferase (SAGA) complex, which regulates transcription by histone acetylation. The authors identified a relationship between the function of Ada2 and capsule size, linking this epigenetic modification and its targets to the overall virulence of the species (Haynes et al. 2011). Most recently, in C. albicans, a role of chromatin-modifying enzymes in the inhibition of the yeast-to-hyphal transition was discovered using a combined approach of ChIP-seq and RNA-seq. The authors identified a role for the histone deacetylase Set3/Hos2 complex (Set3C) as a transcriptional cofactor of metabolic and morphogenesis-related gene expression. They found that the acetylation status of C. albicans chromatin influences transcription kinetics at target genes, showing that the epigenetic regulation supersedes a core transcriptional factor circuit involved in morphogenesis, a circuit that might be shared among other fungal pathogens (Hnisz et al. 2012).


G. Data Mining Approaches and Genome-Wide Fungal Resources



1. Databases


High-throughput molecular biology techniques have enormously increased the sheer volume of data generated and the need for proper data storage has never been higher (Kersey and Apweiler 2006). The value of this biological information is dependent on the ability of researchers to access and extract the information in a quick and reliable format, but also requires high-level curation.

Databases classify, organize, and systematize information. The maintenance of databases is essential in disseminating biological data to the community. The early development of two excellent databases for S. cerevisiae, the Saccharomyces Genome Database (www.yeastgenome.org/) and the Yeast Proteome Database (http://www.proteome.com/YPDhome.html), led to the rapid use of S. cerevisiae as a functional genomics tool and model organism (Botstein and Fink 2011). Despite their importance, maintaining high-quality, reliable databases is a constant struggle (Baker 2012), partly because of the inherently high costs of curating ever-changing biological data sets. Systems biologists extensively use databases to retrieve prior knowledge, both qualitative and quantitative, on the biological question in order to increase the strength, predictability, and identifiability of models.

Major database initiatives include PubMed (http://www.ncbi.nlm.nih.gov/pubmed/), Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/), and Gene Ontology (GO) (http://www.geneontology.org), all of which have established themselves as staple websites for researchers. Specialized databases, such as the database of virulence factors in fungal pathogens (http://sysbio.unl.edu/DFVF/), are also becoming increasingly popular. They enable the inclusion of more in-depth information about a specific topic that may not be sufficiently covered in larger databases.

An additional benefit of depositing scientific data in databases is the ability to standardize the reporting format, facilitating both the integration of distinct data sets from different laboratories and the development of analysis tools. Standardization has been greatly aided by the push for minimum reporting guidelines for biological and biomedical information (Taylor et al. 2008) (http://mibbi.sourceforge.net/). Reporting guidelines now exist for all major –omics methodologies (Table 3.1), and the scientific community has made a general effort to adhere to and popularize these standards for biological information.


2. Strain Collections


Genome-wide profiling at the RNA and protein level has been greatly aided by publicly available strain libraries in the form of loss-of-function (deletion) and gain-of-function (overexpression) collections. They have provided an efficient screening tool for scientists worldwide to investigate transcriptional and posttranscriptional changes in response to external stimuli, such as drug treatment and environmental variation, or exposure to host immune surveillance. Specifically for fungi, they have increased the throughput of virulence factor screening.

Of all fungi, S. cerevisiae has contributed the largest number of strain collections. The first of these, the yeast knockout (YKO) strain collection, methodically deletes open reading frames (ORFs) by substituting the gene of interest with a selectable drug-resistance cassette, allowing for the systematic screening of the effects of gene loss (Winzeler et al. 1999; Giaever et al. 2002). More than 20,000 strains are currently available from the Saccharomyces Genome Deletion Project (http://www-sequence.stanford.edu/group/yeast_deletion_project/), including both homozygous and heterozygous diploid deletions, MATα and MATa haploids, green fluorescent protein (GFP)-tagged strains (Huh et al. 2003), and even temperature-sensitive collections for essential genes (Li et al. 2004, 2011; Yan et al. 2008).


Posted Sep 20, 2016
