WorldWideScience

Sample records for scale genome resequencing

  1. BFAST: an alignment tool for large scale genome resequencing.

    Directory of Open Access Journals (Sweden)

    Nils Homer

    2009-11-01

    The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. We compare BFAST to a selection of large-scale alignment tools (BLAT, MAQ, SHRiMP, and SOAP) in terms of both speed and accuracy, using simulated and real-world datasets. We show that BFAST can achieve substantially greater alignment sensitivity in the presence of errors and true variants, especially insertions and deletions, and can minimize false mappings, while maintaining adequate speed compared to other current methods. We show that BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net.
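    The two stages described above (index lookup to find candidate locations, then gapped Smith-Waterman local alignment) can be illustrated with a toy sketch. It assumes spaced-seed masks as the "independent indexes", a linear gap penalty, and the scoring values shown; the function names, masks, padding and scores are hypothetical and this is not BFAST's code or index format.

```python
from collections import defaultdict


def build_index(reference, mask):
    """Map each masked k-mer of the reference to its start positions.

    `mask` uses '1' for positions that must match and '0' for ignored
    positions, so a read mismatch at an ignored position still hits.
    """
    used = [i for i, c in enumerate(mask) if c == "1"]
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        key = "".join(reference[pos + i] for i in used)
        index[key].append(pos)
    return index


def candidate_locations(read, indexes, masks):
    """Union of read start positions proposed by any independent index."""
    hits = set()
    for index, mask in zip(indexes, masks):
        used = [i for i, c in enumerate(mask) if c == "1"]
        for offset in range(len(read) - len(mask) + 1):
            key = "".join(read[offset + i] for i in used)
            for pos in index.get(key, []):
                hits.add(pos - offset)  # implied start of the read on the reference
    return sorted(h for h in hits if h >= 0)


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


def align(read, reference, masks, pad=5):
    """Return (candidate start, local alignment score) for the best candidate."""
    indexes = [build_index(reference, m) for m in masks]
    results = []
    for start in candidate_locations(read, indexes, masks):
        window = reference[max(0, start - pad): start + len(read) + pad]
        results.append((start, smith_waterman(read, window)))
    return max(results, key=lambda r: r[1]) if results else None
```

    For example, `align("CCGTTCGCAT", "AAAACCGTTAGCATGCCCC", ["111", "1011"])` still locates the read at offset 4 despite one substitution, because k-mers that avoid the mismatched base remain intact. Using several masks trades index size for robustness, which is the motivation the abstract gives for allowing multiple indexes.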

  2. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    Energy Technology Data Exchange (ETDEWEB)

    Muchero, Wellington [ORNL]; Labbe, Jessy L [ORNL]; Priya, Ranjan [University of Tennessee, Knoxville (UTK)]; DiFazio, Steven P [West Virginia University, Morgantown]; Tuskan, Gerald A [ORNL]

    2014-01-01

    To date, Populus ranks among the few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving the quality and quantity of Populus as a raw material will likely drive more targeted and deeper research to unlock the economic potential tied to the molecular biology processes of this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large-scale SNP discovery and the identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss the implications of genome sequence-enabled technologies for Populus genomic and genetic studies of complex and specialized traits.

  3. Application of resequencing to rice genomics, functional genomics and evolutionary analysis

    OpenAIRE

    Guo, Longbiao; Gao, Zhenyu; Qian, Qian

    2014-01-01

    Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively...

  4. SNP detection for massively parallel whole-genome resequencing.

    Science.gov (United States)

    Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong; Yang, Huanming; Wang, Jian; Kristiansen, Karsten; Wang, Jun

    2009-06-01

    Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection method for sequencing-by-synthesis Illumina Genome Analyzer technology. We designed this method by carefully considering the data quality, alignment, and experimental errors common to this technology. All of this information was integrated into a single quality score for each base under Bayesian theory to measure the accuracy of consensus calling. We tested this methodology using a large-scale human resequencing data set of 36x coverage and assembled a high-quality nonrepetitive consensus sequence for 92.25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered at 99.97% and 99.84% consistency, respectively. At a low sequencing depth, we used the prior probability of dbSNP alleles and were able to improve coverage of the dbSNP sites significantly compared with a nonimputation model. Our analyses demonstrate that our method has a very low false call rate at any sequencing depth and excellent genome coverage at a high sequencing depth.
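    The Bayesian consensus idea above can be sketched with a generic diploid genotype-likelihood model: each base call's Phred quality becomes an error probability, each of the ten diploid genotypes gets a likelihood, and a prior turns these into posteriors. This is a minimal sketch under assumptions (errors spread evenly over the three other bases, a uniform default prior); the published method also folds in alignment and experimental error terms, and its dbSNP prior is only mimicked here through the hypothetical `priors` argument.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def base_likelihood(observed, allele, phred):
    """P(observed base | true allele), assuming errors split evenly over 3 bases."""
    err = 10 ** (-phred / 10)
    return 1 - err if observed == allele else err / 3


def genotype_posteriors(pileup, priors=None):
    """pileup: list of (base, phred) pairs at one site -> {genotype: posterior}."""
    genotypes = ["".join(g) for g in combinations_with_replacement(BASES, 2)]
    if priors is None:
        priors = {g: 1 / len(genotypes) for g in genotypes}  # uniform prior
    log_post = {}
    for g in genotypes:
        a1, a2 = g
        total = math.log(priors[g])
        for base, q in pileup:
            # each read samples either chromosome copy with probability 1/2
            total += math.log(0.5 * base_likelihood(base, a1, q)
                              + 0.5 * base_likelihood(base, a2, q))
        log_post[g] = total
    top = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {g: math.exp(v - top) for g, v in log_post.items()}
    norm = sum(unnorm.values())
    return {g: v / norm for g, v in unnorm.items()}


def call_consensus(pileup, priors=None):
    """Return the most probable genotype and a Phred-scaled consensus quality."""
    post = genotype_posteriors(pileup, priors)
    best = max(post, key=post.get)
    p_wrong = max(1 - post[best], 1e-30)  # guard against log10(0)
    return best, -10 * math.log10(p_wrong)
```

    With ten Q30 "A" calls the sketch returns genotype "AA" with a quality of roughly 25; a 5/5 split of "A" and "G" calls returns the heterozygote "AG". Raising the prior on known dbSNP alleles is one way such a model can recover sites at low depth, in the spirit of the imputation described in the abstract.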

  5. Application of resequencing to rice genomics, functional genomics and evolutionary analysis

    Science.gov (United States)

    2014-01-01

    Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial for both bridging the knowledge gap between genotype and phenotype and facilitating molecular breeding via gene design in rice. Here, we also discuss the limitations, applications and future prospects of rice resequencing. PMID:25006357

  6. Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data

    Directory of Open Access Journals (Sweden)

    Matteo Chiara

    2017-07-01

    Large-scale initiatives aiming to recover the complete sequence of thousands of human genomes are currently being undertaken worldwide, contributing to the generation of a comprehensive catalog of human genetic variation. The ultimate and most ambitious goal of human population-scale genomics is the characterization of the so-called human “variome,” through the identification of causal mutations or haplotypes. Several research institutions worldwide currently use genotyping assays based on Next-Generation Sequencing (NGS) for diagnostics and clinical screenings, and the widespread application of such technologies promises major revolutions in medical science. Bioinformatic analysis of human resequencing data is one of the main factors limiting the effectiveness and general applicability of NGS for clinical studies. The requirement for multiple tools, to be combined in dedicated protocols in order to accommodate different types of data (gene panels, exomes, or whole genomes), and the high variability of the data make it difficult to establish a definitive strategy for general use. While several studies already compare the sensitivity and accuracy of bioinformatic pipelines for the identification of single nucleotide variants from resequencing data, little is known about the impact of quality assessment and read pre-processing strategies. In this work we discuss the major strengths and limitations of the various genome resequencing protocols that are currently used in molecular diagnostics and for the discovery of novel disease-causing mutations. Taking advantage of publicly available data, we devise and suggest a series of best practices for the pre-processing of the data that consistently improve the outcome of genotyping with minimal impact on computational costs.

  7. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing

    DEFF Research Database (Denmark)

    Hou, Yong; Wu, Kui; Shi, Xulian

    2015-01-01

    BACKGROUND: Single-cell resequencing (SCRS) provides many biomedical advances in variations detection at the single-cell level, but it currently relies on whole genome amplification (WGA). Three methods are commonly used for WGA: multiple displacement amplification (MDA), degenerate-oligonucleoti...

  8. GeneChip Resequencing of the Smallpox Virus Genome Can Identify Novel Strains: a Biodefense Application

    Science.gov (United States)

    Sulaiman, Irshad M.; Tang, Kevin; Osborne, John; Sammons, Scott; Wohlhueter, Robert M.

    2007-01-01

    We developed a set of seven resequencing GeneChips, based on the complete genome sequences of 24 strains of smallpox virus (variola virus), for rapid characterization of this human-pathogenic virus. Each GeneChip was designed to analyze a divergent segment of approximately 30,000 bases of the smallpox virus genome. This study includes the hybridization results of 14 smallpox virus strains. Of the 14 smallpox virus strains hybridized, only 7 had sequence information included in the design of the smallpox virus resequencing GeneChips; similar information for the remaining strains was not tiled as a reference in these GeneChips. By use of variola virus-specific primers and long-range PCR, 22 overlapping amplicons were amplified to cover nearly the complete genome and hybridized with the smallpox virus resequencing GeneChip set. These GeneChips were successful in generating nucleotide sequences for all 14 of the smallpox virus strains hybridized. Analysis of the data indicated that the GeneChip resequencing by hybridization was fast and reproducible and that the smallpox virus resequencing GeneChips could differentiate the 14 smallpox virus strains characterized. This study also suggests that high-density resequencing GeneChips have potential biodefense applications and may be used as an alternate tool for rapid identification of smallpox virus in the future. PMID:17182757

  9. Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

    DEFF Research Database (Denmark)

    Zhan, Bujie; Fadista, João; Thomsen, Bo

    2011-01-01

    sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were...... of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation...

  10. Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq.

    Science.gov (United States)

    Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G

    2014-11-29

    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation

  11. Whole-genome resequencing of 100 healthy individuals using DNA pooling.

    Science.gov (United States)

    Wang, Xiaobin; Sui, Weiguo; Wu, Weiqing; Hou, Xianliang; Ou, Minglin; Xiang, Yueying; Dai, Yong

    2016-11-01

    With the advent of next-generation sequencing technology, the cost of sequencing has decreased significantly. However, sequencing costs remain high for large-scale studies. In the present study, DNA pooling was applied as a cost-effective sequencing strategy, and the results of whole-genome resequencing of 100 healthy individuals using DNA pooling are presented. In order to minimise the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples and then subjected to whole-genome sequencing using four lanes for each library, resulting in at least 30-fold haploid coverage for each sample. The NCBI human genome build 37 (hg19) was used as the reference genome, and the short reads were aligned to it, achieving 99.84% coverage. The average sequencing depth was 32.76-fold. In total, ~3 million single-nucleotide polymorphisms were identified, of which 99.88% were in the NCBI dbSNP database. In addition, ~600,000 small insertions/deletions, 500,000 structural variants, 5,000 copy number variations and 13,000 single nucleotide variants were identified. According to the present study, this is the first time whole genomes have been sequenced for a small sample of subjects from southern China. New variation sites were identified by comparison with the reference sequence, adding new knowledge of human genome variation to the human genomic databases. Finally, the particular distribution regions of variation were illustrated by analyzing various sites of variation, such as single-nucleotide polymorphisms.
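    The coverage figures above follow from simple arithmetic: mean coverage is the number of sequenced bases divided by the genome size, and under the idealized Lander-Waterman (Poisson) model the expected fraction of bases covered at least once is 1 - e^(-C). The sketch below uses a hypothetical 100-bp read length and a ~3 Gb haploid genome; neither value comes from the study. For paired-end data, count individual reads, not pairs.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average depth C = N * L / G (N reads of length L over a genome of size G)."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_covered(coverage):
    """Idealized Poisson expectation of bases covered at least once."""
    return 1 - math.exp(-coverage)
```

    For example, `reads_needed(30, 100, 3_000_000_000)` gives 900,000,000 reads. Real alignments cover less of the reference than this idealized model predicts because of repeats, reference gaps and uneven sampling, which is why an observed breadth such as 99.84% falls short of the near-100% Poisson estimate at 30-fold depth.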

  12. Genome resequencing and comparative variome analysis in a Brassica rapa and Brassica oleracea collection

    NARCIS (Netherlands)

    Cheng, Feng; Wu, Jian; Cai, Chengcheng; Fu, Lixia; Liang, Jianli; Borm, Theo; Zhuang, Mu; Zhang, Yangyong; Zhang, Fenglan; Bonnema, Guusje; Wang, Xiaowu

    2016-01-01

    The closely related species Brassica rapa and B. oleracea encompass a wide range of vegetable, fodder and oil crops. The release of their reference genomes has facilitated resequencing collections of B. rapa and B. oleracea aiming to build their variome datasets. These data can be used to

  13. Empirical validation of pooled whole genome population re-sequencing in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Yuan Zhu

    The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in individual strains as a reference. We also estimated allele frequencies of the SNPs using "pooled" data and compared them with "true" frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency, with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling, because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time-consuming, or expensive.
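    The two error sources named above (binomial read sampling and unequal DNA contributions per strain) can be made concrete with a small sketch. `pooled_frequency` returns the read-based estimate with its binomial standard error, sqrt(p(1 - p)/n). `simulate_pool` is a hypothetical toy simulation that assumes each isogenic strain carries either the reference (0) or the alternative (1) allele and that DNA shares are gamma-distributed with a chosen coefficient of variation; neither assumption comes from the study.

```python
import math
import random


def pooled_frequency(alt_reads, depth):
    """Allele frequency estimate from pooled reads and its binomial standard error."""
    p = alt_reads / depth
    se = math.sqrt(p * (1 - p) / depth)
    return p, se


def simulate_pool(genotypes, depth, dna_cv=0.0, rng=None):
    """Simulate the alt-allele read fraction from a pool of isogenic strains.

    genotypes: list of 0/1 flags (strain carries the alt allele or not).
    dna_cv: assumed coefficient of variation of each strain's DNA share;
            0 means every strain contributes equally.
    """
    rng = rng or random.Random()
    if dna_cv > 0:
        shape = 1 / dna_cv ** 2  # gamma with mean 1 and variance dna_cv**2
        weights = [rng.gammavariate(shape, 1 / shape) for _ in genotypes]
    else:
        weights = [1.0] * len(genotypes)
    alt_share = sum(w for w, g in zip(weights, genotypes) if g) / sum(weights)
    # each read independently samples a DNA molecule from the pool
    alt_reads = sum(1 for _ in range(depth) if rng.random() < alt_share)
    return alt_reads / depth
```

    Repeating `simulate_pool` with a nonzero `dna_cv` for small and large pools is one way to see why few pooled strains inflate the error beyond the binomial term: with few strains, one strain's over- or under-representation shifts the allele share directly.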

  14. SNP detection and prediction of variability between chicken lines using genome resequencing of DNA pools

    Directory of Open Access Journals (Sweden)

    Carlborg Örjan

    2010-11-01

    Background: Next-generation sequencing technologies are widely used for detection of millions of Single Nucleotide Polymorphisms (SNPs) and also provide a means of assessing their variation. This information is useful for composing subsets of highly informative SNPs for region-specific or genome-wide analysis and for identifying mutations regulating phenotypic differences within or between populations. In this study, we investigated the sensitivity of SNP detection and introduced the flanking SNPs value (FSV) as a novel measure for predicting SNP variability using ~5X genome resequencing with ABI SOLiD and DNA pools from two chicken lines divergently selected for juvenile bodyweight. Results: Genotyping with a 60K SNP chip revealed polymorphisms within or between two divergently selected chicken lines for 31,363 SNPs, 48% of which were also detected using resequencing of DNA pools. SNP detection using resequencing was more powerful for positions with larger differences in allele frequency between the lines. About 50% of the SNPs with non-reference allele frequencies in the range 0.5-0.6 and 67% of those with frequencies > 0.9 could be detected. On average, ~3.7 SNPs/kb were detected by resequencing, with about 5% lower density on microchromosomes than on macrochromosomes. There was a positive correlation between the observed between-line SNP variation from the 60K chip analysis and our proposed FSV score computed from the genome resequencing data. The strongest correlations on macrochromosomes and microchromosomes were observed when the FSV was calculated with total flanking regions of 62 kb (correlation 0.55) and 38 kb (correlation 0.45), respectively. Conclusions: Genome resequencing with limited coverage (~5X) using pooled DNA samples and three non-reference reads as a threshold for SNP detection identified 50-67% of the 60K SNPs with a non-reference allele frequency larger than 0.5. The SNP density was around 5% lower on the
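    The detection rule stated in the conclusions (a SNP is called when at least three reads carry a non-reference base) and the per-kilobase density figure can be sketched as follows. The pileup representation (a dict from position to a string of observed bases), the uppercase reference and the function names are hypothetical simplifications, not the authors' pipeline.

```python
from collections import Counter


def call_pooled_snps(pileups, reference, min_alt_reads=3):
    """Call SNPs in pooled data with a minimum non-reference read count.

    pileups: dict mapping 0-based position -> string of observed bases.
    Returns {position: (ref_base, alt_base, alt_read_count, alt_fraction)}.
    """
    calls = {}
    for pos, bases in pileups.items():
        ref = reference[pos].upper()
        valid = [b for b in bases.upper() if b in "ACGT"]
        alt_counts = Counter(b for b in valid if b != ref)
        if not alt_counts:
            continue
        alt, n_alt = alt_counts.most_common(1)[0]  # most frequent alternative base
        if n_alt >= min_alt_reads:
            calls[pos] = (ref, alt, n_alt, n_alt / len(valid))
    return calls


def snp_density_per_kb(n_snps, region_length_bp):
    """SNPs per kilobase over a region."""
    return n_snps / (region_length_bp / 1000)
```

    At ~5X pooled depth, a three-read threshold means a site must have a high non-reference allele frequency in the pool to be sampled often enough. This is consistent with the abstract's observation that detection was better for SNPs with larger allele-frequency differences and higher non-reference frequencies.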

  15. Whole-genome resequencing and transcriptomic analysis to identify genes involved in leaf-color diversity in ornamental rice plants.

    Directory of Open Access Journals (Sweden)

    Chang-Kug Kim

    Rice field art is a large-scale art form in which people design rice fields using various kinds of ornamental rice plants with different leaf colors. Leaf color-related genes play an important role in the study of chlorophyll biosynthesis, chloroplast structure and function, and anthocyanin biosynthesis. Despite the role of different metabolites in the traditional relationship between leaf and color, comprehensive color-specific metabolite studies of ornamental rice have been limited. We performed whole-genome resequencing and transcriptomic analysis of regulatory patterns and genetic diversity among different rice cultivars to discover new genetic mechanisms that promote enhanced levels of various leaf colors. We resequenced the genomes of 10 rice leaf-color accessions to an average of 40× read depth and >95% coverage and performed 30 RNA-seq experiments using the 10 rice accessions sampled at three developmental stages. The sequencing results yielded a total of 1,814 × 10⁶ reads and identified an average of 713,114 SNPs per rice accession. Based on our analysis of the DNA variation and gene expression, we selected 47 candidate genes. We used an integrated analysis of the whole-genome resequencing data and the RNA-seq data to divide the candidate genes into two groups: genes related to macronutrient (i.e., magnesium and sulfur) transport and genes related to flavonoid pathways, including anthocyanidin biosynthesis. We verified the candidate genes with quantitative real-time PCR using transgenic T-DNA insertion mutants. Our study demonstrates the potential of integrated screening methods combined with genetic-variation and transcriptomic data to isolate genes involved in complex biosynthetic networks and pathways.

  16. Signatures of selection in tilapia revealed by whole genome resequencing.

    Science.gov (United States)

    Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua

    2015-09-16

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, selection signatures, including those of artificial and natural selection, have been identified at the whole-genome level in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10-100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and for accelerating the genetic improvement of tilapia.

  17. Genome resequencing and comparative variome analysis in a Brassica rapa and Brassica oleracea collection.

    Science.gov (United States)

    Cheng, Feng; Wu, Jian; Cai, Chengcheng; Fu, Lixia; Liang, Jianli; Borm, Theo; Zhuang, Mu; Zhang, Yangyong; Zhang, Fenglan; Bonnema, Guusje; Wang, Xiaowu

    2016-12-20

    The closely related species Brassica rapa and B. oleracea encompass a wide range of vegetable, fodder and oil crops. The release of their reference genomes has facilitated resequencing collections of B. rapa and B. oleracea aiming to build their variome datasets. These data can be used to investigate the evolutionary relationships between and within the different species and the domestication of the crops, hereafter named morphotypes. These data can also be used in genetic studies aiming at the identification of genes that influence agronomic traits. We selected and resequenced 199 B. rapa and 119 B. oleracea accessions representing 12 and nine morphotypes, respectively. Based on these resequencing data, we obtained 2,249,473 and 3,852,169 high quality SNPs (single-nucleotide polymorphisms), as well as 303,617 and 417,004 InDels for the B. rapa and B. oleracea populations, respectively. The variome datasets of B. rapa and B. oleracea represent valuable resources to researchers working on evolution, domestication or breeding of Brassica vegetable crops.

  18. Full genome re-sequencing reveals a novel circadian clock mutation in Arabidopsis.

    Science.gov (United States)

    Ashelford, Kevin; Eriksson, Maria E; Allen, Christopher M; D'Amore, Rosalinda; Johansson, Mikael; Gould, Peter; Kay, Suzanne; Millar, Andrew J; Hall, Neil; Hall, Anthony

    2011-01-01

    Map-based cloning in Arabidopsis thaliana can be a difficult and time-consuming process, especially if the phenotype is subtle and scoring is labour-intensive. Here, we have re-sequenced the 120-Mb genome of a novel Arabidopsis clock mutant, early bird (ebi-1), in Wassilewskija (Ws-2). We demonstrate the utility of sequencing a backcrossed line in limiting the number of SNPs considered. We identify a SNP in the gene AtNFXL-2 as the likely cause of the ebi-1 phenotype.

  19. QTL mapping of leafy heads by genome resequencing in the RIL population of Brassica rapa.

    Directory of Open Access Journals (Sweden)

    Xiang Yu

    Leaf heads of cabbage (Brassica oleracea), Chinese cabbage (B. rapa), and lettuce (Lactuca sativa) are important vegetables that supply mineral nutrients, crude fiber and vitamins in the human diet. Head size, head shape, head weight, and heading time contribute to yield and quality. To investigate the genetic basis of the leafy head in Chinese cabbage (B. rapa), we took advantage of recent technical advances in genome resequencing to perform quantitative trait locus (QTL) mapping using 150 recombinant inbred lines (RILs) derived from a cross between heading and non-heading Chinese cabbage. The resequenced genomes of the parents uncovered more than 1 million SNPs. Genotyping of the RILs using the high-quality SNPs, assisted by a Hidden Markov Model (HMM), generated a recombination map. The raw genetic map revealed some physical assembly errors and missing fragments in the reference genome that reduced the quality of SNP genotyping. By removing genetic markers with recombination rates higher than 20%, we obtained a high-quality genetic map with 2209 markers and detected 18 QTLs for 6 head traits, from which 3 candidate genes were selected. These QTLs provide the foundation for studying the genetic basis of leafy heads and other complex traits.
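    HMM-assisted genotyping of RILs, as mentioned above, can be illustrated with a minimal Viterbi sketch. It assumes a two-state model (each SNP region inherited from parent "A" or parent "B", ignoring residual heterozygosity), a constant switch probability between adjacent SNPs, a symmetric genotyping error rate, and missing calls treated as uninformative. All parameter values and names are hypothetical; this is not the authors' HMM.

```python
import math


def viterbi_ril(calls, switch_prob=0.01, error_rate=0.05):
    """Most likely parental origin ('A' or 'B') at each SNP along one chromosome.

    calls: list of 'A', 'B' or None (missing) for consecutive SNPs in one RIL.
    """
    states = ("A", "B")

    def emit(state, obs):
        if obs is None:
            return 0.0  # log(1): a missing call carries no information
        return math.log(1 - error_rate) if obs == state else math.log(error_rate)

    stay, switch = math.log(1 - switch_prob), math.log(switch_prob)
    score = {s: math.log(0.5) + emit(s, calls[0]) for s in states}
    back = []  # back[i][s] = best previous state for state s at SNP i + 1
    for obs in calls[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + (stay if p == s else switch))
            new_score[s] = score[prev] + (stay if prev == s else switch) + emit(s, obs)
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # trace back from the best final state
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

    A single discordant call inside a long parental block is smoothed away as a likely genotyping error. A sustained change of origin is kept as a recombination breakpoint. This smoothing is what makes the resulting recombination map robust to sporadic SNP miscalls. Markers that still imply implausibly high recombination (the abstract uses a 20% cutoff) are then candidates for assembly errors.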

  20. Integration of transcriptome and whole genomic resequencing data to identify key genes affecting swine fat deposition.

    Directory of Open Access Journals (Sweden)

    Kai Xing

    Fat deposition is highly correlated with the growth, meat quality, reproductive performance and immunity of pigs. Fatty acid synthesis takes place mainly in the adipose tissue of pigs; therefore, in this study, a high-throughput massively parallel sequencing approach was used to generate adipose tissue transcriptomes from two groups of Songliao black pigs that had opposite backfat thickness phenotypes. The total number of paired-end reads produced for each sample was in the range of 39.29-49.36 million. Approximately 188 genes were differentially expressed in adipose tissue and were enriched for metabolic processes, such as fatty acid biosynthesis, lipid synthesis, metabolism of fatty acids, retinol, caffeine and arachidonic acid, and immunity. Additionally, many genetic variations were detected between the two groups through pooled whole-genome resequencing. Integration of transcriptome and whole-genome resequencing data revealed important genomic variations among the differentially expressed genes for fat deposition, for example, the lipogenic genes. Further studies are required to investigate the roles of candidate genes in fat deposition to improve pig breeding programs.

  21. Evaluation of genetic variation among Brazilian soybean cultivars through genome resequencing.

    Science.gov (United States)

    Maldonado dos Santos, João Vitor; Valliyodan, Babu; Joshi, Trupti; Khan, Saad M; Liu, Yang; Wang, Juexin; Vuong, Tri D; de Oliveira, Marcelo Fernandes; Marcelino-Guimarães, Francismar Corrêa; Xu, Dong; Nguyen, Henry T; Abdelnoor, Ricardo Vilela

    2016-02-13

    Soybean [Glycine max (L.) Merrill] is one of the most important legumes cultivated worldwide, and Brazil is one of the main producers of this crop. Since the sequencing of its reference genome, interest in structural and allelic variations of cultivated and wild soybean germplasm has grown. To investigate the genetics of the Brazilian soybean germplasm, we selected soybean cultivars based on the year of commercialization, geographical region and maturity group and resequenced their genomes. We resequenced the genomes of 28 Brazilian soybean cultivars with an average genome coverage of 14.8X. A total of 5,835,185 single nucleotide polymorphisms (SNPs) and 1,329,844 InDels were identified across the 20 soybean chromosomes, with 541,762 SNPs, 98,922 InDels and 1,093 CNVs that were exclusive to the 28 Brazilian cultivars. In addition, 668 allelic variations of 327 genes were shared among all of the Brazilian cultivars, including genes related to DNA-dependent transcription-elongation, photosynthesis, ATP synthesis-coupled electron transport, cellular respiration, and precursors of metabolite generation and energy. A very homogeneous structure was also observed for the Brazilian soybean germplasm, and we observed 41 regions putatively influenced by positive selection. Finally, we detected 3,880 regions with copy-number variations (CNVs) that could help to explain the divergence among the accessions evaluated. The large number of allelic and structural variations identified in this study can be used in marker-assisted selection programs to detect unique SNPs for cultivar fingerprinting. The results presented here suggest that despite the diversification of modern Brazilian cultivars, the soybean germplasm remains very narrow because of the large number of genome regions that exhibit low diversity. These results emphasize the need to introduce new alleles to increase the genetic diversity of the Brazilian germplasm.

  2. Molecular footprints of domestication and improvement in soybean revealed by whole genome re-sequencing

    DEFF Research Database (Denmark)

    Li, Ying-hui; Zhao, Shan-cen; Ma, Jian-xin

    2013-01-01

    BACKGROUND: Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re......-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations, were analyzed. RESULTS: A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertion/deletions were identified. Among the SNPs detected, 25.5% were not described previously. We found...... that artificial selection during domestication led to a more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genome appears to be affected by artificial selection for preferred agricultural traits...

  3. Comparative genomics-based investigation of resequencing targets in Vibrio fischeri: Focus on point miscalls and artefactual expansions

    Directory of Open Access Journals (Sweden)

    Ruby Edward G

    2008-03-01

    Full Text Available Abstract Background Sequence closure often represents the end-point of a genome project, without a system in place for subsequent improvement and refinement. Building on the genome project of Vibrio fischeri ES114, we used a comparative approach to identify and investigate genes that had a high likelihood of sequence error. Results Comparison of the V. fischeri ES114 genome with that of conspecific strain MJ11 identified 82 target loci in ES114 as containing likely errors, and thus of high-priority for resequencing. Analysis of the targets identified 75 loci in which an error had occurred, resulting in the correction of 10,457 base pairs to generate the new ES114 genomic sequence. A majority of the inaccurate loci involved frameshift errors, correction of which fused adjacent ORFs. Although insertions/deletions are thought to be rare in microbial genome assemblies, fourteen of the loci contained extraneous sequence of over 300 bp, likely due to imperfect contig ends that were misassembled in tandem rather than as overlapping segments. Additionally we updated the entire genome annotation with 113 new features including previously uncalled protein-coding genes, regulatory RNA genes and operon leader peptides, and we analyzed the transcriptional apparatus encoded by ES114. Conclusion We demonstrate that errors in microbial genome sequences, thought to largely be confined to point mutations, may also consist of other prevalent large-scale rearrangements such as insertions. Ongoing genome quality control and annotation programs are necessary to accompany technological advancements in data generation. These updates further advance V. fischeri as an important model for understanding intercellular communication and colonization of animal tissue.

  4. Whole genome resequencing reveals natural target site preferences of transposable elements in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Raquel S Linheiro

    Full Text Available Transposable elements are mobile DNA sequences that integrate into host genomes using diverse mechanisms with varying degrees of target site specificity. While the target site preferences of some engineered transposable elements are well studied, the natural target preferences of most transposable elements are poorly characterized. Using population genomic resequencing data from 166 strains of Drosophila melanogaster, we identified over 8,000 new insertion sites not present in the reference genome sequence that we used to decode the natural target preferences of 22 families of transposable elements in this species. We found that terminal inverted repeat transposon and long terminal repeat retrotransposon families present clade-specific target site duplications and target site sequence motifs. Additionally, we found that the sequence motifs at transposable element target sites are always palindromes that extend beyond the target site duplication. Our results demonstrate the utility of population genomics data for high-throughput inference of transposable element targeting preferences in the wild and establish general rules for terminal inverted repeat transposon and long terminal repeat retrotransposon target site selection in eukaryotic genomes.

  5. TE-Tracker: systematic identification of transposition events through whole-genome resequencing.

    Science.gov (United States)

    Gilly, Arthur; Etcheverry, Mathilde; Madoui, Mohammed-Amin; Guy, Julie; Quadrana, Leandro; Alberti, Adriana; Martin, Antoine; Heitkam, Tony; Engelen, Stefan; Labadie, Karine; Le Pen, Jeremie; Wincker, Patrick; Colot, Vincent; Aury, Jean-Marc

    2014-11-19

    Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements. We present TE-Tracker, a general and accurate computational method for the de novo detection of germline TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker.
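
The abstract contrasts TE-Tracker's positive predictive value (PPV) with its sensitivity. As a reminder of how these two standard benchmark metrics differ, here is a minimal Python sketch; the function names and example counts are hypothetical and are not taken from the TE-Tracker paper.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: fraction of reported events that are real."""
    called = true_positives + false_positives
    return true_positives / called if called else 0.0


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity (recall): fraction of real events that were reported."""
    real = true_positives + false_negatives
    return true_positives / real if real else 0.0


if __name__ == "__main__":
    # Hypothetical benchmark: 100 simulated insertions, 80 recovered, 5 spurious calls.
    tp, fp, fn = 80, 5, 20
    print(f"PPV = {ppv(tp, fp):.3f}, sensitivity = {sensitivity(tp, fn):.3f}")
```

The two metrics answer different questions (how trustworthy the calls are versus how many real events are found), which is why the abstract can describe one tool as matching dedicated TE software on the first and generic SV callers on the second.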

  6. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery

    Directory of Open Access Journals (Sweden)

    Stothard Paul

    2011-11-01

    Full Text Available Abstract Background One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle. Results The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the SNPs found in Holstein, in Black Angus, and in both animals, 81%, 81%, and 75%, respectively, are novel. In-depth annotations of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten

  7. Genome-wide survey of pseudogenes in 80 fully re-sequenced Arabidopsis thaliana accessions.

    Directory of Open Access Journals (Sweden)

    Long Wang

    Full Text Available Pseudogenes (Ψs), including processed and non-processed Ψs, are ubiquitous genetic elements derived from originally functional genes in all studied genomes within the three kingdoms of life. However, systematic surveys of non-processed Ψs utilizing genomic information from multiple samples within a species are still rare. Here a systematic comparative analysis was conducted of Ψs within 80 fully re-sequenced Arabidopsis thaliana accessions, and 7546 genes, representing ∼28% of the annotated open reading frames (ORFs) in the genome, were found with disruptive mutations in at least one accession. The distribution of these Ψs on chromosomes showed a significantly negative correlation between Ψs/ORFs and local gene density, suggesting a higher proportion of Ψs in gene-desert regions, e.g. near centromeres. In addition, compared with non-Ψ loci, even the intact coding sequences (CDSs) in Ψ loci were found to be shorter, with fewer exons and lower GC content. Finally, a significant functional bias against the null hypothesis was detected: as previously reported, the Ψs were mainly involved in responses to environmental stimuli and biotic stress, suggesting that pseudogenization, by allowing successive mutations to accumulate, is likely important for adaptive evolution to rapidly changing environments.

  8. Whole-genome resequencing uncovers molecular signatures of natural and sexual selection in wild bighorn sheep.

    Science.gov (United States)

    Kardos, Marty; Luikart, Gordon; Bunch, Rowan; Dewey, Sarah; Edwards, William; McWilliam, Sean; Stephenson, John; Allendorf, Fred W; Hogg, John T; Kijas, James

    2015-11-01

    The identification of genes influencing fitness is central to our understanding of the genetic basis of adaptation and how it shapes phenotypic variation in wild populations. Here, we used whole-genome resequencing of wild Rocky Mountain bighorn sheep (Ovis canadensis) to >50-fold coverage to identify 2.8 million single nucleotide polymorphisms (SNPs) and genomic regions bearing signatures of directional selection (i.e. selective sweeps). A comparison of SNP diversity between the X chromosome and the autosomes indicated that bighorn males had a dramatically reduced long-term effective population size compared to females. This probably reflects a long history of intense sexual selection mediated by male-male competition for mates. Selective sweep scans based on heterozygosity and nucleotide diversity revealed evidence for a selective sweep shared across multiple populations at RXFP2, a gene that strongly affects horn size in domestic ungulates. The massive horns carried by bighorn rams appear to have evolved in part via strong positive selection at RXFP2. We identified evidence for selection within individual populations at genes affecting early body growth and cellular response to hypoxia; however, these must be interpreted more cautiously as genetic drift is strong within local populations and may have caused false positives. These results represent a rare example of strong genomic signatures of selection identified at genes with known function in wild populations of a nonmodel species. Our results also showcase the value of reference genome assemblies from agricultural or model species for studies of the genomic basis of adaptation in closely related wild taxa. © 2015 John Wiley & Sons Ltd.
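
The step from X-versus-autosome diversity to a reduced male effective population size rests on a standard population-genetic result. It is sketched below under the usual idealized assumptions (random mating, neutrality, equal mutation rates on the X and autosomes); the abstract does not state which model the authors actually used. With $N_m$ breeding males and $N_f$ breeding females:

```latex
\begin{aligned}
N_{e,A} &= \frac{4 N_m N_f}{N_m + N_f}, \qquad
N_{e,X} = \frac{9 N_m N_f}{4 N_m + 2 N_f}, \\[4pt]
\frac{N_{e,X}}{N_{e,A}} &= \frac{9\,(N_m + N_f)}{4\,(4 N_m + 2 N_f)}
  = \frac{9\,(N_m + N_f)}{16 N_m + 8 N_f}.
\end{aligned}
```

With an even sex ratio ($N_m = N_f$) the ratio is $18/24 = 0.75$; as $N_m$ becomes small relative to $N_f$ it approaches $9/8 = 1.125$. Because neutral diversity scales with $N_e$, an X-to-autosome diversity ratio well above 0.75 points to fewer effective males than females, which is the pattern the abstract attributes to intense male-male competition for mates.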

  9. CNV discovery for milk composition traits in dairy cattle using whole genome resequencing.

    Science.gov (United States)

    Gao, Yahui; Jiang, Jianping; Yang, Shaohua; Hou, Yali; Liu, George E; Zhang, Shengli; Zhang, Qin; Sun, Dongxiao

    2017-03-29

    Copy number variations (CNVs) are important and widely distributed in the genome. CNV detection opens a new avenue for exploring genes associated with complex traits in humans, animals and plants. Herein, we present a genome-wide assessment of CNVs that are potentially associated with milk composition traits in dairy cattle. In this study, CNVs were detected based on whole genome re-sequencing data of eight Holstein bulls from four half- and/or full-sib families, with extremely high and low estimated breeding values (EBVs) for milk protein percentage (PP) and fat percentage (FP). The coverage depth per individual ranged from 8.2× to 11.9×. Using CNVnator, we identified a total of 14,821 CNVs, including 5025 duplications and 9796 deletions. Among them, 487 differential CNV regions (CNVRs) comprising ~8.23 Mb of the cattle genome were observed between the high and low groups. Annotation of these differential CNVRs was performed based on the cattle genome reference assembly (UMD3.1), and a total of 235 functional genes were found within the CNVRs. Gene Ontology and KEGG pathway analyses showed that these genes were significantly enriched for specific biological functions related to protein and lipid metabolism, the insulin/IGF pathway-protein kinase B signaling cascade, the prolactin signaling pathway and AMPK signaling pathways. These genes included INS, IGF2, FOXO3, TH, SCD5, GALNT18, GALNT16, ART3, SNCA and WNT7A, implying their potential association with milk protein and fat traits. In addition, 95 CNVRs overlapped with 75 known QTLs that are associated with milk protein and fat traits of dairy cattle (Cattle QTLdb). In conclusion, based on NGS of 8 Holstein bulls with extremely high and low EBVs for milk PP and FP, we identified a total of 14,821 CNVs, 487 differential CNVRs between groups, and 10 genes, which were suggested as promising candidate genes for milk protein and fat traits.

  10. Discovery of Gene Sources for Economic Traits in Hanwoo by Whole-genome Resequencing

    Directory of Open Access Journals (Sweden)

    Younhee Shin

    2016-09-01

    Full Text Available Hanwoo, a Korean native cattle (Bos taurus coreana), has great economic value due to its high meat quality. The breed also has genetic variations that are associated with production traits such as health, disease resistance, reproduction and growth, as well as carcass quality. In this study, next generation sequencing technologies and the availability of an appropriate reference genome were applied to discover a large number of single nucleotide polymorphisms (SNPs) in ten Hanwoo bulls. Whole-genome resequencing generated a total of 26.5 Gb of data, of which 594,716,859 and 592,990,750 reads covered 98.73% and 93.79% of the bovine reference genomes UMD 3.1 and Btau 4.6.1, respectively. In total, 2,473,884 and 2,402,997 putative SNPs were discovered against UMD3.1 and Btau 4.6.1, respectively, of which 1,095,922 (44.3%) and 982,674 (40.9%) were novel. Among these, the 46,301 (UMD 3.1) and 28,613 (Btau 4.6.1) SNPs identified as Hanwoo-specific were located in functional genes that may be involved in the mechanisms of milk production, tenderness, juiciness and marbling of Hanwoo beef, and yellow hair. Most of the Hanwoo-specific SNPs were identified in promoter regions, suggesting that the SNPs influence differential expression of the regulated genes relative to the relevant traits. In particular, the non-synonymous SNPs (nsSNPs) found in CORIN, which is a negative regulator of Agouti, might be causal variants determining the yellow hair of Hanwoo. Our results will provide abundant genetic sources of variation to characterize Hanwoo genetics and for subsequent breeding.

  11. Whole-genome resequencing of Bacillus cereus and expression of genes functioning in sodium chloride stress.

    Science.gov (United States)

    Xu, Zhenbo; Xie, Jinhong; Liu, Junyan; Ji, Lili; Soteyome, Thanapop; Peters, Brian M; Chen, Dingqiang; Li, Bing; Li, Lin; Shirtliff, Mark E

    2017-03-01

    Bacillus cereus is one of the most common opportunistic pathogens responsible for various foodborne diseases. To investigate the regulatory mechanism of B. cereus under high osmotic pressure, two B. cereus strains, B25 and B26, were isolated from industrial soy sauce residue with a high salt concentration. Resequencing was performed on the Illumina/Solexa platform, and 13,646 SNPs and 434 InDels were identified as common variants between B25 and B26 relative to the reference genome, followed by COG, GO, and KEGG enrichment analysis. Furthermore, 49 key genes involved in Na(+)/H(+),K(+) transporters, dipeptide or tripeptide transporters, and stress response were selected and classified into 27 groups. Further validation was performed by qRT-PCR, and 4 candidate genes were found to be most associated with the osmotic response. Expression of the 4 candidate genes was then analyzed: genes BC0669 and BC0754, associated with the K(+) transport system, were downregulated, whereas gene BC2114, involved in glutathione peroxidase activity, was dramatically upregulated, indicating activation of antioxidant responses by osmotic stress via genetic regulation. In conclusion, the bioinformatic analysis and gene expression profiles provide a basis for further investigation of the genetic and regulatory mechanisms of bacterial salt tolerance. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. The Peach v2.0 release: high-resolution linkage mapping and deep resequencing improve chromosome-scale assembly and contiguity.

    Science.gov (United States)

    Verde, Ignazio; Jenkins, Jerry; Dondini, Luca; Micali, Sabrina; Pagliarani, Giulia; Vendramin, Elisa; Paris, Roberta; Aramini, Valeria; Gazza, Laura; Rossini, Laura; Bassi, Daniele; Troggio, Michela; Shu, Shengqiang; Grimwood, Jane; Tartarini, Stefano; Dettori, Maria Teresa; Schmutz, Jeremy

    2017-03-11

    The availability of the peach genome sequence has fostered relevant research in peach and related Prunus species, enabling the identification of genes underlying important horticultural traits as well as the development of advanced tools for genetic and genomic analyses. The first release of the peach genome (Peach v1.0) represented a high-quality WGS (Whole Genome Shotgun) chromosome-scale assembly with high contiguity (contig L50 214.2 kb), large portions of mapped sequences (96%) and high base accuracy (99.96%). The aim of this work was to improve the quality of the first assembly by increasing the portion of mapped and oriented sequences, correcting misassemblies and improving the contiguity and base accuracy using high-throughput linkage mapping and deep resequencing approaches. Four linkage maps with 3,576 molecular markers were used to improve the portion of mapped and oriented sequences (from 96.0% and 85.6% of Peach v1.0 to 99.2% and 98.2% of v2.0, respectively) and enabled a more detailed identification of discernible misassemblies (10.4 Mb in total). The deep resequencing approach fixed 859 homozygous SNPs (Single Nucleotide Polymorphisms) and 1347 homozygous indels. Moreover, the assembled NGS contigs enabled the closing of 212 gaps with an improvement in the contig L50 of 19.2%. The improved high-quality peach genome assembly (Peach v2.0) represents a valuable tool for analyzing genetic diversity and domestication, and a vehicle for the genetic improvement of peach and related Prunus species. Moreover, the important phylogenetic position of peach and the absence of recent whole genome duplication (WGD) events make peach a pivotal species for comparative genomics studies aiming at elucidating plant speciation and diversification processes.
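
The abstract reports contiguity as a contig L50 of 214.2 kb, i.e. as a length. Conventions differ: many groups call this length-weighted median N50 and reserve L50 for the number of contigs needed to reach it. A minimal Python sketch that returns both values from a list of contig lengths (the example lengths are made up):

```python
from typing import Sequence, Tuple


def n50_l50(contig_lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (length, count) for an assembly.

    Contigs are sorted longest-first and summed; `length` is the size of the
    contig at which the running total first reaches half the assembly size,
    and `count` is how many contigs that took.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: running total always reaches half")


if __name__ == "__main__":
    # Hypothetical contig lengths in kb (total 300 kb, half = 150 kb)
    print(n50_l50([80, 70, 50, 40, 30, 20, 10]))  # (70, 2)
```

Under this definition, closing gaps merges contigs and pushes the length value up, which is the kind of change the abstract reports as a 19.2% contig L50 improvement.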

  13. A Genome Resequencing-Based Genetic Map Reveals the Recombination Landscape of an Outbred Parasitic Nematode in the Presence of Polyploidy and Polyandry.

    Science.gov (United States)

    Doyle, Stephen R; Laing, Roz; Bartley, David J; Britton, Collette; Chaudhry, Umer; Gilleard, John S; Holroyd, Nancy; Mable, Barbara K; Maitland, Kirsty; Morrison, Alison A; Tait, Andy; Tracey, Alan; Berriman, Matthew; Devaney, Eileen; Cotton, James A; Sargison, Neil D

    2018-02-01

    The parasitic nematode Haemonchus contortus is an economically and clinically important pathogen of small ruminants, and a model system for understanding the mechanisms and evolution of traits such as anthelmintic resistance. Anthelmintic resistance is widespread and is a major threat to the sustainability of livestock agriculture globally; however, little is known about the genome architecture and parameters such as recombination that will ultimately influence the rate at which resistance may evolve and spread. Here, we performed a genetic cross between two divergent strains of H. contortus, and subsequently used whole-genome resequencing of a female worm and her brood to identify the distribution of genome-wide variation that characterizes these strains. Using a novel bioinformatic approach to identify variants that segregate as expected in a pseudotestcross, we characterized linkage groups and estimated genetic distances between markers to generate a chromosome-scale F1 genetic map. We exploited this map to reveal the recombination landscape, the first for any helminth species, demonstrating extensive variation in recombination rate within and between chromosomes. Analyses of these data also revealed the extent of polyandry, whereby at least eight males were found to have contributed to the genetic variation of the progeny analyzed. Triploid offspring were also identified, which we hypothesize are the result of nondisjunction during female meiosis or polyspermy. These results expand our knowledge of the genetics of parasitic helminths and the unusual life-history of H. contortus, and enhance ongoing efforts to understand the genetic basis of resistance to the drugs used to control these worms and for related species that infect livestock and humans throughout the world. This study also demonstrates the feasibility of using whole-genome resequencing data to directly construct a genetic map in a single generation cross from a noninbred nonmodel organism with a
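
Estimating genetic distances between markers usually means converting observed recombination fractions with a mapping function. The abstract does not say which function the authors used; the sketch below shows the two classic choices, Haldane (assumes no crossover interference) and Kosambi (allows for interference), returning distances in centimorgans.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    for r in (0.01, 0.1, 0.3):
        print(f"r = {r}: Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

For small r both functions give about 100 × r cM; at r = 0.1 they give roughly 11.16 cM and 10.14 cM, and they diverge further as r approaches 0.5.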

  14. Whole-genome resequencing of two elite sires for the detection of haplotypes under selection in dairy cattle.

    Science.gov (United States)

    Larkin, Denis M; Daetwyler, Hans D; Hernandez, Alvaro G; Wright, Chris L; Hetrick, Lorie A; Boucek, Lisa; Bachman, Sharon L; Band, Mark R; Akraiko, Tatsiana V; Cohen-Zinder, Miri; Thimmapuram, Jyothi; Macleod, Iona M; Harkins, Timothy T; McCague, Jennifer E; Goddard, Michael E; Hayes, Ben J; Lewin, Harris A

    2012-05-15

    Using a combination of whole-genome resequencing and high-density genotyping arrays, genome-wide haplotypes were reconstructed for two of the most important bulls in the history of the dairy cattle industry, Pawnee Farm Arlinda Chief ("Chief") and his son Walkway Chief Mark ("Mark"), each accounting for ∼7% of all current genomes. We aligned 20.5 Gbp (∼7.3× coverage) and 37.9 Gbp (∼13.5× coverage) of the Chief and Mark genomic sequences, respectively. More than 1.3 million high-quality SNPs were detected in Chief and Mark sequences. The genome-wide haplotypes inherited by Mark from Chief were reconstructed using ∼1 million informative SNPs. Comparison of a set of 15,826 SNPs that overlapped between the sequence-based and BovineSNP50 SNP sets showed the accuracy of the sequence-based haplotype reconstruction to be as high as 97%. By using the BovineSNP50 genotypes, the frequencies of Chief alleles on his two haplotypes then were determined in 1,149 of his descendants, and the distribution was compared with the frequencies that would be expected assuming no selection. We identified 49 chromosomal segments in which Chief alleles showed strong evidence of selection. Candidate polymorphisms for traits that have been under selection in the dairy cattle population then were identified by referencing Chief's DNA sequence within these selected chromosome blocks. Eleven candidate genes were identified with functions related to milk-production, fertility, and disease-resistance traits. These data demonstrate that haplotype reconstruction of an ancestral proband by whole-genome resequencing in combination with high-density SNP genotyping of descendants can be used for rapid, genome-wide identification of the ancestor's alleles that have been subjected to artificial selection.
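
Mean fold coverage is simply aligned bases divided by genome length. The quick Python check below uses only the yields and coverages quoted in the abstract to back out the genome size each pair implies; the helper names are illustrative, and the roughly 2.8 Gbp result is inferred here rather than stated in the abstract.

```python
def fold_coverage(aligned_bases: float, genome_size: float) -> float:
    """Mean fold coverage = total aligned bases / genome length (same units)."""
    return aligned_bases / genome_size


def implied_genome_size(aligned_bases: float, coverage: float) -> float:
    """Genome length implied by a reported yield and fold coverage."""
    return aligned_bases / coverage


if __name__ == "__main__":
    # Yields (Gbp) and coverages (fold) as reported in the abstract
    for name, gbp, cov in [("Chief", 20.5, 7.3), ("Mark", 37.9, 13.5)]:
        print(f"{name}: implied genome size ~ {implied_genome_size(gbp, cov):.2f} Gbp")
```

Both pairs work out to about 2.81 Gbp, so the two coverage figures are mutually consistent and refer to the same reference size (small differences come from the rounded coverage values).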

  15. Integrating transcriptome and genome re-sequencing data to identify key genes and mutations affecting chicken eggshell qualities.

    Directory of Open Access Journals (Sweden)

    Quan Zhang

    Full Text Available Eggshell damage leads to economic losses in the egg production industry and is a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes (DEGs) were identified by comparing the low eggshell strength (LES) and normal eggshell strength (NES) groups. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways, as revealed by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between the two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. The SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations that decrease eggshell quality. These findings further advance our understanding of eggshell calcification in the chicken uterus.

  16. The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions

    DEFF Research Database (Denmark)

    Guo, Shaogui; Zhang, Jianguo; Sun, Honghe

    2013-01-01

    an evolutionary scenario for the origin of the 11 watermelon chromosomes derived from a 7-chromosome paleohexaploid eudicot ancestor. Resequencing of 20 watermelon accessions representing three different C. lanatus subspecies produced numerous haplotypes and identified the extent of genetic diversity...... into aspects of phloem-based vascular signaling in common between watermelon and cucumber and identified genes crucial to valuable fruit-quality traits, including sugar accumulation and citrulline metabolism....

  17. Whole genome resequencing of a laboratory-adapted Drosophila melanogaster population sample [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    William P. Gilks

    2016-11-01

    Full Text Available As part of a study into the molecular genetics of sexually dimorphic complex traits, we used next-generation sequencing to obtain data on genomic variation in an outbred laboratory-adapted fruit fly (Drosophila melanogaster) population. We successfully resequenced the whole genomes of 220 hemiclonal females that were heterozygous for the same Berkeley reference line genome (BDGP6/dm6) and a unique haplotype from the outbred base population (LHM). The use of a static and known genetic background enabled us to obtain sequences from whole-genome phased haplotypes. We used a BWA-Picard-GATK pipeline for mapping sequence reads to the dm6 reference genome assembly, at a median depth of coverage of 31X, and have made the resulting data publicly available in the NCBI Short Read Archive (accession number SRP058502). We used Haplotype Caller to discover and genotype 1,726,931 small genomic variants (SNPs and indels, <200 bp). Additionally, we detected and genotyped 167 large structural variants (1-100 kb in size) using GenomeStrip/2.0. Sequence and genotype data are publicly available at the corresponding NCBI databases: Short Read Archive, dbSNP and dbVar (BioProject PRJNA282591). We have also released the unfiltered genotype data, and the code and logs for data processing and summary statistics (https://zenodo.org/communities/sussex_drosophila_sequencing/).

  18. Rapid Genome-wide Single Nucleotide Polymorphism Discovery in Soybean and Rice via Deep Resequencing of Reduced Representation Libraries with the Illumina Genome Analyzer

    Directory of Open Access Journals (Sweden)

    Stéphane Deschamps

    2010-07-01

    Full Text Available Massively parallel sequencing platforms have allowed for the rapid discovery of single nucleotide polymorphisms (SNPs among related genotypes within a species. We describe the creation of reduced representation libraries (RRLs using an initial digestion of nuclear genomic DNA with a methylation-sensitive restriction endonuclease followed by a secondary digestion with the 4bp-restriction endonuclease This strategy allows for the enrichment of hypomethylated genomic DNA, which has been shown to be rich in genic sequences, and the digestion with serves to increase the number of common loci resequenced between individuals. Deep resequencing of these RRLs performed with the Illumina Genome Analyzer led to the identification of 2618 SNPs in rice and 1682 SNPs in soybean for two representative genotypes in each of the species. A subset of these SNPs was validated via Sanger sequencing, exhibiting validation rates of 96.4 and 97.0%, in rice ( and soybean (, respectively. Comparative analysis of the read distribution relative to annotated genes in the reference genome assemblies indicated that the RRL strategy was primarily sampling within genic regions for both species. The massively parallel sequencing of methylation-sensitive RRLs for genome-wide SNP discovery can be applied across a wide range of plant species having sufficient reference genomic sequence.

  19. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing.

    Science.gov (United States)

    Lutz, Kerry A; Wang, Wenqin; Zdepski, Anna; Michael, Todd P

    2011-05-20

    High throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA, which contains nuclear as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.
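
The abstract mentions a qPCR protocol for estimating organellar DNA copies relative to nuclear DNA but does not give the calculation. A common approach, assumed here rather than taken from the paper, is the comparative Ct model: relative copies equal the amplification efficiency raised to the Ct difference between a nuclear reference amplicon and an organellar amplicon. A minimal Python sketch with hypothetical Ct values:

```python
def relative_copy_number(ct_organelle: float, ct_nuclear: float,
                         efficiency: float = 2.0) -> float:
    """Organellar copies per copy of a nuclear reference amplicon.

    Comparative Ct model (assumed): each cycle multiplies the template by
    `efficiency` (2.0 means perfect doubling), so a target that crosses the
    threshold delta_ct cycles earlier started with efficiency**delta_ct times
    more copies.
    """
    delta_ct = ct_nuclear - ct_organelle
    return efficiency ** delta_ct


if __name__ == "__main__":
    # Hypothetical Ct values for one DNA prep
    print(relative_copy_number(ct_organelle=18.0, ct_nuclear=26.0))  # 256.0
```

Computing this ratio for a nuclei-based prep and a CTAB total-DNA prep from the same tissue would show, under these assumptions, how much organellar DNA the nuclei isolation removed.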

  20. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Directory of Open Access Journals (Sweden)

    Zdepski Anna

    2011-05-01

    Full Text Available Abstract Background High throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants.

  1. Isolation and analysis of high quality nuclear DNA with reduced organellar DNA for plant genome sequencing and resequencing

    Science.gov (United States)

    2011-01-01

    Background High throughput sequencing (HTS) technologies have revolutionized the field of genomics by drastically reducing the cost of sequencing, making it feasible for individual labs to sequence or resequence plant genomes. Obtaining high quality, high molecular weight DNA from plants poses significant challenges due to the high copy number of chloroplast and mitochondrial DNA, as well as high levels of phenolic compounds and polysaccharides. Multiple methods have been used to isolate DNA from plants; the CTAB method is commonly used to isolate total cellular DNA from plants that contain nuclear DNA, as well as chloroplast and mitochondrial DNA. Alternatively, DNA can be isolated from nuclei to minimize chloroplast and mitochondrial DNA contamination. Results We describe optimized protocols for isolation of nuclear DNA from eight different plant species encompassing both monocot and eudicot species. These protocols use nuclei isolation to minimize chloroplast and mitochondrial DNA contamination. We also developed a protocol to determine the number of chloroplast and mitochondrial DNA copies relative to the nuclear DNA using quantitative real time PCR (qPCR). We compared DNA isolated from nuclei to total cellular DNA isolated with the CTAB method. As expected, DNA isolated from nuclei consistently yielded nuclear DNA with fewer chloroplast and mitochondrial DNA copies, as compared to the total cellular DNA prepared with the CTAB method. This protocol will allow for analysis of the quality and quantity of nuclear DNA before starting a plant whole genome sequencing or resequencing experiment. Conclusions Extracting high quality, high molecular weight nuclear DNA in plants has the potential to be a bottleneck in the era of whole genome sequencing and resequencing. The methods that are described here provide a framework for researchers to extract and quantify nuclear DNA in multiple types of plants. PMID:21599914

  2. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Directory of Open Access Journals (Sweden)

    Sathishkumar Natarajan

    Full Text Available Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceous family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them, 112 SNPs and 12 InDels were observed in powdery mildew responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  3. Single nucleotide variants and indels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds

    Science.gov (United States)

    Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer ge...

  4. Identification of candidate domestication regions in the radish genome based on high-depth resequencing analysis of 17 genotypes.

    Science.gov (United States)

    Kim, Namshin; Jeong, Young-Min; Jeong, Seongmun; Kim, Goon-Bo; Baek, Seunghoon; Kwon, Young-Eun; Cho, Ara; Choi, Sang-Bong; Kim, Jiwoong; Lim, Won-Jun; Kim, Kyoung Hyoun; Park, Won; Kim, Jae-Yoon; Kim, Jin-Hyun; Yim, Bomi; Lee, Young Joon; Chun, Byung-Moon; Lee, Young-Pyo; Park, Beom-Seok; Yu, Hee-Ju; Mun, Jeong-Hwan

    2016-09-01

    This study provides high-quality variation data of diverse radish genotypes. Genome-wide SNP comparison along with RNA-seq analysis identified candidate genes related to domestication that have potential as trait-related markers for genetics and breeding of radish. Radish (Raphanus sativus L.) is an annual root vegetable crop that also encompasses diverse wild species. Radish has a long history of domestication, but the origins and selective sweeps of cultivated radishes remain controversial. Here, we present a comprehensive whole-genome resequencing analysis of radish to explore genomic variation between the radish genotypes and to identify genetic bottlenecks due to domestication in Asian cultivars. High-depth resequencing and multi-sample genotyping analysis of ten cultivated and seven wild accessions obtained 4.0 million high-quality homozygous single-nucleotide polymorphisms (SNPs)/insertions or deletions. Variation analysis revealed that Asian cultivated radish types are closely related to wild Asian accessions, but are distinct from European/American cultivated radishes, supporting the notion that Asian cultivars were domesticated from wild Asian genotypes. SNP comparison between Asian genotypes identified 153 candidate domestication regions (CDRs) containing 512 genes. Network analysis of the genes in CDRs functioning in plant signaling pathways and biochemical processes identified a group of genes related to root architecture, cell wall, sugar metabolism, and glucosinolate biosynthesis. Expression profiling of the genes during root development suggested that domestication-related selective advantages included a main taproot with few branched lateral roots, reduced cell wall rigidity and favorable taste. Overall, this study provides evolutionary insights into domestication-related genetic selection in radish as well as identification of gene candidates with the potential to act as trait-related markers for background selection of elite lines in molecular

  5. Genetic diversity, molecular phylogeny and selection evidence of the silkworm mitochondria implicated by complete resequencing of 41 genomes

    Directory of Open Access Journals (Sweden)

    Tellier Laurent C

    2010-03-01

    Full Text Available Abstract Background Mitochondria are a valuable resource for studying the evolutionary process and deducing phylogeny. A few mitochondrial genomes have been sequenced, but a comprehensive picture of the domestication event for silkworm mitochondria remains to be established. In this study, we integrate the extant data, and perform a whole genome resequencing of Japanese wild silkworm to obtain breakthrough results in silkworm mitochondrial (mt) population, and finally use these to deduce a more comprehensive phylogeny of the Bombycidae. Results We identified 347 single nucleotide polymorphisms (SNPs) in the mt genome, but found no past recombination event to have occurred in the silkworm progenitor. A phylogeny inferred from these whole genome SNPs resulted in a well-classified tree, confirming that the domesticated silkworm, Bombyx mori, most recently diverged from the Chinese wild silkworm, rather than from the Japanese wild silkworm. We showed that the populations of the domesticated and Chinese wild silkworms have experienced neither expansion nor contraction in size. We also discovered that one mt gene, named cytochrome b, shows a strong signal of positive selection in the domesticated clade. This gene is related to energy metabolism, and may have played an important role during silkworm domestication. Conclusions We present a comparative analysis of 41 mt genomes of B. mori and B. mandarina from China and Japan. With these, we obtain a much clearer picture of the evolutionary history of the silkworm. The data and analyses presented here aid our understanding of the silkworm in general, and provide a crucial insight into silkworm phylogeny.

  6. Development of novel InDel markers and genetic diversity in Chenopodium quinoa through whole-genome re-sequencing.

    Science.gov (United States)

    Zhang, Tifu; Gu, Minfeng; Liu, Yuhe; Lv, Yuanda; Zhou, Ling; Lu, Haiyan; Liang, Shuaiqiang; Bao, Huabin; Zhao, Han

    2017-09-05

    Quinoa (Chenopodium quinoa Willd.) is a balanced nutritional crop, but its breeding improvement has been limited by the lack of information on its genetics and genomics. Therefore, it is necessary to obtain knowledge on genomic variation, population structure, and genetic diversity and to develop novel Insertion/Deletion (InDel) markers for quinoa by whole-genome re-sequencing. We re-sequenced 11 quinoa accessions and obtained coverage depths of approximately 7× to 23× of the quinoa genome. Based on the 1453-megabase (Mb) assembly from the reference accession Riobamba, 8,441,022 filtered bi-allelic single nucleotide polymorphisms (SNPs) and 842,783 filtered InDels were identified, with an estimated SNP and InDel density of 5.81 and 0.58 per kilobase (kb), respectively. From the genomic InDel variations, 85 dimorphic InDel markers were newly developed and validated. Together with 62 previously reported simple sequence repeat (SSR) markers, a total of 147 markers were used for genotyping 129 quinoa accessions. Molecular grouping analysis showed classification into two major groups, the Andean highland (composed of the northern and southern highland subgroups) and Chilean coastal, based on combined STRUCTURE, phylogenetic tree and principal component analysis (PCA). Further analysis of the genetic diversity exhibited a decreasing tendency from the Chilean coastal group to the Andean highland group, and gene flow between subgroups was more frequent than that between the two highland subgroups and the Chilean coastal group. An analysis of molecular variance (AMOVA) attributed the majority of the variation (approximately 70%) to diversity between the groups. This was congruent with the observation of a highly significant FST value (0.705) between the groups, demonstrating significant genetic differentiation between the Andean highland type of quinoa and the Chilean coastal type. Moreover, a core set of 16 quinoa germplasms that capture all 362 alleles was
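
    The reported densities match simple averages over the assembly length. The short check below assumes the 1453-Mb Riobamba assembly is the denominator and recovers 5.81 SNPs/kb and 0.58 InDels/kb from the reported counts. The helper name is illustrative.

```python
def variants_per_kb(n_variants: int, assembly_mb: float) -> float:
    """Average variant density per kilobase across an assembly of `assembly_mb` megabases."""
    assembly_kb = assembly_mb * 1_000  # 1 Mb = 1,000 kb
    return n_variants / assembly_kb


# Counts and assembly size as reported in the abstract.
snp_density = variants_per_kb(8_441_022, 1453)
indel_density = variants_per_kb(842_783, 1453)
print(f"{snp_density:.2f} SNPs/kb, {indel_density:.2f} InDels/kb")  # 5.81 SNPs/kb, 0.58 InDels/kb
```

    A genome-wide average like this hides local variation. Scanning density in fixed windows along each chromosome is the usual way to locate variant-rich or variant-poor regions.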

  7. Prediction of Genes Related to Positive Selection Using Whole-Genome Resequencing in Three Commercial Pig Breeds

    Directory of Open Access Journals (Sweden)

    HyoYoung Kim

    2015-12-01

    Full Text Available Selective sweeps can cause genetic differentiation across populations, which allows for the identification of possible causative regions/genes underlying important traits. The pig has experienced a long history of allele frequency changes through artificial selection in the domestication process. We obtained an average of 329,482,871 sequence reads for 24 pigs from three pig breeds: Yorkshire (n = 5), Landrace (n = 13), and Duroc (n = 6). An average read depth of 11.7× was obtained using whole-genome resequencing on an Illumina HiSeq2000 platform. In this study, cross-population extended haplotype homozygosity and cross-population composite likelihood ratio tests were implemented to detect genes experiencing positive selection for the genome-wide resequencing data generated from three commercial pig breeds. In our results, 26, 7, and 14 genes from Yorkshire, Landrace, and Duroc, respectively, were detected by two kinds of statistical tests. Significant evidence for positive selection was identified on genes ST6GALNAC2 and EPHX1 in Yorkshire, PARK2 in Landrace, and BMP6, SLA-DQA1, and PRKG1 in Duroc. These genes are reportedly relevant to lactation, reproduction, meat quality, and growth traits. To understand how these positive-selection-related single nucleotide polymorphisms (SNPs) affect protein function, we analyzed the effect of non-synonymous SNPs. Three SNPs (rs324509622, rs80931851, and rs80937718) in the SLA-DQA1 gene were significant in the enrichment tests, indicating strong evidence for positive selection in Duroc. Our analyses identified genes under positive selection for lactation, reproduction, and meat-quality and growth traits in Yorkshire, Landrace, and Duroc, respectively.

  8. Whole genome re-sequencing of date palms yields insights into diversification of a fruit tree crop

    Science.gov (United States)

    Hazzouri, Khaled M.; Flowers, Jonathan M.; Visser, Hendrik J.; Khierallah, Hussam S. M.; Rosas, Ulises; Pham, Gina M.; Meyer, Rachel S.; Johansen, Caryn K.; Fresquez, Zoë A.; Masmoudi, Khaled; Haider, Nadia; El Kadri, Nabila; Idaghdour, Youssef; Malek, Joel A.; Thirkhill, Deborah; Markhand, Ghulam S.; Krueger, Robert R.; Zaid, Abdelouahhab; Purugganan, Michael D.

    2015-01-01

    Date palms (Phoenix dactylifera) are the most significant perennial crop in arid regions of the Middle East and North Africa. Here, we present a comprehensive catalogue of approximately seven million single nucleotide polymorphisms in date palms based on whole genome re-sequencing of a collection of 62 cultivars. Population structure analysis indicates a major genetic divide between North Africa and the Middle East/South Asian date palms, with evidence of admixture in cultivars from Egypt and Sudan. Genome-wide scans for selection suggest at least 56 genomic regions associated with selective sweeps that may underlie geographic adaptation. We report candidate mutations for trait variation, including nonsense polymorphisms and presence/absence variation in gene content in pathways for key agronomic traits. We also identify a copia-like retrotransposon insertion polymorphism in the R2R3 myb-like orthologue of the oil palm virescens gene associated with fruit colour variation. This analysis documents patterns of post-domestication diversification and provides a genomic resource for this economically important perennial tree crop. PMID:26549859

  9. Facile whole mitochondrial genome resequencing from nipple aspirate fluid using MitoChip v2.0

    Directory of Open Access Journals (Sweden)

    Thayer Robert E

    2008-04-01

    Full Text Available Abstract Background Mutations in the mitochondrial genome (mtgenome) have been associated with many disorders, including breast cancer. Nipple aspirate fluid (NAF) from symptomatic women could potentially serve as a minimally invasive sample for breast cancer screening by detecting somatic mutations in this biofluid. This study is aimed at 1) demonstrating the feasibility of NAF recovery from symptomatic women, 2) examining the feasibility of sequencing the entire mitochondrial genome from NAF samples, 3) cross-validating the Human mitochondrial resequencing array 2.0 (MCv2), and 4) assessing the somatic mtDNA mutation rate in benign breast diseases as a potential tool for monitoring early somatic mutations associated with breast cancer. Methods NAF and blood were obtained from women with symptomatic benign breast conditions, and we successfully assessed the mutation load in the entire mitochondrial genome of 19 of these women. DNA extracts from NAF were sequenced using the mitochondrial resequencing array MCv2 and by capillary electrophoresis (CE) methods as a quality comparison. Sequencing was performed independently at two institutions and the results compared. The germline mtDNA sequence determined using DNA isolated from the patient's blood (control) was compared to the mutations present in cellular mtDNA recovered from the patient's NAF. Results From the cohort of 28 women recruited for this study, NAF was successfully recovered from 23 participants (82%). Twenty-two (96%) of the women produced fluids from both breasts. Twenty NAF samples and corresponding blood were chosen for this study. Except for one NAF sample, the whole mtgenome was successfully amplified using a single primer pair, or three pairs of overlapping primers. Comparison of MCv2 data from the two institutions demonstrates 99.200% concordance. Moreover, MCv2 data were 99.999% identical to CE sequencing, indicating that MCv2 is a reliable method to rapidly sequence the entire mtgenome
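
    The abstract reports concordance percentages but not how they were computed. One common definition counts the positions where two aligned sequences carry the same base call and excludes positions where either platform returned no call ("N"). The sketch below implements that definition as an assumption; the function and the toy sequences are hypothetical.

```python
def percent_concordance(seq_a: str, seq_b: str, no_call: str = "N") -> float:
    """Percentage of identical base calls between two pre-aligned sequences.

    Positions where either sequence has a no-call are skipped (an assumed
    convention). Both inputs must already be aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == no_call or b == no_call:
            continue  # no-calls are excluded from the denominator
        compared += 1
        matches += a == b  # True counts as 1
    if compared == 0:
        raise ValueError("no positions with calls on both platforms")
    return 100.0 * matches / compared


# Toy aligned calls: one no-call (position 5) and one mismatch (last base).
print(percent_concordance("ACGTNACGTA", "ACGTAACGTT"))
```

    How no-calls are handled changes the result. A reported concordance figure should therefore state whether no-calls were excluded from the denominator, as in this sketch, or counted as mismatches.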

  10. Quantitative trait locus mapping and candidate gene analysis for plant architecture traits using whole genome re-sequencing in rice.

    Science.gov (United States)

    Lim, Jung-Hyun; Yang, Hyun-Jung; Jung, Ki-Hong; Yoo, Soo-Cheul; Paek, Nam-Chon

    2014-02-01

    Plant breeders have focused on improving plant architecture as an effective means to increase crop yield. Here, we identify the main-effect quantitative trait loci (QTLs) for plant shape-related traits in rice (Oryza sativa) and find candidate genes by applying whole genome re-sequencing of two parental cultivars using next-generation sequencing. To identify QTLs influencing plant shape, we analyzed six traits: plant height, tiller number, panicle diameter, panicle length, flag leaf length, and flag leaf width. We performed QTL analysis with 178 F7 recombinant inbred lines (RILs) from a cross of japonica rice line 'SNUSG1' and indica rice line 'Milyang23'. Using 131 molecular markers, including 28 insertion/deletion markers, we identified 11 main- and 16 minor-effect QTLs for the six traits with a threshold LOD value > 2.8. Our sequence analysis identified fifty-four candidate genes for the main-effect QTLs. By further comparison of coding sequences and meta-expression profiles between japonica and indica rice varieties, we finally chose 15 strong candidate genes for the 11 main-effect QTLs. Our study shows that the whole-genome sequence data substantially enhanced the efficiency of polymorphic marker development for QTL fine-mapping and the identification of possible candidate genes. This yields useful genetic resources for breeding high-yielding rice cultivars with improved plant architecture.

  11. Genome-Scale Models

    DEFF Research Database (Denmark)

    Bergdahl, Basti; Sonnenschein, Nikolaus; Machado, Daniel

    2016-01-01

    An introduction to genome-scale models, how to build and use them, will be given in this chapter. Genome-scale models have become an important part of systems biology and metabolic engineering, and are increasingly used in research, both in academia and in industry, both for modeling chemical...

  12. Whole genome re-sequencing identifies a mutation in an ABC transporter (mdr2) in a Plasmodium chabaudi clone with altered susceptibility to antifolate drugs

    OpenAIRE

    Martinelli, Axel; Henriques, Gisela; Cravo, Pedro; Hunt, Paul

    2011-01-01

    In malaria parasites, mutations in two genes of folate biosynthesis encoding dihydrofolate reductase (dhfr) and dihydropteroate synthase (dhps) modify responses to antifolate therapies which target these enzymes. However, the involvement of other genes which modify the availability of exogenous folate, for example, has been proposed. Here, we used short-read whole-genome re-sequencing to determine the mutations in a clone of the rodent malaria parasite, Plasmodium chabaudi, which has altered ...

  13. Mutation Detection with Next-Generation Resequencing through a Mediator Genome

    Energy Technology Data Exchange (ETDEWEB)

    Wurtzel, Omri; Dori-Bachash, Mally; Pietrokovski, Shmuel; Jurkevitch, Edouard; Sorek, Rotem; Ben-Jacob, Eshel

    2010-12-31

    The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation for this approach is the need for an a priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genomes of related organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligate bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.
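
    The core filtering step of the mediator-genome framework can be read as a set difference: take the variants called in the mutant against the mediator and remove those also called in the WT against the same mediator. The sketch below assumes both variant sets are already called against the mediator and normalized to identical (contig, position, ref, alt) records. The names and coordinates are hypothetical. It also ignores mediator positions the WT reads failed to cover; a real analysis would need to exclude these, because otherwise they would appear mutant-specific.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    contig: str
    pos: int  # 1-based coordinate on the mediator genome
    ref: str
    alt: str


def mutant_specific(mutant_vs_mediator: set[Variant], wt_vs_mediator: set[Variant]) -> set[Variant]:
    """Variants in the mutant-vs-mediator calls that are absent from the WT-vs-mediator calls.

    Differences shared by both strains reflect divergence between the WT strain
    and the mediator genome, so they are discarded; what remains are candidate
    mutations acquired by the mutant.
    """
    return mutant_vs_mediator - wt_vs_mediator


# Hypothetical calls on a hypothetical mediator contig.
shared = Variant("mediator_chr", 1200, "A", "G")     # WT-vs-mediator divergence
acquired = Variant("mediator_chr", 53001, "C", "T")  # candidate acquired mutation
mutant_calls = {shared, acquired}
wt_calls = {shared, Variant("mediator_chr", 8000, "G", "A")}
print(mutant_specific(mutant_calls, wt_calls))
```

    Using hashable named tuples makes the comparison exact. This is also why variant normalization matters: the same indel written at two different positions would not cancel out.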

  14. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.)

    Directory of Open Access Journals (Sweden)

    Gu Minghong

    2010-11-01

    Full Text Available Abstract Background Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs) for the analysis of complex traits in plant molecular genetics. Results In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar as the recipient and Nipponbare, a japonica cultivar as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, 2.37 times that of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing with a 0.13× genome sequence, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map indicated that 117 new segments were detected; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. Conclusions The present results demonstrated that high
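
    At 0.13× coverage most SNPs are covered by at most one read. Per-site genotypes are therefore unreliable and are usually pooled along the chromosome before breakpoints are called. The sketch below illustrates one generic window-based approach, not necessarily the one used by the authors. It assumes each covered SNP has already been labeled with the recipient ("R", 9311) or donor ("D", Nipponbare) allele and ignores heterozygosity. The window size and majority threshold are hypothetical.

```python
from collections import Counter


def call_segments(calls, window=15, min_fraction=0.8):
    """Group sparse per-SNP parental calls into genotype segments.

    calls: list of (position, allele) tuples sorted by position, where allele
           is "R" (recipient) or "D" (donor).
    Each non-overlapping block of `window` SNPs gets its majority allele if that
    allele reaches `min_fraction` of the block, otherwise "?" (ambiguous).
    Adjacent blocks with the same label are merged into (start, end, label).
    """
    segments = []
    for i in range(0, len(calls), window):
        chunk = calls[i:i + window]
        allele, count = Counter(a for _, a in chunk).most_common(1)[0]
        label = allele if count / len(chunk) >= min_fraction else "?"
        start, end = chunk[0][0], chunk[-1][0]
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)  # extend previous segment
        else:
            segments.append((start, end, label))
    return segments


# Hypothetical line: recipient background with one donor segment and one miscalled SNP.
alleles = ["R"] * 30 + ["D"] * 30 + ["R"] * 30
alleles[5] = "D"  # simulated sequencing error, outvoted within its window
calls = [(i * 1000, a) for i, a in enumerate(alleles)]
print(call_segments(calls))  # [(0, 29000, 'R'), (30000, 59000, 'D'), (60000, 89000, 'R')]
```

    A breakpoint then lies between the end of one segment and the start of the next. Smaller windows sharpen breakpoint resolution, at the cost of more miscalled windows when per-SNP error rates are high.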

  15. Developing high throughput genotyped chromosome segment substitution lines based on population whole-genome re-sequencing in rice (Oryza sativa L.).

    Science.gov (United States)

    Xu, Jianjun; Zhao, Qiang; Du, Peina; Xu, Chenwu; Wang, Baohe; Feng, Qi; Liu, Qiaoquan; Tang, Shuzhu; Gu, Minghong; Han, Bin; Liang, Guohua

    2010-11-24

    Genetic populations provide the basis for a wide range of genetic and genomic studies and have been widely used in genetic mapping, gene discovery and genomics-assisted breeding. Chromosome segment substitution lines (CSSLs) are the most powerful tools for the detection and precise mapping of quantitative trait loci (QTLs) for the analysis of complex traits in plant molecular genetics. In this study, a wide population consisting of 128 CSSLs was developed, derived from the crossing and back-crossing of two sequenced rice cultivars: 9311, an elite indica cultivar as the recipient and Nipponbare, a japonica cultivar as the donor. First, a physical map of the 128 CSSLs was constructed on the basis of estimates of the lengths and locations of the substituted chromosome segments using 254 PCR-based molecular markers. From this map, the total size of the 142 substituted segments in the population was 882.2 Mb, 2.37 times that of the rice genome. Second, every CSSL underwent high-throughput genotyping by whole-genome re-sequencing with a 0.13× genome sequence, and an ultrahigh-quality physical map was constructed. This sequencing-based physical map indicated that 117 new segments were detected; almost all were shorter than 3 Mb and were not apparent in the molecular marker map. Furthermore, relative to the molecular marker-based map, the sequencing-based map yielded more precise recombination breakpoint determination and greater accuracy of the lengths of the substituted segments, and provided more accurate background information. Third, using the 128 CSSLs combined with the bin-map converted from the sequencing-based physical map, a multiple linear regression QTL analysis mapped nine QTLs, which explained 89.50% of the phenotypic variance for culm length. A large-effect QTL was located in a 791,655 bp region that contained the rice 'green revolution' gene. The present results demonstrated that high throughput genotyped CSSLs combine the advantages of an ultrahigh

  16. High-Resolution Mapping of Crossover and Non-crossover Recombination Events by Whole-Genome Re-sequencing of an Avian Pedigree.

    Directory of Open Access Journals (Sweden)

    Linnéa Smeds

    2016-05-01

    Full Text Available Recombination is an engine of genetic diversity and therefore constitutes a key process in evolutionary biology and genetics. While the outcome of crossover recombination can readily be detected as shuffled alleles by following the inheritance of markers in pedigreed families, the more precise location of both crossover and non-crossover recombination events has been difficult to pinpoint. As a consequence, we lack a detailed portrait of the recombination landscape for most organisms and knowledge of how this landscape affects sequence evolution at a local scale. To localize recombination events with high resolution in an avian system, we performed whole-genome re-sequencing at high coverage of a complete three-generation collared flycatcher pedigree. We identified 325 crossovers at a median resolution of 1.4 kb, with 86% of the events localized to <10 kb intervals. Observed crossover rates were in excellent agreement with data from linkage mapping, were 52% higher in male (3.56 cM/Mb) than in female meiosis (2.28 cM/Mb), and increased towards chromosome ends in male but not female meiosis. Crossover events were non-randomly distributed in the genome with several distinct hot-spots and a concentration in genic regions, with the highest density in promoters and CpG islands. We further identified 267 non-crossovers, whose location was significantly associated with crossover locations. We detected a significant transmission bias (0.18) in favour of 'strong' (G, C) over 'weak' (A, T) alleles at non-crossover events, providing direct evidence for the process of GC-biased gene conversion in an avian system. The approach taken in this study should be applicable to any species and would thereby help to provide a more comprehensive portrayal of the recombination landscape across organism groups.
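
    The resolution of a crossover call is set by the spacing of informative markers. The event lies somewhere between the last marker inherited from one grandparental haplotype and the first marker from the other. The sketch below illustrates this logic on phased markers whose grandparental origin ("A" or "B") is assumed to be known. A short run of switched markers followed by a return to the original origin is treated as a candidate non-crossover. The `max_nco_markers` threshold and the toy data are hypothetical. A real pipeline would also have to rule out genotyping errors, which can mimic non-crossovers.

```python
from itertools import groupby


def runs(markers):
    """Collapse ordered (position, origin) markers into (origin, first_pos, last_pos, n_markers) runs."""
    out = []
    for origin, group in groupby(markers, key=lambda m: m[1]):
        positions = [pos for pos, _ in group]
        out.append((origin, positions[0], positions[-1], len(positions)))
    return out


def classify_events(markers, max_nco_markers=2):
    """Return (crossover_intervals, non_crossover_spans) for one transmitted chromosome.

    A crossover interval is (last marker before the switch, first marker after it).
    A run of <= max_nco_markers markers that is followed by a return to the
    previous origin is reported as a candidate non-crossover (span of the
    converted markers).
    """
    r = runs(markers)
    crossovers, non_crossovers = [], []
    i = 1
    while i < len(r):
        short = r[i][3] <= max_nco_markers
        flanked = i + 1 < len(r) and r[i - 1][0] == r[i + 1][0]
        if short and flanked:
            non_crossovers.append((r[i][1], r[i][2]))
            i += 2  # the next run resumes the original origin; continue after it
        else:
            crossovers.append((r[i - 1][2], r[i][1]))
            i += 1
    return crossovers, non_crossovers


markers = [(100, "A"), (200, "A"), (300, "A"), (400, "B"), (450, "A"),
           (500, "A"), (900, "A"), (1300, "B"), (1400, "B"), (1500, "B")]
crossovers, non_crossovers = classify_events(markers)
print(crossovers, non_crossovers)  # [(900, 1300)] [(400, 400)]
for left, right in crossovers:
    print("crossover resolution (bp):", right - left)  # 400
```

    The toy crossover can only be placed within a 400-bp interval. In the same way, a median resolution such as the study's 1.4 kb is bounded by how densely informative markers are spaced.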

  17. Complete resequencing of 40 genomes reveals domestication events and genes in silkworm (Bombyx)

    DEFF Research Database (Denmark)

    Xia, Qingyou; Guo, Yiran; Zhang, Ze

    2009-01-01

    A single-base pair resolution silkworm genetic variation map was constructed from 40 domesticated and wild silkworms, each sequenced to approximately threefold coverage, representing 99.88% of the genome. We identified ~16 million single-nucleotide polymorphisms, many indels, and structural...... genes that may have been important during domestication, some of which have enriched expression in the silk gland, midgut, and testis. These data add to our understanding of the domestication processes and may have applications in devising pest control strategies and advancing the use of silkworms...

  18. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing still remains too high to be feasible for many plant species with large and com...

  19. Genome-wide DNA polymorphism in the indica rice varieties RGD-7S and Taifeng B as revealed by whole genome re-sequencing.

    Science.gov (United States)

    Fu, Chong-Yun; Liu, Wu-Ge; Liu, Di-Lin; Li, Ji-Hua; Zhu, Man-Shan; Liao, Yi-Long; Liu, Zhen-Rong; Zeng, Xue-Qin; Wang, Feng

    2016-03-01

    Next-generation sequencing technologies provide opportunities to further understand genetic variation, even within closely related cultivars. We performed whole genome resequencing of two elite indica rice varieties, RGD-7S and Taifeng B, whose F1 progeny showed hybrid weakness and hybrid vigor when grown in the early- and late-cropping seasons, respectively. Approximately 150 million 100-bp paired-end reads were generated, which covered ∼86% of the rice (Oryza sativa L. japonica 'Nipponbare') reference genome. A total of 2,758,740 polymorphic sites, including 2,408,845 SNPs and 349,895 InDels, were detected in RGD-7S and Taifeng B. Applying stringent parameters, we identified 961,791 SNPs and 46,640 InDels between RGD-7S and Taifeng B (RGD-7S/Taifeng B). The density of DNA polymorphisms was 256.8 SNPs and 12.5 InDels per 100 kb for RGD-7S/Taifeng B. Copy number variations (CNVs) were also investigated. In RGD-7S, 1,989 of 2,727 CNVs overlapped 218 genes, and in Taifeng B, 1,231 of 2,010 CNVs were annotated in 175 genes. In addition, we verified a subset of InDels in the intervals of the hybrid weakness genes Hw3 and Hw4 and obtained several polymorphic InDel markers, which will provide a sound foundation for cloning hybrid weakness genes. Analysis of genomic variations will also contribute to understanding the genetic basis of hybrid weakness and heterosis.
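    The per-100 kb densities above are counts divided by the length of sequence considered. The sketch below shows that arithmetic and, run in reverse, the sequence length each reported density implies. The abstract does not state which length was used as the denominator, so the back-calculated values are only a consistency check, not reported figures.

```python
def density_per_100kb(count, length_bp):
    """Number of variants per 100 kb of sequence."""
    return count / (length_bp / 100_000)


def implied_length_bp(count, density):
    """Sequence length (bp) implied by a variant count and a per-100 kb density."""
    return count / density * 100_000


# Counts and densities as reported for RGD-7S/Taifeng B.
snp_len = implied_length_bp(961_791, 256.8)
indel_len = implied_length_bp(46_640, 12.5)
print(f"SNPs imply   {snp_len / 1e6:.1f} Mb")    # SNPs imply   374.5 Mb
print(f"InDels imply {indel_len / 1e6:.1f} Mb")  # InDels imply 373.1 Mb
```

    The small gap between the two back-calculations is consistent with the InDel density having been rounded to one decimal place: dividing 46,640 InDels by the length implied by the SNP figures gives about 12.45 per 100 kb, which rounds to the reported 12.5.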

  20. Genome wide re-sequencing of newly developed Rice Lines from common wild rice (Oryza rufipogon Griff.) for the identification of NBS-LRR genes.

    Science.gov (United States)

    Liu, Wen; Ghouri, Fozia; Yu, Hang; Li, Xiang; Yu, Shuhong; Shahid, Muhammad Qasim; Liu, Xiangdong

    2017-01-01

    Common wild rice (Oryza rufipogon Griff.) is an important germplasm resource for rice breeding that contains many resistance genes. Re-sequencing provides an unprecedented opportunity to explore these useful genes at the whole-genome level. Here, we identified the nucleotide-binding site leucine-rich repeat (NBS-LRR) encoding genes by re-sequencing two wild rice lines (Huaye 1 and Huaye 2) that were developed from common wild rice. We obtained 128 to 147 million reads at approximately 32.5-fold coverage depth, uniquely covering more than 89.6% (≥1-fold) of the reference genomes. Both wild rice lines showed high SNP (single-nucleotide polymorphism) variation rates in all 12 chromosomes against the reference genomes of Nipponbare (japonica cultivar) and 93-11 (indica cultivar). The count-length distribution of InDels (insertion/deletion polymorphisms) followed a normal distribution in both lines, and most InDels ranged from -5 to 5 bp. With reference to the Nipponbare genome sequence, we detected a total of 1,209,308 SNPs, 161,117 InDels and 4,192 SVs (structural variations) in Huaye 1, and 1,387,959 SNPs, 180,226 InDels and 5,305 SVs in Huaye 2. In total, 44.9% and 46.9% of genes exhibited sequence variations in the two wild rice lines compared with the Nipponbare and 93-11 reference genomes, respectively. Analysis of NBS-LRR mutant candidate genes showed that they were mainly distributed on chromosome 11, and that the NBS domain was more conserved than the LRR domain in both wild rice lines. NBS genes showed higher levels of genetic diversity in Huaye 1 than in Huaye 2. Furthermore, protein-protein interaction analysis showed that NBS genes mostly interacted with the cytochrome C protein (Os05g0420600, Os01g0885000 and BGIOSGA038922), while some NBS genes interacted with heat shock protein, DNA-binding activity, Phosphoinositide 3-kinase and a coiled coil region.
We explored abundant NBS-LRR encoding genes in two common wild rice lines through genome-wide re-sequencing.

  1. Rice Chloroplast Genome Variation Architecture and Phylogenetic Dissection in Diverse Oryza Species Assessed by Whole-Genome Resequencing.

    Science.gov (United States)

    Tong, Wei; Kim, Tae-Sung; Park, Yong-Jin

    2016-12-01

    Chloroplast genome variations have been detected despite the genome's overall conserved structure, and they have been valuable for plant population genetics and evolutionary studies. Here, we describe the chloroplast variation architecture of 383 rice accessions from diverse regions and different ecotypes, in order to characterize rice chloroplast genome variation and phylogeny. A total of 3,677 variations across the chloroplast genome were identified, with an average density of 27.33 per kb, and wild rice showed a higher variation density than cultivated groups. Chloroplast genome nucleotide diversity analysis indicated a higher degree of diversity in wild rice than in cultivated rice. Genetic distance estimation revealed that African rice showed a low level of breeding and connectivity with Asian rice, suggesting a clear distinction between them. Population structure and principal component analysis revealed clear clustering of African and Asian rice, as well as of indica and japonica within Asian cultivated rice. Phylogenetic analysis based on maximum likelihood and Bayesian inference methods, together with the population splits test, supported independent origins of indica and japonica within Asian cultivated rice. In addition, African cultivated rice appears to have been domesticated differently from Asian cultivated rice. The chloroplast genome variation architecture differs between Asian and African rice, as well as within Asian or African rice. Wild rice and cultivated rice also differ in nucleotide diversity and genetic distance. At the chloroplast level, independent origins of indica and japonica within Asian cultivated rice were suggested, and African cultivated rice was thought to have been domesticated differently from Asian cultivated rice. These results provide further candidate evidence for future studies of rice chloroplast genomics and evolution.

  2. Whole-genome resequencing of honeybee drones to detect genomic selection in a population managed for royal jelly.

    Science.gov (United States)

    Wragg, David; Marti-Marimon, Maria; Basso, Benjamin; Bidanel, Jean-Pierre; Labarthe, Emmanuelle; Bouchez, Olivier; Le Conte, Yves; Vignal, Alain

    2016-06-03

    Four main evolutionary lineages of A. mellifera have been described, including the eastern European (C) and the western and northern European (M) lineages. Many apiculturists prefer bees from the C lineage due to their docility and high productivity. In France, the routine importation of bees from the C lineage has resulted in the widespread admixture of bees from the M lineage. The haplodiploid nature of the honeybee Apis mellifera, and its small genome size, permit affordable and extensive genomics studies. As a pilot study of a larger project to characterise French honeybee populations, we sequenced 60 drones sampled from two commercial populations managed for the production of honey and royal jelly. Results indicate a C lineage origin, whilst mitochondrial analysis suggests that two drones originated from the O lineage. Analysis of heterozygous SNPs identified potential copy number variants near genes encoding odorant binding proteins and several cytochrome P450 genes. Signatures of selection were detected using the hapFLK haplotype-based method, revealing several regions under putative selection for royal jelly production. The framework developed during this study will be applied to a broader sampling regime, allowing the genetic diversity of French honeybees to be characterised in detail.
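    Drones develop from unfertilized eggs and are haploid, so a single-copy locus should never be called heterozygous in a drone. Apparent heterozygous calls therefore point to reads from two or more similar copies of a sequence collapsing onto one reference position, which is why they can flag candidate copy number variants. The sketch below is a hypothetical illustration of that reasoning, not the study's method: the window size and the fraction of drones that must look heterozygous are arbitrary placeholder values, and genotypes are assumed to be VCF-style strings.

```python
from collections import defaultdict


def flag_het_windows(calls, window_bp=10_000, min_drone_fraction=0.5):
    """Return (chrom, window_start) windows where many haploid drones look heterozygous.

    calls: iterable of (drone_id, chrom, pos, genotype), where genotype is a
    VCF-style string such as "0", "1", "0/1" or "0|1". window_bp and
    min_drone_fraction are placeholder thresholds, not values from the study.
    """
    drones = set()
    het_drones = defaultdict(set)  # window -> drones with at least one het call
    for drone, chrom, pos, genotype in calls:
        drones.add(drone)
        alleles = set(genotype.replace("|", "/").split("/"))
        if len(alleles) > 1:  # two different alleles: impossible for one haploid copy
            het_drones[(chrom, pos // window_bp * window_bp)].add(drone)
    n_drones = len(drones)
    return sorted(
        window for window, carriers in het_drones.items()
        if len(carriers) / n_drones >= min_drone_fraction
    )


calls = [
    ("d1", "chr1", 100, "0"),
    ("d2", "chr1", 150, "0/1"),
    ("d3", "chr1", 5_000, "1/0"),
    ("d1", "chr2", 20_500, "0/1"),
    ("d2", "chr2", 30, "1"),
]
print(flag_het_windows(calls))  # [('chr1', 0)]
```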

  3. Bioinformatics pipelines for targeted resequencing and whole-exome sequencing of human and mouse genomes: a virtual appliance approach for instant deployment.

    Directory of Open Access Journals (Sweden)

    Jason Li

    Full Text Available Targeted resequencing by massively parallel sequencing has become an effective and affordable way to survey small to large portions of the genome for genetic variation. Despite the rapid development in open source software for analysis of such data, the practical implementation of these tools through construction of sequencing analysis pipelines still remains a challenging and laborious activity, and a major hurdle for many small research and clinical laboratories. We developed TREVA (Targeted REsequencing Virtual Appliance), making pre-built pipelines immediately available as a virtual appliance. Based on virtual machine technologies, TREVA is a solution for rapid and efficient deployment of complex bioinformatics pipelines to laboratories of all sizes, enabling reproducible results. The analyses that are supported in TREVA include: somatic and germline single-nucleotide and insertion/deletion variant calling, copy number analysis, and cohort-based analyses such as pathway and significantly mutated genes analyses. TREVA is flexible and easy to use, and can be customised by Linux-based extensions if required. TREVA can also be deployed on the cloud (cloud computing), enabling instant access without investment overheads for additional hardware. TREVA is available at http://bioinformatics.petermac.org/treva/.

  4. Mapping and Identifying a Candidate Gene (Bnmfs) for Female-Male Sterility through Whole-Genome Resequencing and RNA-Seq in Rapeseed (Brassica napus L.)

    Directory of Open Access Journals (Sweden)

    Changcai Teng

    2017-12-01

    Full Text Available In oilseed crops, carpel and stamen development play vital roles in pollination and rapeseed yield, but the genetic mechanisms underlying carpel and stamen development remain unclear. Herein, a male- and female-sterile mutant was obtained in the offspring of a (Brassica napus cv. Qingyou 14 × (Qingyou 14 × B. rapa landrace Dahuang)) cross. Subsequently, F2–F9 populations were generated through selfing of the heterozygous plants among the progeny of each generation. The male and female sterility showed stable inheritance in successive generations and was controlled by a recessive gene. The mutant kept the same chromosome number (2n = 38) as the B. napus parent but showed abnormal meiosis in both male and female. One candidate gene for the sterility was identified using simple sequence repeat (SSR) and insertion/deletion length polymorphism (InDel) markers in F7–F9 plants, whole-genome resequencing of F8 pools, and RNA sequencing of F9 pools. Whole-genome resequencing found three candidate intervals (35.40–35.68, 35.74–35.75, and 45.34–46.45 Mb) on chromosome C3 in B. napus, and the candidate region for Bnmfs was narrowed to approximately 1.11 Mb (45.34–46.45 Mb) by combining SSR and InDel marker analyses with whole-genome resequencing. In transcriptome profiling of 0–2 mm buds, all of the genes in the candidate interval were detected, and only two genes with significant differences (BnaC03g56670D and BnaC03g56870D) were revealed. BnaC03g56870D was a candidate gene that shared homology with the CYP86C4 gene of Arabidopsis thaliana. Quantitative reverse transcription PCR (qRT-PCR) analysis showed that Bnmfs primarily functioned in flower buds. Thus, sequencing and expression analyses provided evidence that BnaC03g56870D was the candidate gene for male and female sterility in the B. napus mutant.

  5. Experimental evolution, genetic analysis and genome re-sequencing reveal the mutation conferring artemisinin resistance in an isogenic lineage of malaria parasites

    KAUST Repository

    Hunt, Paul

    2010-09-16

    Background: Classical and quantitative linkage analyses of genetic crosses have traditionally been used to map genes of interest, such as those conferring chloroquine or quinine resistance in malaria parasites. Next-generation sequencing technologies now present the possibility of determining genome-wide genetic variation at single base-pair resolution. Here, we combine in vivo experimental evolution, a rapid genetic strategy and whole genome re-sequencing to identify the precise genetic basis of artemisinin resistance in a lineage of the rodent malaria parasite, Plasmodium chabaudi. Such genetic markers will further the investigation of resistance and its control in natural infections of the human malaria, P. falciparum. Results: A lineage of isogenic in vivo drug-selected mutant P. chabaudi parasites was investigated. By measuring the artemisinin responses of these clones, the appearance of an in vivo artemisinin resistance phenotype within the lineage was defined. The underlying genetic locus was mapped to a region of chromosome 2 by Linkage Group Selection in two different genetic crosses. Whole-genome deep coverage short-read re-sequencing (Illumina/Solexa) defined the point mutations, insertions, deletions and copy-number variations arising in the lineage. Eight point mutations arise within the mutant lineage, only one of which appears on chromosome 2. This missense mutation arises contemporaneously with artemisinin resistance and maps to a gene encoding a de-ubiquitinating enzyme. Conclusions: This integrated approach facilitates the rapid identification of mutations conferring selectable phenotypes, without prior knowledge of biological and molecular mechanisms. For malaria, this model can identify candidate genes before resistant parasites are commonly observed in natural human malaria populations. © 2010 Hunt et al.; licensee BioMed Central Ltd.
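    Because the clones in the lineage are isogenic, variants shared with the drug-sensitive progenitor cancel out, and what remains in a resistant descendant are the mutations that arose during selection; intersecting those with the locus mapped by Linkage Group Selection leaves the candidates. The sketch below shows only that set logic. It assumes variants have already been called and quality-filtered in each clone, and the tuple format, chromosome names and coordinates are hypothetical.

```python
def new_mutations(progenitor, descendant):
    """Variants called in the descendant clone but not in its progenitor.

    Each variant is a hypothetical (chrom, pos, ref, alt) tuple.
    """
    return set(descendant) - set(progenitor)


def in_region(variants, chrom, start=None, end=None):
    """Variants on `chrom`, optionally restricted to start <= pos <= end."""
    return sorted(
        v for v in variants
        if v[0] == chrom
        and (start is None or v[1] >= start)
        and (end is None or v[1] <= end)
    )


progenitor = {("chr2", 100, "A", "G")}
resistant = {("chr2", 100, "A", "G"), ("chr2", 500, "C", "T"), ("chr7", 42, "G", "A")}
arisen = new_mutations(progenitor, resistant)
print(in_region(arisen, "chr2"))  # [('chr2', 500, 'C', 'T')]
```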

  6. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    Directory of Open Access Journals (Sweden)

    Nedenia Bonvino Stafuzza

    Full Text Available Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). A total of approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer provided 10.7- to 16.4-fold genome coverage for each animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. The annotation of variants identified numerous non-synonymous SNVs and frameshift InDels which could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were over-represented in all four cattle breeds, while the ECM-receptor interaction pathway was over-represented in the Girolando and Guzerat breeds, the ABC transporters pathway only in the Holstein breed, and the metabolic pathways only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers and their associated molecular mechanisms that impact economically important traits for Gyr, Girolando, Guzerat and Holstein breeding programs.
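    The concordance rate between array and sequencing genotypes is the fraction of overlapping, non-missing genotype calls on which the two platforms agree. A minimal sketch, assuming genotypes have already been matched to the same alleles and strand and encoded as alternate-allele counts (0, 1, 2); the sample and marker identifiers are hypothetical.

```python
def genotype_concordance(array_calls, seq_calls):
    """Fraction of shared, non-missing (sample, marker) genotypes that agree.

    Both inputs map (sample_id, marker_id) -> alternate-allele count (0, 1 or 2),
    or None when the call is missing.
    """
    shared = [
        key for key in array_calls.keys() & seq_calls.keys()
        if array_calls[key] is not None and seq_calls[key] is not None
    ]
    if not shared:
        raise ValueError("no overlapping non-missing genotypes")
    agreeing = sum(array_calls[key] == seq_calls[key] for key in shared)
    return agreeing / len(shared)


array = {("s1", "m1"): 0, ("s1", "m2"): 1, ("s1", "m3"): 2, ("s2", "m1"): None}
seq = {("s1", "m1"): 0, ("s1", "m2"): 1, ("s1", "m3"): 1, ("s2", "m1"): 0, ("s2", "m9"): 2}
print(round(genotype_concordance(array, seq), 4))  # 0.6667
```

    Sites missing on either platform, and markers present on only one platform, are excluded from both numerator and denominator, so the rate reflects agreement only where both technologies made a call.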

  7. Genome resequencing and transcriptome profiling reveal structural diversity and expression patterns of constitutive disease resistance genes in Huanglongbing-tolerant Poncirus trifoliata and its hybrids

    Science.gov (United States)

    Rawat, Nidhi; Kumar, Brajendra; Albrecht, Ute; Du, Dongliang; Huang, Ming; Yu, Qibin; Zhang, Yi; Duan, Yong-Ping; Bowman, Kim D; Gmitter, Fred G; Deng, Zhanao

    2017-01-01

    Huanglongbing (HLB) is the most destructive bacterial disease of citrus worldwide. While most citrus varieties are susceptible to HLB, Poncirus trifoliata, a close relative of Citrus, and some of its hybrids with Citrus are tolerant to HLB. No specific HLB tolerance genes have been identified in P. trifoliata, but recent studies have shown that constitutive disease resistance (CDR) genes were expressed at much higher levels in HLB-tolerant Poncirus hybrids and that the expression of CDR genes was modulated by Candidatus Liberibacter asiaticus (CLas), the pathogen of HLB. The current study was undertaken to mine and characterize the CDR gene family in Citrus and Poncirus and to understand its association with HLB tolerance in Poncirus. We identified 17 CDR genes in two citrus genomes, deduced their structures, and investigated their phylogenetic relationships. We revealed that the expansion of the CDR family in Citrus seems to be due to segmental and tandem duplication events. Through genome resequencing and transcriptome sequencing, we identified eight CDR genes in the Poncirus genome (PtCDR1-PtCDR8). The number of SNPs was the highest in PtCDR2 and the lowest in PtCDR7. Most of the deletion and insertion events were observed in the UTR regions of Citrus and Poncirus CDR genes. PtCDR2 and PtCDR8 were abundant in the leaf transcriptomes of two HLB-tolerant Poncirus genotypes and were also upregulated in HLB-tolerant Poncirus hybrids, as revealed by real-time PCR analysis. These two CDR genes seem to be good candidate genes for future studies of their role in citrus-CLas interactions. PMID:29152310

  8. Application of whole genome re-sequencing data in the development of diagnostic DNA markers tightly linked to a disease-resistance locus for marker-assisted selection in lupin (Lupinus angustifolius).

    Science.gov (United States)

    Yang, Huaan; Jian, Jianbo; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark W; Tan, Cong; Li, Chengdao

    2015-09-02

    Molecular marker-assisted breeding provides an efficient tool to develop improved crop varieties. A major challenge for the broad application of markers in marker-assisted selection is that the marker phenotypes must match plant phenotypes in a wide range of breeding germplasm. In this study, we used the legume crop species Lupinus angustifolius (lupin) to demonstrate the utility of whole genome sequencing and re-sequencing for the development of diagnostic markers for molecular plant breeding. Nine lupin cultivars released in Australia from 1973 to 2007 were subjected to whole genome re-sequencing. The re-sequencing data together with the reference genome sequence data were used in marker development, which revealed 180,596 to 795,735 SNP markers from pairwise comparisons among the cultivars. A total of 207,887 markers were anchored on the lupin genetic linkage map. Marker mining obtained an average of 387 SNP markers and 87 InDel markers for each of the 24 genome sequence assembly scaffolds bearing markers linked to 11 genes of agronomic interest. Using the R gene PhtjR conferring resistance to phomopsis stem blight disease as a test case, we discovered 17 candidate diagnostic markers by genotyping and selecting markers on a genetic linkage map. A further 243 candidate diagnostic markers were discovered by marker mining on a scaffold bearing non-diagnostic markers linked to the PhtjR gene. Nine of the ten tested candidate diagnostic markers were confirmed as truly diagnostic on a broad range of commercial cultivars. Markers developed using these strategies meet the requirements for broad application in molecular plant breeding. We demonstrated that low-cost genome sequencing and re-sequencing data were sufficient and very effective in the development of diagnostic markers for marker-assisted selection. The strategies used in this study may be applied to any trait or plant species.
Whole genome sequencing and re-sequencing provides a powerful tool to overcome
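    The requirement that marker phenotypes match plant phenotypes across a wide range of germplasm can be expressed as a simple check: a candidate marker counts as diagnostic only if its allele predicts the phenotype in every cultivar tested. The sketch below is a hypothetical illustration of that criterion; cultivar names, alleles and the choice of which allele marks resistance are placeholders.

```python
def is_diagnostic(marker_alleles, resistant, resistance_allele):
    """True if the marker allele agrees with the phenotype in every tested cultivar.

    marker_alleles: dict cultivar -> allele at the marker (None if not genotyped)
    resistant: dict cultivar -> True if the cultivar is resistant
    resistance_allele: allele assumed to be linked to the resistance gene
    """
    tested = [c for c in resistant if marker_alleles.get(c) is not None]
    if not tested:
        return False  # nothing to validate against
    return all((marker_alleles[c] == resistance_allele) == resistant[c] for c in tested)


alleles = {"cv_A": "T", "cv_B": "C", "cv_C": "T"}
print(is_diagnostic(alleles, {"cv_A": True, "cv_B": False, "cv_C": True}, "T"))   # True
print(is_diagnostic(alleles, {"cv_A": True, "cv_B": False, "cv_C": False}, "T"))  # False
```

    Under this criterion a single cultivar in which allele and phenotype disagree, for example because of recombination between the marker and the gene, is enough to reject the marker.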

  9. The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions

    DEFF Research Database (Denmark)

    Guo, Shaogui; Zhang, Jianguo; Sun, Honghe

    2013-01-01

    Watermelon, Citrullus lanatus, is an important cucurbit crop grown throughout the world. Here we report a high-quality draft genome sequence of the east Asia watermelon cultivar 97103 (2n = 2× = 22) containing 23,440 predicted protein-coding genes. Comparative genomics analysis provided an evolut...

  10. Whole genome re-sequencing identifies a mutation in an ABC transporter (mdr2) in a Plasmodium chabaudi clone with altered susceptibility to antifolate drugs.

    Science.gov (United States)

    Martinelli, Axel; Henriques, Gisela; Cravo, Pedro; Hunt, Paul

    2011-02-01

    In malaria parasites, mutations in two genes of folate biosynthesis encoding dihydrofolate reductase (dhfr) and dihydropteroate synthase (dhps) modify responses to antifolate therapies which target these enzymes. However, the involvement of other genes which modify the availability of exogenous folate, for example, has been proposed. Here, we used short-read whole-genome re-sequencing to determine the mutations in a clone of the rodent malaria parasite, Plasmodium chabaudi, which has altered susceptibility to both sulphadoxine and pyrimethamine. This clone bears a previously identified S106N mutation in dhfr and no mutation in dhps. Instead, three additional point mutations in genes on chromosomes 2, 13 and 14 were identified. The mutated gene on chromosome 13 (mdr2 K392Q) encodes an ABC transporter. Because Quantitative Trait Locus analysis previously indicated an association of genetic markers on chromosome 13 with responses to individual and combined antifolates, MDR2 is proposed to modulate antifolate responses, possibly mediated by the transport of folate intermediates. Copyright © 2010 Australian Society for Parasitology Inc. All rights reserved.

  11. Scanning and Filling: Ultra-Dense SNP Genotyping Combining Genotyping-By-Sequencing, SNP Array and Whole-Genome Resequencing Data.

    Directory of Open Access Journals (Sweden)

    Davoud Torkamaneh

    Full Text Available Genotyping-by-sequencing (GBS) represents a highly cost-effective high-throughput genotyping approach. By nature, however, GBS is subject to generating sizeable amounts of missing data and these will need to be imputed for many downstream analyses. The extent to which such missing data can be tolerated in calling SNPs has not been explored widely. In this work, we first explore the use of imputation to fill in missing genotypes in GBS datasets. Importantly, we use whole genome resequencing data to assess the accuracy of the imputed data. Using a panel of 301 soybean accessions, we show that over 62,000 SNPs could be called when tolerating up to 80% missing data, a five-fold increase over the number called when tolerating up to 20% missing data. At all levels of missing data examined (between 20% and 80%), the resulting SNP datasets were of uniformly high accuracy (96-98%). We then used imputation to combine complementary SNP datasets derived from GBS and a SNP array (SoySNP50K). We thus produced an enhanced dataset of >100,000 SNPs and the genotypes at the previously untyped loci were again imputed with a high level of accuracy (95%). Of the >4,000,000 SNPs identified through resequencing 23 accessions (among the 301 used in the GBS analysis), 1.4 million tag SNPs were used as a reference to impute this large set of SNPs on the entire panel of 301 accessions. These previously untyped loci could be imputed with around 90% accuracy. Finally, we used the 100K SNP dataset (GBS + SoySNP50K) to perform a GWAS on seed oil content within this collection of soybean accessions. Both the number of significant marker-trait associations and the peak significance levels were improved considerably using this enhanced catalog of SNPs relative to a smaller catalog resulting from GBS alone at ≤20% missing data. Our results demonstrate that imputation can be used to fill in both missing genotypes and untyped loci with very high accuracy and that this leads to more
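    The trade-off described above can be sketched in three small steps: filter SNPs by their fraction of missing calls, fill the remaining gaps, then score the filled calls against resequencing genotypes. This is only an illustration under stated assumptions: the imputation shown is a naive fill with the most common observed call at each SNP, chosen for brevity and not presented as the method used in the study, and the data structures and names are hypothetical.

```python
from collections import Counter


def filter_by_missing(genotypes, max_missing):
    """Keep SNPs whose fraction of missing calls (None) is at most max_missing.

    genotypes: dict snp_id -> list of calls across accessions (None = missing).
    """
    return {
        snp: calls for snp, calls in genotypes.items()
        if calls.count(None) / len(calls) <= max_missing
    }


def impute_major_call(calls):
    """Naive placeholder imputation: fill gaps with the most common observed call."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return list(calls)  # nothing to impute from
    major = Counter(observed).most_common(1)[0][0]
    return [major if c is None else c for c in calls]


def imputation_accuracy(raw, imputed, truth):
    """Share of originally missing calls whose imputed value matches the truth call."""
    scored = [(i, t) for r, i, t in zip(raw, imputed, truth) if r is None and t is not None]
    if not scored:
        return None
    return sum(i == t for i, t in scored) / len(scored)


gbs = {"snp1": ["A", "A", None, "A"], "snp2": [None, None, None, "G"]}
print(sorted(filter_by_missing(gbs, 0.25)))  # ['snp1']
print(sorted(filter_by_missing(gbs, 0.80)))  # ['snp1', 'snp2']

raw = ["A", "A", None, "G", None]
filled = impute_major_call(raw)          # ['A', 'A', 'A', 'G', 'A']
truth = ["A", "A", "A", "G", "G"]        # e.g. calls from resequencing
print(imputation_accuracy(raw, filled, truth))  # 0.5
```

    Raising `max_missing` keeps more SNPs but leaves more calls to impute; scoring only the originally missing positions keeps observed calls from inflating the accuracy estimate.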

  12. Genome-wide SNPs and re-sequencing of growth habit and inflorescence genes in barley: implications for association mapping in germplasm arrays varying in size and structure

    Directory of Open Access Journals (Sweden)

    Muehlbauer Gary J

    2010-12-01

    Full Text Available Abstract Background Considerations in applying association mapping (AM) to plant breeding are population structure and size: not accounting for structure and/or using small populations can lead to elevated false-positive rates. The principal determinants of population structure in cultivated barley are growth habit and inflorescence type. Both are under complex genetic control: growth habit is controlled by the epistatic interactions of several genes. For inflorescence type, multiple loss-of-function alleles in one gene lead to the same phenotype. We used these two traits as models for assessing the effectiveness of AM. This research was initiated using the CAP Core germplasm array (n = 102) assembled at the start of the Barley Coordinated Agricultural Project (CAP). This array was genotyped with 4,608 SNPs and we re-sequenced genes involved in morphology, growth and development. Larger arrays of breeding germplasm were subsequently genotyped and phenotyped under the auspices of the CAP project. This provided sets of 247 accessions phenotyped for growth habit and 2,473 accessions phenotyped for inflorescence type. Each of the larger populations was genotyped with 3,072 SNPs derived from the original set of 4,608. Results Significant associations with SNPs located in the vicinity of the loci involved in growth habit and inflorescence type were found in the CAP Core. Differentiation of true and spurious associations was not possible without a priori knowledge of the candidate genes, based on re-sequencing. The re-sequencing data were used to define allele types of the determinant genes based on functional polymorphisms. In a second round of association mapping, these synthetic markers based on allele types gave the most significant associations. 
When the synthetic markers were used as anchor points for analysis of interactions, we detected other known-function genes and candidate loci involved in the control of growth habit and inflorescence type. We

  13. A re-sequencing based assessment of genomic heterogeneity and fast neutron-induced deletions in a common bean cultivar

    Directory of Open Access Journals (Sweden)

    Jamie A. O'Rourke

    2013-06-01

    Full Text Available A small fast neutron mutant population has been established from Phaseolus vulgaris cv. Red Hawk. We leveraged the available P. vulgaris genome sequence and high-throughput next-generation DNA sequencing to examine the genomic structure of five Phaseolus vulgaris cv. Red Hawk fast neutron mutants with striking visual phenotypes. Analysis of these genomes identified three classes of structural variation: between-cultivar variation, natural variation within the fast neutron mutant population, and fast neutron-induced mutagenesis. Our analyses focused on the latter two classes. We identified 23 large deletions (>40 bp) common to multiple individuals, illustrating residual heterogeneity and regions of structural variation within the common bean cv. Red Hawk. An additional 18 large deletions were identified in individual mutant plants. These deletions, ranging in size from 40 bp to 43,000 bp, are potentially the result of fast neutron mutagenesis. Six of the 18 deletions lie near or within gene coding regions, identifying potential candidate genes causing the mutant phenotype.
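    The distinction drawn above rests on sharing: a large deletion carried by several mutant plants more plausibly reflects pre-existing heterogeneity in the cultivar, while one found in a single plant is a candidate fast neutron-induced lesion. The sketch below applies that rule. It is a simplified, hypothetical illustration: deletions are matched only by identical coordinates, whereas real breakpoint calls usually need a tolerance window, and the input format is assumed.

```python
from collections import defaultdict


def classify_deletions(deletions, min_len_bp=40):
    """Split large deletions into shared (several plants) and private (one plant).

    deletions: iterable of (plant_id, chrom, start, end) with end exclusive.
    Deletions are matched by identical (chrom, start, end), a simplifying
    assumption; only deletions longer than min_len_bp are kept.
    """
    carriers = defaultdict(set)
    for plant, chrom, start, end in deletions:
        if end - start > min_len_bp:
            carriers[(chrom, start, end)].add(plant)
    shared = {d: plants for d, plants in carriers.items() if len(plants) > 1}
    private = {d: plants for d, plants in carriers.items() if len(plants) == 1}
    return shared, private


calls = [
    ("m1", "chr1", 100, 200),
    ("m2", "chr1", 100, 200),
    ("m3", "chr2", 500, 43_500),
    ("m1", "chr3", 10, 30),  # 20 bp: below the size cut-off
]
shared, private = classify_deletions(calls)
print({d: sorted(p) for d, p in shared.items()})   # {('chr1', 100, 200): ['m1', 'm2']}
print({d: sorted(p) for d, p in private.items()})  # {('chr2', 500, 43500): ['m3']}
```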

  14. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    Directory of Open Access Journals (Sweden)

    Jie Qiu

    Full Text Available Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze regions were deeply sequenced while nine other semi-wild lines were sequenced to a 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than an independent one or an intermediate transition type of soybean domestication; (3) high heterozygous rates (0.19-0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggested a hybridization origin for the semi-wild soybean, which makes a complex Soja population structure.

  15. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    Science.gov (United States)

    Qiu, Jie; Wang, Yu; Wu, Sanling; Wang, Ying-Ying; Ye, Chu-Yu; Bai, Xuefei; Li, Zefeng; Yan, Chenghai; Wang, Weidi; Wang, Ziqiang; Shu, Qingyao; Xie, Jiahua; Lee, Suk-Ha; Fan, Longjiang

    2014-01-01

    Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou) and a wild line (Lanxi 1) collected from the lower Yangtze regions were deeply sequenced while nine other semi-wild lines were sequenced to a 3-fold genome coverage. Sequence analysis revealed that (1) no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2) besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than an independent one or an intermediate transition type of soybean domestication; (3) high heterozygous rates (0.19-0.49) were observed in several semi-wild lines; and (4) over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggested a hybridization origin for the semi-wild soybean, which makes a complex Soja population structure.

  16. Resequencing analysis of five Mendelian genes and the top genes from genome-wide association studies in Parkinson's Disease.

    Science.gov (United States)

    Benitez, Bruno A; Davis, Albert A; Jin, Sheng Chih; Ibanez, Laura; Ortega-Cubero, Sara; Pastor, Pau; Choi, Jiyoon; Cooper, Breanna; Perlmutter, Joel S; Cruchaga, Carlos

    2016-04-19

    Most sequencing studies in Parkinson's disease (PD) have focused either on a particular gene, primarily in familial and early-onset PD samples, or on screening single variants in sporadic PD cases. To date, there is no systematic study that sequences the most common PD-causing genes with Mendelian inheritance [α-synuclein (SNCA), leucine-rich repeat kinase 2 (LRRK2), PARKIN, PTEN-induced putative kinase 1 (PINK1) and DJ-1 (Daisuke-Junko-1)] and susceptibility genes [glucocerebrosidase beta acid (GBA) and microtubule-associated protein tau (MAPT)] identified through genome-wide association studies (GWAS) in a European-American case-control sample (n = 815). Disease-causing variants in the SNCA, LRRK2 and PARK2 genes were found in 2% of PD patients. The LRRK2 p.G2019S mutation was found in 0.6% of sporadic PD and 4.8% of familial PD cases. Gene-based analysis suggests that additional variants in the LRRK2 gene also contribute to PD risk. The SNCA duplication was found in 0.8% of familial PD patients. Novel variants were found in 0.8% of PD cases and 0.6% of controls. Heterozygous Gaucher disease-causing mutations in the GBA gene were found in 7.1% of PD patients. Here, we established that the GBA variant p.T408M is associated with PD risk and age at onset. Additionally, gene-based and single-variant analyses demonstrated that GBA gene variants (p.L483P, p.R83C, p.N409S, p.H294Q and p.E365K) increase PD risk. Our data suggest that the impact of additional untested coding variants in the GBA and LRRK2 genes is higher than previously estimated. Our data also provide compelling evidence of the existence of additional untested variants in the primary Mendelian and PD GWAS genes that contribute to the genetic etiology of sporadic PD.

  17. How well do HapMap haplotypes identify common haplotypes of genes? A comparison with haplotypes of 334 genes resequenced in the environmental genome project.

    Science.gov (United States)

    Taylor, Jack A; Xu, Zong-Li; Kaplan, Norman L; Morris, Richard W

    2006-01-01

    One of the goals of the International HapMap Project is the identification of common haplotypes in genes. However, HapMap uses an incomplete catalogue of single nucleotide polymorphisms (SNPs) and might miss some common haplotypes. We examined this issue using data from the Environmental Genome Project (EGP), which resequenced 335 genes in 90 people and thus has a nearly complete catalogue of gene SNPs. The EGP identified a total of 45,243 SNPs, of which 10,780 were common SNPs (minor allele frequency ≥0.1). Using EGP common SNP genotype data, we identified 1,459 haplotypes with frequency ≥0.05 and used these as "benchmark" haplotypes. HapMap release 16 had genotype information for 1,573 of 10,780 (15%) EGP common SNPs. Using these SNPs, we identified common HapMap haplotypes (frequency ≥0.05) in each of the four HapMap ethnic groups. To compare common HapMap haplotypes to EGP benchmark haplotypes, we collapsed benchmark haplotypes to the set of 1,573 SNPs. Ninety-eight percent of the collapsed benchmark haplotypes could be found as common HapMap haplotypes in one or more of the four HapMap ethnic groups. However, collapsing benchmark haplotypes to the set of SNPs available in HapMap resulted in a loss of haplotype information: 545 of 1,459 (37%) benchmark haplotypes were uniquely identified, and only 25% of genes had all their benchmark haplotypes uniquely identified. We resampled the EGP data to examine the effect of increasing the number of HapMap SNPs to 5 million, and estimate that approximately 40% of common SNPs in genes will be sampled and that half of the genes will have sufficient SNPs to identify all common haplotypes. This inability to distinguish common haplotypes of genes may result in loss of power when examining haplotype-disease association. (Cancer Epidemiol Biomarkers Prev 2006;15(1):133-7).
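    The collapsing step described above can be illustrated with a minimal Python sketch. It assumes each benchmark haplotype is stored as a string with one allele character per common SNP, and that "uniquely identified" means the collapsed haplotype is shared with no other benchmark haplotype of the same gene. The function names and toy haplotypes are hypothetical and are not taken from the study; the last lines simply recompute the quoted percentages from the counts in the abstract.

```python
from collections import Counter


def collapse(haplotypes, keep):
    """Project each haplotype (one allele character per SNP) onto the SNP indices in `keep`."""
    return ["".join(h[i] for i in keep) for h in haplotypes]


def uniquely_identified(haplotypes, keep):
    """Return the haplotypes whose collapsed form is shared with no other haplotype."""
    collapsed = collapse(haplotypes, keep)
    counts = Counter(collapsed)
    return [h for h, c in zip(haplotypes, collapsed) if counts[c] == 1]


# Hypothetical benchmark haplotypes of one gene over 4 common SNPs (0 = major, 1 = minor allele)
benchmark = ["0000", "0001", "1100", "1110"]
# Hypothetical smaller panel that genotypes only SNPs 0 and 2
unique = uniquely_identified(benchmark, keep=[0, 2])
print(unique)  # ['1100', '1110']; '0000' and '0001' both collapse to '00'

# Percentages quoted in the abstract, recomputed from its counts
pct_snps_in_hapmap = 100 * 1573 / 10780   # about 14.6, reported as 15%
pct_unique_benchmark = 100 * 545 / 1459   # about 37.4, reported as 37%
```

    The toy case shows how information is lost: two distinct benchmark haplotypes become indistinguishable once the SNP that separates them is dropped from the panel.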

  18. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    Science.gov (United States)

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of crop germplasm to detect genome-scale genetic variation and to apply that knowledge to trait improvement. To facilitate efficient large-scale analysis of genomic variation in NGS resequencing data, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and the Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotation and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version on GitHub (https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. This analysis also identified 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions. SNPs identified using PGen from additional soybean resequencing projects, bringing the total to more than 500 soybean germplasm lines, have also been integrated. These SNPs are being used for trait improvement through genotype-to-phenotype prediction approaches developed in-house. To make NGS data easy to browse and access, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB that gives soybean researchers easy access to SNP and downstream analysis results.
PGen workflow has been optimized for the most
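    As a rough illustration of the SNP/indel split reported above, the following Python sketch tallies variant types from VCF-formatted records. It is not part of PGen. It assumes a standard VCF body (REF in the fourth column, ALT in the fifth, possibly comma-separated) and does not handle symbolic alleles such as `<DEL>`; the toy records, positions and `Gm01`-style contig names are hypothetical.

```python
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT allele pair."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "MNP"  # equal-length multi-base substitution


def tally_variant_types(vcf_lines):
    """Count SNPs, indels and MNPs in VCF lines; header lines start with '#'."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # REF and ALT are the 4th and 5th columns
        for alt in alts:
            counts[classify(ref, alt)] += 1
    return counts


# Hypothetical records; contig names and positions are made up
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Gm01\t1050\t.\tA\tG\t60\tPASS\t.",
    "Gm01\t2210\t.\tAT\tA\t55\tPASS\t.",
    "Gm02\t480\t.\tC\tCTT,T\t48\tPASS\t.",
]
counts = tally_variant_types(toy_vcf)
print(counts["SNP"], counts["indel"])  # 2 2
```

    Each ALT allele of a multi-allelic site is counted separately here, which is one convention among several; a production workflow would normally normalize and left-align variants first.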

  19. Analysis of genetic variation and potential applications in genome-scale metabolic modeling

    Directory of Open Access Journals (Sweden)

    João Gonçalo Rocha Cardoso

    2015-02-01

    Full Text Available Genetic variation is the motor of evolution and allows organisms to overcome the environmental challenges they encounter. It can be both beneficial and harmful in the process of engineering cell factories for the production of proteins and chemicals. Throughout the history of biotechnology, there have been efforts to exploit genetic variation in our favor to create strains with favorable phenotypes. Genetic variation can either be present in natural populations or it can be artificially created by mutagenesis and selection or adaptive laboratory evolution. On the other hand, unintended genetic variation during a long-term production process may lead to significant economic losses, and it is important to understand how to control this type of variation. With the emergence of next-generation sequencing technologies, genetic variation in microbial strains can now be determined on an unprecedented scale and resolution by re-sequencing thousands of strains systematically. In this article, we review challenges in the integration and analysis of large-scale re-sequencing data, present an extensive overview of bioinformatics methods for predicting the effects of genetic variants on protein function, and discuss approaches for interfacing existing bioinformatics approaches with genome-scale models of cellular processes in order to predict effects of sequence variation on cellular phenotypes.

  20. Genome-Wide Analysis of Simple Sequence Repeats and Efficient Development of Polymorphic SSR Markers Based on Whole Genome Re-Sequencing of Multiple Isolates of the Wheat Stripe Rust Fungus.

    Science.gov (United States)

    Luo, Huaiyong; Wang, Xiaojie; Zhan, Gangming; Wei, Guorong; Zhou, Xinli; Zhao, Jing; Huang, Lili; Kang, Zhensheng

    2015-01-01

    The biotrophic parasitic fungus Puccinia striiformis f. sp. tritici (Pst) causes stripe rust, a devastating disease of wheat, endangering global food security. Because the Pst population is highly dynamic, it is difficult to develop wheat cultivars with durable and highly effective resistance. Simple sequence repeats (SSRs) are widely used as molecular markers in genetic studies to determine population structure in many organisms. However, only a small number of SSR markers have been developed for Pst. In this study, a total of 4,792 SSR loci were identified using the whole genome sequences of six isolates from different regions of the world, with a marker density of one SSR per 22.95 kb. The majority of the SSRs were di- and tri-nucleotide repeats. A database containing 1,113 SSR markers was established. Through in silico comparison, the previously reported SSR markers were found mainly in exons, whereas the SSR markers in the database were mostly in intergenic regions. Furthermore, 105 SSR markers were confirmed as polymorphic in silico because their positions and nucleotide variations matched INDELs identified among the six isolates. When 104 in silico polymorphic SSR markers were used to genotype 21 Pst isolates, 84 produced the target bands, and 82 of them were polymorphic and revealed the genetic relationships among the isolates. The results show that whole genome re-sequencing of multiple isolates provides an ideal resource for developing SSR markers, and the newly developed SSR markers are useful for genetic and population studies of the wheat stripe rust fungus.
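    A minimal Python sketch of perfect di- and tri-nucleotide SSR detection with regular expressions follows. The minimum copy numbers (6 for di-, 5 for tri-nucleotide motifs) are illustrative assumptions, not the thresholds used in the study, and dedicated SSR finders also handle other motif lengths and imperfect repeats. The last line relates the reported density to genome size: taken at face value, 4,792 SSRs at one per 22.95 kb implies roughly 110 Mb of scanned sequence.

```python
import re

# Illustrative minimum copy numbers (assumptions, not the study's thresholds)
MIN_COPIES = {2: 6, 3: 5}  # motif length -> minimum number of tandem copies


def find_ssrs(sequence):
    """Return (start, motif, copies) for perfect di- and tri-nucleotide repeats, 0-based starts."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_copies in MIN_COPIES.items():
        # A motif followed by at least (min_copies - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


def kb_per_ssr(total_bp, n_ssrs):
    """Marker density expressed as kilobases of sequence per SSR."""
    return total_bp / 1000 / n_ssrs


toy = "GGC" + "AG" * 8 + "TTCC" + "GAT" * 6 + "CC"  # hypothetical 43-bp sequence
print(find_ssrs(toy))  # [(3, 'AG', 8), (23, 'GAT', 6)]

# Back-calculating from the abstract: 4,792 SSRs at one per 22.95 kb
implied_mb = 4792 * 22.95 / 1000  # roughly 110 Mb of sequence
```

    Comparing such SSR calls across several aligned isolates, as the study did with INDELs, is what turns a list of repeats into candidate polymorphic markers.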

  1. Genome-Wide Analysis of Simple Sequence Repeats and Efficient Development of Polymorphic SSR Markers Based on Whole Genome Re-Sequencing of Multiple Isolates of the Wheat Stripe Rust Fungus.

    Directory of Open Access Journals (Sweden)

    Huaiyong Luo

    Full Text Available The biotrophic parasitic fungus Puccinia striiformis f. sp. tritici (Pst) causes stripe rust, a devastating disease of wheat, endangering global food security. Because the Pst population is highly dynamic, it is difficult to develop wheat cultivars with durable and highly effective resistance. Simple sequence repeats (SSRs) are widely used as molecular markers in genetic studies to determine population structure in many organisms. However, only a small number of SSR markers have been developed for Pst. In this study, a total of 4,792 SSR loci were identified using the whole genome sequences of six isolates from different regions of the world, with a marker density of one SSR per 22.95 kb. The majority of the SSRs were di- and tri-nucleotide repeats. A database containing 1,113 SSR markers was established. Through in silico comparison, the previously reported SSR markers were found mainly in exons, whereas the SSR markers in the database were mostly in intergenic regions. Furthermore, 105 SSR markers were confirmed as polymorphic in silico because their positions and nucleotide variations matched INDELs identified among the six isolates. When 104 in silico polymorphic SSR markers were used to genotype 21 Pst isolates, 84 produced the target bands, and 82 of them were polymorphic and revealed the genetic relationships among the isolates. The results show that whole genome re-sequencing of multiple isolates provides an ideal resource for developing SSR markers, and the newly developed SSR markers are useful for genetic and population studies of the wheat stripe rust fungus.

  2. A database and API for variation, dense genotyping and resequencing data

    Directory of Open Access Journals (Sweden)

    Flicek Paul

    2010-05-01

    Full Text Available Abstract Background Advances in sequencing and genotyping technologies are leading to the widespread availability of multi-species variation data, dense genotype data and large-scale resequencing projects. The 1000 Genomes Project and similar efforts in other species are challenging the methods previously used for storage and manipulation of such data, necessitating the redesign of existing genome-wide bioinformatics resources. Results Ensembl has created a database and software library to support data storage, analysis and access to the existing and emerging variation data from large mammalian and vertebrate genomes. These tools scale to thousands of individual genome sequences and are integrated into the Ensembl infrastructure for genome annotation and visualisation. The database and software system can easily be expanded to integrate both public and non-public data sources in the context of an Ensembl software installation and is already being used outside of the Ensembl project in a number of database and application environments. Conclusions Ensembl's powerful, flexible and open source infrastructure for the management of variation, genotyping and resequencing data is freely available at http://www.ensembl.org.

  3. Interspecies hybridization on DNA resequencing microarrays: efficiency of sequence recovery and accuracy of SNP detection in human, ape, and codfish mitochondrial DNA genomes sequenced on a human-specific MitoChip

    Directory of Open Access Journals (Sweden)

    Carr Steven M

    2007-09-01

    Full Text Available Abstract Background Iterative DNA "resequencing" on oligonucleotide microarrays offers a high-throughput method to measure intraspecific biodiversity, one that is especially suited to SNP-dense gene regions such as vertebrate mitochondrial (mtDNA) genomes. However, the costs of single-species design and microarray fabrication are prohibitive. A cost-effective, multi-species strategy is to hybridize experimental DNAs from diverse species to a common microarray that is tiled with oligonucleotide sets from multiple, homologous reference genomes. Such a strategy requires that cross-hybridization between the experimental DNAs and reference oligos from the different species not interfere with the accurate recovery of species-specific data. To determine the pattern and limits of such interspecific hybridization, we compared the efficiency of sequence recovery and accuracy of SNP identification by a 15,452-base human-specific microarray challenged with human, chimpanzee, gorilla, and codfish mtDNA genomes. Results In the human genome, 99.67% of the sequence was recovered with 100.0% accuracy. Accuracy of SNP identification declines log-linearly with sequence divergence from the reference, from 0.067 to 0.247 errors per SNP in the chimpanzee and gorilla genomes, respectively. Efficiency of sequence recovery declines as the number of interspecific SNPs in the 25-base interval tiled by the reference oligonucleotides increases. In the gorilla genome, which differs from the human reference by 10%, and in which 46% of these 25-base regions contain 3 or more SNP differences from the reference, only 88% of the sequence is recoverable. In the codfish genome, which differs from the reference by > 30%, less than 4% of the sequence is recoverable, in short islands of ≥ 12 bases that are conserved between primates and fish. Conclusion Experimental DNAs bind inefficiently to homologous reference oligonucleotide sets on a re-sequencing microarray when their sequences differ by
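    The 25-base window statistic mentioned above (the share of tiled regions carrying 3 or more differences from the reference) can be sketched in Python as follows. This assumes the target and reference mtDNA sequences are already aligned position by position with no indels, and that each tiled probe covers a 25-base window sliding one base at a time; both are simplifying assumptions, and the function names and toy sequences are hypothetical.

```python
def window_mismatches(reference, target, window=25):
    """Count reference/target mismatches in every `window`-base interval (step of one base)."""
    if len(reference) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    mismatch = [int(r != t) for r, t in zip(reference, target)]
    return [sum(mismatch[i:i + window]) for i in range(len(mismatch) - window + 1)]


def fraction_divergent_windows(reference, target, min_mismatches=3, window=25):
    """Fraction of windows containing at least `min_mismatches` differences."""
    counts = window_mismatches(reference, target, window)
    return sum(c >= min_mismatches for c in counts) / len(counts)


reference = "A" * 30         # hypothetical aligned reference
target = "TTT" + "A" * 27    # hypothetical target with three adjacent differences
print(window_mismatches(reference, target))           # [3, 2, 1, 0, 0, 0]
print(fraction_divergent_windows(reference, target))  # 1/6 of the windows
```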

  4. A genomic scale map of genetic diversity in Trypanosoma cruzi

    Directory of Open Access Journals (Sweden)

    Ackermann Alejandro A

    2012-12-01

    Full Text Available Abstract Background Trypanosoma cruzi, the causal agent of Chagas Disease, affects more than 16 million people in Latin America. The clinical outcome of the disease results from a complex interplay between environmental factors and the genetic background of both the human host and the parasite. However, knowledge of the genetic diversity of the parasite is currently limited to a number of highly studied loci. The availability of a number of genomes from different evolutionary lineages of T. cruzi provides an unprecedented opportunity to look at the genetic diversity of the parasite at a genomic scale. Results Using a bioinformatic strategy, we have clustered T. cruzi sequence data available in the public domain and obtained multiple sequence alignments in which one or two alleles from the reference strain CL-Brener were included. These data cover four major evolutionary lineages (discrete typing units, DTUs): TcI, TcII, TcIII, and the hybrid TcVI. Using this set of alignments, we have identified 288,957 high-quality single nucleotide polymorphisms and 1,480 indels. In a reduced re-sequencing study we were able to validate ~97% of high-quality SNPs identified in 47 loci. Analysis of how these changes affect encoded protein products showed a 0.77 ratio of synonymous to non-synonymous changes in the T. cruzi genome. We observed 113 changes that introduce or remove a stop codon, some causing significant functional changes, and a number of tri-allelic and tetra-allelic SNPs that could be exploited in strain typing assays. Based on an analysis of the observed nucleotide diversity, we show that the T. cruzi genome contains a core set of genes that are under apparent purifying selection. Interestingly, orthologs of known druggable targets show significantly lower nucleotide diversity values. Conclusions This study provides the first look at the genetic diversity of T. cruzi at a genomic scale.
The analysis covers an estimated ~60% of the genetic diversity present in the
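    The synonymous versus non-synonymous classification behind the 0.77 ratio can be sketched per SNP in Python. This sketch assumes the standard genetic code (NCBI translation table 1) and a SNP already placed within a reference codon; the example codons are hypothetical. Whether stop-codon changes were counted as non-synonymous in the study's ratio is not stated in the abstract, so the sketch keeps them as separate categories.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ... GGG
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snv(codon, position, alt_base):
    """Classify a single-base change at `position` (0, 1 or 2) of a reference codon."""
    mutant = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "nonsynonymous"


# Hypothetical coding SNVs: (reference codon, position within codon, alternative base)
snvs = [("CTG", 2, "A"), ("GCT", 2, "C"), ("GAT", 0, "A"), ("TGG", 2, "A"), ("TAA", 0, "C")]
counts = Counter(classify_snv(*s) for s in snvs)
syn_to_nonsyn = counts["synonymous"] / counts["nonsynonymous"]
print(counts, syn_to_nonsyn)
```

    Here CTG to CTA and GCT to GCC are silent, GAT to AAT changes Asp to Asn, TGG to TGA introduces a stop codon and TAA to CAA removes one.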

  5. Rare and common regulatory variation in population-scale sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Stephen B Montgomery

    2011-07-01

    Full Text Available Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 Genomes Project in an African and a European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 Genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
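    The basic single-variant eQTL test underlying analyses like this regresses a gene's expression on genotype dosage. A minimal Python sketch, assuming genotypes coded as 0/1/2 copies of the alternative allele and one gene's normalized expression values; the toy numbers are hypothetical, and real eQTL pipelines add covariates, expression normalization and multiple-testing control, all omitted here.

```python
import math


def eqtl_fit(dosages, expression):
    """Least-squares slope (effect per allele) and Pearson r of expression on genotype dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance; monomorphic sites are normally filtered out first")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return beta, r


dosages = [0, 0, 1, 1, 2, 2]                 # alternative-allele copies per individual
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]  # hypothetical normalized expression
beta, r = eqtl_fit(dosages, expression)
```

    In this toy example each additional allele raises expression by about one unit (beta close to 1.0) and the correlation is close to 0.99.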

  6. Extreme-Scale De Novo Genome Assembly

    Energy Technology Data Exchange (ETDEWEB)

    Georganas, Evangelos [Intel Corporation, Santa Clara, CA (United States)]; Hofmeyr, Steven [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.]; Egan, Rob [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division]; Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.]; Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.]; Rokhsar, Daniel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division]; Yelick, Katherine [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Joint Genome Inst.]

    2017-09-26

    De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme-scale analysis via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different parts of a computer system. This chapter explains in detail the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and the communication costs. We present performance results of assembling the human genome and the large hexaploid wheat genome on supercomputers using up to tens of thousands of cores.
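    A common way to organize a distributed k-mer table, the kind of data structure this chapter discusses, is to assign each k-mer to an owner process by hashing it. The Python sketch below only simulates this idea in a single process and is not HipMer's implementation: each "rank" is a `Counter`, the owner is chosen with `zlib.crc32` (deterministic, unlike Python's per-process randomized `hash` for strings), and canonical k-mers, error filtering and real inter-process communication are omitted. The reads and parameters are hypothetical.

```python
import zlib
from collections import Counter


def kmers(read, k):
    """Yield every k-length substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner(kmer, n_ranks):
    """Deterministically map a k-mer to the rank that stores it."""
    return zlib.crc32(kmer.encode("ascii")) % n_ranks


def partitioned_kmer_counts(reads, k, n_ranks):
    """Simulate a distributed k-mer count: each rank holds only the k-mers it owns."""
    tables = [Counter() for _ in range(n_ranks)]
    for read in reads:
        for kmer in kmers(read, k):
            tables[owner(kmer, n_ranks)][kmer] += 1
    return tables


reads = ["ACGTACGT", "CGTACGTT"]  # hypothetical reads
tables = partitioned_kmer_counts(reads, k=4, n_ranks=3)
```

    Because every occurrence of a given k-mer is routed to the same owner, each rank can finalize its counts locally; in a real distributed run the cost moves into the all-to-all exchange of k-mers, which is one source of the communication costs the chapter analyzes.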

  7. Targeted capture and resequencing of 1040 genes reveal environmentally driven functional variation in grey wolves.

    Science.gov (United States)

    Schweizer, Rena M; Robinson, Jacqueline; Harrigan, Ryan; Silva, Pedro; Galverni, Marco; Musiani, Marco; Green, Richard E; Novembre, John; Wayne, Robert K

    2016-01-01

    In an era of ever-increasing amounts of whole-genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphism (SNP) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct grey wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1-kb nongenic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to noncandidate regions and identified potentially functional variants related to local adaptation. We found 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in sweed and bayenv analyses, respectively. This result verifies the use of genomewide SNP surveys to tag genes that contain functional variants between populations. We highlight nonsynonymous variants in APOB, LIPG and USH2A that occur in functional domains of these proteins, and that demonstrate high correlation with precipitation seasonality and vegetation. We find Arctic and High Arctic wolf ecotypes have higher numbers of genes under selection, which highlights their conservation value and heightened threat due to climate change. This study demonstrates that combining genomewide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations. © 2015 John Wiley & Sons Ltd.

  8. Resequencing at ≥40-Fold Depth of the Parental Genomes of a Solanum lycopersicum × S. pimpinellifolium Recombinant Inbred Line Population and Characterization of Frame-Shift InDels That Are Highly Likely to Perturb Protein Function.

    Science.gov (United States)

    Kevei, Zoltan; King, Robert C; Mohareb, Fady; Sergeant, Martin J; Awan, Sajjad Z; Thompson, Andrew J

    2015-03-24

    A recombinant inbred line population derived from a cross between Solanum lycopersicum var. cerasiforme (E9) and S. pimpinellifolium (L5) has been used extensively to discover quantitative trait loci (QTL), including those that act via rootstock genotype; however, high-resolution single-nucleotide polymorphism genotyping data for this population are not yet publicly available. Next-generation resequencing of parental lines allows the vast majority of polymorphisms to be characterized and used to progress from QTL to causative gene. We sequenced E9 and L5 genomes to 40- and 44-fold depth, respectively, and reads were mapped to the reference Heinz 1706 genome. In L5 there were three clear regions, on chromosome 1, chromosome 4, and chromosome 8, with increased rates of polymorphism. Two other regions, on chromosome 1 and chromosome 10, were highly polymorphic when we compared Heinz 1706 with both E9 and L5, suggesting that the reference sequence contains a divergent introgression in these locations. We also identified a region on chromosome 4 consistent with an introgression from S. pimpinellifolium into Heinz 1706. A large dataset of polymorphisms for use in fine-mapping QTL in a specific tomato recombinant inbred line population was created, including a high density of InDels validated as simple size-based polymerase chain reaction markers. By careful filtering and interpretation of SnpEff predictions, we have created a list of genes that are predicted to have highly perturbed protein functions in the E9 and L5 parental lines. Copyright © 2015 Kevei et al.
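    Regions with "increased rates of polymorphism" are often spotted by counting SNPs in fixed windows along each chromosome. The Python sketch below is one simple way to do that, assuming 0-based SNP positions for one chromosome of a parental line versus the reference, non-overlapping windows, and a cutoff of a fixed multiple of the median window count. The window size, cutoff and data are hypothetical; the abstract does not describe the study's own criteria.

```python
from collections import Counter
from statistics import median


def snps_per_window(positions, chrom_length, window):
    """Count SNPs in consecutive, non-overlapping windows along one chromosome."""
    n_windows = (chrom_length + window - 1) // window
    counts = Counter(pos // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def high_polymorphism_windows(positions, chrom_length, window, fold=3.0):
    """Return indices of windows whose SNP count is at least `fold` times the median window."""
    per_window = snps_per_window(positions, chrom_length, window)
    threshold = fold * max(median(per_window), 1)  # floor of 1 avoids a zero threshold
    return [i for i, count in enumerate(per_window) if count >= threshold]


# Hypothetical SNPs: one per 100-bp window plus a cluster in window 4
positions = [50 + 100 * i for i in range(10)] + [410, 420, 430, 440]
print(high_polymorphism_windows(positions, chrom_length=1000, window=100))  # [4]
```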

  9. Genome scale engineering techniques for metabolic engineering.

    Science.gov (United States)

    Liu, Rongming; Bassalo, Marcelo C; Zeitoun, Ramsey I; Gill, Ryan T

    2015-11-01

    Metabolic engineering has expanded from a focus on designs requiring a small number of genetic modifications to increasingly complex designs driven by advances in genome-scale engineering technologies. Metabolic engineering has been generally defined by the use of iterative cycles of rational genome modifications, strain analysis and characterization, and a synthesis step that fuels additional hypothesis generation. This cycle mirrors the Design-Build-Test-Learn cycle followed throughout various engineering fields that has recently become a defining aspect of synthetic biology. This review will attempt to summarize recent genome-scale design, build, test, and learn technologies and relate their use to a range of metabolic engineering applications. Copyright © 2015 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.

  10. Methodology for the analysis of rare genetic variation in genome-wide association and re-sequencing studies of complex human traits.

    Science.gov (United States)

    Moutsianas, Loukas; Morris, Andrew P

    2014-09-01

    Genome-wide association studies have been successful in identifying common variants that impact complex human traits and diseases. However, despite this success, the joint effects of these variants explain only a small proportion of the genetic variance in these phenotypes, leading to speculation that rare genetic variation might account for much of the 'missing heritability'. Consequently, there has been an exciting period of research and development into the methodology for the analysis of rare genetic variants, typically by considering their joint effects on complex traits within the same functional unit or genomic region. In this review, we describe a general framework for modelling the joint effects of rare genetic variants on complex traits in association studies of unrelated individuals. We summarise a range of widely used association tests that have been developed from this model and provide an overview of the relative performance of these approaches from published simulation studies. © The Author 2014. Published by Oxford University Press.
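    The collapsing idea behind the simplest tests in this family, burden tests, can be sketched in Python: rare variants in one gene are summed into a single score per individual, which is then compared between cases and controls. The sketch assumes genotypes coded as minor-allele counts and uses toy data with a deliberately loose rare-variant threshold; it stops at comparing mean scores, whereas a real analysis would test the score with regression or permutation.

```python
def burden_scores(genotypes, maf_threshold=0.01):
    """Per-individual count of minor alleles at rare variants within one gene or region.

    genotypes[i][j] is the number of minor alleles (0, 1 or 2) individual i carries at variant j.
    """
    n_individuals = len(genotypes)
    n_variants = len(genotypes[0])
    mafs = [sum(row[j] for row in genotypes) / (2 * n_individuals) for j in range(n_variants)]
    rare = [j for j, maf in enumerate(mafs) if 0 < maf < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


# Hypothetical genotypes: 5 individuals x 3 variants (variants 0 and 1 rare, variant 2 common)
genotypes = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]
is_case = [False, True, False, True, False]
scores = burden_scores(genotypes, maf_threshold=0.2)  # loose threshold only because the toy sample is tiny
n_cases = sum(is_case)
case_mean = sum(s for s, c in zip(scores, is_case) if c) / n_cases
control_mean = sum(s for s, c in zip(scores, is_case) if not c) / (len(is_case) - n_cases)
```

    Collapsing gains power when most rare variants in a gene act in the same direction; when effects are mixed, other tests in this family are generally preferred.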

  11. Towards dynamic genome-scale models.

    Science.gov (United States)

    Gilbert, David; Heiner, Monika; Jayaweera, Yasoda; Rohr, Christian

    2017-10-13

    The analysis of the dynamic behaviour of genome-scale models of metabolism (GEMs) currently presents considerable challenges because of the difficulties of simulating such large and complex networks. Bacterial GEMs can comprise about 5000 reactions and metabolites, and encode a huge variety of growth conditions; such models cannot be used without sophisticated tool support. This article is intended to help modellers, both specialists and non-specialists in computerized methods, identify and apply a suitable combination of tools for the dynamic behaviour analysis of large-scale metabolic designs. We describe a methodology and related workflow based on publicly available tools to profile and analyse whole-genome-scale biochemical models. We use an efficient approximative stochastic simulation method to overcome problems associated with the dynamic simulation of GEMs. In addition, we apply simulative model checking using temporal logic property libraries, clustering and data analysis, over time series of reaction rates and metabolite concentrations. We extend this to consider the evolution of reaction-oriented properties of subnets over time, including dead subnets and functional subsystems. This enables the generation of abstract views of the behaviour of these models, which can be large (up to whole-genome size) and are therefore impractical to analyse informally by eye. We demonstrate our methodology by applying it to a reduced model of the whole-genome metabolism of Escherichia coli K-12 under different growth conditions. The overall context of our work is in the area of model-based design methods for metabolic engineering and synthetic biology. © The Author 2017. Published by Oxford University Press.
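    For readers new to stochastic simulation, the exact Gillespie direct method below shows the basic idea on a toy reversible reaction A ⇌ B. It is a swapped-in illustration, not the approximative method used in the article: exact simulation of every reaction event is precisely what becomes too slow for networks with thousands of reactions, which is why approximations are attractive at genome scale. The species, rate constants and time horizon are hypothetical.

```python
import random


def gillespie(initial_state, reactions, t_end, seed=0):
    """Exact stochastic simulation (Gillespie direct method).

    reactions: list of (propensity_function, state_change) pairs, where
    propensity_function(state) returns a non-negative rate and state_change
    maps species names to integer deltas.
    """
    rng = random.Random(seed)
    t = 0.0
    state = dict(initial_state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [propensity(state) for propensity, _ in reactions]
        total = sum(propensities)
        if total <= 0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_end:
            break
        pick = rng.random() * total  # choose a reaction with probability proportional to its propensity
        cumulative = 0.0
        for a, (_, change) in zip(propensities, reactions):
            cumulative += a
            if pick < cumulative:
                for species, delta in change.items():
                    state[species] += delta
                break
        trajectory.append((t, dict(state)))
    return trajectory


# Toy reversible reaction A <-> B with hypothetical rate constants 1.0 and 0.5
reactions = [
    (lambda s: 1.0 * s["A"], {"A": -1, "B": +1}),
    (lambda s: 0.5 * s["B"], {"A": +1, "B": -1}),
]
trajectory = gillespie({"A": 100, "B": 0}, reactions, t_end=5.0)
```

    The resulting time series of species counts is the kind of output on which the article's downstream steps (model checking, clustering) operate, although there it comes from a much larger model.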

  12. Implementation and Data Analysis of Tn-seq, Whole-Genome Resequencing, and Single-Molecule Real-Time Sequencing for Bacterial Genetics.

    Science.gov (United States)

    Burby, Peter E; Nye, Taylor M; Schroeder, Jeremy W; Simmons, Lyle A

    2017-01-01

    Few discoveries have been more transformative to the biological sciences than the development of DNA sequencing technologies. The rapid advancement of sequencing and bioinformatics tools has revolutionized bacterial genetics, deepening our understanding of model and clinically relevant organisms. Although application of newer sequencing technologies to studies in bacterial genetics is increasing, the implementation of DNA sequencing technologies and development of the bioinformatics tools required for analyzing the large data sets generated remain a challenge for many. In this minireview, we have chosen to summarize three sequencing approaches that are particularly useful for bacterial genetics. We provide resources for scientists new to and interested in their application. Here, we discuss the analysis of data from transposon mutagenesis followed by deep sequencing (Tn-seq) to determine gene disruptions differentially represented in a mutant population and Illumina sequencing for identification of suppressor or other mutations, and we summarize single-molecule real-time (SMRT) sequencing for de novo genome assembly and the use of the output data for detection of DNA base modifications. Copyright © 2016 American Society for Microbiology.
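    A common first step in comparing Tn-seq libraries is to normalize per-gene insertion counts for sequencing depth and compute a fold change between conditions. The Python sketch assumes per-gene insertion read counts have already been tabulated; gene names, counts and the pseudocount are hypothetical, and dedicated Tn-seq tools add statistical testing (for example, resampling) that is omitted here.

```python
import math


def reads_per_million(counts):
    """Scale per-gene insertion read counts so libraries of different depth are comparable."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_fold_changes(control, selected, pseudocount=1.0):
    """log2 ratio of normalized insertion counts per gene (selected vs control)."""
    ctrl, sel = reads_per_million(control), reads_per_million(selected)
    genes = set(ctrl) | set(sel)
    return {
        g: math.log2((sel.get(g, 0.0) + pseudocount) / (ctrl.get(g, 0.0) + pseudocount))
        for g in sorted(genes)
    }


# Hypothetical insertion counts before and after selection
control = {"geneA": 500, "geneB": 500, "geneC": 1000}
selected = {"geneA": 900, "geneB": 100, "geneC": 1000}
lfc = log2_fold_changes(control, selected)
```

    A strongly negative value (geneB, about -2.3) marks a gene whose disruption is depleted under selection, which is the signal Tn-seq uses to flag genes needed for fitness in that condition.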

  13. Genome scale metabolic modeling of cancer

    DEFF Research Database (Denmark)

    Nilsson, Avlant; Nielsen, Jens

    2017-01-01

    Cancer cells reprogram metabolism to support rapid proliferation and survival. Energy metabolism is particularly important for growth and genes encoding enzymes involved in energy metabolism are frequently altered in cancer cells. A genome scale metabolic model (GEM) is a mathematical formalization...... of metabolism which allows simulation and hypotheses testing of metabolic strategies. It has successfully been applied to many microorganisms and is now used to study cancer metabolism. Generic models of human metabolism have been reconstructed based on the existence of metabolic genes in the human genome....... Cancer specific models of metabolism have also been generated by reducing the number of reactions in the generic model based on high throughput expression data, e.g. transcriptomics and proteomics. Targets for drugs and biomarkers for diagnostics have been identified using these models. They have also...

  14. Modeling cancer metabolism on a genome scale

    Science.gov (United States)

    Yizhak, Keren; Chaneton, Barbara; Gottlieb, Eyal; Ruppin, Eytan

    2015-01-01

    Cancer cells have fundamentally altered cellular metabolism that is associated with their tumorigenicity and malignancy. In addition to the widely studied Warburg effect, several new key metabolic alterations in cancer have been established over the last decade, leading to the recognition that altered tumor metabolism is one of the hallmarks of cancer. Deciphering the full scope and functional implications of the dysregulated metabolism in cancer requires both the advancement of a variety of omics measurements and the advancement of computational approaches for the analysis and contextualization of the accumulated data. Encouragingly, while the metabolic network is highly interconnected and complex, it is at the same time probably the best characterized cellular network. Accordingly, this review discusses the challenges that genome-scale modeling of cancer metabolism has been facing. We survey several recent studies demonstrating the first strides that have been made, testifying to the value of this approach in portraying a network-level view of the cancer metabolism and in identifying novel drug targets and biomarkers. Finally, we outline a few new steps that may further advance this field. PMID:26130389

  15. Analysis of Genome-Scale Data

    NARCIS (Netherlands)

    Kemmeren, P.P.C.W.

    2005-01-01

    The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has

  16. Automated multiplex genome-scale engineering in yeast

    Science.gov (United States)

    Si, Tong; Chao, Ran; Min, Yuhao; Wu, Yuying; Ren, Wen; Zhao, Huimin

    2017-01-01

    Genome-scale engineering is indispensable in understanding and engineering microorganisms, but the current tools are mainly limited to bacterial systems. Here we report an automated platform for multiplex genome-scale engineering in Saccharomyces cerevisiae, an important eukaryotic model and widely used microbial cell factory. Standardized genetic parts encoding overexpression and knockdown mutations of >90% of yeast genes are created in a single step from a full-length cDNA library. With the aid of CRISPR-Cas, these genetic parts are iteratively integrated into the repetitive genomic sequences in a modular manner using robotic automation. This system allows functional mapping and multiplex optimization on a genome scale for diverse phenotypes including cellulase expression, isobutanol production, glycerol utilization and acetic acid tolerance, and may greatly accelerate future genome-scale engineering endeavours in yeast. PMID:28469255

  17. Cloud-Scale Genomic Signals Processing for Robust Large-Scale Cancer Genomic Microarray Data Analysis.

    Science.gov (United States)

    Harvey, Benjamin Simeon; Ji, Soo-Yeon

    2017-01-01

    As the microarray data available to scientists continue to increase in size and complexity, it has become critically important to find multiple ways of drawing oncological inferences that are useful to the bioinformatics community from the analysis of large-scale cancer genomic (LSCG) DNA and mRNA microarray data. Although there have been many attempts to derive biological interpretations by means of wavelet preprocessing and classification, no research effort has focused on a cloud-scale distributed parallel (CSDP) separable 1-D wavelet decomposition technique for denoising through differential expression thresholding and classification of LSCG microarray data. This research presents a novel methodology that uses a CSDP separable 1-D method for wavelet-based transformation to initialize a threshold that retains significantly expressed genes through the denoising process, enabling robust classification of cancer patients. The entire study was implemented within a CSDP environment. Cloud computing and wavelet-based thresholding for denoising were used to classify samples from the Global Cancer Map, the Cancer Cell Line Encyclopedia, and The Cancer Genome Atlas. The results showed that separable 1-D parallel distributed wavelet denoising in the cloud, combined with differential expression thresholding, increased computational performance and produced higher-quality LSCG microarray datasets, which led to more accurate classification results.
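    The abstract does not specify the wavelet family, decomposition depth, threshold rule, or the CSDP implementation. As a minimal single-machine sketch of the general idea only, the Python below assumes an orthonormal Haar wavelet, a single decomposition level, and hard thresholding of the detail coefficients with a user-chosen threshold; the function names are hypothetical.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_forward(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform (signal length must be even)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def denoise(signal: List[float], threshold: float) -> List[float]:
    """Hard-threshold: zero every detail coefficient with |d| < threshold, then reconstruct."""
    approx, detail = haar_forward(signal)
    kept = [d if abs(d) >= threshold else 0.0 for d in detail]
    return haar_inverse(approx, kept)


profile = [1.0, 1.2, 5.0, 5.1, 0.9, 1.0, 4.8, 5.3]
print(denoise(profile, 0.5))
```

    With a threshold of 0 the profile is reconstructed exactly; raising the threshold zeroes small within-pair differences and smooths the profile toward pairwise means.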

  18. The roles of whole-genome and small-scale duplications in the functional specialization of Saccharomyces cerevisiae genes.

    Directory of Open Access Journals (Sweden)

    Mario A Fares

    Full Text Available Researchers have long been enthralled with the idea that gene duplication can generate novel functions, crediting this process with great evolutionary importance. Empirical data show that whole-genome duplications (WGDs) are more likely to be retained than small-scale duplications (SSDs), though their relative contribution to the functional fate of duplicates remains unexplored. Using the map of genetic interactions and the re-sequencing of 27 Saccharomyces cerevisiae genomes evolving for 2,200 generations, we show that SSD-duplicates lead to neo-functionalization while WGD-duplicates partition ancestral functions. This conclusion is supported by the following observations: (a) SSD-duplicates establish more genetic interactions than singletons and WGD-duplicates; (b) SSD-duplicate copies share more interaction partners than WGD-duplicate copies; (c) the interaction partners of WGD-duplicates are more functionally related than those of SSD-duplicates; (d) SSD-duplicate gene copies are more functionally divergent from one another, while keeping more overlapping functions, and diverge in their sub-cellular locations more than WGD-duplicate copies; and (e) SSD-duplicates complement their functions to a greater extent than WGD-duplicates. We propose a novel model that uncovers the complexity of evolution after gene duplication.

  19. Using Genome-scale Models to Predict Biological Capabilities

    DEFF Research Database (Denmark)

    O’Brien, Edward J.; Monk, Jonathan M.; Palsson, Bernhard O.

    2015-01-01

    Constraint-based reconstruction and analysis (COBRA) methods at the genome scale have been under development since the first whole-genome sequences appeared in the mid-1990s. A few years ago, this approach began to demonstrate the ability to predict a range of cellular functions, including cellular...

  20. Exploring Networks at the genome scale

    NARCIS (Netherlands)

    Lam, M.C.; Puchalka, J.; Diez, M.S.; Martins Dos Santos, V.A.P.

    2010-01-01

    Systems biology is aimed at achieving a holistic understanding of living organisms, while synthetic biology seeks to design and construct new living organisms with targeted functionalities. Genome sequencing and the fields of ‘omics’ technology have proven a goldmine of information for scientists

  1. Use of genome-scale microbial models for metabolic engineering

    DEFF Research Database (Denmark)

    Patil, Kiran Raosaheb; Åkesson, M.; Nielsen, Jens

    2004-01-01

    network structures. The major challenge for metabolic engineering in the post-genomic era is to broaden its design methodologies to incorporate genome-scale biological data. Genome-scale stoichiometric models of microorganisms represent a first step in this direction.......Metabolic engineering serves as an integrated approach to design new cell factories by providing rational design procedures and valuable mathematical and experimental tools. Mathematical models have an important role for phenotypic analysis, but can also be used for the design of optimal metabolic...

  2. In the fast lane: large-scale bacterial genome engineering.

    Science.gov (United States)

    Fehér, Tamás; Burland, Valerie; Pósfai, György

    2012-07-31

    The last few years have witnessed rapid progress in bacterial genome engineering. The long-established, standard ways of DNA synthesis, modification, transfer into living cells, and incorporation into genomes have given way to more effective, large-scale, robust genome modification protocols. Expansion of these engineering capabilities is due to several factors. Key advances include: (i) progress in oligonucleotide synthesis and in vitro and in vivo assembly methods, (ii) optimization of recombineering techniques, (iii) introduction of parallel, large-scale, combinatorial, and automated genome modification procedures, and (iv) rapid identification of the modifications by barcode-based analysis and sequencing. Combining the brute force of these techniques with sophisticated bioinformatic design and modeling opens up new avenues not only for the analysis of gene functions and cellular network interactions, but also for engineering more effective producer strains. This review presents a summary of recent technological advances in bacterial genome engineering. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. The OME Framework for genome-scale systems biology

    Energy Technology Data Exchange (ETDEWEB)

    Palsson, Bernhard O. [Univ. of California, San Diego, CA (United States)]; Ebrahim, Ali [Univ. of California, San Diego, CA (United States)]; Federowicz, Steve [Univ. of California, San Diego, CA (United States)]

    2014-12-19

    The life sciences are undergoing continuous and accelerating integration with computational and engineering sciences. The biology that many in the field have been trained on may be barely recognizable in ten to twenty years. One of the major drivers for this transformation is the blistering pace of advancements in DNA sequencing and synthesis. These advances have resulted in unprecedented amounts of new data, information, and knowledge. Many software tools have been developed to deal with aspects of this transformation and each is sorely needed [1-3]. However, few of these tools have been forced to deal with the full complexity of genome-scale models along with high-throughput genome-scale data. This particular situation represents a unique challenge, as it is simultaneously necessary to deal with the vast breadth of genome-scale models and the dizzying depth of high-throughput datasets. It has been observed time and again that as the pace of data generation continues to accelerate, the pace of analysis significantly lags behind [4]. It is also evident that, given the plethora of databases and software efforts [5-12], it is still a significant challenge to work with genome-scale metabolic models, let alone next-generation whole-cell models [13-15]. We work at the forefront of model creation and systems-scale data generation [16-18]. The OME Framework was born out of a practical need to enable genome-scale modeling and data analysis under a unified framework to drive the next generation of genome-scale biological models. Here we present the OME Framework. It exists as a set of Python classes. However, we want to emphasize the importance of the underlying design as an addition to the discussions on specifications of a digital cell. A great deal of work and valuable progress has been made by a number of communities [13, 19-24] towards interchange formats and implementations designed to achieve similar goals.
While many software tools exist for handling genome-scale

  4. Reconstructing genome-scale metabolic models with merlin.

    Science.gov (United States)

    Dias, Oscar; Rocha, Miguel; Ferreira, Eugénio C; Rocha, Isabel

    2015-04-30

    The Metabolic Models Reconstruction Using Genome-Scale Information (merlin) tool is a user-friendly Java application that aids the reconstruction of genome-scale metabolic models for any organism that has its genome sequenced. It performs the major steps of the reconstruction process, including the functional genomic annotation of the whole genome and subsequent construction of the portfolio of reactions. Moreover, merlin includes tools for the identification and annotation of genes encoding transport proteins, generating the transport reactions for those carriers. It also performs the compartmentalisation of the model, predicting the organelle localisation of the proteins encoded in the genome and thus the localisation of the metabolites involved in the reactions catalysed by such enzymes. The gene-protein-reaction (GPR) associations are automatically generated and included in the model. Finally, merlin expedites the transition from genomic data to draft metabolic model reconstructions exported in the standard SBML format, allowing the user to have a preliminary view of the biochemical network, which can be manually curated within the environment provided by merlin. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
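    Gene-protein-reaction associations are commonly written as Boolean rules over genes, with AND joining the subunits of an enzyme complex and OR joining isozymes. The sketch below assumes a hypothetical nested-tuple encoding of such rules (it is not merlin's internal format or its SBML export) and shows how a rule decides whether a reaction stays active after gene deletions.

```python
from typing import Dict, Iterable, List, Set, Union

# Hypothetical encoding: a gene id string, or a tuple ("and" | "or", sub-rule, sub-rule, ...)
GPR = Union[str, tuple]


def gpr_active(rule: GPR, present: Set[str]) -> bool:
    """Evaluate a GPR rule given the set of genes that are present (not deleted)."""
    if isinstance(rule, str):
        return rule in present
    op, *args = rule
    if op == "and":  # enzyme complex: every subunit gene is required
        return all(gpr_active(a, present) for a in args)
    if op == "or":  # isozymes: any one gene suffices
        return any(gpr_active(a, present) for a in args)
    raise ValueError(f"unknown operator {op!r}")


def inactive_reactions(gprs: Dict[str, GPR], genes: Iterable[str],
                       deleted: Iterable[str]) -> List[str]:
    """Reactions whose GPR rule evaluates to False once the deleted genes are removed."""
    present = set(genes) - set(deleted)
    return sorted(r for r, rule in gprs.items() if not gpr_active(rule, present))


gprs = {"R1": ("and", "g1", "g2"), "R2": ("or", "g1", "g3"), "R3": "g4"}
print(inactive_reactions(gprs, ["g1", "g2", "g3", "g4"], ["g1"]))
```

    Deleting g1 disables the complex-dependent reaction R1, while R2 survives through its isozyme g3.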

  5. Analysing human genomes at different scales

    DEFF Research Database (Denmark)

    Liu, Siyang

    genotypes in the Danish Genetics of Overweight Young Adults (GOYA) obesity cohort and demonstrate the clinical utility of the Danish reference panel in genome-wide association studies. In the second project, we have collected ultra-low depth sequencing data of more than 140,000 Chinese pregnant women. We developed...... and applied novel methods to analysing the data that are accumulating rapidly and now reach millions of sample scale. We show that we are able to discover mutations with allele frequencies down to around 0.2% and to explore fine-scale population structure and ancestry across the 31 administrative divisions...

  6. Beyond the HapMap Genotypic Data: Prospects of Deep Resequencing Projects.

    Science.gov (United States)

    Zhang, Wei; Dolan, M Eileen

    2008-09-01

    The International HapMap Project provides a key resource of genotypic data on human samples including lymphoblastoid cell lines derived from individuals of four major world populations of African, European, Japanese and Chinese ancestry. Researchers have utilized this resource to identify genetic elements that correlate with various phenotypes such as risks of common diseases, individual drug response and gene expression variation. However, recent comparative studies have suggested that the currently available HapMap genotypic data may not capture a substantial proportion of rare or untyped SNPs in these populations, implying that the HapMap SNPs may not be sufficient for comprehensive association studies. In this paper, three large-scale deep resequencing projects covering the HapMap samples: ENCODE (Encyclopedia of DNA Elements), SeattleSNPs and NIEHS (National Institute of Environmental Health Sciences) Environmental Genome Project are discussed. Prospectively, once integrated with the HapMap resource, these efforts will greatly benefit the next wave of association studies and data mining using these cell lines.

  7. Genome-scale engineering for systems and synthetic biology.

    Science.gov (United States)

    Esvelt, Kevin M; Wang, Harris H

    2013-01-01

    Genome-modification technologies enable the rational engineering and perturbation of biological systems. Historically, these methods have been limited to gene insertions or mutations at random or at a few pre-defined locations across the genome. The handful of methods capable of targeted gene editing suffered from low efficiencies, significant labor costs, or both. Recent advances have dramatically expanded our ability to engineer cells in a directed and combinatorial manner. Here, we review current technologies and methodologies for genome-scale engineering, discuss the prospects for extending efficient genome modification to new hosts, and explore the implications of continued advances toward the development of flexibly programmable chassis, novel biochemistries, and safer organismal and ecological engineering.

  8. Genome-scale engineering for systems and synthetic biology

    Science.gov (United States)

    Esvelt, Kevin M; Wang, Harris H

    2013-01-01

    Genome-modification technologies enable the rational engineering and perturbation of biological systems. Historically, these methods have been limited to gene insertions or mutations at random or at a few pre-defined locations across the genome. The handful of methods capable of targeted gene editing suffered from low efficiencies, significant labor costs, or both. Recent advances have dramatically expanded our ability to engineer cells in a directed and combinatorial manner. Here, we review current technologies and methodologies for genome-scale engineering, discuss the prospects for extending efficient genome modification to new hosts, and explore the implications of continued advances toward the development of flexibly programmable chassis, novel biochemistries, and safer organismal and ecological engineering. PMID:23340847

  9. Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks

    OpenAIRE

    Fabian Fröhlich; Barbara Kaltenbacher; Theis, Fabian J; Jan Hasenauer

    2017-01-01

    Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions have so far been lacking. While individual simulations are feasible, the inference of the model ...

  10. Multi-scale structural community organisation of the human genome.

    Science.gov (United States)

    Boulos, Rasha E; Tremblay, Nicolas; Arneodo, Alain; Borgnat, Pierre; Audit, Benjamin

    2017-04-11

    Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This gives rise to a new methodological challenge for computational biology: objectively extracting from these data the structural motifs characteristic of genome organisation. We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction, the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.

  11. A population genetic approach to mapping neurological disorder genes using deep resequencing.

    Directory of Open Access Journals (Sweden)

    Rachel A Myers

    2011-02-01

    Full Text Available Deep resequencing of functional regions in human genomes is key to identifying potentially causal rare variants for complex disorders. Here, we present the results from a large-sample resequencing (n = 285 patients) study of candidate genes coupled with population genetics and statistical methods to identify rare variants associated with Autism Spectrum Disorder and Schizophrenia. Three genes, MAP1A, GRIN2B, and CACNA1F, were consistently identified by different methods as having significant excess of rare missense mutations in either one or both disease cohorts. In a broader context, we also found that the overall site frequency spectrum of variation in these cases is best explained by population models of both selection and complex demography rather than neutral models or models accounting for complex demography alone. Mutations in the three disease-associated genes explained much of the difference in the overall site frequency spectrum among the cases versus controls. This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies. Additionally, our findings support the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.
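    The site frequency spectrum referred to above tabulates how many segregating sites carry the derived allele on exactly i of the n sampled chromosomes. A minimal sketch, assuming derived-allele counts per site are already known (ancestral states resolved) and that n is the number of chromosomes sampled; the function name is hypothetical and this is not the study's pipeline.

```python
from collections import Counter
from typing import Iterable, List


def site_frequency_spectrum(derived_counts: Iterable[int], n_chromosomes: int) -> List[int]:
    """Unfolded SFS: element i-1 is the number of sites whose derived allele occurs i times."""
    counts: Counter = Counter()
    for c in derived_counts:
        if not 0 <= c <= n_chromosomes:
            raise ValueError(f"derived count {c} outside 0..{n_chromosomes}")
        if 0 < c < n_chromosomes:  # monomorphic sites are not segregating
            counts[c] += 1
    return [counts[i] for i in range(1, n_chromosomes)]


# Three diploid individuals -> 6 chromosomes; one derived-allele count per site
print(site_frequency_spectrum([1, 1, 2, 0, 5, 1, 3, 6], 6))
```

    The first element counts singletons, the rarest class of variants; an excess of such rare variants relative to a neutral expectation is the kind of signal the study compares between models.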

  12. Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks.

    Science.gov (United States)

    Fröhlich, Fabian; Kaltenbacher, Barbara; Theis, Fabian J; Hasenauer, Jan

    2017-01-01

    Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions have so far been lacking. While individual simulations are feasible, the inference of the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large-scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics.
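    To illustrate why the cost of adjoint sensitivity analysis is essentially independent of the number of parameters, the sketch below computes a least-squares gradient for a hypothetical one-species model dx/dt = -k1*x + k2, discretized with explicit Euler and observed at every step. This is a discrete adjoint on a toy problem, not the authors' implementation or their ErbB model: one forward simulation plus one backward sweep yields the derivatives with respect to all parameters, whereas finite differences need extra simulations per parameter.

```python
from typing import List, Sequence, Tuple


def simulate(k1: float, k2: float, x0: float, h: float, n_steps: int) -> List[float]:
    """Explicit-Euler trajectory of dx/dt = -k1*x + k2, returning x_0 .. x_N."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x + h * (-k1 * x + k2))
    return xs


def objective(k1: float, k2: float, x0: float, h: float, data: Sequence[float]) -> float:
    """Least-squares misfit J = 0.5 * sum_n (x_n - y_n)^2 for observations y_1 .. y_N."""
    xs = simulate(k1, k2, x0, h, len(data))
    return 0.5 * sum((x - y) ** 2 for x, y in zip(xs[1:], data))


def adjoint_gradient(k1: float, k2: float, x0: float, h: float,
                     data: Sequence[float]) -> Tuple[float, float]:
    """dJ/dk1 and dJ/dk2 from one forward simulation and one backward (adjoint) sweep."""
    n = len(data)
    xs = simulate(k1, k2, x0, h, n)
    lam = 0.0  # adjoint variable; lambda_{N+1} = 0 beyond the last step
    g1 = g2 = 0.0
    for i in range(n, 0, -1):
        # lambda_i = dJ/dx_i + lambda_{i+1} * (d x_{i+1} / d x_i), with d x_{i+1}/d x_i = 1 - h*k1
        lam = (xs[i] - data[i - 1]) + lam * (1.0 - h * k1)
        # the step x_{i-1} -> x_i depends on k1 through -h*x_{i-1} and on k2 through h
        g1 += lam * (-h * xs[i - 1])
        g2 += lam * h
    return g1, g2


data = [0.9, 0.85, 0.7, 0.66, 0.6]
print(adjoint_gradient(0.5, 0.1, 1.0, 0.2, data))
```

    A central finite-difference check (two extra simulations per parameter) should agree with the adjoint gradient to within rounding error; with many parameters the adjoint sweep stays a single pass.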

  13. Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks.

    Directory of Open Access Journals (Sweden)

    Fabian Fröhlich

    2017-01-01

    Full Text Available Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions have so far been lacking. While individual simulations are feasible, the inference of the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large-scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics.

  14. Tag SNP selection for candidate gene association studies using HapMap and gene resequencing data.

    Science.gov (United States)

    Xu, Zongli; Kaplan, Norman L; Taylor, Jack A

    2007-10-01

    HapMap provides linkage disequilibrium (LD) information on a sample of 3.7 million SNPs that can be used for tag SNP selection in whole-genome association studies. HapMap can also be used for tag SNP selection in candidate genes, although its performance has yet to be evaluated against gene resequencing data, where there is near-complete SNP ascertainment. The Environmental Genome Project (EGP) is the largest gene resequencing effort to date, with over 500 resequenced genes. We used HapMap data to select tag SNPs and calculated the proportions of common SNPs (MAF ≥ 0.05) tagged (r² ≥ 0.8) for each of 127 EGP Panel 2 genes for which individual ethnic information was available. Median gene-tagging proportions are 50, 80 and 74% for African, Asian, and European groups, respectively. These low gene-tagging proportions may be problematic for some candidate gene studies. In addition, although HapMap targeted nonsynonymous SNPs (nsSNPs), we estimate that only approximately 30% of nsSNPs in EGP are in high LD with any HapMap SNP. We show that gene-tagging proportions can be improved by adding a relatively small number of tag SNPs selected based on resequencing data. We also demonstrate that ethnic-mixed data can be used to improve HapMap gene-tagging proportions, but are not as efficient as ethnic-specific data. Finally, we generalized the greedy algorithm proposed by Carlson et al (2004) to select tag SNPs for multiple populations and implemented the algorithm in a freely available software package, mPopTag.
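    The greedy strategy mentioned above can be sketched in a simplified single-population form: repeatedly pick the SNP that tags the largest number of still-untagged SNPs at r² ≥ threshold, then mark those SNPs as tagged. This sketch assumes pairwise r² values are precomputed and that MAF filtering was done upstream; it is not the mPopTag package, omits its multi-population generalization, and simplifies the binning details of Carlson et al (2004). The function name is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_tag_snps(snps: List[str], r2: Dict[FrozenSet[str], float],
                    threshold: float = 0.8) -> List[str]:
    """Greedy tag selection: each round picks the SNP covering the most untagged SNPs."""

    def ld(a: str, b: str) -> float:
        # A SNP trivially tags itself; missing pairs are treated as r^2 = 0
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    untagged: Set[str] = set(snps)
    tags: List[str] = []
    while untagged:
        # Candidates are untagged SNPs; max() keeps the first in input order on ties
        best = max(
            (s for s in snps if s in untagged),
            key=lambda s: sum(1 for t in untagged if ld(s, t) >= threshold),
        )
        tags.append(best)
        untagged -= {t for t in untagged if ld(best, t) >= threshold}
    return tags


r2 = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.85,
    frozenset(("B", "C")): 0.95,
    frozenset(("C", "D")): 0.30,
    frozenset(("D", "E")): 0.50,
}
print(greedy_tag_snps(["A", "B", "C", "D", "E"], r2))
```

    In the example, A, B and C form one LD bin tagged by A, while D and E, which are in weak LD with every other SNP, each need their own tag.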

  15. Automation on the generation of genome-scale metabolic models.

    Science.gov (United States)

    Reyes, R; Gamermann, D; Montagud, A; Fuente, D; Triana, J; Urchueguía, J F; de Córdoba, P Fernández

    2012-12-01

    Nowadays, the reconstruction of genome-scale metabolic models is a non-automated and interactive process based on decision making. This lengthy process usually requires a full year of one person's work in order to satisfactorily collect, analyze, and validate the list of all metabolic reactions present in a specific organism. In order to write this list, one has to go manually through a huge amount of genomic, metabolomic, and physiological information. Currently, there is no optimal algorithm that allows one to automatically go through all this information and generate the models taking into account probabilistic criteria of unicity and completeness that a biologist would consider. This work presents the automation of a methodology for the reconstruction of genome-scale metabolic models for any organism. The methodology that follows is the automated version of the steps implemented manually for the reconstruction of the genome-scale metabolic model of a photosynthetic organism, Synechocystis sp. PCC6803. The steps for the reconstruction are implemented in a computational platform (COPABI) that generates the models from the probabilistic algorithms that have been developed. To validate the robustness of the developed algorithm, the metabolic models of several organisms generated by the platform have been studied together with published models that have been manually curated. Network properties of the models, such as connectivity and average shortest path length, have been compared and analyzed.

  16. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Science.gov (United States)

    Du, Jiang; Bjornson, Robert D; Zhang, Zhengdong D; Kong, Yong; Snyder, Michael; Gerstein, Mark B

    2009-07-01

    The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at

  17. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Directory of Open Access Journals (Sweden)

    Jiang Du

    2009-07-01

    Full Text Available The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of

  18. Guidelines for Genome-Scale Analysis of Biological Rhythms.

    Science.gov (United States)

    Hughes, Michael E; Abruzzi, Katherine C; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M Fernanda; Chen, Zheng; Chiu, Joanna C; Cox, Juergen; Crowell, Alexander M; DeBruyne, Jason P; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J; Duffield, Giles E; Dunlap, Jay C; Eckel-Mahan, Kristin; Esser, Karyn A; FitzGerald, Garret A; Forger, Daniel B; Francey, Lauren J; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H; Herzel, Hanspeter; Herzog, Erik D; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J; Hurley, Jennifer M; de la Iglesia, Horacio O; Johnson, Carl; Kay, Steve A; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A; Li, Jiajia; Li, Xiaodong; Liu, Andrew C; Loros, Jennifer J; Martino, Tami A; Menet, Jerome S; Merrow, Martha; Millar, Andrew J; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N; Olmedo, Maria; Nusinow, Dmitri A; Ptáček, Louis J; Rand, David; Reddy, Akhilesh B; Robles, Maria S; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D; Rund, Samuel S C; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J; Storch, Kai-Florian; Takahashi, Joseph S; Ueda, Hiroki R; Wang, Han; Weitz, Charles; Westermark, Pål O; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B

    2017-10-01

    Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding "big data" that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them.

  19. Power Laws, Scale-Free Networks and Genome Biology

    CERN Document Server

    Koonin, Eugene V; Karev, Georgy P

    2006-01-01

    Power Laws, Scale-free Networks and Genome Biology deals with crucial aspects of the theoretical foundations of systems biology, namely power law distributions and scale-free networks, which have emerged as the hallmarks of biological organization in the post-genomic era. The chapters in the book not only describe the interesting mathematical properties of biological networks but also move beyond phenomenology, toward models of evolution capable of explaining the emergence of these features. The collection of chapters, contributed by both physicists and biologists, strives to address the problems in this field in a rigorous but not excessively mathematical manner and to represent different viewpoints, which is crucial in this emerging discipline. Each chapter includes, in addition to technical descriptions of properties of biological networks and evolutionary models, a more general and accessible introduction to the respective problems. Most chapters emphasize the potential of theoretical systems biology for disco...

  20. Next-generation genome-scale models for metabolic engineering

    DEFF Research Database (Denmark)

    King, Zachary A.; Lloyd, Colton J.; Feist, Adam M.

    2015-01-01

    Constraint-based reconstruction and analysis (COBRA) methods have become widely used tools for metabolic engineering in both academic and industrial laboratories. By employing a genome-scale in silico representation of the metabolic network of a host organism, COBRA methods can be used to predict...... examples of applying COBRA methods to strain optimization are presented and discussed. Then, an outlook is provided on the next generation of COBRA models and the new types of predictions they will enable for systems metabolic engineering....

  1. Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network

    DEFF Research Database (Denmark)

    Förster, Jochen; Famili, I.; Fu, P.

    2003-01-01

    and the environment were included. A total of 708 structural open reading frames (ORFs) were accounted for in the reconstructed network, corresponding to 1035 metabolic reactions. Further, 140 reactions were included on the basis of biochemical evidence resulting in a genome-scale reconstructed metabolic network...... with Escherichia coli. The reconstructed metabolic network is the first comprehensive network for a eukaryotic organism, and it may be used as the basis for in silico analysis of phenotypic functions....

  2. Systems metabolic engineering: Genome-scale models and beyond

    Science.gov (United States)

    Blazeck, John; Alper, Hal

    2010-01-01

    The advent of high throughput genome-scale bioinformatics has led to an exponential increase in available cellular system data. Systems metabolic engineering attempts to use data-driven approaches – based on the data collected with high throughput technologies – to identify gene targets and optimize phenotypical properties on a systems level. Current systems metabolic engineering tools are limited for predicting and defining complex phenotypes such as chemical tolerances and other global, multigenic traits. The most pragmatic systems-based tool for metabolic engineering to arise is the in silico genome-scale metabolic reconstruction. This tool has seen wide adoption for modeling cell growth and predicting beneficial gene knockouts, and we examine here how this approach can be expanded for novel organisms. This review will highlight advances of the systems metabolic engineering approach with a focus on de novo development and use of genome-scale metabolic reconstructions for metabolic engineering applications. We will then discuss the challenges and prospects for this emerging field to enable model-based metabolic engineering. Specifically, we argue that current state-of-the-art systems metabolic engineering techniques represent a viable first step for improving product yield that still must be followed by combinatorial techniques or random strain mutagenesis to achieve optimal cellular systems. PMID:20151446

  3. Systems metabolic engineering: genome-scale models and beyond.

    Science.gov (United States)

    Blazeck, John; Alper, Hal

    2010-07-01

    The advent of high throughput genome-scale bioinformatics has led to an exponential increase in available cellular system data. Systems metabolic engineering attempts to use data-driven approaches, based on the data collected with high throughput technologies, to identify gene targets and optimize phenotypical properties on a systems level. Current systems metabolic engineering tools are limited for predicting and defining complex phenotypes such as chemical tolerances and other global, multigenic traits. The most pragmatic systems-based tool for metabolic engineering to arise is the in silico genome-scale metabolic reconstruction. This tool has seen wide adoption for modeling cell growth and predicting beneficial gene knockouts, and we examine here how this approach can be expanded for novel organisms. This review will highlight advances of the systems metabolic engineering approach with a focus on de novo development and use of genome-scale metabolic reconstructions for metabolic engineering applications. We will then discuss the challenges and prospects for this emerging field to enable model-based metabolic engineering. Specifically, we argue that current state-of-the-art systems metabolic engineering techniques represent a viable first step for improving product yield that still must be followed by combinatorial techniques or random strain mutagenesis to achieve optimal cellular systems.

  4. High throughput genetic analysis of congenital myasthenic syndromes using resequencing microarrays.

    Directory of Open Access Journals (Sweden)

    Lisa Denning

    BACKGROUND: The use of resequencing microarrays for screening multiple candidate disease loci is a promising alternative to conventional capillary sequencing. We describe the performance of a custom resequencing microarray for mutational analysis of Congenital Myasthenic Syndromes (CMSs), a group of disorders in which the normal process of neuromuscular transmission is impaired. METHODOLOGY/PRINCIPAL FINDINGS: Our microarray was designed to assay the exons and flanking intronic regions of 8 genes linked to CMSs. A total of 31 microarrays were hybridized with genomic DNA from either individuals with known CMS mutations or healthy controls. We estimated an overall microarray call rate of 93.61%, and we found the percentage agreement between the microarray and capillary sequencing techniques to be 99.95%. In addition, our microarray exhibited 100% specificity and 99.99% reproducibility. Finally, the microarray detected 22 of the 23 known missense mutations but failed to detect any of the 7 known insertion and deletion (indel) mutations, giving an overall sensitivity of 73.33% (22/30) and a sensitivity with respect to missense mutations of 95.65% (22/23). CONCLUSIONS/SIGNIFICANCE: Overall, our microarray prototype exhibited strong performance and proved highly efficient for screening genes associated with CMSs. Until indels can be efficiently assayed with this technology, however, we recommend using resequencing microarrays for screening CMS mutations after common indels have first been assayed by capillary sequencing.
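    The two sensitivity figures follow directly from the mutation counts in the abstract: sensitivity is the number of detected mutations divided by the number of known mutations. The short Python check below reproduces them. It assumes that none of the 7 indels were detected, which is the only reading consistent with the reported 73.33%; the variable and function names are illustrative only.

```python
# Counts reported in the CMS resequencing-microarray abstract.
MISSENSE_KNOWN, MISSENSE_DETECTED = 23, 22
INDEL_KNOWN, INDEL_DETECTED = 7, 0  # assumption: no indel was detected


def sensitivity(detected: int, known: int) -> float:
    """Percentage of known mutations that the assay detected."""
    return 100.0 * detected / known


# Overall sensitivity pools missense and indel mutations (22 of 30).
overall = sensitivity(MISSENSE_DETECTED + INDEL_DETECTED,
                      MISSENSE_KNOWN + INDEL_KNOWN)
# Missense-only sensitivity (22 of 23).
missense_only = sensitivity(MISSENSE_DETECTED, MISSENSE_KNOWN)

print(f"overall sensitivity:  {overall:.2f}%")        # 73.33%
print(f"missense sensitivity: {missense_only:.2f}%")  # 95.65%
```

    The gap between the two numbers is entirely due to the indels, which is why the authors recommend screening common indels by capillary sequencing first.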

  5. Genomic divergences among cattle, dog and human estimated from large-scale alignments of genomic sequences

    Directory of Open Access Journals (Sweden)

    Shade Larry L

    2006-06-01

    Background: Approximately 11 Mb of finished high-quality genomic sequence was sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages. Results: Optimal three-way multi-species global sequence alignments for 84 cattle clones or loci (each >50 kb of genomic sequence) were constructed using the human and dog genome assemblies as references. Genomic divergences and substitution rates were examined for each clone and for various sequence classes under different functional constraints. Analysis of these alignments revealed that the overall genomic divergences are relatively constant (0.32–0.37 change/site) for pairwise comparisons among cattle, dog and human; however, substitution rates vary across genomic regions and among different sequence classes. A neutral mutation rate (2.0–2.2 × 10⁻⁹ change/site/year) was derived from ancestral repetitive sequences, whereas the substitution rate in coding sequences (1.1 × 10⁻⁹ change/site/year) was approximately half of the overall rate (1.9–2.0 × 10⁻⁹ change/site/year). Relative rate tests also indicated that cattle have a significantly faster substitution rate than dog, by about 6%. Conclusion: This analysis provides a large-scale and unbiased assessment of genomic divergences and regional variation of substitution rates among cattle, dog and human. These data are expected to serve as a baseline for future mammalian molecular evolution studies.

  6. Genome-scale rates of evolutionary change in bacteria.

    Science.gov (United States)

    Duchêne, Sebastian; Holt, Kathryn E; Weill, François-Xavier; Le Hello, Simon; Hawkey, Jane; Edwards, David J; Fourment, Mathieu; Holmes, Edward C

    2016-11-01

    Estimating the rates at which bacterial genomes evolve is critical to understanding major evolutionary and ecological processes such as disease emergence, long-term host-pathogen associations and short-term transmission patterns. The surge in bacterial genomic data sets provides a new opportunity to estimate these rates and reveal the factors that shape bacterial evolutionary dynamics. For many organisms estimates of evolutionary rate display an inverse association with the time-scale over which the data are sampled. However, this relationship remains unexplored in bacteria due to the difficulty in estimating genome-wide evolutionary rates, which are impacted by the extent of temporal structure in the data and the prevalence of recombination. We collected 36 whole genome sequence data sets from 16 species of bacterial pathogens to systematically estimate and compare their evolutionary rates and assess the extent of temporal structure in the absence of recombination. The majority (28/36) of data sets possessed sufficient clock-like structure to robustly estimate evolutionary rates. However, in some species reliable estimates were not possible even with 'ancient DNA' data sampled over many centuries, suggesting that they evolve very slowly or that they display extensive rate variation among lineages. The robustly estimated evolutionary rates spanned several orders of magnitude, from approximately 10⁻⁵ to 10⁻⁸ nucleotide substitutions per site per year. This variation was negatively associated with sampling time, with this relationship best described by an exponential decay curve. To avoid potential estimation biases, such time-dependency should be considered when inferring evolutionary time-scales in bacteria.
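    Clock-like (temporal) structure of the kind mentioned above is often screened with a root-to-tip regression: the genetic distance from the root of a tree to each sampled genome is regressed against sampling date, the slope approximates the substitution rate, and the goodness of fit indicates how much temporal signal the data carry. The abstract does not say that this is how the authors assessed temporal structure, so treat the Python sketch below as an illustration of the general idea only. The dates and distances are made-up toy values, not data from the study.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date, r_squared): rate is the slope in
    substitutions/site/year, root_date is the x-intercept (a crude estimate
    of when the root lived), and r_squared measures how clock-like the data
    look (values near 1 indicate strong temporal signal).
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx                    # slope = substitution rate
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate       # date at which distance reaches zero
    r_squared = sxy ** 2 / (sxx * syy)
    return rate, root_date, r_squared


# Hypothetical toy data: four isolates sampled a decade apart.
dates = [1990, 2000, 2010, 2020]
distances = [3e-5, 5e-5, 7e-5, 9e-5]
rate, root_date, r2 = root_to_tip_regression(dates, distances)
print(f"rate ~ {rate:.1e} subs/site/year, root ~ {root_date:.0f}, R^2 = {r2:.2f}")
```

    Real analyses usually pair such a regression with likelihood or Bayesian dating methods and with tests such as date randomization, because root-to-tip points are not statistically independent.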

  7. Pangolin genomes and the evolution of mammalian scales and immunity.

    Science.gov (United States)

    Choo, Siew Woh; Rayko, Mike; Tan, Tze King; Hari, Ranjeev; Komissarov, Aleksey; Wee, Wei Yee; Yurchenko, Andrey A; Kliver, Sergey; Tamazian, Gaik; Antunes, Agostinho; Wilson, Richard K; Warren, Wesley C; Koepfli, Klaus-Peter; Minx, Patrick; Krasheninnikova, Ksenia; Kotze, Antoinette; Dalton, Desire L; Vermaak, Elaine; Paterson, Ian C; Dobrynin, Pavel; Sitam, Frankie Thomas; Rovie-Ryan, Jeffrine J; Johnson, Warren E; Yusoff, Aini Mohamed; Luo, Shu-Jin; Karuppannan, Kayal Vizi; Fang, Gang; Zheng, Deyou; Gerstein, Mark B; Lipovich, Leonard; O'Brien, Stephen J; Wong, Guat Jah

    2016-10-01

    Pangolins, unique mammals with scales over most of their body, no teeth, poor vision, and an acute olfactory system, comprise the only placental order (Pholidota) without a whole-genome map. To investigate pangolin biology and evolution, we developed genome assemblies of the Malayan (Manis javanica) and Chinese (M. pentadactyla) pangolins. Strikingly, we found that interferon epsilon (IFNE), exclusively expressed in epithelial cells and important in skin and mucosal immunity, is pseudogenized in all African and Asian pangolin species that we examined, perhaps impacting resistance to infection. We propose that scale development was an innovation that provided protection against injuries or stress and reduced pangolin vulnerability to infection. Further evidence of specialized adaptations was evident from positively selected genes involving immunity-related pathways, inflammation, energy storage and metabolism, muscular and nervous systems, and scale/hair development. Olfactory receptor gene families are significantly expanded in pangolins, reflecting their well-developed olfaction system. This study provides insights into mammalian adaptation and functional diversification, new research tools and questions, and perhaps a new natural IFNE-deficient animal model for studying mammalian immunity. © 2016 Choo et al.; Published by Cold Spring Harbor Laboratory Press.

  8. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    Energy Technology Data Exchange (ETDEWEB)

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes, including bacteria, archaea, viruses, and microbial eukaryotes, provides researchers with insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes was laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing are within reach of even the smallest facilities, and sequencing the genomes of a significant fraction of microbial life may become possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin.

  9. Genome-scale constraint-based modeling of Geobacter metallireducens

    Directory of Open Access Journals (Sweden)

    Famili Iman

    2009-01-01

    Background: Geobacter metallireducens was the first organism in pure culture shown to completely oxidize organic compounds with Fe(III) oxide serving as the electron acceptor. Geobacter species, including G. sulfurreducens and G. metallireducens, are used for bioremediation and for electricity generation from waste organic matter and renewable biomass. The constraint-based modeling approach enables the development of genome-scale in silico models that can predict the behavior of complex biological systems and their responses to the environment. This modeling approach was applied to provide physiological and ecological insights into the metabolism of G. metallireducens. Results: The genome-scale metabolic model of G. metallireducens was constructed to include 747 genes and 697 reactions. Compared to the G. sulfurreducens model, the G. metallireducens metabolic model contains 118 unique reactions that reflect many of G. metallireducens' specific metabolic capabilities. Detailed examination of the G. metallireducens model suggests that its central metabolism contains several energy-inefficient reactions that are not present in the G. sulfurreducens model. The experimental biomass yield of G. metallireducens growing on pyruvate was lower than the predicted optimal biomass yield. Microarray data from G. metallireducens growing with benzoate and acetate indicated that genes encoding these energy-inefficient reactions were up-regulated by benzoate. These results suggested that the energy-inefficient reactions were likely turned off during G. metallireducens growth with acetate for optimal biomass yield, but were up-regulated during growth with complex electron donors such as benzoate for rapid energy generation. Furthermore, several computational modeling approaches were applied to accelerate G. metallireducens research. For example, growth of G. metallireducens with different electron donors and electron acceptors was studied using the genome-scale

  10. Genome-scale metabolic representation of Amycolatopsis balhimycina

    DEFF Research Database (Denmark)

    Vongsangnak, Wanwipa; Figueiredo, L. F.; Förster, Jochen

    2012-01-01

    Infection caused by methicillin‐resistant Staphylococcus aureus (MRSA) is an increasing societal problem. Typically, glycopeptide antibiotics are used in the treatment of these infections. The most comprehensively studied glycopeptide antibiotic biosynthetic pathway is that of balhimycin...... biosynthesis in Amycolatopsis balhimycina. The balhimycin yield obtained by A. balhimycina is, however, low and there is therefore a need to improve balhimycin production. In this study, we performed genome sequencing, assembly and annotation analysis of A. balhimycina and further used these annotated data...... to reconstruct a genome‐scale metabolic model for the organism. Here we generated an almost complete A. balhimycina genome sequence comprising 10,562,587 base pairs assembled into 2,153 contigs. The high GC‐genome (∼69%) includes 8,585 open reading frames (ORFs). We used our integrative toolbox called SEQTOR...

  11. Current state of genome-scale modeling in filamentous fungi.

    Science.gov (United States)

    Brandl, Julian; Andersen, Mikael R

    2015-06-01

    The group of filamentous fungi contains important species used in industrial biotechnology for the production of acids, antibiotics and enzymes. Their unique lifestyle makes these organisms a valuable genetic reservoir of new natural products and biomass-degrading enzymes that has not yet been exploited to full capacity. One of the major bottlenecks in developing new strains into viable industrial hosts is redirecting their metabolism towards optimal production. Genome-scale models promise to reduce the time needed for metabolic engineering by predicting the most potent targets in silico before testing them in vivo. The increasing availability of high-quality models and molecular biological tools for manipulating filamentous fungi makes model-guided engineering of these fungal factories with comprehensive metabolic networks possible. A typical fungal model contains on average 1138 unique metabolic reactions and 1050 ORFs, making these models a vast knowledge base of fungal metabolism. In the present review we focus on the current state as well as potential future applications of genome-scale models in filamentous fungi.

  12. Analysis of Aspergillus nidulans metabolism at the genome-scale

    Directory of Open Access Journals (Sweden)

    Nielsen Jens

    2008-04-01

    Background: Aspergillus nidulans is a member of a diverse group of filamentous fungi, sharing many of the properties of its close relatives with significance in the fields of medicine, agriculture and industry. Furthermore, A. nidulans has been a classical model organism for studies of developmental biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which fewer than 10% were assigned a function. Results: In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without an associated gene, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates in an objective and systematic manner. The functional assignments served as a basis to develop a mathematical model, linking 666 genes (both previously and newly annotated) to metabolic roles. The model was used to simulate metabolic behavior and additionally to integrate, analyze and interpret large-scale gene expression data concerning a study on glucose repression, thereby providing a means of upgrading the information content of experimental data.

  13. Identification of regulatory RNA in bacterial genomes by genome-scale mapping of transcription start sites.

    Science.gov (United States)

    Singh, Navjot; Wade, Joseph T

    2014-01-01

    The ability to map transcription start sites is critical for studies of gene regulation and for identification of novel RNAs. Conventional RNA-seq is often insufficient for identification of transcription start sites due to low coverage and/or RNA processing events. We have developed a highly sensitive, genome-scale method for detection of transcription start sites in bacteria. This method uses deep sequencing of cDNA libraries to identify transcription start sites with strand specificity at nucleotide resolution. Here, we describe the application of this method for transcription start site identification in Escherichia coli.
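    At its core, strand-specific start-site mapping reduces to locating the 5′ ends of sequenced cDNA reads: on the plus strand the 5′ end is a read's leftmost coordinate, on the minus strand its rightmost. The Python sketch below piles up 5′ ends per position and strand and reports positions whose count reaches a threshold. It is a simplified illustration, not the authors' pipeline; the `(start, end, strand)` read format, the `min_count` threshold and the toy reads are all hypothetical, and practical pipelines add steps such as normalization that are omitted here.

```python
from collections import Counter


def candidate_tss(reads, min_count=3):
    """Return sorted (position, strand) pairs where many reads share a 5' end.

    reads: iterable of (start, end, strand) with 1-based inclusive
    coordinates on the reference (hypothetical format); strand is "+" or "-".
    """
    five_prime = Counter()
    for start, end, strand in reads:
        # The 5' end of a minus-strand read is its rightmost coordinate.
        pos = start if strand == "+" else end
        five_prime[(pos, strand)] += 1
    return sorted(key for key, count in five_prime.items() if count >= min_count)


# Hypothetical toy reads: a pile-up at 100 (+) and another at 250 (-).
reads = [(100, 150, "+"), (100, 140, "+"), (100, 160, "+"), (101, 150, "+"),
         (200, 250, "-"), (210, 250, "-"), (190, 250, "-"), (300, 350, "+")]
print(candidate_tss(reads))  # [(100, '+'), (250, '-')]
```

    Keeping the strand in the key is what gives strand specificity: a plus-strand and a minus-strand pile-up at the same coordinate are reported as separate candidates.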

  14. Small- and large-scale heterogeneity in genetic variation across the collared flycatcher genome: implications for estimating genetic diversity in nonmodel organisms.

    Science.gov (United States)

    Ingvarsson, Pär K; Wang, Jing

    2017-07-01

    Population genetic studies in nonmodel organisms are often hampered by a lack of reference genomes, which are essential for whole-genome resequencing. In light of this, genotyping methods that effectively eliminate the need for a reference genome have been developed, such as genotyping by sequencing or restriction site-associated DNA sequencing (RAD-seq). However, how accurately these methods capture both the average and the variation in genetic diversity across an organism's genome remains relatively poorly studied. In this issue of Molecular Ecology Resources, Dutoit et al. (2016) use whole-genome resequencing data from the collared flycatcher to assess what factors drive heterogeneity in nucleotide diversity across the genome. Using these data, they then simulate how well different sequencing designs, including RAD sequencing, could capture most of the variation in genetic diversity. They conclude that for evolutionary and conservation-related studies focused on estimating genomic diversity, researchers should emphasize the number of loci analysed over the number of individuals sequenced. © 2017 John Wiley & Sons Ltd.
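    Nucleotide diversity (π) is the average number of pairwise differences per site among sampled sequences; computing it in windows along an alignment is one simple way to see the kind of heterogeneity across the genome discussed above. The Python sketch below assumes equal-length, gap-free aligned haplotypes and ignores missing data, which real resequencing analyses must handle; the function names and toy sequences are invented for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi) for aligned, gap-free sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair of sequences.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def windowed_pi(seqs, window):
    """Pi in consecutive non-overlapping windows along the alignment."""
    length = len(seqs[0])
    return [nucleotide_diversity([s[i:i + window] for s in seqs])
            for i in range(0, length, window)]


# Hypothetical toy alignment of three haplotypes.
haplotypes = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(haplotypes))   # 4 differences / (3 pairs * 4 sites)
print(windowed_pi(haplotypes, window=2))  # first window invariant, second diverse
```

    The windowed values show why sampling only a few loci can mislead: the genome-wide average hides windows with no diversity next to highly diverse ones, which is consistent with the recommendation to analyse many loci.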

  15. Current state of genome-scale modeling in filamentous fungi

    DEFF Research Database (Denmark)

    Brandl, Julian; Andersen, Mikael Rørdam

    2015-01-01

    The group of filamentous fungi contains important species used in industrial biotechnology for the production of acids, antibiotics and enzymes. Their unique lifestyle turns these organisms into a valuable genetic reservoir of new natural products and biomass-degrading enzymes that has not been used to full...... testing them in vivo. The increasing availability of high-quality models and molecular biological tools for manipulating filamentous fungi makes model-guided engineering of these fungal factories with comprehensive metabolic networks possible. A typical fungal model contains on average 1138 unique...... metabolic reactions and 1050 ORFs, making these models a vast knowledge base of fungal metabolism. In the present review we focus on the current state as well as potential future applications of genome-scale models in filamentous fungi....

  16. Next-generation genome-scale models for metabolic engineering.

    Science.gov (United States)

    King, Zachary A; Lloyd, Colton J; Feist, Adam M; Palsson, Bernhard O

    2015-12-01

    Constraint-based reconstruction and analysis (COBRA) methods have become widely used tools for metabolic engineering in both academic and industrial laboratories. By employing a genome-scale in silico representation of the metabolic network of a host organism, COBRA methods can be used to predict optimal genetic modifications that improve the rate and yield of chemical production. A new generation of COBRA models and methods, encompassing many biological processes and simulation strategies, is now being developed, and these next-generation models enable new types of predictions. Here, three key examples of applying COBRA methods to strain optimization are presented and discussed. Then, an outlook is provided on the next generation of COBRA models and the new types of predictions they will enable for systems metabolic engineering. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    Energy Technology Data Exchange (ETDEWEB)

    Arneodo, Alain, E-mail: alain.arneodo@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Vaillant, Cedric, E-mail: cedric.vaillant@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Audit, Benjamin, E-mail: benjamin.audit@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); Argoul, Francoise, E-mail: francoise.argoul@ens-lyon.f [Universite de Lyon, F-69000 Lyon (France); Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, Ecole Normale Superieure de Lyon, F-69007 Lyon (France); D' Aubenton-Carafa, Yves, E-mail: daubenton@cgm.cnrs-gif.f [Centre de Genetique Moleculaire, CNRS, Allee de la Terrasse, 91198 Gif-sur-Yvette (France); Thermes, Claude, E-mail: claude.thermes@cgm.cnrs-gif.f [Centre de Genetique Moleculaire, CNRS, Allee de la Terrasse, 91198 Gif-sur-Yvette (France)

    2011-02-15

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is a wealth of structural and dynamical information to be learned from the primary DNA sequence. In this review, we show that by combining concepts, methodologies, and numerical and experimental techniques from statistical mechanics and nonlinear physics with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.

  18. The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution.

    Science.gov (United States)

    Verde, Ignazio; Abbott, Albert G; Scalabrin, Simone; Jung, Sook; Shu, Shengqiang; Marroni, Fabio; Zhebentyayeva, Tatyana; Dettori, Maria Teresa; Grimwood, Jane; Cattonaro, Federica; Zuccolo, Andrea; Rossini, Laura; Jenkins, Jerry; Vendramin, Elisa; Meisel, Lee A; Decroocq, Veronique; Sosinski, Bryon; Prochnik, Simon; Mitros, Therese; Policriti, Alberto; Cipriani, Guido; Dondini, Luca; Ficklin, Stephen; Goodstein, David M; Xuan, Pengfei; Del Fabbro, Cristian; Aramini, Valeria; Copetti, Dario; Gonzalez, Susana; Horner, David S; Falchi, Rachele; Lucas, Susan; Mica, Erica; Maldonado, Jonathan; Lazzari, Barbara; Bielenberg, Douglas; Pirona, Raul; Miculan, Mara; Barakat, Abdelali; Testolin, Raffaele; Stella, Alessandra; Tartarini, Stefano; Tonutti, Pietro; Arús, Pere; Orellana, Ariel; Wells, Christina; Main, Dorrie; Vizzotto, Giannina; Silva, Herman; Salamini, Francesco; Schmutz, Jeremy; Morgante, Michele; Rokhsar, Daniel S

    2013-05-01

    Rosaceae is the most important fruit-producing clade, and its key commercially relevant genera (Fragaria, Rosa, Rubus and Prunus) show broadly diverse growth habits, fruit types and compact diploid genomes. Peach, a diploid Prunus species, is one of the best genetically characterized deciduous trees. Here we describe the high-quality genome sequence of peach obtained from a completely homozygous genotype. We obtained a complete chromosome-scale assembly using Sanger whole-genome shotgun methods. We predicted 27,852 protein-coding genes, as well as noncoding RNAs. We investigated the path of peach domestication through whole-genome resequencing of 14 Prunus accessions. The analyses suggest major genetic bottlenecks that have substantially shaped peach genome diversity. Furthermore, comparative analyses showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.

  19. Identification of regulatory modules in genome scale transcription regulatory networks.

    Science.gov (United States)

    Song, Qi; Grene, Ruth; Heath, Lenwood S; Li, Song

    2017-12-15

    In gene regulatory networks, transcription factors often function as co-regulators to synergistically induce or inhibit expression of their target genes. However, most existing module-finding algorithms can only identify densely connected genes but not co-regulators in regulatory networks. We have developed a new computational method, CoReg, to identify transcription co-regulators in large-scale regulatory networks. CoReg calculates gene similarities based on the number of common neighbors of any two genes. Using simulated and real networks, we compared the performance of different similarity indices and existing module-finding algorithms, and we found that CoReg outperforms other published methods in identifying co-regulatory genes. We applied CoReg to a large-scale network of Arabidopsis with more than 2.8 million edges, and we analyzed more than 2,300 published gene expression profiles to characterize co-expression patterns of gene modules identified by CoReg. We identified three types of modules in the Arabidopsis network: regulator modules, target modules and intermediate modules. Regulator modules include genes with more than 90% of their edges as outgoing edges; target modules include genes with more than 90% of their edges as incoming edges. Other modules are classified as intermediate modules. We found that genes in target modules tend to be highly co-expressed under abiotic stress conditions, suggesting this network structure is robust against perturbation. Our analysis shows that CoReg is an accurate method for identifying co-regulatory genes in large-scale networks. We provide CoReg as an R package, which can be applied to finding co-regulators in any organism with genome-scale regulatory network data.
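    The two ideas in the CoReg abstract, similarity from shared neighbors and the 90% edge-direction rule for labeling modules, can be sketched in Python as below. This is not the CoReg R package. How neighbors are defined (here, any gene linked in either direction) and which edges count toward a module's total (here, every edge touching a module gene, so an edge inside the module counts as both outgoing and incoming) are assumptions made for illustration, and the toy network is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def neighbor_sets(edges):
    """Undirected neighbor sets from directed (regulator, target) edges."""
    nbrs = defaultdict(set)
    for src, dst in edges:
        nbrs[src].add(dst)
        nbrs[dst].add(src)
    return nbrs


def common_neighbor_counts(edges):
    """Number of shared neighbors for every pair of genes in the network."""
    nbrs = neighbor_sets(edges)
    return {(a, b): len(nbrs[a] & nbrs[b])
            for a, b in combinations(sorted(nbrs), 2)}


def classify_module(module, edges, cutoff=0.9):
    """Label a gene module by the share of its edges that point outward."""
    out_edges = sum(1 for src, _ in edges if src in module)
    in_edges = sum(1 for _, dst in edges if dst in module)
    total = out_edges + in_edges
    if total and out_edges / total > cutoff:
        return "regulator"
    if total and in_edges / total > cutoff:
        return "target"
    return "intermediate"


# Hypothetical network: two transcription factors sharing three targets.
edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"),
         ("TF2", "g1"), ("TF2", "g2"), ("TF2", "g3")]
counts = common_neighbor_counts(edges)
print(counts[("TF1", "TF2")])                    # shared targets of TF1 and TF2
print(classify_module({"TF1", "TF2"}, edges))    # regulator
print(classify_module({"g1", "g2", "g3"}, edges))  # target
```

    The pair TF1/TF2 has no direct edge yet scores highly because both regulate the same genes, which is the co-regulator pattern that density-based module finders tend to miss.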

  20. Large-scale parallel genome assembler over cloud computing environment.

    Science.gov (United States)

    Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong

    2017-06-01

    The size of high-throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications have started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay-as-you-go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, both developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by colocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over a traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure instead of a traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of a traditional HPC cluster.
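    GiGA is described as de Bruijn graph oriented. As a hedged, single-machine illustration of that data structure (not GiGA's distributed Giraph implementation), the sketch below splits reads into distinct k-mers, links each k-mer's (k-1)-mer prefix to its suffix, and greedily extends a contig while a node has exactly one successor. Function names and the toy reads are hypothetical; a real assembler would also stop at nodes with several predecessors and would handle sequencing errors and reverse complements.

```python
from collections import defaultdict


def distinct_kmers(reads, k):
    """Collect the distinct k-mers that occur in the reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def de_bruijn_graph(reads, k):
    """Each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for kmer in distinct_kmers(reads, k):
        graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def greedy_contig(graph, start):
    """Extend from `start` while the current node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop instead of looping forever on a cycle
            break
        contig += nxt[-1]  # consecutive nodes overlap by k-2 bases
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ACGTTC", "GTTCAG"]  # hypothetical overlapping reads
graph = de_bruijn_graph(reads, k=3)
contig = greedy_contig(graph, "AC")
```

    The two reads overlap by four bases, and walking the resulting unbranched path reconstructs the merged sequence ACGTTCAG.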

  1. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes

    DEFF Research Database (Denmark)

    Xu, Xun; Liu, Xin; Ge, Song

    2012-01-01

    Rice is a staple crop that has undergone substantial phenotypic and physiological changes during domestication. Here we resequenced the genomes of 40 cultivated accessions selected from the major groups of rice and 10 accessions of their wild progenitors (Oryza rufipogon and Oryza nivara) to >15×...... raw data coverage. We investigated genome-wide variation patterns in rice and obtained 6.5 million high-quality single nucleotide polymorphisms (SNPs) after excluding sites with missing data in any accession. Using these population SNP data, we identified thousands of genes with significantly lower...... diversity in cultivated but not wild rice, which represent candidate regions selected during domestication. Some of these variants are associated with important biological features, whereas others have yet to be functionally characterized. The molecular markers we have identified should be valuable...
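    The abstract filters out sites with missing data in any accession and then looks for regions with lower diversity in cultivated than in wild rice. A minimal sketch of that comparison for one region, assuming biallelic SNPs coded 0/1 per accession (accessions treated as homozygous), None for a missing call, and unbiased per-site expected heterozygosity as the diversity measure. The paper's exact statistic and windowing are not given in the snippet, and all function names and data here are hypothetical.

```python
def complete_sites(genotypes):
    """Keep only sites called in every accession (None marks a missing call)."""
    return [site for site in genotypes if all(g is not None for g in site)]


def site_diversity(alleles):
    """Unbiased expected heterozygosity at one biallelic site coded 0/1."""
    n = len(alleles)
    p = sum(alleles) / n
    return n / (n - 1) * 2 * p * (1 - p)


def mean_diversity(sites):
    """Average per-site diversity over a list of sites."""
    return sum(site_diversity(s) for s in sites) / len(sites)


def diversity_ratio(genotypes, cultivated_idx, wild_idx):
    """Ratio of wild to cultivated mean diversity over complete sites in one region."""
    sites = complete_sites(genotypes)
    pi_cult = mean_diversity([[s[i] for i in cultivated_idx] for s in sites])
    pi_wild = mean_diversity([[s[i] for i in wild_idx] for s in sites])
    return pi_wild / pi_cult if pi_cult else float("inf")


# Hypothetical region: 3 SNPs x 8 accessions (first 4 cultivated, last 4 wild).
region = [
    (0, 0, 0, 0, 1, 0, 1, 0),
    (0, 0, 0, 1, 0, 1, 1, 0),
    (None, 0, 0, 0, 1, 1, 0, 0),  # dropped: missing call in one accession
]
ratio = diversity_ratio(region, cultivated_idx=range(4), wild_idx=range(4, 8))
```

    A large wild-to-cultivated ratio across a window of genes would flag a candidate domestication region; what counts as "significantly lower" is a thresholding choice this sketch does not make.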

  2. SNP detection for massively parallel whole-genome resequencing

    DEFF Research Database (Denmark)

    Li, Ruiqiang; Li, Yingrui; Fang, Xiaodong

    2009-01-01

    .25% of the diploid autosomes and 88.07% of the haploid X chromosome. Comparison of the consensus sequence with Illumina human 1M BeadChip genotyped alleles from the same DNA sample showed that 98.6% of the 37,933 genotyped alleles on the X chromosome and 98% of 999,981 genotyped alleles on autosomes were covered...

  3. Metabolite coupling in genome-scale metabolic networks

    Directory of Open Access Journals (Sweden)

    Palsson Bernhard Ø

    2006-03-01

    Full Text Available Abstract Background Biochemically detailed stoichiometric matrices have now been reconstructed for various bacteria, yeast, and for the human cardiac mitochondrion based on genomic and proteomic data. These networks have been manually curated based on legacy data and elementally and charge balanced. Comparative analysis of these well curated networks is now possible. Pairs of metabolites often appear together in several network reactions, linking them topologically. This co-occurrence of pairs of metabolites in metabolic reactions is termed herein "metabolite coupling." These metabolite pairs can be directly computed from the stoichiometric matrix, $S$. Metabolite coupling is derived from the matrix $\hat{S}\hat{S}^{T}$, whose off-diagonal elements indicate the number of reactions in which any two metabolites participate together, where $\hat{S}$ is the binary form of $S$. Results Metabolite coupling in the studied networks was found to be dominated by a relatively small group of highly interacting pairs of metabolites. As would be expected, metabolites with high individual metabolite connectivity also tended to be those with the highest metabolite coupling, as the most connected metabolites couple more often. For metabolite pairs that are not highly coupled, we show that the number of reactions a pair of metabolites shares across a metabolic network closely approximates a line on a log-log scale. We also show that the preferential coupling of two metabolites with each other is spread across the spectrum of metabolites and is not unique to the most connected metabolites. We provide a measure for determining which metabolite pairs couple more often than would be expected based on their individual connectivity in the network and show that these metabolites often derive their principal biological functions from existing in pairs. Thus, analysis of metabolite coupling provides information beyond that which is found from studying the individual connectivity of individual
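    The coupling matrix described above can be computed directly from a stoichiometric matrix. A short NumPy sketch, assuming rows are metabolites and columns are reactions; the three-reaction network is a made-up toy, not one of the studied networks. Diagonal entries count the reactions each metabolite takes part in (its connectivity), and off-diagonal entries count the reactions a pair shares.

```python
import numpy as np


def metabolite_coupling(S):
    """Return S_hat @ S_hat.T, where S_hat is the binary (nonzero) pattern of S.

    Rows of S are metabolites and columns are reactions. Entry (i, j) of the
    result counts the reactions in which metabolites i and j both participate;
    the diagonal gives each metabolite's connectivity.
    """
    S_hat = (np.asarray(S) != 0).astype(int)
    return S_hat @ S_hat.T


# Toy network (hypothetical): R1: A -> B, R2: A + B -> C, R3: C -> (export)
S = [
    [-1, -1, 0],  # A
    [1, -1, 0],   # B
    [0, 1, -1],   # C
]
coupling = metabolite_coupling(S)
```

    Binarizing first means a coefficient of 2 counts the same as 1, which matches the definition of coupling as co-occurrence rather than stoichiometric weight.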

  4. CFMDS: CUDA-based fast multidimensional scaling for genome-scale data.

    Science.gov (United States)

    Park, Sungin; Shin, Soo-Yong; Hwang, Kyu-Baek

    2012-01-01

    Multidimensional scaling (MDS) is a widely used approach to dimensionality reduction. It has been applied to feature selection and visualization in various areas. Among diverse MDS methods, classical MDS is a simple and theoretically sound solution for projecting data objects onto a low-dimensional space while preserving the original distances among them as much as possible. However, it is not trivial to apply it to genome-scale data (e.g., microarray gene expression profiles) on regular desktop computers, because of its high computational complexity. We implemented a highly efficient software application, called CFMDS (CUDA-based Fast MultiDimensional Scaling), which produces an approximate solution of classical MDS based on CUDA (compute unified device architecture) and the divide-and-conquer principle. CUDA is a parallel computing architecture exploiting the power of the GPU (graphics processing unit). The divide-and-conquer principle was adopted to circumvent the limited memory of typical graphics cards. Our application software has been tested on various benchmark datasets including microarrays and compared with classical MDS algorithms implemented in C# and MATLAB. In our experiments, CFMDS was more than a hundred times faster than these general solutions on large data. Regarding the quality of dimensionality reduction, our approximate solutions were as good as those from the general solutions: the Pearson correlation coefficients between them were larger than 0.9. CFMDS is an expeditious solution for the data dimensionality reduction problem. It is especially useful for processing genome-scale data consisting of several thousand objects within several minutes.
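    For reference, the classical MDS that CFMDS approximates can be written in a few lines: double-centre the squared distance matrix and keep the leading eigenvectors. The NumPy sketch below is the exact CPU version, not CFMDS's CUDA divide-and-conquer scheme; its dense eigendecomposition costs O(n^3) time and O(n^2) memory, which is the bottleneck that motivates GPU and divide-and-conquer approaches for genome-scale n.

```python
import numpy as np


def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:dims]      # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # drop small negative noise
    return eigvecs[:, order] * scale


# Usage: recover a right triangle (up to rotation/reflection) from its distances.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
X = classical_mds(D)
```

    Because the input distances are exactly Euclidean in two dimensions, the recovered coordinates reproduce them up to rounding; with non-Euclidean distances the result is only an approximation.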

  5. Genome sequence and genetic diversity of the common carp, Cyprinus carpio.

    Science.gov (United States)

    Xu, Peng; Zhang, Xiaofeng; Wang, Xumin; Li, Jiongtang; Liu, Guiming; Kuang, Youyi; Xu, Jian; Zheng, Xianhu; Ren, Lufeng; Wang, Guoliang; Zhang, Yan; Huo, Linhe; Zhao, Zixia; Cao, Dingchen; Lu, Cuiyun; Li, Chao; Zhou, Yi; Liu, Zhanjiang; Fan, Zhonghua; Shan, Guangle; Li, Xingang; Wu, Shuangxiu; Song, Lipu; Hou, Guangyuan; Jiang, Yanliang; Jeney, Zsigmond; Yu, Dan; Wang, Li; Shao, Changjun; Song, Lai; Sun, Jing; Ji, Peifeng; Wang, Jian; Li, Qiang; Xu, Liming; Sun, Fanyue; Feng, Jianxin; Wang, Chenghui; Wang, Shaolin; Wang, Baosen; Li, Yan; Zhu, Yaping; Xue, Wei; Zhao, Lan; Wang, Jintu; Gu, Ying; Lv, Weihua; Wu, Kejing; Xiao, Jingfa; Wu, Jiayan; Zhang, Zhang; Yu, Jun; Sun, Xiaowen

    2014-11-01

    The common carp, Cyprinus carpio, is one of the most important cyprinid species and globally accounts for 10% of freshwater aquaculture production. Here we present a draft genome of domesticated C. carpio (strain Songpu), whose current assembly contains 52,610 protein-coding genes and approximately 92.3% coverage of its paleotetraploidized genome (2n = 100). The latest round of whole-genome duplication has been estimated to have occurred approximately 8.2 million years ago. Genome resequencing of 33 representative individuals from worldwide populations demonstrates a single origin for C. carpio in 2 subspecies (C. carpio haematopterus and C. carpio carpio). Integrative genomic and transcriptomic analyses were used to identify loci potentially associated with traits including scaling patterns and skin color. In combination with the high-resolution genetic map, the draft genome paves the way for better molecular studies and improved genome-assisted breeding of C. carpio and other closely related species.

  6. Development and Validation of a Genomic Knowledge Scale to Advance Informed Decision-Making Research in Genomic Sequencing

    Directory of Open Access Journals (Sweden)

    Michelle M. Langer PhD

    2017-02-01

    Full Text Available Background: This study evaluated the psychometric properties of a new, comprehensive measure of knowledge about genomic sequencing, the University of North Carolina Genomic Knowledge Scale (UNC-GKS). Methods: The UNC-GKS assesses knowledge in four domains thought to be critical for informed decision making about genomic sequencing. The scale was validated using classical test theory and item response theory in 286 adult patients and 132 parents of pediatric patients undergoing diagnostic whole exome sequencing (WES) in the NCGENES study. Results: The UNC-GKS assessed a single underlying construct (genomic knowledge) with good internal reliability (Cronbach’s α = 0.90). Scores were most informative (able to discriminate between individuals with different levels of genomic knowledge) at one standard deviation above the scale mean or lower, a range that included most participants. Convergent validity was supported by associations with health literacy and numeracy (rs = 0.41–0.46). The scale functioned well across subgroups differing in sex, race/ethnicity, education, and English proficiency. Discussion: Findings supported the promise of the UNC-GKS as a valid and reliable measure of genomic knowledge among people facing complex decisions about WES and comparable sequencing methods. It is neither disease- nor population-specific, and it functioned well across important subgroups, making it usable in diverse populations.
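    The internal reliability figure quoted above (Cronbach's α = 0.90) follows the standard formula α = k/(k - 1) · (1 - Σσᵢ² / σ_total²) for k items, where σᵢ² is the variance of item i and σ_total² the variance of respondents' total scores. A small Python sketch using sample variances; the response matrix is invented for illustration and has nothing to do with the NCGENES data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of respondents x columns of item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_var = sum(variance(column) for column in zip(*scores))
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


# Invented responses: 4 respondents x 3 items scored 0/1 (incorrect/correct).
responses = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
alpha = cronbach_alpha(responses)
```

    For these invented responses α works out to 0.75; items that rise and fall together push α toward 1, because the total-score variance then dominates the summed item variances.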

  7. Genome-scale genetic engineering in Escherichia coli.

    Science.gov (United States)

    Jeong, Jaehwan; Cho, Namjin; Jung, Daehee; Bang, Duhee

    2013-11-01

    Genome engineering has been developed to create useful strains for biological studies and industrial uses. However, a persistent challenge has remained in the field: technical limitations in high-throughput screening and precise manipulation of strains. Today, technical improvements have made genome engineering more rapid and efficient. This review introduces recent advances in genome engineering technologies applied to Escherichia coli, as well as multiplex automated genome engineering (MAGE), a recent technique proposed as a powerful toolkit due to its straightforward process, rapid experimental procedures, and highly efficient properties. Copyright © 2013 Elsevier Inc. All rights reserved.

  8. Analysis of Aspergillus nidulans metabolism at the genome-scale

    DEFF Research Database (Denmark)

    David, Helga; Ozcelik, İlknur Ş; Hofmann, Gerald

    2008-01-01

    biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which less than 10% were assigned...

  9. Large-scale genomic 2D visualization reveals extensive CG-AT skew correlation in bird genomes

    Directory of Open Access Journals (Sweden)

    Deng Xuemei

    2007-11-01

    Full Text Available Abstract Background Bird genomes have very different compositional structure compared with other warm-blooded animals. The variation in the base skew rules in the vertebrate genomes remains puzzling, but it must relate somehow to large-scale genome evolution. Current research is inclined to relate base skew with mutations and their fixation. Here we wish to explore base skew correlations in bird genomes, to develop methods for displaying and quantifying such correlations at different scales, and to discuss possible explanations for the peculiarities of the bird genomes in skew correlation. Results We have developed a method called Base Skew Double Triangle (BSDT) for exhibiting the genome-scale change of AT/CG skew as a two-dimensional square picture, showing base skews at many scales simultaneously in a single image. By this method we found that most chicken chromosomes have high AT/CG skew correlation (symmetry in the 2D picture), except for some microchromosomes. No other organisms studied (18 species) show such high skew correlations. This visualized high correlation was validated by three kinds of quantitative calculations with overlapping and non-overlapping windows, all indicating that chicken and birds in general have a special genome structure. Similar features were also found in some of the mammal genomes, but clearly much weaker than in chickens. We presume that the skew correlation feature evolved near the time that birds separated from other vertebrate lineages. When we eliminated the repeat sequences from the genomes, the correlation between AT and CG skews increased for some mammal genomes, but was still clearly lower than in chickens. Conclusion Our results suggest that BSDT is an expressive visualization method for AT and CG skew and enabled the discovery of the very high skew correlation in bird genomes; this peculiarity is worth further study. Computational analysis indicated that this correlation might be a compositional characteristic
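    As a simplified, single-scale illustration of the quantitative check mentioned above (skew correlation over non-overlapping windows), the sketch computes AT skew (A - T)/(A + T) and CG skew (C - G)/(C + G) per window and their Pearson correlation. It does not reproduce the BSDT image, which shows many scales at once; the sign convention for each skew is an assumption, and flipping one of them flips the sign of r. The toy sequence is hypothetical.

```python
def skew(x, y):
    """(x - y) / (x + y), defined as 0 when neither base occurs."""
    return (x - y) / (x + y) if x + y else 0.0


def windowed_skews(seq, window):
    """AT and CG skew in consecutive non-overlapping windows."""
    seq = seq.upper()
    at, cg = [], []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        at.append(skew(w.count("A"), w.count("T")))
        cg.append(skew(w.count("C"), w.count("G")))
    return at, cg


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


at, cg = windowed_skews("AACCTTGGAACG", window=4)  # toy sequence
r = pearson(at, cg)
```

    Repeating the calculation over a range of window sizes would give a rough multi-scale view; the trailing partial window is simply ignored here.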

  10. Incorporating Protein Biosynthesis into the Saccharomyces cerevisiae Genome-scale Metabolic Model

    DEFF Research Database (Denmark)

    Olivares Hernandez, Roberto

    Based on stoichiometric biochemical equations that occur in the cell, genome-scale metabolic models can quantify the metabolic fluxes, which are regarded as the final representation of the physiological state of the cell. For Saccharomyces cerevisiae the genome-scale model has been......, translation initiation, translation elongation, translation termination, and mRNA decay. Considering this information from the mechanisms of transcription and translation, we will include these stoichiometric reactions in the genome-scale model for S. cerevisiae to obtain the first...

  11. Guidelines for Genome-Scale Analysis of Biological Rhythms

    NARCIS (Netherlands)

    Hughes, Michael E.; Abruzzi, Katherine C.; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M. Fernanda; Chen, Zheng; Chiu, Joanna C.; Cox, Juergen; Crowell, Alexander M.; Debruyne, Jason P.; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J.; Duffield, Giles E.; Dunlap, Jay C.; Eckel-Mahan, Kristin; Esser, Karyn A.; FitzGerald, Garret A.; Forger, Daniel B.; Francey, Lauren J.; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S.; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H.; Herzel, Hanspeter; Herzog, Erik D.; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J.; Hurley, Jennifer M.; de la Iglesia, Horacio O.; Johnson, Carl; Kay, Steve A.; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A.; Li, Jiajia; Li, Xiaodong; Liu, Andrew C.; Loros, Jennifer J.; Martino, Tami A.; Menet, Jerome S.; Merrow, Martha; Millar, Andrew J.; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N.; Olmedo, Maria; Nusinow, Dmitri A.; Ptáček, Louis J.; Rand, David; Reddy, Akhilesh B.; Robles, Maria S.; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D.; Rund, Samuel S. C.; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J.; Storch, Kai-Florian; Takahashi, Joseph S.; Ueda, Hiroki R.; Wang, Han; Weitz, Charles; Westermark, Pål O.; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B.

    2017-01-01

    Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future

  12. Mapping copy number variation by population-scale genome sequencing

    DEFF Research Database (Denmark)

    Mills, Ryan E.; Walter, Klaudia; Stewart, Chip

    2011-01-01

    , copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications...... differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies....

  13. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models

    DEFF Research Database (Denmark)

    King, Zachary A.; Lu, Justin; Dräger, Andreas

    2016-01-01

    Genome-scale metabolic models are mathematically-structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized...... redesigned Biochemical, Genetic and Genomic knowledge base. BiGG Models contains more than 75 high-quality, manually-curated genome-scale metabolic models. On the website, users can browse, search and visualize models. BiGG Models connects genome-scale models to genome annotations and external databases....... Reaction and metabolite identifiers have been standardized across models to conform to community standards and enable rapid comparison across models. Furthermore, BiGG Models provides a comprehensive application programming interface for accessing BiGG Models with modeling and analysis tools. As a resource...

  14. GIGGLE: a search engine for large-scale integrated genome analysis.

    Science.gov (United States)

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-01-08

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
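    GIGGLE's index and significance statistics are its own; the sketch below only illustrates the underlying question of how many query intervals each interval file overlaps. It assumes half-open [start, end) intervals on a single chromosome, keeps start coordinates sorted so a binary search bounded by the longest interval narrows the candidates, and ranks files by raw overlap counts rather than any significance score. The class, function and file names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class IntervalIndex:
    """Sorted half-open [start, end) intervals on one chromosome."""

    def __init__(self, intervals):
        self.intervals = sorted(intervals)
        self.starts = [s for s, _ in self.intervals]
        self.max_len = max((e - s for s, e in self.intervals), default=0)

    def count_overlaps(self, qstart, qend):
        """Count intervals with start < qend and end > qstart."""
        # Any overlapping interval must start after qstart - max_len and before qend.
        lo = bisect_right(self.starts, qstart - self.max_len)
        hi = bisect_left(self.starts, qend)
        return sum(1 for _, e in self.intervals[lo:hi] if e > qstart)


def rank_files(query, files):
    """Rank files by how many query intervals hit at least one of their intervals."""
    scores = {}
    for name, intervals in files.items():
        index = IntervalIndex(intervals)
        scores[name] = sum(1 for qs, qe in query if index.count_overlaps(qs, qe) > 0)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical interval files and query regions.
files = {"a.bed": [(0, 10)], "b.bed": [(5, 15), (20, 30)]}
ranking = rank_files([(8, 12), (25, 26)], files)
```

    The max-length bound keeps each lookup cheap when intervals are short, but a single very long interval widens every search window; tree- or bin-based indexes avoid that weakness.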

  15. Environmental versatility promotes modularity in genome-scale metabolic networks.

    Science.gov (United States)

    Samal, Areejit; Wagner, Andreas; Martin, Olivier C

    2011-08-24

    The ubiquity of modules in biological networks may result from an evolutionary benefit of a modular organization. For instance, modularity may increase the rate of adaptive evolution, because modules can be easily combined into new arrangements that may benefit their carrier. Conversely, modularity may emerge as a by-product of some trait. We here ask whether this last scenario may play a role in genome-scale metabolic networks that need to sustain life in one or more chemical environments. For such networks, we define a network module as a maximal set of reactions that are fully coupled, i.e., whose fluxes can only vary in fixed proportions. This definition overcomes limitations of purely graph-based analyses of metabolism by exploiting the functional links between reactions. We call a metabolic network viable in a given chemical environment if it can synthesize all of an organism's biomass compounds from nutrients in this environment. An organism's metabolism is highly versatile if it can sustain life in many different chemical environments. We here ask whether versatility affects the modularity of metabolic networks. Using recently developed techniques to randomly sample large numbers of viable metabolic networks from a vast space of metabolic networks, we use flux balance analysis to study in silico metabolic networks that differ in their versatility. We find that highly versatile networks are also highly modular. They contain more modules and more reactions that are organized into modules. Most or all reactions in a module are associated with the same biochemical pathways. Modules that arise in highly versatile networks generally involve reactions that process nutrients or closely related chemicals. We also observe that the metabolism of E. coli is significantly more modular than even our most versatile networks. Our work shows that modularity in metabolic networks can be a by-product of functional constraints, e.g., the need to sustain life in multiple
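    The module definition above (reactions whose fluxes vary only in fixed proportions) can be checked heuristically on a set of steady-state flux vectors. The sketch below assumes such flux samples are already available, for example from flux sampling of one network, and groups reactions whose flux ratio stays constant across every sample. Sampling can only rule coupling out, not prove it; exact flux coupling analysis solves linear programs instead. The function name, tolerance and sample values are hypothetical.

```python
def fully_coupled_modules(samples, tol=1e-9):
    """Group reactions whose fluxes keep one fixed nonzero ratio in every sample.

    `samples` is a list of steady-state flux vectors (one value per reaction).
    Only groups with two or more reactions are returned as modules.
    """
    n = len(samples[0])
    active = [j for j in range(n) if any(abs(v[j]) > tol for v in samples)]
    assigned, modules = set(), []
    for i in active:
        if i in assigned:
            continue
        ref = next(v for v in samples if abs(v[i]) > tol)  # a sample where i carries flux
        group = [i]
        for j in active:
            if j <= i or j in assigned:
                continue
            ratio = ref[j] / ref[i]
            if abs(ratio) > tol and all(abs(v[j] - ratio * v[i]) <= tol for v in samples):
                group.append(j)
        assigned.update(group)
        if len(group) > 1:
            modules.append(group)
    return modules


# Hypothetical flux samples for four reactions: reaction 1 always carries
# twice the flux of reaction 0, while reactions 2 and 3 vary independently.
samples = [(1, 2, 0, 1), (2, 4, 1, 3), (3, 6, 0, 3)]
modules = fully_coupled_modules(samples)
```

    Full coupling is an equivalence relation, which is why each reaction can be compared against a single representative of its group rather than against every member.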

  16. Environmental versatility promotes modularity in genome-scale metabolic networks

    Science.gov (United States)

    2011-01-01

    Background The ubiquity of modules in biological networks may result from an evolutionary benefit of a modular organization. For instance, modularity may increase the rate of adaptive evolution, because modules can be easily combined into new arrangements that may benefit their carrier. Conversely, modularity may emerge as a by-product of some trait. We here ask whether this last scenario may play a role in genome-scale metabolic networks that need to sustain life in one or more chemical environments. For such networks, we define a network module as a maximal set of reactions that are fully coupled, i.e., whose fluxes can only vary in fixed proportions. This definition overcomes limitations of purely graph-based analyses of metabolism by exploiting the functional links between reactions. We call a metabolic network viable in a given chemical environment if it can synthesize all of an organism's biomass compounds from nutrients in this environment. An organism's metabolism is highly versatile if it can sustain life in many different chemical environments. We here ask whether versatility affects the modularity of metabolic networks. Results Using recently developed techniques to randomly sample large numbers of viable metabolic networks from a vast space of metabolic networks, we use flux balance analysis to study in silico metabolic networks that differ in their versatility. We find that highly versatile networks are also highly modular. They contain more modules and more reactions that are organized into modules. Most or all reactions in a module are associated with the same biochemical pathways. Modules that arise in highly versatile networks generally involve reactions that process nutrients or closely related chemicals. We also observe that the metabolism of E. coli is significantly more modular than even our most versatile networks.
Conclusions Our work shows that modularity in metabolic networks can be a by-product of functional constraints, e.g., the need to

  17. Environmental versatility promotes modularity in genome-scale metabolic networks

    Directory of Open Access Journals (Sweden)

    Wagner Andreas

    2011-08-01

    Full Text Available Abstract Background The ubiquity of modules in biological networks may result from an evolutionary benefit of a modular organization. For instance, modularity may increase the rate of adaptive evolution, because modules can be easily combined into new arrangements that may benefit their carrier. Conversely, modularity may emerge as a by-product of some trait. We here ask whether this last scenario may play a role in genome-scale metabolic networks that need to sustain life in one or more chemical environments. For such networks, we define a network module as a maximal set of reactions that are fully coupled, i.e., whose fluxes can only vary in fixed proportions. This definition overcomes limitations of purely graph-based analyses of metabolism by exploiting the functional links between reactions. We call a metabolic network viable in a given chemical environment if it can synthesize all of an organism's biomass compounds from nutrients in this environment. An organism's metabolism is highly versatile if it can sustain life in many different chemical environments. We here ask whether versatility affects the modularity of metabolic networks. Results Using recently developed techniques to randomly sample large numbers of viable metabolic networks from a vast space of metabolic networks, we use flux balance analysis to study in silico metabolic networks that differ in their versatility. We find that highly versatile networks are also highly modular. They contain more modules and more reactions that are organized into modules. Most or all reactions in a module are associated with the same biochemical pathways. Modules that arise in highly versatile networks generally involve reactions that process nutrients or closely related chemicals. We also observe that the metabolism of E. coli is significantly more modular than even our most versatile networks. Conclusions Our work shows that modularity in metabolic networks can be a by-product of functional

  18. Identifying anti-growth factors for human cancer cell lines through genome-scale metabolic modeling

    DEFF Research Database (Denmark)

    Ghaffari, Pouyan; Mardinoglu, Adil; Asplund, Anna

    2015-01-01

    Human cancer cell lines are used as important model systems to study molecular mechanisms associated with tumor growth, including how genomic and biological heterogeneity found in primary tumors affects cellular phenotypes. We reconstructed genome-scale metabolic models (GEMs) for eleven cell line...

  19. Structural genomics of eukaryotic targets at a laboratory scale.

    Science.gov (United States)

    Busso, Didier; Poussin-Courmontagne, Pierre; Rosé, David; Ripp, Raymond; Litt, Alain; Thierry, Jean-Claude; Moras, Dino

    2005-01-01

    Structural genomics programs are distributed worldwide and funded by large institutions such as the NIH in the United States, RIKEN in Japan or the European Commission through the SPINE network in Europe. Such initiatives, essentially managed by large consortia, led to technology and method developments at the different steps required to produce biological samples compatible with structural studies. Besides specific applications, method developments relied mainly on miniaturization and parallelization. The challenge that academic laboratories face in pursuing structural genomics programs is to produce protein samples at a higher rate. The Structural Biology and Genomics Department (IGBMC, Illkirch, France) is involved in a structural genomics program on higher eukaryotes whose goal is solving crystal structures of proteins and their complexes (including large complexes) related to human health and biotechnology. To achieve such a challenging goal, the Department has established a medium-throughput pipeline for producing protein samples suitable for structural biology studies. Here, we describe the setting up of our initiative from cloning to crystallization, and we demonstrate that structural genomics may be manageable by academic laboratories through strategic investments in robotics and by adapting classical bench protocols and new developments, in particular in the field of protein expression, to parallelization.

  20. PUGS: A novel scale to assess perceptions of uncertainties in genome sequencing

    OpenAIRE

    Biesecker, Barbara B.; Woolford, Samuel W.; Klein, William MP; Brothers, Kyle B.; Umstead, Kendall L.; Lewis, Katie L.; Biesecker, Leslie G.; Han, Paul KJ

    2017-01-01

    Expectations of results from genome sequencing by end users are influenced by perceptions of uncertainty. This study aimed to assess uncertainties about sequencing by developing, evaluating, and implementing a novel scale. The Perceptions of Uncertainties in Genome Sequencing (PUGS) scale comprised ten items to assess uncertainties within three domains: clinical, affective, and evaluative. Participants (n=535) from the ClinSeq® NIH sequencing study completed a baseline survey that included th...

  1. Visualizing metabolic activity on a genome-wide scale

    NARCIS (Netherlands)

    Luyf, A. C. M.; de Gast, J.; van Kampen, A. H. C.

    2002-01-01

    Motivation: To enhance the exploration of gene expression data in a metabolic context, one requires an application that allows the integration of this data and which represents this data in a (genome-wide) metabolic map. The layout of this metabolic map must be highly flexible to enable discoveries

  2. Mapping copy number variation by population-scale genome sequencing

    DEFF Research Database (Denmark)

    Mills, Ryan E.; Walter, Klaudia; Stewart, Chip

    2011-01-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, ...

  3. Pore-scale simulation of microbial growth using a genome-scale metabolic model: Implications for Darcy-scale reactive transport

    Science.gov (United States)

    Tartakovsky, G. D.; Tartakovsky, A. M.; Scheibe, T. D.; Fang, Y.; Mahadevan, R.; Lovley, D. R.

    2013-09-01

    Recent advances in microbiology have enabled the quantitative simulation of microbial metabolism and growth based on genome-scale characterization of metabolic pathways and fluxes. We have incorporated a genome-scale metabolic model of the iron-reducing bacterium Geobacter sulfurreducens into a pore-scale simulation of microbial growth based on coupling of iron reduction to oxidation of a soluble electron donor (acetate). In our model, fluid flow and solute transport are governed by a combination of the Navier-Stokes and advection-diffusion-reaction equations. Microbial growth occurs only on the surface of soil grains where solid-phase mineral iron oxides are available. Mass fluxes of chemical species associated with microbial growth are described by the genome-scale microbial model, implemented using a constraint-based metabolic model, and provide the Robin-type boundary condition for the advection-diffusion equation at soil grain surfaces. Conventional models of microbially-mediated subsurface reactions use a lumped reaction model that does not consider individual microbial reaction pathways, and describe reaction rates using empirically-derived rate formulations such as the Monod-type kinetics. We have used our pore-scale model to explore the relationship between genome-scale metabolic models and Monod-type formulations, and to assess the manifestation of pore-scale variability (microenvironments) in terms of apparent Darcy-scale microbial reaction rates. The genome-scale model predicted lower biomass yield, and different stoichiometry for iron consumption, in comparison to prior Monod formulations based on energetics considerations. We were able to fit an equivalent Monod model, by modifying the reaction stoichiometry and biomass yield coefficient, that could effectively match results of the genome-scale simulation of microbial behaviors under excess nutrient conditions, but predictions of the fitted Monod model deviated from those of the genome-scale model
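    For comparison with the genome-scale model, Monod-type kinetics express the specific growth rate as μ = μ_max S/(K_s + S), with substrate consumed at μX/Y for biomass X and yield coefficient Y. A minimal single-substrate sketch with explicit Euler steps, using arbitrary illustrative parameters; the prior formulations referred to in the abstract may include dual-substrate (electron donor and acceptor) terms, which this sketch omits, and the function names are hypothetical. A lower Y, as the genome-scale model predicted, means more substrate consumed per unit of biomass formed.

```python
def monod_rates(S, X, mu_max, K_s, Y):
    """Specific growth rate, biomass growth rate and substrate uptake rate."""
    mu = mu_max * S / (K_s + S)
    growth = mu * X
    uptake = growth / Y
    return mu, growth, uptake


def simulate_batch(S0, X0, mu_max, K_s, Y, dt=0.01, steps=1000):
    """Explicit Euler integration of a well-mixed batch with one limiting substrate."""
    S, X = S0, X0
    for _ in range(steps):
        _, _, uptake = monod_rates(S, X, mu_max, K_s, Y)
        uptake = min(uptake, S / dt)  # never consume more substrate than remains
        S -= uptake * dt
        X += Y * uptake * dt  # biomass formed = yield x substrate consumed
    return S, X


# Illustrative parameters only (arbitrary units).
mu, growth, uptake = monod_rates(S=1.0, X=2.0, mu_max=0.4, K_s=1.0, Y=0.5)
S_end, X_end = simulate_batch(S0=10.0, X0=0.1, mu_max=0.5, K_s=1.0, Y=0.4)
```

    At S = K_s the specific growth rate is half of μ_max, and because biomass is credited as Y times the substrate consumed, the batch run conserves X - X0 = Y(S0 - S) at every step.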

  4. Pore-scale simulation of microbial growth using a genome-scale metabolic model: Implications for Darcy-scale reactive transport

    Energy Technology Data Exchange (ETDEWEB)

    Tartakovsky, Guzel D.; Tartakovsky, Alexandre M.; Scheibe, Timothy D.; Fang, Yilin; Mahadevan, Radhakrishnan; Lovley, Derek R.

    2013-09-07

    Recent advances in microbiology have enabled the quantitative simulation of microbial metabolism and growth based on genome-scale characterization of metabolic pathways and fluxes. We have incorporated a genome-scale metabolic model of the iron-reducing bacterium Geobacter sulfurreducens into a pore-scale simulation of microbial growth based on coupling of iron reduction to oxidation of a soluble electron donor (acetate). In our model, fluid flow and solute transport are governed by a combination of the Navier-Stokes and advection-diffusion-reaction equations. Microbial growth occurs only on the surface of soil grains where solid-phase mineral iron oxides are available. Mass fluxes of chemical species associated with microbial growth are described by the genome-scale microbial model, implemented using a constraint-based metabolic model, and provide the Robin-type boundary condition for the advection-diffusion equation at soil grain surfaces. Conventional models of microbially-mediated subsurface reactions use a lumped reaction model that does not consider individual microbial reaction pathways, and describe reaction rates using empirically-derived rate formulations such as Monod-type kinetics. We have used our pore-scale model to explore the relationship between genome-scale metabolic models and Monod-type formulations, and to assess the manifestation of pore-scale variability (microenvironments) in terms of apparent Darcy-scale microbial reaction rates. The genome-scale model predicted lower biomass yield, and different stoichiometry for iron consumption, in comparison to prior Monod formulations based on energetics considerations.
We were able to fit an equivalent Monod model, by modifying the reaction stoichiometry and biomass yield coefficient, that could effectively match results of the genome-scale simulation of microbial behaviors under excess nutrient conditions, but predictions of the fitted Monod model deviated from those of the genome-scale model under
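    The Monod-type kinetics that this abstract contrasts with the genome-scale model can be sketched as below. This is a minimal, textbook dual-Monod form, assumed here for illustration and not the authors' fitted model: the donor (acetate) uptake rate saturates in both the donor and the electron acceptor, biomass grows with a yield coefficient Y, and acceptor (iron) consumption scales with a stoichiometric coefficient nu. Y and nu are the two quantities the abstract says were adjusted to fit the genome-scale results. All function and parameter names are hypothetical.

```python
def dual_monod_rate(q_max, donor, k_donor, acceptor, k_acceptor, biomass):
    """Donor uptake rate r = q_max * X * S_d / (K_d + S_d) * S_a / (K_a + S_a)."""
    donor_term = donor / (k_donor + donor)
    acceptor_term = acceptor / (k_acceptor + acceptor)
    return q_max * biomass * donor_term * acceptor_term


def net_growth_rate(yield_coeff, uptake_rate, decay_coeff, biomass):
    """Net biomass change dX/dt = Y * r - b * X."""
    return yield_coeff * uptake_rate - decay_coeff * biomass


def acceptor_consumption_rate(stoich, uptake_rate):
    """Electron acceptor (iron) consumed per unit time: nu * r."""
    return stoich * uptake_rate


# Illustrative, dimensionless parameter values (not taken from the paper).
r = dual_monod_rate(q_max=1.0, donor=1.0, k_donor=1.0,
                    acceptor=1.0, k_acceptor=1.0, biomass=1.0)
print(r)  # 0.25: each saturation term is 1/2 when S equals its half-saturation constant
print(net_growth_rate(0.5, r, 0.0, 1.0))  # 0.125
```

    When both concentrations greatly exceed their half-saturation constants, each saturation term approaches 1 and r approaches q_max * X, so under excess nutrients the predicted growth and iron consumption are governed mainly by Y and nu.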

  5. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique.

    Science.gov (United States)

    Chechetkin, V R; Lobzin, V V

    2017-08-07

    The use of state-of-the-art techniques that combine imaging methods with high-throughput genomic mapping tools has led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on chromosome folding and large-scale genome organization. Could part of the information on chromosome folding be obtained directly from the underlying genomic DNA sequences abundantly stored in databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves to detect large-scale genome regularities associated with domains/units at different levels of hierarchical chromosome folding. The method is versatile and can be applied both to genomic DNA sequences and to corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to replication and transcription and can also be used to assess temperature or supercoiling effects on chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of the further abilities of DDFT, a study of large-scale genome organization for bacteriophage phiX174 and the bacterium Caulobacter crescentus is also included. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge of chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Resequencing of Treponema pallidum ssp. pallidum strains Nichols and SS14: correction of sequencing errors resulted in increased separation of syphilis treponeme subclusters.

    Directory of Open Access Journals (Sweden)

    Helena Pětrošová

    Full Text Available BACKGROUND: Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, is a highly clonal bacterium showing minimal genetic variability in the genome sequence of individual strains. Nevertheless, genetically characterized syphilis strains can be clearly divided into two groups, Nichols-like strains and SS14-like strains. TPA Nichols and SS14 strains were completely sequenced in 1998 and 2008, respectively. Since publication of their complete genome sequences, a number of sequencing errors in each genome have been reported. Therefore, we have resequenced TPA Nichols and SS14 strains using next-generation sequencing techniques. METHODOLOGY/PRINCIPAL FINDINGS: The genomes of TPA strains Nichols and SS14 were resequenced using the 454 and Illumina sequencing methods, with a combined average coverage higher than 90x. In the TPA strain Nichols genome, 134 errors were identified (25 substitutions and 109 indels), and 102 of them affected protein sequences. In the TPA SS14 genome, a total of 191 errors were identified (85 substitutions and 106 indels), and 136 of them affected protein sequences. New intrastrain heterogenic regions in the TPA SS14 genome were identified, including the tprD gene, where both tprD and tprD2 alleles were found. The resequenced genomes of both TPA Nichols and SS14 strains clustered more closely with related strains (i.e., strains belonging to the same syphilis treponeme subcluster). At the same time, groups of Nichols-like and SS14-like strains were found to be more distantly related. CONCLUSION/SIGNIFICANCE: We identified errors in 11.5% of all annotated genes and, after correction, we found a significant impact on the predicted proteomes of both Nichols and SS14 strains. Corrections of these errors resulted in protein elongations, truncations, fusions and indels in more than 11% of all annotated proteins.
Moreover, it became more evident that syphilis is caused by treponemes belonging to two separate genetic

  7. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system.

    Science.gov (United States)

    Speth, Daan R; In 't Zandt, Michiel H; Guerrero-Cruz, Simon; Dutilh, Bas E; Jetten, Mike S M

    2016-03-31

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.

  8. Accomplishments in genome-scale in silico modeling for industrial and medical biotechnology.

    Science.gov (United States)

    Milne, Caroline B; Kim, Pan-Jun; Eddy, James A; Price, Nathan D

    2009-12-01

    Driven by advancements in high-throughput biological technologies and the growing number of sequenced genomes, the construction of in silico models at the genome scale has provided powerful tools to investigate a vast array of biological systems and applications. Here, we comprehensively review the uses of such models in industrial and medical biotechnology, including biofuel generation, food production, and drug development. While the use of in silico models in industry is still at an early stage, significant initial successes have been achieved. For the cases presented here, genome-scale models predict engineering strategies to enhance properties of interest in an organism or to inhibit harmful mechanisms of pathogens. Going forward, genome-scale in silico models promise to extend their application and analysis scope to become a transformative tool in biotechnology.

  9. Scaling up genome annotation using MAKER and work queue.

    Science.gov (United States)

    Thrasher, Andrew; Musgrave, Zachary; Kachmarck, Brian; Thain, Douglas; Emrich, Scott

    2014-01-01

    Next-generation sequencing technologies have enabled the sequencing of many genomes. Because of the overall increasing demand and the inherent parallelism available in many required analyses, these bioinformatics applications should ideally run on clusters, clouds and/or grids. We present a modified annotation framework that achieves a speed-up of 45x using 50 workers on a Caenorhabditis japonica test case. We also evaluate these modifications within the Amazon EC2 cloud framework. The underlying genome annotation tool (MAKER) is parallelised as an MPI application. Our framework enables it to run without MPI while utilising a wide variety of distributed computing resources. This parallel framework also allows easy explicit data transfer, which helps overcome a major limitation of bioinformatics tools that often rely on shared file systems. Taken together, these features mean that our proposed framework can be used, even during early stages of development, to easily run sequence analysis tools on clusters, grids and clouds.
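    The reported scaling (45x on 50 workers) can be put in perspective with a little arithmetic. The sketch below computes parallel efficiency and, assuming Amdahl's law (fixed total work and a serial fraction f that does not parallelize), back-calculates the f implied by those two numbers. Amdahl's law is an assumption of this sketch, not a model used in the paper, and in a distributed setting f would also absorb data-transfer and scheduling overheads. Function names are hypothetical.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup, workers):
    """Invert Amdahl's law S = 1 / (f + (1 - f) / N) to get the serial fraction f."""
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


def amdahl_speedup(serial_fraction, workers):
    """Predicted speedup on N workers for a given serial fraction f."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


f = amdahl_serial_fraction(45, 50)
print(f"efficiency = {parallel_efficiency(45, 50):.2f}")  # efficiency = 0.90
print(f"serial fraction = {f:.4f}")                        # serial fraction = 0.0023
print(f"projected speedup on 100 workers = {amdahl_speedup(f, 100):.1f}")  # 81.7
```

    Under these assumptions, 45x on 50 workers implies f = 1/441 (about 0.23% of the runtime) and a projected speedup of roughly 82x on 100 workers; actual scaling on a cluster or on EC2 may differ.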

  10. Direct-to-consumer genomics on the scales of autonomy

    Science.gov (United States)

    Vayena, Effy

    2015-01-01

    Direct-to-consumer (DTC) genetic services have generated enormous controversy from their first emergence. A dramatic recent manifestation of this is the Food and Drug Administration's (FDA) cease and desist order against 23andMe, the leading provider in the market. Critics have argued for the restrictive regulation of such services, and even their prohibition, on the grounds of the harm they pose to consumers. Their advocates, by contrast, defend them as a means of enhancing the autonomy of those same consumers. Autonomy emerges as a key battle-field in this debate, because many of the ‘harm’ arguments can be interpreted as identifying threats to autonomy. This paper assesses whether DTC genomic services are a threat to, or instead, an enhancement of, personal autonomy. It deploys Joseph Raz's account of personal autonomy, with its emphasis on choice from a range of valuable options. It then seeks to counter claims that DTC genomics threatens autonomy because it involves manipulation in contravention of consumers’ independence or because it does not generate valuable options which can be meaningfully engaged with by consumers. It is stressed that the value of the options generated by DTC genomics should not be judged exclusively from the perspective of medical actionability, but should take into consideration plural utilities. Finally, the paper ends by broaching policy recommendations, suggesting that there is a strong autonomy-based argument for permitting DTC genomic services, and that the key question is the nature of the regulatory conditions under which they should be permitted. The discussion of autonomy in this paper helps illuminate some of these conditions. PMID:24797610

  11. Direct-to-consumer genomics on the scales of autonomy.

    Science.gov (United States)

    Vayena, Effy

    2015-04-01

    Direct-to-consumer (DTC) genetic services have generated enormous controversy from their first emergence. A dramatic recent manifestation of this is the Food and Drug Administration's (FDA) cease and desist order against 23andMe, the leading provider in the market. Critics have argued for the restrictive regulation of such services, and even their prohibition, on the grounds of the harm they pose to consumers. Their advocates, by contrast, defend them as a means of enhancing the autonomy of those same consumers. Autonomy emerges as a key battle-field in this debate, because many of the 'harm' arguments can be interpreted as identifying threats to autonomy. This paper assesses whether DTC genomic services are a threat to, or instead, an enhancement of, personal autonomy. It deploys Joseph Raz's account of personal autonomy, with its emphasis on choice from a range of valuable options. It then seeks to counter claims that DTC genomics threatens autonomy because it involves manipulation in contravention of consumers' independence or because it does not generate valuable options which can be meaningfully engaged with by consumers. It is stressed that the value of the options generated by DTC genomics should not be judged exclusively from the perspective of medical actionability, but should take into consideration plural utilities. Finally, the paper ends by broaching policy recommendations, suggesting that there is a strong autonomy-based argument for permitting DTC genomic services, and that the key question is the nature of the regulatory conditions under which they should be permitted. The discussion of autonomy in this paper helps illuminate some of these conditions. Published by the BMJ Publishing Group Limited.

  12. TIGER: Toolbox for integrating genome-scale metabolic models, expression data, and transcriptional regulatory networks

    Directory of Open Access Journals (Sweden)

    Jensen Paul A

    2011-09-01

    Full Text Available Abstract Background Several methods have been developed for analyzing genome-scale models of metabolism and transcriptional regulation. Many of these methods, such as Flux Balance Analysis, use constrained optimization to predict relationships between metabolic flux and the genes that encode and regulate enzyme activity. Recently, mixed integer programming has been used to encode these gene-protein-reaction (GPR) relationships into a single optimization problem, but these techniques are often of limited generality and lack a tool for automating the conversion of rules to a coupled regulatory/metabolic model. Results We present TIGER, a Toolbox for Integrating Genome-scale Metabolism, Expression, and Regulation. TIGER converts a series of generalized, Boolean or multilevel rules into a set of mixed integer inequalities. The package also includes implementations of existing algorithms to integrate high-throughput expression data with genome-scale models of metabolism and transcriptional regulation. We demonstrate how TIGER automates the coupling of a genome-scale metabolic model with GPR logic and models of transcriptional regulation, thereby serving as a platform for algorithm development and large-scale metabolic analysis. Additionally, we demonstrate how TIGER's algorithms can be used to identify inconsistencies and improve existing models of transcriptional regulation with examples from the reconstructed transcriptional regulatory network of Saccharomyces cerevisiae. Conclusion The TIGER package provides a consistent platform for algorithm development and extending existing genome-scale metabolic models with regulatory networks and high-throughput data.
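    The core idea of turning Boolean gene-protein-reaction rules into mixed integer inequalities can be illustrated with the standard linearizations of AND and OR over binary variables. The sketch below is not TIGER's code or API; the helper names, the inequality tuple format, and the example rule are hypothetical. It encodes the rule (g1 AND g2) OR g3 with an auxiliary variable and then checks by brute force that, for every gene state, the inequalities admit exactly one value of the reaction indicator R, namely the value the Boolean rule gives.

```python
from itertools import product

# An inequality is (coefficients, sense, rhs): sum(coef * var) <sense> rhs.


def and_constraints(y, inputs):
    """Linearize binary y = AND(inputs): y <= x_i and y >= sum(x_i) - (n - 1)."""
    cons = [({y: 1, x: -1}, "<=", 0) for x in inputs]
    coeffs = {y: 1}
    for x in inputs:
        coeffs[x] = coeffs.get(x, 0) - 1
    cons.append((coeffs, ">=", 1 - len(inputs)))
    return cons


def or_constraints(y, inputs):
    """Linearize binary y = OR(inputs): y >= x_i and y <= sum(x_i)."""
    cons = [({y: 1, x: -1}, ">=", 0) for x in inputs]
    coeffs = {y: 1}
    for x in inputs:
        coeffs[x] = coeffs.get(x, 0) - 1
    cons.append((coeffs, "<=", 0))
    return cons


def satisfied(cons, values):
    """True if a 0/1 assignment satisfies every inequality."""
    for coeffs, sense, rhs in cons:
        lhs = sum(c * values[v] for v, c in coeffs.items())
        if (sense == "<=" and lhs > rhs) or (sense == ">=" and lhs < rhs):
            return False
    return True


def feasible_outputs(cons, fixed, free, output):
    """Values of `output` achievable by some 0/1 choice of the free variables."""
    outs = set()
    for bits in product((0, 1), repeat=len(free)):
        values = {**fixed, **dict(zip(free, bits))}
        if satisfied(cons, values):
            outs.add(values[output])
    return outs


# Hypothetical GPR rule: reaction R is active iff (g1 AND g2) OR g3.
# The auxiliary binary "a" stands for the inner AND term.
rule = and_constraints("a", ["g1", "g2"]) + or_constraints("R", ["a", "g3"])
for g1, g2, g3 in product((0, 1), repeat=3):
    expected = int((g1 and g2) or g3)
    got = feasible_outputs(rule, {"g1": g1, "g2": g2, "g3": g3}, ["a", "R"], "R")
    assert got == {expected}, (g1, g2, g3, got)
print(len(rule), "inequalities encode the rule exactly")
```

    In a full model the indicator would then be tied to the reaction's flux, for example with a big-M bound of the form -M*R <= v_R <= M*R so that R = 0 forces zero flux; this coupling is a common construction assumed here, not one stated in the abstract.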

  13. On the road to synthetic life: the minimal cell and genome-scale engineering.

    Science.gov (United States)

    Juhas, Mario

    2016-01-01

    Synthetic biology employs rational engineering principles to build biological systems from libraries of standard, well-characterized biological parts. Biological systems designed and built by synthetic biologists fulfill a plethora of useful purposes, ranging from better healthcare and energy production to biomanufacturing. Recent advancements in the synthesis, assembly and "booting-up" of synthetic genomes and in low- and high-throughput genome engineering have paved the way for engineering on the genome-wide scale. One of the key goals of genome engineering is the construction of minimal genomes consisting solely of essential genes (genes indispensable for the survival of living organisms). Besides serving as a toolbox to understand the universal principles of life, a cell encoded by a minimal genome could be used to build a stringently controlled "cell factory" with a desired phenotype. This review provides an update on recent advances in genome-scale engineering, with particular emphasis on the engineering of minimal genomes. Furthermore, it presents the ongoing discussion in the scientific community about whether minimal or robust cells are better suited to industrial applications.

  14. Savant Genome Browser 2: visualization and analysis for population-scale genomics.

    Science.gov (United States)

    Fiume, Marc; Smith, Eric J M; Brook, Andrew; Strbenac, Dario; Turner, Brian; Mezlini, Aziz M; Robinson, Mark D; Wodak, Shoshana J; Brudno, Michael

    2012-07-01

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com.

  15. Rapid genome-scale mapping of chromatin accessibility in tissue

    DEFF Research Database (Denmark)

    Grøntved, Lars; Bandle, Russell; John, Sam

    2012-01-01

    BACKGROUND: The challenge in extracting genome-wide chromatin features from limiting clinical samples poses a significant hurdle in identification of regulatory marks that impact the physiological or pathological state. Current methods that identify nuclease accessible chromatin are reliant....... RESULTS: Here we introduce a novel technique that specifically identifies Tissue Accessible Chromatin (TACh). The TACh method uses pulverized frozen tissue as starting material and employs one of the two robust endonucleases, Benzonase or Cyansase, which are fully active under a range of stringent...... positively with levels of gene transcription. Accessible chromatin identified by TACh overlaps to a large extent with accessible chromatin identified by DNase I using nuclei purified from freshly isolated liver tissue as starting material. The similarities are most pronounced at highly accessible regions...

  16. Long-Range Periodic Patterns in Microbial Genomes Indicate Significant Multi-Scale Chromosomal Organization.

    Directory of Open Access Journals (Sweden)

    2006-01-01

    Full Text Available Genome organization can be studied through analysis of chromosome position-dependent patterns in sequence-derived parameters. A comprehensive analysis of such patterns in prokaryotic sequences and genome-scale functional data has yet to be performed. We detected spatial patterns in sequence-derived parameters for 163 chromosomes occurring in 135 bacterial and 16 archaeal organisms using wavelet analysis. Pattern strength was found to correlate with organism-specific features such as genome size, overall GC content, and the occurrence of known motility and chromosomal binding proteins. Given additional functional data for Escherichia coli, we found significant correlations among chromosome position-dependent patterns in numerous properties, some of which are consistent with previously experimentally identified chromosome macrodomains. These results demonstrate that the large-scale organization of most sequenced genomes is significantly nonrandom, and, moreover, that this organization is likely linked to genome size, nucleotide composition, and information transfer processes. Constraints on genome evolution and design are thus not solely dependent upon information content, but also upon an intricate multi-parameter, multi-length-scale organization of the chromosome.
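    As an illustration of detecting position-dependent patterns with wavelets, the sketch below computes one sequence-derived parameter (GC content in non-overlapping windows) and the energy of orthonormal Haar wavelet detail coefficients at each dyadic scale. The abstract does not specify the wavelet, window size, or parameters used, so the Haar choice, the 10-bp window, and the synthetic sequence are all assumptions made for simplicity. Detail index k contrasts adjacent runs of 2^k windows, so a periodic pattern shows up as excess energy at the matching index.

```python
import math


def gc_profile(seq, window):
    """GC fraction in consecutive, non-overlapping windows along a sequence."""
    seq = seq.upper()
    return [
        sum(base in "GC" for base in seq[i:i + window]) / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def haar_detail_energy(signal):
    """Energy of orthonormal Haar detail coefficients per dyadic scale, finest first.

    The signal is truncated to the largest power-of-two length.
    """
    if not signal:
        return []
    n = 1 << (len(signal).bit_length() - 1)
    approx = list(signal[:n])
    energies = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        detail = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        energies.append(sum(d * d for d in detail))
    return energies


# Synthetic chromosome: alternating 80-bp GC-rich and AT-rich blocks (illustrative only).
toy = ("G" * 80 + "A" * 80) * 4
profile = gc_profile(toy, window=10)  # 64 windows
energies = haar_detail_energy(profile)
peak = max(range(len(energies)), key=energies.__getitem__)
print(f"{len(profile)} windows; strongest pattern at scale index {peak}")
```

    For the toy input the GC-rich and AT-rich runs are 8 windows long, so the energy concentrates at index 3 (2^3 = 8 windows). A real analysis would also need a null model, such as shuffled sequences, to judge whether a peak is significant.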

  17. Small and large scale genomic DNA isolation protocol for chickpea ...

    African Journals Online (AJOL)

    Both small and large scale preparations were essentially suitable for PCR and Southern blot hybridization analyses, which are the key steps in crop improvement programme through marker development and genetic engineering techniques. Key words: Cicer arietinum L., phenolics, restriction enzyme digestion, PCR ...

  18. Modeling Method for Increased Precision and Scope of Directly Measurable Fluxes at a Genome-Scale

    DEFF Research Database (Denmark)

    McCloskey, Douglas; Young, Jamey D.; Xu, Sibei

    2016-01-01

    -off between increased scope and decreased precision in flux estimations. This work presents a tunable workflow for expanding the scope of MFA to the genome-scale without trade-offs in flux precision. The genome-scale MFA model presented here, iDM2014, accounts for 537 net reactions, which include the core...... pathways of traditional MFA models and also covers the additional pathways of purine, pyrimidine, isoprenoid, methionine, riboflavin, coenzyme A, and folate, as well as other biosynthetic pathways. When evaluating the iDM2014 using a set of measured intracellular intermediate and cofactor mass isotopomer...

  19. The architecture of ArgR-DNA complexes at the genome-scale in Escherichia coli

    DEFF Research Database (Denmark)

    Cho, Suhyung; Cho, Yoo-Bok; Kang, Taek Jin

    2015-01-01

    DNA-binding motifs that are recognized by transcription factors (TFs) have been well studied; however, challenges remain in determining the in vivo architecture of TF-DNA complexes on a genome-scale. Here, we determined the in vivo architecture of Escherichia coli arginine repressor (Arg...... facilitate the non-specific contacts between ArgR subunits and the residual sequences. Additionally, our approach may also reveal other fundamental structural features of TF-DNA interactions that have implications for studying genome-scale transcriptional regulatory networks....

  20. Rapid prototyping of microbial cell factories via genome-scale engineering.

    Science.gov (United States)

    Si, Tong; Xiao, Han; Zhao, Huimin

    2015-11-15

    Advances in reading, writing and editing genetic materials have greatly expanded our ability to reprogram biological systems at the resolution of a single nucleotide and on the scale of a whole genome. Such capacity has greatly accelerated the cycles of design, build and test to engineer microbes for efficient synthesis of fuels, chemicals and drugs. In this review, we summarize the emerging technologies that have been applied, or are potentially useful for genome-scale engineering in microbial systems. We will focus on the development of high-throughput methodologies, which may accelerate the prototyping of microbial cell factories. Copyright © 2014 Elsevier Inc. All rights reserved.

  1. Double and multiple knockout simulations for genome-scale metabolic network reconstructions.

    Science.gov (United States)

    Goldstein, Yaron Ab; Bockmayr, Alexander

    2015-01-01

    Constraint-based modeling of genome-scale metabolic network reconstructions has become a widely used approach in computational biology. Flux coupling analysis is a constraint-based method that analyses the impact of single reaction knockouts on other reactions in the network. We present an extension of flux coupling analysis for double and multiple gene or reaction knockouts, and develop corresponding algorithms for an in silico simulation. To evaluate our method, we perform a full single and double knockout analysis on a selection of genome-scale metabolic network reconstructions and compare the results. A prototype implementation of double knockout simulation is available at http://hoverboard.io/L4FC.
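    The enumeration behind a single- and double-knockout screen can be sketched on a toy network. This is not flux coupling analysis: for brevity the sketch replaces the constraint-based (linear programming) test with a qualitative network-expansion check that only asks whether a "biomass" metabolite remains reachable from the nutrients, ignoring stoichiometry and steady-state mass balance. The network, reaction names, and helper functions are hypothetical. The screen first finds essential single knockouts, then tests all pairs of non-essential reactions for synthetic lethality.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (substrates, products).
REACTIONS = {
    "uptake": ({"glc_ext"}, {"glc"}),
    "glyc": ({"glc"}, {"pyr"}),
    "ppp": ({"glc"}, {"r5p"}),
    "r5p_alt": ({"pyr"}, {"r5p"}),
    "growth": ({"pyr", "r5p"}, {"biomass"}),
}
NUTRIENTS = {"glc_ext"}


def producible(reactions, nutrients, knocked_out):
    """Network expansion: fire every non-deleted reaction whose substrates are all available."""
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name in knocked_out or not substrates <= available:
                continue
            if not products <= available:
                available |= products
                changed = True
    return available


def viable(knocked_out):
    """Qualitative stand-in for a growth test: is biomass still reachable?"""
    return "biomass" in producible(REACTIONS, NUTRIENTS, set(knocked_out))


names = sorted(REACTIONS)
essential = [r for r in names if not viable({r})]
nonessential = [r for r in names if r not in essential]
synthetic_lethal = [pair for pair in combinations(nonessential, 2) if not viable(pair)]
print("essential:", essential)
print("synthetic lethal pairs:", synthetic_lethal)
```

    Here "ppp" and "r5p_alt" are individually dispensable because each is an alternative route to r5p, but deleting both blocks growth. With n reactions the pair loop runs n(n-1)/2 times, which is one reason efficient algorithms matter for genome-scale reconstructions with thousands of reactions.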

  2. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.
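    One common way to quantify spatial association between two sets of genomic intervals is the Jaccard statistic (bases covered by both sets divided by bases covered by either), with significance judged against randomly placed intervals. The sketch below implements that generic idea on a single chromosome; it is not GenometriCorr's code (GenometriCorr is an R package), and the helper names, the uniform-placement null model, and the one-sided test are assumptions. The package's own statistics and null models should be taken from its documentation.

```python
import random


def merge(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered(intervals):
    """Number of bases covered by the union of the intervals."""
    return sum(end - start for start, end in merge(intervals))


def intersection_length(a, b):
    """Bases covered by both interval sets (two-pointer sweep over merged lists)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(a, b):
    """Intersection over union of covered bases (0.0 if both sets are empty)."""
    inter = intersection_length(a, b)
    union = covered(a) + covered(b) - inter
    return inter / union if union else 0.0


def permutation_pvalue(a, b, chrom_length, n_perm=1000, seed=0):
    """One-sided p-value: how often randomly placed copies of b overlap a at least as much."""
    rng = random.Random(seed)
    observed = jaccard(a, b)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in b:
            length = end - start
            new_start = rng.randrange(0, chrom_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if jaccard(a, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical intervals on a 1,000-bp chromosome.
peaks = [(0, 10), (20, 30)]
genes = [(5, 25)]
print(jaccard(peaks, genes))  # 10 shared bases out of 30 covered
print(permutation_pvalue(peaks, genes, chrom_length=1000))
```

    Keeping each shuffled interval's length while randomizing its position preserves the coverage of both sets, so the null distribution reflects placement alone.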

  3. AlgaGEM--a genome-scale metabolic reconstruction of algae based on the Chlamydomonas reinhardtii genome.

    Science.gov (United States)

    Dal'Molin, Cristiana Gomes de Oliveira; Quek, Lake-Ee; Palfreyman, Robin W; Nielsen, Lars K

    2011-12-22

    Microalgae have the potential to deliver biofuels without the associated competition for land resources. In order to realise the rates and titres necessary for commercial production, however, system-level metabolic engineering will be required. Genome scale metabolic reconstructions have revolutionized microbial metabolic engineering and are used routinely for in silico analysis and design. While genome scale metabolic reconstructions have been developed for many prokaryotes and model eukaryotes, the application to less well-characterized eukaryotes such as algae is challenging, not least due to a lack of compartmentalization data. We have developed a genome-scale metabolic network model (named AlgaGEM) covering the metabolism for a compartmentalized algae cell based on the Chlamydomonas reinhardtii genome. AlgaGEM is a comprehensive literature-based genome scale metabolic reconstruction that accounts for the functions of 866 unique ORFs, 1862 metabolites, 2249 gene-enzyme-reaction-association entries, and 1725 unique reactions. The reconstruction was compartmentalized into the cytoplasm, mitochondrion, plastid and microbody using available data for algae complemented with compartmentalisation data for Arabidopsis thaliana. AlgaGEM describes a functional primary metabolism of Chlamydomonas and significantly predicts distinct algal behaviours such as the catabolism or secretion rather than recycling of phosphoglycolate in photorespiration. AlgaGEM was validated through the simulation of growth and algae metabolic functions inferred from literature. Using efficient resource utilisation as the optimality criterion, AlgaGEM predicted observed metabolic effects under autotrophic, heterotrophic and mixotrophic conditions. AlgaGEM predicts increased hydrogen production when cyclic electron flow is disrupted, as seen in a high-producing mutant derived from mutational studies.
The model also predicted the physiological pathway for H2 production and identified new targets

  4. Rapid genome-scale mapping of chromatin accessibility in tissue

    Science.gov (United States)

    2012-01-01

    Background The challenge in extracting genome-wide chromatin features from limiting clinical samples poses a significant hurdle in identification of regulatory marks that impact the physiological or pathological state. Current methods that identify nuclease accessible chromatin are reliant on large amounts of purified nuclei as starting material. This complicates analysis of trace clinical tissue samples that are often stored frozen. We have developed an alternative nuclease based procedure to bypass nuclear preparation to interrogate nuclease accessible regions in frozen tissue samples. Results Here we introduce a novel technique that specifically identifies Tissue Accessible Chromatin (TACh). The TACh method uses pulverized frozen tissue as starting material and employs one of the two robust endonucleases, Benzonase or Cyansase, which are fully active under a range of stringent conditions such as high levels of detergent and DTT. As a proof of principle we applied TACh to frozen mouse liver tissue. Combined with massively parallel sequencing, TACh identifies accessible regions that are associated with euchromatic features, and accessibility at transcriptional start sites correlates positively with levels of gene transcription. Accessible chromatin identified by TACh overlaps to a large extent with accessible chromatin identified by DNase I using nuclei purified from freshly isolated liver tissue as starting material. The similarities are most pronounced at highly accessible regions, whereas identification of less accessible regions tends to diverge more between nucleases. Interestingly, we show that some of the differences between DNase I and Benzonase relate to their intrinsic sequence biases, and accordingly, accessibility of CpG islands is probed more efficiently using TACh. Conclusion The TACh methodology identifies accessible chromatin derived from frozen tissue samples. We propose that this simple, robust approach can be applied across a broad range of

  5. Rapid genome-scale mapping of chromatin accessibility in tissue

    Directory of Open Access Journals (Sweden)

    Grøntved Lars

    2012-06-01

    Full Text Available Abstract Background The challenge in extracting genome-wide chromatin features from limiting clinical samples poses a significant hurdle in identification of regulatory marks that impact the physiological or pathological state. Current methods that identify nuclease accessible chromatin are reliant on large amounts of purified nuclei as starting material. This complicates analysis of trace clinical tissue samples that are often stored frozen. We have developed an alternative nuclease based procedure to bypass nuclear preparation to interrogate nuclease accessible regions in frozen tissue samples. Results Here we introduce a novel technique that specifically identifies Tissue Accessible Chromatin (TACh). The TACh method uses pulverized frozen tissue as starting material and employs one of the two robust endonucleases, Benzonase or Cyansase, which are fully active under a range of stringent conditions such as high levels of detergent and DTT. As a proof of principle we applied TACh to frozen mouse liver tissue. Combined with massively parallel sequencing, TACh identifies accessible regions that are associated with euchromatic features, and accessibility at transcriptional start sites correlates positively with levels of gene transcription. Accessible chromatin identified by TACh overlaps to a large extent with accessible chromatin identified by DNase I using nuclei purified from freshly isolated liver tissue as starting material. The similarities are most pronounced at highly accessible regions, whereas identification of less accessible regions tends to diverge more between nucleases. Interestingly, we show that some of the differences between DNase I and Benzonase relate to their intrinsic sequence biases, and accordingly, accessibility of CpG islands is probed more efficiently using TACh. Conclusion The TACh methodology identifies accessible chromatin derived from frozen tissue samples.
We propose that this simple, robust approach can be applied
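The abstract reports that TACh-accessible regions overlap "to a large extent" with DNase I-accessible regions. A minimal sketch of one way to quantify such overlap is shown below: the fraction of peaks in one set that touch at least one peak in the other. The function name, the (chrom, start, end) tuple layout, and half-open coordinates are assumptions for illustration, not the authors' pipeline.

```python
from bisect import bisect_left
from collections import defaultdict


def fraction_overlapping(query, reference):
    """Fraction of query peaks that overlap at least one reference peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    Reference peaks on a chromosome must not overlap each other (merge them
    first); then their ends are sorted like their starts, so only one
    candidate per query needs checking.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))
    for peaks in by_chrom.values():
        peaks.sort()
    starts = {chrom: [s for s, _ in peaks] for chrom, peaks in by_chrom.items()}

    if not query:
        return 0.0
    hits = 0
    for chrom, q_start, q_end in query:
        peaks = by_chrom.get(chrom, [])
        # Index of the last reference peak that starts before the query ends.
        i = bisect_left(starts.get(chrom, []), q_end) - 1
        if i >= 0 and peaks[i][1] > q_start:
            hits += 1
    return hits / len(query)


tach = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]  # made-up peaks
dnase = [("chr1", 150, 300), ("chr1", 590, 700), ("chr3", 0, 50)]
print(fraction_overlapping(tach, dnase))  # about 0.667 (2 of 3 peaks)
```

Sorting once and using binary search keeps each lookup logarithmic, which matters for genome-wide peak sets.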

  6. Designing metabolic engineering strategies with genome-scale metabolic flux modeling

    Directory of Open Access Journals (Sweden)

    Yen JY

    2015-01-01

    Full Text Available Jiun Y Yen,1,2 Imen Tanniche,1 Amanda K Fisher,1–3 Glenda E Gillaspy,2 David R Bevan,2,3 Ryan S Senger1 1Department of Biological Systems Engineering, 2Department of Biochemistry, 3Genomics, Bioinformatics, and Computational Biology Interdisciplinary Program, Virginia Tech, Blacksburg, VA, USA Abstract: New in silico tools that make use of genome-scale metabolic flux modeling are improving the design of metabolic engineering strategies. This review highlights the latest developments in this area, explains the interface between these in silico tools and the experimental implementation tools of metabolic engineers, and provides a way forward so that in silico predictions can better mimic reality and more experimental methods can be considered in simulation studies. The various methodologies for solving genome-scale models (eg, flux balance analysis [FBA], parsimonious FBA, flux variability analysis, and minimization of metabolic adjustment) each have unique advantages and applications. There are two basic approaches to designing metabolic engineering strategies in silico, and both have demonstrated success in the literature. The first involves: 1) making a genetic manipulation in a model; 2) testing for improved performance through simulation; and 3) iterating the process. The second approach has been used in more recently designed in silico tools and involves: 1) comparing metabolic flux profiles of a wild-type and an ideally engineered state and 2) designing engineering strategies based on the differences in these flux profiles. Improvements in genome-scale modeling are anticipated in areas such as the inclusion of all relevant cellular machinery, the ability to understand and anticipate the results of combinatorial enrichment experiments, and constructing dynamic and flexible biomass equations that can respond to environmental and genetic manipulations. Keywords: genome-scale modeling, flux balance analysis, flux variability
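The second in silico approach described in this abstract (compare a wild-type flux profile with an ideally engineered one, then derive strategies from the differences) can be sketched as follows. The function name, the fold-change cutoff, the three labels, and the reaction IDs and fluxes are all hypothetical. In practice the flux profiles would come from an FBA-type solution rather than being written by hand.

```python
def suggest_interventions(wild_type, target, fold=2.0, eps=1e-9):
    """Propose engineering targets from two flux profiles.

    wild_type, target: dicts mapping reaction ID -> flux.
    Absolute fluxes are compared, so flux direction is ignored.
    Returns {reaction ID: "knockout" | "up" | "down"}.
    """
    suggestions = {}
    for rxn in sorted(set(wild_type) | set(target)):
        wt = abs(wild_type.get(rxn, 0.0))
        tg = abs(target.get(rxn, 0.0))
        if wt > eps and tg <= eps:
            suggestions[rxn] = "knockout"  # flux must be abolished
        elif tg > eps and tg >= fold * wt:
            suggestions[rxn] = "up"        # flux must rise at least `fold`-times
        elif tg > eps and tg * fold <= wt:
            suggestions[rxn] = "down"      # flux must fall at least `fold`-times
    return suggestions


# Hypothetical toy flux profiles.
wild_type = {"PGI": 10.0, "G6PDH": 2.0, "LDH": 5.0, "PYK": 8.0}
engineered = {"PGI": 3.0, "G6PDH": 9.0, "LDH": 0.0, "PYK": 9.0}
print(suggest_interventions(wild_type, engineered))
# {'G6PDH': 'up', 'LDH': 'knockout', 'PGI': 'down'}
```

Reactions whose flux changes by less than the cutoff (PYK here) are left alone, which keeps the proposed strategy small.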

  7. Development and experimental verification of a genome-scale metabolic model for Corynebacterium glutamicum

    Directory of Open Access Journals (Sweden)

    Hirasawa Takashi

    2009-08-01

    Full Text Available Abstract Background In silico genome-scale metabolic models enable the analysis of the characteristics of metabolic systems of organisms. In this study, we reconstructed a genome-scale metabolic model of Corynebacterium glutamicum on the basis of genome sequence annotation and physiological data. The metabolic characteristics were analyzed using flux balance analysis (FBA), and the results of FBA were validated using data from culture experiments performed at different oxygen uptake rates. Results The reconstructed genome-scale metabolic model of C. glutamicum contains 502 reactions and 423 metabolites. We collected the reactions and biomass components from the database and the literature, and made the model usable for flux balance analysis by filling gaps in the reaction networks and removing inadequate loop reactions. Using the framework of FBA and our genome-scale metabolic model, we first simulated the changes in the metabolic flux profiles that occur on changing the oxygen uptake rate. The predicted production yields of carbon dioxide and organic acids agreed well with the experimental data. The metabolic profiles of amino acid production phases were also investigated. A comprehensive gene deletion study was performed in which the effects of gene deletions on metabolic fluxes were simulated; this helped in the identification of several genes whose deletion resulted in an improvement in organic acid production. Conclusion The genome-scale metabolic model provides useful information for the evaluation of the metabolic capabilities and prediction of the metabolic characteristics of C. glutamicum. This can form a basis for the in silico design of C. glutamicum metabolic networks for improved bioproduction of desirable metabolites.
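FBA maximizes an objective flux subject to the steady-state constraint S v = 0 and flux bounds, and a gene deletion can be simulated by forcing the affected reaction's flux to zero. The following is a minimal sketch under stated assumptions: a small dense simplex solver written from scratch (real genome-scale models are solved with dedicated LP solvers), irreversible reactions only, and a hypothetical four-reaction toy network. It is not the authors' implementation.

```python
def simplex_max(c, A, b, eps=1e-9):
    """Maximize c.x subject to A x <= b and x >= 0, assuming every b_i >= 0.

    Dense tableau simplex; Bland's rule prevents cycling on degenerate LPs
    (FBA problems are highly degenerate). Returns (optimum, x).
    """
    m, n = len(A), len(c)
    # Rows: [A | I | b]; the slack identity makes x = 0 a feasible start.
    T = [[float(v) for v in A[i]] + [1.0 if k == i else 0.0 for k in range(m)] + [float(b[i])]
         for i in range(m)]
    T.append([-float(v) for v in c] + [0.0] * (m + 1))  # objective row
    basis = [n + i for i in range(m)]
    while True:
        # Entering column: lowest index with a negative reduced cost.
        col = next((j for j in range(n + m) if T[-1][j] < -eps), None)
        if col is None:
            break  # optimal
        rows = [i for i in range(m) if T[i][col] > eps]
        if not rows:
            raise ValueError("unbounded LP")
        # Minimum-ratio test; ties go to the lowest basic-variable index.
        row = min(rows, key=lambda i: (T[i][-1] / T[i][col], basis[i]))
        pivot = T[row][col]
        T[row] = [v / pivot for v in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col] != 0.0:
                factor = T[i][col]
                T[i] = [v - factor * p for v, p in zip(T[i], T[row])]
        basis[row] = col
    x = [0.0] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = T[i][-1]
    return T[-1][-1], x


def fba(S, upper, objective):
    """Maximize flux `objective` s.t. S v = 0 and 0 <= v <= upper."""
    n = len(upper)
    A, b = [], []
    for row in S:  # steady state S v = 0 written as two inequalities
        A.append(list(row))
        b.append(0.0)
        A.append([-v for v in row])
        b.append(0.0)
    for j, ub in enumerate(upper):  # capacity bounds v_j <= ub_j
        A.append([1.0 if k == j else 0.0 for k in range(n)])
        b.append(ub)
    c = [1.0 if j == objective else 0.0 for j in range(n)]
    return simplex_max(c, A, b)


# Hypothetical toy network: R0 uptake -> A; R1, R2: A -> B; R3: B -> biomass.
S = [[1, -1, -1, 0],   # metabolite A
     [0, 1, 1, -1]]    # metabolite B
upper = [10, 100, 3, 100]
wt_growth, _ = fba(S, upper, objective=3)
ko_upper = list(upper)
ko_upper[1] = 0  # simulate deleting the gene that encodes R1
ko_growth, _ = fba(S, ko_upper, objective=3)
print(wt_growth, ko_growth)  # 10.0 3.0
```

After the deletion, growth falls to the capacity of the low-capacity alternative route R2, which is the kind of effect a systematic deletion scan looks for.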

  8. Population genomics reveals chromosome-scale heterogeneous evolution in a protoploid yeast.

    Science.gov (United States)

    Friedrich, Anne; Jung, Paul; Reisser, Cyrielle; Fischer, Gilles; Schacherer, Joseph

    2015-01-01

    Yeast species represent an ideal model system for population genomic studies, but large-scale polymorphism surveys have so far only been reported for species of the Saccharomyces genus. Hence, little is known about intraspecific diversity and evolution in yeast. To gain new insight into the evolutionary forces shaping natural populations, we sequenced the genomes of an extensive worldwide collection of isolates from a species distantly related to Saccharomyces cerevisiae: Lachancea kluyveri (formerly S. kluyveri). We identified 6.5 million single nucleotide polymorphisms and showed that a large introgression event of 1 Mb of GC-rich sequence in the chromosomal arm probably occurred in the last common ancestor of all L. kluyveri strains. Our population genomic data clearly revealed that this 1-Mb region underwent a molecular evolution pattern very different from the rest of the genome. It is characterized by a higher recombination rate, with a dramatically elevated A:T → G:C substitution rate, which is the signature of increased GC-biased gene conversion. In addition, the predicted base composition at equilibrium demonstrates that the chromosome-scale compositional heterogeneity will persist after the genome has reached mutational equilibrium. Altogether, the data presented herein clearly show that distinct recombination and substitution regimes can coexist and lead to different evolutionary patterns within a single genome. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
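The "predicted base composition at equilibrium" mentioned in this abstract is commonly computed as GC* = r_WS / (r_WS + r_SW), where r_WS is the per-A:T-site rate of A:T → G:C substitutions and r_SW is the per-G:C-site rate of G:C → A:T substitutions. A minimal sketch of that standard formula follows. The exact estimator used by the authors is not given here, and the counts are made up.

```python
def equilibrium_gc(n_ws, n_sw, at_sites, gc_sites):
    """Expected GC content once substitutions reach equilibrium.

    n_ws: observed A:T -> G:C (weak-to-strong) substitutions
    n_sw: observed G:C -> A:T (strong-to-weak) substitutions
    at_sites, gc_sites: numbers of A/T and G/C sites exposed to those changes
    """
    ws_rate = n_ws / at_sites  # per-A:T-site rate of gaining G:C
    sw_rate = n_sw / gc_sites  # per-G:C-site rate of losing G:C
    return ws_rate / (ws_rate + sw_rate)


# Made-up counts for a 1,000-bp window that is currently 40% GC.
print(equilibrium_gc(n_ws=60, n_sw=30, at_sites=600, gc_sites=400))  # about 0.571
```

An equilibrium value above the current GC content (0.571 vs 0.40 here) means the region is still drifting toward higher GC. A value equal to it means the composition is stable.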

  9. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models

    Science.gov (United States)

    King, Zachary A.; Lu, Justin; Dräger, Andreas; Miller, Philip; Federowicz, Stephen; Lerman, Joshua A.; Ebrahim, Ali; Palsson, Bernhard O.; Lewis, Nathan E.

    2016-01-01

    Genome-scale metabolic models are mathematically structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repositories of high-quality models must be established, models must adhere to established standards and model components must be linked to relevant databases. Tools for model visualization further enhance their utility. To meet these needs, we present BiGG Models (http://bigg.ucsd.edu), a completely redesigned Biochemical, Genetic and Genomic knowledge base. BiGG Models contains more than 75 high-quality, manually curated genome-scale metabolic models. On the website, users can browse, search and visualize models. BiGG Models connects genome-scale models to genome annotations and external databases. Reaction and metabolite identifiers have been standardized across models to conform to community standards and enable rapid comparison across models. Furthermore, BiGG Models provides a comprehensive application programming interface for accessing BiGG Models with modeling and analysis tools. As a resource for highly curated, standardized and accessible models of metabolism, BiGG Models will facilitate diverse systems biology studies and support knowledge-based analysis of diverse experimental data. PMID:26476456
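Because BiGG Models standardizes reaction and metabolite identifiers across models, comparing two models can reduce to set operations on their IDs. A minimal sketch follows. It assumes BiGG-style metabolite IDs that end in a compartment suffix such as `_c` (cytosol) or `_e` (extracellular). The helper names are hypothetical, and the sketch works on ID lists already in hand rather than calling the BiGG API.

```python
def universal_id(bigg_metabolite_id):
    """Drop a trailing compartment suffix: 'glc__D_e' -> 'glc__D', 'atp_c' -> 'atp'."""
    base, sep, _compartment = bigg_metabolite_id.rpartition("_")
    return base if sep else bigg_metabolite_id


def compare_models(ids_a, ids_b):
    """Jaccard similarity and shared identifiers of two models' ID sets."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    jaccard = len(a & b) / len(union) if union else 1.0
    return jaccard, sorted(a & b)


model_1 = ["glc__D_e", "glc__D_c", "atp_c", "pyr_c"]  # made-up metabolite lists
model_2 = ["glc__D_c", "atp_c", "lac__L_c"]
print(compare_models({universal_id(m) for m in model_1},
                     {universal_id(m) for m in model_2}))  # (0.5, ['atp', 'glc__D'])
```

Stripping compartments first compares which chemical species two models contain. Comparing the raw IDs instead would also distinguish where each species occurs. The suffix rule only applies to compartmentalized metabolite IDs, not to reaction IDs.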

  10. BiGG Models: A platform for integrating, standardizing and sharing genome-scale models.

    Science.gov (United States)

    King, Zachary A; Lu, Justin; Dräger, Andreas; Miller, Philip; Federowicz, Stephen; Lerman, Joshua A; Ebrahim, Ali; Palsson, Bernhard O; Lewis, Nathan E

    2016-01-04

    Genome-scale metabolic models are mathematically structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repositories of high-quality models must be established, models must adhere to established standards and model components must be linked to relevant databases. Tools for model visualization further enhance their utility. To meet these needs, we present BiGG Models (http://bigg.ucsd.edu), a completely redesigned Biochemical, Genetic and Genomic knowledge base. BiGG Models contains more than 75 high-quality, manually curated genome-scale metabolic models. On the website, users can browse, search and visualize models. BiGG Models connects genome-scale models to genome annotations and external databases. Reaction and metabolite identifiers have been standardized across models to conform to community standards and enable rapid comparison across models. Furthermore, BiGG Models provides a comprehensive application programming interface for accessing BiGG Models with modeling and analysis tools. As a resource for highly curated, standardized and accessible models of metabolism, BiGG Models will facilitate diverse systems biology studies and support knowledge-based analysis of diverse experimental data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Segmentation of genomic DNA through entropic divergence: Power laws and scaling

    Science.gov (United States)

    Azad, Rajeev K.; Bernaola-Galván, Pedro; Ramaswamy, Ramakrishna; Rao, J. Subba

    2002-05-01

    Genomic DNA is fragmented into segments using the Jensen-Shannon divergence. Use of this criterion results in the fragments being entropically homogeneous to within a predefined level of statistical significance. The procedure is applied to complete genomes of organisms from archaebacteria, eubacteria, and eukaryotes. The distribution of fragment lengths in bacterial and primitive eukaryotic DNAs shows two distinct regimes of power-law scaling. The characteristic length separating these two regimes appears to be an intrinsic property of the sequence rather than a finite-size artifact, and is independent of the significance level used in segmenting a given genome. Fragment length distributions obtained in the segmentation of the genomes of more highly evolved eukaryotes do not have such distinct regimes of power-law behavior.
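In the Jensen-Shannon segmentation scheme, a candidate cut splits a sequence S of length l into S1 and S2 (lengths l1, l2). The divergence is D = H(S) - (l1/l)·H(S1) - (l2/l)·H(S2), where H is the Shannon entropy of nucleotide composition. The cut with maximal D is kept if it is significant, and the procedure recurses on both halves. A minimal sketch follows. The fixed divergence threshold and minimum segment length are simplifying assumptions that stand in for the statistical significance test described in the abstract.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (bits) of a symbol composition with n symbols in total."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def best_cut(seq):
    """Return (position, divergence) of the cut maximizing the Jensen-Shannon divergence."""
    n = len(seq)
    total = Counter(seq)
    h_total = entropy(total, n)
    left = Counter()
    best_pos, best_d = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # composition of seq[i:]
        d = h_total - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


def segment(seq, threshold, min_len=10, offset=0):
    """Recursively split seq; returns half-open (start, end) segment coordinates.

    ASSUMPTION: a fixed divergence threshold replaces the significance test.
    """
    whole = [(offset, offset + len(seq))]
    if len(seq) < 2 * min_len:
        return whole
    pos, d = best_cut(seq)
    if pos is None or d < threshold or min(pos, len(seq) - pos) < min_len:
        return whole
    return (segment(seq[:pos], threshold, min_len, offset)
            + segment(seq[pos:], threshold, min_len, offset + pos))


print(segment("A" * 50 + "C" * 50, threshold=0.1))  # [(0, 50), (50, 100)]
```

A compositionally uniform sequence such as "ACGT" repeated yields only tiny divergences and is left as a single segment.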

  12. Genome scale transcriptome analysis of shoot organogenesis in Populus.

    Science.gov (United States)

    Bao, Yanghuan; Dharmawardhana, Palitha; Mockler, Todd C; Strauss, Steven H

    2009-11-17

    Our aim is to improve knowledge of gene regulatory circuits important to dedifferentiation, redifferentiation, and adventitious meristem organization during in vitro regeneration of plants. Regeneration of transgenic cells remains a major obstacle to research and commercial deployment of most taxa of transgenic plants, and woody species are particularly recalcitrant. The model woody species Populus, due to its genome sequence and amenability to in vitro manipulation, is an excellent species for study in this area. The genes identified may help to guide the development of new tools for improving the efficiency of plant regeneration and transformation. We analyzed gene expression during poplar in vitro dedifferentiation and shoot regeneration using an Affymetrix array representing over 56,000 poplar transcripts. We focused on callus induction and shoot formation, so we sampled RNAs from tissues prior to callus induction, 3 days and 15 days after callus induction, and 3 days and 8 days after the start of shoot induction. We used a female hybrid white poplar clone (INRA 717-1 B4, Populus tremula × P. alba) that is used widely as a model transgenic genotype. Approximately 15% of the monitored genes were significantly up- or down-regulated when controlling the false discovery rate (FDR) at 0.01; over 3,000 genes had a 5-fold or greater change in expression. We found a large initial change in expression after the beginning of hormone treatment (at the earliest stage of callus induction), and then a much smaller number of additional differentially expressed genes at subsequent regeneration stages. A total of 588 transcription factors that were distributed in 45 gene families were differentially regulated. Genes that showed strong differential expression included components of auxin and cytokinin signaling, selected cell division genes, and genes related to plastid development and photosynthesis. When compared with data on in vitro callogenesis in Arabidopsis, 25% (1

  13. Genome scale transcriptome analysis of shoot organogenesis in Populus

    Directory of Open Access Journals (Sweden)

    Mockler Todd C

    2009-11-01

    Full Text Available Abstract Background Our aim is to improve knowledge of gene regulatory circuits important to dedifferentiation, redifferentiation, and adventitious meristem organization during in vitro regeneration of plants. Regeneration of transgenic cells remains a major obstacle to research and commercial deployment of most taxa of transgenic plants, and woody species are particularly recalcitrant. The model woody species Populus, due to its genome sequence and amenability to in vitro manipulation, is an excellent species for study in this area. The genes identified may help to guide the development of new tools for improving the efficiency of plant regeneration and transformation. Results We analyzed gene expression during poplar in vitro dedifferentiation and shoot regeneration using an Affymetrix array representing over 56,000 poplar transcripts. We focused on callus induction and shoot formation, so we sampled RNAs from tissues prior to callus induction, 3 days and 15 days after callus induction, and 3 days and 8 days after the start of shoot induction. We used a female hybrid white poplar clone (INRA 717-1 B4, Populus tremula × P. alba) that is used widely as a model transgenic genotype. Approximately 15% of the monitored genes were significantly up- or down-regulated when controlling the false discovery rate (FDR) at 0.01; over 3,000 genes had a 5-fold or greater change in expression. We found a large initial change in expression after the beginning of hormone treatment (at the earliest stage of callus induction), and then a much smaller number of additional differentially expressed genes at subsequent regeneration stages. A total of 588 transcription factors that were distributed in 45 gene families were differentially regulated. Genes that showed strong differential expression included components of auxin and cytokinin signaling, selected cell division genes, and genes related to plastid development and photosynthesis. When compared with data on in
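The Populus study calls genes differentially expressed when the false discovery rate is controlled at 0.01, and separately reports genes with a 5-fold or greater change. A minimal sketch combining both filters follows. It uses the Benjamini-Hochberg step-up procedure, a standard FDR method; the abstract does not say which FDR procedure the authors used. The gene names, p-values, and fold changes are made up.

```python
def benjamini_hochberg(pvalues, alpha=0.01):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank  # keep the largest rank whose p-value passes its threshold
    return sorted(order[:k])


def differentially_expressed(results, alpha=0.01, min_fold=5.0):
    """results: {gene: (p-value, fold change as a ratio > 0)}.

    Keeps genes significant at FDR `alpha` whose change is at least
    `min_fold` in either direction (up- or down-regulated).
    """
    genes = list(results)
    significant = benjamini_hochberg([results[g][0] for g in genes], alpha)
    return sorted(g for g in (genes[i] for i in significant)
                  if max(results[g][1], 1 / results[g][1]) >= min_fold)


results = {"gene1": (1e-5, 6.0), "gene2": (1e-5, 0.1), "gene3": (1e-5, 2.0), "gene4": (0.5, 10.0)}
print(differentially_expressed(results))  # ['gene1', 'gene2']
```

Taking the reciprocal of ratios below 1 lets a 10-fold decrease (ratio 0.1) pass the same 5-fold cutoff as an increase.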

  14. The Psychiatric Genomics Consortium Posttraumatic Stress Disorder Workgroup: Posttraumatic Stress Disorder Enters the Age of Large-Scale Genomic Collaboration.

    Science.gov (United States)

    Logue, Mark W; Amstadter, Ananda B; Baker, Dewleen G; Duncan, Laramie; Koenen, Karestan C; Liberzon, Israel; Miller, Mark W; Morey, Rajendra A; Nievergelt, Caroline M; Ressler, Kerry J; Smith, Alicia K; Smoller, Jordan W; Stein, Murray B; Sumner, Jennifer A; Uddin, Monica

    2015-09-01

    The development of posttraumatic stress disorder (PTSD) is influenced by genetic factors. Although there have been some replicated candidates, the identification of risk variants for PTSD has lagged behind genetic research on other psychiatric disorders such as schizophrenia, autism, and bipolar disorder. Psychiatric genetics has moved beyond examination of specific candidate genes in favor of the genome-wide association study (GWAS) strategy of very large numbers of samples, which allows for the discovery of previously unsuspected genes and molecular pathways. The successes of genetic studies of schizophrenia and bipolar disorder have been aided by the formation of a large-scale GWAS consortium: the Psychiatric Genomics Consortium (PGC). In contrast, only a handful of GWAS of PTSD have appeared in the literature to date. Here we describe the formation of a group dedicated to large-scale study of PTSD genetics: the PGC-PTSD. The PGC-PTSD faces challenges related to the contingency on trauma exposure and the large degree of ancestral genetic diversity within and across participating studies. Using the PGC analysis pipeline supplemented by analyses tailored to address these challenges, we anticipate that our first large-scale GWAS of PTSD will comprise over 10,000 cases and 30,000 trauma-exposed controls. Following in the footsteps of our PGC forerunners, this collaboration, of a scope that is unprecedented in the field of traumatic stress, will lead the search for replicable genetic associations and new insights into the biological underpinnings of PTSD.

  15. Re-sequencing data for refining candidate genes and polymorphisms in QTL regions affecting adiposity in chicken.

    Directory of Open Access Journals (Sweden)

    Pierre-François Roux

    Full Text Available In this study, we propose an approach aimed at fine-mapping adiposity QTL in chicken by integrating whole genome re-sequencing data. First, two QTL regions for adiposity were identified by performing a classical linkage analysis on 1362 offspring in 11 sire families obtained by crossing two meat-type chicken lines divergently selected for abdominal fat weight. These regions, located on chromosomes 7 and 19, contained a total of 77 and 84 genes, respectively. Then, SNPs and indels in these regions were identified by re-sequencing the sires. Considering issues related to polymorphism annotations for regulatory regions, we focused on the 120 and 104 polymorphisms having an impact on protein sequence, located in coding regions of 35 and 42 genes situated in the two QTL regions. Subsequently, a filter was applied to SNPs considering their potential impact on protein function based on conservation criteria. For the two regions, we identified 42 and 34 functional polymorphisms carried by 18 and 24 genes and likely to strongly affect the protein, including 3 coding indels and 4 nonsense SNPs. Finally, using gene functional annotation, a short list of 17 and 4 polymorphisms in 6 and 4 functional genes was defined. Even if we cannot exclude that the causal polymorphisms may be located in regulatory regions, this strategy gives a complete overview of the candidate polymorphisms in coding regions and prioritizes them on conservation- and functional-based arguments.
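The chicken study narrows candidates through successive filters: location in a QTL region, impact on protein sequence, conservation, and finally gene functional annotation. A minimal sketch of such a cascade follows. The `Variant` fields, consequence labels, conservation score and cutoff, and gene names are all hypothetical. Keeping nonsense SNPs and coding indels regardless of conservation is an assumption, made because the abstract counts them among the functional polymorphisms.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str     # e.g. "missense", "stop_gained", "frameshift", "synonymous"
    conservation: float  # hypothetical score in [0, 1]; higher = more conserved site
    in_qtl: bool         # lies inside one of the mapped QTL regions


PROTEIN_ALTERING = {"missense", "stop_gained", "frameshift", "inframe_indel"}


def prioritize(variants, candidate_genes, min_conservation=0.5):
    """Apply the successive filters: QTL region, coding impact, conservation, gene function."""
    kept = []
    for v in variants:
        if not v.in_qtl or v.consequence not in PROTEIN_ALTERING:
            continue  # outside the QTL or not changing the protein sequence
        if v.consequence == "missense" and v.conservation < min_conservation:
            continue  # amino-acid change at a poorly conserved site
        if v.gene not in candidate_genes:
            continue  # gene not supported by functional annotation
        kept.append(v)
    return kept


variants = [
    Variant("GENE_A", "missense", 0.9, True),
    Variant("GENE_A", "missense", 0.1, True),
    Variant("GENE_B", "stop_gained", 0.2, True),
    Variant("GENE_C", "synonymous", 0.9, True),
    Variant("GENE_D", "missense", 0.95, False),
]
for v in prioritize(variants, candidate_genes={"GENE_A", "GENE_B"}):
    print(v.gene, v.consequence)
```

Ordering the cheap structural filters first keeps the expensive functional review for a short list, mirroring the narrowing from 120/104 polymorphisms down to 17/4 in the abstract.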

  16. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster.

    Science.gov (United States)

    Chan, Andrew H; Jenkins, Paul A; Song, Yun S

    2012-01-01

    Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a widely used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots, each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than in the North American population. The correlation between various genomic features (including recombination rates, diversity, divergence, GC content, gene content, and sequence quality) is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and diversity.
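The wavelet comparison of recombination maps asks how similar two maps are at each spatial scale. A minimal sketch of the idea follows: decompose each map with a Haar wavelet and correlate the detail coefficients level by level. The abstract does not name the wavelet family or the statistic used. The Haar choice, Pearson correlation, binning into a power-of-two number of equal windows, and the rates themselves are assumptions.

```python
import math


def haar_details(x):
    """Haar wavelet detail coefficients, finest scale first (len(x) must be a power of 2)."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two (>= 2)")
    levels, approx = [], [float(v) for v in x]
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / math.sqrt(2) for a, b in pairs])  # local contrasts
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]         # smoothed signal
    return levels


def pearson(a, b):
    """Pearson correlation; NaN when either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else float("nan")


def scale_correlations(map1, map2):
    """Correlation of two equally binned recombination maps at each wavelet scale."""
    return [pearson(d1, d2) for d1, d2 in zip(haar_details(map1), haar_details(map2))]


# Hypothetical per-window recombination rates for two populations.
pop1 = [1.0, 3.0, 2.0, 8.0, 5.0, 5.0, 9.0, 1.0]
pop2 = [2 * r + 1 for r in pop1]
print(scale_correlations(pop1, pop2))  # approximately [1.0, 1.0, nan]
```

The coarsest level has a single coefficient, so its correlation is undefined (NaN). Real maps have many more windows and therefore many more usable levels.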

  17. Panoptes: web-based exploration of large scale genome variation data.

    Science.gov (United States)

    Vauterin, Paul; Jeffery, Ben; Miles, Alistair; Amato, Roberto; Hart, Lee; Wright, Ian; Kwiatkowski, Dominic

    2017-10-15

    The size and complexity of modern large-scale genome variation studies demand novel approaches for exploring and sharing the data. In order to unlock the potential of these data for a broad audience of scientists with various areas of expertise, a unified exploration framework is required that is accessible, coherent and user-friendly. Panoptes is an open-source software framework for collaborative visual exploration of large-scale genome variation data and associated metadata in a web browser. It relies on technology choices that allow it to operate in near real-time on very large datasets. It can be used to browse rich, hybrid content in a coherent way, and offers interactive visual analytics approaches to assist the exploration. We illustrate its application using genome variation data of Anopheles gambiae, Plasmodium falciparum and Plasmodium vivax. Freely available at https://github.com/cggh/panoptes, under the GNU Affero General Public License. Contact: paul.vauterin@gmail.com.

  18. Generation and Evaluation of a Genome-Scale Metabolic Network Model of Synechococcus elongatus PCC7942

    Directory of Open Access Journals (Sweden)

    Julián Triana

    2014-08-01

    Full Text Available The reconstruction of genome-scale metabolic models and their applications are a major strength of systems biology. Through their use as metabolic flux simulation models, production of industrially interesting metabolites can be predicted. Due to the growing number of studies of metabolic models driven by the increasing number of genome sequencing projects, it is important to conceptualize the steps of reconstruction and analysis. We have focused our work on the cyanobacterium Synechococcus elongatus PCC7942, for which several analyses and insights are presented. A comprehensive approach has been used, which may help guide the process of manual curation and genome-scale metabolic analysis. The final model, iSyf715, includes 851 reactions and 838 metabolites. A biomass equation, which encompasses the elementary building blocks that allow cell growth, is also included. The applicability of the model is finally demonstrated by simulating autotrophic growth conditions of Synechococcus elongatus PCC7942.
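A model such as iSyf715 (851 reactions, 838 metabolites) is represented by an 838 × 851 stoichiometric matrix S. Flux simulation assumes steady state, S v = 0, so every metabolite's production balances its consumption, including the drain into the biomass equation. A minimal sketch of building S and checking a flux vector against that constraint follows. The function names and the four-reaction autotroph-like toy network are hypothetical and are not taken from iSyf715.

```python
def stoichiometric_matrix(reactions):
    """Build S (metabolites x reactions) from {reaction: {metabolite: coefficient}}.

    Negative coefficients are consumed, positive ones produced.
    """
    rxn_ids = list(reactions)
    met_ids = sorted({m for stoich in reactions.values() for m in stoich})
    S = [[reactions[r].get(m, 0.0) for r in rxn_ids] for m in met_ids]
    return S, met_ids, rxn_ids


def steady_state_violations(S, met_ids, fluxes, tol=1e-9):
    """Metabolites whose net production (row of S times v) is non-zero."""
    violations = {}
    for met, row in zip(met_ids, S):
        net = sum(coef * v for coef, v in zip(row, fluxes))
        if abs(net) > tol:
            violations[met] = net
    return violations


# Hypothetical autotroph-like toy network (not taken from iSyf715).
toy = {
    "EX_co2": {"co2": 1.0},                           # CO2 uptake
    "FIX": {"co2": -1.0, "atp": -3.0, "c3": 1.0},     # carbon fixation
    "LIGHT": {"atp": 1.0},                            # photophosphorylation
    "BIOMASS": {"c3": -1.0, "atp": -1.0},             # biomass drain
}
S, mets, rxns = stoichiometric_matrix(toy)
print(steady_state_violations(S, mets, [1.0, 1.0, 4.0, 1.0]))  # {}
print(steady_state_violations(S, mets, [1.0, 1.0, 3.0, 1.0]))  # {'atp': -1.0}
```

The second flux vector under-supplies ATP by one unit, so it is not a valid steady-state solution.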

  19. Genome-Scale Model of Streptococcus thermophilus LMG18311 for Metabolic Comparison of Lactic Acid Bacteria

    NARCIS (Netherlands)

    Pastink, M.I.; Teusink, B.; Hols, P.; Visser, S.; Vos, de W.M.; Hugenholtz, J.

    2009-01-01

    In this report we describe amino acid metabolism and amino acid dependency of the dairy bacterium Streptococcus thermophilus LMG18311 and compare them with those of two other characterized lactic acid bacteria, Lactococcus lactis and Lactobacillus plantarum. Through the construction of a genome-scale

  20. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    comparison either on a large or small scale would be facilitated by using a single standard for annotation, which incorporates a transparency of why an open reading frame (ORF) is considered to be a gene. Results: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic...
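Gene prediction starts from candidate open reading frames, that is, stretches from a start codon to an in-frame stop codon. A minimal sketch of enumerating such candidates on the forward strand follows. It assumes ATG as the only start codon (prokaryotes also use GTG and TTG), ignores the reverse strand, and does none of the statistical scoring a real prokaryotic gene finder applies to decide which ORFs are genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Open reading frames on the forward strand, in all three frames.

    An ORF runs from an ATG to the first in-frame stop codon; `min_codons`
    counts codons from the ATG up to (not including) the stop.
    Returns sorted half-open (start, end) coordinates that include the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


toy_seq = "CC" + "ATG" + "AAA" * 3 + "TAA" + "G"  # made-up sequence
print(find_orfs(toy_seq, min_codons=3))  # [(2, 17)]
```

Internal ATGs after the first start are ignored, so each reported ORF is the longest one ending at its stop codon.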

  1. Reliable and efficient solution of genome-scale models of Metabolism and macromolecular Expression

    DEFF Research Database (Denmark)

    Ma, Ding; Yang, Laurence; Fleming, Ronan M. T.

    2017-01-01

    Constraint-Based Reconstruction and Analysis (COBRA) is currently the only methodology that permits integrated modeling of Metabolism and macromolecular Expression (ME) at genome-scale. Linear optimization computes steady-state flux solutions to ME models, but flux values are spread over many...

  2. Genome-scale cold stress response regulatory networks in ten Arabidopsis thaliana ecotypes

    DEFF Research Database (Denmark)

    Barah, Pankaj; Jayavelu, Naresh Doni; Rasmussen, Simon

    2013-01-01

    BACKGROUND: Low temperature leads to major crop losses every year. Although several studies have been conducted focusing on diversity of cold tolerance level in multiple phenotypically divergent Arabidopsis thaliana (A. thaliana) ecotypes, genome-scale molecular understanding is still lacking. RE...

  3. Flux balance analysis of genome-scale metabolic model of rice ...

    Indian Academy of Sciences (India)

    Here, we analyse a genome-scale metabolic model of rice leaf using Flux Balance Analysis to investigate whether it has potential metabolic flexibility to increase the biosynthesis of any of the biomass components. We initially simulate the metabolic responses under an objective to maximize the biomass components.

  4. Genome-scale CRISPR-Cas9 knockout screening in human cells

    National Research Council Canada - National Science Library

    Shalem, Ophir; Sanjana, Neville E; Hartenian, Ella; Shi, Xi; Scott, David A; Mikkelsen, Tarjei S; Heckl, Dirk; Ebert, Benjamin L; Root, David E; Doench, John G; Zhang, Feng

    .... We show that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells...
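In a pooled knockout screen such as GeCKO (64,751 guides for 18,080 genes, so several guides per gene), negative selection shows up as guides depleted after selection and positive selection as guides enriched. A minimal sketch of one common way to turn guide read counts into gene-level scores follows: normalize to reads per million, take log2 fold changes with a pseudocount, and summarize each gene by the median over its guides. The normalization, pseudocount, median aggregation, and all counts and names are assumptions, not necessarily the published analysis.

```python
import math
from statistics import median


def guide_log2fc(before, after, pseudocount=0.5):
    """Per-guide log2 fold change between two samples of read counts.

    Each sample is first scaled to reads per million (RPM) so that
    different sequencing depths are comparable.
    """
    total_before, total_after = sum(before.values()), sum(after.values())
    lfc = {}
    for guide, count in before.items():
        rpm_before = count / total_before * 1e6
        rpm_after = after.get(guide, 0) / total_after * 1e6
        lfc[guide] = math.log2((rpm_after + pseudocount) / (rpm_before + pseudocount))
    return lfc


def gene_scores(lfc, guide_to_gene):
    """Median guide-level log2 fold change for each gene."""
    per_gene = {}
    for guide, value in lfc.items():
        per_gene.setdefault(guide_to_gene[guide], []).append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


before = {"g1_a": 100, "g1_b": 100, "g2_a": 100, "g2_b": 100}  # made-up counts
after = {"g1_a": 10, "g1_b": 10, "g2_a": 190, "g2_b": 190}
guide_to_gene = {"g1_a": "GENE1", "g1_b": "GENE1", "g2_a": "GENE2", "g2_b": "GENE2"}
print(gene_scores(guide_log2fc(before, after), guide_to_gene))
# GENE1 strongly depleted (about -3.3), GENE2 enriched (about +0.93)
```

Using the median rather than the mean limits the influence of a single guide with unusual activity.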

  5. Comparative genome-scale metabolic modeling of actinomycetes : The topology of essential core metabolism

    NARCIS (Netherlands)

    Alam, Mohammad Tauqeer; Medema, Marnix H.; Takano, Eriko; Breitling, Rainer; Gojobori, Takashi

    2011-01-01

    Actinomycetes are highly important bacteria. On the one hand, some of them cause severe human and plant diseases; on the other hand, many species are known for their ability to produce antibiotics. Here we report the results of a comparative analysis of genome-scale metabolic models of 37 species of

  6. Basic and applied uses of genome-scale metabolic network reconstructions of Escherichia coli

    DEFF Research Database (Denmark)

    McCloskey, Douglas; Palsson, Bernhard; Feist, Adam

    2013-01-01

    The genome-scale model (GEM) of metabolism in the bacterium Escherichia coli K-12 has been in development for over a decade and is now in wide use. GEM-enabled studies of E. coli have been primarily focused on six applications: (1) metabolic engineering, (2) model-driven discovery, (3) prediction...

  7. MultiMetEval : Comparative and Multi-Objective Analysis of Genome-Scale Metabolic Models

    NARCIS (Netherlands)

    Zakrzewski, Piotr; Medema, Marnix H.; Gevorgyan, Albert; Kierzek, Andrzej M.; Breitling, Rainer; Takano, Eriko; Fong, Stephen S.

    2012-01-01

    Comparative metabolic modelling is emerging as a novel field, supported by the development of reliable and standardized approaches for constructing genome-scale metabolic models in high throughput. New software solutions are needed to allow efficient comparative analysis of multiple models in the

  8. Genome Sequencing

    DEFF Research Database (Denmark)

    Sato, Shusei; Andersen, Stig Uggerhøj

    2014-01-01

    The current Lotus japonicus reference genome sequence is based on a hybrid assembly of Sanger TAC/BAC, Sanger shotgun and Illumina shotgun sequencing data generated from the Miyakojima-MG20 accession. It covers nearly all expressed L. japonicus genes and has been annotated mainly based...... on transcriptional evidence. Analysis of repetitive sequences suggests that they are underrepresented in the reference assembly, reflecting an enrichment of gene-rich regions in the current assembly. Characterization of Lotus natural variation by resequencing of L. japonicus accessions and diploid Lotus species...... is currently ongoing, facilitated by the MG20 reference sequence...

  9. Rare UNC13B variations and risk of schizophrenia: Whole-exome sequencing in a multiplex family and follow-up resequencing and a case-control study.

    Science.gov (United States)

    Egawa, Jun; Hoya, Satoshi; Watanabe, Yuichiro; Nunokawa, Ayako; Shibuya, Masako; Ikeda, Masashi; Inoue, Emiko; Okuda, Shujiro; Kondo, Kenji; Saito, Takeo; Kaneko, Naoshi; Muratake, Tatsuyuki; Igeta, Hirofumi; Iwata, Nakao; Someya, Toshiyuki

    2016-09-01

    Rare genomic variations inherited in multiplex schizophrenia families are suggested to play a role in the genetic etiology of the disease. To identify rare variations with large effects on the risk of developing schizophrenia, we performed whole-exome sequencing (WES) in two affected and one unaffected individual of a multiplex family with 10 affected individuals. We also performed follow-up resequencing of the unc-13 homolog B (Caenorhabditis elegans) (UNC13B) gene, a potential risk gene identified by WES, in the multiplex family and undertook a case-control study to investigate association between UNC13B and schizophrenia. UNC13B coding regions (39 exons) from 15 individuals of the multiplex family and 111 affected offspring for whom parental DNA samples were available were resequenced. Rare missense UNC13B variations identified by resequencing were further tested for association with schizophrenia in two independent case-control populations comprising a total of 1,753 patients and 1,602 controls. A rare missense variation (V1525M) in UNC13B was identified by WES in the multiplex family; this variation was present in five of six affected individuals, but not in eight unaffected individuals or one individual of unknown disease status. Resequencing UNC13B coding regions identified five rare missense variations (T103M, M813T, P1349T, I1362T, and V1525M). In the case-control study, there was no significant association between rare missense UNC13B variations and schizophrenia, although single-variant meta-analysis indicated that M813T was nominally associated with schizophrenia. These results do not support a contribution of rare missense UNC13B variations to the genetic etiology of schizophrenia in the Japanese population. © 2016 Wiley Periodicals, Inc.
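Testing rare variants for case-control association often comes down to a 2x2 table of carriers and non-carriers in cases and controls. With such small carrier counts, Fisher's exact test is a natural choice. A minimal sketch of the two-sided test follows. The study's own analysis also included a meta-analysis across populations, which this does not reproduce, and the table in the example is made up.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With row and column totals fixed, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Small relative tolerance so ties with the observed table are not lost to rounding.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Made-up table: rows = cases / controls, columns = carriers / non-carriers.
print(fisher_exact_two_sided(3, 1, 1, 3))  # about 0.486 (exactly 34/70)
```

Exact probabilities avoid the chi-square approximation, which is unreliable when expected cell counts are as small as they are for rare variants.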

  10. Survey of protein–DNA interactions in Aspergillus oryzae on a genomic scale

    Science.gov (United States)

    Wang, Chao; Lv, Yangyong; Wang, Bin; Yin, Chao; Lin, Ying; Pan, Li

    2015-01-01

    The genome-scale delineation of in vivo protein–DNA interactions is key to understanding genome function. Only ∼5% of transcription factors (TFs) in the Aspergillus genus have been identified using traditional methods. Although the Aspergillus oryzae genome contains >600 TFs, knowledge of the in vivo genome-wide TF-binding sites (TFBSs) in aspergilli remains limited because of the lack of high-quality antibodies. We investigated the landscape of in vivo protein–DNA interactions across the A. oryzae genome by coupling DNase I digestion of intact nuclei with massively parallel sequencing and the analysis of cleavage patterns in protein–DNA interactions at single-nucleotide resolution. The resulting map identified overrepresented de novo TF-binding motifs from genomic footprints, and provided detailed chromatin remodeling patterns and the distribution of digital footprints near transcription start sites. The TFBSs of 19 known Aspergillus TFs were also identified based on DNase I digestion data surrounding potential binding sites in conjunction with TF binding specificity information. We observed that the cleavage patterns of TFBSs were dependent on the orientation of TF motifs and independent of strand orientation, consistent with the DNA shape features of binding motifs with flanking sequences. PMID:25883143
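A digital footprint is a short stretch where a bound protein shields DNA from DNase I, so per-base cleavage counts dip relative to the flanks. A minimal sketch of scoring a candidate site by that depletion follows. The score (C + 1)/(L + 1) + (C + 1)/(R + 1), built from mean cleavage inside the site (C) and in the left and right flanks (L, R), is an illustrative choice in the spirit of published footprint occupancy scores and is not the method used in this study. The flank width and counts are made up.

```python
def footprint_score(cuts, start, end, flank=10):
    """Depletion score for a candidate footprint at cuts[start:end].

    cuts: per-base DNase I cleavage counts along a region.
    Compares mean cleavage inside the site (C) with the mean in the left (L)
    and right (R) flanks: (C + 1) / (L + 1) + (C + 1) / (R + 1).
    Lower values mean stronger protection; +1 avoids division by zero.
    """
    def mean(lo, hi):
        window = cuts[max(lo, 0):max(hi, 0)]
        return sum(window) / len(window) if window else 0.0

    center = mean(start, end)
    left = mean(start - flank, start)
    right = mean(end, end + flank)
    return (center + 1) / (left + 1) + (center + 1) / (right + 1)


protected = [10] * 10 + [0] * 6 + [10] * 10  # cleavage dips over a bound site
unprotected = [10] * 26
print(footprint_score(protected, 10, 16))    # about 0.18
print(footprint_score(unprotected, 10, 16))  # 2.0
```

Uniform cleavage gives a score of 2, so values well below 2 flag candidate protein-bound sites.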

  11. Integration of expression data in genome-scale metabolic network reconstructions

    Directory of Open Access Journals (Sweden)

    Anna S. Blazier

    2012-08-01

    Full Text Available With the advent of high-throughput technologies, the field of systems biology has amassed an abundance of omics data, quantifying thousands of cellular components across a variety of scales, ranging from mRNA transcript levels to metabolite quantities. Methods are needed not only to integrate these omics data but also to use them to improve the predictive capabilities of computational models. Several recent studies have successfully demonstrated how flux balance analysis (FBA), a constraint-based modeling approach, can be used to integrate transcriptomic data into genome-scale metabolic network reconstructions to generate predictive computational models. In this review, we summarize such FBA-based methods for integrating expression data into genome-scale metabolic network reconstructions, highlighting their advantages as well as their limitations.
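
    As a concrete illustration of expression-to-model integration, one method in this family, E-Flux, constrains each reaction's upper flux bound in proportion to the expression of the genes that catalyse it. The minimal sketch below assumes one common convention for gene-reaction rules (the minimum over subunits of a complex, the sum over isozymes); the rule format, function names and all numbers are hypothetical, and the resulting bounds would still have to be passed to an FBA solver, which is not shown.

```python
"""E-Flux-style mapping of expression onto flux bounds (illustrative sketch).

Gene-reaction rules are nested tuples: a gene name, ("and", ...) for enzyme
complexes or ("or", ...) for isozymes. All names and values are hypothetical.
"""


def reaction_expression(rule, expression):
    """Collapse a gene-reaction rule into one expression value per reaction."""
    if isinstance(rule, str):
        return expression.get(rule, 0.0)  # unmeasured genes count as 0
    op, *children = rule
    values = [reaction_expression(child, expression) for child in children]
    if op == "and":  # a complex is limited by its least expressed subunit
        return min(values)
    if op == "or":  # isozymes can substitute for each other, so add them
        return sum(values)
    raise ValueError(f"unknown operator: {op!r}")


def eflux_upper_bounds(rules, expression, max_bound=1000.0):
    """Scale each reaction's upper bound by its expression relative to the maximum."""
    raw = {rxn: reaction_expression(rule, expression) for rxn, rule in rules.items()}
    top = max(raw.values())
    if top <= 0:
        raise ValueError("no reaction has positive expression")
    return {rxn: max_bound * value / top for rxn, value in raw.items()}


if __name__ == "__main__":
    rules = {"R1": "geneA", "R2": ("and", "geneB", "geneC"), "R3": ("or", "geneD", "geneE")}
    expression = {"geneA": 50.0, "geneB": 20.0, "geneC": 80.0, "geneD": 30.0, "geneE": 70.0}
    print(eflux_upper_bounds(rules, expression))
    # {'R1': 500.0, 'R2': 200.0, 'R3': 1000.0}
```

    Relative rather than absolute expression is used here so that bounds stay on the model's flux scale; other methods reviewed in this area instead switch reactions off below an expression threshold.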

  12. The use of weighted graphs for large-scale genome analysis.

    Directory of Open Access Journals (Sweden)

    Fang Zhou

    Full Text Available There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution.
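
    The abstract does not define how the weights are computed, so the following is only a sketch under an assumption: a "taxonomic" weight is taken to be the number of genomes in which an enzyme-enzyme edge occurs. It shows why a single merged graph scales better than pair-wise comparison: its size grows with the number of distinct edges, not with the number of genome pairs. The EC numbers and genome ids are illustrative.

```python
from collections import Counter


def taxonomic_weighted_graph(genome_networks):
    """Merge per-genome enzyme networks into one weighted graph.

    genome_networks maps a genome id to a set of undirected edges, each a pair
    of enzyme identifiers (e.g. EC numbers). The result maps each edge to the
    number of genomes in which it occurs.
    """
    weights = Counter()
    for edges in genome_networks.values():
        # sort each pair so (a, b) and (b, a) are the same edge; the set
        # ensures a genome contributes at most 1 to any edge weight
        weights.update({tuple(sorted(edge)) for edge in edges})
    return weights


def edge_prevalence(weights, n_genomes):
    """Fraction of genomes carrying each edge, a simple 'importance' score."""
    return {edge: count / n_genomes for edge, count in weights.items()}


if __name__ == "__main__":
    genomes = {
        "g1": {("1.1.1.1", "2.7.1.1"), ("2.7.1.1", "5.3.1.9")},
        "g2": {("2.7.1.1", "1.1.1.1")},
        "g3": {("2.7.1.1", "5.3.1.9"), ("4.1.2.13", "5.3.1.9")},
    }
    graph = taxonomic_weighted_graph(genomes)
    print(edge_prevalence(graph, len(genomes)))
```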

  13. Toward the automated generation of genome-scale metabolic networks in the SEED

    Directory of Open Access Journals (Sweden)

    Gould John

    2007-04-01

    Full Text Available Abstract Background Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process. Results We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative
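
    Gap identification, the step the abstract says draft pipelines neglect, is often started by listing dead-end metabolites: compounds that no reaction can produce or no reaction can consume. The sketch below is a generic version of that check, not the SEED's component-based method; the reaction format and the toy network are hypothetical.

```python
def find_dead_end_metabolites(reactions):
    """Flag metabolites that can never be produced or never be consumed.

    reactions maps a reaction id to (substrates, products, reversible).
    Boundary metabolites need explicit exchange reactions, otherwise they
    are reported as gaps.
    """
    produced, consumed = set(), set()
    for substrates, products, reversible in reactions.values():
        consumed.update(substrates)
        produced.update(products)
        if reversible:  # a reversible reaction can run in both directions
            produced.update(substrates)
            consumed.update(products)
    metabolites = produced | consumed
    return {
        "never_produced": sorted(metabolites - produced),
        "never_consumed": sorted(metabolites - consumed),
    }


if __name__ == "__main__":
    toy = {
        "EX_A": ((), ("A",), False),    # uptake of A
        "R1": (("A",), ("B",), False),
        "R2": (("B",), ("C",), True),
        "R3": (("C",), ("D",), False),  # D is never used: a gap
        "R4": (("E",), ("C",), False),  # E is never made: a gap
    }
    print(find_dead_end_metabolites(toy))
    # {'never_produced': ['E'], 'never_consumed': ['D']}
```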

  14. Analysis of Genes Involved in Body Weight Regulation by Targeted Re-Sequencing.

    Directory of Open Access Journals (Sweden)

    Anna-Lena Volckmar

    Full Text Available Genes involved in body weight regulation that were previously investigated in genome-wide association studies (GWAS) and in animal models were target-enriched, followed by massively parallel next-generation sequencing. We enriched and re-sequenced continuous genomic regions comprising FTO, MC4R, TMEM18, SDCCAG8, TNKS, MSRA and TBC1D1 in a screening sample of 196 extremely obese children and adolescents with age- and sex-specific body mass index (BMI) ≥ 99th percentile and 176 lean adults (BMI ≤ 15th percentile). Twenty-two variants were confirmed by Sanger sequencing. Genotyping was performed in up to 705 independent obesity trios (extremely obese child and both parents), 243 extremely obese cases and 261 lean adults. We detected 20 different non-synonymous variants, one frameshift and one nonsense mutation in the 7 continuous genomic regions in study groups of different weight extremes. For SNP Arg695Cys (rs58983546) in TBC1D1 we detected nominal association with obesity (pTDT = 0.03 in 705 trios). Eleven of the variants were rare and were detected only in the heterozygous state, in no more than ten individuals of the complete screening sample of 372 individuals. Two of them (in FTO and MSRA) were found in lean individuals, nine in extremely obese individuals. In silico analyses of the 11 variants did not reveal functional implications for the mutations. Consistent with our hypothesis, we detected a rare variant that potentially leads to loss of FTO function in a lean individual. For TBC1D1, contrary to our hypothesis, the loss-of-function variant (Arg443Stop) was found in an obese individual. Functional in vitro studies are warranted.
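
    The pTDT value above comes from a transmission disequilibrium test in trios. The sketch below computes the standard TDT statistic, (b − c)²/(b + c) with one degree of freedom, where b and c count heterozygous parents who did or did not transmit the allele to the affected child. The counts in the example are hypothetical, not the study's data.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test (McNemar-type) for one allele.

    transmitted: heterozygous parents who passed the allele to the affected child (b)
    untransmitted: heterozygous parents who did not (c)
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = tdt(60, 40)  # hypothetical transmission counts
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # chi2 = 4.00, p = 0.0455
```

    Only heterozygous parents are informative, which is why rare variants seen in a handful of carriers rarely reach significance in this test.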

  15. In Silico Genome-Scale Reconstruction and Validation of the Corynebacterium glutamicum Metabolic Network

    DEFF Research Database (Denmark)

    Kjeldsen, Kjeld Raunkjær; Nielsen, J.

    2009-01-01

    A genome-scale metabolic model of the Gram-positive bacterium Corynebacterium glutamicum ATCC 13032 was constructed comprising 446 reactions and 411 metabolites, based on the annotated genome and available biochemical information. The network was analyzed using constraint-based methods. The model...... was extensively validated against published flux data, and flux distribution values were found to correlate well between simulations and experiments. The split lysine synthesis pathway of C. glutamicum was investigated, and it was found that the direct dehydrogenase variant gave a higher lysine...

  16. Systematic planning of genome-scale experiments in poorly studied species.

    Directory of Open Access Journals (Sweden)

    Yuanfang Guan

    2010-03-01

    Full Text Available Genome-scale datasets have been used extensively in model organisms to screen for specific candidates or to predict functions for uncharacterized genes. However, despite the availability of extensive knowledge in model organisms, the planning of genome-scale experiments in poorly studied species is still based on the intuition of experts or heuristic trials. We propose that computational and systematic approaches can be applied to drive the experiment planning process in poorly studied species based on available data and knowledge in closely related model organisms. In this paper, we suggest a computational strategy for recommending genome-scale experiments based on their capability to interrogate diverse biological processes to enable protein function assignment. To this end, we use the data-rich functional genomics compendium of the model organism to quantify the accuracy of each dataset in predicting each specific biological process and the overlap in such coverage between different datasets. Our approach uses an optimized combination of these quantifications to recommend an ordered list of experiments for accurately annotating most proteins in the poorly studied related organisms to most biological processes, as well as a set of experiments that target each specific biological process. The effectiveness of this experiment-planning system is demonstrated for two related yeast species: the model organism Saccharomyces cerevisiae and the comparatively poorly studied Saccharomyces bayanus. Our system recommended a set of S. bayanus experiments based on an S. cerevisiae microarray data compendium. In silico evaluations estimate that fewer than 10% of the experiments could achieve functional coverage similar to that of the whole microarray compendium. This estimation was confirmed by performing the recommended experiments in S. bayanus, therefore significantly reducing the labor devoted to characterizing the poorly studied genome. This experiment
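
    The abstract describes an optimized combination of per-dataset accuracy and coverage overlap that yields an ordered list of experiments. A simpler stand-in with the same flavour is greedy set cover, sketched below: at each step pick the dataset that adds the most not-yet-covered processes. This is a swapped-in technique, not the authors' optimization, and the dataset and process names are hypothetical.

```python
def recommend_experiments(coverage, max_experiments=None):
    """Greedily order datasets by how many still-uncovered processes they add.

    coverage maps a dataset name to the set of biological processes it predicts
    accurately (estimated, per the abstract, from the model organism's
    compendium; supplied directly here).
    """
    remaining = set().union(*coverage.values())
    candidates = dict(coverage)
    chosen = []
    while remaining and candidates:
        if max_experiments is not None and len(chosen) >= max_experiments:
            break
        # largest marginal gain first; ties broken alphabetically for determinism
        best = min(candidates, key=lambda d: (-len(candidates[d] & remaining), d))
        gain = candidates.pop(best) & remaining
        if not gain:
            break
        chosen.append(best)
        remaining -= gain
    return chosen


if __name__ == "__main__":
    coverage = {
        "heat_shock": {"stress", "protein folding"},
        "cell_cycle": {"cell cycle", "DNA replication"},
        "stress_panel": {"stress", "protein folding", "DNA repair"},
        "timecourse": {"cell cycle", "DNA repair"},
    }
    print(recommend_experiments(coverage))  # ['stress_panel', 'cell_cycle']
```

    Greedy selection does not account for prediction accuracy per process, which the authors' weighting does; it only illustrates how overlap between datasets shrinks the list of experiments needed.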

  17. The genome sequence of E. coli W (ATCC 9637): comparative genome analysis and an improved genome-scale reconstruction of E. coli

    Directory of Open Access Journals (Sweden)

    Lee Sang

    2011-01-01

    Full Text Available Abstract Background Escherichia coli is a model prokaryote, an important pathogen, and a key organism for industrial biotechnology. E. coli W (ATCC 9637), one of four strains designated as safe for laboratory purposes, has not been sequenced. E. coli W is a fast-growing strain and is the only safe strain that can utilize sucrose as a carbon source. Lifecycle analysis has demonstrated that sucrose from sugarcane is a preferred carbon source for industrial bioprocesses. Results We have sequenced and annotated the genome of E. coli W. The chromosome is 4,900,968 bp and encodes 4,764 ORFs. Two plasmids, pRK1 (102,536 bp) and pRK2 (5,360 bp), are also present. W has unique features relative to other sequenced laboratory strains (K-12, B and Crooks): it has a larger genome and belongs to phylogroup B1 rather than A. W also grows on a much broader range of carbon sources than does K-12. A genome-scale reconstruction was developed and validated in order to interrogate metabolic properties. Conclusions The genome of W is more similar to commensal and pathogenic B1 strains than phylogroup A strains, and therefore has greater utility for comparative analyses with these strains. W should therefore be the strain of choice, or 'type strain' for group B1 comparative analyses. The genome annotation and tools created here are expected to allow further utilization and development of E. coli W as an industrial organism for sucrose-based bioprocesses. Refinements in our E. coli metabolic reconstruction allow it to more accurately define E. coli metabolism relative to previous models.

  18. Three-stage quality control strategies for DNA re-sequencing data.

    Science.gov (United States)

    Guo, Yan; Ye, Fei; Sheng, Quanghu; Clark, Travis; Samuels, David C

    2014-11-01

    Advances in next-generation sequencing (NGS) technologies have greatly improved our ability to detect genomic variants for biomedical research. In particular, NGS technologies have recently been applied with great success to the discovery of mutations associated with the growth of various tumours and with rare Mendelian diseases. These advances have also created significant challenges in bioinformatics. One of the major challenges is quality control of the sequencing data. In this review, we discuss the proper quality control procedures and parameters for Illumina technology-based human DNA re-sequencing at three different stages of sequencing: raw data, alignment and variant calling. Monitoring quality control metrics at each of the three stages of NGS data provides unique and independent evaluations of data quality from differing perspectives. Properly conducting quality control protocols at all three stages and correctly interpreting the quality control results are crucial to ensure a successful and meaningful study. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
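
    To make the three stages concrete, the sketch below computes one commonly used metric per stage: mean Phred base quality from a FASTQ quality string (assuming the Phred+33 encoding of current Illumina output), the mapping rate from SAM FLAG values, and the transition/transversion (Ti/Tv) ratio of called SNVs. These are illustrative choices, not the review's full checklist. For human whole-genome data, Ti/Tv is often quoted at roughly 2.0 to 2.1, with higher values in exomes, so a much lower ratio can hint at false-positive calls.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def mean_base_quality(quality_string, offset=33):
    """Stage 1 (raw data): mean Phred score of one FASTQ quality string (Phred+33)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def mapping_rate(sam_flags):
    """Stage 2 (alignment): fraction of primary records that are mapped.

    Secondary (0x100) and supplementary (0x800) records are skipped so each
    read is counted once; 0x4 marks an unmapped read.
    """
    primary = [flag for flag in sam_flags if not flag & 0x900]
    mapped = sum(1 for flag in primary if not flag & 0x4)
    return mapped / len(primary)


def ti_tv_ratio(snvs):
    """Stage 3 (variant calling): transitions / transversions for (ref, alt) SNVs."""
    ti = sum(1 for ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


if __name__ == "__main__":
    print(mean_base_quality("IIII#"))                                    # 32.4
    print(mapping_rate([0, 16, 4, 256, 2048, 99]))                        # 0.75
    print(ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "A")]))  # 3.0
```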

  19. Using a resequencing microarray as a multiple respiratory pathogen detection assay.

    Science.gov (United States)

    Lin, Baochuan; Blaney, Kate M; Malanoski, Anthony P; Ligler, Adam G; Schnur, Joel M; Metzgar, David; Russell, Kevin L; Stenger, David A

    2007-02-01

    Simultaneous testing for detection of infectious pathogens that cause similar symptoms (e.g., acute respiratory infections) is invaluable for patient treatment, outbreak prevention, and efficient use of antibiotic and antiviral agents. In addition, such testing may provide information regarding possible coinfections or induced secondary infections, such as virally induced bacterial infections. Furthermore, in many cases, detection of a pathogen requires more than genus/species-level resolution, since harmful agents (e.g., avian influenza virus) are grouped with other, relatively benign common agents, and for every pathogen, finer resolution is useful to allow tracking of the location and nature of mutations leading to strain variations. In this study, a previously developed resequencing microarray that has been demonstrated to have these capabilities was further developed to provide individual detection sensitivity ranging from 10¹ to 10³ genomic copies for more than 26 respiratory pathogens while still retaining the ability to detect and differentiate between close genetic neighbors. In addition, the study demonstrated that this system allows unambiguous and reproducible sequence-based strain identification of the mixed pathogens. Successful proof-of-concept experiments using clinical specimens show that this approach is potentially very useful for both diagnostics and epidemic surveillance.

  20. PUGS: A novel scale to assess perceptions of uncertainties in genome sequencing.

    Science.gov (United States)

    Biesecker, B B; Woolford, S W; Klein, W M P; Brothers, K B; Umstead, K L; Lewis, K L; Biesecker, L G; Han, P K J

    2017-08-01

    Expectations of results from genome sequencing by end users are influenced by perceptions of uncertainty. This study aimed to assess uncertainties about sequencing by developing, evaluating, and implementing a novel scale. The Perceptions of Uncertainties in Genome Sequencing (PUGS) scale comprised ten items to assess uncertainties within three domains: clinical, affective, and evaluative. Participants (n=535) from the ClinSeq® NIH sequencing study completed a baseline survey that included the PUGS; responses (mean = 3.4/5, SD=0.58) suggested modest perceptions of certainty. A confirmatory factor analysis identified factor loadings that led to elimination of two items. A revised eight-item PUGS scale was used to test correlations with perceived ambiguity (r = -0.303, p PUGS (mean = 3.4/5, SD = 0.72), and configural invariance was supported across the two datasets. As such, the PUGS is a promising scale for evaluating perceived uncertainties in genome sequencing, which can inform interventions to help patients form realistic expectations of these uncertainties. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  1. Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains

    Science.gov (United States)

    Salipante, Stephen J.; Roach, David J.; Kitzman, Jacob O.; Snyder, Matthew W.; Stackhouse, Bethany; Butler-Wu, Susan M.; Lee, Choli; Cookson, Brad T.

    2015-01-01

    Large-scale bacterial genome sequencing efforts to date have provided limited information on the most prevalent category of disease: sporadically acquired infections caused by common pathogenic bacteria. Here, we performed whole-genome sequencing and de novo assembly of 312 blood- or urine-derived isolates of extraintestinal pathogenic (ExPEC) Escherichia coli, a common agent of sepsis and community-acquired urinary tract infections, obtained during the course of routine clinical care at a single institution. We find that ExPEC E. coli are highly genomically heterogeneous, consistent with pan-genome analyses encompassing the larger species. Investigation of differential virulence factor content and antibiotic resistance phenotypes reveals markedly different profiles among lineages and among strains infecting different body sites. We use high-resolution molecular epidemiology to explore the dynamics of infections at the level of individual patients, including identification of possible person-to-person transmission. Notably, a limited number of discrete lineages caused the majority of bloodstream infections, including one subclone (ST131-H30) responsible for 28% of bacteremic E. coli infections over a 3-yr period. We additionally use a microbial genome-wide-association study (GWAS) approach to identify individual genes responsible for antibiotic resistance, successfully recovering known genes but notably not identifying any novel factors. We anticipate that in the near future, whole-genome sequencing of microorganisms associated with clinical disease will become routine. Our study reveals what kind of information can be obtained from sequencing clinical isolates on a large scale, even for well-characterized organisms such as E. coli, and provides insight into how this information might be utilized in a healthcare setting. PMID:25373147
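
    A basic building block of a gene-level microbial GWAS is a per-gene test of whether gene presence is associated with a resistance phenotype. The sketch below implements a two-sided Fisher exact test on a 2×2 table using only the standard library; the counts are hypothetical, and the study's own GWAS approach is not reproduced. With thousands of genes tested, and with the strong lineage structure described above, a real analysis would also need multiple-testing correction and some control for population structure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: gene present / absent; columns: resistant / susceptible isolates.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # sum over all tables no more probable than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # hypothetical counts for one gene across 20 isolates
    print(fisher_exact_two_sided(8, 2, 1, 9))
```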

  2. iAK692: a genome-scale metabolic model of Spirulina platensis C1.

    Science.gov (United States)

    Klanchui, Amornpan; Khannapho, Chiraphan; Phodee, Atchara; Cheevadhanarak, Supapon; Meechai, Asawin

    2012-06-15

    Spirulina (Arthrospira) platensis is a well-known filamentous cyanobacterium used in the production of many industrial products, including high-value compounds, healthy food supplements, animal feeds, pharmaceuticals and cosmetics. It has been increasingly studied around the world for scientific purposes, especially its genome, biology and physiology, as well as its small-scale metabolic network. However, an overall description of the metabolic and biotechnological capabilities of S. platensis requires the development of a whole-cell metabolic model. Recently, the S. platensis C1 (Arthrospira sp. PCC9438) genome sequence has become available, allowing systems-level studies of this commercial cyanobacterium. In this work, we present the genome-scale metabolic network analysis of S. platensis C1, iAK692, its topological properties, and its metabolic capabilities and functions. The network was reconstructed from the S. platensis C1 annotated genomic sequence using Pathway Tools software to generate a preliminary network. Then, manual curation was performed based on a collective knowledge base and a combination of genomic, biochemical, and physiological information. The genome-scale metabolic model consists of 692 genes, 837 metabolites, and 875 reactions. We validated iAK692 by conducting fermentation experiments and simulating the model under autotrophic, heterotrophic, and mixotrophic growth conditions using the COBRA Toolbox. The model predictions under these growth conditions were consistent with the experimental results. The iAK692 model was further used to predict the unique active reactions and essential genes for each growth condition. Additionally, the metabolic states of iAK692 during autotrophic and mixotrophic growth were described by phenotypic phase plane (PhPP) analysis. This study proposes the first genome-scale model of S. platensis C1, iAK692, which is a predictive metabolic platform for a global understanding of

  3. iAK692: A genome-scale metabolic model of Spirulina platensis C1

    Directory of Open Access Journals (Sweden)

    Klanchui Amornpan

    2012-06-01

    Full Text Available Abstract Background Spirulina (Arthrospira) platensis is a well-known filamentous cyanobacterium used in the production of many industrial products, including high-value compounds, healthy food supplements, animal feeds, pharmaceuticals and cosmetics. It has been increasingly studied around the world for scientific purposes, especially its genome, biology and physiology, as well as its small-scale metabolic network. However, an overall description of the metabolic and biotechnological capabilities of S. platensis requires the development of a whole-cell metabolic model. Recently, the S. platensis C1 (Arthrospira sp. PCC9438) genome sequence has become available, allowing systems-level studies of this commercial cyanobacterium. Results In this work, we present the genome-scale metabolic network analysis of S. platensis C1, iAK692, its topological properties, and its metabolic capabilities and functions. The network was reconstructed from the S. platensis C1 annotated genomic sequence using Pathway Tools software to generate a preliminary network. Then, manual curation was performed based on a collective knowledge base and a combination of genomic, biochemical, and physiological information. The genome-scale metabolic model consists of 692 genes, 837 metabolites, and 875 reactions. We validated iAK692 by conducting fermentation experiments and simulating the model under autotrophic, heterotrophic, and mixotrophic growth conditions using the COBRA Toolbox. The model predictions under these growth conditions were consistent with the experimental results. The iAK692 model was further used to predict the unique active reactions and essential genes for each growth condition. Additionally, the metabolic states of iAK692 during autotrophic and mixotrophic growth were described by phenotypic phase plane (PhPP) analysis. Conclusions This study proposes the first genome-scale model of S. platensis C1, iAK692, which is a

  4. PopGeV: a web-based large-scale population genome browser.

    Science.gov (United States)

    Shi, Xinyi; Peng, Jing; Yu, Xiaohan; Zhang, Xiaohong; Li, Dongye; Liu, Baohui; Kong, Fanjiang; Yuan, Xiaohui

    2015-09-15

    The development of high-throughput sequencing technology has made it possible for more and more researchers to use population sequencing data to mine genes associated with specific traits. However, the massive amounts of sequencing data have also brought new challenges to researchers. The question of how to browse population genomic data in an easy and intuitive manner must be addressed. Web-based genome browsers allow users to conveniently view the results of genomic analyses, but heavy usage can reduce the response speed of the webpage, which limits their usefulness in the display of large-scale genome data. IndexedDB technology is a good solution to this problem: it is supported by web browsers and allows them to create local databases. In this way, data can be read from local storage, achieving a smooth display of population genomic data. PopGeV has the following characteristics. First, it uses a new encoding method for compression of population SNP and INDEL data. IndexedDB technology is used to download the results to local storage so that users can browse the results smoothly even when network traffic is heavy. Second, PopGeV identifies similar genomic regions between two individuals based on SNP data. Population diversity indexes are calculated when comparing two populations. Third, user-defined annotation information can be integrated for user-friendly mining of gene functions. Simulation shows that PopGeV can smoothly display analysis results for a population genome dataset containing over 500 individuals with 2 million SNPs. PopGeV is available at www.soyomics.com/popgev/ (contact: yuanxh@iga.ac.cn). © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
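
    The abstract does not describe PopGeV's encoding, so the sketch below shows a generic alternative under stated assumptions: biallelic genotypes packed at 2 bits each (four per byte), plus an identity-by-state (IBS) score as one simple way to compare two individuals from SNP data. At 2 bits per genotype, 500 individuals × 2 million SNPs would occupy about 250 MB before any further compression. The code table and function names are hypothetical.

```python
CODES = {"0/0": 0, "0/1": 1, "1/1": 2, "./.": 3}  # 2 bits per genotype (hypothetical)
DECODE = {code: gt for gt, code in CODES.items()}


def pack_genotypes(genotypes):
    """Pack biallelic genotype strings four to a byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, gt in enumerate(genotypes):
        packed[i // 4] |= CODES[gt] << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Inverse of pack_genotypes for the first n genotypes."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


def ibs_similarity(g1, g2):
    """Identity-by-state similarity of two individuals (0 to 1), skipping missing calls."""
    shared = possible = 0
    for a, b in zip(g1, g2):
        if "./." in (a, b):
            continue
        # alternate-allele dosages 0, 1, 2: shared alleles = 2 - |difference|
        shared += 2 - abs(CODES[a] - CODES[b])
        possible += 2
    return shared / possible if possible else float("nan")


if __name__ == "__main__":
    row = ["0/0", "0/1", "1/1", "./.", "0/1"]
    packed = pack_genotypes(row)
    print(len(packed), unpack_genotypes(packed, len(row)) == row)  # 2 True
    print(ibs_similarity(["0/0", "0/1", "1/1", "./."], ["0/1", "0/1", "0/0", "0/0"]))  # 0.5
```

    Computing IBS in sliding windows along a chromosome would be one way to highlight regions where two individuals are similar.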

  5. Investigating host-pathogen behavior and their interaction using genome-scale metabolic network models.

    Science.gov (United States)

    Sadhukhan, Priyanka P; Raghunathan, Anu

    2014-01-01

    Genome-scale metabolic modeling methods represent one way to compute whole-cell function starting from the genome sequence of an organism and contribute towards understanding and predicting the genotype-phenotype relationship. About 80 models spanning all the kingdoms of life from archaea to eukaryotes have been built to date and used to interrogate cell phenotype under varying conditions. These models have been used not only to understand the flux distribution in evolutionarily conserved pathways like glycolysis and the Krebs cycle but also in applications ranging from value-added product formation in Escherichia coli to predicting inborn errors of Homo sapiens metabolism. This chapter describes a protocol that delineates the process of genome-scale metabolic modeling for analysing host-pathogen behavior and interaction using flux balance analysis (FBA). The steps discussed in the process include (1) reconstruction of a metabolic network from the genome sequence, (2) its representation in a precise mathematical framework, (3) its translation to a model, and (4) the analysis using linear algebra and optimization. The methods for biological interpretations of computed cell phenotypes in the context of individual host and pathogen models and their integration are also discussed.
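
    Step (2), the mathematical representation, usually means a stoichiometric matrix S (metabolites × reactions) together with the steady-state constraint S v = 0 on the flux vector v. FBA then maximizes an objective cᵀv subject to S v = 0 and lb ≤ v ≤ ub, a linear program normally handed to a solver (for example through the COBRA Toolbox) that is not shown here. The minimal sketch below only builds S for a hypothetical three-reaction network and checks mass balance.

```python
def stoichiometric_matrix(reactions):
    """Build S (metabolites x reactions) from (reaction_id, {metabolite: coefficient}).

    Negative coefficients are consumed, positive ones produced.
    """
    metabolites = sorted({m for _, coeffs in reactions for m in coeffs})
    row = {m: i for i, m in enumerate(metabolites)}
    S = [[0.0] * len(reactions) for _ in metabolites]
    for j, (_, coeffs) in enumerate(reactions):
        for m, coef in coeffs.items():
            S[row[m]][j] = float(coef)
    return metabolites, S


def is_steady_state(S, v, tol=1e-9):
    """True if the flux vector v satisfies mass balance S v = 0 (within tol)."""
    return all(abs(sum(s * flux for s, flux in zip(r, v))) <= tol for r in S)


if __name__ == "__main__":
    toy = [
        ("EX_A", {"A": 1}),         # uptake of A
        ("R1", {"A": -1, "B": 1}),  # A -> B
        ("EX_B", {"B": -1}),        # secretion of B
    ]
    mets, S = stoichiometric_matrix(toy)
    print(mets, S)                        # ['A', 'B'] [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
    print(is_steady_state(S, [2, 2, 2]))  # True
    print(is_steady_state(S, [2, 1, 1]))  # False: A accumulates
```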

  6. Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales

    Directory of Open Access Journals (Sweden)

    Long Qian

    2016-10-01

    Full Text Available The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.
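
    The claim that "similar words have correlated frequencies" can be probed with a simple statistic: count every DNA word of length k and correlate the counts of words that differ at exactly one position. The sketch below is such a proxy, not the authors' statistic or evolutionary model; each neighbouring pair is used in both orders so the correlation does not depend on an arbitrary pair orientation. The random toy sequence merely exercises the code.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Counts of every ACGT word of length k (windows with other symbols are ignored)."""
    seq = seq.upper()
    seen = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {w: seen[w] for w in ("".join(p) for p in product("ACGT", repeat=k))}


def hamming1_pairs(words):
    """All unordered pairs of words that differ at exactly one position."""
    words = sorted(words)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if sum(x != y for x, y in zip(a, b)) == 1:
                yield a, b


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else float("nan")


def neighbor_correlation(counts):
    """Correlation of counts between one-mismatch neighbours (each pair used both ways)."""
    xs, ys = [], []
    for a, b in hamming1_pairs(counts):
        xs += [counts[a], counts[b]]
        ys += [counts[b], counts[a]]
    return pearson(xs, ys)


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(50_000))  # random toy sequence
    print(round(neighbor_correlation(kmer_counts(genome, 4)), 3))
```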

  7. Genome-wide evolutionary dynamics of influenza B viruses on a global scale.

    Directory of Open Access Journals (Sweden)

    Pinky Langat

    2017-12-01

    Full Text Available The global-scale epidemiology and genome-wide evolutionary dynamics of influenza B remain poorly understood compared with influenza A viruses. We compiled a spatio-temporally comprehensive dataset of influenza B viruses, comprising over 2,500 genomes sampled worldwide between 1987 and 2015, including 382 newly sequenced genomes that fill substantial gaps in previous molecular surveillance studies. Our contributed data increase the number of available influenza B virus genomes in Europe, Africa and Central Asia, improving the global context to study influenza B viruses. We reveal that Yamagata-lineage diversity results from co-circulation of two antigenically distinct groups that also segregate genetically across the entire genome, without evidence of intra-lineage reassortment. In contrast, Victoria-lineage diversity stems from geographic segregation of different genetic clades, with variability in the degree of geographic spread among clades. Differences between the lineages are reflected in their antigenic dynamics, as Yamagata-lineage viruses show alternating dominance between antigenic groups, while Victoria-lineage viruses show antigenic drift of a single lineage. Structural mapping of amino acid substitutions on trunk branches of influenza B gene phylogenies further supports these antigenic differences and highlights two potential mechanisms of adaptation for polymerase activity. Our study provides new insights into the epidemiological and molecular processes shaping influenza B virus evolution globally.

  8. Mitochondrial resequencing arrays detect tumor-specific mutations in salivary rinses of patients with head and neck cancer.

    Science.gov (United States)

    Mithani, Suhail K; Smith, Ian M; Zhou, Shaoyu; Gray, Andrew; Koch, Wayne M; Maitra, Anirban; Califano, Joseph A

    2007-12-15

    Alterations of the mitochondrial genome have been identified in multiple solid tumors and in many head and neck squamous cell carcinomas (HNSCC). Identification of mitochondrial mutations in the salivary rinses of patients with HNSCC has potential application in disease detection. In this study, we used the MitoChip v2.0 mitochondrial genome resequencing array to detect minor populations of mitochondrial DNA in salivary rinses of patients with HNSCC. Salivary rinses from 13 patients with HNSCC, whose tumors carried mitochondrial mutations, were collected before surgical resection. DNA isolated from salivary rinses and serial dilutions of DNA derived from HNSCC-derived cell lines with known mitochondrial mutations were sequenced using the MitoChip, and analyzed using a quantitative algorithm which we developed to detect minor populations of mitochondrial DNA from MitoChip probe intensity data. We detected heteroplasmic populations of mitochondrial DNA up to a 1:200 dilution using MitoChip v2.0 and our analysis algorithm. A logarithmic relationship between the magnitude of assay intensity and concentration of minor mitochondrial populations was shown. This technique was able to identify tumor-specific mitochondrial mutations in salivary rinses from 10 of 13 (76.9%) patients with head and neck cancer. Minor populations of mitochondrial DNA and disease-specific mitochondrial mutations in salivary rinses of patients with HNSCC can be successfully identified using the MitoChip resequencing array and the algorithm which we have developed. This technique has potential application in the surveillance of patients after resection and may have applicability in the surveillance of body fluids in other tumor types.
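
    The logarithmic relationship reported above suggests a calibration approach: fit signal = a + b·log10(fraction) on a dilution series, then invert the curve to estimate the minor-population fraction in a new sample. The MitoChip analysis algorithm itself is not described in the abstract; the sketch below is a generic least-squares version with a hypothetical, noise-free "signal".

```python
import math


def fit_log_relationship(fractions, signals):
    """Least-squares fit of signal = intercept + slope * log10(fraction)."""
    xs = [math.log10(f) for f in fractions]
    n = len(xs)
    mx, my = sum(xs) / n, sum(signals) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, signals)) / sxx
    return my - slope * mx, slope


def estimate_fraction(signal, intercept, slope):
    """Invert the fitted curve to estimate the minor-population fraction."""
    return 10 ** ((signal - intercept) / slope)


if __name__ == "__main__":
    # hypothetical calibration: undiluted (1.0) down to a 1:200 dilution
    fractions = [1.0, 0.1, 0.01, 1 / 200]
    signals = [2.0 + 0.5 * math.log10(f) for f in fractions]
    a, b = fit_log_relationship(fractions, signals)
    print(round(a, 6), round(b, 6))                                          # 2.0 0.5
    print(round(estimate_fraction(2.0 + 0.5 * math.log10(0.02), a, b), 6))  # 0.02
```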

  9. Rapid genotyping by low-coverage resequencing to construct genetic linkage maps of fungi: a case study in Lentinula edodes.

    Science.gov (United States)

    Au, Chun Hang; Cheung, Man Kit; Wong, Man Chun; Chu, Astley Kin Kan; Law, Patrick Tik Wan; Kwan, Hoi Shan

    2013-08-02

    Genetic linkage maps are important tools in breeding programmes and quantitative trait analyses. Traditional molecular markers used for genotyping are limited in throughput and efficiency. The advent of next-generation sequencing technologies has facilitated progeny genotyping and genetic linkage map construction in the major grains. However, the applicability of the approach remains untested in fungi. Shiitake mushroom, Lentinula edodes, is a basidiomycetous fungus that represents one of the most popular cultivated edible mushrooms. Here, we developed a rapid genotyping method based on low-coverage (~0.5 to 1.5-fold) whole-genome resequencing. We used the approach to genotype 20 single-spore isolates derived from L. edodes strain L54 and constructed the first high-density sequence-based genetic linkage map of L. edodes. The accuracy of the proposed genotyping method was verified experimentally with results from mating compatibility tests and PCR-single-strand conformation polymorphism on a few known genes. The linkage map spanned a total genetic distance of 637.1 cM and contained 13 linkage groups. Two hundred sequence-based markers were placed on the map, with an average marker spacing of 3.4 cM. The accuracy of the map was confirmed by comparing the locations of known genes such as matA and matB with those on previous maps. We used the shiitake mushroom as an example to provide a proof-of-principle that low-coverage resequencing could allow rapid genotyping of basidiospore-derived progenies, which could in turn facilitate the construction of high-density genetic linkage maps of basidiomycetous fungi for quantitative trait analyses and improvement of genome assembly.
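
    Because single-spore isolates are haploid, the recombination fraction between two markers can be read directly as the share of isolates whose parental origin switches between them, and then converted to map distance. The abstract does not say which mapping function was used, so both Haldane and Kosambi are shown; the marker data below are hypothetical and assume the parental phase is known. The last line checks the reported spacing: 637.1 cM over 200 markers in 13 groups gives 187 intervals and about 3.4 cM, which appears consistent with the abstract (dividing by 200 would give about 3.2 cM).

```python
import math


def recombination_fraction(marker1, marker2):
    """Fraction of haploid isolates whose parental origin differs between two markers.

    marker1/marker2 list the parental origin ('P1' or 'P2') of each isolate's allele;
    isolates with a missing call (None) at either marker are skipped.
    """
    pairs = [(x, y) for x, y in zip(marker1, marker2) if x is not None and y is not None]
    recombinants = sum(1 for x, y in pairs if x != y)
    return recombinants / len(pairs)


def haldane_cm(r):
    """Haldane map distance in centimorgans (assumes no crossover interference)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in centimorgans (allows for some interference)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def average_spacing(total_cm, n_markers, n_groups):
    """Mean gap between adjacent markers: each linkage group has markers - 1 gaps."""
    return total_cm / (n_markers - n_groups)


if __name__ == "__main__":
    m1 = ["P1"] * 10 + ["P2"] * 10
    m2 = ["P1"] * 9 + ["P2"] + ["P2"] * 9 + ["P1"]  # two recombinant isolates
    r = recombination_fraction(m1, m2)
    print(r, round(haldane_cm(r), 2), round(kosambi_cm(r), 2))  # 0.1 11.16 10.14
    print(round(average_spacing(637.1, 200, 13), 1))            # 3.4
```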

  10. Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions.

    Directory of Open Access Journals (Sweden)

    Guillaume Laval

    Full Text Available BACKGROUND: Estimating the historical and demographic parameters that characterize modern human populations is a fundamental part of reconstructing the recent history of our species. In addition, the development of a model of human evolution that can best explain neutral genetic diversity is required to confidently identify regions of the human genome that have been targeted by natural selection. METHODOLOGY/PRINCIPAL FINDINGS: We have resequenced 20 independent noncoding autosomal regions dispersed throughout the genome in 213 individuals from different continental populations, corresponding to a total of approximately 6 Mb of diploid resequencing data. We used these data to explore and co-estimate an extensive range of historical and demographic parameters with a statistical framework that combines the evaluation of multiple models of human evolution via a best-fit approach, followed by an Approximate Bayesian Computation (ABC) analysis. From a methodological standpoint, evaluating the accuracy of the parameter co-estimation allowed us to identify the most accurate set of statistics to be used for the estimation of each of the different historical and demographic parameters characterizing recent human evolution. CONCLUSIONS/SIGNIFICANCE: Our results support a model in which modern humans left Africa through a single major dispersal event occurring approximately 60,000 years ago, corresponding to a drastic, approximately fivefold reduction of the effective population size relative to the ancestral African population of approximately 13,800 individuals. Subsequently, the ancestors of modern Europeans and East Asians diverged much later, approximately 22,500 years ago, from the population of ancestral migrants. This late diversification of Eurasians after the African exodus points to the occurrence of a long maturation phase in which the ancestral Eurasian population was not yet diversified.
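    The core idea of an Approximate Bayesian Computation step can be shown on a toy problem. The sketch is a plain rejection ABC that estimates the mean of a unit-variance normal model:

    1. Draw a parameter from the prior.
    2. Simulate a data set of the same size.
    3. Keep the parameter only if a summary statistic of the simulated data falls close to the observed one.

    This is not the authors' framework, which co-estimates many demographic parameters from several summary statistics. The model, prior bounds, tolerance and function name are all assumptions chosen for illustration.

```python
import random
import statistics


def rejection_abc(observed, prior_low, prior_high, n_draws, tolerance, seed=0):
    """Toy rejection ABC for the mean of a normal model with unit variance.

    Returns the accepted parameter draws, which approximate the posterior.
    """
    rng = random.Random(seed)
    obs_stat = statistics.fmean(observed)  # summary statistic: sample mean
    n = len(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)             # 1. draw from the prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]  # 2. simulate a data set
        if abs(statistics.fmean(simulated) - obs_stat) < tolerance:  # 3. compare summaries
            accepted.append(theta)                             # 4. keep close matches
    return accepted


# Synthetic "observed" data with a known mean of 3.0, for illustration only.
data_rng = random.Random(42)
observed = [data_rng.gauss(3.0, 1.0) for _ in range(50)]
posterior = rejection_abc(observed, 0.0, 10.0, n_draws=10_000, tolerance=0.1)
print(len(posterior), round(statistics.fmean(posterior), 2))
```

    A smaller tolerance gives a sharper approximation to the true posterior but accepts fewer draws. That trade-off is one reason real analyses choose their summary statistics carefully.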

  11. A systems approach to predict oncometabolites via context-specific genome-scale metabolic networks.

    Directory of Open Access Journals (Sweden)

    Hojung Nam

    2014-09-01

    Full Text Available Altered metabolism in cancer cells has been viewed as a passive response required for a malignant transformation. However, this view has changed through the recently described metabolic oncogenic factors: mutated isocitrate dehydrogenases (IDH), succinate dehydrogenase (SDH), and fumarate hydratase (FH), which produce oncometabolites that competitively inhibit epigenetic regulation. In this study, we present in silico predictions of oncometabolites that have the potential to dysregulate epigenetic controls in nine types of cancer, obtained by incorporating massive-scale genetic mutation information (collected from more than 1,700 cancer genomes) and expression profiling data, and by deploying Recon 2 to reconstruct context-specific genome-scale metabolic models. Our analysis predicted 15 compounds and 24 substructures of potential oncometabolites that could result from the loss-of-function and gain-of-function mutations of metabolic enzymes, respectively. These results suggest a substantial potential for discovering unidentified oncometabolites in various forms of cancers.

  12. Genome scale models of yeast: towards standardized evaluation and consistent omic integration

    DEFF Research Database (Denmark)

    Sanchez, Benjamin J.; Nielsen, Jens

    2015-01-01

    Genome scale models (GEMs) have enabled remarkable advances in systems biology, acting as functional databases of metabolism, and as scaffolds for the contextualization of high-throughput data. In the case of Saccharomyces cerevisiae (budding yeast), several GEMs have been published and are currently used for metabolic engineering and elucidating biological interactions. Here we review the history of yeast's GEMs, focusing on recent developments. We study how these models are typically evaluated, using both descriptive and predictive metrics. Additionally, we analyze the different ways...

  13. Mapping condition-dependent regulation of metabolism in yeast through genome-scale modeling

    DEFF Research Database (Denmark)

    Österlund, Tobias; Nookaew, Intawat; Bordel, Sergio

    2013-01-01

    ABSTRACT: BACKGROUND: The genome-scale metabolic model of Saccharomyces cerevisiae, first presented in 2003, was the first genome-scale network reconstruction for a eukaryotic organism. Since then, continuous efforts have been made to improve and expand the yeast metabolic network. RESULTS......-filling methods and by introducing new reactions and pathways based on studies of the literature and databases. The model was shown to perform well both for growth simulations in different media and gene essentiality analysis for single and double knock-outs. Further, the model was used as a scaffold......-to-date collection of knowledge on yeast metabolism. The model was used for simulating the yeast metabolism under four different growth conditions, and experimental data from these four conditions were integrated into the model. The model together with experimental data is a useful tool to identify condition...

  14. Genome-scale metabolic model of Pichia pastoris with native and humanized glycosylation of recombinant proteins

    DEFF Research Database (Denmark)

    Irani, Zahra Azimzadeh; Kerkhoven, Eduard J.; Shojaosadati, Seyed Abbas

    2016-01-01

    Pichia pastoris is used for commercial production of human therapeutic proteins, and genome-scale models of P. pastoris metabolism have been generated in the past to study the metabolism and associated protein production by this yeast. A major challenge with clinical usage of recombinant proteins produced by P. pastoris is the difference in N-glycosylation of proteins produced by humans and this yeast. However, through metabolic engineering, a P. pastoris strain capable of producing humanized N-glycosylated proteins was constructed. The current genome-scale models of P. pastoris address neither native nor humanized N-glycosylation, and we therefore developed ihGlycopastoris, an extension to the iLC915 model with both native and humanized N-glycosylation for recombinant protein production, but also an estimation of N-glycosylation of P. pastoris native proteins. This new model gives a better...

  15. Using the reconstructed genome-scale human metabolic network to study physiology and pathology

    OpenAIRE

    Bordbar, Aarash; Palsson, Bernhard O.

    2012-01-01

    Metabolism plays a key role in many major human diseases. Generation of high-throughput omics data has ushered in a new era of systems biology. Genome-scale metabolic network reconstructions provide a platform to interpret omics data in a biochemically meaningful manner. The release of the global human metabolic network, Recon 1, in 2007 has enabled new systems biology approaches to study human physiology, pathology, and pharmacology. There are currently over 20 publications that utilize Reco...

  16. Biofilm Formation Mechanisms of Pseudomonas aeruginosa Predicted via Genome-Scale Kinetic Models of Bacterial Metabolism

    Science.gov (United States)

    Francisco G...

    2016-03-15

    A hallmark of Pseudomonas aeruginosa is its ability to establish biofilm-based infections that are difficult to ... eradicate. Biofilms are less susceptible to host inflammatory and immune responses and have higher antibiotic tolerance than free-living planktonic

  17. Enumeration of smallest intervention strategies in genome-scale metabolic networks.

    Directory of Open Access Journals (Sweden)

    Axel von Kamp

    2014-01-01

    Full Text Available One ultimate goal of metabolic network modeling is the rational redesign of biochemical networks to optimize the production of certain compounds by cellular systems. Although several constraint-based optimization techniques have been developed for this purpose, methods for systematic enumeration of intervention strategies in genome-scale metabolic networks are still lacking. In principle, Minimal Cut Sets (MCSs; inclusion-minimal combinations of reaction or gene deletions that lead to the fulfilment of a given intervention goal) provide an exhaustive enumeration approach. However, their disadvantages are the combinatorial explosion in larger networks and the requirement to first compute the elementary modes (EMs), which itself is impractical in genome-scale networks. We present MCSEnumerator, a new method for effective enumeration of the smallest MCSs (with fewest interventions) in genome-scale metabolic network models. For this we combine two approaches, namely (i) the mapping of MCSs to EMs in a dual network, and (ii) a modified algorithm by which shortest EMs can be effectively determined in large networks. In this way, we can identify the smallest MCSs by calculating the shortest EMs in the dual network. Realistic application examples demonstrate that our algorithm is able to list thousands of the most efficient intervention strategies in genome-scale networks for various intervention problems. For instance, for the first time we could enumerate all synthetic lethals in E. coli with combinations of up to 5 reactions. We also applied the new algorithm, by way of example, to compute strain designs for growth-coupled synthesis of different products (ethanol, fumarate, serine) by E. coli. We found numerous new engineering strategies, some requiring fewer knockouts and guaranteeing higher product yields (even without the assumption of optimal growth) than reported previously. 
The strength of the presented approach is that smallest intervention strategies can be
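    The relationship between MCSs and EMs can be illustrated on a toy network. A set of reactions is a cut set exactly when it hits (intersects) every target elementary mode, so MCSs are the inclusion-minimal hitting sets of the target EMs. The sketch below enumerates them by brute force, in order of increasing size.

    This is not MCSEnumerator, which avoids computing EMs by searching for shortest EMs in a dual network. The sketch assumes the EMs are already known, which is exactly what becomes impractical at genome scale. It also ignores "desired" modes that a constrained MCS must preserve. The network and the function name are hypothetical.

```python
from itertools import combinations


def smallest_cut_sets(target_modes, max_size):
    """Enumerate minimal cut sets (as minimal hitting sets of the target modes).

    Candidates are tried in order of size; a candidate is skipped if it
    contains an already-found cut set, so every returned set is
    inclusion-minimal. Runtime is exponential in max_size.
    """
    reactions = sorted(set().union(*target_modes))
    found = []
    for size in range(1, max_size + 1):
        for candidate in combinations(reactions, size):
            cut = set(candidate)
            if any(prev <= cut for prev in found):
                continue  # contains a smaller cut set, so it is not minimal
            if all(cut & mode for mode in target_modes):
                found.append(cut)  # knocks out every target mode
    return found


# Hypothetical toy network: each target mode is the set of reactions it uses.
target_modes = [{"R1", "R2", "R4"}, {"R1", "R3", "R4"}, {"R5", "R6"}]
for cut in smallest_cut_sets(target_modes, max_size=3):
    print(sorted(cut))
# ['R1', 'R5'], ['R1', 'R6'], ['R4', 'R5'], ['R4', 'R6'],
# ['R2', 'R3', 'R5'], ['R2', 'R3', 'R6']  (one per line)
```

    Because candidates are generated by size, stopping at a small `max_size` returns only the smallest intervention strategies. This mirrors the goal stated in the abstract, though by a far less scalable route.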

  18. In silico analysis of human metabolism: Reconstruction, contextualization and application of genome-scale models

    DEFF Research Database (Denmark)

    Geng, Jun; Nielsen, Jens

    2017-01-01

    The rising prevalence of metabolic diseases calls for a holistic approach for analysis of the underlying nature of abnormalities in cellular functions. Through mathematical representation and topological analysis of cellular metabolism, genome-scale metabolic models (GEMs) provide a promising framework for uncovering the mechanistic relationship between genotype and phenotype. GEMs thereby make it possible to uncover associations between cellular physiology and pathology in a systematic manner. Here we review the current progress on reconstruction of human GEMs and how they have been employed as scaffold...

  19. Expanding a dynamic flux balance model of yeast fermentation to genome-scale

    OpenAIRE

    Vargas, Felipe A; Pizarro, Francisco; Pérez-Correa, J Ricardo; Agosin, Eduardo

    2011-01-01

    Abstract Background Yeast is considered to be a workhorse of the biotechnology industry for the production of many value-added chemicals, alcoholic beverages and biofuels. Optimization of the fermentation is a challenging task that greatly benefits from dynamic models able to accurately describe and predict the fermentation profile and resulting products under different genetic and environmental conditions. In this article, we developed and validated a genome-scale dynamic flux balance model,...

  20. Double and multiple knockout simulations for genome-scale metabolic network reconstructions

    OpenAIRE

    Goldstein, Yaron AB; Bockmayr, Alexander

    2015-01-01

    Background Constraint-based modeling of genome-scale metabolic network reconstructions has become a widely used approach in computational biology. Flux coupling analysis is a constraint-based method that analyses the impact of single reaction knockouts on other reactions in the network. Results We present an extension of flux coupling analysis for double and multiple gene or reaction knockouts, and develop corresponding algorithms for an in silico simulation. To evaluate our method, we perfor...

  1. Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae

    Directory of Open Access Journals (Sweden)

    Hansen Kim

    2008-05-01

    Full Text Available Abstract Background Since ancient times the filamentous fungus Aspergillus oryzae has been used in the fermentation industry for the production of fermented sauces and the production of industrial enzymes. Recently, the genome sequence of A. oryzae with 12,074 annotated genes was released, but hypothetical proteins accounted for more than 50% of the annotated genes. Considering the industrial importance of this fungus, it is therefore valuable to improve the annotation and further integrate genomic information with biochemical and physiological information available for this microorganism and other related fungi. Here we predicted genes by constructing, sequencing and assembling an A. oryzae Expressed Sequence Tag (EST) library. We enhanced function assignment using an annotation strategy that we developed. The improved annotation was used to reconstruct the metabolic network, leading to a genome-scale metabolic model of A. oryzae. Results From our assembled EST sequences we identified 1,046 newly predicted genes in the A. oryzae genome. Furthermore, it was possible to assign putative protein functions to 398 of the newly predicted genes. Notably, our annotation strategy resulted in assignment of new putative functions to 1,469 hypothetical proteins already present in the A. oryzae genome database. Using the substantially improved annotated genome we reconstructed the metabolic network of A. oryzae. This network contains 729 enzymes, 1,314 enzyme-encoding genes, 1,073 metabolites and 1,846 (1,053 unique) biochemical reactions. The metabolic reactions are compartmentalized into the cytosol, the mitochondria, the peroxisome and the extracellular space. Transport steps between the compartments and the extracellular space represent 281 reactions, of which 161 are unique. The metabolic model was validated and shown to correctly describe the phenotypic behavior of A. oryzae grown on different carbon sources. 
Conclusion A much

  2. Diagnostics for stochastic genome-scale modeling via model slicing and debugging.

    Directory of Open Access Journals (Sweden)

    Kevin J Tsai

    Full Text Available Modeling of biological behavior has evolved from simple gene expression plots represented by mathematical equations to genome-scale systems biology networks. However, due to obstacles in complexity and scalability of creating genome-scale models, several biological modelers have turned to programming or scripting languages and away from modeling fundamentals. In doing so, they have traded the ability to have exchangeable, standardized model representation formats, while those that remain true to standardized model representation are faced with challenges in model complexity and analysis. We have developed a model diagnostic methodology inspired by program slicing and debugging and demonstrate the effectiveness of the methodology on a genome-scale metabolic network model published in the BioModels database. The computer-aided identification revealed specific points of interest such as reversibility of reactions, initialization of species amounts, and parameter estimation that improved a candidate cell's adenosine triphosphate production. We then compared the advantages of our methodology over other modeling techniques such as model checking and model reduction. A software application that implements the methodology is available at http://gel.ym.edu.tw/gcs/.

  3. Diagnostics for stochastic genome-scale modeling via model slicing and debugging.

    Science.gov (United States)

    Tsai, Kevin J; Chang, Chuan-Hsiung

    2014-01-01

    Modeling of biological behavior has evolved from simple gene expression plots represented by mathematical equations to genome-scale systems biology networks. However, due to obstacles in complexity and scalability of creating genome-scale models, several biological modelers have turned to programming or scripting languages and away from modeling fundamentals. In doing so, they have traded the ability to have exchangeable, standardized model representation formats, while those that remain true to standardized model representation are faced with challenges in model complexity and analysis. We have developed a model diagnostic methodology inspired by program slicing and debugging and demonstrate the effectiveness of the methodology on a genome-scale metabolic network model published in the BioModels database. The computer-aided identification revealed specific points of interest such as reversibility of reactions, initialization of species amounts, and parameter estimation that improved a candidate cell's adenosine triphosphate production. We then compared the advantages of our methodology over other modeling techniques such as model checking and model reduction. A software application that implements the methodology is available at http://gel.ym.edu.tw/gcs/.

  4. Genome-scale modeling of human metabolism - a systems biology approach.

    Science.gov (United States)

    Mardinoglu, Adil; Gatto, Francesco; Nielsen, Jens

    2013-09-01

    Altered metabolism is linked to the appearance of various human diseases and a better understanding of disease-associated metabolic changes may lead to the identification of novel prognostic biomarkers and the development of new therapies. Genome-scale metabolic models (GEMs) have been employed for studying human metabolism in a systematic manner, as well as for understanding complex human diseases. In the past decade, such metabolic models - one of the fundamental aspects of systems biology - have started contributing to the understanding of the mechanistic relationship between genotype and phenotype. In this review, we focus on the construction of the Human Metabolic Reaction database, the generation of healthy cell type- and cancer-specific GEMs using different procedures, and the potential applications of these developments in the study of human metabolism and in the identification of metabolic changes associated with various disorders. We further examine how in silico genome-scale reconstructions can be employed to simulate metabolic flux distributions and how high-throughput omics data can be analyzed in a context-dependent fashion. Insights yielded from this mechanistic modeling approach can be used for identifying new therapeutic agents and drug targets as well as for the discovery of novel biomarkers. Finally, recent advancements in genome-scale modeling and the future challenge of developing a model of whole-body metabolism are presented. The emergent contribution of GEMs to personalized and translational medicine is also discussed. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Optimal knockout strategies in genome-scale metabolic networks using particle swarm optimization.

    Science.gov (United States)

    Nair, Govind; Jungreuthmayer, Christian; Zanghellini, Jürgen

    2017-02-01

    Knockout strategies, particularly the concept of constrained minimal cut sets (cMCSs), are an important part of the arsenal of tools used in manipulating metabolic networks. Given a specific design, cMCSs can be calculated even in genome-scale networks. We would, however, like to find not only the optimal intervention strategy for a given design but also the best possible design. Our solution (PSOMCS) is to use particle swarm optimization (PSO) along with the direct calculation of cMCSs from the stoichiometric matrix to obtain optimal designs satisfying multiple objectives. To illustrate the working of PSOMCS, we apply it to a toy network. Next we show its superiority by comparing its performance against other comparable methods on a medium-sized E. coli core metabolic network. PSOMCS not only finds solutions comparable to previously published results but is also orders of magnitude faster. Finally, we use PSOMCS to predict knockouts satisfying multiple objectives in a genome-scale metabolic model of E. coli and compare it with OptKnock and RobustKnock. PSOMCS finds competitive knockout strategies and designs compared to other current methods and is in some cases significantly faster. It can be used in identifying knockouts which will force optimal desired behaviors in large and genome-scale metabolic networks. It will be even more useful as larger metabolic models of industrially relevant organisms become available.

  6. Pantograph: A template-based method for genome-scale metabolic model reconstruction.

    Science.gov (United States)

    Loira, Nicolas; Zhukova, Anna; Sherman, David James

    2015-04-01

    Genome-scale metabolic models are a powerful tool to study the inner workings of biological systems and to guide applications. The advent of cheap sequencing has brought the opportunity to create metabolic maps of biotechnologically interesting organisms. While this drives the development of new methods and automatic tools, network reconstruction remains a time-consuming process where extensive manual curation is required. This curation introduces specific knowledge about the modeled organism, either explicitly in the form of molecular processes, or indirectly in the form of annotations of the model elements. Paradoxically, this knowledge is usually lost when reconstruction of a different organism is started. We introduce the Pantograph method for metabolic model reconstruction. This method combines a template reaction knowledge base, orthology mappings between two organisms, and experimental phenotypic evidence, to build a genome-scale metabolic model for a target organism. Our method infers implicit knowledge from annotations in the template, and rewrites these inferences to include them in the resulting model of the target organism. The generated model is well suited for manual curation. Scripts for evaluating the model with respect to experimental data are automatically generated, to aid curators in iterative improvement. We present an implementation of the Pantograph method, as a toolbox for genome-scale model reconstruction, curation and validation. This open source package can be obtained from: http://pathtastic.gforge.inria.fr.

  7. De-novo learning of genome-scale regulatory networks in S. cerevisiae.

    Directory of Open Access Journals (Sweden)

    Sisi Ma

    Full Text Available De-novo reverse-engineering of genome-scale regulatory networks is a fundamental problem of biological and translational research. One of the major obstacles in developing and evaluating approaches for de-novo gene network reconstruction is the absence of high-quality genome-scale gold-standard networks of direct regulatory interactions. To establish a foundation for assessing the accuracy of de-novo gene network reverse-engineering, we constructed high-quality genome-scale gold-standard networks of direct regulatory interactions in Saccharomyces cerevisiae that incorporate binding and gene knockout data. Then we used 7 performance metrics to assess accuracy of 18 statistical association-based approaches for de-novo network reverse-engineering in 13 different datasets spanning over 4 data types. We found that most reconstructed networks had statistically significant accuracies. We also determined which statistical approaches and datasets/data types lead to networks with better reconstruction accuracies. While we found that de-novo reverse-engineering of the entire network is a challenging problem, it is possible to reconstruct sub-networks around some transcription factors with good accuracy. The latter transcription factors can be identified by assessing their connectivity in the inferred networks. Overall, this study provides the gene network reverse-engineering community with a rigorous assessment of the accuracy of S. cerevisiae gene network reconstruction and variability in performance of various approaches for learning both the entire network and sub-networks around transcription factors.

  8. Development of Insertion and Deletion Markers based on Biparental Resequencing for Fine Mapping Seed Weight in Soybean

    Directory of Open Access Journals (Sweden)

    Ying-hui Li

    2014-11-01

    Full Text Available As a complement to single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs), biallelic insertions and deletions (InDels) represent powerful molecular markers with desirable features for filling the gaps in current genetic linkage maps. In this study, 28,908 small InDel polymorphisms (1–5 base pairs, bp) distributed genome-wide were identified and annotated by comparison of a whole-genome resequencing data set from two soybean [Glycine max (L.) Merr.] genotypes, cultivar Zhonghuang13 (ZH) and line Zhongpin03-5373 (ZP). The physical distribution of InDel polymorphisms in the soybean genome was uneven and closely matched the distribution of previously annotated genes. The average density of InDels in the chromosome arm regions was significantly higher than that in the pericentromeric regions. The genomic regions that were fixed between the two elite genotypes were elucidated. With this information, five InDel markers within a putative quantitative trait locus (QTL) for seed weight (SW) were developed and used to genotype 254 recombinant inbred lines (RILs) derived from the cross of ZP × ZH. Adding these five InDel markers to previously used SNP and SSR markers facilitated the discovery of further recombination events, allowing the QTL to be fine-mapped to a 0.5 Mbp region. Our study clearly underlines the high value of InDel markers for map-based cloning and marker-assisted selection in soybean.
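    Separating the 1–5 bp InDels described above from SNPs and longer events can be sketched from the REF and ALT alleles of a variant call. This assumes VCF-style biallelic records in which an InDel's REF and ALT alleles share a leading anchor base, so the difference in allele lengths equals the InDel size. The function name and the exact labels are hypothetical; only the 5 bp cutoff comes from the abstract.

```python
def classify_variant(ref: str, alt: str, max_small: int = 5) -> str:
    """Classify a biallelic variant from its REF and ALT alleles.

    Assumes VCF-style alleles in which an InDel's REF and ALT share the
    leading anchor base, so len(alt) - len(ref) is the InDel size.
    """
    size = len(alt) - len(ref)
    if size == 0:
        # Equal-length alleles: a single-base SNP or a multi-base substitution.
        return "SNP" if len(ref) == 1 else "substitution"
    kind = "insertion" if size > 0 else "deletion"
    label = "small" if abs(size) <= max_small else "large"
    return f"{label} {kind} ({abs(size)} bp)"


variants = [("A", "G"), ("A", "AT"), ("ACGT", "A"), ("C", "CTTTTTTT")]
for ref, alt in variants:
    print(ref, alt, classify_variant(ref, alt))
# A G SNP
# A AT small insertion (1 bp)
# ACGT A small deletion (3 bp)
# C CTTTTTTT large insertion (7 bp)
```

    Real call sets also contain multiallelic and left-unnormalized records. Those would need to be split and normalized before a length-based rule like this one is reliable.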

  9. Tracing the Spread of Clostridium difficile Ribotype 027 in Germany Based on Bacterial Genome Sequences.

    Directory of Open Access Journals (Sweden)

    Matthias Steglich

    Full Text Available We applied whole-genome sequencing to reconstruct the spatial and temporal dynamics underpinning the expansion of Clostridium difficile ribotype 027 in Germany. Based on re-sequencing of genomes from 57 clinical C. difficile isolates, which had been collected from hospitalized patients at 36 locations throughout Germany between 1990 and 2012, we demonstrate that C. difficile genomes have accumulated sequence variation sufficiently fast to document the pathogen's spread at a regional scale. We detected both previously described lineages of fluoroquinolone-resistant C. difficile ribotype 027, FQR1 and FQR2. Using Bayesian phylogeographic analyses, we show that fluoroquinolone-resistant C. difficile 027 was imported into Germany at least four times, that it had been widely disseminated across multiple federal states even before the first outbreak was noted in 2007, and that it has continued to spread since.

  10. Tracing the Spread of Clostridium difficile Ribotype 027 in Germany Based on Bacterial Genome Sequences.

    Science.gov (United States)

    Steglich, Matthias; Nitsche, Andreas; von Müller, Lutz; Herrmann, Mathias; Kohl, Thomas A; Niemann, Stefan; Nübel, Ulrich

    2015-01-01

    We applied whole-genome sequencing to reconstruct the spatial and temporal dynamics underpinning the expansion of Clostridium difficile ribotype 027 in Germany. Based on re-sequencing of genomes from 57 clinical C. difficile isolates, which had been collected from hospitalized patients at 36 locations throughout Germany between 1990 and 2012, we demonstrate that C. difficile genomes have accumulated sequence variation sufficiently fast to document the pathogen's spread at a regional scale. We detected both previously described lineages of fluoroquinolone-resistant C. difficile ribotype 027, FQR1 and FQR2. Using Bayesian phylogeographic analyses, we show that fluoroquinolone-resistant C. difficile 027 was imported into Germany at least four times, that it had been widely disseminated across multiple federal states even before the first outbreak was noted in 2007, and that it has continued to spread since.

  11. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    Science.gov (United States)

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by Amazon Web Services. The time includes the import and export of the data using the Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log running metrics during data processing and to monitor multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. 
Rainbow is available
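    Improvement (2), splitting large sequence files for load balancing, relies on one constraint: a FASTQ file can only be cut between records, never inside one. The sketch below shows record-aware splitting. It is not Rainbow's code, whose splitting mechanism the abstract does not describe. It assumes unwrapped four-line FASTQ records, and the function names are hypothetical.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def read_fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines: header, sequence, '+', quality."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record


def split_fastq(handle: TextIO, reads_per_chunk: int) -> List[str]:
    """Split a FASTQ stream into chunks of at most `reads_per_chunk` reads.

    Chunk boundaries always fall between records, so every chunk is itself
    a valid FASTQ file that can be processed independently.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be positive")
    chunks, current, count = [], [], 0
    for record in read_fastq_records(handle):
        current.extend(record)
        count += 1
        if count == reads_per_chunk:
            chunks.append("".join(current))
            current, count = [], 0
    if current:  # leftover reads form a final, smaller chunk
        chunks.append("".join(current))
    return chunks


# Five toy reads split into chunks of two reads each.
toy = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(5))
chunks = split_fastq(io.StringIO(toy), reads_per_chunk=2)
print([chunk.count("@read") for chunk in chunks])  # [2, 2, 1]
```

    Records are identified by position rather than by searching for "@", because "@" can also occur in quality strings.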

  12. Linking great apes genome evolution across time scales using polymorphism-aware phylogenetic models.

    Science.gov (United States)

    De Maio, Nicola; Schlötterer, Christian; Kosiol, Carolin

    2013-10-01

    The genomes of related species contain valuable information on the history of the considered taxa. Great apes in particular exhibit variation of evolutionary patterns along their genomes. However, the great ape data also bring new challenges, such as the presence of incomplete lineage sorting and ancestral shared polymorphisms. Previous methods for genome-scale analysis are restricted to very few individuals or cannot disentangle the contribution of mutation rates and fixation biases. This represents a limitation both for the understanding of these forces as well as for the detection of regions affected by selection. Here, we present a new model designed to estimate mutation rates and fixation biases from genetic variation within and between species. We relax the assumption of instantaneous substitutions, modeling substitutions as mutational events followed by a gradual fixation. Hence, we straightforwardly account for shared ancestral polymorphisms and incomplete lineage sorting. We analyze genome-wide synonymous site alignments of human, chimpanzee, and two orangutan species. From each taxon, we include data from several individuals. We estimate mutation rates and GC-biased gene conversion intensity. We find that both mutation rates and biased gene conversion vary with GC content. We also find lineage-specific differences, with weaker fixation biases in orangutan species, suggesting a reduced historical effective population size. Finally, our results are consistent with directional selection acting on coding sequences in relation to exonic splicing enhancers.

  13. Massive screening of copy number population-scale variation in Bos taurus genome.

    Science.gov (United States)

    Cicconardi, Francesco; Chillemi, Giovanni; Tramontano, Anna; Marchitelli, Cinzia; Valentini, Alessio; Ajmone-Marsan, Paolo; Nardone, Alessandro

    2013-02-26

    Copy number variations (CNVs) represent a significant source of genomic structural variation. Their length ranges from approximately one hundred to millions of base pairs. Genome-wide screenings have clarified that CNVs are a ubiquitous phenomenon affecting essentially the whole genome. Although Bos taurus is one of the most important domestic animal species worldwide and one of the most studied ruminant models for metabolism, reproduction, and disease, relatively few studies have investigated CNVs in cattle, and little is known about how CNVs contribute to normal phenotypic variation and to disease susceptibility in this species, compared to humans and other model organisms. Here we characterize and compare CNV profiles in 2654 animals from five dairy and beef Bos taurus breeds, using the Illumina BovineSNP50 genotyping array (54001 SNP probes). In this study we applied the two most commonly used algorithms for CNV discovery (QuantiSNP and PennCNV) and identified 4830 unique candidate CNVs belonging to 326 regions. These regions overlap with 5789 known genes, 76.7% of which are significantly co-localized with segmental duplications (SD). This large-scale screening significantly contributes to the enrichment of the Bos taurus CNV map, demonstrates the ubiquity, great diversity and complexity of this type of genomic variation, and sets the basis for testing the influence of CNVs on Bos taurus complex functional and production traits.

  14. An evolutionary framework for association testing in resequencing studies.

    Directory of Open Access Journals (Sweden)

    C Ryan King

    2010-11-01

    Full Text Available Sequencing technologies are becoming cheap enough to apply to large numbers of study participants and promise to provide new insights into human phenotypes by bringing to light rare and previously unknown genetic variants. We develop a new framework for the analysis of sequence data that incorporates all of the major features of previously proposed approaches, including those focused on allele counts and allele burden, but is both more general and more powerful. We harness population genetic theory to provide prior information on effect sizes and to create a pooling strategy for information from rare variants. Our method, EMMPAT (Evolutionary Mixed Model for Pooled Association Testing), generates a single test per gene (substantially reducing multiple testing concerns), facilitates graphical summaries, and improves the interpretation of results by allowing calculation of attributable variance. Simulations show that, relative to previously used approaches, our method increases the power to detect genes that affect phenotype when natural selection has kept alleles with large effect sizes rare. We demonstrate our approach on a population-based re-sequencing study of association between serum triglycerides and variation in ANGPTL4.

  15. Analysis of growth of Lactobacillus plantarum WCFS1 on a complex medium using a genome-scale metabolic model

    NARCIS (Netherlands)

    Teusink, B.; Wiersma, A.; Molenaar, D.; Francke, C.; Vos, W.M. de; Siezen, R.J.; Smid, E.J.

    2006-01-01

    A genome-scale metabolic model of the lactic acid bacterium Lactobacillus plantarum WCFS1 was constructed based on genomic content and experimental data. The complete model includes 721 genes, 643 reactions, and 531 metabolites. Different stoichiometric modeling techniques were used for

  16. Structural characterization of genomes by large scale sequence-structure threading

    Directory of Open Access Journals (Sweden)

    Cherkasov Artem

    2004-04-01

    Full Text Available Abstract Background Using sequence-structure threading we have conducted structural characterization of complete proteomes of 37 archaeal, bacterial and eukaryotic organisms (including worm, fly, mouse and human), totaling 167,888 genes. Results The reported data represent the first rather general evaluation of the performance of full sequence-structure threading on multiple genomes, providing an opportunity to evaluate its general applicability for large-scale studies. According to the estimated results, sequence-structure threading assigned protein folds to more than 60% of eukaryotic, 68% of archaeal and 70% of bacterial proteomes. The repertoires of protein classes, architectures, topologies and homologous superfamilies (according to the CATH 2.4 classification) have been established for distant organisms and superkingdoms. It has been found that the average abundance of CATH classes decreases from "alpha and beta" to "mainly beta", followed by "mainly alpha" and "few secondary structures". The 3-Layer (aba) Sandwich has been characterized as the most abundant protein architecture and the Rossmann fold as the most common topology. Conclusion The analysis of genomic occurrences of CATH 2.4 protein homologous superfamilies and topologies has revealed the power-law character of their distributions. The corresponding double-logarithmic "frequency – genomic occurrence" dependences characteristic of scale-free systems have been established for individual organisms and for three superkingdoms. Supplementary materials to this work are available at 1.
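A power-law relationship like the one described above shows up as a straight line on double-logarithmic axes. The Python sketch below illustrates that check. It is not the authors' procedure, because the abstract does not state how they fitted their distributions. It assumes ordinary least squares on log10-transformed values. The function name `loglog_fit` and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def loglog_fit(occurrence: Sequence[float], frequency: Sequence[float]) -> Tuple[float, float]:
    """Fit frequency ~ C * occurrence**slope by least squares on log10-transformed data."""
    xs = [math.log10(x) for x in occurrence]
    ys = [math.log10(y) for y in frequency]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # power-law exponent
    intercept = mean_y - slope * mean_x   # log10 of the prefactor C
    return slope, 10 ** intercept


# Hypothetical toy data following an exact power law: frequency = 100 * occurrence**-2
occ = [1, 2, 4, 8, 16]
freq = [100.0 * x ** -2 for x in occ]
slope, prefactor = loglog_fit(occ, freq)
print(round(slope, 6), round(prefactor, 6))
```

On this exact power law the fit recovers a slope of about -2 and a prefactor of about 100. A straight-line fit on log-log axes is simple but can be biased on noisy tails. For real occurrence counts, maximum-likelihood exponent estimators are often preferred.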

  17. Genome-Scale Reconstruction of Metabolic Networks of Lactobacillus casei ATCC 334 and 12A

    Science.gov (United States)

    Vinay-Lara, Elena; Hamilton, Joshua J.; Stahl, Buffy; Broadbent, Jeff R.; Reed, Jennifer L.; Steele, James L.

    2014-01-01

    Lactobacillus casei strains are widely used in industry and the utility of this organism in these industrial applications is strain dependent. Hence, tools capable of predicting strain specific phenotypes would have utility in the selection of strains for specific industrial processes. Genome-scale metabolic models can be utilized to better understand genotype-phenotype relationships and to compare different organisms. To assist in the selection and development of strains with enhanced industrial utility, genome-scale models for L. casei ATCC 334, a well characterized strain, and strain 12A, a corn silage isolate, were constructed. Draft models were generated from RAST genome annotations using the Model SEED database and refined by evaluating ATP generating cycles, mass-and-charge-balances of reactions, and growth phenotypes. After the validation process was finished, we compared the metabolic networks of these two strains to identify metabolic, genetic and ortholog differences that may lead to different phenotypic behaviors. We conclude that the metabolic capabilities of the two networks are highly similar. The L. casei ATCC 334 model accounts for 1,040 reactions, 959 metabolites and 548 genes, while the L. casei 12A model accounts for 1,076 reactions, 979 metabolites and 640 genes. The developed L. casei ATCC 334 and 12A metabolic models will enable better understanding of the physiology of these organisms and be valuable tools in the development and selection of strains with enhanced utility in a variety of industrial applications. PMID:25365062

  18. Optimizing eukaryotic cell hosts for protein production through systems biotechnology and genome-scale modeling.

    Science.gov (United States)

    Gutierrez, Jahir M; Lewis, Nathan E

    2015-07-01

    Eukaryotic cell lines, including Chinese hamster ovary cells, yeast, and insect cells, are invaluable hosts for the production of many recombinant proteins. With the advent of genomic resources, one can now leverage genome-scale computational modeling of cellular pathways to rationally engineer eukaryotic host cells. Genome-scale models of metabolism include all known biochemical reactions occurring in a specific cell. By describing these mathematically and using tools such as flux balance analysis, the models can simulate cell physiology and provide targets for cell engineering that could lead to enhanced cell viability, titer, and productivity. Here we review examples in which metabolic models in eukaryotic cell cultures have been used to rationally select targets for genetic modification, improve cellular metabolic capabilities, design media supplementation, and interpret high-throughput omics data. As more comprehensive models of metabolism and other cellular processes are developed for eukaryotic cell culture, these will enable further exciting developments in cell line engineering, thus accelerating recombinant protein production and biotechnology in the years to come. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Genome-scale metabolic network of Cordyceps militaris useful for comparative analysis of entomopathogenic fungi.

    Science.gov (United States)

    Vongsangnak, Wanwipa; Raethong, Nachon; Mujchariyakul, Warasinee; Nguyen, Nam Ninh; Leong, Hon Wai; Laoteng, Kobkul

    2017-08-30

    The first genome-scale metabolic network of Cordyceps militaris (iWV1170) was constructed to represent its whole metabolism; it consists of 894 metabolites and 1,267 metabolic reactions across five compartments: the plasma membrane, cytoplasm, mitochondria, peroxisome and extracellular space. iWV1170 could be used to explain the organism's growth phenotypes and its production of cordycepin and other metabolites on various substrates. A high number of genes encoding extracellular enzymes for the degradation of complex carbohydrates, lipids and proteins were found in the C. militaris genome. By comparative genome-scale analysis, the adenine metabolic pathway towards putative cordycepin biosynthesis was reconstructed, indicating evolutionary relationships across eleven species of entomopathogenic fungi. The overall metabolic routes involved in the putative cordycepin biosynthesis were also identified in C. militaris, including central carbon metabolism, amino acid metabolism (glycine, l-glutamine and l-aspartate) and nucleotide metabolism (adenosine and adenine). Interestingly, the sequence coding for a ribonucleotide reductase inhibitor was absent in C. militaris, which might contribute to its over-production of cordycepin. Copyright © 2017. Published by Elsevier B.V.

  20. Genome-Scale Model Reveals Metabolic Basis of Biomass Partitioning in a Model Diatom.

    Directory of Open Access Journals (Sweden)

    Jennifer Levering

    Full Text Available Diatoms are eukaryotic microalgae that contain genes from various sources, including bacteria and the secondary endosymbiotic host. Due to this unique combination of genes, diatoms are taxonomically and functionally distinct from other algae and vascular plants and confer novel metabolic capabilities. Based on the genome annotation, we performed a genome-scale metabolic network reconstruction for the marine diatom Phaeodactylum tricornutum. Due to their endosymbiotic origin, diatoms possess a complex chloroplast structure, which complicates the prediction of subcellular protein localization. Based on previous work, we implemented a pipeline that exploits a series of bioinformatics tools to predict protein localization. The manually curated reconstructed metabolic network iLB1027_lipid accounts for 1,027 genes associated with 4,456 reactions and 2,172 metabolites distributed across six compartments. To constrain the genome-scale model, we determined the organism-specific biomass composition in terms of lipids, carbohydrates, and proteins using Fourier transform infrared spectrometry. Our simulations indicate the presence of a yet unknown glutamine-ornithine shunt that could be used to transfer reducing equivalents generated by photosynthesis to the mitochondria. The model reflects the known biochemical composition of P. tricornutum in defined culture conditions and enables metabolic engineering strategies to improve the use of P. tricornutum for biotechnological applications.

  1. Genome-Scale Model Reveals Metabolic Basis of Biomass Partitioning in a Model Diatom.

    Science.gov (United States)

    Levering, Jennifer; Broddrick, Jared; Dupont, Christopher L; Peers, Graham; Beeri, Karen; Mayers, Joshua; Gallina, Alessandra A; Allen, Andrew E; Palsson, Bernhard O; Zengler, Karsten

    2016-01-01

  2. Reconstruction and analysis of a genome-scale metabolic model for Scheffersomyces stipitis

    Directory of Open Access Journals (Sweden)

    Balagurunathan Balaji

    2012-02-01

    Full Text Available Abstract Background Fermentation of xylose, the major component in hemicellulose, is essential for economic conversion of lignocellulosic biomass to fuels and chemicals. The yeast Scheffersomyces stipitis (formerly known as Pichia stipitis) has the highest known native capacity for xylose fermentation and possesses several genes for lignocellulose bioconversion in its genome. Understanding the metabolism of this yeast at a global scale, by reconstructing the genome-scale metabolic model, is essential for manipulating its metabolic capabilities and for successful transfer of its capabilities to other industrial microbes. Results We present a genome-scale metabolic model for Scheffersomyces stipitis, a native xylose-utilizing yeast. The model was reconstructed based on genome sequence annotation, detailed experimental investigation and known yeast physiology. Macromolecular composition of Scheffersomyces stipitis biomass was estimated experimentally and its ability to grow on different carbon, nitrogen, sulphur and phosphorus sources was determined by phenotype microarrays. The compartmentalized model, developed based on an iterative procedure, accounted for 814 genes, 1371 reactions, and 971 metabolites. In silico computed growth rates were compared with high-throughput phenotyping data and the model could predict the qualitative outcomes in 74% of substrates investigated. Model simulations were used to identify the biosynthetic requirements for anaerobic growth of Scheffersomyces stipitis on glucose and the results were validated with published literature. The bottlenecks in Scheffersomyces stipitis metabolic network for xylose uptake and nucleotide cofactor recycling were identified by in silico flux variability analysis. The scope of the model in enhancing the mechanistic understanding of microbial metabolism is demonstrated by identifying a mechanism for mitochondrial respiration and oxidative phosphorylation. Conclusion The genome-scale

  3. Resequencing three candidate genes for major depressive disorder in a Dutch cohort.

    Directory of Open Access Journals (Sweden)

    Eva C Verbeek

    Full Text Available Major depressive disorder (MDD) is a psychiatric disorder characterized by periods of low mood lasting more than two weeks, loss of interest in normally enjoyable activities, and behavioral changes. MDD is a complex disorder and does not have a single genetic cause. In 2009 a genome-wide association study (GWAS) was performed on the Dutch GAIN-MDD cohort. Many of the top signals of this GWAS mapped to a region spanning the gene PCLO, and the non-synonymous coding single nucleotide polymorphism (SNP) rs2522833 in the PCLO gene became genome-wide significant after post-hoc analysis. We performed resequencing of PCLO, GRM7, and SLC6A4 in 50 control samples from the GAIN-MDD cohort to detect new genomic variants. Subsequently, we genotyped these variants in the entire GAIN-MDD cohort and performed association analysis to investigate whether rs2522833 is the causal variant or simply in linkage disequilibrium with a more strongly associated variant. GRM7 and SLC6A4 are both candidate genes for MDD from the literature. We aimed to gather more evidence that rs2522833 is indeed the causal variant in the GAIN-MDD cohort, or to find a previously undetected common variant in PCLO, GRM7, or SLC6A4 with a stronger association in this cohort. After next-generation sequencing and association analysis, we excluded the possibility that an undetected common variant is more strongly associated. For neither PCLO nor GRM7 did we find a more strongly associated variant. For SLC6A4, we found a new SNP that showed a lower P-value (P = 0.07) than in the GAIN-MDD GWAS (P = 0.09). However, no evidence for genome-wide significance was found. Although we did not take rare variants into account, we conclude that our results provide further support for the hypothesis that the non-synonymous coding SNP rs2522833 in the PCLO gene is likely to be the causal variant in the GAIN-MDD cohort.

  4. Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach

    Science.gov (United States)

    Ponce-de-Leon, Miguel; Calle-Espinosa, Jorge; Peretó, Juli; Montero, Francisco

    2015-01-01

    Genome-scale metabolic models usually contain inconsistencies that manifest as blocked reactions and gap metabolites. With the purpose of detecting recurrent inconsistencies in metabolic models, a large-scale analysis was performed using a previously published dataset of 130 genome-scale models. The results showed that a large number of reactions (~22%) are blocked in all the models where they are present. To unravel the nature of such inconsistencies, a metamodel was constructed by joining the 130 models in a single network. This metamodel was manually curated using the unconnected modules approach, and then it was used as a reference network to perform gap-filling on each individual genome-scale model. Finally, as a proof of concept, a set of 36 models that had not been considered during the construction of the metamodel was used to extend the metamodel with new biochemical information and to assess its impact on gap-filling results. The analysis performed on the metamodel allowed us to conclude that: 1) the recurrent inconsistencies found in the models were already present in the metabolic database used during the reconstruction process; 2) inconsistencies in a metabolic database can be propagated to the reconstructed models; 3) some reactions that do not manifest as blocked are active only as a consequence of certain classes of artifacts; and 4) the results of automatic gap-filling are highly dependent on the consistency and completeness of the metamodel or metabolic database used as the reference network. In conclusion, consistency analysis should be applied to metabolic databases in order to detect and fill gaps as well as to detect and remove artifacts and redundant information. PMID:26629901
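The link between gap metabolites and blocked reactions described above can be illustrated with a purely topological check. The Python sketch below is a simplification and not the unconnected-modules approach the authors used. It assumes every reaction is irreversible. It then repeatedly flags any metabolite that is only produced or only consumed, and marks every reaction that touches such a metabolite as blocked. Exact detection in models with reversible reactions usually relies on flux variability analysis instead. The function `find_blocked`, the dictionary-based network format and the toy network are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical format: reaction id -> {metabolite id: stoichiometric coefficient}.
# Negative coefficients are consumed, positive are produced; all reactions irreversible.
Network = Dict[str, Dict[str, float]]


def find_blocked(network: Network) -> Tuple[Set[str], Set[str]]:
    """Iteratively flag gap metabolites and the reactions they block (topological sketch)."""
    active = dict(network)
    blocked: Set[str] = set()
    gaps: Set[str] = set()
    changed = True
    while changed:
        changed = False
        producers: Dict[str, Set[str]] = {}
        consumers: Dict[str, Set[str]] = {}
        for rxn, stoich in active.items():
            for met, coeff in stoich.items():
                if coeff > 0:
                    producers.setdefault(met, set()).add(rxn)
                elif coeff < 0:
                    consumers.setdefault(met, set()).add(rxn)
        for met in set(producers) | set(consumers):
            # At steady state, a metabolite that is only produced or only consumed
            # forces every irreversible reaction touching it to carry zero flux.
            if not producers.get(met) or not consumers.get(met):
                gaps.add(met)
                for rxn in producers.get(met, set()) | consumers.get(met, set()):
                    if rxn in active:
                        del active[rxn]
                        blocked.add(rxn)
                        changed = True
    return blocked, gaps


toy = {
    "EX_A": {"A": 1},           # uptake of A from the medium
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "R3": {"C": -1},            # sink for C
    "R4": {"A": -1, "D": 1},
    "R5": {"D": -1, "E": 1},    # E is never consumed: a dead end
}
print(find_blocked(toy))
```

In the toy network, E has no consumer, so R5 is blocked first. Once R5 is removed, D is also a dead end, so R4 is blocked on the next pass. The sketch therefore reports R4 and R5 as blocked, with D and E as gap metabolites. The iteration shows how a single gap can propagate through a network, which mirrors the propagation from database to reconstructed models noted in the abstract.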

  5. Characterizing the optimal flux space of genome-scale metabolic reconstructions through modified latin-hypercube sampling

    NARCIS (Netherlands)

    Chaudhary, N.; Tøndel, K.; Bhatnagar, R.; Martins dos Santos, V.A.P.; Puchalka, J.

    2016-01-01

    Genome-Scale Metabolic Reconstructions (GSMRs), along with optimization-based methods, predominantly Flux Balance Analysis (FBA) and its derivatives, are widely applied for assessing and predicting the behavior of metabolic networks upon perturbation, thereby enabling identification of potential

  6. Genome-scale reconstruction of the Streptococcus pyogenes M49 metabolic network reveals growth requirements and indicates potential drug targets

    NARCIS (Netherlands)

    Levering, J.; Fiedler, T.; Sieg, A.; van Grinsven, K.W.A.; Hering, S.; Veith, N.; Olivier, B.G.; Klett, L.; Hugenholtz, J.; Teusink, B.; Kreikemeyer, B.; Kummer, U.

    2016-01-01

    Genome-scale metabolic models comprise stoichiometric relations between metabolites, as well as associations between genes and metabolic reactions and facilitate the analysis of metabolism. We computationally reconstructed the metabolic network of the lactic acid bacterium Streptococcus pyogenes

  7. A Consensus Genome-scale Reconstruction of Chinese Hamster Ovary Cell Metabolism

    DEFF Research Database (Denmark)

    Hefzi, Hooman; Ang, Kok Siong; Hanscho, Michael

    2016-01-01

    Chinese hamster ovary (CHO) cells dominate biotherapeutic protein production and are widely used in mammalian cell line engineering research. To elucidate metabolic bottlenecks in protein production and to guide cell engineering and bioprocess optimization, we reconstructed the metabolic pathways...... in CHO and associated them with >1,700 genes in the Cricetulus griseus genome. The genome-scale metabolic model based on this reconstruction, iCHO1766, and cell-line-specific models for CHO-K1, CHO-S, and CHO-DG44 cells provide the biochemical basis of growth and recombinant protein production....... The models accurately predict growth phenotypes and known auxotrophies in CHO cells. With the models, we quantify the protein synthesis capacity of CHO cells and demonstrate that common bioprocess treatments, such as histone deacetylase inhibitors, inefficiently increase product yield. However, our...

  8. Genome-scale RNAi screens for high-throughput phenotyping in bloodstream-form African trypanosomes.

    Science.gov (United States)

    Glover, Lucy; Alsford, Sam; Baker, Nicola; Turner, Daniel J; Sanchez-Flores, Alejandro; Hutchinson, Sebastian; Hertz-Fowler, Christiane; Berriman, Matthew; Horn, David

    2015-01-01

    The ability to simultaneously assess every gene in a genome for a role in a particular process has obvious appeal. This protocol describes how to perform genome-scale RNAi library screens in bloodstream-form African trypanosomes, a family of parasites that causes lethal human and animal diseases and also serves as a model for studies on basic aspects of eukaryotic biology and evolution. We discuss strain assembly, screen design and implementation, the RNAi target sequencing approach and hit validation, and we provide a step-by-step protocol. A screen can yield from one to thousands of 'hits' associated with the phenotype of interest. The screening protocol itself takes 2 weeks or less to be completed, and high-throughput sequencing may also be completed within weeks. Pre- and post-screen strain assembly, validation and follow-up can take several months, depending on the type of screen and the number of hits analyzed.

  9. Genome-scale genetic manipulation methods for exploring bacterial molecular biology.

    Science.gov (United States)

    Gagarinova, Alla; Emili, Andrew

    2012-06-01

    Bacteria are diverse and abundant, playing key roles in human health and disease, the environment, and biotechnology. Despite progress in genome sequencing and bioengineering, much remains unknown about the functional organization of prokaryotes. For instance, roughly a third of the protein-coding genes of the best-studied model bacterium, Escherichia coli, currently lack experimental annotations. Systems-level experimental approaches for investigating the functional associations of bacterial genes and genetic structures are essential for defining the fundamental molecular biology of microbes, preventing the spread of antibacterial resistance in the clinic, and driving the development of future biotechnological applications. This review highlights recently introduced large-scale genetic manipulation and screening procedures for the systematic exploration of bacterial gene functions, molecular relationships, and the global organization of bacteria at the gene, pathway, and genome levels.

  10. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome, as was once thought, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and has recently been reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance, even after removal of tandem repeats. Conclusion Our results suggest that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.
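A claim that clustering is "higher than expected by chance" is commonly assessed with a permutation test. The Python sketch below is a generic illustration, not the authors' exact method. It makes several assumptions:

- A single chromosome is represented as an ordered list of flags, with True marking a testis-specific gene.
- The test statistic is the number of adjacent testis-specific pairs.
- The observed statistic is compared against randomly shuffled gene orders.
- Tandem-repeat removal is ignored.

The function names and the toy data are hypothetical.

```python
import random
from typing import List


def adjacent_pairs(is_specific: List[bool]) -> int:
    """Count neighbouring gene pairs in which both genes are tissue-specific."""
    return sum(1 for a, b in zip(is_specific, is_specific[1:]) if a and b)


def clustering_p_value(is_specific: List[bool], n_perm: int = 2000, seed: int = 0) -> float:
    """One-sided permutation p-value for an excess of adjacent tissue-specific pairs."""
    rng = random.Random(seed)
    observed = adjacent_pairs(is_specific)
    shuffled = list(is_specific)
    hits = 0
    for _ in range(n_perm):
        # Random gene order with the same number of tissue-specific genes
        rng.shuffle(shuffled)
        if adjacent_pairs(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero
    return (hits + 1) / (n_perm + 1)


genes = [True] * 5 + [False] * 15   # hypothetical: five testis-specific genes side by side
print(adjacent_pairs(genes), clustering_p_value(genes))
```

With five adjacent testis-specific genes among 20, the observed statistic is 4. Random orders almost never reach that, so the estimated p-value is small. Evenly spaced testis-specific genes give a statistic of 0 and a p-value of 1.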

  11. Improved Evidence-Based Genome-scale Metabolic Models for Maize Leaf, Embryo, and Endosperm.

    Directory of Open Access Journals (Sweden)

    Samuel Seaver

    2015-03-01

    Full Text Available There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possible generation of unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low-expression genes to enable activity of a maximum number of reactions associated with high-expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue-specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for analyzing transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes.

  12. Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

    Science.gov (United States)

    Seaver, Samuel M. D.; Bradbury, Louis M. T.; Frelin, Océane; Zarecki, Raphy; Ruppin, Eytan; Hanson, Andrew D.; Henry, Christopher S.

    2015-01-01

    PMID:25806041

  13. Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

    Energy Technology Data Exchange (ETDEWEB)

    Seaver, Samuel M. D.; Bradbury, Louis M. T.; Frelin, Océane; Zarecki, Raphy; Ruppin, Eytan; Hanson, Andrew D.; Henry, Christopher S.

    2015-03-10

  14. Fusion of large-scale genomic knowledge and frequency data computationally prioritizes variants in epilepsy.

    Directory of Open Access Journals (Sweden)

    Ian M Campbell

    Full Text Available Curation and interpretation of copy number variants identified by genome-wide testing is challenged by the large number of events harbored in each personal genome. Conventional determination of phenotypic relevance relies on patterns of higher frequency in affected individuals versus controls; however, an increasing amount of ascertained variation is rare or private to clans. Consequently, frequency data are less useful for distinguishing pathogenic from benign variants. One solution is disease-specific algorithms that leverage gene knowledge together with variant frequency to aid prioritization. We used large-scale resources including Gene Ontology, protein-protein interactions and other annotation systems together with a broad set of 83 genes with known associations to epilepsy to construct a pathogenicity score for the phenotype. We evaluated the score for all annotated human genes and applied Bayesian methods to combine the derived pathogenicity score with frequency information from our diagnostic laboratory. Analysis determined Bayes factors and posterior distributions for each gene. We applied our method to subjects with abnormal chromosomal microarray results and confirmed epilepsy diagnoses gathered by electronic medical record review. Genes deleted in our subjects with epilepsy had significantly higher pathogenicity scores and Bayes factors compared to subjects referred for non-neurologic indications. We also applied our scores to identify a recently validated epilepsy gene in a complex genomic region and to reveal candidate genes for epilepsy. We propose that our results could support clinical decision-making in the context of genome-wide screening. Our approach demonstrates the utility of integrative data in medical genomics.
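
    The Bayes factors mentioned above update belief through the standard odds form of Bayes' rule: posterior odds = Bayes factor × prior odds. The minimal sketch below shows only that generic update. It is not the authors' model, and the prior and Bayes factor values in the usage note are made up.

```python
def posterior_probability(prior: float, bayes_factor: float) -> float:
    """Update a prior probability with a Bayes factor: posterior odds = BF * prior odds."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = bayes_factor * prior_odds
    # Convert odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)
```

    For example, a gene with an illustrative prior of 0.1 (odds 1:9) and a Bayes factor of 9 ends at posterior odds of 1:1, that is, a probability of about 0.5.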

  15. Population Structure and Domestication Revealed by High-Depth Resequencing of Korean Cultivated and Wild Soybean Genomes

    Science.gov (United States)

    Chung, Won-Hyong; Jeong, Namhee; Kim, Jiwoong; Lee, Woo Kyu; Lee, Yun-Gyeong; Lee, Sang-Heon; Yoon, Woongchang; Kim, Jin-Hyun; Choi, Ik-Young; Choi, Hong-Kyu; Moon, Jung-Kyung; Kim, Namshin; Jeong, Soon-Chun

    2014-01-01

    Despite the importance of soybean as a major crop, genome-wide variation and evolution of cultivated soybeans are largely unknown. Here, we catalogued genome variation in an annual soybean population by high-depth resequencing of 10 cultivated and 6 wild accessions and obtained 3.87 million high-quality single-nucleotide polymorphisms (SNPs) after excluding the sites with missing data in any accession. Nuclear genome phylogeny supported a single origin for the cultivated soybeans. We identified 10-fold longer linkage disequilibrium (LD) in the wild soybean relative to wild maize and rice. Despite the small population size, the long LD and large SNP data allowed us to identify 206 candidate domestication regions with significantly lower diversity in the cultivated, but not in the wild, soybeans. Some of the genes in these candidate regions were associated with soybean homologues of canonical domestication genes. However, several examples, which are likely specific to soybean or eudicot crop plants, were also observed. Consequently, the variation data identified in this study should be valuable for breeding and for identifying agronomically important genes in soybeans. However, the long LD of wild soybeans may hinder pinpointing causal gene(s) in the candidate regions. PMID:24271940
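
    Candidate domestication regions are commonly flagged by comparing nucleotide diversity (π, the mean number of pairwise differences per site) between wild and cultivated samples window by window. The abstract does not give the authors' exact statistic, so the minimal sketch below assumes plain π on aligned, equal-length sequences with no missing data, which mirrors the abstract's exclusion of sites with missing calls. The window sequences are toy inputs.

```python
from itertools import combinations
from typing import List


def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean number of pairwise differences per site (pi) for aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def diversity_ratio(wild: List[str], cultivated: List[str]) -> float:
    """pi_wild / pi_cultivated for one window; large values flag reduced cultivated diversity."""
    pi_cult = nucleotide_diversity(cultivated)
    return float("inf") if pi_cult == 0 else nucleotide_diversity(wild) / pi_cult
```

    Scanning windows along a chromosome and keeping those with unusually high ratios is one way to obtain regions where diversity is reduced in the cultivated but not the wild accessions.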

  16. The feasibility of genome-scale biological network inference using Graphics Processing Units.

    Science.gov (United States)

    Thiagarajan, Raghuram; Alavi, Amir; Podichetty, Jagdeep T; Bazil, Jason N; Beard, Daniel A

    2017-01-01

    Systems research spanning fields from biology to finance involves the identification of models to represent the underpinnings of complex systems. Formal approaches for data-driven identification of network interactions include statistical inference-based approaches and methods to identify dynamical systems models that are capable of fitting multivariate data. Availability of large data sets and so-called 'big data' applications in biology present great opportunities as well as major challenges for systems identification/reverse engineering applications. For example, both inverse identification and forward simulations of genome-scale gene regulatory network models pose compute-intensive problems. This issue is addressed here by combining the processing power of Graphics Processing Units (GPUs) and a parallel reverse engineering algorithm for inference of regulatory networks. It is shown that, given an appropriate data set, information on genome-scale networks (systems of 1000 or more state variables) can be inferred using a reverse-engineering algorithm in a matter of days on a small-scale modern GPU cluster.

  17. Scaling up: A guide to high throughput genomic approaches for biodiversity analysis.

    Science.gov (United States)

    Porter, Teresita M; Hajibabaei, Mehrdad

    2018-01-02

    The purpose of this review is to present the most common and emerging DNA-based methods used to generate data for biodiversity and biomonitoring studies. Since environmental assessment and monitoring programs may require biodiversity information at multiple levels, we pay particular attention to the DNA metabarcoding method and discuss a number of bioinformatic tools and considerations for producing DNA-based indicators using operational taxonomic units (OTUs), taxa at a variety of ranks, and community composition. By developing the capacity to harness the advantages provided by the newest technologies, investigators can 'scale up' by increasing the number of samples and replicates processed, the frequency of sampling over time and space, and even the depth of sampling, for example by sequencing more reads per sample or more markers per sample. The ability to scale up is made possible by the reduced hands-on time and cost per sample provided by the newest kits, platforms, and software tools. Results gleaned from broad-scale monitoring will provide an opportunity to address key scientific questions linked to biodiversity and its dynamics across time and space; they will also be more relevant to policy makers, enabling science-based decision making and a greater socio-economic impact. Since genomic approaches are continually evolving, we provide this guide to methods used in biodiversity genomics. This article is protected by copyright. All rights reserved.

  18. Evidence for substantial fine-scale variation in recombination rates across the human genome.

    Science.gov (United States)

    Crawford, Dana C; Bhangale, Tushar; Li, Na; Hellenthal, Garrett; Rieder, Mark J; Nickerson, Deborah A; Stephens, Matthew

    2004-07-01

    Characterizing fine-scale variation in human recombination rates is important, both to deepen understanding of the recombination process and to aid the design of disease association studies. Current genetic maps show that rates vary on a megabase scale, but studying finer-scale variation using pedigrees is difficult. Sperm-typing experiments have characterized regions where crossovers cluster into 1-2-kb hot spots, but technical difficulties limit the number of studies. An alternative is to use population variation to infer fine-scale characteristics of the recombination process. Several surveys reported 'block-like' patterns of diversity, which may reflect fine-scale recombination rate variation, but limitations of available methods made this impossible to assess. Here, we applied a new statistical method, which overcomes these limitations, to infer patterns of fine-scale recombination rate variation in 74 genes. We found extensive rate variation both within and among genes. In particular, recombination hot spots are a common feature of the human genome: 47% (35 of 74) of genes showed substantive evidence for a hot spot, and many more showed evidence for some rate variation. No primary sequence characteristics are consistently associated with precise hot-spot location, although G+C content and nucleotide diversity are correlated with local recombination rate.

  19. Genome Partitioner: A web tool for multi-level partitioning of large-scale DNA constructs for synthetic biology applications.

    Directory of Open Access Journals (Sweden)

    Matthias Christen

    Full Text Available Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remain a labor-intensive process. Given the complexity, computer-assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software, implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult-to-synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready-to-order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.

  20. Annotated Draft Genome Assemblies for the Northern Bobwhite (Colinus virginianus) and the Scaled Quail (Callipepla squamata) Reveal Disparate Estimates of Modern Genome Diversity and Historic Effective Population Size.

    Science.gov (United States)

    Oldeschulte, David L; Halley, Yvette A; Wilson, Miranda L; Bhattarai, Eric K; Brashear, Wesley; Hill, Joshua; Metz, Richard P; Johnson, Charles D; Rollins, Dale; Peterson, Markus J; Bickhart, Derek M; Decker, Jared E; Sewell, John F; Seabury, Christopher M

    2017-09-07

    Northern bobwhite (Colinus virginianus; hereafter bobwhite) and scaled quail (Callipepla squamata) populations have suffered precipitous declines across most of their US ranges. Illumina-based first- (v1.0) and second- (v2.0) generation draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, thereby producing a 45-fold improvement in contiguity over the existing bobwhite assembly, and ≥90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb) was ∼20% smaller than the bobwhite (v2.0 = 1.254 Gb), which was supported by k-mer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%), genome-wide repetitive content (10.40%; 10.43%), and MAKER-predicted protein coding genes (17,131; 17,165) were similar for the scaled quail (v1.0) and bobwhite (v2.0) assemblies, respectively. BUSCO analyses utilizing 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8%) and the bobwhite (v2.0; 82.5%), as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0), and 711 in the bobwhite genome (v2.0), including some that were shared among both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0) and bobwhite (v2.0) genomes, respectively, and estimates of historic effective population size were uniformly higher for the bobwhite across all time points in a coalescent model. However, large-scale declines were predicted for both species beginning ∼15-20 KYA. Copyright © 2017 Oldeschulte et al.
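
    The contiguity statistics quoted above follow standard definitions. N50 is the scaffold length L such that scaffolds of length ≥ L contain at least half of the assembled bases. The related count is the number of largest scaffolds needed to reach a given fraction, such as the 90% reported here. A minimal sketch on made-up scaffold lengths:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return N50: the length L such that scaffolds >= L hold at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no sequence")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable")


def scaffolds_to_fraction(lengths: Iterable[int], frac: float) -> int:
    """Number of largest scaffolds whose combined length reaches `frac` of the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= frac * total:
            return count
    raise ValueError("fraction must be in (0, 1]")
```

    For the toy lengths 2 through 10 (54 bases in total), N50 is 8 because 10 + 9 + 8 = 27 reaches half the assembly, and 7 scaffolds are needed to cover 90%.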

  1. Incorporating Protein Biosynthesis into the Saccharomyces cerevisiae Genome-scale Metabolic Model

    DEFF Research Database (Denmark)

    Olivares Hernandez, Roberto

    , translation initiation, translation elongation, translation termination, and mRNA decay. Considering this information on the mechanisms of transcription and translation, we will include these stoichiometric reactions in the genome-scale model for S. cerevisiae to obtain the first...... by a rapidly growing cell. To extend the model to include protein synthesis, a survey of the available literature made it possible to identify a few enzymatic reactions and gene functions in the early steps of gene expression for proteins: mRNA transcription, mRNA processing, mRNA export out of the nucleus...

  2. Reconstruction of genome-scale human metabolic models using omics data

    DEFF Research Database (Denmark)

    Ryu, Jae Yong; Kim, Hyun Uk; Lee, Sang Yup

    2015-01-01

    used to describe metabolic phenotypes of healthy and diseased human tissues and cells, and to predict therapeutic targets. Here we review recent trends in genome-scale human metabolic modeling, including various generic and tissue/cell type-specific human metabolic models developed to date, and methods...... refined through gap filling, reaction directionality assignment and the subcellular localization of metabolic reactions. We review relevant tools for this model refinement procedure as well. Finally, we suggest the direction of further studies on reconstructing an improved human metabolic model....

  3. Principles of proteome allocation are revealed using proteomic data and genome-scale models

    DEFF Research Database (Denmark)

    Yang, Laurence; Yurkovich, James T.; Lloyd, Colton J.

    2016-01-01

    Integrating omics data to refine or make context-specific models is an active field of constraint-based modeling. Proteomics now cover over 95% of the Escherichia coli proteome by mass. Genome-scale models of Metabolism and macromolecular Expression (ME) compute proteome allocation linked...... to metabolism and fitness. Using proteomics data, we formulated allocation constraints for key proteome sectors in the ME model. The resulting calibrated model effectively computed the "generalist" (wild-type) E. coli proteome and phenotype across diverse growth environments. Across 15 growth conditions...

  4. Reconstruction and modeling protein translocation and compartmentalization in Escherichia coli at the genome-scale

    DEFF Research Database (Denmark)

    Liu, Joanne K.; O’Brien, Edward J.; Lerman, Joshua A.

    2014-01-01

    cell morphology. Comparison of computations performed with this expanded ME-model, named iJL1678-ME, against available experimental data reveals that the model accurately describes translocation pathway expression and the functional proteome by compartmentalized mass. Conclusion: iJL1678-ME enables...... the functional content of membranes, cellular compartment-specific composition, and that it can be utilized to examine the effect of perturbing an expanded set of network components. iJL1678-ME takes a notable step towards the inclusion of cellular ultra-structure in genome-scale models....

  5. Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints

    DEFF Research Database (Denmark)

    Sanchez, Benjamin J.; Zhang, Xi-Cheng; Nilsson, Avlant

    2017-01-01

    Genome-scale metabolic models (GEMs) are widely used to calculate metabolic phenotypes. They rely on defining a set of constraints, the most common of which is that the production of metabolites and/or growth are limited by the carbon source uptake rate. However, enzyme abundances and kinetics......, which act as limitations on metabolic fluxes, are not taken into account. Here, we present GECKO, a method that enhances a GEM to account for enzymes as part of reactions, thereby ensuring that each metabolic flux does not exceed its maximum capacity, equal to the product of the enzyme's abundance...
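
    The text above is truncated before it states the full capacity expression. The minimal sketch below assumes the common form of an enzyme capacity constraint, in which a reaction's flux may not exceed the enzyme's abundance multiplied by its turnover number (kcat). Per the abstract, GECKO itself accounts for enzymes as part of reactions. This sketch does something simpler: it only tightens the forward upper bound of each reaction that has enzyme data. Reaction IDs, kcat values, abundances, and units are illustrative assumptions.

```python
from typing import Dict, Tuple


def enzyme_capped_bounds(
    bounds: Dict[str, Tuple[float, float]],
    kcat: Dict[str, float],       # assumed turnover numbers per reaction, 1/h
    abundance: Dict[str, float],  # assumed enzyme amounts per reaction, mmol/gDW
) -> Dict[str, Tuple[float, float]]:
    """Tighten each upper flux bound to min(original, kcat * abundance) when data exist."""
    capped = {}
    for rxn, (lb, ub) in bounds.items():
        if rxn in kcat and rxn in abundance:
            # kcat (1/h) * enzyme (mmol/gDW) gives a flux capacity in mmol/gDW/h.
            ub = min(ub, kcat[rxn] * abundance[rxn])
        capped[rxn] = (lb, ub)
    return capped
```

    Reactions without enzyme data keep their original bounds, so the constraint only bites where measurements exist.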

  6. MEMOSys 2.0: an update of the bioinformatics database for genome-scale models and genomic data.

    Science.gov (United States)

    Pabinger, Stephan; Snajder, Rene; Hardiman, Timo; Willi, Michaela; Dander, Andreas; Trajanoski, Zlatko

    2014-01-01

    The MEtabolic MOdel research and development System (MEMOSys) is a versatile database for the management, storage and development of genome-scale models (GEMs). Since its initial release, the database has undergone major improvements, and the new version introduces several new features. First, the novel concept of derived models allows users to create model hierarchies that automatically propagate modifications along their order. Second, all stored components can now be easily enhanced with additional annotations that can be directly extracted from a supplied Systems Biology Markup Language (SBML) file. Third, the web application has been substantially revised and now features new query mechanisms, an easy search system for reactions and new link-out services to publicly available databases. Fourth, the updated database now contains 20 publicly available models, which can be easily exported into standardized formats for further analysis. Fifth, MEMOSys 2.0 is now also available as a fully configured virtual image and can be found online at http://www.icbi.at/memosys and http://memoys.i-med.ac.at. Database URL: http://memosys.i-med.ac.at.

  7. Sexagesimal scale for mapping human genome

    Directory of Open Access Journals (Sweden)

    RICARDO CRUZ-COKE

    2001-03-01

    Full Text Available In a previous work I designed a diagram of the human genome based on a circular ideogram of the haploid set of chromosomes, using a low-resolution scale of megabase units. The purpose of this work is to draft a new scale to measure the physical map of the human genome at the highest resolution level. The entire length of the haploid genome of males is deployed in a circumference, marked with a sexagesimal scale with 360 degrees and 1296000 arc seconds. The radius of this circumference displays a semilogarithmic metric scale from 1 m to the nanometer level. The base-pair level of DNA sequences, 10^-9 of this circumference, is measured in milliarcsecond units (mas), each equivalent to a thousandth of an arcsecond. The "mas" unit corresponds to 1.27 nanometers (nm) or 0.427 base pair (bp), and it is the framework for measuring DNA sequences. Thus the three billion base pairs of the human genome may be identified by 1296000000 "mas" units in continuous correlation from number 1 to number 1296000000. This sexagesimal scale covers all the levels of the nuclear genetic material, from nucleotides to chromosomes. The locations of every codon and every gene may be numbered in the physical map of chromosome regions according to this new scale, instead of the partial kilobase and megabase scales used today. The advantage of the new scale is the unification of the set of chromosomes under a continuous scale of measurement at the DNA level, facilitating the correlation with the phenotypes of man and other species.
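
    The unit arithmetic behind the sexagesimal scale can be checked directly. 360 degrees × 3600 arcseconds gives 1,296,000 arcseconds, and 1000 mas per arcsecond gives 1,296,000,000 mas. The sketch below also maps a base-pair coordinate onto that circle proportionally. The function names and the genome length passed in are illustrative assumptions, not values taken from the paper.

```python
ARCSEC_PER_DEGREE = 3600
MAS_PER_ARCSEC = 1000  # one milliarcsecond (mas) is 1/1000 of an arcsecond


def circle_units() -> tuple:
    """Return (arcseconds, milliarcseconds) in a full 360-degree circle."""
    arcsec = 360 * ARCSEC_PER_DEGREE
    return arcsec, arcsec * MAS_PER_ARCSEC


def bp_to_mas(position_bp: int, genome_bp: int) -> float:
    """Map a base-pair coordinate onto the circle in mas, for an assumed genome length."""
    _, total_mas = circle_units()
    return position_bp / genome_bp * total_mas
```

    With an assumed genome length of three billion base pairs, the final base maps to 1,296,000,000 mas and the midpoint maps to 648,000,000 mas.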

  8. The wolf reference genome sequence (Canis lupus lupus) and its implications for Canis spp. population genomics

    DEFF Research Database (Denmark)

    Gopalakrishnan, Shyam; Samaniego Castruita, Jose Alfredo; Sinding, Mikkel Holger Strander

    2017-01-01

    Background An increasing number of studies are addressing the evolutionary genomics of dog domestication, principally through resequencing dog, wolf and related canid genomes. There is, however, only one de novo assembled canid genome currently available against which to map such data...... - that of a boxer dog (Canis lupus familiaris). We generated the first de novo wolf genome (Canis lupus lupus) as an additional choice of reference, and explored what implications may arise when previously published dog and wolf resequencing data are remapped to this reference. Results Reassuringly, we find......, then using the boxer reference genome is appropriate, but if the aim of the study is to look at the variation within wolves and their relationships to dogs, then there are clear benefits to using the de novo assembled wolf reference genome....

  9. Chromosome-scale comparative sequence analysis unravels molecular mechanisms of genome evolution between two wheat cultivars

    KAUST Repository

    Thind, Anupriya Kaur

    2018-02-08

    Background: Recent improvements in DNA sequencing and genome scaffolding have paved the way to generate high-quality de novo assemblies of pseudomolecules representing complete chromosomes of wheat and its wild relatives. These assemblies form the basis to compare the evolutionary dynamics of wheat genomes on a megabase-scale. Results: Here, we provide a comparative sequence analysis of the 700-megabase chromosome 2D between two bread wheat genotypes, the old landrace Chinese Spring and the elite Swiss spring wheat line CH Campala Lr22a. There was a high degree of sequence conservation between the two chromosomes. Analysis of large structural variations revealed four large insertions/deletions (InDels) of >100 kb. Based on the molecular signatures at the breakpoints, unequal crossing over and double-strand break repair were identified as the evolutionary mechanisms that caused these InDels. Three of the large InDels affected copy number of NLRs, a gene family involved in plant immunity. Analysis of single nucleotide polymorphism (SNP) density revealed three haploblocks of 8 Mb, 9 Mb and 48 Mb with a 35-fold increased SNP density compared to the rest of the chromosome. Conclusions: This comparative analysis of two high-quality chromosome assemblies enabled a comprehensive assessment of large structural variations. The insight obtained from this analysis will form the basis of future wheat pan-genome studies.

  10. What can genome-scale metabolic network reconstructions do for prokaryotic systematics?

    Science.gov (United States)

    Barona-Gómez, Francisco; Cruz-Morales, Pablo; Noda-García, Lianet

    2012-01-01

    It has recently been proposed that in addition to Nomenclature, Classification and Identification, Comprehending Microbial Diversity may be considered as the fourth tenet of microbial systematics [Staley JT (2010) The Bulletin of BISMiS, 1(1): 1-5]. As this fourth goal implies a fundamental understanding of microbial speciation, this perspective article argues that translation of bacterial genome sequences into metabolic features may contribute to the development of modern polyphasic taxonomic approaches. Genome-scale metabolic network reconstructions (GSMRs), which are the result of computationally predicted and experimentally confirmed stoichiometric matrices incorporating all enzyme and metabolite components encoded by a genome sequence, provide a platform that can illustrate bacterial speciation. As the topology and the composition of GSMRs are expected to be the result of adaptive evolution, the features of these networks may provide the prokaryotic taxonomist with novel tools for reaching the fourth tenet of microbial systematics. Through selected examples from the Actinobacteria, which have been inferred from GSMRs and experimentally confirmed after phenotypic characterisation, it will be shown that this level of information can be incorporated into modern polyphasic taxonomic approaches. In conclusion, three specific examples are illustrated to show how GSMRs will revolutionize prokaryotic systematics, as has previously occurred in many other fields of microbiology.
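
    A stoichiometric matrix S has one row per metabolite and one column per reaction. Constraint-based models built from GSMRs usually require steady state, S·v = 0, so that no internal metabolite accumulates. The minimal sketch below checks that condition for a made-up three-reaction chain (uptake → A, A → B, B → export). It is illustrative only and does not reproduce any Actinobacteria reconstruction.

```python
from typing import List

Matrix = List[List[float]]


def net_production(S: Matrix, v: List[float]) -> List[float]:
    """Return S @ v: net production rate of each metabolite (rows) for fluxes v (columns)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S: Matrix, v: List[float], tol: float = 1e-9) -> bool:
    """Steady-state mass balance used in constraint-based models: S v = 0."""
    return all(abs(x) <= tol for x in net_production(S, v))


# Toy network: columns are R1 (-> A), R2 (A -> B), R3 (B ->); rows are metabolites A and B.
S_TOY: Matrix = [
    [1, -1, 0],   # A: produced by R1, consumed by R2
    [0, 1, -1],   # B: produced by R2, consumed by R3
]
```

    Equal fluxes through the chain, such as [2, 2, 2], satisfy steady state. Fluxes [2, 1, 1] leave A accumulating at a net rate of 1.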

  11. Genome-scale metabolic network validation of Shewanella oneidensis using transposon insertion frequency analysis.

    Directory of Open Access Journals (Sweden)

    Hong Yang

    2014-09-01

    Full Text Available Transposon mutagenesis, in combination with parallel sequencing, is becoming a powerful tool for en-masse mutant analysis. A probability generating function was used to explain observed miniHimar transposon insertion patterns, and gene essentiality calls were made by transposon insertion frequency analysis (TIFA). TIFA incorporated the observed genome and sequence motif bias of the miniHimar transposon. The gene essentiality calls were compared to: (1) previous genome-wide direct gene-essentiality assignments; and (2) flux balance analysis (FBA) predictions from an existing genome-scale metabolic model of Shewanella oneidensis MR-1. A three-way comparison between FBA, TIFA, and the direct essentiality calls was made to validate the TIFA approach. The refinement in the interpretation of observed transposon insertions demonstrated that genes without insertions are not necessarily essential, and that genes that contain insertions are not always nonessential. The TIFA calls were in reasonable agreement with direct essentiality calls for S. oneidensis, but agreed more closely with E. coli essentiality calls for orthologs. The TIFA gene essentiality calls were in good agreement with the MR-1 FBA essentiality predictions, and the agreement between TIFA and FBA predictions was substantially better than between the FBA and the direct gene essentiality predictions.
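
    A simplified probability model shows why a gene without insertions is not necessarily essential. If n insertions land independently and uniformly across a genome of length G, a nonessential gene of length L receives none with probability (1 − L/G)^n. This sketch deliberately ignores the genome and motif bias that TIFA explicitly models, and the gene length, genome length, and insertion count in the usage note are hypothetical.

```python
def p_no_insertion(gene_len: int, genome_len: int, n_insertions: int) -> float:
    """Probability a gene receives zero of n independent, uniformly placed insertions."""
    if not 0 < gene_len <= genome_len:
        raise ValueError("gene length must be positive and no longer than the genome")
    # Each insertion misses the gene with probability (1 - L/G).
    return (1.0 - gene_len / genome_len) ** n_insertions
```

    For a hypothetical 300 bp gene in a 5 Mb genome with 10,000 insertions, the result is roughly 0.55. Under this idealized model, more than half of such short, nonessential genes would show no insertions purely by chance.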

  12. Genome sequences of Phytophthora enable translational plant disease management and accelerate research

    Science.gov (United States)

    Niklaus J. Grünwald

    2012-01-01

    Whole and partial genome sequences are becoming available at an ever-increasing pace. For many plant pathogen systems, we are moving into the era of genome resequencing. The first Phytophthora genomes, P. ramorum and P. sojae, became available in 2004, followed shortly by P. infestans...

  13. Simplified large-scale Sanger genome sequencing for influenza A/H3N2 virus.

    Directory of Open Access Journals (Sweden)

    Hong Kai Lee

    Full Text Available BACKGROUND: The advent of next-generation sequencing technologies and the resultant lower costs of sequencing have enabled production of massive amounts of data, including the generation of full genome sequences of pathogens. However, the small genome size of the influenza virus arguably justifies the use of the more conventional Sanger sequencing technology, which is still more readily available in most diagnostic laboratories. RESULTS: We present a simplified Sanger-based genome sequencing method for sequencing the influenza A/H3N2 virus in a large-scale format. The entire genome sequencing was completed with 19 reverse transcription-polymerase chain reactions (RT-PCRs) and 39 sequencing reactions. This method was tested on 15 native clinical samples and 15 culture isolates, collected between 2009 and 2011. The 15 native clinical samples registered quantification cycle values ranging from 21.0 to 30.56, which were equivalent to 2.4×10^3 to 1.4×10^6 viral copies/µL of RNA extract. All the PCR-amplified products were sequenced directly without PCR product purification. Notably, high-quality sequencing data up to 700 bp were generated for all the samples tested. The completed sequence covered 408,810 nucleotides in total, with 13,627 nucleotides per genome, attaining 100% coding completeness. Of all the bases produced, an average of 89.49% were Phred quality value 40 (QV40) bases or higher (representing an accuracy of circa one miscall for every 10,000 bases), and an average of 93.46% were QV30 bases or higher (one miscall every 1000 bases). CONCLUSIONS: This sequencing protocol has been shown to be cost-effective and less labor-intensive in obtaining full influenza genomes. The consistently high quality of the sequences generated imparts confidence in extending the application of this non-purified amplicon sequencing approach to other gene sequencing assays, with appropriate use of suitably designed primers.
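
    The accuracy figures quoted for QV40 and QV30 come from the Phred scale, where a quality value Q corresponds to an error probability P = 10^(−Q/10), or equivalently Q = −10·log10(P). A minimal sketch of both directions:

```python
import math


def phred_error_probability(q: float) -> float:
    """Error probability for Phred quality q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def phred_quality(p_error: float) -> float:
    """Inverse mapping: Q = -10 * log10(P)."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p_error)
```

    For example, phred_error_probability(40) gives 1e-4 (one miscall per 10,000 bases) and phred_error_probability(30) gives 1e-3. The reported totals are also self-consistent: 30 genomes (15 clinical samples plus 15 isolates) × 13,627 nt = 408,810 nt.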

  14. A forest-based feature screening approach for large-scale genome data with complex structures.

    Science.gov (United States)

    Wang, Gang; Fu, Guifang; Corcoran, Christopher

    2015-12-23

    Genome-wide association studies (GWAS) interrogate whole genomes at large scale to characterize the complex genetic architecture of biomedical traits. When the number of SNPs increases dramatically to half a million but the sample size is still limited to thousands, traditional p-value-based statistical approaches face unprecedented limitations. Feature screening has proved to be an effective and powerful approach to handle ultrahigh-dimensional data statistically, yet it has not received much attention in GWAS. Feature screening reduces the feature space from millions to hundreds by removing non-informative noise. However, the univariate measures used to rank features are mainly based on individual effects, without considering mutual interactions with other features. In this article, we explore the performance of a random forest (RF) based feature screening procedure to emphasize the SNPs that have complex effects on a continuous phenotype. Both simulation and real data analysis are conducted to examine the power of the forest-based feature screening. We compare it with five other popular feature screening approaches via simulation and conclude that RF can serve as a decent feature screening tool to accommodate complex genetic effects such as nonlinear, interactive, correlative, and joint effects. Unlike the traditional p-value-based Manhattan plot, we use the Permutation Variable Importance Measure (PVIM) to display the relative significance and believe that it will provide as much useful information as the traditional plot. Most complex traits are found to be regulated by epistatic and polygenic variants. The forest-based feature screening is shown to be an efficient, easily implemented, and accurate approach to cope with whole-genome data with complex structures. Our explorations should add to a growing body of work extending feature screening to better serve the demands of contemporary genome data.
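
    Permutation variable importance scores a feature by how much prediction error grows when that feature's column is shuffled. Shuffling breaks the feature's link to the outcome while keeping its marginal distribution. The minimal sketch below implements only this permutation step around a stand-in predictor, not a trained random forest, and evaluates on whatever data is supplied rather than on out-of-bag samples. In practice one might pair a real forest (for example scikit-learn's RandomForestRegressor) with a library routine such as sklearn.inspection.permutation_importance. The toy model and data are assumptions.

```python
import random
from typing import Callable, List

Row = List[float]


def mse(model: Callable[[Row], float], X: List[Row], y: List[float]) -> float:
    """Mean squared error of the model's predictions."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], float], X: List[Row], y: List[float],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled (larger = more important)."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            increase += mse(model, X_perm, y) - base
        scores.append(increase / n_repeats)
    return scores
```

    With a purely interactive toy signal y = x0·x1 and an unused third feature, both interacting features receive positive importance and the unused one scores exactly zero. A univariate screen that tests each feature's marginal effect can miss this kind of interaction.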

  15. Quantitative assessment of thermodynamic constraints on the solution space of genome-scale metabolic models.

    Science.gov (United States)

    Hamilton, Joshua J; Dwivedi, Vivek; Reed, Jennifer L

    2013-07-16

    Constraint-based methods provide powerful computational techniques to allow understanding and prediction of cellular behavior. These methods rely on physicochemical constraints to eliminate infeasible behaviors from the space of available behaviors. One such constraint is thermodynamic feasibility, the requirement that intracellular flux distributions obey the laws of thermodynamics. The past decade has seen several constraint-based methods that interpret this constraint in different ways, including those that are limited to small networks, rely on predefined reaction directions, and/or neglect the relationship between reaction free energies and metabolite concentrations. In this work, we utilize one such approach, thermodynamics-based metabolic flux analysis (TMFA), to make genome-scale, quantitative predictions about metabolite concentrations and reaction free energies in the absence of prior knowledge of reaction directions, while accounting for uncertainties in thermodynamic estimates. We applied TMFA to a genome-scale network reconstruction of Escherichia coli and examined the effect of thermodynamic constraints on the flux space. We also assessed the predictive performance of TMFA against gene essentiality and quantitative metabolomics data, under both aerobic and anaerobic, and optimal and suboptimal growth conditions. Based on these results, we propose that TMFA is a useful tool for validating phenotypes and generating hypotheses, and that additional types of data and constraints can improve predictions of metabolite concentrations. Copyright © 2013 Biophysical Society. Published by Elsevier Inc. All rights reserved.
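
    The link between reaction free energies and metabolite concentrations referred to above is ΔG = ΔG°' + RT·Σ s_i·ln c_i, where s_i < 0 for substrates and s_i > 0 for products. The second law then requires any nonzero flux to run in the direction of negative ΔG. TMFA embeds this relationship as constraints inside an optimization. The minimal sketch below only evaluates it for fixed, made-up values (ΔG°' in kJ/mol, concentrations in mol/L) and does not model estimate uncertainty.

```python
import math
from typing import Dict

R = 8.314e-3  # gas constant, kJ/(mol*K)


def reaction_gibbs(dg0: float, stoich: Dict[str, float], conc: Dict[str, float],
                   T: float = 298.15) -> float:
    """Delta G = Delta G0' + R*T*sum(s_i * ln c_i); s_i < 0 for substrates, > 0 for products."""
    return dg0 + R * T * sum(s * math.log(conc[m]) for m, s in stoich.items())


def direction_feasible(flux: float, dg: float) -> bool:
    """Second-law check: a nonzero flux must run in the direction of negative Delta G."""
    return flux == 0 or flux * dg < 0
```

    For a hypothetical reaction A → B with ΔG°' = +5 kJ/mol, [A] = 10 mM and [B] = 10 µM, the concentration term is about −17 kJ/mol. That makes ΔG negative, so the forward direction becomes feasible despite the unfavourable standard free energy.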

  16. A Method to Constrain Genome-Scale Models with 13C Labeling Data.

    Directory of Open Access Journals (Sweden)

    Héctor García Martín

    2015-09-01

    Current limitations in quantitatively predicting biological behavior hinder our efforts to engineer biological systems to produce biofuels and other desired chemicals. Here, we present a new method for calculating metabolic fluxes, key targets in metabolic engineering, that incorporates data from 13C labeling experiments and genome-scale models. The data from 13C labeling experiments provide strong flux constraints that eliminate the need to assume an evolutionary optimization principle such as the growth rate optimization assumption used in Flux Balance Analysis (FBA). This effective constraining is achieved by making the simple but biologically relevant assumption that flux flows from core to peripheral metabolism and does not flow back. The new method is significantly more robust than FBA with respect to errors in genome-scale model reconstruction. Furthermore, it can provide a comprehensive picture of metabolite balancing and predictions for unmeasured extracellular fluxes as constrained by 13C labeling data. A comparison shows that the results of this new method are similar to those found through 13C Metabolic Flux Analysis (13C MFA) for central carbon metabolism but, additionally, it provides flux estimates for peripheral metabolism. The extra validation gained by matching 48 relative labeling measurements is used to identify where and why several existing COnstraint Based Reconstruction and Analysis (COBRA) flux prediction algorithms fail. We demonstrate how to use this knowledge to refine these methods and improve their predictive capabilities. This method provides a reliable base upon which to improve the design of biological systems.

  17. An experimentally-supported genome-scale metabolic network reconstruction for Yersinia pestis CO92

    Directory of Open Access Journals (Sweden)

    Motin Vladimir L

    2011-10-01

    Background: Yersinia pestis is a Gram-negative bacterium that causes plague, a disease linked historically to the Black Death in Europe during the Middle Ages and to several outbreaks during the modern era. Metabolism in Y. pestis displays remarkable flexibility and robustness, allowing the bacterium to proliferate in both warm-blooded mammalian hosts and cold-blooded insect vectors such as fleas. Results: Here we report a genome-scale reconstruction and mathematical model of metabolism for Y. pestis CO92 and supporting experimental growth and metabolite measurements. The model contains 815 genes, 678 proteins, 963 unique metabolites and 1678 reactions, accurately simulates growth on a range of carbon sources both qualitatively and quantitatively, and identifies gaps in several key biosynthetic pathways and suggests how those gaps might be filled. Furthermore, our model presents hypotheses to explain certain known nutritional requirements characteristic of this strain. Conclusions: Y. pestis continues to be a dangerous threat to human health during modern times. The Y. pestis genome-scale metabolic reconstruction presented here, which has been benchmarked against experimental data and correctly reproduces known phenotypes, provides an in silico platform with which to investigate the metabolism of this important human pathogen.

  18. An Experimentally-Supported Genome-Scale Metabolic Network Reconstruction for Yersinia pestis CO92

    Energy Technology Data Exchange (ETDEWEB)

    Charusanti, Pep; Chauhan, Sadhana; Mcateer, Kathleen; Lerman, Joshua A.; Hyduke, Daniel R.; Motin, Vladimir L.; Ansong, Charles; Adkins, Joshua N.; Palsson, Bernhard O.

    2011-10-13

    Yersinia pestis is a Gram-negative bacterium that causes plague, a disease linked historically to the Black Death in Europe during the Middle Ages and to several outbreaks during the modern era. Metabolism in Y. pestis displays remarkable flexibility and robustness, allowing the bacterium to proliferate in both warm-blooded mammalian hosts and cold-blooded insect vectors such as fleas. Here we report a genome-scale reconstruction and mathematical model of metabolism for Y. pestis CO92 and supporting experimental growth and metabolite measurements. The model contains 815 genes, 678 proteins, 963 unique metabolites and 1678 reactions, accurately simulates growth on a range of carbon sources both qualitatively and quantitatively, and identifies gaps in several key biosynthetic pathways and suggests how those gaps might be filled. Furthermore, our model presents hypotheses to explain certain known nutritional requirements characteristic of this strain. Y. pestis continues to be a dangerous threat to human health during modern times. The Y. pestis genome-scale metabolic reconstruction presented here, which has been benchmarked against experimental data and correctly reproduces known phenotypes, thus provides an in silico platform with which to investigate the metabolism of this important human pathogen.

  19. A mixed-integer linear programming approach to the reduction of genome-scale metabolic networks.

    Science.gov (United States)

    Röhl, Annika; Bockmayr, Alexander

    2017-01-03

    Constraint-based analysis has become a widely used method to study metabolic networks. While some of the associated algorithms can be applied to genome-scale network reconstructions with several thousand reactions, others are limited to small or medium-sized models. In 2015, Erdrich et al. introduced a method called NetworkReducer, which reduces large metabolic networks to smaller subnetworks while preserving a set of biological requirements that can be specified by the user. As early as 2001, Burgard et al. developed a mixed-integer linear programming (MILP) approach for computing minimal reaction sets under a given growth requirement. Here we present an MILP approach for computing minimum subnetworks with the given properties. Minimality (with respect to the number of active reactions) is not guaranteed by NetworkReducer, while the method by Burgard et al. does not allow the different biological requirements to be specified. Our procedure is about 5-10 times faster than NetworkReducer and can enumerate all minimum subnetworks when several exist. This makes it possible to identify common reactions that are present in all subnetworks, as well as reactions appearing in alternative pathways. Applying complex analysis methods to genome-scale metabolic networks is often not possible in practice, so it may become necessary to reduce the size of the network while keeping important functionalities. We propose an MILP solution to this problem. Compared to previous work, our approach is more efficient and can compute not just one but all minimum subnetworks satisfying the required properties.
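
    The minimum-subnetwork problem can be made concrete with a brute-force Python sketch. This is not the authors' MILP, and it differs in two ways:
    - It enumerates reaction subsets by increasing size, which is only feasible for toy networks.
    - It replaces user-specified flux requirements with a simple producibility test (network expansion from seed metabolites), which ignores steady-state mass balance.

    The toy network and metabolite names in the usage are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (substrates, products)

def producible(reactions: Iterable[Reaction], seeds: Set[str]) -> Set[str]:
    """Network expansion: fire any reaction whose substrates are all available."""
    reactions = list(reactions)  # may be a generator; we iterate it repeatedly
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available

def minimum_subnetworks(
    network: Dict[str, Reaction], seeds: Set[str], targets: Set[str]
) -> List[Set[str]]:
    """All smallest reaction sets that still produce every target (brute force)."""
    names = sorted(network)
    for k in range(len(names) + 1):
        hits = [
            set(combo)
            for combo in combinations(names, k)
            if targets <= producible((network[n] for n in combo), seeds)
        ]
        if hits:
            return hits  # stop at the first size that works: these are minimum
    return []
```

    Intersecting the returned sets (`set.intersection(*hits)`) gives the reactions common to all minimum subnetworks. The remaining reactions belong to alternative pathways.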

  20. A genome-scale metabolic model of the lipid-accumulating yeast Yarrowia lipolytica

    Directory of Open Access Journals (Sweden)

    Loira Nicolas

    2012-05-01

    Background: Yarrowia lipolytica is an oleaginous yeast which has emerged as an important microorganism for several biotechnological processes, such as the production of organic acids, lipases and proteases. It is also considered a good candidate for single-cell oil production. Although some of its metabolic pathways are well studied, its metabolic engineering is hindered by the lack of a genome-scale model that integrates the current knowledge about its metabolism. Results: Combining in silico tools and expert manual curation, we have produced an accurate genome-scale metabolic model for Y. lipolytica. Using a scaffold derived from a functional metabolic model of the well-studied but phylogenetically distant yeast S. cerevisiae, we mapped conserved reactions, rewrote gene associations, added species-specific reactions and inserted specialized copies of scaffold reactions to account for species-specific expansion of protein families. We used physiological measures obtained under lab conditions to validate our predictions. Conclusions: Y. lipolytica iNL895 represents the first well-annotated metabolic model of an oleaginous yeast, providing a base for future metabolic improvement, and a starting point for the metabolic reconstruction of other species in the Yarrowia clade and other oleaginous yeasts.

  1. Comprehensive Mapping of Pluripotent Stem Cell Metabolism Using Dynamic Genome-Scale Network Modeling

    Directory of Open Access Journals (Sweden)

    Sriram Chandrasekaran

    2017-12-01

    Summary: Metabolism is an emerging stem cell hallmark tied to cell fate, pluripotency, and self-renewal, yet systems-level understanding of stem cell metabolism has been limited by the lack of genome-scale network models. Here, we develop a systems approach to integrate time-course metabolomics data with a computational model of metabolism to analyze the metabolic state of naive and primed murine pluripotent stem cells. Using this approach, we find that one-carbon metabolism involving phosphoglycerate dehydrogenase, folate synthesis, and nucleotide synthesis is a key pathway that differs between the two states, resulting in differential sensitivity to anti-folates. The model also predicts that the pluripotency factor Lin28 regulates this one-carbon metabolic pathway, which we validate using metabolomics data from Lin28-deficient cells. Moreover, we identify and validate metabolic reactions related to S-adenosyl-methionine production that can differentially impact histone methylation in naive and primed cells. Our network-based approach provides a framework for characterizing metabolic changes influencing pluripotency and cell fate. Chandrasekaran et al. use computational modeling, metabolomics, and metabolic inhibitors to discover metabolic differences between various pluripotent stem cell states and infer their impact on stem cell fate decisions. Keywords: systems biology, stem cell biology, metabolism, genome-scale modeling, pluripotency, histone methylation, naive (ground state), primed state, cell fate, metabolic network

  2. Data partitioning enables the use of standard SOAP Web Services in genome-scale workflows.

    Science.gov (United States)

    Sztromwasser, Pawel; Puntervoll, Pål; Petersen, Kjell

    2011-07-26

    Biological databases and computational biology tools are provided by research groups around the world and made accessible on the Web. Combining these resources is common practice in bioinformatics, but integrating heterogeneous and often distributed tools and datasets can be challenging. To date, this challenge has commonly been addressed pragmatically, by tedious and error-prone scripting. Recently, however, a more reliable technique has been identified and proposed as the platform that would tie together bioinformatics resources, namely Web Services. Over the last decade, Web Services have spread widely in bioinformatics and earned the title of recommended technology. However, in the era of high-throughput experimentation, a major concern regarding Web Services is their ability to handle large-scale data traffic. We propose a stream-like communication pattern for standard SOAP Web Services that enables efficient flow of large data traffic between a workflow orchestrator and Web Services. We evaluated the data-partitioning strategy by comparing it with typical communication patterns on an example pipeline for genomic sequence annotation. The results show that data partitioning lowers the resource demands of services and increases their throughput, which in turn makes it possible to execute in silico experiments at genome scale using standard SOAP Web Services and workflows. As a proof of principle, we annotated an RNA-seq dataset using a plain BPEL workflow engine.
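
    The data-partitioning pattern can be sketched in Python. Rather than sending one large message, the orchestrator streams fixed-size partitions to a service and concatenates the results.
    - `service` is a hypothetical callable standing in for a SOAP operation.
    - Partitions are sent one after another here. A workflow engine could instead overlap partitions across pipeline stages.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")

def partitions(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` records."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def run_partitioned(
    records: Iterable[T], service: Callable[[List[T]], List[U]], size: int
) -> List[U]:
    """Call `service` once per partition, keeping each message small."""
    results: List[U] = []
    for chunk in partitions(records, size):
        results.extend(service(chunk))
    return results
```

    Because `partitions` is a generator, the full input never has to be held in a single request. The partition size is the knob that trades per-call overhead against message size.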

  3. Network Thermodynamic Curation of Human and Yeast Genome-Scale Metabolic Models

    Science.gov (United States)

    Martínez, Verónica S.; Quek, Lake-Ee; Nielsen, Lars K.

    2014-01-01

    Genome-scale models are used for an ever-widening range of applications. Although there has been much focus on specifying the stoichiometric matrix, the predictive power of genome-scale models equally depends on reaction directions. Two-thirds of reactions in the two eukaryotic reconstructions Homo sapiens Recon 1 and Yeast 5 are specified as irreversible. However, these specifications are mainly based on biochemical textbooks or on their similarity to other organisms and are rarely underpinned by detailed thermodynamic analysis. In this study, a workflow, new to our knowledge, combining network-embedded thermodynamic and flux variability analysis was used to evaluate existing irreversibility constraints in Recon 1 and Yeast 5 and to identify new ones. A total of 27 and 16 new irreversible reactions were identified in Recon 1 and Yeast 5, respectively, whereas only four reactions were found with directions incorrectly specified against thermodynamics (three in Yeast 5 and one in Recon 1). The workflow further identified for both models several isolated internal loops that require further curation. The framework also highlighted the need for substrate channeling (in human) and ATP hydrolysis (in yeast) for the essential reaction catalyzed by phosphoribosylaminoimidazole carboxylase in purine metabolism. Finally, the framework highlighted differences in proline metabolism between yeast (cytosolic anabolism and mitochondrial catabolism) and humans (exclusively mitochondrial metabolism). We conclude that network-embedded thermodynamics facilitates the specification and validation of irreversibility constraints in compartmentalized metabolic models, at the same time providing further insight into network properties. PMID:25028891

  4. Quantitative Assessment of Thermodynamic Constraints on the Solution Space of Genome-Scale Metabolic Models

    Science.gov (United States)

    Hamilton, Joshua J.; Dwivedi, Vivek; Reed, Jennifer L.

    2013-01-01

    Constraint-based methods provide powerful computational techniques to allow understanding and prediction of cellular behavior. These methods rely on physicochemical constraints to eliminate infeasible behaviors from the space of available behaviors. One such constraint is thermodynamic feasibility, the requirement that intracellular flux distributions obey the laws of thermodynamics. The past decade has seen several constraint-based methods that interpret this constraint in different ways, including those that are limited to small networks, rely on predefined reaction directions, and/or neglect the relationship between reaction free energies and metabolite concentrations. In this work, we utilize one such approach, thermodynamics-based metabolic flux analysis (TMFA), to make genome-scale, quantitative predictions about metabolite concentrations and reaction free energies in the absence of prior knowledge of reaction directions, while accounting for uncertainties in thermodynamic estimates. We applied TMFA to a genome-scale network reconstruction of Escherichia coli and examined the effect of thermodynamic constraints on the flux space. We also assessed the predictive performance of TMFA against gene essentiality and quantitative metabolomics data, under both aerobic and anaerobic, and optimal and suboptimal growth conditions. Based on these results, we propose that TMFA is a useful tool for validating phenotypes and generating hypotheses, and that additional types of data and constraints can improve predictions of metabolite concentrations. PMID:23870272

  5. Computational Prediction of Synthetic Lethals in Genome-Scale Metabolic Models Using Fast-SL.

    Science.gov (United States)

    Raman, Karthik; Pratapa, Aditya; Mohite, Omkar; Balachandran, Shankar

    2018-01-01

    In this chapter, we describe Fast-SL, an in silico approach to predict synthetic lethals in genome-scale metabolic models. Synthetic lethals are sets of genes or reactions where only the simultaneous removal of all genes or reactions in the set abolishes growth of an organism. In silico approaches to predict synthetic lethals are based on Flux Balance Analysis (FBA), a popular constraint-based analysis method based on linear programming. FBA has been shown to accurately predict the viability of various genome-scale metabolic models. Fast-SL builds on the framework of FBA and enables the prediction of synthetic lethal reactions or genes in different organisms, under various environmental conditions. Predicting synthetic lethals in metabolic network models allows us to generate hypotheses on possible novel genetic interactions and potential candidates for combinatorial therapy in the case of pathogenic organisms. Here, we summarize the Fast-SL approach for analyzing metabolic networks and detail the procedure to predict synthetic lethals in any given metabolic model. We illustrate the approach by predicting synthetic lethals in Escherichia coli. The Fast-SL implementation for MATLAB is available from https://github.com/RamanLab/FastSL/ .
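
    The definition of a synthetic lethal can be illustrated with a Python sketch that enumerates single and double reaction deletions.
    - The `viable` argument is a hypothetical stand-in for an FBA growth check, true when predicted growth exceeds a cutoff.
    - In the toy usage, `viable` simply asks whether any alternative pathway survives.
    - Pairs containing a lethal single deletion are skipped, since by definition they cannot be synthetic lethal.
    - Fast-SL goes further and restricts candidates to reactions that carry flux in FBA solutions. This exhaustive sketch omits that pruning step.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Set, Tuple

def find_lethals(
    reactions: Iterable[str], viable: Callable[[FrozenSet[str]], bool]
) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Return (lethal single deletions, synthetic lethal pairs)."""
    reactions = sorted(reactions)
    if not viable(frozenset()):
        raise ValueError("wild type must be viable")
    singles = {r for r in reactions if not viable(frozenset([r]))}
    # only non-lethal singles can take part in a synthetic lethal pair
    candidates = [r for r in reactions if r not in singles]
    pairs = {
        frozenset(pair)
        for pair in combinations(candidates, 2)
        if not viable(frozenset(pair))  # lethal only when both are removed
    }
    return singles, pairs
```

    In a toy network with two alternative routes that share one reaction, the shared reaction is an essential single deletion. Every pair that knocks out one reaction from each route is a synthetic lethal pair.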

  6. Genome scale evolution of myxoma virus reveals host-pathogen adaptation and rapid geographic spread.

    Science.gov (United States)

    Kerr, Peter J; Rogers, Matthew B; Fitch, Adam; Depasse, Jay V; Cattadori, Isabella M; Twaddle, Alan C; Hudson, Peter J; Tscharke, David C; Read, Andrew F; Holmes, Edward C; Ghedin, Elodie

    2013-12-01

    The evolutionary interplay between myxoma virus (MYXV) and the European rabbit (Oryctolagus cuniculus) following release of the virus in Australia in 1950 as a biological control is a classic example of host-pathogen coevolution. We present a detailed genomic and phylogeographic analysis of 30 strains of MYXV, including the Australian progenitor strain Standard Laboratory Strain (SLS), 24 Australian viruses isolated from 1951 to 1999, and three isolates from the early radiation in Britain from 1954 and 1955. We show that in Australia MYXV has spread rapidly on a spatial scale, with multiple lineages cocirculating within individual localities, and that both highly virulent and attenuated viruses were still present in the field through the 1990s. In addition, the detection of closely related virus lineages at sites 1,000 km apart suggests that MYXV moves freely in geographic space, with mosquitoes, fleas, and rabbit migration all providing means of transport. Strikingly, despite multiple introductions, all modern viruses appear to be ultimately derived from the original introductions of SLS. The rapidity of MYXV evolution was also apparent at the genomic scale, with gene duplications documented in a number of viruses. Duplication of potential virulence genes may be important in increasing the expression of virulence proteins and provides the basis for the evolution of novel functions. Mutations leading to loss of open reading frames were surprisingly frequent and in some cases may explain attenuation, but no common mutations that correlated with virulence or attenuation were identified.

  7. Reconstruction and analysis of the genome-scale metabolic model of Lactobacillus casei LC2W.

    Science.gov (United States)

    Xu, Nan; Liu, Jie; Ai, Lianzhong; Liu, Liming

    2015-01-10

    Lactobacillus casei LC2W is a recently isolated probiotic lactic acid bacterial strain that is widely used in the dairy and pharmaceutical industries and in clinical medicine. The first genome-scale metabolic model for L. casei, composed of 846 genes, 969 metabolic reactions, and 785 metabolites, was reconstructed using both manual genome annotation and an automatic SEED model. The iJL846 model was then validated by simulating cell growth on 15 reported carbon sources. The iJL846 model was used to explore the metabolism of L. casei on a genome scale: (1) explanation of the genetic codes: the metabolic functions of 342 genes were reannotated in this model; (2) characterization of the physiology: 10 amino acids and 7 vitamins were identified as essential nutrients for L. casei LC2W growth; (3) analysis of metabolic pathways: the transport and metabolism of the 17 essential nutrients and exopolysaccharide (EPS) biosynthesis were analyzed; (4) exploration of metabolic capacity: for lactate, the importance of genes in its biosynthetic pathways was evaluated, and the amino acid requirements for mixed acid fermentation were predicted; for flavor compounds, the effects of oxygen were analyzed, and three new knockout targets were selected for acetoin production; for EPS, 11 types of nutrients in the rich medium and important reactions in the biosynthetic pathway were identified that enhanced EPS production. In conclusion, the iJL846 model serves as a useful tool for understanding and engineering the metabolism of this probiotic strain. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Genome-scale reconstruction of Salinispora tropica CNB-440 metabolism to study strain-specific adaptation.

    Science.gov (United States)

    Contador, C A; Rodríguez, V; Andrews, B A; Asenjo, J A

    2015-11-01

    The first manually curated genome-scale metabolic model for Salinispora tropica strain CNB-440 was constructed. The reconstruction enables characterization of the metabolic capabilities for understanding and modeling the cellular physiology of this actinobacterium. The iCC908 model was based on physiological and biochemical information of primary and specialised metabolism pathways. The reconstructed stoichiometric matrix consists of 1169 biochemical conversions, 204 transport reactions and 1317 metabolites. A total of 908 structural open reading frames (ORFs) were included in the reconstructed network. The number of gene functions included in the reconstructed network corresponds to 20% of all characterized ORFs in the S. tropica genome. The genome-scale metabolic model was used to study strain-specific capabilities in defined minimal media. iCC908 was used to analyze growth capabilities in 41 different minimal growth-supporting environments. These nutrient sources were evaluated experimentally to assess the accuracy of in silico growth simulations. The model predicted no auxotrophies for essential amino acids, which was corroborated experimentally. The strain is able to use 21 different carbon sources, 8 nitrogen sources and 4 sulfur sources from the nutrient sources tested. Experimental observation suggests that the cells may be able to store sulfur. False predictions provided opportunities to gain new insights into the physiology of this species, and to gap fill the missing knowledge. The incorporation of modifications led to increased accuracy in predicting the outcome of growth/no growth experiments from 76 to 93%. iCC908 can thus be used to define the metabolic capabilities of S. tropica and guide and enhance the production of specialised metabolites.

  9. Genome-scale metabolic network guided engineering of Streptomyces tsukubaensis for FK506 production improvement.

    Science.gov (United States)

    Huang, Di; Li, Shanshan; Xia, Menglei; Wen, Jianping; Jia, Xiaoqiang

    2013-05-24

    FK506 is an important immunosuppressant that can be produced by Streptomyces tsukubaensis. However, the production capacity of the strain is very low. Here, a computationally guided engineering approach was proposed to improve the intracellular availability of FK506 precursors and cofactors in S. tsukubaensis. First, a genome-scale metabolic model of S. tsukubaensis was constructed based on its annotated genome and biochemical information. Subsequently, several potential genetic targets (knockout or overexpression) that guaranteed an improved yield of FK506 were identified by the recently developed methodology. To validate the model predictions, each target gene was manipulated individually in the parent strain D852. All the engineered strains showed higher FK506 production than D852. Furthermore, the combined effect of the genetic modifications was evaluated. The strain HT-ΔGDH-DAZ, with gdhA deletion and dahp, accA2, and zwf2 overexpression, reached an FK506 concentration of 398.9 mg/L, compared with 143.5 mg/L for the parent strain D852. Finally, fed-batch fermentations of HT-ΔGDH-DAZ were carried out, which led to an FK506 production of 435.9 mg/L, 1.47-fold higher than the parent strain D852 (158.7 mg/L). Results confirmed that the promising targets led to an increase in FK506 titer. The present work is the first attempt to engineer the primary precursor pathways to improve FK506 production in S. tsukubaensis with genome-scale metabolic network guided metabolic engineering. The relationship between model prediction and experimental results demonstrates the rationality and validity of this approach for target identification. This strategy can also be applied to the improvement of other important secondary metabolites.

  10. Identification of novel targets for breast cancer by exploring gene switches on a genome scale

    Directory of Open Access Journals (Sweden)

    Wu Ming

    2011-11-01

    Background: An important feature that emerges from analyzing gene regulatory networks is the "switch-like behavior" or "bistability", a dynamic feature of a particular gene to preferentially toggle between two steady states. The state of gene switches plays pivotal roles in cell fate decisions, but identifying switches has been difficult. A challenge confronting the field is therefore to systematically identify gene switches. Results: We propose a top-down mining approach for exploring gene switches on a genome-scale level. Theoretical analysis, proof-of-concept examples, and experimental studies demonstrate the ability of our mining approach to identify bistable genes by sampling across a variety of different conditions. Applying the approach to human breast cancer data identified genes that show bimodality within the cancer samples, such as estrogen receptor (ER) and ERBB2, as well as genes that show bimodality between cancer and non-cancer samples, among which tumor-associated calcium signal transducer 2 (TACSTD2) is uncovered. We further suggest a likely transcription factor that regulates TACSTD2. Conclusions: Our mining approach demonstrates that one can capitalize on genome-wide expression profiling to capture dynamic properties of a complex network. To the best of our knowledge, this is the first attempt to apply mining approaches to explore gene switches on a genome scale, and the identification of TACSTD2 demonstrates that single cell-level bistability can be predicted from microarray data. Experimental confirmation of the computational results suggests that TACSTD2 could be a potential biomarker and an attractive candidate for drug therapy against both ER+ and ER- subtypes of breast cancer, including the triple-negative subtype.

  11. SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Large Scale

    Energy Technology Data Exchange (ETDEWEB)

    Meng, Jintao; Seo, Sangmin; Balaji, Pavan; Wei, Yanjie; Wang, Bingqiang; Feng, Shengzhong

    2016-08-16

    In this paper, we analyze and optimize the most time-consuming steps of the SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with sequencing data ranging from terabytes to petabytes. According to the performance analysis, the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For input parallelization, the input data is divided into virtual fragments of nearly equal size, and the start and end positions of each fragment are automatically aligned to the beginning of reads. In k-mer graph construction, to improve communication efficiency, the message size between any two processes is kept constant by increasing the number of nucleotides processed in each round in proportion to the number of processes in the input parallelization step. Memory usage also decreases because only a small part of the input data is processed in each round. For graph simplification, the communication protocol reduces the number of communication loops from four to two and decreases idle communication time. The optimized assembler is denoted SWAP-Assembler 2 (SWAP2). In our experiments using a 4-terabyte 1000 Genomes Project dataset (the largest dataset ever used for assembly) on the supercomputer Mira, SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the SWAP-Assembler. On the 300-gigabyte Yanhuang dataset, SWAP2 shows a 3X speedup and 4X better scalability compared with HipMer and is 45 times faster than the SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
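
    The fragment-boundary rule for input parallelization can be sketched in Python. The sketch rests on these assumptions:
    - Reads are stored as FASTA, so each record starts with `>` at the beginning of a line. FASTQ would need a more careful test, because `@` can also begin a quality line.
    - Each of `n` processes receives one byte range.

    Each nominal cut at `i * size // n` is advanced to the next record start, so no read is split between processes. This is an illustration, not SWAP2's MPI code.

```python
from typing import List, Tuple

def _is_record_start(data: bytes, pos: int) -> bool:
    """True if a FASTA header ('>') begins at `pos` on a fresh line."""
    return data[pos:pos + 1] == b">" and (pos == 0 or data[pos - 1:pos] == b"\n")

def fragment_bounds(data: bytes, n: int) -> List[Tuple[int, int]]:
    """Split `data` into n nearly equal byte ranges aligned to read starts."""
    if n < 1:
        raise ValueError("n must be positive")
    size = len(data)
    starts = [0]
    for i in range(1, n):
        pos = i * size // n  # nominal equal-size cut
        while pos < size and not _is_record_start(data, pos):
            pos += 1  # move the cut forward to the next read boundary
        starts.append(pos)
    starts.append(size)
    return [(starts[i], starts[i + 1]) for i in range(n)]
```

    Since the nominal cuts are non-decreasing and each moves to the first record start at or after it, the ranges never overlap. Concatenated, they reproduce the input exactly. A range can be empty only if a single read is larger than a fragment.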

  12. Comparison of HapMap and 1000 genomes reference panels in a large-scale genome-wide association study

    NARCIS (Netherlands)

    P.S. de Vries (Paul); M. Sabater-Lleal (Maria); D.I. Chasman (Daniel); S. Trompet (Stella); T.S. Ahluwalia (Tarunveer Singh); A. Teumer (Alexander); M.E. Kleber (Marcus); M.-H. Chen (Ming-Huei); J.J. Wang (Jie Jin); J. Attia (John); R.E. Marioni (Riccardo); M. Steri (Maristella); Weng, L.-C. (Lu-Chen); R. Pool (Reńe); V. Grossmann (Vera); J. Brody (Jennifer); C. Venturini (Cristina); T. Tanaka (Toshiko); L.M. Rose (Lynda); C. Oldmeadow (Christopher); J. Mazur (Johanna); S. Basu (Saonli); M. Frånberg (Mattias); Q. Yang (Qiong); S. Ligthart (Symen); J.J. Hottenga (Jouke Jan); A. Rumley (Ann); Mulas, A. (Antonella); A.J. de Craen (Anton); A. Grotevendt (Anne); K.D. Taylor (Kent D.); G. Delgado; A. Kifley (Annette); L.M. Lopez (Lorna); T.L. Berentzen (Tina L.); M. Mangino (Massimo); S. Bandinelli (Stefania); Morrison, A.C. (Alanna C.); A. Hamsten (Anders); G.H. Tofler (Geoffrey); M.P.M. de Maat (Moniek); G. Draisma (Gerrit); G.D. Lowe (Gordon D.); M. Zoledziewska (Magdalena); N. Sattar (Naveed); Lackner, K.J. (Karl J.); U. Völker (Uwe); McKnight, B. (Barbara); J. Huang (Jian); E.G. Holliday (Elizabeth); McEvoy, M.A. (Mark A.); J.M. Starr (John); P.G. Hysi (Pirro); D.G. Hernandez (Dena); W. Guan (Weihua); F. Rivadeneira Ramirez (Fernando); W.L. McArdle (Wendy); P.E. Slagboom (Eline); Zeller, T. (Tanja); B.M. Psaty (Bruce); A.G. Uitterlinden (André); E.J.C. de Geus (Eco); D.J. Stott (David J.); H. Binder (Harald); A. Hofman (Albert); O.H. Franco (Oscar); J.I. Rotter (Jerome I.); L. Ferrucci (Luigi); Spector, T.D. (Tim D.); I.J. Deary (Ian J.); W. März (Winfried); A. Greinacher (Andreas); P.S. Wild (Philipp S.); F. Cucca (Francesco); D.I. Boomsma (Dorret); Watkins, H. (Hugh); Tang, W. (Weihong); P.M. Ridker (Paul); J.W. Jukema; R.J. Scott (Rodney J.); P. Mitchell (Paul); T. Hansen (T.); O'Donnell, C.J. (Christopher J.); Smith, N.L. (Nicholas L.); D.P. Strachan (David P.); A. Dehghan (Abbas)

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap

  13. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

    NARCIS (Netherlands)

    de Vries, Paul S; Sabater-Lleal, Maria; Chasman, Daniel I; Trompet, Stella; Ahluwalia, Tarunveer S; Teumer, Alexander; Kleber, Marcus E; Chen, Ming-Huei; Wang, Jie Jin; Attia, John R; Marioni, Riccardo E; Steri, Maristella; Weng, Lu-Chen; Pool, Rene; Grossmann, Vera; Brody, Jennifer A; Venturini, Cristina; Tanaka, Toshiko; Rose, Lynda M; Oldmeadow, Christopher; Mazur, Johanna; Basu, Saonli; Frånberg, Mattias; Yang, Qiong; Ligthart, Symen; Hottenga, Jouke J; Rumley, Ann; Mulas, Antonella; De Craen, Anton J M; Grotevendt, Anne; Taylor, Kent D; Delgado, Graciela E; Kifley, Annette; Lopez, Lorna M; Berentzen, Tina L; Mangino, Massimo; Bandinelli, Stefania; Morrison, Alanna C; Hamsten, Anders; Tofler, Geoffrey; de Maat, Moniek P M; Draisma, Harmen H M; Lowe, Gordon D; Zoledziewska, Magdalena; Sattar, Naveed; Lackner, Karl J; Völker, Uwe; McKnight, Barbara; Huang, Jie; Holliday, Elizabeth G; McEvoy, Mark A; Starr, John M; Hysi, Pirro G; Hernandez, Dena G; Guan, Weihua; Rivadeneira, Fernando; McArdle, Wendy L; Slagboom, P. Eline; Zeller, Tanja; Psaty, Bruce M; Uitterlinden, André G; de Geus, Eco J C; Stott, David J; Binder, Harald; Hofman, Albert; Franco, Oscar H; Rotter, Jerome I; Ferrucci, Luigi; Spector, Tim D; Deary, Ian J; März, Winfried; Greinacher, Andreas; Wild, Philipp S; Cucca, Francesco; Boomsma, Dorret I; Watkins, Hugh; Tang, Weihong; Ridker, Paul M; Jukema, Jan W; Scott, Rodney J; Mitchell, Paul; Hansen, Torben; O'Donnell, Christopher J; Smith, Nicholas L; Strachan, David P; Dehghan, Abbas

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. ...

  14. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    Directory of Open Access Journals (Sweden)

    Brunham Robert C

    2004-07-01

    Background: We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems that exhibit Weibull character can be interpreted with reliability theory, commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model the time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results: We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than the conventional power functions used in a number of structural genomic studies reported to date. We have also found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in the evolutionary history of organisms. Conclusions: The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics.
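    The Weibull reliability function mentioned in this abstract has a standard two-parameter form, R(x) = exp(-(x/η)^β), with shape parameter β and scale parameter η. The Python sketch below evaluates that textbook formula together with the corresponding hazard rate. It assumes x stands for whatever quantity is being modelled (for example, fold family size); it is not the authors' fitting procedure, and the function names and parameter values are hypothetical.

```python
import math


def weibull_reliability(x: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull reliability (survival) function.

    R(x) = exp(-(x / eta) ** beta): the probability that the modelled
    quantity exceeds x. beta is the shape parameter, eta the scale parameter.
    """
    if x < 0 or beta <= 0 or eta <= 0:
        raise ValueError("require x >= 0, beta > 0 and eta > 0")
    return math.exp(-((x / eta) ** beta))


def weibull_hazard(x: float, beta: float, eta: float) -> float:
    """Hazard rate h(x) = (beta / eta) * (x / eta) ** (beta - 1), for x > 0.

    beta < 1 gives a decreasing hazard, beta == 1 a constant one
    (the exponential special case), beta > 1 an increasing one.
    """
    if x <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("require x > 0, beta > 0 and eta > 0")
    return (beta / eta) * (x / eta) ** (beta - 1)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show how beta reshapes the curve.
    for beta in (0.5, 1.0, 2.0):
        r = weibull_reliability(2.0, beta, eta=1.0)
        h = weibull_hazard(2.0, beta, eta=1.0)
        print(f"beta={beta}: R(2)={r:.4f}, h(2)={h:.4f}")
```

    Setting β = 1 recovers the exponential distribution with a constant hazard, which gives a convenient reference point when comparing β values fitted to different data sets.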

  15. Decoding the fine-scale structure of a breast cancer genome and transcriptome

    OpenAIRE

    Volik, Stanislav; Raphael, Benjamin J.; Huang, Guiqing; Stratton, Michael R.; Bignel, Graham; Murnane, John; Brebner, John H.; Bajsarowicz, Krystyna; Paris, Pamela L.; Tao, Quanzhou; Kowbel, David; Lapuk, Anna; Shagin, Dmitri A.; Shagina, Irina A.; Gray, Joe W.

    2006-01-01

    A comprehensive understanding of cancer is predicated upon knowledge of the structure of malignant genomes underlying its many variant forms and the molecular mechanisms giving rise to them. It is well established that solid tumor genomes accumulate a large number of genome rearrangements during tumorigenesis. End Sequence Profiling (ESP) maps and clones genome breakpoints associated with all types of genome rearrangements, elucidating the structural organization of tumor genomes. Here we exte...

  16. Population-based resequencing revealed an ancestral winter group of cultivated flax: implication for flax domestication processes

    Science.gov (United States)

    Fu, Yong-Bi

    2012-01-01

    Cultivated flax (Linum usitatissimum L.) is the earliest oil and fiber crop and its early domestication history may involve multiple events of domestication for oil, fiber, capsular indehiscence, and winter hardiness. Genetic studies have demonstrated that winter cultivated flax is closely related to oil and fiber cultivated flax and shows little relatedness to its progenitor, pale flax (L. bienne Mill.), but winter hardiness is one major characteristic of pale flax. Here, we assessed the genetic relationships of 48 Linum samples representing pale flax and four trait-specific groups of cultivated flax (dehiscent, fiber, oil, and winter) through population-based resequencing at 24 genomic regions, and revealed a winter group of cultivated flax that displayed close relatedness to the pale flax samples. Overall, the cultivated flax showed a 27% reduction of nucleotide diversity when compared with the pale flax. Recombination frequently occurred at these sampled genomic regions, but the signal of selection and bottleneck was relatively weak. These findings provide some insight into the impact and processes of flax domestication and are significant for expanding our knowledge about early flax domestication, particularly for winter hardiness. PMID:22822439
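    The 27% figure compares nucleotide diversity (π) between groups. A minimal sketch of how π is commonly computed from aligned sequences (the mean number of pairwise differences per site) and how a percentage reduction follows from it is given below. The toy sequences are hypothetical, and the sketch assumes gap-free alignments without missing data, which real resequencing data rarely are; it is not the pipeline used in the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nucleotide diversity (pi): mean number of differences per site
    over all distinct pairs of aligned sequences.

    Assumes equal-length, gap-free alignments with no missing data.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every pair, then average per pair and per site.
    mismatches = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return mismatches / (len(pairs) * length)


def percent_reduction(pi_derived: float, pi_ancestral: float) -> float:
    """Percentage loss of diversity in a derived group relative to its progenitor."""
    if pi_ancestral <= 0:
        raise ValueError("ancestral diversity must be positive")
    return 100.0 * (1.0 - pi_derived / pi_ancestral)


if __name__ == "__main__":
    # Hypothetical toy alignments, not data from the study.
    wild = ["ACGTAC", "ACGTTC", "ATGTAC", "ACGAAC"]
    cultivated = ["ACGTAC", "ACGTAC", "ACGTTC", "ACGTAC"]
    pi_w = nucleotide_diversity(wild)
    pi_c = nucleotide_diversity(cultivated)
    print(f"pi_wild={pi_w:.4f} pi_cult={pi_c:.4f} "
          f"reduction={percent_reduction(pi_c, pi_w):.1f}%")
```

    Read this way, a 27% reduction means the cultivated samples retain roughly 0.73 times the per-site pairwise diversity of the pale flax samples.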

  17. In situ to in silico and back: elucidating the physiology and ecology of Geobacter spp. using genome-scale modelling.

    Science.gov (United States)

    Mahadevan, Radhakrishnan; Palsson, Bernhard Ø; Lovley, Derek R

    2011-01-01

    There is a wide diversity of unexplored metabolism encoded in the genomes of microorganisms that have an important environmental role. Genome-scale metabolic modelling enables the individual reactions that are encoded in annotated genomes to be organized into a coherent whole, which can then be used to predict metabolic fluxes that will optimize cell function under a range of conditions. In this Review, we summarize a series of studies in which genome-scale metabolic modelling of Geobacter spp. has resulted in an in-depth understanding of their central metabolism and ecology. A similar iterative modelling and experimental approach could accelerate elucidation of the physiology and ecology of other microorganisms inhabiting a diversity of environments, and could guide optimization of the practical applications of these species.

  18. The wolf reference genome sequence (Canis lupus lupus) and its implications for Canis spp. population genomics

    DEFF Research Database (Denmark)

    Gopalakrishnan, Shyam; Samaniego Castruita, Jose Alfredo; Sinding, Mikkel Holger Strander

    2017-01-01

    ...that of a boxer dog (Canis lupus familiaris). We generated the first de novo wolf genome (Canis lupus lupus) as an additional choice of reference, and explored what implications may arise when previously published dog and wolf resequencing data are remapped to this reference. Results: Reassuringly, we find...

  19. Genome-scale metabolic analysis of Clostridium thermocellum for bioethanol production

    Directory of Open Access Journals (Sweden)

    Brooks J Paul

    2010-03-01

    Background: Microorganisms possess diverse metabolic capabilities that can potentially be leveraged for efficient production of biofuels. Clostridium thermocellum (ATCC 27405) is a thermophilic anaerobe that is both cellulolytic and ethanologenic, meaning that it can directly use the plant sugar, cellulose, and biochemically convert it to ethanol. A major challenge in using microorganisms for chemical production is the need to modify the organism to increase production efficiency. The process of properly engineering an organism is typically arduous. Results: Here we present a genome-scale model of C. thermocellum metabolism, iSR432, for the purpose of establishing a computational tool to study the metabolic network of C. thermocellum and facilitate efforts to engineer C. thermocellum for biofuel production. The model consists of 577 reactions involving 525 intracellular metabolites, 432 genes, and a proteomic-based representation of a cellulosome. The process of constructing this metabolic model led to suggested annotation refinements for 27 genes and identification of areas of metabolism requiring further study. The accuracy of the iSR432 model was tested using experimental growth and by-product secretion data for growth on cellobiose and fructose. Analysis using this model captures the relationship between the reduction-oxidation state of the cell and ethanol secretion and allows for prediction of gene deletions and environmental conditions that would increase ethanol production. Conclusions: By incorporating genomic sequence data, network topology, and experimental measurements of enzyme activities and metabolite fluxes, we have generated a model that is reasonably accurate at predicting the cellular phenotype of C. thermocellum and establishes a strong foundation for rational strain design. In addition, we are able to draw some important conclusions regarding the underlying metabolic mechanisms for observed behaviors of C. thermocellum

  20. Genome-scale reconstruction of the metabolic network in Yersinia pestis, strain 91001

    Energy Technology Data Exchange (ETDEWEB)

    Navid, A; Almaas, E

    2009-01-13

    The gram-negative bacterium Yersinia pestis, the aetiological agent of bubonic plague, is one of the deadliest pathogens known to man. Despite its historical reputation, plague is a modern disease which annually afflicts thousands of people. Public safety considerations greatly limit clinical experimentation on this organism and thus development of theoretical tools to analyze the capabilities of this pathogen is of utmost importance. Here, we report the first genome-scale metabolic model of Yersinia pestis biovar Mediaevalis based both on its recently annotated genome, and physiological and biochemical data from literature. Our model demonstrates excellent agreement with the known metabolic needs and capabilities of Y. pestis. Since Y. pestis is a meiotrophic organism, we have developed CryptFind, a systematic approach to identify all candidate cryptic genes responsible for known and theoretical meiotrophic phenomena. In addition to uncovering every known cryptic gene for Y. pestis, our analysis of the rhamnose fermentation pathway suggests that betB is the responsible cryptic gene. Despite all of our medical advances, we still do not have a vaccine for bubonic plague. Recent discoveries of antibiotic resistant strains of Yersinia pestis coupled with the threat of plague being used as a bioterrorism weapon compel us to develop new tools for studying the physiology of this deadly pathogen. Using our theoretical model, we can study the cell's phenotypic behavior under different circumstances and identify metabolic weaknesses which may be harnessed for the development of therapeutics. Additionally, the automatic identification of cryptic genes expands the usage of genomic data for pharmaceutical purposes.

  1. Genome-scale reconstruction and analysis of the metabolic network in the hyperthermophilic archaeon Sulfolobus solfataricus.

    Directory of Open Access Journals (Sweden)

    Thomas Ulas

    We describe the reconstruction of a genome-scale metabolic model of the crenarchaeon Sulfolobus solfataricus, a hyperthermoacidophilic microorganism. It grows in terrestrial volcanic hot springs, with growth occurring at pH 2-4 (optimum 3.5) and a temperature of 75-80°C (optimum 80°C). The genome of Sulfolobus solfataricus P2 contains 2,992,245 bp on a single circular chromosome and encodes 2,977 proteins and a number of RNAs. The network comprises 718 metabolic and 58 transport/exchange reactions and 705 unique metabolites, based on the annotated genome and available biochemical data. Using the model in conjunction with constraint-based methods, we simulated the metabolic fluxes induced by different environmental and genetic conditions. The predictions were compared to experimental measurements and phenotypes of S. solfataricus. Furthermore, the performance of the network on 35 different carbon sources known for S. solfataricus from the literature was simulated. Comparing growth on different carbon sources revealed that glycerol is the carbon source with the highest biomass flux per imported carbon atom (75% higher than glucose). Experimental data were also used to fit the model to phenotypic observations. In addition to the commonly known heterotrophic growth of S. solfataricus, the crenarchaeon is also able to grow autotrophically using the hydroxypropionate-hydroxybutyrate cycle for bicarbonate fixation. We integrated this pathway into our model and compared bicarbonate fixation with growth on glucose as the sole carbon source. Finally, we tested the robustness of the metabolism with respect to gene deletions using the method of Minimization of Metabolic Adjustment (MOMA), which predicted that 18% of all possible single gene deletions would be lethal for the organism.

  2. A Consensus Genome-scale Reconstruction of Chinese Hamster Ovary Cell Metabolism

    KAUST Repository

    Hefzi, Hooman

    2016-11-23

    Chinese hamster ovary (CHO) cells dominate biotherapeutic protein production and are widely used in mammalian cell line engineering research. To elucidate metabolic bottlenecks in protein production and to guide cell engineering and bioprocess optimization, we reconstructed the metabolic pathways in CHO and associated them with >1,700 genes in the Cricetulus griseus genome. The genome-scale metabolic model based on this reconstruction, iCHO1766, and cell-line-specific models for CHO-K1, CHO-S, and CHO-DG44 cells provide the biochemical basis of growth and recombinant protein production. The models accurately predict growth phenotypes and known auxotrophies in CHO cells. With the models, we quantify the protein synthesis capacity of CHO cells and demonstrate that common bioprocess treatments, such as histone deacetylase inhibitors, inefficiently increase product yield. However, our simulations show that the metabolic resources in CHO are more than three times more efficiently utilized for growth or recombinant protein synthesis following targeted efforts to engineer the CHO secretory pathway. This model will further accelerate CHO cell engineering and help optimize bioprocesses.

  3. A multi-layer method to study genome-scale positions of nucleosomes.

    Science.gov (United States)

    Di Gesù, Vito; Lo Bosco, Giosuè; Pinello, Luca; Yuan, Guo-Cheng; Corona, Davide F V

    2009-02-01

    The basic unit of eukaryotic chromatin is the nucleosome, consisting of about 150 bp of DNA wrapped around a protein core made of histone proteins. Nucleosome positioning is modulated in vivo to regulate fundamental nuclear processes. To measure nucleosome positions on a genomic scale, both theoretical and experimental approaches have recently been reported. We have developed a new method, the Multi-Layer Model (MLM), for the analysis of nucleosome position data obtained with a microarray-based approach. The MLM is a feature extraction method in which the input data are processed by a classifier to distinguish between several kinds of patterns. We applied our method to simulated-synthetic and experimental nucleosome position data and found that, besides high nucleosome recognition and strong agreement with standard statistical methods, the MLM can identify distinct classes of nucleosomes, making it an important tool for the genome-wide analysis of nucleosome position and function. In conclusion, the MLM allows a better representation of nucleosome position data and a significant reduction in computational time.

  4. Genome-scale reconstruction of metabolic network for a halophilic extremophile, Chromohalobacter salexigens DSM 3043

    Directory of Open Access Journals (Sweden)

    Oner Ebru

    2011-01-01

    Background: Chromohalobacter salexigens (formerly Halomonas elongata) DSM 3043 is a halophilic extremophile with a very broad salinity range and is used as a model organism to elucidate prokaryotic osmoadaptation due to its strong euryhaline phenotype. Results: C. salexigens DSM 3043's metabolism was reconstructed based on genomic, biochemical and physiological information via a non-automated but iterative process. This manually curated reconstruction accounts for 584 genes, 1386 reactions, and 1411 metabolites. By using flux balance analysis, the model was extensively validated against literature data on the C. salexigens phenotypic features, the transport and use of different substrates for growth, as well as against experimental observations on the uptake and accumulation of industrially important organic osmolytes, ectoine, betaine, and its precursor choline, which play important roles in the adaptive response to osmotic stress. Conclusions: This work presents the first comprehensive genome-scale metabolic model of a halophilic bacterium. Being a useful guide for identification and filling of knowledge gaps, the reconstructed metabolic network iOA584 will accelerate the research on halophilic bacteria towards application of systems biology approaches and design of metabolic engineering strategies.

  5. A scalable algorithm to explore the Gibbs energy landscape of genome-scale metabolic networks.

    Directory of Open Access Journals (Sweden)

    Daniele De Martino

    The integration of various types of genomic data into predictive models of biological networks is one of the main challenges currently faced by computational biology. Constraint-based models in particular play a key role in the attempt to obtain a quantitative understanding of cellular metabolism at genome scale. In essence, their goal is to frame the metabolic capabilities of an organism based on minimal assumptions that describe the steady states of the underlying reaction network via suitable stoichiometric constraints, specifically mass balance and energy balance (i.e. thermodynamic feasibility). The implementation of these requirements to generate viable configurations of reaction fluxes and/or to test given flux profiles for thermodynamic feasibility can, however, prove to be computationally intensive. We propose here a fast and scalable stoichiometry-based method to explore the Gibbs energy landscape of a biochemical network at steady state. The method is applied to the problem of reconstructing the Gibbs energy landscape underlying metabolic activity in the human red blood cell, and to that of identifying and removing thermodynamically infeasible reaction cycles in the Escherichia coli metabolic network (iAF1260). In the former case, we produce consistent predictions for chemical potentials (or log-concentrations) of intracellular metabolites; in the latter, we identify a restricted set of loops (23 in total) in the periplasmic and cytoplasmic core as the origin of thermodynamic infeasibility in a large sample (10^6) of flux configurations generated randomly and compatibly with the prior information available on reaction reversibility.
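    The two constraints named in this abstract can be stated compactly: mass balance requires S v = 0 for the stoichiometric matrix S and flux vector v, and energy balance requires each active reaction to run downhill, with ΔG = Sᵀμ for chemical potentials μ. The Python sketch below checks a given flux vector against both on a hypothetical three-reaction loop; it only tests a supplied configuration and is not the authors' scalable exploration method. The network, potentials and function names are illustrative assumptions.

```python
Matrix = list[list[float]]


def stoich_times_flux(S: Matrix, v: list[float]) -> list[float]:
    """Net production rate of each metabolite, S @ v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def reaction_delta_g(S: Matrix, mu: list[float]) -> list[float]:
    """Gibbs energy change of each reaction, S^T @ mu (mu = chemical potentials)."""
    n_rxn = len(S[0])
    return [sum(S[i][r] * mu[i] for i in range(len(S))) for r in range(n_rxn)]


def is_mass_balanced(S: Matrix, v: list[float], tol: float = 1e-9) -> bool:
    """Steady-state mass balance: S @ v == 0 for every internal metabolite."""
    return all(abs(x) <= tol for x in stoich_times_flux(S, v))


def is_thermo_consistent(S: Matrix, v: list[float], mu: list[float],
                         tol: float = 1e-9) -> bool:
    """Energy balance: dG_r < 0 wherever v_r > 0, and dG_r > 0 wherever v_r < 0."""
    for dg, flux in zip(reaction_delta_g(S, mu), v):
        if flux > tol and dg >= -tol:
            return False
        if flux < -tol and dg <= tol:
            return False
    return True


# Hypothetical internal cycle A -> B -> C -> A (rows: A, B, C; columns: r1, r2, r3).
S_cycle = [
    [-1, 0, 1],
    [1, -1, 0],
    [0, 1, -1],
]
v_cycle = [1.0, 1.0, 1.0]
mu_guess = [3.0, 2.0, 1.0]  # arbitrary chemical potentials

if __name__ == "__main__":
    print(is_mass_balanced(S_cycle, v_cycle))
    print(is_thermo_consistent(S_cycle, v_cycle, mu_guess))
    # sum_r v_r * dG_r = mu^T (S v), which is zero whenever S v = 0, so the
    # dG values along a flux-carrying loop cannot all be negative.
    dg = reaction_delta_g(S_cycle, mu_guess)
    print(sum(f * g for f, g in zip(v_cycle, dg)))
```

    The loop passes the mass-balance check but fails the energy-balance check, and the identity in the final comment shows why no choice of μ can rescue it: this is the kind of internal cycle that has to be identified and removed.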

  6. Deriving metabolic engineering strategies from genome-scale modeling with flux ratio constraints.

    Science.gov (United States)

    Yen, Jiun Y; Nazem-Bokaee, Hadi; Freedman, Benjamin G; Athamneh, Ahmad I M; Senger, Ryan S

    2013-05-01

    Optimized production of bio-based fuels and chemicals from microbial cell factories is a central goal of systems metabolic engineering. To achieve this goal, a new computational method of using flux balance analysis with flux ratios (FBrAtio) was further developed in this research and applied to five case studies to evaluate and design metabolic engineering strategies. The approach was implemented using publicly available genome-scale metabolic flux models. Synthetic pathways were added to these models along with flux ratio constraints by FBrAtio to achieve increased (i) cellulose production from Arabidopsis thaliana; (ii) isobutanol production from Saccharomyces cerevisiae; (iii) acetone production from Synechocystis sp. PCC6803; (iv) H2 production from Escherichia coli MG1655; and (v) isopropanol, butanol, and ethanol (IBE) production from engineered Clostridium acetobutylicum. The FBrAtio approach was applied to each case to simulate a metabolic engineering strategy already implemented experimentally, and flux ratios were continually adjusted to find (i) the end-limit of increased production using the existing strategy, (ii) new potential strategies to increase production, and (iii) the impact of these metabolic engineering strategies on product yield and culture growth. The FBrAtio approach has the potential to design "fine-tuned" metabolic engineering strategies in silico that can be implemented directly with available genomic tools. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
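    A flux ratio constraint can be written as a linear equation, which is what lets it sit alongside the mass-balance constraints of flux balance analysis. The Python sketch below assumes a ratio defined as one branch flux over the sum of branch fluxes, R = v_n / (v_n + Σ v_o); rearranging v_n = R (v_n + Σ v_o) gives (1 - R) v_n - R Σ v_o = 0. Whether FBrAtio uses exactly this definition is an assumption here, and the function names and reaction indices are hypothetical.

```python
def flux_ratio_row(n_reactions: int, numerator: int, others: list[int],
                   ratio: float) -> list[float]:
    """Constraint row c such that c . v == 0 encodes
    v[numerator] / (v[numerator] + sum(v[j] for j in others)) == ratio.

    The rearranged form (1 - R) * v_n - R * sum(v_o) = 0 is linear in v.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    if numerator in others:
        raise ValueError("numerator reaction must not also appear in 'others'")
    row = [0.0] * n_reactions
    row[numerator] = 1.0 - ratio
    for j in others:
        row[j] -= ratio
    return row


def satisfies(row: list[float], v: list[float], tol: float = 1e-9) -> bool:
    """True if the flux vector v satisfies the linear constraint row . v == 0."""
    return abs(sum(c * x for c, x in zip(row, v))) <= tol


if __name__ == "__main__":
    # Hypothetical branch point: reactions 0 and 1 share a precursor, and
    # 75% of the branch flux is required to pass through reaction 0.
    row = flux_ratio_row(n_reactions=3, numerator=0, others=[1], ratio=0.75)
    print(row)
    print(satisfies(row, [3.0, 1.0, 5.0]))  # 3 / (3 + 1) = 0.75
    print(satisfies(row, [1.0, 1.0, 5.0]))  # 1 / (1 + 1) = 0.5
```

    Because the constraint is linear, it can be appended as an extra row next to S v = 0 and handed to any LP solver; note that it is trivially satisfied when all branch fluxes are zero, so it constrains the split of flux rather than its magnitude.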

  7. Moving image analysis to the cloud: A case study with a genome-scale tomographic study

    Energy Technology Data Exchange (ETDEWEB)

    Mader, Kevin [4Quant Ltd., Switzerland & Institute for Biomedical Engineering at University and ETH Zurich (Switzerland); Stampanoni, Marco [Institute for Biomedical Engineering at University and ETH Zurich, Switzerland & Swiss Light Source at Paul Scherrer Institut, Villigen (Switzerland)

    2016-01-28

    Over the last decade, the time required to measure a terabyte of microscopic imaging data has gone from years to minutes. This shift has moved many of the challenges away from experimental design and measurement to scalable storage, organization, and analysis. As many scientists and scientific institutions lack training and competencies in these areas, major bottlenecks have arisen and led to substantial delays and gaps between measurement, understanding, and dissemination. We present in this paper a framework for analyzing large 3D datasets using cloud-based computational and storage resources. We demonstrate its applicability by showing the setup and costs associated with the analysis of a genome-scale study of bone microstructure. We then evaluate the relative advantages and disadvantages associated with local versus cloud infrastructures.

  8. Genome-scale modeling of the protein secretory machinery in yeast

    DEFF Research Database (Denmark)

    Feizi, Amir; Österlund, Tobias; Petranovic, Dina

    2013-01-01

    The protein secretory machinery in Eukarya is involved in post-translational modifications (PTMs) and sorting of the secretory and many transmembrane proteins. While the secretory machinery has been well studied using classic reductionist approaches, a holistic view of its complex nature is lacking ... Here, we present the first genome-scale model for the yeast secretory machinery, which captures the knowledge generated through more than 50 years of research. The model is based on the concept of a Protein Specific Information Matrix (PSIM: characterized by seven PTM features). An algorithm ... was developed which mimics the secretory machinery and assigns each secretory protein to a particular secretory class that determines the set of PTMs and transport steps specific to each protein. Protein abundances were integrated with the model in order to gain a system-level estimation of the metabolic demands...

  9. Use of genome-scale metabolic models in evolutionary systems biology.

    Science.gov (United States)

    Papp, Balázs; Szappanos, Balázs; Notebaart, Richard A

    2011-01-01

    One of the major aims of the nascent field of evolutionary systems biology is to test evolutionary hypotheses that are not only realistic from a population genetic point of view but also detailed in terms of molecular biology mechanisms. By providing a mapping between genotype and phenotype for hundreds of genes, genome-scale systems biology models of metabolic networks have already provided valuable insights into the evolution of metabolic gene contents and phenotypes of yeast and other microbial species. Here we review the recent use of these computational models to predict the fitness effect of mutations, genetic interactions, evolutionary outcomes, and to decipher the mechanisms of mutational robustness. While these studies have demonstrated that even simplified models of biochemical reaction networks can be highly informative for evolutionary analyses, they have also revealed the weakness of this modeling framework to quantitatively predict mutational effects, a challenge that needs to be addressed for future progress in evolutionary systems biology.

  10. Integration of gene expression data into genome-scale metabolic models

    DEFF Research Database (Denmark)

    Åkesson, M.; Förster, Jochen; Nielsen, Jens

    2004-01-01

    A framework for integration of transcriptome data into stoichiometric metabolic models to obtain improved flux predictions is presented. The key idea is to exploit the regulatory information in the expression data to give additional constraints on the metabolic fluxes in the model. Measurements of gene expression from chemostat and batch cultures of Saccharomyces cerevisiae were combined with a recently developed genome-scale model, and the computed metabolic flux distributions were compared to experimental values from carbon labeling experiments and metabolic network analysis. The integration of expression data resulted in improved predictions of metabolic behavior in batch cultures, enabling quantitative predictions of exchange fluxes as well as qualitative estimations of changes in intracellular fluxes. A critical discussion of correlation between gene expression and metabolic fluxes is given.

  11. Network thermodynamic curation of human and yeast genome-scale metabolic models.

    Science.gov (United States)

    Martínez, Verónica S; Quek, Lake-Ee; Nielsen, Lars K

    2014-07-15

    Genome-scale models are used for an ever-widening range of applications. Although there has been much focus on specifying the stoichiometric matrix, the predictive power of genome-scale models equally depends on reaction directions. Two-thirds of reactions in the two eukaryotic reconstructions Homo sapiens Recon 1 and Yeast 5 are specified as irreversible. However, these specifications are mainly based on biochemical textbooks or on their similarity to other organisms and are rarely underpinned by detailed thermodynamic analysis. In this study, a new (to our knowledge) workflow combining network-embedded thermodynamic and flux variability analysis was used to evaluate existing irreversibility constraints in Recon 1 and Yeast 5 and to identify new ones. A total of 27 and 16 new irreversible reactions were identified in Recon 1 and Yeast 5, respectively, whereas only four reactions were found with directions incorrectly specified against thermodynamics (three in Yeast 5 and one in Recon 1). The workflow further identified for both models several isolated internal loops that require further curation. The framework also highlighted the need for substrate channeling (in human) and ATP hydrolysis (in yeast) for the essential reaction catalyzed by phosphoribosylaminoimidazole carboxylase in purine metabolism. Finally, the framework highlighted differences in proline metabolism between yeast (cytosolic anabolism and mitochondrial catabolism) and humans (exclusively mitochondrial metabolism). We conclude that network-embedded thermodynamics facilitates the specification and validation of irreversibility constraints in compartmentalized metabolic models, at the same time providing further insight into network properties. Copyright © 2014 Biophysical Society. Published by Elsevier Inc. All rights reserved.
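    Deciding whether a reaction can be declared irreversible rests on the sign of its Gibbs energy change, ΔG = ΔG°' + RT Σ nᵢ ln cᵢ, over the plausible range of metabolite concentrations. The Python sketch below bounds ΔG for a single reaction and classifies its feasible direction. It is a deliberate simplification: network-embedded thermodynamic analysis couples metabolite concentrations across all reactions, whereas this sketch bounds each metabolite independently. The reaction, ΔG°' values and concentration ranges are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ / (mol K)


def delta_g_range(dg0: float, stoich: dict[str, float],
                  conc_bounds: dict[str, tuple[float, float]],
                  temperature: float = 298.15) -> tuple[float, float]:
    """Bounds on dG = dG0' + RT * sum_i n_i * ln(c_i) for one reaction.

    stoich maps metabolite -> coefficient (negative for substrates,
    positive for products); conc_bounds maps metabolite -> (low, high)
    molar concentration. Each metabolite is bounded independently.
    """
    rt = R_KJ * temperature
    lo = hi = dg0
    for met, coeff in stoich.items():
        c_lo, c_hi = conc_bounds[met]
        if not 0 < c_lo <= c_hi:
            raise ValueError(f"invalid concentration bounds for {met}")
        a = coeff * rt * math.log(c_lo)
        b = coeff * rt * math.log(c_hi)
        lo += min(a, b)  # most favourable concentration for a low dG
        hi += max(a, b)  # least favourable concentration
    return lo, hi


def feasible_direction(dg_lo: float, dg_hi: float) -> str:
    """A reaction can only run in a direction that lowers Gibbs energy."""
    if dg_hi < 0:
        return "forward only"
    if dg_lo > 0:
        return "backward only"
    return "reversible"


if __name__ == "__main__":
    # Hypothetical reaction A -> B with both metabolites between 1 uM and 10 mM.
    bounds = {"A": (1e-6, 1e-2), "B": (1e-6, 1e-2)}
    for dg0 in (-10.0, -50.0):
        lo, hi = delta_g_range(dg0, {"A": -1, "B": 1}, bounds)
        print(dg0, round(lo, 2), round(hi, 2), feasible_direction(lo, hi))
```

    With the same concentration window, a modest ΔG°' leaves the reaction reversible while a strongly negative one fixes it in the forward direction; narrowing the window through network-wide constraints is what lets a network-embedded analysis assign more directions.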

  12. Genome-scale cold stress response regulatory networks in ten Arabidopsis thaliana ecotypes

    Science.gov (United States)

    2013-01-01

    Background: Low temperature leads to major crop losses every year. Although several studies have been conducted focusing on diversity of cold tolerance level in multiple phenotypically divergent Arabidopsis thaliana (A. thaliana) ecotypes, genome-scale molecular understanding is still lacking. Results: In this study, we report genome-scale transcript response diversity of 10 A. thaliana ecotypes originating from different geographical locations to non-freezing cold stress (10°C). To analyze the transcriptional response diversity, we initially compared transcriptome changes in all 10 ecotypes using Arabidopsis NimbleGen ATH6 microarrays. In total 6061 transcripts were significantly cold regulated (p cold stress regulon genes. Significant numbers of non-synonymous amino acid changes were observed in the coding region of the CBF regulon genes. Considering the limited knowledge about regulatory interactions between transcription factors and their target genes in the model plant A. thaliana, we have adopted a powerful systems genetics approach, Network Component Analysis (NCA), to construct an in-silico transcriptional regulatory network model during response to cold stress. The resulting regulatory network contained 1,275 nodes and 7,720 connections, with 178 transcription factors and 1,331 target genes. Conclusions: A. thaliana ecotypes exhibit considerable variation in transcriptome level responses to non-freezing cold stress treatment. Ecotype-specific transcripts and related gene ontology (GO) categories were identified to delineate natural variation of cold stress regulated differential gene expression in the model plant A. thaliana. The predicted regulatory network model was able to identify new ecotype-specific transcription factors and their regulatory interactions, which might be crucial for their local geographic adaptation to cold temperature. Additionally, since the approach presented here is general, it could be adapted to study networks regulating

  13. Optimization of lipid production with a genome-scale model of Yarrowia lipolytica.

    Science.gov (United States)

    Kavšček, Martin; Bhutada, Govindprasad; Madl, Tobias; Natter, Klaus

    2015-10-26

    Yarrowia lipolytica is a non-conventional yeast that is extensively investigated for its ability to excrete citrate or to accumulate large amounts of storage lipids, which is of great significance for single cell oil production. Both traits are thus of interest for basic research as well as for biotechnological applications, but they typically occur simultaneously, thus lowering the respective yields. Therefore, engineering of strains with high lipid content relies on novel concepts such as computational simulation to better understand the two competing processes and to eliminate citrate excretion. Using a genome-scale model (GSM) of baker's yeast as a scaffold, we reconstructed the metabolic network of Y. lipolytica and optimized it for use in flux balance analysis (FBA), with the aim to simulate growth and lipid production phases of this yeast. We validated our model and found the predictions of the growth behavior of Y. lipolytica in excellent agreement with experimental data. Based on these data, we successfully designed a fed-batch strategy to avoid citrate excretion during the lipid production phase. Further analysis of the network suggested that the oxygen demand of Y. lipolytica is reduced upon induction of lipid synthesis. According to this finding, we hypothesized that a reduced aeration rate might induce lipid accumulation. This prediction was indeed confirmed experimentally. In a fermentation combining these two strategies, lipid content of the biomass was increased by 80%, and lipid yield was improved more than four-fold, compared to standard conditions. Genome-scale network reconstructions provide a powerful tool to predict the effects of genetic modifications and the metabolic response to environmental conditions. The high accuracy and the predictive value of a newly reconstructed GSM of Y. lipolytica to optimize growth conditions for lipid accumulation are demonstrated. Based on these findings, further strategies for engineering Y. lipolytica towards

  14. Genome-scale model guided design of Propionibacterium for enhanced propionic acid production.

    Science.gov (United States)

    Navone, Laura; McCubbin, Tim; Gonzalez-Garcia, Ricardo A; Nielsen, Lars K; Marcellin, Esteban

    2018-06-01

    Production of propionic acid by fermentation of propionibacteria has gained increasing attention in the past few years. However, biomanufacturing of propionic acid cannot compete with the current oxo-petrochemical synthesis process due to its well-established infrastructure, low oil prices and the high downstream purification costs of microbial production. Strain improvement to increase propionic acid yield is the best alternative to reduce downstream purification costs. The recent generation of genome-scale models for a number of Propionibacterium species facilitates the rational design of metabolic engineering strategies and provides a new opportunity to explore the metabolic potential of the Wood-Werkman cycle. Previous strategies for strain improvement have individually targeted acid tolerance, rate of propionate production or minimisation of by-products. Here we used the P. freudenreichii subsp. shermanii and the pan-Propionibacterium genome-scale metabolic models (GEMs) to simultaneously target these combined issues. This was achieved by focussing on strategies which yield higher energies and directly suppress acetate formation. Using P. freudenreichii subsp. shermanii, two strategies were assessed. The first tested the ability to manipulate the redox balance to favour propionate production by over-expressing the first two enzymes of the pentose-phosphate pathway (PPP), Zwf (glucose-6-phosphate 1-dehydrogenase) and Pgl (6-phosphogluconolactonase). Results showed a 4-fold increase in the propionate to acetate ratio during the exponential growth phase. Secondly, the ability to enhance the energy yield from propionate production by over-expressing an ATP-dependent phosphoenolpyruvate carboxykinase (PEPCK) and sodium-pumping methylmalonyl-CoA decarboxylase (MMD) was tested, which extended the exponential growth phase. Together, these strategies demonstrate that in silico design strategies are predictive and can be used to reduce by-product formation in

  15. Zea mays iRS1563: A Comprehensive Genome-Scale Metabolic Reconstruction of Maize Metabolism

    Science.gov (United States)

    Saha, Rajib; Suthers, Patrick F.; Maranas, Costas D.

    2011-01-01

    The scope and breadth of genome-scale metabolic reconstructions have continued to expand over the last decade. Herein, we introduce a genome-scale model for a plant with direct applications to food and bioenergy production (i.e., maize). Maize annotation is still underway, which introduces significant challenges in associating metabolic functions with genes. The developed model is designed to meet rigorous standards on gene-protein-reaction (GPR) associations, elementally and charge-balanced reactions, and a biomass reaction abstracting the relative contribution of all biomass constituents. The metabolic network contains 1,563 genes and 1,825 metabolites involved in 1,985 reactions from primary and secondary maize metabolism. Direct literature evidence for the participation of the reaction in maize was found for approximately 42% of the reactions. As many as 445 reactions and 369 metabolites are unique to the maize model compared to the AraGEM model for A. thaliana. A total of 674 metabolites and 893 reactions present in Zea mays iRS1563 are not accounted for in the maize C4GEM model. All reactions are elementally and charge-balanced and localized into six different compartments (i.e., cytoplasm, mitochondrion, plastid, peroxisome, vacuole and extracellular). GPR associations are also established based on functional annotation information and homology prediction, accounting for monofunctional, multifunctional and multimeric proteins, isozymes and protein complexes. We describe results from performing flux balance analysis of a C4 plant under different physiological conditions (i.e., photosynthesis, photorespiration and respiration) and also compare model predictions against experimental observations for two naturally occurring mutants (i.e., bm1 and bm3). The developed model represents the largest and most complete effort to date at cataloguing metabolism for a plant species. PMID:21755001

  16. Genome-scale strain designs based on regulatory minimal cut sets.

    Science.gov (United States)

    Mahadevan, Radhakrishnan; von Kamp, Axel; Klamt, Steffen

    2015-09-01

    Stoichiometric and constraint-based methods of computational strain design have become an important tool for rational metabolic engineering. One of those relies on the concept of constrained minimal cut sets (cMCSs). However, like most other techniques, cMCSs may consider only reaction (or gene) knockouts to achieve a desired phenotype. We generalize the cMCSs approach to constrained regulatory MCSs (cRegMCSs), where up/downregulation of reaction rates can be combined with reaction deletions. We show that flux up/downregulations can virtually be treated as cuts, allowing their direct integration into the algorithmic framework of cMCSs. Because of vastly enlarged search spaces in genome-scale networks, we developed strategies to (optionally) preselect suitable candidates for flux regulation and novel algorithmic techniques to further enhance the efficiency and speed of cMCSs calculation. We illustrate the cRegMCSs approach with a simple example network and then apply it to identify strain designs for ethanol production in a genome-scale metabolic model of Escherichia coli. The results clearly show that cRegMCSs combining reaction deletions and flux regulations provide a much larger number of suitable strain designs, many of which are significantly smaller than cMCSs involving only knockouts. Furthermore, with cRegMCSs, one may also enable fine-tuning of desired behaviours in a narrower range. The new cRegMCSs approach may thus accelerate the implementation of model-based strain designs for the bio-based production of fuels and chemicals. MATLAB code and the examples can be downloaded at http://www.mpi-magdeburg.mpg.de/projects/cna/etcdownloads.html. Contact: krishna.mahadevan@utoronto.ca or klamt@mpi-magdeburg.mpg.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
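    For small networks, minimal cut sets are often explained through their duality with elementary modes. Under this view, a cut set must hit every target (undesired) mode. A constrained MCS must additionally leave enough desired modes intact. The sketch below is a brute-force illustration of knockout-only cMCSs under that assumption. It is not the authors' genome-scale algorithm and does not implement the regulatory (cRegMCS) extension. The function name and the toy network are hypothetical.

```python
from itertools import combinations


def constrained_mcs(reactions, target_modes, desired_modes, n_desired=1, max_size=3):
    """Brute-force constrained minimal cut sets (knockouts only; hypothetical helper).

    A candidate cut is kept if it (1) hits every target mode, (2) leaves at
    least `n_desired` desired modes untouched, and (3) contains no smaller
    cut already found. Sizes are enumerated in increasing order, so
    condition (3) guarantees minimality.
    """
    found = []
    for size in range(1, max_size + 1):
        for cut in combinations(sorted(reactions), size):
            cut_set = frozenset(cut)
            if any(mcs <= cut_set for mcs in found):
                continue  # superset of a known cMCS, so not minimal
            if not all(cut_set & mode for mode in target_modes):
                continue  # some target mode would remain feasible
            intact = sum(1 for mode in desired_modes if not cut_set & mode)
            if intact >= n_desired:
                found.append(cut_set)
    return found


# Hypothetical toy network: each elementary mode is the set of reactions it uses.
reactions = {"R1", "R2", "R3", "R4"}
target_modes = [{"R1", "R2"}, {"R1", "R3"}]  # undesired phenotypes
desired_modes = [{"R1", "R4"}]               # phenotype that must survive

print(constrained_mcs(reactions, target_modes, desired_modes))
```

    In this toy case, knocking out R1 alone would block both target modes but would also destroy the desired mode. The only cMCS returned is therefore {R2, R3}. With `n_desired=0`, {R1} becomes admissible as well.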

  17. Zea mays iRS1563: a comprehensive genome-scale metabolic reconstruction of maize metabolism.

    Science.gov (United States)

    Saha, Rajib; Suthers, Patrick F; Maranas, Costas D

    2011-01-01

    The scope and breadth of genome-scale metabolic reconstructions have continued to expand over the last decade. Herein, we introduce a genome-scale model for a plant with direct applications to food and bioenergy production (i.e., maize). Maize annotation is still underway, which introduces significant challenges in associating metabolic functions with genes. The developed model is designed to meet rigorous standards on gene-protein-reaction (GPR) associations, elementally and charge-balanced reactions, and a biomass reaction abstracting the relative contribution of all biomass constituents. The metabolic network contains 1,563 genes and 1,825 metabolites involved in 1,985 reactions from primary and secondary maize metabolism. Direct literature evidence for the participation of the reaction in maize was found for approximately 42% of the reactions. As many as 445 reactions and 369 metabolites are unique to the maize model compared to the AraGEM model for A. thaliana. A total of 674 metabolites and 893 reactions present in Zea mays iRS1563 are not accounted for in the maize C4GEM model. All reactions are elementally and charge-balanced and localized into six different compartments (i.e., cytoplasm, mitochondrion, plastid, peroxisome, vacuole and extracellular). GPR associations are also established based on functional annotation information and homology prediction, accounting for monofunctional, multifunctional and multimeric proteins, isozymes and protein complexes. We describe results from performing flux balance analysis of a C4 plant under different physiological conditions (i.e., photosynthesis, photorespiration and respiration) and also compare model predictions against experimental observations for two naturally occurring mutants (i.e., bm1 and bm3). The developed model represents the largest and most complete effort to date at cataloguing metabolism for a plant species.

  18. Genome-scale consequences of cofactor balancing in engineered pentose utilization pathways in Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Amit Ghosh

    Full Text Available Biofuels derived from lignocellulosic biomass offer promising alternative renewable energy sources for transportation fuels. Significant effort has been made to engineer Saccharomyces cerevisiae to efficiently ferment pentose sugars such as D-xylose and L-arabinose into biofuels such as ethanol through heterologous expression of the fungal D-xylose and L-arabinose pathways. However, one of the major bottlenecks in these fungal pathways is that the cofactors are not balanced, which contributes to inefficient utilization of pentose sugars. We utilized a genome-scale model of S. cerevisiae to predict the maximal achievable growth rate for cofactor-balanced and cofactor-imbalanced D-xylose and L-arabinose utilization pathways. Dynamic flux balance analysis (DFBA) was used to simulate batch fermentation of glucose, D-xylose, and L-arabinose. The dynamic models and experimental results are in good agreement for the wild type and for the engineered D-xylose utilization pathway. In simulations, cofactor balancing of the engineered D-xylose and L-arabinose utilization pathways increased batch ethanol production by 24.7% while reducing the predicted substrate utilization time by 70%. Furthermore, the effects of cofactor balancing the engineered pentose utilization pathways were evaluated throughout the genome-scale metabolic network. This work not only provides new insights into the global network effects of cofactor balancing but also provides useful guidelines for engineering a recombinant yeast strain with cofactor-balanced engineered pathways that efficiently co-utilizes pentose and hexose sugars for biofuel production. Experimental switching of cofactor usage in enzymes has been demonstrated, but it is time-consuming. Therefore, systems biology models that can predict the likely outcome of such strain engineering efforts are highly useful for deciding which efforts are likely to be worth the significant time investment.

  19. Modeling Genome-Scale mRNA Expression Datasets: From Matrix Algebra to Genetic Networks

    Science.gov (United States)

    Alter, Orly

    2003-03-01

    for two genome-scale datasets. GSVD is a unique data-driven linear transformation of the two datasets from the two genes × arrays spaces to two reduced and diagonal "genelets" × "arraylets" spaces. Some of the genelets can be associated with independent regulatory programs that are common to both datasets. Other genelets can be associated with independent biological processes or experimental artifacts that are almost exclusive to one of the datasets or the other. I will conclude with a discussion of the insights that these models may offer into the biology, chemistry and physics of gene expression.
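    To make the "common versus exclusive" distinction concrete, consider a standard form of the generalized SVD. This form assumes the two data matrices share the same set of arrays (columns) and that the stacked matrix has full column rank. The notation below is generic and not necessarily the author's. Each dataset gets its own orthonormal factor. Both share one invertible matrix X. The pair of generalized singular values for component k weighs that component's significance in either dataset.

```latex
\begin{aligned}
D_1 &= U_1\,\Sigma_1\,X^{\mathsf{T}}, \qquad U_1^{\mathsf{T}}U_1 = I, \qquad \Sigma_1 = \operatorname{diag}(\sigma_{1,1},\dots,\sigma_{1,n}),\\
D_2 &= U_2\,\Sigma_2\,X^{\mathsf{T}}, \qquad U_2^{\mathsf{T}}U_2 = I, \qquad \Sigma_2 = \operatorname{diag}(\sigma_{2,1},\dots,\sigma_{2,n}),\\
\rho_k &= \frac{\sigma_{1,k}}{\sigma_{2,k}}, \qquad k = 1,\dots,n.
\end{aligned}
```

    Under this form, the ratio determines how a component is read:

    - ρ_k ≈ 1: the component is of similar significance in both datasets, a candidate common program.
    - ρ_k ≫ 1: the component is almost exclusive to D_1.
    - ρ_k ≪ 1: the component is almost exclusive to D_2.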

  20. A Genome-Scale Model of Shewanella piezotolerans Simulates Mechanisms of Metabolic Diversity and Energy Conservation.

    Science.gov (United States)

    Dufault-Thompson, Keith; Jian, Huahua; Cheng, Ruixue; Li, Jiefu; Wang, Fengping; Zhang, Ying

    2017-01-01

    Shewanella piezotolerans strain WP3 belongs to the group 1 branch of the Shewanella genus and is a piezotolerant and psychrotolerant species isolated from the deep sea. In this study, a genome-scale model was constructed for WP3 using a combination of genome annotation, ortholog mapping, and physiological verification. The metabolic reconstruction contained 806 genes, 653 metabolites, and 922 reactions, including central metabolic functions that represented nonhomologous replacements between the group 1 and group 2 Shewanella species. Metabolic simulations with the WP3 model demonstrated consistency with existing knowledge about the physiology of the organism. A comparison of model simulations with experimental measurements verified the predicted growth profiles under increasing concentrations of carbon sources. The WP3 model was applied to study mechanisms of anaerobic respiration through investigating energy conservation, redox balancing, and the generation of proton motive force. Despite being an obligate respiratory organism, WP3 was predicted to use substrate-level phosphorylation as the primary source of energy conservation under anaerobic conditions, a trait previously identified in other Shewanella species. Further investigation of the ATP synthase activity revealed a positive correlation between the availability of reducing equivalents in the cell and the directionality of the ATP synthase reaction flux. Comparison of the WP3 model with an existing model of a group 2 species, Shewanella oneidensis MR-1, revealed that the WP3 model demonstrated greater flexibility in ATP production under the anaerobic conditions. Such flexibility could be advantageous to WP3 for its adaptation to fluctuating availability of organic carbon sources in the deep sea. 
IMPORTANCE The well-studied nature of the metabolic diversity of Shewanella bacteria makes species from this genus a promising platform for investigating the evolution of carbon metabolism and energy conservation

  1. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    Science.gov (United States)

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. © 2013 Wiley Periodicals, Inc.

  2. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

    DEFF Research Database (Denmark)

    de Vries, Paul S; Sabater-Lleal, Maria; Chasman, Daniel I

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher-resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. … In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation, we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5 × 10⁻⁸ is based on the number...
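    The 5 × 10⁻⁸ threshold is conventionally motivated as a Bonferroni correction. It holds a family-wise error rate of α = 0.05 across roughly m = 10⁶ independent common-variant tests genome-wide. The truncated sentence above may give the authors' own justification, which is not reconstructed here. The arithmetic is:

```latex
\alpha_{\text{genome-wide}} = \frac{\alpha}{m} = \frac{0.05}{1 \times 10^{6}} = 5 \times 10^{-8}
```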

  3. Natural Selection and Recombination Rate Variation Shape Nucleotide Polymorphism Across the Genomes of Three Related Populus Species.

    Science.gov (United States)

    Wang, Jing; Street, Nathaniel R; Scofield, Douglas G; Ingvarsson, Pär K

    2016-03-01

    A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. Copyright © 2016 by the Genetics Society of America.
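    Two standard per-site summaries of nucleotide polymorphism are nucleotide diversity (π, the mean number of pairwise differences) and Watterson's θ (based on the count of segregating sites). Comparing the two is the basis of Tajima's D, a summary of the site frequency spectrum. The minimal sketch below assumes aligned, gap-free haplotypes with no missing data. It is not the authors' resequencing pipeline, which would also involve read mapping, genotype calling and filtering. The function names are illustrative.

```python
from itertools import combinations


def _check(seqs):
    # Require at least two aligned sequences of identical length.
    if len(seqs) < 2 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    return len(seqs), len(seqs[0])


def nucleotide_diversity(seqs):
    """Per-site pi: mean pairwise differences divided by alignment length."""
    _, length = _check(seqs)
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / length


def watterson_theta(seqs):
    """Per-site Watterson's theta: S / a_n / length, a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = _check(seqs)
    segregating = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / a_n / length


# Hypothetical toy alignment of three haplotypes.
haplotypes = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(haplotypes), watterson_theta(haplotypes))  # both 1/3 here
```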

  4. Testing and validation of high density resequencing microarray for broad range biothreat agents detection.

    Directory of Open Access Journals (Sweden)

    Tomasz A Leski

    Full Text Available Rapid and effective detection and identification of emerging microbiological threats and potential biowarfare agents is very challenging when using traditional culture-based methods. Contemporary molecular techniques relying upon reverse transcription and/or polymerase chain reaction (RT-PCR/PCR) provide a rapid and effective alternative; however, such assays are generally designed and optimized to detect only a limited number of targets and are seldom capable of differentiating among variants of the detected targets. To meet these challenges, we have designed a broad-range resequencing pathogen microarray (RPM) for detection of tropical and emerging infectious agents (TEI), including biothreat agents: RPM-TEI v 1.0 (RPM-TEI). The scope of the RPM-TEI assay enables detection and differential identification of 84 types of pathogens and 13 toxin genes, including most of the class A, B and C select agents as defined by the Centers for Disease Control and Prevention (CDC, Atlanta, GA). Due to the high risks associated with handling these particular target pathogens, the sensitivity validation of the RPM-TEI has been performed using an innovative approach, in which synthetic DNA fragments are used as templates for testing the assay's limit of detection (LOD). Assay specificity and sensitivity were subsequently confirmed by testing with full-length genomic nucleic acids of selected agents. The LOD for a majority of the agents detected by RPM-TEI was determined to be at least 10⁴ copies per test. Our results also show that the RPM-TEI assay not only detects and identifies agents, but is also able to differentiate near neighbors of the same agent types, such as closely related strains of filoviruses of the Ebola Zaire group, or the Machupo and Lassa arenaviruses. Furthermore, each RPM-TEI assay results in specimen-specific agent gene sequence information that can be used to assess pathogenicity, mutations, and virulence markers, results that are not generally

  5. A genome-scale metabolic reconstruction of Pseudomonas putida KT2440: iJN746 as a cell factory

    Science.gov (United States)

    Nogales, Juan; Palsson, Bernhard Ø; Thiele, Ines

    2008-01-01

    Background Pseudomonas putida is the best-studied pollutant-degrading bacterium and is harnessed by industrial biotechnology to synthesize fine chemicals. Since the publication of P. putida KT2440's genome, some in silico analyses of its metabolic and biotechnological capacities have been published. However, a global understanding of the capabilities of P. putida KT2440 requires the construction of a metabolic model that enables the integration of classical experimental data with genomic and high-throughput data. The constraint-based reconstruction and analysis (COBRA) approach has been successfully used to build and analyze in silico genome-scale metabolic reconstructions. Results We present a genome-scale reconstruction of P. putida KT2440's metabolism, iJN746, which was constructed based on genomic, biochemical, and physiological information. This manually curated reconstruction accounts for 746 genes, 950 reactions, and 911 metabolites. iJN746 captures biotechnologically relevant pathways, including polyhydroxyalkanoate synthesis and catabolic pathways of aromatic compounds (e.g., toluene, benzoate, phenylacetate, nicotinate), not described in other metabolic reconstructions or biochemical databases. The predictive potential of iJN746 was validated using experimental data including growth performance and gene deletion studies. Furthermore, in silico growth on toluene was found to be oxygen-limited, suggesting the existence of oxygen-efficient pathways not yet annotated in P. putida's genome. Moreover, we evaluated the production efficiency of polyhydroxyalkanoates from various carbon sources and found fatty acids to be the most prominent candidates, as expected. Conclusion Here we presented the first genome-scale reconstruction of P. putida, a biotechnologically interesting all-surrounder. Taken together, this work illustrates the utility of iJN746 as i) a knowledge-base, ii) a discovery tool, and iii) an engineering platform to explore P. putida's potential in

  6. A genome-scale metabolic reconstruction of Pseudomonas putida KT2440: iJN746 as a cell factory

    Directory of Open Access Journals (Sweden)

    Thiele Ines

    2008-09-01

    Full Text Available Abstract Background Pseudomonas putida is the best-studied pollutant-degrading bacterium and is harnessed by industrial biotechnology to synthesize fine chemicals. Since the publication of P. putida KT2440's genome, some in silico analyses of its metabolic and biotechnological capacities have been published. However, a global understanding of the capabilities of P. putida KT2440 requires the construction of a metabolic model that enables the integration of classical experimental data with genomic and high-throughput data. The constraint-based reconstruction and analysis (COBRA) approach has been successfully used to build and analyze in silico genome-scale metabolic reconstructions. Results We present a genome-scale reconstruction of P. putida KT2440's metabolism, iJN746, which was constructed based on genomic, biochemical, and physiological information. This manually curated reconstruction accounts for 746 genes, 950 reactions, and 911 metabolites. iJN746 captures biotechnologically relevant pathways, including polyhydroxyalkanoate synthesis and catabolic pathways of aromatic compounds (e.g., toluene, benzoate, phenylacetate, nicotinate), not described in other metabolic reconstructions or biochemical databases. The predictive potential of iJN746 was validated using experimental data including growth performance and gene deletion studies. Furthermore, in silico growth on toluene was found to be oxygen-limited, suggesting the existence of oxygen-efficient pathways not yet annotated in P. putida's genome. Moreover, we evaluated the production efficiency of polyhydroxyalkanoates from various carbon sources and found fatty acids to be the most prominent candidates, as expected. Conclusion Here we presented the first genome-scale reconstruction of P. putida, a biotechnologically interesting all-surrounder. Taken together, this work illustrates the utility of iJN746 as i) a knowledge-base, ii) a discovery tool, and iii) an engineering platform to explore P

  7. Base-Resolution Analysis of Cisplatin–DNA Adducts at the Genome Scale

    OpenAIRE

    Shu, Xiaoting; Xiong, Xushen; Song, Jinghui; He, Chuan; Yi, Chengqi

    2016-01-01

    Cisplatin, one of the most widely used anticancer drugs, crosslinks DNA and ultimately induces cell death. However, the genomic pattern of cisplatin–DNA adducts has remained unknown owing to the lack of a reliable and sensitive genome-wide method. Herein we present “cisplatin-seq” to identify genome-wide cisplatin crosslinking sites at base resolution. Cisplatin-seq reveals that mitochondrial DNA is a preferred target of cisplatin. For nuclear genomes, cisplatin–DNA adducts are enriched withi...

  8. Noise analysis of genome-scale protein synthesis using a discrete computational model of translation

    Energy Technology Data Exchange (ETDEWEB)

    Racle, Julien; Hatzimanikatis, Vassily, E-mail: vassily.hatzimanikatis@epfl.ch [Laboratory of Computational Systems Biotechnology, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne (Switzerland); Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne (Switzerland); Stefaniuk, Adam Jan [Laboratory of Computational Systems Biotechnology, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne (Switzerland)

    2015-07-28

    Noise in genetic networks has been the subject of extensive experimental and computational studies. However, very few of these studies have considered noise properties using mechanistic models that account for the discrete movement of ribosomes and RNA polymerases along their corresponding templates (messenger RNA (mRNA) and DNA). The large size of these systems, which scales with the number of genes, mRNA copies, codons per mRNA, and ribosomes, is responsible for some of the challenges. Additionally, one should be able to describe the dynamics of ribosome exchange between the free ribosome pool and those bound to mRNAs, as well as how mRNA species compete for ribosomes. We developed an efficient algorithm for stochastic simulations that addresses these issues and used it to study the contribution and trade-offs of noise to translation properties (rates, time delays, and rate-limiting steps). The algorithm scales linearly with the number of mRNA copies, which allowed us to study the importance of genome-scale competition between mRNAs for the same ribosomes. We determined that noise is minimized under conditions maximizing the specific synthesis rate. Moreover, sensitivity analysis of the stochastic system revealed the importance of the elongation rate in the resultant noise, whereas the translation initiation rate constant was more closely related to the average protein synthesis rate. We observed significant differences between our results and the noise properties of the most commonly used translation models. Overall, our studies demonstrate that the use of full mechanistic models is essential for the study of noise in translation and transcription.
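    The discrete movement of ribosomes along an mRNA can be simulated stochastically. The minimal sketch below uses a Gillespie (stochastic simulation algorithm) approach. It is a toy model, not the authors' algorithm. It simulates a single mRNA copy, assumes each ribosome covers exactly one codon, and forbids ribosomes from overtaking one another. It also uses constant initiation and elongation rates and assumes the free-ribosome pool is never limiting. Ribosome exchange with a shared pool and competition between mRNA species are left out. The function name and parameter values are hypothetical.

```python
import random


def simulate_translation(n_codons, k_init, k_elong, t_end, seed=0):
    """Gillespie simulation of ribosomes on one mRNA (hypothetical toy model).

    Returns (number of completed proteins, final codon occupancy).
    """
    rng = random.Random(seed)
    occupied = [False] * n_codons
    t, proteins = 0.0, 0
    while True:
        # Enumerate every event that is possible in the current state.
        events = []
        if not occupied[0]:
            events.append(("init", 0, k_init))
        for i in range(n_codons):
            if not occupied[i]:
                continue
            if i == n_codons - 1:
                events.append(("term", i, k_elong))  # release a finished protein
            elif not occupied[i + 1]:
                events.append(("move", i, k_elong))  # exclusion: next codon must be free
        total = sum(rate for _, _, rate in events)
        if total == 0:
            break
        # Waiting time to the next event is exponential in the total rate.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose which event fires, with probability proportional to its rate.
        kind, i, _ = rng.choices(events, weights=[e[2] for e in events])[0]
        if kind == "init":
            occupied[0] = True
        elif kind == "move":
            occupied[i], occupied[i + 1] = False, True
        else:
            occupied[i] = False
            proteins += 1
    return proteins, occupied


proteins, occupancy = simulate_translation(n_codons=30, k_init=0.5, k_elong=10.0, t_end=100.0)
print(proteins, sum(occupancy))
```

    Repeating the run over many seeds gives the distribution of protein output, from which noise measures such as the Fano factor can be estimated. Because every possible event is rebuilt at each step, this sketch costs O(codons) per event. It makes no attempt at the linear scaling in mRNA copies described in the abstract.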

  9. Noise analysis of genome-scale protein synthesis using a discrete computational model of translation.

    Science.gov (United States)

    Racle, Julien; Stefaniuk, Adam Jan; Hatzimanikatis, Vassily

    2015-07-28

    Noise in genetic networks has been the subject of extensive experimental and computational studies. However, very few of these studies have considered noise properties using mechanistic models that account for the discrete movement of ribosomes and RNA polymerases along their corresponding templates (messenger RNA (mRNA) and DNA). The large size of these systems, which scales with the number of genes, mRNA copies, codons per mRNA, and ribosomes, is responsible for some of the challenges. Additionally, one should be able to describe the dynamics of ribosome exchange between the free ribosome pool and those bound to mRNAs, as well as how mRNA species compete for ribosomes. We developed an efficient algorithm for stochastic simulations that addresses these issues and used it to study the contribution and trade-offs of noise to translation properties (rates, time delays, and rate-limiting steps). The algorithm scales linearly with the number of mRNA copies, which allowed us to study the importance of genome-scale competition between mRNAs for the same ribosomes. We determined that noise is minimized under conditions maximizing the specific synthesis rate. Moreover, sensitivity analysis of the stochastic system revealed the importance of the elongation rate in the resultant noise, whereas the translation initiation rate constant was more closely related to the average protein synthesis rate. We observed significant differences between our results and the noise properties of the most commonly used translation models. Overall, our studies demonstrate that the use of full mechanistic models is essential for the study of noise in translation and transcription.

  10. Determining the control circuitry of redox metabolism at the genome-scale.

    Directory of Open Access Journals (Sweden)

    Stephen Federowicz

    2014-04-01

    Full Text Available Determining how facultative anaerobic organisms sense and direct cellular responses to electron acceptor availability has been a subject of intense study. However, even in the model organism Escherichia coli, established mechanisms only explain a small fraction of the hundreds of genes that are regulated during electron acceptor shifts. Here we propose a qualitative model that accounts for the full breadth of regulated genes by detailing how two global transcription factors (TFs) of E. coli, ArcA and Fnr, sense key metabolic redox ratios and act on a genome-wide basis to regulate anabolic, catabolic, and energy generation pathways. We first fill gaps in our knowledge of this transcriptional regulatory network by carrying out ChIP-chip and gene expression experiments to identify 463 regulatory events. We then interfaced this reconstructed regulatory network with a highly curated genome-scale metabolic model to show that ArcA and Fnr regulate >80% of total metabolic flux and 96% of differential gene expression across fermentative and nitrate respiratory conditions. Based on the data, we propose a feedforward with feedback trim regulatory scheme, given the extensive repression of catabolic genes by ArcA and extensive activation of chemiosmotic genes by Fnr. We further corroborated this regulatory scheme by showing a correlation of r² = 0.71 (p < 1e-6) between changes in metabolic flux and changes in regulatory activity across fermentative and nitrate respiratory conditions. Finally, we are able to relate the proposed model to a wealth of previously generated data by contextualizing the existing transcriptional regulatory network.
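    The reported r² is assumed here to be the squared Pearson correlation between paired changes, one value in flux and one in regulatory activity per item. The minimal sketch below computes it from such pairs using the standard library only. The input values are hypothetical and do not reproduce the authors' data or their significance test.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy * sxy / (sxx * syy)


# Hypothetical paired changes (flux change, regulatory-activity change).
flux_change = [1.0, 2.0, 3.0]
regulatory_change = [1.0, 3.0, 2.0]
print(r_squared(flux_change, regulatory_change))
```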

  11. SHARP: genome-scale identification of gene-protein-reaction associations in cyanobacteria.

    Science.gov (United States)

    Krishnakumar, S; Durai, Dilip A; Wangikar, Pramod P; Viswanathan, Ganesh A

    2013-11-01

    A genome-scale metabolic model provides an overview of an organism's metabolic capability. Such genome-specific metabolic reconstructions are based on the identification of gene-to-protein-to-reaction (GPR) associations and, in turn, on homology with annotated genes from other organisms. Cyanobacteria are photosynthetic prokaryotes which have diverged appreciably from their nonphotosynthetic counterparts. They also show significant evolutionary divergence from plants, which are well studied for their photosynthetic apparatus. We argue that context-specific sequence and domain similarity can add to the repertoire of GPR associations and significantly expand our view of the metabolic capability of cyanobacteria. We took an approach that combines the results of context-specific sequence-to-sequence similarity searches with those of sequence-to-profile searches. We employ PSI-BLAST for the former, and CDD, Pfam, and COG for the latter. An optimization algorithm was devised to arrive at a weighting scheme that combines the different lines of evidence, using KEGG-annotated GPRs as training data. We present the algorithm in the form of the software "Systematic, Homology-based Automated Re-annotation for Prokaryotes (SHARP)." We predicted 3,781 new GPR associations for the 10 prokaryotes considered, of which eight are cyanobacteria species. These new GPR associations fall in several metabolic pathways and were used to annotate 7,718 gaps in the metabolic network. These new annotations led to the discovery of several pathways that may be active, thereby providing new directions for metabolic engineering of these species for the production of useful products. A metabolic model developed on such a reconstructed network is likely to give better phenotypic predictions.

  12. Genome-scale metabolic model for Lactococcus lactis MG1363 and its application to the analysis of flavor formation

    NARCIS (Netherlands)

    Flahaut, N.A.L.; Wiersma, A.; Bunt, van der B.; Martens, D.E.; Schaap, P.J.; Sijtsma, L.; Martins Dos Santos, V.A.P.; Vos, de W.M.

    2013-01-01

    Lactococcus lactis subsp. cremoris MG1363 is a paradigm strain for lactococci used in industrial dairy fermentations. However, despite its importance for process development, no genome-scale metabolic model has been reported thus far. Moreover, current models for other lactococci only focus on

  13. redGEM: Systematic reduction and analysis of genome-scale metabolic reconstructions for development of consistent core metabolic models.

    Directory of Open Access Journals (Sweden)

    Meric Ataman

    2017-07-01

    Full Text Available Genome-scale metabolic reconstructions have proven to be valuable resources for enhancing our understanding of metabolic networks, as they encapsulate all known metabolic capabilities of organisms, from genes to proteins to their functions. However, the complexity of these large metabolic networks often hinders their utility in various practical applications. Although reduced models are commonly used for modeling and for integrating experimental data, they are often inconsistent across studies and laboratories because of differing criteria and levels of detail, which can compromise the transferability of findings and the integration of experimental data from different groups. In this study, we have developed a systematic semi-automatic approach to reduce genome-scale models into core models in a consistent and logical manner, focusing on the central metabolism or subsystems of interest. The method minimizes the loss of information using an approach that combines graph-based search and optimization methods. The resulting core models are shown to capture key properties of the genome-scale models and to preserve consistency in terms of biomass and by-product yields, flux and concentration variability, and gene essentiality. The development of these "consistently-reduced" models will help to clarify and facilitate the integration of different experimental data to draw new understanding that can be directly extended to genome-scale models.

  14. Population-based resequencing of experimentally evolved populations reveals the genetic basis of body size variation in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Thomas L Turner

    2011-03-01

    Full Text Available Body size is a classic quantitative trait with evolutionarily significant variation within many species. Locating the alleles responsible for this variation would help us understand the maintenance of variation in body size in particular, and in quantitative traits in general. However, a successful genome-wide association of genotype and phenotype may require very large sample sizes if alleles have low population frequencies or modest effects. As a complementary approach, we propose that population-based resequencing of experimentally evolved populations provides considerable power to map functional variation. Here, we use this technique to investigate the genetic basis of natural variation in body size in Drosophila melanogaster. Significant differentiation of hundreds of loci in replicate selection populations supports the hypothesis that the genetic basis of body size variation is highly polygenic in D. melanogaster. Significantly differentiated variants are limited to single genes at some loci, allowing precise hypotheses to be formed regarding causal polymorphisms, while other significant regions are large and contain many genes. By using significantly associated polymorphisms as a priori candidates in follow-up studies, these data are expected to provide considerable power to determine the genetic basis of natural variation in body size.

  15. RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.

    Science.gov (United States)

    Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A

    2013-11-01

    Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. Comparative genomics approaches are useful for the in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). RegPrecise (http://regprecise.lbl.gov) is a web resource for the collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded the reference collection of manually curated regulons that we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualizations of the regulatory networks and metabolic pathways covered by the reference regulons are available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes. RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in

  16. Scaling properties and fractality in the distribution of coding segments in eukaryotic genomes revealed through a block entropy approach

    Science.gov (United States)

    Athanasopoulou, Labrini; Athanasopoulos, Stavros; Karamanos, Kostas; Almirantis, Yannis

    2010-11-01

    Statistical methods, including block entropy based approaches, have already been used in the study of long-range features of genomic sequences seen as symbol series, either considering the full alphabet of the four nucleotides or the binary purine or pyrimidine character set. Here we explore the alternation of short protein-coding segments with long noncoding spacers in entire chromosomes, focusing on the scaling properties of block entropy. In previous studies, it has been shown that the sizes of noncoding spacers follow power-law-like distributions in most chromosomes of eukaryotic organisms from distant taxa. We have developed a simple evolutionary model based on well-known molecular events (segmental duplications followed by elimination of most of the duplicated genes) which reproduces the observed linearity in log-log plots. The scaling properties of block entropy H(n) have been studied in several works. Their findings suggest that linearity in semilogarithmic scale characterizes symbol sequences which exhibit fractal properties and long-range order, while this linearity has been shown in the case of the logistic map at the Feigenbaum accumulation point. The present work starts with the observation that the block entropy of the Cantor-like binary symbol series scales in a similar way. Then, we perform the same analysis for the full set of human chromosomes and for several chromosomes of other eukaryotes. A similar but less extended linearity in semilogarithmic scale, indicating fractality, is observed, while randomly formed surrogate sequences clearly lack this type of scaling. Genomic sequences always present entropy values much lower than their random surrogates. Symbol sequences produced by the aforementioned evolutionary model follow the scaling found in genomic sequences, thus corroborating the conjecture that “segmental duplication-gene elimination” dynamics may have contributed to the observed long rangeness in the coding or noncoding alternation in
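
    The block entropy referred to above is the Shannon entropy of the words of length n read from the symbol series, H(n) = -Σ p(w) log2 p(w), summed over all distinct length-n words w. The following is a minimal sketch, assuming overlapping windows and a plug-in frequency estimate; the Cantor-like construction (1 → 101, 0 → 000) and the shuffled surrogate are illustrative assumptions, not the authors' exact series or surrogate procedure. A structured series should show H(n) growing much more slowly than its random surrogate.

```python
import math
import random
from collections import Counter


def block_entropy(seq: str, n: int) -> float:
    """Shannon entropy (bits) of the overlapping length-n blocks of seq."""
    if n < 1 or n > len(seq):
        raise ValueError("block length must satisfy 1 <= n <= len(seq)")
    # Count every overlapping window (word) of length n.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cantor_like(depth: int) -> str:
    """Illustrative binary Cantor-like series: 1 -> 101, 0 -> 000, iterated."""
    s = "1"
    for _ in range(depth):
        s = "".join("101" if ch == "1" else "000" for ch in s)
    return s


if __name__ == "__main__":
    cantor = cantor_like(7)  # length 3**7 = 2187
    # Random surrogate with the same symbol composition (positions shuffled).
    shuffled = list(cantor)
    random.Random(0).shuffle(shuffled)
    surrogate = "".join(shuffled)
    for n in (1, 2, 4, 8, 16):
        print(n, round(block_entropy(cantor, n), 3), round(block_entropy(surrogate, n), 3))
```

    Plotting H(n) against log n for each column is one way to look for the semilogarithmic linearity the abstract describes.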

  17. The genome of Eucalyptus grandis

    Energy Technology Data Exchange (ETDEWEB)

    Myburg, Alexander A.; Grattapaglia, Dario; Tuskan, Gerald A.; Hellsten, Uffe; Hayes, Richard D.; Grimwood, Jane; Jenkins, Jerry; Lindquist, Erika; Tice, Hope; Bauer, Diane; Goodstein, David M.; Dubchak, Inna; Poliakov, Alexandre; Mizrachi, Eshchar; Kullan, Anand R. K.; Hussey, Steven G.; Pinard, Desre; van der Merwe, Karen; Singh, Pooja; van Jaarsveld, Ida; Silva-Junior, Orzenil B.; Togawa, Roberto C.; Pappas, Marilia R.; Faria, Danielle A.; Sansaloni, Carolina P.; Petroli, Cesar D.; Yang, Xiaohan; Ranjan, Priya; Tschaplinski, Timothy J.; Ye, Chu-Yu; Li, Ting; Sterck, Lieven; Vanneste, Kevin; Murat, Florent; Soler, Marçal; Clemente, Hélène San; Saidi, Naijib; Cassan-Wang, Hua; Dunand, Christophe; Hefer, Charles A.; Bornberg-Bauer, Erich; Kersting, Anna R.; Vining, Kelly; Amarasinghe, Vindhya; Ranik, Martin; Naithani, Sushma; Elser, Justin; Boyd, Alexander E.; Liston, Aaron; Spatafora, Joseph W.; Dharmwardhana, Palitha; Raja, Rajani; Sullivan, Christopher; Romanel, Elisson; Alves-Ferreira, Marcio; Külheim, Carsten; Foley, William; Carocha, Victor; Paiva, Jorge; Kudrna, David; Brommonschenkel, Sergio H.; Pasquali, Giancarlo; Byrne, Margaret; Rigault, Philippe; Tibbits, Josquin; Spokevicius, Antanas; Jones, Rebecca C.; Steane, Dorothy A.; Vaillancourt, René E.; Potts, Brad M.; Joubert, Fourie; Barry, Kerrie; Pappas, Georgios J.; Strauss, Steven H.; Jaiswal, Pankaj; Grima-Pettenati, Jacqueline; Salse, Jérôme; Van de Peer, Yves; Rokhsar, Daniel S.; Schmutz, Jeremy

    2014-06-11

    Eucalypts are the world's most widely planted hardwood trees. Their broad adaptability, rich species diversity, fast growth and superior multipurpose wood have made them a global renewable resource of fiber and energy that mitigates human pressures on natural forests. We sequenced and assembled >94% of the 640 Mbp genome of Eucalyptus grandis into its 11 chromosomes. A set of 36,376 protein-coding genes was predicted, revealing that 34% occur in tandem duplications, the largest proportion found thus far in any plant genome. Eucalypts also show the highest diversity of genes for plant specialized metabolism, which act as chemical defences against biotic agents and provide unique pharmaceutical oils. Resequencing of a set of inbred tree genomes revealed regions of strongly conserved heterozygosity, likely hotspots of inbreeding depression. The resequenced genome of the sister species E. globulus underscored the high inter-specific genome colinearity despite substantial genome size variation in the genus. The genome of E. grandis is the first reference for the early-diverging Rosid order Myrtales and is placed here basal to the Eurosids. This resource expands knowledge of the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.

  18. Identification of novel SNPs in glioblastoma using targeted resequencing.

    Science.gov (United States)

    Keller, Andreas; Harz, Christian; Matzas, Mark; Meder, Benjamin; Katus, Hugo A; Ludwig, Nicole; Fischer, Ulrike; Meese, Eckart

    2011-01-01

    High-throughput sequencing opens avenues to find genetic variations that may be indicative of an increased risk for certain diseases. Linking these genomic data to other "omics" approaches has the potential to deepen our understanding of pathogenic processes at the molecular level. To detect novel single nucleotide polymorphisms (SNPs) for glioblastoma multiforme (GBM), we used a combination of specific target selection and next generation sequencing (NGS). We generated a microarray covering the exonic regions of 132 GBM-associated genes to enrich target sequences in two GBM tissues and the corresponding leukocytes of the patients. Enriched target genes were sequenced with Illumina, and the resulting reads were mapped to the human genome. With this approach we identified over 6000 SNPs, including over 1300 SNPs located in the targeted genes. Integrating the genome-wide association study (GWAS) catalog and known disease-associated SNPs, we found that several of the detected SNPs were previously associated with smoking behavior, body mass index, breast cancer and high-grade glioma. In particular, the breast cancer-associated allele of SNP rs660118 in the gene SART1 showed a nearly doubled frequency in glioblastoma patients, as verified in an independent control cohort by Sanger sequencing. In addition, we identified SNPs in 20 of 21 GBM-associated antigens, providing further evidence that genetic variations are significantly associated with the immunogenicity of antigens.

  19. Identification of novel SNPs in glioblastoma using targeted resequencing.

    Directory of Open Access Journals (Sweden)

    Andreas Keller

    Full Text Available High-throughput sequencing opens avenues to find genetic variations that may be indicative of an increased risk for certain diseases. Linking these genomic data to other "omics" approaches has the potential to deepen our understanding of pathogenic processes at the molecular level. To detect novel single nucleotide polymorphisms (SNPs) for glioblastoma multiforme (GBM), we used a combination of specific target selection and next generation sequencing (NGS). We generated a microarray covering the exonic regions of 132 GBM-associated genes to enrich target sequences in two GBM tissues and the corresponding leukocytes of the patients. Enriched target genes were sequenced with Illumina, and the resulting reads were mapped to the human genome. With this approach we identified over 6000 SNPs, including over 1300 SNPs located in the targeted genes. Integrating the genome-wide association study (GWAS) catalog and known disease-associated SNPs, we found that several of the detected SNPs were previously associated with smoking behavior, body mass index, breast cancer and high-grade glioma. In particular, the breast cancer-associated allele of SNP rs660118 in the gene SART1 showed a nearly doubled frequency in glioblastoma patients, as verified in an independent control cohort by Sanger sequencing. In addition, we identified SNPs in 20 of 21 GBM-associated antigens, providing further evidence that genetic variations are significantly associated with the immunogenicity of antigens.

  20. Characterizing steady states of genome-scale metabolic networks in continuous cell cultures.

    Directory of Open Access Journals (Sweden)

    Jorge Fernandez-de-Cossio-Diaz

    2017-11-01

    Full Text Available In the continuous mode of cell culture, a constant flow carrying fresh media replaces culture fluid, cells, nutrients and secreted metabolites. Here we present a model for continuous cell culture coupling intra-cellular metabolism to extracellular variables describing the state of the bioreactor, taking into account the growth capacity of the cell and the impact of toxic byproduct accumulation. We provide a method to determine the steady states of this system that is tractable for metabolic networks of arbitrary complexity. We demonstrate our approach in a toy model first, and then in a genome-scale metabolic network of the Chinese hamster ovary cell line, obtaining results that are in qualitative agreement with experimental observations. We derive a number of consequences from the model that are independent of parameter values. The ratio between cell density and dilution rate is an ideal control parameter to fix a steady state with desired metabolic properties. This conclusion is robust even in the presence of multi-stability, which is explained in our model by a negative feedback loop due to toxic byproduct accumulation. A complex landscape of steady states emerges from our simulations, including multiple metabolic switches, which also explain why cell-line and media benchmarks carried out in batch culture cannot be extrapolated to perfusion. On the other hand, we predict invariance laws between continuous cell cultures with different parameters. A practical consequence is that the chemostat is an ideal experimental model for large-scale high-density perfusion cultures, where the complex landscape of metabolic transitions is faithfully reproduced.
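
    The paper's genome-scale model is not reproducible from the abstract, but the basic steady-state logic of continuous culture can be illustrated with the textbook Monod chemostat, which is a much simpler model swapped in here for illustration: at steady state the specific growth rate equals the dilution rate D, which fixes the residual substrate, and the substrate balance then fixes the biomass. The parameter names (mu_max, Ks, S0, Y) are the usual textbook symbols, not quantities taken from the paper; this sketch does not include the toxic byproduct feedback or multi-stability discussed above.

```python
def chemostat_steady_state(D: float, mu_max: float, Ks: float, S0: float, Y: float):
    """Return (residual substrate S*, biomass X*) for a Monod chemostat at dilution rate D.

    Monod growth mu(S) = mu_max * S / (Ks + S); feed substrate S0; biomass yield Y.
    """
    if D <= 0:
        raise ValueError("dilution rate must be positive")
    if D >= mu_max:
        return S0, 0.0  # cells cannot grow as fast as they are washed out
    S = Ks * D / (mu_max - D)  # solves mu(S) = D
    if S >= S0:
        return S0, 0.0  # washout: the required substrate exceeds the feed concentration
    # Substrate balance D * (S0 - S) = mu * X / Y with mu = D gives X = Y * (S0 - S).
    return S, Y * (S0 - S)


if __name__ == "__main__":
    for D in (0.1, 0.5, 0.9, 0.95):
        print(D, chemostat_steady_state(D, mu_max=1.0, Ks=1.0, S0=10.0, Y=0.5))
```

    In this simple model, raising D toward mu_max * S0 / (Ks + S0) drives the culture to washout, one of the few steady-state transitions it can show compared with the richer landscape reported in the paper.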

  1. High-throughput multiplex cpDNA resequencing clarifies the genetic diversity and genetic relationships among Brassica napus, Brassica rapa and Brassica oleracea.

    Science.gov (United States)

    Qiao, Jiangwei; Cai, Mengxian; Yan, Guixin; Wang, Nian; Li, Feng; Chen, Binyun; Gao, Guizhen; Xu, Kun; Li, Jun; Wu, Xiaoming

    2016-01-01

    Brassica napus (rapeseed) is a recent allotetraploid plant and the second most important oilseed crop worldwide. The origin of B. napus and its genetic relationships with its diploid ancestor species remain largely unresolved. Here, chloroplast DNA (cpDNA) from 488 B. napus accessions of global origin, 139 B. rapa accessions and 49 B. oleracea accessions was resequenced at the population level using Illumina Solexa sequencing technologies. The intraspecific cpDNA variants and their allele frequencies were called genome-wide and further validated by EcoTILLING analyses of the rpo region. The cpDNA of the current global B. napus population comprises more than 400 variants (SNPs and short InDels) and maintains one predominant haplotype (Bncp1). Whole-genome resequencing of the cpDNA of the Bncp1 haplotype ruled out its direct inheritance from any accession of the B. rapa or B. oleracea species. The distribution of the polymorphism information content (PIC) values for each variant demonstrated that B. napus has much lower cpDNA diversity than B. rapa; however, the vast majority of the wild and cultivated B. oleracea specimens appeared to share a single distinct cpDNA haplotype, in contrast to its wild C-genome relatives. This finding suggests that the cpDNA of the three Brassica species is well differentiated. The predominant B. napus cpDNA haplotype may have originated from uninvestigated relatives or from interactions between cpDNA mutations and natural/artificial selection during speciation and evolution. These exhaustive data on cpDNA variation provide a foundation for research on cpDNA and chloroplasts. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.
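
    The abstract does not state which PIC formula was used. A common choice is that of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i are allele frequencies at a variant; the sketch below assumes that formula. For a biallelic variant PIC peaks at 0.375 when both alleles are at frequency 0.5, so a species whose variants sit mostly near fixation (as reported for B. napus) will show low PIC values. The frequency lists in the usage lines are hypothetical.

```python
import math
from itertools import combinations


def pic(freqs: list[float]) -> float:
    """Polymorphism information content (Botstein et al. 1980) of one variant."""
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    # Correction term for uninformative heterozygous parent pairs.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


if __name__ == "__main__":
    # Hypothetical per-variant allele frequencies for two species.
    low_diversity = [[0.99, 0.01], [0.98, 0.02], [1.0]]
    high_diversity = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.2, 0.1]]
    for name, variants in (("low", low_diversity), ("high", high_diversity)):
        print(name, sum(pic(v) for v in variants) / len(variants))
```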

  2. Decoding the fine-scale structure of a breast cancer genome and transcriptome.

    Science.gov (United States)

    Volik, Stanislav; Raphael, Benjamin J; Huang, Guiqing; Stratton, Michael R; Bignel, Graham; Murnane, John; Brebner, John H; Bajsarowicz, Krystyna; Paris, Pamela L; Tao, Quanzhou; Kowbel, David; Lapuk, Anna; Shagin, Dmitri A; Shagina, Irina A; Gray, Joe W; Cheng, Jan-Fang; de Jong, Pieter J; Pevzner, Pavel; Collins, Colin

    2006-03-01

    A comprehensive understanding of cancer is predicated upon knowledge of the structure of malignant genomes underlying its many variant forms and of the molecular mechanisms giving rise to them. It is well established that solid tumor genomes accumulate a large number of genome rearrangements during tumorigenesis. End Sequence Profiling (ESP) maps and clones genome breakpoints associated with all types of genome rearrangements, elucidating the structural organization of tumor genomes. Here we extend the ESP methodology in several directions using the breast cancer cell line MCF-7. First, targeted ESP is applied to multiple amplified loci, revealing a complex process of rearrangement and co-amplification in these regions reminiscent of breakage/fusion/bridge cycles. Second, genome breakpoints identified by ESP are confirmed using a combination of DNA sequencing and PCR. Third, in vitro functional studies assign biological function to a rearranged tumor BAC clone, demonstrating that it encodes anti-apoptotic activity. Finally, ESP is extended to the transcriptome, identifying four novel fusion transcripts and providing evidence that expression of fusion genes may be common in tumors. These results demonstrate the distinct advantages of ESP, including: (1) the ability to detect all types of rearrangements and copy number changes; (2) straightforward integration of ESP data with the annotated genome sequence; (3) immortalization of the genome; (4) the ability to generate tumor-specific reagents for in vitro and in vivo functional studies. Given these properties, ESP could play an important role in a tumor genome project.
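
    The core idea behind ESP is that the two end sequences of a tumor-derived clone should map to the reference genome on the same chromosome, facing each other, at a distance matching the clone's insert size; pairs that violate this point to rearrangement breakpoints. The sketch below classifies mapped end pairs under that rule. The record layout (EndPair), the insert-size bounds, and the coarse labels are hypothetical choices for illustration, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPair:
    """Mapped positions of a clone's two end sequences (hypothetical schema)."""
    clone_id: str
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str


def classify_end_pair(pair: EndPair, min_insert: int = 100_000,
                      max_insert: int = 250_000) -> str:
    """Label a clone end pair as concordant or as a candidate rearrangement."""
    if pair.chrom_a != pair.chrom_b:
        return "interchromosomal"
    # Put the two ends in reference coordinate order.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos_a, pair.strand_a), (pair.pos_b, pair.strand_b)])
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "-":
        return "everted"  # ends point away from each other (tandem-duplication-like)
    span = right_pos - left_pos
    if span > max_insert:
        return "deletion"  # reference span longer than the clone insert
    if span < min_insert:
        return "insertion"  # reference span shorter than the clone insert
    return "concordant"


if __name__ == "__main__":
    pairs = [
        EndPair("c1", "chr1", 1_000_000, "+", "chr1", 1_150_000, "-"),
        EndPair("c2", "chr1", 1_000_000, "+", "chr20", 5_000, "-"),
        EndPair("c3", "chr1", 1_000_000, "+", "chr1", 3_000_000, "-"),
    ]
    for p in pairs:
        print(p.clone_id, classify_end_pair(p))
```

    Clusters of discordant pairs supporting the same breakpoint, rather than single pairs, would normally be required before calling a rearrangement.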

  3. Ascertainment bias in studies of human genome-wide polymorphism

    DEFF Research Database (Denmark)

    Clark, Andrew G.; Hubisz, Melissa J.; Bustamante, Carlos D.

    2005-01-01

    was a resequencing-by-hybridization effort using the 24 people of diverse origin in the Polymorphism Discovery Resource. Here we take these two data sets and contrast two basic summary statistics, heterozygosity and FST, as well as the site frequency spectra, for 500-kb windows spanning the genome. The magnitude...
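
    The two summary statistics named here can be computed per 500-kb window from population allele frequencies. The abstract does not say which F_ST estimator was used; the sketch below assumes Nei's form F_ST = (H_T - H_S) / H_T for biallelic SNPs, with expected heterozygosity 2p(1 - p), equal weighting of populations, and a ratio-of-sums average within each window. The input layout (position, list of per-population frequencies) is a hypothetical choice.

```python
from collections import defaultdict


def expected_het(p: float) -> float:
    """Expected heterozygosity of a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def nei_fst(freqs: list[float]) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for one SNP, populations weighted equally."""
    h_s = sum(expected_het(p) for p in freqs) / len(freqs)
    h_t = expected_het(sum(freqs) / len(freqs))
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


def window_summaries(snps, window: int = 500_000):
    """Per-window mean within-population heterozygosity and ratio-of-sums F_ST.

    snps: iterable of (position, [allele frequency in each population]).
    Returns {window start: (mean H_S, F_ST)} in window order.
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # [sum H_S, sum H_T, SNP count]
    for pos, freqs in snps:
        h_s = sum(expected_het(p) for p in freqs) / len(freqs)
        h_t = expected_het(sum(freqs) / len(freqs))
        cell = acc[(pos // window) * window]
        cell[0] += h_s
        cell[1] += h_t
        cell[2] += 1
    return {start: (s_hs / n, 0.0 if s_ht == 0.0 else (s_ht - s_hs) / s_ht)
            for start, (s_hs, s_ht, n) in sorted(acc.items())}
```

    Ascertainment bias enters before this step: if SNPs were discovered in a small panel, the frequencies fed into these functions are already skewed toward common variants, which inflates heterozygosity relative to full resequencing.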

  4. Integrated translational genomics for analysis of complex traits in sorghum

    Science.gov (United States)

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  5. Population genomics of an endemic Mediterranean fish: differentiation by fine scale dispersal and adaptation.

    Science.gov (United States)

    Carreras, Carlos; Ordóñez, Víctor; Zane, Lorenzo; Kruschel, Claudia; Nasto, Ina; Macpherson, Enrique; Pascual, Marta

    2017-03-06

    The assessment of the genetic structuring of biodiversity is crucial for management and conservation. For species with large effective population sizes, a small number of markers may fail to identify population structure. A solution to this shortcoming is high-throughput sequencing, which allows thousands of markers to be genotyped genome-wide while facilitating the detection of genetic structuring shaped by selection. We used Genotyping-by-Sequencing (GBS) on 176 individuals of the endemic East Atlantic peacock wrasse (Symphodus tinca), from 6 locations in the Adriatic and Ionian seas. We obtained a total of 4,155 polymorphic SNPs and observed two strong barriers to gene flow. The first differentiated the Tremiti Islands, in the northwest, from all the other locations, while the second separated eastern and south-western localities. Outlier SNPs potentially under positive selection and neutral SNPs both showed similar patterns of structuring, although finer-scale differentiation was unveiled with outlier loci. Our results reflect the complexity of population genetic structure and demonstrate that both habitat fragmentation and positive selection are at play. This complexity should be considered in biodiversity assessments of different taxa, including non-model yet ecologically relevant organisms.

  6. Erroneous energy-generating cycles in published genome scale metabolic networks: Identification and removal

    Science.gov (United States)

    Hartleb, Daniel; Papp, Balázs

    2017-01-01

    Energy metabolism is central to cellular biology. Thus, genome-scale models of heterotrophic unicellular species must appropriately account for the utilization of external nutrients to synthesize energy metabolites such as ATP. However, metabolic models designed for flux-balance analysis (FBA) may contain thermodynamically impossible energy-generating cycles: without nutrient consumption, these models are still capable of charging energy metabolites (such as ADP→ATP or NADP+→NADPH). Here, we show that energy-generating cycles occur in over 85% of metabolic models without extensive manual curation, such as those contained in the ModelSEED and MetaNetX databases; in contrast, such cycles are rare in the manually curated models of the BiGG database. Energy-generating cycles may represent model errors, e.g., erroneous assumptions on reaction reversibilities. Alternatively, part of the cycle may be thermodynamically feasible in one environment, while the remainder is thermodynamically feasible in another environment; as standard FBA does not account for thermodynamics, combining these into an FBA model allows erroneous energy generation. The presence of energy-generating cycles typically inflates maximal biomass production rates by 25%, and may lead to biases in evolutionary simulations. We present efficient computational methods (i) to identify energy-generating cycles, using FBA, and (ii) to identify minimal sets of model changes that eliminate them, using a variant of the GlobalFit algorithm. PMID:28419089

  7. Reprogramming cell fate with a genome-scale library of artificial transcription factors.

    Science.gov (United States)

    Eguchi, Asuka; Wleklinski, Matthew J; Spurgat, Mackenzie C; Heiderscheit, Evan A; Kropornicka, Anna S; Vu, Catherine K; Bhimsaria, Devesh; Swanson, Scott A; Stewart, Ron; Ramanathan, Parameswaran; Kamp, Timothy J; Slukvin, Igor; Thomson, James A; Dutton, James R; Ansari, Aseem Z

    2016-12-20

    Artificial transcription factors (ATFs) are precision-tailored molecules designed to bind DNA and regulate transcription in a preprogrammed manner. Libraries of ATFs enable the high-throughput screening of gene networks that trigger cell fate decisions or phenotypic changes. We developed a genome-scale library of ATFs that display an engineered interaction domain (ID) to enable cooperative assembly and synergistic gene expression at targeted sites. We used this ATF library to screen for key regulators of the pluripotency network and discovered three combinations of ATFs capable of inducing pluripotency without exogenous expression of Oct4 (POU domain, class 5, TF 1). Cognate site identification, global transcriptional profiling, and identification of ATF binding sites reveal that the ATFs do not directly target Oct4; instead, they target distinct nodes that converge to stimulate the endogenous pluripotency network. This forward genetic approach enables cell type conversions without a priori knowledge of potential key regulators and reveals unanticipated gene network dynamics that drive cell fate choices.

  8. Genome-scale modeling of the evolutionary path to C4 photosynthesis

    Science.gov (United States)

    Myers, Christopher R.; Bogart, Eli

    In C4 photosynthesis, plants maintain a high carbon dioxide level in specialized bundle sheath cells surrounding leaf veins and restrict CO2 assimilation to those cells, favoring CO2 over O2 in competition for Rubisco active sites. In C3 plants, which do not possess such a carbon concentrating mechanism, CO2 fixation is reduced due to this competition. Despite the complexity of the C4 system, it has evolved convergently, with more than 60 independent origins in diverse families of plants around the world over the last 30 million years. We study the evolution of the C4 system in a genome-scale model of plant metabolism that describes interacting mesophyll and bundle sheath cells and enforces key nonlinear kinetic relationships. Adapting the zero-temperature string method for simulating transition paths in physics and chemistry, we find the highest-fitness paths connecting C3 and C4 positions in the model's high-dimensional parameter space, and show that they reproduce known aspects of the C3-C4 transition while making additional predictions about metabolic changes along the path. We explore the relationship between evolutionary history and C4 biochemical subtype, and the effects of atmospheric carbon dioxide levels.

  9. Computational solution to automatically map metabolite libraries in the context of genome scale metabolic networks

    Directory of Open Access Journals (Sweden)

    Benjamin Merlet

    2016-02-01

    Full Text Available This article describes a generic programmatic method for mapping chemical compound libraries onto organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to determine the coverage of the chemical libraries set up by two metabolomics facilities, MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics, on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transfer between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers be specified in both libraries and networks. In addition to providing coverage statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. To achieve this goal, we tackled issues of programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks, and automatic loading of a mapping into the genome-scale metabolic network analysis tool MetExplore. Note that this mapping can also be performed on a single organism or a selection of organisms of interest and is thus not limited to large facilities.
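
    The matching step described above reduces to a join on identifiers. The sketch below assumes exact InChIKey equality and plain dictionaries for the library and the network; it is not MetExplore's API, and the compound names, metabolite IDs and keys in the usage lines are hypothetical placeholders. One network metabolite may appear in several compartments, so a library compound can map to several network IDs; network entries without an InChIKey are skipped, which is why the abstract stresses annotating identifiers on both sides.

```python
def map_library_to_network(library: dict, network: dict):
    """Map library compounds to network metabolites by exact InChIKey match.

    library: {compound name: InChIKey or None}
    network: {metabolite id: InChIKey or None}
    Returns ({compound name: sorted matching metabolite ids}, library coverage fraction).
    """
    key_to_ids = {}
    for met_id, key in network.items():
        if key:  # metabolites without an identifier cannot be matched
            key_to_ids.setdefault(key, []).append(met_id)
    mapping = {name: sorted(key_to_ids[key])
               for name, key in library.items()
               if key and key in key_to_ids}
    coverage = len(mapping) / len(library) if library else 0.0
    return mapping, coverage


if __name__ == "__main__":
    # Hypothetical placeholder identifiers, not real InChIKeys.
    library = {"glucose": "KEY-A", "ATP": "KEY-B", "unknown": "KEY-Z"}
    network = {"glc_c": "KEY-A", "atp_c": "KEY-B", "atp_m": "KEY-B", "x_c": None}
    print(map_library_to_network(library, network))
```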

  10. Inference and Evolutionary Analysis of Genome-Scale Regulatory Networks in Large Phylogenies.

    Science.gov (United States)

    Koch, Christopher; Konieczka, Jay; Delorey, Toni; Lyons, Ana; Socha, Amanda; Davis, Kathleen; Knaack, Sara A; Thompson, Dawn; O'Shea, Erin K; Regev, Aviv; Roy, Sushmita

    2017-05-24

    Changes in transcriptional regulatory networks can significantly contribute to species evolution and adaptation. However, identification of genome-scale regulatory networks is an open challenge, especially in non-model organisms. Here, we introduce multi-species regulatory network learning (MRTLE), a computational approach that uses phylogenetic structure, sequence-specific motifs, and transcriptomic data, to infer the regulatory networks in different species. Using simulated data from known networks and transcriptomic data from six divergent yeasts, we demonstrate that MRTLE predicts networks with greater accuracy than existing methods because it incorporates phylogenetic information. We used MRTLE to infer the structure of the transcriptional networks that control the osmotic stress responses of divergent, non-model yeast species and then validated our predictions experimentally. Interrogating these networks reveals that gene duplication promotes network divergence across evolution. Taken together, our approach facilitates study of regulatory network evolutionary dynamics across multiple poorly studied species. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.

  11. Applications of genome-scale metabolic network model in metabolic engineering.

    Science.gov (United States)

    Kim, Byoungjin; Kim, Won Jun; Kim, Dong In; Lee, Sang Yup

    2015-03-01

    The genome-scale metabolic network model (GEM) is a fundamental framework in systems metabolic engineering. A GEM is built upon extensive experimental data and literature information on gene annotation and function, metabolites and enzymes, so that it contains all known metabolic reactions within an organism. Constraint-based analysis of a GEM enables the identification of phenotypic properties of an organism and hypothesis-driven engineering of cellular functions to achieve objectives. Along with the advances in omics, high-throughput technology and computational algorithms, the scope and applications of GEMs have substantially expanded. In particular, various computational algorithms have been developed to predict beneficial gene deletion and amplification targets and have been used to guide the strain development process for the efficient production of industrially important chemicals. Furthermore, an Escherichia coli GEM was integrated with a pathway prediction algorithm and used to evaluate all possible routes for the production of a list of commodity chemicals in E. coli. Combined with the wealth of experimental data produced by high-throughput techniques, much effort has been exerted to add more biological context to GEMs through the integration of omics data and regulatory network information for mechanistic understanding and improved prediction capabilities. In this paper, we review the recent developments and applications of GEMs, focusing on the GEM-based computational algorithms available for microbial metabolic engineering.

  12. Genome-Scale Analysis of Translation Elongation with a Ribosome Flow Model

    Science.gov (United States)

    Meilijson, Isaac; Kupiec, Martin; Ruppin, Eytan

    2011-01-01

    We describe the first large scale analysis of gene translation that is based on a model that takes into account the physical and dynamical nature of this process. The Ribosomal Flow Model (RFM) predicts fundamental features of the translation process, including translation rates, protein abundance levels, ribosomal densities and the relation between all these variables, better than alternative (‘non-physical’) approaches. In addition, we show that the RFM can be used for accurate inference of various other quantities including genes' initiation rates and translation costs. These quantities could not be inferred by previous predictors. We find that increasing the number of available ribosomes (or equivalently the initiation rate) increases the genomic translation rate and the mean ribosome density only up to a certain point, beyond which both saturate. Strikingly, assuming that the translation system is tuned to work at the pre-saturation point maximizes the predictive power of the model with respect to experimental data. This result suggests that in all organisms that were analyzed (from bacteria to Human), the global initiation rate is optimized to attain the pre-saturation point. The fact that similar results were not observed for heterologous genes indicates that this feature is under selection. Remarkably, the gap between the performance of the RFM and alternative predictors is strikingly large in the case of heterologous genes, testifying to the model's promising biotechnological value in predicting the abundance of heterologous proteins before expressing them in the desired host. PMID:21909250
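    The RFM is a deterministic, mean-field model in which an mRNA is divided into n sites, p_i in [0, 1] is the ribosome occupancy of site i, λ_0 is the initiation rate and λ_i is the rate of moving from site i to the next. Its equations are dp_i/dt = λ_{i-1} p_{i-1} (1 - p_i) - λ_i p_i (1 - p_{i+1}), with the inflow to site 1 given by λ_0 (1 - p_1) and the outflow from site n by λ_n p_n, which is also the translation rate R. The Python sketch below integrates these equations with a simple forward-Euler scheme until they settle; the step size, stopping tolerance and the homogeneous rates in the example are assumptions chosen for illustration, not values from the study. With all elongation rates equal to 1, it should show the qualitative saturation described above: R rises steeply at low initiation rates and then levels off.

```python
def rfm_steady_state(init_rate, elong_rates, dt=0.02, max_steps=200_000, tol=1e-10):
    """Integrate the ribosome flow model (forward Euler) to steady state.

    init_rate   -- initiation rate lambda_0
    elong_rates -- [lambda_1, ..., lambda_n], rate of leaving each of the n sites
    Returns (occupancies p_1..p_n, steady-state translation rate R = lambda_n * p_n).
    """
    n = len(elong_rates)
    p = [0.0] * n
    for _ in range(max_steps):
        # flow[i] is the ribosome flow INTO site i; flow[n] is the exit (translation) flow.
        flow = [init_rate * (1.0 - p[0])]
        for i in range(1, n):
            flow.append(elong_rates[i - 1] * p[i - 1] * (1.0 - p[i]))
        flow.append(elong_rates[-1] * p[-1])
        # dp_i/dt = inflow - outflow, all computed from the same (old) state.
        deltas = [(flow[i] - flow[i + 1]) * dt for i in range(n)]
        p = [pi + d for pi, d in zip(p, deltas)]
        if max(abs(d) for d in deltas) < tol:
            break
    return p, elong_rates[-1] * p[-1]


# Homogeneous toy mRNA: 10 sites, every elongation rate equal to 1 (illustrative values).
rates = {}
for lam0 in (0.1, 0.5, 1.0, 5.0, 10.0):
    rates[lam0] = rfm_steady_state(lam0, [1.0] * 10)[1]
    print(f"initiation rate {lam0:5.1f} -> translation rate {rates[lam0]:.4f}")
```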

  13. Cloud-scale genomic signals processing classification analysis for gene expression microarray data.

    Science.gov (United States)

    Harvey, Benjamin; Ji, Soo-Yeon

    2014-01-01

    As the microarray datasets available to scientists continue to grow in size and complexity, it has become critically important to find multiple ways of drawing useful inferences from DNA/mRNA sequence data. Although there have been many attempts to draw biological inferences by means of wavelet preprocessing and classification, no research effort has focused on cloud-scale classification analysis of microarray data using wavelet thresholding in a cloud environment to identify significantly expressed features. This paper proposes a novel methodology that uses wavelet-based denoising to initialize a threshold for determining significantly expressed genes for classification. The research was implemented within a cloud-based distributed processing environment, where cloud computing and wavelet thresholding were used to classify 14 tumor classes from the Global Cancer Map (GCM). The results proved more accurate than using a predefined p-value for differential expression classification. The methodology analyzed wavelet-based threshold features of gene expression in a cloud environment and classified samples by analyzing gene expression patterns, which inform us about biological processes. Moreover, it enables researchers to face present and forthcoming challenges in the analysis of large microarray datasets in functional genomics.
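    The abstract does not specify the wavelet family or threshold rule, so the Python sketch below assumes one common recipe: a full orthonormal Haar transform, a noise estimate σ equal to the median absolute value of the finest-level detail coefficients divided by 0.6745, the Donoho-Johnstone universal threshold σ·sqrt(2 ln n), and soft shrinkage of all detail coefficients. The input length must be a power of two, and the example data are a synthetic noisy step, not GCM data. The function returns both the denoised signal and the threshold, which could hypothetically serve as a data-driven cutoff in place of a fixed p-value.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Full multi-level orthonormal Haar transform; len(x) must be a power of two."""
    if len(x) < 2 or len(x) & (len(x) - 1):
        raise ValueError("length must be a power of two (>= 2)")
    approx, details = list(x), []  # details[0] is the finest level
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, starting from the coarsest level."""
    signal = list(approx)
    for level in reversed(details):
        out = []
        for a, d in zip(signal, level):
            out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        signal = out
    return signal


def soft_threshold(c, t):
    """Shrink c toward zero by t; values with |c| <= t become zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(x):
    approx, details = haar_forward(x)
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745  # robust noise estimate
    thr = sigma * math.sqrt(2.0 * math.log(len(x)))                  # universal threshold
    shrunk = [[soft_threshold(c, thr) for c in level] for level in details]
    return haar_inverse(approx, shrunk), thr


# Illustrative data: a step "expression profile" plus Gaussian noise.
rng = random.Random(0)
clean = [0.0] * 32 + [4.0] * 32
noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
denoised, threshold = wavelet_denoise(noisy)
```

    The Haar basis suits piecewise-constant signals like this step; smoother profiles would usually call for a longer wavelet, which this sketch does not implement.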

  14. Reconstruction of the microalga Nannochloropsis salina genome-scale metabolic model with applications to lipid production.

    Science.gov (United States)

    Loira, Nicolás; Mendoza, Sebastian; Paz Cortés, María; Rojas, Natalia; Travisany, Dante; Genova, Alex Di; Gajardo, Natalia; Ehrenfeld, Nicole; Maass, Alejandro

    2017-07-04

    Nannochloropsis salina (Eustigmatophyceae) is a marine microalga that has become a biotechnological target because of its high capacity to produce polyunsaturated fatty acids and triacylglycerols. It has been used as a source of biofuel, pigments and food supplements such as omega-3 fatty acids. Only some Nannochloropsis species have been sequenced, and none of them has a genome-scale metabolic model (GSMM) able to predict its metabolic capabilities. We present iNS934, the first GSMM for N. salina, including 2345 reactions, 934 genes and an exhaustive description of lipid and nitrogen metabolism. iNS934 has 90% accuracy in simple growth/no-growth predictions and a 15% error rate in predicting growth rates under different experimental conditions. Moreover, iNS934 allowed us to propose 82 different knockout strategies for strain optimization toward triacylglycerol production. iNS934 provides a powerful tool for metabolic improvement, allowing predictions and simulations of N. salina metabolism under different media and genetic conditions. It also provides a systemic view of N. salina metabolism, potentially guiding research and providing context to -omics data.

  15. Expanding a dynamic flux balance model of yeast fermentation to genome-scale

    Directory of Open Access Journals (Sweden)

    Agosin Eduardo

    2011-05-01

    Full Text Available Abstract Background Yeast is considered to be a workhorse of the biotechnology industry for the production of many value-added chemicals, alcoholic beverages and biofuels. Optimization of the fermentation is a challenging task that greatly benefits from dynamic models able to accurately describe and predict the fermentation profile and resulting products under different genetic and environmental conditions. In this article, we developed and validated a genome-scale dynamic flux balance model, using experimentally determined kinetic constraints. Results Appropriate equations for maintenance, biomass composition, anaerobic metabolism and nutrient uptake are key to improving model performance, especially for predicting glycerol and ethanol synthesis. Prediction profiles of synthesis and consumption of the main metabolites involved in alcoholic fermentation closely agreed with experimental data obtained from numerous lab and industrial fermentations under different environmental conditions. Finally, fermentation simulations of genetically engineered yeasts closely reproduced previously reported experimental results regarding final concentrations of the main fermentation products such as ethanol and glycerol. Conclusion A useful tool to describe, understand and predict metabolite production in batch yeast cultures was developed. The resulting model, if used wisely, could help to search for new metabolic engineering strategies to manage ethanol content in batch fermentations.
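    Dynamic flux balance models are commonly solved with a static-optimization loop: at each time step, kinetic expressions bound the substrate uptake rates, an FBA problem is solved for the intracellular fluxes, and the extracellular concentrations are advanced by one integration step. The Python sketch below shows only that outer loop. As a stated simplification, the FBA step is replaced by fixed, hypothetical yield coefficients (so no linear program is solved), and the Michaelis-Menten parameters, yields and initial conditions are illustrative assumptions rather than values from the published model.

```python
def glucose_uptake_bound(glc, vmax=10.0, km=0.5):
    """Michaelis-Menten kinetic constraint on glucose uptake (mmol/gDW/h); hypothetical parameters."""
    return vmax * glc / (km + glc)


def intracellular_fluxes(v_glc):
    """Placeholder for the FBA step: fixed, hypothetical yields instead of solving an LP."""
    mu = 0.01 * v_glc      # growth rate (1/h), i.e. 0.01 gDW biomass per mmol glucose
    v_eth = 1.6 * v_glc    # ethanol secretion (mmol/gDW/h)
    v_gly = 0.1 * v_glc    # glycerol secretion (mmol/gDW/h)
    return mu, v_eth, v_gly


def simulate_batch(biomass=0.1, glc=100.0, hours=48.0, dt=0.01):
    """Forward-Euler batch fermentation; concentrations in gDW/L and mmol/L."""
    eth = gly = 0.0
    for _ in range(round(hours / dt)):
        # Kinetic bound, capped so a single step cannot consume more glucose than remains.
        v_glc = min(glucose_uptake_bound(glc), glc / (biomass * dt))
        mu, v_eth, v_gly = intracellular_fluxes(v_glc)
        glc = max(glc - v_glc * biomass * dt, 0.0)
        eth += v_eth * biomass * dt
        gly += v_gly * biomass * dt
        biomass += mu * biomass * dt
    return biomass, glc, eth, gly


biomass, glc, eth, gly = simulate_batch()
print(f"biomass {biomass:.2f} gDW/L, glucose {glc:.2f} mM, ethanol {eth:.1f} mM, glycerol {gly:.1f} mM")
```

    Swapping `intracellular_fluxes` for a genuine FBA solve, with the uptake bound passed in as the glucose exchange limit, is what turns this loop into a dynamic flux balance model.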

  16. Expanding a dynamic flux balance model of yeast fermentation to genome-scale.

    Science.gov (United States)

    Vargas, Felipe A; Pizarro, Francisco; Pérez-Correa, J Ricardo; Agosin, Eduardo

    2011-05-19

    Yeast is considered to be a workhorse of the biotechnology industry for the production of many value-added chemicals, alcoholic beverages and biofuels. Optimization of the fermentation is a challenging task that greatly benefits from dynamic models able to accurately describe and predict the fermentation profile and resulting products under different genetic and environmental conditions. In this article, we developed and validated a genome-scale dynamic flux balance model, using experimentally determined kinetic constraints. Appropriate equations for maintenance, biomass composition, anaerobic metabolism and nutrient uptake are key to improving model performance, especially for predicting glycerol and ethanol synthesis. Prediction profiles of synthesis and consumption of the main metabolites involved in alcoholic fermentation closely agreed with experimental data obtained from numerous lab and industrial fermentations under different environmental conditions. Finally, fermentation simulations of genetically engineered yeasts closely reproduced previously reported experimental results regarding final concentrations of the main fermentation products such as ethanol and glycerol. A useful tool to describe, understand and predict metabolite production in batch yeast cultures was developed. The resulting model, if used wisely, could help to search for new metabolic engineering strategies to manage ethanol content in batch fermentations.

  17. Engineering Escherichia coli for poly-(3-hydroxybutyrate) production guided by genome-scale metabolic network analysis.

    Science.gov (United States)

    Zheng, Yangyang; Yuan, Qianqian; Yang, Xiaoyan; Ma, Hongwu

    2017-11-01

    Poly-(3-hydroxybutyrate) (P3HB) is a promising biodegradable plastic synthesized from acetyl-CoA. One important factor affecting the P3HB production cost is the P3HB yield. Through flux balance analysis of an extended genome-scale metabolic network of E. coli, we found that introducing the non-oxidative glycolysis (NOG) pathway, a previously reported pathway enabling complete carbon conservation, can increase the theoretical carbon yield from 67% to 89%, equivalent to increasing the theoretical mass yield from 0.48 g P3HB/g glucose to 0.64 g P3HB/g glucose. Based on this result, we introduced phosphoketolase and enhanced the NOG pathway in E. coli. The mass yield in the engineered strain increased from 0.16 g P3HB/g glucose to 0.24 g P3HB/g glucose. We further overexpressed pntAB to enhance NADPH availability and down-regulated the TCA cycle to divert more acetyl-CoA toward P3HB. The final construct accumulated 5.7 g/L P3HB and reached a carbon yield of 0.43 (a mass yield of 0.31 g P3HB/g glucose) in shake flask cultures. Introducing the NOG pathway could also be useful for improving the yields of many other biochemicals derived from acetyl-CoA. Copyright © 2017 Elsevier Inc. All rights reserved.
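    The carbon and mass yields quoted above are related through the carbon content and molar masses of glucose (C6H12O6, 180.16 g/mol) and of the P3HB repeating unit (C4H6O2, 86.09 g/mol): mass yield = carbon yield × (6/4) × (86.09/180.16), or roughly 0.717 × carbon yield. The Python sketch below applies that conversion to the 67%, 89% and 43% carbon yields. The 67% baseline matches conventional glycolysis, in which two of glucose's six carbons are lost as CO2 when two pyruvate are converted to two acetyl-CoA, leaving four carbons for one 3-hydroxybutyrate unit.

```python
GLUCOSE_MW = 180.16      # g/mol, C6H12O6
P3HB_UNIT_MW = 86.09     # g/mol, C4H6O2 repeating unit of P3HB
GLUCOSE_C, P3HB_UNIT_C = 6, 4


def carbon_to_mass_yield(carbon_yield):
    """Convert C-mol P3HB per C-mol glucose into g P3HB per g glucose."""
    units_per_glucose = carbon_yield * GLUCOSE_C / P3HB_UNIT_C  # mol monomer per mol glucose
    return units_per_glucose * P3HB_UNIT_MW / GLUCOSE_MW


def mass_to_carbon_yield(mass_yield):
    """Inverse conversion: g P3HB per g glucose into a carbon yield."""
    return mass_yield / carbon_to_mass_yield(1.0)


for label, cy in [("theoretical, without NOG", 0.67),
                  ("theoretical, with NOG", 0.89),
                  ("final shake-flask construct", 0.43)]:
    print(f"{label}: carbon yield {cy:.2f} -> {carbon_to_mass_yield(cy):.2f} g P3HB/g glucose")
```

    The three printed mass yields are 0.48, 0.64 and 0.31 g P3HB/g glucose, matching the figures in the abstract.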

  18. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study

    Science.gov (United States)

    de Vries, Paul S.; Sabater-Lleal, Maria; Chasman, Daniel I.; Trompet, Stella; Kleber, Marcus E.; Chen, Ming-Huei; Wang, Jie Jin; Attia, John R.; Marioni, Riccardo E.; Weng, Lu-Chen; Grossmann, Vera; Brody, Jennifer A.; Venturini, Cristina; Tanaka, Toshiko; Rose, Lynda M.; Oldmeadow, Christopher; Mazur, Johanna; Basu, Saonli; Yang, Qiong; Ligthart, Symen; Hottenga, Jouke J.; Rumley, Ann; Mulas, Antonella; de Craen, Anton J. M.; Grotevendt, Anne; Taylor, Kent D.; Delgado, Graciela E.; Kifley, Annette; Lopez, Lorna M.; Berentzen, Tina L.; Mangino, Massimo; Bandinelli, Stefania; Morrison, Alanna C.; Hamsten, Anders; Tofler, Geoffrey; de Maat, Moniek P. M.; Draisma, Harmen H. M.; Lowe, Gordon D.; Zoledziewska, Magdalena; Sattar, Naveed; Lackner, Karl J.; Völker, Uwe; McKnight, Barbara; Huang, Jie; Holliday, Elizabeth G.; McEvoy, Mark A.; Starr, John M.; Hysi, Pirro G.; Hernandez, Dena G.; Guan, Weihua; Rivadeneira, Fernando; McArdle, Wendy L.; Slagboom, P. Eline; Zeller, Tanja; Psaty, Bruce M.; Uitterlinden, André G.; de Geus, Eco J. C.; Stott, David J.; Binder, Harald; Hofman, Albert; Franco, Oscar H.; Rotter, Jerome I.; Ferrucci, Luigi; Spector, Tim D.; Deary, Ian J.; März, Winfried; Greinacher, Andreas; Wild, Philipp S.; Cucca, Francesco; Boomsma, Dorret I.; Watkins, Hugh; Tang, Weihong; Ridker, Paul M.; Jukema, Jan W.; Scott, Rodney J.; Mitchell, Paul; Hansen, Torben; O'Donnell, Christopher J.; Smith, Nicholas L.; Strachan, David P.

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10−8 is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10−8), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development. PMID:28107422
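    The two thresholds in this comparison follow from a Bonferroni correction, P < α/m for m independent tests. With α = 0.05, the conventional 5×10−8 corresponds to about one million independent tests, and the stricter 2.5×10−8 to about two million. The Python sketch below computes these thresholds and shows how a locus can drop out under the stricter one; the locus names and p-values are hypothetical, not results from the fibrinogen study.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_loci(p_values, threshold):
    """Return the names of loci whose p-value falls below the threshold."""
    return sorted(name for name, p in p_values.items() if p < threshold)


hapmap_threshold = bonferroni_threshold(1_000_000)  # ~5e-8
strict_threshold = bonferroni_threshold(2_000_000)  # ~2.5e-8

# Hypothetical lead-SNP p-values for three loci.
p_values = {"locus_1": 1e-12, "locus_2": 3e-8, "locus_3": 2e-7}
print(significant_loci(p_values, hapmap_threshold))  # ['locus_1', 'locus_2']
print(significant_loci(p_values, strict_threshold))  # ['locus_1']
```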

  19. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study.

    Directory of Open Access Journals (Sweden)

    Paul S de Vries

    Full Text Available An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10−8 is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10−8), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development.

  20. Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study.

    Science.gov (United States)

    de Vries, Paul S; Sabater-Lleal, Maria; Chasman, Daniel I; Trompet, Stella; Ahluwalia, Tarunveer S; Teumer, Alexander; Kleber, Marcus E; Chen, Ming-Huei; Wang, Jie Jin; Attia, John R; Marioni, Riccardo E; Steri, Maristella; Weng, Lu-Chen; Pool, Rene; Grossmann, Vera; Brody, Jennifer A; Venturini, Cristina; Tanaka, Toshiko; Rose, Lynda M; Oldmeadow, Christopher; Mazur, Johanna; Basu, Saonli; Frånberg, Mattias; Yang, Qiong; Ligthart, Symen; Hottenga, Jouke J; Rumley, Ann; Mulas, Antonella; de Craen, Anton J M; Grotevendt, Anne; Taylor, Kent D; Delgado, Graciela E; Kifley, Annette; Lopez, Lorna M; Berentzen, Tina L; Mangino, Massimo; Bandinelli, Stefania; Morrison, Alanna C; Hamsten, Anders; Tofler, Geoffrey; de Maat, Moniek P M; Draisma, Harmen H M; Lowe, Gordon D; Zoledziewska, Magdalena; Sattar, Naveed; Lackner, Karl J; Völker, Uwe; McKnight, Barbara; Huang, Jie; Holliday, Elizabeth G; McEvoy, Mark A; Starr, John M; Hysi, Pirro G; Hernandez, Dena G; Guan, Weihua; Rivadeneira, Fernando; McArdle, Wendy L; Slagboom, P Eline; Zeller, Tanja; Psaty, Bruce M; Uitterlinden, André G; de Geus, Eco J C; Stott, David J; Binder, Harald; Hofman, Albert; Franco, Oscar H; Rotter, Jerome I; Ferrucci, Luigi; Spector, Tim D; Deary, Ian J; März, Winfried; Greinacher, Andreas; Wild, Philipp S; Cucca, Francesco; Boomsma, Dorret I; Watkins, Hugh; Tang, Weihong; Ridker, Paul M; Jukema, Jan W; Scott, Rodney J; Mitchell, Paul; Hansen, Torben; O'Donnell, Christopher J; Smith, Nicholas L; Strachan, David P; Dehghan, Abbas

    2017-01-01

    An increasing number of genome-wide association (GWA) studies are now using the higher resolution 1000 Genomes Project reference panel (1000G) for imputation, with the expectation that 1000G imputation will lead to the discovery of additional associated loci when compared to HapMap imputation. In order to assess the improvement of 1000G over HapMap imputation in identifying associated loci, we compared the results of GWA studies of circulating fibrinogen based on the two reference panels. Using both HapMap and 1000G imputation we performed a meta-analysis of 22 studies comprising the same 91,953 individuals. We identified six additional signals using 1000G imputation, while 29 loci were associated using both HapMap and 1000G imputation. One locus identified using HapMap imputation was not significant using 1000G imputation. The genome-wide significance threshold of 5×10−8 is based on the number of independent statistical tests using HapMap imputation, and 1000G imputation may lead to further independent tests that should be corrected for. When using a stricter Bonferroni correction for the 1000G GWA study (P-value < 2.5×10−8), the number of loci significant only using HapMap imputation increased to 4 while the number of loci significant only using 1000G decreased to 5. In conclusion, 1000G imputation enabled the identification of 20% more loci than HapMap imputation, although the advantage of 1000G imputation became less clear when a stricter Bonferroni correction was used. More generally, our results provide insights that are applicable to the implementation of other dense reference panels that are under development.

  1. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

    Science.gov (United States)

    Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

    2016-01-01

    Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is growing interest in understanding somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer. Copyright © 2016. Published by Elsevier SAS.

  2. Improved Complete Genome Sequence of the Extremely Radioresistant Bacterium Deinococcus radiodurans R1 Obtained Using PacBio Single-Molecule Sequencing

    OpenAIRE

    Hua, Xiaoting; Hua, Yuejin

    2016-01-01

    The genome sequence of Deinococcus radiodurans R1 was published in 1999. We resequenced D. radiodurans R1 using PacBio and compared the sequence with the published one. Large insertions and single nucleotide polymorphisms (SNPs) were observed among the genome sequences. A more accurate genome sequence will be helpful to studies of D. radiodurans.

  3. A Genomic Survey of SCPP Family Genes in Fishes Provides Novel Insights into the Evolution of Fish Scales

    Directory of Open Access Journals (Sweden)

    Yunyun Lv

    2017-11-01

    Full Text Available The family of secretory calcium-binding phosphoproteins (SCPPs) has been considered vital to skeletal tissue mineralization. However, most previous SCPP studies focused on phylogenetically distant animals rather than on closely related species. Here we provide novel insights into the coevolution of SCPP genes and fish scales in 10 species from Otophysi. According to their scale phenotypes, these fishes can be divided into three groups: scaled, sparsely scaled, and scaleless. We identified homologous SCPP genes in the genomes of these species and found that some SCPP members are absent from some genomes, suggesting an uneven evolutionary history of SCPP genes in fishes. In addition, most of these SCPP genes, with the exception of SPP1, individually form one or two gene cluster(s) on each corresponding genome. Furthermore, we constructed phylogenetic trees using the maximum likelihood method to estimate their evolution. The phylogenetic topology mostly supports two subclasses in some species, such as Cyprinus carpio, Sinocyclocheilus anshuiensis, S. grahami, and S. rhinocerous, but not in the other examined fishes. By comparing the gene structures of SCPP1 and SCPP5, recently reported candidate genes for determining scale phenotypes, we found that the candidate-gene hypothesis holds for Astyanax mexicanus but is contradicted by S. anshuiensis, even though both are sparsely scaled owing to cave adaptation. Thus, we conclude that although different fish species display similar scale phenotypes, the underlying genetic changes might be diverse. In summary, this paper advances the recognition of the SCPP family in teleosts with respect to potential scale evolution.

  4. Continuing Evolution of Burkholderia mallei Through Genome Reduction and Large-Scale Rearrangements

    Science.gov (United States)

    2010-01-22

    Myers GS, et al. 2006. Skewed genomic variability in strains of the toxigenic bacterial pathogen, Clostridium perfringens. Genome Res. 16:1031–1040...and Roth 1987; Kothapalli et al. 2005). In addition, Escherichia coli strains with differently sized replichores are at a growth disadvantage...Lesterlin et al. 2008). Thus, it is possible that the attenuation in virulence in NCTC10247 can be explained by these genomic constraints. The growth rate of

  5. Genome-scale transcriptional analyses of first-generation interspecific sunflower hybrids reveals broad regulatory compatibility

    OpenAIRE

    Rowe, Heather C.; Rieseberg, Loren H.

    2013-01-01

    Background Interspecific hybridization creates individuals harboring diverged genomes. The interaction of these genomes can generate successful evolutionary novelty or disadvantageous genomic conflict. Annual sunflowers Helianthus annuus and H. petiolaris have a rich history of hybridization in natural populations. Although first-generation hybrids generally have low fertility, hybrid swarms that include later generation and fully fertile backcross plants have been identified, as well as at l...

  6. Whole exome re-sequencing implicates CCDC38 and cilia structure and function in resistance to smoking related airflow obstruction.

    Directory of Open Access Journals (Sweden)

    Louise V Wain

    2014-05-01

    Full Text Available Chronic obstructive pulmonary disease (COPD) is a leading cause of global morbidity and mortality and, whilst smoking remains the single most important risk factor, COPD risk is heritable. Of 26 independent genomic regions showing association with lung function in genome-wide association studies, eleven have been reported to show association with airflow obstruction. Although the main risk factor for COPD is smoking, some individuals are observed to have a high forced expired volume in 1 second (FEV1) despite many years of heavy smoking. We hypothesised that these "resistant smokers" may harbour variants which protect against lung function decline caused by smoking and provide insight into the genetic determinants of lung health. We undertook whole exome re-sequencing of 100 heavy smokers who had healthy lung function given their age, sex, height and smoking history and applied three complementary approaches to explore the genetic architecture of smoking resistance. Firstly, we identified novel functional variants in the "resistant smokers" and looked for enrichment of these novel variants within biological pathways. Secondly, we undertook association testing of all exonic variants individually with two independent control sets. Thirdly, we undertook gene-based association testing of all exonic variants. Our strongest signal of association with smoking resistance for a non-synonymous SNP was for rs10859974 (P = 2.34 × 10−4) in CCDC38, a gene which has previously been reported to show association with FEV1/FVC, and we demonstrate moderate expression of CCDC38 in bronchial epithelial cells. We identified an enrichment of novel putatively functional variants in genes related to cilia structure and function in resistant smokers. Ciliary function abnormalities are known to be associated with both smoking and reduced mucociliary clearance in patients with COPD. We suggest that genetic influences on the development or function of cilia in the bronchial

  7. Inferring the demographic history of African farmers and pygmy hunter-gatherers using a multilocus resequencing data set.

    Directory of Open Access Journals (Sweden)

    Etienne Patin

    2009-04-01

    Full Text Available The transition from hunting and gathering to farming involved a major cultural innovation that has spread rapidly over most of the globe in the last ten millennia. In sub-Saharan Africa, hunter-gatherers have begun to shift toward an agriculture-based lifestyle over the last 5,000 years. Only a few populations still base their mode of subsistence on hunting and gathering. The Pygmies are considered to be the largest group of mobile hunter-gatherers of Africa. They dwell in equatorial rainforests and are characterized by their short mean stature. However, little is known about the chronology of the demographic events (size changes, population splits, and gene flow) that ultimately gave rise to contemporary Pygmy (Western and Eastern) groups and neighboring agricultural populations. We studied the branching history of Pygmy hunter-gatherers and agricultural populations from Africa and estimated separation times and gene flow between these populations. We resequenced 24 independent noncoding regions across the genome, corresponding to a total of approximately 33 kb per individual, in 236 samples from seven Pygmy and five agricultural populations dispersed over the African continent. We used simulation-based inference to identify the historical model best fitting our data. The model identified included the early divergence of the ancestors of Pygmy hunter-gatherers and farming populations approximately 60,000 years ago, followed by a split of the Pygmies' ancestors into the Western and Eastern Pygmy groups approximately 20,000 years ago. Our findings increase knowledge of the history of the peopling of the African continent in a region lacking archaeological data. An appreciation of the demographic and adaptive history of African populations with different modes of subsistence should improve our understanding of the influence of human lifestyles on genome diversity.
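    Simulation-based inference of this kind is often carried out with approximate Bayesian computation (ABC). The rejection-ABC sketch below is a generic Python illustration, not the demographic model or procedure used in the study: it draws a parameter from a prior, simulates data under it, and keeps the draw when a summary statistic of the simulated data lies within a tolerance of the observed one. The toy model (50 draws from a normal distribution with unknown mean), the uniform prior, the sample mean as summary statistic and the tolerance are all illustrative assumptions; a real demographic analysis would simulate genetic data under competing population histories and compare several summary statistics.

```python
import random


def sample_mean(xs):
    return sum(xs) / len(xs)


def rejection_abc(observed, simulate, prior_sample, summary, n_draws, tolerance, rng):
    """Return prior draws whose simulated summary lies within `tolerance` of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(summary(simulate(theta, rng)) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Hypothetical toy model: n observations from Normal(theta, 1), with a Uniform(0, 5) prior on theta.
def simulate_normal(theta, rng, n=50):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def uniform_prior(rng):
    return rng.uniform(0.0, 5.0)


rng = random.Random(1)
observed = simulate_normal(2.0, rng)  # stand-in for field data
posterior = rejection_abc(observed, simulate_normal, uniform_prior, sample_mean,
                          n_draws=10_000, tolerance=0.1, rng=rng)
print(f"accepted {len(posterior)} draws; posterior mean {sample_mean(posterior):.2f}")
```

    The accepted draws approximate the posterior distribution of the parameter; shrinking the tolerance sharpens the approximation at the cost of accepting fewer draws.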

  8. Inferring the demographic history of African farmers and pygmy hunter-gatherers using a multilocus resequencing data set.

    Science.gov (United States)

    Patin, Etienne; Laval, Guillaume; Barreiro, Luis B; Salas, Antonio; Semino, Ornella; Santachiara-Benerecetti, Silvana; Kidd, Kenneth K; Kidd, Judith R; Van der Veen, Lolke; Hombert, Jean-Marie; Gessain, Antoine; Froment, Alain; Bahuchet, Serge; Heyer, Evelyne; Quintana-Murci, Lluís

    2009-04-01

    The transition from hunting and gathering to farming involved a major cultural innovation that has spread rapidly over most of the globe in the last ten millennia. In sub-Saharan Africa, hunter-gatherers have begun to shift toward an agriculture-based lifestyle over the last 5,000 years. Only a few populations still base their mode of subsistence on hunting and gathering. The Pygmies are considered to be the largest group of mobile hunter-gatherers of Africa. They dwell in equatorial rainforests and are characterized by their short mean stature. However, little is known about the chronology of the demographic events (size changes, population splits, and gene flow) that ultimately gave rise to contemporary Pygmy (Western and Eastern) groups and neighboring agricultural populations. We studied the branching history of Pygmy hunter-gatherers and agricultural populations from Africa and estimated separation times and gene flow between these populations. We resequenced 24 independent noncoding regions across the genome, corresponding to a total of approximately 33 kb per individual, in 236 samples from seven Pygmy and five agricultural populations dispersed over the African continent. We used simulation-based inference to identify the historical model best fitting our data. The model identified included the early divergence of the ancestors of Pygmy hunter-gatherers and farming populations approximately 60,000 years ago, followed by a split of the Pygmies' ancestors into the Western and Eastern Pygmy groups approximately 20,000 years ago. Our findings increase knowledge of the history of the peopling of the African continent in a region lacking archaeological data. An appreciation of the demographic and adaptive history of African populations with different modes of subsistence should improve our understanding of the influence of human lifestyles on genome diversity.

  9. Inferring the Demographic History of African Farmers and Pygmy Hunter–Gatherers Using a Multilocus Resequencing Data Set

    Science.gov (United States)

    Patin, Etienne; Laval, Guillaume; Barreiro, Luis B.; Salas, Antonio; Semino, Ornella; Santachiara-Benerecetti, Silvana; Kidd, Kenneth K.; Kidd, Judith R.; Van der Veen, Lolke; Hombert, Jean-Marie; Gessain, Antoine; Froment, Alain; Bahuchet, Serge; Heyer, Evelyne; Quintana-Murci, Lluís

    2009-01-01

    The transition from hunting and gathering to farming involved a major cultural innovation that has spread rapidly over most of the globe in the last ten millennia. In sub-Saharan Africa, hunter–gatherers have begun to shift toward an agriculture-based lifestyle over the last 5,000 years. Only a few populations still base their mode of subsistence on hunting and gathering. The Pygmies are considered to be the largest group of mobile hunter–gatherers of Africa. They dwell in equatorial rainforests and are characterized by their short mean stature. However, little is known about the chronology of the demographic events—size changes, population splits, and gene flow—ultimately giving rise to contemporary Pygmy (Western and Eastern) groups and neighboring agricultural populations. We studied the branching history of Pygmy hunter–gatherers and agricultural populations from Africa and estimated separation times and gene flow between these populations. We resequenced 24 independent noncoding regions across the genome, corresponding to a total of ∼33 kb per individual, in 236 samples from seven Pygmy and five agricultural populations dispersed over the African continent. We used simulation-based inference to identify the historical model best fitting our data. The model identified included the early divergence of the ancestors of Pygmy hunter–gatherers and farming populations ∼60,000 years ago, followed by a split of the Pygmies' ancestors into the Western and Eastern Pygmy groups ∼20,000 years ago. Our findings increase knowledge of the history of the peopling of the African continent in a region lacking archaeological data. An appreciation of the demographic and adaptive history of African populations with different modes of subsistence should improve our understanding of the influence of human lifestyles on genome diversity. PMID:19360089

  10. LocateP: Genome-scale subcellular-location predictor for bacterial proteins

    Directory of Open Access Journals (Sweden)

    Zhou Miaomiao

    2008-03-01

    Full Text Available Abstract Background In the past decades, various protein subcellular-location (SCL) predictors have been developed. Most of these predictors, like TMHMM 2.0, SignalP 3.0, PrediSi and Phobius, aim at the identification of one or a few SCLs, whereas others such as CELLO and Psortb.v.2.0 aim at a broader classification. Although these tools and pipelines can predict signal peptides and transmembrane helices with high precision, they are much less accurate for other sequence characteristics. For instance, it has proved notoriously difficult to identify the fate of proteins carrying a putative type I signal peptidase (SPIase) cleavage site, as many of those proteins are retained in the cell membrane as N-terminally anchored membrane proteins. Moreover, most of the SCL classifiers are based on the classification of the Swiss-Prot database and consequently inherited the inconsistency of that SCL classification. As accurate and detailed SCL prediction on a genome scale is highly desired by experimental researchers, we decided to construct a new SCL prediction pipeline: LocateP. Results LocateP combines many of the existing high-precision SCL identifiers with our own newly developed identifiers for specific SCLs. The LocateP pipeline was designed such that it mimics protein targeting and secretion processes. It distinguishes 7 different SCLs within Gram-positive bacteria: intracellular, multi-transmembrane, N-terminally membrane anchored, C-terminally membrane anchored, lipid-anchored, LPxTG-type cell-wall anchored, and secreted/released proteins. Moreover, it distinguishes pathways for Sec- or Tat-dependent secretion and alternative secretion of bacteriocin-like proteins. The pipeline was tested on data sets extracted from literature, including experimental proteomics studies. The tests showed that LocateP performs as well as, or even slightly better than other SCL predictors for some locations and outperforms

  11. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses.

    Directory of Open Access Journals (Sweden)

    Md Shamsuzzoha Bayzid

    Full Text Available Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. 
Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also
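The change described above is that each re-estimated "supergene" tree counts once per gene in its bin rather than once per bin. A minimal Python sketch of just that weighting step, assuming trees are represented as hypothetical Newick topology strings and that the downstream summary method consumes tree frequencies; bin construction and MP-EST itself are not shown.

```python
from collections import Counter


def unweighted_counts(bins):
    """Unweighted binning: each bin contributes its supergene tree once."""
    return Counter(tree for _genes, tree in bins)


def weighted_counts(bins):
    """Weighted binning: each supergene tree is counted once per gene in its bin."""
    counts = Counter()
    for genes, tree in bins:
        counts[tree] += len(genes)
    return counts


# Hypothetical example: three genes binned together, one gene left on its own.
bins = [
    (["g1", "g2", "g3"], "((A,B),C);"),
    (["g4"], "((A,C),B);"),
]
print(unweighted_counts(bins))  # Counter({'((A,B),C);': 1, '((A,C),B);': 1})
print(weighted_counts(bins))    # Counter({'((A,B),C);': 3, '((A,C),B);': 1})
```

Without weighting, the large bin and the singleton carry equal influence; with weighting, the input again reflects one tree per original gene.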

  12. CATMA, a comprehensive genome-scale resource for silencing and transcript profiling of Arabidopsis genes

    Directory of Open Access Journals (Sweden)

    Moreau Yves

    2007-10-01

    Gène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags matching multiple members of gene families (Gene Family Tags or GFTs) to cover any remaining untagged genes. These latter 1,533 features constitute the CATMAv4 addition. Conclusion To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and the large-scale cloning of hairpin RNA silencing vectors. All information about the resulting updated CATMA repertoire is available through the CATMA database http://www.catma.org.

  13. A protocol for large scale genomic DNA isolation for cacao genetics ...

    African Journals Online (AJOL)

    Advances in DNA technology, such as marker assisted selection, detection of quantitative trait loci and genomic selection also require the isolation of DNA from a large number of samples and the preservation of tissue samples for future use in cacao genome studies. The present study proposes a method for the ...

  14. Exploring the metabolic network of the epidemic pathogen Burkholderia cenocepacia J2315 via genome-scale reconstruction

    Directory of Open Access Journals (Sweden)

    Panda Gurudutta

    2011-05-01

    Full Text Available Abstract Background Burkholderia cenocepacia is a threatening nosocomial epidemic pathogen in patients with cystic fibrosis (CF) or a compromised immune system. Its high level of antibiotic resistance is an increasing concern in treatments against its infection. Strain B. cenocepacia J2315 is the most infectious isolate from CF patients. There is a strong demand to reconstruct a genome-scale metabolic network of B. cenocepacia J2315 to systematically analyze its metabolic capabilities and its virulence traits, and to search for potential clinical therapy targets. Results We reconstructed the genome-scale metabolic network of B. cenocepacia J2315. An iterative reconstruction process led to the establishment of a robust model, iKF1028, which accounts for 1,028 genes, 859 internal reactions, and 834 metabolites. The model iKF1028 captures important metabolic capabilities of B. cenocepacia J2315 with a particular focus on the biosyntheses of key metabolic virulence factors to assist in understanding the mechanism of disease infection and identifying potential drug targets. The model was tested through BIOLOG assays. Based on the model, the genome annotation of B. cenocepacia J2315 was refined and 24 genes were properly re-annotated. Gene and enzyme essentiality were analyzed to provide further insights into the genome function and architecture. A total of 45 essential enzymes were identified as potential therapeutic targets. Conclusions As the first genome-scale metabolic network of B. cenocepacia J2315, iKF1028 allows a systematic study of the metabolic properties of B. cenocepacia and its key metabolic virulence factors affecting the CF community. The model can be used as a discovery tool to design novel drugs against diseases caused by this notorious pathogen.

  15. FORG3D: Force-directed 3D graph editor for visualization of integrated genome scale data

    Directory of Open Access Journals (Sweden)

    Paananen Jussi

    2009-02-01

    Full Text Available Abstract Background Genomics research produces vast amounts of experimental data that need to be integrated in order to understand, model, and interpret the underlying biological phenomena. Interpreting these large and complex data sets is challenging, and different visualization methods are needed to help produce knowledge from the data. Results To help researchers visualize and interpret integrated genomics data, we present a novel visualization method and bioinformatics software tool called FORG3D that is based on real-time three-dimensional force-directed graphs. FORG3D can be used to visualize integrated networks of genome scale data such as interactions between genes or gene products, signal transduction, metabolic pathways, functional interactions and evolutionary relationships. Furthermore, we demonstrate its utility by exploring gene network relationships using integrated data sets from a Caenorhabditis elegans Parkinson's disease model. Conclusion We have created an open source software tool called FORG3D that can be used for visualizing and exploring integrated genome scale data.

  16. Integrating splice-isoform expression into genome-scale models characterizes breast cancer metabolism.

    Science.gov (United States)

    Angione, Claudio

    2018-02-01

    Although often perceived as the main contributors to cell fate and physiology, genes alone cannot predict cellular phenotype. During the process of gene expression, 95% of human genes can code for multiple proteins due to alternative splicing. While most splice variants of a gene carry the same function, variants within some key genes can have remarkably different roles. To bridge the gap between genotype and phenotype, condition- and tissue-specific models of metabolism have been constructed. However, current metabolic models only include information at the gene level. Consequently, as recently acknowledged by the scientific community, common situations where changes in splice-isoform expression levels alter the metabolic outcome cannot be modeled. Here we propose GEMsplice, the first method for the incorporation of splice-isoform expression data into genome-scale metabolic models. Using GEMsplice, we make full use of RNA-Seq quantitative expression profiles to predict, for the first time, the effects of splice isoform-level changes in the metabolism of 1455 patients with 31 different breast cancer types. We validate GEMsplice by generating cancer-versus-normal predictions on metabolic pathways, and by comparing with gene-level approaches and available literature on pathways affected by breast cancer. GEMsplice is freely available for academic use at https://github.com/GEMsplice/GEMsplice_code. Compared to state-of-the-art methods, we anticipate that GEMsplice will enable, for the first time, computational analyses at transcript level with splice-isoform resolution. Contact: c.angione@tees.ac.uk. Supplementary data are available at Bioinformatics online.

  17. Genome-scale prediction of proteins with long intrinsically disordered regions.

    Science.gov (United States)

    Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz

    2014-01-01

    Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster than the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized the abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs, with the majority of proteomes having between 25 and 40%, where higher abundance is characteristic of proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Copyright © 2013 Wiley Periodicals, Inc.
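The LDR definition above (30 or more consecutive disordered residues) can be applied directly to a per-residue disorder profile. A minimal Python sketch, assuming the profile is a hypothetical list of booleans from some disorder predictor; it illustrates the definition only and is not SLIDER, which predicts LDR-containing proteins from sequence features with logistic regression.

```python
def long_disordered_regions(disordered, min_len=30):
    """Return half-open (start, end) runs of at least min_len disordered residues."""
    regions = []
    start = None
    for i, is_disordered in enumerate(disordered):
        if is_disordered and start is None:
            start = i  # a disordered run begins
        elif not is_disordered and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None  # the run has ended
    # A run that reaches the C-terminus is closed here.
    if start is not None and len(disordered) - start >= min_len:
        regions.append((start, len(disordered)))
    return regions


def has_ldr(disordered, min_len=30):
    """True if the protein contains at least one long disordered region."""
    return bool(long_disordered_regions(disordered, min_len))


# Hypothetical profile: a 30-residue run qualifies, a 29-residue run does not.
profile = [False] * 10 + [True] * 30 + [False] * 5 + [True] * 29
print(long_disordered_regions(profile))  # [(10, 40)]
print(has_ldr(profile))                  # True
```

Applying `has_ldr` to every protein in a proteome and averaging gives the kind of per-proteome LDR fraction the abstract reports.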

  18. A multi-tissue type genome-scale metabolic network for analysis of whole-body systems physiology

    OpenAIRE

    Bordbar, Aarash; Feist, Adam M; Usaite-Black, Renata; Woodcock, Joseph; Palsson, Bernhard O; Famili, Iman

    2011-01-01

    Abstract Background Genome-scale metabolic reconstructions provide a biologically meaningful mechanistic basis for the genotype-phenotype relationship. The global human metabolic network, termed Recon 1, has recently been reconstructed allowing the systems analysis of human metabolic physiology and pathology. Utilizing high-throughput data, Recon 1 has recently been tailored to different cells and tissues, including the liver, kidney, brain, and alveolar macrophage. These models have shown ut...

  19. Genome-scale modeling using flux ratio constraints to enable metabolic engineering of clostridial metabolism in silico

    Directory of Open Access Journals (Sweden)

    McAnulty Michael J

    2012-05-01

    Full Text Available Abstract Background Genome-scale metabolic networks and flux models are an effective platform for linking an organism genotype to its phenotype. However, few modeling approaches offer predictive capabilities to evaluate potential metabolic engineering strategies in silico. Results A new method called “flux balance analysis with flux ratios (FBrAtio)” was developed in this research and applied to a new genome-scale model of Clostridium acetobutylicum ATCC 824 (iCAC490) that contains 707 metabolites and 794 reactions. FBrAtio was used to model wild-type metabolism and metabolically engineered strains of C. acetobutylicum where only flux ratio constraints and thermodynamic reversibility of reactions were required. The FBrAtio approach allowed solutions to be found through standard linear programming. Five flux ratio constraints were required to achieve a qualitative picture of wild-type metabolism for C. acetobutylicum for the production of: (i) acetate, (ii) lactate, (iii) butyrate, (iv) acetone, (v) butanol, (vi) ethanol, (vii) CO2 and (viii) H2. Results of this simulation study coincide with published experimental results and show the knockdown of the acetoacetyl-CoA transferase increases butanol to acetone selectivity, while the simultaneous over-expression of the aldehyde/alcohol dehydrogenase greatly increases ethanol production. Conclusions FBrAtio is a promising new method for constraining genome-scale models using internal flux ratios. The method was effective for modeling wild-type and engineered strains of C. acetobutylicum.

  20. Genome-environment association study suggests local adaptation to climate at the regional scale in Fagus sylvatica.

    Science.gov (United States)

    Pluess, Andrea R; Frank, Aline; Heiri, Caroline; Lalagüe, Hadrien; Vendramin, Giovanni G; Oddou-Muratorio, Sylvie

    2016-04-01

    The evolutionary potential of long-lived species, such as forest trees, is fundamental for their local persistence under climate change (CC). Genome-environment association (GEA) analyses reveal whether species in heterogeneous environments at the regional scale are under differential selection resulting in populations with potential preadaptation to CC within this area. In 79 natural Fagus sylvatica populations, neutral genetic patterns were characterized using 12 simple sequence repeat (SSR) markers, and genomic variation (144 single nucleotide polymorphisms (SNPs) from 52 candidate genes) was related to 87 environmental predictors in the latent factor mixed model, logistic regressions and isolation by distance/environmental (IBD/IBE) tests. SSR diversity revealed relatedness at up to 150 m intertree distance but an absence of large-scale spatial genetic structure and IBE. In the GEA analyses, 16 SNPs in 10 genes responded to one or several environmental predictors and IBE, corrected for IBD, was confirmed. The GEA often reflected the proposed gene functions, including indications for adaptation to water availability and temperature. Genomic divergence and the lack of large-scale neutral genetic patterns suggest that gene flow allows the spread of advantageous alleles in adaptive genes. Thereby, adaptation processes are likely to take place in species occurring in heterogeneous environments, which might reduce their regional extinction risk under CC. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  1. Genome-scale modeling using flux ratio constraints to enable metabolic engineering of clostridial metabolism in silico

    Science.gov (United States)

    2012-01-01

    Background Genome-scale metabolic networks and flux models are an effective platform for linking an organism genotype to its phenotype. However, few modeling approaches offer predictive capabilities to evaluate potential metabolic engineering strategies in silico. Results A new method called “flux balance analysis with flux ratios (FBrAtio)” was developed in this research and applied to a new genome-scale model of Clostridium acetobutylicum ATCC 824 (iCAC490) that contains 707 metabolites and 794 reactions. FBrAtio was used to model wild-type metabolism and metabolically engineered strains of C. acetobutylicum where only flux ratio constraints and thermodynamic reversibility of reactions were required. The FBrAtio approach allowed solutions to be found through standard linear programming. Five flux ratio constraints were required to achieve a qualitative picture of wild-type metabolism for C. acetobutylicum for the production of: (i) acetate, (ii) lactate, (iii) butyrate, (iv) acetone, (v) butanol, (vi) ethanol, (vii) CO2 and (viii) H2. Results of this simulation study coincide with published experimental results and show the knockdown of the acetoacetyl-CoA transferase increases butanol to acetone selectivity, while the simultaneous over-expression of the aldehyde/alcohol dehydrogenase greatly increases ethanol production. Conclusions FBrAtio is a promising new method for constraining genome-scale models using internal flux ratios. The method was effective for modeling wild-type and engineered strains of C. acetobutylicum. PMID:22583864
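The abstract notes that FBrAtio solutions are found by standard linear programming, which works because a ratio between fluxes can be multiplied out into a linear equality. A minimal Python sketch of that rearrangement, assuming a ratio of the form v_a / (v_a + v_b) = R at a branch point; the reaction indices and ratio value are hypothetical, and the published method may define its ratios differently.

```python
from fractions import Fraction


def flux_ratio_row(n_reactions, a, b, ratio):
    """Coefficient row encoding v_a / (v_a + v_b) = ratio as a linear equality.

    Multiplying out gives (1 - ratio) * v_a - ratio * v_b = 0, which contains no
    division by a flux and can be appended to the equality constraints of a
    standard linear program alongside the mass balance S v = 0.
    """
    if a == b:
        raise ValueError("a flux ratio needs two distinct reactions")
    ratio = Fraction(ratio)
    row = [Fraction(0)] * n_reactions
    row[a] = 1 - ratio
    row[b] = -ratio
    return row


def satisfies(row, fluxes):
    """True if the flux vector meets the ratio constraint exactly."""
    return sum(c * Fraction(v) for c, v in zip(row, fluxes)) == 0


# Hypothetical branch point: reaction 0 carries 25% of the flux split between
# reactions 0 and 1; reaction 2 is not involved in the constraint.
row = flux_ratio_row(3, a=0, b=1, ratio=Fraction(1, 4))
print(row)                        # [Fraction(3, 4), Fraction(-1, 4), Fraction(0, 1)]
print(satisfies(row, [1, 3, 5]))  # True
print(satisfies(row, [2, 2, 0]))  # False
```

Exact `Fraction` arithmetic is used only to keep the check free of rounding; a solver would take the same coefficients as floats.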

  2. Large-scale robot-assisted genome shuffling yields industrial Saccharomyces cerevisiae yeasts with increased ethanol tolerance.

    Science.gov (United States)

    Snoek, Tim; Picca Nicolino, Martina; Van den Bremt, Stefanie; Mertens, Stijn; Saels, Veerle; Verplaetse, Alex; Steensels, Jan; Verstrepen, Kevin J

    2015-01-01

    During the final phases of bioethanol fermentation, yeast cells face high ethanol concentrations. This stress results in slower or arrested fermentations and limits ethanol production. Novel Saccharomyces cerevisiae strains with superior ethanol tolerance may therefore allow increased yield and efficiency. Genome shuffling has emerged as a powerful approach to rapidly enhance complex traits including ethanol tolerance, yet previous efforts have mostly relied on a mutagenized pool of a single strain, which may limit its effectiveness. Here, we explore novel robot-assisted strategies that allow the genomes of multiple parental yeasts to be shuffled on an unprecedented scale. Screening of 318 different yeasts for ethanol accumulation, sporulation efficiency, and genetic relatedness yielded eight heterothallic strains that served as parents for genome shuffling. In a first approach, the parental strains were subjected to multiple consecutive rounds of random genome shuffling with different selection methods, yielding several hybrids that showed increased ethanol tolerance. Interestingly, on average, hybrids from the first generation (F1) showed higher ethanol production than hybrids from the third generation (F3). In a second approach, we applied several successive rounds of robot-assisted targeted genome shuffling, yielding more than 3,000 targeted crosses. Hybrids selected for ethanol tolerance showed increased ethanol tolerance and production as compared to unselected hybrids, and F1 hybrids were on average superior to F3 hybrids. In total, 135 individual F1 and F3 hybrids were tested in small-scale very high gravity fermentations. Eight hybrids demonstrated superior fermentation performance over the commercial biofuel strain Ethanol Red, showing a 2 to 7% increase in maximal ethanol accumulation. 
In an 8-l pilot-scale test, the best-performing hybrid fermented medium containing 32% (w/v) glucose to dryness, yielding 18.7% (v/v) ethanol with a productivity

  3. Genome-scale comparison and constraint-based metabolic reconstruction of the facultative anaerobic Fe(III)-reducer Rhodoferax ferrireducens

    Directory of Open Access Journals (Sweden)

    Daugherty Sean

    2009-09-01

    Full Text Available Abstract Background Rhodoferax ferrireducens is a metabolically versatile, Fe(III)-reducing, subsurface microorganism that is likely to play an important role in the carbon and metal cycles in the subsurface. It also has the unique ability to convert sugars to electricity, oxidizing the sugars to carbon dioxide with quantitative electron transfer to graphite electrodes in microbial fuel cells. In order to expand our limited knowledge about R. ferrireducens, the complete genome sequence of this organism was further annotated and then the physiology of R. ferrireducens was investigated with a constraint-based, genome-scale in silico metabolic model and laboratory studies. Results The iterative modeling and experimental approach unveiled exciting, previously unknown physiological features, including an expanded range of substrates that support growth, such as cellobiose and citrate, and provided additional insights into important features such as the stoichiometry of the electron transport chain and the ability to grow via fumarate dismutation. Further analysis explained why R. ferrireducens is unable to grow via photosynthesis or fermentation of sugars like other members of this genus and uncovered novel genes for benzoate metabolism. The genome also revealed that R. ferrireducens is well-adapted for growth in the subsurface because it appears to be capable of dealing with a number of environmental insults, including heavy metals, aromatic compounds, nutrient limitation and oxidative stress. Conclusion This study demonstrates that combining genome-scale modeling with the annotation of a new genome sequence can guide experimental studies and accelerate the understanding of the physiology of under-studied yet environmentally relevant microorganisms.

  4. A Chromosome-Scale Assembly of the Bactrocera cucurbitae Genome Provides Insight to the Genetic Basis of white pupae

    Directory of Open Access Journals (Sweden)

    Sheina B. Sim

    2017-06-01

    Full Text Available Genetic sexing strains (GSS) used in sterile insect technique (SIT) programs are textbook examples of how classical Mendelian genetics can be directly implemented in the management of agricultural insect pests. Although traditionally developed GSS are founded on single-locus, autosomal recessive traits, their genetic basis is largely unknown. With the advent of modern genomic techniques, the genetic basis of sexing traits in GSS can now be further investigated. This study is the first of its kind to integrate traditional genetic techniques with emerging genomics to characterize a GSS using the tephritid fruit fly pest Bactrocera cucurbitae as a model. These techniques include whole-genome sequencing, the development of a mapping population and linkage map, and quantitative trait analysis. The experiment designed to map the genetic sexing trait in B. cucurbitae, white pupae (wp), also enabled the generation of a chromosome-scale genome assembly by integrating the linkage map with the assembly. Quantitative trait loci analysis revealed SNP loci near position 42 Mb on chromosome 3 to be tightly linked to wp. Gene annotation and synteny analysis show a near perfect relationship between chromosomes in B. cucurbitae and Muller elements A–E in Drosophila melanogaster. This chromosome-scale genome assembly is complete, has high contiguity, was generated using minimal input DNA, and will be used to further characterize the genetic mechanisms underlying wp. Knowledge of the genetic basis of genetic sexing traits can be used to improve SIT in this species and expand it to other economically important Diptera.

  5. Folding Free Energies of 5'-UTRs Impact Post-Transcriptional Regulation on a Genomic Scale in Yeast.

    Directory of Open Access Journals (Sweden)

    2005-12-01

    Full Text Available Using high-throughput technologies, abundances and other features of genes and proteins have been measured on a genome-wide scale in Saccharomyces cerevisiae. In contrast, secondary structure in 5'-untranslated regions (UTRs) of mRNA has only been investigated for a limited number of genes. Here, the aim is to study genome-wide regulatory effects of mRNA 5'-UTR folding free energies. We performed computations of secondary structures in 5'-UTRs and their folding free energies for all verified genes in S. cerevisiae. We found significant correlations between folding free energies of 5'-UTRs and various transcript features measured in genome-wide studies of yeast. In particular, mRNAs with weakly folded 5'-UTRs have higher translation rates, higher abundances of the corresponding proteins, longer half-lives, and higher numbers of transcripts, and are upregulated after heat shock. Furthermore, 5'-UTRs have significantly higher folding free energies than other genomic regions and randomized sequences. We also found a positive correlation between transcript half-life and ribosome occupancy that is more pronounced for short-lived transcripts, which supports a picture of competition between translation and degradation. Among the genes with strongly folded 5'-UTRs, there is a huge overrepresentation of uncharacterized open reading frames. Based on our analysis, we conclude that (i) there is a widespread bias for 5'-UTRs to be weakly folded, (ii) folding free energies of 5'-UTRs are correlated with mRNA translation and turnover on a genomic scale, and (iii) transcripts with strongly folded 5'-UTRs are often rare and hard to find experimentally.

  6. Dissecting the energy metabolism in Mycoplasma pneumoniae through genome-scale metabolic modeling

    NARCIS (Netherlands)

    Wodke, J.A.; Puchalka, J.; Lluch-Senar, M.; Marcos, J.; Yus, E.; Godinho, M.; Gutierrez-Gallego, R.; Martins Dos Santos, V.A.P.; Serrano, L.; Klipp, E.; Maier, T.

    2013-01-01

    Mycoplasma pneumoniae, a threatening pathogen with a minimal genome, is a model organism for bacterial systems biology for which substantial experimental information is available. With the goal of understanding the complex interactions underlying its metabolism, we analyzed and characterized the

  7. Likelihood-based gene annotations for gap filling and quality assessment in genome-scale metabolic models.

    Directory of Open Access Journals (Sweden)

    Matthew N Benedict

    2014-10-01

    Full Text Available Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information
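The contrast drawn above between parsimony-based and likelihood-based gap filling can be illustrated on a toy choice between candidate reaction sets. A minimal Python sketch, assuming each candidate fix is a hypothetical list of reaction IDs and each reaction has a likelihood derived from gene annotations; the floor value for reactions without genomic evidence is an assumption, and this enumeration does not reproduce the published optimization.

```python
import math


def pathway_score(reactions, likelihood, floor=1e-6):
    """Sum of log-likelihoods; reactions lacking evidence get an assumed floor."""
    return sum(math.log(max(likelihood.get(r, 0.0), floor)) for r in reactions)


def parsimony_choice(candidates):
    """Parsimony: pick the candidate that adds the fewest reactions."""
    return min(candidates, key=lambda name: len(candidates[name]))


def likelihood_choice(candidates, likelihood):
    """Likelihood-based: pick the candidate best supported by genomic evidence."""
    return max(candidates, key=lambda name: pathway_score(candidates[name], likelihood))


# Hypothetical gap with two ways to fill it.
candidates = {"short": ["R1"], "long": ["R2", "R3"]}
likelihood = {"R1": 0.01, "R2": 0.9, "R3": 0.8}
print(parsimony_choice(candidates))               # short
print(likelihood_choice(candidates, likelihood))  # long
```

Summing logs rather than multiplying probabilities keeps scores numerically stable as candidate sets grow.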

  8. Outlier-based identification of copy number variations using targeted resequencing in a small cohort of patients with Tetralogy of Fallot.

    Directory of Open Access Journals (Sweden)

    Vikas Bansal

    Full Text Available Copy number variations (CNVs) are one of the main sources of variability in the human genome. Many CNVs are associated with various diseases including cardiovascular disease. In addition to hybridization-based methods, next-generation sequencing (NGS) technologies are increasingly used for CNV discovery. However, computational methods applicable to NGS data are still limited. We developed a novel CNV calling method based on outlier detection applicable to small cohorts, which is of particular interest for the discovery of individual CNVs within families, de novo CNVs in trios and/or small cohorts of specific phenotypes like rare diseases. Approximately 7,000 rare diseases are currently known, which collectively affect ∼6% of the population. For our method, we applied Dixon's Q test to detect outliers and used a Hidden Markov Model for their assessment. The method can be used for data obtained by exome and targeted resequencing. We evaluated our outlier-based method in comparison to the CNV calling tool CoNIFER using eight HapMap exome samples and subsequently applied both methods to targeted resequencing data of patients with Tetralogy of Fallot (TOF), the most common cyanotic congenital heart disease. In both the HapMap samples and the TOF cases, our method is superior to CoNIFER in that it identifies more true positive CNVs. Called CNVs in TOF cases were validated by qPCR and HapMap CNVs were confirmed with available array-CGH data. In the TOF patients, we found four copy number gains affecting three genes, of which two are important regulators of heart development (NOTCH1, ISL1) and one is located in a region associated with cardiac malformations (PRODH at 22q11). In summary, we present a novel CNV calling method based on outlier detection, which will be of particular interest for the analysis of de novo or individual CNVs in trios or cohorts of up to 30 individuals, respectively.
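Dixon's Q test, named above as the outlier detector, compares the gap between a suspect value and its nearest neighbour with the range of all values. A minimal Python sketch applying it to one target region across samples, assuming hypothetical normalized read depths and commonly tabulated 95% critical values for n = 3 to 10 only (larger cohorts need a supplied `q_crit`); the authors' normalization and Hidden Markov Model assessment are not shown.

```python
# Commonly tabulated critical values of Dixon's Q at 95% confidence (n = 3..10).
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_outliers(values, q_crit=None):
    """Return indices of samples flagged by Dixon's Q test at either extreme.

    Q = gap / range, where gap is the distance from the smallest (or largest)
    value to its nearest neighbour and range is max - min.
    """
    n = len(values)
    if n < 3:
        raise ValueError("Dixon's Q test needs at least 3 values")
    if q_crit is None:
        if n not in Q_CRIT_95:
            raise ValueError("no tabulated critical value for n=%d; pass q_crit" % n)
        q_crit = Q_CRIT_95[n]
    order = sorted(range(n), key=lambda i: values[i])  # sample indices by value
    lo, hi = values[order[0]], values[order[-1]]
    spread = hi - lo
    if spread == 0:
        return []  # all samples identical: nothing to flag
    flagged = []
    if (values[order[1]] - lo) / spread > q_crit:
        flagged.append(order[0])  # unusually low depth: possible loss
    if (hi - values[order[-2]]) / spread > q_crit:
        flagged.append(order[-1])  # unusually high depth: possible gain
    return flagged


# Hypothetical normalized depths for one region in six samples.
depths = [1.00, 1.02, 0.98, 1.01, 0.99, 1.60]
print(dixon_outliers(depths))  # [5]
```

Working per region on a handful of samples suits the small-cohort setting the abstract targets, since the test needs no large reference panel.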

  9. Re-sequencing transgenic plants revealed rearrangements at T-DNA inserts, and integration of a short T-DNA fragment, but no increase of small mutations elsewhere.

    Science.gov (United States)

    Schouten, Henk J; Vande Geest, Henri; Papadimitriou, Sofia; Bemer, Marian; Schaart, Jan G; Smulders, Marinus J M; Perez, Gabino Sanchez; Schijlen, Elio

    2017-03-01

    Transformation resulted in deletions and translocations at T-DNA inserts, but not in genome-wide small mutations. A tiny T-DNA splinter was detected that probably would remain undetected by conventional techniques. We investigated to what extent Agrobacterium tumefaciens-mediated transformation is mutagenic, on top of inserting T-DNA. To prevent mutations due to in vitro propagation, we applied floral dip transformation of Arabidopsis thaliana. We re-sequenced the genomes of five primary transformants, and compared these to genomic sequences derived from a pool of four wild-type plants. By genome-wide comparisons, we identified ten small mutations in the genomes of the five transgenic plants, not correlated to the positions or number of T-DNA inserts. This mutation frequency is within the range of spontaneous mutations occurring during seed propagation in A. thaliana, as determined earlier. In addition, we detected small as well as large deletions specifically at the T-DNA insert sites. Furthermore, we detected partial T-DNA inserts, one of these a tiny 50-bp fragment originating from a central part of the T-DNA construct used, inserted into the plant genome without any other T-DNA flanking it. Because of its small size, we named this fragment a T-DNA splinter. As far as we know, this is the first report of such a small T-DNA fragment insert in the absence of any T-DNA border sequence. Finally, we found evidence for translocations from other chromosomes, flanking T-DNA inserts. In this study, we showed that next-generation sequencing (NGS) is a highly sensitive approach to detect T-DNA inserts in transgenic plants.

  10. A new genome-scale metabolic model of Corynebacterium glutamicum and its application.

    Science.gov (United States)

    Zhang, Yu; Cai, Jingyi; Shang, Xiuling; Wang, Bo; Liu, Shuwen; Chai, Xin; Tan, Tianwei; Zhang, Yun; Wen, Tingyi

    2017-01-01

    Corynebacterium glutamicum is an important platform organism for industrial biotechnology to produce amino acids, organic acids, bioplastic monomers, and biofuels. The metabolic flexibility, broad substrate spectrum, and fermentative robustness of C. glutamicum make this organism an ideal cell factory to manufacture desired products. With increases in gene function, transport system, and metabolic profile information under certain conditions, developing a comprehensive genome-scale metabolic model (GEM) of C. glutamicum ATCC13032 is desired to improve prediction accuracy, elucidate cellular metabolism, and guide metabolic engineering. Here, we constructed a new GEM for ATCC13032, iCW773, consisting of 773 genes, 950 metabolites, and 1207 reactions. Compared to the previous model, iCW773 supplemented 496 gene-protein-reaction associations, refined five lumped reactions, balanced the mass and charge, and constrained the directionality of reactions. The simulated growth rates of C. glutamicum cultivated on seven different carbon sources using iCW773 were consistent with experimental values. Pearson's correlation coefficient between the iCW773-simulated and experimental fluxes was 0.99, suggesting that iCW773 provided an accurate intracellular flux distribution of the wild-type strain growing on glucose. Furthermore, genetic interventions for overproducing l-lysine, 1,2-propanediol and isobutanol simulated using OptForceMUST were in accordance with reported experimental results, indicating the practicability of iCW773 for the design of metabolic networks to overproduce desired products. In vivo genetic modifications of iCW773-predicted targets resulted in the de novo generation of an l-proline-overproducing strain. In fed-batch culture, the engineered C. glutamicum strain produced 66.43 g/L l-proline in 60 h with a yield of 0.26 g/g (l-proline/glucose) and a productivity of 1.11 g/L/h. To our knowledge, this is the highest titer and productivity reported for l
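
    The productivity figure follows from the titer and culture time reported above. A minimal check, assuming volumetric productivity is simply the final titer divided by the total culture time (volume changes during feeding are ignored):

```python
def volumetric_productivity(titer_g_per_l: float, hours: float) -> float:
    """Volumetric productivity in g/L/h, taken here as final titer / total culture time."""
    if hours <= 0:
        raise ValueError("culture time must be positive")
    return titer_g_per_l / hours


# Values quoted in the abstract: 66.43 g/L l-proline after 60 h.
print(f"{volumetric_productivity(66.43, 60):.2f} g/L/h")  # 1.11 g/L/h
```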

  11. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

    DEFF Research Database (Denmark)

    Liu, Siyang; Huang, Shujia; Rao, Junhua

    2015-01-01

    present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome...

  12. Multiplexed chromosome conformation capture sequencing for rapid genome-scale high-resolution detection of long-range chromatin interactions.

    Science.gov (United States)

    Stadhouders, Ralph; Kolovos, Petros; Brouwer, Rutger; Zuin, Jessica; van den Heuvel, Anita; Kockx, Christel; Palstra, Robert-Jan; Wendt, Kerstin S; Grosveld, Frank; van Ijcken, Wilfred; Soler, Eric

    2013-03-01

    Chromosome conformation capture (3C) technology is a powerful and increasingly popular tool for analyzing the spatial organization of genomes. Several 3C variants have been developed (e.g., 4C, 5C, ChIA-PET, Hi-C), allowing large-scale mapping of long-range genomic interactions. Here we describe multiplexed 3C sequencing (3C-seq), a 4C variant coupled to next-generation sequencing, allowing genome-scale detection of long-range interactions with candidate regions. Compared with several other available techniques, 3C-seq offers a superior resolution (typically single restriction fragment resolution; approximately 1-8 kb on average) and can be applied in a semi-high-throughput fashion. It allows the assessment of long-range interactions of up to 192 genes or regions of interest in parallel by multiplexing library sequencing. This renders multiplexed 3C-seq an inexpensive, quick (total hands-on time of 2 weeks) and efficient method that is ideal for the in-depth analysis of complex genetic loci. The preparation of multiplexed 3C-seq libraries can be performed by any investigator with basic skills in molecular biology techniques. Data analysis requires basic expertise in bioinformatics and in Linux and Python environments. The protocol describes all materials, critical steps and bioinformatics tools required for successful application of 3C-seq technology.

  13. Genome-scale reconstruction of the Streptococcus pyogenes M49 metabolic network reveals growth requirements and indicates potential drug targets.

    Science.gov (United States)

    Levering, Jennifer; Fiedler, Tomas; Sieg, Antje; van Grinsven, Koen W A; Hering, Silvio; Veith, Nadine; Olivier, Brett G; Klett, Lara; Hugenholtz, Jeroen; Teusink, Bas; Kreikemeyer, Bernd; Kummer, Ursula

    2016-08-20

    Genome-scale metabolic models comprise stoichiometric relations between metabolites, as well as associations between genes and metabolic reactions and facilitate the analysis of metabolism. We computationally reconstructed the metabolic network of the lactic acid bacterium Streptococcus pyogenes M49. Initially, we based the reconstruction on genome annotations and already existing and curated metabolic networks of Bacillus subtilis, Escherichia coli, Lactobacillus plantarum and Lactococcus lactis. This initial draft was manually curated with the final reconstruction accounting for 480 genes associated with 576 reactions and 558 metabolites. In order to constrain the model further, we performed growth experiments of wild type and arcA deletion strains of S. pyogenes M49 in a chemically defined medium and calculated nutrient uptake and production fluxes. We additionally performed amino acid auxotrophy experiments to test the consistency of the model. The established genome-scale model can be used to understand the growth requirements of the human pathogen S. pyogenes and define optimal and suboptimal conditions, but also to describe differences and similarities between S. pyogenes and related lactic acid bacteria such as L. lactis in order to find strategies to reduce the growth of the pathogen and propose drug targets. Copyright © 2016 Elsevier B.V. All rights reserved.
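
    Stoichiometric relations in such models are commonly written as a matrix S with one row per metabolite and one column per reaction. Constraint-based methods such as flux balance analysis then restrict attention to flux vectors v that satisfy the steady-state condition S·v = 0. A minimal sketch with a hypothetical three-reaction network (not part of the S. pyogenes M49 reconstruction):

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mass_balance_residuals(S: Matrix, v: Sequence[float]) -> List[float]:
    """Return S·v, the net production rate of each metabolite (one per row of S)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S: Matrix, v: Sequence[float], tol: float = 1e-9) -> bool:
    """True if every metabolite is balanced, i.e. S·v = 0 within tolerance."""
    return all(abs(r) <= tol for r in mass_balance_residuals(S, v))


# Hypothetical toy network (rows: metabolites A, B; columns: reactions R1..R3):
#   R1: uptake -> A,   R2: A -> B,   R3: B -> secretion
S = [
    [1.0, -1.0, 0.0],   # A: produced by R1, consumed by R2
    [0.0, 1.0, -1.0],   # B: produced by R2, consumed by R3
]

print(is_steady_state(S, [2.0, 2.0, 2.0]))         # True
print(mass_balance_residuals(S, [2.0, 1.0, 1.0]))  # [1.0, 0.0]
```

    A full constraint-based analysis would add flux bounds and an objective function and solve a linear program; only the mass-balance constraint is shown here.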

  14. Expanding Access to Large-Scale Genomic Data While Promoting Privacy: A Game Theoretic Approach.

    Science.gov (United States)

    Wan, Zhiyu; Vorobeychik, Yevgeniy; Xia, Weiyi; Clayton, Ellen Wright; Kantarcioglu, Murat; Malin, Bradley

    2017-02-02

    Emerging scientific endeavors are creating big data repositories of data from millions of individuals. Sharing data in a privacy-respecting manner could lead to important discoveries, but high-profile demonstrations show that links between de-identified genomic data and named persons can sometimes be reestablished. Such re-identification attacks have focused on worst-case scenarios and spurred the adoption of data-sharing practices that unnecessarily impede research. To mitigate concerns, organizations have traditionally relied upon legal deterrents, like data use agreements, and are considering suppressing or adding noise to genomic variants. In this report, we use a game theoretic lens to develop more effective, quantifiable protections for genomic data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources, not adversaries with unlimited means. We demonstrate this approach via a new public resource with genomic summary data from over 8,000 individuals, the Sequence and Phenotype Integration Exchange (SPHINX), and show that risks can be balanced against utility more effectively than with traditional approaches. We further show the generalizability of this framework by applying it to other genomic data collection and sharing endeavors. Recognizing that such models are dependent on a variety of parameters, we perform extensive sensitivity analyses to show that our findings are robust to their fluctuations. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  15. Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae

    DEFF Research Database (Denmark)

    Vongsangnak, Wanwipa; Olsen, Peter; Hansen, Kim

    2008-01-01

    of hypothetical proteins accounted for more than 50% of the annotated genes. Considering the industrial importance of this fungus, it is therefore valuable to improve the annotation and further integrate genomic information with biochemical and physiological information available for this microorganism and other ... related fungi. Here we proposed the gene prediction by construction of an A. oryzae Expressed Sequence Tag (EST) library, sequencing and assembly. We enhanced the function assignment by our developed annotation strategy. The resulting better annotation was used to reconstruct the metabolic network leading ... in assignment of new putative functions to 1,469 hypothetical proteins already present in the A. oryzae genome database. Using the substantially improved annotated genome we reconstructed the metabolic network of A. oryzae. This network contains 729 enzymes, 1,314 enzyme-encoding genes, 1,073 metabolites and 1...

  16. Merlin: Computer-Aided Oligonucleotide Design for Large Scale Genome Engineering with MAGE.

    Science.gov (United States)

    Quintin, Michael; Ma, Natalie J; Ahmed, Samir; Bhatia, Swapnil; Lewis, Aaron; Isaacs, Farren J; Densmore, Douglas

    2016-06-17

    Genome engineering technologies now enable precise manipulation of organism genotype, but can be limited in scalability by their design requirements. Here we describe Merlin ( http://merlincad.org ), an open-source web-based tool to assist biologists in designing experiments using multiplex automated genome engineering (MAGE). Merlin provides methods to generate pools of single-stranded DNA oligonucleotides (oligos) for MAGE experiments by performing free energy calculation and BLAST scoring on a sliding window spanning the targeted site. These oligos are designed not only to improve recombination efficiency, but also to minimize off-target interactions. The application further assists experiment planning by reporting predicted allelic replacement rates after multiple MAGE cycles, and enables rapid result validation by generating primer sequences for multiplexed allele-specific colony PCR. Here we describe the Merlin oligo and primer design procedures and validate their functionality compared to OptMAGE by eliminating seven AvrII restriction sites from the Escherichia coli genome.
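
    The sliding-window idea can be sketched as follows: enumerate every oligo of a fixed length that spans the targeted site, score each window, and keep the best. This is a hypothetical illustration, not Merlin's code. The score is a simple GC-content stand-in, whereas Merlin uses free-energy calculation and BLAST scoring, and the function names and window length are assumptions.

```python
from typing import Iterator, Tuple

Window = Tuple[int, str]  # (0-based start, oligo sequence)


def windows_spanning(genome: str, site: int, length: int) -> Iterator[Window]:
    """Yield every window of `length` bases that contains position `site`."""
    first = max(0, site - length + 1)
    last = min(site, len(genome) - length)
    for start in range(first, last + 1):
        yield start, genome[start:start + length]


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in `seq`."""
    return sum(base in "GC" for base in seq) / len(seq)


def pick_oligo(genome: str, site: int, length: int, target_gc: float = 0.5) -> Window:
    # Placeholder score: distance from a target GC content.
    # This is NOT Merlin's free-energy / BLAST scoring; it only shows the sliding window.
    return min(windows_spanning(genome, site, length),
               key=lambda w: abs(gc_fraction(w[1]) - target_gc))


print(pick_oligo("AAAAGCGCAAAA", site=5, length=4))  # (2, 'AAGC')
```

    Replacing `gc_fraction` with a folding free-energy estimate and an off-target penalty would bring the sketch closer to the procedure described in the abstract.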

  17. Genome-scale regression analysis reveals a linear relationship for promoters and enhancers after combinatorial drug treatment

    KAUST Repository

    Rapakoulia, Trisevgeni

    2017-08-09

    Motivation: Drug combination therapy for treatment of cancers and other multifactorial diseases has the potential of increasing the therapeutic effect, while reducing the likelihood of drug resistance. In order to reduce time and cost spent in comprehensive screens, methods are needed which can model additive effects of possible drug combinations. Results: We here show that the transcriptional response to combinatorial drug treatment at promoters, as measured by single molecule CAGE technology, is accurately described by a linear combination of the responses of the individual drugs at a genome wide scale. We also find that the same linear relationship holds for transcription at enhancer elements. We conclude that the described approach is promising for eliciting the transcriptional response to multidrug treatment at promoters and enhancers in an unbiased genome wide way, which may minimize the need for exhaustive combinatorial screens.
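
    The linear relationship described above can be illustrated by fitting the combined-treatment response as a weighted sum of the two single-drug responses across promoters. The abstract does not state how the fit was performed; this sketch assumes per-promoter log fold changes and an ordinary least-squares fit without intercept, and the numbers are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_combination(
    r1: Sequence[float], r2: Sequence[float], combo: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of combo ≈ alpha * r1 + beta * r2 (no intercept)."""
    s11 = sum(a * a for a in r1)
    s22 = sum(b * b for b in r2)
    s12 = sum(a * b for a, b in zip(r1, r2))
    t1 = sum(a * y for a, y in zip(r1, combo))
    t2 = sum(b * y for b, y in zip(r2, combo))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("single-drug responses are (nearly) collinear")
    # Solve the 2x2 normal equations [[s11, s12], [s12, s22]] @ [alpha, beta] = [t1, t2].
    alpha = (t1 * s22 - s12 * t2) / det
    beta = (s11 * t2 - s12 * t1) / det
    return alpha, beta


# Hypothetical log fold changes at four promoters.
drug_a = [1.0, 0.0, 2.0, -1.0]
drug_b = [0.0, 1.0, 1.0, 2.0]
combined = [0.6, 0.4, 1.6, 0.2]  # constructed as 0.6 * drug_a + 0.4 * drug_b
print(fit_linear_combination(drug_a, drug_b, combined))
```

    Checking how well such a fit predicts held-out promoters, and whether the same weights carry over to enhancers, is one way to probe the linearity the abstract reports.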

  18. IMGMD: A platform for the integration and standardisation of In silico Microbial Genome-scale Metabolic Models.

    Science.gov (United States)

    Ye, Chao; Xu, Nan; Dong, Chuan; Ye, Yuannong; Zou, Xuan; Chen, Xiulai; Guo, Fengbiao; Liu, Liming

    2017-04-07

    Genome-scale metabolic models (GSMMs) constitute a platform that combines genome sequences and detailed biochemical information to quantify microbial physiology at the system level. To improve the unity, integrity, correctness, and format of data in published GSMMs, a consensus IMGMD database was built in the LAMP (Linux + Apache + MySQL + PHP) system by integrating and standardizing 328 GSMMs constructed for 139 microorganisms. The IMGMD database can help microbial researchers download manually curated GSMMs, rapidly reconstruct standard GSMMs, design pathways, and identify metabolic targets for strategies on strain improvement. Moreover, the IMGMD database facilitates the integration of wet-lab and in silico data to gain an additional insight into microbial physiology. The IMGMD database is freely available, without any registration requirements, at http://imgmd.jiangnan.edu.cn/database.

  19. Metabolic impact assessment for heterologous protein production in Streptomyces lividans based on genome-scale metabolic network modeling.

    Science.gov (United States)

    Lule, Ivan; D'Huys, Pieter-Jan; Van Mellaert, Lieve; Anné, Jozef; Bernaerts, Kristel; Van Impe, Jan

    2013-11-01

    The metabolic impact exerted on a microorganism due to heterologous protein production is still poorly understood in Streptomyces lividans. In this paper, a proposed genome-scale metabolic network model is used, together with exometabolomic data, to assess this metabolic impact in S. lividans. Constraint-based modeling results obtained in this work revealed that the metabolic impact due to heterologous protein production is widely distributed in the genome of S. lividans, causing both slow substrate assimilation and a shift in active pathways. Exchange fluxes that are critical for model performance have been identified for metabolites of mouse tumor necrosis factor, histidine, valine and lysine, as well as biomass. Our results unravel the interaction of heterologous protein production with intracellular metabolism of S. lividans, thus providing a possible basis for further studies in relieving the metabolic burden via metabolic or bioprocess engineering. Copyright © 2013. Published by Elsevier Inc.

  20. Large Scale Sequencing of Dothideomycetes Provides Insights into Genome Evolution and Adaptation

    Energy Technology Data Exchange (ETDEWEB)

    Haridas, Sajeet; Crous, Pedro; Binder, Manfred; Spatafora, Joseph; Grigoriev, Igor

    2015-03-16

    Dothideomycetes is the largest and most diverse class of ascomycete fungi, with 23 orders, 110 families, 1300 genera and over 19,000 known species. We present a comparative analysis of 70 Dothideomycete genomes, including over 50 that we sequenced and are as yet unpublished. This extensive sampling almost quadruples that of the previous study of 18 species and uncovered a 10-fold range of genome sizes. We were able to clarify the phylogenetic positions of several species whose origins were unclear in previous morphological and sequence comparison studies. We analyzed selected gene families, including proteases, transporters and small secreted proteins, and show that major differences in gene content are influenced by speciation.

  1. Resuscitation Resequenced: A Rational Approach to Patients with Trauma in Shock.

    Science.gov (United States)

    Petrosoniak, Andrew; Hicks, Christopher

    2018-02-01

    Trauma resuscitation is a complex and dynamic process that requires a high-performing team to optimize patient outcomes. More than 30 years ago, Advanced Trauma Life Support was developed to formalize and standardize trauma care; however, the sequential nature of the algorithm that is used can lead to ineffective prioritization. An improved understanding of shock mandates an updated approach to trauma resuscitation. This article proposes a resequenced approach that (1) addresses immediate threats to life and (2) targets strategies for the diagnosis and management of shock causes. This updated approach emphasizes evidence-based resuscitation principles that align with physiologic priorities. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics.

    Science.gov (United States)

    Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter

    2015-01-20

    While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
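
    The core of a regional parallelization strategy is cutting the genome into subregions of similar size so that parallel workers finish at similar times. A simplified, hypothetical sketch (not Churchill's implementation): regions never span chromosome boundaries, so the count can slightly exceed the requested number, and reads overlapping region boundaries are not handled.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, 0-based start, exclusive end)


def balanced_regions(chrom_lengths: Dict[str, int], n_regions: int) -> List[Region]:
    """Cut each chromosome into contiguous pieces of at most ceil(total / n_regions) bp."""
    if n_regions < 1:
        raise ValueError("n_regions must be >= 1")
    total = sum(chrom_lengths.values())
    target = -(-total // n_regions)  # ceiling division
    regions: List[Region] = []
    for chrom, length in chrom_lengths.items():
        start = 0
        while start < length:
            end = min(start + target, length)
            regions.append((chrom, start, end))
            start = end
    return regions


# Toy chromosome lengths (hypothetical).
print(balanced_regions({"chr1": 100, "chr2": 50}, 3))
```

    Each region could then be processed independently, which is what makes the per-region work easy to distribute across cores or cloud nodes.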

  3. Toward genome-scale models of the Chinese hamster ovary cells: incentives, status and perspectives

    DEFF Research Database (Denmark)

    Kaas, Christian Schrøder; Fan, Yuzhou; Weilguny, Dietmar

    2014-01-01

    Bioprocessing of the important Chinese hamster ovary (CHO) cell lines used for the production of biopharmaceuticals stands at the brink of several redefining events. In 2011, the field entered the genomics era, which has accelerated omics-based phenotyping of the cell lines. In this review we...

  4. Genome-scale transcriptional analyses of first-generation interspecific sunflower hybrids reveals broad regulatory compatibility.

    Science.gov (United States)

    Rowe, Heather C; Rieseberg, Loren H

    2013-05-23

    Interspecific hybridization creates individuals harboring diverged genomes. The interaction of these genomes can generate successful evolutionary novelty or disadvantageous genomic conflict. Annual sunflowers Helianthus annuus and H. petiolaris have a rich history of hybridization in natural populations. Although first-generation hybrids generally have low fertility, hybrid swarms that include later generation and fully fertile backcross plants have been identified, as well as at least three independently originated stable hybrid taxa. We examine patterns of transcript accumulation in the earliest stages of hybridization of these species via analyses of transcriptome sequences from laboratory-derived F1 offspring of an inbred H. annuus cultivar and a wild H. petiolaris accession. While nearly 14% of the reference transcriptome showed significant accumulation differences between parental accessions, total F1 transcript levels showed little evidence of dominance, as midparent transcript levels were highly predictive of transcript accumulation in F1 plants. Allelic bias in F1 transcript accumulation was detected in 20% of transcripts containing sufficient polymorphism to distinguish parental alleles; however, the magnitude of these biases was generally smaller than the differences among parental accessions. While analyses of allelic bias suggest that cis-regulatory differences between H. annuus and H. petiolaris are common, their effect on transcript levels may be more subtle than that of trans-acting regulatory differences. Overall, these analyses found little evidence of regulatory incompatibility or dominance interactions between parental genomes within F1 hybrid individuals, although it is unclear whether this is a legacy or an enabler of introgression between species.
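
    The two comparisons in this abstract can be expressed as simple per-transcript statistics. A minimal sketch, assuming expression levels are already normalized: the midparent value and the standard d/a dominance ratio for total F1 levels, and the parent-1 allele fraction for allele-specific reads. The function names are hypothetical and the study's statistical tests are not reproduced.

```python
def midparent(p1: float, p2: float) -> float:
    """Mean of the two parental expression levels."""
    return (p1 + p2) / 2


def dominance_ratio(f1: float, p1: float, p2: float) -> float:
    """d/a: 0 if the F1 sits at the midparent, +1 or -1 if it matches parent 1 or parent 2."""
    a = (p1 - p2) / 2
    if a == 0:
        raise ValueError("parents do not differ; d/a is undefined")
    return (f1 - midparent(p1, p2)) / a


def allelic_fraction(reads_p1_allele: int, reads_p2_allele: int) -> float:
    """Share of allele-informative F1 reads carrying the parent-1 allele (0.5 = no bias)."""
    total = reads_p1_allele + reads_p2_allele
    if total == 0:
        raise ValueError("no allele-informative reads")
    return reads_p1_allele / total


# Hypothetical transcript: parents at 40 and 20 units, F1 at 30 (purely additive).
print(dominance_ratio(30, 40, 20))  # 0.0
print(allelic_fraction(60, 40))     # 0.6
```

    A d/a near 0 for most transcripts corresponds to the additive, midparent-like pattern the abstract reports, while an allele fraction away from 0.5 is the signature of cis-regulatory divergence.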

  5. Reference genome assessment from a population scale perspective: an accurate profile of variability and noise.

    Science.gov (United States)

    Carbonell-Caballero, José; Amadoz, Alicia; Alonso, Roberto; Hidalgo, Marta R; Çubuk, Cankut; Conesa, David; López-Quílez, Antonio; Dopazo, Joaquín

    2017-11-15

    Current plant and animal genomic studies are often based on newly assembled genomes that have not been properly consolidated. In this scenario, misassembled regions can easily lead to false-positive findings. Although quality control scores are included in genotyping protocols, they are usually employed to evaluate individual sample quality rather than reference sequence reliability. We propose a statistical model that combines quality control scores across samples in order to detect incongruent patterns at every genomic region. Our model is inherently robust, since common artifact signals are expected to be shared between independent samples over misassembled regions of the genome. The reliability of our protocol has been extensively tested through different experiments and organisms with accurate results, improving on state-of-the-art methods. Our analysis demonstrates synergistic relations between quality control scores and allelic variability estimators, which improve the detection of misassembled regions, and is able to find strong artifact signals even within the human reference assembly. Furthermore, we demonstrated how our model can be trained to properly rank the confidence of a set of candidate variants obtained from new independent samples. This tool is freely available at http://gitlab.com/carbonell/ces. jcarbonell.cipf@gmail.com or joaquin.dopazo@juntadeandalucia.es. Supplementary data are available at Bioinformatics online.

  6. Public participation in genomics research in the Netherlands: Validating a measurement scale

    NARCIS (Netherlands)

    Dijkstra, Anne M.; Gutteling, Jan M.; Swart, Jac A.A.; Wieringa, Nicolien F.; Van der Windt, Henny J.; Seydel, E.R.

    2012-01-01

    Nowadays, new technologies, like genomics, cannot be developed without the support of the public. However, although interested, the public does not always actively participate in science issues when offered the opportunity via public participation activities. In a study aimed at validating a

  7. Review: High-performance computing to detect epistasis in genome scale data sets.

    Science.gov (United States)

    Upton, Alex; Trelles, Oswaldo; Cornejo-García, José Antonio; Perkins, James Richard

    2016-05-01

    It is becoming clear that most human diseases have a complex etiology that cannot be explained by single nucleotide polymorphisms (SNPs) or simple additive combinations; the general consensus is that they are caused by combinations of multiple genetic variations. The limited success of some genome-wide association studies is partly a result of this focus on single genetic markers. A more promising approach is to take into account epistasis, by considering the association of multiple SNP interactions with disease. However, as genomic data continues to grow in resolution, and genome and exome sequencing become more established, the number of combinations of variants to consider increases rapidly. Two potential solutions should be considered: the use of high-performance computing, which allows us to consider a larger number of variables, and heuristics to make the solution more tractable, essential in the case of genome sequencing. In this review, we look at different computational methods to analyse epistatic interactions within disease-related genetic data sets created by microarray technology. We also review efforts to use epistatic analysis results to produce biomarkers for diagnostic tests and give our views on future directions in this field in light of advances in sequencing technology and variants in non-coding regions. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
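
    The combinatorial growth mentioned above is easy to quantify: exhaustively testing all k-way interactions among n SNPs means C(n, k) tests. The SNP counts below are illustrative, not taken from the review.

```python
import math


def n_interactions(n_snps: int, order: int = 2) -> int:
    """Number of distinct `order`-way SNP combinations to test: C(n_snps, order)."""
    return math.comb(n_snps, order)


for n in (1_000, 500_000):
    print(f"{n:>7} SNPs: {n_interactions(n):,} pairs, {n_interactions(n, 3):,} triples")
```

    Going from a thousand markers to half a million raises the pairwise count from 499,500 to about 1.25 × 10^11, which is why the review weighs high-performance computing against heuristics that avoid exhaustive search.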

  8. The wolf reference genome sequence (Canis lupus lupus) and its implications for Canis spp. population genomics.

    Science.gov (United States)

    Gopalakrishnan, Shyam; Samaniego Castruita, Jose A; Sinding, Mikkel-Holger S; Kuderna, Lukas F K; Räikkönen, Jannikke; Petersen, Bent; Sicheritz-Ponten, Thomas; Larson, Greger; Orlando, Ludovic; Marques-Bonet, Tomas; Hansen, Anders J; Dalén, Love; Gilbert, M Thomas P

    2017-06-29

    An increasing number of studies are addressing the evolutionary genomics of dog domestication, principally through resequencing dog, wolf and related canid genomes. There is, however, only one de novo assembled canid genome currently available against which to map such data - that of a boxer dog (Canis lupus familiaris). We generated the first de novo wolf genome (Canis lupus lupus) as an additional choice of reference, and explored what implications may arise when previously published dog and wolf resequencing data are remapped to this reference. Reassuringly, we find that regardless of the reference genome choice, most evolutionary genomic analyses yield qualitatively similar results, including those exploring the structure between the wolves and dogs using admixture and principal component analysis. However, we do observe differences in the genomic coverage of re-mapped samples, the number of variants discovered, and heterozygosity estimates of the samples. In conclusion, the choice of reference is dictated by the aims of the study being undertaken; if the study focuses on the differences between the different dog breeds or the fine structure among dogs, then using the boxer reference genome is appropriate, but if the aim of the study is to look at the variation within wolves and their relationships to dogs, then there are clear benefits to using the de novo assembled wolf reference genome.

  9. Genome-scale metabolic model for Lactococcus lactis MG1363 and its application to the analysis of flavor formation.

    Science.gov (United States)

    Flahaut, Nicolas A L; Wiersma, Anne; van de Bunt, Bert; Martens, Dirk E; Schaap, Peter J; Sijtsma, Lolke; Dos Santos, Vitor A Martins; de Vos, Willem M

    2013-10-01

    Lactococcus lactis subsp. cremoris MG1363 is a paradigm strain for lactococci used in industrial dairy fermentations. However, despite its importance for process development, no genome-scale metabolic model has been reported thus far. Moreover, current models for other lactococci only focus on growth and sugar degradation. A metabolic model that includes nitrogen metabolism and flavor-forming pathways is instrumental for understanding and designing new industrial applications of these lactic acid bacteria. A genome-scale, constraint-based model of the metabolism and transport in L. lactis MG1363, accounting for 518 genes, 754 reactions, and 650 metabolites, was developed and experimentally validated. Fifty-nine reactions are directly or indirectly involved in flavor formation. Flux Balance Analysis and Flux Variability Analysis were used to investigate flux distributions within the whole metabolic network. Anaerobic carbon-limited continuous cultures were used for estimating the energetic parameters. A thorough model-driven analysis showed that a highly flexible nitrogen metabolism, e.g., branched-chain amino acid catabolism coupled with the redox balance, is pivotal for the prediction of the formation of different flavor compounds. Furthermore, the model predicted the formation of volatile sulfur compounds as a result of the fermentation. These products were subsequently identified in the experimental fermentations carried out. Thus, the genome-scale metabolic model couples the carbon and nitrogen metabolism in L. lactis MG1363 with complete known catabolic pathways leading to flavor formation. The model provided valuable insights into the metabolic networks underlying flavor formation and has the potential to contribute to new developments in dairy industries and cheese-flavor research.

  10. Comparison of histological grading and large-scale genomic status (DNA ploidy) as prognostic tools in oral dysplasia.

    Science.gov (United States)

    Sudbø, J; Bryne, M; Johannessen, A C; Kildal, W; Danielsen, H E; Reith, A

    2001-07-01

    Approximately one in ten oral white patches (leukoplakia) are histologically classified as dysplasia, with a well-documented potential for developing into oral squamous cell carcinoma (OSCC). Histological grading in oral dysplasia has limited prognostic value, whereas large-scale genomic status (DNA ploidy, nuclear DNA content) is an early marker of malignant transformation in several tissues. Biopsies from 196 patients with oral leukoplakia histologically typed as dysplasia were investigated. Inter-observer agreement among four experienced pathologists performing a simplified grading was assessed by Cohen's kappa values. For 150 of the 196 cases, it was also possible to assess large-scale genomic status and compare its prognostic impact with that of histological grading. Disease-free survival was estimated by life-table methods, with a mean follow-up time of 103 months (range 4-165 months). The primary end-point considered was the subsequent occurrence of OSCC. For grading of the total of 196 cases, kappa values ranged from 0.17 to 0.33 when three grading groups (mild, moderate, and severe dysplasia) were considered, and from 0.21 to 0.32 when two groups (low grade and high grade) were considered (p = 0.41). For the 150 cases in which large-scale genomic status was also assessed, kappa values for the histological grading ranged from 0.21 to 0.33 for three grading groups and from 0.27 to 0.34 for two grading groups (p = 0.47). In survival analysis, histological grading was without significant prognostic value for any of the four observers (p = 0.14-0.44), in contrast to DNA ploidy (p = 0.001). It is concluded that DNA ploidy in oral dysplasia has a practical prognostic value, unlike histological grading of the same lesions. Copyright 2001 John Wiley & Sons, Ltd.
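
    Cohen's kappa, used above to measure inter-observer agreement, corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's category frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for one pair of raters (with four pathologists, kappa would be computed per pair), using hypothetical grades rather than data from the study:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters grading the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal frequency, summed over categories.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical three-group grades from two observers for six biopsies.
obs1 = ["mild", "mild", "severe", "moderate", "severe", "mild"]
obs2 = ["mild", "moderate", "severe", "moderate", "mild", "mild"]
print(round(cohens_kappa(obs1, obs2), 3))  # 0.478
```

    Here the raters agree on 4 of 6 cases (p_o = 0.667), but chance alone predicts p_e = 13/36, so kappa is 11/23, illustrating why raw agreement overstates reliability.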

  11. Reconstruction of an integrated genome-scale co-expression network reveals key modules involved in lung adenocarcinoma.

    Science.gov (United States)

    Bidkhori, Gholamreza; Narimani, Zahra; Hosseini Ashtiani, Saman; Moeini, Ali; Nowzari-Dalini, Abbas; Masoudi-Nejad, Ali

    2013-01-01

    The goal of this study was to reconstruct a "genome-scale co-expression network" and find important modules in lung adenocarcinoma so that we could identify the genes involved in this disease. We integrated gene mutation, GWAS, CGH, array-CGH and SNP array data in order to identify important genes and loci at genome scale. On the basis of the identified genes, a co-expression network was then reconstructed from co-expression data; we named it the "genome-scale co-expression network". Next, 23 key modules were identified through clustering. By analyzing these modules, a number of genes were implicated in lung adenocarcinoma for the first time. The genes EGFR, PIK3CA, TAF15, XIAP, VAPB, Appl1, Rab5a, ARF4, CLPTM1L, SP4, ZNF124, LPP, FOXP1, SOX18, MSX2, NFE2L2, SMARCC1, TRA2B, CBX3, PRPF6, ATP6V1C1, MYBBP1A, MACF1, GRM2, TBXA2R, PRKAR2A, PTK2, PGF and MYO10 are among the genes that belong to modules 1 and 22. All of these genes are implicated in at least one of cell survival, proliferation and metastasis, and show an over-expression pattern similar to that of EGFR. A few modules contain genes that play a crucial role in cell cycle progression, such as CCNA2 (Cyclin A2), CCNB2 (Cyclin B2), CDK1, CDK5, CDC27, CDCA5, CDCA8, ASPM, BUB1, KIF15, KIF2C, NEK2, NUSAP1, PRC1, SMC4, SYCE2, TFDP1, CDC42 and ARHGEF9. The modules also contain other genes (e.g. DLGAP5, BIRC5, PSMD2, Src, TTK, SENP2, DOK2 and FUS).

  12. Liquid-phase sequence capture and targeted re-sequencing revealed novel polymorphisms in tomato genes belonging to the MEP carotenoid pathway.

    Science.gov (United States)

    Terracciano, Irma; Cantarella, Concita; Fasano, Carlo; Cardi, Teodoro; Mennella, Giuseppe; D'Agostino, Nunzio

    2017-07-17

    Tomato (Solanum lycopersicum L.) plants are characterized by a variety of fruit colours that reflect the composition and accumulation of diverse carotenoids in the berries. Carotenoids are extensively studied for their health-promoting effects, which explains the great attention these pigments have received from breeders and researchers worldwide. In this work we applied Agilent's SureSelect liquid-phase sequence capture and Illumina targeted re-sequencing of 34 tomato genes belonging to the methylerythritol phosphate (MEP) carotenoid pathway to a panel of 48 genotypes that differ in carotenoid content, calculated as the sum of β-carotene, cis- and trans-lycopene. We targeted 230 kb of genomic regions, including all exons and regulatory regions, and observed ~40% on-target capture. We found ample genetic variation among all the genotypes under study and generated an extensive catalog of SNPs/InDels located in both genic and regulatory regions. SNPs/InDels were also classified based on genomic location and putative biological effect. Our work contributes to the identification of allelic variations possibly underpinning a key agronomic trait in tomato. Results from this study can be exploited to promote new studies on tomato bio-fortification as well as breeding programs related to carotenoid accumulation in fruits.

  13. A New Perspective on Polyploid Fragaria (Strawberry) Genome Composition Based on Large-Scale, Multi-Locus Phylogenetic Analysis.

    Science.gov (United States)

    Yang, Yilong; Davis, Thomas M

    2017-12-01

    The subgenomic compositions of the octoploid (2n = 8× = 56) strawberry (Fragaria) species, including the economically important cultivated species Fragaria x ananassa, have been a topic of long-standing interest. Phylogenomic approaches utilizing next-generation sequencing technologies offer a new window into species relationships and the subgenomic compositions of polyploids. We have conducted a large-scale phylogenetic analysis of Fragaria (strawberry) species using the Fluidigm Access Array system and 454 sequencing platform. About 24 single-copy or low-copy nuclear genes distributed across the genome were amplified and sequenced from 96 genomic DNA samples representing 16 Fragaria species from diploid (2×) to decaploid (10×), including the most extensive sampling of octoploid taxa yet reported. Individual gene trees were constructed by different tree-building methods. Mosaic genomic structures of diploid Fragaria species consisting of sequences at different phylogenetic positions were observed. Our findings support the presence in octoploid species of genetic signatures from at least five diploid ancestors (F. vesca, F. iinumae, F. bucharica, F. viridis, and at least one additional allele contributor of unknown identity), and question the extent to which distinct subgenomes are preserved over evolutionary time in the allopolyploid Fragaria species. In addition, our data support divergence between the two wild octoploid species, F. virginiana and F. chiloensis. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

    Directory of Open Access Journals (Sweden)

    Edgar E. Lara-Ramírez

    2014-01-01

    The increasing number of available dengue virus (DENV) genome sequences makes it possible to identify the factors contributing to DENV evolution. In the present study, codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3), as well as the effective number of codons (ENC) and ENCp versus GC3 plots, revealed mutational bias and purifying selection pressures as the major forces influencing codon usage, but with distinct pressure on specific nucleotide positions in the codon. Correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed clustering patterns similar to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus's geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. This large-scale analysis reveals new features of DENV genomic evolution.
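    The codon-usage statistics named in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the input is assumed to be one in-frame coding sequence, stop codons are excluded from RSCU, and the expected-ENC curve uses Wright's standard null formula ENC = 2 + s + 29 / (s^2 + (1 - s)^2), which is normally evaluated with GC3s (synonymous third positions) rather than the plain GC3 computed here.

```python
from collections import Counter, defaultdict

# Standard genetic code: codons enumerated in TCAG order; "*" marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def split_codons(seq):
    """Complete, unambiguous codons of an in-frame coding sequence."""
    seq = seq.upper().replace("U", "T")
    triplets = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return [c for c in triplets if c in CODON_TABLE]  # drops N-containing codons


def gc_by_position(seq):
    """Return (GC1, GC2, GC3): fraction of G or C at each codon position."""
    codons = split_codons(seq)
    if not codons:
        raise ValueError("no complete codons")
    return tuple(
        sum(c[pos] in "GC" for c in codons) / len(codons) for pos in range(3)
    )


def rscu(seq):
    """RSCU = observed codon count / mean count over its synonymous family."""
    counts = Counter(split_codons(seq))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for synonyms in families.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        expected = total / len(synonyms)  # count if all synonyms were equal
        for c in synonyms:
            values[c] = counts[c] / expected
    return values


def expected_enc(gc3):
    """Wright's expected ENC when codon bias reflects only GC3 composition."""
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)


seq = "ATGGCCGCAGCGGCT"  # toy sequence: Met followed by four alanine codons
print(gc_by_position(seq))  # (0.8, 0.8, 0.6)
print(rscu(seq)["GCC"])     # 1.0: all four alanine codons used equally
print(expected_enc(0.5))    # 60.5
```

    In an ENCp versus GC3 plot, genes lying well below the `expected_enc` curve are usually read as showing codon bias beyond what mutational pressure alone explains.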

  15. Analysing 454 amplicon resequencing experiments using the modular and database oriented Variant Identification Pipeline

    Directory of Open Access Journals (Sweden)

    Deforce Dieter

    2010-05-01

    Background Next-generation amplicon sequencing enables high-throughput genetic diagnostics, sequencing multiple genes in several patients together in one sequencing run. Currently, no open-source out-of-the-box software solution exists that reliably reports detected genetic variations and that can be used to improve future sequencing effectiveness by analyzing the PCR reactions. Results We developed an integrated, database-oriented software pipeline for analysis of 454/Roche GS-FLX amplicon resequencing experiments using Perl and a relational database. The pipeline enables variation detection, variation detection validation, and advanced data analysis, which provides information that can be used to optimize PCR efficiency using traditional means. The modular approach enables customization of the pipeline where needed and allows researchers to adapt their analysis pipeline to their experiments. Clear documentation and training data are available to test and validate the pipeline prior to using it on real sequencing data. Conclusions We designed an open-source, database-oriented pipeline that enables advanced analysis of 454/Roche GS-FLX amplicon resequencing experiments using SQL statements. This modular database approach allows easy coupling with other pipeline modules such as variant interpretation or a LIMS system. A set of standard reporting scripts is also available.

  16. Analysing 454 amplicon resequencing experiments using the modular and database oriented Variant Identification Pipeline.

    Science.gov (United States)

    De Schrijver, Joachim M; De Leeneer, Kim; Lefever, Steve; Sabbe, Nick; Pattyn, Filip; Van Nieuwerburgh, Filip; Coucke, Paul; Deforce, Dieter; Vandesompele, Jo; Bekaert, Sofie; Hellemans, Jan; Van Criekinge, Wim

    2010-05-20

    Next-generation amplicon sequencing enables high-throughput genetic diagnostics, sequencing multiple genes in several patients together in one sequencing run. Currently, no open-source out-of-the-box software solution exists that reliably reports detected genetic variations and that can be used to improve future sequencing effectiveness by analyzing the PCR reactions. We developed an integrated, database-oriented software pipeline for analysis of 454/Roche GS-FLX amplicon resequencing experiments using Perl and a relational database. The pipeline enables variation detection, variation detection validation, and advanced data analysis, which provides information that can be used to optimize PCR efficiency using traditional means. The modular approach enables customization of the pipeline where needed and allows researchers to adapt their analysis pipeline to their experiments. Clear documentation and training data are available to test and validate the pipeline prior to using it on real sequencing data. We designed an open-source, database-oriented pipeline that enables advanced analysis of 454/Roche GS-FLX amplicon resequencing experiments using SQL statements. This modular database approach allows easy coupling with other pipeline modules such as variant interpretation or a LIMS system. A set of standard reporting scripts is also available.

  17. Genome-scale transcriptional analyses of first-generation interspecific sunflower hybrids reveals broad regulatory compatibility

    Science.gov (United States)

    2013-01-01

    Background Interspecific hybridization creates individuals harboring diverged genomes. The interaction of these genomes can generate successful evolutionary novelty or disadvantageous genomic conflict. Annual sunflowers Helianthus annuus and H. petiolaris have a rich history of hybridization in natural populations. Although first-generation hybrids generally have low fertility, hybrid swarms that include later-generation and fully fertile backcross plants have been identified, as well as at least three independently originated stable hybrid taxa. We examine patterns of transcript accumulation in the earliest stages of hybridization of these species via analyses of transcriptome sequences from laboratory-derived F1 offspring of an inbred H. annuus cultivar and a wild H. petiolaris accession. Results While nearly 14% of the reference transcriptome showed significant accumulation differences between parental accessions, total F1 transcript levels showed little evidence of dominance, as midparent transcript levels were highly predictive of transcript accumulation in F1 plants. Allelic bias in F1 transcript accumulation was detected in 20% of transcripts containing sufficient polymorphism to distinguish parental alleles; however, the magnitude of these biases was generally smaller than differences among parental accessions. Conclusions While analyses of allelic bias suggest that cis-regulatory differences between H. annuus and H. petiolaris are common, their effect on transcript levels may be more subtle than that of trans-acting regulatory differences. Overall, these analyses found little evidence of regulatory incompatibility or dominance interactions between parental genomes within F1 hybrid individuals, although it is unclear whether this is a legacy or an enabler of introgression between species. PMID:23701699

  18. Urban landscape genomics identifies fine-scale gene flow patterns in an avian invasive.

    Science.gov (United States)

    Low, G W; Chattopadhyay, B; Garg, K M; Irestedt, M; Ericson, Pgp; Yap, G; Tang, Q; Wu, S; Rheindt, F E

    2018-01-01

    Invasive species exert a serious impact on native fauna and flora and have been the target of many eradication and management efforts worldwide. However, a lack of data on population structure and history, exacerbated by the recency of many species introductions, limits the efficiency with which such species can be kept at bay. In this study we generated a novel genome of high assembly quality and genotyped 4735 genome-wide single nucleotide polymorphic (SNP) markers from 78 individuals of an invasive population of the Javan Myna Acridotheres javanicus across the island of Singapore. We inferred limited population subdivision at a micro-geographic level, a genetic patch size (~13-14 km) indicative of a pronounced dispersal ability, and barely an increase in effective population size since introduction despite an increase of four to five orders of magnitude in actual population size, suggesting that low population-genetic diversity following a bottleneck has not impeded establishment success. Landscape genomic analyses identified urban features, such as low-rise neighborhoods, that constitute pronounced barriers to gene flow. Based on our data, we consider an approach targeting the complete eradication of Javan Mynas across Singapore to be unfeasible. Instead, a mixed approach of localized mitigation measures taking into account urban geographic features and planning policy may be the most promising avenue to reducing the adverse impacts of this urban pest. Our study demonstrates how genomic methods can directly inform the management and control of invasive species, even in geographically limited datasets with high gene flow rates.

  19. Reconstructing the Backbone of the Saccharomycotina Yeast Phylogeny Using Genome-Scale Data

    Directory of Open Access Journals (Sweden)

    Xing-Xing Shen

    2016-12-01

    Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa, and lineages were often selected with limited consideration of the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage comprising the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.

  20. Meeting the challenges of non-referenced genome assembly from short-read sequence data

    Science.gov (United States)

    M. Parks; A. Liston; R. Cronn

    2010-01-01

    Massively parallel sequencing technologies (MPST) offer unprecedented opportunities for novel sequencing projects. MPST, while offering tremendous sequencing capacity, are typically most effective in resequencing projects (as opposed to the sequencing of novel genomes) due to the fact that sequence is returned in relatively short reads. Nonetheless, there is great...

  1. Translational genomics for analysis of complex traits in peanut and sorghum

    Science.gov (United States)

    The integration of sequencing and genotype data from natural variation studies (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) facilitated the development of DNA markers in the form of single nucleotide polymorphic (SNP)...

  2. Integrated and translational genomics for analysis of complex traits in crops

    Science.gov (United States)

    We report here on integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of translating gems from these resources into useable DNA markers in the ...

  3. A New and Improved Rainbow Trout (Oncorhynchus mykiss) Reference Genome Assembly

    Science.gov (United States)

    In an effort to improve the rainbow trout reference genome assembly, we re-sequenced the doubled-haploid Swanson line using the longest available reads from the Illumina technology; generating over 510 million paired-end shotgun reads (2x260nt), and 1 billion mate-pair reads (2x160nt) from four sequ...

  4. The RAVEN toolbox and its use for generating a genome-scale metabolic model for Penicillium chrysogenum.

    Directory of Open Access Journals (Sweden)

    Rasmus Agren

    We present the RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox: a software suite that allows for semi-automated reconstruction of genome-scale models. It makes use of published models and/or the KEGG database, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results. The software is a useful tool for system-wide data analysis in a metabolic context and for streamlined reconstruction of metabolic networks based on protein homology. The RAVEN Toolbox workflow was applied in order to reconstruct a genome-scale metabolic model for the important microbial cell factory Penicillium chrysogenum Wisconsin54-1255. The model was validated in a bibliomic study of 440 references in total, and it comprises 1471 unique biochemical reactions and 1006 ORFs. It was then used to study the roles of ATP and NADPH in the biosynthesis of penicillin, and to identify potential metabolic engineering targets for maximization of penicillin production.

  5. The RAVEN toolbox and its use for generating a genome-scale metabolic model for Penicillium chrysogenum.

    Science.gov (United States)

    Agren, Rasmus; Liu, Liming; Shoaie, Saeed; Vongsangnak, Wanwipa; Nookaew, Intawat; Nielsen, Jens

    2013-01-01

    We present the RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox: a software suite that allows for semi-automated reconstruction of genome-scale models. It makes use of published models and/or the KEGG database, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results. The software is a useful tool for system-wide data analysis in a metabolic context and for streamlined reconstruction of metabolic networks based on protein homology. The RAVEN Toolbox workflow was applied in order to reconstruct a genome-scale metabolic model for the important microbial cell factory Penicillium chrysogenum Wisconsin54-1255. The model was validated in a bibliomic study of 440 references in total, and it comprises 1471 unique biochemical reactions and 1006 ORFs. It was then used to study the roles of ATP and NADPH in the biosynthesis of penicillin, and to identify potential metabolic engineering targets for maximization of penicillin production.

  6. Reconstruction of genome-scale active metabolic networks for 69 human cell types and 16 cancer types using INIT.

    Directory of Open Access Journals (Sweden)

    Rasmus Agren

    Development of high-throughput analytical methods has given physicians potential access to extensive and patient-specific data sets, such as gene sequences, gene expression profiles or metabolite footprints. This opens the way to a new approach to health care, which is both personalized and based on system-level analysis. Genome-scale metabolic networks provide a mechanistic description of the relationships between different genes, which is valuable for the analysis and interpretation of large experimental data-sets. Here we describe the generation of genome-scale active metabolic networks for 69 different cell types and 16 cancer types using the INIT (Integrative Network Inference for Tissues) algorithm. The INIT algorithm uses cell-type-specific information about protein abundances contained in the Human Proteome Atlas as the main source of evidence. The generated models constitute the first step towards establishing a Human Metabolic Atlas, which will be a comprehensive description (accessible online) of the metabolism of different human cell types, and will allow for tissue-level and organism-level simulations in order to achieve a better understanding of complex diseases. A comparative analysis between the active metabolic networks of cancer types and healthy cell types allowed for identification of cancer-specific metabolic features that constitute generic potential drug targets for cancer treatment.

  7. Sequential computation of elementary modes and minimal cut sets in genome-scale metabolic networks using alternate integer linear programming

    Energy Technology Data Exchange (ETDEWEB)

    Song, Hyun-Seob; Goldberg, Noam; Mahajan, Ashutosh; Ramkrishna, Doraiswami

    2017-03-27

    Elementary (flux) modes (EMs) have served as a valuable tool for investigating structural and functional properties of metabolic networks. Identification of the full set of EMs in genome-scale networks remains challenging due to the combinatorial explosion of EMs in complex networks. Often, however, only a small subset of relevant EMs needs to be known, for which optimization-based sequential computation is a useful alternative. Most of the currently available methods along this line are based on the iterative use of mixed integer linear programming (MILP), whose effectiveness deteriorates significantly as the number of iterations builds up. To alleviate the computational burden associated with the MILP implementation, we present a novel optimization algorithm termed alternate integer linear programming (AILP). Results: Our algorithm was designed to iteratively solve a pair of integer programming (IP) and linear programming (LP) problems to compute EMs in a sequential manner. In each step, the IP identifies a minimal subset of reactions whose deletion disables all previously identified EMs. Thus, a subsequent LP solution subject to this reaction deletion constraint becomes a distinct EM. In cases where no feasible LP solution is available, IP-derived reaction deletion sets represent minimal cut sets (MCSs). Despite the additional computation of MCSs, AILP reduced the time needed to compute EMs by orders of magnitude. The proposed AILP algorithm not only offers a computational advantage in the EM analysis of genome-scale networks, but also improves the understanding of the linkage between EMs and MCSs.
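    The IP step described in the abstract, finding a smallest set of reaction deletions that disables every elementary mode found so far, is a minimum hitting-set problem: each mode is disabled as soon as one of its reactions is deleted. The toy Python sketch below solves that subproblem by brute-force enumeration rather than integer programming, so it is only practical for a handful of reactions. It is not the AILP algorithm, it omits the LP step that searches for a new EM under the deletion constraint, and the function and reaction names are hypothetical.

```python
from itertools import combinations


def minimum_cut(modes):
    """Smallest reaction set that intersects (hence disables) every mode.

    `modes` is a list of elementary modes, each given as the set of
    reactions it uses. Candidate deletion sets are tried in order of
    increasing size, so the first one found has minimum cardinality,
    mirroring the objective of the IP step (brute force, not an IP solver).
    """
    if not modes:
        return set()  # nothing to disable
    reactions = sorted(set().union(*modes))
    for size in range(1, len(reactions) + 1):
        for candidate in combinations(reactions, size):
            cut = set(candidate)
            if all(cut & mode for mode in modes):  # every mode loses a reaction
                return cut
    return None  # only reached if some mode uses no reactions


# Hypothetical toy network with three known modes.
known_modes = [{"r1", "r2"}, {"r1", "r3"}, {"r4"}]
print(sorted(minimum_cut(known_modes)))  # ['r1', 'r4']
```

    In the sequential scheme the abstract describes, such a deletion set would constrain the next LP; if the LP is infeasible, the set is reported as a minimal cut set instead of yielding a new EM.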

  8. Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments.

    Science.gov (United States)

    Monk, Jonathan M; Charusanti, Pep; Aziz, Ramy K; Lerman, Joshua A; Premyodhin, Ned; Orth, Jeffrey D; Feist, Adam M; Palsson, Bernhard Ø

    2013-12-10

    Genome-scale models (GEMs) of metabolism were constructed for 55 fully sequenced Escherichia coli and Shigella strains. The GEMs enable a systems approach to characterizing the pan and core metabolic capabilities of the E. coli species. The majority of pan metabolic content was found to consist of alternate catabolic pathways for unique nutrient sources. The GEMs were then used to systematically analyze growth capabilities in more than 650 different growth-supporting environments. The results show that unique strain-specific metabolic capabilities correspond to pathotypes and environmental niches. Twelve of the GEMs were used to predict growth on six differentiating nutrients, and the predictions were found to agree with 80% of experimental outcomes. Additionally, GEMs were used to predict strain-specific auxotrophies. Twelve of the strains modeled were predicted to be auxotrophic for vitamins niacin (vitamin B3), thiamin (vitamin B1), or folate (vitamin B9). Six of the strains modeled have lost biosynthetic pathways for essential amino acids methionine, tryptophan, or leucine. Genome-scale analysis of multiple strains of a species can thus be used to define the metabolic essence of a microbial species and delineate growth differences that shed light on the adaptation process to a particular microenvironment.

  9. Construction of a Genome-Scale Metabolic Model of Arthrospira platensis NIES-39 and Metabolic Design for Cyanobacterial Bioproduction.

    Directory of Open Access Journals (Sweden)

    Katsunori Yoshikawa

    Arthrospira (Spirulina) platensis is a promising feedstock and host strain for bioproduction because of its high accumulation of glycogen and superior characteristics for industrial production. Metabolic simulation using a genome-scale metabolic model and flux balance analysis is a powerful method that can be used to design metabolic engineering strategies for the improvement of target molecule production. In this study, we constructed a genome-scale metabolic model of A. platensis NIES-39 including 746 metabolic reactions and 673 metabolites, and developed novel strategies to improve the production of valuable metabolites, such as glycogen and ethanol. The simulation results obtained using the metabolic model showed high consistency with experimental results for growth rates under several trophic conditions and growth capabilities on various organic substrates. The metabolic model was further applied to design a metabolic network to improve the autotrophic production of glycogen and ethanol. Decreasing the flux through reactions related to the TCA cycle and the phosphoenolpyruvate reaction was found to improve glycogen production. Furthermore, in silico knockout simulation indicated that deletion of genes related to the respiratory chain, such as NAD(P)H dehydrogenase and cytochrome-c oxidase, could enhance ethanol production when ammonium is used as the nitrogen source.

  10. A genome-scale DNA repair RNAi screen identifies SPG48 as a novel gene associated with hereditary spastic paraplegia.

    Directory of Open Access Journals (Sweden)

    Mikołaj Słabicki

    DNA repair is essential to maintain genome integrity, and genes with roles in DNA repair are frequently mutated in a variety of human diseases. Repair via homologous recombination typically restores the original DNA sequence without introducing mutations, and a number of genes that are required for homologous recombination DNA double-strand break repair (HR-DSBR) have been identified. However, a systematic analysis of this important DNA repair pathway in mammalian cells has not been reported. Here, we describe a genome-scale endoribonuclease-prepared short interfering RNA (esiRNA) screen for genes involved in DNA double-strand break repair. We report 61 genes that influenced the frequency of HR-DSBR and characterize in detail one of the genes that decreased the frequency of HR-DSBR. We show that the gene KIAA0415 encodes a putative helicase that interacts with SPG11 and SPG15, two proteins mutated in hereditary spastic paraplegia (HSP). We identify mutations in HSP patients, discovering KIAA0415/SPG48 as a novel HSP-associated gene, and show that a KIAA0415/SPG48 mutant cell line is more sensitive to DNA damaging drugs. We present the first genome-scale survey of HR-DSBR in mammalian cells, providing a dataset that should accelerate the discovery of novel genes with roles in DNA repair and associated medical conditions. The discovery that proteins forming a novel protein complex are required for efficient HR-DSBR and are mutated in patients suffering from HSP suggests a link between HSP and DNA repair.

  11. iOD907, the first genome-scale metabolic model for the milk yeast Kluyveromyces lactis.

    Science.gov (United States)

    Dias, Oscar; Pereira, Rui; Gombert, Andreas K; Ferreira, Eugénio C; Rocha, Isabel

    2014-06-01

    We describe here the first genome-scale metabolic model of Kluyveromyces lactis, iOD907. It is partially compartmentalized (four compartments) and composed of 1867 reactions and 1476 metabolites. The iOD907 model performed well when its predictions of positive growth of K. lactis were compared with Biolog experiments and with an online catalogue of strains that provides information on the carbon sources on which K. lactis is able to grow. Chemostat experiments were used to adjust non-growth-associated energy requirements, and the model proved accurate when predicting the biomass, oxygen and carbon dioxide yields. When compared to published experiments, in silico knockouts accurately predicted in vivo phenotypes. The iOD907 genome-scale metabolic model complies with the MIRIAM (minimum information required for the annotation of biochemical models) standards for the annotation of enzymes, transporters, metabolites and reactions. Moreover, it contains direct links to the Kyoto Encyclopedia of Genes and Genomes (KEGG; for enzymes, metabolites and reactions) and to the Transporter Classification Database (TCDB) for transporters, allowing easy comparisons to other models. The model is also provided in the well-established systems biology markup language (SBML) format, which means that it can be used in most metabolic engineering platforms, such as OptFlux or Cobra. The model is able to predict the behavior of K. lactis under different environmental conditions and genetic perturbations. Furthermore, through simulations and optimizations, it can help in designing minimal media, provide insights into the milk yeast's metabolism, and identify metabolic engineering targets for improving the production of products of interest. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  12. Comprehensive reconstruction and in silico analysis of Aspergillus niger genome-scale metabolic network model that accounts for 1210 ORFs.

    Science.gov (United States)

    Lu, Hongzhong; Cao, Weiqiang; Ouyang, Liming; Xia, Jianye; Huang, Mingzhi; Chu, Ju; Zhuang, Yingping; Zhang, Siliang; Noorman, Henk

    2017-03-01

    Aspergillus niger is one of the most important cell factories for industrial enzymes and organic acids production. A comprehensive genome-scale metabolic network model (GSMM) with high quality is crucial for efficient strain improvement and process optimization. The lack of accurate reaction equations and gene-protein-reaction associations (GPRs) in the current best model of A. niger, named GSMM iMA871, however, limits its application scope. To overcome these limitations, we updated the A. niger GSMM by combining the latest genome annotation and literature mining technology. Compared with iMA871, the number of reactions in iHL1210 was increased from 1,380 to 1,764, and the number of unique ORFs from 871 to 1,210. With the aid of our transcriptomics analysis, the existence of 63% ORFs and 68% reactions in iHL1210 can be verified when glucose was used as the only carbon source. Physiological data from chemostat cultivations, 13C-labeled and molecular experiments from the published literature were further used to check the performance of iHL1210. The average correlation coefficients between the predicted fluxes and estimated fluxes from 13C-labeling data were sufficiently high (above 0.89) and the prediction of cell growth on most of the reported carbon and nitrogen sources was consistent. Using the updated genome-scale model, we evaluated gene essentiality on synthetic and yeast extract medium, as well as the effects of NADPH supply on glucoamylase production in A. niger. In summary, the new A. niger GSMM iHL1210 contains significant improvements with respect to the metabolic coverage and prediction performance, which paves the way for systematic metabolic engineering of A. niger. Biotechnol. Bioeng. 2017;114: 685-695. © 2016 Wiley Periodicals, Inc.

  13. Genetical Genomics Reveals Large Scale Genotype-By-Environment Interactions in Arabidopsis thaliana.

    Science.gov (United States)

    Snoek, L Basten; Terpstra, Inez R; Dekter, René; Van den Ackerveken, Guido; Peeters, Anton J M

    2012-01-01

    One of the major goals of quantitative genetics is to unravel the complex interactions between molecular genetic factors and the environment. These genotype-by-environment interactions also affect, and cause variation in, gene expression. The regulatory loci responsible for this variation can be found by genetical genomics, which involves mapping quantitative trait loci (QTLs) for gene expression traits, also called expression-QTLs (eQTLs). Most genetical genomics experiments published so far have been performed in a single environment and hence do not allow investigation of the role of genotype-by-environment interactions. Furthermore, most studies have been done in a steady-state environment, leading to acclimated expression patterns. However, a response to the environment, or to a change therein, can be highly plastic and may lead to more and larger differences between genotypes. Here we present a genetical genomics study on 120 Arabidopsis thaliana Landsberg erecta × Cape Verde Islands recombinant inbred lines (RILs) in active response to the environment, by treating them with 3 h of shade. The results of this experiment are compared to a previous study on seedlings of the same RILs from a steady-state environment. The combination of two highly different conditions with exactly the same RILs, and thus a fixed genetic variation, showed the large role of genotype-by-environment interactions in gene expression levels. We found environment-dependent hotspots of transcript regulation. The major hotspot was confirmed by the expression profile of a near isogenic line. Our combined analysis leads us to propose CSN5A, a COP9 signalosome component, as a candidate regulator of the gene expression response to shade.

  14. Genetical genomics reveals large scale genotype-by-environment interactions in Arabidopsis thaliana.

    Directory of Open Access Journals (Sweden)

    L. Basten Snoek

    2013-01-01

    Full Text Available One of the major goals of quantitative genetics is to unravel the complex interactions between molecular genetic factors and the environment. These genotype-by-environment interactions also affect, and cause variation in, gene expression. The regulatory loci responsible for this variation can be found by genetical genomics, which involves mapping quantitative trait loci (QTLs) for gene expression traits, also called expression QTLs (eQTLs). Most genetical genomics experiments published so far have been performed in a single environment and hence do not allow investigation of the role of genotype-by-environment interactions. Furthermore, most studies have been done in a steady-state environment, leading to acclimated expression patterns. However, a response to the environment, or to a change therein, can be highly plastic and may lead to more and larger differences between genotypes. Here we present a genetical genomics study on 120 Arabidopsis thaliana Landsberg erecta × Cape Verde Islands recombinant inbred lines (RILs) in active response to the environment, by treating them with 3 hours of shade. The results of this experiment are compared to a previous study on seedlings of the same RILs from a steady-state environment. The combination of two highly different conditions with exactly the same RILs, and thus a fixed genetic variation, showed the large role of genotype-by-environment interactions in gene expression levels. We found environment-dependent hotspots of transcript regulation. The major hotspot was confirmed by the expression profile of a near isogenic line. Our combined analysis leads us to propose CSN5A, a COP9 signalosome component, as a candidate regulator of the gene expression response to shade.

  15. Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation

    Directory of Open Access Journals (Sweden)

    Li Sen

    2012-03-01

    Full Text Available Abstract Background: The Approximate Bayesian Computation (ABC) approach has been used to infer demographic parameters for numerous species, including humans. However, most applications of ABC still use limited amounts of data, from a small number of loci, compared to the large amount of genome-wide population-genetic data that has become available in the last few years. Results: We evaluated the performance of the ABC approach for three 'population divergence' models, similar to the 'isolation with migration' model, when the data consist of several hundred thousand SNPs typed for multiple individuals, by simulating data from known demographic models. The ABC approach was used to infer demographic parameters of interest, and we compared the inferred values to the true parameter values that were used to generate hypothetical "observed" data. For all three case models, the ABC approach inferred most demographic parameters, for example population divergence times and past population sizes, quite well with narrow credible intervals, but some parameters, such as present-day population sizes and migration rates, were more difficult to infer. We compared the ability of different summary statistics, including haplotype- and LD-based statistics, to infer demographic parameters, and found that the accuracy of the parameter estimates can be improved by combining summary statistics that capture different parts of the information in the data. Furthermore, our results suggest that poor choices of prior distributions can in some circumstances be detected using ABC. Finally, increasing the amount of data beyond some hundred loci will substantially improve the accuracy of many parameter estimates using ABC.
Conclusions: We conclude that the ABC approach can accommodate realistic genome-wide population genetic data, which may be difficult to analyze with full-likelihood approaches, and that ABC can provide accurate and precise inference of demographic parameters from
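    The basic rejection form of ABC referred to in the record above (draw parameters from a prior, simulate data under them, keep the draws whose summary statistics lie close to those of the observed data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the study used population divergence models with SNP data and several summary statistics, whereas this sketch uses a hypothetical toy model (50 draws from a normal distribution with unknown mean, a Uniform(-10, 10) prior, and the sample mean as the only summary statistic). The names `rejection_abc`, `toy_prior` and `toy_simulate` are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence


def rejection_abc(
    observed: Sequence[float],
    prior_sample: Callable[[random.Random], float],
    simulate: Callable[[float, random.Random], List[float]],
    summarize: Callable[[Sequence[float]], float],
    epsilon: float,
    n_draws: int,
    seed: int = 0,
) -> List[float]:
    """Rejection ABC: keep prior draws whose simulated summary statistic
    lies within `epsilon` of the observed summary statistic."""
    rng = random.Random(seed)
    s_obs = summarize(observed)
    accepted: List[float] = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # 1. draw a candidate parameter from the prior
        sim = simulate(theta, rng)         # 2. simulate a data set under that parameter
        if abs(summarize(sim) - s_obs) < epsilon:  # 3. compare summary statistics
            accepted.append(theta)         # 4. accepted draws approximate the posterior
    return accepted


# Hypothetical toy model: data are 50 draws from Normal(theta, 1),
# with a Uniform(-10, 10) prior on theta.
def toy_prior(rng: random.Random) -> float:
    return rng.uniform(-10.0, 10.0)


def toy_simulate(theta: float, rng: random.Random) -> List[float]:
    return [rng.gauss(theta, 1.0) for _ in range(50)]


if __name__ == "__main__":
    # "Observed" data generated from a known parameter, mirroring the
    # simulation-based evaluation design described in the abstract.
    observed = toy_simulate(3.0, random.Random(42))
    posterior = rejection_abc(observed, toy_prior, toy_simulate,
                              statistics.mean, epsilon=0.1, n_draws=20000)
    print(len(posterior), round(statistics.mean(posterior), 3))
```

    Smaller values of `epsilon` give a closer approximation to the true posterior at the cost of fewer accepted draws; combining several summary statistics, as the abstract recommends, would replace the scalar distance here with a distance between vectors of statistics.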

  16. Genome-scale identification of Legionella pneumophila effectors using a machine learning approach.

    Directory of Open Access Journals (Sweden)

    David Burstein

    2009-07-01

    Full Text Available A large number of highly pathogenic bacteria use secretion systems to translocate effector proteins into host cells. Using these effectors, the bacteria subvert host cell processes during infection. Legionella pneumophila translocates effectors via the Icm/Dot type-IV secretion system, and to date approximately 100 effectors have been identified by various experimental and computational techniques. Effector identification is a critical first step towards understanding the pathogenesis system in L. pneumophila as well as in other bacterial pathogens. Here, we formulate the task of effector identification as a classification problem: each L. pneumophila open reading frame (ORF) was classified as either an effector or not. We computationally defined a set of features that best distinguish effectors from non-effectors. These features cover a wide range of characteristics, including taxonomical dispersion, regulatory data, genomic organization, similarity to eukaryotic proteomes and more. Machine learning algorithms utilizing these features were then applied to classify all the ORFs within the L. pneumophila genome. Using this approach we were able to predict and experimentally validate 40 new effectors, reaching a success rate above 90%. Having increased the number of validated effectors to around 140, we were able to gain novel insights into their characteristics. Effectors were found to have low G+C content, supporting the hypothesis that a large number of effectors originate via horizontal gene transfer, probably from their protozoan host. In addition, effectors were found to cluster in specific genomic regions. Finally, we were able to provide a novel description of the C-terminal translocation signal required for effector translocation by the Icm/Dot secretion system. To conclude, we have discovered 40 novel L. pneumophila effectors, predicted over a hundred additional highly probable effectors, and shown the applicability of machine

  17. Re-annotation, improved large-scale assembly and establishment of a catalogue of noncoding loci for the genome of the model brown alga Ectocarpus.

    Science.gov (United States)

    Cormier, Alexandre; Avia, Komlan; Sterck, Lieven; Derrien, Thomas; Wucher, Valentin; Andres, Gwendoline; Monsoor, Misharl; Godfroy, Olivier; Lipinska, Agnieszka; Perrineau, Marie-Mathilde; Van De Peer, Yves; Hitte, Christophe; Corre, Erwan; Coelho, Susana M; Cock, J Mark

    2017-04-01

    The genome of the filamentous brown alga Ectocarpus was the first to be completely sequenced from within the brown algal group and has served as a key reference genome both for this lineage and for the stramenopiles. We present a complete structural and functional reannotation of the Ectocarpus genome. The large-scale assembly of the Ectocarpus genome was significantly improved and genome-wide gene re-annotation using extensive RNA-seq data improved the structure of 11 108 existing protein-coding genes and added 2030 new loci. A genome-wide analysis of splicing isoforms identified an average of 1.6 transcripts per locus. A large number of previously undescribed noncoding genes were identified and annotated, including 717 loci that produce long noncoding RNAs. Conservation of lncRNAs between Ectocarpus and another brown alga, the kelp Saccharina japonica, suggests that at least a proportion of these loci serve a function. Finally, a large collection of single nucleotide polymorphism-based markers was developed for genetic analyses. These resources are available through an updated and improved genome database. This study significantly improves the utility of the Ectocarpus genome as a high-quality reference for the study of many important aspects of brown algal biology and as a reference for genomic analyses across the stramenopiles. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  18. Smallpox virus resequencing GeneChips can also rapidly ascertain species status for some zoonotic non-variola orthopoxviruses.

    Science.gov (United States)

    Sulaiman, Irshad M; Sammons, Scott A; Wohlhueter, Robert M

    2008-04-01

    We recently developed a set of seven resequencing GeneChips for the rapid sequencing of Variola virus strains in the WHO Repository of the Centers for Disease Control and Prevention. In this study, we attempted to hybridize these GeneChips with some known non-Variola orthopoxvirus isolates, including monkeypox, cowpox, and vaccinia viruses, for rapid detection.

  19. Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

    Directory of Open Access Journals (Sweden)

    In Young Choi

    2013-12-01

    Full Text Available The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project drove the development of technologies for data acquisition, analysis, and visualization in these two domains. The massive amount of data from both the clinical and biological domains is expected to enable personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. A representative standardization effort, the Clinical Bioinformatics Ontology (CBO), aims to provide uniquely identified concepts that include molecular pathology terminologies. Since individual genome data can easily be used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focus on the informatics aspects of integrating the EMR and BI communities by identifying opportunities, challenges, and approaches to provide the best possible care for our patients and the population.

  20. Genome-scale transcriptomic insights into early-stage fruit development in woodland strawberry Fragaria vesca.

    Science.gov (United States)

    Kang, Chunying; Darwish, Omar; Geretz, Aviva; Shahan, Rachel; Alkharouf, Nadim; Liu, Zhongchi

    2013-06-01

    Fragaria vesca, a diploid woodland strawberry with a small and sequenced genome, is an excellent model for studying fruit development. The strawberry fruit is unique in that the edible flesh is actually enlarged receptacle tissue. The true fruit are the numerous dry achenes dotting the receptacle's surface. Auxin produced from the achene is essential for the receptacle fruit set, a paradigm for studying crosstalk between hormone signaling and development. To investigate the molecular mechanism underlying strawberry fruit set, next-generation sequencing was employed to profile early-stage fruit development with five fruit tissue types and five developmental stages from floral anthesis to enlarged fruits. This two-dimensional data set provides a systems-level view of molecular events with precise spatial and temporal resolution. The data suggest that the endosperm and seed coat may play a more prominent role than the embryo in auxin and gibberellin biosynthesis for fruit set. A model is proposed to illustrate how hormonal signals produced in the endosperm and seed coat coordinate seed, ovary wall, and receptacle fruit development. The comprehensive fruit transcriptome data set provides a wealth of genomic resources for the strawberry and Rosaceae communities as well as unprecedented molecular insight into fruit set and early stage fruit development.

  1. Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models.

    Science.gov (United States)

    Yousefi, Safoora; Amrollahi, Fatemeh; Amgad, Mohamed; Dong, Chengliang; Lewis, Joshua E; Song, Congzheng; Gutman, David A; Halani, Sameer H; Velazquez Vega, Jose Enrique; Brat, Daniel J; Cooper, Lee A D

    2017-09-15

    Translating the vast data generated by genomic platforms into accurate predictions of clinical outcomes is a fundamental challenge in genomic medicine. Many prediction methods face limitations in learning from the high-dimensional profiles generated by these platforms, and rely on experts to hand-select a small number of features for training prediction models. In this paper, we demonstrate how deep learning and Bayesian optimization methods that have been remarkably successful in general high-dimensional prediction tasks can be adapted to the problem of predicting cancer outcomes. We perform an extensive comparison of Bayesian-optimized deep survival models and other state-of-the-art machine learning methods for survival analysis, and describe a framework for interpreting deep survival models using a risk backpropagation technique. Finally, we illustrate that deep survival models can successfully transfer information across diseases to improve prognostic accuracy. We provide an open-source software implementation of this framework, called SurvivalNet, that enables automatic training, evaluation and interpretation of deep survival models.

  2. Chromosome scale genome assembly and transcriptome profiling of Nannochloropsis gaditana in nitrogen depletion.

    Science.gov (United States)

    Corteggiani Carpinelli, Elisa; Telatin, Andrea; Vitulo, Nicola; Forcato, Claudio; D'Angelo, Michela; Schiavon, Riccardo; Vezzi, Alessandro; Giacometti, Giorgio Mario; Morosinotto, Tomas; Valle, Giorgio

    2014-02-01

    Nannochloropsis is rapidly emerging as a model organism for the study of biofuel production in microalgae. Here, we report a high-quality genomic assembly of Nannochloropsis gaditana, consisting of large contigs up to 500 kbp long and scaffolds that in most cases span the entire length of the chromosomes. We identified 10,646 complete genes and characterized possible alternative transcripts. The annotation of the predicted genes and the analysis of cellular processes revealed traits relevant for the genetic improvement of this organism, such as genes involved in DNA recombination, RNA silencing, and cell wall synthesis. We also analyzed the modification of the transcriptional profile in nitrogen deficiency, a condition known to stimulate lipid accumulation. While the content of lipids increased, we did not detect major changes in expression of the genes involved in their biosynthesis. At the same time, we observed a very