WorldWideScience

Sample records for genomics based analysis

  1. Genomic analysis of plant chromosomes based on meiotic pairing

    Directory of Open Access Journals (Sweden)

    Lisete Chamma Davide

    2007-12-01

    Full Text Available This review presents the principles and applications of classical genomic analysis, with emphasis on plant breeding. The main mathematical models used to estimate the preferential chromosome pairing in diploid or polyploid, interspecific or intergenera hybrids are presented and discussed, with special reference to the applications and studies for the definition of genome relationships among species of the Poaceae family.

  2. YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.

    Science.gov (United States)

    Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh

    2015-01-16

    Yersinia is a Gram-negative bacteria that includes serious pathogens such as the Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly-growing Yersinia genomic data and to provide analysis tools particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. To facilitate the ongoing and future research of Yersinia, especially those generally considered non-pathogenic species, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed the YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include JavaScript-based genome browser (JBrowse) and Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing phylogenetic tree of Yersinia. We ran analyses based on the tools and genomic data in YersiniaBase and the

  3. Analysis of chimpanzee history based on genome sequence alignments.

    Directory of Open Access Journals (Sweden)

    Jennifer L Caswell

    2008-04-01

    Full Text Available Population geneticists often study small numbers of carefully chosen loci, but it has become possible to obtain orders of magnitude for more data from overlaps of genome sequences. Here, we generate tens of millions of base pairs of multiple sequence alignments from combinations of three western chimpanzees, three central chimpanzees, an eastern chimpanzee, a bonobo, a human, an orangutan, and a macaque. Analysis provides a more precise understanding of demographic history than was previously available. We show that bonobos and common chimpanzees were separated approximately 1,290,000 years ago, western and other common chimpanzees approximately 510,000 years ago, and eastern and central chimpanzees at least 50,000 years ago. We infer that the central chimpanzee population size increased by at least a factor of 4 since its separation from western chimpanzees, while the western chimpanzee effective population size decreased. Surprisingly, in about one percent of the genome, the genetic relationships between humans, chimpanzees, and bonobos appear to be different from the species relationships. We used PCR-based resequencing to confirm 11 regions where chimpanzees and bonobos are not most closely related. Study of such loci should provide information about the period of time 5-7 million years ago when the ancestors of humans separated from those of the chimpanzees.

  4. Network Based Prediction Model for Genomics Data Analysis*

    OpenAIRE

    Huang, Ying; Wang, Pei

    2012-01-01

    Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene/protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information in analyzing high dimensional genomics data. Specially, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data, and at the same time identifying disease susceptible genes. ...

  5. Whole-genome sequence-based analysis of thyroid function

    OpenAIRE

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 1...

  6. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    extensively studied in other organisms, its analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also...... on Tabasco), led us to the detection of a high-resolution map of segmental duplications in the pig genome. Comparing these segments with four other Duroc animals sequenced at our institute, supplied the resources needed to describe the first genome-wide and systematic analysis of segmental duplications...

  7. Technology-Driven and Evidence-Based Genomic Analysis for Integrated Pediatric and Prenatal Genetics Evaluation

    Institute of Scientific and Technical Information of China (English)

    Yuan Wei; Fang Xu; Peining Li

    2013-01-01

    The first decade since the completion of the Human Genome Project has been marked with rapid development of genomic technologies and their immediate clinical applications.Genomic analysis using oligonucleotide array comparative genomic hybridization (aCGH) or single nucleotide polymorphism (SNP) chips has been applied to pediatric patients with developmental and intellectual disabilities (DD/ID),multiple congenital anomalies (MCA) and autistic spectrum disorders (ASD).Evaluation of analytical and clinical validities of aCGH showed > 99% sensitivity and specificity and increased analytical resolution by higher density probe coverage.Reviews of case series,multi-center comparison and large patient-control studies demonstrated a diagnostic yield of 12%-20%; approximately 60% of these abnormalities were recurrent genomic disorders.This pediatric experience has been extended toward prenatal diagnosis.A series of reports indicated approximately 10% of pregnancies with ultrasound-detected structural anomalies and normal cytogenetic findings had genomic abnormalities,and 30% of these abnormalities were syndromic genomic disorders.Evidence-based practice guidelines and standards for implementing genomic analysis and web-delivered knowledge resources for interpreting genomic findings have been established.The progress from this technology-driven and evidence-based genomic analysis provides not only opportunities to dissect disease-causing mechanisms and develop rational therapeutic interventions but also important lessons for integrating genomic sequencing into pediatric and prenatal genetic evaluation.

  8. Evolutionary insights from suffix array-based genome sequence analysis

    Indian Academy of Sciences (India)

    Anindya Poddar; Nagasuma Chandra; Madhavi Ganapathiraju; K Sekar; Judith Klein-Seetharaman; Raj Reddy; N Balakrishnan

    2007-08-01

    Gene and protein sequence analyses, central components of studies in modern biology are easily amenable to string matching and pattern recognition algorithms. The growing need of analysing whole genome sequences more efficiently and thoroughly, has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures, well known in many other areas and are highly suited for sequence analysis too. Here we report an improvement to the design of construction of suffix arrays. Enhancement in versatility and scalability, enabled by this approach, is demonstrated through the use of real-life examples. The scalability of the algorithm to whole genomes renders it suitable to address many biologically interesting problems. One example is the evolutionary insight gained by analysing unigrams, bi-grams and higher n-grams, indicating that the genetic code has a direct influence on the overall composition of the genome. Further, different proteomes have been analysed for the coverage of the possible peptide space, which indicate that as much as a quarter of the total space at the tetra-peptide level is left un-sampled in prokaryotic organisms, although almost all tri-peptides can be seen in one protein or another in a proteome. Besides, distinct patterns begin to emerge for the counts of particular tetra and higher peptides, indicative of a ‘meaning’ for tetra and higher n-grams. The toolkit has also been used to demonstrate the usefulness of identifying repeats in whole proteomes efficiently. As an example, 16 members of one COG, coded by the genome of Mycobacterium tuberculosis H37Rv have been found to contain a repeating sequence of 300 amino acids.

  9. Analysis of the genomic homologous recombination in Theilovirus based on complete genomes

    Directory of Open Access Journals (Sweden)

    Yi Maoli

    2011-09-01

    Full Text Available Abstract At present, Theilovirus is considered to comprise four distinct serotypes, including Theiler's murine encephalomyelitis virus, Vilyuisk human encephalomyelitis virus, Thera virus, and Saffold virus. So far, there is no systematical study that investigated the genomic recombination of Theilovirus. The present study performed the phylogenetic and recombination analysis of Theilovirus over the complete genomes. Seven potentially significant recombination events were identified. However, according to the strains information and references related to the recombinants and their parental strains, four of the recombination events might happen non-naturally. These results will provide valuable hints for future research on evolution and antigenic variability of Theilovirus.

  10. Analysis of the genomic homologous recombination in Theilovirus based on complete genomes.

    Science.gov (United States)

    Sun, Guangming; Zhang, Xiaodan; Yi, Maoli; Shao, Shihe; Zhang, Wen

    2011-09-17

    At present, Theilovirus is considered to comprise four distinct serotypes, including Theiler's murine encephalomyelitis virus, Vilyuisk human encephalomyelitis virus, Thera virus, and Saffold virus. So far, there is no systematical study that investigated the genomic recombination of Theilovirus. The present study performed the phylogenetic and recombination analysis of Theilovirus over the complete genomes. Seven potentially significant recombination events were identified. However, according to the strains information and references related to the recombinants and their parental strains, four of the recombination events might happen non-naturally. These results will provide valuable hints for future research on evolution and antigenic variability of Theilovirus.

  11. Recombination analysis based on the complete genome of bocavirus

    Directory of Open Access Journals (Sweden)

    Chen Shengxia

    2011-04-01

    Full Text Available Abstract Bocavirus include bovine parvovirus, minute virus of canine, porcine bocavirus, gorilla bocavirus, and Human bocaviruses 1-4 (HBoVs. Although recent reports showed that recombination happened in bocavirus, no systematical study investigated the recombination of bocavirus. The present study performed the phylogenetic and recombination analysis of bocavirus over the complete genomes available in GenBank. Results confirmed that recombination existed among bocavirus, including the likely inter-genotype recombination between HBoV1 and HBoV4, and intra-genotype recombination among HBoV2 variants. Moreover, it is the first report revealing the recombination that occurred between minute viruses of canine.

  12. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  13. A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease

    Science.gov (United States)

    Kyriakou, Theodosios; Nelson, Christopher P; Hopewell, Jemma C; Webb, Thomas R; Zeng, Lingyao; Dehghan, Abbas; Alver, Maris; Armasu, Sebastian M; Auro, Kirsi; Bjonnes, Andrew; Chasman, Daniel I; Chen, Shufeng; Ford, Ian; Franceschini, Nora; Gieger, Christian; Grace, Christopher; Gustafsson, Stefan; Huang, Jie; Hwang, Shih-Jen; Kim, Yun Kyoung; Kleber, Marcus E; Lau, King Wai; Lu, Xiangfeng; Lu, Yingchang; Lyytikäinen, Leo-Pekka; Mihailov, Evelin; Morrison, Alanna C; Pervjakova, Natalia; Qu, Liming; Rose, Lynda M; Salfati, Elias; Saxena, Richa; Scholz, Markus; Smith, Albert V; Tikkanen, Emmi; Uitterlinden, Andre; Yang, Xueli; Zhang, Weihua; Zhao, Wei; de Andrade, Mariza; de Vries, Paul S; van Zuydam, Natalie R; Anand, Sonia S; Bertram, Lars; Beutner, Frank; Dedoussis, George; Frossard, Philippe; Gauguier, Dominique; Goodall, Alison H; Gottesman, Omri; Haber, Marc; Han, Bok-Ghee; Huang, Jianfeng; Jalilzadeh, Shapour; Kessler, Thorsten; König, Inke R; Lannfelt, Lars; Lieb, Wolfgang; Lind, Lars; Lindgren, Cecilia M; Lokki, Marja-Liisa; Magnusson, Patrik K; Mallick, Nadeem H; Mehra, Narinder; Meitinger, Thomas; Memon, Fazal-ur-Rehman; Morris, Andrew P; Nieminen, Markku S; Pedersen, Nancy L; Peters, Annette; Rallidis, Loukianos S; Rasheed, Asif; Samuel, Maria; Shah, Svati H; Sinisalo, Juha; Stirrups, Kathleen E; Trompet, Stella; Wang, Laiyuan; Zaman, Khan S; Ardissino, Diego; Boerwinkle, Eric; Borecki, Ingrid B; Bottinger, Erwin P; Buring, Julie E; Chambers, John C; Collins, Rory; Cupples, L Adrienne; Danesh, John; Demuth, Ilja; Elosua, Roberto; Epstein, Stephen E; Esko, Tõnu; Feitosa, Mary F; Franco, Oscar H; Franzosi, Maria Grazia; Granger, Christopher B; Gu, Dongfeng; Gudnason, Vilmundur; Hall, Alistair S; Hamsten, Anders; Harris, Tamara B; Hazen, Stanley L; Hengstenberg, Christian; Hofman, Albert; Ingelsson, Erik; Iribarren, Carlos; Jukema, J Wouter; Karhunen, Pekka J; Kim, Bong-Jo; Kooner, Jaspal S; Kullo, Iftikhar J; Lehtimäki, Terho; Loos, Ruth J F; Melander, Olle; Metspalu, Andres; März, Winfried; Palmer, Colin N; Perola, Markus; Quertermous, Thomas; Rader, Daniel J; Ridker, Paul M; Ripatti, Samuli; Roberts, Robert; Salomaa, Veikko; Sanghera, Dharambir K; Schwartz, Stephen M; Seedorf, Udo; Stewart, Alexandre F; Stott, David J; Thiery, Joachim; Zalloua, Pierre A; O’Donnell, Christopher J; Reilly, Muredach P; Assimes, Themistocles L; Thompson, John R; Erdmann, Jeanette; Clarke, Robert; Watkins, Hugh; Kathiresan, Sekar; McPherson, Ruth; Deloukas, Panos; Schunkert, Heribert; Samani, Nilesh J; Farrall, Martin

    2015-01-01

    Existing knowledge of genetic variants affecting risk of coronary artery disease (CAD) is largely based on genome-wide association studies (GWAS) analysis of common SNPs. Leveraging phased haplotypes from the 1000 Genomes Project, we report a GWAS meta-analysis of 185 thousand CAD cases and controls, interrogating 6.7 million common (MAF>0.05) as well as 2.7 million low frequency (0.005analysis provides a comprehensive survey of the fine genetic architecture of CAD showing that genetic susceptibility to this common disease is largely determined by common SNPs of small effect size. PMID:26343387

  14. Base-By-Base: Single nucleotide-level analysis of whole viral genome alignments

    Directory of Open Access Journals (Sweden)

    Tcherepanov Vasily

    2004-07-01

    Full Text Available Abstract Background With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes is not feasible without new bioinformatics tools. Results A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1 rapidly identify and correct alignment errors in large, multiple genome alignments; and 2 generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs to retrieve detailed annotation information about the aligned genomes or use information from text files. Conclusion Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

  15. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform.

    Science.gov (United States)

    Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh

    2016-01-01

    Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my.

  16. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform

    Directory of Open Access Journals (Sweden)

    Wenning Zheng

    2016-03-01

    Factor Database (VFDB specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my.

  17. A genomic background based method for association analysis in related individuals.

    Directory of Open Access Journals (Sweden)

    Najaf Amin

    Full Text Available BACKGROUND: Feasibility of genotyping of hundreds and thousands of single nucleotide polymorphisms (SNPs in thousands of study subjects have triggered the need for fast, powerful, and reliable methods for genome-wide association analysis. Here we consider a situation when study participants are genetically related (e.g. due to systematic sampling of families or because a study was performed in a genetically isolated population. Of the available methods that account for relatedness, the Measured Genotype (MG approach is considered the 'gold standard'. However, MG is not efficient with respect to time taken for the analysis of genome-wide data. In this context we proposed a fast two-step method called Genome-wide Association using Mixed Model and Regression (GRAMMAR for the analysis of pedigree-based quantitative traits. This method certainly overcomes the drawback of time limitation of the measured genotype (MG approach, but pays in power. One of the major drawbacks of both MG and GRAMMAR, is that they crucially depend on the availability of complete and correct pedigree data, which is rarely available. METHODOLOGY: In this study we first explore type 1 error and relative power of MG, GRAMMAR, and Genomic Control (GC approaches for genetic association analysis. Secondly, we propose an extension to GRAMMAR i.e. GRAMMAR-GC. Finally, we propose application of GRAMMAR-GC using the kinship matrix estimated through genomic marker data, instead of (possibly missing and/or incorrect genealogy. CONCLUSION: Through simulations we show that MG approach maintains high power across a range of heritabilities and possible pedigree structures, and always outperforms other contemporary methods. We also show that the power of our proposed GRAMMAR-GC approaches to that of the 'gold standard' MG for all models and pedigrees studied. We show that this method is both feasible and powerful and has correct type 1 error in the context of genome-wide association analysis

  18. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis

    Directory of Open Access Journals (Sweden)

    Stajich Jason E

    2006-11-01

    relatives. We could not confidently resolve whether Candida glabrata or Saccharomyces castellii lies at the base of the WGD clade. Conclusion We have constructed robust phylogenies for fungi based on whole genome analysis. Overall, our phylogenies provide strong support for the classification of phyla, sub-phyla, classes and orders. We have resolved the relationship of the classes Leotiomyctes and Sordariomycetes, and have identified two classes within the CTG clade of the Saccharomycotina that may correlate with sexual status.

  19. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species.

    Science.gov (United States)

    de Bruijn, Irene; de Kock, Maarten J D; Yang, Meng; de Waard, Pieter; van Beek, Teris A; Raaijmakers, Jos M

    2007-01-01

    Analysis of microbial genome sequences have revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these predictions, however, are untested and the association between genome sequence and biological function of the predicted metabolite is lacking. Here we report the genome-based identification of previously unknown CLP gene clusters in plant pathogenic Pseudomonas syringae strains B728a and DC3000 and in plant beneficial Pseudomonas fluorescens Pf0-1 and SBW25. For P. fluorescens SBW25, a model strain in studying bacterial evolution and adaptation, the structure of the CLP with a predicted 9-amino acid peptide moiety was confirmed by chemical analyses. Mutagenesis confirmed that the three identified NRPS genes are essential for CLP synthesis in strain SBW25. CLP production was shown to play a key role in motility, biofilm formation and in activity of SBW25 against zoospores of Phytophthora infestans. This is the first time that an antimicrobial metabolite is identified from strain SBW25. The results indicate that genome mining may enable the discovery of unknown gene clusters and traits that are highly relevant in the lifestyle of plant beneficial and plant pathogenic bacteria.

  20. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    Science.gov (United States)

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistics for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistics, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  1. Protein Interaction-Based Genome-Wide Analysis of Incident Coronary Heart Disease

    DEFF Research Database (Denmark)

    Jensen, Majken Karoline; Pers, Tune Hannes; Dworzynski, Piotr

    2011-01-01

    Background-Network-based approaches may leverage genome-wide association (GWA) analysis by testing for the aggregate association across several pathway members. We aimed to examine if networks of genes that represent experimentally determined protein-protein interactions (PPIs) are enriched...... are involved in abnormal cardiovascular system physiological features based on knockout mice (4-fold enrichment; Fisher exact test, P = 0.006). Ingenuity pathway analysis revealed that canonical pathways, especially related to blood pressure regulation, were significantly enriched in the genes from the top...... complex. Conclusions-The integration of a GWA study with PPI data successfully identifies a set of candidate susceptibility genes for incident CHD that would have been missed in single-marker GWA analysis. (Circ Cardiovasc Genet. 2011; 4:549-556.)...

  2. Molecular analysis of single oocyst of Eimeria by whole genome amplification (WGA) based nested PCR.

    Science.gov (United States)

    Wang, Yunzhou; Tao, Geru; Cui, Yujuan; Lv, Qiyao; Xie, Li; Li, Yuan; Suo, Xun; Qin, Yinghe; Xiao, Lihua; Liu, Xianyong

    2014-09-01

    PCR-based molecular tools are widely used for the identification and characterization of protozoa. Here we report the molecular analysis of Eimeria species using combined methods of whole genome amplification (WGA) and nested PCR. Single oocyst of Eimeria stiedai or Eimeriamedia was directly used for random amplification of the genomic DNA with either primer extension preamplification (PEP) or multiple displacement amplification (MDA), and then the WGA product was used as template in nested PCR with species-specific primers for ITS-1, 18S rDNA and 23S rDNA of E. stiedai and E. media. WGA-based PCR was successful for the amplification of these genes from single oocyst. For the species identification of single oocyst isolated from mixed E. stiedai or E. media, the results from WGA-based PCR were exactly in accordance with those from morphological identification, suggesting the availability of this method in molecular analysis of eimerian parasites at the single oocyst level. WGA-based PCR method can also be applied for the identification and genetic characterization of other protists.

  3. [CRISPR/Cas9-based genome editing systems and the analysis of targeted genome mutations in plants].

    Science.gov (United States)

    Xingliang, Ma; Yaoguang, Liu

    2016-02-01

    Targeted genomic editing technologies use programmable DNA nucleases to cleave genomic target sites, thus inducing targeted mutations in the genomes. The newly prevailed clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system that consists of the Cas9 nuclease and single guide RNA (sgRNA) has the advantages of simplicity and high efficiency as compared to other programmable DNA nuclease systems such as zinc finger nucleases (ZFNs) and transcription activator like effector nucleases (TALENs). Currently, a number of cases have been reported on the application of the CRISPR/Cas9 genomic editing technology in plants. In this review, we summarize the strategies for preparing the Cas9 and sgRNA expression constructs, the transformation method for obtaining targeted mutations, the efficiency and features of the resulting mutations and the methods for detecting or genotyping of the mutation sites. We also discuss the existing problems and perspectives of CRISPR/Cas9-based genomic editing in plants.

  4. MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.

    Science.gov (United States)

    Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin

    2015-04-01

    Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with an incorrectly specified genotype-phenotype models. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  5. DNAskew: Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences

    Institute of Scientific and Technical Information of China (English)

    Xiang-RuMA; Shao-BoXIAO; Ai-ZhenGUO; Jian-QiangLUE; Huan-ChunCHEN

    2004-01-01

    Sueoka and Lobry declared respectively that, in the absence of bias between the two DNA strands for mutation and selection, the base composition within each strand should be A=T and C=G (this state is called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses showed asymmetries in base composition and gene direction. To determine the relationship of base composition skews with replication orientation, gene function, codon usage biases and phylogenetic evolution,in this paper a program called DNAskew was developed for the statistical analysis of strand asymmetry and codon composition bias in the DNA sequence. In addition, the program can also be used to predict the replication boundaries of genome sequences. The method builds on the fact that there are compositional asymmetries between the leading and the lagging strand for replication. DNAskew was written in Perl script language and implemented on the LINUX operating system. It works quickly with annotated or unannotated sequences in GBFF (GenBank flatfile) or fasta format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.

  6. Scanning for signatures of geographically restricted selection based on population genomics analysis

    Institute of Scientific and Technical Information of China (English)

    2007-01-01

    Natural selection, as the driving force of human evolution, has direct impact on population differentiation. However, it is still unclear to what extent the genetic differentiation has been caused by natural selection. To explore this question, we performed a genome-wide scan with single nucleotide polymorphism (SNP) data from the International HapMap Project. Single locus FST analysis was applied to assess the frequency difference among populations in autosomes. Based on the empirical distribution of FST, we identified 12669 SNPs correlating to population differentiation and 1853 candidate genes subjected to geographic restricted natural selection. Further interpretation of gene ontogeny revealed 121 categories of biological process with the enrichments of candidate genes. Our results suggest that natural selection may play an important role in human population differentiation. In addition, our analysis provides new clues as well as research methods for our understanding of population differentiation and natural selection.

  7. Genome-wide interaction-based association analysis identified multiple new susceptibility Loci for common diseases.

    Directory of Open Access Journals (Sweden)

    Yang Liu

    2011-03-01

    Full Text Available Genome-wide interaction-based association (GWIBA analysis has the potential to identify novel susceptibility loci. These interaction effects could be missed with the prevailing approaches in genome-wide association studies (GWAS. However, no convincing loci have been discovered exclusively from GWIBA methods, and the intensive computation involved is a major barrier for application. Here, we developed a fast, multi-thread/parallel program named "pair-wise interaction-based association mapping" (PIAM for exhaustive two-locus searches. With this program, we performed a complete GWIBA analysis on seven diseases with stringent control for false positives, and we validated the results for three of these diseases. We identified one pair-wise interaction between a previously identified locus, C1orf106, and one new locus, TEC, that was specific for Crohn's disease, with a Bonferroni corrected P < 0.05 (P = 0.039. This interaction was replicated with a pair of proxy linked loci (P = 0.013 on an independent dataset. Five other interactions had corrected P < 0.5. We identified the allelic effect of a locus close to SLC7A13 for coronary artery disease. This was replicated with a linked locus on an independent dataset (P = 1.09 × 10⁻⁷. Through a local validation analysis that evaluated association signals, rather than locus-based associations, we found that several other regions showed association/interaction signals with nominal P < 0.05. In conclusion, this study demonstrated that the GWIBA approach was successful for identifying novel loci, and the results provide new insights into the genetic architecture of common diseases. In addition, our PIAM program was capable of handling very large GWAS datasets that are likely to be produced in the future.

  8. Messenger RNA- versus retrovirus-based induced pluripotent stem cell reprogramming strategies: analysis of genomic integrity.

    Science.gov (United States)

    Steichen, Clara; Luce, Eléanor; Maluenda, Jérôme; Tosca, Lucie; Moreno-Gimeno, Inmaculada; Desterke, Christophe; Dianat, Noushin; Goulinet-Mainot, Sylvie; Awan-Toor, Sarah; Burks, Deborah; Marie, Joëlle; Weber, Anne; Tachdjian, Gérard; Melki, Judith; Dubart-Kupperschmitt, Anne

    2014-06-01

    The use of synthetic messenger RNAs to generate human induced pluripotent stem cells (iPSCs) is particularly appealing for potential regenerative medicine applications, because it overcomes the common drawbacks of DNA-based or virus-based reprogramming strategies, including transgene integration in particular. We compared the genomic integrity of mRNA-derived iPSCs with that of retrovirus-derived iPSCs generated in strictly comparable conditions, by single-nucleotide polymorphism (SNP) and copy number variation (CNV) analyses. We showed that mRNA-derived iPSCs do not differ significantly from the parental fibroblasts in SNP analysis, whereas retrovirus-derived iPSCs do. We found that the number of CNVs seemed independent of the reprogramming method, instead appearing to be clone-dependent. Furthermore, differentiation studies indicated that mRNA-derived iPSCs differentiated efficiently into hepatoblasts and that these cells did not load additional CNVs during differentiation. The integration-free hepatoblasts that were generated constitute a new tool for the study of diseased hepatocytes derived from patients' iPSCs and their use in the context of stem cell-derived hepatocyte transplantation. Our findings also highlight the need to conduct careful studies on genome integrity for the selection of iPSC lines before using them for further applications.

  9. Surface ligation-based resonance light scattering analysis of methylated genomic DNA on a microarray platform.

    Science.gov (United States)

    Ma, Lan; Lei, Zhen; Liu, Xia; Liu, Dianjun; Wang, Zhenxin

    2016-05-10

    DNA methylation is a crucial epigenetic modification and is closely related to tumorigenesis. Herein, a surface ligation-based high throughput method combined with bisulfite treatment is developed for analysis of methylated genomic DNA. In this method, a DNA microarray is employed as a reaction platform, and resonance light scattering (RLS) of nanoparticles is used as the detection principle. The specificity stems from allele-specific ligation of Taq DNA ligase, which is further enhanced by improving the fidelity of Taq DNA ligase in a heterogeneous reaction. Two amplification techniques, rolling circle amplification (RCA) and silver enhancement, are employed after the ligation reaction and a gold nanoparticle (GNP) labeling procedure is used to amplify the signal. As little as 0.01% methylated DNA (i.e. 2 pmol L(-1)) can be distinguished from the cocktail of methylated and unmethylated DNA by the proposed method. More importantly, this method shows good accuracy and sensitivity in profiling the methylation level of genomic DNA of three selected colonic cancer cell lines. This strategy provides a high throughput alternative with reasonable sensitivity and resolution for cancer study and diagnosis.

  10. Genome-based proteomic analysis of Lignosus rhinocerotis (Cooke) Ryvarden sclerotium.

    Science.gov (United States)

    Yap, Hui-Yeng Yeannie; Fung, Shin-Yee; Ng, Szu-Ting; Tan, Chon-Seng; Tan, Nget-Hong

    2015-01-01

    Lignosus rhinocerotis (Cooke) Ryvarden (Polyporales, Basidiomycota), also known as the tiger milk mushroom, has received much interest in recent years owing to its wide-range ethnobotanical uses and the recent success in its domestication. The sclerotium is the part with medicinal value. Using two-dimensional gel electrophoresis coupled with mass spectrometry analysis, a total of 16 non-redundant, major proteins were identified with high confidence level in L. rhinocerotis sclerotium based on its genome as custom mapping database. Some of these proteins, such as the putative lectins, immunomodulatory proteins, superoxide dismutase, and aegerolysin may have pharmaceutical potential; while others are involved in nutrient mobilization and the protective antioxidant mechanism in the sclerotium. The findings from this study provide a molecular basis for future research on potential pharmacologically active proteins of L. rhinocerotis.

  11. Spectrogram Analysis of Genomes

    Directory of Open Access Journals (Sweden)

    David Sussillo

    2004-01-01

    Full Text Available We performed frequency-domain analysis in the genomes of various organisms using tricolor spectrograms, identifying several types of distinct visual patterns characterizing specific DNA regions. We relate patterns and their frequency characteristics to the sequence characteristics of the DNA. At times, the spectrogram patterns could be related to the structure of the corresponding protein region by using various public databases such as GenBank. Some patterns are explained from the biological nature of the corresponding regions, which relate to chromosome structure and protein coding, and some patterns have yet unknown biological significance. We found biologically meaningful patterns, on the scale of millions of base pairs, to a few hundred base pairs. Chromosome-wide patterns include periodicities ranging from 2 to 300. The color of the spectrogram depends on the nucleotide content at specific frequencies, and therefore can be used as a local indicator of CG content and other measures of relative base content. Several smaller-scale patterns were found to represent different types of domains made up of various tandem repeats.

  12. Gene expression profiles in squamous cell cervical carcinoma using array-based comparative genomic hybridization analysis.

    Science.gov (United States)

    Choi, Y-W; Bae, S M; Kim, Y-W; Lee, H N; Kim, Y W; Park, T C; Ro, D Y; Shin, J C; Shin, S J; Seo, J-S; Ahn, W S

    2007-01-01

    Our aim was to identify novel genomic regions of interest and provide highly dynamic range information on correlation between squamous cell cervical carcinoma and its related gene expression patterns by a genome-wide array-based comparative genomic hybridization (array-CGH). We analyzed 15 cases of cervical cancer from KangNam St Mary's Hospital of the Catholic University of Korea. Microdissection assay was performed to obtain DNA samples from paraffin-embedded cervical tissues of cancer as well as of the adjacent normal tissues. The bacterial artificial chromosome (BAC) array used in this study consisted of 1440 human BACs and the space among the clones was 2.08 Mb. All the 15 cases of cervical cancer showed the differential changes of the cervical cancer-associated genetic alterations. The analysis limit of average gains and losses was 53%. A significant positive correlation was found in 8q24.3, 1p36.32, 3q27.1, 7p21.1, 11q13.1, and 3p14.2 changes through the cervical carcinogenesis. The regions of high level of gain were 1p36.33-1p36.32, 8q24.3, 16p13.3, 1p36.33, 3q27.1, and 7p21.1. And the regions of homozygous loss were 2q12.1, 22q11.21, 3p14.2, 6q24.3, 7p15.2, and 11q25. In the high level of gain regions, GSDMDC1, RECQL4, TP73, ABCF3, ALG3, HDAC9, ESRRA, and RPS6KA4 were significantly correlated with cervical cancer. The genes encoded by frequently lost clones were PTPRG, GRM7, ZDHHC3, EXOSC7, LRP1B, and NR3C2. Therefore, array-CGH analyses showed that specific genomic alterations were maintained in cervical cancer that were critical to the malignant phenotype and may give a chance to find out possible target genes present in the gained or lost clones.

  13. Rapid pair-wise synteny analysis of large bacterial genomes using web-based GeneOrder4.0

    Directory of Open Access Journals (Sweden)

    Mahadevan Padmanabhan

    2010-02-01

    Full Text Available Abstract Background The growing whole genome sequence databases necessitate the development of user-friendly software tools to mine these data. Web-based tools are particularly useful to wet-bench biologists as they enable platform-independent analysis of sequence data, without having to perform complex programming tasks and software compiling. Findings GeneOrder4.0 is a web-based "on-the-fly" synteny and gene order analysis tool for comparative bacterial genomics (ca. 8 Mb. It enables the visualization of synteny by plotting protein similarity scores between two genomes and it also provides visual annotation of "hypothetical" proteins from older archived genomes based on more recent annotations. Conclusions The web-based software tool GeneOrder4.0 is a user-friendly application that has been updated to allow the rapid analysis of synteny and gene order in large bacterial genomes. It is developed with the wet-bench researcher in mind.

  14. Classifying Genomic Sequences by Sequence Feature Analysis

    Institute of Scientific and Technical Information of China (English)

    Zhi-Hua Liu; Dian Jiao; Xiao Sun

    2005-01-01

    Traditional sequence analysis depends on sequence alignment. In this study, we analyzed various functional regions of the human genome based on sequence features, including word frequency, dinucleotide relative abundance, and base-base correlation. We analyzed the human chromosome 22 and classified the upstream,exon, intron, downstream, and intergenic regions by principal component analysis and discriminant analysis of these features. The results show that we could classify the functional regions of genome based on sequence feature and discriminant analysis.

  15. Identification of genetic bases of vibrio fluvialis species-specific biochemical pathways and potential virulence factors by comparative genomic analysis.

    Science.gov (United States)

    Lu, Xin; Liang, Weili; Wang, Yunduan; Xu, Jialiang; Zhu, Jun; Kan, Biao

    2014-03-01

    Vibrio fluvialis is an important food-borne pathogen that causes diarrheal illness and sometimes extraintestinal infections in humans. In this study, we sequenced the genome of a clinical V. fluvialis strain and determined its phylogenetic relationships with other Vibrio species by comparative genomic analysis. We found that the closest relationship was between V. fluvialis and V. furnissii, followed by those with V. cholerae and V. mimicus. Moreover, based on genome comparisons and gene complementation experiments, we revealed genetic mechanisms of the biochemical tests that differentiate V. fluvialis from closely related species. Importantly, we identified a variety of genes encoding potential virulence factors, including multiple hemolysins, transcriptional regulators, and environmental survival and adaptation apparatuses, and the type VI secretion system, which is indicative of complex regulatory pathways modulating pathogenesis in this organism. The availability of V. fluvialis genome sequences may promote our understanding of pathogenic mechanisms for this emerging pathogen.

  16. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species

    NARCIS (Netherlands)

    Bruijn, de I.; Kock, de M.J.D.; Meng, Y.; Waard, de P.; Beek, van T.A.; Raaijmakers, J.M.

    2007-01-01

    Analysis of microbial genome sequences have revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these

  17. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species

    NARCIS (Netherlands)

    Bruijn, de I.; Kock, de M.J.D.; Meng, Y.; Waard, de P.; Beek, van T.A.; Raaijmakers, J.M.

    2007-01-01

    Analysis of microbial genome sequences have revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these pre

  18. Meta-analysis of genome-wide association for migraine in six population-based European cohorts

    NARCIS (Netherlands)

    Ligthart, Lannie; de Vries, Boukje; Smith, Albert V.; Ikram, M. Arfan; Amin, Najaf; Hottenga, Jouke-Jan; Koelewijn, Stephany C.; Kattenberg, V. Mathijs; de Moor, Marleen H. M.; Janssens, A. Cecile J. W.; Aulchenko, Yurii S.; Oostra, Ben A.; de Geus, Eco J. C.; Smit, Johannes H.; Zitman, Frans G.; Uitterlinden, Andre G.; Hofman, Albert; Willemsen, Gonneke; Nyholt, Dale R.; Montgomery, Grant W.; Terwindt, Gisela M.; Gudnason, Vilmundur; Penninx, Brenda W. J. H.; Breteler, Monique; Ferrari, Michel D.; Launer, Lenore J.; van Duijn, Cornelia M.; van den Maagdenberg, Arn M. J. M.; Boomsma, Dorret I.

    2011-01-01

    Migraine is a common neurological disorder with a genetically complex background. This paper describes a meta-analysis of genome-wide association (GWA) studies on migraine, performed by the Dutch-Icelandic migraine genetics (DICE) consortium, which brings together six population-based European migra

  19. Meta-analysis of genome-wide association for migraine in six population-based European cohorts

    NARCIS (Netherlands)

    Ligthart, Lannie; de Vries, Boukje; Smith, Albert V.; Ikram, M. Arfan; Amin, Najaf; Hottenga, Jouke-Jan; Koelewijn, Stephany C.; Kattenberg, V. Mathijs; de Moor, Marleen H. M.; Janssens, A. Cecile J. W.; Aulchenko, Yurii S.; Oostra, Ben A.; de Geus, Eco J. C.; Smit, Johannes H.; Zitman, Frans G.; Uitterlinden, Andre G.; Hofman, Albert; Willemsen, Gonneke; Nyholt, Dale R.; Montgomery, Grant W.; Terwindt, Gisela M.; Gudnason, Vilmundur; Penninx, Brenda W. J. H.; Breteler, Monique; Ferrari, Michel D.; Launer, Lenore J.; van Duijn, Cornelia M.; van den Maagdenberg, Arn M. J. M.; Boomsma, Dorret I.

    2011-01-01

    Migraine is a common neurological disorder with a genetically complex background. This paper describes a meta-analysis of genome-wide association (GWA) studies on migraine, performed by the Dutch-Icelandic migraine genetics (DICE) consortium, which brings together six population-based European migra

  20. Stochastic segmentation models for array-based comparative genomic hybridization data analysis.

    Science.gov (United States)

    Lai, Tze Leung; Xing, Haipeng; Zhang, Nancy

    2008-04-01

    Array-based comparative genomic hybridization (array-CGH) is a high throughput, high resolution technique for studying the genetics of cancer. Analysis of array-CGH data typically involves estimation of the underlying chromosome copy numbers from the log fluorescence ratios and segmenting the chromosome into regions with the same copy number at each location. We propose for the analysis of array-CGH data, a new stochastic segmentation model and an associated estimation procedure that has attractive statistical and computational properties. An important benefit of this Bayesian segmentation model is that it yields explicit formulas for posterior means, which can be used to estimate the signal directly without performing segmentation. Other quantities relating to the posterior distribution that are useful for providing confidence assessments of any given segmentation can also be estimated by using our method. We propose an approximation method whose computation time is linear in sequence length which makes our method practically applicable to the new higher density arrays. Simulation studies and applications to real array-CGH data illustrate the advantages of the proposed approach.

  1. Comprehensive Phylogenetic Analysis of Bovine Non-aureus Staphylococci Species Based on Whole-Genome Sequencing

    Science.gov (United States)

    Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen

    2016-01-01

    Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335

  2. YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia

    OpenAIRE

    Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S.; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh VR; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh

    2015-01-01

    Background Yersinia is a Gram-negative bacteria that includes serious pathogens such as the Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequ...

  3. BrucellaBase: Genome information resource.

    Science.gov (United States)

    Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2016-09-01

    Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform, providing features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with the user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable the researchers interested in Brucella to get meaningful information from Brucella genome sequences. BrucellaBase will regularly be updated with new genome sequences, new features along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html.

  4. Association between chromosomal aberration of COX8C and tethered spinal cord syndrome: array-based comparative genomic hybridization analysis

    Directory of Open Access Journals (Sweden)

    Qiu-jiong Zhao

    2016-01-01

    Full Text Available Copy number variations have been found in patients with neural tube abnormalities. In this study, we performed genome-wide screening using high-resolution array-based comparative genomic hybridization in three children with tethered spinal cord syndrome and two healthy parents. Of eight copy number variations, four were non-polymorphic. These non-polymorphic copy number variations were associated with Angelman and Prader-Willi syndromes, and microcephaly. Gene function enrichment analysis revealed that COX8C, a gene associated with metabolic disorders of the nervous system, was located in the copy number variation region of Patient 1. Our results indicate that array-based comparative genomic hybridization can be used to diagnose tethered spinal cord syndrome. Our results may help determine the pathogenesis of tethered spinal cord syndrome and prevent occurrence of this disease.

  5. A genome-wide tree- and forest-based association analysis of comorbidity of alcoholism and smoking

    OpenAIRE

    Ye, Yuanqing; Zhong, Xiaoyun; Zhang, Heping

    2005-01-01

    Genetic mechanisms underlying alcoholism are complex. Understanding the etiology of alcohol dependence and its comorbid conditions such as smoking is important because of the significant health concerns. In this report, we describe a method based on classification trees and deterministic forests for association studies to perform a genome-wide joint association analysis of alcoholism and smoking. This approach is used to analyze the single-nucleotide polymorphism data from the Collaborative S...

  6. Pedigree-based analysis of derivation of genome segments of an elite rice reveals key regions during its breeding.

    Science.gov (United States)

    Zhou, Degui; Chen, Wei; Lin, Zechuan; Chen, Haodong; Wang, Chongrong; Li, Hong; Yu, Renbo; Zhang, Fengyun; Zhen, Gang; Yi, Junliang; Li, Kanghuo; Liu, Yaoguang; Terzaghi, William; Tang, Xiaoyan; He, Hang; Zhou, Shaochuan; Deng, Xing Wang

    2016-02-01

    Analyses of genome variations with high-throughput assays have improved our understanding of genetic basis of crop domestication and identified the selected genome regions, but little is known about that of modern breeding, which has limited the usefulness of massive elite cultivars in further breeding. Here we deploy pedigree-based analysis of an elite rice, Huanghuazhan, to exploit key genome regions during its breeding. The cultivars in the pedigree were resequenced with 7.6× depth on average, and 2.1 million high-quality single nucleotide polymorphisms (SNPs) were obtained. Tracing the derivation of genome blocks with pedigree and information on SNPs revealed the chromosomal recombination during breeding, which showed that 26.22% of Huanghuazhan genome are strictly conserved key regions. These major effect regions were further supported by a QTL mapping of 260 recombinant inbred lines derived from the cross of Huanghuazhan and a very dissimilar cultivar, Shuanggui 36, and by the genome profile of eight cultivars and 36 elite lines derived from Huanghuazhan. Hitting these regions with the cloned genes revealed they include numbers of key genes, which were then applied to demonstrate how Huanghuazhan were bred after 30 years of effort and to dissect the deficiency of artificial selection. We concluded the regions are helpful to the further breeding based on this pedigree and performing breeding by design. Our study provides genetic dissection of modern rice breeding and sheds new light on how to perform genomewide breeding by design. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  7. SNP-based pathway enrichment analysis for genome-wide association studies

    Directory of Open Access Journals (Sweden)

    Potkin Steven G

    2011-04-01

    Full Text Available Abstract Background Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs, have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can only explain a small fraction of the genetic risks associated with the complex diseases. New strategies and statistical approaches are needed to address this lack of explanation. One such approach is the pathway analysis, which considers the genetic variations underlying a biological pathway, rather than separately as in the traditional GWAS studies. A critical challenge in the pathway analysis is how to combine evidences of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as a representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs. Results We describe a SNP-based pathway enrichment method for GWAS studies. The method consists of the following two main steps: 1 for a given pathway, using an adaptive truncated product statistic to identify all representative (potentially more than one SNPs of each gene, calculating the average number of representative SNPs for the genes, then re-selecting the representative SNPs of genes in the pathway based on this number; and 2 ranking all selected SNPs by the significance of their statistical association with a trait of interest, and testing if the set of SNPs from a particular pathway is significantly enriched with high ranks using a weighted Kolmogorov-Smirnov test. We applied our method to two large genetically distinct GWAS data sets of schizophrenia, one

  8. An overview of the BioExtract Server: a distributed, Web-based system for genomic analysis.

    Science.gov (United States)

    Lushbough, C M; Brendel, V P

    2010-01-01

    Genome research is becoming increasingly dependent on access to multiple, distributed data sources, and bioinformatic tools. The importance of integration across distributed databases and Web services will continue to grow as the number of requisite resources expands. Use of bioinformatic workflows has seen considerable growth in recent years as scientific research becomes increasingly dependent on the analysis of large sets of data and the use of distributed resources. The BioExtract Server (http://bioextract.org) is a Web-based system designed to aid researchers in the analysis of distributed genomic data by providing a platform to facilitate the creation of bioinformatic workflows. Scientific workflows are created within the system by recording the analytic tasks preformed by researchers. These steps may include querying multiple data sources, saving query results as searchable data extracts, and executing local and Web-accessible analytic tools. The series of recorded tasks can be saved as a computational workflow simply by providing a name and description.

  9. Reference Based Genome Compression

    CERN Document Server

    Chern, Bobbie; Manolakos, Alexandros; No, Albert; Venkat, Kartik; Weissman, Tsachy

    2012-01-01

    DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target genome, and then compresses this mapping with an entropy coder. As an illustration of the performance: applying our algorithm to James Watson's genome with hg18 as a reference, we are able to reduce the 2991 megabyte (MB) genome down to 6.99 MB, while Gzip compresses it to 834.8 MB.

  10. Reference Based Genome Compression

    OpenAIRE

    Chern, Bobbie; Ochoa, Idoia; Manolakos, Alexandros; No, Albert; Venkat, Kartik; Weissman, Tsachy

    2012-01-01

    DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target gen...

  11. Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

    Directory of Open Access Journals (Sweden)

    Bonten Marc JM

    2010-04-01

    Full Text Available Abstract Background The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromized patients. Results We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates several antibiotic resistance genes were identified, including the vanA transposon that confers resistance to vancomycin in two strains. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA in its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated in the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI, which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions Genes involved in environmental persistence, colonization and virulence can easily be aquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come.

  12. Genome-Wide Analysis of Microsatellite Markers Based on Sequenced Database in Chinese Spring Wheat (Triticum aestivum L..

    Directory of Open Access Journals (Sweden)

    Bin Han

    Full Text Available Microsatellites or simple sequence repeats (SSRs are distributed across both prokaryotic and eukaryotic genomes and have been widely used for genetic studies and molecular marker-assisted breeding in crops. Though an ordered draft sequence of hexaploid bread wheat have been announced, the researches about systemic analysis of SSRs for wheat still have not been reported so far. In the present study, we identified 364,347 SSRs from among 10,603,760 sequences of the Chinese spring wheat (CSW genome, which were present at a density of 36.68 SSR/Mb. In total, we detected 488 types of motifs ranging from di- to hexanucleotides, among which dinucleotide repeats dominated, accounting for approximately 42.52% of the genome. The density of tri- to hexanucleotide repeats was 24.97%, 4.62%, 3.25% and 24.65%, respectively. AG/CT, AAG/CTT, AGAT/ATCT, AAAAG/CTTTT and AAAATT/AATTTT were the most frequent repeats among di- to hexanucleotide repeats. Among the 21 chromosomes of CSW, the density of repeats was highest on chromosome 2D and lowest on chromosome 3A. The proportions of di-, tri-, tetra-, penta- and hexanucleotide repeats on each chromosome, and even on the whole genome, were almost identical. In addition, 295,267 SSR markers were successfully developed from the 21 chromosomes of CSW, which cover the entire genome at a density of 29.73 per Mb. All of the SSR markers were validated by reverse electronic-Polymerase Chain Reaction (re-PCR; 70,564 (23.9% were found to be monomorphic and 224,703 (76.1% were found to be polymorphic. A total of 45 monomorphic markers were selected randomly for validation purposes; 24 (53.3% amplified one locus, 8 (17.8% amplified multiple identical loci, and 13 (28.9% did not amplify any fragments from the genomic DNA of CSW. Then a dendrogram was generated based on the 24 monomorphic SSR markers among 20 wheat cultivars and three species of its diploid ancestors showing that monomorphic SSR markers represented a promising

  13. A new estimate of afrotherian phylogeny based on simultaneous analysis of genomic, morphological, and fossil evidence

    Directory of Open Access Journals (Sweden)

    Seiffert Erik R

    2007-11-01

    Full Text Available Abstract Background The placental mammalian clade Afrotheria is now supported by diverse forms of genomic data, but interordinal relationships within, and morphological support for, the group remains elusive. As a means for addressing these outstanding problems, competing hypotheses of afrotherian interordinal relationships were tested through simultaneous parsimony analysis of a large data set (> 4,590 parsimony informative characters containing genomic data (> 17 kb of nucleotide data, chromosomal associations, and retroposons and 400 morphological characters scored across 16 extant and 35 extinct afrotherians. Results Parsimony analysis of extant taxa alone recovered the interordinal topology (Afrosoricida, ((Macroscelidea, Tubulidentata, (Hyracoidea, (Proboscidea, Sirenia. Analysis following addition of extinct taxa instead supported Afroinsectivora (Afrosoricida + Macroscelidea and Pseudoungulata (Tubulidentata + Paenungulata, as well as Tethytheria (Proboscidea + Sirenia. This latter topology is, however, sensitive to taxon deletion and different placements of the placental root, and numerous alternative interordinal arrangements within Afrotheria could not be statistically rejected. Relationships among extinct stem members of each afrotherian clade were more stable, but one alleged stem macroscelidean (Herodotius never grouped with that clade and instead consistently joined pseudoungulates or paenungulates. When character transformations were optimized onto a less resolved afrotherian tree that reflects uncertainty about the group's interordinal phylogeny, a total of 21 morphological features were identified as possible synapomorphies of crown Afrotheria, 9 of which optimized unambiguously across all character treatments and optimization methods. Conclusion Instability in afrotherian interordinal phylogeny presumably reflects rapid divergences during two pulses of cladogenesis – the first in the Late Cretaceous, at and just after the

  14. P-value based analysis for shared controls design in genome-wide association studies.

    Science.gov (United States)

    Zaykin, Dmitri V; Kozbur, Damian O

    2010-11-01

    An appealing genome-wide association study design compares one large control group against several disease samples. A pioneering study by the Wellcome Trust Case Control Consortium that employed such a design has identified multiple susceptibility regions, many of which have been independently replicated. While reusing a control sample provides effective utilization of data, it also creates correlation between association statistics across diseases. An observation of a large association statistic for one of the diseases may greatly increase chances of observing a spuriously large association for a different disease. Accounting for the correlation is also particularly important when screening for SNPs that might be involved in a set of diseases with overlapping etiology. We describe methods that correct association statistics for dependency due to shared controls, and we describe ways to obtain a measure of overall evidence and to combine association signals across multiple diseases. The methods we describe require no access to individual subject data, instead, they efficiently utilize information contained in P-values for association reported for individual diseases. P-value based combined tests for association are flexible and essentially as powerful as the approach based on aggregating the individual subject data.

  15. Structural and functional analysis of rice genome

    Indian Academy of Sciences (India)

    Akhilesh K. Tyagi; Jitendra P. Khurana; Paramjit Khurana; Saurabh Raghuvanshi; Anupama Gaur; Anita Kapur; Vikrant Gupta; Dibyendu Kumar; V. Ravi; Shubha Vij; Parul Khurana; Sulabha Sharma

    2004-04-01

    Rice is an excellent system for plant genomics as it represents a modest size genome of 430 Mb. It feeds more than half the population of the world. Draft sequences of the rice genome, derived by whole-genome shotgun approach at relatively low coverage (4–6 X), were published and the International Rice Genome Sequencing Project (IRGSP) declared high quality (>10 X), genetically anchored, phase 2 level sequence in 2002. In addition, phase 3 level finished sequence of chromosomes 1, 4 and 10 (out of 12 chromosomes of rice) has already been reported by scientists from IRGSP consortium. Various estimates of genes in rice place the number at > 50,000. Already, over 28,000 full-length cDNAs have been sequenced, most of which map to genetically anchored genome sequence. Such information is very useful in revealing novel features of macro- and micro-level synteny of rice genome with other cereals. Microarray analysis is unraveling the identity of rice genes expressing in temporal and spatial manner and should help target candidate genes useful for improving traits of agronomic importance. Simultaneously, functional analysis of rice genome has been initiated by marker-based characterization of useful genes and employing functional knock-outs created by mutation or gene tagging. Integration of this enormous information is expected to catalyze tremendous activity on basic and applied aspects of rice genomics.

  16. Genome-based analysis of the transcriptome from mature chickpea root nodules

    Directory of Open Access Journals (Sweden)

    Fabian eAfonso-Grunz

    2014-07-01

    Full Text Available Symbiotic nitrogen fixation (SNF in root nodules of grain legumes such as chickpea is a highly complex process that drastically affects the gene expression patterns of both the prokaryotic as well as eukaryotic interacting cells. A successfully established symbiotic relationship requires mutual signaling mechanisms and a continuous adaptation of the metabolism of the involved cells to varying environmental conditions. Although some of these processes are well understood today many of the molecular mechanisms underlying SNF, especially in chickpea, remain unclear. Here, we reannotated our previously published transcriptome data generated by deepSuperSAGE (Serial Analysis of Gene Expression to the recently published draft genome of chickpea to assess the root- and nodule-specific transcriptomes of the eukaryotic host cells. The identified gene expression patterns comprise up to 71 significantly differentially expressed genes and the expression of twenty of these was validated by quantitative real-time PCR with the tissues from five independent biological replicates. Many of the differentially expressed transcripts were found to encode proteins implicated in sugar metabolism, antioxidant defense as well as biotic and abiotic stress responses of the host cells, and some of them were already known to contribute to SNF in other legumes. The differentially expressed genes identified in this study represent candidates that can be used for further characterization of the complex molecular mechanisms underlying SNF in chickpea.

  17. On the analysis of genome-wide association studies in family-based designs: a universal, robust analysis approach and an application to four genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Sungho Won

    2009-11-01

    Full Text Available For genome-wide association studies in family-based designs, we propose a new, universally applicable approach. The new test statistic exploits all available information about the association, while, by virtue of its design, it maintains the same robustness against population admixture as traditional family-based approaches that are based exclusively on the within-family information. The approach is suitable for the analysis of almost any trait type, e.g. binary, continuous, time-to-onset, multivariate, etc., and combinations of those. We use simulation studies to verify all theoretically derived properties of the approach, estimate its power, and compare it with other standard approaches. We illustrate the practical implications of the new analysis method by an application to a lung-function phenotype, forced expiratory volume in one second (FEV1 in 4 genome-wide association studies.

  18. 1-Mb resolution array-based comparative genomic hybridization using a BAC clone set optimized for cancer gene analysis

    NARCIS (Netherlands)

    Greshock, J; Naylor, TL; Margolin, A; Diskin, S; Cleaver, SH; Futreal, PA; deJong, PJ; Zhao, SY; Liebman, M; Weber, BL

    2004-01-01

    Array-based comparative genomic hybridization (aCGH) is a recently developed tool for genome-wide determination of DNA copy number alterations. This technology has tremendous potential for disease-gene discovery in cancer and developmental disorders as well as numerous other applications. However, w

  19. Genome-based metabolic mapping and 13C flux analysis reveal systematic properties of an oleaginous microalga Chlorella protothecoides.

    Science.gov (United States)

    Wu, Chao; Xiong, Wei; Dai, Junbiao; Wu, Qingyu

    2015-02-01

    Integrated and genome-based flux balance analysis, metabolomics, and (13)C-label profiling of phototrophic and heterotrophic metabolism in Chlorella protothecoides, an oleaginous green alga for biofuel. The green alga Chlorella protothecoides, capable of autotrophic and heterotrophic growth with rapid lipid synthesis, is a promising candidate for biofuel production. Based on the newly available genome knowledge of the alga, we reconstructed the compartmentalized metabolic network consisting of 272 metabolic reactions, 270 enzymes, and 461 encoding genes and simulated the growth in different cultivation conditions with flux balance analysis. Phenotype-phase plane analysis shows conditions achieving theoretical maximum of the biomass and corresponding fatty acid-producing rate for phototrophic cells (the ratio of photon uptake rate to CO2 uptake rate equals 8.4) and heterotrophic ones (the glucose uptake rate to O2 consumption rate reaches 2.4), respectively. Isotope-assisted liquid chromatography-mass spectrometry/mass spectrometry reveals higher metabolite concentrations in the glycolytic pathway and the tricarboxylic acid cycle in heterotrophic cells compared with autotrophic cells. We also observed enhanced levels of ATP, nicotinamide adenine dinucleotide (phosphate), reduced, acetyl-Coenzyme A, and malonyl-Coenzyme A in heterotrophic cells consistently, consistent with a strong activity of lipid synthesis. To profile the flux map in experimental conditions, we applied nonstationary (13)C metabolic flux analysis as a complementing strategy to flux balance analysis. The result reveals negligible photorespiratory fluxes and a metabolically low active tricarboxylic acid cycle in phototrophic C. protothecoides. In comparison, high throughput of amphibolic reactions and the tricarboxylic acid cycle with no glyoxylate shunt activities were measured for heterotrophic cells. Taken together, the metabolic network modeling assisted by experimental metabolomics and (13)C

  20. Genome-wide family-based linkage analysis of exome chip variants and cardiometabolic risk.

    Science.gov (United States)

    Hellwege, Jacklyn N; Palmer, Nicholette D; Raffield, Laura M; Ng, Maggie C Y; Hawkins, Gregory A; Long, Jirong; Lorenzo, Carlos; Norris, Jill M; Ida Chen, Y-D; Speliotes, Elizabeth K; Rotter, Jerome I; Langefeld, Carl D; Wagenknecht, Lynne E; Bowden, Donald W

    2014-05-01

    Linkage analysis of complex traits has had limited success in identifying trait-influencing loci. Recently, coding variants have been implicated as the basis for some biomedical associations. We tested whether coding variants are the basis for linkage peaks of complex traits in 42 African-American (n = 596) and 90 Hispanic (n = 1,414) families in the Insulin Resistance Atherosclerosis Family Study (IRASFS) using Illumina HumanExome Beadchips. A total of 92,157 variants in African Americans (34%) and 81,559 (31%) in Hispanics were polymorphic and tested using two-point linkage and association analyses with 37 cardiometabolic phenotypes. In African Americans 77 LOD scores greater than 3 were observed. The highest LOD score was 4.91 with the APOE SNP rs7412 (MAF = 0.13) with plasma apolipoprotein B (ApoB). This SNP was associated with ApoB (P-value = 4 × 10(-19)) and accounted for 16.2% of the variance in African Americans. In Hispanic families, 104 LOD scores were greater than 3. The strongest evidence of linkage (LOD = 4.29) was with rs5882 (MAF = 0.46) in CETP with HDL. CETP variants were strongly associated with HDL (0.00049 evidence of strong linkage in this genome wide survey of primarily coding variants was uncommon. Loci with strong evidence of linkage was characterized by large contributions to the variance, and, in these cases, are common variants. Less compelling evidence of linkage and association was observed with additional loci that may require larger family sets to confirm. © 2014 WILEY PERIODICALS, INC.

  1. DNAskew:Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences

    Institute of Scientific and Technical Information of China (English)

    Xiang-Ru MA; Shao-Bo XIAO; Ai-Zhen GUO; Jian-Qiang L(U); Huan-Chun CHEN

    2004-01-01

    Sueoka and Lobry declared respectively that, in the absence of bias between the two DNA strands for mutation and selection, the base composition within each strand should be A=T and C=G (this state is called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses showed asymmetries in base composition and gene direction. To determine the relationship of base composition skews with replication orientation, gene function, codon usage biases and phylogenetic evolution,in this paper a program called DNAskew was developed for the statistical analysis of strand asymmetry and codon composition bias in the DNA sequence. In addition, the program can also be used to predict the replication boundaries of genome sequences. The method builds on the fact that there are compositional asymmetries between the leading and the lagging strand for replication. DNAskew was written in Perl script language and implemented on the LINUX operating system. It works quickly with annotated or unannotated sequences in GBFF (GenBank flatfile) or fasta format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.

  2. KEGG-PATH: Kyoto encyclopedia of genes and genomes-based pathway analysis using a path analysis model.

    Science.gov (United States)

    Du, Junli; Yuan, Zhifa; Ma, Ziwei; Song, Jiuzhou; Xie, Xiaoli; Chen, Yulin

    2014-07-29

    The dynamic impact approach (DIA) represents an alternative to overrepresentation analysis (ORA) for functional analysis of time-course experiments or those involving multiple treatments. The DIA can be used to estimate the biological impact of the differentially expressed genes (DEGs) associated with particular biological functions, for example, as represented by the Kyoto encyclopedia of genes and genomes (KEGG) annotations. However, the DIA does not take into account the correlated dependence structure of the KEGG pathway hierarchy. We have developed herein a path analysis model (KEGG-PATH) to subdivide the total effect of each KEGG pathway into the direct effect and indirect effect by taking into account not only each KEGG pathway itself, but also the correlation with its related pathways. In addition, this work also attempts to preliminarily estimate the impact direction of each KEGG pathway by a gradient analysis method from principal component analysis (PCA). As a result, the advantage of the KEGG-PATH model is demonstrated through the functional analysis of the bovine mammary transcriptome during lactation.

  3. Gene-based meta-analysis of genome-wide association studies implicates new loci involved in obesity

    DEFF Research Database (Denmark)

    Hägg, Sara; Ganna, Andrea; Van Der Laan, Sander W

    2015-01-01

    To date, genome-wide association studies (GWASs) have identified >100 loci with single variants associated with body mass index (BMI). This approach may miss loci with high allelic heterogeneity; therefore, the aim of the present study was to use gene-based meta-analysis to identify regions...... with high allelic heterogeneity to discover additional obesity susceptibility loci. We included GWAS data from 123 865 individuals of European descent from 46 cohorts in Stage 1 and Metabochip data from additional 103 046 individuals from 43 cohorts in Stage 2, all within the Genetic Investigation...... of ANthropometric Traits (GIANT) consortium. Each cohort was tested for association between ∼2.4 million (Stage 1) or ∼200 000 (Stage 2) imputed or genotyped single variants and BMI, and summary statistics were subsequently meta-analyzed in 17 941 genes. We used the ‘VErsatile Gene-based Association Study’ (VEGAS...

  4. Comparative Genome Analysis and Genome Evolution

    NARCIS (Netherlands)

    Snel, Berend

    2002-01-01

    This thesis described a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene trasnfer, even to the extent that we can use the gene content to make genome phylogenies. Usi

  5. Comparative Genome Analysis and Genome Evolution

    NARCIS (Netherlands)

    Snel, Berend

    2003-01-01

    This thesis described a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene trasnfer, even to the extent that we can use the gene content to make genome phylogenies. Usi

  6. A Distance Measure for Genome Phylogenetic Analysis

    Science.gov (United States)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirement. Another class of tree construction methods, namely distance-based methods, require a measure of distances between any two genomes. Some measures such as evolutionary edit distance of gene order and gene content are computational expensive or do not perform well when the gene content of the organisms are similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.

  7. A comparative genomic study in schizophrenic and in bipolar disorder patients, based on microarray expression profiling meta-analysis.

    Science.gov (United States)

    Logotheti, Marianthi; Papadodima, Olga; Venizelos, Nikolaos; Chatziioannou, Aristotelis; Kolisis, Fragiskos

    2013-01-01

    Schizophrenia affecting almost 1% and bipolar disorder affecting almost 3%-5% of the global population constitute two severe mental disorders. The catecholaminergic and the serotonergic pathways have been proved to play an important role in the development of schizophrenia, bipolar disorder, and other related psychiatric disorders. The aim of the study was to perform and interpret the results of a comparative genomic profiling study in schizophrenic patients as well as in healthy controls and in patients with bipolar disorder and try to relate and integrate our results with an aberrant amino acid transport through cell membranes. In particular we have focused on genes and mechanisms involved in amino acid transport through cell membranes from whole genome expression profiling data. We performed bioinformatic analysis on raw data derived from four different published studies. In two studies postmortem samples from prefrontal cortices, derived from patients with bipolar disorder, schizophrenia, and control subjects, have been used. In another study we used samples from postmortem orbitofrontal cortex of bipolar subjects while the final study was performed based on raw data from a gene expression profiling dataset in the postmortem superior temporal cortex of schizophrenics. The data were downloaded from NCBI's GEO datasets.

  8. A Comparative Genomic Study in Schizophrenic and in Bipolar Disorder Patients, Based on Microarray Expression Profiling Meta-Analysis

    Directory of Open Access Journals (Sweden)

    Marianthi Logotheti

    2013-01-01

    Full Text Available Schizophrenia affecting almost 1% and bipolar disorder affecting almost 3%–5% of the global population constitute two severe mental disorders. The catecholaminergic and the serotonergic pathways have been proved to play an important role in the development of schizophrenia, bipolar disorder, and other related psychiatric disorders. The aim of the study was to perform and interpret the results of a comparative genomic profiling study in schizophrenic patients as well as in healthy controls and in patients with bipolar disorder and try to relate and integrate our results with an aberrant amino acid transport through cell membranes. In particular we have focused on genes and mechanisms involved in amino acid transport through cell membranes from whole genome expression profiling data. We performed bioinformatic analysis on raw data derived from four different published studies. In two studies postmortem samples from prefrontal cortices, derived from patients with bipolar disorder, schizophrenia, and control subjects, have been used. In another study we used samples from postmortem orbitofrontal cortex of bipolar subjects while the final study was performed based on raw data from a gene expression profiling dataset in the postmortem superior temporal cortex of schizophrenics. The data were downloaded from NCBI's GEO datasets.

  9. A case study for cloud based high throughput analysis of NGS data using the globus genomics system.

    Science.gov (United States)

    Bhuvaneshwar, Krithika; Sulakhe, Dinanath; Gauba, Robinder; Rodriguez, Alex; Madduri, Ravi; Dave, Utpal; Lacinski, Lukasz; Foster, Ian; Gusev, Yuriy; Madhavan, Subha

    2015-01-01

    Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the "Globus Genomics" system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-endNGS analysis requirements. The Globus Genomics system is built on Amazon 's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.

  10. Gene-based meta-analysis of genome-wide association studies implicates new loci involved in obesity

    Science.gov (United States)

    Hägg, Sara; Ganna, Andrea; Van Der Laan, Sander W.; Esko, Tonu; Pers, Tune H.; Locke, Adam E.; Berndt, Sonja I.; Justice, Anne E.; Kahali, Bratati; Siemelink, Marten A.; Pasterkamp, Gerard; Strachan, David P.; Speliotes, Elizabeth K.; North, Kari E.; Loos, Ruth J.F.; Hirschhorn, Joel N.; Pawitan, Yudi; Ingelsson, Erik

    2015-01-01

    To date, genome-wide association studies (GWASs) have identified >100 loci with single variants associated with body mass index (BMI). This approach may miss loci with high allelic heterogeneity; therefore, the aim of the present study was to use gene-based meta-analysis to identify regions with high allelic heterogeneity to discover additional obesity susceptibility loci. We included GWAS data from 123 865 individuals of European descent from 46 cohorts in Stage 1 and Metabochip data from additional 103 046 individuals from 43 cohorts in Stage 2, all within the Genetic Investigation of ANthropometric Traits (GIANT) consortium. Each cohort was tested for association between ∼2.4 million (Stage 1) or ∼200 000 (Stage 2) imputed or genotyped single variants and BMI, and summary statistics were subsequently meta-analyzed in 17 941 genes. We used the ‘VErsatile Gene-based Association Study’ (VEGAS) approach to assign variants to genes and to calculate gene-based P-values based on simulations. The VEGAS method was applied to each cohort separately before a gene-based meta-analysis was performed. In Stage 1, two known (FTO and TMEM18) and six novel (PEX2, MTFR2, SSFA2, IARS2, CEP295 and TXNDC12) loci were associated with BMI (P gene tests). We confirmed all loci, and six of them were gene-wide significant in Stage 2 alone. We provide biological support for the loci by pathway, expression and methylation analyses. Our results indicate that gene-based meta-analysis of GWAS provides a useful strategy to find loci of interest that were not identified in standard single-marker analyses due to high allelic heterogeneity. PMID:26376864

  11. A gene-based linkage map for Bicyclus anynana butterflies allows for a comprehensive analysis of synteny with the lepidopteran reference genome.

    Directory of Open Access Journals (Sweden)

    Patrícia Beldade

    2009-02-01

    Full Text Available Lepidopterans (butterflies and moths are a rich and diverse order of insects, which, despite their economic impact and unusual biological properties, are relatively underrepresented in terms of genomic resources. The genome of the silkworm Bombyx mori has been fully sequenced, but comparative lepidopteran genomics has been hampered by the scarcity of information for other species. This is especially striking for butterflies, even though they have diverse and derived phenotypes (such as color vision and wing color patterns and are considered prime models for the evolutionary and developmental analysis of ecologically relevant, complex traits. We focus on Bicyclus anynana butterflies, a laboratory system for studying the diversification of novelties and serially repeated traits. With a panel of 12 small families and a biphasic mapping approach, we first assigned 508 expressed genes to segregation groups and then ordered 297 of them within individual linkage groups. We also coarsely mapped seven color pattern loci. This is the richest gene-based map available for any butterfly species and allowed for a broad-coverage analysis of synteny with the lepidopteran reference genome. Based on 462 pairs of mapped orthologous markers in Bi. anynana and Bo. mori, we observed strong conservation of gene assignment to chromosomes, but also evidence for numerous large- and small-scale chromosomal rearrangements. With gene collections growing for a variety of target organisms, the ability to place those genes in their proper genomic context is paramount. Methods to map expressed genes and to compare maps with relevant model systems are crucial to extend genomic-level analysis outside classical model species. Maps with gene-based markers are useful for comparative genomics and to resolve mapped genomic regions to a tractable number of candidate genes, especially if there is synteny with related model species. This is discussed in relation to the identification of

  12. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    Science.gov (United States)

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  13. Transcriptome analysis of root response to citrus blight based on the newly assembled Swingle citrumelo draft genome.

    Science.gov (United States)

    Zhang, Yunzeng; Barthe, Gary; Grosser, Jude W; Wang, Nian

    2016-07-08

    Citrus blight is a citrus tree overall decline disease and causes serious losses in the citrus industry worldwide. Although it was described more than one hundred years ago, its causal agent remains unknown and its pathophysiology is not well determined, which hampers our understanding of the disease and design of suitable disease management. In this study, we sequenced and assembled the draft genome for Swingle citrumelo, one important citrus rootstock. The draft genome is approximately 280 Mb, which covers 74 % of the estimated Swingle citrumelo genome and the average coverage is around 15X. The draft genome of Swingle citrumelo enabled us to conduct transcriptome analysis of roots of blight and healthy Swingle citrumelo using RNA-seq. The RNA-seq was reliable as evidenced by the high consistence of RNA-seq analysis and quantitative reverse transcription PCR results (R(2) = 0.966). Comparison of the gene expression profiles between blight and healthy root samples revealed the molecular mechanism underneath the characteristic blight phenotypes including decline, starch accumulation, and drought stress. The JA and ET biosynthesis and signaling pathways showed decreased transcript abundance, whereas SA-mediated defense-related genes showed increased transcript abundance in blight trees, suggesting unclassified biotrophic pathogen was involved in this disease. Overall, the Swingle citrumelo draft genome generated in this study will advance our understanding of plant biology and contribute to the citrus breeding. Transcriptome analysis of blight and healthy trees deepened our understanding of the pathophysiology of citrus blight.

  14. High-resolution Whole-Genome Analysis of Skull Base Chordomas Implicates FHIT Loss in Chordoma Pathogenesis

    Directory of Open Access Journals (Sweden)

    Roberto Jose Diaz

    2012-09-01

    Full Text Available Chordoma is a rare tumor arising in the sacrum, clivus, or vertebrae. It is often not completely resectable and shows a high incidence of recurrence and progression with shortened patient survival and impaired quality of life. Chemotherapeutic options are limited to investigational therapies at present. Therefore, adjuvant therapy for control of tumor recurrence and progression is of great interest, especially in skull base lesions where complete tumor resection is often not possible because of the proximity of cranial nerves. To understand the extent of genetic instability and associated chromosomal and gene losses or gains in skull base chordoma, we undertook whole-genome single-nucleotide polymorphism microarray analysis of flash frozen surgical chordoma specimens, 21 from the clivus and 1 from C1 to C2 vertebrae. We confirm the presence of a deletion at 9p involving CDKN2A, CDKN2B, and MTAP but at a much lower rate (22% than previously reported for sacral chordoma. At a similar frequency (21%, we found aneuploidy of chromosome 3. Tissue microarray immunohistochemistry demonstrated absent or reduced fragile histidine triad (FHIT protein expression in 98% of sacral chordomas and 67%of skull base chordomas. Our data suggest that chromosome 3 aneuploidy and epigenetic regulation of FHIT contribute to loss of the FHIT tumor suppressor in chordoma. The finding that FHIT is lost in a majority of chordomas provides new insight into chordoma pathogenesis and points to a potential new therapeutic target for this challenging neoplasm.

  15. Evaluation of a partial genome screening of two asthma susceptibility regions using bayesian network based bayesian multilevel analysis of relevance.

    Directory of Open Access Journals (Sweden)

    Ildikó Ungvári

    Full Text Available Genetic studies indicate high number of potential factors related to asthma. Based on earlier linkage analyses we selected the 11q13 and 14q22 asthma susceptibility regions, for which we designed a partial genome screening study using 145 SNPs in 1201 individuals (436 asthmatic children and 765 controls. The results were evaluated with traditional frequentist methods and we applied a new statistical method, called bayesian network based bayesian multilevel analysis of relevance (BN-BMLA. This method uses bayesian network representation to provide detailed characterization of the relevance of factors, such as joint significance, the type of dependency, and multi-target aspects. We estimated posteriors for these relations within the bayesian statistical framework, in order to estimate the posteriors whether a variable is directly relevant or its association is only mediated.With frequentist methods one SNP (rs3751464 in the FRMD6 gene provided evidence for an association with asthma (OR = 1.43(1.2-1.8; p = 3×10(-4. The possible role of the FRMD6 gene in asthma was also confirmed in an animal model and human asthmatics.In the BN-BMLA analysis altogether 5 SNPs in 4 genes were found relevant in connection with asthma phenotype: PRPF19 on chromosome 11, and FRMD6, PTGER2 and PTGDR on chromosome 14. In a subsequent step a partial dataset containing rhinitis and further clinical parameters was used, which allowed the analysis of relevance of SNPs for asthma and multiple targets. These analyses suggested that SNPs in the AHNAK and MS4A2 genes were indirectly associated with asthma. This paper indicates that BN-BMLA explores the relevant factors more comprehensively than traditional statistical methods and extends the scope of strong relevance based methods to include partial relevance, global characterization of relevance and multi-target relevance.

  16. Self-Directed Student Research through Analysis of Microarray Datasets: A Computer-Based Functional Genomics Practical Class for Masters-Level Students

    Science.gov (United States)

    Grenville-Briggs, Laura J.; Stansfield, Ian

    2011-01-01

    This report describes a linked series of Masters-level computer practical workshops. They comprise an advanced functional genomics investigation, based upon analysis of a microarray dataset probing yeast DNA damage responses. The workshops require the students to analyse highly complex transcriptomics datasets, and were designed to stimulate…

  17. The integrated microbial genome resource of analysis.

    Science.gov (United States)

    Checcucci, Alice; Mengoni, Alessio

    2015-01-01

    Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that allows to provide information and support for annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE)-Joint Genome Institute (JGI). IMG platform contains both draft and complete genomes, sequenced by Joint Genome Institute and other public and available genomes. Genomes of strains belonging to Archaea, Bacteria, and Eukarya domains are present as well as those of viruses and plasmids. Here, we provide some essential features of IMG system and case study for pangenome analysis.

  18. A case study for cloud based high throughput analysis of NGS data using the globus genomics system

    Directory of Open Access Journals (Sweden)

    Krithika Bhuvaneshwar

    2015-01-01

    Full Text Available Next generation sequencing (NGS technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-endNGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.

  19. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    Science.gov (United States)

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels.The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  20. Comparative genome-scale analysis of niche-based stress-responsive genes in Lactobacillus helveticus strains.

    Science.gov (United States)

    Senan, Suja; Prajapati, Jashbhai B; Joshi, Chaitanya G

    2014-04-01

    Next generation sequencing technologies with advanced bioinformatic tools present a unique opportunity to compare genomes from diverse niches. The identification of niche-specific stress-responsive genes can help in characterizing robust strains for multiple applications. In this study, we attempted to compare the stress-responsive genes of a potential probiotic strain, Lactobacillus helveticus MTCC 5463, and a cheese starter strain, Lactobacillus helveticus DPC 4571, from a gut and dairy niche, respectively. Sequencing of MTCC 5463 was done using 454 GS FLX, and contigs were assembled using GS Assembler software. Genome analysis was done using BLAST hits and the prokaryotic annotation server RAST. The MTCC 5463 genome carried multiple orthologs of genes governing stress responses, whereas the DPC 4571 genome lacked in the number of major stress-response proteins. The absence of the bile salt hydrolase gene in DPC 4571 and its presence in MTCC 5463 clearly indicated niche adaptation. Further, MTCC 5463 carried higher copy numbers of genes contributing towards heat, cold, osmotic, and oxidative stress resistance as compared with DPC 4571. Through comparative genomics, we could thus identify stress-responsive gene sets required to adapt to gut and dairy niches.

  1. GO4genome: A Prokaryotic Phylogeny Based on Genome Organization

    OpenAIRE

    Merkl, Rainer; Wiezer, Arnim

    2009-01-01

    Determining the phylogeny of closely related prokaryotes may fail in an analysis of rRNA or a small set of sequences. Whole-genome phylogeny utilizes the maximally available sample space. For a precise determination of genome similarity, two aspects have to be considered when developing an algorithm of whole-genome phylogeny: (1) gene order conservation is a more precise signal than gene content; and (2) when using sequence similarity, failures in identifying orthologues or the in situ replac...

  2. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Bansal Manju

    2011-07-01

    Full Text Available Abstract Background As more and more genomes are being sequenced, an overview of their genomic features and annotation of their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content the relative stability based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest based on the difference between its average free energy and the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability in the vicinity of all annotated translation start sites (TLS. Conclusion PromBase is a database, which contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics study and help the experimentalist to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible for academic and non- academic users via the worldwide web http://nucleix.mbu.iisc.ernet.in/prombase/.

  3. In silico enhanced restriction enzyme based methylation analysis of the human glioblastoma genome using Agilent 244K CpG Island microarrays

    Directory of Open Access Journals (Sweden)

    Anh Tran

    2010-01-01

    Full Text Available Genome wide methylation profiling of gliomas is likely to provide important clues to improving treatment outcomes. Restriction enzyme based approaches have been widely utilized for methylation profiling of cancer genomes and will continue to have importance in combination with higher density microarrays. With the availability of the human genome sequence and microarray probe sequences, these approaches can be readily characterized and optimized via in silico modeling. We adapted the previously described HpaII/MspI based Methylation Sensitive Restriction Enzyme (MSRE assay for use with two-color Agilent 244K CpG island microarrays. In this assay, fragmented genomic DNA is digested in separate reactions with isoschizomeric HpaII (methylation-sensitive and MspI (methylation-insensitive restriction enzymes. Using in silico hybridization, we found that genomic fragmentation with BfaI was superior to MseI, providing a maximum effective coverage of 22,362 CpG islands in the human genome. In addition, we confirmed the presence of an internal control group of fragments lacking HpaII/MspI sites which enable separation of methylated and unmethylated fragments. We used this method on genomic DNA isolated from normal brain, U87MG cells, and a glioblastoma patient tumor sample and confirmed selected differentially methylated CpG islands using bisulfite sequencing. Along with additional validation points, we performed a receiver operating characteristics (ROC analysis to determine the optimal threshold (p ≤ 0.001. Based on this threshold, we identified ~2400 CpG islands common to all three samples and 145 CpG islands unique to glioblastoma. These data provide more general guidance to individuals seeking to maximize effective coverage using restriction enzyme based methylation profiling approaches.

  4. Comparative genome analysis of Basidiomycete fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes), make up some 37percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48percent of basidiomycete proteins are unique to the phylum with nearly half of those (22percent) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  5. Biosynthetic Pathway for the Epipolythiodioxopiperazine Acetylaranotin in Aspergillus terreus Revealed by Genome-based Deletion Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Guo, Chun-Jun; Yeh, Hsu-Hua; Chiang, Yi Ming; Sanchez, James F.; Chang, ShuLin; Bruno, Kenneth S.; Wang, Clay C.

    2013-04-15

    Abstract Epipolythiodioxopiperazines (ETPs) are a class of fungal secondary metabolites derived from cyclic peptides. Acetylaranotin belongs to one structural subgroup of ETPs characterized by the presence of a seven-membered dihydrooxepine ring. Defining the genes involved in acetylaranotin biosynthesis should provide a means to increase production of these compounds and facilitate the engineering of second-generation molecules. The filamentous fungus Aspergillus terreus produces acetylaranotin and related natural products. Using targeted gene deletions, we have identified a cluster of 9 genes including one nonribosomal peptide synthase gene, ataP, that is required for acetylaranotin biosynthesis. Chemical analysis of the wild type and mutant strains enabled us to isolate seventeen natural products that are either intermediates in the normal biosynthetic pathway or shunt products that are produced when the pathway is interrupted through mutation. Nine of the compounds identified in this study are novel natural products. Our data allow us to propose a complete biosynthetic pathway for acetylaranotin and related natural products.

  6. GenColors-based comparative genome databases for small eukaryotic genomes.

    Science.gov (United States)

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  7. Genome-based Taxonomic Classification of Bacteroidetes

    Directory of Open Access Journals (Sweden)

    Richard L. Hahnke

    2016-12-01

    Full Text Available The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  8. Genome-Based Taxonomic Classification of Bacteroidetes.

    Science.gov (United States)

    Hahnke, Richard L; Meier-Kolthoff, Jan P; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N; Woyke, Tanja; Kyrpides, Nikos C; Klenk, Hans-Peter; Göker, Markus

    2016-01-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  9. Phylogeny and Taxonomy of Archaea: A Comparison of the Whole-Genome-Based CVTree Approach with 16S rRNA Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Guanghong Zuo

    2015-03-01

    Full Text Available A tripartite comparison of Archaea phylogeny and taxonomy at and above the rank order is reported: (1 the whole-genome-based and alignment-free CVTree using 179 genomes; (2 the 16S rRNA analysis exemplified by the All-Species Living Tree with 366 archaeal sequences; and (3 the Second Edition of Bergey’s Manual of Systematic Bacteriology complemented by some current literature. A high degree of agreement is reached at these ranks. From the newly proposed archaeal phyla, Korarchaeota, Thaumarchaeota, Nanoarchaeota and Aigarchaeota, to the recent suggestion to divide the class Halobacteria into three orders, all gain substantial support from CVTree. In addition, the CVTree helped to determine the taxonomic position of some newly sequenced genomes without proper lineage information. A few discrepancies between the CVTree and the 16S rRNA approaches call for further investigation.

  10. Genome-Based Construction of the Metabolic Pathways of Orientia tsutsugamushi and Comparative Analysis within the Rickettsiales Order

    Directory of Open Access Journals (Sweden)

    Chan-Ki Min

    2008-01-01

    Full Text Available Orientia tsutsugamushi, the causative agent of scrub typhus, is an obligate intracellular bacterium that belongs to the order of Rickettsiales. Recently, we have reported that O. tsutsugamushi has a unique genomic structure, consisting of highly repetitive sequences, and suggested that it may provide valuable insight into the evolution of intracellular bacteria. Here, we have used genomic information to construct the major metabolic pathways of O. tsutsugamushi and performed a comparative analysis of the metabolic genes and pathways of O. tsutsugamushi with other members of the Rickettsiales order. While O. tsutsugamushi has the largest genome among the members of this order, mainly due to the presence of repeated sequences, its metabolic pathways have been highly streamlined. Overall, the metabolic pathways of O. tsutsugamushi were similar to Rickettsia but there were notable differences in several pathways including carbohydrate metabolism, the TCA cycle, and the synthesis of cell wall components as well as in the transport systems. Our results will provide a useful guide to the postgenomic analysis of O. tsutsugamushi and lead to a better understanding of the virulence and physiology of this intracellular pathogen.

  11. Millstone: software for multiplex microbial genome analysis and engineering.

    Science.gov (United States)

    Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M

    2017-05-25

    Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.

  12. Coronavirus Genomics and Bioinformatics Analysis

    Directory of Open Access Journals (Sweden)

    Kwok-Yung Yuen

    2010-08-01

    Full Text Available The drastic increase in the number of coronaviruses discovered and coronavirus genomes being sequenced have given us an unprecedented opportunity to perform genomics and bioinformatics analysis on this family of viruses. Coronaviruses possess the largest genomes (26.4 to 31.7 kb among all known RNA viruses, with G + C contents varying from 32% to 43%. Variable numbers of small ORFs are present between the various conserved genes (ORF1ab, spike, envelope, membrane and nucleocapsid and downstream to nucleocapsid gene in different coronavirus lineages. Phylogenetically, three genera, Alphacoronavirus, Betacoronavirus and Gammacoronavirus, with Betacoronavirus consisting of subgroups A, B, C and D, exist. A fourth genus, Deltacoronavirus, which includes bulbul coronavirus HKU11, thrush coronavirus HKU12 and munia coronavirus HKU13, is emerging. Molecular clock analysis using various gene loci revealed that the time of most recent common ancestor of human/civet SARS related coronavirus to be 1999-2002, with estimated substitution rate of 4´10-4 to 2´10-2 substitutions per site per year. Recombination in coronaviruses was most notable between different strains of murine hepatitis virus (MHV, between different strains of infectious bronchitis virus, between MHV and bovine coronavirus, between feline coronavirus (FCoV type I and canine coronavirus generating FCoV type II, and between the three genotypes of human coronavirus HKU1 (HCoV-HKU1. Codon usage bias in coronaviruses were observed, with HCoV-HKU1 showing the most extreme bias, and cytosine deamination and selection of CpG suppressed clones are the two major independent biological forces that shape such codon usage bias in coronaviruses.

  13. Genome-wide copy number profiling on high-density bacterial artificial chromosomes, single-nucleotide polymorphisms, and oligonucleotide microarrays: a platform comparison based on statistical power analysis.

    NARCIS (Netherlands)

    Hehir-Kwa, J.Y.; Egmont-Peterson, M.; Janssen, I.M.; Smeets, D.F.C.M.; Geurts van Kessel, A.H.M.; Veltman, J.A.

    2007-01-01

    Recently, comparative genomic hybridization onto bacterial artificial chromosome (BAC) arrays (array-based comparative genomic hybridization) has proved to be successful for the detection of submicroscopic DNA copy-number variations in health and disease. Technological improvements to achieve a

  14. Genetic characterization of dogs via chromosomal analysis and array-based comparative genomic hybridization (aCGH).

    Science.gov (United States)

    Müller, M H; Reimann-Berg, N; Bullerdiek, J; Murua Escobar, H

    2012-01-01

    The results of cytogenetic and molecular cytogenetic investigations revealed similarities in genetic background and biological behaviour between tumours and genetic diseases of humans and dogs. These findings classify the dog a good and accepted model for human cancers such as osteosarcomas, mammary carcinomas, oral melanomas and others. With the appearance of new studies and advances in canine genome sequencing, the number of known homologies in diseases between these species raised and still is expected to increase. In this context, array-based comparative genomic hybridization (aCGH) provides a novel tool to rapidly characterize numerical aberrations in canine tumours or to detect copy number aberrations between different breeds. As it is possible to spot probes covering the whole genome on each chip to discover copy number aberrations of all chromosomes simultaneously, this method is time-saving and cost-effective - considering the relation of costs and the amount of data obtained. Complemented with traditional methods like karyotyping and fluorescence in situ hybridization (FISH) analyses, the aCGH is able to provide new insights into the underlying causes of canine carcinogenesis.

  15. Development of a facile droplet-based single-cell isolation platform for cultivation and genomic analysis in microorganisms

    Science.gov (United States)

    Zhang, Qiang; Wang, Tingting; Zhou, Qian; Zhang, Peng; Gong, Yanhai; Gou, Honglei; Xu, Jian; Ma, Bo

    2017-01-01

    Wider application of single-cell analysis has been limited by the lack of an easy-to-use and low-cost strategy for single-cell isolation that can be directly coupled to single-cell sequencing and single-cell cultivation, especially for small-size microbes. Herein, a facile droplet microfluidic platform was developed to dispense individual microbial cells into conventional standard containers for downstream analysis. Functional parts for cell encapsulation, droplet inspection and sorting, as well as a chip-to-tube capillary interface were integrated on one single chip with simple architecture, and control of the droplet sorting was achieved by a low-cost solenoid microvalve. Using microalgal and yeast cells as models, single-cell isolation success rate of over 90% and single-cell cultivation success rate of 80% were demonstrated. We further showed that the individual cells isolated can be used in high-quality DNA and RNA analyses at both gene-specific and whole-genome levels (i.e. real-time quantitative PCR and genome sequencing). The simplicity and reliability of the method should improve accessibility of single-cell analysis and facilitate its wider application in microbiology researches. PMID:28112223

  16. Genome-wide gene pathway analysis of psychotic illness symptom dimensions based on a new schizophrenia-specific model of the OPCRIT.

    Science.gov (United States)

    Docherty, Anna R; Bigdeli, T Bernard; Edwards, Alexis C; Bacanu, Silviu; Lee, Donghyung; Neale, Michael C; Wormley, Brandon K; Walsh, Dermot; O'Neill, F Anthony; Riley, Brien P; Kendler, Kenneth S; Fanous, Ayman H

    2015-05-01

    Empirically derived phenotypic measurements have the potential to enhance gene-finding efforts in schizophrenia. Previous research based on factor analyses of symptoms has typically included schizoaffective cases. Deriving factor loadings from analysis of only narrowly defined schizophrenia cases could yield more sensitive factor scores for gene pathway and gene ontology analyses. Using an Irish family sample, this study 1) factor analyzed clinician-rated Operational Criteria Checklist items in cases with schizophrenia only, 2) scored the full sample based on these factor loadings, and 3) implemented genome-wide association, gene-based, and gene-pathway analysis of these SCZ-based symptom factors (final N=507). Three factors emerged from the analysis of the schizophrenia cases: a manic, a depressive, and a positive symptom factor. In gene-based analyses of these factors, multiple genes had qschizophrenia.

  17. Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae

    Directory of Open Access Journals (Sweden)

    Yuan Huang

    2017-06-01

    Full Text Available Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide much more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in family Salicaceae. Phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but resolved with higher statistical support. Incongruences of phylogeny, however, are observed in genus Populus, which most likely results from homoplasy. By comparing three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes are lost or became pseudogenes, infer that the chloroplast genes horizontally transferred to the nucleus genome. Based on the complete nucleus genome sequences from two Salicaceae species, we remarkably identify that the entire chloroplast genome is indeed transferred and integrated to the nucleus genome in the individual of the reference genome of P. trichocarpa at least once. This observation, along with presence of the large nuclear plastid DNA (NUPTs and NUPTs-containing multiple chloroplast genes in their original order in the chloroplast genome, favors the DNA-mediated hypothesis of organelle to nucleus DNA transfer. Overall, the phylogenomic analysis using chloroplast complete genomes clearly elucidates the phylogeny of Salicaceae. The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in

  18. Copy Number Variation Analysis by Array Analysis of Single Cells Following Whole Genome Amplification.

    Science.gov (United States)

    Dimitriadou, Eftychia; Zamani Esteki, Masoud; Vermeesch, Joris Robert

    2015-01-01

    Whole genome amplification is required to ensure the availability of sufficient material for copy number variation analysis of a genome deriving from an individual cell. Here, we describe the protocols we use for copy number variation analysis of non-fixed single cells by array-based approaches following single-cell isolation and whole genome amplification. We are focusing on two alternative protocols, an isothermal and a PCR-based whole genome amplification method, followed by either comparative genome hybridization (aCGH) or SNP array analysis, respectively.

  19. Analysis of Heritability and Shared Heritability Based on Genome-Wide Association Studies for 13 Cancer Types

    Science.gov (United States)

    Wheeler, William A.; Yeager, Meredith; Panagiotou, Orestis; Wang, Zhaoming; Berndt, Sonja I.; Lan, Qing; Abnet, Christian C.; Amundadottir, Laufey T.; Figueroa, Jonine D.; Landi, Maria Teresa; Mirabello, Lisa; Savage, Sharon A.; Taylor, Philip R.; Vivo, Immaculata De; McGlynn, Katherine A.; Purdue, Mark P.; Rajaraman, Preetha; Adami, Hans-Olov; Ahlbom, Anders; Albanes, Demetrius; Amary, Maria Fernanda; An, She-Juan; Andersson, Ulrika; Andriole, Gerald; Andrulis, Irene L.; Angelucci, Emanuele; Ansell, Stephen M.; Arici, Cecilia; Armstrong, Bruce K.; Arslan, Alan A.; Austin, Melissa A.; Baris, Dalsu; Barkauskas, Donald A.; Bassig, Bryan A.; Becker, Nikolaus; Benavente, Yolanda; Benhamou, Simone; Berg, Christine; Van Den Berg, David; Bernstein, Leslie; Bertrand, Kimberly A.; Birmann, Brenda M.; Black, Amanda; Boeing, Heiner; Boffetta, Paolo; Boutron-Ruault, Marie-Christine; Bracci, Paige M.; Brinton, Louise; Brooks-Wilson, Angela R.; Bueno-de-Mesquita, H. Bas; Burdett, Laurie; Buring, Julie; Butler, Mary Ann; Cai, Qiuyin; Cancel-Tassin, Geraldine; Canzian, Federico; Carrato, Alfredo; Carreon, Tania; Carta, Angela; Chan, John K. C.; Chang, Ellen T.; Chang, Gee-Chen; Chang, I-Shou; Chang, Jiang; Chang-Claude, Jenny; Chen, Chien-Jen; Chen, Chih-Yi; Chen, Chu; Chen, Chung-Hsing; Chen, Constance; Chen, Hongyan; Chen, Kexin; Chen, Kuan-Yu; Chen, Kun-Chieh; Chen, Ying; Chen, Ying-Hsiang; Chen, Yi-Song; Chen, Yuh-Min; Chien, Li-Hsin; Chirlaque, María-Dolores; Choi, Jin Eun; Choi, Yi Young; Chow, Wong-Ho; Chung, Charles C.; Clavel, Jacqueline; Clavel-Chapelon, Françoise; Cocco, Pierluigi; Colt, Joanne S.; Comperat, Eva; Conde, Lucia; Connors, Joseph M.; Conti, David; Cortessis, Victoria K.; Cotterchio, Michelle; Cozen, Wendy; Crouch, Simon; Crous-Bou, Marta; Cussenot, Olivier; Davis, Faith G.; Ding, Ti; Diver, W. Ryan; Dorronsoro, Miren; Dossus, Laure; Duell, Eric J.; Ennas, Maria Grazia; Erickson, Ralph L.; Feychting, Maria; Flanagan, Adrienne M.; Foretova, Lenka; Fraumeni, Joseph F.; Freedman, Neal D.; Beane Freeman, Laura E.; Fuchs, Charles; Gago-Dominguez, Manuela; Gallinger, Steven; Gao, Yu-Tang; Gapstur, Susan M.; Garcia-Closas, Montserrat; García-Closas, Reina; Gascoyne, Randy D.; Gastier-Foster, Julie; Gaudet, Mia M.; Gaziano, J. Michael; Giffen, Carol; Giles, Graham G.; Giovannucci, Edward; Glimelius, Bengt; Goggins, Michael; Gokgoz, Nalan; Goldstein, Alisa M.; Gorlick, Richard; Gross, Myron; Grubb, Robert; Gu, Jian; Guan, Peng; Gunter, Marc; Guo, Huan; Habermann, Thomas M.; Haiman, Christopher A.; Halai, Dina; Hallmans, Goran; Hassan, Manal; Hattinger, Claudia; He, Qincheng; He, Xingzhou; Helzlsouer, Kathy; Henderson, Brian; Henriksson, Roger; Hjalgrim, Henrik; Hoffman-Bolton, Judith; Hohensee, Chancellor; Holford, Theodore R.; Holly, Elizabeth A.; Hong, Yun-Chul; Hoover, Robert N.; Horn-Ross, Pamela L.; Hosain, G. M. Monawar; Hosgood, H. Dean; Hsiao, Chin-Fu; Hu, Nan; Hu, Wei; Hu, Zhibin; Huang, Ming-Shyan; Huerta, Jose-Maria; Hung, Jen-Yu; Hutchinson, Amy; Inskip, Peter D.; Jackson, Rebecca D.; Jacobs, Eric J.; Jenab, Mazda; Jeon, Hyo-Sung; Ji, Bu-Tian; Jin, Guangfu; Jin, Li; Johansen, Christoffer; Johnson, Alison; Jung, Yoo Jin; Kaaks, Rudolph; Kamineni, Aruna; Kane, Eleanor; Kang, Chang Hyun; Karagas, Margaret R.; Kelly, Rachel S.; Khaw, Kay-Tee; Kim, Christopher; Kim, Hee Nam; Kim, Jin Hee; Kim, Jun Suk; Kim, Yeul Hong; Kim, Young Tae; Kim, Young-Chul; Kitahara, Cari M.; Klein, Alison P.; Klein, Robert J.; Kogevinas, Manolis; Kohno, Takashi; Kolonel, Laurence N.; Kooperberg, Charles; Kricker, Anne; Krogh, Vittorio; Kunitoh, Hideo; Kurtz, Robert C.; Kweon, Sun-Seog; LaCroix, Andrea; Lawrence, Charles; Lecanda, Fernando; Lee, Victor Ho Fun; Li, Donghui; Li, Haixin; Li, Jihua; Li, Yao-Jen; Li, Yuqing; Liao, Linda M.; Liebow, Mark; Lightfoot, Tracy; Lim, Wei-Yen; Lin, Chien-Chung; Lin, Dongxin; Lindstrom, Sara; Linet, Martha S.; Link, Brian K.; Liu, Chenwei; Liu, Jianjun; Liu, Li; Ljungberg, Börje; Lloreta, Josep; Lollo, Simonetta Di; Lu, Daru; Lund, Eiluv; Malats, Nuria; Mannisto, Satu; Marchand, Loic Le; Marina, Neyssa; Masala, Giovanna; Mastrangelo, Giuseppe; Matsuo, Keitaro; Maynadie, Marc; McKay, James; McKean-Cowdin, Roberta; Melbye, Mads; Melin, Beatrice S.; Michaud, Dominique S.; Mitsudomi, Tetsuya; Monnereau, Alain; Montalvan, Rebecca; Moore, Lee E.; Mortensen, Lotte Maxild; Nieters, Alexandra; North, Kari E.; Novak, Anne J.; Oberg, Ann L.; Offit, Kenneth; Oh, In-Jae; Olson, Sara H.; Palli, Domenico; Pao, William; Park, In Kyu; Park, Jae Yong; Park, Kyong Hwa; Patiño-Garcia, Ana; Pavanello, Sofia; Peeters, Petra H. M.; Perng, Reury-Perng; Peters, Ulrike; Petersen, Gloria M.; Picci, Piero; Pike, Malcolm C.; Porru, Stefano; Prescott, Jennifer; Prokunina-Olsson, Ludmila; Qian, Biyun; Qiao, You-Lin; Rais, Marco; Riboli, Elio; Riby, Jacques; Risch, Harvey A.; Rizzato, Cosmeri; Rodabough, Rebecca; Roman, Eve; Roupret, Morgan; Ruder, Avima M.; de Sanjose, Silvia; Scelo, Ghislaine; Schned, Alan; Schumacher, Fredrick; Schwartz, Kendra; Schwenn, Molly; Scotlandi, Katia; Seow, Adeline; Serra, Consol; Serra, Massimo; Sesso, Howard D.; Setiawan, Veronica Wendy; Severi, Gianluca; Severson, Richard K.; Shanafelt, Tait D.; Shen, Hongbing; Shen, Wei; Shin, Min-Ho; Shiraishi, Kouya; Shu, Xiao-Ou; Siddiq, Afshan; Sierrasesúmaga, Luis; Sihoe, Alan Dart Loon; Skibola, Christine F.; Smith, Alex; Smith, Martyn T.; Southey, Melissa C.; Spinelli, John J.; Staines, Anthony; Stampfer, Meir; Stern, Marianna C.; Stevens, Victoria L.; Stolzenberg-Solomon, Rachael S.; Su, Jian; Su, Wu-Chou; Sund, Malin; Sung, Jae Sook; Sung, Sook Whan; Tan, Wen; Tang, Wei; Tardón, Adonina; Thomas, David; Thompson, Carrie A.; Tinker, Lesley F.; Tirabosco, Roberto; Tjønneland, Anne; Travis, Ruth C.; Trichopoulos, Dimitrios; Tsai, Fang-Yu; Tsai, Ying-Huang; Tucker, Margaret; Turner, Jenny; Vajdic, Claire M.; Vermeulen, Roel C. H.; Villano, Danylo J.; Vineis, Paolo; Virtamo, Jarmo; Visvanathan, Kala; Wactawski-Wende, Jean; Wang, Chaoyu; Wang, Chih-Liang; Wang, Jiu-Cun; Wang, Junwen; Wei, Fusheng; Weiderpass, Elisabete; Weiner, George J.; Weinstein, Stephanie; Wentzensen, Nicolas; White, Emily; Witzig, Thomas E.; Wolpin, Brian M.; Wong, Maria Pik; Wu, Chen; Wu, Guoping; Wu, Junjie; Wu, Tangchun; Wu, Wei; Wu, Xifeng; Wu, Yi-Long; Wunder, Jay S.; Xiang, Yong-Bing; Xu, Jun; Xu, Ping; Yang, Pan-Chyr; Yang, Tsung-Ying; Ye, Yuanqing; Yin, Zhihua; Yokota, Jun; Yoon, Ho-Il; Yu, Chong-Jen; Yu, Herbert; Yu, Kai; Yuan, Jian-Min; Zelenetz, Andrew; Zeleniuch-Jacquotte, Anne; Zhang, Xu-Chao; Zhang, Yawei; Zhao, Xueying; Zhao, Zhenhong; Zheng, Hong; Zheng, Tongzhang; Zheng, Wei; Zhou, Baosen; Zhu, Meng; Zucca, Mariagrazia; Boca, Simina M.; Cerhan, James R.; Ferri, Giovanni M.; Hartge, Patricia; Hsiung, Chao Agnes; Magnani, Corrado; Miligi, Lucia; Morton, Lindsay M.; Smedby, Karin E.; Teras, Lauren R.; Vijai, Joseph; Wang, Sophia S.; Brennan, Paul; Caporaso, Neil E.; Hunter, David J.; Kraft, Peter; Rothman, Nathaniel; Silverman, Debra T.; Slager, Susan L.; Chanock, Stephen J.; Chatterjee, Nilanjan

    2015-01-01

    Background: Studies of related individuals have consistently demonstrated notable familial aggregation of cancer. We aim to estimate the heritability and genetic correlation attributable to the additive effects of common single-nucleotide polymorphisms (SNPs) for cancer at 13 anatomical sites. Methods: Between 2007 and 2014, the US National Cancer Institute has generated data from genome-wide association studies (GWAS) for 49 492 cancer case patients and 34 131 control patients. We apply novel mixed model methodology (GCTA) to this GWAS data to estimate the heritability of individual cancers, as well as the proportion of heritability attributable to cigarette smoking in smoking-related cancers, and the genetic correlation between pairs of cancers. Results: GWAS heritability was statistically significant at nearly all sites, with the estimates of array-based heritability, hl 2, on the liability threshold (LT) scale ranging from 0.05 to 0.38. Estimating the combined heritability of multiple smoking characteristics, we calculate that at least 24% (95% confidence interval [CI] = 14% to 37%) and 7% (95% CI = 4% to 11%) of the heritability for lung and bladder cancer, respectively, can be attributed to genetic determinants of smoking. Most pairs of cancers studied did not show evidence of strong genetic correlation. We found only four pairs of cancers with marginally statistically significant correlations, specifically kidney and testes (ρ = 0.73, SE = 0.28), diffuse large B-cell lymphoma (DLBCL) and pediatric osteosarcoma (ρ = 0.53, SE = 0.21), DLBCL and chronic lymphocytic leukemia (CLL) (ρ = 0.51, SE =0.18), and bladder and lung (ρ = 0.35, SE = 0.14). Correlation analysis also indicates that the genetic architecture of lung cancer differs between a smoking population of European ancestry and a nonsmoking Asian population, allowing for the possibility that the genetic etiology for the same disease can vary by population and environmental exposures. Conclusion: Our

  20. Sequence-Based Analysis of Structural Organization and Composition of the Cultivated Sunflower (Helianthus annuus L.) Genome

    OpenAIRE

    Navdeep Gill; Matteo Buti; Nolan Kane; Arnaud Bellec; Nicolas Helmstetter; Hélène Berges; Loren H. Rieseberg

    2014-01-01

    Sunflower is an important oilseed crop, as well as a model system for evolutionary studies, but its 3.6 gigabase genome has proven difficult to assemble, in part because of the high repeat content of its genome. Here we report on the sequencing, assembly, and analyses of 96 randomly chosen BACs from sunflower to provide additional information on the repeat content of the sunflower genome, assess how repetitive elements in the sunflower genome are organized relative to genes, and compare the g...

  1. Systems-based analysis of the Sarcocystis neurona genome identifies pathways that contribute to a heteroxenous life cycle.

    Science.gov (United States)

    Blazejewski, Tomasz; Nursimulu, Nirvana; Pszenny, Viviana; Dangoudoubiyam, Sriveny; Namasivayam, Sivaranjani; Chiasson, Melissa A; Chessman, Kyle; Tonkin, Michelle; Swapna, Lakshmipuram S; Hung, Stacy S; Bridgers, Joshua; Ricklefs, Stacy M; Boulanger, Martin J; Dubey, Jitender P; Porcella, Stephen F; Kissinger, Jessica C; Howe, Daniel K; Grigg, Michael E; Parkinson, John

    2015-02-10

    Sarcocystis neurona is a member of the coccidia, a clade of single-celled parasites of medical and veterinary importance including Eimeria, Sarcocystis, Neospora, and Toxoplasma. Unlike Eimeria, a single-host enteric pathogen, Sarcocystis, Neospora, and Toxoplasma are two-host parasites that infect and produce infectious tissue cysts in a wide range of intermediate hosts. As a genus, Sarcocystis is one of the most successful protozoan parasites; all vertebrates, including birds, reptiles, fish, and mammals are hosts to at least one Sarcocystis species. Here we sequenced Sarcocystis neurona, the causal agent of fatal equine protozoal myeloencephalitis. The S. neurona genome is 127 Mbp, more than twice the size of other sequenced coccidian genomes. Comparative analyses identified conservation of the invasion machinery among the coccidia. However, many dense-granule and rhoptry kinase genes, responsible for altering host effector pathways in Toxoplasma and Neospora, are absent from S. neurona. Further, S. neurona has a divergent repertoire of SRS proteins, previously implicated in tissue cyst formation in Toxoplasma. Systems-based analyses identified a series of metabolic innovations, including the ability to exploit alternative sources of energy. Finally, we present an S. neurona model detailing conserved molecular innovations that promote the transition from a purely enteric lifestyle (Eimeria) to a heteroxenous parasite capable of infecting a wide range of intermediate hosts. Sarcocystis neurona is a member of the coccidia, a clade of single-celled apicomplexan parasites responsible for major economic and health care burdens worldwide. A cousin of Plasmodium, Cryptosporidium, Theileria, and Eimeria, Sarcocystis is one of the most successful parasite genera; it is capable of infecting all vertebrates (fish, reptiles, birds, and mammals-including humans). The past decade has witnessed an increasing number of human outbreaks of clinical significance associated with

  2. Integrative genomic analysis by interoperation of bioinformatics tools in GenomeSpace

    Science.gov (United States)

    Thorvaldsdottir, Helga; Liefeld, Ted; Ocana, Marco; Borges-Rivera, Diego; Pochet, Nathalie; Robinson, James T.; Demchak, Barry; Hull, Tim; Ben-Artzi, Gil; Blankenberg, Daniel; Barber, Galt P.; Lee, Brian T.; Kuhn, Robert M.; Nekrutenko, Anton; Segal, Eran; Ideker, Trey; Reich, Michael; Regev, Aviv; Chang, Howard Y.; Mesirov, Jill P.

    2015-01-01

    Integrative analysis of multiple data types to address complex biomedical questions requires the use of multiple software tools in concert and remains an enormous challenge for most of the biomedical research community. Here we introduce GenomeSpace (http://www.genomespace.org), a cloud-based, cooperative community resource. Seeded as a collaboration of six of the most popular genomics analysis tools, GenomeSpace now supports the streamlined interaction of 20 bioinformatics tools and data resources. To facilitate the ability of non-programming users’ to leverage GenomeSpace in integrative analysis, it offers a growing set of ‘recipes’, short workflows involving a few tools and steps to guide investigators through high utility analysis tasks. PMID:26780094

  3. l-lysine production by Bacillus methanolicus: Genome-based mutational analysis and l-lysine secretion engineering.

    Science.gov (United States)

    Nærdal, Ingemar; Netzer, Roman; Irla, Marta; Krog, Anne; Heggeset, Tonje Marita Bjerkan; Wendisch, Volker F; Brautaset, Trygve

    2017-02-20

    Bacillus methanolicus is a methylotrophic bacterium with an increasing interest in academic research and for biotechnological applications. This bacterium was previously applied for methanol-based production of l-glutamate, l-lysine and the five-carbon diamine cadaverine by wild type, classical mutant and recombinant strains. The genomes of two different l-lysine secreting B. methanolicus classical mutant strains, NOA2#13A52-8A66 and M168-20, were sequenced. We focused on mutational mapping in genes present in l-lysine and other relevant amino acid biosynthetic pathways, as well as in the primary cell metabolism important for precursor supply. In addition to mutations in the aspartate pathway genes dapG, lysA and hom-1, new mutational target genes like alr, proA, proB1, leuC, odhA and pdhD were identified. Surprisingly, no mutations were found in the putative l-lysine transporter gene lysE(MGA3). Inspection of the wild type B. methanolicus strain PB1 genome sequence identified two homologous putative l-lysine transporter genes, lysE(PB1) and lysE2(PB1). The biological role of these putative l-lysine transporter genes, together with the heterologous l-lysine exporter gene lysE(Cg) from Corynebacterium glutamicum, were therefore investigated. Our results demonstrated that the titer of secreted l-lysine in B. methanolicus was significantly increased by overexpression of lysE(Cg) while overexpression of lysE(MGA3), lysE(PB1) and lysE2(PB1) had no measurable effect.

  4. Analysis of intra-genomic GC content homogeneity within prokaryotes

    DEFF Research Database (Denmark)

    Bohlin, J; Snipen, L; Hardy, S.P.

    2010-01-01

    both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exits when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content......Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome) but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how...... the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content...

  5. Comparative genomic analysis of esophageal cancers.

    Science.gov (United States)

    Caygill, Christine P J; Gatenby, Piers A C; Herceg, Zdenko; Lima, Sheila C S; Pinto, Luis F R; Watson, Anthony; Wu, Ming-Shiang

    2014-09-01

    The following, from the 12th OESO World Conference: Cancers of the Esophagus, includes commentaries on comparative genomic analysis of esophageal cancers: genomic polymorphisms, the genetic and epigenetic drivers in esophageal cancers, and the collection of data in the UK Barrett's Oesophagus Registry.

  6. Genome sequence and analysis of Lactobacillus helveticus

    Directory of Open Access Journals (Sweden)

    Paola eCremonesi

    2013-01-01

    Full Text Available The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of L. helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract.As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones.

  7. Genome-wide analysis correlates Ayurveda Prakriti.

    Science.gov (United States)

    Govindaraj, Periyasamy; Nizamuddin, Sheikh; Sharath, Anugula; Jyothi, Vuskamalla; Rotti, Harish; Raval, Ritu; Nayak, Jayakrishna; Bhat, Balakrishna K; Prasanna, B V; Shintre, Pooja; Sule, Mayura; Joshi, Kalpana S; Dedge, Amrish P; Bharadwaj, Ramachandra; Gangadharan, G G; Nair, Sreekumaran; Gopinath, Puthiya M; Patwardhan, Bhushan; Kondaiah, Paturu; Satyamoorthy, Kapaettu; Valiathan, Marthanda Varma Sankaran; Thangaraj, Kumarasamy

    2015-10-29

    The practice of Ayurveda, the traditional medicine of India, is based on the concept of three major constitutional types (Vata, Pitta and Kapha) defined as "Prakriti". To the best of our knowledge, no study has convincingly correlated genomic variations with the classification of Prakriti. In the present study, we performed genome-wide SNP (single nucleotide polymorphism) analysis (Affymetrix, 6.0) of 262 well-classified male individuals (after screening 3416 subjects) belonging to three Prakritis. We found 52 SNPs (p ≤ 1 × 10(-5)) were significantly different between Prakritis, without any confounding effect of stratification, after 10(6) permutations. Principal component analysis (PCA) of these SNPs classified 262 individuals into their respective groups (Vata, Pitta and Kapha) irrespective of their ancestry, which represent its power in categorization. We further validated our finding with 297 Indian population samples with known ancestry. Subsequently, we found that PGM1 correlates with phenotype of Pitta as described in the ancient text of Caraka Samhita, suggesting that the phenotypic classification of India's traditional medicine has a genetic basis; and its Prakriti-based practice in vogue for many centuries resonates with personalized medicine.

  8. Identification of probable genomic packaging signal sequence from SARS—CoV genome by bioinformatics analysis

    Institute of Scientific and Technical Information of China (English)

    QINLei; XIONGBin; LUOCheng; GUOZong-Ming; HAOPei; SUJiong; NANPeng; FENGYing; SHIYi-Xiang; YUXiao-Jing; LUOXiao-Min; CHENKai-Xian; SHENXu; SHENJian-Hua; ZOUJian-Ping; ZHAOGuo-Ping; SHITie-Liu; HEWei-Zhong; ZHONGYang; JIANGHua-Liang; LIYi-Xue

    2003-01-01

    AIM:To predict the probable genomic packaging signal of SARS-CoV by bioinformatics analysis. The derived packaging signal may be used to design antisense RNA and RNA interfere (RANi) drugs treating SARS. methods: Based on the studies about the genomic packaging signals of MHV and BCoV, especially the information about primary and secondary structures, the putative genomic packaging signal of SARS_CoV were analyzed by using bioinformatic tools. Multi-alignment for the genomic sequences was performed among SARS-CoV,MHV,BCoV, PEDV and HCoV 229E. Secondary structures of RNA sequences were also predicted for the identification fo the possible genomic packaging signals. Meanwhile, the N and M proteins of all five viruses were analyzed to study the evolutionary relationship with genomic packaging signals. RESULTS: The putative genomic packaging signal of SARS-CoV locates at the 3′ end of ORF1b near that of MHV and BCoV, where is the most variable region of this gene. The RNA secondary structure of SARS-CoV genomic packaging signal is very similar to that of MHV and BCoV. The same result was also obtained in studying the genomic packaging signals of PEDV and HCoV 229E. Further more, the genomic sequence multi-alignment indicated that the locations of packaging signals of SARS-CoV, PEDV, and HCoV overlaped each other. It seems that the mutation rate of packaging signal sequences is much higher than the N protein, while only subtle variations for the M protein. CONCLUSIONS: The probable genomic packaging signal of SARS-CoV is analogous to that of MHV and BCoV, with the corresponding secondary RNA structure locating at the similar region of ORF1b. The positions where genomic packaging signals exist have suffered rounds of mutations, which may influence the primary structures of the N and M proteins consequently.

  9. DNA Data Visualization (DDV): Software for Generating Web-Based Interfaces Supporting Navigation and Analysis of DNA Sequence Data of Entire Genomes.

    Science.gov (United States)

    Neugebauer, Tomasz; Bordeleau, Eric; Burrus, Vincent; Brzezinski, Ryszard

    2015-01-01

    Data visualization methods are necessary during the exploration and analysis activities of an increasingly data-intensive scientific process. There are few existing visualization methods for raw nucleotide sequences of a whole genome or chromosome. Software for data visualization should allow the researchers to create accessible data visualization interfaces that can be exported and shared with others on the web. Herein, novel software developed for generating DNA data visualization interfaces is described. The software converts DNA data sets into images that are further processed as multi-scale images to be accessed through a web-based interface that supports zooming, panning and sequence fragment selection. Nucleotide composition frequencies and GC skew of a selected sequence segment can be obtained through the interface. The software was used to generate DNA data visualization of human and bacterial chromosomes. Examples of visually detectable features such as short and long direct repeats, long terminal repeats, mobile genetic elements, heterochromatic segments in microbial and human chromosomes, are presented. The software and its source code are available for download and further development. The visualization interfaces generated with the software allow for the immediate identification and observation of several types of sequence patterns in genomes of various sizes and origins. The visualization interfaces generated with the software are readily accessible through a web browser. This software is a useful research and teaching tool for genetics and structural genomics.

  10. Sequence-Based Analysis of Structural Organization and Composition of the Cultivated Sunflower (Helianthus annuus L. Genome

    Directory of Open Access Journals (Sweden)

    Navdeep Gill

    2014-04-01

    Full Text Available Sunflower is an important oilseed crop, as well as a model system for evolutionary studies, but its 3.6 gigabase genome has proven difficult to assemble, in part because of the high repeat content of its genome. Here we report on the sequencing, assembly, and analyses of 96 randomly chosen BACs from sunflower to provide additional information on the repeat content of the sunflower genome, assess how repetitive elements in the sunflower genome are organized relative to genes, and compare the genomic distribution of these repeats to that found in other food crops and model species. We also examine the expression of transposable element-related transcripts in EST databases for sunflower to determine the representation of repeats in the transcriptome and to measure their transcriptional activity. Our data confirm previous reports in suggesting that the sunflower genome is >78% repetitive. Sunflower repeats share very little similarity to other plant repeats such as those of Arabidopsis, rice, maize and wheat; overall 28% of repeats are “novel” to sunflower. The repetitive sequences appear to be randomly distributed within the sequenced BACs. Assuming the 96 BACs are representative of the genome as a whole, then approximately 5.2% of the sunflower genome comprises non TE-related genic sequence, with an average gene density of 18kbp/gene. Expression levels of these transposable elements indicate tissue specificity and differential expression in vegetative and reproductive tissues, suggesting that expressed TEs might contribute to sunflower development. The assembled BACs will also be useful for assessing the quality of several different draft assemblies of the sunflower genome and for annotating the reference sequence.

  11. Sequence-Based Analysis of Structural Organization and Composition of the Cultivated Sunflower (Helianthus annuus L.) Genome.

    Science.gov (United States)

    Gill, Navdeep; Buti, Matteo; Kane, Nolan; Bellec, Arnaud; Helmstetter, Nicolas; Berges, Hélène; Rieseberg, Loren H

    2014-04-16

    Sunflower is an important oilseed crop, as well as a model system for evolutionary studies, but its 3.6 gigabase genome has proven difficult to assemble, in part because of the high repeat content of its genome. Here we report on the sequencing, assembly, and analyses of 96 randomly chosen BACs from sunflower to provide additional information on the repeat content of the sunflower genome, assess how repetitive elements in the sunflower genome are organized relative to genes, and compare the genomic distribution of these repeats to that found in other food crops and model species. We also examine the expression of transposable element-related transcripts in EST databases for sunflower to determine the representation of repeats in the transcriptome and to measure their transcriptional activity. Our data confirm previous reports in suggesting that the sunflower genome is >78% repetitive. Sunflower repeats share very little similarity to other plant repeats such as those of Arabidopsis, rice, maize and wheat; overall 28% of repeats are "novel" to sunflower. The repetitive sequences appear to be randomly distributed within the sequenced BACs. Assuming the 96 BACs are representative of the genome as a whole, then approximately 5.2% of the sunflower genome comprises non TE-related genic sequence, with an average gene density of 18kbp/gene. Expression levels of these transposable elements indicate tissue specificity and differential expression in vegetative and reproductive tissues, suggesting that expressed TEs might contribute to sunflower development. The assembled BACs will also be useful for assessing the quality of several different draft assemblies of the sunflower genome and for annotating the reference sequence.

  12. Whole genome analysis of a Vietnamese trio

    Indian Academy of Sciences (India)

    Dang Thanh Hai; Nguyen Dai Thanh; Pham Thi Minh Trang; Le Si Quang; Phan Thi Thu Hang; Dang Cao Cuong; Hoang Kim Phuc; Nguyen Huu Duc; Do Duc Dong; Bui Quang Minh; Pham Bao Son; Le Sy Vinh

    2015-03-01

    We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5%) were large indels. There were 6,681 large indels in the range 0.1–100 kbp occurring in the child genome that were also confirmed in either the father or mother genome.We compared these large indels against the DGV database and found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length ≥ 300 bp. There were 235 contigs from the child genome of which 199 (84.7%) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam.

  13. Comparative genomic analysis of eutherian kallikrein genes

    Directory of Open Access Journals (Sweden)

    Marko Premzl

    2017-03-01

    Full Text Available The present study made attempts to update and revise eutherian kallikrein genes implicated in major physiological and pathological processes and in medical molecular diagnostics. Using eutherian comparative genomic analysis protocol and free available genomic sequence assemblies, the tests of reliability of eutherian public genomic sequences annotated most comprehensive curated third party data gene data set of eutherian kallikrein genes including 121 complete coding sequences among 335 potential coding sequences. The present analysis first described 13 major gene clusters of eutherian kallikrein genes, and explained their differential gene expansion patterns. One updated classification and nomenclature of eutherian kallikrein genes was proposed, as new framework of future experiments.

  14. AgBase: a functional genomics resource for agriculture

    Directory of Open Access Journals (Sweden)

    Hill David P

    2006-09-01

    Full Text Available Abstract Background Many agricultural species and their pathogens have sequenced genomes and more are in progress. Agricultural species provide food, fiber, xenotransplant tissues, biopharmaceuticals and biomedical models. Moreover, many agricultural microorganisms are human zoonoses. However, systems biology from functional genomics data is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation and agricultural research communities are smaller with limited funding compared to many model organism communities. Description To facilitate systems biology in these traditionally agricultural species we have established "AgBase", a curated, web-accessible, public resource http://www.agbase.msstate.edu for structural and functional annotation of agricultural genomes. The AgBase database includes a suite of computational tools to use GO annotations. We use standardized nomenclature following the Human Genome Organization Gene Nomenclature guidelines and are currently functionally annotating chicken, cow and sheep gene products using the Gene Ontology (GO. The computational tools we have developed accept and batch process data derived from different public databases (with different accession codes, return all existing GO annotations, provide a list of products without GO annotation, identify potential orthologs, model functional genomics data using GO and assist proteomics analysis of ESTs and EST assemblies. Our journal database helps prevent redundant manual GO curation. We encourage and publicly acknowledge GO annotations from researchers and provide a service for researchers interested in GO and analysis of functional genomics data. Conclusion The AgBase database is the first database dedicated to functional genomics and systems biology analysis for agriculturally important species and their pathogens. We use experimental data to improve structural annotation of genomes and to

  15. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-25

    Since the human genome draft sequence was in public for the first time in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples for large-scale studies of human genome variations: 1) HapMap Data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) Data (940 individuals) (http://www.hagsc.org/hgdp/files.html), 3) 1000 genomes Data (2,504 individuals) http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ If we can integrate all three data into a single volume of data, we should be able to conduct a more detailed analysis of human genome variations for a total number of 4,861 individuals (= 1,417+940+2,504 individuals). In fact, we successfully integrated these three data sets by use of information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that were not possible by an analysis of each of the three data sets. Here, we report the outcome of this kind of big data analyses and discuss evolutionary significance of human genomic variations. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.

  16. Quantitation of Bt-176 maize genomic sequences by surface plasmon resonance-based biospecific interaction analysis of multiplex polymerase chain reaction (PCR).

    Science.gov (United States)

    Feriotto, Giordana; Gardenghi, Sara; Bianchi, Nicoletta; Gambari, Roberto

    2003-07-30

    Surface plasmon resonance (SPR) based biosensors have been described for the identification of genetically modified organisms (GMO) by biospecific interaction analysis (BIA). This paper describes the design and testing of an SPR-based BIA protocol for quantitative determinations of GMOs. Biotinylated multiplex Polymerase Chain Reaction (PCR) products from nontransgenic maize as well as maize powders containing 0.5 and 2% genetically modified Bt-176 sequences were immobilized on different flow cells of a sensor chip. After immobilization, different oligonucleotide probes recognizing maize zein and Bt-176 sequences were injected. The results obtained were compared with Southern blot analysis and with quantitative real-time PCR assays. It was demonstrated that sequential injections of Bt-176 and zein probes to sensor chip flow cells containing multiplex PCR products allow discrimination between PCR performed using maize genomic DNA containing 0.5% Bt-176 sequences and that performed using maize genomic DNA containing 2% Bt-176 sequences. The efficiency of SPR-based BIA in discriminating material containing different amounts of Bt-176 maize is comparable to real-time quantitative PCR and much more reliable than Southern blotting, which in the past has been used for semiquantitative purposes. Furthermore, the approach allows the BIA assay to be repeated several times on the same multiplex PCR product immobilized on the sensor chip, after washing and regeneration of the flow cell. Finally, it is emphasized that the presented strategy to quantify GMOs could be proposed for all of the SPR-based, commercially available biosensors. Some of these optical SPR-based biosensors use, instead of flow-based sensor chips, stirred microcuvettes, reducing the costs of the experimentation.

  17. Genome-wide copy number profiling on high-density bacterial artificial chromosomes, single-nucleotide polymorphisms, and oligonucleotide microarrays: a platform comparison based on statistical power analysis.

    NARCIS (Netherlands)

    Hehir-Kwa, J.Y.; Egmont-Peterson, M.; Janssen, I.M.; Smeets, D.F.C.M.; Geurts van Kessel, A.H.M.; Veltman, J.A.

    2007-01-01

    Recently, comparative genomic hybridization onto bacterial artificial chromosome (BAC) arrays (array-based comparative genomic hybridization) has proved to be successful for the detection of submicroscopic DNA copy-number variations in health and disease. Technological improvements to achieve a high

  18. Mycobacterial species as case-study of comparative genome analysis.

    Science.gov (United States)

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium represents more than 120 species including important pathogens of human and cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostics tools for controlling Mycobacterial diseases. In this present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K—10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435 , M. ulcerans Agy99,and M. vanbaalenii PYR—1, For this purpose a comparison has been done based on their length of genomes, GC content, number of genes in different data bases (Genbank, Refseq, and Prodigal). The BLAST matrix of these genomes has been figured to give a lot of information about the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genome have been defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv which can give a good overview of this genome. And for examining the phylogenetic relationships among these bacteria, a phylogenic tree has been constructed from 16S rRNA gene for tuberculosis and non tuberculosis Mycobacteria to understand the evolutionary events of these species.

  19. An Orthology-Based Analysis of Pathogenic Protozoa Impacting Global Health: An Improved Comparative Genomics Approach with Prokaryotes and Model Eukaryote Orthologs

    Science.gov (United States)

    Cuadrat, Rafael R. C.; da Serra Cruz, Sérgio Manuel; Tschoeke, Diogo Antônio; Silva, Edno; Tosta, Frederico; Jucá, Henrique; Jardim, Rodrigo; Campos, Maria Luiza M.; Mattoso, Marta

    2014-01-01

    Abstract A key focus in 21st century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluate similarities and differences among species, and thus allow the transfer of annotation from known/curated proteins to new/non-annotated ones. We used here a profile-based sensitive methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was used in five protozoan proteomes showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The core protozoa proteome inferred was 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) 31.58% (132/418) belongs to the category J (translation, ribosomal structure, and biogenesis), and 9.81% (41/418) to the category O (post-translational modification, protein turnover, chaperones) using COG; (ii) 21.45% (151/704) belongs to the categories J, and 13.92% (98/704) to the O using KOG. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself for applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools. PMID:24960463

  20. Analysis of intra-genomic GC content homogeneity within prokaryotes

    Directory of Open Access Journals (Sweden)

    Bohlin Jon

    2010-08-01

    Full Text Available Abstract Background Bacterial genomes possess varying GC content (total guanines (Gs and cytosines (Cs per total of the four bases within the genome but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content of the total genome. A low GCVAR indicates intra-genomic GC homogeneity and high GCVAR heterogeneity. Results The regression analyses indicated that GCVAR was significantly associated with domain (i.e. archaea or bacteria, phylum, and oxygen requirement. GCVAR was significantly higher among anaerobes than both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exits when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content was also found but appears to be non-linear and varies greatly among phyla. Conclusions Our findings show that GCVAR is linked with oxygen requirement, while mean genomic GC content is not. We therefore suggest that GCVAR should be used as a complement to mean GC content.

  1. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    Science.gov (United States)

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  2. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics.

    Science.gov (United States)

    Fagerberg, Linn; Hallström, Björn M; Oksvold, Per; Kampf, Caroline; Djureinovic, Dijana; Odeberg, Jacob; Habuka, Masato; Tahmasebpoor, Simin; Danielsson, Angelika; Edlund, Karolina; Asplund, Anna; Sjöstedt, Evelina; Lundberg, Emma; Szigyarto, Cristina Al-Khalili; Skogs, Marie; Takanen, Jenny Ottosson; Berling, Holger; Tegel, Hanna; Mulder, Jan; Nilsson, Peter; Schwenk, Jochen M; Lindskog, Cecilia; Danielsson, Frida; Mardinoglu, Adil; Sivertsson, Asa; von Feilitzen, Kalle; Forsberg, Mattias; Zwahlen, Martin; Olsson, IngMarie; Navani, Sanjay; Huss, Mikael; Nielsen, Jens; Ponten, Fredrik; Uhlén, Mathias

    2014-02-01

    Global classification of the human proteins with regards to spatial expression patterns across organs and tissues is important for studies of human biology and disease. Here, we used a quantitative transcriptomics analysis (RNA-Seq) to classify the tissue-specific expression of genes across a representative set of all major human organs and tissues and combined this analysis with antibody-based profiling of the same tissues. To present the data, we launch a new version of the Human Protein Atlas that integrates RNA and protein expression data corresponding to ∼80% of the human protein-coding genes with access to the primary data for both the RNA and the protein analysis on an individual gene level. We present a classification of all human protein-coding genes with regards to tissue-specificity and spatial expression pattern. The integrative human expression map can be used as a starting point to explore the molecular constituents of the human body.

  3. Mathematical Analysis of Genomic Evolution

    Directory of Open Access Journals (Sweden)

    Cedric Green

    2011-01-01

    Full Text Available Changes in nucleotide sequences, or mutations, accumulate from generation to generation in the genomes of all living organisms. The mutations can be advantageous, deleterious, or neutral. The goal of this project is to determine the amount of advantageous mutations it takes to get human (Homo sapiens DNA from the DNA of genetically distinct organisms. We do this by collecting the genomic data of such organisms, and estimating the amount of mutations it takes to transform yeast (Saccharomyces cerevisiae DNA to the DNA of a human. We calculate the typical number of mutations occurring annually through the organism's average life span and the average mutation rate. This allows us to determine the total number of mutations as well as the probability of advantageous mutations. Not surprisingly, this probability proves to be fairly small. A more precise estimate can be determined by accounting for the differences in the chromosomal structure and phenomena like horizontal gene transfer.

  4. Genome-wide identification of the regulatory targets of a transcription factor using biochemical characterization and computational genomic analysis

    Directory of Open Access Journals (Sweden)

    Jolly Emmitt R

    2005-11-01

    Full Text Available Abstract Background A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.

  5. P2RP: a Web-based framework for the identification and analysis of regulatory proteins in prokaryotic genomes.

    Science.gov (United States)

    Barakat, Mohamed; Ortet, Philippe; Whitworth, David E

    2013-04-20

    Regulatory proteins (RPs) such as transcription factors (TFs) and two-component system (TCS) proteins control how prokaryotic cells respond to changes in their external and/or internal state. Identification and annotation of TFs and TCSs is non-trivial, and between-genome comparisons are often confounded by different standards in annotation. There is a need for user-friendly, fast and convenient tools to allow researchers to overcome the inherent variability in annotation between genome sequences. We have developed the web-server P2RP (Predicted Prokaryotic Regulatory Proteins), which enables users to identify and annotate TFs and TCS proteins within their sequences of interest. Users can input amino acid or genomic DNA sequences, and predicted proteins therein are scanned for the possession of DNA-binding domains and/or TCS domains. RPs identified in this manner are categorised into families, unambiguously annotated, and a detailed description of their features generated, using an integrated software pipeline. P2RP results can then be outputted in user-specified formats. Biologists have an increasing need for fast and intuitively usable tools, which is why P2RP has been developed as an interactive system. As well as assisting experimental biologists to interrogate novel sequence data, it is hoped that P2RP will be built into genome annotation pipelines and re-annotation processes, to increase the consistency of RP annotation in public genomic sequences. P2RP is the first publicly available tool for predicting and analysing RP proteins in users' sequences. The server is freely available and can be accessed along with documentation at http://www.p2rp.org.

  6. Geographical origin of Plasmodium vivax in the Republic of Korea: haplotype network analysis based on the parasite's mitochondrial genome

    Directory of Open Access Journals (Sweden)

    Iwagami Moritoshi

    2010-06-01

    Full Text Available Abstract Background The Republic of Korea (South Korea is one of the countries where vivax malaria had been successfully eradicated by the late 1970s. However, re-emergence of vivax malaria in South Korea was reported in 1993. Several epidemiological studies and some genetic studies using antigenic molecules of Plasmodium vivax in the country have been reported, but the evolutionary history of P. vivax has not been fully understood. In this study, the origin of the South Korean P. vivax population was estimated by molecular phylogeographic analysis. Methods A haplotype network analysis based on P. vivax mitochondrial (mt DNA sequences was conducted on 11 P. vivax isolates from South Korea and another 282 P. vivax isolates collected worldwide. Results The network analysis of P. vivax mtDNA sequences showed that the coexistence of two different groups (A and B in South Korea. Groups A and B were identical or close to two different populations in southern China. Conclusions Although the direct introduction of the two P. vivax populations in South Korea were thought to have been from North Korea, the results of this analysis suggest the genealogical origin to be the two different populations in southern China.

  7. Comparative Genome Analysis in the Integrated Microbial Genomes(IMG) System

    Energy Technology Data Exchange (ETDEWEB)

    Kyrpides, Nikos C.; Markowitz, Victor M.

    2006-03-01

    Comparative genome analysis is critical for the effectiveexploration of a rapidly growing number of complete and draft sequencesfor microbial genomes. The Integrated Microbial Genomes (IMG) system(img.jgi.doe.gov) has been developed as a community resource thatprovides support for comparative analysis of microbial genomes in anintegrated context. IMG allows users to navigate the multidimensionalmicrobial genome data space and focus their analysis on a subset ofgenes, genomes, and functions of interest. IMG provides graphicalviewers, summaries and occurrence profile tools for comparing genes,pathways and functions (terms) across specific genomes. Genes can befurther examined using gene neighborhoods and compared with sequencealignment tools.

  8. A model-based analysis of GC-biased gene conversion in the human and chimpanzee genomes.

    Directory of Open Access Journals (Sweden)

    John A Capra

    Full Text Available GC-biased gene conversion (gBGC is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.

  9. Excised radicle tips as a source of genomic DNA for PCR-based genotyping and melting curve analysis in cotton

    Indian Academy of Sciences (India)

    P Srinivasa Rao; P Sateesh Kumar; Ramesh V Sonti

    2013-03-01

    Genomic DNA isolation in cotton is complicated because of the presence of secondary metabolites that are inhibitory to PCR amplification. We report here that radicle tips, but not other parts of cotton seedlings, yield high-quality DNA that is readily amenable for PCR. The radicle-tip-excised seedlings retain viability because of the formation of adventitious roots. We demonstrate the utility of this method in distinguishing homozygotes from heterozygotes in a cotton breeding population and in hybrid seed purity testing.

  10. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  11. Automated cell analysis tool for a genome-wide RNAi screen with support vector machine based supervised learning

    Science.gov (United States)

    Remmele, Steffen; Ritzerfeld, Julia; Nickel, Walter; Hesser, Jürgen

    2011-03-01

    RNAi-based high-throughput microscopy screens have become an important tool in biological sciences in order to decrypt mostly unknown biological functions of human genes. However, manual analysis is impossible for such screens since the amount of image data sets can often be in the hundred thousands. Reliable automated tools are thus required to analyse the fluorescence microscopy image data sets usually containing two or more reaction channels. The herein presented image analysis tool is designed to analyse an RNAi screen investigating the intracellular trafficking and targeting of acylated Src kinases. In this specific screen, a data set consists of three reaction channels and the investigated cells can appear in different phenotypes. The main issue of the image processing task is an automatic cell segmentation which has to be robust and accurate for all different phenotypes and a successive phenotype classification. The cell segmentation is done in two steps by segmenting the cell nuclei first and then using a classifier-enhanced region growing on basis of the cell nuclei to segment the cells. The classification of the cells is realized by a support vector machine which has to be trained manually using supervised learning. Furthermore, the tool is brightness invariant allowing different staining quality and it provides a quality control that copes with typical defects during preparation and acquisition. A first version of the tool has already been successfully applied for an RNAi-screen containing three hundred thousand image data sets and the SVM extended version is designed for additional screens.

  12. Prediction of operon-like gene clusters in the Arabidopsis thaliana genome based on co-expression analysis of neighboring genes.

    Science.gov (United States)

    Wada, Masayoshi; Takahashi, Hiroki; Altaf-Ul-Amin, Md; Nakamura, Kensuke; Hirai, Masami Y; Ohta, Daisaku; Kanaya, Shigehiko

    2012-07-15

    Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or neighboring positions within immediate vicinity of chromosomal regions. A comprehensive analysis of the expression of neighboring genes therefore accounts a crucial step to reveal the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and provide novel insight into evolutionary aspects acquiring certain biological functions as well. We predicted co-expressed gene clusters by comparing the Pearson correlation coefficient of neighboring genes and randomly selected gene pairs, based on a statistical method that takes false discovery rate (FDR) into consideration for 1469 microarray gene expression datasets of A. thaliana. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated by sequence similarity and functional annotation of genes. Duplicated gene pairs (determined based on BLAST with a cutoff of EOperon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system.

  13. Array-based comparative genomic hybridization is more informative than conventional karyotyping and fluorescence in situ hybridization in the analysis of first-trimester spontaneous abortion

    Directory of Open Access Journals (Sweden)

    Gao Jinsong

    2012-07-01

    Full Text Available Abstract Background Array-based comparative genomic hybridization (aCGH is a new technique for detecting submicroscopic deletions and duplications, and can overcome many of the limitations associated with classic cytogenetic analysis. However, its clinical use in spontaneous abortion needs comprehensive evaluation. We used aCGH to investigate chromosomal imbalances in 100 spontaneous abortions and compared the results with G-banding karyotyping and fluorescence in situ hybridization (FISH. Inconsistent results were verified by quantitative fluorescence PCR. Results Abnormalities were detected in 61 cases. aCGH achieved the highest detection rate (93.4%, 57/61 compared with traditional karyotyping (77%, 47/61 and FISH analysis (68.9%, 42/61. aCGH identified all chromosome abnormalities reported by traditional karyotyping and interphase FISH analysis, with the exception of four triploids. It also detected three additional aneuploidy cases in 37 specimens with ‘normal’ karyotypes, one mosaicism and 10 abnormalities in 14 specimens that failed to grow in vitro. Conclusions aCGH analysis circumvents many limitations in traditional karyotyping or FISH. The accuracy and efficiency of aCGH in spontaneous abortions highlights its clinical usefulness for the future. As aborted tissues have the potential to be contaminated with maternal cells, the threshold value of detection in aCGH should be lowered to avoid false negatives.

  14. Pathway-based genome-wide association analysis of milk coagulation properties, curd firmness, cheese yield, and curd nutrient recovery in dairy cattle.

    Science.gov (United States)

    Dadousis, C; Pegolo, S; Rosa, G J M; Gianola, D; Bittante, G; Cecchinato, A

    2017-02-01

    It is becoming common to complement genome-wide association studies (GWAS) with gene-set enrichment analysis to deepen the understanding of the biological pathways affecting quantitative traits. Our objective was to conduct a gene ontology and pathway-based analysis to identify possible biological mechanisms involved in the regulation of bovine milk technological traits: coagulation properties, curd firmness modeling, individual cheese yield (CY), and milk nutrient recovery into the curd (REC) or whey loss traits. Results from 2 previous GWAS studies using 1,011 cows genotyped for 50k single nucleotide polymorphisms were used. Overall, the phenotypes analyzed consisted of 3 traditional milk coagulation property measures [RCT: rennet coagulation time defined as the time (min) from addition of enzyme to the beginning of coagulation; k20: the interval (min) from RCT to the time at which a curd firmness of 20 mm is attained; a30: a measure of the extent of curd firmness (mm) 30 min after coagulant addition], 6 curd firmness modeling traits [RCTeq: RCT estimated through the CF equation (min); CFP: potential asymptotic curd firmness (mm); kCF: curd-firming rate constant (% × min(-1)); kSR: syneresis rate constant (% × min(-1)); CFmax: maximum curd firmness (mm); and tmax: time to CFmax (min)], 3 individual CY-related traits expressing the weight of fresh curd (%CYCURD), curd solids (%CYSOLIDS), and curd moisture (%CYWATER) as a percentage of weight of milk processed and 4 milk nutrient and energy recoveries in the curd (RECFAT, RECPROTEIN, RECSOLIDS, and RECENERGY calculated as the % ratio between the nutrient in curd and the corresponding nutrient in processed milk), milk pH, and protein percentage. Each trait was analyzed separately. In total, 13,269 annotated genes were used in the analysis. The Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway databases were queried for enrichment analyses. Overall, 21 Gene Ontology and 17 Kyoto Encyclopedia of

  15. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  16. Genome-based analysis of virulence genes in a non-biofilm-forming Staphylococcus epidermidis strain (ATCC 12228).

    Science.gov (United States)

    Zhang, Yue-Qing; Ren, Shuang-Xi; Li, Hua-Lin; Wang, Yong-Xiang; Fu, Gang; Yang, Jian; Qin, Zhi-Qiang; Miao, You-Gang; Wang, Wen-Yi; Chen, Run-Sheng; Shen, Yan; Chen, Zhu; Yuan, Zheng-Hong; Zhao, Guo-Ping; Qu, Di; Danchin, Antoine; Wen, Yu-Mei

    2003-09-01

    Staphylococcus epidermidis strains are diverse in their pathogenicity; some are invasive and cause serious nosocomial infections, whereas others are non-pathogenic commensal organisms. To analyse the implications of different virulence factors in Staphylococcus epidermidis infections, the complete genome of Staphylococcus epidermidis strain ATCC 12228, a non-biofilm forming, non-infection associated strain used for detection of residual antibiotics in food products, was sequenced. This strain showed low virulence by mouse and rat experimental infections. The genome consists of a single 2499 279 bp chromosome and six plasmids. The chromosomal G + C content is 32.1% and 2419 protein coding sequences (CDS) are predicted, among which 230 are putative novel genes. Compared to the virulence factors in Staphylococcus aureus, aside from delta-haemolysin and beta-haemolysin, other toxin genes were not found. In contrast, the majority of adhesin genes are intact in ATCC 12228. Most strikingly, the ica operon coding for the enzymes synthesizing interbacterial cellular polysaccharide is missing in ATCC 12228 and rearrangements of adjacent genes are shown. No mec genes, IS256, IS257, were found in ATCC 12228. It is suggested that the absence of the ica operon is a genetic marker in commensal Staphylococcus epidermidis strains which are less likely to become invasive.

  17. AcCNET (Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks.

    Science.gov (United States)

    Lanza, Val F; Baquero, Fernando; de la Cruz, Fernando; Coque, Teresa M

    2017-01-15

    AcCNET (Accessory genome Constellation Network) is a Perl application that aims to compare accessory genomes of a large number of genomic units, both at qualitative and quantitative levels. Using the proteomes extracted from the analysed genomes, AcCNET creates a bipartite network compatible with standard network analysis platforms. AcCNET allows merging phylogenetic and functional information about the concerned genomes, thus improving the capability of current methods of network analysis. The AcCNET bipartite network opens a new perspective to explore the pangenome of bacterial species, focusing on the accessory genome behind the idiosyncrasy of a particular strain and/or population.

  18. Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions

    Directory of Open Access Journals (Sweden)

    Villegas Andre

    2010-09-01

    Full Text Available Abstract Background The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq. Results Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and core genome and single nucleotide polymorphisms (SNPs of six L. monocytogenes strains were extracted with Panseq and hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST. The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100 loci set among 10 strains in 1 s, compared to the 449 s required to exhaustively search for all possible combinations; it also found the most discriminatory 20 loci from a 96 loci E. coli O157:H7 SNP dataset. Conclusion Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs

  19. Genome-scale identification of cell-wall related genes in Arabidopsis based on co-expression network analysis

    Directory of Open Access Journals (Sweden)

    Wang Shan

    2012-08-01

    Full Text Available Abstract Background Identification of the novel genes relevant to plant cell-wall (PCW synthesis represents a highly important and challenging problem. Although substantial efforts have been invested into studying this problem, the vast majority of the PCW related genes remain unknown. Results Here we present a computational study focused on identification of the novel PCW genes in Arabidopsis based on the co-expression analyses of transcriptomic data collected under 351 conditions, using a bi-clustering technique. Our analysis identified 217 highly co-expressed gene clusters (modules under some experimental conditions, each containing at least one gene annotated as PCW related according to the Purdue Cell Wall Gene Families database. These co-expression modules cover 349 known/annotated PCW genes and 2,438 new candidates. For each candidate gene, we annotated the specific PCW synthesis stages in which it is involved and predicted the detailed function. In addition, for the co-expressed genes in each module, we predicted and analyzed their cis regulatory motifs in the promoters using our motif discovery pipeline, providing strong evidence that the genes in each co-expression module are transcriptionally co-regulated. From the all co-expression modules, we infer that 108 modules are related to four major PCW synthesis components, using three complementary methods. Conclusions We believe our approach and data presented here will be useful for further identification and characterization of PCW genes. All the predicted PCW genes, co-expression modules, motifs and their annotations are available at a web-based database: http://csbl.bmb.uga.edu/publications/materials/shanwang/CWRPdb/index.html.

  20. The Hamilton Depression Scale (HAM-D) and the Montgomery-Åsberg Depression Scale (MADRS). A psychometric re-analysis of the European Genome-Based Therapeutic Drugs for Depression Study using Rasch analysis

    DEFF Research Database (Denmark)

    Bech, Per; Allerup, Peter; Larsen, Erik Roj

    2014-01-01

    The objective of this re-analysis of the European Genome-Based Therapeutic Drugs for Depression Study (GENDEP) was to psychometrically test the unidimensionality of the full Montgomery Åsberg Depression Rating Scale (MADRS10) and the Hamilton Depression Scale (HAM-D17) versus their respective...... 2030 had to be supplemented with the Friedman two-way analysis of variance by ranks. The HAM-D6 but not the MADRS5 was accepted. It was therefore concluded that the HAM-D6 is a psychometrically valid outcome scale to measure change in clinical trials of antidepressants....

  1. Genome wide copy number analysis of single cells

    Science.gov (United States)

    Baslan, Timour; Kendall, Jude; Rodgers, Linda; Cox, Hilary; Riggs, Mike; Stepansky, Asya; Troge, Jennifer; Ravi, Kandasamy; Esposito, Diane; Lakshmi, B.; Wigler, Michael; Navin, Nicholas; Hicks, James

    2016-01-01

    Summary Copy number variation (CNV) is increasingly recognized as an important contributor to phenotypic variation in health and disease. Most methods for determining CNV rely on admixtures of cells, where information regarding genetic heterogeneity is lost. Here, we present a protocol that allows for the genome wide copy number analysis of single nuclei isolated from mixed populations of cells. Single nucleus sequencing (SNS), combines flow sorting of single nuclei based on DNA content, whole genome amplification (WGA), followed by next generation sequencing to quantize genomic intervals in a genome wide manner. Multiplexing of single cells is discussed. Additionally, we outline informatic approaches that correct for biases inherent in the WGA procedure and allow for accurate determination of copy number profiles. All together, the protocol takes ~3 days from flow cytometry to sequence-ready DNA libraries. PMID:22555242

  2. AGAPE (Automated Genome Analysis PipelinE for pan-genome analysis of Saccharomyces cerevisiae.

    Directory of Open Access Journals (Sweden)

    Giltae Song

    Full Text Available The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

  3. AGAPE (Automated Genome Analysis PipelinE) for pan-genome analysis of Saccharomyces cerevisiae.

    Science.gov (United States)

    Song, Giltae; Dickins, Benjamin J A; Demeter, Janos; Engel, Stacia; Gallagher, Jennifer; Choe, Kisurb; Dunn, Barbara; Snyder, Michael; Cherry, J Michael

    2015-01-01

    The characterization and public release of genome sequences from thousands of organisms is expanding the scope for genetic variation studies. However, understanding the phenotypic consequences of genetic variation remains a challenge in eukaryotes due to the complexity of the genotype-phenotype map. One approach to this is the intensive study of model systems for which diverse sources of information can be accumulated and integrated. Saccharomyces cerevisiae is an extensively studied model organism, with well-known protein functions and thoroughly curated phenotype data. To develop and expand the available resources linking genomic variation with function in yeast, we aim to model the pan-genome of S. cerevisiae. To initiate the yeast pan-genome, we newly sequenced or re-sequenced the genomes of 25 strains that are commonly used in the yeast research community using advanced sequencing technology at high quality. We also developed a pipeline for automated pan-genome analysis, which integrates the steps of assembly, annotation, and variation calling. To assign strain-specific functional annotations, we identified genes that were not present in the reference genome. We classified these according to their presence or absence across strains and characterized each group of genes with known functional and phenotypic features. The functional roles of novel genes not found in the reference genome and associated with strains or groups of strains appear to be consistent with anticipated adaptations in specific lineages. As more S. cerevisiae strain genomes are released, our analysis can be used to collate genome data and relate it to lineage-specific patterns of genome evolution. Our new tool set will enhance our understanding of genomic and functional evolution in S. cerevisiae, and will be available to the yeast genetics and molecular biology community.

  4. Overrepresentation of glutamate signaling in Alzheimer's disease: network-based pathway enrichment using meta-analysis of genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Eduardo Pérez-Palma

    Full Text Available Genome-wide association studies (GWAS have successfully identified several risk loci for Alzheimer's disease (AD. Nonetheless, these loci do not explain the entire susceptibility of the disease, suggesting that other genetic contributions remain to be identified. Here, we performed a meta-analysis combining data of 4,569 individuals (2,540 cases and 2,029 healthy controls derived from three publicly available GWAS in AD and replicated a broad genomic region (>248,000 bp associated with the disease near the APOE/TOMM40 locus in chromosome 19. To detect minor effect size contributions that could help to explain the remaining genetic risk, we conducted network-based pathway analyses either by extracting gene-wise p-values (GW, defined as the single strongest association signal within a gene, or calculated a more stringent gene-based association p-value using the extended Simes (GATES procedure. Comparison of these strategies revealed that ontological sub-networks (SNs involved in glutamate signaling were significantly overrepresented in AD (p<2.7×10(-11, p<1.9×10(-11; GW and GATES, respectively. Notably, glutamate signaling SNs were also found to be significantly overrepresented (p<5.1×10(-8 in the Alzheimer's disease Neuroimaging Initiative (ADNI study, which was used as a targeted replication sample. Interestingly, components of the glutamate signaling SNs are coordinately expressed in disease-related tissues, which are tightly related to known pathological hallmarks of AD. Our findings suggest that genetic variation within glutamate signaling contributes to the remaining genetic risk of AD and support the notion that functional biological networks should be targeted in future therapies aimed to prevent or treat this devastating neurological disorder.

  5. Genome-wide single nucleotide polymorphism-based assay for high-resolution epidemiological analysis of the methicillin-resistant Staphylococcus aureus hospital clone EMRSA-15.

    Science.gov (United States)

    Holmes, A; McAllister, G; McAdam, P R; Hsien Choi, S; Girvan, K; Robb, A; Edwards, G; Templeton, K; Fitzgerald, J R

    2014-02-01

    The EMRSA-15 clone is a major cause of nosocomial methicillin-resistant Staphylococcus aureus (MRSA) infections in the UK and elsewhere but existing typing methodologies have limited capacity to discriminate closely related strains, and are often poorly reproducible between laboratories. Here, we report the design, development and validation of a genome-wide single nucleotide polymorphism (SNP) typing method and compare it to established methods for typing of EMRSA-15. In order to identify discriminatory SNPs, the genomes of 17 EMRSA-15 strains, selected to represent the breadth of genotypic and phenotypic diversity of EMRSA-15 isolates in Scotland, were determined and phylogenetic reconstruction was carried out. In addition to 17 phylogenetically informative SNPs, five binary markers were included to form the basis of an EMRSA-15 genotyping assay. The SNP-based typing assay was as discriminatory as pulsed-field gel electrophoresis, and significantly more discriminatory than staphylococcal protein A (spa) typing for typing of a representative panel of diverse EMRSA-15 strains, isolates from two EMRSA-15 hospital outbreak investigations, and a panel of bacteraemia isolates obtained in healthcare facilities in the east of Scotland during a 12-month period. The assay is a rapid, and reproducible approach for epidemiological analysis of EMRSA-15 clinical isolates in Scotland. Unlike established methods the DNA sequence-based method is ideally suited for inter-laboratory comparison of identified genotypes, and its flexibility lends itself to supplementation with additional SNPs or markers for the identification of novel S. aureus strains in other regions of the world.

  6. Machine-Learning-Based Analysis in Genome-Edited Cells Reveals the Efficiency of Clathrin-Mediated Endocytosis

    Directory of Open Access Journals (Sweden)

    Sun Hae Hong

    2015-09-01

    Full Text Available Cells internalize various molecules through clathrin-mediated endocytosis (CME. Previous live-cell imaging studies suggested that CME is inefficient, with about half of the events terminated. These CME efficiency estimates may have been confounded by overexpression of fluorescently tagged proteins and inability to filter out false CME sites. Here, we employed genome editing and machine learning to identify and analyze authentic CME sites. We examined CME dynamics in cells that express fluorescent fusions of two defining CME proteins, AP2 and clathrin. Support vector machine classifiers were built to identify and analyze authentic CME sites. From inception until disappearance, authentic CME sites contain both AP2 and clathrin, have the same degree of limited mobility, continue to accumulate AP2 and clathrin over lifetimes >∼20 s, and almost always form vesicles as assessed by dynamin2 recruitment. Sites that contain only clathrin or AP2 show distinct dynamics, suggesting they are not part of the CME pathway.

  7. A process for analysis of microarray comparative genomics hybridisation studies for bacterial genomes

    Directory of Open Access Journals (Sweden)

    Woodward Martin J

    2008-01-01

    Full Text Available Abstract Background Microarray based comparative genomic hybridisation (CGH experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.

  8. Dating the age of admixture via wavelet transform analysis of genome-wide data

    NARCIS (Netherlands)

    I. Pugach (Irina); R. Matveyev (Rostislav); A. Wollstein (Andreas); M.H. Kayser (Manfred); M. Stoneking (Mark)

    2011-01-01

    textabstractWe describe a PCA-based genome scan approach to analyze genome-wide admixture structure, and introduce wavelet transform analysis as a method for estimating the time of admixture. We test the wavelet transform method with simulations and apply it to genome-wide SNP data from eight admixe

  9. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    Directory of Open Access Journals (Sweden)

    Feltus Frank A

    2011-07-01

    Full Text Available Abstract Background Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18 to duodecaploid (12X = 108. Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective. Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions The construction of the first switchgrass BAC library and comparative analysis of

  10. Complete sequence analysis of 18S rDNA based on genomic DNA extraction from individual Demodex mites (Acari: Demodicidae).

    Science.gov (United States)

    Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang

    2012-05-01

    The study for the first time attempted to accomplish 18S ribosomal DNA (rDNA) complete sequence amplification and analysis for three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) based on gDNA extraction from individual mites. The mites were treated by DNA Release Additive and Hot Start II DNA Polymerase so as to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield reached the highest at 1 mite, tending to descend with the increase of mite number. The individual mite gDNA was successfully used for 18S rDNA fragment (about 900 bp) amplification examination. The alignments of 18S rDNA complete sequences of individual mite samples and those of pooled mite samples ( ≥ 1000mites/sample) showed over 97% identities for each species, indicating that the gDNA extracted from a single individual mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic tree could not effectively identify the three Demodex species, largely due to the differentiation in the D. canis isolates. It can be concluded that the individual Demodex mite gDNA can satisfy the molecular study of Demodex. 18S rDNA complete sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the ascertainment of the types of Demodex mites parasitizing in dogs.

  11. Integrative bayesian network analysis of genomic data.

    Science.gov (United States)

    Ni, Yang; Stingo, Francesco C; Baladandayuthapani, Veerabhadran

    2014-01-01

    Rapid development of genome-wide profiling technologies has made it possible to conduct integrative analysis on genomic data from multiple platforms. In this study, we develop a novel integrative Bayesian network approach to investigate the relationships between genetic and epigenetic alterations as well as how these mutations affect a patient's clinical outcome. We take a Bayesian network approach that admits a convenient decomposition of the joint distribution into local distributions. Exploiting the prior biological knowledge about regulatory mechanisms, we model each local distribution as linear regressions. This allows us to analyze multi-platform genome-wide data in a computationally efficient manner. We illustrate the performance of our approach through simulation studies. Our methods are motivated by and applied to a multi-platform glioblastoma dataset, from which we reveal several biologically relevant relationships that have been validated in the literature as well as new genes that could potentially be novel biomarkers for cancer progression.

  12. Hyperstructures, genome analysis and I-cells

    DEFF Research Database (Denmark)

    Amar, P.; Ballet, P.; Barlovatz-Meimon, G.

    2002-01-01

    New concepts may prove necessary to profit from the avalanche of sequence data on the genome, transcriptome, proteome and interactome and to relate this information to cell physiology. Here, we focus on the concept of large activity-based structures, or hyperstructures, in which a variety of type...

  13. Global genomic diversity of human papillomavirus 6 based on 724 isolates and 190 complete genome sequences.

    Science.gov (United States)

    Jelen, Mateja M; Chen, Zigui; Kocjan, Boštjan J; Burt, Felicity J; Chan, Paul K S; Chouhy, Diego; Combrinck, Catharina E; Coutlée, François; Estrade, Christine; Ferenczy, Alex; Fiander, Alison; Franco, Eduardo L; Garland, Suzanne M; Giri, Adriana A; González, Joaquín Víctor; Gröning, Arndt; Heidrich, Kerstin; Hibbitts, Sam; Hošnjak, Lea; Luk, Tommy N M; Marinic, Karina; Matsukura, Toshihiko; Neumann, Anna; Oštrbenk, Anja; Picconi, Maria Alejandra; Richardson, Harriet; Sagadin, Martin; Sahli, Roland; Seedat, Riaz Y; Seme, Katja; Severini, Alberto; Sinchi, Jessica L; Smahelova, Jana; Tabrizi, Sepehr N; Tachezy, Ruth; Tohme, Sarah; Uloza, Virgilijus; Vitkauskiene, Astra; Wong, Yong Wee; Zidovec Lepej, Snježana; Burk, Robert D; Poljak, Mario

    2014-07-01

    Human papillomavirus type 6 (HPV6) is the major etiological agent of anogenital warts and laryngeal papillomas and has been included in both the quadrivalent and nonavalent prophylactic HPV vaccines. This study investigated the global genomic diversity of HPV6, using 724 isolates and 190 complete genomes from six continents, and the association of HPV6 genomic variants with geographical location, anatomical site of infection/disease, and gender. Initially, a 2,800-bp E5a-E5b-L1-LCR fragment was sequenced from 492/530 (92.8%) HPV6-positive samples collected for this study. Among them, 130 exhibited at least one single nucleotide polymorphism (SNP), indel, or amino acid change in the E5a-E5b-L1-LCR fragment and were sequenced in full. A global alignment and maximum likelihood tree of 190 complete HPV6 genomes (130 fully sequenced in this study and 60 obtained from sequence repositories) revealed two variant lineages, A and B, and five B sublineages: B1, B2, B3, B4, and B5. HPV6 (sub)lineage-specific SNPs and a 960-bp representative region for whole-genome-based phylogenetic clustering within the L2 open reading frame were identified. Multivariate logistic regression analysis revealed that lineage B predominated globally. Sublineage B3 was more common in Africa and North and South America, and lineage A was more common in Asia. Sublineages B1 and B3 were associated with anogenital infections, indicating a potential lesion-specific predilection of some HPV6 sublineages. Females had higher odds for infection with sublineage B3 than males. In conclusion, a global HPV6 phylogenetic analysis revealed the existence of two variant lineages and five sublineages, showing some degree of ethnogeographic, gender, and/or disease predilection in their distribution. This study established the largest database of globally circulating HPV6 genomic variants and contributed a total of 130 new, complete HPV6 genome sequences to available sequence repositories. Two HPV6 variant lineages

  14. Pan-genome multilocus sequence typing and outbreak-specific reference-based single nucleotide polymorphism analysis to resolve two concurrent Staphylococcus aureus outbreaks in neonatal services.

    Science.gov (United States)

    Roisin, S; Gaudin, C; De Mendonça, R; Bellon, J; Van Vaerenbergh, K; De Bruyne, K; Byl, B; Pouseele, H; Denis, O; Supply, P

    2016-06-01

    We used a two-step whole genome sequencing analysis for resolving two concurrent outbreaks in two neonatal services in Belgium, caused by exfoliative toxin A-encoding-gene-positive (eta+) methicillin-susceptible Staphylococcus aureus with an otherwise sporadic spa-type t209 (ST-109). Outbreak A involved 19 neonates and one healthcare worker in a Brussels hospital from May 2011 to October 2013. After a first episode interrupted by decolonization procedures applied over 7 months, the outbreak resumed concomitantly with the onset of outbreak B in a hospital in Asse, comprising 11 neonates and one healthcare worker from mid-2012 to January 2013. Pan-genome multilocus sequence typing, defined on the basis of 42 core and accessory reference genomes, and single-nucleotide polymorphisms mapped on an outbreak-specific de novo assembly were used to compare 28 available outbreak isolates and 19 eta+/spa-type t209 isolates identified by routine or nationwide surveillance. Pan-genome multilocus sequence typing showed that the outbreaks were caused by independent clones not closely related to any of the surveillance isolates. Isolates from only ten cases with overlapping stays in outbreak A, including four pairs of twins, showed no or only a single nucleotide polymorphism variation, indicating limited sequential transmission. Detection of larger genomic variation, even from the start of the outbreak, pointed to sporadic seeding from a pre-existing exogenous source, which persisted throughout the whole course of outbreak A. Whole genome sequencing analysis can provide unique fine-tuned insights into transmission pathways of complex outbreaks even at their inception, which, with timely use, could valuably guide efforts for early source identification.

  15. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    and dhurrin, which have not previously been characterized in blueberries. There are more than 44,500 spider species with distinct habitats and unique characteristics. Spiders are masters of producing silk webs to catch prey and using venom to neutralize. The exploration of the genetics behind these properties...... japonicus (Lotus), Vaccinium corymbosum (blueberry), Stegodyphus mimosarum (spider) and Trifolium occidentale (clover). From a bioinformatics data analysis perspective, my work can be divided into three parts; genome annotation, small RNA, and gene expression analysis. Lotus is a legume of significant...... has just started. We have assembled and annotated the first two spider genomes to facilitate our understanding of spiders at the molecular level. The need for analyzing the large and increasing amount of sequencing data has increased the demand for efficient, user friendly, and broadly applicable...

  16. Bio-informatics Research Progress in the Post-genome Era Based on the Quantitative Analysis of SCIE

    Institute of Scientific and Technical Information of China (English)

    Yongqin; ZHAN; Min; YU

    2013-01-01

    SCIE paper output can reflect the status quo and trend of discipline research and 7 038 scientific articles concerning bioinformatics are retrieved in SCIE database during the years between 2008 and 2012. Quantitative analysis of paper output and citation frequency are conducted according to nations, institutions, publications, research direction as well as hot articles, which provides assistance for bioinformatics researchers to understand the present situation of this subject, carry out cooperative studies and display scientific research achievements.

  17. CFGP: a web-based, comparative fungal genomics platform.

    Science.gov (United States)

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  18. StellaBase: The Nematostella vectensis Genomics Database

    OpenAIRE

    James C Sullivan; Ryan, Joseph F; Watson, James A.; Webb, Jeramy; Mullikin, James C; Rokhsar, Daniel; Finnerty, John R

    2005-01-01

    StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions...

  19. Quantitative Mass Spectrometry-Based Analysis of β-D-Glucosyl-5-Hydroxymethyluracil in Genomic DNA of Trypanosoma brucei

    Science.gov (United States)

    Liu, Shuo; Ji, Debin; Cliffe, Laura; Sabatini, Robert; Wang, Yinsheng

    2014-10-01

    β-D-glucosyl-5-hydroxymethyluracil (base J) is a hyper-modified nucleobase found in the nuclear DNA of kinetoplastid parasites. With replacement of a fraction of thymine in DNA, J is localized primarily in telomeric regions of all organisms carrying this modified base. The biosynthesis of J occurs in two putative steps: first, a specific thymine in DNA is recognized and converted into 5-hydroxymethyluracil (5-HmU) by J-binding proteins (JBP1 and JBP2); a glucosyl transferase (GT) subsequently glucosylates the 5-HmU to yield J. Although several recent studies revealed the roles of internal J in regulating transcription in kinetoplastids, functions of telomeric J and proteins involved in J synthesis remain elusive. Assessing the functions of base J and understanding fully its biosynthesis necessitate the measurement of its level in cells and organisms. In this study, we reported a reversed-phase HPLC coupled with tandem mass spectrometry (LC-MS/MS) method, together with the use of a surrogate internal standard (β-D-glucosyl-5-hydroxymethyl-2'-deoxycytidine, 5-gHmdC), for the accurate detection of β-D-glucosyl-5-hydroxymethyl-2'-deoxyuridine (dJ) in Trypanosoma brucei DNA. For comparison, we also measured the level of the precursor for dJ synthesis [i.e. 5-hydroxymethyl-2'-deoxyuridine (5-HmdU)]. We found that base J was not detectable in the JBP-null cells whereas it replaced approximately 0.5% thymine in wild-type cells, which was accompanied with a markedly decreased level of 5-HmdU in JBP1/JBP2-null strain relative to the wild-type strain. These results provided direct evidence supporting that JBP proteins play an important role in oxidizing thymidine to form 5-HmdU, which facilitated the generation of dJ. This is the first report about the application of LC-MS/MS for the quantification of base J. The analytical method built a solid foundation for dissecting the molecular mechanisms of J biosynthesis and assessing the biological functions of base J in the

  20. Sequencing and analysis of the giant panda genome

    Institute of Scientific and Technical Information of China (English)

    YANG HuanMing

    2010-01-01

    @@ The giant panda (Ailuropoda melanoleuca) is loved all over the world and is considered a symbol of China, as illustrated by its being one of the mascots for the Beijing 2008 Olympic Games.It is also one of the world's most endangered animals and a flagship species for conservation.Using next-generation sequencing technology (Illumina Genome Analyzer) and our in-house assembly software, we have generated the first map of the giant panda genome sequence.This map will provide an unparalleled amount of information to aid in understanding the genetic and biological nature of this unique species and will contribute significantly to disease control and conservation efforts for this endangered species.In March 2008, the giant panda genome sequencing and analysis project was started at the Beijing Genomics Institute (BGI) in Shenzhen with collaborators from the Kunming Institute of Zoology and the Chengdu Research Base of Giant Panda Breeding.On 21 Jan.2010, this collaboration resulted in the publication, as a cover story in the journal Nature, of the sequencing and analysis of the giant panda genome.

  1. Sequence analysis reveals mosaic genome of Aichi virus

    Directory of Open Access Journals (Sweden)

    Han Xiaohong

    2011-08-01

    Full Text Available Abstract Aichi virus is a positive-sense and single-stranded RNA virus, which demonstrated to be related to diarrhea of Children. In the present study, phylogenetic and recombination analysis based on the Aichi virus complete genomes available in GenBank reveal a mosaic genome sequence [GenBank: FJ890523], of which the nt 261-852 region (the nt position was based on the aligned sequence file shows close relationship with AB010145/Japan with 97.9% sequence identity, while the other genomic regions show close relationship with AY747174/German with 90.1% sequence identity. Our results will provide valuable hints for future research on Aichi virus diversity. Aichi virus is a member of the Kobuvirus genus of the Picornaviridae family 12 and belongs to a positive-sense and single-stranded RNA virus. Its presence in fecal specimens of children suffering from diarrhea has been demonstrated in several Asian countries 3456, in Brazil and German 7, in France 8 and in Tunisia 9. Some reports showed the high level of seroprevalence in adults 710, suggesting the widespread exposure to Aichi virus during childhood. The genome of Aichi virus contains 8,280 nucleotides and a poly(A tail. The single large open reading frame (nt 713-8014 according to the strain AB010145 encodes a polyprotein of 2,432 amino acids that is cleaved into the typical picornavirus structural proteins VP0, VP3, VP1, and nonstructural proteins 2A, 2B, 2C, 3A, 3B, 3C and 3D 211. Based on the phylogenetic analysis of 519-bp sequences at the 3C-3D (3CD junction, Aichi viruses can be divided into two genotypes A and B with approximately 90% sequence homology 12. Although only six complete genomes of Aichi virus were deposited in GenBank at present, mosaic genomes can be found in strains from different countries.

  2. Comparative genomic analysis of soybean flowering genes.

    Directory of Open Access Journals (Sweden)

    Chol-Hee Jung

    Full Text Available Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant

  3. A comprehensive analysis of bilaterian mitochondrial genomes and phylogeny.

    Science.gov (United States)

    Bernt, Matthias; Bleidorn, Christoph; Braband, Anke; Dambach, Johannes; Donath, Alexander; Fritzsch, Guido; Golombek, Anja; Hadrys, Heike; Jühling, Frank; Meusemann, Karen; Middendorf, Martin; Misof, Bernhard; Perseke, Marleen; Podsiadlowski, Lars; von Reumont, Björn; Schierwater, Bernd; Schlegel, Martin; Schrödl, Michael; Simon, Sabrina; Stadler, Peter F; Stöger, Isabella; Struck, Torsten H

    2013-11-01

    About 2800 mitochondrial genomes of Metazoa are present in NCBI RefSeq today, two thirds belonging to vertebrates. Metazoan phylogeny was recently challenged by large scale EST approaches (phylogenomics), stabilizing classical nodes while simultaneously supporting new sister group hypotheses. The use of mitochondrial data in deep phylogeny analyses was often criticized because of high substitution rates on nucleotides, large differences in amino acid substitution rate between taxa, and biases in nucleotide frequencies. Nevertheless, mitochondrial genome data might still be promising as it allows for a larger taxon sampling, while presenting a smaller amount of sequence information. We present the most comprehensive analysis of bilaterian relationships based on mitochondrial genome data. The analyzed data set comprises more than 650 mitochondrial genomes that have been chosen to represent a profound sample of the phylogenetic as well as sequence diversity. The results are based on high quality amino acid alignments obtained from a complete reannotation of the mitogenomic sequences from NCBI RefSeq database. However, the results failed to give support for many otherwise undisputed high-ranking taxa, like Mollusca, Hexapoda, Arthropoda, and suffer from extreme long branches of Nematoda, Platyhelminthes, and some other taxa. In order to identify the sources of misleading phylogenetic signals, we discuss several problems associated with mitochondrial genome data sets, e.g. the nucleotide and amino acid landscapes and a strong correlation of gene rearrangements with long branches.

  4. Comparative genomics of Mycoplasma: analysis of conserved essential genes and diversity of the pan-genome.

    Directory of Open Access Journals (Sweden)

    Wei Liu

    Full Text Available Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we finally revealed the conserved essential genes set for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacteria survival especially over long period natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.

  5. Genome Assembly and Computational Analysis Pipelines for Bacterial Pathogens

    KAUST Repository

    Rangkuti, Farania Gama Ardhina

    2011-06-01

    Pathogens lie behind the deadliest pandemics in history. To date, AIDS pandemic has resulted in more than 25 million fatal cases, while tuberculosis and malaria annually claim more than 2 million lives. Comparative genomic analyses are needed to gain insights into the molecular mechanisms of pathogens, but the abundance of biological data dictates that such studies cannot be performed without the assistance of computational approaches. This explains the significant need for computational pipelines for genome assembly and analyses. The aim of this research is to develop such pipelines. This work utilizes various bioinformatics approaches to analyze the high-­throughput genomic sequence data that has been obtained from several strains of bacterial pathogens. A pipeline has been compiled for quality control for sequencing and assembly, and several protocols have been developed to detect contaminations. Visualization has been generated of genomic data in various formats, in addition to alignment, homology detection and sequence variant detection. We have also implemented a metaheuristic algorithm that significantly improves bacterial genome assemblies compared to other known methods. Experiments on Mycobacterium tuberculosis H37Rv data showed that our method resulted in improvement of N50 value of up to 9697% while consistently maintaining high accuracy, covering around 98% of the published reference genome. Other improvement efforts were also implemented, consisting of iterative local assemblies and iterative correction of contiguated bases. Our result expedites the genomic analysis of virulent genes up to single base pair resolution. It is also applicable to virtually every pathogenic microorganism, propelling further research in the control of and protection from pathogen-­associated diseases.

  6. Accurate genome relative abundance estimation based on shotgun metagenomic reads.

    Directory of Open Access Journals (Sweden)

    Li C Xia

    Full Text Available Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or its variants; often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. Maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy. GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data-sets in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.

  7. Genomics-based plant germplasm research (GPGR

    Directory of Open Access Journals (Sweden)

    Jizeng Jia

    2017-04-01

    Full Text Available Plant germplasm underpins much of crop genetic improvement. Millions of germplasm accessions have been collected and conserved ex situ and/or in situ, and the major challenge is now how to exploit and utilize this abundant resource. Genomics-based plant germplasm research (GPGR or “Genoplasmics” is a novel cross-disciplinary research field that seeks to apply the principles and techniques of genomics to germplasm research. We describe in this paper the concept, strategy, and approach behind GPGR, and summarize current progress in the areas of the definition and construction of core collections, enhancement of germplasm with core collections, and gene discovery from core collections. GPGR is opening a new era in germplasm research. The contribution, progress and achievements of GPGR in the future are predicted.

  8. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David

    2011-03-14

    Fungi of the phylum Basidiomycota, make up some 37percent of the described fungi, and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a `core proteome? based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  9. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models

    Directory of Open Access Journals (Sweden)

    Surovcik Katharina

    2006-03-01

    Full Text Available Abstract Background Horizontal gene transfer (HGT is considered a strong evolutionary force shaping the content of microbial genomes in a substantial manner. It is the difference in speed enabling the rapid adaptation to changing environmental demands that distinguishes HGT from gene genesis, duplications or mutations. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs or more specifically pathogenicity or symbiotic islands. Results We have implemented the program SIGI-HMM that predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of codon usage (CU of each individual gene of a genome under study. CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. Thus, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working on gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in JAVA and is publicly available. It accepts as input any file created according to the EMBL-format. It generates output in the common GFF format readable for genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were both consistent with annotated GIs and with predictions generated by different methods. Conclusion SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows to interactively analyze genomes in detail and to generate or to test hypotheses about the origin of acquired

  10. Genomic profiling of oral squamous cell carcinoma by array-based comparative genomic hybridization.

    Directory of Open Access Journals (Sweden)

    Shunichi Yoshioka

    Full Text Available We designed a study to investigate genetic relationships between primary tumors of oral squamous cell carcinoma (OSCC and their lymph node metastases, and to identify genomic copy number aberrations (CNAs related to lymph node metastasis. For this purpose, we collected a total of 42 tumor samples from 25 patients and analyzed their genomic profiles by array-based comparative genomic hybridization. We then compared the genetic profiles of metastatic primary tumors (MPTs with their paired lymph node metastases (LNMs, and also those of LNMs with non-metastatic primary tumors (NMPTs. Firstly, we found that although there were some distinctive differences in the patterns of genomic profiles between MPTs and their paired LNMs, the paired samples shared similar genomic aberration patterns in each case. Unsupervised hierarchical clustering analysis grouped together 12 of the 15 MPT-LNM pairs. Furthermore, similarity scores between paired samples were significantly higher than those between non-paired samples. These results suggested that MPTs and their paired LNMs are composed predominantly of genetically clonal tumor cells, while minor populations with different CNAs may also exist in metastatic OSCCs. Secondly, to identify CNAs related to lymph node metastasis, we compared CNAs between grouped samples of MPTs and LNMs, but were unable to find any CNAs that were more common in LNMs. Finally, we hypothesized that subpopulations carrying metastasis-related CNAs might be present in both the MPT and LNM. Accordingly, we compared CNAs between NMPTs and LNMs, and found that gains of 7p, 8q and 17q were more common in the latter than in the former, suggesting that these CNAs may be involved in lymph node metastasis of OSCC. In conclusion, our data suggest that in OSCCs showing metastasis, the primary and metastatic tumors share similar genomic profiles, and that cells in the primary tumor may tend to metastasize after acquiring metastasis-associated CNAs.

  11. Sequencing and annotated analysis of an Estonian human genome.

    Science.gov (United States)

    Lilleoja, Rutt; Sarapik, Aili; Reimann, Ene; Reemann, Paula; Jaakma, Ülle; Vasar, Eero; Kõks, Sulev

    2012-02-01

    In present study we describe the sequencing and annotated analysis of the individual genome of Estonian. Using SOLID technology we generated 2,449,441,916 of 50-bp reads. The Bioscope version 1.3 was used for mapping and pairing of reads to the NCBI human genome reference (build 36, hg18). Bioscope enables also the annotation of the results of variant (tertiary) analysis. The average mapping of reads was 75.5% with total coverage of 107.72 Gb. resulting in mean fold coverage of 34.6. We found 3,482,975 SNPs out of which 352,492 were novel. 21,222 SNPs were in coding region: 10,649 were synonymous SNPs, 10,360 were nonsynonymous missense SNPs, 155 were nonsynonymous nonsense SNPs and 58 were nonsynonymous frameshifts. We identified 219 CNVs with total base pair coverage of 37,326,300 bp and 87,451 large insertion/deletion polymorphisms covering 10,152,256 bp of the genome. In addition, we found 285,864 small size insertion/deletion polymorphisms out of which 133,969 were novel. Finally, we identified 53 inversions, 19 overlapped genes and 2 overlapped exons. Interestingly, we found the region in chromosome 6 to be enriched with the coding SNPs and CNVs. This study confirms previous findings, that our genomes are more complex and variable as thought before. Therefore, sequencing of the personal genomes followed by annotation would improve the analysis of heritability of phenotypes and our understandings on the functions of genome.

  12. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    Science.gov (United States)

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-07-01

    A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness.Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  13. Whole genome sequence-based serogrouping of Listeria monocytogenes isolates.

    Science.gov (United States)

    Hyden, Patrick; Pietzka, Ariane; Lennkh, Anna; Murer, Andrea; Springer, Burkhard; Blaschitz, Marion; Indra, Alexander; Huhulescu, Steliana; Allerberger, Franz; Ruppitsch, Werner; Sensen, Christoph W

    2016-10-10

    Whole genome sequencing (WGS) is currently becoming the method of choice for characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regards to accuracy, resolution and analysis speed in comparison to several other methods including serotyping, PCR, pulsed field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multivirulence-locus sequence typing (MVLST), which have been used thus far for the characterization of bacterial isolates (and are still important tools in reference laboratories today) to control and prevent listeriosis, one of the major sources of foodborne diseases for humans. Backward compatibility of WGS to former methods can be maintained by extraction of the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes capable of differentiating 12 serovars and national reference laboratories still perform serotyping and PCR-based serogrouping as a first level classification method for Listeria monocytogenes surveillance. Whole genome sequence based core genome MLST analysis of a L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and it was possible to group them either into the IIa, IIc, IVb or IIb clusters, respectively, which were generated by minimum spanning tree (MST) and neighbor joining (NJ) tree data analysis, demonstrating the power of the new approach. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  14. Genome-wide Studies of Mycolic Acid Bacteria: Computational Identification and Analysis of a Minimal Genome

    KAUST Repository

    Kamanu, Frederick Kinyua

    2012-12-01

    The mycolic acid bacteria are a distinct suprageneric group of asporogenous Grampositive, high GC-content bacteria, distinguished by the presence of mycolic acids in their cell envelope. They exhibit great diversity in their cell and morphology; although primarily non-pathogens, this group contains three major pathogens Mycobacterium leprae, Mycobacterium tuberculosis complex, and Corynebacterium diphtheria. Although the mycolic acid bacteria are a clearly defined group of bacteria, the taxonomic relationships between its constituent genera and species are less well defined. Two approaches were tested for their suitability in describing the taxonomy of the group. First, a Multilocus Sequence Typing (MLST) experiment was assessed and found to be superior to monophyletic (16S small ribosomal subunit) in delineating a total of 52 mycolic acid bacterial species. Phylogenetic inference was performed using the neighbor-joining method. To further refine phylogenetic analysis and to take advantage of the widespread availability of bacterial genome data, a computational framework that simulates DNA-DNA hybridisation was developed and validated using multiscale bootstrap resampling. The tool classifies microbial genomes based on whole genome DNA, and was deployed as a web-application using PHP and Javascript. It is accessible online at http://cbrc.kaust.edu.sa/dna_hybridization/ A third study was a computational and statistical methods in the identification and analysis of a putative minimal mycolic acid bacterial genome so as to better understand (1) the genomic requirements to encode a mycolic acid bacterial cell and (2) the role and type of genes and genetic elements that lead to the massive increase in genome size in environmental mycolic acid bacteria. Using a reciprocal comparison approach, a total of 690 orthologous gene clusters forming a putative minimal genome were identified across 24 mycolic acid bacterial species. In order to identify new potential drug

  15. Analysis of pan-genome content and its application in microbial identification

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana

    of genomic data and use this to answer important biological questions. More specifically, comparison of prokaryotic proteomes is used to determine possible sets of functions, essential to sustain microbial life; to extract and interpret similarities and variance in genomic content within different taxonomic...... analyses for the characterization of two Listeria monocytogenes strains. Chapter 4 describes the use of profile HMMs for comparative analysis using for sequence-based homology searches. Paper III introduces PanFunPro a new, profile HMM-based method for pan-genome analysis. Paper IV illustrates...... the application of PanFunPro to a set of more than 2000 genomes; this paper aims to define set of protein families, which are conserved among all the genomes. Papers V demonstrates comparative genomics analysis of proteomes, belonging to Vibrio genus. In the last project, described in Chapter 5, both BLAST...

  16. Large-scale pathway-based analysis of bladder cancer genome-wide association data from five studies of European background.

    Directory of Open Access Journals (Sweden)

    Idan Menashe

    Full Text Available Pathway analysis of genome-wide association studies (GWAS offer a unique opportunity to collectively evaluate genetic variants with effects that are too small to be detected individually. We applied a pathway analysis to a bladder cancer GWAS containing data from 3,532 cases and 5,120 controls of European background (n = 5 studies. Thirteen hundred and ninety-nine pathways were drawn from five publicly available resources (Biocarta, Kegg, NCI-PID, HumanCyc, and Reactome, and we constructed 22 additional candidate pathways previously hypothesized to be related to bladder cancer. In total, 1421 pathways, 5647 genes and ∼90,000 SNPs were included in our study. Logistic regression model adjusting for age, sex, study, DNA source, and smoking status was used to assess the marginal trend effect of SNPs on bladder cancer risk. Two complementary pathway-based methods (gene-set enrichment analysis [GSEA], and adapted rank-truncated product [ARTP] were used to assess the enrichment of association signals within each pathway. Eighteen pathways were detected by either GSEA or ARTP at P≤0.01. To minimize false positives, we used the I(2 statistic to identify SNPs displaying heterogeneous effects across the five studies. After removing these SNPs, seven pathways ('Aromatic amine metabolism' [P(GSEA = 0.0100, P(ARTP = 0.0020], 'NAD biosynthesis' [P(GSEA = 0.0018, P(ARTP = 0.0086], 'NAD salvage' [P(ARTP = 0.0068], 'Clathrin derived vesicle budding' [P(ARTP = 0.0018], 'Lysosome vesicle biogenesis' [P(GSEA = 0.0023, P(ARTP<0.00012], 'Retrograde neurotrophin signaling' [P(GSEA = 0.00840], and 'Mitotic metaphase/anaphase transition' [P(GSEA = 0.0040] remained. These pathways seem to belong to three fundamental cellular processes (metabolic detoxification, mitosis, and clathrin-mediated vesicles. Identification of the aromatic amine metabolism pathway provides support for the ability of this approach to identify pathways

  17. Human and mouse genome analysis using array comparative genomic hybridization

    NARCIS (Netherlands)

    Snijders, Antoine Maria

    2004-01-01

    Almost all human cancers as well as developmental abnormalities are characterized by the presence of genetic alterations, most of which target a gene or a particular genomic locus resulting in altered gene expression and ultimately an altered phenotype. Different types of genetic alterations include

  18. SIDEKICK: Genomic data driven analysis and decision-making framework

    Directory of Open Access Journals (Sweden)

    Yoon Kihoon

    2010-12-01

    Full Text Available Abstract Background Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to

  19. Natural selection on functional modules, a genome-wide analysis.

    Science.gov (United States)

    Serra, François; Arbiza, Leonardo; Dopazo, Joaquín; Dopazo, Hernán

    2011-03-01

    Classically, the functional consequences of natural selection over genomes have been analyzed as the compound effects of individual genes. The current paradigm for large-scale analysis of adaptation is based on the observed significant deviations of rates of individual genes from neutral evolutionary expectation. This approach, which assumed independence among genes, has not been able to identify biological functions significantly enriched in positively selected genes in individual species. Alternatively, pooling related species has enhanced the search for signatures of selection. However, grouping signatures does not allow testing for adaptive differences between species. Here we introduce the Gene-Set Selection Analysis (GSSA), a new genome-wide approach to test for evidences of natural selection on functional modules. GSSA is able to detect lineage specific evolutionary rate changes in a notable number of functional modules. For example, in nine mammal and Drosophilae genomes GSSA identifies hundreds of functional modules with significant associations to high and low rates of evolution. Many of the detected functional modules with high evolutionary rates have been previously identified as biological functions under positive selection. Notably, GSSA identifies conserved functional modules with many positively selected genes, which questions whether they are exclusively selected for fitting genomes to environmental changes. Our results agree with previous studies suggesting that adaptation requires positive selection, but not every mutation under positive selection contributes to the adaptive dynamical process of the evolution of species.

  20. Natural selection on functional modules, a genome-wide analysis.

    Directory of Open Access Journals (Sweden)

    François Serra

    2011-03-01

    Full Text Available Classically, the functional consequences of natural selection over genomes have been analyzed as the compound effects of individual genes. The current paradigm for large-scale analysis of adaptation is based on the observed significant deviations of rates of individual genes from neutral evolutionary expectation. This approach, which assumed independence among genes, has not been able to identify biological functions significantly enriched in positively selected genes in individual species. Alternatively, pooling related species has enhanced the search for signatures of selection. However, grouping signatures does not allow testing for adaptive differences between species. Here we introduce the Gene-Set Selection Analysis (GSSA, a new genome-wide approach to test for evidences of natural selection on functional modules. GSSA is able to detect lineage specific evolutionary rate changes in a notable number of functional modules. For example, in nine mammal and Drosophilae genomes GSSA identifies hundreds of functional modules with significant associations to high and low rates of evolution. Many of the detected functional modules with high evolutionary rates have been previously identified as biological functions under positive selection. Notably, GSSA identifies conserved functional modules with many positively selected genes, which questions whether they are exclusively selected for fitting genomes to environmental changes. Our results agree with previous studies suggesting that adaptation requires positive selection, but not every mutation under positive selection contributes to the adaptive dynamical process of the evolution of species.

  1. RAPD-based screening of genomic libraries for positional cloning.

    Science.gov (United States)

    Dioh, W; Tharreau, D; Lebrun, M H

    1997-12-15

    RAPD markers are frequently used for positional cloning. However, RAPD markers often contain repeated sequences which prevent genomic library screening by hybridisation. We have developed a simple RAPD analysis of genomic libraries based on the identification of cosmid pools and clones amplifying the RAPD marker of interest. Our method does not require the cloning or characterisation of the RAPD marker as it relies on the analysis of cosmid pools or clones using a simple RAPD protocol. We applied this strategy using four RAPD markers composed of single copy or repeated sequences linked to avirulence genes of the rice blast fungus Magnaporthe grisea . Cosmids containing these RAPD markers were easily and rapidly identified allowing the construction of physical contigs at these loci.

  2. 植物CRISPR/Cas9基因组编辑系统与突变分析%CRISPR/Cas9-based genome editing systems and the analysis of targeted genome mutations in plants

    Institute of Scientific and Technical Information of China (English)

    马兴亮; 刘耀光

    2016-01-01

    Targeted genomic editing technologies use programmable DNA nucleases to cleave genomic target sites, thus inducing targeted mutations in the genomes. The newly prevailed clustered regularly interspaced short palin-dromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system that consists of the Cas9 nuclease and single guide RNA (sgRNA) has the advantages of simplicity and high efficiency as compared to other programmable DNA nuclease systems such as zinc finger nucleases (ZFNs) and transcription activator like effector nucleases (TALENs). Currently, a number of cases have been reported on the application of the CRISPR/Cas9 genomic editing technology in plants. In this review, we summarize the strategies for preparing the Cas9 and sgRNA expression constructs, the transformation method for obtaining targeted mutations, the efficiency and features of the resulting mutations and the methods for detecting or genotyping of the mutation sites. We also discuss the existing problems and perspectives of CRISPR/Cas9-based genomic editing in plants.%基因组定点编辑技术通过可编码核酸酶切割基因组特定位点,进而诱导基因组定点突变.CRISPR/Cas9(Clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9)基因组编辑系统由Cas9核酸酶以及sgRNA(Single guide RNA)组成,与其他可编码核酸酶系统如锌指核酸酶(Zinc finger nucleases,ZFNs)和类转录激活因子效应物核酸酶(Transcription activator-like effector nucleases,TALENs)相比具有更简便的操作性和更高的基因组定点编辑效率.目前在植物中已有多例应用CRISPR/Cas9系统进行基因组编辑的报道.本文从Cas9基因与sgRNA表达载体的构建策略,获得基因组定点编辑突变体的转化方法、突变的效率和特征、突变的检测方法等方面进行了总结,最后对植物中利用CRISPR/Cas9系统进行基因组编辑存在的问题以及发展前景进行了讨论.

  3. Enhancing genomics information retrieval through dimensional analysis.

    Science.gov (United States)

    Hu, Qinmin; Huang, Jimmy Xiangji

    2013-06-01

    We propose a novel dimensional analysis approach to employing meta information in order to find the relationships within the unstructured or semi-structured document/passages for improving genomics information retrieval performance. First, we make use of the auxiliary information as three basic dimensions, namely "temporal", "journal", and "author". The reference section is treated as a commensurable quantity of the three basic dimensions. Then, the sample space and subspaces are built up and a set of events are defined to meet the basic requirement of dimensional homogeneity to be commensurable quantities. After that, the classic graph analysis algorithm in the Web environments is applied on each dimension respectively to calculate the importance of each dimension. Finally, we integrate all the dimension networks and re-rank the outputs for evaluation. Our experimental results show the proposed approach is superior and promising.

  4. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    cells are capable of regulating their gene expression, so that each cell can only express a particular set of genes yielding limited numbers of proteins with specialized functions. Therefore a rigid control of differential gene expression is necessary for cellular diversity. On the other hand, aberrant...... gene regulation will disrupt the cell’s fundamental processes, which in turn can cause disease. Hence, understanding gene regulation is essential for deciphering the code of life. Along with the development of high throughput sequencing (HTS) technology and the subsequent large-scale data analysis......, genome-wide assays have increased our understanding of gene regulation significantly. This thesis describes the integration and analysis of HTS data across different important aspects of gene regulation. Gene expression can be regulated at different stages when the genetic information is passed from gene...

  5. Genomics and public health: development of Web-based training tools for increasing genomic awareness.

    Science.gov (United States)

    Bodzin, Jennifer; Kardia, Sharon L R; Goldenberg, Aaron; Raup, Sarah F; Bach, Janice V; Citrin, Toby

    2005-04-01

    In 2001, the Centers for Disease Control and Prevention funded three Centers for Genomics and Public Health to develop training tools for increasing genomic awareness. Over the past three years, the centers, working together with the Centers for Disease Control and Prevention's Office of Genomics and Disease Prevention, have developed tools to increase awareness of the impact genomics will have on public health practice, to provide a foundation for understanding basic genomic advances, and to translate the relevance of that information to public health practitioners' own work. These training tools serve to communicate genomic advances and their potential for integration into public heath practice. This paper highlights two of these training tools: 1) Genomics for Public Health Practitioners: The Practical Application of Genomics in Public Health Practice, a Web-based introduction to genomics, and 2) Six Weeks to Genomic Awareness, an in-depth training module on public health genomics. This paper focuses on the processes and collaborative efforts by which these live presentations were developed and delivered as Web-based training sessions.

  6. CyanoBase: the cyanobacteria genome database update 2010.

    Science.gov (United States)

    Nakao, Mitsuteru; Okamoto, Shinobu; Kohara, Mitsuyo; Fujishiro, Tsunakazu; Fujisawa, Takatomo; Sato, Shusei; Tabata, Satoshi; Kaneko, Takakazu; Nakamura, Yasukazu

    2010-01-01

    CyanoBase (http://genome.kazusa.or.jp/cyanobase) is the genome database for cyanobacteria, which are model organisms for photosynthesis. The database houses cyanobacteria species information, complete genome sequences, genome-scale experiment data, gene information, gene annotations and mutant information. In this version, we updated these datasets and improved the navigation and the visual display of the data views. In addition, a web service API now enables users to retrieve the data in various formats with other tools, seamlessly.

  7. VibrioBase: A Model for Next-Generation Genome and Annotation Database Development

    Directory of Open Access Journals (Sweden)

    Siew Woh Choo

    2014-01-01

    Full Text Available To facilitate the ongoing research of Vibrio spp., a dedicated platform for the Vibrio research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. We present VibrioBase, a useful resource platform, providing all basic features of a sequence database with the addition of unique analysis tools which could be valuable for the Vibrio research community. VibrioBase currently houses a total of 252 Vibrio genomes developed in a user-friendly manner and useful to enable the analysis of these genomic data, particularly in the field of comparative genomics. Besides general data browsing features, VibrioBase offers analysis tools such as BLAST interfaces and JBrowse genome browser. Other important features of this platform include our newly developed in-house tools, the pairwise genome comparison (PGC tool, and pathogenomics profiling tool (PathoProT. The PGC tool is useful in the identification and comparative analysis of two genomes, whereas PathoProT is designed for comparative pathogenomics analysis of Vibrio strains. Both of these tools will enable researchers with little experience in bioinformatics to get meaningful information from Vibrio genomes with ease. We have tested the validity and suitability of these tools and features for use in the next-generation database development.

  8. Multidimensional gene set analysis of genomic data.

    Directory of Open Access Journals (Sweden)

    David Montaner

    Full Text Available Understanding the functional implications of changes in gene expression, mutations, etc., is the aim of most genomic experiments. To achieve this, several functional profiling methods have been proposed. Such methods study the behaviour of different gene modules (e.g. gene ontology terms in response to one particular variable (e.g. differential gene expression. In spite to the wealth of information provided by functional profiling methods, a common limitation to all of them is their inherent unidimensional nature. In order to overcome this restriction we present a multidimensional logistic model that allows studying the relationship of gene modules with different genome-scale measurements (e.g. differential expression, genotyping association, methylation, copy number alterations, heterozygosity, etc. simultaneously. Moreover, the relationship of such functional modules with the interactions among the variables can also be studied, which produces novel results impossible to be derived from the conventional unidimensional functional profiling methods. We report sound results of gene sets associations that remained undetected by the conventional one-dimensional gene set analysis in several examples. Our findings demonstrate the potential of the proposed approach for the discovery of new cell functionalities with complex dependences on more than one variable.

  9. Genome Data Exploration Using Correspondence Analysis.

    Science.gov (United States)

    Tekaia, Fredj

    2016-01-01

    Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discovering biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations. Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results

  10. Genomic-based-breeding tools for tropical maize improvement.

    Science.gov (United States)

    Chakradhar, Thammineni; Hindu, Vemuri; Reddy, Palakolanu Sudhakar

    2017-09-05

    Maize has traditionally been the main staple diet in the Southern Asia and Sub-Saharan Africa and widely grown by millions of resource poor small scale farmers. Approximately, 35.4 million hectares are sown to tropical maize, constituting around 59% of the developing worlds. Tropical maize encounters tremendous challenges besides poor agro-climatic situations with average yields recorded <3 tones/hectare that is far less than the average of developed countries. On the contrary to poor yields, the demand for maize as food, feed, and fuel is continuously increasing in these regions. Heterosis breeding introduced in early 90 s improved maize yields significantly, but genetic gains is still a mirage, particularly for crop growing under marginal environments. Application of molecular markers has accelerated the pace of maize breeding to some extent. The availability of array of sequencing and genotyping technologies offers unrivalled service to improve precision in maize-breeding programs through modern approaches such as genomic selection, genome-wide association studies, bulk segregant analysis-based sequencing approaches, etc. Superior alleles underlying complex traits can easily be identified and introgressed efficiently using these sequence-based approaches. Integration of genomic tools and techniques with advanced genetic resources such as nested association mapping and backcross nested association mapping could certainly address the genetic issues in maize improvement programs in developing countries. Huge diversity in tropical maize and its inherent capacity for doubled haploid technology offers advantage to apply the next generation genomic tools for accelerating production in marginal environments of tropical and subtropical world. Precision in phenotyping is the key for success of any molecular-breeding approach. This article reviews genomic technologies and their application to improve agronomic traits in tropical maize breeding has been reviewed in

  11. Pig genome sequence - analysis and publication strategy

    NARCIS (Netherlands)

    Archibald, A.L.; Bolund, L.; Churcher, C.; Fredholm, M.; Groenen, M.A.M.; Harlizius, B.

    2010-01-01

    Background - The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. Results - Assemblies of the B

  12. Genome size evolution in pufferfish: an insight from BAC clone-based Diodon holocanthus genome sequencing

    Directory of Open Access Journals (Sweden)

    Gan Xiaoni

    2010-06-01

    Full Text Available Abstract Background Variations in genome size within and between species have been observed since the 1950 s in diverse taxonomic groups. Serving as model organisms, smooth pufferfish possess the smallest vertebrate genomes. Interestingly, spiny pufferfish from its sister family have genome twice as large as smooth pufferfish. Therefore, comparative genomic analysis between smooth pufferfish and spiny pufferfish is useful for our understanding of genome size evolution in pufferfish. Results Ten BAC clones of a spiny pufferfish Diodon holocanthus were randomly selected and shotgun sequenced. In total, 776 kb of non-redundant sequences without gap representing 0.1% of the D. holocanthus genome were identified, and 77 distinct genes were predicted. In the sequenced D. holocanthus genome, 364 kb is homologous with 265 kb of the Takifugu rubripes genome, and 223 kb is homologous with 148 kb of the Tetraodon nigroviridis genome. The repetitive DNA accounts for 8% of the sequenced D. holocanthus genome, which is higher than that in the T. rubripes genome (6.89% and that in the Te. nigroviridis genome (4.66%. In the repetitive DNA, 76% is retroelements which account for 6% of the sequenced D. holocanthus genome and belong to known families of transposable elements. More than half of retroelements were distributed within genes. In the non-homologous regions, repeat element proportion in D. holocanthus genome increased to 10.6% compared with T. rubripes and increased to 9.19% compared with Te. nigroviridis. A comparison of 10 well-defined orthologous genes showed that the average intron size (566 bp in D. holocanthus genome is significantly longer than that in the smooth pufferfish genome (435 bp. Conclusion Compared with the smooth pufferfish, D. holocanthus has a low gene density and repeat elements rich genome. Genome size variation between D. holocanthus and the smooth pufferfish exhibits as length variation between homologous region and different

  13. Integrated Genome-Based Studies of Shewanella Ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Andrei L. Osterman, Ph.D.

    2012-12-17

    Integration of bioinformatics and experimental techniques was applied to mapping and characterization of the key components (pathways, enzymes, transporters, regulators) of the core metabolic machinery in Shewanella oneidensis and related species with main focus was on metabolic and regulatory pathways involved in utilization of various carbon and energy sources. Among the main accomplishments reflected in ten joint publications with other participants of Shewanella Federation are: (i) A systems-level reconstruction of carbohydrate utilization pathways in the genus of Shewanella (19 species). This analysis yielded reconstruction of 18 sugar utilization pathways including 10 novel pathway variants and prediction of > 60 novel protein families of enzymes, transporters and regulators involved in these pathways. Selected functional predictions were verified by focused biochemical and genetic experiments. Observed growth phenotypes were consistent with bioinformatic predictions providing strong validation of the technology and (ii) Global genomic reconstruction of transcriptional regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors, 8 riboswitches and 6 translational attenuators. Of those, 45 regulons were inferred directly from the genome context analysis, whereas others were propagated from previously characterized regulons in other species. Selected regulatory predictions were experimentally tested. Integration of this analysis with microarray data revealed overall consistency and provided additional layer of interactions between regulons. All the results were captured in the new database RegPrecise, which is a joint development with the LBNL team. A more detailed analysis of the individual subsystems, pathways and regulons in Shewanella spp included bioinfiormatics-based prediction and experimental characterization of: (i) N-Acetylglucosamine catabolic pathway; (ii)Lactate utilization machinery; (iii) Novel Nrt

  14. Digital microarray analysis for digital artifact genomics

    Science.gov (United States)

    Jaenisch, Holger; Handley, James; Williams, Deborah

    2013-06-01

    We implement a Spatial Voting (SV) based analogy of microarray analysis for digital gene marker identification in malware code sections. We examine a famous set of malware formally analyzed by Mandiant and code named Advanced Persistent Threat (APT1). APT1 is a Chinese organization formed with specific intent to infiltrate and exploit US resources. Manidant provided a detailed behavior and sting analysis report for the 288 malware samples available. We performed an independent analysis using a new alternative to the traditional dynamic analysis and static analysis we call Spatial Analysis (SA). We perform unsupervised SA on the APT1 originating malware code sections and report our findings. We also show the results of SA performed on some members of the families associated by Manidant. We conclude that SV based SA is a practical fast alternative to dynamics analysis and static analysis.

  15. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR on Amazon EC2 instances and Google Compute Engine (GCE, using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2 for E.coli and 53.5% (95% CI: 34.4-72.6 for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1 and 173.9% (95% CI: 134.6-213.1 more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  16. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    Science.gov (United States)

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  17. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  18. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved due to assistance from collaborators. Current goals are to easily include new forms of data as they become available and to easily navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  19. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database

    Science.gov (United States)

    Tian, Feng; Zhao, Jinlong; Kang, Zhenxing

    2017-01-01

    Background Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. Its pathogenesis mechanism of tumor development is unclear. The aim of this study was to identify key genes for diagnosis biomarkers in lung SCC metastasis. Methods We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and quantitative real-time polymerase chain reactions (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Results Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three DEGs and 26 DEGs were significantly enriched in the respective pink and black module. KEGG pathway analysis displayed that 26 DEGs in the black module were significantly enriched in bile secretion pathway. Forty-nine DEGs in the two gene co-expression module were used to construct PPI network. CFTR in the black module was the hub protein, had the connectivity with 182 genes. The results of qRT-PCR displayed that FIGF, SFTPD, DYNLRB2 were significantly down-regulated in the tumor samples of lung SCC with metastasis and CFTR, SCGB3A2, SSTR1, SCTR, ROPN1L had the down-regulation tendency in lung SCC with metastasis compared to lung SCC without metastasis. Conclusions The dysregulated genes including CFTR, SCTR and FIGF might be involved in the pathology of lung SCC metastasis and could be used as potential diagnosis biomarkers or therapeutic targets for lung SCC.

  20. Weighted gene co-expression network analysis in identification of metastasis-related genes of lung squamous cell carcinoma based on the Cancer Genome Atlas database.

    Science.gov (United States)

    Tian, Feng; Zhao, Jinlong; Fan, Xinlei; Kang, Zhenxing

    2017-01-01

    Lung squamous cell carcinoma (lung SCC) is a common type of malignancy. Its pathogenesis mechanism of tumor development is unclear. The aim of this study was to identify key genes for diagnosis biomarkers in lung SCC metastasis. We searched and downloaded mRNA expression data and clinical data from The Cancer Genome Atlas (TCGA) database to identify differences in mRNA expression of primary tumor tissues from lung SCC with and without metastasis. Gene co-expression network analysis, protein-protein interaction (PPI) network, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis and quantitative real-time polymerase chain reactions (qRT-PCR) were used to explore the biological functions of the identified dysregulated genes. Four hundred and eighty-two differentially expressed genes (DEGs) were identified between lung SCC with and without metastasis. Nineteen modules were identified in lung SCC through weighted gene co-expression network analysis (WGCNA). Twenty-three DEGs and 26 DEGs were significantly enriched in the respective pink and black module. KEGG pathway analysis displayed that 26 DEGs in the black module were significantly enriched in bile secretion pathway. Forty-nine DEGs in the two gene co-expression module were used to construct PPI network. CFTR in the black module was the hub protein, had the connectivity with 182 genes. The results of qRT-PCR displayed that FIGF, SFTPD, DYNLRB2 were significantly down-regulated in the tumor samples of lung SCC with metastasis and CFTR, SCGB3A2, SSTR1, SCTR, ROPN1L had the down-regulation tendency in lung SCC with metastasis compared to lung SCC without metastasis. The dysregulated genes including CFTR, SCTR and FIGF might be involved in the pathology of lung SCC metastasis and could be used as potential diagnosis biomarkers or therapeutic targets for lung SCC.

  1. Pig genome sequence - analysis and publication strategy

    DEFF Research Database (Denmark)

    Archibald, Alan L.; Bolund, Lars; Churcher, Carol;

    2010-01-01

    BACKGROUND: The pig genome is being sequenced and characterised under the auspices of the Swine Genome Sequencing Consortium. The sequencing strategy followed a hybrid approach combining hierarchical shotgun sequencing of BAC clones and whole genome shotgun sequencing. RESULTS: Assemblies......) is under construction and will incorporate whole genome shotgun sequence (WGS) data providing > 30x genome coverage. The WGS sequence, most of which comprise short Illumina/Solexa reads, were generated from DNA from the same single Duroc sow as the source of the BAC library from which clones were...

  2. The controversial nature of the Weissella genus: technological and functional aspects versus whole genome analysis-based pathogenic potential for their application in food and health.

    Directory of Open Access Journals (Sweden)

    Hikmate eAbriouel

    2015-10-01

    Full Text Available Despite the use of several Weissella strains for biotechnological and probiotic purposes, certain species of this genus were found to act as opportunistic pathogens, while strains of W. ceti were recognized to be pathogenic for farmed rainbow trout. Herein, we investigated the pathogenic potential of weissellas based on in silico analyses of the 13 whole genome sequences available to date in the NCBI database.Our screening allowed us to find several virulence determinants such as collagen adhesins, aggregation substances, mucus-binding proteins and hemolysins in some species. Moreover, we detected several antibiotic resistance-encoding genes, whose presence could increase the potential pathogenicity of some strains, but should not be regarded as an excluding trait for beneficial weissellas,as long as these genes are not present on mobile genetic elements. Thus, selection of weissellas intended to be used as starters or for biotechnological or probiotic purposes should be investigated regarding their safety aspects on a strain to strain basis, preferably also by genome sequencing, since nucleotide sequence heterogeneity in virulence and antibiotic resistance genes makes PCR-based screeningunreliable for safety assessments. In this sense, the application of W. confusa and W. cibaria strains as starter cultures or as probiotics should be approached with caution, by carefully selecting strains that lack pathogenic potential.

  3. Genome-based bioprospecting of microbes for new therapeutics.

    Science.gov (United States)

    Zotchev, Sergey B; Sekurova, Olga N; Katz, Leonard

    2012-12-01

    Bioprospecting of natural sources for new medicines has a long and successful history, exemplified by the fact that over 50% of all drugs currently on the market are either derived from or inspired by natural products. However, development of new natural product-based therapeutics has been on the decline over the past 20 years, mainly owing to frequent re-discovery of already known compounds coupled with high costs for screening, characterization and development. With the onset of the genomic era allowing rapid sequencing and analysis of bacterial and fungal genomes, it became evident that these organisms possess 'hidden treasures' in the form of gene clusters potentially governing biosynthesis of novel biologically active compounds. This review highlights current progress in mining for and expression of these gene clusters, which may revolutionize the drug discovery pipelines in the near future.

  4. The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

    Science.gov (United States)

    Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

    2013-01-01

    Background Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains large number of foreign DNA and repeated sequences undergone frequently intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for the evolution research of plant mt genomes. Methodology/Principal Findings We utilized 454 technology for sequencing and combined with Fosmid library of the Gossypium hirsutum mt genome screening and positive clones sequencing and conducted a series of evolutionary analysis on Cycas taitungensis and 24 angiosperms mt genomes. After data assembling and contigs joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G.hirsutum mt genome is 621,884 bp in length, and contained 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes and species closely related share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes suggest that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520

  5. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    Directory of Open Access Journals (Sweden)

    Guozheng Liu

    Full Text Available BACKGROUND: Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains large number of foreign DNA and repeated sequences undergone frequently intramolecular recombination. Upland Cotton (Gossypium hirsutum L. is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt genome could be helpful for the evolution research of plant mt genomes. METHODOLOGY/PRINCIPAL FINDINGS: We utilized 454 technology for sequencing and combined with Fosmid library of the Gossypium hirsutum mt genome screening and positive clones sequencing and conducted a series of evolutionary analysis on Cycas taitungensis and 24 angiosperms mt genomes. After data assembling and contigs joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G.hirsutum mt genome is 621,884 bp in length, and contained 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes and species closely related share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. CONCLUSION: The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes suggest that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  6. Hyperstructures, genome analysis and I-cells

    DEFF Research Database (Denmark)

    Amar, P.; Ballet, P.; Barlovatz-Meimon, G.

    2002-01-01

    familiar to biologists. Finally, we speculate on how a variety of in silico approaches involving cellular automata and multi-agent systems could be combined to develop new concepts in the form of an Integrated cell (I-cell) which would undergo selection for growth and survival in a world of artificial......New concepts may prove necessary to profit from the avalanche of sequence data on the genome, transcriptome, proteome and interactome and to relate this information to cell physiology. Here, we focus on the concept of large activity-based structures, or hyperstructures, in which a variety of types...... of molecules are brought together to perform a function. We review the evidence for the existence of hyperstructures responsible for the initiation of DNA replication, the sequestration of newly replicated origins of replication, cell division and for metabolism. The processes responsible for hyperstructure...

  7. Ensemble analysis of adaptive compressed genome sequencing strategies

    Science.gov (United States)

    2014-01-01

    Background Acquiring genomes at single-cell resolution has many applications such as in the study of microbiota. However, deep sequencing and assembly of all of millions of cells in a sample is prohibitively costly. A property that can come to rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth first search strategy. We simulated the whole process which constrained our ability to study the behavior of the algorithm for the entire ensemble due to its computational intensity. Results In this paper, we modify our previous breadth first search strategy and introduce the depth first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total sequenced nucleotides for a given population profile. Our results suggest that the expected total sequenced nucleotides grows proportional to log of the number of cells and proportional linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size over the maximum genome size in the sample. The modified resource

  8. SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Davies Jonathan J

    2006-12-01

    Full Text Available Abstract Background The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software are available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results We have created a user-friendly java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA of cancer genomes, can be accessed at http://sigma.bccrc.ca.

  9. Coordinates and intervals in graph-based reference genomes.

    Science.gov (United States)

    Rand, Knut D; Grytten, Ivar; Nederbragt, Alexander J; Storvik, Geir O; Glad, Ingrid K; Sandve, Geir K

    2017-05-18

    It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes. We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable. More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph . An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords .

  10. Comparative analysis of Acinetobacters: three genomes for three lifestyles.

    Directory of Open Access Journals (Sweden)

    David Vallenet

    Full Text Available Acinetobacter baumannii is the source of numerous nosocomial infections in humans and therefore deserves close attention as multidrug or even pandrug resistant strains are increasingly being identified worldwide. Here we report the comparison of two newly sequenced genomes of A. baumannii. The human isolate A. baumannii AYE is multidrug resistant whereas strain SDF, which was isolated from body lice, is antibiotic susceptible. As reference for comparison in this analysis, the genome of the soil-living bacterium A. baylyi strain ADP1 was used. The most interesting dissimilarities we observed were that i whereas strain AYE and A. baylyi genomes harbored very few Insertion Sequence elements which could promote expression of downstream genes, strain SDF sequence contains several hundred of them that have played a crucial role in its genome reduction (gene disruptions and simple DNA loss; ii strain SDF has low catabolic capacities compared to strain AYE. Interestingly, the latter has even higher catabolic capacities than A. baylyi which has already been reported as a very nutritionally versatile organism. This metabolic performance could explain the persistence of A. baumannii nosocomial strains in environments where nutrients are scarce; iii several processes known to play a key role during host infection (biofilm formation, iron uptake, quorum sensing, virulence factors were either different or absent, the best example of which is iron uptake. Indeed, strain AYE and A. baylyi use siderophore-based systems to scavenge iron from the environment whereas strain SDF uses an alternate system similar to the Haem Acquisition System (HAS. Taken together, all these observations suggest that the genome contents of the 3 Acinetobacters compared are partly shaped by life in distinct ecological niches: human (and more largely hospital environment, louse, soil.

  11. CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb

    Directory of Open Access Journals (Sweden)

    Mahadevan Padmanabhan

    2009-08-01

    Full Text Available Abstract Background Viruses and small-genome bacteria (~2 megabases and smaller comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now sequenced at an unprecedented rate and require complementary computational tools to analyze. "CoreGenesUniqueGenes" (CGUG is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms. Findings CGUG is available at http://binf.gmu.edu/geneorder.html as a web-based on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes. Conclusion CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.

  12. Integrated Genome-Based Studies of Shewanella Echophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Margrethe H. Serres

    2012-06-29

    Shewanella oneidensis MR-1 is a motile, facultative {gamma}-Proteobacterium with remarkable respiratory versatility; it can utilize a range of organic and inorganic compounds as terminal electronacceptors for anaerobic metabolism. The ability to effectively reduce nitrate, S0, polyvalent metals andradionuclides has established MR-1 as an important model dissimilatory metal-reducing microorganism for genome-based investigations of biogeochemical transformation of metals and radionuclides that are of concern to the U.S. Department of Energy (DOE) sites nationwide. Metal-reducing bacteria such as Shewanella also have a highly developed capacity for extracellular transfer of respiratory electrons to solid phase Fe and Mn oxides as well as directly to anode surfaces in microbial fuel cells. More broadly, Shewanellae are recognized free-living microorganisms and members of microbial communities involved in the decomposition of organic matter and the cycling of elements in aquatic and sedimentary systems. To function and compete in environments that are subject to spatial and temporal environmental change, Shewanella must be able to sense and respond to such changes and therefore require relatively robust sensing and regulation systems. The overall goal of this project is to apply the tools of genomics, leveraging the availability of genome sequence for 18 additional strains of Shewanella, to better understand the ecophysiology and speciation of respiratory-versatile members of this important genus. To understand these systems we propose to use genome-based approaches to investigate Shewanella as a system of integrated networks; first describing key cellular subsystems - those involved in signal transduction, regulation, and metabolism - then building towards understanding the function of whole cells and, eventually, cells within populations. As a general approach, this project will employ complimentary "top-down" - bioinformatics-based genome functional predictions, high

  13. Integrative Genomics with Mediation Analysis in a Survival Context

    Directory of Open Access Journals (Sweden)

    Szilárd Nemes

    2013-01-01

    Full Text Available DNA copy number aberrations (DCNA and subsequent altered gene expression profiles may have a major impact on tumor initiation, on development, and eventually on recurrence and cancer-specific mortality. However, most methods employed in integrative genomic analysis of the two biological levels, DNA and RNA, do not consider survival time. In the present note, we propose the adoption of a survival analysis-based framework for the integrative analysis of DCNA and mRNA levels to reveal their implication on patient clinical outcome with the prerequisite that the effect of DCNA on survival is mediated by mRNA levels. The specific aim of the paper is to offer a feasible framework to test the DCNA-mRNA-survival pathway. We provide statistical inference algorithms for mediation based on asymptotic results. Furthermore, we illustrate the applicability of the method in an integrative genomic analysis setting by using a breast cancer data set consisting of 141 invasive breast tumors. In addition, we provide implementation in R.

  14. Phylogeny and evolution of Cervidae based on complete mitochondrial genomes.

    Science.gov (United States)

    Zhang, W-Q; Zhang, M-H

    2012-03-14

    Mitochondrial DNA sequences can be used to estimate phylogenetic relationships among animal taxa and for molecular phylogenetic evolution analysis. With the development of sequencing technology, more and more mitochondrial sequences have been made available in public databases, including whole mitochondrial DNA sequences. These data have been used for phylogenetic analysis of animal species, and for studies of evolutionary processes. We made phylogenetic analyses of 19 species of Cervidae, with Bos taurus as the outgroup. We used neighbor joining, maximum likelihood, maximum parsimony, and Bayesian inference methods on whole mitochondrial genome sequences. The consensus phylogenetic trees supported monophyly of the family Cervidae; it was divided into two subfamilies, Plesiometacarpalia and Telemetacarpalia, and four tribes, Cervinae, Muntiacinae, Hydropotinae, and Odocoileinae. The divergence times in these families were estimated by phylogenetic analysis using the Bayesian method with a relaxed molecular clock method; the results were consistent with those of previous studies. We concluded that the evolutionary structure of the family Cervidae can be reconstructed by phylogenetic analysis based on whole mitochondrial genomes; this method could be used broadly in phylogenetic evolutionary analysis of animal taxa.

  15. Comparison of genomes of eight species of sections Linum and Adenolinum from the genus Linum based on chromosome banding, molecular markers and RAPD analysis.

    Science.gov (United States)

    Muravenko, Olga V; Yurkevich, Olga Yu; Bolsheva, Nadezhda L; Samatadze, Tatiana E; Nosova, Inna V; Zelenina, Daria A; Volkov, Alexander A; Popov, Konstantin V; Zelenin, Alexander V

    2009-03-01

    Karyotypes of species sects. Linum and Adenolinum have been studied using C/DAPI-banding, Ag-NOR staining, FISH with 5S and 26S rDNA and RAPD analysis. C/DAPI-banding patterns enabled identification of all homologous chromosome pairs in the studied karyotypes. The revealed high similarity between species L. grandiflorum (2n = 16) and L. decumbens by chromosome and molecular markers proved their close genome relationship and identified the chromosome number in L. decumbens as 2n = 16. The similarity found for C/DAPI-banding patterns between species with the same chromosome numbers corresponds with the results obtained by RAPD-analysis, showing clusterization of 16-, 18- and 30-chromosome species into three separate groups. 5S rDNA and 26S rDNA were co-localized in NOR-chromosome 1 in the genomes of all species investigated. In 30-chromosome species, there were three separate 5S rDNA sites in chromosomes 3, 8 and 13. In 16-chromosome species, a separate 5S rDNA site was also located in chromosome 3, whereas in 18-chromosome species it was found in the long arm of NOR-chromosome 1. Thus, the difference in localization of rDNA sites in species with 2n = 16, 2n = 30 and 2n = 18 confirms taxonomists opinion, who attributed these species to different sects. Linum and Adenolinum, respectively. The obtained results suggest that species with 2n = 16, 2n = 18 and 2n = 30 originated from a 16-chromosome ancestor.

  16. Insights into the physiology and ecology of the brackish-water-adapted Cyanobacterium Nodularia spumigena CCY9414 based on a genome-transcriptome analysis.

    Directory of Open Access Journals (Sweden)

    Björn Voss

    Full Text Available Nodularia spumigena is a filamentous diazotrophic cyanobacterium that dominates the annual late summer cyanobacterial blooms in the Baltic Sea. But N. spumigena also is common in brackish water bodies worldwide, suggesting special adaptation allowing it to thrive at moderate salinities. A draft genome analysis of N. spumigena sp. CCY9414 yielded a single scaffold of 5,462,271 nucleotides in length on which genes for 5,294 proteins were annotated. A subsequent strand-specific transcriptome analysis identified more than 6,000 putative transcriptional start sites (TSS. Orphan TSSs located in intergenic regions led us to predict 764 non-coding RNAs, among them 70 copies of a possible retrotransposon and several potential RNA regulators, some of which are also present in other N2-fixing cyanobacteria. Approximately 4% of the total coding capacity is devoted to the production of secondary metabolites, among them the potent hepatotoxin nodularin, the linear spumigin and the cyclic nodulapeptin. The transcriptional complexity associated with genes involved in nitrogen fixation and heterocyst differentiation is considerably smaller compared to other Nostocales. In contrast, sophisticated systems exist for the uptake and assimilation of iron and phosphorus compounds, for the synthesis of compatible solutes, and for the formation of gas vesicles, required for the active control of buoyancy. Hence, the annotation and interpretation of this sequence provides a vast array of clues into the genomic underpinnings of the physiology of this cyanobacterium and indicates in particular a competitive edge of N. spumigena in nutrient-limited brackish water ecosystems.

  17. Two-component signal transduction in Agaricus bisporus: a comparative genomic analysis with other basidiomycetes through the web-based tool BASID2CS.

    Science.gov (United States)

    Lavín, José L; García-Yoldi, Alberto; Ramírez, Lucía; Pisabarro, Antonio G; Oguiza, José A

    2013-06-01

    Two-component systems (TCSs) are signal transduction mechanisms present in many eukaryotes, including fungi that play essential roles in the regulation of several cellular functions and responses. In this study, we carry out a genomic analysis of the TCS proteins in two varieties of the white button mushroom Agaricus bisporus. The genomes of both A. bisporus varieties contain eight genes coding for TCS proteins, which include four hybrid Histidine Kinases (HKs), a single histidine-containing phosphotransfer (HPt) protein and three Response Regulators (RRs). Comparison of the TCS proteins among A. bisporus and the sequenced basidiomycetes showed a conserved core complement of five TCS proteins including the Tco1/Nik1 hybrid HK, HPt protein and Ssk1, Skn7 and Rim15-like RRs. In addition, Dual-HKs, unusual hybrid HKs with 2 HK and 2 RR domains, are absent in A. bisporus and are limited to various species of basidiomycetes. Differential expression analysis showed no significant up- or down-regulation of the Agaricus TCS genes in the conditions/tissue analyzed with the exception of the Skn7-like RR gene (Agabi_varbisH97_2|198669) that is significantly up-regulated on compost compared to cultured mycelia. Furthermore, the pipeline web server BASID2CS (http://bioinformatics.unavarra.es:1000/B2CS/BASID2CS.htm) has been specifically designed for the identification, classification and functional annotation of putative TCS proteins from any predicted proteome of basidiomycetes using a combination of several bioinformatic approaches. Copyright © 2012 Elsevier Inc. All rights reserved.

  18. Ontology-Based Search of Genomic Metadata.

    Science.gov (United States)

    Fernandez, Javier D; Lenzerini, Maurizio; Masseroli, Marco; Venco, Francesco; Ceri, Stefano

    2016-01-01

    The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet, search of relevant datasets for knowledge discovery is limitedly supported: metadata describing ENCODE datasets are quite simple and incomplete, and not described by a coherent underlying ontology. Here, we show how to overcome this limitation, by adopting an ENCODE metadata searching approach which uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base by starting with concepts extracted from ENCODE metadata, matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This allows correctly finding more datasets than those extracted by a purely syntactic search, as supported by the other available systems. We empirically show the relevance of found datasets to the biologists' queries.

  19. Sequencing and Analysis of Neanderthal Genomic DNA

    OpenAIRE

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K; Rubin, Edward M.

    2006-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library a...

  20. CRISPR/Cas9 based genome editing of Penicillium chrysogenum

    NARCIS (Netherlands)

    Pohl, Carsten; Kiel, Jan A K W; Driessen, Arnold J M; Bovenberg, Roel A L; Nygård, Yvonne

    2016-01-01

    CRISPR/Cas9 based systems have emerged as versatile platforms for precision genome editing in a wide range of organisms. Here we have developed powerful CRISPR/Cas9 tools for marker-based and marker-free genome modifications in Penicillium chrysogenum, a model filamentous fungus and industrially rel

  1. Genomic analysis of the F3031 Brazilian purpuric fever clone of Haemophilus influenzae biogroup aegyptius by PCR-based subtractive hybridization.

    Science.gov (United States)

    Smoot, Laura M; Franke, Deanna D; McGillivary, Glen; Actis, Luis A

    2002-05-01

    PCR-based subtractive genome hybridization produced clones harboring inserts present in Brazilian purpuric fever (BPF) prototype strain F3031 but absent in noninvasive Haemophilus influenzae biogroup aegyptius isolate F1947. Some of these inserts have no matches in the GenBank database, while others are similar to genes encoding either known or hypothetical proteins. One insert represents a 2.3-kb locus with similarity to a Thermotoga maritima hypothetical protein, while another is part of a 7.6-kb locus that contains predicted genes encoding hypothetical, phage-related, and carotovoricin Er-like proteins. The presence of DNA related to these loci is variable among BPF isolates and nontypeable H. influenzae strains, while neither of them was detected in strains of types a to f. The data indicate that BPF-causing strain F3031 harbors unique chromosomal regions, most of which appear to be acquired from unrelated microbial sources.

  2. Genome-wide gene-based analysis suggests an association between Neuroligin 1 (NLGN1) and post-traumatic stress disorder

    Science.gov (United States)

    Kilaru, V; Iyer, S V; Almli, L M; Stevens, J S; Lori, A; Jovanovic, T; Ely, T D; Bradley, B; Binder, E B; Koen, N; Stein, D J; Conneely, K N; Wingo, A P; Smith, A K; Ressler, K J

    2016-01-01

    Post-traumatic stress disorder (PTSD) develops in only some people following trauma exposure, but the mechanisms differentially explaining risk versus resilience remain largely unknown. PTSD is heritable but candidate gene studies and genome-wide association studies (GWAS) have identified only a modest number of genes that reliably contribute to PTSD. New gene-based methods may help identify additional genes that increase risk for PTSD development or severity. We applied gene-based testing to GWAS data from the Grady Trauma Project (GTP), a primarily African American cohort, and identified two genes (NLGN1 and ZNRD1-AS1) that associate with PTSD after multiple test correction. Although the top SNP from NLGN1 did not replicate, we observed gene-based replication of NLGN1 with PTSD in the Drakenstein Child Health Study (DCHS) cohort from Cape Town. NLGN1 has previously been associated with autism, and it encodes neuroligin 1, a protein involved in synaptogenesis, learning, and memory. Within the GTP dataset, a single nucleotide polymorphism (SNP), rs6779753, underlying the gene-based association, associated with the intermediate phenotypes of higher startle response and greater functional magnetic resonance imaging activation of the amygdala, orbitofrontal cortex, right thalamus and right fusiform gyrus in response to fearful faces. These findings support a contribution of the NLGN1 gene pathway to the neurobiological underpinnings of PTSD.

  3. Pathway and network analysis of cancer genomes

    DEFF Research Database (Denmark)

    Creixell, Pau; Reimand, Jueri; Haider, Syed

    2015-01-01

    Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been...

  4. Hellbender genome sequences shed light on genomic expansion at the base of crown salamanders.

    Science.gov (United States)

    Sun, Cheng; Mueller, Rachel Lockridge

    2014-07-01

    Among animals, genome sizes range from 20 Mb to 130 Gb, with 380-fold variation across vertebrates. Most of the largest vertebrate genomes are found in salamanders, an amphibian clade of 660 species. Thus, salamanders are an important system for studying causes and consequences of genomic gigantism. Previously, we showed that plethodontid salamander genomes accumulate higher levels of long terminal repeat (LTR) retrotransposons than do other vertebrates, although the evolutionary origins of such sequences remained unexplored. We also showed that some salamanders in the family Plethodontidae have relatively slow rates of DNA loss through small insertions and deletions. Here, we present new data from Cryptobranchus alleganiensis, the hellbender. Cryptobranchus and Plethodontidae span the basal phylogenetic split within salamanders; thus, analyses incorporating these taxa can shed light on the genome of the ancestral crown salamander lineage, which underwent expansion. We show that high levels of LTR retrotransposons likely characterize all crown salamanders, suggesting that disproportionate expansion of this transposable element (TE) class contributed to genomic expansion. Phylogenetic and age distribution analyses of salamander LTR retrotransposons indicate that salamanders' high TE levels reflect persistence and diversification of ancestral TEs rather than horizontal transfer events. Finally, we show that relatively slow DNA loss rates through small indels likely characterize all crown salamanders, suggesting that a decreased DNA loss rate contributed to genomic expansion at the clade's base. Our identification of shared genomic features across phylogenetically distant salamanders is a first step toward identifying the evolutionary processes underlying accumulation and persistence of high levels of repetitive sequence in salamander genomes.

  5. Whole genome microarray analysis, from neonatal blood cards

    Directory of Open Access Journals (Sweden)

    Hogan Michael E

    2009-07-01

    Full Text Available Abstract Background Neonatal blood, obtained from a heel stick and stored dry on paper cards, has been the standard for birth defects screening for 50 years. Such dried blood samples are used, primarily, for analysis of small-molecule analytes. More recently, the DNA complement of such dried blood cards has been used for targeted genetic testing, such as for single nucleotide polymorphism in cystic fibrosis. Expansion of such testing to include polygenic traits, and perhaps whole genome scanning, has been discussed as a formal possibility. However, until now the amount of DNA that might be obtained from such dried blood cards has been limiting, due to inefficient DNA recovery technology. Results A new technology is employed for efficient DNA release from a standard neonatal blood card. Using standard Guthrie cards, stored an average of ten years post-collection, about 1/40th of the air-dried neonatal blood specimen (two 3 mm punches was processed to obtain DNA that was sufficient in mass and quality for direct use in microarray-based whole genome scanning. Using that same DNA release technology, it is also shown that approximately 1/250th of the original purified DNA (about 1 ng could be subjected to whole genome amplification, thus yielding an additional microgram of amplified DNA product. That amplified DNA product was then used in microarray analysis and yielded statistical concordance of 99% or greater to the primary, unamplified DNA sample. Conclusion Together, these data suggest that DNA obtained from less than 10% of a standard neonatal blood specimen, stored dry for several years on a Guthrie card, can support a program of genome-wide neonatal genetic testing.

  6. Genome analysis methods - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available [ Credits ] BLAST Search Image Search Home About Archive Update History Contact us PGDBj Registered...ear Year of genome analysis Sequencing method Sequencing method Read counts Read counts Covered genome region Covered...otation method Number of predicted genes Number of predicted genes Genome database Genome database informati... License Update History of This Database Site Policy | Contact Us Genome analysis... methods - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive ...

  7. A Novel Uncultured Bacterium of the Family Gallionellaceae: Description and Genome Reconstruction Based on the Metagenomic Analysis of Microbial Community in Acid Mine Drainage.

    Science.gov (United States)

    Kadnikov, V V; Ivasenko, D A; Beletsky, A V; Mardanov, A V; Danilova, E V; Pimenov, N V; Karnachuk, O V; Ravin, N V

    2016-07-01

    Drainage waters at the metal mining areas often have low pH and high content of dissolved metals due to oxidation of sulfide minerals. Extreme conditions limit microbial diversity in- such ecosystems. A drainage water microbial community (6.5'C, pH 2.65) in an open pit at the Sherlovaya Gora polymetallic open-cast mine (Transbaikal region, Eastern Siberia, Russia) was studied using metagenomic techniques. Metagenome sequencing provided information for taxonomic and functional characterization of the micro- bial community. The majority of microorganisms belonged to a single uncultured lineage representing a new Betaproteobacteria species of the genus Gallionella. While no.acidophiles are known among the cultured members of the family Gallionellaceae, similar 16S rRNA gene sequences were detected in acid mine drain- ages. Bacteria ofthe genera Thiobacillus, Acidobacterium, Acidisphaera, and Acidithiobacillus,-which are com- mon in acid mine drainage environments, were the minor components of the community. Metagenomic data were -used to determine the almost complete (-3.4 Mb) composite genome of the new bacterial. lineage desig- nated Candidatus Gallionella acididurans ShG14-8. Genome analysis revealed that Fe(II) oxidation probably involved the cytochromes localized on the outer membrane of the cell. The electron transport chain included NADH dehydrogenase, a cytochrome bc1 complex, an alternative complex III, and cytochrome oxidases of the bd, cbb3, and bo3 types. Oxidation of reduced sulfur compounds probably involved the Sox system, sul- fide-quinone oxidoreductase, adenyl sulfate reductase, and sulfate adenyltransferase. The genes required for autotrophic carbon assimilation via the Calvin cycle were present, while no pathway for nitrogen fixation was revealed. High numbers of RND metal transporters and P type ATPases were probably responsible for resis- tance to heavy metals. The new microorganism was an aerobic chemolithoautotroph of the group of

  8. StellaBase: the Nematostella vectensis Genomics Database.

    Science.gov (United States)

    Sullivan, James C; Ryan, Joseph F; Watson, James A; Webb, Jeramy; Mullikin, James C; Rokhsar, Daniel; Finnerty, John R

    2006-01-01

    StellaBase, the Nematostella vectensis Genomics Database, is a web-based resource that will facilitate desktop and bench-top studies of the starlet sea anemone. Nematostella is an emerging model organism that has already proven useful for addressing fundamental questions in developmental evolution and evolutionary genomics. StellaBase allows users to query the assembled Nematostella genome, a confirmed gene library, and a predicted genome using both keyword and homology based search functions. Data provided by these searches will elucidate gene family evolution in early animals. Unique research tools, including a Nematostella genetic stock library, a primer library, a literature repository and a gene expression library will provide support to the burgeoning Nematostella research community. The development of StellaBase accompanies significant upgrades to CnidBase, the Cnidarian Evolutionary Genomics Database. With the completion of the first sequenced cnidarian genome, genome comparison tools have been added to CnidBase. In addition, StellaBase provides a framework for the integration of additional species-specific databases into CnidBase. StellaBase is available at http://www.stellabase.org.

  9. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Science.gov (United States)

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  10. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    Directory of Open Access Journals (Sweden)

    Param Priya Singh

    2015-07-01

    Full Text Available Whole genome duplications (WGD have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  11. Statistical analysis of simple repeats in the human genome

    Science.gov (United States)

    Piazza, F.; Liò, P.

    2005-03-01

    The human genome contains repetitive DNA at different level of sequence length, number and dispersion. Highly repetitive DNA is particularly rich in homo- and di-nucleotide repeats, while middle repetitive DNA is rich of families of interspersed, mobile elements hundreds of base pairs (bp) long, among which belong the Alu families. A link between homo- and di-polymeric tracts and mobile elements has been recently highlighted. In particular, the mobility of Alu repeats, which form 10% of the human genome, has been correlated with the length of poly(A) tracts located at one end of the Alu. These tracts have a rigid and non-bendable structure and have an inhibitory effect on nucleosomes, which normally compact the DNA. We performed a statistical analysis of the genome-wide distribution of lengths and inter-tract separations of poly(X) and poly(XY) tracts in the human genome. Our study shows that in humans the length distributions of these sequences reflect the dynamics of their expansion and DNA replication. By means of general tools from linguistics, we show that the latter play the role of highly-significant content-bearing terms in the DNA text. Furthermore, we find that such tracts are positioned in a non-random fashion, with an apparent periodicity of 150 bases. This allows us to extend the link between repetitive, highly mobile elements such as Alus and low-complexity words in human DNA. More precisely, we show that Alus are sources of poly(X) tracts, which in turn affect in a subtle way the combination and diversification of gene expression and the fixation of multigene families.

  12. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system.

    Science.gov (United States)

    Speth, Daan R; In 't Zandt, Michiel H; Guerrero-Cruz, Simon; Dutilh, Bas E; Jetten, Mike S M

    2016-03-31

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.

  13. High frequency of submicroscopic chromosomal imbalances in patients with syndromic craniosynostosis detected by a combined approach of microsatellite segregation analysis, multiplex ligation-dependent probe amplification and array-based comparative genome hybridisation.

    NARCIS (Netherlands)

    Jehee, F.S.; Krepischi-Santos, A.C.; Rocha, K.M.; Cavalcanti, D.P.; Kim, C.A.; Bertola, D.R.; Alonso, L.G.; D'Angelo, C.S.; Mazzeu, J.F.; Froyen, G.; Lugtenberg, D.; Vianna-Morgante, A.M.; Rosenberg, C.; Passos-Bueno, M.R.

    2008-01-01

    We present the first comprehensive study, to our knowledge, on genomic chromosomal analysis in syndromic craniosynostosis. In total, 45 patients with craniosynostotic disorders were screened with a variety of methods including conventional karyotype, microsatellite segregation analysis, subtelomeric

  14. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan;

    2014-01-01

    than 1 kb. Excluding the 59 SVs (54 insertions/deletions, 5 inversions) that overlap with N-base gaps in the reference assembly hg19, 666 non-gap SVs remained, and 396 of them (60%) were verified by paired-end data from whole-genome sequencing-based re-sequencing or de novo assembly sequence from...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...

  15. Colibri: a functional data base for the Escherichia coli genome.

    Science.gov (United States)

    Médigue, C; Viari, A; Hénaut, A; Danchin, A

    1993-09-01

    Several data libraries have been created to organize all the data obtained worldwide about the Escherichia coli genome. Because the known data now amount to more than 40% of the whole genome sequence, it has become necessary to organize the data in such a way that appropriate procedures can associate knowledge produced by experiments about each gene to its position on the chromosome and its relation to other relevant genes, for example. In addition, global properties of genes, affected by the introduction of new entries, should be present as appropriate description fields. A data base, implemented on Macintosh by using the data base management system 4th Dimension, is described. It is constructed around a core constituted by known contigs of E. coli sequences and links data collected in general libraries (unmodified) to data associated with evolving knowledge (with modifiable fields). Biologically significant results obtained through the coupling of appropriate procedures (learning or statistical data analysis) are presented. The data base is available through a 4th Dimension runtime and through FTP on Internet. It has been regularly updated and will be systematically linked to other E. coli data bases (M. Kroger, R. Wahl, G. Schachtel, and P. Rice, Nucleic Acids Res. 20(Suppl.):2119-2144, 1992; K. E. Rudd, W. Miller, C. Werner, J. Ostell, C. Tolstoshev, and S. G. Satterfield, Nucleic Acids Res. 19:637-647, 1991) in the near future.

  16. Meta-Analysis of Genome-Wide Association Studies and Network Analysis-Based Integration with Gene Expression Data Identify New Suggestive Loci and Unravel a Wnt-Centric Network Associated with Dupuytren’s Disease

    Science.gov (United States)

    Becker, Kerstin; Siegert, Sabine; Toliat, Mohammad Reza; Du, Juanjiangmeng; Casper, Ramona; Dolmans, Guido H.; Werker, Paul M.; Tinschert, Sigrid; Franke, Andre; Gieger, Christian; Strauch, Konstantin; Nothnagel, Michael; Nürnberg, Peter; Hennies, Hans Christian

    2016-01-01

    Dupuytren´s disease, a fibromatosis of the connective tissue in the palm, is a common complex disease with a strong genetic component. Up to date nine genetic loci have been found to be associated with the disease. Six of these loci contain genes that code for Wnt signalling proteins. In spite of this striking first insight into the genetic factors in Dupuytren´s disease, much of the inherited risk in Dupuytren´s disease still needs to be discovered. The already identified loci jointly explain ~1% of the heritability in this disease. To further elucidate the genetic basis of Dupuytren´s disease, we performed a genome-wide meta-analysis combining three genome-wide association study (GWAS) data sets, comprising 1,580 cases and 4,480 controls. We corroborated all nine previously identified loci, six of these with genome-wide significance (p-value Dupuytren´s disease. PMID:27467239

  17. Genome bioinformatic analysis of nonsynonymous SNPs

    Directory of Open Access Journals (Sweden)

    Todd John A

    2007-08-01

    Full Text Available Abstract Background Genome-wide association studies of common diseases for common, low penetrance causal variants are underway. A proportion of these will alter protein sequences, the most common of which is the non-synonymous single nucleotide polymorphism (nsSNP. It would be an advantage if the functional effects of an nsSNP on protein structure and function could be predicted, both for the final identification process of a causal variant in a disease-associated chromosome region, and in further functional analyses of the nsSNP and its disease-associated protein. Results In the present report we have compared and contrasted structure- and sequence-based methods of prediction to over 5500 genes carrying nearly 24,000 nsSNPs, by employing an automatic comparative modelling procedure to build models for the genes. The nsSNP information came from two sources, the OMIM database which are rare (minor allele frequency, MAF, 0.05, have no known link to a disease. For over 40% of the nsSNPs, structure-based methods predicted which of these sequence changes are likely to either disrupt the structure of the protein or interfere with the function or interactions of the protein. For the remaining 60%, we generated sequence-based predictions. Conclusion We show that, in general, the prediction tools are able distinguish disease causing mutations from those mutations which are thought to have a neutral affect. We give examples of mutations in genes that are predicted to be deleterious and may have a role in disease. Contrary to previous reports, we also show that rare mutations are consistently predicted to be deleterious as often as commonly occurring nsSNPs.

  18. An acquisition account of genomic islands based on genome signature comparisons

    Directory of Open Access Journals (Sweden)

    Luyf ACM

    2005-11-01

    Full Text Available Abstract Background Recent analyses of prokaryotic genome sequences have demonstrated the important force horizontal gene transfer constitutes in genome evolution. Horizontally acquired sequences are detectable by, among others, their dinucleotide composition (genome signature dissimilarity with the host genome. Genomic islands (GIs comprise important and interesting horizontally transferred sequences, but information about acquisition events or relatedness between GIs is scarce. In Vibrio vulnificus CMCP6, 10 and 11 GIs have previously been identified in the sequenced chromosomes I and II, respectively. We assessed the compositional similarity and putative acquisition account of these GIs using the genome signature. For this analysis we developed a new algorithm, available as a web application. Results Of 21 GIs, VvI-1 and VvI-10 of chromosome I have similar genome signatures, and while artificially divided due to a linear annotation, they are adjacent on the circular chromosome and therefore comprise one GI. Similarly, GIs VvI-3 and VvI-4 of chromosome I together with the region between these two islands are compositionally similar, suggesting that they form one GI (making a total of 19 GIs in chromosome I + chromosome II. Cluster analysis assigned the 19 GIs to 11 different branches above our conservative threshold. This suggests a limited number of compositionally similar donors or intragenomic dispersion of ancestral acquisitions. Furthermore, 2 GIs of chromosome II cluster with chromosome I, while none of the 19 GIs group with chromosome II, suggesting an unidirectional dispersal of large anomalous gene clusters from chromosome I to chromosome II. Conclusion From the results, we infer 10 compositionally dissimilar donors for 19 GIs in the V. vulnificus CMCP6 genome, including chromosome I donating to chromosome II. This suggests multiple transfer events from individual donor types or from donors with similar genome signatures. Applied to

  19. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    Full Text Available As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations; such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  20. BioMet Toolbox: genome-wide analysis of metabolism

    DEFF Research Database (Denmark)

    Cvijovic, M.; Olivares Hernandez, Roberto; Agren, R.

    2010-01-01

    models. Systematic analysis of biological processes by means of modelling and simulations has made the identification of metabolic networks and prediction of metabolic capabilities under different conditions possible. For facilitating such systemic analysis, we have developed the BioMet Toolbox, a web......-based resource for stoichiometric analysis and for integration of transcriptome and interactome data, thereby exploiting the capabilities of genome-scale metabolic models. The BioMet Toolbox provides an effective user-friendly way to perform linear programming simulations towards maximized or minimized growth...... rates, substrate uptake rates and metabolic production rates by detecting relevant fluxes, simulate single and double gene deletions or detect metabolites around which major transcriptional changes are concentrated. These tools can be used for high-throughput in silico screening and allows fully...

  1. Genomic risk profiling of ischemic stroke: results of an international genome-wide association meta-analysis.

    Directory of Open Access Journals (Sweden)

    James F Meschia

    Full Text Available INTRODUCTION: Familial aggregation of ischemic stroke derives from shared genetic and environmental factors. We present a meta-analysis of genome-wide association scans (GWAS from 3 cohorts to identify the contribution of common variants to ischemic stroke risk. METHODS: This study involved 1464 ischemic stroke cases and 1932 controls. Cases were genotyped using the Illumina 610 or 660 genotyping arrays; controls, with Illumina HumanHap 550Kv1 or 550Kv3 genotyping arrays. Imputation was performed with the 1000 Genomes European ancestry haplotypes (August 2010 release as a reference. A total of 5,156,597 single-nucleotide polymorphisms (SNPs were incorporated into the fixed effects meta-analysis. All SNPs associated with ischemic stroke (P<1×10(-5 were incorporated into a multivariate risk profile model. RESULTS: No SNP reached genome-wide significance for ischemic stroke (P<5×10(-8. Secondary analysis identified a significant cumulative effect for age at onset of stroke (first versus fifth quintile of cumulative profiles based on SNPs associated with late onset, ß = 14.77 [10.85,18.68], P = 5.5×10(-12, as well as a strong effect showing increased risk across samples with a high propensity for stroke among samples with enriched counts of suggestive risk alleles (P<5×10(-6. Risk profile scores based only on genomic information offered little incremental prediction. DISCUSSION: There is little evidence of a common genetic variant contributing to moderate risk of ischemic stroke. Quintiles based on genetic loading of alleles associated with a younger age at onset of ischemic stroke revealed a significant difference in age at onset between those in the upper and lower quintiles. Using common variants from GWAS and imputation, genomic profiling remains inferior to family history of stroke for defining risk. Inclusion of genomic (rare variant information may be required to improve clinical risk profiling.

  2. Radiation induced genome instability: multiscale modelling and data analysis

    Science.gov (United States)

    Andreev, Sergey; Eidelman, Yuri

    2012-07-01

    Genome instability (GI) is thought to be an important step in cancer induction and progression. Radiation induced GI is usually defined as genome alterations in the progeny of irradiated cells. The aim of this report is to demonstrate an opportunity for integrative analysis of radiation induced GI on the basis of multiscale modelling. Integrative, systems level modelling is necessary to assess different pathways resulting in GI in which a variety of genetic and epigenetic processes are involved. The multilevel modelling includes the Monte Carlo based simulation of several key processes involved in GI: DNA double strand breaks (DSBs) generation in cells initially irradiated as well as in descendants of irradiated cells, damage transmission through mitosis. Taking the cell-cycle-dependent generation of DNA/chromosome breakage into account ensures an advantage in estimating the contribution of different DNA damage response pathways to GI, as to nonhomologous vs homologous recombination repair mechanisms, the role of DSBs at telomeres or interstitial chromosomal sites, etc. The preliminary estimates show that both telomeric and non-telomeric DSB interactions are involved in delayed effects of radiation although differentially for different cell types. The computational experiments provide the data on the wide spectrum of GI endpoints (dicentrics, micronuclei, nonclonal translocations, chromatid exchanges, chromosome fragments) similar to those obtained experimentally for various cell lines under various experimental conditions. The modelling based analysis of experimental data demonstrates that radiation induced GI may be viewed as processes of delayed DSB induction/interaction/transmission being a key for quantification of GI. On the other hand, this conclusion is not sufficient to understand GI as a whole because factors of DNA non-damaging origin can also induce GI. Additionally, new data on induced pluripotent stem cells reveal that GI is acquired in normal mature

  3. Initial sequencing and analysis of the human genome.

    Science.gov (United States)

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  4. The genetic aetiology of cannabis use initiation: A meta-analysis of genome-wide association studies and a SNP-based heritability estimation

    NARCIS (Netherlands)

    Verweij, C.J.H.; Vinkhuyzen, A.A.E.; Benyamin, B.; Lynskey, M.T.; Quaye, L.; Agrawal, A.; Gordon, S.D.; Montgomery, G.W.; Madden, P.A.F.; Heath, A.C.; Spector, T.D.; Martin, N.G.; Medland, S.E.

    2013-01-01

    While initiation of cannabis use is around 40% heritable, not much is known about the underlying genetic aetiology. Here, we meta-analysed two genome-wide association studies of initiation of cannabis use with >10000 individuals. None of the genetic variants reached genome-wide significance. We a

  5. On genome-wide association studies for family-based designs: an integrative analysis approach combining ascertained family samples with unselected controls.

    NARCIS (Netherlands)

    Lasky-Su, J.; Won, S.; Mick, E.; Anney, R.J.; Franke, B.; Neale, B.; Biederman, J.; Smalley, S.L.; Loo, S.K.; Todorov, A.; Faraone, S.V.; Weiss, S.T.; Lange, C.

    2010-01-01

    Large numbers of control individuals with genome-wide genotype data are now available through various databases. These controls are regularly used in case-control genome-wide association studies (GWAS) to increase the statistical power. Controls are often "unselected" for the disease of interest and

  6. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level

    Science.gov (United States)

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 NCBI downloaded streptomycete strains. Thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and multiple source subgroup-based phylogenetic tree. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea’s genetic data sources. PMID:27446038

  7. Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

    Science.gov (United States)

    Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

    2014-12-01

    As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project-the Genome Science Project-this article presents a novel approach to conducting a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: making the draft of the template following an analysis of national and international policies; refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.

  8. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level.

    Science.gov (United States)

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 NCBI downloaded streptomycete strains. Thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and multiple source subgroup-based phylogenetic tree. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea's genetic data sources.

  9. Analysis of Simple Sequence Repeats in Genomes of Rhizobia

    Institute of Scientific and Technical Information of China (English)

    GAO Ya-mei; HAN Yi-qiang; TANG Hui; SUN Dong-mei; WANG Yan-jie; WANG Wei-dong

    2008-01-01

    Simple sequence repeats (SSRs) or microsatellites, as genetic markers, are ubiquitous in genomes of various organisms. The analysis of SSR in rhizobia genome provides useful information for a variety of applications in population genetics of rhizobia. We analyzed the occurrences, relative abundance, and relative density of SSRs, the most common in Bradyrhizobium japonicum, Mesorhizobium loti, and Sinorhizobium meliloti genomes se-quenced in the microorganisms tandem repeats database, and SSRs in the three species genomes were compared with each other. The result showed that there were 1 410, 859, and 638 SSRs in B. japonicum, M. loti, and 5. meliloti genomes, respectively. In the genomes of B. japonicum, M. loti, and 5. meliloti, tetranucleotide, pentanucleotide, and hexanucleotide repeats were more abundant and indicated higher mutation rates in these species. The least abundance was mononucleotide repeat. The SSRs type and distribution were similar among these species.

  10. Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome.

    Science.gov (United States)

    Wang, Yi; Liu, Xianju; Ren, Chong; Zhong, Gan-Yuan; Yang, Long; Li, Shaohua; Liang, Zhenchang

    2016-04-21

    CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of humans, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific target sites for CRISPR/Cas9 have been computationally identified for several annual model and crop species, but such sites have not been reported for perennial, woody fruit species. In this study, we identified and characterized five types of CRISPR/Cas9 target sites in the widely cultivated grape species Vitis vinifera and developed a user-friendly database for editing grape genomes in the future. A total of 35,767,960 potential CRISPR/Cas9 target sites were identified from grape genomes in this study. Among them, 22,597,817 target sites were mapped to specific genomic locations and 7,269,788 were found to be highly specific. Protospacers and PAMs were found to distribute uniformly and abundantly in the grape genomes. They were present in all the structural elements of genes with the coding region having the highest abundance. Five PAM types, TGG, AGG, GGG, CGG and NGG, were observed. With the exception of the NGG type, they were abundantly present in the grape genomes. Synteny analysis of similar genes revealed that the synteny of protospacers matched the synteny of homologous genes. A user-friendly database containing protospacers and detailed information of the sites was developed and is available for public use at the Grape-CRISPR website ( http://biodb.sdau.edu.cn/gc/index.html ). Grape genomes harbour millions of potential CRISPR/Cas9 target sites. These sites are widely distributed among and within chromosomes with predominant abundance in the coding regions of genes. We developed a publicly-accessible Grape-CRISPR database for facilitating the use of the CRISPR/Cas9 system as a genome editing tool for functional studies and molecular breeding of grapes. Among

  11. Analysis of the Vibrionaceae pan-genome

    OpenAIRE

    Kahlke, Tim

    2013-01-01

    Paper 2 of this thesis is not available in Munin: 2. Tim Kahlke, Alexander Goesmann and Peik Haugen: 'The Vibrionaceae pan-genome hints at gene expression as the major driving force for unequal gene distributions on Vibrionaceae chromosomes' (manuscript) In the presented work the bacterial family Vibrionaceae was used as a model to investigate bacterial diversity on a gene level and to analyze the underlying concepts of bacterial niche adaptation and evolution. For this, the genomes ...

  12. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Erhardtoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence- based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops

  13. Integrated genomic analysis of survival outliers in glioblastoma.

    Science.gov (United States)

    Peng, Sen; Dhruv, Harshil; Armstrong, Brock; Salhia, Bodour; Legendre, Christophe; Kiefer, Jeffrey; Parks, Julianna; Virk, Selene; Sloan, Andrew E; Ostrom, Quinn T; Barnholtz-Sloan, Jill S; Tran, Nhan L; Berens, Michael E

    2017-06-01

    To elucidate molecular features associated with disproportionate survival of glioblastoma (GB) patients, we conducted deep genomic comparative analysis of a cohort of patients receiving standard therapy (surgery plus concurrent radiation and temozolomide); "GB outliers" were identified: long-term survivor of 33 months (LTS; n = 8) versus short-term survivor of 7 months (STS; n = 10). We implemented exome, RNA, whole genome sequencing, and DNA methylation for collection of deep genomic data from STS and LTS GB patients. LTS GB showed frequent chromosomal gains in 4q12 (platelet derived growth factor receptor alpha and KIT) and 12q14.1 (cyclin-dependent kinase 4), and deletion in 19q13.33 (BAX, branched chain amino-acid transaminase 2, and cluster of differentiation 33). STS GB showed frequent deletion in 9p11.2 (forkhead box D4-like 2 and aquaporin 7 pseudogene 3) and 22q11.21 (Hypermethylated In Cancer 2). LTS GB showed 2-fold more frequent copy number deletions compared with STS GB. Gene expression differences showed the STS cohort with altered transcriptional regulators: activation of signal transducer and activator of transcription (STAT)5a/b, nuclear factor-kappaB (NF-κB), and interferon-gamma (IFNG), and inhibition of mitogen-activated protein kinase (MAPK1), extracellular signal-regulated kinase (ERK)1/2, and estrogen receptor (ESR)1. Expression-based biological concepts prominent in the STS cohort include metabolic processes, anaphase-promoting complex degradation, and immune processes associated with major histocompatibility complex class I antigen presentation; the LTS cohort features genes related to development, morphogenesis, and the mammalian target of rapamycin signaling pathway. Whole genome methylation analyses showed that a methylation signature of 89 probes distinctly separates LTS from STS GB tumors. We posit that genomic instability is associated with longer survival of GB (possibly with vulnerability to standard therapy); conversely, genomic

  14. Genomic analysis of primordial dwarfism reveals novel disease genes.

    Science.gov (United States)

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.

  15. Comparative analysis of the base compositions of the pre-mRNA 3' cleaved-off region and the mRNA 3' untranslated region relative to the genomic base composition in animals and plants.

    Science.gov (United States)

    Li, Xiu-Qing

    2014-01-01

    The precursor messenger RNA (pre-mRNA) three-prime cleaved-off region (3'COR) and the mRNA three-prime untranslated region (3'UTR) play critical roles in regulating gene expression. The differences in base composition between these regions and the corresponding genomes are still largely uncharacterized in animals and plants. In this study, the base compositions of non-redundant 3'CORs and 3'UTRs were compared with the corresponding whole genomes of eleven animals, four dicotyledonous plants, and three monocotyledonous (cereal) plants. Among the four bases (A, C, G, and U for adenine, cytosine, guanine, and uracil, respectively), U (which corresponds to T, for thymine, in DNA) was the most frequent, A the second most frequent, G the third most frequent, and C the least frequent in most of the species in both the 3'COR and 3'UTR regions. In comparison with the whole genomes, in both regions the U content was usually the most overrepresented (particularly in the monocotyledonous plants), and the C content was the most underrepresented. The order obtained for the species groups, when ranked from high to low according to the U contents in the 3'COR and 3'UTR was as follows: dicotyledonous plants, monocotyledonous plants, non-mammal animals, and mammals. In contrast, the genomic T content was highest in dicotyledonous plants, lowest in monocotyledonous plants, and intermediate in animals. These results suggest the following: 1) there is a mechanism operating in both animals and plants which is biased toward U and against C in the 3'COR and 3'UTR; 2) the 3'UTR and 3'COR, as functional units, minimized the difference between dicotyledonous and monocotyledonous plants, while the dicotyledonous and monocotyledonous genomes evolved into two extreme groups in terms of base composition.

  16. Post-Fragmentation Whole Genome Amplification-Based Method

    Science.gov (United States)

    Benardini, James; LaDuc, Myron T.; Langmore, John

    2011-01-01

    This innovation is derived from a proprietary amplification scheme that is based upon random fragmentation of the genome into a series of short, overlapping templates. The resulting shorter DNA strands (fragments with defined 3 and 5 termini. Specific primers to these termini are then used to isothermally amplify this library into potentially unlimited quantities that can be used immediately for multiple downstream applications including gel eletrophoresis, quantitative polymerase chain reaction (QPCR), comparative genomic hybridization microarray, SNP analysis, and sequencing. The standard reaction can be performed with minimal hands-on time, and can produce amplified DNA in as little as three hours. Post-fragmentation whole genome amplification-based technology provides a robust and accurate method of amplifying femtogram levels of starting material into microgram yields with no detectable allele bias. The amplified DNA also facilitates the preservation of samples (spacecraft samples) by amplifying scarce amounts of template DNA into microgram concentrations in just a few hours. Based on further optimization of this technology, this could be a feasible technology to use in sample preservation for potential future sample return missions. The research and technology development described here can be pivotal in dealing with backward/forward biological contamination from planetary missions. Such efforts rely heavily on an increasing understanding of the burden and diversity of microorganisms present on spacecraft surfaces throughout assembly and testing. The development and implementation of these technologies could significantly improve the comprehensiveness and resolving power of spacecraft-associated microbial population censuses, and are important to the continued evolution and advancement of planetary protection capabilities. Current molecular procedures for assaying spacecraft-associated microbial burden and diversity have inherent sample loss issues at

  17. An Alternative Methodological Approach for Cost-Effectiveness Analysis and Decision Making in Genomic Medicine.

    Science.gov (United States)

    Fragoulakis, Vasilios; Mitropoulou, Christina; van Schaik, Ron H; Maniadakis, Nikolaos; Patrinos, George P

    2016-05-01

    Genomic Medicine aims to improve therapeutic interventions and diagnostics, the quality of life of patients, but also to rationalize healthcare costs. To reach this goal, careful assessment and identification of evidence gaps for public health genomics priorities are required so that a more efficient healthcare environment is created. Here, we propose a public health genomics-driven approach to adjust the classical healthcare decision making process with an alternative methodological approach of cost-effectiveness analysis, which is particularly helpful for genomic medicine interventions. By combining classical cost-effectiveness analysis with budget constraints, social preferences, and patient ethics, we demonstrate the application of this model, the Genome Economics Model (GEM), based on a previously reported genome-guided intervention from a developing country environment. The model and the attendant rationale provide a practical guide by which all major healthcare stakeholders could ensure the sustainability of funding for genome-guided interventions, their adoption and coverage by health insurance funds, and prioritization of Genomic Medicine research, development, and innovation, given the restriction of budgets, particularly in developing countries and low-income healthcare settings in developed countries. The implications of the GEM for the policy makers interested in Genomic Medicine and new health technology and innovation assessment are also discussed.

  18. Distribution and molecular profiling of class 1 integrons in MDR Acinetobacter baumannii isolates and whole genome-based analysis of antibiotic resistance mechanisms in a representative strain.

    Science.gov (United States)

    Zhu, Yuying; Yi, Yong; Liu, Fei; Lv, Na; Yang, Xi; Li, Jing; Hu, Yongfei; Zhu, Baoli

    2014-11-01

    The class 1 integron is an important driver of the nosocomial dissemination of multidrug-resistant (MDR) bacteria, such as Acinetobacters. In this study, we characterized the gene cassette arrays of class 1 integrons in Acinetobacter baumannii, where the detailed structure of these integrons for 38 clinical strains was analyzed. The results showed that there are three types of gene cassette arrays that are carried by different class 1 integrons, among them the aac(6')-IId-catB8-aadA1 array was the most prevalent. For detailed analysis of the integron structure, whole genome sequencing was carried out on strain AB16, and it was found that a single integron on its chromosome has a partial Tn21 transposon in its 5' flanking region and two complete copies of the insertion element IS26 in both the 5' and 3' flanking regions, indicating that the integron could be acquired by horizontal gene transfer. Furthermore, there is one resistance island AbaR22, one bla gene containing a transposon, four intrinsic resistant genes and one efflux pump that together confer six types of antibiotic resistance.

  19. Whole Genome Amplification in Genomic Analysis of Single Circulating Tumor Cells.

    Science.gov (United States)

    Gasch, Christin; Pantel, Klaus; Riethdorf, Sabine

    2015-01-01

    Investigation of the genome of organisms is one of the major basics in molecular biology to understand the complex organization of cells. While genomic DNA can easily be isolated from tissues or cell cultures of plant, animal or human origin, DNA extraction from single cells is still challenging. Here, we describe three techniques for the amplification of genomic DNA of fixed single circulating tumor cells (CTC) isolated from blood of cancer patients. This amplification is aimed to increase DNA amounts from those of one cell to yields sufficient for different DNA analyses such as mutational analysis including next-generation sequencing, array-comparative genome hybridization (CGH), and quantitative measurement of gene amplifications. Molecular analysis of CTC as liquid biopsy can be used to identify therapeutic targets in personalized medicine directed, e.g. against human epidermal growth factor receptor 2 (HER2) or epidermal growth factor receptor (EGFR) and to stratify the patients to those therapies.

  20. Comparative analysis of catfish BAC end sequences with the zebrafish genome

    Directory of Open Access Journals (Sweden)

    Abernathy Jason

    2009-12-01

    Full Text Available Abstract Background Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish. Results We reported the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource along with linkage mapping and existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3% had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5, of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds. Conclusion BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified to exist between catfish and zebrafish. However, it appears that the level of conservation at local genomic regions are high while a high level of chromosomal shuffling and rearrangements exist between catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.

  1. Chromosomes in the flow to simplify genome analysis.

    Science.gov (United States)

    Doležel, Jaroslav; Vrána, Jan; Safář, Jan; Bartoš, Jan; Kubaláková, Marie; Simková, Hana

    2012-08-01

    Nuclear genomes of human, animals, and plants are organized into subunits called chromosomes. When isolated into aqueous suspension, mitotic chromosomes can be classified using flow cytometry according to light scatter and fluorescence parameters. Chromosomes of interest can be purified by flow sorting if they can be resolved from other chromosomes in a karyotype. The analysis and sorting are carried out at rates of 10(2)-10(4) chromosomes per second, and for complex genomes such as wheat the flow sorting technology has been ground-breaking in reducing genome complexity for genome sequencing. The high sample rate provides an attractive approach for karyotype analysis (flow karyotyping) and the purification of chromosomes in large numbers. In characterizing the chromosome complement of an organism, the high number that can be studied using flow cytometry allows for a statistically accurate analysis. Chromosome sorting plays a particularly important role in the analysis of nuclear genome structure and the analysis of particular and aberrant chromosomes. Other attractive but not well-explored features include the analysis of chromosomal proteins, chromosome ultrastructure, and high-resolution mapping using FISH. Recent results demonstrate that chromosome flow sorting can be coupled seamlessly with DNA array and next-generation sequencing technologies for high-throughput analyses. The main advantages are targeting the analysis to a genome region of interest and a significant reduction in sample complexity. As flow sorters can also sort single copies of chromosomes, shotgun sequencing DNA amplified from them enables the production of haplotype-resolved genome sequences. This review explains the principles of flow cytometric chromosome analysis and sorting (flow cytogenetics), discusses the major uses of this technology in genome analysis, and outlines future directions.

  2. wFleaBase: the Daphnia genome database

    Directory of Open Access Journals (Sweden)

    Singan Vasanth R

    2005-03-01

    Full Text Available Abstract Background wFleaBase is a database with the necessary infrastructure to curate, archive and share genetic, molecular and functional genomic data and protocols for an emerging model organism, the microcrustacean Daphnia. Commonly known as the water-flea, Daphnia's ecological merit is unequaled among metazoans, largely because of its sentinel role within freshwater ecosystems and over 200 years of biological investigations. By consequence, the Daphnia Genomics Consortium (DGC has launched an interdisciplinary research program to create the resources needed to study genes that affect ecological and evolutionary success in natural environments. Discussion These tools include the genome database wFleaBase, which currently contains functions to search and extract information from expressed sequenced tags, genome survey sequences and full genome sequencing projects. This new database is built primarily from core components of the Generic Model Organism Database project, and related bioinformatics tools. Summary Over the coming year, preliminary genetic maps and the nearly complete genomic sequence of Daphnia pulex will be integrated into wFleaBase, including gene predictions and ortholog assignments based on sequence similarities with eukaryote genes of known function. wFleaBase aims to serve a large ecological and evolutionary research community. Our challenge is to rapidly expand its content and to ultimately integrate genetic and functional genomic information with population-level responses to environmental challenges. URL: http://wfleabase.org/.

  3. The chickpea genomic web resource: visualization and analysis of the desi-type Cicer arietinum nuclear genome for comparative exploration of legumes.

    Science.gov (United States)

    Misra, Gopal; Priya, Piyush; Bandhiwal, Nitesh; Bareja, Neha; Jain, Mukesh; Bhatia, Sabhyata; Chattopadhyay, Debasis; Tyagi, Akhilesh K; Yadav, Gitanjali

    2014-12-18

    Availability of the draft nuclear genome sequences of small-seeded desi-type legume crop Cicer arietinum has provided an opportunity for investigating unique chickpea genomic features and evaluation of their biological significance. The increasing number of legume genome sequences also presents a challenge for developing reliable and information-driven bioinformatics applications suitable for comparative exploration of this important class of crop plants. The Chickpea Genomic Web Resource (CGWR) is an implementation of a suite of web-based applications dedicated to chickpea genome visualization and comparative analysis, based on next generation sequencing and assembly of Cicer arietinum desi-type genotype ICC4958. CGWR has been designed and configured for mapping, scanning and browsing the significant chickpea genomic features in view of the important existing and potential roles played by the various legume genome projects in mutant mapping and cloning. It also enables comparative informatics of ICC4958 DNA sequence analysis with other wild and cultivated genotypes of chickpea, various other leguminous species as well as several non-leguminous model plants, to enable investigations into evolutionary processes that shape legume genomes. CGWR is an online database offering a comprehensive visual and functional genomic analysis of the chickpea genome, along with customized maps and gene-clustering options. It is also the only plant based web resource supporting display and analysis of nucleosome positioning patterns in the genome. The usefulness of CGWR has been demonstrated with discoveries of biological significance made using this server. The CGWR is compatible with all available operating systems and browsers, and is available freely under the open source license at http://www.nipgr.res.in/CGWR/home.php.

  4. Genomics based analysis of interactions between developing B-lymphocytes and stromal cells reveal complex interactions and two-way communication

    Directory of Open Access Journals (Sweden)

    Bryder David

    2010-02-01

    Full Text Available Abstract Background The use of functional genomics has largely increased our understanding of cell biology and promises to help the development of systems biology needed to understand the complex order of events that regulates cellular differentiation in vivo. One model system clearly dependent on the integration of extra and intra cellular signals is the development of B-lymphocytes from hematopoietic stem cells in the bone marrow. This developmental pathway involves several defined differentiation stages associated with specific expression of genes including surface markers that can be used for the prospective isolation of the progenitor cells directly from the bone marrow to allow for ex vivo gene expression analysis. The developmental process can be simulated in vitro making it possible to dissect information about cell/cell communication as well as to address the relevance of communication pathways in a rather direct manner. Thus we believe that B-lymphocyte development represents a useful model system to take the first steps towards systems biology investigations in the bone marrow. Results In order to identify extra cellular signals that promote B lymphocyte development we created a database with approximately 400 receptor ligand pairs and software matching gene expression data from two cell populations to obtain information about possible communication pathways. Using this database and gene expression data from NIH3T3 cells (unable to support B cell development, OP-9 cells (strongly supportive of B cell development, pro-B and pre-B cells as well as mature peripheral B-lineage cells, we were able to identify a set of potential stage and stromal cell restricted communication pathways. Functional analysis of some of these potential ways of communication allowed us to identify BMP-4 as a potent stimulator of B-cell development in vitro. Further, the analysis suggested that there existed possibilities for progenitor B cells to send signals to

  5. Genomic analysis of hyperthermophilic archaea; Chokonetsusei kosaikin no genomu kaiseki

    Energy Technology Data Exchange (ETDEWEB)

    Kato, C. [Japan Marine Science and Technology Center, Kanagawa (Japan)

    1997-05-20

    Whole genome sequences of five strains of microorganisms have been reported up to the present and many genome analysis projects are in progress in the world. Among archaea (archaebacteria), the genome analysis of Methanococcus jannaschii have been completed and the sequencing data are opened to public. While 134 regulatory genes were identified in Synechocystis sp. PCC 6803 (eubacteria, 3.6 genome size), only 7 regulatory genes were identified in M. jannaschii (1.7Mb). Difference of the genome size is believed to correspond to the quantity of the environmental stresses. In Japan, the genome analysis project on a new hyperthermophilic archaeon, Pyrococcus horikoshii is in progress. P. horikoshii was isolated in a deep sea hydrothermal vent. It shows barophilic growth at maximum high temperature of 103degC under pressure of 30MPa. Thus, the genome analysis of barophilic hyperthermophilic archaea is expected to contribute to the understanding of the origin of life and evolution. 19 refs., 4 figs., 1 tab.

  6. BATCH-GE: Batch analysis of Next-Generation Sequencing data for genome editing assessment.

    Science.gov (United States)

    Boel, Annekatrien; Steyaert, Woutert; De Rocker, Nina; Menten, Björn; Callewaert, Bert; De Paepe, Anne; Coucke, Paul; Willaert, Andy

    2016-07-27

    Targeted mutagenesis by the CRISPR/Cas9 system is currently revolutionizing genetics. The ease of this technique has enabled genome engineering in-vitro and in a range of model organisms and has pushed experimental dimensions to unprecedented proportions. Due to its tremendous progress in terms of speed, read length, throughput and cost, Next-Generation Sequencing (NGS) has been increasingly used for the analysis of CRISPR/Cas9 genome editing experiments. However, the current tools for genome editing assessment lack flexibility and fall short in the analysis of large amounts of NGS data. Therefore, we designed BATCH-GE, an easy-to-use bioinformatics tool for batch analysis of NGS-generated genome editing data, available from https://github.com/WouterSteyaert/BATCH-GE.git. BATCH-GE detects and reports indel mutations and other precise genome editing events and calculates the corresponding mutagenesis efficiencies for a large number of samples in parallel. Furthermore, this new tool provides flexibility by allowing the user to adapt a number of input variables. The performance of BATCH-GE was evaluated in two genome editing experiments, aiming to generate knock-out and knock-in zebrafish mutants. This tool will not only contribute to the evaluation of CRISPR/Cas9-based experiments, but will be of use in any genome editing experiment and has the ability to analyze data from every organism with a sequenced genome.

  7. Comparative genomic analysis of eutherian interferon-γ-inducible GTPases.

    Science.gov (United States)

    Premzl, Marko

    2012-11-01

    The interferon-γ-inducible GTPases, IFGGs, are intracellular proteins involved in immune response against pathogens. A comprehensive comparative genomic review and analysis of eutherian IFGGs was carried out using public genomic sequences. The 64 eutherian IFGG genes were examined in detail and annotated. The eutherian IFGG promoter types were first catalogued followed by a phylogenetic analysis of eutherian IFGGs, which described five major IFGG clusters. The patterns of differential gene expansions and protein regions that may regulate IFGG catalytic features suggested a new classification of eutherian IFGGs. This mini-review has also provided new tests of reliability of public genomic sequences as well as tests of protein molecular evolution.

  8. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.

  9. A web-based multi-genome synteny viewer for customized data

    Directory of Open Access Journals (Sweden)

    Revanna Kashi V

    2012-08-01

    Full Text Available Abstract Background Web-based synteny visualization tools are important for sharing data and revealing patterns of complicated genome conservation and rearrangements. Such tools should allow biologists to upload genomic data for their own analysis. This requirement is critical because individual biologists are generating large amounts of genomic sequences that quickly overwhelm any centralized web resources to collect and display all those data. Recently, we published a web-based synteny viewer, GSV, which was designed to satisfy the above requirement. However, GSV can only compare two genomes at a given time. Extending the functionality of GSV to visualize multiple genomes is important to meet the increasing demand of the research community. Results We have developed a multi-Genome Synteny Viewer (mGSV. Similar to GSV, mGSV is a web-based tool that allows users to upload their own genomic data files for visualization. Multiple genomes can be presented in a single integrated view with an enhanced user interface. Users can navigate through all the selected genomes in either pairwise or multiple viewing mode to examine conserved genomic regions as well as the accompanying genome annotations. Besides serving users who manually interact with the web server, mGSV also provides Web Services for machine-to-machine communication to accept data sent by other remote resources. The entire mGSV package can also be downloaded for easy local installation. Conclusions mGSV significantly enhances the original functionalities of GSV. A web server hosting mGSV is provided at http://cas-bioinfo.cas.unt.edu/mgsv.

  10. Whole genome sequencing analysis of Plasmodium vivax using whole genome capture

    Directory of Open Access Journals (Sweden)

    Bright A

    2012-06-01

    Full Text Available Abstract Background Malaria caused by Plasmodium vivax is an experimentally neglected severe disease with a substantial burden on human health. Because of technical limitations, little is known about the biology of this important human pathogen. Whole genome analysis methods on patient-derived material are thus likely to have a substantial impact on our understanding of P. vivax pathogenesis and epidemiology. For example, it will allow study of the evolution and population biology of the parasite, allow parasite transmission patterns to be characterized, and may facilitate the identification of new drug resistance genes. Because parasitemias are typically low and the parasite cannot be readily cultured, on-site leukocyte depletion of blood samples is typically needed to remove human DNA that may be 1000X more abundant than parasite DNA. These features have precluded the analysis of archived blood samples and require the presence of laboratories in close proximity to the collection of field samples for optimal pre-cryopreservation sample preparation. Results Here we show that in-solution hybridization capture can be used to extract P. vivax DNA from human contaminating DNA in the laboratory without the need for on-site leukocyte filtration. Using a whole genome capture method, we were able to enrich P. vivax DNA from bulk genomic DNA from less than 0.5% to a median of 55% (range 20%-80%. This level of enrichment allows for efficient analysis of the samples by whole genome sequencing and does not introduce any gross biases into the data. With this method, we obtained greater than 5X coverage across 93% of the P. vivax genome for four P. vivax strains from Iquitos, Peru, which is similar to our results using leukocyte filtration (greater than 5X coverage across 96% . Conclusion The whole genome capture technique will enable more efficient whole genome analysis of P. vivax from a larger geographic region and from valuable archived sample collections.

  11. A novel statistic for genome-wide interaction analysis.

    Science.gov (United States)

    Wu, Xuesen; Dong, Hua; Luo, Li; Zhu, Yun; Peng, Gang; Reveille, John D; Xiong, Momiao

    2010-09-23

    Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of finding heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet challenges raised by genome-wide interactional analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDRanalysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search significant interaction between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

  12. A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes

    Directory of Open Access Journals (Sweden)

    Michael Strong

    2009-12-01

    Full Text Available As the number of completely sequenced microbial genomes continues to rise at an impressive rate, it is important to prepare students with the skills necessary to investigate microorganisms at the genomic level. As a part of the core curriculum for first-year graduate students in the biological sciences, we have implemented a web-based tutorial to introduce students to the fields of comparative and functional genomics. The tutorial focuses on recent computational methods for identifying functionally linked genes and proteins on a genome-wide scale and was used to introduce students to the Rosetta Stone, Phylogenetic Profile, conserved Gene Neighbor, and Operon computational methods. Students learned to use a number of publicly available web servers and databases to identify functionally linked genes in the Escherichia coli genome, with emphasis on genome organization and operon structure. The overall effectiveness of the tutorial was assessed based on student evaluations and homework assignments. The tutorial is available to other educators at http://www.doe-mbi.ucla.edu/~strong/m253.php.

  13. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    Directory of Open Access Journals (Sweden)

    Childs Kevin L

    2010-11-01

    Full Text Available Abstract Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence.

  14. Viral genome analysis and knowledge management.

    Science.gov (United States)

    Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette

    2013-01-01

    One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences that are stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that results in ever better new alignments. Annotation of epidemiological and clinical information is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based upon user requests. Vital for its success, the database staff are heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov http://hcv.lanl.gov http://hfv.lanl.gov.

  15. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    IP-seq and small RNA-seq, we delineated the landscape of the promoters with bidirectional transcriptions that yield steady-state RNA in only one directions (Paper III). A subsequent motif analysis enabled us to uncover specific DNA signals – early polyA sites – that make RNA on the reverse strand sensitive...... they regulated or if the sites had global elevated usage rates by multiple TFs. Using RNA-seq, 5’end-seq in combination with depletion of 5’exonuclease as well as nonsensemediated decay (NMD) factors, we systematically analyzed NMD substrates as well as their degradation intermediates in human cells (Paper V......). Gene enrichment analysis on the detected NMD substrates revealed an unappreciated NMD-based regulatory mechanism of the genes hosting multiple intronic snoRNAs, which can facilitate differential expression of individual snoRNAs from a single host gene locus. Finally, supported by RNA-seq and small RNA-seq...

  16. Parallel WGA and WTA for Comparative Genome and Transcriptome NGS Analysis Using Tiny Cell Numbers.

    Science.gov (United States)

    Korfhage, Christian; Fricke, Evelyn; Meier, Andreas

    2015-07-01

    Genomic DNA determines how and when the transcriptome is changed by a trigger or environmental change and how cellular metabolism is influenced. Comparative genome and transcriptome analysis of the same cell sample links a defined genome with all changes in the bases, structure, or numbers of the transcriptome. However, comparative genome and transcriptome analysis using next-generation sequencing (NGS) or real-time PCR is often limited by the small amount of sample available. In mammals, the amount of DNA and RNA in a single cell is ∼10 picograms, but deep analysis of the genome and transcriptome currently requires several hundred nanograms of nucleic acids for library preparation for NGS sequencing. Consequently, accurate whole-genome amplification (WGA) and whole-transcriptome amplification (WTA) is required for such quantitative analysis. This unit describes how the genome and the transcriptome of a tiny number of cells can be amplified in a highly parallel and comparable process. Protocols for quality control of amplified DNA and application of amplified DNA for NGS are included.

  17. Microarray comparative genomic hybridisation analysis incorporating genomic organisation, and application to enterobacterial plant pathogens.

    Directory of Open Access Journals (Sweden)

    Leighton Pritchard

    2009-08-01

    Full Text Available Microarray comparative genomic hybridisation (aCGH provides an estimate of the relative abundance of genomic DNA (gDNA taken from comparator and reference organisms by hybridisation to a microarray containing probes that represent sequences from the reference organism. The experimental method is used in a number of biological applications, including the detection of human chromosomal aberrations, and in comparative genomic analysis of bacterial strains, but optimisation of the analysis is desirable in each problem domain.We present a method for analysis of bacterial aCGH data that encodes spatial information from the reference genome in a hidden Markov model. This technique is the first such method to be validated in comparisons of sequenced bacteria that diverge at the strain and at the genus level: Pectobacterium atrosepticum SCRI1043 (Pba1043 and Dickeya dadantii 3937 (Dda3937; and Lactococcus lactis subsp. lactis IL1403 and L. lactis subsp. cremoris MG1363. In all cases our method is found to outperform common and widely used aCGH analysis methods that do not incorporate spatial information. This analysis is applied to comparisons between commercially important plant pathogenic soft-rotting enterobacteria (SRE Pba1043, P. atrosepticum SCRI1039, P. carotovorum 193, and Dda3937.Our analysis indicates that it should not be assumed that hybridisation strength is a reliable proxy for sequence identity in aCGH experiments, and robustly extends the applicability of aCGH to bacterial comparisons at the genus level. Our results in the SRE further provide evidence for a dynamic, plastic 'accessory' genome, revealing major genomic islands encoding gene products that provide insight into, and may play a direct role in determining, variation amongst the SRE in terms of their environmental survival, host range and aetiology, such as phytotoxin synthesis, multidrug resistance, and nitrogen fixation.

  18. Chloroplast genome analysis of Australian eucalypts--Eucalyptus, Corymbia, Angophora, Allosyncarpia and Stockwellia (Myrtaceae).

    Science.gov (United States)

    Bayly, Michael J; Rigault, Philippe; Spokevicius, Antanas; Ladiges, Pauline Y; Ades, Peter K; Anderson, Charlotte; Bossinger, Gerd; Merchant, Andrew; Udovicic, Frank; Woodrow, Ian E; Tibbits, Josquin

    2013-12-01

    We present a phylogenetic analysis and comparison of structural features of chloroplast genomes for 39 species of the eucalypt group (genera Eucalyptus, Corymbia, Angophora, and outgroups Allosyncarpia and Stockwellia). We use 41 complete chloroplast genome sequences, adding 39 finished-quality chloroplast genomes to two previously published genomes. Maximum parsimony and Bayesian analyses, based on >7000 variable nucleotide positions, produced one fully resolved phylogenetic tree (35 supported nodes, 27 with 100% bootstrap support). Eucalyptus and its sister lineage Angophora+Corymbia show a deep divergence. Within Eucalyptus, three lineages are resolved: the 'eudesmid', 'symphyomyrt' and 'monocalypt' groups. Corymbia is paraphyletic with respect to Angophora. Gene content and order do not vary among eucalypt chloroplasts; length mutations, especially frame shifts, are uncommon in protein-coding genes. Some non-synonymous mutations are highly incongruent with the overall phylogenetic signal, notably in rbcL, and may be adaptive. Application of custom informatics pipelines (GYDLE Inc.) enabled direct chloroplast genome assembly, resolving each genome to finished-quality with no need for PCR gap-filling or contig order resolution. Analysis of whole chloroplast genomes resolved major eucalypt clades and revealed variable regions of the genome that will be useful in lower-level genetic studies (including phylogeography and geneflow).

  19. QTL analysis of novel genomic regions associated with yield and yield related traits in new plant type based recombinant inbred lines of rice (Oryza sativa L.

    Directory of Open Access Journals (Sweden)

    Marathi Balram

    2012-08-01

    Full Text Available Abstract Background Rice is staple food for more than half of the world’s population including two billion Asians, who obtain 60-70% of their energy intake from rice and its derivatives. To meet the growing demand from human population, rice varieties with higher yield potential and greater yield stability need to be developed. The favourable alleles for yield and yield contributing traits are distributed among two subspecies i.e., indica and japonica of cultivated rice (Oryza sativa L.. Identification of novel favourable alleles in indica/japonica will pave way to marker-assisted mobilization of these alleles in to a genetic background to break genetic barriers to yield. Results A new plant type (NPT based mapping population of 310 recombinant inbred lines (RILs was used to map novel genomic regions and QTL hotspots influencing yield and eleven yield component traits. We identified major quantitative trait loci (QTLs for days to 50% flowering (R2 = 25%, LOD = 14.3, panicles per plant (R2 = 19%, LOD = 9.74, flag leaf length (R2 = 22%, LOD = 3.05, flag leaf width (R2 = 53%, LOD = 46.5, spikelets per panicle (R2 = 16%, LOD = 13.8, filled grains per panicle (R2 = 22%, LOD = 15.3, percent spikelet sterility (R2 = 18%, LOD = 14.24, thousand grain weight (R2 = 25%, LOD = 12.9 and spikelet setting density (R2 = 23%, LOD = 15 expressing over two or more locations by using composite interval mapping. The phenotypic variation (R2 ranged from 8 to 53% for eleven QTLs expressing across all three locations. 19 novel QTLs were contributed by the NPT parent, Pusa1266. 15 QTL hotpots on eight chromosomes were identified for the correlated traits. Six epistatic QTLs effecting five traits at two locations were identified. A marker interval (RM3276-RM5709 on chromosome 4 harboring major QTLs for four traits was identified. Conclusions The present study reveals that favourable alleles for

  20. WormBase ParaSite - a comprehensive resource for helminth genomics.

    Science.gov (United States)

    Howe, Kevin L; Bolt, Bruce J; Shafie, Myriam; Kersey, Paul; Berriman, Matthew

    2017-07-01

    The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  1. High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software [v1; ref status: indexed, http://f1000r.es/40b

    Directory of Open Access Journals (Sweden)

    Diego Fabregat-Traver

    2014-08-01

    Full Text Available To raise the power of genome-wide association studies (GWAS and avoid false-positive results in structured populations, one can rely on mixed model based tests. When large samples are used, and when multiple traits are to be studied in the ’omics’ context, this approach becomes computationally challenging. Here we consider the problem of mixed-model based GWAS for arbitrary number of traits, and demonstrate that for the analysis of single-trait and multiple-trait scenarios different computational algorithms are optimal. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http: //www.genabel.org/packages/OmicABEL.

  2. Genomic tools development for Aquilegia: construction of a BAC-based physical map

    Directory of Open Access Journals (Sweden)

    Hodges Scott A

    2010-11-01

    Full Text Available Abstract Background The genus Aquilegia, consisting of approximately 70 taxa, is a member of the basal eudicot lineage, Ranuculales, which is evolutionarily intermediate between monocots and core eudicots, and represents a relatively unstudied clade in the angiosperm phylogenetic tree that bridges the gap between these two major plant groups. Aquilegia species are closely related and their distribution covers highly diverse habitats. These provide rich resources to better understand the genetic basis of adaptation to different pollinators and habitats that in turn leads to rapid speciation. To gain insights into the genome structure and facilitate gene identification, comparative genomics and whole-genome shotgun sequencing assembly, BAC-based genomics resources are of crucial importance. Results BAC-based genomic resources, including two BAC libraries, a physical map with anchored markers and BAC end sequences, were established from A. formosa. The physical map was composed of a total of 50,155 BAC clones in 832 contigs and 3939 singletons, covering 21X genome equivalents. These contigs spanned a physical length of 689.8 Mb (~2.3X of the genome suggesting the complex heterozygosity of the genome. A set of 197 markers was developed from ESTs induced by drought-stress, or involved in anthocyanin biosynthesis or floral development, and was integrated into the physical map. Among these were 87 genetically mapped markers that anchored 54 contigs, spanning 76.4 Mb (25.5% across the genome. Analysis of a selection of 12,086 BAC end sequences (BESs from the minimal tiling path (MTP allowed a preview of the Aquilegia genome organization, including identification of transposable elements, simple sequence repeats and gene content. Common repetitive elements previously reported in both monocots and core eudicots were identified in Aquilegia suggesting the value of this genome in connecting the two major plant clades. Comparison with sequenced plant genomes

  3. Pan-cancer analysis of ROS1 genomic aberrations

    OpenAIRE

    Wang, Yidan; 王奕丹

    2015-01-01

    The ROS proto-oncogene 1 (ROS1) encodes the ROS1 receptor kinase. ROS1 rearrangements are known to be oncogenic in glioblastoma, non–small-cell lung carcinoma (NSCLC) and cholangiocarcinoma. The clinical relevance of ROS1 genomic aberrations in other human cancers is largely unexamined. Here, we performed a pan-cancer analysis of ROS1 genomic aberrations across 20 cancer sites by interrogating the whole-exome sequencing data of the Cancer Genome Atlas (TCGA) via the cBioportal (www.cbioportal...

  4. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    Science.gov (United States)

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  5. Single-cell analysis in cancer genomics

    Science.gov (United States)

    Saadatpour, Assieh; Lai, Shujing; Guo, Guoji; Yuan, Guo-Cheng

    2017-01-01

    Genetic changes and environmental differences result in cellular heterogeneity among cancer cells within the same tumor, thereby complicating treatment outcomes. Recent advances in single-cell technologies have opened new avenues to characterize the intra-tumor cellular heterogeneity, identify rare cell types, measure mutation rates, and, ultimately, guide diagnosis and treatment. In this paper, we review the recent single-cell technological and computational advances at the genomic, transcriptomic, and proteomic levels, and discuss their applications in cancer research. PMID:26450340

  6. Ascaris phylogeny based on multiple whole mtDNA genomes

    DEFF Research Database (Denmark)

    Nejsum, Peter; Hawash, Mohamed B F; Betson, Martha

    2016-01-01

    Ascaris lumbricoides and A. suum are two parasitic nematodes infecting humans and pigs, respectively. There has been considerable debate as to whether Ascaris in the two hosts should be considered a single or two separate species. Previous studies identified at least three major clusters (A, B...... and C) of human and pig Ascaris based on partial cox1 sequences. In the present study, we selected major haplotypes from these different clusters to characterize their whole mitochondrial genomes for phylogenetic analysis. We also undertook coalescent simulations to investigate the evolutionary history...... events: the first one occurring early in the Neolithic period which resulted in a differentiated population of Ascaris in pigs (cluster C), the second occurring more recently (~ 900 generations ago), resulting in clusters A and B which might have been spread worldwide by human activities....

  7. Comparative Genomics via Wavelet Analysis for Closely Related Bacteria

    Directory of Open Access Journals (Sweden)

    Jiuzhou Song

    2004-01-01

    Full Text Available Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The efficiency of the traditional methods is extremely influenced by the software method used. To overcome the problem here, we propose using wavelet analysis to perform comparative genomics. First, global comparison using wavelet analysis gives the difference at a quantitative level. Then local comparison using keto-excess or purine-excess plots shows precise positions of inversions, translocations, and horizontally transferred DNA fragments. We firstly found that the level of energy spectra difference is related to the similarity of bacteria strains; it could be a quantitative index to describe the similarities of genomes. The strategy is described in detail by comparisons of closely related strains: S.typhi CT18, S.typhi Ty2, S.typhimurium LT2, H.pylori 26695, and H.pylori J99.

  8. Comparative Genomics via Wavelet Analysis for Closely Related Bacteria

    Science.gov (United States)

    Song, Jiuzhou; Ware, Tony; Liu, Shu-Lin; Surette, M.

    2004-12-01

    Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The efficiency of the traditional methods is extremely influenced by the software method used. To overcome the problem here, we propose using wavelet analysis to perform comparative genomics. First, global comparison using wavelet analysis gives the difference at a quantitative level. Then local comparison using keto-excess or purine-excess plots shows precise positions of inversions, translocations, and horizontally transferred DNA fragments. We firstly found that the level of energy spectra difference is related to the similarity of bacteria strains; it could be a quantitative index to describe the similarities of genomes. The strategy is described in detail by comparisons of closely related strains: S.typhi CT18, S.typhi Ty2, S.typhimurium LT2, H.pylori 26695, and H.pylori J99.

  9. Genomic analysis of epithelial ovarian cancer

    Institute of Scientific and Technical Information of China (English)

    John Farley; Laurent L Ozbun; Michael J Birrer

    2008-01-01

    Ovarian cancer is a major health problem for women in the United States.Despite evidence of considerable heterogeneity,most cases of ovarian cancer are treated in a similar fashion.The molecular basis for the clinicopathologic characteristics of these tumors remains poorly defined.Whole genome expression profiling is a genomic tool,which can identify dysregulated genes and uncover unique sub-classes of tumors.The application of this technology to ovarian cancer has provided a solid molecular basis for differences in histology and grade of ovarian tumors.Differentially expressed genes identified pathways implicated in cell proliferation,invasion,motility,chromosomal instability,and gene silencing and provided new insights into the origin and potential treatment of these cancers.The added knowledge provided by global gene expression profiling should allow for a more rational treatment of ovarian cancers.These techniques are leading to a paradigm shift from empirical treatment to an individually tailored approach.This review summarizes the new genomic data on epithelial ovarian cancers of different histology and grade and the impact it will have on our understanding and treatment of this disease.

  10. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    Directory of Open Access Journals (Sweden)

    Gurusamy Raman

    Full Text Available Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC region (82,805 bp, with some variations in the inverted repeat region A (IRA/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19 was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA and ribosomal protein subunit L23 (rpl23 genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.

  11. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    Science.gov (United States)

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.

  12. Complete genome sequence of Enterococcus faecium strain TX16 and comparative genomic analysis of Enterococcus faecium genomes

    Directory of Open Access Journals (Sweden)

    Qin Xiang

    2012-07-01

    Full Text Available Abstract Background Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references. Results In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST 18, and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA strains (including STs 16, 17, 18, and 78, in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade and are evolutionally considerably more closely related to each other by phylogenetic and gene content similarity analyses than to isolates in the community-associated (CA clade with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. 380 ORFs in TX16 are HA-clade specific and antibiotic resistance genes are enriched in HA-clade strains. Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains

  13. BambooGDB: a bamboo genome database with functional annotation and an analysis platform.

    Science.gov (United States)

    Zhao, Hansheng; Peng, Zhenhua; Fei, Benhua; Li, Lubin; Hu, Tao; Gao, Zhimin; Jiang, Zehui

    2014-01-01

    Bamboo, as one of the most important non-timber forest products and fastest-growing plants in the world, represents the only major lineage of grasses that is native to forests. Recent success on the first high-quality draft genome sequence of moso bamboo (Phyllostachys edulis) provides new insights on bamboo genetics and evolution. To further extend our understanding on bamboo genome and facilitate future studies on the basis of previous achievements, here we have developed BambooGDB, a bamboo genome database with functional annotation and analysis platform. The de novo sequencing data, together with the full-length complementary DNA and RNA-seq data of moso bamboo composed the main contents of this database. Based on these sequence data, a comprehensively functional annotation for bamboo genome was made. Besides, an analytical platform composed of comparative genomic analysis, protein-protein interactions network, pathway analysis and visualization of genomic data was also constructed. As discovery tools to understand and identify biological mechanisms of bamboo, the platform can be used as a systematic framework for helping and designing experiments for further validation. Moreover, diverse and powerful search tools and a convenient browser were incorporated to facilitate the navigation of these data. As far as we know, this is the first genome database for bamboo. Through integrating high-throughput sequencing data, a full functional annotation and several analysis modules, BambooGDB aims to provide worldwide researchers with a central genomic resource and an extensible analysis platform for bamboo genome. BambooGDB is freely available at http://www.bamboogdb.org/. Database URL: http://www.bamboogdb.org.

  14. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    Science.gov (United States)

    Gil, Rosario; Silva, Francisco J.; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C. H. J.; Gross, Roy; Moya, Andrés

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  15. Analysis of anoxybacillus genomes from the aspects of lifestyle adaptations, prophage diversity, and carbohydrate metabolism.

    Directory of Open Access Journals (Sweden)

    Kian Mau Goh

    Full Text Available Species of Anoxybacillus are widespread in geothermal springs, manure, and milk-processing plants. The genus is composed of 22 species and two subspecies, but the relationship between its lifestyle and genome is little understood. In this study, two high-quality draft genomes were generated from Anoxybacillus spp. SK3-4 and DT3-1, isolated from Malaysian hot springs. De novo assembly and annotation were performed, followed by comparative genome analysis with the complete genome of Anoxybacillus flavithermus WK1 and two additional draft genomes, of A. flavithermus TNO-09.006 and A. kamchatkensis G10. The genomes of Anoxybacillus spp. are among the smaller of the family Bacillaceae. Despite having smaller genomes, their essential genes related to lifestyle adaptations at elevated temperature, extreme pH, and protection against ultraviolet are complete. Due to the presence of various competence proteins, Anoxybacillus spp. SK3-4 and DT3-1 are able to take up foreign DNA fragments, and some of these transferred genes are important for the survival of the cells. The analysis of intact putative prophage genomes shows that they are highly diversified. Based on the genome analysis using SEED, many of the annotated sequences are involved in carbohydrate metabolism. The presence of glycosyl hydrolases among the Anoxybacillus spp. was compared, and the potential applications of these unexplored enzymes are suggested here. This is the first study that compares Anoxybacillus genomes from the aspect of lifestyle adaptations, the capacity for horizontal gene transfer, and carbohydrate metabolism.

  16. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Directory of Open Access Journals (Sweden)

    Nicholas R Thomson

    2006-12-01

    Full Text Available The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype 0:8; biotype 1B and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing, the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common

  17. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis.

    Science.gov (United States)

    Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W

    2015-10-30

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  18. A novel statistic for genome-wide interaction analysis.

    Directory of Open Access Journals (Sweden)

    Xuesen Wu

    2010-09-01

    Full Text Available Although great progress in genome-wide association studies (GWAS has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of finding heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet challenges raised by genome-wide interactional analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked. The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001genome-wide interaction analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search significant interaction between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

  19. Analysis of the Core Genome and Pan-Genome of Autotrophic Acetogenic Bacteria

    Science.gov (United States)

    Shin, Jongoh; Song, Yoseb; Jeong, Yujin; Cho, Byung-Kwan

    2016-01-01

    Acetogens are obligate anaerobic bacteria capable of reducing carbon dioxide (CO2) to multicarbon compounds coupled to the oxidation of inorganic substrates, such as hydrogen (H2) or carbon monoxide (CO), via the Wood-Ljungdahl pathway. Owing to the metabolic capability of CO2 fixation, much attention has been focused on understanding the unique pathways associated with acetogens, particularly their metabolic coupling of CO2 fixation to energy conservation. Most known acetogens are phylogenetically and metabolically diverse bacteria present in 23 different bacterial genera. With the increased volume of available genome information, acetogenic bacterial genomes can be analyzed by comparative genome analysis. Even with the genetic diversity that exists among acetogens, the Wood-Ljungdahl pathway, a central metabolic pathway, and cofactor biosynthetic pathways are highly conserved for autotrophic growth. Additionally, comparative genome analysis revealed that most genes in the acetogen-specific core genome were associated with the Wood-Ljungdahl pathway. The conserved enzymes and those predicted as missing can provide insight into biological differences between acetogens and allow for the discovery of promising candidates for industrial applications. PMID:27733845

  20. Analysis of the core genome and pan-genome of autotrophic acetogenic bacteria

    Directory of Open Access Journals (Sweden)

    JongOh Shin

    2016-09-01

    Full Text Available Acetogens are obligate anaerobic bacteria capable of reducing carbon dioxide (CO2 to multicarbon compounds coupled to the oxidation of inorganic substrates, such as hydrogen (H2 or carbon monoxide (CO, via the Wood-Ljungdahl pathway. Owing to the metabolic capability of CO2 fixation, much attention has been focused on understanding the unique pathways associated with acetogens, particularly their metabolic coupling of CO2 fixation to energy conservation. Most known acetogens are phylogenetically and metabolically diverse bacteria present in 23 different bacterial genera. With the increased volume of available genome information, acetogenic bacterial genomes can be analyzed by comparative genome analysis. Even with the genetic diversity that exists among acetogens, the Wood-Ljungdahl pathway, a central metabolic pathway, and cofactor biosynthetic pathways are highly conserved for autotrophic growth. Additionally, comparative genome analysis revealed that most genes in the acetogen-specific core genome were associated with the Wood-Ljungdahl pathway. The conserved enzymes and those predicted as missing can provide insight into biological differences between acetogens and allow for the discovery of promising candidates for industrial applications.

  1. Preliminary analysis of the mitochondrial genome evolutionary pattern in primates

    Institute of Scientific and Technical Information of China (English)

    Liang ZHAO; Xingtao ZHANG; Xingkui TAO; Weiwei WANG; Ming LI

    2012-01-01

    Since the birth of molecular evolutionary analysis,primates have been a central focus of study and mitochondrial DNA is well suited to these endeavors because of its unique features.Surprisingly,to date no comprehensive evaluation of the nucleotide substitution patterns has been conducted on the mitochondrial genome of primates.Here,we analyzed the evolutionary patterns and evaluated selection and recombination in the mitochondrial genomes of 44 Primates species downloaded from GenBank.The results revealed that a strong rate heterogeneity occurred among sites and genes in all comparisons.Likewise,an obvious decline in primate nucleotide diversity was noted in the subunit rRNAs and tRNAs as compared to the protein-coding genes.Within 13 protein-coding genes,the pattern of nonsynonymous divergence was similar to that of overall nucleotide divergence,while synonymous changes differed only for individual genes,indicating that the rate heterogeneity may result from the rate of change at nonsynonymous sites.Codon usage analysis revealed that there was intermediate codon usage bias in primate protein-coding genes,and supported the idea that GC mutation pressure might determine codon usage and that positive selection is not the driving force for the codon usage bias.Neutrality tests using site-specific positive selection from a Bayesian framework indicated no sites were under positive selection for any gene,consistent with near neutrality.Recombination tests based on the pairwise homoplasy test statistic supported complete linkage even for much older divergent primate species.Thus,with the exception of rate heterogeneity among mitochondrial genes,evaluating the validity assumed complete linkage and selective neutrality in primates prior to phylogenetic or phylogeographic analysis seems unnecessary.

  2. Preliminary analysis of the mitochondrial genome evolutionary pattern in primates.

    Science.gov (United States)

    Zhao, Liang; Zhang, Xingtao; Tao, Xingkui; Wang, Weiwei; Li, Ming

    2012-08-01

    Since the birth of molecular evolutionary analysis, primates have been a central focus of study and mitochondrial DNA is well suited to these endeavors because of its unique features. Surprisingly, to date no comprehensive evaluation of the nucleotide substitution patterns has been conducted on the mitochondrial genome of primates. Here, we analyzed the evolutionary patterns and evaluated selection and recombination in the mitochondrial genomes of 44 Primates species downloaded from GenBank. The results revealed that a strong rate heterogeneity occurred among sites and genes in all comparisons. Likewise, an obvious decline in primate nucleotide diversity was noted in the subunit rRNAs and tRNAs as compared to the protein-coding genes. Within 13 protein-coding genes, the pattern of nonsynonymous divergence was similar to that of overall nucleotide divergence, while synonymous changes differed only for individual genes, indicating that the rate heterogeneity may result from the rate of change at nonsynonymous sites. Codon usage analysis revealed that there was intermediate codon usage bias in primate protein-coding genes, and supported the idea that GC mutation pressure might determine codon usage and that positive selection is not the driving force for the codon usage bias. Neutrality tests using site-specific positive selection from a Bayesian framework indicated no sites were under positive selection for any gene, consistent with near neutrality. Recombination tests based on the pairwise homoplasy test statistic supported complete linkage even for much older divergent primate species. Thus, with the exception of rate heterogeneity among mitochondrial genes, evaluating the validity assumed complete linkage and selective neutrality in primates prior to phylogenetic or phylogeographic analysis seems unnecessary.

  3. Enhancing genomic laboratory reports: A qualitative analysis of provider review

    Science.gov (United States)

    Rahm, Alanna Kulchak; Stuckey, Heather; Green, Jamie; Feldman, Lynn; Zallen, Doris T.; Bonhag, Michele; Segal, Michael M.; Fan, Audrey L.; Williams, Marc S.

    2016-01-01

    This study reports on the responses of physicians who reviewed provider and patient versions of a genomic laboratory report designed to communicate results of whole genome sequencing. Semi‐structured interviews addressed concept communication, elements, and format of example genome reports. Analysis of the coded transcripts resulted in recognition of three constructs around communication of genome sequencing results: (1) Providers agreed that whole genomic sequencing results are complex and they welcomed a report that provided supportive interpretation information to accompany sequencing results; (2) Providers strongly endorsed a report that included active clinical guidance, such as reference to practice guidelines, if available; and (3) Providers valued the genomic report as a resource that would serve as the basis to facilitate communication of genome sequencing results with their patients and families. Providers valued both versions of the report, though they affirmed the need for a provider‐oriented report. Critical elements of the report included clear language to explain the result, as well as consolidated yet comprehensive prognostic information with clear guidance over time for the clinical care of the patient. Most importantly, it appears a report with this design has the potential not only to return results but also serves as a communication tool to help providers and patients discuss and coordinate care over time. © 2016 The Authors. American Journal of Medical Genetics Part A published by Wiley Periodicals, Inc. PMID:26842872

  4. Comparative optical genome analysis of two pangolin species: Manis pentadactyla and Manis javanica.

    Science.gov (United States)

    Zhihai, Huang; Jiang, Xu; Shuiming, Xiao; Baosheng, Liao; Yuan, Gao; Chaochao, Zhai; Xiaohui, Qiu; Wen, Xu; Shilin, Chen

    2016-12-01

    The pangolin is a Pholidota mammal with large keratin scales protecting its skin. Two pangolin species ( Manis pentadactyla and Manis javanica ) have been recorded as critically endangered on the International Union for Conservation of Nature Red List of Threatened Species. Optical mapping constructs high-resolution restriction maps from single DNA molecules for genome analysis at the megabase scale and to assist genome assembly. Here, we constructed restriction maps of M. pentadactyla and M. javanica using optical mapping to assist with genome assembly and analysis of these species. Genomic DNA was nicked with Nt.BspQI and then labeled using fluorescently labeled bases that were detected by the Irys optical mapping system. In total, 3,313,734 DNA molecules (517.847 Gb) for M. pentadactyla and 3,439,885 DNA molecules (504.743 Gb) for M. javanica were obtained, which corresponded to approximately 178X and 177X genome coverage, respectively. Qualified molecules (≥150 kb with a label density of >6 sites per 100 kb) were analyzed using the de novo assembly program embedded in the IrysView pipeline. We obtained two maps that were 2.91 Gb and 2.85 Gb in size with N50s of 1.88 Mb and 1.97 Mb, respectively. Optical mapping reveals large-scale structural information that is especially important for non-model genomes that lack a good reference. The approach has the potential to guide de novo assembly of genomes sequenced using next-generation sequencing. Our data provide a resource for Manidae genome analysis and references for de novo assembly. This note also provides new insights into Manidae evolutionary analysis at the genome structure level.

  5. Intimate evolution of proteins. Proteome atomic content correlates with genome base composition.

    Science.gov (United States)

    Baudouin-Cornu, Peggy; Schuerer, Katja; Marlière, Philippe; Thomas, Dominique

    2004-02-13

    Discerning the significant relations that exist within and among genome sequences is a major step toward the modeling of biopolymer evolution. Here we report the systematic analysis of the atomic composition of proteins encoded by organisms representative of each kingdoms. Protein atomic contents are shown to vary largely among species, the larger variations being observed for the main architectural component of proteins, the carbon atom. These variations apply to the bulk proteins as well as to subsets of ortholog proteins. A pronounced correlation between proteome carbon content and genome base composition is further evidenced, with high G+C genome content being related to low protein carbon content. The generation of random proteomes and the examination of the canonical genetic code provide arguments for the hypothesis that natural selection might have driven genome base composition.

  6. Comparative genomic analysis of two-component regulatory proteins in Pseudomonas syringae

    DEFF Research Database (Denmark)

    Lavin, J.L.; Kiil, Kristoffer; Resano, O.

    2007-01-01

    important differences in TCS proteins among the three P. syringae pathovars. Conclusion: In this article we present a thorough analysis of the identification and distribution of TCS proteins among the sequenced genomes of P. syringae. We have identified differences in TCS proteins among the three P...... requires a complex array of TCS proteins to cope with diverse plant hosts, host responses, and environmental conditions. Results: Based on the genomic data, pattern searches with Hidden Markov Model (HMM) profiles have been used to identify putative HKs and RRs. The genomes of Psy B728a, Pto DC3000 and Pph...... 1448A were found to contain a large number of genes encoding TCS proteins, and a core of complete TCS proteins were shared between these genomes: 30 putative TCS clusters, 11 orphan HKs, 33 orphan RRs, and 16 hybrid HKs. A close analysis of the distribution of genes encoding TCS proteins revealed...

  7. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    Science.gov (United States)

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities.

  8. Array-based comparative genomic hybridization for genome-wide screening of DNA copy number in bladder tumors.

    NARCIS (Netherlands)

    Veltman, J.A.; Fridlyand, J.; Pejavar, S.; Olshen, A.B.; Korkola, J.E.; Vries, S. de; Carroll, P.; Kuo, W.L.; Pinkel, D.; Albertson, D.; Cordon-Cardo, C.; Jain, A.N.; Waldman, F.M.

    2003-01-01

    Genome-wide copy number profiles were characterized in 41 primary bladder tumors using array-based comparative genomic hybridization (array CGH). In addition to previously identified alterations in large chromosomal regions, alterations were identified in many small genomic regions, some with high-l

  9. Genomic comparisons of Brucella spp. and closely related bacteria using base compositional and proteome based methods

    DEFF Research Database (Denmark)

    Bohlin, Jon; Snipen, Lars; Cloeckaert, Axel

    2010-01-01

    , genomic codon and amino acid frequencies based comparisons) and proteomes (all-against-all BLAST protein comparisons and pan-genomic analyses). RESULTS: We found that the oligonucleotide based methods gave different results compared to that of the proteome based methods. Differences were also found...... than proteome comparisons between species in genus Brucella and genus Ochrobactrum. Pan-genomic analyses indicated that uptake of DNA from outside genus Brucella appears to be limited. CONCLUSIONS: While both the proteome based methods and the Markov chain based genomic signatures were able to reflect...

  10. A radiation hybrid map of the European sea bass (Dicentrarchus labrax) based on 1581 markers: Synteny analysis with model fish genomes.

    Science.gov (United States)

    Guyon, Richard; Senger, Fabrice; Rakotomanga, Michaelle; Sadequi, Naoual; Volckaert, Filip A M; Hitte, Christophe; Galibert, Francis

    2010-10-01

    The selective breeding of fish for aquaculture purposes requires the understanding of the genetic basis of traits such as growth, behaviour, resistance to pathogens and sex determinism. Access to well-developed genomic resources is a prerequisite to improve the knowledge of these traits. Having this aim in mind, a radiation hybrid (RH) panel of European sea bass (Dicentrarchus labrax) was constructed from splenocytes irradiated at 3000 rad, allowing the construction of a 1581 marker RH map. A total of 1440 gene markers providing ~4400 anchors with the genomes of three-spined stickleback, medaka, pufferfish and zebrafish, helped establish synteny relationships with these model species. The identification of Conserved Segments Ordered (CSO) between sea bass and model species allows the anticipation of the position of any sea bass gene from its location in model genomes. Synteny relationships between sea bass and gilthead seabream were addressed by mapping 37 orthologous markers. The sea bass genetic linkage map was integrated in the RH map through the mapping of 141 microsatellites. We are thus able to present the first complete gene map of sea bass. It will facilitate linkage studies and the identification of candidate genes and Quantitative Trait Loci (QTL). The RH map further positions sea bass as a genetic and evolutionary model of Perciformes and supports their ongoing aquaculture expansion.

  11. Genome-based comparative analyses of Antarctic and temperate species of Paenibacillus.

    Directory of Open Access Journals (Sweden)

    Melissa Dsouza

    Full Text Available Antarctic soils represent a unique environment characterised by extremes of temperature, salinity, elevated UV radiation, low nutrient and low water content. Despite the harshness of this environment, members of 15 bacterial phyla have been identified in soils of the Ross Sea Region (RSR. However, the survival mechanisms and ecological roles of these phyla are largely unknown. The aim of this study was to investigate whether strains of Paenibacillus darwinianus owe their resilience to substantial genomic changes. For this, genome-based comparative analyses were performed on three P. darwinianus strains, isolated from gamma-irradiated RSR soils, together with nine temperate, soil-dwelling Paenibacillus spp. The genome of each strain was sequenced to over 1,000-fold coverage, then assembled into contigs totalling approximately 3 Mbp per genome. Based on the occurrence of essential, single-copy genes, genome completeness was estimated at approximately 88%. Genome analysis revealed between 3,043-3,091 protein-coding sequences (CDSs, primarily associated with two-component systems, sigma factors, transporters, sporulation and genes induced by cold-shock, oxidative and osmotic stresses. These comparative analyses provide an insight into the metabolic potential of P. darwinianus, revealing potential adaptive mechanisms for survival in Antarctic soils. However, a large proportion of these mechanisms were also identified in temperate Paenibacillus spp., suggesting that these mechanisms are beneficial for growth and survival in a range of soil environments. These analyses have also revealed that the P. darwinianus genomes contain significantly fewer CDSs and have a lower paralogous content. Notwithstanding the incompleteness of the assemblies, the large differences in genome sizes, determined by the number of genes in paralogous clusters and the CDS content, are indicative of genome content scaling. Finally, these sequences are a resource for further

  12. BioViews: Java-based tools for genomic data visualization.

    Science.gov (United States)

    Helt, G A; Lewis, S; Loraine, A E; Rubin, G M

    1998-03-01

    Visualization tools for bioinformatics ideally should provide universal access to the most current data in an interactive and intuitive graphical user interface. Since the introduction of Java, a language designed for distributed programming over the Web, the technology now exists to build a genomic data visualization tool that meets these requirements. Using Java we have developed a prototype genome browser applet (BioViews) that incorporates a three-level graphical view of genomic data: a physical map, an annotated sequence map, and a DNA sequence display. Annotated biological features are displayed on the physical and sequence-based maps, and the different views are interconnected. The applet is linked to several databases and can retrieve features and display hyperlinked textual data on selected features. In addition to browsing genomic data, different types of analyses can be performed interactively and the results of these analyses visualized alongside prior annotations. Our genome browser is built on top of extensible, reusable graphic components specifically designed for bioinformatics. Other groups can (and do) reuse this work in various ways. Genome centers can reuse large parts of the genome browser with minor modifications, bioinformatics groups working on sequence analysis can reuse components to build front ends for analysis programs, and biology laboratories can reuse components to publish results as dynamic Web documents.

  13. Cytogenetic analysis from DNA by comparative genomic hybridization.

    Science.gov (United States)

    Tachdjian, G; Aboura, A; Lapierre, J M; Viguié, F

    2000-01-01

    Comparative genomic hybridization (CGH) is a modified in situ hybridization technique which allows detection and mapping of DNA sequence copy differences between two genomes in a single experiment. In CGH analysis, two differentially labelled genomic DNA (study and reference) are co-hybridized to normal metaphase spreads. Chromosomal locations of copy number changes in the DNA segments of the study genome are revealed by a variable fluorescence intensity ratio along each target chromosome. Since its development, CGH has been applied mostly as a research tool in the field of cancer cytogenetics to identify genetic changes in many previously unknown regions. CGH may also have a role in clinical cytogenetics for detection and identification of unbalanced chromosomal abnormalities.

  14. MIRAGE: a functional genomics-based approach for metabolic network model reconstruction and its application to cyanobacteria networks.

    Science.gov (United States)

    Vitkin, Edward; Shlomi, Tomer

    2012-11-29

    Genome-scale metabolic network reconstructions are considered a key step in quantifying the genotype-phenotype relationship. We present a novel gap-filling approach, MetabolIc Reconstruction via functionAl GEnomics (MIRAGE), which identifies missing network reactions by integrating metabolic flux analysis and functional genomics data. MIRAGE's performance is demonstrated on the reconstruction of metabolic network models of E. coli and Synechocystis sp. and validated via existing networks for these species. Then, it is applied to reconstruct genome-scale metabolic network models for 36 sequenced cyanobacteria amenable for constraint-based modeling analysis and specifically for metabolic engineering. The reconstructed network models are supplied via standard SBML files.

  15. Differential DNA Methylation Analysis without a Reference Genome

    Directory of Open Access Journals (Sweden)

    Johanna Klughammer

    2015-12-01

    Full Text Available Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS, which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish. Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org. The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.

  16. Differential DNA Methylation Analysis without a Reference Genome.

    Science.gov (United States)

    Klughammer, Johanna; Datlinger, Paul; Printz, Dieter; Sheffield, Nathan C; Farlik, Matthias; Hadler, Johanna; Fritsch, Gerhard; Bock, Christoph

    2015-12-22

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.

  17. Whole genome sequencing as a tool for phylogenetic analysis of clinical strains of Mitis group streptococci.

    Science.gov (United States)

    Rasmussen, L H; Dargis, R; Højholt, K; Christensen, J J; Skovgaard, O; Justesen, U S; Rosenvinge, F S; Moser, C; Lukjancenko, O; Rasmussen, S; Nielsen, X C

    2016-10-01

    Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients with infective endocarditis were whole genome sequenced. We compared the phylogenetic analyses based on single genes (recA, sodA, gdh), multigene (MLSA), SNPs, and core-genome sequences. The six phylogenetic analyses generally showed a similar pattern of six monophyletic clusters, though a few differences were observed in single gene analyses. Species identification based on single gene analysis showed their limitations when more strains were included. In contrast, analyses incorporating more sequence data, like MLSA, SNPs and core-genome analyses, provided more distinct clustering. The core-genome tree showed the most distinct clustering.

  18. Bacterial Recombineering: Genome Engineering via Phage-Based Homologous Recombination.

    Science.gov (United States)

    Pines, Gur; Freed, Emily F; Winkler, James D; Gill, Ryan T

    2015-11-20

    The ability to specifically modify bacterial genomes in a precise and efficient manner is highly desired in various fields, ranging from molecular genetics to metabolic engineering and synthetic biology. Much has changed from the initial realization that phage-derived genes may be employed for such tasks to today, where recombineering enables complex genetic edits within a genome or a population. Here, we review the major developments leading to recombineering becoming the method of choice for in situ bacterial genome editing while highlighting the various applications of recombineering in pushing the boundaries of synthetic biology. We also present the current understanding of the mechanism of recombineering. Finally, we discuss in detail issues surrounding recombineering efficiency and future directions for recombineering-based genome editing.

  19. What’s in the genome of a filamentous fungus? Analysis of the Neurospora genome sequence

    Science.gov (United States)

    Mannhaupt, Gertrud; Montrone, Corinna; Haase, Dirk; Mewes, H. Werner; Aign, Verena; Hoheisel, Jörg D.; Fartmann, Berthold; Nyakatura, Gerald; Kempken, Frank; Maier, Josef; Schulte, Ulrich

    2003-01-01

    The German Neurospora Genome Project has assembled sequences from ordered cosmid and BAC clones of linkage groups II and V of the genome of Neurospora crassa in 13 and 12 contigs, respectively. Including additional sequences located on other linkage groups a total of 12 Mb were subjected to a manual gene extraction and annotation process. The genome comprises a small number of repetitive elements, a low degree of segmental duplications and very few paralogous genes. The analysis of the 3218 identified open reading frames provides a first overview of the protein equipment of a filamentous fungus. Significantly, N.crassa possesses a large variety of metabolic enzymes including a substantial number of enzymes involved in the degradation of complex substrates as well as secondary metabolism. While several of these enzymes are specific for filamentous fungi many are shared exclusively with prokaryotes. PMID:12655011

  20. Savant Genome Browser 2: visualization and analysis for population-scale genomics.

    Science.gov (United States)

    Fiume, Marc; Smith, Eric J M; Brook, Andrew; Strbenac, Dario; Turner, Brian; Mezlini, Aziz M; Robinson, Mark D; Wodak, Shoshana J; Brudno, Michael

    2012-07-01

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com.

  1. Analysis of the relationship between genomic GC Content and patterns of base usage, codon usage and amino acid usage in prokaryotes: similar GC content adopts similar compositional frequencies regardless of the phylogenetic lineages.

    Directory of Open Access Journals (Sweden)

    Hui-Qi Zhou

    Full Text Available The GC contents of 2670 prokaryotic genomes that belong to diverse phylogenetic lineages were analyzed in this paper. These genomes had GC contents that ranged from 13.5% to 74.9%. We analyzed the distance of base frequencies at the three codon positions, codon frequencies, and amino acid compositions across genomes with respect to the differences in the GC content of these prokaryotic species. We found that although the phylogenetic lineages were remote among some species, a similar genomic GC content forced them to adopt similar base usage patterns at the three codon positions, codon usage patterns, and amino acid usage patterns. Our work demonstrates that in prokaryotic genomes: a base usage, codon usage, and amino acid usage change with GC content with a linear correlation; b the distance of each usage has a linear correlation with the GC content difference; and c GC content is more essential than phylogenetic lineage in determining base usage, codon usage, and amino acid usage. This work is exceptional in that we adopted intuitively graphic methods for all analyses, and we used these analyses to examine as many as 2670 prokaryotes. We hope that this work is helpful for understanding common features in the organization of microbial genomes.

  2. Analysis of common k-mers for whole genome sequences using SSB-tree.

    Science.gov (United States)

    Choi, Jeong-Hyeon; Cho, Hwan-Gue

    2002-01-01

    As sequenced genomes become larger and sequencing process becomes faster, there is a need to develop a tool to analyze sequences in the whole genomic scale. However, on-memory algorithms such as suffix tree and suffix array are not applicable to the analysis of whole genome sequence set, since the size of individual whole genome ranges from several million base pairs to hundreds billion base pairs. In order to effectively manipulate the huge sequence data, it is necessary to use the indexed data structure for external memory. In this paper, we introduce a workbench called SequeX for the analysis and visualization of whole genome sequences using SSB-tree (Static SB-tree). It consists of two parts: the analysis query subsystem and the visualization subsystem. The query subsystem supports various transactions such as pattern matching, k-occurrence, and k-mer analysis. The visualization subsystem helps biologists to easily understand whole genome structure and feature by sequence viewer, annotation viewer, CGR (Chaos Game Representation) viewer, and k-mer viewer. The system also supports a user-friendly programming interface based on Java script for batch processing and the extension for a specific purpose of a user. SequeX can be used to identify conserved genes or sequences by the analysis of the common k-mers and annotation. We analyze the common k-mer for 72 microbial genomes announced by Entrez, and find an interesting biological fact that the longest common k-mer for 72 sequences is 11-mer, and only 11 such sequences exist. Finally we note that many common k-mers occur in conserved region such as CDS, rRNA, and tRNA.

  3. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate

    Directory of Open Access Journals (Sweden)

    Andersson Jan O

    2010-10-01

    Full Text Available Abstract Background Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity between the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates, both in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation.

  4. Yeast as a touchstone in post-genomic research: strategies for integrative analysis in functional genomics.

    Science.gov (United States)

    Castrillo, Juan I; Oliver, Stephen G

    2004-01-31

    The new complexity arising from the genome sequencing projects requires new comprehensive post-genomic strategies: advanced studies in regulatory mechanisms, application of new high-throughput technologies at a genome-wide scale, at the different levels of cellular complexity (genome, transcriptome, proteome and metabolome), efficient analysis of the results, and application of new bioinformatic methods in an integrative or systems biology perspective. This can be accomplished in studies with model organisms under controlled conditions. In this review a perspective of the favourable characteristics of yeast as a touchstone model in post-genomic research is presented. The state-of-the art, latest advances in the field and bottlenecks, new strategies, new regulatory mechanisms, applications (patents) and high-throughput technologies, most of them being developed and validated in yeast, are presented. The optimal characteristics of yeast as a well-defined system for comprehensive studies under controlled conditions makes it a perfect model to be used in integrative, "systems biology" studies to get new insights into the mechanisms of regulation (regulatory networks) responsible of specific phenotypes under particular environmental conditions, to be applied to more complex organisms (e.g. plants, human).

  5. Sequencing and Analysis of a Genomic Fragment Provide an Insight into the Dunaliella viridis Genomic Sequence

    Institute of Scientific and Technical Information of China (English)

    Xiao-Ming SUN; Yuan-Ping TANG; Xiang-Zong MENG; Wen-Wen ZHANG; Shan LI; Zhi-Rui DENG; Zheng-Kai XU; Ren-Tao SONG

    2006-01-01

    Dunaliella is a genus of wall-less unicellular eukaryotic green alga. Its exceptional resistances to salt and various other stresses have made it an ideal model for stress tolerance study. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of close related green algae Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with strong bias towards (AC)n type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features.

  6. 13C metabolic flux analysis at a genome-scale.

    Science.gov (United States)

    Gopalakrishnan, Saratram; Maranas, Costas D

    2015-11-01

    Metabolic models used in 13C metabolic flux analysis generally include a limited number of reactions primarily from central metabolism. They typically omit degradation pathways, complete cofactor balances, and atom transition contributions for reactions outside central metabolism. This study addresses the impact on prediction fidelity of scaling-up mapping models to a genome-scale. The core mapping model employed in this study accounts for (75 reactions and 65 metabolites) primarily from central metabolism. The genome-scale metabolic mapping model (GSMM) (697 reaction and 595 metabolites) is constructed using as a basis the iAF1260 model upon eliminating reactions guaranteed not to carry flux based on growth and fermentation data for a minimal glucose growth medium. Labeling data for 17 amino acid fragments obtained from cells fed with glucose labeled at the second carbon was used to obtain fluxes and ranges. Metabolic fluxes and confidence intervals are estimated, for both core and genome-scale mapping models, by minimizing the sum of square of differences between predicted and experimentally measured labeling patterns using the EMU decomposition algorithm. Overall, we find that both topology and estimated values of the metabolic fluxes remain largely consistent between core and GSM model. Stepping up to a genome-scale mapping model leads to wider flux inference ranges for 20 key reactions present in the core model. The glycolysis flux range doubles due to the possibility of active gluconeogenesis, the TCA flux range expanded by 80% due to the availability of a bypass through arginine consistent with labeling data, and the transhydrogenase reaction flux was essentially unresolved due to the presence of as many as five routes for the inter-conversion of NADPH to NADH afforded by the genome-scale model. By globally accounting for ATP demands in the GSMM model the unused ATP decreased drastically with the lower bound matching the maintenance ATP requirement. A non

  7. Identification of Elymus (Triticeae, Poaceae) and its related genera genomes by RFLP analysis of PCR-amplified Adh genes.

    Science.gov (United States)

    Liu, QuanLan; Zhang, NingNing; Li, Lei; Liu, Jie

    2010-10-01

    Elymus L. is the largest genus in Triticeae, containing about 150 species with four recognized genome donors (St, H, P, and W). Traditionally, the genome compound of this genus is identified based on cytological data. Recently, molecular phylogenetic analysis was used to investigate its genomic combination. Here we describe a restriction fragment length polymorphism (RFLP) assay based on digesting alcohol dehydrogenase (Adh) amplicons with two restriction enzyme combinations, EcoRI-HindIII and EcoRI-PstI, which easily can be used to distinguish Elymus and its closely related genera genomes. The method includes only four steps: (1) amplifying nuclear Adh genes with universal primers; (2) purifying and cloning PCR products; (3) digesting plasmids with restriction enzymes that identify a given genome; (4) running the digested products on an agarose gel and identify the sample based on the restriction profiles. Results showed that: (1) PCR products ranged from 1,200 to 2,000 bp; (2) Adh2 gene was amplified from all the tested genomes; Adh1 gene was amplified from almost all of the tested genomes except the W genome; Adh3 gene was amplified only from the St genome; (3) the EcoRI-HindIII combination was effective to distinguish different Adh gene types (Adh1, Adh2, and Adh3); (4) the Adh2-EcoRI-PstI fragments could be used to distinguish Elymus and its closely related genera genomes. Therefore, This RFLP assay provides an inexpensive and simple means of identifying Elymus genomes.

  8. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......BACKGROUND: Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and diseases. Base pair differences arising from SVs are on a much higher order (>100 fold) than point...... mapping technology as a comprehensive and cost-effective method for detecting structural variation and studying complex regions in the human genome, as well as deciphering viral integration into the host genome....

  9. Inverted Low-Copy Repeats and Genome Instability—A Genome-Wide Analysis

    Science.gov (United States)

    Dittwald, Piotr; Gambin, Tomasz; Gonzaga-Jauregui, Claudia; Carvalho, Claudia M.B.; Lupski, James R.; Stankiewicz, Paweł; Gambin, Anna

    2013-01-01

    Inverse paralogous low-copy repeats (IP-LCRs) can cause genome instability by nonallelic homologous recombination (NAHR)-mediated balanced inversions. When disrupting a dosage-sensitive gene(s), balanced inversions can lead to abnormal phenotypes. We delineated the genome-wide distribution of IP-LCRs >1 kB in size with >95% sequence identity and mapped the genes, potentially intersected by an inversion, that overlap at least one of the IP-LCRs. Remarkably, our results show that 12.0% of the human genome is potentially susceptible to such inversions and 942 genes, 99 of which are on the X chromosome, are predicted to be disrupted secondary to such an inversion! In addition, IP-LCRs larger than 800 bp with at least 98% sequence identity (duplication/triplication facilitating IP-LCRs, DTIP-LCRs) were recently implicated in the formation of complex genomic rearrangements with a duplication-inverted triplication–duplication (DUP-TRP/INV-DUP) structure by a replication-based mechanism involving a template switch between such inverted repeats. We identified 1,551 DTIP-LCRs that could facilitate DUP-TRP/INV-DUP formation. Remarkably, 1,445 disease-associated genes are at risk of undergoing copy-number gain as they map to genomic intervals susceptible to the formation of DUP-TRP/INV-DUP complex rearrangements. We implicate inverted LCRs as a human genome architectural feature that could potentially be responsible for genomic instability associated with many human disease traits. PMID:22965494

  10. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Directory of Open Access Journals (Sweden)

    Jianmin Fu

    Full Text Available Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  11. Genome-Wide Analysis of Simple Sequence Repeats and Efficient Development of Polymorphic SSR Markers Based on Whole Genome Re-Sequencing of Multiple Isolates of the Wheat Stripe Rust Fungus.

    Directory of Open Access Journals (Sweden)

    Huaiyong Luo

    Full Text Available The biotrophic parasitic fungus Puccinia striiformis f. sp. tritici (Pst causes stripe rust, a devastating disease of wheat, endangering global food security. Because the Pst population is highly dynamic, it is difficult to develop wheat cultivars with durable and highly effective resistance. Simple sequence repeats (SSRs are widely used as molecular markers in genetic studies to determine population structure in many organisms. However, only a small number of SSR markers have been developed for Pst. In this study, a total of 4,792 SSR loci were identified using the whole genome sequences of six isolates from different regions of the world, with a marker density of one SSR per 22.95 kb. The majority of the SSRs were di- and tri-nucleotide repeats. A database containing 1,113 SSR markers were established. Through in silico comparison, the previously reported SSR markers were found mainly in exons, whereas the SSR markers in the database were mostly in intergenic regions. Furthermore, 105 polymorphic SSR markers were confirmed in silico by their identical positions and nucleotide variations with INDELs identified among the six isolates. When 104 in silico polymorphic SSR markers were used to genotype 21 Pst isolates, 84 produced the target bands, and 82 of them were polymorphic and revealed the genetic relationships among the isolates. The results show that whole genome re-sequencing of multiple isolates provides an ideal resource for developing SSR markers, and the newly developed SSR markers are useful for genetic and population studies of the wheat stripe rust fungus.

  12. Comparative Genomic Analysis of Clinical and Environmental Vibrio Vulnificus Isolates Revealed Biotype 3 Evolutionary Relationships

    Directory of Open Access Journals (Sweden)

    Yael eKotton

    2015-01-01

    Full Text Available In 1996 a common-source outbreak of severe soft tissue and bloodstream infections erupted among Israeli fish farmers and fish consumers due to changes in fish marketing policies. The causative pathogen was a new strain of Vibrio vulnificus, named biotype 3, which displayed a unique biochemical and genotypic profile. Initial observations suggested that the pathogen erupted as a result of genetic recombination between two distinct populations. We applied a whole genome shotgun sequencing approach using several V. vulnificus strains from Israel in order to study the pan genome of V. vulnificus and determine the phylogenetic relationship of biotype 3 with existing populations. The core genome of V. vulnificus based on 16 draft and complete genomes consisted of 3068 genes, representing between 59% and 78% of the whole genome of 16 strains. The accessory genome varied in size from 781 kbp to 2044 kbp. Phylogenetic analysis based on whole, core, and accessory genomes displayed similar clustering patterns with two main clusters, clinical (C and environmental (E, all biotype 3 strains formed a distinct group within the E cluster. Annotation of accessory genomic regions found in biotype 3 strains and absent from the core genome yielded 1732 genes, of which the vast majority encoded hypothetical proteins, phage-related proteins, and mobile element proteins. A total of 1916 proteins (including 713 hypothetical proteins were present in all human pathogenic strains (both biotype 3 and non-biotype 3 and absent from the environmental strains. Clustering analysis of the non-hypothetical proteins revealed 148 protein clusters shared by all human pathogenic strains; these included transcriptional regulators, arylsulfatases, methyl-accepting chemotaxis proteins, acetyltransferases, GGDEF family proteins, transposases, type IV secretory system (T4SS proteins, and integrases. Our study showed that V. vulnificus biotype 3 evolved from environmental populations and

  13. Identification of conserved regulatory elements by comparative genome analysis

    Directory of Open Access Journals (Sweden)

    Jareborg Niclas

    2003-05-01

    Full Text Available Abstract Background For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed 'phylogenetic footprinting'. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments. Results We have devised a flexible suite of methods for the identification and visualization of conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach results in a significant improvement in the detection of transcription-factor-binding sites because of an increased signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is at the disposal of the scientific community at http://www.phylofoot.org/. Conclusions Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting.

  14. Genome-wide prediction, display and refinement of binding sites with information theory-based models

    Directory of Open Access Journals (Sweden)

    Leeder J Steven

    2003-09-01

    Full Text Available Abstract Background We present Delila-genome, a software system for identification, visualization and analysis of protein binding sites in complete genome sequences. Binding sites are predicted by scanning genomic sequences with information theory-based (or user-defined weight matrices. Matrices are refined by adding experimentally-defined binding sites to published binding sites. Delila-Genome was used to examine the accuracy of individual information contents of binding sites detected with refined matrices as a measure of the strengths of the corresponding protein-nucleic acid interactions. The software can then be used to predict novel sites by rescanning the genome with the refined matrices. Results Parameters for genome scans are entered using a Java-based GUI interface and backend scripts in Perl. Multi-processor CPU load-sharing minimized the average response time for scans of different chromosomes. Scans of human genome assemblies required 4–6 hours for transcription factor binding sites and 10–19 hours for splice sites, respectively, on 24- and 3-node Mosix and Beowulf clusters. Individual binding sites are displayed either as high-resolution sequence walkers or in low-resolution custom tracks in the UCSC genome browser. For large datasets, we applied a data reduction strategy that limited displays of binding sites exceeding a threshold information content to specific chromosomal regions within or adjacent to genes. An HTML document is produced listing binding sites ranked by binding site strength or chromosomal location hyperlinked to the UCSC custom track, other annotation databases and binding site sequences. Post-genome scan tools parse binding site annotations of selected chromosome intervals and compare the results of genome scans using different weight matrices. Comparisons of multiple genome scans can display binding sites that are unique to each scan and identify sites with significantly altered binding strengths

  15. Comparative genomics and phylogenetic analysis of S. dysenteriae subgroup

    Institute of Scientific and Technical Information of China (English)

    YANG; E; BIN; Wen; PENG; Junping; ZHANG; Xiaobing; WANG; Ji

    2005-01-01

    Genomic compositions of representatives of thirteen S. Dysenteriae serotypes were investigated by performing comparative genomic hybridization (CGH) with microarray containing the whole genomic ORFs (open reading frames, ORFs) of E. Coli K12 strain MG1655 and specific ORFs of S. Dysenteriae A1 strain Sd51197. The CGH results indicated the genomes of the serotypes contain 2654 conserved ORFs originating from E. Coli. However, 219 intrinsic genes of E. Coli including those prophage genes, molecular chaperones, synthesis of specific O antigen and so on were absent. Moreover, some specific genes such as type II secretion system associated components, iron transport related genes and some others as well were acquired through horizontal transfer. According to phylogenic trees based on genetic composition, it was demonstrated that A1, A2, A8, A10 were distinct from the other S. Dysenteriae serotypes. Our results in this report may provide new insights into the physiological process, pathogenicity and evolution of S. Dysenteriae.

  16. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    Science.gov (United States)

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.

  17. Determining protein function and interaction from genome analysis

    Science.gov (United States)

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.

  18. An approach to incorporate linkage disequilibrium structure into genomic association analysis

    Institute of Scientific and Technical Information of China (English)

    Fengyu Zhang; Diane Wagener

    2008-01-01

    In this study, we propose to use the principal component analysis (PCA) and regression model to incorporate linkage disequilibrium (LD) in genomic association data analysis. To accommodate LD in genomic data and reduce multiple testing, we suggest performing PCA and extracting the PCA score to capture the variation of genomic data, after which regression analysis is used to assess the association of the disease with the principal component score. An empirical analysis result shows that both genotype-basod correlation matrix and haplotype-based LD matrix can produce similar results for PCA. Principal component score seems to be more powerful in detecting genetic association because the principal component score is quantitatively measured and may be able to capture the effect of multiple loci.

  19. Genomic compositions and phylogenetic analysis of Shigella boydii subgroup

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Comparative Genomic Hybridization (CGH) microarray analysis was used to compare the genomic compositions of all eighteen Shigella boydii serotype representative strains. The results indicated the genomic "backbone" of this subgroup contained 2552 ORFs homologous to nonpathogenic E. coli K12. Compared with the genome of K12199 ORFs were found to be absent in all S. boydii serotype representatives, including mainly outer membrane protein genes and O-antigen biosynthesis genes. Yet the specific ORFs of S. boydii subgroup contained basically bacteriophage genes and the function unknown (FUN) genes. Some iron metabolism, transport and type II secretion system related genes were found in most representative strains. According to the CGH phylogenetic analysis, the eighteen S. boydii serotype representatives were divided into four groups, in which serotype C13 strain was remarkably distinguished from the other serotype strains. This grouping result corresponded to the distribution of some metabolism related genes. Furthermore, the analysis of genome backbone genes, specific genes, and the phylogenetic trees allowed us to discover the evolution laws of S. boydii and to find out important clues to pathogenesis research, vaccination and the therapeutic medicine development.

  20. Genome Sequencing and Comparative Genomics Analysis Revealed Pathogenic Potential in Penicillium capsulatum as a Novel Fungal Pathogen Belonging to Eurotiales

    Science.gov (United States)

    Yang, Ying; Chen, Min; Li, Zongwei; Al-Hatmi, Abdullah M. S.; de Hoog, Sybren; Pan, Weihua; Ye, Qiang; Bo, Xiaochen; Li, Zhen; Wang, Shengqi; Wang, Junzhi; Chen, Huipeng; Liao, Wanqing

    2016-01-01

    Penicillium capsulatum is a rare Penicillium species used in paper manufacturing, but recently it has been reported to cause invasive infection. To research the pathogenicity of the clinical Penicillium strain, we sequenced the genomes and transcriptomes of the clinical and environmental strains of P. capsulatum. Comparative analyses of these two P. capsulatum strains and close related strains belonging to Eurotiales were performed. The assembled genome sizes of P. capsulatum are approximately 34.4 Mbp in length and encode 11,080 predicted genes. The different isolates of P. capsulatum are highly similar, with the exception of several unique genes, INDELs or SNPs in the genes coding for glycosyl hydrolases, amino acid transporters and circumsporozoite protein. A phylogenomic analysis was performed based on the whole genome data of 38 strains belonging to Eurotiales. By comparing the whole genome sequences and the virulence-related genes from 20 important related species, including fungal pathogens and non-human pathogens belonging to Eurotiales, we found meaningful pathogenicity characteristics between P. capsulatum and its closely related species. Our research indicated that P. capsulatum may be a neglected opportunistic pathogen. This study is beneficial for mycologists, geneticists and epidemiologists to achieve a deeper understanding of the genetic basis of the role of P. capsulatum as a newly reported fungal pathogen. PMID:27761131

  1. Functional Analysis of Shewanella, a cross genome comparison.

    Energy Technology Data Exchange (ETDEWEB)

    Serres, Margrethe H.

    2009-05-15

    The bacterial genus Shewanella includes a group of highly versatile organisms that have successfully adapted to life in many environments ranging from aquatic (fresh and marine) to sedimentary (lake and marine sediments, subsurface sediments, sea vent). A unique respiratory capability of the Shewanellas, initially observed for Shewanella oneidensis MR-1, is the ability to use metals and metalloids, including radioactive compounds, as electron acceptors. Members of the Shewanella genus have also been shown to degrade environmental pollutants i.e. halogenated compounds, making this group highly applicable for the DOE mission. S. oneidensis MR-1 has in addition been found to utilize a diverse set of nutrients and to have a large set of genes dedicated to regulation and to sensing of the environment. The sequencing of the S. oneidensis MR-1 genome facilitated experimental and bioinformatics analyses by a group of collaborating researchers, the Shewanella Federation. Through the joint effort and with support from Department of Energy S. oneidensis MR-1 has become a model organism of study. Our work has been a functional analysis of S. oneidensis MR-1, both by itself and as part of a comparative study. We have improved the annotation of gene products, assigned metabolic functions, and analyzed protein families present in S. oneidensis MR-1. The data has been applied to analysis of experimental data (i.e. gene expression, proteome) generated for S. oneidensis MR-1. Further, this work has formed the basis for a comparative study of over 20 members of the Shewanella genus. The species and strains selected for genome sequencing represented an evolutionary gradient of DNA relatedness, ranging from close to intermediate, and to distant. The organisms selected have also adapted to a variety of ecological niches. Through our work we have been able to detect and interpret genome similarities and differences between members of the genus. We have in this way contributed to the

  2. Comparative analysis of A, B,C and D genomes in the genus Oryza with C0t-1 DNA of C genome

    Institute of Scientific and Technical Information of China (English)

    LAN Weizhen; QIN Rui; LI Gang; HE Guangcun

    2006-01-01

    Fluorescence in situ hybridization (FISH)was applied to somatic chromosomes preparations of Oryza officinalis Wall. (CC), O. sativa L. (AA)×O. officinalis F1 hybrid (AC), backcross progenies BC1 (AAC and ACC), O. latifolia Desv. (CCDD), O. alta Swallen (CCDD) and O. punctata Kotschy (BBCC)with a labelled probe of Cot-1 DNA from O. officinalis.In O. officinalis, the homologous chromosomes showed similar signal bands probed by C0t-1 DNA and karyotype analysis was conducted based on the band patterns. Using no blocking DNA, the probe identified the chromosomes of C genome clearly, but detected few signals on chromosomes of A genome in the F1 hybrid and two backcross progenies of BC1.It is obvious that the highly and moderately repetitive DNA sequences were considerably different between C and A genomes. The chromosomes of C genome were also discriminated from the chromosomes of D-and B-genome in the tetraploid species O. latifolia, O.alta and O. punctata by C0t-1 DNA-FISH. Comparison of the fluorescence intensity on the chromosomes of B, C and D genomes in O. latifolia, O. alta,and O. punctata indicated that the differentiations between C and D genomes are less than that between C and B genomes. The relationship between C and D genomes in O. alta is closer than that of C and D genomes in O. latifolia. This would be one of the causes for the fact that both the genomes are of the same karyotype (CCDD) but belong to different species. The above results showed that the C0t-1 DNA had a high specificity of genome and species. In this paper, the origin of allotetraploid in genus Oryza is also discussed.

  3. Ori-Finder: A web-based system for finding oriCs in unannotated bacterial genomes

    Directory of Open Access Journals (Sweden)

    Zhang Chun-Ting

    2008-02-01

    Full Text Available Abstract Background Chromosomal replication is the central event in the bacterial cell cycle. Identification of replication origins (oriCs is necessary for almost all newly sequenced bacterial genomes. Given the increasing pace of genome sequencing, the current available software for predicting oriCs, however, still leaves much to be desired. Therefore, the increasing availability of genome sequences calls for improved software to identify oriCs in newly sequenced and unannotated bacterial genomes. Results We have developed Ori-Finder, an online system for finding oriCs in bacterial genomes based on an integrated method comprising the analysis of base composition asymmetry using the Z-curve method, distribution of DnaA boxes, and the occurrence of genes frequently close to oriCs. The program can also deal with unannotated genome sequences by integrating the gene-finding program ZCURVE 1.02. Output of the predicted results is exported to an HTML report, which offers convenient views on the results in both graphical and tabular formats. Conclusion A web-based system to predict replication origins of bacterial genomes has been presented here. Based on this system, oriC regions have been predicted for the bacterial genomes available in GenBank currently. It is hoped that Ori-Finder will become a useful tool for the identification and analysis of oriCs in both bacterial and archaeal genomes.

  4. Genome-Scale Analysis of Cell-Specific Regulatory Codes Using Nuclear Enzymes.

    Science.gov (United States)

    Baek, Songjoon; Sung, Myong-Hee

    2016-01-01

    High-throughput sequencing technologies have made it possible for biologists to generate genome-wide profiles of chromatin features at the nucleotide resolution. Enzymes such as nucleases or transposes have been instrumental as a chromatin-probing agent due to their ability to target accessible chromatin for cleavage or insertion. On the scale of a few hundred base pairs, preferential action of the nuclear enzymes on accessible chromatin allows mapping of cell state-specific accessibility in vivo. Such accessible regions contain functionally important regulatory sites, including promoters and enhancers, which undergo active remodeling for cells adapting in a dynamic environment. DNase-seq and the more recent ATAC-seq are two assays that are gaining popularity. Deep sequencing of DNA libraries from these assays, termed genomic footprinting, has been proposed to enable the comprehensive construction of protein occupancy profiles over the genome at the nucleotide level. Recent studies have discovered limitations of genomic footprinting which reduce the scope of detectable proteins. In addition, the identification of putative factors that bind to the observed footprints remains challenging. Despite these caveats, the methodology still presents significant advantages over alternative techniques such as ChIP-seq or FAIRE-seq. Here we describe computational approaches and tools for analysis of chromatin accessibility and genomic footprinting. Proper experimental design and assay-specific data analysis ensure the detection sensitivity and maximize retrievable information. The enzyme-based chromatin profiling approaches represent a powerful and evolving methodology which facilitates our understanding of how the genome is regulated.

  5. Stacks: an analysis tool set for population genomics.

    Science.gov (United States)

    Catchen, Julian; Hohenlohe, Paul A; Bassham, Susan; Amores, Angel; Cresko, William A

    2013-06-01

    Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides several output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.

  6. IMG/M: integrated genome and metagenome comparative data analysis system.

    Science.gov (United States)

    Chen, I-Min A; Markowitz, Victor M; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N; Kyrpides, Nikos C

    2017-01-04

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.

  7. Genome sequencing and analysis of a granulovirus isolated from the Asiatic rice leafroller, Cnaphalocrocis medinalis.

    Science.gov (United States)

    Zhang, Shan; Zhu, Zheng; Sun, Shifeng; Chen, Qijin; Deng, Fei; Yang, Kai

    2015-12-01

    The complete genome of Cnaphalocrocis medinalis granulovirus (CnmeGV) from a serious migratory rice pest, Cnaphalocrocis medinalis (Lepidoptera: Pyralidae), was sequenced using the Roche 454 Genome Sequencer FLX system (GS FLX) with shotgun strategy and assembled by Roche GS De Novo assembler software. Its circular double-stranded genome is 111,246 bp in size with a high A+T content of 64.8% and codes for 118 putative open reading frames (ORFs). It contains 37 conserved baculovirus core ORFs, 13 unique ORFs, 26 ORFs that were found in all Lepidoptera baculoviruses and 42 common ORFs. The analysis of nucleotide sequence repeats revealed that the CnmeGV genome differs from the rest of sequenced GVs by a 23 kb and a 17kb gene block inversions, and does not contain any typical homologous region (hr) except for a region of non-hr-like sequence. Chitinase and cathepsin genes, which are reported to have major roles in the liquefaction of the hosts, were not found in the CnmeGV genome, which explains why CnmeGV infected insects do not show the phenotype of typical liquefaction. Phylogenetic analysis, based on the 37 core baculovirus genes, indicates that CnmeGV is closely related to Adoxophyes orana granulovirus. The genome analysis would contribute to the functional research of CnmeGV, and would benefit to the utilization of CnmeGV as pest control reagent for rice production.

  8. Genome sequencing and analysis of a granulovirus isolated from the Asiatic rice leafroller, Cnaphalocrocis medinalis

    Institute of Scientific and Technical Information of China (English)

    Shan Zhang; Zheng Zhu; Shifeng Sun; Qijin Chen; Fei Deng; Kai Yang

    2015-01-01

    The complete genome of Cnaphalocrocis medinalis granulovirus(CnmeGV) from a serious migratory rice pest, Cnaphalocrocis medinalis(Lepidoptera: Pyralidae), was sequenced using the Roche 454 Genome Sequencer FLX system(GS FLX) with shotgun strategy and assembled by Roche GS De Novo assembler software. Its circular double-stranded genome is 111,246 bp in size with a high A+T content of 64.8% and codes for 118 putative open reading frames(ORFs). It contains 37 conserved baculovirus core ORFs, 13 unique ORFs, 26 ORFs that were found in all Lepidoptera baculoviruses and 42 common ORFs. The analysis of nucleotide sequence repeats revealed that the CnmeGV genome differs from the rest of sequenced GVs by a 23 kb and a 17 kb gene block inversions, and does not contain any typical homologous region(hr) except for a region of non-hr-like sequence. Chitinase and cathepsin genes, which are reported to have major roles in the liquefaction of the hosts, were not found in the CnmeGV genome, which explains why CnmeGV infected insects do not show the phenotype of typical liquefaction. Phylogenetic analysis,based on the 37 core baculovirus genes, indicates that CnmeGV is closely related to Adoxophyes orana granulovirus. The genome analysis would contribute to the functional research of CnmeGV,and would benefit to the utilization of CnmeGV as pest control reagent for rice production.

  9. Reliability and applications of statistical methods based on oligonucleotide frequencies in bacterial and archaeal genomes

    DEFF Research Database (Denmark)

    Bohlin, J; Skjerve, E; Ussery, David

    2008-01-01

    BACKGROUND: The increasing number of sequenced prokaryotic genomes contains a wealth of genomic data that needs to be effectively analysed. A set of statistical tools exists for such analysis, but their strengths and weaknesses have not been fully explored. The statistical methods we are concerned......, or be based on specific statistical distributions. Advantages with these statistical methods include measurements of phylogenetic relationship with relatively small pieces of DNA sampled from almost anywhere within genomes, detection of foreign/conserved DNA, and homology searches. Our aim was to explore...... measure was a good measure to detect horizontally transferred regions, and when used to compare the phylogenetic relationships between plasmids and hosts, significant correlation (R2 = 0.4) was found with genomic GC content and intra-chromosomal homogeneity. CONCLUSION: The statistical methods examined...

  10. Functional genomic analysis of C. elegans molting.

    Directory of Open Access Journals (Sweden)

    Alison R Frand

    2005-10-01

    Full Text Available Although the molting cycle is a hallmark of insects and nematodes, neither the endocrine control of molting via size, stage, and nutritional inputs nor the enzymatic mechanism for synthesis and release of the exoskeleton is well understood. Here, we identify endocrine and enzymatic regulators of molting in C. elegans through a genome-wide RNA-interference screen. Products of the 159 genes discovered include annotated transcription factors, secreted peptides, transmembrane proteins, and extracellular matrix enzymes essential for molting. Fusions between several genes and green fluorescent protein show a pulse of expression before each molt in epithelial cells that synthesize the exoskeleton, indicating that the corresponding proteins are made in the correct time and place to regulate molting. We show further that inactivation of particular genes abrogates expression of the green fluorescent protein reporter genes, revealing regulatory networks that might couple the expression of genes essential for molting to endocrine cues. Many molting genes are conserved in parasitic nematodes responsible for human disease, and thus represent attractive targets for pesticide and pharmaceutical development.

  11. Dyneins across eukaryotes: a comparative genomic analysis.

    Science.gov (United States)

    Wickstead, Bill; Gull, Keith

    2007-12-01

    Dyneins are large minus-end-directed microtubule motors. Each dynein contains at least one dynein heavy chain (DHC) and a variable number of intermediate chains (IC), light intermediate chains (LIC) and light chains (LC). Here, we used genome sequence data from 24 diverse eukaryotes to assess the distribution of DHCs, ICs, LICs and LCs across Eukaryota. Phylogenetic inference identified nine DHC families (two cytoplasmic and seven axonemal) and six IC families (one cytoplasmic). We confirm that dyneins have been lost from higher plants and show that this is most likely because of a single loss of cytoplasmic dynein 1 from the ancestor of Rhodophyta and Viridiplantae, followed by lineage-specific losses of other families. Independent losses in Entamoeba mean that at least three extant eukaryotic lineages are entirely devoid of dyneins. Cytoplasmic dynein 2 is associated with intraflagellar transport (IFT), but in two chromalveolate organisms, we find an IFT footprint without the retrograde motor. The distribution of one family of outer-arm dyneins accounts for 2-headed or 3-headed outer-arm ultrastructures observed in different organisms. One diatom species builds motile axonemes without any inner-arm dyneins (IAD), and the unexpected conservation of IAD I1 in non-flagellate algae and LC8 (DYNLL1/2) in all lineages reveals a surprising fluidity to dynein function.

  12. Primer to analysis of genomic data using R

    CERN Document Server

    Gondro, Cedric

    2015-01-01

    Through this book, researchers and students will learn to use R for analysis of large-scale genomic data and how to create routines to automate analytical steps. The philosophy behind the book is to start with real world raw datasets and perform all the analytical steps needed to reach final results. Though theory plays an important role, this is a practical book for advanced undergraduate and graduate classes in bioinformatics, genomics and statistical genetics or for use in lab sessions. This book is also designed to be used by students in computer science and statistics who want to learn the practical aspects of genomic analysis without delving into algorithmic details. The datasets used throughout the book may be downloaded from the publisher’s website.  Chapters show how to handle and manage high-throughput genomic data, create automated workflows and speed up analyses in R. A wide range of R packages useful for working with genomic data are illustrated with practical examples. In recent years R has b...

  13. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    Directory of Open Access Journals (Sweden)

    Maximo Rivarola

    Full Text Available Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  14. Castor bean organelle genome sequencing and worldwide genetic diversity analysis.

    Science.gov (United States)

    Rivarola, Maximo; Foster, Jeffrey T; Chan, Agnes P; Williams, Amber L; Rice, Danny W; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M J; Khouri, Hoda M; Beckstrom-Sternberg, Stephen M; Allan, Gerard J; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

  15. Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis

    Science.gov (United States)

    Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

    2011-01-01

    Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade. PMID:21750729

  16. Comparative analysis of methods for genome-wide nucleosome cartography.

    Science.gov (United States)

    Quintales, Luis; Vázquez, Enrique; Antequera, Francisco

    2015-07-01

    Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding the genome regulation. In recent years, several experimental protocols have been developed for this purpose that include the enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  17. Comparative genomics beyond sequence-based alignments

    DEFF Research Database (Denmark)

    Þórarinsson, Elfar; Yao, Zizhen; Wiklund, Eric D.;

    2008-01-01

    Recent computational scans for non-coding RNAs (ncRNAs) in multiple organisms have relied on existing multiple sequence alignments. However, as sequence similarity drops, a key signal of RNA structure--frequent compensating base changes--is increasingly likely to cause sequence-based alignment me...

  18. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Full Text Available Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ’tsunami‘ of data that emerge from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, which is less than that of the related species, Fusarium oxysporum, but more than that for Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumosin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  19. WormBase 2016: expanding to enable helminth genomic research

    Science.gov (United States)

    Howe, Kevin L.; Bolt, Bruce J.; Cain, Scott; Chan, Juancarlos; Chen, Wen J.; Davis, Paul; Done, James; Down, Thomas; Gao, Sibyl; Grove, Christian; Harris, Todd W.; Kishore, Ranjana; Lee, Raymond; Lomax, Jane; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Stanley, Eleanor; Tuli, Mary Ann; Van Auken, Kimberly; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wright, Adam; Yook, Karen; Berriman, Matthew; Kersey, Paul; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W.

    2016-01-01

    WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research. PMID:26578572

  20. Changing Histopathological Diagnostics by Genome-Based Tumor Classification

    Directory of Open Access Journals (Sweden)

    Michael Kloth

    2014-05-01

    Full Text Available Traditionally, tumors are classified by histopathological criteria, i.e., based on their specific morphological appearances. Consequently, current therapeutic decisions in oncology are strongly influenced by histology rather than underlying molecular or genomic aberrations. The increase of information on molecular changes however, enabled by the Human Genome Project and the International Cancer Genome Consortium as well as the manifold advances in molecular biology and high-throughput sequencing techniques, inaugurated the integration of genomic information into disease classification. Furthermore, in some cases it became evident that former classifications needed major revision and adaption. Such adaptations are often required by understanding the pathogenesis of a disease from a specific molecular alteration, using this molecular driver for targeted and highly effective therapies. Altogether, reclassifications should lead to higher information content of the underlying diagnoses, reflecting their molecular pathogenesis and resulting in optimized and individual therapeutic decisions. The objective of this article is to summarize some particularly important examples of genome-based classification approaches and associated therapeutic concepts. In addition to reviewing disease specific markers, we focus on potentially therapeutic or predictive markers and the relevance of molecular diagnostics in disease monitoring.

  1. Array-based comparative genomic hybridization analysis reveals chromosomal copy number aberrations associated with clinical outcome in canine diffuse large B-cell lymphoma.

    Directory of Open Access Journals (Sweden)

    Arianna Aricò

    Full Text Available Canine Diffuse Large B-cell Lymphoma (cDLBCL is an aggressive cancer with variable clinical response. Despite recent attempts by gene expression profiling to identify the dog as a potential animal model for human DLBCL, this tumor remains biologically heterogeneous with no prognostic biomarkers to predict prognosis. The aim of this work was to identify copy number aberrations (CNAs by high-resolution array comparative genomic hybridization (aCGH in 12 dogs with newly diagnosed DLBCL. In a subset of these dogs, the genetic profiles at the end of therapy and at relapse were also assessed. In primary DLBCLs, 90 different genomic imbalances were counted, consisting of 46 gains and 44 losses. Two gains in chr13 were significantly correlated with clinical stage. In addition, specific regions of gains and losses were significantly associated to duration of remission. In primary DLBCLs, individual variability was found, however 14 recurrent CNAs (>30% were identified. Losses involving IGK, IGL and IGH were always found, and gains along the length of chr13 and chr31 were often observed (>41%. In these segments, MYC, LDHB, HSF1, KIT and PDGFRα are annotated. At the end of therapy, dogs in remission showed four new CNAs, whereas three new CNAs were observed in dogs at relapse compared with the previous profiles. One ex novo CNA, involving TCR, was present in dogs in remission after therapy, possibly induced by the autologous vaccine. Overall, aCGH identified small CNAs associated with outcome, which, along with future expression studies, may reveal target genes relevant to cDLBCL.

  2. arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays

    Directory of Open Access Journals (Sweden)

    Moreau Yves

    2005-05-01

    Full Text Available Abstract Background The availability of the human genome sequence as well as the large number of physically accessible oligonucleotides, cDNA, and BAC clones across the entire genome has triggered and accelerated the use of several platforms for analysis of DNA copy number changes, amongst others microarray comparative genomic hybridization (arrayCGH. One of the challenges inherent to this new technology is the management and analysis of large numbers of data points generated in each individual experiment. Results We have developed arrayCGHbase, a comprehensive analysis platform for arrayCGH experiments consisting of a MIAME (Minimal Information About a Microarray Experiment supportive database using MySQL underlying a data mining web tool, to store, analyze, interpret, compare, and visualize arrayCGH results in a uniform and user-friendly format. Following its flexible design, arrayCGHbase is compatible with all existing and forthcoming arrayCGH platforms. Data can be exported in a multitude of formats, including BED files to map copy number information on the genome using the Ensembl or UCSC genome browser. Conclusion ArrayCGHbase is a web based and platform independent arrayCGH data analysis tool, that allows users to access the analysis suite through the internet or a local intranet after installation on a private server. ArrayCGHbase is available at http://medgen.ugent.be/arrayCGHbase/.

  3. Analysis of genomic signatures in prokaryotes using multinomial regression and hierarchical clustering

    DEFF Research Database (Denmark)

    Ussery, David; Bohlin, Jon; Skjerve, Eystein

    2009-01-01

    Recently there has been an explosion in the availability of bacterial genomic sequences, making possible now an analysis of genomic signatures across more than 800 hundred different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867...... different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement.The statistics obtained using hierarchical clustering...

  4. Genomic analysis of thermophilic Bacillus coagulans strains: efficient producers for platform bio-chemicals.

    Science.gov (United States)

    Su, Fei; Xu, Ping

    2014-01-29

    Microbial strains with high substrate efficiency and excellent environmental tolerance are urgently needed for the production of platform bio-chemicals. Bacillus coagulans has these merits; however, little genetic information is available about this species. Here, we determined the genome sequences of five B. coagulans strains, and used a comparative genomic approach to reconstruct the central carbon metabolism of this species to explain their fermentation features. A novel xylose isomerase in the xylose utilization pathway was identified in these strains. Based on a genome-wide positive selection scan, the selection pressure on amino acid metabolism may have played a significant role in the thermal adaptation. We also researched the immune systems of B. coagulans strains, which provide them with acquired resistance to phages and mobile genetic elements. Our genomic analysis provides comprehensive insights into the genetic characteristics of B. coagulans and paves the way for improving and extending the uses of this species.

  5. DivStat: a user-friendly tool for single nucleotide polymorphism analysis of genomic diversity.

    Directory of Open Access Journals (Sweden)

    Inês Soares

    Full Text Available Recent developments have led to an enormous increase of publicly available large genomic data, including complete genomes. The 1000 Genomes Project was a major contributor, releasing the results of sequencing a large number of individual genomes, and allowing for a myriad of large scale studies on human genetic variation. However, the tools currently available are insufficient when the goal concerns some analyses of data sets encompassing more than hundreds of base pairs and when considering haplotype sequences of single nucleotide polymorphisms (SNPs. Here, we present a new and potent tool to deal with large data sets allowing the computation of a variety of summary statistics of population genetic data, increasing the speed of data analysis.

  6. Comparative Analysis on Genomes from Oryza alta and Oryza latifolia by C0t-1 DNA

    Institute of Scientific and Technical Information of China (English)

    WANG De-bin; WANG Yang; WU Qi; ZHAO Hou-ming; LI Gang; QIN Rui; WANG Chun-tai; LIU Hong

    2010-01-01

    In order to reveal the origin and evolutionary relationship between two CCDD genome species, Oryza alta and Oryza latifolia, fluorescence in situ hybridization (FISH) was adopted to analyze the genomes of the two species with C0t-1 DNA from O. alta as a probe. Karyotype was also comparatively analyzed between O. alta and O. latifolia based on their similar band patterns of the hybridization signals. There were a high homology and close relationship between O. alta and O. latifolia, however, the distinction between the hybridization signals was also clear. C0t-1 DNA was proved to be species- and genome type-specific. It is suggested that C0t-1 DNA-FISH could be more efficient to analyze the genomic relationship between different species. According to the comparative analysis of highly and moderately repetitive DNA sequences between the two allotetraploidy species, O. alta and O. latifolia, the possible origin and evolutionary mechanism of allotetraploidy of Oryza were discussed.

  7. Genome-wide gene expression analysis of anguillid herpesvirus 1

    NARCIS (Netherlands)

    Beurden, van S.J.; Peeters, B.P.H.; Rottier, P.J.M.; Davison, A.A.; Engelsma, M.Y.

    2013-01-01

    Background Whereas temporal gene expression in mammalian herpesviruses has been studied extensively, little is known about gene expression in fish herpesviruses. Here we report a genome-wide transcription analysis of a fish herpesvirus, anguillid herpesvirus 1, in cell culture, studied during the

  8. Integrated translational genomics for analysis of complex traits in sorghum

    Science.gov (United States)

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  9. Genome-Wide Association Analysis in Primary Sclerosing Cholangitis

    NARCIS (Netherlands)

    T.H. Karlsen; A. Franke; E. Melum; A.. Kaser; J.R. Hov; T. Balschun; B.A. Lie; A. Bergquist; C. Schramm; T.J. Weismüller; D. Gotthardt; C. Rust; E.E.R. Philipp; T. Fritz; L. Henckaerts; R. Weersma; P. Stokkers; C.Y. Ponsioen; C. Wijmenga; M. Sterneck; M. Nothnagel; J. Hampe; A. Teufel; H. Runz; P. Rosenstiel; A. Stiehl; S. Vermeire; U. Beuers; M. Manns; E. Schrumpf; K.M. Boberg; S. Schreiber

    2010-01-01

    BACKGROUND & AIMS: We aimed to characterize the genetic susceptibility to primary sclerosing cholangitis (PSC) by means of a genome-wide association analysis of single nucleotide polymorphism (SNP) markers. METHODS: A total of 443,816 SNPs on the Affymetrix SNP Array 5.0 (Affymetrix, Santa Clara, CA

  10. Genomic and single nucleotide polymorphism analysis of infectious bronchitis coronavirus.

    Science.gov (United States)

    Abolnik, Celia

    2015-06-01

    Infectious bronchitis virus (IBV) is a Gammacoronavirus that causes a highly contagious respiratory disease in chickens. A QX-like strain was analysed by high-throughput Illumina sequencing and genetic variation across the entire viral genome was explored at the sub-consensus level by single nucleotide polymorphism (SNP) analysis. Thirteen open reading frames (ORFs) in the order 5'-UTR-1a-1ab-S-3a-3b-E-M-4b-4c-5a-5b-N-6b-3'UTR were predicted. The relative frequencies of missense: silent SNPs were calculated to obtain a comparative measure of variability in specific genes. The most variable ORFs in descending order were E, 3b, 5'UTR, N, 1a, S, 1ab, M, 4c, 5a, 6b. The E and 3b protein products play key roles in coronavirus virulence, and RNA folding demonstrated that the mutations in the 5'UTR did not alter the predicted secondary structure. The frequency of SNPs in the Spike (S) protein ORF of 0.67% was below the genomic average of 0.76%. Only three SNPS were identified in the S1 subunit, none of which were located in hypervariable region (HVR) 1 or HVR2. The S2 subunit was considerably more variable containing 87% of the polymorphisms detected across the entire S protein. The S2 subunit also contained a previously unreported multi-A insertion site and a stretch of four consecutive mutated amino acids, which mapped to the stalk region of the spike protein. Template-based protein structure modelling produced the first theoretical model of the IBV spike monomer. Given the lack of diversity observed at the sub-consensus level, the tenet that the HVRs in the S1 subunit are very tolerant of amino acid changes produced by genetic drift is questioned. Copyright © 2015 Elsevier B.V. All rights reserved.

  11. Cost analysis of whole genome sequencing in German clinical practice.

    Science.gov (United States)

    Plöthner, Marika; Frank, Martin; von der Schulenburg, J-Matthias Graf

    2017-06-01

    Whole genome sequencing (WGS) is an emerging tool in clinical diagnostics. However, little has been said about its procedure costs, owing to a dearth of related cost studies. This study helps fill this research gap by analyzing the execution costs of WGS within the setting of German clinical practice. First, to estimate costs, a sequencing process related to clinical practice was undertaken. Once relevant resources were identified, a quantification and monetary evaluation was conducted using data and information from expert interviews with clinical geneticists, and personnel at private enterprises and hospitals. This study focuses on identifying the costs associated with the standard sequencing process, and the procedure costs for a single WGS were analyzed on the basis of two sequencing platforms-namely, HiSeq 2500 and HiSeq Xten, both by Illumina, Inc. In addition, sensitivity analyses were performed to assess the influence of various uses of sequencing platforms and various coverage values on a fixed-cost degression. In the base case scenario-which features 80 % utilization and 30-times coverage-the cost of a single WGS analysis with the HiSeq 2500 was estimated at €3858.06. The cost of sequencing materials was estimated at €2848.08; related personnel costs of €396.94 and acquisition/maintenance costs (€607.39) were also found. In comparison, the cost of sequencing that uses the latest technology (i.e., HiSeq Xten) was approximately 63 % cheaper, at €1411.20. The estimated costs of WGS currently exceed the prediction of a 'US$1000 per genome', by more than a factor of 3.8. In particular, the material costs in themselves exceed this predicted cost.

  12. Comparative analysis of whole genome structure of Streptococcus suis using whole genome PCR scanning

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An outbreak associated with Streptococcus suis infection in humans emerged in Sichuan province, China in 2005. The outbreak is atypical for the apparent large number of human cases, high fatality rate and geographical spread. To determine whether the bacterium has changed, we compared both human and animal isolates from the Sichuan outbreak with those collected previously within China and in other countries using whole genome PCR scanning (WGPScaning) comparative sequencing of several known virulence factor genes and multilocus sequence typing (MLST) analysis. WGPScanning analysis showed that all primer pairs yielded PCR products of the expected sizes in all four strains tested. The nucleotide sequences of all the detected virulence factor genes are identical in the four strains and MLST results showed that the four isolates studied and reference strain all belonged to the ST1 complex. No new genetic changes were found in the genome structure of the isolates from this Sichuan outbreak.

  13. Comparative analysis of whole genome structure of Streptococcus suis using whole genome PCR scanning

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    An outbreak associated with Streptococcus suis infection in humans emerged in Sichuan province, China in 2005. The outbreak is atypical for the apparent large number of human cases, high fatality rate and geographical spread. To determine whether the bacterium has changed, we compared both human and animal isolates from the Sichuan outbreak with those collected previously within China and in other countries using whole genome PCR scanning (WGPScaning) comparative sequencing of several known virulence factor genes and multilocus sequence typing (MLST) analysis. WGPScanning analysis showed that all primer pairs yielded PCR products of the expected sizes in all four strains tested. The nucleotide sequences of all the detected virulence factor genes are identical in the four strains and MLST results showed that the four isolates studied and reference strain all belonged to the ST1 com-plex. No new genetic changes were found in the genome structure of the isolates from this Sichuan outbreak.

  14. Microsatellite analysis in the genome of Acanthaceae: An in silico approach

    Directory of Open Access Journals (Sweden)

    Priyadharsini Kaliswamy

    2015-01-01

    Full Text Available Background: Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. Objective: The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. Materials and Methods: The whole nucleotide sequences of Acanthaceae species were obtained from National Center for Biotechnology Information database and screened for the presence of SSRs. SSR Locator tool was used to predict the microsatellites and inbuilt Primer3 module was used for primer designing. Results: Totally 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and the occurrence of dinucleotide repeats was found to be abundant in the genome sequences. The essential amino acid isoleucine was found rich in all the sequences. We also designed the SSR-based primers/markers for 59 sequences of this family that contains microsatellite repeats in their genome. Conclusion: The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to Acanthaceae family in the future.

  15. Flow cytometric analysis of oil palm: a preliminary analysis for cultivars and genomic DNA alteration

    Directory of Open Access Journals (Sweden)

    Warawut Chuthammathat

    2005-12-01

    Full Text Available DNA contents of oil palm (Elaeis guineensis Jacq. cultivars were analyzed by flow cytometry using different external reference plant species. Analysis using corn (Zea mays line CE-777 as a reference plant gave the highest DNA content of oil palm (4.72±0.23 pg 2C-1 whereas the DNA content was found to be lower when using soybean (Glycine max cv. Polanka (3.77±0.09 pg 2C-1 or tomato (Lycopersicon esculentum cv. Stupicke (4.25±0.09 pg 2C-1 as a reference. The nuclear DNA contents of Dura (D109, Pisifera (P168 and Tenera (T38 cultivars were 3.46±0.04, 3.24±0.03 and 3.76±0.04 pg 2C-1 nuclei, respectively, using soybean as a reference. One haploid genome of oil palm therefore ranged from 1.56 to 1.81±109 base pairs. DNA contents from one-year-old calli and cell suspension of oil palm were found to be significantly different from those of seedlings. It thus should be noted that genomic DNA alteration occurred in these cultured tissues. We therefore confirm that flow cytometric analysis could verify cultivars, DNA content and genomic DNA alteration of oil palm using soybean as an external reference standard.

  16. The complexity of Rhipicephalus (Boophilus microplus genome characterised through detailed analysis of two BAC clones

    Directory of Open Access Journals (Sweden)

    Valle Manuel

    2011-07-01

    Full Text Available Abstract Background Rhipicephalus (Boophilus microplus (Rmi a major cattle ectoparasite and tick borne disease vector, impacts on animal welfare and industry productivity. In arthropod research there is an absence of a complete Chelicerate genome, which includes ticks, mites, spiders, scorpions and crustaceans. Model arthropod genomes such as Drosophila and Anopheles are too taxonomically distant for a reference in tick genomic sequence analysis. This study focuses on the de-novo assembly of two R. microplus BAC sequences from the understudied R microplus genome. Based on available R. microplus sequenced resources and comparative analysis, tick genomic structure and functional predictions identify complex gene structures and genomic targets expressed during tick-cattle interaction. Results In our BAC analyses we have assembled, using the correct positioning of BAC end sequences and transcript sequences, two challenging genomic regions. Cot DNA fractions compared to the BAC sequences confirmed a highly repetitive BAC sequence BM-012-E08 and a low repetitive BAC sequence BM-005-G14 which was gene rich and contained short interspersed elements (SINEs. Based directly on the BAC and Cot data comparisons, the genome wide frequency of the SINE Ruka element was estimated. Using a conservative approach to the assembly of the highly repetitive BM-012-E08, the sequence was de-convoluted into three repeat units, each unit containing an 18S, 5.8S and 28S ribosomal RNA (rRNA encoding gene sequence (rDNA, related internal transcribed spacer and complex intergenic region. In the low repetitive BM-005-G14, a novel gene complex was found between to 2 genes on the same strand. Nested in the second intron of a large 9 Kb papilin gene was a helicase gene. This helicase overlapped in two exonic regions with the papilin. Both these genes were shown expressed in different tick life stage important in ectoparasite interaction with the host. Tick specific sequence

  17. The genome sequence of E. coli W (ATCC 9637: comparative genome analysis and an improved genome-scale reconstruction of E. coli

    Directory of Open Access Journals (Sweden)

    Lee Sang

    2011-01-01

    Full Text Available Abstract Background Escherichia coli is a model prokaryote, an important pathogen, and a key organism for industrial biotechnology. E. coli W (ATCC 9637, one of four strains designated as safe for laboratory purposes, has not been sequenced. E. coli W is a fast-growing strain and is the only safe strain that can utilize sucrose as a carbon source. Lifecycle analysis has demonstrated that sucrose from sugarcane is a preferred carbon source for industrial bioprocesses. Results We have sequenced and annotated the genome of E. coli W. The chromosome is 4,900,968 bp and encodes 4,764 ORFs. Two plasmids, pRK1 (102,536 bp and pRK2 (5,360 bp, are also present. W has unique features relative to other sequenced laboratory strains (K-12, B and Crooks: it has a larger genome and belongs to phylogroup B1 rather than A. W also grows on a much broader range of carbon sources than does K-12. A genome-scale reconstruction was developed and validated in order to interrogate metabolic properties. Conclusions The genome of W is more similar to commensal and pathogenic B1 strains than phylogroup A strains, and therefore has greater utility for comparative analyses with these strains. W should therefore be the strain of choice, or 'type strain' for group B1 comparative analyses. The genome annotation and tools created here are expected to allow further utilization and development of E. coli W as an industrial organism for sucrose-based bioprocesses. Refinements in our E. coli metabolic reconstruction allow it to more accurately define E. coli metabolism relative to previous models.

  18. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith,Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo,Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomalsequences using a metagenomic approach reveals that modern humans andNeanderthals split ~;400,000 years ago, without significant evidence ofsubsequent admixture.

  19. Chromosomal imbalances in nasopharyngeal carcinoma: a meta-analysis of comparative genomic hybridization results

    Directory of Open Access Journals (Sweden)

    Jin Ping

    2006-01-01

    Full Text Available Abstract Nasopharyngeal carcinoma (NPC is a highly prevalent disease in Southeast Asia and its prevalence is clearly affected by genetic background. Various theories have been suggested for its high incidence in this geographical region but to these days no conclusive explanation has been identified. Chromosomal imbalances identifiable through comparative genomic hybridization may shed some light on common genetic alterations that may be of relevance to the onset and progression of NPC. Review of the literature, however, reveals contradictory results among reported findings possibly related to factors associated with patient selection, stage of disease, differences in methodological details etc. To increase the power of the analysis and attempt to identify commonalities among the reported findings, we performed a meta-analysis of results described in NPC tissues based on chromosomal comparative genomic hybridization (CGH. This meta-analysis revealed consistent patters in chromosomal abnormalities that appeared to cluster in specific "hot spots" along the genome following a stage-dependent progression.

  20. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.

  1. Genomes-based phylogeny of the genus Xanthomonas

    Directory of Open Access Journals (Sweden)

    Rodriguez-R Luis M

    2012-03-01

    Full Text Available Abstract Background The genus Xanthomonas comprises several plant pathogenic bacteria affecting a wide range of hosts. Despite the economic, industrial and biological importance of Xanthomonas, the classification and phylogenetic relationships within the genus are still under active debate. Some of the relationships between pathovars and species have not been thoroughly clarified, with old pathovars becoming new species. A change in the genus name has been recently suggested for Xanthomonas albilineans, an early branching species currently located in this genus, but a thorough phylogenomic reconstruction would aid in solving these and other discrepancies in this genus. Results Here we report the results of the genome-wide analysis of DNA sequences from 989 orthologous groups from 17 Xanthomonas spp. genomes available to date, representing all major lineages within the genus. The phylogenetic and computational analyses used in this study have been automated in a Perl package designated Unus, which provides a framework for phylogenomic analyses which can be applied to other datasets at the genomic level. Unus can also be easily incorporated into other phylogenomic pipelines. Conclusions Our phylogeny agrees with previous phylogenetic topologies on the genus, but revealed that the genomes of Xanthomonas citri and Xanthomonas fuscans belong to the same species, and that of Xanthomonas albilineans is basal to the joint clade of Xanthomonas and Xylella fastidiosa. Genome reduction was identified in the species Xanthomonas vasicola in addition to the previously identified reduction in Xanthomonas albilineans. Lateral gene transfer was also observed in two gene clusters.

  2. Large-scale genomic analysis of ovarian carcinomas.

    Science.gov (United States)

    Gorringe, Kylie L; Campbell, Ian G

    2009-04-01

    Epithelial ovarian cancers are typified by frequent genomic aberrations that have been difficult to unravel. Recently, high-resolution array technologies have provided the first glimpse of the remarkable complexity of these aberrations with some ovarian cancers containing hundreds of copy number breakpoints, micro-deletions and amplifications. Many of these alterations contain cancer-related genes suggesting that the majority is disease-associated and not just the product of random genomic instability. Future developments such as next-generation sequencing and integrated analysis of data from multiple array platforms on large numbers of samples are poised to revolutionize our understanding of this complex disease.

  3. INTEGRATED GENOME-BASED STUDIES OF SHEWANELLA ECOPHYSIOLOGY

    Energy Technology Data Exchange (ETDEWEB)

    TIEDJE, JAMES M; KONSTANTINIDIS, KOSTAS; WORDEN, MARK

    2014-01-08

    The aim of the work reported is to study Shewanella population genomics, and to understand the evolution, ecophysiology, and speciation of Shewanella. The tasks supporting this aim are: to study genetic and ecophysiological bases defining the core and diversification of Shewanella species; to determine gene content patterns along redox gradients; and to Investigate the evolutionary processes, patterns and mechanisms of Shewanella.

  4. The Genome of Nosema sp. Isolate YNPr: A Comparative Analysis of Genome Evolution within the Nosema/Vairimorpha Clade

    Science.gov (United States)

    Ma, Zhenggang; Li, Tian; Zhang, Xiaoyan; Debrunner-Vossbrinck, Bettina A.; Zhou, Zeyang; Vossbrinck, Charles R.

    2016-01-01

    The microsporidian parasite designated here as Nosema sp. Isolate YNPr was isolated from the cabbage butterfly Pieris rapae collected in Honghe Prefecture, Yunnan Province, China. The genome was sequenced by Illumina sequencing and compared to those of two related members of the Nosema/Vairimorpha clade, Nosema ceranae and Nosema apis. Based upon assembly statistics, the Nosema sp. YNPr genome is 3.36 x 106bp with a G+C content of 23.18% and 2,075 protein coding sequences. An “ACCCTT” motif is present approximately 50-bp upstream of the start codon, as reported from other members of the clade and from Encephalitozoon cuniculi, a sister taxon. Comparative small subunit ribosomal DNA (SSU rDNA) analysis as well as genome-wide phylogenetic analysis confirms a closer relationship between N. ceranae and Nosema sp. YNPr than between the two honeybee parasites N. ceranae and N. apis. The more closely related N. ceranae and Nosema sp. YNPr show similarities in a number of structural characteristics such as gene synteny, gene length, gene number, transposon composition and gene reduction. Based on transposable element content of the assemblies, the transposon content of Nosema sp. YNPr is 4.8%, that of N. ceranae is 3.7%, and that of N. apis is 2.5%, with large differences in the types of transposons present among these 3 species. Gene function annotation indicates that the number of genes participating in most metabolic activities is similar in all three species. However, the number of genes in the transcription, general function, and cysteine protease categories is greater in N. apis than in the other two species. Our studies further characterize the evolution of the Nosema/Vairimorpha clade of microsporidia. These organisms maintain variable but very reduced genomes. We are interested in understanding the effects of genetic drift versus natural selection on genome size in the microsporidia and in developing a testable hypothesis for further studies on the genomic

  5. Identification of genomic sites for CRISPR/Cas9-based genome editing in the Vitis vinifera genome

    Science.gov (United States)

    CRISPR/Cas9 has been recently demonstrated as an effective and popular genome editing tool for modifying genomes of human, animals, microorganisms, and plants. Success of such genome editing is highly dependent on the availability of suitable target sites in the genomes to be edited. Many specific t...

  6. Analysis of recent segmental duplications in the bovine genome

    Directory of Open Access Journals (Sweden)

    Li Congjun

    2009-12-01

    Full Text Available Abstract Background Duplicated sequences are an important source of gene innovation and structural variation within mammalian genomes. We performed the first systematic and genome-wide analysis of segmental duplications in the modern domesticated cattle (Bos taurus. Using two distinct computational analyses, we estimated that 3.1% (94.4 Mb of the bovine genome consists of recently duplicated sequences (≥ 1 kb in length, ≥ 90% sequence identity. Similar to other mammalian draft assemblies, almost half (47% of 94.4 Mb of these sequences have not been assigned to cattle chromosomes. Results In this study, we provide the first experimental validation large duplications and briefly compared their distribution on two independent bovine genome assemblies using fluorescent in situ hybridization (FISH. Our analyses suggest that the (75-90% of segmental duplications are organized into local tandem duplication clusters. Along with rodents and carnivores, these results now confidently establish tandem duplications as the most likely mammalian archetypical organization, in contrast to humans and great ape species which show a preponderance of interspersed duplications. A cross-species survey of duplicated genes and gene families indicated that duplication, positive selection and gene conversion have shaped primates, rodents, carnivores and ruminants to different degrees for their speciation and adaptation. We identified that bovine segmental duplications corresponding to genes are significantly enriched for specific biological functions such as immunity, digestion, lactation and reproduction. Conclusion Our results suggest that in most mammalian lineages segmental duplications are organized in a tandem configuration. Segmental duplications remain problematic for genome and assembly and we highlight genic regions that require higher quality sequence characterization. This study provides insights into mammalian genome evolution and generates a valuable

  7. Genome-Wide Analysis of Simple Sequence Repeats in Bitter Gourd (Momordica charantia

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2017-06-01

    Full Text Available Bitter gourd (Momordica charantia is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus, watermelon (Citrullus lanatus, and melon (Cucumis melo genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs in bitter gourd, including a comparison with the three aforementioned cucurbit species has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1,’ respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs were designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utilization of unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.

  8. Random forests for genomic data analysis

    National Research Council Canada - National Science Library

    Chen, Xi; Ishwaran, Hemant

    2012-01-01

    Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and is able to account for correlation as well as interactions among features...

  9. Genome analysis of the platypus reveals unique signatures of evolution.

    Science.gov (United States)

    Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K

    2008-05-08

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.

  10. Genome analysis of the platypus reveals unique signatures of evolution

    Science.gov (United States)

    Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

    2009-01-01

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734

  11. Genomic Analysis of Caldithrix abyssi, the Thermophilic Anaerobic Bacterium of the Novel Bacterial Phylum Calditrichaeota

    Science.gov (United States)

    Kublanov, Ilya V.; Sigalova, Olga M.; Gavrilov, Sergey N.; Lebedinsky, Alexander V.; Rinke, Christian; Kovaleva, Olga; Chernyh, Nikolai A.; Ivanova, Natalia; Daum, Chris; Reddy, T.B.K.; Klenk, Hans-Peter; Spring, Stefan; Göker, Markus; Reva, Oleg N.; Miroshnichenko, Margarita L.; Kyrpides, Nikos C.; Woyke, Tanja; Gelfand, Mikhail S.; Bonch-Osmolovskaya, Elizaveta A.

    2017-01-01

    demonstrated the ability of C. abyssi to synthesize nucleotides and most amino acids and vitamins. Finally, the genomic sequence allowed us to perform a phylogenomic analysis, based on 38 protein sequences, which confirmed the deep branching of this lineage and justified the proposal of a novel phylum Calditrichaeota. PMID:28265262

  12. Improved statistics for genome-wide interaction analysis.

    Science.gov (United States)

    Ueki, Masao; Cordell, Heather J

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new "joint effects" statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al

  13. Impulsive Neural Networks Algorithm Based on the Artificial Genome Model

    Directory of Open Access Journals (Sweden)

    Yuan Gao

    2014-05-01

    Full Text Available To describe gene regulatory networks, this article takes the framework of the artificial genome model and proposes impulsive neural networks algorithm based on the artificial genome model. Firstly, the gene expression and the cell division tree are applied to generate spiking neurons with specific attributes, neural network structure, connection weights and specific learning rules of each neuron. Next, the gene segment duplications and divergence model are applied to design the evolutionary algorithm of impulsive neural networks at the level of the artificial genome. The dynamic changes of developmental gene regulatory networks are controlled during the whole evolutionary process. Finally, the behavior of collecting food for autonomous intelligent agent is simulated, which is driven by nerves. Experimental results demonstrate that the algorithm in this article has the evolutionary ability on large-scale impulsive neural networks

  14. The effect of genealogy-based haplotypes on genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid; Fernando, Rohan L.; Su, Guosheng

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression...... on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using...... local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (pi) of the haplotype covariates had zero effect...

  15. Genome-wide microarray-based analysis of miRNAs expression in patients with acute-on-chronic liver failure

    Institute of Scientific and Technical Information of China (English)

    Wen Chen; Ze-Hui Yan; Yu-Ming Wang; Bao-Yan Xu; Guo-Hong Deng

    2014-01-01

    BACKGROUND: Acute-on-chronic liver failure (ACLF) is a severe clinical syndrome that may cause a high mortality. However, the mechanism is still not clear. Characterization of the microRNA (miRNA) proifles in ACLF patients may provide new clues to the pathogenesis and management of this syndrome. METHODS: Genome-wide  microarray  was  performed  to compare the different miRNA expression proifles in peripheral blood mononuclear cells of a pair of monozygotic twins, an ACLF patient and an HBV asymptomatic carrier (AsC). The case-control miRNA proifles were compared and conifrmed by quantitative reverse transcription-polymerase chain reaction in 104 ACLF patients and 96 AsCs. A combined computational prediction algorithm was used to predict the potential target genes. RESULTS: Forty-ifve miRNAs were increased and eight miRNAs were decreased in the ACLF group. The expressions of hsa-let-7a and hsa-miR-16 were increased by 8.58- and 8.63-fold in ACLF patients compared with that in AsCs, respectively (P CONCLUSIONS: Our  results  showed  that  there  is  a  close relationship  between  speciifc  miRNAs  of  peripheral  blood mononuclear cells and ACLF. hsa-miR-16 and hsa-let-7a may contribute to the development of ACLF.

  16. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼ 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.

  17. The Chlamydia psittaci genome: a comparative analysis of intracellular pathogens.

    Directory of Open Access Journals (Sweden)

    Anja Voigt

    Full Text Available BACKGROUND: Chlamydiaceae are a family of obligate intracellular pathogens causing a wide range of diseases in animals and humans, and facing unique evolutionary constraints not encountered by free-living prokaryotes. To investigate genomic aspects of infection, virulence and host preference we have sequenced Chlamydia psittaci, the pathogenic agent of ornithosis. RESULTS: A comparison of the genome of the avian Chlamydia psittaci isolate 6BC with the genomes of other chlamydial species, C. trachomatis, C. muridarum, C. pneumoniae, C. abortus, C. felis and C. caviae, revealed a high level of sequence conservation and synteny across taxa, with the major exception of the human pathogen C. trachomatis. Important differences manifest in the polymorphic membrane protein family specific for the Chlamydiae and in the highly variable chlamydial plasticity zone. We identified a number of psittaci-specific polymorphic membrane proteins of the G family that may be related to differences in host-range and/or virulence as compared to closely related Chlamydiaceae. We calculated non-synonymous to synonymous substitution rate ratios for pairs of orthologous genes to identify putative targets of adaptive evolution and predicted type III secreted effector proteins. CONCLUSIONS: This study is the first detailed analysis of the Chlamydia psittaci genome sequence. It provides insights in the genome architecture of C. psittaci and proposes a number of novel candidate genes mostly of yet unknown function that may be important for pathogen-host interactions.

  18. Pan-genome analysis of Senegalese and Gambian strains of ...

    African Journals Online (AJOL)

    Mbaye

    2016-11-09

    Nov 9, 2016 ... this work, the pan-genome of B. anthracis was studied based on nine strains and using ... Humans can be infected by various routes: ingestion, inhalation ... associated with 96 other projects on an Applied Biosystems SOLiD.

  19. Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web.

    Science.gov (United States)

    Miller, Chase A; Anthony, Jon; Meyer, Michelle M; Marth, Gabor

    2013-02-01

    High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. Many current terabyte-size datasets generated by large public consortia projects, however, are already only feasibly stored at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping cost enables researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular application. Such specialized, yet lightweight, applications would empower scientists to better answer specific biological questions than possible with general-purpose genome browsers currently available. Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications. Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported.

  20. Characterization of the complete mitochondrial genome of the hybrid Epinephelus moara♀ × Epinephelus lanceolatus♂, and phylogenetic analysis in subfamily epinephelinae

    Science.gov (United States)

    Gao, Fengtao; Wei, Min; Zhu, Ying; Guo, Hua; Chen, Songlin; Yang, Guanpin

    2017-06-01

    This study presents the complete mitochondrial genome of the hybrid Epinephelus moara♀× Epinephelus lanceolatus♂. The genome is 16886 bp in length, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, a light-strand replication origin and a control region. Additionally, phylogenetic analysis based on the nucleotide sequences of 13 conserved protein-coding genes using the maximum likelihood method indicated that the mitochondrial genome is maternally inherited. This study presents genomic data for studying phylogenetic relationships and breeding of hybrid Epinephelinae.

  1. Integrated genome based studies of Shewanella ecophysiology

    Energy Technology Data Exchange (ETDEWEB)

    Saffarini, Daad A

    2013-03-07

    Progress is reported in these areas: Regulation of anaerobic respiration by cAMP receptor protein and role of adenylate cyclases; Identification of an octaheme c cytochrome as the terminal sulfite reductase in S. oneidensis MR-1; Identification and analysis of components of the electron transport chains that lead to reduction of thiosulfate, tetrathionate, and elemental sulfur in MR-1; Involvement of pili and flagella in metal reduction by S. oneidensis MR-1; and work suggesting that HemN1 is the major enzyme that is involved in heme biosynthesis under anaerobic conditions.

  2. The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows.

    Science.gov (United States)

    O'Connor, Brian D; Yuen, Denis; Chung, Vincent; Duncan, Andrew G; Liu, Xiang Kun; Patricia, Janice; Paten, Benedict; Stein, Lincoln; Ferretti, Vincent

    2017-01-01

    As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore ( https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).

  3. RadishBase: a database for genomics and genetics of radish.

    Science.gov (United States)

    Shen, Di; Sun, Honghe; Huang, Mingyun; Zheng, Yi; Li, Xixiang; Fei, Zhangjun

    2013-02-01

    Radish is an economically important vegetable crop. During the past several years, large-scale genomics and genetics resources have been accumulated for this species. To store, query, analyze and integrate these radish resources efficiently, we have developed RadishBase (http://bioinfo.bti.cornell.edu/radish), a genomics and genetics database of radish. Currently the database contains radish mitochondrial genome sequences, expressed sequence tag (EST) and unigene sequences and annotations, biochemical pathways, EST-derived single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers, and genetic maps. RadishBase is designed to enable users easily to retrieve and visualize biologically important information through a set of efficient query interfaces and analysis tools, including the BLAST search and unigene annotation query interfaces, and tools to classify unigenes functionally, to identify enriched gene ontology (GO) terms and to visualize genetic maps. A database containing radish pathways predicted from unigene sequences is also included in RadishBase. The tools and interfaces in RadishBase allow efficient mining of recently released and continually expanding large-scale radish genomics and genetics data sets, including the radish genome sequences and RNA-seq data sets.

  4. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Full Text Available Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV-related hepatocellular carcinomas (HCCs and their matched controls. Comparison of whole genome sequence (WGS and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3, and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome.

  5. Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

    Science.gov (United States)

    Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

    2016-11-01

    In this study, bioinformatic analysis towards genome sequence of E. aerogenes was done to determine gene encoded for gelatinase. Enterobacter aerogenes was isolated from hot spring water and gelatinase species-specific bacterium to porcine and fish gelatin. This bacterium offers the possibility of enzymes production which is specific to both species gelatine, respectively. Enterobacter aerogenes was partially genome sequenced resulting in 5.0 mega basepair (Mbp) total size of sequence. From pre-process pipeline, 87.6 Mbp of total reads, 68.8 Mbp of total high quality reads and 78.58 percent of high quality percentage was determined. Genome assembly produced 120 contigs with 67.5% of contigs over 1 kilo base pair (kbp), 124856 bp of N50 contig length and 55.17 % of GC base content percentage. About 4705 protein gene was identified from protein prediction analysis. Two candidate genes selected have highest similarity identity percentage against gelatinase enzyme available in Swiss-Prot and NCBI online database. They were NODE_9_length_26866_cov_148.013245_12 containing 1029 base pair (bp) sequence with 342 amino acid sequence and NODE_24_length_155103_cov_177.082458_62 which containing 717 bp sequence with 238 amino acid sequence, respectively. Thus, two paired of primers (forward and reverse) were designed, based on the open reading frame (ORF) of selected genes. Genome analysis of E. aerogenes resulting genes encoded gelatinase were identified.

  6. The Revolution in Viral Genomics as Exemplified by the Bioinformatic Analysis of Human Adenoviruses

    Directory of Open Access Journals (Sweden)

    Sarah Torres

    2010-06-01

    Full Text Available Over the past 30 years, genomic and bioinformatic analysis of human adenoviruses has been achieved using a variety of DNA sequencing methods; initially with the use of restriction enzymes and more currently with the use of the GS FLX pyrosequencing technology. Following the conception of DNA sequencing in the 1970s, analysis of adenoviruses has evolved from 100 base pair mRNA fragments to entire genomes. Comparative genomics of adenoviruses made its debut in 1984 when nucleotides and amino acids of coding sequences within the hexon genes of two human adenoviruses (HAdV, HAdV–C2 and HAdV–C5, were compared and analyzed. It was determined that there were three different zones (1-393, 394-1410, 1411-2910 within the hexon gene, of which HAdV–C2 and HAdV–C5 shared zones 1 and 3 with 95% and 89.5% nucleotide identity, respectively. In 1992, HAdV-C5 became the first adenovirus genome to be fully sequenced using the Sanger method. Over the next seven years, whole genome analysis and characterization was completed using bioinformatic tools such as blastn, tblastx, ClustalV and FASTA, in order to determine key proteins in species HAdV-A through HAdV-F. The bioinformatic revolution was initiated with the introduction of a novel species, HAdV-G, that was typed and named by the use of whole genome sequencing and phylogenetics as opposed to traditional serology. HAdV bioinformatics will continue to advance as the latest sequencing technology enables scientists to add to and expand the resource databases. As a result of these advancements, how novel HAdVs are typed has changed. Bioinformatic analysis has become the revolutionary tool that has significantly accelerated the in-depth study of HAdV microevolution through comparative genomics.

  7. The diversity of cyanobacterial metabolism: genome analysis of multiple phototrophic microorganisms

    Directory of Open Access Journals (Sweden)

    Beck Christian

    2012-02-01

    Full Text Available Abstract Background Cyanobacteria are among the most abundant organisms on Earth and represent one of the oldest and most widespread clades known in modern phylogenetics. As the only known prokaryotes capable of oxygenic photosynthesis, cyanobacteria are considered to be a promising resource for renewable fuels and natural products. Our efforts to harness the sun's energy using cyanobacteria would greatly benefit from an increased understanding of the genomic diversity across multiple cyanobacterial strains. In this respect, the advent of novel sequencing techniques and the availability of several cyanobacterial genomes offers new opportunities for understanding microbial diversity and metabolic organization and evolution in diverse environments. Results Here, we report a whole genome comparison of multiple phototrophic cyanobacteria. We describe genetic diversity found within cyanobacterial genomes, specifically with respect to metabolic functionality. Our results are based on pair-wise comparison of protein sequences and concomitant construction of clusters of likely ortholog genes. We differentiate between core, shared and unique genes and show that the majority of genes are associated with a single genome. In contrast, genes with metabolic function are strongly overrepresented within the core genome that is common to all considered strains. The analysis of metabolic diversity within core carbon metabolism reveals parts of the metabolic networks that are highly conserved, as well as highly fragmented pathways. Conclusions Our results have direct implications for resource allocation and further sequencing projects. It can be extrapolated that the number of newly identified genes still significantly increases with increasing number of new sequenced genomes. Furthermore, genome analysis of multiple phototrophic strains allows us to obtain a detailed picture of metabolic diversity that can serve as a starting point for biotechnological

  8. Genome size determination in peronosporales (Oomycota) by Feulgen image analysis.

    Science.gov (United States)

    Voglmayr, H; Greilhuber, J

    1998-12-01

    Genome size was determined, by nuclear Feulgen staining and image analysis, in 46 accessions of 31 species of Peronosporales (Oomycota), including important plant pathogens such as Bremia lactucae, Plasmopara viticola, Pseudoperonospora cubensis, and Pseudoperonospora humuli. The 1C DNA contents ranged from 0.046 (45. 6 Mb) to 0.163 pg (159.9 Mb). This is 0.041- to 0.144-fold that of Glycine max (soybean, 1C = 1.134 pg), which was used as an internal standard for genome size determination. The linearity of Feulgen absorbance photometry method over this range was demonstrated by calibration of Aspergillus species (1C = 31-38 Mb) against Glycine, which revealed differences of less than 6% compared to the published CHEF data. The low coefficients of variation (usually between 5 and 10%), repeatability of the results, and compatibility with CHEF data prove the resolution power of Feulgen image analysis. The applicability and limitations of Feulgen photometry are discussed in relation to other methods of genome size determination (CHEF gel electrophoresis, reassociation kinetics, genomic reconstruction) that have been previously applied to Oomycota. Copyright 1998 Academic Press.

  9. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  10. In silico comparative genomic analysis of GABAA receptor transcriptional regulation

    Directory of Open Access Journals (Sweden)

    Joyce Christopher J

    2007-06-01

    Full Text Available Abstract Background Subtypes of the GABAA receptor subunit exhibit diverse temporal and spatial expression patterns. In silico comparative analysis was used to predict transcriptional regulatory features in individual mammalian GABAA receptor subunit genes, and to identify potential transcriptional regulatory components involved in the coordinate regulation of the GABAA receptor gene clusters. Results Previously unreported putative promoters were identified for the β2, γ1, γ3, ε, θ and π subunit genes. Putative core elements and proximal transcriptional factors were identified within these predicted promoters, and within the experimentally determined promoters of other subunit genes. Conserved intergenic regions of sequence in the mammalian GABAA receptor gene cluster comprising the α1, β2, γ2 and α6 subunits were identified as potential long range transcriptional regulatory components involved in the coordinate regulation of these genes. A region of predicted DNase I hypersensitive sites within the cluster may contain transcriptional regulatory features coordinating gene expression. A novel model is proposed for the coordinate control of the gene cluster and parallel expression of the α1 and β2 subunits, based upon the selective action of putative Scaffold/Matrix Attachment Regions (S/MARs. Conclusion The putative regulatory features identified by genomic analysis of GABAA receptor genes were substantiated by cross-species comparative analysis and now require experimental verification. The proposed model for the coordinate regulation of genes in the cluster accounts for the head-to-head orientation and parallel expression of the α1 and β2 subunit genes, and for the disruption of transcription caused by insertion of a neomycin gene in the close vicinity of the α6 gene, which is proximal to a putative critical S/MAR.

  11. A structural genomics analysis of histidine kinase sensor domains

    Science.gov (United States)

    Cheung, Jonah

    2005-11-01

    Histidine kinase sensors are a part of a two-component system of protein signaling in prokaryotes and lower eukaryotes that relay an external environmental signal to an adaptive internal cellular response. Signal transduction occurs via phosphotransfer between a sensor protein and a response regulator which interact in tandem. The sensor is usually a transmembrane protein that contains a conserved cytoplasmic histidine kinase transmitter domain and a modular periplasmic sensor domain. The response regulator is cytoplasmic protein that contains a receiver domain that interacts with the histidine kinase, and an output domain that interacts with regulators of transcription or chemotaxis. My work focuses on the X-ray structure determination of a variety of bacterial sensor domains, based on a structural genomics analysis of the entire sensor domain family. Structures of the NarX, DcuS, LisK, and DctB sensor domains have been solved to atomic resolution, some in both ligand-bound and ligand-free states. Two distinct structural folds have been revealed---all-alpha helical and mixed alpha-beta. An analysis of the structures reveals a possible mechanism of transmembrane signaling in histidine kinase sensors as a sliding-piston motion between transmembrane helices. Although there is great diversity in ligand binding, there appears to be a small number of distinct sensor domain folds for which structural representatives of two have been solved. A final synthesis of the structural information with a comprehensive bio-informatics analysis of all histidine kinase sensor domain sequences allows fold prediction for over 400 sensor domains, in a step towards mapping the entire structural landscape of this protein family.

  12. Analysis of the genome content of Lactococcus garvieae by genomic interspecies microarray hybridization

    Directory of Open Access Journals (Sweden)

    Gibello Alicia

    2010-03-01

    Full Text Available Abstract Background Lactococcus garvieae is a bacterial pathogen that affects different animal species in addition to humans. Despite the widespread distribution and emerging clinical significance of L. garvieae in both veterinary and human medicine, there is almost a complete lack of knowledge about the genetic content of this microorganism. In the present study, the genomic content of L. garvieae CECT 4531 was analysed using bioinformatics tools and microarray-based comparative genomic hybridization (CGH experiments. Lactococcus lactis subsp. lactis IL1403 and Streptococcus pneumoniae TIGR4 were used as reference microorganisms. Results The combination and integration of in silico analyses and in vitro CGH experiments, performed in comparison with the reference microorganisms, allowed establishment of an inter-species hybridization framework with a detection threshold based on a sequence similarity of ≥ 70%. With this threshold value, 267 genes were identified as having an analogue in L. garvieae, most of which (n = 258 have been documented for the first time in this pathogen. Most of the genes are related to ribosomal, sugar metabolism or energy conversion systems. Some of the identified genes, such as als and mycA, could be involved in the pathogenesis of L. garvieae infections. Conclusions In this study, we identified 267 genes that were potentially present in L. garvieae CECT 4531. Some of the identified genes could be involved in the pathogenesis of L. garvieae infections. These results provide the first insight into the genome content of L. garvieae.

  13. Comparative Genome Analysis of Basidiomycete Fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor

    2012-03-19

    Fungi of the phylum Basidiomycota (basidiomycetes), make up some 37percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that the profile of an organism in lignocellulose-targeting genes can be used to predict its nutritional mode, and predict Dacryopinax sp. as a brown rot; Botryobasidium botryosum and Jaapia argillacea as white rots.

  14. Integrative Genomic Analysis of Complex traits

    DEFF Research Database (Denmark)

    Ehsani, Ali Reza

    In the last decade rapid development in biotechnologies has made it possible to extract extensive information about practically all levels of biological organization. An ever-increasing number of studies are reporting miltilayered datasets on the entire DNA sequence, transceroption, protein...... expression, and metabolite abundance of more and more populations in a multitude of invironments. However, a solid model for including all of this complex information in one analysis, to disentangle genetic variation and the underlying genetic architecture of complex traits and diseases, has not yet been...... proposed. This thesis introduced a novel way to integrate such huge data sets in an efficient and informative procedure to dissect the comæexity of obesity related traits (e.g. body wight, body fat, feed intake, etc) and map the flow from DNA through RNA ending with individual phenotypes....

  15. Computational and Genomic Analysis of Mycobacteriophage: A Longitudinal Study of Technology Engineered Biology Courses That Implemented an Inquiry Based Laboratory Practice Designed to Enhance, Encourage, and Empower Student Learning

    Science.gov (United States)

    Hollowell, Gail P.; Osler, James E.; Hester, April L.

    2015-01-01

    This paper provides an applied research rational for a longitudinal investigation that involved teaching a "Technology Engineered Science Education Course" via an Interactive Laboratory Based Genomics Curriculum. The Technology st Engineering [TE] methodology was first introduced at the SAPES: South Atlantic Philosophy of Education…

  16. Genome-wide analysis of TCP family in tobacco.

    Science.gov (United States)

    Chen, L; Chen, Y Q; Ding, A M; Chen, H; Xia, F; Wang, W F; Sun, Y H

    2016-05-23

    The TCP family is a transcription factor family, members of which are extensively involved in plant growth and development as well as in signal transduction in the response against many physiological and biochemical stimuli. In the present study, 61 TCP genes were identified in tobacco (Nicotiana tabacum) genome. Bioinformatic methods were employed for predicting and analyzing the gene structure, gene expression, phylogenetic analysis, and conserved domains of TCP proteins in tobacco. The 61 NtTCP genes were divided into three diverse groups, based on the division of TCP genes in tomato and Arabidopsis, and the results of the conserved domain and sequence analyses further confirmed the classification of the NtTCP genes. The expression pattern of NtTCP also demonstrated that majority of these genes play important roles in all the tissues, while some special genes exercise their functions only in specific tissues. In brief, the comprehensive and thorough study of the TCP family in other plants provides sufficient resources for studying the structure and functions of TCPs in tobacco.

  17. Comparative Analysis of Fatty Acid Desaturases in Cyanobacterial Genomes

    Directory of Open Access Journals (Sweden)

    Xiaoyuan Chi

    2008-01-01

    Full Text Available Fatty acid desaturases are enzymes that introduce double bonds into the hydrocarbon chains of fatty acids. The fatty acid desaturases from 37 cyanobacterial genomes were identified and classified based upon their conserved histidine-rich motifs and phylogenetic analysis, which help to determine the amounts and distributions of desaturases in cyanobacterial species. The filamentous or N2-fixing cyanobacteria usually possess more types of fatty acid desaturases than that of unicellular species. The pathway of acyl-lipid desaturation for unicellular marine cyanobacteria Synechococcus and Prochlorococcus differs from that of other cyanobacteria, indicating different phylogenetic histories of the two genera from other cyanobacteria isolated from freshwater, soil, or symbiont. Strain Gloeobacter violaceus PCC 7421 was isolated from calcareous rock and lacks thylakoid membranes. The types and amounts of desaturases of this strain are distinct to those of other cyanobacteria, reflecting the earliest divergence of it from the cyanobacterial line. Three thermophilic unicellular strains, Thermosynechococcus elongatus BP-1 and two Synechococcus Yellowstone species, lack highly unsaturated fatty acids in lipids and contain only one Δ9 desaturase in contrast with mesophilic strains, which is probably due to their thermic habitats. Thus, the amounts and types of fatty acid desaturases are various among different cyanobacterial species, which may result from the adaption to environments in evolution.

  18. Genomic analysis for managing small and endangered populations: A case study in Tyrol Grey cattle

    Directory of Open Access Journals (Sweden)

    Gábor eMészáros

    2015-05-01

    Full Text Available Analysis of genomic data is increasingly becoming part of the livestock industry. Therefore the routine collection of genomic information would be an invaluable resource for management of breeding programs in small, endangered populations. The objectives of this project were to analyse 1. linkage disequlibrium decay and the effective population size; 2. Inbreeding level and effective population size (NeROH based on runs of homozygosity (ROH; 3. Prediction of genomic breeding values (GEBV within and across breeds. In addition, the use of genomic information for breed management is discussed. The study was based on all available genotypes of Tyrol Grey AI bulls. ROHs were derived based on regions covering at least 4 Mb, 8 Mb and 16 Mb regions, with the corresponding mean inbreeding coefficients 4.0%, 2.9% and 1.6%, respectively. The NeROH was 125 (NeROH>16Mb, 186 (NeROH>8Mb and 370 (NeROH>4Mb, indicating strict avoidance of close inbreeding in the population.The genomic selection was developed for and is working well in large breeds. Contrary to the expectations, the accuracy of GEBVs with very small within breed reference populations were very high, between 0.13-0.91 and 0.12-0.63, when EBVs and dEBVs were used as pseudo-phenotypes, respectively. Subsequent analyses confirmed the high accuracies being heavily influenced by parent averages. Multi-breed and across breed reference sets gave inconsistent and lower accuracies. Genomic information may have a crucial role in management of small breeds. It allows to assess relatedness between individuals, trends in inbreeding and to take decisions accordingly. These decisions would be based on the real genome architecture, rather than conventional pedigree information, which can be missing or incomplete. We strongly suggest the routine genotyping of all individuals that belong to a small breed in order to facilitate the effective management of endangered livestock populations.

  19. Genomics-enabled analysis of the emergent disease cotton bacterial blight.

    Directory of Open Access Journals (Sweden)

    Anne Z Phillips

    2017-09-01

    Full Text Available Cotton bacterial blight (CBB, an important disease of (Gossypium hirsutum in the early 20th century, had been controlled by resistant germplasm for over half a century. Recently, CBB re-emerged as an agronomic problem in the United States. Here, we report analysis of cotton variety planting statistics that indicate a steady increase in the percentage of susceptible cotton varieties grown each year since 2009. Phylogenetic analysis revealed that strains from the current outbreak cluster with race 18 Xanthomonas citri pv. malvacearum (Xcm strains. Illumina based draft genomes were generated for thirteen Xcm isolates and analyzed along with 4 previously published Xcm genomes. These genomes encode 24 conserved and nine variable type three effectors. Strains in the race 18 clade contain 3 to 5 more effectors than other Xcm strains. SMRT sequencing of two geographically and temporally diverse strains of Xcm yielded circular chromosomes and accompanying plasmids. These genomes encode eight and thirteen distinct transcription activator-like effector genes. RNA-sequencing revealed 52 genes induced within two cotton cultivars by both tested Xcm strains. This gene list includes a homeologous pair of genes, with homology to the known susceptibility gene, MLO. In contrast, the two strains of Xcm induce different clade III SWEET sugar transporters. Subsequent genome wide analysis revealed patterns in the overall expression of homeologous gene pairs in cotton after inoculation by Xcm. These data reveal important insights into the Xcm-G. hirsutum disease complex and strategies for future development of resistant cultivars.

  20. Genomic analysis of mouse retinal development.

    Directory of Open Access Journals (Sweden)

    Seth Blackshaw

    2004-09-01

    Full Text Available The vertebrate retina is comprised of seven major cell types that are generated in overlapping but well-defined intervals. To identify genes that might regulate retinal development, gene expression in the developing retina was profiled at multiple time points using serial analysis of gene expression (SAGE. The expression patterns of 1,051 genes that showed developmentally dynamic expression by SAGE were investigated using in situ hybridization. A molecular atlas of gene expression in the developing and mature retina was thereby constructed, along with a taxonomic classification of developmental gene expression patterns. Genes were identified that label both temporal and spatial subsets of mitotic progenitor cells. For each developing and mature major retinal cell type, genes selectively expressed in that cell type were identified. The gene expression profiles of retinal Müller glia and mitotic progenitor cells were found to be highly similar, suggesting that Müller glia might serve to produce multiple retinal cell types under the right conditions. In addition, multiple transcripts that were evolutionarily conserved that did not appear to encode open reading frames of more than 100 amino acids in length ("noncoding RNAs" were found to be dynamically and specifically expressed in developing and mature retinal cell types. Finally, many photoreceptor-enriched genes that mapped to chromosomal intervals containing retinal disease genes were identified. These data serve as a starting point for functional investigations of the roles of these genes in retinal development and physiology.

  1. HANDS: a tool for genome-wide discovery of subgenome-specific base-identity in polyploids.

    KAUST Repository

    Mithani, Aziz

    2013-09-24

    The analysis of polyploid genomes is problematic because homeologous subgenome sequences are closely related. This relatedness makes it difficult to assign individual sequences to the specific subgenome from which they are derived, and hinders the development of polyploid whole genome assemblies.We here present a next-generation sequencing (NGS)-based approach for assignment of subgenome-specific base-identity at sites containing homeolog-specific polymorphisms (HSPs): \\'HSP base Assignment using NGS data through Diploid Similarity\\' (HANDS). We show that HANDS correctly predicts subgenome-specific base-identity at >90% of assayed HSPs in the hexaploid bread wheat (Triticum aestivum) transcriptome, thus providing a substantial increase in accuracy versus previous methods for homeolog-specific base assignment.We conclude that HANDS enables rapid and accurate genome-wide discovery of homeolog-specific base-identity, a capability having multiple applications in polyploid genomics.

  2. Genomic analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea.

    Directory of Open Access Journals (Sweden)

    Joelle Amselem

    2011-08-01

    Full Text Available Sclerotinia sclerotiorum and Botrytis cinerea are closely related necrotrophic plant pathogenic fungi notable for their wide host ranges and environmental persistence. These attributes have made these species models for understanding the complexity of necrotrophic, broad host-range pathogenicity. Despite their similarities, the two species differ in mating behaviour and the ability to produce asexual spores. We have sequenced the genomes of one strain of S. sclerotiorum and two strains of B. cinerea. The comparative analysis of these genomes relative to one another and to other sequenced fungal genomes is provided here. Their 38-39 Mb genomes include 11,860-14,270 predicted genes, which share 83% amino acid identity on average between the two species. We have mapped the S. sclerotiorum assembly to 16 chromosomes and found large-scale co-linearity with the B. cinerea genomes. Seven percent of the S. sclerotiorum genome comprises transposable elements compared to <1% of B. cinerea. The arsenal of genes associated with necrotrophic processes is similar between the species, including genes involved in plant cell wall degradation and oxalic acid production. Analysis of secondary metabolism gene clusters revealed an expansion in number and diversity of B. cinerea-specific secondary metabolites relative to S. sclerotiorum. The potential diversity in secondary metabolism might be involved in adaptation to specific ecological niches. Comparative genome analysis revealed the basis of differing sexual mating compatibility systems between S. sclerotiorum and B. cinerea. The organization of the mating-type loci differs, and their structures provide evidence for the evolution of heterothallism from homothallism. These data shed light on the evolutionary and mechanistic bases of the genetically complex traits of necrotrophic pathogenicity and sexual mating. This resource should facilitate the functional studies designed to better understand what makes these

  3. Genome reannotation of the lizard Anolis carolinensis based on 14 adult and embryonic deep transcriptomes

    Directory of Open Access Journals (Sweden)

    Eckalbar Walter L

    2013-01-01

    Full Text Available Abstract Background The green anole lizard, Anolis carolinensis, is a key species for both laboratory and field-based studies of evolutionary genetics, development, neurobiology, physiology, behavior, and ecology. As the first non-avian reptilian genome sequenced, A. carolinesis is also a prime reptilian model for comparison with other vertebrate genomes. The public databases of Ensembl and NCBI have provided a first generation gene annotation of the anole genome that relies primarily on sequence conservation with related species. A second generation annotation based on tissue-specific transcriptomes would provide a valuable resource for molecular studies. Results Here we provide an annotation of the A. carolinensis genome based on de novo assembly of deep transcriptomes of 14 adult and embryonic tissues. This revised annotation describes 59,373 transcripts, compared to 16,533 and 18,939 currently for Ensembl and NCBI, and 22,962 predicted protein-coding genes. A key improvement in this revised annotation is coverage of untranslated region (UTR sequences, with 79% and 59% of transcripts containing 5’ and 3’ UTRs, respectively. Gaps in genome sequence from the current A. carolinensis build (Anocar2.0 are highlighted by our identification of 16,542 unmapped transcripts, representing 6,695 orthologues, with less than 70% genomic coverage. Conclusions Incorporation of tissue-specific transcriptome sequence into the A. carolinensis genome annotation has markedly improved its utility for comparative and functional studies. Increased UTR coverage allows for more accurate predicted protein sequence and regulatory analysis. This revised annotation also provides an atlas of gene expression specific to adult and embryonic tissues.

  4. Comparative analysis of microsatellites in chloroplast genomes of lower and higher plants.

    Science.gov (United States)

    George, Biju; Bhatt, Bhavin S; Awasthi, Mayur; George, Binu; Singh, Achuit K

    2015-11-01

    Microsatellites, or simple sequence repeats (SSRs), contain repetitive DNA sequence where tandem repeats of one to six base pairs are present number of times. Chloroplast genome sequences have been  shown to possess extensive variations in the length, number and distribution of SSRs. However, a comparative analysis of chloroplast microsatellites is not available. Considering their potential importance in generating genomic diversity, we have systematically analysed the abundance and distribution of simple and compound microsatellites in 164 sequenced chloroplast genomes from wide range of plants. The key findings of these studies are (1) a large number of mononucleotide repeats as compared to SSR(2-6)(di-, tri-, tetra-, penta-, hexanucleotide repeats) are present in all chloroplast genomes investigated, (2) lower plants such as algae show wide variation in relative abundance, density and distribution of microsatellite repeats as compared to flowering plants, (3) longer SSRs are excluded from coding regions of most chloroplast genomes, (4) GC content has a weak influence on number, relative abundance and relative density of mononucleotide as well as SSR(2-6). However, GC content strongly showed negative correlation with relative density (R (2) = 0.5, P plants possesses relatively more genomic diversity compared to higher plants.

  5. Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome.

    Science.gov (United States)

    Aljohi, Hasan Awad; Liu, Wanfei; Lin, Qiang; Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O; Alawad, Abdullah O; Al-Sadi, Abdullah M; Hu, Songnian; Yu, Jun

    2016-01-01

    Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower as compared to that of the nuclear genome and neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provides the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.

  6. Complete Sequence and Analysis of Coconut Palm (Cocos nucifera) Mitochondrial Genome

    Science.gov (United States)

    Zhao, Yuhui; Zeng, Jingyao; Alamer, Ali; Alanazi, Ibrahim O.; Alawad, Abdullah O.; Al-Sadi, Abdullah M.; Hu, Songnian; Yu, Jun

    2016-01-01

    Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower as compared to that of the nuclear genome and neutral expectation. By combining public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provides the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants. PMID:27736909

  7. Genome Sequencing and Analysis of the Biomass-Degrading Fungus Trichoderma reesei (syn. Hypocrea jecorina)

    Energy Technology Data Exchange (ETDEWEB)

    Martinez, Antonio D.; Berka, Randy; Henrissat, Bernard; Saloheimo, Markku; Arvas, Mikko; Baker, Scott E.; Chapman, Jaro d; Chertkov, Olga; Coutinho, Pedro M.; Cullen, Dan; Danchin, Etienne G.; Grigoriev, Igor V.; Harris, Paul; Jackson, Melissa ?.; kubicek, Christian P.; Han, Cliff F.; Ho, Isaac; Larrando, Luis F.; Lopez de Leon, Alfredo; Magnuson, Jon K.; Merino, Sandy; Misra, Monica; Nelson, Beth; Putnam, Nicholas; Robbertse, Barbara; Salamov, Asaf; Schmoll, Monika; Terry, Astrid ?.; Thayer, Nina; Westerholm-Parvinen, Ann; Schoch, Conrad L.; Yao, Jian ?.; Barbote, Ravi; Nelson, Mary Anne; Detter, Chris J.; Bruce, David; Kuske, Cheryl; Xie, Gary; Richardson, P. M.; Rokhsar, Daniel S.; Lucas, Susan; Rubin, Eddie M.; Dunn-Coleman, Nigel; Ward, Michael ?.; Brettin, T.

    2008-05-01

    A major thrust of the white biotechnology movement involves the development of enzyme systems which depolymerize biomass to simple sugars which are subsequently converted to sustainable biofuels (e.g., ethanol) and chemical intermediates. The fungus Trichoderma reesei (syn. Hypocrea jecorina) represents a paradigm for the industrial production of highly efficient cellulases and hemicellulases needed for hydrolysis of biomass polysaccharides. Herein we describe intriguing attributes of the T. reeseigenome in relation to the future of fuel biotechnology. The T. reesei genome sequence was derived using a whole genome shotgun approach combined with finishing work to generate an assembly comprising 89 scaffolds totaling 34 Mbp with few gaps. In total, 9,130 gene models were predicted using a combination of ab initio and sequence similarity-based methods and EST data. Considering the industrial utility and effectiveness of its enzymes, the T. reesei genome surprisingly encodes the fewest cellulases and hemicellulases of any fungus having the ability to hydrolyze plant cell wall polysaccharides and whose genome has been sequenced. Many genes encoding carbohydrate active enzymes are distributed non-randomly in groups or clusters that interestingly lie between regions of synteny with other Sordariomycetes. Additionally, the T. reesei genome contains a multitude of genes encoding biosynthetic pathways for secondary metabolites (possible antibacterial and antifungal compounds) which may promote successful competition and survival in the crowded and competitive soil habitat occupied by T. reesei. Our analysis coupled with the availability of genome sequence data provides a roadmap for construction of enhanced T. reesei strains for industrial applications.

  8. GEGEINTOOL: A Computer-Based Tool for Automated Analysis of Gene-Gene Interactions in Large Epidemiological Studies in Cardiovascular Genomics

    Directory of Open Access Journals (Sweden)

    Oscar Coltell

    2013-06-01

    Full Text Available Current methods of data analysis of gene-gene interactions in complex diseases, after taking into account environmental factors using traditional approaches, are inefficient. High-throughput methods of analysis in large scale studies including thousands of subjects and hundreds of SNPs should be implemented. We developed an integrative computer tool, GEGEINTOOL (GEne- GEne INTeraction tOOL, for large-scale analysis of gene-gene interactions, in human studies of complex diseases including a large number of subjects, SNPs, as well as environmental factors. That resource uses standard statistical packages (SPSS, etc. to build and fit the gene-gene interaction models by means of syntax scripts in predicting one or more continuous or dichotomic phenotypes. Codominant, dominant and recessive genetic interaction models including control for covariates are automatically created for each SNP in order to test the best model. From the standard outputs, GEGEINTOOL extracts a selected set of parameters (regression coefficients, p-values, adjusted means, etc., and groups them in a single MS Excel Spreadsheet. The tool allows editing the set of filter parameters, filtering the selected results depending on p-values, as well as plotting the selected gene-gene interactions to check consistency. In conclusion, GEGEINTOOL is a useful and friendly tool for exploring and identifying gene-gene interactions in complex diseases.

  9. CRISPR/Cas9 Based Genome Editing of Penicillium chrysogenum.

    Science.gov (United States)

    Pohl, C; Kiel, J A K W; Driessen, A J M; Bovenberg, R A L; Nygård, Y

    2016-07-15

    CRISPR/Cas9 based systems have emerged as versatile platforms for precision genome editing in a wide range of organisms. Here we have developed powerful CRISPR/Cas9 tools for marker-based and marker-free genome modifications in Penicillium chrysogenum, a model filamentous fungus and industrially relevant cell factory. The developed CRISPR/Cas9 toolbox is highly flexible and allows editing of new targets with minimal cloning efforts. The Cas9 protein and the sgRNA can be either delivered during transformation, as preassembled CRISPR-Cas9 ribonucleoproteins (RNPs) or expressed from an AMA1 based plasmid within the cell. The direct delivery of the Cas9 protein with in vitro synthesized sgRNA to the cells allows for a transient method for genome engineering that may rapidly be applicable for other filamentous fungi. The expression of Cas9 from an AMA1 based vector was shown to be highly efficient for marker-free gene deletions.

  10. Integrated genomic analysis of breast cancers.

    Science.gov (United States)

    Addou-Klouche, L; Adélaïde, J; Cornen, S; Bekhouche, I; Finetti, P; Guille, A; Sircoulomb, F; Raynaud, S; Bertucci, F; Birnbaum, D; Chaffanet, M

    2012-12-01

    Breast cancer is the most frequent and the most deadly cancer in women in Western countries. Different classifications of disease (anatomoclinical, pathological, prognostic, genetic) are used for guiding the management of patients. Unfortunately, they fail to reflect the whole clinical heterogeneity of the disease. Consequently, molecularly distinct diseases are grouped in similar clinical classes, likely explaining the different clinical outcome between patients in a given class, and the fact that selection of the most appropriate diagnostic or therapeutic strategy for each patient is not done accurately. Today, treatment is efficient in only 70.0-75.0% of cases overall. Our repertoire of efficient drugs is limited but is being expanded with the discovery of new molecular targets for new drugs, based on the identification of candidate oncogenes and tumor suppressor genes (TSG) functionally relevant in disease. Development of new drugs makes therapeutical decisions even more demanding of reliable classifiers and prognostic/predictive tests. Breast cancer is a complex, heterogeneous disease at the molecular level. The combinatorial molecular origin and the heterogeneity of malignant cells, and the variability of the host background, create distinct subgroups of tumors endowed with different phenotypic features such as response to therapy and clinical outcome. Cellular and molecular analyses can identify new classes biologically and clinically relevant, as well as provide new clinically relevant markers and targets. The various stages of mammary tumorigenesis are not clearly defined and the genetic and epigenetic events critical to the development and aggressiveness of breast cancer are not precisely known. Because the phenotype of tumors is dependent on many genes, a large-scale and integrated molecular characterization of the genetic and epigenetic alterations and gene expression deregulation should allow the identification of new molecular classes clinically

  11. Pan-Genome Analysis of Human Gastric Pathogen H. pylori: Comparative Genomics and Pathogenomics Approaches to Identify Regions Associated with Pathogenicity and Prediction of Potential Core Therapeutic Targets

    Directory of Open Access Journals (Sweden)

    Amjad Ali

    2015-01-01

    Full Text Available Helicobacter pylori is a human gastric pathogen implicated as the major cause of peptic ulcer and second leading cause of gastric cancer (~70% around the world. Conversely, an increased resistance to antibiotics and hindrances in the development of vaccines against H. pylori are observed. Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan-genome approach; the predicted conserved gene families (1,193 constitute ~77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. Total of 28 nonhost homolog proteins were characterized as universal therapeutic targets against H. pylori based on their functional annotation and protein-protein interaction. Finally, pathogenomics and genome plasticity analysis revealed 3 highly conserved and 2 highly variable putative pathogenicity islands in all of the H. pylori genomes been analyzed.

  12. CRISPR-based screening of genomic island excision events in bacteria.

    Science.gov (United States)

    Selle, Kurt; Klaenhammer, Todd R; Barrangou, Rodolphe

    2015-06-30

    Genomic analysis of Streptococcus thermophilus revealed that mobile genetic elements (MGEs) likely contributed to gene acquisition and loss during evolutionary adaptation to milk. Clustered regularly interspaced short palindromic repeats-CRISPR-associated genes (CRISPR-Cas), the adaptive immune system in bacteria, limits genetic diversity by targeting MGEs including bacteriophages, transposons, and plasmids. CRISPR-Cas systems are widespread in streptococci, suggesting that the interplay between CRISPR-Cas systems and MGEs is one of the driving forces governing genome homeostasis in this genus. To investigate the genetic outcomes resulting from CRISPR-Cas targeting of integrated MGEs, in silico prediction revealed four genomic islands without essential genes in lengths from 8 to 102 kbp, totaling 7% of the genome. In this study, the endogenous CRISPR3 type II system was programmed to target the four islands independently through plasmid-based expression of engineered CRISPR arrays. Targeting lacZ within the largest 102-kbp genomic island was lethal to wild-type cells and resulted in a reduction of up to 2.5-log in the surviving population. Genotyping of Lac(-) survivors revealed variable deletion events between the flanking insertion-sequence elements, all resulting in elimination of the Lac-encoding island. Chimeric insertion sequence footprints were observed at the deletion junctions after targeting all of the four genomic islands, suggesting a common mechanism of deletion via recombination between flanking insertion sequences. These results established that self-targeting CRISPR-Cas systems may direct significant evolution of bacterial genomes on a population level, influencing genome homeostasis and remodeling.

  13. BioMet Toolbox: genome-wide analysis of metabolism

    OpenAIRE

    Cvijovic, M.; R. Olivares-Hernandez; Agren, R.; Dahr, N.; Vongsangnak, W.; Nookaew, I.; K. R. Patil; Nielsen, J.

    2010-01-01

    The rapid progress of molecular biology tools for directed genetic modifications, accurate quantitative experimental approaches, high-throughput measurements, together with development of genome sequencing has made the foundation for a new area of metabolic engineering that is driven by metabolic models. Systematic analysis of biological processes by means of modelling and simulations has made the identification of metabolic networks and prediction of metabolic capabilities under different co...

  14. Estimation of the Whitefly Bemisia tabaci Genome Size Based on k-mer and Flow Cytometric Analyses

    Directory of Open Access Journals (Sweden)

    Wenbo Chen

    2015-07-01

    Full Text Available Whiteflies of the Bemisia tabaci (Hemiptera: Aleyrodidae cryptic species complex are among the most important agricultural insect pests in the world. These phloem-feeding insects can colonize over 1000 species of plants worldwide and inflict severe economic losses to crops, mainly through the transmission of pathogenic viruses. Surprisingly, there is very little genomic information about whiteflies. As a starting point to genome sequencing, we report a new estimation of the genome size of the B. tabaci B biotype or Middle East-Asia Minor 1 (MEAM1 population. Using an isogenic whitefly colony with over 6500 haploid male individuals for genomic DNA, three paired-end genomic libraries with insert sizes of ~300 bp, 500 bp and 1 Kb were constructed and sequenced on an Illumina HiSeq 2500 system. A total of ~50 billion base pairs of sequences were obtained from each library. K-mer analysis using these sequences revealed that the genome size of the whitefly was ~682.3 Mb. In addition, the flow cytometric analysis estimated the haploid genome size of the whitefly to be ~690 Mb. Considering the congruency between both estimation methods, we predict the haploid genome size of B. tabaci MEAM1 to be ~680–690 Mb. Our data provide a baseline for ongoing efforts to assemble and annotate the B. tabaci genome.

  15. Pan-Genome Analysis of Brazilian Lineage A Amoebal Mimiviruses

    Directory of Open Access Journals (Sweden)

    Felipe L. Assis

    2015-06-01

    Full Text Available Since the recent discovery of Samba virus, the first representative of the family Mimiviridae from Brazil, prospecting for mimiviruses has been conducted in different environmental conditions in Brazil. Recently, we isolated using Acanthamoeba sp. three new mimiviruses, all of lineage A of amoebal mimiviruses: Kroon virus from urban lake water; Amazonia virus from the Brazilian Amazon river; and Oyster virus from farmed oysters. The aims of this work were to sequence and analyze the genome of these new Brazilian mimiviruses (mimi-BR and update the analysis of the Samba virus genome. The genomes of Samba virus, Amazonia virus and Oyster virus were 97%–99% similar, whereas Kroon virus had a low similarity (90%–91% with other mimi-BR. A total of 3877 proteins encoded by mimi-BR were grouped into 974 orthologous clusters. In addition, we identified three new ORFans in the Kroon virus genome. Additional work is needed to expand our knowledge of the diversity of mimiviruses from Brazil, including if and why among amoebal mimiviruses those of lineage A predominate in the Brazilian environment.

  16. Analysis of the core genome and pangenome of Pseudomonas putida.

    Science.gov (United States)

    Udaondo, Zulema; Molina, Lázaro; Segura, Ana; Duque, Estrella; Ramos, Juan L

    2016-10-01

    Pseudomonas putida are strict aerobes that proliferate in a range of temperate niches and are of interest for environmental applications due to their capacity to degrade pollutants and ability to promote plant growth. Furthermore solvent-tolerant strains are useful for biosynthesis of added-value chemicals. We present a comprehensive comparative analysis of nine strains and the first characterization of the Pseudomonas putida pangenome. The core genome of P. putida comprises approximately 3386 genes. The most abundant genes within the core genome are those that encode nutrient transporters. Other conserved genes include those for central carbon metabolism through the Entner-Doudoroff pathway, the pentose phosphate cycle, arginine and proline metabolism, and pathways for degradation of aromatic chemicals. Genes that encode transporters, enzymes and regulators for amino acid metabolism (synthesis and degradation) are all part of the core genome, as well as various electron transporters, which enable aerobic metabolism under different oxygen regimes. Within the core genome are 30 genes for flagella biosynthesis and 12 key genes for biofilm formation. Pseudomonas putida strains share 85% of the coding regions with Pseudomonas aeruginosa; however, in P. putida, virulence factors such as exotoxins and type III secretion systems are absent.

  17. A GeneTrek analysis of the maize genome.

    Science.gov (United States)

    Liu, Renyi; Vitte, Clémentine; Ma, Jianxin; Mahama, A Assibi; Dhliwayo, Thanda; Lee, Michael; Bennetzen, Jeffrey L

    2007-07-10

    Analysis of the sequences of 74 randomly selected BACs demonstrated that the maize nuclear genome contains approximately 37,000 candidate genes with homologues in other plant species. An additional approximately 5,500 predicted genes are severely truncated and probably pseudogenes. The distribution of genes is uneven, with approximately 30% of BACs containing no genes. BAC gene density varies from 0 to 7.9 per 100 kb, whereas most gene islands contain only one gene. The average number of genes per gene island is 1.7. Only 72% of these genes show collinearity with the rice genome. Particular LTR retrotransposon families (e.g., Gyma) are enriched on gene-free BACs, most of which do not come from pericentromeres or other large heterochromatic regions. Gene-containing BACs are relatively enriched in different families of LTR retrotransposons (e.g., Ji). Two major bursts of LTR retrotransposon activity in the last 2 million years are responsible for the large size of the maize genome, but only the more recent of these is well represented in gene-containing BACs, suggesting that LTR retrotransposons are more efficiently removed in these domains. The results demonstrate that sample sequencing and careful annotation of a few randomly selected BACs can provide a robust description of a complex plant genome.

  18. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Full Text Available Abstract Background The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic level, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP, as it has been extensively inbred. Its high level of homozygosity offers increased ease for selective breeding for specific traits and a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbreeding WZSP genome. Results Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.

  19. Genome-association analysis of Korean Holstein milk traits using genomic estimated breeding value

    Directory of Open Access Journals (Sweden)

    Donghyun Shin

    2017-03-01

    Full Text Available Objective Holsteins are known as the world’s highest-milk producing dairy cattle. The purpose of this study was to identify genetic regions strongly associated with milk traits (milk production, fat, and protein using Korean Holstein data. Methods This study was performed using single nucleotide polymorphism (SNP chip data (Illumina BovineSNP50 Beadchip of 911 Korean Holstein individuals. We inferred each genomic estimated breeding values based on best linear unbiased prediction (BLUP and ridge regression using BLUPF90 and R. We then performed a genome-wide association study and identified genetic regions related to milk traits. Results We identified 9, 6, and 17 significant genetic regions related to milk production, fat and protein, respectively. These genes are newly reported in the genetic association with milk traits of Holstein. Conclusion This study complements a recent Holstein genome-wide association studies that identified other SNPs and genes as the most significant variants. These results will help to expand the knowledge of the polygenic nature of milk production in Holsteins.

  20. Identification and phenotypic characterization of a second collagen adhesin, Scm, and genome-based identification and analysis of 13 other predicted MSCRAMMs, including four distinct pilus loci, in Enterococcus faecium.

    Science.gov (United States)

    Sillanpää, Jouko; Nallapareddy, Sreedhar R; Prakash, Vittal P; Qin, Xiang; Höök, Magnus; Weinstock, George M; Murray, Barbara E

    2008-10-01

    Attention has recently been drawn to Enterococcus faecium because of an increasing number of nosocomial infections caused by this species and its resistance to multiple antibacterial agents. However, relatively little is known about the pathogenic determinants of this organism. We have previously identified a cell-wall-anchored collagen adhesin, Acm, produced by some isolates of E. faecium, and a secreted antigen, SagA, exhibiting broad-spectrum binding to extracellular matrix proteins. Here, we analysed the draft genome of strain TX0016 for potential microbial surface components recognizing adhesive matrix molecules (MSCRAMMs). Genome-based bioinformatics identified 22 predicted cell-wall-anchored E. faecium surface proteins (Fms), of which 15 (including Acm) had characteristics typical of MSCRAMMs, including predicted folding into a modular architecture with multiple immunoglobulin-like domains. Functional characterization of one [Fms10; redesignated second collagen adhesin of E. faecium (Scm)] revealed that recombinant Scm(65) (A- and B-domains) and Scm(36) (A-domain) bound to collagen type V efficiently in a concentration-dependent manner, bound considerably less to collagen type I and fibrinogen, and differed from Acm in their binding specificities to collagen types IV and V. Results from far-UV circular dichroism measurements of recombinant Scm(36) and of Acm(37) indicated that these proteins were rich in beta-sheets, supporting our folding predictions. Whole-cell ELISA and FACS analyses unambiguously demonstrated surface expression of Scm in most E. faecium isolates. Strikingly, 11 of the 15 predicted MSCRAMMs clustered in four loci, each with a class C sortase gene; nine of these showed similarity to Enterococcus faecalis Ebp pilus subunits and also contained motifs essential for pilus assembly. Antibodies against one of the predicted major pilus proteins, Fms9 (redesignated EbpC(fm)), detected a 'ladder' pattern of high-molecular-mass protein bands in a

  1. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii genome.

    Directory of Open Access Journals (Sweden)

    Byrappa Venkatesh

    2007-04-01

    Full Text Available Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes are higher than that between human and teleost fish genomes. Elephant shark contains putative four Hox clusters indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.

  2. Survey Sequencing and Comparative Analysis of the Elephant Shark (Callorhinchus milii) Genome

    Science.gov (United States)

    Venkatesh, Byrappa; Kirkness, Ewen F; Loh, Yong-Hwee; Halpern, Aaron L; Lee, Alison P; Johnson, Justin; Dandona, Nidhi; Viswanathan, Lakshmi D; Tay, Alice; Venter, J. Craig; Strausberg, Robert L; Brenner, Sydney

    2007-01-01

    Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4× coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element–like and long interspersed element–like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes are higher than that between human and teleost fish genomes. Elephant shark contains putative four Hox clusters indicating that, unlike teleost fish genomes, the elephant shark genome has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes. PMID:17407382

  3. Sonication-based isolation and enrichment of Chlorella protothecoides chloroplasts for illumina genome sequencing

    Energy Technology Data Exchange (ETDEWEB)

    Angelova, Angelina [University of Arizona; Park, Sang-Hycuk [University of Arizona; Kyndt, John [Bellevue University; Fitzsimmons, Kevin [University of Arizona; Brown, Judith K [University of Arizona

    2013-09-01

    With the increasing world demand for biofuel, a number of oleaginous algal species are being considered as renewable sources of oil. Chlorella protothecoides Krüger synthesizes triacylglycerols (TAGs) as storage compounds that can be converted into renewable fuel utilizing an anabolic pathway that is poorly understood. The paucity of algal chloroplast genome sequences has been an important constraint to chloroplast transformation and for studying gene expression in TAGs pathways. In this study, the intact chloroplasts were released from algal cells using sonication followed by sucrose gradient centrifugation, resulting in a 2.36-fold enrichment of chloroplasts from C. protothecoides, based on qPCR analysis. The C. protothecoides chloroplast genome (cpDNA) was determined using the Illumina HiSeq 2000 sequencing platform and found to be 84,576 K