WorldWideScience

Sample records for genomics project predict

  1. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  2. Genomic Prediction in Barley

    DEFF Research Database (Denmark)

    Edriss, Vahid; Cericola, Fabio; Jensen, Jens D;

    Genomic prediction uses markers (SNPs) across the whole genome to predict individual breeding values at an early growth stage potentially before large scale phenotyping. One of the applications of genomic prediction in plant breeding is to identify the best individual candidate lines to contribut...

  3. Genomic Prediction in Barley

    DEFF Research Database (Denmark)

    Edriss, Vahid; Cericola, Fabio; Jensen, Jens D;

    2015-01-01

    Genomic prediction uses markers (SNPs) across the whole genome to predict individual breeding values at an early growth stage potentially before large scale phenotyping. One of the applications of genomic prediction in plant breeding is to identify the best individual candidate lines to contribut...

  4. Genomic Prediction in Barley

    DEFF Research Database (Denmark)

    Edriss, Vahid; Cericola, Fabio; Jensen, Jens D

    2015-01-01

    Genomic prediction uses markers (SNPs) across the whole genome to predict individual breeding values at an early growth stage, potentially before large-scale phenotyping. One of the applications of genomic prediction in plant breeding is to identify the best individual candidate lines to contribute to the next generation. The main goal of this study was to assess the potential of using genomic prediction in a commercial barley breeding program. The data used in this study came from the Nordic Seed company, which is located in Denmark. Around 350 advanced lines were genotyped with the 9K Barley chip from Illumina...

  5. Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Block, S. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Cornwall, J. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Dally, W. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Dyson, F. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Fortson, N. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Joyce, G. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Kimble, H. J. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Lewis, N. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Max, C. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Prince, T. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Schwitters, R. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Weinberger, P. [The MITRE Corporation, McLean, VA (US). JASON Program Office]; Woodin, W. H. [The MITRE Corporation, McLean, VA (US). JASON Program Office]

    1998-01-04

    The study reviews Department of Energy supported aspects of the United States Human Genome Project, the joint National Institutes of Health/Department of Energy program to characterize all human genetic material, to discover the set of human genes, and to render them accessible for further biological study. The study concentrates on issues of technology, quality assurance/control, and informatics relevant to current effort on the genome project and needs beyond it. Recommendations are presented on areas of the genome program that are of particular interest to and supported by the Department of Energy.

  6. The Materials Genome Project

    Science.gov (United States)

    Aourag, H.

    2008-09-01

    In the past, the search for new and improved materials was characterized mostly by the use of empirical, trial-and-error methods. This picture of materials science has been changing as the knowledge and understanding of fundamental processes governing a material's properties and performance (namely, composition, structure, history, and environment) have increased. In a number of cases, it is now possible to predict a material's properties before it has even been manufactured, thus greatly reducing the time spent on testing and development. The objective of modern materials science is to tailor a material (starting with its chemical composition, constituent phases, and microstructure) in order to obtain a desired set of properties suitable for a given application. In the short term, the traditional "empirical" methods for developing new materials will be complemented to a greater degree by theoretical predictions. In some areas, computer simulation is already used by industry to weed out costly or improbable synthesis routes. Can novel materials with optimized properties be designed by computers? Advances in modelling methods at the atomic level coupled with rapid increases in computer capabilities over the last decade have led scientists to answer this question with a resounding "yes". The ability to design new materials from quantum mechanical principles with computers is currently one of the fastest-growing and most exciting areas of theoretical research in the world. The methods allow scientists to evaluate and prescreen new materials "in silico", rather than through time-consuming experimentation. The Materials Genome Project aims to pursue the theory of large-scale modeling as well as powerful methods to construct new materials with optimized properties.
Indeed, it is the intimate synergy between our ability to predict accurately from quantum theory how atoms can be assembled to form new materials and our capacity to synthesize novel materials atom

  7. The Chlamydomonas genome project: a decade on

    Science.gov (United States)

    Blaby, Ian K.; Blaby-Haas, Crysten; Tourasse, Nicolas; Hom, Erik F. Y.; Lopez, David; Aksoy, Munevver; Grossman, Arthur; Umen, James; Dutcher, Susan; Porter, Mary; King, Stephen; Witman, George; Stanke, Mario; Harris, Elizabeth H.; Goodstein, David; Grimwood, Jane; Schmutz, Jeremy; Vallon, Olivier; Merchant, Sabeeha S.; Prochnik, Simon

    2014-01-01

    The green alga Chlamydomonas reinhardtii is a popular unicellular organism for studying photosynthesis, cilia biogenesis and micronutrient homeostasis. Ten years since its genome project was initiated, an iterative process of improvements to the genome and gene predictions has propelled this organism to the forefront of the “omics” era. Housed at Phytozome, the Joint Genome Institute’s (JGI) plant genomics portal, the most up-to-date genomic data include a genome arranged on chromosomes and high-quality gene models with alternative splice forms supported by an abundance of RNA-Seq data. Here, we present the past, present and future of Chlamydomonas genomics. Specifically, we detail progress on genome assembly and gene model refinement, discuss resources for gene annotations, functional predictions and locus ID mapping between versions and, importantly, outline a standardized framework for naming genes. PMID:24950814

  8. GIPSy: Genomic island prediction software.

    Science.gov (United States)

    Soares, Siomar C; Geyik, Hakan; Ramos, Rommel T J; de Sá, Pablo H C G; Barbosa, Eudes G V; Baumbach, Jan; Figueiredo, Henrique C P; Miyoshi, Anderson; Tauch, Andreas; Silva, Artur; Azevedo, Vasco

    2016-08-20

    Bacteria are highly diverse organisms that are able to adapt to a broad range of environments and hosts due to their high genomic plasticity. Horizontal gene transfer plays a pivotal role in this genome plasticity and in evolution by leaps through the incorporation of large blocks of genome sequences, ordinarily known as genomic islands (GEIs). GEIs may harbor genes encoding virulence, metabolism, antibiotic resistance and symbiosis-related functions, namely pathogenicity islands (PAIs), metabolic islands (MIs), resistance islands (RIs) and symbiotic islands (SIs). Although many software tools for the prediction of GEIs exist, they focus only on PAI prediction and present other limitations, such as complicated installation and inconvenient user interfaces. Here, we present GIPSy, the genomic island prediction software, a standalone and user-friendly tool for the prediction of GEIs, built on our previously developed pathogenicity island prediction software (PIPS). We also present four application cases in which we crosslink data from literature to PAIs, MIs, RIs and SIs predicted by GIPSy. Briefly, GIPSy correctly predicted the following previously described GEIs: 13 PAIs larger than 30kb in Escherichia coli CFT073; 1 MI for Burkholderia pseudomallei K96243, which seems to be a miscellaneous island; 1 RI of Acinetobacter baumannii AYE, named AbaR1; and, 1 SI of Mesorhizobium loti MAFF303099 presenting a mosaic structure. GIPSy is the first life-style-specific genomic island prediction software to perform analyses of PAIs, MIs, RIs and SIs, opening a door for a better understanding of bacterial genome plasticity and the adaptation to new traits.

  9. Malaria Genome Sequencing Project

    Science.gov (United States)

    2004-01-01

    ...million cases and up to 2.7 million deaths from malaria each year. The mortality levels are greatest in sub-Saharan Africa... A whole chromosome shotgun sequencing strategy was used to determine the genome sequence of P. falciparum clone 3D7. This... aminolevulinic acid dehydratase. Curr. Genet. 40, 391-398 (2002). 15. Lasonder, E. et al. Analysis of the Plasmodium falciparum proteome by high-accuracy mass

  10. The Human Genome Diversity Project

    Energy Technology Data Exchange (ETDEWEB)

    Cavalli-Sforza, L. [Stanford Univ., CA (United States)]

    1994-12-31

    The Human Genome Diversity Project (HGD Project) is an international anthropology project that seeks to study the genetic richness of the entire human species. This kind of genetic information can add a unique thread to the tapestry of knowledge of humanity. Culture, environment, history, and other factors are often more important, but humanity's genetic heritage, when analyzed with recent technology, brings another type of evidence for understanding the species' past and present. The Project will deepen the understanding of this genetic richness and show both humanity's diversity and its deep and underlying unity. The HGD Project is still largely in its planning stages, seeking the best ways to reach its goals. The continuing discussions of the Project, throughout the world, should improve the plans for the Project and their implementation. The Project is as global as humanity itself; its implementation will require the kinds of partnerships among different nations and cultures that make the involvement of UNESCO and other international organizations particularly appropriate. The author will briefly discuss the Project's history, describe the Project, set out the core principles of the Project, and demonstrate how the Project will help combat the scourge of racism.

  11. SIFT missense predictions for genomes.

    Science.gov (United States)

    Vaser, Robert; Adusumalli, Swarnaseetha; Leng, Sim Ngak; Sikic, Mile; Ng, Pauline C

    2016-01-01

    The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variations by predicting whether an amino acid substitution is deleterious. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use has been previously published with Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), which is a faster version of SIFT that enables practical computations on reference genomes. Users can get predictions for single-nucleotide variants from their organism of interest using the SIFT 4G annotator with SIFT 4G's precomputed databases. The scope of genomic predictions is expanded, with predictions available for more than 200 organisms. Users can also run the SIFT 4G algorithm themselves. SIFT predictions can be retrieved for 6.7 million variants in 4 min once the database has been downloaded. If precomputed predictions are not available, the SIFT 4G algorithm can compute predictions at a rate of 2.6 s per protein sequence. SIFT 4G is available from http://sift-dna.org/sift4g.
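
    The throughput figures quoted in this abstract can be turned into rough planning numbers. The sketch below is only arithmetic on the two stated rates (6.7 million variants in 4 min for database lookup, 2.6 s per protein for de novo computation); the function names and the 20,000-protein example proteome are hypothetical and do not come from the record.

```python
def lookup_rate(variants=6.7e6, minutes=4.0):
    """Variants annotated per second from a precomputed database."""
    return variants / (minutes * 60.0)


def compute_hours(n_proteins, seconds_per_protein=2.6):
    """Hours to compute predictions from scratch on a single worker."""
    return n_proteins * seconds_per_protein / 3600.0


if __name__ == "__main__":
    # Hypothetical proteome size, chosen only for illustration.
    print(f"lookup: {lookup_rate():.0f} variants/s")
    print(f"de novo: {compute_hours(20000):.1f} h for 20,000 proteins")
```

    Under these assumptions, database lookup is the practical route whenever a precomputed database exists, and running the algorithm directly is a matter of hours rather than minutes for a whole proteome.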

  12. All about the Human Genome Project (HGP)

    Science.gov (United States)

    ... Genome Resources Access to the full human sequence All About The Human Genome Project (HGP) The Human ... an international research effort to sequence and map all of the genes - together known as the genome - ...

  13. Efficient marker data utilization in genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid

    Genomic prediction is a novel method to recognize the best animals for breeding. The aim of this PhD is to improve the accuracy of genomic prediction in dairy cattle by efficiently utilizing marker data. The thesis focuses on three aspects for improving the genomic prediction, which are: criteria...

  14. Parasite Genome Projects and the Trypanosoma cruzi Genome Initiative

    Directory of Open Access Journals (Sweden)

    Wim Degrave

    1997-11-01

    Since the start of the human genome project, a great number of genome projects on other "model" organisms have been initiated, some of them already completed. Several initiatives have also been started on parasite genomes, mainly through support from WHO/TDR, involving North-South and South-South collaborations, and there are great hopes that these initiatives will lead to new tools for disease control and prevention, as well as to the establishment of genomic research technology in developing countries. The Trypanosoma cruzi genome project, using the clone CL-Brener as starting point, has made considerable progress through the concerted action of more than 20 laboratories, most of them in the South. A brief overview of the current state of the project is given.

  15. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using eithe...

  16. Imaging genomic mapping of an invasive MRI phenotype predicts patient outcome and metabolic dysfunction: a TCGA glioma phenotype research group project.

    Science.gov (United States)

    Colen, Rivka R; Vangel, Mark; Wang, Jixin; Gutman, David A; Hwang, Scott N; Wintermark, Max; Jain, Rajan; Jilwan-Nicolas, Manal; Chen, James Y; Raghavan, Prashant; Holder, Chad A; Rubin, Daniel; Huang, Eric; Kirby, Justin; Freymann, John; Jaffe, Carl C; Flanders, Adam; Zinn, Pascal O

    2014-06-02

    Invasion of tumor cells into adjacent brain parenchyma is a major cause of treatment failure in glioblastoma. Furthermore, invasive tumors are shown to have a different genomic composition and metabolic abnormalities that allow for a more aggressive GBM phenotype and resistance to therapy. We thus seek to identify those genomic abnormalities associated with a highly aggressive and invasive GBM imaging-phenotype. We retrospectively identified 104 treatment-naïve glioblastoma patients from The Cancer Genome Atlas (TCGA) who had gene expression profiles and corresponding MR imaging available in The Cancer Imaging Archive (TCIA). The standardized VASARI feature-set criteria were used for the qualitative visual assessments of invasion. Patients were assigned to classes based on the presence (Class A) or absence (Class B) of statistically significant invasion parameters to create an invasive imaging signature; imaging genomic analysis was subsequently performed using the GenePattern Comparative Marker Selection module (Broad Institute). Our results show that patients with a combination of deep white matter tract and ependymal invasion (Class A) on imaging had a significant decrease in overall survival as compared to patients without such invasive imaging features (Class B) (8.7 versus 18.6 months, p < 0.001). Mitochondrial dysfunction was the top canonical pathway associated with the Class A gene expression signature. The MYC oncogene was predicted to be the top activation regulator in Class A. We demonstrate that MRI biomarker signatures can identify distinct GBM phenotypes associated with highly significant survival differences and specific molecular pathways. This study identifies mitochondrial dysfunction as the top canonical pathway in a very aggressive GBM phenotype. Thus, imaging-genomic analyses may prove invaluable in detecting novel targetable genomic pathways.

  17. The life cycle of a genome project: perspectives and guidelines inspired by insect genome projects.

    Science.gov (United States)

    Papanicolaou, Alexie

    2016-01-01

    Many research programs on non-model species biology have been empowered by genomics. In turn, genomics is underpinned by a reference sequence and ancillary information created by so-called "genome projects". The most reliable genome projects are the ones created as part of an active research program and designed to address specific questions but their life extends past publication. In this opinion paper I outline four key insights that have facilitated maintaining genomic communities: the key role of computational capability, the iterative process of building genomic resources, the value of community participation and the importance of manual curation. Taken together, these ideas can and do ensure the longevity of genome projects and the growing non-model species community can use them to focus a discussion with regards to its future genomic infrastructure.

  18. Accounting for discovery bias in genomic prediction

    Science.gov (United States)

    Our objective was to evaluate an approach to mitigating discovery bias in genomic prediction. Accuracy may be improved by placing greater emphasis on regions of the genome expected to be more influential on a trait. Methods emphasizing regions result in a phenomenon known as “discovery bias” if info...

  19. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project

    Data.gov (United States)

    U.S. Environmental Protection Agency — Data from a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) demonstrating using predictive computational...

  20. Genomics:GTL project quarterly report April 2005.

    Energy Technology Data Exchange (ETDEWEB)

    Rintoul, Mark Daniel; Martino, Anthony A.; Palenik, Brian; Heffelfinger, Grant S.; Xu, Ying; Geist, Al; Gorin, Andrey

    2005-11-01

    This SAND report provides the technical progress through April 2005 of the Sandia-led project, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling", funded by the DOE Office of Science Genomics:GTL Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacterium known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and include significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. In

  1. Cancer Genome Anatomy Project | Office of Cancer Genomics

    Science.gov (United States)

    The National Cancer Institute (NCI) Cancer Genome Anatomy Project (CGAP) is an online resource designed to provide the research community access to biological tissue characterization data. Request a free copy of the CGAP Website Virtual Tour CD from ocg@mail.nih.gov.

  2. [Human genomic project and human genomic haplotype map project: opportunity, challenge and strategy in stomatology].

    Science.gov (United States)

    Wu, Rui-qing; Zeng, Xin; Wang, Zhi

    2010-08-01

    The human genomic project and the international HapMap project were designed to create a genome-wide database of patterns of human genetic variation, with the expectation that these patterns would be useful for genetic association studies of common diseases, thus leading to molecular diagnosis and personalized therapy. The article briefly reviews the creation, targets and achievements of those two projects. Furthermore, the authors give four suggestions for facing the opportunities and challenges brought by the two projects: improving the training of top talent, fostering interdisciplinary collaboration, strengthening the construction of research bases and initiating national key scientific projects.

  3. Genomic Prediction of Barley Hybrid Performance

    Directory of Open Access Journals (Sweden)

    Norman Philipp

    2016-07-01

    Hybrid breeding in barley (Hordeum vulgare L.) offers great opportunities to accelerate the rate of genetic improvement and to boost yield stability. A crucial requirement consists of the efficient selection of superior hybrid combinations. We used comprehensive phenotypic and genomic data from a commercial breeding program with the goal of examining the potential to predict the hybrid performances. The phenotypic data comprised replicated grain yield trials for 385 two-way and 408 three-way hybrids evaluated in up to 47 environments. The parental lines were genotyped using a 3k single nucleotide polymorphism (SNP) array based on an Illumina Infinium assay. We implemented ridge regression best linear unbiased prediction modeling for additive and dominance effects and evaluated the prediction ability using five-fold cross validations. The prediction ability of hybrid performances based on general combining ability (GCA) effects was moderate, amounting to 0.56 and 0.48 for two- and three-way hybrids, respectively. The potential of GCA-based hybrid prediction requires that both parental components have been evaluated in a hybrid background. This is not necessary for genomic prediction, for which we also observed moderate cross-validated prediction abilities of 0.51 and 0.58 for two- and three-way hybrids, respectively. This exemplifies the potential of genomic prediction in hybrid barley. Interestingly, prediction ability using the two-way hybrids as training population and the three-way hybrids as test population or vice versa was low, presumably because of the different genetic makeup of the parental source populations. Consequently, further research is needed to optimize genomic prediction approaches combining different source populations in barley.
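
    The evaluation scheme named in this abstract (ridge regression prediction assessed by five-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it models additive marker effects only (no dominance term), assumes centered phenotypes with no intercept, treats the shrinkage parameter `lam` as a fixed hypothetical value instead of estimating it from variance components, and runs on simulated markers coded -1/0/1. Prediction ability is taken here as the Pearson correlation between observed and cross-validated predicted values.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_fit(z, y, lam):
    """Marker effects u from (Z'Z + lam*I) u = Z'y (additive effects only)."""
    p = len(z[0])
    ztz = [[sum(row[i] * row[j] for row in z) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    zty = [sum(row[i] * yi for row, yi in zip(z, y)) for i in range(p)]
    return solve_linear(ztz, zty)


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def cv_prediction_ability(z, y, lam, k=5, seed=0):
    """Correlation of observed values with k-fold cross-validated predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # disjoint folds covering all lines
    predicted = [0.0] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        u = ridge_fit([z[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            predicted[i] = sum(g * e for g, e in zip(z[i], u))
    return pearson(y, predicted)


if __name__ == "__main__":
    # Simulated data: 100 lines, 20 markers, additive effects plus noise.
    rng = random.Random(42)
    z = [[rng.choice([-1, 0, 1]) for _ in range(20)] for _ in range(100)]
    effects = [rng.gauss(0, 1) for _ in range(20)]
    y = [sum(g * e for g, e in zip(row, effects)) + rng.gauss(0, 2) for row in z]
    print(f"prediction ability: {cv_prediction_ability(z, y, lam=5.0):.2f}")
```

    Each line is predicted only by a model that never saw its phenotype, so the pooled correlation is a held-out measure. The low cross-population ability reported in the abstract corresponds to replacing the random folds with a split by population.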

  4. The Human Genome Project and Biology Education.

    Science.gov (United States)

    McInerney, Joseph D.

    1996-01-01

    Highlights the importance of the Human Genome Project in educating the public about genetics. Discusses four challenges that science educators must address: teaching for conceptual understanding, the nature of science, the personal and social impact of science and technology, and the principles of technology. Contains 45 references. (JRH)

  5. Justice and the Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Murphy, T.F.; Lappe, M. [eds.]

    1992-12-31

    Most of the essays gathered in this volume were first presented at a conference, Justice and the Human Genome, in Chicago in early November, 1991. The goal of the conference was to consider questions of justice as they are and will be raised by the Human Genome Project. To achieve its goal of identifying and elucidating the challenges of justice inherent in genomic research and its social applications, the conference drew together in one forum members from academia, medicine, and industry with interests as divergent as rate-setting for insurance, the care of newborns, and the history of ethics. The essays in this volume address a number of theoretical and practical concerns relative to the meaning of genomic research.

  6. Justice and the Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Murphy, T.F.; Lappe, M. (eds.)

    1992-01-01

    Most of the essays gathered in this volume were first presented at a conference, Justice and the Human Genome, in Chicago in early November, 1991. The goal of the conference was to consider questions of justice as they are and will be raised by the Human Genome Project. To achieve its goal of identifying and elucidating the challenges of justice inherent in genomic research and its social applications, the conference drew together in one forum members from academia, medicine, and industry with interests as divergent as rate-setting for insurance, the care of newborns, and the history of ethics. The essays in this volume address a number of theoretical and practical concerns relative to the meaning of genomic research.

  7. Implications of the Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Kitcher, P.

    1998-11-01

    The Human Genome Project (HGP), launched in 1991, aims to map and sequence the human genome by 2006. During the fifteen-year life of the project, it is projected that $3 billion in federal funds will be allocated to it. The ultimate aims of spending this money are to analyze the structure of human DNA, to identify all human genes, to recognize the functions of those genes, and to prepare for the biology and medicine of the twenty-first century. The following summary examines some of the implications of the program, concentrating on its scientific import and on the ethical and social problems that it raises. Its aim is to expose principles that might be used in applying the information which the HGP will generate. There is no attempt here to translate the principles into detailed proposals for legislation. Arguments and discussion can be found in the full report, but, like this summary, that report does not contain any legislative proposals.

  8. nGASP - the nematode genome annotation assessment project

    Energy Technology Data Exchange (ETDEWEB)

    Coghlan, A; Fiedler, T J; McKay, S J; Flicek, P; Harris, T W; Blasiar, D; Allen, J; Stein, L D

    2008-12-19

    While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
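
    The gene-level sensitivity and specificity figures in this abstract can be illustrated with a short sketch. It assumes the convention common in gene-finder assessments, where sensitivity is the fraction of reference genes predicted exactly and "specificity" is the fraction of predicted genes that exactly match a reference gene (what other fields call precision). The abstract does not spell out its definitions, so treat both the convention and the toy gene models below as assumptions.

```python
def gene_level_accuracy(predicted, reference):
    """Return (sensitivity, specificity) for exact gene-model matches.

    Each gene model is any hashable description of its structure, here a
    tuple of (start, end) exon coordinates.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    # Sensitivity: share of reference genes that were predicted exactly.
    sensitivity = true_pos / len(reference) if reference else 0.0
    # Specificity (gene-finding sense): share of predictions that are correct.
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical gene models, for illustration only.
    reference = [((100, 200), (300, 400)), ((900, 1200),),
                 ((2000, 2100), (2300, 2500)), ((5000, 5400),)]
    predicted = [((100, 200), (300, 400)), ((900, 1200),),
                 ((2000, 2150), (2300, 2500))]  # last one has a wrong exon end
    sn, sp = gene_level_accuracy(predicted, reference)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

    Because a gene counts as correct only when every exon boundary matches, a single misplaced splice site turns a prediction into both a missed reference gene and a false prediction, which is one reason gene-level scores are much lower than exon- or base-level scores.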

  9. The human genome project and the future of medical practice ...

    African Journals Online (AJOL)

    The human genome project and the future of medical practice. ... the planning stages of the human genome project, the technology and sequence data ... the quality of healthcare available in the resource-rich and the resource-poor countries.

  10. Origins of the Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Cook-Deegan, Robert

    1993-07-01

    The human genome project was born of technology, grew into a science bureaucracy in the US and throughout the world, and is now being transformed into a hybrid academic and commercial enterprise. The next phase of the project promises to veer more sharply toward commercial application, harnessing both the technical prowess of molecular biology and the rapidly growing body of knowledge about DNA structure to the pursuit of practical benefits. Faith that the systematic analysis of DNA structure will prove to be a powerful research tool underlies the rationale behind the genome project. The notion that most genetic information is embedded in the sequence of DNA base pairs comprising chromosomes is a central tenet. A rough analogy is to liken an organism's genetic code to computer code. The goal of the genome project, in this parlance, is to identify and catalog 75,000 or more files (genes) in the software that directs construction of a self-modifying and self-replicating system -- a living organism.

  11. Origins of the Human Genome Project

    Science.gov (United States)

    Cook-Deegan, Robert (Affiliation: Institute of Medicine, National Academy of Sciences)

    1993-07-01

    The human genome project was born of technology, grew into a science bureaucracy in the United States and throughout the world, and is now being transformed into a hybrid academic and commercial enterprise. The next phase of the project promises to veer more sharply toward commercial application, harnessing both the technical prowess of molecular biology and the rapidly growing body of knowledge about DNA structure to the pursuit of practical benefits. Faith that the systematic analysis of DNA structure will prove to be a powerful research tool underlies the rationale behind the genome project. The notion that most genetic information is embedded in the sequence of DNA base pairs comprising chromosomes is a central tenet. A rough analogy is to liken an organism's genetic code to computer code. The goal of the genome project, in this parlance, is to identify and catalog 75,000 or more files (genes) in the software that directs construction of a self-modifying and self-replicating system -- a living organism.

  12. Predicting biological networks from genomic data

    DEFF Research Database (Denmark)

    Harrington, Eoghan D; Jensen, Lars J; Bork, Peer

    2008-01-01

    Continuing improvements in DNA sequencing technologies are providing us with vast amounts of genomic data from an ever-widening range of organisms. The resulting challenge for bioinformatics is to interpret this deluge of data and place it back into its biological context. Biological networks...... provide a conceptual framework with which we can describe part of this context, namely the different interactions that occur between the molecular components of a cell. Here, we review the computational methods available to predict biological networks from genomic sequence data and discuss how they relate...

  13. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc;

    in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... prove useful for less heritable traits such as diseases and fertility...

  14. An overview of the human genome project

    Energy Technology Data Exchange (ETDEWEB)

    Batzer, M.A.

    1994-01-01

    The human genome project is one of the most ambitious scientific projects to date, with the ultimate goal being a nucleotide sequence for all four billion bases of human DNA. In the process of determining the nucleotide sequence for each base, the location, function, and regulatory regions from the estimated 100,000 human genes will be identified. The genome project itself relies upon maps of the human genetic code derived from several different levels of resolution. Genetic linkage analysis provides a low resolution genome map. The information for genetic linkage maps is derived from the analysis of chromosome specific markers such as Sequence Tagged Sites (STSs), Variable Number of Tandem Repeats (VNTRs) or other polymorphic (highly informative) loci in a number of different families. Using this information the location of an unknown disease gene can be limited to a region comprised of one million base pairs of DNA or less. After this point, one must construct or have access to a physical map of the region of interest. Physical mapping involves the construction of an ordered overlapping (contiguous) set of recombinant DNA clones. These clones may be derived from a number of different vectors including cosmids, Bacterial Artificial Chromosomes (BACs), P1 derived Artificial Chromosomes (PACs), somatic cell hybrids, or Yeast Artificial Chromosomes (YACs). The ultimate goal for physical mapping is to establish a completely overlapping (contiguous) set of clones for the entire genome. After a gene or region of interest has been localized using physical mapping the nucleotide sequence is determined. The overlap between genetic mapping, physical mapping and DNA sequencing has proven to be a powerful tool for the isolation of disease genes through positional cloning.

  15. Exuberant innovation: The Human Genome Project

    CERN Document Server

    Gisler, Monika; Woodard, Ryan

    2010-01-01

    We present a detailed synthesis of the development of the Human Genome Project (HGP) from 1986 to 2003 in order to test the "social bubble" hypothesis that strong social interactions between enthusiastic supporters of the HGP weaved a network of reinforcing feedbacks that led to a widespread endorsement and extraordinary commitment by those involved in the project, beyond what would be rationalized by a standard cost-benefit analysis in the presence of extraordinary uncertainties and risks. The vigorous competition and race between the initially public project and several private initiatives is argued to support the social bubble hypothesis. We also present quantitative analyses of the concomitant financial bubble concentrated on the biotech sector. Confirmation of this hypothesis is offered by the present consensus that it will take decades to exploit the fruits of the HGP, via a slow and arduous process aiming at disentangling the extraordinary complexity of the human complex body. The HGP has ushered other...

  16. Genomic Prediction Accounting for Residual Heteroskedasticity.

    Science.gov (United States)

    Ou, Zhining; Tempelman, Robert J; Steibel, Juan P; Ernst, Catherine W; Bates, Ronald O; Bello, Nora M

    2015-11-12

    Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit.

  17. Genomic Prediction Accounting for Residual Heteroskedasticity

    Science.gov (United States)

    Ou, Zhining; Tempelman, Robert J.; Steibel, Juan P.; Ernst, Catherine W.; Bates, Ronald O.; Bello, Nora M.

    2015-01-01

    Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. PMID:26564950
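    The abstract above defines predictive ability as the correlation between predicted and observed phenotypes in the validation sets of a five-fold cross-validation. A minimal sketch of that computation follows. It is not the authors' hierarchical Bayesian model: plain ridge regression on simulated marker data stands in for a WGP model, and the names `ridge_fit`, `predictive_ability`, `lam`, and the simulated `X` and `y` are all hypothetical choices made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression (a stand-in for a WGP model).

    Returns the intercept and the vector of estimated marker effects.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                           Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def predictive_ability(X, y, k=5, lam=1.0, seed=0):
    """Mean and per-fold correlation between predicted and observed
    phenotypes over the validation sets of a k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        intercept, beta = ridge_fit(X[train], y[train], lam)
        pred = intercept + X[test] @ beta
        cors.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(cors)), cors

# Simulated example (assumption): 200 individuals, 50 SNPs coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
effects = rng.normal(size=50)
y = X @ effects + rng.normal(scale=1.0, size=200)

mean_r, fold_r = predictive_ability(X, y)
print(f"mean predictive ability over {len(fold_r)} folds: {mean_r:.3f}")
```

    Swapping `ridge_fit` for a model with heterogeneous residual variances would leave the cross-validation loop unchanged, which is why the two model classes can be compared on the same folds.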

  18. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm

  19. Freedom and Responsibility in Synthetic Genomics: The Synthetic Yeast Project

    OpenAIRE

    Sliva, Anna; Yang, Huanming; Boeke, Jef D.; Debra J. H. Mathews

    2015-01-01

    First introduced in 2011, the Synthetic Yeast Genome (Sc2.0) Project is a large international synthetic genomics project that will culminate in the first eukaryotic cell (Saccharomyces cerevisiae) with a fully synthetic genome. With collaborators from across the globe and from a range of institutions spanning from do-it-yourself biology (DIYbio) to commercial enterprises, it is important that all scientists working on this project are cognizant of the ethical and policy issues associated with...

  20. Genomic Signal Processing: Predicting Basic Molecular Biological Principles

    Science.gov (United States)

    Alter, Orly

    2005-03-01

    Advances in high-throughput technologies enable acquisition of different types of molecular biological data, monitoring the flow of biological information as DNA is transcribed to RNA, and RNA is translated to proteins, on a genomic scale. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment and drug development. Recently we described data-driven models for genome-scale molecular biological data, which use singular value decomposition (SVD) and the comparative generalized SVD (GSVD). Now we describe an integrative data-driven model, which uses pseudoinverse projection (1). We also demonstrate the predictive power of these matrix algebra models (2). The integrative pseudoinverse projection model formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. The mathematical variables of this integrative model, the pseudoinverse correlation patterns that are uncovered in the data, represent independent processes and corresponding cellular states (such as observed genome-wide effects of known regulators or transcription factors, the biological components of the cellular machinery that generate the genomic signals, and measured samples in which these regulators or transcription factors are over- or underactive). Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis, and gives a global picture of the correlations and possibly also causal coordination of
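    The abstract above describes expressing a data set in terms of a chosen basis set and then reconstructing the data in that basis. A minimal sketch of that idea, assuming the basis and the data are stored as genes-by-samples matrices that share the same rows, follows. The function name `pseudoinverse_projection` and the simulated matrices are hypothetical and are not taken from the cited work.

```python
import numpy as np

def pseudoinverse_projection(basis, data):
    """Express each data sample in terms of the basis profiles.

    basis: (genes x basis_profiles) matrix.
    data:  (genes x data_samples) matrix with the same gene rows.
    Returns the coefficient matrix (basis_profiles x data_samples)
    and the reconstruction of the data within the span of the basis.
    """
    coeffs = np.linalg.pinv(basis) @ data   # least-squares coefficients
    reconstruction = basis @ coeffs         # part of data the basis explains
    return coeffs, reconstruction

# Simulated example (assumption): 100 genes, 3 basis profiles, 2 data samples
# built as exact combinations of the basis profiles.
rng = np.random.default_rng(0)
basis = rng.normal(size=(100, 3))
true_coeffs = np.array([[1.0, 0.0],
                        [0.0, 2.0],
                        [0.5, 0.0]])
data = basis @ true_coeffs

coeffs, reconstruction = pseudoinverse_projection(basis, data)
print(np.round(coeffs, 3))
```

    When the data contain signal outside the span of the basis, `reconstruction` keeps only the part the basis can express, which matches the abstract's description of observing only the cellular states that correspond to those of the basis.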

  1. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    Cπ method and applied to 1,272 Duroc pigs with both genotypic and phenotypic records including residual (RFI) and daily feed intake (DFI), average daily gain (ADG) and back fat (BF)). Records were split into a training (968 pigs) and a validation dataset (304 pigs). SNPs were annotated by 14 different...... groups. Genomic prediction has accuracy comparable to an own phenotype and use of genomic prediction can be cost effective by replacing feed intake measurement. Use of genomic annotation of SNPs and QTL information had no largely significant impact on predictive accuracy for the current traits but may...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...

  2. Genomic prediction across dairy cattle populations and breeds

    DEFF Research Database (Denmark)

    Zhou, Lei

    Genomic prediction is successful in single-breed genetic evaluation. However, there has been no success in across-breed prediction until now. This thesis investigated genomic prediction across populations and breeds using Chinese Holstein, Nordic Holstein, Norwegian Red, and Nordic Red. Nordic Red...

  3. Life in our hands? Some ethical perspectives on the human genome and human genome diversity projects

    Directory of Open Access Journals (Sweden)

    Cornelius W. du Toit

    2014-01-01

    The article dealt with implications of the human genome and the human genome diversity project. It examined some theological implications, such as: humans as the image of God, God as the creator of life, the changed role of miracles and healings in religion, the sacredness of nature, life and the genome. Ethical issues that were addressed include eugenics, germline intervention, determinism and the human genome diversity project. Economic and legal factors that play a role were also discussed. Whilst positive aspects of genome research were considered, a critical stance was adopted towards patenting the human genome and some concluding guidelines were proposed.

  4. International network of cancer genome projects

    NARCIS (Netherlands)

    Hudson, Thomas J.; Anderson, Warwick; Aretz, Axel; Barker, Anna D.; Bell, Cindy; Bernabe, Rosa R.; Bhan, M. K.; Calvo, Fabien; Eerola, Iiro; Gerhard, Daniela S.; Guttmacher, Alan; Guyer, Mark; Hemsley, Fiona M.; Jennings, Jennifer L.; Kerr, David; Klatt, Peter; Kolar, Patrik; Kusuda, Jun; Lane, David P.; Laplace, Frank; Lu, Youyong; Nettekoven, Gerd; Ozenberger, Brad; Peterson, Jane; Rao, T. S.; Remacle, Jacques; Schafer, Alan J.; Shibata, Tatsuhiro; Stratton, Michael R.; Vockley, Joseph G.; Watanabe, Koichi; Yang, Huanming; Yuen, Matthew M. F.; Knoppers, M.; Bobrow, Martin; Cambon-Thomsen, Anne; Dressler, Lynn G.; Dyke, Stephanie O. M.; Joly, Yann; Kato, Kazuto; Kennedy, Karen L.; Nicolas, Pilar; Parker, Michael J.; Rial-Sebbag, Emmanuelle; Romeo-Casabona, Carlos M.; Shaw, Kenna M.; Wallace, Susan; Wiesner, Georgia L.; Zeps, Nikolajs; Lichter, Peter; Biankin, Andrew V.; Chabannon, Christian; Chin, Lynda; Clement, Bruno; de Alava, Enrique; Degos, Francoise; Ferguson, Martin L.; Geary, Peter; Hayes, D. Neil; Johns, Amber L.; Nakagawa, Hidewaki; Penny, Robert; Piris, Miguel A.; Sarin, Rajiv; Scarpa, Aldo; Shibata, Tatsuhiro; van de Vijver, Marc; Futreal, P. Andrew; Aburatani, Hiroyuki; Bayes, Monica; Bowtell, David D. 
L.; Campbell, Peter J.; Estivill, Xavier; Grimmond, Sean M.; Gut, Ivo; Hirst, Martin; Lopez-Otin, Carlos; Majumder, Partha; Marra, Marco; Nakagawa, Hidewaki; Ning, Zemin; Puente, Xose S.; Ruan, Yijun; Shibata, Tatsuhiro; Stratton, Michael R.; Stunnenberg, Hendrik G.; Swerdlow, Harold; Velculescu, Victor E.; Wilson, Richard K.; Xue, Hong H.; Yang, Liu; Spellman, Paul T.; Bader, Gary D.; Boutros, Paul C.; Campbell, Peter J.; Flicek, Paul; Getz, Gad; Guigo, Roderic; Guo, Guangwu; Haussler, David; Heath, Simon; Hubbard, Tim J.; Jiang, Tao; Jones, Steven M.; Li, Qibin; Lopez-Bigas, Nuria; Luo, Ruibang; Pearson, John V.; Puente, Xose S.; Quesada, Victor; Raphael, Benjamin J.; Sander, Chris; Shibata, Tatsuhiro; Speed, Terence P.; Stuart, Joshua M.; Teague, Jon W.; Totoki, Yasushi; Tsunoda, Tatsuhiko; Valencia, Alfonso; Wheeler, David A.; Wu, Honglong; Zhao, Shancen; Zhou, Guangyu; Stein, Lincoln D.; Guigo, Roderic; Hubbard, Tim J.; Joly, Yann; Jones, Steven M.; Lathrop, Mark; Lopez-Bigas, Nuria; Ouellette, B. F. Francis; Spellman, Paul T.; Teague, Jon W.; Thomas, Gilles; Valencia, Alfonso; Yoshida, Teruhiko; Kennedy, Karen L.; Axton, Myles; Dyke, Stephanie O. M.; Futreal, P. Andrew; Gunter, Chris; Guyer, Mark; McPherson, John D.; Miller, Linda J.; Ozenberger, Brad; Kasprzyk, Arek; Zhang, Junjun; Haider, Syed A.; Wang, Jianxin; Yung, Christina K.; Cross, Anthony; Liang, Yong; Gnaneshan, Saravanamuttu; Guberman, Jonathan; Hsu, Jack; Bobrow, Martin; Chalmers, Don R. C.; Hasel, Karl W.; Joly, Yann; Kaan, Terry S. H.; Kennedy, Karen L.; Knoppers, Bartha M.; Lowrance, William W.; Masui, Tohru; Nicolas, Pilar; Rial-Sebbag, Emmanuelle; Rodriguez, Laura Lyman; Vergely, Catherine; Yoshida, Teruhiko; Grimmond, Sean M.; Biankin, Andrew V.; Bowtell, David D. 
L.; Cloonan, Nicole; Defazio, Anna; Eshleman, James R.; Etemadmoghadam, Dariush; Gardiner, Brooke A.; Kench, James G.; Scarpa, Aldo; Sutherland, Robert L.; Tempero, Margaret A.; Waddell, Nicola J.; Wilson, Peter J.; Gallinger, Steve; Tsao, Ming-Sound; Shaw, Patricia A.; Petersen, Gloria M.; Mukhopadhyay, Debabrata; Chin, Lynda; DePinho, Ronald A.; Thayer, Sarah; Muthuswamy, Lakshmi; Shazand, Kamran; Beck, Timothy; Sam, Michelle; Timms, Lee; Ballin, Vanessa; Lu, Youyong; Ji, Jiafu; Zhang, Xiuqing; Chen, Feng; Hu, Xueda; Zhou, Guangyu; Yang, Qi; Tian, Geng; Zhang, Lianhai; Xing, Xiaofang; Li, Xianghong; Zhu, Zhenggang; Yu, Yingyan; Yu, Jun; Yang, Huanming; Lathrop, Mark; Tost, Joerg; Brennan, Paul; Holcatova, Ivana; Zaridze, David; Brazma, Alvis; Egevad, Lars; Prokhortchouk, Egor; Banks, Rosamonde Elizabeth; Uhlen, Mathias; Cambon-Thomsen, Anne; Viksna, Juris; Ponten, Fredrik; Skryabin, Konstantin; Stratton, Michael R.; Futreal, P. Andrew; Birney, Ewan; Borg, Ake; Borresen-Dale, Anne-Lise; Caldas, Carlos; Foekens, John A.; Martin, Sancha; Reis-Filho, Jorge S.; Richardson, Andrea L.; Sotiriou, Christos; Stunnenberg, Hendrik G.; Thomas, Gilles; van de Vijver, Marc; van't Veer, Laura; Birnbaum, Daniel; Blanche, Helene; Boucher, Pascal; Boyault, Sandrine; Chabannon, Christian; Gut, Ivo; Masson-Jacquemier, Jocelyne D.; Lathrop, Mark; Pauporte, Iris; Pivot, Xavier; Vincent-Salomon, Anne; Tabone, Eric; Theillet, Charles; Thomas, Gilles; Tost, Joerg; Treilleux, Isabelle; Bioulac-Sage, Paulette; Clement, Bruno; Decaens, Thomas; Degos, Francoise; Franco, Dominique; Gut, Ivo; Gut, Marta; Heath, Simon; Lathrop, Mark; Samuel, Didier; Thomas, Gilles; Zucman-Rossi, Jessica; Lichter, Peter; Eils, Roland; Brors, Benedikt; Korbel, Jan O.; Korshunov, Andrey; Landgraf, Pablo; Lehrach, Hans; Pfister, Stefan; Radlwimmer, Bernhard; Reifenberger, Guido; Taylor, Michael D.; von Kalle, Christof; Majumder, Partha P.; Sarin, Rajiv; Scarpa, Aldo; Pederzoli, Paolo; Lawlor, Rita T.; Delledonne, 
Massimo; Bardelli, Alberto; Biankin, Andrew V.; Grimmond, Sean M.; Gress, Thomas; Klimstra, David; Zamboni, Giuseppe; Shibata, Tatsuhiro; Nakamura, Yusuke; Nakagawa, Hidewaki; Kusuda, Jun; Tsunoda, Tatsuhiko; Miyano, Satoru; Aburatani, Hiroyuki; Kato, Kazuto; Fujimoto, Akihiro; Yoshida, Teruhiko; Campo, Elias; Lopez-Otin, Carlos; Estivill, Xavier; Guigo, Roderic; de Sanjose, Silvia; Piris, Miguel A.; Montserrat, Emili; Gonzalez-Diaz, Marcos; Puente, Xose S.; Jares, Pedro; Valencia, Alfonso; Himmelbaue, Heinz; Quesada, Victor; Bea, Silvia; Stratton, Michael R.; Futreal, P. Andrew; Campbell, Peter J.; Vincent-Salomon, Anne; Richardson, Andrea L.; Reis-Filho, Jorge S.; van de Vijver, Marc; Thomas, Gilles; Masson-Jacquemier, Jocelyne D.; Aparicio, Samuel; Borg, Ake; Borresen-Dale, Anne-Lise; Caldas, Carlos; Foekens, John A.; Stunnenberg, Hendrik G.; van't Veer, Laura; Easton, Douglas F.; Spellman, Paul T.; Martin, Sancha; Chin, Lynda; Collins, Francis S.; Compton, Carolyn C.; Ferguson, Martin L.; Getz, Gad; Gunter, Chris; Guyer, Mark; Hayes, D. Neil; Lander, Eric S.; Ozenberger, Brad; Penny, Robert; Peterson, Jane; Sander, Chris; Speed, Terence P.; Spellman, Paul T.; Wheeler, David A.; Wilson, Richard K.; Chin, Lynda; Knoppers, Bartha M.; Lander, Eric S.; Lichter, Peter; Stratton, Michael R.; Bobrow, Martin; Burke, Wylie; Collins, Francis S.; DePinho, Ronald A.; Easton, Douglas F.; Futreal, P. Andrew; Green, Anthony R.; Guyer, Mark; Hamilton, Stanley R.; Hubbard, Tim J.; Kallioniemi, Olli P.; Kennedy, Karen L.; Ley, Timothy J.; Liu, Edison T.; Lu, Youyong; Majumder, Partha; Marra, Marco; Ozenberger, Brad; Peterson, Jane; Schafer, Alan J.; Spellman, Paul T.; Stunnenberg, Hendrik G.; Wainwright, Brandon J.; Wilson, Richard K.; Yang, Huanming

    2010-01-01

    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic

  5. Weeding out the genes: the Arabidopsis genome project.

    Science.gov (United States)

    Martienssen, R A

    2000-05-01

    The Arabidopsis genome sequence is scheduled for completion at the end of this year (December 2000). It will be the first higher plant genome to be sequenced, and will allow a detailed comparison with bacterial, yeast and animal genomes. Already, two of the five chromosomes have been sequenced, and we have had our first glimpse of higher eukaryotic centromeres, and the structure of heterochromatin. The implications for understanding plant gene function, genome structure and genome organization are profound. In this review, the lessons learned for future genome projects are reviewed as well as a summary of the initial findings in Arabidopsis.

  6. Empirical and deterministic accuracies of across-population genomic prediction

    NARCIS (Netherlands)

    Wientjes, Y.C.J.; Veerkamp, R.F.; Bijma, P.; Bovenhuis, H.; Schrooten, C.; Calus, M.P.L.

    2015-01-01

    Background: Differences in linkage disequilibrium and in allele substitution effects of QTL (quantitative trait loci) may hinder genomic prediction across populations. Our objective was to develop a deterministic formula to estimate the accuracy of across-population genomic prediction, for which

  7. The PredictAD project

    DEFF Research Database (Denmark)

    Antila, Kari; Lötjönen, Jyrki; Thurfjell, Lennart;

    2013-01-01

    can be managed. Today the significance of early and precise diagnosis of AD is emphasized in order to minimize its irreversible effects on the nervous system. When new drugs and therapies enter the market it is also vital to effectively identify the right candidates to benefit from these. The main...... candidates and implement the framework in software. The results are currently used in several research projects, licensed to commercial use and being tested for clinical use in several trials....

  8. The Human Genome Project: big science transforms biology and medicine.

    Science.gov (United States)

    Hood, Leroy; Rowen, Lee

    2013-01-01

    The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called 'big science' - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.

  9. Cancer Genome Anatomy Project (CGAP) | Office of Cancer Genomics

    Science.gov (United States)

    CGAP generated a wide range of genomics data on cancerous cells that are accessible through easy-to-use online tools. Researchers, educators, and students can find "in silico" answers to biological questions through the CGAP website. Request a free copy of the CGAP Website Virtual Tour CD from ocg@mail.nih.gov to learn how to navigate the website.

  10. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Bansal Manju

    2011-07-01

    Background: As more and more genomes are being sequenced, an overview of their genomic features and annotation of their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings: Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content the relative stability based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion: PromBase is a database, which contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics study and help the experimentalist to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible for academic and non-academic users via the worldwide web http://nucleix.mbu.iisc.ernet.in/prombase/.

  11. Genomes to life project : quarterly report October 2003.

    Energy Technology Data Exchange (ETDEWEB)

    Heffelfinger, Grant S.

    2004-01-01

    This SAND report provides the technical progress through October 2003 of the Sandia-led project, 'Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling,' funded by the DOE Office of Science Genomes to Life Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacteria known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and include significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. In addition, we will

  12. Genomes to Life Project Quarterly Report October 2004.

    Energy Technology Data Exchange (ETDEWEB)

    Heffelfinger, Grant S.; Martino, Anthony; Rintoul, Mark Daniel; Geist, Al; Gorin, Andrey; Xu, Ying; Palenik, Brian

    2005-02-01

    This SAND report provides the technical progress through October 2004 of the Sandia-led project, %22Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling,%22 funded by the DOE Office of Science Genomes to Life Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacteria known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and include significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. 
    In addition, we will develop...

  13. Genomes to Life Project Quarterly Report April 2005.

    Energy Technology Data Exchange (ETDEWEB)

    Heffelfinger, Grant S.; Martino, Anthony; Rintoul, Mark Daniel; Geist, Al; Gorin, Andrey; Xu, Ying; Palenik, Brian

    2006-02-01

    This SAND report provides the technical progress through April 2005 of the Sandia-led project, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling," funded by the DOE Office of Science Genomics:GTL Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacteria known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and include significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes. 
    In addition, we will develop a set of...

  14. Correction for Measurement Error from Genotyping-by-Sequencing in Genomic Variance and Genomic Prediction Models

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Janss, Luc; Jensen, Just

    Genotyping-by-sequencing (GBSeq) is becoming a cost-effective genotyping platform for species without available SNP arrays. GBSeq sequences short reads from restriction sites covering a limited part of the genome (e.g., 5-10%) with low sequencing depth per individual (e.g., 5-10X per...... In the current work we show how the correction for measurement error in GBSeq can also be applied in whole genome genomic variance and genomic prediction models. Bayesian whole-genome random regression models are proposed to allow implementation of large-scale SNP-based models with a per-SNP correction...... for measurement error. We show correct retrieval of genomic explained variance, and improved genomic prediction when accounting for the measurement error in GBSeq data...

  15. Los Alamos Science: The Human Genome Project. Number 20, 1992

    Energy Technology Data Exchange (ETDEWEB)

    Cooper, N G; Shea, N [eds.]

    1992-01-01

    This article provides a broad overview of the Human Genome Project, with particular emphasis on work being done at Los Alamos. It tries to emphasize the scientific aspects of the project, compared to the more speculative information presented in the popular press. There is a brief introduction to modern genetics, including a review of classic work. There is a broad overview of the Genome Project, describing what the project is, what some of its major five-year goals are, what major technological challenges lie ahead of the project, and what the field of biology, as well as society, can expect to see as benefits from this project. Specific results on the efforts directed at mapping chromosomes 16 and 5 are discussed. A brief introduction to DNA libraries is presented, bearing in mind that Los Alamos has housed such libraries for many years prior to the Genome Project. Efforts to do applied computational work related to the project are discussed, as well as experimental efforts to do rapid DNA sequencing by means of single-molecule detection using applied spectroscopic methods. The article introduces the Los Alamos staff who are working on the Genome Project, and concludes with brief discussions on ethical, legal, and social implications of this work; a brief glimpse of genetics as it may be practiced in the next century; and a glossary of relevant terms.

  16. Los Alamos Science: The Human Genome Project. Number 20, 1992

    Science.gov (United States)

    Cooper, N. G.; Shea, N. eds.

    1992-01-01

    This document provides a broad overview of the Human Genome Project, with particular emphasis on work being done at Los Alamos. It tries to emphasize the scientific aspects of the project, compared to the more speculative information presented in the popular press. There is a brief introduction to modern genetics, including a review of classic work. There is a broad overview of the Genome Project, describing what the project is, what some of its major five-year goals are, what major technological challenges lie ahead of the project, and what the field of biology, as well as society, can expect to see as benefits from this project. Specific results on the efforts directed at mapping chromosomes 16 and 5 are discussed. A brief introduction to DNA libraries is presented, bearing in mind that Los Alamos has housed such libraries for many years prior to the Genome Project. Efforts to do applied computational work related to the project are discussed, as well as experimental efforts to do rapid DNA sequencing by means of single-molecule detection using applied spectroscopic methods. The article introduces the Los Alamos staff who are working on the Genome Project, and concludes with brief discussions on ethical, legal, and social implications of this work; a brief glimpse of genetics as it may be practiced in the next century; and a glossary of relevant terms.

  17. The Human Genome Project, and recent advances in personalized genomics

    Directory of Open Access Journals (Sweden)

    Wilson BJ

    2015-02-01

    Brenda J Wilson, Stuart G Nicholls. Department of Epidemiology and Community Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada. Abstract: The language of “personalized medicine” and “personal genomics” has now entered the common lexicon. The idea of personalized medicine is the integration of genomic risk assessment alongside other clinical investigations. Consistent with this approach, testing is delivered by health care professionals who are not medical geneticists, and where results represent risks, as opposed to clinical diagnosis of disease, to be interpreted alongside the entirety of a patient's health and medical data. In this review we consider the evidence concerning the application of such personalized genomics within the context of population screening, and potential implications that arise from this. We highlight two general approaches which illustrate potential uses of genomic information in screening. The first is a narrowly targeted approach in which genetic profiling is linked with standard population-based screening for diseases; the second is a broader targeting of variants associated with multiple single gene disorders, performed opportunistically on patients being investigated for unrelated conditions. In doing so we consider the organization and evaluation of tests and services, the challenge of interpretation with less targeted testing, professional confidence, barriers in practice, and education needs. We conclude by discussing several issues pertinent to health policy, namely: avoiding the conflation of genetics with biological determinism, resisting the “technological imperative”, due consideration of the organization of screening services, the need for professional education, as well as informed decision making and public understanding. Keywords: genomics, personalized medicine, ethics, population health, evidence, education

  18. Combining SNPs in latent variables to improve genomic prediction

    DEFF Research Database (Denmark)

    Heuven, Henri C M; Rosa, G J M; Janss, Luc

    The objective of this study was to develop and test hierarchical genomic models with latent variables that represent parts of the genomic values. An interaction model and a chromosome model were compared with a model based on variable selection in a simulated and real dataset. The program Bayz......: Hierarchical genetic model; Predictive value; Gibbs sampling; Variable selection....

  19. Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project.

    Science.gov (United States)

    Kyrpides, Nikos C; Woyke, Tanja; Eisen, Jonathan A; Garrity, George; Lilburn, Timothy G; Beck, Brian J; Whitman, William B; Hugenholtz, Phil; Klenk, Hans-Peter

    2014-06-15

    The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use of the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both of the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.

  20. Genomic-enabled prediction with classification algorithms.

    Science.gov (United States)

    Ornella, L; Pérez, P; Tapia, E; González-Camacho, J M; Burgueño, J; Zhang, X; Singh, S; Vicente, F S; Bonnett, D; Dreisigacker, S; Singh, R; Long, N; Crossa, J

    2014-06-01

    Pearson's correlation coefficient (ρ) is the most commonly reported metric of the success of prediction in genomic selection (GS). However, in real breeding ρ may not be very useful for assessing the quality of the regression in the tails of the distribution, where individuals are chosen for selection. This research used 14 maize and 16 wheat data sets with different trait-environment combinations. Six different models were evaluated by means of a cross-validation scheme (50 random partitions each, with 90% of the individuals in the training set and 10% in the testing set). The predictive accuracy of these algorithms for selecting individuals belonging to the best α=10, 15, 20, 25, 30, 35, 40% of the distribution was estimated using Cohen's kappa coefficient (κ) and an ad hoc measure, which we call relative efficiency (RE), which indicates the expected genetic gain due to selection when individuals are selected based on GS exclusively. We put special emphasis on the analysis for α=15%, because it is a percentile commonly used in plant breeding programmes (for example, at CIMMYT). We also used ρ as a criterion for overall success. The algorithms used were: Bayesian LASSO (BL), Ridge Regression (RR), Reproducing Kernel Hilbert Spaces (RKHS), Random Forest Regression (RFR), and Support Vector Regression (SVR) with linear (lin) and Gaussian kernels (rbf). The performance of regression methods for selecting the best individuals was compared with that of three supervised classification algorithms: Random Forest Classification (RFC) and Support Vector Classification (SVC) with linear (lin) and Gaussian (rbf) kernels. Classification methods were evaluated using the same cross-validation scheme but with the response vector of the original training sets dichotomised using a given threshold. For α=15%, SVC-lin presented the highest κ coefficients in 13 of the 14 maize data sets, with best values ranging from 0.131 to 0.722 (statistically significant in 9 data sets
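
    As a rough illustration of the κ criterion described in the abstract above (not the authors' code), the following Python sketch labels the best α fraction of individuals by observed value and by predicted value, then computes Cohen's kappa between the two labelings. The function names `top_alpha_labels` and `cohen_kappa`, the toy data, and the choice to define the "true" class as the top α fraction of observed values are assumptions made for illustration only.

```python
def top_alpha_labels(values, alpha):
    """Label the best alpha fraction (largest values) as 1, the rest as 0.

    Assumption: "best" means largest value; ties are broken by input order.
    """
    n_top = max(1, round(alpha * len(values)))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = set(ranked[:n_top])
    return [1 if i in top else 0 for i in range(len(values))]


def cohen_kappa(a, b):
    """Cohen's kappa for two binary label vectors of equal length."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    p_a1 = sum(a) / n
    p_b1 = sum(b) / n
    # Agreement expected by chance from the marginal class frequencies.
    p_expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if p_expected == 1.0:
        # Degenerate case (only one class present); 0.0 is a convention chosen here.
        return 0.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical observed phenotypes and model predictions for 10 individuals.
    observed = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    predicted = [9.5, 6.0, 8.8, 7.1, 5.0, 5.5, 4.2, 2.0, 3.0, 1.0]
    alpha = 0.2
    kappa = cohen_kappa(top_alpha_labels(observed, alpha),
                        top_alpha_labels(predicted, alpha))
    print(f"kappa for the best {alpha:.0%}: {kappa:.3f}")
```

    In a cross-validation setting like the one the abstract describes, such a κ would be computed on each testing partition and then summarized across partitions.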

  1. The Genome 10K Project: a way forward.

    Science.gov (United States)

    Koepfli, Klaus-Peter; Paten, Benedict; O'Brien, Stephen J

    2015-01-01

    The Genome 10K Project was established in 2009 by a consortium of biologists and genome scientists determined to facilitate the sequencing and analysis of the complete genomes of 10,000 vertebrate species. Since then the number of selected and initiated species has risen from ∼26 to 277 sequenced or ongoing with funding, an approximately tenfold increase in five years. Here we summarize the advances and commitments that have occurred by mid-2014 and outline the achievements and present challenges of reaching the 10,000-species goal. We summarize the status of known vertebrate genome projects, recommend standards for pronouncing a genome as sequenced or completed, and provide our present and future vision of the landscape of Genome 10K. The endeavor is ambitious, bold, expensive, and uncertain, but together the Genome 10K Consortium of Scientists and the worldwide genomics community are moving toward their goal of delivering to the coming generation the gift of genome empowerment for many vertebrate species.

  2. A decade of human genome project conclusion: Scientific diffusion about our genome knowledge.

    Science.gov (United States)

    Moraes, Fernanda; Góes, Andréa

    2016-05-06

    The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development and the 10 years following the HGP completion. The fact that only 20,000 genes are protein and RNA-coding is one of the most striking HGP results. A new concept about the organization of genome arose. The ENCODE project was initiated in 2003 and targeted to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. Besides, a gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still requires further in depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016.

  3. Unexpected cross-species contamination in genome sequencing projects

    Directory of Open Access Journals (Sweden)

    Samier Merchant

    2014-11-01

    The raw data from a genome sequencing project sometimes contains DNA from contaminating organisms, which may be introduced during sample collection or sequence preparation. In some instances, these contaminants remain in the sequence even after assembly and deposition of the genome into public databases. As a result, searches of these databases may yield erroneous and confusing results. We used efficient microbiome analysis software to scan the draft assembly of domestic cow, Bos taurus, and identify 173 small contigs that appeared to derive from microbial contaminants. In the course of verifying these findings, we discovered that one genome, Neisseria gonorrhoeae TCDC-NG08107, although putatively a complete genome, contained multiple sequences that actually derived from the cow and sheep genomes. Our findings illustrate the need to carefully validate findings of anomalous DNA that rely on comparisons to either draft or finished genomes.

  4. The Riken mouse genome encyclopedia project.

    Science.gov (United States)

    Hayashizaki, Yoshihide

    2003-01-01

    The Riken mouse genome encyclopedia is a comprehensive full-length cDNA collection and sequence database. High-level functional annotation is based on sequence homology search, expression profiling, mapping and protein-protein interactions. More than 1000000 clones prepared from 163 tissues were end-sequenced and classified into 128000 clusters, and 60000 representative clones were fully sequenced representing 24000 clear protein-encoding genes. The application of the mouse genome database for positional cloning and gene network regulation analysis is reported.

  5. Genomes to life project quarterly report June 2004.

    Energy Technology Data Exchange (ETDEWEB)

    Heffelfinger, Grant S.

    2005-01-01

    This SAND report provides the technical progress through June 2004 of the Sandia-led project, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling", funded by the DOE Office of Science Genomes to Life Program. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. In this project, we will investigate the carbon sequestration behavior of Synechococcus Sp., an abundant marine cyanobacteria known to be important to environmental responses to carbon dioxide levels, through experimental and computational methods. This project is a combined experimental and computational effort with emphasis on developing and applying new computational tools and methods. Our experimental effort will provide the biology and data to drive the computational efforts and include significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Computational tools will be essential to our efforts to discover and characterize the function of the molecular machines of Synechococcus. To this end, molecular simulation methods will be coupled with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes

  6. CERAPP: Collaborative estrogen receptor activity prediction project

    DEFF Research Database (Denmark)

    Mansouri, Kamel; Abdelaziz, Ahmed; Rybacka, Aleksandra

    2016-01-01

    Background: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER......). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. Objectives: We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project...

  7. The effect of genealogy-based haplotypes on genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid; Fernando, Rohan L.; Su, Guosheng

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression...... on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using...... local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (pi) of the haplotype covariates had zero effect...

  8. Promoter Prediction on a Genomic Scale—The Adh Experience

    Science.gov (United States)

    Ohler, Uwe

    2000-01-01

    We describe our statistical system for promoter recognition in genomic DNA with which we took part in the Genome Annotation Assessment Project (GASP1). We applied two versions of the system: the first uses a region-based approach toward transcription start site identification, namely, interpolated Markov chains; the second was a hybrid approach combining regions and signals within a stochastic segment model. We compare the results of both versions with each other and examine how well the application on a genomic scale compares with the results we previously obtained on smaller data sets. PMID:10779494

  9. The human genome project: Prospects and implications for clinical medicine

    Energy Technology Data Exchange (ETDEWEB)

    Green, E.D.; Waterston, R.H. (Washington Univ., St. Louis, MO (United States))

    1991-10-09

    The recently initiated human genome project is a large international effort to elucidate the genetic architecture of the genomes of man and several model organisms. The initial phases of this endeavor involve the establishment of rough blueprints (maps) of the genetic landscape of these genomes, with the long-term goal of determining their precise nucleotide sequences and identifying the genes. The knowledge gained by these studies will provide a vital tool for the study of many biologic processes and will have a profound impact on clinical medicine.

  12. Genomic Islands Prediction and Analysis in Cyanobacteria by Bioinformatics

    Institute of Scientific and Technical Information of China (English)

    Yi Li; Ni-Ni Rao; Feng Yang; Han-Ming Liu

    2014-01-01

    Genomic islands (GIs) are among the most important components of cyanobacterial genomes. GIs encode many functions, such as symbiosis, pathogenesis, and adaptation. In this article, we predict and analyze the GIs in Synechocystis sp. PCC 6803 using bioinformatics. The results show that ISL1, ISL8, and ISL16 are homologous with sequences in many other bacteria, are involved in basic reactions, and have evolved conservatively. In contrast, ISL15 has a sequence and function unique to Synechocystis sp. PCC 6803. Most GIs play a role in genome rearrangement because they contain many transposases. Moreover, we find that recombination and horizontal transfer of GIs are important factors affecting the distribution of non-coding RNA. Our work contributes to a comprehensive understanding of genomic islands and their impact on the genomes of cyanobacteria.

  13. Freedom and Responsibility in Synthetic Genomics: The Synthetic Yeast Project.

    Science.gov (United States)

    Sliva, Anna; Yang, Huanming; Boeke, Jef D; Mathews, Debra J H

    2015-08-01

    First introduced in 2011, the Synthetic Yeast Genome (Sc2.0) PROJECT is a large international synthetic genomics project that will culminate in the first eukaryotic cell (Saccharomyces cerevisiae) with a fully synthetic genome. With collaborators from across the globe and from a range of institutions spanning from do-it-yourself biology (DIYbio) to commercial enterprises, it is important that all scientists working on this project are cognizant of the ethical and policy issues associated with this field of research and operate under a common set of principles. In this commentary, we survey the current ethics and regulatory landscape of synthetic biology and present the Sc2.0 Statement of Ethics and Governance to which all members of the project adhere. This statement focuses on four aspects of the Sc2.0 PROJECT: societal benefit, intellectual property, safety, and self-governance. We propose that such project-level agreements are an important, valuable, and flexible model of self-regulation for similar global, large-scale synthetic biology projects in order to maximize the benefits and minimize potential harms. Copyright © 2015 by the Genetics Society of America.

  14. Freedom and Responsibility in Synthetic Genomics: The Synthetic Yeast Project

    Science.gov (United States)

    Sliva, Anna; Yang, Huanming; Boeke, Jef D.; Mathews, Debra J. H.

    2015-01-01

    First introduced in 2011, the Synthetic Yeast Genome (Sc2.0) Project is a large international synthetic genomics project that will culminate in the first eukaryotic cell (Saccharomyces cerevisiae) with a fully synthetic genome. With collaborators from across the globe and from a range of institutions spanning from do-it-yourself biology (DIYbio) to commercial enterprises, it is important that all scientists working on this project are cognizant of the ethical and policy issues associated with this field of research and operate under a common set of principles. In this commentary, we survey the current ethics and regulatory landscape of synthetic biology and present the Sc2.0 Statement of Ethics and Governance to which all members of the project adhere. This statement focuses on four aspects of the Sc2.0 Project: societal benefit, intellectual property, safety, and self-governance. We propose that such project-level agreements are an important, valuable, and flexible model of self-regulation for similar global, large-scale synthetic biology projects in order to maximize the benefits and minimize potential harms. PMID:26272997

  15. Genome-Wide Prediction of C. elegans Genetic Interactions

    OpenAIRE

    Zhong, Weiwei; Sternberg, Paul W.

    2006-01-01

    To obtain a global view of functional interactions among genes in a metazoan genome, we computationally integrated interactome data, gene expression data, phenotype data, and functional annotation data from three model organisms—Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster—and predicted genome-wide genetic interactions in C. elegans. The resulting genetic interaction network (consisting of 18,183 interactions) provides a framework for system-level understandin...

  16. Network Based Prediction Model for Genomics Data Analysis

    OpenAIRE

    Huang, Ying; Wang, Pei

    2012-01-01

    Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene/protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information in analyzing high dimensional genomics data. Specifically, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data, and at the same time identifying disease susceptible genes. ...

  17. Genome Project Standards in a New Era of Sequencing

    Energy Technology Data Exchange (ETDEWEB)

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft', however these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better

  18. Using Genome-scale Models to Predict Biological Capabilities

    DEFF Research Database (Denmark)

    O’Brien, Edward J.; Monk, Jonathan M.; Palsson, Bernhard O.

    2015-01-01

    Constraint-based reconstruction and analysis (COBRA) methods at the genome scale have been under development since the first whole-genome sequences appeared in the mid-1990s. A few years ago, this approach began to demonstrate the ability to predict a range of cellular functions, including cellular growth capabilities on various substrates and the effect of gene knockouts at the genome scale. Thus, much interest has developed in understanding and applying these methods to areas such as metabolic engineering, antibiotic design, and organismal and enzyme evolution. This Primer will get you started.

  19. Predicting Tissue-Specific Enhancers in the Human Genome

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Loots, Gabriela G.; Nobrega, Marcelo A.; Ovcharenko, Ivan

    2006-07-01

    Determining how transcriptional regulatory signals are encoded in vertebrate genomes is essential for understanding the origins of multi-cellular complexity; yet the genetic code of vertebrate gene regulation remains poorly understood. In an attempt to elucidate this code, we synergistically combined genome-wide gene expression profiling, vertebrate genome comparisons, and transcription factor binding site analysis to define sequence signatures characteristic of candidate tissue-specific enhancers in the human genome. We applied this strategy to microarray-based gene expression profiles from 79 human tissues and identified 7,187 candidate enhancers that defined their flanking gene expression, the majority of which were located outside of known promoters. We cross-validated this method for its ability to de novo predict tissue-specific gene expression and confirmed its reliability in 57 of the 79 available human tissues, with an average precision in enhancer recognition ranging from 32 percent to 63 percent, and a sensitivity of 47 percent. We used the sequence signatures identified by this approach to assign tissue-specific predictions to ~328,000 human-mouse conserved noncoding elements in the human genome. By overlapping these genome-wide predictions with a large in vivo dataset of enhancers validated in transgenic mice, we confirmed our results with a 28 percent sensitivity and 50 percent precision. These results indicate the power of combining complementary genomic datasets as an initial computational foray into the global view of tissue-specific gene regulation in vertebrates.
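
    The precision and sensitivity figures quoted in the abstract above are overlap ratios between a predicted set and a validated set. A minimal sketch of that arithmetic follows, assuming each element can be reduced to a comparable identifier; the function name and the element IDs are hypothetical illustrations and are not taken from the study.

```python
def precision_and_sensitivity(predicted, validated):
    """Return (precision, sensitivity) for two collections of element IDs.

    precision   = true positives / all predicted elements
    sensitivity = true positives / all validated elements
    """
    predicted, validated = set(predicted), set(validated)
    true_pos = len(predicted & validated)
    precision = true_pos / len(predicted) if predicted else 0.0
    sensitivity = true_pos / len(validated) if validated else 0.0
    return precision, sensitivity


# Hypothetical element IDs for illustration only, not data from the study.
predicted = {"e1", "e2", "e3", "e4"}
validated = {"e2", "e4", "e5", "e6", "e7"}
precision, sensitivity = precision_and_sensitivity(predicted, validated)
print(precision, sensitivity)
```

    In practice, genomic elements are intervals rather than IDs, so the set intersection would be replaced by an interval-overlap test; that step is omitted here.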

  20. Prediction of Genomic Islands in Three Bacterial Pathogens of Pneumonia

    Directory of Open Access Journals (Sweden)

    Wen Wei

    2012-03-01

    Pneumonia is a common infectious disease, which is usually caused by bacteria, viruses, or fungi. In this paper, we predicted genomic islands in three bacterial pathogens of pneumonia: Chlamydophila pneumoniae, Mycoplasma pneumoniae and Streptococcus pneumoniae. For each pathogen, one clinical strain is involved. After implementing the cumulative GC profile combined with the h and BCN indexes, eight genomic islands are found in the three pathogens. Among them, six genomic islands are found to have mobility elements, which constitute a kind of conserved character of genomic islands, and this introduces the possibility that they are genuine genomic islands. The present results show that the cumulative GC profile, when combined with the h and BCN indexes, is a good method for predicting genomic islands in bacteria and that it has a lower false positive rate than the SIGI method. In particular, three genomic islands are found to contain clusters of genes coding for production of virulence factors; this is useful for research into the pathogenicity of these pathogens and helpful for the treatment of diseases caused by them.

  1. Genome-wide prediction of C. elegans genetic interactions.

    Science.gov (United States)

    Zhong, Weiwei; Sternberg, Paul W

    2006-03-10

    To obtain a global view of functional interactions among genes in a metazoan genome, we computationally integrated interactome data, gene expression data, phenotype data, and functional annotation data from three model organisms-Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster-and predicted genome-wide genetic interactions in C. elegans. The resulting genetic interaction network (consisting of 18,183 interactions) provides a framework for system-level understanding of gene functions. We experimentally tested the predicted interactions for two human disease-related genes and identified 14 new modifiers.

  2. [The Human Genome Project and the right to intellectual property].

    Science.gov (United States)

    Cambrón, A

    2000-01-01

    The Human Genome Project was designed to achieve two objectives. The scientific goal was the mapping and sequencing of the human genome and the social objective was to benefit the health and well-being of humanity. Although the first objective is nearing successful conclusion, the same cannot be said for the second, mainly because the benefits will take some time to be applicable and effective, but also due to the very nature of the project. The HGP also had a clear economic dimension, which has had a major bearing on its social side. Operating in the midst of these three dimensions is the right to intellectual property (although not just this right), which has facilitated the granting of patents on human genes. Put another way, the carrying out of the HGP has required the privatisation of knowledge of the human genome, and this can be considered an attack on the genetic heritage of mankind.

  3. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations

    Science.gov (United States)

    Mallick, Swapan; Li, Heng; Lipson, Mark; Mathieson, Iain; Gymrek, Melissa; Racimo, Fernando; Zhao, Mengyao; Chennagiri, Niru; Nordenfelt, Susanne; Tandon, Arti; Skoglund, Pontus; Lazaridis, Iosif; Sankararaman, Sriram; Fu, Qiaomei; Rohland, Nadin; Renaud, Gabriel; Erlich, Yaniv; Willems, Thomas; Gallo, Carla; Spence, Jeffrey P.; Song, Yun S.; Poletti, Giovanni; Balloux, Francois; van Driem, George; de Knijff, Peter; Romero, Irene Gallego; Jha, Aashish R.; Behar, Doron M.; Bravi, Claudio M.; Capelli, Cristian; Hervig, Tor; Moreno-Estrada, Andres; Posukh, Olga L.; Balanovska, Elena; Balanovsky, Oleg; Karachanak-Yankova, Sena; Sahakyan, Hovhannes; Toncheva, Draga; Yepiskoposyan, Levon; Tyler-Smith, Chris; Xue, Yali; Abdullah, M. Syafiq; Ruiz-Linares, Andres; Beall, Cynthia M.; Di Rienzo, Anna; Jeong, Choongwon; Starikovskaya, Elena B.; Metspalu, Ene; Parik, Jüri; Villems, Richard; Henn, Brenna M.; Hodoglugil, Ugur; Mahley, Robert; Sajantila, Antti; Stamatoyannopoulos, George; Wee, Joseph T. S.; Khusainova, Rita; Khusnutdinova, Elza; Litvinov, Sergey; Ayodo, George; Comas, David; Hammer, Michael; Kivisild, Toomas; Klitz, William; Winkler, Cheryl; Labuda, Damian; Bamshad, Michael; Jorde, Lynn B.; Tishkoff, Sarah A.; Watkins, W. Scott; Metspalu, Mait; Dryomov, Stanislav; Sukernik, Rem; Singh, Lalji; Thangaraj, Kumarasamy; Pääbo, Svante; Kelso, Janet; Patterson, Nick; Reich, David

    2016-01-01

    We report the Simons Genome Diversity Project (SGDP) dataset: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioral modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that in other non-Africans. PMID:27654912

  4. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.

    Science.gov (United States)

    Mallick, Swapan; Li, Heng; Lipson, Mark; Mathieson, Iain; Gymrek, Melissa; Racimo, Fernando; Zhao, Mengyao; Chennagiri, Niru; Nordenfelt, Susanne; Tandon, Arti; Skoglund, Pontus; Lazaridis, Iosif; Sankararaman, Sriram; Fu, Qiaomei; Rohland, Nadin; Renaud, Gabriel; Erlich, Yaniv; Willems, Thomas; Gallo, Carla; Spence, Jeffrey P; Song, Yun S; Poletti, Giovanni; Balloux, Francois; van Driem, George; de Knijff, Peter; Romero, Irene Gallego; Jha, Aashish R; Behar, Doron M; Bravi, Claudio M; Capelli, Cristian; Hervig, Tor; Moreno-Estrada, Andres; Posukh, Olga L; Balanovska, Elena; Balanovsky, Oleg; Karachanak-Yankova, Sena; Sahakyan, Hovhannes; Toncheva, Draga; Yepiskoposyan, Levon; Tyler-Smith, Chris; Xue, Yali; Abdullah, M Syafiq; Ruiz-Linares, Andres; Beall, Cynthia M; Di Rienzo, Anna; Jeong, Choongwon; Starikovskaya, Elena B; Metspalu, Ene; Parik, Jüri; Villems, Richard; Henn, Brenna M; Hodoglugil, Ugur; Mahley, Robert; Sajantila, Antti; Stamatoyannopoulos, George; Wee, Joseph T S; Khusainova, Rita; Khusnutdinova, Elza; Litvinov, Sergey; Ayodo, George; Comas, David; Hammer, Michael F; Kivisild, Toomas; Klitz, William; Winkler, Cheryl A; Labuda, Damian; Bamshad, Michael; Jorde, Lynn B; Tishkoff, Sarah A; Watkins, W Scott; Metspalu, Mait; Dryomov, Stanislav; Sukernik, Rem; Singh, Lalji; Thangaraj, Kumarasamy; Pääbo, Svante; Kelso, Janet; Patterson, Nick; Reich, David

    2016-10-13

    Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.

  5. Genomic prediction of traits related to canine hip dysplasia

    Directory of Open Access Journals (Sweden)

    Enrique Sanchez-Molano

    2015-03-01

    Increased concern for the welfare of pedigree dogs has led to development of selection programs against inherited diseases. An example is canine hip dysplasia (CHD), which has a moderate heritability and a high prevalence in some large-sized breeds. To date, selection using phenotypes has led to only modest improvement, and alternative strategies such as genomic selection may prove more effective. The primary aims of this study were to compare the performance of pedigree- and genomic-based breeding against CHD in the UK Labrador retriever population and to evaluate the performance of different genomic selection methods. A sample of 1179 Labrador Retrievers evaluated for CHD according to the UK scoring method (hip score, HS) was genotyped with the Illumina CanineHD BeadChip. Twelve functions of HS and its component traits were analyzed using different statistical methods (GBLUP, Bayes C and Single-Step methods), and results were compared with a pedigree-based approach (BLUP) using cross-validation. Genomic methods resulted in similar or higher accuracies than pedigree-based methods with training sets of 944 individuals for all but the untransformed HS, suggesting that genomic selection is an effective strategy. GBLUP and Bayes C gave similar prediction accuracies for HS and related traits, indicating a polygenic architecture. This conclusion was also supported by the low accuracies obtained in additional GBLUP analyses performed using only the SNPs with highest test statistics, also indicating that marker-assisted selection would not be as effective as genomic selection. A Single-Step method that combines genomic and pedigree information also showed higher accuracy than GBLUP and Bayes C for the log-transformed HS, which is currently used for pedigree-based evaluations in the UK. In conclusion, genomic selection is a promising alternative to pedigree-based selection against CHD, requiring more phenotypes with genomic data to improve further the accuracy

  6. Enhancing Biology Instruction with the Human Genome Project

    Science.gov (United States)

    Buxeda, Rosa J.; Moore-Russo, Deborah A.

    2003-01-01

    The Human Genome Project (HGP) is a recent scientific milestone that has received notable attention. This article shows how a biology course is using the HGP to enhance students' experiences by providing awareness of cutting edge research, with information on new emerging career options, and with opportunities to consider ethical questions raised…

  7. The Human Genome Project: Biology, Computers, and Privacy.

    Science.gov (United States)

    Cutter, Mary Ann G.; Drexler, Edward; Gottesman, Kay S.; Goulding, Philip G.; McCullough, Laurence B.; McInerney, Joseph D.; Micikas, Lynda B.; Mural, Richard J.; Murray, Jeffrey C.; Zola, John

    This module, for high school teachers, is the second of two modules about the Human Genome Project (HGP) produced by the Biological Sciences Curriculum Study (BSCS). The first section of this module provides background information for teachers about the structure and objectives of the HGP, aspects of the science and technology that underlie the…

  8. Human Genome Project and cystic fibrosis--a symbiotic relationship.

    Science.gov (United States)

    Tolstoi, L G; Smith, C L

    1999-11-01

    When Watson and Crick determined the structure of DNA in 1953, a biological revolution began. One result of this revolution is the Human Genome Project. The primary goal of this international project is to obtain the complete nucleotide sequence of the human genome by the year 2005. Although molecular biologists and geneticists are most enthusiastic about the Human Genome Project, all areas of clinical medicine and fields of biology will be affected. Cystic fibrosis is the most common, inherited, lethal disease of white persons. In 1989, researchers located the cystic fibrosis gene on the long arm of chromosome 7 by a technique known as positional cloning. The most common mutation (a 3-base pair deletion) of the cystic fibrosis gene occurs in 70% of patients with cystic fibrosis. The knowledge gained from genetic research on cystic fibrosis will help researchers develop new therapies (e.g., gene) and improve standard therapies (e.g., pharmacologic) so that a patient's life span is increased and quality of life is improved. The purpose of this review is twofold. First, the article provides an overview of the Human Genome Project and its clinical significance in advancing interdisciplinary care for patients with cystic fibrosis. Second, the article includes a discussion of the genetic basis, pathophysiology, and management of cystic fibrosis.

  9. Reconsidering democracy - History of the human genome project

    NARCIS (Netherlands)

    Huijer, M

    What options are open for people (citizens, politicians, and other nonscientists) to become actively involved in and anticipate new directions in the life sciences? In addressing this question, this article focuses on the start of the Human Genome Project (1985-1990). By contrasting various models of

  10. Reconsidering democracy - History of the human genome project

    NARCIS (Netherlands)

    Huijer, M

    2003-01-01

    What options are open for people (citizens, politicians, and other nonscientists) to become actively involved in and anticipate new directions in the life sciences? In addressing this question, this article focuses on the start of the Human Genome Project (1985-1990). By contrasting various models of

  11. Mapping our genes: The genome projects: How big, how fast

    Energy Technology Data Exchange (ETDEWEB)

    none,

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  12. Mapping Our Genes: The Genome Projects: How Big, How Fast

    Science.gov (United States)

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  13. Relevance of the Human Genome Project to inherited metabolic disease.

    Science.gov (United States)

    Burn, J

    1994-01-01

    The Human Genome Project is an international effort to identify the complete structure of the human genome. HUGO, the Human Genome Organization, facilitates international cooperation and exchange of information while the Genome Data Base will act as the on-line information retrieval and storage system for the huge amount of information being accumulated. The clinical register MIM (Mendelian Inheritance in Man) established by Victor McKusick is now an on-line resource that will allow biochemists working with inborn errors of metabolism to access the rapidly expanding body of knowledge. Biochemical and molecular genetics are complementary and should draw together to find solutions to the academic and clinical problems posed by inborn errors of metabolism.

  14. Using Genetic Distance to Infer the Accuracy of Genomic Prediction.

    Directory of Open Access Journals (Sweden)

    Marco Scutari

    2016-09-01

    The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest, and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.

  15. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    Science.gov (United States)

    Wang, Jia; Chen, Dijun; Lei, Yang; Chang, Ji-Wei; Hao, Bao-Hai; Xing, Feng; Li, Sen; Xu, Qiang; Deng, Xiu-Xin; Chen, Ling-Ling

    2014-01-01

    Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all the fruit crops in the world. Sweet orange accounts for more than half of Citrus production, both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to researchers of sweet orange and other woody plants, and is freely available at http://citrus.hzau.edu.cn/.

  16. The African Genome Variation Project shapes medical genetics in Africa.

    Science.gov (United States)

    Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O; Choudhury, Ananyo; Ritchie, Graham R S; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N; Young, Elizabeth H; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S

    2015-01-15

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.

  17. Genomic Prediction of Testcross Performance in Canola (Brassica napus).

    Directory of Open Access Journals (Sweden)

    Habib U Jan

    Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, the potential of genomic selection methods to predict testcross performance for hybrid canola breeding was assessed for various agronomic traits based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits including seedling emergence, days to flowering, lodging, oil yield and seed yield along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population as well as within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81), followed by oil yield (0.75), and lowest for seedling emergence (0.29). For seed yield, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately; a strategy which only led to moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. Testcross performance prediction using genome-wide SNP markers shows

  18. An assessment on epitope prediction methods for protozoa genomes

    Directory of Open Access Journals (Sweden)

    Resende Daniela M

    2012-11-01

    Background: Epitope prediction using computational methods represents one of the most promising approaches to vaccine development. Reduction of time, cost, and the availability of completely sequenced genomes are key points and highly motivating regarding the use of reverse vaccinology. Parasites of genus Leishmania are widely spread and they are the etiologic agents of leishmaniasis. Currently, there is no efficient vaccine against this pathogen and the drug treatment is highly toxic. The lack of sufficiently large datasets of experimentally validated parasite epitopes represents a serious limitation, especially for trypanosomatid genomes. In this work we highlight the predictive performances of several algorithms that were evaluated through the development of a MySQL database built with the purpose of: (a) evaluating individual algorithms' prediction performances and their combination for CD8+ T cell epitopes, B-cell epitopes and subcellular localization by means of AUC (Area Under Curve) performance and a threshold-dependent method that employs a confusion matrix; (b) integrating data from experimentally validated and in silico predicted epitopes; and (c) integrating the subcellular localization predictions and experimental data. NetCTL, NetMHC, BepiPred, BCPred12, and AAP12 algorithms were used for in silico epitope prediction and WoLF PSORT, Sigcleave and TargetP for in silico subcellular localization prediction against trypanosomatid genomes. Results: A database-driven epitope prediction method was developed with built-in functions that were capable of: (a) removing experimental data redundancy; (b) parsing algorithm predictions and storing experimentally validated and predicted data; and (c) evaluating algorithm performances. Results show that a better performance is achieved when the combined prediction is considered. This is particularly true for B cell epitope predictors, where the combined prediction of AAP12 and BCPred12 reached an AUC value

  19. [Prediction in medicine--genome contra envirome].

    Science.gov (United States)

    Brdicka, Radim

    2012-01-01

    Human phenotype is governed by its genotype--a set of genetic information materialized in DNA. Using traditional terminology we speak about a little more than 20 thousand genes that differ in strength to become realized and whose effect is modified by a large number of other genes. The result originates from firmly established programmes we obtained from our ancestors. Development and activity of such molecules selected for maintenance, copying and transfer of information, i.e. nucleic acids, can be followed back to the very origin of life. Nevertheless the final result is achieved not only by confrontation of the original information with other genetic information but largely also by external influences--environment. Though we are relatively successful in understanding what we have inherited from our parents, our knowledge of environmental factors and their effects on formation of the phenotype is still limited. From this point of view medical prediction always has to be very cautious and interpretations at the probability level must be done by a very experienced and responsible professional.

  20. The Human Genome Diversity (HGD) Project. Summary document

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1993-12-31

    In 1991 a group of human geneticists and molecular biologists proposed to the scientific community that a world wide survey be undertaken of variation in the human genome. To aid their considerations, the committee therefore decided to hold a small series of international workshops to explore the major scientific issues involved. The intention was to define a framework for the project which could provide a basis for much wider and more detailed discussion and planning--it was recognized that the successful implementation of the proposed project, which has come to be known as the Human Genome Diversity (HGD) Project, would not only involve scientists but also various national and international non-scientific groups all of which should contribute to the project's development. The international HGD workshop held in Sardinia in September 1993 was the last in the initial series of planning workshops. As such it not only explored new ground but also pulled together into a more coherent form much of the formal and informal discussion that had taken place in the preceding two years. This report presents the deliberations of the Sardinia workshop within a consideration of the overall development of the HGD Project to date.

  1. The environmental genome project: ethical, legal, and social implications.

    OpenAIRE

    Sharp, R R; Barrett, J. C.

    2000-01-01

    The National Institute of Environmental Health Sciences is supporting a multiyear research initiative examining genetic influences on environmental response. Proponents of this new initiative, known as the Environmental Genome Project, hope that the information learned will improve our understanding of environmentally associated diseases and allow clinicians and public health officials to target disease-prevention strategies to those who are at increased risk. Despite these potential benefits...

  2. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models

    Directory of Open Access Journals (Sweden)

    Surovcik Katharina

    2006-03-01

    Full Text Available Abstract Background Horizontal gene transfer (HGT) is considered a strong evolutionary force shaping the content of microbial genomes in a substantial manner. It is the difference in speed enabling the rapid adaptation to changing environmental demands that distinguishes HGT from gene genesis, duplications or mutations. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs) or more specifically pathogenicity or symbiotic islands. Results We have implemented the program SIGI-HMM that predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of codon usage (CU) of each individual gene of a genome under study. CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. Thus, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working on gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in JAVA and is publicly available. It accepts as input any file created according to the EMBL-format. It generates output in the common GFF format readable for genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were consistent both with annotated GIs and with predictions generated by different methods. Conclusion SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows users to interactively analyze genomes in detail and to generate or to test hypotheses about the origin of acquired

  3. An infinitesimal model for quantitative trait genomic value prediction.

    Directory of Open Access Journals (Sweden)

    Zhiqiu Hu

    Full Text Available We developed a marker based infinitesimal model for quantitative trait analysis. In contrast to the classical infinitesimal model, we now have new information about the segregation of every individual locus of the entire genome. Under this new model, we propose that the genetic effect of an individual locus is a function of the genome location (a continuous quantity). The overall genetic value of an individual is the weighted integral of the genetic effect function along the genome. Numerical integration is performed to find the integral, which requires partitioning the entire genome into a finite number of bins. Each bin may contain many markers. The integral is approximated by the weighted sum of all the bin effects. We now turn the problem of marker analysis into bin analysis so that the model dimension has decreased from a virtual infinity to a finite number of bins. This new approach can efficiently handle a virtually unlimited number of markers without marker selection. The marker based infinitesimal model requires high linkage disequilibrium of all markers within a bin. For populations with low or no linkage disequilibrium, we develop an adaptive infinitesimal model. Both the original and the adaptive models are tested using simulated data as well as beef cattle data. The simulated data analysis shows that there is always an optimal number of bins at which the predictability of the bin model is much greater than the original marker analysis. The result of the beef cattle data analysis indicates that the bin model can increase the predictability from 10% (multiple marker analysis) to 33% (multiple bin analysis). The marker based infinitesimal model paves a way towards the solution of genetic mapping and genomic selection using the whole genome sequence data.
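
    The bin idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-count binning rule, the use of the mean genotype as a bin score, and the function names `bin_genotypes` and `genetic_value` are all assumptions made for illustration.

```python
from typing import List, Sequence


def bin_genotypes(genotypes: Sequence[Sequence[float]], n_bins: int) -> List[List[float]]:
    """Collapse marker columns into n_bins contiguous bins.

    Assumption (not from the abstract): bins hold roughly equal numbers of
    markers, and a bin's score is the mean genotype code of its markers.
    """
    n_markers = len(genotypes[0])
    if not 1 <= n_bins <= n_markers:
        raise ValueError("n_bins must be between 1 and the number of markers")
    # Integer bin boundaries along the ordered markers.
    edges = [i * n_markers // n_bins for i in range(n_bins + 1)]
    return [
        [sum(row[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]
        for row in genotypes
    ]


def genetic_value(bin_scores: Sequence[float],
                  bin_effects: Sequence[float],
                  bin_weights: Sequence[float]) -> float:
    """Weighted sum of bin effects, standing in for the integral along the genome."""
    return sum(w * z * g for w, z, g in zip(bin_weights, bin_scores, bin_effects))


# One individual, six ordered markers coded 0/1/2, collapsed into three bins.
scores = bin_genotypes([[0, 1, 2, 2, 1, 0]], n_bins=3)[0]
value = genetic_value(scores, bin_effects=[1.0, 0.5, -1.0], bin_weights=[1, 1, 1])
```

    In this sketch the model dimension drops from six marker effects to three bin effects; in practice the bin effects would be estimated from phenotype data rather than supplied by hand.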

  4. Predicting human genetic interactions from cancer genome evolution.

    Directory of Open Access Journals (Sweden)

    Xiaowen Lu

    Full Text Available Synthetic Lethal (SL) genetic interactions play a key role in various types of biological research, ranging from understanding genotype-phenotype relationships to identifying drug-targets against cancer. Despite recent advances in empirically measuring SL interactions in human cells, the human genetic interaction map is far from complete. Here, we present a novel approach to predict this map by exploiting patterns in cancer genome evolution. First, we show that empirically determined SL interactions are reflected in various gene presence, absence, and duplication patterns in hundreds of cancer genomes. The most evident pattern that we discovered is that when one member of an SL interaction gene pair is lost, the other gene tends not to be lost, i.e. the absence of co-loss. This observation is in line with expectation, because the loss of an SL interacting pair will be lethal to the cancer cell. SL interactions are also reflected in gene expression profiles, such as an under-representation of cases where the genes in an SL pair are both under-expressed, and an over-representation of cases where one gene of an SL pair is under-expressed, while the other one is over-expressed. We integrated the various previously unknown cancer genome patterns and the gene expression patterns into a computational model to identify SL pairs. This simple, genome-wide model achieves a high prediction power (AUC = 0.75) for known genetic interactions. It allows us to present for the first time a comprehensive genome-wide list of SL interactions with a high estimated prediction precision, covering up to 591,000 gene pairs. This unique list can potentially be used in various application areas ranging from biotechnology to medical genetics.

  5. Genomic prediction for tuberculosis resistance in dairy cattle.

    Directory of Open Access Journals (Sweden)

    Smaragda Tsairidou

    Full Text Available BACKGROUND: The increasing prevalence of bovine tuberculosis (bTB) in the UK and the limitations of the currently available diagnostic and control methods require the development of complementary approaches to assist in the sustainable control of the disease. One potential approach is the identification of animals that are genetically more resistant to bTB, to enable breeding of animals with enhanced resistance. This paper focuses on prediction of resistance to bTB. We explore estimation of direct genomic estimated breeding values (DGVs) for bTB resistance in UK dairy cattle, using dense SNP chip data, and test these genomic predictions for situations when disease phenotypes are not available on selection candidates. METHODOLOGY/PRINCIPAL FINDINGS: We estimated DGVs using genomic best linear unbiased prediction methodology, and assessed their predictive accuracies with a cross validation procedure and receiver operator characteristic (ROC) curves. Furthermore, these results were compared with theoretical expectations for prediction accuracy and area-under-the-ROC-curve (AUC). The dataset comprised 1151 Holstein-Friesian cows (bTB cases or controls). All individuals (592 cases and 559 controls) were genotyped for 727,252 loci (Illumina Bead Chip). The estimated observed heritability of bTB resistance was 0.23±0.06 (0.34 on the liability scale) and five-fold cross validation, replicated six times, provided a prediction accuracy of 0.33 (95% C.I.: 0.26, 0.40). ROC curves, and the resulting AUC, gave a probability of 0.58, averaged across six replicates, of correctly classifying cows as diseased or as healthy based on SNP chip genotype alone using these data. CONCLUSIONS/SIGNIFICANCE: These results provide a first step in the investigation of the potential feasibility of genomic selection for bTB resistance using SNP data. Specifically, they demonstrate that genomic selection is possible, even in populations with no pedigree data and on animals lacking b

  6. Psoriasis prediction from genome-wide SNP profiles

    Directory of Open Access Journals (Sweden)

    Fang Xiangzhong

    2011-01-01

    Full Text Available Abstract Background With the availability of large-scale genome-wide association study (GWAS) data, choosing an optimal set of SNPs for disease susceptibility prediction is a challenging task. This study aimed to use single nucleotide polymorphisms (SNPs) to predict psoriasis by searching GWAS data. Methods In total, we had 2,798 samples and 451,724 SNPs. The process of searching for a set of SNPs to predict susceptibility to psoriasis consisted of two steps. The first was to search for the top 1,000 SNPs with high accuracy for prediction of psoriasis from the GWAS dataset. The second was to search for an optimal SNP subset for predicting psoriasis. The sequential information bottleneck (sIB) method was compared with classical linear discriminant analysis (LDA) for classification performance. Results The best test harmonic mean of sensitivity and specificity for predicting psoriasis by sIB was 0.674 (95% CI: 0.650-0.698), while only 0.520 (95% CI: 0.472-0.524) was reported for predicting disease by LDA. Our results indicate that the new classifier sIB performs better than LDA in the study. Conclusions The fact that a small set of SNPs can predict disease status with average accuracy of 68% makes it possible to use SNP data for psoriasis prediction.
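
    The evaluation metric named in this abstract, the harmonic mean of sensitivity and specificity, can be computed as in the following sketch. The function names and the toy labels are hypothetical and serve only to show the arithmetic; they are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two rates; defined as 0.0 when both are zero."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


# Toy example: 4 cases and 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)  # 0.75 and 0.5
score = harmonic_mean(sens, spec)                      # 0.6
```

    Unlike plain accuracy, this score stays low when a classifier does well on only one of the two classes, which may be why it suits case-control data.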

  7. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.

    Science.gov (United States)

    Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S

    2015-06-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome

  8. Predicting disease trait with genomic data: a composite kernel approach.

    Science.gov (United States)

    Yang, Haitao; Li, Shaoyu; Cao, Hongyan; Zhang, Chichen; Cui, Yuehua

    2016-06-02

    With the advancement of biotechniques, a vast amount of genomic data is generated with no limit. Predicting a disease trait based on these data offers a cost-effective and time-efficient way for early disease screening. Here we proposed a composite kernel partial least squares (CKPLS) regression model for quantitative disease trait prediction focusing on genomic data. It can efficiently capture nonlinear relationships among features compared with linear learning algorithms such as Least Absolute Shrinkage and Selection Operator or ridge regression. We proposed to optimize the kernel parameters and kernel weights with the genetic algorithm (GA). In addition to improved performance for parameter optimization, the proposed GA-CKPLS approach also has better learning capacity and generalization ability compared with single kernel-based KPLS method as well as other nonlinear prediction models such as the support vector regression. Extensive simulation studies demonstrated that GA-CKPLS had better prediction performance than its counterparts under different scenarios. The utility of the method was further demonstrated through two case studies. Our method provides an efficient quantitative platform for disease trait prediction based on increasing volume of omics data.

  9. Genomic prediction in a breeding program of perennial ryegrass

    DEFF Research Database (Denmark)

    Fé, Dario; Ashraf, Bilal; Greve-Pedersen, Morten

    2015-01-01

    We present a genomic selection study performed on 1918 rye grass families (Lolium perenne L.), which were derived from a commercial breeding program at DLF-Trifolium, Denmark. Phenotypes were recorded on standard plots, across 13 years and in 6 different countries. Variants were identified...... this set. Estimated Breeding Value and prediction accuracies were calculated through two different cross-validation schemes: (i) k-fold (k=10); (ii) leaving out one parent combination at a time, in order to test for accuracy of predicting new families. Accuracies ranged between 0.56 and 0.97 for scheme (i....... A larger set of 1791 F2s were used as training set to predict EBVs of 127 synthetic families (originated from poly-crosses between 5-11 single plants) for heading date and crown rust resistance. Prediction accuracies were 0.93 and 0.57 respectively. Results clearly demonstrate considerable potential...

  10. Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

    Directory of Open Access Journals (Sweden)

    Sungkyoung Choi

    2016-12-01

    Full Text Available The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the “large p and small n” problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.

  11. Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

    Science.gov (United States)

    Choi, Sungkyoung; Bae, Sunghwan

    2016-01-01

    The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the “large p and small n” problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.
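
    Several records in this list report AUC as their measure of risk prediction performance. As a minimal sketch, AUC can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The function name `auc` and the toy scores below are hypothetical; the quadratic pairwise loop is chosen for clarity and is not how large studies would compute it.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of cases (1) and controls (0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Count case/control pairs ranked correctly; a tie counts as half.
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy example: three of the four case/control pairs are ranked correctly.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])  # 0.75
```

    On this reading, a value of 0.5 corresponds to ranking cases and controls no better than chance.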

  12. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Fenner, Marsha W; Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2007-12-31

    The Genomes On Line Database (GOLD) is a comprehensive resource of information for genome and metagenome projects world-wide. GOLD provides access to complete and ongoing projects and their associated metadata through pre-computed lists and a search page. The database currently incorporates information for more than 2900 sequencing projects, of which 639 have been completed and the data deposited in the public databases. GOLD is constantly expanding to provide metadata information related to the project and the organism and is compliant with the Minimum Information about a Genome Sequence (MIGS) specifications.

  13. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    Energy Technology Data Exchange (ETDEWEB)

    Liolios, Konstantinos; Chen, Amy; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Phil; Markowitz, Victor; Kyrpides, Nikos C.

    2009-09-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification.

  14. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected...... itself. Depending on the trait’s economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage...... was observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from...

  15. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects

    Directory of Open Access Journals (Sweden)

    Holt Carson

    2011-12-01

    Full Text Available Abstract Background Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. Results We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. Conclusions MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  16. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.

    Science.gov (United States)

    Holt, Carson; Yandell, Mark

    2011-12-22

    Second-generation sequencing technologies are precipitating major shifts with regards to what kinds of genomes are being sequenced and how they are annotated. While the first generation of genome projects focused on well-studied model organisms, many of today's projects involve exotic organisms whose genomes are largely terra incognita. This complicates their annotation, because unlike first-generation projects, there are no pre-existing 'gold-standard' gene-models with which to train gene-finders. Improvements in genome assembly and the wide availability of mRNA-seq data are also creating opportunities to update and re-annotate previously published genome annotations. Today's genome projects are thus in need of new genome annotation tools that can meet the challenges and opportunities presented by second-generation sequencing technologies. We present MAKER2, a genome annotation and data management tool designed for second-generation genome projects. MAKER2 is a multi-threaded, parallelized application that can process second-generation datasets of virtually any size. We show that MAKER2 can produce accurate annotations for novel genomes where training-data are limited, of low quality or even non-existent. MAKER2 also provides an easy means to use mRNA-seq data to improve annotation quality; and it can use these data to update legacy annotations, significantly improving their quality. We also show that MAKER2 can evaluate the quality of genome annotations, and identify and prioritize problematic annotations for manual review. MAKER2 is the first annotation engine specifically designed for second-generation genome projects. MAKER2 scales to datasets of any size, requires little in the way of training data, and can use mRNA-seq data to improve annotation quality. It can also update and manage legacy genome annotation datasets.

  17. The life cycle of a genome project: perspectives and guidelines inspired by insect genome projects [version 1; referees: 2 approved, 1 approved with reservations]

    Directory of Open Access Journals (Sweden)

    Alexie Papanicolaou

    2016-01-01

    Full Text Available Many research programs on non-model species biology have been empowered by genomics. In turn, genomics is underpinned by a reference sequence and ancillary information created by so-called “genome projects”. The most reliable genome projects are the ones created as part of an active research program and designed to address specific questions but their life extends past publication. In this opinion paper I outline four key insights that have facilitated maintaining genomic communities: the key role of computational capability, the iterative process of building genomic resources, the value of community participation and the importance of manual curation. Taken together, these ideas can and do ensure the longevity of genome projects and the growing non-model species community can use them to focus a discussion with regards to its future genomic infrastructure.

  18. Prospects for the Chinese Human Genome Project (HGP) at the beginning of next century

    Institute of Scientific and Technical Information of China (English)

    2000-01-01

    Chinese Human Genome Project (CHGP) as part of the international human genome research has achieved significant progress and created a solid foundation for further development. While participating in the human genome sequencing and gene discovery, the emphasis of CHGP in the next century will be laid on functional genomics. The strategy, resources and some policy issues will be addressed.

  19. Predicting genome-wide redundancy using machine learning

    Directory of Open Access Journals (Sweden)

    Shasha Dennis E

    2010-11-01

    Full Text Available Abstract Background Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data is available, such as Arabidopsis thaliana, the test case used here. Results Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. Conclusions Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

  20. Documenting genomics: Applying archival theory to preserving the records of the Human Genome Project.

    Science.gov (United States)

    Shaw, Jennifer

    2016-02-01

    The Human Genome Archive Project (HGAP) aimed to preserve the documentary heritage of the UK's contribution to the Human Genome Project (HGP) by using archival theory to develop a suitable methodology for capturing the results of modern, collaborative science. After assessing past projects and different archival theories, the HGAP used an approach based on the theory of documentation strategy to try to capture the records of a scientific project that had an influence beyond the purely scientific sphere. The HGAP was an archival survey that ran for two years. It led to ninety scientists being contacted and has, so far, led to six collections being deposited in the Wellcome Library, with additional collections being deposited in other UK repositories. In applying documentation strategy the HGAP was attempting to move away from traditional archival approaches to science, which have generally focused on retired Nobel Prize winners. It has been partially successful in this aim, having managed to secure collections from people who are not 'big names', but who made an important contribution to the HGP. However, the attempt to redress the gender imbalance in scientific collections and to improve record-keeping in scientific organisations has continued to be difficult to achieve.

  1. KRAS Genomic Status Predicts the Sensitivity of Ovarian Cancer Cells to Decitabine | Office of Cancer Genomics

    Science.gov (United States)

    Decitabine, a cancer therapeutic that inhibits DNA methylation, produces variable antitumor response rates in patients with solid tumors that might be leveraged clinically with identification of a predictive biomarker. In this study, we profiled the response of human ovarian, melanoma, and breast cancer cells treated with decitabine, finding that RAS/MEK/ERK pathway activation and DNMT1 expression correlated with cytotoxic activity. Further, we showed that KRAS genomic status predicted decitabine sensitivity in low-grade and high-grade serous ovarian cancer cells.

  2. Predicting statistical properties of open reading frames in bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Katharina Mir

    Full Text Available An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes, such as codon composition and sequence length of all reading frames, was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
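
    The idea of predicting ORF lengths from sequence composition can be illustrated with a much simpler null model than the one in the abstract. The sketch below is not the authors' model: it assumes independent nucleotides with P(G) = P(C) and P(A) = P(T), derives the chance that a random codon is a stop codon, and treats the distance to the next stop as geometrically distributed. All function names are hypothetical.

```python
def stop_codon_probability(gc_content):
    """Probability that a random codon is TAA, TAG or TGA.

    Illustrative assumption: nucleotides are independent, with
    P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    p_at = (1.0 - gc_content) / 2.0  # P(A) and P(T)
    p_g = gc_content / 2.0           # P(G)
    # P(TAA) + P(TAG) + P(TGA)
    return p_at ** 3 + 2.0 * p_at ** 2 * p_g


def expected_codons_to_stop(gc_content):
    """Mean number of codons read up to and including the first stop
    codon (mean of a geometric distribution, 1/p)."""
    p = stop_codon_probability(gc_content)
    if p == 0.0:
        raise ValueError("no stop codons are possible at this GC content")
    return 1.0 / p


def prob_frame_open(gc_content, n_codons):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1.0 - stop_codon_probability(gc_content)) ** n_codons


if __name__ == "__main__":
    for gc in (0.21, 0.50, 0.74):
        print(gc, round(expected_codons_to_stop(gc), 1))
```

    Under these assumptions, stop codons become rarer as GC content rises, so random open frames are expected to be longer in GC-rich genomes. A comparison of observed ORF lengths against such a baseline is the kind of analysis in which an excess of long ORFs would show up.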

  3. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

    Directory of Open Access Journals (Sweden)

    Vijaykumar Yogesh Muley

    Full Text Available BACKGROUND: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. METHODS: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. CONCLUSIONS: Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling

  4. Genomic medicine and risk prediction across the disease spectrum.

    Science.gov (United States)

    Kotze, Maritha J; Lückhoff, Hilmar K; Peeters, Armand V; Baatjes, Karin; Schoeman, Mardelle; van der Merwe, Lize; Grant, Kathleen A; Fisher, Leslie R; van der Merwe, Nicole; Pretorius, Jacobus; van Velden, David P; Myburgh, Ettienne J; Pienaar, Fredrieka M; van Rensburg, Susan J; Yako, Yandiswa Y; September, Alison V; Moremi, Kelebogile E; Cronje, Frans J; Tiffin, Nicki; Bouwens, Christianne S H; Bezuidenhout, Juanita; Apffelstaedt, Justus P; Hough, F Stephen; Erasmus, Rajiv T; Schneider, Johann W

    2015-01-01

    Genomic medicine is based on the knowledge that virtually every medical condition, disease susceptibility or response to treatment is caused, regulated or influenced by genes. Genetic testing may therefore add value across the disease spectrum, ranging from single-gene disorders with a Mendelian inheritance pattern to complex multi-factorial diseases. The critical factors for genomic risk prediction are to determine: (1) where the genomic footprint of a particular susceptibility or dysfunction resides within this continuum, and (2) to what extent the genetic determinants are modified by environmental exposures. Regarding the small subset of highly penetrant monogenic disorders, a positive family history and early disease onset are mostly sufficient to determine the appropriateness of genetic testing in the index case and to inform pre-symptomatic diagnosis in at-risk family members. In more prevalent polygenic non-communicable diseases (NCDs), the use of appropriate eligibility criteria is required to ensure a balance between benefit and risk. An additional screening step may therefore be necessary to identify individuals most likely to benefit from genetic testing. This need provided the stimulus for the development of a pathology-supported genetic testing (PSGT) service as a new model for the translational implementation of genomic medicine in clinical practice. PSGT is linked to the establishment of a research database proven to be an invaluable resource for the validation of novel and previously described gene-disease associations replicated in the South African population for a broad range of NCDs associated with increased cardio-metabolic risk. The clinical importance of inquiry concerning family history in determining eligibility for personalized genotyping was supported beyond its current limited role in diagnosing or screening for monogenic subtypes of NCDs. With the recent introduction of advanced microarray-based breast cancer subtyping, genetic testing

  5. Prediction of Unsteady Transonic Aerodynamics Project

    Data.gov (United States)

    National Aeronautics and Space Administration — An accurate prediction of aero-elastic effects depends on an accurate prediction of the unsteady aerodynamic forces. Perhaps the most difficult speed regime is...

  6. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots.

    Directory of Open Access Journals (Sweden)

    Sunita Kumari

    Full Text Available Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to transcription start site (TSS) exhibited positional conservation within monocots and dicots with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from that of the non-regulatory genome sequence. It also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed to better understand regulatory mechanisms of transcription.

  7. The evolution of genomic imprinting: theories, predictions and empirical tests.

    Science.gov (United States)

    Patten, M M; Ross, L; Curley, J P; Queller, D C; Bonduriansky, R; Wolf, J B

    2014-08-01

    The epigenetic phenomenon of genomic imprinting has motivated the development of numerous theories for its evolutionary origins and genomic distribution. In this review, we examine the three theories that have best withstood theoretical and empirical scrutiny. These are: Haig and colleagues' kinship theory; Day and Bonduriansky's sexual antagonism theory; and Wolf and Hager's maternal-offspring coadaptation theory. These theories have fundamentally different perspectives on the adaptive significance of imprinting. The kinship theory views imprinting as a mechanism to change gene dosage, with imprinting evolving because of the differential effect that gene dosage has on the fitness of matrilineal and patrilineal relatives. The sexual antagonism and maternal-offspring coadaptation theories view genomic imprinting as a mechanism to modify the resemblance of an individual to its two parents, with imprinting evolving to increase the probability of expressing the fitter of the two alleles at a locus. In an effort to stimulate further empirical work on the topic, we carefully detail the logic and assumptions of all three theories, clarify the specific predictions of each and suggest tests to discriminate between these alternative theories for why particular genes are imprinted.

  8. Genomic prediction contributing to a promising global strategy to turbocharge gene banks.

    Science.gov (United States)

    Yu, Xiaoqing; Li, Xianran; Guo, Tingting; Zhu, Chengsong; Wu, Yuye; Mitchell, Sharon E; Roozeboom, Kraig L; Wang, Donghai; Wang, Ming Li; Pederson, Gary A; Tesso, Tesfaye T; Schnable, Patrick S; Bernardo, Rex; Yu, Jianming

    2016-10-03

    The 7.4 million plant accessions in gene banks are largely underutilized due to various resource constraints, but current genomic and analytic technologies are enabling us to mine this natural heritage. Here we report a proof-of-concept study to integrate genomic prediction into a broad germplasm evaluation process. First, a set of 962 biomass sorghum accessions were chosen as a reference set by germplasm curators. With high throughput genotyping-by-sequencing (GBS), we genetically characterized this reference set with 340,496 single nucleotide polymorphisms (SNPs). A set of 299 accessions was selected as the training set to represent the overall diversity of the reference set, and we phenotypically characterized the training set for biomass yield and other related traits. Cross-validation with multiple analytical methods using the data of this training set indicated high prediction accuracy for biomass yield. Empirical experiments with a 200-accession validation set chosen from the reference set confirmed high prediction accuracy. The potential to apply the prediction model to broader genetic contexts was also examined with an independent population. Detailed analyses on prediction reliability provided new insights into strategy optimization. The success of this project illustrates that a global, cost-effective strategy may be designed to assess the vast amount of valuable germplasm archived in 1,750 gene banks.

  9. Short communication: Validation of genomic breeding value predictions for feed intake and feed efficiency traits

    NARCIS (Netherlands)

    Pryce, J.E.; Wales, W.J.; Haas, de Y.; Veerkamp, R.F.; Hayes, B.J.; Coffey, M.P.; Marett, L.C.; Bornhill, J.B.; Gonzalez-Recio, O.

    2014-01-01

    Validating genomic prediction equations in independent populations is an important part of evaluating genomic selection. Published genomic predictions from 2 studies on (1) residual feed intake and (2) dry matter intake (DMI) were validated in a cohort of 78 multiparous Holsteins from Australia. The

  10. Are Psychotherapeutic Changes Predictable? Comparison of a Chicago Counseling Center Project with a Penn Psychotherapy Project.

    Science.gov (United States)

    Luborsky, Lester; And Others

    1979-01-01

    Compared studies predicting outcomes of psychotherapy. Level of prediction success in both projects was modest. Particularly for the rated benefits score, the profile of variables showed similar levels of success between the projects. Successful predictions were based on adequacy of personality functioning, match on marital status, and length of…

  11. Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

    Institute of Scientific and Technical Information of China (English)

    Heng Li; Tao Liu; Hai-Hong Li; Yan Li; Li-Jun Fang; Hui-Min Xie; Wei-Mou Zheng; Bai-Lin Hao; Jin-Song Liu; Zhao Xu; Jiao Jin; Lin Fang; Lei Gao; Yu-Dong Li; Zi-Xing Xing; Shao-Gen Gao

    2005-01-01

    With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNA to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENESH and BGF. The predictions were compared on nucleotide, exon and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, the complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
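
    As an illustration of an exon-level comparison, the sketch below computes the two measures conventionally reported for gene finders: sensitivity (correct exons divided by annotated exons) and specificity (correct exons divided by predicted exons). It assumes that an exon counts as correct only when both boundaries match exactly. The function name and coordinates are hypothetical, and this is not the evaluation code used in the study.

```python
def exon_level_accuracy(annotated_exons, predicted_exons):
    """Return (sensitivity, specificity) at the exon level.

    Exons are (start, end) tuples; a predicted exon is correct only if
    it matches an annotated exon exactly at both boundaries.
    """
    annotated = set(annotated_exons)
    predicted = set(predicted_exons)
    correct = len(annotated & predicted)
    sensitivity = correct / len(annotated) if annotated else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    annotated = [(1, 100), (200, 300), (400, 500)]
    predicted = [(1, 100), (200, 310)]  # second exon misses the 3' boundary
    print(exon_level_accuracy(annotated, predicted))
```

    Note that "specificity" in the gene-prediction sense is the fraction of predictions that are correct, which other fields would call precision.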

  12. Automated protein function prediction--the genomic challenge.

    Science.gov (United States)

    Friedberg, Iddo

    2006-09-01

    Overwhelmed with genomic data, biologists are facing the first big post-genomic question--what do all genes do? First, not only is the volume of pure sequence and structure data growing, but its diversity is growing as well, leading to a disproportionate growth in the number of uncharacterized gene products. Consequently, established methods of gene and protein annotation, such as homology-based transfer, are annotating less data and in many cases are amplifying existing erroneous annotation. Second, there is a need for a functional annotation which is standardized and machine readable so that function prediction programs could be incorporated into larger workflows. This is problematic due to the subjective and contextual definition of protein function. Third, there is a need to assess the quality of function predictors. Again, the subjectivity of the term 'function' and the various aspects of biological function make this a challenging effort. This article briefly outlines the history of automated protein function prediction and surveys the latest innovations in all three topics.

  13. Probabilistic protein function prediction from heterogeneous genome-wide data.

    Directory of Open Access Journals (Sweden)

    Naoki Nariai

    Full Text Available Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone, at 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
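
    Comparing predictors by their recall at a fixed precision can be sketched as follows. This is a minimal illustration rather than the authors' evaluation code: it ranks predictions by score, walks down the ranking, and reports the highest recall reached at any cutoff whose precision is at least the requested level. The function name is hypothetical, and tied scores are simply taken in sorted order.

```python
def recall_at_precision(scores, labels, min_precision=0.5):
    """Highest recall over all score cutoffs with precision >= min_precision.

    scores: predicted confidence per item (higher = more confident).
    labels: 1 for a true annotation, 0 otherwise.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    true_positives = 0
    for n_called, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        precision = true_positives / n_called
        if precision >= min_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 0, 1, 0, 0, 1]
    print(recall_at_precision(scores, labels, min_precision=0.6))
```

    Running the same function on the scores of two predictors over the same labelled set gives one way to express a difference in recall at a shared precision level.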

  14. Programming Useful Life Prediction (PULP) Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Accurately predicting Remaining Useful Life (RUL) provides significant benefits—it increases safety and reduces financial and labor resource requirements....

  15. The GenABEL Project for statistical genomics.

    Science.gov (United States)

    Karssen, Lennart C; van Duijn, Cornelia M; Aulchenko, Yurii S

    2016-01-01

    Development of free/libre open source software is usually done by a community of people with an interest in the tool. For scientific software, however, this is less often the case. Most scientific software is written by only a few authors, often a student working on a thesis. Once the paper describing the tool has been published, the tool is no longer developed further and is left to its own devices. Here we describe the broad, multidisciplinary community we formed around a set of tools for statistical genomics. The GenABEL project for statistical omics actively promotes open interdisciplinary development of statistical methodology and its implementation in efficient and user-friendly software under an open source licence. The software tools developed within the project collectively make up the GenABEL suite, which currently consists of eleven tools. The open framework of the project actively encourages involvement of the community in all stages, from formulation of methodological ideas to application of software to specific data sets. A web forum is used to channel user questions and discussions, further promoting the use of the GenABEL suite. Developer discussions take place on a dedicated mailing list, and development is further supported by robust development practices including use of public version control, code review and continuous integration. Use of this open science model attracts contributions from users and developers outside the "core team", facilitating agile statistical omics methodology development and fast dissemination.

  16. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
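
    The composition features that such a method relies on can be sketched in a few lines. The code below covers only the feature-extraction step, under assumptions of our own: it slides a window along an unannotated sequence and records the relative frequency of every k-mer in each window. The window size, step and k are illustrative values, not the settings used by GI-SVM, and the function names are hypothetical. The resulting vectors are the kind of input one could pass to a one-class SVM to flag windows with atypical composition.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of each A/C/G/T k-mer in seq, in lexicographic
    order. K-mers containing other symbols (e.g. N) are ignored."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def window_features(genome, window=1000, step=500, k=2):
    """Return a list of (start, k-mer frequency vector), one per window."""
    features = []
    last_start = max(len(genome) - window + 1, 1)
    for start in range(0, last_start, step):
        segment = genome[start:start + window]
        features.append((start, kmer_frequencies(segment, k)))
    return features


if __name__ == "__main__":
    toy_genome = "ACGT" * 300 + "GGGCCC" * 100 + "ACGT" * 300
    for start, vector in window_features(toy_genome, window=600, step=600):
        print(start, [round(v, 2) for v in vector[:4]])
```

    In the toy sequence the middle block is GC-rich, so windows that overlap it yield k-mer vectors that differ from the rest, which is the composition bias the abstract refers to.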

  17. Human Genome Teacher Networking Project, Final Report, April 1, 1992 - March 31, 1998

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Debra

    1999-10-01

    Project to provide education regarding the ethical, legal and social implications of the Human Genome Project to high school science teachers through two consecutive summer workshops, in-class activities, and peer teaching workshops.

  18. The UK Human Genome Mapping Project online computing service.

    Science.gov (United States)

    Rysavy, F R; Bishop, M J; Gibbs, G P; Williams, G W

    1992-04-01

    This paper presents an overview of computing and networking facilities developed by the Medical Research Council to provide online computing support to the Human Genome Mapping Project (HGMP) in the UK. The facility is connected to a number of other computing facilities in various centres of genetics and molecular biology research excellence, either directly via high-speed links or through national and international wide-area networks. The paper describes the design and implementation of the current system, a 'client/server' network of Sun, IBM, DEC and Apple servers, gateways and workstations. A short outline of online computing services currently delivered by this system to the UK human genetics research community is also provided. More information about the services and their availability could be obtained by a direct approach to the UK HGMP-RC.

  19. Citrus sinensis annotation project (CAP): a comprehensive database for sweet orange genome.

    Directory of Open Access Journals (Sweden)

    Jia Wang

    Full Text Available Citrus is one of the most important and widely grown fruit crops, with global production ranking first among all the fruit crops in the world. Sweet orange accounts for more than half of the Citrus production both in fresh fruit and processed juice. We have sequenced the draft genome of a double-haploid sweet orange (C. sinensis cv. Valencia), and constructed the Citrus sinensis annotation project (CAP) to store and visualize the sequenced genomic and transcriptome data. CAP provides GBrowse-based organization of sweet orange genomic data, which integrates ab initio gene prediction, EST, RNA-seq and RNA-paired end tag (RNA-PET) evidence-based gene annotation. Furthermore, we provide a user-friendly web interface to show the predicted protein-protein interactions (PPIs) and metabolic pathways in sweet orange. CAP provides comprehensive information beneficial to the researchers of sweet orange and other woody plants, which is freely available at http://citrus.hzau.edu.cn/.

  20. Comparative genomics boosts target prediction for bacterial small RNAs.

    Science.gov (United States)

    Wright, Patrick R; Richter, Andreas S; Papenfort, Kai; Mann, Martin; Vogel, Jörg; Hess, Wolfgang R; Backofen, Rolf; Georg, Jens

    2013-09-10

    Small RNAs (sRNAs) constitute a large and heterogeneous class of bacterial gene expression regulators. Much like eukaryotic microRNAs, these sRNAs typically target multiple mRNAs through short seed pairing, thereby acting as global posttranscriptional regulators. In some bacteria, evidence for hundreds to possibly more than 1,000 different sRNAs has been obtained by transcriptome sequencing. However, the experimental identification of possible targets and, therefore, their confirmation as functional regulators of gene expression has remained laborious. Here, we present a strategy that integrates phylogenetic information to predict sRNA targets at the genomic scale and reconstructs regulatory networks upon functional enrichment and network analysis (CopraRNA, for Comparative Prediction Algorithm for sRNA Targets). Furthermore, CopraRNA precisely predicts the sRNA domains for target recognition and interaction. When applied to several model sRNAs, CopraRNA revealed additional targets and functions for the sRNAs CyaR, FnrS, RybB, RyhB, SgrS, and Spot42. Moreover, the mRNAs gdhA, lrp, marA, nagZ, ptsI, sdhA, and yobF-cspC were suggested as regulatory hubs targeted by up to seven different sRNAs. The verification of many previously undetected targets by CopraRNA, even for extensively investigated sRNAs, demonstrates its advantages and shows that CopraRNA-based analyses can compete with experimental target prediction approaches. A Web interface allows high-confidence target prediction and efficient classification of bacterial sRNAs.

  1. Genomics: Tool to predict and prevent male infertility.

    Science.gov (United States)

    Halder, Ashutosh; Kumar, Prashant; Jain, Manish; Kalsi, Amanpreet Kaur

    2017-06-01

    A large number of human diseases arise as a result of genetic abnormalities. With the advent of improved molecular biology techniques, the number of recognized genetic etiologies of male infertility is increasing. The common genetic factors responsible for male infertility are chromosomal abnormalities, Yq microdeletion and cystic fibrosis. These are responsible for approximately 30 percent of cases of male infertility. About 40 percent of cases of male infertility are categorized as idiopathic. These cases may be associated with genetic and genomic abnormalities. During the last few years, more and more genes have been implicated in male infertility, leading to a decline in the prevalence of idiopathic etiology. In this review we summarize the up-to-date published work on genetic etiologies of male infertility, including our own. We also briefly describe reproductive technologies used to overcome male infertility, the dangers of transmitting genetic disorders to offspring and ways to prevent transmission of genetic disorders during assisted reproduction. Finally, we give our views on how genomic information can be utilized for prediction and prevention of male infertility in the coming years.

  2. Genomic prediction and genome-wide association analysis of female longevity in a composite beef cattle breed

    Science.gov (United States)

    Longevity is a highly important trait to the efficiency of beef cattle production. The objective of this study was to evaluate the genomic prediction of longevity and identify genomic regions associated with this trait. The data used in this study consisted of 547 Composite Gene Combination (CGC) c...

  3. Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

    Science.gov (United States)

    Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

    2014-12-01

    As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project, the Genome Science Project, this article presents a novel approach to formulating a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: making the draft of the template following an analysis of national and international policies; and refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.

  4. The lawful uses of knowledge from the Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Grad, F.P.

    1994-04-15

    Part I of this study deals with the right to know or not to know personal genetic information, and examines available legal protections of the right of privacy and the adverse effect of the disclosure of genetic information both on employment and insurance interests and on self esteem and protection of personal integrity. The study examines the rationale for the legal protection of privacy as the protection of a public interest. It examines the very limited protections currently available for privacy interests, including genetic privacy interests, and concludes that there is a need for broader, more far-reaching legal protections. The second part of the study is based on the assumption that as major a project as the Human Genome Project, spending billions of dollars on science which is health related, will indeed be applied for preventive and therapeutic public health purposes, as it has been in the past. It also addresses the recurring fear that public health initiatives in the genetic area must evolve a new eugenic agenda, that we must not repeat the miserable discriminatory experiences of the past.

  5. A Computational Tool for Helicopter Rotor Noise Prediction Project

    Data.gov (United States)

    National Aeronautics and Space Administration — This SBIR project proposes to develop a computational tool for helicopter rotor noise prediction based on hybrid Cartesian grid/gridless approach. The uniqueness of...

  6. The database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) project

    NARCIS (Netherlands)

    Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L L; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R P; Alhusseini, Tamera I; Bedford, Felicity E; Bennett, Dominic J; Booth, Hollie; Burton, Victoria J; Chng, Charlotte W T; Choimes, Argyrios; Correia, David L P; Day, Julie; Echeverría-Londoño, Susy; Emerson, Susan R; Gao, Di; Garon, Morgan; Harrison, Michelle L K; Ingram, Daniel J; Jung, Martin; Kemp, Victoria; Kirkpatrick, Lucinda; Martin, Callum D; Pan, Yuan; Pask-Hale, Gwilym D; Pynegar, Edwin L; Robinson, Alexandra N; Sanchez-Ortiz, Katia; Senior, Rebecca A; Simmons, Benno I; White, Hannah J; Zhang, Hanbin; Aben, Job; Abrahamczyk, Stefan; Adum, Gilbert B; Aguilar-Barquero, Virginia; Aizen, Marcelo A; Albertos, Belén; Alcala, E L; Del Mar Alguacil, Maria; Alignier, Audrey; Ancrenaz, Marc; Andersen, Alan N; Arbeláez-Cortés, Enrique; Armbrecht, Inge; Arroyo-Rodríguez, Víctor; Aumann, Tom; Axmacher, Jan C; Azhar, Badrul; Azpiroz, Adrián B; Baeten, Lander; Bakayoko, Adama; Báldi, András; Banks, John E; Baral, Sharad K; Barlow, Jos; Barratt, Barbara I P; Barrico, Lurdes; Bartolommei, Paola; Barton, Diane M; Basset, Yves; Batáry, Péter; Bates, Adam J; Baur, Bruno; Bayne, Erin M; Beja, Pedro; Benedick, Suzan; Berg, Åke; Bernard, Henry; Berry, Nicholas J; Bhatt, Dinesh; Bicknell, Jake E; Bihn, Jochen H; Blake, Robin J; Bobo, Kadiri S; Bóçon, Roberto; Boekhout, Teun; Böhning-Gaese, Katrin; Bonham, Kevin J; Borges, Paulo A V; Borges, Sérgio H; Boutin, Céline; Bouyer, Jérémy; Bragagnolo, Cibele; Brandt, Jodi S; Brearley, Francis Q; Brito, Isabel; Bros, Vicenç; Brunet, Jörg; Buczkowski, Grzegorz; Buddle, Christopher M; Bugter, Rob; Buscardo, Erika; Buse, Jörn; Cabra-García, Jimmy; Cáceres, Nilton C; Cagle, Nicolette L; Calviño-Cancela, María; Cameron, Sydney A; Cancello, Eliana M; Caparrós, Rut; Cardoso, Pedro; Carpenter, Dan; Carrijo, Tiago F; Carvalho, Anelena L; Cassano, Camila R; Castro, Helena; Castro-Luna, Alejandro A; Rolando, Cerda B; Cerezo, 
Alexis; Chapman, Kim Alan; Chauvat, Matthieu; Christensen, Morten; Clarke, Francis M; Cleary, Daniel F R; Colombo, Giorgio; Connop, Stuart P; Craig, Michael D; Cruz-López, Leopoldo; Cunningham, Saul A; D'Aniello, Biagio; D'Cruze, Neil; da Silva, Pedro Giovâni; Dallimer, Martin; Danquah, Emmanuel; Darvill, Ben; Dauber, Jens; Davis, Adrian L V; Dawson, Jeff; de Sassi, Claudio; de Thoisy, Benoit; Deheuvels, Olivier; Dejean, Alain; Devineau, Jean-Louis; Diekötter, Tim; Dolia, Jignasu V; Domínguez, Erwin; Dominguez-Haydar, Yamileth; Dorn, Silvia; Draper, Isabel; Dreber, Niels; Dumont, Bertrand; Dures, Simon G; Dynesius, Mats; Edenius, Lars; Eggleton, Paul; Eigenbrod, Felix; Elek, Zoltán; Entling, Martin H; Esler, Karen J; de Lima, Ricardo F; Faruk, Aisyah; Farwig, Nina; Fayle, Tom M; Felicioli, Antonio; Felton, Annika M; Fensham, Roderick J; Fernandez, Ignacio C; Ferreira, Catarina C; Ficetola, Gentile F; Fiera, Cristina; Filgueiras, Bruno K C; Fırıncıoğlu, Hüseyin K; Flaspohler, David; Floren, Andreas; Fonte, Steven J; Fournier, Anne; Fowler, Robert E; Franzén, Markus; Fraser, Lauchlan H; Fredriksson, Gabriella M; Freire, Geraldo B; Frizzo, Tiago L M; Fukuda, Daisuke; Furlani, Dario; Gaigher, René; Ganzhorn, Jörg U; García, Karla P; Garcia-R, Juan C; Garden, Jenni G; Garilleti, Ricardo; Ge, Bao-Ming; Gendreau-Berthiaume, Benoit; Gerard, Philippa J; Gheler-Costa, Carla; Gilbert, Benjamin; Giordani, Paolo; Giordano, Simonetta; Golodets, Carly; Gomes, Laurens G L; Gould, Rachelle K; Goulson, Dave; Gove, Aaron D; Granjon, Laurent; Grass, Ingo; Gray, Claudia L; Grogan, James; Gu, Weibin; Guardiola, Moisès; Gunawardene, Nihara R; Gutierrez, Alvaro G; Gutiérrez-Lamus, Doris L; Haarmeyer, Daniela H; Hanley, Mick E; Hanson, Thor; Hashim, Nor R; Hassan, Shombe N; Hatfield, Richard G; Hawes, Joseph E; Hayward, Matt W; Hébert, Christian; Helden, Alvin J; Henden, John-André; Henschel, Philipp; Hernández, Lionel; Herrera, James P; Herrmann, Farina; Herzog, Felix; Higuera-Diaz, 
Diego; Hilje, Branko; Höfer, Hubert; Hoffmann, Anke; Horgan, Finbarr G; Hornung, Elisabeth; Horváth, Roland; Hylander, Kristoffer; Isaacs-Cubides, Paola; Ishida, Hiroaki; Ishitani, Masahiro; Jacobs, Carmen T; Jaramillo, Víctor J; Jauker, Birgit; Hernández, F Jiménez; Johnson, McKenzie F; Jolli, Virat; Jonsell, Mats; Juliani, S Nur; Jung, Thomas S; Kapoor, Vena; Kappes, Heike; Kati, Vassiliki; Katovai, Eric; Kellner, Klaus; Kessler, Michael; Kirby, Kathryn R; Kittle, Andrew M; Knight, Mairi E; Knop, Eva; Kohler, Florian; Koivula, Matti; Kolb, Annette; Kone, Mouhamadou; Kőrösi, Ádám; Krauss, Jochen; Kumar, Ajith; Kumar, Raman; Kurz, David J; Kutt, Alex S; Lachat, Thibault; Lantschner, Victoria; Lara, Francisco; Lasky, Jesse R; Latta, Steven C; Laurance, William F; Lavelle, Patrick; Le Féon, Violette; LeBuhn, Gretchen; Légaré, Jean-Philippe; Lehouck, Valérie; Lencinas, María V; Lentini, Pia E; Letcher, Susan G; Li, Qi; Litchwark, Simon A; Littlewood, Nick A; Liu, Yunhui; Lo-Man-Hung, Nancy; López-Quintero, Carlos A; Louhaichi, Mounir; Lövei, Gabor L; Lucas-Borja, Manuel Esteban; Luja, Victor H; Luskin, Matthew S; MacSwiney G, M Cristina; Maeto, Kaoru; Magura, Tibor; Mallari, Neil Aldrin; Malone, Louise A; Malonza, Patrick K; Malumbres-Olarte, Jagoba; Mandujano, Salvador; Måren, Inger E; Marin-Spiotta, Erika; Marsh, Charles J; Marshall, E J P; Martínez, Eliana; Martínez Pastur, Guillermo; Moreno Mateos, David; Mayfield, Margaret M; Mazimpaka, Vicente; McCarthy, Jennifer L; McCarthy, Kyle P; McFrederick, Quinn S; McNamara, Sean; Medina, Nagore G; Medina, Rafael; Mena, Jose L; Mico, Estefania; Mikusinski, Grzegorz; Milder, Jeffrey C; Miller, James R; Miranda-Esquivel, Daniel R; Moir, Melinda L; Morales, Carolina L; Muchane, Mary N; Muchane, Muchai; Mudri-Stojnic, Sonja; Munira, A Nur; Muoñz-Alonso, Antonio; Munyekenye, B F; Naidoo, Robin; Naithani, A; Nakagawa, Michiko; Nakamura, Akihiro; Nakashima, Yoshihiro; Naoe, Shoji; Nates-Parra, Guiomar; Navarrete Gutierrez, Dario 
A; Navarro-Iriarte, Luis; Ndang'ang'a, Paul K; Neuschulz, Eike L; Ngai, Jacqueline T; Nicolas, Violaine; Nilsson, Sven G; Noreika, Norbertas; Norfolk, Olivia; Noriega, Jorge Ari; Norton, David A; Nöske, Nicole M; Nowakowski, A Justin; Numa, Catherine; O'Dea, Niall; O'Farrell, Patrick J; Oduro, William; Oertli, Sabine; Ofori-Boateng, Caleb; Oke, Christopher Omamoke; Oostra, Vicencio; Osgathorpe, Lynne M; Otavo, Samuel Eduardo; Page, Navendu V; Paritsis, Juan; Parra-H, Alejandro; Parry, Luke; Pe'er, Guy; Pearman, Peter B; Pelegrin, Nicolás; Pélissier, Raphaël; Peres, Carlos A; Peri, Pablo L; Persson, Anna S; Petanidou, Theodora; Peters, Marcell K; Pethiyagoda, Rohan S; Phalan, Ben; Philips, T Keith; Pillsbury, Finn C; Pincheira-Ulbrich, Jimmy; Pineda, Eduardo; Pino, Joan; Pizarro-Araya, Jaime; Plumptre, A J; Poggio, Santiago L; Politi, Natalia; Pons, Pere; Poveda, Katja; Power, Eileen F; Presley, Steven J; Proença, Vânia; Quaranta, Marino; Quintero, Carolina; Rader, Romina; Ramesh, B R; Ramirez-Pinilla, Martha P; Ranganathan, Jai; Rasmussen, Claus; Redpath-Downing, Nicola A; Reid, J Leighton; Reis, Yana T; Rey Benayas, José M; Rey-Velasco, Juan Carlos; Reynolds, Chevonne; Ribeiro, Danilo Bandini; Richards, Miriam H; Richardson, Barbara A; Richardson, Michael J; Ríos, Rodrigo Macip; Robinson, Richard; Robles, Carolina A; Römbke, Jörg; Romero-Duque, Luz Piedad; Rös, Matthias; Rosselli, Loreta; Rossiter, Stephen J; Roth, Dana S; Roulston, T'ai H; Rousseau, Laurent; Rubio, André V; Ruel, Jean-Claude; Sadler, Jonathan P; Sáfián, Szabolcs; Saldaña-Vázquez, Romeo A; Sam, Katerina; Samnegård, Ulrika; Santana, Joana; Santos, Xavier; Savage, Jade; Schellhorn, Nancy A; Schilthuizen, Menno; Schmiedel, Ute; Schmitt, Christine B; Schon, Nicole L; Schüepp, Christof; Schumann, Katharina; Schweiger, Oliver; Scott, Dawn M; Scott, Kenneth A; Sedlock, Jodi L; Seefeldt, Steven S; Shahabuddin, Ghazala; Shannon, Graeme; Sheil, Douglas; Sheldon, Frederick H; Shochat, Eyal; Siebert, Stefan 
J; Silva, Fernando A B; Simonetti, Javier A; Slade, Eleanor M; Smith, Jo; Smith-Pardo, Allan H; Sodhi, Navjot S; Somarriba, Eduardo J; Sosa, Ramón A; Soto Quiroga, Grimaldo; St-Laurent, Martin-Hugues; Starzomski, Brian M; Stefanescu, Constanti; Steffan-Dewenter, Ingolf; Stouffer, Philip C; Stout, Jane C; Strauch, Ayron M; Struebig, Matthew J; Su, Zhimin; Suarez-Rubio, Marcela; Sugiura, Shinji; Summerville, Keith S; Sung, Yik-Hei; Sutrisno, Hari; Svenning, Jens-Christian; Teder, Tiit; Threlfall, Caragh G; Tiitsaar, Anu; Todd, Jacqui H; Tonietto, Rebecca K; Torre, Ignasi; Tóthmérész, Béla; Tscharntke, Teja; Turner, Edgar C; Tylianakis, Jason M; Uehara-Prado, Marcio; Urbina-Cardona, Nicolas; Vallan, Denis; Vanbergen, Adam J; Vasconcelos, Heraldo L; Vassilev, Kiril; Verboven, Hans A F; Verdasca, Maria João; Verdú, José R; Vergara, Carlos H; Vergara, Pablo M; Verhulst, Jort; Virgilio, Massimiliano; Vu, Lien Van; Waite, Edward M; Walker, Tony R; Wang, Hua-Feng; Wang, Yanping; Watling, James I; Weller, Britta; Wells, Konstans; Westphal, Catrin; Wiafe, Edward D; Williams, Christopher D; Willig, Michael R; Woinarski, John C Z; Wolf, Jan H D; Wolters, Volkmar; Woodcock, Ben A; Wu, Jihua; Wunderle, Joseph M; Yamaura, Yuichi; Yoshikura, Satoko; Yu, Douglas W; Zaitsev, Andrey S; Zeidler, Juliane; Zou, Fasheng; Collen, Ben; Ewers, Rob M; Mace, Georgina M; Purves, Drew W; Scharlemann, Jörn P W; Purvis, Andy

    2017-01-01

    The PREDICTS project (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems, www.predicts.org.uk) has collated from published studies a large, reasonably representative database of comparable samples of biodiversity from multiple sites that differ in the nature or intensity of

  7. The database of the PREDICTS (Projecting Responses of Ecological Diversity in Changing Terrestrial Systems) project

    DEFF Research Database (Denmark)

    Hudson, Lawrence N; Newbold, Tim; Contu, Sara

    2017-01-01

    The PREDICTS project (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems, www.predicts.org.uk) has collated from published studies a large, reasonably representative database of comparable samples of biodiversity from multiple sites that differ in the nature or intensity ...

  9. PREDICTS: Projecting Responses of Ecological Diversity in Changing Terrestrial Systems

    Directory of Open Access Journals (Sweden)

    Georgina Mace

    2012-12-01

    Full Text Available The PREDICTS project (www.predicts.org.uk) is a three-year NERC-funded project to model and predict at a global scale how local terrestrial diversity responds to human pressures such as land use, land cover, pollution, invasive species and infrastructure. PREDICTS is a collaboration between Imperial College London, the UNEP World Conservation Monitoring Centre, Microsoft Research Cambridge, UCL and the University of Sussex. In order to meet its aims, the project relies on extensive data describing the diversity and composition of biological communities at a local scale. Such data are collected on a vast scale through the committed efforts of field ecologists. If you have appropriate data that you would be willing to share with us, please get in touch (enquiries@predicts.org.uk). All contributions will be acknowledged appropriately and all data contributors will be included as co-authors on an open-access paper describing the database.

  10. PRINCIPLE OF POINT MAKING OF MULTI-PROJECTION PREDICTIVE DECISION

    Directory of Open Access Journals (Sweden)

    Olga N. Lapaeva

    2016-01-01

    Full Text Available The article sets out the principle of point making for multi-projection predictive decisions in economics. The principle involves searching for the best variants in each projection and forming the result by intersecting the partial sets.
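
    The rule summarized in this record (pick the best variants per projection, then intersect the partial sets) can be sketched in a few lines. This is only an illustration: the variant names, the scores, the projection labels and the reading of "best" as "top k by score" are all assumptions and do not come from the article.

```python
def best_in_projection(scores, k):
    """Return the set of the k highest-scoring variants in one projection."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def point_decision(projections, k):
    """Intersect the per-projection sets of best variants (the partial sets)."""
    partial_sets = [best_in_projection(s, k) for s in projections.values()]
    return set.intersection(*partial_sets)


# Hypothetical scores of four variants in three projections (higher is better).
projections = {
    "cost": {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.2},
    "risk": {"A": 0.3, "B": 0.8, "C": 0.9, "D": 0.1},
    "growth": {"A": 0.6, "B": 0.9, "C": 0.5, "D": 0.7},
}

decision = point_decision(projections, k=2)
print(decision)
```

    With k=2 only variant B is among the best in every projection, so the intersection contains B alone. An empty intersection would mean that no single variant is best in all projections at that k.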

  11. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    Full Text Available Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ∼2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 (0.230±0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
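
    Two quantities named in this abstract, the genomic relationship matrix and predictive ability as a correlation, can be illustrated with a minimal sketch. The abstract does not say which formula the authors used for the matrix; the code below assumes the widely used form G = ZZ'/(2 Σ p(1−p)), with genotypes coded as allele counts and centred by twice the allele frequency. The genotypes and the predicted/observed values are made up, and the GBLUP model fit itself is not shown.

```python
import math


def genomic_relationship_matrix(genotypes):
    """Build G = ZZ' / (2 * sum(p * (1 - p))) from allele counts (0, 1 or 2).

    genotypes: one row per individual, one column per SNP.
    Assumed formula; monomorphic-only data would give a zero denominator.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP, estimated from the sample.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centre each genotype by twice the allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


def pearson(x, y):
    """Pearson correlation, used here as the measure of predictive ability."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical inbred lines (homozygous, so counts are 0 or 2) at four SNPs.
genotypes = [
    [0, 2, 0, 2],
    [0, 2, 2, 0],
    [2, 0, 2, 2],
    [2, 2, 0, 0],
]
G = genomic_relationship_matrix(genotypes)

# Hypothetical predicted genetic values and observed phenotypes for one fold.
predicted = [0.1, -0.3, 0.4, 0.0]
observed = [1.2, 0.4, 1.9, 1.1]
predictive_ability = pearson(predicted, observed)
```

    Because the allele frequencies are estimated from the same individuals, every row of G sums to zero, which is a convenient sanity check on the centring step.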

  12. An Integrative Pathway-based Clinical-genomic Model for Cancer Survival Prediction.

    Science.gov (United States)

    Chen, Xi; Wang, Lily; Ishwaran, Hemant

    2010-09-01

    Prediction models that use gene expression levels are now being proposed for personalized treatment of cancer, but building accurate models that are easy to interpret remains a challenge. In this paper, we describe an integrative clinical-genomic approach that combines both genomic pathway and clinical information. First, we summarize information from genes in each pathway using Supervised Principal Components (SPCA) to obtain pathway-based genomic predictors. Next, we build a prediction model based on clinical variables and pathway-based genomic predictors using Random Survival Forests (RSF). Our rationale for this two-stage procedure is that the underlying disease process may be influenced by environmental exposure (measured by clinical variables) and perturbations in different pathways (measured by pathway-based genomic variables), as well as their interactions. Using two cancer microarray datasets, we show that the pathway-based clinical-genomic model outperforms gene-based clinical-genomic models, with improved prediction accuracy and interpretability.

  13. Understanding the Human Genome Project -- A Fact Sheet

    Science.gov (United States)

    ... that contribute to human disease. In 1953, James Watson and Francis Crick described the double helix structure ... of sequencing whole exomes or genomes, groundbreaking comparative genomic studies are now identifying the causes of rare ...

  14. The evolution of the Anopheles 16 genomes project

    NARCIS (Netherlands)

    Neafsey, Daniel E.; Christophides, George K.; Collins, Frank H.; Emrich, Scott J.; Fontaine, Michael C.; Gelbart, William; Hahn, Matthew W.; Howell, Paul I.; Kafatos, Fotis C.; Lawson, Daniel; Muskavitch, Marc A. T.; Waterhouse, Robert M.; Williams, Louise J.; Besansky, Nora J.

    2013-01-01

    We report the imminent completion of a set of reference genome assemblies for 16 species of Anopheles mosquitoes. In addition to providing a generally useful resource for comparative genomic analyses, these genome sequences will greatly facilitate exploration of the capacity exhibited by some Anophe

  15. Genome projects 5W1H: what, where, when, why, how and in which population?

    Directory of Open Access Journals (Sweden)

    Pelin Fidanoğlu

    2014-05-01

    Full Text Available Genome projects aim to decode an organism's complete set of deoxyribonucleic acid (DNA), which can be described as the living code of the organism. The idea of the Human Genome Project (HGP) was conceived in the early 1980s. The project started in 1990 and finished in 2003. The sequencing of the whole human genome, derived from the DNA of several anonymous volunteers, cost 3.8 billion dollars. In order to annotate the genome data, the 'topography of the genome' and the anatomy of the genes had to be revealed. For this purpose, genome projects of several model organisms were carried out in parallel with the HGP, with the aim of identifying the basic structural components, organizational structure and evolutionary development of the genome. With the advent of microarray technology in the early 2000s, high-throughput screening of Single Nucleotide Polymorphisms (SNPs) and Copy Number Variations (CNVs) became feasible. After the completion of the HGP in 13 years, James D. Watson's genome was sequenced with a 1 million dollar budget in just 2 months using next generation sequencing technology. Today a human genome can be sequenced in just one day at a cost of 6,600 USD. In this review, the HGP, which created big expectations especially in medicine, will be explained from its start to the present. Then we will summarize the studies paving the road to personalized medicine, emphasizing that genomic information must become computable for its meaning to be revealed.

  16. A common reference population from four European Holstein populations increases reliability of genomic predictions

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; de Ross, Sander PW; de Vries, Alfred G

    2011-01-01

    Background Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to...

  18. Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.

    Science.gov (United States)

    Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan

    2016-11-01

    In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat (Triticum aestivum L.) and maize (Zea mays L.) data sets. For single-environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models showed up to 60 to 68% superiority over the corresponding single environment for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that account for small, more complex marker main effects and marker-specific interaction effects.
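
    The contrast between a linear and a Gaussian kernel drawn in this abstract can be made concrete with a small sketch. It assumes a Gaussian kernel of the form K(i, j) = exp(−h · d²(i, j) / q), where d² is the squared Euclidean distance between marker vectors and q is the median off-diagonal squared distance. The scaling by q, the marker coding and the toy data are assumptions for illustration. The bandwidth h is simply fixed here; neither kernel averaging nor the empirical Bayes estimate from the paper is implemented.

```python
import math
from statistics import median


def squared_distances(markers):
    """Pairwise squared Euclidean distances between marker vectors."""
    return [
        [sum((a - b) ** 2 for a, b in zip(r1, r2)) for r2 in markers]
        for r1 in markers
    ]


def linear_kernel(markers):
    """Linear kernel XX'/m, assuming markers are already centred or coded -1/1."""
    m = len(markers[0])
    return [
        [sum(a * b for a, b in zip(r1, r2)) / m for r2 in markers]
        for r1 in markers
    ]


def gaussian_kernel(markers, h=1.0):
    """Gaussian kernel exp(-h * d^2 / q) with q the median off-diagonal d^2."""
    d2 = squared_distances(markers)
    n = len(markers)
    off_diagonal = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    q = median(off_diagonal)  # must be > 0, i.e. lines must not all be identical
    return [[math.exp(-h * d2[i][j] / q) for j in range(n)] for i in range(n)]


# Hypothetical marker matrix: three lines, four markers coded -1/1.
markers = [
    [-1, 1, 1, -1],
    [1, 1, -1, -1],
    [-1, -1, 1, 1],
]
K_linear = linear_kernel(markers)
K_gauss = gaussian_kernel(markers, h=1.0)
```

    A larger h makes similarity decay faster with marker distance, which is the flexibility that a bandwidth-selection scheme exploits; the linear kernel has no such tuning parameter.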

  19. Reducing dimensionality for prediction of genome-wide breeding values

    Directory of Open Access Journals (Sweden)

    Woolliams John A

    2009-03-01

    Full Text Available Abstract Partial least square regression (PLSR) and principal component regression (PCR) are methods designed for situations where the number of predictors is larger than the number of records. The aim was to compare the accuracy of genome-wide breeding values (EBV) produced using PLSR and PCR with a Bayesian method, 'BayesB'. Marker densities of 1, 2, 4 and 8 Ne markers/Morgan were evaluated when the effective population size (Ne) was 100. The correlation between true breeding value and estimated breeding value increased with density from 0.611 to 0.681 and 0.604 to 0.658 using PLSR and PCR respectively, with an overall advantage to PLSR of 0.016 (s.e. = 0.008). Both methods gave a lower accuracy compared to 'BayesB', for which accuracy increased from 0.690 to 0.860. PLSR and PCR appeared less responsive to increased marker density, with the advantage of 'BayesB' increasing by 17% from a marker density of 1 to 8 Ne/M. PCR and PLSR showed greater bias than 'BayesB' in predicting breeding values at all densities. Although PLSR and PCR were computationally faster and simpler, these advantages do not outweigh the reduction in accuracy, and there is a benefit in obtaining relevant prior information from the distribution of gene effects.

  20. Outsmarting cancer: the power of hybrid genomic/proteomic biomarkers to predict drug response.

    Science.gov (United States)

    Rexer, Brent N; Arteaga, Carlos L

    2014-01-01

    A recent study by Niepel and colleagues describes a novel approach to predicting response to targeted anti-cancer therapies. The authors used biochemical profiling of signaling activity in basal and ligand-stimulated states for a panel of receptor and intracellular kinases to develop predictive models of drug sensitivity. In some cases, the response to ligand stimulation predicted drug response better than did target abundance or genomic alterations in the targeted pathway. Furthermore, combining biochemical profiles with genomic information was better at predicting drug response. This work suggests that incorporating biochemical signaling profiles with genomic alterations should provide powerful predictors of response to molecularly targeted therapies.

  1. Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor

    DEFF Research Database (Denmark)

    de los Campos, Gustavo; Vazquez, Ana I; Fernando, Rohan;

    2013-01-01

    Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations... by imperfect LD between markers and QTL is given by (1 − b)^2, where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome...

  2. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    Motivation: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome...... genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms......-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation....

  3. Simplification of Training Data for Cross-Project Defect Prediction

    OpenAIRE

    He, Peng; Li, Bing; Zhang, Deguang; Ma, Yutao

    2014-01-01

    Cross-project defect prediction (CPDP) plays an important role in estimating the most likely defect-prone software components, especially for new or inactive projects. To the best of our knowledge, few prior studies provide explicit guidelines on how to select suitable training data of quality from a large number of public software repositories. In this paper, we have proposed a training data simplification method for practical CPDP in consideration of multiple levels of granularity and filte...

  4. Comparative genomic data of the Avian Phylogenomics Project

    DEFF Research Database (Denmark)

    Zhang, Guojie; Li, Bo; Li, Cai;

    2014-01-01

    , which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts...... in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 ~ 17000 protein coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence...

  5. The database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) project

    OpenAIRE

    Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L.L.; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R. P.; Alhusseini, Tamera I.; Bedford, Felicity E.; Bennett, Dominic J.; Booth, Hollie; Burton, Victoria J.; Chng, Charlotte W. T.; Choimes, Argyrios; Correia, David L.P.

    2017-01-01

    The PREDICTS project (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems, www.predicts.org.uk) has collated from published studies a large, reasonably representative database of comparable samples of biodiversity from multiple sites that differ in the nature or intensity of human impacts relating to land use. We have used this evidence base to develop global and regional statistical models of how local biodiversity responds to these measures. We describe and make free...

  6. Drug target prediction and prioritization: using orthology to predict essentiality in parasite genomes

    Directory of Open Access Journals (Sweden)

    Hall Ross S

    2010-04-01

    Background: New drug targets are urgently needed for parasites of socio-economic importance. Genes that are essential for parasite survival are highly desirable targets, but information on these genes is lacking, as gene knockouts or knockdowns are difficult to perform in many species of parasites. We examined the applicability of large-scale essentiality information from four model eukaryotes, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Saccharomyces cerevisiae, to discover essential genes in each of their genomes. Parasite genes that lack orthologues in their host are desirable as selective targets, so we also examined prediction of essential genes within this subset. Results: Cross-species analyses showed that the evolutionary conservation of genes and the presence of essential orthologues are each strong predictors of essentiality in eukaryotes. Absence of paralogues was also found to be a general predictor of increased relative essentiality. By combining several orthology and essentiality criteria one can select gene sets with up to a five-fold enrichment in essential genes compared with a random selection. We show how quantitative application of such criteria can be used to predict a ranked list of potential drug targets from Ancylostoma caninum and Haemonchus contortus, two blood-feeding strongylid nematodes for which there are presently limited sequence data but no functional genomic tools. Conclusions: The present study demonstrates the utility of using orthology information from multiple, diverse eukaryotes to predict essential genes. The data also emphasize the challenge of identifying essential genes among those in a parasite that are absent from its host.

  7. Computational analysis and prediction for exons of PAC579 genomic sequence

    Institute of Scientific and Technical Information of China (English)

    HUANG, Yi

    2001-01-01


  8. Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

    Science.gov (United States)

    The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...

  9. Genomic prediction in a breeding program of perennial ryegrass

    DEFF Research Database (Denmark)

    Fé, Dario; Ashraf, Bilal; Greve-Pedersen, Morten;

    2015-01-01

    We present a genomic selection study performed on 1918 ryegrass families (Lolium perenne L.), which were derived from a commercial breeding program at DLF-Trifolium, Denmark. Phenotypes were recorded on standard plots, across 13 years and in 6 different countries. Variants were identified...... in utilizing genomic selection in ryegrass....

  10. An equation to predict the accuracy of genomic values by combining data from multiple traits, populations, or environments

    NARCIS (Netherlands)

    Wientjes, Y.C.J.; Bijma, P.; Veerkamp, R.F.; Calus, M.P.L.

    2016-01-01

    Predicting the accuracy of estimated genomic values using genome-wide marker information is an important step in designing training populations. Currently, different deterministic equations are available to predict accuracy within populations, but not for multipopulation scenarios where data from

  11. Data from: An equation to predict the accuracy of genomic values by combining data from multiple traits, populations, or environments

    NARCIS (Netherlands)

    Wientjes, Y.C.J.; Bijma, P.; Veerkamp, R.F.; Calus, M.P.L.

    2015-01-01

    Predicting the accuracy of estimated genomic values using genome-wide marker information is an important step in designing training populations. Currently, different deterministic equations are available to predict accuracy within populations, but not for multipopulation scenarios where data from

  12. Sixteen new lung function signals identified through 1000 Genomes Project reference panel imputation

    NARCIS (Netherlands)

    Artigas, Maria Soler; Wain, Louise V.; Miller, Suzanne; Kheirallah, Abdul Kader; Huffman, Jennifer E.; Ntalla, Ioanna; Shrine, Nick; Obeidat, Ma'en; Trochet, Holly; McArdle, Wendy L.; Alves, Alexessander Couto; Hui, Jennie; Zhao, Jing Hua; Joshi, Peter K.; Teumer, Alexander; Albrecht, Eva; Imboden, Medea; Rawal, Rajesh; Lopez, Lorna M.; Marten, Jonathan; Enroth, Stefan; Surakka, Ida; Polasek, Ozren; Lyytikainen, Leo-Pekka; Granell, Raquel; Hysi, Pirro G.; Flexeder, Claudia; Mahajan, Anubha; Beilby, John; Bosse, Yohan; Brandsma, Corry-Anke; Campbell, Harry; Gieger, Christian; Glaeser, Sven; Gonzalez, Juan R.; Grallert, Harald; Hammond, Chris J.; Harris, Sarah E.; Hartikainen, Anna-Liisa; Heliovaara, Markku; Henderson, John; Hocking, Lynne; Horikoshi, Momoko; Hutri-Kahonen, Nina; Ingelsson, Erik; Johansson, Asa; Kemp, John P.; Kolcic, Ivana; Kumar, Ashish; Lind, Lars; Melen, Erik; Musk, Arthur W.; Navarro, Pau; Nickle, David C.; Padmanabhan, Sandosh; Raitakari, Olli T.; Ried, Janina S.; Ripatti, Samuli; Schulz, Holger; Scott, Robert A.; Sin, Don D.; Starr, John M.; Vinuela, Ana; Voelzke, Henry; Wild, Sarah H.; Wright, Alan F.; Zemunik, Tatijana; Jarvis, Deborah L.; Spector, Tim D.; Evans, David M.; Lehtimaki, Terho; Vitart, Veronique; Kahonen, Mika; Gyllensten, Ulf; Rudan, Igor; Deary, Ian J.; Karrasch, Stefan; Probst-Hensch, Nicole M.; Heinrich, Joachim; Stubbe, Beate; Wilson, James F.; Wareham, Nicholas J.; James, Alan L.; Morris, Andrew P.; Jarvelin, Marjo-Riitta; Hayward, Caroline; Sayers, Ian; Strachan, David P.; Hall, Ian P.; Tobin, Martin D.; Deloukas, Panos; Hansell, Anna L.; Hubbard, Richard; Jackson, Victoria E.; Marchini, Jonathan; Pavord, Ian; Thomson, Neil C.; Zeggini, Eleftheria

    2015-01-01

    Lung function measures are used in the diagnosis of chronic obstructive pulmonary disease. In 38,199 European ancestry individuals, we studied genome-wide association of forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC) and FEV1/FVC with 1000 Genomes Project (phase 1)-imputed genot

  13. The Human Genome Project: Information access, management, and regulation. Final report

    Energy Technology Data Exchange (ETDEWEB)

    McInerney, J.D.; Micikas, L.B.

    1996-08-31

    The Human Genome Project is a large, internationally coordinated effort in biological research directed at creating a detailed map of human DNA. This report describes information access, management, and regulation within the project. The project led to the development of an instructional module titled The Human Genome Project: Biology, Computers, and Privacy, designed for use in high school biology classes. The module consists of print materials and both Macintosh and Windows versions of related computer software. Appendix A contains a copy of the print materials and discs containing the two versions of the software.

  14. Genomic prediction of starch content and chipping quality in tetraploid potato using genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Sverrisdóttir, Elsa; Byrne, Stephen; Sundmark, Ea Høegh Riis

    2017-01-01

    Genomic selection uses genome-wide molecular markers to predict performance of individuals and allows selections in the absence of direct phenotyping. It is regarded as a useful tool to accelerate genetic gain in breeding programs, and is becoming increasingly viable for crops as genotyping costs...... genomic estimated breeding values. Cross-validated prediction correlations of 0.56 and 0.73 were obtained within the training population for starch content and chipping quality, respectively, while correlations were lower when predicting performance in the test panel, at 0.30–0.31 and 0.......42–0.43, respectively. Predictions in the test panel were slightly improved when including representatives from the test panel in the training population but worsened when preceded by marker selection. Our results suggest that genomic prediction is feasible, however, the extremely high allelic diversity of tetraploid...

  15. Prediction of causative genomic relationships using sequence data of five French and Danish dairy cattle breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Boichard, Didier; Lund, Mogens Sandø

    and HD chips, or two 1 Kb intervals on both sides of each causative mutation, varying the distance between causative mutations and intervals from 1 base to 1 Mb. Subsequently, the regression coefficient of the genomic relationships at prediction markers on the genomic relationships at causal loci...... data is more likely to contain causative mutations and therefore increase the prediction accuracy in such populations. We studied the potential advantage of using real sequence data for prediction of genomic relationships at causative mutations using sequence data of chromosome 1 for 122 Holstein, 27...

  16. A System for Predicting Subcellular Localization of Yeast Genome Using Neural Network

    CERN Document Server

    Thampi, Sabu M

    2007-01-01

    The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. Many efforts have been made to predict protein subcellular localization. This paper aims to combine artificial neural networks and bioinformatics to predict the location of proteins in the yeast genome. We introduce a new subcellular prediction method based on a backpropagation neural network. The results show that prediction within an error limit of 5 to 10 percent can be achieved with the system.

  17. Improved prediction of genetic predisposition to psychiatric disorders using genomic feature best linear unbiased prediction models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Demontis, Ditte; Børglum, Anders

    Introduction: Accurate prediction of unobserved phenotypes from observed genotypes is essential for success in predicting disease risk from genotypes. However, the performance is somewhat limited. Genomic feature best linear unbiased prediction (GFBLUP) models separate the total genomic...... is enriched for causal variants. Here we apply the GFBLUP model to a small schizophrenia case-control study to test the promise of this model on psychiatric disorders, and hypothesize that the performance will be increased when applying the model to a larger ADHD case-control study if the genomic feature...... contains the causal variants. Materials and Methods: The schizophrenia study consisted of 882 controls and 888 schizophrenia cases genotyped for 520,000 SNPs. The ADHD study contained 25,954 controls and 16,663 ADHD cases with 8.4 million imputed genotypes. Results: The predictive ability for schizophrenia...

  18. Genomic prediction for Nordic Red Cattle using one-step and selection index blending

    DEFF Research Database (Denmark)

    Guosheng, Su; Madsen, Per; Nielsen, Ulrik Sander

    2012-01-01

    This study investigated the accuracy of direct genomic breeding values (DGV) using a genomic BLUP model, genomic enhanced breeding values (GEBV) using a one-step blending approach, and GEBV using a selection index blending approach for 15 traits of Nordic Red Cattle. The data comprised 6,631 bull......-step blending approach is a good alternative to predict GEBV in practical genetic evaluation programs....

  19. REVISITING MOLECULAR CLONING TO SOLVE GENOME SEQUENCING PROJECT CONFLICTS

    National Research Council Canada - National Science Library

    Hugo A Barrera-Saldaña; Aarón Daniel Ramírez-Sánchez; Tiffany Editth Palacios-Tovar; Dionicio Aguirre-Treviño; Saúl Felipe Karr-de-León

    2017-01-01

    .... Molecular cloning was chosen as the most straight-forward strategy to solve the dilemma. The initial characterization of recombinant plasmids by restriction enzyme digestion confirmed the presence of two genomic sequences...

  20. Effect of marker-data editing on the accuracy of genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2013-01-01

    Genomic selection is a method to predict breeding values using genome-wide single-nucleotide polymorphism (SNP) markers. High-quality marker data are necessary for genomic selection. The aim of this study was to investigate the effect of marker-editing criteria on the accuracy of genomic...... predictions in the Nordic Holstein and Jersey populations. Data included 4429 Holstein and 1071 Jersey bulls. In total, 48 222 SNP for Holstein and 44 305 SNP for Jersey were polymorphic. The SNP data were edited based on (i) minor allele frequencies (MAF) with thresholds of no limit, 0.001, 0.01, 0.02, 0.......05 and 0.10, (ii) deviations from Hardy–Weinberg proportions (HWP) with thresholds of no limit, chi-squared p-values of 0.001, 0.02, 0.05 and 0.10, and (iii) GenCall (GC) scores with thresholds of 0.15, 0.55, 0.60, 0.65 and 0.70. The marker data sets edited with different criteria were used for genomic...
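
    The minor allele frequency (MAF) criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 copies of one allele; the function names, the toy genotypes and the chosen threshold are hypothetical and are not taken from the study.

```python
def minor_allele_frequency(genotypes):
    """Return the MAF of one SNP from allele counts (0, 1 or 2) per animal."""
    called = [g for g in genotypes if g is not None]  # skip missing calls
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(freq, 1.0 - freq)


def filter_by_maf(snps, threshold):
    """Keep SNPs whose MAF is at least `threshold` (hypothetical helper)."""
    return {name: g for name, g in snps.items()
            if minor_allele_frequency(g) >= threshold}


# Toy data (assumed values): three SNPs genotyped on five animals.
snps = {
    "snp_a": [0, 1, 2, 1, 0],  # counted-allele frequency 0.4
    "snp_b": [0, 0, 0, 0, 1],  # counted-allele frequency 0.1
    "snp_c": [2, 2, 2, 2, 2],  # monomorphic, so MAF is 0
}
kept = filter_by_maf(snps, threshold=0.05)
```

    With these toy values, the monomorphic marker is dropped and the other two are retained; a stricter threshold would remove more markers.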

  1. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

  2. Data Mining Approaches for Genomic Biomarker Development: Applications Using Drug Screening Data from the Cancer Genome Project and the Cancer Cell Line Encyclopedia.

    Directory of Open Access Journals (Sweden)

    David G Covell

    Developing reliable biomarkers of tumor cell drug sensitivity and resistance can guide hypothesis-driven basic science research and influence pre-therapy clinical decisions. A popular strategy for developing biomarkers uses characterizations of human tumor samples against a range of cancer drug responses that correlate with genomic change; developed largely from the efforts of the Cancer Cell Line Encyclopedia (CCLE) and Sanger Cancer Genome Project (CGP). The purpose of this study is to provide an independent analysis of this data that aims to vet existing and add novel perspectives to biomarker discoveries and applications. Existing and alternative data mining and statistical methods will be used to a) evaluate drug responses of compounds with similar mechanism of action (MOA), b) examine measures of gene expression (GE), copy number (CN) and mutation status (MUT) biomarkers, combined with gene set enrichment analysis (GSEA), for hypothesizing biological processes important for drug response, c) conduct global comparisons of GE, CN and MUT as biomarkers across all drugs screened in the CGP dataset, and d) assess the positive predictive power of CGP-derived GE biomarkers as predictors of drug response in CCLE tumor cells. The perspectives derived from individual and global examinations of GEs, MUTs and CNs confirm existing and reveal unique and shared roles for these biomarkers in tumor cell drug sensitivity and resistance. Applications of CGP-derived genomic biomarkers to predict the drug response of CCLE tumor cells finds a highly significant ROC, with a positive predictive power of 0.78. The results of this study expand the available data mining and analysis methods for genomic biomarker development and provide additional support for using biomarkers to guide hypothesis-driven basic science research and pre-therapy clinical decisions.

  3. Data Mining Approaches for Genomic Biomarker Development: Applications Using Drug Screening Data from the Cancer Genome Project and the Cancer Cell Line Encyclopedia.

    Science.gov (United States)

    Covell, David G

    2015-01-01

    Developing reliable biomarkers of tumor cell drug sensitivity and resistance can guide hypothesis-driven basic science research and influence pre-therapy clinical decisions. A popular strategy for developing biomarkers uses characterizations of human tumor samples against a range of cancer drug responses that correlate with genomic change; developed largely from the efforts of the Cancer Cell Line Encyclopedia (CCLE) and Sanger Cancer Genome Project (CGP). The purpose of this study is to provide an independent analysis of this data that aims to vet existing and add novel perspectives to biomarker discoveries and applications. Existing and alternative data mining and statistical methods will be used to a) evaluate drug responses of compounds with similar mechanism of action (MOA), b) examine measures of gene expression (GE), copy number (CN) and mutation status (MUT) biomarkers, combined with gene set enrichment analysis (GSEA), for hypothesizing biological processes important for drug response, c) conduct global comparisons of GE, CN and MUT as biomarkers across all drugs screened in the CGP dataset, and d) assess the positive predictive power of CGP-derived GE biomarkers as predictors of drug response in CCLE tumor cells. The perspectives derived from individual and global examinations of GEs, MUTs and CNs confirm existing and reveal unique and shared roles for these biomarkers in tumor cell drug sensitivity and resistance. Applications of CGP-derived genomic biomarkers to predict the drug response of CCLE tumor cells finds a highly significant ROC, with a positive predictive power of 0.78. The results of this study expand the available data mining and analysis methods for genomic biomarker development and provide additional support for using biomarkers to guide hypothesis-driven basic science research and pre-therapy clinical decisions.
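
    For readers unfamiliar with the metric, the sketch below shows how a positive predictive value is computed, assuming that "positive predictive power" is used here in the usual sense of true positives divided by all positive predictions. The function name and the toy labels are hypothetical and are not data from the study.

```python
def positive_predictive_value(predicted, observed):
    """Fraction of positive predictions that are truly positive."""
    true_pos = sum(1 for p, o in zip(predicted, observed) if p and o)
    false_pos = sum(1 for p, o in zip(predicted, observed) if p and not o)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions to evaluate")
    return true_pos / (true_pos + false_pos)


# Toy labels (assumed): True means "drug sensitive".
predicted = [True, True, True, True, False, False]
observed = [True, True, True, False, True, False]
ppv = positive_predictive_value(predicted, observed)
```

    In this toy case three of the four positive predictions are correct, so the value is 0.75.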

  4. Large-Scale Release of Campylobacter Draft Genomes: Resources for Food Safety and Public Health from the 100K Pathogen Genome Project

    Science.gov (United States)

    Huang, Bihua C.; Storey, Dylan B.; Kong, Nguyet; Chen, Poyin; Arabyan, Narine; Gilpin, Brent; Mason, Carl; Townsend, Andrea K.; Smith, Woutrina A.; Byrne, Barbara A.; Taff, Conor C.

    2017-01-01

    Campylobacter is a food-associated bacterium and a leading cause of foodborne illness worldwide, being associated with poultry in the food supply. This is the initial public release of 202 Campylobacter genome sequences as part of the 100K Pathogen Genome Project. These isolates represent global genomic diversity in the Campylobacter genus. PMID:28057746

  5. Evaluation of the utility of gene expression and metabolic information for genomic prediction in maize.

    Science.gov (United States)

    Guo, Zhigang; Magwire, Michael M; Basten, Christopher J; Xu, Zhanyou; Wang, Daolong

    2016-12-01

    Predictive ability derived from gene expression and metabolic information was evaluated using genomic prediction methods based on datasets from a public maize panel. With the rapid development of high throughput biological technologies, information from gene expression and metabolites has received growing attention in plant genetics and breeding. In this study, we evaluated the utility of gene expression and metabolic information for genomic prediction using data obtained from a maize diversity panel. Our results show that, when used as predictor variables, gene expression levels and metabolite abundances provided reasonable predictive abilities relative to those based on genetic markers, although these values were not as large as those with genetic markers. Integrating gene expression levels and metabolite abundances with genetic markers significantly improved predictive abilities in comparison to the benchmark genomic best linear unbiased prediction model using genome-wide markers only. Predictive abilities based on gene expression and metabolites were trait-specific and were affected by the time of measurement and tissue samples as well as the number of genes and metabolites included in the model. In general, our results suggest that, rather than being conventionally used as intermediate phenotypes, gene expression and metabolic information can be used as predictors for genomic prediction and help improve genetic gains for complex traits in breeding programs.

  6. The FlyBase database of the Drosophila genome projects and community literature

    Energy Technology Data Exchange (ETDEWEB)

    Gelbart, William; Bayraktaroglu, Leyla; Bettencourt, Brian; Campbell, Kathy; Crosby, Madeline; Emmert, David; Hradecky, Pavel; Huang, Yanmei; Letovsky, Stan; Matthews, Beverly; Russo, Susan; Schroeder, Andrew; Smutniak, Frank; Zhou, Pinglei; Zytkovicz, Mark; Ashburner, Michael; Drysdale, Rachel; de Grey, Aubrey; Foulger, Rebecca; Millburn, Gillian; Yamada, Chihiro; Kaufman, Thomas; Matthews, Kathy; Gilbert, Don; Grumbling, Gary; Strelets, Victor; Shemen, C.; Rubin, Gerald; Berman, Brian; Frise, Erwin; Gibson, Mark; Harris, Nomi; Kaminker, Josh; Lewis, Suzanna; Marshall, Brad; Misra, Sima; Mungall, Christopher; Prochnik, Simon; Richter, John; Smith, Christopher; Shu, ShengQiang; Tupy, Jonathan; Wiel, Colin

    2002-09-16

    FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy.

  7. Establishing the basis for Genomic Prediction in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Fé, Dario

    2015-01-01

    Genomic Selection (GS) is a relatively new technology, which has already revolutionized animal breeding and which is expected to have a high impact on plant breeding. In contrast to traditional marker assisted breeding, which only focuses on specific genes, GS estimates the genetic value...... of individuals/families by using genomic information over the whole genome. The benefits of GS include reductions in expensive and time-consuming phenotyping operations, higher genetic gains, and simultaneous selection of multiple traits. To date, GS has primarily been tested in species that are grown...... as homogeneous varieties. For crops grown in heterogeneous families, investigations have been limited to a few theoretical considerations. The aim of the present thesis was to establish the basis for GS implementation in such species. Analyses were performed on real data from a breeding program of perennial...

  8. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota Morota

    2014-10-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  9. Kernel-based whole-genome prediction of complex traits: a review

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics. PMID:25360145
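
    To make the idea of a kernel on marker data concrete, here is a minimal sketch of one widely used non-parametric choice, the Gaussian kernel. The abstract does not say which kernels the review covers, so the kernel form, the bandwidth parametrization, the function names and the toy genotypes are all assumptions for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Similarity exp(-||x - y||^2 / bandwidth) between two marker vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / bandwidth)


def kernel_matrix(genotypes, bandwidth):
    """Pairwise kernel values for a list of individuals (hypothetical helper)."""
    return [[gaussian_kernel(x, y, bandwidth) for y in genotypes]
            for x in genotypes]


# Toy genotypes (assumed): three individuals, four markers coded 0/1/2.
genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [2, 1, 0, 2],
]
K = kernel_matrix(genotypes, bandwidth=4.0)
```

    Individuals with similar marker profiles receive kernel values close to 1, while dissimilar pairs approach 0; a matrix of this kind can stand in for an additive genomic relationship matrix in a kernel regression.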

  10. From structure prediction to genomic screens for novel non-coding RNAs

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.

    2011-01-01

    . This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early...... methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch...

  11. Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers.

    Science.gov (United States)

    Shepherd, Ross K; Meuwissen, Theo H E; Woolliams, John A

    2010-10-22

    The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time.

  12. A foundation for provitamin A biofortification of maize: genome-wide association and genomic prediction models of carotenoid levels.

    Science.gov (United States)

    Owens, Brenda F; Lipka, Alexander E; Magallanes-Lundback, Maria; Tiede, Tyler; Diepenbrock, Christine H; Kandianis, Catherine B; Kim, Eunha; Cepela, Jason; Mateos-Hernandez, Maria; Buell, C Robin; Buckler, Edward S; DellaPenna, Dean; Gore, Michael A; Rocheford, Torbert

    2014-12-01

    Efforts are underway for development of crops with improved levels of provitamin A carotenoids to help combat dietary vitamin A deficiency. As a global staple crop with considerable variation in kernel carotenoid composition, maize (Zea mays L.) could have a widespread impact. We performed a genome-wide association study (GWAS) of quantified seed carotenoids across a panel of maize inbreds ranging from light yellow to dark orange in grain color to identify some of the key genes controlling maize grain carotenoid composition. Significant associations at the genome-wide level were detected within the coding regions of zep1 and lut1, carotenoid biosynthetic genes not previously shown to impact grain carotenoid composition in association studies, as well as within previously associated lcyE and crtRB1 genes. We leveraged existing biochemical and genomic information to identify 58 a priori candidate genes relevant to the biosynthesis and retention of carotenoids in maize to test in a pathway-level analysis. This revealed dxs2 and lut5, genes not previously associated with kernel carotenoids. In genomic prediction models, the use of markers that targeted a small set of quantitative trait loci associated with carotenoid levels in prior linkage studies was as effective as genome-wide markers for predicting carotenoid traits. Based on GWAS, pathway-level analysis, and genomic prediction studies, we outline a flexible strategy involving use of a small number of genes that can be selected for rapid conversion of elite white grain germplasm, with minimal amounts of carotenoids, to orange grain versions containing high levels of provitamin A.

  13. Genomic prediction and genomic variance partitioning of daily and residual feed intake in pigs using Bayesian Power Lasso models

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Janss, Luc L G; Strathe, Anders B

    Improvement of feed efficiency is essential in pig breeding and selection for reduced residual feed intake (RFI) is an option. The study applied Bayesian Power LASSO (BPL) models with different power parameters to investigate genetic architecture, to predict genomic breeding values, and to partition...... genomic variance for RFI and daily feed intake (DFI). A total of 1272 Duroc pigs had both genotypic and phenotypic records for these traits. Significant SNPs were detected on chromosome 1 (SSC 1) and SSC 14 for RFI and on SSC 1 for DFI. BPL had similar accuracy and bias as GBLUP but power parameters had...

  14. Genomic prediction and genomic variance partitioning of daily and residual feed intake in pigs using Bayesian Power Lasso models

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Janss, L. L. G.; Strathe, Anders Bjerring

    Improvement of feed efficiency is essential in pig breeding and selection for reduced residual feed intake (RFI) is an option. The study applied Bayesian Power LASSO (BPL) models with different power parameters to investigate genetic architecture, to predict genomic breeding values, and to partition...... genomic variance for RFI and daily feed intake (DFI). A total of 1272 Duroc pigs had both genotypic and phenotypic records for these traits. Significant SNPs were detected on chromosome 1 (SSC 1) and SSC 14 for RFI and on SSC 1 for DFI. BPL models had similar accuracy and bias as GBLUP method but use...

  15. Gene prediction in the fathead minnow [Pimephales promelas] genome

    Science.gov (United States)

    The fathead minnow is a well-established model organism which has been widely used for regulatory ecotoxicity testing and research for over half a century. While much information has been gathered on the organism over the years, the fathead minnow genome, a critical source of infor...

  16. Genomic islands predict functional adaptation in marine actinobacteria

    Energy Technology Data Exchange (ETDEWEB)

    Penn, Kevin; Jenkins, Caroline; Nett, Markus; Udwary, Daniel; Gontang, Erin; McGlinchey, Ryan; Foster, Brian; Lapidus, Alla; Podell, Sheila; Allen, Eric; Moore, Bradley; Jensen, Paul

    2009-04-01

    Linking functional traits to bacterial phylogeny remains a fundamental but elusive goal of microbial ecology 1. Without this information, it becomes impossible to resolve meaningful units of diversity and the mechanisms by which bacteria interact with each other and adapt to environmental change. Ecological adaptations among bacterial populations have been linked to genomic islands, strain-specific regions of DNA that house functionally adaptive traits 2. In the case of environmental bacteria, these traits are largely inferred from bioinformatic or gene expression analyses 2, thus leaving few examples in which the functions of island genes have been experimentally characterized. Here we report the complete genome sequences of Salinispora tropica and S. arenicola, the first cultured, obligate marine Actinobacteria 3. These two species inhabit benthic marine environments and dedicate 8-10percent of their genomes to the biosynthesis of secondary metabolites. Despite a close phylogenetic relationship, 25 of 37 secondary metabolic pathways are species-specific and located within 21 genomic islands, thus providing new evidence linking secondary metabolism to ecological adaptation. Species-specific differences are also observed in CRISPR sequences, suggesting that variations in phage immunity provide fitness advantages that contribute to the cosmopolitan distribution of S. arenicola 4. The two Salinispora genomes have evolved by complex processes that include the duplication and acquisition of secondary metabolite genes, the products of which provide immediate opportunities for molecular diversification and ecological adaptation. Evidence that secondary metabolic pathways are exchanged by Horizontal Gene Transfer (HGT) yet are fixed among globally distributed populations 5 supports a functional role for their products and suggests that pathway acquisition represents a previously unrecognized force driving bacterial diversification

  17. The database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) project

    OpenAIRE

    Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L L; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R P; Alhusseini, Tamera I.; Bedford, Felicity E.; Bennett, Dominic J.; Booth, Hollie; Burton, Victoria J.; Chng, Charlotte W. T.; Choimes, Argyrios; Correia, David L.P.

    2016-01-01

    The PREDICTS project—Projecting Responses of Ecological Diversity In Changing Terrestrial Systems (www.predicts.org.uk)—has collated from published studies a large, reasonably representative database of comparable samples of biodiversity from multiple sites that differ in the nature or intensity of human impacts relating to land use. We have used this evidence base to develop global and regional statistical models of how local biodiversity responds to these measures. We describe and make free...

  18. The database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) project

    OpenAIRE

    Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L.L.; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R. P.; Alhusseini, Tamera I.; Bedford, Felicity E.; Bennett, Dominic J.; Booth, Hollie; Burton, Victoria J.; Chng, Charlotte W. T.; Choimes, Argyrios; Correia, David L.P.

    2016-01-01

    The PREDICTS project, Projecting Responses of Ecological Diversity In Changing Terrestrial Systems (www.predicts.org.uk), has collated from published studies a large, reasonably representative database of comparable samples of biodiversity from multiple sites that differ in the nature or intensity of human impacts relating to land use. We have used this evidence base to develop global and regional statistical models of how local biodiversity responds to these measures. We describe and ...

  19. The database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) project

    OpenAIRE

    Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L L; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R P; Alhusseini, Tamera I.; Bedford, Felicity E.; Bennett, Dominic J.; Booth, Hollie; Burton, Victoria J.; Chng, Charlotte W. T.; Choimes, Argyrios; Correia, David L.P.

    2017-01-01

    The PREDICTS project—Projecting Responses of Ecological Diversity In Changing Terrestrial Systems (www.predicts.org.uk)—has collated from published studies a large, reasonably representative database of comparable samples of biodiversity from multiple sites that differ in the nature or intensity of human impacts relating to land use. We have used this evidence base to develop global and regional statistical models of how local biodiversity responds to these measures. We describe and make free...

  20. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Background: Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results: This paper describes a new prokaryotic genefinding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Conclusion: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  1. Predictions of Chemical Weather in Asia: The EU Panda Project

    Science.gov (United States)

    Brasseur, G. P.; Petersen, A. K.; Wang, X.; Granier, C.; Bouarar, I.

    2014-12-01

    Air quality has become a pressing problem in Asia and specifically in China due to rapid economic development (i.e., rapidly expanding motor vehicle fleets, growing industrial and power generation activities, domestic and biomass burning). In spite of efforts to reduce chemical emissions, high levels of particulate matter and ozone are observed and lead to severe health problems with a large number of premature deaths. To support efforts to reduce air pollution, the European Union is supporting the PANDA project whose objective is to use space and surface observations of chemical species as well as advanced meteorological and chemical models to analyze and predict air quality in China. The Project involves 7 European and 7 Chinese groups. The paper will describe the objectives of the project and present some first accomplishments. The project focuses on the improvement of methods for monitoring air quality from combined space and in-situ observations, the development of a comprehensive prediction system that makes use of these observations, the elaboration of indicators for air quality in support of policies, and the development of toolboxes for the dissemination of information.

  2. Predicting Essential Metabolic Genome Content of Niche-Specific Enterobacterial Human Pathogens during Simulation of Host Environments.

    Directory of Open Access Journals (Sweden)

    Tong Ding

    Microorganisms have evolved to occupy certain environmental niches, and the metabolic genes essential for growth in these locations are retained in the genomes. Many microorganisms inhabit niches located in the human body, sometimes causing disease, and may retain genes essential for growth in locations such as the bloodstream and urinary tract, or growth during intracellular invasion of the hosts' macrophage cells. Strains of Escherichia coli (E. coli) and Salmonella spp. are thought to have evolved over 100 million years from a common ancestor, and now cause disease in specific niches within humans. Here we have used a genome scale metabolic model representing the pangenome of E. coli which contains all metabolic reactions encoded by genes from 16 E. coli genomes, and have simulated environmental conditions found in the human bloodstream, urinary tract, and macrophage to determine essential metabolic genes needed for growth in each location. We compared the predicted essential genes for three E. coli strains and one Salmonella strain that cause disease in each host environment, and determined that essential gene retention could be accurately predicted using this approach. This project demonstrated that simulating human body environments such as the bloodstream can successfully lead to accurate computational predictions of essential/important genes.

  3. Predicting Essential Metabolic Genome Content of Niche-Specific Enterobacterial Human Pathogens during Simulation of Host Environments.

    Science.gov (United States)

    Ding, Tong; Case, Kyle A; Omolo, Morrine A; Reiland, Holly A; Metz, Zachary P; Diao, Xinyu; Baumler, David J

    2016-01-01

    Microorganisms have evolved to occupy certain environmental niches, and the metabolic genes essential for growth in these locations are retained in the genomes. Many microorganisms inhabit niches located in the human body, sometimes causing disease, and may retain genes essential for growth in locations such as the bloodstream and urinary tract, or growth during intracellular invasion of the hosts' macrophage cells. Strains of Escherichia coli (E. coli) and Salmonella spp. are thought to have evolved over 100 million years from a common ancestor, and now cause disease in specific niches within humans. Here we have used a genome scale metabolic model representing the pangenome of E. coli which contains all metabolic reactions encoded by genes from 16 E. coli genomes, and have simulated environmental conditions found in the human bloodstream, urinary tract, and macrophage to determine essential metabolic genes needed for growth in each location. We compared the predicted essential genes for three E. coli strains and one Salmonella strain that cause disease in each host environment, and determined that essential gene retention could be accurately predicted using this approach. This project demonstrated that simulating human body environments such as the bloodstream can successfully lead to accurate computational predictions of essential/important genes.

  4. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    RegPredict web server is designed to provide comparative genomics tools for reconstruction and analysis of microbial regulons using comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from the MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  5. Genome-wide prediction and validation of sigma70 promoters in Lactobacillus plantarum WCFS1.

    Directory of Open Access Journals (Sweden)

    Tilman J Todt

    BACKGROUND: In prokaryotes, sigma factors are essential for directing the transcription machinery towards promoters. Various sigma factors have been described that recognize and bind to specific DNA sequence motifs in promoter sequences. The canonical sigma factor σ70 is commonly involved in transcription of the cell's housekeeping genes, which is mediated by the conserved σ70 promoter sequence motifs. In this study the σ70-promoter sequences in Lactobacillus plantarum WCFS1 were predicted using a genome-wide analysis. The accuracy of the transcriptionally-active part of this promoter prediction was subsequently evaluated by correlating locations of predicted promoters with transcription start sites inferred from the 5'-ends of transcripts detected by high-resolution tiling array transcriptome datasets. RESULTS: To identify σ70-related promoter sequences, we performed a genome-wide sequence motif scan of the L. plantarum WCFS1 genome focussing on the regions upstream of protein-encoding genes. We obtained several highly conserved motifs including those resembling the conserved σ70-promoter consensus. Position weight matrices-based models of the recovered σ70-promoter sequence motif were employed to identify 3874 motifs with significant similarity (p-value < 10^-4) to the model-motif in the L. plantarum genome. Genome-wide transcript information deduced from whole genome tiling-array transcriptome datasets was used to infer transcription start sites (TSSs) from the 5'-end of transcripts. By this procedure, 1167 putative TSSs were identified that were used to corroborate the transcriptionally active fraction of these predicted promoters. In total, 568 predicted promoters were found in proximity (≤ 40 nucleotides) of the putative TSSs, showing a highly significant co-occurrence of predicted promoter and TSS (p-value < 10^-263). CONCLUSIONS: High-resolution tiling arrays provide a suitable source to infer TSSs at a genome-wide level, and
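
    The proximity check described in this abstract (predicted promoters lying within 40 nucleotides of a putative TSS) can be illustrated with a minimal sketch. The function name, the coordinate lists, and the decision to ignore strand are assumptions made for illustration; the abstract does not describe how the authors implemented the comparison.

```python
from bisect import bisect_left


def promoters_near_tss(promoters, tss_positions, max_dist=40):
    """Return the promoter positions that lie within max_dist nt of any TSS.

    Hypothetical helper: positions are plain integer coordinates on one
    strand; strand handling is deliberately left out of this sketch.
    """
    tss = sorted(tss_positions)
    hits = []
    for p in promoters:
        i = bisect_left(tss, p)
        # Only the TSS just before and just after p can be the nearest one.
        nearest = [tss[j] for j in (i - 1, i) if 0 <= j < len(tss)]
        if any(abs(t - p) <= max_dist for t in nearest):
            hits.append(p)
    return hits
```

    Sorting the TSS coordinates once and using binary search keeps the check fast even for thousands of predicted motifs.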

  6. The 3,000 rice genomes project: new opportunities and challenges for future rice research

    OpenAIRE

    Li, Jia-Yang; Wang, Jun; Zeigler, Robert S.

    2014-01-01

    Rice is the world’s most important staple grown by millions of small-holder farmers. Sustaining rice production relies on the intelligent use of rice diversity. The 3,000 Rice Genomes Project is a giga-dataset of publicly available genome sequences (averaging 14× depth of coverage) derived from 3,000 accessions of rice with global representation of genetic and functional diversity. The seed of these accessions is available from the International Rice Genebank Collection. Together, they are ...

  7. Human genome education model project. Ethical, legal, and social implications of the human genome project: Education of interdisciplinary professionals

    Energy Technology Data Exchange (ETDEWEB)

    Weiss, J.O. [Alliance of Genetic Support Groups, Chevy Chase, MD (United States); Lapham, E.V. [Georgetown Univ., Washington, DC (United States). Child Development Center

    1996-12-31

    This meeting was held June 10, 1996 at Georgetown University. The purpose of this meeting was to provide a multidisciplinary forum for exchange of state-of-the-art information on the human genome education model. Topics of discussion include the following: psychosocial issues; ethical issues for professionals; legislative issues and update; and education issues.

  8. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm

    Science.gov (United States)

    de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.

    2016-01-01

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic island prediction have been proposed. However, none of them is capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
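
    As a rough illustration of why mean shift needs no preset number of clusters, the sketch below runs a one-dimensional, flat-kernel mean shift over a list of per-window values (for example GC fractions). It is not MSGIP's implementation: the function name is hypothetical, the bandwidth is passed in by hand because the abstract does not state the authors' heuristic, and real genomic features would be multi-dimensional.

```python
def mean_shift_1d(values, bandwidth, tol=1e-6, max_iter=200):
    """Flat-kernel mean shift on 1-D data; returns (cluster centres, labels)."""
    modes = []
    for v in values:
        x = v
        for _ in range(max_iter):
            # Shift x to the mean of all points inside the current window.
            neighbours = [u for u in values if abs(u - x) <= bandwidth]
            new_x = sum(neighbours) / len(neighbours)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)

    # Points whose modes nearly coincide are assigned to the same cluster.
    centres, labels = [], []
    for m in modes:
        for k, c in enumerate(centres):
            if abs(m - c) <= bandwidth / 2:
                labels.append(k)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return centres, labels
```

    The number of clusters falls out of how many distinct modes the points converge to, so only the bandwidth has to be chosen.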

  9. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Directory of Open Access Journals (Sweden)

    Daniel M de Brito

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic island prediction have been proposed. However, none of them is capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me.

  10. Genome-wide association and genomic prediction of resistance to maize lethal necrosis disease in tropical maize germplasm.

    Science.gov (United States)

    Gowda, Manje; Das, Biswanath; Makumbi, Dan; Babu, Raman; Semagn, Kassa; Mahuku, George; Olsen, Michael S; Bright, Jumbo M; Beyene, Yoseph; Prasanna, Boddupalli M

    2015-10-01

    Genome-wide association analysis in tropical and subtropical maize germplasm revealed that MLND resistance is influenced by multiple genomic regions with small to medium effects. Maize lethal necrosis disease (MLND) is caused by the synergistic interaction of Maize chlorotic mottle virus and Sugarcane mosaic virus, and has emerged as a serious threat to maize production in eastern Africa since 2011. Our objective was to gain insights into the genetic architecture underlying the resistance to MLND by genome-wide association study (GWAS) and genomic selection. We used two association mapping (AM) panels comprising a total of 615 diverse tropical/subtropical maize inbred lines. All the lines were evaluated against MLND under artificial inoculation. Both panels were genotyped using genotyping-by-sequencing. Phenotypic variation for MLND resistance was significant and heritability was moderately high in both panels. A few promising lines with high resistance to MLND were identified for use as potential donors. GWAS revealed 24 SNPs that were significantly associated (P < 3 × 10^-5) with MLND resistance. These SNPs are located within or adjacent to 20 putative candidate genes that are associated with plant disease resistance. Ridge regression best linear unbiased prediction with five-fold cross-validation revealed higher prediction accuracy for the IMAS-AM panel (0.56) than for the DTMA-AM panel (0.36). The prediction accuracy both within and across panels is promising; inclusion of MLND resistance associated SNPs into the prediction model further improved the accuracy. Overall, the study revealed that resistance to MLND is controlled by multiple loci with small to medium effects and that the SNPs identified by GWAS can be used as potential candidates in MLND resistance breeding programs.
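
    The evaluation scheme named in this abstract, ridge regression on markers scored by five-fold cross-validation, can be sketched in plain Python. This is an illustrative sketch under assumptions, not the authors' pipeline: the penalty `lam` is fixed by hand (RR-BLUP would normally derive it from estimated variance components), accuracy is taken as the Pearson correlation between predicted and observed phenotypes, and `simulate` produces toy data with made-up effect sizes.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Fit marker effects on centred genotypes: (Z'Z + lam*I) beta = Z'(y - mean)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Z = [[row[j] - col_means[j] for j in range(p)] for row in X]
    ztz = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * (yi - y_mean) for z, yi in zip(Z, y)) for i in range(p)]
    return col_means, y_mean, solve(ztz, zty)


def ridge_predict(model, X):
    col_means, y_mean, beta = model
    return [y_mean + sum(b * (x - m) for b, x, m in zip(beta, row, col_means))
            for row in X]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cv_accuracy(X, y, lam, k=5, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        pred = ridge_predict(model, [X[i] for i in test])
        accs.append(pearson(pred, [y[i] for i in test]))
    return sum(accs) / k


def simulate(n, p, seed=0):
    """Toy data: genotypes coded 0/1/2, made-up additive effects, Gaussian noise."""
    rng = random.Random(seed)
    effects = [1.0 if j % 2 == 0 else -0.5 for j in range(p)]
    X = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    y = [sum(e * x for e, x in zip(effects, row)) + rng.gauss(0.0, 0.5) for row in X]
    return X, y
```

    Calling `cv_accuracy(*simulate(100, 10), lam=1.0)` returns a single mean accuracy. Centring and fitting use the training fold only, so no information from the held-out lines leaks into the model.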

  11. Whole genome phylogeny for 21 Drosophila species using predicted 2b-RAD fragments

    Directory of Open Access Journals (Sweden)

    Arun S. Seetharam

    2013-12-01

    Type IIB restriction endonucleases are site-specific endonucleases that cut both strands of double-stranded DNA upstream and downstream of their recognition sequences. These restriction enzymes have recognition sequences that are generally interrupted and range from 5 to 7 bases long. They produce DNA fragments which are uniformly small, ranging from 21 to 33 base pairs in length (without cohesive ends). The fragments are generated from throughout the entire length of a genomic DNA providing an excellent fractional representation of the genome. In this study we simulated restriction enzyme digestions on 21 sequenced genomes of various Drosophila species using the predicted targets of 16 Type IIB restriction enzymes to effectively produce a large and arbitrary selection of loci from these genomes. The fragments were then used to compare organisms and to calculate the distance between genomes in pair-wise combination by counting the number of shared fragments between the two genomes. Phylogenetic trees were then generated for each enzyme using this distance measure and the consensus was calculated. The consensus tree obtained agrees well with the currently accepted tree for the Drosophila species. We conclude that multi-locus sub-genomic representation combined with next generation sequencing, especially for individuals and species without previous genome characterization, can accelerate studies of comparative genomics and the building of accurate phylogenetic trees.
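
    The in-silico digestion and shared-fragment comparison can be sketched as follows. Everything specific here is an assumption for illustration: the recognition pattern and flank length are supplied by the caller, only the forward strand is scanned, and the distance is defined as one minus the fraction of shared fragments in the union, since the abstract does not give the exact formula the authors used.

```python
import re


def iib_fragments(genome, site_regex, flank):
    """Collect fragments spanning each site match plus `flank` bases per side."""
    frags = set()
    # The lookahead lets overlapping occurrences of the site be found.
    for m in re.finditer(f"(?=({site_regex}))", genome):
        start, end = m.start(1) - flank, m.end(1) + flank
        if start >= 0 and end <= len(genome):
            frags.add(genome[start:end])
    return frags


def shared_fragment_distance(frags_a, frags_b):
    """Distance in [0, 1]: 0 when the fragment sets are identical, 1 when disjoint."""
    union = frags_a | frags_b
    if not union:
        return 0.0
    return 1.0 - len(frags_a & frags_b) / len(union)
```

    A pairwise matrix of such distances, one per enzyme, is the kind of input a standard distance-based tree-building method would take.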

  12. Genome-wide prediction, display and refinement of binding sites with information theory-based models

    Directory of Open Access Journals (Sweden)

    Leeder J Steven

    2003-09-01

    Background: We present Delila-genome, a software system for identification, visualization and analysis of protein binding sites in complete genome sequences. Binding sites are predicted by scanning genomic sequences with information theory-based (or user-defined) weight matrices. Matrices are refined by adding experimentally-defined binding sites to published binding sites. Delila-Genome was used to examine the accuracy of individual information contents of binding sites detected with refined matrices as a measure of the strengths of the corresponding protein-nucleic acid interactions. The software can then be used to predict novel sites by rescanning the genome with the refined matrices. Results: Parameters for genome scans are entered using a Java-based GUI interface and backend scripts in Perl. Multi-processor CPU load-sharing minimized the average response time for scans of different chromosomes. Scans of human genome assemblies required 4–6 hours for transcription factor binding sites and 10–19 hours for splice sites, respectively, on 24- and 3-node Mosix and Beowulf clusters. Individual binding sites are displayed either as high-resolution sequence walkers or in low-resolution custom tracks in the UCSC genome browser. For large datasets, we applied a data reduction strategy that limited displays of binding sites exceeding a threshold information content to specific chromosomal regions within or adjacent to genes. An HTML document is produced listing binding sites ranked by binding site strength or chromosomal location hyperlinked to the UCSC custom track, other annotation databases and binding site sequences. Post-genome scan tools parse binding site annotations of selected chromosome intervals and compare the results of genome scans using different weight matrices. Comparisons of multiple genome scans can display binding sites that are unique to each scan and identify sites with significantly altered binding strengths
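
    Scanning with an information theory-based weight matrix can be illustrated with a short sketch. It assumes the common individual-information form in which each matrix entry is 2 + log2 of the base frequency at that position, with the small-sample correction omitted and a pseudocount added so unseen bases get a finite weight. The function names and the pseudocount are illustrative and are not taken from Delila-Genome.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0):
    """Build per-position weights (bits) from aligned, equal-length binding sites."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: 2.0 + math.log2(counts[b] / total) for b in BASES})
    return matrix


def site_information(matrix, window):
    """Sum the weights of the bases observed in one candidate site."""
    return sum(col[b] for col, b in zip(matrix, window))


def scan(sequence, matrix, threshold):
    """Return (position, information in bits) for windows scoring above threshold."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        ri = site_information(matrix, sequence[i:i + width])
        if ri > threshold:
            hits.append((i, ri))
    return hits
```

    Refining a matrix, as the abstract describes, would amount to rebuilding it with the newly validated sites appended to `sites` and rescanning.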

  13. Human Genome Project: an attentive reading of the book of life?

    OpenAIRE

    2010-01-01

    The idea to sequence all 3 billion bases of the human genome started in the late 80s and the project began in the early 90s. In June 2000, the first "draft" was announced and in February 2001 the final sequence was published by Science and Nature. Many debates about the ethical, legal and social issues originated from the Human Genome Project. The main questions are: "who should have access to an individual's genetic information?"; "will the genetic information be used as a discrimination t...

  14. The OASE project: Object-based Analysis and Seamless prediction

    Science.gov (United States)

    Troemel, Silke; Wapler, Kathrin; Bick, Theresa; Diederich, Malte; Deneke, Hartwig; Horvath, Akos; Senf, Fabian; Simmer, Clemens; Simon, Juergen

    2013-04-01

    The research group on Object-based Analysis and SEamless prediction (OASE) is part of the Hans Ertel Centre for Weather Research (HErZ). The group consists of scientists at the Meteorological Institute, University of Bonn, the Leibniz-Institute for Tropospheric Research in Leipzig and the German Weather Service. OASE addresses seamless prediction of convective events from nowcasting to daily predictions by combining radar/satellite compositing and tracking with high-resolution model-based ensemble generation and prediction. While observation-based nowcasting provides good results for lead times between 0-1 hours, numerical weather prediction addresses lead times between 3-21 hours. Especially the discontinuity between 1-3 hours needs to be addressed. Therefore a central goal of the project is a near real-time high-resolved unprecedented data base. A radar and satellite remote sensing-driven 3D observation-microphysics composite covering Germany, currently under development, contains gridded observations and estimated microphysical quantities. Observations and microphysics are intertwined via forward operators and estimated inverse relations, which also provide uncertainties for model ensemble initialisations. The lifetime evolution of dynamics and microphysics in (severe) convective storms is analysed based on 3D scale-space tracking. An object-based analysis condenses the information contained in the dynamic 3D distributions of observables and related microphysics into descriptors, which will allow identifying governing processes leading to the formation and evolution of severe weather events. The object-based approach efficiently characterises and quantifies the process structure and life cycles of severe weather events, and facilitates nowcasting and the generation and initialisation of model prediction ensembles. Observation-based nowcasting will exploit the dual-composite based 3D feature detection and tracking to generate a set of predictions (observation

  15. Crowdfunding the Azolla fern genome project: a grassroots approach.

    Science.gov (United States)

    Li, Fay-Wei; Pryer, Kathleen M

    2014-01-01

    Much of science progresses within the tight boundaries of what is often seen as a "black box". Though familiar to funding agencies, researchers and the academic journals they publish in, it is an entity that outsiders rarely get to peek into. Crowdfunding is a novel means that allows the public to participate in, as well as to support and witness advancements in science. Here we describe our recent crowdfunding efforts to sequence the Azolla genome, a little fern with massive green potential. Crowdfunding is a worthy platform not only for obtaining seed money for exploratory research, but also for engaging directly with the general public as a rewarding form of outreach.

  16. Enhancing genomic prediction with genome-wide association studies in multiparental maize populations

    Science.gov (United States)

    Genome-wide association mapping using dense marker sets has identified some nucleotide variants affecting complex traits which have been validated with fine-mapping and functional analysis. Many sequence variants associated with complex traits in maize have small effects and low repeatability, howev...

  17. Meta-analysis of genome-wide association from genomic prediction models

    Science.gov (United States)

    A limitation of many genome-wide association studies (GWA) in animal breeding is that there are many loci with small effect sizes; thus, larger sample sizes (N) are required to guarantee suitable power of detection. To increase sample size, results from different GWA can be combined in a meta-analys...

  18. Towards fully automated structure-based function prediction in structural genomics: a case study.

    Science.gov (United States)

    Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M

    2007-04-13

    As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.

  19. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome

    Directory of Open Access Journals (Sweden)

    McCarthy Fiona M

    2007-11-01

    Background: The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly structurally and functionally annotated and many genes do not have standard nomenclature assigned. Results: We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology), we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO) functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion: We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally-supported predicted proteins were further annotated by assigning the proteins with standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and

  20. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

    Directory of Open Access Journals (Sweden)

    Xiaoyi Gao

    2012-06-01

    Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR+CEU+YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.

  1. Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels.

    Science.gov (United States)

    Gao, Xiaoyi; Haritunians, Talin; Marjoram, Paul; McKean-Cowdin, Roberta; Torres, Mina; Taylor, Kent D; Rotter, Jerome I; Gauderman, William J; Varma, Rohit

    2012-01-01

    Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest growing minority group in the US. However, genotype imputation resources for Latinos are rather limited compared to individuals of European ancestry at present, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR + CEU + YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analysis in Latinos.

  2. Computational prediction of microRNA genes in silkworm genome

    Institute of Scientific and Technical Information of China (English)

    TONG Chuan-zhou; JIN Yong-feng; ZHANG Yao-zhou

    2006-01-01

    MicroRNAs (miRNAs) constitute a novel, extensive class of small RNAs (~21 nucleotides), and play important gene-regulation roles during growth and development in various organisms. Here we conducted a homology search to identify homologs of previously validated miRNAs in the silkworm genome. We identified 24 potential miRNA genes, and named each of them according to the common criteria. Interestingly, we found that a great number of the newly identified miRNAs were conserved in silkworm and Drosophila, and family alignment revealed that miRNA families might possess single nucleotide polymorphisms. miRNA gene clusters and possible functions of complement miRNA pairs are discussed.

  3. SPOCS: software for predicting and visualizing orthology/paralogy relationships among genomes

    Science.gov (United States)

    Curtis, Darren S.; Phillips, Aaron R.; Callister, Stephen J.; Conlan, Sean; McCue, Lee Ann

    2013-01-01

    Summary: At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and in addition, html files that provide a visualization of the predicted ortholog/paralog relationships to which gene/protein expression metadata may be overlaid. Availability and Implementation: A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required. Contact: leeann.mccue@pnnl.gov PMID:23956303

  4. SPOCS: Software for Predicting and Visualizing Orthology/Paralogy Relationships Among Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Curtis, Darren S.; Phillips, Aaron R.; Callister, Stephen J.; Conlan, Sean; McCue, Lee Ann

    2013-10-15

    At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and in addition, html files that provide a visualization of the predicted ortholog/paralog relationships to which gene/protein expression metadata may be overlaid. AVAILABILITY AND IMPLEMENTATION: A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required.

  5. Computational analysis and prediction for exons of PAC579 genomic sequence

    Institute of Scientific and Technical Information of China (English)

    黄弋; 覃文新; 万大方; 赵新泰; 顾健人

    2001-01-01

    To isolate the novel genes related to human hepatocellular carcinoma (HCC), we sequenced P1-derived artificial chromosome PAC579 (D17S926 locus) mapped in the minimum LOH (loss of heterozygosity) deletion region of chromosome 17p13.3 in HCC. Four novel genes mapped in this genomic sequence area were isolated and cloned by wet-lab experiments, and the exons of these genes were located. The 0-60 kb region of this genomic sequence, which includes the genes of interest, was scanned with five different computational exon prediction programs as well as four splice site recognition programs. After analyzing and comparing the computationally predicted results with the wet-lab experiment results, some potential exons were predicted in the genomic sequence by using these programs.

  6. Prediction of Genomic Breeding Values for feed efficiency and related traits in pigs

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Janss, Luc; Strathe, Anders Bjerring

    Improvement of feed efficiency is essential in pig breeding and selection for reduced residual feed intake (RFI) is an option. Accuracy of genomic prediction (GP) relies on assumptions of genetic architecture of the traits. This study applied five different Bayesian Power LASSO (BPL) models...... with different power parameters to investigate genetic architecture of RFI, to predict genomic breeding values, and to partition genetic variances for different SNP groups. Data were 1272 Duroc pigs with both genotypic and phenotypic records for RFI as well as daily feed intake (DFI). The gene mapping confirmed...... and indicates their potentials for genomic prediction. Further work includes applying other GP methods for RFI and DFI as well as extending these methods to feed efficiency related traits such as feeding behaviour and body composition traits....

  7. Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes

    Directory of Open Access Journals (Sweden)

    Sakanyan Vehary

    2008-05-01

    Background: Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. Results: We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I σ70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the α subunit, and then optimally-separated patterns of -35 and -10 boxes, required for interaction with the σ70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. Conclusion: The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression, and individual strong promoters in bacteria of medical and/or economic significance.
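
    The idea of consecutively matching three patterns can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the authors' patterns, so the A/T-rich UP-element pattern, the exact-match consensus boxes (TTGACA and TATAAT), the 16 to 18 nt spacer and the 0 to 4 nt gap used here are hypothetical stand-ins, and `find_triad_candidates` is an invented name.

```python
import re

# Hypothetical patterns chosen for illustration; not the authors' patterns.
UP_ELEMENT = r"[AT]{6,}"   # assumed: an A/T-rich stretch upstream of the -35 box
BOX_35 = r"TTGACA"         # E. coli sigma70 -35 consensus, exact match only
BOX_10 = r"TATAAT"         # E. coli sigma70 -10 consensus, exact match only

TRIAD = re.compile(
    rf"(?P<up>{UP_ELEMENT})"
    r"[ACGT]{0,4}"             # assumed short gap between UP element and -35 box
    rf"(?P<box35>{BOX_35})"
    r"[ACGT]{16,18}"           # assumed spacer range between the two boxes
    rf"(?P<box10>{BOX_10})"
)


def find_triad_candidates(sequence):
    """Return (up_start, box35_start, box10_start) for each triad match."""
    return [
        (m.start("up"), m.start("box35"), m.start("box10"))
        for m in TRIAD.finditer(sequence.upper())
    ]


# Synthetic test sequence, not taken from any genome.
synthetic = "GC" + "AAATTTAA" + "GC" + "TTGACA" + "G" * 17 + "TATAAT" + "GC"
candidates = find_triad_candidates(synthetic)
```

    A practical scanner would tolerate mismatches and score the spacing rather than require exact consensus boxes; exact matching keeps the sketch short.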

  8. Marked variation in predicted and observed variability of tandem repeat loci across the human genome

    Directory of Open Access Journals (Sweden)

    Shields Denis C

    2008-04-01

    Background: Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2–12 nucleotides in Whole Genome Shotgun (WGS) sequences to define the extent of variation of 209,214 unique repeat loci throughout the genome. Results: We applied a multivariate statistical model to predict TR variability. Predicted heterozygosity correlated with heterozygosity in the CEPH polymorphism database (correlation ρ = 0.29, p Conclusion: Variability among 2–12-mer TRs in the genome can be modeled by a few parameters, which do not markedly differ according to unit length, consistent with a common mechanism for the generation of variability among such TRs. Analysis of the distributions of observed and predicted variants across the genome showed a general concordance, indicating that the repeat variation dataset does not exhibit strong regional ascertainment biases. This revealed a deficit of variant repeats in chromosomes 19 and Y – likely to reflect a reduction in 2-mer repeats in the former and a reduced level of recombination in the latter – and excesses in chromosomes 6, 13, 20 and 21.

  9. Singapore Genome Variation Project: a haplotype map of three Southeast Asian populations.

    Science.gov (United States)

    Teo, Yik-Ying; Sim, Xueling; Ong, Rick T H; Tan, Adrian K S; Chen, Jieming; Tantoso, Erwin; Small, Kerrin S; Ku, Chee-Seng; Lee, Edmund J D; Seielstad, Mark; Chia, Kee-Seng

    2009-11-01

    The Singapore Genome Variation Project (SGVP) provides a publicly available resource of 1.6 million single nucleotide polymorphisms (SNPs) genotyped in 268 individuals from the Chinese, Malay, and Indian population groups in Southeast Asia. This online database catalogs information and summaries on genotype and phased haplotype data, including allele frequencies, assessment of linkage disequilibrium (LD), and recombination rates in a format similar to the International HapMap Project. Here, we introduce this resource and describe the analysis of human genomic variation upon agglomerating data from the HapMap and the Human Genome Diversity Project, providing useful insights into the population structure of the three major population groups in Asia. In addition, this resource also surveyed across the genome for variation in regional patterns of LD between the HapMap and SGVP populations, and for signatures of positive natural selection using two well-established metrics: iHS and XP-EHH. The raw and processed genetic data, together with all population genetic summaries, are publicly available for download and browsing through a web browser modeled with the Generic Genome Browser.

  10. Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.

    Science.gov (United States)

    Zhou, Yao; Vales, M Isabel; Wang, Aoxue; Zhang, Zhiwu

    2017-09-01

    Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates the correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula.
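
    The two ways of summarising cross-validation accuracy named in this abstract can be written out as a small sketch. The function names and the toy data are assumptions for illustration; the corrected formula mentioned in the abstract is not reproduced because the abstract does not state it.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def instant_accuracy(observed, predicted, fold_ids):
    """Mean of the per-fold correlations ("Instant" accuracy)."""
    per_fold = []
    for fold in sorted(set(fold_ids)):
        obs = [o for o, f in zip(observed, fold_ids) if f == fold]
        pred = [p for p, f in zip(predicted, fold_ids) if f == fold]
        per_fold.append(pearson(obs, pred))
    return sum(per_fold) / len(per_fold)


def hold_accuracy(observed, predicted):
    """One correlation over all held-out predictions pooled ("Hold" accuracy)."""
    return pearson(observed, predicted)


# Toy data (invented): predictions rank individuals perfectly within each
# fold, but the second fold's predictions are shifted downward.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.0, 2.0, 3.0, 0.0, 1.0, 2.0]
fold_ids = [0, 0, 0, 1, 1, 1]

instant = instant_accuracy(observed, predicted, fold_ids)
hold = hold_accuracy(observed, predicted)
```

    In this toy case the per-fold correlations are both 1 while the pooled correlation is slightly negative. It only shows that the two summaries can disagree when fold-level prediction means differ; it is not a reproduction of the bias analysis in the paper.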

  11. Improving genomic prediction for Danish Jersey using a joint Danish-US reference population

    DEFF Research Database (Denmark)

    Su, Guosheng; Nielsen, Ulrik Sander; Wiggans, G;

    Accuracy of genomic prediction depends on the information in the reference population. Achieving an adequately sized reference population is a challenge for genomic prediction in small cattle populations. One way to increase the size of the reference population is to combine reference data from different...... a GBLUP model from the Danish reference population and the joint Danish-US reference population. The traits in the analysis were milk yield, fat yield, protein yield, fertility, mastitis, longevity, body conformation, feet & legs, and longevity. Eight of the nine traits benefitted from the inclusion of US...

  12. Methods and Strategies to Impute Missing Genotypes for Improving Genomic Prediction

    DEFF Research Database (Denmark)

    Ma, Peipei

    Genomic prediction has been widely used in dairy cattle breeding. Genotype imputation is a key procedure to efficiently utilize marker data from different chips and obtain high density marker data at minimal cost. This thesis investigated methods and strategies for genotype imputation for impr...

  13. Predicting future uncertainty constraints on global warming projections.

    Science.gov (United States)

    Shiogama, H; Stone, D; Emori, S; Takahashi, K; Mori, S; Maeda, A; Ishizaki, Y; Allen, M R

    2016-01-11

    Projections of global mean temperature changes (ΔT) in the future are associated with intrinsic uncertainties. Much climate policy discourse has been guided by "current knowledge" of the ΔT uncertainty, ignoring the likely future reductions of the uncertainty, because a mechanism for predicting these reductions is lacking. By using simulations of Global Climate Models from the Coupled Model Intercomparison Project Phase 5 ensemble as pseudo past and future observations, we estimate how fast and in what way the uncertainties of ΔT can decline when the current observation network of surface air temperature is maintained. At least in the world of pseudo observations under the Representative Concentration Pathways (RCPs), we can drastically reduce more than 50% of the ΔT uncertainty in the 2040s by 2029, and more than 60% of the ΔT uncertainty in the 2090s by 2049. Under the highest forcing scenario of RCPs, we can predict the true timing of passing the 2 °C (3 °C) warming threshold 20 (30) years in advance with errors less than 10 years. These results demonstrate potential for sequential decision-making strategies to take advantage of future progress in understanding of anthropogenic climate change.

  14. Ori-Finder 2, an integrated tool to predict replication origins in the archaeal genomes

    Directory of Open Access Journals (Sweden)

    Hao Luo

    2014-09-01

    DNA replication is one of the most basic processes in all three domains of cellular life. With the advent of the post-genomic era, the increasing number of complete archaeal genomes has created an opportunity for exploration of the molecular mechanisms for initiating cellular DNA replication by in vivo experiments as well as in silico analysis. However, the location of replication origins (oriCs) in many sequenced archaeal genomes remains unknown. We present a web-based tool, Ori-Finder 2, to predict oriCs in archaeal genomes automatically, based on an integrated method comprising the analysis of base composition asymmetry using the Z-curve method, the distribution of Origin Recognition Boxes (ORBs) identified by the FIMO tool, and the occurrence of genes frequently close to oriCs. The web server is also able to analyze unannotated genome sequences by integrating with gene prediction pipelines and BLAST software for gene identification and function annotation. The predicted oriCs are displayed as an HTML table, which offers an intuitive way to browse the result in graphical and tabular form. The software presented here is accurate for genomes with a single oriC, but it does not necessarily find all the origins of replication for genomes with multiple oriCs. Ori-Finder 2 aims to become a useful platform for the identification and analysis of oriCs in archaeal genomes, which would provide insight into the replication mechanisms in archaea. The web server is freely available at http://tubic.tju.edu.cn/Ori-Finder2/.
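
    The base composition asymmetry step can be illustrated with the standard Z-curve transform, which turns cumulative base counts into three components: purine versus pyrimidine, amino versus keto, and weak versus strong hydrogen bonding. The sketch below computes only those cumulative components; the function name is an assumption, and how Ori-Finder 2 combines them with ORB and gene evidence is not reproduced here.

```python
def z_curve(sequence):
    """Return the cumulative Z-curve components (xs, ys, zs) of a DNA sequence.

    After the n-th base, with cumulative counts A, C, G, T:
        x = (A + G) - (C + T)   purine vs pyrimidine
        y = (A + C) - (G + T)   amino vs keto
        z = (A + T) - (G + C)   weak vs strong hydrogen bonding
    Characters other than A, C, G, T leave the counts unchanged.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    xs, ys, zs = [], [], []
    for base in sequence.upper():
        if base in counts:
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        xs.append((a + g) - (c + t))
        ys.append((a + c) - (g + t))
        zs.append((a + t) - (g + c))
    return xs, ys, zs


xs, ys, zs = z_curve("AACG")
```

    Since x + y = 2(A - T) and x - y = 2(G - C), the AT and GC disparity curves follow directly from these components, and candidate origin regions are commonly sought near extrema of such curves.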

  15. Reflections on Mental Retardation and Eugenics, Old and New: Mensa and the Human Genome Project.

    Science.gov (United States)

    Smith, J. David

    1994-01-01

    This article addresses the moral and ethical issues of mental retardation and a continuing legacy of belief in eugenics. It discusses the involuntary sterilization of Carrie Buck in 1927, support for legalized killing of subnormal infants by 47% of respondents to a Mensa survey, and implications of the Human Genome Project for the field of mental…

  16. Democratizing Human Genome Project Information: A Model Program for Education, Information and Debate in Public Libraries.

    Science.gov (United States)

    Pollack, Miriam

    The "Mapping the Human Genome" project demonstrated that librarians can help whomever they serve in accessing information resources in the areas of biological and health information, whether it is the scientists who are developing the information or a member of the public who is using the information. Public libraries can guide library…

  18. The Human Genome Project and Eugenics: Identifying the Impact on Individuals with Mental Retardation.

    Science.gov (United States)

    Kuna, Jason

    2001-01-01

    This article explores the impact of the mapping work of the Human Genome Project on individuals with mental retardation and the negative effects of genetic testing. The potential to identify disabilities and the concept of eugenics are discussed, along with ethical issues surrounding potential genetic therapies. (Contains references.) (CR)

  19. Pathway-Based Genomics Prediction using Generalized Elastic Net.

    Science.gov (United States)

    Sokolov, Artem; Carlin, Daniel E; Paull, Evan O; Baertsch, Robert; Stuart, Joshua M

    2016-03-01

    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach.

  20. Pathway-Based Genomics Prediction using Generalized Elastic Net.

    Directory of Open Access Journals (Sweden)

    Artem Sokolov

    2016-03-01

    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach.

  1. Pathway-Based Genomics Prediction using Generalized Elastic Net

    Science.gov (United States)

    Sokolov, Artem; Carlin, Daniel E.; Paull, Evan O.; Baertsch, Robert; Stuart, Joshua M.

    2016-01-01

    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach. PMID:26960204
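    The abstract describes a penalty that steers solutions toward interlinked genes but does not give its formula. The sketch below shows a generic graph-regularised elastic net penalty, assuming an L1 term plus a quadratic term built from a graph Laplacian of a gene network. The function name, the Laplacian choice, the weights `lam1` and `lam2`, and the three-gene toy network are all illustrative assumptions, not the published GELnet formulation.

```python
import numpy as np

def graph_elastic_net_penalty(w, laplacian, lam1, lam2):
    """Hypothetical penalty: lam1 * ||w||_1 + (lam2 / 2) * w^T L w."""
    l1_term = lam1 * np.abs(w).sum()
    # For a graph Laplacian, w^T L w sums (w_i - w_j)^2 over connected genes,
    # so connected genes with similar weights are penalised less.
    graph_term = 0.5 * lam2 * (w @ laplacian @ w)
    return float(l1_term + graph_term)

# Toy network (assumed): genes 0 and 1 interact, gene 2 is isolated.
adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

linked = graph_elastic_net_penalty(np.array([1.0, 1.0, 0.0]), laplacian, 0.1, 1.0)
unlinked = graph_elastic_net_penalty(np.array([1.0, 0.0, 1.0]), laplacian, 0.1, 1.0)
print(linked, unlinked)
```

    Both weight vectors select two genes with the same L1 cost, yet the vector that selects the two interacting genes receives the smaller total penalty, which is the kind of behaviour the abstract attributes to pathway-guided regularization.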

  2. Comprehensive prediction of chromosome dimer resolution sites in bacterial genomes

    Directory of Open Access Journals (Sweden)

    Arakawa Kazuharu

    2011-01-01

    Background: During the replication process of bacteria with circular chromosomes, an odd number of homologous recombination events results in concatenated dimer chromosomes that cannot be partitioned into daughter cells. However, many bacteria harbor a conserved dimer resolution machinery consisting of one or two tyrosine recombinases, XerC and XerD, and their 28-bp target site, dif. Results: To study the evolution of the dif/XerCD system and its relationship with replication termination, we report the comprehensive prediction of dif sequences in silico using a phylogenetic prediction approach based on iterated hidden Markov modeling. Using this method, dif sites were identified in 641 organisms among 16 phyla, with a 97.64% identification rate for single-chromosome strains. The dif sequence positions were shown to be strongly correlated with the GC skew shift-point that is induced by replicational mutation/selection pressures, but the difference in the positions of the predicted dif sites and the GC skew shift-points did not correlate with the degree of replicational mutation/selection pressures. Conclusions: The sequence of dif sites is widely conserved among many bacterial phyla, and they can be computationally identified using our method. The lack of correlation between dif position and the degree of GC skew suggests that replication termination does not occur strictly at dif sites.
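    The GC skew shift-point mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard windowed definition of GC skew, (G - C) / (G + C), and takes the maximum of the cumulative skew as the shift point. The function names, the window size and the toy sequence are assumptions for illustration; this is not the authors' hidden Markov pipeline for locating dif sites.

```python
from itertools import accumulate

def gc_skew_windows(seq, window):
    """GC skew (G - C) / (G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C get a skew of zero by convention here.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews

def skew_shift_point(seq, window):
    """Index of the window where the cumulative skew reaches its maximum."""
    cumulative = list(accumulate(gc_skew_windows(seq, window)))
    return max(range(len(cumulative)), key=cumulative.__getitem__)

# Toy sequence (assumed): a G-rich half followed by a C-rich half.
toy = "GGGGA" * 4 + "CCCCA" * 4
skews = gc_skew_windows(toy, 5)
shift = skew_shift_point(toy, 5)
print(skews, shift)
```

    On the toy sequence the skew switches sign after the fourth window, and the cumulative maximum lands on that window. On a real chromosome the window size would need tuning, and the abstract's own finding is that dif positions correlate with, but need not coincide with, such shift points.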

  3. Assessment of the genomic prediction accuracy for feed efficiency traits in meat-type chickens.

    Science.gov (United States)

    Liu, Tianfei; Luo, Chenglong; Wang, Jie; Ma, Jie; Shu, Dingming; Lund, Mogens Sandø; Su, Guosheng; Qu, Hao

    2017-01-01

    Feed represents the major cost of chicken production. Selection for improving feed utilization is a feasible way to reduce feed cost and greenhouse gas emissions. The objectives of this study were to investigate the efficiency of genomic prediction for feed conversion ratio (FCR), residual feed intake (RFI), average daily gain (ADG) and average daily feed intake (ADFI) and to assess the impact of selection for feed efficiency traits FCR and RFI on eviscerating percentage (EP), breast muscle percentage (BMP) and leg muscle percentage (LMP) in meat-type chickens. Genomic prediction was assessed using a 4-fold cross-validation for two validation scenarios. The first scenario was a random family sampling validation (CVF), and the second scenario was a random individual sampling validation (CVR). Variance components were estimated based on the genomic relationship built with single nucleotide polymorphism markers. Genomic estimated breeding values (GEBV) were predicted using a genomic best linear unbiased prediction model. The accuracies of GEBV were evaluated in two ways: the correlation between GEBV and corrected phenotypic value divided by the square root of heritability, i.e., the correlation-based accuracy, and model-based theoretical accuracy. Breeding values were also predicted using a conventional pedigree-based best linear unbiased prediction model in order to compare accuracies of genomic and conventional predictions. The heritability estimates of FCR and RFI were 0.29 and 0.50, respectively. The heritability estimates of ADG, ADFI, EP, BMP and LMP ranged from 0.34 to 0.53. In the CVF scenario, the correlation-based accuracy and the theoretical accuracy of genomic prediction for FCR were slightly higher than those for RFI. The correlation-based accuracies for FCR, RFI, ADG and ADFI were 0.360, 0.284, 0.574 and 0.520, respectively, and the model-based theoretical accuracies were 0.420, 0.414, 0.401 and 0.382, respectively. 
    In the CVR scenario, the correlation...

  4. Assessment of the genomic prediction accuracy for feed efficiency traits in meat-type chickens

    Science.gov (United States)

    Wang, Jie; Ma, Jie; Shu, Dingming; Lund, Mogens Sandø; Su, Guosheng; Qu, Hao

    2017-01-01

    Feed represents the major cost of chicken production. Selection for improving feed utilization is a feasible way to reduce feed cost and greenhouse gas emissions. The objectives of this study were to investigate the efficiency of genomic prediction for feed conversion ratio (FCR), residual feed intake (RFI), average daily gain (ADG) and average daily feed intake (ADFI) and to assess the impact of selection for feed efficiency traits FCR and RFI on eviscerating percentage (EP), breast muscle percentage (BMP) and leg muscle percentage (LMP) in meat-type chickens. Genomic prediction was assessed using a 4-fold cross-validation for two validation scenarios. The first scenario was a random family sampling validation (CVF), and the second scenario was a random individual sampling validation (CVR). Variance components were estimated based on the genomic relationship built with single nucleotide polymorphism markers. Genomic estimated breeding values (GEBV) were predicted using a genomic best linear unbiased prediction model. The accuracies of GEBV were evaluated in two ways: the correlation between GEBV and corrected phenotypic value divided by the square root of heritability, i.e., the correlation-based accuracy, and model-based theoretical accuracy. Breeding values were also predicted using a conventional pedigree-based best linear unbiased prediction model in order to compare accuracies of genomic and conventional predictions. The heritability estimates of FCR and RFI were 0.29 and 0.50, respectively. The heritability estimates of ADG, ADFI, EP, BMP and LMP ranged from 0.34 to 0.53. In the CVF scenario, the correlation-based accuracy and the theoretical accuracy of genomic prediction for FCR were slightly higher than those for RFI. The correlation-based accuracies for FCR, RFI, ADG and ADFI were 0.360, 0.284, 0.574 and 0.520, respectively, and the model-based theoretical accuracies were 0.420, 0.414, 0.401 and 0.382, respectively. 
    In the CVR scenario, the correlation...
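    The correlation-based accuracy defined in this abstract (the correlation between GEBV and corrected phenotypic value, divided by the square root of heritability) can be written as a short sketch. The function names and the numeric values below are made up for illustration and are not data from the study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_based_accuracy(gebv, corrected_phenotype, heritability):
    """cor(GEBV, corrected phenotype) / sqrt(h^2), as described in the abstract."""
    return pearson(gebv, corrected_phenotype) / math.sqrt(heritability)

# Illustrative values only (assumed, not from the study).
gebv = [0.1, 0.4, 0.2, 0.8, 0.5]
phenotype = [1.0, 1.2, 0.9, 1.5, 1.1]
raw_r = pearson(gebv, phenotype)
accuracy = correlation_based_accuracy(gebv, phenotype, heritability=0.25)
print(raw_r, accuracy)
```

    Dividing by the square root of heritability rescales the correlation with a noisy phenotype into an estimate of the correlation with the true breeding value, so lower heritability produces a larger correction.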

  5. Predictive biomarker discovery through the parallel integration of clinical trial and functional genomics datasets

    DEFF Research Database (Denmark)

    Swanton, C.; Larkin, J.M.; Gerlinger, M.

    2010-01-01

    ... inhibitor. Through the analysis of tumour tissue derived from pre-operative renal cell carcinoma (RCC) clinical trials, the PREDICT consortium will use established and novel methods to integrate comprehensive tumour-derived genomic data with personalised tumour-derived shRNA and high throughput siRNA screens to identify and validate functionally important genomic or transcriptomic predictive biomarkers of individual drug response in patients. PREDICT's approach to predictive biomarker discovery differs from conventional associative learning approaches, which can be susceptible to the detection... [...], reducing ineffective therapy in drug resistant disease, leading to improved quality of life and higher cost efficiency, which in turn should broaden patient access to beneficial therapeutics, thereby enhancing clinical outcome and cancer survival. The consortium will also establish and consolidate...

  6. Genome Neighborhood Network Reveals Insights into Enediyne Biosynthesis and Facilitates Prediction and Prioritization for Discovery

    Science.gov (United States)

    Rudolf, Jeffrey D.; Yan, Xiaohui; Shen, Ben

    2015-01-01

    The enediynes are one of the most fascinating families of bacterial natural products given their unprecedented molecular architecture and extraordinary cytotoxicity. Enediynes are rare with only 11 structurally characterized members and four additional members isolated in their cycloaromatized form. Recent advances in DNA sequencing have resulted in an explosion of microbial genomes. A virtual survey of the GenBank and JGI genome databases revealed 87 enediyne biosynthetic gene clusters from 78 bacteria strains, implying enediynes are more common than previously thought. Here we report the construction and analysis of an enediyne genome neighborhood network (GNN) as a high-throughput approach to analyze secondary metabolite gene clusters. Analysis of the enediyne GNN facilitated rapid gene cluster annotation, revealed genetic trends in enediyne biosynthetic gene clusters resulting in a simple prediction scheme to determine 9- vs 10-membered enediyne gene clusters, and supported a genomic-based strain prioritization method for enediyne discovery. PMID:26318027

  7. Genome-wide de Novo Prediction of Proximal and Distal Tissue-Specific Enhancers

    Energy Technology Data Exchange (ETDEWEB)

    Loots, G G; Ovcharenko, I V

    2005-11-03

    Determining how transcriptional regulatory networks are encoded in the human genome is essential for understanding how cellular processes are directed. Here, we present a novel approach for systematically predicting tissue specific regulatory elements (REs) that blends genome-wide expression profiling, vertebrate genome comparisons, and pattern analysis of transcription factor binding sites. This analysis yields 4,670 candidate REs in the human genome with distinct tissue specificities, the majority of which reside far away from transcription start sites. We identify key transcription factors (TFs) for 34 distinct tissues and demonstrate that tissue-specific gene expression relies on multiple regulatory pathways employing similar, but different cohorts of interacting TFs. The methods and results we describe provide a global view of tissue specific gene regulation in humans, and propose a strategy for deciphering the transcriptional regulatory code in eukaryotes.

  8. Accuracy of predicting genomic breeding values for carcass merit traits in Angus and Charolais beef cattle.

    Science.gov (United States)

    Chen, L; Vinsky, M; Li, C

    2015-02-01

    Accuracy of predicting genomic breeding values for carcass merit traits including hot carcass weight, longissimus muscle area (REA), carcass average backfat thickness (AFAT), lean meat yield (LMY) and carcass marbling score (CMAR) was evaluated based on 543 Angus and 400 Charolais steers genotyped on the Illumina BovineSNP50 Beadchip. For the genomic prediction within Angus, the average accuracy was 0.35 with a range from 0.32 (LMY) to 0.37 (CMAR) across different training/validation data-splitting strategies and statistical methods. The within-breed genomic prediction for Charolais yielded an average accuracy of 0.36 with a range from 0.24 (REA) to 0.46 (AFAT). The across-breed prediction had the lowest accuracy, which was on average near zero. When the data from the two breeds were combined to predict the breeding values of either breed, the prediction accuracy averaged 0.35 for Angus with a range from 0.33 (REA) to 0.39 (CMAR) and averaged 0.33 for Charolais with a range from 0.18 (REA) to 0.46 (AFAT). The prediction accuracy was slightly higher on average when the data were split by animal's birth year than when the data were split by sire family. These results demonstrate that the genetic relationship or relatedness of selection candidates with the training population has a great impact on the accuracy of predicting genomic breeding values under the density of the marker panel used in this study. © 2014 Her Majesty the Queen in Right of Canada. Animal Genetics © 2014 Stichting International Foundation for Animal Genetics.

  9. A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction

    Directory of Open Access Journals (Sweden)

    Osval A. Montesinos-López

    2017-06-01

    There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments.

  10. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery

    DEFF Research Database (Denmark)

    Hickey, John M.; Chiurugwi, Tinashe; Mackay, Ian

    2017-01-01

    The rate of annual yield increases for major staple crops must more than double relative to current levels in order to feed a predicted global population of 9 billion by 2050. Controlled hybridization and selective breeding have been used for centuries to adapt plant and animal species for human use. However, achieving higher, sustainable rates of improvement in yields in various species will require renewed genetic interventions and dramatic improvement of agricultural practices. Genomic prediction of breeding values has the potential to improve selection, reduce costs and provide a platform that unifies breeding approaches, biological discovery, and tools and methods. Here we compare and contrast some animal and plant breeding approaches to make a case for bringing the two together through the application of genomic selection. We propose a strategy for the use of genomic selection as a unifying...

  11. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery.

    Science.gov (United States)

    Hickey, John M; Chiurugwi, Tinashe; Mackay, Ian; Powell, Wayne

    2017-08-30

    The rate of annual yield increases for major staple crops must more than double relative to current levels in order to feed a predicted global population of 9 billion by 2050. Controlled hybridization and selective breeding have been used for centuries to adapt plant and animal species for human use. However, achieving higher, sustainable rates of improvement in yields in various species will require renewed genetic interventions and dramatic improvement of agricultural practices. Genomic prediction of breeding values has the potential to improve selection, reduce costs and provide a platform that unifies breeding approaches, biological discovery, and tools and methods. Here we compare and contrast some animal and plant breeding approaches to make a case for bringing the two together through the application of genomic selection. We propose a strategy for the use of genomic selection as a unifying approach to deliver innovative 'step changes' in the rate of genetic gain at scale.

  12. RNA 3D modules in genome-wide predictions of RNA 2D structure

    DEFF Research Database (Denmark)

    Theis, Corinna; Zirbel, Craig L; Zu Siederdissen, Christian Höner

    2015-01-01

    Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution. These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D...

  13. Haplotype Based Genome-Enabled Prediction of Traits Across Nordic Red Cattle Breeds

    DEFF Research Database (Denmark)

    Castro Dias Cuyabano, Beatriz; Lund, Mogens Sandø; Rosa, G J M;

    SNP markers have been widely explored in genome based prediction. This study explored the use of haplotype blocks (haploblocks) to predict five milk production traits (fertility, mastitis, protein, fat and milk yield), using a mix of Nordic Red cattle as reference population for training. Predictions were performed under a Bayesian approach comparing a GBLUP and a mixture model. In general, predictions were more reliable when using haploblocks instead of individual SNPs as predictors. The Danish Red cattle presented the largest benefit in predictive ability from haploblocks, achieving 5.1% higher reliability than with the individual SNP approach in mastitis. This work gives evidence that predictions using haploblocks along with a combined training population of dairy cattle, may improve prediction accuracy of important traits in the individual populations.

  14. Performance of genomic prediction within and across generations in maritime pine

    NARCIS (Netherlands)

    Bartholomé, Jérôme; Heerwaarden, Van Joost; Isik, Fikret; Boury, Christophe; Vidal, Marjorie; Plomion, Christophe; Bouffier, Laurent

    2016-01-01

    Background: Genomic selection (GS) is a promising approach for decreasing breeding cycle length in forest trees. Assessment of progeny performance and of the prediction accuracy of GS models over generations is therefore a key issue. Results: A reference population of maritime pine (Pinus

  15. Across Breed QTL Detection and Genomic Prediction in French and Danish Dairy Cattle Breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Guldbrandtsen, Bernt; Hozé, C

    Our objective was to investigate the potential benefits of using sequence data to improve across breed genomic prediction, using data from five French and Danish dairy cattle breeds. First, QTL for protein yield were detected using high density genotypes. Part of the QTL detected within breed was...

  16. Optimal design of low-density SNP arrays for genomic prediction: algorithm and applications

    Science.gov (United States)

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for their optimal design. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optim...

  17. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species

    NARCIS (Netherlands)

    Bruijn, de I.; Kock, de M.J.D.; Meng, Y.; Waard, de P.; Beek, van T.A.; Raaijmakers, J.M.

    2007-01-01

    Analysis of microbial genome sequences has revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these

  18. Across Breed QTL Detection and Genomic Prediction in French and Danish Dairy Cattle Breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Guldbrandtsen, Bernt; Hozé, C

    Our objective was to investigate the potential benefits of using sequence data to improve across breed genomic prediction, using data from five French and Danish dairy cattle breeds. First, QTL for protein yield were detected using high density genotypes. Part of the QTL detected within breed was...

  19. Genomic prediction of continuous and binary fertility traits of females in a composite beef cattle breed

    Science.gov (United States)

    Reproduction efficiency is a major factor in the profitability of the beef cattle industry. Genomic selection (GS) is a promising tool that may improve the predictive accuracy and genetic gain of fertility traits. There is a wide range of traits used to measure fertility in dairy and beef cattle inc...

  20. A combined approach for genome wide protein function annotation/prediction

    DEFF Research Database (Denmark)

    Benso, Alfredo; Di Carlo, Stefano; Ur Rehman, Hafeez

    2013-01-01

    proteins in functional genomics and biology in general motivates the use of computational techniques well orchestrated to accurately predict their functions. METHODS: We propose a computational flow for the functional annotation of a protein able to assign the most probable functions to a protein...

  1. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species.

    Science.gov (United States)

    de Bruijn, Irene; de Kock, Maarten J D; Yang, Meng; de Waard, Pieter; van Beek, Teris A; Raaijmakers, Jos M

    2007-01-01

    Analysis of microbial genome sequences has revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these predictions, however, are untested and the association between genome sequence and biological function of the predicted metabolite is lacking. Here we report the genome-based identification of previously unknown CLP gene clusters in plant pathogenic Pseudomonas syringae strains B728a and DC3000 and in plant beneficial Pseudomonas fluorescens Pf0-1 and SBW25. For P. fluorescens SBW25, a model strain in studying bacterial evolution and adaptation, the structure of the CLP with a predicted 9-amino acid peptide moiety was confirmed by chemical analyses. Mutagenesis confirmed that the three identified NRPS genes are essential for CLP synthesis in strain SBW25. CLP production was shown to play a key role in motility, biofilm formation and in activity of SBW25 against zoospores of Phytophthora infestans. This is the first time that an antimicrobial metabolite is identified from strain SBW25. The results indicate that genome mining may enable the discovery of unknown gene clusters and traits that are highly relevant in the lifestyle of plant beneficial and plant pathogenic bacteria.

  2. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species

    NARCIS (Netherlands)

    Bruijn, de I.; Kock, de M.J.D.; Meng, Y.; Waard, de P.; Beek, van T.A.; Raaijmakers, J.M.

    2007-01-01

    Analysis of microbial genome sequences has revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these pre

  3. Preliminary genomic predictions of feed saved for 1.4 million Holsteins

    Science.gov (United States)

    Genomic predictions of transmitting ability (GPTAs) for residual feed intake (RFI) were computed using data from 4,621 42-day and 202 28-day feed intake trials of 3,947 U.S. Holsteins born 1999-2013 in 9 research herds. The 28-day records had 8.5% larger error variance than 42-day records and receiv...

  4. Potential of marker selection to increase prediction accuracy of genomic selection in soybean (Glycine max L.).

    Science.gov (United States)

    Ma, Yansong; Reif, Jochen C; Jiang, Yong; Wen, Zixiang; Wang, Dechun; Liu, Zhangxiong; Guo, Yong; Wei, Shuhong; Wang, Shuming; Yang, Chunming; Wang, Huicai; Yang, Chunyan; Lu, Weiguo; Xu, Ran; Zhou, Rong; Wang, Ruizhen; Sun, Zudong; Chen, Huaizhu; Zhang, Wanhai; Wu, Jian; Hu, Guohua; Liu, Chunyan; Luan, Xiaoyan; Fu, Yashu; Guo, Tai; Han, Tianfu; Zhang, Mengchen; Sun, Bincheng; Zhang, Lei; Chen, Weiyuan; Wu, Cunxiang; Sun, Shi; Yuan, Baojun; Zhou, Xinan; Han, Dezhi; Yan, Hongrui; Li, Wenbin; Qiu, Lijuan

    Genomic selection is a promising molecular breeding strategy enhancing genetic gain per unit time. The objectives of our study were to (1) explore the prediction accuracy of genomic selection for plant height and yield per plant in soybean [Glycine max (L.) Merr.], (2) discuss the relationship between prediction accuracy and numbers of markers, and (3) evaluate the effect of marker preselection based on different methods on the prediction accuracy. Our study is based on a population of 235 soybean varieties which were evaluated for plant height and yield per plant at multiple locations and genotyped by 5361 single nucleotide polymorphism markers. We applied ridge regression best linear unbiased prediction coupled with fivefold cross-validations and evaluated three strategies of marker preselection. For plant height, marker density and marker preselection procedure impacted prediction accuracy only marginally. In contrast, for grain yield, prediction accuracy based on markers selected with a haplotype block analyses-based approach increased by approximately 4 % compared with random or equidistant marker sampling. Thus, applying marker preselection based on haplotype blocks is an interesting option for a cost-efficient implementation of genomic selection for grain yield in soybean breeding.
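    The evaluation design named in this abstract, ridge regression coupled with fivefold cross-validation, can be sketched as follows. The data here are synthetic stand-ins, the shrinkage parameter `lam` is an arbitrary assumed value (in ridge regression BLUP it would be derived from estimated variance components), and all function names are hypothetical. This is a minimal illustration, not the study's pipeline.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate of marker effects; intercept via centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta

def kfold_accuracy(X, y, lam, k=5, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        beta, intercept = ridge_fit(X[train_idx], y[train_idx], lam)
        pred = X[test_idx] @ beta + intercept
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(scores))

# Synthetic stand-in data (assumed): 100 lines, 50 markers coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(100, 50)).astype(float)
true_effects = rng.normal(0.0, 1.0, 50)
y = X @ true_effects + rng.normal(0.0, 1.0, 100)

accuracy = kfold_accuracy(X, y, lam=10.0)
print(round(accuracy, 3))
```

    Marker preselection strategies such as those compared in the abstract would amount to passing different column subsets of `X` to `kfold_accuracy` and comparing the resulting accuracies.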

  5. Getting the Word Out on the Human Genome Project: A Course for Physicians

    Energy Technology Data Exchange (ETDEWEB)

    Sara L. Tobin

    2004-09-29

    Our project, "Getting the Word Out on the Human Genome Project: A Course for Physicians," presented educational goals to convey the power and promise of the Human Genome Program to a variety of professional, educational, and public audiences. Our initial goal was to provide practicing physicians with a comprehensive multimedia tool to update their skills in the genomic era. We therefore created the multimedia courseware, "The New Genetics: Courseware for Physicians. Molecular Concepts, Applications, and Ramifications." However, as the project moved forward, several unanticipated audiences found the courseware to be useful for instruction and for self-education, so an additional edition of the courseware "The New Genetics: Medicine and the Human Genome. Molecular Concepts, Applications, and Ramifications" was published simultaneously with the physician version. At the time that both versions of the courseware were being completed, Stanford's Office of Technology Licensing opted not to commercialize the courseware and offered a license-back agreement if the authors founded a commercial business. The authors thus became closely involved in marketing and sales, and several thousand copies of the courseware have been sold. Surprisingly, the non-physician version has turned out to be more in demand, and this has led us in several new directions, most of which involve undergraduate education. These are discussed in detail in the Report.

  6. Cross–Project Defect Prediction With Respect To Code Ownership Model: An Empirical Study

    Directory of Open Access Journals (Sweden)

    Marian Jureczko

    2015-06-01

    The paper presents an analysis of 83 versions of industrial, open-source and academic projects. We have empirically evaluated whether those project types constitute separate classes of projects with regard to defect prediction. Statistical tests proved that there exist significant differences between the models trained on the aforementioned project classes. This work makes the next step towards cross-project reusability of defect prediction models and facilitates their adoption, which has been very limited so far.

  7. Influence of outliers on accuracy estimation in genomic prediction in plant breeding.

    Science.gov (United States)

    Estaghvirou, Sidi Boubacar Ould; Ogutu, Joseph O; Piepho, Hans-Peter

    2014-10-01

    Outliers often pose problems in analyses of data in plant breeding, but their influence on the performance of methods for estimating predictive accuracy in genomic prediction studies has not yet been evaluated. Here, we evaluate the influence of outliers on the performance of methods for accuracy estimation in genomic prediction studies using simulation. We simulated 1000 datasets for each of 10 scenarios to evaluate the influence of outliers on the performance of seven methods for estimating accuracy. These scenarios are defined by the number of genotypes, marker effect variance, and magnitude of outliers. To mimic outliers, we added to one observation in each simulated dataset, in turn, 5-, 8-, and 10-times the error SD used to simulate small and large phenotypic datasets. The effect of outliers on accuracy estimation was evaluated by comparing deviations in the estimated and true accuracies for datasets with and without outliers. Outliers adversely influenced accuracy estimation, more so at small values of genetic variance or number of genotypes. A method for estimating heritability and predictive accuracy in plant breeding and another used to estimate accuracy in animal breeding were the most accurate and resistant to outliers across all scenarios and are therefore preferable for accuracy estimation in genomic prediction studies. The performances of the other five methods that use cross-validation were less consistent and varied widely across scenarios. The computing time for the methods increased as the size of outliers and sample size increased and the genetic variance decreased. Copyright © 2014 Ould Estaghvirou et al.

  8. [A novel method of the genome-wide prediction for the target genes and its application].

    Science.gov (United States)

    Zhang, Jing-Jing; Feng, Jing; Zhu, Ying-Guo; Li, Yang-Sheng

    2006-10-01

    Using the protein databases of several model species, this study developed a new method for genome-wide prediction of target genes, based on a hidden Markov model and implemented in Perl. The advantages of this method are high throughput, high quality and ease of prediction, especially for multi-domain protein families. With this method, we predicted the PPR and TPR protein families in the whole genomes of several model species. There were 536 PPR proteins and 199 TPR proteins in Oryza sativa ssp. japonica, 519 PPR proteins and 177 TPR proteins in Oryza sativa L. ssp. indica, 735 PPR proteins and 292 TPR proteins in Arabidopsis thaliana, and 6 PPR proteins and 32 TPR proteins in Cyanidioschyzon merolae. Synechococcus and Thermophilic archaebacterium had no PPR proteins. By contrast, 10 TPR proteins were found in Synechococcus and 4 TPR proteins in Thermophilic archaebacterium. Further bioinformatics analyses were then conducted on these results.

  9. Identification of DNA motifs implicated in maintenance of bacterial core genomes by predictive modeling.

    Science.gov (United States)

    Halpern, David; Chiapello, Hélène; Schbath, Sophie; Robin, Stéphane; Hennequet-Antier, Christelle; Gruss, Alexandra; El Karoui, Meriem

    2007-09-01

    Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the "crossover hotspot instigator," or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions.

  10. Identification of DNA motifs implicated in maintenance of bacterial core genomes by predictive modeling.

    Directory of Open Access Journals (Sweden)

    David Halpern

    2007-09-01

    Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the "crossover hotspot instigator," or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions.

  11. Genome-Assisted Prediction of Quantitative Traits Using the R Package sommer.

    Directory of Open Access Journals (Sweden)

    Giovanny Covarrubias-Pazaran

    Most traits of agronomic importance are quantitative in nature, and genetic markers have been used for decades to dissect such traits. Recently, genomic selection has earned attention as next generation sequencing technologies became feasible for major and minor crops. Mixed models have become a key tool for fitting genomic selection models, but most current genomic selection software can only include a single variance component other than the error, making hybrid prediction using additive, dominance and epistatic effects unfeasible for species displaying heterotic effects. Moreover, likelihood-based software for fitting mixed models with multiple random effects that allows the user to specify the variance-covariance structure of random effects has not been fully exploited. A new open-source R package called sommer is presented to facilitate the use of mixed models for genomic selection and hybrid prediction purposes using more than one variance component and allowing specification of covariance structures. The use of sommer for genomic prediction is demonstrated through several examples using maize and wheat genotypic and phenotypic data. At its core, the program contains three algorithms for estimating variance components: Average Information (AI), Expectation-Maximization (EM) and Efficient Mixed Model Association (EMMA). Kernels for calculating the additive, dominance and epistatic relationship matrices are included, along with other useful functions for genomic analysis. Results from sommer were comparable to other software, but the analysis was faster than Bayesian counterparts by hours to days. In addition, the ability to deal with missing data, combined with greater flexibility and speed than other REML-based software, was achieved by putting together some of the most efficient algorithms to fit models in a gentle environment such as R.

  12. Comparative genomics of bacterial and plant folate synthesis and salvage: predictions and validations

    Directory of Open Access Journals (Sweden)

    Noiriel Alexandre

    2007-07-01

    Background: Folate synthesis and salvage pathways are relatively well known from classical biochemistry and genetics but they have not been subjected to comparative genomic analysis. The availability of genome sequences from hundreds of diverse bacteria, and from Arabidopsis thaliana, enabled such an analysis using the SEED database and its tools. This study reports the results of the analysis and integrates them with new and existing experimental data. Results: Based on sequence similarity and the clustering, fusion, and phylogenetic distribution of genes, several functional predictions emerged from this analysis. For bacteria, these included the existence of novel GTP cyclohydrolase I and folylpolyglutamate synthase gene families, and of a trifunctional p-aminobenzoate synthesis gene. For plants and bacteria, the predictions comprised the identities of a 'missing' folate synthesis gene (folQ) and of a folate transporter, and the absence from plants of a folate salvage enzyme. Genetic and biochemical tests bore out these predictions. Conclusion: For bacteria, these results demonstrate that much can be learnt from comparative genomics, even for well-explored primary metabolic pathways. For plants, the findings particularly illustrate the potential for rapid functional assignment of unknown genes that have prokaryotic homologs, by analyzing which genes are associated with the latter. More generally, our data indicate how combined genomic analysis of both plants and prokaryotes can be more powerful than isolated examination of either group alone.

  13. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  14. Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response variables

    DEFF Research Database (Denmark)

    Guo, Gang; Lund, Mogens Sandø; Zhang, Y;

    2010-01-01

    This study compared genomic predictions using conventional estimated breeding values (EBV) and daughter yield deviations (DYD) as response variables based on simulated data. Eight scenarios were simulated in regard to heritability (0.05 and 0.30), number of daughters per sire (30, 100, and unequal......), the EBV and DYD approaches provided similar genomic estimated breeding value (GEBV) reliabilities, except for scenarios with unequal numbers of daughters and half of sires without genotype, for which the EBV approach was superior to the DYD approach (by 1.2 and 2.4%). Using a Bayesian mixture prior model...

  15. Can metabolomics in addition to genomics add to prognostic and predictive information in breast cancer?

    Science.gov (United States)

    Howell, Anthony

    2010-11-16

    Genomic data from breast cancers provide additional prognostic and predictive information that is beginning to be used for patient management. The question arises whether additional information derived from other 'omic' approaches such as metabolomics can provide additional information. In an article published this month in BMC Cancer, Borgan et al. add metabolomic information to genomic measures in breast tumours and demonstrate, for the first time, that it may be possible to further define subgroups of patients which could be of value clinically. See research article: http://www.biomedcentral.com/1471-2407/10/628.

  16. Using physicochemical and compositional characteristics of DNA sequence for prediction of genomic signals

    KAUST Repository

    Mulamba, Pierre Abraham

    2014-12-01

    Finding genes in eukaryotic organisms using computational methods is an ongoing problem in biology. Based on the various genomic signals found in eukaryotic genomes, this problem can be divided into many different sub-problems such as identification of transcription start sites, translation initiation sites, splice sites, poly(A) signals, etc. Each sub-problem deals with a particular type of genomic signal, and various computational methods are used to solve each sub-problem. Aggregating information from all these individual sub-problems can lead to a complete annotation of a gene and its component signals. The fundamental principle of most of these computational methods is the mapping principle: building an input-output model for the prediction of a particular genomic signal based on a set of known input signals and their corresponding output signal. The type of input signals used to build the model is an essential element in most of these computational methods. The common factor of most of these methods is that they are mainly based on the statistical analysis of the basic nucleotide sequence string composition. Our study is based on a novel approach to predict genomic signals in which uniquely generated structural profiles that combine compressed physicochemical properties with topological and compositional properties of DNA sequences are used to develop machine learning predictive models. The compression of the physicochemical properties is made using principal component analysis transformation. Our ideas are evaluated through prediction models of canonical splice sites using support vector machine models. We demonstrate across several species that the proposed methodology has resulted in the most accurate splice site predictors that are publicly available or described. We believe that the approach in this study is quite general and has various applications in other biological modeling problems.

  17. Effect of marker-data editing on the accuracy of genomic prediction.

    Science.gov (United States)

    Edriss, V; Guldbrandtsen, B; Lund, M S; Su, G

    2013-04-01

    Genomic selection is a method to predict breeding values using genome-wide single-nucleotide polymorphism (SNP) markers. High-quality marker data are necessary for genomic selection. The aim of this study was to investigate the effect of marker-editing criteria on the accuracy of genomic predictions in the Nordic Holstein and Jersey populations. Data included 4429 Holstein and 1071 Jersey bulls. In total, 48,222 SNP for Holstein and 44,305 SNP for Jersey were polymorphic. The SNP data were edited based on (i) minor allele frequencies (MAF) with thresholds of no limit, 0.001, 0.01, 0.02, 0.05 and 0.10, (ii) deviations from Hardy-Weinberg proportions (HWP) with thresholds of no limit, chi-squared p-values of 0.001, 0.02, 0.05 and 0.10, and (iii) GenCall (GC) scores with thresholds of 0.15, 0.55, 0.60, 0.65 and 0.70. The marker data sets edited with different criteria were used for genomic prediction of protein yield, fertility and mastitis using a Bayesian variable selection and a GBLUP model. De-regressed EBV were used as response variables. The result showed little difference between prediction accuracies based on marker data sets edited with MAF and deviation from HWP. However, accuracy decreased with more stringent thresholds of GC score. According to the results of this study, it would be appropriate to edit data with restriction of MAF being between 0.01 and 0.02, a p-value of deviation from HWP being 0.05, and keeping all individual SNP genotypes having a GC score over 0.15. © 2012 Blackwell Verlag GmbH.
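
    The three editing criteria named in the abstract (MAF, deviation from Hardy-Weinberg proportions, GenCall score) can be illustrated with a short sketch. This is not the authors' pipeline: the function names, the 0/1/2 genotype coding, the 1-df chi-squared test and the rule that an SNP is dropped when its p-value falls below the threshold are all assumptions made for illustration. The default thresholds are the values the abstract recommends.

```python
import math


def mask_low_gc(genotypes, gc_scores, min_gc=0.15):
    """Set individual genotype calls to None unless their GC score exceeds min_gc."""
    return [g if (g is not None and s > min_gc) else None
            for g, s in zip(genotypes, gc_scores)]


def minor_allele_frequency(genotypes):
    """genotypes: counts (0, 1, 2) of one allele per animal; None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def hwe_chi2_pvalue(genotypes):
    """P-value of a 1-df chi-squared test against Hardy-Weinberg proportions."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 1.0
    p = sum(called) / (2 * n)
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:  # monomorphic marker: nothing to test
        return 1.0
    observed = [called.count(0), called.count(1), called.count(2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-squared with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def passes_editing(genotypes, gc_scores, min_maf=0.01, min_hwe_p=0.05, min_gc=0.15):
    """Apply the GC mask first, then the MAF and HWP filters (assumed order)."""
    masked = mask_low_gc(genotypes, gc_scores, min_gc)
    return (minor_allele_frequency(masked) >= min_maf
            and hwe_chi2_pvalue(masked) >= min_hwe_p)
```

    Under these assumptions, a marker with genotype counts 25/50/25 passes, while a marker with only heterozygous calls fails the HWP test and a marker with a single heterozygote among 1000 animals fails the MAF filter.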

  18. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human

    Science.gov (United States)

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-01-01

    DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation. PMID:28212312
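
    The 65 DNA composition features mentioned in the abstract (trinucleotide frequency plus GC content) can be computed along the following lines. This is a hedged sketch rather than the published feature extractor: the function name, the use of overlapping trinucleotides and the handling of ambiguous bases are assumptions, and the seven sequence complexity features are not covered.

```python
from itertools import product

# All 64 trinucleotides over A, C, G, T in a fixed order.
TRINUCLEOTIDES = ["".join(t) for t in product("ACGT", repeat=3)]


def composition_features(window):
    """Return 64 trinucleotide frequencies followed by the GC content.

    `window` is a DNA string, for example a 600-bp window around a CpG site.
    Overlapping trinucleotides are counted; any trinucleotide containing a
    base other than A, C, G or T is skipped (an assumption of this sketch).
    """
    seq = window.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    freqs = [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]
    unambiguous = sum(seq.count(b) for b in "ACGT")
    gc = (seq.count("G") + seq.count("C")) / unambiguous if unambiguous else 0.0
    return freqs + [gc]


if __name__ == "__main__":
    features = composition_features("ACGACGTTNACG")
    print(len(features), features[-1])
```

    The resulting 65-element vector is the kind of input that could be passed to a support vector machine classifier, as the abstract describes.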

  19. Genomic prediction of survival time in a population of brown laying hens showing cannibalistic behavior.

    Science.gov (United States)

    Alemu, Setegn W; Calus, Mario P L; Muir, William M; Peeters, Katrijn; Vereijken, Addie; Bijma, Piter

    2016-09-13

    Mortality due to cannibalism causes both economic and welfare problems in laying hens. To limit mortality due to cannibalism, laying hens are often beak-trimmed, which is undesirable for animal welfare reasons. Genetic selection is an alternative strategy to increase survival and is more efficient by taking heritable variation that originates from social interactions into account, which are modelled as the so-called indirect genetic effects (IGE). Despite the considerable heritable variation in survival time due to IGE, genetic improvement of survival time in laying hens is still challenging because the detected heritable variation of the trait with IGE is still limited, ranging from 0.06 to 0.26, and individuals that are still alive at the end of the recording period are censored. Furthermore, survival time records are available late in life and only on females. To cope with these challenges, we tested the hypothesis that genomic prediction increases the accuracy of estimated breeding values (EBV) compared to parental average EBV, and increases response to selection for survival time compared to a traditional breeding scheme. We tested this hypothesis in two lines of brown layers with intact beaks, which show cannibalism, and also the hypothesis that the rate of inbreeding per year is lower for genomic selection than for the traditional breeding scheme. The standard deviation of genomic prediction EBV for survival time was around 22 days for both lines, indicating good prospects for selection against mortality in laying hens with intact beaks. Genomic prediction increased the accuracy of the EBV by 35 and 32 % compared to the parent average EBV for the two lines. At the current reference population size, predicted response to selection was 91 % higher when using genomic selection than with the traditional breeding scheme, as a result of a shorter generation interval in males and greater accuracy of selection in females. The predicted rate of inbreeding per

  20. The Human Genome Project and Mental Retardation: An Educational Program. Final Progress Report

    Energy Technology Data Exchange (ETDEWEB)

    Davis, Sharon

    1999-05-03

    The Arc, a national organization on mental retardation, conducted an educational program for members, many of whom have a family member with a genetic condition causing mental retardation. The project informed members about the Human Genome scientific efforts, conducted training regarding ethical, legal and social implications and involved members in issue discussions. Short reports and fact sheets on genetic and ELSI topics were disseminated to 2,200 of the Arc's leaders across the country and to other interested individuals. Materials produced by the project can be found on the Arc's web site, TheArc.org.

  1. Behavioral, Brain Imaging and Genomic Measures to Predict Functional Outcomes Post - Bed Rest and Spaceflight

    Science.gov (United States)

    Mulavara, A. P.; DeDios, Y. E.; Gadd, N. E.; Caldwell, E. E.; Batson, C. D.; Goel, R.; Seidler, R. D.; Oddsson, L.; Zanello, S.; Clarke, T.; Peters, B.; Cohen, H. S.; Reschke, M.; Wood, S.; Bloomberg, J. J.

    2016-01-01

    Astronauts experience sensorimotor disturbances during their initial exposure to microgravity and during the re-adaptation phase following a return to an Earth-gravitational environment. These alterations may disrupt crewmembers' ability to perform mission critical functional tasks requiring ambulation, manual control and gaze stability. Interestingly, astronauts who return from spaceflight show substantial differences in their abilities to readapt to a gravitational environment. The ability to predict the manner and degree to which individual astronauts would be affected would improve the effectiveness of countermeasure training programs designed to enhance sensorimotor adaptability. For such an approach to succeed, we must develop predictive measures of sensorimotor adaptability that will allow us to foresee, before actual spaceflight, which crewmembers are likely to experience the greatest challenges to their adaptive capacities. The goals of this project are to identify and characterize this set of predictive measures. Our approach includes: 1) behavioral tests to assess sensory bias and adaptability quantified using both strategic and plastic-adaptive responses; 2) imaging to determine individual brain morphological and functional features, using structural magnetic resonance imaging (MRI), diffusion tensor imaging, resting state functional connectivity MRI, and sensorimotor adaptation task-related functional brain activation; and 3) assessment of genotypic markers of genetic polymorphisms in the catechol-O-methyl transferase, dopamine receptor D2, and brain-derived neurotrophic factor genes and genetic polymorphisms of alpha2-adrenergic receptors that play a role in the neural pathways underlying sensorimotor adaptation. We anticipate that these predictive measures will be significantly correlated with individual differences in sensorimotor adaptability after long-duration spaceflight and exposure to an analog bed rest environment. We will be conducting a

  2. The database of the PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) project.

    Science.gov (United States)

    Hudson, Lawrence N; Newbold, Tim; Contu, Sara; Hill, Samantha L L; Lysenko, Igor; De Palma, Adriana; Phillips, Helen R P; Alhusseini, Tamera I; Bedford, Felicity E; Bennett, Dominic J; Booth, Hollie; Burton, Victoria J; Chng, Charlotte W T; Choimes, Argyrios; Correia, David L P; Day, Julie; Echeverría-Londoño, Susy; Emerson, Susan R; Gao, Di; Garon, Morgan; Harrison, Michelle L K; Ingram, Daniel J; Jung, Martin; Kemp, Victoria; Kirkpatrick, Lucinda; Martin, Callum D; Pan, Yuan; Pask-Hale, Gwilym D; Pynegar, Edwin L; Robinson, Alexandra N; Sanchez-Ortiz, Katia; Senior, Rebecca A; Simmons, Benno I; White, Hannah J; Zhang, Hanbin; Aben, Job; Abrahamczyk, Stefan; Adum, Gilbert B; Aguilar-Barquero, Virginia; Aizen, Marcelo A; Albertos, Belén; Alcala, E L; Del Mar Alguacil, Maria; Alignier, Audrey; Ancrenaz, Marc; Andersen, Alan N; Arbeláez-Cortés, Enrique; Armbrecht, Inge; Arroyo-Rodríguez, Víctor; Aumann, Tom; Axmacher, Jan C; Azhar, Badrul; Azpiroz, Adrián B; Baeten, Lander; Bakayoko, Adama; Báldi, András; Banks, John E; Baral, Sharad K; Barlow, Jos; Barratt, Barbara I P; Barrico, Lurdes; Bartolommei, Paola; Barton, Diane M; Basset, Yves; Batáry, Péter; Bates, Adam J; Baur, Bruno; Bayne, Erin M; Beja, Pedro; Benedick, Suzan; Berg, Åke; Bernard, Henry; Berry, Nicholas J; Bhatt, Dinesh; Bicknell, Jake E; Bihn, Jochen H; Blake, Robin J; Bobo, Kadiri S; Bóçon, Roberto; Boekhout, Teun; Böhning-Gaese, Katrin; Bonham, Kevin J; Borges, Paulo A V; Borges, Sérgio H; Boutin, Céline; Bouyer, Jérémy; Bragagnolo, Cibele; Brandt, Jodi S; Brearley, Francis Q; Brito, Isabel; Bros, Vicenç; Brunet, Jörg; Buczkowski, Grzegorz; Buddle, Christopher M; Bugter, Rob; Buscardo, Erika; Buse, Jörn; Cabra-García, Jimmy; Cáceres, Nilton C; Cagle, Nicolette L; Calviño-Cancela, María; Cameron, Sydney A; Cancello, Eliana M; Caparrós, Rut; Cardoso, Pedro; Carpenter, Dan; Carrijo, Tiago F; Carvalho, Anelena L; Cassano, Camila R; Castro, Helena; Castro-Luna, Alejandro A; Rolando, Cerda B; Cerezo, 
Alexis; Chapman, Kim Alan; Chauvat, Matthieu; Christensen, Morten; Clarke, Francis M; Cleary, Daniel F R; Colombo, Giorgio; Connop, Stuart P; Craig, Michael D; Cruz-López, Leopoldo; Cunningham, Saul A; D'Aniello, Biagio; D'Cruze, Neil; da Silva, Pedro Giovâni; Dallimer, Martin; Danquah, Emmanuel; Darvill, Ben; Dauber, Jens; Davis, Adrian L V; Dawson, Jeff; de Sassi, Claudio; de Thoisy, Benoit; Deheuvels, Olivier; Dejean, Alain; Devineau, Jean-Louis; Diekötter, Tim; Dolia, Jignasu V; Domínguez, Erwin; Dominguez-Haydar, Yamileth; Dorn, Silvia; Draper, Isabel; Dreber, Niels; Dumont, Bertrand; Dures, Simon G; Dynesius, Mats; Edenius, Lars; Eggleton, Paul; Eigenbrod, Felix; Elek, Zoltán; Entling, Martin H; Esler, Karen J; de Lima, Ricardo F; Faruk, Aisyah; Farwig, Nina; Fayle, Tom M; Felicioli, Antonio; Felton, Annika M; Fensham, Roderick J; Fernandez, Ignacio C; Ferreira, Catarina C; Ficetola, Gentile F; Fiera, Cristina; Filgueiras, Bruno K C; Fırıncıoğlu, Hüseyin K; Flaspohler, David; Floren, Andreas; Fonte, Steven J; Fournier, Anne; Fowler, Robert E; Franzén, Markus; Fraser, Lauchlan H; Fredriksson, Gabriella M; Freire, Geraldo B; Frizzo, Tiago L M; Fukuda, Daisuke; Furlani, Dario; Gaigher, René; Ganzhorn, Jörg U; García, Karla P; Garcia-R, Juan C; Garden, Jenni G; Garilleti, Ricardo; Ge, Bao-Ming; Gendreau-Berthiaume, Benoit; Gerard, Philippa J; Gheler-Costa, Carla; Gilbert, Benjamin; Giordani, Paolo; Giordano, Simonetta; Golodets, Carly; Gomes, Laurens G L; Gould, Rachelle K; Goulson, Dave; Gove, Aaron D; Granjon, Laurent; Grass, Ingo; Gray, Claudia L; Grogan, James; Gu, Weibin; Guardiola, Moisès; Gunawardene, Nihara R; Gutierrez, Alvaro G; Gutiérrez-Lamus, Doris L; Haarmeyer, Daniela H; Hanley, Mick E; Hanson, Thor; Hashim, Nor R; Hassan, Shombe N; Hatfield, Richard G; Hawes, Joseph E; Hayward, Matt W; Hébert, Christian; Helden, Alvin J; Henden, John-André; Henschel, Philipp; Hernández, Lionel; Herrera, James P; Herrmann, Farina; Herzog, Felix; Higuera-Diaz, 
Diego; Hilje, Branko; Höfer, Hubert; Hoffmann, Anke; Horgan, Finbarr G; Hornung, Elisabeth; Horváth, Roland; Hylander, Kristoffer; Isaacs-Cubides, Paola; Ishida, Hiroaki; Ishitani, Masahiro; Jacobs, Carmen T; Jaramillo, Víctor J; Jauker, Birgit; Hernández, F Jiménez; Johnson, McKenzie F; Jolli, Virat; Jonsell, Mats; Juliani, S Nur; Jung, Thomas S; Kapoor, Vena; Kappes, Heike; Kati, Vassiliki; Katovai, Eric; Kellner, Klaus; Kessler, Michael; Kirby, Kathryn R; Kittle, Andrew M; Knight, Mairi E; Knop, Eva; Kohler, Florian; Koivula, Matti; Kolb, Annette

    2017-01-01

    The PREDICTS project, Projecting Responses of Ecological Diversity In Changing Terrestrial Systems (www.predicts.org.uk), has collated from published studies a large, reasonably representative database of comparable samples of biodiversity from multiple sites that differ in the nature or intensity of human impacts relating to land use. We have used this evidence base to develop global and regional statistical models of how local biodiversity responds to these measures. We describe and make freely available this 2016 release of the database, containing more than 3.2 million records sampled at over 26,000 locations and representing over 47,000 species. We outline how the database can help in answering a range of questions in ecology and conservation biology. To our knowledge, this is the largest and most geographically and taxonomically representative database of spatial comparisons of biodiversity that has been collated to date; it will be useful to researchers and international efforts wishing to model and understand the global status of biodiversity.

  3. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    Science.gov (United States)

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
    We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of

  4. Protein Subcellular Localization Prediction and Genomic Polymorphism Analysis of the SARS Coronavirus

    Institute of Scientific and Technical Information of China (English)

    季星来; 柳树群; 李岭; 孙之荣

    2004-01-01

    The cause of severe acute respiratory syndrome (SARS) has been identified as a new coronavirus (CoV). Several sequences of the complete genome of SARS-CoV have been determined. The subcellular localization (SubLocation) of annotated open-reading frames of the SARS-CoV genome was predicted using a support vector machine. Several gene products were predicted to locate in the Golgi body and cell nucleus. The SubLocation information was combined with predicted transmembrane information to develop a model of the viral life cycle. The results show that this information can be used to predict the functions of genes and even the virus pathogenesis. In addition, the entire SARS viral genome sequences currently available in GenBank were compared to identify the sequence variations among different isolates. Some variations in the Hong Kong strains may be related to the special clinical manifestations and provide clues for understanding the relationship between gene functions and evolution. These variations reflect the evolution of the SARS virus in human populations and may help development of a vaccine.

  5. Prediction of type III secretion signals in genomes of gram-negative bacteria.

    Directory of Open Access Journals (Sweden)

    Martin Löwer

    BACKGROUND: Pathogenic bacteria infecting both animals and plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system represents such a mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in the literature, its exact characteristics remain unknown. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases, and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. Classification accuracy yielded a cross-validated Matthews correlation of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% predicted candidates protein), their chromosomes (11%) and plasmids (13%), as well as 213 Firmicute genomes (7%). CONCLUSIONS/SIGNIFICANCE: We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).
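
    For readers unfamiliar with the Matthews correlation reported above, the standard definition for a binary classifier can be written as a short function. The helper names are illustrative and the example counts are made up; they are not the study's data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Toy labels: 1 = effector, 0 = non-effector (invented for illustration).
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 1, 0, 0, 0, 1, 0, 1]
    print(matthews_correlation(*confusion_counts(truth, preds)))
```

    Unlike plain accuracy, this coefficient stays informative when effectors are a small minority of all proteins, which is the situation in a genome-wide scan.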

  6. The projection of a test genome onto a reference population and applications to humans and archaic hominins.

    Science.gov (United States)

    Yang, Melinda A; Harris, Kelley; Slatkin, Montgomery

    2014-12-01

    We introduce a method for comparing a test genome with numerous genomes from a reference population. Sites in the test genome are given a weight, w, that depends on the allele frequency, x, in the reference population. The projection of the test genome onto the reference population is the average weight for each x, [Formula: see text]. The weight is assigned in such a way that, if the test genome is a random sample from the reference population, then [Formula: see text]. Using analytic theory, numerical analysis, and simulations, we show how the projection depends on the time of population splitting, the history of admixture, and changes in past population size. The projection is sensitive to small amounts of past admixture, the direction of admixture, and admixture from a population not sampled (a ghost population). We compute the projections of several human and two archaic genomes onto three reference populations from the 1000 Genomes project-Europeans, Han Chinese, and Yoruba-and discuss the consistency of our analysis with previously published results for European and Yoruba demographic history. Including higher amounts of admixture between Europeans and Yoruba soon after their separation and low amounts of admixture more recently can resolve discrepancies between the projections and demographic inferences from some previous studies.
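
    The weight formula is elided in this record ("[Formula: see text]"), so the sketch below rests on an assumption: it uses one weighting that satisfies the stated property, namely the number of derived alleles carried by the test genome divided by 2x. Under Hardy-Weinberg sampling from the reference population, the expected weight is then 1 for every x. The function names are illustrative.

```python
def site_weight(derived_copies, x):
    """Weight for one site (assumed scheme, not confirmed by the abstract).

    derived_copies: derived alleles in the diploid test genome (0, 1 or 2).
    x: derived-allele frequency in the reference population, 0 < x <= 1.
    """
    return derived_copies / (2.0 * x)


def projection(sites):
    """Average weight for each reference frequency x.

    sites: iterable of (derived_copies, x) pairs. Returns {x: mean weight}.
    """
    totals = {}
    for copies, x in sites:
        running_sum, count = totals.get(x, (0.0, 0))
        totals[x] = (running_sum + site_weight(copies, x), count + 1)
    return {x: s / n for x, (s, n) in totals.items()}


def expected_weight(x):
    """Expected weight if the test genome is a random (Hardy-Weinberg) draw."""
    p_het = 2.0 * x * (1.0 - x)
    p_hom_derived = x * x
    return p_het * site_weight(1, x) + p_hom_derived * site_weight(2, x)


# Toy sites: (derived copies in test genome, reference frequency).
toy = [(1, 0.25), (0, 0.25), (2, 0.5), (1, 0.5)]
print(projection(toy))
print(expected_weight(0.25))
```

    With a weighting of this kind, departures of the projection from 1 at particular frequencies are what carry the demographic signal the abstract describes.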

  7. Evaluative profiling of arsenic sensing and regulatory systems in the human microbiome project genomes.

    Science.gov (United States)

    Isokpehi, Raphael D; Udensi, Udensi K; Simmons, Shaneka S; Hollman, Antoinesha L; Cain, Antia E; Olofinsae, Samson A; Hassan, Oluwabukola A; Kashim, Zainab A; Enejoh, Ojochenemi A; Fasesan, Deborah E; Nashiru, Oyekanmi

    2014-01-01

    The influence of environmental chemicals including arsenic, a type 1 carcinogen, on the composition and function of the human-associated microbiota is of significance in human health and disease. We have developed a suite of bioinformatics and visual analytics methods to evaluate the availability (presence or absence) and abundance of functional annotations in a microbial genome for seven Pfam protein families: As(III)-responsive transcriptional repressor (ArsR), anion-transporting ATPase (ArsA), arsenical pump membrane protein (ArsB), arsenate reductase (ArsC), arsenical resistance operon transacting repressor (ArsD), water/glycerol transport protein (aquaporins), and universal stress protein (USP). These genes encode functions for sensing and/or regulating arsenic content in the bacterial cell. The evaluative profiling strategy was applied to 3,274 genomes, from which 62 genomes from 18 genera were identified to contain genes for the seven protein families. Our list included 12 genomes in the Human Microbiome Project (HMP) from the following genera: Citrobacter, Escherichia, Lactobacillus, Providencia, Rhodococcus, and Staphylococcus. Gene neighborhood analysis of the arsenic resistance operon in the genome of Bacteroides thetaiotaomicron VPI-5482, a human gut symbiont, revealed the adjacent arrangement of genes for arsenite binding/transfer (ArsD) and cytochrome c biosynthesis (DsbD_2). Visual analytics facilitated evaluation of protein annotations in 367 genomes in the phylum Bacteroidetes and identified multiple genomes in which genes for ArsD and DsbD_2 were adjacently arranged. Cytochrome c, produced by a posttranslational process, consists of heme-containing proteins important for cellular energy production and signaling. Further research is desired to elucidate arsenic resistance and arsenic-mediated cellular energy production in the Bacteroidetes.

  8. Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction

    Science.gov (United States)

    Bandeira e Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose

    2017-01-01

    Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied. PMID:28455415
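
    The two kernels compared above can be illustrated with a small sketch. It assumes a marker matrix coded 0/1/2, a linear kernel built from column-centred markers, and a Gaussian kernel whose squared distances are scaled by their median; the bandwidth h, the scaling choice and the toy genotypes are assumptions rather than details from the abstract.

```python
import math


def center_columns(X):
    """Subtract each marker's mean so that columns have mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def linear_kernel(X):
    """GBLUP-style linear kernel G = Z Z' / p with column-centred markers Z."""
    Z = center_columns(X)
    p = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in Z] for zi in Z]


def gaussian_kernel(X, h=1.0):
    """Gaussian kernel K_ij = exp(-h * d_ij^2 / median(d^2)).

    The median is taken over off-diagonal squared Euclidean distances and
    must be non-zero (that is, not all lines are identical).
    """
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
          for i in range(n)]
    off = sorted(d2[i][j] for i in range(n) for j in range(i + 1, n))
    m = len(off)
    median = off[m // 2] if m % 2 else 0.5 * (off[m // 2 - 1] + off[m // 2])
    return [[math.exp(-h * d2[i][j] / median) for j in range(n)]
            for i in range(n)]


# Toy genotypes: 4 lines x 3 markers, coded as allele counts (hypothetical).
X = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
G = linear_kernel(X)
K = gaussian_kernel(X, h=1.0)
```

    The linear kernel captures additive marker similarity only, while the Gaussian kernel decays nonlinearly with genetic distance, which is one reason it can pick up more complex marker effects.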

  9. Impact of Relationships between Test and Reference Animals and between Reference Animals on Reliability of Genomic Prediction

    DEFF Research Database (Denmark)

    Wu, Xiaoping; Lund, Mogens Sandø; Sun, Dongxiao

    This study investigated reliability of genomic prediction in various scenarios with regard to relationship between test and reference animals and between animals within the reference population. Different reference populations were generated from EuroGenomics data and 1288 Nordic Holstein bulls... as a common test population. A GBLUP model and a Bayesian mixture model were applied to predict genomic breeding values for bulls in the test data. Results showed that a closer relationship between test and reference animals led to a higher reliability, while a closer relationship between reference animal... resulted in a lower reliability. Therefore, the design of the reference population is important for improving the reliability of genomic prediction. With regard to model, the Bayesian mixture model in general led to a slightly higher reliability of genomic prediction than the GBLUP model....

  10. Evidence of genomic adaptation to climate in Eucalyptus microcarpa: implications for adaptive potential to projected climate change.

    Science.gov (United States)

    Jordan, Rebecca; Hoffmann, Ary A; Dillon, Shannon K; Prober, Suzanne M

    2017-09-01

    Understanding whether populations can adapt in situ or whether interventions are required is of key importance for biodiversity management under climate change. Landscape genomics is becoming an increasingly important and powerful tool for rapid assessments of climate adaptation, especially in long-lived species such as trees. We investigated climate adaptation in Eucalyptus microcarpa using the DArTseq genomic approach. A combination of FST outlier and environmental association analyses was performed using > 4,200 genome-wide single nucleotide polymorphisms (SNPs) from 26 populations spanning climate gradients in south-eastern Australia. Eighty-one SNPs were identified as putatively adaptive, based on significance in FST outlier tests and significant associations with one or more climate variables related to temperature (70 / 81), aridity (37 / 81) or precipitation (35 / 81). Adaptive SNPs were located on all 11 chromosomes, with no particular region associated with individual climate variables. Climate adaptation appeared to be characterized by subtle shifts in allele frequencies, with no consistent fixed differences identified. Based on these associations, we predict adaptation under projected changes in climate will include a suite of shifts in allele frequencies. Whether this can occur sufficiently rapidly through natural selection within populations, or would benefit from assisted gene migration, requires further evaluation. In some populations, the absence, or predicted increases to near fixation, of particular adaptive alleles hint at potential limits to adaptive capacity. Together, these results reinforce the importance of standing genetic variation at the geographical level for maintaining species' evolutionary potential.

  11. Transistor roadmap projection using predictive full-band atomistic modeling

    Energy Technology Data Exchange (ETDEWEB)

    Salmani-Jelodar, M., E-mail: m.salmani@gmail.com; Klimeck, G. [Network for Computational Nanotechnology and School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907 (United States); Kim, S. [Intel Corporation, 2501 Northwest 229th Avenue, Hillsboro, Oregon 97124 (United States); Ng, K. [Semiconductor Research Corporation (SRC), 1101 Slater Rd, Durham, North Carolina 27703 (United States)

    2014-08-25

    In this letter, a full band atomistic quantum transport tool is used to predict the performance of double gate metal-oxide-semiconductor field-effect transistors (MOSFETs) over the next 15 years for International Technology Roadmap for Semiconductors (ITRS). As MOSFET channel lengths scale below 20 nm, the number of atoms in the device cross-sections becomes finite. At this scale, quantum mechanical effects play an important role in determining the device characteristics. These quantum effects can be captured with the quantum transport tool. Critical results show the ON-current degradation as a result of geometry scaling, which is in contrast to previous ITRS compact model calculations. Geometric scaling has significant effects on the ON-current by increasing source-to-drain (S/D) tunneling and altering the electronic band structure. By shortening the device gate length from 20 nm to 5.1 nm, the ratio of S/D tunneling current to the overall subthreshold OFF-current increases from 18% to 98%. Despite this ON-current degradation by scaling, the intrinsic device speed is projected to increase at a rate of at least 8% per year as a result of the reduction of the quantum capacitance.
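
    As a rough worked example of the last figure, a speed increase of at least 8% per year compounds to roughly a threefold gain over the 15-year roadmap window. The sketch assumes a constant annual rate, which the abstract does not state explicitly.

```python
def compound_factor(rate, years):
    """Cumulative growth factor for a constant annual rate."""
    return (1.0 + rate) ** years


# Lower bound implied by "at least 8% per year" over 15 years
# (constant compounding is an assumption).
factor = compound_factor(0.08, 15)
print(round(factor, 2))
```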

  12. Coal Calorific Value Prediction Based on Projection Pursuit Principle

    Directory of Open Access Journals (Sweden)

    QI Minfang

    2012-10-01

    Full Text Available The calorific value of coal is an important factor in the economic operation of coal-fired power plants. However, calorific value differs greatly between coals, even between coals from the same mine. Restricted by the coal market, most coal-fired power plants in China cannot currently burn their design coal. The properties of as-received coal change so frequently that pulverized coal firing often takes place under unexpected conditions. Research on predicting the calorific value of coal is therefore significant for the economic operation of power plants. To address the uncertainty in coal calorific value, a soft measurement model is established based on the projection pursuit principle, combined with a genetic algorithm to optimize parameters and a support vector machine algorithm. An example shows that the model is more objective, effective and feasible, and avoids the drawback of artificially decided weights for the feature indexes. The model could provide good guidance for calculating coal calorific value and optimizing the operation of coal-fired power plants.

  13. A novel genome-wide full-length kinesin prediction analysis reveals additional mammalian kinesins

    Institute of Scientific and Technical Information of China (English)

    XUE Yu; LIU Dan; FU Chuanhai; DOU Zhen; ZHOU Qing; YAO Xuebiao

    2006-01-01

    The kinesin superfamily of microtubule-based motors orchestrates a variety of cellular processes. Recent availability of mammalian genomes has enabled analyses of kinesins on the whole genome. Here we present a novel full-length kinesin prediction program (FKPP) for mammalian kinesin gene discovery based on a comparative genomics approach. Contrary to previous predictions of 94 kinesins, we identify a total of 134 potential kinesin genes from mammalian genomes, including 45 from mouse, 45 from rat and 44 from human. In addition, FKPP synthesizes 25 potentially full-length mammalian kinesins based on the partial sequences in the database. Surprisingly, FKPP reveals that full-length human CENP-E contains 2701 aa rather than the 2663 aa in the database. Experimentation using sequence-specific antibody and cDNA sequencing of human CENP-E validates the accuracy of FKPP. Given the remarkable computing efficiency and accuracy of FKPP, we reclassify the mammalian kinesin superfamily. Since current databases contain many incomplete sequences, FKPP may provide a novel approach for molecular delineation of kinesins and other protein families.

  14. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    Science.gov (United States)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  15. Health and Maintenance Status Determination and Predictive Fault Diagnosis System Project

    Data.gov (United States)

    National Aeronautics and Space Administration — The objective of this project is to demonstrate intelligent health and maintenance status determination and predictive fault diagnosis techniques for NASA rocket...

  16. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it is based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore the effects of interactions among genes and of higher-order nonlinearities. The deep belief network (DBN), one of the architectures in deep learning, can model data at a high level of abstraction that captures nonlinear effects in the data. This study implemented a DBN to develop a GP model using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was acquired from CIMMYT's (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, DBN outperformed the other methods, reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL) and best linear unbiased predictor (BLUP), for traits presumed to be non-additive. DBN achieves a correlation of 0.579 on the -1 to 1 range.

  17. A systems approach to predict oncometabolites via context-specific genome-scale metabolic networks.

    Directory of Open Access Journals (Sweden)

    Hojung Nam

    2014-09-01

    Full Text Available Altered metabolism in cancer cells has been viewed as a passive response required for a malignant transformation. However, this view has changed through the recently described metabolic oncogenic factors: mutated isocitrate dehydrogenases (IDH), succinate dehydrogenase (SDH), and fumarate hydratase (FH) that produce oncometabolites that competitively inhibit epigenetic regulation. In this study, we demonstrate in silico predictions of oncometabolites that have the potential to dysregulate epigenetic controls in nine types of cancer by incorporating massive scale genetic mutation information (collected from more than 1,700 cancer genomes), expression profiling data, and deploying Recon 2 to reconstruct context-specific genome-scale metabolic models. Our analysis predicted 15 compounds and 24 substructures of potential oncometabolites that could result from the loss-of-function and gain-of-function mutations of metabolic enzymes, respectively. These results suggest a substantial potential for discovering unidentified oncometabolites in various forms of cancers.

  18. A hybrid neural network system for prediction and recognition of promoter regions in human genome

    Institute of Scientific and Technical Information of China (English)

    CHEN Chuan-bo; LI Tao

    2005-01-01

    This paper proposes a high specificity and sensitivity algorithm called PromPredictor for recognizing promoter regions in the human genome. PromPredictor extracts compositional features and CpG island information from genomic sequence, feeds these features as input to a hybrid neural network system (HNN) and then applies the HNN for prediction. It combines a novel promoter recognition model, coding theory, feature selection and dimensionality reduction with a machine learning algorithm. Evaluation on human chromosome 22 yielded ~66% sensitivity and ~48% specificity. Comparison with two other systems revealed that our method had superior sensitivity and specificity in predicting promoter regions. PromPredictor is written in MATLAB and requires MATLAB to run. PromPredictor is freely available at http://www.whtelecom.com/Prompredictor.htm.

  19. Bias of genetic trend of genomic predictions based on both real dairy cattle and simulated data

    DEFF Research Database (Denmark)

    Ma, Peipei; Lund, Mogens Sandø; Nielsen, Ulrik Sander;

    This study investigated the phenomenon of bias in the trend of genomic predictions and attempted to find the reason and solution for this bias. The data used in this study include Danish Jersey data and simulation data. In Jersey data, the bias was reduced when cows were included in the reference... population. In simulated data, there was no bias when the test animals were unselected cows. When the G matrix was derived from genotypes of causal genes, the bias was reduced. The results suggest that the main reasons for the bias of the prediction trends are the selection of bulls and bull dams... as well as the inaccurate relationship matrix. The possible strategies to eliminate the bias could be to use a cow reference and improve the genomic relationship matrix...

  20. The human genome project: Information management, access, and regulation. Technical progress report, 1 April--31 August 1993

    Energy Technology Data Exchange (ETDEWEB)

    McInerney, J.D.; Micikas, L.B.

    1993-09-10

    Efforts are described to prepare educational materials including computer based as well as conventional type teaching materials for training interested high school and elementary students in aspects of Human Genome Project.

  1. Ethical challenges and innovations in the dissemination of genomic data: the experience of the PERSPECTIVE project

    Directory of Open Access Journals (Sweden)

    Lévesque E

    2015-08-01

    Full Text Available Emmanuelle Lévesque,1 Bartha Maria Knoppers,1 Jacques Simard,2 1Department of Human Genetics, Centre for Genomics and Policy, McGill University, Montréal, 2Genomics Centre, CHU de Québec Research Center, Department of Molecular Medicine, Laval University, Québec City, QC, Canada Abstract: The importance of making genomic data available for future research is now widely recognized among the scientific community and policymakers. In this era of shared responsibility for data dissemination, improved patient care through research depends on the development of powerful and secure data-sharing systems. As part of the concerted effort to share research resources, the project entitled Personalized Risk Stratification for Prevention and Early Detection of Breast Cancer (PERSPECTIVE) makes effective data sharing, through the development of a data-sharing framework, one of its goals. The secondary uses of data from PERSPECTIVE for future research promise to enhance our knowledge of breast cancer etiologies without duplicating data-gathering efforts. Despite its benefit for research, we recognize the ethical challenges of data sharing on the local, national, and international levels. The effective management of ethical approvals for projects spanning across jurisdictions, the return of results to research participants, and research incentives and recognition for data production are but a few pressing issues that need to be properly addressed. We discuss how we managed these issues and suggest how ongoing innovations might help to facilitate data sharing in future genomic research projects. Keywords: data sharing, research ethics, cancer

  2. Genomic predictions based on a joint reference population for the Nordic Red cattle breeds.

    Science.gov (United States)

    Zhou, L; Heringstad, B; Su, G; Guldbrandtsen, B; Meuwissen, T H E; Svendsen, M; Grove, H; Nielsen, U S; Lund, M S

    2014-07-01

    The main aim of this study was to compare accuracies of imputation and genomic predictions based on single and joint reference populations for Norwegian Red (NRF) and a composite breed (DFS) consisting of Danish Red, Finnish Ayrshire, and Swedish Red. The single nucleotide polymorphism (SNP) data for NRF consisted of 2 data sets: one including 25,000 markers (NRF25K) and the other including 50,000 markers (NRF50K). The NRF25K data set had 2,572 bulls, and the NRF50K data set had 1,128 bulls. Four hundred forty-two bulls were genotyped in both data sets (double-genotyped bulls). The DFS data set (DFS50K) included 50,000 markers of 13,472 individuals, of which around 4,700 were progeny-tested bulls. The NRF25K data set was imputed to 50,000 density using the software Beagle. The average error rate for the imputation of NRF25K decreased slightly from 0.023 to 0.021, and the correlation between observed and imputed genotypes changed from 0.935 to 0.936 when comparing the NRF50K reference and the NRF50K-DFS50K joint reference imputations. A genomic BLUP (GBLUP) model and a Bayesian 4-component mixture model were used to predict genomic breeding values for the NRF and DFS bulls based on the single and joint NRF and DFS reference populations. In the multiple population predictions, accuracies of genomic breeding values increased for the 3 production traits (milk, fat, and protein yields) for both NRF and DFS. Accuracies increased by 6 and 1.3 percentage points, on average, for the NRF and DFS bulls, respectively, using the GBLUP model, and by 9.3 and 1.3 percentage points, on average, using the Bayesian 4-component mixture model. However, accuracies for health or reproduction traits did not increase from the multiple population predictions. Among the 3 DFS populations, Swedish Red gained most in accuracies from the multiple population predictions, presumably because Swedish Red has a closer genetic relationship with NRF than Danish Red and Finnish Ayrshire.
The Bayesian 4
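
    The two imputation metrics quoted in this record, error rate and the correlation between observed and imputed genotypes, can be sketched as follows. The sketch assumes genotypes coded as allele counts 0/1/2 and an error rate defined as the share of mismatching genotype calls; the study may define the error rate differently (for example per allele), and the toy vectors are hypothetical.

```python
import math


def genotype_error_rate(observed, imputed):
    """Share of genotype calls that differ between observed and imputed data."""
    mismatches = sum(1 for o, i in zip(observed, imputed) if o != i)
    return mismatches / len(observed)


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genotype calls for one animal across eight markers.
observed = [0, 1, 2, 1, 0, 2, 1, 1]
imputed = [0, 1, 2, 2, 0, 2, 1, 0]

err = genotype_error_rate(observed, imputed)
r = pearson(observed, imputed)
print(err, round(r, 3))
```

    Reporting both metrics is useful because the error rate counts every mismatch equally, whereas the correlation also reflects how far an imputed genotype is from the observed one.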

  3. Distance from sub-Saharan Africa predicts mutational load in diverse human genomes.

    Science.gov (United States)

    Henn, Brenna M; Botigué, Laura R; Peischl, Stephan; Dupanloup, Isabelle; Lipatov, Mikhail; Maples, Brian K; Martin, Alicia R; Musharoff, Shaila; Cann, Howard; Snyder, Michael P; Excoffier, Laurent; Kidd, Jeffrey M; Bustamante, Carlos D

    2016-01-26

    The Out-of-Africa (OOA) dispersal ∼ 50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.

  4. A two step Bayesian approach for genomic prediction of breeding values

    DEFF Research Database (Denmark)

    Mahdi Shariati, Mohammad; Sørensen, Peter; Janss, Luc

    2012-01-01

    Background: In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter...... of predicted breeding values. However, the accuracies of predicted breeding values were lower than Bayesian methods with marker specific variances. Conclusions: Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker...

  5. The emergence of commercial genomics: analysis of the rise of a biotechnology subsector during the Human Genome Project, 1990 to 2004.

    Science.gov (United States)

    Wiechers, Ilse R; Perin, Noah C; Cook-Deegan, Robert

    2013-01-01

    Development of the commercial genomics sector within the biotechnology industry relied heavily on the scientific commons, public funding, and technology transfer between academic and industrial research. This study tracks financial and intellectual property data on genomics firms from 1990 through 2004, thus following these firms as they emerged in the era of the Human Genome Project and through the 2000 to 2001 market bubble. A database was created based on an early survey of genomics firms, which was expanded using three web-based biotechnology services, scientific journals, and biotechnology trade and technical publications. Financial data for publicly traded firms was collected through the use of four databases specializing in firm financials. Patent searches were conducted using firm names in the US Patent and Trademark Office website search engine and the DNA Patent Database. A biotechnology subsector of genomics firms emerged in parallel to the publicly funded Human Genome Project. Trends among top firms show that hiring, capital improvement, and research and development expenditures continued to grow after a 2000 to 2001 bubble. The majority of firms are small businesses with great diversity in type of research and development, products, and services provided. Over half the public firms holding patents have the majority of their intellectual property portfolio in DNA-based patents. These data allow estimates of investment, research and development expenditures, and jobs that paralleled the rise of genomics as a sector within biotechnology between 1990 and 2004.

  6. From structure prediction to genomic screens for novel non-coding RNAs.

    Directory of Open Access Journals (Sweden)

    Jan Gorodkin

    2011-08-01

    Full Text Available Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
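
    To make the single-sequence folding idea concrete, here is a minimal sketch of Nussinov-style base-pair maximization, a deliberately simplified stand-in for the energy-directed folding the review refers to. It allows Watson-Crick and GU wobble pairs and enforces a minimum hairpin loop length; the parameter min_loop and the example sequence are assumptions for illustration.

```python
# Allowed pairs: Watson-Crick plus GU wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov dynamic programming).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


# Toy hairpin: a three-pair stem (including one GU wobble) around an AAA loop.
print(max_base_pairs("GGGAAAUCC"))
```

    Real folding programs replace the pair count with a thermodynamic energy model, and comparative methods add evidence from compensatory base changes across aligned sequences, but the recursion has the same overall shape.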

  7. Fruits of human genome project and private venture, and their impact on life science.

    Science.gov (United States)

    Ikekawa, A; Ikekawa, S

    2001-12-01

    A small knowledge base was created by organizing the Human Genome Project (HGP) and its related issues in "Science" magazines between 1996 and 2000. This base revealed the stunning achievement of HGP and a private venture and its impact on today's biology and life science. In the mid-1990s, they encouraged the development of advanced high-throughput automated DNA sequencers and the technologies that can analyse all genes at once in a systematic fashion. Using these technologies, they completed the genome sequence of human and various other organisms. These fruits opened the door to comparative genomics, functional genomics, the interdisciplinary field between computer science and biology, and proteomics. They have caused a shift in biological investigation from studying single genes or proteins to studying all genes or proteins at once, causing revolutionary changes in traditional biology, drug discovery and therapy. They have expanded the range of potential drug targets and have facilitated a shift in drug discovery programs toward rational target-based strategies. They have spawned pharmacogenomics that could give rise to a new generation of highly effective drugs that treat causes, not just symptoms. They should also cause a migration from the traditional medications that are safe and effective for every member of the population to personalized medicine and personalized therapy.

  8. Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy.

    Directory of Open Access Journals (Sweden)

    Nicholas Erho

    PURPOSE: Clinicopathologic features and biochemical recurrence are sensitive, but not specific, predictors of metastatic disease and lethal prostate cancer. We hypothesize that a genomic expression signature detected in the primary tumor represents true biological potential of aggressive disease and provides improved prediction of early prostate cancer metastasis. METHODS: A nested case-control design was used to select 639 patients from the Mayo Clinic tumor registry who underwent radical prostatectomy between 1987 and 2001. A genomic classifier (GC) was developed by modeling differential RNA expression using 1.4 million feature high-density expression arrays of men enriched for rising PSA after prostatectomy, including 213 who experienced early clinical metastasis after biochemical recurrence. A training set was used to develop a random forest classifier of 22 markers to predict for cases--men with early clinical metastasis after rising PSA. Performance of GC was compared to prognostic factors such as Gleason score and previous gene expression signatures in a withheld validation set. RESULTS: Expression profiles were generated from 545 unique patient samples, with median follow-up of 16.9 years. GC achieved an area under the receiver operating characteristic curve of 0.75 (0.67-0.83) in validation, outperforming clinical variables and gene signatures. GC was the only significant prognostic factor in multivariable analyses. Within Gleason score groups, cases with high GC scores experienced earlier death from prostate cancer and reduced overall survival. The markers in the classifier were found to be associated with a number of key biological processes in prostate cancer metastatic disease progression. CONCLUSION: A genomic classifier was developed and validated in a large patient cohort enriched with prostate cancer metastasis patients and a rising PSA that went on to experience metastatic disease.
This early metastasis prediction model based on

  9. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

    Science.gov (United States)

    Reddy, T B K; Thomas, Alex D; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A; Kyrpides, Nikos C

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  10. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    Science.gov (United States)

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  11. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    Energy Technology Data Exchange (ETDEWEB)

    Reddy, Tatiparthi B. K. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Thomas, Alex D. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Stamatis, Dimitri [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Bertsch, Jon [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Isbandi, Michelle [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Jansson, Jakob [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Mallajosyula, Jyothi [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Pagani, Ioanna [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Lobos, Elizabeth A. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); Kyrpides, Nikos C. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States); King Abdulaziz Univ., Jeddah (Saudi Arabia)

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  12. Untying the Gordian knot of creation: metaphors for the Human Genome Project in Greek newspapers.

    Science.gov (United States)

    Gogorosi, Eleni

    2005-12-01

    This article studies the metaphorical expressions used by newspapers to present the near completion of the Human Genome Project (HGP) to the Greek public in the year 2000. The analysis, based on cognitive metaphor theory, deals with the most frequent or captivating metaphors used to refer to the human genome, which give rise to both conventional and novel expressions. The majority of creative metaphorical expressions participate in the discourse of hope and promise propagated by the Greek media in an attempt to present the HGP and its outcome in a favorable light. Instances of the competing discourse of fear and danger are much rarer but can also be found in creative metaphorical expressions. Metaphors pertaining to the Greek culture or to ancient Greek mythology tend to carry a special rhetorical force. However, it will be shown that the Greek press strategically used most of the metaphors that circulated globally at the time, not only culture specific ones.

  13. Accuracy of genome enabled prediction in a dairy cattle population using different cross-validation layouts

    Directory of Open Access Journals (Sweden)

    M. Angeles Pérez-Cabal

    2012-02-01

    The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced similar accuracies to the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may depend also on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high heritability traits, close relatives such as parents and full sibs make the greatest contributions to accuracy, which can be compensated by half-sibs or grandsires in the case of lack of close relatives. However, for the low heritability traits the inclusion of close relatives is crucial and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, cross-validation designs should resemble the intended use of the predictive models, e.g. within or between family predictions, or within or across generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered.
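
    The RAN and GEN layouts named in the abstract above can be illustrated with a minimal sketch. The animal identifiers, the generation labels, the 20% test fraction and both function names are hypothetical and are not taken from the study; the kernel-based UNREL clustering is not attempted here.

```python
import random


def random_split(ids, test_fraction=0.2, seed=0):
    """RAN-style layout: allocate individuals to training/testing at random."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # First n_test shuffled individuals form the testing set.
    return shuffled[n_test:], shuffled[:n_test]


def generation_split(generation_by_id, cutoff):
    """GEN-style layout: train on older generations, test on younger ones."""
    train = [i for i, g in generation_by_id.items() if g < cutoff]
    test = [i for i, g in generation_by_id.items() if g >= cutoff]
    return train, test


# Hypothetical toy population: animal id -> generation number.
generation_by_id = {"a1": 1, "a2": 1, "b1": 2, "b2": 2, "c1": 3, "c2": 3}

ran_train, ran_test = random_split(generation_by_id)
gen_train, gen_test = generation_split(generation_by_id, cutoff=3)
print("RAN:", ran_train, ran_test)
print("GEN:", gen_train, gen_test)
```

    In the GEN layout the testing animals are always younger than the training animals, which is closer to predicting future candidates than a random allocation is.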

  14. Whole genomic prediction of growth and carcass traits in a Chinese quality chicken population.

    Science.gov (United States)

    Zhang, Z; Xu, Z-Q; Luo, Y-Y; Zhang, H-B; Gao, N; He, J-L; Ji, C-L; Zhang, D-X; Li, J-Q; Zhang, X-Q

    2017-01-01

    By incorporating high-density markers into breeding value prediction models, the whole genomic prediction (WGP) method can effectively accelerate genetic improvement in livestock breeding. However, the performance of WGP varies across species and populations and is affected by the underlying genetic architecture. In particular, very little is known about the performance of WGP for many chicken breeds. Here we estimate the genetic parameters and evaluate the performance of WGP for 18 growth and carcass traits in a Chinese quality chicken population. In total, 435 chickens were systematically phenotyped and genotyped using a 600K genotyping array. Two variance component estimation scenarios, 3 breeding value prediction methods, and 2 validation procedures were compared. The results showed that the heritability of these 18 traits was medium to high (ranging from 0.28 to 0.60) and that deviations existed between the heritability estimated from pedigrees and markers. Compared with conventional breeding methods, WGP could potentially increase the selection accuracy by 20% or more depending on the prediction model used, the trait under consideration, and the genetic connectedness between the training and validation individuals. Our results showed the potential of implementing genomic selection in small breeding herds.

  15. Whole-genome prediction of fatty acid composition in meat of Japanese Black cattle.

    Science.gov (United States)

    Onogi, A; Ogino, A; Komatsu, T; Shoji, N; Shimizu, K; Kurogi, K; Yasumori, T; Togashi, K; Iwata, H

    2015-10-01

    Because fatty acid composition influences the flavor and texture of meat, controlling it is particularly important for cattle breeds such as the Japanese Black, characterized by high meat quality. We evaluated the predictive ability of single-step genomic best linear unbiased prediction (ssGBLUP) in fatty acid composition of Japanese Black cattle by assessing the composition of seven fatty acids in 3088 cattle, of which 952 had genome-wide marker genotypes. All sires of the genotyped animals were genotyped, but their dams were not. Cross-validation was conducted for the 952 animals. The prediction accuracy was higher with ssGBLUP than with best linear unbiased prediction (BLUP) for all traits, and in an empirical investigation, the gain in accuracy of using ssGBLUP over BLUP increased as the deviations in phenotypic values of the animals increased. In addition, the superior accuracy of ssGBLUP tended to be more evident in animals whose maternal grandsire was genotyped than in other animals, although the effect was small.

  16. Predicting Hybrid Performances for Quality Traits through Genomic-Assisted Approaches in Central European Wheat

    KAUST Repository

    Liu, Guozheng

    2016-07-06

    Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting the hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and the composition and size of the training population on the accuracy of prediction of hybrid performance. In total 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested presence of several quantitative trait loci (QTLs), but cross-validation rather indicated the absence of major effect QTLs for all quality traits except for 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids is relevant to boost the accuracy of prediction for an unrelated test population.

  17. Predicting Hybrid Performances for Quality Traits through Genomic-Assisted Approaches in Central European Wheat.

    Directory of Open Access Journals (Sweden)

    Guozheng Liu

    Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting the hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and the composition and size of the training population on the accuracy of prediction of hybrid performance. In total 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested presence of several quantitative trait loci (QTLs), but cross-validation rather indicated the absence of major effect QTLs for all quality traits except for 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids is relevant to boost the accuracy of prediction for an unrelated test population.

  18. Genomic biomarkers of prenatal intrauterine inflammation in umbilical cord tissue predict later life neurological outcomes.

    Science.gov (United States)

    Tilley, Sloane K; Joseph, Robert M; Kuban, Karl C K; Dammann, Olaf U; O'Shea, T Michael; Fry, Rebecca C

    2017-01-01

    Preterm birth is a major risk factor for neurodevelopmental delays and disorders. This study aimed to identify genomic biomarkers of intrauterine inflammation in umbilical cord tissue in preterm neonates that predict cognitive impairment at 10 years of age. Genome-wide messenger RNA (mRNA) levels from umbilical cord tissue were obtained from 43 neonates born before 28 weeks of gestation. Genes that were differentially expressed across four indicators of intrauterine inflammation were identified and their functions examined. Exact logistic regression was used to test whether expression levels in umbilical cord tissue predicted neurocognitive function at 10 years of age. Placental indicators of inflammation were associated with changes in the mRNA expression of 445 genes in umbilical cord tissue. Transcripts with decreased expression showed significant enrichment for biological signaling processes related to neuronal development and growth. The altered expression of six genes was found to predict neurocognitive impairment when children were 10 years old. These genes include two that encode proteins involved in neuronal development. Prenatal intrauterine inflammation is associated with altered gene expression in umbilical cord tissue. A set of six of the differentially expressed genes predicts cognitive impairment later in life, suggesting that the fetal environment is associated with significant adverse effects on neurodevelopment that persist into later childhood.

  19. Insilco Prediction and Characterization of microRNAs from Oncopeltus fasciatus (Hemiptera: Lygaeidae) Genome.

    Science.gov (United States)

    Ellango, R; Asokan, R; Ramamurthy, V V

    2016-08-01

    For studies on functional genomics, small RNAs, especially microRNAs (miRNAs), have emerged as a hot topic due to their importance in cellular and developmental processes. Identification of insect miRNAs largely depends on the availability of genomic sequences in the public domain. The large milkweed bug, Oncopeltus fasciatus (Dallas) is a hemimetabolous insect which has become a model hemipteran system for various molecular studies. In this study, we identified 96 candidate mature miRNAs from O. fasciatus genome using a blast search with the previously reported animal miRNAs. The secondary structure of predicted miRNA sequences was determined online using "mfold" web server and verified by calculating the minimal free energy index (MFEI). Six miRNAs let-7e, miR-133c, miR-219b, mir-466d, mir-669f, and mir-669l are reported for the first time in Insecta. Comparison of O. fasciatus mir-2 and mir-71 family clusters to those of diverse insect species showed that they are highly conserved. The phylogenetic analysis of miRNAs revealed the evolutionary relationship of conserved miRNAs of O. fasciatus with other insect species. Using a classical rule-based algorithm method, we predicted the possible targets of the new miRNAs. Our study not only identified the list of miRNAs in O. fasciatus but also provides a basic platform for developing novel pest management strategies based on artificial miRNAs.
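
    The minimal free energy index (MFEI) mentioned in the abstract above can be computed in a few lines. This is a sketch under the assumption that the index follows the definition commonly used in the miRNA literature, MFEI = (|MFE| / length × 100) / (G+C)%; the abstract does not state the exact formula the authors applied, and the sequence and folding energy below are made-up illustrative values, not data from the study.

```python
def gc_percent(seq):
    """Percentage of G and C bases in an RNA/DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def mfei(seq, mfe_kcal_per_mol):
    """Minimal free energy index under the assumed common definition.

    AMFE is the folding energy normalised to 100 nt; MFEI divides AMFE
    by the G+C percentage of the same sequence.
    """
    amfe = abs(mfe_kcal_per_mol) / len(seq) * 100.0
    return amfe / gc_percent(seq)


# Hypothetical 80-nt precursor with 50% G+C and an assumed MFE of -32 kcal/mol
# (in practice the MFE would come from a folding tool such as mfold).
precursor = "GGCCAAUU" * 10
print(round(mfei(precursor, -32.0), 3))
```

    Dividing by sequence length and G+C content makes the energies of precursors of different sizes and compositions comparable.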

  20. Project Evaluation: Validation of a Scale and Analysis of Its Predictive Capacity

    Science.gov (United States)

    Fernandes Malaquias, Rodrigo; de Oliveira Malaquias, Fernanda Francielle

    2014-01-01

    The objective of this study was to validate a scale for assessment of academic projects. As a complement, we examined its predictive ability by comparing the scores of advised/corrected projects based on the model and the final scores awarded to the work by an examining panel (approximately 10 months after the project design). Results of…

  1. Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics.

    Science.gov (United States)

    Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi

    2013-02-01

    The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.

  2. Sequence-based prediction of single nucleosome positioning and genome-wide nucleosome occupancy.

    Science.gov (United States)

    van der Heijden, Thijn; van Vugt, Joke J F A; Logie, Colin; van Noort, John

    2012-09-18

    Nucleosome positioning dictates eukaryotic DNA compaction and access. To predict nucleosome positions in a statistical mechanics model, we exploited the knowledge that nucleosomes favor DNA sequences with specific periodically occurring dinucleotides. Our model is the first to capture both dyad position within a few base pairs, and free binding energy within 2 k_B T, for all the known nucleosome positioning sequences. By applying Percus's equation to the derived energy landscape, we isolate sequence effects on genome-wide nucleosome occupancy from other factors that may influence nucleosome positioning. For both in vitro and in vivo systems, three parameters suffice to predict nucleosome occupancy with correlation coefficients of respectively 0.74 and 0.66. As predicted, we find the largest deviations in vivo around transcription start sites. This relatively simple algorithm can be used to guide future studies on the influence of DNA sequence on chromatin organization.

  3. Neural Network Prediction of Translation Initiation Sites in Eukaryotes: Perspectives for EST and Genome analysis

    DEFF Research Database (Denmark)

    Pedersen, Anders Gorm; Nielsen, Henrik

    1997-01-01

    Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role. This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known...... and global sequence information. Furthermore, analysis of false predictions shows that AUGs in frame with the actual start codon are more frequently selected than out-of-frame AUGs, suggesting that our networks use reading frame detection. A number of conflicts between neural network predictions and database...... annotations are analysed in detail, leading to identification of possible database errors....
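
    The distinction between in-frame and out-of-frame AUGs drawn in the abstract above can be made concrete with a short sketch. It only enumerates candidate AUGs and reports whether each shares a reading frame with an annotated start codon; it is not the neural network described in the record, and the function name and the example sequence are hypothetical.

```python
def classify_augs(mrna, annotated_start):
    """List every AUG as (position, in_frame_with_annotated_start).

    Positions are 0-based; DNA-style T is treated as U.
    """
    mrna = mrna.upper().replace("T", "U")
    result = []
    pos = mrna.find("AUG")
    while pos != -1:
        # Two codons share a reading frame when their offset is a multiple of 3.
        in_frame = (pos - annotated_start) % 3 == 0
        result.append((pos, in_frame))
        pos = mrna.find("AUG", pos + 1)
    return result


# Hypothetical sequence with the annotated start codon at position 2.
example = "CCAUGGCCAUGAAUG"
print(classify_augs(example, annotated_start=2))
```

    A predictor that picks the wrong AUG but stays in frame still points at the correct reading frame, which is one way to read the error pattern reported in the abstract.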

  4. Drought Prediction for Socio-Cultural Stability Project

    Science.gov (United States)

    Peters-Lidard, Christa; Eylander, John B.; Koster, Randall; Narapusetty, Balachandrudu; Kumar, Sujay; Rodell, Matt; Bolten, John; Mocko, David; Walker, Gregory; Arsenault, Kristi; Rheingrover, Scott

    2014-01-01

    The primary objective of this project is to answer the question: "Can existing, linked infrastructures be used to predict the onset of drought months in advance?" Based on our work, the answer to this question is "yes" with the qualifiers that skill depends on both lead-time and location, and especially on the associated teleconnections (e.g., ENSO, Indian Ocean Dipole) active in a given region and season. As part of this work, we successfully developed a prototype drought early warning system based on existing/mature NASA Earth science components including the Goddard Earth Observing System Data Assimilation System Version 5 (GEOS-5) forecasting model, the Land Information System (LIS) land data assimilation software framework, the Catchment Land Surface Model (CLSM), remotely sensed terrestrial water storage from the Gravity Recovery and Climate Experiment (GRACE) and remotely sensed soil moisture products from the Aqua/Advanced Microwave Scanning Radiometer - EOS (AMSR-E). We focused on a single drought year - 2011 - during which major agricultural droughts occurred with devastating impacts in the Texas-Mexico region of North America (TEXMEX) and the Horn of Africa (HOA). Our results demonstrate that GEOS-5 precipitation forecasts show skill globally at 1-month lead, and can show up to 3 months skill regionally in the TEXMEX and HOA areas. Our results also demonstrate that the CLSM soil moisture percentiles are a good indicator of drought, as compared to the North American Drought Monitor of TEXMEX and a combination of Famine Early Warning Systems Network (FEWS NET) data and Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) anomalies over HOA. The data assimilation experiments produced mixed results. GRACE terrestrial water storage (TWS) assimilation was found to significantly improve soil moisture and evapotranspiration, as well as drought monitoring via soil moisture percentiles, while AMSR-E soil moisture

  5. ELSI Bibliography: Ethical legal and social implications of the Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Yesley, M.S. [comp.

    1993-11-01

    This second edition of the ELSI Bibliography provides a current and comprehensive resource for identifying publications on the major topics related to the ethical, legal and social issues (ELSI) of the Human Genome Project. Since the first edition of the ELSI Bibliography was printed last year, new publications and earlier ones identified by additional searching have doubled our computer database of ELSI publications to over 5600 entries. The second edition of the ELSI Bibliography reflects this growth of the underlying computer database. Researchers should note that an extensive collection of publications in the database is available for public use at the General Law Library of Los Alamos National Laboratory (LANL).

  6. Genome-wide association analysis and genomic prediction of Mycobacterium avium subspecies paratuberculosis infection in US Jersey cattle.

    Directory of Open Access Journals (Sweden)

    Yalda Zare

    Paratuberculosis (Johne's disease), an enteric disorder in ruminants caused by Mycobacterium avium subspecies paratuberculosis (MAP), causes economic losses in excess of $200 million annually to the US dairy industry. To identify genomic regions underlying susceptibility to MAP infection in Jersey cattle, a case-control genome-wide association study (GWAS) was performed. Blood and fecal samples were collected from ∼ 5,000 mature cows in 30 commercial Jersey herds from across the US. Discovery data consisted of 450 cases and 439 controls genotyped with the Illumina BovineSNP50 BeadChip. Cases were animals with positive ELISA and fecal culture (FC) results. Controls were animals negative to both ELISA and FC tests that matched cases on birth date and herd. Validation data consisted of 180 animals including 90 cases (positive to FC) and 90 controls (negative to ELISA and FC), selected from discovery herds and genotyped by Illumina BovineLD BeadChip (∼ 7K SNPs). Two analytical approaches were used: single-marker GWAS using the GRAMMAR-GC method and Bayesian variable selection (Bayes C) using GenSel software. GRAMMAR-GC identified one SNP on BTA7 at 68 megabases (Mb) surpassing a significance threshold of 5 × 10^-5. ARS-BFGL-NGS-11887 on BTA23 (27.7 Mb) accounted for the highest percentage of genetic variance (3.3%) in the Bayes C analysis. SNPs identified in common by GRAMMAR-GC and Bayes C in both discovery and combined data were mapped to BTA23 (27, 29 and 44 Mb), 3 (100, 101, 106 and 107 Mb) and 17 (57 Mb). Correspondence between results of GRAMMAR-GC and Bayes C was high (70-80% of most significant SNPs in common). These SNPs could potentially be associated with causal variants underlying susceptibility to MAP infection in Jersey cattle. Predictive performance of the model developed by Bayes C for prediction of infection status of animals in validation set was low (55% probability of correct ranking of paired case and control samples).
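
    The "probability of correct ranking of paired case and control samples" quoted above is the quantity usually summarised as the area under the ROC curve (AUC). A minimal sketch of that calculation follows, assuming every case is compared with every control and ties count as half a correct ranking; the scores are invented for illustration and the function name is hypothetical, so this is not the evaluation code used in the study.

```python
def pairwise_auc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case scores higher.

    Ties contribute 0.5, so a model with no information scores about 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risk scores.
cases = [0.9, 0.6, 0.4]
controls = [0.5, 0.3]
print(round(pairwise_auc(cases, controls), 3))
```

    On this scale a value of 0.55, as reported for the validation set, is only slightly better than the 0.5 expected from random ranking.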

  7. Computational prediction of cAMP receptor protein (CRP) binding sites in cyanobacterial genomes

    Directory of Open Access Journals (Sweden)

    Su Zhengchang

    2009-01-01

    Background: Cyclic AMP receptor protein (CRP), also known as catabolite gene activator protein (CAP), is an important transcriptional regulator widely distributed in many bacteria. The biological processes under the regulation of CRP are highly diverse among different groups of bacterial species. Elucidation of CRP regulons in cyanobacteria will further our understanding of the physiology and ecology of this important group of microorganisms. Previously, CRP has been experimentally studied in only two cyanobacterial strains: Synechocystis sp. PCC 6803 and Anabaena sp. PCC 7120; therefore, a systematic genome-scale study of the potential CRP target genes and binding sites in cyanobacterial genomes is urgently needed. Results: We have predicted and analyzed the CRP binding sites and regulons in 12 sequenced cyanobacterial genomes using a highly effective cis-regulatory binding site scanning algorithm. Our results show that cyanobacterial CRP binding sites are very similar to those in E. coli; however, the regulons are very different from that of E. coli. Furthermore, CRP regulons in different cyanobacterial species/ecotypes are also highly diversified, ranging from photosynthesis, carbon fixation and nitrogen assimilation, to chemotaxis and signal transduction. In addition, our prediction indicates that crp genes in modern cyanobacteria are likely inherited from a common ancestral gene in their last common ancestor, and have adapted various cellular functions in different environments, while some cyanobacteria lost their crp genes as well as CRP binding sites during the course of evolution. Conclusion: The CRP regulons in cyanobacteria are highly diversified, probably as a result of divergent evolution to adapt to various ecological niches. Cyanobacterial CRPs may function as lineage-specific regulators participating in various cellular processes, and are important in some lineages. However, they are dispensable in some other lineages. The

  8. PRISM offers a comprehensive genomic approach to transcription factor function prediction

    KAUST Repository

    Wenger, A. M.

    2013-02-04

    The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.

  9. Using information of relatives in genomic prediction to apply effective stratified medicine

    Science.gov (United States)

    Lee, S. Hong; Weerasinghe, W. M. Shalanee P.; Wray, Naomi R.; Goddard, Michael E.; van der Werf, Julius H. J.

    2017-01-01

    Genomic prediction shows promise for personalised medicine in which diagnosis and treatment are tailored to individuals based on their genetic profiles for complex diseases. We present a theoretical framework to demonstrate that prediction accuracy can be improved by targeting more informative individuals in the data set used to generate the predictors (“discovery sample”) to include those with genetically close relationships with the subjects put forward for risk prediction. Increase of prediction accuracy from closer relationships is achieved under an additive model and does not rely on any family or interaction effects. Using theory, simulations and real data analyses, we show that the predictive accuracy or the area under the receiver operating characteristic curve (AUC) increased exponentially with decreasing effective size (Ne), i.e. when individuals are closely related. For example, with a discovery sample size of N = 3000, heritability h² = 0.5 and population prevalence K = 0.1, the AUC value approached 0.9 and the top percentile of the estimated genetic profile scores had a 23 times higher proportion of cases than the general population. This suggests that there is considerable room to increase prediction accuracy by using a design that does not exclude closer relationships. PMID:28181587

  10. Impact of Relationships between Test and Reference Animals and between Reference Animals on Reliability of Genomic Prediction

    DEFF Research Database (Denmark)

    Wu, Xiaoping; Lund, Mogens Sandø; Sun, Dongxiao

    This study investigated reliability of genomic prediction in various scenarios with regard to relationship between test and reference animals and between animals within the reference population. Different reference populations were generated from EuroGenomics data and 1288 Nordic Holstein bulls...... as a common test population. A GBLUP model and a Bayesian mixture model were applied to predict genomic breeding values for bulls in the test data. Results showed that a closer relationship between test and reference animals led to a higher reliability, while a closer relationship between reference animal...

  11. Genomic risk models improve prediction of longitudinal lipid levels in children and young adults

    Directory of Open Access Journals (Sweden)

    Nathan E. Wineinger

    2013-05-01

    Full Text Available In clinical medicine, lipids are commonly measured biomarkers used to assess an individual’s risk for cardiovascular disease, heart attack, and stroke. Accurately predicting longitudinal lipid levels based on genomic information can inform therapeutic practices and decrease cardiovascular risk by identifying high-risk patients prior to onset. Using genotyped and imputed genetic data from 523 unrelated Caucasian Americans from the Bogalusa Heart Study, surveyed on 4,026 occasions from 4 to 48 years of age, we generated various lipid genomic risk models based on previously reported markers. We observed a significant improvement in prediction over non-genetic risk models in high density lipoprotein cholesterol (increase in the squared correlation between observed and predicted values, d=0.032), low density lipoprotein cholesterol (d=0.053), total cholesterol (d=0.043), and triglycerides (d=0.031). Many of our approaches are based on an n-fold cross-validation procedure that is, by design, adaptable to a clinical environment.
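
    The statistic d quoted in this abstract is a difference of two squared correlations between observed and predicted values. The sketch below only illustrates that arithmetic; the function names and the toy numbers are hypothetical and are not taken from the study, and the cross-validation step that would produce the predictions is left out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def r2_gain(observed, pred_baseline, pred_genomic):
    """d = r^2(observed, genomic model) - r^2(observed, baseline model)."""
    return pearson_r(observed, pred_genomic) ** 2 - pearson_r(observed, pred_baseline) ** 2


# Hypothetical held-out values, invented purely for illustration.
observed = [1, 2, 3, 4, 5]
pred_baseline = [1, 2, 3, 5, 4]   # non-genetic model
pred_genomic = [2, 4, 6, 8, 10]   # model that also uses genetic markers

d = r2_gain(observed, pred_baseline, pred_genomic)
print(f"d = {d:.3f}")
```

    In practice the two prediction vectors would come from held-out folds, so that d reflects out-of-sample improvement.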

  12. A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data.

    Directory of Open Access Journals (Sweden)

    Hua Yu

    Full Text Available In silico prediction of drug-target interactions from heterogeneous biological data can advance our system-level search for drug molecules and therapeutic targets, which efforts have not yet reached full fruition. In this work, we report a systematic approach that efficiently integrates the chemical, genomic, and pharmacological information for drug targeting and discovery on a large scale, based on two powerful methods, Random Forest (RF) and Support Vector Machine (SVM). The performance of the derived models was evaluated and verified with internal five-fold cross-validation and four external independent validations. The optimal models show impressive performance of prediction for drug-target interactions, with a concordance of 82.83%, a sensitivity of 81.33%, and a specificity of 93.62%. The consistency of the performances of the RF and SVM models demonstrates the reliability and robustness of the obtained models. In addition, the validated models were employed to systematically predict known/unknown drugs and targets involving the enzymes, ion channels, GPCRs, and nuclear receptors, which can be further mapped to functional ontologies such as target-disease associations and target-target interaction networks. This approach is expected to help fill the existing gap between chemical genomics and network pharmacology and thus accelerate the drug discovery processes.
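
    The three performance figures reported here can all be read off a binary confusion matrix. The following is a minimal sketch, assuming that "concordance" means overall agreement between predicted and true labels (the abstract does not define it); the function name and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Return (concordance, sensitivity, specificity) for 0/1 labels.

    Assumption: concordance is taken to be (TP + TN) / total.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    concordance = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # fraction of true interactions recovered
    specificity = tn / (tn + fp)   # fraction of non-interactions rejected
    return concordance, sensitivity, specificity


# Hypothetical drug-target pairs: 1 = interacting, 0 = non-interacting.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

concordance, sensitivity, specificity = binary_metrics(y_true, y_pred)
print(concordance, sensitivity, specificity)
```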

  13. Predicting co-complexed protein pairs using genomic and proteomic data integration

    Directory of Open Access Journals (Sweden)

    King Oliver D

    2004-04-01

    Full Text Available Abstract Background Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and utilize their different predictive value for more accurate prediction of co-complexed relationship. Results Using a supervised machine learning approach, the probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCPs) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may potentially represent unknown CCPs. Conclusions We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.

  14. Hybrid Prediction Method for Aircraft Interior Noise Project

    Data.gov (United States)

    National Aeronautics and Space Administration — The goal of the project is research and development of methods for application of the Hybrid FE-SEA method to aircraft vibro-acoustic problems. This proposal...

  15. Predictive Models of Recombination Rate Variation across the Drosophila melanogaster Genome

    Science.gov (United States)

    Adrian, Andrew B.; Corchado, Johnny Cruz; Comeron, Josep M.

    2016-01-01

    In all eukaryotic species examined, meiotic recombination, and crossovers in particular, occur non‐randomly along chromosomes. The cause for this non-random distribution remains poorly understood but some specific DNA sequence motifs have been shown to be enriched near crossover hotspots in a number of species. We present analyses using machine learning algorithms to investigate whether DNA motif distribution across the genome can be used to predict crossover variation in Drosophila melanogaster, a species without hotspots. Our study exposes a combinatorial non-linear influence of motif presence able to account for a significant fraction of the genome-wide variation in crossover rates at all genomic scales investigated, from 20% at 5-kb to almost 70% at 2,500-kb scale. The models are particularly predictive for regions with the highest and lowest crossover rates and remain highly informative after removing sub-telomeric and -centromeric regions known to have strongly reduced crossover rates. Transcriptional activity during early meiosis and differences in motif use between autosomes and the X chromosome add to the predictive power of the models. Moreover, we show that population-specific differences in crossover rates can be partly explained by differences in motif presence. Our results suggest that crossover distribution in Drosophila is influenced by both meiosis-specific chromatin dynamics and very local constitutive open chromatin associated with DNA motifs that prevent nucleosome stabilization. These findings provide new information on the genetic factors influencing variation in recombination rates and a baseline to study epigenetic mechanisms responsible for plastic recombination as response to different biotic and abiotic conditions and stresses. PMID:27492232

  16. Human Genome Project discoveries: Dialectics and rhetoric in the science of genetics

    Science.gov (United States)

    Robidoux, Charlotte A.

    The Human Genome Project (HGP), a $437 million effort that began in 1990 to chart the chemical sequence of our three billion base pairs of DNA, was completed in 2003, marking the 50th anniversary that proved the definitive structure of the molecule. This study considered how dialectical and rhetorical arguments functioned in the science, political, and public forums over a 20-year period, from 1980 to 2000, to advance human genome research and to establish the official project. I argue that Aristotle's continuum of knowledge--which ranges from the probable on one end to certified or demonstrated knowledge on the other--provides useful distinctions for analyzing scientific reasoning. While contemporary scientific research seeks to discover certified knowledge, investigators generally employ the hypothetico-deductive or scientific method, which often yields probable rather than certain findings, making these dialectical in nature. Analysis of the discourse describing human genome research revealed the use of numerous rhetorical figures and topics. Persuasive and probable reasoning were necessary for scientists to characterize unknown genetic phenomena, to secure interest in and funding for large-scale human genome research, to solve scientific problems, to issue probable findings, to convince colleagues and government officials that the findings were sound and to disseminate information to the public. Both government and private venture scientists drew on these tools of reasoning to promote their methods of mapping and sequencing the genome. The debate over how to carry out sequencing was rooted in conflicting values. Scientists representing the academic tradition valued a more conservative method that would establish high quality results, and those supporting private industry valued an unconventional approach that would yield products and profits more quickly. Values in turn influenced political and public forum arguments. Agency representatives and investors sided

  17. Genetic programming as alternative for predicting development effort of individual software projects.

    Directory of Open Access Journals (Sweden)

    Arturo Chavoya

    Full Text Available Statistical and genetic programming techniques have been used to predict the software development effort of large software projects. In this paper, a genetic programming model was used for predicting the effort required in individually developed projects. Accuracy obtained from a genetic programming model was compared against one generated from the application of a statistical regression model. A sample of 219 projects developed by 71 practitioners was used for generating the two models, whereas another sample of 130 projects developed by 38 practitioners was used for validating them. The models used two kinds of lines of code as well as programming language experience as independent variables. Accuracy results from the model obtained with genetic programming suggest that it could be used to predict the software development effort of individual projects when these projects have been developed in a disciplined manner within a development-controlled environment.

  18. High-Throughput Phenotyping of Sorghum Plant Height Using an Unmanned Aerial Vehicle and Its Application to Genomic Prediction Modeling

    Science.gov (United States)

    Watanabe, Kakeru; Guo, Wei; Arai, Keigo; Takanashi, Hideki; Kajiya-Kanegae, Hiromi; Kobayashi, Masaaki; Yano, Kentaro; Tokunaga, Tsuyoshi; Fujiwara, Toru; Tsutsumi, Nobuhiro; Iwata, Hiroyoshi

    2017-01-01

    Genomics-assisted breeding methods have been rapidly developed with novel technologies such as next-generation sequencing, genomic selection and genome-wide association study. However, phenotyping is still time consuming and is a serious bottleneck in genomics-assisted breeding. In this study, we established a high-throughput phenotyping system for sorghum plant height and its response to nitrogen availability; this system relies on the use of unmanned aerial vehicle (UAV) remote sensing with either an RGB or near-infrared, green and blue (NIR-GB) camera. We evaluated the potential of remote sensing to provide phenotype training data in a genomic prediction model. UAV remote sensing with the NIR-GB camera and the 50th percentile of digital surface model, which is an indicator of height, performed well. The correlation coefficient between plant height measured by UAV remote sensing (PHUAV) and plant height measured with a ruler (PHR) was 0.523. Because PHUAV was overestimated (probably because of the presence of taller plants on adjacent plots), the correlation coefficient between PHUAV and PHR was increased to 0.678 by using one of the two replications (that with the lower PHUAV value). Genomic prediction modeling performed well under the low-fertilization condition, probably because PHUAV overestimation was smaller under this condition due to a lower plant height. The predicted values of PHUAV and PHR were highly correlated with each other (r = 0.842). This result suggests that the genomic prediction models generated with PHUAV were almost identical and that the performance of UAV remote sensing was similar to that of traditional measurements in genomic prediction modeling. UAV remote sensing has a high potential to increase the throughput of phenotyping and decrease its cost. UAV remote sensing will be an important and indispensable tool for high-throughput genomics-assisted plant breeding.

  20. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology.

    Directory of Open Access Journals (Sweden)

    Paolo Fontana

    Full Text Available BACKGROUND: Large-scale sequencing projects have now become routine lab practice and this has led to the development of a new generation of tools involving function prediction methods, bringing the latter back to the fore. The advent of Gene Ontology, with its structured vocabulary and paradigm, has provided computational biologists with an appropriate means for this task. METHODOLOGY: We present here a novel method called ARGOT (Annotation Retrieval of Gene Ontology Terms) that is able to quickly process thousands of sequences for functional inference. The tool exploits for the first time an integrated approach which combines clustering of GO terms, based on their semantic similarities, with a weighting scheme which assesses retrieved hits sharing a certain number of biological features with the sequence to be annotated. These hits may be obtained by different methods and in this work we have based ARGOT processing on BLAST results. CONCLUSIONS: The extensive benchmark involved 10,000 protein sequences, the complete S. cerevisiae genome and a small subset of proteins for purposes of comparison with other available tools. The algorithm was proven to outperform existing methods and to be suitable for function prediction of single proteins due to its high degree of sensitivity, specificity and coverage.

  1. ECLogger: Cross-Project Catch-Block Logging Prediction Using Ensemble of Classifiers

    Directory of Open Access Journals (Sweden)

    Sangeeta Lal

    2017-01-01

    Full Text Available Background: Software developers insert log statements in the source code to record program execution information. However, optimizing the number of log statements in the source code is challenging. Machine learning based within-project logging prediction tools, proposed in previous studies, may not be suitable for new or small software projects. For such software projects, we can use cross-project logging prediction. Aim: The aim of the study presented here is to investigate cross-project logging prediction methods and techniques. Method: The proposed method is ECLogger, which is a novel, ensemble-based, cross-project, catch-block logging prediction model. In this research, 9 base classifiers were used and combined using ensemble techniques. The performance of ECLogger was evaluated on three open-source Java projects: Tomcat, CloudStack and Hadoop. Results: ECLogger Bagging, ECLogger AverageVote, and ECLogger MajorityVote show a considerable improvement in the average Logged F-measure (LF) on 3, 5, and 4 source -> target project pairs, respectively, compared to the baseline classifiers. ECLogger AverageVote performs best and shows improvements of 3.12% (average LF) and 6.08% (average ACC, accuracy). Conclusion: The classifier based on ensemble techniques, such as bagging, average vote, and majority vote outperforms the baseline classifier. Overall, the ECLogger AverageVote model performs best. The results show that the CloudStack project is more generalizable than the other projects.
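
    The two voting schemes named in this abstract can combine the same base-classifier outputs into different decisions. The sketch below shows the generic average-vote and majority-vote rules only; it is not the ECLogger implementation, and the function names, the 0.5 threshold and the probabilities are assumptions made up for illustration.

```python
def average_vote(probabilities, threshold=0.5):
    """Predict 'logged' when the mean base-classifier probability reaches the threshold."""
    if not probabilities:
        raise ValueError("need at least one base-classifier probability")
    return sum(probabilities) / len(probabilities) >= threshold


def majority_vote(probabilities, threshold=0.5):
    """Predict 'logged' when more than half of the base classifiers vote for it."""
    if not probabilities:
        raise ValueError("need at least one base-classifier probability")
    votes = sum(1 for p in probabilities if p >= threshold)
    return votes * 2 > len(probabilities)


# Hypothetical P(catch block is logged) from five base classifiers.
probabilities = [0.95, 0.90, 0.45, 0.40, 0.40]

avg_decision = average_vote(probabilities)    # mean is 0.62, so "logged"
maj_decision = majority_vote(probabilities)   # only 2 of 5 votes, so "not logged"
print(avg_decision, maj_decision)
```

    The example is chosen so that the two rules disagree: two very confident classifiers pull the average above the threshold even though they are outvoted.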

  2. TcruziDB, an Integrated Database, and the WWW Information Server for the Trypanosoma cruzi Genome Project

    Directory of Open Access Journals (Sweden)

    Degrave Wim

    1997-01-01

    Full Text Available Data analysis, presentation and distribution are of utmost importance to a genome project. A public domain software, ACeDB, has been chosen as the common basis for parasite genome databases, and a first release of TcruziDB, the Trypanosoma cruzi genome database, is available by ftp from ftp://iris.dbbm.fiocruz.br/pub/genomedb/TcruziDB, as well as versions of the software for different operating systems (ftp://iris.dbbm.fiocruz.br/pub/unixsoft/). Moreover, data originating from the project are available from the WWW server at http://www.dbbm.fiocruz.br. It contains biological and parasitological data on CL Brener, its karyotype, all available T. cruzi sequences from Genbank, data on the EST-sequencing project and on available libraries, a T. cruzi codon table and a listing of activities and participating groups in the genome project, as well as meeting reports. T. cruzi discussion lists (tcruzi-l@iris.dbbm.fiocruz.br and tcgenics@iris.dbbm.fiocruz.br) are being maintained for communication and to promote collaboration in the genome project.

  3. Applications of Population Genetics to Animal Breeding, from Wright, Fisher and Lush to Genomic Prediction.

    Science.gov (United States)

    Hill, William G

    2014-01-01

    Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives' performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher's infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now with access to genomic data, a revolution in which molecular information is being used to enhance response with "genomic selection" is occurring. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas.

  4. Applications of Population Genetics to Animal Breeding, from Wright, Fisher and Lush to Genomic Prediction

    Science.gov (United States)

    Hill, William G.

    2014-01-01

    Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives’ performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher’s infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now with access to genomic data, a revolution in which molecular information is being used to enhance response with “genomic selection” is occurring. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas. PMID:24395822

  5. A Comparative Genomics Approach to Prediction of New Members of Regulons

    Science.gov (United States)

    Tan, Kai; Moreno-Hagelsieb, Gabriel; Collado-Vides, Julio; Stormo, Gary D.

    2001-01-01

    Identifying the complete transcriptional regulatory network for an organism is a major challenge. For each regulatory protein, we want to know all the genes it regulates, that is, its regulon. Examples of known binding sites can be used to estimate the binding specificity of the protein and to predict other binding sites. However, binding site predictions can be unreliable because determining the true specificity of the protein is difficult because of the considerable variability of binding sites. Because regulatory systems tend to be conserved through evolution, we can use comparisons between species to increase the reliability of binding site predictions. In this article, an approach is presented to evaluate the computational predictions of regulatory sites. We combine the prediction of transcription units having orthologous genes with the prediction of transcription factor binding sites based on probabilistic models. We augment the sets of genes in Escherichia coli that are expected to be regulated by two transcription factors, the cAMP receptor protein and the fumarate and nitrate reduction regulatory protein, through a comparison with the Haemophilus influenzae genome. At the same time, we learned more about the regulatory networks of H. influenzae, a species with much less experimental knowledge than E. coli. By studying orthologous genes subject to regulation by the same transcription factor, we also gained understanding of the evolution of the entire regulatory systems. PMID:11282972
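
    Estimating binding specificity from known sites and then scanning for new ones is commonly done with a position weight matrix. The sketch below is one simple log-odds variant, assuming a uniform background and a fixed pseudocount; it is not the probabilistic model used in the article, and the example sites, the sequence and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from aligned example sites."""
    if not sites or any(len(s) != len(sites[0]) for s in sites):
        raise ValueError("need at least one site, all of equal length")
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for one window of the same width as the matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [(i, score(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]
    return max(hits, key=lambda hit: hit[1])


# Hypothetical aligned binding sites and a sequence to scan.
sites = ["TGTGA", "TGTGA", "TGCGA", "TGTGA"]
sequence = "ACCTGTGATTC"

pwm = build_pwm(sites)
start, best_score = best_hit(pwm, sequence)
print(start, round(best_score, 3))
```

    A cross-species check in the spirit of the article would keep only hits whose orthologous upstream regions also score well, which is not shown here.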

  6. Pan-Genome Analysis of Human Gastric Pathogen H. pylori: Comparative Genomics and Pathogenomics Approaches to Identify Regions Associated with Pathogenicity and Prediction of Potential Core Therapeutic Targets

    Directory of Open Access Journals (Sweden)

    Amjad Ali

    2015-01-01

    Full Text Available Helicobacter pylori is a human gastric pathogen implicated as the major cause of peptic ulcer and second leading cause of gastric cancer (~70%) around the world. Conversely, an increased resistance to antibiotics and hindrances in the development of vaccines against H. pylori are observed. Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by a pan-genome approach; the predicted conserved gene families (1,193) constitute ~77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. A total of 28 nonhost homolog proteins were characterized as universal therapeutic targets against H. pylori based on their functional annotation and protein-protein interaction. Finally, pathogenomics and genome plasticity analysis revealed 3 highly conserved and 2 highly variable putative pathogenicity islands in all of the H. pylori genomes analyzed.

  7. Computational Appliance for Rapid Prediction of Aircraft Trajectories Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Next generation air traffic management systems will be based to a greater degree on predicted trajectories of aircraft. Due to the iterative nature of future air...

  9. Vehicle Interior Noise Prediction Using Energy Finite Element Analysis Project

    Data.gov (United States)

    National Aeronautics and Space Administration — It is proposed to develop and implement a computational technique based on Energy Finite Element Analysis (EFEA) for interior noise prediction of advanced aerospace...

  10. Enhanced Prediction of Gear Tooth Surface Fatigue Life Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Sentient will develop an enhanced prediction of gear tooth surface fatigue life with rigorous analysis of the tribological phenomena that contribute to pitting...

  11. Vehicle Interior Noise Prediction Using Energy Finite Element Analysis Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Prediction and enhancement of vehicle interior noise due to high frequency excitation, based on computer simulation, allows the application of the technology at the...

  12. Predicting effects of structural stress in a genome-reduced model bacterial metabolism

    Science.gov (United States)

    Güell, Oriol; Sagués, Francesc; Serrano, M. Ángeles

    2012-08-01

    Mycoplasma pneumoniae is a human pathogen recently proposed as a genome-reduced model for bacterial systems biology. Here, we study the response of its metabolic network to different forms of structural stress, including removal of individual and pairs of reactions and knockout of genes and clusters of co-expressed genes. Our results reveal a network architecture as robust as that of other model bacteria regarding multiple failures, although less robust against individual reaction inactivation. Interestingly, metabolite motifs associated to reactions can predict the propagation of inactivation cascades and damage amplification effects arising in double knockouts. We also detect a significant correlation between gene essentiality and damages produced by single gene knockouts, and find that genes controlling high-damage reactions tend to be expressed independently of each other, a functional switch mechanism that, simultaneously, acts as a genetic firewall to protect metabolism. Prediction of failure propagation is crucial for metabolic engineering or disease treatment.

  13. Prediction of drug-target interactions for drug repositioning only based on genomic expression similarity.

    Directory of Open Access Journals (Sweden)

    Kejian Wang

    Full Text Available Small drug molecules usually bind to multiple protein targets or even unintended off-targets. Such drug promiscuity has often led to unwanted or unexplained drug reactions, resulting in side effects or drug repositioning opportunities. So it is always an important issue in pharmacology to identify potential drug-target interactions (DTI). However, DTI discovery by experiment remains a challenging task, due to high expense of time and resources. Many computational methods are therefore developed to predict DTI with high throughput biological and clinical data. Here, we initiatively demonstrate that the on-target and off-target effects could be characterized by drug-induced in vitro genomic expression changes, e.g. the data in Connectivity Map (CMap). Thus, unknown ligands of a certain target can be found from the compounds showing high gene-expression similarity to the known ligands. Then to clarify the detailed practice of CMap based DTI prediction, we objectively evaluate how well each target is characterized by CMap. The results suggest that (1) some targets are better characterized than others, so the prediction models specific to these well characterized targets would be more accurate and reliable; (2) in some cases, a family of ligands for the same target tend to interact with common off-targets, which may help increase the efficiency of DTI discovery and explain the mechanisms of complicated drug actions. In the present study, CMap expression similarity is proposed as a novel indicator of drug-target interactions. The detailed strategies of improving data quality by decreasing the batch effect and building prediction models are also effectively established. We believe the success in CMap can be further translated into other public and commercial data of genomic expression, thus increasing research productivity towards valid drug repositioning and minimal side effects.

  14. In silico miRNA prediction in metazoan genomes: balancing between sensitivity and specificity

    Directory of Open Access Journals (Sweden)

    Fiers Mark WJE

    2009-04-01

    Full Text Available Abstract Background MicroRNAs (miRNAs), short ~21-nucleotide RNA molecules, play an important role in post-transcriptional regulation of gene expression. The number of known miRNA hairpins registered in the miRBase database is rapidly increasing, but recent reports suggest that many miRNAs with restricted temporal or tissue-specific expression remain undiscovered. Various strategies for in silico miRNA identification have been proposed to facilitate miRNA discovery. Notably support vector machine (SVM) methods have recently gained popularity. However, a drawback of these methods is that they do not provide insight into the biological properties of miRNA sequences. Results We here propose a new strategy for miRNA hairpin prediction in which the likelihood that a genomic hairpin is a true miRNA hairpin is evaluated based on statistical distributions of observed biological variation of properties (descriptors) of known miRNA hairpins. These distributions are transformed into a single and continuous outcome classifier called the L score. Using a dataset of known miRNA hairpins from the miRBase database and an exhaustive set of genomic hairpins identified in the genome of Caenorhabditis elegans, a subset of 18 most informative descriptors was selected after detailed analysis of correlation among and discriminative power of individual descriptors. We show that the majority of previously identified miRNA hairpins have high L scores, that the method outperforms miRNA prediction by threshold filtering and that it is more transparent than SVM classifiers. Conclusion The L score is applicable as a prediction classifier with high sensitivity for novel miRNA hairpins. The L-score approach can be used to rank and select interesting miRNA hairpin candidates for downstream experimental analysis when coupled to a genome-wide set of in silico-identified hairpins or to facilitate the analysis of large sets of putative miRNA hairpin loci obtained in deep

  15. The effects of relatedness and GxE interaction on prediction accuracies in genomic selection: a study in cassava

    Science.gov (United States)

    Prior to implementation of genomic selection, an evaluation of the potential accuracy of prediction can be obtained by cross validation. In this procedure, a population with both phenotypes and genotypes is split into training and validation sets. The prediction model is fitted using the training se...

  16. Predicting Flu Season Requirements: An Undergraduate Modeling Project

    Science.gov (United States)

    Kramlich, Gary R., II; Braunstein Fierson, Janet L.; Wright, J. Adam

    2010-01-01

    This project was designed to be used in a freshman calculus class whose students had already been introduced to logistic functions and basic data modeling techniques. It need not be limited to such an audience, however; it has also been implemented in a topics in mathematics class for college upperclassmen. Originally intended to be presented in…

  17. Should the markers on X chromosome be used for genomic prediction?

    DEFF Research Database (Denmark)

    Su, Guosheng; Guldbrandtsen, Bernt; Aamand, Gert Pedersen;

    2013-01-01

    This study investigated the accuracy of imputation from LD (7K) to 54K panel and compared accuracy of genomic prediction with or without the X chromosome information, based on data of Nordic Holstein bulls. Beagle and Findhap were used for imputation. Averaged over two imputation datasets, the allele correct rates of imputation using Findhap were 98.2% for autosomal markers, 89.7% for markers on the pseudoautosomal region of the X chromosome, and 96.4% for X-specific markers. The allele correct rates were 98.9%, 91.2% and 96.8%, respectively, when using Beagle. Genomic predictions were carried out for 15...... excluding the X chromosome. Averaged over 15 traits, the gains in reliability from the X chromosome ranged from 0.3% to 0.5% points among the three data sets and models. Using a model with a G-matrix accounting for sex-linked relationship appropriately or a model which divided genomic breeding value into an...

  18. Intrinsic disorder in Viral Proteins Genome-Linked: experimental and predictive analyses

    Directory of Open Access Journals (Sweden)

    Van Dorsselaer Alain

    2009-02-01

    Full Text Available Abstract Background VPgs are viral proteins linked to the 5' end of some viral genomes. Interactions between several VPgs and eukaryotic translation initiation factors eIF4Es are critical for plant infection. However, VPgs are not restricted to phytoviruses, being also involved in genome replication and protein translation of several animal viruses. To date, structural data are still limited to small picornaviral VPgs. Recently three phytoviral VPgs were shown to be natively unfolded proteins. Results In this paper, we report the bacterial expression, purification and biochemical characterization of two phytoviral VPgs, namely the VPgs of Rice yellow mottle virus (RYMV, genus Sobemovirus) and Lettuce mosaic virus (LMV, genus Potyvirus). Using far-UV circular dichroism and size exclusion chromatography, we show that RYMV and LMV VPgs are predominantly or partly unstructured in solution, respectively. Using several disorder predictors, we show that both proteins are predicted to possess disordered regions. We next extend these results to 14 VPgs representative of the viral diversity. Disordered regions were predicted in all VPg sequences whatever the genus and the family. Conclusion Based on these results, we propose that intrinsic disorder is a common feature of VPgs. The functional role of intrinsic disorder is discussed in light of the biological roles of VPgs.

  19. Compatibility of pedigree-based and marker-based relationships for single-step genomic prediction

    DEFF Research Database (Denmark)

    Christensen, Ole Fredslund

    2012-01-01

    Single-step methods for genomic prediction have recently become popular because they are conceptually simple and in practice such a method can completely replace a pedigree-based method for routine genetic evaluation. An issue with single-step methods is compatibility between the marker-based relationship matrix and the pedigree-based relationship matrix. The compatibility issue involves which allele frequencies to use in the marker-based relationship matrix, and also that adjustments of this matrix to the pedigree-based relationship matrix are needed. In addition, it has been overlooked...... in the base population. Here, two ideas are explored. The first idea is to instead adjust the pedigree-based relationship matrix to be compatible to the marker-based relationship matrix, whereas the second idea is to include the likelihood for the observed markers. A single-step method is used where...

  20. Predicting Defects Using Information Intelligence Process Models in the Software Technology Project.

    Science.gov (United States)

    Selvaraj, Manjula Gandhi; Jayabal, Devi Shree; Srinivasan, Thenmozhi; Balasubramanie, Palanisamy

    2015-01-01

    A key differentiator in a competitive marketplace is customer satisfaction. According to a 2012 Gartner report, only 75%-80% of IT projects are successful. Customer satisfaction should be considered a part of business strategy. The associated project parameters should be proactively managed, and the project outcome needs to be predicted by a technical manager. There is a lot of focus on the end state and on minimizing defect leakage as much as possible. Focus should instead be on proactive management and on shifting left in the software life cycle engineering model: identify the problem up front in the project cycle rather than waiting for lessons to be learnt and taking reactive steps. This paper gives the practical applicability of using predictive models and illustrates use of these models in a project to predict system testing defects, thus helping to reduce residual defects.

  1. Understanding our genetic inheritance: The US Human Genome Project, The first five years FY 1991--1995

    Energy Technology Data Exchange (ETDEWEB)

    None

    1990-04-01

    The Human Genome Initiative is a worldwide research effort with the goal of analyzing the structure of human DNA and determining the location of the estimated 100,000 human genes. In parallel with this effort, the DNA of a set of model organisms will be studied to provide the comparative information necessary for understanding the functioning of the human genome. The information generated by the human genome project is expected to be the source book for biomedical science in the 21st century and will be of immense benefit to the field of medicine. It will help us to understand and eventually treat many of the more than 4000 genetic diseases that affect mankind, as well as the many multifactorial diseases in which genetic predisposition plays an important role. A centrally coordinated project focused on specific objectives is believed to be the most efficient and least expensive way of obtaining this information. The basic data produced will be collected in electronic databases that will make the information readily accessible in convenient form to all who need it. This report describes the plans for the U.S. human genome project and updates those originally prepared by the Office of Technology Assessment (OTA) and the National Research Council (NRC) in 1988. In the intervening two years, improvements in technology for almost every aspect of genomics research have taken place. As a result, more specific goals can now be set for the project.

  2. Understanding our Genetic Inheritance: The U.S. Human Genome Project, The First Five Years FY 1991--1995

    Science.gov (United States)

    1990-04-01

    The Human Genome Initiative is a worldwide research effort with the goal of analyzing the structure of human DNA and determining the location of the estimated 100,000 human genes. In parallel with this effort, the DNA of a set of model organisms will be studied to provide the comparative information necessary for understanding the functioning of the human genome. The information generated by the human genome project is expected to be the source book for biomedical science in the 21st century and will be of immense benefit to the field of medicine. It will help us to understand and eventually treat many of the more than 4000 genetic diseases that affect mankind, as well as the many multifactorial diseases in which genetic predisposition plays an important role. A centrally coordinated project focused on specific objectives is believed to be the most efficient and least expensive way of obtaining this information. The basic data produced will be collected in electronic databases that will make the information readily accessible in convenient form to all who need it. This report describes the plans for the U.S. human genome project and updates those originally prepared by the Office of Technology Assessment (OTA) and the National Research Council (NRC) in 1988. In the intervening two years, improvements in technology for almost every aspect of genomics research have taken place. As a result, more specific goals can now be set for the project.

  3. Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties.

    Science.gov (United States)

    Yue, Zhenyu; Zhang, Wenna; Lu, Yongming; Yang, Qiaoyue; Ding, Qiuying; Xia, Junfeng; Chen, Yan

    2015-01-01

    Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures, which can be used as templates for the construction of novel drugs with enhanced antitumor activity. Traditional research approaches studied structure-activity relationship of natural products and obtained key structural properties, such as chemical bond or group, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural products responses against a panel of cancer cell lines based on both the gene expression and the chemical properties of natural products. The results on two datasets, training set and independent test set, show that this proposed method yields significantly better prediction accuracy. In addition, we also demonstrate the predictive power of our proposed method by modeling the cancer cell sensitivity to two natural products, Curcumin and Resveratrol, which indicate that our method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.

  4. Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties

    Directory of Open Access Journals (Sweden)

    Zhenyu Yue

    2015-11-01

    Full Text Available Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures, which can be used as templates for the construction of novel drugs with enhanced antitumor activity. Traditional research approaches studied structure-activity relationship of natural products and obtained key structural properties, such as chemical bond or group, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural products responses against a panel of cancer cell lines based on both the gene expression and the chemical properties of natural products. The results on two datasets, training set and independent test set, show that this proposed method yields significantly better prediction accuracy. In addition, we also demonstrate the predictive power of our proposed method by modeling the cancer cell sensitivity to two natural products, Curcumin and Resveratrol, which indicate that our method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.

  5. Integrated genome-scale prediction of detrimental mutations in transcription networks.

    Directory of Open Access Journals (Sweden)

    Mirko Francesconi

    2011-05-01

    Full Text Available A central challenge in genetics is to understand when and why mutations alter the phenotype of an organism. The consequences of gene inhibition have been systematically studied and can be predicted reasonably well across a genome. However, many sequence variants important for disease and evolution may alter gene regulation rather than gene function. The consequences of altering a regulatory interaction (or "edge") rather than a gene (or "node") in a network have not been as extensively studied. Here we use an integrative analysis and evolutionary conservation to identify features that predict when the loss of a regulatory interaction is detrimental in the extensively mapped transcription network of budding yeast. Properties such as the strength of an interaction, location and context in a promoter, regulator and target gene importance, and the potential for compensation (redundancy) associate to some extent with interaction importance. Combined, however, these features predict quite well whether the loss of a regulatory interaction is detrimental across many promoters and for many different transcription factors. Thus, despite the potential for regulatory diversity, common principles can be used to understand and predict when changes in regulation are most harmful to an organism.

  6. The GenABEL Project for statistical genomics [version 1; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Lennart C. Karssen

    2016-05-01

    Full Text Available Development of free/libre open source software is usually done by a community of people with an interest in the tool. For scientific software, however, this is less often the case. Most scientific software is written by only a few authors, often a student working on a thesis. Once the paper describing the tool has been published, the tool is no longer developed further and is left to its own devices. Here we describe the broad, multidisciplinary community we formed around a set of tools for statistical genomics. The GenABEL project for statistical omics actively promotes open interdisciplinary development of statistical methodology and its implementation in efficient and user-friendly software under an open source licence. The software tools developed within the project collectively make up the GenABEL suite, which currently consists of eleven tools. The open framework of the project actively encourages involvement of the community in all stages, from formulation of methodological ideas to application of software to specific data sets. A web forum is used to channel user questions and discussions, further promoting the use of the GenABEL suite. Developer discussions take place on a dedicated mailing list, and development is further supported by robust development practices including use of public version control, code review and continuous integration. Use of this open science model attracts contributions from users and developers outside the “core team”, facilitating agile statistical omics methodology development and fast dissemination.

  7. Human Genome Diversity Project. Summary of planning workshop 3(B): Ethical and human-rights implications

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1993-12-31

    The third planning workshop of the Human Genome Diversity Project was held on the campus of the US National Institutes of Health in Bethesda, Maryland, from February 16 through February 18, 1993. The second day of the workshop was devoted to an exploration of the ethical and human-rights implications of the Project. This open meeting centered on three roundtables, involving 12 invited participants, and the resulting discussions among all those present. Attendees and their affiliations are listed in the attached Appendix A. The discussion was guided by a schedule and list of possible issues, distributed to all present and attached as Appendix B. This is a relatively complete, and thus lengthy, summary of the comments at the meeting. The beginning of the summary sets out as conclusions some issues on which there appeared to be widespread agreement, but those conclusions are not intended to serve as a set of detailed recommendations. The meeting organizer is distributing his recommendations in a separate memorandum; recommendations from others who attended the meeting are welcome and will be distributed by the meeting organizer to the participants and to the Project committee.

  8. Genomic prediction with parallel computing for slaughter traits in Chinese Simmental beef cattle using high-density genotypes.

    Science.gov (United States)

    Guo, Peng; Zhu, Bo; Xu, Lingyang; Niu, Hong; Wang, Zezhao; Guan, Long; Liang, Yonghu; Ni, Hemin; Guo, Yong; Chen, Yan; Zhang, Lupei; Gao, Xue; Gao, Huijiang; Li, Junya

    2017-01-01

    Genomic selection has been widely used for complex quantitative traits in farm animals. Estimates of breeding values for slaughter traits are of great importance to the beef cattle industry, and it is worthwhile to investigate the prediction accuracies of genomic selection for these traits. In this study, we assessed genomic predictive abilities for average daily gain weight (ADG), live weight (LW), carcass weight (CW), dressing percentage (DP), lean meat percentage (LMP) and retail meat weight (RMW) using the Illumina Bovine 770K SNP Beadchip in Chinese Simmental cattle. To evaluate the abilities of prediction, marker effects were estimated using genomic BLUP (GBLUP) and three parallel Bayesian models, including multiple chains parallel BayesA, BayesB and BayesCπ (PBayesA, PBayesB and PBayesCπ). Training and validation sets were divided by random allocation, and the predictive accuracies were evaluated using 5-fold cross validation. We found that the accuracies of genomic predictions ranged from 0.195±0.084 (GBLUP for LMP) to 0.424±0.147 (PBayesB for CW). The average accuracies across traits were 0.327±0.085 (GBLUP), 0.335±0.063 (PBayesA), 0.347±0.093 (PBayesB) and 0.334±0.077 (PBayesCπ), respectively. Notably, parallel Bayesian models were more accurate than GBLUP across six traits. Our study suggested that genomic selection with multiple chains parallel Bayesian models is feasible for slaughter traits in Chinese Simmental cattle. The estimation of direct genomic breeding values using parallel Bayesian methods can offer important insights into improving prediction accuracy at young ages and may also help to identify superior candidates in breeding programs.
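
    The 5-fold cross-validation scheme described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: ridge regression on marker effects (often called SNP-BLUP or RR-BLUP) is used here as a simple stand-in for GBLUP, the parallel Bayesian models are not reproduced, and the function names, the shrinkage parameter `lam`, and the synthetic genotypes are all assumptions made for illustration. Accuracy is taken as the correlation between predicted and observed phenotypes in each validation fold.

```python
import numpy as np


def ridge_snp_blup(X_train, y_train, lam):
    """Estimate marker effects b by ridge regression (hypothetical helper).

    Solves (X'X + lam * I) b = X'(y - mean(y)); lam is an assumed,
    user-chosen shrinkage parameter, not a value from the study.
    """
    mu = y_train.mean()
    n_markers = X_train.shape[1]
    lhs = X_train.T @ X_train + lam * np.eye(n_markers)
    rhs = X_train.T @ (y_train - mu)
    return mu, np.linalg.solve(lhs, rhs)


def cross_validated_accuracy(X, y, k=5, lam=1.0, seed=0):
    """Return mean and standard deviation of fold-wise predictive accuracy."""
    rng = np.random.default_rng(seed)
    # Random allocation of individuals to k folds.
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        mu, b = ridge_snp_blup(X[train], y[train], lam)
        predicted = mu + X[val] @ b
        # Accuracy = Pearson correlation between predictions and phenotypes.
        accuracies.append(np.corrcoef(predicted, y[val])[0, 1])
    return float(np.mean(accuracies)), float(np.std(accuracies))


if __name__ == "__main__":
    # Synthetic example only: 200 individuals, 50 markers coded 0/1/2.
    rng = np.random.default_rng(42)
    genotypes = rng.integers(0, 3, size=(200, 50)).astype(float)
    genotypes -= genotypes.mean(axis=0)  # centre each marker
    effects = rng.normal(size=50)
    phenotypes = genotypes @ effects + rng.normal(scale=2.0, size=200)
    mean_acc, sd_acc = cross_validated_accuracy(genotypes, phenotypes)
    print(f"accuracy: {mean_acc:.3f} +/- {sd_acc:.3f}")
```

    Reporting the mean and spread over folds mirrors the "mean ± SD" form of the accuracies quoted above; with real high-density genotypes the number of markers far exceeds the number of animals, so an equivalent formulation based on a genomic relationship matrix would usually be preferred for speed.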

  9. SNP detection and prediction of variability between chicken lines using genome resequencing of DNA pools

    Directory of Open Access Journals (Sweden)

    Carlborg Örjan

    2010-11-01

    Full Text Available Abstract Background Next-generation sequencing technologies are widely used for detection of millions of Single Nucleotide Polymorphisms (SNPs) and also provide a means of assessing their variation. This information is useful for composing subsets of highly informative SNPs for region-specific or genome-wide analysis and to identify mutations regulating phenotypic differences within or between populations. In this study, we investigated the sensitivity of SNP detection and introduced the flanking SNPs value (FSV) as a novel measure for predicting SNP-variability using ~5X genome resequencing with ABI SOLID and DNA pools from two chicken lines divergently selected for juvenile bodyweight. Results Genotyping with a 60 K SNP chip revealed polymorphisms within or between two divergently selected chicken lines for 31 363 SNPs, 48% of which were also detected using resequencing of DNA pools. SNP detection using resequencing was more powerful for positions with larger differences in allele frequency between the lines. About 50% of the SNPs with non-reference allele frequencies in the range 0.5-0.6 and 67% of those with frequencies > 0.9 could be detected. On average, ~3.7 SNPs/kb were detected by resequencing, with about 5% lower density on microchromosomes than on macrochromosomes. There was a positive correlation between the observed between-line SNP variation from the 60 K chip analysis and our proposed FSV score computed from the genome resequencing data. The strongest correlations on macrochromosomes and microchromosomes were observed when the FSV was calculated with total flanking regions of 62 kb (correlation 0.55) and 38 kb (correlation 0.45), respectively. Conclusions Genome resequencing with limited coverage (~5X) using pooled DNA samples and three non-reference reads as a threshold for SNP detection, identified 50 - 67% of the 60 K SNPs with a non-reference allele frequency larger than 0.5. The SNP density was around 5% lower on the

  10. Report on three Genomes to Life Workshops: Data Infrastructure, Modeling and Simulation, and Protein Structure Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Geist, GA

    2003-09-16

    On July 22, 23, 24, 2003, three one-day workshops were held in Gaithersburg, Maryland. Each was attended by about 30 computational biologists, mathematicians, and computer scientists who were experts in the respective workshop areas. The first workshop discussed the data infrastructure needs for the Genomes to Life (GTL) program with the objective to identify gaps in the present GTL data infrastructure and define the GTL data infrastructure required for the success of the proposed GTL facilities. The second workshop discussed the modeling and simulation needs for the next phase of the GTL program and defined how these relate to the experimental data generated by genomics, proteomics, and metabolomics. The third workshop identified emerging technical challenges in computational protein structure prediction for DOE missions and outlined specific goals for the next phase of GTL. The workshops were attended by representatives from both OBER and OASCR. The invited experts at each of the workshops made short presentations on what they perceived as the key needs in the GTL data infrastructure, modeling and simulation, and structure prediction respectively. Each presentation was followed by a lively discussion by all the workshop attendees. The following findings and recommendations were derived from the three workshops. A seamless integration of GTL data spanning the entire range of genomics, proteomics, and metabolomics will be extremely challenging but it has to be treated as the first-class component of the GTL program to assure GTL's chances for success. High-throughput GTL facilities and ultrascale computing will make it possible to address the ultimate goal of modern biology: to achieve a fundamental, comprehensive, and systematic understanding of life. But first the GTL community needs to address the problem of the massive quantities and increased complexity of biological data produced by experiments and computations. Genome-scale collection, analysis

  11. O admirável Projeto Genoma Humano The brave New Human Genome Project

    Directory of Open Access Journals (Sweden)

    Marilena V. Corrêa

    2002-12-01

    Full Text Available This article presents an overview of the social, ethical, and legal implications of the Human Genome Project. The benefits of this mega-project, expressed as promises of a therapeutic revolution in medicine, will not be achieved without conflict. The process of technological innovation in genetics poses problems of various orders: on the one hand, consortium-based research, gene patenting, and genomic products tend to feature commercial interests and difficulties in managing the results of such research. These problems pose challenges in terms of possible inequality in access to the benefits of research. On the other hand, there is the issue of genetic information and the protection of individual data on risks of and susceptibilities to diseases and human attributes. The problem of defining men and women in terms of genetic traits poses a clear threat of discrimination, and becomes acute because of the genetic reductionism that the media helps to propagate. Answers to these problems cannot be expected from bioethics alone. The bioethical approach must be able to combine with political analyses of reproduction, sexuality, health, and medicine. Such a vast spectrum of problems cannot be discussed in depth in a single article. The choice was made to map them in order to emphasize the extent to which reflection on the genome project, genomics, and post-genomics faces the challenge of articulating such differentiated aspects.

  12. Behavioral, Brain Imaging and Genomic Measures to Predict Functional Outcomes Post-Bed Rest and Space Flight

    Science.gov (United States)

    Mulavara, A. P.; Peters, B.; De Dios, Y. E.; Gadd, N. E.; Caldwell, E. E.; Batson, C. D.; Goel, R.; Oddsson, L.; Kreutzberg, G.; Zanello, S.; Clark, T. K.; Oman, C. M.; Cohen, H. S.; Wood, S.; Seidler, R. D.; Reschke, M. F.; Bloomberg, J. J.

    2017-01-01

    Astronauts experience sensorimotor disturbances during their initial exposure to microgravity and during the re-adaptation phase following a return to an Earth-gravitational environment. These alterations may disrupt crewmembers' ability to perform mission critical functional tasks requiring ambulation, manual control and gaze stability. Interestingly, astronauts who return from spaceflight show substantial differences in their abilities to readapt to a gravitational environment. The ability to predict the manner and degree to which individual astronauts are affected will improve the effectiveness of countermeasure training programs designed to enhance sensorimotor adaptability. For such an approach to succeed, we must develop predictive measures of sensorimotor adaptability that will allow us to foresee, before actual spaceflight, which crewmembers are likely to experience greater challenges to their adaptive capacities. The goals of this project are to identify and characterize this set of predictive measures. Our approach includes: 1) behavioral tests to assess sensory bias and adaptability quantified using both strategic and plastic-adaptive responses; 2) imaging to determine individual brain morphological and functional features, using structural magnetic resonance imaging (MRI), diffusion tensor imaging, resting state functional connectivity MRI, and sensorimotor adaptation task-related functional brain activation; and 3) assessment of genetic polymorphisms in the catechol-O-methyl transferase, dopamine receptor D2, and brain-derived neurotrophic factor genes and genetic polymorphisms of alpha2-adrenergic receptors that play a role in the neural pathways underlying sensorimotor adaptation. We anticipate that these predictive measures will be significantly correlated with individual differences in sensorimotor adaptability after long-duration spaceflight and exposure to an analog bed rest environment. We will be conducting a retrospective study, leveraging

  13. Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset.

    Science.gov (United States)

    Sengupta, Dhriti; Choudhury, Ananyo; Basu, Analabha; Ramsay, Michèle

    2016-12-31

    Genomic variation in Indian populations is of great interest due to the diversity of ancestral components, social stratification, endogamy and complex admixture patterns. With an expanding population of 1.2 billion, India is also a treasure trove to catalogue innocuous as well as clinically relevant rare mutations. Recent studies have revealed four dominant ancestries in populations from mainland India: Ancestral North-Indian (ANI), Ancestral South-Indian (ASI), Ancestral Tibeto-Burman (ATB) and Ancestral Austro-Asiatic (AAA). The 1000 Genomes Project (KGP) Phase-3 data include about 500 genomes from five linguistically defined Indian-Subcontinent (IS) populations (Punjabi, Gujrati, Bengali, Telugu and Tamil) some of whom are recent migrants to USA or UK. Comparative analyses show that despite the distinct geographic origins of the KGP-IS populations, the ANI component is predominantly represented in this dataset. Previous studies demonstrated population substructure in the HapMap Gujrati population, and we found evidence for additional substructure in the Punjabi and Telugu populations. These substructured populations have characteristic/significant differences in heterozygosity and inbreeding coefficients. Moreover, we demonstrate that the substructure is better explained by factors like differences in proportion of ancestral components, and endogamy driven social structure rather than invoking a novel ancestral component to explain it. Therefore, using language and/or geography as a proxy for an ethnic unit is inadequate for many of the IS populations. This highlights the necessity for more nuanced sampling strategies or corrective statistical approaches, particularly for biomedical and population genetics research in India.
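    The heterozygosity and inbreeding-coefficient comparison mentioned above can be illustrated with a minimal sketch. It assumes a single biallelic locus and the textbook estimator F = 1 - H_obs / H_exp; the abstract does not say which estimator or software the authors used, and the function name `heterozygosity_stats` and the toy genotypes are hypothetical.

```python
def heterozygosity_stats(genotypes):
    """Return (H_obs, H_exp, F) for one locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    F is the textbook inbreeding coefficient 1 - H_obs / H_exp
    (an assumption; the cited study may have used another estimator).
    """
    n = len(genotypes)
    alleles = [a for pair in genotypes for a in pair]
    # Allele frequencies estimated from the sample itself.
    freqs = {a: alleles.count(a) / len(alleles) for a in set(alleles)}
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Expected heterozygosity under Hardy-Weinberg proportions.
    h_exp = 1.0 - sum(p * p for p in freqs.values())
    f = 1.0 - h_obs / h_exp if h_exp > 0 else 0.0
    return h_obs, h_exp, f


# Toy sample: a deficit of heterozygotes gives a positive F.
sample = [("A", "A")] * 4 + [("A", "G")] * 2 + [("G", "G")] * 4
h_obs, h_exp, f = heterozygosity_stats(sample)
```

    A positive F in one subgroup and a value near zero in another is the kind of contrast that endogamy-driven substructure would be expected to produce.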

  14. Mitogenomes from The 1000 Genome Project reveal new Near Eastern features in present-day Tuscans.

    Directory of Open Access Journals (Sweden)

    Alberto Gómez-Carballa

    Genetic analyses have recently been carried out on present-day Tuscans (Central Italy) in order to investigate their presumable recent Near East ancestry in connection with the long-standing debate on the origins of the Etruscan civilization. We retrieved mitogenomes and genome-wide SNP data from 110 Tuscans analyzed within the context of The 1000 Genome Project. For phylogeographic and evolutionary analysis we made use of a large worldwide database of entire mitogenomes (>26,000) and partial control region sequences (>180,000). Different analyses reveal the presence of typical Near East haplotypes in Tuscans representing isolated members of various mtDNA phylogenetic branches. As a whole, the Near East component in Tuscan mitogenomes can be estimated at about 8%; a proportion that is comparable to previous estimates but significantly lower than admixture estimates obtained from autosomal SNP data (21%). Phylogeographic and evolutionary inter-population comparisons indicate that the main signal of Near Eastern Tuscan mitogenomes comes from Iran. Mitogenomes of recent Near East origin in present-day Tuscans do not show local or regional variation. This points to a demographic scenario that is compatible with a recent arrival of Near Easterners to this region in Italy with no founder events or bottlenecks.

  15. Mitogenomes from The 1000 Genome Project Reveal New Near Eastern Features in Present-Day Tuscans

    Science.gov (United States)

    Pardo-Seco, Jacobo; Amigo, Jorge; Martinón-Torres, Federico

    2015-01-01

    Background: Genetic analyses have recently been carried out on present-day Tuscans (Central Italy) in order to investigate their presumable recent Near East ancestry in connection with the long-standing debate on the origins of the Etruscan civilization. We retrieved mitogenomes and genome-wide SNP data from 110 Tuscans analyzed within the context of The 1000 Genome Project. For phylogeographic and evolutionary analysis we made use of a large worldwide database of entire mitogenomes (>26,000) and partial control region sequences (>180,000). Results: Different analyses reveal the presence of typical Near East haplotypes in Tuscans representing isolated members of various mtDNA phylogenetic branches. As a whole, the Near East component in Tuscan mitogenomes can be estimated at about 8%; a proportion that is comparable to previous estimates but significantly lower than admixture estimates obtained from autosomal SNP data (21%). Phylogeographic and evolutionary inter-population comparisons indicate that the main signal of Near Eastern Tuscan mitogenomes comes from Iran. Conclusions: Mitogenomes of recent Near East origin in present-day Tuscans do not show local or regional variation. This points to a demographic scenario that is compatible with a recent arrival of Near Easterners to this region in Italy with no founder events or bottlenecks. PMID: 25786119

  16. [The Bilbao declaration: international meeting on the law concerning the human genome project].

    Science.gov (United States)

    1994-06-01

    The Bilbao Statement was the result of a working meeting, held the day before the closing session by a group of representative experts made up of general chairmen and meeting organizers. The necessary consensus gave rise to the document that was read and communicated to world public opinion during the closing ceremony on May 26, 1993. Nevertheless, the working group considered the released version provisional and committed to continue reworking the statement. The aim was to complete and improve it, taking the greatest advantage of the meeting's important achievements. The document reproduced next is the definitive, complete version of the Bilbao Statement. The expert group that takes responsibility for this Statement is Jean Dausset, Nobel Prize in Medicine (1980); Carleton Gajdusek, Nobel Prize in Medicine (1976); Santiago Grisolía, president of the UNESCO committee for the Genome Project; Michael Kirby, President of the Court of Appeal of the Supreme Court of New South Wales, Australia; Aaron Klug, member of the Constitutional Council, Paris, France; Rafael Mendizábal, Judge of the Constitutional Court, Madrid, Spain; Juan Bautista Pardo, President of the Superior Court of Justice of the Basque Country; and Carlos María Romeo Casabona, Director of the Chair of Law and Human Genome of the University of Deusto (Bilbao).

  17. Bioethics methods in the ethical, legal, and social implications of the human genome project literature.

    Science.gov (United States)

    Walker, Rebecca L; Morrissey, Clair

    2014-11-01

    While bioethics as a field has concerned itself with methodological issues since the early years, there has been no systematic examination of how ethics is incorporated into research on the Ethical, Legal and Social Implications (ELSI) of the Human Genome Project. Yet ELSI research may bear a particular burden of investigating and substantiating its methods given public funding, an explicitly cross-disciplinary approach, and the perceived significance of adequate responsiveness to advances in genomics. We undertook a qualitative content analysis of a sample of ELSI publications appearing between 2003 and 2008 with the aim of better understanding the methods, aims, and approaches to ethics that ELSI researchers employ. We found that the aims of ethics within ELSI are largely prescriptive and address multiple groups. We also found that the bioethics methods used in the ELSI literature are both diverse between publications and multiple within publications, but are usually not themselves discussed or employed as suggested by bioethics method proponents. Ethics in ELSI is also sometimes undistinguished from related inquiries (such as social, legal, or political investigations).

  18. PREDICTING THE CONSEQUENCES OF SEAWATER INTRUSION AND PROTECTION PROJECTS

    Institute of Scientific and Technical Information of China (English)

    袁益让; 梁栋; 芮洪兴

    2001-01-01

    The simulation of this process and of the effects of protection projects lays the foundation for its effective control and defence. The mathematical model of the problem and an upwind splitting alternating direction method are presented. Using this method, the numerical simulation of seawater intrusion in the Laizhou Bay area of Shandong Province was completed. The numerical results matched the real measurements, so the prediction of the consequences of protection projects is reasonable.

  19. Genome-wide protein localization prediction strategies for gram negative bacteria

    Directory of Open Access Journals (Sweden)

    Romine Margaret F

    2011-06-01

    Background: Genome-wide prediction of protein subcellular localization is an important type of evidence used for inferring protein function. While a variety of computational tools have been developed for this purpose, errors in the gene models and use of protein sorting signals that are not recognized by the more commonly accepted tools can diminish the accuracy of their output. Results: As part of an effort to manually curate the annotations of 19 strains of Shewanella, numerous insights were gained regarding the use of computational tools and proteomics data to predict protein localization. Identification of the suite of secretion systems present in each strain at the start of the process made it possible to tailor-fit the subsequent localization prediction strategies to each strain for improved accuracy. Comparisons of the computational predictions among orthologous proteins revealed inconsistencies in the computational outputs, which could often be resolved by adjusting the gene models or ortholog group memberships. While proteomic data was useful for verifying start site predictions and post-translational proteolytic cleavage, care was needed to distinguish cellular versus sample processing-mediated cleavage events. Searches for lipoprotein signal peptides revealed that neither TatP nor LipoP is designed for identification of lipoprotein substrates of the twin arginine translocation system and that the +2 rule for lipoprotein sorting does not apply to this genus. Analysis of the relationships between domain occurrence and protein localization prediction enabled identification of numerous location-informative domains which could then be used to refine or increase confidence in location predictions. This collective knowledge was used to develop a general strategy for predicting protein localization that could be adapted to other organisms. Conclusion: Improved localization prediction accuracy is not simply a matter of developing better

  20. Predicting relatedness of bacterial genomes using the chaperonin-60 universal target (cpn60 UT): application to Thermoanaerobacter species.

    Science.gov (United States)

    Verbeke, Tobin J; Sparling, Richard; Hill, Janet E; Links, Matthew G; Levin, David; Dumonceaux, Tim J

    2011-05-01

    D.R. Zeigler determined that the sequence identity of bacterial genomes can be predicted accurately using the sequence identities of a corresponding set of genes that meet certain criteria [32]. This three-gene model for comparing bacterial genome pairs requires the determination of the sequence identities for recN, thdF, and rpoA. This involves the generation of approximately 4.2kb of genomic DNA sequence from each organism to be compared, and also normally requires that oligonucleotide primers be designed for amplification and sequencing based on the sequences of closely related organisms. However, we have developed an analogous mathematical model for predicting the sequence identity of whole genomes based on the sequence identity of the 542-567 base pair chaperonin-60 universal target (cpn60 UT). The cpn60 UT is accessible in nearly all bacterial genomes with a single set of universal primers, and its length is such that it can be completely sequenced in one pair of overlapping sequencing reads via di-deoxy sequencing. These mathematical models were applied to a set of Thermoanaerobacter isolates from a wood chip compost pile and it was shown that both the one-gene cpn60 UT-based model and the three-gene model based on recN, rpoA, and thdF predicted that these isolates could be classified as Thermoanaerobacter thermohydrosulfuricus. Furthermore, it was found that the genomic prediction model using cpn60 UT gave similar results to whole-genome sequence alignments over a broad range of taxa, suggesting that this method may have general utility for screening isolates and predicting their taxonomic affiliations.

  1. Dispositional optimism and perceived risk interact to predict intentions to learn genome sequencing results.

    Science.gov (United States)

    Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Biesecker, Leslie G; Biesecker, Barbara B

    2015-07-01

    Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. (c) 2015 APA, all rights reserved.

  2. Comparison of whole genome prediction accuracy across generations using parametric and semi parametric methods

    Directory of Open Access Journals (Sweden)

    Abbas Atefi

    2016-11-01

    Accuracy of genomic prediction was compared using three parametric and semi-parametric methods, BayesA, Bayesian LASSO and reproducing kernel Hilbert spaces regression, under various levels of heritability (0.15, 0.3 and 0.45), different numbers of markers (500, 750 and 1000) and different generation intervals between the training and validation sets. A historical population of 1000 individuals with equal sex ratio was simulated for 100 generations at constant size. It was followed by 100 extra generations of gradually reducing size down to 500 individuals in generation 200. Individuals of generation 200 were mated randomly for 10 more generations, applying a litter size of 5, to expand the historical generation. Finally, 50 males and 500 females chosen from generation 210 were randomly mated to generate 10 more generations of the recent population. Individuals born in generation 211 were considered the training set, while the validation set was composed of individuals from generation 213, 215 or 217. The genome comprised one chromosome of 100 cM length carrying 50 QTLs. There was no significant difference between the accuracies of the investigated methods (p > 0.05), but among the three methods the highest mean accuracy (0.659) was observed for BayesA. By increasing the heritability, the average genomic accuracy increased from 0.53 to 0.75 (p < 0.05). Accuracy increased with the number of SNPs, so the highest accuracy was obtained with 1000 SNPs. As the validation generation moved further from the training set, the accuracies decreased, and the most severe decay was observed at low heritability. The decrease in accuracy across generations was affected by marker density but was independent of the investigated methods.

  3. Genomic selection prediction accuracy in a perennial crop: case study of oil palm (Elaeis guineensis Jacq.).

    Science.gov (United States)

    Cros, David; Denis, Marie; Sánchez, Leopoldo; Cochard, Benoit; Flori, Albert; Durand-Gasselin, Tristan; Nouy, Bruno; Omoré, Alphonse; Pomiès, Virginie; Riou, Virginie; Suryana, Edyana; Bouvet, Jean-Marc

    2015-03-01

    Genomic selection empirically appeared valuable for reciprocal recurrent selection in oil palm as it could account for family effects and Mendelian sampling terms, despite small populations and low marker density. Genomic selection (GS) can increase the genetic gain in plants. In perennial crops, this is expected mainly through shortened breeding cycles and increased selection intensity, which requires sufficient GS accuracy in selection candidates, despite often small training populations. Our objective was to obtain the first empirical estimate of GS accuracy in oil palm (Elaeis guineensis), the major world oil crop. We used two parental populations involved in conventional reciprocal recurrent selection (Deli and Group B) with 131 individuals each, genotyped with 265 SSR. We estimated within-population GS accuracies when predicting breeding values of non-progeny-tested individuals for eight yield traits. We used three methods to sample training sets and five statistical methods to estimate genomic breeding values. The results showed that GS could account for family effects and Mendelian sampling terms in Group B but only for family effects in Deli. Presumably, this difference between populations originated from their contrasting breeding history. The GS accuracy ranged from -0.41 to 0.94 and was positively correlated with the relationship between training and test sets. Training sets optimized with the so-called CDmean criterion gave the highest accuracies, ranging from 0.49 (pulp to fruit ratio in Group B) to 0.94 (fruit weight in Group B). The statistical methods did not affect the accuracy. Finally, Group B could be preselected for progeny tests by applying GS to key yield traits, therefore increasing the selection intensity. Our results should be valuable for breeding programs with small populations, long breeding cycles, or reduced effective size.

  4. Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers

    Directory of Open Access Journals (Sweden)

    Litonjua Augusto A

    2011-06-01

    Background: Personalized health-care promises tailored health-care solutions to individual patients based on their genetic background and/or environmental exposure history. To date, disease prediction has been based on a few environmental factors and/or single nucleotide polymorphisms (SNPs), while complex diseases are usually affected by many genetic and environmental factors with each factor contributing a small portion to the outcome. We hypothesized that the use of random forests classifiers to select SNPs would result in an improved predictive model of asthma exacerbations. We tested this hypothesis in a population of childhood asthmatics. Methods: In this study, using emergency room visits or hospitalizations as the definition of a severe asthma exacerbation, we first identified a list of top Genome Wide Association Study (GWAS) SNPs ranked by Random Forests (RF) importance score for the CAMP (Childhood Asthma Management Program) population of 127 exacerbation cases and 290 non-exacerbation controls. We predict severe asthma exacerbations using the top 10 to 320 SNPs together with age, sex, pre-bronchodilator FEV1 percentage predicted, and treatment group. Results: Testing in an independent set of the CAMP population shows that severe asthma exacerbations can be predicted with an Area Under the Curve (AUC) = 0.66 with 160-320 SNPs in comparison to an AUC score of 0.57 with 10 SNPs. Using the clinical traits alone yielded an AUC score of 0.54, suggesting the phenotype is affected by genetic as well as environmental factors. Conclusions: Our study shows that a random forests algorithm can effectively extract and use the information contained in a small number of samples. Random forests, and other machine learning tools, can be used with GWAS studies to integrate large numbers of predictors simultaneously.
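    The AUC values reported above (0.54, 0.57, 0.66) can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes that quantity directly from pairwise comparisons. It is an illustration only: the function name `auc_from_scores` and the toy scores are hypothetical, and the abstract does not describe how the authors computed AUC.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise case/control comparisons.

    scores: predicted risk per subject (higher = more likely a case).
    labels: 1 for exacerbation case, 0 for control.
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: 2 cases, 2 controls; 3 of the 4 pairs are ranked correctly.
toy_auc = auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

    An AUC of 0.5 corresponds to random ranking, which puts the reported gain from 0.54 (clinical traits alone) to 0.66 (160-320 SNPs) in context.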

  5. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds.

    Science.gov (United States)

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

    2017-08-10

    A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction model (GBLUP) to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model using a single random effect assumes that all genomic variants contribute to the genomic relationship equally, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than milk production, and several biologically meaningful GO terms improved the prediction accuracy with GFBLUP for the four traits, as compared with GBLUP. The improvement of the genomic prediction between breeds (the average increase across the four traits was 0.161) was more apparent than within HOL (the average increase across the four traits was 0.020). Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of

  6. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  7. 139 Clinically Applicable and Biologically Validated MRI Radiomic Test Method Predicts Glioblastoma Genomic Landscape and Survival.

    Science.gov (United States)

    Zinn, Pascal O; Singh, Sanjay K; Kotrotsou, Aikaterini; Zandi, Faramak; Thomas, Ginu; Hatami, Masumeh; Luedi, Markus M; Elakkad, Ahmed; Hassan, Islam; Gumin, Joy; Sulman, Erik P; Lang, Frederick F; Colen, Rivka R

    2016-08-01

    Imaging is the modality of choice for noninvasive characterization of biological tissue and organ systems; imaging serves as early diagnostic tool for most disease processes and is rapidly evolving, thus transforming the way we diagnose and follow patients over time. A vast number of cancer imaging characteristics have been correlated to underlying genomics; however, none have established causality. Therefore, our objectives were to test if there is a causal relationship between imaging and genomic information; and to develop a clinically relevant radiomic pipeline for glioblastoma molecular characterization. Functional validation was performed using a prototypic in vivo RNA-interference-based orthotopic xenograft mouse model. The automated pipeline collects 4800 MRI-derived texture features per tumor. Using univariate feature selection and boosted tree predictive modeling, a patient-specific genomic probability map was derived and patient survival predicted (The Cancer Genome Atlas/MD Anderson data sets). Data demonstrated a significant xenograft to human association (area under the curve [AUC] 84%, P applicable analytical imaging method termed Radiome Sequencing to allow for automated image analysis, prediction of key genomic events, and survival. This method is scalable and applicable to any type of medical imaging. Further, it allows for human-mouse matched coclinical trials, in-depth end point analysis, and upfront noninvasive high-resolution radiomics-based diagnostic, prognostic, and predictive biomarker development.

  8. In silico prediction and screening of modular crystal structures via a high-throughput genomic approach

    Science.gov (United States)

    Li, Yi; Li, Xu; Liu, Jiancong; Duan, Fangzheng; Yu, Jihong

    2015-09-01

    High-throughput computational methods capable of predicting, evaluating and identifying promising synthetic candidates with desired properties are highly appealing to today's scientists. Despite some successes, in silico design of crystalline materials with complex three-dimensionally extended structures remains challenging. Here we demonstrate the application of a new genomic approach to ABC-6 zeolites, a family of industrially important catalysts whose structures are built from the stacking of modular six-ring layers. The sequences of layer stacking, which we deem the genes of this family, determine the structures and the properties of ABC-6 zeolites. By enumerating these gene-like stacking sequences, we have identified 1,127 most realizable new ABC-6 structures out of 78 groups of 84,292 theoretical ones, and experimentally realized 2 of them. Our genomic approach can extract crucial structural information directly from these gene-like stacking sequences, enabling high-throughput identification of synthetic targets with desired properties among a large number of candidate structures.

  9. DNAskew: Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences

    Institute of Scientific and Technical Information of China (English)

    Xiang-Ru MA; Shao-Bo XIAO; Ai-Zhen GUO; Jian-Qiang LUE; Huan-Chun CHEN

    2004-01-01

    Sueoka and Lobry each stated that, in the absence of bias between the two DNA strands for mutation and selection, the base composition within each strand should be A=T and C=G (this state is called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses show asymmetries in base composition and gene direction. To determine the relationship of base composition skews with replication orientation, gene function, codon usage biases and phylogenetic evolution, a program called DNAskew was developed for the statistical analysis of strand asymmetry and codon composition bias in DNA sequences. In addition, the program can be used to predict the replication boundaries of genome sequences. The method builds on the fact that there are compositional asymmetries between the leading and the lagging strand for replication. DNAskew was written in Perl and implemented on the Linux operating system. It works quickly with annotated or unannotated sequences in GBFF (GenBank flatfile) or FASTA format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.
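    The idea of locating replication boundaries from strand asymmetry can be sketched with a cumulative GC skew, (G - C) / (G + C) summed over consecutive windows. This is not the DNAskew implementation (which is written in Perl and whose exact statistics are not given in the abstract); it is a minimal Python illustration, and the function names, the window size and the heuristic "minimum near the origin, maximum near the terminus" (commonly observed in bacterial chromosomes) are assumptions.

```python
def cumulative_gc_skew(seq, window=1000):
    """Cumulative GC skew over non-overlapping windows.

    For each window the skew is (G - C) / (G + C); the running sum is
    returned as a list with one value per window.
    """
    seq = seq.upper()
    running = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        running += skew
        curve.append(running)
    return curve


def predict_boundaries(seq, window=1000):
    """Return (origin_estimate, terminus_estimate) as sequence positions.

    Assumed heuristic: the cumulative skew reaches its minimum near the
    replication origin and its maximum near the terminus. Positions are
    the end coordinates of the extreme windows.
    """
    curve = cumulative_gc_skew(seq, window)
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    end = lambda i: min((i + 1) * window, len(seq))
    return end(i_min), end(i_max)


# Synthetic sequence: C-rich first half, G-rich second half.
toy = "C" * 3000 + "G" * 3000
toy_curve = cumulative_gc_skew(toy)
toy_bounds = predict_boundaries(toy)
```

    On real genomes the window size matters: small windows give noisy curves, while large windows blur the position of the extremes.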

  10. Genome-wide Transcription Factor Gene Prediction and their Expressional Tissue-Specificities in Maize

    Institute of Scientific and Technical Information of China (English)

    Yi Jiang; Biao Zeng; Hainan Zhao; Mei Zhang; Shaojun Xie; Jinsheng Lai

    2012-01-01

    Transcription factors (TFs) are important regulators of gene expression. To better understand TF-encoding genes in maize (Zea mays L.), a genome-wide TF prediction was performed using the updated B73 reference genome. A total of 2,298 TF genes were identified, which can be classified into 56 families. The largest family, known as the MYB superfamily, comprises 322 MYB and MYB-related TF genes. The expression patterns of 2,014 (87.64%) TF genes were examined using RNA-seq data, which resulted in the identification of a subset of TFs that are specifically expressed in particular tissues (including root, shoot, leaf, ear, tassel and kernel). Similarly, 98 kernel-specific TF genes were further analyzed, and it was observed that 29 of the kernel-specific genes were preferentially expressed in the early kernel developmental stage, while 69 of the genes were expressed in the late kernel developmental stage. Identification of these TFs, particularly the tissue-specific ones, provides important information for the understanding of development and transcriptional regulation of maize.

  11. Performance of genotype imputations using data from the 1000 Genomes Project.

    Science.gov (United States)

    Sung, Yun Ju; Wang, Lihua; Rankinen, Tuomo; Bouchard, Claude; Rao, D C

    2012-01-01

    Genotype imputations based on 1000 Genomes (1KG) Project data have the advantage of imputing many more SNPs than imputations based on HapMap data. It also provides an opportunity to discover associations with relatively rare variants. Recent investigations are increasingly using 1KG data for genotype imputations, but only limited evaluations of the performance of this approach are available. In this paper, we empirically evaluated imputation performance using 1KG data by comparing imputation results to those using the HapMap Phase II data that have been widely used. We used three reference panels: the CEU panel consisting of 120 haplotypes from HapMap II and 1KG data (June 2010 release) and the EUR panel consisting of 566 haplotypes also from 1KG data (August 2010 release). We used Illumina 324,607 autosomal SNPs genotyped in 501 individuals of European ancestry. Our most important finding was that both 1KG reference panels provided much higher imputation yield than the HapMap II panel. There were more than twice as many successfully imputed SNPs as there were using the HapMap II panel (6.7 million vs. 2.5 million). Our second most important finding was that accuracy using both 1KG panels was high and almost identical to accuracy using the HapMap II panel. Furthermore, after removing SNPs with MACH Rsq Project is still underway, we expect that later versions will provide even better imputation performance.

  12. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

    DEFF Research Database (Denmark)

    Birney, Ewan; Stamatoyannopoulos, John A; Dutta, Anindya

    2007-01-01

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses...

  13. [Results of work on the project "Instruments, reagents, probes" of the state scientific-technical program "Human genome" (1989-1994)].

    Science.gov (United States)

    Tverdokhlebov, E N

    1995-01-01

    This report reviews the activities of the "Reagents, Devices, Probes" branch of the Russian State "Human Genome" Project for the six-year period (1989-1994). Data on pilot and commercial production of reagents and equipment for human genome studies, along with information on the project costs and awarded grants, are presented.

  14. Investigating Effort Prediction of Software Projects on the ISBSG Dataset

    Directory of Open Access Journals (Sweden)

    Sanaa Elyassami

    2012-04-01

    Full Text Available Many cost estimation models have been proposed over the last three decades. In this study, we investigate the fuzzy ID3 decision tree as a method for software effort estimation. The fuzzy ID3 software effort estimation model is designed by incorporating the principles of the ID3 decision tree and the concepts of fuzzy set theory, permitting the model to handle uncertain and imprecise data when presenting the software projects. MMRE (Mean Magnitude of Relative Error) and Pred(l) (Prediction at level l) are used as measures of prediction accuracy in this study. A series of experiments is reported using the ISBSG software projects dataset. Fuzzy trees are grown using different fuzziness control thresholds. Results showed that optimizing the fuzzy ID3 parameters can greatly improve the accuracy of the generated software cost estimate.
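
    The two accuracy measures named in this abstract, MMRE and Pred(l), have standard definitions in the effort-estimation literature, and a minimal sketch of them may help when reading the result. The function names and the effort values below are hypothetical illustrations, not taken from the paper or from the ISBSG dataset.

```python
def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pred(actual, predicted, level=0.25):
    """Pred(l): fraction of projects whose relative error is at most `level`."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)


# Hypothetical effort values in person-hours (illustrative only).
actual_effort = [100.0, 200.0, 400.0, 800.0]
estimated_effort = [110.0, 150.0, 400.0, 1200.0]

print("MMRE:", mmre(actual_effort, estimated_effort))
print("Pred(0.25):", pred(actual_effort, estimated_effort, level=0.25))
```

    A lower MMRE and a higher Pred(l) both indicate better estimates; the level l = 0.25 used here is a common convention and is an assumption, since the abstract does not state which level was used.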

  15. Genome-wide association and genomic prediction of breeding values for fatty acid composition in subcutaneous adipose and longissimus lumborum muscle of beef cattle.

    Science.gov (United States)

    Chen, Liuhong; Ekine-Dzivenu, Chinyere; Vinsky, Michael; Basarab, John; Aalhus, Jennifer; Dugan, Mike E R; Fitzsimmons, Carolyn; Stothard, Paul; Li, Changxi

    2015-11-21

    Identification of genetic variants that are associated with fatty acid composition in beef will enhance our understanding of host genetic influence on the trait and also allow for more effective improvement of beef fatty acid profiles through genomic selection and marker-assisted diet management. In this study, 81 and 83 fatty acid traits were measured in subcutaneous adipose (SQ) and longissimus lumborum muscle (LL), respectively, from 1366 purebred and crossbred beef steers and heifers that were genotyped on the Illumina BovineSNP50 Beadchip. The objective was to conduct genome-wide association studies (GWAS) for the fatty acid traits and to evaluate the accuracy of genomic prediction for fatty acid composition using genomic best linear unbiased prediction (GBLUP) and Bayesian methods. In total, 302 and 360 significant SNPs spanning all autosomal chromosomes were identified to be associated with fatty acid composition in SQ and LL tissues, respectively. Proportions of total genetic variance explained by individual significant SNPs ranged from 0.03 to 11.06% in SQ, and from 0.005 to 24.28% in the LL muscle. Markers with relatively large effects were located near fatty acid synthase (FASN), stearoyl-CoA desaturase (SCD), and thyroid hormone responsive (THRSP) genes. For the majority of the fatty acid traits studied, the accuracy of genomic prediction was relatively low ( = 0.50) were achieved for 10:0, 12:0, 14:0, 15:0, 16:0, 9c-14:1, 12c-16:1, 13c-18:1, and health index (HI) in LL, and for 12:0, 14:0, 15:0, 10 t,12c-18:2, and 11 t,13c + 11c,13 t-18:2 in SQ. The Bayesian method performed similarly as GBLUP for most of the traits but substantially better for traits that were affected by SNPs of large effects as identified by GWAS. Fatty acid composition in beef is influenced by a few host genes with major effects and many genes of smaller effects. With the current training population size and marker density, genomic prediction has the potential to predict

  16. Predictive ability of genomic selection models for breeding value estimation on growth traits of Pacific white shrimp Litopenaeus vannamei

    Science.gov (United States)

    Wang, Quanchao; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai

    2017-09-01

    Genomic selection (GS) can be used to accelerate genetic improvement by shortening the selection interval. The successful application of GS depends largely on the accuracy of the prediction of genomic estimated breeding value (GEBV). This study is a first attempt to understand the practicality of GS in Litopenaeus vannamei and aims to evaluate models for GS on growth traits. The performance of GS models in L. vannamei was evaluated in a population consisting of 205 individuals, which were genotyped for 6 359 single nucleotide polymorphism (SNP) markers by specific length amplified fragment sequencing (SLAF-seq) and phenotyped for body length and body weight. Three GS models (RR-BLUP, BayesA, and Bayesian LASSO) were used to obtain the GEBV, and their predictive ability was assessed by the reliability of the GEBV and the bias of the predicted phenotypes. The mean reliability of the GEBVs for body length and body weight predicted by the different models was 0.296 and 0.411, respectively. For each trait, the performances of the three models were very similar to each other with respect to predictability. The regression coefficients estimated by the three models were close to one, suggesting near to zero bias for the predictions. Therefore, when GS was applied in a L. vannamei population for the studied scenarios, all three models appeared practicable. Further analyses suggested that improved estimation of the genomic prediction could be realized by increasing the size of the training population as well as the density of SNPs.
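
    The bias check described in this abstract relies on a regression coefficient being close to one. A minimal sketch of that idea follows, assuming the coefficient is the ordinary least-squares slope of observed phenotypes regressed on GEBV; the abstract does not spell out the exact regression, and the function name and data values are hypothetical.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length inputs with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    return sxy / sxx


# Hypothetical GEBVs and observed phenotypes (illustrative only).
gebv = [-1.0, 0.0, 1.0, 2.0]
phenotype = [9.0, 10.0, 11.0, 12.0]

# A slope near 1 suggests the predictions are neither inflated nor deflated.
print("slope:", regression_slope(gebv, phenotype))

# Predictions with twice the spread give a slope near 0.5 (inflated GEBVs).
inflated_gebv = [2.0 * g for g in gebv]
print("slope (inflated):", regression_slope(inflated_gebv, phenotype))
```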

  17. Predictive ability of genomic selection models for breeding value estimation on growth traits of Pacific white shrimp Litopenaeus vannamei

    Science.gov (United States)

    Wang, Quanchao; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai

    2016-10-01

    Genomic selection (GS) can be used to accelerate genetic improvement by shortening the selection interval. The successful application of GS depends largely on the accuracy of the prediction of genomic estimated breeding value (GEBV). This study is a first attempt to understand the practicality of GS in Litopenaeus vannamei and aims to evaluate models for GS on growth traits. The performance of GS models in L. vannamei was evaluated in a population consisting of 205 individuals, which were genotyped for 6 359 single nucleotide polymorphism (SNP) markers by specific length amplified fragment sequencing (SLAF-seq) and phenotyped for body length and body weight. Three GS models (RR-BLUP, BayesA, and Bayesian LASSO) were used to obtain the GEBV, and their predictive ability was assessed by the reliability of the GEBV and the bias of the predicted phenotypes. The mean reliability of the GEBVs for body length and body weight predicted by the different models was 0.296 and 0.411, respectively. For each trait, the performances of the three models were very similar to each other with respect to predictability. The regression coefficients estimated by the three models were close to one, suggesting near to zero bias for the predictions. Therefore, when GS was applied in a L. vannamei population for the studied scenarios, all three models appeared practicable. Further analyses suggested that improved estimation of the genomic prediction could be realized by increasing the size of the training population as well as the density of SNPs.

  18. An Overview of the MATERHORN Fog Project: Observations and Predictability

    Science.gov (United States)

    Gultepe, I.; Fernando, H. J. S.; Pardyjak, E. R.; Hoch, S. W.; Silver, Z.; Creegan, E.; Leo, L. S.; Pu, Zhaoxia; De Wekker, S. F. J.; Hang, Chaoxun

    2016-09-01

    ... Temperature profiles suggested that an inversion layer contributed significantly to ice fog (IF) formation at Heber. Ice fog forecasts via the Weather Research and Forecasting (WRF) model indicated the limitations of IF predictability. Results suggest that IF predictions need to be improved based on ice microphysical parameterizations and ice nucleation processes.

  19. Comparative genomic analysis of evolutionarily conserved but functionally uncharacterized membrane proteins in archaea: Prediction of novel components of secretion, membrane remodeling and glycosylation systems.

    Science.gov (United States)

    Makarova, Kira S; Galperin, Michael Y; Koonin, Eugene V

    2015-11-01

    A systematic comparative genomic analysis of all archaeal membrane proteins that have been projected to the last archaeal common ancestor gene set led to the identification of several novel components of predicted secretion, membrane remodeling, and protein glycosylation systems. Among other findings, most crenarchaea have been shown to encode highly diverged orthologs of the membrane insertase YidC, which is nearly universal in bacteria, eukaryotes, and euryarchaea. We also identified a vast family of archaeal proteins, including the C-terminal domain of N-glycosylation protein AglD, as membrane flippases homologous to the flippase domain of bacterial multipeptide resistance factor MprF, a bifunctional lysylphosphatidylglycerol synthase and flippase. Additionally, several proteins were predicted to function as membrane transporters. The results of this work, combined with our previous analyses, reveal an unexpected diversity of putative archaeal membrane-associated functional systems that remain to be functionally characterized. A more general conclusion from this work is that the currently available collection of archaeal (and bacterial) genomes could be sufficient to identify (almost) all widespread functional modules and develop experimentally testable predictions of their functions.

  20. Integration of Multiple Genomic Data Sources in a Bayesian Cox Model for Variable Selection and Prediction.

    Science.gov (United States)

    Treppmann, Tabea; Ickstadt, Katja; Zucknick, Manuela

    2017-01-01

    Bayesian variable selection becomes more and more important in statistical analyses, in particular when performing variable selection in high dimensions. For survival time models and in the presence of genomic data, the state of the art is still quite unexploited. One of the more recent approaches suggests a Bayesian semiparametric proportional hazards model for right censored time-to-event data. We extend this model to directly include variable selection, based on a stochastic search procedure within a Markov chain Monte Carlo sampler for inference. This equips us with an intuitive and flexible approach and provides a way for integrating additional data sources and further extensions. We make use of the possibility of implementing parallel tempering to help improve the mixing of the Markov chains. In our examples, we use this Bayesian approach to integrate copy number variation data into a gene-expression-based survival prediction model. This is achieved by formulating an informed prior based on copy number variation. We perform a simulation study to investigate the model's behavior and prediction performance in different situations before applying it to a dataset of glioblastoma patients and evaluating the biological relevance of the findings.

  1. Genomic Prostate Cancer Classifier Predicts Biochemical Failure and Metastases in Patients After Postoperative Radiation Therapy

    Energy Technology Data Exchange (ETDEWEB)

    Den, Robert B., E-mail: Robert.Den@jeffersonhospital.org [Kimmel Cancer Center, Jefferson Medical College of Thomas Jefferson University, Philadelphia, Pennsylvania (United States)]; Feng, Felix Y. [University of Michigan, Michigan Union, Michigan (United States)]; Showalter, Timothy N. [University of Virginia School of Medicine, Charlottesville, Virginia (United States)]; Mishra, Mark V. [University of Maryland Medical Center, Baltimore, Maryland (United States)]; Trabulsi, Edouard J.; Lallas, Costas D.; Gomella, Leonard G.; Kelly, W. Kevin; Birbe, Ruth C.; McCue, Peter A. [Kimmel Cancer Center, Jefferson Medical College of Thomas Jefferson University, Philadelphia, Pennsylvania (United States)]; Ghadessi, Mercedeh; Yousefi, Kasra; Davicioni, Elai [GenomeDx Biosciences Inc., Vancouver, British Columbia (Canada)]; Knudsen, Karen E.; Dicker, Adam P. [Kimmel Cancer Center, Jefferson Medical College of Thomas Jefferson University, Philadelphia, Pennsylvania (United States)]

    2014-08-01

    Purpose: To test the hypothesis that a genomic classifier (GC) would predict biochemical failure (BF) and distant metastasis (DM) in men receiving radiation therapy (RT) after radical prostatectomy (RP). Methods and Materials: Among patients who underwent post-RP RT, 139 were identified for pT3 or positive margin, who did not receive neoadjuvant hormones and had paraffin-embedded specimens. Ribonucleic acid was extracted from the highest Gleason grade focus and applied to a high-density-oligonucleotide microarray. Receiver operating characteristic, calibration, cumulative incidence, and Cox regression analyses were performed to assess GC performance for predicting BF and DM after post-RP RT in comparison with clinical nomograms. Results: The area under the receiver operating characteristic curve of the Stephenson model was 0.70 for both BF and DM, with addition of GC significantly improving area under the receiver operating characteristic curve to 0.78 and 0.80, respectively. Stratified by GC risk groups, 8-year cumulative incidence was 21%, 48%, and 81% for BF (P<.0001) and for DM was 0, 12%, and 17% (P=.032) for low, intermediate, and high GC, respectively. In multivariable analysis, patients with high GC had a hazard ratio of 8.1 and 14.3 for BF and DM. In patients with intermediate or high GC, those irradiated with undetectable prostate-specific antigen (PSA ≤0.2 ng/mL) had median BF survival of >8 years, compared with <4 years for patients with detectable PSA (>0.2 ng/mL) before initiation of RT. At 8 years, the DM cumulative incidence for patients with high GC and RT with undetectable PSA was 3%, compared with 23% with detectable PSA (P=.03). No outcome differences were observed for low GC between the treatment groups. Conclusion: The GC predicted BF and metastasis after post-RP irradiation. Patients with lower GC risk may benefit from delayed RT, as opposed to those with higher GC; however, this needs prospective validation. Genomic-based models

  2. Neutral Theory Predicts the Relative Abundance and Diversity of Genetic Elements in a Broad Array of Eukaryotic Genomes

    Science.gov (United States)

    Serra, François; Becher, Verónica; Dopazo, Hernán

    2013-01-01

    It is universally true in ecological communities, terrestrial or aquatic, temperate or tropical, that some species are very abundant, others are moderately common, and the majority are rare. Likewise, eukaryotic genomes also contain classes or “species” of genetic elements that vary greatly in abundance: DNA transposons, retrotransposons, satellite sequences, simple repeats and their less abundant functional sequences such as RNA or genes. Are the patterns of relative species abundance and diversity similar among ecological communities and genomes? Previous dynamical models of genomic diversity have focused on the selective forces shaping the abundance and diversity of transposable elements (TEs). However, ideally, models of genome dynamics should consider not only TEs, but also the diversity of all genetic classes or “species” populating eukaryotic genomes. Here, in an analysis of the diversity and abundance of genetic elements in >500 eukaryotic chromosomes, we show that the patterns are consistent with a neutral hypothesis of genome assembly in virtually all chromosomes tested. The distributions of relative abundance of genetic elements are quite precisely predicted by the dynamics of an ecological model for which the principle of functional equivalence is the main assumption. We hypothesize that at large temporal scales an overarching neutral or nearly neutral process governs the evolution of abundance and diversity of genetic elements in eukaryotic genomes. PMID:23798991

  3. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights.

    Science.gov (United States)

    Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

    2015-01-01

    With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.

  4. Quantitative genetics theory for genomic selection and efficiency of breeding value prediction in open-pollinated populations

    Directory of Open Access Journals (Sweden)

    José Marcelo Soriano Viana

    2016-06-01

    Full Text Available To date, the quantitative genetics theory for genomic selection has focused mainly on the relationship between marker and additive variances assuming one marker and one quantitative trait locus (QTL). This study extends the quantitative genetics theory to genomic selection in order to prove that prediction of breeding values based on thousands of single nucleotide polymorphisms (SNPs) depends on linkage disequilibrium (LD) between markers and QTLs, assuming dominance. We also assessed the efficiency of genomic selection in relation to phenotypic selection, assuming mass selection in an open-pollinated population, all QTLs of lower effect, and reduced sample size, based on simulated data. We show that the average effect of a SNP substitution is proportional to the LD measure and to the average effect of a gene substitution for each QTL that is in LD with the marker. Weighted (by SNP frequencies) and unweighted breeding value predictors have the same accuracy. Efficiency of genomic selection in relation to phenotypic selection is inversely proportional to heritability. Accuracy of breeding value prediction is not affected by the dominance degree or the method of analysis; however, it is influenced by LD extent and magnitude of additive variance. The increase in the number of markers asymptotically improved accuracy of breeding value prediction. The decrease in the sample size from 500 to 200 did not considerably reduce the accuracy of breeding value prediction.

  5. Genome-scale prediction of proteins with long intrinsically disordered regions.

    Science.gov (United States)

    Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz

    2014-01-01

    Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster compared to the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs with majority of proteomes having between 25 and 40%, where higher abundance is characteristic to proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Copyright © 2013 Wiley Periodicals, Inc.
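
    The definition of a long disordered region given in this abstract (30 or more consecutive disordered residues) can be checked directly once per-residue disorder labels are available. The sketch below illustrates only that definition; it is not the SLIDER method, which the abstract describes as a logistic regression over sequence-derived features. The function name and the boolean-list input format are assumptions.

```python
def has_long_disordered_region(disorder_flags, min_length=30):
    """Return True if any run of consecutive True flags reaches min_length.

    disorder_flags: iterable of booleans, one per residue (True = disordered).
    min_length: run length that counts as a long disordered region (LDR).
    """
    run = 0
    for flag in disorder_flags:
        run = run + 1 if flag else 0  # extend the current run or reset it
        if run >= min_length:
            return True
    return False


# Hypothetical per-residue labels (illustrative only).
one_long_run = [False] * 10 + [True] * 30 + [False] * 10
two_short_runs = [True] * 29 + [False] + [True] * 29

print(has_long_disordered_region(one_long_run))    # a single run of 30
print(has_long_disordered_region(two_short_runs))  # two runs of 29
```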

  6. Accuracy of whole-genome prediction using a genetic architecture-enhanced variance-covariance matrix.

    Science.gov (United States)

    Zhang, Zhe; Erbe, Malena; He, Jinlong; Ober, Ulrike; Gao, Ning; Zhang, Hao; Simianer, Henner; Li, Jiaqi

    2015-02-09

    Obtaining accurate predictions of unobserved genetic or phenotypic values for complex traits in animal, plant, and human populations is possible through whole-genome prediction (WGP), a combined analysis of genotypic and phenotypic data. Because the underlying genetic architecture of the trait of interest is an important factor affecting model selection, we propose a new strategy, termed BLUP|GA (BLUP-given genetic architecture), which can use genetic architecture information within the dataset at hand rather than from public sources. This is achieved by using a trait-specific covariance matrix (T), which is a weighted sum of a genetic architecture part (S matrix) and the realized relationship matrix (G). The algorithm of BLUP|GA is provided and illustrated with real and simulated datasets. Predictive ability of BLUP|GA was validated with three model traits in a dairy cattle dataset and 11 traits in three public datasets with a variety of genetic architectures and compared with GBLUP and other approaches. Results show that BLUP|GA outperformed GBLUP in 20 of 21 scenarios in the dairy cattle dataset and outperformed GBLUP, BayesA, and BayesB in 12 of 13 traits in the analyzed public datasets. Further analyses showed that the differences in accuracy between BLUP|GA and GBLUP correlate significantly with the distance between the T and G matrices. The new strategy applied in BLUP|GA is a favorable and flexible alternative to the standard GBLUP model, making it possible to account for the genetic architecture of the quantitative trait under consideration when necessary. This feature is mainly due to the increased similarity between the trait-specific relationship matrix (T matrix) and the genetic relationship matrix at unobserved causal loci. Applying BLUP|GA in WGP would ease the burden of model selection. Copyright © 2015 Zhang et al.
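
    The weighted sum that defines the trait-specific matrix can be sketched as an element-wise combination of the two matrices. The abstract states only that T is a weighted sum of S and G; the convex form T = omega * S + (1 - omega) * G, the weight name omega, and the small matrices below are assumptions made for illustration.

```python
def trait_specific_matrix(S, G, omega):
    """Element-wise weighted sum T = omega * S + (1 - omega) * G.

    S, G: square matrices given as lists of rows, with the same shape.
    omega: weight on the genetic architecture part, assumed to lie in [0, 1].
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must be between 0 and 1")
    if len(S) != len(G) or any(len(s) != len(g) for s, g in zip(S, G)):
        raise ValueError("S and G must have the same shape")
    return [
        [omega * s + (1.0 - omega) * g for s, g in zip(s_row, g_row)]
        for s_row, g_row in zip(S, G)
    ]


# Hypothetical 2 x 2 matrices for two individuals (illustrative only).
S = [[1.0, 0.0], [0.0, 1.0]]
G = [[1.0, 0.5], [0.5, 1.0]]

print(trait_specific_matrix(S, G, omega=0.25))
```

    With omega = 0 the sketch returns G unchanged, which corresponds to falling back on the standard realized relationship matrix.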

  7. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    Science.gov (United States)

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  8. Lawrence Livermore National Laboratory- Completing the Human Genome Project and Triggering Nearly $1 Trillion in U.S. Economic Activity

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, Jeffrey S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-07-28

    The success of the Human Genome Project is already approaching $1 trillion in U.S. economic activity. Lawrence Livermore National Laboratory (LLNL) was a co-leader in one of the biggest biological research efforts in history, the Human Genome Project. This ambitious research effort set out to sequence the approximately 3 billion nucleotides in the human genome, an effort many thought was nearly impossible. Deoxyribonucleic acid (DNA) was discovered in 1869, and by 1943 came the discovery that DNA was a molecule that encodes the genetic instructions used in the development and functioning of living organisms and many viruses. To make full use of the information, scientists needed to first sequence the billions of nucleotides to begin linking them to genetic traits and illnesses, and eventually more effective treatments. New medical discoveries and improved agricultural productivity were some of the expected benefits. While the potential benefits were vast, the timeline (over a decade) and cost ($3.8 billion) exceeded what the private sector would normally attempt, especially when this would only be the first phase on the path to new discoveries and market opportunities. The Department of Energy believed its best research laboratories could meet this Grand Challenge and soon convinced the National Institutes of Health to formally propose the Human Genome Project to the federal government. The U.S. government accepted the risk and challenge to potentially create new healthcare and food discoveries that could benefit the world and U.S. industry.

  9. Prediction for supersaturated total dissolved gas in high-dam hydropower projects

    Institute of Scientific and Technical Information of China (English)

    2009-01-01

    The supersaturated total dissolved gas (TDG) generated during high-dam spills may cause gas bubble disease in fish and ultimately endanger their existence. As more and more high-dam hydropower projects have been constructed in China, the environmental assessment of supersaturated TDG is becoming more and more important. It is of great importance for quantitative impact assessment of the supersaturated TDG of high dams and for the construction of ecologically friendly high-dam hydropower projects. Based on a conceptual summarization of the TDG production process, a TDG prediction model for high-dam projects in which ski-jump energy dissipation is adopted is developed in the paper. The model is validated by field data and employed in the TDG prediction of a high-dam hydropower project to be built in southwest China.

  10. Biofilm Formation Mechanisms of Pseudomonas aeruginosa Predicted via Genome-Scale Kinetic Models of Bacterial Metabolism.

    Science.gov (United States)

    Vital-Lopez, Francisco G; Reifman, Jaques; Wallqvist, Anders

    2015-10-01

    A hallmark of Pseudomonas aeruginosa is its ability to establish biofilm-based infections that are difficult to eradicate. Biofilms are less susceptible to host inflammatory and immune responses and have higher antibiotic tolerance than free-living planktonic cells. Developing treatments against biofilms requires an understanding of bacterial biofilm-specific physiological traits. Research efforts have started to elucidate the intricate mechanisms underlying biofilm development. However, many aspects of these mechanisms are still poorly understood. Here, we addressed questions regarding biofilm metabolism using a genome-scale kinetic model of the P. aeruginosa metabolic network and gene expression profiles. Specifically, we computed metabolite concentration differences between known mutants with altered biofilm formation and the wild-type strain to predict drug targets against P. aeruginosa biofilms. We also simulated the altered metabolism driven by gene expression changes between biofilm and stationary growth-phase planktonic cultures. Our analysis suggests that the synthesis of important biofilm-related molecules, such as the quorum-sensing molecule Pseudomonas quinolone signal and the exopolysaccharide Psl, is regulated not only through the expression of genes in their own synthesis pathway, but also through the biofilm-specific expression of genes in pathways competing for precursors to these molecules. Finally, we investigated why mutants defective in anthranilate degradation have an impaired ability to form biofilms. Alternative to a previous hypothesis that this biofilm reduction is caused by a decrease in energy production, we proposed that the dysregulation of the synthesis of secondary metabolites derived from anthranilate and chorismate is what impaired the biofilms of these mutants. Notably, these insights generated through our kinetic model-based approach are not accessible from previous constraint-based model analyses of P. aeruginosa biofilm

  11. Inroads to Predict in Vivo Toxicology—An Introduction to the eTOX Project

    Directory of Open Access Journals (Sweden)

    Jörg D. Wichard

    2012-03-01

    Full Text Available. There is a widespread awareness that the wealth of preclinical toxicity data that the pharmaceutical industry has generated in recent decades is not exploited as efficiently as it could be. Enhanced data availability for compound comparison (“read-across”), or for data mining to build predictive tools, should lead to a more efficient drug development process and contribute to the reduction of animal use (3Rs principle). In order to achieve these goals, a consortium approach, grouping numbers of relevant partners, is required. The eTOX (“electronic toxicity”) consortium represents such a project and is a public-private partnership within the framework of the European Innovative Medicines Initiative (IMI). The project aims at the development of in silico prediction systems for organ and in vivo toxicity. The backbone of the project will be a database consisting of preclinical toxicity data for drug compounds or candidates extracted from previously unpublished, legacy reports from thirteen European and European operation-based pharmaceutical companies. The database will be enhanced by incorporation of publicly available, high-quality toxicology data. Seven academic institutes and five small-to-medium size enterprises (SMEs) contribute with their expertise in data gathering, database curation, data mining, chemoinformatics and predictive systems development. The outcome of the project will be a predictive system contributing to early potential hazard identification and risk assessment during the drug development process. The concept and strategy of the eTOX project are described here, together with current achievements and future deliverables.

  12. Rhipicephalus microplus strain Deutsch, whole genome shotgun sequencing project Version 2

    Science.gov (United States)

    The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and with over 70% of repetitive DNA, this genome would prove very costly to sequence at today's prices and difficult to assemble and analyze. Cot filtration/selection techniques were used ...

  13. Energy Yield prediction of offshore wind farm clusters at the EERA–DTOC European project

    DEFF Research Database (Denmark)

    Cantero, E.; Sanz, J.; Lozano, S.;

    A new integrated design tool for optimization of offshore wind farm clusters is under development in the European Energy Research Alliance – Design Tools for Offshore wind farm Cluster project (EERA DTOC). The project builds on already established design tools from the project partners and possibly...... of uncertainty associated to each step. Methodologies for the assessment of offshore gross annual energy production are analyzed based on the Fino 1 test case. Measured data and virtual data from Numerical Weather Prediction models have been used to calculate long term mean wind speed, vertical wind profile...

  14. ELSI Bibliography: Ethical, legal and social implications of the Human Genome Project. 1994 Supplement

    Energy Technology Data Exchange (ETDEWEB)

    Yesley, M.S.; Ossorio, P.N. [comps.]

    1994-09-01

    This report updates and expands the second edition of the ELSI Bibliography, published in 1993. The Bibliography and Supplement provide a comprehensive resource for identifying publications on the major topics related to the ethical, legal and social issues (ELSI) of the Human Genome Project. The Bibliography and Supplement are extracted from a database compiled at Los Alamos National Laboratory with the support of the Office of Energy Research, US Department of Energy. The second edition of the ELSI Bibliography was dated May 1993 but included publications added to the database until fall 1993. This Supplement reflects approximately 1,000 entries added to the database during the past year, bringing the total to approximately 7,000 entries. More than half of the new entries were published in the last year, and the remainder are earlier publications not previously included in the database. Most of the new entries were published in the academic and professional literature. The remainder are press reports from newspapers of record and scientific journals. The topical listing of the second edition has been followed in the Supplement, with a few changes. The topics of Cystic Fibrosis, Huntington's Disease, and Sickle Cell Anemia have been combined in a single topic, Disorders. Also, all the entries published in the past year are included in a new topic, Publications: September 1993--September 1994, which provides a comprehensive view of recent reporting and commentary on the science and ELSI of genetics.

  15. Familial aggregation of focal seizure semiology in the Epilepsy Phenome/Genome Project.

    Science.gov (United States)

    Tobochnik, Steven; Fahlstrom, Robyn; Shain, Catherine; Winawer, Melodie R

    2017-07-04

    To improve phenotype definition in genetic studies of epilepsy, we assessed the familial aggregation of focal seizure types and of specific seizure symptoms within the focal epilepsies in families from the Epilepsy Phenome/Genome Project. We studied 302 individuals with nonacquired focal epilepsy from 149 families. Familial aggregation was assessed by logistic regression analysis of relatives' traits (dependent variable) by probands' traits (independent variable), estimating the odds ratio for each symptom in a relative given presence vs absence of the symptom in the proband. In families containing multiple individuals with nonacquired focal epilepsy, we found significant evidence for familial aggregation of ictal motor, autonomic, psychic, and aphasic symptoms. Within these categories, ictal whole body posturing, diaphoresis, dyspnea, fear/anxiety, and déjà vu/jamais vu showed significant familial aggregation. Focal seizure type aggregated as well, including complex partial, simple partial, and secondarily generalized tonic-clonic seizures. Our results provide insight into genotype-phenotype correlation in the nonacquired focal epilepsies and a framework for identifying subgroups of patients likely to share susceptibility genes. © 2017 American Academy of Neurology.

  16. Functional divergence in the genus Oenococcus as predicted by genome sequencing of the newly-described species, Oenococcus kitaharae.

    Directory of Open Access Journals (Sweden)

    Anthony R Borneman

    Full Text Available. Oenococcus kitaharae is only the second member of the genus Oenococcus to be identified and is the closest relative of the industrially important wine bacterium Oenococcus oeni. To provide insight into this new species, the genome of the type strain of O. kitaharae, DSM 17330, was sequenced. Comparison of the sequenced genomes of both species shows that the genome of O. kitaharae DSM 17330 contains many genes with predicted functions in cellular defence (bacteriocins, antimicrobials, restriction-modification systems) and a CRISPR locus, which are lacking in O. oeni. The two genomes also appear to differentially encode several metabolic pathways associated with amino acid biosynthesis and carbohydrate utilization, which have direct phenotypic consequences. This would indicate that the two species have evolved different survival techniques to suit their particular environmental niches. O. oeni has adapted to survive in the harsh, but predictable, environment of wine that provides very few competitive species. However, O. kitaharae appears to have adapted to a growth environment in which biological competition provides a significant selective pressure by accumulating biological defence molecules, such as bacteriocins and restriction-modification systems, throughout its genome.

  17. Evaluation of genome-wide power of genetic association studies based on empirical data from the HapMap project.

    Science.gov (United States)

    Nannya, Yasuhito; Taura, Kenjiro; Kurokawa, Mineo; Chiba, Shigeru; Ogawa, Seishi

    2007-10-15

    With recent advances in high-throughput single nucleotide polymorphism (SNP) typing technologies, genome-wide association studies have become a realistic approach to identify the causative genes that are responsible for common diseases of complex genetic traits. In this strategy, a trade-off between the increased genome coverage and a chance of finding SNPs incidentally showing a large statistic becomes serious due to extreme multiple-hypothesis testing. We investigated the extent to which this trade-off limits the genome-wide power with this approach by simulating a large number of case-control panels based on the empirical data from the HapMap Project. In our simulations, statistical costs of multiple hypothesis testing were evaluated by empirically calculating distributions of the maximum value of the χ² statistic for a series of marker sets having increasing numbers of SNPs, which were used to determine a genome-wide threshold in the following power simulations. With a practical study size, the cost of multiple testing largely offsets the potential benefits from increased genome coverage given modest genetic effects and/or low frequencies of causal alleles. In most realistic scenarios, increasing genome coverage becomes less influential on the power, while sample size is the predominant determinant of the feasibility of genome-wide association tests. Increasing genome coverage without a corresponding increase in sample size will only consume resources with little gain in power. For common causal alleles with relatively large effect sizes [genotype relative risk ≥1.7], we can expect satisfactory power with currently available large-scale genotyping platforms using a realistic sample size (approximately 1000 per arm).
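
    The idea of setting a genome-wide threshold from the empirical distribution of the maximum χ² statistic can be illustrated with a small simulation. This is a minimal sketch and not the authors' procedure: it assumes independent markers under the null hypothesis (the study used HapMap-based panels, which carry linkage disequilibrium), and the function name and parameters are hypothetical.

```python
import random


def max_chi2_threshold(n_markers, n_sims=2000, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum 1-df chi-square statistic.

    Assumption: markers are independent and every test is null, so each
    statistic is the square of a standard normal draw.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sims):
        # Largest chi-square statistic seen across all markers in one null panel.
        maxima.append(max(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_markers)))
    maxima.sort()
    index = min(n_sims - 1, int((1.0 - alpha) * n_sims))
    return maxima[index]


if __name__ == "__main__":
    for m in (1, 10, 100):
        print(m, round(max_chi2_threshold(m), 2))
```

    Under these assumptions the threshold rises as the marker set grows, which is the statistical cost of multiple testing that the abstract weighs against the benefit of wider genome coverage.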

  18. Genetic parameters for predicted methane production and potential for reducing enteric emissions through genomic selection.

    Science.gov (United States)

    Haas, Y de; Windig, J J; Calus, M P L; Dijkstra, J; Haan, M de; Bannink, A; Veerkamp, R F

    2011-12-01

    Mitigation of enteric methane (CH₄) emission in ruminants has become an important area of research because accumulation of CH₄ is linked to global warming. Nutritional and microbial opportunities to reduce CH₄ emissions have been extensively researched, but little is known about using natural variation to breed animals with lower CH₄ yield. Measuring CH₄ emission rates directly from animals is difficult and hinders direct selection on reduced CH₄ emission. However, improvements can be made through selection on associated traits (e.g., residual feed intake, RFI) or through selection on CH₄ predicted from feed intake and diet composition. The objective was to establish phenotypic and genetic variation in predicted CH₄ output, and to determine the potential of genetics to reduce methane emissions in dairy cattle. Experimental data were used and records on daily feed intake, weekly body weights, and weekly milk production were available from 548 heifers. Residual feed intake (MJ/d) is the difference between net energy intake and calculated net energy requirements for maintenance as a function of body weight and for fat- and protein-corrected milk production. Predicted methane emission (PME; g/d) is 6% of gross energy intake (Intergovernmental Panel on Climate Change methodology) corrected for energy content of methane (55.65 kJ/g). The estimated heritabilities for PME and RFI were 0.35 and 0.40, respectively. The positive genetic correlation between RFI and PME indicated that cows with lower RFI have lower PME (estimates ranging from 0.18 to 0.84). Hence, it is possible to decrease the methane production of a cow by selecting more-efficient cows, and the genetic variation suggests that reductions in the order of 11 to 26% in 10 yr are theoretically possible, and could be even higher in a genomic selection program. However, several uncertainties are discussed; for example, the lack of true methane measurements (and the key assumption that methane
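
    The predicted methane emission defined in this abstract (6% of gross energy intake, divided by the energy content of methane, 55.65 kJ/g) can be written as a short calculation. This is a minimal sketch; the function name, the choice of MJ/d as the input unit, and the example intake of 300 MJ/d are assumptions made for illustration and do not come from the study.

```python
METHANE_ENERGY_KJ_PER_G = 55.65   # energy content of methane, from the abstract
METHANE_FRACTION_OF_GE = 0.06     # 6% of gross energy intake (IPCC methodology)


def predicted_methane_emission(gross_energy_intake_mj_per_day):
    """Return predicted methane emission (PME) in g/d.

    Assumption: gross energy intake is supplied in MJ/d and converted to kJ/d.
    """
    gei_kj_per_day = gross_energy_intake_mj_per_day * 1000.0
    return METHANE_FRACTION_OF_GE * gei_kj_per_day / METHANE_ENERGY_KJ_PER_G


if __name__ == "__main__":
    # Hypothetical intake of 300 MJ/d, chosen only to show the arithmetic.
    print(round(predicted_methane_emission(300.0), 2))
```

    Because PME is a fixed proportion of gross energy intake in this definition, any variation in PME between animals reflects variation in intake, which is one reason the abstract flags the lack of true methane measurements as an uncertainty.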

  19. Proyecto genoma humano: un arma de doble filo The Human Genome Project: A double edge weapon

    Directory of Open Access Journals (Sweden)

    Elizabeth Hernández Moore

    2001-04-01

    Full Text Available. After a brief historical review of the surprising advances in genetics since the discovery of the helical structure of DNA, the article focuses on the birth of genomic studies in the United States of America and the causes and conditions that motivated them, culminating in the multinational Human Genome Project (HGP). Without overlooking the scientific stature of this undertaking, we attempt to view it from the perspective of North-South relations, addressing more incisively the most controversial ethical aspects of the HGP. We argue that societies of the South should concern themselves with prioritizing the main bioethical problems that afflict them, which are still very distant from those “entrusted” to the HGP. We note that societies of the South should place on their agenda projections in Science, Technology and Society, among which the HGP does not qualify as a home-grown priority, even though we do not dismiss the essence of such megaprojects, which originate in the centers and circuits of the science of the North.

  20. Predictive genomic and metabolomic analysis for the standardization of enzyme data

    Directory of Open Access Journals (Sweden)

    Masaaki Kotera

    2014-05-01

    Full Text Available. The IUBMB's Enzyme List gives a valuable library of the individual experimental facts on enzyme activities, providing the standard classification and nomenclature of enzymes. Empirical knowledge about the relationships between the enzyme protein sequences (or structures) and their functions (the capability of catalyzing chemical reactions) has been accumulating in public literature and databases. This provides a complementary approach to standardize and organize enzyme data, i.e., predicting the possible enzymes, reactions and metabolites that remain to be identified experimentally. Thus, we suggest the necessity of classifying enzymes based on the evidence and different perspectives obtained from various experimental works. The KEGG (Kyoto Encyclopedia of Genes and Genomes) database describes enzymes from many different viewpoints including: the IUBMB's enzyme nomenclature/classification (EC numbers), the similarity group of enzyme reactions (KEGG Reaction Class; RCLASS) based solely on the chemical structure transformation patterns, and the similarity groups of enzyme genes (KEGG Orthology; KO) based on the orthologous groups that can be mapped to the KEGG PATHWAY and BRITE functional hierarchy. Some unique identifiers were additionally introduced to the KEGG database other than the EC numbers established by IUBMB. R, RP and RC numbers are given to distinguish reactions, reactant pairs and RCLASS, respectively. Genes, including enzyme genes, have their own ID numbers in specific organisms, and they are classified into ortholog groups that are identified by K numbers. In this review, we explain the concept and methodology of this formulation with some concrete example cases. We propose it beneficial to create a standard classification scheme that deals with both experimentally identified and theoretically predicted enzymes.
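
    The identifier families named in this abstract (EC, R, RP, RC and K numbers) can be told apart by their prefixes. The sketch below is illustrative only: it assumes the common written form of a letter prefix followed by five digits (for example R00259 or K00001) and a four-field EC number, and the function name and labels are hypothetical rather than part of any KEGG tool.

```python
import re

# Order matters: RC and RP must be tested before the shorter R prefix.
_PATTERNS = [
    ("reaction class (RC number)", re.compile(r"^RC\d{5}$")),
    ("reactant pair (RP number)", re.compile(r"^RP\d{5}$")),
    ("reaction (R number)", re.compile(r"^R\d{5}$")),
    ("ortholog group (K number)", re.compile(r"^K\d{5}$")),
    # Four dot-separated fields; later fields may be "-" for partial classes.
    ("enzyme class (EC number)", re.compile(r"^(EC\s*)?\d+(\.(\d+|-)){3}$")),
]


def classify_identifier(identifier):
    """Return a label for the identifier family, or 'unknown' if none matches."""
    text = identifier.strip()
    for label, pattern in _PATTERNS:
        if pattern.match(text):
            return label
    return "unknown"


if __name__ == "__main__":
    for example in ("R00259", "RC00004", "K00001", "EC 2.7.1.-", "hsa:10458"):
        print(example, "->", classify_identifier(example))
```

    Organism-specific gene identifiers, which the abstract mentions separately, are deliberately left as 'unknown' here because their form varies by organism.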

  1. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

    Science.gov (United States)

    Xiang, Zuoshuang; Qin, Tingting; Qin, Zhaohui S; He, Yongqun

    2013-10-16

    The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining
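
    The three-stage pipeline described in this abstract (gene-article matrix, normalized gene-MeSH term matrix, gene-gene dissimilarity matrix) can be sketched in a few lines. This is not the GenoMesh implementation: the row-sum normalization and the cosine dissimilarity are assumptions standing in for the weighting and the optimized score chosen by the authors, and all names and the toy data are hypothetical.

```python
import math
from collections import Counter


def gene_mesh_profiles(gene_articles, article_mesh):
    """Build normalized gene-MeSH profiles from gene-article and article-MeSH links.

    Assumption: each gene's MeSH counts are scaled so that they sum to 1.
    """
    profiles = {}
    for gene, articles in gene_articles.items():
        counts = Counter()
        for article in articles:
            counts.update(article_mesh.get(article, ()))
        total = sum(counts.values())
        profiles[gene] = {t: c / total for t, c in counts.items()} if total else {}
    return profiles


def cosine_dissimilarity(p, q):
    """1 - cosine similarity between two sparse profiles (assumed score)."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def gene_gene_matrix(profiles):
    """Pairwise dissimilarities keyed by (gene_a, gene_b)."""
    genes = sorted(profiles)
    return {(a, b): cosine_dissimilarity(profiles[a], profiles[b])
            for a in genes for b in genes}


if __name__ == "__main__":
    # Toy data: article identifiers and MeSH terms are invented for illustration.
    gene_articles = {"geneA": {"a1", "a2"}, "geneB": {"a1", "a2"}, "geneC": {"a3"}}
    article_mesh = {"a1": ["Iron", "Biological Transport"], "a2": ["Iron"], "a3": ["Flagella"]}
    matrix = gene_gene_matrix(gene_mesh_profiles(gene_articles, article_mesh))
    print(matrix[("geneA", "geneB")], matrix[("geneA", "geneC")])
```

    In this toy example, genes indexed with the same MeSH terms come out with a dissimilarity near 0 and genes with no shared terms come out at 1, which is the kind of signal a gene-gene matrix could then feed into ranking or clustering.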

  2. Predicting transcription factor binding sites using local over-representation and comparative genomics

    Directory of Open Access Journals (Sweden)

    Touzet Hélène

    2006-08-01

    Full Text Available. Background: Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms. Results: We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes, which are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets. Conclusion: TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at http://bioinfo.lifl.fr/TFM-Explorer.
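
    Position weight matrices, which this abstract names as the model for binding sites, can be illustrated with a small scanning example. This is a generic sketch of log-odds PWM scoring, not the TFM-Explorer algorithm (which additionally tests for local over-representation across coregulated genes); the example sites, the pseudocount, the uniform background and all function names are assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds PWM (one dict per position) from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; return a list of (start, score)."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start, sum(pwm[i][base] for i, base in enumerate(window))))
    return hits


def best_hit(sequence, pwm):
    """Return the (start, score) pair with the highest log-odds score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


if __name__ == "__main__":
    # Invented example sites, used only to show the mechanics.
    pwm = build_pwm(["TATAAT", "TATAAT", "TATGAT", "TACAAT"])
    print(best_hit("GGGCTATAATGCG", pwm))
```

    A method like the one described would go further by asking whether high-scoring windows cluster at particular positions across many sequences more often than expected by chance.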

  3. Spheres of influence: Ethical, legal, and social issues of the Human Genome Project: What to do with what we know

    Energy Technology Data Exchange (ETDEWEB)

    Pellerin, C. (Alexandria, VA (United States))

    1994-01-01

    Since fiscal year 1991, the U.S. Human Genome Project has spent $170.6 million in federal funds to help isolate genes associated with Huntington's disease, amyotrophic lateral sclerosis, neurofibromatosis types 1 and 2, myotonic dystrophy, and fragile X syndrome and to localize genes that predispose people to breast cancer, colon cancer, hypertension, diabetes, and Alzheimer's disease. Now comes the hard part. Biology's 21st century megaproject starts to look relatively manageable compared to another challenge facing the enterprise: sorting out ethical, legal, and social issues associated with using this information. “The Human Genome Project,” wrote Senior Editor Barbara Jasny in the October 1 Science editorial, stretches “the limits of the technology and the limits of our ability to ethically and rationally apply genetic information to our lives.”

  4. On the limits of computational functional genomics for bacterial lifestyle prediction

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Röttger, Richard; Hauschild, Anne-Christin

    2014-01-01

    We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HP...... in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited....

  5. Accuracy of Igenity genomically estimated breeding values for predicting Australian Angus BREEDPLAN traits.

    Science.gov (United States)

    Boerner, V; Johnston, D; Wu, X-L; Bauck, S

    2015-02-01

    Genomically estimated breeding values (GEBV) for Angus beef cattle are available from at least 2 commercial suppliers (Igenity [http://www.igenity.com] and Zoetis [http://www.zoetis.com]). The utility of these GEBV for improving genetic evaluation depends on their accuracies, which can be estimated by the genetic correlation with phenotypic target traits. Genomically estimated breeding values of 1,032 Angus bulls calculated from prediction equations (PE) derived by 2 different procedures in the U.S. Angus population were supplied by Igenity. Both procedures were based on Illumina BovineSNP50 BeadChip genotypes. In procedure sg, GEBV were calculated from PE that used subsets of only 392 SNP, where these subsets were individually selected for each trait by BayesCπ. In procedure rg, GEBV were calculated from PE derived in a ridge regression approach using all available SNP. Because the total set of 1,032 bulls with GEBV contained 732 individuals used in the Igenity training population, GEBV subsets were formed characterized by a decreasing average relationship between individuals in the subsets and individuals in the training population. Accuracies of GEBV were estimated as genetic correlations between GEBV and their phenotypic target traits modeling GEBV as trait observations in a bivariate REML approach, in which phenotypic observations were those recorded in the commercial Australian Angus seed stock sector. Using results from the GEBV subset excluding all training individuals as a reference, estimated accuracies were generally in agreement with those already published, with both types of GEBV (sg and rg) yielding similar results. Accuracies for growth traits ranged from 0.29 to 0.45, for reproductive traits from 0.11 to 0.53, and for carcass traits from 0.3 to 0.75. Accuracies generally decreased with an increasing genetic distance between the training and the validation population. However, for some carcass traits characterized by a low number of phenotypic

  6. Integrating genomics and proteomics data to predict drug effects using binary linear programming.

    Science.gov (United States)

    Ji, Zhiwei; Su, Jing; Liu, Chenglin; Wang, Hongyan; Huang, Deshuang; Zhou, Xiaobo

    2014-01-01

    The Library of Integrated Network-Based Cellular Signatures (LINCS) project aims to create a network-based understanding of biology by cataloging changes in gene expression and signal transduction that occur when cells are exposed to a variety of perturbations. It is helpful for understanding cell pathways and facilitating drug discovery. Here, we developed a novel approach to infer cell-specific pathways and identify a compound's effects using gene expression and phosphoproteomics data under treatments with different compounds. Gene expression data were employed to infer potential targets of compounds and create a generic pathway map. Binary linear programming (BLP) was then developed to optimize the generic pathway topology based on the mid-stage signaling response of phosphorylation. To demonstrate effectiveness of this approach, we built a generic pathway map for the MCF7 breast cancer cell line and inferred the cell-specific pathways by BLP. The first group of 11 compounds was utilized to optimize the generic pathways, and then 4 compounds were used to identify effects based on the inferred cell-specific pathways. Cross-validation indicated that the cell-specific pathways reliably predicted a compound's effects. Finally, we applied BLP to re-optimize the cell-specific pathways to predict the effects of 4 compounds (trichostatin A, MS-275, staurosporine, and digoxigenin) according to compound-induced topological alterations. Trichostatin A and MS-275 (both HDAC inhibitors) inhibited the downstream pathway of HDAC1 and caused cell growth arrest via activation of p53 and p21; the effects of digoxigenin were totally opposite. Staurosporine blocked the cell cycle via p53 and p21, but also promoted cell growth via activated HDAC1 and its downstream pathway. Our approach was also applied to the PC3 prostate cancer cell line, and the cross-validation analysis showed very good accuracy in predicting effects of 4 compounds. In summary, our computational model can be

  7. Genetics of Charcot-Marie-Tooth (CMT) Disease within the Frame of the Human Genome Project Success

    Science.gov (United States)

    Timmerman, Vincent; Strickland, Alleene V.; Züchner, Stephan

    2014-01-01

    Charcot-Marie-Tooth (CMT) neuropathies comprise a group of monogenic disorders affecting the peripheral nervous system. CMT is characterized by a clinically and genetically heterogeneous group of neuropathies, involving all types of Mendelian inheritance patterns. Over 1,000 different mutations have been discovered in 80 disease-associated genes. Genetic research of CMT has pioneered the discovery of genomic disorders and aided in understanding the effects of copy number variation and the mechanisms of genomic rearrangements. CMT genetic study also unraveled common pathomechanisms for peripheral nerve degeneration, elucidated gene networks, and initiated the development of therapeutic approaches. The reference genome, which became available thanks to the Human Genome Project, and the development of next generation sequencing tools, considerably accelerated gene and mutation discoveries. In fact, the first clinical whole genome sequence was reported in a patient with CMT. Here we review the history of CMT gene discoveries, starting with technologies from the early days in human genetics through the high-throughput application of modern DNA analyses. We highlight the most relevant examples of CMT genes and mutation mechanisms, some of which provide promising treatment strategies. Finally, we propose future initiatives to accelerate diagnosis of CMT patients through new ways of sharing large datasets and genetic variants, and at ever diminishing costs. PMID:24705285

  8. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

    Science.gov (United States)

    Birney, Ewan; Stamatoyannopoulos, John A; Dutta, Anindya; Guigó, Roderic; Gingeras, Thomas R; Margulies, Elliott H; Weng, Zhiping; Snyder, Michael; Dermitzakis, Emmanouil T; Thurman, Robert E; Kuehn, Michael S; Taylor, Christopher M; Neph, Shane; Koch, Christoph M; Asthana, Saurabh; Malhotra, Ankit; Adzhubei, Ivan; Greenbaum, Jason A; Andrews, Robert M; Flicek, Paul; Boyle, Patrick J; Cao, Hua; Carter, Nigel P; Clelland, Gayle K; Davis, Sean; Day, Nathan; Dhami, Pawandeep; Dillon, Shane C; Dorschner, Michael O; Fiegler, Heike; Giresi, Paul G; Goldy, Jeff; Hawrylycz, Michael; Haydock, Andrew; Humbert, Richard; James, Keith D; Johnson, Brett E; Johnson, Ericka M; Frum, Tristan T; Rosenzweig, Elizabeth R; Karnani, Neerja; Lee, Kirsten; Lefebvre, Gregory C; Navas, Patrick A; Neri, Fidencio; Parker, Stephen C J; Sabo, Peter J; Sandstrom, Richard; Shafer, Anthony; Vetrie, David; Weaver, Molly; Wilcox, Sarah; Yu, Man; Collins, Francis S; Dekker, Job; Lieb, Jason D; Tullius, Thomas D; Crawford, Gregory E; Sunyaev, Shamil; Noble, William S; Dunham, Ian; Denoeud, France; Reymond, Alexandre; Kapranov, Philipp; Rozowsky, Joel; Zheng, Deyou; Castelo, Robert; Frankish, Adam; Harrow, Jennifer; Ghosh, Srinka; Sandelin, Albin; Hofacker, Ivo L; Baertsch, Robert; Keefe, Damian; Dike, Sujit; Cheng, Jill; Hirsch, Heather A; Sekinger, Edward A; Lagarde, Julien; Abril, Josep F; Shahab, Atif; Flamm, Christoph; Fried, Claudia; Hackermüller, Jörg; Hertel, Jana; Lindemeyer, Manja; Missal, Kristin; Tanzer, Andrea; Washietl, Stefan; Korbel, Jan; Emanuelsson, Olof; Pedersen, Jakob S; Holroyd, Nancy; Taylor, Ruth; Swarbreck, David; Matthews, Nicholas; Dickson, Mark C; Thomas, Daryl J; Weirauch, Matthew T; Gilbert, James; Drenkow, Jorg; Bell, Ian; Zhao, XiaoDong; Srinivasan, K G; Sung, Wing-Kin; Ooi, Hong Sain; Chiu, Kuo Ping; Foissac, Sylvain; Alioto, Tyler; Brent, Michael; Pachter, Lior; Tress, Michael L; Valencia, Alfonso; Choo, Siew Woh; Choo, Chiou Yu; Ucla, Catherine; Manzano, Caroline; 
Wyss, Carine; Cheung, Evelyn; Clark, Taane G; Brown, James B; Ganesh, Madhavan; Patel, Sandeep; Tammana, Hari; Chrast, Jacqueline; Henrichsen, Charlotte N; Kai, Chikatoshi; Kawai, Jun; Nagalakshmi, Ugrappa; Wu, Jiaqian; Lian, Zheng; Lian, Jin; Newburger, Peter; Zhang, Xueqing; Bickel, Peter; Mattick, John S; Carninci, Piero; Hayashizaki, Yoshihide; Weissman, Sherman; Hubbard, Tim; Myers, Richard M; Rogers, Jane; Stadler, Peter F; Lowe, Todd M; Wei, Chia-Lin; Ruan, Yijun; Struhl, Kevin; Gerstein, Mark; Antonarakis, Stylianos E; Fu, Yutao; Green, Eric D; Karaöz, Ulaş; Siepel, Adam; Taylor, James; Liefer, Laura A; Wetterstrand, Kris A; Good, Peter J; Feingold, Elise A; Guyer, Mark S; Cooper, Gregory M; Asimenos, George; Dewey, Colin N; Hou, Minmei; Nikolaev, Sergey; Montoya-Burgos, Juan I; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Huang, Haiyan; Zhang, Nancy R; Holmes, Ian; Mullikin, James C; Ureta-Vidal, Abel; Paten, Benedict; Seringhaus, Michael; Church, Deanna; Rosenbloom, Kate; Kent, W James; Stone, Eric A; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross C; Haussler, David; Miller, Webb; Sidow, Arend; Trinklein, Nathan D; Zhang, Zhengdong D; Barrera, Leah; Stuart, Rhona; King, David C; Ameur, Adam; Enroth, Stefan; Bieda, Mark C; Kim, Jonghwan; Bhinge, Akshay A; Jiang, Nan; Liu, Jun; Yao, Fei; Vega, Vinsensius B; Lee, Charlie W H; Ng, Patrick; Shahab, Atif; Yang, Annie; Moqtaderi, Zarmik; Zhu, Zhou; Xu, Xiaoqin; Squazzo, Sharon; Oberley, Matthew J; Inman, David; Singer, Michael A; Richmond, Todd A; Munn, Kyle J; Rada-Iglesias, Alvaro; Wallerman, Ola; Komorowski, Jan; Fowler, Joanna C; Couttet, Phillippe; Bruce, Alexander W; Dovey, Oliver M; Ellis, Peter D; Langford, Cordelia F; Nix, David A; Euskirchen, Ghia; Hartman, Stephen; Urban, Alexander E; Kraus, Peter; Van Calcar, Sara; Heintzman, Nate; Kim, Tae Hoon; Wang, Kun; Qu, Chunxu; Hon, Gary; Luna, Rosa; Glass, Christopher K; Rosenfeld, M Geoff; Aldred, Shelley Force; Cooper, Sara J; Halees, 
Anason; Lin, Jane M; Shulha, Hennady P; Zhang, Xiaoling; Xu, Mousheng; Haidar, Jaafar N S; Yu, Yong; Ruan, Yijun; Iyer, Vishwanath R; Green, Roland D; Wadelius, Claes; Farnham, Peggy J; Ren, Bing; Harte, Rachel A; Hinrichs, Angie S; Trumbower, Heather; Clawson, Hiram; Hillman-Jackson, Jennifer; Zweig, Ann S; Smith, Kayla; Thakkapallayil, Archana; Barber, Galt; Kuhn, Robert M; Karolchik, Donna; Armengol, Lluis; Bird, Christine P; de Bakker, Paul I W; Kern, Andrew D; Lopez-Bigas, Nuria; Martin, Joel D; Stranger, Barbara E; Woodroffe, Abigail; Davydov, Eugene; Dimas, Antigone; Eyras, Eduardo; Hallgrímsdóttir, Ingileif B; Huppert, Julian; Zody, Michael C; Abecasis, Gonçalo R; Estivill, Xavier; Bouffard, Gerard G; Guan, Xiaobin; Hansen, Nancy F; Idol, Jacquelyn R; Maduro, Valerie V B; Maskeri, Baishali; McDowell, Jennifer C; Park, Morgan; Thomas, Pamela J; Young, Alice C; Blakesley, Robert W; Muzny, Donna M; Sodergren, Erica; Wheeler, David A; Worley, Kim C; Jiang, Huaiyang; Weinstock, George M; Gibbs, Richard A; Graves, Tina; Fulton, Robert; Mardis, Elaine R; Wilson, Richard K; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B; Chang, Jean L; Lindblad-Toh, Kerstin; Lander, Eric S; Koriabine, Maxim; Nefedov, Mikhail; Osoegawa, Kazutoyo; Yoshinaga, Yuko; Zhu, Baoli; de Jong, Pieter J

    2007-06-14

    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

  9. Genetics of Charcot-Marie-Tooth (CMT) Disease within the Frame of the Human Genome Project Success

    Directory of Open Access Journals (Sweden)

    Vincent Timmerman

    2014-01-01

    Charcot-Marie-Tooth (CMT) neuropathies comprise a group of monogenic disorders affecting the peripheral nervous system. CMT is characterized by a clinically and genetically heterogeneous group of neuropathies, involving all types of Mendelian inheritance patterns. Over 1,000 different mutations have been discovered in 80 disease-associated genes. Genetic research of CMT has pioneered the discovery of genomic disorders and aided in understanding the effects of copy number variation and the mechanisms of genomic rearrangements. CMT genetic study also unraveled common pathomechanisms for peripheral nerve degeneration, elucidated gene networks, and initiated the development of therapeutic approaches. The reference genome, which became available thanks to the Human Genome Project, and the development of next generation sequencing tools, considerably accelerated gene and mutation discoveries. In fact, the first clinical whole genome sequence was reported in a patient with CMT. Here we review the history of CMT gene discoveries, starting with technologies from the early days in human genetics through the high-throughput application of modern DNA analyses. We highlight the most relevant examples of CMT genes and mutation mechanisms, some of which provide promising treatment strategies. Finally, we propose future initiatives to accelerate diagnosis of CMT patients through new ways of sharing large datasets and genetic variants, and at ever diminishing costs.

  10. A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data

    OpenAIRE

    2012-01-01

    Background Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap along with advances in genotyping technology led to genome-wide association studies which have identified common var...

  11. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions.

    Science.gov (United States)

    Bendl, Jaroslav; Musil, Miloš; Štourač, Jan; Zendulka, Jaroslav; Damborský, Jiří; Brezovský, Jan

    2016-05-01

    An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. 
To

  12. A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education.

    Science.gov (United States)

    Oleksyk, Taras K; Pombert, Jean-Francois; Siu, Daniel; Mazo-Vargas, Anyimilehidi; Ramos, Brian; Guiblet, Wilfried; Afanador, Yashira; Ruiz-Rodriguez, Christina T; Nickerson, Michael L; Logue, David M; Dean, Michael; Figueroa, Luis; Valentin, Ricardo; Martinez-Cruzado, Juan-Carlos

    2012-09-28

    Amazona vittata is a critically endangered Puerto Rican endemic bird, the only surviving native parrot species in the United States territory, and the first parrot in the large Neotropical genus Amazona to be studied on a genomic scale. In a unique community-based funded project, DNA from an A. vittata female was sequenced using a HiSeq Illumina platform, resulting in a total of ~42.5 billion nucleotide bases. This provided approximately 26.89x average coverage depth at the completion of this funding phase. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which were further scaffolded into 148,255 fragments (N50 = 19,470 bp, longest = 206,462 bp). This provided ~76% coverage of the genome based on an estimated size of 1.58 Gb. The assembled scaffolds allowed basic genomic annotation and comparative analyses with other available avian whole-genome sequences. The current data represents the first genomic information from and work carried out with a unique source of funding. This analysis further provides a means for directed training of young researchers in genetic and bioinformatics analyses and will facilitate progress towards a full assembly and annotation of the Puerto Rican parrot genome. It also adds extensive genomic data to a new branch of the avian tree, making it useful for comparative analyses with other avian species. Ultimately, the knowledge acquired from these data will contribute to an improved understanding of the overall population health of this species and aid in ongoing and future conservation efforts.
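
The coverage figure quoted above follows from dividing the sequenced bases by the estimated genome size, and N50 is a standard summary of contig or scaffold lengths. The sketch below reproduces the coverage arithmetic from the numbers in the abstract and shows one common way to compute N50. The function names and the toy contig lengths are illustrative assumptions, not taken from the paper.

```python
def coverage_depth(total_bases, genome_size):
    """Average coverage depth = sequenced bases / estimated genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Figures quoted in the abstract: ~42.5 billion bases, 1.58 Gb estimated genome size.
depth = coverage_depth(42.5e9, 1.58e9)
print(f"{depth:.1f}x")

# Toy contig lengths (hypothetical, not from the paper).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

With the abstract's rounded inputs the depth comes out at about 26.9x, in line with the reported ~26.89x.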

  13. Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress

    Directory of Open Access Journals (Sweden)

    Benham Craig J

    2006-05-01

    Background: In our previous studies, we found that the sites in prokaryotic genomes which are most susceptible to duplex destabilization under the negative superhelical stresses that occur in vivo are statistically highly significantly associated with intergenic regions that are known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters. Results: We show that the propensity for stress-induced DNA duplex destabilization (SIDD) is closely associated with specific promoter regions. The extent of destabilization in promoter-containing regions is found to be bimodally distributed. When compared with DNA curvature, deformability, thermostability or sequence motif scores within the -10 region, SIDD is found to be the most informative DNA property regarding promoter locations in the E. coli K12 genome. SIDD properties alone perform better at detecting promoter regions than other programs trained on this genome. Because this approach has a very low false positive rate, it can be used to predict with high confidence the subset of promoters that are strongly destabilized. When SIDD properties are combined with -10 motif scores in a linear classification function, they predict promoter regions with better than 80% accuracy. When these methods were tested with promoter and non-promoter sequences from Bacillus subtilis, they achieved similar or higher accuracies. We also present a strictly SIDD-based predictor for annotating promoter sequences in complete microbial genomes. Conclusion: In this report we show that the propensity to undergo stress-induced duplex destabilization (SIDD) is a distinctive structural attribute of many prokaryotic promoter sequences. We have developed methods to identify promoter sequences in prokaryotic genomes that use SIDD either as a sole predictor or in

  14. MicroTrout: A comprehensive, genome-wide miRNA target prediction framework for rainbow trout, Oncorhynchus mykiss.

    Science.gov (United States)

    Mennigen, Jan A; Zhang, Dapeng

    2016-12-01

    Rainbow trout represent an important teleost research model and aquaculture species. As such, rainbow trout are employed in diverse areas of biological research, including basic biological disciplines such as comparative physiology, toxicology, and, since rainbow trout have undergone both teleost- and salmonid-specific rounds of genome duplication, molecular evolution. In recent years, microRNAs (miRNAs, small non-protein coding RNAs) have emerged as important posttranscriptional regulators of gene expression in animals. Given the increasingly recognized importance of miRNAs as an additional layer in the regulation of gene expression and hence biological function, recent efforts using RNA- and genome sequencing approaches have resulted in the creation of several resources for the construction of a comprehensive repertoire of rainbow trout miRNAs and isomiRs (variant miRNA sequences that all appear to derive from the same gene but vary in sequence due to post-transcriptional processing). Importantly, through the recent publication of the rainbow trout genome (Berthelot et al., 2014), mRNA 3'UTR information has become available, allowing for the first time the genome-wide prediction of miRNA-target RNA relationships in this species. We here report the creation of the microtrout database, a comprehensive resource for rainbow trout miRNA and annotated 3'UTRs. The comprehensive database was used to implement an algorithm to predict genome-wide rainbow trout-specific miRNA-mRNA target relationships, generating an improved predictive framework over previously published approaches. This work will serve as a useful framework and sequence resource to experimentally address the role of miRNAs in several research areas using the rainbow trout model, examples of which are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.
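
The abstract does not describe the prediction algorithm itself. As a rough illustration of what miRNA-target prediction involves, the sketch below scans a 3'UTR for sites complementary to the miRNA seed (nucleotides 2 to 8), which is a common first step in many target predictors. Treat it as an assumption about the general approach, not as the MicroTrout method. The function names and both sequences are illustrative.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def normalize(seq):
    """Upper-case the sequence and convert DNA-style T to RNA U."""
    return seq.upper().replace("T", "U")


def seed_site(mirna):
    """Return the target-site sequence matching miRNA positions 2-8 (the seed)."""
    seed = normalize(mirna)[1:8]
    # The target site is the reverse complement of the seed.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, utr):
    """Return 0-based start positions of all seed-complementary sites in the UTR."""
    site = seed_site(mirna)
    utr = normalize(utr)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Toy sequences for illustration only.
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAGGG"
print(seed_site(toy_mirna), find_seed_matches(toy_mirna, toy_utr))
```

Real predictors usually add further filters on top of seed matching, such as site conservation or duplex free energy. Those are omitted here.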

  15. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  16. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots

    Science.gov (United States)

    Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the compu...

  17. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sükösd, Zsuzsanna; Andersen, Ebbe Sloth; Seemann, Ernst Stefan;

    2015-01-01

    of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interesting, a set of long distance interactions form a core organizing structure (COS) that organize the genome into three major structural domains. Despite overlapping...

  18. Bayesian prediction of bacterial growth temperature range based on genome sequences

    DEFF Research Database (Denmark)

    Jensen, Dan Børge; Vesth, Tammi Camilla; Hallin, Peter Fischer

    2012-01-01

    on a genomic sequence, would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinction between three thermophilicity classes (thermophiles, mesophiles and psychrophiles...... and psychrophilic adapted bacterial genomes....

  19. Prediction of disease and phenotype associations from genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Stephanie N Lewis

    BACKGROUND: Genome wide association studies (GWAS) have proven useful as a method for identifying genetic variations associated with diseases. In this study, we analyzed GWAS data for 61 diseases and phenotypes to elucidate common associations based on single nucleotide polymorphisms (SNP). The study was an expansion on a previous study on identifying disease associations via data from a single GWAS on seven diseases. METHODOLOGY/PRINCIPAL FINDINGS: Adjustments to the originally reported study included expansion of the SNP dataset using Linkage Disequilibrium (LD) and refinement of the four levels of analysis to encompass SNP, SNP block, gene, and pathway level comparisons. A pair-wise comparison between diseases and phenotypes was performed at each level and the Jaccard similarity index was used to measure the degree of association between two diseases/phenotypes. Disease relatedness networks (DRNs) were used to visualize our results. We saw predominant relatedness between Multiple Sclerosis, type 1 diabetes, and rheumatoid arthritis for the first three levels of analysis. Expected relatedness was also seen between lipid- and blood-related traits. CONCLUSIONS/SIGNIFICANCE: The predominant associations between Multiple Sclerosis, type 1 diabetes, and rheumatoid arthritis can be validated by clinical studies. The diseases have been proposed to share a systemic inflammation phenotype that can result in progression of additional diseases in patients with one of these three diseases. We also noticed unexpected relationships between metabolic and neurological diseases at the pathway comparison level. The less significant relationships found between diseases require a more detailed literature review to determine validity of the predictions. The results from this study serve as a first step towards a better understanding of seemingly unrelated diseases and phenotypes with similar symptoms or modes of treatment.
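
The Jaccard similarity index mentioned above is the size of the intersection of two sets divided by the size of their union. A minimal sketch of the pair-wise comparison at the SNP level might look like the following. The disease names and rs identifiers are made up for illustration and are not data from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical SNP sets per disease (placeholders, not real associations).
disease_snps = {
    "disease_A": {"rs1", "rs2", "rs3", "rs4"},
    "disease_B": {"rs3", "rs4", "rs5"},
    "disease_C": {"rs9"},
}

# Pair-wise comparison between all diseases, as in a relatedness network edge list.
pairwise = {
    (x, y): jaccard(disease_snps[x], disease_snps[y])
    for x, y in combinations(sorted(disease_snps), 2)
}
for pair, score in pairwise.items():
    print(pair, score)
```

The same function applies unchanged at the SNP block, gene, and pathway levels, since each level only changes what the set elements are.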

  20. Persistency of Prediction Accuracy and Genetic Gain in Synthetic Populations Under Recurrent Genomic Selection

    Directory of Open Access Journals (Sweden)

    Dominik Müller

    2017-03-01

    Recurrent selection (RS) has been used in plant breeding to successively improve synthetic and other multiparental populations. Synthetics are generated from a limited number of parents ($N_p$), but little is known about how $N_p$ affects genomic selection (GS) in RS, especially the persistency of prediction accuracy ($r_{g,\hat{g}}$) and genetic gain. Synthetics were simulated by intermating $N_p$ = 2–32 parent lines from an ancestral population with short- or long-range linkage disequilibrium ($LD_A$) and subjected to multiple cycles of GS. We determined $r_{g,\hat{g}}$ and genetic gain across 30 cycles for different training set (TS) sizes, marker densities, and generations of recombination before model training. Contributions to $r_{g,\hat{g}}$ and genetic gain from pedigree relationships, as well as from cosegregation and $LD_A$ between QTL and markers, were analyzed via four scenarios differing in (i) the relatedness between TS and selection candidates and (ii) whether selection was based on markers or pedigree records. Persistency of $r_{g,\hat{g}}$ was high for small $N_p$, where predominantly cosegregation contributed to $r_{g,\hat{g}}$, but also for large $N_p$, where $LD_A$ replaced cosegregation as the dominant information source. Together with increasing genetic variance, this compensation resulted in relatively constant long- and short-term genetic gain for increasing $N_p$ > 4, given long-range $LD_A$ in the ancestral population. Although our scenarios suggest that information from pedigree relationships contributed to $r_{g,\hat{g}}$ for only very few generations in GS, we expect a longer contribution than in pedigree BLUP, because capturing Mendelian sampling by markers reduces selective pressure on pedigree relationships. Larger TS size ($N_{TS}$) and higher marker density improved persistency of $r_{g,\hat{g}}$ and hence genetic gain, but additional recombinations could not increase genetic gain.

  1. Controlling our destinies: Historical, philosophical, social and ethical perspectives on the Human Genome Project: Final report, July 1, 1995-June 30, 1996

    Energy Technology Data Exchange (ETDEWEB)

    Sloan, P.R.

    1996-09-25

    This report briefly describes the efforts by the organizing committee in preparation for the conference entitled Controlling Our Destinies: Historical, Philosophical, Social, and Ethical Perspectives on the Human Genome Project. The conference was held October 5-8, 1995.

  2. Genome wide prediction of HNF4alpha functional binding sites by the use of local and global sequence context.

    Science.gov (United States)

    Kel, Alexander E; Niehof, Monika; Matys, Volker; Zemlin, Rüdiger; Borlak, Jürgen

    2008-01-01

    We report an application of machine learning algorithms that enables prediction of the functional context of transcription factor binding sites in the human genome. We demonstrate that our method allowed de novo identification of hepatic nuclear factor (HNF)4alpha binding sites and significantly improved overall recognition of faithful HNF4alpha targets. When applied to published findings, an unprecedentedly high number of false positives was identified. The technique can be applied to any transcription factor.

  3. Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2006-07-01

    Background: A complete understanding of the regulatory mechanisms of gene expression is the next important issue of genomics. Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and binding data. However, most of these studies involved the use of yeast which has much simpler regulatory networks than human and has many genome wide binding data and gene expression data under diverse conditions. Studies of genome wide transcriptional networks of human genomes currently lag behind those of yeast. Results: We report herein a new method that combines gene expression data analysis with promoter analysis to infer transcriptional regulatory elements of human genes. The Z scores from the application of gene set analysis with gene sets of transcription factor binding sites (TFBSs) were successfully used to represent the activity of TFBSs in a given microarray data set. A significant correlation between the Z scores of gene sets of TFBSs and individual genes across multiple conditions permitted successful identification of many known human transcriptional regulatory elements of genes as well as the prediction of numerous putative TFBSs of many genes which will constitute a good starting point for further experiments. Using Z scores of gene sets of TFBSs produced better predictions than the use of mRNA levels of a transcription factor itself, suggesting that the Z scores of gene sets of TFBSs better represent diverse mechanisms for changing the activity of transcription factors in the cell. In addition, cis-regulatory modules, combinations of co-acting TFBSs, were readily identified by our analysis. Conclusion: By a strategic combination of gene set level analysis of gene expression data sets and promoter analysis, we were able to identify and predict many transcriptional regulatory elements of human genes. We conclude that this approach will aid in decoding

  4. Low frequency variants, collapsed based on biological knowledge, uncover complexity of population stratification in 1000 genomes project data.

    Directory of Open Access Journals (Sweden)

    Carrie B Moore

    Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionary conserved regions, regulatory regions, genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionary conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses.
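
The collapsing step described above can be pictured as filtering variants by minor allele frequency and counting the survivors per biological feature ("bin"). The sketch below shows that idea only. The record layout, the bin names, and the function name are assumptions made for illustration, and the statistical comparison between populations is not shown.

```python
from collections import defaultdict


def collapse_low_frequency(variants, maf_threshold=0.03):
    """Count variants with 0 < MAF < threshold in each bin (gene, pathway, region...)."""
    burden = defaultdict(int)
    for variant in variants:
        if 0 < variant["maf"] < maf_threshold:
            burden[variant["bin"]] += 1
    return dict(burden)


# Hypothetical variant records for one population (not 1000 Genomes data).
variants = [
    {"id": "v1", "bin": "GENE_A", "maf": 0.010},
    {"id": "v2", "bin": "GENE_A", "maf": 0.200},     # common variant, excluded
    {"id": "v3", "bin": "GENE_A", "maf": 0.025},
    {"id": "v4", "bin": "PATHWAY_X", "maf": 0.004},
    {"id": "v5", "bin": "PATHWAY_X", "maf": 0.030},  # not strictly below the threshold
]

burden = collapse_low_frequency(variants)
print(burden)
```

Running the same function on each population gives per-bin burden counts that can then be compared between populations with whatever test the analysis calls for.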

  5. Massively parallel processing on the Intel Paragon system: One tool in achieving the goals of the Human Genome Project

    Energy Technology Data Exchange (ETDEWEB)

    Ecklund, D.J. [Intel Supercomputer Systems Division, Beaverton, OR (United States)

    1993-12-31

    A massively parallel computing system is one tool that has been adopted by researchers in the Human Genome Project. This tool is one of many in a toolbox of theories, algorithms, and systems that are used to attack the many questions posed by the project. A good tool functions well when applied alone to the problem for which it was devised. A superior tool achieves its solitary goal, and supports and interacts with other tools to achieve goals beyond the scope of any individual tool. The author believes that Intel's massively parallel Paragon™ XP/S system is a superior tool. This paper presents specific requirements for a superior computing tool for the Human Genome Project (HGP) and shows how the Paragon system addresses these requirements. Computing requirements for HGP are based on three factors: (1) computing requirements of algorithms currently used in sequence homology, protein folding, and database insertion/retrieval; (2) estimates of the computing requirements of new applications arising from evolving biological theories; and (3) the requirements for facilities that support collaboration among scientists in a project of this magnitude. The Paragon system provides many hardware and software features that effectively address these requirements.

  6. Accuracy of genomic prediction using deregressed breeding values estimated from purebred and crossbred offspring phenotypes in pigs.

    Science.gov (United States)

    Hidalgo, A M; Bastiaansen, J W M; Lopes, M S; Veroneze, R; Groenen, M A M; de Koning, D-J

    2015-07-01

    Genomic selection is applied to dairy cattle breeding to improve the genetic progress of purebred (PB) animals, whereas in pigs and poultry the target is a crossbred (CB) animal for which a different strategy appears to be needed. The source of information used to estimate the breeding values, i.e., using phenotypes of CB or PB animals, may affect the accuracy of prediction. The objective of our study was to assess the direct genomic value (DGV) accuracy of CB and PB pigs using different sources of phenotypic information. Data used were from 3 populations: 2,078 Dutch Landrace-based, 2,301 Large White-based, and 497 crossbreds from an F1 cross between the 2 lines. Two female reproduction traits were analyzed: gestation length (GLE) and total number of piglets born (TNB). Phenotypes used in the analyses originated from offspring of genotyped individuals. Phenotypes collected on CB and PB animals were analyzed as separate traits using a single-trait model. Breeding values were estimated separately for each trait in a pedigree BLUP analysis and subsequently deregressed. Deregressed EBV for each trait originating from different sources (CB or PB offspring) were used to study the accuracy of genomic prediction. Accuracy of prediction was computed as the correlation between DGV and the DEBV of the validation population. Accuracy of prediction within PB populations ranged from 0.43 to 0.62 across GLE and TNB. Accuracies to predict genetic merit of CB animals with one PB population in the training set ranged from 0.12 to 0.28, with the exception of using the CB offspring phenotype of the Dutch Landrace that resulted in an accuracy estimate around 0 for both traits. Accuracies to predict genetic merit of CB animals with both parental PB populations in the training set ranged from 0.17 to 0.30. 
We conclude that prediction within population and trait had good predictive ability regardless of the trait being the PB or CB performance, whereas using PB population(s) to predict
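
The accuracy measure described above, the correlation between direct genomic values (DGV) and deregressed EBV (DEBV) in the validation population, is a Pearson correlation. A minimal standard-library sketch is shown below. The numeric values are invented for illustration and are not results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical validation animals: predicted DGV and observed DEBV.
dgv = [0.1, 0.4, -0.2, 0.3, 0.0]
debv = [0.2, 0.5, -0.1, 0.1, -0.3]
accuracy = pearson(dgv, debv)
print(round(accuracy, 2))
```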

  7. GenoMatrix: A Software Package for Pedigree-Based and Genomic Prediction Analyses on Complex Traits.

    Science.gov (United States)

    Nazarian, Alireza; Gezan, Salvador Alejandro

    2016-07-01

    Genomic and pedigree-based best linear unbiased prediction methodologies (G-BLUP and P-BLUP) have proven themselves efficient for partitioning the phenotypic variance of complex traits into its components, estimating the individuals' genetic merits, and predicting unobserved (or yet-to-be observed) phenotypes in many species and fields of study. The GenoMatrix software, presented here, is a user-friendly package to facilitate the process of using genome-wide marker data and parentage information for G-BLUP and P-BLUP analyses on complex traits. It provides users with a collection of applications that help with a set of tasks, from performing quality control on data to constructing and manipulating the genomic and pedigree-based relationship matrices and obtaining their inverses. Such matrices can then be used in downstream analyses by other statistical packages. The package also enables users to obtain predicted values for unobserved individuals based on the genetic values of observed related individuals. GenoMatrix is available to the research community as a Windows 64-bit executable and can be downloaded free of charge at: http://compbio.ufl.edu/software/genomatrix/. © The American Genetic Association. 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
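
The abstract does not say which formula GenoMatrix uses for the genomic relationship matrix. One widely used construction is VanRaden's first method, G = ZZ' / (2 Σ p(1 − p)), where Z holds genotypes coded 0/1/2 and centered by twice the allele frequency. The sketch below implements that formula in plain Python as an assumed illustration, not as the package's implementation. The genotype data are made up.

```python
def genomic_relationship_matrix(genotypes):
    """VanRaden-style G matrix from rows of 0/1/2 allele counts (individuals x markers)."""
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Center each genotype by its expected value 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Three hypothetical individuals genotyped at three markers.
toy_genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 1, 1],
]
G = genomic_relationship_matrix(toy_genotypes)
for row in G:
    print([round(value, 3) for value in row])
```

Here the allele frequencies are estimated from the same sample. In practice they are often taken from a base population instead, which changes the scaling of G.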

  8. A New Approach to Predict Microbial Community Assembly and Function Using a Stochastic, Genome-Enabled Modeling Framework

    Science.gov (United States)

    King, E.; Brodie, E.; Anantharaman, K.; Karaoz, U.; Bouskill, N.; Banfield, J. F.; Steefel, C. I.; Molins, S.

    2016-12-01

    Characterizing and predicting the microbial and chemical compositions of subsurface aquatic systems necessitates an understanding of the metabolism and physiology of organisms that are often uncultured or studied under conditions not relevant for one's environment of interest. Cultivation-independent approaches are therefore important and have greatly enhanced our ability to characterize functional microbial diversity. The capability to reconstruct genomes representing thousands of populations from microbial communities using metagenomic techniques provides a foundation for development of predictive models for community structure and function. Here, we discuss a genome-informed stochastic trait-based model incorporated into a reactive transport framework to represent the activities of coupled guilds of hypothetical microorganisms. Metabolic pathways for each microbe within a functional guild are parameterized from metagenomic data with a unique combination of traits governing organism fitness under dynamic environmental conditions. We simulate the thermodynamics of coupled electron donor and acceptor reactions to predict the energy available for cellular maintenance, respiration, biomass development, and enzyme production. While 'omics analyses can now characterize the metabolic potential of microbial communities, it is functionally redundant as well as computationally prohibitive to explicitly include the thousands of recovered organisms into biogeochemical models. However, one can derive potential metabolic pathways from genomes along with trait-linkages to build probability distributions of traits. These distributions are used to assemble groups of microbes that couple one or more of these pathways. From the initial ensemble of microbes, only a subset will persist based on the interaction of their physiological and metabolic traits with environmental conditions, competing organisms, etc. Here, we analyze the predicted niches of these hypothetical microbes and

  9. Accurate Prediction of the Statistics of Repetitions in Random Sequences: A Case Study in Archaea Genomes.

    Science.gov (United States)

    Régnier, Mireille; Chassignet, Philippe

    2016-01-01

    Repetitive patterns in genomic sequences have great biological significance and also algorithmic implications. Analytic combinatorics allows one to derive formulas for the expected length of repetitions in a random sequence. Asymptotic results, which generalize previous work on a binary alphabet, are easily computable. Simulations on random sequences show their accuracy. As an application, the sample case of Archaea genomes illustrates how biological sequences may differ from random sequences.
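
    The kind of simulation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the function names are hypothetical, the sequence is drawn uniformly from a four-letter alphabet, and the comparison value 2·log_k(n) is only the classical first-order heuristic for the longest repeat in a uniform random sequence, not the formula derived in the paper.

```python
import math
import random


def has_repeat(seq, length):
    """Return True if some substring of the given length occurs at least twice."""
    seen = set()
    for i in range(len(seq) - length + 1):
        word = seq[i:i + length]
        if word in seen:
            return True
        seen.add(word)
    return False


def longest_repeat_length(seq):
    """Length of the longest substring occurring twice (overlaps allowed)."""
    length = 0
    # A repeat of length L implies a repeat of length L - 1,
    # so we can stop at the first length that has no repeat.
    while length + 1 < len(seq) and has_repeat(seq, length + 1):
        length += 1
    return length


def random_sequence(n, alphabet="ACGT", seed=0):
    """Uniform i.i.d. random sequence (an assumed null model)."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(n))


n = 2000
observed = longest_repeat_length(random_sequence(n))
heuristic = 2 * math.log(n) / math.log(4)  # first-order heuristic only
print(f"observed longest repeat: {observed}, heuristic: {heuristic:.1f}")
```

    Running the same function on a real genome and on a random sequence of equal length and composition gives a rough sense of how far the biological sequence departs from the random model.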

  10. The Micronutrient Genomics Project: A community-driven knowledge base for micronutrient research

    NARCIS (Netherlands)

    Ommen, B. van; El-Sohemy, A.; Hesketh, J.; Kaput, J.; Fenech, M.; Evelo, C.T.; McArdle, H.J.; Bouwman, J.; Lietz, G.; Mathers, J.C.; Fairweather-Tait, S.; Kranen, H. van; Elliott, R.; Wopereis, S.; Ferguson, L.R.; Méplan, C.; Perozzi, G.; Allen, L.; Rivero, D.

    2010-01-01

    Micronutrients influence multiple metabolic pathways including oxidative and inflammatory processes. Optimum micronutrient supply is important for the maintenance of homeostasis in metabolism and, ultimately, for maintaining good health. With advances in systems biology and genomics technologies, it

  12. Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

    Science.gov (United States)

    Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K

    2015-01-01

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set

  13. Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts

    Science.gov (United States)

    Pérez-Cabal, M. Angeles; Vazquez, Ana I.; Gianola, Daniel; Rosa, Guilherme J. M.; Weigel, Kent A.

    2012-01-01

    The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced similar accuracies to the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may depend also on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, which can be compensated by half-sibs or grandsires in the case of lack of close relatives. However, for the low heritability traits the inclusion of close relatives is crucial and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models, e.g., within or between family predictions, or within or across generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered. PMID:22403583
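
    Two of the cross-validation layouts described in this abstract, random allocation (RAN) and stratification by generation (GEN), can be sketched as below. This is an illustrative sketch only: the animal identifiers, generation numbers, test fraction and function names are hypothetical, and the kernel-based clustering used for the UNREL layout is not reproduced.

```python
import random


def random_split(ids, test_fraction=0.2, seed=0):
    """RAN-style layout: animals are allocated to train/test at random."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def generation_split(generation_of, test_generation):
    """GEN-style layout: train on older generations, test on younger ones."""
    train = [a for a, g in generation_of.items() if g < test_generation]
    test = [a for a, g in generation_of.items() if g >= test_generation]
    return train, test


# Hypothetical animals with an assumed generation number each.
generation_of = {f"animal_{i}": i // 4 for i in range(12)}  # generations 0..2

ran_train, ran_test = random_split(generation_of)
gen_train, gen_test = generation_split(generation_of, test_generation=2)
print("RAN test set:", sorted(ran_test))
print("GEN test set:", sorted(gen_test))
```

    The GEN split mirrors forward prediction across generations, which is why the abstract recommends choosing the layout that resembles the intended use of the model.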

  14. Traumatic Brain Injury Induces Genome-Wide Transcriptomic, Methylomic, and Network Perturbations in Brain and Blood Predicting Neurological Disorders

    Directory of Open Access Journals (Sweden)

    Qingying Meng

    2017-02-01

    The complexity of the traumatic brain injury (TBI) pathology, particularly concussive injury, is a serious obstacle for diagnosis, treatment, and long-term prognosis. Here we utilize modern systems biology in a rodent model of concussive injury to gain a thorough view of the impact of TBI on fundamental aspects of gene regulation, which have the potential to drive or alter the course of the TBI pathology. TBI perturbed epigenomic programming, transcriptional activities (expression level and alternative splicing), and the organization of genes in networks centered around genes such as Anax2, Ogn, and Fmod. Transcriptomic signatures in the hippocampus are involved in neuronal signaling, metabolism, inflammation, and blood function, and they overlap with those in leukocytes from peripheral blood. The homology between genomic signatures from blood and brain elicited by TBI provides proof of concept information for development of biomarkers of TBI based on composite genomic patterns. By intersecting with human genome-wide association studies, many TBI signature genes and network regulators identified in our rodent model were causally associated with brain disorders with relevant link to TBI. The overall results show that concussive brain injury reprograms genes which could lead to predisposition to neurological and psychiatric disorders, and that genomic information from peripheral leukocytes has the potential to predict TBI pathogenesis in the brain.

  15. Traumatic Brain Injury Induces Genome-Wide Transcriptomic, Methylomic, and Network Perturbations in Brain and Blood Predicting Neurological Disorders.

    Science.gov (United States)

    Meng, Qingying; Zhuang, Yumei; Ying, Zhe; Agrawal, Rahul; Yang, Xia; Gomez-Pinilla, Fernando

    2017-02-01

    The complexity of the traumatic brain injury (TBI) pathology, particularly concussive injury, is a serious obstacle for diagnosis, treatment, and long-term prognosis. Here we utilize modern systems biology in a rodent model of concussive injury to gain a thorough view of the impact of TBI on fundamental aspects of gene regulation, which have the potential to drive or alter the course of the TBI pathology. TBI perturbed epigenomic programming, transcriptional activities (expression level and alternative splicing), and the organization of genes in networks centered around genes such as Anax2, Ogn, and Fmod. Transcriptomic signatures in the hippocampus are involved in neuronal signaling, metabolism, inflammation, and blood function, and they overlap with those in leukocytes from peripheral blood. The homology between genomic signatures from blood and brain elicited by TBI provides proof of concept information for development of biomarkers of TBI based on composite genomic patterns. By intersecting with human genome-wide association studies, many TBI signature genes and network regulators identified in our rodent model were causally associated with brain disorders with relevant link to TBI. The overall results show that concussive brain injury reprograms genes which could lead to predisposition to neurological and psychiatric disorders, and that genomic information from peripheral leukocytes has the potential to predict TBI pathogenesis in the brain.

  16. COMPARISON OF TREND PROJECTION METHODS AND BACKPROPAGATION PROJECTIONS METHODS TREND IN PREDICTING THE NUMBER OF VICTIMS DIED IN TRAFFIC ACCIDENT IN TIMOR TENGAH REGENCY, NUSA TENGGARA

    Directory of Open Access Journals (Sweden)

    Aleksius Madu

    2016-10-01

    The purpose of this study is to predict the number of traffic accident victims who died in Timor Tengah Regency using the Trend Projection method and the Backpropagation method, to compare the two methods based on their error rates, and to predict the number of traffic accident victims in Timor Tengah Regency for the coming years. The research was conducted in Timor Tengah Regency, and the data were obtained from the Police Unit in Timor Tengah Regency. The data cover the number of traffic accidents in Timor Tengah Regency from 2000 to 2013 and were analyzed quantitatively with the Trend Projection and Backpropagation methods. For the Trend Projection method, the best model was the quadratic trend model $Y_k = 39.786 + 3.297X + 0.13X^2$. For the Backpropagation method, the optimum network consists of 2 inputs, 3 hidden screens, and 1 output. Based on the error rates obtained, the Backpropagation method is better than the Trend Projection method and is therefore the preferred method for predicting the number of traffic accident victims in Timor Tengah Regency. The predicted numbers of traffic accident victims for the next 5 years (2014-2018) are 106, 115, 115, 119 and 120 persons, respectively. Keywords: Trend Projection, Back propagation, Predicting.
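
    The quadratic trend model reported in this abstract can be evaluated with a few lines of code. This is a sketch under stated assumptions: the coefficients are taken from the abstract, but the abstract does not say how the time index X is coded (for example, whether X = 0 or X = 1 corresponds to the year 2000), so the values printed below are illustrative only. The `mean_squared_error` helper is a hypothetical stand-in for the unspecified error measure used to compare the two methods.

```python
def quadratic_trend(x, a=39.786, b=3.297, c=0.13):
    """Evaluate Y = a + b*X + c*X^2 with the coefficients from the abstract."""
    return a + b * x + c * x ** 2


def mean_squared_error(observed, predicted):
    """One possible error measure (an assumption; the abstract names none)."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# The coding of X is NOT given in the abstract; 0, 1, 2, ... is an assumption.
for x in range(3):
    print(x, round(quadratic_trend(x), 3))
```

    With an error function of this kind, two forecasting methods can be compared on the same held-out years by computing the error of each method's predictions against the observed counts.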

  17. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale

    Science.gov (United States)

    Zhou, Tianyin; Yang, Lin; Lu, Yan; Dror, Iris; Dantas Machado, Ana Carolina; Ghane, Tahereh; Di Felice, Rosa; Rohs, Remo

    2013-01-01

    We present a method and web server for predicting DNA structural features in a high-throughput (HT) manner for massive sequence data. This approach provides the framework for the integration of DNA sequence and shape analyses in genome-wide studies. The HT methodology uses a sliding-window approach to mine DNA structural information obtained from Monte Carlo simulations. It requires only nucleotide sequence as input and instantly predicts multiple structural features of DNA (minor groove width, roll, propeller twist and helix twist). The results of rigorous validations of the HT predictions based on DNA structures solved by X-ray crystallography and NMR spectroscopy, hydroxyl radical cleavage data, statistical analysis and cross-validation, and molecular dynamics simulations provide strong confidence in this approach. The DNAshape web server is freely available at http://rohslab.cmb.usc.edu/DNAshape/. PMID:23703209
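
    The sliding-window idea described in this abstract can be sketched as a table lookup. This is only an illustration, not the DNAshape implementation: the window length of 5, the assignment of each value to the central nucleotide, the function name and the table values are all assumptions made up for the example.

```python
def sliding_window_feature(sequence, table, k=5, default=None):
    """Slide a window of length k along the sequence and assign the
    table value of each window to the window's central position."""
    half = k // 2
    values = [default] * len(sequence)
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        values[i + half] = table.get(window, default)
    return values


# Made-up feature values for two pentamers (NOT real DNAshape data).
toy_table = {"AAAAA": 1.0, "AAAAC": 2.0}

print(sliding_window_feature("AAAAAC", toy_table))
```

    Positions near the sequence ends have no complete window and therefore keep the default value in this sketch.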

  18. The study of neural tube defects after the Human Genome Project and folic acid fortification of foods.

    Science.gov (United States)

    Graf, W D; Oleinik, O E

    2000-12-01

    The implementation of folic acid fortification will eliminate a proportion of neural tube defects (NTD). As a result, the etiologic and clinical profiles of the developmental disorder may both change. In the assessment of NTD as it evolves, the bioinformatics structure and content of the Human Genome Project will find vital application. One important development will be an enhanced understanding of the role of folic acid in global regulation of gene expression through epigenetic processes. In addition, bioinformatics will facilitate coordination of research in the basic sciences with clinical investigations to better define remaining etiologic factors.

  19. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  20. Pan-Genome Analysis of Human Gastric Pathogen H. pylori: Comparative Genomics and Pathogenomics Approaches to Identify Regions Associated with Pathogenicity and Prediction of Potential Core Therapeutic Targets

    DEFF Research Database (Denmark)

    Ali, Amjad; Naz, Anam; Soares, Siomar C.

    2015-01-01

    . Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan-genome...

  1. Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project.

    Science.gov (United States)

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Zinberg, Randi; Wasserstein, Melissa; Kasarskis, Andrew; Diaz, George A; Schadt, Eric E

    2017-02-01

    Providing ostensibly healthy individuals with personal results from whole-genome sequencing could lead to improved health and well-being via enhanced disease risk prediction, prevention, and diagnosis, but also poses practical and ethical challenges. Understanding how individuals react psychologically and behaviourally will be key in assessing the potential utility of personal whole-genome sequencing. We conducted an exploratory longitudinal cohort study in which quantitative surveys and in-depth qualitative interviews were conducted before and after personal results were returned to individuals who underwent whole-genome sequencing. The participants were offered a range of interpreted results, including Alzheimer's disease, type 2 diabetes, pharmacogenomics, rare disease-associated variants, and ancestry. They were also offered their raw data. Of the 35 participants at baseline, 29 (82.9%) completed the 6-month follow-up. In the quantitative surveys, test-related distress was low, although it was higher at 1-week than 6-month follow-up (Z=2.68, P=0.007). In the 6-month qualitative interviews, most participants felt happy or relieved about their results. A few were concerned, particularly about rare disease-associated variants and Alzheimer's disease results. Two of the 29 participants had sought clinical follow-up as a direct or indirect consequence of rare disease-associated variants results. Several had mentioned their results to their doctors. Some participants felt having their raw data might be medically useful to them in the future. The majority reported positive reactions to having their genomes sequenced, but there were notable exceptions to this. The impact and value of returning personal results from whole-genome sequencing when implemented on a larger scale remains to be seen.

  2. Psychological and behavioural impact of returning personal results from whole-genome sequencing: the HealthSeq project

    Science.gov (United States)

    Sanderson, Saskia C; Linderman, Michael D; Suckiel, Sabrina A; Zinberg, Randi; Wasserstein, Melissa; Kasarskis, Andrew; Diaz, George A; Schadt, Eric E

    2017-01-01

    Providing ostensibly healthy individuals with personal results from whole-genome sequencing could lead to improved health and well-being via enhanced disease risk prediction, prevention, and diagnosis, but also poses practical and ethical challenges. Understanding how individuals react psychologically and behaviourally will be key in assessing the potential utility of personal whole-genome sequencing. We conducted an exploratory longitudinal cohort study in which quantitative surveys and in-depth qualitative interviews were conducted before and after personal results were returned to individuals who underwent whole-genome sequencing. The participants were offered a range of interpreted results, including Alzheimer's disease, type 2 diabetes, pharmacogenomics, rare disease-associated variants, and ancestry. They were also offered their raw data. Of the 35 participants at baseline, 29 (82.9%) completed the 6-month follow-up. In the quantitative surveys, test-related distress was low, although it was higher at 1-week than 6-month follow-up (Z=2.68, P=0.007). In the 6-month qualitative interviews, most participants felt happy or relieved about their results. A few were concerned, particularly about rare disease-associated variants and Alzheimer's disease results. Two of the 29 participants had sought clinical follow-up as a direct or indirect consequence of rare disease-associated variants results. Several had mentioned their results to their doctors. Some participants felt having their raw data might be medically useful to them in the future. The majority reported positive reactions to having their genomes sequenced, but there were notable exceptions to this. The impact and value of returning personal results from whole-genome sequencing when implemented on a larger scale remains to be seen. PMID:28051073

  3. Computational methods using genome-wide association studies to predict radiotherapy complications and to identify correlative molecular processes

    Science.gov (United States)

    Oh, Jung Hun; Kerns, Sarah; Ostrer, Harry; Powell, Simon N.; Rosenstein, Barry; Deasy, Joseph O.

    2017-02-01

    The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method results in better predictive performance compared with existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis are used to identify key biological processes and proteins that were plausible based on other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs), yielding clinically useful risk stratification models, as well as identifying important underlying biological processes in the radiation damage and tissue repair process. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.

  4. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self-contradictory. For example, a protein may be predicted to
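
    One common form of inconsistency in a hierarchical vocabulary is a prediction that includes a term but omits one of its ancestors. The sketch below checks for that case only. It is not the FALCON algorithm, and the term names, the toy hierarchy and the function names are hypothetical.

```python
def ancestors(term, parents):
    """All ancestors of a term in a hierarchy given as child -> list of parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inconsistencies(predicted, parents):
    """Return sorted (term, missing_ancestor) pairs for a set of predicted terms."""
    return sorted(
        (term, anc)
        for term in predicted
        for anc in ancestors(term, parents)
        if anc not in predicted
    )


# Hypothetical three-level hierarchy (not real GO identifiers).
toy_parents = {"dna_binding": ["binding"], "binding": ["root"]}

print(inconsistencies({"dna_binding"}, toy_parents))
print(inconsistencies({"dna_binding", "binding", "root"}, toy_parents))
```

    A prediction set passes this check only if it is closed under the ancestor relation.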

  5. The Arab genome: Health and wealth.

    Science.gov (United States)

    Zayed, Hatem

    2016-11-01

    The 22 Arab nations have a unique genetic structure, which reflects both conserved and diverse gene pools due to the prevalent endogamous and consanguineous marriage culture and the long history of admixture among different ethnic subcultures descended from the Asian, European, and African continents. Human genome sequencing has enabled large-scale genomic studies of different populations and has become a powerful tool for studying disease predictions and diagnosis. Despite the importance of the Arab genome for better understanding the dynamics of the human genome, discovering rare genetic variations, and studying early human migration out of Africa, it is poorly represented in human genome databases, such as HapMap and the 1000 Genomes Project. In this review, I demonstrate the significance of sequencing the Arab genome and setting an Arab genome reference(s) for better understanding the molecular pathogenesis of genetic diseases, discovering novel/rare variants, and identifying a meaningful genotype-phenotype correlation for complex diseases.

  6. Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers

    DEFF Research Database (Denmark)

    Su, Guosheng; Christensen, Ole Fredslund; Ostersen, Tage;

    2012-01-01

    Non-additive genetic variation is usually ignored when genome-wide markers are used to study the genetic architecture and genomic prediction of complex traits in human, wild life, model organisms or farm animals. However, non-additive genetic effects may have an important contribution to total...... genetic variation of complex traits. This study presented a genomic BLUP model including additive and non-additive genetic effects, in which additive and non-additive genetic relation matrices were constructed from information of genome-wide dense single nucleotide polymorphism (SNP) markers. In addition...... of genomic predictions for daily gain in pigs. In the analysis of daily gain, four linear models were used: 1) a simple additive genetic model (MA), 2) a model including both additive and additive by additive epistatic genetic effects (MAE), 3) a model including both additive and dominance genetic effects...
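
    A marker-based additive relationship matrix of the kind used in genomic BLUP can be sketched as below. The abstract does not state which construction the authors used; this sketch assumes the widely used form G = ZZ' / (2 Σ p(1-p)), with genotypes coded 0/1/2 and Z centered by twice the allele frequency. The genotype data and function names are hypothetical, and the dominance and epistatic matrices mentioned in the abstract are not reproduced.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele at each marker (genotypes coded 0/1/2)."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]


def additive_relationship_matrix(genotypes):
    """Assumed construction: G = Z Z' / (2 * sum_j p_j (1 - p_j))."""
    p = allele_frequencies(genotypes)
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    # Center each marker column by twice its allele frequency.
    z = [[row[j] - 2 * p[j] for j in range(len(p))] for row in genotypes]
    n = len(genotypes)
    return [
        [sum(z[i][k] * z[l][k] for k in range(len(p))) / scale for l in range(n)]
        for i in range(n)
    ]


# Hypothetical genotypes: 3 individuals x 3 markers.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
for row in additive_relationship_matrix(toy_genotypes):
    print([round(v, 3) for v in row])
```

    In a GBLUP analysis, a matrix like this would take the place of the pedigree-based relationship matrix in the mixed model equations.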

  7. Comparison on genomic predictions using GBLUP models and two single-step blending methods with different relationship matrices in the Nordic Holstein population

    DEFF Research Database (Denmark)

    Gao, Hongding; Christensen, Ole Fredslund; Madsen, Per

    2012-01-01

    Background A single-step blending approach allows genomic prediction using information of genotyped and non-genotyped animals simultaneously. However, the combined relationship matrix in a single-step method may need to be adjusted because marker-based and pedigree-based relationship matrices may...... not be on the same scale. The same may apply when a GBLUP model includes both genomic breeding values and residual polygenic effects. The objective of this study was to compare single-step blending methods and GBLUP methods with and without adjustment of the genomic relationship matrix for genomic prediction of 16......) a simple GBLUP method, 2) a GBLUP method with a polygenic effect, 3) an adjusted GBLUP method with a polygenic effect, 4) a single-step blending method, and 5) an adjusted single-step blending method. In the adjusted GBLUP and single-step methods, the genomic relationship matrix was adjusted...

  8. Best linear unbiased prediction of genomic breeding values using a trait-specific marker-derived relationship matrix.

    Directory of Open Access Journals (Sweden)

    Zhe Zhang

    BACKGROUND: With the availability of high density whole-genome single nucleotide polymorphism chips, genomic selection has become a promising method to estimate genetic merit with potentially high accuracy for animal, plant and aquaculture species of economic importance. With markers covering the entire genome, genetic merit of genotyped individuals can be predicted directly within the framework of mixed model equations, by using a matrix of relationships among individuals that is derived from the markers. Here we extend that approach by deriving a marker-based relationship matrix specifically for the trait of interest. METHODOLOGY/PRINCIPAL FINDINGS: In the framework of mixed model equations, a new best linear unbiased prediction (BLUP) method including a trait-specific relationship matrix (TA) was presented and termed TABLUP. The TA matrix was constructed on the basis of marker genotypes and their weights in relation to the trait of interest. A simulation study with 1,000 individuals as the training population and five successive generations as candidate population was carried out to validate the proposed method. The proposed TABLUP method outperformed the ridge regression BLUP (RRBLUP) and BLUP with realized relationship matrix (GBLUP). It performed slightly worse than BayesB with an accuracy of 0.79 in the standard scenario. CONCLUSIONS/SIGNIFICANCE: The proposed TABLUP method is an improvement of the RRBLUP and GBLUP method. It might be equivalent to the BayesB method but it has additional benefits like the calculation of accuracies for individual breeding values. The results also showed that the TA-matrix performs better in predicting ability than the classical numerator relationship matrix and the realized relationship matrix which are derived solely from pedigree or markers without regard to the trait. This is because the TA-matrix not only accounts for the Mendelian sampling term, but also puts the greater emphasis on those markers that

  9. A unified and comprehensible view of parametric and kernel methods for genomic prediction with application to rice

    Directory of Open Access Journals (Sweden)

    Laval Jacquin

    2016-08-01

    One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods, used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Furthermore, another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression (i.e. genomic best linear unbiased predictor (GBLUP)) and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel trick concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Some parametric and kernel methods, namely least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were thereupon compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
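
    The reformulation of ridge regression through the kernel trick can be sketched in a few lines. This is a minimal illustration, not the KRMM package or its API: the function names, toy marker data, bandwidth and regularization values are assumptions. In the dual form, the coefficients solve (K + λI)α = y and a prediction is Σ α_i k(x, x_i). Using a linear kernel gives the same predictions as ordinary ridge regression on the markers, while a Gaussian kernel gives a non-linear RKHS-type predictor.

```python
import math


def linear_kernel(x, y):
    """Dot product; with this kernel the dual form equals ordinary ridge regression."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * bandwidth ** 2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_kernel_ridge(xs, ys, lam, kernel):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    n = len(xs)
    k = [[kernel(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(k, ys)


def predict(x_new, xs, alpha, kernel):
    """Prediction sum_i alpha_i * k(x_new, x_i)."""
    return sum(a * kernel(x_new, x) for a, x in zip(alpha, xs))


# Hypothetical marker genotypes (2 markers) and phenotypes for 4 lines.
xs = [(0.0, 1.0), (1.0, 2.0), (2.0, 0.0), (1.0, 1.0)]
ys = [1.0, 2.5, 0.5, 1.5]
alpha = fit_kernel_ridge(xs, ys, lam=0.5, kernel=gaussian_kernel)
print(round(predict((2.0, 1.0), xs, alpha, gaussian_kernel), 3))
```

    Only the kernel function changes between the linear and the Gaussian case, which is the practical appeal of the dual formulation when epistatic (non-additive) effects are suspected.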

  10. Accuracy of genomic prediction of purebreds for cross bred performance in pigs

    NARCIS (Netherlands)

    Marubayashi Hidalgo, Andre; Bastiaansen, J.W.M.; Lopes, M.S.; Calus, M.P.L.; Koning, de D.J.

    2016-01-01

    In pig breeding, as the final product is a cross bred (CB) animal, the goal is to increase the CB performance. This goal requires different strategies for the implementation of genomic selection from what is currently implemented in, for example dairy cattle breeding. A good strategy is to estima

  11. Bias due to selective genotyping in genomic prediction using H-BLUP

    DEFF Research Database (Denmark)

    Wang, Lei; Madsen, Per; Sapp, Robyn

    H-BLUP uses a variance-covariance structure based on a combined relationship matrix (H), which augments a pedigree-based relationship matrix (A) with a genomic relationship matrix (G) for genotyped individuals. In practice, often only preselected individuals are genotyped and this selective genot...

  12. Gross genomic damage measured by DNA image cytometry independently predicts gastric cancer patient survival

    NARCIS (Netherlands)

    Belien, J.A.M.; Buffart, T.E.; Gill, A.; Broeckaert, M.A.M.; Quirke, P.; Meijer, G.A.; Grabsch, H.

    2009-01-01

    BACKGROUND: DNA aneuploidy reflects gross genomic changes. It can be measured by flow cytometry (FCM-DNA) or image cytometry (ICM-DNA). In gastric cancer, the prevalence of DNA aneuploidy has been reported to range from 27 to 100%, with conflicting associations with clinicopathological variables. Th

  13. Prediction of total genetic value using genome-wide dense marker maps

    NARCIS (Netherlands)

    Meuwissen, T.H.; Hayes, B.J.; Goddard, M.E.

    2001-01-01

    Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of ∼50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was

  14. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sukosd, Zsuzsanna; Andersen, Ebbe S.; Seemann, Stefan E.

    2015-01-01

    protein-coding regions the COS is supported by a particularly high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure with potential...

  15. The role of genomics in the identification, prediction, and prevention of biological threats.

    Science.gov (United States)

    Fricke, W Florian; Rasko, David A; Ravel, Jacques

    2009-10-01

    In all likelihood, it is only a matter of time before our public health system will face a major biological threat, whether intentionally dispersed or originating from a known or newly emerging infectious disease. It is necessary not only to increase our reactive "biodefense," but also to be proactive and increase our preparedness. To achieve this goal, it is essential that the scientific and public health communities fully embrace the genomic revolution, and that novel bioinformatic and computing tools necessary to make great strides in our understanding of these novel and emerging threats be developed. Genomics has graduated from a specialized field of science to a research tool that soon will be routine in research laboratories and clinical settings. Because the technology is becoming more affordable, genomics can and should be used proactively to build our preparedness and responsiveness to biological threats. All pieces, including major continued funding, advances in next-generation sequencing technologies, bioinformatics infrastructures, and open access to data and metadata, are being set in place for genomics to play a central role in our public health system.

  16. The role of genomics in the identification, prediction, and prevention of biological threats.

    Directory of Open Access Journals (Sweden)

    W Florian Fricke

    2009-10-01

    In all likelihood, it is only a matter of time before our public health system will face a major biological threat, whether intentionally dispersed or originating from a known or newly emerging infectious disease. It is necessary not only to increase our r