WorldWideScience

Sample records for genomic islands predict

  1. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    Science.gov (United States)

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing microbial genomes is important because of the antibiotic and pathogenic activities that microbes carry. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenicity or antibiotic genes are carried in genomic islands, so a quick genomic island (GI) prediction method is useful for genomes that are still being sequenced. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) that integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). Draft genomes from ongoing genome projects can be submitted to the Web server as contigs or scaffolds, and the server returns functional annotation and high-probability GI predictions. GI-POP is a comprehensive annotation Web server designed for analyzing ongoing genome projects. Researchers can obtain pre-analytic information from their draft genomes, including possible GIs, coding/non-coding sequences, and functional analyses. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.

  2. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models

    Directory of Open Access Journals (Sweden)

    Surovcik Katharina

    2006-03-01

    Background: Horizontal gene transfer (HGT) is considered a strong evolutionary force that substantially shapes the content of microbial genomes. What distinguishes HGT from gene genesis, duplication, or mutation is its speed, which enables rapid adaptation to changing environmental demands. Precise characterization requires algorithms that identify transfer events with high reliability. The transferred pieces of DNA are frequently long, comprise several genes, and are called genomic islands (GIs) or, more specifically, pathogenicity or symbiotic islands. Results: We have implemented the program SIGI-HMM, which predicts GIs and the putative donor of each individual alien gene. It is based on analyzing the codon usage (CU) of each individual gene of the genome under study. The CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors, and to mask putatively highly expressed genes. From these tests, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working at the gene level. For the transition probabilities, we draw on classical test theory, with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in Java and is publicly available. It accepts as input any file in EMBL format and generates output in the common GFF format, which genome browsers can read. Benchmark tests showed that the output of SIGI-HMM agrees with known findings: its predictions were consistent both with annotated GIs and with predictions generated by other methods. Conclusion: SIGI-HMM is a sensitive tool for identifying GIs in microbial genomes. It allows users to analyze genomes interactively in detail and to generate or test hypotheses about the origin of acquired
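
    The gene-level HMM idea can be illustrated with a minimal two-state Viterbi decoder that labels consecutive genes as native or alien. This is a sketch under simplifying assumptions, not SIGI-HMM's implementation: the per-gene emission log-likelihoods are supplied directly (SIGI-HMM derives its states and emissions from codon-usage tests), and the stay probability `p_stay` and start prior `p_native_start` are hypothetical fixed values, giving a homogeneous model where SIGI-HMM's is inhomogeneous with transitions from test theory.

```python
import math

def viterbi_two_state(scores, p_stay=0.9, p_native_start=0.99):
    """Most probable native/alien labelling of consecutive genes under a two-state HMM.

    scores: one (log P(obs | native), log P(obs | alien)) pair per gene, in genome order.
    p_stay: hypothetical probability that adjacent genes share a state (homogeneous model).
    p_native_start: hypothetical prior that the first gene is native.
    """
    states = ("native", "alien")
    log_trans = [[math.log(p_stay), math.log(1 - p_stay)],
                 [math.log(1 - p_stay), math.log(p_stay)]]
    v = [math.log(p_native_start) + scores[0][0],
         math.log(1 - p_native_start) + scores[0][1]]
    back = []  # back[t][s] = best previous state for state s at gene t + 1
    for emit in scores[1:]:
        new_v, ptr = [], []
        for s in range(2):
            cand = [v[r] + log_trans[r][s] for r in range(2)]
            best = 0 if cand[0] >= cand[1] else 1
            new_v.append(cand[best] + emit[s])
            ptr.append(best)
        v = new_v
        back.append(ptr)
    # Trace back from the best final state.
    state = 0 if v[0] >= v[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return [states[s] for s in reversed(path)]
```

    Because every state switch costs `log(1 - p_stay)`, a single weakly alien-looking gene between native genes, as in `viterbi_two_state([(0.0, -5.0), (-1.0, 0.0), (0.0, -5.0)])`, stays labelled native, while a run of strongly alien genes is labelled as a contiguous island.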

  3. Genomic islands predict functional adaptation in marine actinobacteria

    Energy Technology Data Exchange (ETDEWEB)

    Penn, Kevin; Jenkins, Caroline; Nett, Markus; Udwary, Daniel; Gontang, Erin; McGlinchey, Ryan; Foster, Brian; Lapidus, Alla; Podell, Sheila; Allen, Eric; Moore, Bradley; Jensen, Paul

    2009-04-01

    Linking functional traits to bacterial phylogeny remains a fundamental but elusive goal of microbial ecology [1]. Without this information, it is impossible to resolve meaningful units of diversity and the mechanisms by which bacteria interact with each other and adapt to environmental change. Ecological adaptations among bacterial populations have been linked to genomic islands, strain-specific regions of DNA that house functionally adaptive traits [2]. In environmental bacteria, these traits are largely inferred from bioinformatic or gene expression analyses [2], leaving few examples in which the functions of island genes have been experimentally characterized. Here we report the complete genome sequences of Salinispora tropica and S. arenicola, the first cultured, obligate marine Actinobacteria [3]. These two species inhabit benthic marine environments and dedicate 8-10 percent of their genomes to the biosynthesis of secondary metabolites. Despite a close phylogenetic relationship, 25 of 37 secondary metabolic pathways are species-specific and located within 21 genomic islands, providing new evidence linking secondary metabolism to ecological adaptation. Species-specific differences are also observed in CRISPR sequences, suggesting that variations in phage immunity provide fitness advantages that contribute to the cosmopolitan distribution of S. arenicola [4]. The two Salinispora genomes have evolved by complex processes that include the duplication and acquisition of secondary metabolite genes, whose products provide immediate opportunities for molecular diversification and ecological adaptation. Evidence that secondary metabolic pathways are exchanged by horizontal gene transfer (HGT) yet are fixed among globally distributed populations [5] supports a functional role for their products and suggests that pathway acquisition represents a previously unrecognized force driving bacterial diversification.

  4. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Directory of Open Access Journals (Sweden)

    Daniel M de Brito

    Genomic islands (GIs) are regions of bacterial genomes acquired from other organisms through horizontal transfer. These regions are often responsible for important acquired adaptations of bacteria, with great impact on their evolution and behavior; such adaptations are usually associated with pathogenicity, antibiotic resistance, degradation, and metabolism. Identifying these regions is therefore of medical and industrial interest, and different approaches to GI prediction have been proposed. However, none of them can precisely predict the complete repertoire of GIs in a genome, because algorithm performance varies with the differing nucleotide composition of different species. In this paper, we present a novel GI prediction method built on the mean shift clustering algorithm. It requires no information about the number of clusters, and the bandwidth parameter is calculated automatically using a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria whose GIs have been discussed in other papers were used to evaluate the method. The tool recovered the GIs predicted by other methods and also revealed novel, previously unpredicted islands. A detailed investigation of features typical of GI elements in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions of the method are available at http://msgip.integrativebioinformatics.me.
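
    A minimal flat-kernel mean shift sketch shows the clustering idea MSGIP builds on: each point repeatedly moves to the mean of its neighbours within the bandwidth until it stops moving, and points that converge to the same mode share a cluster. Assumptions: the points are generic feature vectors (for example, per-window G+C content), the bandwidth is passed in by hand rather than computed by MSGIP's heuristic, and `mean_shift` is a hypothetical helper, not MSGIP's code.

```python
import math

def mean_shift(points, bandwidth, tol=1e-6, max_iter=100):
    """Flat-kernel mean shift. points: list of equal-length tuples.
    Returns (labels, centers): a cluster label per point and the mode of each cluster."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            # Neighbours within the bandwidth; never empty, since the mean of the previous
            # neighbourhood lies within the bandwidth of at least one of its points.
            neigh = [q for q in points if math.dist(x, q) <= bandwidth]
            new = [sum(c) / len(neigh) for c in zip(*neigh)]
            shift = math.dist(x, new)
            x = new
            if shift < tol:
                break
        modes.append(x)
    # Merge modes closer than half the bandwidth into one cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers
```

    As the abstract notes, the number of clusters is not an input here: it falls out of how many distinct modes the points converge to, so the bandwidth is the only parameter that must be chosen.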

  5. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most rely on either annotations or comparisons with closely related genomes, so they cannot easily be applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, methods are needed that detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM) and exploits composition bias in terms of k-mer content. In our evaluations on three real genomes, GI-SVM achieved higher recall than current methods without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to obtain optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in first-pass detection of GIs in newly sequenced genomes.
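
    GI-SVM's input is k-mer composition computed from the raw sequence. The sketch below shows that feature extraction over sliding windows; to stay dependency-free it replaces the one-class SVM with a much cruder stand-in, the L1 distance of each window's k-mer profile from the whole-genome profile. The window size, step, and k are hypothetical defaults, not GI-SVM's parameters.

```python
from collections import Counter
from itertools import product

def kmer_profile(seq, k):
    """Normalised k-mer frequency vector over ACGT (k-mers containing other symbols are ignored)."""
    kmers = ["".join(t) for t in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]

def window_profiles(genome, k=3, size=5000, step=2500):
    """(start, k-mer profile) for overlapping windows along the genome."""
    return [(s, kmer_profile(genome[s:s + size], k))
            for s in range(0, max(len(genome) - size, 0) + 1, step)]

def deviation_scores(genome, k=3, size=5000, step=2500):
    """Stand-in outlier score (not an SVM): L1 distance of each window from the genome-wide profile."""
    ref = kmer_profile(genome, k)
    return [(s, sum(abs(a - b) for a, b in zip(p, ref)))
            for s, p in window_profiles(genome, k, size, step)]
```

    In a fuller version, the window profiles would instead be passed to a one-class SVM (for example scikit-learn's `OneClassSVM`), and windows classified as outliers would become candidate GIs.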

  6. Defense islands in bacterial and archaeal genomes and prediction of novel defense systems.

    Science.gov (United States)

    Makarova, Kira S; Wolf, Yuri I; Snir, Sagi; Koonin, Eugene V

    2011-11-01

    The arms race between cellular life forms and viruses is a major driving force of evolution. A substantial fraction of bacterial and archaeal genomes is dedicated to antivirus defense. We analyzed the distribution of defense genes and typical mobilome components (such as viral and transposon genes) in bacterial and archaeal genomes and demonstrated statistically significant clustering of antivirus defense systems and mobile genes and elements in genomic islands. The defense islands are enriched in putative operons and contain numerous overrepresented gene families. A detailed sequence analysis of the proteins encoded by genes in these families shows that many of them are diverged variants of known defense system components, whereas others show features, such as characteristic operonic organization, that are suggestive of novel defense systems. Thus, genomic islands provide abundant material for the experimental study of bacterial and archaeal antivirus defense. Except for the CRISPR-Cas systems, different classes of defense systems, in particular toxin-antitoxin and restriction-modification systems, show nonrandom clustering in defense islands. It remains unclear to what extent these associations reflect functional cooperation between different defense systems and to what extent the islands are genomic "sinks" that accumulate diverse nonessential genes, particularly those acquired via horizontal gene transfer. The characteristics of defense islands resemble those of mobilome islands. Defense and mobilome genes are nonrandomly associated in islands, suggesting nonadaptive evolution of the islands via a preferential attachment-like mechanism underpinned by the addictive properties of defense systems such as toxins-antitoxins and an important role of horizontal mobility in the evolution of these islands.

  7. Defense Islands in Bacterial and Archaeal Genomes and Prediction of Novel Defense Systems

    Science.gov (United States)

    Makarova, Kira S.; Wolf, Yuri I.; Snir, Sagi; Koonin, Eugene V.

    2011-01-01

    The arms race between cellular life forms and viruses is a major driving force of evolution. A substantial fraction of bacterial and archaeal genomes is dedicated to antivirus defense. We analyzed the distribution of defense genes and typical mobilome components (such as viral and transposon genes) in bacterial and archaeal genomes and demonstrated statistically significant clustering of antivirus defense systems and mobile genes and elements in genomic islands. The defense islands are enriched in putative operons and contain numerous overrepresented gene families. A detailed sequence analysis of the proteins encoded by genes in these families shows that many of them are diverged variants of known defense system components, whereas others show features, such as characteristic operonic organization, that are suggestive of novel defense systems. Thus, genomic islands provide abundant material for the experimental study of bacterial and archaeal antivirus defense. Except for the CRISPR-Cas systems, different classes of defense systems, in particular toxin-antitoxin and restriction-modification systems, show nonrandom clustering in defense islands. It remains unclear to what extent these associations reflect functional cooperation between different defense systems and to what extent the islands are genomic “sinks” that accumulate diverse nonessential genes, particularly those acquired via horizontal gene transfer. The characteristics of defense islands resemble those of mobilome islands. Defense and mobilome genes are nonrandomly associated in islands, suggesting nonadaptive evolution of the islands via a preferential attachment-like mechanism underpinned by the addictive properties of defense systems such as toxins-antitoxins and an important role of horizontal mobility in the evolution of these islands. PMID:21908672

  8. Unsupervised statistical identification of genomic islands using ...

    Indian Academy of Sciences (India)

    Vibrio species. These investigations led to observations that are of evolutionary ... Identification of genomic islands in prokaryotic genomes has received considerable attention in the literature due to ... For instance, selective pressures as a ...

  9. Genomic prediction using subsampling

    OpenAIRE

    Xavier, Alencar; Xu, Shizhong; Muir, William; Rainey, Katy Martin

    2017-01-01

    Background: Genome-wide assisted selection is a critical tool for the genetic improvement of plants and animals. Whole-genome regression models in a Bayesian framework represent the main family of prediction methods. Fitting such models with a large number of observations involves a prohibitive computational burden. We propose the use of a subsampling bootstrap Markov chain in genomic prediction. This method consists of fitting whole-genome regression models by subsampling observations in each rou...

  10. Genomic island excisions in Bordetella petrii

    Directory of Open Access Journals (Sweden)

    Levillain Erwan

    2009-07-01

    Background: Among the members of the genus Bordetella, B. petrii is unique: it is the only species isolated from the environment, whereas the pathogenic Bordetellae are obligately associated with host organisms. Another feature distinguishing B. petrii from the other sequenced Bordetellae is the presence of a large number of mobile genetic elements, including several large genomic regions with typical characteristics of genomic islands, collectively known as integrative and conjugative elements (ICEs). These elements mainly encode accessory metabolic factors that enable this bacterium to grow on a large repertoire of aromatic compounds. Results: During in vitro culture of Bordetella petrii, colony variants appear frequently. We show that this variability can be attributed to the presence of a large number of metastable mobile genetic elements on its chromosome. In fact, the genome sequence of B. petrii revealed at least seven large genomic islands, mostly encoding accessory metabolic functions involved in the degradation of aromatic compounds and the detoxification of heavy metals. Four of these islands (termed GI1 to GI3 and GI6) are highly related to ICEclc of Pseudomonas knackmussii sp. strain B13. Here we present the first data on the molecular characterization of these islands. We defined the exact borders of each island and show that these islands are excised from the chromosome during standard culture of the bacteria. For all but one of these islands (GI5), we could detect circular intermediates. For the clc-like elements GI1 to GI3 of B. petrii, we provide evidence that tandem insertion of these islands, which all encode highly related integrases and attachment sites, may also lead to the incorporation of genomic DNA that was originally not part of the island and to the formation of huge composite islands. By integrating a tetracycline resistance cassette into GI3, we found this island to be rather unstable and to be lost from

  11. Genomic prediction using subsampling.

    Science.gov (United States)

    Xavier, Alencar; Xu, Shizhong; Muir, William; Rainey, Katy Martin

    2017-03-24

    Genome-wide assisted selection is a critical tool for the genetic improvement of plants and animals. Whole-genome regression models in a Bayesian framework represent the main family of prediction methods. Fitting such models with a large number of observations involves a prohibitive computational burden. We propose the use of a subsampling bootstrap Markov chain in genomic prediction. This method consists of fitting whole-genome regression models by subsampling observations in each round of a Markov chain Monte Carlo sampler. We evaluated the effect of subsampling bootstrap on prediction and computational parameters. Across datasets, we observed an optimal subsampling proportion of around 50% of observations with replacement and around 33% without replacement. Subsampling substantially decreased computation time, halving the time needed to fit the model. On average, the losses in predictive properties imposed by subsampling were negligible, usually below 1%. For each dataset, an optimal subsampling point that improves prediction properties was observed, but the improvements were also negligible. Combining subsampling with Gibbs sampling yields an interesting ensemble algorithm. The investigation indicates that the subsampling bootstrap Markov chain algorithm substantially reduces the computational burden of model fitting, and it may slightly enhance prediction properties.
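
    The subsampling idea can be sketched with a Gibbs sampler for a deliberately tiny model, y = mu + b*x + e, with flat priors and a known residual variance, where each round conditions on a bootstrap subsample (drawn with replacement) of the observations. This is a simplified illustration, not the authors' whole-genome regression: the single covariate, the fixed `sigma2`, and the iteration counts are assumptions. The default `frac=0.5` mirrors the roughly 50% with-replacement proportion the abstract reports as optimal.

```python
import random
import statistics

def subsampled_gibbs(x, y, frac=0.5, n_iter=2000, burn_in=200, sigma2=1.0, seed=1):
    """Gibbs sampler for y = mu + b*x + e (flat priors, known residual variance sigma2).
    Each round draws a bootstrap subsample of the observations and samples mu and b
    from their full conditionals given that subsample. Returns posterior means (mu, b)."""
    rng = random.Random(seed)
    n = len(y)
    m = max(2, int(frac * n))
    mu, b = 0.0, 0.0
    draws = []
    for it in range(n_iter):
        idx = [rng.randrange(n) for _ in range(m)]  # subsample with replacement
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # mu | b, data ~ Normal(mean(y - b*x), sigma2 / m)
        mu = rng.gauss(statistics.fmean(yi - b * xi for xi, yi in zip(xs, ys)), (sigma2 / m) ** 0.5)
        # b | mu, data ~ Normal(sum x(y - mu) / sum x^2, sigma2 / sum x^2)
        sxx = sum(xi * xi for xi in xs)
        b = rng.gauss(sum(xi * (yi - mu) for xi, yi in zip(xs, ys)) / sxx, (sigma2 / sxx) ** 0.5)
        if it >= burn_in:
            draws.append((mu, b))
    return statistics.fmean(d[0] for d in draws), statistics.fmean(d[1] for d in draws)
```

    Each round touches only `m` observations instead of all `n`, which is where the computational saving comes from; the price is a noisier conditional at every step.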

  12. CpG island mapping by epigenome prediction.

    Directory of Open Access Journals (Sweden)

    Christoph Bock

    2007-06-01

    CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of large-scale epigenetic datasets. Although widely used, the current CpG island criteria incur significant disadvantages: (1) reliance on arbitrary threshold parameters that bear little biological justification, (2) failure to account for widespread heterogeneity among CpG islands, and (3) apparent lack of specificity when applied to the human genome. This study is driven by the idea that a quantitative score of "CpG island strength" that incorporates epigenetic and functional aspects can help resolve these issues. We construct an epigenome prediction pipeline that links the DNA sequence of CpG islands to their epigenetic states, including DNA methylation, histone modifications, and chromatin accessibility. By training support vector machines on epigenetic data for CpG islands on human Chromosomes 21 and 22, we identify informative DNA attributes that correlate with open versus compact chromatin structures. These DNA attributes are used to predict the epigenetic states of all CpG islands genome-wide. Combining predictions for multiple epigenetic features, we estimate the inherent CpG island strength for each CpG island in the human genome, i.e., its inherent tendency to exhibit an open and transcriptionally competent chromatin structure. We extensively validate our results on independent datasets, showing that the CpG island strength predictions are applicable and informative across different tissues and cell types, and we derive improved maps of predicted "bona fide" CpG islands. The mapping of CpG islands by epigenome prediction is conceptually superior to identifying CpG islands by widely used sequence criteria since it links CpG island detection to

  13. CpG island mapping by epigenome prediction.

    Science.gov (United States)

    Bock, Christoph; Walter, Jörn; Paulsen, Martina; Lengauer, Thomas

    2007-06-01

    CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of large-scale epigenetic datasets. Although widely used, the current CpG island criteria incur significant disadvantages: (1) reliance on arbitrary threshold parameters that bear little biological justification, (2) failure to account for widespread heterogeneity among CpG islands, and (3) apparent lack of specificity when applied to the human genome. This study is driven by the idea that a quantitative score of "CpG island strength" that incorporates epigenetic and functional aspects can help resolve these issues. We construct an epigenome prediction pipeline that links the DNA sequence of CpG islands to their epigenetic states, including DNA methylation, histone modifications, and chromatin accessibility. By training support vector machines on epigenetic data for CpG islands on human Chromosomes 21 and 22, we identify informative DNA attributes that correlate with open versus compact chromatin structures. These DNA attributes are used to predict the epigenetic states of all CpG islands genome-wide. Combining predictions for multiple epigenetic features, we estimate the inherent CpG island strength for each CpG island in the human genome, i.e., its inherent tendency to exhibit an open and transcriptionally competent chromatin structure. We extensively validate our results on independent datasets, showing that the CpG island strength predictions are applicable and informative across different tissues and cell types, and we derive improved maps of predicted "bona fide" CpG islands. 
The mapping of CpG islands by epigenome prediction is conceptually superior to identifying CpG islands by widely used sequence criteria since it links CpG island detection to their characteristic
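
    One widely used version of the "simple DNA sequence criteria" mentioned above is the Gardiner-Garden and Frommer definition: a length of at least 200 bp, G+C content above 50%, and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count x length) / (C count x G count). The abstract does not say which criteria it compares against, so this is an illustrative sketch; the helper names are hypothetical, and real annotation tools scan windows along a chromosome rather than testing one string.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc, obs_exp

def meets_sequence_criteria(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Gardiner-Garden and Frommer style thresholds (fixed cut-offs, as criticised in the abstract)."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp
```

    The fixed cut-offs in `meets_sequence_criteria` are exactly the kind of "arbitrary threshold parameters" the study argues against: a G+C-rich but CpG-depleted sequence such as `"CCAGG" * 40` fails only the ratio test, with no graded notion of island strength.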

  14. Genomic Prediction in Barley

    DEFF Research Database (Denmark)

    Edriss, Vahid; Cericola, Fabio; Jensen, Jens D

    2015-01-01

    to next generation. The main goal of this study was to assess the potential of genomic prediction in a commercial barley breeding program. The data came from Nordic Seed, a company located in Denmark. Around 350 advanced lines were genotyped with the 9K barley chip from Illumina. ... The traits studied were grain yield, plant height, and heading date, where heading date is the number of days after 1 June that a plant takes to head. Heritabilities, for the average of nine plots, were 0.33, 0.44, and 0.48 for yield, height, and heading date, respectively. The GBLUP model was used for genomic...
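
    GBLUP rests on a genomic relationship matrix G computed from marker genotypes. A common construction is VanRaden's first method, G = ZZ' / (2 * sum_j p_j(1 - p_j)), where Z holds genotypes coded 0/1/2 and centred by twice the allele frequency p_j. The abstract does not say which G was used in this study, so the sketch is illustrative only; it assumes no missing genotypes and at least one polymorphic marker (otherwise the denominator is zero).

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)), VanRaden method 1.

    genotypes: one row per line, one column per marker, coded 0/1/2 (copies of the counted allele).
    Assumes no missing values and at least one polymorphic marker.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each marker.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centre each genotype by its expected value 2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(zi, zk)) / denom for zk in z] for zi in z]
```

    In GBLUP, this matrix takes the place of the pedigree relationship matrix as the covariance structure of the genomic breeding values; for the roughly 350 lines here it would be a 350 x 350 matrix.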

  15. SIGI: score-based identification of genomic islands

    Directory of Open Access Journals (Sweden)

    Merkl Rainer

    2004-03-01

    Background: Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions, and they are assumed to be frequently acquired via horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, these islands must be reliably identified and characterized. Results: A scoring scheme on codon frequencies, $\mathrm{Score}_{G_1 G_2}(cdn) = \log\left( f_{G_2}(cdn) / f_{G_1}(cdn) \right)$, was used. To analyse genes of a species $G_1$ and test their relatedness to species $G_2$, scores were determined by applying the formula to log-odds derived from the mean codon frequencies of the two genomes. A non-redundant set of nearly 400 codon usage tables of microbial species was derived, and its members were used in turn at position $G_2$. Genes with at least one score above a species-specific, dynamically determined cut-off were analysed further. Cluster analysis identified genes that form clusters of statistically significant size; these clusters were predicted as genomic islands. Finally, for each of these genes individually, the taxonomic relation among the species responsible for significant scores was interpreted. The validity of the approach and its limitations were made plausible by an extensive analysis of natural genes and of synthetic genes designed to model the process of gene amelioration. Conclusions: The method reliably identifies genomic islands and the likely origin of alien genes.
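
    The SIGI scoring formula can be applied directly to two codon-frequency tables. In this sketch the tables are plain dictionaries mapping codons to frequencies, and summarising a gene by the mean of its per-codon scores is an assumption made for illustration; the abstract does not specify SIGI's exact aggregation or its cut-off procedure.

```python
import math

def codon_scores(freq_g1, freq_g2):
    """Score_G1G2(cdn) = log(f_G2(cdn) / f_G1(cdn)) for every codon with nonzero frequency in both tables."""
    return {c: math.log(freq_g2[c] / freq_g1[c])
            for c in freq_g1 if freq_g1[c] > 0 and freq_g2.get(c, 0) > 0}

def gene_score(codons, scores):
    """Hypothetical aggregation: mean per-codon score over a gene's codons (unscored codons skipped)."""
    vals = [scores[c] for c in codons if c in scores]
    return sum(vals) / len(vals) if vals else 0.0
```

    A positive gene score means the gene's codons are, on average, more typical of the candidate donor $G_2$ than of its host $G_1$; repeating this for each of the roughly 400 tables at position $G_2$ gives the per-gene score profile that SIGI then thresholds and clusters.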

  16. Statistical analyses of conserved features of genomic islands in bacteria.

    Science.gov (United States)

    Guo, F-B; Xia, Z-K; Wei, W; Zhao, H-L

    2014-03-17

    We performed statistical analyses of five conserved features of bacterial genomic islands, based on 104 known genomic islands identified by comparative methods. Four of these features are frequently investigated: sequence size, abnormal G+C content, a flanking tRNA gene, and an embedded mobility gene. A relatively new feature, G+C homogeneity, was also investigated. Among the 104 known genomic islands, 88.5% fell within the typical length of 10-200 kb and 80.8% had G+C deviations with absolute values larger than 2%. Of the 88 genomic islands whose hosts have been sequenced and annotated, 52.3% had flanking tRNA genes and 64.7% had embedded mobility genes. For the homogeneity feature, 85% had an h homogeneity index below 0.1, indicating that their G+C content is relatively uniform. Taking all five features into account, 87.5% of the 88 genomic islands had three of them. Only one genomic island had a single conserved feature, and none had zero features. These statistical results should help in understanding the general structure of known genomic islands. We also found that larger genomic islands tend to have smaller absolute G+C deviations: for example, the absolute G+C deviations of the 9 genomic islands longer than 100,000 bp were all below 5%. This result is novel but reasonable, since larger genomic islands should face greater constraints on their G+C content in order to maintain the stable G+C content of the recipient genome.
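
    Four of the five features can be checked mechanically using the thresholds reported in this abstract (a length of 10-200 kb and an absolute G+C deviation above 2%). The sketch is a hypothetical helper: flanking tRNA genes and embedded mobility genes must come from annotation, so they are passed in as booleans, and the h homogeneity index is omitted because the abstract does not define it.

```python
def gc_content(seq):
    """Fraction of G and C in a DNA sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def conserved_features(island, genome, has_flanking_trna, has_mobility_gene):
    """Count four typical GI features; thresholds taken from the Guo et al. abstract."""
    deviation = gc_content(island) - gc_content(genome)
    features = {
        "typical_length": 10_000 <= len(island) <= 200_000,
        "abnormal_gc": abs(deviation) > 0.02,
        "flanking_trna": has_flanking_trna,
        "mobility_gene": has_mobility_gene,
    }
    return sum(features.values()), features
```

    The returned dictionary keeps the individual checks visible, which makes it easy to tabulate percentages per feature across a set of islands in the way the abstract reports them.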

  17. Genome Island: A Virtual Science Environment in Second Life

    Science.gov (United States)

    Clark, Mary Anne

    2009-01-01

    Mary Anne Clark describes the organization and uses of Genome Island, a virtual laboratory complex constructed in Second Life. Genome Island was created for teaching genetics to university undergraduates but also provides a public space where anyone interested in genetics can spend a few minutes, or a few hours, interacting with genetic…

  18. Evolutionary forces shaping genomic islands of population differentiation in humans

    Directory of Open Access Journals (Sweden)

    Hofer Tamara

    2012-03-01

    Background: Levels of differentiation among populations depend on both demographic and selective factors: genetic drift and local adaptation increase population differentiation, which is eroded by gene flow and balancing selection. Here we describe the genomic distribution and the properties of genomic regions with unusually high and low levels of population differentiation in humans, to assess the influence of selective and neutral processes on human genetic structure. Methods: Individual SNPs of the Human Genome Diversity Panel (HGDP) showing significantly high or low levels of population differentiation were detected under a hierarchical-island model (HIM). A hidden Markov model allowed us to detect genomic regions, or islands, of high or low population differentiation. Results: Under the HIM, only 1.5% of all SNPs are significant at the 1% level, but their genomic spatial distribution is significantly non-random. We find evidence that local adaptation shaped high-differentiation islands, as they are enriched for non-synonymous SNPs and overlap with previously identified candidate regions for positive selection. Moreover, there is a negative relationship between island size and recombination rate, which is stronger for islands overlapping with genes. Gene ontology analysis supports the role of diet as a major selective pressure in the highly differentiated islands. Low-differentiation islands are also enriched for non-synonymous SNPs and contain an unusually high proportion of genes belonging to the 'Oncogenesis' biological process. Conclusions: Even though selection seems to shape islands of high population differentiation, neutral demographic processes might have promoted the appearance of some genomic islands, since (i) as much as 20% of islands are in non-genic regions, (ii) these non-genic islands are on average two times shorter than genic islands, suggesting more rapid erosion by recombination, and (iii) most loci are

  19. A quantitative account of genomic island acquisitions in prokaryotes

    Directory of Open Access Journals (Sweden)

    Roos Tom E

    2011-08-01

    Background: Microbial genomes do not merely evolve through the slow accumulation of mutations but also, and often more dramatically, by taking up new DNA in a process called horizontal gene transfer. These innovative leaps in the acquisition of new traits can take place via the introgression of single genes, but also through the acquisition of large gene clusters, termed Genomic Islands. Since only a small proportion of all DNA diversity has been sequenced, it can be hard to find the appropriate donors for acquired genes via sequence alignments against databases. In contrast, relative oligonucleotide frequencies represent a remarkably stable genomic signature in prokaryotes, which enables compositional comparison as an alignment-free alternative for assessing phylogenetic relatedness. In this project, we test whether Genomic Islands identified in individual bacterial genomes have a similar genomic signature, in terms of relative dinucleotide frequencies, and can therefore be expected to originate from a common donor species. Results: When multiple Genomic Islands are present within a single genome, we find that up to 28% of them are compositionally very similar to each other, indicating frequent recurring acquisitions from the same donor by the same acceptor. Conclusions: This represents the first quantitative assessment of common directional transfer events in prokaryotic evolutionary history. We suggest that many of the resident Genomic Islands per prokaryotic genome originated from the same source, which may have implications for their regulatory interactions and for elucidating the common origins of these acquired gene clusters.
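
    The relative dinucleotide frequency signature can be sketched as Karlin's relative abundance, rho_XY = f_XY / (f_X * f_Y), with the dissimilarity of two sequences taken as the mean absolute difference of rho over the 16 dinucleotides. For brevity this version counts one strand only, whereas the usual rho* also pools the reverse complement; the abstract does not specify which variant or similarity threshold the authors used.

```python
from itertools import product

def rho(seq):
    """Relative dinucleotide abundance rho_XY = f_XY / (f_X * f_Y) on one strand.
    Assumes an ACGT sequence of length >= 2; rho is 0.0 where a base is absent."""
    seq = seq.upper()
    mono = {b: seq.count(b) / len(seq) for b in "ACGT"}
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    di = {x + y: pairs.count(x + y) / len(pairs) for x, y in product("ACGT", repeat=2)}
    return {d: (di[d] / (mono[d[0]] * mono[d[1]]) if mono[d[0]] and mono[d[1]] else 0.0)
            for d in di}

def delta(seq_a, seq_b):
    """Mean absolute difference of rho over the 16 dinucleotides (Karlin-style delta)."""
    ra, rb = rho(seq_a), rho(seq_b)
    return sum(abs(ra[d] - rb[d]) for d in ra) / 16
```

    Because the signature needs no alignment, it can compare two islands even when neither has a recognizable homolog in a database; islands whose pairwise `delta` is small would be candidates for a shared donor.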

  20. On detection and assessment of statistical significance of Genomic Islands

    Directory of Open Access Journals (Sweden)

    Chaudhuri Probal

    2008-04-01

    Background: Many available methods for detecting Genomic Islands (GIs) in prokaryotic genomes use markers such as transposons, proximal tRNAs, and flanking repeats, or they use other supervised techniques that require training datasets. Most of these methods are primarily based on biases in the GC content or the codon and amino acid usage of the islands. However, they either use no formal statistical test of significance or use tests whose critical values and P-values are not adequately justified. We propose an unsupervised method that uses Monte Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory, so the resulting P-values are reliable for decision making. Results: Our algorithm, named Design-Island (an acronym for Detection of Statistically Significant Genomic Island), runs in two phases. Some 'putative GIs' are identified in the first phase, and these are refined into smaller segments containing horizontally acquired genes in the refinement phase. Applying the method to the Salmonella typhi CT18 genome led to the discovery of several new pathogenicity, antibiotic resistance, and metabolic islands that earlier methods had missed. Many of these islands contain mobile genetic elements such as phage-mediated genes, transposons, integrases, and IS elements, confirming their horizontal acquisition. Conclusion: The proposed method is based on statistical tests supported by precise distribution theory and reliable P-values, together with a technique for visualizing statistically significant islands. It outperforms many other well-known methods in sensitivity and accuracy and is comparable to them in specificity.
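
    A Monte Carlo significance test of the kind described can be sketched for a single statistic, G+C deviation: sample random segments of the same length from the chromosome and count how often they deviate from the chromosome-wide G+C fraction at least as much as the candidate segment does. This is not the Design-Island algorithm itself; using G+C alone (rather than codon and amino acid usage), the number of samples, and the (k + 1) / (n + 1) P-value estimator are choices made for illustration.

```python
import random

def gc_fraction(seq):
    """Fraction of G and C in a non-empty DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def monte_carlo_gc_pvalue(chromosome, start, length, n_samples=999, seed=0):
    """Empirical two-sided P-value for the G+C deviation of chromosome[start:start + length],
    using randomly placed segments of the same length as the null distribution."""
    rng = random.Random(seed)
    overall = gc_fraction(chromosome)
    observed = abs(gc_fraction(chromosome[start:start + length]) - overall)
    extreme = 0
    for _ in range(n_samples):
        s = rng.randrange(len(chromosome) - length + 1)
        if abs(gc_fraction(chromosome[s:s + length]) - overall) >= observed:
            extreme += 1
    # Add-one estimator: never returns exactly 0, even if no random segment is as extreme.
    return (extreme + 1) / (n_samples + 1)
```

    Because the null distribution is built from the chromosome itself, the test adapts to each genome's own base composition instead of relying on a fixed G+C threshold.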

  1. Patterns and architecture of genomic islands in marine bacteria

    Directory of Open Access Journals (Sweden)

    Fernández-Gómez Beatriz

    2012-07-01

    Full Text Available Abstract Background Genomic Islands (GIs) have key roles, since they modulate the structure and size of bacterial genomes and display a diverse set of laterally transferred genes. Despite their importance, GIs in marine bacterial genomes have not been explored systematically to uncover possible trends and to analyze their putative ecological significance. Results We carried out a comprehensive analysis of GIs detected with IslandViewer in 70 selected marine bacterial genomes to explore the distribution, patterns and functional gene content of these genomic regions. We detected 438 GIs containing a total of 8152 genes. GI number per genome was strongly and positively correlated with total GI size. In 50% of the genomes analyzed, GIs accounted for approximately 3% of the genome length, with a maximum of 12%. Interestingly, we found transposases particularly enriched within Alphaproteobacteria GIs, and site-specific recombinases in Gammaproteobacteria GIs. We described specific Homologous Recombination GIs (HR-GIs) in several genera of marine Bacteroidetes and in Shewanella strains, among others. In these HR-GIs, we recurrently found conserved genes typically associated with the core genome, such as the β-subunit of DNA-directed RNA polymerase, regulatory sigma factors, the elongation factor Tu and ribosomal protein genes. Conclusions Our results indicate that horizontal gene transfer mediated by phages, plasmids and other mobile genetic elements, together with HR by site-specific recombinases, plays an important role in the mobility of gene clusters between taxa and within closely related genomes, modulating the flexible pool of the genome. Our findings suggest that GIs may increase bacterial fitness under changing environmental conditions by acquiring novel foreign genes and/or modifying gene transcription and/or transduction.

  2. Genomic selection: genome-wide prediction in plant improvement.

    Science.gov (United States)

    Desta, Zeratsion Abera; Ortiz, Rodomiro

    2014-09-01

    Association analysis is used to measure relations between markers and quantitative trait loci (QTL). Such estimation ignores genes with small effects that underpin quantitative traits. By contrast, genome-wide selection estimates marker effects across the whole genome in the target population, based on a prediction model developed in the training population (TP). Whole-genome prediction models estimate all marker effects at all loci and capture small QTL effects. Here, we review several genomic selection (GS) models with respect to both prediction accuracy and genetic gain from selection. Phenotypic selection or marker-assisted breeding protocols can be replaced by selection based on whole-genome predictions, in which new phenotyping updates the model to build up prediction accuracy. Copyright © 2014 Elsevier Ltd. All rights reserved.
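    One common way to realize a whole-genome model that "estimates all marker effects" is ridge regression (as in RR-BLUP), which fits every marker simultaneously and shrinks each effect towards zero. The sketch below is only one of the GS models such a review covers; it assumes additive 0/1/2 genotype coding and a hand-picked penalty `lam`, and the function names are illustrative.

```python
import numpy as np


def ridge_marker_effects(X: np.ndarray, y: np.ndarray, lam: float):
    """Estimate an intercept and one additive effect per marker.

    X:   (n_individuals, n_markers) genotype matrix of the training
         population, e.g. coded 0/1/2.
    y:   (n_individuals,) phenotypes of the training population.
    lam: ridge penalty; larger values shrink all marker effects more.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centre markers so the intercept is not penalised
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def predict_gebv(X: np.ndarray, intercept: float, beta: np.ndarray) -> np.ndarray:
    """Genomic estimated breeding values: intercept plus summed marker effects."""
    return intercept + X @ beta
```

    In practice `lam` is related to the ratio of residual to marker-effect variance, or it is tuned by cross-validation, rather than fixed by hand.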

  3. Proteus genomic island 1 (PGI1), a new resistance genomic island from two Proteus mirabilis French clinical isolates.

    Science.gov (United States)

    Siebor, Eliane; Neuwirth, Catherine

    2014-12-01

    To analyse the genetic environment of the antibiotic resistance genes in two clinical Proteus mirabilis isolates resistant to multiple antibiotics. PCR, gene walking and whole-genome sequencing were used to determine the sequence of the resistance regions, the surrounding genetic structure and the flanking chromosomal regions. A genomic island of 81.1 kb named Proteus genomic island 1 (PGI1) located at the 3'-end of trmE (formerly known as thdF) was characterized. The large MDR region of PGI1 (55.4 kb) included a class 1 integron (aadB and aadA2) and regions deriving from several transposons: Tn2 (blaTEM-135), Tn21, Tn6020-like transposon (aphA1b), a hybrid Tn502/Tn5053 transposon, Tn501, a hybrid Tn1696/Tn1721 transposon [tetA(A)] carrying a class 1 integron (aadA1) and Tn5393 (strA and strB). Several ISs were also present (IS4321, IS1R and IS26). The PGI1 backbone (25.7 kb) was identical to that identified in Salmonella Heidelberg SL476 and shared some identity with the Salmonella genomic island 1 (SGI1) backbone. An IS26-mediated recombination event divided the MDR region into two parts separated by a large 197 kb chromosomal DNA fragment, with the right end of PGI1 and this chromosomal sequence in inverse orientation. PGI1 is a new resistance genomic island from P. mirabilis belonging to the same island family as SGI1. The role of PGI1 in the spread of antimicrobial resistance genes among Enterobacteriaceae of medical importance needs to be evaluated. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  4. Genomic Islands: an overview of current software tools and future improvements

    Directory of Open Access Journals (Sweden)

    Soares Siomar de Castro

    2016-03-01

    Full Text Available Microbes are highly diverse and widely distributed organisms. They account for ~60% of Earth’s biomass, and new predictions point to the existence of 10¹¹ to 10¹² species, which constantly share genes through several different mechanisms. Genomic Islands (GIs) are critical in this context, as they are large regions acquired through horizontal gene transfer. They also present common features like genomic signature deviation, transposase genes, flanking tRNAs and insertion sequences. GIs carry large numbers of genes related to specific lifestyles and are commonly classified as Pathogenicity, Resistance, Metabolic or Symbiotic Islands. With the advent of next-generation sequencing technologies and the deluge of genomic data, many software tools have been developed to tackle the problem of GI prediction, and they all rely on detecting these common GI features. However, there is still room for new software tools that implement new approaches, such as machine learning and pangenome-based analyses. Finally, GIs will always hold potential applications in every newly invented genomic approach, as they are directly responsible for much of the genomic plasticity of bacteria.

  5. Genomic Islands: an overview of current software tools and future improvements.

    Science.gov (United States)

    Soares, Siomar de Castro; Oliveira, Letícia de Castro; Jaiswal, Arun Kumar; Azevedo, Vasco

    2016-03-01

    Microbes are highly diverse and widely distributed organisms. They account for ~60% of Earth's biomass, and new predictions point to the existence of 10¹¹ to 10¹² species, which constantly share genes through several different mechanisms. Genomic Islands (GIs) are critical in this context, as they are large regions acquired through horizontal gene transfer. They also present common features like genomic signature deviation, transposase genes, flanking tRNAs and insertion sequences. GIs carry large numbers of genes related to specific lifestyles and are commonly classified as Pathogenicity, Resistance, Metabolic or Symbiotic Islands. With the advent of next-generation sequencing technologies and the deluge of genomic data, many software tools have been developed to tackle the problem of GI prediction, and they all rely on detecting these common GI features. However, there is still room for new software tools that implement new approaches, such as machine learning and pangenome-based analyses. Finally, GIs will always hold potential applications in every newly invented genomic approach, as they are directly responsible for much of the genomic plasticity of bacteria.
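    Genomic signature deviation, the first of the common GI features listed above, is often quantified with dinucleotide relative abundances. The sketch below computes Karlin's δ* distance between each sliding window and the whole genome. It is a generic illustration of the feature, not the method of any particular tool surveyed in the review, and the window and step sizes are arbitrary assumptions.

```python
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rho_star(seq: str) -> dict:
    """Symmetrised dinucleotide relative abundances rho*_XY = f*_XY / (f*_X * f*_Y).

    Frequencies are counted on the sequence plus its reverse complement;
    characters other than A, C, G and T are skipped.
    """
    seq = seq.upper()
    strands = (seq, seq.translate(COMPLEMENT)[::-1])
    mono = dict.fromkeys(BASES, 0)
    di = dict.fromkeys(DINUCS, 0)
    for s in strands:
        for base in s:
            if base in mono:
                mono[base] += 1
        for i in range(len(s) - 1):
            if s[i:i + 2] in di:
                di[s[i:i + 2]] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    result = {}
    for d in DINUCS:
        expected = (mono[d[0]] / n_mono) * (mono[d[1]] / n_mono) if n_mono else 0.0
        # Defined as 0 when a base is absent (a convention chosen for this sketch).
        result[d] = (di[d] / n_di) / expected if expected and n_di else 0.0
    return result


def delta_star(rho_a: dict, rho_b: dict) -> float:
    """Karlin's delta*: mean absolute difference over the 16 dinucleotides."""
    return sum(abs(rho_a[d] - rho_b[d]) for d in DINUCS) / len(DINUCS)


def signature_deviation(genome: str, window: int, step: int) -> list:
    """(start, delta*) for each window compared with the whole-genome signature."""
    genome_rho = rho_star(genome)
    return [(start, delta_star(rho_star(genome[start:start + window]), genome_rho))
            for start in range(0, len(genome) - window + 1, step)]
```

    Windows with the largest δ* are candidates for recently acquired DNA; real tools combine this signal with others, such as flanking tRNAs or mobility genes.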

  6. Genome position specific priors for genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Lund, Mogens Sandø

    2012-01-01

    causal mutation is different between the populations but affects the same gene. Proportions of a four-distribution mixture for SNP effects in segments of fixed size along the genome are derived from one population and set as location-specific prior proportions of distributions of SNP effects...... for the target population. The model was tested using dairy cattle populations of different breeds: 540 Australian Jersey bulls, 2297 Australian Holstein bulls and 5214 Nordic Holstein bulls. The traits studied were protein, fat and milk yield. Genotypic data were Illumina 777K SNPs, real or imputed. Results...
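    Deriving location-specific prior proportions from a source population can be sketched as counting, within fixed-size segments, how many SNPs fall into each of the four mixture components. This assumes that each SNP has already been assigned a component (for example, the one with the highest posterior probability in a four-component model fitted to the source population) and that positions lie on a single coordinate axis. The pseudocount, which keeps an unobserved component from receiving a zero prior, is also an assumption; the study's exact derivation is not given in this excerpt.

```python
def segment_prior_proportions(positions, classes, segment_size,
                              n_classes=4, pseudocount=1.0):
    """Location-specific prior proportions of a mixture of SNP-effect distributions.

    positions:    base-pair position of each SNP in the source population.
    classes:      mixture component (0 .. n_classes-1) assigned to each SNP.
    segment_size: length of the fixed-size genome segments.
    Returns {segment_index: [prior proportion of each component]}.
    """
    counts = {}
    for pos, cls in zip(positions, classes):
        segment = pos // segment_size
        counts.setdefault(segment, [0] * n_classes)[cls] += 1
    priors = {}
    for segment, c in sorted(counts.items()):
        total = sum(c) + pseudocount * n_classes
        priors[segment] = [(k + pseudocount) / total for k in c]
    return priors
```

    The resulting per-segment proportions would then replace a single genome-wide set of mixture proportions when the model is fitted in the target population.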

  7. Study on the Mitochondrial Genome of Sea Island Cotton (Gossypium barbadense) by BAC Library Screening

    Institute of Scientific and Technical Information of China (English)

    SU Ai-guo; LI Shuang-shuang; LIU Guo-zheng; LEI Bin-bin; KANG Ding-ming; LI Zhao-hu; MA Zhi-ying; HUA Jin-ping

    2014-01-01

    The plant mitochondrial genome displays complex features, particularly in terms of cytoplasmic male sterility (CMS). Therefore, research on the cotton mitochondrial genome may provide important information for analyzing genome evolution and exploring the molecular mechanism of CMS. In this paper, we present a preliminary study on the mitochondrial genome of sea island cotton (Gossypium barbadense) based on positive clones from a bacterial artificial chromosome (BAC) library. Thirty-five primers designed from the conserved sequences of mitochondrial functional genes and exons were used to screen for positive clones in the genomic library of the sea island cotton variety Pima 90-53. Ten BAC clones were obtained and verified for further study. A contig was assembled from six overlapping clones and subsequently mapped preliminarily onto the mitochondrial genome. One BAC clone, clone 6, which harbored an insert of approximately 115 kb of mtDNA sequence from which more than 10 primer fragments could be amplified, was sequenced and assembled using the Solexa strategy. Fifteen mitochondrial functional genes were revealed in clone 6 by gene annotation. The characteristics of the syntenic genes/exons of the sequences and RNA editing were preliminarily predicted.

  8. Methyl-CpG island-associated genome signature tags

    Science.gov (United States)

    Dunn, John J

    2014-05-20

    Disclosed is a method for analyzing the organismic complexity of a sample through analysis of the nucleic acid in the sample. In the disclosed method, through a series of steps, including digestion with a type II restriction enzyme, ligation of capture adapters and linkers and digestion with a type IIS restriction enzyme, genome signature tags are produced. The sequences of a statistically significant number of the signature tags are determined and the sequences are used to identify and quantify the organisms in the sample. Various embodiments of the invention described herein include methods for using single point genome signature tags to analyze the related families present in a sample, methods for analyzing sequences associated with hyper- and hypo-methylated CpG islands, methods for visualizing organismic complexity change in a sampling location over time and methods for generating the genome signature tag profile of a sample of fragmented DNA.

  9. Accounting for discovery bias in genomic prediction

    Science.gov (United States)

    Our objective was to evaluate an approach to mitigating discovery bias in genomic prediction. Accuracy may be improved by placing greater emphasis on regions of the genome expected to be more influential on a trait. Methods emphasizing regions result in a phenomenon known as “discovery bias” if info...

  10. GAPIT: genome association and prediction integrated tool.

    Science.gov (United States)

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods, including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. Availability: http://www.maizegenetics.net/GAPIT. Contact: zhiwu.zhang@cornell.edu. Supplementary data are available at Bioinformatics online.

  11. Predictable evolution toward flightlessness in volant island birds.

    Science.gov (United States)

    Wright, Natalie A; Steadman, David W; Witt, Christopher C

    2016-04-26

    Birds are prolific colonists of islands, where they readily evolve distinct forms. Identifying predictable, directional patterns of evolutionary change in island birds, however, has proved challenging. The "island rule" predicts that island species evolve toward intermediate sizes, but its general applicability to birds is questionable. However, convergent evolution has clearly occurred in the island bird lineages that have undergone transitions to secondary flightlessness, a process involving drastic reduction of the flight muscles and enlargement of the hindlimbs. Here, we investigated whether volant island bird populations tend to change shape in a way that converges subtly on the flightless form. We found that island bird species have evolved smaller flight muscles than their continental relatives. Furthermore, in 366 populations of Caribbean and Pacific birds, smaller flight muscles and longer legs evolved in response to increasing insularity and, strikingly, the scarcity of avian and mammalian predators. On smaller islands with fewer predators, birds exhibited shifts in investment from forelimbs to hindlimbs that were qualitatively similar to anatomical rearrangements observed in flightless birds. These findings suggest that island bird populations tend to evolve on a trajectory toward flightlessness, even if most remain volant. This pattern was consistent across nine families and four orders that vary in lifestyle, foraging behavior, flight style, and body size. These predictable shifts in avian morphology may reduce the physical capacity for escape via flight and diminish the potential for small-island taxa to diversify via dispersal.

  12. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls....... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...

  13. High-density transcriptional initiation signals underline genomic islands in bacteria.

    Directory of Open Access Journals (Sweden)

    Qianli Huang

    Full Text Available Genomic islands (GIs), frequently associated with the pathogenicity of bacteria and having a substantial influence on bacterial evolution, are groups of "alien" elements that probably undergo special temporal-spatial regulation in the host genome. Are there particular hallmark transcriptional signals for these "exotic" regions? Here we explore potential transcriptional signals that underline GIs, beyond the conventional views on basic sequence composition such as codon usage and GC bias. We found a significant enrichment of transcription start positions (TSPs) in GI regions compared with the whole genome of Salmonella enterica and Escherichia coli. TSP density was up to four-fold higher for 70% of the GIs, implying that a high-density TSP profile can potentially differentiate GI regions. Based on this feature, we developed a new sliding-window method, GIST (Genomic-island Identification by Signals of Transcription), to identify these regions. Subsequently, we compared the known GI-associated features of the GIs detected by GIST and by the existing method IslandViewer with those of the whole genome. Our method demonstrates high sensitivity in detecting GIs harboring genes with biased GI-like function, preferred subcellular localization, skewed GC property, shorter gene length and biased "non-optimal" codon usage. The special transcriptional signals discovered here may contribute to the coordinated expression regulation of foreign genes. Finally, using GIST, we detected many interesting GIs in the 2011 German E. coli O104:H4 outbreak strain TY-2482, including the microcin H47 system and the gene cluster ycgXEFZ-ymgABC that activates the production of biofilm matrix. These findings highlight the power of GIST to predict GIs whose intrinsic features are distinct from those of the rest of the genome. The heterogeneity of cumulative TSP profiles may not only be a better identity for "alien" regions, but also provide hints to the special
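    The core of a GIST-style scan, sliding a window along the genome and flagging windows unusually dense in TSPs, can be sketched as follows. The window size, step and fold-enrichment cutoff `min_fold` are illustrative assumptions (the abstract reports up to four-fold enrichment but not GIST's actual parameters), and TSPs are assumed to be given as 0-based positions.

```python
from bisect import bisect_left


def tsp_enriched_windows(tsp_positions, genome_length, window, step, min_fold=2.0):
    """Sliding-window scan for regions dense in transcription start positions.

    Returns (start, end, fold) for every window whose TSP density is at least
    `min_fold` times the genome-wide TSP density.
    """
    positions = sorted(tsp_positions)
    genome_density = len(positions) / genome_length
    hits = []
    if genome_density == 0:
        return hits
    for start in range(0, genome_length - window + 1, step):
        end = start + window
        # Number of TSPs with start <= position < end, via binary search.
        n = bisect_left(positions, end) - bisect_left(positions, start)
        fold = (n / window) / genome_density
        if fold >= min_fold:
            hits.append((start, end, fold))
    return hits
```

    Overlapping hits (when `step` is smaller than `window`) would normally be merged into contiguous candidate regions before comparing them with other GI features.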

  14. Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops.

    Science.gov (United States)

    Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi

    2016-01-01

    Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are commonly used in the genetic improvement of autogamous crops. These methods, however, produce few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it relies on single-plant evaluation for selection, which is inaccurate. Genomic selection (GS), which can predict the genetic potential of individuals based on their marker genotype, might make single-plant evaluation reliable and might be effective in recurrent selection. Additionally, we introduced the concept of an "island model", inspired by evolutionary algorithms, that might be useful for maintaining genetic variation through the breeding process. To evaluate the efficiency of recurrent selection with GS and of the island model in an autogamous species, we conducted GS simulations using real marker genotype data of rice cultivars. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from an admixture of multiple bi-parental crosses showed larger genetic gains across all cycles than a population derived from a single bi-parental cross, suggesting the importance of genetic variation in the initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. The island-model GS will become a new breeding method that enhances the potential of genomic

  15. Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops.

    Directory of Open Access Journals (Sweden)

    Shiori Yabe

    Full Text Available Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are commonly used in the genetic improvement of autogamous crops. These methods, however, produce few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it relies on single-plant evaluation for selection, which is inaccurate. Genomic selection (GS), which can predict the genetic potential of individuals based on their marker genotype, might make single-plant evaluation reliable and might be effective in recurrent selection. Additionally, we introduced the concept of an "island model", inspired by evolutionary algorithms, that might be useful for maintaining genetic variation through the breeding process. To evaluate the efficiency of recurrent selection with GS and of the island model in an autogamous species, we conducted GS simulations using real marker genotype data of rice cultivars. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from an admixture of multiple bi-parental crosses showed larger genetic gains across all cycles than a population derived from a single bi-parental cross, suggesting the importance of genetic variation in the initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. The island-model GS will become a new breeding method that enhances the
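    The island-model idea (subpopulations selected separately, with occasional migration between them) can be sketched as a toy simulation. Every detail here is a simplifying assumption rather than the authors' simulation: lines are homozygous with markers coded 0/1, the GS model is replaced by known marker effects, offspring inherit each locus from either parent with probability 0.5 (free recombination), and every `migrate_every` generations the best line of each island replaces the worst line of the next island in a ring.

```python
import numpy as np


def island_model_gs(pop, effects, n_islands=4, generations=20, select_frac=0.2,
                    migrate_every=5, seed=0):
    """Toy island-model recurrent selection; returns mean genetic value per generation.

    pop:     (n_lines, n_markers) array of 0/1 marker genotypes.
    effects: marker effects, used both as the 'GS prediction' and as true values.
    """
    rng = np.random.default_rng(seed)
    islands = [isl.copy() for isl in np.array_split(pop, n_islands)]
    history = []
    for gen in range(1, generations + 1):
        next_islands = []
        for isl in islands:
            scores = isl @ effects
            n_sel = max(2, int(len(isl) * select_frac))
            # Truncation selection on predicted value within the island.
            parents = isl[np.argsort(scores)[-n_sel:]]
            # Random mating among selected parents; each locus from either parent (p = 0.5).
            p1 = parents[rng.integers(n_sel, size=len(isl))]
            p2 = parents[rng.integers(n_sel, size=len(isl))]
            next_islands.append(np.where(rng.random(isl.shape) < 0.5, p1, p2))
        islands = next_islands
        if gen % migrate_every == 0:
            # Ring migration: the best line of island i-1 replaces the worst line of island i.
            best = [isl[np.argmax(isl @ effects)].copy() for isl in islands]
            for i, isl in enumerate(islands):
                isl[np.argmin(isl @ effects)] = best[i - 1]
        history.append(float(np.mean(np.vstack(islands) @ effects)))
    return history
```

    Comparing the returned trajectory with a run where `n_islands=1` gives a rough feel for how splitting the population slows the loss of variation, which is the property the abstract attributes to island-model GS.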

  16. Genomic Prediction of Barley Hybrid Performance

    Directory of Open Access Journals (Sweden)

    Norman Philipp

    2016-07-01

    Full Text Available Hybrid breeding in barley (Hordeum vulgare L.) offers great opportunities to accelerate the rate of genetic improvement and to boost yield stability. A crucial requirement is the efficient selection of superior hybrid combinations. We used comprehensive phenotypic and genomic data from a commercial breeding program with the goal of examining the potential to predict hybrid performance. The phenotypic data comprised replicated grain yield trials for 385 two-way and 408 three-way hybrids evaluated in up to 47 environments. The parental lines were genotyped using a 3k single nucleotide polymorphism (SNP) array based on an Illumina Infinium assay. We implemented ridge regression best linear unbiased prediction modeling for additive and dominance effects and evaluated the prediction ability using five-fold cross-validation. The prediction ability of hybrid performance based on general combining ability (GCA) effects was moderate, amounting to 0.56 and 0.48 for two- and three-way hybrids, respectively. GCA-based hybrid prediction requires that both parental components have been evaluated in a hybrid background. This is not necessary for genomic prediction, for which we also observed moderate cross-validated prediction abilities of 0.51 and 0.58 for two- and three-way hybrids, respectively. This exemplifies the potential of genomic prediction in hybrid barley. Interestingly, prediction ability using the two-way hybrids as the training population and the three-way hybrids as the test population, or vice versa, was low, presumably because of the different genetic makeup of the parental source populations. Consequently, further research is needed to optimize genomic prediction approaches combining different source populations in barley.
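    Cross-validated prediction ability, the correlation between observed and predicted performance in held-out folds, can be sketched with a ridge-regression model on additive and dominance codings. Assumptions: genotypes are coded 0/1/2, dominance is coded 1 for heterozygotes and 0 otherwise, a single penalty `lam` is shared by both effect types (an RR-BLUP analysis would typically use separate variance components), and individuals are assigned to folds at random. The function names are hypothetical.

```python
import numpy as np


def additive_dominance_design(G):
    """Stack additive (-1/0/1) and dominance (1 = heterozygous) codings of 0/1/2 genotypes."""
    G = np.asarray(G, dtype=float)
    return np.hstack([G - 1.0, (G == 1.0).astype(float)])


def kfold_prediction_ability(X, y, lam, k=5, seed=0):
    """Mean Pearson correlation between observed and ridge-predicted values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    abilities = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc = X[train] - x_mean
        # Ridge solution with one shared penalty for all columns.
        beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                               Xc.T @ (y[train] - y_mean))
        pred = y_mean + (X[test_idx] - x_mean) @ beta
        abilities.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(abilities))
```

    The across-population result in the abstract corresponds to replacing the random folds with a fixed split, for example training on all two-way hybrids and testing on all three-way hybrids.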

  17. Predictive genomics: A cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data

    OpenAIRE

    Wang, Edwin; Zaman, Naif; Mcgee, Shauna; Milanese, Jean-Sébastien; Masoudi-Nejad, Ali; O'Connor, Maureen

    2014-01-01

    We discuss a cancer hallmark network framework for modelling genome-sequencing data to predict cancer clonal evolution and associated clinical phenotypes. Strategies of using this framework in conjunction with genome sequencing data in an attempt to predict personalized drug targets, drug resistance, and metastasis for a cancer patient, as well as cancer risks for a healthy individual are discussed. Accurate prediction of cancer clonal evolution and clinical phenotypes will have substantial i...

  18. Development of genomic prediction in sorghum

    NARCIS (Netherlands)

    Hunt, Colleen H.; Eeuwijk, van Fred A.; Mace, Emma S.; Hayes, Ben J.; Jordan, David R.

    2018-01-01

    Genomic selection can increase the rate of genetic gain in plant breeding programs by shortening the breeding cycle. Gain can also be increased through higher selection intensities, as the size of the population available for selection can be increased by predicting performance of nonphenotyped, but

  19. Genomic prediction within and across biparental families

    NARCIS (Netherlands)

    Schopp, Pascal; Müller, Dominik; Wientjes, Yvonne C.J.; Melchinger, Albrecht E.

    2017-01-01

    A major application of genomic prediction (GP) in plant breeding is the identification of superior inbred lines within families derived from biparental crosses. When models for various traits were trained within related or unrelated biparental families (BPFs), experimental studies found substantial

  20. Genomic islands of divergence are not affected by geography of speciation in sunflowers.

    Science.gov (United States)

    Renaut, S; Grassa, C J; Yeaman, S; Moyers, B T; Lai, Z; Kane, N C; Bowers, J E; Burke, J M; Rieseberg, L H

    2013-01-01

    Genomic studies of speciation often report the presence of highly differentiated genomic regions interspersed within a milieu of weakly diverged loci. The formation of these speciation islands is generally attributed to reduced inter-population gene flow near loci under divergent selection, but few studies have critically evaluated this hypothesis. Here, we report on transcriptome scans among four recently diverged pairs of sunflower (Helianthus) species that vary in the geographical context of speciation. We find that genetic divergence is lower in sympatric and parapatric comparisons, consistent with a role for gene flow in eroding neutral differences. However, genomic islands of divergence are numerous and small in all comparisons, and contrary to expectations, island number and size are not significantly affected by levels of interspecific gene flow. Rather, island formation is strongly associated with reduced recombination rates. Overall, our results indicate that the functional architecture of genomes plays a larger role in shaping genomic divergence than does the geography of speciation.
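    As a generic illustration, not the procedure used in this study, genomic islands of divergence are often delimited by flagging windows whose differentiation statistic (such as FST) lies in the upper tail of the genome-wide distribution and then merging adjacent flagged windows. Island number and size, the quantities compared across species pairs above, follow directly from the resulting runs. The default 95th-percentile cutoff is an assumption.

```python
def call_divergence_islands(fst, quantile=0.95):
    """Return (first, last) window indices of runs of windows in the upper FST tail.

    fst: per-window differentiation values in genome order.
    A window is an outlier if its value is at or above the empirical quantile.
    """
    if not fst:
        return []
    ranked = sorted(fst)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    islands, start = [], None
    for i, value in enumerate(fst):
        if value >= cutoff and start is None:
            start = i  # a new island begins
        elif value < cutoff and start is not None:
            islands.append((start, i - 1))  # the island ended at the previous window
            start = None
    if start is not None:
        islands.append((start, len(fst) - 1))
    return islands
```

    The number of islands is `len(islands)`, and the size of each island in windows is `last - first + 1`.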

  1. Genomic Prediction of Sunflower Hybrids Oil Content

    Directory of Open Access Journals (Sweden)

    Brigitte Mangin

    2017-09-01

    Full Text Available Prediction of hybrid performance using incomplete factorial mating designs is widely used in breeding programs that include different heterotic groups. When based on the general combining ability (GCA) of the parents, predictions are accurate only if the genetic variance resulting from specific combining ability is small and both parents have phenotyped descendants. Genomic selection (GS) can predict performance using a model trained on both phenotyped and genotyped hybrids that do not necessarily include all hybrid parents. Therefore, GS could overcome the issue of unknown parent GCA. Here, we compared the accuracy of classical GCA-based and genomic predictions for the oil content of sunflower seeds using several GS models. Our study involved 452 sunflower hybrids from an incomplete factorial design of 36 female and 36 male lines. Re-sequencing of the parental lines allowed us to identify 468,194 non-redundant SNPs and to infer the hybrid genotypes. Oil content was observed in a multi-environment trial (MET) over 3 years, leading to nine different environments. We compared the GCA-based model with different GS models including female and male genomic kinships, with the addition of a female-by-male interaction genomic kinship, with the use of functional knowledge as SNPs in genes of oil metabolic pathways, and with epistasis modeling. When both parents had descendants in the training set, the predictive ability was high even for GCA-based prediction, with an average MET value of 0.782. GS performed slightly better (+0.2%). Neither the inclusion of the female-by-male interaction, nor functional knowledge of oil metabolism, nor epistasis modeling improved GS accuracy. GS greatly improved predictive ability when one or both parents were untested in the training set, increasing the GCA-based predictive ability by 10.4%, from 0.575 to 0.635 in the MET. In this scenario, performing GS only considering SNPs in oil metabolic pathways did not improve whole genome GS prediction but
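    A GCA-based prediction treats a hybrid's value as the overall mean plus the GCA of its female parent plus the GCA of its male parent. The least-squares sketch below assumes fixed effects, no specific combining ability term and a connected incomplete factorial; the names are illustrative. It also shows why such a prediction needs both parents to have phenotyped descendants: a parent with no phenotyped hybrids contributes an all-zero column, so its GCA is not estimable (the minimum-norm solution returned by `numpy.linalg.lstsq` simply sets it to zero).

```python
import numpy as np


def fit_gca(females, males, y, n_f, n_m):
    """Least-squares fit of y = mu + GCA_female + GCA_male (no SCA term).

    females, males: integer parent indices of each phenotyped hybrid.
    Returns [mu, GCA_f0 .. GCA_f(n_f-1), GCA_m0 .. GCA_m(n_m-1)]; individual
    GCAs are only defined up to a constant, but predicted hybrid values are
    unique for parents connected through the design.
    """
    n = len(y)
    X = np.zeros((n, 1 + n_f + n_m))
    X[:, 0] = 1.0
    X[np.arange(n), 1 + np.asarray(females)] = 1.0
    X[np.arange(n), 1 + n_f + np.asarray(males)] = 1.0
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_gca(coef, female, male, n_f):
    """Predicted value of an untested female x male combination."""
    return coef[0] + coef[1 + female] + coef[1 + n_f + male]
```

    The genomic models in the abstract avoid the untested-parent problem by replacing these parent indicators with genomic kinships, so an unphenotyped parent still borrows information from genotyped relatives.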

  2. GaussianCpG: a Gaussian model for detection of CpG island in human genome sequences.

    Science.gov (United States)

    Yu, Ning; Guo, Xuan; Zelikovsky, Alexander; Pan, Yi

    2017-05-24

    As crucial markers in identifying biological elements and processes in mammalian genomes, CpG islands (CGIs) play important roles in DNA methylation, gene regulation, epigenetic inheritance, gene mutation, chromosome inactivation and nucleosome retention. The generally accepted criteria for CGIs are: (a) %G+C content is ≥ 50%, (b) the ratio of observed to expected CpG content is ≥ 0.6, and (c) the length of the CGI is generally greater than 200 nucleotides. Most existing computational methods for CpG island prediction are built on these rules. However, many experimentally verified CpG islands deviate from these artificial criteria. Experiments indicate that in many cases %G+C is … human genome. We analyze the energy distribution over the genomic primary structure for each CpG site and adopt parameters from statistics of the human genome. The evaluation results show that the new model can predict CpG islands efficiently by balancing sensitivity and specificity over known human CGI data sets. Compared with other models, GaussianCpG achieves better performance in CGI detection. Our Gaussian model aims to simplify the complex interaction between nucleotides. The model is computed not by linear statistical methods but by Gaussian energy distribution and accumulation. The parameters of the Gaussian function are not arbitrarily designated but deliberately chosen by optimizing the biological statistics. Using pseudopotential analysis on CpG islands, the novel model is validated on both real and artificial data sets.
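    The three rule-based criteria listed above, which GaussianCpG is designed to move beyond, can be checked directly. The observed/expected CpG ratio is computed here in its commonly used form, (CpG count × length) / (C count × G count). This is a sketch of the classic rules, not of the GaussianCpG model, and the function names are illustrative.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    n_cpg = s.count("CG")  # CpG dinucleotides cannot overlap, so str.count is exact
    gc = (n_c + n_g) / len(s) if s else 0.0
    obs_exp = n_cpg * len(s) / (n_c * n_g) if n_c and n_g else 0.0
    return gc, obs_exp


def meets_classic_cgi_criteria(seq, min_gc=0.5, min_obs_exp=0.6, min_length=200):
    """Criteria (a)-(c): %G+C >= 50%, obs/exp CpG >= 0.6, length > 200 nt."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) > min_length and gc >= min_gc and obs_exp >= min_obs_exp
```

    A sequence such as 150 C's followed by 150 G's is 100% G+C yet contains a single CpG, so it fails criterion (b). This shows why GC content alone does not define a CpG island.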

  3. CpG islands undermethylation in human genomic regions under selective pressure.

    Directory of Open Access Journals (Sweden)

    Sergio Cocozza

    Full Text Available DNA methylation at CpG islands (CGIs) is one of the most intensively studied epigenetic mechanisms. It is fundamental for cellular differentiation and control of transcriptional potential. DNA methylation is also involved in several processes that are central to evolutionary biology, including phenotypic plasticity and evolvability. In this study, we explored the relationship between CpG island methylation and signatures of selective pressure in Homo sapiens, using a computational biology approach. By analyzing methylation data of 25 cell lines from the Encyclopedia of DNA Elements (ENCODE) Consortium, we compared the DNA methylation of CpG islands in genomic regions under selective pressure with the methylation of CpG islands in the remaining part of the genome. To define genomic regions under selective pressure, we used three different methods, each oriented to provide distinct information about selective events. Independently of the method and of the cell type used, we found evidence of undermethylation of CGIs in human genomic regions under selective pressure. Additionally, by analyzing SNP frequency in CpG islands, we demonstrated that CpG islands in regions under selective pressure show lower genetic variation. Our findings suggest that CpG islands in regions under selective pressure seem to be somehow more "protected" from methylation than those in other regions of the genome.

  4. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    Genomic selection is widely used in both animal and plant species; however, it is performed without input from the known genomic or biological roles of genetic variants and is therefore a black-box approach in the genomic era. This study investigated the role of different genomic regions and detected QTLs...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and the Pig QTL database were used as sources of genomic annotation for the 60K chip. Genomic prediction was performed using the Bayes...... classes. Predictive accuracy was 0.531, 0.532, 0.302, and 0.344 for DFI, RFI, ADG and BF, respectively. The contribution per SNP to total genomic variance was similar among annotated classes across different traits. Predictive performance of SNP classes did not significantly differ from randomized SNP...

  5. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    Full Text Available This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D, and heat, H) for the highly heritable traits days to heading (DTH) and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% compared with prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy to the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm
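
    The TRN20-TST80 strategy can be expressed as a repeated random split in which accuracy is the correlation between observed and predicted values in the testing set. The Python sketch below assumes that definition of accuracy and uses a hypothetical single-marker toy predictor in place of the genomic models with G × E used in the study; the data are simulated, not taken from the landrace collections.

```python
import random
import statistics
from typing import Callable, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, used here as 'prediction accuracy'."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def trn_tst_accuracy(ids: Sequence[int], phenotypes: dict[int, float],
                     fit_predict: Callable[[list[int], list[int]], list[float]],
                     train_frac: float = 0.2, n_reps: int = 10,
                     seed: int = 1) -> float:
    """Mean accuracy over repeated random TRN/TST partitions (TRN20-TST80 by default)."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_reps):
        shuffled = list(ids)
        rng.shuffle(shuffled)
        n_trn = round(train_frac * len(shuffled))
        trn, tst = shuffled[:n_trn], shuffled[n_trn:]
        # fit_predict should use phenotypes of `trn` only and return predictions for `tst`.
        predicted = fit_predict(trn, tst)
        observed = [phenotypes[i] for i in tst]
        accuracies.append(pearson(observed, predicted))
    return statistics.fmean(accuracies)


# Hypothetical toy data: one biallelic marker (0/1) with an effect plus noise.
rng = random.Random(0)
marker = {i: rng.randint(0, 1) for i in range(200)}
pheno = {i: 2.0 * marker[i] + rng.gauss(0, 1) for i in range(200)}


def marker_class_means(trn: list[int], tst: list[int]) -> list[float]:
    """Predict each test line by the training mean of its marker class."""
    overall = statistics.fmean(pheno[i] for i in trn)
    means = {}
    for cls in (0, 1):
        vals = [pheno[i] for i in trn if marker[i] == cls]
        means[cls] = statistics.fmean(vals) if vals else overall
    return [means[marker[i]] for i in tst]


print(round(trn_tst_accuracy(range(200), pheno, marker_class_means), 2))
```

    Swapping `marker_class_means` for a genomic model, or `train_frac` for the core-set fractions, reuses the same evaluation loop.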

  6. Mitochondrial genomes suggest rapid evolution of dwarf California Channel Islands foxes (Urocyon littoralis).

    Science.gov (United States)

    Hofman, Courtney A; Rick, Torben C; Hawkins, Melissa T R; Funk, W Chris; Ralls, Katherine; Boser, Christina L; Collins, Paul W; Coonan, Tim; King, Julie L; Morrison, Scott A; Newsome, Seth D; Sillett, T Scott; Fleischer, Robert C; Maldonado, Jesus E

    2015-01-01

    Island endemics are typically differentiated from their mainland progenitors in behavior, morphology, and genetics, often resulting from long-term evolutionary change. To examine mechanisms for the origins of island endemism, we present a phylogeographic analysis of whole mitochondrial genomes from the endangered island fox (Urocyon littoralis), endemic to California's Channel Islands, and mainland gray foxes (U. cinereoargenteus). Previous genetic studies suggested that foxes first appeared on the islands >16,000 years ago, before human arrival (~13,000 cal BP), while archaeological and paleontological data supported a colonization >7000 cal BP. Our results are consistent with initial fox colonization of the northern islands probably by rafting or human introduction ~9200-7100 years ago, followed quickly by human translocation of foxes from the northern to southern Channel Islands. Mitogenomes indicate that island foxes are monophyletic and most closely related to gray foxes from northern California that likely experienced a Holocene climate-induced range shift. Our data document rapid morphological evolution of island foxes (in ~2000 years or less). Despite evidence for bottlenecks, island foxes have generated and maintained multiple mitochondrial haplotypes. This study highlights the intertwined evolutionary history of island foxes and humans, and illustrates a new approach for investigating the evolutionary histories of other island endemics.

  7. Genomic Predictability of Interconnected Biparental Maize Populations

    Science.gov (United States)

    Riedelsheimer, Christian; Endelman, Jeffrey B.; Stange, Michael; Sorrells, Mark E.; Jannink, Jean-Luc; Melchinger, Albrecht E.

    2013-01-01

    Intense structuring of plant breeding populations challenges the design of the training set (TS) in genomic selection (GS). An important open question is how the TS should be constructed from multiple related or unrelated small biparental families to predict progeny from individual crosses. Here, we used a set of five interconnected maize (Zea mays L.) populations of doubled-haploid (DH) lines derived from four parents to systematically investigate how the composition of the TS affects the prediction accuracy for lines from individual crosses. A total of 635 DH lines genotyped with 16,741 polymorphic SNPs were evaluated for five traits including Gibberella ear rot severity and three kernel yield component traits. The populations showed a genomic similarity pattern that reflects the crossing scheme, with a clear separation of full sibs, half sibs, and unrelated groups. Prediction accuracies within full-sib families of DH lines closely followed theoretical expectations, accounting for the influence of sample size and heritability of the trait. Prediction accuracies declined by 42% if full-sib DH lines were replaced by half-sib DH lines, but statistically significantly better results could be achieved if half-sib DH lines were available from both instead of only one parent of the validation population. Once both parents of the validation population were represented in the TS, including more crosses with a constant TS size did not increase accuracies. Unrelated crosses showing opposite linkage phases with the validation population resulted in negative or reduced prediction accuracies, if used alone or in combination with related families, respectively. We suggest identifying and excluding such crosses from the TS. Moreover, the observed variability among populations and traits suggests that these uncertainties must be taken into account in models optimizing the allocation of resources in GS. PMID:23535384
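
    The statement that accuracies within full-sib families followed theoretical expectations given sample size and heritability can be made concrete with a commonly cited deterministic formula from Daetwyler et al. (2008), r = sqrt(N·h² / (N·h² + M_e)). The abstract does not say whether the authors used this exact expression, so treat the sketch as an illustration of how such expectations scale; M_e, the effective number of independent chromosome segments, is an assumed input.

```python
import math


def expected_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Deterministic expectation r = sqrt(N*h2 / (N*h2 + Me)) (Daetwyler et al. 2008).

    n_train: training-set size; h2: trait heritability;
    m_e: number of independent chromosome segments (an assumed value below).
    """
    x = n_train * h2
    return math.sqrt(x / (x + m_e))


# Illustrative values only: Me = 100 is an assumption, not taken from the study.
for n in (50, 100, 200, 400):
    print(n, round(expected_accuracy(n, h2=0.5, m_e=100), 3))
```

    Accuracy depends on the product N·h², so halving heritability has the same effect as halving the training set, and it approaches 1 only as N·h² grows large relative to M_e.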

  8. Predictive genomics: a cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data.

    Science.gov (United States)

    Wang, Edwin; Zaman, Naif; Mcgee, Shauna; Milanese, Jean-Sébastien; Masoudi-Nejad, Ali; O'Connor-McCourt, Maureen

    2015-02-01

    Tumor genome sequencing documents thousands of DNA mutations and other genomic alterations. At present, these data cannot be analyzed adequately to aid in the understanding of tumorigenesis and its evolution. Moreover, we have little insight into how to use these data to predict clinical phenotypes and tumor progression to better design patient treatment. To meet these challenges, we discuss a cancer hallmark network framework for modeling genome sequencing data to predict cancer clonal evolution and associated clinical phenotypes. The framework includes: (1) cancer hallmarks that can be represented by a few molecular/signaling networks. 'Network operational signatures', which represent gene regulatory logics/strengths, enable quantification of state transitions and measures of hallmark traits. Thus, sets of genomic alterations that are associated with network operational signatures could be linked to the state/measure of hallmark traits. The network operational signature transforms genotypic data (i.e., genomic alterations) into regulatory phenotypic profiles (i.e., regulatory logics/strengths), then into cellular phenotypic profiles (i.e., hallmark traits), which lead to clinical phenotypic profiles (i.e., a collection of hallmark traits). Furthermore, the framework considers regulatory logics of the hallmark networks under tumor evolutionary dynamics and therefore also includes: (2) a self-promoting positive feedback loop, dominated by a genomic instability network and a cell survival/proliferation network, is the main driver of tumor clonal evolution. The surrounding tumor stroma and the host immune system shape the evolutionary paths; (3) cell motility initiating metastasis is a byproduct of the above self-promoting loop activity during tumorigenesis; (4) an emerging hallmark network that triggers genome duplication dominates a feed-forward loop, which in turn could act as a rate-limiting step for tumor formation; (5) mutations and other genomic alterations have

  9. Campylobacter fetus subspecies: Comparative genomics and prediction of potential virulence targets

    DEFF Research Database (Denmark)

    Ali, Amjad; Soares, Siomar C.; Santos, Anderson R.

    2012-01-01

    The potential candidate factors identified for attenuation and/or subunit vaccine development against C. fetus subspecies include: nucleoside diphosphate kinase (Ndk), type IV secretion systems (T4SS), outer membrane proteins (OMP), substrate binding proteins CjaA and CjaC, surface array proteins, sap gene......, and cytolethal distending toxin (CDT). Significantly, many of those genes were found in genomic regions with signals of horizontal gene transfer and were therefore predicted to be putative pathogenicity islands. We found CRISPR loci and dam genes in an island specific for C. fetus subsp. fetus, and T4SS and sap genes...

  10. Genomic prediction across dairy cattle populations and breeds

    DEFF Research Database (Denmark)

    Zhou, Lei

    Genomic prediction is successful in single-breed genetic evaluation. However, across-breed prediction has not been achieved so far. This thesis investigated genomic prediction across populations and breeds using Chinese Holstein, Nordic Holstein, Norwegian Red, and Nordic Red. Nordic Red...

  11. Using a Bayesian network to predict barrier island geomorphologic characteristics

    Science.gov (United States)

    Gutierrez, Ben; Plant, Nathaniel G.; Thieler, E. Robert; Turecek, Aaron

    2015-01-01

    Quantifying geomorphic variability of coastal environments is important for understanding and describing the vulnerability of coastal topography, infrastructure, and ecosystems to future storms and sea level rise. Here we use a Bayesian network (BN) to test the importance of multiple interactions between barrier island geomorphic variables. This approach models complex interactions and handles uncertainty, which is intrinsic to future sea level rise, storminess, or anthropogenic processes (e.g., beach nourishment and other forms of coastal management). The BN was developed and tested at Assateague Island, Maryland/Virginia, USA, a barrier island with sufficient geomorphic and temporal variability to evaluate our approach. We tested the ability to predict dune height, beach width, and beach height variables using inputs that included longer-term, larger-scale, or external variables (historical shoreline change rates, distances to inlets, barrier width, mean barrier elevation, and anthropogenic modification). Data sets from three different years spanning nearly a decade sampled substantial temporal variability and serve as a proxy for analysis of future conditions. We show that distinct geomorphic conditions are associated with different long-term shoreline change rates and that the most skillful predictions of dune height, beach width, and beach height depend on including multiple input variables simultaneously. The predictive relationships are robust to variations in the amount of input data and to variations in model complexity. The resulting model can be used to evaluate scenarios related to coastal management plans and/or future scenarios where shoreline change rates may differ from those observed historically.
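
    At its core, predicting one node of a discrete Bayesian network from its parents means looking up a conditional probability table (CPT) learned from discretized observations. The sketch below assumes a single child node (dune height class) with two hypothetical parent nodes and invented records; the study's actual network structure, discretization, and software are not described in the abstract. Because each combination of parent states gets its own row, the table can represent interactions between inputs, which is what lets several input variables be used jointly.

```python
from collections import Counter, defaultdict


def fit_cpt(rows, parents, child, child_states, alpha=1.0):
    """Estimate P(child | parents) from counts with Laplace smoothing alpha."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    cpt = {}
    for key, c in counts.items():
        total = sum(c.values()) + alpha * len(child_states)
        cpt[key] = {s: (c[s] + alpha) / total for s in child_states}
    return cpt


def predict(cpt, parent_values, child_states):
    """Return P(child | parents); unseen parent combinations fall back to uniform."""
    dist = cpt.get(tuple(parent_values))
    if dist is None:
        return {s: 1 / len(child_states) for s in child_states}
    return dist


# Hypothetical, invented records for illustration only.
records = [
    {"shoreline": "eroding", "width": "narrow", "dune": "low"},
    {"shoreline": "eroding", "width": "narrow", "dune": "low"},
    {"shoreline": "eroding", "width": "wide", "dune": "high"},
    {"shoreline": "accreting", "width": "wide", "dune": "high"},
    {"shoreline": "accreting", "width": "narrow", "dune": "high"},
]
states = ("low", "high")
cpt = fit_cpt(records, ["shoreline", "width"], "dune", states)
print(predict(cpt, ("eroding", "narrow"), states))
```

    Smoothing keeps sparsely observed parent combinations from producing probabilities of exactly 0 or 1, a practical concern when data sets cover only a few survey years.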

  12. Predicting community structure in snakes on Eastern Nearctic islands using ecological neutral theory and phylogenetic methods.

    Science.gov (United States)

    Burbrink, Frank T; McKelvy, Alexander D; Pyron, R Alexander; Myers, Edward A

    2015-11-22

    Predicting species presence and richness on islands is important for understanding the origins of communities and how likely it is that species will disperse and resist extinction. The equilibrium theory of island biogeography (ETIB) and, as a simple model of sampling abundances, the unified neutral theory of biodiversity (UNTB), predict that in situations where mainland to island migration is high, species-abundance relationships explain the presence of taxa on islands. Thus, more abundant mainland species should have a higher probability of occurring on adjacent islands. In contrast to UNTB, if certain groups have traits that permit them to disperse to islands better than other taxa, then phylogeny may be more predictive of which taxa will occur on islands. Taking surveys of 54 island snake communities in the Eastern Nearctic along with mainland communities that have abundance data for each species, we use phylogenetic assembly methods and UNTB estimates to predict island communities. Species richness is predicted by island area, whereas turnover from the mainland to island communities is random with respect to phylogeny. Community structure appears to be ecologically neutral and abundance on the mainland is the best predictor of presence on islands. With regard to young and proximate islands, where allopatric or cladogenetic speciation is not a factor, we find that simple neutral models following UNTB and ETIB predict the structure of island communities. © 2015 The Author(s).
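
    The neutral expectation that abundant mainland species are more likely to occur on islands can be illustrated with a simple sampling model: if an island receives J individuals drawn at random from the mainland pool, a species with relative abundance a_i is present with probability 1 - (1 - a_i)^J. This is a simplified sketch (sampling with replacement, no extinction or speciation), not the full UNTB machinery used in the study, and J is an assumed parameter; letting J grow with island area mimics an area effect on richness.

```python
def presence_probabilities(mainland_abundances, n_individuals):
    """P(species present) when n_individuals are sampled with replacement
    in proportion to mainland abundance (a simplified neutral sampling model)."""
    total = sum(mainland_abundances)
    return [1 - (1 - a / total) ** n_individuals for a in mainland_abundances]


def expected_richness(mainland_abundances, n_individuals):
    """Expected number of species on the island: sum of presence probabilities."""
    return sum(presence_probabilities(mainland_abundances, n_individuals))


# Hypothetical mainland abundances (not from the survey data).
abundances = [500, 200, 100, 50, 20, 5]
for j in (5, 20, 100):
    print(j, round(expected_richness(abundances, j), 2))
```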

  13. Genome characterization of Long Island tick rhabdovirus, a new virus identified in Amblyomma americanum ticks.

    Science.gov (United States)

    Tokarz, Rafal; Sameroff, Stephen; Leon, Maria Sanchez; Jain, Komal; Lipkin, W Ian

    2014-02-11

    Ticks are implicated as hosts to a wide range of animal and human pathogens. The full range of microbes harbored by ticks has not yet been explored. As part of a viral surveillance and discovery project in arthropods, we used unbiased high-throughput sequencing to examine viromes of ticks collected on Long Island, New York in 2013. We detected and sequenced the complete genome of a novel rhabdovirus originating from a pool of Amblyomma americanum ticks. This virus, which we provisionally name Long Island tick rhabdovirus, is distantly related to Moussa virus from Africa. The Long Island tick rhabdovirus may represent a novel species within family Rhabdoviridae.

  14. Empirical and deterministic accuracies of across-population genomic prediction

    NARCIS (Netherlands)

    Wientjes, Y.C.J.; Veerkamp, R.F.; Bijma, P.; Bovenhuis, H.; Schrooten, C.; Calus, M.P.L.

    2015-01-01

    Background: Differences in linkage disequilibrium and in allele substitution effects of QTL (quantitative trait loci) may hinder genomic prediction across populations. Our objective was to develop a deterministic formula to estimate the accuracy of across-population genomic prediction, for which

  15. Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution.

    Directory of Open Access Journals (Sweden)

    James A Cahill

    Full Text Available Despite extensive genetic analysis, the evolutionary relationship between polar bears (Ursus maritimus) and brown bears (U. arctos) remains unclear. The two most recent comprehensive reports indicate either a recent divergence with little subsequent admixture or a much more ancient divergence followed by extensive admixture. At the center of this controversy are the Alaskan ABC Islands brown bears that show evidence of shared ancestry with polar bears. We present an analysis of genome-wide sequence data for seven polar bears, one ABC Islands brown bear, one mainland Alaskan brown bear, and a black bear (U. americanus), plus recently published datasets from other bears. Surprisingly, we find clear evidence for gene flow from polar bears into ABC Islands brown bears but no evidence of gene flow from brown bears into polar bears. Importantly, while polar bears contributed <1% of the autosomal genome of the ABC Islands brown bear, they contributed 6.5% of the X chromosome. The magnitude of sex-biased polar bear ancestry and the clear direction of gene flow suggest a model wherein the enigmatic ABC Island brown bears are the descendants of a polar bear population that was gradually converted into brown bears via male-dominated brown bear admixture. We present a model that reconciles heretofore conflicting genetic observations. We posit that the enigmatic ABC Islands brown bears derive from a population of polar bears likely stranded by the receding ice at the end of the last glacial period. Since then, male brown bear migration onto the island has gradually converted these bears into an admixed population whose phenotype and genotype are principally brown bear, except at mtDNA and X-linked loci. This process of genome erosion and conversion may be a common outcome when climate change or other forces cause a population to become isolated and then overrun by species with which it can hybridize.

  16. Adaptation in Toxic Environments: Arsenic Genomic Islands in the Bacterial Genus Thiomonas.

    Directory of Open Access Journals (Sweden)

    Kelle C Freel

    Full Text Available Acid mine drainage (AMD) is a highly toxic environment for most living organisms due to the presence of many lethal elements including arsenic (As). Thiomonas (Tm.) bacteria are found ubiquitously in AMD and can withstand these extreme conditions, in part because they are able to oxidize arsenite. In order to further improve our knowledge concerning the adaptive capacities of these bacteria, we sequenced and assembled the genomes of six isolates derived from the Carnoulès AMD, and compared them to the genomes of Tm. arsenitoxydans 3As (isolated from the same site) and Tm. intermedia K12 (isolated from a sewage pipe). A detailed analysis of the Tm. sp. CB2 genome revealed that various rearrangements had occurred in comparison to what was observed in 3As and K12, and over 20 genomic islands (GEIs) were found in each of these three genomes. We performed a detailed comparison of the two arsenic-related islands found in CB2, carrying the genes required for arsenite oxidation and As resistance, with those found in K12, 3As, and five other Thiomonas strains also isolated from Carnoulès (CB1, CB3, CB6, ACO3 and ACO7). Our results suggest that these arsenic-related islands have evolved differentially in these closely related Thiomonas strains, leading to divergent capacities to survive in As-rich environments.

  17. LifeStyle-Specific-Islands (LiSSI): Integrated Bioinformatics Platform for Genomic Island Analysis

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Rottger, Richard; Hauschild, Anne-Christin

    2017-01-01

    Distinct bacteria are able to cope with highly diverse lifestyles; for instance, they can be free living or host-associated. Thus, these organisms must possess a large and varied genomic arsenal to withstand different environmental conditions. To facilitate the identification of genomic features ...

  18. Correction for Measurement Error from Genotyping-by-Sequencing in Genomic Variance and Genomic Prediction Models

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Janss, Luc; Jensen, Just

    sample). The GBSeq data can be used directly in genomic models in the form of individual SNP allele-frequency estimates (e.g., reference reads/total reads per polymorphic site per individual), but is subject to measurement error due to the low sequencing depth per individual. Due to technical reasons....... In the current work we show how the correction for measurement error in GBSeq can also be applied in whole genome genomic variance and genomic prediction models. Bayesian whole-genome random regression models are proposed to allow implementation of large-scale SNP-based models with a per-SNP correction...... for measurement error. We show correct retrieval of genomic explained variance, and improved genomic prediction when accounting for the measurement error in GBSeq data...
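
    The measurement error described above comes from binomial sampling of reads: at a site sequenced to depth d, the allele-frequency estimate (reference reads / total reads) for an individual with true allele dosage p has variance p(1 - p)/d. The sketch below only illustrates that error source with a simulated heterozygote; it does not implement the authors' Bayesian per-SNP correction, and the function names are hypothetical.

```python
import random
import statistics
from typing import Optional


def allele_freq_estimate(ref_reads: int, total_reads: int) -> Optional[float]:
    """Reference reads / total reads; None when the site has no coverage."""
    return ref_reads / total_reads if total_reads else None


def binomial_error_variance(p_true: float, depth: int) -> float:
    """Sampling variance of the read-based estimate at a given depth."""
    return p_true * (1 - p_true) / depth


def simulate_estimates(p_true: float, depth: int, n: int, seed: int = 0) -> list[float]:
    """Draw n read-based estimates for an individual with true frequency p_true."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # Each read independently carries the reference allele with probability p_true.
        ref = sum(rng.random() < p_true for _ in range(depth))
        out.append(allele_freq_estimate(ref, depth))
    return out


# A heterozygote (p = 0.5) sequenced at depth 4: estimates scatter widely.
est = simulate_estimates(0.5, depth=4, n=20000)
print(round(statistics.pvariance(est), 4), binomial_error_variance(0.5, 4))
```

    At depth 4 the standard deviation of the estimate for a heterozygote is 0.25, large relative to the 0 / 0.5 / 1 spacing of true dosages, which is why ignoring this error biases genomic variance estimates.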

  19. Draft Genome Sequence of Halostagnicola sp. A56, an Extremely Halophilic Archaeon Isolated from the Andaman Islands

    Science.gov (United States)

    Kanekar, Sagar P.; Saxena, Neha; Pore, Soham D.; Arora, Preeti; Kanekar, P. P.

    2015-01-01

    The first draft genome of Halostagnicola sp. A56, isolated from the Andaman Islands is reported here. The A56 genome comprises 3,178,490 bp in 26 contigs with a G+C content of 60.8%. The genome annotation revealed that A56 could have potential applications for the production of polyhydroxyalkanoate or bioplastics. PMID:26564049

  20. Extensive Genome Rearrangements and Multiple Horizontal Gene Transfers in a Population of Pyrococcus Isolates from Vulcano Island, Italy

    Science.gov (United States)

    White, James R.; Escobar-Paramo, Patricia; Mongodin, Emmanuel F.; Nelson, Karen E.; DiRuggiero, Jocelyne

    2008-01-01

    The extent of chromosome rearrangements in Pyrococcus isolates from marine hydrothermal vents in Vulcano Island, Italy, was evaluated by high-throughput genomic methods. The results illustrate the dynamic nature of the genomes of the genus Pyrococcus and raise the possibility of a connection between rapidly changing environmental conditions and adaptive genomic properties. PMID:18723649

  2. Genomic value prediction for quantitative traits under the epistatic model

    Directory of Open Access Journals (Sweden)

    Xu Shizhong

    2011-01-01

    Full Text Available Abstract Background Most quantitative traits are controlled by multiple quantitative trait loci (QTLs). The contribution of each locus may be negligible, but the collective contribution of all loci is usually significant. Genome selection, which uses markers of the entire genome to predict the genomic values of individual plants or animals, can be more efficient than selection on phenotypic values and pedigree information alone for genetic improvement. When a quantitative trait is affected by epistatic effects, using all markers (main effects) and marker pairs (epistatic effects) to predict the genomic values of plants can achieve the maximum efficiency for genetic improvement. Results In this study, we created 126 recombinant inbred lines of soybean and genotyped 80 markers across the genome. We applied the genome selection technique to predict the genomic value of somatic embryo number (a quantitative trait) for each line. Cross-validation analysis showed that the squared correlation coefficient between the observed and predicted embryo numbers was 0.33 when only main (additive) effects were used for prediction. When the interaction (epistatic) effects were also included in the model, the squared correlation coefficient reached 0.78. Conclusions This study provided an excellent example of the application of genome selection to plant breeding.
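
    Including epistatic effects amounts to expanding each line's marker vector with all pairwise marker products. With 80 markers that gives 80 main-effect and 80 × 79 / 2 = 3,160 epistatic predictors. The sketch below builds that expanded design and the squared correlation used as the cross-validation criterion; the -1/+1 coding of the two homozygous RIL genotypes is an assumption, and the method used to estimate thousands of effects from 126 lines is not reproduced here.

```python
from itertools import combinations
from statistics import fmean


def epistatic_features(markers: list[int]) -> list[int]:
    """Main effects followed by all pairwise (epistatic) products."""
    pairs = [markers[i] * markers[j] for i, j in combinations(range(len(markers)), 2)]
    return list(markers) + pairs


def squared_correlation(observed: list[float], predicted: list[float]) -> float:
    """r^2 between observed and predicted values."""
    mo, mp = fmean(observed), fmean(predicted)
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop ** 2 / (soo * spp)


print(epistatic_features([1, -1, 1]))     # [1, -1, 1, -1, 1, -1]
print(len(epistatic_features([1] * 80)))  # 3240
```

    Because predictors far outnumber lines, some form of shrinkage or Bayesian estimation is needed to fit such a model; ordinary least squares would be undetermined.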

  3. Genomic Prediction of Manganese Efficiency in Winter Barley

    Directory of Open Access Journals (Sweden)

    Florian Leplat

    2016-07-01

    Full Text Available Manganese efficiency is a quantitative abiotic stress trait controlled by several genes, each with a small effect. Manganese deficiency leads to yield reduction in winter barley (Hordeum vulgare L.). Breeding new cultivars for this trait remains difficult because of the lack of visual symptoms and the polygenic nature of the trait. Hence, Mn efficiency is a potentially suitable trait for a genomic selection (GS) approach. A collection of 248 winter barley varieties was screened for Mn efficiency using chlorophyll (Chl) fluorescence in six environments prone to induce Mn deficiency. Two models for genomic prediction were implemented to predict future performance and breeding values of untested varieties. Predictions were obtained using multivariate mixed models: best linear unbiased predictor (BLUP) and genomic best linear unbiased predictor (G-BLUP). In the first model, predictions were based on the phenotypic evaluation, whereas both phenotypic and genomic marker data were included in the second model. The accuracy of predicting future phenotypes and the accuracy of predicting true breeding values were calculated and compared for both models using six cross-validation (CV) schemes; these were designed to mimic plant breeding programs. Overall, the CVs showed that prediction accuracies increased when using the G-BLUP model compared with the BLUP model. Furthermore, breeding values were predicted more accurately than future phenotypes. The study confirms that genomic data may enhance prediction accuracy. Moreover, it indicates that GS is a suitable breeding approach for quantitative abiotic stress traits.
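
    What G-BLUP adds over phenotype-only BLUP is a genomic relationship matrix that links tested and untested varieties. A minimal G-BLUP sketch, written as kernel-ridge regression, predicts an untested variety t as mu + g(t, train) (G_train + lambda I)^-1 (y_train - mu), with lambda = sigma_e^2 / sigma_g^2. It assumes a single trait, a known lambda, and an overall mean estimated as a simple average; the study's multivariate mixed models and variance-component estimation are not reproduced, and the small relationship matrix below is made up.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gblup_predict(g, y_train, train, test, lam):
    """Predict `test` individuals from `train` phenotypes via relationship matrix g."""
    mu = sum(y_train) / len(y_train)  # simplification: mean as the only fixed effect
    lhs = [[g[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(lhs, [y - mu for y in y_train])
    return [mu + sum(g[t][j] * a for j, a in zip(train, alpha)) for t in test]


# Made-up relationships: variety 2 is related to variety 0 but not to variety 1.
G = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0]]
print(gblup_predict(G, y_train=[1.0, 3.0], train=[0, 1], test=[2], lam=1.0))
```

    A variety with no genomic relationship to any training variety is predicted at the overall mean, which corresponds to the situation a pedigree-only BLUP faces for unrelated, untested material.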

  4. Genetic Programming for Sea Level Predictions in an Island Environment

    Directory of Open Access Journals (Sweden)

    M.A. Ghorbani

    2010-03-01

    Full Text Available Accurate predictions of sea level are important for geodetic applications, navigation, and coastal, industrial and tourist activities. In the current work, Genetic Programming (GP) and artificial neural networks (ANNs) were applied to forecast half-daily and daily sea-level variations from 12 hours to 5 days ahead. Measurements at the Cocos (Keeling) Islands in the Indian Ocean were used for training and testing of the employed artificial intelligence techniques. The predictions from the GP model were compared with the ANN simulations. Based on the comparison, Genetic Programming can be successfully employed in forecasting sea-level variations.

  5. Genomic islands of differentiation in two songbird species reveal candidate genes for hybrid female sterility.

    Science.gov (United States)

    Mořkovský, Libor; Janoušek, Václav; Reif, Jiří; Rídl, Jakub; Pačes, Jan; Choleva, Lukáš; Janko, Karel; Nachman, Michael W; Reifová, Radka

    2018-02-01

    Hybrid sterility is a common first step in the evolution of postzygotic reproductive isolation. According to Haldane's Rule, it affects predominantly the heterogametic sex. While the genetic basis of hybrid male sterility in organisms with heterogametic males has been studied for decades, the genetic basis of hybrid female sterility in organisms with heterogametic females has received much less attention. We investigated the genetic basis of reproductive isolation in two closely related avian species, the common nightingale (Luscinia megarhynchos) and the thrush nightingale (L. luscinia), that hybridize in a secondary contact zone and produce viable hybrid progeny. In accordance with Haldane's Rule, hybrid females are sterile, while hybrid males are fertile, allowing gene flow to occur between the species. Using transcriptomic data from multiple individuals of both nightingale species, we identified genomic islands of high differentiation (FST) and of high divergence (Dxy), and we analysed gene content and patterns of molecular evolution within these islands. Interestingly, we found that these islands were enriched for genes related to female meiosis and metabolism. The islands of high differentiation and divergence were also characterized by higher levels of linkage disequilibrium than the rest of the genome in both species, indicating that they might be situated in genomic regions of low recombination. This study provides one of the first insights into the genetic basis of hybrid female sterility in organisms with heterogametic females. © 2018 John Wiley & Sons Ltd.
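
    Both statistics can be computed from allele frequencies in the two species. The sketch below uses a Hudson-style definition, FST = 1 - Hw/Hb, where Hw is the mean within-species diversity and Hb = Dxy is the between-species diversity, combined as a ratio of window means; it ignores sample-size corrections, averages Dxy only over the listed sites, and is not necessarily the estimator used in the study. The frequencies are invented, and calling the top-FST window an "island" is an illustrative assumption.

```python
from statistics import fmean


def dxy_site(p1: float, p2: float) -> float:
    """Probability that alleles drawn from the two species differ at a site."""
    return p1 * (1 - p2) + p2 * (1 - p1)


def within_site(p: float) -> float:
    """Expected heterozygosity within one species."""
    return 2 * p * (1 - p)


def window_stats(freqs: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (FST, Dxy) for a window of (p_species1, p_species2) site frequencies."""
    hb = fmean(dxy_site(p1, p2) for p1, p2 in freqs)
    hw = fmean((within_site(p1) + within_site(p2)) / 2 for p1, p2 in freqs)
    fst = 1 - hw / hb if hb > 0 else 0.0
    return fst, hb


# Invented windows: the second looks like an island of high differentiation.
windows = {
    "w1": [(0.5, 0.4), (0.3, 0.35), (0.6, 0.5)],
    "w2": [(0.95, 0.05), (0.9, 0.1), (1.0, 0.0)],
    "w3": [(0.2, 0.25), (0.7, 0.6), (0.5, 0.5)],
}
stats = {name: window_stats(f) for name, f in windows.items()}
top = max(stats, key=lambda k: stats[k][0])
print(top, [round(v, 3) for v in stats[top]])
```

    FST is relative (it falls when within-species diversity is high), whereas Dxy is absolute, which is why studies of speciation islands often report both.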

  6. Genomic prediction when some animals are not genotyped

    Directory of Open Access Journals (Sweden)

    Lund Mogens S

    2010-01-01

    Full Text Available Abstract Background The use of genomic selection in breeding programs may increase the rate of genetic improvement, reduce the generation time, and provide higher accuracy of estimated breeding values (EBVs). A number of different methods have been developed for genomic prediction of breeding values, but many of them assume that all animals have been genotyped. In practice, not all animals are genotyped, and the methods have to be adapted to this situation. Results In this paper we provide an extension of a linear mixed model method for genomic prediction to the situation with non-genotyped animals. The model specifies that a breeding value is the sum of a genomic and a polygenic genetic random effect, where genomic genetic random effects are correlated with a genomic relationship matrix constructed from markers and the polygenic genetic random effects are correlated with the usual relationship matrix. The extension of the model to non-genotyped animals is made by using the pedigree to derive an extension of the genomic relationship matrix to non-genotyped animals. As a result, in the extended model the estimated breeding values are obtained by blending the information used to compute traditional EBVs and the information used to compute purely genomic EBVs. Parameters in the model are estimated using average information REML, and estimated breeding values are best linear unbiased predictions (BLUPs). The method is illustrated using a simulated data set. Conclusions The extension of the method to non-genotyped animals presented in this paper makes it possible to integrate all the genomic, pedigree and phenotype information into a one-step procedure for genomic prediction. Such a one-step procedure results in more accurate estimated breeding values and has the potential to become the standard tool for genomic prediction of breeding values in future practical evaluations in pig and cattle breeding.
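
    A genomic relationship matrix "constructed from markers" is often built with VanRaden's first method, G = ZZ' / (2 Σ p_j(1 - p_j)), where Z holds genotypes coded 0/1/2 centred by twice the allele frequency. The abstract does not say which construction was used, so this is an assumed, common choice: allele frequencies are estimated from the genotyped sample itself, and the pedigree-based extension to non-genotyped animals described above is not implemented.

```python
def genomic_relationship_matrix(genotypes: list[list[int]]) -> list[list[float]]:
    """VanRaden (method 1) G from genotypes coded 0/1/2 (rows = animals)."""
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each marker, estimated from this sample.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


# Three animals, four markers (made-up genotypes).
geno = [[0, 1, 2, 1],
        [2, 1, 0, 1],
        [1, 1, 1, 2]]
for row in genomic_relationship_matrix(geno):
    print([round(v, 3) for v in row])
```

    Centring with sample frequencies makes every row of G sum to zero; using base-population frequencies instead keeps G on the same scale as the pedigree relationship matrix, which matters when the two are blended.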

  7. Full genome sequences are key to disclose RHDV2 emergence in the Macaronesian islands.

    Science.gov (United States)

    Lopes, Ana M; Blanco-Aguiar, Jose; Martín-Alonso, Aaron; Leitão, Manuel; Foronda, Pilar; Mendes, Marco; Gonçalves, David; Abrantes, Joana; Esteves, Pedro J

    2018-02-01

    A recent publication by Carvalho et al. in "Virus Genes" (June 2017) reported the presence of the new variant of rabbit hemorrhagic disease virus (RHDV2) in the two larger islands of the archipelago of Madeira. Based on the capsid protein sequence, the authors suggested that the high sequence identity, along with the short time spanning between outbreaks, points to dissemination from Porto Santo to Madeira. By including information of the full RHDV2 genome of strains from Azores, Madeira, and the Canary Islands, we confirm the results obtained by Carvalho et al., but further show that several subtypes of RHDV2 circulate in these islands: non-recombinant RHDV2 in the Canary Islands, G1/RHDV2 in Azores, Porto Santo and Madeira, and NP/RHDV2 also in Madeira. Here we conclude that RHDV2 has been independently introduced in these archipelagos, and that in Madeira at least two independent introductions must have occurred. We provide additional information on the dynamics of RHDV2 in the Macaronesian archipelagos of Azores, Madeira, and the Canary Islands and highlight the importance of analyzing RHDV2 complete genome.

  8. Using Genome-scale Models to Predict Biological Capabilities

    DEFF Research Database (Denmark)

    O’Brien, Edward J.; Monk, Jonathan M.; Palsson, Bernhard O.

    2015-01-01

    Constraint-based reconstruction and analysis (COBRA) methods at the genome scale have been under development since the first whole-genome sequences appeared in the mid-1990s. A few years ago, this approach began to demonstrate the ability to predict a range of cellular functions, including cellul...

  9. Genomic breeding value prediction: methods and procedures

    NARCIS (Netherlands)

    Calus, M.P.L.

    2010-01-01

    Animal breeding faces one of the most significant changes of the past decades – the implementation of genomic selection. Genomic selection uses dense marker maps to predict the breeding value of animals with reported accuracies that are up to 0.31 higher than those of pedigree indexes, without the

  10. EuGI: a novel resource for studying genomic islands to facilitate horizontal gene transfer detection in eukaryotes.

    Science.gov (United States)

    Clasen, Frederick Johannes; Pierneef, Rian Ewald; Slippers, Bernard; Reva, Oleg

    2018-05-03

    Genomic islands (GIs) are inserts of foreign DNA that have potentially arisen through horizontal gene transfer (HGT). There is evidence that GIs can contribute significantly to the evolution of prokaryotes. The acquisition of GIs through HGT in eukaryotes has, however, been largely unexplored. In this study, the previously developed GI prediction tool, SeqWord Gene Island Sniffer (SWGIS), is modified to predict GIs in eukaryotic chromosomes. Artificial simulations are used to estimate the rates of false-positive and false-negative GI predictions by inserting GIs into different test chromosomes and running the SWGIS v2.0 algorithm. Using SWGIS v2.0, GIs are then identified in 36 fungal, 22 protozoan and 8 invertebrate genomes. SWGIS v2.0 predicts GIs in large eukaryotic chromosomes based on the atypical nucleotide composition of these regions. The average false-negative and false-positive rates were 20.1% and 11.01%, respectively. A total of 10,550 GIs were identified in 66 eukaryotic species, with 5299 of these GIs coding for at least one functional protein. The EuGI web resource, freely accessible at http://eugi.bi.up.ac.za, was developed to allow browsing of the database created from identified GIs and genes within GIs through an interactive and visual interface. Together, SWGIS v2.0, the EuGI database, which houses GIs identified in 66 different eukaryotic species, and the EuGI web resource provide the first comprehensive resource for studying HGT in eukaryotes.

  11. PReMod: a database of genome-wide mammalian cis-regulatory module predictions.

    Science.gov (United States)

    Ferretti, Vincent; Poitras, Christian; Bergeron, Dominique; Coulombe, Benoit; Robert, François; Blanchette, Mathieu

    2007-01-01

    We describe PReMod, a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes. The prediction algorithm, described previously in Blanchette et al. (2006) Genome Res., 16, 656-668, exploits the fact that many known CRMs are made of clusters of phylogenetically conserved and repeated transcription factor (TF) binding sites. Unlike other existing databases, PReMod is not restricted to modules located proximal to genes, but in fact mostly contains distal predicted CRMs (pCRMs). Through its web interface, PReMod allows users to (i) identify pCRMs around a gene of interest; (ii) identify pCRMs that have binding sites for a given TF (or a set of TFs) or (iii) download the entire dataset for local analyses. Queries can also be refined by filtering for specific chromosomal regions, for specific regions relative to genes or for the presence of CpG islands. The output includes information about the binding sites predicted within the selected pCRMs, and a graphical display of their distribution within the pCRMs. It also provides a visual depiction of the chromosomal context of the selected pCRMs in terms of neighboring pCRMs and genes, all of which are linked to the UCSC Genome Browser and the NCBI. PReMod: http://genomequebec.mcgill.ca/PReMod.

  12. A genome-wide map of aberrantly expressed chromosomal islands in colorectal cancer

    Directory of Open Access Journals (Sweden)

    Castanos-Velez Esmeralda

    2006-09-01

    Abstract Background Cancer development is accompanied by genetic phenomena like deletion and amplification of chromosome parts or alterations of chromatin structure. It is expected that these mechanisms have a strong effect on regional gene expression. Results We investigated genome-wide gene expression in colorectal carcinoma (CRC) and normal epithelial tissues from 25 patients using oligonucleotide arrays. This allowed us to identify 81 distinct chromosomal islands with aberrant gene expression. Of these, 38 islands show a gain in expression and 43 a loss of expression. In total, 7,892 genes (25.3% of all human genes) are located in aberrantly expressed islands. Many chromosomal regions that are linked to hereditary colorectal cancer show deregulated expression. Also, many known tumor genes localize to chromosomal islands of misregulated expression in CRC. Conclusion An extensive comparison with published CGH data suggests that chromosomal regions known for frequent deletions in colon cancer tend to show reduced expression. In contrast, regions that are often amplified in colorectal tumors exhibit heterogeneous expression patterns; some even show a decrease in mRNA expression. Because chromosomal aberrations have never been observed for several islands of deregulated expression, we speculate that additional mechanisms (like abnormal states of regional chromatin) also have a substantial impact on the formation of co-expression islands in colorectal carcinoma.

  13. Long-range autocorrelations of CpG islands in the human genome.

    Directory of Open Access Journals (Sweden)

    Benjamin Koester

    In this paper, we use a statistical estimator developed in astrophysics to study the distribution and organization of features of the human genome. Using the human reference sequence, we quantify the global distribution of CpG islands (CGI) in each chromosome and demonstrate that the organization of the CGI across a chromosome is non-random, exhibits surprisingly long-range correlations (10 Mb), and varies significantly among chromosomes. These correlations of CGI summarize functional properties of the genome that are not captured when considering variation in any particular separate (and local) feature. The proposed methods for quantifying the organization of CGI in the human genome form the basis of future studies. The most illuminating of these will assess the potential impact on phenotypic variation of inter-individual variation in the organization of the functional features of the genome within and among chromosomes, and among individuals for particular chromosomes.

  14. Phylogenetic Relationships of the Fern Cyrtomium falcatum (Dryopteridaceae) from Dokdo Island Based on Chloroplast Genome Sequencing

    Directory of Open Access Journals (Sweden)

    Gurusamy Raman

    2016-12-01

    Cyrtomium falcatum is a popular ornamental fern cultivated worldwide. Native to the Korean Peninsula, Japan, and Dokdo Island in the Sea of Japan, it is the only fern present on Dokdo Island. We isolated and characterized the chloroplast (cp) genome of C. falcatum, and compared it with those of closely related species. The genes trnV-GAC and trnV-GAU were found to be present within the cp genome of C. falcatum, whereas trnP-GGG and rpl21 were lacking. Moreover, cp genomes of Cyrtomium devexiscapulae and Adiantum capillus-veneris lack trnP-GGG and rpl21, suggesting these are not conserved among angiosperm cp genomes. The deletion of trnR-UCG, trnR-CCG, and trnSeC in the cp genomes of C. falcatum and other eupolypod ferns indicates these genes are restricted to tree ferns, non-core leptosporangiates, and basal ferns. The C. falcatum cp genome also encoded ndhF and rps7 with GUG start codons that were only conserved in polypod ferns, and it shares two significant inversions with other ferns, including a minor inversion of the trnD-GUC region and an approximately 3 kb inversion of the trnG-trnT region. Phylogenetic analyses placed Equisetum as sister to Psilotales-Ophioglossales with 100% bootstrap (BS) support. The sister relationship between Pteridaceae and eupolypods was also strongly supported (100% BS), but Bayesian molecular clock analyses suggested that C. falcatum diversified in the mid-Paleogene period (45.15 ± 4.93 million years ago) and might have moved from Eurasia to Dokdo Island.

  15. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans

    OpenAIRE

    Jayapal, Karthik P; Lian, Wei; Glod, Frank; Sherman, David H; Hu, Wei-Shou

    2007-01-01

    Abstract Background The genomes of Streptomyces coelicolor and Streptomyces lividans bear a considerable degree of synteny. While S. coelicolor is the model streptomycete for studying antibiotic synthesis and differentiation, S. lividans is almost exclusively considered the preferred host, among actinomycetes, for cloning and expression of exogenous DNA. We used whole genome microarrays as a comparative genomics tool for identifying the subtle differences between these two chromosomes. Res...

  16. Genome characterization of Long Island tick rhabdovirus, a new virus identified in Amblyomma americanum ticks

    Science.gov (United States)

    2014-01-01

    Background Ticks are implicated as hosts of a wide range of animal and human pathogens. The full range of microbes harbored by ticks has not yet been explored. Methods As part of a viral surveillance and discovery project in arthropods, we used unbiased high-throughput sequencing to examine viromes of ticks collected on Long Island, New York in 2013. Results We detected and sequenced the complete genome of a novel rhabdovirus originating from a pool of Amblyomma americanum ticks. This virus, which we provisionally name Long Island tick rhabdovirus, is distantly related to Moussa virus from Africa. Conclusions The Long Island tick rhabdovirus may represent a novel species within family Rhabdoviridae. PMID:24517260

  17. Prediction in medicine – genome contra envirome

    Czech Academy of Sciences Publication Activity Database

    Brdička, Radim

    2012-01-01

    Vol. 151, No. 1 (2012), pp. 22-25. ISSN 0008-7335. R&D Projects: GA MZd NS9804. Institutional research plan: CEZ:AV0Z50390703. Keywords: genome; genotype; phenotype. Subject RIV: FP - Other Medical Disciplines

  18. Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution.

    Science.gov (United States)

    Cahill, James A; Green, Richard E; Fulton, Tara L; Stiller, Mathias; Jay, Flora; Ovsyanikov, Nikita; Salamzade, Rauf; St John, John; Stirling, Ian; Slatkin, Montgomery; Shapiro, Beth

    2013-01-01

    Despite extensive genetic analysis, the evolutionary relationship between polar bears (Ursus maritimus) and brown bears (U. arctos) remains unclear. The two most recent comprehensive reports indicate a recent divergence with little subsequent admixture or a much more ancient divergence followed by extensive admixture. At the center of this controversy are the Alaskan ABC Islands brown bears that show evidence of shared ancestry with polar bears. We present an analysis of genome-wide sequence data for seven polar bears, one ABC Islands brown bear, one mainland Alaskan brown bear, and a black bear (U. americanus), plus recently published datasets from other bears. Surprisingly, we find clear evidence for gene flow from polar bears into ABC Islands brown bears but no evidence of gene flow from brown bears into polar bears. Importantly, while polar bears contributed only a small fraction of the autosomal genome of the ABC Islands brown bear, they contributed 6.5% of the X chromosome. The magnitude of sex-biased polar bear ancestry and the clear direction of gene flow suggest a model wherein the enigmatic ABC Islands brown bears are the descendants of a polar bear population that was gradually converted into brown bears via male-dominated brown bear admixture. We present a model that reconciles heretofore conflicting genetic observations. We posit that the enigmatic ABC Islands brown bears derive from a population of polar bears likely stranded by the receding ice at the end of the last glacial period. Since then, male brown bear migration onto the island has gradually converted these bears into an admixed population whose phenotype and genotype are principally brown bear, except at mtDNA and X-linked loci. This process of genome erosion and conversion may be a common outcome when climate change or other forces cause a population to become isolated and then overrun by species with which it can hybridize.

  19. Adaptive divergence despite strong genetic drift: genomic analysis of the evolutionary mechanisms causing genetic differentiation in the island fox (Urocyon littoralis)

    Science.gov (United States)

    Funk, W. Chris; Lovich, Robert E.; Hohenlohe, Paul A.; Hofman, Courtney A.; Morrison, Scott A.; Sillett, T. Scott; Ghalambor, Cameron K.; Maldonado, Jesus E.; Rick, Torben C.; Day, Mitch D.; Polato, Nicholas R.; Fitzpatrick, Sarah W.; Coonan, Timothy J.; Crooks, Kevin R.; Dillon, Adam; Garcelon, David K.; King, Julie L.; Boser, Christina L.; Gould, Nicholas; Andelt, William F.

    2016-01-01

    The evolutionary mechanisms generating the tremendous biodiversity of islands have long fascinated evolutionary biologists. Genetic drift and divergent selection are predicted to be strong on islands and both could drive population divergence and speciation. Alternatively, strong genetic drift may preclude adaptation. We conducted a genomic analysis to test the roles of genetic drift and divergent selection in causing genetic differentiation among populations of the island fox (Urocyon littoralis). This species consists of 6 subspecies, each of which occupies a different California Channel Island. Analysis of 5293 SNP loci generated using Restriction-site Associated DNA (RAD) sequencing found support for genetic drift as the dominant evolutionary mechanism driving population divergence among island fox populations. In particular, populations had exceptionally low genetic variation, small Ne (range = 2.1–89.7; median = 19.4), and significant genetic signatures of bottlenecks. Moreover, islands with the lowest genetic variation (and, by inference, the strongest historical genetic drift) were most genetically differentiated from mainland gray foxes, and vice versa, indicating genetic drift drives genome-wide divergence. Nonetheless, outlier tests identified 3.6–6.6% of loci as high FST outliers, suggesting that despite strong genetic drift, divergent selection contributes to population divergence. Patterns of similarity among populations based on high FST outliers mirrored patterns based on morphology, providing additional evidence that outliers reflect adaptive divergence. Extremely low genetic variation and small Ne in some island fox populations, particularly on San Nicolas Island, suggest that they may be vulnerable to fixation of deleterious alleles, decreased fitness, and reduced adaptive potential. PMID:26992010

  20. Adaptive divergence despite strong genetic drift: genomic analysis of the evolutionary mechanisms causing genetic differentiation in the island fox (Urocyon littoralis).

    Science.gov (United States)

    Funk, W Chris; Lovich, Robert E; Hohenlohe, Paul A; Hofman, Courtney A; Morrison, Scott A; Sillett, T Scott; Ghalambor, Cameron K; Maldonado, Jesus E; Rick, Torben C; Day, Mitch D; Polato, Nicholas R; Fitzpatrick, Sarah W; Coonan, Timothy J; Crooks, Kevin R; Dillon, Adam; Garcelon, David K; King, Julie L; Boser, Christina L; Gould, Nicholas; Andelt, William F

    2016-05-01

    The evolutionary mechanisms generating the tremendous biodiversity of islands have long fascinated evolutionary biologists. Genetic drift and divergent selection are predicted to be strong on islands and both could drive population divergence and speciation. Alternatively, strong genetic drift may preclude adaptation. We conducted a genomic analysis to test the roles of genetic drift and divergent selection in causing genetic differentiation among populations of the island fox (Urocyon littoralis). This species consists of six subspecies, each of which occupies a different California Channel Island. Analysis of 5293 SNP loci generated using Restriction-site Associated DNA (RAD) sequencing found support for genetic drift as the dominant evolutionary mechanism driving population divergence among island fox populations. In particular, populations had exceptionally low genetic variation, small Ne (range = 2.1-89.7; median = 19.4), and significant genetic signatures of bottlenecks. Moreover, islands with the lowest genetic variation (and, by inference, the strongest historical genetic drift) were most genetically differentiated from mainland grey foxes, and vice versa, indicating genetic drift drives genome-wide divergence. Nonetheless, outlier tests identified 3.6-6.6% of loci as high FST outliers, suggesting that despite strong genetic drift, divergent selection contributes to population divergence. Patterns of similarity among populations based on high FST outliers mirrored patterns based on morphology, providing additional evidence that outliers reflect adaptive divergence. Extremely low genetic variation and small Ne in some island fox populations, particularly on San Nicolas Island, suggest that they may be vulnerable to fixation of deleterious alleles, decreased fitness and reduced adaptive potential. © 2016 John Wiley & Sons Ltd.

  1. An estimator-based distributed voltage-predictive control strategy for ac islanded microgrids

    DEFF Research Database (Denmark)

    Wang, Yanbo; Chen, Zhe; Wang, Xiongfei

    2015-01-01

    This paper presents an estimator-based voltage predictive control strategy for AC islanded microgrids, which is able to perform voltage control without any communication facilities. The proposed control strategy is composed of a network voltage estimator and a voltage predictive controller for each...... and has a good capability to reject uncertain perturbations of islanded microgrids....

  2. Assessing Predictive Properties of Genome-Wide Selection in Soybeans

    Directory of Open Access Journals (Sweden)

    Alencar Xavier

    2016-08-01

    Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max (L.) Merr.). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with greatest improvement observed in training sets up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although alternative models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set.

  3. Assessing Predictive Properties of Genome-Wide Selection in Soybeans.

    Science.gov (United States)

    Xavier, Alencar; Muir, William M; Rainey, Katy Martin

    2016-08-09

    Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max (L.) Merr.). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with greatest improvement observed in training sets up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although alternative models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set. Copyright © 2016 Xavier et al.

  4. Compositional searching of CpG islands in the human genome

    Science.gov (United States)

    Luque-Escamilla, Pedro Luis; Martínez-Aroza, José; Oliver, José L.; Gómez-Lopera, Juan Francisco; Román-Roldán, Ramón

    2005-06-01

    We report on an entropic edge detector based on the local calculation of the Jensen-Shannon divergence, with application to the search for CpG islands. CpG islands are pieces of the genome related to gene expression and cell differentiation, and thus to cancer formation. Searching for these CpG islands is a major task in genetics and bioinformatics. Several algorithms proposed in the literature are based on moving statistics in a sliding window, but the window size may greatly influence the results. The local use of the Jensen-Shannon divergence is a completely different strategy: the nucleotide composition inside the islands differs from that of their surroundings, so a statistical distance (the Jensen-Shannon divergence) between the compositions of two adjacent windows may be used as a measure of their dissimilarity. Sliding this double window over the entire sequence allows us to segment it compositionally. Those segments must then be fused into larger ones that satisfy certain identification criteria in order to obtain the definitive results. We find that the local use of the Jensen-Shannon divergence is very suitable for processing DNA sequences in search of compositionally different structures such as CpG islands, as compared to other algorithms in the literature.
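    The double-window idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it compares single-nucleotide (A/C/G/T) composition only, uses two equal-length windows and therefore the equal-weight Jensen-Shannon divergence, and omits the significance testing and segment-fusion criteria the abstract mentions. The function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: only single-nucleotide composition is compared; other symbols (e.g. N) are ignored.
ALPHABET = "ACGT"


def entropy(freqs):
    """Shannon entropy in bits of a probability vector (zero entries contribute nothing)."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def composition(window):
    """Relative A/C/G/T frequencies of a window; all zeros if it has no valid bases."""
    counts = Counter(base for base in window.upper() if base in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(ALPHABET)
    return [counts[base] / total for base in ALPHABET]


def jensen_shannon(p, q):
    """Equal-weight Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q)) / 2."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(mixture) - (entropy(p) + entropy(q)) / 2


def jsd_profile(seq, half_window):
    """Slide two adjacent windows of `half_window` bases along seq.

    Returns (cut, divergence) pairs, where cut is the index separating the
    left window seq[cut - half_window:cut] from the right window
    seq[cut:cut + half_window]. Peaks mark candidate compositional boundaries.
    """
    profile = []
    for cut in range(half_window, len(seq) - half_window + 1):
        left = composition(seq[cut - half_window:cut])
        right = composition(seq[cut:cut + half_window])
        profile.append((cut, jensen_shannon(left, right)))
    return profile


if __name__ == "__main__":
    # Synthetic example: AT-only flank, CG-only "island", AT-only flank (40 bases each).
    demo = "AT" * 20 + "CG" * 20 + "AT" * 20
    best_cut, best_score = max(jsd_profile(demo, 10), key=lambda item: item[1])
    print(best_cut, best_score)  # prints: 40 1.0
```

    Both boundaries of the synthetic island (positions 40 and 80) reach the 1-bit maximum of the equal-weight divergence; `max` reports the first. A fuller segmentation would split the sequence recursively or threshold the whole profile, and then merge segments as the abstract describes.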

  5. Complete Genome Sequences of Four Avian Paramyxoviruses of Serotype 10 Isolated from Rockhopper Penguins on the Falkland Islands

    OpenAIRE

    Goraichuk, Iryna V.; Dimitrov, Kiril M.; Sharma, Poonam; Miller, Patti J.; Swayne, David E.; Suarez, David L.; Afonso, Claudio L.

    2017-01-01

    ABSTRACT The first complete genome sequences of four avian paramyxovirus serotype 10 (APMV-10) isolates are described here. The viruses were isolated from rockhopper penguins on the Falkland Islands, sampled in 2007. All four genomes are 15,456 nucleotides in length, and phylogenetic analyses show them to be closely related.

  6. Complete Genome Sequences of Four Avian Paramyxoviruses of Serotype 10 Isolated from Rockhopper Penguins on the Falkland Islands

    Science.gov (United States)

    Goraichuk, Iryna V.; Dimitrov, Kiril M.; Sharma, Poonam; Miller, Patti J.; Swayne, David E.; Suarez, David L.

    2017-01-01

    ABSTRACT The first complete genome sequences of four avian paramyxovirus serotype 10 (APMV-10) isolates are described here. The viruses were isolated from rockhopper penguins on the Falkland Islands, sampled in 2007. All four genomes are 15,456 nucleotides in length, and phylogenetic analyses show them to be closely related. PMID:28572332

  7. Predicting Tissue-Specific Enhancers in the Human Genome

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Loots, Gabriela G.; Nobrega, Marcelo A.; Ovcharenko, Ivan

    2006-07-01

    Determining how transcriptional regulatory signals are encoded in vertebrate genomes is essential for understanding the origins of multi-cellular complexity; yet the genetic code of vertebrate gene regulation remains poorly understood. In an attempt to elucidate this code, we synergistically combined genome-wide gene expression profiling, vertebrate genome comparisons, and transcription factor binding site analysis to define sequence signatures characteristic of candidate tissue-specific enhancers in the human genome. We applied this strategy to microarray-based gene expression profiles from 79 human tissues and identified 7,187 candidate enhancers that defined their flanking gene expression, the majority of which were located outside of known promoters. We cross-validated this method for its ability to de novo predict tissue-specific gene expression and confirmed its reliability in 57 of the 79 available human tissues, with an average precision in enhancer recognition ranging from 32 percent to 63 percent, and a sensitivity of 47 percent. We used the sequence signatures identified by this approach to assign tissue-specific predictions to ~328,000 human-mouse conserved noncoding elements in the human genome. By overlapping these genome-wide predictions with a large in vivo dataset of enhancers validated in transgenic mice, we confirmed our results with a 28 percent sensitivity and 50 percent precision. These results indicate the power of combining complementary genomic datasets as an initial computational foray into the global view of tissue-specific gene regulation in vertebrates.

  8. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    Science.gov (United States)

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813

  9. Genomic prediction of traits related to canine hip dysplasia

    Directory of Open Access Journals (Sweden)

    Enrique Sanchez-Molano

    2015-03-01

    Increased concern for the welfare of pedigree dogs has led to the development of selection programs against inherited diseases. An example is canine hip dysplasia (CHD), which has a moderate heritability and a high prevalence in some large-sized breeds. To date, selection using phenotypes has led to only modest improvement, and alternative strategies such as genomic selection may prove more effective. The primary aims of this study were to compare the performance of pedigree- and genomic-based breeding against CHD in the UK Labrador retriever population and to evaluate the performance of different genomic selection methods. A sample of 1179 Labrador retrievers evaluated for CHD according to the UK scoring method (hip score, HS) was genotyped with the Illumina CanineHD BeadChip. Twelve functions of HS and its component traits were analyzed using different statistical methods (GBLUP, Bayes C and Single-Step methods), and results were compared with a pedigree-based approach (BLUP) using cross-validation. Genomic methods resulted in similar or higher accuracies than pedigree-based methods with training sets of 944 individuals for all but the untransformed HS, suggesting that genomic selection is an effective strategy. GBLUP and Bayes C gave similar prediction accuracies for HS and related traits, indicating a polygenic architecture. This conclusion was also supported by the low accuracies obtained in additional GBLUP analyses performed using only the SNPs with highest test statistics, also indicating that marker-assisted selection would not be as effective as genomic selection. A Single-Step method that combines genomic and pedigree information also showed higher accuracy than GBLUP and Bayes C for the log-transformed HS, which is currently used for pedigree-based evaluations in the UK. 
In conclusion, genomic selection is a promising alternative to pedigree-based selection against CHD, requiring more phenotypes with genomic data to improve further the accuracy

  10. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. An association method based on family-based breeding systems was developed, and genomic heritabilities, genomic prediction accuracies and effects of some key factors wer...... explored. Results show that low sequencing depth caused underestimation of allele substitution effects in GWAS and overestimation of genomic heritability in prediction studies. Other factors such as SNP marker density, population structure and size of training population influenced accuracy of genomic...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  11. CpGislandEVO: A Database and Genome Browser for Comparative Evolutionary Genomics of CpG Islands

    Directory of Open Access Journals (Sweden)

    Guillermo Barturen

    2013-01-01

    Hypomethylated, CpG-rich DNA segments (CpG islands, CGIs) are epigenome markers involved in key biological processes. Aberrant methylation is implicated in the appearance of several disorders such as cancer, immunodeficiency, or centromere instability. Furthermore, methylation differences at promoter regions between human and chimpanzee strongly associate with genes involved in neurological/psychological disorders and cancers. Therefore, evolutionary comparative analyses of CGIs can provide insights into the functional role of these epigenome markers in both health and disease. Given the lack of specific tools, we developed CpGislandEVO. Briefly, we first compiled a database of statistically significant CGIs for the best assembled mammalian genome sequences available to date. Second, by means of a coupled browser front-end, we focused on the CGIs overlapping orthologous genes extracted from OrthoDB, thus ensuring the comparison between CGIs located on truly homologous genome segments. This allows comparing the main compositional features between homologous CGIs. Finally, to facilitate nucleotide comparisons, we lifted genome coordinates between assemblies from different species, which enables the analysis of sequence divergence by direct count of nucleotide substitutions and indels occurring between homologous CGIs. The resulting CpGislandEVO database, linking together CGIs and single-cytosine DNA methylation data from several mammalian species, is freely available at our website.

  12. Why close a bacterial genome? The plasmid of Alteromonas macleodii HOT1A3 is a vector for inter-specific transfer of a flexible genomic island

    Directory of Open Access Journals (Sweden)

    Eduard Fadeev

    2016-03-01

    Full Text Available Genome sequencing is rapidly becoming a staple technique in environmental and clinical microbiology, yet computational challenges still remain, leading to many draft genomes which are typically fragmented into many contigs. We sequenced and completely assembled the genome of a marine heterotrophic bacterium, Alteromonas macleodii HOT1A3, and compared its full genome to several draft genomes obtained using different reference-based and de-novo methods. In general, the de-novo assemblies clearly outperformed the reference-based or hybrid ones, covering >99% of the genes and representing essentially all of the gene functions. However, only the fully closed genome (~4.5 Mbp) allowed us to identify the presence of a large, 148 kbp plasmid, pAM1A3. While HOT1A3 belongs to Alteromonas macleodii, typically found in surface waters (surface ecotype), this plasmid consists of an almost complete flexible genomic island, containing many genes involved in metal resistance previously identified in the genomes of Alteromonas mediterranea (deep ecotype). Indeed, similar to A. mediterranea, A. macleodii HOT1A3 grows at concentrations of zinc, mercury and copper that are inhibitory for other A. macleodii strains. The presence of a plasmid encoding almost an entire flexible genomic island suggests that wholesale genomic exchange between heterotrophic marine bacteria belonging to related but ecologically different populations is not uncommon.

  13. Genomic prediction based on data from three layer lines: a comparison between linear methods

    NARCIS (Netherlands)

    Calus, M.P.L.; Huang, H.; Vereijken, J.; Visscher, J.; Napel, ten J.; Windig, J.J.

    2014-01-01

    Background The prediction accuracy of several linear genomic prediction models, which have previously been used for within-line genomic prediction, was evaluated for multi-line genomic prediction. Methods Compared to a conventional BLUP (best linear unbiased prediction) model using pedigree data, we

  14. Genomic prediction of reproduction traits for Merino sheep.

    Science.gov (United States)

    Bolormaa, S; Brown, D J; Swan, A A; van der Werf, J H J; Hayes, B J; Daetwyler, H D

    2017-06-01

    Economically important reproduction traits in sheep, such as number of lambs weaned and litter size, are expressed only in females and later in life after most selection decisions are made, which makes them ideal candidates for genomic selection. Accurate genomic predictions would lead to greater genetic gain for these traits by enabling accurate selection of young rams with high genetic merit. The aim of this study was to design and evaluate the accuracy of a genomic prediction method for female reproduction in sheep using daughter trait deviations (DTD) for sires and ewe phenotypes (when individual ewes were genotyped) for three reproduction traits: number of lambs born (NLB), litter size (LSIZE) and number of lambs weaned. Genomic best linear unbiased prediction (GBLUP), BayesR and pedigree BLUP analyses of the three reproduction traits measured on 5340 sheep (4503 ewes and 837 sires) with real and imputed genotypes for 510 174 SNPs were performed. The prediction of breeding values using both sire and ewe trait records was validated in Merino sheep. Prediction accuracy was evaluated by across sire family and random cross-validations. Accuracies of genomic estimated breeding values (GEBVs) were assessed as the mean Pearson correlation adjusted by the accuracy of the input phenotypes. The addition of sire DTD into the prediction analysis resulted in higher accuracies compared with using only ewe records in genomic predictions or pedigree BLUP. Using GBLUP, the average accuracy based on the combined records (ewes and sire DTD) was 0.43 across traits, but the accuracies varied by trait and type of cross-validations. The accuracies of GEBVs from random cross-validations (range 0.17-0.61) were higher than were those from sire family cross-validations (range 0.00-0.51). The GEBV accuracies of 0.41-0.54 for NLB and LSIZE based on the combined records were amongst the highest in the study. Although BayesR was not significantly different from GBLUP in prediction accuracy

  15. Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models

    Science.gov (United States)

    Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A.; Burgueño, Juan; Pérez-Rodríguez, Paulino; de los Campos, Gustavo

    2016-01-01

    The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always had higher prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. PMID:27793970
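A minimal sketch of the covariance structure in the first model, assuming genetic effects u ordered environment by environment: Cov(u) is proportional to E ⊗ K, where E is the between-environment genetic correlation matrix and K is a genomic kernel. The linear kernel below is K = XX'/p (GBLUP-style, with markers assumed already centred and standardised); the Gaussian kernel uses exp(-h·d²/q) with q the median off-diagonal squared distance and h = 1, a common scaling that is assumed here rather than taken from the paper. Only kernel construction is shown, not the Bayesian model fitting or the extra component f.

```python
import math
from statistics import median


def linear_kernel(X):
    """GBLUP-style linear kernel X X^T / p (X: n individuals x p markers)."""
    p = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / p for xj in X] for xi in X]


def gaussian_kernel(X, h=1.0):
    """Gaussian kernel exp(-h * d2 / q), q = median off-diagonal squared distance.

    The bandwidth h and the median scaling are assumptions for illustration.
    """
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
          for i in range(n)]
    q = median(d2[i][j] for i in range(n) for j in range(n) if i != j)
    if q == 0:
        raise ValueError("all marker profiles are identical")
    return [[math.exp(-h * d2[i][j] / q) for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product A (x) B of two square matrices given as nested lists."""
    m, n = len(A), len(B)
    return [[A[i // n][j // n] * B[i % n][j % n] for j in range(m * n)]
            for i in range(m * n)]
```

For example, with three genotypes and two environments, `kron([[1.0, 0.5], [0.5, 1.0]], linear_kernel(X))` yields a 6 × 6 matrix whose off-diagonal 3 × 3 blocks are the genomic kernel scaled by the 0.5 environment correlation.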

  16. Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models

    Directory of Open Access Journals (Sweden)

    Jaime Cuevas

    2017-01-01

    Full Text Available The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always had higher prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u.

  17. Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models.

    Science.gov (United States)

    Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A; Burgueño, Juan; Pérez-Rodríguez, Paulino; de Los Campos, Gustavo

    2017-01-05

    The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance-covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always had higher prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. Copyright © 2017 Cuevas et al.

  18. Genomics and the prediction of xenobiotic toxicity

    International Nuclear Information System (INIS)

    Meyer, Urs-A.; Gut, Josef

    2002-01-01

    The systematic identification and functional analysis of human genes is revolutionizing the study of disease processes and the development and rational use of drugs. It increasingly enables medicine to make reliable assessments of the individual risk to acquire a particular disease, raises the number and specificity of drug targets and explains interindividual variation of the effectiveness and toxicity of drugs. Mutant alleles at a single gene locus for more than 20 drug metabolizing enzymes are some of the best studied individual risk factors for adverse drug reactions and xenobiotic toxicity. Increasingly, genetic polymorphisms of transporter and receptor systems are also recognized as causing interindividual variation in drug response and drug toxicity. However, pharmacogenetic and toxicogenetic factors rarely act alone; they produce a phenotype in concert with other variant genes and with environmental factors. Environmental factors may affect gene expression in many ways. For instance, numerous drugs induce their own metabolism and that of other xenobiotics by interacting with nuclear receptors such as AhR, PPAR, PXR and CAR. Genomics is providing the information and technology to analyze these complex situations to obtain individual genotypic and gene expression information to assess the risk of toxicity

  19. Genomic predictions across Nordic Holstein and Nordic Red using the genomic best linear unbiased prediction model with different genomic relationship matrices.

    Science.gov (United States)

    Zhou, L; Lund, M S; Wang, Y; Su, G

    2014-08-01

    This study investigated genomic predictions across Nordic Holstein and Nordic Red using various genomic relationship matrices. Different sources of information, such as consistencies of linkage disequilibrium (LD) phase and marker effects, were used to construct the genomic relationship matrices (G-matrices) across these two breeds. A single-trait genomic best linear unbiased prediction (GBLUP) model and a two-trait GBLUP model were used for single-breed and two-breed genomic predictions. The data included 5215 Nordic Holstein bulls and 4361 Nordic Red bulls, the latter comprising three populations: Danish Red, Swedish Red and Finnish Ayrshire. The bulls were genotyped with a 50 000-SNP chip. Using the two-breed predictions with a joint Nordic Holstein and Nordic Red reference population, accuracies increased slightly for all traits in Nordic Red, but only for some traits in Nordic Holstein. Among the three subpopulations of Nordic Red, accuracies increased more for Danish Red than for Swedish Red and Finnish Ayrshire. This is because closer genetic relationships exist between Danish Red and Nordic Holstein. Within Danish Red, individuals with higher genomic relationship coefficients with Nordic Holstein showed larger increases in accuracy in the two-breed predictions. Weighting the two-breed G-matrices by LD phase consistencies, marker effects or both did not further improve accuracies of the two-breed predictions. © 2014 Blackwell Verlag GmbH.
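The abstract does not define the baseline G-matrix. One widely used construction is VanRaden's first method, G = ZZ'/(2Σp_j(1 − p_j)), where Z is the 0/1/2 genotype matrix centred by twice the allele frequencies; the sketch below assumes that construction and omits the LD-phase or marker-effect weighting the study tested. Stacking genotypes of both breeds into one matrix gives a joint G whose off-diagonal block holds the across-breed relationships.

```python
def vanraden_g(M, p=None):
    """Genomic relationship matrix G = Z Z^T / (2 * sum_j p_j (1 - p_j)).

    M: genotypes coded 0/1/2 (n animals x m SNPs).
    p: allele frequencies; estimated from M itself when not supplied
       (an assumption; a breed-specific or base-population p is also possible).
    """
    n, m = len(M), len(M[0])
    if p is None:
        p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]
    # Centre each genotype by twice its allele frequency
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]
    k = 2 * sum(pj * (1 - pj) for pj in p)
    if k == 0:
        raise ValueError("all SNPs are monomorphic")
    return [[sum(a * b for a, b in zip(zi, zk)) / k for zk in Z] for zi in Z]
```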

  20. Using Genetic Distance to Infer the Accuracy of Genomic Prediction.

    Directory of Open Access Journals (Sweden)

    Marco Scutari

    2016-09-01

    Full Text Available The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest; and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.
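To make the two quantities in this finding concrete, the sketch below computes an F_ST between training and target populations from per-SNP allele frequencies and fits the ordinary least-squares line of prediction accuracy on that distance. The F_ST used here is a Nei-style (H_T − H_S)/H_T ratio of sums across loci with equal population weights, an assumption: the abstract does not say which estimator was used, and the accuracy values would come from the user's own cross-validation.

```python
def fst_nei(p1, p2):
    """Nei-style F_ST for biallelic SNPs, combined as a ratio of sums over loci.

    p1, p2: allele frequencies of the same SNPs in the two populations
    (equal population weights assumed).
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        pbar = (a + b) / 2
        ht = 2 * pbar * (1 - pbar)                     # total expected heterozygosity
        hs = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2   # mean within-population heterozygosity
        num += ht - hs
        den += ht
    return num / den if den > 0 else 0.0


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx
```

A negative slope from `linear_fit(fst_values, accuracies)` would quantify how fast accuracy decays per unit of genetic distance.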

  1. Molecular characteristics of Salmonella genomic island 1 in Proteus mirabilis isolates from poultry farms in China.

    Science.gov (United States)

    Lei, Chang-Wei; Zhang, An-Yun; Liu, Bi-Hui; Wang, Hong-Ning; Guan, Zhong-Bin; Xu, Chang-Wen; Xia, Qing-Qing; Cheng, Han; Zhang, Dong-Dong

    2014-12-01

    Six out of the 64 studied Proteus mirabilis isolates from 11 poultry farms in China contained Salmonella genomic island 1 (SGI1). PCR mapping showed that the complete nucleotide sequences of SGI1s ranged from 33.2 to 42.5 kb. Three novel variants, SGI1-W, SGI1-X, and SGI1-Y, have been characterized. Resistance genes lnuF, dfrA25, and qnrB2 were identified in SGI1 for the first time. Copyright © 2014, American Society for Microbiology. All Rights Reserved.

  2. Tracing common origins of Genomic Islands in prokaryotes based on genome signature analyses.

    Science.gov (United States)

    van Passel, Mark Wj

    2011-09-01

    Horizontal gene transfer constitutes a powerful and innovative force in evolution, but often little is known about the actual origins of transferred genes. Sequence alignments are generally of limited use in tracking the original donor, since only a small fraction of the total genetic diversity is thought to have been uncovered so far. Alternatively, approaches based on similarities in the genome-specific relative oligonucleotide frequencies do not require alignments. Even though the exact origins of horizontally transferred genes may still not be established using these compositional analyses, they do suggest that compositionally very similar regions are likely to have had a common origin. These analyses have shown that up to a third of large acquired gene clusters that reside in the same genome are compositionally very similar, indicative of a shared origin. This brings us closer to uncovering the original donors of horizontally transferred genes, and could help in elucidating possible regulatory interactions between previously unlinked sequences.
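Alignment-free genome signatures of this kind are often built from Karlin's dinucleotide relative abundance, ρ_XY = f_XY/(f_X f_Y), computed on a sequence plus its reverse complement, with the mean absolute difference over the 16 dinucleotides (δ*) as the distance. The sketch below implements that measure as one concrete example; the abstract speaks more generally of relative oligonucleotide frequencies, so the choice of dinucleotides, the function names and the zero-frequency handling are assumptions.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def relative_abundance(seq):
    """Symmetrised dinucleotide relative abundance rho_XY = f_XY / (f_X * f_Y)."""
    s = seq.upper()
    strands = [s, s.translate(COMPLEMENT)[::-1]]  # sequence and reverse complement
    mono = {b: 0 for b in "ACGT"}
    di = {x + y: 0 for x, y in product("ACGT", repeat=2)}
    for strand in strands:
        for b in strand:
            if b in mono:          # non-ACGT symbols (e.g. N) are skipped
                mono[b] += 1
        for i in range(len(strand) - 1):
            d = strand[i:i + 2]
            if d in di:
                di[d] += 1
    n1, n2 = sum(mono.values()), sum(di.values())
    if n2 == 0:
        raise ValueError("sequence too short or contains no ACGT dinucleotides")
    rho = {}
    for d in di:
        expected = (mono[d[0]] / n1) * (mono[d[1]] / n1)
        # Assumption: a dinucleotide whose bases are absent gets rho = 0
        rho[d] = (di[d] / n2) / expected if expected > 0 else 0.0
    return rho


def delta_distance(a, b):
    """Mean absolute difference in rho over the 16 dinucleotides (delta*)."""
    ra, rb = relative_abundance(a), relative_abundance(b)
    return sum(abs(ra[d] - rb[d]) for d in ra) / len(ra)
```

Small δ* values between two islands in the same genome would be read, as in the abstract, as a hint of a shared origin rather than proof of it.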

  3. Mobilisation and remobilisation of a large archetypal pathogenicity island of uropathogenic Escherichia coli in vitro support the role of conjugation for horizontal transfer of genomic islands

    Directory of Open Access Journals (Sweden)

    Hochhut Bianca

    2011-09-01

    Full Text Available Abstract Background A substantial amount of data has been accumulated supporting the important role of genomic islands (GEIs), including pathogenicity islands (PAIs), in bacterial genome plasticity and the evolution of bacterial pathogens. Their instability and the high level of sequence similarity of different (partial) islands suggest an exchange of PAIs between strains of the same or even different bacterial species by horizontal gene transfer (HGT). Transfer events of archetypal large genomic islands of enterobacteria, which often lack genes required for mobilisation or transfer, have rarely been investigated so far. Results To study mobilisation of such large genomic regions in the prototypic uropathogenic E. coli (UPEC) strain 536, PAI II536 was supplemented with the mobRP4 region, an origin of replication (oriVR6K), an origin of transfer (oriTRP4) and a chloramphenicol resistance selection marker. In the presence of helper plasmid RP4, conjugative transfer of the 107-kb PAI II536 construct occurred from strain 536 into an E. coli K-12 recipient. In transconjugants, PAI II536 existed either as a cytoplasmic circular intermediate (CI) or integrated site-specifically into the recipient's chromosome at the leuX tRNA gene. This locus is the chromosomal integration site of PAI II536 in UPEC strain 536. From the E. coli K-12 recipient, the chromosomal PAI II536 construct as well as the CIs could be successfully remobilised and inserted into leuX in a PAI II536 deletion mutant of E. coli 536. Conclusions Our results corroborate that mobilisation and conjugal transfer may contribute to the evolution of bacterial pathogens through horizontal transfer of large chromosomal regions such as PAIs. Stabilisation of these mobile genetic elements in the bacterial chromosome results from selective loss of the mobilisation and transfer functions of genomic islands.

  4. Genomic Prediction of Testcross Performance in Canola (Brassica napus)

    Science.gov (United States)

    Jan, Habib U.; Abbadi, Amine; Lücke, Sophie; Nichols, Richard A.; Snowdon, Rod J.

    2016-01-01

    Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, genomic selection methods were applied to predict testcross performance for various agronomic traits in hybrid canola breeding, based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits including seedling emergence, days to flowering, lodging, oil yield and seed yield along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population as well as within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81) followed by oil yield (0.75) and lowest for seedling emergence (0.29). For seed yield, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately; a strategy which only led to moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. Testcross performance prediction using genome-wide SNP markers shows considerable
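RR-BLUP is ridge regression of the phenotype on all marker covariates at once. The sketch below, under stated assumptions, fits (X_c'X_c + λI)β = X_c'(y − ȳ) with training-set-centred markers and scores accuracy as the Pearson correlation between out-of-fold predictions and observations in k-fold cross-validation. λ (in RR-BLUP usually the ratio of residual to marker-effect variance) is supplied by the user, and the pooled-correlation accuracy is an assumption, since the abstract does not say how accuracy was computed across its 500 cross-validations. Repeating `cv_accuracy` with different seeds gives repeated cross-validation.

```python
import random
from statistics import mean


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def rrblup_fit(X, y, lam):
    """Ridge regression on centred markers: (Xc'Xc + lam*I) beta = Xc'(y - ybar)."""
    p = len(X[0])
    ybar = mean(y)
    xbar = [mean(row[j] for row in X) for j in range(p)]
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    lhs = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * (yi - ybar) for r, yi in zip(Xc, y)) for i in range(p)]
    return ybar, xbar, solve(lhs, rhs)


def predict(model, x):
    """Predicted value for one marker profile x."""
    ybar, xbar, beta = model
    return ybar + sum(b * (xj - mj) for b, xj, mj in zip(beta, x, xbar))


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def cv_accuracy(X, y, lam, k=5, seed=0):
    """k-fold CV: fit on k-1 folds, predict the held-out fold, then correlate
    all out-of-fold predictions with the observed values."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        model = rrblup_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            preds[i] = predict(model, X[i])
    return pearson(preds, y)
```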

  5. Genetics of Genome-Wide Recombination Rate Evolution in Mice from an Isolated Island.

    Science.gov (United States)

    Wang, Richard J; Payseur, Bret A

    2017-08-01

    Recombination rate is a heritable quantitative trait that evolves despite the fundamentally conserved role that recombination plays in meiosis. Differences in recombination rate can alter the landscape of the genome and the genetic diversity of populations. Yet our understanding of the genetic basis of recombination rate evolution in nature remains limited. We used wild house mice (Mus musculus domesticus) from Gough Island (GI), which diverged recently from their mainland counterparts, to characterize the genetics of recombination rate evolution. We quantified genome-wide autosomal recombination rates by immunofluorescence cytology in spermatocytes from 240 F2 males generated from intercrosses between GI-derived mice and the wild-derived inbred strain WSB/EiJ. We identified four quantitative trait loci (QTL) responsible for inter-F2 variation in this trait, the strongest of which had effects that opposed the direction of the parental trait differences. Candidate genes and mutations for these QTL were identified by overlapping the detected intervals with whole-genome sequencing data and publicly available transcriptomic profiles from spermatocytes. Combined with existing studies, our findings suggest that genome-wide recombination rate divergence is not directional and its evolution within and between subspecies proceeds from distinct genetic loci. Copyright © 2017 by the Genetics Society of America.

  6. Description of genomic islands associated to the multidrug-resistant Pseudomonas aeruginosa clone ST277.

    Science.gov (United States)

    Silveira, Melise Chaves; Albano, Rodolpho Mattos; Asensi, Marise Dutra; Carvalho-Assef, Ana Paula D'Alincourt

    2016-08-01

    Multidrug-resistant Pseudomonas aeruginosa clone ST277 is disseminated in Brazil where it is mainly associated with the presence of metallo-β-lactamase SPM-1. Furthermore, it carries the class I integron In163 and a 16S rRNA methylase rmtD that confers aminoglycoside resistance. To analyze the genetic characteristics that might be responsible for the success of this endemic clone, genomes of four P. aeruginosa strains that were isolated in distinct years and in different Brazilian states were sequenced. The strains differed regarding the presence of the genes blaSPM-1 and rmtD. Genomic comparisons that included genomes of other clones of this species that have spread worldwide were also performed. These analyses revealed a 763,863 bp region in the P. aeruginosa chromosome that concentrates acquired genetic structures comprising two new genomic islands (PAGI-13 and PAGI-14), a mobile element that could be used for ST277 fingerprinting and a recently reported Integrative and Conjugative Element (ICE) associated with blaSPM-1. The genetic elements rmtD and In163 are inserted in PAGI-13 while PAGI-14 has genes encoding proteins related to type III restriction system and phages. The data reported in this study provide a basis for a clearer understanding of the genetic content of clone ST277 and illustrate the mechanisms that are responsible for the success of these endemic clones. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. An assessment on epitope prediction methods for protozoa genomes

    Directory of Open Access Journals (Sweden)

    Resende Daniela M

    2012-11-01

    Full Text Available Abstract Background Epitope prediction using computational methods represents one of the most promising approaches to vaccine development. Reductions in time and cost, together with the availability of completely sequenced genomes, are key motivations for the use of reverse vaccinology. Parasites of the genus Leishmania are widespread and are the etiologic agents of leishmaniasis. Currently, there is no efficient vaccine against this pathogen, and drug treatment is highly toxic. The lack of sufficiently large datasets of experimentally validated parasite epitopes represents a serious limitation, especially for trypanosomatid genomes. In this work we highlight the predictive performances of several algorithms that were evaluated through the development of a MySQL database built with the purpose of: (a) evaluating individual algorithms' prediction performances and their combination for CD8+ T cell epitopes, B-cell epitopes and subcellular localization by means of AUC (Area Under the Curve) performance and a threshold-dependent method that employs a confusion matrix; (b) integrating data from experimentally validated and in silico predicted epitopes; and (c) integrating the subcellular localization predictions and experimental data. NetCTL, NetMHC, BepiPred, BCPred12, and AAP12 algorithms were used for in silico epitope prediction and WoLF PSORT, Sigcleave and TargetP for in silico subcellular localization prediction against trypanosomatid genomes. Results A database-driven epitope prediction method was developed with built-in functions that were capable of: (a) removing experimental data redundancy; (b) parsing algorithm predictions and storing experimentally validated and predicted data; and (c) evaluating algorithm performances. Results show that a better performance is achieved when the combined prediction is considered. This is particularly true for B cell epitope predictors, where the combined prediction of AAP12 and BCPred12 reached an AUC value

  8. [Prediction in medicine--genome contra envirome].

    Science.gov (United States)

    Brdicka, Radim

    2012-01-01

    Human phenotype is governed by its genotype, a set of genetic information materialized in DNA. In traditional terminology, we speak of a little more than 20,000 genes, which differ in the strength with which they are realized and whose effects are modified by a large number of other genes. The result originates from firmly established programmes we obtained from our ancestors. The development and activity of the molecules selected for maintaining, copying and transferring information, i.e. nucleic acids, can be traced back to the very origin of life. Nevertheless, the final result arises not only from the interplay of the original information with other genetic information but also, to a large extent, from external influences, i.e. the environment. Though we are relatively successful in understanding what we have inherited from our parents, our knowledge of environmental factors and their effects on formation of the phenotype is still limited. From this point of view, medical prediction must always be very cautious, and interpretations at the probability level must be made by a very experienced and responsible professional.

  9. The effect of genealogy-based haplotypes on genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid; Fernando, Rohan L.; Su, Guosheng

    2013-01-01

    on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using ... local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method, and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, ... i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some...

  10. SNP-associations and phenotype predictions from hundreds of microbial genomes without genome alignments.

    Science.gov (United States)

    Hall, Barry G

    2014-01-01

    SNP-association studies are a starting point for identifying genes that may be responsible for specific phenotypes, such as disease traits. The vast bulk of tools for SNP-association studies are directed toward SNPs in the human genome, and I am unaware of any tools designed specifically for such studies in bacterial or viral genomes. The PPFS (Predict Phenotypes From SNPs) package described here is an add-on to kSNP, a program that can identify SNPs in a data set of hundreds of microbial genomes. PPFS identifies those SNPs that are non-randomly associated with a phenotype based on the χ² probability, then uses those diagnostic SNPs for two distinct, but related, purposes: (1) to predict the phenotypes of strains whose phenotypes are unknown, and (2) to identify those diagnostic SNPs that are most likely to be causally related to the phenotype. In the example illustrated here, from a set of 68 E. coli genomes, for 67 of which the pathogenicity phenotype was known, there were 418,500 SNPs. Using the phenotypes of 36 of those strains, PPFS identified 207 diagnostic SNPs. The diagnostic SNPs predicted the phenotypes of all of the genomes with 97% accuracy. It then identified 97 SNPs whose probability of being causally related to the pathogenic phenotype was >0.999. In a second example, from a set of 116 E. coli genome sequences, using the phenotypes of 65 strains PPFS identified 101 SNPs that predicted the source host (human or non-human) with 90% accuracy.
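The association step described above can be illustrated with a Pearson χ² test on a 2 × 2 table of allele state by phenotype class. This is a generic sketch assuming biallelic SNPs and a binary phenotype; PPFS's own handling of multi-allelic SNPs, continuity corrections and thresholds is not described here, and `snp_phenotype_chi2` is a hypothetical name. With one degree of freedom the upper-tail probability is erfc(√(χ²/2)).

```python
import math
from collections import Counter


def snp_phenotype_chi2(alleles, phenotypes):
    """Pearson chi-square test of a biallelic SNP against a binary phenotype.

    alleles[i] and phenotypes[i] describe strain i. Returns (chi2, p_value).
    """
    if len(alleles) != len(phenotypes):
        raise ValueError("length mismatch")
    a_levels = sorted(set(alleles))
    p_levels = sorted(set(phenotypes))
    if len(a_levels) != 2 or len(p_levels) != 2:
        raise ValueError("need exactly two allele states and two phenotype classes")
    counts = Counter(zip(alleles, phenotypes))
    n = len(alleles)
    row = {a: sum(counts[(a, p)] for p in p_levels) for a in a_levels}
    col = {p: sum(counts[(a, p)] for a in a_levels) for p in p_levels}
    chi2 = 0.0
    for a in a_levels:
        for p in p_levels:
            expected = row[a] * col[p] / n   # > 0 because every level occurs
            chi2 += (counts[(a, p)] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, five strains with allele A that are all pathogenic and five with allele G that are all commensal give χ² = 10.0 and p ≈ 0.0016.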

  11. The New Macrolide-Lincosamide-Streptogramin B Resistance Gene erm(45) Is Located within a Genomic Island in Staphylococcus fleurettii

    DEFF Research Database (Denmark)

    Wipf, Juliette R K; Schwendener, Sybille; Nielsen, Jesper Boye

    2015-01-01

    Genome alignment of a macrolide, lincosamide, and streptogramin B (MLSB)-resistant Staphylococcus fleurettii strain with an MLSB-susceptible S. fleurettii strain revealed a novel 11,513-bp genomic island carrying the new erythromycin resistance methylase gene erm(45). This gene was shown to confer ... inducible MLSB resistance when cloned into Staphylococcus aureus. The erm(45)-containing island was integrated into the housekeeping gene guaA in S. fleurettii and was able to form a circular intermediate but was not transmissible to S. aureus...

  12. A Bayesian Network to Predict Barrier Island Geomorphologic Characteristics

    Science.gov (United States)

    Gutierrez, B.; Plant, N. G.; Thieler, E. R.; Turecek, A.; Stippa, S.

    2014-12-01

    Understanding how barrier islands along the Atlantic and Gulf coasts of the United States respond to storms and sea-level rise is an important management concern. Although these threats are well recognized, quantifying the integrated vulnerability is challenging due to the range of time and space scales over which these processes act. Developing datasets and methods to identify the physical vulnerabilities of coastal environments due to storms and sea-level rise thus is an important scientific focus that supports land management decision making. Here we employ a Bayesian Network (BN) to model the interactions between geomorphic variables sampled from existing datasets that capture both storm- and sea-level-rise-related coastal evolution. The BN provides a means of estimating probabilities of changes in specific geomorphic characteristics such as foredune crest height, beach width and beach height, given knowledge of barrier island width, maximum barrier island elevation, distance from an inlet, the presence of anthropogenic modifications, and long-term shoreline change rates, which we assume to be directly related to sea-level rise. We evaluate BN skill and explore how different constraints, such as shoreline change characteristics (eroding, stable, accreting), distance to nearby inlets and island width, affect the probability distributions of future morphological characteristics. Our work demonstrates that a skillful BN can be constructed and that factors such as distance to inlet, shoreline change rate, and the presence of human alterations have the strongest influences on network performance. For Assateague Island, Maryland/Virginia, USA, we find that different shoreline change behaviors affect the probabilities of specific geomorphic characteristics, such as dune height, which allows us to identify locations on the barrier island where habitat or infrastructure may be vulnerable to storms and sea-level rise.

  13. Genome-wide CpG island methylation analysis implicates novel genes in the pathogenesis of renal cell carcinoma

    OpenAIRE

    Ricketts, Christopher J.; Morris, Mark R.; Gentle, Dean; Brown, Michael; Wake, Naomi; Woodward, Emma R.; Clarke, Noel; Latif, Farida; Maher, Eamonn R.

    2012-01-01

    In order to identify novel candidate tumor suppressor genes (TSGs) implicated in renal cell carcinoma (RCC), we performed genome-wide methylation profiling of RCC using the HumanMethylation27 BeadChips to assess methylation at >14,000 genes. Two hundred and twenty hypermethylated probes representing 205 loci/genes were identified in genomic CpG islands. A subset of TSGs investigated in detail exhibited frequent tumor methylation, promoter methylation associated transcriptional silencing an...

  14. Genomic prediction for tuberculosis resistance in dairy cattle.

    Directory of Open Access Journals (Sweden)

    Smaragda Tsairidou

    The increasing prevalence of bovine tuberculosis (bTB) in the UK and the limitations of the currently available diagnostic and control methods require the development of complementary approaches to assist in the sustainable control of the disease. One potential approach is the identification of animals that are genetically more resistant to bTB, to enable breeding of animals with enhanced resistance. This paper focuses on prediction of resistance to bTB. We explore estimation of direct genomic estimated breeding values (DGVs) for bTB resistance in UK dairy cattle, using dense SNP chip data, and test these genomic predictions for situations when disease phenotypes are not available on selection candidates. We estimated DGVs using genomic best linear unbiased prediction methodology, and assessed their predictive accuracies with a cross-validation procedure and receiver operating characteristic (ROC) curves. Furthermore, these results were compared with theoretical expectations for prediction accuracy and area under the ROC curve (AUC). The dataset comprised 1151 Holstein-Friesian cows (bTB cases or controls). All individuals (592 cases and 559 controls) were genotyped for 727,252 loci (Illumina BeadChip). The estimated observed heritability of bTB resistance was 0.23±0.06 (0.34 on the liability scale), and five-fold cross-validation, replicated six times, provided a prediction accuracy of 0.33 (95% C.I.: 0.26, 0.40). ROC curves, and the resulting AUC, gave a probability of 0.58, averaged across six replicates, of correctly classifying cows as diseased or as healthy based on SNP chip genotype alone using these data. These results provide a first step in the investigation of the potential feasibility of genomic selection for bTB resistance using SNP data. Specifically, they demonstrate that genomic selection is possible, even in populations with no pedigree data and on animals lacking bTB phenotypes. However, a larger training population will be required to

  15. A human genome-wide library of local phylogeny predictions for whole-genome inference problems

    Directory of Open Access Journals (Sweden)

    Schwartz Russell

    2008-08-01

    Background: Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results: In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion: Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation.
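
    The 'imperfection' statistic can be sketched as follows, assuming it equals the parsimony length of a fixed tree minus the number of variable sites, i.e. the number of mutations beyond the one per site that a perfect phylogeny would need. The code scores a given rooted binary tree with Fitch's small-parsimony algorithm; it does not search for the maximum parsimony tree, and the tree encoding and function names are hypothetical.

```python
from typing import Sequence, Union

# Hypothetical encoding: a leaf is a haplotype index, an internal node is (left, right).
Tree = Union[int, tuple]


def fitch_site(tree: Tree, states: Sequence[str]) -> tuple[set, int]:
    """Fitch small-parsimony pass for one site; returns (state set, mutation count)."""
    if isinstance(tree, int):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets force one extra mutation on this branch point.
    return left_set | right_set, left_cost + right_cost + 1


def imperfection(tree: Tree, haplotypes: Sequence[str]) -> int:
    """Parsimony length of the tree minus the number of variable sites (assumed definition)."""
    total_cost = 0
    variable_sites = 0
    for j in range(len(haplotypes[0])):
        column = [h[j] for h in haplotypes]
        if len(set(column)) > 1:
            variable_sites += 1
        total_cost += fitch_site(tree, column)[1]
    return total_cost - variable_sites


print(imperfection(((0, 1), (2, 3)), ["00", "01", "10", "11"]))
```

    The four haplotypes 00, 01, 10 and 11 fail the four-gamete test, so no tree can explain them with one mutation per site; on the tree ((0, 1), (2, 3)) the sketch reports an imperfection of 1.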

  16. Genomic prediction of complex human traits: relatedness, trait architecture and predictive meta-models

    Science.gov (United States)

    Spiliopoulou, Athina; Nagy, Reka; Bermingham, Mairead L.; Huffman, Jennifer E.; Hayward, Caroline; Vitart, Veronique; Rudan, Igor; Campbell, Harry; Wright, Alan F.; Wilson, James F.; Pong-Wong, Ricardo; Agakov, Felix; Navarro, Pau; Haley, Chris S.

    2015-01-01

    We explore the prediction of individuals' phenotypes for complex traits using genomic data. We compare several widely used prediction models, including Ridge Regression, LASSO and Elastic Nets estimated from cohort data, and polygenic risk scores constructed using published summary statistics from genome-wide association meta-analyses (GWAMA). We evaluate the interplay between relatedness, trait architecture and optimal marker density, by predicting height, body mass index (BMI) and high-density lipoprotein level (HDL) in two data cohorts, originating from Croatia and Scotland. We empirically demonstrate that dense models are better when all genetic effects are small (height and BMI) and target individuals are related to the training samples, while sparse models predict better in unrelated individuals and when some effects have moderate size (HDL). For HDL sparse models achieved good across-cohort prediction, performing similarly to the GWAMA risk score and to models trained within the same cohort, which indicates that, for predicting traits with moderately sized effects, large sample sizes and familial structure become less important, though still potentially useful. Finally, we propose a novel ensemble of whole-genome predictors with GWAMA risk scores and demonstrate that the resulting meta-model achieves higher prediction accuracy than either model on its own. We conclude that although current genomic predictors are not accurate enough for diagnostic purposes, performance can be improved without requiring access to large-scale individual-level data. Our methodologically simple meta-model is a means of performing predictive meta-analysis for optimizing genomic predictions and can be easily extended to incorporate multiple population-level summary statistics or other domain knowledge. PMID:25918167
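
    As a rough illustration of the dense end of the spectrum compared above, the sketch below fits ridge regression in closed form to simulated marker data with NumPy. The simulated genotypes, effect sizes and the penalty value `lam` are assumptions chosen for illustration; LASSO and Elastic Net have no closed form and would normally be fitted with an iterative solver such as coordinate descent.

```python
import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge estimate: beta = (X'X + lam * I)^(-1) X'y.

    X is an (individuals x markers) genotype matrix assumed to be column-centred;
    y is a centred phenotype vector.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50)).astype(float)  # simulated 0/1/2 allele dosages
X -= X.mean(axis=0)                                    # centre each marker
true_beta = rng.normal(0.0, 0.1, size=50)             # many small effects ("dense" architecture)
y = X @ true_beta + rng.normal(0.0, 1.0, size=200)
y -= y.mean()

beta_hat = ridge_fit(X, y, lam=10.0)
# In-sample correlation is optimistic; real evaluation would use held-out individuals.
print(np.corrcoef(X @ beta_hat, y)[0, 1])
```

    Larger penalties shrink all marker effects towards zero together, which suits traits where every effect is small; sparse methods instead set many effects exactly to zero.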

  17. Psoriasis prediction from genome-wide SNP profiles

    Directory of Open Access Journals (Sweden)

    Fang Xiangzhong

    2011-01-01

    Background: With the availability of large-scale genome-wide association study (GWAS) data, choosing an optimal set of SNPs for disease susceptibility prediction is a challenging task. This study aimed to use single nucleotide polymorphisms (SNPs) to predict psoriasis by searching GWAS data. Methods: In total, we had 2,798 samples and 451,724 SNPs. The search for a set of SNPs to predict susceptibility to psoriasis consisted of two steps. The first was to search for the top 1,000 SNPs with high accuracy for prediction of psoriasis from the GWAS dataset. The second was to search for an optimal SNP subset for predicting psoriasis. The sequential information bottleneck (sIB) method was compared with classical linear discriminant analysis (LDA) for classification performance. Results: The best test harmonic mean of sensitivity and specificity for predicting psoriasis by sIB was 0.674 (95% CI: 0.650-0.698), while only 0.520 (95% CI: 0.472-0.524) was reported for predicting disease by LDA. Our results indicate that the new classifier sIB performs better than LDA in this study. Conclusions: The fact that a small set of SNPs can predict disease status with an average accuracy of 68% makes it possible to use SNP data for psoriasis prediction.
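
    The evaluation metric used above, the harmonic mean of sensitivity and specificity, can be computed from confusion-matrix counts as in this short sketch; the counts in the example are hypothetical.

```python
def harmonic_mean_sens_spec(tp: int, fn: int, tn: int, fp: int) -> float:
    """Harmonic mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


# Hypothetical counts: 100 cases and 100 controls.
print(round(harmonic_mean_sens_spec(tp=70, fn=30, tn=60, fp=40), 4))  # 0.6462
```

    Unlike plain accuracy, this score drops sharply if either cases or controls are poorly classified.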

  18. Predicting genotypes environmental range from genome-environment associations.

    Science.gov (United States)

    Manel, Stéphanie; Andrello, Marco; Henry, Karine; Verdelet, Daphné; Darracq, Aude; Guerin, Pierre-Edouard; Desprez, Bruno; Devaux, Pierre

    2018-05-17

    Genome-environment association methods aim to detect genetic markers associated with environmental variables. The detected associations are usually analysed separately to identify the genomic regions involved in local adaptation. However, a recent study suggests that single-locus associations can be combined and used in a predictive way to estimate environmental variables for new individuals on the basis of their genotypes. Here, we introduce an original approach to predict the environmental range (values and upper and lower limits) of species genotypes from the genetic markers significantly associated with those environmental variables in an independent set of individuals. We illustrate this approach by predicting aridity in a database consisting of 950 individuals of wild beets and 299 individuals of cultivated beets genotyped at 14,409 random single nucleotide polymorphisms (SNPs). We detected 66 alleles associated with aridity and used them to calculate the fraction (I) of aridity-associated alleles in each individual. The fraction I correctly predicted the values of aridity in an independent validation set of wild individuals and was then used to predict aridity in the 299 cultivated individuals. Wild individuals had higher median values and a wider range of values of aridity than the cultivated individuals, suggesting that wild individuals have a greater ability to resist aridity stress and could be used to improve the resistance of cultivated varieties to aridity. This article is protected by copyright. All rights reserved.
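
    A minimal sketch of the per-individual fraction I, assuming diploid genotypes coded as the number of copies (0, 1 or 2) of the aridity-associated allele at each associated SNP; the exact coding used in the study may differ, and the genotype vector below is made up.

```python
from typing import Sequence


def fraction_associated(genotype: Sequence[int], ploidy: int = 2) -> float:
    """Fraction I of aridity-associated alleles carried by one individual.

    genotype[k] is the number of copies (0..ploidy) of the associated allele at
    associated locus k; diploid coding is assumed by default.
    """
    if not genotype:
        raise ValueError("need at least one associated locus")
    return sum(genotype) / (ploidy * len(genotype))


print(fraction_associated([2, 1, 0, 1]))  # 0.5
```

    Because I lies between 0 and 1, it could then be related to observed aridity in a training set and used to predict aridity for new individuals.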

  19. Genomic prediction in a breeding program of perennial ryegrass

    DEFF Research Database (Denmark)

    Fé, Dario; Ashraf, Bilal; Greve-Pedersen, Morten

    2015-01-01

    We present a genomic selection study performed on 1918 ryegrass families (Lolium perenne L.), which were derived from a commercial breeding program at DLF-Trifolium, Denmark. Phenotypes were recorded on standard plots, across 13 years and in 6 different countries. Variants were identified...... this set. Estimated breeding values (EBVs) and prediction accuracies were calculated through two different cross-validation schemes: (i) k-fold (k=10); (ii) leaving out one parent combination at a time, in order to test the accuracy of predicting new families. Accuracies ranged between 0.56 and 0.97 for scheme (i....... A larger set of 1791 F2s was used as the training set to predict EBVs of 127 synthetic families (originating from poly-crosses between 5-11 single plants) for heading date and crown rust resistance. Prediction accuracies were 0.93 and 0.57, respectively. Results clearly demonstrate considerable potential...
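
    The two cross-validation schemes can be sketched as index splits: random k-fold partitioning, and a grouped scheme in which every family sharing one parent combination is held out together, so the model is always tested on unseen crosses. Function names and the example parent-combination labels are hypothetical.

```python
import random
from collections import defaultdict


def kfold_test_sets(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Scheme (i): shuffle indices 0..n-1 and deal them into k roughly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def leave_one_cross_out(parent_combinations: list[str]) -> list[list[int]]:
    """Scheme (ii): one test fold per parent combination, holding out all of its families."""
    folds: dict[str, list[int]] = defaultdict(list)
    for i, combo in enumerate(parent_combinations):
        folds[combo].append(i)
    return list(folds.values())


def train_indices(n: int, test: list[int]) -> list[int]:
    """Training set = every index not in the test fold."""
    held_out = set(test)
    return [i for i in range(n) if i not in held_out]


crosses = ["AxB", "AxB", "CxD", "AxB", "ExF"]  # hypothetical parent combinations
for test in leave_one_cross_out(crosses):
    print("test", test, "train", train_indices(len(crosses), test))
```

    Scheme (ii) is the stricter test: families from the held-out cross never share both parents with anything in the training set.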

  20. Antibiotic resistance, integrons and Salmonella genomic island 1 among non-typhoidal Salmonella serovars in The Netherlands.

    NARCIS (Netherlands)

    Vo, An T T; Duijkeren, Engeline van; Fluit, Ad C; Wannet, Wim J B; Verbruggen, Anjo J; Maas, Henny M E; Gaastra, Wim

    2006-01-01

    The objective of this study was to investigate the antimicrobial resistance patterns, integron characteristics and gene cassettes as well as the presence of Salmonella genomic island 1 (SGI1) in non-typhoidal Salmonella (NTS) isolates from human and animal origin. Epidemiologically unrelated Dutch

  1. A genomic island provides Acidithiobacillus ferrooxidans ATCC 53993 additional copper resistance: a possible competitive advantage.

    Science.gov (United States)

    Orellana, Luis H; Jerez, Carlos A

    2011-11-01

    There is great interest in understanding how extremophilic biomining bacteria adapt to exceptionally high copper concentrations in their environment. The genome of Acidithiobacillus ferrooxidans ATCC 53993 possesses the same copper resistance determinants as that of strain ATCC 23270. However, the former strain also contains a 160-kb genomic island (GI) that is absent from ATCC 23270. Among other genes, this GI contains several genes coding for an additional putative copper ATPase and a Cus system. A. ferrooxidans ATCC 53993 showed a much higher resistance to CuSO4 (>100 mM) than strain ATCC 23270 (<25 mM). When similar numbers of bacteria from each strain were mixed and allowed to grow in the absence of copper, their respective final numbers remained approximately equal. However, in the presence of copper, strain ATCC 53993 clearly outgrew ATCC 23270. This behavior is most likely explained by the presence of the additional copper-resistance genes in the GI of strain ATCC 53993. qRT-PCR showed that these genes are upregulated when A. ferrooxidans ATCC 53993 is grown in the presence of copper, and they were shown to be functional when expressed in copper-sensitive Escherichia coli mutants. Thus, the difference in copper resistance between two strains of the same acidophilic microorganism could be determined by slight differences in their genomes, which may not only lead to changes in their capacities to adapt to their environment, but may also help in selecting fitter microorganisms for industrial biomining operations. © Springer-Verlag 2011

  2. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    OpenAIRE

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Deng, Zixin; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’....

  3. Antimicrobial resistance, class 1 integrons, and genomic island 1 in Salmonella isolates from Vietnam.

    Directory of Open Access Journals (Sweden)

    An T T Vo

    BACKGROUND: The objective was to investigate the phenotypic and genotypic resistance and the horizontal transfer of resistance determinants from Salmonella isolates from humans and animals in Vietnam. METHODOLOGY/PRINCIPAL FINDINGS: The susceptibility of 297 epidemiologically unrelated non-typhoid Salmonella isolates was investigated by disk diffusion assay. The isolates were screened for the presence of class 1 integrons and Salmonella genomic island 1 by PCR. The potential for the transfer of resistance determinants was investigated by conjugation experiments. Resistance to gentamicin, kanamycin, chloramphenicol, streptomycin, trimethoprim, ampicillin, nalidixic acid, sulphonamides, and tetracycline was found in 13 to 50% of the isolates. Nine distinct integron types were detected in 28% of the isolates belonging to 11 Salmonella serovars including S. Tallahassee. Gene cassettes identified were aadA1, aadA2, aadA5, blaPSE-1, blaOXA-30, dfrA1, dfrA12, dfrA17, and sat, as well as open reading frames with unknown functions. Most integrons were located either on conjugative plasmids, which can transfer their antimicrobial resistance determinants to Escherichia coli or Salmonella Enteritidis, or within Salmonella Genomic Island 1 or its variants. The resistance gene cluster in serovar Emek identified by PCR mapping and nucleotide sequencing contained SGI1-J3, which is integrated in SGI1 at a different position from that in most SGI1 variants. This is the second report of SGI1 insertion at this position. High-level resistance to fluoroquinolones was found in 3 multiresistant S. Typhimurium isolates and was associated with mutations in the gyrA gene leading to the amino acid changes Ser83Phe and Asp87Asn. CONCLUSIONS: Resistance was common among Vietnamese Salmonella isolates from different sources. Legislation to enforce a more prudent use of antibiotics in both human and veterinary medicine should be implemented by the authorities in Vietnam.

  4. Implementation of genomic prediction in Lolium perenne (L.) breeding populations

    Directory of Open Access Journals (Sweden)

    Nastasiya F Grinberg

    2016-02-01

    Perennial ryegrass (Lolium perenne L.) is one of the most widely grown forage grasses in temperate agriculture. In order to maintain and increase its usage as forage in livestock agriculture, there is a continued need for improvement in biomass yield, quality, disease resistance and seed yield. Genetic gain for traits such as biomass yield has been relatively modest. This has been attributed to its long breeding cycle and the necessity to use population-based breeding methods. Thanks to recent advances in genotyping techniques, there is increasing interest in genomic selection (GS), from which genomically estimated breeding values (GEBVs) are derived. In this paper we compare the classical RRBLUP model with state-of-the-art machine learning (ML) techniques that should lend themselves easily to use in GS, and demonstrate their application to predicting quantitative traits in a breeding population of L. perenne. Prediction accuracies varied from 0 to 0.59 depending on trait, prediction model and composition of the training population. The BLUP model produced the highest prediction accuracies for most traits and training populations. Forage quality traits had higher accuracies than yield-related traits. There appeared to be no clear pattern to the effect of the training population composition on the prediction accuracies. The heritability of the forage quality traits was generally higher than that of the yield-related traits, which could partly explain the difference in accuracy. Some population structure was evident in the breeding populations and probably contributed to the varying effects of training population on the predictions. The average linkage disequilibrium (LD) between adjacent markers ranged from 0.121 to 0.215. Higher marker density and a larger training population more closely related to the test population are likely to improve the prediction accuracy.
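
    The average LD between adjacent markers can be approximated as below, assuming LD is measured as the squared Pearson correlation between genotype dosages (0/1/2) at neighbouring markers in map order. This genotype-based r² is a common proxy for haplotype r², not necessarily the estimator used in the paper; monomorphic markers would need to be filtered out first, since their correlation is undefined.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_adjacent_r2(genotypes: list[list[float]]) -> float:
    """genotypes[i][j] = allele dosage of individual i at marker j (markers sorted by position)."""
    n_markers = len(genotypes[0])
    columns = [[row[j] for row in genotypes] for j in range(n_markers)]
    r2 = [pearson_r(columns[j], columns[j + 1]) ** 2 for j in range(n_markers - 1)]
    return sum(r2) / len(r2)
```

    Squaring the correlation means that markers in perfect negative association (one allele always paired with the other marker's alternative allele) also count as fully linked.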

  5. A Bayesian antedependence model for whole genome prediction.

    Science.gov (United States)

    Yang, Wenzhao; Tempelman, Robert J

    2012-04-01

    Hierarchical mixed effects models have been demonstrated to be powerful for predicting genomic merit of livestock and plants on the basis of high-density single-nucleotide polymorphism (SNP) marker panels, and their use is being increasingly advocated for genomic predictions in human health. Two particularly popular approaches, labeled BayesA and BayesB, are based on specifying all SNP-associated effects to be independent of each other. BayesB extends BayesA by allowing a large proportion of SNP markers to be associated with null effects. We further extend these two models to specify SNP effects as being spatially correlated due to the chromosomally proximal effects of causal variants. These two models, which we dub ante-BayesA and ante-BayesB, respectively, are based on a first-order nonstationary antedependence specification between SNP effects. In a simulation study involving 20 replicate data sets, each analyzed at six different SNP marker densities with average LD levels ranging from r² = 0.15 to 0.31, the antedependence methods had significantly (P 0. 24) with differences exceeding 3%. A cross-validation study was also conducted on the heterogeneous stock mice data resource (http://mus.well.ox.ac.uk/mouse/HS/) using 6-week body weights as the phenotype. The antedependence methods increased cross-validation prediction accuracies by up to 3.6% compared to their classical counterparts (P benchmark data sets and demonstrated that the antedependence methods were more accurate than their classical counterparts for genomic predictions, even for individuals several generations beyond the training data.
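
    The first-order antedependence idea can be sketched as a recursion in which each SNP effect inherits a fraction of its left neighbour's effect plus its own innovation, with marker-specific coefficients and variances making it nonstationary. This simulation only illustrates the correlation structure; the Bayesian priors and MCMC sampling used to fit ante-BayesA and ante-BayesB are not shown, and the parameter names `t` and `sigma` are hypothetical.

```python
import random


def simulate_antedependent_effects(t: list[float], sigma: list[float], seed: int = 1) -> list[float]:
    """First-order antedependence: beta[0] = d[0]; beta[j] = t[j-1] * beta[j-1] + d[j].

    t has length p - 1 (one coefficient per adjacent SNP pair); sigma has length p
    (standard deviation of each innovation d[j], allowed to differ between SNPs).
    """
    if len(t) != len(sigma) - 1:
        raise ValueError("t must have exactly one fewer element than sigma")
    rng = random.Random(seed)
    beta: list[float] = []
    for j, s in enumerate(sigma):
        d = rng.gauss(0.0, s)  # innovation for SNP j
        beta.append(d if j == 0 else t[j - 1] * beta[j - 1] + d)
    return beta


print(simulate_antedependent_effects(t=[0.9, 0.9, 0.9], sigma=[1.0, 0.5, 0.5, 0.5]))
```

    Setting every `t` to 0 makes the effects independent, which corresponds to the independence assumption shared by BayesA and BayesB.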

  6. Complete chloroplast genome of Prunus yedoensis Matsum. (Rosaceae), wild and endemic flowering cherry on Jeju Island, Korea.

    Science.gov (United States)

    Cho, Myong-Suk; Hyun Cho, Chung; Yeon Kim, Su; Su Yoon, Hwan; Kim, Seung-Chul

    2016-09-01

    The complete chloroplast (cp) genome sequence of the wild flowering cherry, Prunus yedoensis Matsum., which is native and endemic to Jeju Island, Korea, is reported in this study. The genome is 157,786 bp long with 36.7% GC content and is composed of a large single-copy (LSC) region of 85,908 bp, a small single-copy (SSC) region of 19,120 bp and two inverted repeat (IR) copies of 26,379 bp each. The cp genome contains 131 genes, including 86 coding genes, 8 rRNA genes and 37 tRNA genes. A maximum likelihood analysis was conducted to verify the phylogenetic position of the newly sequenced cp genome of P. yedoensis using 11 representative complete cp genome sequences within the family Rosaceae. The genus Prunus was monophyletic, and the phylogenetic relationships agreed with previous phylogenetic analyses within Rosaceae.
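
    As a quick consistency check on the reported figures, the region lengths and gene counts add up to the stated totals; nothing here goes beyond the numbers in the abstract.

```python
lsc, ssc, ir = 85_908, 19_120, 26_379  # region lengths in bp, from the abstract
genome_length = lsc + ssc + 2 * ir      # the inverted repeat occurs twice
print(genome_length)                    # 157786

coding, rrna, trna = 86, 8, 37          # gene counts, from the abstract
print(coding + rrna + trna)             # 131
```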

  7. Predicting human height by Victorian and genomic methods.

    Science.gov (United States)

    Aulchenko, Yurii S; Struchalin, Maksim V; Belonogova, Nadezhda M; Axenovich, Tatiana I; Weedon, Michael N; Hofman, Albert; Uitterlinden, Andre G; Kayser, Manfred; Oostra, Ben A; van Duijn, Cornelia M; Janssens, A Cecile J W; Borodin, Pavel M

    2009-08-01

    In the Victorian era, Sir Francis Galton showed that 'when dealing with the transmission of stature from parents to children, the average height of the two parents, ... is all we need care to know about them' (1886). One hundred and twenty-two years after Galton's work was published, 54 loci showing strong statistical evidence for association to human height were described, providing us with potential genomic means of human height prediction. In a population-based study of 5748 people, we find that a 54-loci genomic profile explained 4-6% of the sex- and age-adjusted height variance, and had limited ability to discriminate tall/short people, as characterized by the area under the receiver-operating characteristic curve (AUC). In a family-based study of 550 people, with both parents having height measurements, we find that the Galtonian mid-parental prediction method explained 40% of the sex- and age-adjusted height variance, and showed high discriminative accuracy. We have also explored how much variance a genomic profile should explain to reach certain AUC values. For highly heritable traits such as height, we conclude that in applications in which parental phenotypic information is available (e.g., medicine), Galton's Victorian method will long stay unsurpassed, in terms of both discriminative accuracy and costs. For less heritable traits, and in situations in which parental information is not available (e.g., forensics), genomic methods may provide an alternative, given that the variants determining an essential proportion of the trait's variation can be identified.
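
    The AUC used above to judge how well a profile separates tall from short people equals the probability that a randomly chosen tall person scores higher than a randomly chosen short one. A minimal pairwise (Mann-Whitney) sketch, with hypothetical profile scores:

```python
from typing import Sequence


def auc_mann_whitney(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """P(random positive scores above random negative), counting ties as 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical genomic profile scores for tall (positive) and short (negative) people.
print(auc_mann_whitney([3, 4, 5], [1, 2, 4]))
```

    An AUC of 0.5 means no discrimination and 1.0 means perfect separation. The pairwise loop costs O(n*m) comparisons, so a rank-based formula is preferable for large samples.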

  8. Genomic Signal Processing: Predicting Basic Molecular Biological Principles

    Science.gov (United States)

    Alter, Orly

    2005-03-01

    Advances in high-throughput technologies enable acquisition of different types of molecular biological data, monitoring the flow of biological information as DNA is transcribed to RNA, and RNA is translated to proteins, on a genomic scale. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment and drug development. Recently we described data-driven models for genome-scale molecular biological data, which use singular value decomposition (SVD) and the comparative generalized SVD (GSVD). Now we describe an integrative data-driven model, which uses pseudoinverse projection (1). We also demonstrate the predictive power of these matrix algebra models (2). The integrative pseudoinverse projection model formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. The mathematical variables of this integrative model, the pseudoinverse correlation patterns that are uncovered in the data, represent independent processes and corresponding cellular states (such as observed genome-wide effects of known regulators or transcription factors, the biological components of the cellular machinery that generate the genomic signals, and measured samples in which these regulators or transcription factors are over- or underactive). Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis, and gives a global picture of the correlations and possibly also causal coordination of

  9. Predicting genome-wide redundancy using machine learning

    Directory of Open Access Journals (Sweden)

    Shasha Dennis E

    2010-11-01

    Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here. Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single-trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single-attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis show the signature of predicted redundancy with at least one but typically fewer than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

  10. Genome-wide association studies in an isolated founder population from the Pacific Island of Kosrae.

    Directory of Open Access Journals (Sweden)

    Jennifer K Lowe

    2009-02-01

    It has been argued that the limited genetic diversity and reduced allelic heterogeneity observed in isolated founder populations facilitate discovery of loci contributing to both Mendelian and complex disease. A strong founder effect, severe isolation, and substantial inbreeding have dramatically reduced genetic diversity in natives from the island of Kosrae, Federated States of Micronesia, who exhibit a high prevalence of obesity and other metabolic disorders. We hypothesized that genetic drift and possibly natural selection on Kosrae might have increased the frequency of previously rare genetic variants with relatively large effects, making these alleles readily detectable in genome-wide association analysis. However, mapping in large, inbred cohorts introduces analytic challenges, as extensive relatedness between subjects violates the assumptions of independence upon which traditional association test statistics are based. We performed genome-wide association analysis for 15 quantitative traits in 2,906 members of the Kosrae population, using novel approaches to manage the extreme relatedness in the sample. As positive controls, we observe association to known loci for plasma cholesterol, triglycerides, and C-reactive protein and to compelling candidate loci for thyroid stimulating hormone and fasting plasma glucose. We show that our study is well powered to detect common alleles explaining ≥5% of phenotypic variance. However, no such large effects were observed with genome-wide significance, arguing that even in such a severely inbred population, common alleles typically have modest effects. Finally, we show that a majority of common variants discovered in Caucasians have indistinguishable effect sizes on Kosrae, despite the major differences in population genetics and environment.

  11. Predicting statistical properties of open reading frames in bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Katharina Mir

    An analytical model based on the statistical properties of open reading frames (ORFs) of eubacterial genomes, such as codon composition and the sequence length of all reading frames, was developed. This new model predicts the average length, maximum length and length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, it closely predicts the number of annotated genes. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot be completely explained by a biased codon usage in the +1 frame. While it is unknown whether the stop codon depletion has a biological function, it could be due to a protein-coding capacity of alternative ORFs exerting a selection pressure that prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes therefore leads to a hypothesis suggesting novel gene candidates, which can now be investigated in subsequent wet-lab experiments.
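
    For intuition about why GC content shapes ORF lengths, here is a deliberately simple null model, not the authors' analytical model: bases are drawn independently, with G and C each at frequency g/2 and A and T each at (1 - g)/2, so the per-codon stop probability is p = P(TAA) + P(TAG) + P(TGA) and chance ORF lengths are roughly geometric with mean about 1/p codons. Real genomes violate these independence assumptions.

```python
def stop_codon_probability(gc: float) -> float:
    """Per-codon probability of TAA, TAG or TGA under independent base composition."""
    at = (1.0 - gc) / 2.0  # frequency of A (and of T)
    g_or_c = gc / 2.0      # frequency of G (and of C)
    taa = at * at * at
    tag = at * at * g_or_c
    tga = at * g_or_c * at
    return taa + tag + tga


def expected_orf_length(gc: float) -> float:
    """Mean distance, in codons and counting the stop codon, to the next stop: 1/p."""
    return 1.0 / stop_codon_probability(gc)


def prob_orf_at_least(gc: float, codons: int) -> float:
    """Probability that the first `codons` codons of a random frame contain no stop."""
    return (1.0 - stop_codon_probability(gc)) ** codons


for gc in (0.21, 0.50, 0.74):  # GC range spanned by the 70 species in the abstract
    print(gc, round(expected_orf_length(gc), 1))
```

    Under this model, GC-rich genomes produce much longer chance ORFs, because all three stop codons are AT-rich.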

  12. Genomics of antibiotic-resistance prediction in Pseudomonas aeruginosa.

    Science.gov (United States)

    Jeukens, Julie; Freschi, Luca; Kukavica-Ibrulj, Irena; Emond-Rheault, Jean-Guillaume; Tucker, Nicholas P; Levesque, Roger C

    2017-06-02

    Antibiotic resistance is a worldwide health issue spreading quickly among human and animal pathogens, as well as environmental bacteria. Misuse of antibiotics has an impact on the selection of resistant bacteria, thus contributing to an increase in the occurrence of resistant genotypes that emerge via spontaneous mutation or are acquired by horizontal gene transfer. There is a specific and urgent need not only to detect antimicrobial resistance but also to predict antibiotic resistance in silico. We now have the capability to sequence hundreds of bacterial genomes per week, including assembly and annotation. Novel and forthcoming bioinformatics tools can predict the resistome and the mobilome with a level of sophistication not previously possible. Coupled with bacterial strain collections and databases containing strain metadata, prediction of antibiotic resistance and the potential for virulence are moving rapidly toward a novel approach in molecular epidemiology. Here, we present a model system in antibiotic-resistance prediction, along with its promises and limitations. As it is commonly multidrug resistant, Pseudomonas aeruginosa causes infections that are often difficult to eradicate. We review novel approaches for genotype prediction of antibiotic resistance. We discuss the generation of microbial sequence data for real-time patient management and the prediction of antimicrobial resistance. © 2017 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals Inc. on behalf of The New York Academy of Sciences.

  13. The dnd operon for DNA phosphorothioation modification system in Escherichia coli is located in diverse genomic islands.

    Science.gov (United States)

    Ho, Wing Sze; Ou, Hong-Yu; Yeo, Chew Chieng; Thong, Kwai Lin

    2015-03-17

    Strains of Escherichia coli that cannot be typed by pulsed-field gel electrophoresis (PFGE) because of in-gel DNA degradation can compromise molecular epidemiological data. The DNA degradation phenotype (Dnd(+)) is mediated by the dnd operon, which encodes enzymes that catalyze the phosphorothioation of DNA. This modification makes the DNA susceptible to oxidative cleavage during a PFGE run. In this study, a PCR assay was developed to detect the dnd operon in Dnd(+) E. coli strains and to improve their typeability. Investigating the genetic environment of the dnd operon in various E. coli strains showed that the operon is harboured in diverse genomic islands. PCR detected the dndBCDE genes (dnd operon) in all Dnd(+) E. coli strains. Adding thiourea improved the PFGE typeability of Dnd(+) E. coli strains to 100%, and the Dnd(+) phenotype was observed in both clonal and genetically diverse E. coli strains. Genomic analysis of 101 dnd operons from Enterobacteriaceae genome sequences showed that dnd operons from the same bacterial species generally clustered together in the phylogenetic tree. Further analysis of the dnd operons of 52 E. coli genomes, together with their immediate genetic environments, revealed 7 types of genetic organization in total. All 7 were associated with genomic islands, designated dnd-encoding GIs. The dnd-encoding GIs displayed a mosaic structure. The genomic contexts of the 7 islands (using 1 representative genome from each type of genetic organization) were also highly variable, suggesting multiple recombination events. This is also the first report of two dnd operons within a single strain, although the biological implication is unknown. Surprisingly, dnd operons were frequently found in pathogenic E. coli, although their link with virulence has not been explored. Genomic islands likely play an important role in facilitating the horizontal

  14. Mycoplasma hyopneumoniae Transcription Unit Organization: Genome Survey and Prediction

    Science.gov (United States)

    Siqueira, Franciele Maboni; Schrank, Augusto; Schrank, Irene Silveira

    2011-01-01

    Mycoplasma hyopneumoniae is associated with swine respiratory diseases. Gene organization and regulation are well understood in many prokaryotic organisms, but knowledge of mycoplasmas is limited. This study compared three strains of M. hyopneumoniae (7448, J and 232), focusing on genome organization and on gene comparison to identify open reading frame (ORF) clusters (OCs). An in silico analysis of gene organization identified 117 OCs and 34 single ORFs in M. hyopneumoniae 7448 and J, and 116 OCs and 36 single ORFs in M. hyopneumoniae 232. Genomic comparison revealed high synteny and conservation of gene order between the OCs defined for strains 7448 and J, and likewise for strains 7448 and 232. Twenty-one OCs were selected and experimentally confirmed by reverse transcription–PCR from the M. hyopneumoniae 7448 genome, validating the prediction. A subset of the ORFs within an OC could be transcribed independently because of internal promoters. The results suggest that transcription in M. hyopneumoniae proceeds in a ‘run-on’ manner from an upstream promoter. This forms large ORF clusters (from 2 to 29 ORFs in the same orientation) and indicates a complex transcriptional organization. PMID:22086999

  15. Microdiversification of a Pelagic Polynucleobacter Species Is Mainly Driven by Acquisition of Genomic Islands from a Partially Interspecific Gene Pool

    Science.gov (United States)

    Schmidt, Johanna; Jezberová, Jitka; Koll, Ulrike; Hahn, Martin W.

    2016-01-01

    ABSTRACT Microdiversification of a planktonic freshwater bacterium was studied by comparing 37 Polynucleobacter asymbioticus strains obtained from three geographically separated sites in the Austrian Alps. Genome comparison of nine strains revealed a core genome of 1.8 Mb, representing 81% of the average genome size. Seventy-five percent of the remaining flexible genome is clustered in genomic islands (GIs). Twenty-four genomic positions could be identified where GIs are potentially located. These positions are occupied strain specifically from a set of 28 GI variants, classified according to similarities in their gene content. One variant, present in 62% of the isolates, encodes a pathway for the degradation of aromatic compounds, and another, found in 78% of the strains, contains an operon for nitrate assimilation. Both variants were shown in ecophysiological tests to be functional, thus providing the potential for microniche partitioning. In addition, detected interspecific horizontal exchange of GIs indicates a large gene pool accessible to Polynucleobacter species. In contrast to core genes, GIs are spread more successfully across spatially separated freshwater habitats. The mobility and functional diversity of GIs allow for rapid evolution, which may be a key aspect for the ubiquitous occurrence of Polynucleobacter bacteria. IMPORTANCE Assessing the ecological relevance of bacterial diversity is a key challenge for current microbial ecology. The polyphasic approach which was applied in this study, including targeted isolation of strains, genome analysis, and ecophysiological tests, is crucial for the linkage of genetic and ecological knowledge. Particularly great importance is attached to the high number of closely related strains which were investigated, represented by genome-wide average nucleotide identities (ANI) larger than 97%. The extent of functional diversification found on this narrow phylogenetic scale is compelling. Moreover, the transfer of

  16. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected...... with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index...... itself. Depending on the trait’s economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage...

  17. Genomic fingerprinting and serotyping of Salmonella from Galápagos iguanas demonstrates island differences in strain diversity.

    Science.gov (United States)

    Wheeler, Emily; Cann, Isaac K O; Mackie, Roderick I

    2011-04-01

    Salmonella carriage patterns in wild and captive reptiles suggest that both geographical proximity and host ecological differences may determine bacterial diversity among reptile populations. In this study, we explore the relative importance of these factors on Salmonella diversity in free-living Galápagos iguanas. We isolated Salmonella enterica from marine iguanas (Amblyrhynchus cristatus) and land iguanas (Conolophus subcristatus and C. pallidus) living on two islands (Plaza Sur and Santa Fe). We evaluated Salmonella population patterns using genomic fingerprints, sequence typing and serotyping. Rep-PCR fingerprinting revealed significant grouping of isolates by iguana population. Island residence had the strongest effect on isolate similarity, but a smaller divergence among Salmonella isolates from different iguana ecotypes (land versus marine) was detected within each island. In contrast, sequence typing detected a marginal difference in isolate genotypes between islands. Sequence types corresponded strongly to serotype identity, with both islands hosting a unique serovar pool. Our findings suggest that both geographical location and host ecotype differences (either from within host strain selection or from differences in habitat use) contribute to Salmonella population patterns in the Galápagos Islands. © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.

  18. The Pacific Rat Race to Easter Island: Tracking the Prehistoric Dispersal of Rattus exulans Using Ancient Mitochondrial Genomes

    Directory of Open Access Journals (Sweden)

    Katrina West

    2017-05-01

    Full Text Available The immediate eastern Polynesian origin of the settlers of Easter Island (Rapa Nui) remains unclear, because the archeological and linguistic evidence conflicts. Humans transported the Pacific rat, Rattus exulans, across Remote Oceania and throughout the Polynesian Triangle, and previous genetic research on this commensal species has identified broad interaction spheres across the region. However, finer-scale movements between Remote Oceanic islands have been difficult to distinguish, because the same mitochondrial control region haplotype occurs in most ancient rat specimens. To improve molecular resolution and identify a pattern of prehistoric dispersal to Easter Island, we sequenced complete mitochondrial genomes from ancient Pacific rat specimens obtained from early archeological contexts across West and East Polynesia. Ancient Polynesian rat haplotypes are closely related and reflect the widely supported scenario of a central East Polynesian homeland region from which eastern expansion occurred. Related haplotypes from Easter Island and Tubuai (Austral Islands) group together. This suggests that the same colonization wave settled both islands, and this wave is proposed to have originated in the central homeland region before dispersing through the south-eastern corridor of East Polynesia.

  19. Stability and predictability in younger crystalline rock system: Japanese Islands case

    International Nuclear Information System (INIS)

    Yoshida, S.

    2009-01-01

    The Japanese Islands consist of igneous, sedimentary, and metamorphic rocks ranging in age from Paleozoic to Cenozoic. Among these, Carboniferous to Paleogene rocks occupy about 60% of the total area of the Japanese Islands. It should be noted that Quaternary volcanic rocks occupy only about 9% of the total area, although Quaternary volcanoes occur throughout the Japanese Islands. Long-term stability and predictability in the rock system are discussed in terms of volcanic activity, active faulting, and plate motion. Volcanic activity in the Japanese Islands is intimately related to subduction of the Pacific Plate and the Philippine Sea Plate. The volcanic front related to the Pacific and the Philippine Sea plates has been essentially fixed since about 6 Ma. The main active faults, which are distributed sporadically throughout the Japanese Islands, number about 150 and have been extensively investigated. The modes of the Pacific Plate and the Philippine Sea Plate have been essentially invariable since 10 Ma and 6 Ma, respectively. These lines of evidence imply that volcanism and tectonism in the Japanese Islands will scarcely change for hundreds of thousands of years into the future. It is clear that many places suitable for geological disposal will be present in this rock system. (author)

  20. Finite-Control-Set Model Predictive Control (FCS-MPC) for Islanded Hybrid Microgrids

    OpenAIRE

    Yi, Zhehan; Babqi, Abdulrahman J.; Wang, Yishen; Shi, Di; Etemadi, Amir H.; Wang, Zhiwei; Huang, Bibin

    2018-01-01

    Microgrids consisting of multiple distributed energy resources (DERs) provide a promising solution to integrate renewable energies, e.g., solar photovoltaic (PV) systems. Hybrid AC/DC microgrids leverage the merits of both AC and DC power systems. In this paper, a control strategy for islanded multi-bus hybrid microgrids is proposed based on the Finite-Control-Set Model Predictive Control (FCS-MPC) technologies. The control loops are expedited by predicting the future states and determining t...

  1. Predictions of barrier island berm evolution in a time-varying storm climatology

    Science.gov (United States)

    Plant, Nathaniel G.; Flocks, James; Stockdon, Hilary F.; Long, Joseph W.; Guy, Kristy K.; Thompson, David M.; Cormier, Jamie M.; Smith, Christopher G.; Miselis, Jennifer L.; Dalyander, P. Soupy

    2014-01-01

    Low-lying barrier islands are ubiquitous features of the world's coastlines, and the processes responsible for their formation, maintenance, and destruction are related to the evolution of smaller, superimposed features including sand dunes, beach berms, and sandbars. The barrier island and its superimposed features interact with oceanographic forces (e.g., overwash) and exchange sediment with each other and other parts of the barrier island system. These interactions are modulated by changes in storminess. An opportunity to study these interactions resulted from the placement and subsequent evolution of a 2 m high sand berm constructed along the northern Chandeleur Islands, LA. We show that observed berm length evolution is well predicted by a model that was fit to the observations by estimating two parameters describing the rate of berm length change. The model evaluates the probability and duration of berm overwash to predict episodic berm erosion. A constant berm length change rate is also predicted that persists even when there is no overwash. The analysis is extended to a 16 year time series that includes both intraannual and interannual variability of overwash events. This analysis predicts that as many as 10 or as few as 1 day of overwash conditions would be expected each year. And an increase in berm elevation from 2 m to 3.5 m above mean sea level would reduce the expected frequency of overwash events from 4 to just 0.5 event-days per year. This approach can be applied to understanding barrier island and berm evolution at other locations using past and future storm climatologies.

  2. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    Full Text Available Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied to data from single-nucleotide polymorphism (SNP) genotyping platforms, but not to complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster. The analysis uses ∼2.5 million SNPs obtained by sequencing the inbred lines of the Drosophila Genetic Reference Panel population. We built a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability by cross-validation as the correlation between predicted genetic values and observed phenotypes. Predictive ability was 0.239±0.008 for starvation resistance and 0.230±0.012 for startle response. The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than that of GBLUP. Selecting the 5% of SNPs with either the highest absolute effect or the most variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population comes from SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium. We do not attribute it to population structure or to SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
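
    The GBLUP approach named above can be sketched in pure Python. This is a minimal illustration under stated assumptions and is not the study's pipeline:
    - genotypes are coded 0/1/2 per SNP;
    - the relationship matrix follows VanRaden's first method (genotypes centered by twice the sample allele frequency and scaled by 2Σp(1−p));
    - the variance ratio `lam` = σ²e/σ²g is supplied rather than estimated;
    - the overall mean is the simple training-set mean, a simplification of the full mixed-model solution.
    All function names are hypothetical.

```python
"""Sketch of GBLUP with a genomic relationship matrix (illustrative assumptions only)."""
from math import sqrt
from typing import List

Matrix = List[List[float]]


def genomic_relationship_matrix(genotypes: Matrix) -> Matrix:
    """G = Z Z' / (2 * sum_j p_j (1 - p_j)); rows are lines, columns are SNPs coded 0/1/2."""
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]  # sample allele frequencies
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]    # centered genotypes
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)  # zero (error) if every SNP is monomorphic
    return [[sum(z[i][k] * z[j][k] for k in range(m)) / scale for j in range(n)] for i in range(n)]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def gblup_predict(g: Matrix, y: List[float], train: List[int], test: List[int], lam: float) -> List[float]:
    """mu + G[test, train] (G[train, train] + lam * I)^-1 (y_train - mu) for each test line."""
    mu = sum(y[i] for i in train) / len(train)
    lhs = [[g[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(lhs, [y[i] - mu for i in train])
    return [mu + sum(g[t][j] * a for j, a in zip(train, alpha)) for t in test]


def pearson(x: List[float], y: List[float]) -> float:
    """Predictive ability: correlation between predicted and observed values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

    To get predictive ability in the cross-validated sense used above, hold out folds of lines and call `gblup_predict` on each fold. Then pass the pooled predictions and the observed phenotypes to `pearson`.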

  3. Emergence of Salmonella genomic island 1 (SGI1) among Proteus mirabilis clinical isolates in Dijon, France.

    Science.gov (United States)

    Siebor, Eliane; Neuwirth, Catherine

    2013-08-01

    Salmonella genomic island 1 (SGI1) is often found in antibiotic-resistant Salmonella enterica but only exceptionally in Proteus mirabilis. We investigated the prevalence of SGI1-carrying clinical isolates of P. mirabilis in our hospital (Dijon, France). We tested for SGI1 by PCR in two sets of P. mirabilis: 57 strains isolated from August 2011 to February 2012 that were resistant to amoxicillin and/or gentamicin and/or trimethoprim/sulfamethoxazole, and 9 extended-spectrum β-lactamase (ESBL)-producing strains from our collection. For positive isolates, the complete SGI1 structure [backbone and multidrug resistance (MDR) region] was sequenced. SGI1 was detected in 7 isolates: 5 of the 57 isolates collected during the study period (9%) and 2 of the 9 ESBL-producing strains from our collection. All seven SGI1 structures were distinct. Three different backbones were identified:
    - one identical to the SGI1 backbone from the epidemic Salmonella Typhimurium DT104;
    - one with variations already described in SGI1-K from Salmonella Kentucky (deletion and insertion of IS1359 in the region spanning S005 to S009);
    - one with a previously undetected variation (deletion from S005 to S009).
    Six different MDR regions were identified. Four were simple variants containing resistance genes already described. Two variants had a very complex structure that included regions derived from several transposons and IS26 elements, and carried aphA1a, which has not previously been reported in SGI1. SGI1 variants are widely distributed among P. mirabilis clinical strains and might spread to other commensal Enterobacteriaceae. Such spread would become a serious public health problem.

  4. Whole-genome regression and prediction methods applied to plant and animal breeding

    NARCIS (Netherlands)

    de los Campos, G.; Hickey, J.M.; Pong-Wong, R.; Daetwyler, H.D.; Calus, M.P.L.

    2013-01-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding, and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of

  5. Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking

    NARCIS (Netherlands)

    Daetwyler, H.D.; Calus, M.P.L.; Pong-Wong, R.; de los Campos, G.; Hickey, J.M.

    2013-01-01

    The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant

  6. A common reference population from four European Holstein populations increases reliability of genomic predictions

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; de Roos, Sander PW; de Vries, Alfred G

    2011-01-01

    Background Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to...

  7. A toxin antitoxin system promotes the maintenance of the IncA/C-mobilizable Salmonella Genomic Island 1.

    Science.gov (United States)

    Huguet, Kevin T; Gonnet, Mathieu; Doublet, Benoît; Cloeckaert, Axel

    2016-08-31

    The multidrug resistance Salmonella Genomic Island 1 (SGI1) is an integrative mobilizable element identified in several enterobacterial pathogens. To be excised as a circular extrachromosomal form and conjugally mobilized in trans, this chromosomal island requires a conjugative IncA/C plasmid. Preliminary observations suggest that SGI1 is stably maintained in the host chromosome but, paradoxically, also that SGI1 and IncA/C plasmids are incompatible. Here, we used a clonal bacterial population of Salmonella enterica serovar Agona as a model. We show that a toxin-antitoxin (TA) system encoded by SGI1 plays a critical role in its stable maintenance in the host when an IncA/C plasmid is also present. This system, designated sgiAT (for Salmonella genomic island 1 Antitoxin and Toxin, respectively), therefore seems to stabilize SGI1 when the island is at risk of being lost through IncA/C plasmid-mediated excision. Moreover, this study confirms experimentally, for the first time, the incompatibility between SGI1 and IncA/C plasmids.

  8. An Integrative Genomic Island Affects the Adaptations of Piezophilic Hyperthermophilic Archaeon Pyrococcus yayanosii to High Temperature and High Hydrostatic Pressure

    Directory of Open Access Journals (Sweden)

    Zhen Li

    2016-11-01

    Full Text Available Deep-sea hydrothermal vent environments are characterized by high hydrostatic pressure and sharp temperature and chemical gradients. Horizontal gene transfer is thought to play an important role in the microbial adaptation to such an extreme environment. In this study, a 21.4-kb DNA fragment was identified as a genomic island, designated PYG1, in the genomic sequence of the piezophilic hyperthermophile Pyrococcus yayanosii. According to the sequence alignment and functional annotation, the genes in PYG1 could tentatively be divided into five modules, with functions related to mobility, DNA repair, metabolic processes and the toxin-antitoxin system. Integrase can mediate the site-specific integration and excision of PYG1 in the chromosome of P. yayanosii A1. Gene replacement of PYG1 with a SimR cassette was successful. The growth of the mutant strain ∆PYG1 was compared with its parent strain P. yayanosii A2 under various stress conditions, including different pH, salinity, temperature and hydrostatic pressure. The ∆PYG1 mutant strain showed reduced growth when grown at 100 °C, while the biomass of ∆PYG1 increased significantly when cultured at 80 MPa. Differential expression of the genes in module Ⅲ of PYG1 was observed under different temperature and pressure conditions. This study demonstrates the first example of an archaeal integrative genomic island that could affect the adaptation of the hyperthermophilic piezophile P. yayanosii to high temperature and high hydrostatic pressure.

  9. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

    Science.gov (United States)

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins exhibiting this phenomenon have many biological functions. These proteins have attracted much attention because they are involved in human diseases such as conformational diseases, amyloidosis, serpinopathies and proteinopathies. Identifying early which proteins in the human genome tend to undergo domain swapping would support many aspects of disease control and management. Predictive models for putative domain swapping in protein sequences were developed using machine learning approaches. They achieved an average accuracy of 78% (sensitivity 85.6%, specificity 87.5%, and a Matthews correlation coefficient (MCC) of 0.72). These models were applied to many complete genomes, with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted human sequences for their domain distribution, disease association, and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to better understand the functional importance of these sequences. Finally, we developed a method to predict the hinge region in a given putative domain-swapped sequence using important physicochemical properties of amino acids.
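
    Sensitivity, specificity, accuracy and MCC, the performance figures quoted above, are conventionally computed from the four counts of a binary confusion matrix, as sketched below. The function name and the counts in the usage line are hypothetical and do not come from the study.

```python
"""Standard binary-classification metrics from confusion-matrix counts."""
from math import sqrt
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return sensitivity, specificity, accuracy and the Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if a margin is empty
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(classification_metrics(tp=50, fp=5, tn=40, fn=10))
```

    MCC draws on all four cells of the matrix. It can therefore stay informative when the positive and negative classes differ greatly in size, which is a situation where accuracy alone can mislead.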

  10. Genomic diversity and differentiation of a managed island wild boar population

    DEFF Research Database (Denmark)

    Iacolina, Laura; Scandura, Massimo; J. Goedbloed, Daniel

    2016-01-01

    The evolution of island populations in natural systems is driven by local adaptation and genetic drift. However, evolutionary pathways may be altered by humans in several ways. The wild boar (WB) (Sus scrofa) is an iconic game species occurring in several islands, where it has been strongly manag...

  11. The Salmonella genomic island 1 is specifically mobilized in trans by the IncA/C multidrug resistance plasmid family.

    Science.gov (United States)

    Douard, Gregory; Praud, Karine; Cloeckaert, Axel; Doublet, Benoît

    2010-12-20

    The Salmonella genomic island 1 (SGI1) is a Salmonella enterica-derived integrative mobilizable element (IME). It contains various complex multiple-resistance integrons and has been identified in several S. enterica serovars and in Proteus mirabilis. Previous studies have shown that SGI1 transfers horizontally by mobilization in trans when the IncA/C conjugative helper plasmid pR55 is present. Here, we report the ability of different prevalent multidrug resistance (MDR) plasmids, including plasmids carrying extended-spectrum β-lactamase (ESBL) genes, to mobilize the multidrug resistance genomic island SGI1. In conjugation experiments, none of the 24 conjugative plasmids tested from the IncFI, FII, HI2, I1, L/M, N and P incompatibility groups could mobilize SGI1 at a detectable level (transfer frequency …) … IncA/C incompatibility group. Several conjugative IncA/C MDR plasmids mobilized SGI1 in trans from an S. enterica donor to the Escherichia coli recipient strain, as did the sequenced 143,963-bp IncA/C reference plasmid pRA1. Depending on the IncA/C plasmid used, SGI1 transferred by conjugation at frequencies ranging from 10⁻³ to 10⁻⁶ transconjugants per donor. Of particular concern, some large IncA/C MDR plasmids carrying the extended-spectrum cephalosporinase gene blaCMY-2 also mobilized SGI1 in trans. The ability of the IncA/C MDR plasmid family to mobilize SGI1 could contribute to the island's horizontal spread among enteric pathogens. Moreover, IncA/C plasmids are increasingly prevalent in MDR S. enterica isolates worldwide, with potential implications for the epidemic success of the antibiotic resistance genomic island SGI1 and its close derivatives.

  12. Predictive Modeling of Spinner Dolphin (Stenella longirostris) Resting Habitat in the Main Hawaiian Islands

    Science.gov (United States)

    Thorne, Lesley H.; Johnston, David W.; Urban, Dean L.; Tyne, Julian; Bejder, Lars; Baird, Robin W.; Yin, Suzanne; Rickards, Susan H.; Deakos, Mark H.; Mobley, Joseph R.; Pack, Adam A.; Chapla Hill, Marie

    2012-01-01

    Predictive habitat models can provide critical information that is necessary in many conservation applications. Using Maximum Entropy modeling, we characterized habitat relationships and generated spatial predictions of spinner dolphin (Stenella longirostris) resting habitat in the main Hawaiian Islands. Spinner dolphins in Hawai'i exhibit predictable daily movements, using inshore bays as resting habitat during daylight hours and foraging in offshore waters at night. There are growing concerns regarding the effects of human activities on spinner dolphins resting in coastal areas. However, the environmental factors that define suitable resting habitat remain unclear and must be assessed and quantified in order to properly address interactions between humans and spinner dolphins. We used a series of dolphin sightings from recent surveys in the main Hawaiian Islands and a suite of environmental variables hypothesized as being important to resting habitat to model spinner dolphin resting habitat. The model performed well in predicting resting habitat and indicated that proximity to deep water foraging areas, depth, the proportion of bays with shallow depths, and rugosity were important predictors of spinner dolphin habitat. Predicted locations of suitable spinner dolphin resting habitat provided in this study indicate areas where future survey efforts should be focused and highlight potential areas of conflict with human activities. This study provides an example of a presence-only habitat model used to inform the management of a species for which patterns of habitat availability are poorly understood. PMID:22937022

  13. Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship.

    Directory of Open Access Journals (Sweden)

    S Hong Lee

    Full Text Available Genomic prediction is emerging in a wide range of fields, including animal and plant breeding, risk prediction in human precision medicine, and forensics. Reference data often combine information sources with varying degrees of relationship to the target individuals, and a theoretical framework for genomic prediction accuracy in this setting is desirable. In genomic prediction, a reference set can contain both close and distant relatives as well as 'unrelated' individuals from the wider population. The various sources of information were modeled as different populations with different effective population sizes (Ne). Both the effective number of chromosome segments (Me) and Ne are treated as functions of the data used for prediction. We validate our theory with analyses of simulated as well as real data. The analyses show that the variation in genomic relationships with the target predicts the information content of the reference set. With a similar amount of data from each source, close relatives can affect genomic prediction accuracy substantially more than less related individuals. We also show that when prediction relies on closer relatives, adding training data or increasing marker panel density improves prediction accuracy less. We release software that estimates the expected prediction accuracy and power when different reference sources with various degrees of relationship to the target are combined. The software is useful for planning genomic prediction (before or after collecting data) in animal, plant and human genetics.
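
    A useful point of reference for the multi-source theory described above is a widely used single-population approximation associated with Daetwyler and colleagues. It gives the expected accuracy as r = sqrt(N h² / (N h² + Me)), where N is the number of reference individuals, h² the heritability and Me the effective number of chromosome segments. The sketch below implements only that baseline, with Me supplied directly as an assumed input. It is not the authors' framework or their released software, and the function names are hypothetical.

```python
"""Baseline expected accuracy of genomic prediction for one homogeneous reference population."""
from math import sqrt


def expected_accuracy(n_ref: int, h2: float, me: float) -> float:
    """r = sqrt(N h^2 / (N h^2 + Me))."""
    signal = n_ref * h2
    return sqrt(signal / (signal + me))


def gain_from_doubling(n_ref: int, h2: float, me: float) -> float:
    """Increase in expected accuracy when the reference population size is doubled."""
    return expected_accuracy(2 * n_ref, h2, me) - expected_accuracy(n_ref, h2, me)


if __name__ == "__main__":
    # A smaller Me is used here as a rough stand-in for a more closely related reference set.
    for me in (100.0, 5000.0):
        r = expected_accuracy(1000, 0.5, me)
        print(f"Me={me:>6.0f}  r={r:.3f}  gain from doubling N={gain_from_doubling(1000, 0.5, me):.3f}")
```

    Suppose a smaller Me is taken as a rough stand-in for a more closely related reference set (an assumption). The formula then reproduces the qualitative pattern reported above: when Me is small, accuracy is already near its ceiling, so doubling N raises it less.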

  14. Genomic Prediction in Animals and Plants: Simulation of Data, Validation, Reporting, and Benchmarking

    Science.gov (United States)

    Daetwyler, Hans D.; Calus, Mario P. L.; Pong-Wong, Ricardo; de los Campos, Gustavo; Hickey, John M.

    2013-01-01

    The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits

  15. Identifying Rare Variation in Cases of Schizophrenia in the Isolated Population of the Faroe Islands using Whole-genome Sequencing

    DEFF Research Database (Denmark)

    Als, Thomas Damm; Lescai, Francesco; Dahl, Hans

    to map risk variants involved in complex traits. We aim to use samples of cases and controls from the isolated population of the Faroe Islands to conduct whole-genome-sequence analysis in order to identify rare genetic variants associated with schizophrenia. We will search for rare genetic variants...... of developing SZ. However, these studies are designed to examine only “the common variant” proportion of the genomic landscape of SZ. Due to increased genetic drift during founding and potential bottlenecks, followed by population expansion, isolated populations may be particularly useful in identifying rare...... disease variants that may appear at higher frequencies and/or within a more clearly distinct haplotype structure compared to outbred populations. Small isolated populations also typically show reduced phenotypic, genetic and environmental heterogeneity, thus making them advantageous in studies aiming...

  16. Complete genome sequence and comparative genomic analysis of Mycobacterium massiliense JCM 15300 in the Mycobacterium abscessus group reveal a conserved genomic island MmGI-1 related to putative lipid metabolism.

    Directory of Open Access Journals (Sweden)

    Tsuyoshi Sekizuka

    Full Text Available Mycobacterium abscessus group subsp., such as M. massiliense, M. abscessus sensu stricto and M. bolletii, are environmental organisms found in soil, water and other ecological niches, and have been isolated from respiratory tract infections, skin and soft tissue infections, and postoperative infections following cosmetic surgery. To determine the unique genetic features of M. massiliense, we sequenced the complete genome of the M. massiliense type strain JCM 15300 (corresponding to CCUG 48898). Comparative genomic analysis was performed among Mycobacterium spp. and among M. abscessus group subspp., showing that additional β-oxidation-related genes and, notably, the mammalian cell entry (mce) operon were located on a genomic island, M. massiliense Genomic Island 1 (MmGI-1), in M. massiliense. In addition, putative anaerobic respiration system-related genes and additional mycolic acid cyclopropane synthetase-related genes were found uniquely in M. massiliense. Japanese isolates of M. massiliense also frequently possess MmGI-1 (14/44, approximately 32%) and three unique conserved regions (26/44, approximately 60%; 34/44, approximately 77%; and 40/44, approximately 91%), as do isolates from other countries (Malaysia, France, United Kingdom and United States). The well-conserved genomic island MmGI-1 may play an important role in high growth potential with additional lipid metabolism, extra factors for survival in the environment or synthesis of complex membrane-associated lipids. ORFs on MmGI-1 showed similarities to ORFs of the phylogenetically distant M. avium complex (MAC), suggesting that horizontal gene transfer or genetic recombination events might have occurred within MmGI-1 between M. massiliense and MAC.

  17. Drug target prediction and prioritization: using orthology to predict essentiality in parasite genomes

    Directory of Open Access Journals (Sweden)

    Hall Ross S

    2010-04-01

    Full Text Available Abstract Background New drug targets are urgently needed for parasites of socio-economic importance. Genes that are essential for parasite survival are highly desirable targets, but information on these genes is lacking, as gene knockouts or knockdowns are difficult to perform in many species of parasites. We examined the applicability of large-scale essentiality information from four model eukaryotes, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Saccharomyces cerevisiae, to discover essential genes in each of their genomes. Parasite genes that lack orthologues in their host are desirable as selective targets, so we also examined prediction of essential genes within this subset. Results Cross-species analyses showed that the evolutionary conservation of genes and the presence of essential orthologues are each strong predictors of essentiality in eukaryotes. Absence of paralogues was also found to be a general predictor of increased relative essentiality. By combining several orthology and essentiality criteria one can select gene sets with up to a five-fold enrichment in essential genes compared with a random selection. We show how quantitative application of such criteria can be used to predict a ranked list of potential drug targets from Ancylostoma caninum and Haemonchus contortus - two blood-feeding strongylid nematodes, for which there are presently limited sequence data but no functional genomic tools. Conclusions The present study demonstrates the utility of using orthology information from multiple, diverse eukaryotes to predict essential genes. The data also emphasize the challenge of identifying essential genes among those in a parasite that are absent from its host.
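The abstract reports up to a five-fold enrichment in essential genes when several orthology and essentiality criteria are combined, compared with random selection. A minimal sketch of that comparison, assuming a hypothetical `Gene` record whose fields stand in for the criteria named in the abstract (conservation, presence of an essential orthologue, absence of paralogues); this is an illustration of fold enrichment, not the authors' actual scoring scheme.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    """Hypothetical per-gene annotation; field names are illustrative only."""
    name: str
    conserved: bool             # evolutionarily conserved across model eukaryotes
    essential_orthologue: bool  # has an orthologue known to be essential
    has_paralogue: bool
    essential: bool             # known essentiality label, used only for evaluation


def select_candidates(genes: List[Gene]) -> List[Gene]:
    """Keep genes that are conserved, have an essential orthologue and no paralogue."""
    return [g for g in genes
            if g.conserved and g.essential_orthologue and not g.has_paralogue]


def fold_enrichment(selected: List[Gene], background: List[Gene]) -> float:
    """Essential fraction in `selected` divided by the essential fraction in
    `background`, which is the expected fraction under random selection."""
    def frac(gs: List[Gene]) -> float:
        return sum(g.essential for g in gs) / len(gs)

    base = frac(background)
    if not selected or base == 0:
        raise ValueError("need a non-empty selection and at least one essential gene")
    return frac(selected) / base
```

With 10 background genes of which 2 are essential, a selection containing exactly those 2 genes has an essential fraction of 1.0 against a baseline of 0.2, i.e. a five-fold enrichment.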

  18. Predicting Where a Radiation Will Occur: Acoustic and Molecular Surveys Reveal Overlooked Diversity in Indian Ocean Island Crickets (Mogoplistinae: Ornebius).

    Directory of Open Access Journals (Sweden)

    Ben H Warren

    Full Text Available Recent theory suggests that the geographic location of island radiations (local accumulation of species diversity due to cladogenesis) can be predicted based on island area and isolation. Crickets are a suitable group for testing these predictions, as they show both the ability to reach some of the most isolated islands in the world and the ability to speciate at small spatial scales. Despite substantial song variation between closely related species in many island cricket lineages worldwide, this characteristic has to date received no attention in the western Indian Ocean islands; existing species descriptions are based on morphology alone. Here we use a combination of acoustics and DNA sequencing to survey these islands for Ornebius crickets. We uncover a small but previously unknown radiation in the Mascarenes, constituting a three-fold increase in the Ornebius species diversity of this archipelago (from two to six species). A further new species is detected in the Comoros. Although double archipelago colonisation is the best explanation for species diversity in the Seychelles, in situ cladogenesis is the best explanation for the six species in the Mascarenes and the two species of the Comoros. Whether the radiation of Mascarene Ornebius results from intra- or purely inter-island speciation cannot be determined on the basis of the phylogenetic data alone. However, the existence of genetic, song and ecological divergence at the intra-island scale is suggestive of an intra-island speciation scenario in which ecological and mating traits diverge hand-in-hand. Our results suggest that the geographic location of Ornebius radiations is partially but not fully explained by island area and isolation. A notable anomaly is Madagascar, where our surveys are consistent with existing accounts in finding no Ornebius species present. Possible explanations are discussed, invoking ecological differences between species and differences in environmental history between

  19. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower.

    Science.gov (United States)

    Thorwarth, Patrick; Yousef, Eltohamy A A; Schmid, Karl J

    2018-02-02

    Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding. Copyright © 2018 Thorwarth et al.

  20. Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower

    Directory of Open Access Journals (Sweden)

    Patrick Thorwarth

    2018-02-01

    Full Text Available Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding.

  1. Ecological fitness, genomic islands and bacterial pathogenicity: A Darwinian view of the evolution of microbes

    OpenAIRE

    Hacker, Jörg; Carniel, Elisabeth

    2001-01-01

    The compositions of bacterial genomes can be changed rapidly and dramatically through a variety of processes including horizontal gene transfer. This form of change is key to bacterial evolution, as it leads to ‘evolution in quantum leaps’. Horizontal gene transfer entails the incorporation of genetic elements transferred from another organism—perhaps in an earlier generation—directly into the genome, where they form ‘genomic islands’, i.e. blocks of DNA with signatures of mobile genetic elem...

  2. Predicting sea-level rise vulnerability of terrestrial habitat and wildlife of the Northwestern Hawaiian Islands

    Science.gov (United States)

    Reynolds, Michelle H.; Berkowitz, Paul; Courtot, Karen N.; Krause, Crystal M.

    2012-01-01

    If current climate change trends continue, rising sea levels may inundate low-lying islands across the globe, placing island biodiversity at risk. Recent models predict a rise of approximately one meter (1 m) in global sea level by 2100, with larger increases possible in areas of the Pacific Ocean. Pacific Islands are unique ecosystems that are home to many endangered endemic plant and animal species. The Northwestern Hawaiian Islands (NWHI), which extend 1,930 kilometers (km) beyond the main Hawaiian Islands, are a World Heritage Site and part of the Papahanaumokuakea Marine National Monument. The NWHI support the largest tropical seabird rookery in the world, providing breeding habitat for 21 species of seabirds, 4 endemic land bird species and essential foraging, breeding, or haul-out habitat for other resident and migratory wildlife. In recent years, concern has grown about the increasing vulnerability of the NWHI and their wildlife populations to changing climatic patterns, particularly the uncertainty associated with potential impacts from global sea-level rise (SLR) and storms. In response to managers' need to adapt future resource protection strategies to climate change variability and dynamic island ecosystems, we have synthesized and downscaled analyses for this important region. This report describes a 2-year study of a remote northwestern Pacific atoll ecosystem and identifies wildlife and habitat vulnerable to rising sea levels and changing climate conditions. A lack of high-resolution topographic data for low-lying islands of the NWHI had previously precluded an extensive quantitative model of the potential impacts of SLR on wildlife habitat. The first chapter describes the vegetation and topography of 20 islands of Papahanaumokuakea Marine National Monument, the distribution and status of wildlife populations, and the predicted impacts for a range of SLR scenarios. Furthermore, this chapter explores the potential effects of SLR on

  3. Genome-enabled prediction models for yield related traits in chickpea

    Science.gov (United States)

    Genomic selection (GS), unlike marker-assisted backcrossing (MABC), predicts breeding values of lines using genome-wide marker profiling and allows selection of lines prior to field phenotyping, thereby shortening the breeding cycle. A collection of 320 elite breeding lines was selected and phenotyped...

  4. Genomic selection accuracy using multi-family prediction models in a wheat breeding program

    Science.gov (United States)

    Genomic selection (GS) uses genome-wide molecular marker data to predict the genetic value of selection candidates in breeding programs. In plant breeding, the ability to produce large numbers of progeny per cross allows GS to be conducted within each family. However, this approach requires phenotyp...

  5. Genome-based prediction of common diseases: Methodological considerations for future research

    NARCIS (Netherlands)

    A.C.J.W. Janssens (Cécile); P. Tikka-Kleemola (Päivi)

    2009-01-01

    The translation of emerging genomic knowledge into public health and clinical care is one of the major challenges for the coming decades. At the moment, genome-based prediction of common diseases, such as type 2 diabetes, coronary heart disease and cancer, is still not informative. Our

  6. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Science.gov (United States)

    Ober, Ulrike; Huang, Wen; Magwire, Michael; Schlather, Martin; Simianer, Henner; Mackay, Trudy F C

    2015-01-01

    The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome-wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.
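The abstract contrasts GBLUP with a relationship matrix built from all common variants against one built only from top trait-associated variants. A minimal pure-Python sketch of that contrast follows. It assumes VanRaden's first method for the genomic relationship matrix (the abstract does not say which construction was used), a known variance ratio `lam` = σe²/σg² (normally estimated, e.g. by REML), and the phenotype mean in place of the GLS estimate of the intercept. All function names are hypothetical; the `markers` argument is where a subset such as the top associated variants would be passed.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def vanraden_g(M: Sequence[Sequence[int]],
               markers: Optional[Sequence[int]] = None) -> Matrix:
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotypes.

    `markers` optionally restricts G to a subset of columns; default is all.
    """
    n = len(M)
    cols = range(len(M[0])) if markers is None else markers
    p = {j: sum(row[j] for row in M) / (2 * n) for j in cols}  # allele frequencies
    cols = [j for j in cols if 0 < p[j] < 1]  # drop monomorphic markers
    Z = [[row[j] - 2 * p[j] for j in cols] for row in M]  # centred genotypes
    denom = 2 * sum(p[j] * (1 - p[j]) for j in cols)
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / denom for k in range(n)]
            for i in range(n)]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def gblup_predict(G: Matrix, y: Sequence[float], train: Sequence[int],
                  test: Sequence[int], lam: float) -> List[float]:
    """GBLUP: g_test = G[test, train] (G[train, train] + lam I)^-1 (y_train - mean).

    Phenotypes of `test` individuals are ignored.
    """
    mu = sum(y[i] for i in train) / len(train)
    V = [[G[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(V, [y[i] - mu for i in train])
    return [sum(G[t][j] * a for j, a in zip(train, alpha)) for t in test]
```

In the trait-associated variant of the approach, one would call `vanraden_g(M, markers=top_indices)` with `top_indices` (a hypothetical name) taken from a prior association scan, ideally within the training fold only, so that the validation phenotypes do not leak into marker selection.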

  7. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    Full Text Available The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome-wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

  8. Improving genomic prediction for Danish Jersey using a joint Danish-US reference population

    DEFF Research Database (Denmark)

    Su, Guosheng; Nielsen, Ulrik Sander; Wiggans, G

    Accuracy of genomic prediction depends on the information in the reference population. Achieving an adequately sized reference population is a challenge for genomic prediction in small cattle populations. One way to increase the size of the reference population is to combine reference data from different...... populations. The objective of this study was to assess the gain in genomic prediction accuracy when including US Jersey bulls in the Danish Jersey reference population. The data included 1,262 Danish progeny-tested bulls and 1,157 US progeny-tested bulls. Genomic breeding values (GEBV) were predicted using...... a GBLUP model from the Danish reference population and the joint Danish-US reference population. The traits in the analysis were milk yield, fat yield, protein yield, fertility, mastitis, longevity, body conformation, feet & legs, and longevity. Eight of the nine traits benefitted from the inclusion of US...

  9. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    Science.gov (United States)

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. Developments in high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES, based on supervised deep learning approaches, for the identification of enhancer and promoter regions in the human genome. Because deep learning methods can discover patterns in large and complex data, their introduction enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  10. The master activator of IncA/C conjugative plasmids stimulates genomic islands and multidrug resistance dissemination.

    Science.gov (United States)

    Carraro, Nicolas; Matteau, Dominick; Luo, Peng; Rodrigue, Sébastien; Burrus, Vincent

    2014-10-01

    Dissemination of antibiotic resistance genes occurs mostly by conjugation, which mediates DNA transfer between cells in direct contact. Conjugative plasmids of the IncA/C incompatibility group have become a substantial threat due to their broad host range, the extended spectrum of antimicrobial resistance they confer, their prevalence in enteric bacteria and their very efficient spread by conjugation. However, their biology remains largely unexplored. Using the IncA/C conjugative plasmid pVCR94ΔX as a prototype, we have investigated the regulatory circuitry that governs the dissemination of IncA/C plasmids and found that the transcriptional activator complex AcaCD is essential for the expression of plasmid transfer genes. Using chromatin immunoprecipitation coupled with exonuclease digestion (ChIP-exo) and RNA sequencing (RNA-seq) approaches, we have identified the sequences recognized by AcaCD and characterized the AcaCD regulon. Data mining using the DNA motif recognized by AcaCD revealed potential AcaCD-binding sites upstream of genes involved in the intracellular mobility functions (recombination directionality factor and mobilization genes) in two widespread classes of genomic islands (GIs) phylogenetically unrelated to IncA/C plasmids. The first class, SGI1, confers and propagates multidrug resistance in Salmonella enterica and Proteus mirabilis, whereas MGIVmi1 in Vibrio mimicus belongs to a previously uncharacterized class of GIs. We have demonstrated that through expression of AcaCD, IncA/C plasmids specifically trigger the excision and mobilization of the GIs at high frequencies. This study provides new evidence of the considerable impact of IncA/C plasmids on bacterial genome plasticity through their own mobility and the mobilization of genomic islands.

  11. Comparative Genomics of Rhodococcus equi Virulence Plasmids Indicates Host-Driven Evolution of the vap Pathogenicity Island.

    Science.gov (United States)

    MacArthur, Iain; Anastasi, Elisa; Alvarez, Sonsiray; Scortti, Mariela; Vázquez-Boland, José A

    2017-05-01

    The conjugative virulence plasmid is a key component of the Rhodococcus equi accessory genome essential for pathogenesis. Three host-associated virulence plasmid types have been identified: the equine pVAPA and porcine pVAPB circular variants, and the linear pVAPN found in bovine (ruminant) isolates. We recently characterized the R. equi pangenome (Anastasi E, et al. 2016. Pangenome and phylogenomic analysis of the pathogenic actinobacterium Rhodococcus equi. Genome Biol Evol. 8:3140-3148.) and we report here the comparative analysis of the virulence plasmid genomes. Plasmids within each host-associated type were highly similar despite their diverse origins. Variation was accounted for by scattered single nucleotide polymorphisms and short nucleotide indels, while larger indels, mostly in the plasticity region near the vap pathogenicity island (PAI), defined plasmid genomic subtypes. Only one of the plasmids analyzed, of pVAPN type, was exceptionally divergent due to accumulation of indels in the housekeeping backbone. Each host-associated plasmid type carried a unique PAI differing in vap gene complement, suggesting animal host-specific evolution of the vap multigene family. Complete conservation of the vap PAI was observed within each host-associated plasmid type. Both diversity of host-associated plasmid types and clonality of specific chromosomal-plasmid genomic type combinations were observed within the same R. equi phylogenomic subclade. Our data indicate that the overall strong conservation of the R. equi host-associated virulence plasmids is the combined result of host-driven selection, lateral transfer between strains, and geographical spread due to international livestock exchanges. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

    Directory of Open Access Journals (Sweden)

    Bonten Marc JM

    2010-04-01

    Full Text Available Abstract Background The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromised patients. Results We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates several antibiotic resistance genes were identified, including, in two strains, the vanA transposon that confers resistance to vancomycin. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis, based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA in its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated in the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI), which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come.

  13. Identification of a Genomic Signature Predicting for Recurrence in Early Stage Ovarian Cancer

    Science.gov (United States)

    2015-12-01

    do it. Thus, instead of simply sequencing all the FFPE samples, we used 10 tumor samples (5 recurrent and 5 non-recurrent) to test sequencing and... Award Number: W81XWH-12-1-0521.

  14. Kernel-based whole-genome prediction of complex traits: a review.

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
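The review describes kernel methods as regression tools that can capture non-additive effects, either parametrically or non-parametrically. As one hedged illustration, the sketch below builds a Gaussian kernel from marker genotypes and predicts validation individuals with kernel ridge regression (the same algebra as RKHS regression with a fixed bandwidth). The scaling `exp(-h * d² / m)`, the fixed bandwidth `h`, the known ratio `lam`, and all function names are assumptions for illustration, not the specific kernels compared in the review. Replacing the Gaussian kernel by a linear genomic relationship matrix recovers the GBLUP form.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_kernel(M: Sequence[Sequence[float]], h: float) -> Matrix:
    """K[i][j] = exp(-h * d2(i, j) / m): d2 is the squared Euclidean distance
    between genotype vectors and m the number of markers (one common scaling)."""
    m = len(M[0])
    return [[math.exp(-h * sum((a - b) ** 2 for a, b in zip(xi, xj)) / m)
             for xj in M] for xi in M]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def kernel_ridge_predict(K: Matrix, y: Sequence[float], train: Sequence[int],
                         test: Sequence[int], lam: float) -> List[float]:
    """f_test = K[test, train] (K[train, train] + lam I)^-1 (y_train - mean).

    Phenotypes of `test` individuals are ignored.
    """
    mu = sum(y[i] for i in train) / len(train)
    V = [[K[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(V, [y[i] - mu for i in train])
    return [sum(K[t][j] * a for j, a in zip(train, alpha)) for t in test]
```

In practice the bandwidth `h` and the ratio `lam` would be tuned or estimated, for instance by cross-validation within the reference set; they are fixed here only to keep the sketch short.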

  15. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota Morota

    2014-10-01

    Full Text Available Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  16. The effect of using genealogy-based haplotypes for genomic prediction.

    Science.gov (United States)

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e., the GBLUP method, and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e., a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.
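    The study contrasts regression on haplotype covariates with regression on single markers. Its genealogy-based clustering is not reproduced here. As a simpler stand-in, assuming already-phased 0/1 alleles and fixed, non-overlapping marker windows (both assumptions), the Python sketch below builds haplotype covariates and shows why the number of effects to estimate grows: every distinct haplotype in a window becomes its own column holding 0, 1, or 2 copies. The function name and `window` parameter are hypothetical.

```python
import numpy as np


def haplotype_covariates(phased: np.ndarray, window: int) -> np.ndarray:
    """Build haplotype-count covariates from phased alleles.

    phased: array of shape (n_individuals, 2, n_markers) holding 0/1 alleles,
            one row per parental haplotype.
    window: number of consecutive markers per (hypothetical) haplotype block.
    Returns an (n_individuals, n_covariates) matrix in which each column counts
    the copies (0, 1 or 2) of one distinct block haplotype carried by an individual.
    """
    n, _, m = phased.shape
    columns = []
    for start in range(0, m, window):
        block = phased[:, :, start:start + window]

        def hap(i: int, h: int) -> tuple:
            # Haplotype h of individual i within this block, as a hashable tuple.
            return tuple(int(a) for a in block[i, h])

        # Distinct haplotypes observed in this block, in a stable sorted order.
        observed = sorted({hap(i, h) for i in range(n) for h in range(2)})
        for target in observed:
            columns.append([sum(hap(i, h) == target for h in range(2)) for i in range(n)])
    return np.array(columns, dtype=int).T
```

    Each row sums to twice the number of windows, because every individual carries two haplotypes per block.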

  17. A map to a new treasure island: the human genome and the concept of common heritage.

    Science.gov (United States)

    Byk, C

    1998-06-01

    While the 1970s have been called the environmental years, the 1990s could be seen as the genome years. As the challenge of mapping and sequencing the human genome mobilized the scientific community, the risks and benefits of the resulting information and its uses also raised ethical issues at the international level. The particular interest of the 1997 UNESCO Declaration lies in the fact that it emphasizes both the scientific importance of genetics and the appropriate reinforcement of human rights in this area. It considers the human genome, at least symbolically, as the common heritage of humanity.

  18. Prospects and Potential Uses of Genomic Prediction of Key Performance Traits in Tetraploid Potato

    Directory of Open Access Journals (Sweden)

    Benjamin Stich

    2018-03-01

    Full Text Available Genomic prediction is a routine tool in breeding programs of most major animal and plant species. However, its usefulness for potato breeding has not yet been evaluated in detail. The objectives of this study were to (i) examine the prospects of genomic prediction of key performance traits in a diversity panel of tetraploid potato modeling additive, dominance, and epistatic effects, (ii) investigate the effects of the size and makeup of the training set, the number of test environments, and the number of molecular markers on prediction accuracy, and (iii) assess the effect of including markers from candidate genes on the prediction accuracy. Four prediction methods, genomic best linear unbiased prediction (GBLUP), BayesA, BayesCπ, and Bayesian LASSO, were used for genomic prediction of relative area under disease progress curve after a Phytophthora infestans infection, plant maturity, maturity-corrected resistance, tuber starch content, tuber starch yield (TSY), and tuber yield (TY) of 184 tetraploid potato clones or subsets thereof genotyped with the SolCAP 8.3k SNP array. The cross-validated prediction accuracies with GBLUP and the three Bayesian approaches for the six evaluated traits ranged from about 0.5 to about 0.8. For traits with a high expected genetic complexity, such as TSY and TY, we observed an 8% higher prediction accuracy using a model with additive and dominance effects compared with a model with additive effects only. Our results suggest that for oligogenic traits in general, and when diagnostic markers are available in particular, the use of Bayesian methods for genomic prediction is highly recommended and the diagnostic markers should be modeled as fixed effects. The evaluation of the relative performance of genomic prediction vs. phenotypic selection indicated that the former is superior, assuming cycle lengths and selection intensities that are possible to realize in commercial potato breeding programs.
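    Cross-validated prediction accuracy, as reported above, is usually the correlation between predicted and observed values in held-out folds. A minimal Python sketch, assuming a ridge-regression marker model in the spirit of GBLUP/RR-BLUP (not the study's exact GBLUP or Bayesian implementations) and marker dosages coded 0 to 4 as in a tetraploid. The fold count `k`, the penalty `lam`, and all function names are illustrative assumptions.

```python
import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float):
    """Ridge regression on centered markers; returns (intercept, column means, effects)."""
    center = X.mean(axis=0)
    Xc = X - center
    mu = float(y.mean())
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu))
    return mu, center, beta


def ridge_predict(model, X: np.ndarray) -> np.ndarray:
    mu, center, beta = model
    return mu + (X - center) @ beta


def cv_accuracy(X: np.ndarray, y: np.ndarray, k: int = 5, lam: float = 1.0, seed: int = 0) -> float:
    """Mean Pearson correlation between predictions and observations over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = ridge_fit(X[train_idx], y[train_idx], lam)
        pred = ridge_predict(model, X[test_idx])
        accuracies.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accuracies))
```

    Averaging per-fold correlations is one convention; pooling all out-of-fold predictions before computing a single correlation is another, and the two can differ slightly.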

  19. Destabilization of IncA and IncC plasmids by SGI1 and SGI2 type Salmonella genomic islands.

    Science.gov (United States)

    Harmer, Christopher J; Hamidian, Mohammad; Ambrose, Stephanie J; Hall, Ruth M

    Both the Salmonella genomic islands (SGI) and the conjugative IncC plasmids are known to contribute substantially to the acquisition of resistance to multiple antibiotics, and plasmids in the A/C group are known to mobilize the Salmonella genomic island SGI1, which also carries multiple antibiotic resistance genes. Plasmid pRMH760 (IncC; A/C2) was shown to mobilize SGI1 variants SGI1-I, SGI1-F, SGI1-K and SGI2 from Salmonella enterica to Escherichia coli, where the island integrated at the preferred location, at the end of the trmE (thdF) gene. The plasmid was transferred at a similar frequency. However, we observed that co-transfer of the SGI and the plasmid was rarer. In E. coli to E. coli transfer, the frequency of transfer of the IncC plasmid pRMH760 was at least 1000-fold lower when the donor carried SGI1-I or SGI1-K, indicating that the SGI suppresses transfer of the plasmid. In addition, pRMH760 was rapidly lost from both E. coli and S. enterica strains that also carried SGI1-I, SGI1-F or SGI2. However, plasmid loss was not seen when the SGI1 variant was SGI1-K, which lacks two segments of the SGI1 backbone. The complete sequences of SGI1-I and SGI1-F were determined, and SGI1-K also carries two single-base substitutions relative to SGI1-I. The IncA (A/C1) plasmid RA1 was also shown to mobilize SGI2-A and, though there are significant differences between the backbones of IncA and IncC plasmids, RA1 was also rapidly lost when SGI2-A was present in the same cell. We conclude that there are multiple interactions, both cooperative and antagonistic, between an IncA or IncC plasmid and the SGI1 and SGI2 family genomic islands. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. A seven-gene CpG-island methylation panel predicts breast cancer progression

    International Nuclear Information System (INIS)

    Li, Yan; Melnikov, Anatoliy A.; Levenson, Victor; Guerra, Emanuela; Simeone, Pasquale; Alberti, Saverio; Deng, Youping

    2015-01-01

    DNA methylation regulates gene expression through the inhibition/activation of transcription of methylated/unmethylated genes. Hence, DNA methylation profiling can capture pivotal features of gene expression in cancer tissues from patients at the time of diagnosis. In this work, we analyzed a breast cancer case series to identify DNA methylation determinants of metastatic versus non-metastatic tumors. CpG-island methylation was evaluated on a 56-gene cancer-specific biomarker microarray in metastatic versus non-metastatic breast cancers in a multi-institutional case series of 123 breast cancer patients. Global statistical modeling and unsupervised hierarchical clustering were applied to identify a multi-gene binary classifier with high sensitivity and specificity. Network analysis was used to quantify the connectivity of the identified genes. Seven genes (BRCA1, DAPK1, MSH2, CDKN2A, PGR, PRKCDBP, RANKL) were found informative for prognosis of metastatic diffusion and were used to calculate classifier accuracy versus the entire data-set. Individual-gene performances showed sensitivities of 63–79%, specificities of 53–84%, positive predictive values of 59–83%, and negative predictive values of 63–80%. When modelled together, these seven genes reached a sensitivity of 93%, a specificity of 100%, a positive predictive value of 100%, and a negative predictive value of 93%, with high statistical power. Unsupervised hierarchical clustering independently confirmed these findings, in close agreement with the accuracy measurements. Network analyses indicated a tight interrelationship between the identified genes, suggesting a functionally coordinated module linked to breast cancer progression. Our findings identify CpG-island methylation profiles with deep impact on clinical outcome, paving the way for use as novel prognostic assays in clinical settings. The online version of this article (doi:10.1186/s12885-015-1412-9) contains supplementary
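    The four figures quoted for each gene and for the seven-gene panel all follow from a 2×2 table of predicted versus observed metastatic status. A minimal Python sketch of the standard definitions; the function name is illustrative, and the counts in the usage line are hypothetical, not the study's data.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classifier metrics from confusion-matrix counts."""

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true-positive rate
        "specificity": ratio(tn, tn + fp),  # true-negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the study).
    print(diagnostic_metrics(tp=40, fp=5, tn=45, fn=10))
```

    Unlike sensitivity and specificity, PPV and NPV depend on the proportion of metastatic cases in the series, so they may not transfer directly to a population with a different prevalence.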

  1. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Full Text Available Abstract Background Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be of interest to develop new ab initio algorithms that not only accurately predict genes but also facilitate comparative studies of prokaryotic genomes. Results This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein-coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence, and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure before making the prediction of genes. Conclusion Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.
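    The Entropy Density Profile named above summarizes an ORF's amino-acid composition. A minimal Python sketch of one common formulation, assuming the ORF has already been translated: with composition p_i over the 20 amino acids and Shannon entropy H = -Σ p_i log p_i, component i of the profile is s_i = -p_i log p_i / H, so the components sum to 1. MED 2.0's translation-initiation features and iterative training are not reproduced, and `edp_distance` is a hypothetical helper for comparing an ORF's profile with a reference profile.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_density_profile(protein: str) -> list:
    """Return the 20-component EDP s_i = -p_i * log(p_i) / H of a protein sequence."""
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    probs = [counts[aa] / total for aa in AMINO_ACIDS]
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in probs]
    entropy = sum(terms)
    if entropy == 0.0:
        # Only one amino-acid type is present, so the profile is undefined.
        raise ValueError("zero-entropy composition; EDP is undefined")
    return [t / entropy for t in terms]


def edp_distance(a: list, b: list) -> float:
    """Euclidean distance between two profiles (illustrative comparison only)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

    In a gene finder, such a distance would typically be taken against coding and non-coding reference profiles, with the ORF assigned to the closer class.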

  2. Gene Silencing Triggers Polycomb Repressive Complex 2 Recruitment to CpG Islands Genome Wide

    DEFF Research Database (Denmark)

    Riising, Eva Madi; Vacher-Comet, Itys; Leblanc, Benjamin Olivier

    2014-01-01

    -wide ectopic PRC2 recruitment to endogenous PcG target genes found in other tissues. PRC2 binding analysis shows that it is restricted to nucleosome-free CpG islands (CGIs) of untranscribed genes. Our results show that it is the transcriptional state that governs PRC2 binding, and we propose that it binds...

  3. Methods to compute reliabilities for genomic predictions of feed intake

    Science.gov (United States)

    For new traits without historical reference data, cross-validation is often the preferred method to validate reliability (REL). Time truncation is less useful because few animals gain substantial REL after the truncation point. Accurate cross-validation requires separating genomic gain from pedigree...

  4. Establishing the basis for Genomic Prediction in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Fé, Dario

    2015-01-01

    Genomic Selection (GS) is a relatively new technology, which has already revolutionized animal breeding and which is expected to have a high impact on plant breeding. In contrast to traditional marker-assisted breeding, which focuses only on specific genes, GS estimates the genetic value...

  5. Gene prediction in the fathead minnow [Pimephales promelas] genome

    Science.gov (United States)

    The fathead minnow is a well-established model organism which has been widely used for regulatory ecotoxicity testing and research for over half a century. While much information has been gathered on the organism over the years, the fathead minnow genome, a critical source of infor...

  6. Genomic Prediction Accuracy for Resistance Against Piscirickettsia salmonis in Farmed Rainbow Trout

    Directory of Open Access Journals (Sweden)

    Grazyella M. Yoshida

    2018-02-01

    Full Text Available Salmonid rickettsial syndrome (SRS), caused by the intracellular bacterium Piscirickettsia salmonis, is one of the main diseases affecting rainbow trout (Oncorhynchus mykiss) farming. To accelerate genetic progress, genomic selection methods can be used as an effective approach to control the disease. The aims of this study were: (i) to compare the accuracy of estimated breeding values using pedigree-based best linear unbiased prediction (PBLUP) with genomic BLUP (GBLUP), single-step GBLUP (ssGBLUP), Bayes C, and Bayesian Lasso (LASSO); and (ii) to test the accuracy of genomic prediction and PBLUP using different marker densities (0.5, 3, 10, 20, and 27 K) for resistance against P. salmonis in rainbow trout. Phenotypes were recorded as number of days to death (DD) and binary survival (BS) from 2416 fish challenged with P. salmonis. A total of 1934 fish were genotyped using a 57 K single-nucleotide polymorphism (SNP) array. All genomic prediction methods achieved higher accuracies than PBLUP. The relative increase in accuracy for different genomic models ranged from 28 to 41% for both DD and BS at 27 K SNP. Among the genomic models, the highest relative increase in accuracy was obtained with Bayes C (∼40%), for which 3 K SNP were enough to achieve an accuracy similar to that of 27 K SNP for both traits. For resistance against P. salmonis in rainbow trout, we showed that genomic predictions using GBLUP, ssGBLUP, Bayes C, and LASSO can increase accuracy compared with PBLUP. Moreover, it is possible to use relatively low-density SNP panels for genomic prediction without compromising prediction accuracy for resistance against P. salmonis in rainbow trout.

  7. Interactions of Neuropathogenic Escherichia coli K1 (RS218) and Its Derivatives Lacking Genomic Islands with Phagocytic Acanthamoeba castellanii and Nonphagocytic Brain Endothelial Cells

    Science.gov (United States)

    Yousuf, Farzana Abubakar; Yousuf, Zuhair; Iqbal, Junaid; Siddiqui, Ruqaiyyah; Khan, Hafsa; Khan, Naveed Ahmed

    2014-01-01

    Here we determined the role of various genomic islands in E. coli K1 interactions with phagocytic A. castellanii and nonphagocytic brain microvascular endothelial cells. The findings revealed that the genomic islands deletion mutants of RS218 related to toxins (peptide toxin, α-hemolysin), adhesins (P fimbriae, F17-like fimbriae, nonfimbrial adhesins, Hek, and hemagglutinin), protein secretion system (T1SS for hemolysin), invasins (IbeA, CNF1), metabolism (D-serine catabolism, dihydroxyacetone, glycerol, and glyoxylate metabolism) showed reduced interactions with both A. castellanii and brain microvascular endothelial cells. Interestingly, the deletion of RS218-derived genomic island 21 containing adhesins (P fimbriae, F17-like fimbriae, nonfimbrial adhesins, Hek, and hemagglutinin), protein secretion system (T1SS for hemolysin), invasins (CNF1), metabolism (D-serine catabolism) abolished E. coli K1-mediated HBMEC cytotoxicity in a CNF1-independent manner. Therefore, the characterization of these genomic islands should reveal mechanisms of evolutionary gain for E. coli K1 pathogenicity. PMID:24818136

  8. Interactions of Neuropathogenic Escherichia coli K1 (RS218) and Its Derivatives Lacking Genomic Islands with Phagocytic Acanthamoeba castellanii and Nonphagocytic Brain Endothelial Cells

    Directory of Open Access Journals (Sweden)

    Farzana Abubakar Yousuf

    2014-01-01

    Full Text Available Here we determined the role of various genomic islands in E. coli K1 interactions with phagocytic A. castellanii and nonphagocytic brain microvascular endothelial cells. The findings revealed that the genomic islands deletion mutants of RS218 related to toxins (peptide toxin, α-hemolysin), adhesins (P fimbriae, F17-like fimbriae, nonfimbrial adhesins, Hek, and hemagglutinin), protein secretion system (T1SS for hemolysin), invasins (IbeA, CNF1), metabolism (D-serine catabolism, dihydroxyacetone, glycerol, and glyoxylate metabolism) showed reduced interactions with both A. castellanii and brain microvascular endothelial cells. Interestingly, the deletion of RS218-derived genomic island 21 containing adhesins (P fimbriae, F17-like fimbriae, nonfimbrial adhesins, Hek, and hemagglutinin), protein secretion system (T1SS for hemolysin), invasins (CNF1), metabolism (D-serine catabolism) abolished E. coli K1-mediated HBMEC cytotoxicity in a CNF1-independent manner. Therefore, the characterization of these genomic islands should reveal mechanisms of evolutionary gain for E. coli K1 pathogenicity.

  9. Accuracy of genomic breeding value prediction for intramuscular fat using different genomic relationship matrices in Hanwoo (Korean cattle).

    Science.gov (United States)

    Choi, Taejeong; Lim, Dajeong; Park, Byoungho; Sharma, Aditi; Kim, Jong-Joo; Kim, Sidong; Lee, Seung Hwan

    2017-07-01

    Intramuscular fat is one of the meat quality traits considered in the selection strategies for Hanwoo (Korean cattle). Different methods are used to estimate the breeding values of selection candidates. In the present work, we focused on the accuracy of different genomic relationship matrices, as described by Forni, and of the pedigree-based relationship matrix. The data set included a total of 778 animals genotyped with the BovineSNP50 BeadChip. Among these 778 animals, 72 were sires of the 706 reference animals and were used as a validation dataset. A single-trait animal model (best linear unbiased prediction and genomic best linear unbiased prediction) was used to estimate the breeding values from genomic and pedigree information. The diagonal elements of the pedigree-based matrix were slightly higher than those of the genomic relationship matrices (GRM), while the off-diagonal elements were considerably lower for the GRM. The accuracy of breeding values for the pedigree-based relationship matrix (A) was 0.13, while for the GRM (GOF, G05, and Yang) it was 0.37, 0.45, and 0.38, respectively. Accuracy of GRM was 1.5 times higher than A in this study. Therefore, genomic information will be more beneficial than pedigree information in the Hanwoo breeding program.
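    The genomic relationship matrices compared above are built from SNP genotypes. A minimal Python sketch of VanRaden's first method, G = ZZ' / (2 Σ p_j (1 − p_j)) with Z = M − 2p, assuming genotypes coded 0/1/2 as allele counts. Using observed allele frequencies corresponds to what is commonly called GOF; fixing every frequency at 0.5 is our reading of the G05 variant named in the abstract (an assumption), and the Yang matrix is not shown. The function name and `fixed_freq` parameter are illustrative.

```python
from typing import Optional

import numpy as np


def vanraden_g(M: np.ndarray, fixed_freq: Optional[float] = None) -> np.ndarray:
    """Genomic relationship matrix from an (animals x markers) 0/1/2 genotype matrix."""
    M = np.asarray(M, dtype=float)
    if fixed_freq is None:
        p = M.mean(axis=0) / 2.0  # observed allele frequency per marker
    else:
        p = np.full(M.shape[1], fixed_freq)  # same frequency for every marker
    Z = M - 2.0 * p  # center each marker column by twice its allele frequency
    denom = 2.0 * np.sum(p * (1.0 - p))  # scales G to be comparable to the pedigree matrix A
    return Z @ Z.T / denom
```

    Monomorphic markers contribute nothing to either Z or the denominator under observed frequencies, so they can be left in; with a fixed frequency they still shift Z and therefore affect G.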

  10. Genomic Prediction of Single Crosses in the Early Stages of a Maize Hybrid Breeding Pipeline

    Directory of Open Access Journals (Sweden)

    Dnyaneshwar C. Kadam

    2016-11-01

    Full Text Available Prediction of single-cross performance has been a major goal of plant breeders since the beginning of hybrid breeding. Recently, genomic prediction has been shown to be a promising approach, but only limited studies have examined the accuracy of predicting single-cross performance. Moreover, no studies have examined the potential of predicting single crosses among random inbreds derived from a series of biparental families, which resembles the structure of germplasm comprising the initial stages of a hybrid maize breeding pipeline. The main objectives of this study were to evaluate the potential of genomic prediction for identifying superior single crosses early in the hybrid breeding pipeline and to optimize its application. To accomplish these objectives, we designed and analyzed a novel population of single crosses representing the Iowa Stiff Stalk synthetic/non-Stiff Stalk heterotic pattern commonly used in the development of North American commercial maize hybrids. The performance of single crosses was predicted using parental combining ability and covariance among single crosses. Prediction accuracies were estimated using cross-validation and ranged from 0.28 to 0.77 for grain yield, 0.53 to 0.91 for plant height, and 0.49 to 0.94 for staygreen, depending on the number of tested parents of the single cross and the genomic prediction method used. The genomic estimated general and specific combining abilities showed an advantage over genomic covariances among single crosses when one or both parents of the single cross were untested. Overall, our results suggest that genomic prediction of single crosses in the early stages of a hybrid breeding pipeline holds great potential to redesign hybrid breeding and increase its efficiency.

  11. Cytotoxic chromosomal targeting by CRISPR/Cas systems can reshape bacterial genomes and expel or remodel pathogenicity islands.

    Science.gov (United States)

    Vercoe, Reuben B; Chang, James T; Dy, Ron L; Taylor, Corinda; Gristwood, Tamzin; Clulow, James S; Richter, Corinna; Przybilski, Rita; Pitman, Andrew R; Fineran, Peter C

    2013-04-01

    In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas-mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA-targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity.

  12. Cytotoxic chromosomal targeting by CRISPR/Cas systems can reshape bacterial genomes and expel or remodel pathogenicity islands.

    Directory of Open Access Journals (Sweden)

    Reuben B Vercoe

    2013-04-01

    Full Text Available In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas-mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA-targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity.

  13. Cytotoxic Chromosomal Targeting by CRISPR/Cas Systems Can Reshape Bacterial Genomes and Expel or Remodel Pathogenicity Islands

    Science.gov (United States)

    Vercoe, Reuben B.; Chang, James T.; Dy, Ron L.; Taylor, Corinda; Gristwood, Tamzin; Clulow, James S.; Richter, Corinna; Przybilski, Rita; Pitman, Andrew R.; Fineran, Peter C.

    2013-01-01

    In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas–mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA–targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity. PMID:23637624
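    These abstracts report that the natural self-targeting spacer escaped toxicity because of a single-nucleotide change in the PAM next to its target. A minimal Python sketch of that rule: an exact protospacer match counts as a target only when the expected PAM flanks it. The PAM sequence and the side it sits on are left as parameters because they depend on the CRISPR/Cas subtype; the "GG" used in the checks is illustrative only and is not asserted to be the type I-F PAM of P. atrosepticum. Real interference can also tolerate some mismatches, which this exact-match sketch ignores. All function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T, either case)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_pam_flanked_targets(genome: str, protospacer: str, pam: str, pam_side: str) -> list:
    """Return (strand, start) for exact protospacer matches flanked by `pam`.

    pam_side is "5prime" or "3prime": where the PAM must sit relative to the
    protospacer on the strand being read. Start positions for strand "-" are
    indices into the reverse-complemented genome.
    """
    if pam_side not in ("5prime", "3prime"):
        raise ValueError("pam_side must be '5prime' or '3prime'")
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        start = seq.find(protospacer)
        while start != -1:
            if pam_side == "5prime":
                flank = seq[max(0, start - len(pam)):start]
            else:
                end = start + len(protospacer)
                flank = seq[end:end + len(pam)]
            # A single mismatch in the flank is enough to reject the site.
            if flank == pam:
                hits.append((strand, start))
            start = seq.find(protospacer, start + 1)
    return hits
```

    Under this rule, mutating one PAM base removes the hit even though the protospacer itself still matches perfectly, mirroring the escape described in the abstracts.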

  14. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    The RegPredict web server provides comparative genomics tools for the reconstruction and analysis of microbial regulons. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. The two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.
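    RegPredict relies on positional weight matrices (PWMs) of regulatory motifs. As a generic illustration, not RegPredict's own matrices or scoring, the Python sketch below builds a log-odds PWM from aligned binding sites and scans a sequence for windows scoring at or above a threshold. The pseudocount, the uniform background, and the function names are assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites: list, pseudocount: float = 0.5, background: dict = None) -> list:
    """Log2-odds PWM (one dict per position) from equal-length aligned sites."""
    background = background or {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def scan(seq: str, pwm: list, threshold: float) -> list:
    """Return (start, score) for every window whose summed log-odds >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(c not in BASES for c in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[c] for col, c in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

    Positive scores mean a window looks more like the aligned sites than like background; a practical threshold would be chosen from the score distribution of known sites rather than fixed at zero.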

  15. Genome Replikin Count Predicts Increased Infectivity/Lethality of Viruses

    OpenAIRE

    Samuel Bogoch; Elenore S. Bogoch

    2012-01-01

    The genomes of all groups of viruses whose sequences are listed on PubMed, specimens since 1918, analyzed with software from Bioradar UK Ltd., contain Replikins which range in concentration from a Replikin Count (number of Replikins per 100 amino acids) of less than 1 to 30 (see accompanying communications for higher Counts in tuberculosis, malaria, and cancer, associated with higher lethality). Counts of less than 4.0 were found in ‘resting’ virus states; Counts greater than 4....

  16. Comparative Genomics and Identification of an Enterotoxin-Bearing Pathogenicity Island, SEPI-1/SECI-1, in Staphylococcus epidermidis Pathogenic Strains.

    Science.gov (United States)

    Argemi, Xavier; Nanoukon, Chimène; Affolabi, Dissou; Keller, Daniel; Hansmann, Yves; Riegel, Philippe; Baba-Moussa, Lamine; Prévost, Gilles

    2018-02-25

    Staphylococcus epidermidis is a leading cause of nosocomial infections, is mostly resistant to beta-lactam antibiotics, and may transfer several mobile genetic elements among the members of its own species, as well as to Staphylococcus aureus; however, genetic exchange from S. aureus to S. epidermidis remains controversial. We recently identified two pathogenic clinical strains of S. epidermidis that produce a staphylococcal enterotoxin C3-like (SEC) similar to that encoded by S. aureus pathogenicity islands. This study aimed to determine the genetic environment of the SEC-coding sequence and to identify the mobile genetic elements. Whole-genome sequencing and annotation of the S. epidermidis strains were performed using Illumina technology and a bioinformatics pipeline for assembly, which provided evidence that the SEC-coding sequences were located in a composite pathogenicity island previously described in the S. epidermidis strain FRI909, called SePI-1/SeCI-1, with 83.8–89.7% nucleotide similarity. Various plasmids were also identified, particularly p_3_95 and p_4_95, which carry antibiotic resistance genes (hsrA and dfrG, respectively) and share homologies with SAP085A and pUSA04-2-SUR11, two plasmids described in S. aureus. Finally, one complete prophage, ΦSE90, was identified, sharing 30 out of 52 coding sequences with the Acinetobacter phage vB_AbaM_IME200. Thus, the SePI-1/SeCI-1 pathogenicity island was identified in two pathogenic strains of S. epidermidis that produced an SEC enterotoxin causing septic shock. These findings suggest the existence of in vivo genetic exchange from S. aureus to S. epidermidis.

  17. Comparative Genomics and Identification of an Enterotoxin-Bearing Pathogenicity Island, SEPI-1/SECI-1, in Staphylococcus epidermidis Pathogenic Strains

    Directory of Open Access Journals (Sweden)

    Xavier Argemi

    2018-02-01

    Full Text Available Staphylococcus epidermidis is a leading cause of nosocomial infections, is largely resistant to beta-lactam antibiotics, and may transfer several mobile genetic elements among the members of its own species, as well as to Staphylococcus aureus; however, genetic exchange from S. aureus to S. epidermidis remains controversial. We recently identified two pathogenic clinical strains of S. epidermidis that produce a staphylococcal enterotoxin C3-like toxin (SEC) similar to that encoded by S. aureus pathogenicity islands. This study aimed to determine the genetic environment of the SEC-coding sequence and to identify the mobile genetic elements. Whole-genome sequencing and annotation of the S. epidermidis strains were performed using Illumina technology and a bioinformatics pipeline for assembly, which provided evidence that the SEC-coding sequences were located in a composite pathogenicity island previously described in S. epidermidis strain FRI909, called SePI-1/SeCI-1, with 83.8–89.7% nucleotide similarity. Several plasmids were also identified, particularly p_3_95 and p_4_95, which carry antibiotic resistance genes (hsrA and dfrG, respectively) and share homologies with SAP085A and pUSA04-2-SUR11, two plasmids described in S. aureus. Finally, one complete prophage, ΦSE90, was identified, sharing 30 out of 52 coding sequences with the Acinetobacter phage vB_AbaM_IME200. Thus, the SePI-1/SeCI-1 pathogenicity island was identified in two pathogenic strains of S. epidermidis that produced an SEC enterotoxin causing septic shock. These findings suggest the existence of in vivo genetic exchange from S. aureus to S. epidermidis.

  18. Genomic prediction of starch content and chipping quality in tetraploid potato using genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Sverrisdóttir, Elsa; Byrne, Stephen; Nielsen, Ea Høegh Riis

    2017-01-01

    continue to fall. In this study, we have generated genomic prediction models for starch content and chipping quality in tetraploid potato to facilitate varietal development. Chipping quality was evaluated as the colour of a potato chip after frying following cold-induced sweetening. We used genotyping...... genomic estimated breeding values. Cross-validated prediction correlations of 0.56 and 0.73 were obtained within the training population for starch content and chipping quality, respectively, while correlations were lower when predicting performance in the test panel, at 0.30–0.31 and 0...

  19. Genomic timetree and historical biogeography of Caribbean island ameiva lizards (Pholidoscelis: Teiidae)

    OpenAIRE

    Tucker, Derek B.; Hedges, Stephen Blair; Colli, Guarino R.; Pyron, Robert Alexander; Sites, Jack W.

    2017-01-01

    Abstract The phylogenetic relationships and biogeographic history of Caribbean island ameivas (Pholidoscelis) are not well known because of incomplete sampling, conflicting datasets, and poor support for many clades. Here, we use phylogenomic and mitochondrial DNA datasets to reconstruct a well-supported phylogeny and assess historical colonization patterns in the group. We obtained sequence data from 316 nuclear loci and one mitochondrial marker for 16 of 19 extant species of the Caribbean e...

  20. Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor

    DEFF Research Database (Denmark)

    de los Campos, Gustavo; Vazquez, Ana I; Fernando, Rohan

    2013-01-01

    Despite important advances from Genome-Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR......) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression-type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations....... However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage...
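
    G-BLUP is described above as a ridge-regression-type method. As a minimal pure-Python sketch of that idea, the code below fits ridge regression of phenotypes on centred markers by solving (X'X + lambda*I)b = X'y. Under standard assumptions, with lambda equal to the ratio of residual to marker-effect variance, this marker-effect form yields the same genetic-value predictions as G-BLUP built from the same markers. The function names, toy data and lambda value are illustrative assumptions, not the authors' code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b] so the caller's data are left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(x: Matrix, y: List[float], lam: float) -> Tuple[List[float], float, List[float]]:
    """Return (marker means, phenotype mean, marker effects) from ridge regression."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - means[j] for j in range(p)] for row in x]
    yc = [v - y_mean for v in y]
    # Left-hand side X'X + lam*I and right-hand side X'y on centred data.
    lhs = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return means, y_mean, solve(lhs, rhs)


def predict(model: Tuple[List[float], float, List[float]], row: List[float]) -> float:
    """Predicted genetic value of one individual from its marker codes."""
    means, y_mean, effects = model
    return y_mean + sum((v - mu) * b for v, mu, b in zip(row, means, effects))
```

    In practice lambda is usually derived from variance components estimated from the data (for example by REML) rather than fixed by hand, and with hundreds of thousands of variants one would solve the equivalent n x n G-BLUP equations instead of this p x p system.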

  1. History Shaped the Geographic Distribution of Genomic Admixture on the Island of Puerto Rico

    Science.gov (United States)

    Via, Marc; Gignoux, Christopher R.; Roth, Lindsey A.; Fejerman, Laura; Galanter, Joshua; Choudhry, Shweta; Toro-Labrador, Gladys; Viera-Vera, Jorge; Oleksyk, Taras K.; Beckman, Kenneth; Ziv, Elad; Risch, Neil

    2011-01-01

    Contemporary genetic variation among Latin American human groups reflects population migrations shaped by complex historical, social and economic factors. Consequently, admixture patterns may vary by geographic regions ranging from countries to neighborhoods. We examined the geographic variation of admixture across the island of Puerto Rico and the degree to which it could be explained by historic and social events. We analyzed a census-based sample of 642 Puerto Rican individuals that were genotyped for 93 ancestry informative markers (AIMs) to estimate African, European and Native American ancestry. Socioeconomic status (SES) data and geographic location were obtained for each individual. There was significant geographic variation of ancestry across the island. In particular, African ancestry demonstrated a decreasing East to West gradient that was partially explained by historical factors linked to the colonial sugar plantation system. SES also demonstrated a parallel decreasing cline from East to West. However, at a local level, SES and African ancestry were negatively correlated. European ancestry was strongly negatively correlated with African ancestry and therefore showed patterns complementary to African ancestry. By contrast, Native American ancestry showed little variation across the island and across individuals and appears to have played little social role historically. The observed geographic distributions of SES and genetic variation relate to historical social events and mating patterns, and have substantial implications for the design of studies in the recently admixed Puerto Rican population. More generally, our results demonstrate the importance of incorporating social and geographic data with genetics when studying contemporary admixed populations. PMID:21304981

  2. Apophysomyces variabilis: draft genome sequence and comparison of predictive virulence determinants with other medically important Mucorales.

    Science.gov (United States)

    Prakash, Hariprasath; Rudramurthy, Shivaprakash Mandya; Gandham, Prasad S; Ghosh, Anup Kumar; Kumar, Milner M; Badapanda, Chandan; Chakrabarti, Arunaloke

    2017-09-18

    Apophysomyces species are prevalent in tropical countries, and A. variabilis is the second most frequent agent causing mucormycosis in India. Among Apophysomyces species, A. elegans, A. trapeziformis and A. variabilis are commonly incriminated in human infections. The genome sequences of A. elegans and A. trapeziformis are available in public databases, but that of A. variabilis is not. We therefore sequenced the whole genome of A. variabilis to explore its genomic structure and the genes possibly determining the virulence of the organism. The whole genome of A. variabilis NCCPF 102052 was sequenced, and its genomic structure was compared with the available genome structures of A. elegans, A. trapeziformis and other medically important Mucorales. The total size of the A. variabilis genome assembly was 39.38 Mb, with 12,764 protein-coding genes. Transposable elements (TEs) were scarce in the Apophysomyces genome, and the retrotransposon Ty3-gypsy was the most common TE. Phylogenetically, Apophysomyces species grouped closely with Phycomyces blakesleeanus. OrthoMCL analysis revealed 3,025 orthologous proteins common to the three pathogenic Apophysomyces species. Expansion of multiple gene families/duplication was observed in Apophysomyces genomes. Approximately 6% of Apophysomyces genes were predicted to be associated with virulence by PHI-base analysis. The virulence determinants included the protein families of CotH proteins (invasins), proteases, iron utilisation pathways, siderophores and signal transduction pathways. Serine proteases were the major group of proteases found in all Apophysomyces genomes. Carbohydrate-active enzymes (CAZymes) constitute the majority of the secretory proteins. The present study is the first attempt to sequence and analyze the genomic structure of A. variabilis. Together with the available genome sequences of A. elegans and A. trapeziformis, the study helped to indicate the possible virulence determinants of

  3. Meta-analysis of genome-wide association from genomic prediction models

    Science.gov (United States)

    A limitation of many genome-wide association studies (GWA) in animal breeding is that there are many loci with small effect sizes; thus, larger sample sizes (N) are required to guarantee suitable power of detection. To increase sample size, results from different GWA can be combined in a meta-analys...

  4. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    International Nuclear Information System (INIS)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-01-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society
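
    The model is only outlined in the abstract. The toy simulation below illustrates the general idea under assumed rules: starting from a set of precursor genes, each step duplicates a randomly chosen gene, and with probability p_new the copy is assumed to have diverged enough through point mutations to found a new family; otherwise it stays in its parent's family. The rule, the function name and the parameter values are illustrative assumptions, not the authors' exact model.

```python
import random
from collections import Counter


def simulate_families(n_precursors: int, n_final: int, p_new: float, seed: int = 0) -> Counter:
    """Return a histogram {family size: number of families} after growth by duplication."""
    rng = random.Random(seed)
    # family_of[g] is the family label of gene g; each precursor starts its own family.
    family_of = list(range(n_precursors))
    next_family = n_precursors
    while len(family_of) < n_final:
        parent = rng.randrange(len(family_of))
        if rng.random() < p_new:
            # The duplicate has diverged beyond recognition: it founds a new family.
            family_of.append(next_family)
            next_family += 1
        else:
            # The duplicate remains recognisably homologous to its parent.
            family_of.append(family_of[parent])
    sizes = Counter(family_of)       # family label -> number of genes
    return Counter(sizes.values())   # family size -> number of families
```

    Because each duplication picks its parent uniformly at random, large families gain members faster than small ones (a rich-get-richer effect); comparing the simulated histogram with the observed family-size distribution of a real genome is the kind of check the abstract describes.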

  5. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    Energy Technology Data Exchange (ETDEWEB)

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.

  6. From structure prediction to genomic screens for novel non-coding RNAs

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.

    2011-01-01

    Abstract: Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction....... This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early...... upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other....
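
    As a concrete reference point for the single-sequence folding prediction mentioned above, the sketch below implements the classic Nussinov dynamic programming algorithm, which maximises the number of nested base pairs (Watson-Crick plus G-U wobble) subject to a minimum hairpin-loop length. It is a textbook simplification chosen for brevity; practical folding and screening tools use thermodynamic energy models instead, and nothing here is taken from the reviewed methods.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in an RNA sequence (Nussinov algorithm)."""
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # span = j - i
        for i in range(n - span):
            j = i + span
            # Case 1: position j is unpaired.
            score = best[i][j - 1]
            # Case 2: j pairs with k, leaving at least min_loop unpaired bases between them.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0
```

    For example, nussinov("GGGAAAUCC") finds three stacked pairs (G-C, G-C, G-U) closing a three-base loop. A traceback over the same table would recover the structure itself rather than just the pair count.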

  7. Genomic Prediction of Seed Quality Traits Using Advanced Barley Breeding Lines

    Science.gov (United States)

    Nielsen, Nanna Hellum; Jahoor, Ahmed; Jensen, Jens Due; Orabi, Jihad; Cericola, Fabio; Edriss, Vahid; Jensen, Just

    2016-01-01

    Genomic selection was recently introduced in plant breeding. The objective of this study was to develop genomic prediction for important seed quality parameters in spring barley. The aim was to predict breeding values without expensive phenotyping of large sets of lines. A total of 309 advanced spring barley lines, tested at two locations with three replicates each, were phenotyped, and each line was genotyped with the Illumina iSelect 9K barley chip. The population originated from two different breeding sets, which were phenotyped in two different years. The phenotypic traits considered were seed size, protein content, protein yield, test weight and ergosterol content. A leave-one-out cross-validation strategy revealed high prediction accuracies ranging between 0.40 and 0.83. Prediction across breeding sets resulted in reduced accuracies compared with the leave-one-out strategy. Furthermore, predicting across full- and half-sib families resulted in reduced prediction accuracies. Additionally, predictions were performed using reduced marker sets and reduced training population sets. In conclusion, using fewer than 200 lines in the training set can result in low prediction accuracy, and the accuracy will then be highly dependent on the family structure of the selected training set. However, the results also indicate that relatively small training sets (200 lines) are sufficient for genomic prediction in commercial barley breeding. In addition, our results indicate a minimum marker set of 1,000 markers to decrease the risk of low prediction accuracy for some traits or some families. PMID:27783639
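
    Leave-one-out cross-validation holds out each line in turn, fits the model on the remaining lines, predicts the held-out line, and then summarises accuracy, commonly as the Pearson correlation between predicted and observed values. The sketch below shows only that bookkeeping. To stay short it uses a single-marker least-squares regression as a hypothetical stand-in model, whereas the study fitted genomic prediction models on thousands of markers; the function names are illustrative.

```python
from statistics import mean
from typing import Callable, List, Sequence

Model = Callable[[float], float]


def fit_simple_regression(x: Sequence[float], y: Sequence[float]) -> Model:
    """Least-squares line y = a + b*x (stand-in for a genomic prediction model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda v: a + b * v


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    su = sum((ui - mu) ** 2 for ui in u) ** 0.5
    sv = sum((vi - mv) ** 2 for vi in v) ** 0.5
    return cov / (su * sv)


def loo_accuracy(x: List[float], y: List[float]) -> float:
    """Correlation between leave-one-out predictions and observed values."""
    preds = []
    for i in range(len(y)):
        # Train on every line except line i, then predict line i.
        model = fit_simple_regression(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(model(x[i]))
    return pearson(preds, y)
```

    Swapping fit_simple_regression for any function that returns a predictor leaves the cross-validation loop unchanged; across-set validation, as in the study, would instead hold out a whole breeding set at once.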

  8. Genomic Prediction of Seed Quality Traits Using Advanced Barley Breeding Lines.

    Directory of Open Access Journals (Sweden)

    Nanna Hellum Nielsen

    Full Text Available Genomic selection was recently introduced in plant breeding. The objective of this study was to develop genomic prediction for important seed quality parameters in spring barley. The aim was to predict breeding values without expensive phenotyping of large sets of lines. A total of 309 advanced spring barley lines, tested at two locations with three replicates each, were phenotyped, and each line was genotyped with the Illumina iSelect 9K barley chip. The population originated from two different breeding sets, which were phenotyped in two different years. The phenotypic traits considered were seed size, protein content, protein yield, test weight and ergosterol content. A leave-one-out cross-validation strategy revealed high prediction accuracies ranging between 0.40 and 0.83. Prediction across breeding sets resulted in reduced accuracies compared with the leave-one-out strategy. Furthermore, predicting across full- and half-sib families resulted in reduced prediction accuracies. Additionally, predictions were performed using reduced marker sets and reduced training population sets. In conclusion, using fewer than 200 lines in the training set can result in low prediction accuracy, and the accuracy will then be highly dependent on the family structure of the selected training set. However, the results also indicate that relatively small training sets (200 lines) are sufficient for genomic prediction in commercial barley breeding. In addition, our results indicate a minimum marker set of 1,000 markers to decrease the risk of low prediction accuracy for some traits or some families.

  9. Unraveling the regulatory network of IncA/C plasmid mobilization: When genomic islands hijack conjugative elements.

    Science.gov (United States)

    Carraro, Nicolas; Matteau, Dominick; Burrus, Vincent; Rodrigue, Sébastien

    2015-01-01

    Conjugative plasmids of the A/C incompatibility group (IncA/C) have become substantial players in the dissemination of multidrug resistance. These large conjugative plasmids are characterized by their broad host range, extended spectrum of antimicrobial resistance, and prevalence in enteric bacteria recovered from both environmental and clinical settings. Until recently, relatively little was known about the basic biology of IncA/C plasmids, mostly because multidrug resistance hinders molecular biology experiments. To circumvent this issue, we previously developed pVCR94ΔX, a convenient prototype that codes for a reduced set of antibiotic resistances. Using pVCR94ΔX, we then characterized the regulatory pathway governing IncA/C plasmid dissemination. We found that the expression of roughly two-thirds of the genes encoded by this plasmid, including large operons involved in the conjugation process, depends on an FlhCD-like master activator called AcaCD. Beyond the mobility of IncA/C plasmids, AcaCD was also shown to play a key role in the mobilization of different classes of genomic islands (GIs) identified in various pathogenic bacteria. In this way, IncA/C plasmids can have a considerable impact on bacterial genome plasticity and evolution.

  10. Genome-wide CpG island methylation and intergenic demethylation propensities vary among different tumor sites.

    Science.gov (United States)

    Lee, Seung-Tae; Wiemels, Joseph L

    2016-02-18

    The epigenetic landscape of cancer includes both focal hypermethylation and broader genome-wide hypomethylation. By means of a comprehensive genomic analysis of 6637 tissues from 21 tumor types, we show here that the degrees of overall methylation in CpG islands (CGIs) and of demethylation in intergenic regions, defined as 'backbone', vary widely among tumors. Depending on tumor type, both CGI methylation and backbone demethylation are often associated with clinical, epidemiological and biological features such as age, sex, smoking history, anatomic location, histological type and grade, stage, molecular subtype and biological pathways. We found connections between CGI methylation and hypermutability, microsatellite instability, IDH1 mutation, 19p gain and polycomb features, and between backbone demethylation and chromosomal instability, NSD1 and TP53 mutations, 5q and 19p loss and long repressive domains. These broad epigenetic patterns add a new dimension to our understanding of tumor biology and its clinical implications. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Toward integration of genomic selection with crop modelling: the development of an integrated approach to predicting rice heading dates.

    Science.gov (United States)

    Onogi, Akio; Watanabe, Maya; Mochizuki, Toshihiro; Hayashi, Takeshi; Nakagawa, Hiroshi; Hasegawa, Toshihiro; Iwata, Hiroyoshi

    2016-04-01

    It is suggested that accuracy in predicting plant phenotypes can be improved by integrating genomic prediction with crop modelling in a single hierarchical model. Accurate prediction of phenotypes is important for plant breeding and management. Although genomic prediction/selection aims to predict phenotypes on the basis of whole-genome marker information, it is often difficult to predict phenotypes of complex traits in diverse environments, because plant phenotypes are often influenced by genotype-environment interaction. A possible remedy is to integrate genomic prediction with crop/ecophysiological modelling, which enables us to predict plant phenotypes using environmental and management information. To this end, in the present study, we developed a novel method for integrating genomic prediction with phenological modelling of Asian rice (Oryza sativa L.), allowing the heading date of untested genotypes in untested environments to be predicted. The method simultaneously infers the phenological model parameters and whole-genome marker effects on the parameters in a Bayesian framework. By cultivating backcross inbred lines of Koshihikari × Kasalath in nine environments, we evaluated the potential of the proposed method in comparison with conventional genomic prediction, phenological modelling, and two-step methods that applied genomic prediction to phenological model parameters inferred from Nelder-Mead or Markov chain Monte Carlo algorithms. In predicting heading dates of untested lines in untested environments, the proposed and two-step methods tended to provide more accurate predictions than the conventional genomic prediction methods, particularly in environments where phenotypes from environments similar to the target environment were unavailable for training genomic prediction. The proposed method showed greater accuracy in prediction than the two-step methods in all cross-validation schemes tested, suggesting the potential of the integrated approach in
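
    The integrated approach treats the parameters of a phenological model as traits with their own marker effects. The toy sketch below shows that structure with a deliberately simplified, hypothetical model: each genotype's thermal-time requirement to heading is an intercept plus the sum of its marker effects, and heading occurs on the first day on which accumulated degree-days (daily mean temperature above a base temperature) reach that requirement. The authors' actual phenological model and the Bayesian estimation of the marker effects are not reproduced here; all names and numbers are illustrative assumptions.

```python
from typing import Optional, Sequence


def thermal_requirement(markers: Sequence[int], intercept: float, effects: Sequence[float]) -> float:
    """Hypothetical genotype-specific model parameter: intercept + sum of marker effects."""
    return intercept + sum(m * e for m, e in zip(markers, effects))


def heading_day(temps: Sequence[float], requirement: float, t_base: float = 10.0) -> Optional[int]:
    """1-based day on which accumulated degree-days first reach the requirement."""
    accumulated = 0.0
    for day, t in enumerate(temps, start=1):
        accumulated += max(0.0, t - t_base)
        if accumulated >= requirement:
            return day
    return None  # heading not reached within the supplied weather record
```

    Because the marker effects act on the model parameter rather than on heading date directly, the same genotype-specific requirement can be combined with weather from a new environment; for instance, a requirement of 150 degree-days is reached on day 10 at a constant 25 °C but on day 15 at 20 °C.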

  12. Accuracy of Genomic Prediction in Switchgrass (Panicum virgatum L.) Improved by Accounting for Linkage Disequilibrium

    Directory of Open Access Journals (Sweden)

    Guillaume P. Ramstein

    2016-04-01

    Full Text Available Switchgrass is a relatively high-yielding and environmentally sustainable biomass crop, but further genetic gains in biomass yield must be achieved to make it an economically viable bioenergy feedstock. Genomic selection (GS) is an attractive technology to generate rapid genetic gains in switchgrass and to meet the goals of a substantial displacement of petroleum use with biofuels in the near future. In this study, we empirically assessed prediction procedures for genomic selection in two different populations, consisting of 137 and 110 half-sib families of switchgrass, tested in two locations in the United States for three agronomic traits: dry matter yield, plant height, and heading date. Marker data were produced for the families’ parents by exome capture sequencing, generating up to 141,030 polymorphic markers with available genomic-location and annotation information. We evaluated prediction procedures that varied not only by learning schemes and prediction models, but also by the way the data were preprocessed to account for redundancy in marker information. More complex genomic prediction procedures were generally not significantly more accurate than the simplest procedure, likely due to limited population sizes. Nevertheless, a highly significant gain in prediction accuracy was achieved by transforming the marker data through a marker correlation matrix. Our results suggest that marker-data transformations and, more generally, accounting for linkage disequilibrium among markers offer valuable opportunities for improving prediction procedures in GS. Some of the achieved prediction accuracies should motivate implementation of GS in switchgrass breeding programs.

  13. Gene prediction and RFX transcriptional regulation analysis using comparative genomics

    OpenAIRE

    Chu, Jeffrey Shih Chieh

    2011-01-01

    Regulatory Factor X (RFX) is a family of transcription factors (TF) that is conserved in all metazoans, in some fungi, and in only a few unicellular organisms. Seven members are found in mammals, nine in fishes, three in fruit flies, and a single member in nematodes and fungi. RFX is involved in many different roles in humans, but a particular function that is conserved in many metazoans is its regulation of ciliogenesis. Probing over 150 genomes for the presence of RFX and ciliary genes ...

  14. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome

    Directory of Open Access Journals (Sweden)

    McCarthy Fiona M

    2007-11-01

    Full Text Available Abstract Background The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal-derived food protein. However, genomic sequence data are, in themselves, of limited value; generally they are not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available are poorly annotated, both structurally and functionally, and many genes do not have standard nomenclature assigned. Results We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology), we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO) functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally supported predicted proteins were further annotated by assigning standardized nomenclature and functional annotation to them. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and
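
    Orthology-based annotation transfer, as described above, amounts to copying the GO terms of a protein's human or mouse ortholog onto the chicken protein and keeping only the terms that are new. The sketch below uses made-up protein identifiers and placeholder GO term labels purely for illustration; it assumes a one-to-one ortholog map and ignores evidence codes and one-to-many orthology, which a real pipeline would have to handle.

```python
from typing import Dict, Set


def transfer_go(
    orthologs: Dict[str, str],
    ortholog_go: Dict[str, Set[str]],
    existing_go: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per target protein, only the GO terms newly transferred from its ortholog."""
    new_terms: Dict[str, Set[str]] = {}
    for target, ortholog in orthologs.items():
        # Terms on the ortholog that the target protein does not already carry.
        transferred = ortholog_go.get(ortholog, set()) - existing_go.get(target, set())
        if transferred:
            new_terms[target] = transferred
    return new_terms
```

    Summing the sizes of the returned sets gives the number of new annotations, which is the kind of count behind a reported percentage increase over existing annotations.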

  15. Genome-wide association study, genomic prediction and marker-assisted selection for seed weight in soybean (Glycine max).

    Science.gov (United States)

    Zhang, Jiaoping; Song, Qijian; Cregan, Perry B; Jiang, Guo-Liang

    2016-01-01

    Twenty-two loci for soybean SW and candidate genes conditioning seed development were identified; and prediction accuracies of GS and MAS were estimated through cross-validation and validation with unrelated populations. Soybean (Glycine max) is a major crop for plant protein and oil production, and seed weight (SW) is important for yield and quality in food/vegetable uses of soybean. However, our knowledge of genes controlling SW remains limited. To better understand the molecular mechanism underlying the trait and explore marker-based breeding approaches, we conducted a genome-wide association study in a population of 309 soybean germplasm accessions using 31,045 single nucleotide polymorphisms (SNPs), and estimated the prediction accuracy of genomic selection (GS) and marker-assisted selection (MAS) for SW. Twenty-two loci of minor effect associated with SW were identified, including hotspots on Gm04 and Gm19. The mixed model containing these loci explained 83.4% of phenotypic variation. Candidate genes with Arabidopsis orthologs conditioning SW were also proposed. The prediction accuracies of GS and MAS by cross-validation were 0.75-0.87 and 0.62-0.75, respectively, depending on the number of SNPs used and the size of training population. GS also outperformed MAS when the validation was performed using unrelated panels across a wide range of maturities, with an average prediction accuracy of 0.74 versus 0.53. This study convincingly demonstrated that soybean SW is controlled by numerous minor-effect loci. It greatly enhances our understanding of the genetic basis of SW in soybean and facilitates the identification of genes controlling the trait. It also suggests that GS holds promise for accelerating soybean breeding progress. The results are helpful for genetic improvement and genomic prediction of yield in soybean.

  16. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    Science.gov (United States)

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
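
    kmer-SVM represents each sequence by counts of short words (k-mers) and trains an SVM on those features. The sketch below covers only the feature side, under simple assumptions: it counts every k-mer, merges each k-mer with its reverse complement so a site scores the same on either strand, skips windows containing ambiguous bases, and computes a linear spectrum kernel as the dot product of two count vectors. The web server's exact feature definition and its SVM training are not reproduced, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Counts of canonical k-mers in seq, skipping windows with N or other symbols."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def spectrum_kernel(a: Counter, b: Counter) -> int:
    """Linear kernel: dot product of two k-mer count vectors."""
    return sum(count * b[kmer] for kmer, count in a.items())
```

    For instance, kmer_spectrum("ACGT", 2) yields AC twice (GT is folded into its reverse complement AC) and CG once. A kernel matrix built from spectrum_kernel over positive and negative sequences is what a kernel SVM would then be trained on.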

  17. Pan-Genome Analysis of Human Gastric Pathogen H. pylori: Comparative Genomics and Pathogenomics Approaches to Identify Regions Associated with Pathogenicity and Prediction of Potential Core Therapeutic Targets

    DEFF Research Database (Denmark)

    Ali, Amjad; Naz, Anam; Soares, Siomar C.

    2015-01-01

    -genome approach; the predicted conserved gene families (1,193) constitute approximately 77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. Total of 28 nonhost....... Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan...
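
    In pan-genome analysis, gene families present in every genome form the core genome, families present in only some genomes form the accessory genome, and families found in a single genome are strain-specific. The minimal sketch below computes that partition from presence/absence sets, plus the core share of an average genome, using made-up strain and family names. Clustering genes into families, the hard part, is assumed to have been done already.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def partition_pangenome(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene families into (core, accessory, unique) sets."""
    n = len(genomes)
    occurrences = Counter(f for families in genomes.values() for f in families)
    core = {f for f, c in occurrences.items() if c == n}
    unique = {f for f, c in occurrences.items() if c == 1 and n > 1}
    accessory = set(occurrences) - core - unique
    return core, accessory, unique


def core_fraction(genomes: Dict[str, Set[str]]) -> float:
    """Number of core families divided by the mean number of families per genome."""
    core, _, _ = partition_pangenome(genomes)
    mean_size = sum(len(f) for f in genomes.values()) / len(genomes)
    return len(core) / mean_size
```

    A figure such as "conserved families constitute ~77% of the average genome" corresponds to core_fraction computed over all analysed genomes.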

  18. Landscape genomic prediction for restoration of a Eucalyptus foundation species under climate change.

    Science.gov (United States)

    Supple, Megan Ann; Bragg, Jason G; Broadhurst, Linda M; Nicotra, Adrienne B; Byrne, Margaret; Andrew, Rose L; Widdup, Abigail; Aitken, Nicola C; Borevitz, Justin O

    2018-04-24

    As species face rapid environmental change, we can build resilient populations through restoration projects that incorporate predicted future climates into seed sourcing decisions. Eucalyptus melliodora is a foundation species of a critically endangered community in Australia that is a target for restoration. We examined genomic and phenotypic variation to make empirically based recommendations for seed sourcing. We examined isolation by distance and isolation by environment, determining high levels of gene flow extending for 500 km and correlations with climate and soil variables. Growth experiments revealed extensive phenotypic variation both within and among sampling sites, but no site-specific differentiation in phenotypic plasticity. Model predictions suggest that seed can be sourced broadly across the landscape, providing ample diversity for adaptation to environmental change. Application of our landscape genomic model to E. melliodora restoration projects can identify genomic variation suitable for predicted future climates, thereby increasing the long-term probability of successful restoration. © 2018, Supple et al.

  19. SPOCS: Software for Predicting and Visualizing Orthology/Paralogy Relationships Among Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Curtis, Darren S.; Phillips, Aaron R.; Callister, Stephen J.; Conlan, Sean; McCue, Lee Ann

    2013-10-15

    At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and, in addition, HTML files that provide a visualization of the predicted ortholog/paralog relationships, onto which gene/protein expression metadata may be overlaid. AVAILABILITY AND IMPLEMENTATION: A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required.
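
    Many graph-based ortholog predictors start from reciprocal best hits (RBH) between pairs of genomes. Under RBH, two genes are linked when each is the other's top-scoring match. The abstract does not detail SPOCS's algorithm, so treat this as a generic sketch of that building block rather than SPOCS's method. The input is assumed to be precomputed similarity scores (for example BLAST bit scores), and the function names and gene IDs are hypothetical.

```python
def best_hits(scores):
    """Best-scoring subject for each query.

    scores: dict mapping (query_id, subject_id) -> similarity score for one
    search direction. Ties keep the first pair encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Gene pairs (a, b) where each gene is the other's best hit."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


# Hypothetical bit scores for genome A vs genome B and the reverse search.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 80.0, ("a2", "b2"): 190.0, ("a3", "b1"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b1", "a3"): 118.0, ("b2", "a2"): 188.0, ("b2", "a1"): 79.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

    Gene a3's best hit is b1, but b1's best hit is a1, so a3 is left unpaired. Graph-based methods typically examine such leftover genes further, for example as candidate paralogs.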

  20. Model training across multiple breeding cycles significantly improves genomic prediction accuracy in rye (Secale cereale L.).

    Science.gov (United States)

    Auinger, Hans-Jürgen; Schönleben, Manfred; Lehermeier, Christina; Schmidt, Malthe; Korzun, Viktor; Geiger, Hartwig H; Piepho, Hans-Peter; Gordillo, Andres; Wilde, Peer; Bauer, Eva; Schön, Chris-Carolin

    2016-11-01

    Genomic prediction accuracy can be significantly increased by model calibration across multiple breeding cycles as long as selection cycles are connected by common ancestors. In hybrid rye breeding, application of genome-based prediction is expected to increase selection gain because of long selection cycles in population improvement and development of hybrid components. Essentially two prediction scenarios arise: (1) prediction of the genetic value of lines from the same breeding cycle in which model training is performed and (2) prediction of lines from subsequent cycles. It is the latter from which a reduction in cycle length and consequently the strongest impact on selection gain is expected. We empirically investigated genome-based prediction of grain yield, plant height and thousand kernel weight within and across four selection cycles of a hybrid rye breeding program. Prediction performance was assessed using genomic and pedigree-based best linear unbiased prediction (GBLUP and PBLUP). A total of 1040 S2 lines were genotyped with 16k SNPs and each year testcrosses of 260 S2 lines were phenotyped in seven or eight locations. The performance gap between GBLUP and PBLUP increased significantly for all traits when model calibration was performed on aggregated data from several cycles. Prediction accuracies obtained from cross-validation were on the order of 0.70 for all traits when data from all cycles (N_CS = 832) were used for model training and exceeded within-cycle accuracies in all cases. As long as selection cycles are connected by a sufficient number of common ancestors and prediction accuracy has not reached a plateau when increasing sample size, aggregating data from several preceding cycles is recommended for predicting genetic values in subsequent cycles despite decreasing relatedness over time.
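
    The sketch below roughly illustrates what GBLUP computes. It builds a genomic relationship matrix G with VanRaden's first method from 0/1/2 SNP codes. It then predicts genomic values as u = G(G + λI)^(-1)(y - ȳ), where λ = σe²/σg².

    The sketch makes these simplifying assumptions:
    - The variance ratio is known.
    - The overall mean is replaced by the sample mean.
    - There is no testcross or location structure.

    A real analysis estimates fixed effects and variance components jointly, for example by REML. The function names and toy data are hypothetical.

```python
def vanraden_g(genotypes):
    """VanRaden method-1 genomic relationship matrix from 0/1/2 SNP codes."""
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    p = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Centre genotype codes by twice the allele frequency.
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gblup(g, y, lam):
    """u_hat = G (G + lam*I)^-1 (y - mean(y)); lam = sigma_e^2 / sigma_g^2."""
    n = len(y)
    ybar = sum(y) / n
    lhs = [[g[i][k] + (lam if i == k else 0.0) for k in range(n)] for i in range(n)]
    alpha = solve(lhs, [yi - ybar for yi in y])
    return [sum(g[i][k] * alpha[k] for k in range(n)) for i in range(n)]


genotypes = [[0, 2], [1, 1], [2, 0]]  # hypothetical: 3 lines x 2 SNPs
g = vanraden_g(genotypes)
u_hat = gblup(g, [1.0, 2.0, 3.0], lam=1.0)  # approximately [-0.8, 0.0, 0.8]
```

    Replacing G with the pedigree relationship matrix A in `gblup` turns the same calculation into PBLUP. That is the contrast the abstract reports.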

  1. Methods to improve genomic prediction and GWAS using combined Holstein populations

    DEFF Research Database (Denmark)

    Li, Xiujin

    The thesis focuses on methods to improve GWAS and genomic prediction using combined Holstein populations and investigates genotype-by-environment (G by E) interaction. The conclusions are: 1) Prediction reliabilities for Brazilian Holsteins can be increased by adding Nordic and French genotyped bulls, and a large G by E...... interaction exists between populations. 2) Combining data from Chinese and Danish Holstein populations increases the power of GWAS and detects new QTL regions for milk fatty acid traits. 3) The novel multi-trait Bayesian model efficiently estimates region-specific genomic variances, covariances...

  2. Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.

    Science.gov (United States)

    Zhou, Yao; Vales, M Isabel; Wang, Aoxue; Zhang, Zhiwu

    2017-09-01

    Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates the correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula. © The Author 2016. Published by Oxford University Press. All rights reserved.
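
    The two accuracy definitions described above differ only in when the correlation is taken. Instant accuracy averages one Pearson correlation per fold. Hold accuracy pools every fold's predictions and computes a single correlation. The minimal sketch below contrasts the two on a tiny hypothetical two-fold data set where they disagree sharply. The abstract does not give the modified formula that corrects Instant accuracy, so it is not reproduced. The function names are illustrative.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def instant_accuracy(folds):
    """Mean of per-fold correlations; each fold is (observed, predicted)."""
    return sum(pearson(obs, pred) for obs, pred in folds) / len(folds)


def hold_accuracy(folds):
    """One correlation after pooling the predictions from every fold."""
    observed = [v for obs, _ in folds for v in obs]
    predicted = [v for _, pred in folds for v in pred]
    return pearson(observed, predicted)


# Hypothetical two-fold example.
folds = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # fold 1: r = 1
    ([4.0, 5.0, 6.0], [6.0, 5.0, 4.0]),  # fold 2: r = -1
]
print(instant_accuracy(folds))  # 0.0
print(hold_accuracy(folds))     # 0.7714285714285715 (= 13.5 / 17.5)
```

    Pooling lets differences in fold means contribute to the Hold correlation. Averaging per-fold correlations throws that between-fold information away. This is one reason the two numbers can diverge.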

  3. Genomic prediction in a nuclear population of layers using single-step models.

    Science.gov (United States)

    Yan, Yiyuan; Wu, Guiqin; Liu, Aiqiao; Sun, Congjiao; Han, Wenpeng; Li, Guangqi; Yang, Ning

    2018-02-01

    Single-step genomic prediction has been proposed to improve the accuracy of genomic prediction by incorporating information from both genotyped and ungenotyped animals. The objective of this study was to compare the prediction performance of single-step models with 2-step models and pedigree-based models in a nuclear population of layers. A total of 1,344 chickens across 4 generations were genotyped with a 600K SNP chip. Four traits were analyzed, i.e., body weight at 28 wk (BW28), egg weight at 28 wk (EW28), laying rate at 38 wk (LR38), and Haugh unit at 36 wk (HU36). In predicting offspring, individuals from generations 1 to 3 were used as training data and females from generation 4 were used as the validation set. The accuracies of predicted breeding values by pedigree BLUP (PBLUP), genomic BLUP (GBLUP), SSGBLUP and single-step blending (SSBlending) were compared for both genotyped and ungenotyped individuals. For genotyped females, GBLUP performed no better than PBLUP because of the small size of the training data, while the 2 single-step models predicted more accurately than the PBLUP model. The average predictive abilities of SSGBLUP and SSBlending were 16.0% and 10.8% higher than that of the PBLUP model across traits, respectively. Furthermore, the predictive abilities for ungenotyped individuals were also enhanced. The average improvements in predictive ability were 5.9% and 1.5% for the SSGBLUP and SSBlending models, respectively. It was concluded that single-step models, especially the SSGBLUP model, can yield more accurate prediction of genetic merits and are preferable for practical implementation of genomic selection in layers. © 2017 Poultry Science Association Inc.
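
    The abstract gives no formulas. The following is the widely used formulation of SSGBLUP, offered as background rather than as this study's exact implementation. The inverse pedigree relationship matrix A^(-1) in the PBLUP mixed-model equations is replaced by the inverse of a combined matrix H. Ungenotyped animals are ordered first and genotyped animals last. A_22 is the pedigree relationship block among genotyped animals and G is the genomic relationship matrix. Whether G was scaled or blended with A_22 before inversion in this study is not stated. SSBlending is a different approach and is not shown.

```latex
H^{-1} = A^{-1} +
\begin{bmatrix}
0 & 0 \\
0 & G^{-1} - A_{22}^{-1}
\end{bmatrix}
```

    Because only the genotyped block is modified, ungenotyped animals still gain information from genotyped relatives through the pedigree. This is consistent with the improved predictive ability for ungenotyped individuals reported above.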

  4. Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions.

    Directory of Open Access Journals (Sweden)

    2005-08-01

    Protein interaction networks are an important part of the post-genomic effort to integrate a part-list view of the cell into system-level understanding. Using a set of 11 yeast genomes we show that combining comparative genomics and secondary structure information greatly increases consensus-based prediction of SH3 targets. Benchmarking of our method against positive and negative standards gave 83% accuracy with 26% coverage. The concept of an optimal divergence time for effective comparative genomics studies was analyzed, demonstrating that genomes of species that diverged very recently from Saccharomyces cerevisiae (S. mikatae, S. bayanus, and S. paradoxus), or a long time ago (Neurospora crassa and Schizosaccharomyces pombe), contain less information for accurate prediction of SH3 targets than species within the optimal divergence time proposed. We also show here that intrinsically disordered SH3 domain targets are more probable sites of interaction than equivalent sites within ordered regions. Our findings highlight several novel S. cerevisiae SH3 protein interactions, the value of selection of optimal divergence times in comparative genomics studies, and the importance of intrinsic disorder for protein interactions. Based on our results we propose novel roles for the S. cerevisiae proteins Abp1p in endocytosis and Hse1p in endosome protein sorting.

  5. Comparative genomics and disorder prediction identify biologically relevant SH3 protein interactions.

    Directory of Open Access Journals (Sweden)

    Pedro Beltrao

    2005-08-01

  6. Multiple-Trait Genomic Selection Methods Increase Genetic Value Prediction Accuracy

    Science.gov (United States)

    Jia, Yi; Jannink, Jean-Luc

    2012-01-01

    Genetic correlations between quantitative traits measured in many breeding programs are pervasive. These correlations indicate that measurements of one trait carry information on other traits. Current single-trait (univariate) genomic selection does not take advantage of this information. Multivariate genomic selection on multiple traits could accomplish this but has been little explored and tested in practical breeding programs. In this study, three multivariate linear models (i.e., GBLUP, BayesA, and BayesCπ) were presented and compared to univariate models using simulated and real quantitative traits controlled by different genetic architectures. We also extended BayesA with fixed hyperparameters to a full hierarchical model that estimated hyperparameters and BayesCπ to impute missing phenotypes. We found that optimal marker-effect variance priors depended on the genetic architecture of the trait so that estimating them was beneficial. We showed that the prediction accuracy for a low-heritability trait could be significantly increased by multivariate genomic selection when a correlated high-heritability trait was available. Further, multiple-trait genomic selection had higher prediction accuracy than single-trait genomic selection when phenotypes are not available on all individuals and traits. Additional factors affecting the performance of multiple-trait genomic selection were explored. PMID:23086217

  7. The stealth episome: suppression of gene expression on the excised genomic island PPHGI-1 from Pseudomonas syringae pv. phaseolicola.

    Directory of Open Access Journals (Sweden)

    Scott A C Godfrey

    2011-03-01

    Pseudomonas syringae pv. phaseolicola is the causative agent of halo blight in the common bean, Phaseolus vulgaris. P. syringae pv. phaseolicola race 4 strain 1302A contains the avirulence gene avrPphB (syn. hopAR1), which resides on PPHGI-1, a 106 kb genomic island. Loss of PPHGI-1 from P. syringae pv. phaseolicola 1302A following exposure to the hypersensitive resistance response (HR) leads to the evolution of strains with altered virulence. Here we have used fluorescent protein reporter systems to gain insight into the mobility of PPHGI-1. Confocal imaging of the dual-labelled P. syringae pv. phaseolicola 1302A strain F532 (dsRFP in the chromosome and eGFP in PPHGI-1) revealed loss of PPHGI-1::eGFP-encoded fluorescence during plant infection and when grown in vitro on extracted leaf apoplastic fluids. Fluorescence-activated cell sorting (FACS) of fluorescent and non-fluorescent PPHGI-1::eGFP F532 populations showed that cells lost fluorescence not only when the GI was deleted, but also when it had excised and was present as a circular episome. In addition to reduced expression of eGFP, quantitative PCR on sub-populations separated by FACS showed that transcription of other genes on PPHGI-1 (avrPphB and xerC) was also greatly reduced in F532 cells harbouring the excised PPHGI-1::eGFP episome. Our results show how virulence determinants located on mobile pathogenicity islands may be hidden from detection by host surveillance systems through the suppression of gene expression in the episomal state.

  8. Prediction of Land Use Change in Long Island Sound Watersheds Using Nighttime Light Data

    Directory of Open Access Journals (Sweden)

    Ruiting Zhai

    2016-12-01

    The Long Island Sound Watersheds (LISW) are experiencing significant land use/cover change (LUCC), which affects the environment and ecosystems in the watersheds through water pollution, carbon emissions, and loss of wildlife. LUCC modeling is an important approach to understanding what has happened in the landscape and what may change in the future. Moreover, prospective modeling can provide sustainable and efficient decision support for land planning and environmental management. This paper modeled the LUCCs between 1996, 2001 and 2006 in the LISW in the New England region, which experienced an increase in developed area and a decrease in forest. The low-density development pattern played an important role in the loss of forest and the expansion of urban areas. The key driving forces were distance to developed areas, distance to roads, and socioeconomic drivers, such as nighttime light intensity and population density. In addition, this paper compared and evaluated two integrated LUCC models: the logistic regression–Markov chain model and the multi-layer perceptron–Markov chain (MLP–MC) model. Both models achieved high accuracy in prediction, but the MLP–MC model performed slightly better. Finally, a land use map for 2026 was predicted by using the MLP–MC model, and it indicates the continued loss of forest and increase of developed area.
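
    Both integrated models above share a Markov-chain component. This component estimates class-to-class transition probabilities from two dated maps and applies them to project future class quantities. The sketch below covers only that component; the spatial allocation done by the logistic regression or MLP is not shown. The landscape, class labels and function names are hypothetical. Suppose a matrix were estimated over a five-year interval such as 2001 to 2006. Reaching 2026 from 2006 would then take four projection steps.

```python
def estimate_transition_matrix(before, after, classes):
    """Row-normalized transition proportions from paired per-cell class labels.

    before, after: class labels of the same cells at two dates.
    Classes never observed at the first date are treated as persistent.
    """
    index = {c: i for i, c in enumerate(classes)}
    k = len(classes)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(before, after):
        counts[index[a]][index[b]] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            matrix.append([1.0 if j == i else 0.0 for j in range(k)])
    return matrix


def project(shares, matrix, steps):
    """Apply the transition matrix `steps` times to a vector of class shares."""
    k = len(shares)
    for _ in range(steps):
        shares = [sum(shares[i] * matrix[i][j] for i in range(k)) for j in range(k)]
    return shares


# Hypothetical 20-cell landscape observed at two dates one interval apart.
classes = ["developed", "forest", "other"]
before = ["developed"] * 4 + ["forest"] * 12 + ["other"] * 4
after = ["developed"] * 6 + ["forest"] * 10 + ["developed"] + ["other"] * 3
P = estimate_transition_matrix(before, after, classes)
shares_t1 = [after.count(c) / len(after) for c in classes]
shares_future = project(shares_t1, P, steps=4)
```

    In this toy matrix developed land never reverts. The forest share therefore shrinks by a factor of 10/12 at every step, from 0.5 to about 0.24 after four steps. This mirrors the one-way forest-to-development trend the abstract describes.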

  9. Involvement of β-carbonic anhydrase (β-CA) genes in bacterial genomic islands and horizontal transfer to protists.

    Science.gov (United States)

    Zolfaghari Emameh, Reza; Barker, Harlan R; Hytönen, Vesa P; Parkkila, Seppo

    2018-05-25

    Genomic islands (GIs) are a type of mobile genetic element (MGE) that are present in bacterial chromosomes. They consist of a cluster of genes which produce proteins that contribute to a variety of functions, including, but not limited to, regulation of cell metabolism, anti-microbial resistance, pathogenicity, virulence, and resistance to heavy metals. The genes carried in MGEs can be used as a trait reservoir in times of adversity. Transfer of genes using MGEs, occurring outside of reproduction, is called horizontal gene transfer (HGT). Previous literature has shown that numerous HGT events have occurred through endosymbiosis between prokaryotes and eukaryotes. Beta carbonic anhydrase (β-CA) enzymes play a critical role in the biochemical pathways of many prokaryotes and eukaryotes. We have previously suggested horizontal transfer of β-CA genes from plasmids of some prokaryotic endosymbionts to their protozoan hosts. In this study, we set out to identify β-CA genes that might have transferred between prokaryotic and protist species through HGT in GIs. Therefore, we investigated prokaryotic chromosomes containing β-CA-encoding GIs and utilized multiple bioinformatics tools to reveal the distinct movements of β-CA genes among a wide variety of organisms. Our results identify the presence of β-CA genes in GIs of several medically and industrially relevant bacterial species, and phylogenetic analyses reveal multiple cases of likely horizontal transfer of β-CA genes from GIs of ancestral prokaryotes to protists. IMPORTANCE The evolutionary process is mediated by mobile genetic elements (MGEs), such as genomic islands (GIs). A gene or set of genes in the GIs are exchanged between and within various species through horizontal gene transfer (HGT). Based on the crucial role that GIs can play in bacterial survival and proliferation, they were introduced as the environmental- and pathogen-associated factors. Carbonic anhydrases (CAs) are involved in many critical

  10. Prediction of Cacao (Theobroma cacao) Resistance to Moniliophthora spp. Diseases via Genome-Wide Association Analysis and Genomic Selection.

    Science.gov (United States)

    McElroy, Michel S; Navarro, Alberto J R; Mustiga, Guiliana; Stack, Conrad; Gezan, Salvador; Peña, Geover; Sarabia, Widem; Saquicela, Diego; Sotomayor, Ignacio; Douglas, Gavin M; Migicovsky, Zoë; Amores, Freddy; Tarqui, Omar; Myles, Sean; Motamayor, Juan C

    2018-01-01

    Cacao (Theobroma cacao) is a globally important crop, and its yield is severely restricted by disease. Two of the most damaging diseases, witches' broom disease (WBD) and frosty pod rot disease (FPRD), are caused by a pair of related fungi: Moniliophthora perniciosa and Moniliophthora roreri, respectively. Resistant cultivars are the most effective long-term strategy to address Moniliophthora diseases, but efficiently generating resistant and productive new cultivars will require robust methods for screening germplasm before field testing. Marker-assisted selection (MAS) and genomic selection (GS) provide two potential avenues for predicting the performance of new genotypes, potentially increasing the selection gain per unit time. To test the effectiveness of these two approaches, we performed a genome-wide association study (GWAS) and GS on three related populations of cacao in Ecuador genotyped with a 15K single nucleotide polymorphism (SNP) microarray for three measures of WBD infection (vegetative broom, cushion broom, and chirimoya pod), one of FPRD (monilia pod) and two productivity traits (total fresh weight of pods and % healthy pods produced). GWAS yielded several SNPs associated with disease resistance in each population, but none were significantly correlated with the same trait in other populations. Genomic selection, using one population as a training set to estimate the phenotypes of the remaining two (composed of different families), varied among traits, from a mean prediction accuracy of 0.46 (vegetative broom) to 0.15 (monilia pod), and varied between training populations. Simulations demonstrated that selecting seedlings using GWAS markers alone generates no improvement over selecting at random, but that GS improves the selection process significantly. Our results suggest that the GWAS markers discovered here are not sufficiently predictive across diverse germplasm to be useful for MAS, but that using all markers in a GS framework holds

  11. Prediction of Cacao (Theobroma cacao) Resistance to Moniliophthora spp. Diseases via Genome-Wide Association Analysis and Genomic Selection

    Directory of Open Access Journals (Sweden)

    Michel S. McElroy

    2018-03-01

  12. Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding

    Science.gov (United States)

    de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.

    2013-01-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
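
    Ridge regression (RR-BLUP) is one of the parametric WGR models such overviews cover. It is shown here as a representative example, not as a summary of the article. Let n be the number of individuals, p the number of markers, and X the n × p centred marker matrix. Treat μ as known for brevity. The two algebraically equivalent forms of the estimator show why the large-p, small-n case stays tractable: the second form needs only an n × n solve.

```latex
\begin{aligned}
y &= \mathbf{1}\mu + X\beta + \varepsilon,
  \qquad \beta \sim N(0, \sigma_\beta^2 I_p),\quad \varepsilon \sim N(0, \sigma_e^2 I_n) \\
\hat\beta &= (X^\top X + \lambda I_p)^{-1} X^\top (y - \mathbf{1}\mu)
           = X^\top (X X^\top + \lambda I_n)^{-1} (y - \mathbf{1}\mu),
  \qquad \lambda = \sigma_e^2 / \sigma_\beta^2
\end{aligned}
```

    The predicted genomic values are X\hat\beta = X X^\top (X X^\top + \lambda I_n)^{-1}(y - \mathbf{1}\mu). With centred X, X X^\top is proportional to a genomic relationship matrix G. Marker-based ridge regression and GBLUP therefore give the same predictions once λ is rescaled accordingly.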

  13. RNA 3D modules in genome-wide predictions of RNA 2D structure

    DEFF Research Database (Denmark)

    Theis, Corinna; Zirbel, Craig L; Zu Siederdissen, Christian Höner

    2015-01-01

    Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution. These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is whether the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D......

  14. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery

    DEFF Research Database (Denmark)

    Hickey, John M.; Chiurugwi, Tinashe; Mackay, Ian

    2017-01-01

    The rate of annual yield increases for major staple crops must more than double relative to current levels in order to feed a predicted global population of 9 billion by 2050. Controlled hybridization and selective breeding have been used for centuries to adapt plant and animal species for human use. However, achieving higher, sustainable rates of improvement in yields in various species will require renewed genetic interventions and dramatic improvement of agricultural practices. Genomic prediction of breeding values has the potential to improve selection, reduce costs and provide a platform that unifies breeding approaches, biological discovery, and tools and methods. Here we compare and contrast some animal and plant breeding approaches to make a case for bringing the two together through the application of genomic selection. We propose a strategy for the use of genomic selection as a unifying......

  15. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery.

    Science.gov (United States)

    Hickey, John M; Chiurugwi, Tinashe; Mackay, Ian; Powell, Wayne

    2017-08-30

    The rate of annual yield increases for major staple crops must more than double relative to current levels in order to feed a predicted global population of 9 billion by 2050. Controlled hybridization and selective breeding have been used for centuries to adapt plant and animal species for human use. However, achieving higher, sustainable rates of improvement in yields in various species will require renewed genetic interventions and dramatic improvement of agricultural practices. Genomic prediction of breeding values has the potential to improve selection, reduce costs and provide a platform that unifies breeding approaches, biological discovery, and tools and methods. Here we compare and contrast some animal and plant breeding approaches to make a case for bringing the two together through the application of genomic selection. We propose a strategy for the use of genomic selection as a unifying approach to deliver innovative 'step changes' in the rate of genetic gain at scale.

  16. A probabilistic model to predict clinical phenotypic traits from genome sequencing.

    Science.gov (United States)

    Chen, Yun-Ching; Douville, Christopher; Wang, Cheng; Niknafs, Noushin; Yeo, Grace; Beleva-Guthrie, Violeta; Carter, Hannah; Stenson, Peter D; Cooper, David N; Li, Biao; Mooney, Sean; Karchin, Rachel

    2014-09-01

    Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with an area under the ROC curve (AUC) > 0.7, and 23 (15.8%) of these were statistically significant, based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with growth of publicly available genomics data and model refinement by domain experts.
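
    The evaluation above rests on the AUC and a permutation test. The sketch below shows both for a dichotomous phenotype; the Bayesian model itself is not reproduced. AUC is computed as the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen control, with ties counted as one half. The permutation scheme, which shuffles case/control labels, is an assumption; the abstract does not describe the authors' exact procedure. Function names and data are hypothetical.

```python
import random


def auc(scores, labels):
    """P(score of a random case > score of a random control); ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def permutation_p_value(scores, labels, permutations=999, seed=0):
    """One-sided p-value for the AUC under random relabelling of cases/controls."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)


# Hypothetical predicted probabilities for six individuals (1 = has the phenotype).
probs = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]
status = [1, 1, 1, 0, 0, 0]
print(auc(probs, status))  # 0.8888888888888888 (8 of 9 case-control pairs ordered correctly)
```

    With only six individuals, even a high AUC is not significant: about 1 in 10 random labellings reaches it. This is why permutation tests matter when judging AUC > 0.7 on small phenotype groups.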

  17. Sequence-Based Characterization of Tn5801-Like Genomic Islands in Tetracycline-Resistant Staphylococcus pseudintermedius and Other Gram-positive Bacteria from Humans and Animals

    DEFF Research Database (Denmark)

    de Vries, Lisbeth Elvira; Hasman, Henrik; Jurado Rabadán, Sonia

    2016-01-01

    Antibiotic resistance in pathogens is often associated with mobile genetic elements, such as genomic islands (GI) including integrative and conjugative elements (ICEs). These can transfer resistance genes within and between bacteria from humans and/or animals. The aim of this study was to investi......-like GIs appear to be relatively common in tetracycline-resistant S. pseudintermedius in Denmark. Almost identical Tn5801-like GIs were identified in different Gram-positive species of pet and human origin, suggesting that horizontal transfer of these elements has occurred between S. pseudintermedius...

  18. Evaluation of genome-enabled selection for bacterial cold water disease resistance using progeny performance data in Rainbow Trout: Insights on genotyping methods and genomic prediction models

    Science.gov (United States)

    Bacterial cold water disease (BCWD) causes significant economic losses in salmonid aquaculture, and traditional family-based breeding programs aimed at improving BCWD resistance have been limited to exploiting only between-family variation. We used genomic selection (GS) models to predict genomic br...

  19. Joint Genomic Prediction of Canine Hip Dysplasia in UK and US Labrador Retrievers

    Directory of Open Access Journals (Sweden)

    Stefan M. Edwards

    2018-03-01

    Canine hip dysplasia, a debilitating orthopedic disorder that leads to osteoarthritis and cartilage degeneration, is common in several large-sized dog breeds and shows moderate heritability, suggesting that selection can reduce prevalence. Estimating genomic breeding values requires large reference populations, which are expensive to genotype for development of genomic prediction tools. Combining datasets from different countries could be an option to help build larger reference datasets without incurring extra genotyping costs. Our objective was to evaluate genomic prediction based on a combination of UK and US datasets of genotyped dogs with records of Norberg angle scores, related to canine hip dysplasia. Prediction accuracies using a single population were 0.179 and 0.290 for 1,179 UK and 242 US Labrador Retrievers, respectively. Prediction accuracies changed to 0.189 and 0.260, with an increased bias of genomic breeding values, when using a joint training set (biased upwards for the US population and downwards for the UK population). Our results show that in this study of canine hip dysplasia, little or no benefit was gained from using a joint training set as compared to using a single population as training set. We attribute this to differences in the genetic background of the two populations as well as the small sample size of the US dataset.

  20. Genomic prediction based on data from three layer lines using non-linear regression models

    NARCIS (Netherlands)

    Huang, H.; Windig, J.J.; Vereijken, A.; Calus, M.P.L.

    2014-01-01

    Background - Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. Methods - In an attempt to alleviate

  1. Performance of genomic prediction within and across generations in maritime pine

    NARCIS (Netherlands)

    Bartholomé, Jérôme; Heerwaarden, Van Joost; Isik, Fikret; Boury, Christophe; Vidal, Marjorie; Plomion, Christophe; Bouffier, Laurent

    2016-01-01

    Background: Genomic selection (GS) is a promising approach for decreasing breeding cycle length in forest trees. Assessment of progeny performance and of the prediction accuracy of GS models over generations is therefore a key issue. Results: A reference population of maritime pine (Pinus

  2. Genome-enabled predictions for binomial traits in sugar beet populations.

    Science.gov (United States)

    Biscarini, Filippo; Stevanato, Piergiorgio; Broccanello, Chiara; Stella, Alessandra; Saccomani, Massimo

    2014-07-22

    Genomic information can be used to predict not only continuous but also categorical (e.g. binomial) traits. Several traits of interest in human medicine and agriculture present a discrete distribution of phenotypes (e.g. disease status). Root vigor in sugar beet (B. vulgaris) is an example of a binomial trait of agronomic importance. In this paper, a panel of 192 SNPs (single nucleotide polymorphisms) was used to genotype 124 sugar beet individual plants from 18 lines, and to classify them as showing "high" or "low" root vigor. A threshold model was used to fit the relationship between binomial root vigor and SNP genotypes, through the matrix of genomic relationships between individuals in a genomic BLUP (G-BLUP) approach. From a 5-fold cross-validation scheme, 500 testing subsets were generated. The estimated average cross-validation error rate was 0.000731 (0.073%). Only 9 out of 12326 test observations (500 replicates for an average test set size of 24.65) were misclassified. The estimated prediction accuracy was quite high. Such accurate predictions may be related to the high estimated heritability for root vigor (0.783) and to the few genes with large effect underlying the trait. Despite the sparse SNP panel, there was sufficient within-scaffold LD where SNPs with large effect on root vigor were located to allow for genome-enabled predictions to work.
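    The G-BLUP approach above rests on a matrix of genomic relationships between individuals. A minimal Python sketch of how such a matrix can be built from SNP genotypes follows, assuming VanRaden's first method and 0/1/2 allele-count coding (the abstract states neither). The function name and the three-individual panel are hypothetical, and the threshold model fitted on top of G is not shown.

```python
def genomic_relationship_matrix(genotypes):
    """Return a VanRaden-style genomic relationship matrix G = ZZ' / (2 * sum p(1 - p)).

    genotypes: one list per individual of SNP codes 0/1/2 (copies of the
    counted allele). The 0/1/2 coding is an assumption of this sketch.
    Monomorphic SNPs should be removed first, or the scale factor can be zero.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Centre each genotype by twice the allele frequency: Z = M - 2p.
    z = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    # Scale so that G is on a scale comparable to a pedigree relationship matrix.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


# Three hypothetical individuals genotyped at two SNPs.
G = genomic_relationship_matrix([[0, 2], [1, 1], [2, 0]])
print(G)
```

    In G-BLUP this G takes the place of the pedigree-based relationship matrix in the mixed model, so individuals that share more alleles than expected borrow more information from each other.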

  3. Neural Network Prediction of Translation Initiation Sites in Eukaryotes: Perspectives for EST and Genome analysis

    DEFF Research Database (Denmark)

    Pedersen, Anders Gorm; Nielsen, Henrik

    1997-01-01

    Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role. This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known...

  4. Across Breed QTL Detection and Genomic Prediction in French and Danish Dairy Cattle Breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Guldbrandtsen, Bernt; Hozé, C

    Our objective was to investigate the potential benefits of using sequence data to improve across breed genomic prediction, using data from five French and Danish dairy cattle breeds. First, QTL for protein yield were detected using high density genotypes. Part of the QTL detected within breed was...

  5. Genome-Wide Polygenic Scores Predict Reading Performance throughout the School Years

    Science.gov (United States)

    Selzam, Saskia; Dale, Philip S.; Wagner, Richard K.; DeFries, John C.; Cederlöf, Martin; O'Reilly, Paul F.; Krapohl, Eva; Plomin, Robert

    2017-01-01

    It is now possible to create individual-specific genetic scores, called genome-wide polygenic scores (GPS). We used a GPS for years of education ("EduYears") to predict reading performance assessed at UK National Curriculum Key Stages 1 (age 7), 2 (age 12) and 3 (age 14) and on reading tests administered at ages 7 and 12 in a UK sample…

  6. Genome-Wide Prediction of the Performance of Three-Way Hybrids in Barley

    Directory of Open Access Journals (Sweden)

    Zuo Li

    2017-03-01

    Full Text Available Predicting the grain yield performance of three-way hybrids is challenging. Three-way crosses are relevant for hybrid breeding in barley (Hordeum vulgare L.) and maize (Zea mays L.) adapted to East Africa. The main goal of our study was to implement and evaluate genome-wide prediction approaches of the performance of three-way hybrids using data of single-cross hybrids for a scenario in which parental lines of the three-way hybrids originate from three genetically distinct subpopulations. We extended the ridge regression best linear unbiased prediction (RRBLUP) and devised a genomic selection model allowing for subpopulation-specific marker effects (GSA-RRBLUP: general and subpopulation-specific additive RRBLUP). Using an empirical barley data set, we showed that applying GSA-RRBLUP tripled the prediction ability of three-way hybrids from 0.095 to 0.308 compared with RRBLUP, modeling one additive effect for all three subpopulations. The experimental findings were further substantiated with computer simulations. Our results emphasize the potential of GSA-RRBLUP to improve genome-wide hybrid prediction of three-way hybrids for scenarios of genetically diverse parental populations. Because of the advantages of the GSA-RRBLUP model in dealing with hybrids from different parental populations, it may also be a promising approach to boost the prediction ability for hybrid breeding programs based on genetically diverse heterotic groups.
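    RRBLUP, which the abstract extends, shrinks every marker effect toward zero with a single ridge penalty. The sketch below is a plain ridge regression solved through the normal equations (X'X + λI)β = X'y on centred data. The function names, the fixed penalty `lam` (in practice usually derived from variance components, e.g. σ²e/σ²β) and the toy data are assumptions, and the subpopulation-specific terms of GSA-RRBLUP are not included.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rrblup(x, y, lam):
    """Ridge-regression marker effects with an unpenalized intercept.

    Centring markers and phenotypes lets the intercept equal mean(y);
    lam is the (assumed known) ridge penalty shared by all markers.
    """
    n, p = len(x), len(x[0])
    xbar = [sum(row[j] for row in x) / n for j in range(p)]
    xc = [[row[j] - xbar[j] for j in range(p)] for row in x]
    mu = sum(y) / n
    yc = [v - mu for v in y]
    lhs = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    rhs = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return mu, xbar, solve(lhs, rhs)


def predict(mu, xbar, beta, markers):
    """Genomic prediction for one new individual's marker vector."""
    return mu + sum(b * (m - xb) for b, m, xb in zip(beta, markers, xbar))


# Hypothetical noise-free toy data: y = 1 + 2*m1 + 3*m2.
markers = [[0, 0], [1, 0], [0, 1], [1, 1]]
phenotypes = [1.0, 3.0, 4.0, 6.0]
mu, xbar, beta_ols = rrblup(markers, phenotypes, lam=0.0)
_, _, beta_ridge = rrblup(markers, phenotypes, lam=1.0)
print(beta_ols, beta_ridge, predict(mu, xbar, beta_ols, [1, 1]))
```

    With `lam=0` the effects are recovered exactly; raising the penalty shrinks both effects toward zero, which is what stabilises RRBLUP when markers far outnumber individuals.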

  7. Sharing reference data and including cows in the reference population improve genomic predictions in Danish Jersey.

    Science.gov (United States)

    Su, G; Ma, P; Nielsen, U S; Aamand, G P; Wiggans, G; Guldbrandtsen, B; Lund, M S

    2016-06-01

    Small reference populations limit the accuracy of genomic prediction in numerically small breeds, such as Danish Jersey. The objective of this study was to investigate two approaches to improve genomic prediction by increasing the size of the reference population in Danish Jersey. The first approach was to include North American Jersey bulls in the Danish Jersey reference population. The second was to genotype cows and use them as reference animals. The validation of genomic prediction was carried out on bulls and cows, respectively. In validation on bulls, about 300 Danish bulls (depending on traits) born in 2005 and later were used as validation data, and the reference populations were: (1) about 1050 Danish bulls, (2) about 1050 Danish bulls and about 1150 US bulls. In validation on cows, about 3000 Danish cows from 87 young half-sib families were used as validation data, and the reference populations were: (1) about 1250 Danish bulls, (2) about 1250 Danish bulls and about 1150 US bulls, (3) about 1250 Danish bulls and about 4800 cows, (4) about 1250 Danish bulls, 1150 US bulls and 4800 Danish cows. A genomic best linear unbiased prediction model was used to predict breeding values. De-regressed proofs were used as response variables. In the validation on bulls for eight traits, the joint DK-US bull reference population led to higher reliability of genomic prediction than the DK bull reference population for six traits, but not for fertility and longevity. Averaged over the eight traits, the gain was 3 percentage points. In the validation on cows for six traits (fertility and longevity were not available), the gain from inclusion of US bulls in the reference population was 6.6 percentage points on average over the six traits, and the gain from inclusion of cows was 8.2 percentage points. However, the gains from cows and US bulls were not cumulative. The total gain of including both US bulls and Danish cows was 10.5 percentage points. The results indicate that sharing reference

  8. Structure of a short-chain dehydrogenase/reductase (SDR) within a genomic island from a clinical strain of Acinetobacter baumannii

    Energy Technology Data Exchange (ETDEWEB)

    Shah, Bhumika S., E-mail: bhumika.shah@mq.edu.au; Tetu, Sasha G. [Macquarie University, Research Park Drive, Sydney, NSW 2109 (Australia); Harrop, Stephen J. [University of New South Wales, Sydney, NSW 2052 (Australia); Paulsen, Ian T.; Mabbutt, Bridget C. [Macquarie University, Research Park Drive, Sydney, NSW 2109 (Australia)

    2014-09-25

    The structure of a short-chain dehydrogenase encoded within genomic islands of A. baumannii strains has been solved to 2.4 Å resolution. This classical SDR incorporates a flexible helical subdomain. The NADP-binding site and catalytic side chains are identified. Over 15% of the genome of an Australian clinical isolate of Acinetobacter baumannii occurs within genomic islands. An uncharacterized protein encoded within one island feature common to this and other International Clone II strains has been studied by X-ray crystallography. The 2.4 Å resolution structure of SDR-WM99c reveals it to be a new member of the classical short-chain dehydrogenase/reductase (SDR) superfamily. The enzyme contains a nucleotide-binding domain and, like many other SDRs, is tetrameric in form. The active site contains a catalytic tetrad (Asn117, Ser146, Tyr159 and Lys163) and water molecules occupying the presumed NADP cofactor-binding pocket. An adjacent cleft is capped by a relatively mobile helical subdomain, which is well positioned to control substrate access.

  9. Structure of a short-chain dehydrogenase/reductase (SDR) within a genomic island from a clinical strain of Acinetobacter baumannii

    International Nuclear Information System (INIS)

    Shah, Bhumika S.; Tetu, Sasha G.; Harrop, Stephen J.; Paulsen, Ian T.; Mabbutt, Bridget C.

    2014-01-01

    The structure of a short-chain dehydrogenase encoded within genomic islands of A. baumannii strains has been solved to 2.4 Å resolution. This classical SDR incorporates a flexible helical subdomain. The NADP-binding site and catalytic side chains are identified. Over 15% of the genome of an Australian clinical isolate of Acinetobacter baumannii occurs within genomic islands. An uncharacterized protein encoded within one island feature common to this and other International Clone II strains has been studied by X-ray crystallography. The 2.4 Å resolution structure of SDR-WM99c reveals it to be a new member of the classical short-chain dehydrogenase/reductase (SDR) superfamily. The enzyme contains a nucleotide-binding domain and, like many other SDRs, is tetrameric in form. The active site contains a catalytic tetrad (Asn117, Ser146, Tyr159 and Lys163) and water molecules occupying the presumed NADP cofactor-binding pocket. An adjacent cleft is capped by a relatively mobile helical subdomain, which is well positioned to control substrate access.

  10. iPat: intelligent prediction and association tool for genomic research.

    Science.gov (United States)

    Chen, Chunpeng James; Zhang, Zhiwu

    2018-06-01

    The ultimate goal of genomic research is to effectively predict phenotypes from genotypes so that medical management can improve human health and molecular breeding can increase agricultural production. Genomic prediction or selection (GS) plays a complementary role to genome-wide association studies (GWAS), which is the primary method to identify genes underlying phenotypes. Unfortunately, most computing tools cannot perform data analyses for both GWAS and GS. Furthermore, the majority of these tools are executed through a command-line interface (CLI), which requires programming skills. Non-programmers struggle to use them efficiently because of the steep learning curves and zero tolerance for data formats and mistakes when inputting keywords and parameters. To address these problems, this study developed a software package, named the Intelligent Prediction and Association Tool (iPat), with a user-friendly graphical user interface. With iPat, GWAS or GS can be performed using a pointing device to simply drag and/or click on graphical elements to specify input data files, choose input parameters and select analytical models. Models available to users include those implemented in third party CLI packages such as GAPIT, PLINK, FarmCPU, BLINK, rrBLUP and BGLR. Users can choose any data format and conduct analyses with any of these packages. File conversions are automatically conducted for specified input data and selected packages. A GWAS-assisted genomic prediction method was implemented to perform genomic prediction using any GWAS method such as FarmCPU. iPat was written in Java for adaptation to multiple operating systems including Windows, Mac and Linux. The iPat executable file, user manual, tutorials and example datasets are freely available at http://zzlab.net/iPat. zhiwu.zhang@wsu.edu.

  11. From structure prediction to genomic screens for novel non-coding RNAs.

    Science.gov (United States)

    Gorodkin, Jan; Hofacker, Ivo L

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure consists of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure-preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
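    To make single-sequence folding prediction concrete, here is a sketch of the Nussinov base-pair maximization algorithm, a simpler dynamic-programming precursor of the energy-directed methods the review describes. It counts Watson-Crick and G-U wobble pairs instead of summing nearest-neighbour free energies. The minimum hairpin loop of three unpaired bases and all function names are assumptions of this sketch.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Return (max number of base pairs, dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span; spans <= min_loop cannot close a hairpin.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])          # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)      # i pairs with j
            for k in range(i + 1, j):                        # split into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    structure = ["."] * n
    _traceback(seq, dp, 0, n - 1, structure, min_loop)
    return dp[0][n - 1], "".join(structure)


def _traceback(seq, dp, i, j, structure, min_loop):
    """Recover one optimal structure by retracing the DP choices."""
    if j - i <= min_loop:
        return
    if dp[i][j] == dp[i + 1][j]:
        _traceback(seq, dp, i + 1, j, structure, min_loop)
    elif dp[i][j] == dp[i][j - 1]:
        _traceback(seq, dp, i, j - 1, structure, min_loop)
    elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
        structure[i], structure[j] = "(", ")"
        _traceback(seq, dp, i + 1, j - 1, structure, min_loop)
    else:
        for k in range(i + 1, j):
            if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                _traceback(seq, dp, i, k, structure, min_loop)
                _traceback(seq, dp, k + 1, j, structure, min_loop)
                break


print(nussinov("GGGAAAUCC"))
```

    The hairpin test sequence folds into a three-pair stem (two G-C pairs and one G-U wobble) closing an AAA loop. Energy-based folders replace the "+1 per pair" rule with stacking and loop energies, which is why they generally predict real structures more accurately.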

  12. A Bayesian method and its variational approximation for prediction of genomic breeding values in multiple traits

    Directory of Open Access Journals (Sweden)

    Hayashi Takeshi

    2013-01-01

    Full Text Available Abstract Background Genomic selection is an effective tool for animal and plant breeding, allowing effective individual selection without phenotypic records through the prediction of genomic breeding value (GBV). To date, genomic selection has focused on a single trait. However, actual breeding often targets multiple correlated traits, and, therefore, joint analysis taking into consideration the correlation between traits, which might result in more accurate GBV prediction than analyzing each trait separately, is suitable for multi-trait genomic selection. This would require an extension of the prediction model for single-trait GBV to the multi-trait case. As the computational burden of multi-trait analysis is even higher than that of single-trait analysis, an effective computational method for constructing a multi-trait prediction model is also needed. Results We described a Bayesian regression model incorporating variable selection for jointly predicting GBVs of multiple traits and devised both an MCMC iteration and a variational approximation for Bayesian estimation of parameters in this multi-trait model. The proposed Bayesian procedures with MCMC iteration and variational approximation were referred to as MCBayes and varBayes, respectively. Using simulated datasets of SNP genotypes and phenotypes for three traits with high and low heritabilities, we compared the accuracy in predicting GBVs between multi-trait and single-trait analyses as well as between MCBayes and varBayes. The results showed that, compared to single-trait analysis, multi-trait analysis enabled much more accurate GBV prediction for low-heritability traits correlated with high-heritability traits, by utilizing the correlation structure between traits, while the prediction accuracy for uncorrelated low-heritability traits with multi-trait analysis was comparable to or lower than that with single-trait analysis, depending on the setting for prior probability that a SNP has zero

  13. Irruptive dynamics of introduced caribou on Adak Island, Alaska: an evaluation of Riney-Caughley model predictions

    Science.gov (United States)

    Ricca, Mark A.; Van Vuren, Dirk H.; Weckerly, Floyd W.; Williams, Jeffrey C.; Miles, A. Keith

    2014-01-01

    Large mammalian herbivores introduced to islands without predators are predicted to undergo irruptive population and spatial dynamics, but only a few well-documented case studies support this paradigm. We used the Riney-Caughley model as a framework to test predictions of irruptive population growth and spatial expansion of caribou (Rangifer tarandus granti) introduced to Adak Island in the Aleutian archipelago of Alaska in 1958 and 1959. We utilized a time series of spatially explicit counts conducted on this population intermittently over a 54-year period. Population size increased from 23 released animals to approximately 2900 animals in 2012. Population dynamics were characterized by two distinct periods of irruptive growth separated by a long time period of relative stability, and the catalyst for the initial irruption was more likely related to annual variation in hunting pressure than weather conditions. An unexpected pattern resembling logistic population growth occurred between the peak of the second irruption in 2005 and the next survey conducted seven years later in 2012. Model simulations indicated that an increase in reported harvest alone could not explain the deceleration in population growth, yet high levels of unreported harvest combined with increasing density-dependent feedbacks on fecundity and survival were the most plausible explanation for the observed population trend. No studies of introduced island Rangifer have measured a time series of spatial use to the extent described in this study. Spatial use patterns during the post-calving season strongly supported Riney-Caughley model predictions, whereby high-density core areas expanded outwardly as population size increased. During the calving season, caribou displayed marked site fidelity across the full range of population densities despite availability of other suitable habitats for calving. Finally, dispersal and reproduction on neighboring Kagalaska Island represented a new dispersal front

  14. Genomic prediction in contrast to a genome-wide association study in explaining heritable variation of complex growth traits in breeding populations of Eucalyptus.

    Science.gov (United States)

    Müller, Bárbara S F; Neves, Leandro G; de Almeida Filho, Janeo E; Resende, Márcio F R; Muñoz, Patricio R; Dos Santos, Paulo E T; Filho, Estefano Paludzyszyn; Kirst, Matias; Grattapaglia, Dario

    2017-07-11

    The advent of high-throughput genotyping technologies coupled to genomic prediction methods established a new paradigm to integrate genomics and breeding. We carried out whole-genome prediction and contrasted it to a genome-wide association study (GWAS) for growth traits in breeding populations of Eucalyptus benthamii (n = 505) and Eucalyptus pellita (n = 732). Both species are of increasing commercial interest for the development of germplasm adapted to environmental stresses. Predictive ability reached 0.16 in E. benthamii and 0.44 in E. pellita for diameter growth. Predictive abilities using either Genomic BLUP or different Bayesian methods were similar, suggesting that growth adequately fits the infinitesimal model. Genomic prediction models using ~5000-10,000 SNPs provided predictive abilities equivalent to using all 13,787 and 19,506 SNPs genotyped in the E. benthamii and E. pellita populations, respectively. No difference was detected in predictive ability when different sets of SNPs were utilized, based on position (equidistantly genome-wide, inside genes, linkage disequilibrium pruned or on single chromosomes), as long as the total number of SNPs used was above ~5000. Predictive abilities obtained by removing relatedness between training and validation sets fell near zero for E. benthamii and were halved for E. pellita. These results corroborate the current view that relatedness is the main driver of genomic prediction, although some short-range historical linkage disequilibrium (LD) was likely captured for E. pellita. A GWAS identified only one significant association for volume growth in E. pellita, illustrating the fact that while genome-wide regression is able to account for large proportions of the heritability, very little or none of it is captured into significant associations using GWAS in breeding populations of the size evaluated in this study. This study provides further experimental data supporting positive prospects of using genome-wide data to

  15. Extensions of Island Biogeography Theory predict the scaling of functional trait composition with habitat area and isolation.

    Science.gov (United States)

    Jacquet, Claire; Mouillot, David; Kulbicki, Michel; Gravel, Dominique

    2017-02-01

    The Theory of Island Biogeography (TIB) predicts how area and isolation influence species richness equilibrium on insular habitats. However, the TIB remains silent about functional trait composition and provides no information on the scaling of functional diversity with area, an observation that is now documented in many systems. To fill this gap, we develop a probabilistic approach to predict the distribution of a trait as a function of habitat area and isolation, extending the TIB beyond the traditional species-area relationship. We compare model predictions to the body-size distribution of piscivorous and herbivorous fishes found on tropical reefs worldwide. We find that small and isolated reefs have a higher proportion of large-sized species than large and connected reefs. We also find that knowledge of species body-size and trophic position improves the predictions of fish occupancy on tropical reefs, supporting both the allometric and trophic theory of island biogeography. The integration of functional ecology into island biogeography is broadly applicable to any functional traits and provides a general probabilistic approach to study the scaling of trait distribution with habitat area and isolation. © 2016 John Wiley & Sons Ltd/CNRS.
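    For reference, the classic species-richness equilibrium that this study extends can be written under the common simplifying assumption of linear rates. With a mainland pool of P species, S species on the island, a per-species colonization rate c (which falls with isolation) and a per-species extinction rate e (which falls with area), richness settles where colonization balances extinction. This is the textbook species-richness form only, not the authors' probabilistic trait model.

```latex
\frac{dS}{dt} = c\,(P - S) - e\,S = 0
\quad\Longrightarrow\quad
S^{*} = \frac{c\,P}{c + e}
```

    Larger, less isolated habitats (higher c, lower e) therefore hold more species at equilibrium; the abstract's contribution is to predict which trait values make up that equilibrium set.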

  16. Integrating Crop Growth Models with Whole Genome Prediction through Approximate Bayesian Computation.

    Directory of Open Access Journals (Sweden)

    Frank Technow

    Full Text Available Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across-environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (G×E) continue to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the impact of functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of G×E and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with synthetic data sets. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising and novel approach to improving prediction accuracy for some of the most challenging scenarios in plant breeding and applied genetics.
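    The core idea of approximate Bayesian computation can be shown with its simplest variant, rejection sampling. The idea is to draw a parameter from the prior, run the simulator, and keep the draw only if the simulated data fall close to the observations. In the Python sketch below, `toy_growth_model` is a hypothetical one-parameter stand-in for a crop growth model, and the prior range, tolerance `epsilon` and Euclidean distance are assumptions. The study estimated many whole-genome marker effects and may have used a different ABC scheme.

```python
import random
import statistics


def toy_growth_model(effect, env, noise_sd, rng):
    """Hypothetical stand-in for a crop growth model: yield rises with a
    genotype effect but saturates, scaled by an environment index."""
    return env * effect / (1.0 + effect) + rng.gauss(0.0, noise_sd)


def abc_rejection(observed, envs, prior_low, prior_high, n_draws, epsilon, rng):
    """Keep prior draws whose simulated phenotypes land within epsilon
    (Euclidean distance) of the observed phenotypes."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)          # draw from a uniform prior
        simulated = [toy_growth_model(theta, e, 0.1, rng) for e in envs]
        distance = sum((s - o) ** 2 for s, o in zip(simulated, observed)) ** 0.5
        if distance < epsilon:
            accepted.append(theta)                          # approximate posterior sample
    return accepted


rng = random.Random(42)
envs = [5.0, 10.0, 15.0]
observed = [toy_growth_model(2.0, e, 0.1, rng) for e in envs]  # "true" effect = 2.0
posterior = abc_rejection(observed, envs, 0.0, 5.0, 20000, 0.5, rng)
print(len(posterior), statistics.mean(posterior))
```

    The appeal for the setting in the abstract is that the simulator only has to be runnable, not differentiable or expressible as a likelihood, so a mechanistic CGM can be plugged in directly; the price is many simulator calls per accepted draw.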

  17. HOX Gene Promoter Prediction and Inter-genomic Comparison: An Evo-Devo Study

    Directory of Open Access Journals (Sweden)

    Marla A. Endriga

    2010-10-01

    Full Text Available Homeobox genes direct the anterior-posterior axis of the body plan in eukaryotic organisms. Promoter regions upstream of the Hox genes jumpstart the transcription process. CpG islands found within the promoter regions can cause silencing of these promoters. The locations of the promoter regions and the CpG islands of Homo sapiens sapiens (human), Pan troglodytes (chimpanzee), Mus musculus (mouse), and Rattus norvegicus (brown rat) are compared and related to the possible influence on the specification of the mammalian body plan. The sequence of each gene in Hox clusters A-D of the mammals considered was retrieved from Ensembl, and the locations of promoter regions and CpG islands were predicted using Exon Finder. The predicted promoter sequences were confirmed via BLAST and verified against the Eukaryotic Promoter Database. The significance of the locations was determined using the Kruskal-Wallis test. Among the four clusters, only promoter locations in cluster B showed a significant difference. HOX B genes have been linked with the control of genes that direct the development of axial morphology, particularly of the vertebral column bones. The magnitude of variation among the body plans of closely related species can thus be partially attributed to the promoter kind, location and number, and gene inactivation via CpG methylation.
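    CpG islands are commonly identified with the Gardiner-Garden and Frommer criteria: a stretch of at least 200 bp with GC content of at least 50% and an observed/expected CpG ratio of at least 0.6, where the expected CpG count is (C × G) / length. The sliding-window sketch below applies those thresholds. It illustrates the general idea only and is not the algorithm used by Exon Finder, whose criteria the abstract does not give; function names and the test sequence are hypothetical.

```python
def cpg_stats(window):
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def find_cpg_islands(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Slide a window along seq and merge qualifying windows into
    half-open (start, end) intervals."""
    islands = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc >= min_gc and oe >= min_oe:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1][1] = max(islands[-1][1], end)   # extend the current island
            else:
                islands.append([start, end])                # open a new island
    return [tuple(iv) for iv in islands]


# Hypothetical test sequence: a 300 bp CpG-rich block flanked by AT repeats.
seq = "AT" * 150 + "CG" * 150 + "AT" * 150
print(find_cpg_islands(seq))
```

    On this toy sequence the merged interval overhangs the 300 bp CpG block by 100 bp on each side, because any window that is at least half CpG-rich qualifies; a fuller implementation would trim the island edges.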

  18. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

    Science.gov (United States)

    Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

    2018-02-01

    Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that pooling animals with original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies for the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) ranged from 12.60% to 31.27%, and the relative gains in genomic prediction accuracies on de-regressed EBV were smaller (0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and it was slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.

  19. Impact of relationships between test and training animals and among training animals on reliability of genomic prediction.

    Science.gov (United States)

    Wu, X; Lund, M S; Sun, D; Zhang, Q; Su, G

    2015-10-01

    One of the factors affecting the reliability of genomic prediction is the relationship among the animals of interest. This study investigated the reliability of genomic prediction in various scenarios with regard to the relationship between test and training animals, and among animals within the training data set. Different training data sets were generated from EuroGenomics data and a group of Nordic Holstein bulls (born in 2005 and afterwards) as a common test data set. Genomic breeding values were predicted using a genomic best linear unbiased prediction model and a Bayesian mixture model. The results showed that a closer relationship between test and training animals led to a higher reliability of genomic predictions for the test animals, while a closer relationship among training animals resulted in a lower reliability. In addition, the Bayesian mixture model in general led to a slightly higher reliability of genomic prediction, especially for the scenario of distant relationships between training and test animals. Therefore, to prevent a decrease in reliability, constant updates of the training population with animals from more recent generations are required. Moreover, a training population consisting of less-related animals is favourable for reliability of genomic prediction. © 2015 Blackwell Verlag GmbH.

  20. Prediction of transcriptional regulatory sites in the complete genome sequence of Escherichia coli K-12.

    Science.gov (United States)

    Thieffry, D; Salgado, H; Huerta, A M; Collado-Vides, J

    1998-06-01

    As one of the best-characterized free-living organisms, Escherichia coli and its recently completed genomic sequence offer a special opportunity to exploit systematically the variety of regulatory data available in the literature in order to make a comprehensive set of regulatory predictions in the whole genome. The complete genome sequence of E.coli was analyzed for the binding of transcriptional regulators upstream of coding sequences. The biological information contained in RegulonDB (Huerta, A.M. et al., Nucleic Acids Res., 26, 55-60, 1998) for 56 different transcriptional proteins served as the basis for a stringent strategy combining string search and weight matrices. We estimate that our search included representatives of 15-25% of the total number of regulatory binding proteins in E.coli. This search was performed on the set of 4288 putative regulatory regions, each 450 bp long. Within the regions with predicted sites, 89% are regulated by one protein and 81% involve only one site. These numbers are reasonably consistent with the distribution of experimental regulatory sites. Regulatory sites are found in 603 regions corresponding to 16% of operon regions and 10% of intra-operonic regions. Additional evidence gives stronger support to some of these predictions, including the position of the site, biological consistency with the function of the downstream gene, as well as genetic evidence for the regulatory interaction. The predictions described here were incorporated into the map presented in the paper describing the complete E.coli genome (Blattner, F.R. et al., Science, 277, 1453-1461, 1997). The complete set of predictions in GenBank format is available at the url: http://www.cifn.unam.mx/Computational_Biology/E.coli-predictions. ecoli-reg@cifn.unam.mx, collado@cifn.unam.mx
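    The weight-matrix half of that strategy can be sketched as a position-specific scoring matrix. Per-position base counts from aligned binding sites are turned into log-odds scores against a background, and every window on both strands is scored. The training sites, the uniform background, the pseudocount and the threshold below are assumptions for illustration; they are not the study's RegulonDB-derived matrices or cut-offs.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pssm(sites, pseudocount=0.5):
    """Log-odds (bits) weight matrix from equal-length aligned sites,
    against a uniform 0.25 background (an assumption of this sketch)."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        pssm.append({b: math.log2((column.count(b) + pseudocount) / total / 0.25)
                     for b in BASES})
    return pssm


def score(pssm, kmer):
    """Sum of per-position log-odds; kmer must be uppercase ACGT."""
    return sum(col[b] for col, b in zip(pssm, kmer))


def scan(pssm, seq, threshold):
    """Score every window on both strands; report hits as
    (forward-strand start, strand, score)."""
    w = len(pssm)
    hits = []
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for i in range(len(s) - w + 1):
            sc = score(pssm, s[i:i + w])
            if sc >= threshold:
                start = i if strand == "+" else len(seq) - i - w
                hits.append((start, strand, round(sc, 2)))
    return hits


# Hypothetical aligned binding sites (not taken from RegulonDB).
sites = ["TTGACA", "TTGACT", "TTTACA", "TTGACA"]
pssm = build_pssm(sites)
print(scan(pssm, "GGGGTTGACAGGGG", threshold=6.0))
```

    Scanning both strands matters because a regulator's site can sit on either strand upstream of a gene; a stringent threshold, as in the abstract, trades sensitivity for fewer false positives across a whole genome.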

  1. Using physicochemical and compositional characteristics of DNA sequence for prediction of genomic signals

    KAUST Repository

    Mulamba, Pierre Abraham

    2014-12-01

    Finding genes in eukaryotic organisms with computational methods remains an open problem in biology. Based on the various genomic signals found in eukaryotic genomes, this problem can be divided into many sub-problems, such as the identification of transcription start sites, translation initiation sites, splice sites, poly(A) signals, etc. Each sub-problem deals with a particular type of genomic signal, and various computational methods are used to solve each one. Aggregating information from all these individual sub-problems can lead to a complete annotation of a gene and its component signals. The fundamental principle of most of these computational methods is the mapping principle: building an input-output model that predicts a particular genomic signal from a set of known input signals and their corresponding output signal. The type of input signals used to build the model is an essential element of most of these methods, and most of them rely mainly on statistical analysis of the nucleotide composition of the sequence string. Our study is based on a novel approach to predicting genomic signals, in which uniquely generated structural profiles that combine compressed physicochemical properties with topological and compositional properties of DNA sequences are used to develop machine learning predictive models. The physicochemical properties are compressed using a principal component analysis transformation. Our ideas are evaluated through prediction models of canonical splice sites using support vector machines. We demonstrate across several species that the proposed methodology yields the most accurate splice site predictors that are publicly available or described. We believe that the approach in this study is quite general and has various applications in other biological modeling problems.
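    The PCA compression step can be illustrated with a small pure-Python sketch that extracts only the leading principal component by power iteration on the sample covariance matrix. The input rows stand for hypothetical per-window physicochemical property vectors; the study's actual property set, number of retained components, and SVM stage are not reproduced here.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (component, projections) for the leading principal component of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive starting vector
    for _ in range(iterations):
        # Power iteration: repeatedly multiply by the covariance and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero covariance: no variation to compress
        v = [x / norm for x in w]
    # Compressed feature: projection of each centered row onto the component.
    projections = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projections
```

    Each window's projection would then replace its raw property vector as a compact input feature; a full PCA would keep several components rather than one.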

  2. Genomic prediction applied to high-biomass sorghum for bioenergy production.

    Science.gov (United States)

    de Oliveira, Amanda Avelar; Pastina, Maria Marta; de Souza, Vander Filipe; da Costa Parrella, Rafael Augusto; Noda, Roberto Willians; Simeone, Maria Lúcia Ferreira; Schaffert, Robert Eugene; de Magalhães, Jurandir Vieira; Damasceno, Cynthia Maria Borges; Margarido, Gabriel Rodrigues Alves

    2018-01-01

    The increasing cost of energy and finite oil and gas reserves have created a need to develop alternative fuels from renewable sources. Owing to its abiotic stress tolerance and annual cultivation, high-biomass sorghum (Sorghum bicolor L. Moench) shows potential as a bioenergy crop. Genomic selection is a useful tool for accelerating genetic gains and could restructure plant breeding programs by enabling early selection and reducing breeding cycle duration. This work aimed at predicting breeding values via genomic selection models for 200 sorghum genotypes comprising landrace accessions and breeding lines from biomass and saccharine groups. These genotypes were divided into two sub-panels according to breeding purpose. We evaluated the following phenotypic biomass traits: days to flowering, plant height, fresh and dry matter yield, and fiber, cellulose, hemicellulose, and lignin proportions. Genotyping by sequencing yielded more than 258,000 single-nucleotide polymorphism markers, which revealed population structure between sub-panels. We then fitted and compared the genomic selection models BayesA, BayesB, BayesCπ, BayesLasso, Bayes Ridge Regression and random regression best linear unbiased predictor. The resulting predictive abilities varied little between models but substantially between traits. Different prediction scenarios showed the potential of using genomic selection results between sub-panels and years, although the genotype by environment interaction negatively affected accuracies. Functional enrichment analyses performed with the marker-predicted effects suggested several interesting associations, with potential for revealing biological processes relevant to the studied quantitative traits. This work shows that genomic selection can be successfully applied in biomass sorghum breeding programs.
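    Of the models compared, random regression BLUP (RR-BLUP) is the simplest to write down: marker effects are ridge-regression estimates. A minimal sketch, assuming the shrinkage parameter `lam` is given (in practice it is usually derived from variance components estimated by REML) and using hypothetical toy genotypes coded 0/1/2:

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rr_blup(genotypes, phenotypes, lam):
    """Ridge-regression marker effects: (Z'Z + lam*I) b = Z'(y - mean(y)),
    where Z holds column-centered allele counts (rows = individuals)."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    z = [[g[j] - col_means[j] for j in range(p)] for g in genotypes]
    mu = sum(phenotypes) / n
    yc = [y - mu for y in phenotypes]
    lhs = [[sum(z[i][a] * z[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    rhs = [sum(z[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return mu, col_means, solve(lhs, rhs)


def predict(model, genotypes):
    """Genomic predictions: overall mean plus the sum of centered marker effects."""
    mu, col_means, effects = model
    return [mu + sum((g[j] - col_means[j]) * effects[j] for j in range(len(effects)))
            for g in genotypes]
```

    A larger `lam` shrinks all marker effects toward zero uniformly; the Bayesian alternatives in the list (BayesA, BayesB, BayesCπ) instead allow marker-specific shrinkage.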

  3. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    Science.gov (United States)

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

    Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or through high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. Its ability to predict functional regions, together with its generalizable statistical framework, makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
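    GenoCanyon's unsupervised idea, inferring a latent functional/non-functional state from many annotations at once, can be illustrated with a two-class mixture fitted by EM. This is a simplified sketch, not GenoCanyon itself: it assumes binary annotations that are conditionally independent given the class, and the toy data are hypothetical.

```python
def bernoulli_mixture_em(data, iterations=100, prior=0.5):
    """Two-class latent model: class 1 = 'functional', class 0 = 'non-functional'.
    data: one row of 0/1 annotations per genomic position.
    Returns (pi, p1, p0, posteriors)."""
    n, k = len(data), len(data[0])
    pi = prior
    # Asymmetric start so the classes separate: class 1 favours annotation = 1.
    p1 = [0.7] * k
    p0 = [0.3] * k
    post = [0.0] * n
    eps = 1e-6  # keep rates away from exactly 0 or 1
    for _ in range(iterations):
        # E-step: posterior probability that each position is functional.
        for i, row in enumerate(data):
            l1, l0 = pi, 1 - pi
            for j, x in enumerate(row):
                l1 *= p1[j] if x else 1 - p1[j]
                l0 *= p0[j] if x else 1 - p0[j]
            post[i] = l1 / (l1 + l0)
        # M-step: update the mixing proportion and per-annotation rates.
        s1 = sum(post)
        s0 = n - s1
        pi = s1 / n
        p1 = [min(max(sum(post[i] * data[i][j] for i in range(n)) / s1, eps), 1 - eps)
              for j in range(k)]
        p0 = [min(max(sum((1 - post[i]) * data[i][j] for i in range(n)) / s0, eps), 1 - eps)
              for j in range(k)]
    return pi, p1, p0, post
```

    The posterior for each position plays the role of a per-position functional score; with real annotations, continuous signals would need class-conditional densities other than Bernoulli.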

  4. Pathogenicity island mobility and gene content.

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Kelly Porter

    2013-10-01

    Key goals for national biosecurity include methods for analyzing pathogens, predicting their emergence, and developing countermeasures. These goals are served by studying bacterial genes that promote pathogenicity and the pathogenicity islands that mobilize them. Cyberinfrastructure supporting an island database advances this field and enables deeper bioinformatic analysis that may identify novel pathogenicity genes. New automated methods and rich visualizations were developed for identifying pathogenicity islands, based on the principle that islands occur sporadically among closely related strains. The chromosomally ordered pan-genome organizes all genes from a clade of strains; gaps in this visualization indicate islands, and decorations of the gene matrix facilitate exploration of island gene functions. A “learned phyloblocks” method, which trains on the phylogenetic patterns of islands identified by other methods, was developed for automated island identification. Learned phyloblocks better defined the termini of previously identified islands in multidrug-resistant Klebsiella pneumoniae ATCC BAA-2146, and found its only antibiotic resistance island.
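    The principle that islands occur sporadically among closely related strains can be turned into a crude detector over a chromosomally ordered presence/absence matrix. The sketch below is a simple rule-based heuristic, not the learned phyloblocks method (which learns from the phylogenetic patterns of known islands); the `min_run` threshold and the example matrix are hypothetical.

```python
def sporadic_runs(presence, min_run=3):
    """Find runs of consecutive genes (in chromosome order) that are sporadic,
    i.e. present in at least one strain but absent from at least one other.
    presence: dict strain -> list of 0/1 flags, one per ordered pan-genome gene.
    Returns a list of (start, end) gene-index ranges, end inclusive."""
    strains = list(presence)
    n_genes = len(presence[strains[0]])
    sporadic = []
    for g in range(n_genes):
        count = sum(presence[s][g] for s in strains)
        sporadic.append(0 < count < len(strains))
    runs, start = [], None
    for g, flag in enumerate(sporadic + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = g
        elif not flag and start is not None:
            if g - start >= min_run:
                runs.append((start, g - 1))
            start = None
    return runs


presence = {
    "A": [1, 1, 1, 1, 1, 1, 1, 1],
    "B": [1, 1, 0, 0, 0, 0, 1, 1],
    "C": [1, 1, 1, 1, 1, 0, 1, 1],
}
print(sporadic_runs(presence))  # [(2, 5)]
```

    Genes 2 to 5 are missing from strain B (and gene 5 also from C), so they form one candidate block; the core genes flanking it are shared by all strains.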

  5. Prediction of expected years of life using whole-genome markers.

    Directory of Open Access Journals (Sweden)

    Gustavo de los Campos

    Genetic factors are believed to account for 25% of the interindividual differences in Years of Life (YL) among humans. However, the genetic loci found so far to be associated with YL explain a very small proportion of the expected genetic variation in this trait, perhaps reflecting the complexity of the trait and the limitations of traditional association studies when applied to traits affected by a large number of small-effect genes. Using data from the Framingham Heart Study and statistical methods borrowed largely from the field of animal genetics (whole-genome prediction, WGP), we developed a WGP model for the study of YL and evaluated the extent to which thousands of genetic variants across the genome, examined simultaneously, can be used to predict interindividual differences in YL. We find that a sizable proportion of differences in YL (those unexplained by age at entry, sex, smoking and body mass index (BMI)) can be accounted for and predicted using WGP methods. The contribution of genomic information to prediction accuracy was even higher than that of smoking and BMI combined, two predictors considered among the most important life-shortening factors. We evaluated the impacts of familial relationships and population structure (as described by the first two marker-derived principal components) and concluded that in our dataset population structure explained partially, but not fully, the gains in prediction accuracy obtained with WGP. Further inspection of prediction accuracies by age at death indicated that most of the gains in predictive ability achieved with WGP were due to increased accuracy in predicting early mortality, perhaps reflecting the ability of WGP to capture differences in genetic risk for deadly diseases such as cancer, which are most often responsible for early mortality in our sample.

  6. Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

    Directory of Open Access Journals (Sweden)

    Sungkyoung Choi

    2016-12-01

    The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and to identify novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effects on complex human diseases, which has been a major hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to address the “large p, small n” problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model for type 2 diabetes by combining variable selection and prediction methods, using Affymetrix Genome-Wide Human SNP Array 5.0 data from the Korean Association Resource project. We assessed risk prediction performance using the area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. In the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC, 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.
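    The AUC used to compare the SLR, LASSO, and EN combinations has a simple probabilistic reading: the chance that a randomly chosen case receives a higher risk score than a randomly chosen control. A minimal sketch computing it directly from that definition (quadratic in sample size, so suited to illustration rather than large cohorts; labels are assumed to be 1 for cases and 0 for controls):

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a random case scores higher than a random control,
    counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


print(auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.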

  7. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human.

    Science.gov (United States)

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-02-16

    DNA methylation plays a significant role in transcriptional regulation by repressing transcription. Changes in DNA methylation level are an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can assay only a small proportion of CpG sites in the human genome, there is an urgent need for reliable computational models that predict genome-wide DNA methylation. Here, we proposed a novel algorithm that extracts seven sequence complexity features and developed a support-vector-machine-based prediction model that integrates them with previously reported DNA composition features (trinucleotide frequency and GC content, 65 features), using the methylation profiles of human embryonic stem cells. The prediction results from 22 human chromosomes with windows of varying size showed that the 600-bp window achieved the best average accuracy, 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has some generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.
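    The 65 DNA composition features (64 trinucleotide frequencies plus GC content) can be computed per window as below. This assumes overlapping trinucleotide counts normalized to frequencies and GC content computed over unambiguous bases; the abstract does not give the exact normalization, and the seven sequence complexity features are not reproduced.

```python
from itertools import product

# All 64 trinucleotides in a fixed order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(t) for t in product("ACGT", repeat=3)]


def composition_features(window):
    """Return 65 features: 64 overlapping trinucleotide frequencies + GC content."""
    window = window.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(window) - 2):
        tri = window[i:i + 3]
        if tri in counts:  # skip trinucleotides containing N or other symbols
            counts[tri] += 1
            total += 1
    freqs = [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]
    acgt = sum(window.count(b) for b in "ACGT")
    gc = (window.count("G") + window.count("C")) / acgt if acgt else 0.0
    return freqs + [gc]
```

    For a 600-bp window centered on a CpG site, this yields the fixed-length vector that an SVM would consume alongside the complexity features.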

  8. Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes

    Science.gov (United States)

    Spanò, M.; Lillo, F.; Miccichè, S.; Mantegna, R. N.

    2008-10-01

    By performing a comprehensive study on 1832 segments of 1212 complete genomes of viruses, we show that in viral genomes the hairpin structures of thermodynamically predicted RNA secondary structures are more abundant than expected under a simple random null hypothesis. The detected hairpin structures of RNA secondary structures are present both in coding and in noncoding regions for the four groups of viruses categorized as dsDNA, dsRNA, ssDNA and ssRNA. For all groups, hairpin structures of RNA secondary structures are detected more frequently than expected for a random null hypothesis in noncoding rather than in coding regions. However, potential RNA secondary structures are also present in coding regions of dsDNA group. In fact, we detect evolutionary conserved RNA secondary structures in conserved coding and noncoding regions of a large set of complete genomes of dsDNA herpesviruses.
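    The comparison against a random null hypothesis can be illustrated with a crude sequence-only proxy: count perfect stem-loops (Watson-Crick and G·U pairs) and compare with counts in shuffled copies of the same sequence. This sketch is not thermodynamic prediction (it computes no free energies), and the stem length, loop range, and shuffle null are assumptions that may differ from the study's; DNA input would first need T converted to U.

```python
import random

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_hairpins(seq, stem=4, min_loop=3, max_loop=8):
    """Count positions where a perfect stem of `stem` base pairs closes a loop of
    min_loop..max_loop nucleotides (a crude proxy, not a free-energy model)."""
    count = 0
    n = len(seq)
    for i in range(n):
        for loop in range(min_loop, max_loop + 1):
            j = i + 2 * stem + loop - 1  # index of the last base of the 3' arm
            if j >= n:
                break  # longer loops would run past the end as well
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(stem)):
                count += 1
    return count


def shuffle_null(seq, trials=100, seed=1, **kwargs):
    """Hairpin counts in mononucleotide-shuffled copies of seq (a simple null)."""
    rng = random.Random(seed)
    letters = list(seq)
    counts = []
    for _ in range(trials):
        rng.shuffle(letters)
        counts.append(count_hairpins("".join(letters), **kwargs))
    return counts
```

    An observed count well above most shuffled counts would be read as hairpins being more abundant than expected by chance; shuffling preserves base composition but destroys local order.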

  9. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    Science.gov (United States)

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  10. Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat.

    Science.gov (United States)

    Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Huerta-Espino, Julio; Lan, Caixia; Bhavani, Sridhar; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E

    2017-07-01

    Genomic prediction for seedling and adult plant resistance to wheat rusts was compared to prediction using a few markers as fixed effects in a least-squares approach and to pedigree-based prediction. The unceasing plant-pathogen arms race and the ephemeral nature of some rust resistance genes have been challenging for wheat (Triticum aestivum L.) breeding programs and farmers. Hence, it is important to devise strategies for the effective evaluation and exploitation of quantitative rust resistance. One promising approach that could accelerate gain from selection for rust resistance is 'genomic selection' (GS), which uses dense genome-wide markers to estimate the breeding values (BVs) for quantitative traits. Our objective was to compare three genomic prediction models, including genomic best linear unbiased prediction (GBLUP), GBLUP A (GBLUP with selected loci as fixed effects) and reproducing kernel Hilbert spaces-markers (RKHS-M), with a least-squares (LS) approach, RKHS-pedigree (RKHS-P), and RKHS markers and pedigree (RKHS-MP), to determine the BVs for seedling and/or adult plant resistance (APR) to leaf rust (LR), stem rust (SR), and stripe rust (YR). The 333 lines in the 45th IBWSN and the 313 lines in the 46th IBWSN were genotyped using genotyping-by-sequencing and phenotyped in replicated trials. The mean prediction accuracies ranged from 0.31 to 0.74 for LR seedling, 0.12 to 0.56 for LR APR, 0.31 to 0.65 for SR APR, 0.70 to 0.78 for YR seedling, and 0.34 to 0.71 for YR APR. For most datasets, the RKHS-MP model gave the highest accuracies, while LS gave the lowest. The GBLUP, GBLUP A, RKHS-M, and RKHS-P models gave similar accuracies. Using genome-wide marker-based models resulted in an average 42% increase in accuracy over LS. We conclude that GS is a promising approach for improving quantitative rust resistance and can be implemented in the breeding pipeline.
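    GBLUP replaces the pedigree relationship matrix with a marker-based genomic relationship matrix. The abstract does not say which construction was used; the sketch below assumes VanRaden's first method, with allele frequencies estimated from the genotyped lines themselves and markers coded as 0/1/2 allele counts.

```python
def vanraden_g(genotypes):
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).
    genotypes: rows = individuals, entries = allele counts 0/1/2."""
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of each marker, estimated from the sample.
    p = [sum(g[j] for g in genotypes) / (2 * n) for j in range(m)]
    # Center each marker by its expected count 2p.
    z = [[g[j] - 2 * p[j] for j in range(m)] for g in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [[sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
            for a in range(n)]
```

    Monomorphic markers contribute nothing to G; the matrix then serves as the covariance structure of the breeding values in the GBLUP mixed model.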

  11. A Genomics-Based Model for Prediction of Severe Bioprosthetic Mitral Valve Calcification.

    Science.gov (United States)

    Ponasenko, Anastasia V; Khutornaya, Maria V; Kutikhin, Anton G; Rutkovskaya, Natalia V; Tsepokina, Anna V; Kondyukova, Natalia V; Yuzhalin, Arseniy E; Barbarash, Leonid S

    2016-08-31

    Severe bioprosthetic mitral valve calcification is a significant problem in cardiovascular surgery. Unfortunately, clinical markers have not proved effective in predicting severe bioprosthetic mitral valve calcification. Here, we examined whether a genomics-based approach is efficient in predicting the risk of severe bioprosthetic mitral valve calcification. A total of 124 consecutive Russian patients who underwent mitral valve replacement surgery were recruited. We investigated the associations of inherited variation in innate immunity, lipid metabolism and calcium metabolism genes with severe bioprosthetic mitral valve calcification. Genotyping was conducted using the TaqMan assay. Eight gene polymorphisms were significantly associated with severe bioprosthetic mitral valve calcification and were therefore included in a stepwise logistic regression, which identified male gender, the T/T genotype of the rs3775073 polymorphism within the TLR6 gene, the C/T genotype of the rs2229238 polymorphism within the IL6R gene, and the A/A genotype of the rs10455872 polymorphism within the LPA gene as independent predictors of severe bioprosthetic mitral valve calcification. The developed genomics-based model had fair predictive value, with an area under the receiver operating characteristic (ROC) curve of 0.73. In conclusion, our genomics-based approach is efficient for the prediction of severe bioprosthetic mitral valve calcification.

  12. Impact of Relationships between Test and Reference Animals and between Reference Animals on Reliability of Genomic Prediction

    DEFF Research Database (Denmark)

    Wu, Xiaoping; Lund, Mogens Sandø; Sun, Dongxiao

    This study investigated the reliability of genomic prediction in various scenarios with regard to the relationship between test and reference animals and between animals within the reference population. Different reference populations were generated from EuroGenomics data and 1288 Nordic Holstein bulls...... as a common test population. A GBLUP model and a Bayesian mixture model were applied to predict genomic breeding values for bulls in the test data. Results showed that a closer relationship between test and reference animals led to higher reliability, while a closer relationship between reference animal...... resulted in lower reliability. Therefore, the design of the reference population is important for improving the reliability of genomic prediction. With regard to model, the Bayesian mixture model in general led to slightly higher reliability of genomic prediction than the GBLUP model...

  13. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach to plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interaction effects among genes and higher-order nonlinear effects. A deep belief network (DBN), one of the architectures used in deep learning, can model data at a high level of abstraction that captures nonlinear effects. This study implemented a DBN to develop a GP model, using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was obtained from the Global Maize Program of CIMMYT (International Maize and Wheat Improvement Center). Based on Pearson correlation, the DBN outperformed the other methods, reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for presumably non-additive traits. The DBN achieved a correlation of 0.579 (on a scale from -1 to 1).

  14. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    Science.gov (United States)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  15. Genomic prediction based on data from three layer lines using non-linear regression models.

    Science.gov (United States)

    Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L

    2014-11-06

    Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional
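    Prediction accuracy here is the Pearson correlation between observed phenotypes and predicted breeding values in the validation animals. A minimal sketch of that metric (some studies further divide by the square root of heritability; that adjustment is not applied here):

```python
import math


def prediction_accuracy(observed, predicted):
    """Pearson correlation between observed phenotypes and predicted breeding values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)
```

    Because correlation ignores scale and offset, a model whose predictions are systematically biased can still score highly; bias is usually checked separately, for example with a regression of phenotypes on predictions.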

  16. Breeding Jatropha curcas by genomic selection: A pilot assessment of the accuracy of predictive models.

    Science.gov (United States)

    Azevedo Peixoto, Leonardo de; Laviola, Bruno Galvêas; Alves, Alexandre Alonso; Rosado, Tatiana Barbosa; Bhering, Leonardo Lopes

    2017-01-01

    Genome-wide selection (GWS) is a promising approach for improving selection accuracy in plant breeding, particularly in species with long life cycles, such as Jatropha. Therefore, the objectives of this study were to estimate the genetic parameters for grain yield (GY) and the weight of 100 seeds (W100S) using restricted maximum likelihood (REML); to compare the performance of GWS methods in predicting GY and W100S; and to estimate how many markers are needed to train the GWS model to obtain maximum accuracy. Eight GWS models were compared in terms of predictive ability. The impact of marker density on predictive ability was investigated using a varying number of markers, from 2 to 1,248. Because the genetic variance among the evaluated genotypes was significant, it was possible to obtain selection gain. All of the GWS methods tested in this study can be used to predict GY and W100S in Jatropha. Training models fitted using 1,000 and 800 markers are sufficient to capture the maximum genetic variance and, consequently, the maximum prediction ability for GY and W100S, respectively. This study demonstrated the applicability of genome-wide prediction to identify useful genetic sources of GY and W100S for Jatropha breeding. Further research is needed to confirm the applicability of the proposed approach to other complex traits.

  17. A fast EM algorithm for BayesA-like prediction of genomic breeding values.

    Directory of Open Access Journals (Sweden)

    Xiaochen Sun

    Prediction accuracies of estimated breeding values for economically important traits are expected to benefit from genomic information. Single nucleotide polymorphism (SNP) panels used in genomic prediction are increasing in density, but Markov chain Monte Carlo (MCMC) estimation of SNP effects can be quite time consuming or slow to converge when a large number of SNPs are fitted simultaneously in a linear mixed model. Here we present an EM algorithm (termed "fastBayesA") that does not use MCMC. This fastBayesA approach treats the variances of SNP effects as missing data and uses a joint posterior mode of effects, whereas the commonly used BayesA bases predictions on posterior means of effects. In each EM iteration, SNP effects are predicted as a linear combination of best linear unbiased predictions of breeding values from a mixed linear animal model that incorporates a weighted marker-based realized relationship matrix. fastBayesA converges after a few iterations to a joint posterior mode of SNP effects under the BayesA model. When applied to simulated quantitative traits with a range of genetic architectures, fastBayesA is shown to predict genomic estimated breeding values (GEBV) as accurately as BayesA but with less computing effort per SNP. fastBayesA can be used as a computationally efficient substitute for BayesA, especially when an increasing number of markers imposes an unreasonable computational burden or slow convergence on MCMC approaches.
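    The idea of treating SNP-effect variances as missing data can be written out for a plain marker-effects model. This is a sketch of the generic EM update under the standard BayesA prior (scaled inverse chi-square with ν, S², and the residual variance σ_e² assumed known, phenotypes assumed centered, fixed effects omitted), not the authors' animal-model parameterization with a weighted realized relationship matrix:

```latex
\begin{aligned}
&\text{Model: } y = X\beta + e,\quad e \sim N(0, \sigma_e^2 I),\quad
 \beta_j \mid \sigma_j^2 \sim N(0, \sigma_j^2),\quad
 \sigma_j^2 \sim \nu S^2 \chi^{-2}_{\nu} \\[4pt]
&\text{E-step: } w_j^{(t)} = \mathbb{E}\!\left[\sigma_j^{-2} \,\middle|\, \beta_j^{(t)}\right]
 = \frac{\nu + 1}{\bigl(\beta_j^{(t)}\bigr)^2 + \nu S^2} \\[4pt]
&\text{M-step: } \beta^{(t+1)} = \Bigl(X^\top X + \sigma_e^2 \operatorname{diag}\bigl(w^{(t)}\bigr)\Bigr)^{-1} X^\top y
\end{aligned}
```

    Iterating the two steps climbs toward a posterior mode of β. SNPs with large current effects receive small weights w_j and are shrunk less, which is how BayesA-type models differ from the uniform shrinkage of ridge regression.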

  18. Breeding Jatropha curcas by genomic selection: A pilot assessment of the accuracy of predictive models.

    Directory of Open Access Journals (Sweden)

    Leonardo de Azevedo Peixoto

    Genome-wide selection (GWS) is a promising approach for improving selection accuracy in plant breeding, particularly in species with long life cycles, such as Jatropha. Therefore, the objectives of this study were to estimate the genetic parameters for grain yield (GY) and the weight of 100 seeds (W100S) using restricted maximum likelihood (REML); to compare the performance of GWS methods in predicting GY and W100S; and to estimate how many markers are needed to train the GWS model to obtain maximum accuracy. Eight GWS models were compared in terms of predictive ability. The impact of marker density on predictive ability was investigated using a varying number of markers, from 2 to 1,248. Because the genetic variance among the evaluated genotypes was significant, it was possible to obtain selection gain. All of the GWS methods tested in this study can be used to predict GY and W100S in Jatropha. Training models fitted using 1,000 and 800 markers are sufficient to capture the maximum genetic variance and, consequently, the maximum prediction ability for GY and W100S, respectively. This study demonstrated the applicability of genome-wide prediction to identify useful genetic sources of GY and W100S for Jatropha breeding. Further research is needed to confirm the applicability of the proposed approach to other complex traits.

  19. Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction.

    Science.gov (United States)

    Bandeira E Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose

    2017-06-07

    Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, the Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), with different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied. Copyright © 2017 Bandeira e Sousa et al.
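    The two kernels compared in this study can be sketched as follows. This assumes the linear (GB) kernel is built from column-centered markers scaled by the number of markers, and that the Gaussian (GK) kernel is scaled by the median off-diagonal squared marker distance with a bandwidth of 1, one common convention; the abstract does not state the exact scaling or bandwidth used.

```python
import math
import statistics


def linear_kernel(markers):
    """GBLUP-style linear kernel K = X X' / p on column-centered markers."""
    n, p = len(markers), len(markers[0])
    means = [sum(r[j] for r in markers) / n for j in range(p)]
    x = [[r[j] - means[j] for j in range(p)] for r in markers]
    return [[sum(a * b for a, b in zip(x[i], x[k])) / p for k in range(n)]
            for i in range(n)]


def gaussian_kernel(markers, bandwidth=1.0):
    """Gaussian kernel K_ik = exp(-bandwidth * d_ik^2 / q), where d_ik^2 is the
    squared Euclidean marker distance and q the median off-diagonal d^2."""
    n = len(markers)
    d2 = [[sum((a - b) ** 2 for a, b in zip(markers[i], markers[k])) for k in range(n)]
          for i in range(n)]
    q = statistics.median(d2[i][k] for i in range(n) for k in range(n) if i != k)
    if q == 0:
        raise ValueError("all genotypes are identical; distances cannot be scaled")
    return [[math.exp(-bandwidth * d2[i][k] / q) for k in range(n)] for i in range(n)]
```

    The linear kernel captures additive similarity, while the Gaussian kernel decays with marker distance and can represent some non-additive similarity, which is one explanation offered for GK outperforming GB.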

  20. Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction

    Directory of Open Access Journals (Sweden)

    Massaine Bandeira e Sousa

    2017-06-01

    Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied.

  1. A systems approach to predict oncometabolites via context-specific genome-scale metabolic networks.

    Directory of Open Access Journals (Sweden)

    Hojung Nam

    2014-09-01

    Altered metabolism in cancer cells has been viewed as a passive response required for a malignant transformation. However, this view has changed through the recently described metabolic oncogenic factors: mutated isocitrate dehydrogenases (IDH), succinate dehydrogenase (SDH), and fumarate hydratase (FH), which produce oncometabolites that competitively inhibit epigenetic regulation. In this study, we demonstrate in silico predictions of oncometabolites that have the potential to dysregulate epigenetic controls in nine types of cancer by incorporating massive-scale genetic mutation information (collected from more than 1,700 cancer genomes) and expression profiling data, and by deploying Recon 2 to reconstruct context-specific genome-scale metabolic models. Our analysis predicted 15 compounds and 24 substructures of potential oncometabolites that could result from the loss-of-function and gain-of-function mutations of metabolic enzymes, respectively. These results suggest a substantial potential for discovering unidentified oncometabolites in various forms of cancers.

  2. Analysis and prediction of gene splice sites in four Aspergillus genomes

    DEFF Research Database (Denmark)

    Wang, Kai; Ussery, David; Brunak, Søren

    2009-01-01

    Several Aspergillus fungal genomic sequences have been published, with many more in progress. Obviously, it is essential to have high-quality, consistently annotated sets of proteins from each of the genomes, in order to make meaningful comparisons. We have developed a dedicated, publicly available......, splice site prediction program called NetAspGene, for the genus Aspergillus. Gene sequences from Aspergillus fumigatus, the most common mould pathogen, were used to build and test our model. Compared to many animals and plants, Aspergillus contains smaller introns; thus we have applied a larger window...... better splice site prediction than other available tools. NetAspGene will be very helpful for the study of Aspergillus splice sites and especially of alternative splicing. A webpage for NetAspGene is publicly available at http://www.cbs.dtu.dk/services/NetAspGene....
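    Window-based splice-site predictors score a fixed stretch of sequence around each candidate site. The sketch below shows only that preprocessing step for donor sites, assuming the canonical GT dinucleotide at the intron start. The window sizes and the function name are hypothetical, and the trained scoring model that NetAspGene applies to such windows is not reproduced.

```python
def donor_candidates(seq, upstream, downstream):
    """Return (index, window) for every 'GT' whose full flanking window fits in seq.

    index is the 0-based position of the G; the window spans `upstream` bases
    before the GT, the GT itself, and `downstream` bases after it.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "GT":
            start, end = i - upstream, i + 2 + downstream
            if start >= 0 and end <= len(seq):  # skip sites too close to the ends
                hits.append((i, seq[start:end]))
    return hits

candidates = donor_candidates("AAGGTAAGTCC", upstream=2, downstream=2)
print(candidates)  # [(3, 'AGGTAA'), (7, 'AAGTCC')]
```

    A larger window simply means larger `upstream`/`downstream` values. With short introns, the downstream flank of a donor window may reach into the next exon, which is one reason window size must be tuned per genome.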

  3. Biofilm Formation Mechanisms of Pseudomonas aeruginosa Predicted via Genome-Scale Kinetic Models of Bacterial Metabolism

    Science.gov (United States)

    2016-03-15

    A hallmark of Pseudomonas aeruginosa is its ability to establish biofilm-based infections that are difficult to eradicate. Biofilms are less susceptible to host inflammatory and immune responses and have higher antibiotic tolerance than free-living planktonic...

  4. De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture.

    Science.gov (United States)

    Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N

    2017-11-14

    Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.

  5. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery

    OpenAIRE

    Hickey, John M; Chiurugwi, Tinashe; Mackay, Ian; Powell, Wayne; Implementing Genomic Selection in CGIAR Breeding Programs Workshop Participants

    2017-01-01

    The rate of annual yield increases for major staple crops must more than double relative to current levels in order to feed a predicted global population of 9 billion by 2050. Controlled hybridization and selective breeding have been used for centuries to adapt plant and animal species for human use. However, achieving higher, sustainable rates of improvement in yields in various species will require renewed genetic interventions and dramatic improvement of agricultural practices. Genomic pre...

  6. Genomic Prediction Within and Across Biparental Families: Means and Variances of Prediction Accuracy and Usefulness of Deterministic Equations

    Directory of Open Access Journals (Sweden)

    Pascal Schopp

    2017-11-01

    A major application of genomic prediction (GP) in plant breeding is the identification of superior inbred lines within families derived from biparental crosses. When models for various traits were trained within related or unrelated biparental families (BPFs), experimental studies found substantial variation in prediction accuracy (PA), but little is known about the underlying factors. We used SNP marker genotypes of inbred lines from either elite germplasm or landraces of maize (Zea mays L.) as parents to generate in silico 300 BPFs of doubled-haploid lines. We analyzed PA within each BPF for 50 simulated polygenic traits, using genomic best linear unbiased prediction (GBLUP) models trained with individuals from either full-sib (FSF), half-sib (HSF), or unrelated families (URF) for various sizes (N_train) of the training set and different heritabilities (h²). In addition, we modified two deterministic equations for forecasting PA to account for inbreeding and genetic variance unexplained by the training set. Averaged across traits, PA was high within FSF (0.41–0.97), with large variation only for N_train < 50 and h² < 0.6. For HSF and URF, PA was on average ∼40–60% lower and varied substantially among different combinations of BPFs used for model training and prediction as well as different traits. As exemplified by HSF results, PA of across-family GP can be very low if causal variants not segregating in the training set account for a sizeable proportion of the genetic variance among predicted individuals. Deterministic equations accurately forecast the PA expected over many traits, yet cannot capture trait-specific deviations. We conclude that model training within BPFs generally yields stable PA, whereas a high level of uncertainty is encountered in across-family GP. Our study shows the extent of variation in PA that must at least be reckoned with in practice and offers a starting point for the design of training sets composed of multiple BPFs.
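    As a reference point for the deterministic equations mentioned above, the widely used form of Daetwyler et al. (2008) predicts accuracy from training-set size N, heritability h² and the number of independent chromosome segments M_e as r = sqrt(N·h² / (N·h² + M_e)). The sketch assumes M_e is supplied by the user. It is the unmodified equation, not the versions the authors adapted for inbreeding and for genetic variance unexplained by the training set.

```python
import math

def expected_accuracy(n_train, h2, m_e):
    """Deterministic accuracy r = sqrt(N*h2 / (N*h2 + Me))."""
    if n_train <= 0 or not 0 < h2 <= 1 or m_e <= 0:
        raise ValueError("need n_train > 0, 0 < h2 <= 1, m_e > 0")
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))

# Accuracy rises with training-set size when h2 and M_e are held fixed.
for n in (50, 200, 800):
    print(n, round(expected_accuracy(n, 0.5, 100), 3))
```

    Because the formula depends only on N, h² and M_e, it gives an average expectation and cannot reflect trait-specific deviations such as causal variants that do not segregate in the training set.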

  7. Predictive Power Estimation Algorithm (PPEA): a new algorithm to reduce overfitting for genomic biomarker discovery.

    Directory of Open Access Journals (Sweden)

    Jiangang Liu

    Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional and high-noise genomic data is prone to over-fitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model intends to predict, which can make the modeling results challenging to interpret. To address these issues, we developed a novel algorithm, Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of predictive power of individual transcripts in a relatively small number of iterations, (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top-ranked transcripts greatly facilitates development of multiplex assays such as qRT-PCR as a biomarker, and (4) more importantly, a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the over-fitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses.

  8. Bias of genetic trend of genomic predictions based on both real dairy cattle and simulated data

    DEFF Research Database (Denmark)

    Ma, Peipei; Lund, Mogens Sandø; Nielsen, Ulrik Sander

    This study investigated the phenomenon of bias in the trend of genomic predictions and attempted to identify its cause and a solution. The data used in this study include Danish Jersey data and simulation data. In Jersey data, the bias was reduced when cows were included in the reference...... population. In simulated data, there was no bias when the test animals were unselected cows. When the G matrix was derived from genotypes of causal genes, the bias was reduced. The results suggest that the main causes of the bias in the prediction trends are the selection of bulls and bull dams...

  9. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding.

    Science.gov (United States)

    Ould Estaghvirou, Sidi Boubacar; Ogutu, Joseph O; Schulz-Streeck, Torben; Knaak, Carsten; Ouzunova, Milena; Gordillo, Andres; Piepho, Hans-Peter

    2013-12-06

    In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. To estimate predictive accuracy indirectly, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four of which (Methods 1 to 4) use an estimate of heritability to divide predictive ability computed by cross-validation. Between them, the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation, Methods 4 and 6 were often the best. The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least...
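    The indirect estimate described above (predictive ability divided by the square root of heritability) amounts to the calculation below. It is a minimal sketch on toy numbers. In practice the predictions would come from cross-validation and h² from a separate variance-component estimate, neither of which is shown.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def indirect_accuracy(predicted, observed, h2):
    """Predictive ability r(y_hat, y) divided by sqrt(h2)."""
    return pearson(predicted, observed) / math.sqrt(h2)

pred = [1.0, 2.0, 3.0, 4.0]  # toy predicted breeding values
obs = [1.0, 3.0, 2.0, 4.0]   # toy phenotypes
ability = pearson(pred, obs)                      # 0.8
accuracy = indirect_accuracy(pred, obs, h2=0.64)  # about 1.0
```

    The division makes the estimate sensitive to the heritability estimate: an underestimated h² can push the ratio above 1, in line with the strong influence of estimated genetic variance reported above.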

  10. Gaussian covariance graph models accounting for correlated marker effects in genome-wide prediction.

    Science.gov (United States)

    Martínez, C A; Khare, K; Rahman, S; Elzo, M A

    2017-10-01

    Several statistical models used in genome-wide prediction assume uncorrelated marker allele substitution effects, but it is known that these effects may be correlated. In statistics, graphical models have been identified as a useful tool for covariance estimation in high-dimensional problems, an area that has recently experienced great expansion. In Gaussian covariance graph models (GCovGM), the joint distribution of a set of random variables is assumed to be Gaussian and the pattern of zeros of the covariance matrix is encoded in terms of an undirected graph G. In this study, methods adapting the theory of GCovGM to genome-wide prediction were developed (Bayes GCov, Bayes GCov-KR and Bayes GCov-H). In simulated data sets, improvements were found in the correlation between phenotypes and predicted breeding values and in the accuracy of predicted breeding values. Our models account for correlation of marker effects and can accommodate general correlation structures, as opposed to models proposed in previous studies, which consider spatial correlation only. In addition, they allow biological information to be incorporated into the prediction process through the construction of graph G, and their extension to the multi-allelic loci case is straightforward. © 2017 Blackwell Verlag GmbH.
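    In a covariance graph model, a missing edge between two variables means their covariance is zero (marginal independence). This differs from a concentration graph, where the zeros sit in the inverse covariance. The helper below is a hypothetical illustration, not part of the Bayes GCov methods. It checks whether a covariance matrix for marker effects has the zero pattern that a given graph G requires.

```python
def respects_graph(cov, edges, tol=1e-12):
    """Return True if cov[i][j] is (numerically) zero for every pair i != j
    that is NOT joined by an edge of the undirected graph G."""
    p = len(cov)
    adjacent = {frozenset(e) for e in edges}
    for i in range(p):
        for j in range(i + 1, p):
            if frozenset((i, j)) not in adjacent and abs(cov[i][j]) > tol:
                return False
    return True

# Three marker effects: 0-1 and 1-2 may covary, 0-2 may not.
cov = [[1.0, 0.5, 0.0],
       [0.5, 1.0, 0.2],
       [0.0, 0.2, 1.0]]
print(respects_graph(cov, [(0, 1), (1, 2)]))  # True
print(respects_graph(cov, [(0, 1)]))          # False: cov[1][2] != 0
```

    Building G from biological information (for example, linking markers in the same gene or pathway) would amount to choosing the `edges` list before estimation.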

  11. From structure prediction to genomic screens for novel non-coding RNAs.

    Directory of Open Access Journals (Sweden)

    Jan Gorodkin

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure consists of regular Watson-Crick and GU wobble base pairs. This and the increased number of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure-preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
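    The simplest structure-prediction idea behind the folding methods discussed above is base-pair maximization (the Nussinov algorithm). The sketch below implements it with Watson-Crick and GU wobble pairs and a minimum hairpin loop of three unpaired bases, which is an assumed setting. Energy-directed folding and comparative methods, which the review covers, are considerably more elaborate; this is only a toy dynamic program.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs (Watson-Crick + GU wobble)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: best pair count on seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # two adjacent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))  # 3: G-C, G-C and G-U pairs closing an AAA loop
```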

  12. Multi-population genomic prediction using a multi-task Bayesian learning model.

    Science.gov (United States)

    Chen, Liuhong; Li, Changxi; Miller, Stephen; Schenkel, Flavio

    2014-05-03

    Genomic prediction in multiple populations can be viewed as a multi-task learning problem in which the tasks are to derive prediction equations for each population, and multi-task learning can be improved by sharing information across populations. The goal of this study was to develop a multi-task Bayesian learning model for multi-population genomic prediction with a strategy to effectively share information across populations. Simulation studies and real data from Holstein and Ayrshire dairy breeds with phenotypes on five milk production traits were used to evaluate the proposed multi-task Bayesian learning model and to compare it with a single-task model and a simple data pooling method. A multi-task Bayesian learning model was proposed for multi-population genomic prediction. Information was shared across populations through a common set of latent indicator variables while SNP effects were allowed to vary in different populations. Both simulation studies and real data analysis showed the effectiveness of the multi-task model in improving genomic prediction accuracy for the smaller Ayrshire breed. Simulation studies suggested that the multi-task model was most effective when the number of QTL was small (n = 20), with an increase of accuracy by up to 0.09 when QTL effects were lowly correlated between two populations (ρ = 0.2), and up to 0.16 when QTL effects were highly correlated (ρ = 0.8). When QTL genotypes were included for training and validation, the improvements were 0.16 and 0.22, respectively, for scenarios of the low and high correlation of QTL effects between two populations. When the number of QTL was large (n = 200), improvement was small, with a maximum of 0.02 when QTL genotypes were not included for genomic prediction. Reduction in accuracy was observed for the simple pooling method when the number of QTL was small and correlation of QTL effects between the two populations was low. For the real data, the multi-task model achieved an

  13. Improved prediction of genetic predisposition to psychiatric disorders using genomic feature best linear unbiased prediction models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Demontis, Ditte; Børglum, Anders

    is enriched for causal variants. Here we apply the GFBLUP model to a small schizophrenia case-control study to test the promise of this model on psychiatric disorders, and hypothesize that the performance will be increased when applying the model to a larger ADHD case-control study if the genomic feature...... contains the causal variants. Materials and Methods: The schizophrenia study consisted of 882 controls and 888 schizophrenia cases genotyped for 520,000 SNPs. The ADHD study contained 25,954 controls and 16,663 ADHD cases with 8.4 million imputed genotypes. Results: The predictive ability for schizophrenia.......6% for the null model). Conclusion: The improvement in predictive ability for schizophrenia was marginal; however, greater improvement is expected for the larger ADHD data....

  14. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.

    Directory of Open Access Journals (Sweden)

    Xiao-Lin Wu

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for the design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide the location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly spaced and highly informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly spaced SNPs. Imputation accuracy increased with LD chip size, and the imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, the imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus
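    The locus-averaged Shannon entropy (LASE) term can be illustrated as the mean, over the selected SNPs, of the entropy of each SNP's genotype frequencies. This is a sketch under assumptions: genotypes coded 0/1/2, entropy measured in bits, and no adjustment for uniformity of SNP spacing. The exact definition used in the MOLO objective may differ, and the function names are hypothetical.

```python
import math
from collections import Counter

def locus_entropy(genotypes):
    """Shannon entropy (bits) of the genotype-class frequencies at one SNP."""
    n = len(genotypes)
    return -sum((c / n) * math.log2(c / n) for c in Counter(genotypes).values())

def locus_averaged_entropy(snp_columns):
    """Mean per-SNP entropy over a candidate set of SNPs (one list per SNP)."""
    return sum(locus_entropy(col) for col in snp_columns) / len(snp_columns)

monomorphic = [0, 0, 0, 0]
two_classes = [0, 2, 0, 2]
three_classes = [0, 1, 2, 0, 1, 2]
lase = locus_averaged_entropy([monomorphic, two_classes, three_classes])
print(lase)
```

    A monomorphic SNP contributes zero information, so maximizing this average favours polymorphic SNPs with balanced genotype classes.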

  15. Genome-wide prediction of traits with different genetic architecture through efficient variable selection.

    Science.gov (United States)

    Wimmer, Valentin; Lehermeier, Christina; Albrecht, Theresa; Auinger, Hans-Jürgen; Wang, Yu; Schön, Chris-Carolin

    2013-10-01

    In genome-based prediction there is considerable uncertainty about the statistical model and method required to maximize prediction accuracy. For traits influenced by a small number of quantitative trait loci (QTL), predictions are expected to benefit from methods performing variable selection [e.g., BayesB or the least absolute shrinkage and selection operator (LASSO)] compared to methods distributing effects across the genome [ridge regression best linear unbiased prediction (RR-BLUP)]. We investigate the assumptions underlying successful variable selection by combining computer simulations with large-scale experimental data sets from rice (Oryza sativa L.), wheat (Triticum aestivum L.), and Arabidopsis thaliana (L.). We demonstrate that variable selection can be successful when the number of phenotyped individuals is much larger than the number of causal mutations contributing to the trait. We show that the sample size required for efficient variable selection increases dramatically with decreasing trait heritabilities and increasing extent of linkage disequilibrium (LD). We contrast and discuss contradictory results from simulation and experimental studies with respect to superiority of variable selection methods over RR-BLUP. Our results demonstrate that due to long-range LD, medium heritabilities, and small sample sizes, superiority of variable selection methods cannot be expected in plant breeding populations even for traits like FRIGIDA gene expression in Arabidopsis and flowering time in rice, assumed to be influenced by a few major QTL. We extend our conclusions to the analysis of whole-genome sequence data and infer upper bounds for the number of causal mutations which can be identified by LASSO. Our results have major impact on the choice of statistical method needed to make credible inferences about genetic architecture and prediction accuracy of complex traits.
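    The contrast between variable selection and effect shrinkage can be seen in the closed-form estimators for an orthonormal design (X'X = I). This is a textbook simplification that real marker data in linkage disequilibrium do not satisfy. For the objective ½‖y − Xβ‖² + penalty, the LASSO (penalty λ‖β‖₁) soft-thresholds each least-squares estimate and sets small effects exactly to zero. Ridge regression, as in RR-BLUP (penalty ½λ‖β‖²), scales every effect by 1/(1 + λ).

```python
import math

def lasso_orthonormal(beta_ols, lam):
    """LASSO under X'X = I: soft-thresholding sign(b) * max(|b| - lam, 0)."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta_ols]

def ridge_orthonormal(beta_ols, lam):
    """Ridge under X'X = I with penalty (lam/2)*||beta||^2: b / (1 + lam)."""
    return [b / (1.0 + lam) for b in beta_ols]

b = [3.0, -0.5, 0.2]              # toy least-squares marker effects
print(lasso_orthonormal(b, 1.0))  # large effect kept (shrunk), small ones zeroed
print(ridge_orthonormal(b, 1.0))  # every effect halved, none exactly zero
```

    With correlated markers, the LASSO can zero out a causal marker in favour of a correlated neighbour. This is one intuition for why long-range LD and small samples undermine variable selection in the study above.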

  16. Genetic Variance Partitioning and Genome-Wide Prediction with Allele Dosage Information in Autotetraploid Potato.

    Science.gov (United States)

    Endelman, Jeffrey B; Carley, Cari A Schmitz; Bethke, Paul C; Coombs, Joseph J; Clough, Mark E; da Silva, Washington L; De Jong, Walter S; Douches, David S; Frederick, Curtis M; Haynes, Kathleen G; Holm, David G; Miller, J Creighton; Muñoz, Patricio R; Navarro, Felix M; Novy, Richard G; Palta, Jiwan P; Porter, Gregory A; Rak, Kyle T; Sathuvalli, Vidyasagar R; Thompson, Asunta L; Yencho, G Craig

    2018-05-01

    As one of the world's most important food crops, the potato (Solanum tuberosum L.) has spurred innovation in autotetraploid genetics, including in the use of SNP arrays to determine allele dosage at thousands of markers. By combining genotype and pedigree information with phenotype data for economically important traits, the objectives of this study were to (1) partition the genetic variance into additive vs. nonadditive components, and (2) determine the accuracy of genome-wide prediction. Between 2012 and 2017, a training population of 571 clones was evaluated for total yield, specific gravity, and chip fry color. Genomic covariance matrices for additive (G), digenic dominant (D), and additive × additive epistatic (G#G) effects were calculated using 3895 markers, and the numerator relationship matrix (A) was calculated from a 13-generation pedigree. Based on model fit and prediction accuracy, mixed model analysis with G was superior to A for yield and fry color but not specific gravity. The amount of additive genetic variance captured by markers was 20% of the total genetic variance for specific gravity, compared to 45% for yield and fry color. Within the training population, including nonadditive effects improved accuracy and/or bias for all three traits when predicting total genotypic value. When six F1 populations were used for validation, prediction accuracy ranged from 0.06 to 0.63 and was consistently lower (0.13 on average) without allele dosage information. We conclude that genome-wide prediction is feasible in potato and that it will improve selection for breeding value given the substantial amount of nonadditive genetic variance in elite germplasm. Copyright © 2018 by the Genetics Society of America.
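    An additive genomic relationship matrix that uses allele dosage can be sketched by generalizing VanRaden-style centring and scaling to ploidy k: W = X − k·p and G = WW′ / (k·Σ p_j(1 − p_j)). Here X holds dosages from 0 to k and p_j is the allele frequency at marker j. This is a common construction offered as an assumption, not necessarily the exact formula used in the study. The dominance (D) and epistatic matrices are not shown, and the function name is hypothetical.

```python
def additive_G(dosage, ploidy=4):
    """Additive relationship matrix from allele dosages (0..ploidy), VanRaden-style.

    W = X - ploidy * p (each column centred by its expected dosage) and
    G = W W' / (ploidy * sum_j p_j * (1 - p_j)).
    Raises ZeroDivisionError if every marker is monomorphic.
    """
    n, m = len(dosage), len(dosage[0])
    p = [sum(row[j] for row in dosage) / (ploidy * n) for j in range(m)]
    W = [[row[j] - ploidy * p[j] for j in range(m)] for row in dosage]
    scale = ploidy * sum(pj * (1.0 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(W[i], W[k])) / scale for k in range(n)]
            for i in range(n)]

# Three tetraploid clones genotyped at two markers (dosage 0-4).
X = [[0, 4], [4, 0], [2, 2]]
G = additive_G(X)
```

    Collapsing dosages to presence/absence or to three diploid-like classes would discard part of this information, which is the comparison behind the "without allele dosage" result above.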

  17. Genomic-Enabled Prediction Kernel Models with Random Intercepts for Multi-environment Trials

    Science.gov (United States)

    Cuevas, Jaime; Granato, Italo; Fritsche-Neto, Roberto; Montesinos-Lopez, Osval A.; Burgueño, Juan; Bandeira e Sousa, Massaine; Crossa, José

    2018-01-01

    In this study, we compared the prediction accuracy of the main genotypic effect model (MM) without G×E interactions, the multi-environment single variance G×E deviation model (MDs), and the multi-environment environment-specific variance G×E deviation model (MDe), where the random genetic effects of the lines are modeled with the markers (or pedigree). With the objective of further modeling the genetic residual of the lines, we incorporated the random intercepts of the lines (l) and generated another three models. Each of these 6 models was fitted with a linear kernel method (Genomic Best Linear Unbiased Predictor, GB) and a Gaussian kernel (GK) method. We compared these 12 model-method combinations with another two multi-environment G×E interaction models with unstructured variance-covariances (MUC) using GB and GK kernels (4 model-method combinations). Thus, we compared the genomic-enabled prediction accuracy of a total of 16 model-method combinations on two maize data sets with positive phenotypic correlations among environments, and on two wheat data sets with complex G×E that includes some negative and close-to-zero phenotypic correlations among environments. The two models (MDs and MDe with the random intercept of the lines and the GK method) were computationally efficient and gave high prediction accuracy in the two maize data sets. For the more complex G×E wheat data sets, the model-method combinations with G×E (MDs and MDe) including the random intercepts of the lines with the GK method yielded important savings in computing time compared with the multi-environment G×E interaction models with unstructured variance-covariances, but with lower genomic prediction accuracy. PMID:29476023

  18. Genome-wide prediction of discrete traits using bayesian regressions and machine learning

    Directory of Open Access Journals (Sweden)

    Forni Selma

    2011-02-01

    Full Text Available Abstract Background Genomic selection has gained much attention and the main goal is to increase the predictive accuracy and the genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates small n (number of observations problem have dealt only with continuous traits, but there are many important traits in livestock that are recorded in a discrete fashion (e.g. pregnancy outcome, disease resistance. It is necessary to evaluate alternatives to analyze discrete traits in a genome-wide prediction context. Methods This study shows two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO and two machine learning algorithms (boosting and random forest to analyze discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be observed records. Performances were compared based on the models' predictive ability. Results The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals, phi correlation was up to 81% greater than Bayesian regressions. Random Forest outperformed other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data. Conclusions The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. 
Among the different
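
    The threshold models above link a continuous, unobserved liability to a discrete record. The minimal Python sketch below shows only that probit link. It assumes the marker effects are already known, so it does not implement the Bayes A or Bayesian LASSO priors that would estimate them. The function names, the 0/1/2 genotype coding and the example values are hypothetical.

```python
import math

def liability_probability(genotypes, marker_effects, threshold=0.0):
    """P(discrete outcome = 1) under a liability threshold (probit) model.

    Assumes liability = sum(x_j * b_j) + e with e ~ N(0, 1); the record is
    scored 1 (e.g. resistant) when the liability exceeds `threshold`.
    """
    genetic_value = sum(x * b for x, b in zip(genotypes, marker_effects))
    # P(g + e > t) = P(e > t - g) = Phi(g - t), with Phi the standard normal CDF
    z = genetic_value - threshold
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def classify(genotypes, marker_effects, threshold=0.0, cutoff=0.5):
    """Assign class 1 when the predicted probability exceeds `cutoff`."""
    return int(liability_probability(genotypes, marker_effects, threshold) > cutoff)


# Hypothetical animal: three SNPs coded 0/1/2 with made-up marker effects
p_resistant = liability_probability([0, 1, 2], [0.1, -0.2, 0.3])
```

Intermediate animals have probabilities near the cutoff, so small errors in the estimated effects flip their class. This may help explain why every method classified them less accurately than extreme animals.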

  19. MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data.

    Science.gov (United States)

    Gupta, Ankit; Kapil, Rohan; Dhakan, Darshan B; Sharma, Vineet K

    2014-01-01

    The identification of virulent proteins in any de novo sequenced genome is useful for estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, identifying such proteins could be valuable when comparing the metagenomes of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both tasks is the identification of virulent proteins, since a significant proportion of genomic and metagenomic proteins are novel and not yet annotated. The currently available tools that identify virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed MP3, a standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 uses an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. On a blind dataset constructed using complete proteins, it displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively. On the two metagenomic blind datasets (Blind A: 51-100 amino acids and Blind B: 30-50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100-150 bp) metagenomic reads, and it also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.
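
    The four figures reported above (Sensitivity, Specificity, MCC and accuracy) all derive from a 2×2 confusion matrix. The minimal Python sketch below applies the standard definitions. The counts in the usage line are invented for illustration and are not taken from the MP3 benchmark.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp / fn: pathogenic proteins predicted pathogenic / non-pathogenic
    tn / fp: non-pathogenic proteins predicted non-pathogenic / pathogenic
    """
    sensitivity = tp / (tp + fn)              # true-positive rate
    specificity = tn / (tn + fp)              # true-negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; set to 0 when any margin is empty
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "mcc": mcc, "accuracy": accuracy}


# Hypothetical counts, not from the MP3 blind datasets
metrics = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

Unlike accuracy, MCC stays informative when the pathogenic and non-pathogenic classes are unbalanced.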

  20. GWAS and Genomic Prediction Based on Markers of SNP-CHIPS and Sequence Data in Cattle Populations

    DEFF Research Database (Denmark)

    Wu, Xiaoping

    This thesis investigated methods and models for genome-wide association studies and genomic prediction. The main conclusions are: 1) The power of QTL detection can be increased by increasing marker densities, and the Bayesian variable selection model together with the analysis of the QTL intens...

  1. Modeling heterogeneous (co)variances from adjacent-SNP groups improves genomic prediction for milk protein composition traits

    DEFF Research Database (Denmark)

    Gebreyesus, Grum; Lund, Mogens Sandø; Buitenhuis, Albert Johannes

    2017-01-01

    Accurate genomic prediction requires a large reference population, which is problematic for traits that are expensive to measure. Traits related to milk protein composition are not routinely recorded due to costly procedures and are considered to be controlled by a few quantitative trait loci...... of large effect. The amount of variation explained may vary between regions leading to heterogeneous (co)variance patterns across the genome. Genomic prediction models that can efficiently take such heterogeneity of (co)variances into account can result in improved prediction reliability. In this study, we...... developed and implemented novel univariate and bivariate Bayesian prediction models, based on estimates of heterogeneous (co)variances for genome segments (BayesAS). Available data consisted of milk protein composition traits measured on cows and de-regressed proofs of total protein yield derived for bulls...

  2. The Arsenic Resistance-Associated Listeria Genomic Island LGI2 Exhibits Sequence and Integration Site Diversity and a Propensity for Three Listeria monocytogenes Clones with Enhanced Virulence.

    Science.gov (United States)

    Lee, Sangmi; Ward, Todd J; Jima, Dereje D; Parsons, Cameron; Kathariou, Sophia

    2017-11-01

    In the foodborne pathogen Listeria monocytogenes, arsenic resistance is encountered primarily in serotype 4b clones considered to have enhanced virulence and is associated with an arsenic resistance gene cluster within a 35-kb chromosomal region, Listeria genomic island 2 (LGI2). LGI2 was first identified in strain Scott A and includes genes putatively involved in arsenic and cadmium resistance, DNA integration, conjugation, and pathogenicity. However, the genomic localization and sequence content of LGI2 remain poorly characterized. Here we investigated 85 arsenic-resistant L. monocytogenes strains, mostly of serotype 4b. All but one of the 70 serotype 4b strains belonged to clonal complex 1 (CC1), CC2, and CC4, three major clones associated with enhanced virulence. PCR analysis suggested that 53 strains (62.4%) harbored an island highly similar to LGI2 of Scott A, frequently (42/53) in the same location as Scott A (LMOf2365_2257 homolog). Random-primed PCR and whole-genome sequencing revealed seven novel insertion sites, mostly internal to chromosomal coding sequences, among strains harboring LGI2 outside the LMOf2365_2257 homolog. Interestingly, many CC1 strains harbored a noticeably diversified LGI2 (LGI2-1) in a unique location (LMOf2365_0902 homolog) and with a novel additional gene. With few exceptions, the tested LGI2 genes were not detected in arsenic-resistant strains of serogroup 1/2, which instead often harbored a Tn554-associated arsenic resistance determinant not encountered in serotype 4b. These findings indicate that in L. monocytogenes, LGI2 has a propensity for certain serotype 4b clones, exhibits content diversity, and is highly promiscuous, suggesting an ability to mobilize various accessory genes into diverse chromosomal loci. IMPORTANCE Listeria monocytogenes is widely distributed in the environment and causes listeriosis, a foodborne disease with high mortality and morbidity. Arsenic and other heavy metals can powerfully shape the

  3. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems

    Science.gov (United States)

    Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José C.; Mota-Sanchez, David; Estrada-González, Fermín; Gillberg, Jussi; Singh, Ravi; Mondal, Suchismita; Juliana, Philomin

    2018-01-01

    In genomic-enabled prediction, improving the accuracy of predicting lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, computing efficiency remains a limitation. Although some statistical models are mathematically elegant, many of them are also computationally inefficient and impractical for many traits, lines, environments, and years because they need to sample from very large multivariate normal distributions. For these reasons, this study explores two recommender systems, item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF), in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results on the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment–trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets. PMID:29097376
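
    Item-based collaborative filtering treats each environment–trait combination as an "item". It predicts a missing line × item value from the same line's values on similar items. The minimal Python sketch below makes two assumptions: cosine similarity computed on already-standardized phenotypes, and `None` marking unobserved cells. The abstract does not state the study's exact similarity measure or standardization, and the function names and data are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two item columns over lines observed in both (None = missing)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def ibcf_predict(matrix, line, item):
    """Predict matrix[line][item] as a similarity-weighted mean of the line's other items.

    matrix: list of rows (lines); each column is one environment-trait item.
    """
    columns = list(zip(*matrix))
    numerator = denominator = 0.0
    for j, value in enumerate(matrix[line]):
        if j == item or value is None:
            continue                      # skip the target item and unobserved cells
        s = cosine_similarity(columns[item], columns[j])
        numerator += s * value
        denominator += abs(s)
    return numerator / denominator if denominator else None

# Hypothetical standardized phenotypes: 3 lines x 2 items, last cell unobserved
data = [[1.0, 1.0], [2.0, 2.0], [3.0, None]]
prediction = ibcf_predict(data, line=2, item=1)
```

Each prediction needs only pairwise item similarities, not draws from a large multivariate normal distribution. That is the source of the computational advantage described above.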

  4. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems.

    Science.gov (United States)

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José C; Mota-Sanchez, David; Estrada-González, Fermín; Gillberg, Jussi; Singh, Ravi; Mondal, Suchismita; Juliana, Philomin

    2018-01-04

    In genomic-enabled prediction, improving the accuracy of predicting lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, computing efficiency remains a limitation. Although some statistical models are mathematically elegant, many of them are also computationally inefficient and impractical for many traits, lines, environments, and years because they need to sample from very large multivariate normal distributions. For these reasons, this study explores two recommender systems, item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF), in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results on the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment-trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets. Copyright © 2018 Montesinos-Lopez et al.

  5. Prediction of Multiple-Trait and Multiple-Environment Genomic Data Using Recommender Systems

    Directory of Open Access Journals (Sweden)

    Osval A. Montesinos-López

    2018-01-01

    Full Text Available In genomic-enabled prediction, improving the accuracy of predicting lines in environments is difficult because the available information is generally sparse and usually has low correlations between traits. In current genomic selection, although researchers have a large amount of information and appropriate statistical models to process it, computing efficiency remains a limitation. Although some statistical models are mathematically elegant, many of them are also computationally inefficient and impractical for many traits, lines, environments, and years because they need to sample from very large multivariate normal distributions. For these reasons, this study explores two recommender systems, item-based collaborative filtering (IBCF) and the matrix factorization algorithm (MF), in the context of multiple traits and multiple environments. The IBCF and MF methods were compared with two conventional methods on simulated and real data. Results on the simulated and real data sets show that the IBCF technique was slightly better in terms of prediction accuracy than the two conventional methods and the MF method when the correlation was moderately high. The IBCF technique is very attractive because it produces good predictions when there is high correlation between items (environment–trait combinations) and its implementation is computationally feasible, which can be useful for plant breeders who deal with very large data sets.

  6. Predicting Hybrid Performances for Quality Traits through Genomic-Assisted Approaches in Central European Wheat

    KAUST Repository

    Liu, Guozheng

    2016-07-06

    Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major-effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and of the composition and size of the training population on the accuracy of predicting hybrid performance. In total, 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested the presence of several quantitative trait loci (QTLs), but cross-validation instead indicated the absence of major-effect QTLs for all quality traits except 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids helps boost prediction accuracy for an unrelated test population.
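
    Genomic selection of the kind compared above fits all markers jointly and reports accuracy as the correlation between predicted and observed values in a held-out set. The minimal Python sketch below uses generic ridge regression (RR-BLUP-style) on simulated inbred-line data. It does not reproduce the study's hybrid design or its actual models. The simulation settings, the penalty `lam` and the function names are assumptions for illustration.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Ridge-regression marker effects: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def prediction_accuracy(X_train, y_train, X_test, y_test, lam=1.0):
    """Pearson correlation between predicted and observed test phenotypes."""
    mu = y_train.mean()
    b = ridge_marker_effects(X_train, y_train - mu, lam)
    y_hat = mu + X_test @ b
    return float(np.corrcoef(y_hat, y_test)[0, 1])

# Simulated data: 250 lines, 50 SNPs coded -1/0/1, purely additive trait
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(250, 50)).astype(float)
beta = rng.normal(0.0, 1.0, size=50)
y = X @ beta + rng.normal(0.0, 1.0, size=250)
acc = prediction_accuracy(X[:200], y[:200], X[200:], y[200:], lam=1.0)
```

Here the training set is a random split of one population. The resampling result above concerns a harder case: an unrelated test population, where accuracy depends on the effective size of the estimation set.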

  7. Predicting Hybrid Performances for Quality Traits through Genomic-Assisted Approaches in Central European Wheat.

    Directory of Open Access Journals (Sweden)

    Guozheng Liu

    Full Text Available Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major-effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and of the composition and size of the training population on the accuracy of predicting hybrid performance. In total, 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested the presence of several quantitative trait loci (QTLs), but cross-validation instead indicated the absence of major-effect QTLs for all quality traits except 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids helps boost prediction accuracy for an unrelated test population.

  8. Predicting Hybrid Performances for Quality Traits through Genomic-Assisted Approaches in Central European Wheat

    KAUST Repository

    Liu, Guozheng; Zhao, Yusheng; Gowda, Manje; Longin, C. Friedrich H.; Reif, Jochen C.; Mette, Michael F.

    2016-01-01

    Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major-effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and of the composition and size of the training population on the accuracy of predicting hybrid performance. In total, 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested the presence of several quantitative trait loci (QTLs), but cross-validation instead indicated the absence of major-effect QTLs for all quality traits except 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids helps boost prediction accuracy for an unrelated test population.

  9. Genomic Selection Accuracy using Multifamily Prediction Models in a Wheat Breeding Program

    Directory of Open Access Journals (Sweden)

    Elliot L. Heffner

    2011-03-01

    Full Text Available Genomic selection (GS) uses genome-wide molecular marker data to predict the genetic value of selection candidates in breeding programs. In plant breeding, the ability to produce large numbers of progeny per cross allows GS to be conducted within each family. However, this approach requires phenotypes of lines from each cross before conducting GS. This will prolong the selection cycle and may result in lower gains per year than approaches that estimate marker effects with multiple families from previous selection cycles. In this study, phenotypic selection (PS), conventional marker-assisted selection (MAS), and GS prediction accuracy were compared for 13 agronomic traits in a population of 374 winter wheat (Triticum aestivum L.) advanced-cycle breeding lines. A cross-validation approach that trained and validated prediction accuracy across years was used to evaluate the effects of model selection, training population size, and marker density in the presence of genotype × environment interactions (G×E). The average prediction accuracies using GS were 28% greater than with MAS and were 95% as accurate as PS. For net merit, the average accuracy across six selection indices for GS was 14% greater than for PS. These results provide empirical evidence that multifamily GS could increase genetic gain per unit time and cost in plant breeding.

  10. Predicting Hybrid Performances for Quality Traits through Genomic-Assisted Approaches in Central European Wheat

    Science.gov (United States)

    Liu, Guozheng; Zhao, Yusheng; Gowda, Manje; Longin, C. Friedrich H.; Reif, Jochen C.; Mette, Michael F.

    2016-01-01

    Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major-effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and of the composition and size of the training population on the accuracy of predicting hybrid performance. In total, 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested the presence of several quantitative trait loci (QTLs), but cross-validation instead indicated the absence of major-effect QTLs for all quality traits except 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids helps boost prediction accuracy for an unrelated test population. PMID:27383841

  11. Genomic biomarkers of prenatal intrauterine inflammation in umbilical cord tissue predict later life neurological outcomes.

    Directory of Open Access Journals (Sweden)

    Sloane K Tilley

    Full Text Available Preterm birth is a major risk factor for neurodevelopmental delays and disorders. This study aimed to identify genomic biomarkers of intrauterine inflammation in umbilical cord tissue in preterm neonates that predict cognitive impairment at 10 years of age. Genome-wide messenger RNA (mRNA) levels from umbilical cord tissue were obtained from 43 neonates born before 28 weeks of gestation. Genes that were differentially expressed across four indicators of intrauterine inflammation were identified and their functions examined. Exact logistic regression was used to test whether expression levels in umbilical cord tissue predicted neurocognitive function at 10 years of age. Placental indicators of inflammation were associated with changes in the mRNA expression of 445 genes in umbilical cord tissue. Transcripts with decreased expression showed significant enrichment for biological signaling processes related to neuronal development and growth. The altered expression of six genes was found to predict neurocognitive impairment when children were 10 years old. These genes include two that encode proteins involved in neuronal development. Prenatal intrauterine inflammation is associated with altered gene expression in umbilical cord tissue. A set of six of the differentially expressed genes predicts cognitive impairment later in life, suggesting that the fetal environment is associated with significant adverse effects on neurodevelopment that persist into later childhood.

  12. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  13. Genomic prediction based on next generation sequencing of 1000 F2-families in Lolium perenne L

    DEFF Research Database (Denmark)

    Fé, Dario; Ashraf, Bilal; Greve-Pedersen, Morten

    2014-01-01

    and abiotic stresses. The study is performed on 995 F2 families originating from the DLF breeding program. All families were genotyped by reduced representation sequencing. A total of 1,020,065 SNPs were detected and used for genomic prediction. First analyses, used for model testing, have been carried out...... on salt stress tolerance. Ryegrass families were sown in rockwool blocks (in four replicates) in a greenhouse and allowed to establish over 60 days using standard fertilization and watering. Three consecutive treatments, with increasing salt (NaCl) concentrations, were applied. Ten days after initiation...... of each treatment, the percentage of green matter was evaluated by visual scoring and by digital imaging. Preliminary analysis using GBLUP has identified a significant amount of genetic variance (individual heritabilities ranging between 0.20 and 0.40 and family heritabilities up to about 0.15). Genomic...
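
    GBLUP, as used above, replaces pedigree relationships with a genomic relationship matrix G estimated from the SNPs. The abstract does not say how G was built. The minimal Python sketch below uses VanRaden's first method as one common choice and assumes individual allele dosages coded 0/1/2. Family-pool sequencing data such as these would instead need dosages or allele frequencies estimated per family. The function name and toy genotypes are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix (VanRaden method 1).

    M: n x m array of allele dosages (0, 1 or 2) for n individuals and m SNPs.
    """
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                  # allele frequency of each SNP
    Z = M - 2.0 * p                           # centre each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))       # makes G comparable to pedigree relationships
    return Z @ Z.T / scale

# Hypothetical toy genotypes: 3 individuals x 3 SNPs
G = vanraden_g([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
```

In GBLUP, G then serves as the covariance structure of the random genetic effects in a mixed model. The genetic variance and heritability estimates come from fitting that model.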

  14. Predictive Genomic Analyses Inform the Basis for Vitamin Metabolism and Provisioning in Bacteria-Arthropod Endosymbioses.

    Science.gov (United States)

    Serbus, Laura R; Rodriguez, Brian Garcia; Sharmin, Zinat; Momtaz, A J M Zehadee; Christensen, Steen

    2017-06-07

    The requirement of vitamins for core metabolic processes creates a unique set of pressures for arthropods subsisting on nutrient-limited diets. While endosymbiotic bacteria carried by arthropods have been widely implicated in vitamin provisioning, the underlying molecular mechanisms are not well understood. To address this issue, standardized predictive assessment of vitamin metabolism was performed in 50 endosymbionts of insects and arachnids. The results predicted that arthropod endosymbionts overall have little capacity for complete de novo biosynthesis of conventional or active vitamin forms. Partial biosynthesis pathways were commonly predicted, suggesting a substantial role in vitamin provisioning. Neither taxonomic relationships between host and symbiont, nor the mode of host-symbiont interaction were clear predictors of endosymbiont vitamin pathway capacity. Endosymbiont genome size and the synthetic capacity of nonsymbiont taxonomic relatives were more reliable predictors. We developed a new software application that also predicted that last-step conversion of intermediates into active vitamin forms may contribute further to vitamin biosynthesis by endosymbionts. Most instances of predicted vitamin conversion were paralleled by predictions of vitamin use. This is consistent with achievement of provisioning in some cases through upregulation of pathways that were retained for endosymbiont benefit. The predicted absence of other enzyme classes further suggests a baseline of vitamin requirement by the majority of endosymbionts, as well as some instances of putative mutualism. Adaptation of this workflow to analysis of other organisms and metabolic pathways will provide new routes for considering the molecular basis for symbiosis on a comprehensive scale. Copyright © 2017 Serbus et al.
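
    The predictions above distinguish complete de novo biosynthesis, partial pathways and last-step conversion of intermediates. The toy Python classifier below illustrates how such categories can follow from a genome's annotated enzymes. It is not the authors' software application. The four-step pathway and its enzyme labels are placeholders, not real pathway definitions.

```python
def classify_pathway(required_steps, annotated_enzymes):
    """Classify a biosynthetic pathway as complete, last-step only, partial or absent.

    required_steps: ordered list of enzyme identifiers, final step last.
    annotated_enzymes: set of enzyme identifiers found in the genome annotation.
    """
    present = [step for step in required_steps if step in annotated_enzymes]
    if len(present) == len(required_steps):
        return "complete"            # full de novo biosynthesis predicted
    if present == [required_steps[-1]]:
        return "last-step only"      # only conversion of a supplied intermediate
    if present:
        return "partial"
    return "absent"

# Hypothetical four-step pathway with placeholder enzyme labels
pathway = ["E1", "E2", "E3", "E4"]
```

A real workflow would also need to handle enzyme isoforms and alternative routes. This sketch ignores both.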

  15. Predictive Genomic Analyses Inform the Basis for Vitamin Metabolism and Provisioning in Bacteria-Arthropod Endosymbioses

    Directory of Open Access Journals (Sweden)

    Laura R. Serbus

    2017-06-01

    Full Text Available The requirement of vitamins for core metabolic processes creates a unique set of pressures for arthropods subsisting on nutrient-limited diets. While endosymbiotic bacteria carried by arthropods have been widely implicated in vitamin provisioning, the underlying molecular mechanisms are not well understood. To address this issue, standardized predictive assessment of vitamin metabolism was performed in 50 endosymbionts of insects and arachnids. The results predicted that arthropod endosymbionts overall have little capacity for complete de novo biosynthesis of conventional or active vitamin forms. Partial biosynthesis pathways were commonly predicted, suggesting a substantial role in vitamin provisioning. Neither taxonomic relationships between host and symbiont, nor the mode of host-symbiont interaction were clear predictors of endosymbiont vitamin pathway capacity. Endosymbiont genome size and the synthetic capacity of nonsymbiont taxonomic relatives were more reliable predictors. We developed a new software application that also predicted that last-step conversion of intermediates into active vitamin forms may contribute further to vitamin biosynthesis by endosymbionts. Most instances of predicted vitamin conversion were paralleled by predictions of vitamin use. This is consistent with achievement of provisioning in some cases through upregulation of pathways that were retained for endosymbiont benefit. The predicted absence of other enzyme classes further suggests a baseline of vitamin requirement by the majority of endosymbionts, as well as some instances of putative mutualism. Adaptation of this workflow to analysis of other organisms and metabolic pathways will provide new routes for considering the molecular basis for symbiosis on a comprehensive scale.

  16. Calibrated prediction of Pine Island Glacier retreat during the 21st and 22nd centuries with a coupled flowline model

    Science.gov (United States)

    Gladstone, Rupert M.; Lee, Victoria; Rougier, Jonathan; Payne, Antony J.; Hellmer, Hartmut; Le Brocq, Anne; Shepherd, Andrew; Edwards, Tamsin L.; Gregory, Jonathan; Cornford, Stephen L.

    2012-06-01

    A flowline ice sheet model is coupled to a box model for cavity circulation and configured for the Pine Island Glacier. An ensemble of 5000 simulations is carried out from 1900 to 2200 with varying inputs and parameters, forced by ocean temperatures predicted by a regional ocean model under the A1B ‘business as usual’ emissions scenario. Comparison is made against recent observations to provide a calibrated prediction in the form of a 95% confidence set. Predictions are for monotonic (apart from some small-scale fluctuations in a minority of cases) retreat of the grounding line over the next 200 yr, with large uncertainty in the rate of retreat. Full collapse of the main trunk of the PIG during the 22nd century remains a possibility.
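
    Calibrating an ensemble against observations means giving more weight to members whose simulated past agrees with what was measured, then summarizing the weighted predictions. The minimal Python sketch below uses a Gaussian likelihood weight and a weighted central interval. This is a generic importance-weighting scheme, not necessarily the calibration method of the study. The observable, `obs_sd` and the example values are hypothetical.

```python
import math

def calibrate(ensemble, observed, obs_sd):
    """Weight ensemble members by a Gaussian likelihood of matching an observation.

    ensemble: list of (simulated_observable, prediction) pairs, one per member.
    Returns a list of (prediction, normalised_weight) pairs.
    """
    weights = [math.exp(-0.5 * ((sim - observed) / obs_sd) ** 2) for sim, _ in ensemble]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no ensemble member is compatible with the observation")
    return [(pred, w / total) for (_, pred), w in zip(ensemble, weights)]

def weighted_interval(weighted, level=0.95):
    """Central interval of the weighted predictions (e.g. a 95% set)."""
    lower_q, upper_q = (1 - level) / 2, 1 - (1 - level) / 2
    cumulative, lower, upper = 0.0, None, None
    for pred, w in sorted(weighted):          # walk predictions in increasing order
        cumulative += w
        if lower is None and cumulative >= lower_q:
            lower = pred
        if upper is None and cumulative >= upper_q:
            upper = pred
    return lower, upper
```

Members that reproduce the observed recent state dominate the interval. The remaining spread reflects how loosely those observations constrain the future rate of retreat.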

  17. Where does Neisseria acquire foreign DNA from: an examination of the source of genomic and pathogenic islands and the evolution of the Neisseria genus.

    Science.gov (United States)

    Putonti, Catherine; Nowicki, Bogdan; Shaffer, Michael; Fofanov, Yuriy; Nowicki, Stella

    2013-09-04

    Pathogenicity islands (PAIs) or genomic islands (GEIs) are considered to be the result of recent horizontal transfer. Detecting PAIs/GEIs as well as their putative sources can provide insight into the organism's pathogenicity within its host. Previously we introduced a tool called S-plot, which provides a visual representation of the variation in compositional properties across and between genomic sequences. Utilizing S-plot and new functionality developed here, we examined 18 publicly available Neisseria genomes, including strains of both pathogenic and non-pathogenic species, in order to identify regions of unusual compositional properties (RUCPs) using both a sliding-window and a gene-by-gene approach. Numerous GEIs and PAIs were identified, including virulence genes previously found within the pathogenic Neisseria species. While some genes were conserved among all species, among the pathogenic species only, or within an individual species, a number of genes were detected that are unique to an individual strain. While the majority of such genes are of unknown origin, a number of putative sources, including pathogenic and capsule-containing bacteria, were determined, indicative of gene exchange between Neisseria spp. and other bacteria within their microhabitat. Furthermore, we uncovered evidence that both N. meningitidis and N. gonorrhoeae have separately acquired DNA from their human host. The data suggest that all three Neisseria species have received horizontally transferred elements post-speciation. Using this approach, we were able to find not only previously identified regions of virulence but also new regions that may be contributing to the virulence of the species. This comparative analysis provides a means for tracing the evolutionary history of the acquisition of foreign DNA within this genus. Looking specifically at the RUCPs present within the 18 genomes considered, a stronger similarity between N. meningitidis and N. lactamica is observed, suggesting that N
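
    The sliding-window approach above flags regions whose composition departs from the rest of the genome. The abstract does not detail which compositional properties S-plot measures. The minimal Python sketch below therefore uses GC content as a simplified stand-in and reports windows more than `z` standard deviations from the genome-wide mean. The window size, step and `z` cutoff are hypothetical defaults.

```python
import statistics

def gc_windows(seq, window, step):
    """(start, GC fraction) for each full window along the sequence."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        result.append((start, (chunk.count("G") + chunk.count("C")) / window))
    return result

def unusual_regions(seq, window=1000, step=500, z=2.0):
    """Start positions of windows whose GC fraction is more than z SDs from the mean."""
    windows = gc_windows(seq, window, step)
    values = [gc for _, gc in windows]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [start for start, gc in windows if sd > 0 and abs(gc - mean) / sd > z]

# Hypothetical AT-rich genome with one GC-rich insertion at positions 4000-5000
genome = "AT" * 2000 + "GC" * 500 + "AT" * 2000
```

A gene-by-gene variant would compute the same statistic over annotated coding sequences instead of fixed windows.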

  18. miRNAFold: a web server for fast miRNA precursor prediction in genomes.

    Science.gov (United States)

    Tav, Christophe; Tempel, Sébastien; Poligny, Laurent; Tahi, Fariza

    2016-07-08

    Computational methods are required for predicting non-coding RNAs (ncRNAs), which are involved in many biological processes, especially at the post-transcriptional level. Among these ncRNAs, miRNAs have been widely studied, and biologists need efficient and fast tools for their identification. In particular, ab initio methods are usually required when predicting novel miRNAs. Here we present a web server dedicated to the large-scale identification of miRNA precursors in genomes. It is based on an algorithm called miRNAFold that predicts miRNA hairpin structures quickly and with high sensitivity. miRNAFold is implemented as a web server with an intuitive and user-friendly interface, as well as a standalone version. The web server is freely available at: http://EvryRNA.ibisc.univ-evry.fr/miRNAFold. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
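
    To illustrate what a hairpin structure is, the Python sketch below searches a sequence for the longest perfectly paired stem closing a loop of fixed length. This is a naive heuristic, not the miRNAFold algorithm. Real miRNA precursors tolerate mismatches and bulges, which this sketch ignores. The loop length, minimum stem length and function names are hypothetical.

```python
# Watson-Crick pairs plus the G-U wobble pair common in RNA stems
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def stem_length(seq, loop_start, loop_len):
    """Number of contiguous base pairs closing the loop seq[loop_start:loop_start + loop_len]."""
    i, j = loop_start - 1, loop_start + loop_len
    pairs = 0
    while i >= 0 and j < len(seq) and (seq[i], seq[j]) in PAIRS:
        pairs += 1
        i -= 1
        j += 1
    return pairs

def best_hairpin(seq, loop_len=4, min_stem=4):
    """Return (loop_start, stem_length) of the longest perfect stem, or None."""
    seq = seq.upper().replace("T", "U")   # accept DNA input by converting T to U
    best = None
    for start in range(1, len(seq) - loop_len):
        stem = stem_length(seq, start, loop_len)
        if stem >= min_stem and (best is None or stem > best[1]):
            best = (start, stem)
    return best
```

Dedicated tools score candidate stems with more tolerant pairing rules, which this exhaustive perfect-pairing search does not attempt.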

  19. Genome-Wide Polygenic Scores Predict Reading Performance Throughout the School Years.

    Science.gov (United States)

    Selzam, Saskia; Dale, Philip S; Wagner, Richard K; DeFries, John C; Cederlöf, Martin; O'Reilly, Paul F; Krapohl, Eva; Plomin, Robert

    2017-07-04

    It is now possible to create individual-specific genetic scores, called genome-wide polygenic scores (GPS). We used a GPS for years of education (EduYears) to predict reading performance assessed at UK National Curriculum Key Stages 1 (age 7), 2 (age 12) and 3 (age 14) and on reading tests administered at ages 7 and 12 in a UK sample of 5,825 unrelated individuals. EduYears GPS accounts for up to 5% of the variance in reading performance at age 14. GPS predictions remained significant after accounting for general cognitive ability and family socioeconomic status. Reading performance of children in the lowest and highest 12.5% of the EduYears GPS distribution differed by a mean growth in reading ability of approximately two school years. It seems certain that polygenic scores will be used to predict strengths and weaknesses in education.
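
    A genome-wide polygenic score is a weighted sum of a person's effect-allele dosages, with weights taken from GWAS summary statistics. The minimal Python sketch below computes such a score and selects the lowest and highest 12.5% of the distribution, as in the comparison above. SNP selection steps such as clumping and p-value thresholding are omitted. The weights, dosages and function names are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Genome-wide polygenic score: sum of effect-allele dosages times GWAS weights."""
    return sum(d * w for d, w in zip(dosages, weights))

def extreme_groups(scores, fraction=0.125):
    """Indices of individuals in the lowest and highest `fraction` of the score distribution."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = int(len(scores) * fraction)
    # Slice with len - k so that k == 0 yields two empty groups
    return order[:k], order[len(scores) - k:]

# Hypothetical individual: three SNP dosages and made-up per-allele weights
score = polygenic_score([0, 1, 2], [0.1, -0.2, 0.3])
```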

  20. Computational prediction of cAMP receptor protein (CRP) binding sites in cyanobacterial genomes

    Directory of Open Access Journals (Sweden)

    Su Zhengchang

    2009-01-01

    Full Text Available Abstract Background Cyclic AMP receptor protein (CRP), also known as catabolite gene activator protein (CAP), is an important transcriptional regulator widely distributed in many bacteria. The biological processes under the regulation of CRP are highly diverse among different groups of bacterial species. Elucidation of CRP regulons in cyanobacteria will further our understanding of the physiology and ecology of this important group of microorganisms. Previously, CRP has been experimentally studied in only two cyanobacterial strains: Synechocystis sp. PCC 6803 and Anabaena sp. PCC 7120; therefore, a systematic genome-scale study of the potential CRP target genes and binding sites in cyanobacterial genomes is urgently needed. Results We have predicted and analyzed the CRP binding sites and regulons in 12 sequenced cyanobacterial genomes using a highly effective cis-regulatory binding site scanning algorithm. Our results show that cyanobacterial CRP binding sites are very similar to those in E. coli; however, the regulons are very different from that of E. coli. Furthermore, CRP regulons in different cyanobacterial species/ecotypes are also highly diversified, ranging from photosynthesis, carbon fixation and nitrogen assimilation, to chemotaxis and signal transduction. In addition, our prediction indicates that crp genes in modern cyanobacteria are likely inherited from a common ancestral gene in their last common ancestor, and have adapted various cellular functions in different environments, while some cyanobacteria lost their crp genes as well as CRP binding sites during the course of evolution. Conclusion The CRP regulons in cyanobacteria are highly diversified, probably as a result of divergent evolution to adapt to various ecological niches. Cyanobacterial CRPs may function as lineage-specific regulators participating in various cellular processes, and are important in some lineages. However, they are dispensable in some other lineages. The
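
The abstract does not describe its scanning algorithm, so the following is only a generic illustration of how binding-site scanners are often built: convert a matrix of per-position base counts into log-odds scores against a background, then score every window on both strands. The five-column toy motif is an assumption for illustration and is not the CRP matrix used in the study.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, background=None, pseudocount=0.5):
    """Convert per-position base counts into log2-odds scores versus a background."""
    background = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, pwm, threshold):
    """Yield (start, strand, score) for every window scoring >= threshold."""
    width = len(pwm)
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        for strand, site in (("+", window), ("-", revcomp(window))):
            if any(b not in BASES for b in site):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[b] for col, b in zip(pwm, site))
            if score >= threshold:
                yield i, strand, score


# Hypothetical toy motif "TGTGA": 10 observations of the consensus base per column.
counts = [{b: (10 if b == c else 0) for b in BASES} for c in "TGTGA"]
pwm = log_odds_matrix(counts)
```

Scanning both strands matters here because a site on the minus strand appears in the forward sequence as its reverse complement.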

  1. Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model.

    Science.gov (United States)

    Lopez-Cruz, Marco; Crossa, Jose; Bonnett, David; Dreisigacker, Susanne; Poland, Jesse; Jannink, Jean-Luc; Singh, Ravi P; Autrique, Enrique; de los Campos, Gustavo

    2015-02-06

    Genomic selection (GS) models use genome-wide genetic information to predict genetic values of candidates of selection. Originally, these models were developed without considering genotype × environment interaction (G×E). Several authors have proposed extensions of the single-environment GS model that accommodate G×E using either covariance functions or environmental covariates. In this study, we model G×E using a marker × environment interaction (M×E) GS model; the approach is conceptually simple and can be implemented with existing GS software. We discuss how the model can be implemented by using an explicit regression of phenotypes on markers or using covariance structures (a genomic best linear unbiased prediction-type model). We used the M×E model to analyze three CIMMYT wheat data sets (W1, W2, and W3), where more than 1000 lines were genotyped using genotyping-by-sequencing and evaluated at CIMMYT's research station in Ciudad Obregon, Mexico, under simulated environmental conditions that covered different irrigation levels, sowing dates and planting systems. We compared the M×E model with a stratified (i.e., within-environment) analysis and with a standard (across-environment) GS model that assumes that effects are constant across environments (i.e., ignoring G×E). The prediction accuracy of the M×E model was substantially greater than that of an across-environment analysis that ignores G×E. Depending on the prediction problem, the M×E model had either similar or greater levels of prediction accuracy than the stratified analyses. The M×E model decomposes marker effects and genomic values into components that are stable across environments (main effects) and others that are environment-specific (interactions). Therefore, in principle, the interaction model could shed light on which variants have effects that are stable across environments and which ones are responsible for G×E. The data set and the scripts required to reproduce the analysis are
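
As we read the abstract, the regression-on-markers form of the M×E model writes the phenotype of line i in environment j as y_ij = μ_j + Σ_k x_ik (b_0k + b_jk) + ε_ij, where b_0k is the main (across-environment) effect of marker k and b_jk is its deviation in environment j. The sketch below builds only the corresponding design matrix. Each record's marker vector appears once in the main-effect block and once in the block of its own environment. Fitting, for example by ridge regression or a GBLUP-type covariance model, is left out; the function name is hypothetical.

```python
def mxe_design(genotypes, envs, n_env):
    """Rows [x_i | 0 ... x_i ... 0] for a marker x environment regression.

    genotypes: one marker vector per phenotypic record
    envs:      environment index (0..n_env-1) of each record
    Block 0 holds the main marker effects; block 1+e holds the effects
    specific to environment e.
    """
    p = len(genotypes[0])
    rows = []
    for x, e in zip(genotypes, envs):
        if not 0 <= e < n_env:
            raise ValueError(f"environment index {e} out of range")
        row = list(x) + [0.0] * (p * n_env)
        start = p * (1 + e)
        row[start:start + p] = x  # copy markers into this environment's block
        rows.append(row)
    return rows
```

Because the main block is shared across environments, information is borrowed between them, while the environment-specific blocks let marker effects differ where G×E exists.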

  2. Pre-drilling prediction techniques on the high-temperature high-pressure hydrocarbon reservoirs offshore Hainan Island, China

    Science.gov (United States)

    Zhang, Hanyu; Liu, Huaishan; Wu, Shiguo; Sun, Jin; Yang, Chaoqun; Xie, Yangbing; Chen, Chuanxu; Gao, Jinwei; Wang, Jiliang

    2018-02-01

    Decreasing the risks and geohazards associated with drilling engineering in high-temperature high-pressure (HTHP) geologic settings begins with the implementation of pre-drilling prediction techniques (PPTs). To improve the accuracy of geopressure prediction in HTHP hydrocarbon reservoirs offshore Hainan Island, we made a comprehensive summary of current PPTs to identify existing problems and challenges by analyzing the global distribution of HTHP hydrocarbon reservoirs, the research status of PPTs, and the geologic setting and its HTHP formation mechanism. Our research results indicate that the HTHP formation mechanism in the study area is caused by multiple factors, including rapid loading, diapir intrusions, hydrocarbon generation, and the thermal expansion of pore fluids. Due to this multi-factor interaction, a cloud of HTHP hydrocarbon reservoirs has developed in the Ying-Qiong Basin, but only traditional PPTs have been implemented, based on the assumption of conditions that do not conform to the actual geologic environment, e.g., Bellotti's law and Eaton's law. In this paper, we focus on these issues, identify some challenges and solutions, and call for further PPT research to address the drawbacks of previous works and meet the challenges associated with the deepwater technology gap. In this way, we hope to contribute to the improved accuracy of geopressure prediction prior to drilling and provide support for future HTHP drilling offshore Hainan Island.
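
For reference, Eaton's sonic-log relation (one of the "traditional PPTs" the abstract criticizes) is commonly written as P_p = S_v − (S_v − P_h)(Δt_n/Δt_o)^3. Here S_v is overburden stress, P_h is hydrostatic pressure, Δt_n is the normal-compaction transit time and Δt_o is the observed transit time. The sketch below assumes that form with the default exponent of 3. In practice the exponent and the normal-compaction trend are calibrated locally.

```python
def eaton_pore_pressure(overburden, hydrostatic, dt_normal, dt_observed, exponent=3.0):
    """Eaton-style pore-pressure estimate from sonic transit time.

    Pressures must share one unit (e.g. MPa), as must transit times (e.g. us/ft).
    A slower-than-normal transit time (dt_observed > dt_normal) is read as
    undercompaction and pushes the estimate above hydrostatic.
    """
    if dt_normal <= 0 or dt_observed <= 0:
        raise ValueError("transit times must be positive")
    ratio = dt_normal / dt_observed
    return overburden - (overburden - hydrostatic) * ratio ** exponent
```

The relation attributes overpressure to undercompaction alone. This is why, as the abstract argues, it can misestimate pressure where hydrocarbon generation, diapirism and thermal expansion of pore fluids also contribute.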

  3. Prediction of malting quality traits in barley based on genome-wide marker data to assess the potential of genomic selection.

    Science.gov (United States)

    Schmidt, Malthe; Kollers, Sonja; Maasberg-Prelle, Anja; Großer, Jörg; Schinkel, Burkhard; Tomerius, Alexandra; Graner, Andreas; Korzun, Viktor

    2016-02-01

    Genomic prediction of malting quality traits in barley shows the potential of applying genomic selection to improve selection for malting quality and speed up the breeding process. Genomic selection has been applied to various plant species, mostly for yield or yield-related traits such as grain dry matter yield or thousand kernel weight, and for improving disease resistance. Quality traits have not been the main focus of genomic selection analyses but have instead been addressed by marker-assisted selection. In this study, the potential to apply genomic selection to twelve malting quality traits in two commercial breeding programs of spring and winter barley (Hordeum vulgare L.) was assessed. Phenotypic means were calculated combining multilocational field trial data from 3 or 4 years, depending on the trait investigated. Three to five locations were available in each of these years. Heritabilities for malting traits ranged between 0.50 and 0.98. Predictive abilities (PA), as derived from cross-validation, ranged from 0.14 to 0.58 for spring barley and from 0.40 to 0.80 for winter barley. Small training sets were shown to be sufficient to obtain useful PAs, possibly due to the narrow genetic base in this breeding material. Deployment of genomic selection in malting barley breeding clearly has the potential to reduce cost-intensive phenotyping for quality traits, increase selection intensity and shorten breeding cycles.

  4. Genome-Enabled Prediction of Breeding Values for Feedlot Average Daily Weight Gain in Nelore Cattle

    Directory of Open Access Journals (Sweden)

    Adriana L. Somavilla

    2017-06-01

    Full Text Available Nelore is the most economically important cattle breed in Brazil, and the use of genetically improved animals has contributed to increased beef production efficiency. The Brazilian beef feedlot industry has grown considerably in the last decade, so the selection of animals with higher growth rates on feedlot has become quite important. Genomic selection (GS) could be used to reduce generation intervals and improve the rate of genetic gains. The aim of this study was to evaluate the prediction of genomic-estimated breeding values (GEBV) for average daily weight gain (ADG) in 718 feedlot-finished Nelore steers. Analyses of three Bayesian model specifications [Bayesian GBLUP (BGBLUP), BayesA, and BayesCπ] were performed with four genotype panels [Illumina BovineHD BeadChip, TagSNPs, and GeneSeek High- and Low-density indicus (HDi and LDi, respectively)]. Estimates of Pearson correlations, regression coefficients, and mean squared errors were used to assess accuracy and bias of predictions. Overall, the BayesCπ model resulted in less biased predictions. Accuracies ranged from 0.18 to 0.27, which are reasonable values given the heritability estimates (from 0.40 to 0.44) and sample size (568 animals in the training population). Furthermore, results from Bos taurus indicus panels were as informative as those from Illumina BovineHD, indicating that they could be used to implement GS at lower costs.

  5. PRISM offers a comprehensive genomic approach to transcription factor function prediction

    KAUST Repository

    Wenger, A. M.; Clarke, S. L.; Guturu, H.; Chen, J.; Schaar, B. T.; McLean, C. Y.; Bejerano, G.

    2013-01-01

    The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
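
The core question in the abstract is whether a motif's predicted sites fall next to genes of one function more often than chance. A hypergeometric gene-set test is one simple way to quantify that and is shown here only as a stand-in. It is not necessarily the statistic PRISM uses, which the abstract describes only as "a novel statistical framework". All counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes considered; successes: genes annotated with the function;
    draws: genes near the motif's predicted sites; k: observed overlap.
    """
    upper = min(successes, draws)
    total = sum(comb(successes, i) * comb(population - successes, draws - i)
                for i in range(max(k, 0), upper + 1))
    return total / comb(population, draws)
```

A small tail probability means the overlap is larger than expected by chance. Across thousands of motif and function pairs, such p-values still need multiple-testing control, which the abstract reports as a false discovery rate.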

  6. PRISM offers a comprehensive genomic approach to transcription factor function prediction

    KAUST Repository

    Wenger, A. M.

    2013-02-04

    The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.

  7. Integration of the blaNDM-1 carbapenemase gene into Proteus genomic island 1 (PGI1-PmPEL) in a Proteus mirabilis clinical isolate.

    Science.gov (United States)

    Girlich, Delphine; Dortet, Laurent; Poirel, Laurent; Nordmann, Patrice

    2015-01-01

    To decipher the mechanisms and their associated genetic determinants responsible for β-lactam resistance in a Proteus mirabilis clinical isolate. The entire genetic structure surrounding the β-lactam resistance genes was characterized by PCR, gene walking and DNA sequencing. Genes encoding the carbapenemase NDM-1 and the ESBL VEB-6 were located in a 38.5 kb MDR structure, which itself was inserted into a new variant of the Proteus genomic island 1 (PGI1). This new PGI1-PmPEL variant of 64.4 kb was chromosomally located, as an external circular form in the P. mirabilis isolate, suggesting potential mobility. This is the first known description of the bla(NDM-1) gene in a genomic island structure, which might further enhance the spread of the bla(NDM-1) carbapenemase gene among enteric pathogens. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Predicting co-complexed protein pairs using genomic and proteomic data integration

    Directory of Open Access Journals (Sweden)

    King Oliver D

    2004-04-01

    Full Text Available Abstract Background Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and exploit their differing predictive values for more accurate prediction of co-complexed relationships. Results Using a supervised machine learning approach (a probabilistic decision tree), we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCPs) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, the Yeast Proteome Database), and the remaining predictions may potentially represent unknown CCPs. Conclusions We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.

  9. Assessing Genomic Selection Prediction Accuracy in a Dynamic Barley Breeding Population

    Directory of Open Access Journals (Sweden)

    A. H. Sallam

    2015-03-01

    Full Text Available Prediction accuracy of genomic selection (GS) has been previously evaluated through simulation and cross-validation; however, validation based on progeny performance in a plant breeding program has not been investigated thoroughly. We evaluated several prediction models in a dynamic barley breeding population composed of 647 six-row lines using four traits differing in genetic architecture and 1536 single nucleotide polymorphism (SNP) markers. The breeding lines were divided into six sets designated as one parent set and five consecutive progeny sets composed of representative samples of breeding lines over a 5-yr period. We used these data sets to investigate the effect of model and training population composition on prediction accuracy over time. We found little difference in prediction accuracy among the models, confirming prior studies that found the simplest model, random regression best linear unbiased prediction (RR-BLUP), to be accurate across a range of situations. In general, we found that using the parent set was sufficient to predict progeny sets, with little to no gain in accuracy from generating larger training populations by combining the parent set with subsequent progeny sets. The prediction accuracy ranged from 0.03 to 0.99 across the four traits and five progeny sets. We explored characteristics of the training and validation populations (marker allele frequency, population structure, and linkage disequilibrium [LD]) as well as characteristics of the trait (genetic architecture and heritability [h²]). Fixation of markers associated with a trait over time was most clearly associated with reduced prediction accuracy for the mycotoxin trait DON. Higher trait h² in the training population and simpler trait architecture were associated with greater prediction accuracy.
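
Marker fixation, which the abstract links to falling accuracy for DON, can be tracked by computing allele frequencies in each breeding set and flagging markers whose minor-allele frequency has dropped to (or near) zero. The sketch assumes 0/1/2 genotype coding with no missing calls. The threshold and function names are hypothetical.

```python
def allele_frequencies(genotypes):
    """Frequency of the counted allele per marker; genotypes coded 0/1/2."""
    n = len(genotypes)
    p = len(genotypes[0])
    return [sum(g[j] for g in genotypes) / (2 * n) for j in range(p)]


def fixed_markers(genotypes, maf_threshold=0.0):
    """Indices of markers whose minor-allele frequency is <= maf_threshold."""
    freqs = allele_frequencies(genotypes)
    return [j for j, f in enumerate(freqs) if min(f, 1 - f) <= maf_threshold]
```

Comparing `fixed_markers(parent_set)` with `fixed_markers(progeny_set)` (both hypothetical variables) would show which trait-associated markers stopped segregating over the five-year period. Once a marker is fixed, its effect estimated in the parents no longer distinguishes progeny.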

  10. Genomic Selection for Predicting Fusarium Head Blight Resistance in a Wheat Breeding Program

    Directory of Open Access Journals (Sweden)

    Marcio P. Arruda

    2015-11-01

    Full Text Available Genomic selection (GS) is a breeding method that uses marker–trait models to predict unobserved phenotypes. This study developed GS models for predicting traits associated with resistance to Fusarium head blight (FHB) in wheat (Triticum aestivum L.). We used genotyping-by-sequencing (GBS) to identify 5054 single-nucleotide polymorphisms (SNPs), which were then treated as predictor variables in GS analysis. We compared how the prediction accuracy of the genomic-estimated breeding values (GEBVs) was affected by (i) five genotypic imputation methods (random forest imputation [RFI], expectation maximization imputation [EMI], k-nearest neighbor imputation [kNNI], singular value decomposition imputation [SVDI], and mean imputation [MNI]); (ii) three statistical models (ridge-regression best linear unbiased predictor [RR-BLUP], least absolute shrinkage and selection operator [LASSO], and elastic net); (iii) marker density (n = 500, 1500, 3000, and 4500 SNPs); (iv) training population (TP) size (n = 96, 144, 192, and 218); (v) marker-based and pedigree-based relationship matrices; and (vi) control for relatedness in TPs and validation populations (VPs). No discernible differences in prediction accuracy were observed among imputation methods. The RR-BLUP outperformed other models in nearly all scenarios. Accuracies decreased substantially when marker number decreased to 3000 or 1500 SNPs, depending on the trait; when sample size of the training set was less than 192; when using pedigree-based instead of marker-based matrix; or when no control for relatedness was implemented. Overall, moderate to high prediction accuracies were observed in this study, suggesting that GS is a very promising breeding strategy for FHB resistance in wheat.
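
Two of the factors compared above are simple to illustrate. Mean imputation (MNI) replaces each missing call with that marker's mean over observed calls. Reduced marker densities can be mimicked by sampling a subset of columns. The abstract does not say how the 500 to 4500 SNP subsets were chosen, so random sampling is an assumption here, as is using `None` for missing calls.

```python
import random


def mean_impute(genotypes):
    """Replace missing calls (None) with the marker's mean over observed calls."""
    p = len(genotypes[0])
    means = []
    for j in range(p):
        observed = [g[j] for g in genotypes if g[j] is not None]
        # Assumption: a marker with no observed calls is filled with 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if g[j] is None else g[j] for j in range(p)] for g in genotypes]


def subsample_markers(genotypes, n_markers, seed=0):
    """Randomly keep n_markers columns; returns (reduced genotypes, kept indices)."""
    keep = sorted(random.Random(seed).sample(range(len(genotypes[0])), n_markers))
    return [[g[j] for j in keep] for g in genotypes], keep
```

Mean imputation ignores linkage between neighboring markers, unlike kNNI or RFI. The abstract's finding that imputation method made no discernible difference suggests this simplicity cost little in this data set.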

  11. A Simple Predictive Enhancer Syntax for Hindbrain Patterning Is Conserved in Vertebrate Genomes.

    Directory of Open Access Journals (Sweden)

    Joseph Grice

    Full Text Available Determining the function of regulatory elements is fundamental for our understanding of development, disease and evolution. However, the sequence features that mediate these functions are often unclear and the prediction of tissue-specific expression patterns from sequence alone is non-trivial. Previous functional studies have demonstrated a link between PBX-HOX and MEIS/PREP binding interactions and hindbrain enhancer activity, but the defining grammar of these sites, if any exists, has remained elusive. Here, we identify a shared sequence signature (syntax) within a heterogeneous set of conserved vertebrate hindbrain enhancers composed of spatially co-occurring PBX-HOX and MEIS/PREP transcription factor binding motifs. We use this syntax to accurately predict hindbrain enhancers in 89% of cases (67/75 predicted elements) from a set of conserved non-coding elements (CNEs). Furthermore, mutagenesis of the sites abolishes activity or generates ectopic expression, demonstrating their requirement for segmentally restricted enhancer activity in the hindbrain. We refine and use our syntax to predict over 3,000 hindbrain enhancers across the human genome. These sequences tend to be located near developmental transcription factors and are enriched in known hindbrain activating elements, demonstrating the predictive power of this simple model. Our findings support the theory that hundreds of CNEs, and perhaps thousands of regions across the human genome, function to coordinate gene expression in the developing hindbrain. We speculate that deeply conserved sequences of this kind contributed to the co-option of new genes into the hindbrain gene regulatory network during early vertebrate evolution by linking patterns of hox expression to downstream genes involved in segmentation and patterning, and evolutionarily newer instances may have continued to contribute to lineage-specific elaboration of the hindbrain.
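
A co-occurrence syntax of this kind can be expressed as "a PBX-HOX-like site and a MEIS/PREP-like site within a maximum distance of each other, on either strand". The consensus patterns (TGATNNAT and TGACAG) and the 50 bp default distance below are illustrative assumptions, not the refined syntax reported in the paper.

```python
import re

# Hypothetical consensus patterns (forward | reverse complement); lookaheads
# allow overlapping matches to be reported.
PBX_HOX = re.compile(r"(?=(TGAT..AT|AT..ATCA))")
MEIS = re.compile(r"(?=(TGACAG|CTGTCA))")


def motif_starts(pattern, seq):
    """Start positions of all (possibly overlapping) matches in seq."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def has_hindbrain_syntax(seq, max_gap=50):
    """True if a PBX-HOX-like and a MEIS/PREP-like site start within max_gap bp."""
    pbx = motif_starts(PBX_HOX, seq)
    meis = motif_starts(MEIS, seq)
    return any(abs(a - b) <= max_gap for a in pbx for b in meis)
```

Requiring both sites together, rather than either site alone, is what makes such a rule selective. Each short consensus on its own occurs very frequently across a genome.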

  12. A manually annotated Actinidia chinensis var. chinensis (kiwifruit) genome highlights the challenges associated with draft genomes and gene prediction in plants.

    Science.gov (United States)

    Pilkington, Sarah M; Crowhurst, Ross; Hilario, Elena; Nardozza, Simona; Fraser, Lena; Peng, Yongyan; Gunaseelan, Kularajathevan; Simpson, Robert; Tahir, Jibran; Deroles, Simon C; Templeton, Kerry; Luo, Zhiwei; Davy, Marcus; Cheng, Canhong; McNeilage, Mark; Scaglione, Davide; Liu, Yifei; Zhang, Qiong; Datson, Paul; De Silva, Nihal; Gardiner, Susan E; Bassett, Heather; Chagné, David; McCallum, John; Dzierzon, Helge; Deng, Cecilia; Wang, Yen-Yi; Barron, Lorna; Manako, Kelvina; Bowen, Judith; Foster, Toshi M; Erridge, Zoe A; Tiffin, Heather; Waite, Chethi N; Davies, Kevin M; Grierson, Ella P; Laing, William A; Kirk, Rebecca; Chen, Xiuyin; Wood, Marion; Montefiori, Mirco; Brummell, David A; Schwinn, Kathy E; Catanach, Andrew; Fullerton, Christina; Li, Dawei; Meiyalaghan, Sathiyamoorthy; Nieuwenhuizen, Niels; Read, Nicola; Prakash, Roneel; Hunter, Don; Zhang, Huaibi; McKenzie, Marian; Knäbel, Mareike; Harris, Alastair; Allan, Andrew C; Gleave, Andrew; Chen, Angela; Janssen, Bart J; Plunkett, Blue; Ampomah-Dwamena, Charles; Voogd, Charlotte; Leif, Davin; Lafferty, Declan; Souleyre, Edwige J F; Varkonyi-Gasic, Erika; Gambi, Francesco; Hanley, Jenny; Yao, Jia-Long; Cheung, Joey; David, Karine M; Warren, Ben; Marsh, Ken; Snowden, Kimberley C; Lin-Wang, Kui; Brian, Lara; Martinez-Sanchez, Marcela; Wang, Mindy; Ileperuma, Nadeesha; Macnee, Nikolai; Campin, Robert; McAtee, Peter; Drummond, Revel S M; Espley, Richard V; Ireland, Hilary S; Wu, Rongmei; Atkinson, Ross G; Karunairetnam, Sakuntala; Bulley, Sean; Chunkath, Shayhan; Hanley, Zac; Storey, Roy; Thrimawithana, Amali H; Thomson, Susan; David, Charles; Testolin, Raffaele; Huang, Hongwen; Hellens, Roger P; Schaffer, Robert J

    2018-04-16

    Most published genome sequences are drafts, and most are dominated by computational gene prediction. Draft genomes typically incorporate considerable sequence data that are not assigned to chromosomes, and predicted genes without quality confidence measures. The current Actinidia chinensis (kiwifruit) 'Hongyang' draft genome has 164 Mb of sequences unassigned to pseudo-chromosomes, and omissions have been identified in the gene models. A second genome of an A. chinensis (genotype Red5) was fully sequenced. This new sequence resulted in a 554.0 Mb assembly with all but 6 Mb assigned to pseudo-chromosomes. Pseudo-chromosomal comparisons showed that a considerable number of translocation events have occurred following a whole genome duplication (WGD) event, some consistent with centromeric Robertsonian-like translocations. RNA sequencing data from 12 tissues and ab initio analysis informed a genome-wide manual annotation, using the WebApollo tool. In total, 33,044 gene loci represented by 33,123 isoforms were identified, named and tagged for quality of evidential support. Of these, 3114 (9.4%) were identical to a protein within the 'Hongyang' annotation in The Kiwifruit Information Resource (KIR v2). Some proportion of the differences will be varietal polymorphisms. However, as most computationally predicted Red5 models required manual re-annotation, this proportion is expected to be small. The quality of the new gene models was tested by fully sequencing 550 cloned 'Hort16A' cDNAs and comparing with the predicted protein models for Red5 and both the original 'Hongyang' assembly and the revised annotation from KIR v2. Only 48.9% and 63.5% of the cDNAs had a match with 90% identity or better to the original and revised 'Hongyang' annotation, respectively, compared with 90.9% to the Red5 models. Our study highlights the need to take a cautious approach to draft genomes and computationally predicted genes. Our use of the manual annotation tool WebApollo facilitated manual checking and

  13. Genome-Wide Association Studies and Comparison of Models and Cross-Validation Strategies for Genomic Prediction of Quality Traits in Advanced Winter Wheat Breeding Lines

    Directory of Open Access Journals (Sweden)

    Peter S. Kristensen

    2018-02-01

    Full Text Available The aim of this study was to identify SNP markers associated with five important wheat quality traits (grain protein content, Zeleny sedimentation, test weight, thousand-kernel weight, and falling number), and to investigate the predictive abilities of GBLUP and Bayesian Power Lasso models for genomic prediction of these traits. In total, 635 winter wheat lines from two breeding cycles in the Danish plant breeding company Nordic Seed A/S were phenotyped for the quality traits and genotyped for 10,802 SNPs. GWAS were performed using single marker regression and Bayesian Power Lasso models. SNPs with large effects on Zeleny sedimentation were found on chromosomes 1B, 1D, and 5D. However, GWAS failed to identify single SNPs with significant effects on the other traits, indicating that these traits were controlled by many QTL with small effects. The predictive abilities of the models for genomic prediction were studied using different cross-validation strategies. Leave-one-out cross-validations resulted in correlations between observed phenotypes corrected for fixed effects and genomic estimated breeding values of 0.50 for grain protein content, 0.66 for thousand-kernel weight, 0.70 for falling number, 0.71 for test weight, and 0.79 for Zeleny sedimentation. Alternative cross-validations showed that the genetic relationship between lines in training and validation sets had a bigger impact on predictive abilities than the number of lines included in the training set. Using Bayesian Power Lasso instead of GBLUP models gave similar or slightly higher predictive abilities. Genomic prediction based on all SNPs was more effective than prediction based on few associated SNPs.

  14. SeMPI: a genome-based secondary metabolite prediction and identification web server.

    Science.gov (United States)

    Zierep, Paul F; Padilla, Natàlia; Yonchev, Dimitar G; Telukunta, Kiran K; Klementz, Dennis; Günther, Stefan

    2017-07-03

    The secondary metabolism of bacteria, fungi and plants yields a vast number of bioactive substances. The constantly increasing amount of published genomic data provides the opportunity for an efficient identification of gene clusters by genome mining. Conversely, for many natural products with resolved structures, the encoding gene clusters have not been identified yet. Even though genome mining tools have become significantly more efficient in the identification of biosynthetic gene clusters, structural elucidation of the actual secondary metabolite is still challenging, especially due to as yet unpredictable post-modifications. Here, we introduce SeMPI, a web server providing a prediction and identification pipeline for natural products synthesized by type I modular polyketide synthases (PKSs). In order to limit the possible structures of PKS products and to include putative tailoring reactions, a structural comparison with annotated natural products was introduced. Furthermore, a benchmark was designed based on 40 gene clusters with annotated PKS products. The web server of the pipeline (SeMPI) is freely available at: http://www.pharmaceutical-bioinformatics.de/sempi. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
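
The abstract does not say how SeMPI measures structural similarity. A widely used choice for comparing a predicted structure with a library of annotated compounds is the Tanimoto coefficient on molecular fingerprints. The sketch assumes fingerprints are already available as sets of "on" bit indices. The compound names and bits are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def rank_candidates(query, library, top=3):
    """Rank annotated compounds (name -> fingerprint) by similarity to the query."""
    scored = sorted(((tanimoto(query, fp), name) for name, fp in library.items()),
                    reverse=True)
    return scored[:top]
```

Ranking against known natural products restricts the space of candidate structures, which matches the abstract's stated purpose of limiting possible PKS products and accounting for tailoring reactions.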

  15. Applications of population genetics to animal breeding, from Wright, Fisher and Lush to genomic prediction.

    Science.gov (United States)

    Hill, William G

    2014-01-01

    Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives' performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher's infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now with access to genomic data, a revolution in which molecular information is being used to enhance response with "genomic selection" is occurring. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas.

  16. High Genomic Instability Predicts Survival in Metastatic High-Risk Neuroblastoma

    Directory of Open Access Journals (Sweden)

    Sara Stigliani

    2012-09-01

    Full Text Available We aimed to identify novel molecular prognostic markers to better estimate relapse risk for children with high-risk (HR) metastatic neuroblastoma (NB). We performed genome- and/or transcriptome-wide analyses of 129 stage 4 HR NBs. Children older than 1 year of age were categorized as “short survivors” (dead of disease within 5 years from diagnosis) and “long survivors” (alive with an overall survival time ≥ 5 years). We reported that patients with fewer than three segmental copy number aberrations in their tumor represent a molecularly defined subgroup with a high survival probability within the current HR group of patients. The complex genomic pattern is a prognostic marker independent of NB-associated chromosomal aberrations, i.e., MYCN amplification, 1p and 11q losses, and 17q gain. Integrative analysis of genomic and expression signatures demonstrated that fatal outcome is mainly associated with loss of cell cycle control and deregulation of Rho guanosine triphosphatases (GTPases) functioning in neuritogenesis. Tumors with MYCN amplification show a lower chromosome instability compared to MYCN single-copy NBs (P = .0008), dominated by 17q gain and 1p loss. Moreover, our results suggest that MYCN amplification mainly drives disruption of neuronal differentiation and reduction of cell adhesion processes involved in tumor invasion and metastasis. Further validation studies are warranted to establish this marker for risk stratification of patients.

  17. Downstream Antisense Transcription Predicts Genomic Features That Define the Specific Chromatin Environment at Mammalian Promoters.

    Directory of Open Access Journals (Sweden)

    Christopher A Lavender

    2016-08-01

    Full Text Available Antisense transcription is a prevalent feature at mammalian promoters. Previous studies have primarily focused on antisense transcription initiating upstream of genes. Here, we characterize promoter-proximal antisense transcription downstream of gene transcription start sites in human breast cancer cells, investigating the genomic context of downstream antisense transcription. We find extensive correlations between antisense transcription and features associated with the chromatin environment at gene promoters. Antisense transcription downstream of promoters is widespread, with antisense transcription initiation observed within 2 kb of 28% of gene transcription start sites. Antisense transcription initiates between nucleosomes regularly positioned downstream of these promoters. The nucleosomes between gene and downstream antisense transcription start sites carry histone modifications associated with active promoters, such as H3K4me3 and H3K27ac. This region is bound by chromatin remodeling and histone modifying complexes including SWI/SNF subunits and HDACs, suggesting that antisense transcription or resulting RNA transcripts contribute to the creation and maintenance of a promoter-associated chromatin environment. Downstream antisense transcription overlays additional regulatory features, such as transcription factor binding, DNA accessibility, and the downstream edge of promoter-associated CpG islands. These features suggest an important role for antisense transcription in the regulation of gene expression and the maintenance of a promoter-associated chromatin environment.

  18. Genomic prediction by single-step genomic BLUP using cow reference population in Holstein crossbred cattle in India

    DEFF Research Database (Denmark)

    Nayee, Nilesh Kumar; Su, Guosheng; Gajjar, Swapnil

    2018-01-01

    Advantages of genomic selection in breeds with limited numbers of progeny-tested bulls have been demonstrated by adding genotypes of females to the reference population (Thomasen et al., 2014). The current study was conducted to explore the feasibility of implementing genomic selection in a Holstein Friesian crossbred population with cows kept under smallholder conditions, using test-day records and single-step genomic BLUP (ssGBLUP). Milk yield records from 10,797 daughters sired by 258 bulls were used. Of these, 2194 daughters and 109 sires were genotyped with a customized genotyping chip...

  19. Strategies for Selecting Crosses Using Genomic Prediction in Two Wheat Breeding Programs.

    Science.gov (United States)

    Lado, Bettina; Battenfield, Sarah; Guzmán, Carlos; Quincke, Martín; Singh, Ravi P; Dreisigacker, Susanne; Peña, R Javier; Fritz, Allan; Silva, Paula; Poland, Jesse; Gutiérrez, Lucía

    2017-07-01

    The single most important decision in plant breeding programs is the selection of appropriate crosses. The ideal cross would provide superior predicted progeny performance and enough diversity to maintain genetic gain. The aim of this study was to compare the best crosses predicted using combinations of mid-parent value and variance prediction accounting for linkage disequilibrium (VLD) or assuming linkage equilibrium (VLE). After predicting the mean and the variance of each cross, we selected crosses based on mid-parent value, the top 10% of the progeny, and weighted mean and variance within progenies for grain yield, grain protein content, mixing time, and loaf volume in two applied wheat (Triticum aestivum L.) breeding programs: Instituto Nacional de Investigación Agropecuaria (INIA) Uruguay and CIMMYT Mexico. Although the variance of the progeny is important to increase the chances of finding superior individuals from transgressive segregation, we observed that the mid-parent values of the crosses drove the genetic gain, whereas the variance of the progeny had a small impact on genetic gain for grain yield. However, the relative importance of the variance of the progeny was larger for quality traits. Overall, the genomic resources and the statistical models are now available to plant breeders to predict both the performance of breeding lines per se and the value of progeny from any potential cross. Copyright © 2017 Crop Science Society of America.
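    Selecting on "the top 10% of the progeny" can be illustrated with a usefulness-criterion style score: the expected mean of the best 10% of a cross's progeny. A minimal sketch, assuming each cross's progeny values are normally distributed around the predicted mid-parent value with a predicted standard deviation. The cross names and numbers are hypothetical, and this is not the authors' model, which predicted progeny variance with or without linkage disequilibrium.

```python
from statistics import NormalDist


def selection_intensity(top_fraction: float) -> float:
    """Standardized selection intensity for truncation selection on a normal trait.

    i = phi(z) / p, where z is the (1 - p) quantile of N(0, 1) and p is the fraction kept.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - top_fraction)
    return std_normal.pdf(z) / top_fraction


def usefulness(mid_parent: float, progeny_sd: float, top_fraction: float = 0.10) -> float:
    """Expected mean of the selected top fraction of a cross's progeny,
    assuming progeny values ~ Normal(mid_parent, progeny_sd**2)."""
    return mid_parent + selection_intensity(top_fraction) * progeny_sd


# Hypothetical crosses: (name, predicted mid-parent value, predicted progeny SD).
crosses = [("A x B", 5.0, 0.2), ("C x D", 4.8, 0.5), ("E x F", 4.9, 0.1)]
ranked = sorted(crosses, key=lambda c: usefulness(c[1], c[2]), reverse=True)
for name, mid, sd in ranked:
    print(f"{name}: mid-parent {mid:.2f}, usefulness {usefulness(mid, sd):.3f}")
```

    With these invented values, the cross with the lowest mid-parent value ranks first because its predicted progeny spread is largest; whether that spread pays off in practice is exactly what the study evaluated.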

  20. Predicting effects of structural stress in a genome-reduced model bacterial metabolism

    Science.gov (United States)

    Güell, Oriol; Sagués, Francesc; Serrano, M. Ángeles

    2012-08-01

    Mycoplasma pneumoniae is a human pathogen recently proposed as a genome-reduced model for bacterial systems biology. Here, we study the response of its metabolic network to different forms of structural stress, including removal of individual and pairs of reactions and knockout of genes and clusters of co-expressed genes. Our results reveal a network architecture as robust as that of other model bacteria regarding multiple failures, although less robust against individual reaction inactivation. Interestingly, metabolite motifs associated with reactions can predict the propagation of inactivation cascades and damage amplification effects arising in double knockouts. We also detect a significant correlation between gene essentiality and damages produced by single gene knockouts, and find that genes controlling high-damage reactions tend to be expressed independently of each other, a functional switch mechanism that simultaneously acts as a genetic firewall to protect metabolism. Prediction of failure propagation is crucial for metabolic engineering or disease treatment.

  1. Bayesian prediction of bacterial growth temperature range based on genome sequences

    DEFF Research Database (Denmark)

    Jensen, Dan Børge; Vesth, Tammi Camilla; Hallin, Peter Fischer

    2012-01-01

    Background: The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this based on a genomic sequence would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinction between three thermophilicity classes (thermophiles, mesophiles and psychrophiles ... that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naive Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between thermophilic, mesophilic...
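    A naive Bayesian classifier over protein families can be sketched in a few lines. This minimal sketch assumes a Bernoulli naive Bayes model on presence/absence of each family with Laplace smoothing; the family identifiers (PF_A, PF_B, PF_C) and training genomes are hypothetical, and the study's actual 40 families and program are not reproduced here.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(genomes, labels, families, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on protein-family presence/absence.

    genomes:  list of sets of family IDs present in each training genome
    labels:   class label per genome (e.g. "thermophile", "mesophile", "psychrophile")
    families: all family IDs used as features
    alpha:    Laplace smoothing pseudocount
    """
    class_counts = defaultdict(int)
    present_counts = defaultdict(lambda: defaultdict(int))
    for fams, label in zip(genomes, labels):
        class_counts[label] += 1
        for f in families:
            if f in fams:
                present_counts[label][f] += 1
    n = len(labels)
    log_priors = {c: math.log(k / n) for c, k in class_counts.items()}
    # Smoothed P(family present | class).
    p_present = {
        c: {f: (present_counts[c][f] + alpha) / (k + 2 * alpha) for f in families}
        for c, k in class_counts.items()
    }
    return log_priors, p_present


def predict(model, fams):
    """Return the class with the highest log posterior for a genome's family set."""
    log_priors, p_present = model
    best, best_score = None, -math.inf
    for c, score in log_priors.items():
        for f, p in p_present[c].items():
            score += math.log(p if f in fams else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


genomes = [{"PF_A"}, {"PF_A", "PF_B"}, {"PF_B"}, {"PF_B", "PF_C"}, {"PF_C"}, {"PF_C"}]
labels = ["thermophile", "thermophile", "mesophile", "mesophile", "psychrophile", "psychrophile"]
model = train_bernoulli_nb(genomes, labels, ["PF_A", "PF_B", "PF_C"])
print(predict(model, {"PF_A"}))  # thermophile
print(predict(model, {"PF_C"}))  # psychrophile
```

    Absent families contribute evidence too (the `1.0 - p` term), which is the main difference from a multinomial model that only counts what is present.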

  2. Intrinsic disorder in Viral Proteins Genome-Linked: experimental and predictive analyses

    Directory of Open Access Journals (Sweden)

    Van Dorsselaer Alain

    2009-02-01

    Full Text Available Abstract Background VPgs are viral proteins linked to the 5' end of some viral genomes. Interactions between several VPgs and eukaryotic translation initiation factors eIF4Es are critical for plant infection. However, VPgs are not restricted to phytoviruses, being also involved in genome replication and protein translation of several animal viruses. To date, structural data are still limited to small picornaviral VPgs. Recently three phytoviral VPgs were shown to be natively unfolded proteins. Results In this paper, we report the bacterial expression, purification and biochemical characterization of two phytoviral VPgs, namely the VPgs of Rice yellow mottle virus (RYMV, genus Sobemovirus) and Lettuce mosaic virus (LMV, genus Potyvirus). Using far-UV circular dichroism and size exclusion chromatography, we show that RYMV and LMV VPgs are predominantly or partly unstructured in solution, respectively. Using several disorder predictors, we show that both proteins are predicted to possess disordered regions. We next extend these results to 14 VPgs representative of the viral diversity. Disordered regions were predicted in all VPg sequences, regardless of genus and family. Conclusion Based on these results, we propose that intrinsic disorder is a common feature of VPgs. The functional role of intrinsic disorder is discussed in light of the biological roles of VPgs.

  3. Predicting growth of the healthy infant using a genome scale metabolic model.

    Science.gov (United States)

    Nilsson, Avlant; Mardinoglu, Adil; Nielsen, Jens

    2017-01-01

    An estimated 165 million children globally have stunted growth, and extensive growth data are available. Genome scale metabolic models allow the simulation of molecular flux over each metabolic enzyme, and are well adapted to analyze biological systems. We used a human genome scale metabolic model to simulate the mechanisms of growth and integrate data about breast-milk intake and composition with the infant's biomass and energy expenditure of major organs. The model predicted daily metabolic fluxes from birth to age 6 months, and accurately reproduced standard growth curves and changes in body composition. The model corroborates the finding that essential amino and fatty acids do not limit growth, but that energy is the main growth limiting factor. Disruptions to the supply and demand of energy markedly affected the predicted growth, indicating that elevated energy expenditure may be detrimental. The model was used to simulate the metabolic effect of mineral deficiencies, and showed the greatest growth reduction for deficiencies in copper, iron, and magnesium ions, which affect energy production through oxidative phosphorylation. The model and simulation method were integrated into a platform and shared with the research community. The growth model constitutes another step towards the complete representation of human metabolism, and may further help improve the understanding of the mechanisms underlying stunting.

  4. High-quality draft genome sequence of Ensifer meliloti Mlalz-1, a microsymbiont of Medicago laciniata (L.) miller collected in Lanzarote, Canary Islands, Spain.

    Science.gov (United States)

    Osman, Wan Adnawani Meor; van Berkum, Peter; León-Barrios, Milagros; Velázquez, Encarna; Elia, Patrick; Tian, Rui; Ardley, Julie; Gollagher, Margaret; Seshadri, Rekha; Reddy, T B K; Ivanova, Natalia; Woyke, Tanja; Pati, Amrita; Markowitz, Victor; Baeshen, Mohamed N; Baeshen, Naseebh Nabeeh; Kyrpides, Nikos; Reeve, Wayne

    2017-01-01

    Ensifer meliloti Mlalz-1 (INSDC = ATZD00000000) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing nodule of Medicago laciniata (L.) Miller from a soil sample collected near the town of Guatiza on the island of Lanzarote, the Canary Islands, Spain. This strain nodulates and forms an effective symbiosis with the highly specific host M. laciniata. This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) sequencing project. Here the features of E. meliloti Mlalz-1 are described, together with high-quality permanent draft genome sequence information and annotation. The 6,664,116 bp high-quality draft genome is arranged in 99 scaffolds of 100 contigs, containing 6314 protein-coding genes and 74 RNA-only encoding genes. Strain Mlalz-1 is closely related to E. meliloti IAM 12611T, Ensifer medicae A 321T and 10.1601/nm.17831 ORS 1407T, based on 16S rRNA gene sequences. gANI values of ≥98.1% support the classification of strain Mlalz-1 as E. meliloti. Nodulation of M. laciniata requires a specific nodC allele, and the nodC gene of strain Mlalz-1 shares ≥98% sequence identity with nodC of M. laciniata-nodulating 10.1601/nm.1328 strains, but ≤93% with nodC of 10.1601/nm.1328 strains that nodulate other Medicago species. Strain Mlalz-1 is unique among sequenced E. meliloti strains in possessing genes encoding components of a T2SS and in having two versions of the adaptive acid tolerance response lpiA-acvB operon. In E. medicae strain WSM419, lpiA is essential for enhancing survival in lethal acid conditions. The second copy of the lpiA-acvB operon of strain Mlalz-1 has highest sequence identity (>96%) with that of E. medicae strains, which suggests genetic

  5. Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

    Science.gov (United States)

    He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

    2018-04-01

    SNP chips are commonly used for genotyping animals in genomic selection, but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes, whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications for the design of LD SNP chips for imputation-enabled genomic prediction.

  6. Host Genome Influence on Gut Microbial Composition and Microbial Prediction of Complex Traits in Pigs.

    Science.gov (United States)

    Camarinha-Silva, Amelia; Maushammer, Maria; Wellmann, Robin; Vital, Marius; Preuss, Siegfried; Bennewitz, Jörn

    2017-07-01

    The aim of the present study was to analyze the interplay between gastrointestinal tract (GIT) microbiota, host genetics, and complex traits in pigs using extended quantitative-genetic methods. The study design consisted of 207 pigs that were housed and slaughtered under standardized conditions, and phenotyped for daily gain, feed intake, and feed conversion rate. The pigs were genotyped with a standard 60K SNP chip. The GIT microbiota composition was analyzed by 16S rRNA gene amplicon sequencing technology. Eight of the 49 bacterial genera investigated showed a significant narrow-sense host heritability, ranging from 0.32 to 0.57. Microbial mixed linear models were applied to estimate the microbiota variance for each complex trait. The fraction of phenotypic variance explained by the microbial variance was 0.28, 0.21, and 0.16 for daily gain, feed conversion, and feed intake, respectively. The SNP data and the microbiota composition were used to predict the complex traits using genomic best linear unbiased prediction (G-BLUP) and microbial best linear unbiased prediction (M-BLUP) methods, respectively. The prediction accuracies of G-BLUP were 0.35, 0.23, and 0.20 for daily gain, feed conversion, and feed intake, respectively. The corresponding prediction accuracies of M-BLUP were 0.41, 0.33, and 0.33. Thus, in addition to SNP data, microbiota abundances are an informative source of complex trait predictions. Since the pig is a well-suited animal for modeling the human digestive tract, M-BLUP, in addition to G-BLUP, might be beneficial for predicting human predispositions to some diseases, and, consequently, for preventative and personalized medicine. Copyright © 2017 by the Genetics Society of America.

  7. Estimation and prediction of maximum daily rainfall at Sagar Island using best fit probability models

    Science.gov (United States)

    Mandal, S.; Choudhury, B. U.

    2015-07-01

    Sagar Island, situated on the continental shelf of the Bay of Bengal, is one of the deltas most vulnerable to extreme rainfall-driven climatic hazards. Information on the probability of occurrence of maximum daily rainfall will be useful in devising risk management for sustaining the rainfed agrarian economy vis-a-vis food and livelihood security. Using six probability distribution models and long-term (1982-2010) daily rainfall data, we studied the probability of occurrence of annual, seasonal and monthly maximum daily rainfall (MDR) on the island. To select the best-fit distribution models for the annual, seasonal and monthly time series based on maximum rank with minimum value of test statistics, three statistical goodness-of-fit tests, viz. the Kolmogorov-Smirnov test (K-S), the Anderson-Darling test (A²) and the chi-square test (χ²), were employed. The fourth probability distribution was identified from the highest overall score obtained from the three goodness-of-fit tests. Results revealed that the normal probability distribution was best fitted for annual, post-monsoon and summer season MDR, while the lognormal, Weibull and Pearson 5 distributions were best fitted for the pre-monsoon, monsoon and winter seasons, respectively. The estimated annual MDR values were 50, 69, 86, 106 and 114 mm for return periods of 2, 5, 10, 20 and 25 years, respectively. The probabilities of getting an annual MDR of >50, >100, >150, >200 and >250 mm were estimated as 99, 85, 40, 12 and 3% levels of exceedance, respectively. The monsoon, summer and winter seasons exhibited comparatively higher probabilities (78 to 85%) for MDR of >100 mm and moderate probabilities (37 to 46%) for >150 mm. For different recurrence intervals, the percent probability of MDR varied widely across intra- and inter-annual periods. On the island, rainfall anomalies can pose a climatic threat to the sustainability of agricultural production and thus need adequate adaptation and mitigation measures.
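    Return levels and exceedance probabilities follow directly from a fitted distribution: the T-year return level is the (1 - 1/T) quantile, and the exceedance probability of a depth x is 1 - F(x). A minimal sketch, assuming a normal distribution fitted by sample mean and standard deviation (the abstract reports the normal as the best fit for annual MDR, but does not give its parameters or fitting method). The rainfall series is invented and the outputs are not meant to reproduce the published values.

```python
from statistics import NormalDist, mean, stdev


def return_level(dist: NormalDist, return_period_years: float) -> float:
    """Depth exceeded on average once every T years: the (1 - 1/T) quantile."""
    return dist.inv_cdf(1.0 - 1.0 / return_period_years)


def exceedance_probability(dist: NormalDist, depth_mm: float) -> float:
    """Probability that a year's maximum daily rainfall exceeds depth_mm."""
    return 1.0 - dist.cdf(depth_mm)


# Hypothetical annual maximum daily rainfall series (mm); not the Sagar Island record.
annual_max_mm = [62.0, 48.5, 95.0, 71.2, 55.3, 120.4, 80.1, 66.7, 58.9, 90.2]
fitted = NormalDist(mean(annual_max_mm), stdev(annual_max_mm))  # moment-based fit

for t in (2, 5, 10, 20, 25):
    print(f"T = {t:2d} yr: {return_level(fitted, t):6.1f} mm")
print(f"P(MDR > 100 mm) = {exceedance_probability(fitted, 100.0):.2f}")
```

    For a symmetric distribution such as the normal, the 2-year return level equals the mean; skewed fits such as the lognormal or Weibull reported for individual seasons would need their own quantile functions.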

  8. An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome.

    Science.gov (United States)

    Ferlaino, Michael; Rogers, Mark F; Shihab, Hashem A; Mort, Matthew; Cooper, David N; Gaunt, Tom R; Campbell, Colin

    2017-10-06

    Small insertions and deletions (indels) have a significant influence on human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state-of-the-art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.

  9. Multitrait, Random Regression, or Simple Repeatability Model in High-Throughput Phenotyping Data Improve Genomic Prediction for Wheat Grain Yield.

    Science.gov (United States)

    Sun, Jin; Rutkoski, Jessica E; Poland, Jesse A; Crossa, José; Jannink, Jean-Luc; Sorrells, Mark E

    2017-07-01

    High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of the proposed models for secondary traits on their predictive abilities for grain yield. Grain yield and secondary traits, canopy temperature (CT) and normalized difference vegetation index (NDVI), were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted by SR, MT, or RR models, separately, within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from the above models were used in the multivariate prediction models to compare predictive abilities for grain yield. Predictive ability was substantially improved, by 70% on average, by multivariate pedigree and genomic models when secondary traits were included in both training and test populations. Additionally, (i) predictive abilities varied only slightly among the MT, RR, and SR models in this data set, (ii) results indicated that including BLUPs of secondary traits from the MT model was the best in severe drought, and (iii) the RR model was slightly better than the SR and MT models under drought environments. Copyright © 2017 Crop Science Society of America.

  10. Genome sequence analysis of predicted polyprenol reductase gene from mangrove plant Kandelia obovata

    Science.gov (United States)

    Basyuni, M.; Sagami, H.; Baba, S.; Oku, H.

    2018-03-01

    It has previously been reported that dolichols, but not polyprenols, predominate in mangrove leaves and roots. Therefore, the occurrence of larger amounts of dolichol in the leaves of mangrove plants implies that polyprenol reductase, which is responsible for the conversion of polyprenol to dolichol, may be active in mangrove leaves. Here we report a preliminary assessment of putative polyprenol reductase genes from the genome sequence of the mangrove plant Kandelia obovata. The functional assignment of the genes was based on a homology search of the sequences against the non-redundant (nr) peptide database of NCBI using Blastx. The degree of sequence identity between the DNA sequences and known polyprenol reductases was confirmed using the Blastx E-value, total score, and identity. The genome sequence data yielded three partial sequences, termed c23157 (700 bp), c23901 (960 bp), and c24171 (531 bp). The c23157 gene showed the highest similarity (61%) to predicted polyprenol reductase 2-like from Gossypium raimondii, with an E-value of 2e-100. The second gene, c23901, exhibited high similarity (78%) to the steroid 5-alpha-reductase Det2 from J. curcas, with an E-value of 2e-140. Furthermore, the c24171 gene showed the highest similarity (79%) to polyprenol reductase 2 isoform X1 from Jatropha curcas, with an E-value of 7e-21. The present study suggests that the c23157, c23901, and c24171 genes may encode polyprenol reductases. c23157, c23901, and c24171 are therefore new predicted polyprenol reductase genes from K. obovata.

  11. Computational prediction and molecular confirmation of Helitron transposons in the maize genome

    Directory of Open Access Journals (Sweden)

    He Limei

    2008-01-01

    Full Text Available Abstract Background Helitrons represent a new class of transposable elements recently uncovered in plants and animals. One remarkable feature of Helitrons is their ability to capture gene sequences, which makes them of considerable potential evolutionary importance. However, because Helitrons lack the typical structural features of other DNA transposable elements, identifying them is a challenge. Currently, most researchers identify Helitrons manually by comparing sequences. With the maize whole genome sequencing project underway, an automated computational Helitron searching tool is needed. The characterization of Helitron activities in maize needs to be addressed in order to better understand the impact of Helitrons on the organization of the genome. Results We developed and implemented a heuristic searching algorithm in PERL for identifying Helitrons. Our HelitronFinder program will (i) take FASTA-formatted DNA sequences as input and identify the hairpin looping patterns, and (ii) exploit the consensus 5' and 3' end sequences of known Helitrons to identify putative ends. We randomly selected five predicted Helitrons from the program's high quality output for molecular verification. Four out of the five predicted Helitrons were confirmed by PCR assays and DNA sequencing in different maize inbred lines. The HelitronFinder program identified two head-to-head dissimilar Helitrons in a maize BAC sequence. Conclusion We have identified 140 new Helitron candidates in maize with our computational tool HelitronFinder by searching maize DNA sequences currently available in GenBank. Four out of five candidates were confirmed to be real by empirical methods, thus validating the predictions of HelitronFinder. Additional points to emerge from our study are that Helitrons do not always insert at an AT dinucleotide in the host sequences, that they can insert immediately adjacent to an existing Helitron, and that their movement may cause changes in the flanking
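    The hairpin search can be illustrated with a rough structural scan: look for an inverted-repeat stem (a potential hairpin) followed closely by a CTRR motif, the 3'-terminal signature commonly described for Helitrons. A minimal sketch only; the stem length, loop range and spacing are arbitrary assumptions, and this is not the HelitronFinder algorithm, which also matches consensus end sequences of known Helitrons.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_3prime_candidates(seq, stem=6, min_loop=3, max_loop=10, max_gap=15):
    """Return (stem_start, motif_end) pairs where a hairpin stem is followed by CTRR.

    A hairpin is modelled as `stem` bases, a loop of min_loop..max_loop bases, then
    the reverse complement of the first stem. All parameter values are illustrative.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = revcomp(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if seq[j:j + stem] != target:
                continue
            stem_end = j + stem
            # Look for CTRR (R = A or G) within max_gap bases after the stem.
            window = seq[stem_end:stem_end + max_gap + 4]
            m = re.search(r"CT[AG][AG]", window)
            if m:
                hits.append((i, stem_end + m.end()))
    return hits


# Hypothetical test sequence: stem GGCACC, loop TTTT, stem GGTGCC, spacer AAT, then CTAG.
seq = "AAAA" + "GGCACC" + "TTTT" + "GGTGCC" + "AAT" + "CTAG" + "AAAA"
print(find_3prime_candidates(seq))
```

    A genome-scale tool would also need to pair such 3' candidates with a plausible 5' end and filter the many chance inverted repeats that a scan this simple reports.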

  12. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds.

    Science.gov (United States)

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

    2017-08-10

    A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction model (GBLUP) to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model using a single random effect assumes that all genomic variants contribute to the genomic relationship equally, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than with milk production, and several biologically meaningful GO terms improved the prediction accuracy with GFBLUP for the four traits, as compared with GBLUP. The improvement in genomic prediction between breeds (the average increase across the four traits was 0.161) was more apparent than that within HOL (the average increase across the four traits was 0.020). Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of
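    The GBLUP/GFBLUP contrast can be sketched at the level of genomic relationship matrices. A minimal sketch, assuming VanRaden's method 1 for building G (the abstract does not state which construction was used): GBLUP uses one matrix built from all SNPs, each weighted alike, while a GFBLUP-style model uses one matrix for the SNPs inside a genomic feature and another for the rest, each with its own variance component. The genotypes and the feature assignment are hypothetical; variance-component estimation and the mixed-model equations are not shown.

```python
def vanraden_grm(genotypes, snp_idx=None):
    """Genomic relationship matrix, VanRaden method 1: G = ZZ' / (2 * sum p(1 - p)).

    genotypes: one list of 0/1/2 allele counts per individual (same SNP order for all)
    snp_idx:   optional subset of SNP columns, e.g. the SNPs inside one genomic feature
    """
    cols = range(len(genotypes[0])) if snp_idx is None else list(snp_idx)
    n = len(genotypes)
    # Allele frequencies estimated from the sample itself.
    freq = {j: sum(ind[j] for ind in genotypes) / (2 * n) for j in cols}
    scale = 2 * sum(p * (1 - p) for p in freq.values())  # zero if every SNP is monomorphic
    z = [[ind[j] - 2 * freq[j] for j in cols] for ind in genotypes]  # centred genotypes
    return [[sum(a * b for a, b in zip(z[r], z[c])) / scale for c in range(n)]
            for r in range(n)]


# Hypothetical genotypes: 4 animals x 5 SNPs; SNPs 0-1 stand in for a GO-defined feature.
genotypes = [[0, 1, 2, 1, 0], [1, 1, 0, 2, 1], [2, 0, 1, 1, 2], [1, 2, 1, 0, 1]]
feature = [0, 1]
rest = [j for j in range(5) if j not in feature]

g_all = vanraden_grm(genotypes)               # single kernel, all SNPs alike (GBLUP)
g_feature = vanraden_grm(genotypes, feature)  # GFBLUP-style kernel for feature SNPs
g_rest = vanraden_grm(genotypes, rest)        # kernel for the remaining SNPs
for row in g_all:
    print(" ".join(f"{x:6.3f}" for x in row))
```

    In a GFBLUP-style fit, the relative size of the two estimated variance components is what lets the feature SNPs carry more (or less) weight than in the single-kernel GBLUP model.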

  13. The KL24 gene cluster and a genomic island encoding a Wzy polymerase contribute genes needed for synthesis of the K24 capsular polysaccharide by the multiply antibiotic resistant Acinetobacter baumannii isolate RCH51.

    Science.gov (United States)

    Kenyon, Johanna J; Kasimova, Anastasiya A; Shneider, Mikhail M; Shashkov, Alexander S; Arbatsky, Nikolay P; Popova, Anastasiya V; Miroshnikov, Konstantin A; Hall, Ruth M; Knirel, Yuriy A

    2017-03-01

    The whole-genome sequence of the multiply antibiotic resistant Acinetobacter baumannii isolate RCH51 belonging to sequence type ST103 (Institut Pasteur scheme) revealed that the set of genes at the capsule locus, KL24, includes four genes predicted to direct the synthesis of 3-acetamido-3,6-dideoxy-d-galactose (d-Fuc3NAc), and this sugar was found in the capsular polysaccharide (CPS). One of these genes, fdtE, encodes a novel bifunctional protein with an N-terminal FdtA 3,4-ketoisomerase domain and a C-terminal acetyltransferase domain. KL24 lacks a gene encoding a Wzy polymerase to link the oligosaccharide K units to form the CPS found associated with isolate RCH51, and a wzy gene was found in a small genomic island (GI) near the cpn60 gene. This GI is in precisely the same location as another GI carrying wzy and atr genes recently found in several A. baumannii isolates, but it does not otherwise resemble it. The CPS isolated from RCH51, studied by sugar analysis and 1D and 2D 1H and 13C NMR spectroscopy, revealed that the K unit has a branched pentasaccharide structure made up of Gal, GalNAc and GlcNAc residues with d-Fuc3NAc as a side branch, and the K units are linked via a β-d-GlcpNAc-(1→3)-β-d-Galp linkage formed by the Wzy encoded by the GI. The functions of the glycosyltransferases encoded by KL24 were assigned to formation of specific bonds. A correspondence between the order of the genes in KL24 and other KL and the order of the linkages they form was noted, and this may be useful in future predictions of glycosyltransferase specificities.

  14. Genomic selection prediction accuracy in a perennial crop: case study of oil palm (Elaeis guineensis Jacq.).

    Science.gov (United States)

    Cros, David; Denis, Marie; Sánchez, Leopoldo; Cochard, Benoit; Flori, Albert; Durand-Gasselin, Tristan; Nouy, Bruno; Omoré, Alphonse; Pomiès, Virginie; Riou, Virginie; Suryana, Edyana; Bouvet, Jean-Marc

    2015-03-01

    Genomic selection empirically appeared valuable for reciprocal recurrent selection in oil palm as it could account for family effects and Mendelian sampling terms, despite small populations and low marker density. Genomic selection (GS) can increase the genetic gain in plants. In perennial crops, this is expected mainly through shortened breeding cycles and increased selection intensity, which requires sufficient GS accuracy in selection candidates, despite often small training populations. Our objective was to obtain the first empirical estimate of GS accuracy in oil palm (Elaeis guineensis), the major world oil crop. We used two parental populations involved in conventional reciprocal recurrent selection (Deli and Group B) with 131 individuals each, genotyped with 265 SSR. We estimated within-population GS accuracies when predicting breeding values of non-progeny-tested individuals for eight yield traits. We used three methods to sample training sets and five statistical methods to estimate genomic breeding values. The results showed that GS could account for family effects and Mendelian sampling terms in Group B but only for family effects in Deli. Presumably, this difference between populations originated from their contrasting breeding history. The GS accuracy ranged from -0.41 to 0.94 and was positively correlated with the relationship between training and test sets. Training sets optimized with the so-called CDmean criterion gave the highest accuracies, ranging from 0.49 (pulp to fruit ratio in Group B) to 0.94 (fruit weight in Group B). The statistical methods did not affect the accuracy. Finally, Group B could be preselected for progeny tests by applying GS to key yield traits, therefore increasing the selection intensity. Our results should be valuable for breeding programs with small populations, long breeding cycles, or reduced effective size.

  15. Dispositional optimism and perceived risk interact to predict intentions to learn genome sequencing results.

    Science.gov (United States)

    Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Biesecker, Leslie G; Biesecker, Barbara B

    2015-07-01

    Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. (c) 2015 APA, all rights reserved.

  16. OI-57, a Genomic Island of Escherichia coli O157, Is Present in Other Seropathotypes of Shiga Toxin-Producing E. coli Associated with Severe Human Disease

    Science.gov (United States)

    Imamovic, Lejla; Tozzoli, Rosangela; Michelacci, Valeria; Minelli, Fabio; Marziano, Maria Luisa; Caprioli, Alfredo; Morabito, Stefano

    2010-01-01

    Strains of Shiga toxin-producing Escherichia coli (STEC) are a heterogeneous E. coli group that may cause severe disease in humans. STEC have been categorized into seropathotypes (SPTs) based on their phenotypic and molecular characteristics and the clinical features of the associated diseases. SPTs range from A to E, according to a decreasing rank of pathogenicity. To define the virulence gene asset (“virulome”) characterizing the highly pathogenic SPTs, we used microarray hybridization to compare the whole genomes of STEC belonging to SPTs B, C, and D with that of STEC O157 (SPT A). The presence of the open reading frames (ORFs) associated with SPTs A and B was subsequently investigated by PCR in a larger panel of STEC and in other E. coli strains. A genomic island termed OI-57 was present in SPTs A and B but not in the other SPTs. OI-57 harbors the putative virulence gene adfO, encoding a factor enhancing the adhesivity of STEC O157, and ckf, encoding a putative killing factor for the bacterial cell. PCR analyses showed that OI-57 was present in its entirety in the majority of the STEC genomes examined, indicating that it represents a stable acquisition of the positive clonal lineages. OI-57 was also present in a high proportion of the human enteropathogenic E. coli genomes assayed, suggesting that it could be involved in the attaching-and-effacing colonization of the intestinal mucosa. In conclusion, OI-57 appears to be part of the virulome of pathogenic STEC and further studies are needed to elucidate its role in the pathogenesis of STEC infections. PMID:20823207

  17. Genotyping cows for the reference increase reliability of genomic prediction in a small breed

    DEFF Research Database (Denmark)

    Thomasen, Jørn Rind; Sørensen, Anders Christian; Lund, Mogens Sandø

    2013-01-01

    We hypothesized that adding cows to the reference population in a breed with a small number of reference bulls would increase reliabilities of genomic breeding values and genetic gain. We tested this premise by comparing two strategies for maintaining the reference population for genetic gain ... inbreeding and reliabilities of genomic predictions: 1) adding 60 progeny-tested bulls each year (B), and 2) in addition to 60 progeny-tested bulls, adding 2,000 genotyped cows per year (C). Two breeding schemes were tested: 1) a turbo scheme (T) with only genotyped young bulls used intensively, and 2 ... compared to the H-B, at the same level of ∆F. T-C yielded 15% higher ∆G compared to T-B. Changing the breeding scheme from H-B to H-C increased ∆G by 5.5%. The lowest ∆F was observed with genotyping of cows. Reliabilities of GEBV in the C schemes showed a steep increase in reliability during the first ...

  18. Insight into Potential Probiotic Markers Predicted in Lactobacillus pentosus MP-10 Genome Sequence

    Directory of Open Access Journals (Sweden)

    Hikmate Abriouel

    2017-05-01

    Full Text Available Lactobacillus pentosus MP-10 is a potential probiotic lactic acid bacterium originally isolated from naturally fermented Aloreña green table olives. The entire genome sequence was annotated to analyze in silico the molecular mechanisms involved in the adaptation of L. pentosus MP-10 to the human gastrointestinal tract (GIT), such as carbohydrate metabolism (related to prebiotic utilization) and the proteins involved in bacteria–host interactions. We predicted an arsenal of genes coding for carbohydrate-modifying enzymes that act on oligo- and polysaccharides, such as glycoside hydrolases, glycoside transferases, and isomerases, and other enzymes involved in complex carbohydrate metabolism, especially of starch, raffinose, and levan. These enzymes represent key indicators of the bacterium's adaptation to the GIT environment, since they are involved in the metabolism and assimilation of complex carbohydrates not digested by human enzymes. We also detected key probiotic ligands (surface proteins, excreted or secreted proteins) involved in adhesion to host cells, such as adhesion to mucus, epithelial cells or extracellular matrix, and plasma components; moonlighting or multifunctional proteins were also found that could be involved in adhesion to epithelial cells and/or extracellular matrix proteins and could also affect host immunomodulation. In silico analysis of the genome sequence of L. pentosus MP-10 is an important initial step to screen for genes encoding proteins that may provide probiotic features, and thus provides new routes for screening and studying this potentially probiotic bacterium.

  19. Accuracy of Genomic Prediction in a Commercial Perennial Ryegrass Breeding Program

    Directory of Open Access Journals (Sweden)

    Dario Fè

    2016-11-01

    Full Text Available The implementation of genomic selection (GS) in plant breeding has so far been mainly evaluated in crops farmed as homogeneous varieties, and the results have been generally positive. Fewer results are available for species, such as forage grasses, that are grown as heterogeneous families (developed from multiparent crosses), in which the control of the genetic variation is far more complex. Here we test the potential for implementing GS in the breeding of perennial ryegrass (Lolium perenne L.) using empirical data from a commercial forage breeding program. Biparental F2 and multiparental synthetic (SYN) families of diploid perennial ryegrass were genotyped using genotyping-by-sequencing, and phenotypes for five different traits were analyzed. Genotypes were expressed as family allele frequencies, and phenotypes were recorded as family means. Different models for genomic prediction were compared by using practically relevant cross-validation strategies. All traits showed a highly significant level of genetic variance, which could be traced using the genotyping assay. While there was significant genotype × environment (G × E) interaction for some traits, accuracies were high among F2 families and between biparental F2 and multiparental SYN families. We have demonstrated that the implementation of GS in grass breeding is now possible and presents an opportunity to make significant gains for various traits.
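
    The abstract states only that genotypes were expressed as family allele frequencies from genotyping-by-sequencing. One common way to do this for pooled family samples is to take the alternative-allele read fraction at each SNP; the minimal sketch below assumes that estimator and a hypothetical `min_depth` filter, and is not the authors' pipeline.

```python
from typing import List, Optional, Sequence


def family_allele_frequencies(
    ref_counts: Sequence[Sequence[int]],
    alt_counts: Sequence[Sequence[int]],
    min_depth: int = 1,
) -> List[List[Optional[float]]]:
    """Per family (row) and SNP (column): alt / (ref + alt) read fraction.

    Returns None where the read depth is below the hypothetical min_depth
    threshold (or zero), i.e. where the frequency is treated as missing.
    """
    freqs = []
    for ref_row, alt_row in zip(ref_counts, alt_counts):
        row = []
        for r, a in zip(ref_row, alt_row):
            depth = r + a
            row.append(a / depth if depth > 0 and depth >= min_depth else None)
        freqs.append(row)
    return freqs


# Toy read counts for two families at three SNPs (hypothetical numbers).
ref = [[8, 0, 0], [3, 5, 1]]
alt = [[2, 0, 4], [3, 5, 1]]
print(family_allele_frequencies(ref, alt, min_depth=2))
```

    Frequencies near 0 or 1 indicate a family close to fixation at that SNP, while intermediate values reflect segregation within the family.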

  20. Improving Genomic Prediction in Cassava Field Experiments by Accounting for Interplot Competition.

    Science.gov (United States)

    Elias, Ani A; Rabbi, Ismail; Kulakow, Peter; Jannink, Jean-Luc

    2018-03-02

    Competition among plants for available resources is an unavoidable phenomenon in a field. We conducted studies in cassava (Manihot esculenta Crantz) in order to understand the pattern of this competition. Taking into account the competitive ability of genotypes while selecting parents for breeding advancement or commercialization can be very useful. We assumed that competition could occur at two levels: (i) the genotypic level, which we call interclonal, and (ii) the plot level irrespective of the type of genotype, which we call interplot competition or competition error. Incidence matrices were modified in order to relate neighboring genotypes/plots to the performance of a target genotype/plot with respect to its competitive ability. This was added into a genomic selection (GS) model to simultaneously predict the direct and competitive ability of a genotype. Predictability of the models was tested through a 10-fold cross-validation method repeated five times. The best model was chosen as the one with the lowest prediction root mean squared error (pRMSE) compared to that of the base model having no competitive component. Results from our real data studies indicated that value reached up to 25% with a GS-competition error model. We also found that the competitive influence of a cassava clone is not limited to its adjacent neighbors but spreads beyond them. Through simulations, we found that a 26% increase in accuracy in estimating the trait genotypic effect can be achieved even in the presence of high competitive variance. Copyright © 2018 Elias et al.
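
    The validation scheme described above, 10-fold cross-validation repeated five times with models compared by prediction root mean squared error, can be sketched generically. This is an illustration under assumptions, not the authors' code: `fit_predict` is a hypothetical callback standing in for a GS model with or without a competition term, and pRMSE is taken here as the mean of per-fold RMSE values, which may differ from how the study aggregated folds.

```python
import math
import random
from typing import Callable, List, Sequence


def repeated_kfold_prmse(
    y: Sequence[float],
    fit_predict: Callable[[List[int], List[int]], List[float]],
    k: int = 10,
    repeats: int = 5,
    seed: int = 0,
) -> float:
    """Mean prediction RMSE over `repeats` runs of k-fold cross-validation.

    fit_predict(train_idx, test_idx) is a hypothetical callback that fits a
    model on the training indices and returns predictions for test_idx.
    """
    n = len(y)
    if n < k:
        raise ValueError("need at least k observations")
    rng = random.Random(seed)
    rmses = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random fold assignment for each repeat
        folds = [idx[f::k] for f in range(k)]
        for test_idx in folds:
            test_set = set(test_idx)
            train_idx = [i for i in idx if i not in test_set]
            preds = fit_predict(train_idx, test_idx)
            sq = [(y[i] - p) ** 2 for i, p in zip(test_idx, preds)]
            rmses.append(math.sqrt(sum(sq) / len(sq)))
    return sum(rmses) / len(rmses)


y = [float(v) for v in range(20)]


def mean_model(train_idx, test_idx):
    # Hypothetical base model with no markers and no competition term:
    # predict the training-set mean for every test plot.
    mu = sum(y[i] for i in train_idx) / len(train_idx)
    return [mu] * len(test_idx)


print(repeated_kfold_prmse(y, mean_model))
```

    Comparing a candidate model against the base model then amounts to calling the function twice with the same seed, so both models see identical folds.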

  1. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (non-linear on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 Diversity Arrays Technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. The three non-linear models had better overall prediction accuracy than the linear regression specifications, with RKHS and RBFNN consistently superior to the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
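
    RKHS regression in this setting is usually built on a Gaussian kernel over marker genotypes. The minimal sketch below uses kernel ridge regression, a non-Bayesian point-estimate analogue of RKHS regression, next to a ridge regression that is linear on markers. The bandwidth `h`, penalty `lam`, the scaling of squared distance by the number of markers, and the simulated 0/1 marker data are all assumptions for illustration, not the study's settings or data.

```python
import numpy as np


def gaussian_kernel(A, B, h):
    """K[i, j] = exp(-h * ||a_i - b_j||^2 / p), with p the number of markers."""
    p = A.shape[1]
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq = np.maximum(sq, 0.0)  # clip tiny negative values caused by round-off
    return np.exp(-h * sq / p)


def rkhs_fit_predict(X_train, y_train, X_test, h=1.0, lam=1.0):
    """Kernel ridge regression: alpha = (K + lam * I)^-1 (y - mean(y))."""
    mu = y_train.mean()
    K = gaussian_kernel(X_train, X_train, h)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + gaussian_kernel(X_test, X_train, h) @ alpha


def ridge_fit_predict(X_train, y_train, X_test, lam=1.0):
    """Linear-on-markers baseline: ridge regression on centered marker codes."""
    mu, xbar = y_train.mean(), X_train.mean(axis=0)
    Xc = X_train - xbar
    lhs = Xc.T @ Xc + lam * np.eye(X_train.shape[1])
    beta = np.linalg.solve(lhs, Xc.T @ (y_train - mu))
    return mu + (X_test - xbar) @ beta


# Hypothetical simulated data: 300 lines x 50 binary (DArT-like) markers,
# with a two-marker interaction so the trait is not purely additive.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 50)).astype(float)
y = 3.0 * X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(0.0, 0.5, 300)

pred_rkhs = rkhs_fit_predict(X[:200], y[:200], X[200:], h=1.0, lam=0.5)
pred_ridge = ridge_fit_predict(X[:200], y[:200], X[200:], lam=10.0)
print("RKHS r =", np.corrcoef(pred_rkhs, y[200:])[0, 1])
print("ridge r =", np.corrcoef(pred_ridge, y[200:])[0, 1])
```

    Which model predicts better depends on the genetic architecture and on tuning `h` and `lam`; in practice both would be chosen by cross-validation rather than fixed as here.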

  2. Genome wide predictions of miRNA regulation by transcription factors.

    Science.gov (United States)

    Ruffalo, Matthew; Bar-Joseph, Ziv

    2016-09-01

    Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on trying to experimentally and computationally determine the set of transcription factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes, including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is the fact that a large fraction of miRNAs are encoded within genes, making it hard to determine the specific way in which they are regulated. To enable genome-wide predictions of TF-miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data, including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance both on a labeled test set and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs. Code and the full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/ (contact: zivbj@cs.cmu.edu). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.

  3. Genetic diversity and trait genomic prediction in a pea diversity panel.

    Science.gov (United States)

    Burstin, Judith; Salloignon, Pauline; Chabert-Martinello, Marianne; Magnin-Robert, Jean-Bernard; Siol, Mathieu; Jacquin, Françoise; Chauveau, Aurélie; Pont, Caroline; Aubert, Grégoire; Delaitre, Catherine; Truntzer, Caroline; Duc, Gérard

    2015-02-21

    Pea (Pisum sativum L.), a major pulse crop grown for its protein-rich seeds, is an important component of agroecological cropping systems in diverse regions of the world. New breeding challenges imposed by global climate change and new regulations urge pea breeders to adopt more efficient methods of selection and to take better advantage of the large genetic diversity present in the Pisum sativum genepool. Diversity studies conducted so far in pea used Simple Sequence Repeat (SSR) and Retrotransposon Based Insertion Polymorphism (RBIP) markers. Recently, SNP marker panels have been developed that will be useful for genetic diversity assessment and marker-assisted selection. A collection of diverse pea accessions, including landraces and cultivars of garden, field or fodder peas as well as wild peas, was characterised at the molecular level using newly developed SNP markers, as well as SSR markers and RBIP markers. The three types of markers were used to describe the structure of the collection and revealed different pictures of the genetic diversity within the collection. SSR showed the fastest rate of evolution and RBIP the slowest, pointing to their contrasting modes of evolution. SNP markers were then used to predict three phenotypes recorded for the collection: the date of flowering (BegFlo), the number of seeds per plant (Nseed) and thousand seed weight (TSW). Different statistical methods were tested, including LASSO (Least Absolute Shrinkage and Selection Operator), PLS (Partial Least Squares), SPLS (Sparse Partial Least Squares), Bayes A, Bayes B and GBLUP (Genomic Best Linear Unbiased Prediction), and the structure of the collection was taken into account in the prediction. Despite the limited number of 331 markers used for prediction, TSW was reliably predicted. The development of marker-assisted selection has not reached its full potential in pea until now. This paper shows that the high-throughput SNP arrays that are being

  4. Gene network inherent in genomic big data improves the accuracy of prognostic prediction for cancer patients.

    Science.gov (United States)

    Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock

    2017-09-29

    Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients.
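
    For readers unfamiliar with the C-index used above to compare training and validation cohorts, a minimal sketch of Harrell's concordance index for right-censored data follows. It is a plain O(n^2) illustration that skips pairs with tied survival times; it is not the Net-score method, and the toy times, event flags and risk scores are hypothetical.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when times[i] < times[j] and subject i had an
    event (events[i] == 1). It is concordant when risk[i] > risk[j]; tied risk
    scores count as one half. Pairs with tied times are skipped in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: a higher score means a higher predicted risk of death.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]  # 0 = censored
scores = [0.9, 0.7, 0.8, 0.1]
print(harrell_c_index(times, events, scores))  # 4 of 5 comparable pairs: 0.8
```

    A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking; a small gap between training and validation values is one informal sign of limited overfitting.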

  5. Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers

    Directory of Open Access Journals (Sweden)

    Litonjua Augusto A

    2011-06-01

    Full Text Available Abstract Background Personalized health care promises tailored health-care solutions to individual patients based on their genetic background and/or environmental exposure history. To date, disease prediction has been based on a few environmental factors and/or single nucleotide polymorphisms (SNPs), while complex diseases are usually affected by many genetic and environmental factors, with each factor contributing a small portion to the outcome. We hypothesized that the use of random forests classifiers to select SNPs would result in an improved predictive model of asthma exacerbations. We tested this hypothesis in a population of childhood asthmatics. Methods In this study, using emergency room visits or hospitalizations as the definition of a severe asthma exacerbation, we first identified a list of top Genome Wide Association Study (GWAS) SNPs ranked by Random Forests (RF) importance score for the CAMP (Childhood Asthma Management Program) population of 127 exacerbation cases and 290 non-exacerbation controls. We predicted severe asthma exacerbations using the top 10 to 320 SNPs together with age, sex, pre-bronchodilator FEV1 percentage predicted, and treatment group. Results Testing in an independent set of the CAMP population shows that severe asthma exacerbations can be predicted with an Area Under the Curve (AUC) of 0.66 with 160-320 SNPs, compared with an AUC of 0.57 with 10 SNPs. Using the clinical traits alone yielded an AUC of 0.54, suggesting the phenotype is affected by genetic as well as environmental factors. Conclusions Our study shows that a random forests algorithm can effectively extract and use the information contained in a small number of samples. Random forests, and other machine learning tools, can be used with GWAS studies to integrate large numbers of predictors simultaneously.
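
    An AUC such as 0.66 can be read as the probability that a randomly chosen exacerbation case receives a higher predicted score than a randomly chosen control (the Mann-Whitney formulation). The sketch below computes it directly from that definition; the labels and scores are hypothetical and do not come from the CAMP data or the random forests model.

```python
from typing import Sequence


def auc_mann_whitney(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random case > score of a random control).

    Ties between a case and a control count as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0, 0]             # 1 = severe exacerbation, 0 = control
scores = [0.8, 0.4, 0.5, 0.3, 0.1]   # hypothetical predicted probabilities
print(auc_mann_whitney(labels, scores))  # 5 of 6 case-control pairs ranked correctly
```

    The pairwise loop is quadratic, which is fine for a few hundred subjects; for larger sets a rank-based formula gives the same value faster.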

  6. A two step Bayesian approach for genomic prediction of breeding values

    DEFF Research Database (Denmark)

    Mahdi Shariati, Mohammad; Sørensen, Peter; Janss, Luc

    2012-01-01

    Background: In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter. A better alternative could be to form clusters of markers with similar effects where markers in a cluster have a common variance. Therefore, the influence of each marker group of size p on the posterior distribution of the marker variances will be p df. Methods: The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency more than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based...

  7. Genomic Features That Predict Allelic Imbalance in Humans Suggest Patterns of Constraint on Gene Expression Variation

    Science.gov (United States)

    Fédrigo, Olivier; Haygood, Ralph; Mukherjee, Sayan; Wray, Gregory A.

    2009-01-01

    Variation in gene expression is an important contributor to phenotypic diversity within and between species. Although this variation often has a genetic component, identification of the genetic variants driving this relationship remains challenging. In particular, measurements of gene expression usually do not reveal whether the genetic basis for any observed variation lies in cis or in trans to the gene, a distinction that has direct relevance to the physical location of the underlying genetic variant, and which may also impact its evolutionary trajectory. Allelic imbalance measurements identify cis-acting genetic effects by assaying the relative contribution of the two alleles of a cis-regulatory region to gene expression within individuals. Identification of patterns that predict commonly imbalanced genes could therefore serve as a useful tool and also shed light on the evolution of cis-regulatory variation itself. Here, we show that sequence motifs, polymorphism levels, and divergence levels around a gene can be used to predict commonly imbalanced genes in a human data set. Reduction of this feature set to four factors revealed that only one factor significantly differentiated between commonly imbalanced and nonimbalanced genes. We demonstrate that these results are consistent between the original data set and a second published data set in humans obtained using different technical and statistical methods. Finally, we show that variation in the single allelic imbalance-associated factor is partially explained by the density of genes in the region of a target gene (allelic imbalance is less probable for genes in gene-dense regions), and, to a lesser extent, the evenness of expression of the gene across tissues and the magnitude of negative selection on putative regulatory regions of the gene. These results suggest that the genomic distribution of functional cis-regulatory variants in the human genome is nonrandom, perhaps due to local differences in evolutionary

  8. Genome-scale detection of positive selection in nine primates predicts human-virus evolutionary conflicts.

    Science.gov (United States)

    van der Lee, Robin; Wiel, Laurens; van Dam, Teunis J P; Huynen, Martijn A

    2017-10-13

    Hotspots of rapid genome evolution hold clues about human adaptation. We present a comparative analysis of nine whole-genome sequenced primates to identify high-confidence targets of positive selection. We find strong statistical evidence for positive selection in 331 protein-coding genes (3%), pinpointing 934 adaptively evolving codons (0.014%). Our new procedure is stringent and reveals substantial artefacts (20% of initial predictions) that have inflated previous estimates. The final 331 positively selected genes (PSG) are strongly enriched for innate and adaptive immunity, secreted and cell membrane proteins (e.g. pattern recognition, complement, cytokines, immune receptors, MHC, Siglecs). We also find evidence for positive selection in reproduction and chromosome segregation (e.g. centromere-associated CENPO, CENPT), apolipoproteins, smell/taste receptors and mitochondrial proteins. Focusing on the virus-host interaction, we retrieve most evolutionary conflicts known to influence antiviral activity (e.g. TRIM5, MAVS, SAMHD1, tetherin) and predict 70 novel cases through integration with virus-human interaction data. Protein structure analysis further identifies positive selection in the interaction interfaces between viruses and their cellular receptors (CD4-HIV; CD46-measles, adenoviruses; CD55-picornaviruses). Finally, primate PSG consistently show high sequence variation in human exomes, suggesting ongoing evolution. Our curated dataset of positive selection is a rich source for studying the genetics underlying human (antiviral) phenotypes. Procedures and data are available at https://github.com/robinvanderlee/positive-selection. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. The master regulator of IncA/C plasmids is recognized by the Salmonella Genomic island SGI1 as a signal for excision and conjugal transfer.

    Science.gov (United States)

    Kiss, János; Papp, Péter Pál; Szabó, Mónika; Farkas, Tibor; Murányi, Gábor; Szakállas, Erik; Olasz, Ferenc

    2015-10-15

    The genomic island SGI1 and its variants, the important vehicles of multi-resistance in Salmonella strains, are integrative elements mobilized exclusively by the conjugative IncA/C plasmids. Integration and excision of the island are carried out by the SGI1-encoded site-specific recombinase Int and the recombination directionality factor Xis. Chromosomal integration ensures the stable maintenance and vertical transmission of SGI1, while excision is the initial step of horizontal transfer, followed by conjugation and integration into the recipient. We report here that SGI1 not only exploits the conjugal apparatus of the IncA/C plasmids but also utilizes the regulatory mechanisms of the conjugation system for the exact timing and activation of excision to ensure efficient horizontal transfer. This study demonstrates that the FlhDC-family activator AcaCD, which regulates the conjugation machinery of the IncA/C plasmids, serves as a signal of helper entry through binding to SGI1 xis promoter and activating SGI1 excision. Promoters of int and xis genes have been identified and the binding site of the activator has been located by footprinting and deletion analyses. We prove that expression of xis is activator-dependent while int is constitutively expressed, and this regulatory mechanism is presumably responsible for the efficient transfer and stable maintenance of SGI1. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Genomic Variability of O Islands Encoding Tellurite Resistance in Enterohemorrhagic Escherichia coli O157:H7 Isolates

    OpenAIRE

    Taylor, Diane E.; Rooker, Michelle; Keelan, Monika; Ng, Lai-King; Martin, Irene; Perna, Nicole T.; Burland, N. T. Valerie; Blattner, Fredrick R.

    2002-01-01

    Strains of Escherichia coli causing enterohemorrhagic colitis belonging to the O157:H7 lineage are reported to be highly related. Fifteen strains of E. coli O157:H7 and 1 strain of E. coli O46:H− (nonflagellated) were examined for the presence of potassium tellurite resistance (Ter). Ter genes comprising terABCDEF were shown previously to be part of a pathogenicity island also containing integrase, phage, and urease genes. PCR analysis, both conventional and light cycler based, demonstrated t...

  11. Prospects of Genomic Prediction in the USDA Soybean Germplasm Collection: Historical Data Creates Robust Models for Enhancing Selection of Accessions

    Directory of Open Access Journals (Sweden)

    Diego Jarquin

    2016-08-01

    Full Text Available The identification and mobilization of useful genetic variation from germplasm banks for use in breeding programs is critical for future genetic gain and protection against crop pests. Plummeting costs of next-generation sequencing and genotyping are revolutionizing the way in which researchers and breeders interface with plant germplasm collections. An example of this is the high-density genotyping of the entire USDA Soybean Germplasm Collection. We assessed the usefulness of 50K single nucleotide polymorphism data collected on 18,480 domesticated soybean (Glycine max) accessions and vast historical phenotypic data for developing genomic prediction models for protein, oil, and yield. The resulting genomic prediction models explained an appreciable amount of the variation in accession performance in independent validation trials, with correlations between predicted and observed values reaching up to 0.92 for oil and protein and 0.79 for yield. The optimization of training set design was explored using a series of cross-validation schemes. First, it was found that the target population and environment need to be well represented in the training set. Second, genomic prediction training sets appear to be robust to the presence of data from diverse geographical locations and genetic clusters. This finding, however, depends on the influence of shattering and lodging, and may be specific to soybean with its presence of maturity groups. The distribution of 7608 nonphenotyped accessions was examined through the application of genomic prediction models. The distribution of predictions for phenotyped accessions was representative of the distribution of predictions for nonphenotyped accessions, with no nonphenotyped accessions predicted to fall far outside the range of predictions for phenotyped accessions.

  12. Predictive ability of genomic selection models for breeding value estimation on growth traits of Pacific white shrimp Litopenaeus vannamei

    Science.gov (United States)

    Wang, Quanchao; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai

    2017-09-01

    Genomic selection (GS) can be used to accelerate genetic improvement by shortening the selection interval. The successful application of GS depends largely on the accuracy of the prediction of genomic estimated breeding values (GEBV). This study is a first attempt to understand the practicality of GS in Litopenaeus vannamei and aims to evaluate models for GS on growth traits. The performance of GS models in L. vannamei was evaluated in a population of 205 individuals, which were genotyped for 6,359 single nucleotide polymorphism (SNP) markers by specific length amplified fragment sequencing (SLAF-seq) and phenotyped for body length and body weight. Three GS models (RR-BLUP, BayesA, and Bayesian LASSO) were used to obtain the GEBV, and their predictive ability was assessed by the reliability of the GEBV and the bias of the predicted phenotypes. The mean reliability of the GEBVs predicted by the different models was 0.296 for body length and 0.411 for body weight. For each trait, the three models performed very similarly with respect to predictability. The regression coefficients estimated by the three models were close to one, suggesting near-zero bias for the predictions. Therefore, when GS was applied in an L. vannamei population for the studied scenarios, all three models appeared practicable. Further analyses suggested that genomic prediction could be improved by increasing the size of the training population as well as the density of SNPs.
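
    RR-BLUP can be written as ridge regression of phenotypes on SNP codes, and bias is commonly checked with the slope of observed on predicted values, where a slope near one means little bias. The minimal sketch below assumes a known variance ratio `lam` (a real analysis would estimate it, for example by REML), uses simulated genotypes rather than the shrimp data, and reports Pearson correlation rather than the study's exact reliability statistic, whose definition the abstract does not give.

```python
import numpy as np


def rrblup_predict(Z_train, y_train, Z_test, lam):
    """RR-BLUP written as ridge regression on centered SNP codes.

    lam stands for sigma_e^2 / sigma_u^2 (residual variance over per-marker
    effect variance). It is supplied by the caller here, not estimated.
    """
    mu = y_train.mean()
    zbar = Z_train.mean(axis=0)
    Zc = Z_train - zbar
    lhs = Zc.T @ Zc + lam * np.eye(Z_train.shape[1])
    u = np.linalg.solve(lhs, Zc.T @ (y_train - mu))  # SNP effect estimates
    return mu + (Z_test - zbar) @ u  # predictions for the test animals


def ability_and_bias(y_obs, y_pred):
    """Pearson r, and the slope of observed on predicted (1.0 means no bias)."""
    r = np.corrcoef(y_obs, y_pred)[0, 1]
    slope = np.cov(y_obs, y_pred)[0, 1] / np.var(y_pred, ddof=1)
    return r, slope


# Hypothetical simulation sized like the study population (205 animals),
# with fewer SNPs than the real panel to keep the example fast.
rng = np.random.default_rng(42)
n, m = 205, 500
Z = rng.binomial(2, 0.3, size=(n, m)).astype(float)  # genotypes coded 0/1/2
g = Z @ rng.normal(0.0, 0.1, m)                      # true genetic values
y = g + rng.normal(0.0, g.std(), n)                  # heritability near 0.5
# lam = 200 is close to the simulated ratio var(e) / var(u) of about 2.1 / 0.01.
pred = rrblup_predict(Z[:150], y[:150], Z[150:], lam=200.0)
print(ability_and_bias(y[150:], pred))
```

    A slope below one would indicate inflated predictions (too widely spread), and a slope above one deflated predictions.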

  13. A two step Bayesian approach for genomic prediction of breeding values.

    Science.gov (United States)

    Shariati, Mohammad M; Sørensen, Peter; Janss, Luc

    2012-05-21

    In genomic models that assign an individual variance to each marker, each marker contributes only one degree of freedom (df) to the posterior distribution of its variance. This introduces many variance parameters, each supported by little information. A better alternative could be to form clusters of markers with similar effects, where markers in a cluster share a common variance. Therefore, the influence of each marker group of size p on the posterior distribution of the marker variances will be p df. The simulated data from the 15th QTL-MAS workshop were analyzed such that SNP markers were ranked based on their effects and markers with similar estimated effects were grouped together. In step 1, all markers with minor allele frequency greater than 0.01 were included in a SNP-BLUP prediction model. In step 2, markers were ranked based on their estimated variance on the trait in step 1, and every 150 markers were assigned to one group with a common variance. In further analyses, subsets of the 1500 and 450 markers with the largest effects in step 2 were kept in the prediction model. Grouping markers outperformed the SNP-BLUP model in terms of accuracy of predicted breeding values. However, the accuracies of predicted breeding values were lower than those of Bayesian methods with marker-specific variances. Grouping markers is less flexible than allowing each marker to have a specific variance, but grouping increases the power to estimate marker variances. Prior knowledge of the genetic architecture of the trait is necessary for clustering markers and for appropriate prior parameterization.
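    Step 2 above ranks markers by their step-1 variance estimates and assigns each consecutive block of 150 to a group with a shared variance. The sketch below is a hypothetical helper that builds only this grouping. It does not estimate the common group variances in the Bayesian model, and the toy variances are made up.

```python
from typing import Dict, Optional, Sequence


def group_markers(
    variances: Sequence[float], group_size: int = 150, keep: Optional[int] = None
) -> Dict[int, int]:
    """Map marker index -> group id after ranking markers by estimated variance.

    Markers are sorted from largest to smallest estimated variance (the step-1
    output), and consecutive blocks of `group_size` markers share one group id.
    If `keep` is given, only the `keep` top-ranked markers are retained.
    """
    if group_size < 1:
        raise ValueError("group_size must be positive")
    order = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    if keep is not None:
        order = order[:keep]
    return {marker: rank // group_size for rank, marker in enumerate(order)}


step1_variances = [0.10, 0.50, 0.30, 0.20]  # toy values, not from the study
print(group_markers(step1_variances, group_size=2))  # {1: 0, 2: 0, 3: 1, 0: 1}
```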

  14. Demethylation by 5-aza-2'-deoxycytidine in colorectal cancer cells targets genomic DNA whilst promoter CpG island methylation persists

    International Nuclear Information System (INIS)

    Mossman, David; Kim, Kyu-Tae; Scott, Rodney J

    2010-01-01

    DNA methylation and histone acetylation are epigenetic modifications that act as regulators of gene expression. Aberrant epigenetic gene silencing in tumours is a frequent event, yet the factors which dictate which genes are targeted for inactivation are unknown. DNA methylation and histone acetylation can be modified with the chemical agents 5-aza-2'-deoxycytidine (5-aza-dC) and Trichostatin A (TSA), respectively. The aim of this study was to analyse de-methylation and re-methylation and their effect on gene expression in colorectal cancer cell lines treated with 5-aza-dC alone and in combination with TSA. We also sought to identify methylation patterns associated with long term reactivation of previously silenced genes. Colorectal cancer cell lines were treated with 5-aza-dC, with and without TSA, and global methylation decreases were analysed by High Performance Liquid Chromatography (HPLC). Re-methylation was observed after removal of drug treatments. Expression arrays identified silenced genes with differing patterns of expression after treatment, such as short term reactivation or long term reactivation. Sodium bisulfite sequencing was performed on the CpG islands associated with these genes, and expression was verified with real time PCR. Treatment with 5-aza-dC was found to affect genomic methylation and, to a lesser extent, gene-specific methylation. Reactivated genes which remained expressed 10 days post 5-aza-dC treatment featured hypomethylated CpG sites adjacent to the transcription start site (TSS). In contrast, genes with uniformly hypermethylated CpG islands were only temporarily reactivated. These results imply that 5-aza-dC induces strong de-methylation of the genome and initiates reactivation of transcriptionally inactive genes, but this does not require de-methylation of the gene-associated CpG island. In addition, for three of our selected genes, hypomethylation at the TSS of an epigenetically silenced gene is associated with the long term reversion of

  15. Genomic scar signatures associated with homologous recombination deficiency predict adverse clinical outcomes in patients with ovarian clear cell carcinoma.

    Science.gov (United States)

    Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Jung, Shih-Ming; Lee, Yun-Shien; Chang, Wei-Yang; Yang, Lan-Yang; Ku, Fei-Chun; Huang, Huei-Jean; Chao, An-Shine; Wang, Chin-Jung; Chang, Ting-Chang; Wu, Ren-Chin

    2018-05-03

    We investigated whether genomic scar signatures associated with homologous recombination deficiency (HRD), which include telomeric allelic imbalance (TAI), large-scale transition (LST), and loss of heterozygosity (LOH), can predict clinical outcomes in patients with ovarian clear cell carcinoma (OCCC). We enrolled patients with OCCC (n = 80) and high-grade serous carcinoma (HGSC; n = 92) subjected to primary cytoreductive surgery, most of whom received platinum-based adjuvant chemotherapy. Genomic scar signatures based on genome-wide copy number data were determined in all participants and investigated in relation to prognosis. OCCC had significantly lower genomic scar signature scores than HGSC (p < 0.001). Near-triploid OCCC specimens showed higher TAI and LST scores compared with diploid tumors (p < 0.001). While high scores of these genomic scar signatures were significantly associated with better clinical outcomes in patients with HGSC, the opposite was evident for OCCC. Multivariate survival analysis in patients with OCCC identified high LOH scores as the main independent adverse predictor for both cancer-specific (hazard ratio [HR] = 3.22, p = 0.005) and progression-free survival (HR = 2.54, p = 0.01). In conclusion, genomic scar signatures associated with HRD predict adverse clinical outcomes in patients with OCCC. The LOH score was identified as the strongest prognostic indicator in this patient group. Genomic scar signatures associated with HRD are less frequent in OCCC than in HGSC. Genomic scar signatures associated with HRD have an adverse prognostic impact in patients with OCCC. LOH score is the strongest adverse prognostic factor in patients with OCCC.

  16. Microdiversification of a Pelagic Polynucleobacter Species Is Mainly Driven by Acquisition of Genomic Islands from a Partially Interspecific Gene Pool

    Czech Academy of Sciences Publication Activity Database

    Hoetzinger, M.; Schmidt, J.; Jezberová, Jitka; Koll, U.; Hahn, M.W.

    2017-01-01

    Vol. 83, No. 3 (2017), article no. e02266-16. ISSN 0099-2240. Institutional support: RVO:60077344. Keywords: Polynucleobacter; ecophysiology; environmental genomics; functional diversity. Subject RIV: EE - Microbiology, Virology. OECD field: Microbiology. Impact factor: 3.807 (year: 2016).

  17. Prediction of arsenic and antimony transporter major intrinsic proteins from the genomes of crop plants.

    Science.gov (United States)

    Azad, Abul Kalam; Ahmed, Jahed; Alum, Md Asraful; Hasan, Md Mahbub; Ishikawa, Takahiro; Sawa, Yoshihiro

    2018-02-01

    Major intrinsic proteins (MIPs), commonly known as aquaporins, transport water and non-polar small solutes. By comparing the 3D models and the primary selectivity-related motifs (two Asn-Pro-Ala (NPA) regions, the aromatic/arginine (ar/R) selectivity filter, and Froger's positions (FPs)) of all plant MIPs that have been experimentally proven to transport arsenic (As) and antimony (Sb), some substrate-specific signature sequences (SSSS) or specificity determining sites (SDPs) have been predicted. These SSSS or SDPs were determined in 543 MIPs found in the genomes of 12 crop plants. The As and Sb transporters were predicted to be distributed among nodulin 26-like intrinsic proteins (NIPs), and every plant had one or several As and Sb transporter NIPs. Phylogenetic grouping of the NIP subfamily based on the ar/R selectivity filter and FPs was linked to As and Sb transport. We further determined the group-wise substrate selectivity profiles of the NIPs in the 12 crop plants. In addition to the two NPA regions, the ar/R filter, and FPs, certain amino acids, especially in the pore lining, loop D, and termini, contribute to the functional distinctiveness of the NIP groups. Expression analysis of transcripts in different organs indicated that most of the As and Sb transporter NIPs were expressed in roots. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Klebsiella pneumoniae asparagine tDNAs are integration hotspots for different genomic islands encoding microcin E492 production determinants and other putative virulence factors present in hypervirulent strains

    Directory of Open Access Journals (Sweden)

    Andrés Esteban Marcoleta

    2016-06-01

    Full Text Available Owing to the emergence of multi-resistant and invasive hypervirulent strains, Klebsiella pneumoniae has become one of the most urgent bacterial pathogen threats in recent years. Genomic comparison of a growing number of sequenced isolates has allowed the identification of putative virulence factors, proposed to be acquired mainly through horizontal gene transfer. In particular, those related to the synthesis of the antibacterial peptide microcin E492 (MccE492) and salmochelin siderophores were found to be highly prevalent among hypervirulent strains. The determinants for the production of both molecules were first reported as part of a 13-kbp segment of the K. pneumoniae RYC492 chromosome, and were cloned and characterized in E. coli. However, the genomic context of this segment in K. pneumoniae remained uncharacterized. In this work we provide experimental and bioinformatic evidence indicating that the MccE492 cluster is part of a highly conserved 23-kbp genomic island (GI), named GIE492, that is integrated in a specific asparagine-tRNA gene (asn-tDNA). This island was found in a high proportion of isolates from liver abscesses sampled around the world. This element proved to be unstable, and its excision frequency increased after treating bacteria with mitomycin C and upon overexpression of the island-encoded integrase. Besides the MccE492 genetic cluster, it invariably included an integrase-coding gene, at least 7 protein-coding genes of unknown function, and a putative transfer origin that may allow this GI to be mobilized through conjugation. In addition, we analyzed the asn-tDNA loci of all available assembled K. pneumoniae chromosomes to evaluate them as GI-integration sites. Remarkably, 73% of the strains harbored at least one GI integrated in one of the four asn-tDNAs present in this species, confirming them as integration hotspots. Each of these tDNAs was occupied at a different frequency, although they were 100% identical. 
Also, we

  19. K19 capsular polysaccharide of Acinetobacter baumannii is produced via a Wzy polymerase encoded in a small genomic island rather than the KL19 capsule gene cluster.

    Science.gov (United States)

    Kenyon, Johanna J; Shneider, Mikhail M; Senchenkova, Sofya N; Shashkov, Alexander S; Siniagina, Maria N; Malanin, Sergey Y; Popova, Anastasiya V; Miroshnikov, Konstantin A; Hall, Ruth M; Knirel, Yuriy A

    2016-08-01

    Polymerization of the oligosaccharides (K units) of complex capsular polysaccharides (CPSs) requires a Wzy polymerase, which is usually encoded in the gene cluster that directs K unit synthesis. Here, a gene cluster at the Acinetobacter K locus (KL) that lacks a wzy gene, KL19, was found in Acinetobacter baumannii ST111 isolates 28 and RBH2, recovered from hospitals in the Russian Federation and Australia, respectively. However, these isolates produced long-chain capsule, and a wzy gene was found in a 6.1 kb genomic island (GI) located adjacent to the cpn60 gene. The GI also includes an acetyltransferase gene, atr25, which is interrupted by an insertion sequence (IS) in RBH2. The capsule structure from both strains was →3)-α-d-GalpNAc-(1→4)-α-d-GalpNAcA-(1→3)-β-d-QuipNAc4NAc-(1→, determined using NMR spectroscopy. Biosynthesis of the K unit was inferred to be initiated with QuiNAc4NAc, and hence Wzy forms the β-(1→3) linkage between QuipNAc4NAc and GalpNAc. The GalpNAc residue is 6-O-acetylated in isolate 28 only, showing that atr25 is responsible for this acetylation. The same GI, with or without an IS in atr25, was found in draft genomes of other KL19 isolates. It was also found in isolates carrying a closely related CPS gene cluster, KL39, which differs from KL19 only in a gene for an acyltransferase in the QuiNAc4NR synthesis pathway. Isolates carrying a KL1 variant with the wzy and atr genes each interrupted by an ISAba125 also have this GI. To our knowledge, this study is the first report of capsule biosynthesis genes that are normally found at the KL being located elsewhere in A. baumannii genomes.

  20. Equations for predicting biomass of six introduced tree species, island of Hawaii

    Science.gov (United States)

    Thomas H. Schubert; Robert F. Strand; Thomas G. Cole; Katharine E. McDuffie

    1988-01-01

    Regression equations to predict total and stem-only above-ground dry biomass for six species (Acacia melanoxylon, Albizia falcataria, Eucalyptus globulus, E. grandis, E. robusta, and E. urophylla) were developed by felling and measuring 2- to 6-year-old...

  1. Genome-scale prediction of proteins with long intrinsically disordered regions.

    Science.gov (United States)

    Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz

    2014-01-01

    Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed the Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER uses logistic regression with an empirically chosen set of numerical features as its inputs. These features cover selected physicochemical properties of amino acids, sequence complexity, and amino acid composition. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster than the best currently available disorder predictor. Using our time-efficient predictor, we characterized the abundance and functional roles of proteins with LDRs across 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many proteins with LDRs (on average 30.3%), with the majority of proteomes having between 25 and 40%. Higher abundance is characteristic of proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes. These include certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Copyright © 2013 Wiley Periodicals, Inc.
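    The LDR definition above (30 or more consecutive disordered residues) can be applied directly to per-residue disorder calls. The sketch below assumes those calls come from some residue-level predictor, which is the indirect route the abstract compares against. It does not reproduce SLIDER's logistic-regression model or its features, and the function names are hypothetical.

```python
from typing import Iterable


def longest_disordered_run(flags: Iterable[bool]) -> int:
    """Length of the longest run of consecutive residues flagged as disordered."""
    best = current = 0
    for disordered in flags:
        current = current + 1 if disordered else 0
        best = max(best, current)
    return best


def has_ldr(flags: Iterable[bool], min_length: int = 30) -> bool:
    """True if the protein has a long disordered region (>= min_length residues)."""
    return longest_disordered_run(flags) >= min_length


# Toy per-residue calls, e.g. from any residue-level disorder predictor.
calls = [False] * 10 + [True] * 35 + [False] * 5
print(has_ldr(calls))  # True
```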

  2. Resource allocation for maximizing prediction accuracy and genetic gain of genomic selection in plant breeding: a simulation experiment.

    Science.gov (United States)

    Lorenz, Aaron J

    2013-03-01

    Allocating resources between population size and replication affects genetic gain through phenotypic selection. For marker-assisted selection (MAS), it also affects quantitative trait loci detection power and effect estimation accuracy. It is well known that, because alleles are replicated across individuals in quantitative trait loci mapping and MAS, more resources should be allocated to increasing population size compared with phenotypic selection. Genomic selection is a form of MAS that uses all marker information simultaneously to predict individual genetic values for complex traits, and it has been widely found to be superior to MAS. No studies have explicitly investigated how resource allocation decisions affect the success of genomic selection. My objective was to study the effect of resource allocation on response to MAS and genomic selection in a single biparental population of doubled haploid lines by using computer simulation. Simulation results were compared with previously derived formulas for the calculation of prediction accuracy under different levels of heritability and population size. The response of prediction accuracy to resource allocation strategies differed between genomic selection models (ridge regression best linear unbiased prediction [RR-BLUP], BayesCπ) and multiple linear regression using ordinary least-squares estimation (OLS). This led to different optimal resource allocation choices for OLS and RR-BLUP. For OLS, it was always advantageous to maximize population size at the expense of replication, but a high degree of flexibility was observed for RR-BLUP. Prediction accuracy for doubled haploid lines included in the training set was much greater than for those excluded from the training set, so there was little benefit to phenotyping only a subset of the lines genotyped. Finally, observed prediction accuracies in the simulation compared well to calculated prediction accuracies, indicating these theoretical formulas are useful for making resource allocation
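    The theoretical formulas mentioned above relate prediction accuracy to heritability and population size. The sketch below uses one widely cited expression of this kind, r = sqrt(N h² / (N h² + Me)) from Daetwyler et al. (2008), where Me is the effective number of independent chromosome segments. Whether this matches the formulas used in this study is an assumption. Replication enters through the heritability of line means. The plot budget, h² and Me values are toy numbers.

```python
import math


def entry_mean_heritability(h2_plot: float, reps: int) -> float:
    """Heritability of a line mean over `reps` replicates.

    Uses h2_mean = s2g / (s2g + s2e / reps) with s2g = h2_plot and s2e = 1 - h2_plot.
    """
    return h2_plot / (h2_plot + (1.0 - h2_plot) / reps)


def expected_accuracy(n_lines: int, h2: float, m_e: float) -> float:
    """Expected accuracy sqrt(N h2 / (N h2 + Me)) (Daetwyler-style formula)."""
    nh2 = n_lines * h2
    return math.sqrt(nh2 / (nh2 + m_e))


# Fixed budget of 400 plots split between lines and replicates (toy numbers).
budget, h2_plot, m_e = 400, 0.3, 100.0
for reps in (1, 2, 4):
    n = budget // reps
    acc = expected_accuracy(n, entry_mean_heritability(h2_plot, reps), m_e)
    print(reps, n, round(acc, 3))
```

    With these toy numbers, expected accuracy falls as replicates replace lines. However, the simulation in the abstract found this trade-off to be much less rigid for RR-BLUP, so the formula is only a rough guide.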

  3. Back from a predicted climatic extinction of an island endemic: a future for the Corsican Nuthatch.

    Directory of Open Access Journals (Sweden)

    Morgane Barbet-Massin

    Full Text Available The Corsican Nuthatch (Sitta whiteheadi) is red-listed as vulnerable to extinction by the IUCN because of its endemism, reduced population size, and recent decline. A further cause is the fragmentation and loss of its spatially restricted preferred habitat, the Corsican pine (Pinus nigra laricio) forest. In this study, we aimed to estimate the potential impact of climate change on the distribution of the Corsican Nuthatch using species distribution models. This species has a strong trophic association with the Corsican and Maritime pines (P. nigra laricio and P. pinaster). We therefore first modelled the current and future potential distribution of both pine species, in order to use them as habitat variables when modelling the nuthatch distribution. However, the Corsican pine has suffered large distribution losses in past centuries due to the development of anthropogenic activities, and is now restricted to mountainous woodland. As a consequence, its realized niche is likely significantly smaller than its fundamental niche, so that a projection of the current distribution under future climatic conditions would produce misleading results. To obtain a predicted pine distribution as close as possible to the geographic projection of the fundamental niche, we combined available information on the current pine distribution with information on the persistence of isolated natural pine coppices. Common thresholds (maximizing the sum of sensitivity and specificity) predicted a potential large loss of the Corsican Nuthatch distribution by 2100. In contrast, more appropriate thresholds, aimed at getting closer to the fundamental distribution of the Corsican pine, predicted that 98% of the current presence points should remain potentially suitable for the nuthatch, and that its range could be 10% larger in the future. 
The habitat of the endemic Corsican Nuthatch is therefore more likely threatened by an increasing frequency and intensity of wildfires or anthropogenic

  4. The dog and cat population on Maio Island, Cape Verde: characterisation and prediction based on household survey and remotely sensed imagery.

    Science.gov (United States)

    Lopes Antunes, Ana Carolina; Ducheyne, Els; Bryssinckx, Ward; Vieira, Sara; Malta, Manuel; Vaz, Yolanda; Nunes, Telmo; Mintiens, Koen

    2015-11-04

    The objective was to estimate and characterise the dog and cat population on Maio Island, Cape Verde. Remotely sensed imagery was used to document the number of houses across the island and a household survey was carried out in six administrative areas recording the location of each animal using a global positioning system instrument. Linear statistical models were applied to predict the dog and cat populations based on the number of houses found and according to various levels of data aggregation. In the surveyed localities, a total of 457 dogs and 306 cats were found. The majority of animals had owners and only a few had free access to outdoor activities. The estimated population size was 531 dogs [95% confidence interval (CI): 453-609] and 354 cats (95% CI: 275-431). Stray animals were not a concern on the island in contrast to the rest of the country.

  5. Validation of genomic predictions for wellness traits in US Holstein cows.

    Science.gov (United States)

    McNeel, Anthony K; Reiter, Brenda C; Weigel, Dan; Osterstock, Jason; Di Croce, Fernando A

    2017-11-01

    The objective of this study was to evaluate the efficacy of wellness trait genetic predictions in US Holstein cows from commercial herds that do not contribute phenotypic information to the evaluation. Tissue samples for DNA extraction were collected from more than 3,400 randomly selected pregnant Holstein females in 11 herds and 2 age groups (69% nulliparous, 31% primiparous) approximately 30 to 60 d before their expected calving date. Lactation records from cows that calved between September 1, 2015, and December 31, 2015, were included in the analysis. Genomically enhanced predicted transmitting abilities for the wellness traits of retained placenta, metritis, ketosis, displaced abomasum, mastitis, and lameness were estimated by the Zoetis genetic evaluation and converted into standardized transmitting abilities. Mean reliabilities of the animals in the study ranged between 45 and 47% for each of the 6 traits. Animals were ranked by their standardized transmitting abilities within herd and age group and then assigned to 1 of 4 equally sized, percentile-based genetic groups. Adverse health events, including retained placenta, metritis, ketosis, displaced abomasum, mastitis, and lameness, were collected from on-farm herd management software. For each of the 6 outcomes of interest, animal phenotype was coded as healthy (0), diseased (1), or excluded. Statistical analysis was performed using a generalized linear mixed model with genetic group, age group, and lactation as fixed effects, whereas herd and animal nested within herd were set as random effects. Results of the analysis indicated that the wellness trait predictions were associated with differences in phenotypic disease incidence between the worst and best genetic groups. The difference between the worst and best genetic groups in recorded disease incidence was 2.9% for retained placenta, 10.8% for metritis, 1.1% for displaced abomasum, 1.7% for ketosis, 7.4% for mastitis, and 3
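    The grouping step described above ranks animals by standardized transmitting ability within herd and age group, then splits them into four equal percentile groups. It can be sketched as below. The record layout and function name are hypothetical. Which end of the scale counts as the best group depends on how each trait is scaled, and the sketch leaves that to the caller.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def percentile_groups(
    records: Iterable[Tuple[Hashable, Hashable, Hashable, float]], n_groups: int = 4
) -> Dict[Hashable, int]:
    """Assign each animal to one of `n_groups` equal-size groups within herd x age group.

    records: (animal_id, herd, age_group, standardized_transmitting_ability).
    Group 1 holds the lowest values; whether that is the 'best' or 'worst'
    group depends on how the trait is scaled.
    """
    strata = defaultdict(list)
    for animal, herd, age_group, sta in records:
        strata[(herd, age_group)].append((sta, animal))
    groups = {}
    for members in strata.values():
        members.sort(key=lambda m: m[0])  # rank by STA within the stratum
        n = len(members)
        for rank, (_, animal) in enumerate(members):
            groups[animal] = rank * n_groups // n + 1
    return groups
```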

  6. The Accuracy and Bias of Single-Step Genomic Prediction for Populations Under Selection

    Directory of Open Access Journals (Sweden)

    Wan-Ling Hsu

    2017-08-01

    Full Text Available In single-step analyses, missing genotypes are explicitly or implicitly imputed, and this requires centering the observed genotypes using the means of the unselected founders. If genotypes are only available for selected individuals, centering on the unselected founder mean is not straightforward. Here, computer simulation is used to study an alternative analysis that does not require centering genotypes but fits the mean μg of unselected individuals as a fixed effect. Starting with observed diplotypes from 721 cattle, a five-generation population was simulated with sire selection to produce 40,000 individuals with phenotypes, of which the 1000 sires had genotypes. The next generation of 8000 genotyped individuals was used for validation. Evaluations were undertaken with (J) or without (N) μg when marker covariates were not centered, and with (JC) or without (C) μg when all observed and imputed marker covariates were centered. Centering did not influence the accuracy of genomic prediction, but fitting μg did. Accuracies were improved when the panel comprised only quantitative trait loci (QTL): models JC and J had accuracies of 99.4%, whereas models C and N had accuracies of 90.2%. When only markers were in the panel, the 4 models had accuracies of 80.4%. In panels that included QTL, fitting μg in the model improved accuracy, but it had little impact when the panel contained only markers. In populations undergoing selection, fitting μg in the model is recommended to avoid bias and reduction in prediction accuracy due to selection.
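    Centering, as discussed above, subtracts the expected genotype 2p from each marker covariate, where p is an allele frequency. The minimal sketch below assumes 0/1/2 allele-count coding and uses hypothetical helper names. It shows why the choice of reference population matters. It does not show the alternative of fitting μg as a fixed effect.

```python
from typing import List, Sequence


def allele_frequencies(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """Frequency p_j of the counted allele at each marker (genotypes coded 0/1/2)."""
    n = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(n_markers)]


def center_genotypes(
    genotypes: Sequence[Sequence[int]], freqs: Sequence[float]
) -> List[List[float]]:
    """Subtract the expected genotype 2 * p_j from every marker covariate."""
    return [[g - 2.0 * p for g, p in zip(row, freqs)] for row in genotypes]


# If only selected animals are genotyped, frequencies estimated from them are not
# those of the unselected founders, and centering on them is where bias can enter.
selected = [[0, 1, 2], [1, 1, 2], [2, 1, 1]]
print(center_genotypes(selected, allele_frequencies(selected)))
```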

  7. Predicting transcription factor binding sites using local over-representation and comparative genomics

    Directory of Open Access Journals (Sweden)

    Touzet Hélène

    2006-08-01

    Full Text Available Abstract Background Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially for higher eukaryotic organisms. Results We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes. The TFBSs are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets. Conclusion TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at http://bioinfo.lifl.fr/TFM-Explorer.
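    TFBS detection with a database of position weight matrices starts by scoring sequence windows against each matrix. The sketch below shows only that building block, assuming log2-odds scores against a uniform background with a pseudocount. It does not reproduce TFM-Explorer's local over-representation statistic or its multi-species spatial-conservation step. The names and the toy motif are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

BASES = "ACGT"


def log_odds_pwm(
    counts: Sequence[Dict[str, int]],
    pseudocount: float = 1.0,
    background: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores against a background."""
    bg = background or {b: 0.25 for b in BASES}
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2((column.get(b, 0) + pseudocount) / total / bg[b]) for b in BASES}
        )
    return pwm


def scan(seq: str, pwm: Sequence[Dict[str, float]], threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at or above `threshold`."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[k][b] for k, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


motif = log_odds_pwm([{"A": 10}, {"C": 10}])  # toy two-position motif "AC"
print([start for start, _ in scan("gacac", motif, threshold=3.0)])  # [1, 3]
```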

  8. Genomic-Enabled Prediction Based on Molecular Markers and Pedigree Using the Bayesian Linear Regression Package in R

    Directory of Open Access Journals (Sweden)

    Paulino Pérez

    2010-09-01

    Full Text Available The availability of dense molecular markers has made possible the use of genomic selection in plant and animal breeding. However, models for genomic selection pose several computational and statistical challenges. They require specialized computer programs that are not always available to the end user and are not yet implemented in standard statistical software. The R package BLR (Bayesian Linear Regression) implements several statistical procedures (e.g., Bayesian Ridge Regression, Bayesian LASSO) in a unified framework that allows marker genotypes and pedigree data to be included jointly. This article describes the classes of models implemented in the BLR package and illustrates their use through examples. Some challenges faced when applying genomic-enabled selection, such as model choice, evaluation of predictive ability through cross-validation, and choice of hyper-parameters, are also addressed.
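    Bayesian ridge regression, one of the models listed above, connects simply to classical ridge regression. If the marker-effect variance s2b and the residual variance s2e are held fixed, the posterior mean of the marker effects is the ridge solution with lam = s2e / s2b. BLR instead samples those variances by Gibbs sampling. The pure-Python sketch below (hypothetical function name, no R code) therefore illustrates only the fixed-variance case.

```python
from typing import List, Sequence


def ridge_effects(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Solve (X'X + lam * I) b = X'y for marker effects b.

    Assumes X (markers in columns) and y are already centered, so no intercept.
    With Gaussian priors b_j ~ N(0, s2b) and residual variance s2e held fixed,
    this b equals the posterior mean, with lam = s2e / s2b.
    """
    n, p = len(X), len(X[0])
    # Augmented normal-equation matrix [X'X + lam I | X'y].
    m = [
        [sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0) for k in range(p)]
        + [sum(X[i][j] * y[i] for i in range(n))]
        for j in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("singular system; try lam > 0")
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    b = [0.0] * p
    for j in range(p - 1, -1, -1):
        b[j] = (m[j][p] - sum(m[j][k] * b[k] for k in range(j + 1, p))) / m[j][j]
    return b
```

    Larger values of lam (a smaller prior variance for marker effects) shrink the estimated effects more strongly toward zero.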

  9. Numerical Analysis of Soil Settlement Prediction and Its Application In Large-Scale Marine Reclamation Artificial Island Project

    Directory of Open Access Journals (Sweden)

    Zhao Jie

    2017-11-01

    Full Text Available In an artificial island construction project based on large-scale marine land reclamation, soil settlement is a key factor affecting the long-term safe operation of the whole site. To analyze the factors governing soil settlement in a marine reclamation project, scanning electron microscopy (SEM), a soil microstructural analysis method, is used to test six representative soil samples from the area, including silt, mucky silty clay, silty clay and clay. The structural characteristics that affect soil settlement are obtained by observing the SEM images at different depths. Numerical calculations based on Terzaghi’s one-dimensional and Biot’s two-dimensional consolidation theories are combined to establish one-dimensional and two-dimensional creep models. The numerical results of the two consolidation theories are then compared in order to predict the maximum settlement of the soils 100 years after completion. The analysis results indicate that the microstructural characteristics are the essential factor affecting settlement in this area. The settlement patterns and trends obtained by the one-dimensional and two-dimensional numerical analyses are similar. This analysis can provide reference and guidance for projects involving marine land reclamation.
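    Terzaghi's one-dimensional theory, mentioned above, gives the classical series solution for the average degree of consolidation U as a function of the time factor T_v = c_v t / H_dr². Here c_v is the coefficient of consolidation and H_dr is the drainage path length. A sketch of primary settlement s(t) = U(T_v) · s_final follows. It omits the creep and Biot two-dimensional models used in the study, and all parameter values are the caller's assumptions.

```python
import math


def degree_of_consolidation(t_v: float, terms: int = 100) -> float:
    """Average degree of consolidation U(T_v) from Terzaghi's 1D series solution.

    U = 1 - sum_{m>=0} (2 / M^2) * exp(-M^2 * T_v), with M = pi * (2m + 1) / 2.
    """
    if t_v <= 0:
        return 0.0
    remainder = 0.0
    for m in range(terms):
        big_m = math.pi * (2 * m + 1) / 2
        remainder += 2.0 / big_m ** 2 * math.exp(-big_m ** 2 * t_v)
    return 1.0 - remainder


def primary_settlement(t: float, c_v: float, h_dr: float, s_final: float) -> float:
    """Settlement at time t: s(t) = U(T_v) * s_final, with T_v = c_v * t / h_dr**2."""
    return s_final * degree_of_consolidation(c_v * t / h_dr ** 2)
```

    Units only need to be consistent: for example, c_v in m²/year, t in years and H_dr in metres.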

  10. Accuracy of Igenity genomically estimated breeding values for predicting Australian Angus BREEDPLAN traits.

    Science.gov (United States)

    Boerner, V; Johnston, D; Wu, X-L; Bauck, S

    2015-02-01

    Genomically estimated breeding values (GEBV) for Angus beef cattle are available from at least 2 commercial suppliers (Igenity [http://www.igenity.com] and Zoetis [http://www.zoetis.com]). The utility of these GEBV for improving genetic evaluation depends on their accuracies, which can be estimated by the genetic correlation with phenotypic target traits. Genomically estimated breeding values of 1,032 Angus bulls calculated from prediction equations (PE) derived by 2 different procedures in the U.S. Angus population were supplied by Igenity. Both procedures were based on Illumina BovineSNP50 BeadChip genotypes. In procedure sg, GEBV were calculated from PE that used subsets of only 392 SNP, where these subsets were individually selected for each trait by BayesCπ. In procedure rg, GEBV were calculated from PE derived in a ridge regression approach using all available SNP. The total set of 1,032 bulls with GEBV contained 732 individuals used in the Igenity training population. GEBV subsets were therefore formed with a decreasing average relationship between individuals in the subsets and individuals in the training population. Accuracies of GEBV were estimated as genetic correlations between GEBV and their phenotypic target traits, modeling GEBV as trait observations in a bivariate REML approach. In this approach, phenotypic observations were those recorded in the commercial Australian Angus seed stock sector. Using results from the GEBV subset excluding all training individuals as a reference, estimated accuracies were generally in agreement with those already published, with both types of GEBV (sg and rg) yielding similar results. Accuracies for growth traits ranged from 0.29 to 0.45, for reproductive traits from 0.11 to 0.53, and for carcass traits from 0.3 to 0.75. Accuracies generally decreased with increasing genetic distance between the training and the validation population. However, for some carcass traits characterized by a low number of phenotypic

  11. The RadGenomics project. Prediction for radio-susceptibility of individuals with genetic predisposition

    International Nuclear Information System (INIS)

    Imai, Takashi

    2003-01-01

    The ultimate goal of our project, named RadGenomics, is to elucidate the heterogeneity of the response to ionizing radiation arising from genetic variation among individuals, for the purpose of developing personalized radiation therapy regimens for cancer patients. Cancer patients exhibit patient-to-patient variability in normal tissue reactions after radiotherapy. Several observations support the hypothesis that the radiosensitivity of normal tissue is influenced by genetic factors. The rapid progression of human genome sequencing and the recent development of new technologies in molecular biology are providing new opportunities for elucidating the genetic basis of individual differences in susceptibility to radiation exposure. The development of a sufficiently robust, predictive assay enabling individual dose adjustment would improve the outcome of radiation therapy in patients. Our strategy for identification of DNA polymorphisms that contribute to individual radiosensitivity is as follows. First, we have been categorizing DNA samples obtained from cancer patients, who have been kindly introduced to us through many collaborators. The samples are categorized according to the patients' clinical characteristics, including the method and effect of treatment and side effects as scored by toxicity criteria. They are also categorized according to the result of an in vitro radiosensitivity assay, e.g., the micronucleus assay of their lymphocytes. Second, we have identified candidate genes for genotyping mainly by using our custom-designed oligonucleotide array with RNA samples. The probes for this array were obtained from more than 40 cancer and 3 fibroblast cell lines whose radiosensitivity levels were quite heterogeneous. We have also been using mass spectrometry to study the modification of proteins after irradiation of cells, which may be caused mainly by phosphorylation or dephosphorylation. Genes encoding the modified proteins and/or other proteins with which they interact, such as specific protein kinases and phosphatases, are also

  12. Genomic Variability of O Islands Encoding Tellurite Resistance in Enterohemorrhagic Escherichia coli O157:H7 Isolates

    Science.gov (United States)

    Taylor, Diane E.; Rooker, Michelle; Keelan, Monika; Ng, Lai-King; Martin, Irene; Perna, Nicole T.; Burland, N. T. Valerie; Blattner, Fredrick R.

    2002-01-01

    Strains of Escherichia coli causing enterohemorrhagic colitis belonging to the O157:H7 lineage are reported to be highly related. Fifteen strains of E. coli O157:H7 and 1 strain of E. coli O46:H− (nonflagellated) were examined for the presence of potassium tellurite resistance (Ter). Ter genes comprising terABCDEF were shown previously to be part of a pathogenicity island also containing integrase, phage, and urease genes. PCR analysis, both conventional and light cycler based, demonstrated that about one-half of the Ter E. coli O157:H7 strains (6 of 15), including the Sakai strain, which has been sequenced, carried a single copy of the Ter genes. Five of the strains, including EDL933, which has also been sequenced, contained two copies. Three other O157:H7 strains and the O46:H− strain did not contain the Ter genes. In strains containing two copies, the Ter genes were associated with the serW and serX tRNA genes. Five O157:H7 strains resembled the O157 Sakai strain, whose sequence contained one copy, close to serX, whereas in one isolate the single copy was associated with serW. There was no correlation between Ter and the ability to produce Shiga toxin ST1 or ST2. The Ter MIC for most strains, containing either one or two copies, was 1,024 μg/ml, although for a few the MIC was intermediate, 64 to 128 μg/ml, which could be increased to 512 μg/ml by pregrowth of strains in subinhibitory concentrations of potassium tellurite. Reverse transcriptase PCR analysis confirmed that in most strains Ter was constitutive but that in the rest it was inducible and involved induction of terB and terC genes. Only the terB, -C, -D, and -E genes are required for Ter. The considerable degree of homology between the ter genes on IncHI2 plasmid R478, which originated in Serratia marcescens, and pTE53, from an E. coli clinical isolate, suggests that the pathogenicity island was acquired from a plasmid. This work demonstrates diversity among E. coli O157:H7 isolates, at least as

  13. Application of Genome Wide Association and Genomic Prediction for Improvement of Cacao Productivity and Resistance to Black and Frosty Pod Diseases

    Directory of Open Access Journals (Sweden)

    J. Alberto Romero Navarro

    2017-11-01

    Full Text Available Chocolate is a highly valued and palatable confectionery product. Chocolate is primarily made from the processed seeds of the tree species Theobroma cacao. Cacao cultivation is highly relevant for small-holder farmers throughout the tropics, yet its productivity remains limited by low yields and widespread pathogens. A panel of 148 improved cacao clones was assembled based on productivity and disease resistance, and phenotypic single-tree replicated clonal evaluation was performed for 8 years. Using high-density markers, the diversity of clones was expressed relative to 10 known ancestral cacao populations, and significant effects of ancestry were observed in productivity and disease resistance. Genome-wide association (GWA) was performed, and six markers were significantly associated with frosty pod disease resistance. In addition, genomic selection was performed, and, consistent with the observed extensive linkage disequilibrium, high predictive ability was observed at low marker densities for all traits. Finally, quantitative trait locus mapping and differential expression analysis of two cultivars with contrasting disease phenotypes were performed to identify genes underlying frosty pod disease resistance, identifying a significant quantitative trait locus and 35 differentially expressed genes using two independent differential expression analyses. These results indicate that in breeding populations of heterozygous and recently admixed individuals, mapping approaches can be used for low-complexity traits such as pod color in cacao, or single-gene disease resistance in other species; however, genomic selection for quantitative traits remains highly effective relative to mapping. Our results can help guide the breeding process for sustainable improved cacao productivity.

  14. Detailed analysis of inversions predicted between two human genomes: errors, real polymorphisms, and their origin and population distribution.

    Science.gov (United States)

    Vicente-Salvador, David; Puig, Marta; Gayà-Vidal, Magdalena; Pacheco, Sarai; Giner-Delgado, Carla; Noguera, Isaac; Izquierdo, David; Martínez-Fundichely, Alexander; Ruiz-Herrera, Aurora; Estivill, Xavier; Aguado, Cristina; Lucas-Lledó, José Ignacio; Cáceres, Mario

    2017-02-01

    The growing catalogue of structural variants in humans often overlooks inversions as one of the most difficult types of variation to study, even though they affect phenotypic traits in diverse organisms. Here, we have analysed in detail 90 inversions predicted from the comparison of two independently assembled human genomes: the reference genome (NCBI36/HG18) and HuRef. Surprisingly, we found that two thirds of these predictions (62) represent errors either in assembly comparison or in one of the assemblies, including 27 misassembled regions in HG18. Next, we validated 22 of the remaining 28 potential polymorphic inversions using different PCR techniques and characterized their breakpoints and ancestral state. In addition, we determined experimentally the derived allele frequency in Europeans for 17 inversions (DAF = 0.01-0.80), as well as the distribution in 14 worldwide populations for 12 of them based on the 1000 Genomes Project data. Among the validated inversions, nine have inverted repeats (IRs) at their breakpoints, and two show nucleotide variation patterns consistent with a recurrent origin. Conversely, inversions without IRs have a unique origin and almost all of them show deletions or insertions at the breakpoints in the derived allele mediated by microhomology sequences, which highlights the importance of mechanisms like FoSTeS/MMBIR in the generation of complex rearrangements in the human genome. Finally, we found several inversions located within genes and at least one candidate to be positively selected in Africa. Thus, our study emphasizes the importance of careful analysis and validation of large-scale genomic predictions to extract reliable biological conclusions. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
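    The derived allele frequency (DAF) reported above is the share of chromosomes carrying the derived (inverted) orientation. A minimal sketch, assuming an autosomal biallelic inversion genotyped in diploids and coded as the number of derived copies per individual (0, 1 or 2); the sample is hypothetical and this is not the authors' experimental genotyping procedure:

```python
from typing import Sequence


def derived_allele_frequency(genotypes: Sequence[int]) -> float:
    """Return the derived-allele frequency at a biallelic site in diploids.

    Each genotype is the number of derived (e.g. inverted) alleles an
    individual carries: 0, 1 or 2.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    # Derived alleles counted over 2 chromosomes per individual.
    return sum(genotypes) / (2 * len(genotypes))


# Hypothetical sample: 6 homozygous ancestral, 3 heterozygous and
# 1 homozygous derived individual, i.e. 5 derived alleles out of 20.
sample = [0] * 6 + [1] * 3 + [2]
print(derived_allele_frequency(sample))  # 0.25
```

    For X-linked inversions, males would contribute one chromosome each, so the denominator would need adjusting.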

  15. Genome-wide prediction methods in highly diverse and heterozygous species: proof-of-concept through simulation in grapevine.

    Directory of Open Access Journals (Sweden)

    Agota Fodor

    Full Text Available Nowadays, genome-wide association studies (GWAS) and genomic selection (GS) methods, which use genome-wide marker data for phenotype prediction, are of much potential interest in plant breeding. However, to our knowledge, no studies have yet examined the predictive ability of these methods for structured traits when using training populations with high levels of genetic diversity. Grapevine is an example of such a highly heterozygous, perennial species. The present study compares the accuracy of models based on GWAS or GS alone, or in combination, for predicting simple or complex traits, linked or not with population structure. To explore the relevance of these methods in this context, we performed simulations using approximately 90,000 SNPs on a population of 3,000 individuals structured into three groups and corresponding to published grapevine diversity data. To estimate the parameters of the prediction models, we defined four training populations of 1,000 individuals, corresponding to these three groups and a core collection. Finally, to estimate the accuracy of the models, we also simulated four breeding populations of 200 individuals. Although prediction accuracy was low when breeding populations were too distant from the training populations, high accuracy levels were obtained using the core collection alone as the training population. The highest prediction accuracy (up to 0.9) was obtained using the combined GWAS-GS model. We thus recommend using the combined prediction model and a core collection as the training population for grapevine breeding or for other economically important crops with the same characteristics.

  16. CFD prediction of heat island formation in growing Asian cities: effect of urbanization in Shanghai

    Energy Technology Data Exchange (ETDEWEB)

    Ojima, T.; Murakami, S. [The University of Tokyo, Tokyo (Japan). Institute of Industrial Science]; Mitsumoto, K. [Waseda University, Tokyo (Japan). School of Science and Engineering]

    1997-10-01

    This study examines the effect of changes in land use and increases in artificial exhaust heat on heat island formation in Shanghai. For the 1930s land use distribution in Shanghai, a point sampling survey was conducted using topographic charts, with the area classified into building-occupied regions, paddy fields, bare ground, and water. For the 1990s, satellite data allowed high-density and low-density urban regions to be added. Calculations for Shanghai are based on the rate of increase in Tokyo's population and on projections of Shanghai's population, assuming that Shanghai's population in the 2050s will be 2.3 times that of the 1990s. The resulting prediction indicates that the urban area of Shanghai in the 2050s will be as large as that of present-day Tokyo, which covers a 50 km zone. Heat island formation in Shanghai is predicted using computational fluid dynamics (CFD) simulation. According to the prediction, the maximum temperature was 29.6°C in the 1930s, 4°C higher than in the suburbs; it is 33.2°C, 7.6°C higher, in the 1990s; and it will be 34.4°C, 8.6°C higher, in the 2050s. 16 refs., 11 figs., 1 tab.

  17. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sükösd, Zsuzsanna; Andersen, Ebbe Sloth; Seemann, Ernst Stefan

    2015-01-01

    of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interestingly, a set of long-distance interactions forms a core organizing structure (COS) that organizes the genome into three major structural domains. Despite overlapping...

  18. On the limits of computational functional genomics for bacterial lifestyle prediction

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Röttger, Richard; Hauschild, Anne-Christin

    2014-01-01

    We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs...

  19. Complete mitochondrial genome sequences of Korean native horse from Jeju Island: uncovering the spatio-temporal dynamics.

    Science.gov (United States)

    Yoon, Sook Hee; Kim, Jaemin; Shin, Donghyun; Cho, Seoae; Kwak, Woori; Lee, Hak-Kyo; Park, Kyoung-Do; Kim, Heebal

    2017-04-01

    The Korean native horse (Jeju horse) is one of the most important animals from Korean historical, cultural, and economic viewpoints. In the early 1980s, the Jeju horse was close to extinction. The aim of this study is to explore the phylogenomics of the Korean native horse, focusing on spatio-temporal dynamics. We determined, for the first time, complete mitochondrial genome sequences of Korean native horses (n = 6), as well as of additional Mongolian horses (n = 2). These sequences were analyzed together with 143 published ones using a Bayesian coalescent approach as well as three different phylogenetic analysis methods: Bayesian inference, maximum likelihood, and neighbor-joining. The phylogenomic trees revealed that the Korean native horses had multiple origins and clustered together with some horses from four European breeds and one Middle Eastern breed. Our phylogenomic analyses also indicated that there was no apparent association between breed or geographic location and the evolution of global horses. The time to the most recent common ancestor of the Korean native horse was approximately 13,200-63,200 years, much more recent than that of modern horses (0.696 My). Additionally, our results showed that all global horse lineages, including the Korean native horse, existed prior to the domestication events that occurred about 6,000-10,000 years ago. This is the first study on the phylogenomics of the Korean native horse focusing on spatio-temporal dynamics. Our findings increase our understanding of the domestication history of the Korean native horse and could provide useful information for horse conservation projects as well as for studies of horse genomics, emergence, and geographical distribution.

  20. The prevalences of Salmonella Genomic Island 1 variants in human and animal Salmonella Typhimurium DT104 are distinguishable using a Bayesian approach.

    Directory of Open Access Journals (Sweden)

    Alison E Mather

    Full Text Available Throughout the 1990s, there was an epidemic of multidrug-resistant Salmonella Typhimurium DT104 in both animals and humans in Scotland. The use of antimicrobials in agriculture is often cited as a major source of antimicrobial resistance in pathogenic bacteria of humans, suggesting that DT104 in animals and humans should show similar prevalences of resistance determinants. Until very recently, only the application of molecular methods would allow such a comparison, and our understanding has been hindered by the fact that surveillance data are primarily phenotypic in nature. Here, using large-scale surveillance datasets and a novel Bayesian approach, we infer and compare the prevalence of Salmonella Genomic Island 1 (SGI1), SGI1 variants, and resistance determinants independent of SGI1 in animal and human DT104 isolates from such phenotypic data. We demonstrate differences in the prevalences of SGI1, SGI1-B, SGI1-C, absence of SGI1, and tetracycline resistance determinants independent of SGI1 between these human and animal populations, a finding that challenges the established tenet that DT104 in domestic animals and humans comes from the same well-mixed microbial population.
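    The core Bayesian idea of comparing a prevalence between two host populations can be illustrated with conjugate Beta priors. This is a minimal sketch only: it assumes the carriage status of each isolate is directly observed, whereas the study's model infers SGI1 status from resistance phenotypes, which this example does not attempt. The counts are hypothetical.

```python
import random


def posterior_prob_greater(k1: int, n1: int, k2: int, n2: int,
                           draws: int = 20_000, seed: int = 1) -> float:
    """Estimate P(p1 > p2 | data) for two binomial prevalences.

    With independent uniform Beta(1, 1) priors, the posterior of each
    prevalence is Beta(k + 1, n - k + 1); we compare Monte Carlo draws.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p1 = rng.betavariate(k1 + 1, n1 - k1 + 1)
        p2 = rng.betavariate(k2 + 1, n2 - k2 + 1)
        if p1 > p2:
            wins += 1
    return wins / draws


# Hypothetical: 80/100 animal isolates vs 40/100 human isolates carry a marker.
print(posterior_prob_greater(80, 100, 40, 100))  # close to 1.0
```

    A posterior probability near 0.5 would indicate no evidence of a difference; values near 0 or 1 indicate that the two populations differ in prevalence.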

  1. Heat Islands

    Science.gov (United States)

    EPA's Heat Island Effect Site provides information on heat islands, their impacts, mitigation strategies, related research, a directory of heat island reduction initiatives in U.S. communities, and EPA's Heat Island Reduction Program.

  2. Island biogeography

    DEFF Research Database (Denmark)

    Whittaker, Robert James; Fernández-Palacios, José María; Matthews, Thomas J.

    2017-01-01

    Islands provide classic model biological systems. We review how growing appreciation of geoenvironmental dynamics of marine islands has led to advances in island biogeographic theory accommodating both evolutionary and ecological phenomena. Recognition of distinct island geodynamics permits gener...

  3. Prediction of disease and phenotype associations from genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Stephanie N Lewis

    Full Text Available Genome-wide association studies (GWAS) have proven useful as a method for identifying genetic variations associated with diseases. In this study, we analyzed GWAS data for 61 diseases and phenotypes to elucidate common associations based on single nucleotide polymorphisms (SNPs). The study expanded on a previous study that identified disease associations using data from a single GWAS on seven diseases. Adjustments to the originally reported study included expansion of the SNP dataset using linkage disequilibrium (LD) and refinement of the four levels of analysis to encompass SNP, SNP block, gene, and pathway level comparisons. A pairwise comparison between diseases and phenotypes was performed at each level, and the Jaccard similarity index was used to measure the degree of association between two diseases/phenotypes. Disease relatedness networks (DRNs) were used to visualize our results. We saw predominant relatedness between multiple sclerosis, type 1 diabetes, and rheumatoid arthritis for the first three levels of analysis. Expected relatedness was also seen between lipid- and blood-related traits. The predominant associations between multiple sclerosis, type 1 diabetes, and rheumatoid arthritis can be validated by clinical studies. The diseases have been proposed to share a systemic inflammation phenotype that can result in progression of additional diseases in patients with one of these three diseases. We also noticed unexpected relationships between metabolic and neurological diseases at the pathway comparison level. The less significant relationships found between diseases require a more detailed literature review to determine the validity of the predictions. The results from this study serve as a first step towards a better understanding of seemingly unrelated diseases and phenotypes with similar symptoms or modes of treatment.
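    The Jaccard similarity index used above is the size of the intersection of two SNP sets divided by the size of their union. A minimal sketch of the pairwise comparison at the SNP level, assuming each trait is represented by a set of associated SNP identifiers; the trait sets, the placeholder rs identifiers and the edge threshold for a relatedness network are hypothetical, not the study's data or cutoff:

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| (taken as 0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relatedness_edges(snps_by_trait: dict[str, set[str]], threshold: float):
    """Return (trait1, trait2, index) for every trait pair whose Jaccard
    index reaches the (hypothetical) edge threshold."""
    edges = []
    for (t1, s1), (t2, s2) in combinations(sorted(snps_by_trait.items()), 2):
        j = jaccard(s1, s2)
        if j >= threshold:
            edges.append((t1, t2, j))
    return edges


# Placeholder SNP identifiers, not real GWAS hits.
traits = {
    "MS": {"rs1", "rs2", "rs3", "rs4"},
    "T1D": {"rs2", "rs3", "rs5"},
    "LDL": {"rs9"},
}
print(relatedness_edges(traits, threshold=0.3))  # [('MS', 'T1D', 0.4)]
```

    The same function applies unchanged at the SNP-block, gene, or pathway level, as long as each trait is mapped to a set of identifiers at that level.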

  4. Prediction of Phenotypic Antimicrobial Resistance Profiles From Whole Genome Sequences of Non-typhoidal Salmonella enterica.

    Science.gov (United States)

    Neuert, Saskia; Nair, Satheesh; Day, Martin R; Doumith, Michel; Ashton, Philip M; Mellor, Kate C; Jenkins, Claire; Hopkins, Katie L; Woodford, Neil; de Pinna, Elizabeth; Godbole, Gauri; Dallman, Timothy J

    2018-01-01

    Surveillance of antimicrobial resistance (AMR) in non-typhoidal Salmonella enterica (NTS) is essential for monitoring transmission of resistance from the food chain to humans, and for establishing effective treatment protocols. We evaluated the prediction of phenotypic resistance in NTS from genotypic profiles derived from whole genome sequencing (WGS). Genes and chromosomal mutations responsible for phenotypic resistance were sought in WGS data from 3,491 NTS isolates received by Public Health England's Gastrointestinal Bacteria Reference Unit between April 2014 and March 2015. Inferred genotypic AMR profiles were compared with phenotypic susceptibilities determined for fifteen antimicrobials using EUCAST guidelines. Discrepancies between phenotypic and genotypic profiles for one or more antimicrobials were detected for 76 isolates (2.18%), although only 88/52,365 (0.17%) isolate/antimicrobial combinations were discordant. Of the discrepant results, the largest number were associated with streptomycin (67.05%, n = 59). Pan-susceptibility was observed in 2,190 isolates (62.73%). Overall, resistance to tetracyclines was most common (26.27% of isolates, n = 917), followed by sulphonamides (23.72%, n = 828) and ampicillin (21.43%, n = 748). Multidrug resistance (MDR), i.e., resistance to three or more antimicrobial classes, was detected in 848 isolates (24.29%), with resistance to ampicillin, streptomycin, sulphonamides and tetracyclines being the most common MDR profile (n = 231; 27.24%). Of the isolates with this profile, all but one were S. Typhimurium, and 94.81% (n = 219) had the resistance determinants blaTEM-1, strA-strB, sul2 and tet(A). Extended-spectrum β-lactamase genes were identified in 41 isolates (1.17%), and multiple mutations in chromosomal genes associated with ciprofloxacin resistance in 82 isolates (2.35%). This study showed that WGS is suitable as a rapid means of determining AMR patterns of NTS for public health surveillance.
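    The denominators behind the headline rates are easy to lose: isolate-level discordance is out of 3,491 isolates, whereas combination-level discordance is out of 3,491 × 15 = 52,365 isolate/antimicrobial pairs. The sketch below recomputes those figures and encodes the MDR definition from the abstract (three or more classes). Mapping ampicillin to penicillins and streptomycin to aminoglycosides is our labeling for the example, not taken from the study.

```python
from collections.abc import Iterable


def is_mdr(resistant_classes: Iterable[str]) -> bool:
    """MDR as defined in the abstract: resistance to >= 3 antimicrobial classes."""
    return len(set(resistant_classes)) >= 3


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100 * part / whole, 2)


isolates, antimicrobials = 3491, 15
pairs = isolates * antimicrobials  # 52,365 isolate/antimicrobial combinations

print(percent(76, isolates))   # 2.18  isolates with any discrepancy
print(percent(88, pairs))      # 0.17  discordant combinations
print(percent(848, isolates))  # 24.29 MDR isolates

# The most common MDR profile (ampicillin, streptomycin, sulphonamides,
# tetracyclines) spans four classes.
print(is_mdr(["penicillins", "aminoglycosides", "sulphonamides", "tetracyclines"]))  # True
```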

  5. Persistency of Prediction Accuracy and Genetic Gain in Synthetic Populations Under Recurrent Genomic Selection.

    Science.gov (United States)

    Müller, Dominik; Schopp, Pascal; Melchinger, Albrecht E

    2017-03-10

    Recurrent selection (RS) has been used in plant breeding to successively improve synthetic and other multiparental populations. Synthetics are generated from a limited number of parents ($N_p$), but little is known about how $N_p$ affects genomic selection (GS) in RS, especially the persistency of prediction accuracy ($r_{g,\hat{g}}$) and genetic gain. Synthetics were simulated by intermating $N_p$ = 2-32 parent lines from an ancestral population with short- or long-range linkage disequilibrium ($\mathrm{LD}_A$) and subjected to multiple cycles of GS. We determined $r_{g,\hat{g}}$ and genetic gain across 30 cycles for different training set (TS) sizes, marker densities, and generations of recombination before model training. Contributions to $r_{g,\hat{g}}$ and genetic gain from pedigree relationships, as well as from cosegregation and $\mathrm{LD}_A$ between QTL and markers, were analyzed via four scenarios differing in (i) the relatedness between TS and selection candidates and (ii) whether selection was based on markers or pedigree records. Persistency of $r_{g,\hat{g}}$ was high for small $N_p$, where predominantly cosegregation contributed to $r_{g,\hat{g}}$, but also for large $N_p$, where $\mathrm{LD}_A$ replaced cosegregation as the dominant information source. Together with increasing genetic variance, this compensation resulted in relatively constant long- and short-term genetic gain for increasing $N_p > 4$, given long-range $\mathrm{LD}_A$ in the ancestral population. Although our scenarios suggest that information from pedigree relationships contributed to $r_{g,\hat{g}}$ for only very few generations in GS, we expect a longer contribution than in pedigree BLUP, because capturing Mendelian sampling by markers reduces selective pressure on pedigree relationships. Larger TS size ($N_{TS}$) and higher marker density improved persistency of

  6. Persistency of Prediction Accuracy and Genetic Gain in Synthetic Populations Under Recurrent Genomic Selection

    Directory of Open Access Journals (Sweden)

    Dominik Müller

    2017-03-01

    Full Text Available Recurrent selection (RS) has been used in plant breeding to successively improve synthetic and other multiparental populations. Synthetics are generated from a limited number of parents ($N_p$), but little is known about how $N_p$ affects genomic selection (GS) in RS, especially the persistency of prediction accuracy ($r_{g,\hat{g}}$) and genetic gain. Synthetics were simulated by intermating $N_p$ = 2-32 parent lines from an ancestral population with short- or long-range linkage disequilibrium ($\mathrm{LD}_A$) and subjected to multiple cycles of GS. We determined $r_{g,\hat{g}}$ and genetic gain across 30 cycles for different training set (TS) sizes, marker densities, and generations of recombination before model training. Contributions to $r_{g,\hat{g}}$ and genetic gain from pedigree relationships, as well as from cosegregation and $\mathrm{LD}_A$ between QTL and markers, were analyzed via four scenarios differing in (i) the relatedness between TS and selection candidates and (ii) whether selection was based on markers or pedigree records. Persistency of $r_{g,\hat{g}}$ was high for small $N_p$, where predominantly cosegregation contributed to $r_{g,\hat{g}}$, but also for large $N_p$, where $\mathrm{LD}_A$ replaced cosegregation as the dominant information source. Together with increasing genetic variance, this compensation resulted in relatively constant long- and short-term genetic gain for increasing $N_p > 4$, given long-range $\mathrm{LD}_A$ in the ancestral population. Although our scenarios suggest that information from pedigree relationships contributed to $r_{g,\hat{g}}$ for only very few generations in GS, we expect a longer contribution than in pedigree BLUP, because capturing Mendelian sampling by markers reduces selective pressure on pedigree relationships. Larger TS size ($N_{TS}$) and higher marker density improved persistency of $r_{g,\hat{g}}$ and hence genetic gain, but additional recombinations could not increase genetic gain.
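    In simulation studies the true genetic values g are known, so prediction accuracy $r_{g,\hat{g}}$ is usually taken as the Pearson correlation between g and the predicted values ĝ of the selection candidates; persistency is then tracked by recomputing it in each cycle. A minimal sketch of that calculation, assuming this standard definition (the abstract does not spell it out); the vectors are illustrative:

```python
from math import sqrt
from statistics import mean


def prediction_accuracy(g: list[float], g_hat: list[float]) -> float:
    """Pearson correlation between true genetic values g and predictions g_hat."""
    if len(g) != len(g_hat) or len(g) < 2:
        raise ValueError("need two equally long vectors with at least 2 entries")
    mg, mh = mean(g), mean(g_hat)
    sxy = sum((a - mg) * (b - mh) for a, b in zip(g, g_hat))
    sxx = sum((a - mg) ** 2 for a in g)
    syy = sum((b - mh) ** 2 for b in g_hat)
    return sxy / sqrt(sxx * syy)


# Predictions that are a positive linear rescaling of g are perfectly accurate.
print(prediction_accuracy([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

    Because correlation ignores scale and location, a model can rank candidates perfectly ($r_{g,\hat{g}} = 1$) while still being biased in the absolute values it predicts.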

  7. Multivariate Statistics and Supervised Learning for Predictive Detection of Unintentional Islanding in Grid-Tied Solar PV Systems

    Directory of Open Access Journals (Sweden)

    Shashank Vyas

    2016-01-01

    Full Text Available Integration of solar photovoltaic (PV) generation with power distribution networks leads to many operational challenges and complexities. Unintentional islanding is one of them, and it is of rising concern given the steady increase in grid-connected PV power. This paper builds on an exploratory study of unintentional islanding on a modeled radial feeder with large PV penetration. Dynamic simulations, also run in real time, led to the exploration of unique potential causes of accidental island creation. The resulting voltage and current data underwent dimensionality reduction using principal component analysis (PCA), which formed the basis for applying Q statistic control charts to detect the anomalous currents that could island the system. To reduce the false alarm rate of anomaly detection, Kullback-Leibler (K-L) divergence was applied to the principal component projections, which showed that the Q statistic based approach alone is not reliable for detecting the symptoms liable to cause unintentional islanding. The obtained data were labeled, and a K-nearest neighbor (K-NN) binomial classifier was then trained to identify and classify potential islanding precursors among other power system transients. The three-phase short-circuit fault case was successfully identified as statistically different from islanding symptoms.
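    The final classification step can be illustrated with a plain K-NN majority vote. A minimal sketch, assuming each event is summarized by a feature vector (for instance, its projections onto the first principal components) and that Euclidean distance is used; the training points, labels and k are hypothetical, not the paper's data or settings:

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    An odd k avoids ties for two-class (islanding vs. other) problems.
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors (e.g. scores on PC1 and PC2).
train = [
    ((0.10, 0.20), "islanding"), ((0.20, 0.10), "islanding"), ((0.15, 0.25), "islanding"),
    ((2.00, 1.80), "fault"), ((2.20, 2.10), "fault"), ((1.90, 2.30), "fault"),
]
print(knn_predict(train, (0.2, 0.2)))  # islanding
print(knn_predict(train, (2.0, 2.0)))  # fault
```

    Features on very different scales (volts vs. amperes) should be standardized before computing distances; projecting onto principal components of standardized data is one way to do that.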

  8. Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding.

    Directory of Open Access Journals (Sweden)

    Xiaochun Sun

    Full Text Available Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.
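    The supervised step of SPCA screens markers by their individual association with the phenotype and keeps only those above a threshold; principal components are then computed on the retained markers. A minimal sketch of that screening step, assuming absolute Pearson correlation as the marker score and a fixed threshold (the paper chooses the marker set by cross-validation, and its exact score may differ); the genotypes are hypothetical:

```python
from math import sqrt
from statistics import mean


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant phenotype carries no signal
    return abs(sxy) / sqrt(sxx * syy)


def screen_markers(markers, phenotype, threshold):
    """Indices of markers whose |r| with the phenotype reaches `threshold`.

    markers: list of marker columns, each a list of 0/1/2 genotypes over individuals.
    """
    return [j for j, m in enumerate(markers) if abs_correlation(m, phenotype) >= threshold]


phenotype = [1.0, 2.0, 3.0, 4.0]
markers = [
    [0, 1, 1, 2],  # positively associated, |r| ~ 0.95
    [1, 1, 1, 1],  # monomorphic
    [2, 1, 1, 0],  # negatively associated, |r| ~ 0.95
    [0, 2, 0, 2],  # weak, |r| ~ 0.45
]
print(screen_markers(markers, phenotype, threshold=0.5))  # [0, 2]
```

    The absolute value keeps negatively associated markers, since for prediction the sign of the association is as informative as its magnitude.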

  9. Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding.

    Science.gov (United States)

    Sun, Xiaochun; Ma, Ping; Mumm, Rita H

    2012-01-01

    Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.
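    The RKHS regression component can be illustrated with a Gaussian kernel. By the representer theorem, the fitted function is a weighted sum of kernels centred on the training individuals, with weights α = (K + λI)⁻¹(y − μ). This is a minimal sketch of plain Gaussian-kernel RKHS regression (kernel ridge), not pRKHS itself, which feeds supervised principal components into a smoothing spline ANOVA model. The bandwidth, λ and data are hypothetical and would normally be tuned by cross-validation.

```python
from math import exp


def gaussian_kernel(a, b, bandwidth):
    """k(a, b) = exp(-||a - b||^2 / bandwidth)."""
    return exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / bandwidth)


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_rkhs(X, y, bandwidth=1.0, lam=1.0):
    """Fit y = mu + f(x), f in the Gaussian-kernel RKHS with penalty lam."""
    n = len(X)
    mu = sum(y) / n
    K = [[gaussian_kernel(X[i], X[j], bandwidth) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [v - mu for v in y])  # alpha = (K + lam*I)^-1 (y - mu)

    def predict(x_new):
        return mu + sum(a * gaussian_kernel(x_new, xi, bandwidth)
                        for a, xi in zip(alpha, X))

    return predict


# Hypothetical inputs (e.g. two supervised PC scores per individual) and phenotypes.
X = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]]
y = [1.0, 2.0, 3.0, 4.0]
predict = fit_rkhs(X, y, bandwidth=1.0, lam=0.1)
print(round(predict([1.0, 1.0]), 3))  # 2.5: the query is equidistant from all four points
```

    Small λ makes the fit interpolate the training phenotypes; large λ shrinks every prediction toward the mean μ. Because the kernel compares whole genotype profiles, it can capture interactions among markers without specifying an epistatic model.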

  10. Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

    Science.gov (United States)

    Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K

    2015-01-01

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. 
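    Regions defined as "a minimum of three adjacent sites" can be called by grouping predicted sites into runs of consecutive positions. A minimal sketch, assuming "adjacent" means consecutive fixed-size genomic windows on the same chromosome, identified by integer window indices; this window model and the example indices are hypothetical, and the study's exact adjacency rule is not given in the abstract:

```python
def call_regions(site_windows, min_sites=3):
    """Group window indices into runs of consecutive windows and keep runs
    with at least `min_sites` members. Returns inclusive (start, end) pairs."""
    windows = sorted(set(site_windows))
    regions = []
    if not windows:
        return regions
    start = prev = windows[0]
    for w in windows[1:]:
        if w == prev + 1:  # extends the current run
            prev = w
            continue
        if prev - start + 1 >= min_sites:  # close the finished run
            regions.append((start, prev))
        start = prev = w
    if prev - start + 1 >= min_sites:  # close the final run
        regions.append((start, prev))
    return regions


# Hypothetical predicted sites on one chromosome (window indices).
print(call_regions([3, 4, 5, 9, 10, 20, 21, 22, 23]))  # [(3, 5), (20, 23)]
```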
Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome

  11. Genomic prediction in early selection stages using multi-year data in a hybrid rye breeding program.

    Science.gov (United States)

    Bernal-Vasquez, Angela-Maria; Gordillo, Andres; Schmidt, Malthe; Piepho, Hans-Peter

    2017-05-31

    The use of multiple genetic backgrounds across years is appealing for genomic prediction (GP) because past years' data provide valuable information on marker effects. Nonetheless, single-year GP models are less complex and computationally less demanding than multi-year GP models. In devising a suitable analysis strategy for multi-year data, we may exploit the fact that even if there is no replication of genotypes across years, there is plenty of replication at the level of marker loci. Our principal aim was to evaluate different GP approaches to simultaneously model genotype-by-year (GY) effects and breeding values using multi-year data in terms of predictive ability. The models were evaluated under different scenarios reflecting common practice in plant breeding programs, such as different degrees of relatedness between training and validation sets, and using a selected fraction of genotypes in the training set. We used empirical grain yield data of a rye hybrid breeding program. A detailed description of the prediction approaches highlighting the use of kinship for modeling GY is presented. Using the kinship to model GY was advantageous in particular for datasets disconnected across years. On average, predictive abilities were 5% higher for models using kinship to model GY over models without kinship. We confirmed that using data from multiple selection stages provides valuable GY information and helps increase predictive ability. This increase is on average 30% higher when the predicted genotypes are closely related to the genotypes in the training set. A selection of top-yielding genotypes together with the use of kinship to model GY improves the predictive ability in datasets composed of single years of several selection cycles. Our results clearly demonstrate that the use of multi-year data and appropriate modeling is beneficial for GP because it allows dissecting GY effects from genomic estimated breeding values. The model choice, as well as ensuring
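    The abstract does not state how kinship was computed. One widely used marker-based option is VanRaden's (2008) first method, G = ZZ'/(2Σp(1−p)), where Z holds allele counts centred by twice the allele frequency p of each locus. The sketch below assumes that choice and a complete 0/1/2 marker matrix; it only builds a kinship matrix and does not fit the genotype-by-year model itself.

```python
from typing import List


def vanraden_kinship(markers: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    markers[i][j] is the count (0, 1 or 2) of a reference allele at locus j
    in genotype i; missing values are assumed to have been imputed already.
    """
    n = len(markers)
    if n == 0 or any(len(row) != len(markers[0]) for row in markers):
        raise ValueError("markers must be a non-empty rectangular matrix")
    m = len(markers[0])
    # Reference-allele frequency per locus.
    p = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0.0:
        raise ValueError("all loci are monomorphic")
    # Centre each allele count by its expectation 2p.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in markers]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]
```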

  12. A New Approach to Predict Microbial Community Assembly and Function Using a Stochastic, Genome-Enabled Modeling Framework

    Science.gov (United States)

    King, E.; Brodie, E.; Anantharaman, K.; Karaoz, U.; Bouskill, N.; Banfield, J. F.; Steefel, C. I.; Molins, S.

    2016-12-01

    Characterizing and predicting the microbial and chemical compositions of subsurface aquatic systems necessitates an understanding of the metabolism and physiology of organisms that are often uncultured or studied under conditions not relevant for one's environment of interest. Cultivation-independent approaches are therefore important and have greatly enhanced our ability to characterize functional microbial diversity. The capability to reconstruct genomes representing thousands of populations from microbial communities using metagenomic techniques provides a foundation for developing predictive models of community structure and function. Here, we discuss a genome-informed stochastic trait-based model incorporated into a reactive transport framework to represent the activities of coupled guilds of hypothetical microorganisms. Metabolic pathways for each microbe within a functional guild are parameterized from metagenomic data with a unique combination of traits governing organism fitness under dynamic environmental conditions. We simulate the thermodynamics of coupled electron donor and acceptor reactions to predict the energy available for cellular maintenance, respiration, biomass development, and enzyme production. While 'omics analyses can now characterize the metabolic potential of microbial communities, it is functionally redundant as well as computationally prohibitive to explicitly include the thousands of recovered organisms in biogeochemical models. However, one can derive potential metabolic pathways from genomes along with trait linkages to build probability distributions of traits. These distributions are used to assemble groups of microbes that couple one or more of these pathways. From the initial ensemble of microbes, only a subset will persist based on the interaction of their physiological and metabolic traits with environmental conditions, competing organisms, etc. Here, we analyze the predicted niches of these hypothetical microbes and
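    The energy available from a coupled donor/acceptor reaction is conventionally expressed through the reaction Gibbs energy, ΔG = ΔG° + RT ln Q, where Q is the reaction quotient; a negative ΔG means the reaction can yield energy. The sketch below shows only this textbook relation, assuming ideal activities supplied by the caller; it is not the thermodynamic formulation of the authors' reactive transport framework, and the species names and numbers are arbitrary.

```python
import math
from typing import Dict

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def reaction_gibbs_energy(dg0_kj: float, temp_k: float,
                          activities: Dict[str, float],
                          stoich: Dict[str, float]) -> float:
    """Return ΔG = ΔG° + R T ln Q in kJ/mol.

    stoich holds signed coefficients (negative for reactants, positive for
    products), so ln Q = sum(nu_i * ln a_i). Activities are assumed ideal.
    """
    ln_q = 0.0
    for species, nu in stoich.items():
        a = activities[species]
        if a <= 0:
            raise ValueError(f"activity of {species} must be positive")
        ln_q += nu * math.log(a)
    return dg0_kj + R_KJ * temp_k * ln_q
```

    Lowering product activities makes ΔG more negative, which is how depletion of products by other guilds can increase the energy a reaction yields.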

  13. A function accounting for training set size and marker density to model the average accuracy of genomic prediction.

    Science.gov (United States)

    Erbe, Malena; Gredler, Birgit; Seefried, Franz Reinhold; Bapst, Beat; Simianer, Henner

    2013-01-01

    Prediction of genomic breeding values is of major practical relevance in dairy cattle breeding. Deterministic equations based on training set size, reliability of phenotypes, and the number of independent chromosome segments ([Formula: see text]) have been suggested to predict the accuracy of genomic breeding values in a given design. The aim of our study was to find a general deterministic equation for the average accuracy of genomic breeding values that also accounts for marker density and can be fitted empirically. Two data sets were available: 5'698 Holstein Friesian bulls genotyped with 50 K SNPs and 1'332 Brown Swiss bulls genotyped with 50 K SNPs and imputed to ∼600 K SNPs. Different k-fold (k = 2-10, 15, 20) cross-validation scenarios (50 replicates, random assignment) were performed using a genomic BLUP approach. A maximum likelihood approach was used to estimate the parameters of different prediction equations. The highest likelihood was obtained when using a modified form of the deterministic equation of Daetwyler et al. (2010), augmented by a weighting factor (w) based on the assumption that the maximum achievable accuracy is [Formula: see text]. The proportion of genetic variance captured by the complete SNP sets ([Formula: see text]) was 0.76 to 0.82 for Holstein Friesian and 0.72 to 0.75 for Brown Swiss. When the number of SNPs was varied, w was found to be proportional to the log of the marker density up to a limit, which is population- and trait-specific and was reached at ∼20'000 SNPs in the Brown Swiss population studied.
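    The unmodified deterministic equation of Daetwyler et al. (2010) predicts accuracy from training set size N, heritability h² (or the reliability of the phenotypes used) and the number of independent chromosome segments M_e as r = sqrt(N h² / (N h² + M_e)). The modified form with the weighting factor w depends on formulas elided above, so the sketch below implements only the original equation; the inputs in the checks are illustrative.

```python
import math


def daetwyler_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Expected accuracy sqrt(N h^2 / (N h^2 + Me)) of Daetwyler et al. (2010).

    n_train: training set size; h2: heritability or reliability of phenotypes;
    m_e: number of independent chromosome segments. The weighted variant
    described in the abstract is not reproduced here.
    """
    if n_train <= 0 or not 0.0 < h2 <= 1.0 or m_e <= 0:
        raise ValueError("need n_train > 0, 0 < h2 <= 1 and m_e > 0")
    signal = n_train * h2
    return math.sqrt(signal / (signal + m_e))
```

    Accuracy rises towards 1 as N h² grows relative to M_e, which is why larger training sets help most for traits of low heritability.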

  14. A function accounting for training set size and marker density to model the average accuracy of genomic prediction.

    Directory of Open Access Journals (Sweden)

    Malena Erbe

    Prediction of genomic breeding values is of major practical relevance in dairy cattle breeding. Deterministic equations have been suggested to predict the accuracy of genomic breeding values in a given design which are based on training set size, reliability of phenotypes, and the number of independent chromosome segments ([Formula: see text]). The aim of our study was to find a general deterministic equation for the average accuracy of genomic breeding values that also accounts for marker density and can be fitted empirically. Two data sets of 5'698 Holstein Friesian bulls genotyped with 50 K SNPs and 1'332 Brown Swiss bulls genotyped with 50 K SNPs and imputed to ∼600 K SNPs were available. Different k-fold (k = 2-10, 15, 20) cross-validation scenarios (50 replicates, random assignment) were performed using a genomic BLUP approach. A maximum likelihood approach was used to estimate the parameters of different prediction equations. The highest likelihood was obtained when using a modified form of the deterministic equation of Daetwyler et al. (2010), augmented by a weighting factor (w) based on the assumption that the maximum achievable accuracy is [Formula: see text]. The proportion of genetic variance captured by the complete SNP sets ([Formula: see text]) was 0.76 to 0.82 for Holstein Friesian and 0.72 to 0.75 for Brown Swiss. When modifying the number of SNPs, w was found to be proportional to the log of the marker density up to a limit which is population and trait specific and was found to be reached with ∼20'000 SNPs in the Brown Swiss population studied.

  15. Prediction of Tourist Arrivals to the Island of Bali with Holt Method of Winter and Seasonal Autoregressive Integrated Moving Average (SARIMA)

    Directory of Open Access Journals (Sweden)

    Agus Supriatna

    2017-11-01

    The tourism sector is one of the contributors of foreign exchange and is quite influential in improving the economy of Indonesia. The development of this sector has positive impacts, including employment opportunities and opportunities for entrepreneurship in various industries such as adventure tourism, crafts and hospitality. Indonesia's natural beauty and resources attract domestic and foreign tourists. One of the many tourist destinations is the island of Bali, which is famous not only for its nature but also for its cultural diversity and arts, all of which add to the value of tourism. In 2015 the number of tourist arrivals increased by 6.24% over the previous year. To improve the quality of services, cope with surges of visitors, or prepare strategies for attracting tourists, arrivals must be predicted so that planning can be more efficient and effective. This research used the Holt-Winters method and the Seasonal Autoregressive Integrated Moving Average (SARIMA) method to predict tourist arrivals. Based on data on foreign tourist arrivals in Bali from January 2007 to June 2016, the Holt-Winters method with parameter values α = 0.1, β = 0.1, γ = 0.3 had a MAPE of 6.171873, while the SARIMA(0,1,1)(1,0,0)[12] model had a MAPE of 5.788615; it was concluded that the SARIMA method performs better. Keywords: Foreign Tourist, Prediction, Bali Island, Holt-Winters, SARIMA.
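    The comparison above rests on the mean absolute percentage error (MAPE) of each method's forecasts. The sketch below computes MAPE in percent and produces one-step-ahead fitted values with Holt-Winters exponential smoothing using the reported smoothing parameters (α = 0.1, β = 0.1, γ = 0.3). The additive seasonal form, the 12-month season and the simple initialisation are assumptions, since the abstract does not specify them; SARIMA fitting is not shown.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def holt_winters_additive(series: Sequence[float], season_len: int,
                          alpha: float, beta: float, gamma: float) -> List[float]:
    """One-step-ahead fitted values for series[season_len:] (additive seasonality assumed)."""
    m = season_len
    if m <= 0 or len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Simple initialisation: level = mean of season 1, trend = mean per-step change
    # between seasons 1 and 2, seasonal terms = deviations of season 1 from its mean.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    seasonals = [series[i] - level for i in range(m)]
    fitted = []
    for t in range(m, len(series)):
        s = seasonals[t % m]  # seasonal term from one season earlier
        fitted.append(level + trend + s)
        y = series[t]
        prev_level = level
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y - level) + (1 - gamma) * s
    return fitted
```

    With monthly arrivals in a list `arrivals`, `mape(arrivals[12:], holt_winters_additive(arrivals, 12, 0.1, 0.1, 0.3))` would give an in-sample MAPE for the Holt-Winters fit that can be set against the MAPE of a competing model.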

  16. Traumatic Brain Injury Induces Genome-Wide Transcriptomic, Methylomic, and Network Perturbations in Brain and Blood Predicting Neurological Disorders

    Directory of Open Access Journals (Sweden)

    Qingying Meng

    2017-02-01

    The complexity of the traumatic brain injury (TBI) pathology, particularly concussive injury, is a serious obstacle for diagnosis, treatment, and long-term prognosis. Here we utilize modern systems biology in a rodent model of concussive injury to gain a thorough view of the impact of TBI on fundamental aspects of gene regulation, which have the potential to drive or alter the course of the TBI pathology. TBI perturbed epigenomic programming, transcriptional activities (expression level and alternative splicing), and the organization of genes in networks centered around genes such as Anax2, Ogn, and Fmod. Transcriptomic signatures in the hippocampus are involved in neuronal signaling, metabolism, inflammation, and blood function, and they overlap with those in leukocytes from peripheral blood. The homology between genomic signatures from blood and brain elicited by TBI provides proof-of-concept information for the development of TBI biomarkers based on composite genomic patterns. By intersecting with human genome-wide association studies, many TBI signature genes and network regulators identified in our rodent model were causally associated with brain disorders with relevant links to TBI. The overall results show that concussive brain injury reprograms genes in ways that could lead to predisposition to neurological and psychiatric disorders, and that genomic information from peripheral leukocytes has the potential to predict TBI pathogenesis in the brain.

  17. Comparative genomics of the type VI secretion systems of Pantoea and Erwinia species reveals the presence of putative effector islands that may be translocated by the VgrG and Hcp proteins

    Directory of Open Access Journals (Sweden)

    De Maayer Pieter

    2011-11-01

    Abstract Background The Type VI secretion apparatus is assembled by a conserved set of proteins encoded within a distinct locus. The putative effector proteins Hcp and VgrG are also encoded within these loci. We have identified numerous distinct Type VI secretion system (T6SS) loci in the genomes of several ecologically diverse Pantoea and Erwinia species and detected the presence of putative effector islands associated with the hcp and vgrG genes. Results Between two and four T6SS loci occur among the Pantoea and Erwinia species. While two of the loci (T6SS-1 and T6SS-2) are well conserved among the various strains, the third (T6SS-3) locus is not universally distributed. Additional orthologous loci are present in Pantoea sp. aB-valens and Erwinia billingiae Eb661. Comparative analysis of the T6SS-1 and T6SS-3 loci showed non-conserved islands associated with the vgrG and hcp genes (T6SS-1) and with the vgrG genes (T6SS-3). These regions had a G+C content far lower than the conserved portions of the loci. Many of the proteins encoded within the hcp and vgrG islands carry conserved domains, which suggests they may serve as effector proteins for the T6SS. A number of the proteins also show homology to the C-terminal extensions of evolved VgrG proteins. Conclusions Extensive diversity was observed in the number and content of the T6SS loci among the Pantoea and Erwinia species. Genomic islands could be observed within some of the T6SS loci; these islands are associated with the hcp and vgrG genes and carry putative effector domain proteins. We propose new hypotheses concerning a role for these islands in the acquisition of T6SS effectors and the development of novel evolved VgrG and Hcp proteins.
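    Islands whose G+C content is far below that of the surrounding locus can be flagged with a sliding-window scan. The sketch below is an illustration, not the authors' method: the window size, the step and the `drop` threshold (how far below the whole-sequence G+C fraction a window must fall) are hypothetical parameters.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (NaN if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(start, G+C fraction) for every full-length window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def low_gc_windows(seq: str, window: int, step: int, drop: float) -> List[int]:
    """Start positions of windows whose G+C is more than `drop` below the whole sequence."""
    overall = gc_content(seq)
    # Comparisons with NaN are False, so windows without A/C/G/T are skipped.
    return [start for start, gc in windowed_gc(seq, window, step)
            if gc < overall - drop]
```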

  18. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  19. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds

    DEFF Research Database (Denmark)

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei

    2017-01-01

    BACKGROUND: A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid in the genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic... ...sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased...

  20. Genome-scale characterization of RNA tertiary structures and their functional impact by RNA solvent accessibility prediction.

    Science.gov (United States)

    Yang, Yuedong; Li, Xiaomei; Zhao, Huiying; Zhan, Jian; Wang, Jihua; Zhou, Yaoqi

    2017-01-01

    As most RNA structures are elusive to structure determination, obtaining solvent accessible surface areas (ASAs) of nucleotides in an RNA structure is an important first step to characterize potential functional sites and core structural regions. Here, we developed RNAsnap, the first machine-learning method trained on protein-bound RNA structures for solvent accessibility prediction. Built on sequence profiles from multiple sequence alignment (RNAsnap-prof), the method provided robust prediction in fivefold cross-validation and an independent test (Pearson correlation coefficients, r, between predicted and actual ASA values are 0.66 and 0.63, respectively). Application of the method to 6178 mRNAs revealed its positive correlation to mRNA accessibility by dimethyl sulphate (DMS) experimentally measured in vivo (r = 0.37) but not in vitro (r = 0.07), despite the lack of training on mRNAs and the fact that DMS accessibility is only an approximation to solvent accessibility. We further found strong association across coding and noncoding regions between predicted solvent accessibility of the mutation site of a single nucleotide variant (SNV) and the frequency of that variant in the population for 2.2 million SNVs obtained in the 1000 Genomes Project. Moreover, mapping solvent accessibility of RNAs to the human genome indicated that introns, 5' cap of 5' and 3' cap of 3' untranslated regions, are more solvent accessible, consistent with their respective functional roles. These results support conformational selections as the mechanism for the formation of RNA-protein complexes and highlight the utility of genome-scale characterization of RNA tertiary structures by RNAsnap. The server and its stand-alone downloadable version are available at http://sparks-lab.org. © 2016 Yang et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
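    The accuracy figures above are Pearson correlation coefficients between predicted and actual ASA values. For reference, a minimal sketch of that standard statistic is shown below; the study's data are not reproduced, and the inputs in the checks are toy values.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```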

  1. Genetic analysis of environmental strains of the plant pathogen Phytophthora capsici reveals heterogeneous repertoire of effectors and possible effector evolution via genomic island.

    Science.gov (United States)

    Iribarren, María Josefina; Pascuan, Cecilia; Soto, Gabriela; Ayub, Nicolás Daniel

    2015-11-01

    Phytophthora capsici is a virulent oomycete pathogen of many vegetable crops. Recently, it has been demonstrated that the recognition of the RXLR effector AVR3a1 of P. capsici (PcAVR3a1) triggers a hypersensitive response and plays a critical role in mediating non-host resistance. Here, we analyzed the occurrence of PcAVR3a1 in 57 isolates of P. capsici derived from globe squash, eggplant, tomato and bell pepper cocultivated in a small geographical area. The occurrence of PcAVR3a1 in environmental strains of P. capsici was confirmed by PCR in only 21 of these pathogen isolates. To understand the presence-absence pattern of PcAVR3a1 in environmental strains, the flanking region of this gene was sequenced. PcAVR3a1 was found within a genetic element that we named PcAVR3a1-GI (PcAVR3a1 genomic island). PcAVR3a1-GI was flanked by a 22-bp direct repeat, which is related to its site-specific recombination site. In addition to the PcAVR3a1 gene, PcAVR3a1-GI also encoded a phage integrase probably associated with the excision and integration of this mobile element. Exposure to plant induced the presence of an episomal circular intermediate of PcAVR3a1-GI, indicating that this mobile element is functional. Collectively, these findings provide evidence of PcAVR3a1 evolution via mobile elements in environmental strains of Phytophthora. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. Model comparison on genomic predictions using high-density markers for different groups of bulls in the Nordic Holstein population

    DEFF Research Database (Denmark)

    Gao, Hongding; Su, Guosheng; Janss, Luc

    2013-01-01

    This study compared genomic predictions based on imputed high-density markers (~777,000) in the Nordic Holstein population using a genomic BLUP (GBLUP) model, 4 Bayesian exponential power models with different shape parameters (0.3, 0.5, 0.8, and 1.0) for the exponential power distribution...... relationship with the training population. Group_smgs had both the sire and the maternal grandsire (MGS), Group_sire only had the sire, Group_mgs only had the MGS, and Group_non had neither the sire nor the MGS in the training population. Reliability of DGV was measured as the squared correlation between DGV...... and DRP divided by the reliability of DRP for the bulls in validation data set. Unbiasedness of DGV was measured as the regression of DRP on DGV. The results indicated that DGV were more accurate and less biased for animals that were more related to the training population. In general, the Bayesian...
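    The two validation statistics defined above can be written down directly: reliability as the squared DGV-DRP correlation divided by the reliability of DRP, and bias as the slope of the regression of DRP on DGV (a slope of 1 indicating no inflation or deflation). The sketch below assumes that "the reliability of DRP" means its average over the validation bulls; the function name and inputs are illustrative.

```python
import math
from typing import Sequence, Tuple


def validate_dgv(dgv: Sequence[float], drp: Sequence[float],
                 drp_reliability: Sequence[float]) -> Tuple[float, float]:
    """Return (reliability of DGV, slope of the regression of DRP on DGV).

    Reliability = corr(DGV, DRP)^2 / mean reliability of DRP (the averaging
    is an assumption of this sketch).
    """
    n = len(dgv)
    if n < 2 or len(drp) != n or len(drp_reliability) != n:
        raise ValueError("inputs must have equal length >= 2")
    mean_g = sum(dgv) / n
    mean_d = sum(drp) / n
    s_gd = sum((g - mean_g) * (d - mean_d) for g, d in zip(dgv, drp))
    s_gg = sum((g - mean_g) ** 2 for g in dgv)
    s_dd = sum((d - mean_d) ** 2 for d in drp)
    if s_gg == 0 or s_dd == 0:
        raise ValueError("DGV and DRP must both vary")
    r = s_gd / math.sqrt(s_gg * s_dd)
    reliability = r * r / (sum(drp_reliability) / n)
    slope = s_gd / s_gg  # regression coefficient of DRP on DGV
    return reliability, slope
```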

  3. Population Genomics of Infectious and Integrated Wolbachia pipientis Genomes in Drosophila ananassae

    Science.gov (United States)

    Choi, Jae Young; Bubnell, Jaclyn E.; Aquadro, Charles F.

    2015-01-01

    Coevolution between Drosophila and its endosymbiont Wolbachia pipientis has many intriguing aspects. For example, Drosophila ananassae hosts two forms of W. pipientis genomes: one being the infectious bacterial genome and the other integrated into the host nuclear genome. Here, we characterize the infectious and integrated genomes of W. pipientis infecting D. ananassae (wAna), by genome sequencing 15 strains of D. ananassae that have either the infectious or integrated wAna genomes. Results indicate evolutionarily stable maternal transmission for the infectious wAna genome, suggesting a relatively long-term coevolution with its host. In contrast, the integrated wAna genome showed pseudogene-like characteristics, accumulating many variants that are predicted to have deleterious effects if present in an infectious bacterial genome. Phylogenomic analysis of sequence variation together with genotyping by polymerase chain reaction of large structural variations indicated several wAna variants among the eight infectious wAna genomes. In contrast, only a single wAna variant was found among the seven integrated wAna genomes examined in lines from Africa, south Asia, and south Pacific islands, suggesting that the integration occurred once from a single infectious wAna genome and then spread geographically. Further analysis revealed that for all D. ananassae we examined with the integrated wAna genomes, the majority of the integrated wAna genomic regions are represented in at least two copies, suggesting a double integration or a single integration followed by an integrated genome duplication. The possible evolutionary mechanism underlying the widespread geographical presence of the duplicate integration of the wAna genome is an intriguing question remaining to be answered. PMID:26254486

  4. On the limits of computational functional genomics for bacterial lifestyle prediction

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Röttger, Richard; Hauschild, Anne-Christin

    2014-01-01

    We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HPs...... of an observation bias, i.e. many HPs might yet be unclassified BPs. (H4) There is no intrinsic genomic characteristic of OPs compared with pathogens, as small mutations are likely to play a more dominant role to survive the immune system. To study these hypotheses, we implemented a bioinformatics pipeline...... that combines evolutionary sequence analysis with statistical learning methods (Random Forest with feature selection, model tuning and robustness analysis). Essentially, we present orthologous gene sets that computationally distinguish pathogens from NPs (H1). We further show a clear limit in differentiating...

  5. Genomic prediction for Nordic Red Cattle using one-step and selection index blending

    DEFF Research Database (Denmark)

    Su, Guosheng; Madsen, Per; Nielsen, Ulrik Sander

    2012-01-01

    This study investigated the accuracy of direct genomic breeding values (DGV) using a genomic BLUP model, genomic enhanced breeding values (GEBV) using a one-step blending approach, and GEBV using a selection index blending approach for 15 traits of Nordic Red Cattle. The data comprised 6,631 bulls...... genotyped and nongenotyped bulls for one-step blending, and to scale DGV and its expected reliability in the selection index blending. Weighting (scaling) factors had a small influence on reliabilities of GEBV, but a large influence on the variation of GEBV. Based on the validation analyses, averaged over...... the 15 traits, the reliability of DGV for bulls without daughter records was 11.0 percentage points higher than the reliability of conventional pedigree index. Further gain of 0.9 percentage points was achieved by combining information from conventional pedigree index using the selection index blending...

  6. Kaptive Web: User-Friendly Capsule and Lipopolysaccharide Serotype Prediction for Klebsiella Genomes.

    Science.gov (United States)

    Wick, Ryan R; Heinz, Eva; Holt, Kathryn E; Wyres, Kelly L

    2018-06-01

    As whole-genome sequencing becomes an established component of the microbiologist's toolbox, it is imperative that researchers, clinical microbiologists, and public health professionals have access to genomic analysis tools for the rapid extraction of epidemiologically and clinically relevant information. For the Gram-negative hospital pathogens such as Klebsiella pneumoniae, initial efforts have focused on the detection and surveillance of antimicrobial resistance genes and clones. However, with the resurgence of interest in alternative infection control strategies targeting Klebsiella surface polysaccharides, the ability to extract information about these antigens is increasingly important. Here we present Kaptive Web, an online tool for the rapid typing of Klebsiella K and O loci, which encode the polysaccharide capsule and lipopolysaccharide O antigen, respectively. Kaptive Web enables users to upload and analyze genome assemblies in a web browser. The results can be downloaded in tabular format or explored in detail via the graphical interface, making it accessible for users at all levels of computational expertise. We demonstrate Kaptive Web's utility by analyzing >500 K. pneumoniae genomes. We identify extensive K and O locus diversity among 201 genomes belonging to the carbapenemase-associated clonal group 258 (25 K and 6 O loci). The characterization of a further 309 genomes indicated that such diversity is common among the multidrug-resistant clones and that these loci represent useful epidemiological markers for strain subtyping. These findings reinforce the need for rapid, reliable, and accessible typing methods such as Kaptive Web. Kaptive Web is available for use at http://kaptive.holtlab.net/, and the source code is available at https://github.com/kelwyres/Kaptive-Web. Copyright © 2018 Wick et al.

  7. Comparison on genomic predictions using GBLUP models and two single-step blending methods with different relationship matrices in the Nordic Holstein population

    DEFF Research Database (Denmark)

    Gao, Hongding; Christensen, Ole Fredslund; Madsen, Per

    2012-01-01

    Background A single-step blending approach allows genomic prediction using information of genotyped and non-genotyped animals simultaneously. However, the combined relationship matrix in a single-step method may need to be adjusted because marker-based and pedigree-based relationship matrices may...... not be on the same scale. The same may apply when a GBLUP model includes both genomic breeding values and residual polygenic effects. The objective of this study was to compare single-step blending methods and GBLUP methods with and without adjustment of the genomic relationship matrix for genomic prediction of 16......) a simple GBLUP method, 2) a GBLUP method with a polygenic effect, 3) an adjusted GBLUP method with a polygenic effect, 4) a single-step blending method, and 5) an adjusted single-step blending method. In the adjusted GBLUP and single-step methods, the genomic relationship matrix was adjusted...

  8. Integration and comparison of different genomic data for outcome prediction in cancer

    OpenAIRE

    Gomez Rueda, Hugo; Martínez Ledesma, Emmanuel; Martínez Torteya, Antonio; Palacios Corona, Rebeca; Treviño, Victor

    2005-01-01

    Background In cancer, large-scale technologies such as next-generation sequencing and microarrays have produced a wide number of genomic features such as DNA copy number alterations (CNA), mRNA expression (EXPR), microRNA expression (MIRNA), and DNA somatic mutations (MUT), among others. Several analyses of a specific type of these genomic data have generated many prognostic biomarkers in cancer. However, it is uncertain which of these data is more powerful and whether the best data-type is c...

  9. High-Affinity Methanotrophy Informed by Genome-Wide Analysis of Upland Soil Cluster Alpha (USCα) from Axel Heiberg Island, Canadian High Arctic

    Science.gov (United States)

    Rusley, C.; Onstott, T. C.; Lau, M.

    2017-12-01

    Methane (CH4) is a potent greenhouse gas whose proper budgeting is vital to climate predictions. Recent studies have identified upland Arctic mineral cryosols as consistent CH4 sinks, drawing CH4 from both the atmosphere and underlying anaerobic soil layers. Global atmospheric CH4 uptake is proposed to be mediated by high-affinity methanotrophs based on the detection of the marker gene pmoA (particulate methane monooxygenase beta subunit). However, a lack of pure cultures and scarcity of genomic information have hindered our understanding of their metabolic capabilities and versatility. Together with the missing genetic linkage between its pmoA and 16S ribosomal RNA (rRNA) gene, the factors that control the distribution and magnitude of high-affinity methanotrophy in the Arctic permafrost-affected region have remained elusive. Using 21 metagenomic datasets of surface soils obtained from long-term core incubation experiments,1 this bioinformatics study aimed to reconstruct the draft genome of the Upland Soil Cluster α-proteobacteria (USCα), the high-affinity methanotroph previously detected in the samples,2 and to determine its phylogeny and metabolic requirements. We obtained a genome bin containing the high-affinity form of the USCα-like pmoA gene. The 3.03 Mbp assembly is 91.6% complete with a unique set of single-copy marker genes. The 16S rRNA gene fragment of USCα belongs to the α-proteobacterial family Beijerinckiaceae. Genome annotation indicates possible formaldehyde oxidation via tetrahydromethanopterin-linked C1 transfer pathways, acetate utilization, carbon fixation via the Calvin-Benson-Bassham cycle, and glycogen production. Notably, the key enzymes for formaldehyde assimilation via the serine and ribulose monophosphate pathways are missing. The presence of genes encoding nitrate reductase and hemoglobin suggests adaptation to low O2 under water-logged conditions. Since USCα has versatile carbon metabolisms, it may not be an obligate methanotroph

  10. Genomic predictions for dry matter intake using the international reference population of gDMI

    NARCIS (Netherlands)

    Haas, de Y.; Pryce, J.E.; Calus, M.P.L.; Hulsegge, B.; Spurlock, D.M.; Berry, D.P.; Wall, E.; Lovendahl, P.; Weigel, K.; MacDonald, K.; Miglior, F.; Krattenmacher, N.; Veerkamp, R.F.

    2014-01-01

    In this study, we have demonstrated that using dry matter intake (DMI) phenotypes from multiple countries increases the accuracy of genomic breeding values for this important trait, provided a multi-trait approach is used. Data from Australia, Canada, Denmark, Germany, Ireland, the Netherlands, New

  11. Gross genomic damage measured by DNA image cytometry independently predicts gastric cancer patient survival

    NARCIS (Netherlands)

    Belien, J.A.M.; Buffart, T.E.; Gill, A.; Broeckaert, M.A.M.; Quirke, P.; Meijer, G.A.; Grabsch, H.

    2009-01-01

    BACKGROUND: DNA aneuploidy reflects gross genomic changes. It can be measured by flow cytometry (FCM-DNA) or image cytometry (ICM-DNA). In gastric cancer, the prevalence of DNA aneuploidy has been reported to range from 27 to 100%, with conflicting associations with clinicopathological variables.

  12. Carcinogen susceptibility is regulated by genome architecture and predicts cancer mutagenesis.

    Science.gov (United States)

    García-Nieto, Pablo E; Schwartz, Erin K; King, Devin A; Paulsen, Jonas; Collas, Philippe; Herrera, Rafael E; Morrison, Ashby J

    2017-10-02

    The development of many sporadic cancers is directly initiated by carcinogen exposure. Carcinogens induce malignancies by creating DNA lesions (i.e., adducts) that can result in mutations if left unrepaired. Despite this knowledge, there has been remarkably little investigation into the regulation of susceptibility to acquire DNA lesions. In this study, we present the first quantitative human genome-wide map of DNA lesions induced by ultraviolet (UV) radiation, the ubiquitous carcinogen in sunlight that causes skin cancer. Remarkably, the pattern of carcinogen susceptibility across the genome of primary cells significantly reflects mutation frequency in malignant melanoma. Surprisingly, DNase-accessible euchromatin is protected from UV, while lamina-associated heterochromatin at the nuclear periphery is vulnerable. Many cancer driver genes have an intrinsic increase in carcinogen susceptibility, including the BRAF oncogene that has the highest mutation frequency in melanoma. These findings provide a genome-wide snapshot of DNA injuries at the earliest stage of carcinogenesis. Furthermore, they identify carcinogen susceptibility as an origin of genome instability that is regulated by nuclear architecture and mirrors mutagenesis in cancer. © 2017 The Authors.

  13. Profiles of Genomic Instability in High-Grade Serous Ovarian Cancer Predict Treatment Outcome

    DEFF Research Database (Denmark)

    Wang, Zhigang C.; Birkbak, Nicolai Juul; Culhane, Aedín C.

    2012-01-01

    Purpose: High-grade serous cancer (HGSC) is the most common cancer of the ovary and is characterized by chromosomal instability. Defects in homologous recombination repair (HRR) are associated with genomic instability in HGSC, and are exploited by therapy targeting DNA repair. Defective HRR cause...

  14. Current theoretical models fail to predict the topological complexity of the human genome

    Directory of Open Access Journals (Sweden)

    Javier Arsuaga

    2015-08-01

    Full Text Available Understanding the folding of the human genome is a key challenge of modern structural biology. The emergence of chromatin conformation capture assays (e.g., Hi-C) has revolutionized chromosome biology and provided new insights into the three-dimensional structure of the genome. The experimental data are highly complex and need to be analyzed with quantitative tools. It has been argued that the data obtained from Hi-C assays are consistent with a fractal organization of the genome. A key characteristic of the fractal globule is the lack of topological complexity (knotting or inter-linking). However, the absence of topological complexity contradicts results from polymer physics showing that the entanglement of long linear polymers in a confined volume increases rapidly with the length and with decreasing volume. In vivo and in vitro assays support this claim in some biological systems. We simulate knotted lattice polygons confined inside a sphere and demonstrate that their contact frequencies agree with the human Hi-C data. We conclude that the topological complexity of the human genome cannot be inferred from current Hi-C data.
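    The contact frequency compared with Hi-C data is, roughly, the probability P(s) that two monomers separated by s positions along the chain touch in space. A minimal Python sketch of how P(s) can be estimated from lattice conformations follows. Assumptions: unrestricted random walks on the simple cubic lattice stand in for the confined, knotted polygons simulated in the study; a "contact" is defined here as a lattice distance of at most 1; all names are hypothetical.

```python
import random

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def random_lattice_walk(n, rng):
    """Unrestricted (not self-avoiding, not confined) walk of n monomers."""
    conf = [(0, 0, 0)]
    for _ in range(n - 1):
        dx, dy, dz = rng.choice(STEPS)
        x, y, z = conf[-1]
        conf.append((x + dx, y + dy, z + dz))
    return conf


def lattice_distance(a, b):
    """Manhattan distance between two lattice sites."""
    return sum(abs(u - v) for u, v in zip(a, b))


def contacts(conf):
    """Non-bonded pairs (i, j), j >= i + 2, at lattice distance <= 1."""
    n = len(conf)
    return [(i, j) for i in range(n) for j in range(i + 2, n)
            if lattice_distance(conf[i], conf[j]) <= 1]


def contact_probability(confs):
    """P(s): fraction of monomer pairs at chain separation s that are in contact."""
    n = len(confs[0])
    hits = {s: 0 for s in range(2, n)}
    for conf in confs:
        for i, j in contacts(conf):
            hits[j - i] += 1
    # There are n - s pairs at separation s in each conformation.
    return {s: hits[s] / (len(confs) * (n - s)) for s in hits}


rng = random.Random(42)
walks = [random_lattice_walk(50, rng) for _ in range(200)]
p = contact_probability(walks)
```

    As a sanity check, P(2) for these unrestricted walks should come out close to 1/6, since only an immediate reversal puts monomer i+2 back on the site of monomer i. Confinement and knotting, as simulated in the study, would be expected to change how P(s) decays at larger s.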

  15. Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato

    Science.gov (United States)

    Potato breeding cycles typically last 6-7 years because of the modest seed multiplication rate and large number of traits required of new varieties. Genomic selection has the potential to increase genetic gain per unit of time, through higher accuracy and/or a shorter cycle. Both possibilities were ...
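    In an autotetraploid such as potato, each marker genotype can be coded as an allele dosage from 0 to 4 rather than the 0/1/2 used in diploids. A minimal numpy sketch of an additive genomic relationship matrix built from such dosages follows; the VanRaden-style centring and scaling by ploidy × Σ p(1 − p) is one common convention and is an assumption here, not taken from this record. The function name is hypothetical.

```python
import numpy as np


def additive_grm(dosage, ploidy=4):
    """Additive genomic relationship matrix from allele dosages.

    dosage: (n_individuals, n_markers) array with values in 0..ploidy.
    Assumes at least one polymorphic marker (otherwise the scale is zero).
    """
    X = np.asarray(dosage, dtype=float)
    p = X.mean(axis=0) / ploidy             # allele frequency of each marker
    W = X - ploidy * p                      # centre by the expected dosage
    scale = ploidy * np.sum(p * (1.0 - p))  # sum of binomial dosage variances
    return W @ W.T / scale


dosage = np.array([[0, 4], [4, 0], [2, 2]])
G = additive_grm(dosage)
print(G)
```

    With ploidy=2 and 0/1/2 codes the same function reduces to the familiar diploid VanRaden matrix, which is why dosage information can be dropped into standard GBLUP machinery.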

  16. Improving biological understanding and complex trait prediction by integrating prior information in genomic feature models

    DEFF Research Database (Denmark)

    Edwards, Stefan McKinnon

    externally founded information, such as KEGG pathways, Gene Ontology gene sets, or genomic features, and estimate the joint contribution of the genetic variants within these sets to complex trait phenotypes. The analysis of complex trait phenotypes is hampered by the myriad of genes that control the trait...
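    One simple way to score the joint contribution of the variants in a predefined set (for example a KEGG pathway or a Gene Ontology gene set) is to sum the squared marginal marker-phenotype covariances within the set and compare that sum with random marker sets of the same size. The Python sketch below illustrates such a covariance-based set test; it is an assumed illustration, not necessarily the genomic feature model used in this work, and all names are hypothetical.

```python
import random


def marginal_effects(genotypes, phenotype):
    """Covariance between each marker (column) and the phenotype.

    genotypes: list of rows, one per individual, each a list of marker codes.
    """
    n = len(phenotype)
    y_mean = sum(phenotype) / n
    yc = [v - y_mean for v in phenotype]
    effects = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        c_mean = sum(col) / n
        effects.append(sum((x - c_mean) * v for x, v in zip(col, yc)) / n)
    return effects


def set_statistic(effects, feature_set):
    """Sum of squared marginal effects over the markers in the set."""
    return sum(effects[j] ** 2 for j in feature_set)


def set_test(effects, feature_set, n_perm=1000, seed=0):
    """Observed statistic and empirical p-value against random sets of equal size."""
    rng = random.Random(seed)
    observed = set_statistic(effects, feature_set)
    k = len(feature_set)
    markers = range(len(effects))
    hits = sum(
        set_statistic(effects, rng.sample(markers, k)) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: marker 0 tracks the phenotype.
geno = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [2, 1, 0], [1, 0, 1]]
pheno = [0.1, 1.2, 2.1, -0.2, 1.9, 1.0]
eff = marginal_effects(geno, pheno)
print(set_test(eff, [0], n_perm=200))
```

    Comparing against random sets of equal size, rather than against zero, guards against a set scoring highly simply because it contains many markers.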

  17. The role of genomics in the identification, prediction, and prevention of biological threats.

    Directory of Open Access Journals (Sweden)

    W Florian Fricke

    2009-10-01

    Full Text Available In all likelihood, it is only a matter of time before our public health system will face a major biological threat, whether intentionally dispersed or originating from a known or newly emerging infectious disease. It is necessary not only to increase our reactive "biodefense," but also to be proactive and increase our preparedness. To achieve this goal, it is essential that the scientific and public health communities fully embrace the genomic revolution, and that novel bioinformatic and computing tools necessary to make great strides in our understanding of these novel and emerging threats be developed. Genomics has graduated from a specialized field of science to a research tool that soon will be routine in research laboratories and clinical settings. Because the technology is becoming more affordable, genomics can and should be used proactively to build our preparedness and responsiveness to biological threats. All pieces, including major continued funding, advances in next-generation sequencing technologies, bioinformatics infrastructures, and open access to data and metadata, are being set in place for genomics to play a central role in our public health system.

  18. Reliabilities of genomic prediction using combined reference data of the Nordic Red dairy cattle production

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Rius-Vilarrasa, E; Strandén, I

    2011-01-01

    This study investigated the possibility of increasing the reliability of direct genomic values (DGV) by combining reference populations. The data were from 3,735 bulls from Danish, Swedish, and Finnish Red dairy cattle populations. Single nucleotide polymorphism markers were fitted as random varia...

  19. Best linear unbiased prediction of genomic breeding values using a trait-specific marker-derived relationship matrix.

    Directory of Open Access Journals (Sweden)

    Zhe Zhang

    2010-09-01

    Full Text Available With the availability of high-density whole-genome single nucleotide polymorphism chips, genomic selection has become a promising method to estimate genetic merit with potentially high accuracy for animal, plant and aquaculture species of economic importance. With markers covering the entire genome, the genetic merit of genotyped individuals can be predicted directly within the framework of mixed model equations, by using a matrix of relationships among individuals that is derived from the markers. Here we extend that approach by deriving a marker-based relationship matrix specifically for the trait of interest. In the framework of mixed model equations, a new best linear unbiased prediction (BLUP) method including a trait-specific relationship matrix (TA) was presented and termed TABLUP. The TA matrix was constructed on the basis of marker genotypes and their weights in relation to the trait of interest. A simulation study with 1,000 individuals as the training population and five successive generations as the candidate population was carried out to validate the proposed method. The proposed TABLUP method outperformed ridge regression BLUP (RRBLUP) and BLUP with a realized relationship matrix (GBLUP). It performed slightly worse than BayesB with an accuracy of 0.79 in the standard scenario. The proposed TABLUP method is an improvement on the RRBLUP and GBLUP methods. It might be equivalent to the BayesB method, but it has additional benefits, such as the calculation of accuracies for individual breeding values. The results also showed that the TA matrix has better predictive ability than the classical numerator relationship matrix and the realized relationship matrix, which are derived solely from pedigree or markers without regard to the trait. This is because the TA matrix not only accounts for the Mendelian sampling term, but also puts greater emphasis on those markers that explain more of the genetic variance in the trait.
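    The mixed-model prediction described above can be sketched with a marker-derived relationship matrix G: GBLUP weights all markers equally, while a trait-specific matrix in the spirit of TABLUP up-weights markers that explain more of the genetic variance. In the numpy sketch below the marker weights are a free input (how TABLUP actually derives them is not reproduced), the overall mean is simply taken as the training mean, and the function names are hypothetical.

```python
import numpy as np


def relationship_matrix(Z, weights=None):
    """Marker-derived relationship matrix W D W' / sum(d * 2p(1-p)).

    Z: (n, m) genotypes coded 0/1/2. weights: optional per-marker weights d;
    None gives the unweighted realized matrix used by GBLUP.
    """
    Z = np.asarray(Z, dtype=float)
    p = Z.mean(axis=0) / 2.0
    W = Z - 2.0 * p                         # centre each marker
    d = np.ones(Z.shape[1]) if weights is None else np.asarray(weights, float)
    return (W * d) @ W.T / np.sum(d * 2.0 * p * (1.0 - p))


def gblup(G, y, train, lam):
    """Predict genetic values of all n individuals from the phenotyped ones.

    Model: y = mu + g + e, g ~ N(0, G * s2g), lam = s2e / s2g.
    """
    y = np.asarray(y, dtype=float)
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[:, train] @ alpha         # BLUP of g (plus mean) for everyone


# Simulated example: 150 phenotyped individuals, 50 candidates to predict.
rng = np.random.default_rng(1)
Z = rng.integers(0, 3, size=(200, 50))
g = (Z - Z.mean(axis=0)) @ rng.normal(size=50)
y = g + rng.normal(scale=0.5 * g.std(), size=200)
train, test = np.arange(150), np.arange(150, 200)
pred = gblup(relationship_matrix(Z), y, train, lam=0.25)
print(np.corrcoef(pred[test], g[test])[0, 1])   # prediction accuracy
```

    Passing weights that emphasize trait-relevant markers changes only G; the solving step is identical, which is what lets a trait-specific matrix slot into the usual mixed model equations.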

  20. A unified and comprehensible view of parametric and kernel methods for genomic prediction with application to rice

    Directory of Open Access Journals (Sweden)

    Laval Jacquin

    2016-08-01

    Full Text Available One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods, used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Furthermore, another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression (i.e., genomic best linear unbiased predictor (GBLUP)) and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel trick concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Several parametric and kernel methods, namely least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were thereupon compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction, followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.

  1. A Unified and Comprehensible View of Parametric and Kernel Methods for Genomic Prediction with Application to Rice.

    Science.gov (United States)

    Jacquin, Laval; Cao, Tuong-Vi; Ahmadi, Nourollah

    2016-01-01

    One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods, used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Furthermore, another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression [i.e., genomic best linear unbiased predictor (GBLUP)] and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel "trick" concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Several parametric and kernel methods, namely least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were thereupon compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction, followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
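    The link between GBLUP and RKHS regression reviewed above can be made concrete: both solve the same regularized problem, alpha = (K + lambda I)^-1 (y - mu), and differ only in the kernel K. A minimal numpy sketch follows. It is not the KRMM package; the Gaussian bandwidth parameterization exp(-d^2 / h), the lambda value and the function names are assumptions. The toy trait is purely epistatic (the product of two markers), the situation in which a non-linear kernel is expected to help.

```python
import numpy as np


def linear_kernel(A, B):
    """Linear kernel on centred markers; up to scaling this gives GBLUP."""
    return A @ B.T


def gaussian_kernel(A, B, h=2.0):
    """Gaussian kernel exp(-||a - b||^2 / h) on marker vectors."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / h)   # clip tiny negatives from rounding


def kernel_ridge(K_train, y, lam):
    """Fit mu and alpha of y = mu + K alpha with ridge penalty lam."""
    mu = y.mean()
    alpha = np.linalg.solve(K_train + lam * np.eye(len(y)), y - mu)
    return mu, alpha


rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(300, 3)).astype(float)       # markers coded -1/0/1
y = X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=300)    # pure epistasis
tr, te = slice(0, 200), slice(200, 300)
Xc = X - X[tr].mean(axis=0)                                 # centre on training set

mse = {}
for name, kern in [("linear", linear_kernel), ("gaussian", gaussian_kernel)]:
    mu, alpha = kernel_ridge(kern(Xc[tr], Xc[tr]), y[tr], lam=0.1)
    pred = mu + kern(Xc[te], Xc[tr]) @ alpha
    mse[name] = np.mean((pred - y[te]) ** 2)
print(mse)
```

    With these settings the Gaussian kernel should give a clearly lower test error than the linear, GBLUP-like kernel, because a linear predictor cannot represent the interaction between markers 0 and 1; on a purely additive trait that advantage would be expected to shrink.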

  2. Serratia marcescens harbouring SME-type class A carbapenemases in Canada and the presence of blaSME on a novel genomic island, SmarGI1-1.

    Science.gov (United States)

    Mataseje, L F; Boyd, D A; Delport, J; Hoang, L; Imperial, M; Lefebvre, B; Kuhn, M; Van Caeseele, P; Willey, B M; Mulvey, M R

    2014-07-01

    An increasing prevalence since 2010 of Serratia marcescens harbouring the Ambler class A carbapenemase SME prompted us to further characterize these isolates. Isolates harbouring bla(SME) were identified by PCR and sequencing. Phenotypic analysis for carbapenemase activity was carried out by a modified Hodge test and a modified Carba NP test. Antimicrobial susceptibilities were determined by Etest and Vitek 2. Typing was by PFGE of macrorestriction digests. Whole-genome sequencing of three isolates was carried out to characterize the genomic region harbouring the bla(SME)-type genes. All S. marcescens harbouring SME-type enzymes could be detected using a modified Carba NP test. Isolates harbouring bla(SME) were resistant to penicillins and carbapenems, but remained susceptible to third-generation cephalosporins, as well as fluoroquinolones and trimethoprim/sulfamethoxazole. Isolates exhibited diverse genetic backgrounds, though 57% of isolates were found in three clusters. Analysis of whole-genome sequence data from three isolates revealed that the bla(SME) gene occurred in a novel cryptic prophage genomic island, SmarGI1-1. There has been an increasing occurrence of S. marcescens harbouring bla(SME) in Canada since 2010. The bla(SME) gene was found on a genomic island, SmarGI1-1, that can be excised and circularized, which probably contributes to its dissemination amongst S. marcescens. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved.

  3. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    Science.gov (United States)

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys, and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous