WorldWideScience

Sample records for genomic islands predict

  1. GIPSy: Genomic island prediction software.

    Science.gov (United States)

    Soares, Siomar C; Geyik, Hakan; Ramos, Rommel T J; de Sá, Pablo H C G; Barbosa, Eudes G V; Baumbach, Jan; Figueiredo, Henrique C P; Miyoshi, Anderson; Tauch, Andreas; Silva, Artur; Azevedo, Vasco

    2016-08-20

    Bacteria are highly diverse organisms that are able to adapt to a broad range of environments and hosts due to their high genomic plasticity. Horizontal gene transfer plays a pivotal role in this genome plasticity and in evolution by leaps through the incorporation of large blocks of genome sequences, ordinarily known as genomic islands (GEIs). GEIs may harbor genes encoding virulence, metabolism, antibiotic resistance and symbiosis-related functions, and are accordingly termed pathogenicity islands (PAIs), metabolic islands (MIs), resistance islands (RIs) and symbiotic islands (SIs). Although many software tools for the prediction of GEIs exist, they focus only on PAI prediction and present other limitations, such as complicated installation and inconvenient user interfaces. Here, we present GIPSy, the genomic island prediction software, a standalone and user-friendly tool for the prediction of GEIs, built on our previously developed pathogenicity island prediction software (PIPS). We also present four application cases in which we crosslink data from the literature to PAIs, MIs, RIs and SIs predicted by GIPSy. Briefly, GIPSy correctly predicted the following previously described GEIs: 13 PAIs larger than 30 kb in Escherichia coli CFT073; 1 MI for Burkholderia pseudomallei K96243, which seems to be a miscellaneous island; 1 RI of Acinetobacter baumannii AYE, named AbaR1; and 1 SI of Mesorhizobium loti MAFF303099 presenting a mosaic structure. GIPSy is the first lifestyle-specific genomic island prediction software to perform analyses of PAIs, MIs, RIs and SIs, opening a door for a better understanding of bacterial genome plasticity and the adaptation to new traits.

  2. Prediction of Genomic Islands in Three Bacterial Pathogens of Pneumonia

    Directory of Open Access Journals (Sweden)

    Wen Wei

    2012-03-01

    Pneumonia is a common infectious disease that is usually caused by bacteria, viruses, or fungi. In this paper, we predicted genomic islands in three bacterial pathogens of pneumonia: Chlamydophila pneumoniae, Mycoplasma pneumoniae and Streptococcus pneumoniae. For each pathogen, one clinical strain is involved. Using the cumulative GC profile combined with the h and BCN indexes, eight genomic islands are found in the three pathogens. Among them, six genomic islands are found to have mobility elements, which are a conserved feature of genomic islands, and this supports the possibility that they are genuine genomic islands. The present results show that the cumulative GC profile combined with the h and BCN indexes is a good method for predicting genomic islands in bacteria and that it has a lower false positive rate than the SIGI method. In particular, three genomic islands are found to contain clusters of genes coding for the production of virulence factors, which is useful for research into the pathogenicity of these pathogens and helpful for the treatment of the diseases they cause.
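    The cumulative GC profile mentioned in this record can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual formulation z'_n = z_n - k*n, where z_n is the running count of (A+T) minus (G+C) and k is the slope of a least-squares line through the origin. The function name `cumulative_gc_profile` is hypothetical, and the h and BCN indexes are not computed here.

```python
def cumulative_gc_profile(seq):
    """Return the detrended cumulative GC profile z'_n = z_n - k*n.

    z_n adds +1 for each A/T and -1 for each G/C seen up to position n.
    k is the slope of a least-squares line through the origin (an
    assumption of this sketch). Other symbols (e.g. N) are ignored.
    """
    z = []
    total = 0
    for base in seq.upper():
        if base in "AT":
            total += 1
        elif base in "GC":
            total -= 1
        z.append(total)
    n = len(z)
    if n == 0:
        return []
    # Least-squares slope through the origin: k = sum(i * z_i) / sum(i^2)
    num = sum(i * zi for i, zi in enumerate(z, start=1))
    den = sum(i * i for i in range(1, n + 1))
    k = num / den
    return [zi - k * i for i, zi in enumerate(z, start=1)]


if __name__ == "__main__":
    # Toy sequence: an AT-rich half followed by a GC-rich half.
    profile = cumulative_gc_profile("AAAAGGGG")
    turning_point = max(range(len(profile)), key=profile.__getitem__)
    print("compositional turning point at index", turning_point)
```

    Under this sign convention a rising stretch of the profile is richer in A+T than the genome average and a falling stretch is richer in G+C, so abrupt turning points are candidate island boundaries.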

  3. Genomic Islands Prediction and Analysis in Cyanobacteria by Bioinformatics

    Institute of Scientific and Technical Information of China (English)

    Yi Li; Ni-Ni Rao; Feng Yang; Han-Ming Liu

    2014-01-01

    Genomic islands (GIs) are among the most important components of cyanobacterial genomes. GIs encode many functions, such as symbiosis, pathogenesis, and adaptation. In this article, we predict and analyze the GIs in Synechocystis sp. PCC 6803 by bioinformatics. The results show that ISL1, ISL8, and ISL16 are homologous to sequences in many other bacteria, are involved in basic reactions, and have evolved conservatively. In contrast, ISL15 has a sequence and function unique to Synechocystis sp. PCC 6803. Most of the GIs play a role in genome rearrangement because they carry many transposases. Moreover, we find that recombination and horizontal transfer of GIs are important factors affecting the distribution of non-coding RNA. Our work contributes to a comprehensive understanding of genomic islands and their impact on the genomes of cyanobacteria.

  4. Genomic islands predict functional adaptation in marine actinobacteria

    Energy Technology Data Exchange (ETDEWEB)

    Penn, Kevin; Jenkins, Caroline; Nett, Markus; Udwary, Daniel; Gontang, Erin; McGlinchey, Ryan; Foster, Brian; Lapidus, Alla; Podell, Sheila; Allen, Eric; Moore, Bradley; Jensen, Paul

    2009-04-01

    Linking functional traits to bacterial phylogeny remains a fundamental but elusive goal of microbial ecology 1. Without this information, it becomes impossible to resolve meaningful units of diversity and the mechanisms by which bacteria interact with each other and adapt to environmental change. Ecological adaptations among bacterial populations have been linked to genomic islands, strain-specific regions of DNA that house functionally adaptive traits 2. In the case of environmental bacteria, these traits are largely inferred from bioinformatic or gene expression analyses 2, thus leaving few examples in which the functions of island genes have been experimentally characterized. Here we report the complete genome sequences of Salinispora tropica and S. arenicola, the first cultured, obligate marine Actinobacteria 3. These two species inhabit benthic marine environments and dedicate 8-10 percent of their genomes to the biosynthesis of secondary metabolites. Despite a close phylogenetic relationship, 25 of 37 secondary metabolic pathways are species-specific and located within 21 genomic islands, thus providing new evidence linking secondary metabolism to ecological adaptation. Species-specific differences are also observed in CRISPR sequences, suggesting that variations in phage immunity provide fitness advantages that contribute to the cosmopolitan distribution of S. arenicola 4. The two Salinispora genomes have evolved by complex processes that include the duplication and acquisition of secondary metabolite genes, the products of which provide immediate opportunities for molecular diversification and ecological adaptation. Evidence that secondary metabolic pathways are exchanged by horizontal gene transfer (HGT) yet are fixed among globally distributed populations 5 supports a functional role for their products and suggests that pathway acquisition represents a previously unrecognized force driving bacterial diversification.

  5. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models

    Directory of Open Access Journals (Sweden)

    Surovcik Katharina

    2006-03-01

    Background: Horizontal gene transfer (HGT) is considered a strong evolutionary force shaping the content of microbial genomes in a substantial manner. It is the difference in speed enabling the rapid adaptation to changing environmental demands that distinguishes HGT from gene genesis, duplications or mutations. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA have a considerable length, comprise several genes and are called genomic islands (GIs) or more specifically pathogenicity or symbiotic islands. Results: We have implemented the program SIGI-HMM that predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of codon usage (CU) of each individual gene of a genome under study. CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. Thus, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working on gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in JAVA and is publicly available. It accepts as input any file created according to the EMBL-format. It generates output in the common GFF format readable for genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings. Its predictions were both consistent with annotated GIs and with predictions generated by different methods. Conclusion: SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows to interactively analyze genomes in detail and to generate or to test hypotheses about the origin of acquired

  6. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm

    Science.gov (United States)

    de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.

    2016-01-01

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP—Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
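    To make the clustering idea in this record concrete, here is a small one-dimensional mean shift sketch with a flat kernel. It is only an illustration under stated assumptions: the per-window G+C fraction is used as the single feature, the bandwidth is passed in by hand (the abstract says MSGIP derives it heuristically, but does not say how), and `mean_shift_1d` is a hypothetical name, not part of MSGIP.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Cluster 1-D values with flat-kernel mean shift.

    Each point is moved repeatedly to the mean of all points lying
    within `bandwidth` of its current position until it stops moving.
    Points that converge to (nearly) the same mode share a label.
    The number of clusters is not specified in advance.
    """
    modes = []
    for p in points:
        x = float(p)
        for _ in range(max_iter):
            neighbours = [q for q in points if abs(q - x) <= bandwidth]
            new_x = sum(neighbours) / len(neighbours)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)

    # Merge modes that ended up closer than half a bandwidth apart.
    centers, labels = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if abs(m - c) <= bandwidth / 2:
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


if __name__ == "__main__":
    # Hypothetical per-window G+C fractions: three "host-like" windows
    # and three windows with atypical composition.
    gc = [0.30, 0.31, 0.32, 0.50, 0.51, 0.52]
    labels, centers = mean_shift_1d(gc, bandwidth=0.05)
    print(labels, centers)
```

    In a setting like this, windows falling in a small cluster far from the dominant mode would be the candidates for horizontally acquired regions; a real predictor would use richer features than G+C alone.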

  7. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Directory of Open Access Journals (Sweden)

    Daniel M de Brito

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me.

  8. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    Science.gov (United States)

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
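    The k-mer composition features that a method of this kind relies on can be sketched as below. This covers only the feature-extraction step, under assumptions of our own: fixed-size sliding windows, k = 2 by default, and hypothetical function names. The abstract does not give GI-SVM's window size, k, or kernel settings, and the one-class SVM itself is not included.

```python
from itertools import product


def kmer_frequencies(window, k=2):
    """Return the normalized k-mer frequency vector of a DNA window.

    The vector has 4**k entries in lexicographic order (AA, AC, ...).
    k-mers containing symbols other than A, C, G, T are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    window = window.upper()
    for i in range(len(window) - k + 1):
        kmer = window[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def sliding_window_features(genome, window_size, step, k=2):
    """Return (start, frequency_vector) pairs for overlapping windows."""
    features = []
    for start in range(0, len(genome) - window_size + 1, step):
        window = genome[start:start + window_size]
        features.append((start, kmer_frequencies(window, k)))
    return features


if __name__ == "__main__":
    for start, vec in sliding_window_features("ACGTACGTAC", 4, 2):
        print(start, [round(v, 2) for v in vec])
```

    A one-class classifier would then be fitted on these vectors so that windows whose composition deviates from the bulk of the genome are flagged as outliers, which is the role the abstract assigns to the SVM.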

  9. Genomic Prediction in Barley

    DEFF Research Database (Denmark)

    Edriss, Vahid; Cericola, Fabio; Jensen, Jens D;

    Genomic prediction uses markers (SNPs) across the whole genome to predict individual breeding values at an early growth stage potentially before large scale phenotyping. One of the applications of genomic prediction in plant breeding is to identify the best individual candidate lines to contribut...

  10. Genomic Prediction in Barley

    DEFF Research Database (Denmark)

    Edriss, Vahid; Cericola, Fabio; Jensen, Jens D;

    2015-01-01

    Genomic prediction uses markers (SNPs) across the whole genome to predict individual breeding values at an early growth stage potentially before large scale phenotyping. One of the applications of genomic prediction in plant breeding is to identify the best individual candidate lines to contribut...

  11. Genomic Prediction in Barley

    DEFF Research Database (Denmark)

    Edriss, Vahid; Cericola, Fabio; Jensen, Jens D

    2015-01-01

    Genomic prediction uses markers (SNPs) across the whole genome to predict individual breeding values at an early growth stage potentially before large scale phenotyping. One of the applications of genomic prediction in plant breeding is to identify the best individual candidate lines to contribute...... to next generation. The main goal of this study was to assess the potential of using genomic prediction in a commercial barley breeding program. The data used in this study were from the Nordic Seed company, which is located in Denmark. Around 350 advanced lines were genotyped with the 9K barley chip from Illumina...

  12. Genomic island excisions in Bordetella petrii

    Directory of Open Access Journals (Sweden)

    Levillain Erwan

    2009-07-01

    Background: Among the members of the genus Bordetella, B. petrii is unique, since it is the only species isolated from the environment, while the pathogenic Bordetellae are obligately associated with host organisms. Another feature distinguishing B. petrii from the other sequenced Bordetellae is the presence of a large number of mobile genetic elements including several large genomic regions with typical characteristics of genomic islands collectively known as integrative and conjugative elements (ICEs). These elements mainly encode accessory metabolic factors enabling this bacterium to grow on a large repertoire of aromatic compounds. Results: During in vitro culture of Bordetella petrii, colony variants appear frequently. We show that this variability can be attributed to the presence of a large number of metastable mobile genetic elements on its chromosome. In fact, the genome sequence of B. petrii revealed the presence of at least seven large genomic islands mostly encoding accessory metabolic functions involved in the degradation of aromatic compounds and detoxification of heavy metals. Four of these islands (termed GI1 to GI3 and GI6) are highly related to ICEclc of Pseudomonas knackmussii sp. strain B13. Here we present first data about the molecular characterization of these islands. We defined the exact borders of each island and we show that during standard culture of the bacteria these islands get excised from the chromosome. For all but one of these islands (GI5) we could detect circular intermediates. For the clc-like elements GI1 to GI3 of B. petrii we provide evidence that tandem insertion of these islands, which all encode highly related integrases and attachment sites, may also lead to incorporation of genomic DNA which originally was not part of the island and to the formation of huge composite islands. By integration of a tetracycline resistance cassette into GI3 we found this island to be rather unstable and to be lost from

  13. Innovation for ascertaining genomic islands in PAO1 and PA14 of Pseudomonas aeruginosa

    Institute of Scientific and Technical Information of China (English)

    SONG Lei; ZHANG XueHong

    2009-01-01

    Based on three distinct traits of genomic islands, a novel approach was developed to search for and determine genomic islands in specific strains. Two genomic islands in Pseudomonas aeruginosa PAO1 and 7 genomic islands in Pseudomonas aeruginosa PA14 were defined with this method. Among the 9 genomic islands, 4 islands had been characterized before, while the other 5 islands were newly determined. The insertion sites of 6 genomic islands are tRNA sequences, the direct repeats of PA14GI-3 are related to tRNA(Lau), and the direct repeats of PA14GI-2 are at the 3' end of bifunctional GMP synthase/glutamine amidotransferase. Only the direct repeats of PA14GI-4 are not clear. Among the 5 newly found genomic islands, PA14GI-2 is proposed to be a genomic island related to Hg2+ uptake, PA14GI-3 a secretory activity genomic island, and PA14GI-6 a pathogenicity island, while the functions of PA14GI-1 and PA14GI-5 are not clear. Finally, the tyrosine-type integrases in PAO1GI-1, PA14GI-5 and PA14GI-7 were analyzed, and their binding and restriction sites were predicted.

  14. Burkholderia pseudomallei genome plasticity associated with genomic island variation

    Directory of Open Access Journals (Sweden)

    Currie Bart J

    2008-04-01

    Background: Burkholderia pseudomallei is a soil-dwelling saprophyte and the cause of melioidosis. Horizontal gene transfer contributes to the genetic diversity of this pathogen and may be an important determinant of virulence potential. The genome contains genomic island (GI) regions that encode a broad array of functions. Although there is some evidence for the variable distribution of genomic islands in B. pseudomallei isolates, little is known about the extent of variation between related strains or their association with disease or environmental survival. Results: Five islands from B. pseudomallei strain K96243 were chosen as representatives of different types of genomic islands present in this strain, and their presence investigated in other B. pseudomallei. In silico analysis of 10 B. pseudomallei genome sequences provided evidence for the variable presence of these regions, together with micro-evolutionary changes that generate GI diversity. The diversity of GIs in 186 isolates from NE Thailand (83 environmental and 103 clinical isolates) was investigated using multiplex PCR screening. The proportion of all isolates positive by PCR ranged from 12% for a prophage-like island (GI 9), to 76% for a metabolic island (GI 16). The presence of each of the five GIs did not differ between environmental and disease-associated isolates (p > 0.05 for all five islands). The cumulative number of GIs per isolate for the 186 isolates ranged from 0 to 5 (median 2, IQR 1 to 3). The distribution of cumulative GI number did not differ between environmental and disease-associated isolates (p = 0.27). The presence of GIs was defined for the three largest clones in this collection (each defined as a single sequence type, ST, by multilocus sequence typing); these were ST 70 (n = 15 isolates), ST 54 (n = 11), and ST 167 (n = 9). The rapid loss and/or acquisition of gene islands was observed within individual clones. Comparisons were drawn between isolates obtained

  15. Genomic islands of speciation in Anopheles gambiae.

    Directory of Open Access Journals (Sweden)

    2005-09-01

    The African malaria mosquito, Anopheles gambiae sensu stricto (A. gambiae), provides a unique opportunity to study the evolution of reproductive isolation because it is divided into two sympatric, partially isolated subtaxa known as M form and S form. With the annotated genome of this species now available, high-throughput techniques can be applied to locate and characterize the genomic regions contributing to reproductive isolation. In order to quantify patterns of differentiation within A. gambiae, we hybridized population samples of genomic DNA from each form to Affymetrix GeneChip microarrays. We found that three regions, together encompassing less than 2.8 Mb, are the only locations where the M and S forms are significantly differentiated. Two of these regions are adjacent to centromeres, on Chromosomes 2L and X, and contain 50 and 12 predicted genes, respectively. Sequenced loci in these regions contain fixed differences between forms and no shared polymorphisms, while no fixed differences were found at nearby control loci. The third region, on Chromosome 2R, contains only five predicted genes; fixed differences in this region were also verified by direct sequencing. These "speciation islands" remain differentiated despite considerable gene flow, and are therefore expected to contain the genes responsible for reproductive isolation. Much effort has recently been applied to locating the genes and genetic changes responsible for reproductive isolation between species. Though much can be inferred about speciation by studying taxa that have diverged for millions of years, studying differentiation between taxa that are in the early stages of isolation will lead to a clearer view of the number and size of regions involved in the genetics of speciation. Despite appreciable levels of gene flow between the M and S forms of A. gambiae, we were able to isolate three small regions of differentiation where genes responsible for ecological and behavioral

  16. The floating (pathogenicity) island: a genomic dessert

    Science.gov (United States)

    Novick, Richard P.; Ram, Geeta

    2015-01-01

    Among the prokaryotic genomic islands (GIs) involved in horizontal gene transfer (HGT) are the classical pathogenicity islands, including the integrative and conjugative elements (ICEs), the gene-transfer agents (GTAs), and the staphylococcal pathogenicity islands (SaPIs), the primary focus of this review. While the ICEs and GTAs mediate HGT autonomously, the SaPIs are dependent on specific phages. The ICEs transfer primarily their own DNA, the GTAs exclusively unlinked host DNA, and the SaPIs combine the capabilities of both. Thus the SaPIs derive their importance from the genes they carry (their genetic cargo) and the genes they move. They act not only as versatile high frequency mobilizers, but also as mediators of phage interference, and consequently are major benefactors of their host bacteria. PMID:26744223

  17. Genomic islands from five strains of Burkholderia pseudomallei

    Directory of Open Access Journals (Sweden)

    Nierman William C

    2008-11-01

    Background: Burkholderia pseudomallei is the etiologic agent of melioidosis, a significant cause of morbidity and mortality where this infection is endemic. Genomic differences among strains of B. pseudomallei are predicted to be one of the major causes of the diverse clinical manifestations observed among patients with melioidosis. The purpose of this study was to examine the role of genomic islands (GIs) as sources of genomic diversity in this species. Results: We found that genomic islands (GIs) vary greatly among B. pseudomallei strains. We identified 71 distinct GIs from the genome sequences of five reference strains of B. pseudomallei: K96243, 1710b, 1106a, MSHR668, and MSHR305. The genomic positions of these GIs are not random, as many of them are associated with tRNA gene loci. In particular, the 3' end sequences of tRNA genes are predicted to be involved in the integration of GIs. We propose the term "tRNA-mediated site-specific recombination" (tRNA-SSR) for this mechanism. In addition, we provide a GI nomenclature that is based upon integration hotspots identified here or previously described. Conclusion: Our data suggest that acquisition of GIs is one of the major sources of genomic diversity within B. pseudomallei and the molecular mechanisms that facilitate horizontally-acquired GIs are common across multiple strains of B. pseudomallei. The differential presence of the 71 GIs across multiple strains demonstrates the importance of these mobile elements for shaping the genetic composition of individual strains and populations within this bacterial species.

  18. Genome Island: A Virtual Science Environment in Second Life

    Science.gov (United States)

    Clark, Mary Anne

    2009-01-01

    Mary Anne Clark describes the organization and uses of Genome Island, a virtual laboratory complex constructed in Second Life. Genome Island was created for teaching genetics to university undergraduates but also provides a public space where anyone interested in genetics can spend a few minutes, or a few hours, interacting with genetic…

  20. Evolutionary forces shaping genomic islands of population differentiation in humans

    Directory of Open Access Journals (Sweden)

    Hofer Tamara

    2012-03-01

    Background: Levels of differentiation among populations depend both on demographic and selective factors: genetic drift and local adaptation increase population differentiation, which is eroded by gene flow and balancing selection. We describe here the genomic distribution and the properties of genomic regions with unusually high and low levels of population differentiation in humans to assess the influence of selective and neutral processes on human genetic structure. Methods: Individual SNPs of the Human Genome Diversity Panel (HGDP) showing significantly high or low levels of population differentiation were detected under a hierarchical-island model (HIM). A Hidden Markov Model allowed us to detect genomic regions or islands of high or low population differentiation. Results: Under the HIM, only 1.5% of all SNPs are significant at the 1% level, but their genomic spatial distribution is significantly non-random. We find evidence that local adaptation shaped high-differentiation islands, as they are enriched for non-synonymous SNPs and overlap with previously identified candidate regions for positive selection. Moreover there is a negative relationship between the size of islands and recombination rate, which is stronger for islands overlapping with genes. Gene ontology analysis supports the role of diet as a major selective pressure in those highly differentiated islands. Low-differentiation islands are also enriched for non-synonymous SNPs, and contain an overly high proportion of genes belonging to the 'Oncogenesis' biological process. Conclusions: Even though selection seems to be acting in shaping islands of high population differentiation, neutral demographic processes might have promoted the appearance of some genomic islands since (i) as much as 20% of islands are in non-genic regions, (ii) these non-genic islands are on average two times shorter than genic islands, suggesting a more rapid erosion by recombination, and (iii) most loci are

  1. The phn island: A new genomic island encoding catabolism of polynuclear aromatic hydrocarbons

    Directory of Open Access Journals (Sweden)

    William James Hickey

    2012-04-01

    Bacteria are key in the biodegradation of polycyclic aromatic hydrocarbons (PAH), which are widespread environmental pollutants. At least six genotypes of PAH-degraders are distinguishable via phylogenies of the ring-hydroxylating dioxygenase (RHD) that initiates bacterial PAH metabolism, and a given genotype has a characteristic taxonomic distribution. The latter pattern implies each genotype may have distinct pathways for horizontal gene transfer (HGT). But, while such processes are important in the function of PAH-degrader communities, mechanisms of HGT for most RHD genotypes are unknown. Here, we report in silico and functional analyses of the phenanthrene-degrader Delftia sp. Cs1-4, a representative of the phnAFK2 RHD group. The phnAFK2 genotype predominates in PAH-degrader communities in some soils and sediments, but, until now, its genomic biology has not been explored. In the present studies, genes for the entire phenanthrene catabolic pathway were discovered on a novel ca. 232 kb genomic island (GEI), now termed the phn island. This GEI had characteristics of an integrative and conjugative element with a mobilization/stabilization system similar to that of SXT/R391-type GEI. But, it could not be grouped with any known GEI, and was the first member of a new GEI class. The island also carried genes predicted to encode: synthesis of quorum sensing signal molecules, fatty acid/polyhydroxyalkanoate biosynthesis, a type IV secretory system, a PRTRC system, DNA mobilization functions and > 50 hypothetical proteins. The 50% G+C content of the phn gene cluster differed significantly from the 66.7% G+C level of the island as a whole and the strain Cs1-4 chromosome, indicating a divergent phylogenetic origin for the phn genes. Collectively, these studies added new insights into the genetic elements affecting the PAH biodegradation capacity of microbial communities specifically, and the potential vehicles of HGT in general.

  2. SIFT missense predictions for genomes.

    Science.gov (United States)

    Vaser, Robert; Adusumalli, Swarnaseetha; Leng, Sim Ngak; Sikic, Mile; Ng, Pauline C

    2016-01-01

    The SIFT (sorting intolerant from tolerant) algorithm helps bridge the gap between mutations and phenotypic variations by predicting whether an amino acid substitution is deleterious. SIFT has been used in disease, mutation and genetic studies, and a protocol for its use has been previously published with Nature Protocols. This updated protocol describes SIFT 4G (SIFT for genomes), which is a faster version of SIFT that enables practical computations on reference genomes. Users can get predictions for single-nucleotide variants from their organism of interest using the SIFT 4G annotator with SIFT 4G's precomputed databases. The scope of genomic predictions is expanded, with predictions available for more than 200 organisms. Users can also run the SIFT 4G algorithm themselves. SIFT predictions can be retrieved for 6.7 million variants in 4 min once the database has been downloaded. If precomputed predictions are not available, the SIFT 4G algorithm can compute predictions at a rate of 2.6 s per protein sequence. SIFT 4G is available from http://sift-dna.org/sift4g.

  3. A quantitative account of genomic island acquisitions in prokaryotes

    Directory of Open Access Journals (Sweden)

    Roos Tom E

    2011-08-01

    Background: Microbial genomes do not merely evolve through the slow accumulation of mutations, but also, and often more dramatically, by taking up new DNA in a process called horizontal gene transfer. These innovation leaps in the acquisition of new traits can take place via the introgression of single genes, but also through the acquisition of large gene clusters, which are termed Genomic Islands. Since only a small proportion of all the DNA diversity has been sequenced, it can be hard to find the appropriate donors for acquired genes via sequence alignments from databases. In contrast, relative oligonucleotide frequencies represent a remarkably stable genomic signature in prokaryotes, which facilitates compositional comparisons as an alignment-free alternative for phylogenetic relatedness. In this project, we test whether Genomic Islands identified in individual bacterial genomes have a similar genomic signature, in terms of relative dinucleotide frequencies, and can therefore be expected to originate from a common donor species. Results: When multiple Genomic Islands are present within a single genome, we find that up to 28% of these are compositionally very similar to each other, indicative of frequent recurring acquisitions from the same donor to the same acceptor. Conclusions: This represents the first quantitative assessment of common directional transfer events in prokaryotic evolutionary history. We suggest that many of the resident Genomic Islands per prokaryotic genome originated from the same source, which may have implications with respect to their regulatory interactions, and for the elucidation of the common origins of these acquired gene clusters.
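
    To make the idea of a dinucleotide genomic signature concrete, here is a minimal sketch. It assumes the common odds-ratio definition (observed dinucleotide frequency divided by the product of the two mononucleotide frequencies) computed on a single strand, and a mean absolute difference as the distance. The abstract does not state the exact formula the authors used, and the function names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]


def dinucleotide_signature(seq):
    """Return {dinucleotide: f_xy / (f_x * f_y)} for one strand of seq.

    Assumption: single-strand odds ratio; ambiguous bases are skipped.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_pairs = sum(pairs.values())
    signature = {}
    for d in DINUCS:
        if n_mono == 0 or n_pairs == 0:
            signature[d] = 0.0
            continue
        expected = (mono[d[0]] / n_mono) * (mono[d[1]] / n_mono)
        observed = pairs[d] / n_pairs
        signature[d] = observed / expected if expected > 0 else 0.0
    return signature


def signature_distance(sig_a, sig_b):
    """Mean absolute difference over the 16 dinucleotides."""
    return sum(abs(sig_a[d] - sig_b[d]) for d in DINUCS) / len(DINUCS)
```

    Two islands with a small `signature_distance` would be candidates for a shared donor under this simplified measure; any cut-off for "very similar" is a choice the abstract does not specify.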

  4. Unsupervised statistical identification of genomic islands using oligonucleotide distributions with application to Vibrio genomes

    Indian Academy of Sciences (India)

    Sanjay Nag; Raghunath Chatterjee; Keya Chaudhuri; Probal Chaudhuri

    2006-04-01

    Vibrio cholerae, Vibrio vulnificus, Vibrio parahaemolyticus and several other related Vibrio species show distinctly similar two-chromosomal genome organization. However, the modes of pathogenicity are very different among these species, and this is largely attributed to externally acquired genetic elements. We develop some statistical methods to determine these external genetic elements or genomic islands in genomes based on their differential oligonucleotide usage patterns compared to the rest of the genome. Genomic islands identified by these unsupervised statistical methods include integron and pathogenicity islands. After statistical determination of the genomic islands, we investigate their gene contents and their possible association with the pathogenic behaviour of the corresponding Vibrio species. These investigations lead to observations that are of evolutionary and biological significance.

  5. Improved Prediction of Non-methylated Islands in Vertebrates Highlights Different Characteristic Sequence Patterns

    Science.gov (United States)

    Vingron, Martin

    2016-01-01

    Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap with classically defined CpG islands which are computationally predicted using simple DNA sequence features. This is especially true in cold-blooded vertebrates such as Danio rerio (zebrafish). In order to investigate how predictive DNA sequence is of a region’s methylation status, we applied a supervised learning approach using a spectrum kernel support vector machine, to see if a more complex model and supervised learning can be used to improve non-methylated island prediction and to understand the sequence properties of these regions. We demonstrate that DNA sequence is highly predictive of methylation status, and that in contrast to existing CpG island prediction methods our method is able to provide more useful predictions of NMIs genome-wide in all vertebrate organisms that were studied. Our results also show that in cold-blooded vertebrates (Anolis carolinensis, Xenopus tropicalis and Danio rerio) where genome-wide classical CpG island predictions consist primarily of false positives, longer primarily AT-rich DNA sequence features are able to identify these regions much more accurately. PMID:27984582
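
    The spectrum kernel mentioned above compares two sequences by the k-mers they share. The sketch below is a plain-Python illustration of that kernel only (the support vector machine itself is omitted); the value of k and the cosine-style normalization are assumptions, not settings taken from the study.

```python
import math
from collections import Counter


def kmer_spectrum(seq, k):
    """Count every overlapping k-mer in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=3, normalize=True):
    """Inner product of the two k-mer count vectors.

    With normalize=True the value is divided by the vector norms,
    so identical sequences score 1.0 (an illustrative choice).
    """
    spec_a = kmer_spectrum(seq_a, k)
    spec_b = kmer_spectrum(seq_b, k)
    dot = sum(count * spec_b[kmer] for kmer, count in spec_a.items())
    if not normalize:
        return float(dot)
    norm = math.sqrt(
        sum(c * c for c in spec_a.values()) * sum(c * c for c in spec_b.values())
    )
    return dot / norm if norm else 0.0
```

    A kernel matrix built from pairwise `spectrum_kernel` values is the kind of input a kernel SVM would be trained on.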

  6. Finding Pathogenicity Islands in Genome Data with ICA

    Institute of Scientific and Technical Information of China (English)

    ZHENG Fangwei; HUANG Juncai; SHE Kun; ZHOU Mingtian

    2004-01-01

    A novel technique for finding pathogenicity islands in genome data with independent component analysis (ICA) is presented. The genomic signal sequences are first denoised with ICA, and G+C patterns in genomes are then detected by comparing the resulting sequence with the original sequences. Results of the G+C pattern analysis of D. radiodurans chromosome I and N. meningitidis serogroup A strain Z2491 are presented. A set of loci that have very different G+C content and have not been previously described is detected. The findings show that ICA is a powerful tool to detect differences within and between genomes and to separate small (gene level) and large (putative pathogenicity islands) genomic regions that have different composition characteristics.
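
    As a rough illustration of the G+C pattern comparison, the sketch below scans a sequence in fixed windows and reports windows whose G+C fraction departs from the whole-sequence value. It deliberately leaves out the ICA denoising step described in the abstract; the window size, step and threshold are hypothetical parameters.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_windows(seq, window, step):
    """Return (start, gc) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def atypical_windows(seq, window, step, threshold):
    """Windows whose G+C differs from the overall G+C by >= threshold."""
    overall = gc_content(seq)
    return [
        (start, gc)
        for start, gc in gc_windows(seq, window, step)
        if abs(gc - overall) >= threshold
    ]
```

    On real genomes, adjacent flagged windows would typically be merged into candidate regions before any further annotation.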

  7. CRISPR-based screening of genomic island excision events in bacteria.

    Science.gov (United States)

    Selle, Kurt; Klaenhammer, Todd R; Barrangou, Rodolphe

    2015-06-30

    Genomic analysis of Streptococcus thermophilus revealed that mobile genetic elements (MGEs) likely contributed to gene acquisition and loss during evolutionary adaptation to milk. Clustered regularly interspaced short palindromic repeats-CRISPR-associated genes (CRISPR-Cas), the adaptive immune system in bacteria, limits genetic diversity by targeting MGEs including bacteriophages, transposons, and plasmids. CRISPR-Cas systems are widespread in streptococci, suggesting that the interplay between CRISPR-Cas systems and MGEs is one of the driving forces governing genome homeostasis in this genus. To investigate the genetic outcomes resulting from CRISPR-Cas targeting of integrated MGEs, in silico prediction revealed four genomic islands without essential genes in lengths from 8 to 102 kbp, totaling 7% of the genome. In this study, the endogenous CRISPR3 type II system was programmed to target the four islands independently through plasmid-based expression of engineered CRISPR arrays. Targeting lacZ within the largest 102-kbp genomic island was lethal to wild-type cells and resulted in a reduction of up to 2.5-log in the surviving population. Genotyping of Lac(-) survivors revealed variable deletion events between the flanking insertion-sequence elements, all resulting in elimination of the Lac-encoding island. Chimeric insertion sequence footprints were observed at the deletion junctions after targeting all of the four genomic islands, suggesting a common mechanism of deletion via recombination between flanking insertion sequences. These results established that self-targeting CRISPR-Cas systems may direct significant evolution of bacterial genomes on a population level, influencing genome homeostasis and remodeling.

  8. On detection and assessment of statistical significance of Genomic Islands

    Directory of Open Access Journals (Sweden)

    Chaudhuri Probal

    2008-04-01

    Background: Many of the available methods for detecting Genomic Islands (GIs) in prokaryotic genomes use markers such as transposons, proximal tRNAs, flanking repeats etc., or they use other supervised techniques requiring training datasets. Most of these methods are primarily based on the biases in GC content or codon and amino acid usage of the islands. However, these methods either do not use any formal statistical test of significance or use statistical tests for which the critical values and the P-values are not adequately justified. We propose a method, which is unsupervised in nature and uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory, and consequently, the resulting P-values are quite reliable for making the decision. Results: Our algorithm (named Design-Island, an acronym for Detection of Statistically Significant Genomic Island) runs in two phases. Some 'putative GIs' are identified in the first phase, and those are refined into smaller segments containing horizontally acquired genes in the refinement phase. This method is applied to the Salmonella typhi CT18 genome, leading to the discovery of several new pathogenicity, antibiotic resistance and metabolic islands that were missed by earlier methods. Many of these islands contain mobile genetic elements like phage-mediated genes, transposons, integrase and IS elements, confirming their horizontal acquisition. Conclusion: The proposed method is based on statistical tests supported by precise distribution theory and reliable P-values along with a technique for visualizing statistically significant islands. The performance of our method is better than that of many other well-known methods in terms of sensitivity and accuracy, and in terms of specificity, it is comparable to other methods.
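
    A Monte-Carlo test on randomly selected chromosome segments can be sketched as follows. This is not the Design-Island algorithm itself: the test statistic (absolute G+C deviation from the chromosome average), the number of draws and the function names are assumptions made for illustration.

```python
import random


def gc_fraction(seq):
    """Fraction of G and C in seq (seq is assumed to be upper case)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def monte_carlo_p_value(genome, start, length, n_draws=999, seed=0):
    """Empirical P-value for the segment genome[start:start + length].

    The candidate's G+C deviation is compared with the deviations of
    n_draws segments of the same length drawn uniformly at random.
    """
    rng = random.Random(seed)
    genome = genome.upper()
    genome_gc = gc_fraction(genome)
    observed = abs(gc_fraction(genome[start:start + length]) - genome_gc)
    hits = 0
    for _ in range(n_draws):
        s = rng.randrange(0, len(genome) - length + 1)
        if abs(gc_fraction(genome[s:s + length]) - genome_gc) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_draws + 1)
```

    The `(hits + 1) / (n_draws + 1)` form is the usual conservative estimator for Monte-Carlo P-values; segments with a small value would be the 'putative GIs' passed on to a refinement step.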

  9. Comparative genomics of Neisseria meningitidis: core genome, islands of horizontal transfer and pathogen-specific genes.

    Science.gov (United States)

    Dunning Hotopp, Julie C; Grifantini, Renata; Kumar, Nikhil; Tzeng, Yih Ling; Fouts, Derrick; Frigimelica, Elisabetta; Draghi, Monia; Giuliani, Marzia Monica; Rappuoli, Rino; Stephens, David S; Grandi, Guido; Tettelin, Hervé

    2006-12-01

    To better understand Neisseria meningitidis genomes and virulence, microarray comparative genome hybridization (mCGH) data were collected from one Neisseria cinerea, two Neisseria lactamica, two Neisseria gonorrhoeae and 48 Neisseria meningitidis isolates. For N. meningitidis, these isolates are from diverse clonal complexes, invasive and carriage strains, and all major serogroups. The microarray platform represented N. meningitidis strains MC58, Z2491 and FAM18, and N. gonorrhoeae FA1090. By comparing hybridization data to genome sequences, the core N. meningitidis genome and insertions/deletions (e.g. capsule locus, type I secretion system) related to pathogenicity were identified, including further characterization of the capsule locus, bioinformatics analysis of a type I secretion system, and identification of some metabolic pathways associated with intracellular survival in pathogens. Hybridization data clustered meningococcal isolates from similar clonal complexes that were distinguished by the differential presence of six distinct islands of horizontal transfer. Several of these islands contained prophage or other mobile elements, including a novel prophage and a transposon carrying portions of a type I secretion system. Acquisition of some genetic islands appears to have occurred in multiple lineages, including transfer between N. lactamica and N. meningitidis. However, island acquisition occurs infrequently, such that the genomic-level relationship is not obscured within clonal complexes. The N. meningitidis genome is characterized by the horizontal acquisition of multiple genetic islands; the study of these islands reveals important sets of genes varying between isolates and likely to be related to pathogenicity.

  10. Efficient marker data utilization in genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid

    Genomic prediction is a novel method to recognize the best animals for breeding. The aim of this PhD is to improve the accuracy of genomic prediction in dairy cattle by efficiently utilizing marker data. The thesis focuses on three aspects for improving the genomic prediction, which are: criteria...

  11. Patterns and architecture of genomic islands in marine bacteria

    Directory of Open Access Journals (Sweden)

    Fernández-Gómez Beatriz

    2012-07-01

    Background: Genomic Islands (GIs) have key roles since they modulate the structure and size of bacterial genomes displaying a diverse set of laterally transferred genes. Despite their importance, GIs in marine bacterial genomes have not been explored systematically to uncover possible trends and to analyze their putative ecological significance. Results: We carried out a comprehensive analysis of GIs in 70 selected marine bacterial genomes detected with IslandViewer to explore the distribution, patterns and functional gene content in these genomic regions. We detected 438 GIs containing a total of 8152 genes. GI number per genome was strongly and positively correlated with the total GI size. In 50% of the genomes analyzed the GIs accounted for approximately 3% of the genome length, with a maximum of 12%. Interestingly, we found transposases particularly enriched within Alphaproteobacteria GIs, and site-specific recombinases in Gammaproteobacteria GIs. We described specific Homologous Recombination GIs (HR-GIs) in several genera of marine Bacteroidetes and in Shewanella strains among others. In these HR-GIs, we recurrently found conserved genes such as the β-subunit of DNA-directed RNA polymerase, regulatory sigma factors, the elongation factor Tu and ribosomal protein genes typically associated with the core genome. Conclusions: Our results indicate that horizontal gene transfer mediated by phages, plasmids and other mobile genetic elements, and HR by site-specific recombinases play important roles in the mobility of clusters of genes between taxa and within closely related genomes, modulating the flexible pool of the genome. Our findings suggest that GIs may increase bacterial fitness under changing environmental conditions by acquiring novel foreign genes and/or modifying gene transcription and/or transduction.

  12. Genomic prediction using QTL derived from whole genome sequence data

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    This study investigated the gain in accuracy of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k SNP data. Analyses were performed for Nordic Holstein and Danish Jersey animals, using eithe...

  13. Predictable evolution toward flightlessness in volant island birds.

    Science.gov (United States)

    Wright, Natalie A; Steadman, David W; Witt, Christopher C

    2016-04-26

    Birds are prolific colonists of islands, where they readily evolve distinct forms. Identifying predictable, directional patterns of evolutionary change in island birds, however, has proved challenging. The "island rule" predicts that island species evolve toward intermediate sizes, but its general applicability to birds is questionable. However, convergent evolution has clearly occurred in the island bird lineages that have undergone transitions to secondary flightlessness, a process involving drastic reduction of the flight muscles and enlargement of the hindlimbs. Here, we investigated whether volant island bird populations tend to change shape in a way that converges subtly on the flightless form. We found that island bird species have evolved smaller flight muscles than their continental relatives. Furthermore, in 366 populations of Caribbean and Pacific birds, smaller flight muscles and longer legs evolved in response to increasing insularity and, strikingly, the scarcity of avian and mammalian predators. On smaller islands with fewer predators, birds exhibited shifts in investment from forelimbs to hindlimbs that were qualitatively similar to anatomical rearrangements observed in flightless birds. These findings suggest that island bird populations tend to evolve on a trajectory toward flightlessness, even if most remain volant. This pattern was consistent across nine families and four orders that vary in lifestyle, foraging behavior, flight style, and body size. These predictable shifts in avian morphology may reduce the physical capacity for escape via flight and diminish the potential for small-island taxa to diversify via dispersal.

  14. Study on the Mitochondrial Genome of Sea Island Cotton (Gossypium barbadense) by BAC Library Screening

    Institute of Scientific and Technical Information of China (English)

    SU Ai-guo; LI Shuang-shuang; LIU Guo-zheng; LEI Bin-bin; KANG Ding-ming; LI Zhao-hu; MA Zhi-ying; HUA Jin-ping

    2014-01-01

    The plant mitochondrial genome displays complex features, particularly in terms of cytoplasmic male sterility (CMS). Therefore, research on the cotton mitochondrial genome may provide important information for analyzing genome evolution and exploring the molecular mechanism of CMS. In this paper, we present a preliminary study on the mitochondrial genome of sea island cotton (Gossypium barbadense) based on positive clones from the bacterial artificial chromosome (BAC) library. Thirty-five primers designed with the conserved sequences of functional genes and exons of mitochondria were used to screen positive clones in the genome library of the sea island cotton variety called Pima 90-53. Ten BAC clones were obtained and verified for further study. A contig was obtained based on six overlapping clones and subsequently laid out primarily on the mitochondrial genome. One BAC clone, clone 6, which harbored an insert of approximately 115 kb of mtDNA sequence and from which fragments for more than 10 primers could be amplified, was sequenced and assembled using the Solexa strategy. Fifteen mitochondrial functional genes were revealed in clone 6 by gene annotation. The characteristics of the syntenic gene/exon of the sequences and RNA editing were preliminarily predicted.

  15. Methyl-CpG island-associated genome signature tags

    Energy Technology Data Exchange (ETDEWEB)

    Dunn, John J

    2014-05-20

    Disclosed is a method for analyzing the organismic complexity of a sample through analysis of the nucleic acid in the sample. In the disclosed method, through a series of steps, including digestion with a type II restriction enzyme, ligation of capture adapters and linkers and digestion with a type IIS restriction enzyme, genome signature tags are produced. The sequences of a statistically significant number of the signature tags are determined and the sequences are used to identify and quantify the organisms in the sample. Various embodiments of the invention described herein include methods for using single point genome signature tags to analyze the related families present in a sample, methods for analyzing sequences associated with hyper- and hypo-methylated CpG islands, methods for visualizing organismic complexity change in a sampling location over time and methods for generating the genome signature tag profile of a sample of fragmented DNA.

  16. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls...

  17. Accounting for discovery bias in genomic prediction

    Science.gov (United States)

    Our objective was to evaluate an approach to mitigating discovery bias in genomic prediction. Accuracy may be improved by placing greater emphasis on regions of the genome expected to be more influential on a trait. Methods emphasizing regions result in a phenomenon known as “discovery bias” if info...

  18. A new experimental approach for studying bacterial genomic island evolution identifies island genes with bacterial host-specific expression patterns

    Directory of Open Access Journals (Sweden)

    Nickerson Cheryl A

    2006-01-01

    Background: Genomic islands are regions of bacterial genomes that have been acquired by horizontal transfer and often contain blocks of genes that function together for specific processes. Recently, it has become clear that the impact of genomic islands on the evolution of different bacterial species is significant and represents a major force in establishing bacterial genomic variation. However, the study of genomic island evolution has been mostly performed at the sequence level using computer software or hybridization analysis to compare different bacterial genomic sequences. We describe here a novel experimental approach to study the evolution of species-specific bacterial genomic islands that identifies island genes that have evolved in such a way that they are differentially-expressed depending on the bacterial host background into which they are transferred. Results: We demonstrate this approach by using a "test" genomic island that we have cloned from the Salmonella typhimurium genome (island 4305) and transferred to a range of Gram negative bacterial hosts of differing evolutionary relationships to S. typhimurium. Systematic analysis of the expression of the island genes in the different hosts compared to proper controls allowed identification of genes with genera-specific expression patterns. The data from the analysis can be arranged in a matrix to give an expression "array" of the island genes in the different bacterial backgrounds. A conserved 19-bp DNA site was found upstream of at least two of the differentially-expressed island genes. To our knowledge, this is the first systematic analysis of horizontally-transferred genomic island gene expression in a broad range of Gram negative hosts. We also present evidence in this study that the IS200 element found in island 4305 in S. typhimurium strain LT2 was inserted after the island had already been acquired by the S. typhimurium lineage and that this element is likely not

  19. An acquisition account of genomic islands based on genome signature comparisons

    Directory of Open Access Journals (Sweden)

    Luyf ACM

    2005-11-01

    Background: Recent analyses of prokaryotic genome sequences have demonstrated the important force horizontal gene transfer constitutes in genome evolution. Horizontally acquired sequences are detectable by, among others, their dinucleotide composition (genome signature) dissimilarity with the host genome. Genomic islands (GIs) comprise important and interesting horizontally transferred sequences, but information about acquisition events or relatedness between GIs is scarce. In Vibrio vulnificus CMCP6, 10 and 11 GIs have previously been identified in the sequenced chromosomes I and II, respectively. We assessed the compositional similarity and putative acquisition account of these GIs using the genome signature. For this analysis we developed a new algorithm, available as a web application. Results: Of 21 GIs, VvI-1 and VvI-10 of chromosome I have similar genome signatures, and while artificially divided due to a linear annotation, they are adjacent on the circular chromosome and therefore comprise one GI. Similarly, GIs VvI-3 and VvI-4 of chromosome I together with the region between these two islands are compositionally similar, suggesting that they form one GI (making a total of 19 GIs in chromosome I + chromosome II). Cluster analysis assigned the 19 GIs to 11 different branches above our conservative threshold. This suggests a limited number of compositionally similar donors or intragenomic dispersion of ancestral acquisitions. Furthermore, 2 GIs of chromosome II cluster with chromosome I, while none of the 19 GIs group with chromosome II, suggesting an unidirectional dispersal of large anomalous gene clusters from chromosome I to chromosome II. Conclusion: From the results, we infer 10 compositionally dissimilar donors for 19 GIs in the V. vulnificus CMCP6 genome, including chromosome I donating to chromosome II. This suggests multiple transfer events from individual donor types or from donors with similar genome signatures. Applied to

  20. Genomic islands in the pathogenic filamentous fungus Aspergillus fumigatus.

    Directory of Open Access Journals (Sweden)

    Natalie D Fedorova

    2008-04-01

    We present the genome sequences of a new clinical isolate of the important human pathogen, Aspergillus fumigatus, A1163, and two closely related but rarely pathogenic species, Neosartorya fischeri NRRL181 and Aspergillus clavatus NRRL1. Comparative genomic analysis of A1163 with the recently sequenced A. fumigatus isolate Af293 has identified core, variable and up to 2% unique genes in each genome. While the core genes are 99.8% identical at the nucleotide level, identity for variable genes can be as low as 40%. The most divergent loci appear to contain heterokaryon incompatibility (het) genes associated with fungal programmed cell death such as developmental regulator rosA. Cross-species comparison has revealed that 8.5%, 13.5% and 12.6%, respectively, of A. fumigatus, N. fischeri and A. clavatus genes are species-specific. These genes are significantly smaller in size than core genes, contain fewer exons and exhibit a subtelomeric bias. Most of them cluster together in 13 chromosomal islands, which are enriched for pseudogenes, transposons and other repetitive elements. At least 20% of A. fumigatus-specific genes appear to be functional and involved in carbohydrate and chitin catabolism, transport, detoxification, secondary metabolism and other functions that may facilitate the adaptation to heterogeneous environments such as soil or a mammalian host. Contrary to what was suggested previously, their origin cannot be attributed to horizontal gene transfer (HGT), but instead is likely to involve duplication, diversification and differential gene loss (DDL). The role of duplication in the origin of lineage-specific genes is further underlined by the discovery of genomic islands that seem to function as designated "gene dumps" and, perhaps, simultaneously, as "gene factories".

  1. Genomic Prediction of Barley Hybrid Performance

    Directory of Open Access Journals (Sweden)

    Norman Philipp

    2016-07-01

    Hybrid breeding in barley (Hordeum vulgare L.) offers great opportunities to accelerate the rate of genetic improvement and to boost yield stability. A crucial requirement consists of the efficient selection of superior hybrid combinations. We used comprehensive phenotypic and genomic data from a commercial breeding program with the goal of examining the potential to predict the hybrid performances. The phenotypic data were comprised of replicated grain yield trials for 385 two-way and 408 three-way hybrids evaluated in up to 47 environments. The parental lines were genotyped using a 3k single nucleotide polymorphism (SNP) array based on an Illumina Infinium assay. We implemented ridge regression best linear unbiased prediction modeling for additive and dominance effects and evaluated the prediction ability using five-fold cross validations. The prediction ability of hybrid performances based on general combining ability (GCA) effects was moderate, amounting to 0.56 and 0.48 for two- and three-way hybrids, respectively. The potential of GCA-based hybrid prediction requires that both parental components have been evaluated in a hybrid background. This is not necessary for genomic prediction, for which we also observed moderate cross-validated prediction abilities of 0.51 and 0.58 for two- and three-way hybrids, respectively. This exemplifies the potential of genomic prediction in hybrid barley. Interestingly, prediction ability using the two-way hybrids as training population and the three-way hybrids as test population or vice versa was low, presumably because of the different genetic makeup of the parental source populations. Consequently, further research is needed to optimize genomic prediction approaches combining different source populations in barley.

  2. Active chromatin domains are defined by acetylation islands revealed by genome-wide mapping.

    Science.gov (United States)

    Roh, Tae-Young; Cuddapah, Suresh; Zhao, Keji

    2005-03-01

    The identity and developmental potential of a human cell is specified by its epigenome that is largely defined by patterns of chromatin modifications including histone acetylation. Here we report high-resolution genome-wide mapping of diacetylation of histone H3 at Lys 9 and Lys 14 in resting and activated human T cells by genome-wide mapping technique (GMAT). Our data show that high levels of the H3 acetylation are detected in gene-rich regions. The chromatin accessibility and gene expression of a genetic domain is correlated with hyperacetylation of promoters and other regulatory elements but not with generally elevated acetylation of the entire domain. Islands of acetylation are identified in the intergenic and transcribed regions. The locations of the 46,813 acetylation islands identified in this study are significantly correlated with conserved noncoding sequences (CNSs) and many of them are colocalized with known regulatory elements in T cells. TCR signaling induces 4045 new acetylation loci that may mediate the global chromatin remodeling and gene activation. We propose that the acetylation islands are epigenetic marks that allow prediction of functional regulatory elements.

  3. High-density transcriptional initiation signals underline genomic islands in bacteria.

    Directory of Open Access Journals (Sweden)

    Qianli Huang

    Genomic islands (GIs), frequently associated with the pathogenicity of bacteria and having a substantial influence on bacterial evolution, are groups of "alien" elements which probably undergo special temporal-spatial regulation in the host genome. Are there particular hallmark transcriptional signals for these "exotic" regions? We here explore the potential transcriptional signals that underline the GIs beyond the conventional views on basic sequence composition, such as codon usage and GC property bias. The analysis showed that there is a significant enrichment of the transcription start positions (TSPs) in the GI regions compared to the whole genome of Salmonella enterica and Escherichia coli. There was up to a four-fold increase for 70% of the GIs, implying that a high-density TSP profile can potentially differentiate the GI regions. Based on this feature, we developed a new sliding window method, GIST (Genomic-island Identification by Signals of Transcription), to identify these regions. Subsequently, we compared the known GI-associated features of the GIs detected by GIST and by the existing method IslandViewer to those of the whole genome. Our method demonstrates high sensitivity in detecting GIs harboring genes with biased GI-like function, preferred subcellular localization, skewed GC property, shorter gene length and biased "non-optimal" codon usage. The special transcriptional signals discovered here may contribute to the coordinate expression regulation of foreign genes. Finally, by using GIST, we detected many interesting GIs in the 2011 German E. coli O104:H4 outbreak strain TY-2482, including the microcin H47 system and gene cluster ycgXEFZ-ymgABC that activates the production of biofilm matrix. The aforesaid findings highlight the power of GIST to predict GIs with distinct intrinsic features to the genome. The heterogeneity of cumulative TSPs profiles may not only be a better identity for "alien" regions, but also provide hints to the special
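
    A sliding-window scan for regions dense in transcription start positions might look like the sketch below. It is only an illustration of the general idea, not the GIST implementation: the window size, step and fold-enrichment cut-off are hypothetical, and the function name is invented for this example.

```python
from bisect import bisect_left


def tsp_enriched_windows(tsp_positions, genome_length, window, step, min_fold):
    """Return (start, end, fold) for windows enriched in TSPs.

    fold is the TSP density in the half-open window [start, end)
    divided by the genome-wide TSP density.
    """
    positions = sorted(tsp_positions)
    genome_density = len(positions) / genome_length
    flagged = []
    for start in range(0, genome_length - window + 1, step):
        end = start + window
        # Count positions p with start <= p < end via binary search.
        count = bisect_left(positions, end) - bisect_left(positions, start)
        fold = (count / window) / genome_density if genome_density else 0.0
        if fold >= min_fold:
            flagged.append((start, end, fold))
    return flagged
```

    With `min_fold=4`, a window is reported only when its TSP density is at least four times the genome average, which loosely mirrors the up to four-fold increase quoted in the abstract.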

  4. Predicting biological networks from genomic data

    DEFF Research Database (Denmark)

    Harrington, Eoghan D; Jensen, Lars J; Bork, Peer

    2008-01-01

    Continuing improvements in DNA sequencing technologies are providing us with vast amounts of genomic data from an ever-widening range of organisms. The resulting challenge for bioinformatics is to interpret this deluge of data and place it back into its biological context. Biological networks...... provide a conceptual framework with which we can describe part of this context, namely the different interactions that occur between the molecular components of a cell. Here, we review the computational methods available to predict biological networks from genomic sequence data and discuss how they relate...

  5. Genomic islands of divergence are not affected by geography of speciation in sunflowers.

    Science.gov (United States)

    Renaut, S; Grassa, C J; Yeaman, S; Moyers, B T; Lai, Z; Kane, N C; Bowers, J E; Burke, J M; Rieseberg, L H

    2013-01-01

    Genomic studies of speciation often report the presence of highly differentiated genomic regions interspersed within a milieu of weakly diverged loci. The formation of these speciation islands is generally attributed to reduced inter-population gene flow near loci under divergent selection, but few studies have critically evaluated this hypothesis. Here, we report on transcriptome scans among four recently diverged pairs of sunflower (Helianthus) species that vary in the geographical context of speciation. We find that genetic divergence is lower in sympatric and parapatric comparisons, consistent with a role for gene flow in eroding neutral differences. However, genomic islands of divergence are numerous and small in all comparisons, and contrary to expectations, island number and size are not significantly affected by levels of interspecific gene flow. Rather, island formation is strongly associated with reduced recombination rates. Overall, our results indicate that the functional architecture of genomes plays a larger role in shaping genomic divergence than does the geography of speciation.

  6. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc;

    in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...... prove useful for less heritable traits such as diseases and fertility...

  7. Identification of Horizontally-transferred Genomic Islands and Genome Segmentation Points by Using the GC Profile Method.

    Science.gov (United States)

    Zhang, Ren; Ou, Hong-Yu; Gao, Feng; Luo, Hao

    2014-04-01

    The nucleotide composition of genomes undergoes dramatic variations among all three kingdoms of life. GC content, an important characteristic for a genome, is related to many important functions, and therefore GC content and its distribution are routinely reported for sequenced genomes. Traditionally, GC content distribution is assessed by computing GC contents in windows that slide along the genome. Disadvantages of this routinely used window-based method include low resolution and low sensitivity. Additionally, different window sizes result in different GC content distribution patterns within the same genome. We proposed a windowless method, the GC profile, for displaying GC content variations across the genome. Compared to the window-based method, the GC profile has the following advantages: 1) higher sensitivity, because of variation-amplifying procedures; 2) higher resolution, because boundaries between domains can be determined at one single base pair; 3) uniqueness, because the GC profile is unique for a given genome and 4) the capacity to show both global and regional GC content distributions. These characteristics are useful in identifying horizontally-transferred genomic islands and homogenous GC-content domains. Here, we review the applications of the GC profile in identifying genomic islands and genome segmentation points, and in serving as a platform to integrate with other algorithms for genome analysis. A web server generating GC profiles and implementing relevant genome segmentation algorithms is available at: www.zcurve.net.
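
The contrast drawn above between window-based GC content and a windowless curve can be sketched as follows. This is an illustrative approximation, not the code behind the web server: `windowed_gc` and `cumulative_gc_deviation` are hypothetical names, and the windowless curve here is simply a running (A+T) minus (G+C) count with the genome-wide linear trend subtracted, which is an assumption about the general form of such a profile.

```python
def windowed_gc(seq, window, step):
    """GC fraction per sliding window (the traditional approach).

    The result depends on the chosen `window` and `step`, which is the
    drawback the abstract points out.
    """
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, gc / window))
    return result


def cumulative_gc_deviation(seq):
    """Windowless curve with one value per base.

    Running count of (A+T) - (G+C), minus the genome-wide linear trend.
    The curve falls where local GC content is above the genome average
    and rises where it is below, so turning points mark domain
    boundaries at single-base resolution.
    """
    seq = seq.upper()
    running = []
    z = 0
    for base in seq:
        if base in "AT":
            z += 1
        elif base in "GC":
            z -= 1
        running.append(z)  # ambiguous bases leave the count unchanged
    if not running:
        return []
    slope = running[-1] / len(running)
    return [z_i - slope * (i + 1) for i, z_i in enumerate(running)]
```

For a toy sequence made of an AT-only block, a GC-only block and another AT-only block, the curve peaks at the last base of the first block and bottoms out at the last base of the GC block, independent of any window size.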

  8. CpG islands undermethylation in human genomic regions under selective pressure.

    Directory of Open Access Journals (Sweden)

    Sergio Cocozza

    Full Text Available DNA methylation at CpG islands (CGIs) is one of the most intensively studied epigenetic mechanisms. It is fundamental for cellular differentiation and control of transcriptional potential. DNA methylation is also involved in several processes that are central to evolutionary biology, including phenotypic plasticity and evolvability. In this study, we explored the relationship between CpG island methylation and signatures of selective pressure in Homo sapiens, using a computational biology approach. By analyzing methylation data of 25 cell lines from the Encyclopedia of DNA Elements (ENCODE) Consortium, we compared the DNA methylation of CpG islands in genomic regions under selective pressure with the methylation of CpG islands in the remaining part of the genome. To define genomic regions under selective pressure, we used three different methods, each oriented to provide distinct information about selective events. Independently of the method and of the cell type used, we found evidence of undermethylation of CGIs in human genomic regions under selective pressure. Additionally, by analyzing SNP frequency in CpG islands, we demonstrated that CpG islands in regions under selective pressure show lower genetic variation. Our findings suggest that the CpG islands in regions under selective pressure seem to be somehow more "protected" from methylation when compared with other regions of the genome.

  9. Genomic Prediction Accounting for Residual Heteroskedasticity.

    Science.gov (United States)

    Ou, Zhining; Tempelman, Robert J; Steibel, Juan P; Ernst, Catherine W; Bates, Ronald O; Bello, Nora M

    2015-11-12

    Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit.
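
Predictive ability, as defined above (the correlation between predicted and observed phenotypes in the validation sets of a five-fold cross-validation), can be sketched in a few lines. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and a single-covariate least-squares fit stands in for the hierarchical Bayesian WGP models of the study, which are far richer.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_simple_regression(x, y):
    """Stand-in model (assumption): ordinary least squares on one covariate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return lambda value: my + slope * (value - mx)


def cross_validated_predictive_ability(x, y, fit, k=5, seed=0):
    """Mean over folds of corr(predicted, observed) on held-out records."""
    abilities = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        predicted = [predict(x[i]) for i in held_out]
        observed = [y[i] for i in held_out]
        abilities.append(pearson(predicted, observed))
    return sum(abilities) / k
```

Any function with the same `fit(x_train, y_train) -> predict` shape could be passed in place of `fit_simple_regression`, which is how a homoskedastic and a heteroskedastic model could be compared on identical folds.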

  10. Genomic Prediction Accounting for Residual Heteroskedasticity

    Science.gov (United States)

    Ou, Zhining; Tempelman, Robert J.; Steibel, Juan P.; Ernst, Catherine W.; Bates, Ronald O.; Bello, Nora M.

    2015-01-01

    Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. PMID:26564950

  11. Genomic Prediction of Gene Bank Wheat Landraces

    Directory of Open Access Journals (Sweden)

    José Crossa

    2016-07-01

    Full Text Available This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm

  12. Probabilistic prediction of barrier-island response to hurricanes

    Science.gov (United States)

    Plant, Nathaniel G.; Stockdon, Hilary F.

    2012-01-01

    Prediction of barrier-island response to hurricane attack is important for assessing the vulnerability of communities, infrastructure, habitat, and recreational assets to the impacts of storm surge, waves, and erosion. We have demonstrated that a conceptual model intended to make qualitative predictions of the type of beach response to storms (e.g., beach erosion, dune erosion, dune overwash, inundation) can be reformulated in a Bayesian network to make quantitative predictions of the morphologic response. In an application of this approach at Santa Rosa Island, FL, predicted dune-crest elevation changes in response to Hurricane Ivan explained about 20% to 30% of the observed variance. An extended Bayesian network based on the original conceptual model, which included dune elevations, storm surge, and swash, but with the addition of beach and dune widths as input variables, showed improved skill compared to the original model, explaining 70% of dune elevation change variance and about 60% of dune and shoreline position change variance. This probabilistic approach accurately represented prediction uncertainty (measured with the log likelihood ratio), and it outperformed the baseline prediction (i.e., the prior distribution based on the observations). Finally, sensitivity studies demonstrated that degrading the resolution of the Bayesian network or removing data from the calibration process reduced the skill of the predictions by 30% to 40%. The reduction in skill did not change conclusions regarding the relative importance of the input variables, and the extended model's skill always outperformed the original model.

  13. Mitochondrial genomes suggest rapid evolution of dwarf California Channel Islands foxes (Urocyon littoralis).

    Directory of Open Access Journals (Sweden)

    Courtney A Hofman

    Full Text Available Island endemics are typically differentiated from their mainland progenitors in behavior, morphology, and genetics, often resulting from long-term evolutionary change. To examine mechanisms for the origins of island endemism, we present a phylogeographic analysis of whole mitochondrial genomes from the endangered island fox (Urocyon littoralis), endemic to California's Channel Islands, and mainland gray foxes (U. cinereoargenteus). Previous genetic studies suggested that foxes first appeared on the islands >16,000 years ago, before human arrival (~13,000 cal BP), while archaeological and paleontological data supported a colonization >7000 cal BP. Our results are consistent with initial fox colonization of the northern islands probably by rafting or human introduction ~9200-7100 years ago, followed quickly by human translocation of foxes from the northern to southern Channel Islands. Mitogenomes indicate that island foxes are monophyletic and most closely related to gray foxes from northern California that likely experienced a Holocene climate-induced range shift. Our data document rapid morphological evolution of island foxes (in ~2000 years or less). Despite evidence for bottlenecks, island foxes have generated and maintained multiple mitochondrial haplotypes. This study highlights the intertwined evolutionary history of island foxes and humans, and illustrates a new approach for investigating the evolutionary histories of other island endemics.

  14. Using a Bayesian network to predict barrier island geomorphologic characteristics

    Science.gov (United States)

    Gutierrez, Ben; Plant, Nathaniel G.; Thieler, E. Robert; Turecek, Aaron

    2015-01-01

    Quantifying geomorphic variability of coastal environments is important for understanding and describing the vulnerability of coastal topography, infrastructure, and ecosystems to future storms and sea level rise. Here we use a Bayesian network (BN) to test the importance of multiple interactions between barrier island geomorphic variables. This approach models complex interactions and handles uncertainty, which is intrinsic to future sea level rise, storminess, or anthropogenic processes (e.g., beach nourishment and other forms of coastal management). The BN was developed and tested at Assateague Island, Maryland/Virginia, USA, a barrier island with sufficient geomorphic and temporal variability to evaluate our approach. We tested the ability to predict dune height, beach width, and beach height variables using inputs that included longer-term, larger-scale, or external variables (historical shoreline change rates, distances to inlets, barrier width, mean barrier elevation, and anthropogenic modification). Data sets from three different years spanning nearly a decade sampled substantial temporal variability and serve as a proxy for analysis of future conditions. We show that distinct geomorphic conditions are associated with different long-term shoreline change rates and that the most skillful predictions of dune height, beach width, and beach height depend on including multiple input variables simultaneously. The predictive relationships are robust to variations in the amount of input data and to variations in model complexity. The resulting model can be used to evaluate scenarios related to coastal management plans and/or future scenarios where shoreline change rates may differ from those observed historically.

  15. Identification of a genomic island present in the majority of pathogenic isolates of Pseudomonas aeruginosa.

    Science.gov (United States)

    Liang, X; Pham, X Q; Olson, M V; Lory, S

    2001-02-01

    Pseudomonas aeruginosa, a ubiquitous gram-negative bacterium, is capable of colonizing a wide range of environmental niches and can also cause serious infections in humans. In order to understand the genetic makeup of pathogenic P. aeruginosa strains, a method of differential hybridization of arrayed libraries of cloned DNA fragments was developed. An M13 library of DNA from strain X24509, isolated from a patient with a urinary tract infection, was screened using a DNA probe from P. aeruginosa strain PAO1. The genome of PAO1 has been recently sequenced and can be used as a reference for comparisons of genetic organization in different strains. M13 clones that did not react with a DNA probe from PAO1 carried X24509-specific inserts. When a similar array hybridization analysis with DNA probes from different strains was used, a set of M13 clones which carried sequences present in the majority of human P. aeruginosa isolates from a wide range of clinical sources was identified. The inserts of these clones were used to identify cosmids encompassing a contiguous 48.9-kb region of the X24509 chromosome called PAGI-1 (for "P. aeruginosa genomic island 1"). PAGI-1 is incorporated in the X24509 chromosome at a locus that shows a deletion of a 6,729-bp region present in strain PAO1. A survey of the incidence of PAGI-1 revealed that this island is present in 85% of the strains from clinical sources. Approximately half of the PAGI-1-carrying strains show the same deletion as X24509, while the remaining strains contain both the PAGI-1 sequences and the 6,729-bp PAO1 segment. Sequence analysis of PAGI-1 revealed that it contains 51 predicted open reading frames. Several of these genes encoded products with predictable function based on their sequence similarities to known genes, including insertion sequences, determinants of regulatory proteins, a number of dehydrogenase gene homologs, and two for proteins implicated in detoxification of reactive oxygen species. It is very

  16. Predicting community structure in snakes on Eastern Nearctic islands using ecological neutral theory and phylogenetic methods

    Science.gov (United States)

    Burbrink, Frank T.; McKelvy, Alexander D.; Pyron, R. Alexander; Myers, Edward A.

    2015-01-01

    Predicting species presence and richness on islands is important for understanding the origins of communities and how likely it is that species will disperse and resist extinction. The equilibrium theory of island biogeography (ETIB) and, as a simple model of sampling abundances, the unified neutral theory of biodiversity (UNTB), predict that in situations where mainland to island migration is high, species-abundance relationships explain the presence of taxa on islands. Thus, more abundant mainland species should have a higher probability of occurring on adjacent islands. In contrast to UNTB, if certain groups have traits that permit them to disperse to islands better than other taxa, then phylogeny may be more predictive of which taxa will occur on islands. Taking surveys of 54 island snake communities in the Eastern Nearctic along with mainland communities that have abundance data for each species, we use phylogenetic assembly methods and UNTB estimates to predict island communities. Species richness is predicted by island area, whereas turnover from the mainland to island communities is random with respect to phylogeny. Community structure appears to be ecologically neutral and abundance on the mainland is the best predictor of presence on islands. With regard to young and proximate islands, where allopatric or cladogenetic speciation is not a factor, we find that simple neutral models following UNTB and ETIB predict the structure of island communities. PMID:26609083

  17. Predicting community structure in snakes on Eastern Nearctic islands using ecological neutral theory and phylogenetic methods.

    Science.gov (United States)

    Burbrink, Frank T; McKelvy, Alexander D; Pyron, R Alexander; Myers, Edward A

    2015-11-22

    Predicting species presence and richness on islands is important for understanding the origins of communities and how likely it is that species will disperse and resist extinction. The equilibrium theory of island biogeography (ETIB) and, as a simple model of sampling abundances, the unified neutral theory of biodiversity (UNTB), predict that in situations where mainland to island migration is high, species-abundance relationships explain the presence of taxa on islands. Thus, more abundant mainland species should have a higher probability of occurring on adjacent islands. In contrast to UNTB, if certain groups have traits that permit them to disperse to islands better than other taxa, then phylogeny may be more predictive of which taxa will occur on islands. Taking surveys of 54 island snake communities in the Eastern Nearctic along with mainland communities that have abundance data for each species, we use phylogenetic assembly methods and UNTB estimates to predict island communities. Species richness is predicted by island area, whereas turnover from the mainland to island communities is random with respect to phylogeny. Community structure appears to be ecologically neutral and abundance on the mainland is the best predictor of presence on islands. With regard to young and proximate islands, where allopatric or cladogenetic speciation is not a factor, we find that simple neutral models following UNTB and ETIB predict the structure of island communities.

  18. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans

    Directory of Open Access Journals (Sweden)

    Sherman David H

    2007-07-01

    Full Text Available Background: The genomes of Streptomyces coelicolor and Streptomyces lividans bear a considerable degree of synteny. While S. coelicolor is the model streptomycete for studying antibiotic synthesis and differentiation, S. lividans is almost exclusively considered as the preferred host, among actinomycetes, for cloning and expression of exogenous DNA. We used whole genome microarrays as a comparative genomics tool for identifying the subtle differences between these two chromosomes. Results: We identified five large S. coelicolor genomic islands (larger than 25 kb) and 18 smaller islets absent in the S. lividans chromosome. Many of these regions show anomalous GC bias and codon usage patterns. Six of them are in close vicinity of tRNA genes while nine are flanked with near perfect repeat sequences, indicating that these are probable recent evolutionary acquisitions into S. coelicolor. Embedded within these segments are at least four DNA methylases and two probable methyl-sensing restriction endonucleases. Comparison with S. coelicolor transcriptome and proteome data revealed that some of the missing genes are active during the course of growth and differentiation in S. coelicolor. In particular, a pair of methylmalonyl CoA mutase (mcm) genes involved in polyketide precursor biosynthesis, an acyl-CoA dehydrogenase implicated in timing of actinorhodin synthesis and bldB, a developmentally significant regulator whose mutation causes complete abrogation of antibiotic synthesis, belong to this category. Conclusion: Our findings provide tangible hints for elucidating the genetic basis of important phenotypic differences between these two streptomycetes. Importantly, absence of certain genes in S. lividans identified here could potentially explain the relative ease of DNA transformations and the conditional lack of actinorhodin synthesis in S. lividans.

  19. Identification and comparative analysis of a genomic island in Mycobacterium avium subsp. hominissuis.

    Science.gov (United States)

    Lahiri, Annesha; Sanchini, Andrea; Semmler, Torsten; Schäfer, Hubert; Lewin, Astrid

    2014-11-03

    Mycobacterium avium subsp. hominissuis (MAH) is an environmental bacterium causing opportunistic infections. The objective of this study was to identify flexible genome regions in MAH isolated from different sources. By comparing five complete and draft MAH genomes we identified a genomic island conferring additional flexibility to the MAH genomes. The island was absent in one of the five strains and had sizes between 16.37 and 84.85kb in the four other strains. The genes present in the islands differed among strains and included phage- and plasmid-derived genes, integrase genes, hypothetical genes, and virulence-associated genes like mmpL or mce genes. Copyright © 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  20. Genomic islands of speciation separate cichlid ecomorphs in an East African crater lake.

    Science.gov (United States)

    Malinsky, Milan; Challis, Richard J; Tyers, Alexandra M; Schiffels, Stephan; Terai, Yohey; Ngatunga, Benjamin P; Miska, Eric A; Durbin, Richard; Genner, Martin J; Turner, George F

    2015-12-18

    The genomic causes and effects of divergent ecological selection during speciation are still poorly understood. Here we report the discovery and detailed characterization of early-stage adaptive divergence of two cichlid fish ecomorphs in a small (700 meters in diameter) isolated crater lake in Tanzania. The ecomorphs differ in depth preference, male breeding color, body shape, diet, and trophic morphology. With whole-genome sequences of 146 fish, we identified 98 clearly demarcated genomic "islands" of high differentiation and demonstrated the association of genotypes across these islands with divergent mate preferences. The islands contain candidate adaptive genes enriched for functions in sensory perception (including rhodopsin and other twilight-vision-associated genes), hormone signaling, and morphogenesis. Our study suggests mechanisms and genomic regions that may play a role in the closely related mega-radiation of Lake Malawi.

  1. Annotation-Based Whole Genomic Prediction and Selection

    DEFF Research Database (Denmark)

    Kadarmideen, Haja; Do, Duy Ngoc; Janss, Luc

    Cπ method and applied to 1,272 Duroc pigs with both genotypic and phenotypic records including residual (RFI) and daily feed intake (DFI), average daily gain (ADG) and back fat (BF). Records were split into a training (968 pigs) and a validation dataset (304 pigs). SNPs were annotated by 14 different...... groups. Genomic prediction has accuracy comparable to an own phenotype and use of genomic prediction can be cost effective by replacing feed intake measurement. Use of genomic annotation of SNPs and QTL information had no largely significant impact on predictive accuracy for the current traits but may...... in their contribution to estimated genomic variances and in prediction of genomic breeding values by applying SNP annotation approaches to feed efficiency. Ensembl Variant Predictor (EVP) and Pig QTL database were used as the source of genomic annotation for 60K chip. Genomic prediction was performed using the Bayes...

  2. Genomic prediction across dairy cattle populations and breeds

    DEFF Research Database (Denmark)

    Zhou, Lei

    Genomic prediction is successful in single-breed genetic evaluation. However, there has been no success in across-breed prediction until now. This thesis investigated genomic prediction across populations and breeds using Chinese Holstein, Nordic Holstein, Norwegian Red, and Nordic Red. Nordic Red...

  3. Genomic islands link secondary metabolism to functional adaptation in marine Actinobacteria.

    Science.gov (United States)

    Penn, Kevin; Jenkins, Caroline; Nett, Markus; Udwary, Daniel W; Gontang, Erin A; McGlinchey, Ryan P; Foster, Brian; Lapidus, Alla; Podell, Sheila; Allen, Eric E; Moore, Bradley S; Jensen, Paul R

    2009-10-01

    Genomic islands have been shown to harbor functional traits that differentiate ecologically distinct populations of environmental bacteria. A comparative analysis of the complete genome sequences of the marine Actinobacteria Salinispora tropica and Salinispora arenicola reveals that 75% of the species-specific genes are located in 21 genomic islands. These islands are enriched in genes associated with secondary metabolite biosynthesis providing evidence that secondary metabolism is linked to functional adaptation. Secondary metabolism accounts for 8.8% and 10.9% of the genes in the S. tropica and S. arenicola genomes, respectively, and represents the major functional category of annotated genes that differentiates the two species. Genomic islands harbor all 25 of the species-specific biosynthetic pathways, the majority of which occur in S. arenicola and may contribute to the cosmopolitan distribution of this species. Genome evolution is dominated by gene duplication and acquisition, which in the case of secondary metabolism provide immediate opportunities for the production of new bioactive products. Evidence that secondary metabolic pathways are exchanged horizontally, coupled with earlier evidence for fixation among globally distributed populations, supports a functional role and suggests that the acquisition of natural product biosynthetic gene clusters represents a previously unrecognized force driving bacterial diversification. Species-specific differences observed in clustered regularly interspaced short palindromic repeat sequences suggest that S. arenicola may possess a higher level of phage immunity, whereas a highly duplicated family of polymorphic membrane proteins provides evidence for a new mechanism of marine adaptation in Gram-positive bacteria.

  4. A genomic island linked to ecotype divergence in Atlantic cod

    DEFF Research Database (Denmark)

    Hansen, Jakob Hemmer; Eg Nielsen, Einar; Therkildsen, Nina O.;

    2013-01-01

    gene flow and large effective population sizes, properties which theoretically could restrict divergence in local genomic regions. We identify a genomic region of strong population differentiation, extending over approximately 20 cM, between pairs of migratory and stationary ecotypes examined at two......The genomic architecture underlying ecological divergence and ecological speciation with gene flow is still largely unknown for most organisms. One central question is whether divergence is genome‐wide or localized in ‘genomic mosaics’ during early stages when gene flow is still pronounced....... Empirical work has so far been limited, and the relative impacts of gene flow and natural selection on genomic patterns have not been fully explored. Here, we use ecotypes of Atlantic cod to investigate genomic patterns of diversity and population differentiation in a natural system characterized by high...

  5. Empirical and deterministic accuracies of across-population genomic prediction

    NARCIS (Netherlands)

    Wientjes, Y.C.J.; Veerkamp, R.F.; Bijma, P.; Bovenhuis, H.; Schrooten, C.; Calus, M.P.L.

    2015-01-01

    Background: Differences in linkage disequilibrium and in allele substitution effects of QTL (quantitative trait loci) may hinder genomic prediction across populations. Our objective was to develop a deterministic formula to estimate the accuracy of across-population genomic prediction, for which

  6. Excess of genomic defects in a woolly mammoth on Wrangel island

    Science.gov (United States)

    Slatkin, Montgomery

    2017-01-01

    Woolly mammoths (Mammuthus primigenius) populated Siberia, Beringia, and North America during the Pleistocene and early Holocene. Recent breakthroughs in ancient DNA sequencing have allowed for complete genome sequencing for two specimens of woolly mammoths (Palkopoulou et al. 2015). One mammoth specimen is from a mainland population 45,000 years ago when mammoths were plentiful. The second, a 4300 yr old specimen, is derived from an isolated population on Wrangel island where mammoths subsisted with small effective population size more than 43-fold lower than previous populations. These extreme differences in effective population size offer a rare opportunity to test nearly neutral models of genome architecture evolution within a single species. Using these previously published mammoth sequences, we identify deletions, retrogenes, and non-functionalizing point mutations. In the Wrangel island mammoth, we identify a greater number of deletions, a larger proportion of deletions affecting gene sequences, a greater number of candidate retrogenes, and an increased number of premature stop codons. This accumulation of detrimental mutations is consistent with genomic meltdown in response to low effective population sizes in the dwindling mammoth population on Wrangel island. In addition, we observe high rates of loss of olfactory receptors and urinary proteins, either because these loci are non-essential or because they were favored by divergent selective pressures in island environments. Finally, at the locus of FOXQ1 we observe two independent loss-of-function mutations, which would confer a satin coat phenotype in this island woolly mammoth. PMID:28253255

  7. Relative entropy differences in bacterial chromosomes, plasmids, phages and genomic islands

    NARCIS (Netherlands)

    Bohlin, J.; Passel, van M.W.J.

    2012-01-01

    Background: We sought to assess whether the concept of relative entropy (information capacity), could aid our understanding of the process of horizontal gene transfer in microbes. We analyzed the differences in information capacity between prokaryotic chromosomes, genomic islands (GI), phages, and

  9. Campylobacter fetus subspecies contain conserved type IV secretion systems on multiple genomic islands and plasmids

    NARCIS (Netherlands)

    Graaf-Van Bloois, Van Der Linda; Miller, William G.; Yee, Emma; Wagenaar, Jaap A.

    2016-01-01

    The features contributing to differences in pathogenicity of the Campylobacter fetus subspecies are unknown. Putative factors involved in pathogenesis are located in genomic islands that encode a type IV secretion system (T4SS) and fic domain (filamentation induced by cyclic AMP) proteins, which

  10. Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution.

    Science.gov (United States)

    Cahill, James A; Green, Richard E; Fulton, Tara L; Stiller, Mathias; Jay, Flora; Ovsyanikov, Nikita; Salamzade, Rauf; St John, John; Stirling, Ian; Slatkin, Montgomery; Shapiro, Beth

    2013-01-01

    Despite extensive genetic analysis, the evolutionary relationship between polar bears (Ursus maritimus) and brown bears (U. arctos) remains unclear. The two most recent comprehensive reports indicate a recent divergence with little subsequent admixture or a much more ancient divergence followed by extensive admixture. At the center of this controversy are the Alaskan ABC Islands brown bears that show evidence of shared ancestry with polar bears. We present an analysis of genome-wide sequence data for seven polar bears, one ABC Islands brown bear, one mainland Alaskan brown bear, and a black bear (U. americanus), plus recently published datasets from other bears. Surprisingly, we find clear evidence for gene flow from polar bears into ABC Islands brown bears but no evidence of gene flow from brown bears into polar bears. Importantly, while polar bears contributed <1% of the autosomal genome of the ABC Islands brown bear, they contributed 6.5% of the X chromosome. The magnitude of sex-biased polar bear ancestry and the clear direction of gene flow suggest a model wherein the enigmatic ABC Island brown bears are the descendants of a polar bear population that was gradually converted into brown bears via male-dominated brown bear admixture. We present a model that reconciles heretofore conflicting genetic observations. We posit that the enigmatic ABC Islands brown bears derive from a population of polar bears likely stranded by the receding ice at the end of the last glacial period. Since then, male brown bear migration onto the island has gradually converted these bears into an admixed population whose phenotype and genotype are principally brown bear, except at mtDNA and X-linked loci. This process of genome erosion and conversion may be a common outcome when climate change or other forces cause a population to become isolated and then overrun by species with which it can hybridize.

  11. Genomic evidence for island population conversion resolves conflicting theories of polar bear evolution.

    Directory of Open Access Journals (Sweden)

    James A Cahill

    Full Text Available Despite extensive genetic analysis, the evolutionary relationship between polar bears (Ursus maritimus) and brown bears (U. arctos) remains unclear. The two most recent comprehensive reports indicate a recent divergence with little subsequent admixture or a much more ancient divergence followed by extensive admixture. At the center of this controversy are the Alaskan ABC Islands brown bears that show evidence of shared ancestry with polar bears. We present an analysis of genome-wide sequence data for seven polar bears, one ABC Islands brown bear, one mainland Alaskan brown bear, and a black bear (U. americanus), plus recently published datasets from other bears. Surprisingly, we find clear evidence for gene flow from polar bears into ABC Islands brown bears but no evidence of gene flow from brown bears into polar bears. Importantly, while polar bears contributed <1% of the autosomal genome of the ABC Islands brown bear, they contributed 6.5% of the X chromosome. The magnitude of sex-biased polar bear ancestry and the clear direction of gene flow suggest a model wherein the enigmatic ABC Island brown bears are the descendants of a polar bear population that was gradually converted into brown bears via male-dominated brown bear admixture. We present a model that reconciles heretofore conflicting genetic observations. We posit that the enigmatic ABC Islands brown bears derive from a population of polar bears likely stranded by the receding ice at the end of the last glacial period. Since then, male brown bear migration onto the island has gradually converted these bears into an admixed population whose phenotype and genotype are principally brown bear, except at mtDNA and X-linked loci. This process of genome erosion and conversion may be a common outcome when climate change or other forces cause a population to become isolated and then overrun by species with which it can hybridize.

  12. Adaptation in Toxic Environments: Arsenic Genomic Islands in the Bacterial Genus Thiomonas.

    Directory of Open Access Journals (Sweden)

    Kelle C Freel

    Full Text Available Acid mine drainage (AMD) is a highly toxic environment for most living organisms due to the presence of many lethal elements including arsenic (As). Thiomonas (Tm.) bacteria are found ubiquitously in AMD and can withstand these extreme conditions, in part because they are able to oxidize arsenite. In order to further improve our knowledge concerning the adaptive capacities of these bacteria, we sequenced and assembled the genomes of six isolates derived from the Carnoulès AMD, and compared them to the genomes of Tm. arsenitoxydans 3As (isolated from the same site) and Tm. intermedia K12 (isolated from a sewage pipe). A detailed analysis of the Tm. sp. CB2 genome revealed that various rearrangements had occurred in comparison to what was observed in 3As and K12, and over 20 genomic islands (GEIs) were found in each of these three genomes. We performed a detailed comparison of the two arsenic-related islands found in CB2, carrying the genes required for arsenite oxidation and As resistance, with those found in K12, 3As, and five other Thiomonas strains also isolated from Carnoulès (CB1, CB3, CB6, ACO3 and ACO7). Our results suggest that these arsenic-related islands have evolved differentially in these closely related Thiomonas strains, leading to divergent capacities to survive in As-rich environments.

  13. "Islands of Divergence" in the Atlantic Cod Genome Represent Polymorphic Chromosomal Rearrangements.

    Science.gov (United States)

    Sodeland, Marte; Jorde, Per Erik; Lien, Sigbjørn; Jentoft, Sissel; Berg, Paul R; Grove, Harald; Kent, Matthew P; Arnyasi, Mariann; Olsen, Esben Moland; Knutsen, Halvor

    2016-04-11

    In several species genetic differentiation across environmental gradients or between geographically separate populations has been reported to center at "genomic islands of divergence," resulting in heterogeneous differentiation patterns across genomes. Here, genomic regions of elevated divergence were observed on three chromosomes of the highly mobile fish Atlantic cod (Gadus morhua) within geographically fine-scaled coastal areas. The "genomic islands" extended at least 5, 9.5, and 13 megabases on linkage groups 2, 7, and 12, respectively, and coincided with large blocks of linkage disequilibrium. For each of these three chromosomes, pairs of segregating, highly divergent alleles were identified, with little or no gene exchange between them. These patterns of recombination and divergence mirror genomic signatures previously described for large polymorphic inversions, which have been shown to repress recombination across extensive chromosomal segments. The lack of genetic exchange permits divergence between noninverted and inverted chromosomes in spite of gene flow. For the rearrangements on linkage groups 2 and 12, allelic frequency shifts between coastal and oceanic environments suggest a role in ecological adaptation, in agreement with recently reported associations between molecular variation within these genomic regions and temperature, oxygen, and salinity levels. Elevated genetic differentiation in these genomic regions has previously been described on both sides of the Atlantic Ocean, and we therefore suggest that these polymorphisms are involved in adaptive divergence across the species distributional range. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. Genomic islands as a marker to differentiate between clinical and environmental Burkholderia pseudomallei.

    Directory of Open Access Journals (Sweden)

    Thanatchaporn Bartpho

    Full Text Available Burkholderia pseudomallei, a saprophytic bacterium that can cause a severe sepsis disease named melioidosis, has preserved several extra genes in its genome for survival. The sequenced genome of the organism showed high diversity contributed mainly by genomic islands (GIs). Comparative genome hybridization (CGH) of 3 clinical and 2 environmental isolates, using whole genome microarrays based on B. pseudomallei K96243 genes, revealed a difference in the presence of genomic islands between clinical and environmental isolates. The largest GI, GI8, of B. pseudomallei was observed as two sub-GIs, named GIs8.1 and 8.2, with distinguishable %GC content and unequal presence in the genome. GIs8.1, 8.2 and 15 were found to be more common in clinical isolates. A new GI, GI16c, was detected on chromosome 2. The presence of GIs8.1, 8.2, 15 and 16c was evaluated in 70 environmental and 64 clinical isolates using PCR assays. A combination of GIs8.1 and 16c (positivity of either GI) was detected in 70% of clinical isolates and 11.4% of environmental isolates (P0.05. Some virulence genes located in the absent GIs and the difference of GIs seems to contribute less to bacterial virulence. The PCR detection of 2 GIs could be used as a cost effective and rapid tool to detect potentially virulent isolates that were contaminated in soil.

  15. PromBase: a web resource for various genomic features and predicted promoters in prokaryotic genomes

    Directory of Open Access Journals (Sweden)

    Bansal Manju

    2011-07-01

    Full Text Available Abstract Background: As more and more genomes are being sequenced, an overview of their genomic features and annotation of their functional elements, which control the expression of each gene or transcription unit of the genome, is a fundamental challenge in genomics and bioinformatics. Findings: Relative stability of DNA sequence has been used to predict promoter regions in 913 microbial genomic sequences with GC-content ranging from 16.6% to 74.9%. Irrespective of the genome GC-content, the relative stability based promoter prediction method has already been proven to be robust in terms of recall and precision. The predicted promoter regions for the 913 microbial genomes have been accumulated in a database called PromBase. Promoter search can be carried out in PromBase either by specifying the gene name or the genomic position. Each predicted promoter region has been assigned to a reliability class (low, medium, high, very high and highest) based on the difference between its average free energy and the downstream region. The recall and precision values for each class are shown graphically in PromBase. In addition, PromBase provides detailed information about base composition, CDS and CG/TA skews for each genome and various DNA sequence dependent structural properties (average free energy, curvature and bendability) in the vicinity of all annotated translation start sites (TLS). Conclusion: PromBase is a database, which contains predicted promoter regions and detailed analysis of various genomic features for 913 microbial genomes. PromBase can serve as a valuable resource for comparative genomics study and help the experimentalist to rapidly access detailed information on various genomic features and putative promoter regions in any given genome. This database is freely accessible for academic and non-academic users via the worldwide web http://nucleix.mbu.iisc.ernet.in/prombase/.

  16. Correction for Measurement Error from Genotyping-by-Sequencing in Genomic Variance and Genomic Prediction Models

    DEFF Research Database (Denmark)

    Ashraf, Bilal; Janss, Luc; Jensen, Just

    Genotyping-by-sequencing (GBSeq) is becoming a cost-effective genotyping platform for species without available SNP arrays. GBSeq involves sequencing short reads from restriction sites covering a limited part of the genome (e.g., 5-10%) with low sequencing depth per individual (e.g., 5-10X per... In the current work we show how the correction for measurement error in GBSeq can also be applied in whole genome genomic variance and genomic prediction models. Bayesian whole-genome random regression models are proposed to allow implementation of large-scale SNP-based models with a per-SNP correction for measurement error. We show correct retrieval of genomic explained variance, and improved genomic prediction when accounting for the measurement error in GBSeq data...

  17. Combining SNPs in latent variables to improve genomic prediction

    DEFF Research Database (Denmark)

    Heuven, Henri C M; Rosa, G J M; Janss, Luc

    The objective of this study was to develop and test hierarchical genomic models with latent variables that represent parts of the genomic values. An interaction model and a chromosome model were compared with a model based on variable selection in a simulated and real dataset. The program Bayz......: Hierarchical genetic model; Predictive value; Gibbs sampling; Variable selection....

  18. Genomic-enabled prediction with classification algorithms.

    Science.gov (United States)

    Ornella, L; Pérez, P; Tapia, E; González-Camacho, J M; Burgueño, J; Zhang, X; Singh, S; Vicente, F S; Bonnett, D; Dreisigacker, S; Singh, R; Long, N; Crossa, J

    2014-06-01

    Pearson's correlation coefficient (ρ) is the most commonly reported metric of the success of prediction in genomic selection (GS). However, in real breeding ρ may not be very useful for assessing the quality of the regression in the tails of the distribution, where individuals are chosen for selection. This research used 14 maize and 16 wheat data sets with different trait-environment combinations. Six different models were evaluated by means of a cross-validation scheme (50 random partitions each, with 90% of the individuals in the training set and 10% in the testing set). The predictive accuracy of these algorithms for selecting individuals belonging to the best α=10, 15, 20, 25, 30, 35, 40% of the distribution was estimated using Cohen's kappa coefficient (κ) and an ad hoc measure, which we call relative efficiency (RE), which indicates the expected genetic gain due to selection when individuals are selected based on GS exclusively. We put special emphasis on the analysis for α=15%, because it is a percentile commonly used in plant breeding programmes (for example, at CIMMYT). We also used ρ as a criterion for overall success. The algorithms used were: Bayesian LASSO (BL), Ridge Regression (RR), Reproducing Kernel Hilbert Spaces (RHKS), Random Forest Regression (RFR), and Support Vector Regression (SVR) with linear (lin) and Gaussian kernels (rbf). The performance of regression methods for selecting the best individuals was compared with that of three supervised classification algorithms: Random Forest Classification (RFC) and Support Vector Classification (SVC) with linear (lin) and Gaussian (rbf) kernels. Classification methods were evaluated using the same cross-validation scheme but with the response vector of the original training sets dichotomised using a given threshold. For α=15%, SVC-lin presented the highest κ coefficients in 13 of the 14 maize data sets, with best values ranging from 0.131 to 0.722 (statistically significant in 9 data sets
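    The top-α comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy data and the choice of α = 0.3 are all assumptions made here, and individuals are labelled by rank rather than by the threshold-based dichotomisation the abstract mentions. Cohen's kappa is computed from its standard definition, (observed agreement - chance agreement) / (1 - chance agreement).

```python
def top_alpha_labels(values, alpha):
    """Label the best alpha fraction (highest values) as 1 and the rest as 0.

    Hypothetical helper: ties are broken by position in the input list.
    """
    n_selected = max(1, round(alpha * len(values)))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    selected = set(ranked[:n_selected])
    return [1 if i in selected else 0 for i in range(len(values))]


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two binary (0/1) label vectors of equal length."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a = sum(labels_a) / n  # fraction labelled 1 in the first vector
    p_b = sum(labels_b) / n  # fraction labelled 1 in the second vector
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        return 1.0  # both vectors are constant and identical
    return (observed - expected) / (1 - expected)


# Toy data (invented for illustration): observed phenotypes and model predictions.
observed_values = [9.1, 7.4, 8.8, 5.0, 6.2, 9.5, 4.3, 7.9, 6.8, 5.5]
predicted_values = [8.7, 7.9, 7.0, 5.2, 6.0, 9.0, 5.1, 8.2, 6.5, 5.9]

truth = top_alpha_labels(observed_values, alpha=0.3)
guess = top_alpha_labels(predicted_values, alpha=0.3)
kappa = cohens_kappa(truth, guess)
```

    In this toy case two of the three truly best individuals are also selected from the predictions, so kappa is positive but well below 1, which is the kind of tail-specific information that an overall correlation does not report.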

  19. A genome-wide map of aberrantly expressed chromosomal islands in colorectal cancer

    Directory of Open Access Journals (Sweden)

    Castanos-Velez Esmeralda

    2006-09-01

    Full Text Available Abstract Background: Cancer development is accompanied by genetic phenomena like deletion and amplification of chromosome parts or alterations of chromatin structure. It is expected that these mechanisms have a strong effect on regional gene expression. Results: We investigated genome-wide gene expression in colorectal carcinoma (CRC) and normal epithelial tissues from 25 patients using oligonucleotide arrays. This allowed us to identify 81 distinct chromosomal islands with aberrant gene expression. Of these, 38 islands show a gain in expression and 43 a loss of expression. In total, 7,892 genes (25.3% of all human genes) are located in aberrantly expressed islands. Many chromosomal regions that are linked to hereditary colorectal cancer show deregulated expression. Also, many known tumor genes localize to chromosomal islands of misregulated expression in CRC. Conclusion: An extensive comparison with published CGH data suggests that chromosomal regions known for frequent deletions in colon cancer tend to show reduced expression. In contrast, regions that are often amplified in colorectal tumors exhibit heterogeneous expression patterns: some even show a decrease of mRNA expression. Because for several islands of deregulated expression chromosomal aberrations have never been observed, we speculate that additional mechanisms (like abnormal states of regional chromatin) also have a substantial impact on the formation of co-expression islands in colorectal carcinoma.

  20. Draft Genome Sequence of Klebsiella michiganensis 3T412C, Harboring an Arsenic Resistance Genomic Island, Isolated from Mine Tailings in Peru

    Science.gov (United States)

    Ccorahua-Santo, Robert; Cervantes, Miguel; Duran, Yerson; Aguirre, Mac; Marin, Claudia

    2017-01-01

    ABSTRACT An arsenic resistance genomic island in the bacterium Klebsiella michiganensis 3T412C was isolated from mine tailings from Peru. This genomic island confers adaptation to extreme environments with high concentrations of arsenic. Isolate 3T412C contained a complete set of genes involved in resistance to arsenic. This operon is surrounded by putative genes for resistance to other heavy metals. PMID:28705974

  1. A hybrid neural network system for prediction and recognition of promoter regions in human genome

    Institute of Scientific and Technical Information of China (English)

    CHEN Chuan-bo; LI Tao

    2005-01-01

    This paper proposes a high specificity and sensitivity algorithm called PromPredictor for recognizing promoter regions in the human genome. PromPredictor extracts compositional features and CpG islands information from genomic sequence, feeding these features as input for a hybrid neural network system (HNN) and then applies the HNN for prediction. It combines a novel promoter recognition model, coding theory, feature selection and dimensionality reduction with machine learning algorithm. Evaluation on Human chromosome 22 was ~66% in sensitivity and ~48% in specificity. Comparison with two other systems revealed that our method had superior sensitivity and specificity in predicting promoter regions. PromPredictor is written in MATLAB and requires MATLAB to run. PromPredictor is freely available at http://www.whtelecom.com/Prompredictor.htm.
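    As an illustration of the kind of compositional and CpG-related features such a predictor might take as input, the sketch below computes GC content and the widely used observed/expected CpG ratio for a sequence window. The abstract does not specify PromPredictor's actual features, so the function name and the choice of these two features are assumptions made here, and the example is written in Python rather than MATLAB.

```python
def cpg_features(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence.

    Hypothetical helper. The ratio is the common definition:
    (number of CpG dinucleotides * length) / (number of C * number of G).
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_content = (c_count + g_count) / n if n else 0.0
    if c_count == 0 or g_count == 0:
        return gc_content, 0.0  # ratio is undefined without both C and G; report 0
    return gc_content, (cpg_count * n) / (c_count * g_count)


# Toy window (invented): 8 bases containing two CpG dinucleotides.
gc, obs_exp = cpg_features("ACGCGTTA")
```

    In practice such values would be computed over sliding windows along the genomic sequence and passed to the classifier together with other features.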

  2. Phylogenetic Relationships of the Fern Cyrtomium falcatum (Dryopteridaceae) from Dokdo Island Based on Chloroplast Genome Sequencing.

    Science.gov (United States)

    Raman, Gurusamy; Choi, Kyoung Su; Park, SeonJoo

    2016-12-02

    Cyrtomium falcatum is a popular ornamental fern cultivated worldwide. Native to the Korean Peninsula, Japan, and Dokdo Island in the Sea of Japan, it is the only fern present on Dokdo Island. We isolated and characterized the chloroplast (cp) genome of C. falcatum, and compared it with those of closely related species. The genes trnV-GAC and trnV-GAU were found to be present within the cp genome of C. falcatum, whereas trnP-GGG and rpl21 were lacking. Moreover, cp genomes of Cyrtomium devexiscapulae and Adiantum capillus-veneris lack trnP-GGG and rpl21, suggesting these are not conserved among angiosperm cp genomes. The deletion of trnR-UCG, trnR-CCG, and trnSeC in the cp genomes of C. falcatum and other eupolypod ferns indicates these genes are restricted to tree ferns, non-core leptosporangiates, and basal ferns. The C. falcatum cp genome also encoded ndhF and rps7, with GUG start codons that were only conserved in polypod ferns, and it shares two significant inversions with other ferns, including a minor inversion of the trnD-GUC region and an approximate 3 kb inversion of the trnG-trnT region. Phylogenetic analyses showed that Equisetum was found to be a sister clade to Psilotales-Ophioglossales with a 100% bootstrap (BS) value. The sister relationship between Pteridaceae and eupolypods was also strongly supported by a 100% BS, but Bayesian molecular clock analyses suggested that C. falcatum diversified in the mid-Paleogene period (45.15 ± 4.93 million years ago) and might have moved from Eurasia to Dokdo Island.

  3. Adaptive divergence despite strong genetic drift: genomic analysis of the evolutionary mechanisms causing genetic differentiation in the island fox (Urocyon littoralis).

    Science.gov (United States)

    Funk, W Chris; Lovich, Robert E; Hohenlohe, Paul A; Hofman, Courtney A; Morrison, Scott A; Sillett, T Scott; Ghalambor, Cameron K; Maldonado, Jesus E; Rick, Torben C; Day, Mitch D; Polato, Nicholas R; Fitzpatrick, Sarah W; Coonan, Timothy J; Crooks, Kevin R; Dillon, Adam; Garcelon, David K; King, Julie L; Boser, Christina L; Gould, Nicholas; Andelt, William F

    2016-05-01

    The evolutionary mechanisms generating the tremendous biodiversity of islands have long fascinated evolutionary biologists. Genetic drift and divergent selection are predicted to be strong on islands and both could drive population divergence and speciation. Alternatively, strong genetic drift may preclude adaptation. We conducted a genomic analysis to test the roles of genetic drift and divergent selection in causing genetic differentiation among populations of the island fox (Urocyon littoralis). This species consists of six subspecies, each of which occupies a different California Channel Island. Analysis of 5293 SNP loci generated using Restriction-site Associated DNA (RAD) sequencing found support for genetic drift as the dominant evolutionary mechanism driving population divergence among island fox populations. In particular, populations had exceptionally low genetic variation, small Ne (range = 2.1-89.7; median = 19.4), and significant genetic signatures of bottlenecks. Moreover, islands with the lowest genetic variation (and, by inference, the strongest historical genetic drift) were most genetically differentiated from mainland grey foxes, and vice versa, indicating genetic drift drives genome-wide divergence. Nonetheless, outlier tests identified 3.6-6.6% of loci as high FST outliers, suggesting that despite strong genetic drift, divergent selection contributes to population divergence. Patterns of similarity among populations based on high FST outliers mirrored patterns based on morphology, providing additional evidence that outliers reflect adaptive divergence. Extremely low genetic variation and small Ne in some island fox populations, particularly on San Nicolas Island, suggest that they may be vulnerable to fixation of deleterious alleles, decreased fitness and reduced adaptive potential.

  4. The effect of genealogy-based haplotypes on genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid; Fernando, Rohan L.; Su, Guosheng

    2013-01-01

    Background: Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods: A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (pi) of the haplotype covariates had zero effect...

  5. Determination and Analysis of the Putative AcaCD-Responsive Promoters of Salmonella Genomic Island 1

    Science.gov (United States)

    Olasz, Ferenc; Kiss, János

    2016-01-01

    The integrative genomic island SGI1 and its variants confer multidrug resistance in numerous Salmonella enterica serovariants and several Proteus mirabilis and Acinetobacter strains. SGI1 is mobilized by the IncA/C family plasmids. The island exploits not only the conjugation apparatus of the plasmid, but also utilizes the plasmid-encoded master regulator AcaCD to induce the excision and formation of its transfer-competent form, which is a key step in the horizontal transfer of SGI1. Triggering of SGI1 excision occurs via the AcaCD-dependent activation of xis gene expression. AcaCD binds in Pxis to an unusually long recognition sequence. Beside the Pxis promoter, upstream regions of four additional SGI1 genes, S004, S005, S012 and S018, also contain putative AcaCD-binding sites. Furthermore, SGI1 also encodes an AcaCD-related activator, FlhDCSGI1, which has no known function. Here, we have analysed the functionality of the putative AcaCD-dependent promoter regions and proved their activation by either AcaCD or FlhDCSGI1. Moreover, we provide evidence that both activators act on the same binding site in Pxis and that FlhDCSGI1 is able to complement the acaCD deletion of the IncA/C family plasmid R16a. We determined the transcription start sites for the AcaCD-responsive promoters and showed that orf S004 is expressed probably from a different start codon than predicted earlier. Additionally, expression of S003 from promoter PS004 was ruled out. Pxis and the four SGI1 promoters examined here also lack obvious -35 promoter box and their promoter profile is consistent with the class II-type activation pathway. Although the role of the four additionally analysed AcaCD/FlhDCSGI1-controlled genes in transfer and/or maintenance of SGI1 is not yet clear, the conservation of the whole region suggests the existence of some selection for their functionality. PMID:27727307

  6. Self-regulating genomic island encoding tandem regulators confers chromatic acclimation to marine Synechococcus.

    Science.gov (United States)

    Sanfilippo, Joseph E; Nguyen, Adam A; Karty, Jonathan A; Shukla, Animesh; Schluchter, Wendy M; Garczarek, Laurence; Partensky, Frédéric; Kehoe, David M

    2016-05-24

    The evolutionary success of marine Synechococcus, the second-most abundant phototrophic group in the marine environment, is partly attributable to this group's ability to use the entire visible spectrum of light for photosynthesis. This group possesses a remarkable diversity of light-harvesting pigments, and most of the group's members are orange and pink because of their use of phycourobilin and phycoerythrobilin chromophores, which are attached to antennae proteins called phycoerythrins. Many strains can alter phycoerythrin chromophore ratios to optimize photon capture in changing blue-green environments using type IV chromatic acclimation (CA4). Although CA4 is common in most marine Synechococcus lineages, the regulation of this process remains unexplored. Here, we show that a widely distributed genomic island encoding tandem master regulators named FciA (for type four chromatic acclimation island) and FciB plays a central role in controlling CA4. FciA and FciB have diametric effects on CA4. Interruption of fciA causes a constitutive green light phenotype, and interruption of fciB causes a constitutive blue light phenotype. These proteins regulate all of the molecular responses occurring during CA4, and the proteins' activity is apparently regulated posttranscriptionally, although their cellular ratio appears to be critical for establishing the set point for the blue-green switch in ecologically relevant light environments. Surprisingly, FciA and FciB coregulate only three genes within the Synechococcus genome, all located within the same genomic island as fciA and fciB. These findings, along with the widespread distribution of strains possessing this island, suggest that horizontal transfer of a small, self-regulating DNA region has conferred CA4 capability to marine Synechococcus throughout many oceanic areas.

  7. Genome-Wide Prediction of C. elegans Genetic Interactions

    OpenAIRE

    Zhong, Weiwei; Sternberg, Paul W.

    2006-01-01

    To obtain a global view of functional interactions among genes in a metazoan genome, we computationally integrated interactome data, gene expression data, phenotype data, and functional annotation data from three model organisms—Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster—and predicted genome-wide genetic interactions in C. elegans. The resulting genetic interaction network (consisting of 18,183 interactions) provides a framework for system-level understandin...

  8. Network Based Prediction Model for Genomics Data Analysis*

    OpenAIRE

    Huang, Ying; Wang, Pei

    2012-01-01

    Biological networks, such as genetic regulatory networks and protein interaction networks, provide important information for studying gene/protein activities. In this paper, we propose a new method, NetBoosting, for incorporating a priori biological network information in analyzing high dimensional genomics data. Specifically, we are interested in constructing prediction models for disease phenotypes of interest based on genomics data, and at the same time identifying disease susceptible genes. ...

  9. Using Genome-scale Models to Predict Biological Capabilities

    DEFF Research Database (Denmark)

    O’Brien, Edward J.; Monk, Jonathan M.; Palsson, Bernhard O.

    2015-01-01

    Constraint-based reconstruction and analysis (COBRA) methods at the genome scale have been under development since the first whole-genome sequences appeared in the mid-1990s. A few years ago, this approach began to demonstrate the ability to predict a range of cellular functions, including cellular growth capabilities on various substrates and the effect of gene knockouts at the genome scale. Thus, much interest has developed in understanding and applying these methods to areas such as metabolic engineering, antibiotic design, and organismal and enzyme evolution. This Primer will get you started....

  10. Predicting Tissue-Specific Enhancers in the Human Genome

    Energy Technology Data Exchange (ETDEWEB)

    Pennacchio, Len A.; Loots, Gabriela G.; Nobrega, Marcelo A.; Ovcharenko, Ivan

    2006-07-01

    Determining how transcriptional regulatory signals are encoded in vertebrate genomes is essential for understanding the origins of multi-cellular complexity; yet the genetic code of vertebrate gene regulation remains poorly understood. In an attempt to elucidate this code, we synergistically combined genome-wide gene expression profiling, vertebrate genome comparisons, and transcription factor binding site analysis to define sequence signatures characteristic of candidate tissue-specific enhancers in the human genome. We applied this strategy to microarray-based gene expression profiles from 79 human tissues and identified 7,187 candidate enhancers that defined their flanking gene expression, the majority of which were located outside of known promoters. We cross-validated this method for its ability to de novo predict tissue-specific gene expression and confirmed its reliability in 57 of the 79 available human tissues, with an average precision in enhancer recognition ranging from 32 percent to 63 percent, and a sensitivity of 47 percent. We used the sequence signatures identified by this approach to assign tissue-specific predictions to ~328,000 human-mouse conserved noncoding elements in the human genome. By overlapping these genome-wide predictions with a large in vivo dataset of enhancers validated in transgenic mice, we confirmed our results with a 28 percent sensitivity and 50 percent precision. These results indicate the power of combining complementary genomic datasets as an initial computational foray into the global view of tissue-specific gene regulation in vertebrates.

  11. Genome-wide prediction of C. elegans genetic interactions.

    Science.gov (United States)

    Zhong, Weiwei; Sternberg, Paul W

    2006-03-10

    To obtain a global view of functional interactions among genes in a metazoan genome, we computationally integrated interactome data, gene expression data, phenotype data, and functional annotation data from three model organisms-Saccharomyces cerevisiae, Caenorhabditis elegans, and Drosophila melanogaster-and predicted genome-wide genetic interactions in C. elegans. The resulting genetic interaction network (consisting of 18,183 interactions) provides a framework for system-level understanding of gene functions. We experimentally tested the predicted interactions for two human disease-related genes and identified 14 new modifiers.

  12. Genomic prediction of traits related to canine hip dysplasia

    Directory of Open Access Journals (Sweden)

    Enrique Sanchez-Molano

    2015-03-01

    Full Text Available Increased concern for the welfare of pedigree dogs has led to development of selection programs against inherited diseases. An example is canine hip dysplasia (CHD), which has a moderate heritability and a high prevalence in some large-sized breeds. To date, selection using phenotypes has led to only modest improvement, and alternative strategies such as genomic selection may prove more effective. The primary aims of this study were to compare the performance of pedigree- and genomic-based breeding against CHD in the UK Labrador retriever population and to evaluate the performance of different genomic selection methods. A sample of 1179 Labrador Retrievers evaluated for CHD according to the UK scoring method (hip score, HS) was genotyped with the Illumina CanineHD BeadChip. Twelve functions of HS and its component traits were analyzed using different statistical methods (GBLUP, Bayes C and Single-Step methods), and results were compared with a pedigree-based approach (BLUP) using cross-validation. Genomic methods resulted in similar or higher accuracies than pedigree-based methods with training sets of 944 individuals for all but the untransformed HS, suggesting that genomic selection is an effective strategy. GBLUP and Bayes C gave similar prediction accuracies for HS and related traits, indicating a polygenic architecture. This conclusion was also supported by the low accuracies obtained in additional GBLUP analyses performed using only the SNPs with highest test statistics, also indicating that marker-assisted selection would not be as effective as genomic selection. A Single-Step method that combines genomic and pedigree information also showed higher accuracy than GBLUP and Bayes C for the log-transformed HS, which is currently used for pedigree-based evaluations in the UK.
In conclusion, genomic selection is a promising alternative to pedigree-based selection against CHD, requiring more phenotypes with genomic data to improve further the accuracy

  13. Using Genetic Distance to Infer the Accuracy of Genomic Prediction.

    Directory of Open Access Journals (Sweden)

    Marco Scutari

    2016-09-01

    Full Text Available The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest; and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.

  14. A Bayesian Network to Predict Barrier Island Geomorphologic Characteristics

    Science.gov (United States)

    Gutierrez, B.; Plant, N. G.; Thieler, E. R.; Turecek, A.; Stippa, S.

    2014-12-01

    Understanding how barrier islands along the Atlantic and Gulf coasts of the United States respond to storms and sea-level rise is an important management concern. Although these threats are well recognized, quantifying the integrated vulnerability is challenging due to the range of time and space scales over which these processes act. Developing datasets and methods to identify the physical vulnerabilities of coastal environments due to storms and sea-level rise thus is an important scientific focus that supports land management decision making. Here we employ a Bayesian Network (BN) to model the interactions between geomorphic variables sampled from existing datasets that capture both storm- and sea-level-rise-related coastal evolution. The BN provides a means of estimating probabilities of changes in specific geomorphic characteristics such as foredune crest height, beach width, beach height, given knowledge of barrier island width, maximum barrier island elevation, distance from an inlet, the presence of anthropogenic modifications, and long-term shoreline change rates, which we assume to be directly related to sea-level rise. We evaluate BN skill and explore how different constraints, such as shoreline change characteristics (eroding, stable, accreting), distance to nearby inlets and island width, affect the probability distributions of future morphological characteristics. Our work demonstrates that a skillful BN can be constructed and that factors such as distance to inlet, shoreline change rate, and the presence of human alterations have the strongest influences on network performance. For Assateague Island, Maryland/Virginia, USA, we find that different shoreline change behaviors affect the probabilities of specific geomorphic characteristics, such as dune height, which allows us to identify vulnerable locations on the barrier island where habitat or infrastructure may be vulnerable to storms and sea-level rise.

  15. Genomic diversity and differentiation of a managed island wild boar population

    DEFF Research Database (Denmark)

    Iacolina, Laura; Scandura, Massimo; J. Goedbloed, Daniel;

    2016-01-01

    The evolution of island populations in natural systems is driven by local adaptation and genetic drift. However, evolutionary pathways may be altered by humans in several ways. The wild boar (WB) (Sus scrofa) is an iconic game species occurring in several islands, where it has been strongly managed since prehistoric times. We examined genomic diversity at 49 803 single-nucleotide polymorphisms in 99 Sardinian WBs and compared them with 196 wild specimens from mainland Europe and 105 domestic pigs (DP; 11 breeds). High levels of genetic variation were observed in Sardinia (80.9% of the total number of polymorphisms), which can be only in part associated to recent genetic introgression. Both Principal Component Analysis and Bayesian clustering approach revealed that the Sardinian WB population is highly differentiated from the other European populations (FST=0.126–0.138), and from DP (FST=0...

  16. Genomic Island Location in Acinetobacter baumannii Strains by tRIP-PCR Technique

    Directory of Open Access Journals (Sweden)

    Suhadsaad

    2013-11-01

    Full Text Available This study was performed to detect the presence of genomic islands, which usually insert in tRNA genes and other non-coding RNA genes. Eight strains of Acinetobacter baumannii (AYE, A457, A14, A424, A473, A92, ACICU, A25) were tested using the tRIP-PCR method (tRNA site Interrogation for Pathogenicity islands, prophages and other GIs). PCR and agarose gel electrophoresis results were obtained for the eight strains at two loci, #7 and #24. Screening of locus #7 showed that all strains were positive except A. baumannii A457, which was negative. Screening of locus #24 showed the presence of foreign DNA in A. baumannii AYE, A457, A14, A424, A473 and A92, whereas the result for ACICU and A25 was reported as positive.

  17. A Genomic Island in Salmonella enterica ssp. salamae provides new insights on the genealogy of the locus of enterocyte effacement.

    Science.gov (United States)

    Chandry, P Scott; Gladman, Simon; Moore, Sean C; Seemann, Torsten; Crandall, Keith A; Fegan, Narelle

    2012-01-01

    The genomic island encoding the locus of enterocyte effacement (LEE) is an important virulence factor of the human pathogenic Escherichia coli. LEE typically encodes a type III secretion system (T3SS) and secreted effectors capable of forming attaching and effacing lesions. Although prominent in the pathogenic E. coli such as serotype O157:H7, LEE has also been detected in Citrobacter rodentium, E. albertii, and although not confirmed, it is likely to also be in Shigella boydii. Previous phylogenetic analysis of LEE indicated the genomic island was evolving through stepwise acquisition of various components. This study describes a new LEE region from two strains of Salmonella enterica subspecies salamae serovar Sofia along with a phylogenetic analysis of LEE that provides new insights into the likely evolution of this genomic island. The Salmonella LEE contains 36 of the 41 genes typically observed in LEE within a genomic island of 49,371 bp that encodes a total of 54 genes. A phylogenetic analysis was performed on the entire T3SS and four T3SS genes (escF, escJ, escN, and escV) to elucidate the genealogy of LEE. Phylogenetic analysis inferred that the previously known LEE islands are members of a single lineage distinct from the new Salmonella LEE lineage. The previously known lineage of LEE diverged between islands found in Citrobacter and those in Escherichia and Shigella. Although recombination and horizontal gene transfer are important factors in the genealogy of most genomic islands, the phylogeny of the T3SS of LEE can be interpreted with a bifurcating tree. It seems likely that the LEE island entered the Enterobacteriaceae through horizontal gene transfer as a single unit, rather than as separate subsections, which was then subjected to the forces of both mutational change and recombination.

  18. A Genomic Island in Salmonella enterica ssp. salamae provides new insights on the genealogy of the locus of enterocyte effacement.

    Directory of Open Access Journals (Sweden)

    P Scott Chandry

    Full Text Available The genomic island encoding the locus of enterocyte effacement (LEE) is an important virulence factor of the human pathogenic Escherichia coli. LEE typically encodes a type III secretion system (T3SS) and secreted effectors capable of forming attaching and effacing lesions. Although prominent in the pathogenic E. coli such as serotype O157:H7, LEE has also been detected in Citrobacter rodentium, E. albertii, and although not confirmed, it is likely to also be in Shigella boydii. Previous phylogenetic analysis of LEE indicated the genomic island was evolving through stepwise acquisition of various components. This study describes a new LEE region from two strains of Salmonella enterica subspecies salamae serovar Sofia along with a phylogenetic analysis of LEE that provides new insights into the likely evolution of this genomic island. The Salmonella LEE contains 36 of the 41 genes typically observed in LEE within a genomic island of 49,371 bp that encodes a total of 54 genes. A phylogenetic analysis was performed on the entire T3SS and four T3SS genes (escF, escJ, escN, and escV) to elucidate the genealogy of LEE. Phylogenetic analysis inferred that the previously known LEE islands are members of a single lineage distinct from the new Salmonella LEE lineage. The previously known lineage of LEE diverged between islands found in Citrobacter and those in Escherichia and Shigella. Although recombination and horizontal gene transfer are important factors in the genealogy of most genomic islands, the phylogeny of the T3SS of LEE can be interpreted with a bifurcating tree. It seems likely that the LEE island entered the Enterobacteriaceae through horizontal gene transfer as a single unit, rather than as separate subsections, which was then subjected to the forces of both mutational change and recombination.

  19. Why close a bacterial genome? The plasmid of Alteromonas macleodii HOT1A3 is a vector for inter-specific transfer of a flexible genomic island

    Directory of Open Access Journals (Sweden)

    Eduard Fadeev

    2016-03-01

    Full Text Available Genome sequencing is rapidly becoming a staple technique in environmental and clinical microbiology, yet computational challenges still remain, leading to many draft genomes which are typically fragmented into many contigs. We sequenced and completely assembled the genome of a marine heterotrophic bacterium, Alteromonas macleodii HOT1A3, and compared its full genome to several draft genomes obtained using different reference-based and de-novo methods. In general, the de-novo assemblies clearly outperformed the reference-based or hybrid ones, covering >99% of the genes and representing essentially all of the gene functions. However, only the fully closed genome (~4.5 Mbp) allowed us to identify the presence of a large, 148 kbp plasmid, pAM1A3. While HOT1A3 belongs to Alteromonas macleodii, typically found in surface waters (surface ecotype), this plasmid consists of an almost complete flexible genomic island, containing many genes involved in metal resistance previously identified in the genomes of Alteromonas mediterranea (deep ecotype). Indeed, similar to A. mediterranea, A. macleodii HOT1A3 grows at concentrations of zinc, mercury and copper that are inhibitory for other A. macleodii strains. The presence of a plasmid encoding almost an entire flexible genomic island suggests that wholesale genomic exchange between heterotrophic marine bacteria belonging to related but ecologically different populations is not uncommon.

  20. Mobilisation and remobilisation of a large archetypal pathogenicity island of uropathogenic Escherichia coli in vitro support the role of conjugation for horizontal transfer of genomic islands

    Directory of Open Access Journals (Sweden)

    Hochhut Bianca

    2011-09-01

    Full Text Available Abstract. Background: A substantial amount of data has been accumulated supporting the important role of genomic islands (GEIs), including pathogenicity islands (PAIs), in bacterial genome plasticity and the evolution of bacterial pathogens. Their instability and the high level sequence similarity of different (partial) islands suggest an exchange of PAIs between strains of the same or even different bacterial species by horizontal gene transfer (HGT). Transfer events of archetypal large genomic islands of enterobacteria which often lack genes required for mobilisation or transfer have been rarely investigated so far. Results: To study mobilisation of such large genomic regions in prototypic uropathogenic E. coli (UPEC) strain 536, PAI II536 was supplemented with the mobRP4 region, an origin of replication (oriVR6K), an origin of transfer (oriTRP4) and a chloramphenicol resistance selection marker. In the presence of helper plasmid RP4, conjugative transfer of the 107-kb PAI II536 construct occurred from strain 536 into an E. coli K-12 recipient. In transconjugants, PAI II536 existed either as a cytoplasmic circular intermediate (CI) or integrated site-specifically into the recipient's chromosome at the leuX tRNA gene. This locus is the chromosomal integration site of PAI II536 in UPEC strain 536. From the E. coli K-12 recipient, the chromosomal PAI II536 construct as well as the CIs could be successfully remobilised and inserted into leuX in a PAI II536 deletion mutant of E. coli 536. Conclusions: Our results corroborate that mobilisation and conjugal transfer may contribute to evolution of bacterial pathogens through horizontal transfer of large chromosomal regions such as PAIs. Stabilisation of these mobile genetic elements in the bacterial chromosome results from selective loss of mobilisation and transfer functions of genomic islands.

  1. Genomic Prediction of Testcross Performance in Canola (Brassica napus).

    Directory of Open Access Journals (Sweden)

    Habib U Jan

    Full Text Available Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, the potential of genomic selection methods to predict testcross performance for hybrid canola breeding was applied for various agronomic traits based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits including seedling emergence, days to flowering, lodging, oil yield and seed yield along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population as well as within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81) followed by oil yield (0.75) and lowest for seedling emergence (0.29). For seed yield, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately; a strategy which only led to moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. Testcross performance prediction using genome-wide SNP markers shows
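
    The ridge-regression prediction with repeated cross-validation described in this record can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names (`solve`, `ridge_fit`, `cv_accuracy`, `simulate`), the penalty `lam`, the 80/20 split and the simulated genotypes are all assumptions made for the example, and accuracy is taken to be the Pearson correlation between predicted and observed phenotypes in the held-out set.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ridge_fit(X, y, lam):
    """Marker effects from (X'X + lam*I) b = X'y on centered data."""
    n, p = len(X), len(X[0])
    mu_x = [sum(row[j] for row in X) / n for j in range(p)]
    mu_y = sum(y) / n
    Xc = [[row[j] - mu_x[j] for j in range(p)] for row in X]
    A = [[sum(r[a] * r[b] for r in Xc) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * (y[i] - mu_y) for i in range(n)) for a in range(p)]
    return solve(A, rhs), mu_x, mu_y

def predict(X, effects, mu_x, mu_y):
    """Predicted phenotype = training mean + sum of centered marker effects."""
    return [mu_y + sum((row[j] - mu_x[j]) * e for j, e in enumerate(effects))
            for row in X]

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def cv_accuracy(X, y, lam, n_rounds=10, test_frac=0.2, seed=0):
    """Mean held-out correlation over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = max(2, round(test_frac * len(y)))
    accs = []
    for _ in range(n_rounds):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        effects, mu_x, mu_y = ridge_fit([X[i] for i in train],
                                        [y[i] for i in train], lam)
        preds = predict([X[i] for i in test], effects, mu_x, mu_y)
        accs.append(pearson(preds, [y[i] for i in test]))
    return sum(accs) / len(accs)

def simulate(n=120, p=20, noise_sd=1.0, seed=1):
    """Hypothetical data: 0/1/2 genotype codes and an additive trait."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(p)]
    X = [[rng.choice((0, 1, 2)) for _ in range(p)] for _ in range(n)]
    y = [sum(x * e for x, e in zip(row, true_effects)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, y
```

    Calling `cv_accuracy(*simulate(), lam=1.0)` returns one averaged accuracy for the toy trait. In practice the penalty is usually derived from estimated variance components rather than fixed by hand, which this sketch does not attempt.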

  2. An assessment on epitope prediction methods for protozoa genomes

    Directory of Open Access Journals (Sweden)

    Resende Daniela M

    2012-11-01

    Full Text Available Abstract. Background: Epitope prediction using computational methods represents one of the most promising approaches to vaccine development. Reduction of time, cost, and the availability of completely sequenced genomes are key points and highly motivating regarding the use of reverse vaccinology. Parasites of genus Leishmania are widely spread and they are the etiologic agents of leishmaniasis. Currently, there is no efficient vaccine against this pathogen and the drug treatment is highly toxic. The lack of sufficiently large datasets of experimentally validated parasite epitopes represents a serious limitation, especially for trypanosomatid genomes. In this work we highlight the predictive performances of several algorithms that were evaluated through the development of a MySQL database built with the purpose of: (a) evaluating individual algorithms' prediction performances and their combination for CD8+ T cell epitopes, B-cell epitopes and subcellular localization by means of AUC (Area Under Curve) performance and a threshold-dependent method that employs a confusion matrix; (b) integrating data from experimentally validated and in silico predicted epitopes; and (c) integrating the subcellular localization predictions and experimental data. NetCTL, NetMHC, BepiPred, BCPred12, and AAP12 algorithms were used for in silico epitope prediction and WoLF PSORT, Sigcleave and TargetP for in silico subcellular localization prediction against trypanosomatid genomes. Results: A database-driven epitope prediction method was developed with built-in functions that were capable of: (a) removing experimental data redundancy; (b) parsing algorithm predictions and storing experimentally validated and predicted data; and (c) evaluating algorithm performances. Results show that a better performance is achieved when the combined prediction is considered. 
This is particularly true for B cell epitope predictors, where the combined prediction of AAP12 and BCPred12 reached an AUC value

  3. Pan-Genome Analysis of Human Gastric Pathogen H. pylori: Comparative Genomics and Pathogenomics Approaches to Identify Regions Associated with Pathogenicity and Prediction of Potential Core Therapeutic Targets

    Directory of Open Access Journals (Sweden)

    Amjad Ali

    2015-01-01

    Full Text Available Helicobacter pylori is a human gastric pathogen implicated as the major cause of peptic ulcer and second leading cause of gastric cancer (~70%) around the world. Conversely, an increased resistance to antibiotics and hindrances in the development of vaccines against H. pylori are observed. Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan-genome approach; the predicted conserved gene families (1,193) constitute ~77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. A total of 28 nonhost homolog proteins were characterized as universal therapeutic targets against H. pylori based on their functional annotation and protein-protein interaction. Finally, pathogenomics and genome plasticity analysis revealed 3 highly conserved and 2 highly variable putative pathogenicity islands in all of the H. pylori genomes analyzed.

  4. [Prediction in medicine--genome contra envirome].

    Science.gov (United States)

    Brdicka, Radim

    2012-01-01

    Human phenotype is governed by its genotype--a set of genetic information materialized in DNA. Using traditional terminology we speak about a little more than 20 thousand genes that differ in their strength of expression, and their effect is modified by a large number of other genes. The result originates from firmly established programmes we obtained from our ancestors. Development and activity of such molecules selected for maintenance, copying and transfer of information, i.e. nucleic acids, can be followed back to the very origin of life. Nevertheless the final result is achieved not only by confrontation of the original information with other genetic information but largely also by external influences--environment. Though we are relatively successful in understanding what we have inherited from our parents, our knowledge of environmental factors and their effects on formation of the phenotype is still limited. From this point of view medical prediction has always to be very cautious and interpretations at the probability level must be done by a very experienced and responsible professional.

  5. Identification of genes and genomic islands correlated with high pathogenicity in Streptococcus suis using whole genome tiling microarrays.

    Directory of Open Access Journals (Sweden)

    Xiao Zheng

    Full Text Available Streptococcus suis is an important zoonotic pathogen that can cause meningitis and sepsis in both pigs and humans. Infections in humans have been sporadic worldwide but two severe outbreaks occurred in China in recent years, while infections in pigs are a major problem in the swine industry. Some S. suis strains are more pathogenic than others with 2 sequence types (ST), ST1 and ST7, being well recognized as highly pathogenic. We analyzed 31 isolates from 23 serotypes and 25 STs by NimbleGen tiling microarray using the genome of a high pathogenicity (HP) ST1 strain, GZ1, as reference and a new algorithm to detect gene content difference. The number of genes absent in a strain ranged from 49 to 225 with a total of 632 genes absent in at least one strain, while 1346 genes were found to be invariably present in all strains as the core genome of S. suis, accounting for 68% of the GZ1 genome. The majority of genes are located in chromosomal blocks with two or more contiguous genes. Sixty two blocks are absent in two or more strains and defined as regions of difference (RDs), among which 26 are putative genomic islands (GIs). Clustering and statistical analyses revealed that 8 RDs including 6 putative GIs and 21 genes within these RDs are significantly associated with HP. Three RDs encode known virulence related factors including the extracellular factor, the capsular polysaccharide and a SrtF pilus. The strains were divided into 5 groups based on population genetic analysis of multilocus sequence typing data and the distribution of the RDs among the groups revealed gain and loss of RDs in different groups. Our study elucidated the gene content diversity of S. suis and identified genes that potentially promote HP.
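
    The gene-content bookkeeping in this record (genes absent per strain, genes absent in at least one strain, and the core genome relative to a reference) reduces to set operations once presence calls are available. The sketch below assumes presence/absence has already been decided per gene; the function name `gene_content_summary` and the toy gene and strain labels are hypothetical, and the array-based detection algorithm itself is not reproduced.

```python
def gene_content_summary(reference_genes, present_by_strain):
    """Summarize gene content of test strains against a reference gene set.

    reference_genes   -- set of gene identifiers in the reference genome
    present_by_strain -- dict mapping strain name to the set of reference
                         genes called present in that strain
    """
    # Genes of the reference that are missing from each strain.
    absent_by_strain = {
        strain: reference_genes - present
        for strain, present in present_by_strain.items()
    }
    # Core genome: reference genes present in every strain.
    core = set(reference_genes)
    for present in present_by_strain.values():
        core &= present
    # Variable genes: absent in at least one strain.
    absent_in_any = reference_genes - core
    return {
        "absent_counts": {s: len(g) for s, g in absent_by_strain.items()},
        "core": core,
        "absent_in_any": absent_in_any,
        "core_fraction": len(core) / len(reference_genes),
    }

# Toy example with made-up gene labels.
reference = {"g1", "g2", "g3", "g4"}
strains = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g3", "g4"}}
summary = gene_content_summary(reference, strains)
```

    Grouping the variable genes into contiguous chromosomal blocks, as done for the regions of difference, would additionally need gene order along the reference, which this sketch leaves out.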

  6. Description of genomic islands associated to the multidrug-resistant Pseudomonas aeruginosa clone ST277.

    Science.gov (United States)

    Silveira, Melise Chaves; Albano, Rodolpho Mattos; Asensi, Marise Dutra; Carvalho-Assef, Ana Paula D'Alincourt

    2016-08-01

    Multidrug-resistant Pseudomonas aeruginosa clone ST277 is disseminated in Brazil where it is mainly associated with the presence of metallo-β-lactamase SPM-1. Furthermore, it carries the class I integron In163 and a 16S rRNA methylase rmtD that confers aminoglycoside resistance. To analyze the genetic characteristics that might be responsible for the success of this endemic clone, genomes of four P. aeruginosa strains that were isolated in distinct years and in different Brazilian states were sequenced. The strains differed regarding the presence of the genes blaSPM-1 and rmtD. Genomic comparisons that included genomes of other clones that have spread worldwide from this species were also performed. These analyses revealed a 763,863bp region in the P. aeruginosa chromosome that concentrates acquired genetic structures comprising two new genomic islands (PAGI-13 and PAGI-14), a mobile element that could be used for ST277 fingerprinting and a recently reported Integrative and Conjugative Element (ICE) associated to blaSPM-1. The genetic elements rmtD and In163 are inserted in PAGI-13 while PAGI-14 has genes encoding proteins related to type III restriction system and phages. The data reported in this study provide a basis for a clearer understanding of the genetic content of clone ST277 and illustrate the mechanisms that are responsible for the success of these endemic clones.

  7. Contrast features of CpG islands in the promoter and other regions in the dog genome.

    Science.gov (United States)

    Han, Leng; Zhao, Zhongming

    2009-08-01

    The recent release of the domestic dog genome provides us with an ideal opportunity to investigate dog-specific genomic features. In this study, we performed a systematic analysis of CpG islands (CGIs), which are often considered gene markers, in the dog genome. Relative to the human and mouse genomes, the dog genome has a remarkably large number of CGIs and high CGI density, which is contributed by its noncoding sequences. Surprisingly, the dog genome has fewer CGIs associated with the promoter regions of genes than the human or the mouse. Further examination of functional features of dog-human-mouse homologous genes suggests that the dog might have undergone a faster erosion rate of promoter-associated CGIs than the human or mouse. Some genetic or genomic factors such as local recombination rate and karyotype may be related to the unique dog CGI features.
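
    For readers unfamiliar with how CGIs are called, a window of sequence is typically scored by its G+C fraction and its observed/expected CpG ratio. The sketch below uses the widely cited Gardiner-Garden and Frommer thresholds (length at least 200 bp, G+C at least 50%, observed/expected CpG at least 0.6) as an assumption; the abstract does not say which criteria were applied in this study, and the function names are illustrative.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cg = seq.count("CG")  # CpG dinucleotides cannot overlap themselves
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N.
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def is_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, G+C and observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

    A genome scan would slide such a window along each chromosome and merge adjacent qualifying windows; stricter thresholds are sometimes preferred to reduce hits in repeats.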

  8. An Aeromonas caviae Genomic Island Is Required for both O-Antigen Lipopolysaccharide Biosynthesis and Flagellin Glycosylation ▿

    OpenAIRE

    Tabei, S. Mohammed B.; Hitchen, Paul G.; Day-Williams, Michaela J.; Merino, Susana; Vart, Richard; Pang, Poh-Choo; Horsburgh, Gavin J.; Viches, Silvia; Wilhelms, Markus; Tomás, Juan M.; Dell, Anne; Shaw, Jonathan G

    2009-01-01

    Aeromonas caviae Sch3N possesses a small genomic island that is involved in both flagellin glycosylation and lipopolysaccharide (LPS) O-antigen biosynthesis. This island appears to have been laterally acquired as it is flanked by insertion element-like sequences and has a much lower G+C content than the average aeromonad G+C content. Most of the gene products encoded by the island are orthologues of proteins that have been shown to be involved in pseudaminic acid biosynthesis and flagellin gl...

  9. An infinitesimal model for quantitative trait genomic value prediction.

    Directory of Open Access Journals (Sweden)

    Zhiqiu Hu

    We developed a marker-based infinitesimal model for quantitative trait analysis. In contrast to the classical infinitesimal model, we now have new information about the segregation of every individual locus of the entire genome. Under this new model, we propose that the genetic effect of an individual locus is a function of the genome location (a continuous quantity). The overall genetic value of an individual is the weighted integral of the genetic effect function along the genome. Numerical integration is performed to find the integral, which requires partitioning the entire genome into a finite number of bins. Each bin may contain many markers. The integral is approximated by the weighted sum of all the bin effects. We now turn the problem of marker analysis into bin analysis so that the model dimension has decreased from a virtual infinity to a finite number of bins. This new approach can efficiently handle a virtually unlimited number of markers without marker selection. The marker-based infinitesimal model requires high linkage disequilibrium of all markers within a bin. For populations with low or no linkage disequilibrium, we develop an adaptive infinitesimal model. Both the original and the adaptive models are tested using simulated data as well as beef cattle data. The simulated data analysis shows that there is always an optimal number of bins at which the predictability of the bin model is much greater than the original marker analysis. Results of the beef cattle data analysis indicate that the bin model can increase the predictability from 10% (multiple marker analysis) to 33% (multiple bin analysis). The marker-based infinitesimal model paves a way towards the solution of genetic mapping and genomic selection using the whole genome sequence data.
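
    The bin approximation described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the equal-width bins, the use of the mean marker genotype as the bin genotype, and the names `bin_markers` and `genetic_value` are all assumptions made for the example.

```python
def bin_markers(positions, genotypes, n_bins, genome_length):
    """Average marker genotype codes within equal-width bins (assumed scheme)."""
    width = genome_length / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, code in zip(positions, genotypes):
        # Clamp so a marker at the very end of the genome falls in the last bin.
        idx = min(int(pos / width), n_bins - 1)
        sums[idx] += code
        counts[idx] += 1
    # Empty bins contribute a genotype of 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def genetic_value(bin_genotypes, bin_effects, bin_weights):
    """Weighted sum of bin effects, standing in for the integral over the genome."""
    return sum(w * e * z for w, e, z in zip(bin_weights, bin_effects, bin_genotypes))


# Toy data: three markers on a genome of length 100, split into two bins.
bins = bin_markers([10, 20, 60], [1, -1, 1], n_bins=2, genome_length=100)
print(bins)
print(genetic_value(bins, bin_effects=[0.5, 2.0], bin_weights=[1.0, 1.0]))
```

    The point of the sketch is the change in dimension: however many markers are supplied, the model only ever sees `n_bins` genotype values.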

  10. Predicting human genetic interactions from cancer genome evolution.

    Directory of Open Access Journals (Sweden)

    Xiaowen Lu

    Synthetic lethal (SL) genetic interactions play a key role in various types of biological research, ranging from understanding genotype-phenotype relationships to identifying drug targets against cancer. Despite recent advances in empirically measuring SL interactions in human cells, the human genetic interaction map is far from complete. Here, we present a novel approach to predict this map by exploiting patterns in cancer genome evolution. First, we show that empirically determined SL interactions are reflected in various gene presence, absence, and duplication patterns in hundreds of cancer genomes. The most evident pattern that we discovered is that when one member of an SL interaction gene pair is lost, the other gene tends not to be lost, i.e. the absence of co-loss. This observation is in line with expectation, because the loss of an SL interacting pair will be lethal to the cancer cell. SL interactions are also reflected in gene expression profiles, such as an underrepresentation of cases where the genes in an SL pair are both underexpressed, and an overrepresentation of cases where one gene of an SL pair is underexpressed while the other one is overexpressed. We integrated the various previously unknown cancer genome patterns and the gene expression patterns into a computational model to identify SL pairs. This simple, genome-wide model achieves a high prediction power (AUC = 0.75) for known genetic interactions. It allows us to present for the first time a comprehensive genome-wide list of SL interactions with a high estimated prediction precision, covering up to 591,000 gene pairs. This unique list can potentially be used in various application areas ranging from biotechnology to medical genetics.
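
    The AUC reported in this abstract can be read as the probability that a randomly chosen known SL pair receives a higher score than a randomly chosen non-SL pair. A minimal Python sketch of that pairwise definition follows; the function name `auc_from_scores` and the toy scores are illustrative assumptions and have nothing to do with the authors' model or data.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Pairwise (Mann-Whitney style) estimate of the area under the ROC curve."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0   # positive ranked above negative
            elif pos == neg:
                wins += 0.5   # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical model scores for known SL pairs and for non-SL pairs.
known_sl = [0.9, 0.8]
non_sl = [0.1, 0.85]
print(auc_from_scores(known_sl, non_sl))
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The quadratic loop is fine for a sketch; a rank-based formulation would be preferable for hundreds of thousands of gene pairs.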

  11. Genomic prediction for tuberculosis resistance in dairy cattle.

    Directory of Open Access Journals (Sweden)

    Smaragda Tsairidou

    BACKGROUND: The increasing prevalence of bovine tuberculosis (bTB) in the UK and the limitations of the currently available diagnostic and control methods require the development of complementary approaches to assist in the sustainable control of the disease. One potential approach is the identification of animals that are genetically more resistant to bTB, to enable breeding of animals with enhanced resistance. This paper focuses on prediction of resistance to bTB. We explore estimation of direct genomic estimated breeding values (DGVs) for bTB resistance in UK dairy cattle, using dense SNP chip data, and test these genomic predictions for situations when disease phenotypes are not available on selection candidates. METHODOLOGY/PRINCIPAL FINDINGS: We estimated DGVs using genomic best linear unbiased prediction methodology, and assessed their predictive accuracies with a cross validation procedure and receiver operator characteristic (ROC) curves. Furthermore, these results were compared with theoretical expectations for prediction accuracy and area-under-the-ROC-curve (AUC). The dataset comprised 1151 Holstein-Friesian cows (bTB cases or controls). All individuals (592 cases and 559 controls) were genotyped for 727,252 loci (Illumina Bead Chip). The estimated observed heritability of bTB resistance was 0.23±0.06 (0.34 on the liability scale), and five-fold cross validation, replicated six times, provided a prediction accuracy of 0.33 (95% C.I.: 0.26, 0.40). ROC curves, and the resulting AUC, gave a probability of 0.58, averaged across six replicates, of correctly classifying cows as diseased or as healthy based on SNP chip genotype alone using these data. CONCLUSIONS/SIGNIFICANCE: These results provide a first step in the investigation of the potential feasibility of genomic selection for bTB resistance using SNP data. Specifically, they demonstrate that genomic selection is possible, even in populations with no pedigree data and on animals lacking b

  12. Psoriasis prediction from genome-wide SNP profiles

    Directory of Open Access Journals (Sweden)

    Fang Xiangzhong

    2011-01-01

    Background: With the availability of large-scale genome-wide association study (GWAS) data, choosing an optimal set of SNPs for disease susceptibility prediction is a challenging task. This study aimed to use single nucleotide polymorphisms (SNPs) to predict psoriasis by searching GWAS data. Methods: In total, we had 2,798 samples and 451,724 SNPs. The process of searching for a set of SNPs to predict susceptibility to psoriasis consisted of two steps. The first was to search the GWAS dataset for the top 1,000 SNPs with high accuracy for prediction of psoriasis. The second was to search for an optimal SNP subset for predicting psoriasis. The sequential information bottleneck (sIB) method was compared with classical linear discriminant analysis (LDA) for classification performance. Results: The best test harmonic mean of sensitivity and specificity for predicting psoriasis by sIB was 0.674 (95% CI: 0.650-0.698), while only 0.520 (95% CI: 0.472-0.524) was reported for predicting disease by LDA. Our results indicate that the new classifier sIB performs better than LDA in the study. Conclusions: The fact that a small set of SNPs can predict disease status with average accuracy of 68% makes it possible to use SNP data for psoriasis prediction.
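
    The performance measure used in this abstract, the harmonic mean of sensitivity and specificity, can be computed from a confusion matrix as sketched below. The function name and the example counts are hypothetical and are not taken from the study.

```python
def harmonic_mean_sens_spec(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    sensitivity = tp / (tp + fn)   # fraction of cases called correctly
    specificity = tn / (tn + fp)   # fraction of controls called correctly
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


# Hypothetical counts: 100 cases (80 detected) and 100 controls (60 cleared).
print(harmonic_mean_sens_spec(tp=80, fn=20, tn=60, fp=40))
```

    Unlike plain accuracy, this measure is low whenever either sensitivity or specificity is low, so a classifier cannot score well by labelling every sample as the majority class.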

  13. Whole-genome sequence of Sunxiuqinia dokdonensis DH1T, isolated from deep sub-seafloor sediment in Dokdo Island

    OpenAIRE

    Sooyeon Lim; Dong-Ho Chang; Byoung-Chan Kim

    2016-01-01

    Sunxiuqinia dokdonensis DH1T was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000.

  14. Whole-genome sequence of Sunxiuqinia dokdonensis DH1(T), isolated from deep sub-seafloor sediment in Dokdo Island.

    Science.gov (United States)

    Lim, Sooyeon; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-09-01

    Sunxiuqinia dokdonensis DH1(T) was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000.

  15. Whole-genome sequence of Sunxiuqinia dokdonensis DH1T, isolated from deep sub-seafloor sediment in Dokdo Island

    Directory of Open Access Journals (Sweden)

    Sooyeon Lim

    2016-09-01

    Sunxiuqinia dokdonensis DH1T was isolated from deep sub-seafloor sediment at a depth of 900 m below the seafloor off Seo-do (the west part of Dokdo Island) in the East Sea of the Republic of Korea and subjected to whole genome sequencing on HiSeq platform and annotated on RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession LGIA00000000.

  16. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.

    Science.gov (United States)

    Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S

    2015-06-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome

  17. Predicting disease trait with genomic data: a composite kernel approach.

    Science.gov (United States)

    Yang, Haitao; Li, Shaoyu; Cao, Hongyan; Zhang, Chichen; Cui, Yuehua

    2016-06-02

    With the advancement of biotechniques, a vast amount of genomic data is generated with no limit. Predicting a disease trait based on these data offers a cost-effective and time-efficient way for early disease screening. Here we proposed a composite kernel partial least squares (CKPLS) regression model for quantitative disease trait prediction focusing on genomic data. It can efficiently capture nonlinear relationships among features compared with linear learning algorithms such as Least Absolute Shrinkage and Selection Operator or ridge regression. We proposed to optimize the kernel parameters and kernel weights with the genetic algorithm (GA). In addition to improved performance for parameter optimization, the proposed GA-CKPLS approach also has better learning capacity and generalization ability compared with single kernel-based KPLS method as well as other nonlinear prediction models such as the support vector regression. Extensive simulation studies demonstrated that GA-CKPLS had better prediction performance than its counterparts under different scenarios. The utility of the method was further demonstrated through two case studies. Our method provides an efficient quantitative platform for disease trait prediction based on increasing volume of omics data.

  18. Genomic prediction in a breeding program of perennial ryegrass

    DEFF Research Database (Denmark)

    Fé, Dario; Ashraf, Bilal; Greve-Pedersen, Morten

    2015-01-01

    We present a genomic selection study performed on 1918 ryegrass families (Lolium perenne L.), which were derived from a commercial breeding program at DLF-Trifolium, Denmark. Phenotypes were recorded on standard plots, across 13 years and in 6 different countries. Variants were identified...... this set. Estimated Breeding Value and prediction accuracies were calculated through two different cross-validation schemes: (i) k-fold (k=10); (ii) leaving out one parent combination at a time, in order to test for accuracy of predicting new families. Accuracies ranged between 0.56 and 0.97 for scheme (i....... A larger set of 1791 F2s were used as training set to predict EBVs of 127 synthetic families (originated from poly-crosses between 5-11 single plants) for heading date and crown rust resistance. Prediction accuracies were 0.93 and 0.57 respectively. Results clearly demonstrate considerable potential...

  19. Accurate Localization of the Integration Sites of Two Genomic Islands at Single-Nucleotide Resolution in the Genome of Bacillus cereus ATCC 10987

    Directory of Open Access Journals (Sweden)

    Ren Zhang

    2008-01-01

    We have identified two genomic islands, that is, BCEGI-1 and BCEGI-2, in the genome of Bacillus cereus ATCC 10987, based on comparative analysis with Bacillus cereus ATCC 14579. Furthermore, by using the cumulative GC profile and performing homology searches between the two genomes, the integration sites of the two genomic islands were determined at single-nucleotide resolution. BCEGI-1 is integrated between 159705 bp and 198000 bp, whereas BCEGI-2 is integrated between the end of ORF BCE4594 and the start of the intergenic sequence immediately following BCE4626, that is, from 4256803 bp to 4285534 bp. BCEGI-1 harbors two bacterial Tn7 transposons, which have two sets of genes encoding TnsA, B, C, and D. It is generally believed that unlike the TnsABC+E pathway, the TnsABC+D pathway would only promote vertical transmission to daughter cells. The evidence presented in this paper, however, suggests a role of the TnsABC+D pathway in the horizontal transfer of some genomic islands.
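
    The abstract relies on a cumulative GC profile to locate island boundaries. The Python sketch below shows one simplified way to build such a profile; it is an assumption-laden illustration rather than the authors' procedure. In particular, the sign convention (A/T adds one, G/C subtracts one) and the removal of the trend with a straight line through the origin and the end point are choices made for this example.

```python
def cumulative_gc_profile(seq):
    """Detrended running count of (A+T) minus (G+C) along a sequence.

    With this convention the profile rises through AT-rich stretches and
    falls through GC-rich stretches, so turning points mark positions where
    the local GC content changes.
    """
    running = 0
    cumulative = []
    for base in seq.upper():
        if base in "AT":
            running += 1
        elif base in "GC":
            running -= 1
        cumulative.append(running)   # ambiguous bases leave the count unchanged
    if not cumulative:
        return []
    slope = cumulative[-1] / len(cumulative)   # average trend over the sequence
    return [z - slope * (i + 1) for i, z in enumerate(cumulative)]


# Toy sequence: a GC-rich block (indices 4-7) embedded in AT-rich flanks.
profile = cumulative_gc_profile("AAAA" + "GGGG" + "AAAA")
print(profile.index(max(profile)), profile.index(min(profile)))
```

    In the toy sequence the profile peaks at the last base before the GC-rich block and bottoms out at the last base inside it, which is the kind of sharp turning point that boundary detection looks for.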

  20. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Su, Guosheng; Janss, Luc

    2015-01-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected...... itself. Depending on the trait’s economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage...... was observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from...

  1. Antibiotic resistance, integrons and Salmonella genomic island 1 among non-typhoidal Salmonella serovars in The Netherlands.

    NARCIS (Netherlands)

    Vo, An T T; Duijkeren, Engeline van; Fluit, Ad C; Wannet, Wim J B; Verbruggen, Anjo J; Maas, Henny M E; Gaastra, Wim

    2006-01-01

    The objective of this study was to investigate the antimicrobial resistance patterns, integron characteristics and gene cassettes as well as the presence of Salmonella genomic island 1 (SGI1) in non-typhoidal Salmonella (NTS) isolates from human and animal origin. Epidemiologically unrelated Dutch N

  2. Identification of a novel genomic island specific to hospital-acquired clonal complex 17 Enterococcus faecium isolates.

    Science.gov (United States)

    Heikens, Esther; van Schaik, Willem; Leavis, Helen L; Bonten, Marc J M; Willems, Rob J L

    2008-11-01

    Hospital-acquired clonal complex 17 (CC17) Enterococcus faecium strains are genetically distinct from indigenous strains and are enriched with resistance genes and virulence genes. We identified a genomic island in CC17 E. faecium tentatively encoding a metabolic pathway involved in carbohydrate transport and metabolism, which may provide a competitive advantage over the indigenous E. faecium microbiota.

  3. Identification of a Novel Genomic Island Specific to Hospital-Acquired Clonal Complex 17 Enterococcus faecium Isolates

    NARCIS (Netherlands)

    Heikens, Esther; van Schaik, Willem; Leavis, Helen L.; Bonten, Marc J. M.; Willems, Rob J. L.

    2008-01-01

    Hospital-acquired clonal complex 17 (CC17) Enterococcus faecium strains are genetically distinct from indigenous strains and are enriched with resistance genes and virulence genes. We identified a genomic island in CC17 E. faecium tentatively encoding a metabolic pathway involved in carbohydrate tra

  4. A genomic island provides Acidithiobacillus ferrooxidans ATCC 53993 additional copper resistance: a possible competitive advantage.

    Science.gov (United States)

    Orellana, Luis H; Jerez, Carlos A

    2011-11-01

    There is great interest in understanding how extremophilic biomining bacteria adapt to exceptionally high copper concentrations in their environment. Acidithiobacillus ferrooxidans ATCC 53993 genome possesses the same copper resistance determinants as strain ATCC 23270. However, the former strain contains in its genome a 160-kb genomic island (GI), which is absent in ATCC 23270. This GI contains, amongst other genes, several genes coding for an additional putative copper ATPase and a Cus system. A. ferrooxidans ATCC 53993 showed a much higher resistance to CuSO(4) (>100 mM) than that of strain ATCC 23270 (<25 mM). When a similar number of bacteria from each strain were mixed and allowed to grow in the absence of copper, their respective final numbers remained approximately equal. However, in the presence of copper, there was a clear overgrowth of strain ATCC 53993 compared to ATCC 23270. This behavior is most likely explained by the presence of the additional copper-resistance genes in the GI of strain ATCC 53993. As determined by qRT-PCR, it was demonstrated that these genes are upregulated when A. ferrooxidans ATCC 53993 is grown in the presence of copper and were shown to be functional when expressed in copper-sensitive Escherichia coli mutants. Thus, the reason for resistance to copper of two strains of the same acidophilic microorganism could be determined by slight differences in their genomes, which may not only lead to changes in their capacities to adapt to their environment, but may also help to select the more fit microorganisms for industrial biomining operations. © Springer-Verlag 2011

  5. A hypervariable genomic island identified in clinical and environmental Mycobacterium avium subsp. hominissuis isolates from Germany.

    Science.gov (United States)

    Sanchini, Andrea; Semmler, Torsten; Mao, Lei; Kumar, Narender; Dematheis, Flavia; Tandon, Kshitij; Peddireddy, Vidyullatha; Ahmed, Niyaz; Lewin, Astrid

    2016-11-01

    Mycobacterium avium subsp. hominissuis (MAH) is an opportunistic human pathogen widespread in the environment. Genomic islands (GIs) represent a part of the accessory genome of bacteria and influence virulence, drug resistance or fitness and trigger bacterial evolution. We previously identified a novel GI in four MAH genomes. Here, we further explored this GI in a larger collection of MAH isolates from Germany (n=41), including 20 clinical and 21 environmental isolates. Based on comparative whole genome analysis, we detected this GI in 39/41 (95.1%) isolates. Although all these GIs integrated in the same insertion hotspot, there is high variability in the genetic structure of this GI: eight different types of GI have been identified, designated A-H (sized 6.2-73.3 kb). These GIs were arranged as a single GI (23/41, 56.1%), a combination of two different GIs (14/41, 34.1%) or a combination of three different GIs (2/41, 4.9%) in the insertion hotspot. Moreover, two GI types shared more than 80% sequence identity with sequences of M. canettii, which causes tuberculosis. A total of 253 different genes were identified in all GIs, among them the previously documented virulence-related genes mmpL10 and mce. The diversity of the GI and the sequence similarity with other mycobacteria suggest cross-species transfer, also involving highly pathogenic species. Shuffling of potential virulence genes such as mmpL10 via this GI may create new pathogens that can cause future outbreaks. Copyright © 2016 Elsevier GmbH. All rights reserved.

  6. Antimicrobial resistance, class 1 integrons, and genomic island 1 in Salmonella isolates from Vietnam.

    Directory of Open Access Journals (Sweden)

    An T T Vo

    BACKGROUND: The objective was to investigate the phenotypic and genotypic resistance and the horizontal transfer of resistance determinants from Salmonella isolates from humans and animals in Vietnam. METHODOLOGY/PRINCIPAL FINDINGS: The susceptibility of 297 epidemiologically unrelated non-typhoid Salmonella isolates was investigated by disk diffusion assay. The isolates were screened for the presence of class 1 integrons and Salmonella genomic island 1 by PCR. The potential for the transfer of resistance determinants was investigated by conjugation experiments. Resistance to gentamicin, kanamycin, chloramphenicol, streptomycin, trimethoprim, ampicillin, nalidixic acid, sulphonamides, and tetracycline was found in 13 to 50% of the isolates. Nine distinct integron types were detected in 28% of the isolates belonging to 11 Salmonella serovars including S. Tallahassee. Gene cassettes identified were aadA1, aadA2, aadA5, bla(PSE-1), bla(OXA-30), dfrA1, dfrA12, dfrA17, and sat, as well as open reading frames with unknown functions. Most integrons were located on conjugative plasmids, which can transfer their antimicrobial resistance determinants to Escherichia coli or Salmonella Enteritidis, or with Salmonella Genomic Island 1 or its variants. The resistance gene cluster in serovar Emek identified by PCR mapping and nucleotide sequencing contained SGI1-J3 which is integrated in SGI1 at another position than the majority of SGI1. This is the second report on the insertion of SGI1 at this position. High-level resistance to fluoroquinolones was found in 3 multiresistant S. Typhimurium isolates and was associated with mutations in the gyrA gene leading to the amino acid changes Ser83Phe and Asp87Asn. CONCLUSIONS: Resistance was common among Vietnamese Salmonella isolates from different sources. Legislation to enforce a more prudent use of antibiotics in both human and veterinary medicine should be implemented by the authorities in Vietnam.

  7. MobilomeFINDER: Web-Based Tools for In Silico and Experimental Discovery of Bacterial Genomic Islands

    OpenAIRE

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Hinton, Jay C. D.; Barer, Michael R.; Deng, Zixin; Rajakumar, Kumar; Lory, Stephen

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’....

  8. Cloning and sequencing of a genomic island found in the Brazilian purpuric fever clone of Haemophilus influenzae biogroup aegyptius.

    Science.gov (United States)

    McGillivary, Glen; Tomaras, Andrew P; Rhodes, Eric R; Actis, Luis A

    2005-04-01

    A genomic island was identified in the Haemophilus influenzae biogroup aegyptius Brazilian purpuric fever (BPF) strain F3031. This island, which was also found in other BPF isolates, could not be detected in non-BPF biogroup aegyptius strains or in nontypeable or typeable H. influenzae strains, with the exception of a region present in the type b Eagan strain. This 34,378-bp island is inserted, in reference to H. influenzae Rd KW20, within a choline transport gene and contains a mosaic structure of Mu-like prophage genes, several hypothetical genes, and genes potentially encoding an Erwinia carotovora carotovoricin Er-like bacteriocin. The product of the tail fiber ORF in the bacteriocin-like region shows a hybrid structure where the C terminus is similar to an H. influenzae phage HP1 tail protein implicating this open reading frame in altering host specificity for a putative bacteriocin. Significant synteny is seen in the entire genomic island with genomic regions from Salmonella enterica subsp. enterica serovar Typhi CT18, Photorhabdus luminescens subsp. laumondii TT01, Chromobacterium violaceum, and to a lesser extent Haemophilus ducreyi 35000HP. In a previous work, we isolated several BPF-specific DNA fragments through a genome subtraction procedure, and we have found that a majority of these fragments map to this locus. In addition, several subtracted fragments generated from an independent laboratory by using different but related strains also map to this island. These findings underscore the importance of this BPF-specific chromosomal region in explaining some of the genomic differences between highly invasive BPF strains and non-BPF isolates of biogroup aegyptius.

  9. Predicting genome-wide redundancy using machine learning

    Directory of Open Access Journals (Sweden)

    Shasha Dennis E

    2010-11-01

    Background: Gene duplication can lead to genetic redundancy, which masks the function of mutated genes in genetic analyses. Methods to increase sensitivity in identifying genetic redundancy can improve the efficiency of reverse genetics and lend insights into the evolutionary outcomes of gene duplication. Machine learning techniques are well suited to classifying gene family members into redundant and non-redundant gene pairs in model species where sufficient genetic and genomic data are available, such as Arabidopsis thaliana, the test case used here. Results: Machine learning techniques that combine multiple attributes led to a dramatic improvement in predicting genetic redundancy over single trait classifiers alone, such as BLAST E-values or expression correlation. In withholding analysis, one of the methods used here, Support Vector Machines, was two-fold more precise than single attribute classifiers, reaching a level where the majority of redundant calls were correctly labeled. Using this higher confidence in identifying redundancy, machine learning predicts that about half of all genes in Arabidopsis showed the signature of predicted redundancy with at least one but typically less than three other family members. Interestingly, a large proportion of predicted redundant gene pairs were relatively old duplications (e.g., Ks > 1), suggesting that redundancy is stable over long evolutionary periods. Conclusions: Machine learning predicts that most genes will have a functionally redundant paralog but will exhibit redundancy with relatively few genes within a family. The predictions and gene pair attributes for Arabidopsis provide a new resource for research in genetics and genome evolution. These techniques can now be applied to other organisms.

  10. KRAS Genomic Status Predicts the Sensitivity of Ovarian Cancer Cells to Decitabine | Office of Cancer Genomics

    Science.gov (United States)

    Decitabine, a cancer therapeutic that inhibits DNA methylation, produces variable antitumor response rates in patients with solid tumors that might be leveraged clinically with identification of a predictive biomarker. In this study, we profiled the response of human ovarian, melanoma, and breast cancer cells treated with decitabine, finding that RAS/MEK/ERK pathway activation and DNMT1 expression correlated with cytotoxic activity. Further, we showed that KRAS genomic status predicted decitabine sensitivity in low-grade and high-grade serous ovarian cancer cells.

  11. Genomic Signal Processing: Predicting Basic Molecular Biological Principles

    Science.gov (United States)

    Alter, Orly

    2005-03-01

    Advances in high-throughput technologies enable acquisition of different types of molecular biological data, monitoring the flow of biological information as DNA is transcribed to RNA, and RNA is translated to proteins, on a genomic scale. Future discovery in biology and medicine will come from the mathematical modeling of these data, which hold the key to fundamental understanding of life on the molecular level, as well as answers to questions regarding diagnosis, treatment and drug development. Recently we described data-driven models for genome-scale molecular biological data, which use singular value decomposition (SVD) and the comparative generalized SVD (GSVD). Now we describe an integrative data-driven model, which uses pseudoinverse projection (1). We also demonstrate the predictive power of these matrix algebra models (2). The integrative pseudoinverse projection model formulates any number of genome-scale molecular biological data sets in terms of one chosen set of data samples, or of profiles extracted mathematically from data samples, designated the "basis" set. The mathematical variables of this integrative model, the pseudoinverse correlation patterns that are uncovered in the data, represent independent processes and corresponding cellular states (such as observed genome-wide effects of known regulators or transcription factors, the biological components of the cellular machinery that generate the genomic signals, and measured samples in which these regulators or transcription factors are over- or underactive). Reconstruction of the data in the basis simulates experimental observation of only the cellular states manifest in the data that correspond to those of the basis. Classification of the data samples according to their reconstruction in the basis, rather than their overall measured profiles, maps the cellular states of the data onto those of the basis, and gives a global picture of the correlations and possibly also causal coordination of

  12. Predicting statistical properties of open reading frames in bacterial genomes.

    Directory of Open Access Journals (Sweden)

    Katharina Mir

    Full Text Available An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes, such as codon composition and sequence length of all reading frames, was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
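
    To give a feel for how GC content can shape expected ORF length, the following is a minimal sketch of a simple null model. It is not the analytical model of the record above: it assumes independent nucleotides with P(G) = P(C) and P(A) = P(T), and the function names are hypothetical. Under those assumptions the number of codons up to and including the first stop codon is geometrically distributed.

```python
def stop_codon_probability(gc_content: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA.

    Assumption (illustrative only): nucleotides are independent,
    with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.
    """
    if not 0.0 < gc_content < 1.0:
        raise ValueError("gc_content must lie strictly between 0 and 1")
    p_at = (1.0 - gc_content) / 2.0  # P(A) and P(T)
    p_gc = gc_content / 2.0          # P(G) and P(C)
    p_taa = p_at * p_at * p_at
    p_tag = p_at * p_at * p_gc
    p_tga = p_at * p_gc * p_at
    return p_taa + p_tag + p_tga


def expected_orf_length_codons(gc_content: float) -> float:
    """Mean of the geometric distribution: codons up to and including the first stop."""
    return 1.0 / stop_codon_probability(gc_content)


def orf_survival(gc_content: float, n_codons: int) -> float:
    """Probability that a random frame contains no stop in its first n_codons codons."""
    return (1.0 - stop_codon_probability(gc_content)) ** n_codons


if __name__ == "__main__":
    # The GC range quoted in the abstract, plus the uniform case.
    for gc in (0.21, 0.50, 0.74):
        print(gc, round(expected_orf_length_codons(gc), 1))
```

    In this toy model, GC-rich sequences contain fewer stop codons by chance and therefore longer random ORFs, which is why a length-based expectation needs to account for base composition before long ORFs are called unexpected.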

  13. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

    Directory of Open Access Journals (Sweden)

    Vijaykumar Yogesh Muley

    Full Text Available BACKGROUND: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. METHODS: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. CONCLUSIONS: Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling

  14. Genome sequence of Bradyrhizobium sp. WSM1253; a microsymbiont of Ornithopus compressus from the Greek Island of Sifnos.

    Science.gov (United States)

    Tiwari, Ravi; Howieson, John; Yates, Ron; Tian, Rui; Held, Britanny; Tapia, Roxanne; Han, Cliff; Seshadri, Rekha; Reddy, T B K; Huntemann, Marcel; Pati, Amrita; Woyke, Tanja; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Reeve, Wayne

    2015-01-01

    Bradyrhizobium sp. WSM1253 is a novel N2-fixing bacterium isolated from a root nodule of the herbaceous annual legume Ornithopus compressus that was growing on the Greek Island of Sifnos. WSM1253 emerged as a strain of interest in an Australian program that was selecting inoculant quality bradyrhizobial strains for inoculation of Mediterranean species of lupins (Lupinus angustifolius, L. princei, L. atlanticus, L. pilosus). In this report we describe, for the first time, the genome sequence information and annotation of this legume microsymbiont. The 8,719,808 bp genome has a G + C content of 63.09 % with 71 contigs arranged into two scaffolds. The assembled genome contains 8,432 protein-coding genes, 66 RNA genes and a single rRNA operon. This improved-high-quality draft rhizobial genome is one of 20 sequenced through a DOE Joint Genome Institute 2010 Community Sequencing Project.

  15. Genomic medicine and risk prediction across the disease spectrum.

    Science.gov (United States)

    Kotze, Maritha J; Lückhoff, Hilmar K; Peeters, Armand V; Baatjes, Karin; Schoeman, Mardelle; van der Merwe, Lize; Grant, Kathleen A; Fisher, Leslie R; van der Merwe, Nicole; Pretorius, Jacobus; van Velden, David P; Myburgh, Ettienne J; Pienaar, Fredrieka M; van Rensburg, Susan J; Yako, Yandiswa Y; September, Alison V; Moremi, Kelebogile E; Cronje, Frans J; Tiffin, Nicki; Bouwens, Christianne S H; Bezuidenhout, Juanita; Apffelstaedt, Justus P; Hough, F Stephen; Erasmus, Rajiv T; Schneider, Johann W

    2015-01-01

    Genomic medicine is based on the knowledge that virtually every medical condition, disease susceptibility or response to treatment is caused, regulated or influenced by genes. Genetic testing may therefore add value across the disease spectrum, ranging from single-gene disorders with a Mendelian inheritance pattern to complex multi-factorial diseases. The critical factors for genomic risk prediction are to determine: (1) where the genomic footprint of a particular susceptibility or dysfunction resides within this continuum, and (2) to what extent the genetic determinants are modified by environmental exposures. Regarding the small subset of highly penetrant monogenic disorders, a positive family history and early disease onset are mostly sufficient to determine the appropriateness of genetic testing in the index case and to inform pre-symptomatic diagnosis in at-risk family members. In more prevalent polygenic non-communicable diseases (NCDs), the use of appropriate eligibility criteria is required to ensure a balance between benefit and risk. An additional screening step may therefore be necessary to identify individuals most likely to benefit from genetic testing. This need provided the stimulus for the development of a pathology-supported genetic testing (PSGT) service as a new model for the translational implementation of genomic medicine in clinical practice. PSGT is linked to the establishment of a research database proven to be an invaluable resource for the validation of novel and previously described gene-disease associations replicated in the South African population for a broad range of NCDs associated with increased cardio-metabolic risk. The clinical importance of inquiry concerning family history in determining eligibility for personalized genotyping was supported beyond its current limited role in diagnosing or screening for monogenic subtypes of NCDs. 
    With the recent introduction of advanced microarray-based breast cancer subtyping, genetic testing

  16. Genome-wide association studies in an isolated founder population from the Pacific Island of Kosrae.

    Directory of Open Access Journals (Sweden)

    Jennifer K Lowe

    2009-02-01

    Full Text Available It has been argued that the limited genetic diversity and reduced allelic heterogeneity observed in isolated founder populations facilitates discovery of loci contributing to both Mendelian and complex disease. A strong founder effect, severe isolation, and substantial inbreeding have dramatically reduced genetic diversity in natives from the island of Kosrae, Federated States of Micronesia, who exhibit a high prevalence of obesity and other metabolic disorders. We hypothesized that genetic drift and possibly natural selection on Kosrae might have increased the frequency of previously rare genetic variants with relatively large effects, making these alleles readily detectable in genome-wide association analysis. However, mapping in large, inbred cohorts introduces analytic challenges, as extensive relatedness between subjects violates the assumptions of independence upon which traditional association test statistics are based. We performed genome-wide association analysis for 15 quantitative traits in 2,906 members of the Kosrae population, using novel approaches to manage the extreme relatedness in the sample. As positive controls, we observe association to known loci for plasma cholesterol, triglycerides, and C-reactive protein and to compelling candidate loci for thyroid stimulating hormone and fasting plasma glucose. We show that our study is well powered to detect common alleles explaining ≥5% of phenotypic variance. However, no such large effects were observed with genome-wide significance, arguing that even in such a severely inbred population, common alleles typically have modest effects. Finally, we show that a majority of common variants discovered in Caucasians have indistinguishable effect sizes on Kosrae, despite the major differences in population genetics and environment.

  17. The evolution of genomic imprinting: theories, predictions and empirical tests.

    Science.gov (United States)

    Patten, M M; Ross, L; Curley, J P; Queller, D C; Bonduriansky, R; Wolf, J B

    2014-08-01

    The epigenetic phenomenon of genomic imprinting has motivated the development of numerous theories for its evolutionary origins and genomic distribution. In this review, we examine the three theories that have best withstood theoretical and empirical scrutiny. These are: Haig and colleagues' kinship theory; Day and Bonduriansky's sexual antagonism theory; and Wolf and Hager's maternal-offspring coadaptation theory. These theories have fundamentally different perspectives on the adaptive significance of imprinting. The kinship theory views imprinting as a mechanism to change gene dosage, with imprinting evolving because of the differential effect that gene dosage has on the fitness of matrilineal and patrilineal relatives. The sexual antagonism and maternal-offspring coadaptation theories view genomic imprinting as a mechanism to modify the resemblance of an individual to its two parents, with imprinting evolving to increase the probability of expressing the fitter of the two alleles at a locus. In an effort to stimulate further empirical work on the topic, we carefully detail the logic and assumptions of all three theories, clarify the specific predictions of each and suggest tests to discriminate between these alternative theories for why particular genes are imprinted.

  18. Short communication : Validation of genomic breeding value predictions for feed intake and feed efficiency traits

    NARCIS (Netherlands)

    Pryce, J.E.; Wales, W.J.; Haas, de Y.; Veerkamp, R.F.; Hayes, B.J.; Coffey, M.P.; Marett, L.C.; Bornhill, J.B.; Gonzalez-Recio, O.

    2014-01-01

    Validating genomic prediction equations in independent populations is an important part of evaluating genomic selection. Published genomic predictions from 2 studies on (1) residual feed intake and (2) dry matter intake (DMI) were validated in a cohort of 78 multiparous Holsteins from Australia. The

  19. Rapid detection by multiplex PCR of Genomic Islands, prophages and Integrative Conjugative Elements in V. cholerae 7th pandemic variants.

    Science.gov (United States)

    Spagnoletti, Matteo; Ceccarelli, Daniela; Colombo, Mauro M

    2012-01-01

    Vibrio cholerae poses a threat to human health, and new epidemic variants have been reported so far. Seventh pandemic V. cholerae strains are characterized by highly related genomic sequences but can be discriminated by a large set of Genomic Islands, phages and Integrative Conjugative Elements. Classical serotyping and biotyping methods do not easily discriminate among new variants arising worldwide, therefore the establishment of new methods for their identification is required. We developed a multiplex PCR assay for the rapid detection of the major 7th pandemic variants of V. cholerae O1 and O139. Three specific genomic islands (GI-12, GI-14 and GI-15), two phages (Kappa and TLC), Vibrio Seventh Pandemic Island 2 (VSP-II), and the ICEs of the SXT/R391 family were selected as targets of our multiplex PCR based on a comparative genomic approach. The optimization and specificity of the multiplex PCR were assessed on 5 V. cholerae 7th pandemic reference strains, and 34 other V. cholerae strains from various epidemic events were analyzed to validate the reliability of our method. This assay had sufficient specificity to identify twelve different V. cholerae genetic profiles, and therefore has the potential to be used as a rapid screening method.

  20. Conjugative transfer and cis-mobilization of a genomic island by an integrative and conjugative element of Streptococcus agalactiae.

    Science.gov (United States)

    Puymège, Aurore; Bertin, Stéphane; Chuzeville, Sarah; Guédon, Gérard; Payot, Sophie

    2013-03-01

    Putative integrative and conjugative elements (ICEs), i.e., genomic islands which could excise, self-transfer by conjugation, and integrate into the chromosome of the bacterial host strain, were previously identified by in silico analysis in the sequenced genomes of Streptococcus agalactiae (M. Brochet et al., J. Bacteriol. 190:6913-6917, 2008). We investigated here the mobility of the elements integrated into the 3' end of a tRNA(Lys) gene. Three of the four putative ICEs tested were found to excise but only one (ICE_515_tRNA(Lys)) was found to transfer by conjugation not only to S. agalactiae strains but also to a Streptococcus pyogenes strain. Transfer was observed even if recipient cell already carries a related resident ICE or a genomic island flanked by attL and attR recombination sites but devoid of conjugation or recombination genes (CIs-Mobilizable Element [CIME]). The incoming ICE preferentially integrates into the 3' end of the tRNA(Lys) gene (i.e., the attR site of the resident element), leading to a CIME-ICE structure. Transfer of the whole composite element CIME-ICE was obtained, showing that the CIME is mobilizable in cis by the ICE. Therefore, genomic islands carrying putative virulence genes but lacking the mobility gene can be mobilized by a related ICE after site-specific accretion.

  1. Automated protein function prediction--the genomic challenge.

    Science.gov (United States)

    Friedberg, Iddo

    2006-09-01

    Overwhelmed with genomic data, biologists are facing the first big post-genomic question--what do all genes do? First, not only is the volume of pure sequence and structure data growing, but its diversity is growing as well, leading to a disproportionate growth in the number of uncharacterized gene products. Consequently, established methods of gene and protein annotation, such as homology-based transfer, are annotating less data and in many cases are amplifying existing erroneous annotation. Second, there is a need for a functional annotation which is standardized and machine readable so that function prediction programs could be incorporated into larger workflows. This is problematic due to the subjective and contextual definition of protein function. Third, there is a need to assess the quality of function predictors. Again, the subjectivity of the term 'function' and the various aspects of biological function make this a challenging effort. This article briefly outlines the history of automated protein function prediction and surveys the latest innovations in all three topics.

  2. Probabilistic protein function prediction from heterogeneous genome-wide data.

    Directory of Open Access Journals (Sweden)

    Naoki Nariai

    Full Text Available Dramatic improvements in high throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone at the 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.
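
    The assumption that graph neighbors tend to share function can be illustrated with a plain neighbor-voting sketch. This is not the probabilistic model of the record above; it is a much simpler stand-in, and the protein identifiers, term labels and function names below are hypothetical.

```python
from collections import Counter, defaultdict


def build_linkage_graph(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighbor_vote(graph, annotations, protein):
    """Score each term by the fraction of annotated neighbors that carry it.

    Neighbors without any annotation are ignored, so the score is a
    fraction of informative neighbors rather than of all neighbors.
    """
    annotated = [n for n in graph.get(protein, ()) if n in annotations]
    if not annotated:
        return {}
    counts = Counter(term for n in annotated for term in annotations[n])
    return {term: count / len(annotated) for term, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data: P1 is the uncharacterized query protein.
    edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4")]
    annotations = {"P2": {"term_A"}, "P3": {"term_A", "term_B"}}
    graph = build_linkage_graph(edges)
    print(neighbor_vote(graph, annotations, "P1"))
```

    A probabilistic formulation would replace the raw fractions with posterior probabilities and combine them with the other evidence types listed in the abstract; the voting scheme only shows where the graph-neighbor assumption enters.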

  3. The Pacific Rat Race to Easter Island: Tracking the Prehistoric Dispersal of Rattus exulans Using Ancient Mitochondrial Genomes

    Directory of Open Access Journals (Sweden)

    Katrina West

    2017-05-01

    Full Text Available The location of the immediate eastern Polynesian origin for the settlement of Easter Island (Rapa Nui) remains unclear, with conflicting archeological and linguistic evidence. Previous genetic commensal research using the Pacific rat, Rattus exulans, a species transported by humans across Remote Oceania and throughout the Polynesian Triangle, has identified broad interaction spheres across the region. However, there has been limited success in distinguishing finer-scale movements between Remote Oceanic islands as the same mitochondrial control region haplotype has been identified in the majority of ancient rat specimens. To improve molecular resolution and identify a pattern of prehistoric dispersal to Easter Island, we sequenced complete mitochondrial genomes from ancient Pacific rat specimens obtained from early archeological contexts across West and East Polynesia. Ancient Polynesian rat haplotypes are closely related and reflect the widely supported scenario of a central East Polynesian homeland region from which eastern expansion occurred. An Easter Island and Tubuai (Austral Islands) grouping of related haplotypes suggests that both islands were established by the same colonization wave, proposed to have originated in the central homeland region before dispersing through the south-eastern corridor of East Polynesia.

  4. A distinct and divergent lineage of genomic island-associated Type IV Secretion Systems in Legionella.

    Science.gov (United States)

    Wee, Bryan A; Woolfit, Megan; Beatson, Scott A; Petty, Nicola K

    2013-01-01

    Legionella encodes multiple classes of Type IV Secretion Systems (T4SSs), including the Dot/Icm protein secretion system that is essential for intracellular multiplication in amoebal and human hosts. Other T4SSs not essential for virulence are thought to facilitate the acquisition of niche-specific adaptation genes including the numerous effector genes that are a hallmark of this genus. Previously, we identified two novel gene clusters in the draft genome of Legionella pneumophila strain 130b that encode homologues of a subtype of T4SS, the genomic island-associated T4SS (GI-T4SS), usually associated with integrative and conjugative elements (ICE). In this study, we performed genomic analyses of 14 homologous GI-T4SS clusters found in eight publicly available Legionella genomes and show that this cluster is unusually well conserved in a region of high plasticity. Phylogenetic analyses show that Legionella GI-T4SSs are substantially divergent from other members of this subtype of T4SS and represent a novel clade of GI-T4SSs only found in this genus. The GI-T4SS was found to be under purifying selection, suggesting it is functional and may play an important role in the evolution and adaptation of Legionella. Like other GI-T4SSs, the Legionella clusters are also associated with ICEs, but lack the typical integration and replication modules of related ICEs. The absence of complete replication and DNA pre-processing modules, together with the presence of Legionella-specific regulatory elements, suggest the Legionella GI-T4SS-associated ICE is unique and may employ novel mechanisms of regulation, maintenance and excision. The Legionella GI-T4SS cluster was found to be associated with several cargo genes, including numerous antibiotic resistance and virulence factors, which may confer a fitness benefit to the organism. 
    The in-silico characterisation of this new T4SS furthers our understanding of the diversity of secretion systems involved in the frequent horizontal gene

  5. Whole-genome bisulfite sequencing maps from multiple human tissues reveal novel CpG islands associated with tissue-specific regulation.

    Science.gov (United States)

    Mendizabal, Isabel; Yi, Soojin V

    2016-01-01

    CpG islands (CGIs) are one of the most widely studied regulatory features of the human genome, with critical roles in development and disease. Despite such significance and the original epigenetic definition, currently used CGI sets are typically predicted from DNA sequence characteristics. Although CGIs are deeply implicated in practical analyses of DNA methylation, recent studies have shown that such computational annotations suffer from inaccuracies. Here we used whole-genome bisulfite sequencing from 10 diverse human tissues to identify a comprehensive, experimentally obtained, single-base resolution CGI catalog. In addition to the unparalleled annotation precision, our method is free from potential bias due to arbitrary sequence features or probe affinity differences. In addition to clarifying substantial false positives in the widely used University of California Santa Cruz (UCSC) annotations, our study identifies numerous novel epigenetic loci. In particular, we reveal significant impact of transposable elements on the epigenetic regulatory landscape of the human genome and demonstrate ubiquitous presence of transcription initiation at CGIs, including alternative promoters in gene bodies and non-coding RNAs in intergenic regions. Moreover, coordinated DNA methylation and chromatin modifications mark tissue-specific enhancers at novel CGIs. Enrichment of specific transcription factor binding from ChIP-seq supports mechanistic roles of CGIs on the regulation of tissue-specific transcription. The new CGI catalog provides a comprehensive and integrated list of genomic hotspots of epigenetic regulation. © The Author 2015. Published by Oxford University Press.
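
    For contrast with the experimentally derived catalog described above, the sequence-based style of CGI prediction can be sketched as follows. The thresholds are the commonly cited sequence criteria (length of at least 200 bp, GC fraction of at least 0.5, observed/expected CpG ratio of at least 0.6); they are assumptions of this sketch rather than values taken from the record, and the function names are hypothetical.

```python
def cpg_stats(seq: str):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string.

    The observed/expected ratio is (number of CG dinucleotides * length)
    divided by (number of C * number of G).
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the three sequence criteria to a single candidate window."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))             # CpG-dense window
    print(looks_like_cpg_island("C" * 100 + "G" * 100))  # GC-rich but CpG-poor
```

    Because these criteria look only at base composition, a window can pass them regardless of its methylation state in any tissue, which is the kind of mismatch the bisulfite-based catalog is meant to address.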

  6. Phylogenetic Relationships of the Fern Cyrtomium falcatum (Dryopteridaceae) from Dokdo Island, Sea of East Japan, Based on Chloroplast Genome Sequencing.

    Science.gov (United States)

    Raman, Gurusamy; Choi, Kyoung Su; Park, SeonJoo

    2016-12-02

    Cyrtomium falcatum is a popular ornamental fern cultivated worldwide. Native to the Korean Peninsula, Japan, and Dokdo Island in the Sea of Japan, it is the only fern present on Dokdo Island. We isolated and characterized the chloroplast (cp) genome of C. falcatum, and compared it with those of closely related species. The genes trnV-GAC and trnV-GAU were found to be present within the cp genome of C. falcatum, whereas trnP-GGG and rpl21 were lacking. Moreover, cp genomes of Cyrtomium devexiscapulae and Adiantum capillus-veneris lack trnP-GGG and rpl21, suggesting these are not conserved among angiosperm cp genomes. The deletion of trnR-UCG, trnR-CCG, and trnSeC in the cp genomes of C. falcatum and other eupolypod ferns indicates these genes are restricted to tree ferns, non-core leptosporangiates, and basal ferns. The C. falcatum cp genome also encoded ndhF and rps7, with GUG start codons that were only conserved in polypod ferns, and it shares two significant inversions with other ferns, including a minor inversion of the trnD-GUC region and an approximate 3 kb inversion of the trnG-trnT region. Phylogenetic analyses showed that Equisetum was found to be a sister clade to Psilotales-Ophioglossales with a 100% bootstrap (BS) value. The sister relationship between Pteridaceae and eupolypods was also strongly supported by a 100% BS, but Bayesian molecular clock analyses suggested that C. falcatum diversified in the mid-Paleogene period (45.15 ± 4.93 million years ago) and might have moved from Eurasia to Dokdo Island.

  7. The distribution of intra-genomically variable dinoflagellate symbionts at Lord Howe Island, Australia

    Science.gov (United States)

    Wilkinson, Shaun P.; Pontasch, Stefanie; Fisher, Paul L.; Davy, Simon K.

    2016-06-01

    The symbiotic dinoflagellates of corals and other marine invertebrates (Symbiodinium) are essential to the development of shallow-water coral reefs. This genus contains considerable genetic diversity and a corresponding range of physiological and ecological traits. Most genetic variation arises through the accumulation of somatic mutations that arise during asexual reproduction. Yet growing evidence suggests that occasional sexual reproductive events also occur within, and perhaps between, Symbiodinium lineages, further contributing to the pool of genetic variation available for evolutionary adaptation. Intra-genomic variation can therefore arise from both sexual and asexual reproductive processes, making it difficult to discern its underlying causes and consequences. We used quantitative PCR targeting the ITS2 locus to estimate proportions of genetically homogeneous symbionts and intra-genomically variable Symbiodinium (IGV Symbiodinium) in the reef-building coral Pocillopora damicornis at Lord Howe Island, Australia. We then sampled colonies through time and at a variety of spatial scales to find out whether the distribution of these symbionts followed patterns consistent with niche partitioning. Estimated ratios of homogeneous to IGV Symbiodinium varied between colonies within sites (metres to tens of metres) and between sites separated by hundreds to thousands of metres, but remained stable within colonies through time. Symbiont ratios followed a temperature gradient, with the local thermal maximum emerging as a negative predictor for the estimated proportional abundance of IGV Symbiodinium. While this pattern may result from fine-scale spatial population structure, it is consistent with an increased susceptibility to thermal stress, suggesting that the evolutionary processes that generate IGV (such as inter-lineage recombination and the accumulation of somatic mutations at the ITS2 locus) may have important implications for the fitness of the symbiont and

  8. Candidate pathogenicity islands in the genome of ‘Candidatus Rickettsiella isopodorum’, an intracellular bacterium infecting terrestrial isopod crustaceans

    Science.gov (United States)

    Wang, YaDong

    2016-01-01

    The bacterial genus Rickettsiella belongs to the order Legionellales in the Gammaproteobacteria, and consists of several described species and pathotypes, most of which are considered to be intracellular pathogens infecting arthropods. Two members of this genus, R. grylli and R. isopodorum, are known to infect terrestrial isopod crustaceans. In this study, we assembled a draft genomic sequence for R. isopodorum, and performed a comparative genomic analysis with R. grylli. We found evidence for several candidate genomic island regions in R. isopodorum, none of which appear in the previously available R. grylli genome sequence. Furthermore, one of these genomic island candidates in R. isopodorum contained a gene that encodes a cytotoxin partially homologous to those found in Photorhabdus luminescens and Xenorhabdus nematophilus (Enterobacteriaceae), suggesting that horizontal gene transfer may have played a role in the evolution of pathogenicity in Rickettsiella. These results lay the groundwork for future studies on the mechanisms underlying pathogenesis in R. isopodorum, and this system may provide a good model for studying the evolution of host-microbe interactions in nature. PMID:28028472

  9. Conjugative Transfer and cis-Mobilization of a Genomic Island by an Integrative and Conjugative Element of Streptococcus agalactiae

    OpenAIRE

    Puymège, Aurore; Bertin, Stéphane; Chuzeville, Sarah; Guédon, Gérard; Payot, Sophie

    2013-01-01

    Putative integrative and conjugative elements (ICEs), i.e., genomic islands which could excise, self-transfer by conjugation, and integrate into the chromosome of the bacterial host strain, were previously identified by in silico analysis in the sequenced genomes of Streptococcus agalactiae (M. Brochet et al., J. Bacteriol. 190:6913–6917, 2008). We investigated here the mobility of the elements integrated into the 3′ end of a tRNALys gene. Three of the four putative ICEs tested were found to ...

  10. Sequence-Based Characterization of Tn5801-Like Genomic Islands in Tetracycline-Resistant Staphylococcus pseudintermedius and Other Gram-positive Bacteria from Humans and Animals

    DEFF Research Database (Denmark)

    de Vries, Lisbeth Elvira; Hasman, Henrik; Jurado Rabadán, Sonia;

    2016-01-01

    Antibiotic resistance in pathogens is often associated with mobile genetic elements, such as genomic islands (GI) including integrative and conjugative elements (ICEs). These can transfer resistance genes within ...

  11. Comparative genomics boosts target prediction for bacterial small RNAs.

    Science.gov (United States)

    Wright, Patrick R; Richter, Andreas S; Papenfort, Kai; Mann, Martin; Vogel, Jörg; Hess, Wolfgang R; Backofen, Rolf; Georg, Jens

    2013-09-10

    Small RNAs (sRNAs) constitute a large and heterogeneous class of bacterial gene expression regulators. Much like eukaryotic microRNAs, these sRNAs typically target multiple mRNAs through short seed pairing, thereby acting as global posttranscriptional regulators. In some bacteria, evidence for hundreds to possibly more than 1,000 different sRNAs has been obtained by transcriptome sequencing. However, the experimental identification of possible targets and, therefore, their confirmation as functional regulators of gene expression has remained laborious. Here, we present a strategy that integrates phylogenetic information to predict sRNA targets at the genomic scale and reconstructs regulatory networks upon functional enrichment and network analysis (CopraRNA, for Comparative Prediction Algorithm for sRNA Targets). Furthermore, CopraRNA precisely predicts the sRNA domains for target recognition and interaction. When applied to several model sRNAs, CopraRNA revealed additional targets and functions for the sRNAs CyaR, FnrS, RybB, RyhB, SgrS, and Spot42. Moreover, the mRNAs gdhA, lrp, marA, nagZ, ptsI, sdhA, and yobF-cspC were suggested as regulatory hubs targeted by up to seven different sRNAs. The verification of many previously undetected targets by CopraRNA, even for extensively investigated sRNAs, demonstrates its advantages and shows that CopraRNA-based analyses can compete with experimental target prediction approaches. A Web interface allows high-confidence target prediction and efficient classification of bacterial sRNAs.

  12. Genomics: Tool to predict and prevent male infertility.

    Science.gov (United States)

    Halder, Ashutosh; Kumar, Prashant; Jain, Manish; Kalsi, Amanpreet Kaur

    2017-06-01

    A large number of human diseases arise as a result of genetic abnormalities. With the advent of improved molecular biology techniques, the number of recognized genetic etiologies of male infertility is increasing. The common genetic factors responsible for male infertility are chromosomal abnormalities, Yq microdeletion and cystic fibrosis. These are responsible for approximately 30 percent of cases of male infertility. About 40 percent of cases of male infertility are categorized as idiopathic. These cases may be associated with genetic and genomic abnormalities. During the last few years, more and more genes have been implicated in male infertility, leading to a decline in the prevalence of idiopathic etiology. In this review we summarize up-to-date published work on genetic etiologies of male infertility, including our own work. We also briefly describe reproductive technologies used to overcome male infertility, the dangers of transmitting genetic disorders to offspring, and ways to prevent transmission of genetic disorders during assisted reproduction. At the end we provide our views on how genomic information can be utilized for prediction and prevention of male infertility in coming years.

  13. Genomic prediction and genome-wide association analysis of female longevity in a composite beef cattle breed

    Science.gov (United States)

    Longevity is a highly important trait to the efficiency of beef cattle production. The objective of this study was to evaluate the genomic prediction of longevity and identify genomic regions associated with this trait. The data used in this study consisted of 547 Composite Gene Combination (CGC) c...

  14. PRIMEGENS-v2: genome-wide primer design for analyzing DNA methylation patterns of CpG islands.

    Science.gov (United States)

    Srivastava, Gyan P; Guo, Juyuan; Shi, Huidong; Xu, Dong

    2008-09-01

    DNA methylation plays important roles in biological processes and human diseases, especially cancers. High-throughput bisulfite genomic sequencing based on a new generation of sequencers, such as the 454-sequencing system, provides an efficient method for analyzing DNA methylation patterns. The successful implementation of this approach depends on the use of primer design software capable of performing a genome-wide scan for optimal primers from in silico bisulfite-treated genome sequences. We have developed a method that fulfills this requirement and conducts primer design for sequences including regions of given promoter CpG islands. The developed method has been implemented using the C and JAVA programming languages. The primer design results were tested in the PCR experiments of 96 selected human DNA sequences containing CpG islands in the promoter regions. The results indicate that this method is efficient and reliable for designing sequence-specific primers. The sequence-specific primer design for DNA methylated sequences including CpG islands has been integrated into the second version of PRIMEGENS as one of the primer design features. The software is freely available for academic use at http://digbio.missouri.edu/primegens/.

  15. A second actin-like MamK protein in Magnetospirillum magneticum AMB-1 encoded outside the genomic magnetosome island.

    Directory of Open Access Journals (Sweden)

    Jean-Baptiste Rioux

    Full Text Available Magnetotactic bacteria are able to swim navigating along geomagnetic field lines. They synthesize ferromagnetic nanocrystals that are embedded in cytoplasmic membrane invaginations forming magnetosomes. Regularly aligned in the cytoplasm along cytoskeleton filaments, the magnetosome chain effectively forms a compass needle bestowing on bacteria their magnetotactic behaviour. A large genomic island, conserved among magnetotactic bacteria, contains the genes potentially involved in magnetosome formation. One of the genes, mamK, has been described as encoding a prokaryotic actin-like protein which, when it polymerizes, forms filamentous structures in the cytoplasm that provide the scaffold for magnetosome alignment. Here, we have identified a series of genes highly similar to the mam genes in the genome of Magnetospirillum magneticum AMB-1. The newly annotated genes are clustered in a genomic islet distinct and distant from the known magnetosome genomic island and most probably acquired by lateral gene transfer rather than duplication. We focused on a mamK-like gene whose product shares 54.5% identity with the actin-like MamK. Filament bundles of polymerized MamK-like protein were observed in vitro with electron microscopy and in vivo in E. coli cells expressing MamK-like-Venus fusions by fluorescence microscopy. In addition, we demonstrate that mamK-like is transcribed in AMB-1 wild-type and ΔmamK mutant cells and that the actin-like filamentous structures observed in the ΔmamK strain are probably MamK-like polymers. Thus MamK-like is a new member of the prokaryotic actin-like family. This is the first evidence of a functional mam gene encoded outside the magnetosome genomic island.

  16. Predictive modeling of spinner dolphin (Stenella longirostris) resting habitat in the main Hawaiian Islands.

    Science.gov (United States)

    Thorne, Lesley H; Johnston, David W; Urban, Dean L; Tyne, Julian; Bejder, Lars; Baird, Robin W; Yin, Suzanne; Rickards, Susan H; Deakos, Mark H; Mobley, Joseph R; Pack, Adam A; Chapla Hill, Marie

    2012-01-01

    Predictive habitat models can provide critical information that is necessary in many conservation applications. Using Maximum Entropy modeling, we characterized habitat relationships and generated spatial predictions of spinner dolphin (Stenella longirostris) resting habitat in the main Hawaiian Islands. Spinner dolphins in Hawai'i exhibit predictable daily movements, using inshore bays as resting habitat during daylight hours and foraging in offshore waters at night. There are growing concerns regarding the effects of human activities on spinner dolphins resting in coastal areas. However, the environmental factors that define suitable resting habitat remain unclear and must be assessed and quantified in order to properly address interactions between humans and spinner dolphins. We used a series of dolphin sightings from recent surveys in the main Hawaiian Islands and a suite of environmental variables hypothesized as being important to resting habitat to model spinner dolphin resting habitat. The model performed well in predicting resting habitat and indicated that proximity to deep water foraging areas, depth, the proportion of bays with shallow depths, and rugosity were important predictors of spinner dolphin habitat. Predicted locations of suitable spinner dolphin resting habitat provided in this study indicate areas where future survey efforts should be focused and highlight potential areas of conflict with human activities. This study provides an example of a presence-only habitat model used to inform the management of a species for which patterns of habitat availability are poorly understood.

  17. Predictive modeling of spinner dolphin (Stenella longirostris) resting habitat in the main Hawaiian Islands.

    Directory of Open Access Journals (Sweden)

    Lesley H Thorne

    Full Text Available Predictive habitat models can provide critical information that is necessary in many conservation applications. Using Maximum Entropy modeling, we characterized habitat relationships and generated spatial predictions of spinner dolphin (Stenella longirostris) resting habitat in the main Hawaiian Islands. Spinner dolphins in Hawai'i exhibit predictable daily movements, using inshore bays as resting habitat during daylight hours and foraging in offshore waters at night. There are growing concerns regarding the effects of human activities on spinner dolphins resting in coastal areas. However, the environmental factors that define suitable resting habitat remain unclear and must be assessed and quantified in order to properly address interactions between humans and spinner dolphins. We used a series of dolphin sightings from recent surveys in the main Hawaiian Islands and a suite of environmental variables hypothesized as being important to resting habitat to model spinner dolphin resting habitat. The model performed well in predicting resting habitat and indicated that proximity to deep water foraging areas, depth, the proportion of bays with shallow depths, and rugosity were important predictors of spinner dolphin habitat. Predicted locations of suitable spinner dolphin resting habitat provided in this study indicate areas where future survey efforts should be focused and highlight potential areas of conflict with human activities. This study provides an example of a presence-only habitat model used to inform the management of a species for which patterns of habitat availability are poorly understood.

  18. Using whole-genome sequence data to predict quantitative trait phenotypes in Drosophila melanogaster.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    Full Text Available Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied for single-nucleotide polymorphism (SNP) genotyping platforms, but not using complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ∼2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability as the correlation between predicted genetic values and observed phenotypes by cross-validation, and found a predictive ability of 0.239±0.008 (0.230±0.012) for starvation resistance (startle response). The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than GBLUP. Selection of the 5% SNPs with either the highest absolute effect or variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
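
    The workflow summarized in this abstract (build a genomic relationship matrix, fit GBLUP, score by cross-validated correlation) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, the VanRaden-style scaling of the relationship matrix is an assumption (the abstract does not say which scaling was used), and the variance ratio `lam` is fixed by hand here, whereas in practice it would normally be estimated from the data.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """VanRaden-style G from an (n_lines x n_snps) matrix of 0/1/2 allele counts.
    The scaling is an assumption made for illustration."""
    p = genotypes.mean(axis=0) / 2.0            # allele frequency per SNP
    z = genotypes - 2.0 * p                     # centre each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(g, y, train, test, lam):
    """Predict genetic values of `test` lines from `train` phenotypes.
    lam is the assumed ratio of residual to genetic variance."""
    mu = y[train].mean()
    g_tt = g[np.ix_(train, train)]              # relationships among training lines
    g_st = g[np.ix_(test, train)]               # test-to-training relationships
    alpha = np.linalg.solve(g_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + g_st @ alpha

def predictive_ability(g, y, n_folds=5, lam=1.0, seed=0):
    """Mean correlation between predicted values and observed phenotypes
    over the folds of a k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    cors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        pred = gblup_predict(g, y, train, test, lam)
        cors.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(cors))
```

    Calling `predictive_ability(genomic_relationship_matrix(m), y)` on a genotype matrix `m` and phenotype vector `y` returns a single cross-validated correlation of the kind reported above.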

  19. An Integrative Pathway-based Clinical-genomic Model for Cancer Survival Prediction.

    Science.gov (United States)

    Chen, Xi; Wang, Lily; Ishwaran, Hemant

    2010-09-01

    Prediction models that use gene expression levels are now being proposed for personalized treatment of cancer, but building accurate models that are easy to interpret remains a challenge. In this paper, we describe an integrative clinical-genomic approach that combines both genomic pathway and clinical information. First, we summarize information from genes in each pathway using Supervised Principal Components (SPCA) to obtain pathway-based genomic predictors. Next, we build a prediction model based on clinical variables and pathway-based genomic predictors using Random Survival Forests (RSF). Our rationale for this two-stage procedure is that the underlying disease process may be influenced by environmental exposure (measured by clinical variables) and perturbations in different pathways (measured by pathway-based genomic variables), as well as their interactions. Using two cancer microarray datasets, we show that the pathway-based clinical-genomic model outperforms gene-based clinical-genomic models, with improved prediction accuracy and interpretability.

  20. Draft Genome Sequence of Pseudomonas hussainii Strain MB3, a Denitrifying Aerobic Bacterium Isolated from the Rhizospheric Region of Mangrove Trees in the Andaman Islands, India.

    Science.gov (United States)

    Jaiswal, Shubham K; Saxena, Rituja; Mittal, Parul; Gupta, Ankit; Sharma, Vineet K

    2017-02-02

    The genome sequence of Pseudomonas hussainii MB3, isolated from the rhizospheric region of mangroves in the Andaman Islands, is comprised of 3,644,788 bp and 3,159 protein coding genes. Draft genome analysis indicates that MB3 is an aerobic bacterium capable of performing assimilatory sulfate reduction, dissimilatory nitrate reduction, and denitrification.

  1. Predicting Where a Radiation Will Occur: Acoustic and Molecular Surveys Reveal Overlooked Diversity in Indian Ocean Island Crickets (Mogoplistinae: Ornebius).

    Science.gov (United States)

    Warren, Ben H; Baudin, Rémy; Franck, Antoine; Hugel, Sylvain; Strasberg, Dominique

    2016-01-01

    Recent theory suggests that the geographic location of island radiations (local accumulation of species diversity due to cladogenesis) can be predicted based on island area and isolation. Crickets are a suitable group for testing these predictions, as they show both the ability to reach some of the most isolated islands in the world, and to speciate at small spatial scales. Despite substantial song variation between closely related species in many island cricket lineages worldwide, to date this characteristic has not received attention in the western Indian Ocean islands; existing species descriptions are based on morphology alone. Here we use a combination of acoustics and DNA sequencing to survey these islands for Ornebius crickets. We uncover a small but previously unknown radiation in the Mascarenes, constituting a three-fold increase in the Ornebius species diversity of this archipelago (from two to six species). A further new species is detected in the Comoros. Although double archipelago colonisation is the best explanation for species diversity in the Seychelles, in situ cladogenesis is the best explanation for the six species in the Mascarenes and two species of the Comoros. Whether the radiation of Mascarene Ornebius results from intra- or purely inter- island speciation cannot be determined on the basis of the phylogenetic data alone. However, the existence of genetic, song and ecological divergence at the intra-island scale is suggestive of an intra-island speciation scenario in which ecological and mating traits diverge hand-in-hand. Our results suggest that the geographic location of Ornebius radiations is partially but not fully explained by island area and isolation. A notable anomaly is Madagascar, where our surveys are consistent with existing accounts in finding no Ornebius species present. Possible explanations are discussed, invoking ecological differences between species and differences in environmental history between islands.

  2. Predicting Where a Radiation Will Occur: Acoustic and Molecular Surveys Reveal Overlooked Diversity in Indian Ocean Island Crickets (Mogoplistinae: Ornebius).

    Directory of Open Access Journals (Sweden)

    Ben H Warren

    Full Text Available Recent theory suggests that the geographic location of island radiations (local accumulation of species diversity due to cladogenesis) can be predicted based on island area and isolation. Crickets are a suitable group for testing these predictions, as they show both the ability to reach some of the most isolated islands in the world, and to speciate at small spatial scales. Despite substantial song variation between closely related species in many island cricket lineages worldwide, to date this characteristic has not received attention in the western Indian Ocean islands; existing species descriptions are based on morphology alone. Here we use a combination of acoustics and DNA sequencing to survey these islands for Ornebius crickets. We uncover a small but previously unknown radiation in the Mascarenes, constituting a three-fold increase in the Ornebius species diversity of this archipelago (from two to six species). A further new species is detected in the Comoros. Although double archipelago colonisation is the best explanation for species diversity in the Seychelles, in situ cladogenesis is the best explanation for the six species in the Mascarenes and two species of the Comoros. Whether the radiation of Mascarene Ornebius results from intra- or purely inter- island speciation cannot be determined on the basis of the phylogenetic data alone. However, the existence of genetic, song and ecological divergence at the intra-island scale is suggestive of an intra-island speciation scenario in which ecological and mating traits diverge hand-in-hand. Our results suggest that the geographic location of Ornebius radiations is partially but not fully explained by island area and isolation. A notable anomaly is Madagascar, where our surveys are consistent with existing accounts in finding no Ornebius species present. Possible explanations are discussed, invoking ecological differences between species and differences in environmental history between

  3. A common reference population from four European Holstein populations increases reliability of genomic predictions

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; de Ross, Sander PW; de Vries, Alfred G

    2011-01-01

    Background Size of the reference population and reliability of phenotypes are crucial factors influencing the reliability of genomic predictions. It is therefore useful to combine closely related populations. Increased accuracies of genomic predictions depend on the number of individuals added to...

  5. A toxin antitoxin system promotes the maintenance of the IncA/C-mobilizable Salmonella Genomic Island 1.

    Science.gov (United States)

    Huguet, Kevin T; Gonnet, Mathieu; Doublet, Benoît; Cloeckaert, Axel

    2016-08-31

    The multidrug resistance Salmonella Genomic Island 1 (SGI1) is an integrative mobilizable element identified in several enterobacterial pathogens. This chromosomal island requires a conjugative IncA/C plasmid to be excised as a circular extrachromosomal form and conjugally mobilized in trans. Preliminary observations suggest stable maintenance of SGI1 in the host chromosome but paradoxically also incompatibility between SGI1 and IncA/C plasmids. Here, using a Salmonella enterica serovar Agona clonal bacterial population as model, we demonstrate that a Toxin-Antitoxin (TA) system encoded by SGI1 plays a critical role in its stable host maintenance when an IncA/C plasmid is concomitantly present. This system, designated sgiAT for Salmonella genomic island 1 Antitoxin and Toxin respectively, thus seems to play a stabilizing role in a situation where SGI1 is susceptible to be lost through plasmid IncA/C-mediated excision. Moreover and for the first time, the incompatibility between SGI1 and IncA/C plasmids was experimentally confirmed.

  6. Predicting sea-level rise vulnerability of terrestrial habitat and wildlife of the Northwestern Hawaiian Islands

    Science.gov (United States)

    Reynolds, Michelle H.; Berkowitz, Paul; Courtot, Karen N.; Krause, Crystal M.

    2012-01-01

    If current climate change trends continue, rising sea levels may inundate low-lying islands across the globe, placing island biodiversity at risk. Recent models predict a rise of approximately one meter (1 m) in global sea level by 2100, with larger increases possible in areas of the Pacific Ocean. Pacific Islands are unique ecosystems home to many endangered endemic plant and animal species. The Northwestern Hawaiian Islands (NWHI), which extend 1,930 kilometers (km) beyond the main Hawaiian Islands, are a World Heritage Site and part of the Papahanaumokuakea Marine National Monument. These NWHI support the largest tropical seabird rookery in the world, providing breeding habitat for 21 species of seabirds, 4 endemic land bird species and essential foraging, breeding, or haul-out habitat for other resident and migratory wildlife. In recent years, concern has grown about the increasing vulnerability of the NWHI and their wildlife populations to changing climatic patterns, particularly the uncertainty associated with potential impacts from global sea-level rise (SLR) and storms. In response to the need by managers to adapt future resource protection strategies to climate change variability and dynamic island ecosystems, we have synthesized and down scaled analyses for this important region. This report describes a 2-year study of a remote northwestern Pacific atoll ecosystem and identifies wildlife and habitat vulnerable to rising sea levels and changing climate conditions. A lack of high-resolution topographic data for low-lying islands of the NWHI had previously precluded an extensive quantitative model of the potential impacts of SLR on wildlife habitat. The first chapter (chapter 1) describes the vegetation and topography of 20 islands of Papahanaumokuakea Marine National Monument, the distribution and status of wildlife populations, and the predicted impacts for a range of SLR scenarios. Furthermore, this chapter explores the potential effects of SLR on

  7. Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.

    Science.gov (United States)

    Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan

    2016-11-01

    In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat (Triticum aestivum L.) and maize (Zea mays L.) data sets. For single-environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models did show up to 60 to 68% superiority over the corresponding single environment for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that account for small, more complex marker main effects and marker-specific interaction effects.
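
    To make the contrast between the linear (GBLUP-type) kernel and the Gaussian kernel concrete, a minimal sketch is given below. It is not taken from the paper: the function names are hypothetical, the bandwidth `h` is a free parameter here rather than the empirical Bayes estimate or the kernel average the abstract describes, and scaling squared distances by their median is an assumed convention.

```python
import numpy as np

def linear_kernel(x):
    """GBLUP-type kernel: X X' / p for a column-centred marker matrix."""
    xc = x - x.mean(axis=0)
    return xc @ xc.T / x.shape[1]

def gaussian_kernel(x, h=1.0):
    """K_ij = exp(-h * d_ij^2 / q), where d_ij is the Euclidean distance
    between marker profiles i and j, and q is the median off-diagonal
    squared distance (an assumed scaling choice)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    q = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-h * d2 / q)
```

    A larger `h` makes the Gaussian kernel decay faster with marker distance, so only very similar lines share information; a smaller `h` gives a flatter kernel.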

  8. Reducing dimensionality for prediction of genome-wide breeding values

    Directory of Open Access Journals (Sweden)

    Woolliams John A

    2009-03-01

    Full Text Available Abstract Partial least square regression (PLSR) and principal component regression (PCR) are methods designed for situations where the number of predictors is larger than the number of records. The aim was to compare the accuracy of genome-wide breeding values (EBV) produced using PLSR and PCR with a Bayesian method, 'BayesB'. Marker densities of 1, 2, 4 and 8 Ne markers/Morgan were evaluated when the effective population size (Ne) was 100. The correlation between true breeding value and estimated breeding value increased with density from 0.611 to 0.681 and 0.604 to 0.658 using PLSR and PCR respectively, with an overall advantage to PLSR of 0.016 (s.e. = 0.008). Both methods gave a lower accuracy compared to 'BayesB', for which accuracy increased from 0.690 to 0.860. PLSR and PCR appeared less responsive to increased marker density, with the advantage of 'BayesB' increasing by 17% from a marker density of 1 to 8 Ne/M. PCR and PLSR showed greater bias than 'BayesB' in predicting breeding values at all densities. Although PLSR and PCR were computationally faster and simpler, these advantages do not outweigh the reduction in accuracy, and there is a benefit in obtaining relevant prior information from the distribution of gene effects.
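
    As a rough illustration of how principal component regression copes with more predictors than records, a minimal sketch follows. It covers PCR only (not PLSR or BayesB), the function names are hypothetical, and the number of components is assumed to be chosen by the user, for example by cross-validation.

```python
import numpy as np

def pcr_fit(x, y, n_components):
    """Regress y on the first n_components principal components of x.
    Returns (x_mean, y_mean, beta) with beta on the original marker scale."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    u, s, vt = np.linalg.svd(x - x_mean, full_matrices=False)
    k = n_components
    scores = u[:, :k] * s[:k]                    # component scores (n x k)
    # Scores are mutually orthogonal, so least squares reduces to k
    # independent one-dimensional regressions.
    gamma = scores.T @ (y - y_mean) / s[:k] ** 2
    beta = vt[:k].T @ gamma                      # map back to marker effects
    return x_mean, y_mean, beta

def pcr_predict(model, x_new):
    """Predict phenotypes or breeding values for new marker profiles."""
    x_mean, y_mean, beta = model
    return y_mean + (x_new - x_mean) @ beta
```

    Because only `n_components` coefficients are estimated, the fit stays well defined even when the marker matrix has far more columns than rows.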

  9. Outsmarting cancer: the power of hybrid genomic/proteomic biomarkers to predict drug response.

    Science.gov (United States)

    Rexer, Brent N; Arteaga, Carlos L

    2014-01-01

    A recent study by Niepel and colleagues describes a novel approach to predicting response to targeted anti-cancer therapies. The authors used biochemical profiling of signaling activity in basal and ligand-stimulated states for a panel of receptor and intracellular kinases to develop predictive models of drug sensitivity. In some cases, the response to ligand stimulation predicted drug response better than did target abundance or genomic alterations in the targeted pathway. Furthermore, combining biochemical profiles with genomic information was better at predicting drug response. This work suggests that incorporating biochemical signaling profiles with genomic alterations should provide powerful predictors of response to molecularly targeted therapies.

  10. Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor

    DEFF Research Database (Denmark)

    de los Campos, Gustavo; Vazquez, Ana I; Fernando, Rohan;

    2013-01-01

    Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations... by imperfect LD between markers and QTL is given by (1 - b)^2, where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome...

  11. Large-scale prokaryotic gene prediction and comparison to genome annotation

    DEFF Research Database (Denmark)

    Nielsen, Pernille; Krogh, Anders Stærmose

    2005-01-01

    Motivation: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome...... genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms......-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation....

  12. The Salmonella genomic island 1 is specifically mobilized in trans by the IncA/C multidrug resistance plasmid family.

    Science.gov (United States)

    Douard, Gregory; Praud, Karine; Cloeckaert, Axel; Doublet, Benoît

    2010-12-20

    The Salmonella genomic island 1 (SGI1) is a Salmonella enterica-derived integrative mobilizable element (IME) containing various complex multiple resistance integrons identified in several S. enterica serovars and in Proteus mirabilis. Previous studies have shown that SGI1 transfers horizontally by in trans mobilization in the presence of the IncA/C conjugative helper plasmid pR55. Here, we report the ability of different prevalent multidrug resistance (MDR) plasmids including extended-spectrum β-lactamase (ESBL) gene-carrying plasmids to mobilize the multidrug resistance genomic island SGI1. Through conjugation experiments, none of the 24 conjugative plasmids tested of the IncFI, FII, HI2, I1, L/M, N, P incompatibility groups were able to mobilize SGI1 at a detectable level (transfer frequency IncA/C incompatibility group. Several conjugative IncA/C MDR plasmids as well as the sequenced IncA/C reference plasmid pRA1 of 143,963 bp were shown to mobilize in trans SGI1 from a S. enterica donor to the Escherichia coli recipient strain. Depending on the IncA/C plasmid used, the conjugative transfer of SGI1 occurred at frequencies ranging from 10^-3 to 10^-6 transconjugants per donor. Of particular concern, some large IncA/C MDR plasmids carrying the extended-spectrum cephalosporinase blaCMY-2 gene were shown to mobilize in trans SGI1. The ability of the IncA/C MDR plasmid family to mobilize SGI1 could contribute to its spread by horizontal transfer among enteric pathogens. Moreover, the increasing prevalence of IncA/C plasmids in MDR S. enterica isolates worldwide has potential implications for the epidemic success of the antibiotic resistance genomic island SGI1 and its close derivatives.

  13. Drug target prediction and prioritization: using orthology to predict essentiality in parasite genomes

    Directory of Open Access Journals (Sweden)

    Hall Ross S

    2010-04-01

    Full Text Available Abstract Background New drug targets are urgently needed for parasites of socio-economic importance. Genes that are essential for parasite survival are highly desirable targets, but information on these genes is lacking, as gene knockouts or knockdowns are difficult to perform in many species of parasites. We examined the applicability of large-scale essentiality information from four model eukaryotes, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Saccharomyces cerevisiae, to discover essential genes in each of their genomes. Parasite genes that lack orthologues in their host are desirable as selective targets, so we also examined prediction of essential genes within this subset. Results Cross-species analyses showed that the evolutionary conservation of genes and the presence of essential orthologues are each strong predictors of essentiality in eukaryotes. Absence of paralogues was also found to be a general predictor of increased relative essentiality. By combining several orthology and essentiality criteria one can select gene sets with up to a five-fold enrichment in essential genes compared with a random selection. We show how quantitative application of such criteria can be used to predict a ranked list of potential drug targets from Ancylostoma caninum and Haemonchus contortus - two blood-feeding strongylid nematodes, for which there are presently limited sequence data but no functional genomic tools. Conclusions The present study demonstrates the utility of using orthology information from multiple, diverse eukaryotes to predict essential genes. The data also emphasize the challenge of identifying essential genes among those in a parasite that are absent from its host.

  14. Vibrio cholerae VttRA and VttRB Regulatory Influences Extend beyond the Type 3 Secretion System Genomic Island

    OpenAIRE

    Chaand, Mudit; Dziejman, Michelle

    2013-01-01

    A subset of non-O1/non-O139 serogroup strains of Vibrio cholerae cause disease using type 3 secretion system (T3SS)-mediated mechanisms. An ∼50-kb genomic island carries genes encoding the T3SS structural apparatus, effector proteins, and two transmembrane transcriptional regulators, VttRA and VttRB, which are ToxR homologues. Previous experiments demonstrated that VttRA and VttRB are necessary for colonization in vivo and promote bile-dependent T3SS gene expression in vitro. To better unders...

  16. Genomic prediction in a breeding program of perennial ryegrass

    DEFF Research Database (Denmark)

    Fé, Dario; Ashraf, Bilal; Greve-Pedersen, Morten;

    2015-01-01

    We present a genomic selection study performed on 1918 rye grass families (Lolium perenne L.), which were derived from a commercial breeding program at DLF-Trifolium, Denmark. Phenotypes were recorded on standard plots, across 13 years and in 6 different countries. Variants were identified...... in utilizing genomic selection in rye grass....

  17. An equation to predict the accuracy of genomic values by combining data from multiple traits, populations, or environments

    NARCIS (Netherlands)

    Wientjes, Y.C.J.; Bijma, P.; Veerkamp, R.F.; Calus, M.P.L.

    2016-01-01

    Predicting the accuracy of estimated genomic values using genome-wide marker information is an important step in designing training populations. Currently, different deterministic equations are available to predict accuracy within populations, but not for multipopulation scenarios where data from

  18. Data from: An equation to predict the accuracy of genomic values by combining data from multiple traits, populations, or environments

    NARCIS (Netherlands)

    Wientjes, Y.C.J.; Bijma, P.; Veerkamp, R.F.; Calus, M.P.L.

    2015-01-01

    Predicting the accuracy of estimated genomic values using genome-wide marker information is an important step in designing training populations. Currently, different deterministic equations are available to predict accuracy within populations, but not for multipopulation scenarios where data from

  19. An estimator-based distributed voltage-predictive control strategy for ac islanded microgrids

    DEFF Research Database (Denmark)

    Wang, Yanbo; Chen, Zhe; Wang, Xiongfei

    2015-01-01

    This paper presents an estimator-based voltage predictive control strategy for AC islanded microgrids, which is able to perform voltage control without any communication facilities. The proposed control strategy is composed of a network voltage estimator and a voltage predictive controller for each...... control strategy is analyzed through small signal analysis method, from which the design guideline for the controller parameters is formulated. Furthermore, the robustness of the proposed voltage control strategy is investigated under a series of parameters uncertainties, including the line parameters...... perturbation, load parameters variation, different disturbance locations, LC filters perturbation, output impedances perturbation and DG unit fault. The simulation and experimental results show that the proposed control approach is able to perform offset-free voltage control without any communication links...

  20. An estimator-based distributed voltage-predictive control strategy for AC islanded microgrids

    DEFF Research Database (Denmark)

    Wang, Yanbo; Chen, Zhe; Wang, Xiongfei

    2015-01-01

    This paper presents an estimator-based voltage-predictive control strategy for ac islanded microgrids, which is able to perform voltage control without any communication facilities. The proposed control strategy is composed of a network voltage estimator and a voltage-predictive controller for each...... control strategy is analyzed through small-signal analysis method, from which the design guideline for the controller parameters is formulated. Furthermore, the robustness of the proposed voltage control strategy is investigated under a series of parameters uncertainties, including the line parameters...... perturbation, load parameters variation, different disturbance locations, LC filters perturbation, output impedances perturbation, and DG unit fault. The simulation and experimental results show that the proposed control approach is able to perform offset-free voltage control without any communication links...

  1. Genomic prediction of starch content and chipping quality in tetraploid potato using genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Sverrisdóttir, Elsa; Byrne, Stephen; Sundmark, Ea Høegh Riis

    2017-01-01

    Genomic selection uses genome-wide molecular markers to predict performance of individuals and allows selections in the absence of direct phenotyping. It is regarded as a useful tool to accelerate genetic gain in breeding programs, and is becoming increasingly viable for crops as genotyping costs...... genomic estimated breeding values. Cross-validated prediction correlations of 0.56 and 0.73 were obtained within the training population for starch content and chipping quality, respectively, while correlations were lower when predicting performance in the test panel, at 0.30–0.31 and 0.42–0.43, respectively. Predictions in the test panel were slightly improved when including representatives from the test panel in the training population but worsened when preceded by marker selection. Our results suggest that genomic prediction is feasible; however, the extremely high allelic diversity of tetraploid...

  2. Prediction of causative genomic relationships using sequence data of five French and Danish dairy cattle breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Boichard, Didier; Lund, Mogens Sandø

    and HD chips, or two 1 Kb intervals on both sides of each causative mutation, varying the distance between causative mutations and intervals from 1 base to 1 Mb. Subsequently, the regression coefficient of the genomic relationships at prediction markers on the genomic relationships at causal loci...... data is more likely to contain causative mutations and therefore increase the prediction accuracy in such populations. We studied the potential advantage of using real sequence data for prediction of genomic relationships at causative mutations using sequence data of chromosome 1 for 122 Holstein, 27...

  3. Promoter Prediction on a Genomic Scale—The Adh Experience

    Science.gov (United States)

    Ohler, Uwe

    2000-01-01

    We describe our statistical system for promoter recognition in genomic DNA with which we took part in the Genome Annotation Assessment Project (GASP1). We applied two versions of the system: the first uses a region-based approach toward transcription start site identification, namely, interpolated Markov chains; the second was a hybrid approach combining regions and signals within a stochastic segment model. We compare the results of both versions with each other and examine how well the application on a genomic scale compares with the results we previously obtained on smaller data sets. PMID:10779494

  4. A System for Predicting Subcellular Localization of Yeast Genome Using Neural Network

    CERN Document Server

    Thampi, Sabu M

    2007-01-01

    The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. Many efforts have been made to predict protein subcellular localization. This paper aims to merge artificial neural networks and bioinformatics to predict the location of proteins in the yeast genome. We introduce a new subcellular prediction method based on a backpropagation neural network. The results show that prediction within an error limit of 5 to 10 percent can be achieved with the system.

  5. Improved prediction of genetic predisposition to psychiatric disorders using genomic feature best linear unbiased prediction models

    DEFF Research Database (Denmark)

    Rohde, Palle Duun; Demontis, Ditte; Børglum, Anders

    Introduction: Accurate prediction of unobserved phenotypes from observed genotypes is essential for success in predicting disease risk from genotypes. However, the performance is somewhat limited. Genomic feature best linear unbiased prediction (GFBLUP) models separate the total genomic...... is enriched for causal variants. Here we apply the GFBLUP model to a small schizophrenia case-control study to test the promise of this model on psychiatric disorders, and hypothesize that the performance will be increased when applying the model to a larger ADHD case-control study if the genomic feature...... contains the causal variants. Materials and Methods: The schizophrenia study consisted of 882 controls and 888 schizophrenia cases genotyped for 520,000 SNPs. The ADHD study contained 25,954 controls and 16,663 ADHD cases with 8.4 million imputed genotypes. Results: The predictive ability for schizophrenia...

  6. Genomic prediction for Nordic Red Cattle using one-step and selection index blending

    DEFF Research Database (Denmark)

    Guosheng, Su; Madsen, Per; Nielsen, Ulrik Sander

    2012-01-01

    This study investigated the accuracy of direct genomic breeding values (DGV) using a genomic BLUP model, genomic enhanced breeding values (GEBV) using a one-step blending approach, and GEBV using a selection index blending approach for 15 traits of Nordic Red Cattle. The data comprised 6,631 bull......-step blending approach is a good alternative to predict GEBV in practical genetic evaluation programs....

  7. Effect of marker-data editing on the accuracy of genomic prediction

    DEFF Research Database (Denmark)

    Edriss, Vahid; Guldbrandtsen, Bernt; Lund, Mogens Sandø

    2013-01-01

    Genomic selection is a method to predict breeding values using genome-wide single-nucleotide polymorphism (SNP) markers. High-quality marker data are necessary for genomic selection. The aim of this study was to investigate the effect of marker-editing criteria on the accuracy of genomic predictions in the Nordic Holstein and Jersey populations. Data included 4429 Holstein and 1071 Jersey bulls. In total, 48 222 SNP for Holstein and 44 305 SNP for Jersey were polymorphic. The SNP data were edited based on (i) minor allele frequencies (MAF) with thresholds of no limit, 0.001, 0.01, 0.02, 0.05 and 0.10, (ii) deviations from Hardy–Weinberg proportions (HWP) with thresholds of no limit, chi-squared p-values of 0.001, 0.02, 0.05 and 0.10, and (iii) GenCall (GC) scores with thresholds of 0.15, 0.55, 0.60, 0.65 and 0.70. The marker data sets edited with different criteria were used for genomic...
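
    The MAF editing step described in this record can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the 0/1/2 genotype coding, the function names, the toy marker data, and the choice to keep markers whose MAF is at least the threshold (rather than strictly above it) are all assumptions made for illustration.

```python
from typing import Dict, List


def minor_allele_frequency(genotypes: List[int]) -> float:
    """MAF of one SNP; genotypes are assumed coded as 0, 1 or 2 allele copies."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    # Each diploid individual carries two alleles, hence the factor 2.
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)


def filter_by_maf(markers: Dict[str, List[int]], threshold: float) -> Dict[str, List[int]]:
    """Keep SNPs whose MAF is at least `threshold` (assumed inclusive)."""
    return {
        name: genotypes
        for name, genotypes in markers.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


# Hypothetical toy data: three SNPs genotyped on eight animals.
markers = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 0],  # common variant
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 1],  # rare variant
    "snp_c": [2, 2, 2, 2, 2, 2, 2, 2],  # monomorphic
}

kept = filter_by_maf(markers, threshold=0.10)
print(sorted(kept))
```

    Lowering the threshold to 0.05 in this toy example would also retain the rare variant, which is the kind of trade-off between marker count and marker quality that the study evaluates.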

  8. Accounting for genetic architecture improves sequence based genomic prediction for a Drosophila fitness trait.

    Directory of Open Access Journals (Sweden)

    Ulrike Ober

    Full Text Available The ability to predict quantitative trait phenotypes from molecular polymorphism data will revolutionize evolutionary biology, medicine and human biology, and animal and plant breeding. Efforts to map quantitative trait loci have yielded novel insights into the biology of quantitative traits, but the combination of individually significant quantitative trait loci typically has low predictive ability. Utilizing all segregating variants can give good predictive ability in plant and animal breeding populations, but gives little insight into trait biology. Here, we used the Drosophila Genetic Reference Panel to perform both a genome wide association analysis and genomic prediction for the fitness-related trait chill coma recovery time. We found substantial total genetic variation for chill coma recovery time, with a genetic architecture that differs between males and females, a small number of molecular variants with large main effects, and evidence for epistasis. Although the top additive variants explained 36% (17%) of the genetic variance among lines in females (males), the predictive ability using genomic best linear unbiased prediction and a relationship matrix using all common segregating variants was very low for females and zero for males. We hypothesized that the low predictive ability was due to the mismatch between the infinitesimal genetic architecture assumed by the genomic best linear unbiased prediction model and the true genetic architecture of chill coma recovery time. Indeed, we found that the predictive ability of the genomic best linear unbiased prediction model is markedly improved when we combine quantitative trait locus mapping with genomic prediction by only including the top variants associated with main and epistatic effects in the relationship matrix. This trait-associated prediction approach has the advantage that it yields biologically interpretable prediction models.

  9. In silico enhanced restriction enzyme based methylation analysis of the human glioblastoma genome using Agilent 244K CpG Island microarrays

    Directory of Open Access Journals (Sweden)

    Anh Tran

    2010-01-01

    Full Text Available Genome wide methylation profiling of gliomas is likely to provide important clues to improving treatment outcomes. Restriction enzyme based approaches have been widely utilized for methylation profiling of cancer genomes and will continue to have importance in combination with higher density microarrays. With the availability of the human genome sequence and microarray probe sequences, these approaches can be readily characterized and optimized via in silico modeling. We adapted the previously described HpaII/MspI based Methylation Sensitive Restriction Enzyme (MSRE) assay for use with two-color Agilent 244K CpG island microarrays. In this assay, fragmented genomic DNA is digested in separate reactions with isoschizomeric HpaII (methylation-sensitive) and MspI (methylation-insensitive) restriction enzymes. Using in silico hybridization, we found that genomic fragmentation with BfaI was superior to MseI, providing a maximum effective coverage of 22,362 CpG islands in the human genome. In addition, we confirmed the presence of an internal control group of fragments lacking HpaII/MspI sites which enable separation of methylated and unmethylated fragments. We used this method on genomic DNA isolated from normal brain, U87MG cells, and a glioblastoma patient tumor sample and confirmed selected differentially methylated CpG islands using bisulfite sequencing. Along with additional validation points, we performed a receiver operating characteristics (ROC) analysis to determine the optimal threshold (p ≤ 0.001). Based on this threshold, we identified ~2400 CpG islands common to all three samples and 145 CpG islands unique to glioblastoma. These data provide more general guidance to individuals seeking to maximize effective coverage using restriction enzyme based methylation profiling approaches.
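
    The in silico modeling idea in this record can be sketched in a few lines of Python. This is an illustration only, not the authors' software: the function names and the short test sequence are hypothetical, and the sketch ignores fragment size limits and probe hybridization. It relies on the standard recognition sequences CCGG for HpaII/MspI and C^TAG for BfaI.

```python
from typing import List

HPAII_MSPI_SITE = "CCGG"  # shared recognition sequence of the isoschizomers
BFAI_SITE = "CTAG"        # BfaI cuts after the first base: C^TAG
BFAI_CUT_OFFSET = 1


def find_sites(sequence: str, site: str) -> List[int]:
    """Return 0-based start positions of every occurrence of `site`."""
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by one to allow overlaps
    return positions


def digest(sequence: str, site: str, cut_offset: int) -> List[str]:
    """Cut `sequence` at every `site` and return the fragments in order."""
    seq = sequence.upper()
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical 30-base sequence used only to exercise the functions.
genomic = "AACTAGGGCCGGTTCTAGAAATTTCTAGCC"
fragments = digest(genomic, BFAI_SITE, BFAI_CUT_OFFSET)

# Fragments with a CCGG site can report methylation status; fragments
# without one correspond to the internal control group in the abstract.
informative = [f for f in fragments if find_sites(f, HPAII_MSPI_SITE)]
controls = [f for f in fragments if not find_sites(f, HPAII_MSPI_SITE)]
print(informative, controls)
```

    Swapping `BFAI_SITE` for the MseI sequence (T^TAA, same cut offset) would let the two fragmentation enzymes be compared on the same input, in the spirit of the comparison the abstract reports.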

  10. The master activator of IncA/C conjugative plasmids stimulates genomic islands and multidrug resistance dissemination.

    Science.gov (United States)

    Carraro, Nicolas; Matteau, Dominick; Luo, Peng; Rodrigue, Sébastien; Burrus, Vincent

    2014-10-01

    Dissemination of antibiotic resistance genes occurs mostly by conjugation, which mediates DNA transfer between cells in direct contact. Conjugative plasmids of the IncA/C incompatibility group have become a substantial threat due to their broad host-range, the extended spectrum of antimicrobial resistance they confer, their prevalence in enteric bacteria and their very efficient spread by conjugation. However, their biology remains largely unexplored. Using the IncA/C conjugative plasmid pVCR94ΔX as a prototype, we have investigated the regulatory circuitry that governs IncA/C plasmids dissemination and found that the transcriptional activator complex AcaCD is essential for the expression of plasmid transfer genes. Using chromatin immunoprecipitation coupled with exonuclease digestion (ChIP-exo) and RNA sequencing (RNA-seq) approaches, we have identified the sequences recognized by AcaCD and characterized the AcaCD regulon. Data mining using the DNA motif recognized by AcaCD revealed potential AcaCD-binding sites upstream of genes involved in the intracellular mobility functions (recombination directionality factor and mobilization genes) in two widespread classes of genomic islands (GIs) phylogenetically unrelated to IncA/C plasmids. The first class, SGI1, confers and propagates multidrug resistance in Salmonella enterica and Proteus mirabilis, whereas MGIVmi1 in Vibrio mimicus belongs to a previously uncharacterized class of GIs. We have demonstrated that through expression of AcaCD, IncA/C plasmids specifically trigger the excision and mobilization of the GIs at high frequencies. This study provides new evidence of the considerable impact of IncA/C plasmids on bacterial genome plasticity through their own mobility and the mobilization of genomic islands.

  11. Evaluation of the utility of gene expression and metabolic information for genomic prediction in maize.

    Science.gov (United States)

    Guo, Zhigang; Magwire, Michael M; Basten, Christopher J; Xu, Zhanyou; Wang, Daolong

    2016-12-01

    Predictive ability derived from gene expression and metabolic information was evaluated using genomic prediction methods based on datasets from a public maize panel. With the rapid development of high throughput biological technologies, information from gene expression and metabolites has received growing attention in plant genetics and breeding. In this study, we evaluated the utility of gene expression and metabolic information for genomic prediction using data obtained from a maize diversity panel. Our results show that, when used as predictor variables, gene expression levels and metabolite abundances provided reasonable predictive abilities relative to those based on genetic markers, although these values were not as large as those with genetic markers. Integrating gene expression levels and metabolite abundances with genetic markers significantly improved predictive abilities in comparison to the benchmark genomic best linear unbiased prediction model using genome-wide markers only. Predictive abilities based on gene expression and metabolites were trait-specific and were affected by the time of measurement and tissue samples as well as the number of genes and metabolites included in the model. In general, our results suggest that, rather than being conventionally used as intermediate phenotypes, gene expression and metabolic information can be used as predictors for genomic prediction and help improve genetic gains for complex traits in breeding programs.

  12. Establishing the basis for Genomic Prediction in Perennial Ryegrass

    DEFF Research Database (Denmark)

    Fé, Dario

    2015-01-01

    Genomic Selection (GS) is a relatively new technology, which has already revolutionized animal breeding and which is expected to have a high impact on plant breeding. In contrast to traditional marker assisted breeding, which only focuses on specific genes, GS estimates the genetic value...... of individuals/families by using genomic information over the whole genome. The benefits of GS include reductions in expensive and time-consuming phenotyping operations, higher genetic gains, and simultaneous selection of multiple traits. To date, GS has primarily been tested in species, which are grown...... as homogeneous varieties. For crops grown in heterogeneous families, investigations have been limited to a few theoretical considerations. The aim of the present thesis was to establish the basis for GS implementation in such species. Analyses were performed on real data from a breeding program of perennial...

  13. Kernel-based whole-genome prediction of complex traits: a review

    Directory of Open Access Journals (Sweden)

    Gota Morota

    2014-10-01

    Full Text Available Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  14. Kernel-based whole-genome prediction of complex traits: a review

    Science.gov (United States)

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics. PMID:25360145

  15. From structure prediction to genomic screens for novel non-coding RNAs

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.

    2011-01-01

    . This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early...... methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch...

  16. Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

    Directory of Open Access Journals (Sweden)

    Bonten Marc JM

    2010-04-01

    Full Text Available Abstract Background The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromised patients. Results We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates several antibiotic resistance genes were identified, including the vanA transposon that confers resistance to vancomycin in two strains. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA in its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated in the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI), which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come.

  17. Comparative Genomics of Rhodococcus equi Virulence Plasmids Indicates Host-Driven Evolution of the vap Pathogenicity Island.

    Science.gov (United States)

    MacArthur, Iain; Anastasi, Elisa; Alvarez, Sonsiray; Scortti, Mariela; Vázquez-Boland, José A

    2017-05-01

    The conjugative virulence plasmid is a key component of the Rhodococcus equi accessory genome essential for pathogenesis. Three host-associated virulence plasmid types have been identified: the equine pVAPA and porcine pVAPB circular variants, and the linear pVAPN found in bovine (ruminant) isolates. We recently characterized the R. equi pangenome (Anastasi E, et al. 2016. Pangenome and phylogenomic analysis of the pathogenic actinobacterium Rhodococcus equi. Genome Biol Evol. 8:3140-3148.) and we report here the comparative analysis of the virulence plasmid genomes. Plasmids within each host-associated type were highly similar despite their diverse origins. Variation was accounted for by scattered single nucleotide polymorphisms and short nucleotide indels, while larger indels, mostly in the plasticity region near the vap pathogenicity island (PAI), defined plasmid genomic subtypes. Only one of the plasmids analyzed, of pVAPN type, was exceptionally divergent due to accumulation of indels in the housekeeping backbone. Each host-associated plasmid type carried a unique PAI differing in vap gene complement, suggesting animal host-specific evolution of the vap multigene family. Complete conservation of the vap PAI was observed within each host-associated plasmid type. Both diversity of host-associated plasmid types and clonality of specific chromosomal-plasmid genomic type combinations were observed within the same R. equi phylogenomic subclade. Our data indicate that the overall strong conservation of the R. equi host-associated virulence plasmids is the combined result of host-driven selection, lateral transfer between strains, and geographical spread due to international livestock exchanges. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Genomic selection and complex trait prediction using a fast EM algorithm applied to genome-wide markers.

    Science.gov (United States)

    Shepherd, Ross K; Meuwissen, Theo H E; Woolliams, John A

    2010-10-22

    The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time.

  19. A foundation for provitamin A biofortification of maize: genome-wide association and genomic prediction models of carotenoid levels.

    Science.gov (United States)

    Owens, Brenda F; Lipka, Alexander E; Magallanes-Lundback, Maria; Tiede, Tyler; Diepenbrock, Christine H; Kandianis, Catherine B; Kim, Eunha; Cepela, Jason; Mateos-Hernandez, Maria; Buell, C Robin; Buckler, Edward S; DellaPenna, Dean; Gore, Michael A; Rocheford, Torbert

    2014-12-01

    Efforts are underway for development of crops with improved levels of provitamin A carotenoids to help combat dietary vitamin A deficiency. As a global staple crop with considerable variation in kernel carotenoid composition, maize (Zea mays L.) could have a widespread impact. We performed a genome-wide association study (GWAS) of quantified seed carotenoids across a panel of maize inbreds ranging from light yellow to dark orange in grain color to identify some of the key genes controlling maize grain carotenoid composition. Significant associations at the genome-wide level were detected within the coding regions of zep1 and lut1, carotenoid biosynthetic genes not previously shown to impact grain carotenoid composition in association studies, as well as within previously associated lcyE and crtRB1 genes. We leveraged existing biochemical and genomic information to identify 58 a priori candidate genes relevant to the biosynthesis and retention of carotenoids in maize to test in a pathway-level analysis. This revealed dxs2 and lut5, genes not previously associated with kernel carotenoids. In genomic prediction models, the use of markers that targeted a small set of quantitative trait loci associated with carotenoid levels in prior linkage studies was as effective as the use of genome-wide markers for predicting carotenoid traits. Based on GWAS, pathway-level analysis, and genomic prediction studies, we outline a flexible strategy involving use of a small number of genes that can be selected for rapid conversion of elite white grain germplasm, with minimal amounts of carotenoids, to orange grain versions containing high levels of provitamin A.

  20. Genomic prediction and genomic variance partitioning of daily and residual feed intake in pigs using Bayesian Power Lasso models

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Janss, Luc L G; Strathe, Anders B

    Improvement of feed efficiency is essential in pig breeding and selection for reduced residual feed intake (RFI) is an option. The study applied Bayesian Power LASSO (BPL) models with different power parameter to investigate genetic architecture, to predict genomic breeding values, and to partition...... genomic variance for RFI and daily feed intake (DFI). A total of 1272 Duroc pigs had both genotypic and phenotypic records for these traits. Significant SNPs were detected on chromosome 1 (SSC 1) and SSC 14 for RFI and on SSC 1 for DFI. BPL had similar accuracy and bias as GBLUP but power parameters had...

  1. Genomic prediction and genomic variance partitioning of daily and residual feed intake in pigs using Bayesian Power Lasso models

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Janss, L. L. G.; Strathe, Anders Bjerring

    Improvement of feed efficiency is essential in pig breeding and selection for reduced residual feed intake (RFI) is an option. The study applied Bayesian Power LASSO (BPL) models with different power parameter to investigate genetic architecture, to predict genomic breeding values, and to partition...... genomic variance for RFI and daily feed intake (DFI). A total of 1272 Duroc pigs had both genotypic and phenotypic records for these traits. Significant SNPs were detected on chromosome 1 (SSC 1) and SSC 14 for RFI and on SSC 1 for DFI. BPL models had similar accuracy and bias as GBLUP method but use...

  2. Gene prediction in the fathead minnow [Pimephales promelas] genome

    Science.gov (United States)

    The fathead minnow is a well-established model organism which has been widely used for regulatory ecotoxicity testing and research for over half a century. While much information has been gathered on the organism over the years, the fathead minnow genome, a critical source of infor...

  3. Prediction of severe thunderstorms over Sriharikota Island by using the WRF-ARW operational model

    Science.gov (United States)

    Papa Rao, G.; Rajasekhar, M.; Pushpa Saroja, R.; Sreeshna, T.; Rajeevan, M.; Ramakrishna, S. S. V. S.

    2016-05-01

    Operational short-range prediction of mesoscale thunderstorms for Sriharikota (13.7°N, 80.18°E) has been performed using a two-nested-domain (27 and 9 km) configuration of the Weather Research & Forecasting-Advanced Research Weather Model (WRF-ARW V3.4). A thunderstorm is a mesoscale system with a spatial scale of a few kilometers to a couple of hundred kilometers and a time scale of less than one hour to several hours, which produces heavy rain, lightning, thunder, surface wind squalls and downbursts. Thunderstorms at Sriharikota and its neighborhood are studied numerically, together with the antecedent thermodynamic stability indices and parameters that usually favor the development of convective instability, based on WRF-ARW model predictions. Instability is a prerequisite for the occurrence of severe weather: the greater the instability, the greater the potential for thunderstorms. In the present study, K Index, Total Totals Index (TTI), Convective Available Potential Energy (CAPE), Convective Inhibition Energy (CINE), Lifted Index (LI), Precipitable Water (PW), etc. are the instability indices used for the short-range prediction of thunderstorms. We attempt to estimate the skill of the WRF-ARW predictions and diagnose three thunderstorms that occurred during the late evening to late night of 31 July, 20 September and 2 October 2015 over Sriharikota Island, which are validated with the local Electric Field Mill (EFM), rainfall observations and Chennai Doppler Weather Radar products. The model-predicted thermodynamic indices (CAPE, CINE, K Index, LI, TTI and PW) over Sriharikota act as good indicators of severe thunderstorm activity.
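
    The abstract names its instability indices without defining them. Two of them, the K Index and the Total Totals Index, have simple closed forms in their standard textbook definitions, sketched below from temperatures and dew points (in °C) at the 850, 700 and 500 hPa levels. The sounding values are hypothetical and are not taken from the study; the other indices (CAPE, CINE, LI, PW) require a full vertical profile and are not attempted here.

```python
def k_index(t850, td850, t700, td700, t500):
    """K Index: (T850 - T500) + Td850 - (T700 - Td700), all in deg C."""
    return (t850 - t500) + td850 - (t700 - td700)


def total_totals(t850, td850, t500):
    """Total Totals Index: (T850 - T500) + (Td850 - T500), all in deg C."""
    return (t850 - t500) + (td850 - t500)


# Hypothetical sounding values in deg C, for illustration only.
sounding = {"t850": 20.0, "td850": 15.0, "t700": 5.0, "td700": 0.0, "t500": -10.0}

ki = k_index(**sounding)
tti = total_totals(sounding["t850"], sounding["td850"], sounding["t500"])
print(f"K Index = {ki:.1f}, Total Totals Index = {tti:.1f}")
```

    Larger values of either index indicate a warmer, moister lower troposphere beneath cooler mid-levels, which is the sense in which the abstract treats them as indicators of thunderstorm potential.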

  4. Destabilization of IncA and IncC plasmids by SGI1 and SGI2 type Salmonella genomic islands.

    Science.gov (United States)

    Harmer, Christopher J; Hamidian, Mohammad; Ambrose, Stephanie J; Hall, Ruth M

    Both the Salmonella genomic islands (SGI) and the conjugative IncC plasmids are known to contribute substantially to the acquisition of resistance to multiple antibiotics, and plasmids in the A/C group are known to mobilize the Salmonella genomic island SGI1, which also carries multiple antibiotic resistance genes. Plasmid pRMH760 (IncC; A/C2) was shown to mobilize SGI1 variants SGI1-I, SGI1-F, SGI1-K and SGI2 from Salmonella enterica to Escherichia coli where it was integrated at the preferred location, at the end of the trmE (thdF) gene. The plasmid was transferred at a similar frequency. However, we observed that co-transfer of the SGI and the plasmid was rarer. In E. coli to E. coli transfer, the frequency of transfer of the IncC plasmid pRMH760 was at least 1000-fold lower when the donor carried SGI1-I or SGI1-K, indicating that the SGI suppresses transfer of the plasmid. In addition, pRMH760 was rapidly lost from both E. coli and S. enterica strains that also carried SGI1-I, SGI1-F or SGI2. However, plasmid loss was not seen when the SGI1 variant was SGI1-K, which lacks two segments of the SGI1 backbone. The complete sequences of SGI1-I and SGI1-F were determined, and SGI1-K also carries two single base substitutions relative to SGI1-I. The IncA (A/C1) plasmid RA1 was also shown to mobilize SGI2-A and, though there are significant differences between the backbones of IncA and IncC plasmids, RA1 was also rapidly lost when SGI2-A was present in the same cell. We conclude that there are multiple interactions, both cooperative and antagonistic, between an IncA or IncC plasmid and the SGI1 and SGI2 family genomic islands.

  5. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes

    Directory of Open Access Journals (Sweden)

    Yang Yi-Fan

    2007-03-01

    Background: Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. Results: This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure before making the prediction of genes. Conclusion: Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  6. Interactions of Neuropathogenic Escherichia coli K1 (RS218) and Its Derivatives Lacking Genomic Islands with Phagocytic Acanthamoeba castellanii and Nonphagocytic Brain Endothelial Cells

    Directory of Open Access Journals (Sweden)

    Farzana Abubakar Yousuf

    2014-01-01

    Here we determined the role of various genomic islands in E. coli K1 interactions with phagocytic A. castellanii and nonphagocytic brain microvascular endothelial cells. The findings revealed that the genomic island deletion mutants of RS218 related to toxins (peptide toxin, α-hemolysin), adhesins (P fimbriae, F17-like fimbriae, nonfimbrial adhesins, Hek, and hemagglutinin), protein secretion system (T1SS for hemolysin), invasins (IbeA, CNF1), and metabolism (D-serine catabolism, dihydroxyacetone, glycerol, and glyoxylate metabolism) showed reduced interactions with both A. castellanii and brain microvascular endothelial cells. Interestingly, the deletion of RS218-derived genomic island 21 containing adhesins (P fimbriae, F17-like fimbriae, nonfimbrial adhesins, Hek, and hemagglutinin), protein secretion system (T1SS for hemolysin), invasins (CNF1), and metabolism (D-serine catabolism) abolished E. coli K1-mediated HBMEC cytotoxicity in a CNF1-independent manner. Therefore, the characterization of these genomic islands should reveal mechanisms of evolutionary gain for E. coli K1 pathogenicity.

  7. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    Energy Technology Data Exchange (ETDEWEB)

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    The RegPredict web server provides tools for the reconstruction and analysis of microbial regulons using a comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences and on ortholog and operon predictions from the MicrobesOnline database. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  8. Genome-wide prediction and validation of sigma70 promoters in Lactobacillus plantarum WCFS1.

    Directory of Open Access Journals (Sweden)

    Tilman J Todt

    BACKGROUND: In prokaryotes, sigma factors are essential for directing the transcription machinery towards promoters. Various sigma factors have been described that recognize and bind to specific DNA sequence motifs in promoter sequences. The canonical sigma factor σ70 is commonly involved in transcription of the cell's housekeeping genes, which is mediated by the conserved σ70 promoter sequence motifs. In this study the σ70-promoter sequences in Lactobacillus plantarum WCFS1 were predicted using a genome-wide analysis. The accuracy of the transcriptionally-active part of this promoter prediction was subsequently evaluated by correlating locations of predicted promoters with transcription start sites inferred from the 5'-ends of transcripts detected by high-resolution tiling array transcriptome datasets. RESULTS: To identify σ70-related promoter sequences, we performed a genome-wide sequence motif scan of the L. plantarum WCFS1 genome focussing on the regions upstream of protein-encoding genes. We obtained several highly conserved motifs including those resembling the conserved σ70-promoter consensus. Position weight matrices-based models of the recovered σ70-promoter sequence motif were employed to identify 3874 motifs with significant similarity (p-value < 10^-4) to the model-motif in the L. plantarum genome. Genome-wide transcript information deduced from whole genome tiling-array transcriptome datasets was used to infer transcription start sites (TSSs) from the 5'-end of transcripts. By this procedure, 1167 putative TSSs were identified that were used to corroborate the transcriptionally active fraction of these predicted promoters. In total, 568 predicted promoters were found in proximity (≤ 40 nucleotides) of the putative TSSs, showing a highly significant co-occurrence of predicted promoter and TSS (p-value < 10^-263). CONCLUSIONS: High-resolution tiling arrays provide a suitable source to infer TSSs at a genome-wide level, and

  9. A Bayesian network approach to predicting nest presence of the federally-threatened piping plover (Charadrius melodus) using barrier island features

    Science.gov (United States)

    Gieder, Katherina D.; Karpanty, Sarah M.; Fraser, James D.; Catlin, Daniel H.; Gutierrez, Benjamin T.; Plant, Nathaniel G.; Turecek, Aaron M.; Thieler, E. Robert

    2014-01-01

    Sea-level rise and human development pose significant threats to shorebirds, particularly for species that utilize barrier island habitat. The piping plover (Charadrius melodus) is a federally-listed shorebird that nests on barrier islands and rapidly responds to changes in its physical environment, making it an excellent species with which to model how shorebird species may respond to habitat change related to sea-level rise and human development. The uncertainty and complexity in predicting sea-level rise, the responses of barrier island habitats to sea-level rise, and the responses of species to sea-level rise and human development necessitate a modelling approach that can link species to the physical habitat features that will be altered by changes in sea level and human development. We used a Bayesian network framework to develop a model that links piping plover nest presence to the physical features of their nesting habitat on a barrier island that is impacted by sea-level rise and human development, using three years of data (1999, 2002, and 2008) from Assateague Island National Seashore in Maryland. Our model performance results showed that we were able to successfully predict nest presence given a wide range of physical conditions within the model’s dataset. We found that model predictions were more successful when the range of physical conditions included in model development was varied rather than when those physical conditions were narrow. We also found that all model predictions had fewer false negatives (nests predicted to be absent when they were actually present in the dataset) than false positives (nests predicted to be present when they were actually absent in the dataset), indicating that our model correctly predicted nest presence better than nest absence. These results indicated that our approach of using a Bayesian network to link specific physical features to nest presence will be useful for modelling impacts of sea-level rise- or human

  10. The complete genomes of subgenotype IA hepatitis A virus strains from four different islands in Indonesia form a phylogenetic cluster.

    Science.gov (United States)

    Mulyanto; Wibawa, I Dewa Nyoman; Suparyatmo, Joseph Benedictus; Amirudin, Rifai; Ohnishi, Hiroshi; Takahashi, Masaharu; Nishizawa, Tsutomu; Okamoto, Hiroaki

    2014-05-01

    Despite the high endemicity of hepatitis A virus (HAV) in Indonesia, genetic information on those HAV strains is limited. Serum samples obtained from 76 individuals during outbreaks of hepatitis A in Jember (East Java) in 2006 and Tangerang (West Java) in 2007 and those from 82 patients with acute hepatitis in Solo (Central Java), Denpasar on Bali Island, Mataram on Lombok Island, and Makassar on Sulawesi Island in 2003 or 2007 were tested for the presence of HAV RNA by reverse transcription PCR with primers targeting the VP1-2B region (481 nucleotides, primer sequences at both ends excluded). Overall, 34 serum samples had detectable HAV RNA, including at least one viremic sample from each of the six regions. These 34 strains were 96.3-100 % identical to each other and formed a phylogenetic cluster within genotype IA. Six representative HAV isolates from each region shared 98.3-98.9 % identity over the entire genome and constituted a IA sublineage with a bootstrap value of 100 %, consisting of only Indonesian strains. HAV strains recovered from Japanese patients who were presumed to have contracted HAV infection while visiting Indonesia were closest to the Indonesian IA HAV strains obtained in the present study, with a high identity of 99.5-99.7 %, supporting the Indonesian origin of the imported strains. These results indicate that genetic analyses of HAV strains indigenous to HAV-endemic countries, including Indonesia, are useful for tracing infectious sources in imported cases of acute hepatitis A and for defining the epidemiological features of HAV infection in that country.

  11. Cytotoxic chromosomal targeting by CRISPR/Cas systems can reshape bacterial genomes and expel or remodel pathogenicity islands.

    Directory of Open Access Journals (Sweden)

    Reuben B Vercoe

    2013-04-01

    In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas-mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA-targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity.

  13. Cytotoxic Chromosomal Targeting by CRISPR/Cas Systems Can Reshape Bacterial Genomes and Expel or Remodel Pathogenicity Islands

    Science.gov (United States)

    Vercoe, Reuben B.; Chang, James T.; Dy, Ron L.; Taylor, Corinda; Gristwood, Tamzin; Clulow, James S.; Richter, Corinna; Przybilski, Rita; Pitman, Andrew R.; Fineran, Peter C.

    2013-01-01

    In prokaryotes, clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (Cas) proteins constitute a defence system against bacteriophages and plasmids. CRISPR/Cas systems acquire short spacer sequences from foreign genetic elements and incorporate these into their CRISPR arrays, generating a memory of past invaders. Defence is provided by short non-coding RNAs that guide Cas proteins to cleave complementary nucleic acids. While most spacers are acquired from phages and plasmids, there are examples of spacers that match genes elsewhere in the host bacterial chromosome. In Pectobacterium atrosepticum the type I-F CRISPR/Cas system has acquired a self-complementary spacer that perfectly matches a protospacer target in a horizontally acquired island (HAI2) involved in plant pathogenicity. Given the paucity of experimental data about CRISPR/Cas–mediated chromosomal targeting, we examined this process by developing a tightly controlled system. Chromosomal targeting was highly toxic via targeting of DNA and resulted in growth inhibition and cellular filamentation. The toxic phenotype was avoided by mutations in the cas operon, the CRISPR repeats, the protospacer target, and protospacer-adjacent motif (PAM) beside the target. Indeed, the natural self-targeting spacer was non-toxic due to a single nucleotide mutation adjacent to the target in the PAM sequence. Furthermore, we show that chromosomal targeting can result in large-scale genomic alterations, including the remodelling or deletion of entire pre-existing pathogenicity islands. These features can be engineered for the targeted deletion of large regions of bacterial chromosomes. In conclusion, in DNA–targeting CRISPR/Cas systems, chromosomal interference is deleterious by causing DNA damage and providing a strong selective pressure for genome alterations, which may have consequences for bacterial evolution and pathogenicity. PMID:23637624

  14. History shaped the geographic distribution of genomic admixture on the island of Puerto Rico.

    Directory of Open Access Journals (Sweden)

    Marc Via

    Contemporary genetic variation among Latin American human groups reflects population migrations shaped by complex historical, social and economic factors. Consequently, admixture patterns may vary by geographic regions ranging from countries to neighborhoods. We examined the geographic variation of admixture across the island of Puerto Rico and the degree to which it could be explained by historic and social events. We analyzed a census-based sample of 642 Puerto Rican individuals that were genotyped for 93 ancestry informative markers (AIMs) to estimate African, European and Native American ancestry. Socioeconomic status (SES) data and geographic location were obtained for each individual. There was significant geographic variation of ancestry across the island. In particular, African ancestry demonstrated a decreasing East to West gradient that was partially explained by historical factors linked to the colonial sugar plantation system. SES also demonstrated a parallel decreasing cline from East to West. However, at a local level, SES and African ancestry were negatively correlated. European ancestry was strongly negatively correlated with African ancestry and therefore showed patterns complementary to African ancestry. By contrast, Native American ancestry showed little variation across the island and across individuals and appears to have played little social role historically. The observed geographic distributions of SES and genetic variation relate to historical social events and mating patterns, and have substantial implications for the design of studies in the recently admixed Puerto Rican population. More generally, our results demonstrate the importance of incorporating social and geographic data with genetics when studying contemporary admixed populations.

  15. Genome-wide association and genomic prediction of resistance to maize lethal necrosis disease in tropical maize germplasm.

    Science.gov (United States)

    Gowda, Manje; Das, Biswanath; Makumbi, Dan; Babu, Raman; Semagn, Kassa; Mahuku, George; Olsen, Michael S; Bright, Jumbo M; Beyene, Yoseph; Prasanna, Boddupalli M

    2015-10-01

    Genome-wide association analysis in tropical and subtropical maize germplasm revealed that MLND resistance is influenced by multiple genomic regions with small to medium effects. Maize lethal necrosis disease (MLND), caused by the synergistic interaction of Maize chlorotic mottle virus and Sugarcane mosaic virus, has emerged as a serious threat to maize production in eastern Africa since 2011. Our objective was to gain insights into the genetic architecture underlying the resistance to MLND by genome-wide association study (GWAS) and genomic selection. We used two association mapping (AM) panels comprising a total of 615 diverse tropical/subtropical maize inbred lines. All the lines were evaluated against MLND under artificial inoculation. Both panels were genotyped using genotyping-by-sequencing. Phenotypic variation for MLND resistance was significant and heritability was moderately high in both panels. A few promising lines with high resistance to MLND were identified for use as potential donors. GWAS revealed 24 SNPs that were significantly associated (P < 3 × 10^-5) with MLND resistance. These SNPs are located within or adjacent to 20 putative candidate genes that are associated with plant disease resistance. Ridge regression best linear unbiased prediction with five-fold cross-validation revealed higher prediction accuracy for the IMAS-AM panel (0.56) than for the DTMA-AM panel (0.36). The prediction accuracy both within and across panels is promising; inclusion of MLND resistance-associated SNPs in the prediction model further improved the accuracy. Overall, the study revealed that resistance to MLND is controlled by multiple loci with small to medium effects and that the SNPs identified by GWAS can be used as potential candidates in MLND resistance breeding programs.
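
    The evaluation scheme named in the abstract above, ridge regression with five-fold cross-validation, can be sketched as follows. This is a minimal illustration on simulated genotypes, not the authors' pipeline: the data, the penalty `lam`, and the definition of accuracy as the mean correlation between predicted and observed phenotypes in the held-out folds are all assumptions made for the example.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge solution: solve (X'X + lam * I) beta = X'y for marker effects."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def five_fold_accuracy(X, y, lam, seed=0):
    """Mean correlation between predicted and observed values over 5 folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), 5)
    correlations = []
    for k in range(5):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        # Centre with training-set means only, so no test data leaks in.
        x_mean = X[train].mean(axis=0)
        y_mean = y[train].mean()
        beta = ridge_fit(X[train] - x_mean, y[train] - y_mean, lam)
        predicted = y_mean + (X[test] - x_mean) @ beta
        correlations.append(np.corrcoef(predicted, y[test])[0, 1])
    return float(np.mean(correlations))


# Simulated data (hypothetical): 200 lines, 50 markers coded 0/1/2.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(200, 50)).astype(float)
true_effects = rng.normal(size=50)
phenotypes = genotypes @ true_effects + rng.normal(scale=1.0, size=200)

accuracy = five_fold_accuracy(genotypes, phenotypes, lam=1.0)
print(f"mean five-fold prediction accuracy: {accuracy:.2f}")
```

    Because every marker in the simulation has a real effect and the noise is small, the accuracy here is far higher than the 0.36 to 0.56 reported for the real panels.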

  16. Whole genome phylogeny for 21 Drosophila species using predicted 2b-RAD fragments

    Directory of Open Access Journals (Sweden)

    Arun S. Seetharam

    2013-12-01

    Type IIB restriction endonucleases are site-specific endonucleases that cut both strands of double-stranded DNA upstream and downstream of their recognition sequences. These restriction enzymes have recognition sequences that are generally interrupted and range from 5 to 7 bases long. They produce DNA fragments which are uniformly small, ranging from 21 to 33 base pairs in length (without cohesive ends). The fragments are generated from throughout the entire length of a genomic DNA, providing an excellent fractional representation of the genome. In this study we simulated restriction enzyme digestions on 21 sequenced genomes of various Drosophila species using the predicted targets of 16 Type IIB restriction enzymes to effectively produce a large and arbitrary selection of loci from these genomes. The fragments were then used to compare organisms and to calculate the distance between genomes in pair-wise combination by counting the number of shared fragments between the two genomes. Phylogenetic trees were then generated for each enzyme using this distance measure and the consensus was calculated. The consensus tree obtained agrees well with the currently accepted tree for the Drosophila species. We conclude that multi-locus sub-genomic representation combined with next generation sequencing, especially for individuals and species without previous genome characterization, can accelerate studies of comparative genomics and the building of accurate phylogenetic trees.
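
    The in silico digestion and shared-fragment comparison described above can be sketched on toy sequences. Several things here are assumptions: the recognition pattern `SITE` is hypothetical and does not correspond to any real enzyme, the fragment is taken as the site plus a fixed flank on each side, and the distance is one minus the fraction of shared fragments (a Jaccard-style measure), since the abstract says only that shared fragments were counted.

```python
import re
from itertools import combinations

SITE = "CGA[ACGT]{2}TGC"  # hypothetical interrupted recognition sequence
FLANK = 2                 # bases kept on each side of the site (assumed)


def extract_fragments(genome, site=SITE, flank=FLANK):
    """Return the set of fragments: each site match plus its flanks."""
    fragments = set()
    # The lookahead lets overlapping recognition sites be reported too.
    for match in re.finditer(f"(?=({site}))", genome):
        start = match.start(1) - flank
        end = match.end(1) + flank
        if start >= 0 and end <= len(genome):
            fragments.add(genome[start:end])
    return fragments


def shared_fragment_distance(a, b):
    """1 - |shared| / |union|; 0 for identical sets, 1 for disjoint sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Toy sequences standing in for genomes (hypothetical).
genomes = {
    "sp1": "AACGAAATGCTTGGCGACCTGCAA",
    "sp2": "AACGAAATGCTTGGCGAGGTGCAA",
    "sp3": "TTCGATTTGCGGAACGAGGTGCAA",
}
fragment_sets = {name: extract_fragments(seq) for name, seq in genomes.items()}
distances = {
    (x, y): shared_fragment_distance(fragment_sets[x], fragment_sets[y])
    for x, y in combinations(sorted(genomes), 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

    A pairwise distance matrix of this kind is what a tree-building method would take as input; building the per-enzyme trees and their consensus is outside the scope of this sketch.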

  17. Genome-wide prediction, display and refinement of binding sites with information theory-based models

    Directory of Open Access Journals (Sweden)

    Leeder J Steven

    2003-09-01

    Background: We present Delila-genome, a software system for identification, visualization and analysis of protein binding sites in complete genome sequences. Binding sites are predicted by scanning genomic sequences with information theory-based (or user-defined) weight matrices. Matrices are refined by adding experimentally-defined binding sites to published binding sites. Delila-genome was used to examine the accuracy of individual information contents of binding sites detected with refined matrices as a measure of the strengths of the corresponding protein-nucleic acid interactions. The software can then be used to predict novel sites by rescanning the genome with the refined matrices. Results: Parameters for genome scans are entered using a Java-based GUI and backend scripts in Perl. Multi-processor CPU load-sharing minimized the average response time for scans of different chromosomes. Scans of human genome assemblies required 4–6 hours for transcription factor binding sites and 10–19 hours for splice sites, respectively, on 24- and 3-node Mosix and Beowulf clusters. Individual binding sites are displayed either as high-resolution sequence walkers or in low-resolution custom tracks in the UCSC genome browser. For large datasets, we applied a data reduction strategy that limited displays of binding sites exceeding a threshold information content to specific chromosomal regions within or adjacent to genes. An HTML document is produced listing binding sites ranked by binding site strength or chromosomal location hyperlinked to the UCSC custom track, other annotation databases and binding site sequences. Post-genome scan tools parse binding site annotations of selected chromosome intervals and compare the results of genome scans using different weight matrices. Comparisons of multiple genome scans can display binding sites that are unique to each scan and identify sites with significantly altered binding strengths
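
    Scanning a sequence with an information theory-based weight matrix, as the record above describes, can be sketched in a few lines. This is not Delila-genome code. It assumes the common form in which the weight of base b at position l is 2 + log2 f(b, l) bits, omits the small-sample correction, and adds a pseudocount so that unseen bases get a finite weight; the aligned sites, the pseudocount and the threshold are hypothetical.

```python
import math

BASES = "ACGT"


def information_weight_matrix(sites, pseudocount=0.5):
    """Per-position weights in bits: 2 + log2(frequency of each base)."""
    n = len(sites)
    matrix = []
    for pos in range(len(sites[0])):
        column = {}
        for base in BASES:
            count = sum(1 for site in sites if site[pos] == base)
            freq = (count + pseudocount) / (n + 4 * pseudocount)
            column[base] = 2.0 + math.log2(freq)
        matrix.append(column)
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at least threshold bits."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical aligned binding sites used to build the matrix.
known_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
weights = information_weight_matrix(known_sites)
hits = scan("GGGTATAATGGG", weights, threshold=7.0)
for start, score in hits:
    print(f"site at position {start}, {score:.2f} bits")
```

    Refinement in the sense of the abstract would correspond to appending newly confirmed sites to `known_sites`, rebuilding the matrix and rescanning.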

  18. Enhancing genomic prediction with genome-wide association studies in multiparental maize populations

    Science.gov (United States)

    Genome-wide association mapping using dense marker sets has identified some nucleotide variants affecting complex traits which have been validated with fine-mapping and functional analysis. Many sequence variants associated with complex traits in maize have small effects and low repeatability, howev...

  19. Meta-analysis of genome-wide association from genomic prediction models

    Science.gov (United States)

    A limitation of many genome-wide association studies (GWA) in animal breeding is that there are many loci with small effect sizes; thus, larger sample sizes (N) are required to guarantee suitable power of detection. To increase sample size, results from different GWA can be combined in a meta-analys...

  20. Experimental-confirmation and functional-annotation of predicted proteins in the chicken genome

    Directory of Open Access Journals (Sweden)

    McCarthy Fiona M

    2007-11-01

    Full Text Available Abstract Background The chicken genome was sequenced because of its phylogenetic position as a non-mammalian vertebrate, its use as a biomedical model especially to study embryology and development, its role as a source of human disease organisms and its importance as the major source of animal derived food protein. However, genomic sequence data is, in itself, of limited value; generally it is not equivalent to understanding biological function. The benefit of having a genome sequence is that it provides a basis for functional genomics. However, the sequence data currently available is poorly structurally and functionally annotated and many genes do not have standard nomenclature assigned. Results We analysed eight chicken tissues and improved the chicken genome structural annotation by providing experimental support for the in vivo expression of 7,809 computationally predicted proteins, including 30 chicken proteins that were only electronically predicted or hypothetical translations in human. To improve functional annotation (based on Gene Ontology), we mapped these identified proteins to their human and mouse orthologs and used this orthology to transfer Gene Ontology (GO) functional annotations to the chicken proteins. The 8,213 orthology-based GO annotations that we produced represent an 8% increase in currently available chicken GO annotations. Orthologous chicken products were also assigned standardized nomenclature based on current chicken nomenclature guidelines. Conclusion We demonstrate the utility of high-throughput expression proteomics for rapid experimental structural annotation of a newly sequenced eukaryote genome. These experimentally-supported predicted proteins were further annotated by assigning the proteins with standardized nomenclature and functional annotation. This method is widely applicable to a diverse range of species. Moreover, information from one genome can be used to improve the annotation of other genomes and

  1. Unraveling the regulatory network of IncA/C plasmid mobilization: When genomic islands hijack conjugative elements.

    Science.gov (United States)

    Carraro, Nicolas; Matteau, Dominick; Burrus, Vincent; Rodrigue, Sébastien

    2015-01-01

    Conjugative plasmids of the A/C incompatibility group (IncA/C) have become substantial players in the dissemination of multidrug resistance. These large conjugative plasmids are characterized by their broad host range, extended spectrum of antimicrobial resistance, and prevalence in enteric bacteria recovered from both environmental and clinical settings. Until recently, relatively little was known about the basic biology of IncA/C plasmids, mostly because multidrug resistance hinders molecular biology experiments. To circumvent this issue, we previously developed pVCR94ΔX, a convenient prototype that codes for a reduced set of antibiotic resistances. Using pVCR94ΔX, we then characterized the regulatory pathway governing IncA/C plasmid dissemination. We found that the expression of roughly two thirds of the genes encoded by this plasmid, including large operons involved in the conjugation process, depends on an FlhCD-like master activator called AcaCD. Beyond the mobility of IncA/C plasmids, AcaCD was also shown to play a key role in the mobilization of different classes of genomic islands (GIs) identified in various pathogenic bacteria. By doing so, IncA/C plasmids can have a considerable impact on bacterial genome plasticity and evolution.

  2. Campylobacter fetus subspecies: Comparative genomics and prediction of potential virulence targets

    DEFF Research Database (Denmark)

    Ali, Amjad; Soares, Siomar C.; Santos, Anderson R.

    2012-01-01

    The genus Campylobacter contains pathogens causing a wide range of diseases, targeting both humans and animals. Among them, the Campylobacter fetus subspecies fetus and venerealis deserve special attention, as they are the etiological agents of human bacterial gastroenteritis and bovine genital...... in an island specific for C. fetus subsp. venerealis. The genomic variations and potential core and unique virulence factors characterized in this study would lead to better insight into the species virulence and to more efficient use of the candidates for antibiotic, drug and vaccine development....

  3. A putative genomic island, PGI-1, in Ralstonia solanacearum biovar 2 revealed by subtractive hybridization

    NARCIS (Netherlands)

    Stevens, P.; van Elsas, J.D.

    2010-01-01

    Ralstonia solanacearum biovar 2, a key bacterial pathogen of potato, has recently established in temperate climate waters. On the basis of isolates obtained from diseased (potato) plants, its genome has been assumed to be virtually clonal, but information on environmental isolates has been lacking.

  4. Computational prediction of microRNA genes in silkworm genome

    Institute of Scientific and Technical Information of China (English)

    TONG Chuan-zhou; JIN Yong-feng; ZHANG Yao-zhou

    2006-01-01

    MicroRNAs (miRNAs) constitute a novel, extensive class of small RNAs (~21 nucleotides), and play important gene-regulation roles during growth and development in various organisms. Here we conducted a homology search to identify homologs of previously validated miRNAs in the silkworm genome. We identified 24 potential miRNA genes, and named each of them according to the common criteria. Interestingly, we found that a great number of the newly identified miRNAs were conserved in silkworm and Drosophila, and family alignment revealed that miRNA families might possess single nucleotide polymorphisms. miRNA gene clusters and possible functions of complement miRNA pairs are discussed.

  5. SPOCS: software for predicting and visualizing orthology/paralogy relationships among genomes

    Science.gov (United States)

    Curtis, Darren S.; Phillips, Aaron R.; Callister, Stephen J.; Conlan, Sean; McCue, Lee Ann

    2013-01-01

    Summary: At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and in addition, html files that provide a visualization of the predicted ortholog/paralog relationships to which gene/protein expression metadata may be overlaid. Availability and Implementation: A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required. Contact: leeann.mccue@pnnl.gov PMID:23956303

  6. SPOCS: Software for Predicting and Visualizing Orthology/Paralogy Relationships Among Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Curtis, Darren S.; Phillips, Aaron R.; Callister, Stephen J.; Conlan, Sean; McCue, Lee Ann

    2013-10-15

    At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and in addition, html files that provide a visualization of the predicted ortholog/paralog relationships to which gene/protein expression metadata may be overlaid. AVAILABILITY AND IMPLEMENTATION: A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required.

  7. Computational analysis and prediction for exons of PAC579 genomic sequence

    Institute of Scientific and Technical Information of China (English)

    黄弋; 覃文新; 万大方; 赵新泰; 顾健人

    2001-01-01

    To isolate novel genes related to human hepatocellular carcinoma (HCC), we sequenced the P1-derived artificial chromosome PAC579 (D17S926 locus), which maps to the minimum LOH (loss of heterozygosity) deletion region of chromosome 17p13.3 in HCC. Four novel genes mapped in this genomic sequence area were isolated and cloned by wet-lab experiments, and the exons of these genes were located. The 0-60 kb region of this genomic sequence, which includes the genes of interest, was scanned with five different computational exon prediction programs as well as four splice site recognition programs. After analyzing and comparing the computationally predicted results with the wet-lab experiment results, some potential exons were predicted in the genomic sequence by using these programs.

  8. Prediction of Genomic Breeding Values for feed efficiency and related traits in pigs

    DEFF Research Database (Denmark)

    Do, Duy Ngoc; Janss, Luc; Strathe, Anders Bjerring

    Improvement of feed efficiency is essential in pig breeding and selection for reduced residual feed intake (RFI) is an option. Accuracy of genomic prediction (GP) relies on assumptions of genetic architecture of the traits. This study applied five different Bayesian Power LASSO (BPL) models...... with different power parameters to investigate genetic architecture of RFI, to predict genomic breeding values, and to partition genetic variances for different SNP groups. Data were 1272 Duroc pigs with both genotypic and phenotypic records for RFI as well as daily feed intake (DFI). The gene mapping confirmed...... and indicates their potentials for genomic prediction. Further work includes applying other GP methods for RFI and DFI as well as extending these methods to feed efficiency related traits such as feeding behaviour and body composition traits....

  9. Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes

    Directory of Open Access Journals (Sweden)

    Sakanyan Vehary

    2008-05-01

    Full Text Available Abstract Background Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. Results We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I σ70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the α subunit, and then optimally-separated patterns of -35 and -10 boxes, required for interaction with the σ70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. Conclusion The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression, and individual strong promoters in bacteria of medical and/or economic significance.

  10. Marked variation in predicted and observed variability of tandem repeat loci across the human genome

    Directory of Open Access Journals (Sweden)

    Shields Denis C

    2008-04-01

    Full Text Available Abstract Background Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2–12 nucleotides in Whole Genome Shotgun (WGS) sequences to define the extent of variation of 209,214 unique repeat loci throughout the genome. Results We applied a multivariate statistical model to predict TR variability. Predicted heterozygosity correlated with heterozygosity in the CEPH polymorphism database (correlation ρ = 0.29, p ...). Conclusion Variability among 2–12-mer TRs in the genome can be modeled by a few parameters, which do not markedly differ according to unit length, consistent with a common mechanism for the generation of variability among such TRs. Analysis of the distributions of observed and predicted variants across the genome showed a general concordance, indicating that the repeat variation dataset does not exhibit strong regional ascertainment biases. This revealed a deficit of variant repeats in chromosomes 19 and Y – likely to reflect a reduction in 2-mer repeats in the former and a reduced level of recombination in the latter – and excesses in chromosomes 6, 13, 20 and 21.

  11. Irruptive dynamics of introduced caribou on Adak Island, Alaska: an evaluation of Riney-Caughley model predictions

    Science.gov (United States)

    Ricca, Mark A.; Van Vuren, Dirk H.; Weckerly, Floyd W.; Williams, Jeffrey C.; Miles, A. Keith

    2014-01-01

    Large mammalian herbivores introduced to islands without predators are predicted to undergo irruptive population and spatial dynamics, but only a few well-documented case studies support this paradigm. We used the Riney-Caughley model as a framework to test predictions of irruptive population growth and spatial expansion of caribou (Rangifer tarandus granti) introduced to Adak Island in the Aleutian archipelago of Alaska in 1958 and 1959. We utilized a time series of spatially explicit counts conducted on this population intermittently over a 54-year period. Population size increased from 23 released animals to approximately 2900 animals in 2012. Population dynamics were characterized by two distinct periods of irruptive growth separated by a long time period of relative stability, and the catalyst for the initial irruption was more likely related to annual variation in hunting pressure than weather conditions. An unexpected pattern resembling logistic population growth occurred between the peak of the second irruption in 2005 and the next survey conducted seven years later in 2012. Model simulations indicated that an increase in reported harvest alone could not explain the deceleration in population growth, yet high levels of unreported harvest combined with increasing density-dependent feedbacks on fecundity and survival were the most plausible explanation for the observed population trend. No studies of introduced island Rangifer have measured a time series of spatial use to the extent described in this study. Spatial use patterns during the post-calving season strongly supported Riney-Caughley model predictions, whereby high-density core areas expanded outwardly as population size increased. During the calving season, caribou displayed marked site fidelity across the full range of population densities despite availability of other suitable habitats for calving. 
Finally, dispersal and reproduction on neighboring Kagalaska Island represented a new dispersal front

  12. Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.

    Science.gov (United States)

    Zhou, Yao; Vales, M Isabel; Wang, Aoxue; Zhang, Zhiwu

    2017-09-01

    Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across folds. The other approach, Hold accuracy, predicts all phenotypes in all folds and calculates the correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
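
    The two accuracy definitions in this abstract can be contrasted with a minimal sketch. The function names and the toy fold data are illustrative assumptions, not the authors' code. The bias correction mentioned in the abstract is not reproduced here because its formula is not given.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def instant_accuracy(folds):
    """Mean of the per-fold correlations.

    `folds` is a list of (observed, predicted) pairs, one per fold.
    """
    return sum(pearson(obs, pred) for obs, pred in folds) / len(folds)

def hold_accuracy(folds):
    """Single correlation over the predictions pooled from all folds."""
    observed = [value for obs, _ in folds for value in obs]
    predicted = [value for _, pred in folds for value in pred]
    return pearson(observed, predicted)

# Toy data: each fold ranks its individuals perfectly, but the second
# fold's predictions are shifted by a constant offset.
toy_folds = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
    ([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]),
]
instant = instant_accuracy(toy_folds)
hold = hold_accuracy(toy_folds)
```

    In this toy case the per-fold correlations are perfect while the pooled correlation is much lower. It shows only that the two definitions can disagree on the same predictions, not the downward bias analysed in the paper.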

  13. Improving genomic prediction for Danish Jersey using a joint Danish-US reference population

    DEFF Research Database (Denmark)

    Su, Guosheng; Nielsen, Ulrik Sander; Wiggans, G;

    Accuracy of genomic prediction depends on the information in the reference population. Achieving an adequate sized reference population is a challenge for genomic prediction in small cattle populations. One way to increase the size of reference population is to combine reference data from different...... a GBLUP model from the Danish reference population and the joint Danish-US reference population. The traits in the analysis were milk yield, fat yield, protein yield, fertility, mastitis, longevity, body conformation, feet & legs, and longevity. Eight of the nine traits benefitted from the inclusion of US...

  14. Methods and Strategies to Impute Missing Genotypes for Improving Genomic Prediction

    DEFF Research Database (Denmark)

    Ma, Peipei

    Genomic prediction has been widely used in dairy cattle breeding. Genotype imputation is a key procedure for efficiently utilizing marker data from different chips and obtaining high-density marker data at minimal cost. This thesis investigated methods and strategies for genotype imputation for impr...

  15. Ori-Finder 2, an integrated tool to predict replication origins in the archaeal genomes

    Directory of Open Access Journals (Sweden)

    Hao eLuo

    2014-09-01

    Full Text Available DNA replication is one of the most basic processes in all three domains of cellular life. With the advent of the post-genomic era, the increasing number of complete archaeal genomes has created an opportunity for exploration of the molecular mechanisms for initiating cellular DNA replication by in vivo experiments as well as in silico analysis. However, the location of replication origins (oriCs) in many sequenced archaeal genomes remains unknown. We present a web-based tool, Ori-Finder 2, to predict oriCs in archaeal genomes automatically, based on an integrated method comprising the analysis of base composition asymmetry using the Z-curve method, the distribution of Origin Recognition Boxes (ORBs) identified by the FIMO tool, and the occurrence of genes frequently close to oriCs. The web server is also able to analyze unannotated genome sequences by integrating with gene prediction pipelines and BLAST software for gene identification and function annotation. The predicted oriCs are displayed in an HTML table, which offers an intuitive way to browse the results in graphical and tabular form. The software presented here is accurate for genomes with a single oriC, but it does not necessarily find all the origins of replication for genomes with multiple oriCs. Ori-Finder 2 aims to become a useful platform for the identification and analysis of oriCs in archaeal genomes, which would provide insight into the replication mechanisms in archaea. The web server is freely available at http://tubic.tju.edu.cn/Ori-Finder2/.
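
    The base composition asymmetry analysis mentioned in this abstract relies on the Z-curve. A minimal sketch of its standard three cumulative components is given below. The function name is hypothetical and this is not Ori-Finder 2's implementation. ORB detection and gene-proximity evidence are outside the scope of the sketch.

```python
def z_curve(sequence):
    """Return the cumulative Z-curve components (x, y, z) of a DNA sequence.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak H-bond bases (A, T) minus strong H-bond bases (G, C)
    Characters other than A, C, G, T leave the running counts unchanged.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    xs, ys, zs = [], [], []
    for base in sequence.upper():
        if base in counts:
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        xs.append((a + g) - (c + t))
        ys.append((a + c) - (g + t))
        zs.append((a + t) - (g + c))
    return xs, ys, zs

# Toy usage on a very short sequence.
x_comp, y_comp, z_comp = z_curve("AGCT")
```

    On a whole chromosome, candidate origin regions are typically sought near the extremes of such components. How Ori-Finder 2 combines them with the other evidence is not specified in the abstract.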

  16. Pathway-Based Genomics Prediction using Generalized Elastic Net.

    Science.gov (United States)

    Sokolov, Artem; Carlin, Daniel E; Paull, Evan O; Baertsch, Robert; Stuart, Joshua M

    2016-03-01

    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach.

  17. Pathway-Based Genomics Prediction using Generalized Elastic Net.

    Directory of Open Access Journals (Sweden)

    Artem Sokolov

    2016-03-01

    Full Text Available We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach.

  18. Pathway-Based Genomics Prediction using Generalized Elastic Net

    Science.gov (United States)

    Sokolov, Artem; Carlin, Daniel E.; Paull, Evan O.; Baertsch, Robert; Stuart, Joshua M.

    2016-01-01

    We present a novel regularization scheme called The Generalized Elastic Net (GELnet) that incorporates gene pathway information into feature selection. The proposed formulation is applicable to a wide variety of problems in which the interpretation of predictive features using known molecular interactions is desired. The method naturally steers solutions toward sets of mechanistically interlinked genes. Using experiments on synthetic data, we demonstrate that pathway-guided results maintain, and often improve, the accuracy of predictors even in cases where the full gene network is unknown. We apply the method to predict the drug response of breast cancer cell lines. GELnet is able to reveal genetic determinants of sensitivity and resistance for several compounds. In particular, for an EGFR/HER2 inhibitor, it finds a possible trans-differentiation resistance mechanism missed by the corresponding pathway agnostic approach. PMID:26960204

  19. Comprehensive prediction of chromosome dimer resolution sites in bacterial genomes

    Directory of Open Access Journals (Sweden)

    Arakawa Kazuharu

    2011-01-01

    Full Text Available Abstract Background During the replication process of bacteria with circular chromosomes, an odd number of homologous recombination events results in concatenated dimer chromosomes that cannot be partitioned into daughter cells. However, many bacteria harbor a conserved dimer resolution machinery consisting of one or two tyrosine recombinases, XerC and XerD, and their 28-bp target site, dif. Results To study the evolution of the dif/XerCD system and its relationship with replication termination, we report the comprehensive prediction of dif sequences in silico using a phylogenetic prediction approach based on iterated hidden Markov modeling. Using this method, dif sites were identified in 641 organisms among 16 phyla, with a 97.64% identification rate for single-chromosome strains. The dif sequence positions were shown to be strongly correlated with the GC skew shift-point that is induced by replicational mutation/selection pressures, but the difference in the positions of the predicted dif sites and the GC skew shift-points did not correlate with the degree of replicational mutation/selection pressures. Conclusions The sequence of dif sites is widely conserved among many bacterial phyla, and they can be computationally identified using our method. The lack of correlation between dif position and the degree of GC skew suggests that replication termination does not occur strictly at dif sites.
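
    The GC skew shift-point referred to in this abstract can be located with a cumulative skew curve. The sketch below assumes a simple per-base definition (+1 for G, -1 for C) and uses hypothetical function names. It is not the authors' pipeline, and it does not cover the hidden Markov model search for dif itself.

```python
def cumulative_gc_skew(sequence):
    """Running sum of +1 for each G and -1 for each C along the sequence."""
    total = 0
    curve = []
    for base in sequence.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        curve.append(total)
    return curve

def skew_extremes(sequence):
    """Return (index of minimum, index of maximum) of the cumulative skew.

    Assumes a non-empty sequence. Which extreme lies near the terminus
    depends on the strand and coordinate convention of the genome record.
    """
    curve = cumulative_gc_skew(sequence)
    positions = range(len(curve))
    return (min(positions, key=curve.__getitem__),
            max(positions, key=curve.__getitem__))

# Toy usage: the skew rises over the G-rich half and falls over the C-rich half.
toy_min, toy_max = skew_extremes("GGGGCCCC")
```

    Comparing a predicted dif coordinate with the nearer extreme gives the kind of positional difference discussed in the abstract.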

  20. Genome-wide analysis of the Salmonella Fis regulon and its regulatory mechanism on pathogenicity islands.

    Directory of Open Access Journals (Sweden)

    Hui Wang

    Full Text Available Fis, one of the most important nucleoid-associated proteins, functions as a global regulator of transcription in bacteria and has been comprehensively studied in Escherichia coli K12. Fis also influences the virulence of Salmonella enterica and pathogenic E. coli by regulating their virulence genes; however, the relevant mechanism is unclear. In this report, using combined RNA-seq and chromatin immunoprecipitation (ChIP)-seq technologies, we first identified 1646 Fis-regulated genes and 885 Fis-binding targets in S. enterica serovar Typhimurium, and found a Fis regulon different from that in E. coli. Fis has been reported to contribute to the invasion ability of S. enterica. By using cell infection assays, we found that it also enhances the intracellular replication ability of S. enterica within macrophage cells, which is of central importance for the pathogenesis of infections. Salmonella pathogenicity islands (SPI-1 and SPI-2) are crucial for the invasion and survival of S. enterica in host cells. Using mutation and overexpression experiments, real-time PCR analysis, and electrophoretic mobility shift assays, we demonstrated that Fis regulates 63 of the 94 Salmonella pathogenicity island (SPI-1 and SPI-2) genes by three regulatory modes: (i) it binds to SPI regulators in the gene body or in upstream regions; (ii) it binds to SPI genes directly to mediate transcriptional activation of themselves and downstream genes; (iii) it binds to the gene encoding OmpR, which affects SPI gene expression by controlling the SPI regulators SsrA and HilD. Our results provide new insights into the impact of Fis on SPI genes and the pathogenicity of S. enterica.

  1. The stealth episome: suppression of gene expression on the excised genomic island PPHGI-1 from Pseudomonas syringae pv. phaseolicola.

    Directory of Open Access Journals (Sweden)

    Scott A C Godfrey

    2011-03-01

    Full Text Available Pseudomonas syringae pv. phaseolicola is the causative agent of halo blight in the common bean, Phaseolus vulgaris. P. syringae pv. phaseolicola race 4 strain 1302A contains the avirulence gene avrPphB (syn. hopAR1), which resides on PPHGI-1, a 106 kb genomic island. Loss of PPHGI-1 from P. syringae pv. phaseolicola 1302A following exposure to the hypersensitive resistance response (HR) leads to the evolution of strains with altered virulence. Here we have used fluorescent protein reporter systems to gain insight into the mobility of PPHGI-1. Confocal imaging of the dual-labelled P. syringae pv. phaseolicola 1302A strain, F532 (dsRFP in chromosome and eGFP in PPHGI-1), revealed loss of PPHGI-1::eGFP encoded fluorescence during plant infection and when grown in vitro on extracted leaf apoplastic fluids. Fluorescence-activated cell sorting (FACS) of fluorescent and non-fluorescent PPHGI-1::eGFP F532 populations showed that cells lost fluorescence not only when the GI was deleted, but also when it had excised and was present as a circular episome. In addition to reduced expression of eGFP, quantitative PCR on sub-populations separated by FACS showed that transcription of other genes on PPHGI-1 (avrPphB and xerC) was also greatly reduced in F532 cells harbouring the excised PPHGI-1::eGFP episome. Our results show how virulence determinants located on mobile pathogenicity islands may be hidden from detection by host surveillance systems through the suppression of gene expression in the episomal state.

  2. Assessment of the genomic prediction accuracy for feed efficiency traits in meat-type chickens.

    Science.gov (United States)

    Liu, Tianfei; Luo, Chenglong; Wang, Jie; Ma, Jie; Shu, Dingming; Lund, Mogens Sandø; Su, Guosheng; Qu, Hao

    2017-01-01

    Feed represents the major cost of chicken production. Selection for improving feed utilization is a feasible way to reduce feed cost and greenhouse gas emissions. The objectives of this study were to investigate the efficiency of genomic prediction for feed conversion ratio (FCR), residual feed intake (RFI), average daily gain (ADG) and average daily feed intake (ADFI) and to assess the impact of selection for feed efficiency traits FCR and RFI on eviscerating percentage (EP), breast muscle percentage (BMP) and leg muscle percentage (LMP) in meat-type chickens. Genomic prediction was assessed using a 4-fold cross-validation for two validation scenarios. The first scenario was a random family sampling validation (CVF), and the second scenario was a random individual sampling validation (CVR). Variance components were estimated based on the genomic relationship built with single nucleotide polymorphism markers. Genomic estimated breeding values (GEBV) were predicted using a genomic best linear unbiased prediction model. The accuracies of GEBV were evaluated in two ways: the correlation between GEBV and corrected phenotypic value divided by the square root of heritability, i.e., the correlation-based accuracy, and model-based theoretical accuracy. Breeding values were also predicted using a conventional pedigree-based best linear unbiased prediction model in order to compare accuracies of genomic and conventional predictions. The heritability estimates of FCR and RFI were 0.29 and 0.50, respectively. The heritability estimates of ADG, ADFI, EP, BMP and LMP ranged from 0.34 to 0.53. In the CVF scenario, the correlation-based accuracy and the theoretical accuracy of genomic prediction for FCR were slightly higher than those for RFI. The correlation-based accuracies for FCR, RFI, ADG and ADFI were 0.360, 0.284, 0.574 and 0.520, respectively, and the model-based theoretical accuracies were 0.420, 0.414, 0.401 and 0.382, respectively. 
In the CVR scenario, the correlation

  3. Assessment of the genomic prediction accuracy for feed efficiency traits in meat-type chickens

    Science.gov (United States)

    Wang, Jie; Ma, Jie; Shu, Dingming; Lund, Mogens Sandø; Su, Guosheng; Qu, Hao

    2017-01-01

    Feed represents the major cost of chicken production. Selection for improving feed utilization is a feasible way to reduce feed cost and greenhouse gas emissions. The objectives of this study were to investigate the efficiency of genomic prediction for feed conversion ratio (FCR), residual feed intake (RFI), average daily gain (ADG) and average daily feed intake (ADFI) and to assess the impact of selection for feed efficiency traits FCR and RFI on eviscerating percentage (EP), breast muscle percentage (BMP) and leg muscle percentage (LMP) in meat-type chickens. Genomic prediction was assessed using a 4-fold cross-validation for two validation scenarios. The first scenario was a random family sampling validation (CVF), and the second scenario was a random individual sampling validation (CVR). Variance components were estimated based on the genomic relationship built with single nucleotide polymorphism markers. Genomic estimated breeding values (GEBV) were predicted using a genomic best linear unbiased prediction model. The accuracies of GEBV were evaluated in two ways: the correlation between GEBV and corrected phenotypic value divided by the square root of heritability, i.e., the correlation-based accuracy, and model-based theoretical accuracy. Breeding values were also predicted using a conventional pedigree-based best linear unbiased prediction model in order to compare accuracies of genomic and conventional predictions. The heritability estimates of FCR and RFI were 0.29 and 0.50, respectively. The heritability estimates of ADG, ADFI, EP, BMP and LMP ranged from 0.34 to 0.53. In the CVF scenario, the correlation-based accuracy and the theoretical accuracy of genomic prediction for FCR were slightly higher than those for RFI. The correlation-based accuracies for FCR, RFI, ADG and ADFI were 0.360, 0.284, 0.574 and 0.520, respectively, and the model-based theoretical accuracies were 0.420, 0.414, 0.401 and 0.382, respectively. 
In the CVR scenario, the correlation
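
    The correlation-based accuracy defined in the abstract above (the correlation between GEBV and corrected phenotype, divided by the square root of heritability) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the toy input vectors are hypothetical.

```python
import numpy as np

def correlation_based_accuracy(gebv, corrected_phenotype, h2):
    """Pearson correlation between GEBV and corrected phenotype, scaled by 1/sqrt(h2)."""
    r = np.corrcoef(gebv, corrected_phenotype)[0, 1]
    return r / np.sqrt(h2)

# Hypothetical toy vectors, used only to show the call; they are not data from the study.
toy_gebv = np.array([0.1, -0.3, 0.4, 0.0, -0.2])
toy_pheno = np.array([0.5, -0.9, 0.2, 0.3, -0.6])
toy_accuracy = correlation_based_accuracy(toy_gebv, toy_pheno, h2=0.29)
```

    Under this definition, the reported FCR accuracy of 0.360 with a heritability of 0.29 would correspond to a raw correlation of roughly 0.360 × sqrt(0.29) ≈ 0.19. Because of the division by sqrt(h2), the statistic is not bounded by 1.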

  4. Predictive biomarker discovery through the parallel integration of clinical trial and functional genomics datasets

    DEFF Research Database (Denmark)

    Swanton, C.; Larkin, J.M.; Gerlinger, M.

    2010-01-01

    RNA screens to identify and validate functionally important genomic or transcriptomic predictive biomarkers of individual drug response in patients. PREDICT's approach to predictive biomarker discovery differs from conventional associative learning approaches, which can be susceptible to the detection...... inhibitor. Through the analysis of tumour tissue derived from pre-operative renal cell carcinoma (RCC) clinical trials, the PREDICT consortium will use established and novel methods to integrate comprehensive tumour-derived genomic data with personalised tumour-derived shRNA and high throughput si......, reducing ineffective therapy in drug resistant disease, leading to improved quality of life and higher cost efficiency, which in turn should broaden patient access to beneficial therapeutics, thereby enhancing clinical outcome and cancer survival. The consortium will also establish and consolidate...

  5. Genome Neighborhood Network Reveals Insights into Enediyne Biosynthesis and Facilitates Prediction and Prioritization for Discovery

    Science.gov (United States)

    Rudolf, Jeffrey D.; Yan, Xiaohui; Shen, Ben

    2015-01-01

    The enediynes are one of the most fascinating families of bacterial natural products given their unprecedented molecular architecture and extraordinary cytotoxicity. Enediynes are rare with only 11 structurally characterized members and four additional members isolated in their cycloaromatized form. Recent advances in DNA sequencing have resulted in an explosion of microbial genomes. A virtual survey of the GenBank and JGI genome databases revealed 87 enediyne biosynthetic gene clusters from 78 bacteria strains, implying enediynes are more common than previously thought. Here we report the construction and analysis of an enediyne genome neighborhood network (GNN) as a high-throughput approach to analyze secondary metabolite gene clusters. Analysis of the enediyne GNN facilitated rapid gene cluster annotation, revealed genetic trends in enediyne biosynthetic gene clusters resulting in a simple prediction scheme to determine 9- vs 10-membered enediyne gene clusters, and supported a genomic-based strain prioritization method for enediyne discovery. PMID:26318027

  6. Genome-wide de Novo Prediction of Proximal and Distal Tissue-Specific Enhancers

    Energy Technology Data Exchange (ETDEWEB)

    Loots, G G; Ovcharenko, I V

    2005-11-03

    Determining how transcriptional regulatory networks are encoded in the human genome is essential for understanding how cellular processes are directed. Here, we present a novel approach for systematically predicting tissue specific regulatory elements (REs) that blends genome-wide expression profiling, vertebrate genome comparisons, and pattern analysis of transcription factor binding sites. This analysis yields 4,670 candidate REs in the human genome with distinct tissue specificities, the majority of which reside far away from transcription start sites. We identify key transcription factors (TFs) for 34 distinct tissues and demonstrate that tissue-specific gene expression relies on multiple regulatory pathways employing similar, but different cohorts of interacting TFs. The methods and results we describe provide a global view of tissue specific gene regulation in humans, and propose a strategy for deciphering the transcriptional regulatory code in eukaryotes.

  7. Accuracy of predicting genomic breeding values for carcass merit traits in Angus and Charolais beef cattle.

    Science.gov (United States)

    Chen, L; Vinsky, M; Li, C

    2015-02-01

    Accuracy of predicting genomic breeding values for carcass merit traits including hot carcass weight, longissimus muscle area (REA), carcass average backfat thickness (AFAT), lean meat yield (LMY) and carcass marbling score (CMAR) was evaluated based on 543 Angus and 400 Charolais steers genotyped on the Illumina BovineSNP50 Beadchip. For the genomic prediction within Angus, the average accuracy was 0.35 with a range from 0.32 (LMY) to 0.37 (CMAR) across different training/validation data-splitting strategies and statistical methods. The within-breed genomic prediction for Charolais yielded an average accuracy of 0.36 with a range from 0.24 (REA) to 0.46 (AFAT). The across-breed prediction had the lowest accuracy, which was on average near zero. When the data from the two breeds were combined to predict the breeding values of either breed, the prediction accuracy averaged 0.35 for Angus with a range from 0.33 (REA) to 0.39 (CMAR) and averaged 0.33 for Charolais with a range from 0.18 (REA) to 0.46 (AFAT). The prediction accuracy was slightly higher on average when the data were split by animal's birth year than when the data were split by sire family. These results demonstrate that the genetic relationship or relatedness of selection candidates with the training population has a great impact on the accuracy of predicting genomic breeding values under the density of the marker panel used in this study. © 2014 Her Majesty the Queen in Right of Canada. Animal Genetics © 2014 Stichting International Foundation for Animal Genetics.

  8. A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction

    Directory of Open Access Journals (Sweden)

    Osval A. Montesinos-López

    2017-06-01

    There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments.

  9. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery

    DEFF Research Database (Denmark)

    Hickey, John M.; Chiurugwi, Tinashe; Mackay, Ian

    2017-01-01

    The rate of annual yield increases for major staple crops must more than double relative to current levels in order to feed a predicted global population of 9 billion by 2050. Controlled hybridization and selective breeding have been used for centuries to adapt plant and animal species for human...... that unifies breeding approaches, biological discovery, and tools and methods. Here we compare and contrast some animal and plant breeding approaches to make a case for bringing the two together through the application of genomic selection. We propose a strategy for the use of genomic selection as a unifying...... use. However, achieving higher, sustainable rates of improvement in yields in various species will require renewed genetic interventions and dramatic improvement of agricultural practices. Genomic prediction of breeding values has the potential to improve selection, reduce costs and provide a platform...

  10. Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery.

    Science.gov (United States)

    Hickey, John M; Chiurugwi, Tinashe; Mackay, Ian; Powell, Wayne

    2017-08-30

    The rate of annual yield increases for major staple crops must more than double relative to current levels in order to feed a predicted global population of 9 billion by 2050. Controlled hybridization and selective breeding have been used for centuries to adapt plant and animal species for human use. However, achieving higher, sustainable rates of improvement in yields in various species will require renewed genetic interventions and dramatic improvement of agricultural practices. Genomic prediction of breeding values has the potential to improve selection, reduce costs and provide a platform that unifies breeding approaches, biological discovery, and tools and methods. Here we compare and contrast some animal and plant breeding approaches to make a case for bringing the two together through the application of genomic selection. We propose a strategy for the use of genomic selection as a unifying approach to deliver innovative 'step changes' in the rate of genetic gain at scale.

  11. RNA 3D modules in genome-wide predictions of RNA 2D structure

    DEFF Research Database (Denmark)

    Theis, Corinna; Zirbel, Craig L; Zu Siederdissen, Christian Höner

    2015-01-01

    Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational...... approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution....... These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D...

  12. Haplotype Based Genome-Enabled Prediction of Traits Across Nordic Red Cattle Breeds

    DEFF Research Database (Denmark)

    Castro Dias Cuyabano, Beatriz; Lund, Mogens Sandø; Rosa, G J M;

    SNP markers have been widely explored in genome based prediction. This study explored the use of haplotype blocks (haploblocks) to predict five milk production traits (fertility, mastitis, protein, fat and milk yield), using a mix of Nordic Red cattle as reference population for training.......1% higher reliability than with the individual SNP approach in mastitis. This work gives evidence that predictions using haploblocks along with a combined training population of dairy cattle, may improve prediction accuracy of important traits in the individual populations........ Predictions were performed under a Bayesian approach comparing a GBLUP and a mixture model. In general, predictions were more reliable when using haploblocks instead of individual SNPs as predictors. The Danish Red cattle presented the largest benefit in predictive ability from haploblocks, achieving 5...

  13. Performance of genomic prediction within and across generations in maritime pine

    NARCIS (Netherlands)

    Bartholomé, Jérôme; Heerwaarden, Van Joost; Isik, Fikret; Boury, Christophe; Vidal, Marjorie; Plomion, Christophe; Bouffier, Laurent

    2016-01-01

    Background: Genomic selection (GS) is a promising approach for decreasing breeding cycle length in forest trees. Assessment of progeny performance and of the prediction accuracy of GS models over generations is therefore a key issue. Results: A reference population of maritime pine (Pinus

  14. Across Breed QTL Detection and Genomic Prediction in French and Danish Dairy Cattle Breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Guldbrandtsen, Bernt; Hozé, C

    Our objective was to investigate the potential benefits of using sequence data to improve across breed genomic prediction, using data from five French and Danish dairy cattle breeds. First, QTL for protein yield were detected using high density genotypes. Part of the QTL detected within breed was...

  15. Optimal design of low-density SNP arrays for genomic prediction: algorithm and applications

    Science.gov (United States)

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for their optimal design. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optim...

  16. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species

    NARCIS (Netherlands)

    Bruijn, de I.; Kock, de M.J.D.; Meng, Y.; Waard, de P.; Beek, van T.A.; Raaijmakers, J.M.

    2007-01-01

    Analysis of microbial genome sequences have revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these

  17. Across Breed QTL Detection and Genomic Prediction in French and Danish Dairy Cattle Breeds

    DEFF Research Database (Denmark)

    van den Berg, Irene; Guldbrandtsen, Bernt; Hozé, C

    Our objective was to investigate the potential benefits of using sequence data to improve across breed genomic prediction, using data from five French and Danish dairy cattle breeds. First, QTL for protein yield were detected using high density genotypes. Part of the QTL detected within breed was...

  18. Genomic prediction of continuous and binary fertility traits of females in a composite beef cattle breed

    Science.gov (United States)

    Reproduction efficiency is a major factor in the profitability of the beef cattle industry. Genomic selection (GS) is a promising tool that may improve the predictive accuracy and genetic gain of fertility traits. There is a wide range of traits used to measure fertility in dairy and beef cattle inc...

  19. A combined approach for genome wide protein function annotation/prediction

    DEFF Research Database (Denmark)

    Benso, Alfredo; Di Carlo, Stefano; Ur Rehman, Hafeez

    2013-01-01

    proteins in functional genomics and biology in general motivates the use of computational techniques well orchestrated to accurately predict their functions. METHODS: We propose a computational flow for the functional annotation of a protein able to assign the most probable functions to a protein...

  20. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species.

    Science.gov (United States)

    de Bruijn, Irene; de Kock, Maarten J D; Yang, Meng; de Waard, Pieter; van Beek, Teris A; Raaijmakers, Jos M

    2007-01-01

    Analysis of microbial genome sequences have revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these predictions, however, are untested and the association between genome sequence and biological function of the predicted metabolite is lacking. Here we report the genome-based identification of previously unknown CLP gene clusters in plant pathogenic Pseudomonas syringae strains B728a and DC3000 and in plant beneficial Pseudomonas fluorescens Pf0-1 and SBW25. For P. fluorescens SBW25, a model strain in studying bacterial evolution and adaptation, the structure of the CLP with a predicted 9-amino acid peptide moiety was confirmed by chemical analyses. Mutagenesis confirmed that the three identified NRPS genes are essential for CLP synthesis in strain SBW25. CLP production was shown to play a key role in motility, biofilm formation and in activity of SBW25 against zoospores of Phytophthora infestans. This is the first time that an antimicrobial metabolite is identified from strain SBW25. The results indicate that genome mining may enable the discovery of unknown gene clusters and traits that are highly relevant in the lifestyle of plant beneficial and plant pathogenic bacteria.

  1. Genome-based discovery, structure prediction and functional analysis of cyclic lipopeptide antibiotics in Pseudomonas species

    NARCIS (Netherlands)

    Bruijn, de I.; Kock, de M.J.D.; Meng, Y.; Waard, de P.; Beek, van T.A.; Raaijmakers, J.M.

    2007-01-01

    Analysis of microbial genome sequences have revealed numerous genes involved in antibiotic biosynthesis. In Pseudomonads, several gene clusters encoding non-ribosomal peptide synthetases (NRPSs) were predicted to be involved in the synthesis of cyclic lipopeptide (CLP) antibiotics. Most of these pre

  2. Preliminary genomic predictions of feed saved for 1.4 million Holsteins

    Science.gov (United States)

    Genomic predictions of transmitting ability (GPTAs) for residual feed intake (RFI) were computed using data from 4,621 42-day and 202 28-day feed intake trials of 3,947 U.S. Holsteins born 1999-2013 in 9 research herds. The 28-day records had 8.5% larger error variance than 42-day records and receiv...

  3. Potential of marker selection to increase prediction accuracy of genomic selection in soybean (Glycine max L.).

    Science.gov (United States)

    Ma, Yansong; Reif, Jochen C; Jiang, Yong; Wen, Zixiang; Wang, Dechun; Liu, Zhangxiong; Guo, Yong; Wei, Shuhong; Wang, Shuming; Yang, Chunming; Wang, Huicai; Yang, Chunyan; Lu, Weiguo; Xu, Ran; Zhou, Rong; Wang, Ruizhen; Sun, Zudong; Chen, Huaizhu; Zhang, Wanhai; Wu, Jian; Hu, Guohua; Liu, Chunyan; Luan, Xiaoyan; Fu, Yashu; Guo, Tai; Han, Tianfu; Zhang, Mengchen; Sun, Bincheng; Zhang, Lei; Chen, Weiyuan; Wu, Cunxiang; Sun, Shi; Yuan, Baojun; Zhou, Xinan; Han, Dezhi; Yan, Hongrui; Li, Wenbin; Qiu, Lijuan

    Genomic selection is a promising molecular breeding strategy enhancing genetic gain per unit time. The objectives of our study were to (1) explore the prediction accuracy of genomic selection for plant height and yield per plant in soybean [Glycine max (L.) Merr.], (2) discuss the relationship between prediction accuracy and numbers of markers, and (3) evaluate the effect of marker preselection based on different methods on the prediction accuracy. Our study is based on a population of 235 soybean varieties which were evaluated for plant height and yield per plant at multiple locations and genotyped by 5361 single nucleotide polymorphism markers. We applied ridge regression best linear unbiased prediction coupled with fivefold cross-validations and evaluated three strategies of marker preselection. For plant height, marker density and marker preselection procedure impacted prediction accuracy only marginally. In contrast, for grain yield, prediction accuracy based on markers selected with a haplotype block analyses-based approach increased by approximately 4 % compared with random or equidistant marker sampling. Thus, applying marker preselection based on haplotype blocks is an interesting option for a cost-efficient implementation of genomic selection for grain yield in soybean breeding.
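
    The evaluation scheme named in the abstract above, ridge regression BLUP scored by fivefold cross-validation, can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' pipeline: the genotypes and phenotypes are simulated, the marker count is reduced to 500 for speed (the study used 5361 markers on 235 varieties), and the shrinkage parameter `lam` is an arbitrary placeholder rather than a value estimated from variance components.

```python
import numpy as np

def rrblup_cv_accuracy(X, y, lam=300.0, k=5, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    folds = np.array_split(rng.permutation(n), k)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        X_train, y_train = X[train_idx], y[train_idx]
        mu = y_train.mean()
        col_means = X_train.mean(axis=0)
        Z = X_train - col_means  # centre markers on the training set only
        # Dual form of ridge regression: solve an n_train x n_train system,
        # which is cheaper when markers outnumber lines.
        alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(len(train_idx)), y_train - mu)
        marker_effects = Z.T @ alpha
        pred = mu + (X[test_idx] - col_means) @ marker_effects
        accuracies.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accuracies))

# Simulated data (hypothetical): 235 lines, 500 markers coded 0/1/2, 20 causal markers.
rng = np.random.default_rng(1)
genotypes = rng.integers(0, 3, size=(235, 500)).astype(float)
effects = np.zeros(500)
effects[rng.choice(500, size=20, replace=False)] = rng.normal(size=20)
genetic_value = genotypes @ effects
phenotype = genetic_value + rng.normal(scale=genetic_value.std(), size=235)
cv_accuracy = rrblup_cv_accuracy(genotypes, phenotype)
```

    Marker preselection strategies such as those compared in the study could be tested in this sketch by passing a column subset of `genotypes`; how the subset is chosen (random, equidistant, or haplotype-block based) is outside what the code shows.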

  4. Comparison of 61 Sequenced Escherichia coli Genomes

    DEFF Research Database (Denmark)

    Lukjancenko, Oksana; Wassenaar, T. M.; Ussery, David

    2010-01-01

    MLST was performed, many of the various strains appear jumbled and less well resolved. The predicted pan-genome comprises 15,741 gene families, and only 993 (6%) of the families are represented in every genome, comprising the core genome. The variable or 'accessory' genes thus make up more than 90......% of the pan-genome and about 80% of a typical genome; some of these variable genes tend to be co-localized on genomic islands. The diversity within the species E. coli, and the overlap in gene content between this and related species, suggests a continuum rather than sharp species borders in this group...

  5. Pathogenicity island mobility and gene content.

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Kelly Porter

    2013-10-01

    Key goals towards national biosecurity include methods for analyzing pathogens, predicting their emergence, and developing countermeasures. These goals are served by studying bacterial genes that promote pathogenicity and the pathogenicity islands that mobilize them. Cyberinfrastructure promoting an island database advances this field and enables deeper bioinformatic analysis that may identify novel pathogenicity genes. New automated methods and rich visualizations were developed for identifying pathogenicity islands, based on the principle that islands occur sporadically among closely related strains. The chromosomally-ordered pan-genome organizes all genes from a clade of strains; gaps in this visualization indicate islands, and decorations of the gene matrix facilitate exploration of island gene functions. A “learned phyloblocks” method was developed for automated island identification, that trains on the phylogenetic patterns of islands identified by other methods. Learned phyloblocks better defined termini of previously identified islands in multidrug-resistant Klebsiella pneumoniae ATCC BAA-2146, and found its only antibiotic resistance island.

  6. Predicting Sea-Level Rise Vulnerability of Terrestrial Habitat and Wildlife of the Northwestern Hawaiian Islands

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Chapter 1 describes the vegetation and topography of 20 islands of Papahanaumokuakea Marine National Monument, the distribution and status of wildlife populations,...

  7. HOX Gene Promoter Prediction and Inter-genomic Comparison: An Evo-Devo Study

    Directory of Open Access Journals (Sweden)

    Marla A. Endriga

    2010-10-01

    Homeobox genes direct the anterior-posterior axis of the body plan in eukaryotic organisms. Promoter regions upstream of the Hox genes jumpstart the transcription process. CpG islands found within the promoter regions can cause silencing of these promoters. The locations of the promoter regions and the CpG islands of Homo sapiens sapiens (human), Pan troglodytes (chimpanzee), Mus musculus (mouse), and Rattus norvegicus (brown rat) are compared and related to the possible influence on the specification of the mammalian body plan. The sequence of each gene in Hox clusters A-D of the mammals considered were retrieved from Ensembl and locations of promoter regions and CpG islands predicted using Exon Finder. The predicted promoter sequences were confirmed via BLAST and verified against the Eukaryotic Promoter Database. The significance of the locations was determined using the Kruskal-Wallis test. Among the four clusters, only promoter locations in cluster B showed significant difference. HOX B genes have been linked with the control of genes that direct the development of axial morphology, particularly of the vertebral column bones. The magnitude of variation among the body plans of closely-related species can thus be partially attributed to the promoter kind, location and number, and gene inactivation via CpG methylation.

  8. The implementation of rare events logistic regression to predict the distribution of mesophotic hard corals across the main Hawaiian Islands

    Directory of Open Access Journals (Sweden)

    Lindsay M. Veazey

    2016-07-01

    Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoseris and Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30–180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta (“presence”) threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai‘i.

  9. The implementation of rare events logistic regression to predict the distribution of mesophotic hard corals across the main Hawaiian Islands.

    Science.gov (United States)

    Veazey, Lindsay M; Franklin, Erik C; Kelley, Christopher; Rooney, John; Frazer, L Neil; Toonen, Robert J

    2016-01-01

    Predictive habitat suitability models are powerful tools for cost-effective, statistically robust assessment of the environmental drivers of species distributions. The aim of this study was to develop predictive habitat suitability models for two genera of scleractinian corals (Leptoseris and Montipora) found within the mesophotic zone across the main Hawaiian Islands. The mesophotic zone (30-180 m) is challenging to reach, and therefore historically understudied, because it falls between the maximum limit of SCUBA divers and the minimum typical working depth of submersible vehicles. Here, we implement a logistic regression with rare events corrections to account for the scarcity of presence observations within the dataset. These corrections reduced the coefficient error and improved overall prediction success (73.6% and 74.3%) for both original regression models. The final models included depth, rugosity, slope, mean current velocity, and wave height as the best environmental covariates for predicting the occurrence of the two genera in the mesophotic zone. Using an objectively selected theta ("presence") threshold, the predicted presence probability values (average of 0.051 for Leptoseris and 0.040 for Montipora) were translated to spatially-explicit habitat suitability maps of the main Hawaiian Islands at 25 m grid cell resolution. Our maps are the first of their kind to use extant presence and absence data to examine the habitat preferences of these two dominant mesophotic coral genera across Hawai'i.

  10. A unique arabinose 5-phosphate isomerase found within a genomic island associated with the uropathogenicity of Escherichia coli CFT073.

    Science.gov (United States)

    Mosberg, Joshua A; Yep, Alejandra; Meredith, Timothy C; Smith, Sara; Wang, Pan-Fen; Holler, Tod P; Mobley, Harry L T; Woodard, Ronald W

    2011-06-01

    Previous studies showed that deletion of genes c3405 to c3410 from PAI-metV, a genomic island from Escherichia coli CFT073, results in a strain that fails to compete with wild-type CFT073 after a transurethral cochallenge in mice and is deficient in the ability to independently colonize the mouse kidney. Our analysis of c3405 to c3410 suggests that these genes constitute an operon with a role in the internalization and utilization of an unknown carbohydrate. This operon is not found in E. coli K-12 but is present in a small number of pathogenic E. coli and Shigella boydii strains. One of the genes, c3406, encodes a protein with significant homology to the sugar isomerase domain of arabinose 5-phosphate isomerases but lacking the tandem cystathionine beta-synthase domains found in the other arabinose 5-phosphate isomerases of E. coli. We prepared recombinant c3406 protein, found it to possess arabinose 5-phosphate isomerase activity, and characterized this activity in detail. We also constructed a c3406 deletion mutant of E. coli CFT073 and demonstrated that this deletion mutant was still able to compete with wild-type CFT073 in a transurethral cochallenge in mice and could colonize the mouse kidney. These results demonstrate that the presence of c3406 is not essential for a pathogenic phenotype.

  11. Structure of a short-chain dehydrogenase/reductase (SDR) within a genomic island from a clinical strain of Acinetobacter baumannii

    Energy Technology Data Exchange (ETDEWEB)

    Shah, Bhumika S., E-mail: bhumika.shah@mq.edu.au; Tetu, Sasha G. [Macquarie University, Research Park Drive, Sydney, NSW 2109 (Australia); Harrop, Stephen J. [University of New South Wales, Sydney, NSW 2052 (Australia); Paulsen, Ian T.; Mabbutt, Bridget C. [Macquarie University, Research Park Drive, Sydney, NSW 2109 (Australia)

    2014-09-25

    The structure of a short-chain dehydrogenase encoded within genomic islands of A. baumannii strains has been solved to 2.4 Å resolution. This classical SDR incorporates a flexible helical subdomain. The NADP-binding site and catalytic side chains are identified. Over 15% of the genome of an Australian clinical isolate of Acinetobacter baumannii occurs within genomic islands. An uncharacterized protein encoded within one island feature common to this and other International Clone II strains has been studied by X-ray crystallography. The 2.4 Å resolution structure of SDR-WM99c reveals it to be a new member of the classical short-chain dehydrogenase/reductase (SDR) superfamily. The enzyme contains a nucleotide-binding domain and, like many other SDRs, is tetrameric in form. The active site contains a catalytic tetrad (Asn117, Ser146, Tyr159 and Lys163) and water molecules occupying the presumed NADP cofactor-binding pocket. An adjacent cleft is capped by a relatively mobile helical subdomain, which is well positioned to control substrate access.

  12. Intra- and interspecies genomic transfer of the Enterococcus faecalis pathogenicity island.

    Directory of Open Access Journals (Sweden)

    Jenny A Laverde Gomez

    Enterococci are the third leading cause of hospital-associated infections and have gained increased importance due to their fast adaptation to the clinical environment by acquisition of antibiotic resistance and pathogenicity traits. Enterococcus faecalis harbours a pathogenicity island (PAI) of 153 kb containing several virulence factors including the enterococcal surface protein (esp). Until now only internal fragments of the PAI or larger chromosomal regions containing it have been transferred. Here we demonstrate precise excision, circularization and horizontal transfer of the entire PAI element from the chromosome of E. faecalis strain UW3114. This PAI (ca. 200 kb) contained some deletions and insertions as compared to the PAI of the reference strain MMH594, transferred precisely and integrated site-specifically into the chromosome of E. faecalis (intergenic region) and Enterococcus faecium (tRNAlys). The internal PAI structure was maintained after transfer. We assessed phenotypic changes accompanying acquisition of the PAI and expression of some of its determinants. The esp gene is expressed on the surface of donor and both transconjugants. Biofilm formation and cytolytic activity were enhanced in E. faecalis transconjugants after acquisition of the PAI. No differences in pathogenicity of E. faecalis were detected using mouse bacteraemia and mouse peritonitis models (tail vein and intraperitoneal injection). A 66 kb conjugative pheromone-responsive plasmid encoding erm(B) (pLG2) that was transferred in parallel with the PAI was sequenced. pLG2 is a pheromone-responsive plasmid that probably promotes the PAI horizontal transfer, encodes antibiotic resistance features and contains complete replication and conjugation modules of enterococcal origin in a mosaic-like composition. The E. faecalis PAI can undergo precise intra- and interspecies transfer probably with the help of conjugative elements like conjugative resistance plasmids, supporting

  13. Influence of outliers on accuracy estimation in genomic prediction in plant breeding.

    Science.gov (United States)

    Estaghvirou, Sidi Boubacar Ould; Ogutu, Joseph O; Piepho, Hans-Peter

    2014-10-01

    Outliers often pose problems in analyses of data in plant breeding, but their influence on the performance of methods for estimating predictive accuracy in genomic prediction studies has not yet been evaluated. Here, we evaluate the influence of outliers on the performance of methods for accuracy estimation in genomic prediction studies using simulation. We simulated 1000 datasets for each of 10 scenarios to evaluate the influence of outliers on the performance of seven methods for estimating accuracy. These scenarios are defined by the number of genotypes, marker effect variance, and magnitude of outliers. To mimic outliers, we added to one observation in each simulated dataset, in turn, 5-, 8-, and 10-times the error SD used to simulate small and large phenotypic datasets. The effect of outliers on accuracy estimation was evaluated by comparing deviations in the estimated and true accuracies for datasets with and without outliers. Outliers adversely influenced accuracy estimation, more so at small values of genetic variance or number of genotypes. A method for estimating heritability and predictive accuracy in plant breeding and another used to estimate accuracy in animal breeding were the most accurate and resistant to outliers across all scenarios and are therefore preferable for accuracy estimation in genomic prediction studies. The performances of the other five methods that use cross-validation were less consistent and varied widely across scenarios. The computing time for the methods increased as the size of outliers and sample size increased and the genetic variance decreased. Copyright © 2014 Ould Estaghvirou et al.
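
    The contamination step described above (adding a multiple of the error SD to one observation at a time) can be sketched as follows. This is a minimal illustration only: the function name `add_outlier`, the simulated phenotypes and the error SD of 2.0 are assumptions for the example, not values or code from the study.

```python
import random


def add_outlier(phenotypes, error_sd, multiple, index):
    """Return a copy of the data with multiple * error_sd added to one observation."""
    contaminated = list(phenotypes)
    contaminated[index] += multiple * error_sd
    return contaminated


# Hypothetical small dataset: 20 phenotypes with an assumed error SD of 2.0.
rng = random.Random(1)
error_sd = 2.0
clean = [rng.gauss(10.0, error_sd) for _ in range(20)]

# The abstract mentions shifts of 5, 8 and 10 times the error SD.
for multiple in (5, 8, 10):
    shifted = add_outlier(clean, error_sd, multiple, index=0)
    print(multiple, round(shifted[0] - clean[0], 6))
```

    Looping `index` over every position would contaminate each observation "in turn", as the abstract describes.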

  14. [A novel method of the genome-wide prediction for the target genes and its application].

    Science.gov (United States)

    Zhang, Jing-Jing; Feng, Jing; Zhu, Ying-Guo; Li, Yang-Sheng

    2006-10-01

    Based on the protein databases of several model species, this study developed a new method for the genome-wide prediction of target genes, using hidden Markov models implemented in Perl. The advantages of this method are high throughput, high quality and ease of prediction, especially for multi-domain protein families. Using this method, we predicted the PPR and TPR protein families in the whole genomes of several model species. There were 536 PPR proteins and 199 TPR proteins in Oryza sativa ssp. japonica, 519 PPR proteins and 177 TPR proteins in Oryza sativa L. ssp. indica, 735 PPR proteins and 292 TPR proteins in Arabidopsis thaliana, and 6 PPR proteins and 32 TPR proteins in Cyanidioschyzon merolae. Synechococcus and the thermophilic archaebacterium did not have PPR proteins. By contrast, 10 TPR proteins were found in Synechococcus and 4 TPR proteins were found in the thermophilic archaebacterium. Further bioinformatics analyses were conducted on these results.

  15. Identification of DNA motifs implicated in maintenance of bacterial core genomes by predictive modeling.

    Science.gov (United States)

    Halpern, David; Chiapello, Hélène; Schbath, Sophie; Robin, Stéphane; Hennequet-Antier, Christelle; Gruss, Alexandra; El Karoui, Meriem

    2007-09-01

    Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the "crossover hotspot instigator," or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions.
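
    One simple way to look at a motif's genomic distribution is to compare its density (hits per kb, both strands) between core and strain-variable regions. The sketch below is an illustration under stated assumptions, not the statistical model of the study: the octamer 5'-GCTGGTGG-3' is the known E. coli Chi sequence but is not given in the abstract, and the helper names are hypothetical.

```python
CHI = "GCTGGTGG"  # E. coli Chi octamer; supplied here, not stated in the abstract


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def motif_density(seq, motif):
    """Motif hits per kilobase, summed over both strands.

    A palindromic motif would be counted twice; Chi is not palindromic.
    """
    hits = count_overlapping(seq, motif)
    hits += count_overlapping(seq, reverse_complement(motif))
    return 1000.0 * hits / len(seq)
```

    Calling `motif_density` on a core-genome segment and on a strain-variable segment gives two numbers to compare; judging whether a difference is "exceptional" needs a background model, which this sketch does not attempt.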

  16. Identification of DNA motifs implicated in maintenance of bacterial core genomes by predictive modeling.

    Directory of Open Access Journals (Sweden)

    David Halpern

    2007-09-01

    Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the "crossover hotspot instigator," or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions.

  17. Genome-Assisted Prediction of Quantitative Traits Using the R Package sommer.

    Directory of Open Access Journals (Sweden)

    Giovanny Covarrubias-Pazaran

    Most traits of agronomic importance are quantitative in nature, and genetic markers have been used for decades to dissect such traits. Recently, genomic selection has earned attention as next generation sequencing technologies became feasible for major and minor crops. Mixed models have become a key tool for fitting genomic selection models, but most current genomic selection software can only include a single variance component other than the error, making hybrid prediction using additive, dominance and epistatic effects unfeasible for species displaying heterotic effects. Moreover, likelihood-based software for fitting mixed models with multiple random effects that allows the user to specify the variance-covariance structure of random effects has not been fully exploited. A new open-source R package called sommer is presented to facilitate the use of mixed models for genomic selection and hybrid prediction purposes using more than one variance component and allowing specification of covariance structures. The use of sommer for genomic prediction is demonstrated through several examples using maize and wheat genotypic and phenotypic data. At its core, the program contains three algorithms for estimating variance components: Average Information (AI), Expectation-Maximization (EM) and Efficient Mixed Model Association (EMMA). Kernels for calculating the additive, dominance and epistatic relationship matrices are included, along with other useful functions for genomic analysis. Results from sommer were comparable to other software, but the analysis was faster than Bayesian counterparts by hours to days. In addition, the ability to deal with missing data, combined with greater flexibility and speed than other REML-based software, was achieved by putting together some of the most efficient algorithms to fit models in a gentle environment such as R.

  18. Comparative genomics of bacterial and plant folate synthesis and salvage: predictions and validations

    Directory of Open Access Journals (Sweden)

    Noiriel Alexandre

    2007-07-01

    Background: Folate synthesis and salvage pathways are relatively well known from classical biochemistry and genetics but they have not been subjected to comparative genomic analysis. The availability of genome sequences from hundreds of diverse bacteria, and from Arabidopsis thaliana, enabled such an analysis using the SEED database and its tools. This study reports the results of the analysis and integrates them with new and existing experimental data. Results: Based on sequence similarity and the clustering, fusion, and phylogenetic distribution of genes, several functional predictions emerged from this analysis. For bacteria, these included the existence of novel GTP cyclohydrolase I and folylpolyglutamate synthase gene families, and of a trifunctional p-aminobenzoate synthesis gene. For plants and bacteria, the predictions comprised the identities of a 'missing' folate synthesis gene (folQ) and of a folate transporter, and the absence from plants of a folate salvage enzyme. Genetic and biochemical tests bore out these predictions. Conclusion: For bacteria, these results demonstrate that much can be learnt from comparative genomics, even for well-explored primary metabolic pathways. For plants, the findings particularly illustrate the potential for rapid functional assignment of unknown genes that have prokaryotic homologs, by analyzing which genes are associated with the latter. More generally, our data indicate how combined genomic analysis of both plants and prokaryotes can be more powerful than isolated examination of either group alone.

  19. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    Energy Technology Data Exchange (ETDEWEB)

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  20. Comparison between genomic predictions using daughter yield deviation and conventional estimated breeding value as response variables

    DEFF Research Database (Denmark)

    Guo, Gang; Lund, Mogens Sandø; Zhang, Y;

    2010-01-01

    This study compared genomic predictions using conventional estimated breeding values (EBV) and daughter yield deviations (DYD) as response variables based on simulated data. Eight scenarios were simulated in regard to heritability (0.05 and 0.30), number of daughters per sire (30, 100, and unequal......), the EBV and DYD approaches provided similar genomic estimated breeding value (GEBV) reliabilities, except for scenarios with unequal numbers of daughters and half of sires without genotype, for which the EBV approach was superior to the DYD approach (by 1.2 and 2.4%). Using a Bayesian mixture prior model...

  1. Can metabolomics in addition to genomics add to prognostic and predictive information in breast cancer?

    Science.gov (United States)

    Howell, Anthony

    2010-11-16

    Genomic data from breast cancers provide additional prognostic and predictive information that is beginning to be used for patient management. The question arises whether additional information derived from other 'omic' approaches such as metabolomics can provide additional information. In an article published this month in BMC Cancer, Borgan et al. add metabolomic information to genomic measures in breast tumours and demonstrate, for the first time, that it may be possible to further define subgroups of patients which could be of value clinically. See research article: http://www.biomedcentral.com/1471-2407/10/628.

  2. Using physicochemical and compositional characteristics of DNA sequence for prediction of genomic signals

    KAUST Repository

    Mulamba, Pierre Abraham

    2014-12-01

    Finding genes in eukaryotic organisms using computational methods is an ongoing problem in biology. Based on various genomic signals found in eukaryotic genomes, this problem can be divided into many different sub-problems such as identification of transcription start sites, translation initiation sites, splice sites, poly(A) signals, etc. Each sub-problem deals with a particular type of genomic signal and various computational methods are used to solve each sub-problem. Aggregating information from all these individual sub-problems can lead to a complete annotation of a gene and its component signals. The fundamental principle of most of these computational methods is the mapping principle: building an input-output model for the prediction of a particular genomic signal based on a set of known input signals and their corresponding output signal. The type of input signals used to build the model is an essential element in most of these computational methods. The common factor of most of these methods is that they are mainly based on the statistical analysis of the basic nucleotide sequence string composition. Our study is based on a novel approach to predict genomic signals in which uniquely generated structural profiles that combine compressed physicochemical properties with topological and compositional properties of DNA sequences are used to develop machine learning predictive models. The compression of the physicochemical properties is made using principal component analysis transformation. Our ideas are evaluated through prediction models of canonical splice sites using support vector machine models. We demonstrate across several species that the proposed methodology has resulted in the most accurate splice site predictors that are publicly available or described. We believe that the approach in this study is quite general and has various applications in other biological modeling problems.

  3. Effect of marker-data editing on the accuracy of genomic prediction.

    Science.gov (United States)

    Edriss, V; Guldbrandtsen, B; Lund, M S; Su, G

    2013-04-01

    Genomic selection is a method to predict breeding values using genome-wide single-nucleotide polymorphism (SNP) markers. High-quality marker data are necessary for genomic selection. The aim of this study was to investigate the effect of marker-editing criteria on the accuracy of genomic predictions in the Nordic Holstein and Jersey populations. Data included 4429 Holstein and 1071 Jersey bulls. In total, 48,222 SNP for Holstein and 44,305 SNP for Jersey were polymorphic. The SNP data were edited based on (i) minor allele frequencies (MAF) with thresholds of no limit, 0.001, 0.01, 0.02, 0.05 and 0.10, (ii) deviations from Hardy-Weinberg proportions (HWP) with thresholds of no limit, chi-squared p-values of 0.001, 0.02, 0.05 and 0.10, and (iii) GenCall (GC) scores with thresholds of 0.15, 0.55, 0.60, 0.65 and 0.70. The marker data sets edited with different criteria were used for genomic prediction of protein yield, fertility and mastitis using a Bayesian variable selection and a GBLUP model. De-regressed EBV were used as response variables. The result showed little difference between prediction accuracies based on marker data sets edited with MAF and deviation from HWP. However, accuracy decreased with more stringent thresholds of GC score. According to the results of this study, it would be appropriate to edit data with restriction of MAF being between 0.01 and 0.02, a p-value of deviation from HWP being 0.05, and keeping all individual SNP genotypes having a GC score over 0.15. © 2012 Blackwell Verlag GmbH.
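
    The MAF and Hardy-Weinberg editing criteria can be illustrated with a small filter. This is a sketch under assumptions, not the pipeline used in the study: genotypes are assumed to be coded 0/1/2 (copies of one allele), the HWP test is a 1-degree-of-freedom chi-squared goodness-of-fit test, SNPs with a p-value below the threshold are assumed to be the ones discarded, and the function names and default thresholds are illustrative.

```python
import math


def maf(genotypes):
    """Minor allele frequency from genotypes coded as 0, 1 or 2 allele copies."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)


def hwe_p_value(genotypes):
    """Chi-squared (1 df) p-value for deviation from Hardy-Weinberg proportions."""
    n = len(genotypes)
    p = sum(genotypes) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic marker: no deviation can be tested
    observed = [genotypes.count(g) for g in (0, 1, 2)]
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def keep_snp(genotypes, maf_min=0.01, hwe_p_min=0.05):
    """Keep a SNP only if it passes both (illustrative) thresholds."""
    return maf(genotypes) >= maf_min and hwe_p_value(genotypes) >= hwe_p_min
```

    GenCall score filtering is omitted because it depends on per-genotype quality values from the genotyping platform, which are not represented here.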

  4. Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

    Directory of Open Access Journals (Sweden)

    Sungkyoung Choi

    2016-12-01

    The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the “large p and small n” problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.
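
    For readers unfamiliar with the AUC used to score these models, it equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The following is a minimal sketch of that definition; the function name and the toy scores are illustrative assumptions and have no connection to the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 1 for a case, 0 for a control. Ties between a case and a control
    count as half a win. Quadratic in sample size, which is fine for a sketch.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: three of the four case/control pairs are ranked correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.726 in context.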

  5. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human

    Science.gov (United States)

    Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai

    2017-01-01

    DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation. PMID:28212312
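
    The 65 DNA composition features mentioned above (64 trinucleotide frequencies plus GC content) can be computed for a sequence window roughly as follows. This is a sketch under assumptions: trinucleotides are counted with a sliding window of step 1, triplets containing non-ACGT characters are skipped, and the function name is hypothetical. The seven sequence complexity features of the study are not reproduced.

```python
from itertools import product

# All 64 trinucleotides in a fixed (alphabetical) order.
TRINUCLEOTIDES = ["".join(t) for t in product("ACGT", repeat=3)]


def composition_features(window):
    """Return 64 trinucleotide frequencies followed by the GC content."""
    window = window.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(window) - 2):
        tri = window[i:i + 3]
        if tri in counts:  # skips triplets containing N or other symbols
            counts[tri] += 1
            total += 1
    freqs = [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]
    gc = (window.count("G") + window.count("C")) / len(window) if window else 0.0
    return freqs + [gc]


features = composition_features("ACGCGT")
print(len(features))
```

    In the study such vectors were computed over windows of varying size around CpG sites and fed to a support vector machine; the classifier itself is outside the scope of this sketch.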

  6. Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

    Science.gov (United States)

    Choi, Sungkyoung; Bae, Sunghwan

    2016-01-01

    The success of genome-wide association studies (GWASs) has enabled us to improve risk assessment and provide novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a big hurdle in building risk prediction models. Recently, many statistical approaches based on penalized regression have been developed to solve the “large p and small n” problem. In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We first built a prediction model by combining variable selection and prediction methods for type 2 diabetes using Affymetrix Genome-Wide Human SNP Array 5.0 from the Korean Association Resource project. We assessed the risk prediction performance using area under the receiver operating characteristic curve (AUC) for the internal and external validation datasets. In the internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than other combinations. During the external validation, the SLR-SLR and SLR-EN combinations achieved the highest AUC of 0.726. We propose these combinations as a potentially powerful risk prediction model for type 2 diabetes.

  7. Genomic prediction of survival time in a population of brown laying hens showing cannibalistic behavior.

    Science.gov (United States)

    Alemu, Setegn W; Calus, Mario P L; Muir, William M; Peeters, Katrijn; Vereijken, Addie; Bijma, Piter

    2016-09-13

    Mortality due to cannibalism causes both economic and welfare problems in laying hens. To limit mortality due to cannibalism, laying hens are often beak-trimmed, which is undesirable for animal welfare reasons. Genetic selection is an alternative strategy to increase survival and is more efficient by taking heritable variation that originates from social interactions into account, which are modelled as the so-called indirect genetic effects (IGE). Despite the considerable heritable variation in survival time due to IGE, genetic improvement of survival time in laying hens is still challenging because the detected heritable variation of the trait with IGE is still limited, ranging from 0.06 to 0.26, and individuals that are still alive at the end of the recording period are censored. Furthermore, survival time records are available late in life and only on females. To cope with these challenges, we tested the hypothesis that genomic prediction increases the accuracy of estimated breeding values (EBV) compared to parental average EBV, and increases response to selection for survival time compared to a traditional breeding scheme. We tested this hypothesis in two lines of brown layers with intact beaks, which show cannibalism, and also the hypothesis that the rate of inbreeding per year is lower for genomic selection than for the traditional breeding scheme. The standard deviation of genomic prediction EBV for survival time was around 22 days for both lines, indicating good prospects for selection against mortality in laying hens with intact beaks. Genomic prediction increased the accuracy of the EBV by 35 and 32 % compared to the parent average EBV for the two lines. At the current reference population size, predicted response to selection was 91 % higher when using genomic selection than with the traditional breeding scheme, as a result of a shorter generation interval in males and greater accuracy of selection in females. The predicted rate of inbreeding per

  8. Protein Subcellular Localization Prediction and Genomic Polymorphism Analysis of the SARS Coronavirus

    Institute of Scientific and Technical Information of China (English)

    季星来; 柳树群; 李岭; 孙之荣

    2004-01-01

    The cause of severe acute respiratory syndrome (SARS) has been identified as a new coronavirus (CoV). Several sequences of the complete genome of SARS-CoV have been determined. The subcellular localization (SubLocation) of annotated open-reading frames of the SARS-CoV genome was predicted using a support vector machine. Several gene products were predicted to locate in the Golgi body and cell nucleus. The SubLocation information was combined with predicted transmembrane information to develop a model of the viral life cycle. The results show that this information can be used to predict the functions of genes and even the virus pathogenesis. In addition, the entire SARS viral genome sequences currently available in GenBank were compared to identify the sequence variations among different isolates. Some variations in the Hong Kong strains may be related to the special clinical manifestations and provide clues for understanding the relationship between gene functions and evolution. These variations reflect the evolution of the SARS virus in human populations and may help development of a vaccine.

  9. Prediction of type III secretion signals in genomes of gram-negative bacteria.

    Directory of Open Access Journals (Sweden)

    Martin Löwer

    BACKGROUND: Pathogenic bacteria infecting both animals and plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system represents such a mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in the literature, its exact characteristics remain unknown. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known protein effectors were compiled from the literature and sequence databases, and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. Classification accuracy yielded a cross-validated Matthews correlation of 0.63 and allowed for genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% predicted candidate proteins), their chromosomes (11%) and plasmids (13%), as well as 213 Firmicute genomes (7%). CONCLUSIONS/SIGNIFICANCE: We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).
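
    The Matthews correlation reported above summarizes a binary confusion matrix in a single value between -1 and 1. A minimal sketch of the standard formula follows; the function name and the example counts are illustrative and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only: perfect, random-like and fully inverted classifiers.
print(matthews_corrcoef(50, 50, 0, 0))
print(matthews_corrcoef(25, 25, 25, 25))
print(matthews_corrcoef(0, 0, 50, 50))
```

    Unlike plain accuracy, this measure stays informative when effectors are a small minority of all proteins, which is the situation in genome-wide screens.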

  10. A highly conserved gene island of three genes on chromosome 3B of hexaploid wheat: diverse gene function and genomic structure maintained in a tightly linked block

    Directory of Open Access Journals (Sweden)

    Ma Wujun

    2010-05-01

    Background: The complexity of the wheat genome has resulted from waves of retrotransposable element insertions. Gene deletions and disruptions generated by the fast replacement of repetitive elements in wheat have resulted in disruption of colinearity at a micro (sub-megabase) level among the cereals. In view of genomic changes that are possible within a given time span, conservation of genes between species tends to imply an important functional or regional constraint that does not permit a change in genomic structure. The ctg1034 contig completed in this paper was initially studied because it was assigned to the Sr2 resistance locus region, but detailed mapping studies subsequently assigned it to the long arm of 3B and revealed its unusual features. Results: BAC shotgun sequencing of the hexaploid wheat (Triticum aestivum cv. Chinese Spring) genome has been used to assemble a group of 15 wheat BACs from the chromosome 3B physical map FPC contig ctg1034 into a 783,553 bp genomic sequence. This ctg1034 sequence was annotated for biological features such as genes and transposable elements. A three-gene island was identified among >80% repetitive DNA sequence. Bioinformatics analysis revealed no observable similarity in their gene functions. The ctg1034 gene island also displayed complete conservation of gene order and orientation with syntenic gene islands found in publicly available genome sequences of Brachypodium distachyon, Oryza sativa, Sorghum bicolor and Zea mays, even though the intergenic space and introns were divergent. Conclusion: We propose that ctg1034 is located within the heterochromatic C-band region of deletion bin 3BL7 based on the identification of heterochromatic tandem repeats and presence of significant matches to chromodomain-containing gypsy LTR retrotransposable elements. We also speculate that this location, among other highly repetitive sequences, may account for the relative stability in gene order and

  11. Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction

    Science.gov (United States)

    Bandeira e Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose

    2017-01-01

    Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK, and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied. PMID:28455415
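    The two kernel methods named in this abstract can be illustrated with a small sketch. It is not the authors' code: the function names, the toy marker matrix, the marker standardization, and the median-distance scaling of the Gaussian kernel are all assumptions chosen for illustration, and at least two individuals are assumed.

```python
import math
import statistics

def center_scale(markers):
    """Center and scale each marker column; zero-variance columns become 0."""
    n, p = len(markers), len(markers[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in markers]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    # Transpose back to an individuals x markers layout.
    return [[cols[j][i] for j in range(p)] for i in range(n)]

def linear_kernel(z):
    """GBLUP-style linear kernel: G = Z Z' / p (p = number of markers)."""
    p = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zk)) / p for zk in z] for zi in z]

def gaussian_kernel(z, bandwidth=1.0):
    """Gaussian kernel: K[i][k] = exp(-bandwidth * d2[i][k] / median(d2)).

    Scaling by the median off-diagonal squared distance is an assumption.
    """
    n = len(z)
    d2 = [[sum((a - b) ** 2 for a, b in zip(zi, zk)) for zk in z] for zi in z]
    off_diagonal = [d2[i][k] for i in range(n) for k in range(i + 1, n)]
    scale = statistics.median(off_diagonal) or 1.0
    return [[math.exp(-bandwidth * d2[i][k] / scale) for k in range(n)]
            for i in range(n)]

# Toy genotypes coded 0/1/2 (4 individuals x 3 markers), purely hypothetical.
markers = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
z = center_scale(markers)
G = linear_kernel(z)
K = gaussian_kernel(z)
```

    Either matrix could then serve as the covariance structure of the genotypic effects in a mixed model; the linear kernel captures additive similarity only, while the Gaussian kernel lets similarity decay nonlinearly with marker distance.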

  12. Impact of Relationships between Test and Reference Animals and between Reference Animals on Reliability of Genomic Prediction

    DEFF Research Database (Denmark)

    Wu, Xiaoping; Lund, Mogens Sandø; Sun, Dongxiao

    This study investigated reliability of genomic prediction in various scenarios with regard to relationship between test and reference animals and between animals within the reference population. Different reference populations were generated from EuroGenomics data and 1288 Nordic Holstein bulls... as a common test population. A GBLUP model and a Bayesian mixture model were applied to predict genomic breeding values for bulls in the test data. Results showed that a closer relationship between test and reference animals led to a higher reliability, while a closer relationship between reference animal... resulted in a lower reliability. Therefore, the design of reference population is important for improving the reliability of genomic prediction. With regard to model, the Bayesian mixture model in general led to a slightly higher reliability of genomic prediction than the GBLUP model....

  13. A novel genome-wide full-length kinesin prediction analysis reveals additional mammalian kinesins

    Institute of Scientific and Technical Information of China (English)

    XUE Yu; LIU Dan; FU Chuanhai; DOU Zhen; ZHOU Qing; YAO Xuebiao

    2006-01-01

    The kinesin superfamily of microtubule-based motors orchestrates a variety of cellular processes. Recent availability of mammalian genomes has enabled analyses of kinesins on the whole genome. Here we present a novel full-length kinesin prediction program (FKPP) for mammalian kinesin gene discovery based on a comparative genomics approach. Contrary to previous predictions of 94 kinesins, we identify a total of 134 potential kinesin genes from mammalian genomes, including 45 from mouse, 45 from rat and 44 from human. In addition, FKPP synthesizes 25 potentially full-length mammalian kinesins based on the partial sequences in the database. Surprisingly, FKPP reveals that full-length human CENP-E contains 2701 aa rather than 2663 aa in the database. Experimentation using sequence specific antibody and cDNA sequencing of human CENP-E validates the accuracy of FKPP. Given the remarkable computing efficiency and accuracy of FKPP, we reclassify the mammalian kinesin superfamily. Since current databases contain many incomplete sequences, FKPP may provide a novel approach for molecular delineation of kinesins and other protein families.

  14. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    Science.gov (United States)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  15. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots.

    Directory of Open Access Journals (Sweden)

    Sunita Kumari

    Full Text Available Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the computational prediction of CPEs across eight plant genomes to help better understand the transcription initiation complex assembly. The distribution of thirteen known CPEs across four monocots (Brachypodium distachyon, Oryza sativa ssp. japonica, Sorghum bicolor, Zea mays) and four dicots (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Glycine max) reveals the structural organization of the core promoter in relation to the TATA-box as well as with respect to other CPEs. The distribution of known CPE motifs with respect to transcription start site (TSS) exhibited positional conservation within monocots and dicots with slight differences across all eight genomes. Further, a more refined subset of annotated genes based on orthologs of the model monocot (O. sativa ssp. japonica) and dicot (A. thaliana) genomes supported the positional distribution of these thirteen known CPEs. DNA free energy profiles provided evidence that the structural properties of promoter regions are distinctly different from that of the non-regulatory genome sequence. It also showed that monocot core promoters have lower DNA free energy than dicot core promoters. The comparison of monocot and dicot promoter sequences highlights both the similarities and differences in the core promoter architecture irrespective of the species-specific nucleotide bias. This study will be useful for future work related to genome annotation projects and can inspire research efforts aimed to better understand regulatory mechanisms of transcription.

  16. Prediction of maize phenotype based on whole-genome single nucleotide polymorphisms using deep belief networks

    Science.gov (United States)

    Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.

    2017-05-01

    Selection in plant breeding could be more effective and more efficient if it were based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore the effects of interactions among genes and of higher-order nonlinearities. The deep belief network (DBN), one of the architectures used in deep learning, is able to model data at a high level of abstraction that captures nonlinear effects in the data. This study implemented a DBN to develop a GP model using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was acquired from CIMMYT’s (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, the DBN outperformed the other methods, namely reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for traits presumed to be non-additive. The DBN achieved a correlation of 0.579 (on a scale from -1 to 1).

  17. A systems approach to predict oncometabolites via context-specific genome-scale metabolic networks.

    Directory of Open Access Journals (Sweden)

    Hojung Nam

    2014-09-01

    Full Text Available Altered metabolism in cancer cells has been viewed as a passive response required for a malignant transformation. However, this view has changed through the recently described metabolic oncogenic factors: mutated isocitrate dehydrogenases (IDH), succinate dehydrogenase (SDH), and fumarate hydratase (FH) that produce oncometabolites that competitively inhibit epigenetic regulation. In this study, we demonstrate in silico predictions of oncometabolites that have the potential to dysregulate epigenetic controls in nine types of cancer by incorporating massive scale genetic mutation information (collected from more than 1,700 cancer genomes), expression profiling data, and deploying Recon 2 to reconstruct context-specific genome-scale metabolic models. Our analysis predicted 15 compounds and 24 substructures of potential oncometabolites that could result from the loss-of-function and gain-of-function mutations of metabolic enzymes, respectively. These results suggest a substantial potential for discovering unidentified oncometabolites in various forms of cancers.

  18. Bias of genetic trend of genomic predictions based on both real dairy cattle and simulated data

    DEFF Research Database (Denmark)

    Ma, Peipei; Lund, Mogens Sandø; Nielsen, Ulrik Sander;

    This study investigated the phenomenon of bias in the trend of genomic predictions and attempted to find the reason and solution for this bias. The data used in this study include Danish Jersey data and simulation data. In Jersey data, the bias was reduced when cows were included in the reference... population. In simulated data, there was no bias when the test animals were unselected cows. When the G matrix was derived from genotypes of causal genes, the bias was reduced. The results suggest that the main reasons for causing the bias of the prediction trends are the selection of bulls and bull dams... as well as the inaccurate relationship matrix. The possible strategies to eliminate the bias could be to use cow reference and improve genomic relationship matrix...

  19. Genomic predictions based on a joint reference population for the Nordic Red cattle breeds.

    Science.gov (United States)

    Zhou, L; Heringstad, B; Su, G; Guldbrandtsen, B; Meuwissen, T H E; Svendsen, M; Grove, H; Nielsen, U S; Lund, M S

    2014-07-01

    The main aim of this study was to compare accuracies of imputation and genomic predictions based on single and joint reference populations for Norwegian Red (NRF) and a composite breed (DFS) consisting of Danish Red, Finnish Ayrshire, and Swedish Red. The single nucleotide polymorphism (SNP) data for NRF consisted of 2 data sets: one including 25,000 markers (NRF25K) and the other including 50,000 markers (NRF50K). The NRF25K data set had 2,572 bulls, and the NRF50K data set had 1,128 bulls. Four hundred forty-two bulls were genotyped in both data sets (double-genotyped bulls). The DFS data set (DFS50K) included 50,000 markers of 13,472 individuals, of which around 4,700 were progeny-tested bulls. The NRF25K data set was imputed to 50,000 density using the software Beagle. The average error rate for the imputation of NRF25K decreased slightly from 0.023 to 0.021, and the correlation between observed and imputed genotypes changed from 0.935 to 0.936 when comparing the NRF50K reference and the NRF50K-DFS50K joint reference imputations. A genomic BLUP (GBLUP) model and a Bayesian 4-component mixture model were used to predict genomic breeding values for the NRF and DFS bulls based on the single and joint NRF and DFS reference populations. In the multiple population predictions, accuracies of genomic breeding values increased for the 3 production traits (milk, fat, and protein yields) for both NRF and DFS. Accuracies increased by 6 and 1.3 percentage points, on average, for the NRF and DFS bulls, respectively, using the GBLUP model, and by 9.3 and 1.3 percentage points, on average, using the Bayesian 4-component mixture model. However, accuracies for health or reproduction traits did not increase from the multiple population predictions. Among the 3 DFS populations, Swedish Red gained most in accuracies from the multiple population predictions, presumably because Swedish Red has a closer genetic relationship with NRF than Danish Red and Finnish Ayrshire.
The Bayesian 4
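
    The two imputation quality measures quoted in this abstract (an error rate and a correlation between observed and imputed genotypes) can be computed along the following lines. This is only a sketch under stated assumptions: the abstract does not define its error rate, so a simple per-genotype mismatch rate is used here, and the genotype vectors and function names are hypothetical.

```python
import math

def genotype_error_rate(observed, imputed):
    """Fraction of genotypes (coded 0/1/2) that differ after imputation.

    Assumed definition; the study may have used an allele-based rate instead.
    """
    mismatches = sum(1 for o, m in zip(observed, imputed) if o != m)
    return mismatches / len(observed)

def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical masked genotypes and their imputed values for one animal.
observed = [0, 1, 2, 1, 0, 2, 1, 1]
imputed = [0, 1, 2, 1, 0, 1, 1, 1]

error_rate = genotype_error_rate(observed, imputed)
correlation = pearson(observed, imputed)
```

    In practice both measures would be averaged over the double-genotyped animals whose higher-density genotypes were masked before imputation.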

  20. Distance from sub-Saharan Africa predicts mutational load in diverse human genomes.

    Science.gov (United States)

    Henn, Brenna M; Botigué, Laura R; Peischl, Stephan; Dupanloup, Isabelle; Lipatov, Mikhail; Maples, Brian K; Martin, Alicia R; Musharoff, Shaila; Cann, Howard; Snyder, Michael P; Excoffier, Laurent; Kidd, Jeffrey M; Bustamante, Carlos D

    2016-01-26

    The Out-of-Africa (OOA) dispersal ∼ 50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.

  1. A two step Bayesian approach for genomic prediction of breeding values

    DEFF Research Database (Denmark)

    Mahdi Shariati, Mohammad; Sørensen, Peter; Janss, Luc

    2012-01-01

    Background: In genomic models that assign an individual variance to each marker, the contribution of one marker to the posterior distribution of the marker variance is only one degree of freedom (df), which introduces many variance parameters with only little information per variance parameter...... of predicted breeding values. However, the accuracies of predicted breeding values were lower than Bayesian methods with marker specific variances. Conclusions: Grouping markers is less flexible than allowing each marker to have a specific marker variance but, by grouping, the power to estimate marker...

  2. Genomic prediction contributing to a promising global strategy to turbocharge gene banks.

    Science.gov (United States)

    Yu, Xiaoqing; Li, Xianran; Guo, Tingting; Zhu, Chengsong; Wu, Yuye; Mitchell, Sharon E; Roozeboom, Kraig L; Wang, Donghai; Wang, Ming Li; Pederson, Gary A; Tesso, Tesfaye T; Schnable, Patrick S; Bernardo, Rex; Yu, Jianming

    2016-10-03

    The 7.4 million plant accessions in gene banks are largely underutilized due to various resource constraints, but current genomic and analytic technologies are enabling us to mine this natural heritage. Here we report a proof-of-concept study to integrate genomic prediction into a broad germplasm evaluation process. First, a set of 962 biomass sorghum accessions were chosen as a reference set by germplasm curators. With high throughput genotyping-by-sequencing (GBS), we genetically characterized this reference set with 340,496 single nucleotide polymorphisms (SNPs). A set of 299 accessions was selected as the training set to represent the overall diversity of the reference set, and we phenotypically characterized the training set for biomass yield and other related traits. Cross-validation with multiple analytical methods using the data of this training set indicated high prediction accuracy for biomass yield. Empirical experiments with a 200-accession validation set chosen from the reference set confirmed high prediction accuracy. The potential to apply the prediction model to broader genetic contexts was also examined with an independent population. Detailed analyses on prediction reliability provided new insights into strategy optimization. The success of this project illustrates that a global, cost-effective strategy may be designed to assess the vast amount of valuable germplasm archived in 1,750 gene banks.

  3. Site-Specific Mobilization of Vinyl Chloride Respiration Islands by a Mechanism Common in Dehalococcoides

    Directory of Open Access Journals (Sweden)

    Edwards Elizabeth A

    2011-06-01

    Full Text Available Abstract Background Vinyl chloride is a widespread groundwater pollutant and Group 1 carcinogen. A previous comparative genomic analysis revealed that the vinyl chloride reductase operon, vcrABC, of Dehalococcoides sp. strain VS is embedded in a horizontally-acquired genomic island that integrated at the single-copy tmRNA gene, ssrA. Results We targeted conserved positions in available genomic islands to amplify and sequence four additional vcrABC-containing genomic islands from previously-unsequenced vinyl chloride respiring Dehalococcoides enrichments. We identified a total of 31 ssrA-specific genomic islands from Dehalococcoides genomic data, accounting for 47 reductive dehalogenase homologous genes and many other non-core genes. Sixteen of these genomic islands contain a syntenic module of integration-associated genes located adjacent to the predicted site of integration, and among these islands, eight contain vcrABC as genetic 'cargo'. These eight vcrABC-containing genomic islands are syntenic across their ~12 kbp length, but have two phylogenetically discordant segments that unambiguously differentiate the integration module from the vcrABC cargo. Using available Dehalococcoides phylogenomic data we estimate that these ssrA-specific genomic islands are at least as old as the Dehalococcoides group itself, which in turn is much older than human civilization. Conclusions The vcrABC-containing genomic islands are a recently-acquired subset of a diverse collection of ssrA-specific mobile elements that are a major contributor to strain-level diversity in Dehalococcoides, and may have been throughout its evolution. The high similarity between vcrABC sequences is quantitatively consistent with recent horizontal acquisition driven by ~100 years of industrial pollution with chlorinated ethenes.

  4. From structure prediction to genomic screens for novel non-coding RNAs.

    Directory of Open Access Journals (Sweden)

    Jan Gorodkin

    2011-08-01

    Full Text Available Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  5. Discovery and validation of a prostate cancer genomic classifier that predicts early metastasis following radical prostatectomy.

    Directory of Open Access Journals (Sweden)

    Nicholas Erho

    Full Text Available PURPOSE: Clinicopathologic features and biochemical recurrence are sensitive, but not specific, predictors of metastatic disease and lethal prostate cancer. We hypothesize that a genomic expression signature detected in the primary tumor represents true biological potential of aggressive disease and provides improved prediction of early prostate cancer metastasis. METHODS: A nested case-control design was used to select 639 patients from the Mayo Clinic tumor registry who underwent radical prostatectomy between 1987 and 2001. A genomic classifier (GC) was developed by modeling differential RNA expression using 1.4 million feature high-density expression arrays of men enriched for rising PSA after prostatectomy, including 213 who experienced early clinical metastasis after biochemical recurrence. A training set was used to develop a random forest classifier of 22 markers to predict for cases--men with early clinical metastasis after rising PSA. Performance of GC was compared to prognostic factors such as Gleason score and previous gene expression signatures in a withheld validation set. RESULTS: Expression profiles were generated from 545 unique patient samples, with median follow-up of 16.9 years. GC achieved an area under the receiver operating characteristic curve of 0.75 (0.67-0.83) in validation, outperforming clinical variables and gene signatures. GC was the only significant prognostic factor in multivariable analyses. Within Gleason score groups, cases with high GC scores experienced earlier death from prostate cancer and reduced overall survival. The markers in the classifier were found to be associated with a number of key biological processes in prostate cancer metastatic disease progression. CONCLUSION: A genomic classifier was developed and validated in a large patient cohort enriched with prostate cancer metastasis patients and a rising PSA that went on to experience metastatic disease.
This early metastasis prediction model based on

  6. Accuracy of genome enabled prediction in a dairy cattle population using different cross-validation layouts

    Directory of Open Access Journals (Sweden)

    M. Angeles ePérez-Cabal

    2012-02-01

    Full Text Available The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced similar accuracies to the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may depend also on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high heritability traits, close relatives such as parents and full sibs make the greatest contributions to accuracy, which can be compensated by half-sibs or grandsires in the case of lack of close relatives. However, for the low heritability traits the inclusion of close relatives is crucial and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, cross-validation designs should resemble the intended use of the predictive models, e.g. within or between family predictions, or within or across generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered.
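
    Two of the cross-validation layouts described in this abstract, random allocation (RAN) and stratification by generation (GEN), can be sketched as follows. The animal identifiers, generation numbers, test fraction, and function names are hypothetical, and the kernel-based clustering behind the UNREL layout is not reproduced here.

```python
import random

def random_split(ids, test_fraction=0.2, seed=0):
    """RAN-style layout: allocate individuals to train/test at random."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def generation_split(generation_of, last_training_generation):
    """GEN-style layout: train on older generations, test on younger ones."""
    train = [animal for animal, gen in generation_of.items()
             if gen <= last_training_generation]
    test = [animal for animal, gen in generation_of.items()
            if gen > last_training_generation]
    return train, test

# Hypothetical population: ten animals spread over three generations.
generation_of = {f"animal{i}": 1 + i // 4 for i in range(10)}

ran_train, ran_test = random_split(list(generation_of))
gen_train, gen_test = generation_split(generation_of, last_training_generation=2)
```

    The generation-based split mimics forward prediction of young animals from their ancestors, which is the kind of match between validation design and intended application that the abstract recommends.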

  7. Whole genomic prediction of growth and carcass traits in a Chinese quality chicken population.

    Science.gov (United States)

    Zhang, Z; Xu, Z-Q; Luo, Y-Y; Zhang, H-B; Gao, N; He, J-L; Ji, C-L; Zhang, D-X; Li, J-Q; Zhang, X-Q

    2017-01-01

    By incorporating high-density markers into breeding value prediction models, the whole genomic prediction (WGP) method can effectively accelerate genetic improvement in livestock breeding. However, the performance of WGP varies across species and populations and is affected by the underlying genetic architecture. In particular, very little is known about the performance of WGP for many chicken breeds. Here we estimate the genetic parameters and evaluate the performance of WGP for 18 growth and carcass traits in a Chinese quality chicken population. In total, 435 chickens were systematically phenotyped and genotyped using a 600K genotyping array. Two variance component estimation scenarios, 3 breeding value prediction methods, and 2 validation procedures were compared. The results showed that the heritability of these 18 traits was medium to high (ranging from 0.28 to 0.60) and that deviations existed between the heritability estimated from pedigrees and markers. Compared with conventional breeding methods, WGP could potentially increase the selection accuracy by 20% or more depending on the prediction model used, the trait under consideration, and the genetic connectedness between the training and validation individuals. Our results showed the potential of implementing genomic selection in small breeding herds.

  8. Whole-genome prediction of fatty acid composition in meat of Japanese Black cattle.

    Science.gov (United States)

    Onogi, A; Ogino, A; Komatsu, T; Shoji, N; Shimizu, K; Kurogi, K; Yasumori, T; Togashi, K; Iwata, H

    2015-10-01

    Because fatty acid composition influences the flavor and texture of meat, controlling it is particularly important for cattle breeds such as the Japanese Black, characterized by high meat quality. We evaluated the predictive ability of single-step genomic best linear unbiased prediction (ssGBLUP) in fatty acid composition of Japanese Black cattle by assessing the composition of seven fatty acids in 3088 cattle, of which 952 had genome-wide marker genotypes. All sires of the genotyped animals were genotyped, but their dams were not. Cross-validation was conducted for the 952 animals. The prediction accuracy was higher with ssGBLUP than with best linear unbiased prediction (BLUP) for all traits, and in an empirical investigation, the gain in accuracy of using ssGBLUP over BLUP increased as the deviations in phenotypic values of the animals increased. In addition, the superior accuracy of ssGBLUP tended to be more evident in animals whose maternal grandsire was genotyped than in other animals, although the effect was small.

  9. Predicting Hybrid Performances for Quality Traits through Genomic-Assisted Approaches in Central European Wheat

    KAUST Repository

    Liu, Guozheng

    2016-07-06

    Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting the hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and the composition and size of the training population on the accuracy of prediction of hybrid performance. In total 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested presence of several quantitative trait loci (QTLs), but cross-validation rather indicated the absence of major effect QTLs for all quality traits except 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids is relevant to boost the accuracy of prediction for an unrelated test population.

  10. Predicting Hybrid Performances for Quality Traits through Genomic-Assisted Approaches in Central European Wheat.

    Directory of Open Access Journals (Sweden)

    Guozheng Liu

    Full Text Available Bread-making quality traits are central targets for wheat breeding. The objectives of our study were to (1) examine the presence of major effect QTLs for quality traits in a Central European elite wheat population, (2) explore the optimal strategy for predicting the hybrid performance for wheat quality traits, and (3) investigate the effects of marker density and the composition and size of the training population on the accuracy of prediction of hybrid performance. In total 135 inbred lines of Central European bread wheat (Triticum aestivum L.) and 1,604 hybrids derived from them were evaluated for seven quality traits in up to six environments. The 135 parental lines were genotyped using a 90k single-nucleotide polymorphism array. Genome-wide association mapping initially suggested presence of several quantitative trait loci (QTLs), but cross-validation rather indicated the absence of major effect QTLs for all quality traits except 1000-kernel weight. Genomic selection substantially outperformed marker-assisted selection in predicting hybrid performance. A resampling study revealed that increasing the effective population size in the estimation set of hybrids is relevant to boost the accuracy of prediction for an unrelated test population.

  11. Genomic biomarkers of prenatal intrauterine inflammation in umbilical cord tissue predict later life neurological outcomes.

    Science.gov (United States)

    Tilley, Sloane K; Joseph, Robert M; Kuban, Karl C K; Dammann, Olaf U; O'Shea, T Michael; Fry, Rebecca C

    2017-01-01

    Preterm birth is a major risk factor for neurodevelopmental delays and disorders. This study aimed to identify genomic biomarkers of intrauterine inflammation in umbilical cord tissue in preterm neonates that predict cognitive impairment at 10 years of age. Genome-wide messenger RNA (mRNA) levels from umbilical cord tissue were obtained from 43 neonates born before 28 weeks of gestation. Genes that were differentially expressed across four indicators of intrauterine inflammation were identified and their functions examined. Exact logistic regression was used to test whether expression levels in umbilical cord tissue predicted neurocognitive function at 10 years of age. Placental indicators of inflammation were associated with changes in the mRNA expression of 445 genes in umbilical cord tissue. Transcripts with decreased expression showed significant enrichment for biological signaling processes related to neuronal development and growth. The altered expression of six genes was found to predict neurocognitive impairment when children were 10 years old. These genes include two that encode proteins involved in neuronal development. Prenatal intrauterine inflammation is associated with altered gene expression in umbilical cord tissue. A set of six of the differentially expressed genes predicts cognitive impairment later in life, suggesting that the fetal environment is associated with significant adverse effects on neurodevelopment that persist into later childhood.

  12. Insilco Prediction and Characterization of microRNAs from Oncopeltus fasciatus (Hemiptera: Lygaeidae) Genome.

    Science.gov (United States)

    Ellango, R; Asokan, R; Ramamurthy, V V

    2016-08-01

    For studies on functional genomics, small RNAs, especially microRNAs (miRNAs), have emerged as a hot topic due to their importance in cellular and developmental processes. Identification of insect miRNAs largely depends on the availability of genomic sequences in the public domain. The large milkweed bug, Oncopeltus fasciatus (Dallas) is a hemimetabolous insect which has become a model hemipteran system for various molecular studies. In this study, we identified 96 candidate mature miRNAs from O. fasciatus genome using a blast search with the previously reported animal miRNAs. The secondary structure of predicted miRNA sequences was determined online using "mfold" web server and verified by calculating the minimal free energy index (MFEI). Six miRNAs let-7e, miR-133c, miR-219b, mir-466d, mir-669f, and mir-669l are reported for the first time in Insecta. Comparison of O. fasciatus mir-2 and mir-71 family clusters to those of diverse insect species showed that they are highly conserved. The phylogenetic analysis of miRNAs revealed the evolutionary relationship of conserved miRNAs of O. fasciatus with other insect species. Using a classical rule-based algorithm method, we predicted the possible targets of the new miRNAs. Our study not only identified the list of miRNAs in O. fasciatus but also provides a basic platform for developing novel pest management strategies based on artificial miRNAs.
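
    The MFEI check mentioned in this abstract can be illustrated with a short calculation. The sketch below assumes the commonly used definition, MFEI = [(|MFE| / sequence length) × 100] / (G+C)%, and takes the folding energy as an input that was already obtained elsewhere (for example from mfold). The function names and the example numbers are hypothetical and are not taken from the study.

```python
def gc_percent(seq: str) -> float:
    """Return the G+C content of an RNA/DNA sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def mfei(mfe_kcal_per_mol: float, seq: str) -> float:
    """Minimal folding free energy index (assumed definition, see lead-in).

    AMFE is the absolute folding energy normalised to 100 nt;
    MFEI divides AMFE by the G+C percentage of the sequence.
    """
    gc = gc_percent(seq)
    if gc == 0.0:
        raise ValueError("MFEI is undefined for a sequence without G or C")
    amfe = abs(mfe_kcal_per_mol) / len(seq) * 100.0
    return amfe / gc


if __name__ == "__main__":
    # Hypothetical 80-nt precursor with 50% G+C and MFE of -40 kcal/mol.
    precursor = "GGCCAAUU" * 10
    print(round(mfei(-40.0, precursor), 3))
```

    Taking the absolute value of the folding energy is a convention chosen here so that more stable hairpins give larger index values; a pipeline that reports signed energies would need to match whichever convention its threshold assumes.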

  13. Salmonella Genomic Island 1 (SGI1) and genetic characteristics of animal and food isolates of Salmonella typhimurium DT104 in Hungary.

    Science.gov (United States)

    Fekete, Péter Zsolt; Nagy, Béla

    2008-03-01

    To study the genetic characteristics of DT104 strains of Salmonella Typhimurium and the prevalence of Salmonella Genomic Island (SGI1) in Hungary, 140 recent Salmonella strains of food and animal origin were examined. For the first time in Hungary, the SGI1 was found in 17 out of 59 S. Typhimurium isolates (all proven to be DT104 phage type). These 17 strains were then subtyped by pulsed-field gel electrophoresis (PFGE) into 6 pulsotypes which were less correlated with the geographic origin than with the animal species of origin.

  14. Sequence-based prediction of single nucleosome positioning and genome-wide nucleosome occupancy.

    Science.gov (United States)

    van der Heijden, Thijn; van Vugt, Joke J F A; Logie, Colin; van Noort, John

    2012-09-18

    Nucleosome positioning dictates eukaryotic DNA compaction and access. To predict nucleosome positions in a statistical mechanics model, we exploited the knowledge that nucleosomes favor DNA sequences with specific periodically occurring dinucleotides. Our model is the first to capture both dyad position within a few base pairs, and free binding energy within 2 k_B T, for all the known nucleosome positioning sequences. By applying Percus's equation to the derived energy landscape, we isolate sequence effects on genome-wide nucleosome occupancy from other factors that may influence nucleosome positioning. For both in vitro and in vivo systems, three parameters suffice to predict nucleosome occupancy with correlation coefficients of 0.74 and 0.66, respectively. As predicted, we find the largest deviations in vivo around transcription start sites. This relatively simple algorithm can be used to guide future studies on the influence of DNA sequence on chromatin organization.

  15. Test Data Sets and Evaluation of Gene Prediction Programs on the Rice Genome

    Institute of Scientific and Technical Information of China (English)

    Heng Li; Tao Liu; Hai-Hong Li; Yan Li; Li-Jun Fang; Hui-Min Xie; Wei-Mou Zheng; Bai-Lin Hao; Jin-Song Liu; Zhao Xu; Jiao Jin; Lin Fang; Lei Gao; Yu-Dong Li; Zi-Xing Xing; Shao-Gen Gao

    2005-01-01

    With several rice genome projects approaching completion, gene prediction/finding by computer algorithms has become an urgent task. Two test sets were constructed by mapping the newly published 28,469 full-length KOME rice cDNA to the RGP BAC clone sequences of Oryza sativa ssp. japonica: a single-gene set of 550 sequences and a multi-gene set of 62 sequences with 271 genes. These data sets were used to evaluate five ab initio gene prediction programs: RiceHMM, GlimmerR, GeneMark, FGENSH and BGF. The predictions were compared on nucleotide, exon and whole gene structure levels using commonly accepted measures and several new measures. The test results show progress in performance in chronological order. At the same time, complementarity of the programs hints at the possibility of further improvement and at the feasibility of reaching better performance by combining several gene-finders.
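
    As an illustration of a nucleotide-level comparison, the sketch below computes sensitivity, TP / (TP + FN), and specificity in the gene-finding sense, TP / (TP + FP), from annotated and predicted coding intervals. Whether these are exactly the measures used in the study is an assumption; the function names, the half-open coordinate convention and the example intervals are hypothetical.

```python
def covered_positions(intervals):
    """Expand half-open (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = TP / (TP + FN): share of true coding bases predicted.
    specificity = TP / (TP + FP): share of predicted coding bases that
    are truly coding (the gene-finding usage of the term).
    """
    truth = covered_positions(true_exons)
    pred = covered_positions(predicted_exons)
    tp = len(truth & pred)   # coding and predicted coding
    fn = len(truth - pred)   # coding but missed
    fp = len(pred - truth)   # predicted coding but not coding
    sensitivity = tp / (tp + fn) if truth else float("nan")
    specificity = tp / (tp + fp) if pred else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    annotated = [(0, 100), (200, 300)]   # hypothetical cDNA-supported exons
    predicted = [(50, 100), (200, 350)]  # hypothetical gene-finder output
    print(nucleotide_accuracy(annotated, predicted))
```

    Expanding intervals into position sets keeps the sketch short; for whole chromosomes an interval-overlap sweep would use far less memory.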

  16. Neural Network Prediction of Translation Initiation Sites in Eukaryotes: Perspectives for EST and Genome analysis

    DEFF Research Database (Denmark)

    Pedersen, Anders Gorm; Nielsen, Henrik

    1997-01-01

    Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role. This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known...... and global sequence information. Furthermore, analysis of false predictions shows that AUGs in frame with the actual start codon are more frequently selected than out-of-frame AUGs, suggesting that our networks use reading frame detection. A number of conflicts between neural network predictions and database...... annotations are analysed in detail, leading to identification of possible database errors....

  17. Draft genome of Leisingera aquaemixtae CECT 8399T, a member of the Roseobacter clade isolated from a junction of fresh and ocean water in Jeju Island, South Korea

    Directory of Open Access Journals (Sweden)

    Lidia Rodrigo-Torres

    2016-03-01

    We report the draft genome sequence and annotation of Leisingera aquaemixtae CECT 8399T (DDBJ/EMBL/GenBank accession number CYSR00000000) which comprises 4,614,060 bp, 4313 protein coding genes, 54 tRNA coding genes and 7 rRNA coding genes. General findings of the annotated genome, such as pigment indigoidine operon, phenylacetate oxidation genes or predictable number of replicons, are commented in comparison to other Leisingera species. Average Nucleotide Identity between available genomes of type strains of species of Leisingera and Phaeobacter genera has been calculated to evaluate its current classification.

  18. Draft genome of Leisingera aquaemixtae CECT 8399T, a member of the Roseobacter clade isolated from a junction of fresh and ocean water in Jeju Island, South Korea

    Science.gov (United States)

    Rodrigo-Torres, Lidia; Pujalte, María J.; Arahal, David R.

    2016-01-01

    We report the draft genome sequence and annotation of Leisingera aquaemixtae CECT 8399T (DDBJ/EMBL/GenBank accession number CYSR00000000) which comprises 4,614,060 bp, 4313 protein coding genes, 54 tRNA coding genes and 7 rRNA coding genes. General findings of the annotated genome, such as pigment indigoidine operon, phenylacetate oxidation genes or predictable number of replicons, are commented in comparison to other Leisingera species. Average Nucleotide Identity between available genomes of type strains of species of Leisingera and Phaeobacter genera has been calculated to evaluate its current classification. PMID:26981415

  19. Genome-wide association analysis and genomic prediction of Mycobacterium avium subspecies paratuberculosis infection in US Jersey cattle.

    Directory of Open Access Journals (Sweden)

    Yalda Zare

    Paratuberculosis (Johne's disease), an enteric disorder in ruminants caused by Mycobacterium avium subspecies paratuberculosis (MAP), causes economic losses in excess of $200 million annually to the US dairy industry. To identify genomic regions underlying susceptibility to MAP infection in Jersey cattle, a case-control genome-wide association study (GWAS) was performed. Blood and fecal samples were collected from ∼5,000 mature cows in 30 commercial Jersey herds from across the US. Discovery data consisted of 450 cases and 439 controls genotyped with the Illumina BovineSNP50 BeadChip. Cases were animals with positive ELISA and fecal culture (FC) results. Controls were animals negative to both ELISA and FC tests that matched cases on birth date and herd. Validation data consisted of 180 animals including 90 cases (positive to FC) and 90 controls (negative to ELISA and FC), selected from discovery herds and genotyped by Illumina BovineLD BeadChip (∼7K SNPs). Two analytical approaches were used: single-marker GWAS using the GRAMMAR-GC method and Bayesian variable selection (Bayes C) using GenSel software. GRAMMAR-GC identified one SNP on BTA7 at 68 megabases (Mb) surpassing a significance threshold of 5 × 10⁻⁵. ARS-BFGL-NGS-11887 on BTA23 (27.7 Mb) accounted for the highest percentage of genetic variance (3.3%) in the Bayes C analysis. SNPs identified in common by GRAMMAR-GC and Bayes C in both discovery and combined data were mapped to BTA23 (27, 29 and 44 Mb), BTA3 (100, 101, 106 and 107 Mb) and BTA17 (57 Mb). Correspondence between results of GRAMMAR-GC and Bayes C was high (70-80% of most significant SNPs in common). These SNPs could potentially be associated with causal variants underlying susceptibility to MAP infection in Jersey cattle. Predictive performance of the model developed by Bayes C for prediction of infection status of animals in the validation set was low (55% probability of correct ranking of paired case and control samples).

  20. Computational analysis and prediction for exons of PAC579 genomic sequence

    Institute of Scientific and Technical Information of China (English)

    HUANG Yi

    2001-01-01

  1. Predicting extreme wind speeds on a tropical island for multi-peril catastrophe modelling

    Science.gov (United States)

    Thornton, James; Moncoulon, David; Millinship, Ian; Raven, Emma

    2013-04-01

    Catastrophe models are important tools used by the reinsurance industry for assessing and managing risk. Here, we present the methods used to develop high-resolution wind hazard maps for the Indian Ocean island of La Réunion. As the recent Cyclone Dumile (January 2013) reminded us, the island is at considerable risk from the extreme weather associated with tropical cyclones. It also contains a significant proportion of the total value insured in French overseas territories. The wind maps, alongside flood and storm surge maps, were ultimately combined with exposure information in a multi-peril catastrophe model to provide probabilistic estimates of insured loss. Our wind mapping methodology used established extreme value theory statistics to estimate the annual probability of extreme wind speeds, including those exceeding the observed maxima of our 19 year record, at meteorological stations. This gave approximate wind speeds for a range of return periods at these specific locations. Since the spatial density of the stations was insufficient to resolve the numerous potential effects of the complex island topography, geographically weighted regression (GWR) models were then developed to interpolate these cyclonic wind speeds across the entire island. Factors known to affect local wind speed such as elevation, surface roughness and coastal proximity were explicitly accounted for. Using this advanced interpolation method, wind hazard maps were produced for six return periods between 1 in 10 and 1 in 1000 years. Our maps compared favourably with those of historical events, and also showed patterns of wind speed in agreement with the findings of other studies investigating the effects of topography. Leave-one-out cross-validation (LOOCV) further confirmed the satisfactory performance of the models in providing a robust and comprehensive description of wind patterns during cyclone passage. Uncertainty increased with return period as more extrapolation of the limited

  2. Computational prediction of cAMP receptor protein (CRP) binding sites in cyanobacterial genomes

    Directory of Open Access Journals (Sweden)

    Su Zhengchang

    2009-01-01

    Background: Cyclic AMP receptor protein (CRP), also known as catabolite gene activator protein (CAP), is an important transcriptional regulator widely distributed in many bacteria. The biological processes under the regulation of CRP are highly diverse among different groups of bacterial species. Elucidation of CRP regulons in cyanobacteria will further our understanding of the physiology and ecology of this important group of microorganisms. Previously, CRP has been experimentally studied in only two cyanobacterial strains: Synechocystis sp. PCC 6803 and Anabaena sp. PCC 7120; therefore, a systematic genome-scale study of the potential CRP target genes and binding sites in cyanobacterial genomes is urgently needed. Results: We have predicted and analyzed the CRP binding sites and regulons in 12 sequenced cyanobacterial genomes using a highly effective cis-regulatory binding site scanning algorithm. Our results show that cyanobacterial CRP binding sites are very similar to those in E. coli; however, the regulons are very different from that of E. coli. Furthermore, CRP regulons in different cyanobacterial species/ecotypes are also highly diversified, ranging from photosynthesis, carbon fixation and nitrogen assimilation, to chemotaxis and signal transduction. In addition, our prediction indicates that crp genes in modern cyanobacteria are likely inherited from a common ancestral gene in their last common ancestor, and have adapted various cellular functions in different environments, while some cyanobacteria lost their crp genes as well as CRP binding sites during the course of evolution. Conclusion: The CRP regulons in cyanobacteria are highly diversified, probably as a result of divergent evolution to adapt to various ecological niches. Cyanobacterial CRPs may function as lineage-specific regulators participating in various cellular processes, and are important in some lineages. However, they are dispensable in some other lineages. The

  3. PRISM offers a comprehensive genomic approach to transcription factor function prediction

    KAUST Repository

    Wenger, A. M.

    2013-02-04

    The human genome encodes 1500-2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
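
    The idea of testing whether a motif's predicted target genes are enriched for a particular function can be sketched with a one-sided hypergeometric test. This is a generic enrichment statistic chosen for illustration and is not claimed to be the statistical framework used by PRISM; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, overlap):
    """P(X >= overlap) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` carry the function.

    population: all genes considered
    annotated:  genes annotated with the function of interest
    drawn:      genes with a predicted binding site for the motif
    overlap:    genes that are both annotated and have a binding site
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 genes, 4 annotated, 3 near the motif, 2 in both sets.
    print(hypergeom_upper_tail(10, 4, 3, 2))
```

    Testing many motif and function pairs this way would require multiple-testing control, for example a false discovery rate procedure.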

  4. Using information of relatives in genomic prediction to apply effective stratified medicine

    Science.gov (United States)

    Lee, S. Hong; Weerasinghe, W. M. Shalanee P.; Wray, Naomi R.; Goddard, Michael E.; van der Werf, Julius H. J.

    2017-01-01

    Genomic prediction shows promise for personalised medicine in which diagnosis and treatment are tailored to individuals based on their genetic profiles for complex diseases. We present a theoretical framework to demonstrate that prediction accuracy can be improved by targeting more informative individuals in the data set used to generate the predictors (“discovery sample”) to include those with genetically close relationships with the subjects put forward for risk prediction. Increase of prediction accuracy from closer relationships is achieved under an additive model and does not rely on any family or interaction effects. Using theory, simulations and real data analyses, we show that the predictive accuracy or the area under the receiver operating characteristic curve (AUC) increased exponentially with decreasing effective size (Ne), i.e. when individuals are closely related. For example, with a discovery set of size N = 3000, heritability h2 = 0.5 and population prevalence K = 0.1, the AUC approached 0.9 and the top percentile of the estimated genetic profile scores had a 23 times higher proportion of cases than the general population. This suggests that there is considerable room to increase prediction accuracy by using a design that does not exclude closer relationships. PMID:28181587
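
    The AUC reported here has a direct probabilistic reading: it is the probability that a randomly chosen case receives a higher score than a randomly chosen control. A minimal sketch of that pairwise definition follows; the function name and the scores are hypothetical, and ties are counted as half a win, which is a common convention assumed here.

```python
def pairwise_auc(case_scores, control_scores):
    """AUC as the share of case-control pairs ranked correctly.

    A pair counts 1 if the case scores higher than the control,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    cases = [0.9, 0.8, 0.4]      # hypothetical genetic profile scores
    controls = [0.7, 0.3, 0.4]
    print(pairwise_auc(cases, controls))
```

    The double loop is quadratic in sample size; for large cohorts the same quantity is usually obtained from ranks.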

  5. Impact of Relationships between Test and Reference Animals and between Reference Animals on Reliability of Genomic Prediction

    DEFF Research Database (Denmark)

    Wu, Xiaoping; Lund, Mogens Sandø; Sun, Dongxiao

    This study investigated the reliability of genomic prediction in various scenarios with regard to the relationship between test and reference animals and between animals within the reference population. Different reference populations were generated from EuroGenomics data and 1288 Nordic Holstein bulls...... as a common test population. A GBLUP model and a Bayesian mixture model were applied to predict genomic breeding values for bulls in the test data. Results showed that a closer relationship between test and reference animals led to a higher reliability, while a closer relationship between reference animal...
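
    A GBLUP prediction of the kind named in this abstract can be sketched in a few lines. The sketch assumes a VanRaden-style genomic relationship matrix built from 0/1/2 genotype codes, a single overall mean as the only fixed effect (estimated here by the plain phenotype average, a simplification), and a known ratio `lam` of residual to genetic variance. All names and the simulated data are hypothetical and do not reproduce the model settings of the study.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix from an (animals x markers) array
    of 0/1/2 allele counts, centred by twice the allele frequency."""
    p = genotypes.mean(axis=0) / 2.0           # allele frequency per marker
    z = genotypes - 2.0 * p                    # centred genotypes
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(g, y_ref, ref_idx, test_idx, lam):
    """Predict genomic values of test animals from reference phenotypes.

    u_test = G[test, ref] (G[ref, ref] + lam * I)^-1 (y_ref - mean(y_ref))
    where lam is the assumed residual-to-genetic variance ratio.
    """
    g_rr = g[np.ix_(ref_idx, ref_idx)]         # reference x reference block
    g_tr = g[np.ix_(test_idx, ref_idx)]        # test x reference block
    centred = y_ref - y_ref.mean()
    weights = np.linalg.solve(g_rr + lam * np.eye(len(ref_idx)), centred)
    return g_tr @ weights


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    markers = rng.integers(0, 3, size=(30, 200)).astype(float)
    g = vanraden_g(markers)
    ref, test = np.arange(20), np.arange(20, 30)
    phenotypes = rng.normal(size=20)           # simulated, no real signal
    print(gblup_predict(g, phenotypes, ref, test, lam=1.0))
```

    The test-by-reference block of G carries the relationships between test and reference animals, so test animals with larger entries in that block borrow more information from the reference phenotypes.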

  6. Genomic risk models improve prediction of longitudinal lipid levels in children and young adults

    Directory of Open Access Journals (Sweden)

    Nathan E. Wineinger

    2013-05-01

    In clinical medicine, lipids are commonly measured biomarkers used to assess an individual’s risk for cardiovascular disease, heart attack, and stroke. Accurately predicting longitudinal lipid levels based on genomic information can inform therapeutic practices and decrease cardiovascular risk by identifying high-risk patients prior to onset. Using genotyped and imputed genetic data from 523 unrelated Caucasian Americans from the Bogalusa Heart Study, surveyed on 4,026 occasions from 4 to 48 years of age, we generated various lipid genomic risk models based on previously reported markers. We observed a significant improvement in prediction over non-genetic risk models in high density lipoprotein cholesterol (increase in the squared correlation between observed and predicted values, d=0.032), low density lipoprotein cholesterol (d=0.053), total cholesterol (d=0.043), and triglycerides (d=0.031). Many of our approaches are based on an n-fold cross-validation procedure and are, by design, adaptable to a clinical environment.

  7. A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data.

    Directory of Open Access Journals (Sweden)

    Hua Yu

    In silico prediction of drug-target interactions from heterogeneous biological data can advance our system-level search for drug molecules and therapeutic targets, an effort that has not yet reached full fruition. In this work, we report a systematic approach that efficiently integrates the chemical, genomic, and pharmacological information for drug targeting and discovery on a large scale, based on two powerful methods, Random Forest (RF) and Support Vector Machine (SVM). The performance of the derived models was evaluated and verified with internal five-fold cross-validation and four external independent validations. The optimal models show impressive performance of prediction for drug-target interactions, with a concordance of 82.83%, a sensitivity of 81.33%, and a specificity of 93.62%. The consistency of the performances of the RF and SVM models demonstrates the reliability and robustness of the obtained models. In addition, the validated models were employed to systematically predict known/unknown drugs and targets involving the enzymes, ion channels, GPCRs, and nuclear receptors, which can be further mapped to functional ontologies such as target-disease associations and target-target interaction networks. This approach is expected to help fill the existing gap between chemical genomics and network pharmacology and thus accelerate the drug discovery processes.
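
    The three performance figures quoted in this abstract can be related to a confusion matrix as in the sketch below. It assumes that "concordance" means overall accuracy, which the abstract does not define, and uses the classifier sense of specificity, TN / (TN + FP). The function name and the counts are hypothetical.

```python
def classification_summary(tp, fp, tn, fn):
    """Summarise a binary classifier from confusion-matrix counts.

    concordance: share of all pairs classified correctly (assumed
                 here to mean overall accuracy)
    sensitivity: share of true interactions that were recovered
    specificity: share of non-interactions that were rejected
    """
    total = tp + fp + tn + fn
    if total == 0 or tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative")
    return {
        "concordance": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical validation counts for drug-target pairs.
    print(classification_summary(tp=80, fp=10, tn=90, fn=20))
```

    Reporting all three values together matters when interacting and non-interacting pairs are unbalanced, because accuracy alone can then be dominated by the larger class.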

  8. Predicting co-complexed protein pairs using genomic and proteomic data integration

    Directory of Open Access Journals (Sweden)

    King Oliver D

    2004-04-01

    Background: Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and utilize their different predictive value for more accurate prediction of co-complexed relationship. Results: Using a supervised machine learning approach, the probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCP) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may potentially represent unknown CCPs. Conclusions: We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.

  9. Predictive Models of Recombination Rate Variation across the Drosophila melanogaster Genome

    Science.gov (United States)

    Adrian, Andrew B.; Corchado, Johnny Cruz; Comeron, Josep M.

    2016-01-01

    In all eukaryotic species examined, meiotic recombination, and crossovers in particular, occur non‐randomly along chromosomes. The cause for this non-random distribution remains poorly understood but some specific DNA sequence motifs have been shown to be enriched near crossover hotspots in a number of species. We present analyses using machine learning algorithms to investigate whether DNA motif distribution across the genome can be used to predict crossover variation in Drosophila melanogaster, a species without hotspots. Our study exposes a combinatorial non-linear influence of motif presence able to account for a significant fraction of the genome-wide variation in crossover rates at all genomic scales investigated, from 20% at 5-kb to almost 70% at 2,500-kb scale. The models are particularly predictive for regions with the highest and lowest crossover rates and remain highly informative after removing sub-telomeric and -centromeric regions known to have strongly reduced crossover rates. Transcriptional activity during early meiosis and differences in motif use between autosomes and the X chromosome add to the predictive power of the models. Moreover, we show that population-specific differences in crossover rates can be partly explained by differences in motif presence. Our results suggest that crossover distribution in Drosophila is influenced by both meiosis-specific chromatin dynamics and very local constitutive open chromatin associated with DNA motifs that prevent nucleosome stabilization. These findings provide new information on the genetic factors influencing variation in recombination rates and a baseline to study epigenetic mechanisms responsible for plastic recombination as response to different biotic and abiotic conditions and stresses. PMID:27492232

  10. High-Throughput Phenotyping of Sorghum Plant Height Using an Unmanned Aerial Vehicle and Its Application to Genomic Prediction Modeling

    Science.gov (United States)

    Watanabe, Kakeru; Guo, Wei; Arai, Keigo; Takanashi, Hideki; Kajiya-Kanegae, Hiromi; Kobayashi, Masaaki; Yano, Kentaro; Tokunaga, Tsuyoshi; Fujiwara, Toru; Tsutsumi, Nobuhiro; Iwata, Hiroyoshi

    2017-01-01

    Genomics-assisted breeding methods have been rapidly developed with novel technologies such as next-generation sequencing, genomic selection and genome-wide association study. However, phenotyping is still time consuming and is a serious bottleneck in genomics-assisted breeding. In this study, we established a high-throughput phenotyping system for sorghum plant height and its response to nitrogen availability; this system relies on the use of unmanned aerial vehicle (UAV) remote sensing with either an RGB or near-infrared, green and blue (NIR-GB) camera. We evaluated the potential of remote sensing to provide phenotype training data in a genomic prediction model. UAV remote sensing with the NIR-GB camera and the 50th percentile of digital surface model, which is an indicator of height, performed well. The correlation coefficient between plant height measured by UAV remote sensing (PHUAV) and plant height measured with a ruler (PHR) was 0.523. Because PHUAV was overestimated (probably because of the presence of taller plants on adjacent plots), the correlation coefficient between PHUAV and PHR was increased to 0.678 by using one of the two replications (that with the lower PHUAV value). Genomic prediction modeling performed well under the low-fertilization condition, probably because PHUAV overestimation was smaller under this condition due to a lower plant height. The predicted values of PHUAV and PHR were highly correlated with each other (r = 0.842). This result suggests that the genomic prediction models generated with PHUAV were almost identical and that the performance of UAV remote sensing was similar to that of traditional measurements in genomic prediction modeling. UAV remote sensing has a high potential to increase the throughput of phenotyping and decrease its cost. UAV remote sensing will be an important and indispensable tool for high-throughput genomics-assisted plant breeding.

  12. Applications of population genetics to animal breeding, from Wright, Fisher and Lush to genomic prediction.

    Science.gov (United States)

    Hill, William G

    2014-01-01

    Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives' performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher's infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now with access to genomic data, a revolution in which molecular information is being used to enhance response with "genomic selection" is occurring. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas.

  13. Applications of Population Genetics to Animal Breeding, from Wright, Fisher and Lush to Genomic Prediction

    Science.gov (United States)

    Hill, William G.

    2014-01-01

    Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives’ performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher’s infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now with access to genomic data, a revolution in which molecular information is being used to enhance response with “genomic selection” is occurring. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas. PMID:24395822

  14. A Comparative Genomics Approach to Prediction of New Members of Regulons

    Science.gov (United States)

    Tan, Kai; Moreno-Hagelsieb, Gabriel; Collado-Vides, Julio; Stormo, Gary D.

    2001-01-01

    Identifying the complete transcriptional regulatory network for an organism is a major challenge. For each regulatory protein, we want to know all the genes it regulates, that is, its regulon. Examples of known binding sites can be used to estimate the binding specificity of the protein and to predict other binding sites. However, binding site predictions can be unreliable because the considerable variability of binding sites makes the true specificity of the protein difficult to determine. Because regulatory systems tend to be conserved through evolution, we can use comparisons between species to increase the reliability of binding site predictions. In this article, an approach is presented to evaluate the computational predictions of regulatory sites. We combine the prediction of transcription units having orthologous genes with the prediction of transcription factor binding sites based on probabilistic models. We augment the sets of genes in Escherichia coli that are expected to be regulated by two transcription factors, the cAMP receptor protein and the fumarate and nitrate reduction regulatory protein, through a comparison with the Haemophilus influenzae genome. At the same time, we learned more about the regulatory networks of H. influenzae, a species with much less experimental knowledge than E. coli. By studying orthologous genes subject to regulation by the same transcription factor, we also gained understanding of the evolution of the entire regulatory systems. PMID:11282972

  15. INDeGenIUS, a new method for high-throughput identification of specialized functional islands in completely sequenced organisms

    Indian Academy of Sciences (India)

    Sakshi Shrivastava; Ch V Siva Kumar Reddy; Sharmila S Mande

    2010-09-01

    Genomic islands (GIs) are regions in the genome which are believed to have been acquired via horizontal gene transfer events and are thus likely to be compositionally distinct from the rest of the genome. The majority of the genes located in a GI encode a particular function. Depending on the genes they encode, GIs can be classified into various categories, such as 'metabolic islands', 'symbiotic islands', 'resistance islands', 'pathogenicity islands', etc. The computational process for GI detection is known and many algorithms for it are available. We present a new method termed Improved N-mer based Detection of Genomic Islands Using Sequence-clustering (INDeGenIUS) for the identification of GIs. This method was applied to 400 completely sequenced species belonging to proteobacteria. Based on the genes encoded in the identified GIs, the GIs were grouped into 6 categories: metabolic islands, symbiotic islands, resistance islands, secretion islands, pathogenicity islands and motility islands. Several new islands of interest which had been missed by earlier algorithms were picked up as GIs by INDeGenIUS. The present algorithm has potential application in the identification of functionally relevant GIs in the large number of genomes that are being sequenced. Investigation of the predicted GIs in pathogens may lead to identification of potential drug/vaccine candidates.
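    The idea that a genomic island is compositionally distinct from the rest of the genome can be illustrated with a simple N-mer frequency comparison. The sketch below is not the INDeGenIUS algorithm (which additionally uses sequence clustering); the function names, the distance measure, the window size and the threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def composition_distance(window, genome, k=2):
    """Sum of absolute k-mer frequency differences (0 = identical, 2 = disjoint)."""
    fw, fg = kmer_freqs(window, k), kmer_freqs(genome, k)
    return sum(abs(fw.get(m, 0.0) - fg.get(m, 0.0)) for m in set(fw) | set(fg))


def atypical_windows(genome, size, step, k=2, threshold=1.0):
    """Start positions of windows whose composition distance exceeds threshold."""
    return [
        start
        for start in range(0, len(genome) - size + 1, step)
        if composition_distance(genome[start:start + size], genome, k) > threshold
    ]


# Toy genome: AT-rich background with a GC-rich insert at positions 40-59.
toy = "AT" * 20 + "GC" * 10 + "AT" * 20
flagged = atypical_windows(toy, size=20, step=20)
```

    On the toy sequence only the window covering the GC-rich insert is flagged. Real genomes would need longer N-mers, overlapping windows and a data-driven threshold.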

  16. Predicting effects of structural stress in a genome-reduced model bacterial metabolism

    Science.gov (United States)

    Güell, Oriol; Sagués, Francesc; Serrano, M. Ángeles

    2012-08-01

    Mycoplasma pneumoniae is a human pathogen recently proposed as a genome-reduced model for bacterial systems biology. Here, we study the response of its metabolic network to different forms of structural stress, including removal of individual and pairs of reactions and knockout of genes and clusters of co-expressed genes. Our results reveal a network architecture as robust as that of other model bacteria regarding multiple failures, although less robust against individual reaction inactivation. Interestingly, metabolite motifs associated to reactions can predict the propagation of inactivation cascades and damage amplification effects arising in double knockouts. We also detect a significant correlation between gene essentiality and damages produced by single gene knockouts, and find that genes controlling high-damage reactions tend to be expressed independently of each other, a functional switch mechanism that, simultaneously, acts as a genetic firewall to protect metabolism. Prediction of failure propagation is crucial for metabolic engineering or disease treatment.

  17. Prediction of drug-target interactions for drug repositioning only based on genomic expression similarity.

    Directory of Open Access Journals (Sweden)

    Kejian Wang

    Small drug molecules usually bind to multiple protein targets or even unintended off-targets. Such drug promiscuity has often led to unwanted or unexplained drug reactions, resulting in side effects or drug repositioning opportunities. Identifying potential drug-target interactions (DTI) is therefore an important issue in pharmacology. However, DTI discovery by experiment remains a challenging task because of its high cost in time and resources. Many computational methods have therefore been developed to predict DTI from high-throughput biological and clinical data. Here, we first demonstrate that on-target and off-target effects can be characterized by drug-induced in vitro genomic expression changes, e.g. the data in Connectivity Map (CMap). Thus, unknown ligands of a certain target can be found among the compounds showing high gene-expression similarity to the known ligands. Then, to clarify the detailed practice of CMap-based DTI prediction, we objectively evaluate how well each target is characterized by CMap. The results suggest that (1) some targets are better characterized than others, so the prediction models specific to these well-characterized targets would be more accurate and reliable; (2) in some cases, a family of ligands for the same target tends to interact with common off-targets, which may help increase the efficiency of DTI discovery and explain the mechanisms of complicated drug actions. In the present study, CMap expression similarity is proposed as a novel indicator of drug-target interactions. Detailed strategies for improving data quality by decreasing the batch effect and for building prediction models are also established. We believe the success in CMap can be further translated to other public and commercial genomic expression data, thus increasing research productivity towards valid drug repositioning and minimal side effects.

  18. In silico miRNA prediction in metazoan genomes: balancing between sensitivity and specificity

    Directory of Open Access Journals (Sweden)

    Fiers Mark WJE

    2009-04-01

    Background: MicroRNAs (miRNAs), short ~21-nucleotide RNA molecules, play an important role in post-transcriptional regulation of gene expression. The number of known miRNA hairpins registered in the miRBase database is rapidly increasing, but recent reports suggest that many miRNAs with restricted temporal or tissue-specific expression remain undiscovered. Various strategies for in silico miRNA identification have been proposed to facilitate miRNA discovery. Notably, support vector machine (SVM) methods have recently gained popularity. However, a drawback of these methods is that they do not provide insight into the biological properties of miRNA sequences. Results: We here propose a new strategy for miRNA hairpin prediction in which the likelihood that a genomic hairpin is a true miRNA hairpin is evaluated based on statistical distributions of observed biological variation of properties (descriptors) of known miRNA hairpins. These distributions are transformed into a single and continuous outcome classifier called the L score. Using a dataset of known miRNA hairpins from the miRBase database and an exhaustive set of genomic hairpins identified in the genome of Caenorhabditis elegans, a subset of the 18 most informative descriptors was selected after detailed analysis of the correlation among and discriminative power of individual descriptors. We show that the majority of previously identified miRNA hairpins have high L scores, that the method outperforms miRNA prediction by threshold filtering and that it is more transparent than SVM classifiers. Conclusion: The L score is applicable as a prediction classifier with high sensitivity for novel miRNA hairpins. The L-score approach can be used to rank and select interesting miRNA hairpin candidates for downstream experimental analysis when coupled to a genome-wide set of in silico-identified hairpins or to facilitate the analysis of large sets of putative miRNA hairpin loci obtained in deep

  19. The effects of relatedness and GxE interaction on prediction accuracies in genomic selection: a study in cassava

    Science.gov (United States)

    Prior to implementation of genomic selection, an evaluation of the potential accuracy of prediction can be obtained by cross validation. In this procedure, a population with both phenotypes and genotypes is split into training and validation sets. The prediction model is fitted using the training se...
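    The cross-validation procedure described above, in which genotyped and phenotyped individuals are split into training and validation sets, can be sketched as a plain k-fold split. This is a hedged illustration: the function name, the number of folds and the random seed are assumptions, and model fitting is deliberately left out.

```python
import random


def kfold_splits(n_individuals, k, seed=0):
    """Yield (training, validation) index lists for k-fold cross validation."""
    indices = list(range(n_individuals))
    random.Random(seed).shuffle(indices)  # random allocation to folds
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Training set is every individual not in the current validation fold.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(kfold_splits(n_individuals=10, k=5))
```

    In each fold a prediction model would be fitted on the training indices, and its predictions for the validation indices would be correlated with the observed phenotypes to estimate accuracy.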

  20. Consistent CMT solutions from Harvard University before great earthquakes in Kurile Islands and its significance for earthquake prediction

    Institute of Scientific and Technical Information of China (English)

    WANG Jun-guo; DIAO Gui-ling

    2005-01-01

    In this paper, we use the Centroid Moment Tensor (CMT) solutions determined by Harvard University for earthquakes that occurred in the Kurile Islands to analyze the consistency of focal mechanisms in the area, and we propose making earthquake predictions based on the consistent parameter a of focal mechanism and stress field. The results indicate that before MW≥7.5 earthquakes the consistent parameter a decreases; the decrease starts about 10~110 days and ends about 30~2 days before the great earthquakes. Although the phenomenon is not exactly the same for each earthquake, the differences are not large. The reliability of the phenomenon still needs to be tested over time. However, it is unlikely to be random that the focal mechanisms of successive MW≥5.3 earthquakes are consistent with the stress field over an area several hundred kilometers in length. It should be a phenomenon of predictive significance. Once enough earthquake examples have accumulated, uniform judgment criteria and prediction principles can be established.

  1. Should the markers on X chromosome be used for genomic prediction?

    DEFF Research Database (Denmark)

    Su, Guosheng; Guldbrandtsen, Bernt; Aamand, Gert Pedersen;

    2013-01-01

    This study investigated the accuracy of imputation from LD (7K) to 54K panel and compared the accuracy of genomic prediction with or without the X chromosome information, based on data of Nordic Holstein bulls. Beagle and Findhap were used for imputation. Averaged over two imputation datasets, the allele correct rates of imputation using Findhap were 98.2% for autosomal markers, 89.7% for markers on the pseudoautosomal region of the X chromosome, and 96.4% for X-specific markers. The allele correct rates were 98.9%, 91.2% and 96.8%, respectively, when using Beagle. Genomic predictions were carried out for 15... excluding the X chromosome. Averaged over 15 traits, the gains in reliability from the X chromosome ranged from 0.3 to 0.5 percentage points among the three data sets and models. Using a model with a G-matrix accounting for sex-linked relationship appropriately or a model which divided genomic breeding value into an...
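    The abstract reports allele correct rates without defining them. One common reading, assumed here and not confirmed by the source, is the fraction of alleles that the imputed genotype gets right when genotypes are coded as allele counts 0, 1 or 2. The function name and the example genotypes are hypothetical.

```python
def allele_correct_rate(true_genotypes, imputed_genotypes):
    """Fraction of alleles imputed correctly, genotypes coded as 0, 1 or 2."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype vectors must have the same length")
    # Each marker carries two alleles; |true - imputed| counts the wrong ones.
    wrong_alleles = sum(abs(t - i) for t, i in zip(true_genotypes, imputed_genotypes))
    return 1.0 - wrong_alleles / (2 * len(true_genotypes))


true_g = [0, 1, 2, 2, 1]
imputed_g = [0, 1, 1, 2, 0]
rate = allele_correct_rate(true_g, imputed_g)  # 2 wrong alleles out of 10
```

    Computing the rate separately for autosomal, pseudoautosomal and X-specific markers would give a per-class comparison of the kind the abstract reports.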

  2. Intrinsic disorder in Viral Proteins Genome-Linked: experimental and predictive analyses

    Directory of Open Access Journals (Sweden)

    Van Dorsselaer Alain

    2009-02-01

    Background: VPgs are viral proteins linked to the 5' end of some viral genomes. Interactions between several VPgs and eukaryotic translation initiation factors eIF4Es are critical for plant infection. However, VPgs are not restricted to phytoviruses, being also involved in genome replication and protein translation of several animal viruses. To date, structural data are still limited to small picornaviral VPgs. Recently, three phytoviral VPgs were shown to be natively unfolded proteins. Results: In this paper, we report the bacterial expression, purification and biochemical characterization of two phytoviral VPgs, namely the VPgs of Rice yellow mottle virus (RYMV, genus Sobemovirus) and Lettuce mosaic virus (LMV, genus Potyvirus). Using far-UV circular dichroism and size exclusion chromatography, we show that RYMV and LMV VPgs are predominantly or partly unstructured in solution, respectively. Using several disorder predictors, we show that both proteins are predicted to possess disordered regions. We next extend these results to 14 VPgs representative of the viral diversity. Disordered regions were predicted in all VPg sequences, whatever the genus and the family. Conclusion: Based on these results, we propose that intrinsic disorder is a common feature of VPgs. The functional role of intrinsic disorder is discussed in light of the biological roles of VPgs.

  3. Compatibility of pedigree-based and marker-based relationships for single-step genomic prediction

    DEFF Research Database (Denmark)

    Christensen, Ole Fredslund

    2012-01-01

    Single-step methods for genomic prediction have recently become popular because they are conceptually simple and in practice such a method can completely replace a pedigree-based method for routine genetic evaluation. An issue with single-step methods is compatibility between the marker-based relationship matrix and the pedigree-based relationship matrix. The compatibility issue involves which allele frequencies to use in the marker-based relationship matrix, and also that adjustments of this matrix to the pedigree-based relationship matrix are needed. In addition, it has been overlooked... in the base population. Here, two ideas are explored. The first idea is to instead adjust the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix, whereas the second idea is to include the likelihood for the observed markers. A single-step method is used where...

  4. Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties.

    Science.gov (United States)

    Yue, Zhenyu; Zhang, Wenna; Lu, Yongming; Yang, Qiaoyue; Ding, Qiuying; Xia, Junfeng; Chen, Yan

    2015-01-01

    Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures, which can be used as templates for the construction of novel drugs with enhanced antitumor activity. Traditional research approaches studied the structure-activity relationship of natural products and obtained key structural properties, such as chemical bond or group, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural product responses against a panel of cancer cell lines based on both the gene expression and the chemical properties of natural products. The results on two datasets, a training set and an independent test set, show that the proposed method yields significantly better prediction accuracy. In addition, we demonstrate the predictive power of the proposed method by modeling cancer cell sensitivity to two natural products, Curcumin and Resveratrol; the results indicate that the method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.

  5. Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties

    Directory of Open Access Journals (Sweden)

    Zhenyu Yue

    2015-11-01

    Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures, which can be used as templates for the construction of novel drugs with enhanced antitumor activity. Traditional research approaches studied the structure-activity relationship of natural products and obtained key structural properties, such as chemical bond or group, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural product responses against a panel of cancer cell lines based on both the gene expression and the chemical properties of natural products. The results on two datasets, a training set and an independent test set, show that the proposed method yields significantly better prediction accuracy. In addition, we demonstrate the predictive power of the proposed method by modeling cancer cell sensitivity to two natural products, Curcumin and Resveratrol; the results indicate that the method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.

  6. Integrated genome-scale prediction of detrimental mutations in transcription networks.

    Directory of Open Access Journals (Sweden)

    Mirko Francesconi

    2011-05-01

    A central challenge in genetics is to understand when and why mutations alter the phenotype of an organism. The consequences of gene inhibition have been systematically studied and can be predicted reasonably well across a genome. However, many sequence variants important for disease and evolution may alter gene regulation rather than gene function. The consequences of altering a regulatory interaction (or "edge") rather than a gene (or "node") in a network have not been as extensively studied. Here we use an integrative analysis and evolutionary conservation to identify features that predict when the loss of a regulatory interaction is detrimental in the extensively mapped transcription network of budding yeast. Properties such as the strength of an interaction, location and context in a promoter, regulator and target gene importance, and the potential for compensation (redundancy) associate to some extent with interaction importance. Combined, however, these features predict quite well whether the loss of a regulatory interaction is detrimental across many promoters and for many different transcription factors. Thus, despite the potential for regulatory diversity, common principles can be used to understand and predict when changes in regulation are most harmful to an organism.

  7. Genomic prediction with parallel computing for slaughter traits in Chinese Simmental beef cattle using high-density genotypes.

    Science.gov (United States)

    Guo, Peng; Zhu, Bo; Xu, Lingyang; Niu, Hong; Wang, Zezhao; Guan, Long; Liang, Yonghu; Ni, Hemin; Guo, Yong; Chen, Yan; Zhang, Lupei; Gao, Xue; Gao, Huijiang; Li, Junya

    2017-01-01

    Genomic selection has been widely used for complex quantitative traits in farm animals. Estimates of breeding values for slaughter traits are highly important to the beef cattle industry, and it is worthwhile to investigate the prediction accuracies of genomic selection for these traits. In this study, we assessed genomic predictive abilities for average daily gain weight (ADG), live weight (LW), carcass weight (CW), dressing percentage (DP), lean meat percentage (LMP) and retail meat weight (RMW) using the Illumina Bovine 770K SNP Beadchip in Chinese Simmental cattle. To evaluate the abilities of prediction, marker effects were estimated using genomic BLUP (GBLUP) and three parallel Bayesian models, including multiple chains parallel BayesA, BayesB and BayesCπ (PBayesA, PBayesB and PBayesCπ). The training and validation sets were divided by random allocation, and the predictive accuracies were evaluated using 5-fold cross validation. We found that the accuracies of genomic predictions ranged from 0.195±0.084 (GBLUP for LMP) to 0.424±0.147 (PBayesB for CW). The average accuracies across traits were 0.327±0.085 (GBLUP), 0.335±0.063 (PBayesA), 0.347±0.093 (PBayesB) and 0.334±0.077 (PBayesCπ), respectively. Notably, parallel Bayesian models were more accurate than GBLUP across the six traits. Our study suggests that genomic selection with multiple chains parallel Bayesian models is feasible for slaughter traits in Chinese Simmental cattle. The estimation of direct genomic breeding values using parallel Bayesian methods can offer important insights into improving prediction accuracy at young ages and may also help to identify superior candidates in breeding programs.

  8. SNP detection and prediction of variability between chicken lines using genome resequencing of DNA pools

    Directory of Open Access Journals (Sweden)

    Carlborg Örjan

    2010-11-01

    Background: Next-generation sequencing technologies are widely used for detection of millions of Single Nucleotide Polymorphisms (SNPs) and also provide a means of assessing their variation. This information is useful for composing subsets of highly informative SNPs for region-specific or genome-wide analysis and to identify mutations regulating phenotypic differences within or between populations. In this study, we investigated the sensitivity of SNP detection and introduced the flanking SNPs value (FSV) as a novel measure for predicting SNP-variability using ~5X genome resequencing with ABI SOLiD and DNA pools from two chicken lines divergently selected for juvenile bodyweight. Results: Genotyping with a 60 K SNP chip revealed polymorphisms within or between two divergently selected chicken lines for 31 363 SNPs, 48% of which were also detected using resequencing of DNA pools. SNP detection using resequencing was more powerful for positions with larger differences in allele frequency between the lines. About 50% of the SNPs with non-reference allele frequencies in the range 0.5-0.6 and 67% of those with frequencies > 0.9 could be detected. On average, ~3.7 SNPs/kb were detected by resequencing, with about 5% lower density on microchromosomes than on macrochromosomes. There was a positive correlation between the observed between-line SNP variation from the 60 K chip analysis and our proposed FSV score computed from the genome resequencing data. The strongest correlations on macrochromosomes and microchromosomes were observed when the FSV was calculated with total flanking regions of 62 kb (correlation 0.55) and 38 kb (correlation 0.45), respectively. Conclusions: Genome resequencing with limited coverage (~5X) using pooled DNA samples and three non-reference reads as a threshold for SNP detection identified 50 - 67% of the 60 K SNPs with a non-reference allele frequency larger than 0.5. The SNP density was around 5% lower on the

  9. Report on three Genomes to Life Workshops: Data Infrastructure, Modeling and Simulation, and Protein Structure Prediction

    Energy Technology Data Exchange (ETDEWEB)

    Geist, GA

    2003-09-16

    On July 22, 23 and 24, 2003, three one-day workshops were held in Gaithersburg, Maryland. Each was attended by about 30 computational biologists, mathematicians, and computer scientists who were experts in the respective workshop areas. The first workshop discussed the data infrastructure needs for the Genomes to Life (GTL) program with the objective to identify gaps in the present GTL data infrastructure and define the GTL data infrastructure required for the success of the proposed GTL facilities. The second workshop discussed the modeling and simulation needs for the next phase of the GTL program and defined how these relate to the experimental data generated by genomics, proteomics, and metabolomics. The third workshop identified emerging technical challenges in computational protein structure prediction for DOE missions and outlined specific goals for the next phase of GTL. The workshops were attended by representatives from both OBER and OASCR. The invited experts at each of the workshops made short presentations on what they perceived as the key needs in the GTL data infrastructure, modeling and simulation, and structure prediction respectively. Each presentation was followed by a lively discussion by all the workshop attendees. The following findings and recommendations were derived from the three workshops. A seamless integration of GTL data spanning the entire range of genomics, proteomics, and metabolomics will be extremely challenging but it has to be treated as the first-class component of the GTL program to assure GTL's chances for success. High-throughput GTL facilities and ultrascale computing will make it possible to address the ultimate goal of modern biology: to achieve a fundamental, comprehensive, and systematic understanding of life. But first the GTL community needs to address the problem of the massive quantities and increased complexity of biological data produced by experiments and computations. Genome-scale collection, analysis

  10. Genome-wide protein localization prediction strategies for gram negative bacteria

    Directory of Open Access Journals (Sweden)

    Romine Margaret F

    2011-06-01

    Background: Genome-wide prediction of protein subcellular localization is an important type of evidence used for inferring protein function. While a variety of computational tools have been developed for this purpose, errors in the gene models and use of protein sorting signals that are not recognized by the more commonly accepted tools can diminish the accuracy of their output. Results: As part of an effort to manually curate the annotations of 19 strains of Shewanella, numerous insights were gained regarding the use of computational tools and proteomics data to predict protein localization. Identification of the suite of secretion systems present in each strain at the start of the process made it possible to tailor the subsequent localization prediction strategies to each strain for improved accuracy. Comparisons of the computational predictions among orthologous proteins revealed inconsistencies in the computational outputs, which could often be resolved by adjusting the gene models or ortholog group memberships. While proteomic data was useful for verifying start site predictions and post-translational proteolytic cleavage, care was needed to distinguish cellular versus sample processing-mediated cleavage events. Searches for lipoprotein signal peptides revealed that neither TatP nor LipoP is designed for identification of lipoprotein substrates of the twin arginine translocation system and that the +2 rule for lipoprotein sorting does not apply to this genus. Analysis of the relationships between domain occurrence and protein localization prediction enabled identification of numerous location-informative domains which could then be used to refine or increase confidence in location predictions. This collective knowledge was used to develop a general strategy for predicting protein localization that could be adapted to other organisms. Conclusion: Improved localization prediction accuracy is not simply a matter of developing better

  11. Research Progress on Salmonella Multi-drug Resistance Genomic Island 1

    Institute of Scientific and Technical Information of China (English)

    王芳

    2011-01-01

    A multi-drug resistance genomic island is a gene cluster on the bacterial chromosome with typical characteristics; it carries multiple drug resistance genes and determines the multi-drug resistance of the bacterium. Multi-drug resistance genomic islands have the characteristics of mobile genetic elements, such as G+C content and codon usage that differ from those of the host. They usually contain mobility genes and can be transferred horizontally between strains of the same or even different species, which accelerates the emergence of clinical multi-drug resistant strains. At present, Salmonella multi-drug resistance genomic island 1 (SGI1) and its variants have been found in Salmonella and in bacteria of other genera. Because the resistance genes on SGI1 are mobile, SGI1 is important for studying the mechanisms by which bacteria acquire and spread multi-drug resistance.
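    One of the hallmarks mentioned above, a G+C content that differs from that of the host chromosome, can be checked window by window. The following is a minimal sketch under stated assumptions: the function names, the non-overlapping window scheme and the toy sequence are hypothetical and are not taken from the review.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_deviation_by_window(genome, size):
    """Map window start -> window G+C content minus genome-wide G+C content."""
    overall = gc_content(genome)
    return {
        start: gc_content(genome[start:start + size]) - overall
        for start in range(0, len(genome) - size + 1, size)
    }


# Toy chromosome: AT-only flanks around a GC-only segment at positions 10-19.
toy_chromosome = "ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT"
deviation = gc_deviation_by_window(toy_chromosome, size=10)
```

    Windows with a large positive or negative deviation are candidates for horizontally acquired regions; codon usage and the presence of mobility genes would be needed as further evidence.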

  12. Predicting relatedness of bacterial genomes using the chaperonin-60 universal target (cpn60 UT): application to Thermoanaerobacter species.

    Science.gov (United States)

    Verbeke, Tobin J; Sparling, Richard; Hill, Janet E; Links, Matthew G; Levin, David; Dumonceaux, Tim J

    2011-05-01

    D.R. Zeigler determined that the sequence identity of bacterial genomes can be predicted accurately using the sequence identities of a corresponding set of genes that meet certain criteria [32]. This three-gene model for comparing bacterial genome pairs requires the determination of the sequence identities for recN, thdF, and rpoA. This involves the generation of approximately 4.2kb of genomic DNA sequence from each organism to be compared, and also normally requires that oligonucleotide primers be designed for amplification and sequencing based on the sequences of closely related organisms. However, we have developed an analogous mathematical model for predicting the sequence identity of whole genomes based on the sequence identity of the 542-567 base pair chaperonin-60 universal target (cpn60 UT). The cpn60 UT is accessible in nearly all bacterial genomes with a single set of universal primers, and its length is such that it can be completely sequenced in one pair of overlapping sequencing reads via di-deoxy sequencing. These mathematical models were applied to a set of Thermoanaerobacter isolates from a wood chip compost pile and it was shown that both the one-gene cpn60 UT-based model and the three-gene model based on recN, rpoA, and thdF predicted that these isolates could be classified as Thermoanaerobacter thermohydrosulfuricus. Furthermore, it was found that the genomic prediction model using cpn60 UT gave similar results to whole-genome sequence alignments over a broad range of taxa, suggesting that this method may have general utility for screening isolates and predicting their taxonomic affiliations.

  13. Genome-wide DNA methylation analysis predicts an epigenetic switch for GATA factor expression in endometriosis.

    Directory of Open Access Journals (Sweden)

    Matthew T Dyson

    2014-03-01

    Full Text Available Endometriosis is a gynecological disease defined by the extrauterine growth of endometrial-like cells that cause chronic pain and infertility. The disease is limited to primates that exhibit spontaneous decidualization, and diseased cells are characterized by significant defects in the steroid-dependent genetic pathways that typify this process. Altered DNA methylation may underlie these defects, but few regions with differential methylation have been implicated in the disease. We mapped genome-wide differences in DNA methylation between healthy human endometrial and endometriotic stromal cells and correlated this with gene expression using an interaction analysis strategy. We identified 42,248 differentially methylated CpGs in endometriosis compared to healthy cells. These extensive differences were not unidirectional, but were focused intragenically and at sites distal to classic CpG islands where methylation status was typically negatively correlated with gene expression. Significant differences in methylation were mapped to 403 genes, which included a disproportionally large number of transcription factors. Furthermore, many of these genes are implicated in the pathology of endometriosis and decidualization. Our results tremendously improve the scope and resolution of differential methylation affecting the HOX gene clusters, nuclear receptor genes, and intriguingly the GATA family of transcription factors. Functional analysis of the GATA family revealed that GATA2 regulates key genes necessary for the hormone-driven differentiation of healthy stromal cells, but is hypermethylated and repressed in endometriotic cells. GATA6, which is hypomethylated and abundant in endometriotic cells, potently blocked hormone sensitivity, repressed GATA2, and induced markers of endometriosis when expressed in healthy endometrial cells. The unique epigenetic fingerprint in endometriosis suggests DNA methylation is an integral component of the disease, and

  14. Dispositional optimism and perceived risk interact to predict intentions to learn genome sequencing results.

    Science.gov (United States)

    Taber, Jennifer M; Klein, William M P; Ferrer, Rebecca A; Lewis, Katie L; Biesecker, Leslie G; Biesecker, Barbara B

    2015-07-01

    Dispositional optimism and risk perceptions are each associated with health-related behaviors and decisions and other outcomes, but little research has examined how these constructs interact, particularly in consequential health contexts. The predictive validity of risk perceptions for health-related information seeking and intentions may be improved by examining dispositional optimism as a moderator, and by testing alternate types of risk perceptions, such as comparative and experiential risk. Participants (n = 496) had their genomes sequenced as part of a National Institutes of Health pilot cohort study (ClinSeq®). Participants completed a cross-sectional baseline survey of various types of risk perceptions and intentions to learn genome sequencing results for differing disease risks (e.g., medically actionable, nonmedically actionable, carrier status) and to use this information to change their lifestyle/health behaviors. Risk perceptions (absolute, comparative, and experiential) were largely unassociated with intentions to learn sequencing results. Dispositional optimism and comparative risk perceptions interacted, however, such that individuals higher in optimism reported greater intentions to learn all 3 types of sequencing results when comparative risk was perceived to be higher than when it was perceived to be lower. This interaction was inconsistent for experiential risk and absent for absolute risk. Independent of perceived risk, participants high in dispositional optimism reported greater interest in learning risks for nonmedically actionable disease and carrier status, and greater intentions to use genome information to change their lifestyle/health behaviors. The relationship between risk perceptions and intentions may depend on how risk perceptions are assessed and on degree of optimism. (c) 2015 APA, all rights reserved.

  15. Comparison of whole genome prediction accuracy across generations using parametric and semi parametric methods

    Directory of Open Access Journals (Sweden)

    Abbas Atefi

    2016-11-01

    Full Text Available Accuracy of genomic prediction was compared using three parametric and semi-parametric methods, namely BayesA, Bayesian LASSO and reproducing kernel Hilbert spaces regression, under various levels of heritability (0.15, 0.3 and 0.45), different numbers of markers (500, 750 and 1000) and generation intervals of the validation set. A historical population of 1000 individuals with equal sex ratio was simulated for 100 generations at constant size. It was followed by 100 extra generations of gradually reducing size, down to 500 individuals in generation 200. Individuals of generation 200 were mated randomly for 10 more generations, applying a litter size of 5 to expand the historical generation. Finally, 50 males and 500 females chosen from generation 210 were randomly mated to generate 10 more generations of recent population. Individuals born in generation 211 were considered the training set, while the validation set was composed of individuals from generation 213, 215 or 217. The genome comprised one chromosome of 100 cM length carrying 50 QTLs. There was no significant difference between the accuracies of the investigated methods (p > 0.05), but among the three methods the highest mean accuracy (0.659) was observed for BayesA. As heritability increased, the average genomic accuracy increased from 0.53 to 0.75 (p < 0.05). Accuracy increased with the number of SNPs, so the highest accuracy was obtained with 1000 SNPs. As the validation generation moved further from the training generation, accuracy decreased, and the most severe decay was observed at low heritability. The decrease in accuracy across generations was affected by marker density but was independent of the investigated methods.
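
    The abstract does not define "accuracy". In simulation studies of this kind it is commonly taken as the Pearson correlation between predicted and true breeding values, and the sketch below assumes that definition. The function names and the per-generation grouping are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def accuracy_by_generation(records):
    """records: iterable of (generation, true_value, predicted_value).

    Returns {generation: correlation}, assuming accuracy is defined as
    the correlation between true and predicted breeding values.
    """
    groups = {}
    for gen, true_bv, pred_bv in records:
        t, p = groups.setdefault(gen, ([], []))
        t.append(true_bv)
        p.append(pred_bv)
    return {gen: pearson(t, p) for gen, (t, p) in groups.items()}
```

    Grouping by validation generation mirrors the comparison across generations 213, 215 and 217 described above.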

  16. Genomic selection prediction accuracy in a perennial crop: case study of oil palm (Elaeis guineensis Jacq.).

    Science.gov (United States)

    Cros, David; Denis, Marie; Sánchez, Leopoldo; Cochard, Benoit; Flori, Albert; Durand-Gasselin, Tristan; Nouy, Bruno; Omoré, Alphonse; Pomiès, Virginie; Riou, Virginie; Suryana, Edyana; Bouvet, Jean-Marc

    2015-03-01

    Genomic selection empirically appeared valuable for reciprocal recurrent selection in oil palm as it could account for family effects and Mendelian sampling terms, despite small populations and low marker density. Genomic selection (GS) can increase the genetic gain in plants. In perennial crops, this is expected mainly through shortened breeding cycles and increased selection intensity, which requires sufficient GS accuracy in selection candidates, despite often small training populations. Our objective was to obtain the first empirical estimate of GS accuracy in oil palm (Elaeis guineensis), the major world oil crop. We used two parental populations involved in conventional reciprocal recurrent selection (Deli and Group B) with 131 individuals each, genotyped with 265 SSR. We estimated within-population GS accuracies when predicting breeding values of non-progeny-tested individuals for eight yield traits. We used three methods to sample training sets and five statistical methods to estimate genomic breeding values. The results showed that GS could account for family effects and Mendelian sampling terms in Group B but only for family effects in Deli. Presumably, this difference between populations originated from their contrasting breeding history. The GS accuracy ranged from -0.41 to 0.94 and was positively correlated with the relationship between training and test sets. Training sets optimized with the so-called CDmean criterion gave the highest accuracies, ranging from 0.49 (pulp to fruit ratio in Group B) to 0.94 (fruit weight in Group B). The statistical methods did not affect the accuracy. Finally, Group B could be preselected for progeny tests by applying GS to key yield traits, therefore increasing the selection intensity. Our results should be valuable for breeding programs with small populations, long breeding cycles, or reduced effective size.

  17. Genome Wide Association Study to predict severe asthma exacerbations in children using random forests classifiers

    Directory of Open Access Journals (Sweden)

    Litonjua Augusto A

    2011-06-01

    Full Text Available Abstract Background Personalized health-care promises tailored health-care solutions to individual patients based on their genetic background and/or environmental exposure history. To date, disease prediction has been based on a few environmental factors and/or single nucleotide polymorphisms (SNPs), while complex diseases are usually affected by many genetic and environmental factors with each factor contributing a small portion to the outcome. We hypothesized that the use of random forests classifiers to select SNPs would result in an improved predictive model of asthma exacerbations. We tested this hypothesis in a population of childhood asthmatics. Methods In this study, using emergency room visits or hospitalizations as the definition of a severe asthma exacerbation, we first identified a list of top Genome Wide Association Study (GWAS) SNPs ranked by Random Forests (RF) importance score for the CAMP (Childhood Asthma Management Program) population of 127 exacerbation cases and 290 non-exacerbation controls. We predict severe asthma exacerbations using the top 10 to 320 SNPs together with age, sex, pre-bronchodilator FEV1 percentage predicted, and treatment group. Results Testing in an independent set of the CAMP population shows that severe asthma exacerbations can be predicted with an Area Under the Curve (AUC) of 0.66 with 160-320 SNPs, in comparison to an AUC score of 0.57 with 10 SNPs. Using the clinical traits alone yielded an AUC score of 0.54, suggesting the phenotype is affected by genetic as well as environmental factors. Conclusions Our study shows that a random forests algorithm can effectively extract and use the information contained in a small number of samples. Random forests, and other machine learning tools, can be used with GWAS studies to integrate large numbers of predictors simultaneously.
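
    The AUC values quoted above can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes that quantity directly from pairwise comparisons. It is a generic illustration with a hypothetical function name, not the authors' evaluation code, and it says nothing about how their random forests were trained.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise case/control comparisons.

    scores: predicted risk per sample; labels: 1 for case, 0 for control.
    Ties between a case and a control count as half a win.
    """
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

    An AUC of 0.5 corresponds to random ranking, which puts the reported 0.54 for clinical traits alone and 0.66 for 160-320 SNPs in context.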

  18. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds.

    Science.gov (United States)

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

    2017-08-10

    A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid in the genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction model (GBLUP) to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model using a single random effect assumes that all genomic variants contribute to the genomic relationship equally, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than milk production, and several biologically meaningful GO terms improved the prediction accuracy with GFBLUP for the four traits, as compared with GBLUP. The improvement of the genomic prediction between breeds (the average increase across the four traits was 0.161) was more apparent than that it was within the HOL (the average increase across the four traits was 0.020). Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of

  19. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    Science.gov (United States)

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  20. 139 Clinically Applicable and Biologically Validated MRI Radiomic Test Method Predicts Glioblastoma Genomic Landscape and Survival.

    Science.gov (United States)

    Zinn, Pascal O; Singh, Sanjay K; Kotrotsou, Aikaterini; Zandi, Faramak; Thomas, Ginu; Hatami, Masumeh; Luedi, Markus M; Elakkad, Ahmed; Hassan, Islam; Gumin, Joy; Sulman, Erik P; Lang, Frederick F; Colen, Rivka R

    2016-08-01

    Imaging is the modality of choice for noninvasive characterization of biological tissue and organ systems; imaging serves as early diagnostic tool for most disease processes and is rapidly evolving, thus transforming the way we diagnose and follow patients over time. A vast number of cancer imaging characteristics have been correlated to underlying genomics; however, none have established causality. Therefore, our objectives were to test if there is a causal relationship between imaging and genomic information; and to develop a clinically relevant radiomic pipeline for glioblastoma molecular characterization. Functional validation was performed using a prototypic in vivo RNA-interference-based orthotopic xenograft mouse model. The automated pipeline collects 4800 MRI-derived texture features per tumor. Using univariate feature selection and boosted tree predictive modeling, a patient-specific genomic probability map was derived and patient survival predicted (The Cancer Genome Atlas/MD Anderson data sets). Data demonstrated a significant xenograft to human association (area under the curve [AUC] 84%, P applicable analytical imaging method termed Radiome Sequencing to allow for automated image analysis, prediction of key genomic events, and survival. This method is scalable and applicable to any type of medical imaging. Further, it allows for human-mouse matched coclinical trials, in-depth end point analysis, and upfront noninvasive high-resolution radiomics-based diagnostic, prognostic, and predictive biomarker development.

  1. In silico prediction and screening of modular crystal structures via a high-throughput genomic approach

    Science.gov (United States)

    Li, Yi; Li, Xu; Liu, Jiancong; Duan, Fangzheng; Yu, Jihong

    2015-09-01

    High-throughput computational methods capable of predicting, evaluating and identifying promising synthetic candidates with desired properties are highly appealing to today's scientists. Despite some successes, in silico design of crystalline materials with complex three-dimensionally extended structures remains challenging. Here we demonstrate the application of a new genomic approach to ABC-6 zeolites, a family of industrially important catalysts whose structures are built from the stacking of modular six-ring layers. The sequences of layer stacking, which we deem the genes of this family, determine the structures and the properties of ABC-6 zeolites. By enumerating these gene-like stacking sequences, we have identified 1,127 most realizable new ABC-6 structures out of 78 groups of 84,292 theoretical ones, and experimentally realized 2 of them. Our genomic approach can extract crucial structural information directly from these gene-like stacking sequences, enabling high-throughput identification of synthetic targets with desired properties among a large number of candidate structures.

  2. DNAskew: Statistical Analysis of Base Compositional Asymmetry and Prediction of Replication Boundaries in the Genome Sequences

    Institute of Scientific and Technical Information of China (English)

    Xiang-Ru MA; Shao-Bo XIAO; Ai-Zhen GUO; Jian-Qiang LUE; Huan-Chun CHEN

    2004-01-01

    Sueoka and Lobry each stated that, in the absence of bias between the two DNA strands for mutation and selection, the base composition within each strand should be A=T and C=G (this state is called Parity Rule type 2, PR2). However, the genome sequences of many bacteria, vertebrates and viruses show asymmetries in base composition and gene direction. To determine the relationship of base composition skews with replication orientation, gene function, codon usage biases and phylogenetic evolution, a program called DNAskew was developed for the statistical analysis of strand asymmetry and codon composition bias in DNA sequences. The program can also be used to predict the replication boundaries of genome sequences. The method builds on the fact that there are compositional asymmetries between the leading and the lagging strand for replication. DNAskew was written in Perl and implemented on the LINUX operating system. It works quickly with annotated or unannotated sequences in GBFF (GenBank flatfile) or fasta format. The source code is freely available for academic use at http://www.epizooty.com/pub/stat/DNAskew.
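
    The idea of using strand compositional asymmetry to locate replication boundaries can be sketched with a cumulative GC-skew curve. This is not the DNAskew implementation (which is written in Perl and is not reproduced here). It assumes the widely used convention that skew is (G - C) / (G + C) per window, and that in many bacterial genomes the cumulative curve reaches its minimum near the origin and its maximum near the terminus. The function names and the window size are hypothetical.

```python
def gc_skew(window):
    """(G - C) / (G + C) for one window; 0.0 if the window has no G or C."""
    g, c = window.count("G"), window.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def cumulative_skew(seq, window=1000):
    """List of (window_end_position, cumulative GC skew) along seq."""
    seq = seq.upper()
    total, curve = 0.0, []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        total += gc_skew(chunk)
        curve.append((start + len(chunk), total))
    return curve


def predict_boundaries(seq, window=1000):
    """Return (origin_estimate, terminus_estimate) as sequence positions.

    Assumes the minimum of the cumulative curve marks the origin and
    the maximum marks the terminus, which holds for many but not all
    bacterial genomes.
    """
    curve = cumulative_skew(seq, window)
    if not curve:
        raise ValueError("empty sequence")
    origin = min(curve, key=lambda point: point[1])[0]
    terminus = max(curve, key=lambda point: point[1])[0]
    return origin, terminus
```

    The resolution of the estimate is limited by the window size, and genomes with weak or inverted skew would need a different criterion.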

  3. Genome-wide Transcription Factor Gene Prediction and their Expressional Tissue-Specificities in Maize

    Institute of Scientific and Technical Information of China (English)

    Yi Jiang; Biao Zeng; Hainan Zhao; Mei Zhang; Shaojun Xie; Jinsheng Lai

    2012-01-01

    Transcription factors (TFs) are important regulators of gene expression. To better understand TF-encoding genes in maize (Zea mays L.), a genome-wide TF prediction was performed using the updated B73 reference genome. A total of 2,298 TF genes were identified, which can be classified into 56 families. The largest family, known as the MYB superfamily, comprises 322 MYB and MYB-related TF genes. The expression patterns of 2,014 (87.64%) TF genes were examined using RNA-seq data, which resulted in the identification of a subset of TFs that are specifically expressed in particular tissues (including root, shoot, leaf, ear, tassel and kernel). Similarly, 98 kernel-specific TF genes were further analyzed, and it was observed that 29 of the kernel-specific genes were preferentially expressed in the early kernel developmental stage, while 69 of the genes were expressed in the late kernel developmental stage. Identification of these TFs, particularly the tissue-specific ones, provides important information for the understanding of development and transcriptional regulation of maize.

  4. Sequence Analysis of Staphylococcus hyicus ATCC 11249T, an Etiological Agent of Exudative Epidermitis in Swine, Reveals a Type VII Secretion System Locus and a Novel 116-Kilobase Genomic Island Harboring Toxin-Encoding Genes.

    Science.gov (United States)

    Calcutt, Michael J; Foecking, Mark F; Hsieh, Hsin-Yeh; Adkins, Pamela R F; Stewart, George C; Middleton, John R

    2015-02-19

    Staphylococcus hyicus is the primary etiological agent of exudative epidermitis in swine. Analysis of the complete genome sequence of the type strain revealed a locus encoding a type VII secretion system and a large chromosomal island harboring the genes encoding exfoliative toxin ExhA and an EDIN toxin homolog.

  5. Sequence-Based Characterization of Tn5801-Like Genomic Islands in Tetracycline-Resistant Staphylococcus pseudintermedius and Other Gram-positive Bacteria from Humans and Animals

    DEFF Research Database (Denmark)

    de Vries, Lisbeth Elvira; Hasman, Henrik; Jurado Rabadán, Sonia

    2016-01-01

    Antibiotic resistance in pathogens is often associated with mobile genetic elements, such as genomic islands (GI) including integrative and conjugative elements (ICEs). These can transfer resistance genes within and between bacteria from humans and/or animals. The aim of this study...... was to investigate whether Tn5801-like GIs carrying the tetracycline resistance gene, tet(M), are common in Staphylococcus pseudintermedius from pets, and to do an overall sequences-based characterization of Tn5801-like GIs detected in Gram-positive bacteria from humans and animals. A total of 27 tetracycline...... types were detected among the porcine E. faecium and human S. aureus isolates (Tn6014 and GI6288). Tn5801-like GIs were detected in GenBank-sequences from Gram-positive bacteria of human, animal or food origin worldwide. Known Tn5801-like GIs were divided into seven types. The results showed that Tn5801...

  6. Life-history traits maintain the genomic integrity of sympatric species of the spruce budworm (Choristoneura fumiferana) group on an isolated forest island.

    Science.gov (United States)

    Lumley, Lisa M; Sperling, Felix Ah

    2011-10-01

    Identification of widespread species collected from islands can be challenging due to the potential for local ecological and phenotypic divergence in isolated populations. We sought to determine how many species of the spruce budworm (Choristoneura fumiferana) complex reside in Cypress Hills, an isolated remnant coniferous forest in western Canada. We integrated data on behavior, ecology, morphology, mitochondrial DNA, and simple sequence repeats, comparing Cypress Hills populations to those from other regions of North America to determine which species they resembled most. We identified C. fumiferana, C. occidentalis, C. lambertiana, and hybrid forms in Cypress Hills. Adult flight phenology and pheromone attraction were identified as key life-history traits involved in maintaining the genomic integrity of species. Our study highlights the importance of extensive sampling of both specimens and a variety of characters for understanding species boundaries in biodiversity research.

  7. Back from a predicted climatic extinction of an island endemic: a future for the Corsican Nuthatch.

    Directory of Open Access Journals (Sweden)

    Morgane Barbet-Massin

    Full Text Available The Corsican Nuthatch (Sitta whiteheadi) is red-listed as vulnerable to extinction by the IUCN because of its endemism, reduced population size, and recent decline. A further cause is the fragmentation and loss of its spatially-restricted favourite habitat, the Corsican pine (Pinus nigra laricio) forest. In this study, we aimed at estimating the potential impact of climate change on the distribution of the Corsican Nuthatch using species distribution models. Because this species has a strong trophic association with the Corsican and Maritime pines (P. nigra laricio and P. pinaster), we first modelled the current and future potential distribution of both pine species in order to use them as habitat variables when modelling the nuthatch distribution. However, the Corsican pine has suffered large distribution losses in the past centuries due to the development of anthropogenic activities, and is now restricted to mountainous woodland. As a consequence, its realized niche is likely significantly smaller than its fundamental niche, so that a projection of the current distribution under future climatic conditions would produce misleading results. To obtain a predicted pine distribution as close as possible to the geographic projection of the fundamental niche, we combined available information on the current pine distribution with information on the persistence of isolated natural pine coppices. While common thresholds (maximizing the sum of sensitivity and specificity) predicted a potential large loss of the Corsican Nuthatch distribution by 2100, the use of more appropriate thresholds aiming at getting closer to the fundamental distribution of the Corsican pine predicted that 98% of the current presence points should remain potentially suitable for the nuthatch and its range could be 10% larger in the future. 
The habitat of the endemic Corsican Nuthatch is therefore more likely threatened by an increasing frequency and intensity of wildfires or anthropogenic

  8. The dog and cat population on Maio Island, Cape Verde: characterisation and prediction based on household survey and remotely sensed imagery

    DEFF Research Database (Denmark)

    Lopes Antunes, Ana Carolina; Ducheyne, Els; Bryssinckx, Ward

    2015-01-01

    The objective was to estimate and characterise the dog and cat population on Maio Island, Cape Verde. Remotely sensed imagery was used to document the number of houses across the island and a household survey was carried out in six administrative areas recording the location of each animal using...... a global positioning system instrument. Linear statistical models were applied to predict the dog and cat populations based on the number of houses found and according to various levels of data aggregation. In the surveyed localities, a total of 457 dogs and 306 cats were found. The majority of animals had...... owners and only a few had free access to outdoor activities. The estimated population size was 531 dogs [95% confidence interval (CI): 453-609] and 354 cats (95% CI: 275-431). Stray animals were not a concern on the island in contrast to the rest of the country...

  9. The master regulator of IncA/C plasmids is recognized by the Salmonella Genomic island SGI1 as a signal for excision and conjugal transfer.

    Science.gov (United States)

    Kiss, János; Papp, Péter Pál; Szabó, Mónika; Farkas, Tibor; Murányi, Gábor; Szakállas, Erik; Olasz, Ferenc

    2015-10-15

    The genomic island SGI1 and its variants, the important vehicles of multi-resistance in Salmonella strains, are integrative elements mobilized exclusively by the conjugative IncA/C plasmids. Integration and excision of the island are carried out by the SGI1-encoded site-specific recombinase Int and the recombination directionality factor Xis. Chromosomal integration ensures the stable maintenance and vertical transmission of SGI1, while excision is the initial step of horizontal transfer, followed by conjugation and integration into the recipient. We report here that SGI1 not only exploits the conjugal apparatus of the IncA/C plasmids but also utilizes the regulatory mechanisms of the conjugation system for the exact timing and activation of excision to ensure efficient horizontal transfer. This study demonstrates that the FlhDC-family activator AcaCD, which regulates the conjugation machinery of the IncA/C plasmids, serves as a signal of helper entry through binding to SGI1 xis promoter and activating SGI1 excision. Promoters of int and xis genes have been identified and the binding site of the activator has been located by footprinting and deletion analyses. We prove that expression of xis is activator-dependent while int is constitutively expressed, and this regulatory mechanism is presumably responsible for the efficient transfer and stable maintenance of SGI1. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. A genomic island integrated into recA of Vibrio cholerae contains a divergent recA and provides multi-pathway protection from DNA damage.

    Science.gov (United States)

    Rapa, Rita A; Islam, Atiqul; Monahan, Leigh G; Mutreja, Ankur; Thomson, Nicholas; Charles, Ian G; Stokes, Harold W; Labbate, Maurizio

    2015-04-01

    Lateral gene transfer (LGT) has been crucial in the evolution of the cholera pathogen, Vibrio cholerae. The two major virulence factors are present on two different mobile genetic elements, a bacteriophage containing the cholera toxin genes and a genomic island (GI) containing the intestinal adhesin genes. Non-toxigenic V. cholerae in the aquatic environment are a major source of novel DNA that allows the pathogen to morph via LGT. In this study, we report a novel GI from a non-toxigenic V. cholerae strain containing multiple genes involved in DNA repair including the recombination repair gene recA that is 23% divergent from the indigenous recA and genes involved in the translesion synthesis pathway. This is the first report of a GI containing the critical gene recA and the first report of a GI that targets insertion into a specific site within recA. We show that possession of the island in Escherichia coli is protective against DNA damage induced by UV-irradiation and DNA targeting antibiotics. This study highlights the importance of genetic elements such as GIs in the evolution of V. cholerae and emphasizes the importance of environmental strains as a source of novel DNA that can influence the pathogenicity of toxigenic strains.

  11. CpG Island Methylation in Human Lymphocytes Is Highly Correlated with DNA Sequence, Repeats, and Predicted DNA Structure

    OpenAIRE

    Christoph Bock; Martina Paulsen; Sascha Tierling; Thomas Mikeska; Thomas Lengauer; Jörn Walter

    2006-01-01

    CpG island methylation plays an important role in epigenetic gene control during mammalian development and is frequently altered in disease situations such as cancer. The majority of CpG islands is normally unmethylated, but a sizeable fraction is prone to become methylated in various cell types and pathological situations. The goal of this study is to show that a computational epigenetics approach can discriminate between CpG islands that are prone to methylation from...

  12. Genome-wide association and genomic prediction of breeding values for fatty acid composition in subcutaneous adipose and longissimus lumborum muscle of beef cattle.

    Science.gov (United States)

    Chen, Liuhong; Ekine-Dzivenu, Chinyere; Vinsky, Michael; Basarab, John; Aalhus, Jennifer; Dugan, Mike E R; Fitzsimmons, Carolyn; Stothard, Paul; Li, Changxi

    2015-11-21

    Identification of genetic variants that are associated with fatty acid composition in beef will enhance our understanding of host genetic influence on the trait and also allow for more effective improvement of beef fatty acid profiles through genomic selection and marker-assisted diet management. In this study, 81 and 83 fatty acid traits were measured in subcutaneous adipose (SQ) and longissimus lumborum muscle (LL), respectively, from 1366 purebred and crossbred beef steers and heifers that were genotyped on the Illumina BovineSNP50 Beadchip. The objective was to conduct genome-wide association studies (GWAS) for the fatty acid traits and to evaluate the accuracy of genomic prediction for fatty acid composition using genomic best linear unbiased prediction (GBLUP) and Bayesian methods. In total, 302 and 360 significant SNPs spanning all autosomal chromosomes were identified to be associated with fatty acid composition in SQ and LL tissues, respectively. Proportions of total genetic variance explained by individual significant SNPs ranged from 0.03 to 11.06% in SQ, and from 0.005 to 24.28% in the LL muscle. Markers with relatively large effects were located near fatty acid synthase (FASN), stearoyl-CoA desaturase (SCD), and thyroid hormone responsive (THRSP) genes. For the majority of the fatty acid traits studied, the accuracy of genomic prediction was relatively low, whereas relatively high accuracies (≥ 0.50) were achieved for 10:0, 12:0, 14:0, 15:0, 16:0, 9c-14:1, 12c-16:1, 13c-18:1, and health index (HI) in LL, and for 12:0, 14:0, 15:0, 10t,12c-18:2, and 11t,13c + 11c,13t-18:2 in SQ. The Bayesian method performed similarly to GBLUP for most of the traits but substantially better for traits that were affected by SNPs of large effects as identified by GWAS. Fatty acid composition in beef is influenced by a few host genes with major effects and many genes of smaller effects. With the current training population size and marker density, genomic prediction has the potential to predict

  13. Predictive ability of genomic selection models for breeding value estimation on growth traits of Pacific white shrimp Litopenaeus vannamei

    Science.gov (United States)

    Wang, Quanchao; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai

    2017-09-01

    Genomic selection (GS) can be used to accelerate genetic improvement by shortening the selection interval. The successful application of GS depends largely on the accuracy of the prediction of genomic estimated breeding value (GEBV). This study is a first attempt to understand the practicality of GS in Litopenaeus vannamei and aims to evaluate models for GS on growth traits. The performance of GS models in L. vannamei was evaluated in a population consisting of 205 individuals, which were genotyped for 6 359 single nucleotide polymorphism (SNP) markers by specific length amplified fragment sequencing (SLAF-seq) and phenotyped for body length and body weight. Three GS models (RR-BLUP, BayesA, and Bayesian LASSO) were used to obtain the GEBV, and their predictive ability was assessed by the reliability of the GEBV and the bias of the predicted phenotypes. The mean reliability of the GEBVs for body length and body weight predicted by the different models was 0.296 and 0.411, respectively. For each trait, the performances of the three models were very similar to each other with respect to predictability. The regression coefficients estimated by the three models were close to one, suggesting near to zero bias for the predictions. Therefore, when GS was applied in a L. vannamei population for the studied scenarios, all three models appeared practicable. Further analyses suggested that improved estimation of the genomic prediction could be realized by increasing the size of the training population as well as the density of SNPs.
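
    The bias check mentioned in this abstract (a regression coefficient close to one suggesting near-zero bias) can be illustrated with a minimal sketch. It assumes the coefficient is the slope of the regression of observed phenotypes on predicted values, which is a common convention but is not spelled out in the abstract. The function names and the toy numbers are hypothetical and are not data from the study.

```python
# Minimal sketch: slope of observed phenotypes regressed on predicted values.
# A slope near 1 suggests predictions are on the right scale (little bias);
# a slope well below 1 suggests the predictions are inflated.
# All names and numbers here are illustrative assumptions.

def mean(values):
    return sum(values) / len(values)

def regression_slope(observed, predicted):
    """Least-squares slope of observed on predicted: cov(o, p) / var(p)."""
    mean_obs, mean_pred = mean(observed), mean(predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / var

observed = [1.1, 1.9, 3.2, 3.8]      # toy phenotypes
well_scaled = [1.0, 2.0, 3.0, 4.0]   # toy predictions on the right scale
inflated = [2.0, 4.0, 6.0, 8.0]      # same ranking, doubled spread

print(regression_slope(observed, well_scaled))
print(regression_slope(observed, inflated))
```

    Both toy prediction sets rank the individuals identically, so a correlation-based measure alone would not separate them; the slope is what exposes the difference in scale.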

  14. Predictive ability of genomic selection models for breeding value estimation on growth traits of Pacific white shrimp Litopenaeus vannamei

    Science.gov (United States)

    Wang, Quanchao; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai

    2016-10-01

    Genomic selection (GS) can be used to accelerate genetic improvement by shortening the selection interval. The successful application of GS depends largely on the accuracy of the prediction of genomic estimated breeding value (GEBV). This study is a first attempt to understand the practicality of GS in Litopenaeus vannamei and aims to evaluate models for GS on growth traits. The performance of GS models in L. vannamei was evaluated in a population consisting of 205 individuals, which were genotyped for 6 359 single nucleotide polymorphism (SNP) markers by specific length amplified fragment sequencing (SLAF-seq) and phenotyped for body length and body weight. Three GS models (RR-BLUP, BayesA, and Bayesian LASSO) were used to obtain the GEBV, and their predictive ability was assessed by the reliability of the GEBV and the bias of the predicted phenotypes. The mean reliability of the GEBVs for body length and body weight predicted by the different models was 0.296 and 0.411, respectively. For each trait, the performances of the three models were very similar to each other with respect to predictability. The regression coefficients estimated by the three models were close to one, suggesting near to zero bias for the predictions. Therefore, when GS was applied in a L. vannamei population for the studied scenarios, all three models appeared practicable. Further analyses suggested that improved estimation of the genomic prediction could be realized by increasing the size of the training population as well as the density of SNPs.

  15. Integration of Multiple Genomic Data Sources in a Bayesian Cox Model for Variable Selection and Prediction.

    Science.gov (United States)

    Treppmann, Tabea; Ickstadt, Katja; Zucknick, Manuela

    2017-01-01

    Bayesian variable selection becomes more and more important in statistical analyses, in particular when performing variable selection in high dimensions. For survival time models and in the presence of genomic data, the state of the art is still quite unexploited. One of the more recent approaches suggests a Bayesian semiparametric proportional hazards model for right censored time-to-event data. We extend this model to directly include variable selection, based on a stochastic search procedure within a Markov chain Monte Carlo sampler for inference. This equips us with an intuitive and flexible approach and provides a way for integrating additional data sources and further extensions. We make use of the possibility of implementing parallel tempering to help improve the mixing of the Markov chains. In our examples, we use this Bayesian approach to integrate copy number variation data into a gene-expression-based survival prediction model. This is achieved by formulating an informed prior based on copy number variation. We perform a simulation study to investigate the model's behavior and prediction performance in different situations before applying it to a dataset of glioblastoma patients and evaluating the biological relevance of the findings.

  16. Demethylation by 5-aza-2'-deoxycytidine in colorectal cancer cells targets genomic DNA whilst promoter CpG island methylation persists

    Directory of Open Access Journals (Sweden)

    Kim Kyu-Tae

    2010-07-01

    Background: DNA methylation and histone acetylation are epigenetic modifications that act as regulators of gene expression. Aberrant epigenetic gene silencing in tumours is a frequent event, yet the factors which dictate which genes are targeted for inactivation are unknown. DNA methylation and histone acetylation can be modified with the chemical agents 5-aza-2'-deoxycytidine (5-aza-dC) and Trichostatin A (TSA), respectively. The aim of this study was to analyse de-methylation and re-methylation and its effect on gene expression in colorectal cancer cell lines treated with 5-aza-dC alone and in combination with TSA. We also sought to identify methylation patterns associated with long term reactivation of previously silenced genes. Method: Colorectal cancer cell lines were treated with 5-aza-dC, with and without TSA, to analyse global methylation decreases by High Performance Liquid Chromatography (HPLC). Re-methylation was observed with removal of drug treatments. Expression arrays identified silenced genes with differing patterns of expression after treatment, such as short term reactivation or long term reactivation. Sodium bisulfite sequencing was performed on the CpG island associated with these genes and expression was verified with real time PCR. Results: Treatment with 5-aza-dC was found to affect genomic methylation and to a lesser extent gene specific methylation. Reactivated genes which remained expressed 10 days post 5-aza-dC treatment featured hypomethylated CpG sites adjacent to the transcription start site (TSS). In contrast, genes with uniformly hypermethylated CpG islands were only temporarily reactivated. Conclusion: These results imply that 5-aza-dC induces strong de-methylation of the genome and initiates reactivation of transcriptionally inactive genes, but this does not require gene associated CpG island de-methylation to occur. In addition, for three of our selected genes, hypomethylation at the TSS of an epigenetically

  17. Genomic Prostate Cancer Classifier Predicts Biochemical Failure and Metastases in Patients After Postoperative Radiation Therapy

    Energy Technology Data Exchange (ETDEWEB)

    Den, Robert B., E-mail: Robert.Den@jeffersonhospital.org [Kimmel Cancer Center, Jefferson Medical College of Thomas Jefferson University, Philadelphia, Pennsylvania (United States)]; Feng, Felix Y. [University of Michigan, Michigan Union, Michigan (United States)]; Showalter, Timothy N. [University of Virginia School of Medicine, Charlottesville, Virginia (United States)]; Mishra, Mark V. [University of Maryland Medical Center, Baltimore, Maryland (United States)]; Trabulsi, Edouard J.; Lallas, Costas D.; Gomella, Leonard G.; Kelly, W. Kevin; Birbe, Ruth C.; McCue, Peter A. [Kimmel Cancer Center, Jefferson Medical College of Thomas Jefferson University, Philadelphia, Pennsylvania (United States)]; Ghadessi, Mercedeh; Yousefi, Kasra; Davicioni, Elai [GenomeDx Biosciences Inc., Vancouver, British Columbia (Canada)]; Knudsen, Karen E.; Dicker, Adam P. [Kimmel Cancer Center, Jefferson Medical College of Thomas Jefferson University, Philadelphia, Pennsylvania (United States)]

    2014-08-01

    Purpose: To test the hypothesis that a genomic classifier (GC) would predict biochemical failure (BF) and distant metastasis (DM) in men receiving radiation therapy (RT) after radical prostatectomy (RP). Methods and Materials: Among patients who underwent post-RP RT, 139 were identified for pT3 or positive margin, who did not receive neoadjuvant hormones and had paraffin-embedded specimens. Ribonucleic acid was extracted from the highest Gleason grade focus and applied to a high-density-oligonucleotide microarray. Receiver operating characteristic, calibration, cumulative incidence, and Cox regression analyses were performed to assess GC performance for predicting BF and DM after post-RP RT in comparison with clinical nomograms. Results: The area under the receiver operating characteristic curve of the Stephenson model was 0.70 for both BF and DM, with addition of GC significantly improving area under the receiver operating characteristic curve to 0.78 and 0.80, respectively. Stratified by GC risk groups, 8-year cumulative incidence was 21%, 48%, and 81% for BF (P<.0001) and for DM was 0, 12%, and 17% (P=.032) for low, intermediate, and high GC, respectively. In multivariable analysis, patients with high GC had a hazard ratio of 8.1 and 14.3 for BF and DM. In patients with intermediate or high GC, those irradiated with undetectable prostate-specific antigen (PSA ≤0.2 ng/mL) had median BF survival of >8 years, compared with <4 years for patients with detectable PSA (>0.2 ng/mL) before initiation of RT. At 8 years, the DM cumulative incidence for patients with high GC and RT with undetectable PSA was 3%, compared with 23% with detectable PSA (P=.03). No outcome differences were observed for low GC between the treatment groups. Conclusion: The GC predicted BF and metastasis after post-RP irradiation. Patients with lower GC risk may benefit from delayed RT, as opposed to those with higher GC; however, this needs prospective validation. Genomic-based models

  18. Neutral Theory Predicts the Relative Abundance and Diversity of Genetic Elements in a Broad Array of Eukaryotic Genomes

    Science.gov (United States)

    Serra, François; Becher, Verónica; Dopazo, Hernán

    2013-01-01

    It is universally true in ecological communities, terrestrial or aquatic, temperate or tropical, that some species are very abundant, others are moderately common, and the majority are rare. Likewise, eukaryotic genomes also contain classes or “species” of genetic elements that vary greatly in abundance: DNA transposons, retrotransposons, satellite sequences, simple repeats and their less abundant functional sequences such as RNA or genes. Are the patterns of relative species abundance and diversity similar among ecological communities and genomes? Previous dynamical models of genomic diversity have focused on the selective forces shaping the abundance and diversity of transposable elements (TEs). However, ideally, models of genome dynamics should consider not only TEs, but also the diversity of all genetic classes or “species” populating eukaryotic genomes. Here, in an analysis of the diversity and abundance of genetic elements in >500 eukaryotic chromosomes, we show that the patterns are consistent with a neutral hypothesis of genome assembly in virtually all chromosomes tested. The distributions of relative abundance of genetic elements are quite precisely predicted by the dynamics of an ecological model for which the principle of functional equivalence is the main assumption. We hypothesize that at large temporal scales an overarching neutral or nearly neutral process governs the evolution of abundance and diversity of genetic elements in eukaryotic genomes. PMID:23798991

  19. Delineation and Prediction Uncertainty of Areas Contributing Recharge to Selected Well Fields in Wetland and Coastal Settings, Southern Rhode Island

    Science.gov (United States)

    Friesz, Paul J.

    2010-01-01

    Areas contributing recharge to four well fields in two study sites in southern Rhode Island were delineated on the basis of steady-state groundwater-flow models representing average hydrologic conditions. The wells are screened in sand and gravel deposits in wetland and coastal settings. The groundwater-flow models were calibrated by inverse modeling using nonlinear regression. Summary statistics from nonlinear regression were used to evaluate the uncertainty associated with the predicted areas contributing recharge to the well fields. In South Kingstown, two United Water Rhode Island well fields are in Mink Brook watershed and near Worden Pond and extensive wetlands. Wetland deposits of peat near the well fields generally range in thickness from 5 to 8 feet. Analysis of water-level drawdowns in a piezometer screened beneath the peat during a 20-day pumping period indicated vertical leakage and a vertical hydraulic conductivity for the peat of roughly 0.01 ft/d. The simulated area contributing recharge for average withdrawals of 2,138 gallons per minute during 2003-07 extended to groundwater divides in mostly till and morainal deposits, and it encompassed 2.30 square miles. Most of a sand and gravel mining operation between the well fields was in the simulated contributing area. For the maximum pumping capacity (5,100 gallons per minute), the simulated area contributing recharge expanded to 5.54 square miles. The well fields intercepted most of the precipitation recharge in Mink Brook watershed and in an adjacent small watershed, and simulated streams ceased to flow. The simulated contributing area to the well fields included an area beneath Worden Pond and a remote, isolated area in upland till on the opposite side of Worden Pond from the well fields. About 12 percent of the pumped water was derived from Worden Pond. In Charlestown, the Central Beach Fire District and the East Beach Water Association well fields are on a small (0.85 square mile) peninsula in a

  20. Quantitative genetics theory for genomic selection and efficiency of breeding value prediction in open-pollinated populations

    Directory of Open Access Journals (Sweden)

    José Marcelo Soriano Viana

    2016-06-01

    To date, the quantitative genetics theory for genomic selection has focused mainly on the relationship between marker and additive variances assuming one marker and one quantitative trait locus (QTL). This study extends the quantitative genetics theory to genomic selection in order to prove that prediction of breeding values based on thousands of single nucleotide polymorphisms (SNPs) depends on linkage disequilibrium (LD) between markers and QTLs, assuming dominance. We also assessed the efficiency of genomic selection in relation to phenotypic selection, assuming mass selection in an open-pollinated population, all QTLs of lower effect, and reduced sample size, based on simulated data. We show that the average effect of a SNP substitution is proportional to the LD measure and to the average effect of a gene substitution for each QTL that is in LD with the marker. Weighted (by SNP frequencies) and unweighted breeding value predictors have the same accuracy. Efficiency of genomic selection in relation to phenotypic selection is inversely proportional to heritability. Accuracy of breeding value prediction is not affected by the dominance degree or the method of analysis; however, it is influenced by LD extent and magnitude of additive variance. Increasing the number of markers asymptotically improved the accuracy of breeding value prediction. Decreasing the sample size from 500 to 200 did not considerably reduce the accuracy of breeding value prediction.

  1. Genome-scale prediction of proteins with long intrinsically disordered regions.

    Science.gov (United States)

    Peng, Zhenling; Mizianty, Marcin J; Kurgan, Lukasz

    2014-01-01

    Proteins with long disordered regions (LDRs), defined as having 30 or more consecutive disordered residues, are abundant in eukaryotes, and these regions are recognized as a distinct class of biologically functional domains. LDRs facilitate various cellular functions and are important for target selection in structural genomics. Motivated by the lack of methods that directly predict proteins with LDRs, we designed Super-fast predictor of proteins with Long Intrinsically DisordERed regions (SLIDER). SLIDER utilizes logistic regression that takes an empirically chosen set of numerical features, which consider selected physicochemical properties of amino acids, sequence complexity, and amino acid composition, as its inputs. Empirical tests show that SLIDER offers competitive predictive performance combined with low computational cost. It outperforms, by at least a modest margin, a comprehensive set of modern disorder predictors (that can indirectly predict LDRs) and is 16 times faster than the best currently available disorder predictor. Utilizing our time-efficient predictor, we characterized abundance and functional roles of proteins with LDRs over 110 eukaryotic proteomes. Similar to related studies, we found that eukaryotes have many (on average 30.3%) proteins with LDRs, with the majority of proteomes having between 25 and 40%, where higher abundance is characteristic of proteomes that have larger proteins. Our first-of-its-kind large-scale functional analysis shows that these proteins are enriched in a number of cellular functions and processes including certain binding events, regulation of catalytic activities, cellular component organization, biogenesis, biological regulation, and some metabolic and developmental processes. A webserver that implements SLIDER is available at http://biomine.ece.ualberta.ca/SLIDER/. Copyright © 2013 Wiley Periodicals, Inc.
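
    The LDR definition used in this abstract (30 or more consecutive disordered residues) can be made concrete with a short sketch. It assumes a per-residue disorder annotation is already available as a sequence of booleans. This is not how SLIDER itself works (the abstract describes a logistic regression over sequence-derived features that predicts LDR proteins directly), and the function names are hypothetical.

```python
# Minimal sketch of the LDR definition only: a protein has a long disordered
# region if it contains a run of at least `min_len` consecutive residues
# flagged as disordered. The per-residue flags are assumed to be given.

def longest_disordered_run(disorder_flags):
    """Length of the longest run of consecutive True values."""
    longest = current = 0
    for is_disordered in disorder_flags:
        current = current + 1 if is_disordered else 0
        longest = max(longest, current)
    return longest

def has_long_disordered_region(disorder_flags, min_len=30):
    """True if any disordered run reaches the 30-residue threshold."""
    return longest_disordered_run(disorder_flags) >= min_len

# Toy annotations: ordered flanks around a disordered stretch.
just_short = [False] * 10 + [True] * 29 + [False] * 10
just_long = [False] * 10 + [True] * 30 + [False] * 10

print(has_long_disordered_region(just_short))
print(has_long_disordered_region(just_long))
```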

  2. Accuracy of whole-genome prediction using a genetic architecture-enhanced variance-covariance matrix.

    Science.gov (United States)

    Zhang, Zhe; Erbe, Malena; He, Jinlong; Ober, Ulrike; Gao, Ning; Zhang, Hao; Simianer, Henner; Li, Jiaqi

    2015-02-09

    Obtaining accurate predictions of unobserved genetic or phenotypic values for complex traits in animal, plant, and human populations is possible through whole-genome prediction (WGP), a combined analysis of genotypic and phenotypic data. Because the underlying genetic architecture of the trait of interest is an important factor affecting model selection, we propose a new strategy, termed BLUP|GA (BLUP-given genetic architecture), which can use genetic architecture information within the dataset at hand rather than from public sources. This is achieved by using a trait-specific covariance matrix (T), which is a weighted sum of a genetic architecture part (S matrix) and the realized relationship matrix (G). The algorithm of BLUP|GA (BLUP-given genetic architecture) is provided and illustrated with real and simulated datasets. Predictive ability of BLUP|GA was validated with three model traits in a dairy cattle dataset and 11 traits in three public datasets with a variety of genetic architectures and compared with GBLUP and other approaches. Results show that BLUP|GA outperformed GBLUP in 20 of 21 scenarios in the dairy cattle dataset and outperformed GBLUP, BayesA, and BayesB in 12 of 13 traits in the analyzed public datasets. Further analyses showed that the difference of accuracies for BLUP|GA and GBLUP significantly correlate with the distance between the T and G matrices. The new strategy applied in BLUP|GA is a favorable and flexible alternative to the standard GBLUP model, allowing to account for the genetic architecture of the quantitative trait under consideration when necessary. This feature is mainly due to the increased similarity between the trait-specific relationship matrix (T matrix) and the genetic relationship matrix at unobserved causal loci. Applying BLUP|GA in WGP would ease the burden of model selection. Copyright © 2015 Zhang et al.
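
    The "weighted sum" described in this abstract can be written out as follows. The abstract names only the matrices T, S and G; the weight symbol and its restriction to the unit interval are assumptions made here for illustration.

```latex
% Trait-specific covariance matrix as a weighted sum of the genetic
% architecture part S and the realized relationship matrix G.
% The weight \omega is an assumed symbol, not taken from the abstract.
\[
  \mathbf{T} \;=\; \omega\,\mathbf{S} \;+\; (1-\omega)\,\mathbf{G},
  \qquad 0 \le \omega \le 1 .
\]
```

    Under this reading, setting the weight to zero gives T = G, so the distance between T and G grows with the weight placed on the genetic architecture part.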

  3. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    Science.gov (United States)

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  4. CpG island hypermethylation of the DNA repair enzyme methyltransferase predicts response to temozolomide in primary gliomas.

    Science.gov (United States)

    Paz, Maria F; Yaya-Tur, Ricard; Rojas-Marcos, Iñigo; Reynes, Gaspar; Pollan, Marina; Aguirre-Cruz, Lucinda; García-Lopez, Jose Luis; Piquer, Jose; Safont, María-Jose; Balaña, Carmen; Sanchez-Cespedes, Montserrat; García-Villanueva, Mercedes; Arribas, Leoncio; Esteller, Manel

    2004-08-01

    The DNA repair enzyme O(6)-methylguanine DNA methyltransferase (MGMT) inhibits the killing of tumor cells by alkylating agents, and its loss in cancer cells is associated with hypermethylation of the MGMT CpG island. Thus, methylation of MGMT has been correlated with the clinical response to 1,3-bis(2-chloroethyl)-1-nitrosourea (BCNU) in primary gliomas. Here, we investigate whether the presence of MGMT methylation in gliomas is also a good predictor of response to another emergent alkylating agent, temozolomide. Using a methylation-specific PCR approach, we assessed the methylation status of the CpG island of MGMT in 92 glioma patients who received temozolomide as first-line chemotherapy or as treatment for relapses. Methylation of the MGMT promoter positively correlated with the clinical response in the glioma patients receiving temozolomide as first-line chemotherapy (n = 40). Eight of 12 patients with MGMT-methylated tumors (66.7%) had a partial or complete response, compared with 7 of 28 patients with unmethylated tumors (25.0%; P = 0.030). We also found a positive association between MGMT methylation and clinical response in those patients receiving BCNU (n = 35, P = 0.041) or procarbazine/1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea (n = 17, P = 0.043) as first-line chemotherapy. Overall, if we analyze the clinical response of all of the first-line chemotherapy treatments with temozolomide, BCNU, and procarbazine/1-(2-chloroethyl)-3-cyclohexyl-1-nitrosourea as a group in relation to the MGMT methylation status, MGMT hypermethylation was strongly associated with the presence of partial or complete clinical response (P < 0.001). Finally, the MGMT methylation status determined in the initial glioma tumor did not correlate with the clinical response to temozolomide when this drug was administered as treatment for relapses (P = 0.729). MGMT methylation predicts the clinical response of primary gliomas to first-line chemotherapy with the alkylating agent
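
    The first-line temozolomide comparison in this abstract (8 of 12 responders with MGMT-methylated tumors versus 7 of 28 with unmethylated tumors) can be worked through as a 2x2 table. The sketch below uses a two-sided Fisher exact test; the abstract does not say which test the authors used, so this choice is an assumption, and the function name is hypothetical.

```python
# Worked example on the counts reported in the abstract. The test choice
# (two-sided Fisher exact test) is an assumption; the abstract only reports
# the response rates and a P value.
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Rows: methylated, unmethylated. Columns: responders, non-responders.
methylated_rate = 100 * 8 / 12      # response rate in percent
unmethylated_rate = 100 * 7 / 28
p_value = fisher_exact_two_sided(8, 4, 7, 21)

print(round(methylated_rate, 1), round(unmethylated_rate, 1))
print(p_value)
```

    The two rates reproduce the 66.7% and 25.0% given in the abstract, and the P value from this test comes out close to the reported 0.030.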

  5. Biofilm Formation Mechanisms of Pseudomonas aeruginosa Predicted via Genome-Scale Kinetic Models of Bacterial Metabolism.

    Science.gov (United States)

    Vital-Lopez, Francisco G; Reifman, Jaques; Wallqvist, Anders

    2015-10-01

    A hallmark of Pseudomonas aeruginosa is its ability to establish biofilm-based infections that are difficult to eradicate. Biofilms are less susceptible to host inflammatory and immune responses and have higher antibiotic tolerance than free-living planktonic cells. Developing treatments against biofilms requires an understanding of bacterial biofilm-specific physiological traits. Research efforts have started to elucidate the intricate mechanisms underlying biofilm development. However, many aspects of these mechanisms are still poorly understood. Here, we addressed questions regarding biofilm metabolism using a genome-scale kinetic model of the P. aeruginosa metabolic network and gene expression profiles. Specifically, we computed metabolite concentration differences between known mutants with altered biofilm formation and the wild-type strain to predict drug targets against P. aeruginosa biofilms. We also simulated the altered metabolism driven by gene expression changes between biofilm and stationary growth-phase planktonic cultures. Our analysis suggests that the synthesis of important biofilm-related molecules, such as the quorum-sensing molecule Pseudomonas quinolone signal and the exopolysaccharide Psl, is regulated not only through the expression of genes in their own synthesis pathway, but also through the biofilm-specific expression of genes in pathways competing for precursors to these molecules. Finally, we investigated why mutants defective in anthranilate degradation have an impaired ability to form biofilms. Alternative to a previous hypothesis that this biofilm reduction is caused by a decrease in energy production, we proposed that the dysregulation of the synthesis of secondary metabolites derived from anthranilate and chorismate is what impaired the biofilms of these mutants. Notably, these insights generated through our kinetic model-based approach are not accessible from previous constraint-based model analyses of P. aeruginosa biofilm

  6. Towards fully automated structure-based function prediction in structural genomics: a case study.

    Science.gov (United States)

    Watson, James D; Sanderson, Steve; Ezersky, Alexandra; Savchenko, Alexei; Edwards, Aled; Orengo, Christine; Joachimiak, Andrzej; Laskowski, Roman A; Thornton, Janet M

    2007-04-13

    As the global Structural Genomics projects have picked up pace, the number of structures annotated in the Protein Data Bank as hypothetical protein or unknown function has grown significantly. A major challenge now involves the development of computational methods to assign functions to these proteins accurately and automatically. As part of the Midwest Center for Structural Genomics (MCSG) we have developed a fully automated functional analysis server, ProFunc, which performs a battery of analyses on a submitted structure. The analyses combine a number of sequence-based and structure-based methods to identify functional clues. After the first stage of the Protein Structure Initiative (PSI), we review the success of the pipeline and the importance of structure-based function prediction. As a dataset, we have chosen all structures solved by the MCSG during the 5 years of the first PSI. Our analysis suggests that two of the structure-based methods are particularly successful and provide examples of local similarity that is difficult to identify using current sequence-based methods. No one method is successful in all cases, so, through the use of a number of complementary sequence and structural approaches, the ProFunc server increases the chances that at least one method will find a significant hit that can help elucidate function. Manual assessment of the results is a time-consuming process and subject to individual interpretation and human error. We present a method based on the Gene Ontology (GO) schema using GO-slims that can allow the automated assessment of hits with a success rate approaching that of expert manual assessment.

  7. Klebsiella pneumoniae asparagine tDNAs are integration hotspots for different genomic islands encoding microcin E492 production determinants and other putative virulence factors present in hypervirulent strains

    Directory of Open Access Journals (Sweden)

    Andrés Esteban Marcoleta

    2016-06-01

    Due to the development of multi-resistant and invasive hypervirulent strains, Klebsiella pneumoniae has become one of the most urgent bacterial pathogen threats in recent years. Genomic comparison of a growing number of sequenced isolates has allowed the identification of putative virulence factors, proposed to be acquirable mainly through horizontal gene transfer. In particular, those related to the synthesis of the antibacterial peptide microcin E492 (MccE492) and salmochelin siderophores were found to be highly prevalent among hypervirulent strains. The determinants for the production of both molecules were first reported as part of a 13-kbp segment of the K. pneumoniae RYC492 chromosome, and were cloned and characterized in E. coli. However, the genomic context of this segment in K. pneumoniae remained uncharacterized. In this work we provided experimental and bioinformatics evidence indicating that the MccE492 cluster is part of a highly conserved 23-kbp genomic island (GI) named GIE492, which was integrated in a specific asparagine-tRNA gene (asn-tDNA) and was found in a high proportion of isolates from liver abscesses sampled around the world. This element proved to be unstable, and its excision frequency increased after treating bacteria with mitomycin C and upon overexpression of the island-encoded integrase. Besides the MccE492 genetic cluster, it invariably included an integrase-coding gene, at least 7 protein-coding genes of unknown function, and a putative transfer origin that possibly allows this GI to be mobilized through conjugation. In addition, we analyzed the asn-tDNA loci of all the available K. pneumoniae assembled chromosomes to evaluate them as GI-integration sites. Remarkably, 73% of the strains harbored at least one GI integrated in one of the four asn-tDNAs present in this species, confirming them as integration hotspots. Each of these tDNAs was occupied with different frequencies, although they were 100% identical.
Also, we

  8. Genome-wide scan in Portuguese Island families implicates multiple loci in bipolar disorder: fine mapping adds support on chromosomes 6 and 11.

    Science.gov (United States)

    Pato, Carlos N; Pato, M T; Kirby, A; Petryshen, T L; Medeiros, H; Carvalho, C; Macedo, A; Dourado, A; Coelho, I; Valente, J; Soares, M J; Ferreira, C P; Lei, M; Verner, A; Hudson, T J; Morley, C P; Kennedy, J L; Azevedo, M H; Daly, M J; Sklar, P

    2004-05-15

    As part of an extensive study in the Portuguese Island population of families with multiple patients suffering from bipolar disorder and schizophrenia, we performed an initial genome-wide scan of 16 extended families with bipolar disorder. The scan identified three regions on chromosomes 2, 11, and 19 with genome-wide suggestive linkage; several other regions, including chromosome 6q, also approached suggestive levels of significance. Dick et al. [2003: Am J Hum Genet 73:107-114] recently reported in a study of 250 families with bipolar disorder a maxLOD score of 3.61 near marker D6S1021 on chromosome 6q. This study replicates this finding, having detected a peak NPL = 2.02 (P = 0.025) with the same marker D6S1021 (104.7 Mb). Higher-density mapping provided additional support for loci on chromosome 6, including marker D6S1021 with an NPL = 2.59 (P = 0.0068) and peaking at marker D6S1639 (125 Mb) with an NPL = 3.06 (P = 0.0019). A similar pattern was detected with higher-density mapping of chromosome 11, with an NPL = 3.15 (P = 0.0014) at marker D11S1883 (63.1 Mb). Simulations at the density of our fine mapping data indicate that less than 1 scan out of 10 would find two such scores genome-wide in the same scan by chance. Our findings provide additional support for a susceptibility locus for bipolar disorder on 6q, as well as suggesting the importance of denser scans. Published 2004 Wiley-Liss, Inc.

  9. K19 capsular polysaccharide of Acinetobacter baumannii is produced via a Wzy polymerase encoded in a small genomic island rather than the KL19 capsule gene cluster.

    Science.gov (United States)

    Kenyon, Johanna J; Shneider, Mikhail M; Senchenkova, Sofya N; Shashkov, Alexander S; Siniagina, Maria N; Malanin, Sergey Y; Popova, Anastasiya V; Miroshnikov, Konstantin A; Hall, Ruth M; Knirel, Yuriy A

    2016-08-01

    Polymerization of the oligosaccharides (K units) of complex capsular polysaccharides (CPSs) requires a Wzy polymerase, which is usually encoded in the gene cluster that directs K unit synthesis. Here, a gene cluster at the Acinetobacter K locus (KL) that lacks a wzy gene, KL19, was found in Acinetobacter baumannii ST111 isolates 28 and RBH2 recovered from hospitals in the Russian Federation and Australia, respectively. However, these isolates produced long-chain capsule, and a wzy gene was found in a 6.1 kb genomic island (GI) located adjacent to the cpn60 gene. The GI also includes an acetyltransferase gene, atr25, which is interrupted by an insertion sequence (IS) in RBH2. The capsule structure from both strains was →3)-α-d-GalpNAc-(1→4)-α-d-GalpNAcA-(1→3)-β-d-QuipNAc4NAc-(1→, determined using NMR spectroscopy. Biosynthesis of the K unit was inferred to be initiated with QuiNAc4NAc, and hence the Wzy forms the β-(1→3) linkage between QuipNAc4NAc and GalpNAc. The GalpNAc residue is 6-O-acetylated in isolate 28 only, showing that atr25 is responsible for this acetylation. The same GI with or without an IS in atr25 was found in draft genomes of other KL19 isolates, as well as ones carrying a closely related CPS gene cluster, KL39, which differs from KL19 only in a gene for an acyltransferase in the QuiNAc4NR synthesis pathway. Isolates carrying a KL1 variant with the wzy and atr genes each interrupted by an ISAba125 also have this GI. To our knowledge, this study is the first report of genes involved in capsule biosynthesis normally found at the KL located elsewhere in A. baumannii genomes.

  10. Multivariate Statistics and Supervised Learning for Predictive Detection of Unintentional Islanding in Grid-Tied Solar PV Systems

    Directory of Open Access Journals (Sweden)

    Shashank Vyas

    2016-01-01

    Integration of solar photovoltaic (PV) generation with power distribution networks leads to many operational challenges and complexities. Unintentional islanding is one of them and is of rising concern given the steady increase in grid-connected PV power. This paper builds upon an exploratory study of unintentional islanding on a modeled radial feeder with large PV penetration. Dynamic simulations, also run in real time, revealed unique potential causes of accidental island creation. The resulting voltage and current data underwent dimensionality reduction using principal component analysis (PCA), which formed the basis for the application of Q statistic control charts for detecting the anomalous currents that could island the system. To reduce the false alarm rate of anomaly detection, Kullback-Leibler (K-L) divergence was applied to the principal component projections, which led to the conclusion that the Q statistic based approach alone is not reliable for detecting the symptoms liable to cause unintentional islanding. The obtained data were labeled, and a K-nearest neighbor (K-NN) binomial classifier was then trained to identify and distinguish potential islanding precursors from other power system transients. The three-phase short-circuit fault case was successfully identified as statistically different from islanding symptoms.
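
    The PCA-based Q statistic mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic six-channel data, the choice of two retained components, and the empirical 99th-percentile control limit are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the column means and the top principal directions of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T  # loadings P, shape (features, k)

def q_statistic(X, mean, P):
    """Squared prediction error of each row after projection onto the PCA subspace."""
    Xc = X - mean
    residual = Xc - Xc @ P @ P.T
    return np.sum(residual ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical "normal operation" data: 6 channels driven by 2 latent factors.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
normal = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

mean, P = fit_pca(normal, n_components=2)
q_train = q_statistic(normal, mean, P)
limit = np.percentile(q_train, 99)  # empirical control limit (an assumption)

# A sample that breaks the learned correlation structure between channels.
anomaly = normal[:1] + np.array([[0.0, 0.0, 3.0, 0.0, 0.0, -3.0]])
q_anomaly = q_statistic(anomaly, mean, P)
print("anomaly flagged:", bool(q_anomaly[0] > limit))
```

    A sample is flagged when its Q value exceeds the control limit. The abstract reports that this criterion alone was not reliable enough, which is why the study adds K-L divergence and a K-NN classifier on top of it.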

  11. Functional divergence in the genus Oenococcus as predicted by genome sequencing of the newly-described species, Oenococcus kitaharae.

    Directory of Open Access Journals (Sweden)

    Anthony R Borneman

    Oenococcus kitaharae is only the second member of the genus Oenococcus to be identified and is the closest relative of the industrially important wine bacterium Oenococcus oeni. To provide insight into this new species, the genome of the type strain of O. kitaharae, DSM 17330, was sequenced. Comparison of the sequenced genomes of both species shows that the genome of O. kitaharae DSM 17330 contains many genes with predicted functions in cellular defence (bacteriocins, antimicrobials, restriction-modification systems and a CRISPR locus) which are lacking in O. oeni. The two genomes also appear to differentially encode several metabolic pathways associated with amino acid biosynthesis and carbohydrate utilization, which have direct phenotypic consequences. This would indicate that the two species have evolved different survival techniques to suit their particular environmental niches. O. oeni has adapted to survive in the harsh, but predictable, environment of wine, which provides very few competitive species. However, O. kitaharae appears to have adapted to a growth environment in which biological competition provides a significant selective pressure by accumulating biological defence molecules, such as bacteriocins and restriction-modification systems, throughout its genome.

  12. Genetic parameters for predicted methane production and potential for reducing enteric emissions through genomic selection.

    Science.gov (United States)

    Haas, Y de; Windig, J J; Calus, M P L; Dijkstra, J; Haan, M de; Bannink, A; Veerkamp, R F

    2011-12-01

    Mitigation of enteric methane (CH₄) emission in ruminants has become an important area of research because accumulation of CH₄ is linked to global warming. Nutritional and microbial opportunities to reduce CH₄ emissions have been extensively researched, but little is known about using natural variation to breed animals with lower CH₄ yield. Measuring CH₄ emission rates directly from animals is difficult and hinders direct selection on reduced CH₄ emission. However, improvements can be made through selection on associated traits (e.g., residual feed intake, RFI) or through selection on CH₄ predicted from feed intake and diet composition. The objective was to establish phenotypic and genetic variation in predicted CH₄ output, and to determine the potential of genetics to reduce methane emissions in dairy cattle. Experimental data were used and records on daily feed intake, weekly body weights, and weekly milk production were available from 548 heifers. Residual feed intake (MJ/d) is the difference between net energy intake and calculated net energy requirements for maintenance as a function of body weight and for fat- and protein-corrected milk production. Predicted methane emission (PME; g/d) is 6% of gross energy intake (Intergovernmental Panel on Climate Change methodology) corrected for energy content of methane (55.65 kJ/g). The estimated heritabilities for PME and RFI were 0.35 and 0.40, respectively. The positive genetic correlation between RFI and PME indicated that cows with lower RFI have lower PME (estimates ranging from 0.18 to 0.84). Hence, it is possible to decrease the methane production of a cow by selecting more-efficient cows, and the genetic variation suggests that reductions in the order of 11 to 26% in 10 yr are theoretically possible, and could be even higher in a genomic selection program. However, several uncertainties are discussed; for example, the lack of true methane measurements (and the key assumption that methane
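
    The predicted methane emission (PME) definition given in this abstract, 6% of gross energy intake divided by the energy content of methane (55.65 kJ/g), can be written out as a short worked calculation. The sketch assumes gross energy intake is expressed in MJ/d, and the intake value of 300 MJ/d is a hypothetical input, not a figure from the study.

```python
METHANE_ENERGY_KJ_PER_G = 55.65   # energy content of methane, from the abstract
METHANE_ENERGY_FRACTION = 0.06    # 6% of gross energy intake (IPCC methodology)

def predicted_methane_emission(gross_energy_intake_mj_per_day):
    """Predicted methane emission in g/d from gross energy intake in MJ/d."""
    methane_energy_kj = METHANE_ENERGY_FRACTION * gross_energy_intake_mj_per_day * 1000.0
    return methane_energy_kj / METHANE_ENERGY_KJ_PER_G

# Hypothetical gross energy intake of 300 MJ/d (illustrative only).
pme = predicted_methane_emission(300.0)
print(f"PME = {pme:.1f} g/d")
```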

  13. Predictive genomic and metabolomic analysis for the standardization of enzyme data

    Directory of Open Access Journals (Sweden)

    Masaaki Kotera

    2014-05-01

    The IUBMB's Enzyme List gives a valuable library of the individual experimental facts on enzyme activities, providing the standard classification and nomenclature of enzymes. Empirical knowledge about the relationships between the enzyme protein sequences (or structures) and their functions (the capability of catalyzing chemical reactions) has been accumulating in the public literature and databases. This provides a complementary approach to standardize and organize enzyme data, i.e., predicting the possible enzymes, reactions and metabolites that remain to be identified experimentally. Thus, we suggest the necessity of classifying enzymes based on the evidence and different perspectives obtained from various experimental works. The KEGG (Kyoto Encyclopedia of Genes and Genomes) database describes enzymes from many different viewpoints, including the IUBMB's enzyme nomenclature/classification (EC numbers), the similarity group of enzyme reactions (KEGG Reaction Class; RCLASS) based solely on the chemical structure transformation patterns, and the similarity groups of enzyme genes (KEGG Orthology; KO) based on the orthologous groups that can be mapped to the KEGG PATHWAY and BRITE functional hierarchy. Some unique identifiers other than the EC numbers established by IUBMB were additionally introduced to the KEGG database. R, RP and RC numbers are given to distinguish reactions, reactant pairs and RCLASS, respectively. Genes, including enzyme genes, have their own ID numbers in specific organisms, and they are classified into ortholog groups that are identified by K numbers. In this review, we explain the concept and methodology of this formulation with some concrete example cases. We propose that it would be beneficial to create a standard classification scheme that deals with both experimentally identified and theoretically predicted enzymes.

  14. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks.

    Science.gov (United States)

    Xiang, Zuoshuang; Qin, Tingting; Qin, Zhaohui S; He, Yongqun

    2013-10-16

    The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. 
The GenoMesh algorithm and web program provide the first genome-wide, MeSH-based literature mining

  15. Heat Islands

    Science.gov (United States)

    EPA's Heat Island Effect Site provides information on heat islands, their impacts, mitigation strategies, related research, a directory of heat island reduction initiatives in U.S. communities, and EPA's Heat Island Reduction Program.

  16. Predicting transcription factor binding sites using local over-representation and comparative genomics

    Directory of Open Access Journals (Sweden)

    Touzet Hélène

    2006-08-01

    Background: Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms. Results: We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes, which are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets. Conclusion: TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at http://bioinfo.lifl.fr/TFM-Explorer.

  17. Behavioral, Brain Imaging and Genomic Measures to Predict Functional Outcomes Post - Bed Rest and Spaceflight

    Science.gov (United States)

    Mulavara, A. P.; DeDios, Y. E.; Gadd, N. E.; Caldwell, E. E.; Batson, C. D.; Goel, R.; Seidler, R. D.; Oddsson, L.; Zanello, S.; Clarke, T.; Peters, B.; Cohen, H. S.; Reschke, M.; Wood, S.; Bloomberg, J. J.

    2016-01-01

    retrospective study, leveraging data already collected from relevant ongoing or completed bed rest and spaceflight studies. These data will be combined with predictor metrics that will be collected prospectively (as described for behavioral, brain imaging and genomic measures) from these returning subjects to build models for predicting post-mission (bed rest - non-astronauts or space flight - astronauts) adaptive capability as manifested in their outcome measures. To date we have completed a study on 15 normal subjects with all of the above measures. In this presentation we will discuss the optimized set of tests for predictive metrics to be used for evaluating post mission adaptive capability as manifested in their outcome measures. Comparisons of model performance will allow us to better design and implement sensorimotor adaptability training countermeasures against decrements in post-mission adaptive capability that are customized for each crewmember's sensory biases, adaptive capacity, brain structure and functional capacities, and genetic predispositions. The ability to customize adaptability training will allow more efficient use of crew time during training and will optimize training prescriptions for astronauts to ensure expected outcomes.

  18. On the limits of computational functional genomics for bacterial lifestyle prediction

    DEFF Research Database (Denmark)

    Barbosa, Eudes; Röttger, Richard; Hauschild, Anne-Christin

    2014-01-01

    We review the level of genomic specificity regarding actinobacterial pathogenicity. As they occupy various niches in diverse habitats, one may assume the existence of lifestyle-specific genomic features. We include 240 actinobacteria classified into four pathogenicity classes: human pathogens (HP...... in the post-genome era and despite next-generation sequencing technology, our ability to efficiently deduce real-world conclusions, such as pathogenicity classification, remains quite limited....

  19. Accuracy of Igenity genomically estimated breeding values for predicting Australian Angus BREEDPLAN traits.

    Science.gov (United States)

    Boerner, V; Johnston, D; Wu, X-L; Bauck, S

    2015-02-01

    Genomically estimated breeding values (GEBV) for Angus beef cattle are available from at least 2 commercial suppliers (Igenity [http://www.igenity.com] and Zoetis [http://www.zoetis.com]). The utility of these GEBV for improving genetic evaluation depends on their accuracies, which can be estimated by the genetic correlation with phenotypic target traits. Genomically estimated breeding values of 1,032 Angus bulls calculated from prediction equations (PE) derived by 2 different procedures in the U.S. Angus population were supplied by Igenity. Both procedures were based on Illumina BovineSNP50 BeadChip genotypes. In procedure sg, GEBV were calculated from PE that used subsets of only 392 SNP, where these subsets were individually selected for each trait by BayesCπ. In procedure rg, GEBV were calculated from PE derived in a ridge regression approach using all available SNP. Because the total set of 1,032 bulls with GEBV contained 732 individuals used in the Igenity training population, GEBV subsets were formed that were characterized by a decreasing average relationship between individuals in the subsets and individuals in the training population. Accuracies of GEBV were estimated as genetic correlations between GEBV and their phenotypic target traits, modeling GEBV as trait observations in a bivariate REML approach, in which phenotypic observations were those recorded in the commercial Australian Angus seed stock sector. Using results from the GEBV subset excluding all training individuals as a reference, estimated accuracies were generally in agreement with those already published, with both types of GEBV (sg and rg) yielding similar results. Accuracies for growth traits ranged from 0.29 to 0.45, for reproductive traits from 0.11 to 0.53, and for carcass traits from 0.3 to 0.75. Accuracies generally decreased with an increasing genetic distance between the training and the validation population. However, for some carcass traits characterized by a low number of phenotypic

  20. Promoter prediction and annotation of microbial genomes based on DNA sequence and structural responses to superhelical stress

    Directory of Open Access Journals (Sweden)

    Benham Craig J

    2006-05-01

    Background: In our previous studies, we found that the sites in prokaryotic genomes which are most susceptible to duplex destabilization under the negative superhelical stresses that occur in vivo are statistically highly significantly associated with intergenic regions that are known or inferred to contain promoters. In this report we investigate how this structural property, either alone or together with other structural and sequence attributes, may be used to search prokaryotic genomes for promoters. Results: We show that the propensity for stress-induced DNA duplex destabilization (SIDD) is closely associated with specific promoter regions. The extent of destabilization in promoter-containing regions is found to be bimodally distributed. When compared with DNA curvature, deformability, thermostability or sequence motif scores within the -10 region, SIDD is found to be the most informative DNA property regarding promoter locations in the E. coli K12 genome. SIDD properties alone perform better at detecting promoter regions than other programs trained on this genome. Because this approach has a very low false positive rate, it can be used to predict with high confidence the subset of promoters that are strongly destabilized. When SIDD properties are combined with -10 motif scores in a linear classification function, they predict promoter regions with better than 80% accuracy. When these methods were tested with promoter and non-promoter sequences from Bacillus subtilis, they achieved similar or higher accuracies. We also present a strictly SIDD-based predictor for annotating promoter sequences in complete microbial genomes. Conclusion: In this report we show that the propensity to undergo stress-induced duplex destabilization (SIDD) is a distinctive structural attribute of many prokaryotic promoter sequences. We have developed methods to identify promoter sequences in prokaryotic genomes that use SIDD either as a sole predictor or in

  1. In Vitro Analysis of Predicted DNA-Binding Sites for the Stl Repressor of the Staphylococcus aureus SaPIBov1 Pathogenicity Island.

    Directory of Open Access Journals (Sweden)

    Veronika Papp-Kádár

    A model for the regulation of Staphylococcus aureus pathogenicity island SaPIbov1 transfer was recently reported. The repressor protein Stl obstructs the expression of the SaPI proteins Str and Xis, the latter of which is responsible for mobilization initiation. Upon Φ11 phage infection of S. aureus, phage dUTPase activates SaPI transfer via Stl-dUTPase complex formation. Our aim was to predict the binding sites for the Stl repressor within the S. aureus pathogenicity island DNA sequence. We found that Stl was capable of binding to three 23-mer oligonucleotides, two of them constituting sequence segments in the stl-str intergenic region, while the other corresponds to a sequence segment within the str-xis intergenic region. Within these oligonucleotides, mutational analysis revealed that the predicted binding site for the Stl protein exists as a palindromic segment in both intergenic locations. The palindromes are built as 6-mer repeat sequences involved in Stl binding. The 6-mer repeats are separated by a 5-nucleotide-long nonspecific sequence. Future examination of the interaction between Stl and its binding sites in vivo will provide a molecular explanation for the mechanisms of gene repression and gene activation exerted simultaneously by the Stl protein in regulating transfer of the SaPIbov1 pathogenicity island in S. aureus.
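
    The binding-site architecture described in this abstract (two 6-mer repeats separated by a 5-nucleotide spacer, forming a palindromic segment) suggests a simple sequence scan. The sketch below assumes that "palindromic" means the second 6-mer is the reverse complement of the first; the 23-mer used is a hypothetical sequence constructed for illustration, not one of the oligonucleotides from the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, arm=6, spacer=5):
    """Return (start, left_arm, right_arm) for every arm/spacer/arm window
    whose right arm is the reverse complement of its left arm."""
    seq = seq.upper()
    span = 2 * arm + spacer
    hits = []
    for start in range(len(seq) - span + 1):
        left = seq[start:start + arm]
        right = seq[start + arm + spacer:start + span]
        if right == reverse_complement(left):
            hits.append((start, left, right))
    return hits

# Hypothetical 23-mer (not from the study): TTA + GTTCAC + AAAAA + GTGAAC + GGC
oligo = "TTAGTTCACAAAAAGTGAACGGC"
hits = find_inverted_repeats(oligo)
print(hits)
```

    The spacer content is deliberately ignored in the match, mirroring the abstract's description of the spacer as nonspecific.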

  2. MicroTrout: A comprehensive, genome-wide miRNA target prediction framework for rainbow trout, Oncorhynchus mykiss.

    Science.gov (United States)

    Mennigen, Jan A; Zhang, Dapeng

    2016-12-01

    Rainbow trout represent an important teleost research model and aquaculture species. As such, rainbow trout are employed in diverse areas of biological research, including basic biological disciplines such as comparative physiology, toxicology, and, since rainbow trout have undergone both teleost- and salmonid-specific rounds of genome duplication, molecular evolution. In recent years, microRNAs (miRNAs, small non-protein coding RNAs) have emerged as important posttranscriptional regulators of gene expression in animals. Given the increasingly recognized importance of miRNAs as an additional layer in the regulation of gene expression and hence biological function, recent efforts using RNA- and genome sequencing approaches have resulted in the creation of several resources for the construction of a comprehensive repertoire of rainbow trout miRNAs and isomiRs (variant miRNA sequences that all appear to derive from the same gene but vary in sequence due to post-transcriptional processing). Importantly, through the recent publication of the rainbow trout genome (Berthelot et al., 2014), mRNA 3'UTR information has become available, allowing for the first time the genome-wide prediction of miRNA-target RNA relationships in this species. We here report the creation of the microtrout database, a comprehensive resource for rainbow trout miRNA and annotated 3'UTRs. The comprehensive database was used to implement an algorithm to predict genome-wide rainbow trout-specific miRNA-mRNA target relationships, generating an improved predictive framework over previously published approaches. This work will serve as a useful framework and sequence resource to experimentally address the role of miRNAs in several research areas using the rainbow trout model, examples of which are discussed. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. The use of multiple hierarchically independent gene ontology terms in gene function prediction and genome annotation

    NARCIS (Netherlands)

    Kourmpetis, Y.I.A.; Burgt, van der A.; Bink, M.C.A.M.; Braak, ter C.J.F.; Ham, van R.C.H.J.

    2007-01-01

    The Gene Ontology (GO) is a widely used controlled vocabulary for the description of gene function. In this study we quantify the usage of multiple and hierarchically independent GO terms in the curated genome annotations of seven well-studied species. In most genomes, significant proportions (6 -

  4. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots

    Science.gov (United States)

    Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the compu...

  5. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sükösd, Zsuzsanna; Andersen, Ebbe Sloth; Seemann, Ernst Stefan;

    2015-01-01

    of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interesting, a set of long distance interactions form a core organizing structure (COS) that organize the genome into three major structural domains. Despite overlapping...

  6. Bayesian prediction of bacterial growth temperature range based on genome sequences

    DEFF Research Database (Denmark)

    Jensen, Dan Børge; Vesth, Tammi Camilla; Hallin, Peter Fischer

    2012-01-01

    on a genomic sequence, would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results: This study found a total of 40 protein families useful for distinction between three thermophilicity classes (thermophiles, mesophiles and psychrophiles...... and psychrophilic adapted bacterial genomes....

  7. Prediction of disease and phenotype associations from genome-wide association studies.

    Directory of Open Access Journals (Sweden)

    Stephanie N Lewis

    BACKGROUND: Genome wide association studies (GWAS) have proven useful as a method for identifying genetic variations associated with diseases. In this study, we analyzed GWAS data for 61 diseases and phenotypes to elucidate common associations based on single nucleotide polymorphisms (SNPs). The study was an expansion of a previous study on identifying disease associations via data from a single GWAS on seven diseases. METHODOLOGY/PRINCIPAL FINDINGS: Adjustments to the originally reported study included expansion of the SNP dataset using Linkage Disequilibrium (LD) and refinement of the four levels of analysis to encompass SNP, SNP block, gene, and pathway level comparisons. A pair-wise comparison between diseases and phenotypes was performed at each level, and the Jaccard similarity index was used to measure the degree of association between two diseases/phenotypes. Disease relatedness networks (DRNs) were used to visualize our results. We saw predominant relatedness between Multiple Sclerosis, type 1 diabetes, and rheumatoid arthritis for the first three levels of analysis. Expected relatedness was also seen between lipid- and blood-related traits. CONCLUSIONS/SIGNIFICANCE: The predominant associations between Multiple Sclerosis, type 1 diabetes, and rheumatoid arthritis can be validated by clinical studies. The diseases have been proposed to share a systemic inflammation phenotype that can result in progression of additional diseases in patients with one of these three diseases. We also noticed unexpected relationships between metabolic and neurological diseases at the pathway comparison level. The less significant relationships found between diseases require a more detailed literature review to determine the validity of the predictions. The results from this study serve as a first step towards a better understanding of seemingly unrelated diseases and phenotypes with similar symptoms or modes of treatment.
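
    The pair-wise Jaccard comparison described in this abstract can be sketched in a few lines. The disease names and gene identifiers below are hypothetical placeholders, and the same function would apply unchanged to SNP, SNP block, or pathway sets.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity index: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical gene-level associations per disease (illustrative only).
disease_genes = {
    "disease_A": {"G1", "G2", "G3", "G4"},
    "disease_B": {"G2", "G3", "G5"},
    "disease_C": {"G6"},
}

# Pair-wise similarity between every two diseases at this level of analysis.
scores = {
    (x, y): jaccard(disease_genes[x], disease_genes[y])
    for x, y in combinations(sorted(disease_genes), 2)
}
for pair, score in scores.items():
    print(pair, round(score, 2))
```

    Pairs with a non-zero score would become edges in a disease relatedness network, weighted by the index.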

  8. Persistency of Prediction Accuracy and Genetic Gain in Synthetic Populations Under Recurrent Genomic Selection

    Directory of Open Access Journals (Sweden)

    Dominik Müller

    2017-03-01

    Full Text Available Recurrent selection (RS) has been used in plant breeding to successively improve synthetic and other multiparental populations. Synthetics are generated from a limited number of parents ($N_p$), but little is known about how $N_p$ affects genomic selection (GS) in RS, especially the persistency of prediction accuracy ($r_{g,\hat{g}}$) and genetic gain. Synthetics were simulated by intermating $N_p$ = 2–32 parent lines from an ancestral population with short- or long-range linkage disequilibrium ($LD_A$) and subjected to multiple cycles of GS. We determined $r_{g,\hat{g}}$ and genetic gain across 30 cycles for different training set (TS) sizes, marker densities, and generations of recombination before model training. Contributions to $r_{g,\hat{g}}$ and genetic gain from pedigree relationships, as well as from cosegregation and $LD_A$ between QTL and markers, were analyzed via four scenarios differing in (i) the relatedness between TS and selection candidates and (ii) whether selection was based on markers or pedigree records. Persistency of $r_{g,\hat{g}}$ was high for small $N_p$, where predominantly cosegregation contributed to $r_{g,\hat{g}}$, but also for large $N_p$, where $LD_A$ replaced cosegregation as the dominant information source. Together with increasing genetic variance, this compensation resulted in relatively constant long- and short-term genetic gain for increasing $N_p$ > 4, given long-range $LD_A$ in the ancestral population. Although our scenarios suggest that information from pedigree relationships contributed to $r_{g,\hat{g}}$ for only very few generations in GS, we expect a longer contribution than in pedigree BLUP, because capturing Mendelian sampling by markers reduces selective pressure on pedigree relationships. Larger TS size ($N_{TS}$) and higher marker density improved persistency of $r_{g,\hat{g}}$ and hence genetic gain, but additional recombinations could not increase genetic gain.

  9. Identifying Rare Variation in Cases of Schizophrenia in the Isolated Population of the Faroe Islands using Whole-genome Sequencing

    DEFF Research Database (Denmark)

    Als, Thomas Damm; Lescai, Francesco; Dahl, Hans;

    is nowhere less than 4.3x and is on average 6.7x. More than 10 million Single Nucleotide Variants (SNVs) were identified based on multi-sample calling of the aligned sequences. Identified risk alleles may, however, either appear to be private to the isolated population and not observed elsewhere or extremely...... of developing SZ. However, these studies are designed to examine only “the common variant” proportion of the genomic landscape of SZ. Due to increased genetic drift during founding and potential bottlenecks, followed by population expansion, isolated populations may be particularly useful in identifying rare...... disease variants, that may appear at higher frequencies and/or within a more clearly distinct haplotype structure compared to outbred populations. Small isolated populations also typically show reduced phenotypic, genetic and environmental heterogeneity, thus making them advantageous in studies aiming...

  10. Complete mitochondrial genome sequences of Korean native horse from Jeju Island: uncovering the spatio-temporal dynamics.

    Science.gov (United States)

    Yoon, Sook Hee; Kim, Jaemin; Shin, Donghyun; Cho, Seoae; Kwak, Woori; Lee, Hak-Kyo; Park, Kyoung-Do; Kim, Heebal

    2017-04-01

    The Korean native horse (Jeju horse) is one of the most important animals from Korean historical, cultural, and economic viewpoints. In the early 1980s, the Jeju horse was close to extinction. The aim of this study is to explore the phylogenomics of the Korean native horse, focusing on spatio-temporal dynamics. We determined the first complete mitochondrial genome sequences for Korean native horses (n = 6) and for additional Mongolian horses (n = 2). Those sequences were analyzed together with 143 published ones using a Bayesian coalescent approach as well as three phylogenetic analysis methods: Bayesian inference, maximum likelihood, and neighbor-joining. The phylogenomic trees revealed that the Korean native horses had multiple origins and clustered together with some horses from four European breeds and one Middle Eastern breed. Our phylogenomic analyses also supported that there was no apparent association between breed or geographic location and the evolution of global horses. The time of the most recent common ancestor of the Korean native horse was approximately 13,200-63,200 years, much younger than the 0.696 My of modern horses. Additionally, our results showed that all global horse lineages, including the Korean native horse, existed prior to their domestication events, which occurred about 6000-10,000 years ago. This is the first study on the phylogenomics of the Korean native horse focusing on spatio-temporal dynamics. Our findings increase our understanding of the domestication history of the Korean native horses, and could provide useful information for horse conservation projects as well as for horse genomics, emergence, and geographical distribution.

  11. The prevalences of Salmonella Genomic Island 1 variants in human and animal Salmonella Typhimurium DT104 are distinguishable using a Bayesian approach.

    Directory of Open Access Journals (Sweden)

    Alison E Mather

    Full Text Available Throughout the 1990s, there was an epidemic of multidrug resistant Salmonella Typhimurium DT104 in both animals and humans in Scotland. The use of antimicrobials in agriculture is often cited as a major source of antimicrobial resistance in pathogenic bacteria of humans, suggesting that DT104 in animals and humans should demonstrate similar prevalences of resistance determinants. Until very recently, only the application of molecular methods would allow such a comparison and our understanding has been hindered by the fact that surveillance data are primarily phenotypic in nature. Here, using large scale surveillance datasets and a novel Bayesian approach, we infer and compare the prevalence of Salmonella Genomic Island 1 (SGI1), SGI1 variants, and resistance determinants independent of SGI1 in animal and human DT104 isolates from such phenotypic data. We demonstrate differences in the prevalences of SGI1, SGI1-B, SGI1-C, absence of SGI1, and tetracycline resistance determinants independent of SGI1 between these human and animal populations, a finding that challenges established tenets that DT104 in domestic animals and humans are from the same well-mixed microbial population.

  12. Genome wide prediction of HNF4alpha functional binding sites by the use of local and global sequence context.

    Science.gov (United States)

    Kel, Alexander E; Niehof, Monika; Matys, Volker; Zemlin, Rüdiger; Borlak, Jürgen

    2008-01-01

    We report an application of machine learning algorithms that enables prediction of the functional context of transcription factor binding sites in the human genome. We demonstrate that our method allowed de novo identification of hepatic nuclear factor (HNF)4alpha binding sites and significantly improved an overall recognition of faithful HNF4alpha targets. When applied to published findings, an unprecedented high number of false positives were identified. The technique can be applied to any transcription factor.

  13. Genome-wide prediction of transcriptional regulatory elements of human promoters using gene expression and promoter analysis data

    Directory of Open Access Journals (Sweden)

    Kim Seon-Young

    2006-07-01

    Full Text Available Background: A complete understanding of the regulatory mechanisms of gene expression is the next important issue of genomics. Many bioinformaticians have developed methods and algorithms for predicting transcriptional regulatory mechanisms from sequence, gene expression, and binding data. However, most of these studies involved the use of yeast which has much simpler regulatory networks than human and has many genome wide binding data and gene expression data under diverse conditions. Studies of genome wide transcriptional networks of human genomes currently lag behind those of yeast. Results: We report herein a new method that combines gene expression data analysis with promoter analysis to infer transcriptional regulatory elements of human genes. The Z scores from the application of gene set analysis with gene sets of transcription factor binding sites (TFBSs) were successfully used to represent the activity of TFBSs in a given microarray data set. A significant correlation between the Z scores of gene sets of TFBSs and individual genes across multiple conditions permitted successful identification of many known human transcriptional regulatory elements of genes as well as the prediction of numerous putative TFBSs of many genes which will constitute a good starting point for further experiments. Using Z scores of gene sets of TFBSs produced better predictions than the use of mRNA levels of a transcription factor itself, suggesting that the Z scores of gene sets of TFBSs better represent diverse mechanisms for changing the activity of transcription factors in the cell. In addition, cis-regulatory modules, combinations of co-acting TFBSs, were readily identified by our analysis. Conclusion: By a strategic combination of gene set level analysis of gene expression data sets and promoter analysis, we were able to identify and predict many transcriptional regulatory elements of human genes. We conclude that this approach will aid in decoding

  14. Accuracy of genomic prediction using deregressed breeding values estimated from purebred and crossbred offspring phenotypes in pigs.

    Science.gov (United States)

    Hidalgo, A M; Bastiaansen, J W M; Lopes, M S; Veroneze, R; Groenen, M A M; de Koning, D-J

    2015-07-01

    Genomic selection is applied to dairy cattle breeding to improve the genetic progress of purebred (PB) animals, whereas in pigs and poultry the target is a crossbred (CB) animal for which a different strategy appears to be needed. The source of information used to estimate the breeding values, i.e., using phenotypes of CB or PB animals, may affect the accuracy of prediction. The objective of our study was to assess the direct genomic value (DGV) accuracy of CB and PB pigs using different sources of phenotypic information. Data used were from 3 populations: 2,078 Dutch Landrace-based, 2,301 Large White-based, and 497 crossbreds from an F1 cross between the 2 lines. Two female reproduction traits were analyzed: gestation length (GLE) and total number of piglets born (TNB). Phenotypes used in the analyses originated from offspring of genotyped individuals. Phenotypes collected on CB and PB animals were analyzed as separate traits using a single-trait model. Breeding values were estimated separately for each trait in a pedigree BLUP analysis and subsequently deregressed. Deregressed EBV for each trait originating from different sources (CB or PB offspring) were used to study the accuracy of genomic prediction. Accuracy of prediction was computed as the correlation between DGV and the DEBV of the validation population. Accuracy of prediction within PB populations ranged from 0.43 to 0.62 across GLE and TNB. Accuracies to predict genetic merit of CB animals with one PB population in the training set ranged from 0.12 to 0.28, with the exception of using the CB offspring phenotype of the Dutch Landrace that resulted in an accuracy estimate around 0 for both traits. Accuracies to predict genetic merit of CB animals with both parental PB populations in the training set ranged from 0.17 to 0.30. 
We conclude that prediction within population and trait had good predictive ability regardless of the trait being the PB or CB performance, whereas using PB population(s) to predict
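
    The accuracy measure named in this abstract, the correlation between direct genomic values (DGV) and deregressed breeding values (DEBV) in the validation population, can be sketched as a plain Pearson correlation. The numbers and the function name below are hypothetical; the abstract does not state which software or correlation variant the authors used, so Pearson's r is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical validation animals: predicted DGV and observed DEBV.
dgv = [0.8, -0.2, 1.1, 0.3, -0.9]
debv = [1.0, 0.1, 0.7, 0.2, -1.2]

accuracy = pearson(dgv, debv)
print(f"accuracy (r between DGV and DEBV): {accuracy:.2f}")
```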

  15. Predicting Essential Metabolic Genome Content of Niche-Specific Enterobacterial Human Pathogens during Simulation of Host Environments.

    Directory of Open Access Journals (Sweden)

    Tong Ding

    Full Text Available Microorganisms have evolved to occupy certain environmental niches, and the metabolic genes essential for growth in these locations are retained in the genomes. Many microorganisms inhabit niches located in the human body, sometimes causing disease, and may retain genes essential for growth in locations such as the bloodstream and urinary tract, or growth during intracellular invasion of the hosts' macrophage cells. Strains of Escherichia coli (E. coli) and Salmonella spp. are thought to have evolved over 100 million years from a common ancestor, and now cause disease in specific niches within humans. Here we have used a genome scale metabolic model representing the pangenome of E. coli which contains all metabolic reactions encoded by genes from 16 E. coli genomes, and have simulated environmental conditions found in the human bloodstream, urinary tract, and macrophage to determine essential metabolic genes needed for growth in each location. We compared the predicted essential genes for three E. coli strains and one Salmonella strain that cause disease in each host environment, and determined that essential gene retention could be accurately predicted using this approach. This project demonstrated that simulating human body environments such as the bloodstream can successfully lead to accurate computational predictions of essential/important genes.

  16. Predicting Essential Metabolic Genome Content of Niche-Specific Enterobacterial Human Pathogens during Simulation of Host Environments.

    Science.gov (United States)

    Ding, Tong; Case, Kyle A; Omolo, Morrine A; Reiland, Holly A; Metz, Zachary P; Diao, Xinyu; Baumler, David J

    2016-01-01

    Microorganisms have evolved to occupy certain environmental niches, and the metabolic genes essential for growth in these locations are retained in the genomes. Many microorganisms inhabit niches located in the human body, sometimes causing disease, and may retain genes essential for growth in locations such as the bloodstream and urinary tract, or growth during intracellular invasion of the hosts' macrophage cells. Strains of Escherichia coli (E. coli) and Salmonella spp. are thought to have evolved over 100 million years from a common ancestor, and now cause disease in specific niches within humans. Here we have used a genome scale metabolic model representing the pangenome of E. coli which contains all metabolic reactions encoded by genes from 16 E. coli genomes, and have simulated environmental conditions found in the human bloodstream, urinary tract, and macrophage to determine essential metabolic genes needed for growth in each location. We compared the predicted essential genes for three E. coli strains and one Salmonella strain that cause disease in each host environment, and determined that essential gene retention could be accurately predicted using this approach. This project demonstrated that simulating human body environments such as the bloodstream can successfully lead to accurate computational predictions of essential/important genes.

  17. GenoMatrix: A Software Package for Pedigree-Based and Genomic Prediction Analyses on Complex Traits.

    Science.gov (United States)

    Nazarian, Alireza; Gezan, Salvador Alejandro

    2016-07-01

    Genomic and pedigree-based best linear unbiased prediction methodologies (G-BLUP and P-BLUP) have proven themselves efficient for partitioning the phenotypic variance of complex traits into its components, estimating the individuals' genetic merits, and predicting unobserved (or yet-to-be observed) phenotypes in many species and fields of study. The GenoMatrix software, presented here, is a user-friendly package to facilitate the process of using genome-wide marker data and parentage information for G-BLUP and P-BLUP analyses on complex traits. It provides users with a collection of applications which help them on a set of tasks from performing quality control on data to constructing and manipulating the genomic and pedigree-based relationship matrices and obtaining their inverses. Such matrices will be then used in downstream analyses by other statistical packages. The package also enables users to obtain predicted values for unobserved individuals based on the genetic values of observed related individuals. GenoMatrix is available to the research community as a Windows 64bit executable and can be downloaded free of charge at: http://compbio.ufl.edu/software/genomatrix/. © The American Genetic Association. 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  18. A genomic island present along the bacterial chromosome of the Parachlamydiaceae UWE25, an obligate amoebal endosymbiont, encodes a potentially functional F-like conjugative DNA transfer system

    Directory of Open Access Journals (Sweden)

    Guy Lionel

    2004-12-01

    Full Text Available Background: The genome of Protochlamydia amoebophila UWE25, a Parachlamydia-related endosymbiont of free-living amoebae, was recently published, providing the opportunity to search for genomic islands (GIs). Results: On the residual cumulative G+C content curve, a G+C-rich 19-kb region was observed. This sequence is part of a 100-kb chromosome region, containing 100 highly co-oriented ORFs, flanked by two 17-bp direct repeats. Two identical gly-tRNA genes in tandem are present at the proximal end of this genetic element. Several mobility genes encoding transposases and bacteriophage-related proteins are located within this chromosome region. Thus, this region largely fulfills the criteria of GIs. The G+C content analysis shows that several modules compose this GI. Surprisingly, one of them encodes all genes essential for F-like conjugative DNA transfer (traF, traG, traH, traN, traU, traW, and trbC), involved in sex pilus retraction and mating pair stabilization, strongly suggesting that, similarly to the other F-like operons, the parachlamydial tra unit is devoted to DNA transfer. A close relatedness of this tra unit to F-like tra operons involved in conjugative transfer is confirmed by phylogenetic analyses performed on concatenated genes and gene order conservation. These analyses and that of gly-tRNA distribution in 140 GIs suggest a proteobacterial origin of the parachlamydial tra unit. Conclusions: A GI of the UWE25 chromosome encodes a potentially functional F-like DNA conjugative system. This is the first hint of a putative conjugative system in chlamydiae. Conjugation most probably occurs within free-living amoebae, that may contain hundreds of Parachlamydia bacteria tightly packed in vacuoles. Such a conjugative system might be involved in DNA transfer between internalized bacteria. Since this system is absent from the sequenced genomes of Chlamydiaceae, we hypothesize that it was acquired after the divergence between
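
    The abstract above locates a G+C-rich region on a residual cumulative G+C content curve but does not define the curve. The sketch below assumes one common construction: for each window, take the observed G+C count minus the count expected from the genome-wide mean, and accumulate. Under that assumption a G+C-rich region appears as a rising stretch of the curve, and the curve returns to about zero at the end of the sequence. The function name, window size, and toy sequence are hypothetical.

```python
def residual_cumulative_gc(seq, window=1000):
    """Cumulative sum of (observed - expected) G+C counts per window.

    Expected counts use the mean G+C fraction of the whole sequence,
    so the final value of the curve is approximately zero.
    """
    seq = seq.upper()
    if not seq:
        return []
    mean_gc = sum(1 for base in seq if base in "GC") / len(seq)
    curve = []
    running = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        observed = sum(1 for base in chunk if base in "GC")
        running += observed - mean_gc * len(chunk)
        curve.append(running)
    return curve


# Toy sequence: an AT-only flank, a GC-only "island", another AT-only flank.
toy = "AT" * 50 + "GC" * 50 + "AT" * 50
curve = residual_cumulative_gc(toy, window=100)
print([round(v, 2) for v in curve])
```

    On real genomes the window size trades resolution against noise; the value used here is only for the toy example.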

  19. Extensive amplification of GI-VII-6, a multidrug resistance genomic island of Salmonella enterica serovar Typhimurium, increases resistance to extended-spectrum cephalosporins

    Directory of Open Access Journals (Sweden)

    Ken-ichi Lee

    2015-02-01

    Full Text Available GI-VII-6 is a chromosomally integrated multidrug resistance genomic island harbored by a specific clone of Salmonella enterica serovar Typhimurium (S. Typhimurium). It contains a gene encoding CMY-2 β-lactamase (blaCMY-2), and therefore contributes to extended-spectrum cephalosporin resistance. To elucidate the significance of GI-VII-6 on adaptive evolution, spontaneous mutants of S. Typhimurium strain L-3553 were selected on plates containing cefotaxime (CTX). The concentrations of CTX were higher than its minimum inhibitory concentration for the parent strain. The mutants appeared on the plates containing 12.5 and 25 μg/ml CTX at a frequency of $10^{-6}$ and $10^{-8}$, respectively. No colonies were observed at higher CTX concentrations. The copy number of blaCMY-2 increased up to 85 per genome in the mutants, while the parent strain contains one copy in the chromosome. This elevation was accompanied by an increased amount of transcription. The blaCMY-2 copy number in the mutants drastically decreased in the absence of antibiotic selection pressure. Southern hybridization analysis and short-read mapping indicated that the entire 125 kb GI-VII-6 or parts of it were tandemly amplified. GI-VII-6 amplification occurred at its original position, although in some mutants it also transposed to other locations in the genome, including an endogenous plasmid, leading to the amplification of GI-VII-6 at different loci. Insertion sequences were observed at the junction of the amplified regions in the mutants, suggesting their significant roles in the transposition and amplification. Plasmid copy number in the selected mutants was 1.4 to 4.4 times higher than that of the parent strain. These data suggest that transposition and amplification of the blaCMY-2-containing region, along with the copy number variation of the plasmid, contributed to the extensive amplification of blaCMY-2 and increased resistance to CTX.

  20. Islands, Island Studies, Island Studies Journal

    Directory of Open Access Journals (Sweden)

    Godfrey Baldacchino

    2006-05-01

    Full Text Available Islands are sites of innovative conceptualizations, whether of nature or human enterprise, whether virtual or real. The study of islands on their own terms today enjoys a growing and wide-ranging recognition. This paper celebrates the launch of Island Studies Journal in the context of a long and thrilling tradition of island studies scholarship.

  1. A New Approach to Predict Microbial Community Assembly and Function Using a Stochastic, Genome-Enabled Modeling Framework

    Science.gov (United States)

    King, E.; Brodie, E.; Anantharaman, K.; Karaoz, U.; Bouskill, N.; Banfield, J. F.; Steefel, C. I.; Molins, S.

    2016-12-01

    Characterizing and predicting the microbial and chemical compositions of subsurface aquatic systems necessitates an understanding of the metabolism and physiology of organisms that are often uncultured or studied under conditions not relevant for one's environment of interest. Cultivation-independent approaches are therefore important and have greatly enhanced our ability to characterize functional microbial diversity. The capability to reconstruct genomes representing thousands of populations from microbial communities using metagenomic techniques provides a foundation for development of predictive models for community structure and function. Here, we discuss a genome-informed stochastic trait-based model incorporated into a reactive transport framework to represent the activities of coupled guilds of hypothetical microorganisms. Metabolic pathways for each microbe within a functional guild are parameterized from metagenomic data with a unique combination of traits governing organism fitness under dynamic environmental conditions. We simulate the thermodynamics of coupled electron donor and acceptor reactions to predict the energy available for cellular maintenance, respiration, biomass development, and enzyme production. While `omics analyses can now characterize the metabolic potential of microbial communities, it is functionally redundant as well as computationally prohibitive to explicitly include the thousands of recovered organisms into biogeochemical models. However, one can derive potential metabolic pathways from genomes along with trait-linkages to build probability distributions of traits. These distributions are used to assemble groups of microbes that couple one or more of these pathways. From the initial ensemble of microbes, only a subset will persist based on the interaction of their physiological and metabolic traits with environmental conditions, competing organisms, etc. Here, we analyze the predicted niches of these hypothetical microbes and

  2. Accurate Prediction of the Statistics of Repetitions in Random Sequences: A Case Study in Archaea Genomes.

    Science.gov (United States)

    Régnier, Mireille; Chassignet, Philippe

    2016-01-01

    Repetitive patterns in genomic sequences have great biological significance and also algorithmic implications. Analytic combinatorics allows formulas to be derived for the expected length of repetitions in a random sequence. Asymptotic results, which generalize previous works on a binary alphabet, are easily computable. Simulations on random sequences show their accuracy. As an application, the sample case of Archaea genomes illustrates how biological sequences may differ from random sequences.

  3. Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

    Science.gov (United States)

    Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K

    2015-01-01

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. 
Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set

  4. Integrating spatial data and shorebird nesting locations to predict the potential future impact of global warming on coastal habitats: A case study on Farasan Islands, Saudi Arabia.

    Science.gov (United States)

    Alrashidi, Monif; Shobrak, Mohammed; Al-Eissa, Mohammed S; Székely, Tamás

    2012-07-01

    One of the expected effects of global warming is a change in coastal habitats caused by an accelerating rate of sea level rise. Coastal habitats support a large number of marine and wetland species including shorebirds (plovers, sandpipers and allies). In this study, we investigate how coastal habitats may be impacted by sea level rise in the Farasan Islands, Kingdom of Saudi Arabia. We use Kentish plover Charadrius alexandrinus - a common coastal breeding shorebird - as an ecological model species to predict the influence of sea level rise. We found that any rise of sea level is likely to inundate 11% of Kentish plover nests. In addition, 5% of the coastal areas of Farasan Islands, which support 26% of Kentish plover nests, will be flooded, if sea level rises by one metre. Our results are constrained by the availability of data on both elevation and bird populations. Therefore, we recommend follow-up studies to model the impacts of sea level rise using different elevation scenarios, and the establishment of a monitoring programme for breeding shorebirds and seabirds in Farasan Islands to assess the impact of climate change on their populations.

  5. Accuracy of Genome-Enabled Prediction in a Dairy Cattle Population using Different Cross-Validation Layouts

    Science.gov (United States)

    Pérez-Cabal, M. Angeles; Vazquez, Ana I.; Gianola, Daniel; Rosa, Guilherme J. M.; Weigel, Kent A.

    2012-01-01

    The impact of extent of genetic relatedness on accuracy of genome-enabled predictions was assessed using a dairy cattle population and alternative cross-validation (CV) strategies were compared. The CV layouts consisted of training and testing sets obtained from either random allocation of individuals (RAN) or from a kernel-based clustering of individuals using the additive relationship matrix, to obtain two subsets that were as unrelated as possible (UNREL), as well as a layout based on stratification by generation (GEN). The UNREL layout decreased the average genetic relationships between training and testing animals but produced similar accuracies to the RAN design, which were about 15% higher than in the GEN setting. Results indicate that the CV structure can have an important effect on the accuracy of whole-genome predictions. However, the connection between average genetic relationships across training and testing sets and the estimated predictive ability is not straightforward, and may depend also on the kind of relatedness that exists between the two subsets and on the heritability of the trait. For high heritability traits, close relatives such as parents and full-sibs make the greatest contributions to accuracy, which can be compensated by half-sibs or grandsires in the case of lack of close relatives. However, for the low heritability traits the inclusion of close relatives is crucial and including more relatives of various types in the training set tends to lead to greater accuracy. In practice, CV designs should resemble the intended use of the predictive models, e.g., within or between family predictions, or within or across generation predictions, such that estimation of predictive ability is consistent with the actual application to be considered. PMID:22403583
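
    Two of the cross-validation layouts compared in this abstract, random allocation (RAN) and stratification by generation (GEN), can be sketched as simple train/test splits. This is an illustrative sketch only: the animal identifiers, generation numbers, test fraction, and function names are hypothetical, and the kernel-based clustering behind the UNREL layout is not reproduced here because the abstract does not describe it in enough detail.

```python
import random


def random_split(ids, test_fraction=0.2, seed=0):
    """RAN-style layout: allocate individuals to train/test at random."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def generation_split(generation_by_id, cutoff):
    """GEN-style layout: train on older generations, test on newer ones."""
    train = [i for i, g in generation_by_id.items() if g < cutoff]
    test = [i for i, g in generation_by_id.items() if g >= cutoff]
    return train, test


# Hypothetical animals mapped to their generation number.
animals = {"a1": 1, "a2": 1, "a3": 2, "a4": 2, "a5": 3}

ran_train, ran_test = random_split(animals, test_fraction=0.2, seed=42)
gen_train, gen_test = generation_split(animals, cutoff=3)
print("RAN:", sorted(ran_train), sorted(ran_test))
print("GEN:", sorted(gen_train), sorted(gen_test))
```

    As the abstract recommends, the layout would be chosen to mirror the intended use of the model, for example a generation-based split when predictions are meant for future generations.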

  6. Traumatic Brain Injury Induces Genome-Wide Transcriptomic, Methylomic, and Network Perturbations in Brain and Blood Predicting Neurological Disorders

    Directory of Open Access Journals (Sweden)

    Qingying Meng

    2017-02-01

    Full Text Available The complexity of the traumatic brain injury (TBI) pathology, particularly concussive injury, is a serious obstacle for diagnosis, treatment, and long-term prognosis. Here we utilize modern systems biology in a rodent model of concussive injury to gain a thorough view of the impact of TBI on fundamental aspects of gene regulation, which have the potential to drive or alter the course of the TBI pathology. TBI perturbed epigenomic programming, transcriptional activities (expression level and alternative splicing), and the organization of genes in networks centered around genes such as Anax2, Ogn, and Fmod. Transcriptomic signatures in the hippocampus are involved in neuronal signaling, metabolism, inflammation, and blood function, and they overlap with those in leukocytes from peripheral blood. The homology between genomic signatures from blood and brain elicited by TBI provides proof of concept information for development of biomarkers of TBI based on composite genomic patterns. By intersecting with human genome-wide association studies, many TBI signature genes and network regulators identified in our rodent model were causally associated with brain disorders with relevant link to TBI. The overall results show that concussive brain injury reprograms genes which could lead to predisposition to neurological and psychiatric disorders, and that genomic information from peripheral leukocytes has the potential to predict TBI pathogenesis in the brain.

  7. Traumatic Brain Injury Induces Genome-Wide Transcriptomic, Methylomic, and Network Perturbations in Brain and Blood Predicting Neurological Disorders.

    Science.gov (United States)

    Meng, Qingying; Zhuang, Yumei; Ying, Zhe; Agrawal, Rahul; Yang, Xia; Gomez-Pinilla, Fernando

    2017-02-01

    The complexity of the traumatic brain injury (TBI) pathology, particularly concussive injury, is a serious obstacle for diagnosis, treatment, and long-term prognosis. Here we utilize modern systems biology in a rodent model of concussive injury to gain a thorough view of the impact of TBI on fundamental aspects of gene regulation, which have the potential to drive or alter the course of the TBI pathology. TBI perturbed epigenomic programming, transcriptional activities (expression level and alternative splicing), and the organization of genes in networks centered around genes such as Anax2, Ogn, and Fmod. Transcriptomic signatures in the hippocampus are involved in neuronal signaling, metabolism, inflammation, and blood function, and they overlap with those in leukocytes from peripheral blood. The homology between genomic signatures from blood and brain elicited by TBI provides proof of concept information for development of biomarkers of TBI based on composite genomic patterns. By intersecting with human genome-wide association studies, many TBI signature genes and network regulators identified in our rodent model were causally associated with brain disorders with relevant link to TBI. The overall results show that concussive brain injury reprograms genes which could lead to predisposition to neurological and psychiatric disorders, and that genomic information from peripheral leukocytes has the potential to predict TBI pathogenesis in the brain.

  8. Comparative genomics of the Type VI secretion systems of Pantoea and Erwinia species reveals the presence of putative effector islands that may be translocated by the VgrG and Hcp proteins.

    Science.gov (United States)

    De Maayer, Pieter; Venter, Stephanus N; Kamber, Tim; Duffy, Brion; Coutinho, Teresa A; Smits, Theo H M

    2011-11-24

    The Type VI secretion apparatus is assembled by a conserved set of proteins encoded within a distinct locus. The putative effector proteins Hcp and VgrG are also encoded within these loci. We have identified numerous distinct Type VI secretion system (T6SS) loci in the genomes of several ecologically diverse Pantoea and Erwinia species and detected the presence of putative effector islands associated with the hcp and vgrG genes. Between two and four T6SS loci occur among the Pantoea and Erwinia species. While two of the loci (T6SS-1 and T6SS-2) are well conserved among the various strains, the third (T6SS-3) locus is not universally distributed. Additional orthologous loci are present in Pantoea sp. aB-valens and Erwinia billingiae Eb661. Comparative analysis of the T6SS-1 and T6SS-3 loci showed non-conserved islands associated with the vgrG and hcp, and vgrG genes, respectively. These regions had a G+C content far lower than the conserved portions of the loci. Many of the proteins encoded within the hcp and vgrG islands carry conserved domains, which suggests they may serve as effector proteins for the T6SS. A number of the proteins also show homology to the C-terminal extensions of evolved VgrG proteins. Extensive diversity was observed in the number and content of the T6SS loci among the Pantoea and Erwinia species. Genomic islands could be observed within some of T6SS loci, which are associated with the hcp and vgrG proteins and carry putative effector domain proteins. We propose new hypotheses concerning a role for these islands in the acquisition of T6SS effectors and the development of novel evolved VgrG and Hcp proteins.
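
The observation that the island regions have a lower G+C content than the conserved portions of the loci suggests a simple windowed scan. The sketch below is only an illustration of that idea, not the method used in the study: the window size, step, and `drop` threshold are hypothetical parameters, and the toy sequence is invented.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def low_gc_windows(seq, window, step, drop):
    """Return (start, gc) for windows whose G+C fraction lies at least
    `drop` below the G+C fraction of the whole sequence."""
    overall = gc_content(seq)
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if overall - gc >= drop:
            hits.append((start, gc))
    return hits

# Invented toy locus: a G+C-rich backbone with an A+T-rich insert.
locus = "GC" * 10 + "AT" * 5 + "GC" * 10
print(low_gc_windows(locus, window=10, step=10, drop=0.5))
```

On real loci the window would span hundreds or thousands of bases, and a compositional deviation alone would be weak evidence that would need support from comparative analysis such as that described in the abstract.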

  9. Comparative genomics of the type VI secretion systems of Pantoea and Erwinia species reveals the presence of putative effector islands that may be translocated by the VgrG and Hcp proteins

    Science.gov (United States)

    2011-01-01

    Background The Type VI secretion apparatus is assembled by a conserved set of proteins encoded within a distinct locus. The putative effector proteins Hcp and VgrG are also encoded within these loci. We have identified numerous distinct Type VI secretion system (T6SS) loci in the genomes of several ecologically diverse Pantoea and Erwinia species and detected the presence of putative effector islands associated with the hcp and vgrG genes. Results Between two and four T6SS loci occur among the Pantoea and Erwinia species. While two of the loci (T6SS-1 and T6SS-2) are well conserved among the various strains, the third (T6SS-3) locus is not universally distributed. Additional orthologous loci are present in Pantoea sp. aB-valens and Erwinia billingiae Eb661. Comparative analysis of the T6SS-1 and T6SS-3 loci showed non-conserved islands associated with the vgrG and hcp, and vgrG genes, respectively. These regions had a G+C content far lower than the conserved portions of the loci. Many of the proteins encoded within the hcp and vgrG islands carry conserved domains, which suggests they may serve as effector proteins for the T6SS. A number of the proteins also show homology to the C-terminal extensions of evolved VgrG proteins. Conclusions Extensive diversity was observed in the number and content of the T6SS loci among the Pantoea and Erwinia species. Genomic islands could be observed within some of T6SS loci, which are associated with the hcp and vgrG proteins and carry putative effector domain proteins. We propose new hypotheses concerning a role for these islands in the acquisition of T6SS effectors and the development of novel evolved VgrG and Hcp proteins. PMID:22115407

  10. Comparative genomics of the type VI secretion systems of Pantoea and Erwinia species reveals the presence of putative effector islands that may be translocated by the VgrG and Hcp proteins

    Directory of Open Access Journals (Sweden)

    De Maayer Pieter

    2011-11-01

    Background The Type VI secretion apparatus is assembled by a conserved set of proteins encoded within a distinct locus. The putative effector proteins Hcp and VgrG are also encoded within these loci. We have identified numerous distinct Type VI secretion system (T6SS) loci in the genomes of several ecologically diverse Pantoea and Erwinia species and detected the presence of putative effector islands associated with the hcp and vgrG genes. Results Between two and four T6SS loci occur among the Pantoea and Erwinia species. While two of the loci (T6SS-1 and T6SS-2) are well conserved among the various strains, the third (T6SS-3) locus is not universally distributed. Additional orthologous loci are present in Pantoea sp. aB-valens and Erwinia billingiae Eb661. Comparative analysis of the T6SS-1 and T6SS-3 loci showed non-conserved islands associated with the vgrG and hcp, and vgrG genes, respectively. These regions had a G+C content far lower than the conserved portions of the loci. Many of the proteins encoded within the hcp and vgrG islands carry conserved domains, which suggests they may serve as effector proteins for the T6SS. A number of the proteins also show homology to the C-terminal extensions of evolved VgrG proteins. Conclusions Extensive diversity was observed in the number and content of the T6SS loci among the Pantoea and Erwinia species. Genomic islands could be observed within some of T6SS loci, which are associated with the hcp and vgrG proteins and carry putative effector domain proteins. We propose new hypotheses concerning a role for these islands in the acquisition of T6SS effectors and the development of novel evolved VgrG and Hcp proteins.

  11. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    Science.gov (United States)

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. 
    We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of

  12. DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale

    Science.gov (United States)

    Zhou, Tianyin; Yang, Lin; Lu, Yan; Dror, Iris; Dantas Machado, Ana Carolina; Ghane, Tahereh; Di Felice, Rosa; Rohs, Remo

    2013-01-01

    We present a method and web server for predicting DNA structural features in a high-throughput (HT) manner for massive sequence data. This approach provides the framework for the integration of DNA sequence and shape analyses in genome-wide studies. The HT methodology uses a sliding-window approach to mine DNA structural information obtained from Monte Carlo simulations. It requires only nucleotide sequence as input and instantly predicts multiple structural features of DNA (minor groove width, roll, propeller twist and helix twist). The results of rigorous validations of the HT predictions based on DNA structures solved by X-ray crystallography and NMR spectroscopy, hydroxyl radical cleavage data, statistical analysis and cross-validation, and molecular dynamics simulations provide strong confidence in this approach. The DNAshape web server is freely available at http://rohslab.cmb.usc.edu/DNAshape/. PMID:23703209
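
A sliding-window lookup of the kind described above can be sketched as follows. This is a rough illustration under stated assumptions, not the DNAshape implementation: the window length `k=5`, the function name, and the table values are hypothetical placeholders rather than real structural measurements.

```python
def sliding_window_feature(seq, table, k=5):
    """Assign to each position the table value of the k-mer centered on it.

    Positions too close to either end to have a full window, and k-mers
    missing from the table, get None. `k` is assumed to be odd.
    """
    seq = seq.upper()
    half = k // 2
    values = [None] * len(seq)
    for i in range(half, len(seq) - half):
        kmer = seq[i - half:i + half + 1]
        values[i] = table.get(kmer)
    return values

# Placeholder table: invented numbers standing in for a structural
# feature such as minor groove width; a real table would cover all k-mers.
toy_table = {"AAAAA": 1.0, "AAAAT": 2.0}
print(sliding_window_feature("AAAAAT", toy_table))
```

Because each position needs only one dictionary lookup, the cost grows linearly with sequence length, which is consistent with the high-throughput use the abstract describes.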

  13. Sequence-based characterization of Tn5801-like genomic islands in tetracycline-resistant Staphylococcus pseudintermedius and other Gram-positive bacteria from humans and animals

    Directory of Open Access Journals (Sweden)

    Lisbeth Elvira De Vries

    2016-04-01

    Antibiotic resistance in pathogens is often associated with mobile genetic elements, such as genomic islands (GI) including integrative and conjugative elements (ICEs). These can transfer resistance genes within and between bacteria from humans and/or animals. The aim of this study was to investigate whether Tn5801-like GIs carrying the tetracycline resistance gene, tet(M), are common in Staphylococcus pseudintermedius from pets, and to do an overall sequences-based characterization of Tn5801-like GIs detected in Gram-positive bacteria from humans and animals. A total of 27 tetracycline-resistant S. pseudintermedius isolates from Danish pets (1998-2005) were screened for tet(M) by PCR. Selected isolates (13) were screened for GI- or ICE-specific genes (intTn5801 or xisTn916) and their tet(M) gene was sequenced (Sanger-method). Long-range PCR mappings and whole-genome-sequencing (Illumina) were performed for selected S. pseudintermedius-isolates (7 and 3 isolates, respectively) as well as for human Staphylococcus aureus isolates (7 and 1 isolates, respectively) and one porcine Enterococcus faecium isolate known to carry Tn5801-like GIs. All 27 S. pseudintermedius were positive for tet(M). Out of 13 selected isolates, 7 contained Tn5801-like GIs and 6 contained Tn916-like ICEs. Two different Tn5801-like GI types were detected among S. pseudintermedius (Tn5801 and GI6287) - both showed high similarity compared to GenBank sequences from human pathogens. Two distinct Tn5801-like GI types were detected among the porcine E. faecium and human S. aureus isolates (Tn6014 and GI6288). Tn5801-like GIs were detected in GenBank-sequences from Gram-positive bacteria of human, animal or food origin worldwide. Known Tn5801-like GIs were divided into 7 types. The results showed that Tn5801-like GIs appear to be relatively common in tetracycline-resistant S. pseudintermedius in Denmark. Almost identical Tn5801-like GIs were identified in different Gram-positive species of pet

  14. Sequence-Based Characterization of Tn5801-Like Genomic Islands in Tetracycline-Resistant Staphylococcus pseudintermedius and Other Gram-positive Bacteria from Humans and Animals.

    Science.gov (United States)

    de Vries, Lisbeth E; Hasman, Henrik; Jurado Rabadán, Sonia; Agersø, Yvonne

    2016-01-01

    Antibiotic resistance in pathogens is often associated with mobile genetic elements, such as genomic islands (GI) including integrative and conjugative elements (ICEs). These can transfer resistance genes within and between bacteria from humans and/or animals. The aim of this study was to investigate whether Tn5801-like GIs carrying the tetracycline resistance gene, tet(M), are common in Staphylococcus pseudintermedius from pets, and to do an overall sequences-based characterization of Tn5801-like GIs detected in Gram-positive bacteria from humans and animals. A total of 27 tetracycline-resistant S. pseudintermedius isolates from Danish pets (1998-2005) were screened for tet(M) by PCR. Selected isolates (13) were screened for GI- or ICE-specific genes (int Tn5801 or xis Tn916 ) and their tet(M) gene was sequenced (Sanger-method). Long-range PCR mappings and whole-genome-sequencing (Illumina) were performed for selected S. pseudintermedius-isolates (seven and three isolates, respectively) as well as for human S. aureus isolates (seven and one isolates, respectively) and one porcine Enterococcus faecium isolate known to carry Tn5801-like GIs. All 27 S. pseudintermedius were positive for tet(M). Out of 13 selected isolates, seven contained Tn5801-like GIs and six contained Tn916-like ICEs. Two different Tn5801-like GI types were detected among S. pseudintermedius (Tn5801 and GI6287) - both showed high similarity compared to GenBank sequences from human pathogens. Two distinct Tn5801-like GI types were detected among the porcine E. faecium and human S. aureus isolates (Tn6014 and GI6288). Tn5801-like GIs were detected in GenBank-sequences from Gram-positive bacteria of human, animal or food origin worldwide. Known Tn5801-like GIs were divided into seven types. The results showed that Tn5801-like GIs appear to be relatively common in tetracycline-resistant S. pseudintermedius in Denmark. 
    Almost identical Tn5801-like GIs were identified in different Gram-positive species

  15. Motif-independent prediction of a secondary metabolism gene cluster using comparative genomics: application to sequenced genomes of Aspergillus and ten other filamentous fungal species.

    Science.gov (United States)

    Takeda, Itaru; Umemura, Myco; Koike, Hideaki; Asai, Kiyoshi; Machida, Masayuki

    2014-08-01

    Despite their biological importance, a significant number of genes for secondary metabolite biosynthesis (SMB) remain undetected due largely to the fact that they are highly diverse and are not expressed under a variety of cultivation conditions. Several software tools including SMURF and antiSMASH have been developed to predict fungal SMB gene clusters by finding core genes encoding polyketide synthase, nonribosomal peptide synthetase and dimethylallyltryptophan synthase as well as several others typically present in the cluster. In this work, we have devised a novel comparative genomics method to identify SMB gene clusters that is independent of motif information of the known SMB genes. The method detects SMB gene clusters by searching for a similar order of genes and their presence in nonsyntenic blocks. With this method, we were able to identify many known SMB gene clusters with the core genes in the genomic sequences of 10 filamentous fungi. Furthermore, we have also detected SMB gene clusters without core genes, including the kojic acid biosynthesis gene cluster of Aspergillus oryzae. By varying the detection parameters of the method, a significant difference in the sequence characteristics was detected between the genes residing inside the clusters and those outside the clusters. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  16. Pan-Genome Analysis of Human Gastric Pathogen H. pylori: Comparative Genomics and Pathogenomics Approaches to Identify Regions Associated with Pathogenicity and Prediction of Potential Core Therapeutic Targets

    DEFF Research Database (Denmark)

    Ali, Amjad; Naz, Anam; Soares, Siomar C.

    2015-01-01

    . Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan-genome...

  17. Genetic analysis of environmental strains of the plant pathogen Phytophthora capsici reveals heterogeneous repertoire of effectors and possible effector evolution via genomic island.

    Science.gov (United States)

    Iribarren, María Josefina; Pascuan, Cecilia; Soto, Gabriela; Ayub, Nicolás Daniel

    2015-11-01

    Phytophthora capsici is a virulent oomycete pathogen of many vegetable crops. Recently, it has been demonstrated that the recognition of the RXLR effector AVR3a1 of P. capsici (PcAVR3a1) triggers a hypersensitive response and plays a critical role in mediating non-host resistance. Here, we analyzed the occurrence of PcAVR3a1 in 57 isolates of P. capsici derived from globe squash, eggplant, tomato and bell pepper cocultivated in a small geographical area. The occurrence of PcAVR3a1 in environmental strains of P. capsici was confirmed by PCR in only 21 of these pathogen isolates. To understand the presence-absence pattern of PcAVR3a1 in environmental strains, the flanking region of this gene was sequenced. PcAVR3a1 was found within a genetic element that we named PcAVR3a1-GI (PcAVR3a1 genomic island). PcAVR3a1-GI was flanked by a 22-bp direct repeat, which is related to its site-specific recombination site. In addition to the PcAVR3a1 gene, PcAVR3a1-GI also encoded a phage integrase probably associated with the excision and integration of this mobile element. Exposure to plant induced the presence of an episomal circular intermediate of PcAVR3a1-GI, indicating that this mobile element is functional. Collectively, these findings provide evidence of PcAVR3a1 evolution via mobile elements in environmental strains of Phytophthora.

  18. Assimilation of Remotely Sensed Data into the Biome-BGC Ecosystem Model to Improve the Prediction of Energy and Carbon Exchange in Southwestern Mountain Island Forests

    Science.gov (United States)

    Brown-Mitic, C. M.; Burke, E. J.; Shuttleworth, W. J.; Petti, J. R.; Harlow, R. C.; Brooks, P. D.

    2003-12-01

    Mountain island forest ecosystems populate high-altitude areas in the semi-arid southwestern U.S. and northwestern Mexico. In these regions, on average, the precipitation input exceeds the evapotranspiration loss. Therefore, they represent the primary source area for sustainable water resources, as well as making a major contribution to the regional carbon balance. During 2002 a 30-m tall micrometeorological tower was installed at a sky island forest site in the Santa Catalina Mountains, near Tucson AZ in order to characterize the surface exchanges of water, energy, and carbon. Measurements to date indicate that the surface fluxes are very sensitive to water status, for example, stomatal functioning totally closed down during the pre-monsoon period in 2002. Initial studies with the Biome Biogeochemical model (Biome-BGC) using driving data measured in the Santa Catalina Mountains provide reasonable simulations of the behavior of the mountain island forest. In order to obtain a regional estimate of the carbon exchange using a model such as Biome-BGC accurate estimates of the distributed forcing data particularly the precipitation are required. Alternatively, an estimate of the soil water status and the leaf area index can be used to improve model predictions. TERRA-MODIS leaf area index and fraction of photosynthetically active radiation (FPAR) are both sensitive to drought stress. Time periods during which there is a marked and sustained decrease in FPAR are shown to be symptomatic of a sustained period of low soil moisture that started 10-20 days earlier. Under these conditions the soil moisture status in Biome-BGC can be set to an arbitrary low value. Initial modeling studies demonstrate significant improvement in the prediction of surface exchange when using FPAR to diagnose periods of water stress.

  19. Computational methods using genome-wide association studies to predict radiotherapy complications and to identify correlative molecular processes

    Science.gov (United States)

    Oh, Jung Hun; Kerns, Sarah; Ostrer, Harry; Powell, Simon N.; Rosenstein, Barry; Deasy, Joseph O.

    2017-02-01

    The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method results in better predictive performance compared with existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis are used to identify key biological processes and proteins that were plausible based on other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs), yielding clinically useful risk stratification models, as well as identifying important underlying biological processes in the radiation damage and tissue repair process. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.

  20. Gene Ontology consistent protein function prediction: the FALCON algorithm applied to six eukaryotic genomes

    NARCIS (Netherlands)

    Kourmpetis, Y.A.I.; Dijk, van A.D.J.; Braak, ter C.J.F.

    2013-01-01

    Gene Ontology (GO) is a hierarchical vocabulary for the description of biological functions and locations, often employed by computational methods for protein function prediction. Due to the structure of GO, function predictions can be self- contradictory. For example, a protein may be predicted to

  1. Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers

    DEFF Research Database (Denmark)

    Su, Guosheng; Christensen, Ole Fredslund; Ostersen, Tage;

    2012-01-01

    Non-additive genetic variation is usually ignored when genome-wide markers are used to study the genetic architecture and genomic prediction of complex traits in human, wild life, model organisms or farm animals. However, non-additive genetic effects may have an important contribution to total...... genetic variation of complex traits. This study presented a genomic BLUP model including additive and non-additive genetic effects, in which additive and non-additive genetic relation matrices were constructed from information of genome-wide dense single nucleotide polymorphism (SNP) markers. In addition...... of genomic predictions for daily gain in pigs. In the analysis of daily gain, four linear models were used: 1) a simple additive genetic model (MA), 2) a model including both additive and additive by additive epistatic genetic effects (MAE), 3) a model including both additive and dominance genetic effects...

  2. Comparison on genomic predictions using GBLUP models and two single-step blending methods with different relationship matrices in the Nordic Holstein population

    DEFF Research Database (Denmark)

    Gao, Hongding; Christensen, Ole Fredslund; Madsen, Per

    2012-01-01

    Background A single-step blending approach allows genomic prediction using information of genotyped and non-genotyped animals simultaneously. However, the combined relationship matrix in a single-step method may need to be adjusted because marker-based and pedigree-based relationship matrices may...... not be on the same scale. The same may apply when a GBLUP model includes both genomic breeding values and residual polygenic effects. The objective of this study was to compare single-step blending methods and GBLUP methods with and without adjustment of the genomic relationship matrix for genomic prediction of 16......) a simple GBLUP method, 2) a GBLUP method with a polygenic effect, 3) an adjusted GBLUP method with a polygenic effect, 4) a single-step blending method, and 5) an adjusted single-step blending method. In the adjusted GBLUP and single-step methods, the genomic relationship matrix was adjusted...

  3. Best linear unbiased prediction of genomic breeding values using a trait-specific marker-derived relationship matrix.

    Directory of Open Access Journals (Sweden)

    Zhe Zhang

    BACKGROUND: With the availability of high density whole-genome single nucleotide polymorphism chips, genomic selection has become a promising method to estimate genetic merit with potentially high accuracy for animal, plant and aquaculture species of economic importance. With markers covering the entire genome, genetic merit of genotyped individuals can be predicted directly within the framework of mixed model equations, by using a matrix of relationships among individuals that is derived from the markers. Here we extend that approach by deriving a marker-based relationship matrix specifically for the trait of interest. METHODOLOGY/PRINCIPAL FINDINGS: In the framework of mixed model equations, a new best linear unbiased prediction (BLUP) method including a trait-specific relationship matrix (TA) was presented and termed TABLUP. The TA matrix was constructed on the basis of marker genotypes and their weights in relation to the trait of interest. A simulation study with 1,000 individuals as the training population and five successive generations as candidate population was carried out to validate the proposed method. The proposed TABLUP method outperformed the ridge regression BLUP (RRBLUP) and BLUP with realized relationship matrix (GBLUP). It performed slightly worse than BayesB with an accuracy of 0.79 in the standard scenario. CONCLUSIONS/SIGNIFICANCE: The proposed TABLUP method is an improvement of the RRBLUP and GBLUP method. It might be equivalent to the BayesB method but it has additional benefits like the calculation of accuracies for individual breeding values. The results also showed that the TA-matrix performs better in predicting ability than the classical numerator relationship matrix and the realized relationship matrix which are derived solely from pedigree or markers without regard to the trait. This is because the TA-matrix not only accounts for the Mendelian sampling term, but also puts the greater emphasis on those markers that
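
The contrast between a marker-derived relationship matrix and one that weights markers by their relevance to a trait can be sketched as below. This is an assumption-laden illustration, not the TABLUP formula from the paper: the unweighted matrix follows the common centered-genotype construction, the weighting scheme and normalization are hypothetical choices, and the genotypes and weights are invented.

```python
import numpy as np

def centered_genotypes(M):
    """Center a 0/1/2 genotype matrix (individuals x markers) by 2p."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0          # allele frequency per marker
    return M - 2.0 * p, p

def genomic_relationship(M):
    """Unweighted marker-based relationship matrix."""
    Z, p = centered_genotypes(M)
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def weighted_relationship(M, weights):
    """Relationship matrix with per-marker weights (assumed scheme)."""
    Z, p = centered_genotypes(M)
    w = np.asarray(weights, dtype=float)
    return (Z * w) @ Z.T / (2.0 * np.sum(w * p * (1.0 - p)))

# Invented genotypes for four individuals at three markers.
M = [[0, 1, 2],
     [1, 1, 0],
     [2, 0, 1],
     [1, 2, 1]]
print(genomic_relationship(M))
print(weighted_relationship(M, weights=[3.0, 1.0, 0.5]))
```

With equal weights the two matrices coincide, so the weighted form can be read as a generalization in which markers judged more relevant to the trait contribute more to the relationships.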

  4. A unified and comprehensible view of parametric and kernel methods for genomic prediction with application to rice

    Directory of Open Access Journals (Sweden)

    Laval Jacquin

    2016-08-01

    One objective of this study was to provide readers with a clear and unified understanding of parametric statistical and kernel methods, used for genomic prediction, and to compare some of these in the context of rice breeding for quantitative traits. Furthermore, another objective was to provide a simple and user-friendly R package, named KRMM, which allows users to perform RKHS regression with several kernels. After introducing the concept of regularized empirical risk minimization, the connections between well-known parametric and kernel methods such as Ridge regression (i.e. genomic best linear unbiased predictor (GBLUP)) and reproducing kernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulated so as to show and emphasize the advantage of the kernel trick concept, exploited by kernel methods in the context of epistatic genetic architectures, over parametric frameworks used by conventional methods. Some parametric and kernel methods, namely least absolute shrinkage and selection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHS regression, were thereupon compared for their genomic predictive ability in the context of rice breeding using three real data sets. Among the compared methods, RKHS regression and SVR were often the most accurate methods for prediction followed by GBLUP and LASSO. An R function which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression, with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time has been developed. Moreover, a modified version of this function, which allows users to tune kernels for RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
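
The link between ridge regression and kernel methods mentioned above can be checked numerically. The sketch below is a generic illustration in Python with NumPy, not the KRMM package or its interface: the function names, the simulated marker matrix, and the values of `lam` and `bandwidth` are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X1, X2, bandwidth):
    """Gaussian kernel matrix between the rows of X1 and the rows of X2."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def kernel_ridge_fit(K, y, lam):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def kernel_ridge_predict(K_new, alpha):
    """Predictions from kernel values between new and training rows."""
    return K_new @ alpha

# Simulated data: 8 individuals, 20 markers (more markers than individuals).
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(8, 20)).astype(float)
y = rng.normal(size=8)
lam = 1.0

# Ridge regression solved on marker effects (a 20 x 20 system) ...
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
# ... and the same fit through the linear kernel (an 8 x 8 system).
alpha = kernel_ridge_fit(X @ X.T, y, lam)
print(np.allclose(X @ beta, kernel_ridge_predict(X @ X.T, alpha)))

# Replacing the linear kernel with a Gaussian one gives a nonlinear fit.
K = gaussian_kernel(X, X, bandwidth=5.0)
fitted = kernel_ridge_predict(K, kernel_ridge_fit(K, y, lam))
```

The dual form only ever touches an n-by-n kernel matrix, which is why swapping in a different kernel changes the model without changing the fitting code.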

  5. Accuracy of genomic prediction of purebreds for cross bred performance in pigs

    NARCIS (Netherlands)

    Marubayashi Hidalgo, Andre; Bastiaansen, J.W.M.; Lopes, M.S.; Calus, M.P.L.; Koning, de D.J.

    2016-01-01

    In pig breeding, as the final product is a cross bred (CB) animal, the goal is to increase the CB performance. This goal requires different strategies for the implementation of genomic selection from what is currently implemented in, for example dairy cattle breeding. A good strategy is to estima

  6. Bias due to selective genotyping in genomic prediction using H-BLUP

    DEFF Research Database (Denmark)

    Wang, Lei; Madsen, Per; Sapp, Robyn

    H-BLUP uses a variance-covariance structure based on a combined relationship matrix (H), which augments a pedigree-based relationship matrix (A) with a genomic relationship matrix (G) for genotyped individuals. In practice, often only preselected individuals are genotyped and this selective genot...

  7. Gross genomic damage measured by DNA image cytometry independently predicts gastric cancer patient survival

    NARCIS (Netherlands)

    Belien, J.A.M.; Buffart, T.E.; Gill, A.; Broeckaert, M.A.M.; Quirke, P.; Meijer, G.A.; Grabsch, H.

    2009-01-01

    BACKGROUND: DNA aneuploidy reflects gross genomic changes. It can be measured by flow cytometry (FCM-DNA) or image cytometry (ICM-DNA). In gastric cancer, the prevalence of DNA aneuploidy has been reported to range from 27 to 100%, with conflicting associations with clinicopathological variables. Th

  8. Prediction of total genetic value using genome-wide dense marker maps

    NARCIS (Netherlands)

    Meuwissen, T.H.; Hayes, B.J.; Goddard, M.E.

    2001-01-01

    Recent advances in molecular genetic techniques will make dense marker maps available and genotyping many individuals for these markers feasible. Here we attempted to estimate the effects of ∼50,000 marker haplotypes simultaneously from a limited number of phenotypic records. A genome of 1000 cM was

  9. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sukosd, Zsuzsanna; Andersen, Ebbe S.; Seemann, Stefan E.

    2015-01-01

    protein-coding regions the COS is supported by a particular high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure with potential...

  10. The role of genomics in the identification, prediction, and prevention of biological threats.

    Science.gov (United States)

    Fricke, W Florian; Rasko, David A; Ravel, Jacques

    2009-10-01

    In all likelihood, it is only a matter of time before our public health system will face a major biological threat, whether intentionally dispersed or originating from a known or newly emerging infectious disease. It is necessary not only to increase our reactive "biodefense," but also to be proactive and increase our preparedness. To achieve this goal, it is essential that the scientific and public health communities fully embrace the genomic revolution, and that novel bioinformatic and computing tools necessary to make great strides in our understanding of these novel and emerging threats be developed. Genomics has graduated from a specialized field of science to a research tool that soon will be routine in research laboratories and clinical settings. Because the technology is becoming more affordable, genomics can and should be used proactively to build our preparedness and responsiveness to biological threats. All pieces, including major continued funding, advances in next-generation sequencing technologies, bioinformatics infrastructures, and open access to data and metadata, are being set in place for genomics to play a central role in our public health system.

  11. The role of genomics in the identification, prediction, and prevention of biological threats.

    Directory of Open Access Journals (Sweden)

    W Florian Fricke

    2009-10-01

    In all likelihood, it is only a matter of time before our public health system will face a major biological threat, whether intentionally dispersed or originating from a known or newly emerging infectious disease. It is necessary not only to increase our reactive "biodefense," but also to be proactive and increase our preparedness. To achieve this goal, it is essential that the scientific and public health communities fully embrace the genomic revolution, and that novel bioinformatic and computing tools necessary to make great strides in our understanding of these novel and emerging threats be developed. Genomics has graduated from a specialized field of science to a research tool that soon will be routine in research laboratories and clinical settings. Because the technology is becoming more affordable, genomics can and should be used proactively to build our preparedness and responsiveness to biological threats. All pieces, including major continued funding, advances in next-generation sequencing technologies, bioinformatics infrastructures, and open access to data and metadata, are being set in place for genomics to play a central role in our public health system.

  12. Current theoretical models fail to predict the topological complexity of the human genome.

    Science.gov (United States)

    Arsuaga, Javier; Jayasinghe, Reyka G; Scharein, Robert G; Segal, Mark R; Stolz, Robert H; Vazquez, Mariel

    2015-01-01

    Understanding the folding of the human genome is a key challenge of modern structural biology. The emergence of chromatin conformation capture assays (e.g., Hi-C) has revolutionized chromosome biology and provided new insights into the three dimensional structure of the genome. The experimental data are highly complex and need to be analyzed with quantitative tools. It has been argued that the data obtained from Hi-C assays are consistent with a fractal organization of the genome. A key characteristic of the fractal globule is the lack of topological complexity (knotting or inter-linking). However, the absence of topological complexity contradicts results from polymer physics showing that the entanglement of long linear polymers in a confined volume increases rapidly with the length and with decreasing volume. In vivo and in vitro assays support this claim in some biological systems. We simulate knotted lattice polygons confined inside a sphere and demonstrate that their contact frequencies agree with the human Hi-C data. We conclude that the topological complexity of the human genome cannot be inferred from current Hi-C data.

  13. Methods to improve genomic prediction and GWAS using combined Holstein populations

    DEFF Research Database (Denmark)

    Li, Xiujin

    interaction exists between populations. 2) Combining data from Chinese and Danish Holstein populations increases the power of GWAS and detects new QTL regions for milk fatty acid traits. 3) The novel multi-trait Bayesian model efficiently estimates region-specific genomic variances, covariances...

  14. Reliabilities of genomic prediction using combined reference data of the Nordic Red dairy cattle production

    DEFF Research Database (Denmark)

    Brøndum, Rasmus Froberg; Rius-Vilarrasa, E; Strandén, I

    2011-01-01

    This study investigated the possibility of increasing the reliability of direct genomic values (DGV) by combining reference populations. The data were from 3,735 bulls from Danish, Swedish, and Finnish Red dairy cattle populations. Single nucleotide polymorphism markers were fitted as random varia...

  15. CpG-island fragments from the HNRPA2B1/CBX3 genomic locus reduce silencing and enhance transgene expression from the hCMV promoter/enhancer in mammalian cells

    Directory of Open Access Journals (Sweden)

    Irvine Alistair

    2005-06-01

    Background: The hCMV promoter is very commonly used for high level expression of transgenes in mammalian cells, but its utility is hindered by transcriptional silencing. Large genomic fragments incorporating the CpG island region of the HNRPA2B1 locus are resistant to transcriptional silencing. Results: In this report we describe studies on the use of a novel series of vectors combining the HNRPA2B1 CpG island with the hCMV promoter for expression of transgenes in CHO-K1 cells. We show that the CpG island gives at least twenty-fold increases in the levels of EGFP and EPO observed in pools of transfectants, and that transgene expression levels remain high in such pools for more than 100 generations. These novel vectors also allow facile isolation of clonal CHO-K1 cell lines showing stable, high-level transgene expression. Conclusion: Vectors incorporating the hnRPA2B1 CpG island give major benefits in transgene expression from the hCMV promoter, including substantial improvements in the level and stability of expression. The utility of these vectors for the improved production of recombinant proteins in CHO cells has been demonstrated.

  16. Predicted Highly Expressed Genes in the Genomes of Streptomyces Coelicolor and Streptomyces Avermitilis and the Implications for their Metabolism.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Gang; Culley, David E.; Zhang, Weiwen

    2005-06-01

    Highly expressed genes in bacteria often have a stronger codon bias than genes expressed at lower levels. In this study, a comparative analysis of predicted highly expressed (PHX) genes in the Streptomyces coelicolor and S. avermitilis genomes was performed using the codon adaptation index (CAI) as a numerical estimator of gene expression level. Although it has been suggested that there is little heterogeneity in codon usage in G+C rich bacteria, considerable heterogeneity was found among genes in two G+C rich Streptomyces genomes. Using ribosomal protein (RP) genes as references, ~10% of the genes were predicted to be PHX genes using a CAI cutoff value of greater than 0.78 and 0.75 in S. coelicolor and S. avermitilis, respectively. Most of the PHX genes were found to be located within the conserved cores of the Streptomyces linear chromosomes. The predicted PHX genes showed good agreement with the experimental data on expression levels collected by proteomic analysis (Hesketh et al., 2002). Among all PHX genes, 368 were conserved in both genomes. These represented most of the genes essential for cell growth, including those involved in protein and DNA biosynthesis, amino acid metabolism, and central intermediary and energy metabolism. Only a few genes directly involved in biosynthesis of secondary metabolites were predicted to be PHX genes. Correspondence analysis showed that the genes responsible for biosynthesis of secondary metabolites possessed different codon usage patterns from RP genes, suggesting that they were either under strong translational selection that may have driven the codon preference in another direction, or they were acquired by horizontal transfer during their origin and evolution. Nevertheless, several key genes responsible for producing precursors for secondary metabolites, such as crotonyl-CoA reductase and propionyl-CoA carboxylase, and genes necessary for initiation of secondary metabolism, such as adenosylmethionine synthetase were
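
    The CAI-based screening described in this abstract can be sketched in a few lines of Python. This is a minimal illustration of the standard CAI definition (the geometric mean of codon weights, where each weight is a codon's frequency in a reference set relative to the most frequent synonymous codon), not the authors' pipeline. The function names, the 0.5 pseudocount for unobserved codons and the toy sequences are assumptions; the reference set would be the ribosomal protein genes, and the cutoff would be the value reported above for the genome in question.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    The pseudocount avoids zero weights for codons absent from the
    reference set; its value is an assumption of this sketch.
    """
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TABLE)
    weights = {}
    for aa in set(AMINO):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp carry no codon-choice information
        top = max(counts[c] + pseudocount for c in family)
        for c in family:
            weights[c] = (counts[c] + pseudocount) / top
    return weights


def cai(seq, weights):
    """Codon adaptation index: geometric mean of the weights of the codons used."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no informative codons")
    return math.exp(sum(logs) / len(logs))


def is_phx(seq, weights, cutoff):
    """Flag a gene as predicted highly expressed when its CAI exceeds the cutoff."""
    return cai(seq, weights) > cutoff


if __name__ == "__main__":
    reference = ["CTGCTGCTGCTA"]  # toy stand-in for ribosomal protein genes
    w = relative_adaptiveness(reference)
    print(cai("ATGCTGCTA", w), is_phx("ATGCTGCTG", w, cutoff=0.78))
```

    A gene that uses only the codons preferred in the reference set scores 1.0, and each less preferred codon pulls the geometric mean down, which is why a single genome-specific cutoff can separate PHX candidates from the rest.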

  17. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds

    DEFF Research Database (Denmark)

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei

    2017-01-01

    sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased...... regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. RESULTS: Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed...... equally, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than milk production, and several biologically...

  18. Data Mining Strategy for "Gene Prediction" with Special Reference to Cotton Genome

    Institute of Scientific and Technical Information of China (English)

    KSHIRSAGAR Manali; BALASUBRAMANI G; SINGH Col Gurmit

    2008-01-01

    This paper presents an integrated approach towards solving the problem of "Gene Prediction". Solving the "Gene Prediction" problem proceeds through well-defined stages, starting with a DNA sequence as input; lab treatment and computational analysis go hand in hand throughout the process. Many bioinformatics tools are available for analysis at different stages of "Gene Prediction", but a simplified and integrated approach is needed to support and speed up the task of a life scientist.

  19. Paradise Islands? Island States and Environmental Performance

    Directory of Open Access Journals (Sweden)

    Sverker C. Jagers

    2016-03-01

    Island states have been shown to outperform continental states on a number of large-scale coordination-related outcomes, such as levels of democracy and institutional quality. The argument developed and tested in this article contends that the same kind of logic may apply to islands’ environmental performance, too. However, the empirical analysis shows mixed results. Among the 105 environmental outcomes that we analyzed, being an island only has a positive impact on 20 of them. For example, island states tend to outcompete continental states with respect to several indicators related to water quality but not in aspects related to biodiversity, protected areas, or environmental regulations. In addition, the causal factors previously suggested to make islands outperform continental states in terms of coordination have weak explanatory power in predicting islands’ environmental performance. We conclude the paper by discussing how these interesting findings can be further explored.

  20. Evolutionary gradient of predicted nuclear localization signals (NLS)-bearing proteins in genomes of family Planctomycetaceae.

    Science.gov (United States)

    Guo, Min; Yang, Ruifu; Huang, Chen; Liao, Qiwen; Fan, Guangyi; Sun, Chenghang; Lee, Simon Ming-Yuen

    2017-04-04

    The nuclear envelope is considered a key classification marker that distinguishes prokaryotes from eukaryotes. However, this marker does not apply to the family Planctomycetaceae, which has intracellular spaces divided by lipidic intracytoplasmic membranes (ICMs). A nuclear localization signal (NLS), a short stretch of amino acid sequence, directs the transport of proteins from the cytoplasm into the nucleus and is also associated with the development of the nuclear envelope. We attempted to investigate the NLS motifs in Planctomycetaceae genomes to demonstrate the potential molecular transition in the development of the intracellular membrane system. In this study, we identified NLS-like motifs that have the same amino acid compositions as experimentally identified NLSs in genomes of 11 representative species of the family Planctomycetaceae. A total of 15 NLS types and 170 NLS-bearing proteins were detected in the 11 strains. To determine the molecular transformation, we compared NLS-bearing protein abundances in the 11 representative Planctomycetaceae genomes with those in genomes of 16 taxonomically varied microorganisms: nine bacteria, two archaea and five fungi. In the 27 strains, 29 NLS types and 1101 NLS-bearing proteins were identified, and principal component analysis showed a significant transitional gradient from bacteria to Planctomycetaceae to fungi in their NLS-bearing protein abundance profiles. Then, we clustered the 993 non-redundant NLS-bearing proteins into 181 families and annotated the metabolic pathways in which they are involved. Afterwards, we aligned the ten types of NLS motifs from the 13 families containing NLS-bearing proteins among bacteria, Planctomycetaceae or fungi, considering their diversity, length and origin. A transition towards increased complexity from non-planctomycete bacteria to Planctomycetaceae to archaea and fungi was detected based on the complexity of the 10 types of NLS-like motifs in the 13 NLS-bearing protein families. The results of this study reveal that