WorldWideScience

Sample records for accurate structural genome

  1. SV2: accurate structural variation genotyping and de novo mutation detection from whole genomes.

    Science.gov (United States)

    Antaki, Danny; Brandler, William M; Sebat, Jonathan

    2018-05-15

    Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here, we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). jsebat@ucsd.edu. Supplementary data are available at Bioinformatics online.

  2. Identification of transcriptional signals in Encephalitozoon cuniculi widespread among Microsporidia phylum: support for accurate structural genome annotation

    Directory of Open Access Journals (Sweden)

    Wincker Patrick

    2009-12-01

    , 5'UTRs being strongly reduced, these signals can be used to ensure the accurate prediction of translation initiation codons for microsporidian genes and to improve microsporidian genome annotation.

  3. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Science.gov (United States)

    Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E

    2006-01-01

    Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy observed in the GS 20 plastid

  4. Rapid and accurate pyrosequencing of angiosperm plastid genomes

    Directory of Open Access Journals (Sweden)

    Farmerie William G

    2006-08-01

    Full Text Available Abstract Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20 System (454 Life Sciences Corporation, to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae and Platanus occidentalis (Platanaceae. Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. More importantly, the high accuracy

  5. Dense and accurate whole-chromosome haplotyping of individual genomes

    NARCIS (Netherlands)

    Porubsky, David; Garg, Shilpa; Sanders, Ashley D.; Korbel, Jan O.; Guryev, Victor; Lansdorp, Peter M.; Marschall, Tobias

    2017-01-01

    The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate

  6. Towards accurate de novo assembly for genomes with repeats

    NARCIS (Netherlands)

    Bucur, Doina

    2017-01-01

    De novo genome assemblers designed for short k-mer length or using short raw reads are unlikely to recover complex features of the underlying genome, such as repeats hundreds of bases long. We implement a stochastic machine-learning method which obtains accurate assemblies with repeats and

  7. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner

    DEFF Research Database (Denmark)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan

    2009-01-01

    MOTIVATION: The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary...... determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than...

  8. Insights into structural variations and genome rearrangements in prokaryotic genomes.

    Science.gov (United States)

    Periwal, Vinita; Scaria, Vinod

    2015-01-01

    Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

    Science.gov (United States)

    Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R

    2009-07-01

    The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/

  10. Informational laws of genome structures

    Science.gov (United States)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, to which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.

  11. Structural genomics in endocrinology

    NARCIS (Netherlands)

    Smit, J. W.; Romijn, J. A.

    2001-01-01

    Traditionally, endocrine research evolved from the phenotypical characterisation of endocrine disorders to the identification of underlying molecular pathophysiology. This approach has been, and still is, extremely successful. The introduction of genomics and proteomics has resulted in a reversal of

  12. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

    Science.gov (United States)

    Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter

    2018-01-01

    Abstract Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also

  13. Functional Insights from Structural Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Forouhar,F.; Kuzin, A.; Seetharaman, J.; Lee, I.; Zhou, W.; Abashidze, M.; Chen, Y.; Montelione, G.; Tong, L.; et al

    2007-01-01

    Structural genomics efforts have produced structural information, either directly or by modeling, for thousands of proteins over the past few years. While many of these proteins have known functions, a large percentage of them have not been characterized at the functional level. The structural information has provided valuable functional insights on some of these proteins, through careful structural analyses, serendipity, and structure-guided functional screening. Some of the success stories based on structures solved at the Northeast Structural Genomics Consortium (NESG) are reported here. These include a novel methyl salicylate esterase with important role in plant innate immunity, a novel RNA methyltransferase (H. influenzae yggJ (HI0303)), a novel spermidine/spermine N-acetyltransferase (B. subtilis PaiA), a novel methyltransferase or AdoMet binding protein (A. fulgidus AF{_}0241), an ATP:cob(I)alamin adenosyltransferase (B. subtilis YvqK), a novel carboxysome pore (E. coli EutN), a proline racemase homolog with a disrupted active site (B. melitensis BME11586), an FMN-dependent enzyme (S. pneumoniae SP{_}1951), and a 12-stranded {beta}-barrel with a novel fold (V. parahaemolyticus VPA1032).

  14. 2004 Structural, Function and Evolutionary Genomics

    Energy Technology Data Exchange (ETDEWEB)

    Douglas L. Brutlag Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  15. LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs

    DEFF Research Database (Denmark)

    Will, Sebastian; Joshi, Tejal; Hofacker, Ivo L.

    2012-01-01

    Current genomic screens for noncoding RNAs (ncRNAs) predict a large number of genomic regions containing potential structural ncRNAs. The analysis of these data requires highly accurate prediction of ncRNA boundaries and discrimination of promising candidate ncRNAs from weak predictions. Existing...... methods struggle with these goals because they rely on sequence-based multiple sequence alignments, which regularly misalign RNA structure and therefore do not support identification of structural similarities. To overcome this limitation, we compute columnwise and global reliabilities of alignments based...... on sequence and structure similarity; we refer to these structure-based alignment reliabilities as STARs. The columnwise STARs of alignments, or STAR profiles, provide a versatile tool for the manual and automatic analysis of ncRNAs. In particular, we improve the boundary prediction of the widely used nc...

  16. Accurate estimation of short read mapping quality for next-generation genome sequencing

    Science.gov (United States)

    Ruffalo, Matthew; Koyutürk, Mehmet; Ray, Soumya; LaFramboise, Thomas

    2012-01-01

    Motivation: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment—in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy and the qualities of many mappings are underestimated, encouraging the researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single nucleotide and structural), and such mappings are important in accurately identifying genomic variants. Approach: We develop a machine learning tool, LoQuM (LOgistic regression tool for calibrating the Quality of short read mappings, to assign reliable mapping quality scores to mappings of Illumina reads returned by any alignment tool. LoQuM uses statistics on the read (base quality scores reported by the sequencer) and the alignment (number of matches, mismatches and deletions, mapping quality score returned by the alignment tool, if available, and number of mappings) as features for classification and uses simulated reads to learn a logistic regression model that relates these features to actual mapping quality. Results: We test the predictions of LoQuM on an independent dataset generated by the ART short read simulation software and observe that LoQuM can ‘resurrect’ many mappings that are assigned zero quality scores by the alignment tools and are therefore likely to be discarded by researchers. We also observe that the recalibration of mapping quality scores greatly enhances the precision of called single nucleotide polymorphisms. Availability: LoQuM is available as open source at http://compbio.case.edu/loqum/. Contact: matthew.ruffalo@case.edu. PMID:22962451

  17. Tools for Accurate and Efficient Analysis of Complex Evolutionary Mechanisms in Microbial Genomes. Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Nakhleh, Luay

    2014-03-12

    I proposed to develop computationally efficient tools for accurate detection and reconstruction of microbes' complex evolutionary mechanisms, thus enabling rapid and accurate annotation, analysis and understanding of their genomes. To achieve this goal, I proposed to address three aspects. (1) Mathematical modeling. A major challenge facing the accurate detection of HGT is that of distinguishing between these two events on the one hand and other events that have similar "effects." I proposed to develop a novel mathematical approach for distinguishing among these events. Further, I proposed to develop a set of novel optimization criteria for the evolutionary analysis of microbial genomes in the presence of these complex evolutionary events. (2) Algorithm design. In this aspect of the project, I proposed to develop an array of e cient and accurate algorithms for analyzing microbial genomes based on the formulated optimization criteria. Further, I proposed to test the viability of the criteria and the accuracy of the algorithms in an experimental setting using both synthetic as well as biological data. (3) Software development. I proposed the nal outcome to be a suite of software tools which implements the mathematical models as well as the algorithms developed.

  18. Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq.

    Science.gov (United States)

    Liu, Ruolin; Dickerson, Julie

    2017-11-01

    We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. Under the evaluation of a real data set, the estimated transcript expression by Strawberry has the highest correlation with Nanostring probe counts, an independent experiment measure for transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.

  19. Accurately controlled sequential self-folding structures by polystyrene film

    Science.gov (United States)

    Deng, Dongping; Yang, Yang; Chen, Yong; Lan, Xing; Tice, Jesse

    2017-08-01

    Four-dimensional (4D) printing overcomes the traditional fabrication limitations by designing heterogeneous materials to enable the printed structures evolve over time (the fourth dimension) under external stimuli. Here, we present a simple 4D printing of self-folding structures that can be sequentially and accurately folded. When heated above their glass transition temperature pre-strained polystyrene films shrink along the XY plane. In our process silver ink traces printed on the film are used to provide heat stimuli by conducting current to trigger the self-folding behavior. The parameters affecting the folding process are studied and discussed. Sequential folding and accurately controlled folding angles are achieved by using printed ink traces and angle lock design. Theoretical analyses are done to guide the design of the folding processes. Programmable structures such as a lock and a three-dimensional antenna are achieved to test the feasibility and potential applications of this method. These self-folding structures change their shapes after fabrication under controlled stimuli (electric current) and have potential applications in the fields of electronics, consumer devices, and robotics. Our design and fabrication method provides an easy way by using silver ink printed on polystyrene films to 4D print self-folding structures for electrically induced sequential folding with angular control.

  20. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots.

    Science.gov (United States)

    Hajdin, Christine E; Bellaousov, Stanislav; Huggins, Wayne; Leonard, Christopher W; Mathews, David H; Weeks, Kevin M

    2013-04-02

    A pseudoknot forms in an RNA when nucleotides in a loop pair with a region outside the helices that close the loop. Pseudoknots occur relatively rarely in RNA but are highly overrepresented in functionally critical motifs in large catalytic RNAs, in riboswitches, and in regulatory elements of viruses. Pseudoknots are usually excluded from RNA structure prediction algorithms. When included, these pairings are difficult to model accurately, especially in large RNAs, because allowing this structure dramatically increases the number of possible incorrect folds and because it is difficult to search the fold space for an optimal structure. We have developed a concise secondary structure modeling approach that combines SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) experimental chemical probing information and a simple, but robust, energy model for the entropic cost of single pseudoknot formation. Structures are predicted with iterative refinement, using a dynamic programming algorithm. This melded experimental and thermodynamic energy function predicted the secondary structures and the pseudoknots for a set of 21 challenging RNAs of known structure ranging in size from 34 to 530 nt. On average, 93% of known base pairs were predicted, and all pseudoknots in well-folded RNAs were identified.

  1. Visualization of RNA structure models within the Integrative Genomics Viewer.

    Science.gov (United States)

    Busan, Steven; Weeks, Kevin M

    2017-07-01

    Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  2. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

    Directory of Open Access Journals (Sweden)

    Dewey Colin N

    2011-08-01

    Full Text Available Abstract Background RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results We present RSEM, an user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost

  3. READSCAN: A fast and scalable pathogen discovery program with accurate genome relative abundance estimation

    KAUST Repository

    Naeem, Raeece

    2012-11-28

    Summary: READSCAN is a highly scalable parallel program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets. READSCAN accurately classified human and viral sequences on a 20.1 million reads simulated dataset in <27 min using a small Beowulf compute cluster with 16 nodes (Supplementary Material). Availability: http://cbrc.kaust.edu.sa/readscan Contact: or raeece.naeem@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. 2012 The Author(s).

  4. READSCAN: A fast and scalable pathogen discovery program with accurate genome relative abundance estimation

    KAUST Repository

    Naeem, Raeece; Rashid, Mamoon; Pain, Arnab

    2012-01-01

    Summary: READSCAN is a highly scalable parallel program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets. READSCAN accurately classified human and viral sequences on a 20.1 million reads simulated dataset in <27 min using a small Beowulf compute cluster with 16 nodes (Supplementary Material). Availability: http://cbrc.kaust.edu.sa/readscan Contact: or raeece.naeem@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. 2012 The Author(s).

  5. Genomics technologies to study structural variations in the grapevine genome

    Directory of Open Access Journals (Sweden)

    Cardone Maria Francesca

    2016-01-01

    Full Text Available Grapevine is one of the most important crop plants in the world. Recently there was great expansion of genomics resources about grapevine genome, thus providing increasing efforts for molecular breeding. Current cultivars display a great level of inter-specific differentiation that needs to be investigated to reach a comprehensive understanding of the genetic basis of phenotypic differences, and to find responsible genes selected by cross breeding programs. While there have been significant advances in resolving the pattern and nature of single nucleotide polymorphisms (SNPs on plant genomes, few data are available on copy number variation (CNV. Furthermore association between structural variations and phenotypes has been described in only a few cases. We combined high throughput biotechnologies and bioinformatics tools, to reveal the first inter-varietal atlas of structural variation (SV for the grapevine genome. We sequenced and compared four table grape cultivars with the Pinot noir inbred line PN40024 genome as the reference. We detected roughly 8% of the grapevine genome affected by genomic variations. Taken into account phenotypic differences existing among the studied varieties we performed comparison of SVs among them and the reference and next we performed an in-depth analysis of gene content of polymorphic regions. This allowed us to identify genes showing differences in copy number as putative functional candidates for important traits in grapevine cultivation.

  6. Sequence- vs. chip-assisted genomic selection: accurate biological information is advised.

    Science.gov (United States)

    Pérez-Enciso, Miguel; Rincón, Juan C; Legarra, Andrés

    2015-05-09

    The development of next-generation sequencing technologies (NGS) has made the use of whole-genome sequence data for routine genetic evaluations possible, which has triggered a considerable interest in animal and plant breeding fields. Here, we investigated whether complete or partial sequence data can improve upon existing SNP (single nucleotide polymorphism) array-based selection strategies by simulation using a mixed coalescence - gene-dropping approach. We simulated 20 or 100 causal mutations (quantitative trait nucleotides, QTN) within 65 predefined 'gene' regions, each 10 kb long, within a genome composed of ten 3-Mb chromosomes. We compared prediction accuracy by cross-validation using a medium-density chip (7.5 k SNPs), a high-density (HD, 17 k) and sequence data (335 k). Genetic evaluation was based on a GBLUP method. The simulations showed: (1) a law of diminishing returns with increasing number of SNPs; (2) a modest effect of SNP ascertainment bias in arrays; (3) a small advantage of using whole-genome sequence data vs. HD arrays i.e. ~4%; (4) a minor effect of NGS errors except when imputation error rates are high (≥20%); and (5) if QTN were known, prediction accuracy approached 1. Since this is obviously unrealistic, we explored milder assumptions. We showed that, if all SNPs within causal genes were included in the prediction model, accuracy could also dramatically increase by ~40%. However, this criterion was highly sensitive to either misspecification (including wrong genes) or to the use of an incomplete gene list; in these cases, accuracy fell rapidly towards that reached when all SNPs from sequence data were blindly included in the model. Our study shows that, unless an accurate prior estimate on the functionality of SNPs can be included in the predictor, there is a law of diminishing returns with increasing SNP density. As a result, use of whole-genome sequence data may not result in a highly increased selection response over high

  7. Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

    KAUST Repository

    Magana-Mora, Arturo

    2017-08-15

    BackgroundPolyadenylation is a critical stage of RNA processing during the formation of mature mRNA, and is present in most of the known eukaryote protein-coding transcripts and many long non-coding RNAs. The correct identification of poly(A) signals (PAS) not only helps to elucidate the 3′-end genomic boundaries of a transcribed DNA region and gene regulatory mechanisms but also gives insight into the multiple transcript isoforms resulting from alternative PAS. Although progress has been made in the in-silico prediction of genomic signals, the recognition of PAS in DNA genomic sequences remains a challenge.ResultsIn this study, we analyzed human genomic DNA sequences for the 12 most common PAS variants. Our analysis has identified a set of features that helps in the recognition of true PAS, which may be involved in the regulation of the polyadenylation process. The proposed features, in combination with a recognition model, resulted in a novel method and tool, Omni-PolyA. Omni-PolyA combines several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep neural networks, and Omni-PolyA. Results show that Omni-PolyA significantly reduced the average classification error rate by 35.37% in the prediction of the 12 considered PAS variants relative to the state-of-the-art results.ConclusionsThe results of our study demonstrate that Omni-PolyA is currently the most accurate model for the prediction of PAS in human and can serve as a useful complement to other PAS recognition methods. Omni-PolyA is publicly available as an online tool accessible at www.cbrc.kaust.edu.sa/omnipolya/.

  8. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.

    Science.gov (United States)

    Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D

    2015-05-01

    Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM, short tandem repeat profiling using flank-based mapping, a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and genomes of organelles). We used STR-FM to study STR error rates and patterns in publicly available human and in-house generated ultradeep plasmid sequencing data sets. We discovered that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Utilizing this pipeline, for the first time we determined the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses the pressing challenges surrounding STR genotyping, and thus is of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.

  9. Accurate protein structure modeling using sparse NMR data and homologous structure information.

    Science.gov (United States)

    Thompson, James M; Sgourakis, Nikolaos G; Liu, Gaohua; Rossi, Paolo; Tang, Yuefeng; Mills, Jeffrey L; Szyperski, Thomas; Montelione, Gaetano T; Baker, David

    2012-06-19

    While information from homologous structures plays a central role in X-ray structure determination by molecular replacement, such information is rarely used in NMR structure determination because it can be incorrect, both locally and globally, when evolutionary relationships are inferred incorrectly or there has been considerable evolutionary structural divergence. Here we describe a method that allows robust modeling of protein structures of up to 225 residues by combining (1)H(N), (13)C, and (15)N backbone and (13)Cβ chemical shift data, distance restraints derived from homologous structures, and a physically realistic all-atom energy function. Accurate models are distinguished from inaccurate models generated using incorrect sequence alignments by requiring that (i) the all-atom energies of models generated using the restraints are lower than models generated in unrestrained calculations and (ii) the low-energy structures converge to within 2.0 Å backbone rmsd over 75% of the protein. Benchmark calculations on known structures and blind targets show that the method can accurately model protein structures, even with very remote homology information, to a backbone rmsd of 1.2-1.9 Å relative to the conventional determined NMR ensembles and of 0.9-1.6 Å relative to X-ray structures for well-defined regions of the protein structures. This approach facilitates the accurate modeling of protein structures using backbone chemical shift data without need for side-chain resonance assignments and extensive analysis of NOESY cross-peak assignments.

  10. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA.

    Science.gov (United States)

    Gibbons, Brian; Datta, Parikkhit; Wu, Ying; Chan, Alan; Al Armour, John

    2006-06-30

    Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH) we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of which were known to have Charcot-Marie-Tooth disease type 1A (CMT1A). Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  11. Structural genomic variation in ischemic stroke

    Science.gov (United States)

    Matarin, Mar; Simon-Sanchez, Javier; Fung, Hon-Chung; Scholz, Sonja; Gibbs, J. Raphael; Hernandez, Dena G.; Crews, Cynthia; Britton, Angela; Wavrant De Vrieze, Fabienne; Brott, Thomas G.; Brown, Robert D.; Worrall, Bradford B.; Silliman, Scott; Case, L. Douglas; Hardy, John A.; Rich, Stephen S.; Meschia, James F.; Singleton, Andrew B.

    2008-01-01

    Technological advances in molecular genetics allow rapid and sensitive identification of genomic copy number variants (CNVs). This, in turn, has sparked interest in the function such variation may play in disease. While a role for copy number mutations as a cause of Mendelian disorders is well established, it is unclear whether CNVs may affect risk for common complex disorders. We sought to investigate whether CNVs may modulate risk for ischemic stroke (IS) and to provide a catalog of CNVs in patients with this disorder by analyzing copy number metrics produced as a part of our previous genome-wide single-nucleotide polymorphism (SNP)-based association study of ischemic stroke in a North American white population. We examined CNVs in 263 patients with ischemic stroke (IS). Each identified CNV was compared with changes identified in 275 neurologically normal controls. Our analysis identified 247 CNVs, corresponding to 187 insertions (76%; 135 heterozygous; 25 homozygous duplications or triplications; 2 heterosomic) and 60 deletions (24%; 40 heterozygous deletions;3 homozygous deletions; 14 heterosomic deletions). Most alterations (81%) were the same as, or overlapped with, previously reported CNVs. We report here the first genome-wide analysis of CNVs in IS patients. In summary, our study did not detect any common genomic structural variation unequivocally linked to IS, although we cannot exclude that smaller CNVs or CNVs in genomic regions poorly covered by this methodology may confer risk for IS. The application of genome-wide SNP arrays now facilitates the evaluation of structural changes through the entire genome as part of a genome-wide genetic association study. PMID:18288507

  12. Functional Coverage of the Human Genome by Existing Structures, Structural Genomics Targets, and Homology Models.

    Directory of Open Access Journals (Sweden)

    2005-08-01

    Full Text Available The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB, target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB, it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  13. Microarray MAPH: accurate array-based detection of relative copy number in genomic DNA

    Directory of Open Access Journals (Sweden)

    Chan Alan

    2006-06-01

    Full Text Available Abstract Background Current methods for measurement of copy number do not combine all the desirable qualities of convenience, throughput, economy, accuracy and resolution. In this study, to improve the throughput associated with Multiplex Amplifiable Probe Hybridisation (MAPH we aimed to develop a modification based on the 3-Dimensional, Flow-Through Microarray Platform from PamGene International. In this new method, electrophoretic analysis of amplified products is replaced with photometric analysis of a probed oligonucleotide array. Copy number analysis of hybridised probes is based on a dual-label approach by comparing the intensity of Cy3-labelled MAPH probes amplified from test samples co-hybridised with similarly amplified Cy5-labelled reference MAPH probes. The key feature of using a hybridisation-based end point with MAPH is that discrimination of amplified probes is based on sequence and not fragment length. Results In this study we showed that microarray MAPH measurement of PMP22 gene dosage correlates well with PMP22 gene dosage determined by capillary MAPH and that copy number was accurately reported in analyses of DNA from 38 individuals, 12 of which were known to have Charcot-Marie-Tooth disease type 1A (CMT1A. Conclusion Measurement of microarray-based endpoints for MAPH appears to be of comparable accuracy to electrophoretic methods, and holds the prospect of fully exploiting the potential multiplicity of MAPH. The technology has the potential to simplify copy number assays for genes with a large number of exons, or of expanded sets of probes from dispersed genomic locations.

  14. Fast and accurate phylogenetic reconstruction from high-resolution whole-genome data and a novel robustness estimator.

    Science.gov (United States)

    Lin, Y; Rajan, V; Moret, B M E

    2011-09-01

    The rapid accumulation of whole-genome data has renewed interest in the study of genomic rearrangements. Comparative genomics, evolutionary biology, and cancer research all require models and algorithms to elucidate the mechanisms, history, and consequences of these rearrangements. However, even simple models lead to NP-hard problems, particularly in the area of phylogenetic analysis. Current approaches are limited to small collections of genomes and low-resolution data (typically a few hundred syntenic blocks). Moreover, whereas phylogenetic analyses from sequence data are deemed incomplete unless bootstrapping scores (a measure of confidence) are given for each tree edge, no equivalent to bootstrapping exists for rearrangement-based phylogenetic analysis. We describe a fast and accurate algorithm for rearrangement analysis that scales up, in both time and accuracy, to modern high-resolution genomic data. We also describe a novel approach to estimate the robustness of results-an equivalent to the bootstrapping analysis used in sequence-based phylogenetic reconstruction. We present the results of extensive testing on both simulated and real data showing that our algorithm returns very accurate results, while scaling linearly with the size of the genomes and cubically with their number. We also present extensive experimental results showing that our approach to robustness testing provides excellent estimates of confidence, which, moreover, can be tuned to trade off thresholds between false positives and false negatives. Together, these two novel approaches enable us to attack heretofore intractable problems, such as phylogenetic inference for high-resolution vertebrate genomes, as we demonstrate on a set of six vertebrate genomes with 8,380 syntenic blocks. A copy of the software is available on demand.

  15. Using Genomics for Natural Product Structure Elucidation.

    Science.gov (United States)

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development-especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with techniquebased revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques.

  16. Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

    Science.gov (United States)

    Gillet-Markowska, Alexandre; Richard, Hugues; Fischer, Gilles; Lafontaine, Ingrid

    2015-03-15

    The detection of structural variations (SVs) in short-range Paired-End (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences, or carry inherent complexity, hardly resolvable with classical PE sequencing data. In contrast, large insert-size sequencing libraries (Mate-Pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. They can thus theoretically overcome previous limitations as they are becoming routinely accessible. Nevertheless, broad insert size distributions and high rates of chimerical sequences are usually associated to this type of libraries, which makes the accurate annotation of SV challenging. Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools, both on simulated and real mate-pair sequencing datasets from the 1000 Human Genome project. Ulysses achieves high specificity over the complete spectrum of variants by assessing, in a principled manner, the statistical significance of each possible variant (duplications, deletions, translocations, insertions and inversions) against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low frequency variants. SV detection performed on a large insert Mate-Pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for the characterization of somatic mosaicism in human tissues and in cancer genomes. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Interrogating the druggable genome with structural informatics.

    Science.gov (United States)

    Hambly, Kevin; Danzer, Joseph; Muskal, Steven; Debe, Derek A

    2006-08-01

    Structural genomics projects are producing protein structure data at an unprecedented rate. In this paper, we present the Target Informatics Platform (TIP), a novel structural informatics approach for amplifying the rapidly expanding body of experimental protein structure information to enhance the discovery and optimization of small molecule protein modulators on a genomic scale. In TIP, existing experimental structure information is augmented using a homology modeling approach, and binding sites across multiple target families are compared using a clique detection algorithm. We report here a detailed analysis of the structural coverage for the set of druggable human targets, highlighting drug target families where the level of structural knowledge is currently quite high, as well as those areas where structural knowledge is sparse. Furthermore, we demonstrate the utility of TIP's intra- and inter-family binding site similarity analysis using a series of retrospective case studies. Our analysis underscores the utility of a structural informatics infrastructure for extracting drug discovery-relevant information from structural data, aiding researchers in the identification of lead discovery and optimization opportunities as well as potential "off-target" liabilities.

  18. Accurate reanalysis of structures by a preconditioned conjugate gradient method

    Czech Academy of Sciences Publication Activity Database

    Kirsch, U.; Kočvara, Michal; Zowe, J.

    2002-01-01

    Roč. 55, č. 2 (2002), s. 233-251 ISSN 0029-5981 R&D Projects: GA AV ČR IAA1075005 Grant - others:BMBF(DE) 03ZOM3ER Institutional research plan: CEZ:AV0Z1075907 Keywords : preconditioned conjugate gradient s * structural reanalysis Subject RIV: BA - General Mathematics Impact factor: 1.468, year: 2002

  19. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2004-07-14

    The structural genomics project is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy which is medically and biologically relevant, of good value, and tractable. As an option to consider, we present the Pfam5000 strategy, which involves selecting the 5000 most important families from the Pfam database as sources for targets. We compare the Pfam5000 strategy to several other proposed strategies that would require similar numbers of targets. These include including complete solution of several small to moderately sized bacterial proteomes, partial coverage of the human proteome, and random selection of approximately 5000 targets from sequenced genomes. We measure the impact that successful implementation of these strategies would have upon structural interpretation of the proteins in Swiss-Prot, TrEMBL, and 131 complete proteomes (including 10 of eukaryotes) from the Proteome Analysis database at EBI. Solving the structures of proteins from the 5000 largest Pfam families would allow accurate fold assignment for approximately 68 percent of all prokaryotic proteins (covering 59 percent of residues) and 61 percent of eukaryotic proteins (40 percent of residues). More fine-grained coverage which would allow accurate modeling of these proteins would require an order of magnitude more targets. The Pfam5000 strategy may be modified in several ways, for example to focus on larger families, bacterial sequences, or eukaryotic sequences; as long as secondary consideration is given to large families within Pfam, coverage results vary only slightly. In contrast, focusing structural genomics on a single tractable genome would have only a limited impact in structural knowledge of other proteomes: a significant fraction (about 30-40 percent of the proteins, and 40-60 percent of the residues) of each proteome is classified in small

  20. Gene Composer in a structural genomics environment

    International Nuclear Information System (INIS)

    Lorimer, Don; Raymond, Amy; Mixon, Mark; Burgin, Alex; Staker, Bart; Stewart, Lance

    2011-01-01

    For structural biology applications, protein-construct engineering is guided by comparative sequence analysis and structural information, which allow the researcher to better define domain boundaries for terminal deletions and nonconserved regions for surface mutants. A database software application called Gene Composer has been developed to facilitate construct design. The structural genomics effort at the Seattle Structural Genomics Center for Infectious Disease (SSGCID) requires the manipulation of large numbers of amino-acid sequences and the underlying DNA sequences which are to be cloned into expression vectors. To improve efficiency in high-throughput protein structure determination, a database software package, Gene Composer, has been developed which facilitates the information-rich design of protein constructs and their underlying gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bioinformatics steps used in modern structure-guided protein engineering and synthetic gene engineering. An example of the structure determination of H1N1 RNA-dependent RNA polymerase PB2 subunit is given

  1. Structural genomic variations and Parkinson's disease.

    Science.gov (United States)

    Bandrés-Ciga, Sara; Ruz, Clara; Barrero, Francisco J; Escamilla-Sevilla, Francisco; Pelegrina, Javier; Vives, Francisco; Duran, Raquel

    2017-10-01

    Parkinson's disease (PD) is the second most common neurodegenerative disease, whose prevalence is projected to be between 8.7 and 9.3 million by 2030. Until about 20 years ago, PD was considered to be the textbook example of a "non-genetic" disorder. Nowadays, PD is generally considered a multifactorial disorder that arises from the combination and complex interaction of genes and environmental factors. To date, a total of 7 genes including SNCA, LRRK2, PARK2, DJ-1, PINK 1, VPS35 and ATP13A2 have been seen to cause unequivocally Mendelian PD. Also, variants with incomplete penetrance in the genes LRRK2 and GBA are considered to be strong risk factors for PD worldwide. Although genetic studies have provided valuable insights into the pathogenic mechanisms underlying PD, the role of structural variation in PD has been understudied in comparison with other genomic variations. Structural genomic variations might substantially account for such genetic substrates yet to be discovered. The present review aims to provide an overview of the structural genomic variants implicated in the pathogenesis of PD.

  2. Refining the structure and content of clinical genomic reports.

    Science.gov (United States)

    Dorschner, Michael O; Amendola, Laura M; Shirts, Brian H; Kiedrowski, Lesli; Salama, Joseph; Gordon, Adam S; Fullerton, Stephanie M; Tarczy-Hornoch, Peter; Byers, Peter H; Jarvik, Gail P

    2014-03-01

    To effectively articulate the results of exome and genome sequencing we refined the structure and content of molecular test reports. To communicate results of a randomized control trial aimed at the evaluation of exome sequencing for clinical medicine, we developed a structured narrative report. With feedback from genetics and non-genetics professionals, we developed separate indication-specific and incidental findings reports. Standard test report elements were supplemented with research study-specific language, which highlighted the limitations of exome sequencing and provided detailed, structured results, and interpretations. The report format we developed to communicate research results can easily be transformed for clinical use by removal of research-specific statements and disclaimers. The development of clinical reports for exome sequencing has shown that accurate and open communication between the clinician and laboratory is ideally an ongoing process to address the increasing complexity of molecular genetic testing. © 2014 Wiley Periodicals, Inc.

  3. A method for accurate detection of genomic microdeletions using real-time quantitative PCR

    Directory of Open Access Journals (Sweden)

    Bassett Anne S

    2005-12-01

    Full Text Available Abstract Background Quantitative Polymerase Chain Reaction (qPCR is a well-established method for quantifying levels of gene expression, but has not been routinely applied to the detection of constitutional copy number alterations of human genomic DNA. Microdeletions or microduplications of the human genome are associated with a variety of genetic disorders. Although, clinical laboratories routinely use fluorescence in situ hybridization (FISH to identify such cryptic genomic alterations, there remains a significant number of individuals in which constitutional genomic imbalance is suspected, based on clinical parameters, but cannot be readily detected using current cytogenetic techniques. Results In this study, a novel application for real-time qPCR is presented that can be used to reproducibly detect chromosomal microdeletions and microduplications. This approach was applied to DNA from a series of patient samples and controls to validate genomic copy number alteration at cytoband 22q11. The study group comprised 12 patients with clinical symptoms of chromosome 22q11 deletion syndrome (22q11DS, 1 patient trisomic for 22q11 and 4 normal controls. 6 of the patients (group 1 had known hemizygous deletions, as detected by standard diagnostic FISH, whilst the remaining 6 patients (group 2 were classified as 22q11DS negative using the clinical FISH assay. Screening of the patients and controls with a set of 10 real time qPCR primers, spanning the 22q11.2-deleted region and flanking sequence, confirmed the FISH assay results for all patients with 100% concordance. Moreover, this qPCR enabled a refinement of the region of deletion at 22q11. Analysis of DNA from chromosome 22 trisomic sample demonstrated genomic duplication within 22q11. Conclusion In this paper we present a qPCR approach for the detection of chromosomal microdeletions and microduplications. The strategic use of in silico modelling for qPCR primer design to avoid regions of repetitive

  4. Structured Matrix Completion with Applications to Genomic Data Integration.

    Science.gov (United States)

    Cai, Tianxi; Cai, T Tony; Zhang, Anru

    2016-01-01

    Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

  5. Use of Whole-Genus Genome Sequence Data To Develop a Multilocus Sequence Typing Tool That Accurately Identifies Yersinia Isolates to the Species and Subspecies Levels

    Science.gov (United States)

    Hall, Miquette; Chattaway, Marie A.; Reuter, Sandra; Savin, Cyril; Strauch, Eckhard; Carniel, Elisabeth; Connor, Thomas; Van Damme, Inge; Rajakaruna, Lakshani; Rajendram, Dunstan; Jenkins, Claire; Thomson, Nicholas R.

    2014-01-01

    The genus Yersinia is a large and diverse bacterial genus consisting of human-pathogenic species, a fish-pathogenic species, and a large number of environmental species. Recently, the phylogenetic and population structure of the entire genus was elucidated through the genome sequence data of 241 strains encompassing every known species in the genus. Here we report the mining of this enormous data set to create a multilocus sequence typing-based scheme that can identify Yersinia strains to the species level to a level of resolution equal to that for whole-genome sequencing. Our assay is designed to be able to accurately subtype the important human-pathogenic species Yersinia enterocolitica to whole-genome resolution levels. We also report the validation of the scheme on 386 strains from reference laboratory collections across Europe. We propose that the scheme is an important molecular typing system to allow accurate and reproducible identification of Yersinia isolates to the species level, a process often inconsistent in nonspecialist laboratories. Additionally, our assay is the most phylogenetically informative typing scheme available for Y. enterocolitica. PMID:25339391

  6. Rapid and Accurate Sequencing of Enterovirus Genomes Using MinION Nanopore Sequencer.

    Science.gov (United States)

    Wang, Ji; Ke, Yue Hua; Zhang, Yong; Huang, Ke Qiang; Wang, Lei; Shen, Xin Xin; Dong, Xiao Ping; Xu, Wen Bo; Ma, Xue Jun

    2017-10-01

    Knowledge of an enterovirus genome sequence is very important in epidemiological investigation to identify transmission patterns and ascertain the extent of an outbreak. The MinION sequencer is increasingly used to sequence various viral pathogens in many clinical situations because of its long reads, portability, real-time accessibility of sequenced data, and very low initial costs. However, information is lacking on MinION sequencing of enterovirus genomes. In this proof-of-concept study using Enterovirus 71 (EV71) and Coxsackievirus A16 (CA16) strains as examples, we established an amplicon-based whole genome sequencing method using MinION. We explored the accuracy, minimum sequencing time, discrimination and high-throughput sequencing ability of MinION, and compared its performance with Sanger sequencing. Within the first minute (min) of sequencing, the accuracy of MinION was 98.5% for the single EV71 strain and 94.12%-97.33% for 10 genetically-related CA16 strains. In as little as 14 min, 99% identity was reached for the single EV71 strain, and in 17 min (on average), 99% identity was achieved for 10 CA16 strains in a single run. MinION is suitable for whole genome sequencing of enteroviruses with sufficient accuracy and fine discrimination and has the potential as a fast, reliable and convenient method for routine use. Copyright © 2017 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

  7. Accurate DNA assembly and genome engineering with optimized uracil excision cloning

    DEFF Research Database (Denmark)

    Cavaleiro, Mafalda; Kim, Se Hyeuk; Seppala, Susanna

    2015-01-01

    Simple and reliable DNA editing by uracil excision (a.k.a. USER cloning) has been described by several research groups, but the optimal design of cohesive DNA ends for multigene assembly remains elusive. Here, we use two model constructs based on expression of gfp and a four-gene pathway that pro......Simple and reliable DNA editing by uracil excision (a.k.a. USER cloning) has been described by several research groups, but the optimal design of cohesive DNA ends for multigene assembly remains elusive. Here, we use two model constructs based on expression of gfp and a four-gene pathway...... that produces β-carotene to optimize assembly junctions and the uracil excision protocol. By combining uracil excision cloning with a genomic integration technology, we demonstrate that up to six DNA fragments can be assembled in a one-tube reaction for direct genome integration with high accuracy, greatly...... facilitating the advanced engineering of robust cell factories....

  8. Whole-genome modeling accurately predicts quantitative traits, as revealed in plants.

    OpenAIRE

    Tatarinova, Tatiana; Shin, Min-Gyoung; Marjoram, Paul; Nuzhdin, Sergey; Triska, Martin; Rickauer, Martina; Nikolsky, Yuri; Mazurier, Melanie; Gentzbittel, Laurent; Ben, Cecile

    2016-01-01

    Many adaptive events in natural populations, as well as response to artificial selection, are caused by polygenic action. Under selective pressure, the adaptive traits can quickly respond via small allele frequency shifts spread across numerous loci. We hypothesize that a large proportion of current phenotypic variation between individuals may be best explained by population admixture. We thus consider the complete, genome-wide universe of genetic variability, spread across several ancestral ...

  9. High-throughput automated microfluidic sample preparation for accurate microbial genomics.

    Science.gov (United States)

    Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C

    2017-01-27

    Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells to sequence library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications.

  10. Accurate Energies and Structures for Large Water Clusters Using the X3LYP Hybrid Density Functional

    OpenAIRE

    Su, Julius T.; Xu, Xin; Goddard, William A., III

    2004-01-01

    We predict structures and energies of water clusters containing up to 19 waters with X3LYP, an extended hybrid density functional designed to describe noncovalently bound systems as accurately as covalent systems. Our work establishes X3LYP as the most practical ab initio method today for calculating accurate water cluster structures and energies. We compare X3LYP/aug-cc-pVTZ energies to the most accurate theoretical values available (n = 2−6, 8), MP2 with basis set superposition error (BSSE)...

  11. Genomic inference accurately predicts the timing and severity of a recent bottleneck in a non-model insect population

    Science.gov (United States)

    McCoy, Rajiv C.; Garud, Nandita R.; Kelley, Joanna L.; Boggs, Carol L.; Petrov, Dmitri A.

    2015-01-01

    The analysis of molecular data from natural populations has allowed researchers to answer diverse ecological questions that were previously intractable. In particular, ecologists are often interested in the demographic history of populations, information that is rarely available from historical records. Methods have been developed to infer demographic parameters from genomic data, but it is not well understood how inferred parameters compare to true population history or depend on aspects of experimental design. Here we present and evaluate a method of SNP discovery using RNA-sequencing and demographic inference using the program δaδi, which uses a diffusion approximation to the allele frequency spectrum to fit demographic models. We test these methods in a population of the checkerspot butterfly Euphydryas gillettii. This population was intentionally introduced to Gothic, Colorado in 1977 and has since experienced extreme fluctuations including bottlenecks of fewer than 25 adults, as documented by nearly annual field surveys. Using RNA-sequencing of eight individuals from Colorado and eight individuals from a native population in Wyoming, we generate the first genomic resources for this system. While demographic inference is commonly used to examine ancient demography, our study demonstrates that our inexpensive, all-in-one approach to marker discovery and genotyping provides sufficient data to accurately infer the timing of a recent bottleneck. This demographic scenario is relevant for many species of conservation concern, few of which have sequenced genomes. Our results are remarkably insensitive to sample size or number of genomic markers, which has important implications for applying this method to other non-model systems. PMID:24237665

  12. Genomic hypomethylation in the human germline associates with selective structural mutability in the human genome.

    Directory of Open Access Journals (Sweden)

    Jian Li

    Full Text Available The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR mediated by low-copy repeats (LCRs. Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ~1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR-mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.

  13. Deformable meshes for medical image segmentation accurate automatic segmentation of anatomical structures

    CERN Document Server

    Kainmueller, Dagmar

    2014-01-01

    ? Segmentation of anatomical structures in medical image data is an essential task in clinical practice. Dagmar Kainmueller introduces methods for accurate fully automatic segmentation of anatomical structures in 3D medical image data. The author's core methodological contribution is a novel deformation model that overcomes limitations of state-of-the-art Deformable Surface approaches, hence allowing for accurate segmentation of tip- and ridge-shaped features of anatomical structures. As for practical contributions, she proposes application-specific segmentation pipelines for a range of anatom

  14. CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation

    Directory of Open Access Journals (Sweden)

    Li Gong-Hua

    2010-08-01

    Full Text Available Abstract Background The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in Protein Data Bank (PDB, thus, the functional prediction of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but these methods need to be improved further, such as, enhancing the accuracy, sensitivity, and the computational speed. Here, an accurate algorithm, the CMASA (Contact MAtrix based local Structural Alignment algorithm, has been developed to predict unknown functions of proteins based on the local protein structural similarity. This algorithm has been evaluated by building a test set including 164 enzyme families, and also been compared to other methods. Results The evaluation of CMASA shows that the CMASA is highly accurate (0.96, sensitive (0.86, and fast enough to be used in the large-scale functional annotation. Comparing to both sequence-based and global structure-based methods, not only the CMASA can find remote homologous proteins, but also can find the active site convergence. Comparing to other local structure comparison-based methods, the CMASA can obtain the better performance than both FFF (a method using geometry to predict protein function and SPASM (a local structure alignment method; and the CMASA is more sensitive than PINTS and is more accurate than JESS (both are local structure alignment methods. The CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites have been suggested, these sites can not be observed by the Catalytic Site Atlas (CSA. Conclusions The CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. The CMASA can be used in large-scale enzyme active site annotation. The CMASA can be available by the

  15. Child Development and Structural Variation in the Human Genome

    Science.gov (United States)

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  16. Integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants.

    Directory of Open Access Journals (Sweden)

    Jiang Du

    2009-07-01

    Full Text Available The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbelGen, with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs. SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome. To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mapability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of

  17. G2S: A web-service for annotating genomic variants on 3D protein structures.

    Science.gov (United States)

    Wang, Juexin; Sheridan, Robert; Sumer, S Onur; Schultz, Nikolaus; Xu, Dong; Gao, Jianjiong

    2018-01-27

    Accurately mapping and annotating genomic locations on 3D protein structures is a key step in structure-based analysis of genomic variants detected by recent large-scale sequencing efforts. There are several mapping resources currently available, but none of them provides a web API (Application Programming Interface) that support programmatic access. We present G2S, a real-time web API that provides automated mapping of genomic variants on 3D protein structures. G2S can align genomic locations of variants, protein locations, or protein sequences to protein structures and retrieve the mapped residues from structures. G2S API uses REST-inspired design conception and it can be used by various clients such as web browsers, command terminals, programming languages and other bioinformatics tools for bringing 3D structures into genomic variant analysis. The webserver and source codes are freely available at https://g2s.genomenexus.org. g2s@genomenexus.org. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  18. Comparative genomics of the relationship between gene structure and expression

    NARCIS (Netherlands)

    Ren, X.

    2006-01-01

    The relationship between the structure of genes and their expression is a relatively new aspect of genome organization and regulation. With more genome sequences and expression data becoming available, bioinformatics approaches can help the further elucidation of the relationships between gene

  19. Structural biology at York Structural Biology Laboratory; laboratory information management systems for structural genomics

    Czech Academy of Sciences Publication Activity Database

    Dohnálek, Jan

    2005-01-01

    Roč. 12, č. 1 (2005), s. 3 ISSN 1211-5894. [Meeting of Structural Biologists /4./. 10.03.2005-12.03.2005, Nové Hrady] R&D Projects: GA MŠk(CZ) 1K05008 Keywords : structural biology * LIMS * structural genomics Subject RIV: CD - Macromolecular Chemistry

  20. Structural Genomics of Minimal Organisms: Pipeline and Results

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain a near complete three-dimensional (3D) structural information of all soluble proteins of two minimal organisms, closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up from target selection, cloning, expression, purification, and ultimately structural determination. At the time of this writing, structural information of more than 93percent of all soluble proteins of M. genitalium is avail able. This chapter summarizes the approaches taken by the authors' center.

  1. Tree decomposition based fast search of RNA structures including pseudoknots in genomes.

    Science.gov (United States)

    Song, Yinglei; Liu, Chunmei; Malmberg, Russell; Pan, Fangfang; Cai, Liming

    2005-01-01

    Searching genomes for RNA secondary structure with computational methods has become an important approach to the annotation of non-coding RNAs. However, due to the lack of efficient algorithms for accurate RNA structure-sequence alignment, computer programs capable of fast and effectively searching genomes for RNA secondary structures have not been available. In this paper, a novel RNA structure profiling model is introduced based on the notion of a conformational graph to specify the consensus structure of an RNA family. Tree decomposition yields a small tree width t for such conformation graphs (e.g., t = 2 for stem loops and only a slight increase for pseudo-knots). Within this modelling framework, the optimal alignment of a sequence to the structure model corresponds to finding a maximum valued isomorphic subgraph and consequently can be accomplished through dynamic programming on the tree decomposition of the conformational graph in time O(k(t)N(2)), where k is a small parameter; and N is the size of the projiled RNA structure. Experiments show that the application of the alignment algorithm to search in genomes yields the same search accuracy as methods based on a Covariance model with a significant reduction in computation time. In particular; very accurate searches of tmRNAs in bacteria genomes and of telomerase RNAs in yeast genomes can be accomplished in days, as opposed to months required by other methods. The tree decomposition based searching tool is free upon request and can be downloaded at our site h t t p ://w.uga.edu/RNA-informatics/software/index.php.

  2. Pathgroups, a dynamic data structure for genome reconstruction problems.

    Science.gov (United States)

    Zheng, Chunfang

    2010-07-01

    Ancestral gene order reconstruction problems, including the median problem, quartet construction, small phylogeny, guided genome halving and genome aliquoting, are NP hard. Available heuristics dedicated to each of these problems are computationally costly for even small instances. We present a data structure enabling rapid heuristic solution to all these ancestral genome reconstruction problems. A generic greedy algorithm with look-ahead based on an automatically generated priority system suffices for all the problems using this data structure. The efficiency of the algorithm is due to fast updating of the structure during run time and to the simplicity of the priority scheme. We illustrate with the first rapid algorithm for quartet construction and apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny. http://albuquerque.bioinformatics.uottawa.ca/pathgroup/Quartet.html chunfang313@gmail.com Supplementary data are available at Bioinformatics online.

  3. Structural dynamics of retroviral genome and the packaging.

    Science.gov (United States)

    Miyazaki, Yasuyuki; Miyake, Ariko; Nomaguchi, Masako; Adachi, Akio

    2011-01-01

    Retroviruses can cause diseases such as AIDS, leukemia, and tumors, but are also used as vectors for human gene therapy. All retroviruses, except foamy viruses, package two copies of unspliced genomic RNA into their progeny viruses. Understanding the molecular mechanisms of retroviral genome packaging will aid the design of new anti-retroviral drugs targeting the packaging process and improve the efficacy of retroviral vectors. Retroviral genomes have to be specifically recognized by the cognate nucleocapsid domain of the Gag polyprotein from among an excess of cellular and spliced viral mRNA. Extensive virological and structural studies have revealed how retroviral genomic RNA is selectively packaged into the viral particles. The genomic area responsible for the packaging is generally located in the 5' untranslated region (5' UTR), and contains dimerization site(s). Recent studies have shown that retroviral genome packaging is modulated by structural changes of RNA at the 5' UTR accompanied by the dimerization. In this review, we focus on three representative retroviruses, Moloney murine leukemia virus, human immunodeficiency virus type 1 and 2, and describe the molecular mechanism of retroviral genome packaging.

  4. Structural dynamics of retroviral genome and the packaging

    Directory of Open Access Journals (Sweden)

    Yasuyuki eMiyazaki

    2011-12-01

    Full Text Available Retroviruses can cause diseases such as AIDS, leukemia and tumors, but are also used as vectors for human gene therapy. All retroviruses, except foamy viruses, package two copies of unspliced genomic RNA into their progeny viruses. Understanding the molecular mechanisms of retroviral genome packaging will aid the design of new anti-retroviral drugs targeting the packaging process and improve the efficacy of retroviral vectors. Retroviral genomes have to be specifically recognized by the cognate nucleocapsid (NC domain of the Gag polyprotein from among an excess of cellular and spliced viral mRNA. Extensive virological and structural studies have revealed how retroviral genomic RNA is selectively packaged into the viral particles. The genomic area responsible for the packaging is generally located in the 5’ untranslated region (5’ UTR, and contains dimerization site(s. Recent studies have shown that retroviral genome packaging is modulated by structural changes of RNA at the 5’ UTR accompanied by the dimerization. In this review, we focus on three representative retroviruses, Moloney murine leukemia virus (MoMLV, human immunodeficiency virus type 1 (HIV-1 and 2 (HIV-2, and describe the molecular mechanism of retroviral genome packaging.

  5. Functional RNA structures throughout the Hepatitis C Virus genome.

    Science.gov (United States)

    Adams, Rebecca L; Pirakitikulr, Nathan; Pyle, Anna Marie

    2017-06-01

    The single-stranded Hepatitis C Virus (HCV) genome adopts a set of elaborate RNA structures that are involved in every stage of the viral lifecycle. Recent advances in chemical probing, sequencing, and structural biology have facilitated analysis of RNA folding on a genome-wide scale, revealing novel structures and networks of interactions. These studies have underscored the active role played by RNA in every function of HCV and they open the door to new types of RNA-targeted therapeutics. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. Spectral entropy criteria for structural segmentation in genomic DNA sequences

    International Nuclear Information System (INIS)

    Chechetkin, V.R.; Lobzin, V.V.

    2004-01-01

    The spectral entropy is calculated with Fourier structure factors and characterizes the level of structural ordering in a sequence of symbols. It may efficiently be applied to the assessment and reconstruction of the modular structure in genomic DNA sequences. We present the relevant spectral entropy criteria for the local and non-local structural segmentation in DNA sequences. The results are illustrated with the model examples and analysis of intervening exon-intron segments in the protein-coding regions

  7. Screened exchange hybrid density functional for accurate and efficient structures and interaction energies.

    Science.gov (United States)

    Brandenburg, Jan Gerit; Caldeweyher, Eike; Grimme, Stefan

    2016-06-21

    We extend the recently introduced PBEh-3c global hybrid density functional [S. Grimme et al., J. Chem. Phys., 2015, 143, 054107] by a screened Fock exchange variant based on the Henderson-Janesko-Scuseria exchange hole model. While the excellent performance of the global hybrid is maintained for small covalently bound molecules, its performance for computed condensed phase mass densities is further improved. Most importantly, a speed up of 30 to 50% can be achieved and especially for small orbital energy gap cases, the method is numerically much more robust. The latter point is important for many applications, e.g., for metal-organic frameworks, organic semiconductors, or protein structures. This enables an accurate density functional based electronic structure calculation of a full DNA helix structure on a single core desktop computer which is presented as an example in addition to comprehensive benchmark results.

  8. Structural genomics of infectious disease drug targets: the SSGCID

    International Nuclear Information System (INIS)

    Stacy, Robin; Begley, Darren W.; Phan, Isabelle; Staker, Bart L.; Van Voorhis, Wesley C.; Varani, Gabriele; Buchko, Garry W.; Stewart, Lance J.; Myler, Peter J.

    2011-01-01

    An introduction and overview of the focus, goals and overall mission of the Seattle Structural Genomics Center for Infectious Disease (SSGCID) is given. The Seattle Structural Genomics Center for Infectious Disease (SSGCID) is a consortium of researchers at Seattle BioMed, Emerald BioStructures, the University of Washington and Pacific Northwest National Laboratory that was established to apply structural genomics approaches to drug targets from infectious disease organisms. The SSGCID is currently funded over a five-year period by the National Institute of Allergy and Infectious Diseases (NIAID) to determine the three-dimensional structures of 400 proteins from a variety of Category A, B and C pathogens. Target selection engages the infectious disease research and drug-therapy communities to identify drug targets, essential enzymes, virulence factors and vaccine candidates of biomedical relevance to combat infectious diseases. The protein-expression systems, purified proteins, ligand screens and three-dimensional structures produced by SSGCID constitute a valuable resource for drug-discovery research, all of which is made freely available to the greater scientific community. This issue of Acta Crystallographica Section F, entirely devoted to the work of the SSGCID, covers the details of the high-throughput pipeline and presents a series of structures from a broad array of pathogenic organisms. Here, a background is provided on the structural genomics of infectious disease, the essential components of the SSGCID pipeline are discussed and a survey of progress to date is presented

  9. Structural Genomics and Drug Discovery for Infectious Diseases

    International Nuclear Information System (INIS)

    Anderson, W.F.

    2009-01-01

    The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high throughput structural biology technologies to the characterization of proteins from the National Institute for Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging, or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.

  10. Structural determinants and mechanism of HIV-1 genome packaging.

    Science.gov (United States)

    Lu, Kun; Heng, Xiao; Summers, Michael F

    2011-07-22

    Like all retroviruses, the human immunodeficiency virus selectively packages two copies of its unspliced RNA genome, both of which are utilized for strand-transfer-mediated recombination during reverse transcription-a process that enables rapid evolution under environmental and chemotherapeutic pressures. The viral RNA appears to be selected for packaging as a dimer, and there is evidence that dimerization and packaging are mechanistically coupled. Both processes are mediated by interactions between the nucleocapsid domains of a small number of assembling viral Gag polyproteins and RNA elements within the 5'-untranslated region of the genome. A number of secondary structures have been predicted for regions of the genome that are responsible for packaging, and high-resolution structures have been determined for a few small RNA fragments and protein-RNA complexes. However, major questions regarding the RNA structures (and potentially the structural changes) that are responsible for dimeric genome selection remain unanswered. Here, we review efforts that have been made to identify the molecular determinants and mechanism of human immunodeficiency virus type 1 genome packaging. Copyright © 2011 Elsevier Ltd. All rights reserved.

  11. Towards the accurate electronic structure descriptions of typical high-constant dielectrics

    Science.gov (United States)

    Jiang, Ting-Ting; Sun, Qing-Qing; Li, Ye; Guo, Jiao-Jiao; Zhou, Peng; Ding, Shi-Jin; Zhang, David Wei

    2011-05-01

    High-constant dielectrics have gained considerable attention due to their wide applications in advanced devices, such as gate oxides in metal-oxide-semiconductor devices and insulators in high-density metal-insulator-metal capacitors. However, the theoretical investigations of these materials cannot fulfil the requirement of experimental development, especially the requirement for the accurate description of band structures. We performed first-principles calculations based on the hybrid density functionals theory to investigate several typical high-k dielectrics such as Al2O3, HfO2, ZrSiO4, HfSiO4, La2O3 and ZrO2. The band structures of these materials are well described within the framework of hybrid density functionals theory. The band gaps of Al2O3, HfO2, ZrSiO4, HfSiO4, La2O3 and ZrO2are calculated to be 8.0 eV, 5.6 eV, 6.2 eV, 7.1 eV, 5.3 eV and 5.0 eV, respectively, which are very close to the experimental values and far more accurate than those obtained by the traditional generalized gradient approximation method.

  12. Multi-scale structural community organisation of the human genome.

    Science.gov (United States)

    Boulos, Rasha E; Tremblay, Nicolas; Arneodo, Alain; Borgnat, Pierre; Audit, Benjamin

    2017-04-11

    Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structural motifs characteristic of genome organisation. We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.

  13. The Impact of Structural Genomics: Expectations and Outcomes

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Brenner, Steven E.

    2005-12-21

    Structural Genomics (SG) projects aim to expand our structural knowledge of biological macromolecules, while lowering the average costs of structure determination. We quantitatively analyzed the novelty, cost, and impact of structures solved by SG centers, and contrast these results with traditional structural biology. The first structure from a protein family is particularly important to reveal the fold and ancient relationships to other proteins. In the last year, approximately half of such structures were solved at a SG center rather than in a traditional laboratory. Furthermore, the cost of solving a structure at the most efficient U.S. center has now dropped to one-quarter the estimated cost of solving a structure by traditional methods. However, top structural biology laboratories are much more efficient than the average, and comparable to SG centers despite working on very challenging structures. Moreover, traditional structural biology papers are cited significantly more often, suggesting greater current impact.

  14. High throughput platforms for structural genomics of integral membrane proteins.

    Science.gov (United States)

    Mancia, Filippo; Love, James

    2011-08-01

    Structural genomics approaches on integral membrane proteins have been postulated for over a decade, yet specific efforts are lagging years behind their soluble counterparts. Indeed, high throughput methodologies for production and characterization of prokaryotic integral membrane proteins are only now emerging, while large-scale efforts for eukaryotic ones are still in their infancy. Presented here is a review of recent literature on actively ongoing structural genomics of membrane protein initiatives, with a focus on those aimed at implementing interesting techniques aimed at increasing our rate of success for this class of macromolecules. Copyright © 2011 Elsevier Ltd. All rights reserved.

  15. Archived neonatal dried blood spot samples can be used for accurate whole genome and exome-targeted next-generation sequencing

    DEFF Research Database (Denmark)

    Hollegaard, Mads Vilhelm; Grauholm, Jonas; Nielsen, Ronni

    2013-01-01

    Dried blood spot samples (DBSS) have been collected and stored for decades as part of newborn screening programmes worldwide. Representing almost an entire population under a certain age and collected with virtually no bias, the Newborn Screening Biobanks are of immense value in medical studies......, for example, to examine the genetics of various disorders. We have previously demonstrated that DNA extracted from a fraction (2×3.2mm discs) of an archived DBSS can be whole genome amplified (wgaDNA) and used for accurate array genotyping. However, until now, it has been uncertain whether wgaDNA from DBSS...... can be used for accurate whole genome sequencing (WGS) and exome sequencing (WES). This study examined two individuals represented by three different types of samples each: whole-blood (reference samples), 3-year-old DBSS spotted with reference material (refDBSS), and 27- to 29-year-old archived...

  16. Megabase replication domains along the human genome: relation to chromatin structure and genome organisation.

    Science.gov (United States)

    Audit, Benjamin; Zaghloul, Lamia; Baker, Antoine; Arneodo, Alain; Chen, Chun-Long; d'Aubenton-Carafa, Yves; Thermes, Claude

    2013-01-01

    In higher eukaryotes, the absence of specific sequence motifs, marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origins activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabased-size replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data that confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, and around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to local open chromatin fiber state. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of the human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.

  17. Accurate structures and energetics of neutral-framework zeotypes from dispersion-corrected DFT calculations

    Science.gov (United States)

    Fischer, Michael; Angel, Ross J.

    2017-05-01

    Density-functional theory (DFT) calculations incorporating a pairwise dispersion correction were employed to optimize the structures of various neutral-framework compounds with zeolite topologies. The calculations used the PBE functional for solids (PBEsol) in combination with two different dispersion correction schemes, the D2 correction devised by Grimme and the TS correction of Tkatchenko and Scheffler. In the first part of the study, a benchmarking of the DFT-optimized structures against experimental crystal structure data was carried out, considering a total of 14 structures (8 all-silica zeolites, 4 aluminophosphate zeotypes, and 2 dense phases). Both PBEsol-D2 and PBEsol-TS showed an excellent performance, improving significantly over the best-performing approach identified in a previous study (PBE-TS). The temperature dependence of lattice parameters and bond lengths was assessed for those zeotypes where the available experimental data permitted such an analysis. In most instances, the agreement between DFT and experiment improved when the experimental data were corrected for the effects of thermal motion and when low-temperature structure data rather than room-temperature structure data were used as a reference. In the second part, a benchmarking against experimental enthalpies of transition (with respect to α-quartz) was carried out for 16 all-silica zeolites. Excellent agreement was obtained with the PBEsol-D2 functional, with the overall error being in the same range as the experimental uncertainty. Altogether, PBEsol-D2 can be recommended as a computationally efficient DFT approach that simultaneously delivers accurate structures and energetics of neutral-framework zeotypes.

  18. Fuzzy Reasoning to More Accurately Determine Void Areas on Optical Micrographs of Composite Structures

    Science.gov (United States)

    Dominquez, Jesus A.; Tate, Lanetra C.; Wright, M. Clara; Caraccio, Anne

    2013-01-01

    Accomplishing the best-performing composite matrix (resin) requires that not only the processing method but also the cure cycle generate low-void-content structures. If voids are present, the performance of the composite matrix will be significantly reduced. This is usually noticed by significant reductions in matrix-dominated properties, such as compression and shear strength. Voids in composite materials are areas that are absent of the composite components: matrix and fibers. The characteristics of the voids and their accurate estimation are critical to determine for high performance composite structures. One widely used method of performing void analysis on a composite structure sample is acquiring optical micrographs or Scanning Electron Microscope (SEM) images of lateral sides of the sample and retrieving the void areas within the micrographs/images using an image analysis technique. Segmentation for the retrieval and subsequent computation of void areas within the micrographs/images is challenging as the gray-scaled values of the void areas are close to the gray-scaled values of the matrix leading to the need of manually performing the segmentation based on the histogram of the micrographs/images to retrieve the void areas. The use of an algorithm developed by NASA and based on Fuzzy Reasoning (FR) proved to overcome the difficulty of suitably differentiate void and matrix image areas with similar gray-scaled values leading not only to a more accurate estimation of void areas on composite matrix micrographs but also to a faster void analysis process as the algorithm is fully autonomous.

  19. Evolutionary genomics and population structure of Entamoeba histolytica

    Directory of Open Access Journals (Sweden)

    Koushik Das

    2014-11-01

    Full Text Available Amoebiasis caused by the gastrointestinal parasite Entamoeba histolytica has diverse disease outcomes. Study of genome and evolution of this fascinating parasite will help us to understand the basis of its virulence and explain why, when and how it causes diseases. In this review, we have summarized current knowledge regarding evolutionary genomics of E. histolytica and discussed their association with parasite phenotypes and its differential pathogenic behavior. How genetic diversity reveals parasite population structure has also been discussed. Queries concerning their evolution and population structure which were required to be addressed have also been highlighted. This significantly large amount of genomic data will improve our knowledge about this pathogenic species of Entamoeba.

  20. BEYOND ELLIPSE(S): ACCURATELY MODELING THE ISOPHOTAL STRUCTURE OF GALAXIES WITH ISOFIT AND CMODEL

    International Nuclear Information System (INIS)

    Ciambur, B. C.

    2015-01-01

    This work introduces a new fitting formalism for isophotes that enables more accurate modeling of galaxies with non-elliptical shapes, such as disk galaxies viewed edge-on or galaxies with X-shaped/peanut bulges. Within this scheme, the angular parameter that defines quasi-elliptical isophotes is transformed from the commonly used, but inappropriate, polar coordinate to the “eccentric anomaly.” This provides a superior description of deviations from ellipticity, better capturing the true isophotal shape. Furthermore, this makes it possible to accurately recover both the surface brightness profile, using the correct azimuthally averaged isophote, and the two-dimensional model of any galaxy: the hitherto ubiquitous, but artificial, cross-like features in residual images are completely removed. The formalism has been implemented into the Image Reduction and Analysis Facility tasks Ellipse and Bmodel to create the new tasks “Isofit,” and “Cmodel.” The new tools are demonstrated here with application to five galaxies, chosen to be representative case-studies for several areas where this technique makes it possible to gain new scientific insight. Specifically: properly quantifying boxy/disky isophotes via the fourth harmonic order in edge-on galaxies, quantifying X-shaped/peanut bulges, higher-order Fourier moments for modeling bars in disks, and complex isophote shapes. Higher order (n > 4) harmonics now become meaningful and may correlate with structural properties, as boxyness/diskyness is known to do. This work also illustrates how the accurate construction, and subtraction, of a model from a galaxy image facilitates the identification and recovery of over-lapping sources such as globular clusters and the optical counterparts of X-ray sources

  1. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

    DEFF Research Database (Denmark)

    Li, Yingrui; Zheng, Hancheng; Luo, Ruibang

    2011-01-01

    Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1-50 kb) including insertions, deletions, inversions and their precise...

  2. Structured RNAs and synteny regions in the pig genome

    DEFF Research Database (Denmark)

    Anthon, Christian; Tafer, Hakim; Havgaard, Jakob H

    2014-01-01

    BACKGROUND: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However......, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. RESULTS: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure...... lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the here fore mentioned classes. When running the pipeline on a local shuffled version of the genome...

  3. cDNA structure, genomic organization and expression patterns of ...

    African Journals Online (AJOL)

    Visfatin was a newly identified adipocytokine, which was involved in various physiologic and pathologic processes of organisms. The cDNA structure, genomic organization and expression patterns of silver Prussian carp visfatin were described in this report. The silver Prussian carp visfatin cDNA cloned from the liver was ...

  4. N-Methyl Inversion and Accurate Equilibrium Structures in Alkaloids: Pseudopelletierine.

    Science.gov (United States)

    Vallejo-López, Montserrat; Écija, Patricia; Vogt, Natalja; Demaison, Jean; Lesarri, Alberto; Basterretxea, Francisco J; Cocinero, Emilio J

    2017-11-21

    A rotational spectroscopy investigation has resolved the conformational equilibrium and structural properties of the alkaloid pseudopelletierine. Two different conformers, which originate from inversion of the N-methyl group from an axial to an equatorial position, have been unambiguously identified in the gas phase, and nine independent isotopologues have been recorded by Fourier-transform microwave spectroscopy in a jet expansion. Both conformers share a chair-chair configuration of the two bridged six-membered rings. The conformational equilibrium is displaced towards the axial form, with a relative population in the supersonic jet of N axial /N equatorial ≈2/1. An accurate equilibrium structure has been determined by using the semiexperimental mixed-estimation method and alternatively computed by quantum-chemical methods up to the coupled-cluster level of theory. A comparison with the N-methyl inversion equilibria in related tropanes is also presented. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Genome-Scale Metabolic Model for the Green Alga Chlorella vulgaris UTEX 395 Accurately Predicts Phenotypes under Autotrophic, Heterotrophic, and Mixotrophic Growth Conditions.

    Science.gov (United States)

    Zuñiga, Cristal; Li, Chien-Ting; Huelsman, Tyler; Levering, Jennifer; Zielinski, Daniel C; McConnell, Brian O; Long, Christopher P; Knoshaug, Eric P; Guarnieri, Michael T; Antoniewicz, Maciek R; Betenbaugh, Michael J; Zengler, Karsten

    2016-09-01

    The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Furthermore, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine. © 2016 American Society of Plant Biologists. All rights reserved.

  6. Genome-Scale Metabolic Model for the Green Alga Chlorella vulgaris UTEX 395 Accurately Predicts Phenotypes under Autotrophic, Heterotrophic, and Mixotrophic Growth Conditions1

    Science.gov (United States)

    Zuñiga, Cristal; Li, Chien-Ting; Zielinski, Daniel C.; Guarnieri, Michael T.; Antoniewicz, Maciek R.; Zengler, Karsten

    2016-01-01

    The green microalga Chlorella vulgaris has been widely recognized as a promising candidate for biofuel production due to its ability to store high lipid content and its natural metabolic versatility. Compartmentalized genome-scale metabolic models constructed from genome sequences enable quantitative insight into the transport and metabolism of compounds within a target organism. These metabolic models have long been utilized to generate optimized design strategies for an improved production process. Here, we describe the reconstruction, validation, and application of a genome-scale metabolic model for C. vulgaris UTEX 395, iCZ843. The reconstruction represents the most comprehensive model for any eukaryotic photosynthetic organism to date, based on the genome size and number of genes in the reconstruction. The highly curated model accurately predicts phenotypes under photoautotrophic, heterotrophic, and mixotrophic conditions. The model was validated against experimental data and lays the foundation for model-driven strain design and medium alteration to improve yield. Calculated flux distributions under different trophic conditions show that a number of key pathways are affected by nitrogen starvation conditions, including central carbon metabolism and amino acid, nucleotide, and pigment biosynthetic pathways. Furthermore, model prediction of growth rates under various medium compositions and subsequent experimental validation showed an increased growth rate with the addition of tryptophan and methionine. PMID:27372244

  7. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    Science.gov (United States)

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC 3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC 3 -rich genes (GC 3  ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC 3 -rich and intronless), as well as those associated with important functions, such as FA

  8. Considerations in the identification of functional RNA structural elements in genomic alignments

    Directory of Open Access Journals (Sweden)

    Blencowe Benjamin J

    2007-01-01

    Full Text Available Abstract Background Accurate identification of novel, functional noncoding (nc RNA features in genome sequence has proven more difficult than for exons. Current algorithms identify and score potential RNA secondary structures on the basis of thermodynamic stability, conservation, and/or covariance in sequence alignments. Neither the algorithms nor the information gained from the individual inputs have been independently assessed. Furthermore, due to issues in modelling background signal, it has been difficult to gauge the precision of these algorithms on a genomic scale, in which even a seemingly small false-positive rate can result in a vast excess of false discoveries. Results We developed a shuffling algorithm, shuffle-pair.pl, that simultaneously preserves dinucleotide frequency, gaps, and local conservation in pairwise sequence alignments. We used shuffle-pair.pl to assess precision and recall of six ncRNA search tools (MSARI, QRNA, ddbRNA, RNAz, Evofold, and several variants of simple thermodynamic stability on a test set of 3046 alignments of known ncRNAs. Relative to mononucleotide shuffling, preservation of dinucleotide content in shuffling the alignments resulted in a drastic increase in estimated false-positive detection rates for ncRNA elements, precluding evaluation of higher order alignments, which cannot not be adequately shuffled maintaining both dinucleotides and alignment structure. On pairwise alignments, none of the covariance-based tools performed markedly better than thermodynamic scoring alone. Although the high false-positive rates call into question the veracity of any individual predicted secondary structural element in our analysis, we nevertheless identified intriguing global trends in human genome alignments. The distribution of ncRNA prediction scores in 75-base windows overlapping UTRs, introns, and intergenic regions analyzed using both thermodynamic stability and EvoFold (which has no thermodynamic component was

  9. Chromatin structure and evolution in the human genome

    Directory of Open Access Journals (Sweden)

    Dunlop Malcolm G

    2007-05-01

    Full Text Available Abstract Background Evolutionary rates are not constant across the human genome but genes in close proximity have been shown to experience similar levels of divergence and selection. The higher-order organisation of chromosomes has often been invoked to explain such phenomena but previously there has been insufficient data on chromosome structure to investigate this rigorously. Using the results of a recent genome-wide analysis of open and closed human chromatin structures we have investigated the global association between divergence, selection and chromatin structure for the first time. Results In this study we have shown that, paradoxically, synonymous site divergence (dS at non-CpG sites is highest in regions of open chromatin, primarily as a result of an increased number of transitions, while the rates of other traditional measures of mutation (intergenic, intronic and ancient repeat divergence as well as SNP density are highest in closed regions of the genome. Analysis of human-chimpanzee divergence across intron-exon boundaries indicates that although genes in relatively open chromatin generally display little selection at their synonymous sites, those in closed regions show markedly lower divergence at their fourfold degenerate sites than in neighbouring introns and intergenic regions. Exclusion of known Exonic Splice Enhancer hexamers has little affect on the divergence observed at fourfold degenerate sites across chromatin categories; however, we show that closed chromatin is enriched with certain classes of ncRNA genes whose RNA secondary structure may be particularly important. Conclusion We conclude that, overall, non-CpG mutation rates are lowest in open regions of the genome and that regions of the genome with a closed chromatin structure have the highest background mutation rate. This might reflect lower rates of DNA damage or enhanced DNA repair processes in regions of open chromatin. Our results also indicate that dS is a poor

  10. Straightforward and accurate technique for post-coupler stabilization in drift tube linac structures

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Khalvati

    2016-04-01

    Full Text Available The axial electric field of Alvarez drift tube linacs (DTLs is known to be susceptible to variations due to static and dynamic effects like manufacturing tolerances and beam loading. Post-couplers are used to stabilize the accelerating fields of DTLs against tuning errors. Tilt sensitivity and its slope have been introduced as measures for the stability right from the invention of post-couplers but since then the actual stabilization has mostly been done by tedious iteration. In the present article, the local tilt-sensitivity slope TS_{n}^{′} is established as the principal measure for stabilization instead of tilt sensitivity or some visual slope, and its significance is developed on the basis of an equivalent-circuit diagram of the DTL. Experimental and 3D simulation results are used to analyze its behavior and to define a technique for stabilization that allows finding the best post-coupler settings with just four tilt-sensitivity measurements. CERN’s Linac4 DTL Tank 2 and Tank 3 have been stabilized successfully using this technique. The final tilt-sensitivity error has been reduced from ±100%/MHz down to ±3%/MHz for Tank 2 and down to ±1%/MHz for Tank 3. Finally, an accurate procedure for tuning the structure using slug tuners is discussed.

  11. Straightforward and accurate technique for post-coupler stabilization in drift tube linac structures

    Science.gov (United States)

    Khalvati, Mohammad Reza; Ramberger, Suitbert

    2016-04-01

    The axial electric field of Alvarez drift tube linacs (DTLs) is known to be susceptible to variations due to static and dynamic effects like manufacturing tolerances and beam loading. Post-couplers are used to stabilize the accelerating fields of DTLs against tuning errors. Tilt sensitivity and its slope have been introduced as measures for the stability right from the invention of post-couplers but since then the actual stabilization has mostly been done by tedious iteration. In the present article, the local tilt-sensitivity slope TSn' is established as the principal measure for stabilization instead of tilt sensitivity or some visual slope, and its significance is developed on the basis of an equivalent-circuit diagram of the DTL. Experimental and 3D simulation results are used to analyze its behavior and to define a technique for stabilization that allows finding the best post-coupler settings with just four tilt-sensitivity measurements. CERN's Linac4 DTL Tank 2 and Tank 3 have been stabilized successfully using this technique. The final tilt-sensitivity error has been reduced from ±100 %/MHz down to ±3 %/MHz for Tank 2 and down to ±1 %/MHz for Tank 3. Finally, an accurate procedure for tuning the structure using slug tuners is discussed.

  12. Crystal Structures of DNA-Whirly Complexes and Their Role in Arabidopsis Organelle Genome Repair

    Energy Technology Data Exchange (ETDEWEB)

    Cappadocia, Laurent; Maréchal, Alexandre; Parent, Jean-Sébastien; Lepage, Étienne; Sygusch, Jurgen; Brisson, Normand (Montreal)

    2010-09-07

    DNA double-strand breaks are highly detrimental to all organisms and need to be quickly and accurately repaired. Although several proteins are known to maintain plastid and mitochondrial genome stability in plants, little is known about the mechanisms of DNA repair in these organelles and the roles of specific proteins. Here, using ciprofloxacin as a DNA damaging agent specific to the organelles, we show that plastids and mitochondria can repair DNA double-strand breaks through an error-prone pathway similar to the microhomology-mediated break-induced replication observed in humans, yeast, and bacteria. This pathway is negatively regulated by the single-stranded DNA (ssDNA) binding proteins from the Whirly family, thus indicating that these proteins could contribute to the accurate repair of plant organelle genomes. To understand the role of Whirly proteins in this process, we solved the crystal structures of several Whirly-DNA complexes. These reveal a nonsequence-specific ssDNA binding mechanism in which DNA is stabilized between domains of adjacent subunits and rendered unavailable for duplex formation and/or protein interactions. Our results suggest a model in which the binding of Whirly proteins to ssDNA would favor accurate repair of DNA double-strand breaks over an error-prone microhomology-mediated break-induced replication repair pathway.

  13. Elucidation of Operon Structures across Closely Related Bacterial Genomes

    Science.gov (United States)

    Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components. PMID:24959722

  14. Ribosomal DNA sequence heterogeneity reflects intraspecies phylogenies and predicts genome structure in two contrasting yeast species.

    Science.gov (United States)

    West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N

    2014-07-01

    The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of

  15. SINEs, evolution and genome structure in the opossum.

    Science.gov (United States)

    Gu, Wanjun; Ray, David A; Walker, Jerilyn A; Barnes, Erin W; Gentles, Andrew J; Samollow, Paul B; Jurka, Jerzy; Batzer, Mark A; Pollock, David D

    2007-07-01

    Short INterspersed Elements (SINEs) are non-autonomous retrotransposons, usually between 100 and 500 base pairs (bp) in length, which are ubiquitous components of eukaryotic genomes. Their activity, distribution, and evolution can be highly informative on genomic structure and evolutionary processes. To determine recent activity, we amplified more than one hundred SINE1 loci in a panel of 43 M. domestica individuals derived from five diverse geographic locations. The SINE1 family has expanded recently enough that many loci were polymorphic, and the SINE1 insertion-based genetic distances among populations reflected geographic distance. Genome-wide comparisons of SINE1 densities and GC content revealed that high SINE1 density is associated with high GC content in a few long and many short spans. Young SINE1s, whether fixed or polymorphic, showed an unbiased GC content preference for insertion, indicating that the GC preference accumulates over long time periods, possibly in periodic bursts. SINE1 evolution is thus broadly similar to human Alu evolution, although it has an independent origin. High GC content adjacent to SINE1s is strongly correlated with bias towards higher AT to GC substitutions and lower GC to AT substitutions. This is consistent with biased gene conversion, and also indicates that like chickens, but unlike eutherian mammals, GC content heterogeneity (isochore structure) is reinforced by substitution processes in the M. domestica genome. Nevertheless, both high and low GC content regions are apparently headed towards lower GC content equilibria, possibly due to a relative shift to lower recombination rates in the recent Monodelphis ancestral lineage. Like eutherians, metatherian (marsupial) mammals have evolved high CpG substitution rates, but this is apparently a convergence in process rather than a shared ancestral state.

  16. Population Structure and Genomic Breed Composition in an Angus-Brahman Crossbred Cattle Population.

    Science.gov (United States)

    Gobena, Mesfin; Elzo, Mauricio A; Mateescu, Raluca G

    2018-01-01

    Crossbreeding is a common strategy used in tropical and subtropical regions to enhance beef production, and having accurate knowledge of breed composition is essential for the success of a crossbreeding program. Although pedigree records have been traditionally used to obtain the breed composition of crossbred cattle, the accuracy of pedigree-based breed composition can be reduced by inaccurate and/or incomplete records and Mendelian sampling. Breed composition estimation from genomic data has multiple advantages including higher accuracy without being affected by missing, incomplete, or inaccurate records and the ability to be used as independent authentication of breed in breed-labeled beef products. The present study was conducted with 676 Angus-Brahman crossbred cattle with genotype and pedigree information to evaluate the feasibility and accuracy of using genomic data to determine breed composition. We used genomic data in parametric and non-parametric methods to detect population structure due to differences in breed composition while accounting for the confounding effect of close familial relationships. By applying principal component analysis (PCA) and the maximum likelihood method of ADMIXTURE to genomic data, it was possible to successfully characterize population structure resulting from heterogeneous breed ancestry, while accounting for close familial relationships. PCA results offered additional insight into the different hierarchies of genetic variation structuring. The first principal component was strongly correlated with Angus-Brahman proportions, and the second represented variation within animals that have a relatively more extended Brangus lineage-indicating the presence of a distinct pattern of genetic variation in these cattle. Although there was strong agreement between breed proportions estimated from pedigree and genetic information, there were significant discrepancies between these two methods for certain animals. This was most likely due

  17. Population Structure and Genomic Breed Composition in an Angus–Brahman Crossbred Cattle Population

    Directory of Open Access Journals (Sweden)

    Mesfin Gobena

    2018-03-01

    Full Text Available Crossbreeding is a common strategy used in tropical and subtropical regions to enhance beef production, and having accurate knowledge of breed composition is essential for the success of a crossbreeding program. Although pedigree records have been traditionally used to obtain the breed composition of crossbred cattle, the accuracy of pedigree-based breed composition can be reduced by inaccurate and/or incomplete records and Mendelian sampling. Breed composition estimation from genomic data has multiple advantages including higher accuracy without being affected by missing, incomplete, or inaccurate records and the ability to be used as independent authentication of breed in breed-labeled beef products. The present study was conducted with 676 Angus–Brahman crossbred cattle with genotype and pedigree information to evaluate the feasibility and accuracy of using genomic data to determine breed composition. We used genomic data in parametric and non-parametric methods to detect population structure due to differences in breed composition while accounting for the confounding effect of close familial relationships. By applying principal component analysis (PCA and the maximum likelihood method of ADMIXTURE to genomic data, it was possible to successfully characterize population structure resulting from heterogeneous breed ancestry, while accounting for close familial relationships. PCA results offered additional insight into the different hierarchies of genetic variation structuring. The first principal component was strongly correlated with Angus–Brahman proportions, and the second represented variation within animals that have a relatively more extended Brangus lineage—indicating the presence of a distinct pattern of genetic variation in these cattle. Although there was strong agreement between breed proportions estimated from pedigree and genetic information, there were significant discrepancies between these two methods for certain animals

  18. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes

    DEFF Research Database (Denmark)

    Parker, Brian John; Moltke, Ida; Roth, Adam

    2011-01-01

    a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein...

  19. Omni-PolyA: a method and tool for accurate recognition of Poly(A) signals in human genomic DNA

    KAUST Repository

    Magana-Mora, Arturo; Kalkatawi, Manal M.; Bajic, Vladimir B.

    2017-01-01

    several machine learning techniques such as different classifiers in a tree-like decision structure and genetic algorithms for deriving a robust classification model. We performed a comparison between results obtained by state-of-the-art methods, deep

  20. Macromolecular structure determination in the post-genome era

    CERN Document Server

    Kuhn, P

    2001-01-01

    Recent advances in genetics, molecular biology and crystallographic instrumentation and methodology have led to a revolution in the field of Structural Molecular Biology (SMB). These combined advances have paved the way to a more complete and detailed understanding of the biological macromolecules that make up an organism, both in terms of their individual functions and also the interactions between them. In this paper we describe a large-scale, genomic approach to the three-dimensional structure determination of macromolecules and their complexes, using high-throughput methodology to streamline all aspects of the process. This task requires the development of automated high-intensity synchrotron beam lines for X-ray diffraction data collection from single crystal samples. Furthermore, these beam lines must be operated within a sophisticated software and hardware environment, which is capable of delivering a completely automated structure determination pipeline. The SMB resource at SSRL is developing a system...

  1. The complete chloroplast genome sequence of Podocarpus lambertii: genome structure, evolutionary aspects, gene content and SSR detection.

    Directory of Open Access Journals (Sweden)

    Leila do Nascimento Vieira

    Full Text Available BACKGROUND: Podocarpus lambertii (Podocarpaceae is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. METHODOLOGY/PRINCIPAL FINDINGS: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR. It contains 118 unique genes and one duplicated tRNA (trnN-GUU, which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi and Araucariaceae (Agathis dammara. Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. CONCLUSION: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparently loss of rps16 gene in Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of

  2. An accurate model for numerical prediction of piezoelectric energy harvesting from fluid structure interaction problems

    International Nuclear Information System (INIS)

    Amini, Y; Emdad, H; Farid, M

    2014-01-01

    Piezoelectric energy harvesting (PEH) from ambient energy sources, particularly vibrations, has attracted considerable interest throughout the last decade. Since fluid flow has a high energy density, it is one of the best candidates for PEH. Indeed, a piezoelectric energy harvesting process from the fluid flow takes the form of natural three-way coupling of the turbulent fluid flow, the electromechanical effect of the piezoelectric material and the electrical circuit. There are some experimental and numerical studies about piezoelectric energy harvesting from fluid flow in literatures. Nevertheless, accurate modeling for predicting characteristics of this three-way coupling has not yet been developed. In the present study, accurate modeling for this triple coupling is developed and validated by experimental results. A new code based on this modeling in an openFOAM platform is developed. (paper)

  3. Recognizing genes and other components of genomic structure

    Energy Technology Data Exchange (ETDEWEB)

    Burks, C. (Los Alamos National Lab., NM (USA)); Myers, E. (Arizona Univ., Tucson, AZ (USA). Dept. of Computer Science); Stormo, G.D. (Colorado Univ., Boulder, CO (USA). Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as electronic blackboards'' and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  4. Structural constraints in the packaging of bluetongue virus genomic segments.

    Science.gov (United States)

    Burkhardt, Christiane; Sung, Po-Yu; Celma, Cristina C; Roy, Polly

    2014-10-01

    The mechanism used by bluetongue virus (BTV) to ensure the sorting and packaging of its 10 genomic segments is still poorly understood. In this study, we investigated the packaging constraints for two BTV genomic segments from two different serotypes. Segment 4 (S4) of BTV serotype 9 was mutated sequentially and packaging of mutant ssRNAs was investigated by two newly developed RNA packaging assay systems, one in vivo and the other in vitro. Modelling of the mutated ssRNA followed by biochemical data analysis suggested that a conformational motif formed by interaction of the 5' and 3' ends of the molecule was necessary and sufficient for packaging. A similar structural signal was also identified in S8 of BTV serotype 1. Furthermore, the same conformational analysis of secondary structures for positive-sense ssRNAs was used to generate a chimeric segment that maintained the putative packaging motif but contained unrelated internal sequences. This chimeric segment was packaged successfully, confirming that the motif identified directs the correct packaging of the segment. © 2014 The Authors.

  5. Identification of genomic indels and structural variations using split reads

    Directory of Open Access Journals (Sweden)

    Urban Alexander E

    2011-07-01

    Full Text Available Abstract Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC, a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read. All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions. A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models. This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions. We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole

  6. Secure web book to store structural genomics research data.

    Science.gov (United States)

    Manjasetty, Babu A; Höppner, Klaus; Mueller, Uwe; Heinemann, Udo

    2003-01-01

    Recently established collaborative structural genomics programs aim at significantly accelerating the crystal structure analysis of proteins. These large-scale projects require efficient data management systems to ensure seamless collaboration between different groups of scientists working towards the same goal. Within the Berlin-based Protein Structure Factory, the synchrotron X-ray data collection and the subsequent crystal structure analysis tasks are located at BESSY, a third-generation synchrotron source. To organize file-based communication and data transfer at the BESSY site of the Protein Structure Factory, we have developed the web-based BCLIMS, the BESSY Crystallography Laboratory Information Management System. BCLIMS is a relational data management system which is powered by MySQL as the database engine and Apache HTTP as the web server. The database interface routines are written in Python programing language. The software is freely available to academic users. Here we describe the storage, retrieval and manipulation of laboratory information, mainly pertaining to the synchrotron X-ray diffraction experiments and the subsequent protein structure analysis, using BCLIMS.

  7. Training set optimization under population structure in genomic selection.

    Science.gov (United States)

    Isidro, Julio; Jannink, Jean-Luc; Akdemir, Deniz; Poland, Jesse; Heslot, Nicolas; Sorrells, Mark E

    2015-01-01

    Population structure must be evaluated before optimization of the training set population. Maximizing the phenotypic variance captured by the training set is important for optimal performance. The optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the coefficient of determination (CDmean), mean of predictor error variance (PEVmean), stratified CDmean (StratCDmean) and random sampling, were evaluated for prediction accuracy in the presence of different levels of population structure. In the presence of population structure, the most phenotypic variation captured by a sampling method in the TRS is desirable. The wheat dataset showed mild population structure, and CDmean and stratified CDmean methods showed the highest accuracies for all the traits except for test weight and heading date. The rice dataset had strong population structure and the approach based on stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes in the TRS, maximizing the relationship between TRS and the test set. This makes it suitable as an optimization criterion for long-term selection. Our results indicated that the best selection criterion used to optimize the TRS seems to depend on the interaction of trait architecture and population structure.

  8. Asexual populations of the human malaria parasite, Plasmodium falciparum, use a two-step genomic strategy to acquire accurate, beneficial DNA amplifications.

    Directory of Open Access Journals (Sweden)

    Jennifer L Guler

    Full Text Available Malaria drug resistance contributes to up to a million annual deaths. Judicious deployment of new antimalarials and vaccines could benefit from an understanding of early molecular events that promote the evolution of parasites. Continuous in vitro challenge of Plasmodium falciparum parasites with a novel dihydroorotate dehydrogenase (DHODH inhibitor reproducibly selected for resistant parasites. Genome-wide analysis of independently-derived resistant clones revealed a two-step strategy to evolutionary success. Some haploid blood-stage parasites first survive antimalarial pressure through fortuitous DNA duplications that always included the DHODH gene. Independently-selected parasites had different sized amplification units but they were always flanked by distant A/T tracks. Higher level amplification and resistance was attained using a second, more efficient and more accurate, mechanism for head-to-tail expansion of the founder unit. This second homology-based process could faithfully tune DNA copy numbers in either direction, always retaining the unique DNA amplification sequence from the original A/T-mediated duplication for that parasite line. Pseudo-polyploidy at relevant genomic loci sets the stage for gaining additional mutations at the locus of interest. Overall, we reveal a population-based genomic strategy for mutagenesis that operates in human stages of P. falciparum to efficiently yield resistance-causing genetic changes at the correct locus in a successful parasite. Importantly, these founding events arise with precision; no other new amplifications are seen in the resistant haploid blood stage parasite. This minimizes the need for meiotic genetic cleansing that can only occur in sexual stage development of the parasite in mosquitoes.

  9. CASD-NMR 2: robust and accurate unsupervised analysis of raw NOESY spectra and protein structure determination with UNIO

    International Nuclear Information System (INIS)

    Guerry, Paul; Duong, Viet Dung; Herrmann, Torsten

    2015-01-01

    UNIO is a comprehensive software suite for protein NMR structure determination that enables full automation of all NMR data analysis steps involved—including signal identification in NMR spectra, sequence-specific backbone and side-chain resonance assignment, NOE assignment and structure calculation. Within the framework of the second round of the community-wide stringent blind NMR structure determination challenge (CASD-NMR 2), we participated in two categories of CASD-NMR 2, namely using either raw NMR spectra or unrefined NOE peak lists as input. A total of 15 resulting NMR structure bundles were submitted for 9 out of 10 blind protein targets. All submitted UNIO structures accurately coincided with the corresponding blind targets as documented by an average backbone root mean-square deviation to the reference proteins of only 1.2 Å. Also, the precision of the UNIO structure bundles was virtually identical to the ensemble of reference structures. By assessing the quality of all UNIO structures submitted to the two categories, we find throughout that only the UNIO–ATNOS/CANDID approach using raw NMR spectra consistently yielded structure bundles of high quality for direct deposition in the Protein Data Bank. In conclusion, the results obtained in CASD-NMR 2 are another vital proof for robust, accurate and unsupervised NMR data analysis by UNIO for real-world applications

  10. GOSSIP: a method for fast and accurate global alignment of protein structures.

    Science.gov (United States)

    Kifer, I; Nussinov, R; Wolfson, H J

    2011-04-01

    The database of known protein structures (PDB) is increasing rapidly. This results in a growing need for methods that can cope with the vast amount of structural data. To analyze the accumulating data, it is important to have a fast tool for identifying similar structures and clustering them by structural resemblance. Several excellent tools have been developed for the comparison of protein structures. These usually address the task of local structure alignment, an important yet computationally intensive problem due to its complexity. It is difficult to use such tools for comparing a large number of structures to each other at a reasonable time. Here we present GOSSIP, a novel method for a global all-against-all alignment of any set of protein structures. The method detects similarities between structures down to a certain cutoff (a parameter of the program), hence allowing it to detect similar structures at a much higher speed than local structure alignment methods. GOSSIP compares many structures in times which are several orders of magnitude faster than well-known available structure alignment servers, and it is also faster than a database scanning method. We evaluate GOSSIP both on a dataset of short structural fragments and on two large sequence-diverse structural benchmarks. Our conclusions are that for a threshold of 0.6 and above, the speed of GOSSIP is obtained with no compromise of the accuracy of the alignments or of the number of detected global similarities. A server, as well as an executable for download, are available at http://bioinfo3d.cs.tau.ac.il/gossip/.

  11. Accurate Breakpoint Mapping in Apparently Balanced Translocation Families with Discordant Phenotypes Using Whole Genome Mate-Pair Sequencing

    DEFF Research Database (Denmark)

    Aristidou, Constantia; Koufaris, Costas; Theodosiou, Athina

    2017-01-01

    Familial apparently balanced translocations (ABTs) segregating with discordant phenotypes are extremely challenging for interpretation and counseling due to the scarcity of publications and lack of routine techniques for quick investigation. Recently, next generation sequencing has emerged...... and non-affected members carrying the same translocations. PTCD1, ATP5J2-PTCD1, CADPS2, and STPG1 were disrupted by the translocations in three families, rendering them initially as possible disease candidate genes. However, subsequent mutation screening and structural variant analysis did not reveal any...... can also be used in routine clinical investigation of ABT cases. Unlike de novo translocations, no associations were determined here between familial two-way ABTs and the phenotype of the affected members, in which the presence of cryptic imbalances and complex chromosomal rearrangements has been...

  12. Simultaneous Structural Variation Discovery in Multiple Paired-End Sequenced Genomes

    Science.gov (United States)

    Hormozdiari, Fereydoun; Hajirasouliha, Iman; McPherson, Andrew; Eichler, Evan E.; Sahinalp, S. Cenk

    Next generation sequencing technologies have been decreasing the costs and increasing the world-wide capacity for sequence production at an unprecedented rate, making the initiation of large scale projects aiming to sequence almost 2000 genomes [1]. Structural variation detection promises to be one of the key diagnostic tools for cancer and other diseases with genomic origin. In this paper, we study the problem of detecting structural variation events in two or more sequenced genomes through high throughput sequencing . We propose to move from the current model of (1) detecting genomic variations in single next generation sequenced (NGS) donor genomes independently, and (2) checking whether two or more donor genomes indeed agree or disagree on the variations (in this paper we name this framework Independent Structural Variation Discovery and Merging - ISV&M), to a new model in which we detect structural variation events among multiple genomes simultaneously.

  13. Accurate Determination of the Frequency Response Function of Submerged and Confined Structures by Using PZT-Patches†

    Directory of Open Access Journals (Sweden)

    Alexandre Presas

    2017-03-01

    Full Text Available To accurately determine the dynamic response of a structure is of relevant interest in many engineering applications. Particularly, it is of paramount importance to determine the Frequency Response Function (FRF for structures subjected to dynamic loads in order to avoid resonance and fatigue problems that can drastically reduce their useful life. One challenging case is the experimental determination of the FRF of submerged and confined structures, such as hydraulic turbines, which are greatly affected by dynamic problems as reported in many cases in the past. The utilization of classical and calibrated exciters such as instrumented hammers or shakers to determine the FRF in such structures can be very complex due to the confinement of the structure and because their use can disturb the boundary conditions affecting the experimental results. For such cases, Piezoelectric Patches (PZTs, which are very light, thin and small, could be a very good option. Nevertheless, the main drawback of these exciters is that the calibration as dynamic force transducers (relationship voltage/force has not been successfully obtained in the past. Therefore, in this paper, a method to accurately determine the FRF of submerged and confined structures by using PZTs is developed and validated. The method consists of experimentally determining some characteristic parameters that define the FRF, with an uncalibrated PZT exciting the structure. These parameters, which have been experimentally determined, are then introduced in a validated numerical model of the tested structure. In this way, the FRF of the structure can be estimated with good accuracy. With respect to previous studies, where only the natural frequencies and mode shapes were considered, this paper discuss and experimentally proves the best excitation characteristic to obtain also the damping ratios and proposes a procedure to fully determine the FRF. The method proposed here has been validated for the

  14. Accurate Determination of the Frequency Response Function of Submerged and Confined Structures by Using PZT-Patches†.

    Science.gov (United States)

    Presas, Alexandre; Valentin, David; Egusquiza, Eduard; Valero, Carme; Egusquiza, Mònica; Bossio, Matias

    2017-03-22

    To accurately determine the dynamic response of a structure is of relevant interest in many engineering applications. Particularly, it is of paramount importance to determine the Frequency Response Function (FRF) for structures subjected to dynamic loads in order to avoid resonance and fatigue problems that can drastically reduce their useful life. One challenging case is the experimental determination of the FRF of submerged and confined structures, such as hydraulic turbines, which are greatly affected by dynamic problems as reported in many cases in the past. The utilization of classical and calibrated exciters such as instrumented hammers or shakers to determine the FRF in such structures can be very complex due to the confinement of the structure and because their use can disturb the boundary conditions affecting the experimental results. For such cases, Piezoelectric Patches (PZTs), which are very light, thin and small, could be a very good option. Nevertheless, the main drawback of these exciters is that the calibration as dynamic force transducers (relationship voltage/force) has not been successfully obtained in the past. Therefore, in this paper, a method to accurately determine the FRF of submerged and confined structures by using PZTs is developed and validated. The method consists of experimentally determining some characteristic parameters that define the FRF, with an uncalibrated PZT exciting the structure. These parameters, which have been experimentally determined, are then introduced in a validated numerical model of the tested structure. In this way, the FRF of the structure can be estimated with good accuracy. With respect to previous studies, where only the natural frequencies and mode shapes were considered, this paper discuss and experimentally proves the best excitation characteristic to obtain also the damping ratios and proposes a procedure to fully determine the FRF. The method proposed here has been validated for the structure vibrating

  15. Accurate first-principles structures and energies of diversely bonded systems from an efficient density functional.

    Science.gov (United States)

    Sun, Jianwei; Remsing, Richard C; Zhang, Yubo; Sun, Zhaoru; Ruzsinszky, Adrienn; Peng, Haowei; Yang, Zenghui; Paul, Arpita; Waghmare, Umesh; Wu, Xifan; Klein, Michael L; Perdew, John P

    2016-09-01

    One atom or molecule binds to another through various types of bond, the strengths of which range from several meV to several eV. Although some computational methods can provide accurate descriptions of all bond types, those methods are not efficient enough for many studies (for example, large systems, ab initio molecular dynamics and high-throughput searches for functional materials). Here, we show that the recently developed non-empirical strongly constrained and appropriately normed (SCAN) meta-generalized gradient approximation (meta-GGA) within the density functional theory framework predicts accurate geometries and energies of diversely bonded molecules and materials (including covalent, metallic, ionic, hydrogen and van der Waals bonds). This represents a significant improvement at comparable efficiency over its predecessors, the GGAs that currently dominate materials computation. Often, SCAN matches or improves on the accuracy of a computationally expensive hybrid functional, at almost-GGA cost. SCAN is therefore expected to have a broad impact on chemistry and materials science.

  16. Multi-scale coding of genomic information: From DNA sequence to genome structure and function

    International Nuclear Information System (INIS)

    Arneodo, Alain; Vaillant, Cedric; Audit, Benjamin; Argoul, Francoise; D'Aubenton-Carafa, Yves; Thermes, Claude

    2011-01-01

    Understanding how chromatin is spatially and dynamically organized in the nucleus of eukaryotic cells and how this affects genome functions is one of the main challenges of cell biology. Since the different orders of packaging in the hierarchical organization of DNA condition the accessibility of DNA sequence elements to trans-acting factors that control the transcription and replication processes, there is actually a wealth of structural and dynamical information to learn in the primary DNA sequence. In this review, we show that when using concepts, methodologies, numerical and experimental techniques coming from statistical mechanics and nonlinear physics combined with wavelet-based multi-scale signal processing, we are able to decipher the multi-scale sequence encoding of chromatin condensation-decondensation mechanisms that play a fundamental role in regulating many molecular processes involved in nuclear functions.

  17. RNA structural constraints in the evolution of the influenza A virus genome NP segment

    NARCIS (Netherlands)

    A.P. Gultyaev (Alexander); A. Tsyganov-Bodounov (Anton); M.I. Spronken (Monique); S. Van Der Kooij (Sander); R.A.M. Fouchier (Ron); R.C.L. Olsthoorn (René)

    2014-01-01

    textabstractConserved RNA secondary structures were predicted in the nucleoprotein (NP) segment of the influenza A virus genome using comparative sequence and structure analysis. A number of structural elements exhibiting nucleotide covariations were identified over the whole segment length,

  18. Macromolecular structure determination in the post-genome era

    International Nuclear Information System (INIS)

    Kuhn, P.; Soltis, S.M.

    2001-01-01

    Recent advances in genetics, molecular biology and crystallographic instrumentation and methodology have led to a revolution in the field of Structural Molecular Biology (SMB). These combined advances have paved the way to a more complete and detailed understanding of the biological macromolecules that make up an organism, both in terms of their individual functions and also the interactions between them. In this paper we describe a large-scale, genomic approach to the three-dimensional structure determination of macromolecules and their complexes, using high-throughput methodology to streamline all aspects of the process. This task requires the development of automated high-intensity synchrotron beam lines for X-ray diffraction data collection from single crystal samples. Furthermore, these beam lines must be operated within a sophisticated software and hardware environment, which is capable of delivering a completely automated structure determination pipeline. The SMB resource at SSRL is developing a system for the structure determination steps of this process, starting with the initial characterization of the frozen sample, followed by data collection, data reduction, phase determination, and model building. This paper focuses on the data collection elements of this high-throughput system

  19. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sükösd, Zsuzsanna; Andersen, Ebbe Sloth; Seemann, Ernst Stefan

    2015-01-01

    of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interesting, a set of long distance interactions form a core organizing structure (COS) that organize the genome into three major structural domains. Despite overlapping...

  20. Interbank Market Structure and Accurate Estimation of an Aggregate Liquidity Shock

    OpenAIRE

    Isakov, A.

    2013-01-01

    It's customary among money market analysts to blame interest rate deviations from the Bank of Russia's target band on the market structure imperfections or segmentation. We isolate one form of such market imperfection and provide an illustration of its potential impact on central bank's open market operations efficiency in the current monetary policy framework. We then hypothesize that naive (market) structure-agnostic liquidity gap aggregation will lead to market demand underestimation in so...

  1. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology

    DEFF Research Database (Denmark)

    Cao, Hongzhi; Hastie, Alex R.; Cao, Dandan

    2014-01-01

    mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost......-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion. RESULTS: Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger...... fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides...

  2. Insular Celtic population structure and genomic footprints of migration.

    Directory of Open Access Journals (Sweden)

    Ross P Byrne

    2018-01-01

    Full Text Available Previous studies of the genetic landscape of Ireland have suggested homogeneity, with population substructure undetectable using single-marker methods. Here we have harnessed the haplotype-based method fineSTRUCTURE in an Irish genome-wide SNP dataset, identifying 23 discrete genetic clusters which segregate with geographical provenance. Cluster diversity is pronounced in the west of Ireland but reduced in the east where older structure has been eroded by historical migrations. Accordingly, when populations from the neighbouring island of Britain are included, a west-east cline of Celtic-British ancestry is revealed along with a particularly striking correlation between haplotypes and geography across both islands. A strong relationship is revealed between subsets of Northern Irish and Scottish populations, where discordant genetic and geographic affinities reflect major migrations in recent centuries. Additionally, Irish genetic proximity of all Scottish samples likely reflects older strata of communication across the narrowest inter-island crossing. Using GLOBETROTTER we detected Irish admixture signals from Britain and Europe and estimated dates for events consistent with the historical migrations of the Norse-Vikings, the Anglo-Normans and the British Plantations. The influence of the former is greater than previously estimated from Y chromosome haplotypes. In all, we paint a new picture of the genetic landscape of Ireland, revealing structure which should be considered in the design of studies examining rare genetic variation and its association with traits.

  3. Unexpected structural complexity of supernumerary marker chromosomes characterized by microarray comparative genomic hybridization

    Directory of Open Access Journals (Sweden)

    Hing Anne V

    2008-04-01

    Full Text Available Abstract Background Supernumerary marker chromosomes (SMCs are structurally abnormal extra chromosomes that cannot be unambiguously identified by conventional banding techniques. In the past, SMCs have been characterized using a variety of different molecular cytogenetic techniques. Although these techniques can sometimes identify the chromosome of origin of SMCs, they are cumbersome to perform and are not available in many clinical cytogenetic laboratories. Furthermore, they cannot precisely determine the region or breakpoints of the chromosome(s involved. In this study, we describe four patients who possess one or more SMCs (a total of eight SMCs in all four patients that were characterized by microarray comparative genomic hybridization (array CGH. Results In at least one SMC from all four patients, array CGH uncovered unexpected complexity, in the form of complex rearrangements, that could have gone undetected using other molecular cytogenetic techniques. Although array CGH accurately defined the chromosome content of all but two minute SMCs, fluorescence in situ hybridization was necessary to determine the structure of the markers. Conclusion The increasing use of array CGH in clinical cytogenetic laboratories will provide an efficient method for more comprehensive characterization of SMCs. Improved SMC characterization, facilitated by array CGH, will allow for more accurate SMC/phenotype correlation.

  4. Accurate Fabrication of Hydroxyapatite Bone Models with Porous Scaffold Structures by Using Stereolithography

    International Nuclear Information System (INIS)

    Maeda, Chiaki; Tasaki, Satoko; Kirihara, Soshu

    2011-01-01

    Computer graphic models of bioscaffolds with four-coordinate lattice structures of solid rods in artificial bones were designed by using a computer aided design. The scaffold models composed of acryl resin with hydroxyapatite particles at 45vol. % were fabricated by using stereolithography of a computer aided manufacturing. After dewaxing and sintering heat treatment processes, the ceramics scaffold models with four-coordinate lattices and fine hydroxyapatite microstructures were obtained successfully. By using a computer aided analysis, it was found that bio-fluids could flow extensively inside the sintered scaffolds. This result shows that the lattice structures will realize appropriate bio-fluid circulations and promote regenerations of new bones.

  5. Accurate Fabrication of Hydroxyapatite Bone Models with Porous Scaffold Structures by Using Stereolithography

    Energy Technology Data Exchange (ETDEWEB)

    Maeda, Chiaki; Tasaki, Satoko; Kirihara, Soshu, E-mail: c-maeda@jwri.osaka-u.ac.jp [Joining and Welding Research Institute, Osaka University, 11-1 Mihogaoka, Ibaraki City, Osaka 567-0047 (Japan)

    2011-05-15

    Computer graphic models of bioscaffolds with four-coordinate lattice structures of solid rods in artificial bones were designed by using a computer aided design. The scaffold models composed of acryl resin with hydroxyapatite particles at 45vol. % were fabricated by using stereolithography of a computer aided manufacturing. After dewaxing and sintering heat treatment processes, the ceramics scaffold models with four-coordinate lattices and fine hydroxyapatite microstructures were obtained successfully. By using a computer aided analysis, it was found that bio-fluids could flow extensively inside the sintered scaffolds. This result shows that the lattice structures will realize appropriate bio-fluid circulations and promote regenerations of new bones.

  6. PredictSNP2: A Unified Platform for Accurately Evaluating SNP Effects by Exploiting the Different Characteristics of Variants in Distinct Genomic Regions.

    Science.gov (United States)

    Bendl, Jaroslav; Musil, Miloš; Štourač, Jan; Zendulka, Jaroslav; Damborský, Jiří; Brezovský, Jan

    2016-05-01

    An important message taken from human genome sequencing projects is that the human population exhibits approximately 99.9% genetic similarity. Variations in the remaining parts of the genome determine our identity, trace our history and reveal our heritage. The precise delineation of phenotypically causal variants plays a key role in providing accurate personalized diagnosis, prognosis, and treatment of inherited diseases. Several computational methods for achieving such delineation have been reported recently. However, their ability to pinpoint potentially deleterious variants is limited by the fact that their mechanisms of prediction do not account for the existence of different categories of variants. Consequently, their output is biased towards the variant categories that are most strongly represented in the variant databases. Moreover, most such methods provide numeric scores but not binary predictions of the deleteriousness of variants or confidence scores that would be more easily understood by users. We have constructed three datasets covering different types of disease-related variants, which were divided across five categories: (i) regulatory, (ii) splicing, (iii) missense, (iv) synonymous, and (v) nonsense variants. These datasets were used to develop category-optimal decision thresholds and to evaluate six tools for variant prioritization: CADD, DANN, FATHMM, FitCons, FunSeq2 and GWAVA. This evaluation revealed some important advantages of the category-based approach. The results obtained with the five best-performing tools were then combined into a consensus score. Additional comparative analyses showed that in the case of missense variations, protein-based predictors perform better than DNA sequence-based predictors. A user-friendly web interface was developed that provides easy access to the five tools' predictions, and their consensus scores, in a user-understandable format tailored to the specific features of different categories of variations. To

  7. Ab initio investigation of the N2 endash HF complex: Accurate structure and energetics

    International Nuclear Information System (INIS)

    Woon, D.E.; Dunning, T.H. Jr.; Peterson, K.A.

    1996-01-01

    Augmented correlation consistent basis sets of double (aug-cc-pVDZ), triple (aug-cc-pVTZ), and modified quadruple zeta (aug-cc-pVQZ') quality have been employed to describe the N 2 endash HF potential energy surface at the Hartree endash Fock level and with single reference correlated wave functions including Mo/ller endash Plesset perturbation theory (MP2, MP3, MP4) and coupled cluster methods [CCSD, CCSD(T)]. The most accurate computed equilibrium binding energies D e are (with counterpoise correction) 810 cm -1 (MP4/aug-cc-pVQZ') and 788 cm -1 [CCSD(T)/aug-cc-pVQZ']. Estimated complete basis set limits of 814 cm -1 (MP4) and 793 cm -1 [CCSD(T)] indicate that the large basis set results are essentially converged. Harmonic frequencies and zero-point energies were determined through the aug-cc-pVTZ level. Combining the zero point energies computed at the aug-cc-pVTZ level with the equilibrium binding energies computed at the aug-cc-pVQZ' level, we predict D 0 values of 322 and 296 cm -1 , respectively, at the MP4 and CCSD(T) levels of theory. Using experimental anharmonic frequencies, on the other hand, the CCSD(T) value of D 0 is increased to 415 cm -1 , in good agreement with the experimental value recently reported by Miller and co-workers, 398±2 cm -1 . copyright 1996 American Institute of Physics

  8. Accurate calibration of RL shunts for piezoelectric vibration damping of flexible structures

    DEFF Research Database (Denmark)

    Høgsberg, Jan Becker; Krenk, Steen

    2016-01-01

    Piezoelectric RL (resistive-inductive) shunts are passive resonant devices used for damping of dominantvibration modes of a flexible structure and their efficiency relies on precise calibration of the shuntcomponents. In the present paper improved calibration accuracy is attained by an extension...

  9. Sidewall patterning - A new wafer-scale method for accurate patterning of vertical silicon structures

    NARCIS (Netherlands)

    Westerik, P. J.; Vijselaar, W. J.C.; Berenschot, J. W.; Tas, N. R.; Huskens, J.; Gardeniers, J. G.E.

    2018-01-01

    For the definition of wafer scale micro- and nanostructures, in-plane geometry is usually controlled by optical lithography. However, options for precisely patterning structures in the out-of-plane direction are much more limited. In this paper we present a versatile self-aligned technique that

  10. Systematic determination of the mosaic structure of bacterial genomes: species backbone versus strain-specific loops

    Directory of Open Access Journals (Sweden)

    Gendrault-Jacquemard A

    2005-07-01

    Full Text Available Abstract Background Public databases now contain multitude of complete bacterial genomes, including several genomes of the same species. The available data offers new opportunities to address questions about bacterial genome evolution, a task that requires reliable fine comparison data of closely related genomes. Recent analyses have shown, using pairwise whole genome alignments, that it is possible to segment bacterial genomes into a common conserved backbone and strain-specific sequences called loops. Results Here, we generalize this approach and propose a strategy that allows systematic and non-biased genome segmentation based on multiple genome alignments. Segmentation analyses, as applied to 13 different bacterial species, confirmed the feasibility of our approach to discern the 'mosaic' organization of bacterial genomes. Segmentation results are available through a Web interface permitting functional analysis, extraction and visualization of the backbone/loops structure of documented genomes. To illustrate the potential of this approach, we performed a precise analysis of the mosaic organization of three E. coli strains and functional characterization of the loops. Conclusion The segmentation results including the backbone/loops structure of 13 bacterial species genomes are new and available for use by the scientific community at the URL: http://genome.jouy.inra.fr/mosaic.

  11. Multi-threshold white matter structural networks fusion for accurate diagnosis of Tourette syndrome children

    Science.gov (United States)

    Wen, Hongwei; Liu, Yue; Wang, Shengpei; Li, Zuoyong; Zhang, Jishui; Peng, Yun; He, Huiguang

    2017-03-01

    Tourette syndrome (TS) is a childhood-onset neurobehavioral disorder. To date, TS is still misdiagnosed due to its varied presentation and lacking of obvious clinical symptoms. Therefore, studies of objective imaging biomarkers are of great importance for early TS diagnosis. As tic generation has been linked to disturbed structural networks, and many efforts have been made recently to investigate brain functional or structural networks using machine learning methods, for the purpose of disease diagnosis. However, few studies were related to TS and some drawbacks still existed in them. Therefore, we propose a novel classification framework integrating a multi-threshold strategy and a network fusion scheme to address the preexisting drawbacks. Here we used diffusion MRI probabilistic tractography to construct the structural networks of 44 TS children and 48 healthy children. We ameliorated the similarity network fusion algorithm specially to fuse the multi-threshold structural networks. Graph theoretical analysis was then implemented, and nodal degree, nodal efficiency and nodal betweenness centrality were selected as features. Finally, support vector machine recursive feature extraction (SVM-RFE) algorithm was used for feature selection, and then optimal features are fed into SVM to automatically discriminate TS children from controls. We achieved a high accuracy of 89.13% evaluated by a nested cross validation, demonstrated the superior performance of our framework over other comparison methods. The involved discriminative regions for classification primarily located in the basal ganglia and frontal cortico-cortical networks, all highly related to the pathology of TS. Together, our study may provide potential neuroimaging biomarkers for early-stage TS diagnosis.

  12. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile, and accurate RNA structure analysis

    Science.gov (United States)

    Smola, Matthew J.; Rice, Greggory M.; Busan, Steven; Siegfried, Nathan A.; Weeks, Kevin M.

    2016-01-01

    SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues based on the ability of reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as for simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures. PMID:26426499

  13. Target Selection and Deselection at the Berkeley StructuralGenomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Chandonia, John-Marc; Kim, Sung-Hou; Brenner, Steven E.

    2005-03-22

    At the Berkeley Structural Genomics Center (BSGC), our goalis to obtain a near-complete structural complement of proteins in theminimal organisms Mycoplasma genitalium and M. pneumoniae, two closelyrelated pathogens. Current targets for structure determination have beenselected in six major stages, starting with those predicted to be mosttractable to high throughput study and likely to yield new structuralinformation. We report on the process used to select these proteins, aswell as our target deselection procedure. Target deselection reducesexperimental effort by eliminating targets similar to those recentlysolved by the structural biology community or other centers. We measurethe impact of the 69 structures solved at the BSGC as of July 2004 onstructure prediction coverage of the M. pneumoniae and M. genitaliumproteomes. The number of Mycoplasma proteins for which thefold couldfirst be reliably assigned based on structures solved at the BSGC (24 M.pneumoniae and 21 M. genitalium) is approximately 25 percent of the totalresulting from work at all structural genomics centers and the worldwidestructural biology community (94 M. pneumoniae and 86M. genitalium)during the same period. As the number of structures contributed by theBSGC during that period is less than 1 percent of the total worldwideoutput, the benefits of a focused target selection strategy are apparent.If the structures of all current targets were solved, the percentage ofM. pneumoniae proteins for which folds could be reliably assigned wouldincrease from approximately 57 percent (391 of 687) at present to around80 percent (550 of 687), and the percentage of the proteome that could beaccurately modeled would increase from around 37 percent (254 of 687) toabout 64 percent (438 of 687). In M. genitalium, the percentage of theproteome that could be structurally annotated based on structures of ourremaining targets would rise from 72 percent (348 of 486) to around 76percent (371 of 486), with the

  14. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    Science.gov (United States)

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  15. Accurate determination of interfacial protein secondary structure by combining interfacial-sensitive amide I and amide III spectral signals.

    Science.gov (United States)

    Ye, Shuji; Li, Hongchun; Yang, Weilai; Luo, Yi

    2014-01-29

    Accurate determination of protein structures at the interface is essential to understand the nature of interfacial protein interactions, but it can only be done with a few, very limited experimental methods. Here, we demonstrate for the first time that sum frequency generation vibrational spectroscopy can unambiguously differentiate the interfacial protein secondary structures by combining surface-sensitive amide I and amide III spectral signals. This combination offers a powerful tool to directly distinguish random-coil (disordered) and α-helical structures in proteins. From a systematic study on the interactions between several antimicrobial peptides (including LKα14, mastoparan X, cecropin P1, melittin, and pardaxin) and lipid bilayers, it is found that the spectral profiles of the random-coil and α-helical structures are well separated in the amide III spectra, appearing below and above 1260 cm(-1), respectively. For the peptides with a straight backbone chain, the strength ratio for the peaks of the random-coil and α-helical structures shows a distinct linear relationship with the fraction of the disordered structure deduced from independent NMR experiments reported in the literature. It is revealed that increasing the fraction of negatively charged lipids can induce a conformational change of pardaxin from random-coil to α-helical structures. This experimental protocol can be employed for determining the interfacial protein secondary structures and dynamics in situ and in real time without extraneous labels.

  16. The Dirac equation in electronic structure calculations: Accurate evaluation of DFT predictions for actinides

    International Nuclear Information System (INIS)

    Wills, John M.; Mattsson, Ann E.

    2012-01-01

    Brooks, Johansson, and Skriver, using the LMTO-ASA method and considerable insight, were able to explain many of the ground state properties of the actinides. In the many years since this work was done, electronic structure calculations of increasing sophistication have been applied to actinide elements and compounds, attempting to quantify the applicability of DFT to actinides and actinide compounds and to try to incorporate other methodologies (i.e. DMFT) into DFT calculations. Through these calculations, the limits of both available density functionals and ad hoc methodologies are starting to become clear. However, it has also become clear that approximations used to incorporate relativity are not adequate to provide rigorous tests of the underlying equations of DFT, not to mention ad hoc additions. In this talk, we describe the result of full-potential LMTO calculations for the elemental actinides, comparing results obtained with a full Dirac basis with those obtained from scalar-relativistic bases, with and without variational spin-orbit. This comparison shows that the scalar relativistic treatment of actinides does not have sufficient accuracy to provide a rigorous test of theory and that variational spin-orbit introduces uncontrolled errors in the results of electronic structure calculations on actinide elements.

  17. A more accurate modeling of the effects of actuators in large space structures

    Science.gov (United States)

    Hablani, H. B.

    1981-01-01

    The paper deals with finite actuators. A nonspinning three-axis stabilized space vehicle having a two-dimensional large structure and a rigid body at the center is chosen for analysis. The torquers acting on the vehicle are modeled as antisymmetric forces distributed in a small but finite area. In the limit they represent point torquers which also are treated as a special case of surface distribution of dipoles. Ordinary and partial differential equations governing the forced vibrations of the vehicle are derived by using Hamilton's principle. Associated modal inputs are obtained for both the distributed moments and the distributed forces. It is shown that the finite torquers excite the higher modes less than the point torquers. Modal cost analysis proves to be a suitable methodology to this end.

  18. Producing genome structure populations with the dynamic and automated PGS software.

    Science.gov (United States)

    Hua, Nan; Tjong, Harianto; Shin, Hanjun; Gong, Ke; Zhou, Xianghong Jasmine; Alber, Frank

    2018-05-01

    Chromosome conformation capture technologies such as Hi-C are widely used to investigate the spatial organization of genomes. Because genome structures can vary considerably between individual cells of a population, interpreting ensemble-averaged Hi-C data can be challenging, in particular for long-range and interchromosomal interactions. We pioneered a probabilistic approach for the generation of a population of distinct diploid 3D genome structures consistent with all the chromatin-chromatin interaction probabilities from Hi-C experiments. Each structure in the population is a physical model of the genome in 3D. Analysis of these models yields new insights into the causes and the functional properties of the genome's organization in space and time. We provide a user-friendly software package, called PGS, which runs on local machines (for practice runs) and high-performance computing platforms. PGS takes a genome-wide Hi-C contact frequency matrix, along with information about genome segmentation, and produces an ensemble of 3D genome structures entirely consistent with the input. The software automatically generates an analysis report, and provides tools to extract and analyze the 3D coordinates of specific domains. Basic Linux command-line knowledge is sufficient for using this software. A typical running time of the pipeline is ∼3 d with 300 cores on a computer cluster to generate a population of 1,000 diploid genome structures at topological-associated domain (TAD)-level resolution.

  19. Fast and accurate non-sequential protein structure alignment using a new asymmetric linear sum assignment heuristic.

    Science.gov (United States)

    Brown, Peter; Pullan, Wayne; Yang, Yuedong; Zhou, Yaoqi

    2016-02-01

    The three dimensional tertiary structure of a protein at near atomic level resolution provides insight alluding to its function and evolution. As protein structure decides its functionality, similarity in structure usually implies similarity in function. As such, structure alignment techniques are often useful in the classifications of protein function. Given the rapidly growing rate of new, experimentally determined structures being made available from repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods by performing trials with commonly used datasets. These benchmark datasets used to gauge alignment accuracy include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean squared distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. SPalignNS was implemented in C++. The source code, binary executable, and a web server version is freely available at: http://sparks-lab.org yaoqi.zhou@griffith.edu.au. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Accurate facade feature extraction method for buildings from three-dimensional point cloud data considering structural information

    Science.gov (United States)

    Wang, Yongzhi; Ma, Yuqing; Zhu, A.-xing; Zhao, Hui; Liao, Lixia

    2018-05-01

    Facade features represent segmentations of building surfaces and can serve as a building framework. Extracting facade features from three-dimensional (3D) point cloud data (3D PCD) is an efficient method for 3D building modeling. By combining the advantages of 3D PCD and two-dimensional optical images, this study describes the creation of a highly accurate building facade feature extraction method from 3D PCD with a focus on structural information. The new extraction method involves three major steps: image feature extraction, exploration of the mapping method between the image features and 3D PCD, and optimization of the initial 3D PCD facade features considering structural information. Results show that the new method can extract the 3D PCD facade features of buildings more accurately and continuously. The new method is validated using a case study. In addition, the effectiveness of the new method is demonstrated by comparing it with the range image-extraction method and the optical image-extraction method in the absence of structural information. The 3D PCD facade features extracted by the new method can be applied in many fields, such as 3D building modeling and building information modeling.

  1. An efficient and accurate method for computation of energy release rates in beam structures with longitudinal cracks

    DEFF Research Database (Denmark)

    Blasques, José Pedro Albergaria Amaral; Bitsche, Robert

    2015-01-01

    This paper proposes a novel, efficient, and accurate framework for fracture analysis of beam structures with longitudinal cracks. The three-dimensional local stress field is determined using a high-fidelity beam model incorporating a finite element based cross section analysis tool. The Virtual...... Crack Closure Technique is used for computation of strain energy release rates. The devised framework was employed for analysis of cracks in beams with different cross section geometries. The results show that the accuracy of the proposed method is comparable to that of conventional three......-dimensional solid finite element models while using only a fraction of the computation time....

  2. Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

    2007-09-02

    Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

  3. Accurate measurement of surface areas of anatomical structures by computer-assisted triangulation of computed tomography images

    Energy Technology Data Exchange (ETDEWEB)

    Allardice, J.T.; Jacomb-Hood, J.; Abulafi, A.M.; Williams, N.S. (Royal London Hospital (United Kingdom)); Cookson, J.; Dykes, E.; Holman, J. (London Hospital Medical College (United Kingdom))

    1993-05-01

    There is a need for accurate surface area measurement of internal anatomical structures in order to define light dosimetry in adjunctive intraoperative photodynamic therapy (AIOPDT). The authors investigated whether computer-assisted triangulation of serial sections generated by computed tomography (CT) scanning can give an accurate assessment of the surface area of the walls of the true pelvis after anterior resection and before colorectal anastomosis. They show that the technique of paper density tessellation is an acceptable method of measuring the surface areas of phantom objects, with a maximum error of 0.5%, and is used as the gold standard. Computer-assisted triangulation of CT images of standard geometric objects and accurately-constructed pelvic phantoms gives a surface area assessment with a maximum error of 2.5% compared with the gold standard. The CT images of 20 patients' pelves have been analysed by computer-assisted triangulation and this shows the surface area of the walls varies from 143 cm[sup 2] to 392 cm[sup 2]. (Author).

  4. The eukaryotic genome is structurally and functionally more like a social insect colony than a book.

    Science.gov (United States)

    Qiu, Guo-Hua; Yang, Xiaoyan; Zheng, Xintian; Huang, Cuiqin

    2017-11-01

    Traditionally, the genome has been described as the 'book of life'. However, the metaphor of a book may not reflect the dynamic nature of the structure and function of the genome. In the eukaryotic genome, the number of centrally located protein-coding sequences is relatively constant across species, but the amount of noncoding DNA increases considerably with the increase of organismal evolutional complexity. Therefore, it has been hypothesized that the abundant peripheral noncoding DNA protects the genome and the central protein-coding sequences in the eukaryotic genome. Upon comparison with the habitation, sociality and defense mechanisms of a social insect colony, it is found that the genome is similar to a social insect colony in various aspects. A social insect colony may thus be a better metaphor than a book to describe the spatial organization and physical functions of the genome. The potential implications of the metaphor are also discussed.

  5. Susceptibilities to DNA Structural Transitions within Eukaryotic Genomes

    Science.gov (United States)

    Zhabinskaya, Dina; Benham, Craig; Madden, Sally

    2012-02-01

    We analyze the competitive transitions to alternate secondary DNA structures in a negatively supercoiled DNA molecule of kilobase length and specified base sequence. We use statistical mechanics to calculate the competition among all regions within the sequence that are susceptible to transitions to alternate structures. We use an approximate numerical method since the calculation of an exact partition function is numerically cumbersome for DNA molecules of lengths longer than hundreds of base pairs. This method yields accurate results in reasonable computational times. We implement algorithms that calculate the competition between transitions to denatured states and to Z-form DNA. We analyze these transitions near the transcription start sites (TSS) of a set of eukaryotic genes. We find an enhancement of Z-forming regions upstream of the TSS and a depletion of denatured regions around the start sites. We confirm that these finding are statistically significant by comparing our results to a set of randomized genes with preserved base composition at each position relative to the gene start sites. When we study the correlation of these transitions in orthologous mouse and human genes we find a clear evolutionary conservation of both types of transitions around the TSS.

  6. Accurate protein structure annotation through competitive diffusion of enzymatic functions over a network of local evolutionary similarities.

    Directory of Open Access Journals (Sweden)

    Eric Venner

    Full Text Available High-throughput Structural Genomics yields many new protein structures without known molecular function. This study aims to uncover these missing annotations by globally comparing select functional residues across the structural proteome. First, Evolutionary Trace Annotation, or ETA, identifies which proteins have local evolutionary and structural features in common; next, these proteins are linked together into a proteomic network of ETA similarities; then, starting from proteins with known functions, competing functional labels diffuse link-by-link over the entire network. Every node is thus assigned a likelihood z-score for every function, and the most significant one at each node wins and defines its annotation. In high-throughput controls, this competitive diffusion process recovered enzyme activity annotations with 99% and 97% accuracy at half-coverage for the third and fourth Enzyme Commission (EC levels, respectively. This corresponds to false positive rates 4-fold lower than nearest-neighbor and 5-fold lower than sequence-based annotations. In practice, experimental validation of the predicted carboxylesterase activity in a protein from Staphylococcus aureus illustrated the effectiveness of this approach in the context of an increasingly drug-resistant microbe. This study further links molecular function to a small number of evolutionarily important residues recognizable by Evolutionary Tracing and it points to the specificity and sensitivity of functional annotation by competitive global network diffusion. A web server is at http://mammoth.bcm.tmc.edu/networks.

  7. Alignment-free comparative genomic screen for structured RNAs using coarse-grained secondary structure dot plots

    DEFF Research Database (Denmark)

    Kato, Yuki; Gorodkin, Jan; Havgaard, Jakob Hull

    2017-01-01

    . Methods: Here we present a fast and efficient method, DotcodeR, for detecting structurally similar RNAs in genomic sequences by comparing their corresponding coarse-grained secondary structure dot plots at string level. This allows us to perform an all-against-all scan of all window pairs from two genomes...... without alignment. Results: Our computational experiments with simulated data and real chromosomes demonstrate that the presented method has good sensitivity. Conclusions: DotcodeR can be useful as a pre-filter in a genomic comparative scan for structured RNAs....

  8. Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation.

    Directory of Open Access Journals (Sweden)

    Niklas Berliner

    Full Text Available Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both, domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability, and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing the protein instability, and help us better understand the molecular causes of diseases.

  9. From structure prediction to genomic screens for novel non-coding RNAs

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.

    2011-01-01

    Abstract: Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction....... This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early...... upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other....

  10. Structure-based sampling and self-correcting machine learning for accurate calculations of potential energy surfaces and vibrational levels

    Science.gov (United States)

    Dral, Pavlo O.; Owens, Alec; Yurchenko, Sergei N.; Thiel, Walter

    2017-06-01

    We present an efficient approach for generating highly accurate molecular potential energy surfaces (PESs) using self-correcting, kernel ridge regression (KRR) based machine learning (ML). We introduce structure-based sampling to automatically assign nuclear configurations from a pre-defined grid to the training and prediction sets, respectively. Accurate high-level ab initio energies are required only for the points in the training set, while the energies for the remaining points are provided by the ML model with negligible computational cost. The proposed sampling procedure is shown to be superior to random sampling and also eliminates the need for training several ML models. Self-correcting machine learning has been implemented such that each additional layer corrects errors from the previous layer. The performance of our approach is demonstrated in a case study on a published high-level ab initio PES of methyl chloride with 44 819 points. The ML model is trained on sets of different sizes and then used to predict the energies for tens of thousands of nuclear configurations within seconds. The resulting datasets are utilized in variational calculations of the vibrational energy levels of CH3Cl. By using both structure-based sampling and self-correction, the size of the training set can be kept small (e.g., 10% of the points) without any significant loss of accuracy. In ab initio rovibrational spectroscopy, it is thus possible to reduce the number of computationally costly electronic structure calculations through structure-based sampling and self-correcting KRR-based machine learning by up to 90%.

  11. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  12. Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes

    Science.gov (United States)

    Spanò, M.; Lillo, F.; Miccichè, S.; Mantegna, R. N.

    2008-10-01

    By performing a comprehensive study on 1832 segments of 1212 complete genomes of viruses, we show that in viral genomes the hairpin structures of thermodynamically predicted RNA secondary structures are more abundant than expected under a simple random null hypothesis. The detected hairpin structures of RNA secondary structures are present both in coding and in noncoding regions for the four groups of viruses categorized as dsDNA, dsRNA, ssDNA and ssRNA. For all groups, hairpin structures of RNA secondary structures are detected more frequently than expected for a random null hypothesis in noncoding rather than in coding regions. However, potential RNA secondary structures are also present in coding regions of dsDNA group. In fact, we detect evolutionary conserved RNA secondary structures in conserved coding and noncoding regions of a large set of complete genomes of dsDNA herpesviruses.

  13. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis.

    Science.gov (United States)

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clap gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.

  14. A sequence-based survey of the complex structural organization of tumor genomes

    Energy Technology Data Exchange (ETDEWEB)

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  15. GENERATING ACCURATE 3D MODELS OF ARCHITECTURAL HERITAGE STRUCTURES USING LOW-COST CAMERA AND OPEN SOURCE ALGORITHMS

    Directory of Open Access Journals (Sweden)

    M. Zacharek

    2017-05-01

    Full Text Available These studies have been conductedusing non-metric digital camera and dense image matching algorithms, as non-contact methods of creating monuments documentation.In order toprocess the imagery, few open-source software and algorithms of generating adense point cloud from images have been executed. In the research, the OSM Bundler, VisualSFM software, and web application ARC3D were used. Images obtained for each of the investigated objects were processed using those applications, and then dense point clouds and textured 3D models were created. As a result of post-processing, obtained models were filtered and scaled.The research showedthat even using the open-source software it is possible toobtain accurate 3D models of structures (with an accuracy of a few centimeters, but for the purpose of documentation and conservation of cultural and historical heritage, such accuracy can be insufficient.

  16. Low-q structure of Kr gas at undercritical densities. An accurate SANS determination of triplet forces

    International Nuclear Information System (INIS)

    Guarini, E.; Casanova, G.; Genova Univ.; Bafile, U.; Firenze Univ.; Barocchi, F.; Firenze Univ.

    1999-01-01

    Complete text of publication follows. Besides the promising results of very recent attempts to directly measure the triple-dipole interactions in noble fluids, an alternative method can be applied to get much more accurate experimental information on the long-range triplet contributions from the density behavior of the structure factor S(q) in the low-q region. However this method requires the use of a model for the pair potential and its accuracy critically depends on the agreement between the experimental two-body properties of the fluid and the model assumed. It is shown how this condition is remarkably fulfilled by the results of a SANS measurement in gaseous Kr at densities below 4.3 nm -3 , thus leading to and unprecedented accuracy in the evaluation of triple-dipole forces. (author)

  17. Local chromatin structure of heterochromatin regulates repeated DNA stability, nucleolus structure, and genome integrity

    Energy Technology Data Exchange (ETDEWEB)

    Peng, Jamy C. [Univ. of California, Berkeley, CA (United States)

    2007-01-01

    Heterochromatin constitutes a significant portion of the genome in higher eukaryotes; approximately 30% in Drosophila and human. Heterochromatin contains a high repeat DNA content and a low density of protein-encoding genes. In contrast, euchromatin is composed mostly of unique sequences and contains the majority of single-copy genes. Genetic and cytological studies demonstrated that heterochromatin exhibits regulatory roles in chromosome organization, centromere function and telomere protection. As an epigenetically regulated structure, heterochromatin formation is not defined by any DNA sequence consensus. Heterochromatin is characterized by its association with nucleosomes containing methylated-lysine 9 of histone H3 (H3K9me), heterochromatin protein 1 (HP1) that binds H3K9me, and Su(var)3-9, which methylates H3K9 and binds HP1. Heterochromatin formation and functions are influenced by HP1, Su(var)3-9, and the RNA interference (RNAi) pathway. My thesis project investigates how heterochromatin formation and function impact nuclear architecture, repeated DNA organization, and genome stability in Drosophila melanogaster. H3K9me-based chromatin reduces extrachromosomal DNA formation; most likely by restricting the access of repair machineries to repeated DNAs. Reducing extrachromosomal ribosomal DNA stabilizes rDNA repeats and the nucleolus structure. H3K9me-based chromatin also inhibits DNA damage in heterochromatin. Cells with compromised heterochromatin structure, due to Su(var)3-9 or dcr-2 (a component of the RNAi pathway) mutations, display severe DNA damage in heterochromatin compared to wild type. In these mutant cells, accumulated DNA damage leads to chromosomal defects such as translocations, defective DNA repair response, and activation of the G2-M DNA repair and mitotic checkpoints that ensure cellular and animal viability. My thesis research suggests that DNA replication, repair, and recombination mechanisms in heterochromatin differ from those in

  18. Accurate prediction of stability changes in protein mutants by combining machine learning with structure based computational mutagenesis.

    Science.gov (United States)

    Masso, Majid; Vaisman, Iosif I

    2008-09-15

    Accurate predictive models for the impact of single amino acid substitutions on protein stability provide insight into protein structure and function. Such models are also valuable for the design and engineering of new proteins. Previously described methods have utilized properties of protein sequence or structure to predict the free energy change of mutants due to thermal (DeltaDeltaG) and denaturant (DeltaDeltaG(H2O)) denaturations, as well as mutant thermal stability (DeltaT(m)), through the application of either computational energy-based approaches or machine learning techniques. However, accuracy associated with applying these methods separately is frequently far from optimal. We detail a computational mutagenesis technique based on a four-body, knowledge-based, statistical contact potential. For any mutation due to a single amino acid replacement in a protein, the method provides an empirical normalized measure of the ensuing environmental perturbation occurring at every residue position. A feature vector is generated for the mutant by considering perturbations at the mutated position and it's ordered six nearest neighbors in the 3-dimensional (3D) protein structure. These predictors of stability change are evaluated by applying machine learning tools to large training sets of mutants derived from diverse proteins that have been experimentally studied and described. Predictive models based on our combined approach are either comparable to, or in many cases significantly outperform, previously published results. A web server with supporting documentation is available at http://proteins.gmu.edu/automute.

  19. Large-scale trends in the evolution of gene structures within 11 animal genomes.

    Directory of Open Access Journals (Sweden)

    Mark Yandell

    2006-03-01

    Full Text Available We have used the annotations of six animal genomes (Homo sapiens, Mus musculus, Ciona intestinalis, Drosophila melanogaster, Anopheles gambiae, and Caenorhabditis elegans together with the sequences of five unannotated Drosophila genomes to survey changes in protein sequence and gene structure over a variety of timescales--from the less than 5 million years since the divergence of D. simulans and D. melanogaster to the more than 500 million years that have elapsed since the Cambrian explosion. To do so, we have developed a new open-source software library called CGL (for "Comparative Genomics Library". Our results demonstrate that change in intron-exon structure is gradual, clock-like, and largely independent of coding-sequence evolution. This means that genome annotations can be used in new ways to inform, corroborate, and test conclusions drawn from comparative genomics analyses that are based upon protein and nucleotide sequence similarities.

  20. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome

    NARCIS (Netherlands)

    Collins, Ryan L; Brand, Harrison; Redin, Claire E.; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T.; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A.; Lucente, Diane; Levy, Brynn; Sanders, Jan-Stephan; Wapner, Ronald J.; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E.

    2017-01-01

    Background: Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. Results: We sequenced 689 participants with autism spectrum disorder (ASD) and other

  1. From structure prediction to genomic screens for novel non-coding RNAs.

    Science.gov (United States)

    Gorodkin, Jan; Hofacker, Ivo L

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  2. Structural constraints in the packaging of bluetongue virus genomic segments

    OpenAIRE

    Burkhardt, Christiane; Sung, Po-Yu; Celma, Cristina C.; Roy, Polly

    2014-01-01

    : The mechanism used by bluetongue virus (BTV) to ensure the sorting and packaging of its 10 genomic segments is still poorly understood. In this study, we investigated the packaging constraints for two BTV genomic segments from two different serotypes. Segment 4 (S4) of BTV serotype 9 was mutated sequentially and packaging of mutant ssRNAs was investigated by two newly developed RNA packaging assay systems, one in vivo and the other in vitro. Modelling of the mutated ssRNA followed by bioche...

  3. Leishmania naiffi and Leishmania guyanensis reference genomes highlight genome structure and gene evolution in the Viannia subgenus.

    Science.gov (United States)

    Coughlan, Simone; Taylor, Ali Shirley; Feane, Eoghan; Sanders, Mandy; Schonian, Gabriele; Cotton, James A; Downing, Tim

    2018-04-01

    The unicellular protozoan parasite Leishmania causes the neglected tropical disease leishmaniasis, affecting 12 million people in 98 countries. In South America, where the Viannia subgenus predominates, so far only L. ( Viannia ) braziliensis and L. ( V. ) panamensis have been sequenced, assembled and annotated as reference genomes. Addressing this deficit in molecular information can inform species typing, epidemiological monitoring and clinical treatment. Here, L. ( V. ) naiffi and L. ( V. ) guyanensis genomic DNA was sequenced to assemble these two genomes as draft references from short sequence reads. The methods used were tested using short sequence reads for L. braziliensis M2904 against its published reference as a comparison. This assembly and annotation pipeline identified 70 additional genes not annotated on the original M2904 reference. Phylogenetic and evolutionary comparisons of L. guyanensis and L. naiffi with 10 other Viannia genomes revealed four traits common to all Viannia : aneuploidy, 22 orthologous groups of genes absent in other Leishmania subgenera, elevated TATE transposon copies and a high NADH-dependent fumarate reductase gene copy number. Within the Viannia , there were limited structural changes in genome architecture specific to individual species: a 45 Kb amplification on chromosome 34 was present in all bar L. lainsoni , L. naiffi had a higher copy number of the virulence factor leishmanolysin, and laboratory isolate L. shawi M8408 had a possible minichromosome derived from the 3' end of chromosome 34 . This combination of genome assembly, phylogenetics and comparative analysis across an extended panel of diverse Viannia has uncovered new insights into the origin and evolution of this subgenus and can help improve diagnostics for leishmaniasis surveillance.

  4. Structural genomic variation as risk factor for idiopathic recurrent miscarriage

    DEFF Research Database (Denmark)

    Nagirnaja, Liina; Palta, Priit; Kasak, Laura

    2014-01-01

    Recurrent miscarriage (RM) is a multifactorial disorder with acknowledged genetic heritability that affects ∼3% of couples aiming at childbirth. As copy number variants (CNVs) have been shown to contribute to reproductive disease susceptibility, we aimed to describe genome-wide profile of CNVs an...

  5. Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution.

    Science.gov (United States)

    Yap, Jia-Yee S; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y H; Wilton, Alan; Wilkins, Marc R; Rossetto, Maurizio; Delaney, Sven K

    2015-01-01

    The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine.

  6. A genomic overview of the population structure of Salmonella.

    Directory of Open Access Journals (Sweden)

    Nabil-Fareed Alikhan

    2018-04-01

    Full Text Available For many decades, Salmonella enterica has been subdivided by serological properties into serovars or further subdivided for epidemiological tracing by a variety of diagnostic tests with higher resolution. Recently, it has been proposed that so-called eBurst groups (eBGs based on the alleles of seven housekeeping genes (legacy multilocus sequence typing [MLST] corresponded to natural populations and could replace serotyping. However, this approach lacks the resolution needed for epidemiological tracing and the existence of natural populations had not been independently validated by independent criteria. Here, we describe EnteroBase, a web-based platform that assembles draft genomes from Illumina short reads in the public domain or that are uploaded by users. EnteroBase implements legacy MLST as well as ribosomal gene MLST (rMLST, core genome MLST (cgMLST, and whole genome MLST (wgMLST and currently contains over 100,000 assembled genomes from Salmonella. It also provides graphical tools for visual interrogation of these genotypes and those based on core single nucleotide polymorphisms (SNPs. eBGs based on legacy MLST are largely consistent with eBGs based on rMLST, thus demonstrating that these correspond to natural populations. rMLST also facilitated the selection of representative genotypes for SNP analyses of the entire breadth of diversity within Salmonella. In contrast, cgMLST provides the resolution needed for epidemiological investigations. These observations show that genomic genotyping, with the assistance of EnteroBase, can be applied at all levels of diversity within the Salmonella genus.

  7. A genomic overview of the population structure of Salmonella.

    Science.gov (United States)

    Alikhan, Nabil-Fareed; Zhou, Zhemin; Sergeant, Martin J; Achtman, Mark

    2018-04-01

    For many decades, Salmonella enterica has been subdivided by serological properties into serovars or further subdivided for epidemiological tracing by a variety of diagnostic tests with higher resolution. Recently, it has been proposed that so-called eBurst groups (eBGs) based on the alleles of seven housekeeping genes (legacy multilocus sequence typing [MLST]) corresponded to natural populations and could replace serotyping. However, this approach lacks the resolution needed for epidemiological tracing and the existence of natural populations had not been independently validated by independent criteria. Here, we describe EnteroBase, a web-based platform that assembles draft genomes from Illumina short reads in the public domain or that are uploaded by users. EnteroBase implements legacy MLST as well as ribosomal gene MLST (rMLST), core genome MLST (cgMLST), and whole genome MLST (wgMLST) and currently contains over 100,000 assembled genomes from Salmonella. It also provides graphical tools for visual interrogation of these genotypes and those based on core single nucleotide polymorphisms (SNPs). eBGs based on legacy MLST are largely consistent with eBGs based on rMLST, thus demonstrating that these correspond to natural populations. rMLST also facilitated the selection of representative genotypes for SNP analyses of the entire breadth of diversity within Salmonella. In contrast, cgMLST provides the resolution needed for epidemiological investigations. These observations show that genomic genotyping, with the assistance of EnteroBase, can be applied at all levels of diversity within the Salmonella genus.

  8. One-photon mass-analyzed threshold ionization (MATI) spectroscopy of pyridine: Determination of accurate ionization energy and cationic structure

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Yu Ran; Kang, Do Won; Kim, Hong Lae, E-mail: chkwon@kangwon.ac.kr, E-mail: hlkim@kangwon.ac.kr; Kwon, Chan Ho, E-mail: chkwon@kangwon.ac.kr, E-mail: hlkim@kangwon.ac.kr [Department of Chemistry and Institute for Molecular Science and Fusion Technology, College of Natural Sciences, Kangwon National University, Chuncheon 200-701 (Korea, Republic of)

    2014-11-07

    Ionization energies and cationic structures of pyridine were intensively investigated utilizing one-photon mass-analyzed threshold ionization (MATI) spectroscopy with vacuum ultraviolet radiation generated by four-wave difference frequency mixing in Kr. The present one-photon high-resolution MATI spectrum of pyridine demonstrated a much finer and richer vibrational structure than that of the previously reported two-photon MATI spectrum. From the MATI spectrum and photoionization efficiency curve, the accurate ionization energy of the ionic ground state of pyridine was confidently determined to be 73 570 ± 6 cm{sup −1} (9.1215 ± 0.0007 eV). The observed spectrum was almost completely assigned by utilizing Franck-Condon factors and vibrational frequencies calculated through adjustments of the geometrical parameters of cationic pyridine at the B3LYP/cc-pVTZ level. A unique feature unveiled through rigorous analysis was the prominent progression of the 10 vibrational mode, which corresponds to in-plane ring bending, and the combination of other totally symmetric fundamentals with the ring bending overtones, which contribute to the geometrical change upon ionization. Notably, the remaining peaks originate from the upper electronic state ({sup 2}A{sub 2}), as predicted by high-resolution photoelectron spectroscopy studies and symmetry-adapted cluster configuration interaction calculations. Based on the quantitatively good agreement between the experimental and calculated results, it was concluded that upon ionization the pyridine cation in the ground electronic state should have a planar structure of C{sub 2v} symmetry through the C-N axis.

  9. The genomic structure of the DMBT1 gene

    DEFF Research Database (Denmark)

    Mollenhauer, J; Holmskov, U; Wiemann, S

    1999-01-01

    Increasing evidence has accumulated for an involvement of the inactivation of tumour suppressor genes at chromosome 10q in the carcinogenesis of brain tumours, melanomas, and carcinomas of the lung, the prostate, the pancreas, and the endometrium. The gene DMBT1 (Deleted in Malignant Brain Tumours...... 1) is located at chromosome 10q25.3-q26.1, within one of the putative intervals for tumour suppressor genes. DMBT1 is a member of the scavenger-receptor cysteine-rich (SRCR) superfamily and displays homozygous deletions or lack of expression in glioblastoma multiforme, medulloblastoma......, and in gastrointestinal and lung cancers. Based on these properties, DMBT1 has been proposed to be a candidate tumour suppressor gene. We have determined the genomic sequence of DMBT1 to allow analyses of mutations. The gene has at least 54 exons that span a genomic region of about 80 kb. We have identified a putative...

  10. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

    Science.gov (United States)

    Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

    2015-02-10

    Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.

  11. RNA 3D modules in genome-wide predictions of RNA 2D structure

    DEFF Research Database (Denmark)

    Theis, Corinna; Zirbel, Craig L; Zu Siederdissen, Christian Höner

    2015-01-01

    . These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D......Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational...... approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution...

  12. The SGC beyond structural genomics: redefining the role of 3D structures by coupling genomic stratification with fragment-based discovery.

    Science.gov (United States)

    Bradley, Anthony R; Echalier, Aude; Fairhead, Michael; Strain-Damerell, Claire; Brennan, Paul; Bullock, Alex N; Burgess-Brown, Nicola A; Carpenter, Elisabeth P; Gileadi, Opher; Marsden, Brian D; Lee, Wen Hwa; Yue, Wyatt; Bountra, Chas; von Delft, Frank

    2017-11-08

    The ongoing explosion in genomics data has long since outpaced the capacity of conventional biochemical methodology to verify the large number of hypotheses that emerge from the analysis of such data. In contrast, it is still a gold-standard for early phenotypic validation towards small-molecule drug discovery to use probe molecules (or tool compounds), notwithstanding the difficulty and cost of generating them. Rational structure-based approaches to ligand discovery have long promised the efficiencies needed to close this divergence; in practice, however, this promise remains largely unfulfilled, for a host of well-rehearsed reasons and despite the huge technical advances spearheaded by the structural genomics initiatives of the noughties. Therefore the current, fourth funding phase of the Structural Genomics Consortium (SGC), building on its extensive experience in structural biology of novel targets and design of protein inhibitors, seeks to redefine what it means to do structural biology for drug discovery. We developed the concept of a Target Enabling Package (TEP) that provides, through reagents, assays and data, the missing link between genetic disease linkage and the development of usefully potent compounds. There are multiple prongs to the ambition: rigorously assessing targets' genetic disease linkages through crowdsourcing to a network of collaborating experts; establishing a systematic approach to generate the protocols and data that comprise each target's TEP; developing new, X-ray-based fragment technologies for generating high quality chemical matter quickly and cheaply; and exploiting a stringently open access model to build multidisciplinary partnerships throughout academia and industry. By learning how to scale these approaches, the SGC aims to make structures finally serve genomics, as originally intended, and demonstrate how 3D structures systematically allow new modes of druggability to be discovered for whole classes of targets. © 2017 The

  13. Structure and genome organization of AFV2, a novel archaeal lipothrixvirus with unusual terminal and core structures

    DEFF Research Database (Denmark)

    Häring, Monika; Vestergaard, Gisle Alberg; Brügger, Kim

    2005-01-01

    A novel filamentous virus, AFV2, from the hyperthermophilic archaeal genus Acidianus shows structural similarity to lipothrixviruses but differs from them in its unusual terminal and core structures. The double-stranded DNA genome contains 31,787 bp and carries eight open reading frames homologous...

  14. The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial 'mobilome'.

    Science.gov (United States)

    Sullivan, Matthew B; Krastins, Bryan; Hughes, Jennifer L; Kelly, Libusha; Chase, Michael; Sarracino, David; Chisholm, Sallie W

    2009-11-01

    Prochlorococcus, an abundant phototroph in the oceans, are infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those from coliphages T4 and T7 respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The approximately 108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks photosynthesis genes which have consistently been found in other marine cyanophage, but does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integration into its host as inferred from bioinformatically identified genetic machinery int, bet, exo and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element

  15. Three-dimensional Structure of a Viral Genome-delivery Portal Vertex

    Energy Technology Data Exchange (ETDEWEB)

    A Olia; P Prevelige Jr.; J Johnson; G Cingolani

    2011-12-31

    DNA viruses such as bacteriophages and herpesviruses deliver their genome into and out of the capsid through large proteinaceous assemblies, known as portal proteins. Here, we report two snapshots of the dodecameric portal protein of bacteriophage P22. The 3.25-{angstrom}-resolution structure of the portal-protein core bound to 12 copies of gene product 4 (gp4) reveals a {approx}1.1-MDa assembly formed by 24 proteins. Unexpectedly, a lower-resolution structure of the full-length portal protein unveils the unique topology of the C-terminal domain, which forms a {approx}200-{angstrom}-long {alpha}-helical barrel. This domain inserts deeply into the virion and is highly conserved in the Podoviridae family. We propose that the barrel domain facilitates genome spooling onto the interior surface of the capsid during genome packaging and, in analogy to a rifle barrel, increases the accuracy of genome ejection into the host cell.

  16. Structured RNAs in the ENCODE selected regions of the human genome

    DEFF Research Database (Denmark)

    Washietl, Stefan; Pedersen, Jakob Skou; Korbel, Jan O

    2007-01-01

    Functional RNA structures play an important role both in the context of noncoding RNA transcripts as well as regulatory elements in mRNAs. Here we present a computational study to detect functional RNA structures within the ENCODE regions of the human genome. Since structural RNAs in general lack...... with the GENCODE annotation points to functional RNAs in all genomic contexts, with a slightly increased density in 3'-UTRs. While we estimate a significant false discovery rate of approximately 50%-70% many of the predictions can be further substantiated by additional criteria: 248 loci are predicted by both RNAz...

  17. Terminal structures of West Nile virus genomic RNA and their interactions with viral NS5 protein

    International Nuclear Information System (INIS)

    Dong Hongping; Zhang Bo; Shi Peiyong

    2008-01-01

    Genome cyclization is essential for flavivirus replication. We used RNases to probe the structures formed by the 5'-terminal 190 nucleotides and the 3'-terminal 111 nucleotides of the West Nile virus (WNV) genomic RNA. When analyzed individually, the two RNAs adopt stem-loop structures as predicted by the thermodynamic-folding program. However, when mixed together, the two RNAs form a duplex that is mediated through base-pairings of two sets of RNA elements (5'CS/3'CSI and 5'UAR/3'UAR). Formation of the RNA duplex facilitates a conformational change that leaves the 3'-terminal nucleotides of the genome (position - 8 to - 16) to be single-stranded. Viral NS5 binds specifically to the 5'-terminal stem-loop (SL1) of the genomic RNA. The 5'SL1 RNA structure is essential for WNV replication. The study has provided further evidence to suggest that flavivirus genome cyclization and NS5/5'SL1 RNA interaction facilitate NS5 binding to the 3' end of the genome for the initiation of viral minus-strand RNA synthesis

  18. Genome-wide identification of structural variants in genes encoding drug targets

    DEFF Research Database (Denmark)

    Rasmussen, Henrik Berg; Dahmcke, Christina Mackeprang

    2012-01-01

    The objective of the present study was to identify structural variants of drug target-encoding genes on a genome-wide scale. We also aimed at identifying drugs that are potentially amenable for individualization of treatments based on knowledge about structural variation in the genes encoding...

  19. Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains.

    Science.gov (United States)

    Lewis, Tony E; Sillitoe, Ian; Andreeva, Antonina; Blundell, Tom L; Buchan, Daniel W A; Chothia, Cyrus; Cuff, Alison; Dana, Jose M; Filippis, Ioannis; Gough, Julian; Hunter, Sarah; Jones, David T; Kelley, Lawrence A; Kleywegt, Gerard J; Minneci, Federico; Mitchell, Alex; Murzin, Alexey G; Ochoa-Montaño, Bernardo; Rackham, Owen J L; Smith, James; Sternberg, Michael J E; Velankar, Sameer; Yeats, Corin; Orengo, Christine

    2013-01-01

    Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence-structure-function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker's yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).

  20. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features.

    Science.gov (United States)

    Ding, Yiliang; Tang, Yin; Kwok, Chun Kit; Zhang, Yu; Bevilacqua, Philip C; Assmann, Sarah M

    2014-01-30

    RNA structure has critical roles in processes ranging from ligand sensing to the regulation of translation, polyadenylation and splicing. However, a lack of genome-wide in vivo RNA structural data has limited our understanding of how RNA structure regulates gene expression in living cells. Here we present a high-throughput, genome-wide in vivo RNA structure probing method, structure-seq, in which dimethyl sulphate methylation of unprotected adenines and cytosines is identified by next-generation sequencing. Application of this method to Arabidopsis thaliana seedlings yielded the first in vivo genome-wide RNA structure map at nucleotide resolution for any organism, with quantitative structural information across more than 10,000 transcripts. Our analysis reveals a three-nucleotide periodic repeat pattern in the structure of coding regions, as well as a less-structured region immediately upstream of the start codon, and shows that these features are strongly correlated with translation efficiency. We also find patterns of strong and weak secondary structure at sites of alternative polyadenylation, as well as strong secondary structure at 5' splice sites that correlates with unspliced events. Notably, in vivo structures of messenger RNAs annotated for stress responses are poorly predicted in silico, whereas mRNA structures of genes related to cell function maintenance are well predicted. Global comparison of several structural features between these two categories shows that the mRNAs associated with stress responses tend to have more single-strandedness, longer maximal loop length and higher free energy per nucleotide, features that may allow these RNAs to undergo conformational changes in response to environmental conditions. Structure-seq allows the RNA structurome and its biological roles to be interrogated on a genome-wide scale and should be applicable to any organism.

  1. From structure prediction to genomic screens for novel non-coding RNAs.

    Directory of Open Access Journals (Sweden)

    Jan Gorodkin

    2011-08-01

    Full Text Available Non-coding RNAs (ncRNAs are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs. A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  2. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    Energy Technology Data Exchange (ETDEWEB)

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang; Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinlzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-24

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25 higher than those between inbred lines and 50 lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveal a strong correlation with a role in virulence.

  3. Genomic structural variation contributes to phenotypic change of industrial bioethanol yeast Saccharomyces cerevisiae.

    Science.gov (United States)

    Zhang, Ke; Zhang, Li-Jie; Fang, Ya-Hong; Jin, Xin-Na; Qi, Lei; Wu, Xue-Chang; Zheng, Dao-Qiong

    2016-03-01

    Genomic structural variation (GSV) is a ubiquitous phenomenon observed in the genomes of Saccharomyces cerevisiae strains with different genetic backgrounds; however, the physiological and phenotypic effects of GSV are not well understood. Here, we first revealed the genetic characteristics of a widely used industrial S. cerevisiae strain, ZTW1, by whole genome sequencing. ZTW1 was identified as an aneuploidy strain and a large-scale GSV was observed in the ZTW1 genome compared with the genome of a diploid strain YJS329. These GSV events led to copy number variations (CNVs) in many chromosomal segments as well as one whole chromosome in the ZTW1 genome. Changes in the DNA dosage of certain functional genes directly affected their expression levels and the resultant ZTW1 phenotypes. Moreover, CNVs of large chromosomal regions triggered an aneuploidy stress in ZTW1. This stress decreased the proliferation ability and tolerance of ZTW1 to various stresses, while aneuploidy response stress may also provide some benefits to the fermentation performance of the yeast, including increased fermentation rates and decreased byproduct generation. This work reveals genomic characters of the bioethanol S. cerevisiae strain ZTW1 and suggests that GSV is an important kind of mutation that changes the traits of industrial S. cerevisiae strains. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. Long-Range Order and Fractality in the Structure and Organization of Eukaryotic Genomes

    Science.gov (United States)

    Polychronopoulos, Dimitris; Tsiagkas, Giannis; Athanasopoulou, Labrini; Sellis, Diamantis; Almirantis, Yannis

    2014-12-01

    The late Professor J.S. Nicolis always emphasized, both in his writings and in presentations and discussions with students and friends, the relevance of a dynamical systems approach to biology. In particular, viewing the genome as a "biological text" captures the dynamical character of both the evolution and function of the organisms in the form of correlations indicating the presence of a long-range order. This genomic structure can be expressed in forms reminiscent of natural languages and several temporal and spatial traces l by the functioning of dynamical systems: Zipf laws, self-similarity and fractality. Here we review several works of our group and recent unpublished results, focusing on the chromosomal distribution of biologically active genomic components: Genes and protein-coding segments, CpG islands, transposable elements belonging to all major classes and several types of conserved non-coding genomic elements. We report the systematic appearance of power-laws in the size distribution of the distances between elements belonging to each of these types of functional genomic elements. Moreover, fractality is also found in several cases, using box-counting and entropic scaling.We present here, for the first time in a unified way, an aggregative model of the genomic dynamics which can explain the observed patterns on the grounds of known phenomena accompanying genome evolution. Our results comply with recent findings about a "fractal globule" geometry of chromatin in the eukaryotic nucleus.

  5. Exploring the role of genome and structural ions in preventing viral capsid collapse during dehydration

    Science.gov (United States)

    Martín-González, Natalia; Guérin Darvas, Sofía M.; Durana, Aritz; Marti, Gerardo A.; Guérin, Diego M. A.; de Pablo, Pedro J.

    2018-03-01

    Even though viruses evolve mainly in liquid milieu, their horizontal transmission routes often include episodes of dry environment. Along their life cycle, some insect viruses, such as viruses from the Dicistroviridae family, withstand dehydrated conditions with presently unknown consequences to their structural stability. Here, we use atomic force microscopy to monitor the structural changes of viral particles of Triatoma virus (TrV) after desiccation. Our results demonstrate that TrV capsids preserve their genome inside, conserving their height after exposure to dehydrating conditions, which is in stark contrast with other viruses that expel their genome when desiccated. Moreover, empty capsids (without genome) resulted in collapsed particles after desiccation. We also explored the role of structural ions in the dehydration process of the virions (capsid containing genome) by chelating the accessible cations from the external solvent milieu. We observed that ion suppression helps to keep the virus height upon desiccation. Our results show that under drying conditions, the genome of TrV prevents the capsid from collapsing during dehydration, while the structural ions are responsible for promoting solvent exchange through the virion wall.

  6. Insights into the genome structure and copy-number variation of Eimeria tenella

    Directory of Open Access Journals (Sweden)

    Lim Lik-Sin

    2012-08-01

    method to improve the assembly of the genome of E. tenella from shotgun data, and to help reveal its overall structure. A preliminary assessment of copy-number variation (extra or missing copies of genomic segments between strains of E. tenella was also carried out. The emerging picture is of a very unusual genome architecture displaying inter-strain copy-number variation. We suggest that these features may be related to the known ability of this parasite to rapidly develop drug resistance.

  7. Accurate Dna Assembly And Direct Genome Integration With Optimized Uracil Excision Cloning To Facilitate Engineering Of Escherichia Coli As A Cell Factory

    DEFF Research Database (Denmark)

    Cavaleiro, Mafalda; Kim, Se Hyeuk; Nørholm, Morten

    2015-01-01

    Plants produce a vast diversity of valuable compounds with medical properties, but these are often difficult to purify from the natural source or produce by organic synthesis. An alternative is to transfer the biosynthetic pathways to an efficient production host like the bacterium Escherichia co......-excision-based cloning and combining it with a genome-engineering approach to allow direct integration of whole metabolic pathways into the genome of E. coli, to facilitate the advanced engineering of cell factories........ Cloning and heterologous gene expression are major bottlenecks in the metabolic engineering field. We are working on standardizing DNA vector design processes to promote automation and collaborations in early phase metabolic engineering projects. Here, we focus on optimizing the already established uracil...

  8. GeneViTo: Visualizing gene-product functional and structural features in genomic datasets

    Directory of Open Access Journals (Sweden)

    Promponas Vasilis J

    2003-10-01

    Full Text Available Abstract Background The availability of increasing amounts of sequence data from completely sequenced genomes boosts the development of new computational methods for automated genome annotation and comparative genomics. Therefore, there is a need for tools that facilitate the visualization of raw data and results produced by bioinformatics analysis, providing new means for interactive genome exploration. Visual inspection can be used as a basis to assess the quality of various analysis algorithms and to aid in-depth genomic studies. Results GeneViTo is a JAVA-based computer application that serves as a workbench for genome-wide analysis through visual interaction. The application deals with various experimental information concerning both DNA and protein sequences (derived from public sequence databases or proprietary data sources and meta-data obtained by various prediction algorithms, classification schemes or user-defined features. Interaction with a Graphical User Interface (GUI allows easy extraction of genomic and proteomic data referring to the sequence itself, sequence features, or general structural and functional features. Emphasis is laid on the potential comparison between annotation and prediction data in order to offer a supplement to the provided information, especially in cases of "poor" annotation, or an evaluation of available predictions. Moreover, desired information can be output in high quality JPEG image files for further elaboration and scientific use. A compilation of properly formatted GeneViTo input data for demonstration is available to interested readers for two completely sequenced prokaryotes, Chlamydia trachomatis and Methanococcus jannaschii. Conclusions GeneViTo offers an inspectional view of genomic functional elements, concerning data stemming both from database annotation and analysis tools for an overall analysis of existing genomes. The application is compatible with Linux or Windows ME-2000-XP operating

  9. Rare genomic structural variants in complex disease: lessons from the replication of associations with obesity.

    Directory of Open Access Journals (Sweden)

    Robin G Walters

    Full Text Available The limited ability of common variants to account for the genetic contribution to complex disease has prompted searches for rare variants of large effect, to partly explain the 'missing heritability'. Analyses of genome-wide genotyping data have identified genomic structural variants (GSVs as a source of such rare causal variants. Recent studies have reported multiple GSV loci associated with risk of obesity. We attempted to replicate these associations by similar analysis of two familial-obesity case-control cohorts and a population cohort, and detected GSVs at 11 out of 18 loci, at frequencies similar to those previously reported. Based on their reported frequencies and effect sizes (OR≥25, we had sufficient statistical power to detect the large majority (80% of genuine associations at these loci. However, only one obesity association was replicated. Deletion of a 220 kb region on chromosome 16p11.2 has a carrier population frequency of 2×10(-4 (95% confidence interval [9.6×10(-5-3.1×10(-4]; accounts overall for 0.5% [0.19%-0.82%] of severe childhood obesity cases (P = 3.8×10(-10; odds ratio = 25.0 [9.9-60.6]; and results in a mean body mass index (BMI increase of 5.8 kg.m(-2 [1.8-10.3] in adults from the general population. We also attempted replication using BMI as a quantitative trait in our population cohort; associations with BMI at or near nominal significance were detected at two further loci near KIF2B and within FOXP2, but these did not survive correction for multiple testing. These findings emphasise several issues of importance when conducting rare GSV association, including the need for careful cohort selection and replication strategy, accurate GSV identification, and appropriate correction for multiple testing and/or control of false discovery rate. Moreover, they highlight the potential difficulty in replicating rare CNV associations across different populations. Nevertheless, we show that such studies are potentially

  10. GenRGenS: Software for Generating Random Genomic Sequences and Structures

    OpenAIRE

    Ponty , Yann; Termier , Michel; Denise , Alain

    2006-01-01

    International audience; GenRGenS is a software tool dedicated to randomly generating genomic sequences and structures. It handles several classes of models useful for sequence analysis, such as Markov chains, hidden Markov models, weighted context-free grammars, regular expressions and PROSITE expressions. GenRGenS is the only program that can handle weighted context-free grammars, thus allowing the user to model and to generate structured objects (such as RNA secondary structures) of any giv...

  11. SL1 revisited: functional analysis of the structure and conformation of HIV-1 genome RNA.

    Science.gov (United States)

    Sakuragi, Sayuri; Yokoyama, Masaru; Shioda, Tatsuo; Sato, Hironori; Sakuragi, Jun-Ichi

    2016-11-11

    The dimer initiation site/dimer linkage sequence (DIS/DLS) region of HIV is located on the 5' end of the viral genome and suggested to form complex secondary/tertiary structures. Within this structure, stem-loop 1 (SL1) is believed to be most important and an essential key to dimerization, since the sequence and predicted secondary structure of SL1 are highly stable and conserved among various virus subtypes. In particular, a six-base palindromic sequence is always present at the hairpin loop of SL1 and the formation of kissing-loop structure at this position between the two strands of genomic RNA is suggested to trigger dimerization. Although the higher-order structure model of SL1 is well accepted and perhaps even undoubted lately, there could be stillroom for consideration to depict the functional SL1 structure while in vivo (in virion or cell). In this study, we performed several analyses to identify the nucleotides and/or basepairing within SL1 which are necessary for HIV-1 genome dimerization, encapsidation, recombination and infectivity. We unexpectedly found that some nucleotides that are believed to contribute the formation of the stem do not impact dimerization or infectivity. On the other hand, we found that one G-C basepair involved in stem formation may serve as an alternative dimer interactive site. We also report on our further investigation of the roles of the palindromic sequences on viral replication. Collectively, we aim to assemble a more-comprehensive functional map of SL1 on the HIV-1 viral life cycle. We discovered several possibilities for a novel structure of SL1 in HIV-1 DLS. The newly proposed structure model suggested that the hairpin loop of SL1 appeared larger, and genome dimerization process might consist of more complicated mechanism than previously understood. Further investigations would be still required to fully understand the genome packaging and dimerization of HIV.

  12. Gene order data from a model amphibian (Ambystoma: new perspectives on vertebrate genome structure and evolution

    Directory of Open Access Journals (Sweden)

    Voss S Randal

    2006-08-01

    Full Text Available Abstract Background Because amphibians arise from a branch of the vertebrate evolutionary tree that is juxtaposed between fishes and amniotes, they provide important comparative perspective for reconstructing character changes that have occurred during vertebrate evolution. Here, we report the first comparative study of vertebrate genome structure that includes a representative amphibian. We used 491 transcribed sequences from a salamander (Ambystoma genetic map and whole genome assemblies for human, mouse, rat, dog, chicken, zebrafish, and the freshwater pufferfish Tetraodon nigroviridis to compare gene orders and rearrangement rates. Results Ambystoma has experienced a rate of genome rearrangement that is substantially lower than mammalian species but similar to that of chicken and fish. Overall, we found greater conservation of genome structure between Ambystoma and tetrapod vertebrates, nevertheless, 57% of Ambystoma-fish orthologs are found in conserved syntenies of four or more genes. Comparisons between Ambystoma and amniotes reveal extensive conservation of segmental homology for 57% of the presumptive Ambystoma-amniote orthologs. Conclusion Our analyses suggest relatively constant interchromosomal rearrangement rates from the euteleost ancestor to the origin of mammals and illustrate the utility of amphibian mapping data in establishing ancestral amniote and tetrapod gene orders. Comparisons between Ambystoma and amniotes reveal some of the key events that have structured the human genome since diversification of the ancestral amniote lineage.

  13. The genomic structure of the human UFO receptor.

    Science.gov (United States)

    Schulz, A S; Schleithoff, L; Faust, M; Bartram, C R; Janssen, J W

    1993-02-01

    Using a DNA transfection-tumorigenicity assay we have recently identified the UFO oncogene. It encodes a tyrosine kinase receptor characterized by the juxtaposition of two immunoglobulin-like and two fibronectin type III repeats in its extracellular domain. Here we describe the genomic organization of the human UFO locus. The UFO receptor is encoded by 20 exons that are distributed over a region of 44 kb. Different isoforms of UFO mRNA are generated by alternative splicing of exon 10 and differential usage of two imperfect polyadenylation sites resulting in the presence or absence of 1.5-kb 3' untranslated sequences. Primer extension and S1 nuclease analyses revealed multiple transcriptional initiation sites including a major site 169 bp upstream of the translation start site. The promoter region is GC rich, lacks TATA and CAAT boxes, but contains potential recognition sites for a variety of trans-acting factors, including Sp1, AP-2 and the cyclic AMP response element-binding protein. Proto-UFO and its oncogenic counterpart exhibit identical cDNA and promoter regions sequences. Possible modes of UFO activation are discussed.

  14. Protein structure similarity clustering (PSSC) and natural product structure as inspiration sources for drug development and chemical genomics

    NARCIS (Netherlands)

    Dekker, Frank J; Koch, Marcus A; Waldmann, Herbert; Dekker, Frans

    Finding small molecules that modulate protein function is of primary importance in drug development and in the emerging field of chemical genomics. To facilitate the identification of such molecules, we developed a novel strategy making use of structural conservatism found in protein domain

  15. Evolution of the Exon-Intron Structure in Ciliate Genomes.

    Directory of Open Access Journals (Sweden)

    Vladyslav S Bondarenko

    Full Text Available A typical eukaryotic gene is comprised of alternating stretches of regions, exons and introns, retained in and spliced out a mature mRNA, respectively. Although the length of introns may vary substantially among organisms, a large fraction of genes contains short introns in many species. Notably, some Ciliates (Paramecium and Nyctotherus possess only ultra-short introns, around 25 bp long. In Paramecium, ultra-short introns with length divisible by three (3n are under strong evolutionary pressure and have a high frequency of in-frame stop codons, which, in the case of intron retention, cause premature termination of mRNA translation and consequent degradation of the mis-spliced mRNA by the nonsense-mediated decay mechanism. Here, we analyzed introns in five genera of Ciliates, Paramecium, Tetrahymena, Ichthyophthirius, Oxytricha, and Stylonychia. Introns can be classified into two length classes in Tetrahymena and Ichthyophthirius (with means 48 bp, 69 bp, and 55 bp, 64 bp, respectively, but, surprisingly, comprise three distinct length classes in Oxytricha and Stylonychia (with means 33-35 bp, 47-51 bp, and 78-80 bp. In most ranges of the intron lengths, 3n introns are underrepresented and have a high frequency of in-frame stop codons in all studied species. Introns of Paramecium, Tetrahymena, and Ichthyophthirius are preferentially located at the 5' and 3' ends of genes, whereas introns of Oxytricha and Stylonychia are strongly skewed towards the 5' end. Analysis of evolutionary conservation shows that, in each studied genome, a significant fraction of intron positions is conserved between the orthologs, but intron lengths are not correlated between the species. In summary, our study provides a detailed characterization of introns in several genera of Ciliates and highlights some of their distinctive properties, which, together, indicate that splicing spellchecking is a universal and evolutionarily conserved process in the biogenesis of short

  16. Genomic mid-range inhomogeneity correlates with an abundance of RNA secondary structures

    Directory of Open Access Journals (Sweden)

    Song Jun

    2008-06-01

    Full Text Available Abstract Background Genomes possess different levels of non-randomness, in particular, an inhomogeneity in their nucleotide composition. Inhomogeneity is manifest from the short-range where neighboring nucleotides influence the choice of base at a site, to the long-range, commonly known as isochores, where a particular base composition can span millions of nucleotides. A separate genomic issue that has yet to be thoroughly elucidated is the role that RNA secondary structure (SS plays in gene expression. Results We present novel data and approaches that show that a mid-range inhomogeneity (~30 to 1000 nt not only exists in mammalian genomes but is also significantly associated with strong RNA SS. A whole-genome bioinformatics investigation of local SS in a set of 11,315 non-redundant human pre-mRNA sequences has been carried out. Four distinct components of these molecules (5'-UTRs, exons, introns and 3'-UTRs were considered separately, since they differ in overall nucleotide composition, sequence motifs and periodicities. For each pre-mRNA component, the abundance of strong local SS ( Conclusion We demonstrate that the excess of strong local SS in pre-mRNAs is linked to the little explored phenomenon of genomic mid-range inhomogeneity (MRI. MRI is an interdependence between nucleotide choice and base composition over a distance of 20–1000 nt. Additionally, we have created a public computational resource to support further study of genomic MRI.

  17. Chloroplast genomes of Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea: Structures and comparative analysis.

    Science.gov (United States)

    Asaf, Sajjad; Khan, Abdul Latif; Khan, Muhammad Aaqil; Waqas, Muhammad; Kang, Sang-Mo; Yun, Byung-Wook; Lee, In-Jung

    2017-08-08

    We investigated the complete chloroplast (cp) genomes of non-model Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes of both subspecies ranging between 154.4~154.5 kbp, with a large single-copy region (84,197~84,158 bp), a small single-copy region (17,738~17,813 bp) and pair of inverted repeats (IRa/IRb; 26,264~26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea, along with ten other Arabidopsis species, showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and 70 shared genes between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata spp. petraea is A. arenicola.

  18. Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

    Science.gov (United States)

    Jeong, Young-Min; Kim, Namshin; Ahn, Byung Ohg; Oh, Mijin; Chung, Won-Hyong; Chung, Hee; Jeong, Seongmun; Lim, Ki-Byung; Hwang, Yoon-Jung; Kim, Goon-Bo; Baek, Seunghoon; Choi, Sang-Bong; Hyung, Dae-Jin; Lee, Seung-Won; Sohn, Seong-Han; Kwon, Soo-Jin; Jin, Mina; Seol, Young-Joo; Chae, Won Byoung; Choi, Keun Jin; Park, Beom-Seok; Yu, Hee-Ju; Mun, Jeong-Hwan

    2016-07-01

    This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidences on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop and its origin and phylogenetic position in the tribe Brassiceae is controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98 % of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36 % of the genome was repetitive sequences and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidences that the current form of nine chromosomes in radish was rearranged from the chromosomes of hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into evolution of the mesohexaploid genomes in the tribe Brassiceae.

  19. Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.

    Science.gov (United States)

    Gonçalves, Juliana W; Valiati, Victor Hugo; Delprat, Alejandra; Valente, Vera L S; Ruiz, Alfredo

    2014-09-13

    Galileo is one of three members of the P superfamily of DNA transposons. It was originally discovered in Drosophila buzzatii, in which three segregating chromosomal inversions were shown to have been generated by ectopic recombination between Galileo copies. Subsequently, Galileo was identified in six of 12 sequenced Drosophila genomes, indicating its widespread distribution within this genus. Galileo is strikingly abundant in Drosophila willistoni, a neotropical species that is highly polymorphic for chromosomal inversions, suggesting a role for this transposon in the evolution of its genome. We carried out a detailed characterization of all Galileo copies present in the D. willistoni genome. A total of 191 copies, including 133 with two terminal inverted repeats (TIRs), were classified according to structure in six groups. The TIRs exhibited remarkable variation in their length and structure compared to the most complete copy. Three copies showed extended TIRs due to internal tandem repeats, the insertion of other transposable elements (TEs), or the incorporation of non-TIR sequences into the TIRs. Phylogenetic analyses of the transposase (TPase)-encoding and TIR segments yielded two divergent clades, which we termed Galileo subfamilies V and W. Target-site duplications (TSDs) in D. willistoni Galileo copies were 7- or 8-bp in length, with the consensus sequence GTATTAC. Analysis of the region around the TSDs revealed a target site motif (TSM) with a 15-bp palindrome that may give rise to a stem-loop secondary structure. There is a remarkable abundance and diversity of Galileo copies in the D. willistoni genome, although no functional copies were found. The TIRs in particular have a dynamic structure and extend in different ways, but their ends (required for transposition) are more conserved than the rest of the element. The D. willistoni genome harbors two Galileo subfamilies (V and W) that diverged ~9 million years ago and may have descended from an ancestral

  20. Structural genomic variation in childhood epilepsies with complex phenotypes

    DEFF Research Database (Denmark)

    Helbig, Ingo; Swinkels, Marielle E M; Aten, Emmelien

    2014-01-01

    of CNVs in patients with unclassified epilepsies and complex phenotypes. A total of 222 patients from three European countries, including patients with structural lesions on magnetic resonance imaging (MRI), dysmorphic features, and multiple congenital anomalies, were clinically evaluated and screened.......9%). Segregation of all identified variants could be assessed in 42 patients, 11 of which were de novo. The frequency of all structural variants and de novo variants was not statistically different between patients with or without MRI abnormalities or MRI subcategories. Patients with dysmorphic features were more...

  1. Structured RNAs and synteny regions in the pig genome

    DEFF Research Database (Denmark)

    Anthon, Christian; Tafer, Hakim; Havgaard, Jakob Hull

    2014-01-01

    annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo...

  2. Genomes in Turmoil: Frugality Drives Microbial Community Structure in Extremely Acidic Environments

    Science.gov (United States)

    Holmes, D. S.

    2016-12-01

    Extremely acidic environments (To gain insight into these issues, we have conducted deep bioinformatic analyses, including metabolic reconstruction of key assimilatory pathways, phylogenomics and network scrutiny of >160 genomes of acidophiles, including representatives from Archaea, Bacteria and Eukarya and at least ten metagenomes of acidic environments [Cardenas JP, et al. pp 179-197 in Acidophiles, eds R. Quatrini and D. B. Johnson, Caister Academic Press, UK (2016)]. Results yielded valuable insights into cellular processes, including carbon and nitrogen management and energy production, linking biogeochemical processes to organismal physiology. They also provided insight into the evolutionary forces that shape the genomic structure of members of acidophile communities. Niche partitioning can explain diversity patterns in rapidly changing acidic environments such as bioleaching heaps. However, in spatially and temporally homogeneous acidic environments genome flux appears to provide deeper insight into the composition and evolution of acidic consortia. Acidophiles have undergone genome streamlining by gene loss promoting mutual coexistence of species that exploit complementarity use of scarce resources consistent with the Black Queen hypothesis [Morris JJ et al. mBio 3: e00036-12 (2012)]. Acidophiles also have a large pool of accessory genes (the microbial super-genome) that can be accessed by horizontal gene transfer. This further promotes dependency relationships as drivers of community structure and the evolution of keystone species. Acknowledgements: Fondecyt 1130683; Basal CCTE PFB16

  3. A high-quality human reference panel reveals the complexity and distribution of genomic structural variants

    NARCIS (Netherlands)

    Hehir-Kwa, J.Y.; Marschall, T.; Kloosterman, W.P.; Francioli, L.C.; Baaijens, J.A.; Dijkstra, L.J.; Abdellaoui, A.; Koval, V.; Thung, D.T.; Wardenaar, R.; Renkens, I.; Coe, B.P.; Deelen, P.; de Ligt, J.; Lameijer, E.W.; Dijk, F.; Hormozdiari, F.; Uitterlinden, A.G.; van Duijn, C.M.; Eichler, E.E.; Bakker, P.I.W.; Swertz, M.A.; Wijmenga, C.; van Ommen, G.J.B; Slagboom, P.E.; Boomsma, D.I.; Schönhuth, A.; Ye, K.; Guryev, V.

    2016-01-01

    Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies provide neither a global view of the full spectrum of these variants nor integrate them into reference panels of genetic

  4. New families of human regulatory RNA structures identified by comparative analysis of vertebrate genomes.

    Science.gov (United States)

    Parker, Brian J; Moltke, Ida; Roth, Adam; Washietl, Stefan; Wen, Jiayu; Kellis, Manolis; Breaker, Ronald; Pedersen, Jakob Skou

    2011-11-01

    Regulatory RNA structures are often members of families with multiple paralogous instances across the genome. Family members share functional and structural properties, which allow them to be studied as a whole, facilitating both bioinformatic and experimental characterization. We have developed a comparative method, EvoFam, for genome-wide identification of families of regulatory RNA structures, based on primary sequence and secondary structure similarity. We apply EvoFam to a 41-way genomic vertebrate alignment. Genome-wide, we identify 220 human, high-confidence families outside protein-coding regions comprising 725 individual structures, including 48 families with known structural RNA elements. Known families identified include both noncoding RNAs, e.g., miRNAs and the recently identified MALAT1/MEN β lincRNA family; and cis-regulatory structures, e.g., iron-responsive elements. We also identify tens of new families supported by strong evolutionary evidence and other statistical evidence, such as GO term enrichments. For some of these, detailed analysis has led to the formulation of specific functional hypotheses. Examples include two hypothesized auto-regulatory feedback mechanisms: one involving six long hairpins in the 3'-UTR of MAT2A, a key metabolic gene that produces the primary human methyl donor S-adenosylmethionine; the other involving a tRNA-like structure in the intron of the tRNA maturation gene POP1. We experimentally validate the predicted MAT2A structures. Finally, we identify potential new regulatory networks, including large families of short hairpins enriched in immunity-related genes, e.g., TNF, FOS, and CTLA4, which include known transcript destabilizing elements. Our findings exemplify the diversity of post-transcriptional regulation and provide a resource for further characterization of new regulatory mechanisms and families of noncoding RNAs.

  5. Population Structure Analysis of Bull Genomes of European and Western Ancestry

    DEFF Research Database (Denmark)

    Chung, Neo Christopher; Szyda, Joanna; Frąszczak, Magdalena

    2017-01-01

    Since domestication, population bottlenecks, breed formation, and selective breeding have radically shaped the genealogy and genetics of Bos taurus. In turn, characterization of population structure among diverse bull (males of Bos taurus) genomes enables detailed assessment of genetic resources...... and origins. By analyzing 432 unrelated bull genomes from 13 breeds and 16 countries, we demonstrate genetic diversity and structural complexity among the European/Western cattle population. Importantly, we relaxed a strong assumption of discrete or admixed population, by adapting latent variable models...... harboring largest genetic differentiation suggest positive selection underlying population structure. We carried out gene set analysis using SNP annotations to identify enriched functional categories such as energy-related processes and multiple development stages. Our population structure analysis of bull...

  6. Genomic analysis of the hierarchical structure of regulatory networks

    Science.gov (United States)

    Yu, Haiyuan; Gerstein, Mark

    2006-01-01

    A fundamental question in biology is how the cell uses transcription factors (TFs) to coordinate the expression of thousands of genes in response to various stimuli. The relationships between TFs and their target genes can be modeled in terms of directed regulatory networks. These relationships, in turn, can be readily compared with commonplace “chain-of-command” structures in social networks, which have characteristic hierarchical layouts. Here, we develop algorithms for identifying generalized hierarchies (allowing for various loop structures) and use these approaches to illuminate extensive pyramid-shaped hierarchical structures existing in the regulatory networks of representative prokaryotes (Escherichia coli) and eukaryotes (Saccharomyces cerevisiae), with most TFs at the bottom levels and only a few master TFs on top. These masters are situated near the center of the protein–protein interaction network, a different type of network from the regulatory one, and they receive most of the input for the whole regulatory hierarchy through protein interactions. Moreover, they have maximal influence over other genes, in terms of affecting expression-level changes. Surprisingly, however, TFs at the bottom of the regulatory hierarchy are more essential to the viability of the cell. Finally, one might think master TFs achieve their wide influence through directly regulating many targets, but TFs with most direct targets are in the middle of the hierarchy. We find, in fact, that these midlevel TFs are “control bottlenecks” in the hierarchy, and this great degree of control for “middle managers” has parallels in efficient social structures in various corporate and governmental settings. PMID:17003135

  7. Use of the Operon Structure of the C. elegans Genome as a Tool to Identify Functionally Related Proteins

    Directory of Open Access Journals (Sweden)

    Silvia Dossena

    2013-12-01

    Full Text Available One of the most pressing challenges in the post genomic era is the identification and characterization of protein-protein interactions (PPIs, as these are essential in understanding the cellular physiology of health and disease. Experimental techniques suitable for characterizing PPIs (X-ray crystallography or nuclear magnetic resonance spectroscopy, among others are usually laborious, time-consuming and often difficult to apply to membrane proteins, and therefore require accurate prediction of the candidate interacting partners. High-throughput experimental methods (yeast two-hybrid and affinity purification succumb to the same shortcomings, and can also lead to high rates of false positive and negative results. Therefore, reliable tools for predicting PPIs are needed. The use of the operon structure in the eukaryote Caenorhabditis elegans genome is a valuable, though underserved, tool for identifying physically or functionally interacting proteins. Based on the concept that genes organized in the same operon may encode physically or functionally related proteins, this algorithm is easy to be applied and, importantly, gives a limited number of candidate partners of a given protein, allowing for focused experimental verification. Moreover, this approach can be successfully used to predict PPIs in the human system, including those of membrane proteins.

  8. A new model of dispersion for metals leading to a more accurate modeling of plasmonic structures using the FDTD method

    Energy Technology Data Exchange (ETDEWEB)

    Vial, A.; Dridi, M.; Cunff, L. le [Universite de Technologie de Troyes, Institut Charles Delaunay, CNRS UMR 6279, Laboratoire de Nanotechnologie et d' Instrumentation Optique, 12, rue Marie Curie, BP-2060, Troyes Cedex (France); Laroche, T. [Universite de Franche-Comte, Institut FEMTO-ST, CNRS UMR 6174, Departement de Physique et de Metrologie des Oscillateurs, Besancon Cedex (France)

    2011-06-15

    We present FDTD simulations results obtained using the Drude critical points model. This model enables spectroscopic studies of metallic structures over wider wavelength ranges than usually used, and it facilitates the study of structures made of several metals. (orig.)

  9. The Drosophila Helicase MLE Targets Hairpin Structures in Genomic Transcripts.

    Directory of Open Access Journals (Sweden)

    Simona Cugusi

    2016-01-01

    Full Text Available RNA hairpins are a common type of secondary structures that play a role in every aspect of RNA biochemistry including RNA editing, mRNA stability, localization and translation of transcripts, and in the activation of the RNA interference (RNAi and microRNA (miRNA pathways. Participation in these functions often requires restructuring the RNA molecules by the association of single-strand (ss RNA-binding proteins or by the action of helicases. The Drosophila MLE helicase has long been identified as a member of the MSL complex responsible for dosage compensation. The complex includes one of two long non-coding RNAs and MLE was shown to remodel the roX RNA hairpin structures in order to initiate assembly of the complex. Here we report that this function of MLE may apply to the hairpins present in the primary RNA transcripts that generate the small molecules responsible for RNA interference. Using stocks from the Transgenic RNAi Project and the Vienna Drosophila Research Center, we show that MLE specifically targets hairpin RNAs at their site of transcription. The association of MLE at these sites is independent of sequence and chromosome location. We use two functional assays to test the biological relevance of this association and determine that MLE participates in the RNAi pathway.

  10. The complete mitochondrial genome structure of the jaguar (Panthera onca).

    Science.gov (United States)

    Caragiulo, Anthony; Dougherty, Eric; Soto, Sofia; Rabinowitz, Salisa; Amato, George

    2016-01-01

    The jaguar (Panthera onca) is the largest felid in the Western hemisphere, and the only member of the Panthera genus in the New World. The jaguar inhabits most countries within Central and South America, and is considered near threatened by the International Union for the Conservation of Nature. This study represents the first sequence of the entire jaguar mitogenome, which was the only Panthera mitogenome that had not been sequenced. The jaguar mitogenome is 17,049 bases and possesses the same molecular structure as other felid mitogenomes. Bayesian inference (BI) and maximum likelihood (ML) were used to determine the phylogenetic placement of the jaguar within the Panthera genus. Both BI and ML analyses revealed the jaguar to be sister to the tiger/leopard/snow leopard clade.

  11. Complete Chloroplast Genomes of Papaver rhoeas and Papaver orientale: Molecular Structures, Comparative Analysis, and Phylogenetic Analysis

    Directory of Open Access Journals (Sweden)

    Jianguo Zhou

    2018-02-01

    Full Text Available Papaver rhoeas L. and P. orientale L., which belong to the family Papaveraceae, are used as ornamental and medicinal plants. The chloroplast genome has been used for molecular markers, evolutionary biology, and barcoding identification. In this study, the complete chloroplast genome sequences of P. rhoeas and P. orientale are reported. Results show that the complete chloroplast genomes of P. rhoeas and P. orientale have typical quadripartite structures, which are comprised of circular 152,905 and 152,799-bp-long molecules, respectively. A total of 130 genes were identified in each genome, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Sequence divergence analysis of four species from Papaveraceae indicated that the most divergent regions are found in the non-coding spacers with minimal differences among three Papaver species. These differences include the ycf1 gene and intergenic regions, such as rpoB-trnC, trnD-trnT, petA-psbJ, psbE-petL, and ccsA-ndhD. These regions are hypervariable regions, which can be used as specific DNA barcodes. This finding suggested that the chloroplast genome could be used as a powerful tool to resolve the phylogenetic positions and relationships of Papaveraceae. These results offer valuable information for future research in the identification of Papaver species and will benefit further investigations of these species.

  12. Complete plastid genomes from Ophioglossum californicum, Psilotum nudum, and Equisetum hyemale reveal an ancestral land plant genome structure and resolve the position of Equisetales among monilophytes

    Directory of Open Access Journals (Sweden)

    Grewe Felix

    2013-01-01

    Full Text Available Abstract Background Plastid genome structure and content is remarkably conserved in land plants. This widespread conservation has facilitated taxon-rich phylogenetic analyses that have resolved organismal relationships among many land plant groups. However, the relationships among major fern lineages, especially the placement of Equisetales, remain enigmatic. Results In order to understand the evolution of plastid genomes and to establish phylogenetic relationships among ferns, we sequenced the plastid genomes from three early diverging species: Equisetum hyemale (Equisetales, Ophioglossum californicum (Ophioglossales, and Psilotum nudum (Psilotales. A comparison of fern plastid genomes showed that some lineages have retained inverted repeat (IR boundaries originating from the common ancestor of land plants, while other lineages have experienced multiple IR changes including expansions and inversions. Genome content has remained stable throughout ferns, except for a few lineage-specific losses of genes and introns. Notably, the losses of the rps16 gene and the rps12i346 intron are shared among Psilotales, Ophioglossales, and Equisetales, while the gain of a mitochondrial atp1 intron is shared between Marattiales and Polypodiopsida. These genomic structural changes support the placement of Equisetales as sister to Ophioglossales + Psilotales and Marattiales as sister to Polypodiopsida. This result is augmented by some molecular phylogenetic analyses that recover the same relationships, whereas others suggest a relationship between Equisetales and Polypodiopsida. Conclusions Although molecular analyses were inconsistent with respect to the position of Marattiales and Equisetales, several genomic structural changes have for the first time provided a clear placement of these lineages within the ferns. These results further demonstrate the power of using rare genomic structural changes in cases where molecular data fail to provide strong phylogenetic

  13. Reference-quality genome sequence of Aegilops tauschii, the source of wheat D genome, shows that recombination shapes genome structure and evolution

    Science.gov (United States)

    Aegilops tauschii is the diploid progenitor of the D genome of hexaploid wheat and an important genetic resource for wheat. A reference-quality sequence for the Ae. tauschii genome was produced with a combination of ordered-clone sequencing, whole-genome shotgun sequencing, and BioNano optical geno...

  14. De novo prediction of human chromosome structures: Epigenetic marking patterns encode genome architecture.

    Science.gov (United States)

    Di Pierro, Michele; Cheng, Ryan R; Lieberman Aiden, Erez; Wolynes, Peter G; Onuchic, José N

    2017-11-14

    Inside the cell nucleus, genomes fold into organized structures that are characteristic of cell type. Here, we show that this chromatin architecture can be predicted de novo using epigenetic data derived from chromatin immunoprecipitation-sequencing (ChIP-Seq). We exploit the idea that chromosomes encode a 1D sequence of chromatin structural types. Interactions between these chromatin types determine the 3D structural ensemble of chromosomes through a process similar to phase separation. First, a neural network is used to infer the relation between the epigenetic marks present at a locus, as assayed by ChIP-Seq, and the genomic compartment in which those loci reside, as measured by DNA-DNA proximity ligation (Hi-C). Next, types inferred from this neural network are used as an input to an energy landscape model for chromatin organization [Minimal Chromatin Model (MiChroM)] to generate an ensemble of 3D chromosome conformations at a resolution of 50 kilobases (kb). After training the model, dubbed Maximum Entropy Genomic Annotation from Biomarkers Associated to Structural Ensembles (MEGABASE), on odd-numbered chromosomes, we predict the sequences of chromatin types and the subsequent 3D conformational ensembles for the even chromosomes. We validate these structural ensembles by using ChIP-Seq tracks alone to predict Hi-C maps, as well as distances measured using 3D fluorescence in situ hybridization (FISH) experiments. Both sets of experiments support the hypothesis of phase separation being the driving process behind compartmentalization. These findings strongly suggest that epigenetic marking patterns encode sufficient information to determine the global architecture of chromosomes and that de novo structure prediction for whole genomes may be increasingly possible. Copyright © 2017 the Author(s). Published by PNAS.

  15. Plastid genome structure and loss of photosynthetic ability in the parasitic genus Cuscuta.

    Science.gov (United States)

    Revill, Meredith J W; Stanley, Susan; Hibberd, Julian M

    2005-09-01

    The genus Cuscuta (dodder) is composed of parasitic plants, some species of which appear to be losing the ability to photosynthesize. A molecular phylogeny was constructed using 15 species of Cuscuta in order to assess whether changes in photosynthetic ability and alterations in structure of the plastid genome relate to phylogenetic position within the genus. The molecular phylogeny provides evidence for four major clades within Cuscuta. Although DNA blot analysis showed that Cuscuta species have smaller plastid genomes than tobacco, and that plastome size varied significantly even within one Cuscuta clade, dot blot analysis indicated that the dodders possess homologous sequence to 101 genes from the tobacco plastome. Evidence is provided for significant rates of DNA transfer from plastid to nucleus in Cuscuta. Size and structure of Cuscuta plastid genomes, as well as photosynthetic ability, appear to vary independently of position within the phylogeny, thus supporting the hypothesis that within Cuscuta photosynthetic ability and organization of the plastid genome are changing in an unco-ordinated manner.

  16. A genome wide survey of SNP variation reveals the genetic structure of sheep breeds.

    Directory of Open Access Journals (Sweden)

    James W Kijas

    Full Text Available The genetic structure of sheep reflects their domestication and subsequent formation into discrete breeds. Understanding genetic structure is essential for achieving genetic improvement through genome-wide association studies, genomic selection and the dissection of quantitative traits. After identifying the first genome-wide set of SNP for sheep, we report on levels of genetic variability both within and between a diverse sample of ovine populations. Then, using cluster analysis and the partitioning of genetic variation, we demonstrate sheep are characterised by weak phylogeographic structure, overlapping genetic similarity and generally low differentiation which is consistent with their short evolutionary history. The degree of population substructure was, however, sufficient to cluster individuals based on geographic origin and known breed history. Specifically, African and Asian populations clustered separately from breeds of European origin sampled from Australia, New Zealand, Europe and North America. Furthermore, we demonstrate the presence of stratification within some, but not all, ovine breeds. The results emphasize that careful documentation of genetic structure will be an essential prerequisite when mapping the genetic basis of complex traits. Furthermore, the identification of a subset of SNP able to assign individuals into broad groupings demonstrates even a small panel of markers may be suitable for applications such as traceability.

  17. Morphology, genome sequence, and structural proteome of type phage P335 from Lactococcus lactis

    DEFF Research Database (Denmark)

    Labrie, Simon J.; Josephsen, Jytte; Neve, Horst

    2008-01-01

    for a shorter tail and a different collar/whisker structure. Its 33,613-bp double-stranded DNA genome had 50 open reading frames. Putative functions were assigned to 29 of them. Unlike other sequenced genomes from lactococcal phages belonging to this species, P335 did not have a lysogeny module. However, it did...... genome. The genetic diversity of the P335 species indicates that they are exceptional models for studying the modular theory of phage evolution....

  18. RCK: accurate and efficient inference of sequence- and structure-based protein-RNA binding models from RNAcompete data.

    Science.gov (United States)

    Orenstein, Yaron; Wang, Yuhao; Berger, Bonnie

    2016-06-15

    Protein-RNA interactions, which play vital roles in many processes, are mediated through both RNA sequence and structure. CLIP-based methods, which measure protein-RNA binding in vivo, suffer from experimental noise and systematic biases, whereas in vitro experiments capture a clearer signal of protein RNA-binding. Among them, RNAcompete provides binding affinities of a specific protein to more than 240 000 unstructured RNA probes in one experiment. The computational challenge is to infer RNA structure- and sequence-based binding models from these data. The state-of-the-art in sequence models, Deepbind, does not model structural preferences. RNAcontext models both sequence and structure preferences, but is outperformed by GraphProt. Unfortunately, GraphProt cannot detect structural preferences from RNAcompete data due to the unstructured nature of the data, as noted by its developers, nor can it be tractably run on the full RNACompete dataset. We develop RCK, an efficient, scalable algorithm that infers both sequence and structure preferences based on a new k-mer based model. Remarkably, even though RNAcompete data is designed to be unstructured, RCK can still learn structural preferences from it. RCK significantly outperforms both RNAcontext and Deepbind in in vitro binding prediction for 244 RNAcompete experiments. Moreover, RCK is also faster and uses less memory, which enables scalability. While currently on par with existing methods in in vivo binding prediction on a small scale test, we demonstrate that RCK will increasingly benefit from experimentally measured RNA structure profiles as compared to computationally predicted ones. By running RCK on the entire RNAcompete dataset, we generate and provide as a resource a set of protein-RNA structure-based models on an unprecedented scale. Software and models are freely available at http://rck.csail.mit.edu/ bab@mit.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by

  19. An accurate quantification of the flow structure along the acoustic signal cycle in a forced two-phase jet

    Directory of Open Access Journals (Sweden)

    Calvo Bernad Esteban

    2014-03-01

    Full Text Available This paper provides an experimental study of an acoustically forced two-phase air jet generated by a convergent nozzle. The used particles are transparent glass spheres with diameters between 2 and 50 μm (which gives Stokes number of order 1 and the selected forcing frequency (f=400 Hz induces a powerful nearly periodic flow pattern. Measurements were done by a two-colour Phase-Doppler Anemometer. The experimental setup is computer-controlled to provide an accurate control with a high long-term stability. Measurements cover the whole forcing signal cycle. Raw measurements were carefully post-processed to avoid bias induced by the forcing and the instrument setup, as well as obtain right mean values of the dispersed flow. The effect of the forcing and the particle load allows authors to establish the effect of the acoustic forcing and the particle load on the jet.

  20. A Small Area In-Situ MEMS Test Structure to Accurately Measure Fracture Strength by Electrostatic Probing

    Energy Technology Data Exchange (ETDEWEB)

    Bitsie, Fernando; Jensen, Brian D.; de Boer, Maarten

    1999-07-15

    We have designed, fabricated, tested and modeled a first generation small area test structure for MEMS fracture studies by electrostatic rather than mechanical probing. Because of its small area, this device has potential applications as a lot monitor of strength or fatigue of the MEMS structural material. By matching deflection versus applied voltage data to a 3-D model of the test structure, we develop high confidence that the local stresses achieved in the gage section are greater than 1 GPa. Brittle failure of the polycrystalline silicon was observed.

  1. Identification of balanced chromosomal rearrangements previously unknown among participants in the 1000 Genomes Project: implications for interpretation of structural variation in genomes and the future of clinical cytogenetics.

    Science.gov (United States)

    Dong, Zirui; Wang, Huilin; Chen, Haixiao; Jiang, Hui; Yuan, Jianying; Yang, Zhenjun; Wang, Wen-Jing; Xu, Fengping; Guo, Xiaosen; Cao, Ye; Zhu, Zhenzhen; Geng, Chunyu; Cheung, Wan Chee; Kwok, Yvonne K; Yang, Huanming; Leung, Tak Yeung; Morton, Cynthia C; Cheung, Sau Wai; Choy, Kwong Wai

    2017-11-02

    PurposeRecent studies demonstrate that whole-genome sequencing enables detection of cryptic rearrangements in apparently balanced chromosomal rearrangements (also known as balanced chromosomal abnormalities, BCAs) previously identified by conventional cytogenetic methods. We aimed to assess our analytical tool for detecting BCAs in the 1000 Genomes Project without knowing which bands were affected.MethodsThe 1000 Genomes Project provides an unprecedented integrated map of structural variants in phenotypically normal subjects, but there is no information on potential inclusion of subjects with apparent BCAs akin to those traditionally detected in diagnostic cytogenetics laboratories. We applied our analytical tool to 1,166 genomes from the 1000 Genomes Project with sufficient physical coverage (8.25-fold).ResultsWith this approach, we detected four reciprocal balanced translocations and four inversions, ranging in size from 57.9 kb to 13.3 Mb, all of which were confirmed by cytogenetic methods and polymerase chain reaction studies. One of these DNAs has a subtle translocation that is not readily identified by chromosome analysis because of the similarity of the banding patterns and size of exchanged segments, and another results in disruption of all transcripts of an OMIM gene.ConclusionOur study demonstrates the extension of utilizing low-pass whole-genome sequencing for unbiased detection of BCAs including translocations and inversions previously unknown in the 1000 Genomes Project.GENETICS in MEDICINE advance online publication, 2 November 2017; doi:10.1038/gim.2017.170.

  2. Effects of aneuploidy on genome structure, expression, and interphase organization in Arabidopsis thaliana.

    Directory of Open Access Journals (Sweden)

    Bruno Huettel

    2008-10-01

    Full Text Available Aneuploidy refers to losses and/or gains of individual chromosomes from the normal chromosome set. The resulting gene dosage imbalance has a noticeable affect on the phenotype, as illustrated by aneuploid syndromes, including Down syndrome in humans, and by human solid tumor cells, which are highly aneuploid. Although the phenotypic manifestations of aneuploidy are usually apparent, information about the underlying alterations in structure, expression, and interphase organization of unbalanced chromosome sets is still sparse. Plants generally tolerate aneuploidy better than animals, and, through colchicine treatment and breeding strategies, it is possible to obtain inbred sibling plants with different numbers of chromosomes. This possibility, combined with the genetic and genomics tools available for Arabidopsis thaliana, provides a powerful means to assess systematically the molecular and cytological consequences of aberrant numbers of specific chromosomes. Here, we report on the generation of Arabidopsis plants in which chromosome 5 is present in triplicate. We compare the global transcript profiles of normal diploids and chromosome 5 trisomics, and assess genome integrity using array comparative genome hybridization. We use live cell imaging to determine the interphase 3D arrangement of transgene-encoded fluorescent tags on chromosome 5 in trisomic and triploid plants. The results indicate that trisomy 5 disrupts gene expression throughout the genome and supports the production and/or retention of truncated copies of chromosome 5. Although trisomy 5 does not grossly distort the interphase arrangement of fluorescent-tagged sites on chromosome 5, it may somewhat enhance associations between transgene alleles. Our analysis reveals the complex genomic changes that can occur in aneuploids and underscores the importance of using multiple experimental approaches to investigate how chromosome numerical changes condition abnormal phenotypes and

  3. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    OpenAIRE

    Zuccolo, Andrea; Bowers, John E; Estill, James C; Xiong, Zhiyong; Luo, Meizhong; Sebastian, Aswathy; Goicoechea, Jos? Luis; Collura, Kristi; Yu, Yeisoo; Jiao, Yuannian; Duarte, Jill; Tang, Haibao; Ayyampalayam, Saravanaraj; Rounsley, Steve; Kudrna, Dave

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome....

  4. Identification and accurate quantification of structurally related peptide impurities in synthetic human C-peptide by liquid chromatography-high resolution mass spectrometry.

    Science.gov (United States)

    Li, Ming; Josephs, Ralf D; Daireaux, Adeline; Choteau, Tiphaine; Westwood, Steven; Wielgosz, Robert I; Li, Hongmei

    2018-06-04

    Peptides are an increasingly important group of biomarkers and pharmaceuticals. The accurate purity characterization of peptide calibrators is critical for the development of reference measurement systems for laboratory medicine and quality control of pharmaceuticals. The peptides used for these purposes are increasingly produced through peptide synthesis. Various approaches (for example mass balance, amino acid analysis, qNMR, and nitrogen determination) can be applied to accurately value assign the purity of peptide calibrators. However, all purity assessment approaches require a correction for structurally related peptide impurities in order to avoid biases. Liquid chromatography coupled to high resolution mass spectrometry (LC-hrMS) has become the key technique for the identification and accurate quantification of structurally related peptide impurities in intact peptide calibrator materials. In this study, LC-hrMS-based methods were developed and validated in-house for the identification and quantification of structurally related peptide impurities in a synthetic human C-peptide (hCP) material, which served as a study material for an international comparison looking at the competencies of laboratories to perform peptide purity mass fraction assignments. More than 65 impurities were identified, confirmed, and accurately quantified by using LC-hrMS. The total mass fraction of all structurally related peptide impurities in the hCP study material was estimated to be 83.3 mg/g with an associated expanded uncertainty of 3.0 mg/g (k = 2). The calibration hierarchy concept used for the quantification of individual impurities is described in detail. Graphical abstract ᅟ.

  5. Improvisation in evolution of genes and genomes: whose structure is it anyway?

    Science.gov (United States)

    Shakhnovich, Boris E; Shakhnovich, Eugene I

    2008-06-01

    Significant progress has been made in recent years in a variety of seemingly unrelated fields such as sequencing, protein structure prediction, and high-throughput transcriptomics and metabolomics. At the same time, new microscopic models have been developed that made it possible to analyze the evolution of genes and genomes from first principles. The results from these efforts enable, for the first time, a comprehensive insight into the evolution of complex systems and organisms on all scales--from sequences to organisms and populations. Every newly sequenced genome uncovers new genes, families, and folds. Where do these new genes come from? How do gene duplication and subsequent divergence of sequence and structure affect the fitness of the organism? What role does regulation play in the evolution of proteins and folds? Emerging synergism between data and modeling provides first robust answers to these questions.

  6. Comparative Annotation of Viral Genomes with Non-Conserved Gene Structure

    DEFF Research Database (Denmark)

    de Groot, Saskia; Mailund, Thomas; Hein, Jotun

    2007-01-01

    Motivation: Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded...... allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. Results...... and HIV2, as well as of two different Hepatitis Viruses, attaining results of ~87% sensitivity and ~98.5% specificity. We subsequently incorporate prior knowledge by "knowing" the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate...

  7. High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states.

    Directory of Open Access Journals (Sweden)

    Kevin A Wilkinson

    2008-04-01

    Full Text Available Replication and pathogenesis of the human immunodeficiency virus (HIV is tightly linked to the structure of its RNA genome, but genome structure in infectious virions is poorly understood. We invent high-throughput SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension technology, which uses many of the same tools as DNA sequencing, to quantify RNA backbone flexibility at single-nucleotide resolution and from which robust structural information can be immediately derived. We analyze the structure of HIV-1 genomic RNA in four biologically instructive states, including the authentic viral genome inside native particles. Remarkably, given the large number of plausible local structures, the first 10% of the HIV-1 genome exists in a single, predominant conformation in all four states. We also discover that noncoding regions functioning in a regulatory role have significantly lower (p-value < 0.0001 SHAPE reactivities, and hence more structure, than do viral coding regions that function as the template for protein synthesis. By directly monitoring protein binding inside virions, we identify the RNA recognition motif for the viral nucleocapsid protein. Seven structurally homologous binding sites occur in a well-defined domain in the genome, consistent with a role in directing specific packaging of genomic RNA into nascent virions. In addition, we identify two distinct motifs that are targets for the duplex destabilizing activity of this same protein. The nucleocapsid protein destabilizes local HIV-1 RNA structure in ways likely to facilitate initial movement both of the retroviral reverse transcriptase from its tRNA primer and of the ribosome in coding regions. Each of the three nucleocapsid interaction motifs falls in a specific genome domain, indicating that local protein interactions can be organized by the long-range architecture of an RNA. High-throughput SHAPE reveals a comprehensive view of HIV-1 RNA genome structure, and further

  8. Recombination-dependent replication and gene conversion homogenize repeat sequences and diversify plastid genome structure.

    Science.gov (United States)

    Ruhlman, Tracey A; Zhang, Jin; Blazier, John C; Sabir, Jamal S M; Jansen, Robert K

    2017-04-01

    There is a misinterpretation in the literature regarding the variable orientation of the small single copy region of plastid genomes (plastomes). The common phenomenon of small and large single copy inversion, hypothesized to occur through intramolecular recombination between inverted repeats (IR) in a circular, single unit-genome, in fact, more likely occurs through recombination-dependent replication (RDR) of linear plastome templates. If RDR can be primed through both intra- and intermolecular recombination, then this mechanism could not only create inversion isomers of so-called single copy regions, but also an array of alternative sequence arrangements. We used Illumina paired-end and PacBio single-molecule real-time (SMRT) sequences to characterize repeat structure in the plastome of Monsonia emarginata (Geraniaceae). We used OrgConv and inspected nucleotide alignments to infer ancestral nucleotides and identify gene conversion among repeats and mapped long (>1 kb) SMRT reads against the unit-genome assembly to identify alternative sequence arrangements. Although M. emarginata lacks the canonical IR, we found that large repeats (>1 kilobase; kb) represent ∼22% of the plastome nucleotide content. Among the largest repeats (>2 kb), we identified GC-biased gene conversion and mapping filtered, long SMRT reads to the M. emarginata unit-genome assembly revealed alternative, substoichiometric sequence arrangements. We offer a model based on RDR and gene conversion between long repeated sequences in the M. emarginata plastome and provide support that both intra-and intermolecular recombination between large repeats, particularly in repeat-rich plastomes, varies unit-genome structure while homogenizing the nucleotide sequence of repeats. © 2017 Botanical Society of America.

  9. Universal Internucleotide Statistics in Full Genomes: A Footprint of the DNA Structure and Packaging?

    OpenAIRE

    Bogachev, Mikhail I.; Kayumov, Airat R.; Bunde, Armin

    2014-01-01

    Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. Sapiens aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleot...

  10. Structural analysis of a set of proteins resulting from a bacterial genomics project.

    Science.gov (United States)

    Badger, J; Sauder, J M; Adams, J M; Antonysamy, S; Bain, K; Bergseid, M G; Buchanan, S G; Buchanan, M D; Batiyenko, Y; Christopher, J A; Emtage, S; Eroshkina, A; Feil, I; Furlong, E B; Gajiwala, K S; Gao, X; He, D; Hendle, J; Huber, A; Hoda, K; Kearins, P; Kissinger, C; Laubert, B; Lewis, H A; Lin, J; Loomis, K; Lorimer, D; Louie, G; Maletic, M; Marsh, C D; Miller, I; Molinari, J; Muller-Dieckmann, H J; Newman, J M; Noland, B W; Pagarigan, B; Park, F; Peat, T S; Post, K W; Radojicic, S; Ramos, A; Romero, R; Rutter, M E; Sanderson, W E; Schwinn, K D; Tresser, J; Winhoven, J; Wright, T A; Wu, L; Xu, J; Harris, T J R

    2005-09-01

    The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB. Copyright 2005 Wiley-Liss, Inc.

  11. Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102.

    Science.gov (United States)

    Brown, Nathan M; Mueller, Ryan S; Shepardson, Jonathan W; Landry, Zachary C; Morré, Jeffrey T; Maier, Claudia S; Hardy, F Joan; Dreher, Theo W

    2016-06-13

    Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.

  12. Global MLST of Salmonella Typhi Revisited in Post-Genomic Era: Genetic conservation, Population Structure and Comparative genomics of rare sequence types

    Directory of Open Access Journals (Sweden)

    Kien-Pong eYap

    2016-03-01

    Full Text Available Typhoid fever, caused by Salmonella enterica serovar Typhi, remains an important public health burden in Southeast Asia and other endemic countries. Various genotyping methods have been applied to study the genetic variations of this human-restricted pathogen. Multilocus Sequence Typing (MLST is one of the widely accepted methods, and recently, there is a growing interest in the re-application of MLST in the post-genomic era. In this study, we provide the global MLST distribution of S. Typhi utilizing both publicly available 1,826 S. Typhi genome sequences in addition to performing conventional MLST on S. Typhi strains isolated from various endemic regions spanning over a century. Our global MLST analysis confirms the predominance of two sequence types (ST1 and ST2 co-existing in the endemic regions. Interestingly, S. Typhi strains with ST8 are currently confined within the African continent. Comparative genomic analyses of ST8 and other rare STs with genomes of ST1/ST2 revealed unique mutations in important virulence genes such as flhB, sipC and tviD that may explain the variations that differentiate between seemingly successful (widespread and unsuccessful (poor dissemination S. Typhi populations. Large scale whole-genome phylogeny demonstrated evidence of phylogeographical structuring and showed that ST8 may have diverged from the earlier ancestral population of ST1 and ST2, which later lost some of its fitness advantages, leading to poor worldwide dissemination. In response to the unprecedented increase in genomic data, this study demonstrates and highlights the utility of large-scale genome-based MLST as a quick and effective approach to narrow the scope of in-depth comparative genomic analysis and consequently provide new insights into the fine scale of pathogen evolution and population structure.

  13. Functional connectivity and structural covariance between regions of interest can be measured more accurately using multivariate distance correlation.

    Science.gov (United States)

    Geerligs, Linda; Cam-Can; Henson, Richard N

    2016-07-15

    Studies of brain-wide functional connectivity or structural covariance typically use measures like the Pearson correlation coefficient, applied to data that have been averaged across voxels within regions of interest (ROIs). However, averaging across voxels may result in biased connectivity estimates when there is inhomogeneity within those ROIs, e.g., sub-regions that exhibit different patterns of functional connectivity or structural covariance. Here, we propose a new measure based on "distance correlation"; a test of multivariate dependence of high dimensional vectors, which allows for both linear and non-linear dependencies. We used simulations to show how distance correlation out-performs Pearson correlation in the face of inhomogeneous ROIs. To evaluate this new measure on real data, we use resting-state fMRI scans and T1 structural scans from 2 sessions on each of 214 participants from the Cambridge Centre for Ageing & Neuroscience (Cam-CAN) project. Pearson correlation and distance correlation showed similar average connectivity patterns, for both functional connectivity and structural covariance. Nevertheless, distance correlation was shown to be 1) more reliable across sessions, 2) more similar across participants, and 3) more robust to different sets of ROIs. Moreover, we found that the similarity between functional connectivity and structural covariance estimates was higher for distance correlation compared to Pearson correlation. We also explored the relative effects of different preprocessing options and motion artefacts on functional connectivity. Because distance correlation is easy to implement and fast to compute, it is a promising alternative to Pearson correlations for investigating ROI-based brain-wide connectivity patterns, for functional as well as structural data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  14. A structural model of the genome packaging process in a membrane-containing double stranded DNA virus.

    Directory of Open Access Journals (Sweden)

    Chuan Hong

    2014-12-01

    Full Text Available Two crucial steps in the virus life cycle are genome encapsidation to form an infective virion and genome exit to infect the next host cell. In most icosahedral double-stranded (ds DNA viruses, the viral genome enters and exits the capsid through a unique vertex. Internal membrane-containing viruses possess additional complexity as the genome must be translocated through the viral membrane bilayer. Here, we report the structure of the genome packaging complex with a membrane conduit essential for viral genome encapsidation in the tailless icosahedral membrane-containing bacteriophage PRD1. We utilize single particle electron cryo-microscopy (cryo-EM and symmetry-free image reconstruction to determine structures of PRD1 virion, procapsid, and packaging deficient mutant particles. At the unique vertex of PRD1, the packaging complex replaces the regular 5-fold structure and crosses the lipid bilayer. These structures reveal that the packaging ATPase P9 and the packaging efficiency factor P6 form a dodecameric portal complex external to the membrane moiety, surrounded by ten major capsid protein P3 trimers. The viral transmembrane density at the special vertex is assigned to be a hexamer of heterodimer of proteins P20 and P22. The hexamer functions as a membrane conduit for the DNA and as a nucleating site for the unique vertex assembly. Our structures show a conformational alteration in the lipid membrane after the P9 and P6 are recruited to the virion. The P8-genome complex is then packaged into the procapsid through the unique vertex while the genome terminal protein P8 functions as a valve that closes the channel once the genome is inside. Comparing mature virion, procapsid, and mutant particle structures led us to propose an assembly pathway for the genome packaging apparatus in the PRD1 virion.

  15. A structural model of the genome packaging process in a membrane-containing double stranded DNA virus.

    Science.gov (United States)

    Hong, Chuan; Oksanen, Hanna M; Liu, Xiangan; Jakana, Joanita; Bamford, Dennis H; Chiu, Wah

    2014-12-01

    Two crucial steps in the virus life cycle are genome encapsidation to form an infective virion and genome exit to infect the next host cell. In most icosahedral double-stranded (ds) DNA viruses, the viral genome enters and exits the capsid through a unique vertex. Internal membrane-containing viruses possess additional complexity as the genome must be translocated through the viral membrane bilayer. Here, we report the structure of the genome packaging complex with a membrane conduit essential for viral genome encapsidation in the tailless icosahedral membrane-containing bacteriophage PRD1. We utilize single particle electron cryo-microscopy (cryo-EM) and symmetry-free image reconstruction to determine structures of PRD1 virion, procapsid, and packaging deficient mutant particles. At the unique vertex of PRD1, the packaging complex replaces the regular 5-fold structure and crosses the lipid bilayer. These structures reveal that the packaging ATPase P9 and the packaging efficiency factor P6 form a dodecameric portal complex external to the membrane moiety, surrounded by ten major capsid protein P3 trimers. The viral transmembrane density at the special vertex is assigned to be a hexamer of heterodimer of proteins P20 and P22. The hexamer functions as a membrane conduit for the DNA and as a nucleating site for the unique vertex assembly. Our structures show a conformational alteration in the lipid membrane after the P9 and P6 are recruited to the virion. The P8-genome complex is then packaged into the procapsid through the unique vertex while the genome terminal protein P8 functions as a valve that closes the channel once the genome is inside. Comparing mature virion, procapsid, and mutant particle structures led us to propose an assembly pathway for the genome packaging apparatus in the PRD1 virion.

  16. Tensor-decomposed vibrational coupled-cluster theory: Enabling large-scale, highly accurate vibrational-structure calculations

    Science.gov (United States)

    Madsen, Niels Kristian; Godtliebsen, Ian H.; Losilla, Sergio A.; Christiansen, Ove

    2018-01-01

    A new implementation of vibrational coupled-cluster (VCC) theory is presented, where all amplitude tensors are represented in the canonical polyadic (CP) format. The CP-VCC algorithm solves the non-linear VCC equations without ever constructing the amplitudes or error vectors in full dimension but still formally includes the full parameter space of the VCC[n] model in question resulting in the same vibrational energies as the conventional method. In a previous publication, we have described the non-linear-equation solver for CP-VCC calculations. In this work, we discuss the general algorithm for evaluating VCC error vectors in CP format including the rank-reduction methods used during the summation of the many terms in the VCC amplitude equations. Benchmark calculations for studying the computational scaling and memory usage of the CP-VCC algorithm are performed on a set of molecules including thiadiazole and an array of polycyclic aromatic hydrocarbons. The results show that the reduced scaling and memory requirements of the CP-VCC algorithm allows for performing high-order VCC calculations on systems with up to 66 vibrational modes (anthracene), which indeed are not possible using the conventional VCC method. This paves the way for obtaining highly accurate vibrational spectra and properties of larger molecules.

  17. Optical electromagnetic vector-field modeling for the accurate analysis of finite diffractive structures of high complexity

    DEFF Research Database (Denmark)

    Dridi, Kim; Bjarklev, Anders Overgaard

    1999-01-01

    An electromagnetic vector-field modle for design of optical components based on the finite-difference-time-domain method and radiation integrals in presented. Its ability to predict the optical electromagnetic dynamics in structures with complex material distribution is demonstrated. Theoretical...

  18. Combining Functional and Structural Genomics to Sample the Essential Burkholderia Structome

    Science.gov (United States)

    Baugh, Loren; Gallagher, Larry A.; Patrapuvich, Rapatbhorn; Clifton, Matthew C.; Gardberg, Anna S.; Edwards, Thomas E.; Armour, Brianna; Begley, Darren W.; Dieterich, Shellie H.; Dranow, David M.; Abendroth, Jan; Fairman, James W.; Fox, David; Staker, Bart L.; Phan, Isabelle; Gillespie, Angela; Choi, Ryan; Nakazawa-Hewitt, Steve; Nguyen, Mary Trang; Napuli, Alberto; Barrett, Lynn; Buchko, Garry W.; Stacy, Robin; Myler, Peter J.; Stewart, Lance J.; Manoil, Colin; Van Voorhis, Wesley C.

    2013-01-01

    Background The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite. Methodology/Principal Findings We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infection Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an “ortholog rescue” strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail. Conclusions/Significance This collection of structures, solubility and experimental essentiality data

  19. Combining functional and structural genomics to sample the essential Burkholderia structome.

    Directory of Open Access Journals (Sweden)

    Loren Baugh

    Full Text Available The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite.We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq. We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infection Disease (SSGCID structure determination pipeline. To maximize structural coverage of these targets, we applied an "ortholog rescue" strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail.This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against

  20. Combining functional and structural genomics to sample the essential Burkholderia structome.

    Science.gov (United States)

    Baugh, Loren; Gallagher, Larry A; Patrapuvich, Rapatbhorn; Clifton, Matthew C; Gardberg, Anna S; Edwards, Thomas E; Armour, Brianna; Begley, Darren W; Dieterich, Shellie H; Dranow, David M; Abendroth, Jan; Fairman, James W; Fox, David; Staker, Bart L; Phan, Isabelle; Gillespie, Angela; Choi, Ryan; Nakazawa-Hewitt, Steve; Nguyen, Mary Trang; Napuli, Alberto; Barrett, Lynn; Buchko, Garry W; Stacy, Robin; Myler, Peter J; Stewart, Lance J; Manoil, Colin; Van Voorhis, Wesley C

    2013-01-01

    The genus Burkholderia includes pathogenic gram-negative bacteria that cause melioidosis, glanders, and pulmonary infections of patients with cancer and cystic fibrosis. Drug resistance has made development of new antimicrobials critical. Many approaches to discovering new antimicrobials, such as structure-based drug design and whole cell phenotypic screens followed by lead refinement, require high-resolution structures of proteins essential to the parasite. We experimentally identified 406 putative essential genes in B. thailandensis, a low-virulence species phylogenetically similar to B. pseudomallei, the causative agent of melioidosis, using saturation-level transposon mutagenesis and next-generation sequencing (Tn-seq). We selected 315 protein products of these genes based on structure-determination criteria, such as excluding very large and/or integral membrane proteins, and entered them into the Seattle Structural Genomics Center for Infection Disease (SSGCID) structure determination pipeline. To maximize structural coverage of these targets, we applied an "ortholog rescue" strategy for those producing insoluble or difficult to crystallize proteins, resulting in the addition of 387 orthologs (or paralogs) from seven other Burkholderia species into the SSGCID pipeline. This structural genomics approach yielded structures from 31 putative essential targets from B. thailandensis, and 25 orthologs from other Burkholderia species, yielding an overall structural coverage for 49 of the 406 essential gene families, with a total of 88 depositions into the Protein Data Bank. Of these, 25 proteins have properties of a potential antimicrobial drug target i.e., no close human homolog, part of an essential metabolic pathway, and a deep binding pocket. We describe the structures of several potential drug targets in detail. This collection of structures, solubility and experimental essentiality data provides a resource for development of drugs against infections and diseases

  1. Exchange-Hole Dipole Dispersion Model for Accurate Energy Ranking in Molecular Crystal Structure Prediction II: Nonplanar Molecules.

    Science.gov (United States)

    Whittleton, Sarah R; Otero-de-la-Roza, A; Johnson, Erin R

    2017-11-14

    The crystal structure prediction (CSP) of a given compound from its molecular diagram is a fundamental challenge in computational chemistry with implications in relevant technological fields. A key component of CSP is the method to calculate the lattice energy of a crystal, which allows the ranking of candidate structures. This work is the second part of our investigation to assess the potential of the exchange-hole dipole moment (XDM) dispersion model for crystal structure prediction. In this article, we study the relatively large, nonplanar, mostly flexible molecules in the first five blind tests held by the Cambridge Crystallographic Data Centre. Four of the seven experimental structures are predicted as the energy minimum, and thermal effects are demonstrated to have a large impact on the ranking of at least another compound. As in the first part of this series, delocalization error affects the results for a single crystal (compound X), in this case by detrimentally overstabilizing the π-conjugated conformation of the monomer. Overall, B86bPBE-XDM correctly predicts 16 of the 21 compounds in the five blind tests, a result similar to the one obtained using the best CSP method available to date (dispersion-corrected PW91 by Neumann et al.). Perhaps more importantly, the systems for which B86bPBE-XDM fails to predict the experimental structure as the energy minimum are mostly the same as with Neumann's method, which suggests that similar difficulties (absence of vibrational free energy corrections, delocalization error,...) are not limited to B86bPBE-XDM but affect GGA-based DFT-methods in general. Our work confirms B86bPBE-XDM as an excellent option for crystal energy ranking in CSP and offers a guide to identify crystals (organic salts, conjugated flexible systems) where difficulties may appear.

  2. Structural genomics: keeping up with expanding knowledge of the protein universe

    Science.gov (United States)

    Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek

    2010-01-01

    Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space — a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a reassessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006. PMID:17587562

  3. Structural genomics: keeping up with expanding knowledge of the protein universe.

    Science.gov (United States)

    Grabowski, Marek; Joachimiak, Andrzej; Otwinowski, Zbyszek; Minor, Wladek

    2007-06-01

    Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space--a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a re-assessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006.

  4. Grass genomes

    OpenAIRE

    Bennetzen, Jeffrey L.; SanMiguel, Phillip; Chen, Mingsheng; Tikhonov, Alexander; Francki, Michael; Avramova, Zoya

    1998-01-01

    For the most part, studies of grass genome structure have been limited to the generation of whole-genome genetic maps or the fine structure and sequence analysis of single genes or gene clusters. We have investigated large contiguous segments of the genomes of maize, sorghum, and rice, primarily focusing on intergenic spaces. Our data indicate that much (>50%) of the maize genome is composed of interspersed repetitive DNAs, primarily nested retrotransposons that in...

  5. Magnetic resonance imaging can accurately assess the long-term progression of knee structural changes in experimental dog osteoarthritis.

    Science.gov (United States)

    Boileau, C; Martel-Pelletier, J; Abram, F; Raynauld, J-P; Troncy, E; D'Anjou, M-A; Moreau, M; Pelletier, J-P

    2008-07-01

    Osteoarthritis (OA) structural changes take place over decades in humans. MRI can provide precise and reliable information on the joint structure and changes over time. In this study, we investigated the reliability of quantitative MRI in assessing knee OA structural changes in the experimental anterior cruciate ligament (ACL) dog model of OA. OA was surgically induced by transection of the ACL of the right knee in five dogs. High resolution three dimensional MRI using a 1.5 T magnet was performed at baseline, 4, 8 and 26 weeks post surgery. Cartilage volume/thickness, cartilage defects, trochlear osteophyte formation and subchondral bone lesion (hypersignal) were assessed on MRI images. Animals were killed 26 weeks post surgery and macroscopic evaluation was performed. There was a progressive and significant increase over time in the loss of knee cartilage volume, the cartilage defect and subchondral bone hypersignal. The trochlear osteophyte size also progressed over time. The greatest cartilage loss at 26 weeks was found on the tibial plateaus and in the medial compartment. There was a highly significant correlation between total knee cartilage volume loss or defect and subchondral bone hypersignal, and also a good correlation between the macroscopic and the MRI findings. This study demonstrated that MRI is a useful technology to provide a non-invasive and reliable assessment of the joint structural changes during the development of OA in the ACL dog model. The combination of this OA model with MRI evaluation provides a promising tool for the evaluation of new disease-modifying osteoarthritis drugs (DMOADs).

  6. Population genomic structure and linkage disequilibrium analysis of South African goat breeds using genome-wide SNP data.

    Science.gov (United States)

    Mdladla, K; Dzomba, E F; Huson, H J; Muchadeyi, F C

    2016-08-01

    The sustainability of goat farming in marginal areas of southern Africa depends on local breeds that are adapted to specific agro-ecological conditions. Unimproved non-descript goats are the main genetic resources used for the development of commercial meat-type breeds of South Africa. Little is known about genetic diversity and the genetics of adaptation of these indigenous goat populations. This study investigated the genetic diversity, population structure and breed relations, linkage disequilibrium, effective population size and persistence of gametic phase in goat populations of South Africa. Three locally developed meat-type breeds of the Boer (n = 33), Savanna (n = 31), Kalahari Red (n = 40), a feral breed of Tankwa (n = 25) and unimproved non-descript village ecotypes (n = 110) from four goat-producing provinces of the Eastern Cape, KwaZulu-Natal, Limpopo and North West were assessed using the Illumina Goat 50K SNP Bead Chip assay. The proportion of SNPs with minor allele frequencies >0.05 ranged from 84.22% in the Tankwa to 97.58% in the Xhosa ecotype, with a mean of 0.32 ± 0.13 across populations. Principal components analysis, admixture and pairwise FST identified Tankwa as a genetically distinct population and supported clustering of the populations according to their historical origins. Genome-wide FST identified 101 markers potentially under positive selection in the Tankwa. Average linkage disequilibrium was highest in the Tankwa (r(2)  = 0.25 ± 0.26) and lowest in the village ecotypes (r(2) range = 0.09 ± 0.12 to 0.11 ± 0.14). We observed an effective population size of 100 kb with the exception of those in Savanna and Tswana populations. This study highlights the high level of genetic diversity in South African indigenous goats as well as the utility of the genome-wide SNP marker panels in genetic studies of these populations. © 2016 Stichting International Foundation for Animal Genetics.

  7. Supplementary Material for: Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody; Coll, Francesc; McNerney, Ruth; Ascher, David; Pires, Douglas; Furnham, Nick; Coeck, Nele; Hill-Cawthorne, Grant; Nair, Mridul; Mallard, Kim; Ramsay, Andrew; Campino, Susana; Hibberd, Martin; Pain, Arnab; Rigouts, Leen; Clark, Taane

    2016-01-01

    Abstract Background Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure

  8. A new methodology for non-contact accurate crack width measurement through photogrammetry for automated structural safety evaluation

    International Nuclear Information System (INIS)

    Jahanshahi, Mohammad R; Masri, Sami F

    2013-01-01

    In mechanical, aerospace and civil structures, cracks are important defects that can cause catastrophes if neglected. Visual inspection is currently the predominant method for crack assessment. This approach is tedious, labor-intensive, subjective and highly qualitative. An inexpensive alternative to current monitoring methods is to use a robotic system that could perform autonomous crack detection and quantification. To reach this goal, several image-based crack detection approaches have been developed; however, the crack thickness quantification, which is an essential element for a reliable structural condition assessment, has not been sufficiently investigated. In this paper, a new contact-less crack quantification methodology, based on computer vision and image processing concepts, is introduced and evaluated against a crack quantification approach which was previously developed by the authors. The proposed approach in this study utilizes depth perception to quantify crack thickness and, as opposed to most previous studies, needs no scale attachment to the region under inspection, which makes this approach ideal for incorporation with autonomous or semi-autonomous mobile inspection systems. Validation tests are performed to evaluate the performance of the proposed approach, and the results show that the new proposed approach outperforms the previously developed one. (paper)

  9. De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences

    DEFF Research Database (Denmark)

    Ruzzo, Walter L; Gorodkin, Jan

    2014-01-01

    De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphas...... on an approach based on the CMfinder CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNA ncRNA s, including structured RNA elements in untranslated portions of protein-coding genes, are presented.......De novo discovery of "motifs" capturing the commonalities among related noncoding ncRNA structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis...

  10. Structure and mechanism of the ATPase that powers viral genome packaging.

    Science.gov (United States)

    Hilbert, Brendan J; Hayes, Janelle A; Stone, Nicholas P; Duffy, Caroline M; Sankaran, Banumathi; Kelch, Brian A

    2015-07-21

    Many viruses package their genomes into procapsids using an ATPase machine that is among the most powerful known biological motors. However, how this motor couples ATP hydrolysis to DNA translocation is still unknown. Here, we introduce a model system with unique properties for studying motor structure and mechanism. We describe crystal structures of the packaging motor ATPase domain that exhibit nucleotide-dependent conformational changes involving a large rotation of an entire subdomain. We also identify the arginine finger residue that catalyzes ATP hydrolysis in a neighboring motor subunit, illustrating that previous models for motor structure need revision. Our findings allow us to derive a structural model for the motor ring, which we validate using small-angle X-ray scattering and comparisons with previously published data. We illustrate the model's predictive power by identifying the motor's DNA-binding and assembly motifs. Finally, we integrate our results to propose a mechanistic model for DNA translocation by this molecular machine.

  11. Whole genome PCR scanning reveals the syntenic genome structure of toxigenic Vibrio cholerae strains in the O1/O139 population.

    Directory of Open Access Journals (Sweden)

    Bo Pang

    Full Text Available Vibrio cholerae is commonly found in estuarine water systems. Toxigenic O1 and O139 V. cholerae strains have caused cholera epidemics and pandemics, whereas the nontoxigenic strains within these serogroups only occasionally lead to disease. To understand the differences in the genome and clonality between the toxigenic and nontoxigenic strains of V. cholerae serogroups O1 and O139, we employed a whole genome PCR scanning (WGPScanning method, an rrn operon-mediated fragment rearrangement analysis and comparative genomic hybridization (CGH to analyze the genome structure of different strains. WGPScanning in conjunction with CGH revealed that the genomic contents of the toxigenic strains were conservative, except for a few indels located mainly in mobile elements. Minor nucleotide variation in orthologous genes appeared to be the major difference between the toxigenic strains. rrn operon-mediated rearrangements were infrequent in El Tor toxigenic strains tested using I-CeuI digested pulsed-field gel electrophoresis (PFGE analysis and PCR analysis based on flanking sequence of rrn operons. Using these methods, we found that the genomic structures of toxigenic El Tor and O139 strains were syntenic. The nontoxigenic strains exhibited more extensive sequence variations, but toxin coregulated pilus positive (TCP+ strains had a similar structure. TCP+ nontoxigenic strains could be subdivided into multiple lineages according to the TCP type, suggesting the existence of complex intermediates in the evolution of toxigenic strains. The data indicate that toxigenic O1 El Tor and O139 strains were derived from a single lineage of intermediates from complex clones in the environment. The nontoxigenic strains with non-El Tor type TCP may yet evolve into new epidemic clones after attaining toxigenic attributes.

  12. Common structural and epigenetic changes in the genome of castration-resistant prostate cancer.

    Science.gov (United States)

    Friedlander, Terence W; Roy, Ritu; Tomlins, Scott A; Ngo, Vy T; Kobayashi, Yasuko; Azameera, Aruna; Rubin, Mark A; Pienta, Kenneth J; Chinnaiyan, Arul; Ittmann, Michael M; Ryan, Charles J; Paris, Pamela L

    2012-02-01

    Progression of primary prostate cancer to castration-resistant prostate cancer (CRPC) is associated with numerous genetic and epigenetic alterations that are thought to promote survival at metastatic sites. In this study, we investigated gene copy number and CpG methylation status in CRPC to gain insight into specific pathophysiologic pathways that are active in this advanced form of prostate cancer. Our analysis defined and validated 495 genes exhibiting significant differences in CRPC in gene copy number, including gains in androgen receptor (AR) and losses of PTEN and retinoblastoma 1 (RB1). Significant copy number differences existed between tumors with or without AR gene amplification, including a common loss of AR repressors in AR-unamplified tumors. Simultaneous gene methylation and allelic deletion occurred frequently in RB1 and HSD17B2, the latter of which is involved in testosterone metabolism. Lastly, genomic DNA from most CRPC was hypermethylated compared with benign prostate tissue. Our findings establish a comprehensive methylation signature that couples epigenomic and structural analyses, thereby offering insights into the genomic alterations in CRPC that are associated with a circumvention of hormonal therapy. Genes identified in this integrated genomic study point to new drug targets in CRPC, an incurable disease state which remains the chief therapeutic challenge. ©2012 AACR.

  13. Primary structure of the human follistatin precursor and its genomic organization

    International Nuclear Information System (INIS)

    Shimasaki, Shunichi; Koga, Makoto; Esch, F.

    1988-01-01

    Follistatin is a single-chain gonadal protein that specifically inhibits follicle-stimulating hormone release. By use of the recently characterized porcine follistatin cDNA as a probe to screen a human testis cDNA library and a genomic library, the structure of the complete human follistatin precursor as well as its genomic organization have been determined. Three of eight cDNA clones that were sequenced predicted a precursor with 344 amino acids, whereas the remaining five cDNA clones encoded a 317 amino acid precursor, resulting from alternative splicing of the precursor mRNA. Mature follistatins contain four contiguous domains that are encoded by precisely separated exons; three of the domains are highly similar to each other, as well as to human epidermal growth factor and human pancreatic secretory trypsin inhibitor. The genomic organization of the human follistatin is similar to that of the human epidermal growth factor gene and thus supports the notion of exon shuffling during evolution

  14. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo.

    Science.gov (United States)

    Zubradt, Meghan; Gupta, Paromita; Persad, Sitara; Lambowitz, Alan M; Weissman, Jonathan S; Rouskin, Silvi

    2017-01-01

    Coupling of structure-specific in vivo chemical modification to next-generation sequencing is transforming RNA secondary structure studies in living cells. The dominant strategy for detecting in vivo chemical modifications uses reverse transcriptase truncation products, which introduce biases and necessitate population-average assessments of RNA structure. Here we present dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), which encodes DMS modifications as mismatches using a thermostable group II intron reverse transcriptase. DMS-MaPseq yields a high signal-to-noise ratio, can report multiple structural features per molecule, and allows both genome-wide studies and focused in vivo investigations of even low-abundance RNAs. We apply DMS-MaPseq for the first analysis of RNA structure within an animal tissue and to identify a functional structure involved in noncanonical translation initiation. Additionally, we use DMS-MaPseq to compare the in vivo structure of pre-mRNAs with their mature isoforms. These applications illustrate DMS-MaPseq's capacity to dramatically expand in vivo analysis of RNA structure.

  15. Sequencing intractable DNA to close microbial genomes.

    Directory of Open Access Journals (Sweden)

    Richard A Hurt

    Full Text Available Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps and the Desulfovibrio africanus genome (1 intractable gap. The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  16. Sequencing Intractable DNA to Close Microbial Genomes

    Energy Technology Data Exchange (ETDEWEB)

    Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  17. Cloning, production, and purification of proteins for a medium-scale structural genomics project.

    Science.gov (United States)

    Quevillon-Cheruel, Sophie; Collinet, Bruno; Trésaugues, Lionel; Minard, Philippe; Henckes, Gilles; Aufrère, Robert; Blondeau, Karine; Zhou, Cong-Zhao; Liger, Dominique; Bettache, Nabila; Poupon, Anne; Aboulfath, Ilham; Leulliot, Nicolas; Janin, Joël; van Tilbeurgh, Herman

    2007-01-01

    The South-Paris Yeast Structural Genomics Pilot Project (http://www.genomics.eu.org) aims at systematically expressing, purifying, and determining the three-dimensional structures of Saccharomyces cerevisiae proteins. We have already cloned 240 yeast open reading frames in the Escherichia coli pET system. Eighty-two percent of the targets can be expressed in E. coli, and 61% yield soluble protein. We have currently purified 58 proteins. Twelve X-ray structures have been solved, six are in progress, and six other proteins gave crystals. In this chapter, we present the general experimental flowchart applied for this project. One of the main difficulties encountered in this pilot project was the low solubility of a great number of target proteins. We have developed parallel strategies to recover these proteins from inclusion bodies, including refolding, coexpression with chaperones, and an in vitro expression system. A limited proteolysis protocol, developed to localize flexible regions in proteins that could hinder crystallization, is also described.

  18. Multiple independent structural dynamic events in the evolution of snake mitochondrial genomes.

    Science.gov (United States)

    Qian, Lifu; Wang, Hui; Yan, Jie; Pan, Tao; Jiang, Shanqun; Rao, Dingqi; Zhang, Baowei

    2018-05-10

    Mitochondrial DNA sequences have long been used in phylogenetic studies. However, little attention has been paid to the changes in gene arrangement patterns in the snake's mitogenome. Here, we analyzed the complete mitogenome sequences and structures of 65 snake species from 14 families and examined their structural patterns, organization and evolution. Our purpose was to further investigate the evolutionary implications and possible rearrangement mechanisms of the mitogenome within snakes. In total, eleven types of mitochondrial gene arrangement patterns were detected (Type I, II, III, III-A, III-B, III-B1, III-C, III-D, III-E, III-F, III-G), with mitochondrial genome rearrangements being a major trend in snakes, especially in Alethinophidia. In snake mitogenomes, the rearrangements mainly involved three processes, gene loss, translocation and duplication. Within Scolecophidia, the O L was lost several times in Typhlopidae and Leptotyphlopidae, but persisted as a plesiomorphy in the Alethinophidia. Duplication of the control region and translocation of the tRNA Leu gene are two visible features in Alethinophidian mitochondrial genomes. Independently and stochastically, the duplication of pseudo-Pro (P*) emerged in seven different lineages of unequal size in three families, indicating that the presence of P* was a polytopic event in the mitogenome. The WANCY tRNA gene cluster and the control regions and their adjacent segments were hotspots for mitogenome rearrangement. Maintenance of duplicate control regions may be the source for snake mitogenome structural diversity.

  19. Experiment and theory at the convergence limit: accurate equilibrium structure of picolinic acid by gas-phase electron diffraction and coupled-cluster computations.

    Science.gov (United States)

    Vogt, Natalja; Marochkin, Ilya I; Rykov, Anatolii N

    2018-04-18

    The accurate molecular structure of picolinic acid has been determined from experimental data and computed at the coupled cluster level of theory. Only one conformer with the O[double bond, length as m-dash]C-C-N and H-O-C[double bond, length as m-dash]O fragments in antiperiplanar (ap) positions, ap-ap, has been detected under conditions of the gas-phase electron diffraction (GED) experiment (Tnozzle = 375(3) K). The semiexperimental equilibrium structure, rsee, of this conformer has been derived from the GED data taking into account the anharmonic vibrational effects estimated from the ab initio force field. The equilibrium structures of the two lowest-energy conformers, ap-ap and ap-sp (with the synperiplanar H-O-C[double bond, length as m-dash]O fragment), have been fully optimized at the CCSD(T)_ae level of theory in conjunction with the triple-ζ basis set (cc-pwCVTZ). The quality of the optimized structures has been improved due to extrapolation to the quadruple-ζ basis set. The high accuracy of both GED determination and CCSD(T) computations has been disclosed by a correct comparison of structures having the same physical meaning. The ap-ap conformer has been found to be stabilized by the relatively strong NH-O hydrogen bond of 1.973(27) Å (GED) and predicted to be lower in energy by 16 kJ mol-1 with respect to the ap-sp conformer without a hydrogen bond. The influence of this bond on the structure of picolinic acid has been analyzed within the Natural Bond Orbital model. The possibility of the decarboxylation of picolinic acid has been considered in the GED analysis, but no significant amounts of pyridine and carbon dioxide could be detected. To reveal the structural changes reflecting the mesomeric and inductive effects due to the carboxylic substituent, the accurate structure of pyridine has been also computed at the CCSD(T)_ae level with basis sets from triple- to 5-ζ quality. The comprehensive structure computations for pyridine as well as for

  20. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny

    Science.gov (United States)

    Marr, Henry S.; Tarigo, Jaime L.; Cohn, Leah A.; Bird, David M.; Scholl, Elizabeth H.; Levy, Michael G.; Wiegmann, Brian M.; Birkenheuer, Adam J.

    2016-01-01

    The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon) are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi) with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis) and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae) while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward understanding the

  1. Mitochondrial Genome Sequences and Structures Aid in the Resolution of Piroplasmida phylogeny.

    Directory of Open Access Journals (Sweden)

    Megan E Schreeg

    Full Text Available The taxonomy of the order Piroplasmida, which includes a number of clinically and economically relevant organisms, is a hotly debated topic amongst parasitologists. Three genera (Babesia, Theileria, and Cytauxzoon are recognized based on parasite life cycle characteristics, but molecular phylogenetic analyses of 18S sequences have suggested the presence of five or more distinct Piroplasmida lineages. Despite these important advancements, a few studies have been unable to define the taxonomic relationships of some organisms (e.g. C. felis and T. equi with respect to other Piroplasmida. Additional evidence from mitochondrial genome sequences and synteny should aid in the inference of Piroplasmida phylogeny and resolution of taxonomic uncertainties. In this study, we have amplified, sequenced, and annotated seven previously uncharacterized mitochondrial genomes (Babesia canis, Babesia vogeli, Babesia rossi, Babesia sp. Coco, Babesia conradae, Babesia microti-like sp., and Cytauxzoon felis and identified additional ribosomal fragments in ten previously characterized mitochondrial genomes. Phylogenetic analysis of concatenated mitochondrial and 18S sequences as well as cox1 amino acid sequence identified five distinct Piroplasmida groups, each of which possesses a unique mitochondrial genome structure. Specifically, our results confirm the existence of four previously identified clades (B. microti group, Babesia sensu stricto, Theileria equi, and a Babesia sensu latu group that includes B. conradae while supporting the integration of Theileria and Cytauxzoon species into a single fifth taxon. Although known biological characteristics of Piroplasmida corroborate the proposed phylogeny, more investigation into parasite life cycles is warranted to further understand the evolution of the Piroplasmida. Our results provide an evolutionary framework for comparative biology of these important animal and human pathogens and help focus renewed efforts toward

  2. A genomic perspective on protein tyrosine phosphatases: gene structure, pseudogenes, and genetic disease linkage

    DEFF Research Database (Denmark)

    Andersen, Jannik N; Jansen, Peter G; Echwald, Søren M

    2004-01-01

    sequence databases, we discovered one novel human PTP gene and defined chromosomal loci and exon structure of the additional 37 genes encoding known PTP transcripts. Direct orthologs were present in the mouse genome for all 38 human PTP genes. In addition, we identified 12 PTP pseudogenes unique to humans...... that have probably contaminated previous bioinformatics analysis of this gene family. PCR amplification and transcript sequencing indicate that some PTP pseudogenes are expressed, but their function (if any) is unknown. Furthermore, we analyzed the enhanced diversity generated by alternative splicing...

  3. Structural and functional insights of β-glucosidases identified from the genome of Aspergillus fumigatus

    Science.gov (United States)

    Dodda, Subba Reddy; Aich, Aparajita; Sarkar, Nibedita; Jain, Piyush; Jain, Sneha; Mondal, Sudipa; Aikat, Kaustav; Mukhopadhyay, Sudit S.

    2018-03-01

    Thermostable glucose tolerant β-glucosidase from Aspergillus species has attracted worldwide interest for their potentiality in industrial applications and bioethanol production. A strain of Aspergillus fumigatus (AfNITDGPKA3) identified by our laboratory from straw retting ground showed higher cellulase activity, specifically the β-glucosidase activity, compared to other contemporary strains. Though A. fumigatus has been known for high cellulase activity, detailed identification and characterization of the cellulase genes from their genome is yet to be done. In this work we have been analyzed the cellulase genes from the genome sequence database of Aspergillus fumigatus (Af293). Genome analysis suggests two cellobiohydrolase, eleven endoglucanase and seventeen β-glucosidase genes present. β-Glucosidase genes belong to either Glycohydro1 (GH1 or Bgl1) or Glycohydro3 (GH3 or Bgl3) family. The sequence similarity suggests that Bgl1 and Bgl3 of A. fumagatus are phylogenetically close to those of A. fisheri and A. oryzae. The modelled structure of the Bgl1 predicts the (β/α)8 barrel type structure with deep and narrow active site, whereas, Bgl3 shows the (α/β)8 barrel and (α/β)6 sandwich structure with shallow and open active site. Docking results suggest that amino acids Glu544, Glu466, Trp408,Trp567,Tyr44,Tyr222,Tyr770,Asp844,Asp537,Asn212,Asn217 of Bgl3 and Asp224,Asn242,Glu440, Glu445, Tyr367, Tyr365,Thr994,Trp435,Trp446 of Bgl1 are involved in the hydrolysis. Binding affinity analyses suggest that Bgl3 and Bgl1 enzymes are more active on the substrates like 4-methylumbelliferyl glycoside (MUG) and p-nitrophenyl-β-D-1, 4-glucopyranoside (pNPG) than on cellobiose. Further docking with glucose suggests that Bgl1 is more glucose tolerant than Bgl3. Analysis of the Aspergillus fumigatus genome may help to identify a β-glucosidase enzyme with better property and the structural information may help to develop an engineered recombinant enzyme.

  4. Genomic prediction in families of perennial ryegrass based on genotyping-by-sequencing

    DEFF Research Database (Denmark)

    Ashraf, Bilal

    In this thesis we investigate the potential for genomic prediction in perennial ryegrass using genotyping-by-sequencing (GBS) data. Association method based on family-based breeding systems was developed, genomic heritabilities, genomic prediction accurancies and effects of some key factors wer...... explored. Results show that low sequencing depth caused underestimation of allele substitution effects in GWAS and overestimation of genomic heritability in prediction studies. Other factors susch as SNP marker density, population structure and size of training population influenced accuracy of genomic...... prediction. Overall, GBS allows for genomic prediction in breeding families of perennial ryegrass and holds good potential to expedite genetic gain and encourage the application of genomic prediction...

  5. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    Science.gov (United States)

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  6. The structures of bovine herpesvirus 1 virion and concatemeric DNA: implications for cleavage and packaging of herpesvirus genomes

    International Nuclear Information System (INIS)

    Schynts, Frederic; McVoy, Michael A.; Meurens, Francois; Detry, Bruno; Epstein, Alberto L.; Thiry, Etienne

    2003-01-01

    Herpesvirus genomes are often characterized by the presence of direct and inverted repeats that delineate their grouping into six structural classes. Class D genomes consist of a long (L) segment and a short (S) segment. The latter is flanked by large inverted repeats. DNA replication produces concatemers of head-to-tail linked genomes that are cleaved into unit genomes during the process of packaging DNA into capsids. Packaged class D genomes are an equimolar mixture of two isomers in which S is in either of two orientations, presumably a consequence of homologous recombination between the inverted repeats. The L segment remains predominantly fixed in a prototype (P) orientation; however, low levels of genomes having inverted L (I L ) segments have been reported for some class D herpesviruses. Inefficient formation of class D I L genomes has been attributed to infrequent L segment inversion, but recent detection of frequent inverted L segments in equine herpesvirus 1 concatemers [Virology 229 (1997) 415-420] suggests that the defect may be at the level of cleavage and packaging rather than inversion. In this study, the structures of virion and concatemeric DNA of another class D herpesvirus, bovine herpesvirus 1, were determined. Virion DNA contained low levels of I L genomes, whereas concatemeric DNA contained significant amounts of L segments in both P and I L orientations. However, concatemeric termini exhibited a preponderance of L termini derived from P isomers which was comparable to the preponderance of P genomes found in virion DNA. Thus, the defect in formation of I L genomes appears to lie at the level of concatemer cleavage. These results have important implications for the mechanisms by which herpesvirus DNA cleavage and packaging occur

  7. The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial ‘mobilome’

    Science.gov (United States)

    Sullivan, Matthew B; Krastins, Bryan; Hughes, Jennifer L; Kelly, Libusha; Chase, Michael; Sarracino, David; Chisholm, Sallie W

    2009-01-01

    Prochlorococcus, an abundant phototroph in the oceans, are infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those from coliphages T4 and T7 respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The ∼108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks photosynthesis genes which have consistently been found in other marine cyanophage, but does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integration into its host as inferred from bioinformatically identified genetic machinery int, bet, exo and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element

  8. Discovery of new enzymes and metabolic pathways using structure and genome context

    Science.gov (United States)

    Zhao, Suwen; Kumar, Ritesh; Sakai, Ayano; Vetting, Matthew W.; Wood, B. McKay; Brown, Shoshana; Bonanno, Jeffery B.; Hillerich, Brandan S.; Seidel, Ronald D.; Babbitt, Patricia C.; Almo, Steven C.; Sweedler, Jonathan V.; Gerlt, John A.; Cronan, John E.; Jacobson, Matthew P.

    2014-01-01

    Assigning valid functions to proteins identified in genome projects is challenging, with over-prediction and database annotation errors major concerns1. We, and others2, are developing computation-guided strategies for functional discovery using “metabolite docking” to experimentally derived3 or homology-based4 three-dimensional structures. Bacterial metabolic pathways often are encoded by “genome neighborhoods” (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by “predicting” the intermediates in the glycolytic pathway in E. coli5. Metabolite docking to multiple binding proteins/enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. We report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed i) the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and ii) the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guide functional predictions to enable the discovery of new metabolic pathways. PMID:24056934

  9. Assessment of Genetic Heterogeneity in Structured Plant Populations Using Multivariate Whole-Genome Regression Models.

    Science.gov (United States)

    Lehermeier, Christina; Schön, Chris-Carolin; de Los Campos, Gustavo

    2015-09-01

    Plant breeding populations exhibit varying levels of structure and admixture; these features are likely to induce heterogeneity of marker effects across subpopulations. Traditionally, structure has been dealt with as a potential confounder, and various methods exist to "correct" for population stratification. However, these methods induce a mean correction that does not account for heterogeneity of marker effects. The animal breeding literature offers a few recent studies that consider modeling genetic heterogeneity in multibreed data, using multivariate models. However, these methods have received little attention in plant breeding where population structure can have different forms. In this article we address the problem of analyzing data from heterogeneous plant breeding populations, using three approaches: (a) a model that ignores population structure [A-genome-based best linear unbiased prediction (A-GBLUP)], (b) a stratified (i.e., within-group) analysis (W-GBLUP), and (c) a multivariate approach that uses multigroup data and accounts for heterogeneity (MG-GBLUP). The performance of the three models was assessed on three different data sets: a diversity panel of rice (Oryza sativa), a maize (Zea mays L.) half-sib panel, and a wheat (Triticum aestivum L.) data set that originated from plant breeding programs. The estimated genomic correlations between subpopulations varied from null to moderate, depending on the genetic distance between subpopulations and traits. Our assessment of prediction accuracy features cases where ignoring population structure leads to a parsimonious more powerful model as well as others where the multivariate and stratified approaches have higher predictive power. In general, the multivariate approach appeared slightly more robust than either the A- or the W-GBLUP. Copyright © 2015 by the Genetics Society of America.

  10. Modeling structure of G protein-coupled receptors in huan genome

    KAUST Repository

    Zhang, Yang

    2016-01-26

    G protein-coupled receptors (or GPCRs) are integral transmembrane proteins responsible to various cellular signal transductions. Human GPCR proteins are encoded by 5% of human genes but account for the targets of 40% of the FDA approved drugs. Due to difficulties in crystallization, experimental structure determination remains extremely difficult for human GPCRs, which have been a major barrier in modern structure-based drug discovery. We proposed a new hybrid protocol, GPCR-I-TASSER, to construct GPCR structure models by integrating experimental mutagenesis data with ab initio transmembrane-helix assembly simulations, assisted by the predicted transmembrane-helix interaction networks. The method was tested in recent community-wide GPCRDock experiments and constructed models with a root mean square deviation 1.26 Å for Dopamine-3 and 2.08 Å for Chemokine-4 receptors in the transmembrane domain regions, which were significantly closer to the native than the best templates available in the PDB. GPCR-I-TASSER has been applied to model all 1,026 putative GPCRs in the human genome, where 923 are found to have correct folds based on the confidence score analysis and mutagenesis data comparison. The successfully modeled GPCRs contain many pharmaceutically important families that do not have previously solved structures, including Trace amine, Prostanoids, Releasing hormones, Melanocortins, Vasopressin and Neuropeptide Y receptors. All the human GPCR models have been made publicly available through the GPCR-HGmod database at http://zhanglab.ccmb.med.umich.edu/GPCR-HGmod/ The results demonstrate new progress on genome-wide structure modeling of transmembrane proteins which should bring useful impact on the effort of GPCR-targeted drug discovery.

  11. Inferring network structure in non-normal and mixed discrete-continuous genomic data.

    Science.gov (United States)

    Bhadra, Anindya; Rao, Arvind; Baladandayuthapani, Veerabhadran

    2018-03-01

    Inferring dependence structure through undirected graphs is crucial for uncovering the major modes of multivariate interaction among high-dimensional genomic markers that are potentially associated with cancer. Traditionally, conditional independence has been studied using sparse Gaussian graphical models for continuous data and sparse Ising models for discrete data. However, there are two clear situations when these approaches are inadequate. The first occurs when the data are continuous but display non-normal marginal behavior such as heavy tails or skewness, rendering an assumption of normality inappropriate. The second occurs when a part of the data is ordinal or discrete (e.g., presence or absence of a mutation) and the other part is continuous (e.g., expression levels of genes or proteins). In this case, the existing Bayesian approaches typically employ a latent variable framework for the discrete part that precludes inferring conditional independence among the data that are actually observed. The current article overcomes these two challenges in a unified framework using Gaussian scale mixtures. Our framework is able to handle continuous data that are not normal and data that are of mixed continuous and discrete nature, while still being able to infer a sparse conditional sign independence structure among the observed data. Extensive performance comparison in simulations with alternative techniques and an analysis of a real cancer genomics data set demonstrate the effectiveness of the proposed approach. © 2017, The International Biometric Society.

  12. Genome Scan for Selection in Structured Layer Chicken Populations Exploiting Linkage Disequilibrium Information.

    Directory of Open Access Journals (Sweden)

    Mahmood Gholami

    Full Text Available An increasing interest is being placed in the detection of genes, or genomic regions, that have been targeted by selection because identifying signatures of selection can lead to a better understanding of genotype-phenotype relationships. A common strategy for the detection of selection signatures is to compare samples from distinct populations and to search for genomic regions with outstanding genetic differentiation. The aim of this study was to detect selective signatures in layer chicken populations using a recently proposed approach, hapFLK, which exploits linkage disequilibrium information while accounting appropriately for the hierarchical structure of populations. We performed the analysis on 70 individuals from three commercial layer breeds (White Leghorn, White Rock and Rhode Island Red, genotyped for approximately 1 million SNPs. We found a total of 41 and 107 regions with outstanding differentiation or similarity using hapFLK and its single SNP counterpart FLK respectively. Annotation of selection signature regions revealed various genes and QTL corresponding to productions traits, for which layer breeds were selected. A number of the detected genes were associated with growth and carcass traits, including IGF-1R, AGRP and STAT5B. We also annotated an interesting gene associated with the dark brown feather color mutational phenotype in chickens (SOX10. We compared FST, FLK and hapFLK and demonstrated that exploiting linkage disequilibrium information and accounting for hierarchical population structure decreased the false detection rate.

  13. Universal internucleotide statistics in full genomes: a footprint of the DNA structure and packaging?

    Directory of Open Access Journals (Sweden)

    Mikhail I Bogachev

    Full Text Available Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. Sapiens aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleotide interval distributions exhibit the same [Formula: see text]-exponential form. While in prokaryotes a single [Formula: see text]-exponential function makes the best fit, in eukaryotes the PDF contains additionally a second [Formula: see text]-exponential, which in the human genome makes a perfect approximation over nearly 10 decades. We suggest that this functional form is a footprint of the heterogeneous DNA structure, where the first [Formula: see text]-exponential reflects the universal helical pitch that appears both in pro- and eukaryotic DNA, while the second [Formula: see text]-exponential is a specific marker of the large-scale eukaryotic DNA organization.

  14. Genomic structure and evolution of the mating type locus in the green seaweed Ulva partita.

    Science.gov (United States)

    Yamazaki, Tomokazu; Ichihara, Kensuke; Suzuki, Ryogo; Oshima, Kenshiro; Miyamura, Shinichi; Kuwano, Kazuyoshi; Toyoda, Atsushi; Suzuki, Yutaka; Sugano, Sumio; Hattori, Masahira; Kawano, Shigeyuki

    2017-09-15

    The evolution of sex chromosomes and mating loci in organisms with UV systems of sex/mating type determination in haploid phases via genes on UV chromosomes is not well understood. We report the structure of the mating type (MT) locus and its evolutionary history in the green seaweed Ulva partita, which is a multicellular organism with an isomorphic haploid-diploid life cycle and mating type determination in the haploid phase. Comprehensive comparison of a total of 12.0 and 16.6 Gb of genomic next-generation sequencing data for mt - and mt + strains identified highly rearranged MT loci of 1.0 and 1.5 Mb in size and containing 46 and 67 genes, respectively, including 23 gametologs. Molecular evolutionary analyses suggested that the MT loci diverged over a prolonged period in the individual mating types after their establishment in an ancestor. A gene encoding an RWP-RK domain-containing protein was found in the mt - MT locus but was not an ortholog of the chlorophycean mating type determination gene MID. Taken together, our results suggest that the genomic structure and its evolutionary history in the U. partita MT locus are similar to those on other UV chromosomes and that the MT locus genes are quite different from those of Chlorophyceae.

  15. Core genome conservation of Staphylococcus haemolyticus limits sequence based population structure analysis.

    Science.gov (United States)

    Cavanagh, Jorunn Pauline; Klingenberg, Claus; Hanssen, Anne-Merethe; Fredheim, Elizabeth Aarag; Francois, Patrice; Schrenzel, Jacques; Flægstad, Trond; Sollid, Johanna Ericson

    2012-06-01

    The notoriously multi-resistant Staphylococcus haemolyticus is an emerging pathogen causing serious infections in immunocompromised patients. Defining the population structure is important to detect outbreaks and spread of antimicrobial resistant clones. Currently, the standard typing technique is pulsed-field gel electrophoresis (PFGE). In this study we describe novel molecular typing schemes for S. haemolyticus using multi locus sequence typing (MLST) and multi locus variable number of tandem repeats (VNTR) analysis. Seven housekeeping genes (MLST) and five VNTR loci (MLVF) were selected for the novel typing schemes. A panel of 45 human and veterinary S. haemolyticus isolates was investigated. The collection had diverse PFGE patterns (38 PFGE types) and was sampled over a 20 year-period from eight countries. MLST resolved 17 sequence types (Simpsons index of diversity [SID]=0.877) and MLVF resolved 14 repeat types (SID=0.831). We found a low sequence diversity. Phylogenetic analysis clustered the isolates in three (MLST) and one (MLVF) clonal complexes, respectively. Taken together, neither the MLST nor the MLVF scheme was suitable to resolve the population structure of this S. haemolyticus collection. Future MLVF and MLST schemes will benefit from addition of more variable core genome sequences identified by comparing different fully sequenced S. haemolyticus genomes. Copyright © 2012 Elsevier B.V. All rights reserved.

  16. Coordination of genomic structure and transcription by the main bacterial nucleoid-associated protein HU

    Science.gov (United States)

    Berger, Michael; Farcas, Anca; Geertz, Marcel; Zhelyazkova, Petya; Brix, Klaudia; Travers, Andrew; Muskhelishvili, Georgi

    2010-01-01

    The histone-like protein HU is a highly abundant DNA architectural protein that is involved in compacting the DNA of the bacterial nucleoid and in regulating the main DNA transactions, including gene transcription. However, the coordination of the genomic structure and function by HU is poorly understood. Here, we address this question by comparing transcript patterns and spatial distributions of RNA polymerase in Escherichia coli wild-type and hupA/B mutant cells. We demonstrate that, in mutant cells, upregulated genes are preferentially clustered in a large chromosomal domain comprising the ribosomal RNA operons organized on both sides of OriC. Furthermore, we show that, in parallel to this transcription asymmetry, mutant cells are also impaired in forming the transcription foci—spatially confined aggregations of RNA polymerase molecules transcribing strong ribosomal RNA operons. Our data thus implicate HU in coordinating the global genomic structure and function by regulating the spatial distribution of RNA polymerase in the nucleoid. PMID:20010798

  17. Population structure and genomic inbreeding in nine Swiss dairy cattle populations.

    Science.gov (United States)

    Signer-Hasler, Heidi; Burren, Alexander; Neuditschko, Markus; Frischknecht, Mirjam; Garrick, Dorian; Stricker, Christian; Gredler, Birgit; Bapst, Beat; Flury, Christine

    2017-11-07

    Domestication, breed formation and intensive selection have resulted in divergent cattle breeds that likely exhibit their own genomic signatures. In this study, we used genotypes from 27,612 autosomal single nucleotide polymorphisms to characterize population structure based on 9214 sires representing nine Swiss dairy cattle populations: Brown Swiss (BS), Braunvieh (BV), Original Braunvieh (OB), Holstein (HO), Red Holstein (RH), Swiss Fleckvieh (SF), Simmental (SI), Eringer (ER) and Evolèner (EV). Genomic inbreeding (F ROH ) and signatures of selection were determined by calculating runs of homozygosity (ROH). The results build the basis for a better understanding of the genetic development of Swiss dairy cattle populations and highlight differences between the original populations (i.e. OB, SI, ER and EV) and those that have become more popular in Switzerland as currently reflected by their larger populations (i.e. BS, BV, HO, RH and SF). The levels of genetic diversity were highest and lowest in the SF and BS breeds, respectively. Based on F ST values, we conclude that, among all pairwise comparisons, BS and HO (0.156) differ more than the other pairs of populations. The original Swiss cattle populations OB, SI, ER, and EV are clearly genetically separated from the Swiss cattle populations that are now more common and represented by larger numbers of cows. Mean levels of F ROH ranged from 0.027 (ER) to 0.091 (BS). Three of the original Swiss cattle populations, ER (F ROH : 0.027), OB (F ROH : 0.029), and SI (F ROH : 0.039), showed low levels of genomic inbreeding, whereas it was much higher in EV (F ROH : 0.074). Private signatures of selection for the original Swiss cattle populations are reported for BTA4, 5, 11 and 26. The low levels of genomic inbreeding observed in the original Swiss cattle populations ER, OB and SI compared to the other breeds are explained by a lesser use of artificial insemination and greater use of natural service. Natural service

  18. From Genome to Structure and Back Again: A Family Portrait of the Transcarbamylases

    Directory of Open Access Journals (Sweden)

    Dashuang Shi

    2015-08-01

    Full Text Available Enzymes in the transcarbamylase family catalyze the transfer of a carbamyl group from carbamyl phosphate (CP to an amino group of a second substrate. The two best-characterized members, aspartate transcarbamylase (ATCase and ornithine transcarbamylase (OTCase, are present in most organisms from bacteria to humans. Recently, structures of four new transcarbamylase members, N-acetyl-l-ornithine transcarbamylase (AOTCase, N-succinyl-l-ornithine transcarbamylase (SOTCase, ygeW encoded transcarbamylase (YTCase and putrescine transcarbamylase (PTCase have also been determined. Crystal structures of these enzymes have shown that they have a common overall fold with a trimer as their basic biological unit. The monomer structures share a common CP binding site in their N-terminal domain, but have different second substrate binding sites in their C-terminal domain. The discovery of three new transcarbamylases, l-2,3-diaminopropionate transcarbamylase (DPTCase, l-2,4-diaminobutyrate transcarbamylase (DBTCase and ureidoglycine transcarbamylase (UGTCase, demonstrates that our knowledge and understanding of the spectrum of the transcarbamylase family is still incomplete. In this review, we summarize studies on the structures and function of transcarbamylases demonstrating how structural information helps to define biological function and how small structural differences govern enzyme specificity. Such information is important for correctly annotating transcarbamylase sequences in the genome databases and for identifying new members of the transcarbamylase family.

  19. Structure of the acidianus filamentous virus 3 and comparative genomics of related archaeal lipothrixviruses

    DEFF Research Database (Denmark)

    Vestergaard, Gisle Alberg; Aramayo, Ricardo; Basta, Tamara

    2008-01-01

    Four novel filamentous viruses with double-stranded DNA genomes, namely, Acidianus filamentous virus 3 (AFV3), AFV6, AFV7, and AFV8, have been characterized from the hyperthermophilic archaeal genus Acidianus, and they are assigned to the Betalipothrixvirus genus of the family Lipothrixviridae....... The structures of the approximately 2-mum-long virions are similar, and one of them, AFV3, was studied in detail. It consists of a cylindrical envelope containing globular subunits arranged in a helical formation that is unique for any known double-stranded DNA virus. The envelope is 3.1 nm thick and encases...... structural proteins; (iii) multiple overlapping open reading frames, which may be indicative of gene recoding; (iv) putative 12-bp genetic elements; and (v) partial gene sequences corresponding closely to spacer sequences of chromosomal repeat clusters....

  20. Selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis.

    Science.gov (United States)

    Smola, Matthew J; Rice, Greggory M; Busan, Steven; Siegfried, Nathan A; Weeks, Kevin M

    2015-11-01

    Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistries exploit small electrophilic reagents that react with 2'-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as can be done for simple model RNAs. This protocol describes the experimental steps, implemented over 3 d, that are required to perform SHAPE probing and to construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots and provides useful troubleshooting information. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures and visualize probable and alternative helices, often in under 1 d. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles and entire transcriptomes.

  1. High-throughput crystal-optimization strategies in the South Paris Yeast Structural Genomics Project: one size fits all?

    Science.gov (United States)

    Leulliot, Nicolas; Trésaugues, Lionel; Bremang, Michael; Sorel, Isabelle; Ulryck, Nathalie; Graille, Marc; Aboulfath, Ilham; Poupon, Anne; Liger, Dominique; Quevillon-Cheruel, Sophie; Janin, Joël; van Tilbeurgh, Herman

    2005-06-01

    Crystallization has long been regarded as one of the major bottlenecks in high-throughput structural determination by X-ray crystallography. Structural genomics projects have addressed this issue by using robots to set up automated crystal screens using nanodrop technology. This has moved the bottleneck from obtaining the first crystal hit to obtaining diffraction-quality crystals, as crystal optimization is a notoriously slow process that is difficult to automatize. This article describes the high-throughput optimization strategies used in the Yeast Structural Genomics project, with selected successful examples.

  2. SCHEMA computational design of virus capsid chimeras: calibrating how genome packaging, protection, and transduction correlate with calculated structural disruption.

    Science.gov (United States)

    Ho, Michelle L; Adler, Benjamin A; Torre, Michael L; Silberg, Jonathan J; Suh, Junghae

    2013-12-20

    Adeno-associated virus (AAV) recombination can result in chimeric capsid protein subunits whose ability to assemble into an oligomeric capsid, package a genome, and transduce cells depends on the inheritance of sequence from different AAV parents. To develop quantitative design principles for guiding site-directed recombination of AAV capsids, we have examined how capsid structural perturbations predicted by the SCHEMA algorithm correlate with experimental measurements of disruption in seventeen chimeric capsid proteins. In our small chimera population, created by recombining AAV serotypes 2 and 4, we found that protection of viral genomes and cellular transduction were inversely related to calculated disruption of the capsid structure. Interestingly, however, we did not observe a correlation between genome packaging and calculated structural disruption; a majority of the chimeric capsid proteins formed at least partially assembled capsids and more than half packaged genomes, including those with the highest SCHEMA disruption. These results suggest that the sequence space accessed by recombination of divergent AAV serotypes is rich in capsid chimeras that assemble into 60-mer capsids and package viral genomes. Overall, the SCHEMA algorithm may be useful for delineating quantitative design principles to guide the creation of libraries enriched in genome-protecting virus nanoparticles that can effectively transduce cells. Such improvements to the virus design process may help advance not only gene therapy applications but also other bionanotechnologies dependent upon the development of viruses with new sequences and functions.

  3. DNA is structured as a linear "jigsaw puzzle" in the genomes of Arabidopsis, rice, and budding yeast.

    Science.gov (United States)

    Liu, Yun-Hua; Zhang, Meiping; Wu, Chengcang; Huang, James J; Zhang, Hong-Bin

    2014-01-01

    Knowledge of how a genome is structured and organized from its constituent elements is crucial to understanding its biology and evolution. Here, we report the genome structuring and organization pattern as revealed by systems analysis of the sequences of three model species, Arabidopsis, rice and yeast, at the whole-genome and chromosome levels. We found that all fundamental function elements (FFE) constituting the genomes, including genes (GEN), DNA transposable elements (DTE), retrotransposable elements (RTE), simple sequence repeats (SSR), and (or) low complexity repeats (LCR), are structured in a nonrandom and correlative manner, thus leading to a hypothesis that the DNA of the species is structured as a linear "jigsaw puzzle". Furthermore, we showed that different FFE differ in their importance in the formation and evolution of the DNA jigsaw puzzle structure between species. DTE and RTE play more important roles than GEN, LCR, and SSR in Arabidopsis, whereas GEN and RTE play more important roles than LCR, SSR, and DTE in rice. The genes having multiple recognized functions play more important roles than those having single functions. These results provide useful knowledge necessary for better understanding genome biology and evolution of the species and for effective molecular breeding of rice.

  4. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    Science.gov (United States)

    Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

    2014-07-04

    Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. Although large portion of sequence context was

  5. Accurate evaluation of subband structure in a carrier accumulation layer at an n-type InAs surface: LDF calculation combined with high-resolution photoelectron spectroscopy

    Directory of Open Access Journals (Sweden)

    Takeshi Inaoka

    2012-12-01

    Full Text Available Adsorption on an n-type InAs surface often induces a gradual formation of a carrier-accumulation layer at the surface. By means of high-resolution photoelectron spectroscopy (PES, Betti et al. made a systematic observation of subbands in the accumulation layer in the formation process. Incorporating a highly nonparabolic (NP dispersion of the conduction band into the local-density-functional (LDF formalism, we examine the subband structure in the accumulation-layer formation process. Combining the LDF calculation with the PES experiment, we make an accurate evaluation of the accumulated-carrier density, the subband-edge energies, and the subband energy dispersion at each formation stage. Our theoretical calculation can reproduce the three observed subbands quantitatively. The subband dispersion, which deviates downward from that of the projected bulk conduction band with an increase in wave number, becomes significantly weaker in the formation process. Accurate evaluation of the NP subband dispersion at each formation stage is indispensable in making a quantitative analysis of collective electronic excitations and transport properties in the subbands.

  6. Genome characterization and population genetic structure of the zoonotic pathogen, Streptococcus canis

    Directory of Open Access Journals (Sweden)

    Richards Vincent P

    2012-12-01

    Full Text Available Abstract Background Streptococcus canis is an important opportunistic pathogen of dogs and cats that can also infect a wide range of additional mammals including cows where it can cause mastitis. It is also an emerging human pathogen. Results Here we provide characterization of the first genome sequence for this species, strain FSL S3-227 (milk isolate from a cow with an intra-mammary infection. A diverse array of putative virulence factors was encoded by the S. canis FSL S3-227 genome. Approximately 75% of these gene sequences were homologous to known Streptococcal virulence factors involved in invasion, evasion, and colonization. Present in the genome are multiple potentially mobile genetic elements (MGEs [plasmid, phage, integrative conjugative element (ICE] and comparison to other species provided convincing evidence for lateral gene transfer (LGT between S. canis and two additional bovine mastitis causing pathogens (Streptococcus agalactiae, and Streptococcus dysgalactiae subsp. dysgalactiae, with this transfer possibly contributing to host adaptation. Population structure among isolates obtained from Europe and USA [bovine = 56, canine = 26, and feline = 1] was explored. Ribotyping of all isolates and multi locus sequence typing (MLST of a subset of the isolates (n = 45 detected significant differentiation between bovine and canine isolates (Fisher exact test: P = 0.0000 [ribotypes], P = 0.0030 [sequence types], suggesting possible host adaptation of some genotypes. Concurrently, the ancestral clonal complex (54% of isolates occurred in many tissue types, all hosts, and all geographic locations suggesting the possibility of a wide and diverse niche. Conclusion This study provides evidence highlighting the importance of LGT in the evolution of the bacteria S. canis, specifically, its possible role in host adaptation and acquisition of virulence factors. Furthermore, recent LGT detected between S. canis and human

  7. Genome characterization and population genetic structure of the zoonotic pathogen, Streptococcus canis.

    Science.gov (United States)

    Richards, Vincent P; Zadoks, Ruth N; Pavinski Bitar, Paulina D; Lefébure, Tristan; Lang, Ping; Werner, Brenda; Tikofsky, Linda; Moroni, Paolo; Stanhope, Michael J

    2012-12-18

    Streptococcus canis is an important opportunistic pathogen of dogs and cats that can also infect a wide range of additional mammals including cows where it can cause mastitis. It is also an emerging human pathogen. Here we provide characterization of the first genome sequence for this species, strain FSL S3-227 (milk isolate from a cow with an intra-mammary infection). A diverse array of putative virulence factors was encoded by the S. canis FSL S3-227 genome. Approximately 75% of these gene sequences were homologous to known Streptococcal virulence factors involved in invasion, evasion, and colonization. Present in the genome are multiple potentially mobile genetic elements (MGEs) [plasmid, phage, integrative conjugative element (ICE)] and comparison to other species provided convincing evidence for lateral gene transfer (LGT) between S. canis and two additional bovine mastitis causing pathogens (Streptococcus agalactiae, and Streptococcus dysgalactiae subsp. dysgalactiae), with this transfer possibly contributing to host adaptation. Population structure among isolates obtained from Europe and USA [bovine = 56, canine = 26, and feline = 1] was explored. Ribotyping of all isolates and multi locus sequence typing (MLST) of a subset of the isolates (n = 45) detected significant differentiation between bovine and canine isolates (Fisher exact test: P = 0.0000 [ribotypes], P = 0.0030 [sequence types]), suggesting possible host adaptation of some genotypes. Concurrently, the ancestral clonal complex (54% of isolates) occurred in many tissue types, all hosts, and all geographic locations suggesting the possibility of a wide and diverse niche. This study provides evidence highlighting the importance of LGT in the evolution of the bacteria S. canis, specifically, its possible role in host adaptation and acquisition of virulence factors. Furthermore, recent LGT detected between S. canis and human bacteria (Streptococcus urinalis) is cause for concern

  8. Variation in the OC locus of Acinetobacter baumannii genomes predicts extensive structural diversity in the lipooligosaccharide.

    Directory of Open Access Journals (Sweden)

    Johanna J Kenyon

    Full Text Available Lipooligosaccharide (LOS is a complex surface structure that is linked to many pathogenic properties of Acinetobacter baumannii. In A. baumannii, the genes responsible for the synthesis of the outer core (OC component of the LOS are located between ilvE and aspS. The content of the OC locus is usually variable within a species, and examination of 6 complete and 227 draft A. baumannii genome sequences available in GenBank non-redundant and Whole Genome Shotgun databases revealed nine distinct new types, OCL4-OCL12, in addition to the three known ones. The twelve gene clusters fell into two distinct groups, designated Group A and Group B, based on similarities in the genes present. OCL6 (Group B was unique in that it included genes for the synthesis of L-Rhamnosep. Genetic exchange of the different configurations between strains has occurred as some OC forms were found in several different sequence types (STs. OCL1 (Group A was the most widely distributed being present in 18 STs, and OCL6 was found in 16 STs. Variation within clones was also observed, with more than one OC locus type found in the two globally disseminated clones, GC1 and GC2, that include the majority of multiply antibiotic resistant isolates. OCL1 was the most abundant gene cluster in both GC1 and GC2 genomes but GC1 isolates also carried OCL2, OCL3 or OCL5, and OCL3 was also present in GC2. As replacement of the OC locus in the major global clones indicates the presence of sub-lineages, a PCR typing scheme was developed to rapidly distinguish Group A and Group B types, and to distinguish the specific forms found in GC1 and GC2 isolates.

  9. Population genomic structure and adaptation in the zoonotic malaria parasite Plasmodium knowlesi

    KAUST Repository

    Assefa, Samuel

    2015-10-06

    Malaria cases caused by the zoonotic parasite Plasmodium knowlesi are being increasingly reported throughout Southeast Asia and in travelers returning from the region. To test for evidence of signatures of selection or unusual population structure in this parasite, we surveyed genome sequence diversity in 48 clinical isolates recently sampled from Malaysian Borneo and in five lines maintained in laboratory rhesus macaques after isolation in the 1960s from Peninsular Malaysia and the Philippines. Overall genomewide nucleotide diversity (π = 6.03 × 10) was much higher than has been seen in worldwide samples of either of the major endemic malaria parasite species Plasmodium falciparum and Plasmodium vivax. A remarkable substructure is revealed within P. knowlesi, consisting of two major sympatric clusters of the clinical isolates and a third cluster comprising the laboratory isolates. There was deep differentiation between the two clusters of clinical isolates [mean genomewide fixation index (F) = 0.21, with 9,293 SNPs having fixed differences of F = 1.0]. This differentiation showed marked heterogeneity across the genome, with mean F values of different chromosomes ranging from 0.08 to 0.34 and with further significant variation across regions within several chromosomes. Analysis of the largest cluster (cluster 1, 38 isolates) indicated long-term population growth, with negatively skewed allele frequency distributions (genomewide average Tajima\\'s D = -1.35). Against this background there was evidence of balancing selection on particular genes, including the circumsporozoite protein (csp) gene, which had the top Tajima\\'s D value (1.57), and scans of haplotype homozygosity implicate several genomic regions as being under recent positive selection.

  10. Comparative genome analyses reveal distinct structure in the saltwater crocodile MHC.

    Directory of Open Access Journals (Sweden)

    Weerachai Jaratlerdsiri

    Full Text Available The major histocompatibility complex (MHC is a dynamic genome region with an essential role in the adaptive immunity of vertebrates, especially antigen presentation. The MHC is generally divided into subregions (classes I, II and III containing genes of similar function across species, but with different gene number and organisation. Crocodylia (crocodilians are widely distributed and represent an evolutionary distinct group among higher vertebrates, but the genomic organisation of MHC within this lineage has been largely unexplored. Here, we studied the MHC region of the saltwater crocodile (Crocodylus porosus and compared it with that of other taxa. We characterised genomic clusters encompassing MHC class I and class II genes in the saltwater crocodile based on sequencing of bacterial artificial chromosomes. Six gene clusters spanning ∼452 kb were identified to contain nine MHC class I genes, six MHC class II genes, three TAP genes, and a TRIM gene. These MHC class I and class II genes were in separate scaffold regions and were greater in length (2-6 times longer than their counterparts in well-studied fowl B loci, suggesting that the compaction of avian MHC occurred after the crocodilian-avian split. Comparative analyses between the saltwater crocodile MHC and that from the alligator and gharial showed large syntenic areas (>80% identity with similar gene order. Comparisons with other vertebrates showed that the saltwater crocodile had MHC class I genes located along with TAP, consistent with birds studied. Linkage between MHC class I and TRIM39 observed in the saltwater crocodile resembled MHC in eutherians compared, but absent in avian MHC, suggesting that the saltwater crocodile MHC appears to have gene organisation intermediate between these two lineages. These observations suggest that the structure of the saltwater crocodile MHC, and other crocodilians, can help determine the MHC that was present in the ancestors of archosaurs.

  11. Population genomic structure and adaptation in the zoonotic malaria parasite Plasmodium knowlesi

    KAUST Repository

    Assefa, Samuel; Lim, Caeul; Preston, Mark D.; Duffy, Craig W.; Nair, Mridul; Adroub, Sabir; Kadir, Khamisah A.; Goldberg, Jonathan M.; Neafsey, Daniel E.; Divis, Paul; Clark, Taane G.; Duraisingh, Manoj T.; Conway, David J.; Pain, Arnab; Singh, Balbir

    2015-01-01

    Malaria cases caused by the zoonotic parasite Plasmodium knowlesi are being increasingly reported throughout Southeast Asia and in travelers returning from the region. To test for evidence of signatures of selection or unusual population structure in this parasite, we surveyed genome sequence diversity in 48 clinical isolates recently sampled from Malaysian Borneo and in five lines maintained in laboratory rhesus macaques after isolation in the 1960s from Peninsular Malaysia and the Philippines. Overall genomewide nucleotide diversity (π = 6.03 × 10) was much higher than has been seen in worldwide samples of either of the major endemic malaria parasite species Plasmodium falciparum and Plasmodium vivax. A remarkable substructure is revealed within P. knowlesi, consisting of two major sympatric clusters of the clinical isolates and a third cluster comprising the laboratory isolates. There was deep differentiation between the two clusters of clinical isolates [mean genomewide fixation index (F) = 0.21, with 9,293 SNPs having fixed differences of F = 1.0]. This differentiation showed marked heterogeneity across the genome, with mean F values of different chromosomes ranging from 0.08 to 0.34 and with further significant variation across regions within several chromosomes. Analysis of the largest cluster (cluster 1, 38 isolates) indicated long-term population growth, with negatively skewed allele frequency distributions (genomewide average Tajima's D = -1.35). Against this background there was evidence of balancing selection on particular genes, including the circumsporozoite protein (csp) gene, which had the top Tajima's D value (1.57), and scans of haplotype homozygosity implicate several genomic regions as being under recent positive selection.

  12. Genome characterization and population genetic structure of the zoonotic pathogen, Streptococcus canis

    Science.gov (United States)

    2012-01-01

    Background Streptococcus canis is an important opportunistic pathogen of dogs and cats that can also infect a wide range of additional mammals including cows where it can cause mastitis. It is also an emerging human pathogen. Results Here we provide characterization of the first genome sequence for this species, strain FSL S3-227 (milk isolate from a cow with an intra-mammary infection). A diverse array of putative virulence factors was encoded by the S. canis FSL S3-227 genome. Approximately 75% of these gene sequences were homologous to known Streptococcal virulence factors involved in invasion, evasion, and colonization. Present in the genome are multiple potentially mobile genetic elements (MGEs) [plasmid, phage, integrative conjugative element (ICE)] and comparison to other species provided convincing evidence for lateral gene transfer (LGT) between S. canis and two additional bovine mastitis causing pathogens (Streptococcus agalactiae, and Streptococcus dysgalactiae subsp. dysgalactiae), with this transfer possibly contributing to host adaptation. Population structure among isolates obtained from Europe and USA [bovine = 56, canine = 26, and feline = 1] was explored. Ribotyping of all isolates and multi locus sequence typing (MLST) of a subset of the isolates (n = 45) detected significant differentiation between bovine and canine isolates (Fisher exact test: P = 0.0000 [ribotypes], P = 0.0030 [sequence types]), suggesting possible host adaptation of some genotypes. Concurrently, the ancestral clonal complex (54% of isolates) occurred in many tissue types, all hosts, and all geographic locations suggesting the possibility of a wide and diverse niche. Conclusion This study provides evidence highlighting the importance of LGT in the evolution of the bacteria S. canis, specifically, its possible role in host adaptation and acquisition of virulence factors. Furthermore, recent LGT detected between S. canis and human bacteria (Streptococcus

  13. THE HYPERFINE STRUCTURE OF THE ROTATIONAL SPECTRUM OF HDO AND ITS EXTENSION TO THE THz REGION: ACCURATE REST FREQUENCIES AND SPECTROSCOPIC PARAMETERS FOR ASTROPHYSICAL OBSERVATIONS

    Energy Technology Data Exchange (ETDEWEB)

    Cazzoli, Gabriele; Lattanzi, Valerio; Puzzarini, Cristina [Dipartimento di Chimica “Giacomo Ciamician”, Università di Bologna, Via Selmi 2, I-40126 Bologna (Italy); Alonso, José Luis [Grupo de Espectroscopía Molecular (GEM), Unidad Asociada CSIC, Edificio Quifima, Laboratorios de Espectroscopia y Bioespectroscopia, Parque Científico UVa, Universidad de Valladolid, E-47005 Valladolid (Spain); Gauss, Jürgen, E-mail: cristina.puzzarini@unibo.it [Institut für Physikalische Chemie, Universität Mainz, D-55099 Mainz (Germany)

    2015-06-10

    The rotational spectrum of the mono-deuterated isotopologue of water, HD{sup 16}O, has been investigated in the millimeter- and submillimeter-wave frequency regions, up to 1.6 THz. The Lamb-dip technique has been exploited to obtain sub-Doppler resolution and to resolve the hyperfine (hf) structure due to the deuterium and hydrogen nuclei, thus enabling the accurate determination of the corresponding hf parameters. Their experimental determination has been supported by high-level quantum-chemical calculations. The Lamb-dip measurements have been supplemented by Doppler-limited measurements (weak high-J and high-frequency transitions) in order to extend the predictive capability of the available spectroscopic constants. The possibility of resolving hf splittings in astronomical spectra has been discussed.

  14. Structure modeling of all identified G protein-coupled receptors in the human genome.

    Science.gov (United States)

    Zhang, Yang; Devries, Mark E; Skolnick, Jeffrey

    2006-02-01

    G protein-coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global C(alpha) root-mean-squared deviation from native of 4.6 angstroms, with a root-mean-squared deviation in the transmembrane helix region of 2.1 angstroms. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. These results demonstrate the usefulness and robustness

  15. Structure modeling of all identified G protein-coupled receptors in the human genome.

    Directory of Open Access Journals (Sweden)

    Yang Zhang

    2006-02-01

    Full Text Available G protein-coupled receptors (GPCRs, encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signal into a cellular response. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global C(alpha root-mean-squared deviation from native of 4.6 angstroms, with a root-mean-squared deviation in the transmembrane helix region of 2.1 angstroms. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse. These results demonstrate the usefulness

  16. Population genomic analysis of ancient and modern genomes yields new insights into the genetic ancestry of the Tyrolean Iceman and the genetic structure of Europe.

    Directory of Open Access Journals (Sweden)

    Martin Sikora

    2014-05-01

    Full Text Available Genome sequencing of the 5,300-year-old mummy of the Tyrolean Iceman, found in 1991 on a glacier near the border of Italy and Austria, has yielded new insights into his origin and relationship to modern European populations. A key finding of that study was an apparent recent common ancestry with individuals from Sardinia, based largely on the Y chromosome haplogroup and common autosomal SNP variation. Here, we compiled and analyzed genomic datasets from both modern and ancient Europeans, including genome sequence data from over 400 Sardinians and two ancient Thracians from Bulgaria, to investigate this result in greater detail and determine its implications for the genetic structure of Neolithic Europe. Using whole-genome sequencing data, we confirm that the Iceman is, indeed, most closely related to Sardinians. Furthermore, we show that this relationship extends to other individuals from cultural contexts associated with the spread of agriculture during the Neolithic transition, in contrast to individuals from a hunter-gatherer context. We hypothesize that this genetic affinity of ancient samples from different parts of Europe with Sardinians represents a common genetic component that was geographically widespread across Europe during the Neolithic, likely related to migrations and population expansions associated with the spread of agriculture.

  17. The population genomics of begomoviruses: global scale population structure and gene flow

    Directory of Open Access Journals (Sweden)

    Prasanna HC

    2010-09-01

    Full Text Available Abstract Background The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underly observable pattern of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets that are available is that for the dicotyledonous plant infecting genus, Begomovirus, in the Family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions where they seriously threaten the food security of the world's poorest people. Results We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can further be sub-divided into as many as thirty four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions, we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified. We believe that the global Begomovirus population structure revealed here could

  18. Determination of the structure of γ-alumina from interatomic potential and first-principles calculations: The requirement of significant numbers of nonspinel positions to achieve an accurate structural model

    International Nuclear Information System (INIS)

    Paglia, Gianluca; Rohl, Andrew L.; Gale, Julian D.; Buckley, Craig E.

    2005-01-01

    We have performed an extensive computational study of γ-Al 2 O 3 , beginning with the geometric analysis of approximately 1.47 billion spinel-based structural candidates, followed by derivative method energy minimization calculations of approximately 122 000 structures. Optimization of the spinel-based structural models demonstrated that structures exhibiting nonspinel site occupancy after simulation were more energetically favorable, as suggested in other computational studies. More importantly, none of the spinel structures exhibited simulated diffraction patterns that were characteristic of γ-Al 2 O 3 . This suggests that cations of γ-Al 2 O 3 are not exclusively held in spinel positions, that the spinel model of γ-Al 2 O 3 does not accurately reflect its structure, and that a representative structure cannot be achieved from molecular modeling when the spinel representation is used as the starting structure. The latter two of these three findings are extremely important when trying to accurately model the structure. A second set of starting models were generated with a large number of cations occupying c symmetry positions, based on the findings from recent experiments. Optimization of the new c symmetry-based structural models resulted in simulated diffraction patterns that were characteristic of γ-Al 2 O 3 . The modeling, conducted using supercells, yields a more accurate and complete determination of the defect structure of γ-Al 2 O 3 than can be achieved with current experimental techniques. The results show that on average over 40% of the cations in the structure occupy nonspinel positions, and approximately two-thirds of these occupy c symmetry positions. The structures exhibit variable occupancy in the site positions that follow local symmetry exclusion rules. This variation was predominantly represented by a migration of cations away from a symmetry positions to other tetrahedral site positions during optimization which were found not to affect the

  19. Structural and functional screening in human induced-pluripotent stem cell-derived cardiomyocytes accurately identifies cardiotoxicity of multiple drug types

    Energy Technology Data Exchange (ETDEWEB)

    Doherty, Kimberly R., E-mail: kimberly.doherty@quintiles.com; Talbert, Dominique R.; Trusk, Patricia B.; Moran, Diarmuid M.; Shell, Scott A.; Bacus, Sarah

    2015-05-15

    Safety pharmacology studies that evaluate new drug entities for potential cardiac liability remain a critical component of drug development. Current studies have shown that in vitro tests utilizing human induced pluripotent stem cell-derived cardiomyocytes (hiPS-CM) may be beneficial for preclinical risk evaluation. We recently demonstrated that an in vitro multi-parameter test panel assessing overall cardiac health and function could accurately reflect the associated clinical cardiotoxicity of 4 FDA-approved targeted oncology agents using hiPS-CM. The present studies expand upon this initial observation to assess whether this in vitro screen could detect cardiotoxicity across multiple drug classes with known clinical cardiac risks. Thus, 24 drugs were examined for their effect on both structural (viability, reactive oxygen species generation, lipid formation, troponin secretion) and functional (beating activity) endpoints in hiPS-CM. Using this screen, the cardiac-safe drugs showed no effects on any of the tests in our panel. However, 16 of 18 compounds with known clinical cardiac risk showed drug-induced changes in hiPS-CM by at least one method. Moreover, when taking into account the Cmax values, these 16 compounds could be further classified depending on whether the effects were structural, functional, or both. Overall, the most sensitive test assessed cardiac beating using the xCELLigence platform (88.9%) while the structural endpoints provided additional insight into the mechanism of cardiotoxicity for several drugs. These studies show that a multi-parameter approach examining both cardiac cell health and function in hiPS-CM provides a comprehensive and robust assessment that can aid in the determination of potential cardiac liability. - Highlights: • 24 drugs were tested for cardiac liability using an in vitro multi-parameter screen. • Changes in beating activity were the most sensitive in predicting cardiac risk. • Structural effects add in

  20. Towards accurate structural characterization of metal centres in protein crystals: the structures of Ni and Cu T6 bovine insulin derivatives

    DEFF Research Database (Denmark)

    Frankær, Christian Grundahl; Mossin, Susanne; Ståhl, Kenny

    2014-01-01

    Using synchrotron radiation (SR), the crystal structures of T6 bovine insulin complexed with Ni2+ and Cu2+ were solved to 1.50 and 1.45 Å resolution, respectively. The level of detail around the metal centres in these structures was highly limited, and the coordination of water in Cu site II...... of the copper insulin derivative was deteriorated as a consequence of radiation damage. To provide more detail, X-ray absorption spectroscopy (XAS) was used to improve the information level about metal coordination in each derivative. The nickel derivative contains hexacoordinated Ni2+ with trigonal symmetry...... by electron paramagnetic resonance (EPR). The coordination distances were refined from EXAFS with standard deviations within 0.01 Å. The insulin derivative containing Cu2+ is sensitive towards photoreduction when exposed to SR. During the reduction of Cu2+ to Cu+, the coordination geometry of copper changes...

  1. Whole-Genome Analysis of a Novel Fish Reovirus (MsReV Discloses Aquareovirus Genomic Structure Relationship with Host in Saline Environments

    Directory of Open Access Journals (Sweden)

    Zhong-Yuan Chen

    2015-08-01

    Full Text Available Aquareoviruses are serious pathogens of aquatic animals. Here, genome characterization and functional gene analysis of a novel aquareovirus, largemouth bass Micropterus salmoides reovirus (MsReV, was described. It comprises 11 dsRNA segments (S1–S11 covering 24,024 bp, and encodes 12 putative proteins including the inclusion forming-related protein NS87 and the fusion-associated small transmembrane (FAST protein NS22. The function of NS22 was confirmed by expression in fish cells. Subsequently, MsReV was compared with two representative aquareoviruses, saltwater fish turbot Scophthalmus maximus reovirus (SMReV and freshwater fish grass carp reovirus strain 109 (GCReV-109. MsReV NS87 and NS22 genes have the same structure and function with those of SMReV, whereas GCReV-109 is either missing the coiled-coil region in NS79 or the gene-encoding NS22. Significant similarities are also revealed among equivalent genome segments between MsReV and SMReV, but a difference is found between MsReV and GCReV-109. Furthermore, phylogenetic analysis showed that 13 aquareoviruses could be divided into freshwater and saline environments subgroups, and MsReV was closely related to SMReV in saline environments. Consequently, these viruses from hosts in saline environments have more genomic structural similarities than the viruses from hosts in freshwater. This is the first study of the relationships between aquareovirus genomic structure and their host environments.

  2. Towards accurate structural characterization of metal centres in protein crystals: the structures of Ni and Cu T6 bovine insulin derivatives

    International Nuclear Information System (INIS)

    Frankaer, Christian Grundahl; Mossin, Susanne; Ståhl, Kenny; Harris, Pernille

    2013-01-01

    The level of structural detail around the metal sites in Ni 2+ and Cu 2+ T 6 insulin derivatives was significantly improved by using a combination of single-crystal X-ray crystallography and X-ray absorption spectroscopy. Photoreduction and subsequent radiation damage of the Cu 2+ sites in Cu insulin was followed by XANES spectroscopy. Using synchrotron radiation (SR), the crystal structures of T 6 bovine insulin complexed with Ni 2+ and Cu 2+ were solved to 1.50 and 1.45 Å resolution, respectively. The level of detail around the metal centres in these structures was highly limited, and the coordination of water in Cu site II of the copper insulin derivative was deteriorated as a consequence of radiation damage. To provide more detail, X-ray absorption spectroscopy (XAS) was used to improve the information level about metal coordination in each derivative. The nickel derivative contains hexacoordinated Ni 2+ with trigonal symmetry, whereas the copper derivative contains tetragonally distorted hexacoordinated Cu 2+ as a result of the Jahn–Teller effect, with a significantly longer coordination distance for one of the three water molecules in the coordination sphere. That the copper centre is of type II was further confirmed by electron paramagnetic resonance (EPR). The coordination distances were refined from EXAFS with standard deviations within 0.01 Å. The insulin derivative containing Cu 2+ is sensitive towards photoreduction when exposed to SR. During the reduction of Cu 2+ to Cu + , the coordination geometry of copper changes towards lower coordination numbers. Primary damage, i.e. photoreduction, was followed directly by XANES as a function of radiation dose, while secondary damage in the form of structural changes around the Cu atoms after exposure to different radiation doses was studied by crystallography using a laboratory diffractometer. Protection against photoreduction and subsequent radiation damage was carried out by solid embedment of Cu insulin

  3. Predicting effects of structural stress in a genome-reduced model bacterial metabolism

    Science.gov (United States)

    Güell, Oriol; Sagués, Francesc; Serrano, M. Ángeles

    2012-08-01

    Mycoplasma pneumoniae is a human pathogen recently proposed as a genome-reduced model for bacterial systems biology. Here, we study the response of its metabolic network to different forms of structural stress, including removal of individual and pairs of reactions and knockout of genes and clusters of co-expressed genes. Our results reveal a network architecture as robust as that of other model bacteria regarding multiple failures, although less robust against individual reaction inactivation. Interestingly, metabolite motifs associated to reactions can predict the propagation of inactivation cascades and damage amplification effects arising in double knockouts. We also detect a significant correlation between gene essentiality and damages produced by single gene knockouts, and find that genes controlling high-damage reactions tend to be expressed independently of each other, a functional switch mechanism that, simultaneously, acts as a genetic firewall to protect metabolism. Prediction of failure propagation is crucial for metabolic engineering or disease treatment.

  4. Processive and nonprocessive cellulases for biofuel production. Lessons from bacterial genomes and structural analysis

    Energy Technology Data Exchange (ETDEWEB)

    Wilson, David B. [Cornell Univ. Ithaca, New York, NY (United States). Dept. of Molecular Biology and Genetics

    2012-01-15

    Cellulases are key enzymes used in many processes for producing liquid fuels from biomass. Currently there many efforts to reduce the cost of cellulases using both structural approaches to improve the properties of individual cellulases and genomic approaches to identify new cellulases as well as other proteins that increase the activity of cellulases in degrading pretreated biomass materials. Fungal GH-61 proteins are important new enzymes that increase the activity of current commercial cellulases leading to lower total protein loading and thus lower cost. Recent work has greatly increased our knowledge of these novel enzymes that appear to be oxido-reductases that target crystalline cellulose and increase its accessibility to cellulases. They appear to carry out the C1 activity originally proposed by Dr Reese. Cellobiose dehydrogenase appears to interact with GH-61 proteins in this function, providing a role for this puzzling enzyme. Cellulase research is making considerable progress and appears to be poised for even greater advances. (orig.)

  5. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    Francioli, Laurent C.; Menelaou, Andronild; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Settenl, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marscha, Tobias; Schonhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Ma-Tthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; Platteel, Mathieu; Swertz, Morris A.; Wijmenga, Cisca

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring

  6. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    htmlabstractWhole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  7. Genome Structural Diversity among 31 Bordetella pertussis Isolates from Two Recent U.S. Whooping Cough Statewide Epidemics.

    Science.gov (United States)

    Bowden, Katherine E; Weigand, Michael R; Peng, Yanhui; Cassiday, Pamela K; Sammons, Scott; Knipe, Kristen; Rowe, Lori A; Loparev, Vladimir; Sheth, Mili; Weening, Keeley; Tondella, M Lucia; Williams, Margaret M

    2016-01-01

    During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations. IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural

  8. Brain Genomics Superstruct Project initial data release with structural, functional, and behavioral measures.

    Science.gov (United States)

    Holmes, Avram J; Hollinshead, Marisa O; O'Keefe, Timothy M; Petrov, Victor I; Fariello, Gabriele R; Wald, Lawrence L; Fischl, Bruce; Rosen, Bruce R; Mair, Ross W; Roffman, Joshua L; Smoller, Jordan W; Buckner, Randy L

    2015-01-01

    The goal of the Brain Genomics Superstruct Project (GSP) is to enable large-scale exploration of the links between brain function, behavior, and ultimately genetic variation. To provide the broader scientific community data to probe these associations, a repository of structural and functional magnetic resonance imaging (MRI) scans linked to genetic information was constructed from a sample of healthy individuals. The initial release, detailed in the present manuscript, encompasses quality screened cross-sectional data from 1,570 participants ages 18 to 35 years who were scanned with MRI and completed demographic and health questionnaires. Personality and cognitive measures were obtained on a subset of participants. Each dataset contains a T1-weighted structural MRI scan and either one (n=1,570) or two (n=1,139) resting state functional MRI scans. Test-retest reliability datasets are included from 69 participants scanned within six months of their initial visit. For the majority of participants self-report behavioral and cognitive measures are included (n=926 and n=892 respectively). Analyses of data quality, structure, function, personality, and cognition are presented to demonstrate the dataset's utility.

  9. Distinct Mechanisms of Nuclease-Directed DNA-Structure-Induced Genetic Instability in Cancer Genomes.

    Science.gov (United States)

    Zhao, Junhua; Wang, Guliang; Del Mundo, Imee M; McKinney, Jennifer A; Lu, Xiuli; Bacolla, Albino; Boulware, Stephen B; Zhang, Changsheng; Zhang, Haihua; Ren, Pengyu; Freudenreich, Catherine H; Vasquez, Karen M

    2018-01-30

    Sequences with the capacity to adopt alternative DNA structures have been implicated in cancer etiology; however, the mechanisms are unclear. For example, H-DNA-forming sequences within oncogenes have been shown to stimulate genetic instability in mammals. Here, we report that H-DNA-forming sequences are enriched at translocation breakpoints in human cancer genomes, further implicating them in cancer etiology. H-DNA-induced mutations were suppressed in human cells deficient in the nucleotide excision repair nucleases, ERCC1-XPF and XPG, but were stimulated in cells deficient in FEN1, a replication-related endonuclease. Further, we found that these nucleases cleaved H-DNA conformations, and the interactions of modeled H-DNA with ERCC1-XPF, XPG, and FEN1 proteins were explored at the sub-molecular level. The results suggest mechanisms of genetic instability triggered by H-DNA through distinct structure-specific, cleavage-based replication-independent and replication-dependent pathways, providing critical evidence for a role of the DNA structure itself in the etiology of cancer and other human diseases. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  10. XTHs from Fragaria vesca: genomic structure and transcriptomic analysis in ripening fruit and other tissues.

    Science.gov (United States)

    Opazo, María Cecilia; Lizana, Rodrigo; Stappung, Yazmina; Davis, Thomas M; Herrera, Raúl; Moya-León, María Alejandra

    2017-11-07

    Fragaria vesca or 'woodland strawberry' has emerged as an attractive model for the study of ripening of non-climacteric fruit. It has several advantages, such as its small genome and its diploidy. The recent availability of the complete sequence of its genome opens the possibility for further analysis and its use as a reference species. Fruit softening is a physiological event and involves many biochemical changes that take place at the final stages of fruit development; among them, the remodeling of cell walls by the action of a set of enzymes. Xyloglucan endotransglycosylase/hydrolase (XTH) is a cell wall-associated enzyme, which is encoded by a multigene family. Its action modifies the structure of xyloglucans, a diverse group of polysaccharides that crosslink with cellulose microfibrills, affecting therefore the functional structure of the cell wall. The aim of this work is to identify the XTH-encoding genes present in F. vesca and to determine its transcription level in ripening fruit. The search resulted in identification of 26 XTH-encoding genes named as FvXTHs. Genetic structure and phylogenetic analyses were performed allowing the classification of FvXTH genes into three phylogenetic groups: 17 in group I/II, 2 in group IIIA and 4 in group IIIB. Two sequences were included into the ancestral group. Through a comparative analysis, characteristic structural protein domains were found in FvXTH protein sequences. In complement, expression analyses of FvXTHs by qPCR were performed in fruit at different developmental and ripening stages, as well as, in other tissues. The results showed a diverse expression pattern of FvXTHs in several tissues, although most of them are highly expressed in roots. Their expression patterns are not related to their respective phylogenetic groups. In addition, most FvXTHs are expressed in ripe fruit, and interestingly, some of them (FvXTH 18 and 20, belonging to phylogenic group I/II, and FvXTH 25 and 26 to group IIIB) display an

  11. Accurate quantum chemical calculations

    Science.gov (United States)

    Bauschlicher, Charles W., Jr.; Langhoff, Stephen R.; Taylor, Peter R.

    1989-01-01

    An important goal of quantum chemical calculations is to provide an understanding of chemical bonding and molecular electronic structure. A second goal, the prediction of energy differences to chemical accuracy, has been much harder to attain. First, the computational resources required to achieve such accuracy are very large, and second, it is not straightforward to demonstrate that an apparently accurate result, in terms of agreement with experiment, does not result from a cancellation of errors. Recent advances in electronic structure methodology, coupled with the power of vector supercomputers, have made it possible to solve a number of electronic structure problems exactly using the full configuration interaction (FCI) method within a subspace of the complete Hilbert space. These exact results can be used to benchmark approximate techniques that are applicable to a wider range of chemical and physical problems. The methodology of many-electron quantum chemistry is reviewed. Methods are considered in detail for performing FCI calculations. The application of FCI methods to several three-electron problems in molecular physics are discussed. A number of benchmark applications of FCI wave functions are described. Atomic basis sets and the development of improved methods for handling very large basis sets are discussed: these are then applied to a number of chemical and spectroscopic problems; to transition metals; and to problems involving potential energy surfaces. Although the experiences described give considerable grounds for optimism about the general ability to perform accurate calculations, there are several problems that have proved less tractable, at least with current computer resources, and these and possible solutions are discussed.

  12. Comparisons of Copy Number, Genomic Structure, and Conserved Motifs for α-Amylase Genes from Barley, Rice, and Wheat

    Directory of Open Access Journals (Sweden)

    Qisen Zhang

    2017-10-01

    Full Text Available Barley is an important crop for the production of malt and beer. However, crops such as rice and wheat are rarely used for malting. α-amylase is the key enzyme that degrades starch during malting. In this study, we compared the genomic properties, gene copies, and conserved promoter motifs of α-amylase genes in barley, rice, and wheat. In all three crops, α-amylase consists of four subfamilies designated amy1, amy2, amy3, and amy4. In wheat and barley, members of amy1 and amy2 genes are localized on chromosomes 6 and 7, respectively. In rice, members of amy1 genes are found on chromosomes 1 and 2, and amy2 genes on chromosome 6. The barley genome has six amy1 members and three amy2 members. The wheat B genome contains four amy1 members and three amy2 members, while the rice genome has three amy1 members and one amy2 member. The B genome has mostly amy1 and amy2 members among the three wheat genomes. Amy1 promoters from all three crop genomes contain a GA-responsive complex consisting of a GA-responsive element (CAATAAA, pyrimidine box (CCTTTT and TATCCAT/C box. This study has shown that amy1 and amy2 from both wheat and barley have similar genomic properties, including exon/intron structures and GA-responsive elements on promoters, but these differ in rice. Like barley, wheat should have sufficient amy activity to degrade starch completely during malting. Other factors, such as high protein with haze issues and the lack of husk causing Lauting difficulty, may limit the use of wheat for brewing.

  13. Structural genomic abnormalities in autism and schizophrenia. With a focus on the 22q11.2 deletion syndrome

    NARCIS (Netherlands)

    Vorstman, J.A.S.

    2008-01-01

    The research presented in this thesis is centered around one question: What can we learn from the study of psychiatric phenotypes related to structural genomic abnormalities? In this thesis this subject is examined, with most studies focused on the clinical and genetic aspects of the 22q11.2

  14. Insight into structure and assembly of the nuclear pore complex by utilizing the genome of a eukaryotic thermophile

    DEFF Research Database (Denmark)

    Amlacher, Stefan; Sarges, Phillip; Flemming, Dirk

    2011-01-01

    is composed of two large Nups, Nup192 and Nup170, which are flexibly bridged by short linear motifs made up of linker Nups, Nic96 and Nup53. This assembly illustrates how Nup interactions can generate structural plasticity within the NPC scaffold. Our findings therefore demonstrate the utility of the genome...

  15. Genome-Wide Mapping of Structural Variations Reveals a Copy Number Variant That Determines Reproductive Morphology in Cucumber

    NARCIS (Netherlands)

    Zhang, Z.; Mao, L.; Chen, Junshi; Bu, F.; Li, G.; Sun, J.; Li, S.; Sun, H.; Jiao, C.; Blakely, R.; Pan, J.; Cai, R.; Luo, R.; Peer, Van de Y.; Jacobsen, E.; Fei, Z.; Huang, S.

    2015-01-01

    Structural variations (SVs) represent a major source of genetic diversity. However, the functional impact and formation mechanisms of SVs in plant genomes remain largely unexplored. Here, we report a nucleotide-resolution SV map of cucumber (Cucumis sativas) that comprises 26,788 SVs based on deep

  16. Evidence of pervasive biologically functional secondary structures within the genomes of eukaryotic single-stranded DNA viruses.

    Science.gov (United States)

    Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y F; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie; Martin, Darren Patrick

    2014-02-01

    Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.

  17. Detection of G-Quadruplex Structures Formed by G-Rich Sequences from Rice Genome and Transcriptome Using Combined Probes.

    Science.gov (United States)

    Chang, Tianjun; Li, Weiguo; Ding, Zhan; Cheng, Shaofei; Liang, Kun; Liu, Xiangjun; Bing, Tao; Shangguan, Dihua

    2017-08-01

    Putative G-quadruplex (G4) forming sequences (PQS) are highly prevalent in the genome and transcriptome of various organisms and are considered as potential regulation elements in many biological processes by forming G4 structures. The formation of G4 structures highly depends on the sequences and the environment. In most cases, it is difficult to predict G4 formation by PQS, especially PQS containing G2 tracts. Therefore, the experimental identification of G4 formation is essential in the study of G4-related biological functions. Herein, we report a rapid and simple method for the detection of G4 structures by using a pair of complementary reporters, hemin and BMSP. This method was applied to detect G4 structures formed by PQS (DNA and RNA) searched in the genome and transcriptome of Oryza sativa. Unlike most of the reported G4 probes that only recognize part of G4 structures, the proposed method based on combined probes positively responded to almost all G4 conformations, including parallel, antiparallel, and mixed/hybrid G4, but did not respond to non-G4 sequences. This method shows potential for high-throughput identification of G4 structures in genome and transcriptome. Furthermore, BMSP was observed to drive some PQS to form more stable G4 structures or induce the G4 formation of some PQS that cannot form G4 in normal physiological conditions, which may provide a powerful molecular tool for gene regulation.

  18. Genomic structure, expression and association study of the porcine FSD2.

    Science.gov (United States)

    Lim, Kyu-Sang; Lee, Kyung-Tai; Lee, Si-Woo; Chai, Han-Ha; Jang, Gulwon; Hong, Ki-Chang; Kim, Tae-Hun

    2016-09-01

    The fibronectin type III and SPRY domain containing 2 (FSD2) on porcine chromosome 7 is considered a candidate gene for pork quality, since its two domains, which were present in fibronectin and ryanodine receptor. The fibronectin type III and SPRY domains were first identified in fibronectin and ryanodine receptor, respectively, which are candidate genes for meat quality. The aim of this study was to elucidate the genomic structure of FSD2 and functions of single nucleotide polymorphisms (SNPs) within FSD2 that are related to meat quality in pigs. Using a bacterial artificial chromosome clone sequence, we revealed that porcine FSD2 consisted of 13 exons encoding 750 amino acids. In addition, FSD2 was expressed in heart, longissimus dorsi muscle, psoas muscle, and tendon among 23 kinds of porcine tissues tested. A total of ten SNPs, including four missense mutations, were identified in the exonic region of FSD2, and two major haplotypes were obtained based on the SNP genotypes of 633 Berkshire pigs. Both haplotypes were associated significantly with intramuscular fat content (IMF, P meat color, affecting yellowness (P = 0.002). These haplotype effects were further supported by the alteration of putative protein structures with amino acid substitutions. Taken together, our results suggest that FSD2 haplotypes are involved in regulating meat quality including IMF, MP, and meat color in pigs, and may be used as meaningful molecular makers to identify pigs with preferable pork quality.

  19. The first two mitochondrial genomes from Taeniopterygidae (Insecta: Plecoptera): Structural features and phylogenetic implications.

    Science.gov (United States)

    Chen, Zhi-Teng; Du, Yu-Zhou

    2018-05-01

    The complete mitochondrial genomes (mitogenomes) of Taeniopteryx ugola and Doddsia occidentalis (Plecoptera: Taeniopterygidae) were firstly sequenced from the family Taeniopterygidae. The 15,353-bp long mitogenome of T. ugola and the 16,020-bp long mitogenome of D. occidentalis each contained 37 genes including 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), two ribosomal RNA genes (rRNAs) and a control region (CR). The mitochondrial gene arrangement of the two taeniopterygids and other stoneflies was identical with the putative ancestral mitogenome of Drosophila yakuba. Most PCGs used standard ATN start codons and TAN termination codons. Twenty-one of the 22 tRNAs in each mitogenome could fold into the cloverleaf secondary structures, while the dihydrouridine (DHU) arm of trnSer (AGN) was reduced or absent. Stem-loop (SL) structures, poly-T stretch, poly-[AT] n stretch and tandem repeats were found in the CRs of the two mitogenomes. The phylogenetic analyses using Bayesian inference (BI) and maximum likelihood methods (ML) generated identical results, both supporting the monophyly of all stonefly families and the two infraorders, Systellognatha and Euholognatha. Taeniopterygidae was grouped with another two families from Euholognatha. The relationships within Plecoptera were recovered as (((Perlidae+Peltoperlidae)+((Pteronarcyidae+Chloroperlidae)+Styloperlidae))+((Capniidae+Taeniopterygidae)+Nemouridae))+Gripopterygidae. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Functional and Structural Overview of G-Protein-Coupled Receptors Comprehensively Obtained from Genome Sequences

    Directory of Open Access Journals (Sweden)

    Makiko Suwa

    2011-04-01

    Full Text Available An understanding of the functional mechanisms of G-protein-coupled receptors (GPCRs is very important for GPCR-related drug design. We have developed an integrated GPCR database (SEVENS http://sevens.cbrc.jp/ that includes 64,090 reliable GPCR genes comprehensively identified from 56 eukaryote genome sequences, and overviewed the sequences and structure spaces of the GPCRs. In vertebrates, the number of receptors for biological amines, peptides, etc. is conserved in most species, whereas the number of chemosensory receptors for odorant, pheromone, etc. significantly differs among species. The latter receptors tend to be single exon type or a few exon type and show a high ratio in the numbers of GPCRs, whereas some families, such as Class B and Class C receptors, have long lengths due to the presence of many exons. Statistical analyses of amino acid residues reveal that most of the conserved residues in Class A GPCRs are found in the cytoplasmic half regions of transmembrane (TM helices, while residues characteristic to each subfamily found on the extracellular half regions. The 69 of Protein Data Bank (PDB entries of complete or fragmentary structures could be mapped on the TM/loop regions of Class A GPCRs covering 14 subfamilies.

  1. Hierarchical role for transcription factors and chromatin structure in genome organization along adipogenesis

    DEFF Research Database (Denmark)

    Sarusi Portuguez, Avital; Schwartz, Michal; Siersbaek, Rasmus

    2017-01-01

    The three dimensional folding of mammalian genomes is cell type specific and difficult to alter suggesting that it is an important component of gene regulation. However, given the multitude of chromatin-associating factors, the mechanisms driving the colocalization of active chromosomal domains...... by PPARγ and Lpin1, undergoes orchestrated reorganization during adipogenesis. Coupling the dynamics of genome architecture with multiple chromatin datasets indicated that among all the transcription factors (TFs) tested, RXR is central to genome reorganization at the beginning of adipogenesis...

  2. Simultaneous gene finding in multiple genomes.

    Science.gov (United States)

    König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario

    2016-11-15

    As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or-if not-where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C ++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/ CONTACT: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.deSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae: structural comparative analysis, gene content and microsatellite detection

    Directory of Open Access Journals (Sweden)

    Andrew W. Gichira

    2017-01-01

    Full Text Available Hagenia is an endangered monotypic genus endemic to the topical mountains of Africa. The only species, Hagenia abyssinica (Bruce J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp, with a pair of Inverted Repeats (IR 25,971 bp each, separated by two single copies; a large (LSC, 84,320 bp and a small single copy (SSC, 18,696. H. abyssinica’s chloroplast genome has a 37.1% GC content and encodes 112 unique genes, 78 of which code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to-date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA which had been previously reported in other chloroplast genomes was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analyses of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  4. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection.

    Science.gov (United States)

    Gichira, Andrew W; Li, Zhizhong; Saina, Josphat K; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W; Wang, Qingfeng; Chen, Jinming

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the topical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) 25,971 bp each, separated by two single copies; a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696). H. abyssinica 's chloroplast genome has a 37.1% GC content and encodes 112 unique genes, 78 of which code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to-date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene ( infA ) which had been previously reported in other chloroplast genomes was conspicuously missing in H. abyssinica . A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analyses of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.

  5. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome.

    Science.gov (United States)

    Collins, Ryan L; Brand, Harrison; Redin, Claire E; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon-Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A; Lucente, Diane; Levy, Brynn; Sanders, Stephan J; Wapner, Ronald J; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E

    2017-03-06

    Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.

  6. Willingness to participate in genomics research and desire for personal results among underrepresented minority patients: a structured interview study.

    Science.gov (United States)

    Sanderson, Saskia C; Diefenbach, Michael A; Zinberg, Randi; Horowitz, Carol R; Smirnoff, Margaret; Zweig, Micol; Streicher, Samantha; Jabs, Ethylin Wang; Richardson, Lynne D

    2013-10-01

    Patients from traditionally underrepresented communities need to be involved in discussions around genomics research including attitudes towards participation and receiving personal results. Structured interviews, including open-ended and closed-ended questions, were conducted with 205 patients in an inner-city hospital outpatient clinic: 48 % of participants self-identified as Black or African American, 29 % Hispanic, 10 % White; 49 % had an annual household income of personal results to be returned was not mentioned, 82 % of participants were willing to participate in genomics research. Reasons for willingness fell into four themes: altruism; benefit to family members; personal health benefit; personal curiosity and improving understanding. Reasons for being unwilling fell into five themes: negative perception of research; not personally relevant; negative feelings about procedures (e.g., blood draws); practical barriers; and fear of results. Participants were more likely to report that they would participate in genomics research if personal results were offered than if they were not offered (89 vs. 62 % respectively, p personal genomic risk results for cancer, heart disease and type 2 diabetes than obesity (89, 89, 91, 80 % respectively, all p personal results was disease-specific worry. There was considerable willingness to participate in and desire for personal results from genomics research in this sample of predominantly low-income, Hispanic and African American patients. When returning results is not practical, or even when it is, alternatively or additionally providing generic information about genomics and health may also be a valuable commodity to underrepresented minority and other populations considering participating in genomics research.

  7. Cell-of-Origin-Specific 3D Genome Structure Acquired during Somatic Cell Reprogramming

    NARCIS (Netherlands)

    Krijger, Peter Hugo Lodewijk; Di Stefano, Bruno; de Wit, Elzo; Limone, Francesco; van Oevelen, Chris; de Laat, Wouter; Graf, Thomas

    2016-01-01

    Forced expression of reprogramming factors can convert somatic cells into induced pluripotent stem cells (iPSCs). Here we studied genome topology dynamics during reprogramming of different somatic cell types with highly distinct genome conformations. We find large-scale topologically associated

  8. DHX9 helicase is involved in preventing genomic instability induced by alternatively structured DNA in human cells.

    Science.gov (United States)

    Jain, Aklank; Bacolla, Albino; Del Mundo, Imee M; Zhao, Junhua; Wang, Guliang; Vasquez, Karen M

    2013-12-01

    Sequences that have the capacity to adopt alternative (i.e. non-B) DNA structures in the human genome have been implicated in stimulating genomic instability. Previously, we found that a naturally occurring intra-molecular triplex (H-DNA) caused genetic instability in mammals largely in the form of DNA double-strand breaks. Thus, it is of interest to determine the mechanism(s) involved in processing H-DNA. Recently, we demonstrated that human DHX9 helicase preferentially unwinds inter-molecular triplex DNA in vitro. Herein, we used a mutation-reporter system containing H-DNA to examine the relevance of DHX9 activity on naturally occurring H-DNA structures in human cells. We found that H-DNA significantly increased mutagenesis in small-interfering siRNA-treated, DHX9-depleted cells, affecting mostly deletions. Moreover, DHX9 associated with H-DNA in the context of supercoiled plasmids. To further investigate the role of DHX9 in the recognition/processing of H-DNA, we performed binding assays in vitro and chromatin immunoprecipitation assays in U2OS cells. DHX9 recognized H-DNA, as evidenced by its binding to the H-DNA structure and enrichment at the H-DNA region compared with a control region in human cells. These composite data implicate DHX9 in processing H-DNA structures in vivo and support its role in the overall maintenance of genomic stability at sites of alternatively structured DNA.

  9. Effect of chemical fixatives on accurate preservation of Escherichia coli and Bacillus subtilis structure in cells prepared by freeze-substitution

    International Nuclear Information System (INIS)

    Graham, L.L.; Beveridge, T.J.

    1990-01-01

    Five chemical fixatives were evaluated for their ability to accurately preserve bacterial ultrastructure during freeze-substitution of select Escherichia coli and Bacillus subtilis strains. Radioisotopes were specifically incorporated into the peptidoglycan, lipopolysaccharide, and nucleic acids of E. coli SFK11 and W7 and into the peptidoglycan and RNA of B. subtilis 168 and W23. The ease of extraction of radiolabels, as assessed by liquid scintillation counting during all stages of processing for freeze-substitution, was used as an indicator of cell structural integrity and retention of cellular chemical composition. Subsequent visual examination by electron microscopy was used to confirm ultrastructural conformation. The fixatives used were: 2% (wt/vol) osmium tetroxide and 2% (wt/vol) uranyl acetate; 2% (vol/vol) glutaraldehyde and 2% (wt/vol) uranyl acetate; 2% (vol/vol) acrolein and 2% (wt/vol) uranyl acetate; 2% (wt/vol) gallic acid; and 2% (wt/vol) uranyl acetate. All fixatives were prepared in a substitution solvent of anhydrous acetone. Extraction of cellular constituents depended on the chemical fixative used. A combination of 2% osmium tetroxide-2% uranyl acetate or 2% gallic acid alone resulted in optimum fixation as ascertained by least extraction of radiolabels. In both gram-positive and gram-negative organisms, high levels of radiolabel were detected in the processing fluids in which 2% acrolein-2% uranyl acetate, 2% glutaraldehyde-2% uranyl acetate, or 2% uranyl acetate alone were used as fixatives. Ultrastructural variations were observed in cells freeze-substituted in the presence of different chemical fixatives. We recommend the use of osmium tetroxide and uranyl acetate in acetone for routine freeze-substitution of eubacteria, while gallic acid is recommended for use when microanalytical processing necessitates the omission of osmium

  10. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure.

    Science.gov (United States)

    Gordon, Sean P; Contreras-Moreira, Bruno; Woods, Daniel P; Des Marais, David L; Burgess, Diane; Shu, Shengqiang; Stritt, Christoph; Roulin, Anne C; Schackwitz, Wendy; Tyler, Ludmila; Martin, Joel; Lipzen, Anna; Dochy, Niklas; Phillips, Jeremy; Barry, Kerrie; Geuten, Koen; Budak, Hikmet; Juenger, Thomas E; Amasino, Richard; Caicedo, Ana L; Goodstein, David; Davidson, Patrick; Mur, Luis A J; Figueroa, Melania; Freeling, Michael; Catalan, Pilar; Vogel, John P

    2017-12-19

    While prokaryotic pan-genomes have been shown to contain many more genes than any individual organism, the prevalence and functional significance of differentially present genes in eukaryotes remains poorly understood. Whole-genome de novo assembly and annotation of 54 lines of the grass Brachypodium distachyon yield a pan-genome containing nearly twice the number of genes found in any individual genome. Genes present in all lines are enriched for essential biological functions, while genes present in only some lines are enriched for conditionally beneficial functions (e.g., defense and development), display faster evolutionary rates, lie closer to transposable elements and are less likely to be syntenic with orthologous genes in other grasses. Our data suggest that differentially present genes contribute substantially to phenotypic variation within a eukaryote species, these genes have a major influence in population genetics, and transposable elements play a key role in pan-genome evolution.

  11. Identification and classification of conserved RNA secondary structures in the human genome

    DEFF Research Database (Denmark)

    Pedersen, Jakob Skou; Bejerano, Gill; Siepel, Adam

    2006-01-01

    The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars...... for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set......, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization....

  12. Visualizing information across multidimensional post-genomic structured and textual databases.

    Science.gov (United States)

    Tao, Ying; Friedman, Carol; Lussier, Yves A

    2005-04-15

    Visualizing relationships among biological information to facilitate understanding is crucial to biological research during the post-genomic era. Although different systems have been developed to view gene-phenotype relationships for specific databases, very few have been designed specifically as a general flexible tool for visualizing multidimensional genotypic and phenotypic information together. Our goal is to develop a method for visualizing multidimensional genotypic and phenotypic information and a model that unifies different biological databases in order to present the integrated knowledge using a uniform interface. We developed a novel, flexible and generalizable visualization tool, called PhenoGenesviewer (PGviewer), which in this paper was used to display gene-phenotype relationships from a human-curated database (OMIM) and from an automatic method using a Natural Language Processing tool called BioMedLEE. Data obtained from multiple databases were first integrated into a uniform structure and then organized by PGviewer. PGviewer provides a flexible query interface that allows dynamic selection and ordering of any desired dimension in the databases. Based on users' queries, results can be visualized using hierarchical expandable trees that present views specified by users according to their research interests. We believe that this method, which allows users to dynamically organize and visualize multiple dimensions, is a potentially powerful and promising tool that should substantially facilitate biological research. PhenogenesViewer as well as its support and tutorial are available at http://www.dbmi.columbia.edu/pgviewer/ Lussier@dbmi.columbia.edu.

  13. Deciphering the genomic structure, function and evolution of carotenogenesis related phytoene synthases in grasses

    Directory of Open Access Journals (Sweden)

    Dibari Bianca

    2012-06-01

    Full Text Available Abstract Background Carotenoids are isoprenoid pigments, essential for photosynthesis and photoprotection in plants. The enzyme phytoene synthase (PSY plays an essential role in mediating condensation of two geranylgeranyl diphosphate molecules, the first committed step in carotenogenesis. PSY are nuclear enzymes encoded by a small gene family consisting of three paralogous genes (PSY1-3 that have been widely characterized in rice, maize and sorghum. Results In wheat, for which yellow pigment content is extremely important for flour colour, only PSY1 has been extensively studied because of its association with QTLs reported for yellow pigment whereas PSY2 has been partially characterized. Here, we report the isolation of bread wheat PSY3 genes from a Renan BAC library using Brachypodium as a model genome for the Triticeae to develop Conserved Orthologous Set markers prior to gene cloning and sequencing. Wheat PSY3 homoeologous genes were sequenced and annotated, unravelling their novel structure associated with intron-loss events and consequent exonic fusions. A wheat PSY3 promoter region was also investigated for the presence of cis-acting elements involved in the response to abscisic acid (ABA, since carotenoids also play an important role as precursors of signalling molecules devoted to plant development and biotic/abiotic stress responses. Expression of wheat PSYs in leaves and roots was investigated during ABA treatment to confirm the up-regulation of PSY3 during abiotic stress. Conclusions We investigated the structural and functional determinisms of PSY genes in wheat. More generally, among eudicots and monocots, the PSY gene family was found to be associated with differences in gene copy numbers, allowing us to propose an evolutionary model for the entire PSY gene family in Grasses.

  14. Congruence as a measurement of extended haplotype structure across the genome

    Science.gov (United States)

    2012-01-01

    Background Historically, extended haplotypes have been defined using only a few data points, such as alleles for several HLA genes in the MHC. High-density SNP data, and the increasing affordability of whole genome SNP typing, creates the opportunity to define higher resolution extended haplotypes. This drives the need for new tools that support quantification and visualization of extended haplotypes as defined by as many as 2000 SNPs. Confronted with high-density SNP data across the major histocompatibility complex (MHC) for 2,300 complete families, compiled by the Type 1 Diabetes Genetics Consortium (T1DGC), we developed software for studying extended haplotypes. Methods The software, called ExHap (Extended Haplotype), uses a similarity measurement we term congruence to identify and quantify long-range allele identity. Using ExHap, we analyzed congruence in both the T1DGC data and family-phased data from the International HapMap Project. Results Congruent chromosomes from the T1DGC data have between 96.5% and 99.9% allele identity over 1,818 SNPs spanning 2.64 megabases of the MHC (HLA-DRB1 to HLA-A). Thirty-three of 132 DQ-DR-B-A defined haplotype groups have > 50% congruent chromosomes in this region. For example, 92% of chromosomes within the DR3-B8-A1 haplotype are congruent from HLA-DRB1 to HLA-A (99.8% allele identity). We also applied ExHap to all 22 autosomes for both CEU and YRI cohorts from the International HapMap Project, identifying multiple candidate extended haplotypes. Conclusions Long-range congruence is not unique to the MHC region. Patterns of allele identity on phased chromosomes provide a simple, straightforward approach to visually and quantitatively inspect complex long-range structural patterns in the genome. Such patterns aid the biologist in appreciating genetic similarities and differences across cohorts, and can lead to hypothesis generation for subsequent studies. PMID:22369243

  15. Salmonella strains isolated from Galápagos iguanas show spatial structuring of serovar and genomic diversity.

    Directory of Open Access Journals (Sweden)

    Emily W Lankau

    Full Text Available It is thought that dispersal limitation primarily structures host-associated bacterial populations because host distributions inherently limit transmission opportunities. However, enteric bacteria may disperse great distances during food-borne outbreaks. It is unclear if such rapid long-distance dispersal events happen regularly in natural systems or if these events represent an anthropogenic exception. We characterized Salmonella enterica isolates from the feces of free-living Galápagos land and marine iguanas from five sites on four islands using serotyping and genomic fingerprinting. Each site hosted unique and nearly exclusive serovar assemblages. Genomic fingerprint analysis offered a more complex model of S. enterica biogeography, with evidence of both unique strain pools and of spatial population structuring along a geographic gradient. These findings suggest that even relatively generalist enteric bacteria may be strongly dispersal limited in a natural system with strong barriers, such as oceanic divides. Yet, these differing results seen on two typing methods also suggests that genomic variation is less dispersal limited, allowing for different ecological processes to shape biogeographical patterns of the core and flexible portions of this bacterial species' genome.

  16. Salmonella strains isolated from Galápagos iguanas show spatial structuring of serovar and genomic diversity.

    Science.gov (United States)

    Lankau, Emily W; Cruz Bedon, Lenin; Mackie, Roderick I

    2012-01-01

    It is thought that dispersal limitation primarily structures host-associated bacterial populations because host distributions inherently limit transmission opportunities. However, enteric bacteria may disperse great distances during food-borne outbreaks. It is unclear if such rapid long-distance dispersal events happen regularly in natural systems or if these events represent an anthropogenic exception. We characterized Salmonella enterica isolates from the feces of free-living Galápagos land and marine iguanas from five sites on four islands using serotyping and genomic fingerprinting. Each site hosted unique and nearly exclusive serovar assemblages. Genomic fingerprint analysis offered a more complex model of S. enterica biogeography, with evidence of both unique strain pools and of spatial population structuring along a geographic gradient. These findings suggest that even relatively generalist enteric bacteria may be strongly dispersal limited in a natural system with strong barriers, such as oceanic divides. Yet, these differing results seen on two typing methods also suggests that genomic variation is less dispersal limited, allowing for different ecological processes to shape biogeographical patterns of the core and flexible portions of this bacterial species' genome.

  17. Salmonella Strains Isolated from Galápagos Iguanas Show Spatial Structuring of Serovar and Genomic Diversity

    Science.gov (United States)

    Lankau, Emily W.; Cruz Bedon, Lenin; Mackie, Roderick I.

    2012-01-01

    It is thought that dispersal limitation primarily structures host-associated bacterial populations because host distributions inherently limit transmission opportunities. However, enteric bacteria may disperse great distances during food-borne outbreaks. It is unclear if such rapid long-distance dispersal events happen regularly in natural systems or if these events represent an anthropogenic exception. We characterized Salmonella enterica isolates from the feces of free-living Galápagos land and marine iguanas from five sites on four islands using serotyping and genomic fingerprinting. Each site hosted unique and nearly exclusive serovar assemblages. Genomic fingerprint analysis offered a more complex model of S. enterica biogeography, with evidence of both unique strain pools and of spatial population structuring along a geographic gradient. These findings suggest that even relatively generalist enteric bacteria may be strongly dispersal limited in a natural system with strong barriers, such as oceanic divides. Yet, these differing results seen on two typing methods also suggests that genomic variation is less dispersal limited, allowing for different ecological processes to shape biogeographical patterns of the core and flexible portions of this bacterial species' genome. PMID:22615968

  18. Integration of Structural Dynamics and Molecular Evolution via Protein Interaction Networks: A New Era in Genomic Medicine

    Science.gov (United States)

    Kumar, Avishek; Butler, Brandon M.; Kumar, Sudhir; Ozkan, S. Banu

    2016-01-01

    Summary Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine. PMID:26684487

  19. Integration of structural dynamics and molecular evolution via protein interaction networks: a new era in genomic medicine.

    Science.gov (United States)

    Kumar, Avishek; Butler, Brandon M; Kumar, Sudhir; Ozkan, S Banu

    2015-12-01

    Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine. Copyright © 2015 Elsevier Ltd. All rights reserved.

  20. Ultra high-resolution gene centric genomic structural analysis of a non-syndromic congenital heart defect, Tetralogy of Fallot.

    Directory of Open Access Journals (Sweden)

    Douglas C Bittel

    Full Text Available Tetralogy of Fallot (TOF is one of the most common severe congenital heart malformations. Great progress has been made in identifying key genes that regulate heart development, yet approximately 70% of TOF cases are sporadic and nonsyndromic with no known genetic cause. We created an ultra high-resolution gene centric comparative genomic hybridization (gcCGH microarray based on 591 genes with a validated association with cardiovascular development or function. We used our gcCGH array to analyze the genomic structure of 34 infants with sporadic TOF without a deletion on chromosome 22q11.2 (n male = 20; n female = 14; age range of 2 to 10 months. Using our custom-made gcCGH microarray platform, we identified a total of 613 copy number variations (CNVs ranging in size from 78 base pairs to 19.5 Mb. We identified 16 subjects with 33 CNVs that contained 13 different genes which are known to be directly associated with heart development. Additionally, there were 79 genes from the broader list of genes that were partially or completely contained in a CNV. All 34 individuals examined had at least one CNV involving these 79 genes. Furthermore, we had available whole genome exon arrays from right ventricular tissue in 13 of our subjects. We analyzed these for correlations between copy number and gene expression level. Surprisingly, we could detect only one clear association between CNVs and expression (GSTT1 for any of the 591 focal genes on the gcCGH array. The expression levels of GSTT1 were correlated with copy number in all cases examined (r = 0.95, p = 0.001. We identified a large number of small CNVs in genes with varying associations with heart development. Our results illustrate the complexity of human genome structural variation and underscore the need for multifactorial assessment of potential genetic/genomic factors that contribute to congenital heart defects.

  1. The role of parasite-driven selection in shaping landscape genomic structure in red grouse (Lagopus lagopus scotica).

    Science.gov (United States)

    Wenzel, Marius A; Douglas, Alex; James, Marianne C; Redpath, Steve M; Piertney, Stuart B

    2016-01-01

    Landscape genomics promises to provide novel insights into how neutral and adaptive processes shape genome-wide variation within and among populations. However, there has been little emphasis on examining whether individual-based phenotype-genotype relationships derived from approaches such as genome-wide association (GWAS) manifest themselves as a population-level signature of selection in a landscape context. The two may prove irreconcilable as individual-level patterns become diluted by high levels of gene flow and complex phenotypic or environmental heterogeneity. We illustrate this issue with a case study that examines the role of the highly prevalent gastrointestinal nematode Trichostrongylus tenuis in shaping genomic signatures of selection in red grouse (Lagopus lagopus scotica). Individual-level GWAS involving 384 SNPs has previously identified five SNPs that explain variation in T. tenuis burden. Here, we examine whether these same SNPs display population-level relationships between T. tenuis burden and genetic structure across a small-scale landscape of 21 sites with heterogeneous parasite pressure. Moreover, we identify adaptive SNPs showing signatures of directional selection using F(ST) outlier analysis and relate population- and individual-level patterns of multilocus neutral and adaptive genetic structure to T. tenuis burden. The five candidate SNPs for parasite-driven selection were neither associated with T. tenuis burden on a population level, nor under directional selection. Similarly, there was no evidence of parasite-driven selection in SNPs identified as candidates for directional selection. We discuss these results in the context of red grouse ecology and highlight the broader consequences for the utility of landscape genomics approaches for identifying signatures of selection. © 2015 John Wiley & Sons Ltd.

  2. Striking structural dynamism and nucleotide sequence variation of the transposon Galileo in the genome of Drosophila mojavensis.

    Science.gov (United States)

    Marzo, Mar; Bello, Xabier; Puig, Marta; Maside, Xulio; Ruiz, Alfredo

    2013-02-04

    Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is the long internally-repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo has a wide distribution in the genus Drosophila, since it has been found in 6 of the 12 Drosophila sequenced genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the D. mojavensis sequenced genome. In our set of 170 Galileo copies we have detected 5 Galileo subfamilies (C, D, E, F, and X) with different structures ranging from nearly complete, to only 2 TIR or solo TIR copies. Finally, we have explored the structural and length variation of the Galileo copies that point out the relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate a recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome.

  3. Structure, sequence and expression of the hepatitis delta (δ) viral genome

    Science.gov (United States)

    Wang, Kang-Sheng; Choo, Qui-Lim; Weiner, Amy J.; Ou, Jing-Hsiung; Najarian, Richard C.; Thayer, Richard M.; Mullenbach, Guy T.; Denniston, Katherine J.; Gerin, John L.; Houghton, Michael

    1986-10-01

    Biochemical and electron microscopic data indicate that the human hepatitis δ viral agent contains a covalently closed circular and single-stranded RNA genome that has certain similarities with viroid-like agents from plants. The sequence of the viral genome (1,678 nucleotides) has been determined and an open reading frame within the complementary strand has been shown to encode an antigen that binds specifically to antisera from patients with chronic hepatitis δ viral infections.

  4. Northern Bobwhite (Colinus virginianus Mitochondrial Population Genomics Reveals Structure, Divergence, and Evidence for Heteroplasmy.

    Directory of Open Access Journals (Sweden)

    Yvette A Halley

    Full Text Available Herein, we evaluated the concordance of population inferences and conclusions resulting from the analysis of short mitochondrial fragments (i.e., partial or complete D-Loop nucleotide sequences versus complete mitogenome sequences for 53 bobwhites representing six ecoregions across TX and OK (USA. Median joining (MJ haplotype networks demonstrated that analyses performed using small mitochondrial fragments were insufficient for estimating the true (i.e., complete mitogenome haplotype structure, corresponding levels of divergence, and maternal population history of our samples. Notably, discordant demographic inferences were observed when mismatch distributions of partial (i.e., partial D-Loop versus complete mitogenome sequences were compared, with the reduction in mitochondrial genomic information content observed to encourage spurious inferences in our samples. A probabilistic approach to variant prediction for the complete bobwhite mitogenomes revealed 344 segregating sites corresponding to 347 total mutations, including 49 putative nonsynonymous single nucleotide variants (SNVs distributed across 12 protein coding genes. Evidence of gross heteroplasmy was observed for 13 bobwhites, with 10 of the 13 heteroplasmies involving one moderate to high frequency SNV. Haplotype network and phylogenetic analyses for the complete bobwhite mitogenome sequences revealed two divergent maternal lineages (dXY = 0.00731; FST = 0.849; P < 0.05, thereby supporting the potential for two putative subspecies. However, the diverged lineage (n = 103 variants almost exclusively involved bobwhites geographically classified as Colinus virginianus texanus, which is discordant with the expectations of previous geographic subspecies designations. Tests of adaptive evolution for functional divergence (MKT, frequency distribution tests (D, FS and phylogenetic analyses (RAxML provide no evidence for positive selection or hybridization with the sympatric scaled quail

  5. Comprehensive Genome Analysis of Carbapenemase-Producing Enterobacter spp.: New Insights into Phylogeny, Population Structure, and Resistance Mechanisms.

    Science.gov (United States)

    Chavda, Kalyan D; Chen, Liang; Fouts, Derrick E; Sutton, Granger; Brinkac, Lauren; Jenkins, Stephen G; Bonomo, Robert A; Adams, Mark D; Kreiswirth, Barry N

    2016-12-13

    Knowledge regarding the genomic structure of Enterobacter spp., the second most prevalent carbapenemase-producing Enterobacteriaceae, remains limited. Here we sequenced 97 clinical Enterobacter species isolates that were both carbapenem susceptible and resistant from various geographic regions to decipher the molecular origins of carbapenem resistance and to understand the changing phylogeny of these emerging and drug-resistant pathogens. Of the carbapenem-resistant isolates, 30 possessed bla KPC-2 , 40 had bla KPC-3 , 2 had bla KPC-4 , and 2 had bla NDM-1 Twenty-three isolates were carbapenem susceptible. Six genomes were sequenced to completion, and their sizes ranged from 4.6 to 5.1 Mbp. Phylogenomic analysis placed 96 of these genomes, 351 additional Enterobacter genomes downloaded from NCBI GenBank, and six newly sequenced type strains into 19 phylogenomic groups-18 groups (A to R) in the Enterobacter cloacae complex and Enterobacter aerogenes Diverse mechanisms underlying the molecular evolutionary trajectory of these drug-resistant Enterobacter spp. were revealed, including the acquisition of an antibiotic resistance plasmid, followed by clonal spread, horizontal transfer of bla KPC -harboring plasmids between different phylogenomic groups, and repeated transposition of the bla KPC gene among different plasmid backbones. Group A, which comprises multilocus sequence type 171 (ST171), was the most commonly identified (23% of isolates). Genomic analysis showed that ST171 isolates evolved from a common ancestor and formed two different major clusters; each acquiring unique bla KPC -harboring plasmids, followed by clonal expansion. The data presented here represent the first comprehensive study of phylogenomic interrogation and the relationship between antibiotic resistance and plasmid discrimination among carbapenem-resistant Enterobacter spp., demonstrating the genetic diversity and complexity of the molecular mechanisms driving antibiotic resistance in this

  6. The complete mitochondrial genome and its remarkable secondary structure for a stonefly Acroneuria hainana Wu (Insecta: Plecoptera, Perlidae).

    Science.gov (United States)

    Huang, Mingchao; Wang, Yuyu; Liu, Xingyue; Li, Weihai; Kang, Zehui; Wang, Kai; Li, Xuankun; Yang, Ding

    2015-02-15

    The Plecoptera (stoneflies) is a hemimetabolous order of insects, whose larvae are usually used as indicators for fresh water biomonitoring. Herein, we describe the complete mitochondrial (mt) genome of a stonefly species, namely Acroneuria hainana Wu belonging to the family Perlidae. This mt genome contains 13 PCGs, 22 tRNA-coding genes and 2 rRNA-coding genes that are conserved in most insect mt genomes, and it also has the identical gene order with the insect ancestral gene order. However, there are three special initiation codons of ND1, ND5 and COI in PCGs: TTG, GTG and CGA, coding for L, V and R, respectively. Additionally, the 899-bp control region, with 73.30% A+T content, has two long repeated sequences which are found at the 3'-end closing to the tRNA(Ile) gene. Both of them can be folded into a stem-loop structure, whose adjacent upstream and downstream sequences can be also folded into stem-loop structures. It is presumed that the four special structures in series could be associated with the D-loop replication. It might be able to adjust the replication speed of two replicate directions. Copyright © 2014 Elsevier B.V. All rights reserved.

  7. Genetic diversity and structure of elite cotton germplasm (Gossypium hirsutum L.) using genome-wide SNP data.

    Science.gov (United States)

    Ai, XianTao; Liang, YaJun; Wang, JunDuo; Zheng, JuYun; Gong, ZhaoLong; Guo, JiangPing; Li, XueYuan; Qu, YanYing

    2017-10-01

    Cotton (Gossypium spp.) is the most important natural textile fiber crop, and Gossypium hirsutum L. is responsible for 90% of the annual cotton crop in the world. Information on cotton genetic diversity and population structure is essential for new breeding lines. In this study, we analyzed population structure and genetic diversity of 288 elite Gossypium hirsutum cultivar accessions collected from around the world, and especially from China, using genome-wide single nucleotide polymorphisms (SNP) markers. The average polymorphsim information content (PIC) was 0.25, indicating a relatively low degree of genetic diversity. Population structure analysis revealed extensive admixture and identified three subgroups. Phylogenetic analysis supported the subgroups identified by STRUCTURE. The results from both population structure and phylogenetic analysis were, for the most part, in agreement with pedigree information. Analysis of molecular variance revealed a larger amount of variation was due to diversity within the groups. Establishment of genetic diversity and population structure from this study could be useful for genetic and genomic analysis and systematic utilization of the standing genetic variation in upland cotton.

  8. Spatial variation in the parasite communities and genomic structure of urban rats in New York City.

    Science.gov (United States)

    Angley, L P; Combs, M; Firth, C; Frye, M J; Lipkin, I; Richardson, J L; Munshi-South, J

    2018-02-01

    Brown rats (Rattus norvegicus) are a globally distributed pest. Urban habitats can support large infestations of rats, posing a potential risk to public health from the parasites and pathogens they carry. Despite the potential influence of rodent-borne zoonotic diseases on human health, it is unclear how urban habitats affect the structure and transmission dynamics of ectoparasite and microbial communities (all referred to as "parasites" hereafter) among rat colonies. In this study, we use ecological data on parasites and genomic sequencing of their rat hosts to examine associations between spatial proximity, genetic relatedness and the parasite communities associated with 133 rats at five sites in sections of New York City with persistent rat infestations. We build on previous work showing that rats in New York carry a wide variety of parasites and report that these communities differ significantly among sites, even across small geographical distances. Ectoparasite community similarity was positively associated with geographical proximity; however, there was no general association between distance and microbial communities of rats. Sites with greater overall parasite diversity also had rats with greater infection levels and parasite species richness. Parasite community similarity among sites was not linked to genetic relatedness of rats, suggesting that these communities are not associated with genetic similarity among host individuals or host dispersal among sites. Discriminant analysis identified site-specific associations of several parasite species, suggesting that the presence of some species within parasite communities may allow researchers to determine the sites of origin for newly sampled rats. The results of our study help clarify the roles that colony structure and geographical proximity play in determining the ecology of R. norvegicus as a significant urban reservoir of zoonotic diseases. Our study also highlights the spatial variation present in urban

  9. Genome-scale characterization of RNA tertiary structures and their functional impact by RNA solvent accessibility prediction.

    Science.gov (United States)

    Yang, Yuedong; Li, Xiaomei; Zhao, Huiying; Zhan, Jian; Wang, Jihua; Zhou, Yaoqi

    2017-01-01

    As most RNA structures are elusive to structure determination, obtaining solvent accessible surface areas (ASAs) of nucleotides in an RNA structure is an important first step to characterize potential functional sites and core structural regions. Here, we developed RNAsnap, the first machine-learning method trained on protein-bound RNA structures for solvent accessibility prediction. Built on sequence profiles from multiple sequence alignment (RNAsnap-prof), the method provided robust prediction in fivefold cross-validation and an independent test (Pearson correlation coefficients, r, between predicted and actual ASA values are 0.66 and 0.63, respectively). Application of the method to 6178 mRNAs revealed its positive correlation to mRNA accessibility by dimethyl sulphate (DMS) experimentally measured in vivo (r = 0.37) but not in vitro (r = 0.07), despite the lack of training on mRNAs and the fact that DMS accessibility is only an approximation to solvent accessibility. We further found strong association across coding and noncoding regions between predicted solvent accessibility of the mutation site of a single nucleotide variant (SNV) and the frequency of that variant in the population for 2.2 million SNVs obtained in the 1000 Genomes Project. Moreover, mapping solvent accessibility of RNAs to the human genome indicated that introns, 5' cap of 5' and 3' cap of 3' untranslated regions, are more solvent accessible, consistent with their respective functional roles. These results support conformational selections as the mechanism for the formation of RNA-protein complexes and highlight the utility of genome-scale characterization of RNA tertiary structures by RNAsnap. The server and its stand-alone downloadable version are available at http://sparks-lab.org. © 2016 Yang et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  10. Supplementary Material for: Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody

    2016-01-01

    Abstract Background Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. Methods To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. Results The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drugs binding sites. Conclusions Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. These approaches could ultimately lead to novel

  11. Mycobacterium tuberculosis whole genome sequencing and protein structure modelling provides insights into anti-tuberculosis drug resistance

    KAUST Repository

    Phelan, Jody

    2016-03-23

    Background Combating the spread of drug resistant tuberculosis is a global health priority. Whole genome association studies are being applied to identify genetic determinants of resistance to anti-tuberculosis drugs. Protein structure and interaction modelling are used to understand the functional effects of putative mutations and provide insight into the molecular mechanisms leading to resistance. Methods To investigate the potential utility of these approaches, we analysed the genomes of 144 Mycobacterium tuberculosis clinical isolates from The Special Programme for Research and Training in Tropical Diseases (TDR) collection sourced from 20 countries in four continents. A genome-wide approach was applied to 127 isolates to identify polymorphisms associated with minimum inhibitory concentrations for first-line anti-tuberculosis drugs. In addition, the effect of identified candidate mutations on protein stability and interactions was assessed quantitatively with well-established computational methods. Results The analysis revealed that mutations in the genes rpoB (rifampicin), katG (isoniazid), inhA-promoter (isoniazid), rpsL (streptomycin) and embB (ethambutol) were responsible for the majority of resistance observed. A subset of the mutations identified in rpoB and katG were predicted to affect protein stability. Further, a strong direct correlation was observed between the minimum inhibitory concentration values and the distance of the mutated residues in the three-dimensional structures of rpoB and katG to their respective drugs binding sites. Conclusions Using the TDR resource, we demonstrate the usefulness of whole genome association and convergent evolution approaches to detect known and potentially novel mutations associated with drug resistance. Further, protein structural modelling could provide a means of predicting the impact of polymorphisms on drug efficacy in the absence of phenotypic data. These approaches could ultimately lead to novel resistance

  12. The subclonal structure and genomic evolution of oral squamous cell carcinoma revealed by ultra-deep sequencing

    DEFF Research Database (Denmark)

    Tabatabaeifar, Siavosh; Thomassen, Mads; Larsen, Martin J

    2017-01-01

    Recent studies suggest that head and neck squamous cell carcinomas are very heterogeneous between patients; however the subclonal structure remains unexplored mainly due to studies using only a single biopsy per patient. To deconvolutethe clonal structure and describe the genomic cancer evolution......, we applied whole-exome sequencing combined with ultra-deep targeted sequencing on oral squamous cell carcinomas (OSCC). From each patient, a set of biopsies was sampled from distinct geographical sites in primary tumor and lymph node metastasis.We demonstrate that the included OSCCs show a high...

  13. Determination of accurate 1H positions of an alanine tripeptide with anti-parallel and parallel β-sheet structures by high resolution 1H solid state NMR and GIPAW chemical shift calculation.

    Science.gov (United States)

    Yazawa, Koji; Suzuki, Furitsu; Nishiyama, Yusuke; Ohata, Takuya; Aoki, Akihiro; Nishimura, Katsuyuki; Kaji, Hironori; Shimizu, Tadashi; Asakura, Tetsuo

    2012-11-25

    The accurate (1)H positions of alanine tripeptide, A(3), with anti-parallel and parallel β-sheet structures could be determined by highly resolved (1)H DQMAS solid-state NMR spectra and (1)H chemical shift calculation with gauge-including projector augmented wave calculations.

  14. Genomic analysis of diversity, population structure, virulence, and antimicrobial resistance in Klebsiella pneumoniae, an urgent threat to public health

    Science.gov (United States)

    Holt, Kathryn E.; Wertheim, Heiman; Zadoks, Ruth N.; Baker, Stephen; Whitehouse, Chris A.; Dance, David; Jenney, Adam; Connor, Thomas R.; Hsu, Li Yang; Severin, Juliëtte; Brisse, Sylvain; Cao, Hanwei; Wilksch, Jonathan; Gorrie, Claire; Schultz, Mark B.; Edwards, David J.; Nguyen, Kinh Van; Nguyen, Trung Vu; Dao, Trinh Tuyet; Mensink, Martijn; Minh, Vien Le; Nhu, Nguyen Thi Khanh; Schultsz, Constance; Kuntaman, Kuntaman; Newton, Paul N.; Moore, Catrin E.; Strugnell, Richard A.; Thomson, Nicholas R.

    2015-01-01

    Klebsiella pneumoniae is now recognized as an urgent threat to human health because of the emergence of multidrug-resistant strains associated with hospital outbreaks and hypervirulent strains associated with severe community-acquired infections. K. pneumoniae is ubiquitous in the environment and can colonize and infect both plants and animals. However, little is known about the population structure of K. pneumoniae, so it is difficult to recognize or understand the emergence of clinically important clones within this highly genetically diverse species. Here we present a detailed genomic framework for K. pneumoniae based on whole-genome sequencing of more than 300 human and animal isolates spanning four continents. Our data provide genome-wide support for the splitting of K. pneumoniae into three distinct species, KpI (K. pneumoniae), KpII (K. quasipneumoniae), and KpIII (K. variicola). Further, for K. pneumoniae (KpI), the entity most frequently associated with human infection, we show the existence of >150 deeply branching lineages including numerous multidrug-resistant or hypervirulent clones. We show K. pneumoniae has a large accessory genome approaching 30,000 protein-coding genes, including a number of virulence functions that are significantly associated with invasive community-acquired disease in humans. In our dataset, antimicrobial resistance genes were common among human carriage isolates and hospital-acquired infections, which generally lacked the genes associated with invasive disease. The convergence of virulence and resistance genes potentially could lead to the emergence of untreatable invasive K. pneumoniae infections; our data provide the whole-genome framework against which to track the emergence of such threats. PMID:26100894

  15. Cas9 versus Cas12a/Cpf1: Structure-function comparisons and implications for genome editing.

    Science.gov (United States)

    Swarts, Daan C; Jinek, Martin

    2018-05-22

    Cas9 and Cas12a are multidomain CRISPR-associated nucleases that can be programmed with a guide RNA to bind and cleave complementary DNA targets. The guide RNA sequence can be varied, making these effector enzymes versatile tools for genome editing and gene regulation applications. While Cas9 is currently the best-characterized and most widely used nuclease for such purposes, Cas12a (previously named Cpf1) has recently emerged as an alternative for Cas9. Cas9 and Cas12a have distinct evolutionary origins and exhibit different structural architectures, resulting in distinct molecular mechanisms. Here we compare the structural and mechanistic features that distinguish Cas9 and Cas12a, and describe how these features modulate their activity. We discuss implications for genome editing, and how they may influence the choice of Cas9 or Cas12a for specific applications. Finally, we review recent studies in which Cas12a has been utilized as a genome editing tool. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications Regulatory RNAs/RNAi/Riboswitches > Biogenesis of Effector Small RNAs RNA Interactions with Proteins and Other Molecules > RNA-Protein Complexes. © 2018 Wiley Periodicals, Inc.

  16. A multivalent adsorption apparatus explains the broad host range of phage phi92: a comprehensive genomic and structural analysis.

    Science.gov (United States)

    Schwarzer, David; Buettner, Falk F R; Browning, Christopher; Nazarov, Sergey; Rabsch, Wolfgang; Bethe, Andrea; Oberbeck, Astrid; Bowman, Valorie D; Stummeyer, Katharina; Mühlenhoff, Martina; Leiman, Petr G; Gerardy-Schahn, Rita

    2012-10-01

    Bacteriophage phi92 is a large, lytic myovirus isolated in 1983 from pathogenic Escherichia coli strains that carry a polysialic acid capsule. Here we report the genome organization of phi92, the cryoelectron microscopy reconstruction of its virion, and the reinvestigation of its host specificity. The genome consists of a linear, double-stranded 148,612-bp DNA sequence containing 248 potential open reading frames and 11 putative tRNA genes. Orthologs were found for 130 of the predicted proteins. Most of the virion proteins showed significant sequence similarities to proteins of myoviruses rv5 and PVP-SE1, indicating that phi92 is a new member of the novel genus of rv5-like phages. Reinvestigation of phi92 host specificity showed that the host range is not limited to polysialic acid-encapsulated Escherichia coli but includes most laboratory strains of Escherichia coli and many Salmonella strains. Structure analysis of the phi92 virion demonstrated the presence of four different types of tail fibers and/or tailspikes, which enable the phage to use attachment sites on encapsulated and nonencapsulated bacteria. With this report, we provide the first detailed description of a multivalent, multispecies phage armed with a host cell adsorption apparatus resembling a nanosized Swiss army knife. The genome, structure, and, in particular, the organization of the baseplate of phi92 demonstrate how a bacteriophage can evolve into a multi-pathogen-killing agent.

  17. Genome structures and halophyte-specific gene expression of the extremophile thellungiella parvula in comparison with Thellungiella salsuginea (Thellungiella halophila) and arabidopsis

    KAUST Repository

    Oh, Dongha; Dassanayake, Maheshi; Haas, Jeffrey S.; Kropornika, Anna; Wright, Chris L.; D'Urzo, Matilde Paino; Hong, Hyewon; Ali, Shahjahan; Herná ndez, Á lvaro Gonzalez; Lambert, Georgina M.; Inan, Gü nsu; Galbraith, David; Bressan, Ray Anthony; Yun, Daejin; Zhu, Jian-Kang; Cheeseman, John McP; Bohnert, Hans Jü rgen

    2010-01-01

    and an uneven distribution of repeat sequences. T. parvula genome structure and DNA sequences were compared with orthologous regions from Arabidopsis and publicly available bacterial artificial chromosome sequences from Thellungiella salsuginea (previously

  18. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale

    DEFF Research Database (Denmark)

    Liu, Siyang; Huang, Shujia; Rao, Junhua

    2015-01-01

    present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome......) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We...... assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction...

  19. Genomic Prediction from Whole Genome Sequence in Livestock: The 1000 Bull Genomes Project

    DEFF Research Database (Denmark)

    Hayes, Benjamin J; MacLeod, Iona M; Daetwyler, Hans D

    Advantages of using whole genome sequence data to predict genomic estimated breeding values (GEBV) include better persistence of accuracy of GEBV across generations and more accurate GEBV across breeds. The 1000 Bull Genomes Project provides a database of whole genome sequenced key ancestor bulls....... In a dairy data set, predictions using BayesRC and imputed sequence data from 1000 Bull Genomes were 2% more accurate than with 800k data. We could demonstrate the method identified causal mutations in some cases. Further improvements will come from more accurate imputation of sequence variant genotypes...

  20. The mosaic genome structure of the Wolbachia wRi strain infecting Drosophila simulans

    DEFF Research Database (Denmark)

    Klasson, Lisa; Westberg, Joakim; Sapountzis, Panagiotis

    2009-01-01

    genome of W. pipientis strain wRi that induces very strong cytoplasmic incompatibility in its natural host Drosophila simulans. A comparison with the previously sequenced genome of W. pipientis strain wMel from Drosophila melanogaster identified 35 breakpoints associated with mobile elements and repeated...... sequences that are stable in Drosophila lines transinfected with wRi. Additionally, 450 genes with orthologs in wRi and wMel were sequenced from the W. pipientis strain wUni, responsible for the induction of parthenogenesis in the parasitoid wasp Muscidifurax uniraptor. The comparison of these A...

  1. An Emerging Tick-Borne Disease of Humans Is Caused by a Subset of Strains with Conserved Genome Structure

    Science.gov (United States)

    Barbet, Anthony F.; Al-Khedery, Basima; Stuen, Snorre; Granquist, Erik G.; Felsheim, Roderick F.; Munderloh, Ulrike G.

    2013-01-01

    The prevalence of tick-borne diseases is increasing worldwide. One such emerging disease is human anaplasmosis. The causative organism, Anaplasma phagocytophilum, is known to infect multiple animal species and cause human fatalities in the U.S., Europe and Asia. Although long known to infect ruminants, it is unclear why there are increasing numbers of human infections. We analyzed the genome sequences of strains infecting humans, animals and ticks from diverse geographic locations. Despite extensive variability amongst these strains, those infecting humans had conserved genome structure including the pfam01617 superfamily that encodes the major, neutralization-sensitive, surface antigen. These data provide potential targets to identify human-infective strains and have significance for understanding the selective pressures that lead to emergence of disease in new species. PMID:25437207

  2. Genome-wide identification and structure-function studies of proteases and protease inhibitors in Cicer arietinum (chickpea).

    Science.gov (United States)

    Sharma, Ranu; Suresh, C G

    2015-01-01

    Proteases are a family of enzymes present in almost all living organisms. In plants they are involved in many biological processes requiring stress response in situations such as water deficiency, pathogen attack, maintaining protein content of the cell, programmed cell death, senescence, reproduction and many more. Similarly, protease inhibitors (PIs) are involved in various important functions like suppression of invasion by pathogenic nematodes, inhibition of spores-germination and mycelium growth of Alternaria alternata and response to wounding and fungal attack. As much as we know, no genome-wide study of proteases together with proteinaceous PIs is reported in any of the sequenced genomes till now. Phylogenetic studies and domain analysis of proteases were carried out to understand the molecular evolution as well as gene and protein features. Structural analysis was carried out to explore the binding mode and affinity of PIs for cognate proteases and prolyl oligopeptidase protease with inhibitor ligand. In the study reported here, a significant number of proteases and PIs were identified in chickpea genome. The gene expression profiles of proteases and PIs in five different plant tissues revealed a differential expression pattern in more than one plant tissue. Molecular dynamics studies revealed the formation of stable complex owing to increased number of protein-ligand and inter and intramolecular protein-protein hydrogen bonds. The genome-wide identification, characterization, evolutionary understanding, gene expression, and structural analysis of proteases and PIs provide a framework for future analysis when defining their roles in stress response and developing a more stress tolerant variety of chickpea. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Visualization of Genome Diversity in German Shepherd Dogs

    OpenAIRE

    Sally-Anne Mortlock; Rachel Booth; Hamutal Mazrier; Mehar S. Khatkar; Peter Williamson

    2016-01-01

    A loss of genetic diversity may lead to increased disease risks in subpopulations of dogs. The canine breed structure has contributed to relatively small effective population size in many breeds and can limit the options for selective breeding strategies to maintain diversity. With the completion of the canine genome sequencing project, and the subsequent reduction in the cost of genotyping on a genomic scale, evaluating diversity in dogs has become much more accurate and accessible. This pro...

  4. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    Directory of Open Access Journals (Sweden)

    Feltus Frank A

    2011-07-01

    homoeologous harboring OsBRI1 orthologs present a glimpse into the switchgrass genome structure and complexity. Data obtained demonstrate the feasibility of using HICF fingerprinting to resolve the homoeologous chromosomes of the two distinct genomes in switchgrass, providing a robust and accurate BAC-based physical platform for this species. The genomic resources and sequence data generated will lay the foundation for deciphering the switchgrass genome and lead the way for an accurate genome sequencing strategy.

  5. Genomic Epidemiology of Salmonella enterica Serotype Enteritidis based on Population Structure of Prevalent Lineages

    DEFF Research Database (Denmark)

    Deng, Xiangyu; Desai, Prerak T.; den Bakker, Henk C.

    2014-01-01

    serotype Nitra strains. Single-nucleotide polymorphisms were filtered to identify 4,887 reliable loci that distinguished all isolates from each other. Our whole-genome single-nucleotide polymorphism typing approach was robust for S. enterica Enteritidis subtyping with combined data for different strains...

  6. Gene finding with a hidden Markov model of genome structure and evolution

    DEFF Research Database (Denmark)

    Pedersen, Jakob Skou; Hein, Jotun

    2003-01-01

    -specific evolutionary models based on a phylogenetic tree. All parameters can be estimated by maximum likelihood, including the phylogenetic tree. It can handle any number of aligned genomes, using their phylogenetic tree to model the evolutionary correlations. The time complexity of all algorithms used for handling...

  7. Genomic structure in Europeans dating back at least 36,200 years

    DEFF Research Database (Denmark)

    Seguin-Orlando, Andaine; Korneliussen, Thorfinn Sand; Sikora, Martin

    2014-01-01

    The origin of contemporary Europeans remains contentious. We obtained a genome sequence from Kostenki 14 in European Russia dating from 38,700 to 36,200 years ago, one of the oldest fossils of anatomically modern humans from Europe. We find that Kostenki 14 shares a close ancestry with the 24,000...

  8. Gene finding with a hidden Markov model of genome structure and evolution

    DEFF Research Database (Denmark)

    Pedersen, Jakob Skou; Hein, Jotun

    2003-01-01

    the model are linear in alignment length and genome number. The model is applied to the problem of gene finding. The benefit of modelling sequence evolution is demonstrated both in a range of simulations and on a set of orthologous human/mouse gene pairs. AVAILABILITY: Free availability over the Internet...

  9. Mitochondrial genome diversity and population structure of the giant squid Architeuthis

    DEFF Research Database (Denmark)

    Winkelmann, Inger Eleanor Hall; Campos, Paula; Strugnell, Jan

    2013-01-01

    techniques, considerable controversy exists with regard to topics as varied as their taxonomy, biology and even behaviour. In this study, we have characterized the mitochondrial genome (mitogenome) diversity of 43 Architeuthis samples collected from across the range of the species, in order to use genetic...... a recent population expansion or selective sweep, which may explain the low level of genetic diversity....

  10. Impact of nuclear organization and chromatin structure on DNA repair and genome stability

    International Nuclear Information System (INIS)

    Batte, Amandine

    2016-01-01

    The non-random organization of the eukaryotic cell nucleus and the folding of genome in chromatin more or less condensed can influence many functions related to DNA metabolism, including genome stability. Double-strand breaks (DSBs) are the most deleterious DNA damages for the cells. To preserve genome integrity, eukaryotic cells thus developed DSB repair mechanisms conserved from yeast to human, among which homologous recombination (HR) that uses an intact homologous sequence to repair a broken chromosome. HR can be separated in two sub-pathways: Gene Conversion (GC) transfers genetic information from one molecule to its homologous and Break Induced Replication (BIR) establishes a replication fork than can proceed until the chromosome end. My doctorate work was focused on the contribution of the chromatin context and 3D genome organization on DSB repair. In S. cerevisiae, nuclear organization and heterochromatin spreading at sub-telomeres can be modified through the overexpression of the Sir3 or sir3A2Q mutant proteins. We demonstrated that reducing the physical distance between homologous sequences increased GC rates, reinforcing the notion that homology search is a limiting step for recombination. We also showed that hetero-chromatinization of DSB site fine-tunes DSB resection, limiting the loss of the DSB ends required to perform homology search and complete HR. Finally, we noticed that the presence of heterochromatin at the donor locus decreased both GC and BIR efficiencies, probably by affecting strand invasion. This work highlights new regulatory pathways of DNA repair. (author) [fr

  11. Long-term response to genomic selection: effects of estimation method and reference population structure for different genetic architectures.

    Science.gov (United States)

    Bastiaansen, John W M; Coster, Albart; Calus, Mario P L; van Arendonk, Johan A M; Bovenhuis, Henk

    2012-01-24

    Genomic selection has become an important tool in the genetic improvement of animals and plants. The objective of this study was to investigate the impacts of breeding value estimation method, reference population structure, and trait genetic architecture, on long-term response to genomic selection without updating marker effects. Three methods were used to estimate genomic breeding values: a BLUP method with relationships estimated from genome-wide markers (GBLUP), a Bayesian method, and a partial least squares regression method (PLSR). A shallow (individuals from one generation) or deep reference population (individuals from five generations) was used with each method. The effects of the different selection approaches were compared under four different genetic architectures for the trait under selection. Selection was based on one of the three genomic breeding values, on pedigree BLUP breeding values, or performed at random. Selection continued for ten generations. Differences in long-term selection response were small. For a genetic architecture with a very small number of three to four quantitative trait loci (QTL), the Bayesian method achieved a response that was 0.05 to 0.1 genetic standard deviation higher than other methods in generation 10. For genetic architectures with approximately 30 to 300 QTL, PLSR (shallow reference) or GBLUP (deep reference) had an average advantage of 0.2 genetic standard deviation over the Bayesian method in generation 10. GBLUP resulted in 0.6% and 0.9% less inbreeding than PLSR and BM and on average a one third smaller reduction of genetic variance. Responses in early generations were greater with the shallow reference population while long-term response was not affected by reference population structure. The ranking of estimation methods was different with than without selection. Under selection, applying GBLUP led to lower inbreeding and a smaller reduction of genetic variance while a similar response to selection was

  12. When Is Network Lasso Accurate?

    Directory of Open Access Journals (Sweden)

    Alexander Jung

    2018-01-01

    Full Text Available The “least absolute shrinkage and selection operator” (Lasso method has been adapted recently for network-structured datasets. In particular, this network Lasso method allows to learn graph signals from a small number of noisy signal samples by using the total variation of a graph signal for regularization. While efficient and scalable implementations of the network Lasso are available, only little is known about the conditions on the underlying network structure which ensure network Lasso to be accurate. By leveraging concepts of compressed sensing, we address this gap and derive precise conditions on the underlying network topology and sampling set which guarantee the network Lasso for a particular loss function to deliver an accurate estimate of the entire underlying graph signal. We also quantify the error incurred by network Lasso in terms of two constants which reflect the connectivity of the sampled nodes.

  13. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

    Science.gov (United States)

    Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

    2018-03-01

    Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of BRCA1 (with approximately 80kb in two patients), and TP53 genes (with ∼1.6 kb in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes and the identification of such SV deletions may improve clinical testing.

  14. Intraspecies genomic diversity and natural population structure of the meat-borne lactic acid bacterium Lactobacillus sakei.

    Science.gov (United States)

    Chaillou, Stéphane; Daty, Marie; Baraige, Fabienne; Dudez, Anne-Marie; Anglade, Patricia; Jones, Rhys; Alpert, Carl-Alfred; Champomier-Vergès, Marie-Christine; Zagorec, Monique

    2009-02-01

    Lactobacillus sakei is a food-borne bacterium naturally found in meat and fish products. A study was performed to examine the intraspecies diversity among 73 isolates sourced from laboratory collections in several different countries. Pulsed-field gel electrophoresis analysis demonstrated a 25% variation in genome size between isolates, ranging from 1,815 kb to 2,310 kb. The relatedness between isolates was then determined using a PCR-based method that detects the possession of 60 chromosomal genes belonging to the flexible gene pool. Ten different strain clusters were identified that had noticeable differences in their average genome size reflecting the natural population structure. The results show that many different genotypes may be isolated from similar types of meat products, suggesting a complex ecological habitat in which intraspecies diversity may be required for successful adaptation. Finally, proteomic analysis revealed a slight difference between the migration patterns of highly abundant GapA isoforms of the two prevailing L. sakei subspecies (sakei and carnosus). This analysis was used to affiliate the genotypic clusters with the corresponding subspecies. These findings reveal for the first time the extent of intraspecies genomic diversity in L. sakei. Consequently, identification of molecular subtypes may in the future prove valuable for a better understanding of microbial ecosystems in food products.

  15. Integrating genomic information with protein sequence and 3D atomic level structure at the RCSB protein data bank.

    Science.gov (United States)

    Prlic, Andreas; Kalro, Tara; Bhattacharya, Roshni; Christie, Cole; Burley, Stephen K; Rose, Peter W

    2016-12-15

    The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relates to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome across protein sequence to 3D structure space. We developed novel software solutions for data management and visualization, while incorporating new libraries for web-based visualization using SVG graphics. The new views are available from http://www.rcsb.org and software is available from https://github.com/rcsb/. andreas.prlic@rcsb.orgSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  16. Spectrally accurate contour dynamics

    International Nuclear Information System (INIS)

    Van Buskirk, R.D.; Marcus, P.S.

    1994-01-01

    We present an exponentially accurate boundary integral method for calculation the equilibria and dynamics of piece-wise constant distributions of potential vorticity. The method represents contours of potential vorticity as a spectral sum and solves the Biot-Savart equation for the velocity by spectrally evaluating a desingularized contour integral. We use the technique in both an initial-value code and a newton continuation method. Our methods are tested by comparing the numerical solutions with known analytic results, and it is shown that for the same amount of computational work our spectral methods are more accurate than other contour dynamics methods currently in use

  17. 3D modeling of genome macroorganization on the basis of its structural changes after action of radiation

    International Nuclear Information System (INIS)

    Aleksandrov, I.D.; Aleksandrova, M.V.; Zaikin, N.S.; Koren'kov, V.V.; Pervushova, O.V.; Stepanenko, V.A.

    2006-01-01

    At present, after 120 years of the theoretical and experimental works, the issue of the genome macroarchitecture as the highest level of interphase chromosome organization in somatic cell nuclei remains still unresolved. The problem of the spatial arrangement of interphase chromosomes in haploid germ cells has never even been studied. A 3D simulation of packaging of the entire second chromosome in Drosophila mature sperms has been performed by using mathematical approaches and visualization methods to present macromolecular structure data. As genetic markers for simulation, frequency and location of the second inversion breakpoints for 72 structural υg mutants induced by ionizing radiation were used supposing that both ends of each inversion are topologically brought together forming loop of appropriate size. For the account of a degree of spatial affinity and visualization of chromosomal loops modern 3D-modeling methods with application of splines, libraries OpenGL, language Delphi, program Gmax were used. According to the model proposed, the entire second chromosome within mature sperm nuclei seems to be packaged in the form of a megarosette-loop structure which may be a basic principle of organization of the genome macro-architecture in animal haploid germ cells

  18. SHAPE analysis of the FIV Leader RNA reveals a structural switch potentially controlling viral packaging and genome dimerization.

    Science.gov (United States)

    Kenyon, Julia C; Tanner, Sian J; Legiewicz, Michal; Phillip, Pretty S; Rizvi, Tahir A; Le Grice, Stuart F J; Lever, Andrew M L

    2011-08-01

    Feline immunodeficiency virus (FIV) infects many species of cat, and is related to HIV, causing a similar pathology. High-throughput selective 2' hydroxyl acylation analysed by primer extension (SHAPE), a technique that allows structural interrogation at each nucleotide, was used to map the secondary structure of the FIV packaging signal RNA. Previous studies of this RNA showed four conserved stem-loops, extensive long-range interactions (LRIs) and a small, palindromic stem-loop (SL5) within the gag open reading frame (ORF) that may act as a dimerization initiation site (DIS), enabling the virus to package two copies of its genome. Our analyses of wild-type (wt) and mutant RNAs suggest that although the four conserved stem-loops are static structures, the 5' and 3' regions previously shown to form LRI also adopt an alternative, yet similarly conserved conformation, in which the putative DIS is occluded, and which may thus favour translational and splicing functions over encapsidation. SHAPE and in vitro dimerization assays were used to examine SL5 mutants. Dimerization contacts appear to be made between palindromic loop sequences in SL5. As this stem-loop is located within the gag ORF, recognition of a dimeric RNA provides a possible mechanism for the specific packaging of genomic over spliced viral RNAs.

  19. The genome structure of Arachis hypogaea (Linnaeus, 1753 and an induced Arachis allotetraploid revealed by molecular cytogenetics

    Directory of Open Access Journals (Sweden)

    Eliza F. de M. B. do Nascimento

    2018-03-01

    Full Text Available Peanut, Arachis hypogaea (Linnaeus, 1753 is an allotetraploid cultivated plant with two subgenomes derived from the hybridization between two diploid wild species, A. duranensis (Krapovickas & W. C. Gregory, 1994 and A. ipaensis (Krapovickas & W. C. Gregory, 1994, followed by spontaneous chromosomal duplication. To understand genome changes following polyploidy, the chromosomes of A. hypogaea, IpaDur1, an induced allotetraploid (A. ipaensis × A. duranensis4x and the diploid progenitor species were cytogenetically compared. The karyotypes of the allotetraploids share the number and general morphology of chromosomes; DAPI+ bands pattern and number of 5S rDNA loci. However, one 5S rDNA locus presents a heteromorphic FISH signal in both allotetraploids, relative to corresponding progenitor. Whilst for A. hypogaea the number of 45S rDNA loci was equivalent to the sum of those present in the diploid species, in IpaDur1, two loci have not been detected. Overall distribution of repetitive DNA sequences was similar in both allotetraploids, although A. hypogaea had additional CMA3+ bands and few slight differences in the LTR-retrotransposons distribution compared to IpaDur1. GISH showed that the chromosomes of both allotetraploids had preferential hybridization to their corresponding diploid genomes. Nevertheless, at least one pair of IpaDur1 chromosomes had a clear mosaic hybridization pattern indicating recombination between the subgenomes, clear evidence that the genome of IpaDur1 shows some instability comparing to the genome of A. hypogaea that shows no mosaic of subgenomes, although both allotetraploids derive from the same progenitor species. For some reasons, the chromosome structure of A. hypogaea is inherently more stable, or, it has been at least, partially stabilized through genetic changes and selection.

  20. Accurate phylogenetic classification of DNA fragments based onsequence composition

    Energy Technology Data Exchange (ETDEWEB)

    McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis; Hugenholtz, Philip; Rigoutsos, Isidore

    2006-05-01

    Metagenome studies have retrieved vast amounts of sequenceout of a variety of environments, leading to novel discoveries and greatinsights into the uncultured microbial world. Except for very simplecommunities, diversity makes sequence assembly and analysis a verychallenging problem. To understand the structure a 5 nd function ofmicrobial communities, a taxonomic characterization of the obtainedsequence fragments is highly desirable, yet currently limited mostly tothose sequences that contain phylogenetic marker genes. We show that forclades at the rank of domain down to genus, sequence composition allowsthe very accurate phylogenetic 10 characterization of genomic sequence.We developed a composition-based classifier, PhyloPythia, for de novophylogenetic sequence characterization and have trained it on adata setof 340 genomes. By extensive evaluation experiments we show that themethodis accurate across all taxonomic ranks considered, even forsequences that originate fromnovel organisms and are as short as 1kb.Application to two metagenome datasets 15 obtained from samples ofphosphorus-removing sludge showed that the method allows the accurateclassification at genus level of most sequence fragments from thedominant populations, while at the same time correctly characterizingeven larger parts of the samples at higher taxonomic levels.

  1. The complete genome structure and phylogenetic relationship of infectious hematopoietic necrosis virus

    Science.gov (United States)

    Morzunov , Sergey P.; Winton, James R.; Nichol, Stuart T.

    1995-01-01

    Infectious hematopoietic necrosis virus (IHNV), a member of the family Rhabdoviridae, causes a severe disease with high mortality in salmonid fish. The nucleotide sequence (11, 131 bases) of the entire genome was determined for the pathogenic WRAC strain of IHNV from southern Idaho. This allowed detailed analysis of all 6 genes, the deduced amino acid sequences of their encoded proteins, and important control motifs including leader, trailer and gene junction regions. Sequence analysis revealed that the 6 virus genes are located along the genome in the 3′ to 5′ order: nucleocapsid (N), polymerase-associated phosphoprotein (P or M1), matrix protein (M or M2), surface glycoprotein (G), a unique non-virion protein (NV) and virus polymerase (L). The IHNV genome RNA was found to have highly complementary termini (15 of 16 nucleotides). The gene junction regions display the highly conserved sequence UCURUC(U)7RCCGUG(N)4CACR (in the vRNA sense), which includes the typical rhabdovirus transcription termination/polyadenylation signal and a novel putative transcription initiation signal. Phylogenetic analysis of M, G and L protein sequences allowed insights into the evolutionary and taxonomic relationship of rhabdoviruses of fish relative to those of insects or mammals, and a broader sense of the relationship of non-segmented negative-strand RNA viruses. Based on these data, a new genus, piscivirus, is proposed which will initially contain IHNV, viral hemorrhagic septicemia virus and Hirame rhabdovirus.

  2. Structurally Complex Organization of Repetitive DNAs in the Genome of Cobia (Rachycentron canadum).

    Science.gov (United States)

    Costa, Gideão W W F; Cioffi, Marcelo de B; Bertollo, Luiz A C; Molina, Wagner F

    2015-06-01

    Repetitive DNAs comprise the largest fraction of the eukaryotic genome. They include microsatellites or simple sequence repeats (SSRs), which play an important role in the chromosome differentiation among fishes. Rachycentron canadum is the only representative of the family Rachycentridae. This species has been focused on several multidisciplinary studies in view of its important potential for marine fish farming. In the present study, distinct classes of repetitive DNAs, with emphasis on SSRs, were mapped in the chromosomes of this species to improve the knowledge of its genome organization. Microsatellites exhibited a diversified distribution, both dispersed in euchromatin and clustered in the heterochromatin. The multilocus location of SSRs strengthened the heterochromatin heterogeneity in this species, as suggested by some previous studies. The colocalization of SSRs with retrotransposons and transposons pointed to a close evolutionary relationship between these repetitive sequences. A number of heterochromatic regions highlighted a greater complex organization than previously supposed, harboring a diversity of repetitive elements. In this sense, there was also evidence of colocalization of active genetic regions and different classes of repetitive DNAs in a common heterochromatic region, which offers a potential opportunity for further researches regarding the interaction of these distinct fractions in fish genomes.

  3. The genome of obligately intracellular Ehrlichia canis revealsthemes of complex membrane structure and immune evasion strategies

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, K.; Kuyler Doyle, C.; Lykidis, A.; Ivanova, N.; Francino, P.; Chain, P.; Shin, M.; Malfatti, S.; Larimer, F.; Copeland,A.; Detter, J.C.; Land, M.; Richardson, P.M.; Yu, X.J.; Walker, D.H.; McBride, J.W.; Kyrpides, N.C.

    2005-09-01

    Ehrlichia canis, a small obligately intracellular, tick-transmitted, gram-negative, a-proteobacterium is the primary etiologic agent of globally distributed canine monocytic ehrlichiosis. Complete genome sequencing revealed that the E. canis genome consists of a single circular chromosome of 1,315,030 bp predicted to encode 925 proteins, 40 stable RNA species, and 17 putative pseudogenes, and a substantial proportion of non-coding sequence (27 percent). Interesting genome features include a large set of proteins with transmembrane helices and/or signal sequences, and a unique serine-threonine bias associated with the potential for O-glycosylation that was prominent in proteins associated with pathogen-host interactions. Furthermore, two paralogous protein families associated with immune evasion were identified, one of which contains poly G:C tracts, suggesting that they may play a role in phase variation and facilitation of persistent infections. Proteins associated with pathogen-host interactions were identified including a small group of proteins (12) with tandem repeats and another with eukaryotic-like ankyrin domains (7).

  4. Genome-wide analysis of the expansin gene superfamily reveals grapevine-specific structural and functional characteristics.

    Directory of Open Access Journals (Sweden)

    Silvia Dal Santo

    Full Text Available BACKGROUND: Expansins are proteins that loosen plant cell walls in a pH-dependent manner, probably by increasing the relative movement among polymers thus causing irreversible expansion. The expansin superfamily (EXP comprises four distinct families: expansin A (EXPA, expansin B (EXPB, expansin-like A (EXLA and expansin-like B (EXLB. There is experimental evidence that EXPA and EXPB proteins are required for cell expansion and developmental processes involving cell wall modification, whereas the exact functions of EXLA and EXLB remain unclear. The complete grapevine (Vitis vinifera genome sequence has allowed the characterization of many gene families, but an exhaustive genome-wide analysis of expansin gene expression has not been attempted thus far. METHODOLOGY/PRINCIPAL FINDINGS: We identified 29 EXP superfamily genes in the grapevine genome, representing all four EXP families. Members of the same EXP family shared the same exon-intron structure, and phylogenetic analysis confirmed a closer relationship between EXP genes from woody species, i.e. grapevine and poplar (Populus trichocarpa, compared to those from Arabidopsis thaliana and rice (Oryza sativa. We also identified grapevine-specific duplication events involving the EXLB family. Global gene expression analysis confirmed a strong correlation among EXP genes expressed in mature and green/vegetative samples, respectively, as reported for other gene families in the recently-published grapevine gene expression atlas. We also observed the specific co-expression of EXLB genes in woody organs, and the involvement of certain grapevine EXP genes in berry development and post-harvest withering. CONCLUSION: Our comprehensive analysis of the grapevine EXP superfamily confirmed and extended current knowledge about the structural and functional characteristics of this gene family, and also identified properties that are currently unique to grapevine expansin genes. Our data provide a model for the

  5. Whole-genome sequence, SNP chips and pedigree structure: building demographic profiles in domestic dog breeds to optimize genetic-trait mapping

    Science.gov (United States)

    Dreger, Dayna L.; Rimbault, Maud; Davis, Brian W.; Bhatnagar, Adrienne; Parker, Heidi G.

    2016-01-01

    ABSTRACT In the decade following publication of the draft genome sequence of the domestic dog, extraordinary advances with application to several fields have been credited to the canine genetic system. Taking advantage of closed breeding populations and the subsequent selection for aesthetic and behavioral characteristics, researchers have leveraged the dog as an effective natural model for the study of complex traits, such as disease susceptibility, behavior and morphology, generating unique contributions to human health and biology. When designing genetic studies using purebred dogs, it is essential to consider the unique demography of each population, including estimation of effective population size and timing of population bottlenecks. The analytical design approach for genome-wide association studies (GWAS) and analysis of whole-genome sequence (WGS) experiments are inextricable from demographic data. We have performed a comprehensive study of genomic homozygosity, using high-depth WGS data for 90 individuals, and Illumina HD SNP data from 800 individuals representing 80 breeds. These data were coupled with extensive pedigree data analyses for 11 breeds that, together, allowed us to compute breed structure, demography, and molecular measures of genome diversity. Our comparative analyses characterize the extent, formation and implication of breed-specific diversity as it relates to population structure. These data demonstrate the relationship between breed-specific genome dynamics and population architecture, and provide important considerations influencing the technological and cohort design of association and other genomic studies. PMID:27874836

  6. Exploiting the functional and taxonomic structure of genomic data by probabilistic topic modeling.

    Science.gov (United States)

    Chen, Xin; Hu, Xiaohua; Lim, Tze Y; Shen, Xiajiong; Park, E K; Rosen, Gail L

    2012-01-01

    In this paper, we present a method that enable both homology-based approach and composition-based approach to further study the functional core (i.e., microbial core and gene core, correspondingly). In the proposed method, the identification of major functionality groups is achieved by generative topic modeling, which is able to extract useful information from unlabeled data. We first show that generative topic model can be used to model the taxon abundance information obtained by homology-based approach and study the microbial core. The model considers each sample as a “document,” which has a mixture of functional groups, while each functional group (also known as a “latent topic”) is a weight mixture of species. Therefore, estimating the generative topic model for taxon abundance data will uncover the distribution over latent functions (latent topic) in each sample. Second, we show that, generative topic model can also be used to study the genome-level composition of “N-mer” features (DNA subreads obtained by composition-based approaches). The model consider each genome as a mixture of latten genetic patterns (latent topics), while each functional pattern is a weighted mixture of the “N-mer” features, thus the existence of core genomes can be indicated by a set of common N-mer features. After studying the mutual information between latent topics and gene regions, we provide an explanation of the functional roles of uncovered latten genetic patterns. The experimental results demonstrate the effectiveness of proposed method.

  7. ReMixT: clone-specific genomic structure estimation in cancer.

    Science.gov (United States)

    McPherson, Andrew W; Roth, Andrew; Ha, Gavin; Chauve, Cedric; Steif, Adi; de Souza, Camila P E; Eirew, Peter; Bouchard-Côté, Alexandre; Aparicio, Sam; Sahinalp, S Cenk; Shah, Sohrab P

    2017-07-27

    Somatic evolution of malignant cells produces tumors composed of multiple clonal populations, distinguished in part by rearrangements and copy number changes affecting chromosomal segments. Whole genome sequencing mixes the signals of sampled populations, diluting the signals of clone-specific aberrations, and complicating estimation of clone-specific genotypes. We introduce ReMixT, a method to unmix tumor and contaminating normal signals and jointly predict mixture proportions, clone-specific segment copy number, and clone specificity of breakpoints. ReMixT is free, open-source software and is available at http://bitbucket.org/dranew/remixt .

  8. Gene design, cloning and protein-expression methods for high-value targets at the Seattle Structural Genomics Center for Infectious Disease

    International Nuclear Information System (INIS)

    Raymond, Amy; Haffner, Taryn; Ng, Nathan; Lorimer, Don; Staker, Bart; Stewart, Lance

    2011-01-01

    An overview of one salvage strategy for high-value SSGCID targets is given. Any structural genomics endeavor, particularly ambitious ones such as the NIAID-funded Seattle Structural Genomics Center for Infectious Disease (SSGCID) and Center for Structural Genomics of Infectious Disease (CSGID), face technical challenges at all points of the production pipeline. One salvage strategy employed by SSGCID is combined gene engineering and structure-guided construct design to overcome challenges at the levels of protein expression and protein crystallization. Multiple constructs of each target are cloned in parallel using Polymerase Incomplete Primer Extension cloning and small-scale expressions of these are rapidly analyzed by capillary electrophoresis. Using the methods reported here, which have proven particularly useful for high-value targets, otherwise intractable targets can be resolved

  9. Functional annotation by sequence-weighted structure alignments: statistical analysis and case studies from the Protein 3000 structural genomics project in Japan.

    Science.gov (United States)

    Standley, Daron M; Toh, Hiroyuki; Nakamura, Haruki

    2008-09-01

    A method to functionally annotate structural genomics targets, based on a novel structural alignment scoring function, is proposed. In the proposed score, position-specific scoring matrices are used to weight structurally aligned residue pairs to highlight evolutionarily conserved motifs. The functional form of the score is first optimized for discriminating domains belonging to the same Pfam family from domains belonging to different families but the same CATH or SCOP superfamily. In the optimization stage, we consider four standard weighting functions as well as our own, the "maximum substitution probability," and combinations of these functions. The optimized score achieves an area of 0.87 under the receiver-operating characteristic curve with respect to identifying Pfam families within a sequence-unique benchmark set of domain pairs. Confidence measures are then derived from the benchmark distribution of true-positive scores. The alignment method is next applied to the task of functionally annotating 230 query proteins released to the public as part of the Protein 3000 structural genomics project in Japan. Of these queries, 78 were found to align to templates with the same Pfam family as the query or had sequence identities > or = 30%. Another 49 queries were found to match more distantly related templates. Within this group, the template predicted by our method to be the closest functional relative was often not the most structurally similar. Several nontrivial cases are discussed in detail. Finally, 103 queries matched templates at the fold level, but not the family or superfamily level, and remain functionally uncharacterized. 2008 Wiley-Liss, Inc.

  10. Embarras de richesses - It is not good to be too anomalous: Accurate structure of selenourea, a chiral crystal of planar molecules.

    Science.gov (United States)

    Luo, Zhipu; Dauter, Zbigniew

    2017-01-01

    Selenourea, SeC(NH2)2, recently found an application as a derivatization reagent providing a significant anomalous diffraction signal used for phasing macromolecular crystal structures. The crystal structure of selenourea itself was solved about 50 years ago, from data recorded on films and evaluated by eye and refined to R = 0.15 with errors of bond lengths and angles about 0.1 Å and 6°. In the current work this structure is re-evaluated on the basis of synchrotron data and refined to R1 = 0.021 with bond and angle errors about 0.007 Å and 0.5°. The nine planar molecules of selenourea pack either in the P31 or in the P32 unit cell. All unique molecules are connected by a complex network of Se•••H-N hydrogen bonds and Se•••Se contacts. The packing of selenourea molecules is highly pseudosymmetric, approximating either of the P31(2)12, R3, and R32 space groups. Because the overwhelming majority of diffracted X-ray intensity originates form the anomalously scattering selenium atoms, the measurable anomalous Bijvoet differences are diminished and it was not possible to solve this crystal structure based on the anomalous signal alone.

  11. Structural and genomic properties of the hyperthermophilic archaeal virus ATV with an extracellular stage of the reproductive cycle.

    Science.gov (United States)

    Prangishvili, David; Vestergaard, Gisle; Häring, Monika; Aramayo, Ricardo; Basta, Tamara; Rachel, Reinhard; Garrett, Roger A

    2006-06-23

    A novel virus, ATV, of the hyperthermophilic archaeal genus Acidianus has the unique property of undergoing a major morphological development outside of, and independently of, the host cell. Virions are extruded from host cells as lemon-shaped tail-less particles, after which they develop long tails at each pointed end, at temperatures close to that of the natural habitat, 85 degrees C. The extracellularly developed tails constitute tubes, which terminate in an anchor-like structure that is not observed in the tail-less particles. A thin filament is located within the tube, which exhibits a periodic structure. Tail development produces a one half reduction in the volume of the virion, concurrent with a slight expansion of the virion surface. The circular, double-stranded DNA genome contains 62,730 bp and is exceptional for a crenarchaeal virus in that it carries four putative transposable elements as well as genes, which previously have been associated only with archaeal self-transmissable plasmids. In total, it encodes 72 predicted proteins, including 11 structural proteins with molecular masses in the range of 12 to 90 kDa. Several of the larger proteins are rich in coiled coil and/or low complexity sequence domains, which are unusual for archaea. One protein, in particular P800, resembles an intermediate filament protein in its structural properties. It is modified in the two-tailed, but not in the tail-less, virion particles and it may contribute to viral tail development. Exceptionally for a crenarchaeal virus, infection with ATV results either in viral replication and subsequent cell lysis or in conversion of the infected cell to a lysogen. The lysogenic cycle involves integration of the viral genome into the host chromosome, probably facilitated by the virus-encoded integrase and this process can be interrupted by different stress factors.

  12. Structural diversity and dynamics of genomic replication origins in Schizosaccharomyces pombe

    Science.gov (United States)

    Cotobal, Cristina; Segurado, Mónica; Antequera, Francisco

    2010-01-01

    DNA replication origins (ORI) in Schizosaccharomyces pombe colocalize with adenine and thymine (A+T)-rich regions, and earlier analyses have established a size from 0.5 to over 3 kb for a DNA fragment to drive replication in plasmid assays. We have asked what are the requirements for ORI function in the chromosomal context. By designing artificial ORIs, we have found that A+T-rich fragments as short as 100 bp without homology to S. pombe DNA are able to initiate replication in the genome. On the other hand, functional dissection of endogenous ORIs has revealed that some of them span a few kilobases and include several modules that may be as short as 25–30 contiguous A+Ts capable of initiating replication from ectopic chromosome positions. The search for elements with these characteristics across the genome has uncovered an earlier unnoticed class of low-efficiency ORIs that fire late during S phase. These results indicate that ORI specification and dynamics varies widely in S. pombe, ranging from very short elements to large regions reminiscent of replication initiation zones in mammals. PMID:20094030

  13. Vitamin D receptor signaling and its therapeutic implications: Genome-wide and structural view.

    Science.gov (United States)

    Carlberg, Carsten; Molnár, Ferdinand

    2015-05-01

    Vitamin D3 is one of the few natural compounds that has, via its metabolite 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) and the transcription factor vitamin D receptor (VDR), a direct effect on gene regulation. For efficiently applying the therapeutic and disease-preventing potential of 1,25(OH)2D3 and its synthetic analogs, the key steps in vitamin D signaling need to be understood. These are the different types of molecular interactions with the VDR, such as (i) the complex formation of VDR with genomic DNA, (ii) the interaction of VDR with its partner transcription factors, (iii) the binding of 1,25(OH)2D3 or its synthetic analogs within the ligand-binding pocket of the VDR, and (iv) the resulting conformational change on the surface of the VDR leading to a change of the protein-protein interaction profile of the receptor with other proteins. This review will present the latest genome-wide insight into vitamin D signaling, and will discuss its therapeutic implications.

  14. De novo prediction of structured RNAs from genomic sequences

    DEFF Research Database (Denmark)

    Gorodkin, Jan; Hofacker, Ivo L.; Þórarinsson, Elfar

    2010-01-01

    currently available, because evolutionary conservation highlights functionally important regions. Conserved secondary structure, rather than primary sequence, is the hallmark of many functionally important RNAs, because compensatory substitutions in base-paired regions preserve structure. Unfortunately...

  15. FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

    Science.gov (United States)

    Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

    2009-12-01

    , industrial and physical applications. However, despite recent modelling advances, the accurate numerical solution of the equations governing such problems is still at a relatively early stage. Indeed, recent studies employing a simplifying long-wave approximation have shown that highly efficient numerical methods are necessary to solve the resulting lubrication equations in order to achieve the level of grid resolution required to accurately capture the effects of micro- and nano-scale topographical features. Solution method: A portable parallel multigrid algorithm has been developed for the above purpose, for the particular case of flow over submerged topographical features. Within the multigrid framework adopted, a W-cycle is used to accelerate convergence in respect of the time dependent nature of the problem, with relaxation sweeps performed using a fixed number of pre- and post-Red-Black Gauss-Seidel Newton iterations. In addition, the algorithm incorporates automatic adaptive time-stepping to avoid the computational expense associated with repeated time-step failure. Running time: 1.31 minutes using 128 processors on BlueGene/P with a problem size of over 16.7 million mesh points.

  16. Structure and possible function of a G-quadruplex in the long terminal repeat of the proviral HIV-1 genome.

    Science.gov (United States)

    De Nicola, Beatrice; Lech, Christopher J; Heddi, Brahim; Regmi, Sagar; Frasson, Ilaria; Perrone, Rosalba; Richter, Sara N; Phan, Anh Tuân

    2016-07-27

    The long terminal repeat (LTR) of the proviral human immunodeficiency virus (HIV)-1 genome is integral to virus transcription and host cell infection. The guanine-rich U3 region within the LTR promoter, previously shown to form G-quadruplex structures, represents an attractive target to inhibit HIV transcription and replication. In this work, we report the structure of a biologically relevant G-quadruplex within the LTR promoter region of HIV-1. The guanine-rich sequence designated LTR-IV forms a well-defined structure in physiological cationic solution. The nuclear magnetic resonance (NMR) structure of this sequence reveals a parallel-stranded G-quadruplex containing a single-nucleotide thymine bulge, which participates in a conserved stacking interaction with a neighboring single-nucleotide adenine loop. Transcription analysis in a HIV-1 replication competent cell indicates that the LTR-IV region may act as a modulator of G-quadruplex formation in the LTR promoter. Consequently, the LTR-IV G-quadruplex structure presented within this work could represent a valuable target for the design of HIV therapeutics. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Insights in the electronic structure and redox reaction energy in LiFePO4 battery material from an accurate Tran-Blaha modified Becke Johnson potential

    International Nuclear Information System (INIS)

    Araujo, Rafael B.; Almeida, J. de S; Ferreira da Silva, A.; Ahuja, Rajeev

    2015-01-01

    The main goals of this paper are to investigate the accuracy of the Tran-Blaha modified Becke Johnson (TB-mBJ) potential to predict the electronic structure of lithium iron phosphate and the related redox reaction energy with the lithium deintercalation process. The computed electronic structures show that the TB-mBJ method is able to partially localize Fe-3d electrons in LiFePO 4 and FePO 4 which usually is a problem for the generalized gradient approximation (GGA) due to the self interaction error. The energy band gap is also improved by the TB-mBJ calculations in comparison with the GGA results. It turned out, however, that the redox reaction energy evaluated by the TB-mBJ technique is not in good agreement with the measured one. It is speculated that this disagreement in the computed redox energy and the experimental value is due to the lack of a formal expression to evaluate the exchange and correlation energy. Therefore, the TB-mBJ is an efficient method to improve the prediction of the electronic structures coming form the standard GGA functional in LiFePO 4 and FePO 4 . However, it does not appear to have the same efficiency for evaluating the redox reaction energies for the investigated system

  18. Accurate Assessment of the Oxygen Reduction Electrocatalytic Activity of Mn/Polypyrrole Nanocomposites Based on Rotating Disk Electrode Measurements, Complemented with Multitechnique Structural Characterizations

    Science.gov (United States)

    Sánchez, Carolina Ramírez; Taurino, Antonietta; Bozzini, Benedetto

    2016-01-01

    This paper reports on the quantitative assessment of the oxygen reduction reaction (ORR) electrocatalytic activity of electrodeposited Mn/polypyrrole (PPy) nanocomposites for alkaline aqueous solutions, based on the Rotating Disk Electrode (RDE) method and accompanied by structural characterizations relevant to the establishment of structure-function relationships. The characterization of Mn/PPy films is addressed to the following: (i) morphology, as assessed by Field-Emission Scanning Electron Microscopy (FE-SEM) and Atomic Force Microscope (AFM); (ii) local electrical conductivity, as measured by Scanning Probe Microscopy (SPM); and (iii) molecular structure, accessed by Raman Spectroscopy; these data provide the background against which the electrocatalytic activity can be rationalised. For comparison, the properties of Mn/PPy are gauged against those of graphite, PPy, and polycrystalline-Pt (poly-Pt). Due to the literature lack of accepted protocols for precise catalytic activity measurement at poly-Pt electrode in alkaline solution using the RDE methodology, we have also worked on the obtainment of an intralaboratory benchmark by evidencing some of the time-consuming parameters which drastically affect the reliability and repeatability of the measurement. PMID:28042491

  19. Accurate Assessment of the Oxygen Reduction Electrocatalytic Activity of Mn/Polypyrrole Nanocomposites Based on Rotating Disk Electrode Measurements, Complemented with Multitechnique Structural Characterizations

    Directory of Open Access Journals (Sweden)

    Patrizia Bocchetta

    2016-01-01

    Full Text Available This paper reports on the quantitative assessment of the oxygen reduction reaction (ORR electrocatalytic activity of electrodeposited Mn/polypyrrole (PPy nanocomposites for alkaline aqueous solutions, based on the Rotating Disk Electrode (RDE method and accompanied by structural characterizations relevant to the establishment of structure-function relationships. The characterization of Mn/PPy films is addressed to the following: (i morphology, as assessed by Field-Emission Scanning Electron Microscopy (FE-SEM and Atomic Force Microscope (AFM; (ii local electrical conductivity, as measured by Scanning Probe Microscopy (SPM; and (iii molecular structure, accessed by Raman Spectroscopy; these data provide the background against which the electrocatalytic activity can be rationalised. For comparison, the properties of Mn/PPy are gauged against those of graphite, PPy, and polycrystalline-Pt (poly-Pt. Due to the literature lack of accepted protocols for precise catalytic activity measurement at poly-Pt electrode in alkaline solution using the RDE methodology, we have also worked on the obtainment of an intralaboratory benchmark by evidencing some of the time-consuming parameters which drastically affect the reliability and repeatability of the measurement.

  20. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Lai-Ping Wong

    2014-05-01

    Full Text Available South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP. The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP. SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  1. Insights into the genetic structure and diversity of 38 South Asian Indians from deep whole-genome sequencing.

    Science.gov (United States)

    Wong, Lai-Ping; Lai, Jason Kuan-Han; Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2014-05-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language-speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP.

  2. Elevated Rate of Genome Rearrangements in Radiation-Resistant Bacteria.

    Science.gov (United States)

    Repar, Jelena; Supek, Fran; Klanjscek, Tin; Warnecke, Tobias; Zahradka, Ksenija; Zahradka, Davor

    2017-04-01

    A number of bacterial, archaeal, and eukaryotic species are known for their resistance to ionizing radiation. One of the challenges these species face is a potent environmental source of DNA double-strand breaks, potential drivers of genome structure evolution. Efficient and accurate DNA double-strand break repair systems have been demonstrated in several unrelated radiation-resistant species and are putative adaptations to the DNA damaging environment. Such adaptations are expected to compensate for the genome-destabilizing effect of environmental DNA damage and may be expected to result in a more conserved gene order in radiation-resistant species. However, here we show that rates of genome rearrangements, measured as loss of gene order conservation with time, are higher in radiation-resistant species in multiple, phylogenetically independent groups of bacteria. Comparison of indicators of selection for genome organization between radiation-resistant and phylogenetically matched, nonresistant species argues against tolerance to disruption of genome structure as a strategy for radiation resistance. Interestingly, an important mechanism affecting genome rearrangements in prokaryotes, the symmetrical inversions around the origin of DNA replication, shapes genome structure of both radiation-resistant and nonresistant species. In conclusion, the opposing effects of environmental DNA damage and DNA repair result in elevated rates of genome rearrangements in radiation-resistant bacteria. Copyright © 2017 Repar et al.

  3. Accurate and efficient gp120 V3 loop structure based models for the determination of HIV-1 co-receptor usage

    Directory of Open Access Journals (Sweden)

    Vaisman Iosif I

    2010-10-01

    Full Text Available Abstract Background HIV-1 targets human cells expressing both the CD4 receptor, which binds the viral envelope glycoprotein gp120, as well as either the CCR5 (R5 or CXCR4 (X4 co-receptors, which interact primarily with the third hypervariable loop (V3 loop of gp120. Determination of HIV-1 affinity for either the R5 or X4 co-receptor on host cells facilitates the inclusion of co-receptor antagonists as a part of patient treatment strategies. A dataset of 1193 distinct gp120 V3 loop peptide sequences (989 R5-utilizing, 204 X4-capable is utilized to train predictive classifiers based on implementations of random forest, support vector machine, boosted decision tree, and neural network machine learning algorithms. An in silico mutagenesis procedure employing multibody statistical potentials, computational geometry, and threading of variant V3 sequences onto an experimental structure, is used to generate a feature vector representation for each variant whose components measure environmental perturbations at corresponding structural positions. Results Classifier performance is evaluated based on stratified 10-fold cross-validation, stratified dataset splits (2/3 training, 1/3 validation, and leave-one-out cross-validation. Best reported values of sensitivity (85%, specificity (100%, and precision (98% for predicting X4-capable HIV-1 virus, overall accuracy (97%, Matthew's correlation coefficient (89%, balanced error rate (0.08, and ROC area (0.97 all reach critical thresholds, suggesting that the models outperform six other state-of-the-art methods and come closer to competing with phenotype assays. Conclusions The trained classifiers provide instantaneous and reliable predictions regarding HIV-1 co-receptor usage, requiring only translated V3 loop genotypes as input. Furthermore, the novelty of these computational mutagenesis based predictor attributes distinguishes the models as orthogonal and complementary to previous methods that utilize sequence

  4. The Devil is in the Details: Using X-Ray Computed Tomography to Develop Accurate 3D Grain Characteristics and Bed Structure Metrics for Gravel Bed Rivers

    Science.gov (United States)

    Voepel, H.; Hodge, R. A.; Leyland, J.; Sear, D. A.; Ahmed, S. I.

    2014-12-01

    Uncertainty for bedload estimates in gravel bed rivers is largely driven by our inability to characterize the arrangement and orientation of the sediment grains within the bed. The characteristics of the surface structure are produced by the water working of grains, which leads to structural differences in bedforms through differential patterns of grain sorting, packing, imbrication, mortaring and degree of bed armoring. Until recently the technical and logistical difficulties of characterizing the arrangement of sediment in 3D have prohibited a full understanding of how grains interact with stream flow and the feedback mechanisms that exist. Micro-focus X-ray CT has been used for non-destructive 3D imaging of grains within a series of intact sections of river bed taken from key morphological units (see Figure 1). Volume, center of mass, points of contact, protrusion and spatial orientation of individual surface grains are derived from these 3D images, which in turn, facilitates estimates of 3D static force properties at the grain-scale such as pivoting angles, buoyancy and gravity forces, and grain exposure. By aggregating representative samples of grain-scale properties of localized interacting sediment into overall metrics, we can compare and contrast bed stability at a macro-scale with respect to stream bed morphology. Understanding differences in bed stability through representative metrics derived at the grain-scale will ultimately lead to improved bedload estimates with reduced uncertainty and increased understanding of interactions between grain-scale properties on channel morphology. Figure 1. CT-Scans of a water worked gravel-filled pot. a. 3D rendered scan showing the outer mesh, and b. the same pot with the mesh removed. c. vertical change in porosity of the gravels sampled in 5mm volumes. Values are typical of those measured in the field and lab. d. 2-D slices through the gravels at 20% depth from surface (porosity = 0.35), and e. 75% depth from

  5. From the genome to the phenome and back: linking genes with human brain function and structure using genetically informed neuroimaging

    DEFF Research Database (Denmark)

    Siebner, H R; Callicott, J H; Sommer, T

    2009-01-01

    In recent years, an array of brain mapping techniques has been successfully employed to link individual differences in circuit function or structure in the living human brain with individual variations in the human genome. Several proof-of-principle studies provided converging evidence that brain...... imaging can establish important links between genes and behaviour. The overarching goal is to use genetically informed brain imaging to pinpoint neurobiological mechanisms that contribute to behavioural intermediate phenotypes or disease states. This special issue on "Linking Genes to Brain Function...... in Health and Disease" provides an overview over how the "imaging genetics" approach is currently applied in the various fields of systems neuroscience to reveal the genetic underpinnings of complex behaviours and brain diseases. While the rapidly emerging field of imaging genetics holds great promise...

  6. Correlation between sequence conservation and structural thermodynamics of microRNA precursors from human, mouse, and chicken genomes

    Directory of Open Access Journals (Sweden)

    Wang Shengqi

    2010-10-01

    Full Text Available Abstract Background Previous studies have shown that microRNA precursors (pre-miRNAs have considerably more stable secondary structures than other native RNAs (tRNA, rRNA, and mRNA and artificial RNA sequences. However, pre-miRNAs with ultra stable secondary structures have not been investigated. It is not known if there is a tendency in pre-miRNA sequences towards or against ultra stable structures? Furthermore, the relationship between the structural thermodynamic stability of pre-miRNA and their evolution remains unclear. Results We investigated the correlation between pre-miRNA sequence conservation and structural stability as measured by adjusted minimum folding free energies in pre-miRNAs isolated from human, mouse, and chicken. The analysis revealed that conserved and non-conserved pre-miRNA sequences had structures with similar average stabilities. However, the relatively ultra stable and unstable pre-miRNAs were more likely to be non-conserved than pre-miRNAs with moderate stability. Non-conserved pre-miRNAs had more G+C than A+U nucleotides, while conserved pre-miRNAs contained more A+U nucleotides. Notably, the U content of conserved pre-miRNAs was especially higher than that of non-conserved pre-miRNAs. Further investigations showed that conserved and non-conserved pre-miRNAs exhibited different structural element features, even though they had comparable levels of stability. Conclusions We proposed that there is a correlation between structural thermodynamic stability and sequence conservation for pre-miRNAs from human, mouse, and chicken genomes. Our analyses suggested that pre-miRNAs with relatively ultra stable or unstable structures were less favoured by natural selection than those with moderately stable structures. Comparison of nucleotide compositions between non-conserved and conserved pre-miRNAs indicated the importance of U nucleotides in the pre-miRNA evolutionary process. Several characteristic structural elements were

  7. Genomic effects on advertisement call structure in diploid and triploid hybrid waterfrogs (Anura, Pelophylax esculentus).

    Science.gov (United States)

    Hoffmann, Alexandra; Reyer, Heinz-Ulrich

    2013-12-04

    In anurans, differences in male mating calls have intensively been studied with respect to taxonomic classification, phylogeographic comparisons among different populations and sexual selection. Although overall successful, there is often much unexplained variation in these studies. Potential causes for such variation include differences among genotypes and breeding systems, as well as differences between populations. We investigated how these three factors affect call properties in male water frogs of Pelophylax lessonae (genotype LL), P. ridibundus (RR) and their interspecific hybrid P. esculentus which comes in diploid (LR) and triploid types (LLR, LRR). We investigated five call parameters that all showed a genomic dosage effect, i.e. they either decreased or increased with the L/R ratio in the order LL-LLR-LR-LRR-RR. Not all parameters differentiated equally well between the five genotypes, but combined they provided a good separation. Two of the five call parameters were also affected by the breeding system. Calls of diploid LR males varied, depending on whether these males mated with one or both of the parental species (diploid systems) or triploid hybrids (mixed ploidy systems). With the exception of the northernmost mixed-ploidy population, call differences were not related to the geographic location of the population and they were not correlated with genetic distances in the R and L genomes. We found an influence of all three tested factors on call parameters, with the effect size decreasing from genotype through breeding system to geographic location of the population. Overall, results were in line with predictions from a dosage effect in L/R ratios, but in three call parameters all three hybrid types were more similar to one or the other parental species. Also calls of diploid hybrids varied between breeding systems in agreement with the sexual host required for successful reproduction. The lack of hybrid call differences in a mixed-ploidy population at

  8. Band-structure calculations of noble-gas and alkali halide solids using accurate Kohn-Sham potentials with self-interaction correction

    International Nuclear Information System (INIS)

    Li, Y.; Krieger, J.B.; Norman, M.R.; Iafrate, G.J.

    1991-01-01

    The optimized-effective-potential (OEP) method and a method developed recently by Krieger, Li, and Iafrate (KLI) are applied to the band-structure calculations of noble-gas and alkali halide solids employing the self-interaction-corrected (SIC) local-spin-density (LSD) approximation for the exchange-correlation energy functional. The resulting band gaps from both calculations are found to be in fair agreement with the experimental values. The discrepancies are typically within a few percent with results that are nearly the same as those of previously published orbital-dependent multipotential SIC calculations, whereas the LSD results underestimate the band gaps by as much as 40%. As in the LSD---and it is believed to be the case even for the exact Kohn-Sham potential---both the OEP and KLI predict valence-band widths which are narrower than those of experiment. In all cases, the KLI method yields essentially the same results as the OEP

  9. Structural variation and rates of genome evolution in the grass family seen through comparison of sequences of genomes greatly differing in size.

    Science.gov (United States)

    Dvorak, Jan; Wang, Le; Zhu, Tingting; Jorgensen, Chad M; Deal, Karin R; Dai, Xiongtao; Dawson, Matthew W; Müller, Hans-Georg; Luo, Ming-Cheng; Ramasamy, Ramesh K; Dehghani, Hamid; Gu, Yong Q; Gill, Bikram S; Distelfeld, Assaf; Devos, Katrien M; Qi, Peng; You, Frank M; Gulick, Patrick J; McGuire, Patrick E

    2018-05-16

    Homology was searched with genes annotated in the Aegilops tauschii pseudomolecules against genes annotated in the pseudomolecules of tetraploid wild emmer wheat, Brachypodium distachyon, sorghum, and rice. Similar searches were initiated with genes annotated in the rice pseudomolecules. Matrices of colinear genes and rearrangements in their order were constructed. Optical Bionano genome maps were constructed and used to validate rearrangements unique to the wild emmer and Ae. tauschii genomes. Most common rearrangements were short paracentric inversions and short intrachromosomal translocations. Intrachromosomal translocations outnumbered segmental intrachromosomal duplications. The densities of paracentric inversion lengths were approximated by exponential distributions in all six genomes. Densities of colinear genes along the Ae. tauschii chromosomes were highly correlated with meiotic recombination rates but those of rearrangements were not, suggesting different causes of the erosion of gene colinearity and evolution of major chromosome rearrangements. Frequent rearrangements sharing breakpoints suggested that chromosomes have been rearranged recurrently at some sites. The distal 4 Mb of the short arms of rice chromosomes Os11 and Os12 and corresponding regions in the sorghum, B. distachyon, and Triticeae genomes contain clusters of interstitial translocations including from 1 to 7 colinear genes. The rates of acquisition of major rearrangements were greater in the wild emmer wheat and Ae. tauschii genomes than in the lineage preceding their divergence or in the B. distachyon, rice, and sorghum lineages. It is suggested that synergy between large quantities of dynamic transposable elements and annual growth habit caused the fast evolution of the Triticeae genomes. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.

  10. Genomic Structural Variations Affecting Virulence During Clonal Expansion of Pseudomonas syringae pv. actinidiae Biovar 3 in Europe.

    Science.gov (United States)

    Firrao, Giuseppe; Torelli, Emanuela; Polano, Cesare; Ferrante, Patrizia; Ferrini, Francesca; Martini, Marta; Marcelletti, Simone; Scortichini, Marco; Ermacora, Paolo

    2018-01-01

    Pseudomonas syringae pv. actinidiae (Psa) biovar 3 caused pandemic bacterial canker of Actinidia chinensis and Actinidia deliciosa since 2008. In Europe, the disease spread rapidly in the kiwifruit cultivation areas from a single introduction. In this study, we investigated the genomic diversity of Psa biovar 3 strains during the primary clonal expansion in Europe using single molecule real-time (SMRT), Illumina and Sanger sequencing technologies. We recorded evidences of frequent mobilization and loss of transposon Tn6212, large chromosome inversions, and ectopic integration of IS sequences (remarkably ISPsy31, ISPsy36, and ISPsy37). While no phenotype change associated with Tn6212 mobilization could be detected, strains CRAFRU 12.29 and CRAFRU 12.50 did not elicit the hypersensitivity response (HR) on tobacco and eggplant leaves and were limited in their growth in kiwifruit leaves due to insertion of ISPsy31 and ISPsy36 in the hrpS and hrpR genes, respectively, interrupting the hrp cluster. Both strains had been isolated from symptomatic plants, suggesting coexistence of variant strains with reduced virulence together with virulent strains in mixed populations. The structural differences caused by rearrangements of self-genetic elements within European and New Zealand strains were comparable in number and type to those occurring among the European strains, in contrast with the significant difference in terms of nucleotide polymorphisms. We hypothesize a relaxation, during clonal expansion, of the selection limiting the accumulation of deleterious mutations associated with genome structural variation due to transposition of mobile elements. This consideration may be relevant when evaluating strategies to be adopted for epidemics management.

  11. Nuclear Species-Diagnostic SNP Markers Mined from 454 Amplicon Sequencing Reveal Admixture Genomic Structure of Modern Citrus Varieties

    Science.gov (United States)

    Curk, Franck; Ancillo, Gema; Ollitrault, Frédérique; Perrier, Xavier; Jacquemoud-Collet, Jean-Pierre; Garcia-Lor, Andres; Navarro, Luis; Ollitrault, Patrick

    2015-01-01

    Most cultivated Citrus species originated from interspecific hybridisation between four ancestral taxa (C. reticulata, C. maxima, C. medica, and C. micrantha) with limited further interspecific recombination due to vegetative propagation. This evolution resulted in admixture genomes with frequent interspecific heterozygosity. Moreover, a major part of the phenotypic diversity of edible citrus results from the initial differentiation between these taxa. Deciphering the phylogenomic structure of citrus germplasm is therefore essential for an efficient utilization of citrus biodiversity in breeding schemes. The objective of this work was to develop a set of species-diagnostic single nucleotide polymorphism (SNP) markers for the four Citrus ancestral taxa covering the nine chromosomes, and to use these markers to infer the phylogenomic structure of secondary species and modern cultivars. Species-diagnostic SNPs were mined from 454 amplicon sequencing of 57 gene fragments from 26 genotypes of the four basic taxa. Of the 1,053 SNPs mined from 28,507 kb sequence, 273 were found to be highly diagnostic for a single basic taxon. Species-diagnostic SNP markers (105) were used to analyse the admixture structure of varieties and rootstocks. This revealed C. maxima introgressions in most of the old and in all recent selections of mandarins, and suggested that C. reticulata × C. maxima reticulation and introgression processes were important in edible mandarin domestication. The large range of phylogenomic constitutions between C. reticulata and C. maxima revealed in mandarins, tangelos, tangors, sweet oranges, sour oranges, grapefruits, and orangelos is favourable for genetic association studies based on phylogenomic structures of the germplasm. Inferred admixture structures were in agreement with previous hypotheses regarding the origin of several secondary species and also revealed the probable origin of several acid citrus varieties. The developed species-diagnostic SNP

  12. Revised genomic structure of the human ghrelin gene and identification of novel exons, alternative splice variants and natural antisense transcripts

    Directory of Open Access Journals (Sweden)

    Herington Adrian C

    2007-08-01

    Full Text Available Abstract Background Ghrelin is a multifunctional peptide hormone expressed in a range of normal tissues and pathologies. It has been reported that the human ghrelin gene consists of five exons which span 5 kb of genomic DNA on chromosome 3 and includes a 20 bp non-coding first exon (20 bp exon 0. The availability of bioinformatic tools enabling comparative analysis and the finalisation of the human genome prompted us to re-examine the genomic structure of the ghrelin locus. Results We have demonstrated the presence of an additional novel exon (exon -1 and 5' extensions to exon 0 and 1 using comparative in silico analysis and have demonstrated their existence experimentally using RT-PCR and 5' RACE. A revised exon-intron structure demonstrates that the human ghrelin gene spans 7.2 kb and consists of six rather than five exons. Several ghrelin gene-derived splice forms were detected in a range of human tissues and cell lines. We have demonstrated ghrelin gene-derived mRNA transcripts that do not code for ghrelin, but instead may encode the C-terminal region of full-length preproghrelin (C-ghrelin, which contains the coding region for obestatin and a transcript encoding obestatin-only. Splice variants that differed in their 5' untranslated regions were also found, suggesting a role of these regions in the post-transcriptional regulation of preproghrelin translation. Finally, several natural antisense transcripts, termed ghrelinOS (ghrelin opposite strand transcripts, were demonstrated via orientation-specific RT-PCR, 5' RACE and in silico analysis of ESTs and cloned amplicons. Conclusion The sense and antisense alternative transcripts demonstrated in this study may function as non-coding regulatory RNA, or code for novel protein isoforms. This is the first demonstration of putative obestatin and C-ghrelin specific transcripts and these findings suggest that these ghrelin gene-derived peptides may also be produced independently of preproghrelin

  13. Symbolic complexity for nucleotide sequences: a sign of the genome structure

    International Nuclear Information System (INIS)

    Salgado-García, R; Ugalde, E

    2016-01-01

    We introduce a method for estimating the complexity function (which counts the number of observable words of a given length) of a finite symbolic sequence, which we use to estimate the complexity function of coding DNA sequences for several species of the Hominidae family. In all cases, the obtained symbolic complexities show the same characteristic behavior: exponential growth for small word lengths, followed by linear growth for larger word lengths. The symbolic complexities of the species we consider exhibit a systematic trend in correspondence with the phylogenetic tree. Using our method, we estimate the complexity function of sequences obtained by some known evolution models, and in some cases we observe the characteristic exponential-linear growth of the Hominidae coding DNA complexity. Analysis of the symbolic complexity of sequences obtained from a specific evolution model points to the following conclusion: linear growth arises from the random duplication of large segments during the evolution of the genome, while the decrease in the overall complexity from one species to another is due to a difference in the speed of accumulation of point mutations. (paper)

  14. Prospects and limitations of full-text index structures in genome analysis

    Science.gov (United States)

    Vyverman, Michaël; De Baets, Bernard; Fack, Veerle; Dawyndt, Peter

    2012-01-01

    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared. PMID:22584621

  15. Partial structure of the phylloxin gene from the giant monkey frog, Phyllomedusa bicolor: parallel cloning of precursor cDNA and genomic DNA from lyophilized skin secretion.

    Science.gov (United States)

    Chen, Tianbao; Gagliardo, Ron; Walker, Brian; Zhou, Mei; Shaw, Chris

    2005-12-01

    Phylloxin is a novel prototype antimicrobial peptide from the skin of Phyllomedusa bicolor. Here, we describe parallel identification and sequencing of phylloxin precursor transcript (mRNA) and partial gene structure (genomic DNA) from the same sample of lyophilized skin secretion using our recently-described cloning technique. The open-reading frame of the phylloxin precursor was identical in nucleotide sequence to that previously reported and alignment with the nucleotide sequence derived from genomic DNA indicated the presence of a 175 bp intron located in a near identical position to that found in the dermaseptins. The highly-conserved structural organization of skin secretion peptide genes in P. bicolor can thus be extended to include that encoding phylloxin (plx). These data further reinforce our assertion that application of the described methodology can provide robust genomic/transcriptomic/peptidomic data without the need for specimen sacrifice.

  16. Phylodynamics with Migration: A Computational Framework to Quantify Population Structure from Genomic Data.

    Science.gov (United States)

    Kühnert, Denise; Stadler, Tanja; Vaughan, Timothy G; Drummond, Alexei J

    2016-08-01

    When viruses spread, outbreaks can be spawned in previously unaffected regions. Depending on the time and mode of introduction, each regional outbreak can have its own epidemic dynamics. The migration and phylodynamic processes are often intertwined and need to be taken into account when analyzing temporally and spatially structured virus data. In this article, we present a fully probabilistic approach for the joint reconstruction of phylodynamic history in structured populations (such as geographic structure) based on a multitype birth-death process. This approach can be used to quantify the spread of a pathogen in a structured population. Changes in epidemic dynamics through time within subpopulations are incorporated through piecewise constant changes in transmission parameters.We analyze a global human influenza H3N2 virus data set from a geographically structured host population to demonstrate how seasonal dynamics can be inferred simultaneously with the phylogeny and migration process. Our results suggest that the main migration path among the northern, tropical, and southern region represented in the sample analyzed here is the one leading from the tropics to the northern region. Furthermore, the time-dependent transmission dynamics between and within two HIV risk groups, heterosexuals and injecting drug users, in the Latvian HIV epidemic are investigated. Our analyses confirm that the Latvian HIV epidemic peaking around 2001 was mainly driven by the injecting drug user risk group. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. The population genomic landscape of human genetic structure, admixture history and local adaptation in Peninsular Malaysia.

    Science.gov (United States)

    Deng, Lian; Hoh, Boon Peng; Lu, Dongsheng; Fu, Ruiqing; Phipps, Maude E; Li, Shilin; Nur-Shafawati, Ab Rajab; Hatin, Wan Isa; Ismail, Endom; Mokhtar, Siti Shuhada; Jin, Li; Zilfalil, Bin Alwi; Marshall, Christian R; Scherer, Stephen W; Al-Mulla, Fahd; Xu, Shuhua

    2014-09-01

    Peninsular Malaysia is a strategic region which might have played an important role in the initial peopling and subsequent human migrations in Asia. However, the genetic diversity and history of human populations--especially indigenous populations--inhabiting this area remain poorly understood. Here, we conducted a genome-wide study using over 900,000 single nucleotide polymorphisms (SNPs) in four major Malaysian ethnic groups (MEGs; Malay, Proto-Malay, Senoi and Negrito), and made comparisons of 17 world-wide populations. Our data revealed that Peninsular Malaysia has greater genetic diversity corresponding to its role as a contact zone of both early and recent human migrations in Asia. However, each single Orang Asli (indigenous) group was less diverse with a smaller effective population size (N(e)) than a European or an East Asian population, indicating a substantial isolation of some duration for these groups. All four MEGs were genetically more similar to Asian populations than to other continental groups, and the divergence time between MEGs and East Asian populations (12,000--6,000 years ago) was also much shorter than that between East Asians and Europeans. Thus, Malaysian Orang Asli groups, despite their significantly different features, may share a common origin with the other Asian groups. Nevertheless, we identified traces of recent gene flow from non-Asians to MEGs. Finally, natural selection signatures were detected in a batch of genes associated with immune response, human height, skin pigmentation, hair and facial morphology and blood pressure in MEGs. Notable examples include SYN3 which is associated with human height in all Orang Asli groups, a height-related gene (PNPT1) and two blood pressure-related genes (CDH13 and PAX5) in Negritos. We conclude that a long isolation period, subsequent gene flow and local adaptations have jointly shaped the genetic architectures of MEGs, and this study provides insight into the peopling and human migration

  18. Genome structure and reproductive behaviour influence the evolutionary potential of a fungal phytopathogen.

    Directory of Open Access Journals (Sweden)

    Guillaume Daverdin

    Full Text Available Modern agriculture favours the selection and spread of novel plant diseases. Furthermore, crop genetic resistance against pathogens is often rendered ineffective within a few years of its commercial deployment. Leptosphaeria maculans, the cause of phoma stem canker of oilseed rape, develops gene-for-gene interactions with its host plant, and has a high evolutionary potential to render ineffective novel sources of resistance in crops. Here, we established a four-year field experiment to monitor the evolution of populations confronted with the newly released Rlm7 resistance and to investigate the nature of the mutations responsible for virulence against Rlm7. A total of 2551 fungal isolates were collected from experimental crops of a Rlm7 cultivar or a cultivar without Rlm7. All isolates were phenotyped for virulence and a subset was genotyped with neutral genetic markers. Virulent isolates were investigated for molecular events at the AvrLm4-7 locus. Whilst virulent isolates were not found in neighbouring crops, their frequency had reached 36% in the experimental field after four years. An extreme diversity of independent molecular events leading to virulence was identified in populations, with large-scale Repeat Induced Point mutations or complete deletion of AvrLm4-7 being the most frequent. Our data suggest that increased mutability of fungal genes involved in the interactions with plants is directly related to their genomic environment and reproductive system. Thus, rapid allelic diversification of avirulence genes can be generated in L. maculans populations in a single field provided that large population sizes and sexual reproduction are favoured by agricultural practices.

  19. Long span DNA paired-end-tag (DNA-PET sequencing strategy for the interrogation of genomic structural mutations and fusion-point-guided reconstruction of amplicons.

    Directory of Open Access Journals (Sweden)

    Fei Yao

    Full Text Available Structural variations (SVs contribute significantly to the variability of the human genome and extensive genomic rearrangements are a hallmark of cancer. While genomic DNA paired-end-tag (DNA-PET sequencing is an attractive approach to identify genomic SVs, the current application of PET sequencing with short insert size DNA can be insufficient for the comprehensive mapping of SVs in low complexity and repeat-rich genomic regions. We employed a recently developed procedure to generate PET sequencing data using large DNA inserts of 10-20 kb and compared their characteristics with short insert (1 kb libraries for their ability to identify SVs. Our results suggest that although short insert libraries bear an advantage in identifying small deletions, they do not provide significantly better breakpoint resolution. In contrast, large inserts are superior to short inserts in providing higher physical genome coverage for the same sequencing cost and achieve greater sensitivity, in practice, for the identification of several classes of SVs, such as copy number neutral and complex events. Furthermore, our results confirm that large insert libraries allow for the identification of SVs within repetitive sequences, which cannot be spanned by short inserts. This provides a key advantage in studying rearrangements in cancer, and we show how it can be used in a fusion-point-guided-concatenation algorithm to study focally amplified regions in cancer.

  20. Revisiting Francisella tularensis subsp. holarctica, Causative Agent of Tularemia in Germany With Bioinformatics: New Insights in Genome Structure, DNA Methylation and Comparative Phylogenetic Analysis

    Directory of Open Access Journals (Sweden)

    Anne Busch

    2018-03-01

    Full Text Available Francisella (F. tularensis is a highly virulent, Gram-negative bacterial pathogen and the causative agent of the zoonotic disease tularemia. Here, we generated, analyzed and characterized a high quality circular genome sequence of the F. tularensis subsp. holarctica strain 12T0050 that caused fatal tularemia in a hare. Besides the genomic structure, we focused on the analysis of oriC, unique to the Francisella genus and regulating replication in and outside hosts and the first report on genomic DNA methylation of a Francisella strain. The high quality genome was used to establish and evaluate a diagnostic whole genome sequencing pipeline. A genotyping strategy for F. tularensis was developed using various bioinformatics tools for genotyping. Additionally, whole genome sequences of F. tularensis subsp. holarctica isolates isolated in the years 2008–2015 in Germany were generated. A phylogenetic analysis allowed to determine the genetic relatedness of these isolates and confirmed the highly conserved nature of F. tularensis subsp. holarctica.

  1. Genomics and structure/function studies of Rhabdoviridae proteins involved in replication and transcription.

    Science.gov (United States)

    Assenberg, R; Delmas, O; Morin, B; Graham, S C; De Lamballerie, X; Laubert, C; Coutard, B; Grimes, J M; Neyts, J; Owens, R J; Brandt, B W; Gorbalenya, A; Tucker, P; Stuart, D I; Canard, B; Bourhy, H

    2010-08-01

    Some mammalian rhabdoviruses may infect humans, and also infect invertebrates, dogs, and bats, which may act as vectors transmitting viruses among different host species. The VIZIER programme, an EU-funded FP6 program, has characterized viruses that belong to the Vesiculovirus, Ephemerovirus and Lyssavirus genera of the Rhabdoviridae family to perform ground-breaking research on the identification of potential new drug targets against these RNA viruses through comprehensive structural characterization of the replicative machinery. The contribution of VIZIER programme was of several orders. First, it contributed substantially to research aimed at understanding the origin, evolution and diversity of rhabdoviruses. This diversity was then used to obtain further structural information on the proteins involved in replication. Two strategies were used to produce recombinant proteins by expression of both full length or domain constructs in either E. coli or insect cells, using the baculovirus system. In both cases, parallel cloning and expression screening at small-scale of multiple constructs based on different viruses including the addition of fusion tags, was key to the rapid generation of expression data. As a result, some progress has been made in the VIZIER programme towards dissecting the multi-functional L protein into components suitable for structural and functional studies. However, the phosphoprotein polymerase co-factor and the structural matrix protein, which play a number of roles during viral replication and drives viral assembly, have both proved much more amenable to structural biology. Applying the multi-construct/multi-virus approach central to protein production processes in VIZIER has yielded new structural information which may ultimately be exploitable in the derivation of novel ways of intervening in viral replication. Copyright 2010 Elsevier B.V. All rights reserved.

  2. NMR solution structure of poliovirus uridylyated peptide linked to the genome (VPgpU)

    Science.gov (United States)

    Schein, Catherine H.; Oezguen, Numan; van der Heden van Noort, Gerbrand J.; Filippov, Dmitri V.; Paul, Aniko; Kumar, Eric; Braun, Werner

    2010-01-01

    Picornaviruses have a 22–24 amino acid peptide, VPg, bound covalently at the 5’ end of their RNA, that is essential for replication. VPgs are uridylylated at a conserved Tyrosine to form VPgpU, the primer of RNA synthesis by the viral polymerase. This first complete structure for any uridylylated VPg, of poliovirus type 1 (PV1)-VPgpU, shows that conserved amino acids in VPg stabilize the bound UMP, with the uridine atoms involved in base pairing and chain elongation projected outward. Comparing this structure to PV1-VPg and partial structures of VPg/VPgpU from other picornaviruses suggests that enteroviral polymerases require a more stable VPg structure than does the distantly related aphthovirus, foot and mouth disease virus (FMDV). The glutamine residue at the C-terminus of PV1-VPgpU lies in back of the uridine base and may stabilize its position during chain elongation and/or contribute to base specificity. Under in vivo-like conditions with the authentic cre(2C) hairpin RNA and Mg++, 5-methylUTP cannot compete with UTP for VPg uridylyation in an in vitro uridylyation assay, but both nucleotides are equally incorporated by PV1-polymerase with Mn++ and a poly-A RNA template. This indicates the 5 position is recognized under in vivo conditions. The compact VPgpU structure docks within the active site cavity of the PV-polymerase, close to the position seen for the fragment of FMDV-VPgpU with its polymerase. This structure could aid in design of novel enterovirus inhibitors, and stabilization upon uridylylation may also be pertinent for post-translational uridylylation reactions that underlie other biological processes. PMID:20441784

  3. A scale-free structure prior for graphical models with applications in functional genomics.

    Directory of Open Access Journals (Sweden)

    Paul Sheridan

    Full Text Available The problem of reconstructing large-scale, gene regulatory networks from gene expression data has garnered considerable attention in bioinformatics over the past decade with the graphical modeling paradigm having emerged as a popular framework for inference. Analysis in a full Bayesian setting is contingent upon the assignment of a so-called structure prior-a probability distribution on networks, encoding a priori biological knowledge either in the form of supplemental data or high-level topological features. A key topological consideration is that a wide range of cellular networks are approximately scale-free, meaning that the fraction, , of nodes in a network with degree is roughly described by a power-law with exponent between and . The standard practice, however, is to utilize a random structure prior, which favors networks with binomially distributed degree distributions. In this paper, we introduce a scale-free structure prior for graphical models based on the formula for the probability of a network under a simple scale-free network model. Unlike the random structure prior, its scale-free counterpart requires a node labeling as a parameter. In order to use this prior for large-scale network inference, we design a novel Metropolis-Hastings sampler for graphical models that includes a node labeling as a state space variable. In a simulation study, we demonstrate that the scale-free structure prior outperforms the random structure prior at recovering scale-free networks while at the same time retains the ability to recover random networks. We then estimate a gene association network from gene expression data taken from a breast cancer tumor study, showing that scale-free structure prior recovers hubs, including the previously unknown hub SLC39A6, which is a zinc transporter that has been implicated with the spread of breast cancer to the lymph nodes. Our analysis of the breast cancer expression data underscores the value of the scale

  4. The Complete Chloroplast Genome Sequences of Five Epimedium Species: Lights into Phylogenetic and Taxonomic Analyses

    Science.gov (United States)

    Zhang, Yanjun; Du, Liuwen; Liu, Ao; Chen, Jianjun; Wu, Li; Hu, Weiming; Zhang, Wei; Kim, Kyunghee; Lee, Sang-Choon; Yang, Tae-Jin; Wang, Ying

    2016-01-01

    Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. We here sequenced the complete chloroplast (cp) genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly, which was also the first comprehensive cp genome analysis on Epimedium combining the cp genome sequence of E. koreanum previously reported. The five Epimedium cp genomes exhibited typical quadripartite and circular structure that was rather conserved in genomic structure and the synteny of gene order. However, these cp genomes presented obvious variations at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR) region and the single-copy (SC) boundary regions. The trnQ-UUG duplication occurred in the five Epimedium cp genomes, which was not found in the other basal eudicotyledons. The rapidly evolving cp genome regions were detected among the five cp genomes, as well as the difference of simple sequence repeats (SSR) and repeat sequence were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes showed accordance with the updated system of the genus on the whole, but reminded that the evolutionary relationships and the divisions of the genus need further investigation applying more evidences. The availability of these cp genomes provided valuable genetic information for accurately identifying species, taxonomy and phylogenetic resolution and evolution of Epimedium, and assist in exploration and utilization of Epimedium plants. PMID:27014326

  5. The complete chloroplast genome sequences of five Epimedium species: lights into phylogenetic and taxonomic analyses

    Directory of Open Access Journals (Sweden)

    Yanjun eZhang

    2016-03-01

    Full Text Available Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. We here sequenced the complete chloroplast (cp genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly, which was also the first comprehensive cp genome analysis on Epimedium combining the cp genome sequence of E. koreanum previously reported. The five Epimedium cp genomes exhibited typical quadripartite and circular structure that was rather conserved in genomic structure and the synteny of gene order. However, these cp genomes presented obvious variations at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR region and the single-copy (SC boundary regions. The trnQ-UUG duplication occurred in the five Epimedium cp genomes, which was not found in the other basal eudicotyledons. The rapidly evolving cp genome regions were detected among the five cp genomes, as well as the difference of simple sequence repeats (SSR and repeat sequence were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes showed accordance with the updated system of the genus on the whole, but reminded that the evolutionary relationships and the divisions of the genus need further investigation applying more evidences. The availability of these cp genomes provided valuable genetic information for accurately identifying species, taxonomy and phylogenetic resolution and evolution of Epimedium, and assist in exploration and utilization of Epimedium plants.

  6. Bioinformatical approaches to RNA structure prediction & Sequencing of an ancient human genome

    DEFF Research Database (Denmark)

    Lindgreen, Stinus

    Stinus Lindgreen has been working in two different fields during his Ph.D. The first part has been focused on computational approaches to predict the structure of non-coding RNA molecules at the base pairing level. This has resulted in the analysis of various measures of the base pairing potentia...

  7. Integrated view of genome structure and sequence of a single DNA molecule in a nanofluidic device

    DEFF Research Database (Denmark)

    Marie, Rodolphe; Pedersen, Jonas Nyvold; L. V. Bauer, David

    2013-01-01

    as well as unique structural variation. Following its mapping, a molecule of interest was rescued fromthe chip;amplified and localized to a chromosome by FISH; and interrogated down to 1-bp resolution with a commercial sequencer, thereby reconciling haplotype-phased chromosome substructure with sequence....

  8. Genome-wide discovery and verification of novel structured RNAs in Plasmodium falciparum

    DEFF Research Database (Denmark)

    Mourier, Tobias; Carret, Celine; Kyes, Sue

    2008-01-01

    ncRNAs in P. falciparum and are not represented in any RNA databases. We provide supporting evidence for purifying selection acting on the experimentally verified ncRNAs by comparing the nucleotide substitutions in the predicted ncRNA candidate structures in P. falciparum with the closely related...

  9. Genomic patterns in Acropora cervicornis show extensive population structure and variable genetic diversity.

    Science.gov (United States)

    Drury, Crawford; Schopmeyer, Stephanie; Goergen, Elizabeth; Bartels, Erich; Nedimyer, Ken; Johnson, Meaghan; Maxwell, Kerry; Galvan, Victor; Manfrino, Carrie; Lirman, Diego

    2017-08-01

    Threatened Caribbean coral communities can benefit from high-resolution genetic data used to inform management and conservation action. We use Genotyping by Sequencing (GBS) to investigate genetic patterns in the threatened coral, Acropora cervicornis , across the Florida Reef Tract (FRT) and the western Caribbean. Results show extensive population structure at regional scales and resolve previously unknown structure within the FRT. Different regions also exhibit up to threefold differences in genetic diversity (He), suggesting targeted management based on the goals and resources of each population is needed. Patterns of genetic diversity have a strong spatial component, and our results show Broward and the Lower Keys are among the most diverse populations in Florida. The genetic diversity of Caribbean staghorn coral is concentrated within populations and within individual reefs (AMOVA), highlighting the complex mosaic of population structure. This variance structure is similar over regional and local scales, which suggests that in situ nurseries are adequately capturing natural patterns of diversity, representing a resource that can replicate the average diversity of wild assemblages, serving to increase intraspecific diversity and potentially leading to improved biodiversity and ecosystem function. Results presented here can be translated into specific goals for the recovery of A. cervicornis , including active focus on low diversity areas, protection of high diversity and connectivity, and practical thresholds for responsible restoration.

  10. Constraining Genome-Scale Models to Represent the Bow Tie Structure of Metabolism for 13C Metabolic Flux Analysis

    Directory of Open Access Journals (Sweden)

    Tyler W. H. Backman

    2018-01-01

    Full Text Available Determination of internal metabolic fluxes is crucial for fundamental and applied biology because they map how carbon and electrons flow through metabolism to enable cell function. 13 C Metabolic Flux Analysis ( 13 C MFA and Two-Scale 13 C Metabolic Flux Analysis (2S- 13 C MFA are two techniques used to determine such fluxes. Both operate on the simplifying approximation that metabolic flux from peripheral metabolism into central “core” carbon metabolism is minimal, and can be omitted when modeling isotopic labeling in core metabolism. The validity of this “two-scale” or “bow tie” approximation is supported both by the ability to accurately model experimental isotopic labeling data, and by experimentally verified metabolic engineering predictions using these methods. However, the boundaries of core metabolism that satisfy this approximation can vary across species, and across cell culture conditions. Here, we present a set of algorithms that (1 systematically calculate flux bounds for any specified “core” of a genome-scale model so as to satisfy the bow tie approximation and (2 automatically identify an updated set of core reactions that can satisfy this approximation more efficiently. First, we leverage linear programming to simultaneously identify the lowest fluxes from peripheral metabolism into core metabolism compatible with the observed growth rate and extracellular metabolite exchange fluxes. Second, we use Simulated Annealing to identify an updated set of core reactions that allow for a minimum of fluxes into core metabolism to satisfy these experimental constraints. Together, these methods accelerate and automate the identification of a biologically reasonable set of core reactions for use with 13 C MFA or 2S- 13 C MFA, as well as provide for a substantially lower set of flux bounds for fluxes into the core as compared with previous methods. We provide an open source Python implementation of these algorithms at https://github.com/JBEI/limitfluxtocore.

  11. Protein Structure Initiative Material Repository: an open shared public resource of structural genomics plasmids for the biological community

    Science.gov (United States)

    Cormier, Catherine Y.; Mohr, Stephanie E.; Zuo, Dongmei; Hu, Yanhui; Rolfs, Andreas; Kramer, Jason; Taycher, Elena; Kelley, Fontina; Fiacco, Michael; Turnbull, Greggory; LaBaer, Joshua

    2010-01-01

    The Protein Structure Initiative Material Repository (PSI-MR; http://psimr.asu.edu) provides centralized storage and distribution for the protein expression plasmids created by PSI researchers. These plasmids are a resource that allows the research community to dissect the biological function of proteins whose structures have been identified by the PSI. The plasmid annotation, which includes the full length sequence, vector information and associated publications, is stored in a freely available, searchable database called DNASU (http://dnasu.asu.edu). Each PSI plasmid is also linked to a variety of additional resources, which facilitates cross-referencing of a particular plasmid to protein annotations and experimental data. Plasmid samples can be requested directly through the website. We have also developed a novel strategy to avoid the most common concern encountered when distributing plasmids namely, the complexity of material transfer agreement (MTA) processing and the resulting delays this causes. The Expedited Process MTA, in which we created a network of institutions that agree to the terms of transfer in advance of a material request, eliminates these delays. Our hope is that by creating a repository of expression-ready plasmids and expediting the process for receiving these plasmids, we will help accelerate the accessibility and pace of scientific discovery. PMID:19906724

  12. Genomic survey, gene expression analysis and structural modeling suggest diverse roles of DNA methyltransferases in legumes.

    Directory of Open Access Journals (Sweden)

    Rohini Garg

    Full Text Available DNA methylation plays a crucial role in development through inheritable gene silencing. Plants possess three types of DNA methyltransferases (MTases, namely Methyltransferase (MET, Chromomethylase (CMT and Domains Rearranged Methyltransferase (DRM, which maintain methylation at CG, CHG and CHH sites. DNA MTases have not been studied in legumes so far. Here, we report the identification and analysis of putative DNA MTases in five legumes, including chickpea, soybean, pigeonpea, Medicago and Lotus. MTases in legumes could be classified in known MET, CMT, DRM and DNA nucleotide methyltransferases (DNMT2 subfamilies based on their domain organization. First three MTases represent DNA MTases, whereas DNMT2 represents a transfer RNA (tRNA MTase. Structural comparison of all the MTases in plants with known MTases in mammalian and plant systems have been reported to assign structural features in context of biological functions of these proteins. The structure analysis clearly specified regions crucial for protein-protein interactions and regions important for nucleosome binding in various domains of CMT and MET proteins. In addition, structural model of DRM suggested that circular permutation of motifs does not have any effect on overall structure of DNA methyltransferase domain. These results provide valuable insights into role of various domains in molecular recognition and should facilitate mechanistic understanding of their function in mediating specific methylation patterns. Further, the comprehensive gene expression analyses of MTases in legumes provided evidence of their role in various developmental processes throughout the plant life cycle and response to various abiotic stresses. Overall, our study will be very helpful in establishing the specific functions of DNA MTases in legumes.

  13. National Academy of Sciences and Academy of Sciences of the USSR workshop on structure of the eucaryotic genome and regulation of its expression. Final report

    Energy Technology Data Exchange (ETDEWEB)

    1990-12-31

    This report provides a brief overview of the Workshop on Structure of the Eukaryotic Genome and Regulation of its Expression held in Tbilisi, Georgia, USSR. The report describes the presentations made at the meeting but also goes on to describe the state of molecular biology and genetics research in the Soviet Union and makes recommendations on how to improve future such meetings.

  14. National Academy of Sciences and Academy of Sciences of the USSR workshop on structure of the eucaryotic genome and regulation of its expression

    Energy Technology Data Exchange (ETDEWEB)

    1990-01-01

    This report provides a brief overview of the Workshop on Structure of the Eukaryotic Genome and Regulation of its Expression held in Tbilisi, Georgia, USSR. The report describes the presentations made at the meeting but also goes on to describe the state of molecular biology and genetics research in the Soviet Union and makes recommendations on how to improve future such meetings.

  15. Molecular Cloning, Genomic Organization and Developmental Regulation of a Novel Receptor from Drosophila melanogaster Structurally Related to Gonadotropin-Releasing Hormone Receptors from Vertebrates

    DEFF Research Database (Denmark)

    Hauser, Frank; Søndergaard, Leif; Grimmelikhuijzen, Cornelis J.P.

    1998-01-01

    After screening the data base of the BerkeleyDrosophilaGenome Project with a sequence coding for the transmembrane region of a G protein-coupled receptor, we found thatDrosophilamight contain a gene coding for a receptor that is structurally related to the Gonadotropin-Releasing Hormone (GnRH) re...

  16. Population structure revealed by different marker types (SSR or DArT) has an impact on the results of genome-wide association mapping in European barley cultivars

    NARCIS (Netherlands)

    Matthies, I.E.; Hintum, van T.J.L.; Weise, S.; Röder, M.S.

    2012-01-01

    Diversity arrays technology (DArT) and simple sequence repeat (SSR) markers were applied to investigate population structure, extent of linkage disequilibrium and genetic diversity (kinship) on a genome-wide level in European barley (Hordeum vulgare L.) cultivars. A set of 183 varieties could be

  17. Genome packaging in viruses

    OpenAIRE

    Sun, Siyang; Rao, Venigalla B.; Rossmann, Michael G.

    2010-01-01

    Genome packaging is a fundamental process in a viral life cycle. Many viruses assemble preformed capsids into which the genomic material is subsequently packaged. These viruses use a packaging motor protein that is driven by the hydrolysis of ATP to condense the nucleic acids into a confined space. How these motor proteins package viral genomes had been poorly understood until recently, when a few X-ray crystal structures and cryo-electron microscopy structures became available. Here we discu...

  18. Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia.

    Science.gov (United States)

    Metspalu, Mait; Romero, Irene Gallego; Yunusbayev, Bayazit; Chaubey, Gyaneshwer; Mallick, Chandana Basu; Hudjashov, Georgi; Nelis, Mari; Mägi, Reedik; Metspalu, Ene; Remm, Maido; Pitchappan, Ramasamy; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Kivisild, Toomas

    2011-12-09

    South Asia harbors one of the highest levels genetic diversity in Eurasia, which could be interpreted as a result of its long-term large effective population size and of admixture during its complex demographic history. In contrast to Pakistani populations, populations of Indian origin have been underrepresented in previous genomic scans of positive selection and population structure. Here we report data for more than 600,000 SNP markers genotyped in 142 samples from 30 ethnic groups in India. Combining our results with other available genome-wide data, we show that Indian populations are characterized by two major ancestry components, one of which is spread at comparable frequency and haplotype diversity in populations of South and West Asia and the Caucasus. The second component is more restricted to South Asia and accounts for more than 50% of the ancestry in Indian populations. Haplotype diversity associated with these South Asian ancestry components is significantly higher than that of the components dominating the West Eurasian ancestry palette. Modeling of the observed haplotype diversities suggests that both Indian ancestry components are older than the purported Indo-Aryan invasion 3,500 YBP. Consistent with the results of pairwise genetic distances among world regions, Indians share more ancestry signals with West than with East Eurasians. However, compared to Pakistani populations, a higher proportion of their genes show regionally specific signals of high haplotype homozygosity. Among such candidates of positive selection in India are MSTN and DOK5, both of which have potential implications in lipid metabolism and the etiology of type 2 diabetes. Copyright © 2011 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  19. Investigation of the spatial structure and interactions of the genome at sub-kilobase-pair resolution using T2C.

    Science.gov (United States)

    Kolovos, Petros; Brouwer, Rutger W W; Kockx, Christel E M; Lesnussa, Michael; Kepper, Nick; Zuin, Jessica; Imam, A M Ali; van de Werken, Harmen J G; Wendt, Kerstin S; Knoch, Tobias A; van IJcken, Wilfred F J; Grosveld, Frank

    2018-03-01

    Chromosome conformation capture (3C) and its derivatives (e.g., 4C, 5C and Hi-C) are used to analyze the 3D organization of genomes. We recently developed targeted chromatin capture (T2C), an inexpensive method for studying the 3D organization of genomes, interactomes and structural changes associated with gene regulation, the cell cycle, and cell survival and development. Here, we present the protocol for T2C based on capture, describing all experimental steps and bio-informatic tools in full detail. T2C offers high resolution, a large dynamic interaction frequency range and a high signal-to-noise ratio. Its resolution is determined by the resulting fragment size of the chosen restriction enzyme, which can lead to sub-kilobase-pair resolution. T2C's high coverage allows the identification of the interactome of each individual DNA fragment, which makes binning of reads (often used in other methods) basically unnecessary. Notably, T2C requires low sequencing efforts. T2C also allows multiplexing of samples for the direct comparison of multiple samples. It can be used to study topologically associating domains (TADs), determining their position, shape, boundaries, and intra- and inter-domain interactions, as well as the composition of aggregated loops, interactions between nucleosomes, individual transcription factor binding sites, and promoters and enhancers. T2C can be performed by any investigator with basic skills in molecular biology techniques in ∼7-8 d. Data analysis requires basic expertise in bioinformatics and in Linux and Python environments.

  20. Genomic structure and promoter functional analysis of GnRH3 gene in large yellow croaker (Larimichthys crocea).

    Science.gov (United States)

    Huang, Wei; Zhang, Jianshe; Liao, Zhi; Lv, Zhenming; Wu, Huifei; Zhu, Aiyi; Wu, Changwen

    2016-01-15

    Gonadotropin-releasing hormone III (GnRH3) is considered to be a key neurohormone in fish reproduction control. In the present study, the cDNA and genomic sequences of GnRH3 were cloned and characterized from large yellow croaker Larimichthys crocea. The cDNA encoded a protein of 99 amino acids with four functional motifs. The full-length genome sequence was composed of 3797 nucleotides, including four exons and three introns. Higher identities of amino acid sequences and conserved exon-intron organizations were found between LcGnRH3 and other GnRH3 genes. In addition, some special features of the sequences were detected in partial species. For example, two specific residues (V and A) were found in the family Sciaenidae, and the unique 75-72 bp type of the open reading frame 2 and 3 existed in the family Cyprinidae. Analysis of the 2576 bp promoter fragment of LcGnRH3 showed a number of transcription factor binding sites, such as AP1, CREB, GATA-1, HSF, FOXA2, and FOXL1. Promoter functional analysis using an EGFP reporter fusion in zebrafish larvae presented positive signals in the brain, including the olfactory region, the terminal nerve ganglion, the telencephalon, and the hypothalamus. The expression pattern was generally consistent with the endogenous GnRH3 GFP-expressing transgenic zebrafish lines, but the details were different. These results indicate that the structure and function of LcGnRH3 are generally similar to the other teleost GnRH3 genes, but there exist some distinctions among them. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Gene expression in chicken reveals correlation with structural genomic features and conserved patterns of transcription in the terrestrial vertebrates.

    Directory of Open Access Journals (Sweden)

    Haisheng Nie

    Full Text Available BACKGROUND: The chicken is an important agricultural and avian-model species. A survey of gene expression in a range of different tissues will provide a benchmark for understanding expression levels under normal physiological conditions in birds. With expression data for birds being very scant, this benchmark is of particular interest for comparative expression analysis among various terrestrial vertebrates. METHODOLOGY/PRINCIPAL FINDINGS: We carried out a gene expression survey in eight major chicken tissues using whole genome microarrays. A global picture of gene expression is presented for the eight tissues, and tissue specific as well as common gene expression were identified. A Gene Ontology (GO term enrichment analysis showed that tissue-specific genes are enriched with GO terms reflecting the physiological functions of the specific tissue, and housekeeping genes are enriched with GO terms related to essential biological functions. Comparisons of structural genomic features between tissue-specific genes and housekeeping genes show that housekeeping genes are more compact. Specifically, coding sequence and particularly introns are shorter than genes that display more variation in expression between tissues, and in addition intergenic space was also shorter. Meanwhile, housekeeping genes are more likely to co-localize with other abundantly or highly expressed genes on the same chromosomal regions. Furthermore, comparisons of gene expression in a panel of five common tissues between birds, mammals and amphibians showed that the expression patterns across tissues are highly similar for orthologous genes compared to random gene pairs within each pair-wise comparison, indicating a high degree of functional conservation in gene expression among terrestrial vertebrates. CONCLUSIONS: The housekeeping genes identified in this study have shorter gene length, shorter coding sequence length, shorter introns, and shorter intergenic regions, there seems

  2. Structural organization of poliovirus RNA replication is mediated by viral proteins of the P2 genomic region

    International Nuclear Information System (INIS)

    Bienz, K.; Egger, D.; Troxler, M.; Pasamontes, L.

    1990-01-01

    Transcriptionally active replication complexes bound to smooth membrane vesicles were isolated from poliovirus-infected cells. In electron microscopic, negatively stained preparations, the replication complex appeared as an irregularly shaped, oblong structure attached to several virus-induced vesicles of a rosettelike arrangement. Electron microscopic immunocytochemistry of such preparations demonstrated that the poliovirus replication complex contains the proteins coded by the P2 genomic region (P2 proteins) in a membrane-associated form. In addition, the P2 proteins are also associated with viral RNA, and they can be cross-linked to viral RNA by UV irradiation. Guanidine hydrochloride prevented the P2 proteins from becoming membrane bound but did not change their association with viral RNA. The findings allow the conclusion that the protein 2C or 2C-containing precursor(s) is responsible for the attachment of the viral RNA to the vesicular membrane and for the spatial organization of the replication complex necessary for its proper functioning in viral transcription. A model for the structure of the viral replication complex and for the function of the 2C-containing P2 protein(s) and the vesicular membranes is proposed

  3. Genome-wide analysis of Epstein-Barr virus identifies variants and genes associated with gastric carcinoma and population structure.

    Science.gov (United States)

    Yao, Youyuan; Xu, Miao; Liang, Liming; Zhang, Haojiong; Xu, Ruihua; Feng, Qisheng; Feng, Lin; Luo, Bing; Zeng, Yi-Xin

    2017-10-01

    Epstein-Barr virus is a ubiquitous virus and is associated with several human malignances, including the significant subset of gastric carcinoma, Epstein-Barr virus-associated gastric carcinoma. Some Epstein-Barr virus-associated diseases are uniquely prevalent in populations with different geographic origins. However, the features of the disease and geographically associated Epstein-Barr virus genetic variation as well as the roles that the variation plays in carcinogenesis and evolution remain unclear. Therefore, in this study, we sequenced 95 geographically distinct Epstein-Barr virus isolates from Epstein-Barr virus-associated gastric carcinoma biopsies and saliva of healthy donors to detect variants and genes associated with gastric carcinoma and population structure from a genome-wide spectrum. We demonstrated that Epstein-Barr virus revealed the population structure between North China and South China. In addition, we observed population stratification between Epstein-Barr virus strains from gastric carcinoma and healthy controls, indicating that certain Epstein-Barr virus subtypes are associated with different gastric carcinoma risks. We identified that the BRLF1, BBRF3, and BBLF2/BBLF3 genes had significant associations with gastric carcinoma. LMP1 and BNLF2a genes were strongly geographically associated genes in Epstein-Barr virus. Our study provides insights into the genetic basis of oncogenic Epstein-Barr virus for gastric carcinoma, and the genetic variants associated with gastric carcinoma can serve as biomarkers for oncogenic Epstein-Barr virus.

  4. A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination.

    Directory of Open Access Journals (Sweden)

    Caitlin Collins

    2018-02-01

    Full Text Available Genome-Wide Association Studies (GWAS in microbial organisms have the potential to vastly improve the way we understand, manage, and treat infectious diseases. Yet, microbial GWAS methods established thus far remain insufficiently able to capitalise on the growing wealth of bacterial and viral genetic sequence data. Facing clonal population structure and homologous recombination, existing GWAS methods struggle to achieve both the precision necessary to reject spurious findings and the power required to detect associations in microbes. In this paper, we introduce a novel phylogenetic approach that has been tailor-made for microbial GWAS, which is applicable to organisms ranging from purely clonal to frequently recombining, and to both binary and continuous phenotypes. Our approach is robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Thorough testing via application to simulated data provides strong support for the power and specificity of our approach and demonstrates the advantages offered over alternative cluster-based and dimension-reduction methods. Two applications to Neisseria meningitidis illustrate the versatility and potential of our method, confirming previously-identified penicillin resistance loci and resulting in the identification of both well-characterised and novel drivers of invasive disease. Our method is implemented as an open-source R package called treeWAS which is freely available at https://github.com/caitiecollins/treeWAS.

  5. Genome re-sequencing of semi-wild soybean reveals a complex Soja population structure and deep introgression.

    Directory of Open Access Journals (Sweden)

    Jie Qiu

    Full Text Available Semi-wild soybean is a unique type of soybean that retains both wild and domesticated characteristics, which provides an important intermediate type for understanding the evolution of the subgenus Soja population in the Glycine genus. In this study, a semi-wild soybean line (Maliaodou and a wild line (Lanxi 1 collected from the lower Yangtze regions were deeply sequenced while nine other semi-wild lines were sequenced to a 3-fold genome coverage. Sequence analysis revealed that (1 no independent phylogenetic branch covering all 10 semi-wild lines was observed in the Soja phylogenetic tree; (2 besides two distinct subpopulations of wild and cultivated soybean in the Soja population structure, all semi-wild lines were mixed with some wild lines into a subpopulation rather than an independent one or an intermediate transition type of soybean domestication; (3 high heterozygous rates (0.19-0.49 were observed in several semi-wild lines; and (4 over 100 putative selective regions were identified by selective sweep analysis, including those related to the development of seed size. Our results suggested a hybridization origin for the semi-wild soybean, which makes a complex Soja population structure.

  6. Functional analysis of the stem-loop structures at the 5' end of the Aichi virus genome

    International Nuclear Information System (INIS)

    Nagashima, Shigeo; Sasaki, Jun; Taniguchi, Koki

    2003-01-01

    Aichi virus is a member of the family Picornaviridae. Computer-assisted secondary structure prediction suggested the formation of three stem-loop structures (SL-A, SL-B, and SL-C from the 5' end) within the 5'-end 120 nucleotides of the genome. We have already shown that the most 5'-end stem-loop, SL-A, is critical for viral RNA replication. Here, using an infectious cDNA clone and a replicon harboring a luciferase gene, we revealed that formation of SL-B and SL-C on the positive strand is essential for viral RNA replication. In addition, the specific nucleotide sequence of the loop segment of SL-B was also shown to be critical for viral RNA replication. Mutations of the upper and lower stems of SL-C that do not disrupt the base-pairings hardly affected RNA replication, but decreased the yields of viable viruses significantly compared with for the wild-type. This suggests that SL-C plays a role at some step besides RNA replication during virus infection

  7. Geochip: A high throughput genomic tool for linking community structure to functions

    Energy Technology Data Exchange (ETDEWEB)

    Van Nostrand, Joy D.; Liang, Yuting; He, Zhili; Li, Guanghe; Zhou, Jizhong

    2009-01-30

    GeoChip is a comprehensive functional gene array that targets key functional genes involved in the geochemical cycling of N, C, and P, sulfate reduction, metal resistance and reduction, and contaminant degradation. Studies have shown the GeoChip to be a sensitive, specific, and high-throughput tool for microbial community analysis that has the power to link geochemical processes with microbial community structure. However, several challenges remain regarding the development and applications of microarrays for microbial community analysis.

  8. Viruses in the marine environment: community dynamics, phage-host interactions and genomic structure

    OpenAIRE

    Lara de la Casa, Elena

    2014-01-01

    There are an estimated 1030 viruses in the world oceans, the majority of which are phages (viruses that infect bacteria). Extensive research has demonstrated the significant influence of marine phages on microbial abundance, community structure, genetic exchange and global biogeochemical cycles. In this thesis, we contribute to increase the knowledge about the ecological role of viruses in marine systems, but also we aimed to provide a better understanding about the interactions between phage...

  9. A Genomic View of the Peopling and Population Structure of India

    Science.gov (United States)

    Majumder, Partha P.; Basu, Analabha

    2015-01-01

    Recent advances in molecular and statistical genetics have enabled the reconstruction of human history by studying living humans. The ability to sequence and study DNA by calibrating the rate of accumulation of changes with evolutionary time has enabled robust inferences about how humans have evolved. These data indicate that modern humans evolved in Africa about 150,000 years ago and, consistent with paleontological evidence, migrated out of Africa. And through a series of settlements, demographic expansions, and further migrations, they populated the entire world. One of the first waves of migration from Africa was into India. Subsequent, more recent, waves of migration from other parts of the world have resulted in India being a genetic melting pot. Contemporary India has a rich tapestry of cultures and ecologies. There are about 400 tribal groups and more than 4000 groups of castes and subcastes, speaking dialects of 22 recognized languages belonging to four major language families. The contemporary social structure of Indian populations is characterized by endogamy with different degrees of porosity. The social structure, possibly coupled with large ecological heterogeneity, has resulted in considerable genetic diversity and local genetic differences within India. In this essay, we provide genetic evidence of how India may have been peopled, the nature and extent of its genetic diversity, and genetic structure among the extant populations of India. PMID:25147176

  10. Structural basis of genomic RNA (gRNA) dimerization and packaging determinants of mouse mammary tumor virus (MMTV).

    Science.gov (United States)

    Aktar, Suriya J; Vivet-Boudou, Valérie; Ali, Lizna M; Jabeen, Ayesha; Kalloush, Rawan M; Richer, Delphine; Mustafa, Farah; Marquet, Roland; Rizvi, Tahir A

    2014-11-14

    One of the hallmarks of retroviral life cycle is the efficient and specific packaging of two copies of retroviral gRNA in the form of a non-covalent RNA dimer by the assembling virions. It is becoming increasingly clear that the process of dimerization is closely linked with gRNA packaging, and in some retroviruses, the latter depends on the former. Earlier mutational analysis of the 5' end of the MMTV genome indicated that MMTV gRNA packaging determinants comprise sequences both within the 5' untranslated region (5' UTR) and the beginning of gag. The RNA secondary structure of MMTV gRNA packaging sequences was elucidated employing selective 2'hydroxyl acylation analyzed by primer extension (SHAPE). SHAPE analyses revealed the presence of a U5/Gag long-range interaction (U5/Gag LRI), not predicted by minimum free-energy structure predictions that potentially stabilizes the global structure of this region. Structure conservation along with base-pair covariations between different strains of MMTV further supported the SHAPE-validated model. The 5' region of the MMTV gRNA contains multiple palindromic (pal) sequences that could initiate intermolecular interaction during RNA dimerization. In vitro RNA dimerization, SHAPE analysis, and structure prediction approaches on a series of pal mutants revealed that MMTV RNA utilizes a palindromic point of contact to initiate intermolecular interactions between two gRNAs, leading to dimerization. This contact point resides within pal II (5' CGGCCG 3') at the 5' UTR and contains a canonical "GC" dyad and therefore likely constitutes the MMTV RNA dimerization initiation site (DIS). Further analyses of these pal mutants employing in vivo genetic approaches indicate that pal II, as well as pal sequences located in the primer binding site (PBS) are both required for efficient MMTV gRNA packaging. Employing structural prediction, biochemical, and genetic approaches, we show that pal II functions as a primary point of contact between

  11. In situ optical sequencing and structure analysis of a trinucleotide repeat genome region by localization microscopy after specific COMBO-FISH nano-probing

    Science.gov (United States)

    Stuhlmüller, M.; Schwarz-Finsterle, J.; Fey, E.; Lux, J.; Bach, M.; Cremer, C.; Hinderhofer, K.; Hausmann, M.; Hildenbrand, G.

    2015-10-01

    Trinucleotide repeat expansions (like (CGG)n) of chromatin in the genome of cell nuclei can cause neurological disorders such as for example the Fragile-X syndrome. Until now the mechanisms are not clearly understood as to how these expansions develop during cell proliferation. Therefore in situ investigations of chromatin structures on the nanoscale are required to better understand supra-molecular mechanisms on the single cell level. By super-resolution localization microscopy (Spectral Position Determination Microscopy; SPDM) in combination with nano-probing using COMBO-FISH (COMBinatorial Oligonucleotide FISH), novel insights into the nano-architecture of the genome will become possible. The native spatial structure of trinucleotide repeat expansion genome regions was analysed and optical sequencing of repetitive units was performed within 3D-conserved nuclei using SPDM after COMBO-FISH. We analysed a (CGG)n-expansion region inside the 5' untranslated region of the FMR1 gene. The number of CGG repeats for a full mutation causing the Fragile-X syndrome was found and also verified by Southern blot. The FMR1 promotor region was similarly condensed like a centromeric region whereas the arrangement of the probes labelling the expansion region seemed to indicate a loop-like nano-structure. These results for the first time demonstrate that in situ chromatin structure measurements on the nanoscale are feasible. Due to further methodological progress it will become possible to estimate the state of trinucleotide repeat mutations in detail and to determine the associated chromatin strand structural changes on the single cell level. In general, the application of the described approach to any genome region will lead to new insights into genome nano-architecture and open new avenues for understanding mechanisms and their relevance in the development of heredity diseases.

  12. Genome instability in the context of chromatin structure and fragile sites

    Czech Academy of Sciences Publication Activity Database

    Bártová, Eva; Galiová-Šustáčková, Gabriela; Legartová, Soňa; Stixová, Lenka; Jugová, Alžbeta; Kozubek, Stanislav

    2010-01-01

    Roč. 20, č. 3 (2010), s. 181-194 ISSN 1045-4403 R&D Projects: GA MŠk ME 919; GA ČR(CZ) GAP302/10/1022 Grant - others:GA MŠk(CZ) LC06027; GA MŠk(CZ) LC535 Program:LC; LC Institutional research plan: CEZ:AV0Z50040507; CEZ:AV0Z50040702 Keywords : gene amplification * fragile sites * chromatin structure Subject RIV: BO - Biophysics Impact factor: 4.111, year: 2010

  13. Factor structure of cognition and functional capacity in two studies of schizophrenia and bipolar disorder: Implications for genomic studies.

    Science.gov (United States)

    Harvey, Philip D; Aslan, Mihaela; Du, Mengtian; Zhao, Hongyu; Siever, Larry J; Pulver, Ann; Gaziano, J Michael; Concato, John

    2016-01-01

    Impairments in cognition and everyday functioning are common in schizophrenia and bipolar disorder (BPD). In this article, we present factor analyses of cognitive and functional capacity (FC) measures based on 2 studies of schizophrenia (SCZ) and bipolar I disorder (BPI) using similar methods. The overall goal of these analyses was to determine whether performance-based assessments should be examined individually, or aggregated on the basis of the correlational structure of the tests, as well as to evaluate the similarity of factor structures of SCZ and BPI. Veterans Affairs Cooperative Studies Program Study #572 (Harvey et al., 2014) evaluated cognitive and FC measures among 5,414 BPI and 3,942 SCZ patients. A 2nd study evaluated similar neuropsychological (NP) and FC measures among 368 BPI and 436 SCZ patients. Principal components analysis, as well as exploratory and CFAs, were used to examine the data. Analyses in both datasets suggested that NP and FC measures were explained by a single underlying factor in BPI and SCZ patients, both when analyzed separately or as in a combined sample. The factor structure in both studies was similar, with or without inclusion of FC measures; homogeneous loadings were observed for that single factor across cognitive and FC domains across the samples. The empirically derived factor model suggests that NP performance and FC are best explained as a single latent trait applicable to people with SCZ and BPD. This single measure may enhance the robustness of the analyses relating genomic data to performance-based phenotypes. (c) 2015 APA, all rights reserved).

  14. Genomic Resources of Three Pulsatilla Species Reveal Evolutionary Hotspots, Species-Specific Sites and Variable Plastid Structure in the Family Ranunculaceae.

    Science.gov (United States)

    Szczecińska, Monika; Sawicki, Jakub

    2015-09-15

    The European continent is presently colonized by nine species of the genus Pulsatilla, five of which are encountered only in mountainous regions of southwest and south-central Europe. The remaining four species inhabit lowlands in the north-central and eastern parts of the continent. Most plants of the genus Pulsatilla are rare and endangered, which is why most research efforts focused on their biology, ecology and hybridization. The objective of this study was to develop genomic resources, including complete plastid genomes and nuclear rRNA clusters, for three sympatric Pulsatilla species that are most commonly found in Central Europe. The results will supply valuable information about genetic variation, which can be used in the process of designing primers for population studies and conservation genetics research. The complete plastid genomes together with the nuclear rRNA cluster can serve as a useful tool in hybridization studies. Six complete plastid genomes and nuclear rRNA clusters were sequenced from three species of Pulsatilla using the Illumina sequencing technology. Four junctions between single copy regions and inverted repeats and junctions between the identified locally-collinear blocks (LCB) were confirmed by Sanger sequencing. Pulsatilla genomes of 120 unique genes had a total length of approximately 161-162 kb, and 21 were duplicated in the inverted repeats (IR) region. Comparative plastid genomes of newly-sequenced Pulsatilla and the previously-identified plastomes of Aconitum and Ranunculus species belonging to the family Ranunculaceae revealed several variations in the structure of the genome, but the gene content remained constant. The nuclear rRNA cluster (18S-ITS1-5.8S-ITS2-26S) of studied Pulsatilla species is 5795 bp long. Among five analyzed regions of the rRNA cluster, only Internal Transcribed Spacer 2 (ITS2) enabled the molecular delimitation of closely-related Pulsatilla patens and Pulsatilla vernalis. The determination of complete

  15. ATLAS (Automatic Tool for Local Assembly Structures) - A Comprehensive Infrastructure for Assembly, Annotation, and Genomic Binning of Metagenomic and Metaranscripomic Data

    Energy Technology Data Exchange (ETDEWEB)

    White, Richard A.; Brown, Joseph M.; Colby, Sean M.; Overall, Christopher C.; Lee, Joon-Yong; Zucker, Jeremy D.; Glaesemann, Kurt R.; Jansson, Georg C.; Jansson, Janet K.

    2017-03-02

    ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy install through bioconda maintained as open-source on GitHub, and is implemented in Snakemake for modular customizable workflows.

  16. Clustering of 770,000 genomes reveals post-colonial population structure of North America

    Science.gov (United States)

    Han, Eunjung; Carbonetto, Peter; Curtis, Ross E.; Wang, Yong; Granka, Julie M.; Byrnes, Jake; Noto, Keith; Kermany, Amir R.; Myres, Natalie M.; Barber, Mathew J.; Rand, Kristin A.; Song, Shiya; Roman, Theodore; Battat, Erin; Elyashiv, Eyal; Guturu, Harendra; Hong, Eurie L.; Chahine, Kenneth G.; Ball, Catherine A.

    2017-02-01

    Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.

  17. Genome-wide population structure and admixture analysis reveals weak differentiation among Ugandan goat breeds.

    Science.gov (United States)

    Onzima, R B; Upadhyay, M R; Mukiibi, R; Kanis, E; Groenen, M A M; Crooijmans, R P M A

    2018-02-01

    Uganda has a large population of goats, predominantly from indigenous breeds reared in diverse production systems, whose existence is threatened by crossbreeding with exotic Boer goats. Knowledge about the genetic characteristics and relationships among these Ugandan goat breeds and the potent